# Version 1.2.0

**In Development**

## Legend for changelogs

- **Major Feature**: something big that you couldn't do before.
- **Feature**: something that you couldn't do before.
- **Efficiency**: an existing feature now may not require as much computation or memory.
- **Enhancement**: a miscellaneous minor improvement.
- **Fix**: something that previously didn't work as documented, or according to reasonable expectations, should now work.
- **API Change**: you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

## Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

- **Enhancement** The default `eigen_tol` for `cluster.SpectralClustering`, `manifold.SpectralEmbedding`, `cluster.spectral_clustering`, and `manifold.spectral_embedding` is now `None` when using the `'amg'` or `'lobpcg'` solvers. This change improves the numerical stability of the solver, but may result in a different model.
- **Enhancement** `linear_model.GammaRegressor`, `linear_model.PoissonRegressor` and `linear_model.TweedieRegressor` can reach higher precision with the lbfgs solver, in particular when `tol` is set to a tiny value. Moreover, `verbose` is now properly propagated to L-BFGS-B. #23619 by Christian Lorentzen.
- **Fix** Make the sign of `components_` deterministic in `decomposition.SparsePCA`. #23935 by Guillaume Lemaitre.
- **Fix** The `components_` signs in `decomposition.FastICA` might differ. It is now consistent and deterministic with all SVD solvers. #22527 by Meekail Zain and Thomas Fan.
- **Fix** The condition for early stopping has been changed in `linear_model._sgd_fast._plain_sgd`, which is used by `linear_model.SGDRegressor` and `linear_model.SGDClassifier`. The old condition did not disambiguate between the training and validation sets and had the effect of overscaling the error tolerance. This has been fixed in #23798 by Harsh Agrawal.
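For illustration, a minimal sketch of fitting one of these GLMs with a tiny `tol` so the lbfgs solver runs to high precision (the data here is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Toy count-regression data (illustrative only).
rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 3))
y = rng.poisson(lam=np.exp(X @ np.array([0.3, -0.2, 0.1])))

# A tiny `tol` asks lbfgs for a very precise optimum; `verbose`
# is now forwarded to the underlying L-BFGS-B routine.
model = PoissonRegressor(tol=1e-10, max_iter=1000).fit(X, y)
print(model.coef_)
```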

## Changes impacting all modules

- **Enhancement** Finiteness checks (detection of NaN and infinite values) in all estimators are now significantly more efficient for float32 data by leveraging NumPy's SIMD-optimized primitives. #23446 by Meekail Zain.
- **Enhancement** Finiteness checks (detection of NaN and infinite values) in all estimators are now faster by utilizing a more efficient stop-on-first second-pass algorithm. #23197 by Meekail Zain.
- **Enhancement** Support for combinations of dense and sparse dataset pairs, for all distance metrics and for float32 and float64 datasets, has been added or has seen its performance improved for several estimators. #23604 and #23585 by Julien Jerphanion, Olivier Grisel, and Thomas Fan.

## Changelog

### sklearn.calibration

- **API Change** Rename `base_estimator` to `estimator` in `CalibratedClassifierCV` to improve readability and consistency. The parameter `base_estimator` is deprecated and will be removed in 1.4. #22054 by Kevin Roice.
- **Efficiency** Low-level routines for reductions on pairwise distances for dense float32 datasets have been refactored. Several functions and estimators now benefit from improved performance in terms of hardware scalability and speed-ups. For instance, `sklearn.neighbors.NearestNeighbors.kneighbors` and `sklearn.neighbors.NearestNeighbors.radius_neighbors` can respectively be up to ×20 and ×5 faster than previously on a laptop. Moreover, implementations of those two algorithms are now suitable for machines with many cores, making them usable for datasets consisting of millions of samples.
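A minimal sketch of the two queries named above, on synthetic float32 data (timings not shown; the shapes and sizes are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# float32 data now hits the refactored low-level pairwise-distance routines.
rng = np.random.RandomState(0)
X = rng.random_sample((1000, 16)).astype(np.float32)

nn = NearestNeighbors(n_neighbors=5).fit(X)
dist, ind = nn.kneighbors(X)
neighborhoods = nn.radius_neighbors(X[:5], radius=0.5, return_distance=False)
print(dist.shape, ind.shape, len(neighborhoods))
```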

### sklearn.cluster

- **Enhancement** The `predict` and `fit_predict` methods of `cluster.OPTICS` now accept sparse data types for input data. #14736 by Hunt Zhan, #20802 by Brandon Pokorny, and #22965 by Meekail Zain.
- **Enhancement** `cluster.Birch` now preserves dtype for `numpy.float32` inputs. #22968 by Meekail Zain.
- **Enhancement** `cluster.KMeans` and `cluster.MiniBatchKMeans` now accept a new `'auto'` option for `n_init` which changes the number of random initializations to one when using `init='k-means++'` for efficiency. This begins deprecation for the default values of `n_init` in the two classes, and both will have their defaults changed to `n_init='auto'` in 1.4. #23038 by Meekail Zain.
- **Enhancement** `cluster.SpectralClustering` and `cluster.spectral_clustering` now propagate the `eigen_tol` parameter to all choices of `eigen_solver`. Includes a new option `eigen_tol="auto"` and begins deprecation to change the default from `eigen_tol=0` to `eigen_tol="auto"` in version 1.3. #23210 by Meekail Zain.
- **API Change** The `affinity` attribute is now deprecated for `cluster.AgglomerativeClustering` and will be renamed to `metric` in v1.4. #23470 by Meekail Zain.
- **Fix** `cluster.KMeans` now supports readonly attributes when predicting. #24258 by Thomas Fan.
- **Efficiency** `cluster.KMeans` with `algorithm="lloyd"` is now faster and uses less memory. #24264 by Vincent Maladiere.

### sklearn.datasets

- **Enhancement** Introduce the new parameter `parser` in `datasets.fetch_openml`. `parser="pandas"` allows use of the very CPU- and memory-efficient `pandas.read_csv` parser to load dense ARFF-formatted dataset files. It is possible to pass `parser="liac-arff"` to use the old LIAC parser. When `parser="auto"`, dense datasets are loaded with "pandas" and sparse datasets are loaded with "liac-arff". Currently, `parser="liac-arff"` is the default and will change to `parser="auto"` in version 1.4. #21938 by Guillaume Lemaitre.
- **Enhancement** `datasets.dump_svmlight_file` is now accelerated with a Cython implementation, providing 2-4x speedups. #23127 by Meekail Zain.
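For example, the accelerated writer can be exercised round-trip through an in-memory buffer (toy data, purely illustrative):

```python
import io
import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

X = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])
y = np.array([0, 1])

# Write with the Cython-accelerated dumper, then read back.
buf = io.BytesIO()
dump_svmlight_file(X, y, buf)
buf.seek(0)
X2, y2 = load_svmlight_file(buf, n_features=3)
print(np.allclose(X2.toarray(), X))
```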

### sklearn.decomposition

- **Efficiency** `decomposition.FastICA.fit` has been optimised with respect to its memory footprint and runtime. #22268 by MohamedBsh.
- **Enhancement** `decomposition.SparsePCA` and `decomposition.MiniBatchSparsePCA` now implement an `inverse_transform` function. #23905 by Guillaume Lemaitre.
- **API Change** The `n_iter` parameter of `decomposition.MiniBatchSparsePCA` is deprecated and replaced by the parameters `max_iter`, `tol`, and `max_no_improvement` to be consistent with `decomposition.MiniBatchDictionaryLearning`. `n_iter` will be removed in version 1.3. #23726 by Guillaume Lemaitre.
- **Fix** Make the sign of `components_` deterministic in `decomposition.SparsePCA`. #23935 by Guillaume Lemaitre.
- **Enhancement** `decomposition.FastICA` now allows the user to select how whitening is performed through the new `whiten_solver` parameter, which supports `svd` and `eigh`. `whiten_solver` defaults to `svd`, although `eigh` may be faster and more memory efficient in cases where `num_features > num_samples`. #11860 by Pierre Ablin, #22527 by Meekail Zain and Thomas Fan.
- **Enhancement** `decomposition.LatentDirichletAllocation` now preserves dtype for `numpy.float32` input. #24528 by Takeshi Oura and Jérémie du Boisberranger.
- **API Change** The `n_features_` attribute of `decomposition.PCA` is deprecated in favor of `n_features_in_` and will be removed in 1.4. #24421 by Kshitij Mathur.

### sklearn.discriminant_analysis

- **Major Feature** `discriminant_analysis.LinearDiscriminantAnalysis` now supports the Array API for `solver="svd"`. Array API support is considered experimental and might evolve without being subject to our usual rolling deprecation cycle policy. See Array API support (experimental) for more details. #22554 by Thomas Fan.
- **Fix** Validate parameters only in `fit` and not in `__init__` for `discriminant_analysis.QuadraticDiscriminantAnalysis`. #24218 by Stefanie Molin.

### sklearn.ensemble

- **Feature** Adds `class_weight` to `ensemble.HistGradientBoostingClassifier`. #22014 by Thomas Fan.
- **Efficiency** Improve runtime performance of `ensemble.IsolationForest` by avoiding data copies. #23252 by Zhehao Liu.
- **Enhancement** `ensemble.StackingClassifier` now supports multilabel-indicator targets. #24146 by Nicolas Peretti, Nestor Navarro, Nati Tomattis, and Vincent Maladiere.
- **Fix** Fixed the issue where `ensemble.AdaBoostClassifier` outputs NaN in feature importances when fitted with very small sample weights. #20415 by Zhehao Liu.
- **API Change** Rename the constructor parameter `base_estimator` to `estimator` in the following classes: `ensemble.BaggingClassifier`, `ensemble.BaggingRegressor`, `ensemble.AdaBoostClassifier`, `ensemble.AdaBoostRegressor`. `base_estimator` is deprecated in 1.2 and will be removed in 1.4. #23819 by Adrian Trujillo and Edoardo Abati.
- **API Change** Rename the fitted attribute `base_estimator_` to `estimator_` in the following classes: `ensemble.BaggingClassifier`, `ensemble.BaggingRegressor`, `ensemble.AdaBoostClassifier`, `ensemble.AdaBoostRegressor`, `ensemble.RandomForestClassifier`, `ensemble.RandomForestRegressor`, `ensemble.ExtraTreesClassifier`, `ensemble.ExtraTreesRegressor`, `ensemble.RandomTreesEmbedding`, `ensemble.IsolationForest`. `base_estimator_` is deprecated in 1.2 and will be removed in 1.4. #23819 by Adrian Trujillo and Edoardo Abati.

### sklearn.feature_selection

### sklearn.gaussian_process

- **Fix** Fix `gaussian_process.kernels.Matern` gradient computation with `nu=0.5` for PyPy (and possibly other non-CPython interpreters). #24245 by Loïc Estève.
- **Fix** The `fit` method of `gaussian_process.GaussianProcessRegressor` will no longer modify the input X when a custom kernel is used whose `diag` method returns part of the input X. #24405 by Omar Salman.

### sklearn.kernel_approximation

- **Enhancement** `kernel_approximation.RBFSampler` now preserves dtype for `numpy.float32` inputs. #24317 by Tim Head.
- **Enhancement** `kernel_approximation.SkewedChi2Sampler` now preserves dtype for `numpy.float32` inputs. #24350 by Rahil Parikh.

### sklearn.linear_model

- **Enhancement** `linear_model.GammaRegressor`, `linear_model.PoissonRegressor` and `linear_model.TweedieRegressor` can reach higher precision with the lbfgs solver, in particular when `tol` is set to a tiny value. Moreover, `verbose` is now properly propagated to L-BFGS-B. #23619 by Christian Lorentzen.
- **API Change** The default value for the `solver` parameter in `linear_model.QuantileRegressor` will change from `"interior-point"` to `"highs"` in version 1.4. #23637 by Guillaume Lemaitre.
- **API Change** The string option `"none"` is deprecated for the `penalty` argument in `linear_model.LogisticRegression`, and will be removed in version 1.4. Use `None` instead. #23877 by Zhehao Liu.
- **Fix** `linear_model.SGDOneClassSVM` no longer performs parameter validation in the constructor. All validation is now handled in `fit()` and `partial_fit()`. #24433 by Yogendrasingh, Arisa Y. and Tim Head.
- **Fix** Fix average loss calculation when early stopping is enabled in `linear_model.SGDRegressor` and `linear_model.SGDClassifier`. Also updated the condition for early stopping accordingly. #23798 by Harsh Agrawal.

### sklearn.manifold

- **Feature** Adds an option to use normalized stress in `manifold.MDS`. This is enabled by setting the new `normalize` parameter to `True`. #10168 by Łukasz Borchmann, #12285 by Matthias Miltenberger, #13042 by Matthieu Parizy, #18094 by Roth E Conrad and #22562 by Meekail Zain.
- **Enhancement** Adds an `eigen_tol` parameter to `manifold.SpectralEmbedding`. Both `manifold.spectral_embedding` and `manifold.SpectralEmbedding` now propagate `eigen_tol` to all choices of `eigen_solver`. Includes a new option `eigen_tol="auto"` and begins deprecation to change the default from `eigen_tol=0` to `eigen_tol="auto"` in version 1.3. #23210 by Meekail Zain.

### sklearn.metrics

- **Feature** `metrics.ConfusionMatrixDisplay.from_estimator`, `metrics.ConfusionMatrixDisplay.from_predictions`, and `metrics.ConfusionMatrixDisplay.plot` accept a `text_kw` parameter which is passed to matplotlib's `text` function. #24051 by Thomas Fan.
- **Feature** `metrics.class_likelihood_ratios` is added to compute the positive and negative likelihood ratios derived from the confusion matrix of a binary classification problem. #22518 by Arturo Amor.
- **Fix** Allows `csr_matrix` as input for the parameter `y_true` of the `metrics.label_ranking_average_precision_score` metric. #23442 by Sean Atukorala.
- **Fix** `metrics.ndcg_score` will now trigger a warning when the `y_true` value contains a negative value. Users may still use negative values, but the result may not be between 0 and 1. Starting in v1.4, passing in negative values for `y_true` will raise an error. #22710 by Conroy Trinh and #23461 by Meekail Zain.
- **Fix** `metrics.log_loss` with `eps=0` now returns a correct value of 0 or `np.inf` instead of `nan` for predictions at the boundaries (0 or 1). It also accepts integer input. #24365 by Christian Lorentzen.
- **Feature** `metrics.roc_auc_score` now supports micro-averaging (`average="micro"`) for the One-vs-Rest multiclass case (`multi_class="ovr"`). #24338 by Arturo Amor.

### sklearn.model_selection

- **Fix** For all `SearchCV` classes and scipy >= 1.10, the rank corresponding to a nan score is correctly set to the maximum possible rank, rather than `np.iinfo(np.int32).min`. #24141 by Loïc Estève.

### sklearn.multioutput

- **Feature** Added a boolean `verbose` flag to the classes `multioutput.ClassifierChain` and `multioutput.RegressorChain`. #23977 by Eric Fiegel, Chiara Marmo, Lucy Liu, and Guillaume Lemaitre.

### sklearn.naive_bayes

- **Feature** Add a `predict_joint_log_proba` method to all naive Bayes classifiers. #23683 by Andrey Melnik.
- **Enhancement** A new parameter `force_alpha` was added to `naive_bayes.BernoulliNB`, `naive_bayes.ComplementNB`, `naive_bayes.CategoricalNB`, and `naive_bayes.MultinomialNB`, allowing the user to set alpha to a very small number (greater than or equal to 0), which was previously changed automatically to `1e-10`. #16747 by @arka204, #18805 by @hongshaoyang, #22269 by Meekail Zain.

### sklearn.neighbors

- **Enhancement** The `neighbors.KernelDensity` `bandwidth` parameter now accepts Scott's and Silverman's estimation methods. #10468 by Ruben and #22993 by Jovan Stojanovic.
- **Feature** Adds a new function `neighbors.sort_graph_by_row_values` to sort a CSR sparse graph such that each row is stored with increasing values. This is useful to improve efficiency when using precomputed sparse distance matrices in a variety of estimators and avoid an `EfficiencyWarning`. #23139 by Tom Dupre la Tour.
- **Fix** `NearestCentroid` now raises an informative error message at fit-time instead of failing with a low-level error message at predict-time. #23874 by Juan Gomez.
- **Fix** Set `n_jobs=None` by default (instead of `1`) for `neighbors.KNeighborsTransformer` and `neighbors.RadiusNeighborsTransformer`. #24075 by Valentin Laurent.

### sklearn.pipeline

- **Enhancement** `pipeline.FeatureUnion.get_feature_names_out` can now be used when one of the transformers in the `pipeline.FeatureUnion` is `"passthrough"`. #24058 by Diederik Perdok.

### sklearn.preprocessing

- **Enhancement** `preprocessing.FunctionTransformer` will always try to set `n_features_in_` and `feature_names_in_` regardless of the `validate` parameter. #23993 by Thomas Fan.
- **API Change** The `sparse` parameter of `preprocessing.OneHotEncoder` is now deprecated and will be removed in version 1.4. Use `sparse_output` instead. #24412 by Rushil Desai.

### sklearn.svm

- **API Change** The `class_weight_` attribute is now deprecated for `svm.NuSVR`, `svm.SVR`, and `svm.OneClassSVM`. #22898 by Meekail Zain.

### sklearn.tree

- **Enhancement** `tree.plot_tree` and `tree.export_graphviz` now use a lower case `x[i]` to represent feature `i`. #23480 by Thomas Fan.

### sklearn.utils

- **Enhancement** `utils.extmath.randomized_svd` now accepts an argument, `lapack_svd_driver`, to specify the LAPACK driver used in the internal deterministic SVD used by the randomized SVD algorithm. #20617 by Srinath Kailasa.
- **Fix** `utils.multiclass.type_of_target` now properly handles sparse matrices. #14862 by Léonard Binet.
- **Feature** A new module exposes development tools to discover estimators (i.e. `utils.discovery.all_estimators`), displays (i.e. `utils.discovery.all_displays`) and functions (i.e. `utils.discovery.all_functions`) in scikit-learn. #21469 by Guillaume Lemaitre.

## Code and Documentation Contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.1, including:

TODO: update at the time of the release.