# Version 1.1.0

**In Development**

## Legend for changelogs

Major Feature : something big that you couldn't do before.

Feature : something that you couldn't do before.

Efficiency : an existing feature now may not require as much computation or memory.

Enhancement : a miscellaneous minor improvement.

Fix : something that previously didn't work as documented – or according to reasonable expectations – should now work.

API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

## Minimal dependencies

Version 1.1.0 of scikit-learn requires Python 3.7+, NumPy 1.14.6+ and SciPy 1.1.0+. The optional minimal dependency is Matplotlib 2.2.3+.


## Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

- **Efficiency** `cluster.KMeans` now defaults to `algorithm="lloyd"` instead of `algorithm="auto"`, which was equivalent to `algorithm="elkan"`. Lloyd's algorithm and Elkan's algorithm converge to the same solution, up to numerical rounding errors, but in general Lloyd's algorithm uses much less memory, and it is often faster.
- **Fix** The eigenvectors initialization for `cluster.SpectralClustering` and `manifold.SpectralEmbedding` now samples from a Gaussian when using the `'amg'` or `'lobpcg'` solver. This change improves numerical stability of the solver, but may result in a different model.
- **Fix** `feature_selection.f_regression` and `feature_selection.r_regression` now return finite scores by default instead of `np.nan` and `np.inf` in some corner cases. You can use `force_finite=False` if you really want non-finite values and the old behavior.

## Changelog

- **Enhancement** All scikit-learn models now generate a more informative error message when some input contains unexpected `NaN` or infinite values. In particular, the message contains the input name ("X", "y" or "sample_weight") and, if an unexpected `NaN` value is found in `X`, the error message suggests potential solutions. #21219 by Olivier Grisel.
- **Enhancement** All scikit-learn models now generate a more informative error message when setting invalid hyper-parameters with `set_params`. #21542 by Olivier Grisel.

### `sklearn.calibration`

- **Enhancement** `calibration.calibration_curve` accepts a parameter `pos_label` to specify the positive class label. #21032 by Guillaume Lemaitre.
- **Enhancement** `CalibrationDisplay` accepts a parameter `pos_label` to add this information to the plot. #21038 by Guillaume Lemaitre.
- **Enhancement** `calibration.CalibratedClassifierCV.fit` now supports passing `fit_params`, which are routed to the `base_estimator`. #18170 by Benjamin Bossan.

### `sklearn.cluster`

- **Enhancement** `cluster.SpectralClustering` and `cluster.spectral` now include the new `'cluster_qr'` method from `cluster.cluster_qr` that clusters samples in the embedding space as an alternative to the existing `'kmeans'` and `'discrete'` methods. See `cluster.spectral_clustering` for more details. #21148 by Andrew Knyazev.
- **Efficiency** In `cluster.KMeans`, the default `algorithm` is now `"lloyd"`, which is the full classical EM-style algorithm. Both `"auto"` and `"full"` are deprecated and will be removed in version 1.3. They are now aliases for `"lloyd"`. The previous default was `"auto"`, which relied on Elkan's algorithm. Lloyd's algorithm uses less memory than Elkan's, it is faster on many datasets, and its results are identical, hence the change. #21735 by Aurélien Geron.
- **Enhancement** `cluster.SpectralClustering` now raises consistent error messages when passed invalid values for `n_clusters`, `n_init`, `gamma`, `n_neighbors`, `eigen_tol` or `degree`. #21881 by Hugo Vassard.

### `sklearn.cross_decomposition`

- **Enhancement** `cross_decomposition._PLS.inverse_transform` now allows reconstruction of an `X` target when a `Y` parameter is given. #19680 by Robin Thibaut.
- **API Change** Adds `get_feature_names_out` to all transformers in the `cross_decomposition` module: `cross_decomposition.CCA`, `cross_decomposition.PLSSVD`, `cross_decomposition.PLSRegression`, and `cross_decomposition.PLSCanonical`. #22119 by Thomas Fan.

### `sklearn.discriminant_analysis`

- **API Change** Adds `get_feature_names_out` to `discriminant_analysis.LinearDiscriminantAnalysis`. #22120 by Thomas Fan.

### `sklearn.feature_selection`

- **Feature** Added auto mode to `feature_selection.SequentialFeatureSelector`. If the argument `n_features_to_select` is `'auto'`, features are selected until the score improvement does not exceed the argument `tol`. The default value of `n_features_to_select` changed from `None` to `'warn'` in 1.1 and will become `'auto'` in 1.3. `None` and `'warn'` will be removed in 1.3. #20145 by @murata-yu.
- **Efficiency** Improved runtime performance of `feature_selection.chi2` with boolean arrays. #22235 by Thomas Fan.

### `sklearn.datasets`

- **Enhancement** `datasets.make_swiss_roll` now supports the optional argument `hole`; when set to `True`, it returns the swiss-hole dataset. #21482 by Sebastian Pujalte.
- **Enhancement** `datasets.load_diabetes` now accepts the parameter `scaled`, to allow loading unscaled data. The scaled version of this dataset is now computed from the unscaled data, and can produce slightly different results than in previous versions (within a 1e-4 absolute tolerance). #16605 by Mandy Gu.
- **Enhancement** `datasets.fetch_openml` now has two optional arguments, `n_retries` and `delay`. By default, `datasets.fetch_openml` will retry 3 times in case of a network failure, with a delay between each try. #21901 by Rileran.

### `sklearn.decomposition`

- **Enhancement** `decomposition.PCA` exposes a parameter `n_oversamples` to tune `sklearn.decomposition.randomized_svd` and get accurate results when the number of features is large. #21109 by Smile.
- **Enhancement** `decomposition.dict_learning`, `decomposition.dict_learning_online` and `decomposition.sparse_encode` preserve dtype for `numpy.float32`. `decomposition.DictionaryLearning`, `decomposition.MiniBatchDictionaryLearning` and `decomposition.SparseCoder` preserve dtype for `numpy.float32`. #22002 by Takeshi Oura.
- **Enhancement** `decomposition.SparsePCA` and `decomposition.MiniBatchSparsePCA` preserve dtype for `numpy.float32`. #22111 by Takeshi Oura.
- **Enhancement** `decomposition.TruncatedSVD` now allows `n_components == n_features` if `algorithm='randomized'`. #22181 by Zach Deane-Mayer.
- **API Change** Adds `get_feature_names_out` to all transformers in the `decomposition` module: `DictionaryLearning`, `FactorAnalysis`, `FastICA`, `IncrementalPCA`, `KernelPCA`, `LatentDirichletAllocation`, `MiniBatchDictionaryLearning`, `MiniBatchSparsePCA`, `NMF`, `PCA`, `SparsePCA`, and `TruncatedSVD`. #21334 by Thomas Fan.
- **Fix** `decomposition.FastICA` now validates input parameters in `fit` instead of `__init__`. #21432 by Hannah Bohle and Maren Westermann.
- **Fix** `decomposition.FactorAnalysis` now validates input parameters in `fit` instead of `__init__`. #21713 by Haya and Krum Arnaudov.
- **Fix** `decomposition.KernelPCA` now validates input parameters in `fit` instead of `__init__`. #21567 by Maggie Chege.

### `sklearn.ensemble`

- **Efficiency** `fit` of `ensemble.BaseGradientBoosting` now calls `check_array` with parameter `force_all_finite=False` for non-initial warm-start runs, as it has already been checked before. #22159 by Geoffrey Paris.
- **Efficiency** Fitting an `ensemble.RandomForestClassifier`, `ensemble.RandomForestRegressor`, `ensemble.ExtraTreesClassifier`, `ensemble.ExtraTreesRegressor`, or `ensemble.RandomTreesEmbedding` is now faster in a multiprocessing setting, especially for subsequent fits with `warm_start` enabled. #22106 by Pieter Gijsbers.
- **Enhancement** `ensemble.HistGradientBoostingClassifier` is faster, for binary and in particular for multiclass problems, thanks to the new private loss function module. #20811, #20567 and #21814 by Christian Lorentzen.
- **API Change** Changed the default of `max_features` to 1.0 for `ensemble.RandomForestRegressor` and to `"sqrt"` for `ensemble.RandomForestClassifier`. Note that these give the same fit results as before, but are much easier to understand. The old default value `"auto"` has been deprecated and will be removed in version 1.3. The same changes are also applied for `ensemble.ExtraTreesRegressor` and `ensemble.ExtraTreesClassifier`. #20803 by Brian Sun.

### `sklearn.feature_selection`

- **Efficiency** Reduced memory usage of `feature_selection.chi2`. #21837 by Louis Wagner.
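Since the `max_features` defaults are in transition, spelling the new value out explicitly keeps behavior stable across versions; a minimal sketch (assuming scikit-learn ≥ 1.1; the data is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=8, random_state=0)
# "sqrt" is the new classifier default (previously spelled "auto");
# passing it explicitly avoids the deprecation warning on 1.1-1.2.
clf = RandomForestClassifier(n_estimators=20, max_features="sqrt",
                             random_state=0).fit(X, y)
acc = clf.score(X, y)  # training accuracy, just to show the fit worked
```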

### `sklearn.feature_extraction`

- **API Change** `decomposition.FastICA` now supports unit variance for whitening. The default value of its `whiten` argument will change from `True` (which behaves like `'arbitrary-variance'`) to `'unit-variance'` in version 1.3. #19490 by Facundo Ferrin and Julien Jerphanion.
- **Fix** `feature_extraction.FeatureHasher` now validates input parameters in `transform` instead of `__init__`. #21573 by Hannah Bohle and Maren Westermann.

### `sklearn.feature_extraction.text`

- **Fix** `feature_extraction.text.TfidfVectorizer` now does not create a `feature_extraction.text.TfidfTransformer` at `__init__`, as required by our API. #21832 by Guillaume Lemaitre.

### `sklearn.feature_selection`

- **Enhancement** Add a parameter `force_finite` to `feature_selection.f_regression` and `feature_selection.r_regression`. This parameter allows forcing the output to be finite in cases where a feature or the target is constant, or where the feature and target are perfectly correlated (only for the F-statistic). #17819 by Juan Carlos Alfaro Jiménez.

### `sklearn.gaussian_process`

- **Fix** `gaussian_process.GaussianProcessClassifier` raises a more informative error if a `CompoundKernel` is passed via `kernel`. #22223 by MarcoM.

### `sklearn.impute`

- **Enhancement** Added support for `pd.NA` in `SimpleImputer`. #21114 by Ying Xiong.
- **API Change** Adds `get_feature_names_out` to `impute.SimpleImputer`, `impute.KNNImputer`, `impute.IterativeImputer`, and `impute.MissingIndicator`. #21078 by Thomas Fan.
- **API Change** The `verbose` parameter was deprecated for `impute.SimpleImputer`. A warning will always be raised upon the removal of empty columns. #21448 by Oleh Kozynets and Christian Ritter.
- **Enhancement** `SimpleImputer` now warns with feature names when features are skipped due to the lack of any observed values in the training set. #21617 by Christian Ritter.

### `sklearn.linear_model`

- **Enhancement** `linear_model.RidgeClassifier` now supports multilabel classification. #19689 by Guillaume Lemaitre.
- **Enhancement** `linear_model.RidgeCV` and `linear_model.RidgeClassifierCV` now raise a consistent error message when passed invalid values for `alphas`. #21606 by Arturo Amor.
- **Enhancement** `linear_model.Ridge` and `linear_model.RidgeClassifier` now raise a consistent error message when passed invalid values for `alpha`, `max_iter` and `tol`. #21341 by Arturo Amor.

- **API Change** `linear_model.LassoLarsIC` now exposes `noise_variance` as a parameter in order to provide an estimate of the noise variance. This is particularly relevant when `n_features > n_samples` and the estimator of the noise variance cannot be computed. #21481 by Guillaume Lemaitre.
- **Enhancement** `orthogonal_mp_gram` preserves dtype for `numpy.float32`. #22002 by Takeshi Oura.
- **Enhancement** `linear_model.QuantileRegressor` supports sparse inputs for the highs based solvers. #21086 by Venkatachalam Natchiappan. In addition, those solvers now use the CSC matrix right from the beginning, which speeds up fitting. #22206 by Christian Lorentzen.
- **Enhancement** Renamed parameter `base_estimator` to `estimator` in `linear_model.RANSACRegressor` to improve readability and consistency. `base_estimator` is deprecated and will be removed in 1.3. #22062 by Adrian Trujillo.
- **Fix** `linear_model.LassoLarsIC` now correctly computes AIC and BIC. An error is now raised when `n_features > n_samples` and when the noise variance is not provided. #21481 by Guillaume Lemaitre and Andrés Babino.
- **Enhancement** `linear_model.ElasticNet` and other linear model classes using coordinate descent show error messages when non-finite parameter weights are produced. #22148 by Christian Ritter and Norbert Preining.
- **Fix** `linear_model.ElasticNetCV` now produces the correct warning when `l1_ratio=0`. #21724 by Yar Khine Phyo.

### `sklearn.metrics`

- **Feature** `r2_score` and `explained_variance_score` have a new `force_finite` parameter. Setting this parameter to `False` will return the actual non-finite score in case of perfect predictions or constant `y_true`, instead of the finite approximation (`1.0` and `0.0` respectively) currently returned by default. #17266 by Sylvain Marié.
- **API Change** `metrics.DistanceMetric` has been moved from `sklearn.neighbors` to `sklearn.metrics`. Using `neighbors.DistanceMetric` for imports is still valid for backward compatibility, but this alias will be removed in 1.3. #21177 by Julien Jerphanion.
- **API Change** Parameters `sample_weight` and `multioutput` of `metrics.mean_absolute_percentage_error` are now keyword-only, in accordance with SLEP009. A deprecation cycle was introduced. #21576 by Paul-Emile Dugnat.
- **API Change** The `"wminkowski"` metric of `sklearn.metrics.DistanceMetric` is deprecated and will be removed in version 1.3. Instead, the existing `"minkowski"` metric now takes in an optional `w` parameter for weights. This deprecation aims at remaining consistent with the SciPy 1.8 convention. #21873 by Yar Khine Phyo.
- **Fix** `metrics.silhouette_score` now supports integer input for precomputed distances. #22108 by Thomas Fan.

### `sklearn.manifold`

- **Enhancement** `manifold.spectral_embedding` and `manifold.SpectralEmbedding` support `np.float32` dtype and will preserve this dtype. #21534 by Andrew Knyazev.
- **Fix** `manifold.spectral_embedding` now uses Gaussian instead of the previous uniform-on-[0, 1] random initial approximations to eigenvectors in the eigen solvers `lobpcg` and `amg`, to improve their numerical stability. #21565 by Andrew Knyazev.

### `sklearn.model_selection`

- **Enhancement** Raise an error during cross-validation when the fits for all the splits failed. Similarly, raise an error during grid-search when the fits for all the models and all the splits failed. #21026 by Loïc Estève.
- **Enhancement** It is now possible to pass `scoring="matthews_corrcoef"` to all model selection tools with a `scoring` argument to use the Matthews correlation coefficient (MCC). #22203 by Olivier Grisel.
- **Fix** `model_selection.GridSearchCV` and `model_selection.HalvingGridSearchCV` now validate input parameters in `fit` instead of `__init__`. #21880 by Mrinal Tyagi.

### `sklearn.mixture`

- **Fix** `mixture.GaussianMixture` now correctly initializes `precisions_cholesky_` when `precisions_init` is provided, by taking its square root. #22058 by Guillaume Lemaitre.
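A sketch of the fixed code path (assuming scikit-learn ≥ 1.1 so the corrected initialization applies; the data and identity precisions are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 5.0])
# precisions_init (inverse covariances) is now converted to
# precisions_cholesky_ via its square root, as the fix describes.
precisions_init = np.array([np.eye(2), np.eye(2)])
gm = GaussianMixture(n_components=2, covariance_type="full",
                     precisions_init=precisions_init,
                     random_state=0).fit(X)
```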

### `sklearn.neighbors`

- **Enhancement** `utils.validation.check_array` and `utils.validation.type_of_target` now accept an `input_name` parameter to make the error message more informative when passed invalid input data (e.g. with NaN or infinite values). #21219 by Olivier Grisel.
- **Enhancement** `utils.validation.check_array` returns a float ndarray with `np.nan` when passed a `Float32` or `Float64` pandas extension array with `pd.NA`. #21278 by Thomas Fan.
- **Enhancement** Adds `get_feature_names_out` to `neighbors.RadiusNeighborsTransformer`, `neighbors.KNeighborsTransformer` and `neighbors.NeighborhoodComponentsAnalysis`. #22212 by Meekail Zain.
- **Fix** `neighbors.KernelDensity` now validates input parameters in `fit` instead of `__init__`. #21430 by Desislava Vasileva and Lucy Jimenez.

### `sklearn.neural_network`

- **Enhancement** `neural_network.MLPClassifier` and `neural_network.MLPRegressor` show error messages when optimizers produce non-finite parameter weights. #22150 by Christian Ritter and Norbert Preining.

### `sklearn.pipeline`

- **Enhancement** Added support for `"passthrough"` in `FeatureUnion`. Setting a transformer to `"passthrough"` will pass the features unchanged. #20860 by Shubhraneel Pal.
- **Fix** `pipeline.Pipeline` now does not validate hyper-parameters in `__init__` but in `.fit()`. #21888 by iofall and Arisa Y..

### `sklearn.preprocessing`

- **Enhancement** Adds a `subsample` parameter to `preprocessing.KBinsDiscretizer`. This allows specifying a maximum number of samples to be used while fitting the model. The option is only available when `strategy` is set to `quantile`. #21445 by Felipe Bidu and Amanda Dsouza.
- **Enhancement** Added the `get_feature_names_out` method and a new parameter `feature_names_out` to `preprocessing.FunctionTransformer`. You can set `feature_names_out` to `'one-to-one'` to use the input feature names as the output feature names, or you can set it to a callable that returns the output feature names. This is especially useful when the transformer changes the number of features. If `feature_names_out` is `None` (which is the default), then `get_output_feature_names` is not defined. #21569 by Aurélien Geron.
- **Fix** `preprocessing.LabelBinarizer` now validates input parameters in `fit` instead of `__init__`. #21434 by Krum Arnaudov.

### `sklearn.random_projection`

- **Enhancement** `random_projection.SparseRandomProjection` and `random_projection.GaussianRandomProjection` preserve dtype for `numpy.float32`. #22114 by Takeshi Oura.
- **API Change** Adds `get_feature_names_out` to all transformers in the `random_projection` module: `GaussianRandomProjection` and `SparseRandomProjection`. #21330 by Loïc Estève.

### `sklearn.svm`

- **Enhancement** `svm.OneClassSVM`, `svm.NuSVC`, `svm.NuSVR`, `svm.SVC` and `svm.SVR` now expose `n_iter_`, the number of iterations of the libsvm optimization routine. #21408 by Juan Martín Loyola.
- **Fix** `svm.NuSVC`, `svm.NuSVR`, `svm.SVC`, `svm.SVR` and `svm.OneClassSVM` now validate input parameters in `fit` instead of `__init__`. #21436 by Haidar Almubarak.
- **Enhancement** `svm.SVR`, `svm.SVC`, `svm.NuSVR`, `svm.OneClassSVM` and `svm.NuSVC` now raise an error when the dual-gap estimation produces non-finite parameter weights. #22149 by Christian Ritter and Norbert Preining.

### `sklearn.utils`

- **Enhancement** `utils.estimator_html_repr` shows a more helpful error message when running in a Jupyter notebook that is not trusted. #21316 by Thomas Fan.
- **Enhancement** `utils.estimator_html_repr` displays an arrow on the top left corner of the HTML representation to show how the elements are clickable. #21298 by Thomas Fan.

## Code and Documentation Contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.0, including:

TODO: update at the time of the release.