Legend for changelogs
Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.1.0 of scikit-learn requires Python 3.7+, NumPy 1.14.6+ and SciPy 1.1.0+. Optional minimal dependency is matplotlib 2.2.3+.
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Efficiency cluster.KMeans now defaults to algorithm="lloyd" instead of algorithm="auto", which was equivalent to algorithm="elkan". Lloyd’s algorithm and Elkan’s algorithm converge to the same solution, up to numerical rounding errors, but in general Lloyd’s algorithm uses much less memory, and it is often faster.
Fix The eigenvectors initialization for manifold.SpectralEmbedding now samples from a Gaussian when using the 'lobpcg' solver. This change improves the numerical stability of the solver, but may result in a different model.
Fix feature_selection.r_regression will now return a finite score by default instead of np.inf for some corner cases. You can use force_finite=False if you really want to get non-finite values and keep the old behavior.
Enhancement All scikit-learn models now generate a more informative error message when some input contains unexpected NaN or infinite values. In particular, the message contains the input name (“X”, “y” or “sample_weight”) and, if an unexpected NaN value is found in X, the error message suggests potential solutions. #21219 by Olivier Grisel.
Feature cluster.SpectralClustering and cluster.spectral_clustering now include the new 'cluster_qr' method, which clusters samples in the embedding space as an alternative to the existing 'kmeans' and 'discrete' methods. See cluster.spectral_clustering for more details. #21148 by Andrew Knyazev.
API Change In cluster.KMeans, the default algorithm is now "lloyd", which is the full classical EM-style algorithm. Both "auto" and "full" are deprecated and will be removed in version 1.3. They are now aliases for "lloyd". The previous default was "auto", which relied on Elkan’s algorithm. Lloyd’s algorithm uses less memory than Elkan’s, it is faster on many datasets, and its results are identical, hence the change. #21735 by Aurélien Geron.
API Change Adds get_feature_names_out to all transformers in the cross_decomposition module, such as cross_decomposition.PLSCanonical. #22119 by Thomas Fan.
Feature Added auto mode to feature_selection.SequentialFeatureSelector. If the argument n_features_to_select is 'auto', features are selected until the score improvement does not exceed the argument tol. The default value of n_features_to_select changed from None to 'warn' in 1.1 and will become 'auto' in 1.3. None and 'warn' will be removed in 1.3. #20145 by @murata-yu.
Feature datasets.load_diabetes now accepts the parameter scaled, to allow loading unscaled data. The scaled version of this dataset is now computed from the unscaled data, and can produce slightly different results than in previous versions (within a 1e-4 absolute tolerance). #16605 by Mandy Gu.
Enhancement datasets.fetch_openml now has two optional arguments, n_retries and delay. By default, datasets.fetch_openml will retry 3 times in case of a network failure, with a delay between each try. #21901 by Rileran.
Enhancement decomposition.sparse_encode and decomposition.SparseCoder preserve dtype for numpy.float32. #22002 by Takeshi Oura.
API Change Adds get_feature_names_out to all transformers in the decomposition module, such as decomposition.TruncatedSVD. #21334 by Thomas Fan.
Efficiency Uses force_all_finite=False for non-initial warm-start runs, as the input has already been checked before. #22159 by Geoffrey Paris.
Efficiency ensemble.HistGradientBoostingClassifier is faster for binary and in particular for multiclass problems, thanks to the new private loss function module. #20811, #20567 and #21814 by Christian Lorentzen.
API Change Changed the default of max_features to 1.0 for ensemble.RandomForestRegressor and to "sqrt" for ensemble.RandomForestClassifier. Note that these give the same fit results as before, but are much easier to understand. The old default value "auto" has been deprecated and will be removed in version 1.3. The same changes are also applied for ensemble.ExtraTreesRegressor and ensemble.ExtraTreesClassifier. #20803 by Brian Sun.
Efficiency Fitting an ensemble.RandomTreesEmbedding is now faster in a multiprocessing setting, especially for subsequent fits with warm_start enabled. #22106 by Pieter Gijsbers.
Feature decomposition.FastICA now supports unit variance for whitening. The default value of its whiten argument will change from True (which behaves like 'arbitrary-variance') to 'unit-variance' in version 1.3. #19490 by Facundo Ferrin and Julien Jerphanion.
Enhancement Add a force_finite parameter to feature_selection.f_regression and feature_selection.r_regression. This parameter makes it possible to force the output to be finite when a feature or the target is constant, or when the feature and target are perfectly correlated (only for the F-statistic). #17819 by Juan Carlos Alfaro Jiménez.
Enhancement linear_model.LassoLarsIC now exposes noise_variance as a parameter in order to provide an estimate of the noise variance. This is particularly relevant when n_features > n_samples and the estimator of the noise variance cannot be computed. #21481 by Guillaume Lemaitre.
Feature linear_model.QuantileRegressor supports sparse inputs for the highs-based solvers. #21086 by Venkatachalam Natchiappan. In addition, those solvers now use the CSC matrix right from the beginning, which speeds up fitting. #22206 by Christian Lorentzen.
Enhancement Rename parameter base_estimator to estimator in linear_model.RANSACRegressor to improve readability and consistency. base_estimator is deprecated and will be removed in 1.3. #22062 by Adrian Trujillo.
Fix linear_model.LassoLarsIC now correctly computes AIC and BIC. An error is now raised when n_features > n_samples and the noise variance is not provided. #21481 by Guillaume Lemaitre and Andrés Babino.
Fix linear_model.ElasticNet and other linear model classes using coordinate descent now show error messages when non-finite parameter weights are produced. #22148 by Christian Ritter and Norbert Preining.
Enhancement metrics.r2_score and metrics.explained_variance_score have a new force_finite parameter. Setting this parameter to False will return the actual non-finite score in case of perfect predictions or constant y_true, instead of the finite approximation (1.0 and 0.0 respectively) currently returned by default. #17266 by Sylvain Marié.
API Change metrics.DistanceMetric has been moved from sklearn.neighbors to sklearn.metrics. Using neighbors.DistanceMetric for imports is still valid for backward compatibility, but this alias will be removed in 1.3. #21177 by Julien Jerphanion.
API Change The optional parameters of metrics.mean_absolute_percentage_error are now keyword-only, in accordance with SLEP009. A deprecation cycle was introduced. #21576 by Paul-Emile Dugnat.
API Change The "wminkowski" metric of sklearn.metrics.DistanceMetric is deprecated and will be removed in version 1.3. Instead, the existing "minkowski" metric now takes an optional w parameter for weights. This deprecation aims at remaining consistent with the SciPy 1.8 convention. #21873 by Yar Khine Phyo.
Fix manifold.spectral_embedding now uses Gaussian instead of the previous uniform on [0, 1] random initial approximations to eigenvectors in eigen_solvers 'lobpcg' and 'amg' to improve their numerical stability. #21565 by Andrew Knyazev.
Enhancement An error is now raised during cross-validation when the fits for all the splits fail. Similarly, an error is raised during grid search when the fits for all the models and all the splits fail. #21026 by Loïc Estève.
Enhancement utils.validation.type_of_target now accepts an input_name parameter to make the error message more informative when invalid input data is passed (e.g. with NaN or infinite values). #21219 by Olivier Grisel.
Enhancement Adds get_feature_names_out to neighbors.NeighborhoodComponentsAnalysis. #22212 by Meekail Zain.
Enhancement Adds a subsample parameter to preprocessing.KBinsDiscretizer. This allows specifying a maximum number of samples to be used while fitting the model. The option is only available when strategy is set to quantile. #21445 by Felipe Bidu and Amanda Dsouza.
Enhancement Added the get_feature_names_out method and a new parameter feature_names_out to preprocessing.FunctionTransformer. You can set feature_names_out to 'one-to-one' to use the input feature names as the output feature names, or you can set it to a callable that returns the output feature names. This is especially useful when the transformer changes the number of features. If feature_names_out is None (which is the default), then get_feature_names_out is not defined. #21569 by Aurélien Geron.
Fix svm.NuSVC now raises an error when the dual-gap estimation produces non-finite parameter weights. #22149 by Christian Ritter and Norbert Preining.
Code and Documentation Contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.0, including:
TODO: update at the time of the release.