Enhancement The error message is improved when importing impute.IterativeImputer without importing the experimental flag. #23194 by Thomas Fan.
For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 1.1.
Legend for changelogs
Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.1.0 of scikit-learn requires Python 3.8+, NumPy 1.17.3+ and SciPy 1.3.2+. Optional minimal dependency is Matplotlib 3.1.2+.
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
cluster.KMeans now defaults to algorithm="lloyd" instead of algorithm="auto", which was equivalent to algorithm="elkan". Lloyd’s algorithm and Elkan’s algorithm converge to the same solution, up to numerical rounding errors, but in general Lloyd’s algorithm uses much less memory, and it is often faster.
ensemble.GradientBoostingRegressor is on average 15% faster than in previous versions thanks to a new sort algorithm used to find the best split. Models might be different because of a different handling of splits with tied criterion values: both the old and the new sorting algorithm are unstable. #22868 by Thomas Fan.
Fix The eigenvectors initialization for manifold.SpectralEmbedding now samples from a Gaussian when using the 'lobpcg' solver. This change improves the numerical stability of the solver, but may result in a different model.
feature_selection.r_regression will now return finite scores by default instead of np.inf for some corner cases. You can use force_finite=False if you really want to get non-finite values and keep the old behavior.
Fix Pandas DataFrames with all non-string columns, such as a MultiIndex, no longer warn when passed into an estimator. Estimators will continue to ignore the column names in DataFrames with non-string columns. For feature_names_in_ to be defined, columns must all be strings. #22410 by Thomas Fan.
preprocessing.KBinsDiscretizer changed handling of bin edges slightly, which might result in a different encoding with the same data.
calibration.calibration_curve changed handling of bin edges slightly, which might result in a different output curve given the same data.
discriminant_analysis.LinearDiscriminantAnalysis now uses the correct variance-scaling coefficient, which may result in different model behavior.
feature_selection.SelectFromModel.partial_fit can now be called with prefit=True; estimator_ will be a deep copy of estimator when prefit=True. #23271 by Guillaume Lemaitre.
Efficiency Low-level routines for reductions on pairwise distances for dense float64 datasets have been refactored. Functions and estimators relying on them now benefit from improved performance in terms of hardware scalability and speed-ups: for instance, sklearn.neighbors.NearestNeighbors.kneighbors and sklearn.neighbors.NearestNeighbors.radius_neighbors can respectively be up to ×20 and ×5 faster than previously.
Enhancement All scikit-learn models now generate a more informative error message when some input contains unexpected NaN or infinite values. In particular the message contains the input name (“X”, “y” or “sample_weight”) and, if an unexpected NaN value is found in X, the error message suggests potential solutions. #21219 by Olivier Grisel.
API Change The option for using the log loss, aka binomial or multinomial deviance, via the loss parameter was made more consistent. The preferred way is by setting the value to "log_loss". Old option names are still valid and produce the same models, but are deprecated and will be removed in version 1.3. For ensemble.HistGradientBoostingClassifier, the loss parameter names “auto”, “binary_crossentropy” and “categorical_crossentropy” are deprecated in favor of the new name “log_loss”, which is now the default. #23040 by Christian Lorentzen.
API Change Rich HTML representation of estimators is now enabled by default in Jupyter notebooks. It can be deactivated by setting display="text" in sklearn.set_config. #22856 by Jérémie du Boisberranger.
API Change The normalize parameter of calibration.calibration_curve is now deprecated and will be removed in version 1.3. It is recommended that a proper probability (i.e. a classifier’s predict_proba positive class) is used for y_prob. #23095 by Jordan Silke.
cluster.spectral_clustering now includes the new 'cluster_qr' method that clusters samples in the embedding space as an alternative to the existing 'kmeans' and 'discretize' methods. See cluster.spectral_clustering for more details. #21148 by Andrew Knyazev.
cluster.AffinityPropagation now returns cluster centers and labels if they exist, even if the model has not fully converged. When returning these potentially-degenerate cluster centers and labels, a new warning message is shown. If no cluster centers were constructed, then the cluster centers remain an empty list with labels set to -1 and the original warning message is shown. #22217 by Meekail Zain.
API Change In cluster.KMeans, the default algorithm is now "lloyd", which is the full classical EM-style algorithm. Both "auto" and "full" are deprecated and will be removed in version 1.3; they are now aliases for "lloyd". The previous default was "auto", which relied on Elkan’s algorithm. Lloyd’s algorithm uses less memory than Elkan’s, it is faster on many datasets, and its results are identical, hence the change. #21735 by Aurélien Geron.
cross_decomposition.CCA now allows reconstruction of an X target when a Y parameter is given. #19680 by Robin Thibaut.
Enhancement Adds get_feature_names_out to all transformers in the cross_decomposition module, including cross_decomposition.PLSCanonical. #22119 by Thomas Fan.
Fix The shape of the coef_ attribute of cross_decomposition.PLSRegression will change in version 1.3, from (n_features, n_targets) to (n_targets, n_features), to be consistent with other linear models and to make it work with interfaces expecting a specific shape for coef_ (e.g. feature_selection.RFE). #22016 by Guillaume Lemaitre.
API Change Add the fitted attribute intercept_ to cross_decomposition estimators such as cross_decomposition.CCA. The method predict is indeed equivalent to Y = X @ coef_ + intercept_. #22015 by Guillaume Lemaitre.
datasets.load_diabetes now accepts the parameter scaled, to allow loading unscaled data. The scaled version of this dataset is now computed from the unscaled data, and can produce slightly different results than in previous versions (within a 1e-4 absolute tolerance). #16605 by Mandy Gu.
datasets.fetch_openml now has two optional arguments, n_retries and delay. By default, datasets.fetch_openml will retry 3 times in case of a network failure, with a delay between each try. #21901 by Rileran.
datasets.make_sparse_coded_signal now accepts a parameter data_transposed to explicitly specify the shape of matrix X. The default behavior, True, is to return a transposed matrix X corresponding to an (n_features, n_samples) shape. The default value will change to False in version 1.3. #21425 by Gabriel Stefanini Vicente.
Major Feature Added a new estimator
decomposition.MiniBatchNMF. It is a faster but less accurate version of non-negative matrix factorization, better suited for large datasets. #16948 by Chiara Marmo, Patricio Cerda and Jérémie du Boisberranger.
decomposition.sparse_encode and decomposition.SparseCoder now preserve the dtype for numpy.float32. #22002 by Takeshi Oura.
decomposition.MiniBatchDictionaryLearning and decomposition.dict_learning_online have been refactored and now have a stopping criterion based on a small change of the dictionary or objective function, controlled by the new tol and max_no_improvement parameters. In addition, some of their parameters and attributes are deprecated: the n_iter parameter of both is deprecated (use max_iter instead); several parameters of decomposition.dict_learning_online serve internal purposes and are deprecated; several attributes of decomposition.MiniBatchDictionaryLearning serve internal purposes and are deprecated; and the default value of the batch_size parameter of both will change from 3 to 256 in version 1.3.
Enhancement Adds get_feature_names_out to all transformers in the decomposition module, including decomposition.TruncatedSVD. #21334 by Thomas Fan.
decomposition.TruncatedSVD exposes the parameters n_oversamples and power_iteration_normalizer to tune utils.randomized_svd and get accurate results when the number of features is large, the rank of the matrix is high, or other features of the matrix make low rank approximation difficult. #21705 by Jay S. Stanley III.
decomposition.PCA exposes the parameters n_oversamples and power_iteration_normalizer to tune utils.randomized_svd and get more accurate results when low rank approximation is difficult. #21705 by Jay S. Stanley III.
Enhancement decomposition.FastICA now accepts np.float32 data without silent upcasting. The dtype is preserved by fit_transform and the main fitted attributes use a dtype of the same precision as the training data. #22806 by Jihane Bennis and Olivier Grisel.
Fix decomposition.IncrementalPCA more safely calculates precision using the inverse of the covariance matrix if self.noise_variance_ is zero. #22300 by Meekail Zain and #15948 by @sysuresh.
decomposition.FastICA now supports unit variance for whitening. The default value of its whiten argument will change from True (which behaves like 'arbitrary-variance') to 'unit-variance' in version 1.3. #19490 by Facundo Ferrin and Julien Jerphanion.
Major Feature Added the additional option loss="quantile" to ensemble.HistGradientBoostingRegressor for modelling quantiles. The quantile level can be specified with the new parameter quantile. #21800 and #20567 by Christian Lorentzen.
Efficiency Input validation now uses force_all_finite=False for non-initial warm-start runs, as the data has already been checked before. #22159 by Geoffrey Paris.
ensemble.HistGradientBoostingClassifier is faster, for binary and in particular for multiclass problems, thanks to the new private loss function module. #20811, #20567 and #21814 by Christian Lorentzen.
ensemble.RandomTreesEmbedding now has an informative get_feature_names_out function that includes both tree index and leaf index in the output feature names. #21762 by Zhehao Liu and Thomas Fan.
Efficiency Fitting an ensemble.RandomTreesEmbedding is now faster in a multiprocessing setting, especially for subsequent fits with warm_start enabled. #22106 by Pieter Gijsbers.
Fix Change the parameter validation_fraction in ensemble.GradientBoostingRegressor so that an error is raised if anything other than a float is passed in as an argument. #21632 by Genesis Valencia.
Fix Removed a potential source of CPU oversubscription in ensemble.HistGradientBoostingRegressor when CPU resource usage is limited, for instance using cgroups quota in a Docker container. #22566 by Jérémie du Boisberranger.
ensemble.HistGradientBoostingRegressor no longer warns when fitting on a pandas DataFrame with a non-default scoring parameter and early stopping enabled. #22908 by Thomas Fan.
API Change Changed the default of max_features to 1.0 for ensemble.RandomForestRegressor and to "sqrt" for ensemble.RandomForestClassifier. Note that these give the same fit results as before, but are much easier to understand. The old default value "auto" has been deprecated and will be removed in version 1.3. The same changes are also applied for ensemble.ExtraTreesRegressor and ensemble.ExtraTreesClassifier. #20803 by Brian Sun.
Feature Added auto mode to feature_selection.SequentialFeatureSelector. If the argument n_features_to_select is 'auto', features are selected until the score improvement does not exceed the argument tol. The default value of n_features_to_select changed from None to 'warn' in 1.1 and will become 'auto' in 1.3; None and 'warn' will be removed in 1.3. #20145 by murata-yu.
Feature Added the ability to pass callables to the max_features parameter of feature_selection.SelectFromModel. Also introduced a new attribute max_features_ which is inferred from max_features and the data during fit. If max_features is an integer, then max_features_ = max_features. If max_features is a callable, then max_features_ = max_features(X). #22356 by Meekail Zain.
Enhancement Add a parameter force_finite to feature_selection.r_regression. This parameter allows forcing the output to be finite in the case where a feature or the target is constant, or where the feature and target are perfectly correlated (only for the F-statistic). #17819 by Juan Carlos Alfaro Jiménez.
gaussian_process.GaussianProcessRegressor now returns arrays of the correct shape in single-target and multi-target cases, and for both normalize_y=False and normalize_y=True. #22199 by Guillaume Lemaitre, Aidar Shakerimoff and Tenavi Nakamura-Zimmerer.
Enhancement The kind parameter now accepts a list of strings to specify which type of plot to draw for each feature interaction. #19438 by Guillaume Lemaitre.
inspection.plot_partial_dependence now supports plotting centered Individual Conditional Expectation (cICE) and centered PDP curves, controlled by setting the parameter centered. #18310 by Johannes Elfner and Guillaume Lemaitre.
linear_model.QuantileRegressor now supports sparse inputs for the highs-based solvers. #21086 by Venkatachalam Natchiappan. In addition, those solvers now use the CSC matrix right from the beginning, which speeds up fitting. #22206 by Christian Lorentzen.
linear_model.LogisticRegression is faster for solver="newton-cg", for binary and in particular for multiclass problems, thanks to the new private loss function module. In the multiclass case, memory consumption has also been reduced for these solvers, as the target is now label encoded (mapped to integers) instead of label binarized (one-hot encoded). The more classes, the larger the benefit. #21808, #20567 and #21814 by Christian Lorentzen.
Enhancement Rename parameter base_estimator to estimator in linear_model.RANSACRegressor to improve readability and consistency. base_estimator is deprecated and will be removed in 1.3. #22062 by Adrian Trujillo.
linear_model.ElasticNet and other linear model classes using coordinate descent show error messages when non-finite parameter weights are produced. #22148 by Christian Ritter and Norbert Preining.
linear_model.LassoLarsIC now correctly computes AIC and BIC. An error is now raised when n_features > n_samples and when the noise variance is not provided. #21481 by Guillaume Lemaitre and Andrés Babino.
linear_model.LogisticRegressionCV now sets the n_iter_ attribute with a shape that respects the docstring and that is consistent with the shape obtained when using the other solvers in the one-vs-rest setting. Previously, it would record only the maximum of the number of iterations for each binary sub-problem, while now all of them are recorded. #21998 by Olivier Grisel.
Fix The property family of linear_model.TweedieRegressor is not validated in __init__ anymore. Instead, this (private) property is deprecated in linear_model.TweedieRegressor, and will be removed in 1.3. #22548 by Christian Lorentzen.
Fix Fitted coefficients with solver="lbfgs" are now correctly computed in the presence of sample weights when the input is sparse. #22899 by Jérémie du Boisberranger.
linear_model.LassoLarsIC now exposes noise_variance as a parameter in order to provide an estimate of the noise variance. This is particularly relevant when n_features > n_samples and the estimator of the noise variance cannot be computed. #21481 by Guillaume Lemaitre.
manifold.spectral_embedding now uses Gaussian instead of the previous uniform-on-[0, 1] random initial approximations to eigenvectors in eigen_solvers lobpcg and amg to improve their numerical stability. #21565 by Andrew Knyazev.
metrics.r2_score and metrics.explained_variance_score have a new force_finite parameter. Setting this parameter to False will return the actual non-finite score in case of perfect predictions or constant y_true, instead of the finite approximation (1.0 and 0.0 respectively) currently returned by default. #17266 by Sylvain Marié.
metrics.d2_pinball_score and metrics.d2_absolute_error_score calculate the \(D^2\) regression score for the pinball loss and the absolute error respectively. metrics.d2_absolute_error_score is a special case of metrics.d2_pinball_score with a fixed quantile parameter alpha=0.5 for ease of use and discovery. The \(D^2\) scores are generalizations of the r2_score and can be interpreted as the fraction of deviance explained. #22118 by Ohad Michel.
Enhancement A new im_kw parameter is now passed to the matplotlib.pyplot.imshow call when plotting the confusion matrix. #20753 by Thomas Fan.
API Change Some parameters of metrics.mean_absolute_percentage_error are now keyword-only, in accordance with SLEP009. A deprecation cycle was introduced. #21576 by Paul-Emile Dugnat.
API Change The "wminkowski" metric of metrics.DistanceMetric is deprecated and will be removed in version 1.3. Instead, the existing "minkowski" metric now takes in an optional w parameter for weights. This deprecation aims at remaining consistent with the SciPy 1.8 convention. #21873 by Yar Khine Phyo.
metrics.DistanceMetric has been moved from sklearn.neighbors to sklearn.metrics. Using neighbors.DistanceMetric for imports is still valid for backward compatibility, but this alias will be removed in 1.3. #21177 by Julien Jerphanion.
Enhancement An error is now raised during cross-validation when the fits for all the splits failed. Similarly, an error is raised during grid-search when the fits for all the models and all the splits failed. #21026 by Loïc Estève.
neighbors.KNeighborsRegressor.predict now works properly when given an array-like input if KNeighborsRegressor is first constructed with a callable passed to the weights parameter. #22687 by Meekail Zain.
preprocessing.OneHotEncoder now supports grouping infrequent categories into a single feature. Grouping infrequent categories is enabled by specifying how to select infrequent categories with min_frequency or max_categories. #16018 by Thomas Fan.
Enhancement Adds a subsample parameter to preprocessing.KBinsDiscretizer. This allows specifying a maximum number of samples to be used while fitting the model. The option is only available when strategy is set to quantile. #21445 by Felipe Bidu and Amanda Dsouza.
Enhancement Added the get_feature_names_out method and a new parameter feature_names_out to preprocessing.FunctionTransformer. You can set feature_names_out to 'one-to-one' to use the input feature names as the output feature names, or you can set it to a callable that returns the output feature names. This is especially useful when the transformer changes the number of features. If feature_names_out is None (which is the default), then get_feature_names_out is not defined. #21569 by Aurélien Geron.
encode="ordinal". #22735 by Thomas Fan.
Enhancement Adds an inverse_transform method and a compute_inverse_components parameter to random_projection.SparseRandomProjection. When the parameter is set to True, the pseudo-inverse of the components is computed during fit and stored as inverse_components_. #21701 by Aurélien Geron.
Enhancement Adds get_feature_names_out to all transformers in the random_projection module, including random_projection.SparseRandomProjection. #21330 by Loïc Estève.
svm.NuSVC now raises an error when the dual-gap estimation produces non-finite parameter weights. #22149 by Christian Ritter and Norbert Preining.
API Change Changed the default value of max_features to 1.0 for tree.ExtraTreeRegressor and to "sqrt" for tree.ExtraTreeClassifier, which will not change the fit result. The original default value "auto" has been deprecated and will be removed in version 1.3. Setting max_features to "auto" is also deprecated for tree.DecisionTreeClassifier and tree.DecisionTreeRegressor. #22476 by Zhehao Liu.
utils.multiclass.type_of_target now accepts an input_name parameter to make the error message more informative when invalid input data is passed (e.g. with NaN or infinite values). #21219 by Olivier Grisel.
Fix utils.check_array with dtype=None returns numeric arrays when passed a pandas DataFrame with mixed dtypes. dtype="numeric" will also better infer the dtype when the DataFrame has mixed dtypes. #22237 by Thomas Fan.
Fix Changes the error message of utils.check_X_y when y is None so that it is compatible with the check_requires_y_none estimator check. #22578 by Claudio Salvatore Arcidiacono.
utils.class_weight.compute_class_weight now only requires that all classes in y have a weight in class_weight. An error is still raised when a class is present in y but not in class_weight. #22595 by Thomas Fan.
Code and Documentation Contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.0, including:
2357juan, Abhishek Gupta, adamgonzo, Adam Li, adijohar, Aditya Kumawat, Aditya Raghuwanshi, Aditya Singh, Adrian Trujillo Duron, Adrin Jalali, ahmadjubair33, AJ Druck, aj-white, Alan Peixinho, Alberto Mario Ceballos-Arroyo, Alek Lefebvre, Alex, Alexandr, Alexandre Gramfort, alexanmv, almeidayoel, Amanda Dsouza, Aman Sharma, Amar pratap singh, Amit, amrcode, András Simon, Andreas Grivas, Andreas Mueller, Andrew Knyazev, Andriy, Angus L’Herrou, Ankit Sharma, Anne Ducout, Arisa, Arth, arthurmello, Arturo Amor, ArturoAmor, Atharva Patil, aufarkari, Aurélien Geron, avm19, Ayan Bag, baam, Bardiya Ak, Behrouz B, Ben3940, Benjamin Bossan, Bharat Raghunathan, Bijil Subhash, bmreiniger, Brandon Truth, Brenden Kadota, Brian Sun, cdrig, Chalmer Lowe, Chiara Marmo, Chitteti Srinath Reddy, Chloe-Agathe Azencott, Christian Lorentzen, Christian Ritter, christopherlim98, Christoph T. Weidemann, Christos Aridas, Claudio Salvatore Arcidiacono, combscCode, Daniela Fernandes, darioka, Darren Nguyen, Dave Eargle, David Gilbertson, David Poznik, Dea María Léon, Dennis Osei, DessyVV, Dev514, Dimitri Papadopoulos Orfanos, Diwakar Gupta, Dr. Felix M. 
Riese, drskd, Emiko Sano, Emmanouil Gionanidis, EricEllwanger, Erich Schubert, Eric Larson, Eric Ndirangu, ErmolaevPA, Estefania Barreto-Ojeda, eyast, Fatima GASMI, Federico Luna, Felix Glushchenkov, fkaren27, Fortune Uwha, FPGAwesome, francoisgoupil, Frans Larsson, ftorres16, Gabor Berei, Gabor Kertesz, Gabriel Stefanini Vicente, Gabriel S Vicente, Gael Varoquaux, GAURAV CHOUDHARY, Gauthier I, genvalen, Geoffrey-Paris, Giancarlo Pablo, glennfrutiz, gpapadok, Guillaume Lemaitre, Guillermo Tomás Fernández Martín, Gustavo Oliveira, Haidar Almubarak, Hannah Bohle, Hansin Ahuja, Haoyin Xu, Haya, Helder Geovane Gomes de Lima, henrymooresc, Hideaki Imamura, Himanshu Kumar, Hind-M, hmasdev, hvassard, i-aki-y, iasoon, Inclusive Coding Bot, Ingela, iofall, Ishan Kumar, Jack Liu, Jake Cowton, jalexand3r, J Alexander, Jauhar, Jaya Surya Kommireddy, Jay Stanley, Jeff Hale, je-kr, JElfner, Jenny Vo, Jérémie du Boisberranger, Jihane, Jirka Borovec, Joel Nothman, Jon Haitz Legarreta Gorroño, Jordan Silke, Jorge Ciprián, Jorge Loayza, Joseph Chazalon, Joseph Schwartz-Messing, Jovan Stojanovic, JSchuerz, Juan Carlos Alfaro Jiménez, Juan Martin Loyola, Julien Jerphanion, katotten, Kaushik Roy Chowdhury, Ken4git, Kenneth Prabakaran, kernc, Kevin Doucet, KimAYoung, Koushik Joshi, Kranthi Sedamaki, krishna kumar, krumetoft, lesnee, Lisa Casino, Logan Thomas, Loic Esteve, Louis Wagner, LucieClair, Lucy Liu, Luiz Eduardo Amaral, Magali, MaggieChege, Mai, mandjevant, Mandy Gu, Manimaran, MarcoM, Marco Wurps, Maren Westermann, Maria Boerner, MarieS-WiMLDS, Martel Corentin, martin-kokos, mathurinm, Matías, matjansen, Matteo Francia, Maxwell, Meekail Zain, Megabyte, Mehrdad Moradizadeh, melemo2, Michael I Chen, michalkrawczyk, Micky774, milana2, millawell, Ming-Yang Ho, Mitzi, miwojc, Mizuki, mlant, Mohamed Haseeb, Mohit Sharma, Moonkyung94, mpoemsl, MrinalTyagi, Mr. 
Leu, msabatier, murata-yu, N, Nadirhan Şahin, Naipawat Poolsawat, NartayXD, nastegiano, nathansquan, nat-salt, Nicki Skafte Detlefsen, Nicolas Hug, Niket Jain, Nikhil Suresh, Nikita Titov, Nikolay Kondratyev, Ohad Michel, Oleksandr Husak, Olivier Grisel, partev, Patrick Ferreira, Paul, pelennor, PierreAttard, Piet Brömmel, Pieter Gijsbers, Pinky, poloso, Pramod Anantharam, puhuk, Purna Chandra Mansingh, QuadV, Rahil Parikh, Randall Boyes, randomgeek78, Raz Hoshia, Reshama Shaikh, Ricardo Ferreira, Richard Taylor, Rileran, Rishabh, Robin Thibaut, Rocco Meli, Roman Feldbauer, Roman Yurchak, Ross Barnowski, rsnegrin, Sachin Yadav, sakinaOuisrani, Sam Adam Day, Sanjay Marreddi, Sebastian Pujalte, SEELE, SELEE, Seyedsaman (Sam) Emami, ShanDeng123, Shao Yang Hong, sharmadharmpal, shaymerNaturalint, Shuangchi He, Shubhraneel Pal, siavrez, slishak, Smile, spikebh, sply88, Srinath Kailasa, Stéphane Collot, Sultan Orazbayev, Sumit Saha, Sven Eschlbeck, Sven Stehle, Swapnil Jha, Sylvain Marié, Takeshi Oura, Tamires Santana, Tenavi, teunpe, Theis Ferré Hjortkjær, Thiruvenkadam, Thomas J. Fan, t-jakubek, toastedyeast, Tom Dupré la Tour, Tom McTiernan, TONY GEORGE, Tyler Martin, Tyler Reddy, Udit Gupta, Ugo Marchand, Varun Agrawal, Venkatachalam N, Vera Komeyer, victoirelouis, Vikas Vishwakarma, Vikrant khedkar, Vladimir Chernyy, Vladimir Kim, WeijiaDu, Xiao Yuan, Yar Khine Phyo, Ying Xiong, yiyangq, Yosshi999, Yuki Koyama, Zach Deane-Mayer, Zeel B Patel, zempleni, zhenfisher, 赵丰 (Zhao Feng)