Fix Avoid explicitly forming the inverse covariance matrix in gaussian_process.GaussianProcessRegressor when it is set to output the standard deviation. With certain covariance matrices this inverse is numerically unstable to compute explicitly; using a Cholesky solver instead mitigates the issue. #19939 by Ian Halvic.
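A minimal NumPy/SciPy sketch of the numerical idea behind this fix follows. It is not scikit-learn's internal code. The predictive variance diag(K** - K*^T K^-1 K*) is obtained by solving with a Cholesky factor of K, so the inverse is never formed explicitly. The RBF kernel, the 1e-6 jitter and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the rows of A and B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def predictive_variance(K, K_star, k_star_star):
    """Return diag(K** - K*^T K^-1 K*) without ever forming K^-1."""
    # Factor K = L L^T once (cho_factor returns the factor plus a 'lower' flag).
    factor = cho_factor(K, lower=True)
    # Solve K V = K* instead of computing inv(K) @ K*.
    V = cho_solve(factor, K_star)
    # Column-wise quadratic form k*^T K^-1 k* for each test point.
    return k_star_star - np.einsum("ij,ij->j", K_star, V)


rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 5.0, size=(20, 1))
X_test = np.linspace(0.0, 5.0, 7)[:, None]

# A small diagonal jitter (assumed value) keeps K positive definite.
K = rbf_kernel(X_train, X_train) + 1e-6 * np.eye(len(X_train))
K_star = rbf_kernel(X_train, X_test)   # shape (n_train, n_test)
k_star_star = np.ones(len(X_test))     # the RBF kernel has k(x, x) = 1

var = predictive_variance(K, K_star, k_star_star)
print(var)
```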
Fix Avoid division by zero when scaling a constant target in gaussian_process.GaussianProcessRegressor. The division by zero was caused by a standard deviation equal to 0. Such a case is now detected and the standard deviation is set to 1, which avoids the division by zero and therefore NaN values in the normalized target. #19703 by @sobkevich, Boris Villazón-Terrazas and Alexandr Fonari.
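The guard described above can be sketched in a few lines of NumPy. normalize_target is a hypothetical helper written for illustration, not a scikit-learn function. It only mirrors the idea of replacing a zero standard deviation with 1 before dividing.

```python
import numpy as np


def normalize_target(y):
    """Hypothetical helper: standardize y, treating a zero std as 1."""
    y = np.asarray(y, dtype=float)
    mean = y.mean(axis=0)
    std = y.std(axis=0)
    # A constant target has std == 0; using 1 instead makes (y - mean) / std
    # return zeros rather than NaN.
    std = np.where(std == 0.0, 1.0, std)
    return (y - mean) / std, mean, std


y_norm, mean, std = normalize_target([3.0, 3.0, 3.0])
print(y_norm, mean, std)
```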
The fit method of the successive halving parameter search (model_selection.HalvingRandomSearchCV) now correctly handles the groups parameter. #19847 by Xiaoyu Chai.
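Here is a small sketch of fitting HalvingRandomSearchCV with a group-aware splitter, assuming scikit-learn 0.24.2 or later. The synthetic groups array, the min_resources value and the search space are arbitrary choices for illustration. Halving searches are experimental and must be enabled with the sklearn.experimental import shown.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
# Enable the experimental halving searches before importing them.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, HalvingRandomSearchCV

X, y = make_classification(n_samples=300, random_state=0)
# Hypothetical grouping, e.g. 10 patients with 30 samples each.
groups = np.arange(len(y)) % 10

search = HalvingRandomSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": loguniform(1e-3, 1e2)},
    cv=GroupKFold(n_splits=3),  # group-aware splitter
    min_resources=40,           # first iteration uses 40 samples
    random_state=0,
)
# groups is forwarded to the splitter at every halving iteration.
search.fit(X, y, groups=groups)
print(search.best_params_)
```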
The 0.24.0 scikit-learn wheels were not working with macOS < 10.15 due to libomp. The version of libomp used to build the wheels was too recent for older macOS versions. This issue has been fixed for the 0.24.1 scikit-learn wheels.
Scikit-learn wheels published on PyPI.org now officially support macOS 10.13
For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 0.24.
Legend for changelogs
Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Put the changes in their relevant module.
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
decomposition.KernelPCA behaviour is now more consistent between 32-bit and 64-bit data when the kernel has small positive eigenvalues.
decomposition.TruncatedSVD becomes deterministic by exposing a
Fix Change in the random sampling procedures for the center initialization of
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
calibration.CalibratedClassifierCV, which enables implementation of calibration via an ensemble of calibrators (current method) or just one calibrator using all the data (similar to the built-in feature of sklearn.svm estimators with the probability=True parameter). #17856 by Lucy Liu and Andrea Esuli.
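In CalibratedClassifierCV this choice is controlled by the ensemble parameter. Below is a minimal sketch with ensemble=False, where cross-validated predictions are used to fit a single calibrator and the base classifier is refit on all the data. The LinearSVC base estimator and the toy data are assumptions.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# ensemble=False: one calibrator fitted on out-of-fold predictions,
# paired with a classifier refit on the full data set.
calib = CalibratedClassifierCV(LinearSVC(dual=False), cv=3, ensemble=False)
calib.fit(X, y)

proba = calib.predict_proba(X[:5])
print(np.round(proba, 3))
```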
cluster.AgglomerativeClustering has a new parameter compute_distances. When set to True, distances between clusters are computed and stored in the distances_ attribute even when the parameter distance_threshold is not used. This new parameter is useful to produce dendrogram visualizations, but introduces a computational and memory overhead. #17984 by Michael Riedmann, Emilie Delattre, and Francesco Casalegno.
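Here is a short sketch of compute_distances=True on four 1-D points, using toy data chosen for illustration. distances_ holds one merge distance per row of children_, which is the pairing that dendrogram plotting code typically consumes.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.0], [0.1], [5.0], [5.2]])

# distances_ is filled even though distance_threshold is not used.
model = AgglomerativeClustering(n_clusters=2, compute_distances=True).fit(X)
print(model.children_)
print(model.distances_)
```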
cluster.spectral_clustering have a new keyword argument verbose. When set to True, additional messages will be displayed which can aid with debugging. #18052 by Sean O. Stalley.
compose.ColumnTransformer enforces a strict count and order of column names between fit and transform by raising an error instead of a warning, following the deprecation cycle. #18256 by Madhura Jayratne.
Fix Fixed a bug in cross_decomposition.PLSCanonical, which would lead to incorrect predictions for est.transform(Y) when the training data is single-target. #17095 by Nicolas Hug.
API Change The bounds of the n_components parameter are now restricted:
API Change For
y_scores_ attributes were deprecated and will be removed in 1.1 (renaming of 0.26). They can be retrieved by calling transform on the training data. The norm_y_weights attribute will also be removed. #17095 by Nicolas Hug.
API Change For
y_std_ attributes were deprecated and will be removed in 1.1 (renaming of 0.26). #18768 by Maren Westermann.
datasets.fetch_covtype now supports the optional argument as_frame; when it is set to True, the returned Bunch object’s frame members are pandas DataFrames, and the target member is a pandas Series. #17491 by Alex Liang.
datasets.fetch_kddcup99 now supports the optional argument as_frame; when it is set to True, the returned Bunch object’s frame members are pandas DataFrames, and the target member is a pandas Series. #18280 by Alex Liang and Guillaume Lemaitre.
API Change For
initvalue, when ‘init=None’ and n_components <= min(n_samples, n_features) will be changed from
'nndsvda' in 1.1 (renaming of 0.26). #18525 by Chiara Marmo.
decomposition.NMF now supports the optional parameter regularization, which can take the values None, ‘components’, ‘transformation’ or ‘both’, in accordance with decomposition.non_negative_factorization. #17414 by Bharat Raghunathan.
decomposition.KernelPCA behaviour is now more consistent between 32-bit and 64-bit data input when the kernel has small positive eigenvalues. Small positive eigenvalues were not correctly discarded for 32-bit data. #18149 by Sylvain Marié.
decomposition.SparseCoder such that it follows the scikit-learn API and supports cloning. The attribute components_ is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26). This attribute was redundant with the dictionary attribute and constructor parameter. #17679 by Xavier Dupré.
decomposition.TruncatedSVD.fit_transform consistently returns the same as decomposition.TruncatedSVD.transform. #18528 by Albert Villanova del Moral and Ruifeng Zheng.
ensemble.HistGradientBoostingClassifier now have native support for categorical features with the categorical_features parameter. #18394 by Nicolas Hug and Thomas Fan.
Efficiency Break cyclic references in the tree nodes used internally in ensemble.HistGradientBoostingClassifier to allow for the timely garbage collection of large intermediate data structures and to improve memory usage in fit. #18334 by Olivier Grisel, Nicolas Hug, Thomas Fan and Andreas Müller.
Efficiency Histogram initialization is now done in parallel in ensemble.HistGradientBoostingClassifier, which results in a speed improvement for problems that build many nodes on multicore machines. #18341 by Olivier Grisel, Nicolas Hug, Thomas Fan, and Egor Smirnov.
Feature A new parameter importance_getter was added to feature_selection.SelectFromModel, allowing the user to specify an attribute name/path or a callable for extracting feature importance from the estimator. #15361 by Venkatachalam N.
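This sketch uses importance_getter with a dotted attribute path, which is needed because a Pipeline exposes no coef_ of its own. The step name linearregression is the one make_pipeline generates. The data and the threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.01 * rng.normal(size=100)

pipe = make_pipeline(StandardScaler(), LinearRegression())
# Point SelectFromModel at the final step's coefficients via an attribute path.
selector = SelectFromModel(
    pipe,
    importance_getter="named_steps.linearregression.coef_",
    threshold=0.5,
)
selector.fit(X, y)
print(selector.get_support())
```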
Efficiency Reduce memory footprint in
neighbors.KDTree for counting nearest neighbors. #17878 by Noel Rogers.
feature_selection.RFE supports giving n_features_to_select as a float representing the fraction of features to select. #17090 by Lisa Schwetlick and Marija Vlajic Wheeler.
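Here is a minimal sketch of the float form. With 10 features and n_features_to_select=0.5, RFE keeps int(0.5 * 10) = 5 features. The regression data are synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=0)

# A float in (0, 1) is interpreted as a fraction of the input features.
rfe = RFE(LinearRegression(), n_features_to_select=0.5).fit(X, y)
print(rfe.n_features_)  # 5
```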
Fix Replace the default values in
np.inf, respectively, instead of None. However, the behaviour of the class does not change, since None was already defaulting to these values. #16493 by Darshan N.
inspection.plot_partial_dependence now supports calculating and plotting Individual Conditional Expectation (ICE) curves, controlled by the kind parameter. #16619 by Madhura Jayratne.
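The same kind option is also accepted by inspection.partial_dependence, which returns the raw ICE curves without plotting. In later releases, plot_partial_dependence was replaced by PartialDependenceDisplay.from_estimator. The sketch below uses synthetic data and assumes a gradient boosting model. The 'individual' entry has shape (n_outputs, n_samples, n_grid_points).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

X, y = make_regression(n_samples=50, n_features=3, random_state=0)
est = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="individual" returns one curve per sample instead of their average.
result = partial_dependence(est, X, features=[0], kind="individual", grid_resolution=10)
ice = result["individual"]
print(ice.shape)
```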
Feature Expose fitted attributes
y_thresholds_ that hold the de-duplicated interpolation thresholds of an isotonic.IsotonicRegression instance for model inspection purposes. #16289 by Masashi Kishimoto and Olivier Grisel.
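This small sketch inspects the fitted thresholds together with the companion X_thresholds_ attribute. Interior points whose fitted value equals both neighbours are dropped, so the thresholds can be shorter than the training data. The toy data are illustrative.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 4.0, 2.0, 3.0, 5.0])

# Pool-adjacent-violators pools 4, 2, 3 into a flat segment at 3.
iso = IsotonicRegression().fit(X, y)
print(iso.X_thresholds_, iso.y_thresholds_)
print(iso.predict(X))
```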
linear_model.RidgeCV now supports finding an optimal regularization value alpha for each target separately by setting alpha_per_target=True. This is only supported when using the default efficient leave-one-out cross-validation scheme cv=None. #6624 by Marijn van Vliet.
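This sketch shows per-target alpha selection with two synthetic targets. With alpha_per_target=True, the fitted alpha_ becomes an array with one entry per target. The alpha grid is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.RandomState(0)
X = rng.normal(size=(60, 5))
# One target is linear in X, the other is pure noise.
Y = np.column_stack([X @ rng.normal(size=5), rng.normal(size=60)])

# The default cv=None (efficient leave-one-out) is required here.
model = RidgeCV(alphas=[0.1, 1.0, 10.0, 100.0], alpha_per_target=True).fit(X, Y)
print(model.alpha_)
```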
manifold.TSNE, which provides backward compatibility during deprecation of legacy squaring behavior. Distances will be squared by default in 1.1 (renaming of 0.26), and this parameter will be removed in 1.3. #17662 by Joshua Newton.
Feature New metric metrics.top_k_accuracy_score. It is a generalization of metrics.accuracy_score; the difference is that a prediction is considered correct as long as the true label is associated with one of the k highest predicted scores. metrics.accuracy_score is the special case of k = 1. #16625 by Geoffrey Bolmier.
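Here is a worked example with four samples and three classes; the scores are made up for illustration. With k=2, three of the four true labels fall within the two highest scores. With k=1, the metric reproduces plain accuracy on the argmax predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, top_k_accuracy_score

y_true = np.array([0, 1, 2, 2])
y_score = np.array([
    [0.5, 0.2, 0.2],  # true 0: in top 2
    [0.3, 0.4, 0.2],  # true 1: in top 2
    [0.2, 0.4, 0.3],  # true 2: in top 2
    [0.7, 0.2, 0.1],  # true 2: not in top 2
])

top2 = top_k_accuracy_score(y_true, y_score, k=2)
top1 = top_k_accuracy_score(y_true, y_score, k=1)
print(top2, top1)  # 0.75 0.5
```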
metrics.mean_absolute_percentage_error metric and the associated scorer for regression problems. #10708 fixed by PR #15007 by Ashutosh Hathidara. The scorer and some practical test cases were taken from PR #10711 by Mohamed Ali Jamaoui.
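Here is a worked example, with values chosen for illustration. MAPE averages |y_true - y_pred| / |y_true|, which here gives (0.5/3 + 0.5/0.5 + 0/2 + 1/7) / 4 = 55/168 ≈ 0.327.

```python
from sklearn.metrics import mean_absolute_percentage_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Relative errors: 1/6, 1, 0, 1/7; their mean is 55/168.
mape = mean_absolute_percentage_error(y_true, y_pred)
print(round(mape, 4))  # 0.3274
```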
metrics.plot_precision_recall_curvein order to specify the positive class to be used when computing the precision and recall statistics. #17569 by Guillaume Lemaitre.
Fix Fix scorers that accept a pos_label parameter and compute their metrics from values returned by predict_proba. Previously, they would return erroneous values when pos_label did not correspond to classifier.classes_. This is especially important when training classifiers directly with string-labeled target classes. #18114 by Guillaume Lemaitre.
Fix Fixed a bug in metrics.plot_confusion_matrix where an error occurred when y_true contained labels that were not previously seen by the classifier while the display_labels parameter was set to None. #18405 by Thomas J. Fan and Yakov Pchelintsev.
Major Feature Added (experimental) parameter search estimators model_selection.HalvingGridSearchCV, which implement Successive Halving and can be used as drop-in replacements for model_selection.GridSearchCV. #13900 by Nicolas Hug, Joel Nothman and Andreas Müller.
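This minimal sketch shows successive halving in practice. All 12 candidates start on a small subsample, and roughly the best third survive each round while the number of samples grows by factor=3. The estimator, grid and data are illustrative assumptions, and the experimental import is required.

```python
from sklearn.datasets import make_classification
# Enable the experimental halving searches before importing them.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)
param_grid = {"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 5, 10]}

search = HalvingGridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, factor=3, random_state=0
).fit(X, y)

# Candidates shrink and resources (samples) grow at each iteration.
print(search.n_candidates_, search.n_resources_)
print(search.best_params_)
```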
model_selection.TimeSeriesSplit has two new keyword arguments:
test_size allows the out-of-sample time series length to be fixed for all folds.
gap removes a fixed number of samples between the train and test set on each fold. #13204 by Kyle Kosic.
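This sketch runs on ten time-ordered samples. With test_size=2 and gap=1, each fold tests on two samples and skips the one sample immediately before them. The folds listed in the comments follow from these settings.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)

splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
for train, test in splits:
    print("train:", train, "test:", test)
# train: [0, 1, 2] test: [4, 5]
# train: [0, 1, 2, 3, 4] test: [6, 7]
# train: [0, 1, 2, 3, 4, 5, 6] test: [8, 9]
```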
model_selection.validation_curve now accepts fit_params to pass additional estimator parameters. #18527 by Gaurav Dhingra, Julien Jerphanion and Amanda Dsouza.
model_selection.RandomizedSearchCV allows the estimator to fail scoring and replaces the score with error_score. If error_score="raise", the error will be raised. #18343 by Guillaume Lemaitre and Devi Sandeep.
Fix A fix to raise a warning when one or more CV splits of model_selection.RandomizedSearchCV result in non-finite scores. #18266 by Subrat Sahu, Nirvan and Arthur Book.
scoring being a callable returning a dictionary of multiple metric name/value associations. #15126 by Thomas Fan.
multiclass.OneVsOneClassifier now accepts inputs with missing values. Hence, estimators which can handle missing values (for example, a pipeline with an imputation step) can be used as an estimator for multiclass wrappers. #17987 by Venkatachalam N.
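This sketch wraps a pipeline that imputes missing values before classification. The synthetic three-class data and the 10% missing rate are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = rng.normal(size=(90, 3))
y = np.repeat([0, 1, 2], 30)
X[y == 1] += 3   # shift class 1
X[y == 2] -= 3   # shift class 2
X[rng.rand(*X.shape) < 0.1] = np.nan  # knock out about 10% of the entries

# Each pairwise classifier imputes NaNs itself, so the wrapper accepts them.
clf = OneVsOneClassifier(make_pipeline(SimpleImputer(), LogisticRegression()))
clf.fit(X, y)
pred = clf.predict(X)
print((pred == y).mean())
```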
Fix A fix to allow multiclass.OutputCodeClassifier to accept sparse input data in its predict methods. The check for validity of the input is now delegated to the base estimator. #17233 by Zolisa Bleki.
multioutput.MultiOutputRegressor now accepts inputs with missing values. Hence, estimators which can handle missing values (for example, a pipeline with an imputation step, or HistGradientBoosting estimators) can be used as an estimator for multioutput wrappers. #17987 by Venkatachalam N.
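Here is a similar sketch for MultiOutputRegressor. It uses an imputation pipeline as the per-target regressor; a HistGradientBoostingRegressor would also work. The data are synthetic.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = rng.normal(size=(80, 4))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]])
X[rng.rand(*X.shape) < 0.1] = np.nan  # about 10% missing entries

# One imputing regressor is fitted per target column.
model = MultiOutputRegressor(make_pipeline(SimpleImputer(), Ridge()))
model.fit(X, Y)
pred = model.predict(X)
print(pred[:3])
```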
Enhancement Adds a parameter
naive_bayes.CategoricalNB that allows a minimum number of categories per feature to be specified. This allows categories unseen during training to be accounted for. #16326 by George Armstrong.
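The parameter described here is min_categories. The sketch below assumes toy integer-coded data. The training column only contains categories 0 to 2, but min_categories=4 reserves a slot for category 3 so that it can still be scored at prediction time.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

X_train = np.array([[0], [1], [2], [1]])
y_train = np.array([0, 1, 1, 0])

# Reserve four categories for the single feature, one more than observed.
clf = CategoricalNB(min_categories=4).fit(X_train, y_train)
print(clf.n_categories_)  # [4]

pred = clf.predict(np.array([[3]]))  # category 3 never seen during fit
print(pred)
```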
API Change The attributes
intercept_ are now deprecated in naive_bayes.CategoricalNB, and will be removed in v1.1 (renaming of 0.26). #17427 by Juan Carlos Alfaro Jiménez.
Efficiency Speed up
neighbors.DistanceMetric by avoiding unexpected GIL acquisition in Cython when setting
metrics.pairwise_distances and by validating data out of loops. #17038 by Wenbo Zhao.
neighbors.NeighborsBase benefits from an improved algorithm = 'auto' heuristic. In addition to the previous set of rules, brute is now selected when the number of features exceeds 15, on the assumption that the intrinsic dimensionality of the data is too high for tree-based methods. #17148 by Geoffrey Bolmier.
Fix In methods
sort_results=True now correctly sorts the results even when fitting with the “brute” algorithm. #18612 by Tom Dupre la Tour.
Feature Add a new handle_unknown parameter with a use_encoded_value option, along with a new
preprocessing.OrdinalEncoder to allow unknown categories during transform and set the encoded value of the unknown categories. #17406 by Felix Wick and #18406 by Nicolas Hug.
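This sketch shows the new option. The companion parameter that sets the encoded value is unknown_value. Categories are sorted during fit, so blue, green and red map to 0, 1 and 2, and an unseen category maps to the chosen unknown_value.

```python
from sklearn.preprocessing import OrdinalEncoder

enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit([["red"], ["green"], ["blue"]])

# "green" is known (code 1); "purple" was never seen, so it becomes -1.
encoded = enc.transform([["green"], ["purple"]])
print(encoded)
```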
preprocessing.StandardScaler. Allows setting individual weights for each sample. #18510 and #18447 and #16066 and #18682 by Maria Telenczuk and Albert Villanova and @panpiort8 and Alex Gramfort.
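Per-sample weights are passed to fit through sample_weight. In this worked sketch with two samples, the weighted mean is (3 * 0 + 1 * 10) / 4 = 2.5.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0], [10.0]])
w = np.array([3.0, 1.0])  # the first sample counts three times as much

scaler = StandardScaler().fit(X, sample_weight=w)
print(scaler.mean_)  # [2.5]
```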
Fix Raise error on
drop=None for samples encoded as all zeros. #14982 by Kevin Winata.
Major Feature Added semi_supervised.SelfTrainingClassifier, a meta-classifier that allows any supervised classifier to function as a semi-supervised classifier that can learn from unlabeled data. #11682 by Oliver Rausch and Patrice Becker.
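This sketch applies self-training to synthetic data where about 70% of the labels are hidden. Unlabeled samples are marked with -1, and the base classifier must implement predict_proba. LogisticRegression is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=200, random_state=0)
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.7] = -1  # -1 marks unlabeled samples

clf = SelfTrainingClassifier(LogisticRegression()).fit(X, y_partial)
pred = clf.predict(X)
print((clf.transduction_ != -1).sum(), "samples labeled after self-training")
```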
check_estimator, which checks that estimator methods are invariant if applied to the same dataset with different sample order. #17598 by Jason Ngo.
Fix Check that we raise a proper error when axis=1 and the dimensions do not match in utils.sparsefuncs.incr_mean_variance_axis. By Alex Gramfort.
Code and Documentation Contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.23, including:
Abo7atm, Adam Spannbauer, Adrin Jalali, adrinjalali, Agamemnon Krasoulis, Akshay Deodhar, Albert Villanova del Moral, Alessandro Gentile, Alex Henrie, Alex Itkes, Alex Liang, Alexander Lenail, alexandracraciun, Alexandre Gramfort, alexshacked, Allan D Butler, Amanda Dsouza, amy12xx, Anand Tiwari, Anderson Nelson, Andreas Mueller, Ankit Choraria, Archana Subramaniyan, Arthur Imbert, Ashutosh Hathidara, Ashutosh Kushwaha, Atsushi Nukariya, Aura Munoz, AutoViz and Auto_ViML, Avi Gupta, Avinash Anakal, Ayako YAGI, barankarakus, barberogaston, beatrizsmg, Ben Mainye, Benjamin Bossan, Benjamin Pedigo, Bharat Raghunathan, Bhavika Devnani, Biprateep Dey, bmaisonn, Bo Chang, Boris Villazón-Terrazas, brigi, Brigitta Sipőcz, Bruno Charron, Byron Smith, Cary Goltermann, Cat Chenal, CeeThinwa, chaitanyamogal, Charles Patel, Chiara Marmo, Christian Kastner, Christian Lorentzen, Christoph Deil, Christos Aridas, Clara Matos, clmbst, Coelhudo, crispinlogan, Cristina Mulas, Daniel López, Daniel Mohns, darioka, Darshan N, david-cortes, Declan O’Neill, Deeksha Madan, Elizabeth DuPre, Eric Fiegel, Eric Larson, Erich Schubert, Erin Khoo, Erin R Hoffman, eschibli, Felix Wick, fhaselbeck, Forrest Koch, Francesco Casalegno, Frans Larsson, Gael Varoquaux, Gaurav Desai, Gaurav Sheni, genvalen, Geoffrey Bolmier, George Armstrong, George Kiragu, Gesa Stupperich, Ghislain Antony Vaillant, Gim Seng, Gordon Walsh, Gregory R. Lee, Guillaume Chevalier, Guillaume Lemaitre, Haesun Park, Hannah Bohle, Hao Chun Chang, Harry Scholes, Harsh Soni, Henry, Hirofumi Suzuki, Hitesh Somani, Hoda1394, Hugo Le Moine, hugorichard, indecisiveuser, Isuru Fernando, Ivan Wiryadi, j0rd1smit, Jaehyun Ahn, Jake Tae, James Hoctor, Jan Vesely, Jeevan Anand Anne, JeroenPeterBos, JHayes, Jiaxiang, Jie Zheng, Jigna Panchal, jim0421, Jin Li, Joaquin Vanschoren, Joel Nothman, Jona Sassenhagen, Jonathan, Jorge Gorbe Moya, Joseph Lucas, Joshua Newton, Juan Carlos Alfaro Jiménez, Julien Jerphanion, Justin Huber, Jérémie du Boisberranger, Kartik Chugh, Katarina Slama, kaylani2, Kendrick Cetina, Kenny Huynh, Kevin Markham, Kevin Winata, Kiril Isakov, kishimoto, Koki Nishihara, Krum Arnaudov, Kyle Kosic, Lauren Oldja, Laurenz Reitsam, Lisa Schwetlick, Louis Douge, Louis Guitton, Lucy Liu, Madhura Jayaratne, maikia, Manimaran, Manuel López-Ibáñez, Maren Westermann, Maria Telenczuk, Mariam-ke, Marijn van Vliet, Markus Löning, Martin Scheubrein, Martina G. Vilas, Martina Megasari, Mateusz Górski, mathschy, mathurinm, Matthias Bussonnier, Max Del Giudice, Michael, Milan Straka, Muoki Caleb, N. Haiat, Nadia Tahiri, Ph. D, Naoki Hamada, Neil Botelho, Nicolas Hug, Nils Werner, noelano, Norbert Preining, oj_lappi, Oleh Kozynets, Olivier Grisel, Pankaj Jindal, Pardeep Singh, Parthiv Chigurupati, Patrice Becker, Pete Green, pgithubs, Poorna Kumar, Prabakaran Kumaresshan, Probinette4, pspachtholz, pwalchessen, Qi Zhang, rachel fischoff, Rachit Toshniwal, Rafey Iqbal Rahman, Rahul Jakhar, Ram Rachum, RamyaNP, rauwuckl, Ravi Kiran Boggavarapu, Ray Bell, Reshama Shaikh, Richard Decal, Rishi Advani, Rithvik Rao, Rob Romijnders, roei, Romain Tavenard, Roman Yurchak, Ruby Werman, Ryotaro Tsukada, sadak, Saket Khandelwal, Sam, Sam Ezebunandu, Sam Kimbinyi, Sarah Brown, Saurabh Jain, Sean O. Stalley, Sergio, Shail Shah, Shane Keller, Shao Yang Hong, Shashank Singh, Shooter23, Shubhanshu Mishra, simonamaggio, Soledad Galli, Srimukh Sripada, Stephan Steinfurt, subrat93, Sunitha Selvan, Swier, Sylvain Marié, SylvainLan, t-kusanagi2, Teon L Brooks, Terence Honles, Thijs van den Berg, Thomas J Fan, Thomas J. Fan, Thomas S Benjamin, Thomas9292, Thorben Jensen, tijanajovanovic, Timo Kaufmann, tnwei, Tom Dupré la Tour, Trevor Waite, ufmayer, Umberto Lupo, Venkatachalam N, Vikas Pandey, Vinicius Rios Fuck, Violeta, watchtheblur, Wenbo Zhao, willpeppo, xavier dupré, Xethan, Xue Qianming, xun-tang, yagi-3, Yakov Pchelintsev, Yashika Sharma, Yi-Yan Ge, Yue Wu, Yutaro Ikeda, Zaccharie Ramzi, zoj613, Zhao Feng.