Version 0.24.2
April 2021
Changelog
sklearn.compose
Fix compose.ColumnTransformer.get_feature_names does not call get_feature_names on transformers with an empty column selection. #19579 by Thomas Fan.
sklearn.cross_decomposition
Fix Fixed a regression in cross_decomposition.CCA. #19646 by Thomas Fan.
Fix cross_decomposition.PLSRegression raises a warning for constant y residuals instead of a StopIteration error. #19922 by Thomas Fan.
sklearn.decomposition
Fix Fixed a bug in decomposition.KernelPCA’s inverse_transform. #19732 by Kei Ishikawa.
sklearn.ensemble
Fix Fixed a bug in ensemble.HistGradientBoostingRegressor fit with the sample_weight parameter and the least_absolute_deviation loss function. #19407 by Vadim Ushtanit.
sklearn.gaussian_process
Fix Avoid explicitly forming the inverse covariance matrix in gaussian_process.GaussianProcessRegressor when set to output the standard deviation. With certain covariance matrices this inverse is unstable to compute explicitly. Calling the Cholesky solver mitigates this issue. #19939 by Ian Halvic.
Fix Avoid division by zero when scaling a constant target in gaussian_process.GaussianProcessRegressor. It was caused by a standard deviation equal to 0. Such a case is now detected and the standard deviation is set to 1, which avoids a division by zero and therefore the presence of NaN values in the normalized target. #19703 by @sobkevich, Boris Villazón-Terrazas and Alexandr Fonari.
sklearn.linear_model
Fix Fixed a bug in linear_model.LogisticRegression: the sample_weight object is no longer modified. #19182 by Yosuke KOBAYASHI.
sklearn.metrics
Fix metrics.top_k_accuracy_score now supports multiclass problems where only two classes appear in y_true and all the classes are specified in labels. #19721 by Joris Clement.
sklearn.model_selection
Fix model_selection.RandomizedSearchCV and model_selection.GridSearchCV now correctly show the score for single metrics and verbose > 2. #19659 by Thomas Fan.
Fix Some values in the cv_results_ attribute of model_selection.HalvingRandomSearchCV and model_selection.HalvingGridSearchCV were not properly converted to numpy arrays. #19211 by Nicolas Hug.
Fix The fit method of the successive halving parameter search (model_selection.HalvingGridSearchCV and model_selection.HalvingRandomSearchCV) now correctly handles the groups parameter. #19847 by Xiaoyu Chai.
sklearn.multioutput
Fix multioutput.MultiOutputRegressor now works with estimators that dynamically define predict during fitting, such as ensemble.StackingRegressor. #19308 by Thomas Fan.
sklearn.preprocessing
Fix Validate the constructor parameter handle_unknown in preprocessing.OrdinalEncoder to only allow the 'error' and 'use_encoded_value' strategies. #19234 by Guillaume Lemaitre.
Fix Fix encoder categories having dtype='S' in preprocessing.OneHotEncoder and preprocessing.OrdinalEncoder. #19727 by Andrew Delong.
Fix preprocessing.OrdinalEncoder.transform correctly handles unknown values for string dtypes. #19888 by Thomas Fan.
Fix preprocessing.OneHotEncoder.fit no longer alters the drop parameter. #19924 by Thomas Fan.
sklearn.semi_supervised
Fix Avoid NaN during label propagation in LabelPropagation. #19271 by Zhaowei Wang.
sklearn.tree
Fix Fix a bug in fit of tree.BaseDecisionTree that caused segmentation faults under certain conditions. fit now deep copies the Criterion object to prevent shared concurrent accesses. #19580 by Samuel Brice, Alex Adamson and Wil Yegelwel.
sklearn.utils
Fix The CSS provided by utils.estimator_html_repr is now better contained by giving CSS ids to the HTML representation. #19417 by Thomas Fan.
Version 0.24.1
January 2021
Packaging
The 0.24.0 scikit-learn wheels did not work with macOS < 10.15 because of libomp: the version of libomp used to build the wheels was too recent for older macOS versions. This issue has been fixed for the 0.24.1 scikit-learn wheels. Scikit-learn wheels published on PyPI.org now officially support macOS 10.13 and later.
Changelog
sklearn.metrics
Fix Fix a numerical stability bug that could happen in metrics.adjusted_mutual_info_score and metrics.mutual_info_score with NumPy 1.20+. #19179 by Thomas Fan.
sklearn.semi_supervised
Fix semi_supervised.SelfTrainingClassifier now accepts meta-estimators (e.g. ensemble.StackingClassifier). The validation of this estimator is done on the fitted estimator, once the existence of the method predict_proba is known. #19126 by Guillaume Lemaitre.
Version 0.24.0¶
December 2020
For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 0.24.
Legend for changelogs
Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented (or according to reasonable expectations) should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Changed models
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Fix decomposition.KernelPCA behaviour is now more consistent between 32-bits and 64-bits data when the kernel has small positive eigenvalues.
Fix decomposition.TruncatedSVD becomes deterministic by exposing a random_state parameter.
Fix linear_model.Perceptron when penalty='elasticnet'.
Fix Change in the random sampling procedures for the center initialization of cluster.KMeans.
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
Changelog
sklearn.base
Fix base.BaseEstimator.get_params will now raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None. #17448 by Juan Carlos Alfaro Jiménez.
sklearn.calibration
Efficiency calibration.CalibratedClassifierCV.fit now supports parallelization via joblib.Parallel using the argument n_jobs. #17107 by Julien Jerphanion.
Enhancement Allow calibration.CalibratedClassifierCV to be used with a prefit pipeline.Pipeline where the data X is not array-like, a sparse matrix or a dataframe at the start. #17546 by Lucy Liu.
Enhancement Add the ensemble parameter to calibration.CalibratedClassifierCV, which enables calibration via either an ensemble of calibrators (current method) or just one calibrator using all the data (similar to the built-in feature of sklearn.svm estimators with the probability=True parameter). #17856 by Lucy Liu and Andrea Esuli.
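Counting the fitted calibrators makes the difference between the two values of the new ensemble parameter visible. The sketch below assumes scikit-learn 0.24 or later. The synthetic data and the choice of LinearSVC as the base estimator are illustrative only.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Illustrative synthetic binary classification problem
X, y = make_classification(n_samples=200, random_state=0)

# ensemble=True (default): one (classifier, calibrator) pair per CV fold;
# their predicted probabilities are averaged at prediction time
ens = CalibratedClassifierCV(LinearSVC(dual=False), cv=3, ensemble=True).fit(X, y)

# ensemble=False: out-of-fold predictions train a single calibrator, and the
# base classifier is refit on all the data
single = CalibratedClassifierCV(LinearSVC(dual=False), cv=3, ensemble=False).fit(X, y)

print(len(ens.calibrated_classifiers_))     # 3
print(len(single.calibrated_classifiers_))  # 1
print(single.predict_proba(X[:2]))
```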
sklearn.cluster
Enhancement cluster.AgglomerativeClustering has a new parameter compute_distances. When set to True, distances between clusters are computed and stored in the distances_ attribute even when the parameter distance_threshold is not used. This new parameter is useful to produce dendrogram visualizations, but introduces a computational and memory overhead. #17984 by Michael Riedmann, Emilie Delattre, and Francesco Casalegno.
Enhancement cluster.SpectralClustering and cluster.spectral_clustering have a new keyword argument verbose. When set to True, additional messages will be displayed which can aid with debugging. #18052 by Sean O. Stalley.
Enhancement Added cluster.kmeans_plusplus as a public function. Initialization by KMeans++ can now be called separately to generate initial cluster centroids. #17937 by @g-walsh.
API Change The cluster.MiniBatchKMeans attributes counts_ and init_size_ are deprecated and will be removed in 1.1 (renaming of 0.26). #17864 by Jérémie du Boisberranger.
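The following sketch shows two of the cluster additions: calling the KMeans++ seeding directly, and requesting merge distances without a distance_threshold. It assumes scikit-learn 0.24 or later, and the blob data is made up for illustration.

```python
from sklearn.cluster import AgglomerativeClustering, kmeans_plusplus
from sklearn.datasets import make_blobs

# Illustrative 2-D data with three blobs
X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

# KMeans++ seeding on its own: returns the initial centers and the row
# indices in X they were taken from
centers, indices = kmeans_plusplus(X, n_clusters=3, random_state=0)
print(centers.shape, indices.shape)  # (3, 2) (3,)

# compute_distances=True fills distances_ even though distance_threshold
# is not set; without connectivity the full tree is built (n_samples - 1 merges)
agg = AgglomerativeClustering(n_clusters=3, compute_distances=True).fit(X)
print(agg.distances_.shape)  # (99,)
```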
sklearn.compose
Fix compose.ColumnTransformer will skip transformers whose column selector is a list of bools that are all False. #17616 by Thomas Fan.
Fix compose.ColumnTransformer now displays the remainder in the diagram display. #18167 by Thomas Fan.
Fix compose.ColumnTransformer enforces strict count and order of column names between fit and transform by raising an error instead of a warning, following the deprecation cycle. #18256 by Madhura Jayratne.
sklearn.covariance
API Change Deprecates cv_alphas_ in favor of cv_results_['alphas'] and grid_scores_ in favor of split scores in cv_results_ in covariance.GraphicalLassoCV. cv_alphas_ and grid_scores_ will be removed in version 1.1 (renaming of 0.26). #16392 by Thomas Fan.
sklearn.cross_decomposition
Fix Fixed a bug in cross_decomposition.PLSSVD which would sometimes return components in the reversed order of importance. #17095 by Nicolas Hug.
Fix Fixed a bug in cross_decomposition.PLSSVD, cross_decomposition.CCA, and cross_decomposition.PLSCanonical, which would lead to incorrect predictions for est.transform(Y) when the training data is single-target. #17095 by Nicolas Hug.
Fix Increases the stability of cross_decomposition.CCA. #18746 by Thomas Fan.
API Change For decomposition.NMF, the default value of init, used when init=None and n_components <= min(n_samples, n_features), will be changed from 'nndsvd' to 'nndsvda' in 1.1 (renaming of 0.26). #18525 by Chiara Marmo.
API Change The bounds of the n_components parameter are now restricted: to [1, min(n_samples, n_features, n_targets)] for cross_decomposition.PLSSVD, cross_decomposition.CCA, and cross_decomposition.PLSCanonical; to [1, n_features] for cross_decomposition.PLSRegression. An error will be raised in 1.1 (renaming of 0.26). #17095 by Nicolas Hug.
API Change For cross_decomposition.PLSSVD, cross_decomposition.CCA, and cross_decomposition.PLSCanonical, the x_scores_ and y_scores_ attributes were deprecated and will be removed in 1.1 (renaming of 0.26). They can be retrieved by calling transform on the training data. The norm_y_weights attribute will also be removed. #17095 by Nicolas Hug.
API Change For cross_decomposition.PLSRegression, cross_decomposition.PLSCanonical, cross_decomposition.CCA, and cross_decomposition.PLSSVD, the x_mean_, y_mean_, x_std_, and y_std_ attributes were deprecated and will be removed in 1.1 (renaming of 0.26). #18768 by Maren Westermann.
Fix decomposition.TruncatedSVD becomes deterministic by using the random_state. It controls the weights’ initialization of the underlying ARPACK solver. #18302 by Gaurav Desai and Ivan Panico.
sklearn.datasets
Feature datasets.fetch_openml now validates the md5 checksum of arff files downloaded or cached to ensure data integrity. #14800 by Shashank Singh and Joel Nothman.
Enhancement datasets.fetch_openml now allows the argument as_frame to be 'auto', which tries to convert returned data to a pandas DataFrame unless the data is sparse. #17396 by Jiaxiang.
Enhancement datasets.fetch_covtype now supports the optional argument as_frame; when it is set to True, the returned Bunch object’s data and frame members are pandas DataFrames, and the target member is a pandas Series. #17491 by Alex Liang.
Enhancement datasets.fetch_kddcup99 now supports the optional argument as_frame; when it is set to True, the returned Bunch object’s data and frame members are pandas DataFrames, and the target member is a pandas Series. #18280 by Alex Liang and Guillaume Lemaitre.
Enhancement datasets.fetch_20newsgroups_vectorized now supports loading as a pandas DataFrame by setting as_frame=True. #17499 by Brigitta Sipőcz and Guillaume Lemaitre.
API Change The default value of as_frame in datasets.fetch_openml is changed from False to 'auto'. #17610 by Jiaxiang.
sklearn.decomposition
Enhancement decomposition.FactorAnalysis now supports the optional argument rotation, which can take the value None, 'varimax' or 'quartimax'. #11064 by Jona Sassenhagen.
Enhancement decomposition.NMF now supports the optional parameter regularization, which can take the values None, 'components', 'transformation' or 'both', in accordance with decomposition.non_negative_factorization. #17414 by Bharat Raghunathan.
Fix decomposition.KernelPCA behaviour is now more consistent between 32-bits and 64-bits data input when the kernel has small positive eigenvalues. Small positive eigenvalues were not correctly discarded for 32-bits data. #18149 by Sylvain Marié.
Fix Fix decomposition.SparseCoder such that it follows the scikit-learn API and supports cloning. The attribute components_ is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26). This attribute was redundant with the dictionary attribute and constructor parameter. #17679 by Xavier Dupré.
Fix TruncatedSVD.fit_transform consistently returns the same as TruncatedSVD.fit followed by TruncatedSVD.transform. #18528 by Albert Villanova del Moral and Ruifeng Zheng.
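Here is a small sketch of the new rotation argument of FactorAnalysis. It assumes scikit-learn 0.24 or later, and the iris data is used purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X, _ = load_iris(return_X_y=True)

# rotation accepts None (default), 'varimax' or 'quartimax'
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
X_factors = fa.fit_transform(X)

print(fa.components_.shape)  # (2, 4): one row of rotated loadings per factor
print(X_factors.shape)       # (150, 2)
```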
sklearn.discriminant_analysis
Enhancement discriminant_analysis.LinearDiscriminantAnalysis can now use a custom covariance estimate by setting the covariance_estimator parameter. #14446 by Hugo Richard.
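The sketch below plugs an estimator from sklearn.covariance into LinearDiscriminantAnalysis. It assumes scikit-learn 0.24 or later. OAS is just one possible choice of estimator with a fit method and a covariance_ attribute, and the iris data is illustrative.

```python
from sklearn.covariance import OAS
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# covariance_estimator only works with the 'lsqr' and 'eigen' solvers and
# cannot be combined with the shrinkage parameter
lda = LinearDiscriminantAnalysis(solver="lsqr", covariance_estimator=OAS())
lda.fit(X, y)
print(lda.score(X, y))  # training accuracy
```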
sklearn.ensemble
Major Feature ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier now have native support for categorical features with the categorical_features parameter. #18394 by Nicolas Hug and Thomas Fan.
Feature ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier now support the method staged_predict, which allows monitoring of each stage. #16985 by Hao Chun Chang.
Efficiency Break cyclic references in the tree nodes used internally in ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier to allow for the timely garbage collection of large intermediate data structures and to improve memory usage in fit. #18334 by Olivier Grisel, Nicolas Hug, Thomas Fan and Andreas Müller.
Efficiency Histogram initialization is now done in parallel in ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier, which results in a speed improvement for problems that build a lot of nodes on multicore machines. #18341 by Olivier Grisel, Nicolas Hug, Thomas Fan, and Egor Smirnov.
Fix Fixed a bug in ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier, which can now accept data with uint8 dtype in predict. #18410 by Nicolas Hug.
API Change The attribute n_classes_ is now deprecated in ensemble.GradientBoostingRegressor and returns 1. #17702 by Simona Maggio.
API Change Mean absolute error ('mae') is now deprecated for the parameter criterion in ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier. #18326 by Madhura Jayaratne.
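This sketch combines native categorical support with staged_predict. It assumes scikit-learn 0.24, where the histogram-based estimators still needed the experimental import (later releases turn that import into a no-op that emits a warning). The data and the per-category effects are made up. Categories must be encoded as small non-negative integers.

```python
import numpy as np

# Required in 0.24, when these estimators were still experimental
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
n = 500
cat = rng.randint(0, 4, size=n)  # categorical feature encoded as 0..3
num = rng.normal(size=n)         # ordinary numerical feature
X = np.column_stack([cat, num])
# Illustrative target: an arbitrary effect per category plus the numeric feature
y = np.array([0.0, 5.0, -3.0, 2.0])[cat] + num

# The boolean mask marks column 0 as categorical
model = HistGradientBoostingRegressor(
    categorical_features=[True, False], max_iter=50, random_state=0
).fit(X, y)

# staged_predict yields the predictions after each boosting iteration
stages = list(model.staged_predict(X[:5]))
print(len(stages), model.n_iter_)  # one stage per iteration
```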
sklearn.exceptions
API Change exceptions.ChangedBehaviorWarning and exceptions.NonBLASDotWarning are deprecated and will be removed in 1.1 (renaming of 0.26). #17804 by Adrin Jalali.
sklearn.feature_extraction
Enhancement feature_extraction.DictVectorizer accepts multiple values for one categorical feature. #17367 by Peng Yu and Chiara Marmo.
Fix feature_extraction.CountVectorizer raises an error if a custom token pattern that captures more than one group is provided. #15427 by Gangesh Gudmalwar and Erin R Hoffman.
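The sketch below shows a list-valued categorical feature in DictVectorizer. It assumes scikit-learn 0.24 or later, and the records are made up. Each string in the list becomes its own "feature=value" column.

```python
from sklearn.feature_extraction import DictVectorizer

# Illustrative records: "tags" holds several values for one categorical feature
records = [
    {"city": "Paris", "tags": ["a", "b"]},
    {"city": "Rome", "tags": ["b"]},
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# Feature names are sorted by default (sort=True)
print(sorted(vec.vocabulary_))  # ['city=Paris', 'city=Rome', 'tags=a', 'tags=b']
print(X)
```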
sklearn.feature_selection
Feature Added feature_selection.SequentialFeatureSelector, which implements forward and backward sequential feature selection. #6545 by Sebastian Raschka and #17159 by Nicolas Hug.
Feature A new parameter importance_getter was added to feature_selection.RFE, feature_selection.RFECV and feature_selection.SelectFromModel, allowing the user to specify an attribute name/path or a callable for extracting feature importance from the estimator. #15361 by Venkatachalam N.
Efficiency Reduce memory footprint in feature_selection.mutual_info_classif and feature_selection.mutual_info_regression by calling neighbors.KDTree for counting nearest neighbors. #17878 by Noel Rogers.
Enhancement feature_selection.RFE supports the option for n_features_to_select to be given as a float representing the fraction of features to select. #17090 by Lisa Schwetlick and Marija Vlajic Wheeler.
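The sketch below exercises SequentialFeatureSelector and the float form of n_features_to_select in RFE. It assumes scikit-learn 0.24 or later. The estimators and the iris data are illustrative choices, and which two features get selected is not asserted.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Forward selection: greedily add the feature that most improves the CV score
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="forward",
    cv=5,
)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask over the 4 features

# A float below 1 is read as a fraction of the features: int(0.5 * 4) == 2
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=0.5).fit(X, y)
print(rfe.n_features_)  # 2
```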
sklearn.gaussian_process
Enhancement A new method gaussian_process.Kernel._check_bounds_params is called after fitting a Gaussian Process and raises a ConvergenceWarning if the bounds of the hyperparameters are too tight. #12638 by Sylvain Lannuzel.
sklearn.impute
Feature impute.SimpleImputer now supports a list of strings when strategy='most_frequent' or strategy='constant'. #17526 by Ayako YAGI and Juan Carlos Alfaro Jiménez.
Feature Added the method impute.SimpleImputer.inverse_transform to revert imputed data to the original when instantiated with add_indicator=True. #17612 by Srimukh Sripada.
Fix Replace the default values of the min_value and max_value parameters in impute.IterativeImputer with -np.inf and np.inf, respectively, instead of None. However, the behaviour of the class does not change since None was defaulting to these values already. #16493 by Darshan N.
Fix impute.IterativeImputer will not attempt to set the estimator’s random_state attribute, allowing it to be used with more external classes. #15636 by David Cortes.
Efficiency impute.SimpleImputer is now faster with object dtype arrays when strategy='most_frequent'. #18987 by David Katz.
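This sketch round-trips data through SimpleImputer.inverse_transform. It assumes scikit-learn 0.24 or later, and the tiny array is illustrative. The indicator columns added by add_indicator=True are what allow the NaNs to be restored.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan], [np.nan, 4.0], [3.0, 6.0]])

# add_indicator=True appends one missing-value indicator column per feature
# that had missing values during fit
imp = SimpleImputer(strategy="mean", add_indicator=True)
Xt = imp.fit_transform(X)
print(Xt.shape)  # (3, 4): 2 imputed columns + 2 indicator columns

# inverse_transform uses the indicator columns to put the NaNs back
X_back = imp.inverse_transform(Xt)
print(np.allclose(X_back, X, equal_nan=True))  # True
```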
sklearn.inspection
Feature inspection.partial_dependence and inspection.plot_partial_dependence now support calculating and plotting Individual Conditional Expectation (ICE) curves controlled by the kind parameter. #16619 by Madhura Jayratne.
Feature Add the sample_weight parameter to inspection.permutation_importance. #16906 by Roei Kahny.
API Change Positional arguments are deprecated in inspection.PartialDependenceDisplay.plot and will error in 1.1 (renaming of 0.26). #18293 by Thomas Fan.
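The sketch below passes sample_weight to permutation_importance. It assumes scikit-learn 0.24 or later. The data, the weights and the choice of LinearRegression are illustrative, and feature 2 is pure noise by construction.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=200)  # feature 2 unused
w = rng.uniform(0.5, 2.0, size=200)  # illustrative per-sample weights

model = LinearRegression().fit(X, y, sample_weight=w)

# sample_weight is forwarded to the scorer when computing the score drops
result = permutation_importance(
    model, X, y, n_repeats=5, random_state=0, sample_weight=w
)
print(result.importances_mean)  # largest for feature 0, near 0 for feature 2
```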
sklearn.isotonic
Feature Expose fitted attributes X_thresholds_ and y_thresholds_ that hold the de-duplicated interpolation thresholds of an isotonic.IsotonicRegression instance for model inspection purposes. #16289 by Masashi Kishimoto and Olivier Grisel.
Enhancement isotonic.IsotonicRegression now accepts a 2d array with 1 feature as input array. #17379 by Jiaxiang.
Fix Add tolerance when determining duplicate X values to prevent inf values from being predicted by isotonic.IsotonicRegression. #18639 by Lucy Liu.
sklearn.kernel_approximation
Feature Added class kernel_approximation.PolynomialCountSketch, which implements the Tensor Sketch algorithm for polynomial kernel feature map approximation. #13003 by Daniel López Sánchez.
Efficiency kernel_approximation.Nystroem now supports parallelization via joblib.Parallel using the argument n_jobs. #18545 by Laurenz Reitsam.
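PolynomialCountSketch is typically placed in front of a linear model to approximate a polynomial-kernel model. The sketch below assumes scikit-learn 0.24 or later. The degree, the number of components and the downstream LogisticRegression are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.kernel_approximation import PolynomialCountSketch
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Approximate the feature map of the kernel (gamma * <x, y> + coef0) ** degree
# with a 64-dimensional randomized Tensor Sketch
clf = make_pipeline(
    PolynomialCountSketch(degree=2, n_components=64, random_state=0),
    LogisticRegression(max_iter=1000),
).fit(X, y)

print(clf[0].transform(X).shape)  # (150, 64)
```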
sklearn.linear_model
Feature linear_model.LinearRegression now forces coefficients to be all positive when positive is set to True. #17578 by Joseph Knox, Nelle Varoquaux and Chiara Marmo.
Enhancement linear_model.RidgeCV now supports finding an optimal regularization value alpha for each target separately by setting alpha_per_target=True. This is only supported when using the default efficient leave-one-out cross-validation scheme cv=None. #6624 by Marijn van Vliet.
Fix Fixes a bug in linear_model.TheilSenRegressor where predict and score would fail when fit_intercept=False and there was one feature during fitting. #18121 by Thomas Fan.
Fix Fixes a bug in linear_model.ARDRegression where predict was raising an error when normalize=True and return_std=True because X_offset_ and X_scale_ were undefined. #18607 by fhaselbeck.
Fix Added the missing l1_ratio parameter in linear_model.Perceptron, to be used when penalty='elasticnet'. This changes the default from 0 to 0.15. #18622 by Haesun Park.
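Here is a short sketch of two of the linear_model additions, non-negative least squares via positive=True and per-target alphas in RidgeCV. It assumes scikit-learn 0.24 or later, and the synthetic data is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# positive=True constrains every coefficient to be >= 0 (the coefficient of
# the feature with a negative true weight is pushed to the bound)
pos = LinearRegression(positive=True).fit(X, y)
print(pos.coef_)

# alpha_per_target=True selects one alpha per column of Y
# (only with the default leave-one-out scheme, cv=None)
Y = np.column_stack([y, 0.1 * y + rng.normal(size=100)])
ridge = RidgeCV(alphas=[0.1, 1.0, 10.0], alpha_per_target=True).fit(X, Y)
print(ridge.alpha_.shape)  # (2,)
```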
sklearn.manifold
Efficiency Fixed #10493. Improve Locally Linear Embedding (LLE), which raised a MemoryError exception when used with large inputs. #17997 by Bertrand Maisonneuve.
Enhancement Add the square_distances parameter to manifold.TSNE, which provides backward compatibility during deprecation of legacy squaring behavior. Distances will be squared by default in 1.1 (renaming of 0.26), and this parameter will be removed in 1.3. #17662 by Joshua Newton.
Fix manifold.MDS now correctly sets its _pairwise attribute. #18278 by Thomas Fan.
sklearn.metrics
Feature Added metrics.cluster.pair_confusion_matrix implementing the confusion matrix arising from pairs of elements from two clusterings. #17412 by Uwe F Mayer.
Feature New metric metrics.top_k_accuracy_score. It’s a generalization of metrics.accuracy_score; the difference is that a prediction is considered correct as long as the true label is associated with one of the k highest predicted scores. accuracy_score is the special case of k = 1. #16625 by Geoffrey Bolmier.
Feature Added metrics.det_curve to compute the Detection Error Tradeoff curve classification metric. #10591 by Jeremy Karnowski and Daniel Mohns.
Feature Added metrics.plot_det_curve and metrics.DetCurveDisplay to ease the plot of DET curves. #18176 by Guillaume Lemaitre.
Feature Added the metrics.mean_absolute_percentage_error metric and the associated scorer for regression problems. #10708 fixed with the PR #15007 by Ashutosh Hathidara. The scorer and some practical test cases were taken from PR #10711 by Mohamed Ali Jamaoui.
Feature Added metrics.rand_score implementing the (unadjusted) Rand index. #17412 by Uwe F Mayer.
Feature metrics.plot_confusion_matrix now supports making the colorbar optional in the matplotlib plot by setting colorbar=False. #17192 by Avi Gupta.
Enhancement Add the sample_weight parameter to metrics.median_absolute_error. #17225 by Lucy Liu.
Enhancement Add the pos_label parameter in metrics.plot_precision_recall_curve in order to specify the positive class to be used when computing the precision and recall statistics. #17569 by Guillaume Lemaitre.
Enhancement Add the pos_label parameter in metrics.plot_roc_curve in order to specify the positive class to be used when computing the ROC AUC statistics. #17651 by Clara Matos.
Fix Fixed a bug in metrics.classification_report which was raising an AttributeError when called with output_dict=True for 0-length values. #17777 by Shubhanshu Mishra.
Fix Fixed a bug in metrics.jaccard_score which recommended the zero_division parameter when called with no true or predicted samples. #17826 by Richard Decal and Joseph Willard.
Fix Fix a bug in metrics.hinge_loss where an error occurs when y_true is missing some labels that are provided explicitly in the labels parameter. #17935 by Cary Goltermann.
Fix Fix scorers that accept a pos_label parameter and compute their metrics from values returned by decision_function or predict_proba. Previously, they would return erroneous values when pos_label did not correspond to classifier.classes_[1]. This is especially important when training classifiers directly with string labeled target classes. #18114 by Guillaume Lemaitre.
Fix Fixed a bug in metrics.plot_confusion_matrix where an error occurs when y_true contains labels that were not previously seen by the classifier while the labels and display_labels parameters are set to None. #18405 by Thomas J. Fan and Yakov Pchelintsev.
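The sketch below exercises three of the new metrics on tiny hand-made inputs. It assumes scikit-learn 0.24 or later. It also checks that the Rand index equals the diagonal share of the pair confusion matrix.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_percentage_error,
    rand_score,
    top_k_accuracy_score,
)
from sklearn.metrics.cluster import pair_confusion_matrix

# top-k accuracy: correct if the true label is among the k highest scores
y_true = np.array([0, 1, 2, 2])
y_score = np.array([[0.5, 0.2, 0.2],
                    [0.3, 0.4, 0.2],
                    [0.2, 0.4, 0.3],
                    [0.7, 0.2, 0.1]])
print(top_k_accuracy_score(y_true, y_score, k=2))  # 0.75
print(top_k_accuracy_score(y_true, y_score, k=1))  # 0.5, plain accuracy of the argmax

# MAPE = mean(|y_true - y_pred| / |y_true|)
mape = mean_absolute_percentage_error([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(mape)  # about 0.327

# Rand index = share of (ordered) sample pairs on which both clusterings agree
labels_a, labels_b = [0, 0, 1, 2], [0, 0, 1, 1]
C = pair_confusion_matrix(labels_a, labels_b)
print(rand_score(labels_a, labels_b))  # 10 of 12 pairs agree -> 0.8333...
print((C[0, 0] + C[1, 1]) / C.sum())   # same value
```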
sklearn.model_selection
Major Feature Added (experimental) parameter search estimators model_selection.HalvingRandomSearchCV and model_selection.HalvingGridSearchCV, which implement Successive Halving and can be used as drop-in replacements for model_selection.RandomizedSearchCV and model_selection.GridSearchCV. #13900 by Nicolas Hug, Joel Nothman and Andreas Müller.
Feature model_selection.RandomizedSearchCV and model_selection.GridSearchCV now have the method score_samples. #17478 by Teon Brooks and Mohamed Maskani.
Enhancement model_selection.TimeSeriesSplit has two new keyword arguments test_size and gap. test_size allows the out-of-sample time series length to be fixed for all folds. gap removes a fixed number of samples between the train and test set on each fold. #13204 by Kyle Kosic.
Enhancement model_selection.permutation_test_score and model_selection.validation_curve now accept fit_params to pass additional estimator parameters. #18527 by Gaurav Dhingra, Julien Jerphanion and Amanda Dsouza.
Enhancement model_selection.cross_val_score, model_selection.cross_validate, model_selection.GridSearchCV, and model_selection.RandomizedSearchCV allow the estimator to fail scoring and replace the score with error_score. If error_score="raise", the error will be raised. #18343 by Guillaume Lemaitre and Devi Sandeep.
Enhancement model_selection.learning_curve now accepts fit_params to pass additional estimator parameters. #18595 by Amanda Dsouza.
Fix Fixed the len of model_selection.ParameterSampler when all distributions are lists and n_iter is more than the number of unique parameter combinations. #18222 by Nicolas Hug.
Fix A warning is now raised when one or more CV splits of model_selection.GridSearchCV and model_selection.RandomizedSearchCV result in non-finite scores. #18266 by Subrat Sahu, Nirvan and Arthur Book.
Enhancement model_selection.GridSearchCV, model_selection.RandomizedSearchCV and model_selection.cross_validate support scoring being a callable returning a dictionary of multiple metric name/value associations. #15126 by Thomas Fan.
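The sketch below shows the new test_size and gap arguments of TimeSeriesSplit and a minimal successive-halving search. It assumes scikit-learn 0.24 or later. The halving estimators remain experimental and need the enable_halving_search_cv import. The parameter grid and the decision tree are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeClassifier

# Fixed-size test folds, with `gap` samples dropped just before each test fold
tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)
for train, test in tscv.split(np.zeros((10, 1))):
    print(train, test)
# [0 1 2] [4 5]
# [0 1 2 3 4] [6 7]
# [0 1 2 3 4 5 6] [8 9]

# Successive halving: all candidates start on few samples, and only the best
# 1/factor of them advance to the next, larger budget
X, y = load_iris(return_X_y=True)
param_grid = {"max_depth": [1, 2, 3], "min_samples_split": [2, 5]}
search = HalvingGridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, factor=3, random_state=0
).fit(X, y)
print(search.best_params_, search.n_resources_)
```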
sklearn.multiclass
Enhancement multiclass.OneVsOneClassifier now accepts inputs with missing values. Hence, estimators which can handle missing values (such as a pipeline with an imputation step) can be used as an estimator for multiclass wrappers. #17987 by Venkatachalam N.
Fix A fix to allow multiclass.OutputCodeClassifier to accept sparse input data in its fit and predict methods. The check for validity of the input is now delegated to the base estimator. #17233 by Zolisa Bleki.
sklearn.multioutput
Enhancement multioutput.MultiOutputClassifier and multioutput.MultiOutputRegressor now accept inputs with missing values. Hence, estimators which can handle missing values (such as a pipeline with an imputation step, or HistGradientBoosting estimators) can be used as an estimator for multioutput wrappers. #17987 by Venkatachalam N.
Fix A fix to accept tuples for the order parameter in multioutput.ClassifierChain. #18124 by Gus Brocchini and Amanda Dsouza.
sklearn.naive_bayes
Enhancement Adds a parameter min_categories to naive_bayes.CategoricalNB that allows a minimum number of categories per feature to be specified. This allows categories unseen during training to be accounted for. #16326 by George Armstrong.
API Change The attributes coef_ and intercept_ are now deprecated in naive_bayes.MultinomialNB, naive_bayes.ComplementNB, naive_bayes.BernoulliNB and naive_bayes.CategoricalNB, and will be removed in v1.1 (renaming of 0.26). #17427 by Juan Carlos Alfaro Jiménez.
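The sketch below uses min_categories to reserve a slot for a category that never appears in training. It assumes scikit-learn 0.24 or later, and the tiny integer-encoded dataset is made up.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# One categorical feature encoded as 0..2 in the training data
X = np.array([[0], [1], [2], [1]])
y = np.array([0, 0, 1, 1])

# min_categories=4 makes room for category 3, unseen during fit
clf = CategoricalNB(min_categories=4).fit(X, y)
print(clf.category_count_[0].shape)  # (2, 4): (n_classes, n_categories)

# Predicting on the unseen category relies on the smoothed counts (alpha=1.0)
print(clf.predict([[3]]))
```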
sklearn.neighbors
Efficiency Speed up the seuclidean, wminkowski, mahalanobis and haversine metrics in neighbors.DistanceMetric by avoiding unexpected GIL acquiring in Cython when setting n_jobs>1 in neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor, metrics.pairwise_distances and by validating data out of loops. #17038 by Wenbo Zhao.
Efficiency neighbors.NeighborsBase benefits from an improved algorithm = 'auto' heuristic. In addition to the previous set of rules, now, when the number of features exceeds 15, brute is selected, assuming the data intrinsic dimensionality is too high for tree-based methods. #17148 by Geoffrey Bolmier.
Fix neighbors.BinaryTree will raise a ValueError when fitting on a data array having points with different dimensions. #18691 by Chiara Marmo.
Fix neighbors.NearestCentroid with a numerical shrink_threshold will raise a ValueError when fitting on data with all constant features. #18370 by Trevor Waite.
Fix In the methods radius_neighbors and radius_neighbors_graph of neighbors.NearestNeighbors, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor, and neighbors.RadiusNeighborsTransformer, using sort_results=True now correctly sorts the results even when fitting with the "brute" algorithm. #18612 by Tom Dupre la Tour.
sklearn.neural_network
Efficiency Neural net training and prediction are now a little faster. #17603, #17604, #17606, #17608, #17609, #17633, #17661, #17932 by Alex Henrie.
Enhancement Avoid converting float32 input to float64 in neural_network.BernoulliRBM. #16352 by Arthur Imbert.
Enhancement Support 32-bit computations in neural_network.MLPClassifier and neural_network.MLPRegressor. #17759 by Srimukh Sripada.
Fix Fix the method fit of neural_network.MLPClassifier not iterating to max_iter if warm started. #18269 by Norbert Preining and Guillaume Lemaitre.
sklearn.pipeline
Enhancement References to transformers passed through transformer_weights to pipeline.FeatureUnion that aren’t present in transformer_list will raise a ValueError. #17876 by Cary Goltermann.
Fix A slice of a pipeline.Pipeline now inherits the parameters of the original pipeline (memory and verbose). #18429 by Albert Villanova del Moral and Paweł Biernat.
sklearn.preprocessing
Feature preprocessing.OneHotEncoder now supports missing values by treating them as a category. #17317 by Thomas Fan.
Feature Add a new handle_unknown parameter with a use_encoded_value option, along with a new unknown_value parameter, to preprocessing.OrdinalEncoder to allow unknown categories during transform and set the encoded value of the unknown categories. #17406 by Felix Wick and #18406 by Nicolas Hug.
Feature Add the clip parameter to preprocessing.MinMaxScaler, which clips the transformed values of test data to feature_range. #17833 by Yashika Sharma.
Feature Add the sample_weight parameter to preprocessing.StandardScaler. Allows setting individual weights for each sample. #18510 and #18447 and #16066 and #18682 by Maria Telenczuk and Albert Villanova and @panpiort8 and Alex Gramfort.
Enhancement Verbose output of model_selection.GridSearchCV has been improved for readability. #16935 by Raghav Rajagopalan and Chiara Marmo.
Enhancement Add unit_variance to preprocessing.RobustScaler, which scales output data such that normally distributed features have a variance of 1. #17193 by Lucy Liu and Mabel Villalba.
Enhancement Add the dtype parameter to preprocessing.KBinsDiscretizer. #16335 by Arthur Imbert.
Fix Raise an error on sklearn.preprocessing.OneHotEncoder.inverse_transform when handle_unknown='error' and drop=None for samples encoded as all zeros. #14982 by Kevin Winata.
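The sketch below covers four of the preprocessing additions on tiny hand-made inputs. It assumes scikit-learn 0.24 or later. The category names and numbers are illustrative.

```python
import numpy as np
from sklearn.preprocessing import (
    MinMaxScaler,
    OneHotEncoder,
    OrdinalEncoder,
    StandardScaler,
)

# OrdinalEncoder: categories unseen during fit are mapped to unknown_value
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit([["cat"], ["dog"]])
print(enc.transform([["dog"], ["fish"]]))  # dog -> 1.0, fish (unseen) -> -1.0

# OneHotEncoder: NaN is treated as its own (last) category
ohe = OneHotEncoder().fit(np.array([["a"], ["b"], [np.nan]], dtype=object))
print(ohe.categories_)

# MinMaxScaler(clip=True): transformed test values are clipped to feature_range
scaler = MinMaxScaler(clip=True).fit([[0.0], [10.0]])
print(scaler.transform([[-5.0], [15.0]]))  # -0.5 -> 0.0 and 1.5 -> 1.0

# StandardScaler with per-sample weights: mean_ is the weighted mean
X = np.array([[1.0], [2.0], [3.0]])
w = np.array([1.0, 1.0, 2.0])
ss = StandardScaler().fit(X, sample_weight=w)
print(ss.mean_)  # (1 + 2 + 2 * 3) / 4 = 2.25
```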
sklearn.semi_supervised
Major Feature Added semi_supervised.SelfTrainingClassifier, a meta-classifier that allows any supervised classifier to function as a semi-supervised classifier that can learn from unlabeled data. #11682 by Oliver Rausch and Patrice Becker.
Fix Fix incorrect encoding when using unicode string dtypes in preprocessing.OneHotEncoder and preprocessing.OrdinalEncoder. #15763 by Thomas Fan.
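The sketch below runs SelfTrainingClassifier on iris with most labels hidden. It assumes scikit-learn 0.24 or later. The 70% masking rate, the threshold and the LogisticRegression base estimator are illustrative. Unlabeled samples are marked with -1, and the base estimator must implement predict_proba.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = load_iris(return_X_y=True)

# Hide roughly 70% of the labels; -1 marks an unlabeled sample
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.7] = -1

# Pseudo-labels are added each iteration for samples whose predicted
# probability exceeds the threshold
self_training = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
self_training.fit(X, y_partial)

print((y_partial != -1).sum(), (self_training.transduction_ != -1).sum())
```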
sklearn.svm
sklearn.tree
Feature tree.DecisionTreeRegressor now supports the new splitting criterion 'poisson', useful for modeling count data. #17386 by Christian Lorentzen.
Enhancement tree.plot_tree now uses colors from the matplotlib configuration settings. #17187 by Andreas Müller.
API Change The parameter X_idx_sorted is now deprecated in tree.DecisionTreeClassifier.fit and tree.DecisionTreeRegressor.fit, and has no effect. #17614 by Juan Carlos Alfaro Jiménez.
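The sketch below fits a tree with the 'poisson' criterion on simulated count data. It assumes scikit-learn 0.24 or later. The log-linear rate used to simulate the counts is an illustrative assumption. The criterion requires y >= 0 with a positive sum.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 2))
# Illustrative counts whose expected value grows with the first feature
y = rng.poisson(lam=np.exp(1 + 2 * X[:, 0]))

# 'poisson' splits by reducing the Poisson deviance; leaf values are means
# of non-negative counts, so predictions are non-negative
tree = DecisionTreeRegressor(criterion="poisson", max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict(X[:5]))
```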
sklearn.utils
Enhancement Add check_methods_sample_order_invariance to check_estimator, which checks that estimator methods are invariant if applied to the same dataset with different sample order. #17598 by Jason Ngo.
Enhancement Add support for weights in utils.sparsefuncs.incr_mean_variance_axis. By Maria Telenczuk and Alex Gramfort.
Fix Raise a ValueError with a clear error message in check_array for sparse DataFrames with mixed types. #17992 by Thomas J. Fan and Alex Shacked.
Fix Allow serialized tree based models to be unpickled on a machine with different endianness. #17644 by Qi Zhang.
Fix Raise a proper error when axis=1 and the dimensions do not match in utils.sparsefuncs.incr_mean_variance_axis. By Alex Gramfort.
Miscellaneous
Code and Documentation Contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.23, including:
Abhinav Gupta, Abo7atm, Adam Spannbauer, Adrian Garcia Badaracco, Adrian Sadłocha, Adrin Jalali, adrinjalali, Agamemnon Krasoulis, Akshay Deodhar, Albert Villanova del Moral, Alessandro Gentile, Alessia Marcolini, Alexander Lenail, alexandracraciun, Alexandre Gramfort, Alexandr Fonari, Alex Henrie, Alex Itkes, Alex Liang, alexshacked, Alihan Zihna, Allan D Butler, Amanda Dsouza, amy12xx, Anand Tiwari, Ana Pessoa, Anderson Nelson, Andreas Mueller, Andrew Delong, Ankit Choraria, Archana Subramaniyan, Arthur Imbert, Ashish, Ashutosh Hathidara, Ashutosh Kushwaha, Atsushi Nukariya, Aura Munoz, AutoViz and Auto_ViML, Avi Gupta, Avinash Anakal, Ayako YAGI, Ayush Singh, BaptBillard, barankarakus, barberogaston, beatrizsmg, Benjamin Bossan, Benjamin Pedigo, Ben Mainye, Bharat Raghunathan, Bhavika Devnani, Biprateep Dey, bmaisonn, Bo Chang, Boris Villazón-Terrazas, brigi, Brigitta Sipőcz, Bruno Charron, Byron Smith, Cary Goltermann, Cat Chenal, CeeThinwa, chaitanyamogal, Charles Patel, Chiara Marmo, Christian Kastner, Christian Lorentzen, Christoph Deil, Christopher Yeh, Christos Aridas, Clara Matos, cliffordEmmanuel, clmbst, Coelhudo, Connor Tann, crispinlogan, Cristina Mulas, Daniel López, Daniel Mohns, darioka, Darshan N, david-cortes, Declan O’Neill, Deeksha Madan, dmallia17, EdwinWenink, EL-ATEIF Sara, Elizabeth DuPre, Eric Fiegel, Erich Schubert, Eric Larson, Erin Khoo, Erin R Hoffman, eschibli, Felix Wick, fhaselbeck, flyingdutchman23, Forrest Koch, Fortune Uwha, Francesco Casalegno, Frans Larsson, Gael Varoquaux, Gaurav Desai, Gaurav Sheni, genvalen, Geoffrey Bolmier, Geoffrey Thomas, George Armstrong, George Kiragu, Gesa Stupperich, Ghislain Antony Vaillant, Gim Seng, Gordon Walsh, Gregory R. Lee, Guillaume Chevalier, Guillaume Lemaitre, guiweber, Haesun Park, Hannah Bohle, Hans Moritz Günther, Hao Chun Chang, Harry Scholes, Harry Wei, Harsh Soni, Helder Geovane Gomes de Lima, Henry, Hirofumi Suzuki, Hitesh Somani, Hoda1394, Hugo Le Moine, hugorichard, indecisiveuser, Isaack Mungui, Ishan Mishra, Isuru Fernando, Ivan Wiryadi, iwhalvic, j0rd1smit, Jaehyun Ahn, Jake Tae, James Alan Preiss, James Budarz, James Hoctor, Jan Vesely, Jeevan Anand Anne, Jérémie du Boisberranger, JeroenPeterBos, JHayes, Jianzhu Guo, Jiaxiang, Jie Zheng, Jigna Panchal, jim0421, Jin Li, Joaquin Vanschoren, Joel Nothman, JohanWork, Jona Sassenhagen, Jonathan, Jon Haitz Legarreta Gorroño, Jorge Gorbe Moya, Joseph Lucas, Joshua Newton, Juan Carlos Alfaro Jiménez, Julien Jerphanion, Justin Huber, Kartik Chugh, Katarina Slama, kaylani2, Kei Ishikawa, Kendrick Cetina, Kenny Huynh, Kevin Markham, Kevin Winata, Kiril Isakov, kishimoto, Koki Nishihara, Krum Arnaudov, Kunj, Kyle Kosic, Lauren Oldja, Laurenz Reitsam, Lisa Schwetlick, Loic Esteve, Louis Douge, Louis Guitton, Lucy Liu, Madhura Jayaratne, maikia, Manimaran, Manuel López-Ibáñez, Maren Westermann, Mariam-ke, Maria Telenczuk, Marijn van Vliet, Markus Löning, Martina G. Vilas, Martina Megasari, Martin Hirzel, Martin Scheubrein, Mateusz Górski, Mathieu Blondel, mathschy, mathurinm, Matthias Bussonnier, Max Del Giudice, Mehmet Ali Özer, Miao Cai, Michael, Milan Straka, Muhammad Jarir Kanji, Muoki Caleb, Nadia Tahiri, Ph. D, Naoki Hamada, Neil Botelho, N. Haiat, Nicolas Hug, Nigel Bosch, Nils Werner, Nodar Okroshiashvili, noelano, Norbert Preining, Ogbonna Chibuike Stephen, oj_lappi, Oleh Kozynets, Olivier Grisel, Pankaj Jindal, Pardeep Singh, Parthiv Chigurupati, Patrice Becker, Paulo S. Costa, Pete Green, Peter Dye, pgithubs, Poorna Kumar, Prabakaran Kumaresshan, Probinette4, pspachtholz, putschblos, pwalchessen, Qi Zhang, rachel fischoff, Rachit Toshniwal, Rafey Iqbal Rahman, Rahul Jakhar, Ram Rachum, RamyaNP, ranjanikrishnan, rauwuckl, Ravi Kiran Boggavarapu, Ray Bell, Reshama Shaikh, Richard Decal, RichardScottOZ, Rishi Advani, Rithvik Rao, Rob Romijnders, roei, Romain Tavenard, Roman Yurchak, Ruby Werman, Ryotaro Tsukada, sadak, Saket Khandelwal, Sam, Sam Ezebunandu, Sam Kimbinyi, Samuel Brice, Sandy Khosasi, Sarah Brown, Saurabh Jain, Sean Benhur J, Sean O. Stalley, Sebastian Pölsterl, Sergio, Shail Shah, Shane Keller, Shao Yang Hong, Shashank Singh, shinnar, Shooter23, Shubhanshu Mishra, simonamaggio, Sina Tootoonian, Soledad Galli, Srimukh Sripada, Stephan Steinfurt, Steve Stagg, subrat93, Sunitha Selvan, Swier, SylvainLan, Sylvain Marié, Teon L Brooks, Terence Honles, TFiFiE, Thijs van den Berg, Thomas9292, Thomas J Fan, Thomas J. Fan, Thomas S Benjamin, Thorben Jensen, tijanajovanovic, Timo Kaufmann, t-kusanagi2, tnwei, Tom Dupré la Tour, Trevor Waite, ufmayer, Umberto Lupo, vadim-ushtanit, Vangelis Gkiastas, Venkatachalam N, Vikas Pandey, Vinicius Rios Fuck, Violeta, Vlasovets, waijean, watchtheblur, Wenbo Zhao, willpeppo, xavier dupré, Xethan, xiaoyuchai, Xue Qianming, xun-tang, yagi-3, Yakov Pchelintsev, Yashika Sharma, Yi-Yan Ge, Yosuke KOBAYASHI, Yue Wu, Yutaro Ikeda, yzhenman, Zaccharie Ramzi, Zito, Zito Relova, zoj613, Zhao Feng.