Version 1.6
Legend for changelogs
Major Feature: something big that you couldn’t do before.
Feature: something that you couldn’t do before.
Efficiency: an existing feature now may not require as much computation or memory.
Enhancement: a miscellaneous minor improvement.
Fix: something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change: you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.6.0
In Development
Changes impacting many modules
API Change: utils.validation.validate_data is introduced and replaces the previously private base.BaseEstimator._validate_data method. It is intended for third-party estimator developers, who should use this function in most cases instead of utils.validation.check_array and utils.validation.check_X_y. #29696 by Adrin Jalali.
Enhancement: __sklearn_tags__ was introduced for setting tags in estimators. More details in Estimator Tags. #22606 by Thomas Fan and #29677 by Adrin Jalali.
Support for Array API
Additional estimators and functions have been updated to include support for all Array API compliant inputs.
See Array API support (experimental) for more details.
Functions:
sklearn.metrics.cluster.entropy #29141 by Yaroslav Korobko;
sklearn.metrics.mean_absolute_error #27736 by Edoardo Abati and #29143 by Tialo and Loïc Estève;
sklearn.metrics.mean_absolute_percentage_error #29300 by Emily Chen;
sklearn.metrics.mean_squared_error #29142 by Yaroslav Korobko;
sklearn.metrics.mean_squared_log_error #29709 by Virgil Chan;
sklearn.metrics.root_mean_squared_error #29709 by Virgil Chan;
sklearn.metrics.root_mean_squared_log_error #29709 by Virgil Chan;
sklearn.metrics.pairwise.additive_chi2_kernel #29144 by Yaroslav Korobko;
sklearn.metrics.pairwise.chi2_kernel #29267 by Yaroslav Korobko;
sklearn.metrics.pairwise.cosine_distances #29265 by Emily Chen;
sklearn.metrics.pairwise.cosine_similarity #29014 by Edoardo Abati;
sklearn.metrics.pairwise.euclidean_distances #29433 by Omar Salman;
sklearn.metrics.pairwise.linear_kernel #29475 by Omar Salman;
sklearn.metrics.pairwise.paired_cosine_distances #29112 by Edoardo Abati;
sklearn.metrics.pairwise.paired_euclidean_distances #29389 by Emily Chen;
sklearn.metrics.pairwise.polynomial_kernel #29475 by Omar Salman;
sklearn.metrics.pairwise.sigmoid_kernel #29475 by Omar Salman.
Classes:
preprocessing.LabelEncoder now supports Array API compatible inputs. #27381 by Omar Salman.
model_selection.GridSearchCV, model_selection.RandomizedSearchCV, model_selection.HalvingGridSearchCV and model_selection.HalvingRandomSearchCV now support Array API compatible inputs when their base estimators do. #27096 by Tim Head and Olivier Grisel.
preprocessing.MinMaxScaler with clip=True. #29751 by Shreekant Nandiyawar.
Other
Support for the soon-to-be-deprecated cupy.array_api module has been removed in favor of directly supporting the top-level cupy module, possibly via the array_api_compat.cupy compatibility wrapper. #29639 by Olivier Grisel.
Metadata Routing
The following models now support metadata routing in one or more of their methods. Refer to the Metadata Routing User Guide for more details.
Feature: model_selection.learning_curve now supports metadata routing for the fit method of its estimator and for its underlying CV splitter and scorer. #28975 by Stefanie Senger.
Feature: ensemble.StackingClassifier and ensemble.StackingRegressor now support metadata routing and pass **fit_params to the underlying estimators via their fit methods. #28701 by Stefanie Senger.
Feature: compose.TransformedTargetRegressor now supports metadata routing in its fit and predict methods and routes the corresponding params to the underlying regressor. #29136 by Omar Salman.
Feature: feature_selection.SequentialFeatureSelector now supports metadata routing in its fit method and passes the corresponding params to the model_selection.cross_val_score function. #29260 by Omar Salman.
Feature: model_selection.validation_curve now supports metadata routing for the fit method of its estimator and for its underlying CV splitter and scorer. #29329 by Stefanie Senger.
Feature: semi_supervised.SelfTrainingClassifier now supports metadata routing. The fit method now accepts **fit_params, which are passed to the underlying estimators via their fit methods. In addition, the predict, predict_proba, predict_log_proba, score and decision_function methods also accept **params, which are passed to the underlying estimators via their respective methods. #28494 by Adam Li.
Feature: model_selection.permutation_test_score now supports metadata routing for the fit method of its estimator and for its underlying CV splitter and scorer. #29266 by Adam Li.
Feature: feature_selection.RFE and feature_selection.RFECV now support metadata routing. #29312 by Omar Salman.
Fix: Metadata is routed correctly to grouped CV splitters via linear_model.RidgeCV and linear_model.RidgeClassifierCV, and UnsetMetadataPassedError is fixed for linear_model.RidgeClassifierCV with default scoring. #29634 by Stefanie Senger.
Dropping support for building with setuptools
From scikit-learn 1.6 onwards, support for building with setuptools has been removed. Meson is the only supported way to build scikit-learn, see Building from source for more details.
Dropping official support for PyPy
Due to limited maintainer resources and a small number of users, official PyPy support has been dropped. Some parts of scikit-learn may still work, but PyPy is no longer tested in the scikit-learn Continuous Integration. #29128 by Loïc Estève.
Changelog
sklearn.base
Enhancement: Added the function base.is_clusterer, which determines whether a given estimator is a clusterer. #28936 by Christian Veenhuis.
sklearn.cluster
API Change: The copy parameter of cluster.Birch was deprecated in 1.6 and will be removed in 1.8. It has no effect as the estimator does not perform in-place operations on the input data. #29124 by Yao Xiao.
sklearn.compose
Enhancement: The verbose_feature_names_out parameter of sklearn.compose.ColumnTransformer now accepts a string format or a callable to generate feature names. #28934 by Marc Bresson.
sklearn.covariance
Efficiency: covariance.MinCovDet fitting is now slightly faster. #29835 by Antony Lee.
sklearn.cross_decomposition
Fix: cross_decomposition.PLSRegression properly raises an error when n_components is larger than n_samples. #29710 by Thomas Fan.
sklearn.datasets
Feature: datasets.fetch_file allows downloading arbitrary data files from the web. It handles local caching, integrity checks with SHA256 digests and automatic retries in case of HTTP errors. #29354 by Olivier Grisel.
sklearn.discriminant_analysis
Fix: discriminant_analysis.QuadraticDiscriminantAnalysis now raises a LinAlgWarning in case of collinear variables. These warnings can be silenced using the reg_param parameter. #19731 by Alihan Zihna.
sklearn.ensemble
Efficiency: Small runtime improvement of fitting ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor by parallelizing the initial search for bin thresholds. #28064 by Christian Lorentzen.
Enhancement: The verbosity of ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now offers more granular control. Now, verbose = 1 prints only summary messages, and verbose >= 2 prints the full information as before. #28179 by Christian Lorentzen.
Efficiency: ensemble.IsolationForest now runs parallel jobs during predict using joblib, offering a speedup of up to 2-4x on sample sizes larger than 2000. #28622 by Adam Li and Sérgio Pereira.
Feature: ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now support missing values in the data matrix X. Missing values are handled by randomly moving all of the samples to the left or right child node as the tree is traversed. #28268 by Adam Li.
API Change: The parameter algorithm of ensemble.AdaBoostClassifier is deprecated and will be removed in 1.8. #29997 by Jérémie du Boisberranger.
sklearn.impute
Fix: impute.KNNImputer excludes samples with nan distances when computing the mean value for uniform weights. #29135 by Xuefeng Xu.
sklearn.linear_model
Fix: linear_model.LogisticRegressionCV corrects sample weight handling for the calculation of test scores. #29419 by Shruti Nath.
Fix: linear_model.RidgeCV now properly supports custom multioutput scorers by letting the scorer manage the multioutput averaging. Previously, the predictions and true targets were both squeezed to a 1D array before computing the error. #29884 by Guillaume Lemaitre.
Fix: linear_model.RidgeCV now properly uses predictions on the same scale as the target seen during fit. These predictions are stored in cv_results_ when scoring != None. Previously, the predictions were rescaled by the square root of the sample weights and offset by the mean of the target, leading to an incorrect estimate of the score. #29842 by Guillaume Lemaitre, Jérôme Dockes and Hanmin Qin.
API Change: Deprecates copy_X in linear_model.TheilSenRegressor as the parameter has no effect. copy_X will be removed in 1.8. #29105 by Adam Li.
Fix: linear_model.LassoCV and linear_model.ElasticNetCV now take sample weights into account to define the search grid for the internally tuned alpha hyper-parameter. #29442 by John Hopfensperger and Shruti Nath.
sklearn.manifold
Efficiency: manifold.locally_linear_embedding and manifold.LocallyLinearEmbedding now allocate the memory of sparse matrices more efficiently in the Hessian, Modified and LTSA methods. #28096 by Giorgio Angelotti.
sklearn.metrics
Enhancement: sklearn.metrics.check_scoring now accepts raise_exc to specify whether to raise an exception if a subset of the scorers in multimetric scoring fails or to return an error code. #28992 by Stefanie Senger.
Enhancement: Adds zero_division to cohen_kappa_score. When there is a division by zero, the metric is undefined and this value is returned. #29210 by Marc Torrellas Socastro and Stefanie Senger.
Efficiency: sklearn.metrics.classification_report is now faster by caching classification labels. #29738 by Adrin Jalali.
API Change: scoring="neg_max_error" should be used instead of scoring="max_error", which is now deprecated. #29462 by Farid “Freddie” Taba.
API Change: The force_all_finite parameter of functions metrics.pairwise.check_pairwise_arrays and metrics.pairwise_distances is renamed to ensure_all_finite. force_all_finite will be removed in 1.8. #29404 by Jérémie du Boisberranger.
Fix: The functions metrics.mean_squared_log_error and metrics.root_mean_squared_log_error now check whether the inputs are within the correct domain for the function \(y=\log(1+x)\), rather than \(y=\log(x)\). #29709 by Virgil Chan.
Fix: The functions metrics.mean_absolute_error, metrics.mean_absolute_percentage_error, metrics.mean_squared_error and metrics.root_mean_squared_error now explicitly check whether a scalar will be returned when multioutput="uniform_average". #29709 by Virgil Chan.
sklearn.model_selection
Enhancement: Add the parameter prefit to model_selection.FixedThresholdClassifier, allowing the use of a pre-fitted estimator without re-fitting it. #29067 by Guillaume Lemaitre.
Fix: Improve the error message when model_selection.RepeatedStratifiedKFold.split is called without a y argument. #29402 by Anurag Varma.
sklearn.neighbors
Fix: neighbors.LocalOutlierFactor raises a warning in the fit method when duplicate values in the training data lead to inaccurate outlier detection. #28773 by Henrique Caroço.
sklearn.preprocessing
Enhancement: The HTML representation of preprocessing.FunctionTransformer will show the function name in the label. #29158 by Yao Xiao.
Fix: preprocessing.PowerTransformer now uses scipy.special.inv_boxcox to output nan if the input of the inverse Box-Cox transformation is invalid. #27875 by Xuefeng Xu.
sklearn.semi_supervised
API Change: semi_supervised.SelfTrainingClassifier deprecated the base_estimator parameter in favor of estimator. #28494 by Adam Li.
sklearn.tree
Feature: tree.ExtraTreeClassifier and tree.ExtraTreeRegressor now support missing values in the data matrix X. Missing values are handled by randomly moving all of the samples to the left or right child node as the tree is traversed. #27966 by Adam Li.
sklearn.utils
Enhancement: utils.validation.check_array now accepts ensure_non_negative to check for negative values in the passed array, which until now was only available through calling utils.validation.check_non_negative. #29540 by Tamara Atanasoska.
Fix: utils.estimator_checks.parametrize_with_checks and utils.estimator_checks.check_estimator now support estimators that have set_output called on them. #29869 by Adrin Jalali.
Enhancement: utils.validation.check_is_fitted now passes on stateless estimators. An estimator can indicate it’s stateless by setting the requires_fit tag to False. See Estimator Tags for more information. #29880 by Adrin Jalali.
API Change: The force_all_finite parameter of functions utils.check_array, utils.check_X_y and utils.as_float_array is renamed to ensure_all_finite. force_all_finite will be removed in 1.8. #29404 by Jérémie du Boisberranger.
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.5, including:
TODO: update at the time of the release.