Version 1.4.0
In Development
Legend for changelogs
Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Changed models
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Changes impacting all modules
Enhancement All estimators now recognize the column names from any dataframe that adopts the DataFrame Interchange Protocol. Dataframes that return a correct representation through np.asarray(df) are expected to work with our estimators and functions. #26464 by Thomas Fan.
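For illustration, a minimal sketch of the intended behavior, assuming polars is installed (any dataframe library implementing the DataFrame Interchange Protocol should behave similarly; the column names and data are made up):

    import polars as pl
    from sklearn.linear_model import Ridge

    df = pl.DataFrame({"age": [20.0, 35.0, 50.0], "income": [1.5, 2.5, 4.0]})
    y = [0.0, 1.0, 2.0]

    model = Ridge().fit(df, y)
    print(model.feature_names_in_)  # column names are picked up from the dataframe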
Metadata Routing
The following models now support metadata routing in one or more of their methods. Refer to the Metadata Routing User Guide for more details.
Feature pipeline.Pipeline now supports metadata routing according to the metadata routing user guide. #26789 by Adrin Jalali.
Feature cross_validate, cross_val_score, and cross_val_predict now support metadata routing. The metadata are routed to the estimator’s fit, the scorer, and the CV splitter’s split. The metadata is accepted via the new params parameter (see the sketch after this list). fit_params is deprecated and will be removed in version 1.6. The groups parameter is also not accepted as a separate argument when metadata routing is enabled and should be passed via the params parameter. #26896 by Adrin Jalali.
Feature GridSearchCV, RandomizedSearchCV, HalvingGridSearchCV, and HalvingRandomSearchCV now support metadata routing in their fit and score, and route metadata to the underlying estimator’s fit, the CV splitter, and the scorer. #27058 by Adrin Jalali.
Enhancement ColumnTransformer now supports metadata routing according to the metadata routing user guide. #27005 by Adrin Jalali.
Enhancement linear_model.LogisticRegressionCV now supports metadata routing. linear_model.LogisticRegressionCV.fit now accepts **params, which are passed to the underlying splitter and scorer. linear_model.LogisticRegressionCV.score now accepts **score_params, which are passed to the underlying scorer. #26525 by Omar Salman.
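As a minimal sketch of the routing mechanism described above (the data, request settings and scorer are made up for the example and are not part of the changelog): routing is opt-in, each consumer declares whether it wants a given piece of metadata, and metadata is passed through the new params argument:

    import numpy as np
    from sklearn import set_config
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, make_scorer
    from sklearn.model_selection import cross_validate
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    set_config(enable_metadata_routing=True)  # routing is opt-in

    rng = np.random.RandomState(0)
    X, y = rng.randn(60, 3), rng.randint(0, 2, 60)
    sample_weight = rng.rand(60)

    # Each consumer states whether it wants the metadata.
    pipe = Pipeline([
        ("scale", StandardScaler().set_fit_request(sample_weight=False)),
        ("clf", LogisticRegression().set_fit_request(sample_weight=True)),
    ])
    weighted_acc = make_scorer(accuracy_score).set_score_request(sample_weight=True)

    # fit_params is deprecated; metadata now goes through params and is routed
    # to the estimator's fit and to the scorer (and to the CV splitter's split
    # when the splitter requests it).
    cv_results = cross_validate(
        pipe, X, y, scoring=weighted_acc, params={"sample_weight": sample_weight}
    )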
Support for SciPy sparse arrays
Several estimators now support SciPy sparse arrays. The following functions and classes are impacted:
Functions:
decomposition.non_negative_factorization in #27100 by Isaac Virshup;
feature_selection.f_regression in #27239 by Yaroslav Korobko;
feature_selection.r_regression in #27239 by Yaroslav Korobko;
sklearn.utils.multiclass.type_of_target in #27274 by Yao Xiao.
Classes:
decomposition.NMF in #27100 by Isaac Virshup;
feature_extraction.text.TfidfTransformer in #27219 by Yao Xiao;
impute.SimpleImputer in #27277 by Yao Xiao;
impute.KNNImputer in #27277 by Yao Xiao;
kernel_approximation.PolynomialCountSketch in #27301 by Lohit SundaramahaLingam;
neural_network.BernoulliRBM in #27252 by Yao Xiao.
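For instance, decomposition.NMF now accepts the newer sparse array containers in addition to sparse matrices. A minimal sketch, assuming a SciPy version that provides csr_array (1.8 or later); the data is made up:

    from scipy.sparse import csr_array  # sparse *array*, as opposed to csr_matrix
    from sklearn.decomposition import NMF

    X = csr_array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.5], [4.0, 0.0, 1.0]])
    W = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500).fit_transform(X)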
Changelog
sklearn.base
Enhancement base.ClusterMixin.fit_predict and base.OutlierMixin.fit_predict now accept **kwargs, which are passed to the fit method of the estimator. #26506 by Adrin Jalali.
Enhancement base.TransformerMixin.fit_transform and base.OutlierMixin.fit_predict now raise a warning if transform / predict consume metadata but no custom fit_transform / fit_predict is defined in the inheriting class. #26831 by Adrin Jalali.
Enhancement base.clone now supports dict as input and creates a copy (see the sketch below). #26786 by Adrin Jalali.
API Change process_routing now has a different signature: the first two arguments (the object and the method) are positional-only, and all metadata are passed as keyword arguments. #26909 by Adrin Jalali.
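A small sketch of the new clone behavior on dicts (the dict contents are made up): the returned dict is a new object and the estimators inside are cloned as usual, i.e. unfitted copies with the same parameters:

    from sklearn.base import clone
    from sklearn.linear_model import LogisticRegression, Ridge

    estimators = {"clf": LogisticRegression(C=0.1), "reg": Ridge(alpha=2.0)}
    copies = clone(estimators)

    assert copies is not estimators
    assert copies["clf"].C == 0.1  # parameters are preserved on the clones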
sklearn.calibration
Enhancement The internal objective and gradient of the sigmoid method of calibration.CalibratedClassifierCV have been replaced by the private loss module. #27185 by Omar Salman.
sklearn.cluster
API Change The kdtree and balltree values of the algorithm parameter of cluster.HDBSCAN are deprecated and renamed to kd_tree and ball_tree respectively, ensuring a consistent naming convention. The kdtree and balltree values will be removed in 1.6. #26744 by Shreesha Kumar Bhat.
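Concretely, a minimal sketch with synthetic data showing the new spelling:

    import numpy as np
    from sklearn.cluster import HDBSCAN

    X = np.random.RandomState(0).rand(50, 2)
    # "kd_tree" replaces the deprecated "kdtree" (likewise "ball_tree" for "balltree").
    labels = HDBSCAN(min_cluster_size=5, algorithm="kd_tree").fit_predict(X)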
sklearn.decomposition
Enhancement An “auto” option was added to the n_components parameter of decomposition.non_negative_factorization, decomposition.NMF and decomposition.MiniBatchNMF to automatically infer the number of components from the shapes of W or H when using a custom initialization (see the sketch after this list). The default value of this parameter will change from None to “auto” in version 1.6. #26634 by Alexandre Landeau and Alexandre Vigny.
Enhancement decomposition.PCA now supports the Array API for the full and randomized solvers (with QR power iterations). See Array API support (experimental) for more details. #26315 and #27098 by Mateusz Sokół, Olivier Grisel and Edoardo Abati.
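A minimal sketch of the “auto” option with a custom initialization (shapes and values are made up; the number of components is inferred from W_init / H_init):

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.RandomState(0)
    X = np.abs(rng.randn(20, 10))
    W_init = np.abs(rng.randn(20, 4))
    H_init = np.abs(rng.randn(4, 10))

    # n_components="auto" infers 4 components from the custom initialization.
    nmf = NMF(n_components="auto", init="custom", max_iter=1000, random_state=0)
    W = nmf.fit_transform(X, W=W_init, H=H_init)
    assert nmf.n_components_ == 4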
sklearn.ensemble
Major Feature ensemble.RandomForestClassifier and ensemble.RandomForestRegressor support missing values when the criterion is gini, entropy, or log_loss for classification, or squared_error, friedman_mse, or poisson for regression (see the sketch after this list). #26391 by Thomas Fan.
Feature ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. Missing values in the train data and multi-output targets are not supported. #13649 by Samuel Ronsin, initiated by Patrick O’Reilly.
Efficiency ensemble.GradientBoostingClassifier is faster, for binary and in particular for multiclass problems, thanks to the private loss function module. #26278 by Christian Lorentzen.
Efficiency Improves runtime and memory usage for ensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor when trained on sparse data. #26957 by Thomas Fan.
API Change In AdaBoostClassifier, the algorithm argument SAMME.R was deprecated and will be removed in 1.6. #26830 by Stefanie Senger.
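For the missing-value support, a minimal sketch with toy data (values are made up); both fitting and prediction accept NaN with the default criterion:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan],
                  [5.0, 6.0], [6.0, 7.0], [7.0, 8.0]])
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
    print(clf.predict([[np.nan, 2.5], [6.5, np.nan]]))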
sklearn.linear_model
Enhancement Solver "newton-cg" in LogisticRegression and LogisticRegressionCV uses a little less memory. The effect is proportional to the number of coefficients (n_features * n_classes). #27417 by Christian Lorentzen.
sklearn.metrics
Efficiency Computing pairwise distances via metrics.DistanceMetric for CSR × CSR, Dense × CSR, and CSR × Dense datasets is now 1.5x faster. #26765 by Meekail Zain.
Efficiency Computing distances via metrics.DistanceMetric for CSR × CSR, Dense × CSR, and CSR × Dense now uses ~50% less memory, and outputs distances in the same dtype as the provided data. #27006 by Meekail Zain.
Enhancement Improve the rendering of the plot obtained with the metrics.PrecisionRecallDisplay and metrics.RocCurveDisplay classes: the x- and y-axis limits are set to [0, 1] and the aspect ratio between both axes is set to 1 to get a square plot. #26366 by Mojdeh Rastgoo.
Enhancement Added neg_root_mean_squared_log_error_scorer as a scorer. #26734 by Alejandro Martin Gil.
Enhancement sklearn.metrics.accuracy_score and sklearn.metrics.zero_one_loss now support Array API compatible inputs. #27137 by Edoardo Abati.
API Change The squared parameter of metrics.mean_squared_error and metrics.mean_squared_log_error is deprecated and will be removed in 1.6. Use the new functions metrics.root_mean_squared_error and metrics.root_mean_squared_log_error instead (see the sketch below). #26734 by Alejandro Martin Gil.
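A migration sketch for the squared deprecation (the values are made up):

    from sklearn.metrics import mean_squared_error, root_mean_squared_error

    y_true, y_pred = [3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]

    # Before (deprecated in 1.4, removal planned for 1.6):
    rmse_old = mean_squared_error(y_true, y_pred, squared=False)
    # After:
    rmse_new = root_mean_squared_error(y_true, y_pred)
    print(rmse_old, rmse_new)  # both compute the root mean squared error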
sklearn.model_selection
Enhancement sklearn.model_selection.train_test_split now supports Array API compatible inputs. #26855 by Tim Head.
Fix model_selection.GridSearchCV, model_selection.RandomizedSearchCV, and model_selection.HalvingGridSearchCV now don’t change the given object in the parameter grid if it’s an estimator. #26786 by Adrin Jalali.
sklearn.neighbors
Efficiency sklearn.neighbors.KNeighborsRegressor.predict and sklearn.neighbors.KNeighborsClassifier.predict_proba now efficiently support pairs of dense and sparse datasets. #27018 by Julien Jerphanion.
API Change neighbors.KNeighborsRegressor now accepts metrics.DistanceMetric objects directly via the metric keyword argument, allowing for the use of accelerated third-party metrics.DistanceMetric objects (see the sketch below). #26267 by Meekail Zain.
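A minimal sketch of passing a DistanceMetric object (toy data; the built-in "manhattan" metric stands in for an accelerated third-party implementation):

    import numpy as np
    from sklearn.metrics import DistanceMetric
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.RandomState(0)
    X, y = rng.rand(30, 2), rng.rand(30)

    dist = DistanceMetric.get_metric("manhattan")
    knn = KNeighborsRegressor(n_neighbors=3, metric=dist).fit(X, y)
    print(knn.predict(X[:2]))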
sklearn.preprocessing
Major Feature preprocessing.MinMaxScaler and preprocessing.MaxAbsScaler now support the Array API. Array API support is considered experimental and might evolve without being subject to our usual rolling deprecation cycle policy. See Array API support (experimental) for more details. #26243 by Tim Head and #27110 by Edoardo Abati.
Efficiency preprocessing.OrdinalEncoder avoids calculating missing indices twice to improve efficiency. #27017 by Xuefeng Xu.
Enhancement Improves warnings in preprocessing.FunctionTransformer when func returns a pandas dataframe and the output is configured to be pandas. #26944 by Thomas Fan.
Enhancement preprocessing.TargetEncoder now supports target_type ‘multiclass’ (see the sketch below). #26674 by Lucy Liu.
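A sketch of the multiclass support in TargetEncoder (categories and targets are made up); the encoder produces one encoded column per class for each categorical feature:

    import numpy as np
    from sklearn.preprocessing import TargetEncoder

    rng = np.random.RandomState(0)
    X = rng.choice(["cat", "dog", "bird"], size=(60, 1))
    y = rng.randint(0, 3, size=60)

    enc = TargetEncoder(target_type="multiclass", random_state=0)
    X_enc = enc.fit_transform(X, y)
    print(X_enc.shape)  # (60, 3): one column per class for the single feature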
sklearn.tree
Feature tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. Missing values in the train data and multi-output targets are not supported. #13649 by Samuel Ronsin, initiated by Patrick O’Reilly.
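A minimal sketch of monotonic constraints on a single tree (synthetic data; in monotonic_cst, 1 means an increasing effect, -1 decreasing, 0 unconstrained):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(200, 2)
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(200)

    # Predictions are constrained to be non-decreasing in feature 0
    # and non-increasing in feature 1.
    tree = DecisionTreeRegressor(monotonic_cst=[1, -1], random_state=0).fit(X, y)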
sklearn.utils
Enhancement sklearn.utils.estimator_html_repr dynamically adapts diagram colors based on the browser’s prefers-color-scheme, providing improved adaptability to dark mode environments. #26862 by Andrew Goh Yisheng, Thomas Fan, Adrin Jalali.
Enhancement MetadataRequest and MetadataRouter now have a consumes method which can be used to check whether a given set of parameters would be consumed. #26831 by Adrin Jalali.
Fix sklearn.utils.check_array now accepts both matrix and array objects from the sparse SciPy module. The previous implementation failed when copy=True because it called np.may_share_memory, which does not work with SciPy sparse arrays and does not return the correct result for SciPy sparse matrices (see the sketch below). #27336 by Guillaume Lemaitre.
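A sketch of the fixed behavior, assuming a SciPy version that provides csr_array:

    import numpy as np
    from scipy.sparse import csr_array, csr_matrix
    from sklearn.utils import check_array

    X_arr = csr_array(np.eye(3))
    X_mat = csr_matrix(np.eye(3))

    # Both sparse containers are accepted; copy=True previously failed for sparse arrays.
    checked_arr = check_array(X_arr, accept_sparse="csr", copy=True)
    checked_mat = check_array(X_mat, accept_sparse="csr", copy=True)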
Code and Documentation Contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.3, including:
TODO: update at the time of the release.