Version 1.0.2
In Development
Fix cluster.Birch, feature_selection.RFECV, ensemble.RandomForestRegressor, ensemble.RandomForestClassifier, ensemble.GradientBoostingRegressor, and ensemble.GradientBoostingClassifier no longer raise a warning when fitted on a pandas DataFrame. #21578 by Thomas Fan.
Changelog
sklearn.cluster
Fix Fixed an infinite loop in cluster.SpectralClustering by moving an iteration counter from try to except. #21271 by Tyler Martin.
sklearn.datasets
Fix datasets.fetch_openml is now thread-safe. Data is first downloaded to a temporary subfolder and then renamed. #21833 by Siavash Rezazadeh.
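The download-then-rename pattern can be sketched with the standard library as follows. This is an illustration of the general technique and is not the code used by fetch_openml. The helper name atomic_write_bytes is hypothetical. Because the file only appears under its final name once it is complete, a concurrent reader never sees a half-written file.

```python
import os
import shutil
import tempfile


def atomic_write_bytes(final_path, payload):
    """Write payload to final_path so readers never observe a partial file."""
    target_dir = os.path.dirname(os.path.abspath(final_path))
    os.makedirs(target_dir, exist_ok=True)
    # Create the temporary subfolder next to the target so that the final
    # rename stays on a single filesystem.
    tmp_dir = tempfile.mkdtemp(dir=target_dir)
    try:
        tmp_path = os.path.join(tmp_dir, os.path.basename(final_path))
        with open(tmp_path, "wb") as f:
            f.write(payload)
        # os.replace creates or overwrites the destination (atomic on POSIX).
        os.replace(tmp_path, final_path)
    finally:
        # Remove the temporary subfolder, including leftovers after an error.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```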
sklearn.decomposition
Fix Fixed the constraint on the objective function of decomposition.DictionaryLearning, decomposition.MiniBatchDictionaryLearning, decomposition.SparsePCA and decomposition.MiniBatchSparsePCA to be convex and match the referenced article. #19210 by Jérémie du Boisberranger.
sklearn.ensemble
Fix ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor, and ensemble.RandomTreesEmbedding now raise a ValueError when bootstrap=False and max_samples is not None. #21295 by Haoyin Xu.
Fix Fixed a bug in ensemble.GradientBoostingClassifier where the exponential loss was computing the positive gradient instead of the negative one. #22050 by Guillaume Lemaitre.
sklearn.feature_selection
Fix Fixed feature_selection.SelectFromModel by improving support for base estimators that do not set feature_names_in_. #21991 by Thomas Fan.
sklearn.linear_model
Fix Fixed a bug in linear_model.RidgeClassifierCV where the predict method was performing an argmax on the scores obtained from decision_function instead of returning the multilabel indicator matrix. #19869 by Guillaume Lemaitre.
Fix linear_model.LassoLarsIC now correctly computes AIC and BIC. An error is now raised when n_features > n_samples and the noise variance is not provided, because the noise variance cannot be estimated in that case. #21481 by Guillaume Lemaitre and Andrés Babino.
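The following sketch shows how such criteria are computed. It assumes the Gaussian-likelihood formulation with a known noise variance σ², where AIC = n log(2πσ²) + RSS/σ² + 2d and BIC = n log(2πσ²) + RSS/σ² + d log(n), and d is the number of degrees of freedom. The helper names are hypothetical. The second helper shows why an error is needed when n_features > n_samples: an ordinary-least-squares estimate of σ² needs positive residual degrees of freedom.

```python
import math


def information_criteria(residuals, noise_variance, degrees_of_freedom):
    """Return (AIC, BIC) for a Gaussian linear model with known noise variance."""
    n_samples = len(residuals)
    rss = sum(r * r for r in residuals)
    # -2 * log-likelihood of the residuals under N(0, noise_variance)
    neg2_loglik = n_samples * math.log(2 * math.pi * noise_variance) + rss / noise_variance
    aic = neg2_loglik + 2 * degrees_of_freedom
    bic = neg2_loglik + math.log(n_samples) * degrees_of_freedom
    return aic, bic


def ols_noise_variance(residuals, n_features, fit_intercept=True):
    """Unbiased noise-variance estimate from the residuals of an OLS fit."""
    dof = len(residuals) - n_features - int(fit_intercept)
    if dof <= 0:
        # Not enough samples to estimate the noise variance: the caller
        # has to provide it explicitly.
        raise ValueError("n_samples must exceed n_features + fit_intercept")
    return sum(r * r for r in residuals) / dof
```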
sklearn.manifold
Fix Fixed an unnecessary error when fitting manifold.Isomap with a precomputed dense distance matrix where the neighbors graph has multiple disconnected components. #21915 by Tom Dupre la Tour.
sklearn.metrics
Fix All sklearn.metrics.DistanceMetric subclasses now correctly support read-only buffer attributes. This fixes a regression introduced in 1.0.0 with respect to 0.24.2. #21694 by Julien Jerphanion.
Fix sklearn.metrics.MinkowskiDistance now accepts a weight parameter that makes it possible to write code that behaves consistently with both scipy 1.8 and earlier versions. In turn, this means that all neighbors-based estimators (except those that use algorithm="kd_tree") now accept a weight parameter with metric="minkowski" to yield results that are always consistent with scipy.spatial.distance.cdist. #21741 by Olivier Grisel.
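The two weighting conventions differ in where the weight enters the formula. The sketch below assumes the SciPy 1.8 definition (sum of w_i |u_i - v_i|^p)^(1/p) and the older weighted-Minkowski definition (sum of |w_i (u_i - v_i)|^p)^(1/p). It shows that passing w_i ** p to the former reproduces the latter. The function names are hypothetical and are not scikit-learn or SciPy APIs.

```python
def minkowski_scipy18(u, v, p, w):
    """Weighted Minkowski distance with weights applied to |u_i - v_i| ** p."""
    return sum(wi * abs(ui - vi) ** p for ui, vi, wi in zip(u, v, w)) ** (1.0 / p)


def wminkowski_legacy(u, v, p, w):
    """Legacy weighted-Minkowski convention: weights multiply the differences."""
    return sum(abs(wi * (ui - vi)) ** p for ui, vi, wi in zip(u, v, w)) ** (1.0 / p)
```

For example, with p=2, legacy weights (2.0, 0.5) give the same distance as SciPy-1.8-style weights (4.0, 0.25).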
sklearn.multiclass
Fix multiclass.OneVsRestClassifier.predict_proba no longer raises an error when fitted on constant integer targets. #21871 by Thomas Fan.
sklearn.neighbors
Fix neighbors.KDTree and neighbors.BallTree now correctly support read-only buffer attributes. #21845 by Thomas Fan.
sklearn.preprocessing
Fix Fixed a compatibility bug with NumPy 1.22 in preprocessing.OneHotEncoder. #21517 by Thomas Fan.
sklearn.tree
Fix Prevented tree.plot_tree from drawing outside the boundary of the figure. #21917 by Thomas Fan.
Fix Added support for loading pickles of decision tree models when the pickle was generated on a platform with a different bitness. A typical example is to train and pickle the model on a 64-bit machine and load it on a 32-bit machine for prediction. #21552 by Loïc Estève.
sklearn.utils
Fix utils.estimator_html_repr now escapes all the estimator descriptions in the generated HTML. #21493 by Aurélien Geron.
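Escaping matters because estimator descriptions can contain characters such as < and >, and a browser would otherwise interpret them as markup. The following minimal sketch uses the standard library's html.escape. The wrapper markup and the "sk-label" class name are made up for illustration and are not scikit-learn's actual template.

```python
import html


def render_estimator_label(description):
    """Escape an estimator description before embedding it in HTML."""
    # html.escape converts &, <, > and quotes into HTML entities.
    return '<div class="sk-label">' + html.escape(description) + "</div>"
```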
Version 1.0.1
October 2021
Changelog
Fixed models
Fix Non-fit methods in the following classes no longer raise a UserWarning when fitted on DataFrames with valid feature names: covariance.EllipticEnvelope, ensemble.IsolationForest, ensemble.AdaBoostClassifier, neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor. #21199 by Thomas Fan.
sklearn.calibration
Fix Fixed calibration.CalibratedClassifierCV to take sample_weight into account when computing the base estimator prediction when ensemble=False. #20638 by Julien Bohné.
Fix Fixed a bug in calibration.CalibratedClassifierCV with method="sigmoid" that was ignoring the sample_weight when computing the Bayesian priors. #21179 by Guillaume Lemaitre.
sklearn.cluster
Fix Fixed a bug in cluster.KMeans, ensuring reproducibility and equivalence between sparse and dense input. #21195 by Jérémie du Boisberranger.
sklearn.ensemble
Fix Fixed a bug that could produce a segfault in rare cases for ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. #21130 by Christian Lorentzen.
sklearn.gaussian_process
Fix Compute y_std properly with multi-target in sklearn.gaussian_process.GaussianProcessRegressor, allowing proper normalization in the multi-target setting. #20761 by Patrick de C. T. R. Ferreira.
sklearn.feature_extraction
Efficiency Fixed an efficiency regression introduced in version 1.0.0 in the transform method of feature_extraction.text.CountVectorizer, which no longer checks for uppercase characters in the provided vocabulary. #21251 by Jérémie du Boisberranger.
Fix Fixed a bug in feature_extraction.text.CountVectorizer and feature_extraction.text.TfidfVectorizer by raising an error when min_df or max_df are floating-point numbers greater than 1. #20752 by Alek Lefebvre.
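A float min_df or max_df is read as a proportion of documents, while an int is an absolute document count. A float above 1 therefore has no sensible meaning. The following pure-Python sketch shows the validation under that assumption. The helper name document_count_threshold is hypothetical and is not scikit-learn's code.

```python
import numbers


def document_count_threshold(value, n_documents, name):
    """Translate a min_df/max_df-style setting into an absolute document count."""
    if isinstance(value, numbers.Integral):
        # Integers are absolute document counts.
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        return value
    if not 0.0 <= value <= 1.0:
        # Floats are proportions of documents, so values above 1 are rejected.
        raise ValueError(f"{name} as a float must be in [0.0, 1.0], got {value}")
    return value * n_documents
```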
sklearn.linear_model
Fix Improved stability of linear_model.LassoLars for different versions of OpenBLAS. #21340 by Thomas Fan.
Fix linear_model.LogisticRegression now raises a better error message when the solver does not support sparse matrices with int64 indices. #21093 by Tom Dupre la Tour.
sklearn.neighbors
Fix neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, and neighbors.RadiusNeighborsRegressor with metric="precomputed" raise an error for bsr and dok sparse matrices in the fit, kneighbors and radius_neighbors methods, due to the handling of explicit zeros in the bsr and dok sparse graph formats. #21199 by Thomas Fan.
sklearn.pipeline
Fix pipeline.Pipeline.get_feature_names_out correctly passes feature names out from one step of a pipeline to the next. #21351 by Thomas Fan.
sklearn.svm
Fix svm.SVC and svm.SVR now check for an inconsistency in their internal representation and raise an error instead of segfaulting. This fix also resolves CVE-2020-28975. #21336 by Thomas Fan.
sklearn.utils
Enhancement utils.validation._check_sample_weight can perform a non-negativity check on the sample weights. It can be turned on using the only_non_negative bool parameter. The following estimators, which check for non-negative weights, were updated: linear_model.LinearRegression (where the previous error message was misleading), ensemble.AdaBoostClassifier, ensemble.AdaBoostRegressor, and neighbors.KernelDensity. #20880 by Guillaume Lemaitre and András Simon.
Fix Fixed a bug in if_delegate_has_method where the underlying check for an attribute did not work with NumPy arrays. #21145 by Zahlii.
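The following pure-Python sketch illustrates the behaviour described for only_non_negative. The real helper works on NumPy arrays and its exact messages and signature may differ. The function below is a hypothetical stand-in: no weights means every sample counts once, a scalar is broadcast, and negatives are rejected only when the flag is set.

```python
def check_sample_weight(sample_weight, n_samples, only_non_negative=False):
    """Return per-sample weights as floats, optionally rejecting negatives."""
    if sample_weight is None:
        # No weights given: every sample counts once.
        return [1.0] * n_samples
    if isinstance(sample_weight, (int, float)):
        # A scalar weight is broadcast to all samples.
        weights = [float(sample_weight)] * n_samples
    else:
        weights = [float(w) for w in sample_weight]
    if len(weights) != n_samples:
        raise ValueError(f"sample_weight has {len(weights)} entries, expected {n_samples}")
    if only_non_negative and any(w < 0 for w in weights):
        raise ValueError("sample_weight must be non-negative")
    return weights
```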
Miscellaneous
Fix Fitting an estimator on a dataset that has no feature names, after it was previously fitted on a dataset with feature names, no longer keeps the old feature names stored in the feature_names_in_ attribute. #21389 by Jérémie du Boisberranger.
Version 1.0.0
September 2021
For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 1.0.
Legend for changelogs
Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Minimal dependencies
Version 1.0.0 of scikit-learn requires Python 3.7+, NumPy 1.14.6+ and SciPy 1.1.0+. Optional minimal dependency is matplotlib 2.2.2+.
Enforcing keyword-only arguments
In an effort to promote clear and non-ambiguous use of the library, most constructor and function parameters must now be passed as keyword arguments (i.e. using the param=value syntax) instead of positional arguments. If a keyword-only parameter is passed positionally, a TypeError is now raised.
#15005 #20002 by Joel Nothman, Adrin Jalali, Thomas Fan, Nicolas Hug, and Tom Dupre la Tour. See SLEP009 for more details.
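In Python, keyword-only parameters are declared after a bare * in a signature, and passing them positionally raises a TypeError. The minimal sketch below illustrates the mechanism. The function make_estimator and its parameters are hypothetical and are not a scikit-learn API.

```python
def make_estimator(n_clusters, *, init="k-means++", max_iter=300):
    """Hypothetical factory: parameters after the bare * are keyword-only."""
    return {"n_clusters": n_clusters, "init": init, "max_iter": max_iter}
```

make_estimator(3, max_iter=10) works, whereas make_estimator(3, "random") raises a TypeError because init can only be passed by name.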
Changed models
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Fix manifold.TSNE now avoids numerical underflow issues during affinity matrix computation.
Fix manifold.Isomap now connects disconnected components of the neighbors graph along some minimum distance pairs, instead of changing every infinite distance to zero.
Fix The splitting criterion of tree.DecisionTreeClassifier and tree.DecisionTreeRegressor can be impacted by a fix in the handling of rounding errors. Previously some extra spurious splits could occur.
Fix model_selection.train_test_split with a stratify parameter and model_selection.StratifiedShuffleSplit may lead to slightly different results.
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
Changelog
API Change The option for using the squared error via the loss and criterion parameters was made more consistent. The preferred way is to set the value to "squared_error". Old option names are still valid and produce the same models, but are deprecated and will be removed in version 1.2. #19310 by Christian Lorentzen.
For ensemble.ExtraTreesRegressor, criterion="mse" is deprecated; use "squared_error" instead, which is now the default.
For ensemble.GradientBoostingRegressor, loss="ls" is deprecated; use "squared_error" instead, which is now the default.
For ensemble.RandomForestRegressor, criterion="mse" is deprecated; use "squared_error" instead, which is now the default.
For ensemble.HistGradientBoostingRegressor, loss="least_squares" is deprecated; use "squared_error" instead, which is now the default.
For linear_model.RANSACRegressor, loss="squared_loss" is deprecated; use "squared_error" instead.
For linear_model.SGDRegressor, loss="squared_loss" is deprecated; use "squared_error" instead, which is now the default.
For tree.DecisionTreeRegressor, criterion="mse" is deprecated; use "squared_error" instead, which is now the default.
For tree.ExtraTreeRegressor, criterion="mse" is deprecated; use "squared_error" instead, which is now the default.
API Change The option for using the absolute error via the loss and criterion parameters was made more consistent. The preferred way is to set the value to "absolute_error". Old option names are still valid and produce the same models, but are deprecated and will be removed in version 1.2. #19733 by Christian Lorentzen.
For ensemble.ExtraTreesRegressor, criterion="mae" is deprecated; use "absolute_error" instead.
For ensemble.GradientBoostingRegressor, loss="lad" is deprecated; use "absolute_error" instead.
For ensemble.RandomForestRegressor, criterion="mae" is deprecated; use "absolute_error" instead.
For ensemble.HistGradientBoostingRegressor, loss="least_absolute_deviation" is deprecated; use "absolute_error" instead.
For linear_model.RANSACRegressor, loss="absolute_loss" is deprecated; use "absolute_error" instead, which is now the default.
For tree.DecisionTreeRegressor, criterion="mae" is deprecated; use "absolute_error" instead.
For tree.ExtraTreeRegressor, criterion="mae" is deprecated; use "absolute_error" instead.
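When migrating a codebase, it can help to translate the deprecated spellings mechanically. The sketch below transcribes the renames listed above into a lookup table. The helper normalize_loss_name is hypothetical, and the use of FutureWarning is an assumption made for the illustration. Each alias is only meaningful for the estimators it is listed with (for example "ls" for GradientBoostingRegressor).

```python
import warnings

# Deprecated spellings from the entries above, mapped to their 1.0 names.
DEPRECATED_ALIASES = {
    "mse": "squared_error",
    "ls": "squared_error",
    "least_squares": "squared_error",
    "squared_loss": "squared_error",
    "mae": "absolute_error",
    "lad": "absolute_error",
    "least_absolute_deviation": "absolute_error",
    "absolute_loss": "absolute_error",
}


def normalize_loss_name(name):
    """Return the preferred loss/criterion name, warning on deprecated aliases."""
    if name in DEPRECATED_ALIASES:
        new_name = DEPRECATED_ALIASES[name]
        warnings.warn(f"{name!r} is deprecated, use {new_name!r} instead.", FutureWarning)
        return new_name
    return name
```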
API Change np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. #20165 by Thomas Fan.
API Change get_feature_names_out has been added to the transformer API to get the names of the output features. get_feature_names has in turn been deprecated. #18444 by Thomas Fan.
API Change All estimators store feature_names_in_ when fitted on pandas DataFrames. These feature names are compared to the names seen in non-fit methods, e.g. transform, and a FutureWarning is raised if they are not consistent. These FutureWarnings will become ValueErrors in 1.2. #18010 by Thomas Fan.
sklearn.base
Fix config_context is now thread-safe. #18736 by Thomas Fan.
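One way to make a configuration context manager thread-safe is to keep the settings in per-thread storage, so that entering the context in one thread does not change what other threads see. The sketch below assumes threading.local for that storage. The names mirror scikit-learn's public get_config and config_context, and assume_finite is a real scikit-learn setting, but the implementation is an illustration rather than scikit-learn's code.

```python
import threading
from contextlib import contextmanager

_DEFAULTS = {"assume_finite": False}
_local = threading.local()


def _get_thread_config():
    # Each thread lazily gets its own copy of the defaults.
    if not hasattr(_local, "config"):
        _local.config = dict(_DEFAULTS)
    return _local.config


def get_config():
    return dict(_get_thread_config())


@contextmanager
def config_context(**new_config):
    config = _get_thread_config()
    old_config = dict(config)
    config.update(new_config)
    try:
        yield
    finally:
        # Restore this thread's previous settings, even if an error occurred.
        config.clear()
        config.update(old_config)
```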
sklearn.calibration
Feature calibration.CalibrationDisplay added to plot calibration curves. #17443 by Lucy Liu.
Fix The predict and predict_proba methods of calibration.CalibratedClassifierCV can now properly be used on prefitted pipelines. #19641 by Alek Lefebvre.
Fix Fixed an error when using an ensemble.VotingClassifier as base_estimator in calibration.CalibratedClassifierCV. #20087 by Clément Fauchereau.
sklearn.cluster
Efficiency The "k-means++" initialization of cluster.KMeans and cluster.MiniBatchKMeans is now faster, especially in multicore settings. #19002 by Jon Crall and Jérémie du Boisberranger.
Efficiency cluster.KMeans with algorithm='elkan' is now faster in multicore settings. #19052 by Yusuke Nagasaka.
Efficiency cluster.MiniBatchKMeans is now faster in multicore settings. #17622 by Jérémie du Boisberranger.
Efficiency cluster.OPTICS can now cache the output of the computation of the tree, using the memory parameter. #19024 by Frankie Robertson.
Enhancement The predict and fit_predict methods of cluster.AffinityPropagation now accept sparse data type for input data. #20117 by Venkatachalam Natchiappan.
Fix Fixed a bug in cluster.MiniBatchKMeans where the sample weights were partially ignored when the input is sparse. #17622 by Jérémie du Boisberranger.
Fix Improved convergence detection based on center change in cluster.MiniBatchKMeans; previously this convergence criterion was almost never reached. #17622 by Jérémie du Boisberranger.
Fix cluster.AgglomerativeClustering now supports readonly memory-mapped datasets. #19883 by Julien Jerphanion.
Fix cluster.AgglomerativeClustering correctly connects components when connectivity and affinity are both precomputed and the number of connected components is greater than 1. #20597 by Thomas Fan.
Fix cluster.FeatureAgglomeration no longer accepts a **params kwarg in the fit function, resulting in a more concise error message. #20899 by Adam Li.
Fix Fixed a bug in cluster.KMeans, ensuring reproducibility and equivalence between sparse and dense input. #20200 by Jérémie du Boisberranger.
API Change cluster.Birch attributes, fit_ and partial_fit_, are deprecated and will be removed in 1.2. #19297 by Thomas Fan.
API Change The default value for the batch_size parameter of cluster.MiniBatchKMeans was changed from 100 to 1024 for efficiency reasons. The n_iter_ attribute of cluster.MiniBatchKMeans now reports the number of started epochs and the n_steps_ attribute reports the number of mini batches processed. #17622 by Jérémie du Boisberranger.
API Change cluster.spectral_clustering raises an improved error when passed a np.matrix. #20560 by Thomas Fan.
sklearn.compose
Enhancement compose.ColumnTransformer now records the output of each transformer in output_indices_. #18393 by Luca Bittarello.
Enhancement compose.ColumnTransformer now allows DataFrame input to have its columns appear in a changed order in transform. Further, columns that are dropped will not be required in transform, and additional columns will be ignored if remainder='drop'. #19263 by Thomas Fan.
Enhancement Added a **predict_params keyword argument to compose.TransformedTargetRegressor.predict that passes keyword arguments to the regressor. #19244 by Ricardo.
Fix compose.ColumnTransformer.get_feature_names supports non-string feature names returned by any of its transformers. However, note that get_feature_names is deprecated; use get_feature_names_out instead. #18459 by Albert Villanova del Moral and Alonso Silva Allende.
Fix compose.TransformedTargetRegressor now takes nD targets with an adequate transformer. #18898 by Oras Phongpanagnam.
API Change Added verbose_feature_names_out to compose.ColumnTransformer. This flag controls the prefixing of feature names out in get_feature_names_out. #18444 and #21080 by Thomas Fan.
sklearn.covariance
Fix Added array checks to covariance.ledoit_wolf and covariance.ledoit_wolf_shrinkage. #20416 by Hugo Defois.
API Change Deprecated the following keys in the cv_results_ attribute of covariance.GraphicalLassoCV: 'mean_score', 'std_score', and 'split(k)_score', in favor of 'mean_test_score', 'std_test_score', and 'split(k)_test_score'. #20583 by Thomas Fan.
sklearn.datasets
Enhancement datasets.fetch_openml now supports categories with missing values when returning a pandas DataFrame. #19365 by Thomas Fan and Amanda Dsouza and EL-ATEIF Sara.
Enhancement datasets.fetch_kddcup99 raises a better message when the cached file is invalid. #19669 by Thomas Fan.
Enhancement Replaced usages of __file__ related to resource file I/O with importlib.resources to avoid the assumption that these resource files (e.g. iris.csv) already exist on a filesystem, and by extension to enable compatibility with tools such as PyOxidizer. #20297 by Jack Liu.
Fix Shortened data file names in the openml tests to better support installing on Windows and its default 260-character limit on file names. #20209 by Thomas Fan.
Fix datasets.fetch_kddcup99 returns DataFrames when return_X_y=True and as_frame=True. #19011 by Thomas Fan.
API Change datasets.load_boston is deprecated in 1.0 and will be removed in 1.2. Alternative code snippets to load similar datasets are provided. Please refer to the docstring of the function for details. #20729 by Guillaume Lemaitre.
sklearn.decomposition
Enhancement Added a new approximate solver (randomized SVD, available with eigen_solver='randomized') to decomposition.KernelPCA. This significantly accelerates computation when the number of samples is much larger than the desired number of components. #12069 by Sylvain Marié.
Fix Fixed incorrect multiple data-conversion warnings when clustering boolean data. #19046 by Surya Prakash.
Fix Fixed dict_learning, used by decomposition.DictionaryLearning, to ensure determinism of the output. This is achieved by flipping the signs of the SVD output, which is used to initialize the code. #18433 by Bruno Charron.
Fix Fixed a bug in decomposition.MiniBatchDictionaryLearning, decomposition.MiniBatchSparsePCA and decomposition.dict_learning_online where the update of the dictionary was incorrect. #19198 by Jérémie du Boisberranger.
Fix Fixed a bug in decomposition.DictionaryLearning, decomposition.SparsePCA, decomposition.MiniBatchDictionaryLearning, decomposition.MiniBatchSparsePCA, decomposition.dict_learning and decomposition.dict_learning_online where the restart of unused atoms during the dictionary update was not working as expected. #19198 by Jérémie du Boisberranger.
API Change In decomposition.DictionaryLearning, decomposition.MiniBatchDictionaryLearning, decomposition.dict_learning and decomposition.dict_learning_online, transform_alpha will be equal to alpha instead of 1.0 by default starting from version 1.2. #19159 by Benoît Malézieux.
API Change Renamed attributes in KernelPCA to improve readability: lambdas_ and alphas_ are renamed to eigenvalues_ and eigenvectors_, respectively. lambdas_ and alphas_ are deprecated and will be removed in 1.2. #19908 by Kei Ishikawa.
API Change The alpha and regularization parameters of decomposition.NMF and decomposition.non_negative_factorization are deprecated and will be removed in 1.2. Use the new parameters alpha_W and alpha_H instead. #20512 by Jérémie du Boisberranger.
sklearn.dummy
API Change The attribute n_features_in_ in dummy.DummyClassifier and dummy.DummyRegressor is deprecated and will be removed in 1.2. #20960 by Thomas Fan.
sklearn.ensemble
Enhancement HistGradientBoostingClassifier and HistGradientBoostingRegressor take cgroups quotas into account when deciding the number of threads used by OpenMP. This avoids performance problems caused by over-subscription when using those classes in a Docker container, for instance. #20477 by Thomas Fan.
Enhancement HistGradientBoostingClassifier and HistGradientBoostingRegressor are no longer experimental. They are now considered stable and are subject to the same deprecation cycles as all other estimators. #19799 by Nicolas Hug.
Enhancement Improved the HTML rendering of ensemble.StackingClassifier and ensemble.StackingRegressor. #19564 by Thomas Fan.
Enhancement Added a Poisson criterion to ensemble.RandomForestRegressor. #19836 by Brian Sun.
Fix Computing the out-of-bag (OOB) score is no longer allowed in ensemble.RandomForestClassifier and ensemble.ExtraTreesClassifier with a multiclass-multioutput target, since scikit-learn does not provide any metric supporting this type of target. Additional private refactoring was performed. #19162 by Guillaume Lemaitre.
Fix Improved numerical precision for weights boosting in ensemble.AdaBoostClassifier and ensemble.AdaBoostRegressor to avoid underflows. #10096 by Fenil Suchak.
Fix Fixed the range of the argument max_samples to be (0.0, 1.0] in ensemble.RandomForestClassifier and ensemble.RandomForestRegressor, where max_samples=1.0 is interpreted as using all n_samples for bootstrapping. #20159 by @murata-yu.
Fix Fixed a bug in ensemble.AdaBoostClassifier and ensemble.AdaBoostRegressor where the sample_weight parameter got overwritten during fit. #20534 by Guillaume Lemaitre.
API Change Removed the tol=None option in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. Please use tol=0 for the same behavior. #19296 by Thomas Fan.
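The following sketch shows how a max_samples value in the range (0.0, 1.0] could be turned into a per-tree sample count. An int is treated as an absolute count and a float as a fraction of n_samples, so 1.0 means all samples. It also includes the bootstrap=False check added in 1.0.2. The helper name and the rounding rule are assumptions for illustration, not scikit-learn's code.

```python
import numbers


def n_bootstrap_samples(n_samples, max_samples, bootstrap=True):
    """Number of samples drawn per tree for a given max_samples (sketch)."""
    if max_samples is None:
        return n_samples
    if not bootstrap:
        # max_samples only makes sense when bootstrapping is enabled.
        raise ValueError("max_samples requires bootstrap=True")
    if isinstance(max_samples, numbers.Integral):
        if not 1 <= max_samples <= n_samples:
            raise ValueError(f"max_samples must be in [1, {n_samples}], got {max_samples}")
        return max_samples
    if not 0.0 < max_samples <= 1.0:
        raise ValueError(f"max_samples must be in (0.0, 1.0], got {max_samples}")
    # A fraction of the training set, drawing at least one sample.
    return max(round(n_samples * max_samples), 1)
```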
sklearn.feature_extraction
Fix Fixed a bug in feature_extraction.text.HashingVectorizer where some input strings would result in negative indices in the transformed data. #19035 by Liu Yu.
Fix Fixed a bug in feature_extraction.DictVectorizer by raising an error with unsupported value types. #19520 by Jeff Zhao.
Fix Fixed a bug in feature_extraction.image.img_to_graph and feature_extraction.image.grid_to_graph where singleton connected components were not handled properly, resulting in a wrong vertex indexing. #18964 by Bertrand Thirion.
Fix Raise a warning in feature_extraction.text.CountVectorizer with lowercase=True when there are vocabulary entries with uppercase characters, to avoid silent misses in the resulting feature vectors. #19401 by Zito Relova.
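With lowercase=True, documents are lowercased before tokenization, so a vocabulary entry containing uppercase characters can never match. The following pure-Python sketch shows such a check. The helper name and warning text are hypothetical. The scan costs time proportional to the vocabulary size, which is why the 1.0.1 efficiency entry stopped running it in transform.

```python
import warnings


def check_vocabulary_case(vocabulary, lowercase=True):
    """Return (and warn about) vocabulary terms that can never match."""
    if not lowercase:
        return []
    # Terms that change when lowercased contain uppercase characters.
    uppercase_terms = [term for term in vocabulary if term != term.lower()]
    if uppercase_terms:
        warnings.warn(
            f"Vocabulary entries with uppercase characters will never match: {uppercase_terms}",
            UserWarning,
        )
    return uppercase_terms
```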
sklearn.feature_selection
Feature feature_selection.r_regression computes Pearson’s R correlation coefficients between the features and the target. #17169 by Dmytro Lituiev and Julien Jerphanion.
Enhancement feature_selection.RFE.fit accepts additional estimator parameters that are passed directly to the estimator’s fit method. #20380 by Iván Pulido, Felipe Bidu, Gil Rutter, and Adrin Jalali.
Fix Fixed a bug in isotonic.isotonic_regression where the sample_weight passed by a user was overwritten during fit. #20515 by Carsten Allefeld.
Fix Changed feature_selection.SequentialFeatureSelector to allow for unsupervised modelling, so that the fit signature need not do any y validation and allows y=None. #19568 by Shyam Desai.
API Change Raise an error in feature_selection.VarianceThreshold when the variance threshold is negative. #20207 by Tomohiro Endo.
API Change Deprecated grid_scores_ in favor of split scores in cv_results_ in feature_selection.RFECV. grid_scores_ will be removed in version 1.2. #20161 by Shuhei Kayawari and @arka204.
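Pearson's R for one feature is the covariance of that feature with the target, divided by the product of their standard deviations, and r_regression reports one such value per feature. The pure-Python sketch below uses hypothetical helper names and is not scikit-learn's array-based implementation.

```python
import math


def pearson_r(feature, target):
    """Pearson correlation coefficient between one feature and the target."""
    n = len(feature)
    mean_x = sum(feature) / n
    mean_y = sum(target) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, target))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_y = sum((y - mean_y) ** 2 for y in target)
    return cov / math.sqrt(var_x * var_y)


def r_per_feature(rows, target):
    """Apply pearson_r column by column to a list of sample rows."""
    columns = list(zip(*rows))
    return [pearson_r(col, target) for col in columns]
```

For rows [[1, 3], [2, 2], [3, 1]] and target [1, 2, 3], the first column is perfectly correlated (1.0) and the second perfectly anti-correlated (-1.0).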
sklearn.inspection
Enhancement Added the max_samples parameter to inspection.permutation_importance. It makes it possible to draw a subset of the samples to compute the permutation importance. This is useful to keep the method tractable when evaluating feature importance on large datasets. #20431 by Oliver Pfaffel.
Enhancement Added kwargs to format ICE and PD lines separately in partial dependence plots in inspection.plot_partial_dependence and inspection.PartialDependenceDisplay.plot. #19428 by Mehdi Hamoumi.
Fix Allow multiple scorers as input to inspection.permutation_importance. #19411 by Simona Maggio.
API Change inspection.PartialDependenceDisplay exposes a class method: from_estimator. inspection.plot_partial_dependence is deprecated in favor of the class method and will be removed in 1.2. #20959 by Thomas Fan.
sklearn.kernel_approximation
Fix Fixed a bug in kernel_approximation.Nystroem where the attribute component_indices_ did not correspond to the subset of sample indices used to generate the approximated kernel. #20554 by Xiangyin Kong.
sklearn.linear_model
Major Feature Added linear_model.QuantileRegressor, which implements linear quantile regression with an L1 penalty. #9978 by David Dale and Christian Lorentzen.
Feature The new linear_model.SGDOneClassSVM provides an SGD implementation of the linear One-Class SVM. Combined with kernel approximation techniques, this implementation approximates the solution of a kernelized One-Class SVM while benefitting from a linear complexity in the number of samples. #10027 by Albert Thomas.
Feature Added the sample_weight parameter to linear_model.LassoCV and linear_model.ElasticNetCV. #16449 by Christian Lorentzen.
Feature Added a new solver lbfgs (available with solver="lbfgs") and a positive argument to linear_model.Ridge. When positive is set to True, the coefficients are forced to be positive (only supported by lbfgs). #20231 by Toshihiro Nakae.
Efficiency The implementation of linear_model.LogisticRegression has been optimised for dense matrices when using solver='newton-cg' and multi_class!='multinomial'. #19571 by Julien Jerphanion.
Enhancement The fit method preserves the dtype for numpy.float32 in linear_model.Lars, linear_model.LassoLars, linear_model.LarsCV and linear_model.LassoLarsCV. #20155 by Takeshi Oura.
Enhancement Validate user-supplied gram matrix passed to linear models via the precompute argument. #19004 by Adam Midvidy.
Fix linear_model.ElasticNet.fit no longer modifies sample_weight in place. #19055 by Thomas Fan.
Fix linear_model.Lasso and linear_model.ElasticNet no longer have a dual_gap_ not corresponding to their objective. #19172 by Mathurin Massias.
Fix sample_weight is now fully taken into account in linear models when normalize=True, for both feature centering and feature scaling. #19426 by Alexandre Gramfort and Maria Telenczuk.
Fix Points with residuals equal to residual_threshold are now considered inliers for linear_model.RANSACRegressor. This allows fitting a model perfectly on some datasets when residual_threshold=0. #19499 by Gregory Strubel.
Fix Sample weight invariance for linear_model.Ridge was fixed in #19616 by Olivier Grisel and Christian Lorentzen.
Fix The dictionary params in linear_model.enet_path and linear_model.lasso_path should only contain parameters of the coordinate descent solver. Otherwise, an error will be raised. #19391 by Shao Yang Hong.
API Change Raise a warning in linear_model.RANSACRegressor that, from version 1.2, min_samples needs to be set explicitly for models other than linear_model.LinearRegression. #19390 by Shao Yang Hong.
API Change The parameter normalize of linear_model.LinearRegression is deprecated and will be removed in 1.2. Motivation for this deprecation: the normalize parameter did not take any effect if fit_intercept was set to False and was therefore deemed confusing. The behavior of the deprecated LinearModel(normalize=True) can be reproduced with a Pipeline with LinearModel (where LinearModel is LinearRegression, Ridge, RidgeClassifier, RidgeCV or RidgeClassifierCV) as follows: make_pipeline(StandardScaler(with_mean=False), LinearModel()). The normalize parameter in LinearRegression was deprecated in #17743 by Maria Telenczuk and Alexandre Gramfort. Same for Ridge, RidgeClassifier, RidgeCV, and RidgeClassifierCV in #17772 by Maria Telenczuk and Alexandre Gramfort. Same for BayesianRidge and ARDRegression in #17746 by Maria Telenczuk. Same for Lasso, LassoCV, ElasticNet, ElasticNetCV, MultiTaskLasso, MultiTaskLassoCV, MultiTaskElasticNet and MultiTaskElasticNetCV in #17785 by Maria Telenczuk and Alexandre Gramfort.
API Change The normalize parameter of OrthogonalMatchingPursuit and OrthogonalMatchingPursuitCV will default to False in 1.2 and will be removed in 1.4. #17750 by Maria Telenczuk and Alexandre Gramfort. Same for Lars, LarsCV, LassoLars, LassoLarsCV and LassoLarsIC in #17769 by Maria Telenczuk and Alexandre Gramfort.
API Change Keyword validation has moved from __init__ and set_params to fit for the following estimators, conforming to scikit-learn’s conventions: SGDClassifier, SGDRegressor, SGDOneClassSVM, PassiveAggressiveClassifier, and PassiveAggressiveRegressor. #20683 by Guillaume Lemaitre.
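The following sketch applies the pipeline replacement for normalize=True described above, using Ridge, on synthetic data whose features have very different scales. It needs NumPy and scikit-learn installed. The data and alpha value are made up for illustration. It does not claim to reproduce coefficients from the old normalize=True exactly, because the two approaches scale features differently and regularized models may need alpha adjusted to match.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Three features on very different scales.
X = rng.normal(scale=[1.0, 10.0, 100.0], size=(50, 3))
y = X @ np.array([1.0, 0.1, 0.01]) + rng.normal(scale=0.1, size=50)

# Scale features (without centering) before the linear model,
# in place of the deprecated Ridge(normalize=True).
model = make_pipeline(StandardScaler(with_mean=False), Ridge(alpha=1.0))
model.fit(X, y)
pred = model.predict(X)
```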
sklearn.manifold
Enhancement Implemented an 'auto' heuristic for the learning_rate in manifold.TSNE. It will become the default in 1.2. The default initialization will change to pca in 1.2. PCA initialization will be scaled to have a standard deviation of 1e-4 in 1.2. #19491 by Dmitry Kobak.
Fix Changed numerical precision to prevent underflow issues during affinity matrix computation for manifold.TSNE. #19472 by Dmitry Kobak.
Fix manifold.Isomap now uses scipy.sparse.csgraph.shortest_path to compute the graph shortest path. It also connects disconnected components of the neighbors graph along some minimum distance pairs, instead of changing every infinite distance to zero. #20531 by Roman Yurchak and Tom Dupre la Tour.
Fix Decreased the numerical default tolerance in the lobpcg call in manifold.spectral_embedding to prevent numerical instability. #21194 by Andrew Knyazev.
sklearn.metrics
Feature metrics.mean_pinball_loss exposes the pinball loss for quantile regression. #19415 by Xavier Dupré and Olivier Grisel.
Feature metrics.d2_tweedie_score calculates the D^2 regression score for Tweedie deviances with power parameter power. This is a generalization of the r2_score and can be interpreted as the percentage of Tweedie deviance explained. #17036 by Christian Lorentzen.
Feature metrics.mean_squared_log_error now supports squared=False. #20326 by Uttam kumar.
Efficiency Improved speed of metrics.confusion_matrix when labels are integral. #9843 by Jon Crall.
Enhancement metrics.hinge_loss now raises an error when pred_decision is 1d in a multiclass classification problem, or when pred_decision is not consistent with the labels parameter. #19643 by Pierre Attard.
Fix metrics.ConfusionMatrixDisplay.plot uses the correct max for the colormap. #19784 by Thomas Fan.
Fix Samples with zero sample_weight values do not affect the results from metrics.det_curve, metrics.precision_recall_curve and metrics.roc_curve. #18328 by Albert Villanova del Moral and Alonso Silva Allende.
Fix Avoid overflow in metrics.cluster.adjusted_rand_score with large amounts of data. #20312 by Divyanshu Deoli.
API Change metrics.ConfusionMatrixDisplay exposes two class methods, from_estimator and from_predictions, which create a confusion matrix plot from an estimator or from predictions. metrics.plot_confusion_matrix is deprecated in favor of these two class methods and will be removed in 1.2. #18543 by Guillaume Lemaitre.
API Change metrics.PrecisionRecallDisplay exposes two class methods, from_estimator and from_predictions, which create a precision-recall curve from an estimator or from predictions. metrics.plot_precision_recall_curve is deprecated in favor of these two class methods and will be removed in 1.2. #20552 by Guillaume Lemaitre.
API Change metrics.DetCurveDisplay exposes two class methods, from_estimator and from_predictions, which create a DET curve plot from an estimator or from predictions. metrics.plot_det_curve is deprecated in favor of these two class methods and will be removed in 1.2. #19278 by Guillaume Lemaitre.
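The two new regression metrics can be sketched from their usual definitions. The pinball loss is alpha * max(y - q, 0) + (1 - alpha) * max(q - y, 0), averaged over samples. D² is 1 - D(y, ŷ) / D(y, ȳ), where D is the deviance and ȳ is the mean of y. This pure-Python sketch covers only power=0, where the deviance is the sum of squared errors and D² coincides with R². The functions are illustrative stand-ins, not scikit-learn's implementations, which also accept arrays, sample weights and other power values.

```python
def mean_pinball_loss(y_true, y_pred, alpha=0.5):
    """Average pinball (quantile) loss for quantile level alpha."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by alpha, over-prediction by 1 - alpha.
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


def squared_deviance(y_true, y_pred):
    """Tweedie deviance with power=0, i.e. the sum of squared errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))


def d2_score_power0(y_true, y_pred):
    """D^2 = 1 - dev(y, y_pred) / dev(y, mean(y)); equals R^2 for power=0."""
    mean_y = sum(y_true) / len(y_true)
    baseline = [mean_y] * len(y_true)
    return 1.0 - squared_deviance(y_true, y_pred) / squared_deviance(y_true, baseline)
```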
sklearn.mixture
Fix Ensure that the best parameters are set appropriately in the case of divergence for mixture.GaussianMixture and mixture.BayesianGaussianMixture. #20030 by Tingshan Liu and Benjamin Pedigo.
sklearn.model_selection
Feature Added model_selection.StratifiedGroupKFold, which combines model_selection.StratifiedKFold and model_selection.GroupKFold, providing the ability to split data while preserving the distribution of classes in each split and keeping each group within a single split. #18649 by Leandro Hermida and Rodion Martynov.
Enhancement Warn only once in the main process for per-split fit failures in cross-validation. #20619 by Loïc Estève.
Enhancement The model_selection.BaseShuffleSplit base class is now public. #20056 by @pabloduque0.
Fix Avoid premature overflow in model_selection.train_test_split. #20904 by Tomasz Jakubek.
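The idea behind model_selection.StratifiedGroupKFold can be sketched as a greedy assignment: whole groups are placed into folds one at a time, each time choosing the fold that keeps every class's share across folds as even as possible. The helper below is hypothetical and simplified (no shuffling, no train/test index generation); it is an illustration of the heuristic, not the library's implementation.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def stratified_group_assignment(y, groups, n_splits):
    """Greedily map each group to a fold index in range(n_splits).

    Hypothetical helper: every sample of a group lands in the same fold,
    and groups are placed so per-class shares stay as even as possible.
    """
    classes = sorted(set(y))
    class_totals = Counter(y)
    # Count class labels inside each group.
    group_counts = defaultdict(Counter)
    for label, group in zip(y, groups):
        group_counts[group][label] += 1

    # Place groups with the most skewed class mix first (sorted is stable for ties).
    order = sorted(
        group_counts,
        key=lambda g: -pstdev([group_counts[g][c] for c in classes]),
    )

    fold_counts = [Counter() for _ in range(n_splits)]
    assignment = {}
    for g in order:
        best_fold, best_score = None, None
        for k in range(n_splits):
            # Sum over classes of the spread of that class's share across folds,
            # measured as if group g were added to fold k.
            spread = 0.0
            for c in classes:
                shares = [
                    (fold_counts[j][c] + (group_counts[g][c] if j == k else 0))
                    / class_totals[c]
                    for j in range(n_splits)
                ]
                spread += pstdev(shares)
            # Break ties in favour of the currently smallest fold.
            score = (spread, sum(fold_counts[k].values()))
            if best_score is None or score < best_score:
                best_fold, best_score = k, score
        fold_counts[best_fold].update(group_counts[g])
        assignment[g] = best_fold
    return assignment


y = [0, 0, 1, 1, 0, 1, 0, 1]
groups = ["a", "a", "b", "b", "c", "c", "d", "d"]
print(stratified_group_assignment(y, groups, n_splits=2))
```

Because groups are never split, perfect stratification is not always reachable; the greedy choice only tries to keep the class proportions close.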
sklearn.naive_bayes
Fix The fit and partial_fit methods of the discrete naive Bayes classifiers (naive_bayes.BernoulliNB, naive_bayes.CategoricalNB, naive_bayes.ComplementNB, and naive_bayes.MultinomialNB) now correctly handle the degenerate case of a single class in the training set. #18925 by David Poznik.
API Change The attribute sigma_ is now deprecated in naive_bayes.GaussianNB and will be removed in 1.2. Use var_ instead. #18842 by Hong Shao Yang.
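Renaming an attribute such as sigma_ to var_ is usually done with a deprecation period in which the old name still works but warns. The toy class below is a hypothetical, standalone illustration of that pattern using a property and FutureWarning; scikit-learn's own deprecations go through its internal utils.deprecated decorator rather than this exact code.

```python
import warnings


class GaussianNBSketch:
    """Hypothetical toy class showing a deprecated attribute alias."""

    def __init__(self, var):
        # New, preferred attribute name.
        self.var_ = list(var)

    @property
    def sigma_(self):
        # Old name kept for backward compatibility; warns on every access.
        warnings.warn(
            "Attribute `sigma_` is deprecated; use `var_` instead.",
            FutureWarning,
            stacklevel=2,
        )
        return self.var_


model = GaussianNBSketch([0.5, 2.0])
print(model.var_)  # preferred name, no warning
```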
sklearn.neighbors
Enhancement The worst-case time complexity of building neighbors.KDTree and neighbors.BallTree has been improved from \(\mathcal{O}(n^2)\) to \(\mathcal{O}(n)\). #19473 by jiefangxuanyan and Julien Jerphanion.
Fix neighbors.DistanceMetric subclasses now support read-only memory-mapped datasets. #19883 by Julien Jerphanion.
Fix neighbors.NearestNeighbors, neighbors.KNeighborsClassifier, neighbors.RadiusNeighborsClassifier, neighbors.KNeighborsRegressor and neighbors.RadiusNeighborsRegressor no longer validate weights in __init__; weights is validated in fit instead. #20072 by Juan Carlos Alfaro Jiménez.
API Change The parameter kwargs of neighbors.RadiusNeighborsClassifier is deprecated and will be removed in 1.2. #20842 by Juan Martín Loyola.
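Moving the weights check from __init__ to fit follows the scikit-learn convention that __init__ only stores parameters: values changed later through set_params or cloning never pass through __init__, so validation there would be skipped. The class below is a hypothetical, dependency-free illustration of that convention, not the neighbors implementation.

```python
class KNeighborsSketch:
    """Hypothetical estimator illustrating validation deferred to fit."""

    _valid_weights = ("uniform", "distance")

    def __init__(self, n_neighbors=5, weights="uniform"):
        # Only store parameters; no validation happens here.
        self.n_neighbors = n_neighbors
        self.weights = weights

    def set_params(self, **params):
        # Mimics the usual pattern: parameters are assigned directly.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Validate at fit time, so values set via set_params are checked too.
        if self.weights not in self._valid_weights and not callable(self.weights):
            raise ValueError(
                f"weights must be 'uniform', 'distance' or a callable, got {self.weights!r}"
            )
        self._fit_X = [list(row) for row in X]
        self._y = list(y)
        return self


est = KNeighborsSketch(weights="bogus")  # constructing does not raise
```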
sklearn.neural_network
Fix neural_network.MLPClassifier and neural_network.MLPRegressor now correctly support continued training when loading from a pickled file. #19631 by Thomas Fan.
sklearn.pipeline
API Change The predict_proba and predict_log_proba methods of pipeline.Pipeline now support passing prediction kwargs to the final estimator. #19790 by Christopher Flynn.
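The forwarding behaviour can be pictured with a minimal pipeline: intermediate steps only transform X, and any extra keyword arguments go to the final estimator's predict_proba. Everything below (MiniPipeline, Doubler, ToyClassifier and its offset keyword) is hypothetical and exists only to illustrate the flow.

```python
class Doubler:
    """Toy transformer: multiplies every feature by 2."""

    def transform(self, X):
        return [[2 * v for v in row] for row in X]


class ToyClassifier:
    """Toy final estimator whose predict_proba accepts an extra keyword."""

    def predict_proba(self, X, offset=1.0):
        out = []
        for row in X:
            s = sum(row)
            p = s / (s + offset)  # probability assigned to the positive class
            out.append([1 - p, p])
        return out


class MiniPipeline:
    """Hypothetical minimal pipeline that forwards predict_proba kwargs."""

    def __init__(self, steps):
        self.steps = steps

    def predict_proba(self, X, **predict_proba_params):
        # Every step except the last only transforms X.
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        # Only the final estimator receives the extra keyword arguments.
        final = self.steps[-1][1]
        return final.predict_proba(X, **predict_proba_params)


pipe = MiniPipeline([("double", Doubler()), ("clf", ToyClassifier())])
print(pipe.predict_proba([[1.0, 1.0]]))              # default offset
print(pipe.predict_proba([[1.0, 1.0]], offset=4.0))  # forwarded kwarg
```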
sklearn.preprocessing
Feature The new preprocessing.SplineTransformer is a feature preprocessing tool for the generation of B-splines, parametrized by the polynomial degree of the splines, the number of knots n_knots and the knot positioning strategy knots. #18368 by Christian Lorentzen. preprocessing.SplineTransformer also supports periodic splines via the extrapolation argument. #19483 by Malte Londschien. preprocessing.SplineTransformer supports sample weights for the knot positioning strategy "quantile". #20526 by Malte Londschien.
Feature preprocessing.OrdinalEncoder supports passing through missing values by default. #19069 by Thomas Fan.
Feature preprocessing.OneHotEncoder now supports handle_unknown='ignore' and dropping categories. #19041 by Thomas Fan.
Feature preprocessing.PolynomialFeatures now supports passing a tuple to degree, i.e. degree=(min_degree, max_degree). #20250 by Christian Lorentzen.
Efficiency preprocessing.StandardScaler is faster and more memory efficient. #20652 by Thomas Fan.
Efficiency Changed the algorithm argument for cluster.KMeans in preprocessing.KBinsDiscretizer from auto to full. #19934 by Gleb Levitskiy.
Efficiency The implementation of fit for the preprocessing.PolynomialFeatures transformer is now faster. This is especially noticeable on large sparse input. #19734 by Fred Robinson.
Fix The preprocessing.StandardScaler.inverse_transform method now raises an error when the input data is 1D. #19752 by Zhehao Liu.
Fix preprocessing.scale, preprocessing.StandardScaler and similar scalers detect near-constant features to avoid scaling them to very large values. This problem happens in particular when using a scaler on sparse data with a constant column with sample weights, in which case centering is typically disabled. #19527 by Olivier Grisel and Maria Telenczuk and #19788 by Jérémie du Boisberranger.
Fix preprocessing.StandardScaler.inverse_transform now correctly handles integer dtypes. #19356 by @makoeppel.
Fix preprocessing.OrdinalEncoder.inverse_transform does not support sparse matrices and now raises an appropriate error message. #19879 by Guillaume Lemaitre.
Fix The fit method of preprocessing.OrdinalEncoder no longer raises an error when handle_unknown='ignore' and unknown categories are given to fit. #19906 by Zhehao Liu.
Fix Fix a regression in preprocessing.OrdinalEncoder where large Python numbers would raise an error due to overflow when cast to C types (np.float64 or np.int64). #20727 by Guillaume Lemaitre.
Fix preprocessing.FunctionTransformer no longer sets n_features_in_ based on the input to inverse_transform. #20961 by Thomas Fan.
API Change The n_input_features_ attribute of preprocessing.PolynomialFeatures is deprecated in favor of n_features_in_ and will be removed in 1.2. #20240 by Jérémie du Boisberranger.
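With degree=(min_degree, max_degree), preprocessing.PolynomialFeatures keeps only monomials whose total degree lies in that range, whereas an integer degree d behaves like (0, d). The sketch below expands a single sample that way; polynomial_row is a hypothetical helper that ignores include_bias, interaction_only and sparse input, and is not guaranteed to match the library's column order.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_row(row, degree):
    """Expand one sample into monomials with total degree in [min_degree, max_degree].

    Hypothetical helper: an int degree d is treated as (0, d), where degree 0
    contributes the constant term 1.
    """
    if isinstance(degree, int):
        min_degree, max_degree = 0, degree
    else:
        min_degree, max_degree = degree
    if not 0 <= min_degree <= max_degree:
        raise ValueError("expected 0 <= min_degree <= max_degree")
    features = []
    for d in range(min_degree, max_degree + 1):
        # Each combination lists the feature indices multiplied together.
        for combo in combinations_with_replacement(range(len(row)), d):
            features.append(prod(row[i] for i in combo))
    return features


print(polynomial_row([2, 3], 2))       # constant, linear and quadratic terms
print(polynomial_row([2, 3], (2, 3)))  # only degree-2 and degree-3 terms
```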
sklearn.svm
API Change The parameter **params of svm.OneClassSVM.fit is deprecated and will be removed in 1.2. #20843 by Juan Martín Loyola.
sklearn.tree
Enhancement Add a fontname argument in tree.export_graphviz for non-English characters. #18959 by Zero and wstates.
Fix Improved compatibility of tree.plot_tree with high-DPI screens. #20023 by Thomas Fan.
Fix Fixed a bug in tree.DecisionTreeClassifier and tree.DecisionTreeRegressor where a node could be split when it should not have been, due to incorrect handling of rounding errors. #19336 by Jérémie du Boisberranger.
API Change The n_features_ attribute of tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor is deprecated in favor of n_features_in_ and will be removed in 1.2. #20272 by Jérémie du Boisberranger.
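As a general illustration of how floating-point rounding can interfere with tree splits (not necessarily the exact code path fixed in #19336): the midpoint between two adjacent floats can round onto the larger value, so a test x <= threshold would send both samples to the same child. split_threshold is a hypothetical helper; math.nextafter requires Python 3.9 or later.

```python
import math


def split_threshold(left_value, right_value):
    """Midpoint threshold between two sorted, distinct feature values.

    Hypothetical helper: for ordinary finite floats it returns a threshold with
    left_value <= threshold < right_value, so x <= threshold separates them.
    """
    if not left_value < right_value:
        raise ValueError("expected left_value < right_value")
    # Halve before adding to avoid overflow for very large magnitudes.
    threshold = left_value / 2.0 + right_value / 2.0
    # For adjacent floats the midpoint may round onto right_value, which would
    # put both values on the left side of the split; fall back to left_value.
    if threshold == right_value:
        threshold = left_value
    return threshold


a = math.nextafter(1.0, 0.0)  # largest float below 1.0
print(split_threshold(a, 1.0) == a)  # True: the rounding guard kicked in
```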
sklearn.utils
Enhancement Deprecated the default value random_state=0 in randomized_svd. Starting in 1.2, the default value of random_state will be set to None. #19459 by Cindy Bezuidenhout and Clifford Akai-Nettey.
Enhancement Added the helper decorator utils.metaestimators.available_if to provide flexibility in metaestimators, making methods available or unavailable on the basis of state, in a more readable way. #19948 by Joel Nothman.
Enhancement utils.validation.check_is_fitted now uses __sklearn_is_fitted__ if available, instead of checking for attributes ending with an underscore. This also makes pipeline.Pipeline and preprocessing.FunctionTransformer pass check_is_fitted(estimator). #20657 by Adrin Jalali.
Fix Fixed a bug in utils.sparsefuncs.mean_variance_axis where the precision of the computed variance was very poor when the real variance is exactly zero. #19766 by Jérémie du Boisberranger.
Fix The docstrings of properties that are decorated with utils.deprecated are now properly wrapped. #20385 by Thomas Fan.
Fix utils.stats._weighted_percentile now correctly ignores zero-weighted observations smaller than the smallest observation with positive weight for percentile=0. Affected classes are dummy.DummyRegressor for quantile=0 and ensemble.HuberLossFunction for alpha=0. #20528 by Malte Londschien.
Fix utils._safe_indexing explicitly takes a dataframe copy when integer indices are provided, to avoid raising a warning from pandas. This warning was previously raised in resampling utilities and functions using those utilities (e.g. model_selection.train_test_split, model_selection.cross_validate, model_selection.cross_val_score, model_selection.cross_val_predict). #20673 by Joris Van den Bossche.
Fix Fix a regression in utils.is_scalar_nan where large Python numbers would raise an error due to overflow in C types (np.float64 or np.int64). #20727 by Guillaume Lemaitre.
Fix Support for np.matrix is deprecated in check_array in 1.0 and will raise a TypeError in 1.2. #20165 by Thomas Fan.
API Change utils._testing.assert_warns and utils._testing.assert_warns_message are deprecated in 1.0 and will be removed in 1.2. Use the pytest.warns context manager instead. Note that these functions were not documented and not part of the public API. #20521 by Olivier Grisel.
API Change Fixed several bugs in utils.graph.graph_shortest_path, which is now deprecated. Use scipy.sparse.csgraph.shortest_path instead. #20531 by Tom Dupré la Tour.
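The effect of a decorator like utils.metaestimators.available_if is that a method such as predict_proba only "exists" when a check on the instance succeeds, so hasattr reports the right answer. The descriptor below is a hypothetical, simplified re-implementation of that idea (named available_if_sketch to avoid confusion with the real helper); it relies on raising AttributeError from __get__, which makes hasattr return False.

```python
from functools import wraps


class _ConditionalMethod:
    """Descriptor that hides a method unless check(instance) is truthy."""

    def __init__(self, fn, check):
        self.fn = fn
        self.check = check
        self.__doc__ = fn.__doc__

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class itself: return the plain function.
            return self.fn
        if not self.check(obj):
            # Raising AttributeError makes hasattr(obj, name) return False.
            raise AttributeError(
                f"{type(obj).__name__!r} object has no attribute {self.fn.__name__!r}"
            )

        @wraps(self.fn)
        def bound(*args, **kwargs):
            return self.fn(obj, *args, **kwargs)

        return bound


def available_if_sketch(check):
    """Hypothetical simplified take on the available_if idea."""
    def decorator(fn):
        return _ConditionalMethod(fn, check)
    return decorator


class Wrapper:
    """Toy meta-estimator exposing predict_proba only if the inner object has it."""

    def __init__(self, inner):
        self.inner = inner

    @available_if_sketch(lambda self: hasattr(self.inner, "predict_proba"))
    def predict_proba(self, X):
        return self.inner.predict_proba(X)


class WithProba:
    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class WithoutProba:
    pass


print(hasattr(Wrapper(WithProba()), "predict_proba"))     # True
print(hasattr(Wrapper(WithoutProba()), "predict_proba"))  # False
```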
Code and Documentation Contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.24, including:
Abdulelah S. Al Mesfer, Abhinav Gupta, Adam J. Stewart, Adam Li, Adam Midvidy, Adrian Garcia Badaracco, Adrian Sadłocha, Adrin Jalali, Agamemnon Krasoulis, Alberto Rubiales, Albert Thomas, Albert Villanova del Moral, Alek Lefebvre, Alessia Marcolini, Alexandr Fonari, Alihan Zihna, Aline Ribeiro de Almeida, Amanda, Amanda Dsouza, Amol Deshmukh, Ana Pessoa, Anavelyz, Andreas Mueller, Andrew Delong, Ashish, Ashvith Shetty, Atsushi Nukariya, Aurélien Geron, Avi Gupta, Ayush Singh, baam, BaptBillard, Benjamin Pedigo, Bertrand Thirion, Bharat Raghunathan, bmalezieux, Brian Rice, Brian Sun, Bruno Charron, Bryan Chen, bumblebee, caherrera-meli, Carsten Allefeld, CeeThinwa, Chiara Marmo, chrissobel, Christian Lorentzen, Christopher Yeh, Chuliang Xiao, Clément Fauchereau, cliffordEmmanuel, Conner Shen, Connor Tann, David Dale, David Katz, David Poznik, Dimitri Papadopoulos Orfanos, Divyanshu Deoli, dmallia17, Dmitry Kobak, DS_anas, Eduardo Jardim, EdwinWenink, EL-ATEIF Sara, Eleni Markou, EricEllwanger, Eric Fiegel, Erich Schubert, Ezri-Mudde, Fatos Morina, Felipe Rodrigues, Felix Hafner, Fenil Suchak, flyingdutchman23, Flynn, Fortune Uwha, Francois Berenger, Frankie Robertson, Frans Larsson, Frederick Robinson, frellwan, Gabriel S Vicente, Gael Varoquaux, genvalen, Geoffrey Thomas, geroldcsendes, Gleb Levitskiy, Glen, Glòria Macià Muñoz, gregorystrubel, groceryheist, Guillaume Lemaitre, guiweber, Haidar Almubarak, Hans Moritz Günther, Haoyin Xu, Harris Mirza, Harry Wei, Harutaka Kawamura, Hassan Alsawadi, Helder Geovane Gomes de Lima, Hugo DEFOIS, Igor Ilic, Ikko Ashimine, Isaack Mungui, Ishaan Bhat, Ishan Mishra, Iván Pulido, iwhalvic, J Alexander, Jack Liu, James Alan Preiss, James Budarz, James Lamb, Jannik, Jeff Zhao, Jennifer Maldonado, Jérémie du Boisberranger, Jesse Lima, Jianzhu Guo, jnboehm, Joel Nothman, JohanWork, John Paton, Jonathan Schneider, Jon Crall, Jon Haitz Legarreta Gorroño, Joris Van den Bossche, José Manuel Nápoles Duarte, Juan Carlos Alfaro Jiménez, 
Juan Martin Loyola, Julien Jerphanion, Julio Batista Silva, julyrashchenko, JVM, Kadatatlu Kishore, Karen Palacio, Kei Ishikawa, kmatt10, kobaski, Kot271828, Kunj, KurumeYuta, kxytim, lacrosse91, LalliAcqua, Laveen Bagai, Leonardo Rocco, Leonardo Uieda, Leopoldo Corona, Loic Esteve, LSturtew, Luca Bittarello, Luccas Quadros, Lucy Jiménez, Lucy Liu, ly648499246, Mabu Manaileng, Manimaran, makoeppel, Marco Gorelli, Maren Westermann, Mariangela, Maria Telenczuk, marielaraj, Martin Hirzel, Mateo Noreña, Mathieu Blondel, Mathis Batoul, mathurinm, Matthew Calcote, Maxime Prieur, Maxwell, Mehdi Hamoumi, Mehmet Ali Özer, Miao Cai, Michal Karbownik, michalkrawczyk, Mitzi, mlondschien, Mohamed Haseeb, Mohamed Khoualed, Muhammad Jarir Kanji, murata-yu, Nadim Kawwa, Nanshan Li, naozin555, Nate Parsons, Neal Fultz, Nic Annau, Nicolas Hug, Nicolas Miller, Nico Stefani, Nigel Bosch, Nikita Titov, Nodar Okroshiashvili, Norbert Preining, novaya, Ogbonna Chibuike Stephen, OGordon100, Oliver Pfaffel, Olivier Grisel, Oras Phongpanangam, Pablo Duque, Pablo Ibieta-Jimenez, Patric Lacouth, Paulo S. Costa, Paweł Olszewski, Peter Dye, PierreAttard, Pierre-Yves Le Borgne, PranayAnchuri, Prince Canuma, putschblos, qdeffense, RamyaNP, ranjanikrishnan, Ray Bell, Rene Jean Corneille, Reshama Shaikh, ricardojnf, RichardScottOZ, Rodion Martynov, Rohan Paul, Roman Lutz, Roman Yurchak, Samuel Brice, Sandy Khosasi, Sean Benhur J, Sebastian Flores, Sebastian Pölsterl, Shao Yang Hong, shinehide, shinnar, shivamgargsya, Shooter23, Shuhei Kayawari, Shyam Desai, simonamaggio, Sina Tootoonian, solosilence, Steven Kolawole, Steve Stagg, Surya Prakash, swpease, Sylvain Marié, Takeshi Oura, Terence Honles, TFiFiE, Thomas A Caswell, Thomas J. 
Fan, Tim Gates, TimotheeMathieu, Timothy Wolodzko, Tim Vink, t-jakubek, t-kusanagi, tliu68, Tobias Uhmann, tom1092, Tomás Moreyra, Tomás Ronald Hughes, Tom Dupré la Tour, Tommaso Di Noto, Tomohiro Endo, TONY GEORGE, Toshihiro NAKAE, tsuga, Uttam kumar, vadim-ushtanit, Vangelis Gkiastas, Venkatachalam N, Vilém Zouhar, Vinicius Rios Fuck, Vlasovets, waijean, Whidou, xavier dupré, xiaoyuchai, Yasmeen Alsaedy, yoch, Yosuke KOBAYASHI, Yu Feng, YusukeNagasaka, yzhenman, Zero, ZeyuSun, ZhaoweiWang, Zito, Zito Relova