Legend for changelogs
Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Put the changes in their relevant module.
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
Efficiency The cluster.Birch implementation of the predict method avoids a high memory footprint by calculating the distances matrix using a chunked scheme. #16149 by Jeremie du Boisberranger and Alex Shacked.
API Change The n_jobs parameter of cluster.SpectralBiclustering is deprecated. It now uses OpenMP based parallelism. For more details on how to control the number of threads, please refer to our Parallelism notes. #11950 by Jeremie du Boisberranger.
Efficiency The critical parts of cluster.KMeans have a more optimized implementation. Parallelism is now over the data instead of over initializations, allowing better scalability. #11950 by Jeremie du Boisberranger.
Feature Embedded dataset loaders such as load_wine now support loading as a pandas DataFrame by setting as_frame=True. #15980 by @wconnell and Reshama Shaikh.
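For illustration, the new loading behavior can be sketched as follows (a minimal example of my own, assuming scikit-learn ≥ 0.23 and pandas are installed):

```python
from sklearn.datasets import load_wine

# With as_frame=True, the returned bunch exposes pandas objects
# instead of plain NumPy arrays.
wine = load_wine(as_frame=True)
print(type(wine.frame).__name__)   # DataFrame: features and target together
print(type(wine.data).__name__)    # DataFrame: features only
print(type(wine.target).__name__)  # Series: target only
```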
Fix Fixed a bug in ensemble.HistGradientBoostingRegressor that would not respect the max_leaf_nodes parameter if the criterion was reached at the same time as the max_depth criterion. #16183 by Nicolas Hug.
Fix Changed the depth convention for ensemble.HistGradientBoostingRegressor. The depth now corresponds to the number of edges to go from the root to the deepest leaf. Stumps (trees with one split) are now allowed. #16182 by Santhosh B.
Feature Early stopping in ensemble.HistGradientBoostingRegressor is now determined by a new early_stopping parameter instead of n_iter_no_change. The default value is ‘auto’, which enables early stopping if there are at least 10,000 samples in the training set. #14516 by Johann Faouzi.
Feature ensemble.HistGradientBoostingRegressor now supports monotonic constraints, useful when features are expected to have a positive/negative effect on the target. #15582 by Nicolas Hug.
Fix Fixed a bug in ensemble.IsolationForest where the attribute estimators_samples_ did not generate the proper indices used during fit. #16437 by Jin-Hwan CHO.
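The repaired attribute can be checked against the training data like this (synthetic data and forest size are my own):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.RandomState(0).normal(size=(100, 2))
iso = IsolationForest(n_estimators=5, random_state=0).fit(X)

# Each entry is the array of row indices the corresponding tree was fit on;
# after the fix these are valid positions into the training set.
for idx in iso.estimators_samples_:
    assert 0 <= idx.min() and idx.max() < X.shape[0]
print(len(iso.estimators_samples_))  # one index array per tree: 5
```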
Fix Fixed a bug where the sample_weight argument was not being passed to cross_val_predict when evaluating the base estimators on cross-validation folds to obtain the input to the meta estimator. #16539 by Bill DeRose.
Efficiency feature_extraction.text.CountVectorizer now sorts features after pruning them by document frequency. This improves performance for datasets with large vocabularies combined with max_df. #15834 by Santiago M. Mola.
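The sorted-feature guarantee can be observed via the fitted vocabulary (toy corpus of my own):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the dog sat", "the bird flew"]

# 'the' appears in every document, so max_df=0.9 prunes it; the surviving
# features are indexed in sorted (alphabetical) order.
vec = CountVectorizer(max_df=0.9).fit(docs)
names = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
print(names)  # ['bird', 'cat', 'dog', 'flew', 'sat']
assert names == sorted(names)
```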
Major Feature Added generalized linear models (GLM) with non-normal error distributions, including linear_model.TweedieRegressor, which supports the Poisson, Gamma and Tweedie distributions. #14300 by Christian Lorentzen, Roman Yurchak, and Olivier Grisel.
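A sketch of fitting one of the new GLMs (synthetic positive-valued target of my own; power=1.5 selects a compound Poisson-Gamma deviance):

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 2))
# Tweedie deviances with 1 <= power < 2 require a non-negative target.
y = np.exp(X[:, 0] + 0.5 * X[:, 1]) + rng.uniform(0.01, 0.1, size=200)

# power=1 would give Poisson, power=2 Gamma; link='log' keeps
# predictions positive.
glm = TweedieRegressor(power=1.5, link='log')
glm.fit(X, y)
print(glm.score(X, y))  # D^2, the fraction of deviance explained
```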
Fix Fixed a bug where if a sample_weight parameter was passed to the fit method of linear_model.RANSACRegressor, it would not be passed to the wrapped base_estimator during the fitting of the final model. #15573 by Jeremy Alexandre.
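The fixed forwarding can be sketched as follows (synthetic line-fitting data and weights are my own):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 1))
y = 3 * X.ravel() + rng.normal(scale=0.05, size=100)
weights = rng.uniform(0.5, 1.5, size=100)

ransac = RANSACRegressor(random_state=0)
# sample_weight is now also forwarded when the final model is refit
# on the selected inliers.
ransac.fit(X, y, sample_weight=weights)
print(ransac.estimator_.coef_)  # close to the true slope of 3
```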
Efficiency linear_model.RidgeClassifierCV now does not allocate a potentially large array to store dual coefficients for all hyperparameters during its fit, nor an array to store all error or LOO predictions unless store_cv_values is True. #15652 by Jérôme Dockès.
API Change Deprecated several public attributes in linear_model.PassiveAggressiveRegressor. #16261 by Carlos Brandt.
Efficiency linear_model.LogisticRegression will now avoid an unnecessary iteration when solver='newton-cg' by checking for less-than-or-equal instead of strictly-less-than in the convergence test of utils.optimize._newton_cg. #16266 by Rushabh Vasani.
API Change Changed the formatting of values in metrics.plot_confusion_matrix to pick the shorter format (either ‘2g’ or ‘d’). #16159 by Rick Mackenbach and Thomas Fan.
Fix Fixed a bug in metrics.confusion_matrix that would raise an error when y_true and y_pred were length zero and labels was not None. In addition, an error is now raised when an empty list is given to the labels parameter. #16442 by Kyle Parsons.
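Both behaviors can be exercised directly (toy inputs of my own):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Zero-length y_true/y_pred with explicit labels no longer raise:
cm = confusion_matrix(np.array([]), np.array([]), labels=[0, 1])
print(cm.shape, cm.sum())  # a 2x2 matrix of zeros

# An empty labels list, however, now raises a clear ValueError:
try:
    confusion_matrix([0, 1], [0, 1], labels=[])
except ValueError as exc:
    print("ValueError:", exc)
```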
Enhancement model_selection.RandomizedSearchCV yields stack trace information in fit-failed warning messages, in addition to the previously emitted type and details. #15622 by Gregory Morse.
Fix Fixed the use of custom kernels not taking float entries, such as string kernels, in svm.SVR. Note that custom kernels are now expected to validate their input, where they previously received valid numeric arrays. #11296 by Alexandre Gramfort and Georgi Peev.
Fix Fix support of read-only float32 array input in ensemble.GradientBoostingClassifier as well as ensemble.GradientBoostingRegressor. #16331 by Alexandre Batisse.
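The fix can be reproduced by fitting on a read-only buffer (synthetic data of my own; setflags simulates a memory-mapped array such as joblib produces):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.RandomState(0).uniform(size=(100, 3)).astype(np.float32)
y = X.sum(axis=1).astype(np.float64)

# Read-only float32 input, as seen with joblib memory-mapping.
X.setflags(write=False)

est = GradientBoostingRegressor(n_estimators=10, random_state=0)
est.fit(X, y)  # previously raised on read-only float32 buffers
print(est.n_estimators_)  # 10
```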
Fix utils.validation.check_array now constructs a sparse matrix from a pandas DataFrame that contains only SparseArray columns. #16728 by Thomas Fan.
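The new conversion can be sketched as follows (column names and values are my own; fill_value=0 is used so the scipy conversion is well-defined):

```python
import pandas as pd
from scipy import sparse
from sklearn.utils import check_array

# A DataFrame whose columns are all pandas SparseArrays...
df = pd.DataFrame({
    "a": pd.arrays.SparseArray([0.0, 0.0, 1.0], fill_value=0.0),
    "b": pd.arrays.SparseArray([1.0, 0.0, 0.0], fill_value=0.0),
})

# ...now becomes a scipy sparse matrix instead of being densified.
X = check_array(df, accept_sparse=True)
print(sparse.issparse(X))  # True
```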