Version 0.23.1 (May 18 2020)
Version 0.23.0 (May 12 2020)
For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 0.23.
Legend for changelogs¶
Major Feature : something big that you couldn’t do before.
Feature : something that you couldn’t do before.
Efficiency : an existing feature now may not require as much computation or memory.
Enhancement : a miscellaneous minor improvement.
Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Enforcing keyword-only arguments¶
In an effort to promote clear and non-ambiguous use of the library, most constructor and function parameters are now expected to be passed as keyword arguments (i.e. using the param=value syntax) instead of positionally. To ease the transition, a FutureWarning is raised if a keyword-only parameter is used as positional. In version 0.25, these parameters will be strictly keyword-only, and a TypeError will be raised. #15005 by Joel Nothman, Adrin Jalali, Thomas Fan, and Nicolas Hug. See SLEP009 for more details.
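The mechanism that version 0.25 will enforce is Python's own keyword-only parameter syntax. A minimal sketch (illustrative only, not scikit-learn's actual deprecation machinery): everything after a bare `*` in a signature must be passed by keyword, and positional use raises a TypeError.

```python
# Illustrative sketch of keyword-only enforcement via Python's bare-* marker.
# The function name and parameters here are hypothetical, not scikit-learn API.

def fit_model(X, y, *, alpha=1.0, max_iter=100):
    """All parameters after '*' must be passed as keyword arguments."""
    return {"alpha": alpha, "max_iter": max_iter}

# Keyword syntax works as before:
params = fit_model([[1.0]], [0], alpha=0.5)

# Positional use of a keyword-only parameter raises TypeError:
try:
    fit_model([[1.0]], [0], 0.5)
except TypeError:
    print("positional alpha rejected")
```

During the 0.23/0.24 transition scikit-learn raises a FutureWarning instead of the TypeError shown here.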
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
metrics.mutual_info_score with negative scores.
metrics.confusion_matrix with zero length y_true and y_pred.
partial_fit and sparse input.
ensemble.GradientBoostingClassifier as well as ensemble.GradientBoostingRegressor with read-only float32 input.
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
Efficiency cluster.Birch implementation of the predict method avoids high memory footprint by calculating the distances matrix using a chunked scheme. #16149 by Jeremie du Boisberranger and Alex Shacked.
Efficiency Major Feature The critical parts of cluster.KMeans have a more optimized implementation. Parallelism is now over the data instead of over initializations, allowing better scalability. #11950 by Jeremie du Boisberranger.
API Change The n_jobs parameter of cluster.KMeans, cluster.SpectralCoclustering and cluster.SpectralBiclustering is deprecated. They now use OpenMP based parallelism. For more details on how to control the number of threads, please refer to our Parallelism notes. #11950 by Jeremie du Boisberranger.
Feature compose.ColumnTransformer.get_feature_names now supports 'passthrough' columns, with the feature name being either the column name for a dataframe, or 'xi' for column index i. #14048 by Lewis Ball.
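The naming rule for passed-through columns can be sketched as follows (a hypothetical helper for illustration, not the actual ColumnTransformer code): use the dataframe column name when one exists, otherwise fall back to 'x' plus the column index.

```python
# Hypothetical sketch of the passthrough feature-naming rule described above.

def passthrough_feature_names(columns, dataframe_columns=None):
    """Return feature names for passed-through columns.

    columns: iterable of column indices into the input.
    dataframe_columns: optional list of column names (as from a DataFrame).
    """
    if dataframe_columns is not None:
        return [dataframe_columns[i] for i in columns]
    return [f"x{i}" for i in columns]

print(passthrough_feature_names([0, 2], ["age", "height", "weight"]))  # ['age', 'weight']
print(passthrough_feature_names([0, 2]))  # ['x0', 'x2']
```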
Feature Embedded dataset loaders datasets.load_breast_cancer, datasets.load_diabetes, datasets.load_digits, datasets.load_iris, datasets.load_linnerud and datasets.load_wine now support loading as a pandas DataFrame by setting as_frame=True. #15980 by @wconnell and Reshama Shaikh.
API Change The StreamHandler was removed from sklearn.logger to avoid double logging of messages in common cases where a handler is attached to the root logger, and to follow the Python logging documentation recommendation for libraries to leave the log message handling to users and application code. #16451 by Christoph Deil.
Fix decomposition.PCA with n_components='mle' now correctly handles small eigenvalues, and does not infer 0 as the correct number of components. #16224 by Lisa Schwetlick, Gelavizh Ahmadi and Marija Vlajic Wheeler, and #16841 by Nicolas Hug.
Feature Early stopping in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor is now determined with a new early_stopping parameter instead of n_iter_no_change. Default value is ‘auto’, which enables early stopping if there are at least 10,000 samples in the training set. #14516 by Johann Faouzi.
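The 'auto' rule described above can be sketched as a tiny resolver (the helper name is an assumption for illustration, not the estimators' internal code):

```python
# Sketch of the early_stopping='auto' rule: enabled automatically once the
# training set has at least 10,000 samples, per the changelog entry above.

AUTO_EARLY_STOPPING_THRESHOLD = 10_000  # samples

def resolve_early_stopping(early_stopping, n_samples):
    """Map the early_stopping parameter ('auto', True, False) to a bool."""
    if early_stopping == "auto":
        return n_samples >= AUTO_EARLY_STOPPING_THRESHOLD
    return bool(early_stopping)

print(resolve_early_stopping("auto", 500))     # False: small training set
print(resolve_early_stopping("auto", 50_000))  # True: large training set
print(resolve_early_stopping(True, 500))       # True: explicitly requested
```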
Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. #15582 by Nicolas Hug.
Fix Fixed a bug in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor that would not respect the max_leaf_nodes parameter if the criterion was reached at the same time as the max_depth criterion. #16183 by Nicolas Hug.
Fix Changed the convention for max_depth of ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. The depth now corresponds to the number of edges to go from the root to the deepest leaf. Stumps (trees with one split) are now allowed. #16182 by Santhosh B.
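The edge-counting convention can be illustrated with a minimal tree type (the Node class is an assumption for illustration, not the estimators' internal representation): a lone leaf has depth 0 and a stump has depth 1.

```python
# Sketch of the depth convention described above:
# depth = number of edges from the root to the deepest leaf.

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def depth(node):
    """Count edges to the deepest leaf; a leaf contributes no edges."""
    if node is None or (node.left is None and node.right is None):
        return 0
    return 1 + max(depth(node.left), depth(node.right))

leaf = Node()
stump = Node(left=Node(), right=Node())  # one split
deeper = Node(left=stump, right=Node())  # root -> stump -> leaf

print(depth(leaf))    # 0
print(depth(stump))   # 1: now an allowed tree
print(depth(deeper))  # 2
```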
Fix Fixed a bug in ensemble.IsolationForest where the attribute estimators_samples_ did not generate the proper indices used during fit. #16437 by Jin-Hwan CHO.
Fix Fixed a bug in ensemble.StackingClassifier and ensemble.StackingRegressor where the sample_weight argument was not being passed to cross_val_predict when evaluating the base estimators on cross-validation folds to obtain the input to the meta estimator. #16539 by Bill DeRose.
Fix Fixed a bug where ensemble.HistGradientBoostingClassifier would fail with multiple calls to fit when early_stopping=True and there is no validation set. #16663 by Thomas Fan.
Feature impute.IterativeImputer accepts both scalar and array-like inputs for max_value and min_value. Array-like inputs allow a different max and min to be specified for each feature. #16403 by Narendra Mukherjee.
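The scalar-or-array-like semantics can be sketched with a small broadcasting helper (a hypothetical function for illustration, not IterativeImputer's internal code): a scalar is expanded to every feature, while an array-like supplies one bound per feature.

```python
# Sketch of broadcasting a scalar bound to per-feature bounds.

def broadcast_bound(value, n_features):
    """Expand a scalar bound to per-feature bounds; validate array-likes."""
    if isinstance(value, (int, float)):
        return [float(value)] * n_features
    bounds = [float(v) for v in value]
    if len(bounds) != n_features:
        raise ValueError(f"expected {n_features} bounds, got {len(bounds)}")
    return bounds

print(broadcast_bound(0.0, 3))               # [0.0, 0.0, 0.0]
print(broadcast_bound([0.0, -1.0, 2.0], 3))  # [0.0, -1.0, 2.0]
```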
Major Feature Added generalized linear models (GLM) with non-normal error distributions, including linear_model.PoissonRegressor, linear_model.GammaRegressor and linear_model.TweedieRegressor, which use Poisson, Gamma and Tweedie distributions respectively. #14300 by Christian Lorentzen, Roman Yurchak, and Olivier Grisel.
Efficiency linear_model.RidgeCV and linear_model.RidgeClassifierCV now do not allocate a potentially large array to store dual coefficients for all hyperparameters during their fit, nor an array to store all error or LOO predictions unless store_cv_values is True. #15652 by Jérôme Dockès.
Feature linear_model.LassoLars and linear_model.Lars now support a jitter parameter that adds random noise to the target. This might help with stability in some edge cases. #15179 by @angelaambroz.
Fix Fixed a bug where if a sample_weight parameter was passed to the fit method of linear_model.RANSACRegressor, it would not be passed to the wrapped base_estimator during the fitting of the final model. #15773 by Jeremy Alexandre.
Efficiency linear_model.LogisticRegression will now avoid an unnecessary iteration when solver='newton-cg' by checking for inferior or equal instead of strictly inferior for maximum of absgrad and tol in utils.optimize._newton_cg. #16266 by Rushabh Vasani.
API Change Deprecated public attributes standard_coef_, standard_intercept_, average_coef_ and average_intercept_ in linear_model.SGDClassifier, linear_model.SGDRegressor, linear_model.PassiveAggressiveClassifier and linear_model.PassiveAggressiveRegressor. #16261 by Carlos Brandt.
Fix Efficiency linear_model.ARDRegression is more stable and much faster when n_samples > n_features. It can now scale to hundreds of thousands of samples. The stability fix might imply changes in the number of non-zero coefficients and in the predicted output. #16849 by Nicolas Hug.
Fix Fixed a bug in linear_model.MultiTaskLassoCV where fitting would fail when using the joblib loky backend. #14264 by Jérémie du Boisberranger.
Efficiency Speed up linear_model.MultiTaskElasticNetCV by avoiding slower BLAS Level 2 calls on small arrays. #17021 by Alex Gramfort and Mathurin Massias.
Fix Fixed a bug in metrics.confusion_matrix that would raise an error when y_true and y_pred were length zero and labels was not None. In addition, we raise an error when an empty list is given to the labels parameter. #16442 by Kyle Parsons.
API Change Changed the formatting of values in metrics.plot_confusion_matrix to pick the shorter format (either ‘2g’ or ‘d’). #16159 by Rick Mackenbach and Thomas Fan.
API Change From version 0.25, metrics.pairwise.pairwise_distances will no longer automatically compute the VI parameter for Mahalanobis distance and the V parameter for seuclidean distance if Y is passed. The user will be expected to compute this parameter on the training data of their choice and pass it to pairwise_distances. #16993 by Joel Nothman.
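Under the new behaviour, the caller computes V on training data of their choice and passes it explicitly. A sketch of that computation in plain Python (kept self-contained rather than calling metrics.pairwise.pairwise_distances; the exact variance estimator to use is the caller's choice under the new API):

```python
# Sketch: compute the per-feature variance V for the seuclidean metric
# from training data, then pass it to pairwise_distances explicitly.

def feature_variances(X):
    """Per-column variance (population form) of a list-of-rows dataset."""
    n = len(X)
    n_features = len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(n_features)]
    return [sum((row[j] - means[j]) ** 2 for row in X) / n
            for j in range(n_features)]

X_train = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
V = feature_variances(X_train)
print(V)  # [2.6666666666666665, 0.0]

# With scikit-learn this would then be passed as, e.g.:
# pairwise_distances(X, Y, metric='seuclidean', V=V)
```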
Enhancement model_selection.GridSearchCV and model_selection.RandomizedSearchCV yield stack trace information in fit failed warning messages in addition to previously emitted type and details. #15622 by Gregory Morse.
preprocessing.RobustScaler now supports pandas’ nullable integer dtype with missing values. #16508 by Thomas Fan.
Fix Efficiency Improved libsvm and liblinear random number generators used to randomly select coordinates in the coordinate descent algorithms. Platform-dependent C rand() was used, which is only able to generate numbers up to 32767 on Windows (see this blog post) and also has poor randomization power as suggested by this presentation. It was replaced with C++11 mt19937, a Mersenne Twister that correctly generates 31/63-bit random numbers on all platforms. In addition, the crude “modulo” postprocessor used to get a random number in a bounded interval was replaced by the tweaked Lemire method as suggested by this blog post. Any model using the liblinear or libsvm solvers, including linear_model.LogisticRegression, is affected. In particular users can expect a better convergence when the number of samples (LibSVM) or the number of features (LibLinear) is large. #13511 by Sylvain Marié.
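The two bounded-interval approaches mentioned above can be sketched as follows (illustrative Python, not the C++ code scikit-learn actually uses). The crude modulo method maps a raw 32-bit draw into [0, bound) with `%`, which biases small values whenever 2**32 is not a multiple of the bound; Lemire's method multiplies instead and keeps the high bits, rejecting the few draws that would introduce bias.

```python
import random

def modulo_bounded(draw32, bound):
    """Biased: map a uniform 32-bit draw into [0, bound) with modulo."""
    return draw32 % bound

def lemire_bounded(next_draw32, bound):
    """Lemire's nearly-divisionless method: unbiased value in [0, bound)."""
    product = next_draw32() * bound          # 64-bit product
    low = product & 0xFFFFFFFF               # low 32 bits
    if low < bound:                          # rare slow path
        threshold = (2 ** 32) % bound        # draws below this are rejected
        while low < threshold:
            product = next_draw32() * bound
            low = product & 0xFFFFFFFF
    return product >> 32                     # high 32 bits: the result

rng = random.Random(0)
draws = [lemire_bounded(lambda: rng.getrandbits(32), 10) for _ in range(1000)]
print(min(draws), max(draws))  # all results stay within [0, 10)
```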
Fix Fix use of custom kernel not taking float entries such as string kernels in svm.SVC and svm.SVR. Note that custom kernels are now expected to validate their input where they previously received valid numeric arrays. #11296 by Alexandre Gramfort and Georgi Peev.
Fix Fix support of read-only float32 array input in ensemble.GradientBoostingClassifier as well as ensemble.GradientBoostingRegressor. #16331 by Alexandre Batisse.
Major Feature Estimators can now be displayed with a rich html representation. This can be enabled in Jupyter notebooks by setting display='diagram' in set_config. The raw html can be returned by using utils.estimator_html_repr. #14180 by Thomas Fan.
utils.validation.check_array supports pandas’ nullable integer dtype with missing values when force_all_finite is set to 'allow-nan', in which case the data is converted to floating point values where pd.NA values are replaced by np.nan. As a consequence, all sklearn.preprocessing transformers that accept numeric inputs with missing values represented as np.nan now also accept being directly fed pandas dataframes with pd.Int* or pd.UInt* typed columns that use pd.NA as a missing value marker. #16508 by Thomas Fan.
API Change Passing classes to utils.estimator_checks.parametrize_with_checks is now deprecated, and support for classes will be removed in 0.24. Pass instances instead. #17032 by Nicolas Hug.
API Change The private utility _safe_tags in utils.estimator_checks was removed, hence all tags should be obtained through estimator._get_tags(). Note that Mixins like RegressorMixin must come before base classes in the MRO for _get_tags() to work properly. #16950 by Nicolas Hug.
Major Feature Adds an HTML representation of estimators to be shown in a jupyter notebook or lab. This visualization is activated by setting the display option in sklearn.set_config. #14180 by Thomas Fan.
API Change Estimators now have a requires_y tag which is False by default except for estimators that inherit from sklearn.base.ClassifierMixin. This tag is used to ensure that a proper error message is raised when y was expected but None was passed. #16622 by Nicolas Hug.
API Change The default setting print_changed_only has been changed from False to True. This means that the repr of estimators is now more concise and only shows the parameters whose default value has been changed when printing an estimator. You can restore the previous behaviour by using sklearn.set_config(print_changed_only=False). Also, note that it is always possible to quickly inspect the parameters of any estimator using est.get_params(deep=False). #17061 by Nicolas Hug.
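The changed-parameters-only repr can be sketched by comparing an estimator's current parameter values against its signature defaults (an illustration using inspect, not scikit-learn's implementation; the Estimator class and its parameters are hypothetical):

```python
import inspect

class Estimator:
    def __init__(self, alpha=1.0, fit_intercept=True, tol=1e-4):
        self.alpha, self.fit_intercept, self.tol = alpha, fit_intercept, tol

    def __repr__(self):
        # Show only parameters whose value differs from the __init__ default.
        sig = inspect.signature(type(self).__init__)
        changed = {
            name: getattr(self, name)
            for name, p in sig.parameters.items()
            if name != "self" and getattr(self, name) != p.default
        }
        args = ", ".join(f"{k}={v!r}" for k, v in changed.items())
        return f"{type(self).__name__}({args})"

print(Estimator())           # Estimator()
print(Estimator(alpha=0.5))  # Estimator(alpha=0.5)
```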
Code and Documentation Contributors¶
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.22, including:
Abbie Popa, Adrin Jalali, Aleksandra Kocot, Alexandre Batisse, Alexandre Gramfort, Alex Henrie, Alex Itkes, Alex Liang, alexshacked, Alonso Silva Allende, Ana Casado, Andreas Mueller, Angela Ambroz, Ankit810, Arie Pratama Sutiono, Arunav Konwar, Baptiste Maingret, Benjamin Beier Liu, bernie gray, Bharathi Srinivasan, Bharat Raghunathan, Bibhash Chandra Mitra, Brian Wignall, brigi, Brigitta Sipőcz, Carlos H Brandt, CastaChick, castor, cgsavard, Chiara Marmo, Chris Gregory, Christian Kastner, Christian Lorentzen, Corrie Bartelheimer, Daniël van Gelder, Daphne, David Breuer, david-cortes, dbauer9, Divyaprabha M, Edward Qian, Ekaterina Borovikova, ELNS, Emily Taylor, Erich Schubert, Eric Leung, Evgeni Chasnovski, Fabiana, Facundo Ferrín, Fan, Franziska Boenisch, Gael Varoquaux, Gaurav Sharma, Geoffrey Bolmier, Georgi Peev, gholdman1, Gonthier Nicolas, Gregory Morse, Gregory R. Lee, Guillaume Lemaitre, Gui Miotto, Hailey Nguyen, Hanmin Qin, Hao Chun Chang, HaoYin, Hélion du Mas des Bourboux, Himanshu Garg, Hirofumi Suzuki, huangk10, Hugo van Kemenade, Hye Sung Jung, indecisiveuser, inderjeet, J-A16, Jérémie du Boisberranger, Jin-Hwan CHO, JJmistry, Joel Nothman, Johann Faouzi, Jon Haitz Legarreta Gorroño, Juan Carlos Alfaro Jiménez, judithabk6, jumon, Kathryn Poole, Katrina Ni, Kesshi Jordan, Kevin Loftis, Kevin Markham, krishnachaitanya9, Lam Gia Thuan, Leland McInnes, Lisa Schwetlick, lkubin, Loic Esteve, lopusz, lrjball, lucgiffon, lucyleeow, Lucy Liu, Lukas Kemkes, Maciej J Mikulski, Madhura Jayaratne, Magda Zielinska, maikia, Mandy Gu, Manimaran, Manish Aradwad, Maren Westermann, Maria, Mariana Meireles, Marie Douriez, Marielle, Mateusz Górski, mathurinm, Matt Hall, Maura Pintor, mc4229, meyer89, m.fab, Michael Shoemaker, Michał Słapek, Mina Naghshhnejad, mo, Mohamed Maskani, Mojca Bertoncelj, narendramukherjee, ngshya, Nicholas Won, Nicolas Hug, nicolasservel, Niklas, @nkish, Noa Tamir, Oleksandr Pavlyk, olicairns, Oliver Urs Lenz, Olivier Grisel, parsons-kyle-89, 
Paula, Pete Green, Pierre Delanoue, pspachtholz, Pulkit Mehta, Qizhi Jiang, Quang Nguyen, rachelcjordan, raduspaimoc, Reshama Shaikh, Riccardo Folloni, Rick Mackenbach, Ritchie Ng, Roman Feldbauer, Roman Yurchak, Rory Hartong-Redden, Rüdiger Busche, Rushabh Vasani, Sambhav Kothari, Samesh Lakhotia, Samuel Duan, SanthoshBala18, Santiago M. Mola, Sarat Addepalli, scibol, Sebastian Kießling, SergioDSR, Sergul Aydore, Shiki-H, shivamgargsya, SHUBH CHATTERJEE, Siddharth Gupta, simonamaggio, smarie, Snowhite, stareh, Stephen Blystone, Stephen Marsh, Sunmi Yoon, SylvainLan, talgatomarov, tamirlan1, th0rwas, theoptips, Thomas J Fan, Thomas Li, Thomas Schmitt, Tim Nonner, Tim Vink, Tiphaine Viard, Tirth Patel, Titus Christian, Tom Dupré la Tour, trimeta, Vachan D A, Vandana Iyer, Venkatachalam N, waelbenamara, wconnell, wderose, wenliwyan, Windber, wornbb, Yu-Hang “Maxin” Tang