Version 0.22

For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 0.22.

Legend for changelogs

  • Major Feature something big that you couldn’t do before.

  • Feature something that you couldn’t do before.

  • Efficiency an existing feature now may not require as much computation or memory.

  • Enhancement a miscellaneous minor improvement.

  • Fix something that previously didn’t work as documented – or according to reasonable expectations – should now work.

  • API Change you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

Version 0.22.2.post1

March 3 2020

The 0.22.2.post1 release includes a packaging fix for the source distribution but the content of the packages is otherwise identical to the content of the wheels with the 0.22.2 version (without the .post1 suffix). Both contain the following changes.

Changelog

sklearn.impute

sklearn.metrics

sklearn.neighbors

Version 0.22.1

January 2 2020

This is a bug-fix release to primarily resolve some packaging issues in version 0.22.0. It also includes minor documentation improvements and some bug fixes.

Changelog

sklearn.cluster

sklearn.inspection

sklearn.metrics

sklearn.model_selection

sklearn.naive_bayes

  • Fix Removed abstractmethod decorator for the method _check_X in naive_bayes.BaseNB that could break downstream projects inheriting from this deprecated public base class. #15996 by Brigitta Sipőcz.

sklearn.preprocessing

sklearn.semi_supervised

sklearn.utils

  • Fix utils.check_array now correctly converts pandas DataFrame with boolean columns to floats. #15797 by Thomas Fan.

  • Fix utils.validation.check_is_fitted once again accepts an explicit attributes argument to check for specific attributes as explicit markers of a fitted estimator. When no explicit attributes are provided, only the attributes that end with an underscore and do not start with a double underscore are used as “fitted” markers. The all_or_any argument is also no longer deprecated. This change restores some backward compatibility with the behavior of this utility in version 0.21. #15947 by Thomas Fan.
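A minimal sketch of how the restored attributes and all_or_any arguments of check_is_fitted might be used. The is_fitted wrapper and the attribute name not_an_attribute_ are hypothetical and exist only for illustration; the keyword names are assumed to match the installed scikit-learn version.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator, attributes=None, all_or_any=all):
    """Hypothetical helper: True if check_is_fitted passes, False otherwise."""
    try:
        check_is_fitted(estimator, attributes=attributes, all_or_any=all_or_any)
    except NotFittedError:
        return False
    return True


model = LinearRegression()
before_fit = is_fitted(model)  # no trailing-underscore attributes yet

model.fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
after_fit = is_fitted(model)  # coef_ and intercept_ now exist

# Explicit markers: every listed attribute must exist when all_or_any=all.
explicit = is_fitted(model, attributes=["coef_", "intercept_"])
missing_all = is_fitted(model, attributes=["coef_", "not_an_attribute_"], all_or_any=all)

# With all_or_any=any, a single existing attribute is enough.
missing_any = is_fitted(model, attributes=["coef_", "not_an_attribute_"], all_or_any=any)
```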

Version 0.22.0

December 3 2019

Website update

Our website was revamped and given a fresh new look. #14849 by Thomas Fan.

Clear definition of the public API

Scikit-learn has a public API, and a private API.

We do our best not to break the public API, and to only introduce backward-compatible changes that do not require any user action. However, in cases where that’s not possible, any change to the public API is subject to a deprecation cycle of two minor versions. The private API isn’t publicly documented and isn’t subject to any deprecation cycle, so users should not rely on its stability.

A function or object is public if it is documented in the API Reference and if it can be imported with an import path without leading underscores. For example sklearn.pipeline.make_pipeline is public, while sklearn.pipeline._name_estimators is private. sklearn.ensemble._gb.BaseEnsemble is private too because the whole _gb module is private.

Up to 0.22, some tools were de-facto public (no leading underscore), while they should have been private in the first place. In version 0.22, these tools have been made properly private, and the public API space has been cleaned. In addition, importing from most sub-modules is now deprecated: you should for example use from sklearn.cluster import Birch instead of from sklearn.cluster.birch import Birch (in practice, birch.py has been moved to _birch.py).
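The naming half of this rule can be sketched with a small helper. The function has_public_import_path is hypothetical and not part of scikit-learn, and it deliberately ignores the second condition (being documented in the API Reference), which cannot be checked from the path alone.

```python
def has_public_import_path(dotted_path):
    """Hypothetical helper: True if no component of the path starts with '_'."""
    return not any(part.startswith("_") for part in dotted_path.split("."))


# Examples taken from the text above.
public_function = has_public_import_path("sklearn.pipeline.make_pipeline")
private_function = has_public_import_path("sklearn.pipeline._name_estimators")
private_module = has_public_import_path("sklearn.ensemble._gb.BaseEnsemble")

# The recommended import style goes through the public sub-package:
#     from sklearn.cluster import Birch
recommended = has_public_import_path("sklearn.cluster.Birch")
moved_module = has_public_import_path("sklearn.cluster._birch.Birch")
```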

Note

All the tools in the public API should be documented in the API Reference. If you find a public tool (without leading underscore) that isn’t in the API reference, that means it should either be private or documented. Please let us know by opening an issue!

This work was tracked in issue 9250 and issue 12927.

Deprecations: using FutureWarning from now on

When deprecating a feature, previous versions of scikit-learn used to raise a DeprecationWarning. Since DeprecationWarnings aren’t shown by default by Python, scikit-learn needed to resort to a custom warning filter to always show the warnings. That filter would sometimes interfere with users’ custom warning filters.

Starting from version 0.22, scikit-learn will show FutureWarnings for deprecations, as recommended by the Python documentation. FutureWarnings are always shown by default by Python, so the custom filter has been removed and scikit-learn no longer interferes with user filters. #15080 by Nicolas Hug.
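The pattern can be illustrated with the standard warnings module alone. In this sketch, old_helper is a hypothetical deprecated function, not a scikit-learn API; the second block shows a user filter silencing the warning without any library-side filter getting in the way.

```python
import warnings


def old_helper():
    """Hypothetical deprecated function used only for illustration."""
    warnings.warn(
        "old_helper is deprecated; use new_helper instead.",
        FutureWarning,
        stacklevel=2,
    )
    return 42


# Record the warning to inspect its category.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_helper()
categories = [w.category for w in caught]

# A user-defined filter can silence the deprecation messages.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore", category=FutureWarning)
    old_helper()
```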

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog

sklearn.base

  • API Change From version 0.24 base.BaseEstimator.get_params will raise an AttributeError rather than return None for parameters that are in the estimator’s constructor but not stored as attributes on the instance. #14464 by Joel Nothman.
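A short sketch of the convention this change enforces: every constructor parameter should be stored as an attribute of the same name. Both classes below are hypothetical. The try/except is written so that the snippet runs whether the installed version still returns None or already raises AttributeError.

```python
from sklearn.base import BaseEstimator


class WellBehaved(BaseEstimator):
    """Hypothetical estimator that stores its constructor parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


class DropsParameter(BaseEstimator):
    """Hypothetical estimator that forgets to store alpha."""

    def __init__(self, alpha=1.0):
        pass  # alpha is never assigned to self


good_params = WellBehaved(alpha=0.5).get_params()

try:
    # Older versions return {"alpha": None}; newer ones raise AttributeError.
    bad_params = DropsParameter(alpha=0.5).get_params()
except AttributeError:
    bad_params = "AttributeError"
```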

sklearn.calibration

sklearn.cluster

sklearn.compose

sklearn.cross_decomposition

sklearn.datasets

sklearn.decomposition

sklearn.dummy

sklearn.ensemble

sklearn.feature_extraction

sklearn.feature_selection

sklearn.gaussian_process

sklearn.impute

sklearn.inspection

sklearn.kernel_approximation

sklearn.linear_model

sklearn.manifold

sklearn.metrics

sklearn.model_selection

sklearn.multioutput

sklearn.naive_bayes

sklearn.neighbors

sklearn.neural_network

sklearn.pipeline

sklearn.preprocessing

sklearn.model_selection

sklearn.svm

  • Enhancement svm.SVC and svm.NuSVC now accept a break_ties parameter. This parameter results in predict breaking the ties according to the confidence values of decision_function, if decision_function_shape='ovr', and the number of target classes > 2. #12557 by Adrin Jalali.

  • Enhancement SVM estimators now throw a more specific error when kernel='precomputed' and fit on non-square data. #14336 by Gregory Dexter.

  • Fix svm.SVC, svm.SVR, svm.NuSVR and svm.OneClassSVM generated an invalid model when they received negative or zero values for the sample_weight parameter of fit(). This behavior occurred only in some border scenarios. In these cases, fit() now fails with an exception. #14286 by Alex Shacked.

  • Fix The n_support_ attribute of svm.SVR and svm.OneClassSVM was previously non-initialized, and had size 2. It now has size 1 with the correct value. #15099 by Nicolas Hug.

  • Fix Fixed a bug in BaseLibSVM._sparse_fit where n_SV=0 raised a ZeroDivisionError. #14894 by Danna Naser.

  • Fix The liblinear solver now supports sample_weight. #15038 by Guillaume Lemaitre.
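Two of the sklearn.svm entries above can be sketched as follows, assuming a scikit-learn version that includes these changes. The first part checks that, with break_ties=True, decision_function_shape='ovr' and three classes, predict agrees with the argmax of decision_function. The second part passes a non-square matrix with kernel='precomputed' and catches the resulting error. The toy data from make_blobs is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Three-class toy problem.
X, y = make_blobs(n_samples=90, centers=3, random_state=0)

clf = SVC(kernel="linear", decision_function_shape="ovr", break_ties=True)
clf.fit(X, y)

scores = clf.decision_function(X)  # one column of confidence values per class
pred = clf.predict(X)
argmax_labels = clf.classes_[np.argmax(scores, axis=1)]

# A precomputed kernel must be square (n_samples x n_samples) at fit time.
error = None
try:
    SVC(kernel="precomputed").fit(np.ones((5, 3)), [0, 1, 0, 1, 0])
except ValueError as exc:
    error = exc
```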

sklearn.tree

sklearn.utils

  • Feature check_estimator can now generate checks by setting generate_only=True. Previously, running check_estimator would stop when the first check failed. With generate_only=True, all checks can run independently and report the ones that are failing. Read more in Rolling your own estimator. #14381 by Thomas Fan.

  • Feature Added a pytest specific decorator, parametrize_with_checks, to parametrize estimator checks for a list of estimators. #14381 by Thomas Fan.

  • Feature A new random variable, utils.fixes.loguniform, implements a log-uniform random variable (e.g., for use in RandomizedSearchCV). For example, the outcomes 1, 10 and 100 are all equally likely for loguniform(1, 100). See #11232 by Scott Sievert and Nathaniel Saul, and SciPy PR 10815 (https://github.com/scipy/scipy/pull/10815).

  • Enhancement utils.safe_indexing (now deprecated) accepts an axis parameter to index array-like across rows and columns. The column indexing can be done on NumPy array, SciPy sparse matrix, and Pandas DataFrame. An additional refactoring was done. #14035 and #14475 by Guillaume Lemaitre.

  • Enhancement utils.extmath.safe_sparse_dot works between 3D+ ndarray and sparse matrix. #14538 by Jérémie du Boisberranger.

  • Fix utils.check_array now raises an error instead of casting NaN to integer. #14872 by Roman Yurchak.

  • Fix utils.check_array will now correctly detect numeric dtypes in pandas dataframes, fixing a bug where float32 was upcast to float64 unnecessarily. #15094 by Andreas Müller.

  • API Change The following utils have been deprecated and are now private:

    • choose_check_classifiers_labels

    • enforce_estimator_tags_y

    • mocking.MockDataFrame

    • mocking.CheckingClassifier

    • optimize.newton_cg

    • random.random_choice_csc

    • utils.choose_check_classifiers_labels

    • utils.enforce_estimator_tags_y

    • utils.optimize.newton_cg

    • utils.random.random_choice_csc

    • utils.safe_indexing

    • utils.mocking

    • utils.fast_dict

    • utils.seq_dataset

    • utils.weight_vector

    • utils.fixes.parallel_helper (removed)

    • All of utils.testing except for all_estimators which is now in utils.
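As a rough illustration of the log-uniform variable mentioned in the sklearn.utils entries above, the sketch below uses scipy.stats.loguniform rather than utils.fixes.loguniform. This is an assumption: it requires a SciPy version that ships loguniform, and it is used here because the utils.fixes location may not exist in every scikit-learn version. It shows that each decade of the range receives the same probability mass, which is the sense in which 1, 10 and 100 are "equally likely".

```python
from scipy.stats import loguniform

dist = loguniform(1, 100)

# Probability mass of each decade: [1, 10] and [10, 100].
p_low = dist.cdf(10) - dist.cdf(1)
p_high = dist.cdf(100) - dist.cdf(10)

# Draw a reproducible sample; all values lie within the bounds.
samples = dist.rvs(size=1000, random_state=0)

# Typical (assumed) use as a search distribution:
#     param_distributions = {"C": loguniform(1e-3, 1e3)}
```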

sklearn.isotonic

Miscellaneous

  • Fix Ported lobpcg from SciPy, which implements some bug fixes that are otherwise only available in SciPy 1.3+. #13609 and #14971 by Guillaume Lemaitre.

  • API Change Scikit-learn now converts any input data structure implementing a duck array to a numpy array (using __array__) to ensure consistent behavior instead of relying on __array_function__ (see NEP 18). #14702 by Andreas Müller.

  • API Change Replace manual checks with check_is_fitted. Errors thrown when using non-fitted estimators are now more uniform. #13013 by Agamemnon Krasoulis.
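The duck-array conversion described above can be sketched with a small container that exposes its data only through __array__. The DuckArray class is hypothetical, and utils.check_array is used here as one example of an input validation entry point; the exact internal call path may differ between versions.

```python
import numpy as np
from sklearn.utils import check_array


class DuckArray:
    """Hypothetical container that is only convertible through __array__."""

    def __init__(self, values):
        self._values = [list(row) for row in values]

    def __array__(self, dtype=None, copy=None):
        # Always build a fresh NumPy array from the stored rows.
        return np.array(self._values, dtype=dtype)


converted = check_array(DuckArray([[1.0, 2.0], [3.0, 4.0]]))
```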

Changes to estimator checks

These changes mostly affect library developers.

  • Estimators are now expected to raise a NotFittedError if predict or transform is called before fit; previously an AttributeError or ValueError was acceptable. #13013 by Agamemnon Krasoulis.

  • Binary only classifiers are now supported in estimator checks. Such classifiers need to have the binary_only=True estimator tag. #13875 by Trevor Stephens.

  • Estimators are expected to convert input data (X, y, sample_weights) to numpy.ndarray and never call __array_function__ on the original datatype that is passed (see NEP 18). #14702 by Andreas Müller.

  • requires_positive_X estimator tag (for models that require X to be non-negative) is now used by utils.estimator_checks.check_estimator to make sure a proper error message is raised if X contains some negative entries. #14680 by Alex Gramfort.

  • Added a check that pairwise estimators raise an error on non-square data. #14336 by Gregory Dexter.

  • Added two common multioutput estimator tests utils.estimator_checks.check_classifier_multioutput and utils.estimator_checks.check_regressor_multioutput. #13392 by Rok Mihevc.

  • Fix Added check_transformer_data_not_an_array to checks where missing

  • Fix The estimators tags resolution now follows the regular MRO. They used to be overridable only once. #14884 by Andreas Müller.
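The first expectation in the list above can be checked by hand along these lines. The raises_not_fitted helper is hypothetical; LinearRegression and StandardScaler are simply convenient built-in estimators to try it on. Since NotFittedError derives from both ValueError and AttributeError, code written against the older behavior should keep catching it.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def raises_not_fitted(call):
    """Hypothetical helper: True if calling `call` raises NotFittedError."""
    try:
        call()
    except NotFittedError:
        return True
    return False


# predict and transform are called on estimators that were never fitted.
predict_raises = raises_not_fitted(lambda: LinearRegression().predict([[0.0]]))
transform_raises = raises_not_fitted(lambda: StandardScaler().transform([[0.0]]))

# NotFittedError stays compatible with handlers for the older error types.
is_value_error = issubclass(NotFittedError, ValueError)
is_attribute_error = issubclass(NotFittedError, AttributeError)
```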

Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.21, including:

Aaron Alphonsus, Abbie Popa, Abdur-Rahmaan Janhangeer, abenbihi, Abhinav Sagar, Abhishek Jana, Abraham K. Lagat, Adam J. Stewart, Aditya Vyas, Adrin Jalali, Agamemnon Krasoulis, Alec Peters, Alessandro Surace, Alexandre de Siqueira, Alexandre Gramfort, alexgoryainov, Alex Henrie, Alex Itkes, alexshacked, Allen Akinkunle, Anaël Beaugnon, Anders Kaseorg, Andrea Maldonado, Andrea Navarrete, Andreas Mueller, Andreas Schuderer, Andrew Nystrom, Angela Ambroz, Anisha Keshavan, Ankit Jha, Antonio Gutierrez, Anuja Kelkar, Archana Alva, arnaudstiegler, arpanchowdhry, ashimb9, Ayomide Bamidele, Baran Buluttekin, barrycg, Bharat Raghunathan, Bill Mill, Biswadip Mandal, blackd0t, Brian G. Barkley, Brian Wignall, Bryan Yang, c56pony, camilaagw, cartman_nabana, catajara, Cat Chenal, Cathy, cgsavard, Charles Vesteghem, Chiara Marmo, Chris Gregory, Christian Lorentzen, Christos Aridas, Dakota Grusak, Daniel Grady, Daniel Perry, Danna Naser, DatenBergwerk, David Dormagen, deeplook, Dillon Niederhut, Dong-hee Na, Dougal J. Sutherland, DrGFreeman, Dylan Cashman, edvardlindelof, Eric Larson, Eric Ndirangu, Eunseop Jeong, Fanny, federicopisanu, Felix Divo, flaviomorelli, FranciDona, Franco M. 
Luque, Frank Hoang, Frederic Haase, g0g0gadget, Gabriel Altay, Gabriel do Vale Rios, Gael Varoquaux, ganevgv, gdex1, getgaurav2, Gideon Sonoiya, Gordon Chen, gpapadok, Greg Mogavero, Grzegorz Szpak, Guillaume Lemaitre, Guillem García Subies, H4dr1en, hadshirt, Hailey Nguyen, Hanmin Qin, Hannah Bruce Macdonald, Harsh Mahajan, Harsh Soni, Honglu Zhang, Hossein Pourbozorg, Ian Sanders, Ingrid Spielman, J-A16, jaehong park, Jaime Ferrando Huertas, James Hill, James Myatt, Jay, jeremiedbb, Jérémie du Boisberranger, jeromedockes, Jesper Dramsch, Joan Massich, Joanna Zhang, Joel Nothman, Johann Faouzi, Jonathan Rahn, Jon Cusick, Jose Ortiz, Kanika Sabharwal, Katarina Slama, kellycarmody, Kennedy Kang’ethe, Kensuke Arai, Kesshi Jordan, Kevad, Kevin Loftis, Kevin Winata, Kevin Yu-Sheng Li, Kirill Dolmatov, Kirthi Shankar Sivamani, krishna katyal, Lakshmi Krishnan, Lakshya KD, LalliAcqua, lbfin, Leland McInnes, Léonard Binet, Loic Esteve, loopyme, lostcoaster, Louis Huynh, lrjball, Luca Ionescu, Lutz Roeder, MaggieChege, Maithreyi Venkatesh, Maltimore, Maocx, Marc Torrellas, Marie Douriez, Markus, Markus Frey, Martina G. 
Vilas, Martin Oywa, Martin Thoma, Masashi SHIBATA, Maxwell Aladago, mbillingr, m-clare, Meghann Agarwal, m.fab, Micah Smith, miguelbarao, Miguel Cabrera, Mina Naghshhnejad, Ming Li, motmoti, mschaffenroth, mthorrell, Natasha Borders, nezar-a, Nicolas Hug, Nidhin Pattaniyil, Nikita Titov, Nishan Singh Mann, Nitya Mandyam, norvan, notmatthancock, novaya, nxorable, Oleg Stikhin, Oleksandr Pavlyk, Olivier Grisel, Omar Saleem, Owen Flanagan, panpiort8, Paolo, Paolo Toccaceli, Paresh Mathur, Paula, Peng Yu, Peter Marko, pierretallotte, poorna-kumar, pspachtholz, qdeffense, Rajat Garg, Raphaël Bournhonesque, Ray, Ray Bell, Rebekah Kim, Reza Gharibi, Richard Payne, Richard W, rlms, Robert Juergens, Rok Mihevc, Roman Feldbauer, Roman Yurchak, R Sanjabi, RuchitaGarde, Ruth Waithera, Sackey, Sam Dixon, Samesh Lakhotia, Samuel Taylor, Sarra Habchi, Scott Gigante, Scott Sievert, Scott White, Sebastian Pölsterl, Sergey Feldman, SeWook Oh, she-dares, Shreya V, Shubham Mehta, Shuzhe Xiao, SimonCW, smarie, smujjiga, Sönke Behrends, Soumirai, Sourav Singh, stefan-matcovici, steinfurt, Stéphane Couvreur, Stephan Tulkens, Stephen Cowley, Stephen Tierney, SylvainLan, th0rwas, theoptips, theotheo, Thierno Ibrahima DIOP, Thomas Edwards, Thomas J Fan, Thomas Moreau, Thomas Schmitt, Tilen Kusterle, Tim Bicker, Timsaur, Tim Staley, Tirth Patel, Tola A, Tom Augspurger, Tom Dupré la Tour, topisan, Trevor Stephens, ttang131, Urvang Patel, Vathsala Achar, veerlosar, Venkatachalam N, Victor Luzgin, Vincent Jeanselme, Vincent Lostanlen, Vladimir Korolev, vnherdeiro, Wenbo Zhao, Wendy Hu, willdarnell, William de Vazelhes, wolframalpha, xavier dupré, xcjason, x-martian, xsat, xun-tang, Yinglr, yokasre, Yu-Hang “Maxin” Tang, Yulia Zamriy, Zhao Feng