Version 0.22.0

November 29 2019

For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 0.22.

Legend for changelogs

  • Major Feature : something big that you couldn’t do before.

  • Feature : something that you couldn’t do before.

  • Efficiency : an existing feature now may not require as much computation or memory.

  • Enhancement : a miscellaneous minor improvement.

  • Fix : something that previously didn’t work as documentated – or according to reasonable expectations – should now work.

  • API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

Website update

Our website was revamped and given a fresh new look. #14849 by Thomas Fan.

Clear definition of the public API

Scikit-learn has a public API, and a private API.

We do our best not to break the public API, and to only introduce backward-compatible changes that do not require any user action. However, in cases where that’s not possible, any change to the public API is subject to a deprecation cycle of two minor versions. The private API isn’t publicly documented and isn’t subject to any deprecation cycle, so users should not rely on its stability.

A function or object is public if it is documented in the API Reference and if it can be imported with an import path without leading underscores. For example sklearn.pipeline.make_pipeline is public, while sklearn.pipeline._name_estimators is private. sklearn.ensemble._gb.BaseEnsemble is private too because the whole _gb module is private.

Up to 0.22, some tools were de-facto public (no leading underscore), while they should have been private in the first place. In version 0.22, these tools have been made properly private, and the public API space has been cleaned. In addition, importing from most sub-modules is now deprecated: you should for example use from sklearn.cluster import Birch instead of from sklearn.cluster.birch import Birch (in practice, birch.py has been moved to _birch.py).

Note

All the tools in the public API should be documented in the API Reference. If you find a public tool (without leading underscore) that isn’t in the API reference, that means it should either be private or documented. Please let us know by opening an issue!

This work was tracked in issue 9250 and issue 12927.

Deprecations: using FutureWarning from now on

When deprecating a feature, previous versions of scikit-learn used to raise a DeprecationWarning. Since the DeprecationWarnings aren’t shown by default by Python, scikit-learn needed to resort to a custom warning filter to always show the warnings. That filter would sometimes interfere with users custom warning filters.

Starting from version 0.22, scikit-learn will show FutureWarnings for deprecations, as recommended by the Python documentation. FutureWarnings are always shown by default by Python, so the custom filter has been removed and scikit-learn no longer hinders with user filters. #15080 by Nicolas Hug.

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog

sklearn.base

  • API Change From version 0.24 base.BaseEstimator.get_params will raise an AttributeError rather than return None for parameters that are in the estimator’s constructor but not stored as attributes on the instance. #14464 by Joel Nothman.

sklearn.calibration

sklearn.cluster

sklearn.compose

sklearn.cross_decomposition

sklearn.datasets

sklearn.decomposition

sklearn.dummy

sklearn.ensemble

sklearn.feature_extraction

sklearn.feature_selection

sklearn.gaussian_process

sklearn.impute

sklearn.inspection

sklearn.kernel_approximation

sklearn.linear_model

sklearn.manifold

sklearn.metrics

sklearn.model_selection

sklearn.multioutput

sklearn.naive_bayes

sklearn.neighbors

sklearn.neural_network

sklearn.pipeline

sklearn.preprocessing

sklearn.model_selection

sklearn.svm

  • Enhancement svm.SVC and svm.NuSVC now accept a break_ties parameter. This parameter results in predict breaking the ties according to the confidence values of decision_function, if decision_function_shape='ovr', and the number of target classes > 2. #12557 by Adrin Jalali.

  • Enhancement SVM estimators now throw a more specific error when kernel='precomputed' and fit on non-square data. #14336 by Gregory Dexter.

  • Fix svm.SVC, svm.SVR, svm.NuSVR and svm.OneClassSVM when received values negative or zero for parameter sample_weight in method fit(), generated an invalid model. This behavior occured only in some border scenarios. Now in these cases, fit() will fail with an Exception. #14286 by Alex Shacked.

  • Fix The n_support_ attribute of svm.SVR and svm.OneClassSVM was previously non-initialized, and had size 2. It has now size 1 with the correct value. #15099 by Nicolas Hug.

  • Fix fixed a bug in BaseLibSVM._sparse_fit where n_SV=0 raised a ZeroDivisionError. #14894 by Danna Naser.

  • Fix The liblinear solver now supports sample_weight. #15038 by Guillaume Lemaitre.

sklearn.utils

  • Feature check_estimator can now generate checks by setting generate_only=True. Previously, running check_estimator will stop when the first check fails. With generate_only=True, all checks can run independently and report the ones that are failing. Read more in Rolling your own estimator. #14381 by Thomas Fan.

  • Feature Added a pytest specific decorator, parametrize_with_checks, to parametrize estimator checks for a list of estimators. #14381 by Thomas Fan.

  • API Change The following utils have been deprecated and are now private:

    • utils.choose_check_classifiers_labels

    • utils.enforce_estimator_tags_y

    • utils.optimize.newton_cg

    • utils.random.random_choice_csc

    • utils.safe_indexing

    • utils.mocking

    • utils.fast_dict

    • utils.seq_dataset

    • utils.weight_vector

    • utils.fixes.parallel_helper (removed)

    • All of utils.testing except for all_estimators which is now in utils.

  • A new random variable, utils.fixes.loguniform implements a log-uniform random variable (e.g., for use in RandomizedSearchCV). For example, the outcomes 1, 10 and 100 are all equally likely for loguniform(1, 100). See #11232 by Scott Sievert and Nathaniel Saul, and SciPy PR 10815 <https://github.com/scipy/scipy/pull/10815>.

  • Enhancement utils.safe_indexing (now deprecated) accepts an axis parameter to index array-like across rows and columns. The column indexing can be done on NumPy array, SciPy sparse matrix, and Pandas DataFrame. An additional refactoring was done. #14035 and #14475 by Guillaume Lemaitre.

  • Enhancement utils.extmath.safe_sparse_dot works between 3D+ ndarray and sparse matrix. #14538 by Jérémie du Boisberranger.

  • Fix utils.check_array is now raising an error instead of casting NaN to integer. #14872 by Roman Yurchak.

  • Fix utils.check_array will now correctly detect numeric dtypes in pandas dataframes, fixing a bug where float32 was upcast to float64 unnecessarily. #15094 by Andreas Müller.

  • API Change The following utils have been deprecated and are now private:

    • choose_check_classifiers_labels

    • enforce_estimator_tags_y

    • mocking.MockDataFrame

    • mocking.CheckingClassifier

    • optimize.newton_cg

    • random.random_choice_csc

sklearn.isotonic

Miscellaneous

  • API Change Scikit-learn now converts any input data structure implementing a duck array to a numpy array (using __array__) to ensure consistent behavior instead of relying on __array_function__ (see NEP 18). #14702 by Andreas Müller.

  • API Change Replace manual checks with check_is_fitted. Errors thrown when using a non-fitted estimators are now more uniform. #13013 by Agamemnon Krasoulis.

  • Fix Port lobpcg from SciPy which implement some bug fixes but only available in 1.3+. #13609 and #14971 by Guillaume Lemaitre.

Changes to estimator checks

These changes mostly affect library developers.

  • Estimators are now expected to raise a NotFittedError if predict or transform is called before fit; previously an AttributeError or ValueError was acceptable. #13013 by by Agamemnon Krasoulis.

  • Binary only classifiers are now supported in estimator checks. Such classifiers need to have the binary_only=True estimator tag. #13875 by Trevor Stephens.

  • Estimators are expected to convert input data (X, y, sample_weights) to numpy.ndarray and never call __array_function__ on the original datatype that is passed (see NEP 18). #14702 by Andreas Müller.

  • requires_positive_X estimator tag (for models that require X to be non-negative) is now used by utils.estimator_checks.check_estimator to make sure a proper error message is raised if X contains some negative entries. #14680 by Alex Gramfort.

  • Added check that pairwise estimators raise error on non-square data #14336 by Gregory Dexter.

  • Added two common multioutput estimator tests check_classifier_multioutput and check_regressor_multioutput. #13392 by Rok Mihevc.

  • Fix Added check_transformer_data_not_an_array to checks where missing

  • Fix The estimators tags resolution now follows the regular MRO. They used to be overridable only once. #14884 by Andreas Müller.