.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _release_notes_1_6:

===========
Version 1.6
===========

.. -- UNCOMMENT WHEN 1.6.0 IS RELEASED --
   For a short description of the main highlights of the release, please refer to
   :ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_6_0.py`.


.. include:: changelog_legend.inc

.. _changes_1_6:

Version 1.6.0
=============

**In Development**

Changes impacting many modules
------------------------------

- |API| :func:`utils.validation.validate_data` is introduced and replaces the
  previously private `base.BaseEstimator._validate_data` method. This is
  intended for third-party estimator developers, who should use this function
  in most cases instead of :func:`utils.validation.check_array` and
  :func:`utils.validation.check_X_y`.
  :pr:`29696` by `Adrin Jalali`_.

- |Enhancement| `__sklearn_tags__` was introduced for setting tags in estimators.
  See :ref:`estimator_tags` for more details.
  :pr:`22606` by `Thomas Fan`_ and :pr:`29677` by `Adrin Jalali`_.

Support for Array API
---------------------

Additional estimators and functions have been updated to include support for all
`Array API `_ compliant inputs.

See :ref:`array_api` for more details.
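
The following is a minimal sketch of what this support looks like from the user side. It enables Array API dispatch with :func:`config_context` and calls :func:`metrics.mean_squared_error`, one of the functions listed in this section. It assumes the optional `array-api-compat` dependency is installed. If that dependency is missing, enabling dispatch raises an `ImportError`, so the snippet catches it and reports the problem instead of failing. Plain NumPy inputs are used so the snippet runs without a GPU. With inputs from another compliant library (for example PyTorch tensors), the computation would use that library's namespace instead.

.. code-block:: python

    import numpy as np
    import sklearn
    from sklearn.metrics import mean_squared_error

    y_true = np.asarray([3.0, -0.5, 2.0, 7.0])
    y_pred = np.asarray([2.5, 0.0, 2.0, 8.0])

    # Default code path: plain NumPy, no dispatch.
    mse_default = mean_squared_error(y_true, y_pred)

    # Enabling Array API dispatch raises ImportError when its optional
    # dependencies (such as array-api-compat) are missing, so guard it.
    try:
        with sklearn.config_context(array_api_dispatch=True):
            mse_dispatch = mean_squared_error(y_true, y_pred)
    except ImportError as exc:
        mse_dispatch = None
        print(f"Array API dispatch unavailable: {exc}")

    print(mse_default, mse_dispatch)

With NumPy inputs both calls compute the same value. Dispatch matters when the inputs come from another Array API compliant library.
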

**Functions:**

- :func:`sklearn.metrics.cluster.entropy`
  :pr:`29141` by :user:`Yaroslav Korobko `;
- :func:`sklearn.metrics.d2_tweedie_score`
  :pr:`29207` by :user:`Emily Chen `;
- :func:`sklearn.metrics.max_error`
  :pr:`29212` by :user:`Edoardo Abati `;
- :func:`sklearn.metrics.mean_absolute_error`
  :pr:`27736` by :user:`Edoardo Abati ` and :pr:`29143` by :user:`Tialo ` and
  :user:`Loïc Estève `;
- :func:`sklearn.metrics.mean_absolute_percentage_error`
  :pr:`29300` by :user:`Emily Chen `;
- :func:`sklearn.metrics.mean_gamma_deviance`
  :pr:`29239` by :user:`Emily Chen `;
- :func:`sklearn.metrics.mean_poisson_deviance`
  :pr:`29227` by :user:`Emily Chen `;
- :func:`sklearn.metrics.mean_squared_error`
  :pr:`29142` by :user:`Yaroslav Korobko `;
- :func:`sklearn.metrics.mean_tweedie_deviance`
  :pr:`28106` by :user:`Thomas Li `;
- :func:`sklearn.metrics.pairwise.additive_chi2_kernel`
  :pr:`29144` by :user:`Yaroslav Korobko `;
- :func:`sklearn.metrics.pairwise.chi2_kernel`
  :pr:`29267` by :user:`Yaroslav Korobko `;
- :func:`sklearn.metrics.pairwise.cosine_distances`
  :pr:`29265` by :user:`Emily Chen `;
- :func:`sklearn.metrics.pairwise.cosine_similarity`
  :pr:`29014` by :user:`Edoardo Abati `;
- :func:`sklearn.metrics.pairwise.euclidean_distances`
  :pr:`29433` by :user:`Omar Salman `;
- :func:`sklearn.metrics.pairwise.linear_kernel`
  :pr:`29475` by :user:`Omar Salman `;
- :func:`sklearn.metrics.pairwise.paired_cosine_distances`
  :pr:`29112` by :user:`Edoardo Abati `;
- :func:`sklearn.metrics.pairwise.paired_euclidean_distances`
  :pr:`29389` by :user:`Emily Chen `;
- :func:`sklearn.metrics.pairwise.polynomial_kernel`
  :pr:`29475` by :user:`Omar Salman `;
- :func:`sklearn.metrics.pairwise.rbf_kernel`
  :pr:`29433` by :user:`Omar Salman `;
- :func:`sklearn.metrics.pairwise.sigmoid_kernel`
  :pr:`29475` by :user:`Omar Salman `.

**Classes:**

- :class:`preprocessing.LabelEncoder` now supports Array API compatible inputs.
  :pr:`27381` by :user:`Omar Salman `.

- :class:`model_selection.GridSearchCV`,
  :class:`model_selection.RandomizedSearchCV`,
  :class:`model_selection.HalvingGridSearchCV` and
  :class:`model_selection.HalvingRandomSearchCV` now support Array API
  compatible inputs when their base estimators do.
  :pr:`27096` by :user:`Tim Head ` and :user:`Olivier Grisel `.

**Other:**

- Support for the soon-to-be-deprecated `cupy.array_api` module has been
  removed in favor of directly supporting the top-level `cupy` module, possibly
  via the `array_api_compat.cupy` compatibility wrapper.
  :pr:`29639` by :user:`Olivier Grisel `.

Metadata Routing
----------------

The following models now support metadata routing in one or more of their
methods. Refer to the :ref:`Metadata Routing User Guide ` for
more details.

- |Feature| :func:`model_selection.learning_curve` now supports metadata
  routing for the `fit` method of its estimator and for its underlying CV
  splitter and scorer.
  :pr:`28975` by :user:`Stefanie Senger `.

- |Feature| :class:`ensemble.StackingClassifier` and
  :class:`ensemble.StackingRegressor` now support metadata routing and pass
  ``**fit_params`` to the underlying estimators via their `fit` methods.
  :pr:`28701` by :user:`Stefanie Senger `.

- |Feature| :class:`compose.TransformedTargetRegressor` now supports metadata
  routing in its `fit` and `predict` methods and routes the corresponding
  params to the underlying regressor.
  :pr:`29136` by :user:`Omar Salman `.

- |Feature| :class:`feature_selection.SequentialFeatureSelector` now supports
  metadata routing in its `fit` method and passes the corresponding params to
  the :func:`model_selection.cross_val_score` function.
  :pr:`29260` by :user:`Omar Salman `.

- |Feature| :func:`model_selection.validation_curve` now supports metadata
  routing for the `fit` method of its estimator and for its underlying CV
  splitter and scorer.
  :pr:`29329` by :user:`Stefanie Senger `.

- |Feature| :class:`semi_supervised.SelfTrainingClassifier` now supports
  metadata routing. The `fit` method now accepts ``**fit_params``, which are
  passed to the underlying estimator via its `fit` method. In addition, the
  `predict`, `predict_proba`, `predict_log_proba`, `score` and
  `decision_function` methods also accept ``**params``, which are passed to the
  underlying estimator via its respective methods.
  :pr:`28494` by :user:`Adam Li `.

- |Feature| :func:`model_selection.permutation_test_score` now supports
  metadata routing for the `fit` method of its estimator and for its
  underlying CV splitter and scorer.
  :pr:`29266` by :user:`Adam Li `.

- |Feature| :class:`feature_selection.RFE` and :class:`feature_selection.RFECV`
  now support metadata routing.
  :pr:`29312` by :user:`Omar Salman `.

Dropping support for building with setuptools
---------------------------------------------

From scikit-learn 1.6 onwards, support for building with setuptools has been
removed. Meson is now the only supported way to build scikit-learn; see
:ref:`Building from source ` for more details.

:pr:`29400` by :user:`Loïc Estève `.

Dropping official support for PyPy
----------------------------------

Due to limited maintainer resources and a small number of users, official PyPy
support has been dropped. Some parts of scikit-learn may still work, but PyPy is
no longer tested in the scikit-learn continuous integration.

:pr:`29128` by :user:`Loïc Estève `.

Changelog
---------

.. Entries should be grouped by module (in alphabetical order) and prefixed with
   one of the labels: |MajorFeature|, |Feature|, |Efficiency|, |Enhancement|,
   |Fix| or |API| (see whats_new.rst for descriptions).
   Entries should be ordered by those labels (e.g. |Fix| after |Efficiency|).
   Changes not specific to a module should be listed under *Multiple Modules*
   or *Miscellaneous*.
   Entries should end with:
   :pr:`123456` by :user:`Joe Bloggs `.
   where 123456 is the *pull request* number, not the issue number.

:mod:`sklearn.base`
...................

- |Enhancement| Added :func:`base.is_clusterer`, which determines whether a
  given estimator is a clusterer.
  :pr:`28936` by :user:`Christian Veenhuis `.

:mod:`sklearn.cluster`
......................

- |API| The `copy` parameter of :class:`cluster.Birch` was deprecated in 1.6
  and will be removed in 1.8. It has no effect as the estimator does not
  perform in-place operations on the input data.
  :pr:`29124` by :user:`Yao Xiao `.

:mod:`sklearn.compose`
......................

- |Enhancement| The `verbose_feature_names_out` parameter of
  :class:`compose.ColumnTransformer` now accepts a format string or a callable
  to generate feature names.
  :pr:`28934` by :user:`Marc Bresson `.

:mod:`sklearn.cross_decomposition`
..................................

- |Fix| :class:`cross_decomposition.PLSRegression` now properly raises an error
  when `n_components` is larger than `n_samples`.
  :pr:`29710` by `Thomas Fan`_.

:mod:`sklearn.datasets`
.......................

- |Feature| :func:`datasets.fetch_file` allows downloading an arbitrary data
  file from the web. It handles local caching, integrity checks with SHA256
  digests and automatic retries in case of HTTP errors.
  :pr:`29354` by :user:`Olivier Grisel `.

:mod:`sklearn.decomposition`
............................

- |Fix| Increased the rank-deficiency threshold in the whitening step of
  :class:`decomposition.FastICA` with `whiten_solver="eigh"` to make the
  estimator's results more consistent across platforms.
  :pr:`29612` by :user:`Olivier Grisel `.

:mod:`sklearn.discriminant_analysis`
....................................

- |Fix| :class:`discriminant_analysis.QuadraticDiscriminantAnalysis` now emits
  a `LinAlgWarning` in case of collinear variables. These warnings can be
  silenced using the `reg_param` parameter.
  :pr:`19731` by :user:`Alihan Zihna `.

:mod:`sklearn.ensemble`
.......................

- |Efficiency| Small runtime improvement when fitting
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor`, obtained by parallelizing
  the initial search for bin thresholds.
  :pr:`28064` by :user:`Christian Lorentzen `.

- |Enhancement| The verbosity of :class:`ensemble.HistGradientBoostingClassifier`
  and :class:`ensemble.HistGradientBoostingRegressor` can now be controlled more
  granularly: `verbose=1` prints only summary messages, while `verbose >= 2`
  prints the full information as before.
  :pr:`28179` by :user:`Christian Lorentzen `.

- |Efficiency| :class:`ensemble.IsolationForest` now runs parallel jobs during
  :term:`predict` using `joblib`, offering a speedup of up to 2-4x for sample
  sizes larger than 2000.
  :pr:`28622` by :user:`Adam Li ` and :user:`Sérgio Pereira `.

- |Feature| :class:`ensemble.ExtraTreesClassifier` and
  :class:`ensemble.ExtraTreesRegressor` now support missing values in the data
  matrix `X`. Missing values are handled by randomly sending all samples with
  missing values to either the left or the right child node as the tree is
  traversed.
  :pr:`28268` by :user:`Adam Li `.

:mod:`sklearn.impute`
.....................

- |Fix| :class:`impute.KNNImputer` now excludes samples with nan distances when
  computing the mean value for uniform weights.
  :pr:`29135` by :user:`Xuefeng Xu `.

:mod:`sklearn.linear_model`
...........................

- |Fix| :class:`linear_model.LogisticRegressionCV` corrects sample weight
  handling for the calculation of test scores.
  :pr:`29419` by :user:`Shruti Nath `.

- |API| Deprecated `copy_X` in :class:`linear_model.TheilSenRegressor` as the
  parameter has no effect. `copy_X` will be removed in 1.8.
  :pr:`29105` by :user:`Adam Li `.

:mod:`sklearn.manifold`
.......................

- |Efficiency| :func:`manifold.locally_linear_embedding` and
  :class:`manifold.LocallyLinearEmbedding` now allocate memory for the sparse
  matrices used by the Hessian, Modified and LTSA methods more efficiently.
  :pr:`28096` by :user:`Giorgio Angelotti `.
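
The manifold entry above concerns the Hessian, Modified and LTSA variants of locally linear embedding. The following minimal sketch fits each of them on the S-curve toy dataset, using the same settings as the scikit-learn manifold comparison example. It assumes that passing `eigen_solver="arpack"` exercises the sparse-matrix code path affected by this change. The sketch only shows usage; it does not measure the memory savings.

.. code-block:: python

    from sklearn.datasets import make_s_curve
    from sklearn.manifold import LocallyLinearEmbedding

    # Toy 3-D manifold with 1500 points.
    X, _ = make_s_curve(n_samples=1500, random_state=0)

    embeddings = {}
    for method in ("hessian", "modified", "ltsa"):
        # Hessian LLE requires n_neighbors > n_components * (n_components + 3) / 2,
        # i.e. more than 5 neighbors for a 2-D embedding.
        lle = LocallyLinearEmbedding(
            n_neighbors=12,
            n_components=2,
            method=method,
            eigen_solver="arpack",  # assumption: the sparse-matrix code path
            random_state=0,  # seeds the ARPACK starting vector
        )
        embeddings[method] = lle.fit_transform(X)
        print(f"{method}: shape={embeddings[method].shape}, "
              f"reconstruction_error={lle.reconstruction_error_:.2e}")

Fixing `random_state` makes the ARPACK initialization reproducible across runs.
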

:mod:`sklearn.metrics`
......................

- |Enhancement| :func:`sklearn.metrics.check_scoring` now accepts `raise_exc`
  to specify whether to raise an exception if a subset of the scorers in
  multimetric scoring fails or to return an error code.
  :pr:`28992` by :user:`Stefanie Senger `.

- |Enhancement| Added the `zero_division` parameter to
  :func:`metrics.cohen_kappa_score`. When a division by zero occurs, the metric
  is undefined and this value is returned instead.
  :pr:`29210` by :user:`Marc Torrellas Socastro ` and
  :user:`Stefanie Senger `.

- |Efficiency| :func:`sklearn.metrics.classification_report` is now faster by
  caching classification labels.
  :pr:`29738` by `Adrin Jalali`_.

- |API| `scoring="neg_max_error"` should be used instead of
  `scoring="max_error"`, which is now deprecated.
  :pr:`29462` by :user:`Farid "Freddie" Taba `.

- |API| The `force_all_finite` parameter of functions
  :func:`metrics.pairwise.check_pairwise_arrays` and
  :func:`metrics.pairwise_distances` has been renamed to `ensure_all_finite`.
  `force_all_finite` will be removed in 1.8.
  :pr:`29404` by :user:`Jérémie du Boisberranger `.

:mod:`sklearn.model_selection`
..............................

- |Enhancement| Added the parameter `prefit` to
  :class:`model_selection.FixedThresholdClassifier`, allowing the use of a
  pre-fitted estimator without re-fitting it.
  :pr:`29067` by :user:`Guillaume Lemaitre `.

- |Fix| Improved the error message when
  :meth:`model_selection.RepeatedStratifiedKFold.split` is called without a `y`
  argument.
  :pr:`29402` by :user:`Anurag Varma `.

:mod:`sklearn.neighbors`
........................

- |Fix| :class:`neighbors.LocalOutlierFactor` raises a warning in the `fit`
  method when duplicate values in the training data lead to inaccurate outlier
  detection.
  :pr:`28773` by :user:`Henrique Caroço `.

:mod:`sklearn.preprocessing`
............................

- |Enhancement| The HTML representation of
  :class:`preprocessing.FunctionTransformer` now shows the function name in its
  label.
  :pr:`29158` by :user:`Yao Xiao `.

- |Fix| :class:`preprocessing.PowerTransformer` now uses
  `scipy.special.inv_boxcox` to output `nan` if the input to the inverse
  Box-Cox transformation is invalid.
  :pr:`27875` by :user:`Xuefeng Xu `.

:mod:`sklearn.semi_supervised`
..............................

- |API| :class:`semi_supervised.SelfTrainingClassifier` deprecated the
  `base_estimator` parameter in favor of `estimator`.
  :pr:`28494` by :user:`Adam Li `.

:mod:`sklearn.tree`
...................

- |Feature| :class:`tree.ExtraTreeClassifier` and
  :class:`tree.ExtraTreeRegressor` now support missing values in the data
  matrix ``X``. Missing values are handled by randomly sending all samples with
  missing values to either the left or the right child node as the tree is
  traversed.
  :pr:`27966` by :user:`Adam Li `.

:mod:`sklearn.utils`
....................

- |Enhancement| :func:`utils.validation.check_array` now accepts
  `ensure_non_negative` to check for negative values in the passed array, which
  was previously only possible by calling
  :func:`utils.validation.check_non_negative`.
  :pr:`29540` by :user:`Tamara Atanasoska `.

- |API| The `force_all_finite` parameter of functions :func:`utils.check_array`,
  :func:`utils.check_X_y` and :func:`utils.as_float_array` has been renamed to
  `ensure_all_finite`. `force_all_finite` will be removed in 1.8.
  :pr:`29404` by :user:`Jérémie du Boisberranger `.

.. rubric:: Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of
the project since version 1.5, including:

TODO: update at the time of the release.