.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _release_notes_1_6:

===========
Version 1.6
===========

For a short description of the main highlights of the release, please refer to
:ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_6_0.py`.

.. include:: changelog_legend.inc

.. towncrier release notes start

.. _changes_1_6_0:

Version 1.6.0
=============

**December 2024**

Changes impacting many modules
------------------------------

- |Enhancement| `__sklearn_tags__` was introduced for setting tags in estimators.
  See :ref:`estimator_tags` for more details.
  By :user:`Thomas Fan ` and :user:`Adrin Jalali ` :pr:`29677`

- |Enhancement| Scikit-learn classes and functions can be used with only an
  `import sklearn` import line. For example, `import sklearn; sklearn.svm.SVC()`
  now works.
  By :user:`Thomas Fan ` :pr:`29793`

- |Fix| Classes :class:`metrics.ConfusionMatrixDisplay`,
  :class:`metrics.RocCurveDisplay`, :class:`calibration.CalibrationDisplay`,
  :class:`metrics.PrecisionRecallDisplay`, :class:`metrics.PredictionErrorDisplay`
  and :class:`inspection.PartialDependenceDisplay` now properly handle Matplotlib
  aliases for style parameters (e.g., `c` and `color`, `ls` and `linestyle`, etc.).
  By :user:`Joseph Barbier ` :pr:`30023`

- |API| :func:`utils.validation.validate_data` is introduced and replaces the
  previously private `base.BaseEstimator._validate_data` method. It is intended
  for third-party estimator developers, who should use this function in most
  cases instead of :func:`utils.check_array` and :func:`utils.check_X_y`.
  By :user:`Adrin Jalali ` :pr:`29696`

Support for Array API
---------------------

Additional estimators and functions have been updated to include support for all
`Array API `_ compliant inputs.

See :ref:`array_api` for more details.
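
As a rough illustration of this opt-in dispatch, the sketch below computes
:func:`sklearn.metrics.f1_score` while `array_api_dispatch=True` is set through
:func:`sklearn.config_context`. It assumes scikit-learn 1.6 or later and plain
NumPy inputs (any Array API compliant library could be substituted), and it
assumes that SciPy's own Array API support must be switched on with the
`SCIPY_ARRAY_API=1` environment variable before SciPy is imported. The helper
`f1_with_array_api_dispatch` is hypothetical, not part of scikit-learn.

.. code-block:: python

    import os

    # Assumption: SciPy's own Array API support must be enabled through this
    # environment variable before SciPy (and thus scikit-learn) is imported.
    os.environ.setdefault("SCIPY_ARRAY_API", "1")

    import numpy as np
    from sklearn import config_context
    from sklearn.metrics import f1_score


    def f1_with_array_api_dispatch(y_true, y_pred):
        """Hypothetical helper: compute F1 with Array API dispatch if possible.

        Returns the score and whether dispatch could actually be enabled.
        """
        try:
            # Dispatch is opt-in: while enabled, supported functions operate in
            # the array namespace of their inputs (NumPy here).
            with config_context(array_api_dispatch=True):
                return float(f1_score(y_true, y_pred)), True
        except (ImportError, RuntimeError):
            # Optional requirements for dispatch are missing: default code path.
            return float(f1_score(y_true, y_pred)), False


    y_true = np.asarray([0, 1, 1, 0, 1])
    y_pred = np.asarray([0, 1, 0, 0, 1])
    score, dispatched = f1_with_array_api_dispatch(y_true, y_pred)
    print(f"F1 = {score:.2f}, array API dispatch used: {dispatched}")

The fallback branch only exists so that the sketch also runs where the extra
requirements for dispatch are not installed.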

- |Feature| :class:`model_selection.GridSearchCV`,
  :class:`model_selection.RandomizedSearchCV`,
  :class:`model_selection.HalvingGridSearchCV` and
  :class:`model_selection.HalvingRandomSearchCV` now support Array API
  compatible inputs when their base estimators do.
  By :user:`Tim Head ` and :user:`Olivier Grisel ` :pr:`27096`

- |Feature| :func:`sklearn.metrics.f1_score` now supports Array API compatible
  inputs.
  By :user:`Omar Salman ` :pr:`27369`

- |Feature| :class:`preprocessing.LabelEncoder` now supports Array API
  compatible inputs.
  By :user:`Omar Salman ` :pr:`27381`

- |Feature| :func:`sklearn.metrics.mean_absolute_error` now supports Array API
  compatible inputs.
  By :user:`Edoardo Abati ` :pr:`27736`

- |Feature| :func:`sklearn.metrics.mean_tweedie_deviance` now supports Array API
  compatible inputs.
  By :user:`Thomas Li ` :pr:`28106`

- |Feature| :func:`sklearn.metrics.pairwise.cosine_similarity` now supports
  Array API compatible inputs.
  By :user:`Edoardo Abati ` :pr:`29014`

- |Feature| :func:`sklearn.metrics.pairwise.paired_cosine_distances` now
  supports Array API compatible inputs.
  By :user:`Edoardo Abati ` :pr:`29112`

- |Feature| :func:`sklearn.metrics.cluster.entropy` now supports Array API
  compatible inputs.
  By :user:`Yaroslav Korobko ` :pr:`29141`

- |Feature| :func:`sklearn.metrics.mean_squared_error` now supports Array API
  compatible inputs.
  By :user:`Yaroslav Korobko ` :pr:`29142`

- |Feature| :func:`sklearn.metrics.pairwise.additive_chi2_kernel` now supports
  Array API compatible inputs.
  By :user:`Yaroslav Korobko ` :pr:`29144`

- |Feature| :func:`sklearn.metrics.d2_tweedie_score` now supports Array API
  compatible inputs.
  By :user:`Emily Chen ` :pr:`29207`

- |Feature| :func:`sklearn.metrics.max_error` now supports Array API compatible
  inputs.
  By :user:`Edoardo Abati ` :pr:`29212`

- |Feature| :func:`sklearn.metrics.mean_poisson_deviance` now supports Array API
  compatible inputs.
  By :user:`Emily Chen ` :pr:`29227`

- |Feature| :func:`sklearn.metrics.mean_gamma_deviance` now supports Array API
  compatible inputs.
  By :user:`Emily Chen ` :pr:`29239`

- |Feature| :func:`sklearn.metrics.pairwise.cosine_distances` now supports
  Array API compatible inputs.
  By :user:`Emily Chen ` :pr:`29265`

- |Feature| :func:`sklearn.metrics.pairwise.chi2_kernel` now supports Array API
  compatible inputs.
  By :user:`Yaroslav Korobko ` :pr:`29267`

- |Feature| :func:`sklearn.metrics.mean_absolute_percentage_error` now supports
  Array API compatible inputs.
  By :user:`Emily Chen ` :pr:`29300`

- |Feature| :func:`sklearn.metrics.pairwise.paired_euclidean_distances` now
  supports Array API compatible inputs.
  By :user:`Emily Chen ` :pr:`29389`

- |Feature| :func:`sklearn.metrics.pairwise.euclidean_distances` and
  :func:`sklearn.metrics.pairwise.rbf_kernel` now support Array API compatible
  inputs.
  By :user:`Omar Salman ` :pr:`29433`

- |Feature| :func:`sklearn.metrics.pairwise.linear_kernel`,
  :func:`sklearn.metrics.pairwise.sigmoid_kernel`, and
  :func:`sklearn.metrics.pairwise.polynomial_kernel` now support Array API
  compatible inputs.
  By :user:`Omar Salman ` :pr:`29475`

- |Feature| :func:`sklearn.metrics.mean_squared_log_error` and
  :func:`sklearn.metrics.root_mean_squared_log_error` now support Array API
  compatible inputs.
  By :user:`Virgil Chan ` :pr:`29709`

- |Feature| :class:`preprocessing.MinMaxScaler` with `clip=True` now supports
  Array API compatible inputs.
  By :user:`Shreekant Nandiyawar ` :pr:`29751`

- Support for the soon-to-be-deprecated `cupy.array_api` module has been removed
  in favor of directly supporting the top-level `cupy` module, possibly via the
  `array_api_compat.cupy` compatibility wrapper.
  By :user:`Olivier Grisel ` :pr:`29639`

Metadata routing
----------------

Refer to the :ref:`Metadata Routing User Guide ` for more details.

- |Feature| :class:`semi_supervised.SelfTrainingClassifier` now supports
  metadata routing.
  Its `fit` method now accepts ``**fit_params``, which are passed to the
  underlying estimators via their `fit` methods.
  In addition, the :meth:`~semi_supervised.SelfTrainingClassifier.predict`,
  :meth:`~semi_supervised.SelfTrainingClassifier.predict_proba`,
  :meth:`~semi_supervised.SelfTrainingClassifier.predict_log_proba`,
  :meth:`~semi_supervised.SelfTrainingClassifier.score` and
  :meth:`~semi_supervised.SelfTrainingClassifier.decision_function` methods
  also accept ``**params``, which are passed to the underlying estimators via
  their respective methods.
  By :user:`Adam Li ` :pr:`28494`

- |Feature| :class:`ensemble.StackingClassifier` and
  :class:`ensemble.StackingRegressor` now support metadata routing and pass
  ``**fit_params`` to the underlying estimators via their `fit` methods.
  By :user:`Stefanie Senger ` :pr:`28701`

- |Feature| :func:`model_selection.learning_curve` now supports metadata routing
  for the `fit` method of its estimator and for its underlying CV splitter and
  scorer.
  By :user:`Stefanie Senger ` :pr:`28975`

- |Feature| :class:`compose.TransformedTargetRegressor` now supports metadata
  routing in its :meth:`~compose.TransformedTargetRegressor.fit` and
  :meth:`~compose.TransformedTargetRegressor.predict` methods and routes the
  corresponding params to the underlying regressor.
  By :user:`Omar Salman ` :pr:`29136`

- |Feature| :class:`feature_selection.SequentialFeatureSelector` now supports
  metadata routing in its `fit` method and passes the corresponding params to
  the :func:`model_selection.cross_val_score` function.
  By :user:`Omar Salman ` :pr:`29260`

- |Feature| :func:`model_selection.permutation_test_score` now supports metadata
  routing for the `fit` method of its estimator and for its underlying CV
  splitter and scorer.
  By :user:`Adam Li ` :pr:`29266`

- |Feature| :class:`feature_selection.RFE` and :class:`feature_selection.RFECV`
  now support metadata routing.
  By :user:`Omar Salman ` :pr:`29312`

- |Feature| :func:`model_selection.validation_curve` now supports metadata
  routing for the `fit` method of its estimator and for its underlying CV
  splitter and scorer.
  By :user:`Stefanie Senger ` :pr:`29329`

- |Fix| Metadata is now routed correctly to grouped CV splitters via
  :class:`linear_model.RidgeCV` and :class:`linear_model.RidgeClassifierCV`, and
  the `UnsetMetadataPassedError` raised by :class:`linear_model.RidgeClassifierCV`
  with default scoring is fixed.
  By :user:`Stefanie Senger ` :pr:`29634`

- |Fix| Many method arguments which shouldn't be included in the routing
  mechanism are now excluded, and the `set_{method}_request` methods are no
  longer generated for them.
  By `Adrin Jalali`_ :pr:`29920`

Dropping official support for PyPy
----------------------------------

Due to limited maintainer resources and a small number of users, official PyPy
support has been dropped. Some parts of scikit-learn may still work, but PyPy is
no longer tested in the scikit-learn Continuous Integration.
By :user:`Loïc Estève ` :pr:`29128`

Dropping support for building with setuptools
---------------------------------------------

From scikit-learn 1.6 onwards, support for building with setuptools has been
removed. Meson is now the only supported way to build scikit-learn; see
:ref:`Building from source ` for more details.
By :user:`Loïc Estève ` :pr:`29400`

Free-threaded CPython 3.13 support
----------------------------------

scikit-learn has preliminary support for free-threaded CPython; in particular,
free-threaded wheels are available for all of our supported platforms.

Free-threaded (also known as nogil) CPython 3.13 is an experimental version of
CPython 3.13 which aims to enable efficient multi-threaded use cases by removing
the Global Interpreter Lock (GIL).

For more details about free-threaded CPython see `py-free-threading doc `_,
in particular `how to install a free-threaded CPython `_ and
`Ecosystem compatibility tracking `_.
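
Before trying scikit-learn on free-threaded CPython, it can help to confirm
which interpreter is actually running. The sketch below uses only the standard
library. It assumes that the `Py_GIL_DISABLED` build variable is set on
free-threaded builds and relies on `sys._is_gil_enabled`, a private CPython 3.13
helper, so treat it as a convenience check rather than a supported API. The
function name `free_threading_status` is hypothetical.

.. code-block:: python

    import sys
    import sysconfig


    def free_threading_status():
        """Hypothetical helper: report (free-threaded build?, GIL enabled now?)."""
        # Py_GIL_DISABLED is 1 on free-threaded builds (e.g. python3.13t)
        # and 0 or None on regular builds.
        free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
        # sys._is_gil_enabled() is a private helper added in CPython 3.13;
        # older interpreters lack it and always run with the GIL.
        is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
        return free_threaded_build, bool(is_gil_enabled())


    build, gil = free_threading_status()
    print(f"free-threaded build: {build}, GIL enabled at runtime: {gil}")

A free-threaded build can still report the GIL as enabled, for instance when an
imported extension module has not declared support for running without it.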

Feel free to try free-threaded CPython on your use case and report any issues!

By :user:`Loïc Estève ` and many other people in the wider Scientific Python
and CPython ecosystem, for example :user:`Nathan Goldbaum `,
:user:`Ralf Gommers `, :user:`Edgar Andrés Margffoy Tuay `.
:pr:`30360`

:mod:`sklearn.base`
-------------------

- |Enhancement| Added a function :func:`base.is_clusterer` which determines
  whether a given estimator is a clusterer.
  By :user:`Christian Veenhuis ` :pr:`28936`

- |API| Passing a class object to :func:`~sklearn.base.is_classifier`,
  :func:`~sklearn.base.is_regressor`, and :func:`~sklearn.base.is_outlier_detector`
  is now deprecated. Pass an instance instead.
  By `Adrin Jalali`_ :pr:`30122`

:mod:`sklearn.calibration`
--------------------------

- |API| `cv="prefit"` is deprecated for
  :class:`~sklearn.calibration.CalibratedClassifierCV`. Use
  :class:`~sklearn.frozen.FrozenEstimator` instead, as
  `CalibratedClassifierCV(FrozenEstimator(estimator))`.
  By `Adrin Jalali`_ :pr:`30171`

:mod:`sklearn.cluster`
----------------------

- |API| The `copy` parameter of :class:`cluster.Birch` was deprecated in 1.6 and
  will be removed in 1.8. It has no effect as the estimator does not perform
  in-place operations on the input data.
  By :user:`Yao Xiao ` :pr:`29124`

:mod:`sklearn.compose`
----------------------

- |Enhancement| The `verbose_feature_names_out` parameter of
  :class:`sklearn.compose.ColumnTransformer` now accepts a format string or a
  callable to generate feature names.
  By :user:`Marc Bresson ` :pr:`28934`

:mod:`sklearn.covariance`
-------------------------

- |Efficiency| :class:`covariance.MinCovDet` fitting is now slightly faster.
  By :user:`Antony Lee ` :pr:`29835`

:mod:`sklearn.cross_decomposition`
----------------------------------

- |Fix| :class:`cross_decomposition.PLSRegression` properly raises an error when
  `n_components` is larger than `n_samples`.
  By :user:`Thomas Fan ` :pr:`29710`

:mod:`sklearn.datasets`
-----------------------

- |Feature| :func:`datasets.fetch_file` allows downloading arbitrary data files
  from the web. It handles local caching, integrity checks with SHA256 digests
  and automatic retries in case of HTTP errors.
  By :user:`Olivier Grisel ` :pr:`29354`

:mod:`sklearn.decomposition`
----------------------------

- |Enhancement| :class:`~sklearn.decomposition.LatentDirichletAllocation` now has
  a ``normalize`` parameter in the
  :meth:`~sklearn.decomposition.LatentDirichletAllocation.transform` and
  :meth:`~sklearn.decomposition.LatentDirichletAllocation.fit_transform` methods
  to control whether the document topic distribution is normalized.
  By `Adrin Jalali`_ :pr:`30097`

- |Fix| :class:`~sklearn.decomposition.IncrementalPCA` will now only raise a
  ``ValueError`` when the number of samples in the input data to ``partial_fit``
  is less than the number of components on the first call to ``partial_fit``.
  Subsequent calls to ``partial_fit`` no longer face this restriction.
  By :user:`Thomas Gessey-Jones ` :pr:`30224`

:mod:`sklearn.discriminant_analysis`
------------------------------------

- |Fix| :class:`discriminant_analysis.QuadraticDiscriminantAnalysis` will now
  issue a `LinAlgWarning` in case of collinear variables. These warnings can be
  silenced using the `reg_param` parameter.
  By :user:`Alihan Zihna ` :pr:`19731`

:mod:`sklearn.ensemble`
-----------------------

- |Feature| :class:`ensemble.ExtraTreesClassifier` and
  :class:`ensemble.ExtraTreesRegressor` now support missing values in the data
  matrix `X`. Missing values are handled by randomly moving all of the samples
  to the left or right child node as the tree is traversed.
  By :user:`Adam Li ` :pr:`28268`

- |Efficiency| Small runtime improvement of fitting
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` by parallelizing the initial
  search for bin thresholds.
  By :user:`Christian Lorentzen ` :pr:`28064`

- |Efficiency| :class:`ensemble.IsolationForest` now runs parallel jobs with
  `joblib` during :term:`predict`, offering a speedup of up to 2-4x on sample
  sizes larger than 2000.
  By :user:`Adam Li ` and :user:`Sérgio Pereira ` :pr:`28622`

- |Enhancement| The verbosity of :class:`ensemble.HistGradientBoostingClassifier`
  and :class:`ensemble.HistGradientBoostingRegressor` can now be controlled more
  granularly: `verbose = 1` prints only summary messages, while `verbose >= 2`
  prints the full information as before.
  By :user:`Christian Lorentzen ` :pr:`28179`

- |API| The parameter `algorithm` of :class:`ensemble.AdaBoostClassifier` is
  deprecated and will be removed in 1.8.
  By :user:`Jérémie du Boisberranger ` :pr:`29997`

:mod:`sklearn.feature_extraction`
---------------------------------

- |Fix| :class:`feature_extraction.text.TfidfVectorizer` now correctly preserves
  the `dtype` of `idf_` based on the input data.
  By :user:`Guillaume Lemaitre ` :pr:`30022`

:mod:`sklearn.frozen`
---------------------

- |MajorFeature| :class:`~sklearn.frozen.FrozenEstimator` is now introduced,
  which allows freezing an estimator. This means calling `.fit` on it has no
  effect, and `clone(frozenestimator)` returns the same estimator instead of an
  unfitted clone.
  By `Adrin Jalali`_ :pr:`29705`

:mod:`sklearn.impute`
---------------------

- |Fix| :class:`impute.KNNImputer` excludes samples with nan distances when
  computing the mean value for uniform weights.
  By :user:`Xuefeng Xu ` :pr:`29135`

- |Fix| When `min_value` and `max_value` are array-like and some features are
  dropped due to `keep_empty_features=False`, :class:`impute.IterativeImputer`
  no longer raises an error and now indexes correctly.
  By :user:`Guntitat Sawadwuthikul ` :pr:`29451`

- |Fix| Fixed :class:`impute.IterativeImputer` to make sure that it does not
  skip the iterative process when `keep_empty_features` is set to `True`.
  By :user:`Arif Qodari ` :pr:`29779`

- |API| Add a warning in :class:`impute.SimpleImputer` when
  `keep_empty_features=False` and `strategy="constant"`. In this case empty
  features are not dropped and this behaviour will change in 1.8.
  By :user:`Arthur Courselle ` and :user:`Simon Riou ` :pr:`29950`

:mod:`sklearn.linear_model`
---------------------------

- |Enhancement| The `solver="newton-cholesky"` in
  :class:`linear_model.LogisticRegression` and
  :class:`linear_model.LogisticRegressionCV` is extended to support the full
  multinomial loss in a multiclass setting.
  By :user:`Christian Lorentzen ` :pr:`28840`

- |Fix| In :class:`linear_model.Ridge` and :class:`linear_model.RidgeCV`, after
  `fit`, the `coef_` attribute is now of shape `(n_features,)` like other linear
  models.
  By :user:`Maxwell Liu`, `Guillaume Lemaitre`_, and `Adrin Jalali`_ :pr:`19746`

- |Fix| :class:`linear_model.LogisticRegressionCV` corrects sample weight
  handling for the calculation of test scores.
  By :user:`Shruti Nath ` :pr:`29419`

- |Fix| :class:`linear_model.LassoCV` and :class:`linear_model.ElasticNetCV` now
  take sample weights into account to define the search grid for the internally
  tuned `alpha` hyper-parameter.
  By :user:`John Hopfensperger ` and :user:`Shruti Nath ` :pr:`29442`

- |Fix| :class:`linear_model.LogisticRegression`,
  :class:`linear_model.PoissonRegressor`, :class:`linear_model.GammaRegressor`
  and :class:`linear_model.TweedieRegressor` now take sample weights into
  account to decide when to fall back to `solver='lbfgs'` whenever
  `solver='newton-cholesky'` becomes numerically unstable.
  By :user:`Antoine Baker ` :pr:`29818`

- |Fix| :class:`linear_model.RidgeCV` now properly uses predictions on the same
  scale as the target seen during `fit`. These predictions are stored in
  `cv_results_` when `scoring != None`. Previously, the predictions were
  rescaled by the square root of the sample weights and offset by the mean of
  the target, leading to an incorrect estimate of the score.
  By :user:`Guillaume Lemaitre `, :user:`Jérôme Dockès ` and
  :user:`Hanmin Qin ` :pr:`29842`

- |Fix| :class:`linear_model.RidgeCV` now properly supports custom multioutput
  scorers by letting the scorer manage the multioutput averaging. Previously,
  the predictions and true targets were both squeezed to a 1D array before
  computing the error.
  By :user:`Guillaume Lemaitre ` :pr:`29884`

- |Fix| :class:`linear_model.LinearRegression` now sets the `cond` parameter
  when calling the `scipy.linalg.lstsq` solver on dense input data. This ensures
  more numerically robust results on rank-deficient data. In particular, it
  empirically fixes the expected equivalence property between fitting with
  reweighted or with repeated data points.
  By :user:`Antoine Baker ` :pr:`30040`

- |Fix| :class:`linear_model.LogisticRegression` and other linear models that
  accept `solver="newton-cholesky"` now report the correct number of iterations
  when they fall back to the `"lbfgs"` solver because of a rank-deficient
  Hessian matrix.
  By :user:`Olivier Grisel ` :pr:`30100`

- |Fix| :class:`~sklearn.linear_model.SGDOneClassSVM` now correctly inherits
  from :class:`~sklearn.base.OutlierMixin` and the tags are correctly set.
  By :user:`Guillaume Lemaitre ` :pr:`30227`

- |API| Deprecates `copy_X` in :class:`linear_model.TheilSenRegressor` as the
  parameter has no effect. `copy_X` will be removed in 1.8.
  By :user:`Adam Li ` :pr:`29105`

:mod:`sklearn.manifold`
-----------------------

- |Efficiency| :func:`manifold.locally_linear_embedding` and
  :class:`manifold.LocallyLinearEmbedding` now allocate the memory of sparse
  matrices more efficiently in the Hessian, Modified and LTSA methods.
  By :user:`Giorgio Angelotti ` :pr:`28096`

:mod:`sklearn.metrics`
----------------------

- |Efficiency| :func:`sklearn.metrics.classification_report` is now faster by
  caching classification labels.
  By :user:`Adrin Jalali ` :pr:`29738`

- |Enhancement| :meth:`metrics.RocCurveDisplay.from_estimator`,
  :meth:`metrics.RocCurveDisplay.from_predictions`,
  :meth:`metrics.PrecisionRecallDisplay.from_estimator`, and
  :meth:`metrics.PrecisionRecallDisplay.from_predictions` now accept a new
  keyword `despine` to remove the top and right spines of the plot in order to
  make it clearer.
  By :user:`Yao Xiao ` :pr:`26367`

- |Enhancement| :func:`sklearn.metrics.check_scoring` now accepts `raise_exc` to
  specify whether to raise an exception if a subset of the scorers in
  multimetric scoring fails or to return an error code.
  By :user:`Stefanie Senger ` :pr:`28992`

- |Fix| :func:`metrics.roc_auc_score` will now correctly return np.nan and warn
  the user if only one class is present in the labels.
  By :user:`Gleb Levitski ` and :user:`Janez Demšar ` :pr:`27412`,
  :pr:`30013`

- |Fix| The functions :func:`metrics.mean_squared_log_error` and
  :func:`metrics.root_mean_squared_log_error` now check whether the inputs are
  within the correct domain for the function :math:`y=\log(1+x)`, rather than
  :math:`y=\log(x)`. The functions :func:`metrics.mean_absolute_error`,
  :func:`metrics.mean_absolute_percentage_error`,
  :func:`metrics.mean_squared_error` and
  :func:`metrics.root_mean_squared_error` now explicitly check whether a scalar
  will be returned when `multioutput="uniform_average"`.
  By :user:`Virgil Chan ` :pr:`29709`

- |API| The `force_all_finite` parameter of functions
  :func:`metrics.pairwise.check_pairwise_arrays` and
  :func:`metrics.pairwise_distances` is renamed to `ensure_all_finite`.
  `force_all_finite` will be removed in 1.8.
  By :user:`Jérémie du Boisberranger ` :pr:`29404`

- |API| `scoring="neg_max_error"` should be used instead of
  `scoring="max_error"`, which is now deprecated.
  By :user:`Farid "Freddie" Taba ` :pr:`29462`

- |API| The default value of the `response_method` parameter of
  :func:`metrics.make_scorer` will change from `None` to `"predict"` and `None`
  will be removed in 1.8.
  In the meantime, `None` is equivalent to `"predict"`.
  By :user:`Jérémie du Boisberranger ` :pr:`30001`

:mod:`sklearn.model_selection`
------------------------------

- |Enhancement| :class:`~model_selection.GroupKFold` now has the ability to
  shuffle groups into different folds when `shuffle=True`.
  By :user:`Zachary Vealey ` :pr:`28519`

- |Enhancement| There is no need to call `fit` on a
  :class:`~sklearn.model_selection.FixedThresholdClassifier` if the underlying
  estimator is already fitted.
  By :user:`Adrin Jalali ` :pr:`30172`

- |Fix| Improved the error message raised when
  :meth:`model_selection.RepeatedStratifiedKFold.split` is called without a `y`
  argument.
  By :user:`Anurag Varma ` :pr:`29402`

:mod:`sklearn.neighbors`
------------------------

- |Enhancement| :class:`neighbors.NearestNeighbors`,
  :class:`neighbors.KNeighborsClassifier`,
  :class:`neighbors.KNeighborsRegressor`,
  :class:`neighbors.RadiusNeighborsClassifier`,
  :class:`neighbors.RadiusNeighborsRegressor`,
  :class:`neighbors.KNeighborsTransformer`,
  :class:`neighbors.RadiusNeighborsTransformer`, and
  :class:`neighbors.LocalOutlierFactor` now work with `metric="nan_euclidean"`,
  supporting `nan` inputs.
  By :user:`Carlo Lemos `, `Guillaume Lemaitre`_, and `Adrin Jalali`_ :pr:`25330`

- |Enhancement| Add :meth:`neighbors.NearestCentroid.decision_function`,
  :meth:`neighbors.NearestCentroid.predict_proba` and
  :meth:`neighbors.NearestCentroid.predict_log_proba` to the
  :class:`neighbors.NearestCentroid` estimator class.
  Support the case when `X` is sparse and `shrink_threshold` is not `None` in
  :class:`neighbors.NearestCentroid`.
  By :user:`Matthew Ning ` :pr:`26689`

- |Enhancement| Make `predict`, `predict_proba`, and `score` of
  :class:`neighbors.KNeighborsClassifier` and
  :class:`neighbors.RadiusNeighborsClassifier` accept `X=None` as input. In this
  case, predictions for all training set points are returned, and points are not
  included among their own neighbors.
  By :user:`Dmitry Kobak ` :pr:`30047`

- |Fix| :class:`neighbors.LocalOutlierFactor` raises a warning in the `fit`
  method when duplicate values in the training data lead to inaccurate outlier
  detection.
  By :user:`Henrique Caroço ` :pr:`28773`

:mod:`sklearn.neural_network`
-----------------------------

- |Fix| :class:`neural_network.MLPRegressor` no longer crashes when the model
  diverges and `early_stopping` is enabled.
  By :user:`Marc Bresson ` :pr:`29773`

:mod:`sklearn.pipeline`
-----------------------

- |MajorFeature| :class:`pipeline.Pipeline` can now transform metadata up to the
  step requiring the metadata, which can be set using the `transform_input`
  parameter.
  By `Adrin Jalali`_ :pr:`28901`

- |Enhancement| :class:`pipeline.Pipeline` now warns when methods that require a
  fitted pipeline are called before the pipeline has been fitted. This warning
  will become an error in 1.8.
  By `Adrin Jalali`_ :pr:`29868`

- |Fix| Fixed an issue with tags and estimator type of
  :class:`~sklearn.pipeline.Pipeline` when the pipeline is empty. This allows
  the HTML representation of an empty pipeline to be rendered correctly.
  By :user:`Gennaro Daniele Acciaro ` :pr:`30203`

:mod:`sklearn.preprocessing`
----------------------------

- |Enhancement| Added a `warn` option to the `handle_unknown` parameter in
  :class:`preprocessing.OneHotEncoder`.
  By :user:`Gleb Levitski ` :pr:`28637`

- |Enhancement| The HTML representation of
  :class:`preprocessing.FunctionTransformer` now shows the function name in the
  label.
  By :user:`Yao Xiao ` :pr:`29158`

- |Fix| :class:`preprocessing.PowerTransformer` now uses
  `scipy.special.inv_boxcox` to output `nan` if the input of BoxCox's inverse is
  invalid.
  By :user:`Xuefeng Xu ` :pr:`27875`

:mod:`sklearn.semi_supervised`
------------------------------

- |API| :class:`semi_supervised.SelfTrainingClassifier` deprecated the
  `base_estimator` parameter in favor of `estimator`.
  By :user:`Adam Li ` :pr:`28494`

:mod:`sklearn.tree`
-------------------

- |Feature| :class:`tree.ExtraTreeClassifier` and :class:`tree.ExtraTreeRegressor`
  now support missing values in the data matrix ``X``. Missing values are
  handled by randomly moving all of the samples to the left or right child node
  as the tree is traversed.
  By :user:`Adam Li ` and :user:`Loïc Estève ` :pr:`27966`, :pr:`30318`

- |Fix| Escape double quotes for labels and feature names when exporting trees
  to Graphviz format.
  By :user:`Santiago M. Mola ` :pr:`17575`

:mod:`sklearn.utils`
--------------------

- |Enhancement| :func:`utils.check_array` now accepts `ensure_non_negative` to
  check for negative values in the passed array, which until now was only
  possible by calling :func:`utils.check_non_negative`.
  By :user:`Tamara Atanasoska ` :pr:`29540`

- |Enhancement| :func:`~sklearn.utils.estimator_checks.check_estimator` and
  :func:`~sklearn.utils.estimator_checks.parametrize_with_checks` now check and
  fail if the classifier has the `tags.classifier_tags.multi_class = False` tag
  but does not fail on multi-class data.
  By `Adrin Jalali`_ :pr:`29874`

- |Enhancement| :func:`utils.validation.check_is_fitted` now passes on stateless
  estimators. An estimator can indicate that it is stateless by setting the
  `requires_fit` tag to `False`. See :ref:`estimator_tags` for more information.
  By :user:`Adrin Jalali ` :pr:`29880`

- |Enhancement| Changes to :func:`~utils.estimator_checks.check_estimator` and
  :func:`~utils.estimator_checks.parametrize_with_checks`.

  - :func:`~utils.estimator_checks.check_estimator` introduces new arguments:
    ``on_skip``, ``on_fail``, and ``callback`` to control the behavior of the
    check runner. Refer to the API documentation for more details.

  - ``generate_only=True`` is deprecated in
    :func:`~utils.estimator_checks.check_estimator`. Use
    :func:`~utils.estimator_checks.estimator_checks_generator` instead.

  - The ``_xfail_checks`` estimator tag is now removed. To indicate which tests
    are expected to fail, you can pass a dictionary to
    :func:`~utils.estimator_checks.check_estimator` as the
    ``expected_failed_checks`` parameter. Similarly, the
    ``expected_failed_checks`` parameter in
    :func:`~utils.estimator_checks.parametrize_with_checks` can be used, which is
    a callable returning a dictionary of the form::

        {
            "check_name": "reason to mark this check as xfail",
        }

  By `Adrin Jalali`_ :pr:`30149`

- |Fix| :func:`utils.estimator_checks.parametrize_with_checks` and
  :func:`utils.estimator_checks.check_estimator` now support estimators that
  have `set_output` called on them.
  By :user:`Adrin Jalali ` :pr:`29869`

- |API| The `force_all_finite` parameter of functions :func:`utils.check_array`,
  :func:`utils.check_X_y` and :func:`utils.as_float_array` is renamed to
  `ensure_all_finite`. `force_all_finite` will be removed in 1.8.
  By :user:`Jérémie du Boisberranger ` :pr:`29404`

- |API| `utils.estimator_checks.check_sample_weights_invariance` is replaced by
  `utils.estimator_checks.check_sample_weight_equivalence_on_dense_data`, which
  uses integer (including zero) weights, and
  `utils.estimator_checks.check_sample_weight_equivalence_on_sparse_data`, which
  does the same on sparse data.
  By :user:`Antoine Baker ` :pr:`29818`, :pr:`30137`

- |API| Using `_estimator_type` to set the estimator type is deprecated. Inherit
  from :class:`~sklearn.base.ClassifierMixin`,
  :class:`~sklearn.base.RegressorMixin`,
  :class:`~sklearn.base.TransformerMixin`, or
  :class:`~sklearn.base.OutlierMixin` instead. Alternatively, you can set
  `estimator_type` in :class:`~sklearn.utils.Tags` in the `__sklearn_tags__`
  method.
  By `Adrin Jalali`_ :pr:`30122`

.. rubric:: Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of
the project since version 1.5, including:

Aaron Schumacher, Abdulaziz Aloqeely, abhi-jha, Acciaro Gennaro Daniele, Adam J.
Stewart, Adam Li, Adeel Hassan, Adeyemi Biola, Aditi Juneja, Adrin Jalali, Aisha, Akanksha Mhadolkar, Akihiro Kuno, Alberto Torres, alexqiao, Alihan Zihna, antoinebaker, Antony Lee, Anurag Varma, Arif Qodari, Arthur Courselle, Arturo Amor, Aswathavicky, Audrey Flanders, aurelienmorgan, Austin, awwwyan, AyGeeEm, a.zy.lee, baggiponte, BlazeStorm001, bme-git, brdav, Brigitta Sipőcz, Cailean Carter, Carlo Lemos, Christian Lorentzen, Christian Veenhuis, claudio, Conrad Stevens, datarollhexasphericon, Davide Chicco, David Matthew Cherney, Dea María Léon, Deepak Saldanha, Deepyaman Datta, dependabot[bot], dinga92, Dmitry Kobak, Drew Craeton, dymil, Edoardo Abati, EmilyXinyi, Eric Larson, Evelyn, fabianhenning, Farid "Freddie" Taba, Gael Varoquaux, Giorgio Angelotti, Gleb Levitski, Guillaume Lemaitre, Guntitat Sawadwuthikul, Henrique Caroço, hhchen1105, Ilya Komarov, Inessa Pawson, Ivan Pan, Ivan Wiryadi, Jaimin Chauhan, Jakob Bull, James Lamb, Janez Demšar, Jérémie du Boisberranger, Jérôme Dockès, Jirair Aroyan, João Morais, Joe Cainey, John Enblom, JorgeCardenas, Joseph Barbier, jpienaar-tuks, Julian Chan, K.Bharat Reddy, Kevin Doshi, Lars, Loic Esteve, Lucy Liu, lunovian, Marc Bresson, Marco Edward Gorelli, Marco Maggi, Marco Wolsza, Maren Westermann, MarieS-WiMLDS, Martin Helm, Mathew Shen, mathurinm, Matthew Feickert, Maxwell Liu, Meekail Zain, Michael Dawson, Miguel Cárdenas, m-maggi, mrastgoo, Natalia Mokeeva, Nathan Goldbaum, Nathan Orgera, nbrown-ScottLogic, Nikita Chistyakov, Nithish Bolleddula, Noam Keidar, NoPenguinsLand, Norbert Preining, notPlancha, Olivier Grisel, Omar Salman, ParsifalXu, Piotr, Priyank Shroff, Priyansh Gupta, Quentin Barthélemy, Rachit23110261, Rahil Parikh, raisadz, Rajath, renaissance0ne, Reshama Shaikh, Roberto Rosati, Robert Pollak, rwelsch427, Santiago M. 
Mola, scikit-learn-bot, sean moiselle, SHREEKANT VITTHAL NANDIYAWAR, Shruti Nath, Søren Bredlund Caspersen, Stefanie Senger, Steffen Schneider, Štěpán Sršeň, Sylvain Combettes, Tamara, Thomas, Thomas Gessey-Jones, Thomas J. Fan, Thomas Li, Tialo, Tim Head, Tuhin Sharma, Tushar Parimi, vedpawar2254, Victoria Shevchenko, viktor765, Vince Carey, Virgil Chan, Wang Jiayi, Xiao Yuan, Xuefeng Xu, Yao Xiao, yareyaredesuyo, Zachary Vealey, Ziad Amerr