.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _release_notes_1_7:

===========
Version 1.7
===========

..
  -- UNCOMMENT WHEN 1.7.0 IS RELEASED --
  For a short description of the main highlights of the release, please refer to
  :ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_6_0.py`.

..
  DELETE WHEN 1.7.0 IS RELEASED
  Since October 2024, DO NOT add your changelog entry in this file.

..
  Instead, create a file named `..rst` in the relevant sub-folder in
  `doc/whats_new/upcoming_changes/`. For full details, see:
  https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md

.. include:: changelog_legend.inc

.. towncrier release notes start

.. _changes_1_7_dev0:

Version 1.7.dev0
================

**November 2024**

Changes impacting many modules
------------------------------

- |Enhancement| `__sklearn_tags__` was introduced for setting tags in estimators.
  More details in :ref:`estimator_tags`.
  By :user:`Thomas Fan ` and :user:`Adrin Jalali ` :pr:`29677`

- |Enhancement| Scikit-learn classes and functions can be used while only having an
  `import sklearn` import line. For example, `import sklearn; sklearn.svm.SVC()` now
  works.
  By :user:`Thomas Fan ` :pr:`29793`

- |Fix| Classes :class:`metrics.ConfusionMatrixDisplay`,
  :class:`metrics.RocCurveDisplay`, :class:`calibration.CalibrationDisplay`,
  :class:`metrics.PrecisionRecallDisplay`, :class:`metrics.PredictionErrorDisplay`
  and :class:`inspection.PartialDependenceDisplay` now properly handle Matplotlib
  aliases for style parameters (e.g., `c` and `color`, `ls` and `linestyle`, etc).
  By :user:`Joseph Barbier ` :pr:`30023`

- |API| :func:`utils.validation.validate_data` is introduced and replaces the
  previously private `base.BaseEstimator._validate_data` method. This is intended
  for third party estimator developers, who should use this function in most cases
  instead of :func:`utils.check_array` and :func:`utils.check_X_y`.
  By :user:`Adrin Jalali ` :pr:`29696`

Support for Array API
---------------------

Additional estimators and functions have been updated to include support for all
`Array API `_ compliant inputs.

See :ref:`array_api` for more details.

- |Feature| :class:`model_selection.GridSearchCV`,
  :class:`model_selection.RandomizedSearchCV`,
  :class:`model_selection.HalvingGridSearchCV` and
  :class:`model_selection.HalvingRandomSearchCV` now support Array API
  compatible inputs when their base estimators do.
  By :user:`Tim Head ` and :user:`Olivier Grisel ` :pr:`27096`

- |Feature| :class:`preprocessing.LabelEncoder` now supports Array API compatible
  inputs.
  By :user:`Omar Salman ` :pr:`27381`

- |Feature| :func:`sklearn.metrics.mean_absolute_error` now supports Array API
  compatible inputs.
  By :user:`Edoardo Abati ` :pr:`27736`

- |Feature| :func:`sklearn.metrics.mean_tweedie_deviance` now supports Array API
  compatible inputs.
  By :user:`Thomas Li ` :pr:`28106`

- |Feature| :func:`sklearn.metrics.pairwise.cosine_similarity` now supports Array API
  compatible inputs.
  By :user:`Edoardo Abati ` :pr:`29014`

- |Feature| :func:`sklearn.metrics.pairwise.paired_cosine_distances` now supports
  Array API compatible inputs.
  By :user:`Edoardo Abati ` :pr:`29112`

- |Feature| :func:`sklearn.metrics.cluster.entropy` now supports Array API compatible
  inputs.
  By :user:`Yaroslav Korobko ` :pr:`29141`

- |Feature| :func:`sklearn.metrics.mean_squared_error` now supports Array API
  compatible inputs.
  By :user:`Yaroslav Korobko ` :pr:`29142`

- |Feature| :func:`sklearn.metrics.pairwise.additive_chi2_kernel` now supports
  Array API compatible inputs.
  By :user:`Yaroslav Korobko ` :pr:`29144`

- |Feature| :func:`sklearn.metrics.d2_tweedie_score` now supports Array API
  compatible inputs.
  By :user:`Emily Chen ` :pr:`29207`

- |Feature| :func:`sklearn.metrics.max_error` now supports Array API compatible
  inputs.
  By :user:`Edoardo Abati ` :pr:`29212`

- |Feature| :func:`sklearn.metrics.mean_poisson_deviance` now supports Array API
  compatible inputs.
  By :user:`Emily Chen ` :pr:`29227`

- |Feature| :func:`sklearn.metrics.mean_gamma_deviance` now supports Array API
  compatible inputs.
  By :user:`Emily Chen ` :pr:`29239`

- |Feature| :func:`sklearn.metrics.pairwise.cosine_distances` now supports Array API
  compatible inputs.
  By :user:`Emily Chen ` :pr:`29265`

- |Feature| :func:`sklearn.metrics.pairwise.chi2_kernel` now supports Array API
  compatible inputs.
  By :user:`Yaroslav Korobko ` :pr:`29267`

- |Feature| :func:`sklearn.metrics.mean_absolute_percentage_error` now supports
  Array API compatible inputs.
  By :user:`Emily Chen ` :pr:`29300`

- |Feature| :func:`sklearn.metrics.pairwise.paired_euclidean_distances` now supports
  Array API compatible inputs.
  By :user:`Emily Chen ` :pr:`29389`

- |Feature| :func:`sklearn.metrics.pairwise.euclidean_distances` and
  :func:`sklearn.metrics.pairwise.rbf_kernel` now support Array API compatible
  inputs.
  By :user:`Omar Salman ` :pr:`29433`

- |Feature| :func:`sklearn.metrics.pairwise.linear_kernel`,
  :func:`sklearn.metrics.pairwise.sigmoid_kernel`, and
  :func:`sklearn.metrics.pairwise.polynomial_kernel` now support Array API
  compatible inputs.
  By :user:`Omar Salman ` :pr:`29475`

- |Feature| :func:`sklearn.metrics.mean_squared_log_error` and
  :func:`sklearn.metrics.root_mean_squared_log_error` now support Array API
  compatible inputs.
  By :user:`Virgil Chan ` :pr:`29709`

- |Feature| :class:`preprocessing.MinMaxScaler` with `clip=True` now supports
  Array API compatible inputs.
  By :user:`Shreekant Nandiyawar ` :pr:`29751`

- Support for the soon to be deprecated `cupy.array_api` module has been removed
  in favor of directly supporting the top level `cupy` module, possibly via the
  `array_api_compat.cupy` compatibility wrapper.
  By :user:`Olivier Grisel ` :pr:`29639`

Metadata routing
----------------

Refer to the :ref:`Metadata Routing User Guide ` for more details.

- |Feature| :class:`semi_supervised.SelfTrainingClassifier` now supports metadata
  routing. The fit method now accepts ``**fit_params`` which are passed to the
  underlying estimators via their `fit` methods. In addition, the
  :meth:`~semi_supervised.SelfTrainingClassifier.predict`,
  :meth:`~semi_supervised.SelfTrainingClassifier.predict_proba`,
  :meth:`~semi_supervised.SelfTrainingClassifier.predict_log_proba`,
  :meth:`~semi_supervised.SelfTrainingClassifier.score` and
  :meth:`~semi_supervised.SelfTrainingClassifier.decision_function` methods also
  accept ``**params`` which are passed to the underlying estimators via their
  respective methods.
  By :user:`Adam Li ` :pr:`28494`

- |Feature| :class:`ensemble.StackingClassifier` and
  :class:`ensemble.StackingRegressor` now support metadata routing and pass
  ``**fit_params`` to the underlying estimators via their `fit` methods.
  By :user:`Stefanie Senger ` :pr:`28701`

- |Feature| :func:`model_selection.learning_curve` now supports metadata routing
  for the `fit` method of its estimator and for its underlying CV splitter and
  scorer.
  By :user:`Stefanie Senger ` :pr:`28975`

- |Feature| :class:`compose.TransformedTargetRegressor` now supports metadata
  routing in its :meth:`~compose.TransformedTargetRegressor.fit` and
  :meth:`~compose.TransformedTargetRegressor.predict` methods and routes the
  corresponding params to the underlying regressor.
  By :user:`Omar Salman ` :pr:`29136`

- |Feature| :class:`feature_selection.SequentialFeatureSelector` now supports
  metadata routing in its `fit` method and passes the corresponding params to
  the :func:`model_selection.cross_val_score` function.
  By :user:`Omar Salman ` :pr:`29260`

- |Feature| :func:`model_selection.permutation_test_score` now supports metadata
  routing for the `fit` method of its estimator and for its underlying CV
  splitter and scorer.
  By :user:`Adam Li ` :pr:`29266`

- |Feature| :class:`feature_selection.RFE` and :class:`feature_selection.RFECV`
  now support metadata routing.
  By :user:`Omar Salman ` :pr:`29312`

- |Feature| :func:`model_selection.validation_curve` now supports metadata routing
  for the `fit` method of its estimator and for its underlying CV splitter and
  scorer.
  By :user:`Stefanie Senger ` :pr:`29329`

- |Fix| Metadata is routed correctly to grouped CV splitters via
  :class:`linear_model.RidgeCV` and :class:`linear_model.RidgeClassifierCV` and
  `UnsetMetadataPassedError` is fixed for :class:`linear_model.RidgeClassifierCV`
  with default scoring.
  By :user:`Stefanie Senger ` :pr:`29634`

- |Fix| Many method arguments which shouldn't be included in the routing mechanism
  are now excluded and the `set_{method}_request` methods are not generated for
  them.
  By `Adrin Jalali`_ :pr:`29920`

Dropping official support for PyPy
----------------------------------

Due to limited maintainer resources and small number of users, official PyPy
support has been dropped. Some parts of scikit-learn may still work but PyPy is
not tested anymore in the scikit-learn Continuous Integration.
By :user:`Loïc Estève ` :pr:`29128`

Dropping support for building with setuptools
---------------------------------------------

From scikit-learn 1.6 onwards, support for building with setuptools has been
removed. Meson is the only supported way to build scikit-learn, see
:ref:`Building from source ` for more details.
By :user:`Loïc Estève ` :pr:`29400`

:mod:`sklearn.base`
-------------------

- |Enhancement| Added a function :func:`base.is_clusterer` which determines
  whether a given estimator is of category clusterer.
  By :user:`Christian Veenhuis ` :pr:`28936`

- |API| Passing a class object to :func:`~sklearn.base.is_classifier`,
  :func:`~sklearn.base.is_regressor`, :func:`~sklearn.base.is_transformer`, and
  :func:`~sklearn.base.is_outlier_detector` is now deprecated. Pass an instance
  instead.
  By `Adrin Jalali`_ :pr:`30122`

:mod:`sklearn.calibration`
--------------------------

- |API| `cv="prefit"` is deprecated for
  :class:`~sklearn.calibration.CalibratedClassifierCV`. Use
  :class:`~sklearn.frozen.FrozenEstimator` instead, as
  `CalibratedClassifierCV(FrozenEstimator(estimator))`.
  By `Adrin Jalali`_ :pr:`30171`

:mod:`sklearn.cluster`
----------------------

- |API| The `copy` parameter of :class:`cluster.Birch` was deprecated in 1.6 and
  will be removed in 1.8. It has no effect as the estimator does not perform
  in-place operations on the input data.
  By :user:`Yao Xiao ` :pr:`29124`

:mod:`sklearn.compose`
----------------------

- |Enhancement| The `verbose_feature_names_out` parameter of
  :class:`sklearn.compose.ColumnTransformer` now accepts a string format or a
  callable to generate feature names.
  By :user:`Marc Bresson ` :pr:`28934`

:mod:`sklearn.covariance`
-------------------------

- |Efficiency| :class:`covariance.MinCovDet` fitting is now slightly faster.
  By :user:`Antony Lee ` :pr:`29835`

:mod:`sklearn.cross_decomposition`
----------------------------------

- |Fix| :class:`cross_decomposition.PLSRegression` properly raises an error when
  `n_components` is larger than `n_samples`.
  By :user:`Thomas Fan ` :pr:`29710`

:mod:`sklearn.datasets`
-----------------------

- |Feature| :func:`datasets.fetch_file` allows downloading an arbitrary data file
  from the web. It handles local caching, integrity checks with SHA256 digests
  and automatic retries in case of HTTP errors.
  By :user:`Olivier Grisel ` :pr:`29354`

:mod:`sklearn.decomposition`
----------------------------

- |Enhancement| :class:`~sklearn.decomposition.LatentDirichletAllocation` now has a
  ``normalize`` parameter in
  :meth:`~sklearn.decomposition.LatentDirichletAllocation.transform` and
  :meth:`~sklearn.decomposition.LatentDirichletAllocation.fit_transform` methods
  to control whether the document topic distribution is normalized.
  By `Adrin Jalali`_ :pr:`30097`

- |Fix| :class:`~sklearn.decomposition.IncrementalPCA` will now only raise a
  ``ValueError`` when the number of samples in the input data to ``partial_fit``
  is less than the number of components on the first call to ``partial_fit``.
  Subsequent calls to ``partial_fit`` no longer face this restriction.
  By :user:`Thomas Gessey-Jones ` :pr:`30224`

:mod:`sklearn.discriminant_analysis`
------------------------------------

- |Fix| :class:`discriminant_analysis.QuadraticDiscriminantAnalysis` now raises a
  `LinAlgWarning` in case of collinear variables. These warnings can be silenced
  using the `reg_param` parameter.
  By :user:`Alihan Zihna ` :pr:`19731`

:mod:`sklearn.ensemble`
-----------------------

- |Feature| :class:`ensemble.ExtraTreesClassifier` and
  :class:`ensemble.ExtraTreesRegressor` now support missing-values in the data
  matrix `X`. Missing-values are handled by randomly moving all of the samples to
  the left, or right child node as the tree is traversed.
  By :user:`Adam Li ` :pr:`28268`

- |Efficiency| Small runtime improvement of fitting
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` by parallelizing the initial
  search for bin thresholds.
  By :user:`Christian Lorentzen ` :pr:`28064`

- |Efficiency| :class:`ensemble.IsolationForest` now runs parallel jobs during
  :term:`predict` offering a speedup of up to 2-4x on sample sizes larger than
  2000 using `joblib`.
  By :user:`Adam Li ` and :user:`Sérgio Pereira ` :pr:`28622`

- |Enhancement| The verbosity of :class:`ensemble.HistGradientBoostingClassifier`
  and :class:`ensemble.HistGradientBoostingRegressor` now has more granular
  control. Now, `verbose = 1` prints only summary messages, `verbose >= 2` prints
  the full information as before.
  By :user:`Christian Lorentzen ` :pr:`28179`

- |API| The parameter `algorithm` of :class:`ensemble.AdaBoostClassifier` is
  deprecated and will be removed in 1.8.
  By :user:`Jérémie du Boisberranger ` :pr:`29997`

:mod:`sklearn.feature_extraction`
---------------------------------

- |Fix| :class:`feature_extraction.text.TfidfVectorizer` now correctly preserves
  the `dtype` of `idf_` based on the input data.
  By :user:`Guillaume Lemaitre ` :pr:`30022`

:mod:`sklearn.frozen`
---------------------

- |MajorFeature| :class:`~sklearn.frozen.FrozenEstimator` is now introduced which
  allows freezing an estimator. This means calling `.fit` on it has no effect,
  and doing a `clone(frozenestimator)` returns the same estimator instead of an
  unfitted clone.
  By `Adrin Jalali`_ :pr:`29705`

:mod:`sklearn.impute`
---------------------

- |Fix| :class:`impute.KNNImputer` excludes samples with nan distances when
  computing the mean value for uniform weights.
  By :user:`Xuefeng Xu ` :pr:`29135`

- |Fix| When `min_value` and `max_value` are array-like and some features are
  dropped due to `keep_empty_features=False`, :class:`impute.IterativeImputer` no
  longer raises an error and now indexes correctly.
  By :user:`Guntitat Sawadwuthikul ` :pr:`29451`

- |Fix| Fixed :class:`impute.IterativeImputer` to make sure that it does not skip
  the iterative process when `keep_empty_features` is set to `True`.
  By :user:`Arif Qodari ` :pr:`29779`

- |API| Add a warning in :class:`impute.SimpleImputer` when
  `keep_empty_features=False` and `strategy="constant"`. In this case empty
  features are not dropped and this behaviour will change in 1.8.
  By :user:`Arthur Courselle ` and :user:`Simon Riou ` :pr:`29950`

:mod:`sklearn.inspection`
-------------------------

- |Enhancement| Add `custom_values` parameter in
  :func:`inspection.partial_dependence`. It enables users to pass their own grid
  of values at which the partial dependence should be calculated.
  By :user:`Freddy A. Boulton ` and :user:`Stephen Pardy ` :pr:`26202`

:mod:`sklearn.linear_model`
---------------------------

- |Enhancement| The `solver="newton-cholesky"` in
  :class:`linear_model.LogisticRegression` and
  :class:`linear_model.LogisticRegressionCV` is extended to support the full
  multinomial loss in a multiclass setting.
  By :user:`Christian Lorentzen ` :pr:`28840`

- |Fix| In :class:`linear_model.Ridge` and :class:`linear_model.RidgeCV`, after
  `fit`, the `coef_` attribute is now of shape `(n_features,)` like other linear
  models.
  By :user:`Maxwell Liu`, `Guillaume Lemaitre`_, and `Adrin Jalali`_ :pr:`19746`

- |Fix| :class:`linear_model.LogisticRegressionCV` corrects sample weight handling
  for the calculation of test scores.
  By :user:`Shruti Nath ` :pr:`29419`

- |Fix| :class:`linear_model.LassoCV` and :class:`linear_model.ElasticNetCV` now
  take sample weights into account to define the search grid for the internally
  tuned `alpha` hyper-parameter.
  By :user:`John Hopfensperger ` and :user:`Shruti Nath ` :pr:`29442`

- |Fix| :class:`linear_model.LogisticRegression`,
  :class:`linear_model.PoissonRegressor`, :class:`linear_model.GammaRegressor`,
  :class:`linear_model.TweedieRegressor` now take sample weights into account to
  decide when to fall back to `solver='lbfgs'` whenever `solver='newton-cholesky'`
  becomes numerically unstable.
  By :user:`Antoine Baker ` :pr:`29818`

- |Fix| :class:`linear_model.RidgeCV` now properly uses predictions on the same
  scale as the target seen during `fit`. These predictions are stored in
  `cv_results_` when `scoring != None`. Previously, the predictions were rescaled
  by the square root of the sample weights and offset by the mean of the target,
  leading to an incorrect estimate of the score.
  By :user:`Guillaume Lemaitre `, :user:`Jérôme Dockes ` and
  :user:`Hanmin Qin ` :pr:`29842`

- |Fix| :class:`linear_model.RidgeCV` now properly supports custom multioutput
  scorers by letting the scorer manage the multioutput averaging.
  Previously, the predictions and true targets were both squeezed to a 1D array
  before computing the error.
  By :user:`Guillaume Lemaitre ` :pr:`29884`

- |Fix| :class:`linear_model.LinearRegression` now sets the `cond` parameter when
  calling the `scipy.linalg.lstsq` solver on dense input data. This ensures more
  numerically robust results on rank-deficient data. In particular, it
  empirically fixes the expected equivalence property between fitting with
  reweighted or with repeated data points.
  By :user:`Antoine Baker ` :pr:`30040`

- |Fix| :class:`linear_model.LogisticRegression` and other linear models that
  accept `solver="newton-cholesky"` now report the correct number of iterations
  when they fall back to the `"lbfgs"` solver because of a rank deficient Hessian
  matrix.
  By :user:`Olivier Grisel ` :pr:`30100`

- |Fix| :class:`~sklearn.linear_model.SGDOneClassSVM` now correctly inherits from
  :class:`~sklearn.base.OutlierMixin` and the tags are correctly set.
  By :user:`Guillaume Lemaitre ` :pr:`30227`

- |API| Deprecates `copy_X` in :class:`linear_model.TheilSenRegressor` as the
  parameter has no effect. `copy_X` will be removed in 1.8.
  By :user:`Adam Li ` :pr:`29105`

:mod:`sklearn.manifold`
-----------------------

- |Efficiency| :func:`manifold.locally_linear_embedding` and
  :class:`manifold.LocallyLinearEmbedding` now allocate the memory of sparse
  matrices more efficiently in the Hessian, Modified and LTSA methods.
  By :user:`Giorgio Angelotti ` :pr:`28096`

:mod:`sklearn.metrics`
----------------------

- |Efficiency| :func:`sklearn.metrics.classification_report` is now faster by
  caching classification labels.
  By :user:`Adrin Jalali ` :pr:`29738`

- |Enhancement| :meth:`metrics.RocCurveDisplay.from_estimator`,
  :meth:`metrics.RocCurveDisplay.from_predictions`,
  :meth:`metrics.PrecisionRecallDisplay.from_estimator`, and
  :meth:`metrics.PrecisionRecallDisplay.from_predictions` now accept a new keyword
  `despine` to remove the top and right spines of the plot in order to make it
  clearer.
  By :user:`Yao Xiao ` :pr:`26367`

- |Enhancement| :func:`sklearn.metrics.check_scoring` now accepts `raise_exc` to
  specify whether to raise an exception if a subset of the scorers in multimetric
  scoring fails or to return an error code.
  By :user:`Stefanie Senger ` :pr:`28992`

- |Fix| :func:`metrics.roc_auc_score` will now correctly return np.nan and warn
  the user if only one class is present in the labels.
  By :user:`Gleb Levitski ` and :user:`Janez Demšar ` :pr:`27412`, :pr:`30013`

- |Fix| The functions :func:`metrics.mean_squared_log_error` and
  :func:`metrics.root_mean_squared_log_error` now check whether the inputs are
  within the correct domain for the function :math:`y=\log(1+x)`, rather than
  :math:`y=\log(x)`. The functions :func:`metrics.mean_absolute_error`,
  :func:`metrics.mean_absolute_percentage_error`,
  :func:`metrics.mean_squared_error` and :func:`metrics.root_mean_squared_error`
  now explicitly check whether a scalar will be returned when
  `multioutput=uniform_average`.
  By :user:`Virgil Chan ` :pr:`29709`

- |API| The `force_all_finite` parameter of functions
  :func:`metrics.pairwise.check_pairwise_arrays` and
  :func:`metrics.pairwise_distances` is renamed into `ensure_all_finite`.
  `force_all_finite` will be removed in 1.8.
  By :user:`Jérémie du Boisberranger ` :pr:`29404`

- |API| `scoring="neg_max_error"` should be used instead of `scoring="max_error"`
  which is now deprecated.
  By :user:`Farid "Freddie" Taba ` :pr:`29462`

- |API| The default value of the `response_method` parameter of
  :func:`metrics.make_scorer` will change from `None` to `"predict"` and `None`
  will be removed in 1.8.
  In the meantime, `None` is equivalent to `"predict"`.
  By :user:`Jérémie du Boisberranger ` :pr:`30001`

:mod:`sklearn.model_selection`
------------------------------

- |Enhancement| :class:`~model_selection.GroupKFold` now has the ability to shuffle
  groups into different folds when `shuffle=True`.
  By :user:`Zachary Vealey ` :pr:`28519`

- |Enhancement| There is no need to call `fit` on a
  :class:`~sklearn.model_selection.FixedThresholdClassifier` if the underlying
  estimator is already fitted.
  By :user:`Adrin Jalali ` :pr:`30172`

- |Fix| Improve error message when
  :func:`model_selection.RepeatedStratifiedKFold.split` is called without a `y`
  argument.
  By :user:`Anurag Varma ` :pr:`29402`

:mod:`sklearn.neighbors`
------------------------

- |Enhancement| :class:`neighbors.NearestNeighbors`,
  :class:`neighbors.KNeighborsClassifier`,
  :class:`neighbors.KNeighborsRegressor`,
  :class:`neighbors.RadiusNeighborsClassifier`,
  :class:`neighbors.RadiusNeighborsRegressor`,
  :class:`neighbors.KNeighborsTransformer`,
  :class:`neighbors.RadiusNeighborsTransformer`, and
  :class:`neighbors.LocalOutlierFactor` now work with `metric="nan_euclidean"`,
  supporting `nan` inputs.
  By :user:`Carlo Lemos `, `Guillaume Lemaitre`_, and `Adrin Jalali`_ :pr:`25330`

- |Enhancement| Add :meth:`neighbors.NearestCentroid.decision_function`,
  :meth:`neighbors.NearestCentroid.predict_proba` and
  :meth:`neighbors.NearestCentroid.predict_log_proba` to the
  :class:`neighbors.NearestCentroid` estimator class. Support the case when `X`
  is sparse and `shrinking_threshold` is not `None` in
  :class:`neighbors.NearestCentroid`.
  By :user:`Matthew Ning ` :pr:`26689`

- |Enhancement| Make `predict`, `predict_proba`, and `score` of
  :class:`neighbors.KNeighborsClassifier` and
  :class:`neighbors.RadiusNeighborsClassifier` accept `X=None` as input. In this
  case predictions for all training set points are returned, and points are not
  included in their own neighbors.
  By :user:`Dmitry Kobak ` :pr:`30047`

- |Fix| :class:`neighbors.LocalOutlierFactor` raises a warning in the `fit` method
  when duplicate values in the training data lead to inaccurate outlier
  detection.
  By :user:`Henrique Caroço ` :pr:`28773`

:mod:`sklearn.neural_network`
-----------------------------

- |Fix| :class:`neural_network.MLPRegressor` no longer crashes when the model
  diverges and `early_stopping` is enabled.
  By :user:`Marc Bresson ` :pr:`29773`

:mod:`sklearn.pipeline`
-----------------------

- |MajorFeature| :class:`pipeline.Pipeline` can now transform metadata up to the
  step requiring the metadata, which can be set using the `transform_input`
  parameter.
  By `Adrin Jalali`_ :pr:`28901`

- |Enhancement| :class:`pipeline.Pipeline` now warns about not being fitted before
  calling methods that require the pipeline to be fitted. This warning will
  become an error in 1.8.
  By `Adrin Jalali`_ :pr:`29868`

- |Fix| Fixed an issue with tags and estimator type of
  :class:`~sklearn.pipeline.Pipeline` when pipeline is empty. This allows the
  HTML representation of an empty pipeline to be rendered correctly.
  By :user:`Gennaro Daniele Acciaro ` :pr:`30203`

:mod:`sklearn.preprocessing`
----------------------------

- |Enhancement| Added `warn` option to `handle_unknown` parameter in
  :class:`preprocessing.OneHotEncoder`.
  By :user:`Gleb Levitski ` :pr:`28637`

- |Enhancement| The HTML representation of
  :class:`preprocessing.FunctionTransformer` will show the function name in the
  label.
  By :user:`Yao Xiao ` :pr:`29158`

- |Fix| :class:`preprocessing.PowerTransformer` now uses
  `scipy.special.inv_boxcox` to output `nan` if the input of BoxCox's inverse is
  invalid.
  By :user:`Xuefeng Xu ` :pr:`27875`

:mod:`sklearn.semi_supervised`
------------------------------

- |API| :class:`semi_supervised.SelfTrainingClassifier` deprecated the
  `base_estimator` parameter in favor of `estimator`.
  By :user:`Adam Li ` :pr:`28494`

:mod:`sklearn.tree`
-------------------

- |Feature| :class:`tree.ExtraTreeClassifier` and :class:`tree.ExtraTreeRegressor`
  now support missing-values in the data matrix ``X``. Missing-values are handled
  by randomly moving all of the samples to the left, or right child node as the
  tree is traversed.
  By :user:`Adam Li ` :pr:`27966`

- |Fix| Escape double quotes for labels and feature names when exporting trees to
  Graphviz format.
  By :user:`Santiago M. Mola ` :pr:`17575`

:mod:`sklearn.utils`
--------------------

- |Enhancement| :func:`utils.check_array` now accepts `ensure_non_negative` to
  check for negative values in the passed array, until now only available through
  calling :func:`utils.check_non_negative`.
  By :user:`Tamara Atanasoska ` :pr:`29540`

- |Enhancement| :func:`~sklearn.utils.estimator_checks.check_estimator` and
  :func:`~sklearn.utils.estimator_checks.parametrize_with_checks` now check and
  fail if the classifier has the `tags.classifier_tags.multi_class = False` tag
  but does not fail on multi-class data.
  By `Adrin Jalali`_ :pr:`29874`

- |Enhancement| :func:`utils.validation.check_is_fitted` now passes on stateless
  estimators. An estimator can indicate it's stateless by setting the
  `requires_fit` tag. See :ref:`estimator_tags` for more information.
  By :user:`Adrin Jalali ` :pr:`29880`

- |Enhancement| Changes to :func:`~utils.estimator_checks.check_estimator` and
  :func:`~utils.estimator_checks.parametrize_with_checks`.

  - :func:`~utils.estimator_checks.check_estimator` introduces new arguments:
    ``on_skip``, ``on_fail``, and ``callback`` to control the behavior of the
    check runner. Refer to the API documentation for more details.

  - ``generate_only=True`` is deprecated in
    :func:`~utils.estimator_checks.check_estimator`. Use
    :func:`~utils.estimator_checks.estimator_checks_generator` instead.
  - The ``_xfail_checks`` estimator tag is now removed, and now in order to
    indicate which tests are expected to fail, you can pass a dictionary to the
    :func:`~utils.estimator_checks.check_estimator` as the
    ``expected_failed_checks`` parameter. Similarly, the
    ``expected_failed_checks`` parameter in
    :func:`~utils.estimator_checks.parametrize_with_checks` can be used, which is
    a callable returning a dictionary of the form::

        {
            "check_name": "reason to mark this check as xfail",
        }

  By `Adrin Jalali`_ :pr:`30149`

- |Fix| :func:`utils.estimator_checks.parametrize_with_checks` and
  :func:`utils.estimator_checks.check_estimator` now support estimators that have
  `set_output` called on them.
  By :user:`Adrin Jalali ` :pr:`29869`

- |API| The `force_all_finite` parameter of functions :func:`utils.check_array`,
  :func:`utils.check_X_y`, :func:`utils.as_float_array` is renamed into
  `ensure_all_finite`. `force_all_finite` will be removed in 1.8.
  By :user:`Jérémie du Boisberranger ` :pr:`29404`

- |API| :func:`utils.estimator_checks.check_sample_weights_invariance` is replaced
  by :func:`utils.estimator_checks.check_sample_weight_equivalence` which uses
  integer (including zero) weights.
  By :user:`Antoine Baker ` :pr:`29818`

- |API| Using `_estimator_type` to set the estimator type is deprecated. Inherit
  from :class:`~sklearn.base.ClassifierMixin`,
  :class:`~sklearn.base.RegressorMixin`, :class:`~sklearn.base.TransformerMixin`,
  or :class:`~sklearn.base.OutlierMixin` instead. Alternatively, you can set
  `estimator_type` in :class:`~sklearn.utils.Tags` in the `__sklearn_tags__`
  method.
  By `Adrin Jalali`_ :pr:`30122`

.. rubric:: Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of
the project since version 1.7, including:

TODO: update at the time of the release.