.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _release_notes_1_4:

===========
Version 1.4
===========

For a short description of the main highlights of the release, please refer to
:ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_4_0.py`.

.. include:: changelog_legend.inc

.. _changes_1_4_2:

Version 1.4.2
=============

**April 2024**

This release only includes support for numpy 2.

.. _changes_1_4_1:

Version 1.4.1.post1
===================

**February 2024**

.. note::

    The 1.4.1.post1 release includes a packaging fix requiring `numpy<2` to
    account for incompatibilities with NumPy 2.0 ABI. Note that the 1.4.1
    release is not available on PyPI and conda-forge.

Metadata Routing
----------------

- |Fix| Fix routing issue with :class:`~compose.ColumnTransformer` when used
  inside another meta-estimator.
  :pr:`28188` by `Adrin Jalali`_.

- |Fix| No error is raised when no metadata is passed to a metaestimator that
  includes a sub-estimator which doesn't support metadata routing.
  :pr:`28256` by `Adrin Jalali`_.

DataFrame Support
-----------------

- |Enhancement| |Fix| Pandas and Polars dataframes are validated directly
  without ducktyping checks.
  :pr:`28195` by `Thomas Fan`_.

Changes impacting many modules
------------------------------

- |Efficiency| |Fix| Partial revert of :pr:`28191` to avoid a performance
  regression for estimators relying on euclidean pairwise computation with
  sparse matrices.
  The impacted estimators are:

  - :func:`sklearn.metrics.pairwise_distances_argmin`
  - :func:`sklearn.metrics.pairwise_distances_argmin_min`
  - :class:`sklearn.cluster.AffinityPropagation`
  - :class:`sklearn.cluster.Birch`
  - :class:`sklearn.cluster.SpectralClustering`
  - :class:`sklearn.neighbors.KNeighborsClassifier`
  - :class:`sklearn.neighbors.KNeighborsRegressor`
  - :class:`sklearn.neighbors.RadiusNeighborsClassifier`
  - :class:`sklearn.neighbors.RadiusNeighborsRegressor`
  - :class:`sklearn.neighbors.LocalOutlierFactor`
  - :class:`sklearn.neighbors.NearestNeighbors`
  - :class:`sklearn.manifold.Isomap`
  - :class:`sklearn.manifold.TSNE`
  - :func:`sklearn.manifold.trustworthiness`

  :pr:`28235` by :user:`Julien Jerphanion `.

- |Fix| Fixes a bug for all scikit-learn transformers when using `set_output`
  with `transform` set to `pandas` or `polars`. The bug could lead to wrong
  naming of the columns of the returned dataframe.
  :pr:`28262` by :user:`Guillaume Lemaitre `.

- |Fix| When users try to use a method in
  :class:`~ensemble.StackingClassifier`,
  :class:`~feature_selection.SelectFromModel`,
  :class:`~feature_selection.RFE`,
  :class:`~semi_supervised.SelfTrainingClassifier`,
  :class:`~multiclass.OneVsOneClassifier`,
  :class:`~multiclass.OutputCodeClassifier` or
  :class:`~multiclass.OneVsRestClassifier` that their sub-estimators don't
  implement, the `AttributeError` is now reraised in the traceback.
  :pr:`28167` by :user:`Stefanie Senger `.

Metadata Routing
----------------

- |Fix| Fix :class:`multioutput.MultiOutputRegressor` and
  :class:`multioutput.MultiOutputClassifier` to work with estimators that don't
  consume any metadata when metadata routing is enabled.
  :pr:`28240` by `Adrin Jalali`_.

Changelog
---------

:mod:`sklearn.calibration`
..........................

- |Fix| `calibration.CalibratedClassifierCV` supports :term:`predict_proba`
  with float32 output from the inner estimator.
  :pr:`28247` by `Thomas Fan`_.
:mod:`sklearn.cluster`
......................

- |Fix| :class:`cluster.AffinityPropagation` now avoids assigning multiple
  different clusters for equal points.
  :pr:`28121` by :user:`Pietro Peterlongo ` and :user:`Yao Xiao `.

- |Fix| Avoid infinite loop in :class:`cluster.KMeans` when the number of
  clusters is larger than the number of non-duplicate samples.
  :pr:`28165` by :user:`Jérémie du Boisberranger `.

:mod:`sklearn.compose`
......................

- |Fix| :class:`compose.ColumnTransformer` now transforms into a polars
  dataframe when `verbose_feature_names_out=True` and the transformers
  internally used the same columns several times. Previously, it would raise an
  error due to duplicated column names.
  :pr:`28262` by :user:`Guillaume Lemaitre `.

:mod:`sklearn.ensemble`
.......................

- |Fix| Fixes :class:`HistGradientBoostingClassifier` and
  :class:`HistGradientBoostingRegressor` when fitted on `pandas` `DataFrame`
  with extension dtypes, for example `pd.Int64Dtype`.
  :pr:`28385` by :user:`Loïc Estève `.

- |Fix| Fixes error message raised by :class:`ensemble.VotingClassifier` when
  the target is multilabel or multiclass-multioutput in a DataFrame format.
  :pr:`27702` by :user:`Guillaume Lemaitre `.

:mod:`sklearn.impute`
.....................

- |Fix| :class:`impute.SimpleImputer` now raises an error in `.fit` and
  `.transform` if `fill_value` can not be cast to input value dtype with
  `casting='same_kind'`.
  :pr:`28365` by :user:`Leo Grinsztajn `.

:mod:`sklearn.inspection`
.........................

- |Fix| :func:`inspection.permutation_importance` now properly handles
  `sample_weight` together with subsampling (i.e. `max_samples` < 1.0).
  :pr:`28184` by :user:`Michael Mayer `.

:mod:`sklearn.linear_model`
...........................

- |Fix| :class:`linear_model.ARDRegression` now handles pandas input types for
  `predict(X, return_std=True)`.
  :pr:`28377` by :user:`Eddie Bergman `.

:mod:`sklearn.preprocessing`
............................
- |Fix| Make :class:`preprocessing.FunctionTransformer` more lenient and
  overwrite output column names with the `get_feature_names_out` in the
  following cases: (i) the input and output column names remain the same
  (happens when using NumPy `ufunc`); (ii) the input column names are numbers;
  (iii) the output will be set to Pandas or Polars dataframe.
  :pr:`28241` by :user:`Guillaume Lemaitre `.

- |Fix| :class:`preprocessing.FunctionTransformer` now also warns when
  `set_output` is called with `transform="polars"` and `func` does not return a
  Polars dataframe or `feature_names_out` is not specified.
  :pr:`28263` by :user:`Guillaume Lemaitre `.

- |Fix| :class:`preprocessing.TargetEncoder` no longer fails when
  `target_type="continuous"` and the input is read-only. In particular, it now
  works with pandas copy-on-write mode enabled.
  :pr:`28233` by :user:`John Hopfensperger `.

:mod:`sklearn.tree`
...................

- |Fix| :class:`tree.DecisionTreeClassifier` and
  :class:`tree.DecisionTreeRegressor` now handle missing values properly. The
  internal criterion was not initialized when no missing values were present in
  the data, leading to potentially wrong criterion values.
  :pr:`28295` by :user:`Guillaume Lemaitre ` and
  :pr:`28327` by :user:`Adam Li `.

:mod:`sklearn.utils`
....................

- |Enhancement| |Fix| :func:`utils.metaestimators.available_if` now reraises
  the error from the `check` function as the cause of the `AttributeError`.
  :pr:`28198` by `Thomas Fan`_.

- |Fix| :func:`utils._safe_indexing` now raises a `ValueError` when `X` is a
  Python list and `axis=1`, as documented in the docstring.
  :pr:`28222` by :user:`Guillaume Lemaitre `.

.. _changes_1_4:

Version 1.4.0
=============

**January 2024**

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version.
This often occurs due to changes in the modelling logic (bug fixes or
enhancements), or in random sampling procedures.

- |Efficiency| :class:`linear_model.LogisticRegression` and
  :class:`linear_model.LogisticRegressionCV` now have much better convergence
  for solvers `"lbfgs"` and `"newton-cg"`. Both solvers can now reach much
  higher precision for the coefficients depending on the specified `tol`.
  Additionally, lbfgs can make better use of `tol`, i.e., stop sooner or reach
  higher precision.
  Note: lbfgs is the default solver, so this change might affect many models.
  This change also means that with this new version of scikit-learn, the
  resulting coefficients `coef_` and `intercept_` of your models will change
  for these two solvers (when fit on the same data again). The amount of change
  depends on the specified `tol`; for small values you will get more precise
  results.
  :pr:`26721` by :user:`Christian Lorentzen `.

- |Fix| Fixes a memory leak seen in PyPy for estimators using the Cython loss
  functions.
  :pr:`27670` by :user:`Guillaume Lemaitre `.

Changes impacting all modules
-----------------------------

- |MajorFeature| Transformers now support polars output with
  `set_output(transform="polars")`.
  :pr:`27315` by `Thomas Fan`_.

- |Enhancement| All estimators now recognize the column names from any
  dataframe that adopts the `DataFrame Interchange Protocol `__. Dataframes
  that return a correct representation through `np.asarray(df)` are expected to
  work with our estimators and functions.
  :pr:`26464` by `Thomas Fan`_.

- |Enhancement| The HTML representation of estimators now includes a link to
  the documentation and is color-coded to denote whether the estimator is
  fitted or not (unfitted estimators are orange, fitted estimators are blue).
  :pr:`26616` by :user:`Riccardo Cappuzzo `, :user:`Ines Ibnukhsein `,
  :user:`Gael Varoquaux `, `Joel Nothman`_ and :user:`Lilian Boulard `.
- |Fix| Fixed a bug in most estimators and functions where setting a parameter
  to a large integer would cause a `TypeError`.
  :pr:`26648` by :user:`Naoise Holohan `.

Metadata Routing
----------------

The following models now support metadata routing in one or more of their
methods. Refer to the :ref:`Metadata Routing User Guide ` for more details.

- |Feature| :class:`LarsCV` and :class:`LassoLarsCV` now support metadata
  routing in their `fit` method and route metadata to the CV splitter.
  :pr:`27538` by :user:`Omar Salman `.

- |Feature| :class:`multiclass.OneVsRestClassifier`,
  :class:`multiclass.OneVsOneClassifier` and
  :class:`multiclass.OutputCodeClassifier` now support metadata routing in
  their ``fit`` and ``partial_fit``, and route metadata to the underlying
  estimator's ``fit`` and ``partial_fit``.
  :pr:`27308` by :user:`Stefanie Senger `.

- |Feature| :class:`pipeline.Pipeline` now supports metadata routing according
  to :ref:`metadata routing user guide `.
  :pr:`26789` by `Adrin Jalali`_.

- |Feature| :func:`~model_selection.cross_validate`,
  :func:`~model_selection.cross_val_score`, and
  :func:`~model_selection.cross_val_predict` now support metadata routing. The
  metadata are routed to the estimator's `fit`, the scorer, and the CV
  splitter's `split`. The metadata is accepted via the new `params` parameter.
  `fit_params` is deprecated and will be removed in version 1.6. The `groups`
  parameter is also not accepted as a separate argument when metadata routing
  is enabled and should be passed via the `params` parameter.
  :pr:`26896` by `Adrin Jalali`_.

- |Feature| :class:`~model_selection.GridSearchCV`,
  :class:`~model_selection.RandomizedSearchCV`,
  :class:`~model_selection.HalvingGridSearchCV`, and
  :class:`~model_selection.HalvingRandomSearchCV` now support metadata routing
  in their ``fit`` and ``score``, and route metadata to the underlying
  estimator's ``fit``, the CV splitter, and the scorer.
  :pr:`27058` by `Adrin Jalali`_.
- |Feature| :class:`~compose.ColumnTransformer` now supports metadata routing
  according to :ref:`metadata routing user guide `.
  :pr:`27005` by `Adrin Jalali`_.

- |Feature| :class:`linear_model.LogisticRegressionCV` now supports metadata
  routing. :meth:`linear_model.LogisticRegressionCV.fit` now accepts
  ``**params`` which are passed to the underlying splitter and scorer.
  :meth:`linear_model.LogisticRegressionCV.score` now accepts
  ``**score_params`` which are passed to the underlying scorer.
  :pr:`26525` by :user:`Omar Salman `.

- |Feature| :class:`feature_selection.SelectFromModel` now supports metadata
  routing in `fit` and `partial_fit`.
  :pr:`27490` by :user:`Stefanie Senger `.

- |Feature| :class:`linear_model.OrthogonalMatchingPursuitCV` now supports
  metadata routing. Its `fit` now accepts ``**fit_params``, which are passed to
  the underlying splitter.
  :pr:`27500` by :user:`Stefanie Senger `.

- |Feature| :class:`ElasticNetCV`, :class:`LassoCV`,
  :class:`MultiTaskElasticNetCV` and :class:`MultiTaskLassoCV` now support
  metadata routing and route metadata to the CV splitter.
  :pr:`27478` by :user:`Omar Salman `.

- |Fix| All meta-estimators for which metadata routing is not yet implemented
  now raise a `NotImplementedError` on `get_metadata_routing` and on `fit` if
  metadata routing is enabled and any metadata is passed to them.
  :pr:`27389` by `Adrin Jalali`_.

Support for SciPy sparse arrays
-------------------------------

Several estimators now support SciPy sparse arrays.
The following functions and classes are impacted:

**Functions:**

- :func:`cluster.compute_optics_graph` in :pr:`27104` by
  :user:`Maren Westermann ` and in :pr:`27250` by :user:`Yao Xiao `;
- :func:`cluster.kmeans_plusplus` in :pr:`27179` by :user:`Nurseit Kamchyev `;
- :func:`decomposition.non_negative_factorization` in :pr:`27100` by
  :user:`Isaac Virshup `;
- :func:`feature_selection.f_regression` in :pr:`27239` by
  :user:`Yaroslav Korobko `;
- :func:`feature_selection.r_regression` in :pr:`27239` by
  :user:`Yaroslav Korobko `;
- :func:`manifold.trustworthiness` in :pr:`27250` by :user:`Yao Xiao `;
- :func:`manifold.spectral_embedding` in :pr:`27240` by :user:`Yao Xiao `;
- :func:`metrics.pairwise_distances` in :pr:`27250` by :user:`Yao Xiao `;
- :func:`metrics.pairwise_distances_chunked` in :pr:`27250` by
  :user:`Yao Xiao `;
- :func:`metrics.pairwise.pairwise_kernels` in :pr:`27250` by
  :user:`Yao Xiao `;
- :func:`utils.multiclass.type_of_target` in :pr:`27274` by :user:`Yao Xiao `.

**Classes:**

- :class:`cluster.HDBSCAN` in :pr:`27250` by :user:`Yao Xiao `;
- :class:`cluster.KMeans` in :pr:`27179` by :user:`Nurseit Kamchyev `;
- :class:`cluster.MiniBatchKMeans` in :pr:`27179` by :user:`Nurseit Kamchyev `;
- :class:`cluster.OPTICS` in :pr:`27104` by :user:`Maren Westermann ` and in
  :pr:`27250` by :user:`Yao Xiao `;
- :class:`cluster.SpectralClustering` in :pr:`27161` by
  :user:`Bharat Raghunathan `;
- :class:`decomposition.MiniBatchNMF` in :pr:`27100` by :user:`Isaac Virshup `;
- :class:`decomposition.NMF` in :pr:`27100` by :user:`Isaac Virshup `;
- :class:`feature_extraction.text.TfidfTransformer` in :pr:`27219` by
  :user:`Yao Xiao `;
- :class:`manifold.Isomap` in :pr:`27250` by :user:`Yao Xiao `;
- :class:`manifold.SpectralEmbedding` in :pr:`27240` by :user:`Yao Xiao `;
- :class:`manifold.TSNE` in :pr:`27250` by :user:`Yao Xiao `;
- :class:`impute.SimpleImputer` in :pr:`27277` by :user:`Yao Xiao `;
- :class:`impute.IterativeImputer` in :pr:`27277` by :user:`Yao Xiao
`;
- :class:`impute.KNNImputer` in :pr:`27277` by :user:`Yao Xiao `;
- :class:`kernel_approximation.PolynomialCountSketch` in :pr:`27301` by
  :user:`Lohit SundaramahaLingam `;
- :class:`neural_network.BernoulliRBM` in :pr:`27252` by :user:`Yao Xiao `;
- :class:`preprocessing.PolynomialFeatures` in :pr:`27166` by
  :user:`Mohit Joshi `;
- :class:`random_projection.GaussianRandomProjection` in :pr:`27314` by
  :user:`Stefanie Senger `;
- :class:`random_projection.SparseRandomProjection` in :pr:`27314` by
  :user:`Stefanie Senger `.

Support for Array API
---------------------

Several estimators and functions support the `Array API `_. Such changes allow
using the estimators and functions with other libraries such as JAX, CuPy, and
PyTorch. This therefore enables some GPU-accelerated computations.

See :ref:`array_api` for more details.

**Functions:**

- :func:`sklearn.metrics.accuracy_score` and
  :func:`sklearn.metrics.zero_one_loss` in :pr:`27137` by
  :user:`Edoardo Abati `;
- :func:`sklearn.model_selection.train_test_split` in :pr:`26855` by
  `Tim Head`_;
- :func:`~utils.multiclass.is_multilabel` in :pr:`27601` by
  :user:`Yaroslav Korobko `.

**Classes:**

- :class:`decomposition.PCA` for the `full` and `randomized` solvers (with QR
  power iterations) in :pr:`26315`, :pr:`27098` and :pr:`27431` by
  :user:`Mateusz Sokół `, :user:`Olivier Grisel ` and :user:`Edoardo Abati `;
- :class:`preprocessing.KernelCenterer` in :pr:`27556` by
  :user:`Edoardo Abati `;
- :class:`preprocessing.MaxAbsScaler` in :pr:`27110` by :user:`Edoardo Abati `;
- :class:`preprocessing.MinMaxScaler` in :pr:`26243` by `Tim Head`_;
- :class:`preprocessing.Normalizer` in :pr:`27558` by :user:`Edoardo Abati `.

Private Loss Function Module
----------------------------

- |Fix| The gradient computation of the binomial log loss is now numerically
  more stable for very large, in absolute value, input (raw predictions).
  Before, it could result in `np.nan`.
  Among the models that profit from this change are
  :class:`ensemble.GradientBoostingClassifier`,
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`linear_model.LogisticRegression`.
  :pr:`28048` by :user:`Christian Lorentzen `.

Changelog
---------

..
    Entries should be grouped by module (in alphabetic order) and prefixed with
    one of the labels: |MajorFeature|, |Feature|, |Efficiency|, |Enhancement|,
    |Fix| or |API| (see whats_new.rst for descriptions).
    Entries should be ordered by those labels (e.g. |Fix| after |Efficiency|).
    Changes not specific to a module should be listed under *Multiple Modules*
    or *Miscellaneous*.
    Entries should end with:
    :pr:`123456` by :user:`Joe Bloggs `.
    where 123456 is the *pull request* number, not the issue number.

:mod:`sklearn.base`
...................

- |Enhancement| :meth:`base.ClusterMixin.fit_predict` and
  :meth:`base.OutlierMixin.fit_predict` now accept ``**kwargs`` which are
  passed to the ``fit`` method of the estimator.
  :pr:`26506` by `Adrin Jalali`_.

- |Enhancement| :meth:`base.TransformerMixin.fit_transform` and
  :meth:`base.OutlierMixin.fit_predict` now raise a warning if ``transform`` /
  ``predict`` consume metadata, but no custom ``fit_transform`` /
  ``fit_predict`` is defined in the class inheriting from them correspondingly.
  :pr:`26831` by `Adrin Jalali`_.

- |Enhancement| :func:`base.clone` now supports `dict` as input and creates a
  copy.
  :pr:`26786` by `Adrin Jalali`_.

- |API| :func:`~utils.metadata_routing.process_routing` now has a different
  signature. The first two (the object and the method) are positional only, and
  all metadata are passed as keyword arguments.
  :pr:`26909` by `Adrin Jalali`_.

:mod:`sklearn.calibration`
..........................

- |Enhancement| The internal objective and gradient of the `sigmoid` method of
  :class:`calibration.CalibratedClassifierCV` have been replaced by the private
  loss module.
  :pr:`27185` by :user:`Omar Salman `.

:mod:`sklearn.cluster`
......................
- |Fix| The `degree` parameter in the :class:`cluster.SpectralClustering`
  constructor now accepts real values instead of only integral values in
  accordance with the `degree` parameter of the
  :class:`sklearn.metrics.pairwise.polynomial_kernel`.
  :pr:`27668` by :user:`Nolan McMahon `.

- |Fix| Fixes a bug in :class:`cluster.OPTICS` where the cluster correction
  based on predecessor was not using the right indexing. It would lead to
  inconsistent results dependent on the order of the data.
  :pr:`26459` by :user:`Haoying Zhang ` and :user:`Guillaume Lemaitre `.

- |Fix| Improve error message when checking the number of connected components
  in the `fit` method of :class:`cluster.HDBSCAN`.
  :pr:`27678` by :user:`Ganesh Tata `.

- |Fix| Create copy of precomputed sparse matrix within the `fit` method of
  :class:`cluster.DBSCAN` to avoid in-place modification of the sparse matrix.
  :pr:`27651` by :user:`Ganesh Tata `.

- |Fix| Raises a proper `ValueError` when `metric="precomputed"` and requested
  storing centers via the parameter `store_centers`.
  :pr:`27898` by :user:`Guillaume Lemaitre `.

- |API| `kdtree` and `balltree` values are now deprecated and are renamed as
  `kd_tree` and `ball_tree` respectively for the `algorithm` parameter of
  :class:`cluster.HDBSCAN` ensuring consistency in naming convention.
  `kdtree` and `balltree` values will be removed in 1.6.
  :pr:`26744` by :user:`Shreesha Kumar Bhat `.

- |API| The option `metric=None` in :class:`cluster.AgglomerativeClustering`
  and :class:`cluster.FeatureAgglomeration` is deprecated in version 1.4 and
  will be removed in version 1.6. Use the default value instead.
  :pr:`27828` by :user:`Guillaume Lemaitre `.

:mod:`sklearn.compose`
......................

- |MajorFeature| Adds `polars `__ input support to
  :class:`compose.ColumnTransformer` through the
  `DataFrame Interchange Protocol `__. The minimum supported version for polars
  is `0.19.12`.
  :pr:`26683` by `Thomas Fan`_.
- |Fix| :func:`cluster.spectral_clustering` and
  :class:`cluster.SpectralClustering` now raise an explicit error message
  indicating that sparse matrices and arrays with `np.int64` indices are not
  supported.
  :pr:`27240` by :user:`Yao Xiao `.

- |API| Outputs that use pandas extension dtypes and contain `pd.NA` in
  :class:`~compose.ColumnTransformer` now result in a `FutureWarning` and will
  cause a `ValueError` in version 1.6, unless the output container has been
  configured as "pandas" with `set_output(transform="pandas")`. Before, such
  outputs resulted in numpy arrays of dtype `object` containing `pd.NA` which
  could not be converted to numpy floats and caused errors when passed to other
  scikit-learn estimators.
  :pr:`27734` by :user:`Jérôme Dockès `.

:mod:`sklearn.covariance`
.........................

- |Enhancement| Allow :func:`covariance.shrunk_covariance` to process multiple
  covariance matrices at once by handling nd-arrays.
  :pr:`25275` by :user:`Quentin Barthélemy `.

- |API| |Fix| :class:`~compose.ColumnTransformer` now replaces `"passthrough"`
  with a corresponding :class:`~preprocessing.FunctionTransformer` in the
  fitted ``transformers_`` attribute.
  :pr:`27204` by `Adrin Jalali`_.

:mod:`sklearn.datasets`
.......................

- |Enhancement| :func:`datasets.make_sparse_spd_matrix` now uses a more
  memory-efficient sparse layout. It also accepts a new keyword `sparse_format`
  that allows specifying the output format of the sparse matrix. By default
  `sparse_format=None`, which returns a dense numpy ndarray as before.
  :pr:`27438` by :user:`Yao Xiao `.

- |Fix| :func:`datasets.dump_svmlight_file` now does not raise `ValueError`
  when `X` is read-only, e.g., a `numpy.memmap` instance.
  :pr:`28111` by :user:`Yao Xiao `.

- |API| :func:`datasets.make_sparse_spd_matrix` deprecated the keyword argument
  ``dim`` in favor of ``n_dim``. ``dim`` will be removed in version 1.6.
  :pr:`27718` by :user:`Adam Li `.

:mod:`sklearn.decomposition`
............................
- |Feature| :class:`decomposition.PCA` now supports
  :class:`scipy.sparse.sparray` and :class:`scipy.sparse.spmatrix` inputs when
  using the `arpack` solver. When used on sparse data like
  :func:`datasets.fetch_20newsgroups_vectorized` this can lead to speed-ups of
  100x (single threaded) and 70x lower memory usage. Based on
  :user:`Alexander Tarashansky `'s implementation in `scanpy `_.
  :pr:`18689` by :user:`Isaac Virshup ` and :user:`Andrey Portnoy `.

- |Enhancement| An "auto" option was added to the `n_components` parameter of
  :func:`decomposition.non_negative_factorization`, :class:`decomposition.NMF`
  and :class:`decomposition.MiniBatchNMF` to automatically infer the number of
  components from W or H shapes when using a custom initialization. The default
  value of this parameter will change from `None` to `auto` in version 1.6.
  :pr:`26634` by :user:`Alexandre Landeau ` and :user:`Alexandre Vigny `.

- |Fix| :func:`decomposition.dict_learning_online` no longer ignores the
  parameter `max_iter`.
  :pr:`27834` by :user:`Guillaume Lemaitre `.

- |Fix| The `degree` parameter in the :class:`decomposition.KernelPCA`
  constructor now accepts real values instead of only integral values in
  accordance with the `degree` parameter of the
  :class:`sklearn.metrics.pairwise.polynomial_kernel`.
  :pr:`27668` by :user:`Nolan McMahon `.

- |API| The option `max_iter=None` in
  :class:`decomposition.MiniBatchDictionaryLearning`,
  :class:`decomposition.MiniBatchSparsePCA`, and
  :func:`decomposition.dict_learning_online` is deprecated and will be removed
  in version 1.6. Use the default value instead.
  :pr:`27834` by :user:`Guillaume Lemaitre `.

:mod:`sklearn.ensemble`
.......................

- |MajorFeature| :class:`ensemble.RandomForestClassifier` and
  :class:`ensemble.RandomForestRegressor` support missing values when the
  criterion is `gini`, `entropy`, or `log_loss` for classification, or
  `squared_error`, `friedman_mse`, or `poisson` for regression.
  :pr:`26391` by `Thomas Fan`_.
- |MajorFeature| :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` support
  `categorical_features="from_dtype"`, which treats columns with Pandas or
  Polars Categorical dtype as categories in the algorithm.
  `categorical_features="from_dtype"` will become the default in v1.6.
  Categorical features no longer need to be encoded with numbers. When
  categorical features are numbers, the maximum value no longer needs to be
  smaller than `max_bins`; only the number of (unique) categories must be
  smaller than `max_bins`.
  :pr:`26411` by `Thomas Fan`_ and :pr:`27835` by :user:`Jérôme Dockès `.

- |MajorFeature| :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` got the new parameter
  `max_features` to specify the proportion of randomly chosen features
  considered in each split.
  :pr:`27139` by :user:`Christian Lorentzen `.

- |Feature| :class:`ensemble.RandomForestClassifier`,
  :class:`ensemble.RandomForestRegressor`,
  :class:`ensemble.ExtraTreesClassifier` and
  :class:`ensemble.ExtraTreesRegressor` now support monotonic constraints,
  useful when features are supposed to have a positive/negative effect on the
  target. Missing values in the train data and multi-output targets are not
  supported.
  :pr:`13649` by :user:`Samuel Ronsin `, initiated by
  :user:`Patrick O'Reilly `.

- |Efficiency| :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` are now a bit faster by
  reusing the parent node's histogram as children node's histogram in the
  subtraction trick. In effect, less memory has to be allocated and
  deallocated.
  :pr:`27865` by :user:`Christian Lorentzen `.

- |Efficiency| :class:`ensemble.GradientBoostingClassifier` is faster, for
  binary and in particular for multiclass problems, thanks to the private loss
  function module.
  :pr:`26278` and :pr:`28095` by :user:`Christian Lorentzen `.
- |Efficiency| Improves runtime and memory usage for
  :class:`ensemble.GradientBoostingClassifier` and
  :class:`ensemble.GradientBoostingRegressor` when trained on sparse data.
  :pr:`26957` by `Thomas Fan`_.

- |Efficiency| :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` are now faster when `scoring`
  is a predefined metric listed in :func:`metrics.get_scorer_names` and early
  stopping is enabled.
  :pr:`26163` by `Thomas Fan`_.

- |Enhancement| A fitted property, ``estimators_samples_``, was added to all
  Forest methods, including :class:`ensemble.RandomForestClassifier`,
  :class:`ensemble.RandomForestRegressor`,
  :class:`ensemble.ExtraTreesClassifier` and
  :class:`ensemble.ExtraTreesRegressor`, which allows retrieving the training
  sample indices used for each tree estimator.
  :pr:`26736` by :user:`Adam Li `.

- |Fix| Fixes :class:`ensemble.IsolationForest` when the input is a sparse
  matrix and `contamination` is set to a float value.
  :pr:`27645` by :user:`Guillaume Lemaitre `.

- |Fix| Raises a `ValueError` in :class:`ensemble.RandomForestRegressor` and
  :class:`ensemble.ExtraTreesRegressor` when requesting OOB score with
  multioutput model for the targets being all rounded to integer. It was
  recognized as a multiclass problem.
  :pr:`27817` by :user:`Daniele Ongari `.

- |Fix| Changes estimator tags to acknowledge that
  :class:`ensemble.VotingClassifier`, :class:`ensemble.VotingRegressor`,
  :class:`ensemble.StackingClassifier` and :class:`ensemble.StackingRegressor`
  support missing values if all `estimators` support missing values.
  :pr:`27710` by :user:`Guillaume Lemaitre `.

- |Fix| Support loading pickles of
  :class:`ensemble.HistGradientBoostingClassifier` and
  :class:`ensemble.HistGradientBoostingRegressor` when the pickle has been
  generated on a platform with a different bitness. A typical example is to
  train and pickle the model on a 64 bit machine and load the model on a 32 bit
  machine for prediction.
  :pr:`28074` by :user:`Christian Lorentzen ` and :user:`Loïc Estève `.

- |API| In :class:`ensemble.AdaBoostClassifier`, the `algorithm` argument
  `SAMME.R` was deprecated and will be removed in 1.6.
  :pr:`26830` by :user:`Stefanie Senger `.

:mod:`sklearn.feature_extraction`
.................................

- |API| Changed error type from :class:`AttributeError` to
  :class:`exceptions.NotFittedError` in unfitted instances of
  :class:`feature_extraction.DictVectorizer` for the following methods:
  :func:`feature_extraction.DictVectorizer.inverse_transform`,
  :func:`feature_extraction.DictVectorizer.restrict`,
  :func:`feature_extraction.DictVectorizer.transform`.
  :pr:`24838` by :user:`Lorenz Hertel `.

:mod:`sklearn.feature_selection`
................................

- |Enhancement| :class:`feature_selection.SelectKBest`,
  :class:`feature_selection.SelectPercentile`, and
  :class:`feature_selection.GenericUnivariateSelect` now support unsupervised
  feature selection by providing a `score_func` taking `X` and `y=None`.
  :pr:`27721` by :user:`Guillaume Lemaitre `.

- |Enhancement| :class:`feature_selection.SelectKBest` and
  :class:`feature_selection.GenericUnivariateSelect` with `mode='k_best'` now
  show a warning when `k` is greater than the number of features.
  :pr:`27841` by `Thomas Fan`_.

- |Fix| :class:`feature_selection.RFE` and :class:`feature_selection.RFECV` do
  not check for nans during input validation.
  :pr:`21807` by `Thomas Fan`_.

:mod:`sklearn.inspection`
.........................

- |Enhancement| :class:`inspection.DecisionBoundaryDisplay` now accepts a
  parameter `class_of_interest` to select the class of interest when plotting
  the response provided by `response_method="predict_proba"` or
  `response_method="decision_function"`. It allows plotting the decision
  boundary for both binary and multiclass classifiers.
  :pr:`27291` by :user:`Guillaume Lemaitre `.
- |Fix| :meth:`inspection.DecisionBoundaryDisplay.from_estimator` and
  :class:`inspection.PartialDependenceDisplay.from_estimator` now return the
  correct type for subclasses.
  :pr:`27675` by :user:`John Cant `.

- |API| :class:`inspection.DecisionBoundaryDisplay` raises an `AttributeError`
  instead of a `ValueError` when an estimator does not implement the requested
  response method.
  :pr:`27291` by :user:`Guillaume Lemaitre `.

:mod:`sklearn.kernel_ridge`
...........................

- |Fix| The `degree` parameter in the :class:`kernel_ridge.KernelRidge`
  constructor now accepts real values instead of only integral values in
  accordance with the `degree` parameter of the
  :class:`sklearn.metrics.pairwise.polynomial_kernel`.
  :pr:`27668` by :user:`Nolan McMahon `.

:mod:`sklearn.linear_model`
...........................

- |Efficiency| :class:`linear_model.LogisticRegression` and
  :class:`linear_model.LogisticRegressionCV` now have much better convergence
  for solvers `"lbfgs"` and `"newton-cg"`. Both solvers can now reach much
  higher precision for the coefficients depending on the specified `tol`.
  Additionally, lbfgs can make better use of `tol`, i.e., stop sooner or reach
  higher precision. This is accomplished by better scaling of the objective
  function, i.e., using average per sample losses instead of sum of per sample
  losses.
  :pr:`26721` by :user:`Christian Lorentzen `.

- |Efficiency| :class:`linear_model.LogisticRegression` and
  :class:`linear_model.LogisticRegressionCV` with solver `"newton-cg"` can now
  be considerably faster for some data and parameter settings. This is
  accomplished by a better line search convergence check for negligible loss
  improvements that takes into account gradient information.
  :pr:`26721` by :user:`Christian Lorentzen `.

- |Efficiency| Solver `"newton-cg"` in
  :class:`linear_model.LogisticRegression` and
  :class:`linear_model.LogisticRegressionCV` uses a little less memory.
  The effect is proportional to the number of coefficients (`n_features * n_classes`).
  :pr:`27417` by :user:`Christian Lorentzen `.

- |Fix| Ensure that the `sigma_` attribute of
  :class:`linear_model.ARDRegression` and :class:`linear_model.BayesianRidge`
  always has a `float32` dtype when fitted on `float32` data, even with the
  type promotion rules of NumPy 2.
  :pr:`27899` by :user:`Olivier Grisel `.

- |API| The attribute `loss_function_` of :class:`linear_model.SGDClassifier` and
  :class:`linear_model.SGDOneClassSVM` has been deprecated and will be removed in
  version 1.6.
  :pr:`27979` by :user:`Christian Lorentzen `.

:mod:`sklearn.metrics`
......................

- |Efficiency| Computing pairwise distances via :class:`metrics.DistanceMetric`
  for CSR x CSR, Dense x CSR, and CSR x Dense datasets is now 1.5x faster.
  :pr:`26765` by :user:`Meekail Zain `.

- |Efficiency| Computing distances via :class:`metrics.DistanceMetric`
  for CSR x CSR, Dense x CSR, and CSR x Dense now uses ~50% less memory,
  and outputs distances in the same dtype as the provided data.
  :pr:`27006` by :user:`Meekail Zain `.

- |Enhancement| Improve the rendering of the plot obtained with the
  :class:`metrics.PrecisionRecallDisplay` and :class:`metrics.RocCurveDisplay`
  classes. The x- and y-axis limits are set to [0, 1] and the aspect ratio between
  both axes is set to 1 to get a square plot.
  :pr:`26366` by :user:`Mojdeh Rastgoo `.

- |Enhancement| Added `neg_root_mean_squared_log_error_scorer` as a scorer.
  :pr:`26734` by :user:`Alejandro Martin Gil <101AlexMartin>`.

- |Enhancement| :func:`metrics.confusion_matrix` now warns when only one label was
  found in `y_true` and `y_pred`.
  :pr:`27650` by :user:`Lucy Liu `.

- |Fix| Computing pairwise distances with :func:`metrics.pairwise.euclidean_distances`
  no longer raises an exception when `X` is provided as a `float64` array and
  `X_norm_squared` as a `float32` array.
  :pr:`27624` by :user:`Jérôme Dockès `.
- |Fix| :func:`metrics.f1_score` now provides correct values when handling various
  cases in which division by zero occurs by using a formulation that does not
  depend on the precision and recall values.
  :pr:`27577` by :user:`Omar Salman ` and :user:`Guillaume Lemaitre `.

- |Fix| :func:`metrics.make_scorer` now raises an error when using a regressor on a
  scorer requesting a non-thresholded decision function (from `decision_function` or
  `predict_proba`). Such scorers are specific to classification.
  :pr:`26840` by :user:`Guillaume Lemaitre `.

- |Fix| :meth:`metrics.DetCurveDisplay.from_predictions`,
  :meth:`metrics.PrecisionRecallDisplay.from_predictions`,
  :meth:`metrics.PredictionErrorDisplay.from_predictions`, and
  :meth:`metrics.RocCurveDisplay.from_predictions` now return the correct type
  for subclasses.
  :pr:`27675` by :user:`John Cant `.

- |API| Deprecated `needs_threshold` and `needs_proba` from :func:`metrics.make_scorer`.
  These parameters will be removed in version 1.6. Instead, use `response_method` that
  accepts `"predict"`, `"predict_proba"` or `"decision_function"` or a list of such
  values. `needs_proba=True` is equivalent to `response_method="predict_proba"` and
  `needs_threshold=True` is equivalent to
  `response_method=("decision_function", "predict_proba")`.
  :pr:`26840` by :user:`Guillaume Lemaitre `.

- |API| The `squared` parameter of :func:`metrics.mean_squared_error` and
  :func:`metrics.mean_squared_log_error` is deprecated and will be removed in 1.6.
  Use the new functions :func:`metrics.root_mean_squared_error` and
  :func:`metrics.root_mean_squared_log_error` instead.
  :pr:`26734` by :user:`Alejandro Martin Gil <101AlexMartin>`.

:mod:`sklearn.model_selection`
..............................

- |Enhancement| :func:`model_selection.learning_curve` raises a warning when
  every cross validation fold fails.
  :pr:`26299` by :user:`Rahil Parikh `.
- |Fix| :class:`model_selection.GridSearchCV`,
  :class:`model_selection.RandomizedSearchCV`, and
  :class:`model_selection.HalvingGridSearchCV` now don't change the given
  object in the parameter grid if it's an estimator.
  :pr:`26786` by `Adrin Jalali`_.

:mod:`sklearn.multioutput`
..........................

- |Enhancement| Add method `predict_log_proba` to :class:`multioutput.ClassifierChain`.
  :pr:`27720` by :user:`Guillaume Lemaitre `.

:mod:`sklearn.neighbors`
........................

- |Efficiency| :meth:`sklearn.neighbors.KNeighborsRegressor.predict` and
  :meth:`sklearn.neighbors.KNeighborsClassifier.predict_proba` now efficiently support
  pairs of dense and sparse datasets.
  :pr:`27018` by :user:`Julien Jerphanion `.

- |Efficiency| The performance of :meth:`neighbors.RadiusNeighborsClassifier.predict`
  and of :meth:`neighbors.RadiusNeighborsClassifier.predict_proba` has been improved
  when `radius` is large and `algorithm="brute"` with non-Euclidean metrics.
  :pr:`26828` by :user:`Omar Salman `.

- |Fix| Improve error message for :class:`neighbors.LocalOutlierFactor`
  when it is invoked with `n_samples=n_neighbors`.
  :pr:`23317` by :user:`Bharat Raghunathan `.

- |Fix| :meth:`neighbors.KNeighborsClassifier.predict` and
  :meth:`neighbors.KNeighborsClassifier.predict_proba` now raise an error when the
  weights of all neighbors of some sample are zero. This can happen when `weights`
  is a user-defined function.
  :pr:`26410` by :user:`Yao Xiao `.

- |API| :class:`neighbors.KNeighborsRegressor` now accepts
  :class:`metrics.DistanceMetric` objects directly via the `metric` keyword
  argument allowing for the use of accelerated third-party
  :class:`metrics.DistanceMetric` objects.
  :pr:`26267` by :user:`Meekail Zain `.

:mod:`sklearn.preprocessing`
............................

- |Efficiency| :class:`preprocessing.OrdinalEncoder` avoids calculating
  missing indices twice to improve efficiency.
  :pr:`27017` by :user:`Xuefeng Xu `.
- |Efficiency| Improves efficiency in :class:`preprocessing.OneHotEncoder` and
  :class:`preprocessing.OrdinalEncoder` in checking `nan`.
  :pr:`27760` by :user:`Xuefeng Xu `.

- |Enhancement| Improves warnings in :class:`preprocessing.FunctionTransformer` when
  `func` returns a pandas dataframe and the output is configured to be pandas.
  :pr:`26944` by `Thomas Fan`_.

- |Enhancement| :class:`preprocessing.TargetEncoder` now supports `target_type`
  'multiclass'.
  :pr:`26674` by :user:`Lucy Liu `.

- |Fix| :class:`preprocessing.OneHotEncoder` and :class:`preprocessing.OrdinalEncoder`
  raise an exception when `nan` is a category and is not the last in the user's
  provided categories.
  :pr:`27309` by :user:`Xuefeng Xu `.

- |Fix| :class:`preprocessing.OneHotEncoder` and :class:`preprocessing.OrdinalEncoder`
  raise an exception if the user-provided categories contain duplicates.
  :pr:`27328` by :user:`Xuefeng Xu `.

- |Fix| :class:`preprocessing.FunctionTransformer` raises an error at `transform` if
  the output of `get_feature_names_out` is not consistent with the column names of the
  output container if those are defined.
  :pr:`27801` by :user:`Guillaume Lemaitre `.

- |Fix| Raise a `NotFittedError` in :class:`preprocessing.OrdinalEncoder` when calling
  `transform` without calling `fit` since `categories` always needs to be checked.
  :pr:`27821` by :user:`Guillaume Lemaitre `.

:mod:`sklearn.tree`
...................

- |Feature| :class:`tree.DecisionTreeClassifier`, :class:`tree.DecisionTreeRegressor`,
  :class:`tree.ExtraTreeClassifier` and :class:`tree.ExtraTreeRegressor` now support
  monotonic constraints, useful when features are supposed to have a positive/negative
  effect on the target. Missing values in the training data and multi-output targets
  are not supported.
  :pr:`13649` by :user:`Samuel Ronsin `,
  initiated by :user:`Patrick O'Reilly `.

:mod:`sklearn.utils`
....................
- |Enhancement| :func:`sklearn.utils.estimator_html_repr` dynamically adapts
  diagram colors based on the browser's `prefers-color-scheme`, providing
  improved adaptability to dark mode environments.
  :pr:`26862` by :user:`Andrew Goh Yisheng <9y5>`, `Thomas Fan`_, `Adrin Jalali`_.

- |Enhancement| :class:`~utils.metadata_routing.MetadataRequest` and
  :class:`~utils.metadata_routing.MetadataRouter` now have a ``consumes`` method
  which can be used to check whether a given set of parameters would be consumed.
  :pr:`26831` by `Adrin Jalali`_.

- |Enhancement| Make :func:`sklearn.utils.check_array` attempt to output
  `int32`-indexed CSR and COO arrays when converting from DIA arrays if the number of
  non-zero entries is small enough. This ensures that estimators implemented in Cython
  that do not accept `int64`-indexed sparse data structures now consistently
  accept the same sparse input formats for SciPy sparse matrices and arrays.
  :pr:`27372` by :user:`Guillaume Lemaitre `.

- |Fix| :func:`sklearn.utils.check_array` now accepts both matrices and arrays from
  the sparse SciPy module. The previous implementation would fail if `copy=True` by
  calling the NumPy-specific `np.may_share_memory`, which does not work with SciPy
  sparse arrays and does not return the correct result for SciPy sparse matrices.
  :pr:`27336` by :user:`Guillaume Lemaitre `.

- |Fix| :func:`~utils.estimator_checks.check_estimators_pickle` with
  `readonly_memmap=True` now relies on joblib's own capability to allocate
  aligned memory mapped arrays when loading a serialized estimator instead of
  calling a dedicated private function that would crash when OpenBLAS
  misdetects the CPU architecture.
  :pr:`27614` by :user:`Olivier Grisel `.

- |Fix| The error message in :func:`~utils.check_array` when a sparse matrix was
  passed but `accept_sparse` is `False` now suggests using `.toarray()` rather than
  `X.toarray()`.
  :pr:`27757` by :user:`Lucy Liu `.
- |Fix| Fix the function :func:`~utils.check_array` to output the right error message
  when the input is a Series instead of a DataFrame.
  :pr:`28090` by :user:`Stan Furrer ` and :user:`Yao Xiao `.

- |API| :func:`sklearn.utils.extmath.log_logistic` is deprecated and will be removed
  in 1.6. Use `-np.logaddexp(0, -x)` instead.
  :pr:`27544` by :user:`Christian Lorentzen `.

.. rubric:: Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of
the project since version 1.3, including:

101AlexMartin, Abhishek Singh Kushwah, Adam Li, Adarsh Wase, Adrin Jalali,
Advik Sinha, Alex, Alexander Al-Feghali, Alexis IMBERT, AlexL, Alex Molas, Anam
Fatima, Andrew Goh, andyscanzio, Aniket Patil, Artem Kislovskiy, Arturo Amor,
ashah002, avm19, Ben Holmes, Ben Mares, Benoit Chevallier-Mames, Bharat
Raghunathan, Binesh Bannerjee, Brendan Lu, Brevin Kunde, Camille Troillard,
Carlo Lemos, Chad Parmet, Christian Clauss, Christian Lorentzen, Christian
Veenhuis, Christos Aridas, Cindy Liang, Claudio Salvatore Arcidiacono, Connor
Boyle, cynthias13w, DaminK, Daniele Ongari, Daniel Schmitz, Daniel Tinoco,
David Brochart, Deborah L.
Haar, DevanshKyada27, Dimitri Papadopoulos Orfanos, Dmitry Nesterov, DUONG, Edoardo Abati, Eitan Hemed, Elabonga Atuo, Elisabeth Günther, Emma Carballal, Emmanuel Ferdman, epimorphic, Erwan Le Floch, Fabian Egli, Filip Karlo Došilović, Florian Idelberger, Franck Charras, Gael Varoquaux, Ganesh Tata, Gleb Levitski, Guillaume Lemaitre, Haoying Zhang, Harmanan Kohli, Ily, ioangatop, IsaacTrost, Isaac Virshup, Iwona Zdzieblo, Jakub Kaczmarzyk, James McDermott, Jarrod Millman, JB Mountford, Jérémie du Boisberranger, Jérôme Dockès, Jiawei Zhang, Joel Nothman, John Cant, John Hopfensperger, Jona Sassenhagen, Jon Nordby, Julien Jerphanion, Kennedy Waweru, kevin moore, Kian Eliasi, Kishan Ved, Konstantinos Pitas, Koustav Ghosh, Kushan Sharma, ldwy4, Linus, Lohit SundaramahaLingam, Loic Esteve, Lorenz, Louis Fouquet, Lucy Liu, Luis Silvestrin, Lukáš Folwarczný, Lukas Geiger, Malte Londschien, Marcus Fraaß, Marek Hanuš, Maren Westermann, Mark Elliot, Martin Larralde, Mateusz Sokół, mathurinm, mecopur, Meekail Zain, Michael Higgins, Miki Watanabe, Milton Gomez, MN193, Mohammed Hamdy, Mohit Joshi, mrastgoo, Naman Dhingra, Naoise Holohan, Narendra Singh dangi, Noa Malem-Shinitski, Nolan, Nurseit Kamchyev, Oleksii Kachaiev, Olivier Grisel, Omar Salman, partev, Peter Hull, Peter Steinbach, Pierre de Fréminville, Pooja Subramaniam, Puneeth K, qmarcou, Quentin Barthélemy, Rahil Parikh, Rahul Mahajan, Raj Pulapakura, Raphael, Ricardo Peres, Riccardo Cappuzzo, Roman Lutz, Salim Dohri, Samuel O. Ronsin, Sandip Dutta, Sayed Qaiser Ali, scaja, scikit-learn-bot, Sebastian Berg, Shreesha Kumar Bhat, Shubhal Gupta, Søren Fuglede Jørgensen, Stefanie Senger, Tamara, Tanjina Afroj, THARAK HEGDE, thebabush, Thomas J. Fan, Thomas Roehr, Tialo, Tim Head, tongyu, Venkatachalam N, Vijeth Moudgalya, Vincent M, Vivek Reddy P, Vladimir Fokow, Xiao Yuan, Xuefeng Xu, Yang Tao, Yao Xiao, Yuchen Zhou, Yuusuke Hiramatsu