Version 1.9#

Legend for changelogs

  • Major Feature: something big that you couldn’t do before.

  • Feature: something that you couldn’t do before.

  • Efficiency: an existing feature now may not require as much computation or memory.

  • Enhancement: a miscellaneous minor improvement.

  • Fix: something that previously didn’t work as documented – or according to reasonable expectations – should now work.

  • API Change: you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

Version 1.9.dev0#

April 2026

Changed models#

  • Enhancement The transform method of preprocessing.PowerTransformer with method="yeo-johnson" now uses the numerically more stable function scipy.stats.yeojohnson instead of its own implementation. The results may deviate in numerical edge cases or within the precision of floating-point arithmetic. By Christian Lorentzen. #33272
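A minimal sketch (not from the changelog) comparing the two code paths on well-behaved data: both scikit-learn and scipy.stats.yeojohnson estimate the Yeo-Johnson λ by maximum likelihood, so away from numerical edge cases they agree closely. standardize=False makes the outputs directly comparable.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(0)
X = rng.lognormal(size=(100, 1))  # skewed positive data

# scikit-learn's Yeo-Johnson transform, without the default standardization
pt = PowerTransformer(method="yeo-johnson", standardize=False)
X_sklearn = pt.fit_transform(X)

# scipy.stats.yeojohnson transforms a 1-D array and returns the fitted lambda
x_scipy, lmbda = stats.yeojohnson(X[:, 0])

# On well-behaved data the two implementations agree closely
print(np.max(np.abs(X_sklearn[:, 0] - x_scipy)))
```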

  • API Change The default value of the scoring parameter in linear_model.LogisticRegressionCV will change in version 1.11 from None, i.e. accuracy, to "neg_log_loss". This is a much better default scoring function as it aligns with the log loss that logistic regression is minimizing (with regularization). In the meantime, you can silence the warning for this change by explicitly passing a value to scoring. By Christian Lorentzen. #33333

Changes impacting many modules#

Support for Array API#

Additional estimators and functions have been updated to include support for all Array API compliant inputs.

See Array API support (experimental) for more details.

Metadata routing#

Refer to the Metadata Routing User Guide for more details.

sklearn.cluster#

sklearn.compose#

sklearn.datasets#

sklearn.decomposition#

sklearn.ensemble#

  • Fix Fixed the way ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor compute their bin edges to properly and consistently handle sample_weight. When sample_weight=None is passed to fit and the number of distinct feature values is less than the specified max_bins, the edges are still set to midpoints between consecutive feature values. Otherwise, the bin edges are set to weight-aware quantiles computed using the averaged inverted CDF method. If n_samples is larger than the subsample parameter, the weights are instead used to subsample the data (with replacement) and the bin edges are set using unweighted quantiles of the subsampled data. By Shruti Nath and Olivier Grisel #29641
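As a point of reference (not part of the fix itself), the unweighted form of the averaged inverted CDF quantile definition mentioned above is available directly in NumPy (>= 1.22) via the method parameter of np.quantile; this is Hyndman & Fan's method 2, which averages the inverted CDF at its discontinuities.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Plain inverted CDF picks a single order statistic; the averaged variant
# returns the midpoint of the two central values for an even-length median.
median = np.quantile(values, 0.5, method="averaged_inverted_cdf")
print(median)
```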

  • Fix ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now use sample_weight to draw the samples instead of forwarding them multiplied by a uniformly sampled mask to the underlying estimators. Furthermore, when max_samples is a float, it is now interpreted as a fraction of sample_weight.sum() instead of X.shape[0]. As sampling is done with replacement, a float max_samples greater than 1.0 is now allowed, as well as an integer max_samples greater than X.shape[0]. The default max_samples=None draws X.shape[0] samples, irrespective of sample_weight. By Antoine Baker. #31529
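A small usage sketch (with hypothetical data, not from the changelog): with unit weights a float max_samples means the same thing under the old and new interpretations, since sample_weight.sum() equals X.shape[0]; with non-uniform weights the drawn subsample size now follows the total weight instead of the row count.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, random_state=0)
sample_weight = np.ones(100)

# With unit weights, max_samples=0.5 draws 50 bootstrap samples per tree
# under both the old (fraction of X.shape[0]) and new (fraction of
# sample_weight.sum()) interpretations.
clf = RandomForestClassifier(
    n_estimators=10, max_samples=0.5, bootstrap=True, random_state=0
).fit(X, y, sample_weight=sample_weight)
print(clf.score(X, y))
```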

  • Fix Both ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier with the default "friedman_mse" criterion were computing impurity values with an incorrect scaling, leading to unexpected trees in some cases. The implementation now uses "squared_error", which is exactly equivalent to "friedman_mse" up to floating-point error discrepancies but computes correct impurity values. By Arthur Lacote. #32708

  • API Change The criterion parameter is now deprecated for classes ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier, as both options ("friedman_mse" and "squared_error") were producing the same results, up to floating-point rounding discrepancies and a bug in "friedman_mse". By Arthur Lacote #32708

sklearn.feature_extraction#

sklearn.feature_selection#

sklearn.gaussian_process#

  • Efficiency The constructor signature of Gaussian process kernels is now cached, improving performance on small and medium datasets. By Stanislav Terliakov #33067

  • Fix The hyperparameters of the default kernel of GaussianProcessRegressor, namely ConstantKernel() * RBF(), are now optimized when optimizer is not None. Thus, gpr = GaussianProcessRegressor().fit(X, y) uses optimized kernel hyperparameters. By Matthias De Lozzo. #32964
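A sketch of what the fix amounts to (using made-up data): explicitly passing ConstantKernel() * RBF() has always enabled hyperparameter optimization, and with this fix the default kernel (kernel=None) behaves the same way. The fitted kernel_ carries hyperparameters that maximize the log marginal likelihood.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.randn(30)

# Explicit form of the default kernel, with optimizable hyperparameters
kernel = ConstantKernel() * RBF()
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# kernel_ holds the optimized hyperparameters; the passed kernel is untouched
print(gpr.kernel_)
```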

sklearn.inspection#

sklearn.linear_model#

sklearn.manifold#

sklearn.metrics#

sklearn.model_selection#

  • Enhancement model_selection.GroupKFold now uses stable sorting when distributing groups across folds. This ensures that the splits are consistent across runs. By marikabergengren and Adrin Jalali #28464

  • Fix model_selection.StratifiedGroupKFold now raises a ValueError when n_splits is greater than the number of unique groups, preventing degenerate folds. By Chani Fainendler. #33176

  • Fix Fixed incorrect ValueError when using scoring="average_precision" or similar in model selection utilities such as model_selection.GridSearchCV or model_selection.cross_validate with multiclass classifiers. The pos_label parameter is only relevant for binary classification and was incorrectly being validated for scorers used on multiclass problems. By Olivier Grisel. #33473

sklearn.neural_network#

sklearn.pipeline#

sklearn.preprocessing#

sklearn.svm#

sklearn.tree#

sklearn.utils#

  • Enhancement sklearn.utils._tags.get_tags now provides a clearer error message when a class is passed instead of an estimator instance. By Achyuthan S and Anne Beyer. #32565

  • Enhancement sklearn.utils._response._get_response_values now provides a clearer error message when estimator does not implement the given response_method. By Quentin Barthélemy. #33126

  • Fix In the HTML representation of all scikit-learn estimators inheriting from base.BaseEstimator, the parameter table displays each parameter’s documentation as a tooltip. The last tooltip in the last table of any HTML representation was partially hidden; this issue has been fixed. By Dea María Léon #32887

  • Fix Fixed _weighted_percentile with average=True so zero-weight samples just before the end of the array are handled correctly. This can change results when using sample_weight with preprocessing.KBinsDiscretizer (strategy="quantile", quantile_method="averaged_inverted_cdf") and in metrics.median_absolute_error, metrics.d2_pinball_score, and metrics.d2_absolute_error_score. By Arthur Lacote. #33127

  • Fix utils.validation.check_array now correctly rejects pandas StringDtype columns when dtype="numeric" is requested. In pandas 3, string columns use StringDtype instead of object dtype, which caused check_array to silently accept string data instead of raising a ValueError. By Olivier Grisel. #33491
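A behavior sketch using a NumPy object array, so it does not depend on pandas 3 being installed: check_array has always rejected non-numeric strings when dtype="numeric" is requested, and this fix extends the same rejection to pandas StringDtype columns.

```python
import numpy as np
from sklearn.utils.validation import check_array

# String data must be rejected when numeric input is required
X_str = np.array([["a"], ["b"]], dtype=object)
try:
    check_array(X_str, dtype="numeric")
except ValueError as exc:
    print("rejected:", exc)

# Numeric object arrays are still converted to a numeric dtype as before
X_num = check_array(np.array([[1], [2]], dtype=object), dtype="numeric")
print(X_num.dtype)
```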

  • Fix The code path for polars dataframes in validate_data was made independent of the dataframe interchange protocol __dataframe__. This change was necessary to adapt to the recent deprecation of the interchange protocol in polars version 1.40. By Christian Lorentzen. #33789

Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.8, including:

TODO: update at the time of the release.