Version 1.9#

For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 1.9.

Legend for changelogs

Major Feature something big that you couldn’t do before.
Feature something that you couldn’t do before.
Efficiency an existing feature now may not require as much computation or memory.
Enhancement a miscellaneous minor improvement.
Fix something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

Version 1.9.0#

June 2026

Changed models#

Enhancement The transform method of preprocessing.PowerTransformer with method="yeo-johnson" now uses the numerically more stable function scipy.stats.yeojohnson instead of a custom implementation. The results may deviate in numerical edge cases or within the precision of floating-point arithmetic. By Christian Lorentzen. #33272
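As a rough consistency check, the sketch below compares the transformer output with scipy.stats.yeojohnson evaluated at the fitted lambda. The toy data and the choice of standardize=False are assumptions made here for illustration; agreement is only expected up to floating-point precision.

```python
import numpy as np
from scipy.stats import yeojohnson
from sklearn.preprocessing import PowerTransformer

# Toy data, made up for this illustration.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 1))

# standardize=False keeps the raw Yeo-Johnson output so it can be compared directly.
pt = PowerTransformer(method="yeo-johnson", standardize=False).fit(X)
X_sklearn = pt.transform(X)[:, 0]

# Apply SciPy's Yeo-Johnson transform with the lambda estimated by the transformer.
X_scipy = yeojohnson(X[:, 0], lmbda=pt.lambdas_[0])

# The two results should agree up to floating-point rounding.
max_abs_diff = float(np.max(np.abs(X_sklearn - X_scipy)))
print(max_abs_diff)
```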

Changes impacting many modules#

Major Feature Introduced a new config key: “sparse_interface” to control whether functions return sparse objects using SciPy sparse matrix or SciPy sparse array. Use sklearn.set_config(sparse_interface="sparray") to have sklearn return sparse arrays. See more at the SciPy Sparse Migration Guide. The scikit-learn config “sparse_interface” initially defaults to sparse matrix (“spmatrix”). The plan is to have the default change to sparse array (“sparray”) in a few releases. By Dan Schult. #31177
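The practical difference between the two SciPy interfaces can be seen without scikit-learn. The sketch below uses only SciPy and NumPy on a made-up 2x2 input; it does not call the new config key, it only shows why code written for one interface can behave differently with the other.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 2], [0, 3]])

# Sparse matrix interface: `*` means matrix multiplication.
as_matrix = sparse.csr_matrix(dense)
matrix_star = (as_matrix * as_matrix).toarray()

# Sparse array interface: `*` is element-wise, `@` is the matrix product,
# which matches the behaviour of NumPy arrays.
as_array = sparse.csr_array(dense)
array_star = (as_array * as_array).toarray()
array_matmul = (as_array @ as_array).toarray()

print(matrix_star.tolist())   # [[1, 8], [0, 9]]
print(array_star.tolist())    # [[1, 4], [0, 9]]
print(array_matmul.tolist())  # [[1, 8], [0, 9]]
```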
Enhancement Scikit-learn accepted a new library dependency: narwhals. This is a very lightweight dependency that simplifies the support of dataframe input X and dataframe output as specified in the set_output API. Examples are pandas and polars dataframes. Narwhals can also help to support more dataframe libraries. Another reason for its adoption was that the dataframe interchange protocol (__dataframe__), on which scikit-learn has relied so far for non-pandas dataframes, was deprecated by polars and has run its course. By Christian Lorentzen and Marco Gorelli. #31127
Enhancement The HTML representation of all scikit-learn estimators inheriting from base.BaseEstimator now displays a new block showing the number and names of the output features when using a compose.ColumnTransformer or a pipeline.FeatureUnion. A copy-paste button is available for the output feature names. By Dea María Léon, Guillaume Lemaitre, Jérémie du Boisberranger, Olivier Grisel, Antoine Baker. #31937
Enhancement pipeline.Pipeline, pipeline.FeatureUnion and compose.ColumnTransformer now raise a clearer error message when an estimator class is passed instead of an instance. By Anne Beyer. #32888
Enhancement Checks for response values now provide a clearer error message when the estimator does not implement the given response_method. By Quentin Barthélemy. #33126
Enhancement The HTML representation of all scikit-learn estimators inheriting from base.BaseEstimator now includes a table displaying their fitted attributes. These are all the public estimator attributes that are computed during the call to fit with a name that ends with an underscore. By Dea María Léon, Jérémie du Boisberranger, Olivier Grisel, Guillaume Lemaitre, Antoine Baker. #33399
Fix Raise ValueError when sample_weight contains only zero values to prevent meaningless input data during fitting. This change applies to all estimators that support the parameter sample_weight. This change also affects metrics that validate sample weights. By Lucy Liu and John Hendricks. #32212
Fix Some parameter descriptions in the HTML representation of estimators were not properly escaped, which could lead to malformed HTML if the description contains characters like < or >. By Olivier Grisel. #32942

Support for Array API#

Additional estimators and functions have been updated to include support for all Array API compliant inputs.

See Array API support (experimental) for more details.

Feature sklearn.metrics.d2_absolute_error_score and sklearn.metrics.d2_pinball_score now support array API compatible inputs. By Virgil Chan. #31671
Feature linear_model.LogisticRegression now supports array API compatible inputs with solver="lbfgs". By Omar Salman and Olivier Grisel. #32644
Feature metrics.average_precision_score now supports Array API compliant inputs. By Stefanie Senger. #32909
Feature sklearn.metrics.pairwise.paired_manhattan_distances now supports array API compatible inputs. By Bharat Raghunathan. #32979
Feature metrics.pairwise_distances_argmin now supports array API compatible inputs. By Bharat Raghunathan. #32985
Feature linear_model.LinearRegression, linear_model.Ridge, linear_model.RidgeClassifier, linear_model.LogisticRegression, and discriminant_analysis.LinearDiscriminantAnalysis now raise a more informative error message when arrays passed at fit and prediction time use different array API namespaces or devices. A new sklearn.utils._array_api.move_estimator_to utility is provided to move an estimator’s fitted array attributes to a different namespace and device. By Jérôme Dockès and Tim Head. #33076
Feature pipeline.FeatureUnion now supports Array API compliant inputs when all its transformers do. By Olivier Grisel. #33263
Feature linear_model.PoissonRegressor now supports array API compatible inputs with solver="lbfgs". By Christian Lorentzen and Omar Salman. #33348
Enhancement kernel_approximation.Nystroem now supports array API compatible inputs. By Emily Chen. #29661
Enhancement linear_model.RidgeCV now accepts array API compliant arrays with gcv_mode set to auto or eigen. By Antoine Baker. #33020
Enhancement Internal NumPy CPU conversions now always attempt a generic DLPack-based transfer and only fall back to library-specific methods when necessary. This should ease support for additional array API and DLPack compliant input types without extending the ad hoc conversion helpers. By Olivier Grisel. #33623
Fix Fixed a bug that would cause Cython-based estimators to fail when fit on NumPy inputs when setting sklearn.set_config(array_api_dispatch=True). By Olivier Grisel. #32846
Fix Fixes how pos_label is inferred when pos_label is set to None, in sklearn.metrics.brier_score_loss and sklearn.metrics.d2_brier_score. By Lucy Liu. #32923
Fix linear_model.ridge_regression now correctly passes a Python scalar as fill_value to xp.full when broadcasting alpha for multi-target regression, ensuring compliance with the array API specification. This fixes compatibility issues with some array API backends. By Olivier Grisel. #33437
Fix metrics.pairwise_distances no longer emits spurious cross-library dtype comparison warnings when called with Array API inputs under config_context(array_api_dispatch=True). By Olivier Grisel. #33873
Fix Fixed support for integer Array API inputs on devices that do not support float64 in preprocessing.MinMaxScaler, preprocessing.MaxAbsScaler, preprocessing.KernelCenterer, preprocessing.normalize, utils.extmath.randomized_range_finder, and internal linear-model preprocessing and log-sum-exp utilities. By Arthur Lacote. #33898
Fix Fix passing an array as alpha in linear_model.Ridge when using the array API. By Thomas Moreau. #34004
Fix linear_model.RidgeClassifier and linear_model.RidgeClassifierCV now store classes_ in the namespace and on the device of y when fitted with array API inputs from mixed namespaces/devices, making them consistent with linear_model.LogisticRegression. By Arthur Lacote. #34065
Fix Fixed a bug where NumPy-fitted estimators could raise an error with config_context(array_api_dispatch=True) when making predictions with array-like or SciPy sparse inputs, or when a fitted attribute was sparse, such as after calling linear_model.LogisticRegression.sparsify. By Arthur Lacote. #34144

Metadata routing#

Refer to the Metadata Routing User Guide for more details.

Enhancement TargetEncoder now routes groups to the CV splitter internally used for cross fitting in its fit_transform. By Samruddhi Baviskar and Stefanie Senger. #33089
Fix Scorers now correctly request metadata, and their set_score_request methods correctly detect metadata available in the signature of their score_func. Also, sklearn.linear_model.LogisticRegressionCV now correctly routes metadata to the underlying scorer when its .score(...) method is called. By Adrin Jalali. #30859
Fix If a class explicitly defines a set_{method}_request method, it will not be overridden by the metadata routing machinery. By Adrin Jalali #32111
Fix Metadata routing objects (MetadataRequest, MetadataRouter, and their per-method requests) no longer deep-copy the owning estimator. Since scikit-learn 1.8, the routing objects hold a reference to the owner estimator for display purposes, which caused get_routing_for_object and add_self_request to transitively deep-copy the full estimator state, which can fail, and is very inefficient. By Adrin Jalali. #33827
Fix learning_curve now correctly routes sample_weight to the sub-estimator’s partial_fit method if exploit_incremental_learning is set to True. By Stefanie Senger. #34039

Callbacks#

Major Feature This release introduces a new callback API to invoke callbacks during the fitting of estimators that support them. It comes with two built-in callbacks:
- sklearn.callback.ProgressBar, to display progress bars.
- sklearn.callback.ScoringMonitor, to compute and log a scoring metric at the end of each iteration.
The following estimators support callbacks:
- LogisticRegression (only with solver="lbfgs").
- GridSearchCV
- HalvingGridSearchCV
- HalvingRandomSearchCV
- RandomizedSearchCV
- Pipeline
- StandardScaler
It also provides a public API to implement callback support in custom estimators or to implement custom callbacks; see the developer’s guide.

This API is experimental for now and may change without the usual deprecation cycle.

By Jérémie du Boisberranger, François Paugam and Stefanie Senger. #33322

`sklearn.cluster`#

Enhancement cluster.AgglomerativeClustering and cluster.FeatureAgglomeration now accept metric="l2" together with linkage="ward". metric="l2" is equivalent to metric="euclidean". By Guillaume Lemaitre. #24681
Fix cluster.MiniBatchKMeans now correctly handles sample weights during fitting. When sample weights are not None, mini-batch indices are created by sub-sampling with replacement using the normalized sample weights as probabilities. By Shruti Nath, Olivier Grisel, and Jeremie du Boisberranger. #30751
Fix Fixed a bug in cluster.BisectingKMeans when using a custom callable init with n_clusters > 2. By Mohammad Ahmadullah Khan. #33148

`sklearn.compose`#

Fix The dotted line for compose.ColumnTransformer in its HTML display now includes only its elements. The behaviour when a remainder is used has also been corrected. By Dea María Léon. #32713
Fix Fixed a regression where compose.ColumnTransformer.fit_transform raised a KeyError when used with metadata routing and remainder="passthrough". By Anne Beyer. #33665

`sklearn.datasets`#

Efficiency Re-enabled compressed caching for datasets.fetch_kddcup99, reducing on-disk cache size without changing the public API. By Unique Shrestha. #33118
Fix Fixed datasets.fetch_openml to issue OpenML API calls to https://www.openml.org/api/v1/ instead of https://api.openml.org/api/v1/, which no longer resolves or redirects correctly. By Olivier Grisel. #33868

`sklearn.decomposition`#

Efficiency FastICA with algorithm='deflation' and fun='logcosh' is now an order of magnitude faster. By Mohammad Ahmadullah Khan. #33269
Fix Fixed a typo (from "OR" to "QR") in the list of allowed values for power_iteration_normalizer in decomposition.TruncatedSVD. By Olivier Grisel. #33492

`sklearn.ensemble`#

Fix Fixed the way ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor compute their bin edges to properly and consistently handle sample_weight. When sample_weight=None is passed to fit and the number of distinct feature values is less than the specified max_bins, the edges are still set to midpoints between consecutive feature values. Otherwise, the bin edges are set to weight-aware quantiles computed using the averaged inverted CDF method. If n_samples is larger than the subsample parameter, the weights are instead used to subsample the data (with replacement) and the bin edges are set using unweighted quantiles of the subsampled data. By Shruti Nath and Olivier Grisel. #29641
Fix ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now use sample_weight to draw the samples instead of forwarding them multiplied by a uniformly sampled mask to the underlying estimators. Furthermore, when max_samples is a float, it is now interpreted as a fraction of sample_weight.sum() instead of X.shape[0]. As sampling is done with replacement, a float max_samples greater than 1.0 is now allowed, as well as an integer max_samples greater than X.shape[0]. The default max_samples=None draws X.shape[0] samples, irrespective of sample_weight. By Antoine Baker. #31529
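The sampling rule described for the forest estimators can be sketched in plain NumPy. The helper below is hypothetical (its name, the rounding of the number of draws, and the seed handling are assumptions made here, not scikit-learn internals); it only illustrates how a float max_samples scales with the total weight and how zero-weight rows are never drawn.

```python
import numpy as np

def draw_bootstrap_indices(sample_weight, max_samples=None, seed=0):
    """Hypothetical helper: weighted bootstrap draw, with replacement."""
    sample_weight = np.asarray(sample_weight, dtype=float)
    n_samples = sample_weight.shape[0]
    if max_samples is None:
        # Default: as many draws as rows, irrespective of the weights.
        n_draws = n_samples
    elif isinstance(max_samples, float):
        # A float is read as a fraction of the total weight (rounding is an assumption).
        n_draws = max(1, int(round(max_samples * sample_weight.sum())))
    else:
        # An integer may exceed n_samples because draws are with replacement.
        n_draws = int(max_samples)
    rng = np.random.default_rng(seed)
    probabilities = sample_weight / sample_weight.sum()
    return rng.choice(n_samples, size=n_draws, replace=True, p=probabilities)

weights = np.array([0.0, 1.0, 2.0, 5.0])
indices = draw_bootstrap_indices(weights, max_samples=1.5)
print(len(indices))  # 1.5 * 8.0 = 12 draws
```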
Fix Both ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier with the default "friedman_mse" criterion were computing impurity values with an incorrect scaling, leading to unexpected trees in some cases. The implementation now uses "squared_error", which is exactly equivalent to "friedman_mse" up to floating-point error discrepancies but computes correct impurity values. By Arthur Lacote. #32708
API Change The criterion parameter is now deprecated for classes ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier, as both options ("friedman_mse" and "squared_error") were producing the same results, up to floating-point rounding discrepancies and a bug in "friedman_mse". By Arthur Lacote. #32708

`sklearn.feature_extraction`#

Fix feature_extraction.image.reconstruct_from_patches_2d now produces correct results when a patch dimension equals the corresponding image dimension. By Eden Rochman. #33643

`sklearn.feature_selection`#

Enhancement feature_selection.SelectFromModel and feature_selection.RFE now support estimators whose feature importance is a sparse matrix or array, notably by passing a user-defined callable to the parameter importance_getter. By andymucyo-ops and isaacambrogetti. #33786
Fix feature_selection.RFE now uses stable sorting when ranking feature importances. This ensures that the feature selection is deterministic and consistent across runs when feature importances are tied. By blitchj. #29532
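The effect of a stable sort on tied importances can be illustrated with NumPy alone. The importance values below are made up; the point is only that tied entries keep their original relative order, so the ranking does not depend on the sorting algorithm's internals.

```python
import numpy as np

# Made-up importances with a three-way tie at 0.2.
importances = np.array([0.2, 0.5, 0.2, 0.1, 0.2])

# A stable sort keeps tied entries in their original order,
# so the resulting ranking is reproducible.
order = np.argsort(importances, kind="stable")
print(order.tolist())  # [3, 0, 2, 4, 1]
```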

`sklearn.gaussian_process`#

Efficiency Constructor signature of Gaussian process kernels is now cached, improving performance on small and medium datasets. By Stanislav Terliakov. #33067
Fix The hyperparameters of the default kernel of GaussianProcessRegressor, namely ConstantKernel() * RBF(), are now optimized when optimizer is not None. Thus, gpr = GaussianProcessRegressor().fit(X, y) uses optimized kernel hyperparameters. By Matthias De Lozzo. #32964
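To make the optimized kernel explicit, and to get the same behaviour regardless of the installed release, the kernel can be passed by hand. The sketch below uses made-up noisy sine data and an arbitrary alpha; kernel is the unfitted prototype, while kernel_ holds the hyperparameters found by the optimizer.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Made-up noisy observations of a sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

# Same structure as the default kernel named in the entry above.
kernel = ConstantKernel() * RBF()
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, random_state=0).fit(X, y)

print(kernel)       # initial hyperparameters
print(gpr.kernel_)  # hyperparameters after optimization
```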

`sklearn.inspection`#

Enhancement In inspection.DecisionBoundaryDisplay, multiclass_colors now defaults to the more accessible Petroff color sequence for multiclass problems with up to 10 classes. By Anne Beyer. #33709
Fix In inspection.DecisionBoundaryDisplay, multiclass_colors is now also used for multiclass plotting when response_method="predict". By Anne Beyer. #33015
Fix In inspection.DecisionBoundaryDisplay, n_classes is now inferred more robustly from the estimator. If it fails for custom estimators, a comprehensive error message is shown. By Anne Beyer. #33202
Fix inspection.DecisionBoundaryDisplay now displays all class boundaries when using plot_method="contour" with all response_methods, and displays all classes in distinct colors when using plot_method="contourf" with response_method="predict". By Anne Beyer and Levente Csibi. #33300
Fix In inspection.DecisionBoundaryDisplay, a ValueError is now raised if the colormap passed to multiclass_colors contains fewer colors than there are classes in multiclass problems. By Anne Beyer. #33419
Fix For multiclass data, inspection.DecisionBoundaryDisplay with plot_method="contour" now also displays class-specific contours for response_method="predict_proba" and response_method="decision_function". Multiclass class boundary contour lines are now displayed in black by default for all response methods to avoid confusion. By Anne Beyer. #33471
Fix In inspection.DecisionBoundaryDisplay, multiclass_colors_ now always stores the colors for multiclass problems as a numpy array. By Anne Beyer. #33651

`sklearn.linear_model`#

Feature linear_model.MultiTaskElasticNet, linear_model.MultiTaskElasticNetCV, linear_model.MultiTaskLasso, and linear_model.MultiTaskLassoCV now support fitting on sparse X as well as fitting with sample_weight. By Christian Lorentzen. #33440
Efficiency linear_model.LogisticRegression with solver="lbfgs" now estimates the gradient of the loss at float32 precision when fitted with float32 data (X) to improve training speed and memory efficiency. Previously, the input data would be implicitly cast to float64. If you relied on the previous behavior for numerical reasons, you can explicitly cast your data to float64 before fitting to reproduce it. By Omar Salman and Olivier Grisel. #32644
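A minimal sketch of the suggested workaround, on made-up data: upcast the float32 input explicitly before fitting so that the computations run in float64.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up float32 data.
rng = np.random.default_rng(0)
X32 = rng.normal(size=(100, 3)).astype(np.float32)
y = (X32[:, 0] + X32[:, 1] > 0).astype(int)

# Explicit upcast: the solver then works at float64 precision.
X64 = X32.astype(np.float64)
clf = LogisticRegression(solver="lbfgs").fit(X64, y)
print(clf.coef_.dtype)  # float64
```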
Efficiency The linear_model.LinearRegression, linear_model.Ridge, linear_model.Lasso, linear_model.LassoCV, linear_model.ElasticNet, linear_model.ElasticNetCV and linear_model.BayesianRidge classes now no longer make an unnecessary copy of dense X, y input during preprocessing when copy_X=False and sample_weight is provided. By Junteng Li. #33041
Enhancement linear_model.LogisticRegressionCV now correctly handles the case when the scoring parameter is set (to something not None) and when the CV splits result in folds where some class labels are missing. By Christian Lorentzen. #32828
Enhancement linear_model.ElasticNet, linear_model.ElasticNetCV and linear_model.enet_path now are able to fit Ridge regression, i.e. setting l1_ratio=0. Before this PR, the stopping criterion was a formulation of the dual gap that breaks down for l1_ratio=0. Now, an alternative dual gap formulation is used for this setting. This reduces the noise of raised warnings. By Christian Lorentzen. #32845
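The l1_ratio=0 case can be cross-checked against linear_model.Ridge. The sketch below uses made-up data and relies on the documented objectives of the two estimators: ElasticNet divides the squared loss by 2 * n_samples while Ridge does not, so the matching Ridge penalty is n_samples * alpha. The tight tol and large max_iter are choices made here so that the comparison also holds on older releases.

```python
import warnings

import numpy as np
from sklearn.linear_model import ElasticNet, Ridge

# Made-up regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=50)

alpha = 0.1
n_samples = X.shape[0]

with warnings.catch_warnings():
    # Older releases may emit convergence warnings for l1_ratio=0.
    warnings.simplefilter("ignore")
    enet = ElasticNet(alpha=alpha, l1_ratio=0.0, tol=1e-10, max_iter=10_000).fit(X, y)

# Rescale the penalty to account for the 1 / (2 * n_samples) factor in ElasticNet.
ridge = Ridge(alpha=n_samples * alpha).fit(X, y)

max_coef_gap = float(np.max(np.abs(enet.coef_ - ridge.coef_)))
print(max_coef_gap)
```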
Enhancement Efficiency linear_model.ElasticNet, linear_model.ElasticNetCV, linear_model.Lasso, linear_model.LassoCV, linear_model.MultiTaskElasticNet, linear_model.MultiTaskElasticNetCV, linear_model.MultiTaskLasso, linear_model.MultiTaskLassoCV as well as linear_model.lasso_path and linear_model.enet_path are now faster when fit with strong L1 penalty and many features. During gap safe screening of features, the update of the residual is now only performed if the coefficient is not zero. By Christian Lorentzen. #33161
Fix linear_model.LassoCV and linear_model.ElasticNetCV now take the positive parameter into account to compute the maximum alpha parameter, where all coefficients are zero. This impacts the search grid for the internally tuned alpha hyper-parameter stored in the attribute alphas_. By Junteng Li. #32768
Fix Correct the formulation of alpha within linear_model.SGDOneClassSVM. The corrected value is alpha = nu instead of alpha = nu / 2. Note: This might result in changed values for the fitted attributes like coef_ and offset_ as well as the predictions made using this class. By Omar Salman. #32778
Fix linear_model.enet_path now correctly handles the precompute parameter when check_input=False. Previously, the value of precompute was not properly treated which could lead to a ValueError. This also affects linear_model.ElasticNetCV, linear_model.LassoCV, linear_model.MultiTaskElasticNetCV and linear_model.MultiTaskLassoCV. By Albert Dorador. #33014
Fix The leave-one-out errors and model parameters estimated in linear_model.RidgeCV and linear_model.RidgeClassifierCV when cv=None are now numerically stable in the small alpha regime. The default auto option is now equivalent to eigen and picks the cheaper option: eigendecomposition of the covariance matrix when n_features <= n_samples, and of the Gram matrix when n_features > n_samples. When store_cv_results=True and X is an integer array, the cv_results_ attribute was wrongly coerced to the integer dtype of X; it now always has a float dtype. By Antoine Baker. #33020
Fix Fixed a bug in linear_model.SGDClassifier for multiclass settings where large negative values of linear_model.SGDClassifier.decision_function could lead to NaN values. In this case, this fix assigns equal probability for each class. By Christian Lorentzen. #33168
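One way such NaN values can arise is when per-class probabilities all underflow to zero and are then normalized, giving 0 / 0. The helper below is a hypothetical sketch of that failure mode and of a uniform fallback; it is not scikit-learn's implementation, and the sigmoid-based normalization is an assumption made for illustration.

```python
import numpy as np

def normalize_ovr_probabilities(decision_values):
    """Hypothetical sketch: normalize per-class sigmoid scores with a uniform fallback."""
    decision_values = np.asarray(decision_values, dtype=float)
    with np.errstate(over="ignore"):
        # Logistic sigmoid per class; it underflows to exactly 0 for very negative scores.
        prob = 1.0 / (1.0 + np.exp(-decision_values))
    row_sums = prob.sum(axis=1, keepdims=True)
    all_zero = row_sums[:, 0] == 0.0
    # Guard the division: rows summing to zero would otherwise yield 0 / 0 = NaN.
    row_sums[all_zero] = 1.0
    prob = prob / row_sums
    # Assign equal probability to every class when all scores underflowed.
    prob[all_zero] = 1.0 / prob.shape[1]
    return prob

scores = np.array([[-1000.0, -1000.0, -1000.0], [2.0, -1.0, 0.0]])
probabilities = normalize_ovr_probabilities(scores)
print(probabilities)
```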
Fix Fix unsigned integer overflow in linear_model.RidgeClassifier when fitting with unsigned integer inputs. Internal label binarisation now avoids wrapping -1 for unsigned integer target dtypes. By Virgil Chan. #33441
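The wraparound itself is easy to reproduce in NumPy. The sketch below is only an illustration of why -1 cannot be stored in an unsigned dtype; casting to float is one possible way to keep the sign and is not necessarily what the fix does internally.

```python
import numpy as np

# Binary targets encoded as -1 / +1 in a signed integer array.
signed_targets = np.array([-1, 1, -1], dtype=np.int64)

# Casting to an unsigned dtype wraps -1 around to the largest representable value.
wrapped = signed_targets.astype(np.uint8)
print(wrapped.tolist())  # [255, 1, 255]

# Casting to a float dtype keeps the sign intact.
safe = signed_targets.astype(np.float64)
print(safe.tolist())  # [-1.0, 1.0, -1.0]
```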
Fix The tol parameter in linear_model.LinearRegression is now set as the cond parameter of the scipy.linalg.lstsq solver when fitting on dense data. Some tests involving linear_model.LinearRegression were brittle with the default cond values from scipy or numpy. Here at least the user has control over the cond value and can change it if necessary. By Antoine Baker. #33565
Fix linear_model.LogisticRegressionCV no longer raises a TypeError when refit=False and use_legacy_attributes=False are set together with a non-elasticnet penalty like l1_ratios=[0.0]. Previously, None was stored in l1_ratio_ instead of 0.0, which caused float() to fail during post-processing. By Mohamad Fazeli. #33902
Fix linear_model.BayesianRidge and linear_model.ARDRegression now center test features during predict to correctly compute predictive variance. By Danilo Silva. #33918
API Change Passing sample_weight as a positional argument to linear_model.LogisticRegressionCV.score is deprecated and will be removed in version 1.11. Pass it as a keyword argument instead. By Adrin Jalali. #30859
API Change The default value of the scoring parameter in linear_model.LogisticRegressionCV will change in version 1.11 from None, i.e. accuracy, to "neg_log_loss". This is a much better default scoring function as it aligns with the log loss that logistic regression is minimizing (with regularization). In the meantime, you can silence the warning for this change by explicitly passing a value to scoring. By Christian Lorentzen. #33333
API Change The parameter n_alphas has been deprecated for linear_model.lasso_path and linear_model.enet_path. This deprecation follows the same deprecation that has happened for linear_model.ElasticNetCV and linear_model.LassoCV. The parameter alphas now supports both integers and array-likes, removing the need for n_alphas. From now on, only alphas should be set, either to an integer to indicate the number of automatically generated alphas or to an array-like of values for the regularization path. By Christian Lorentzen. #33855
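A minimal sketch of code that follows both LogisticRegressionCV changes above, on made-up data: scoring is passed explicitly, and sample_weight is passed to score by keyword. The values of Cs and cv are arbitrary choices made to keep the example fast.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Made-up binary classification data and weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=120) > 0).astype(int)
sample_weight = rng.uniform(0.5, 1.5, size=120)

# Passing scoring explicitly avoids relying on the default that is scheduled to change.
clf = LogisticRegressionCV(Cs=5, cv=3, scoring="neg_log_loss").fit(X, y)

# sample_weight is passed by keyword, not positionally.
score = clf.score(X, y, sample_weight=sample_weight)
print(score)
```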

`sklearn.manifold`#

Efficiency The way ARPACK eigensolver is called in manifold.SpectralEmbedding and cluster.SpectralClustering was improved, resulting in faster runtimes. By Dmitry Kobak. #33262
Fix manifold.MDS.fit_transform returns the correct number of components when using init="classical_mds". By Ben Pedigo. #33318

`sklearn.metrics`#

Major Feature metrics.metric_at_thresholds has been added to compute a metric’s values across all possible thresholds. By Carlo Lemos and Lucy Liu. #32732
Feature Add class method from_cv_results to metrics.PrecisionRecallDisplay, which allows easy plotting of multiple precision-recall curves from model_selection.cross_validate results. By Lucy Liu. #30508
Enhancement cohen_kappa_score now has a replace_undefined_by parameter that sets the function’s return value when the metric is undefined (division by zero). By Stefanie Senger. #31172
Fix metrics.d2_pinball_score and metrics.d2_absolute_error_score now always use the "averaged_inverted_cdf" quantile method, both with and without sample weights. Previously, the "linear" quantile method was used only for the unweighted case, leading to surprising discrepancies when comparing the results with unit weights. Note that all quantile interpolation methods are asymptotically equivalent in the large sample limit, but this fix can cause score value changes on small evaluation sets (without weights). By Virgil Chan. #31671
Fix metrics.accuracy_score, metrics.hamming_loss, metrics.zero_one_loss, metrics.matthews_corrcoef and metrics.confusion_matrix (when labels is not None) now raise an error when y_true is string and y_pred is numeric, for all array-like inputs. Previously, lists and numpy arrays not of object dtype did not raise an error for this mixed input case. The above metrics will also raise an error for label indicator matrix inputs of inconsistent size, except for metrics.confusion_matrix which does not accept label indicator matrix inputs. By Lucy Liu. #33086
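The difference between the two quantile methods on a small sample can be seen directly with numpy.quantile, which exposes both through its method parameter. The four values below are made up; this only illustrates the quantile definitions, not the scoring functions themselves.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# "linear" interpolates between neighbouring order statistics.
q_linear = np.quantile(values, 0.25, method="linear")

# "averaged_inverted_cdf" averages the two neighbouring order statistics
# when n * q is an integer, and otherwise returns a single order statistic.
q_averaged = np.quantile(values, 0.25, method="averaged_inverted_cdf")

print(q_linear, q_averaged)  # 1.75 1.5
```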
Fix Fixed metrics.pairwise_distances_argmin and metrics.pairwise_distances_argmin_min to avoid a quadratic-time path when many distances are identical, which could lead to severe slowdowns or even a stack overflow (segmentation fault) on large inputs. By Arthur Lacote. #33252
Fix metrics.PrecisionRecallDisplay.from_estimator and metrics.PrecisionRecallDisplay.from_predictions now correctly plot chance level line when y_true is a pytorch tensor. By Lucas Oliveira. #33405
Fix y_pred was deprecated in favor of y_proba for metrics.log_loss and metrics.d2_log_loss_score as predicted probabilities are expected, not predicted labels. By Lucy Liu. #33740
Fix metrics.pairwise_distances no longer raises an error for the euclidean metric when called with Y_norm_squared and n_jobs > 1. By Kunle Li. #33876
API Change Passing the pos_label and sample_weight parameters of metrics.confusion_matrix_at_thresholds as positional arguments is deprecated and will be removed in v1.11. By Jérémie du Boisberranger. #33357

`sklearn.model_selection`#

Enhancement GroupKFold now uses stable sorting when doing the group distribution. This ensures that the splits are consistent across runs. By marikabergengren and Adrin Jalali. #28464
Fix model_selection.StratifiedGroupKFold now raises a ValueError when n_splits is greater than the number of unique groups, preventing degenerate folds. By Chani Fainendler. #33176
Fix Fixed incorrect ValueError when using scoring="average_precision" or similar in model selection utilities such as model_selection.GridSearchCV or model_selection.cross_validate with multiclass classifiers. The pos_label parameter is only relevant for binary classification and was incorrectly being validated for scorers used on multiclass problems. By Olivier Grisel. #33473

`sklearn.neighbors`#

Fix neighbors.KNeighborsClassifier and neighbors.RadiusNeighborsClassifier now work with string labels when algorithm="brute". By AAAZZZR. #33048
Fix Fixed a quadratic-time path in the internal simultaneous_sort used by neighbors.BallTree and neighbors.KDTree queries when many distances are identical, which could lead to severe slowdowns or even a stack overflow (segmentation fault) on large inputs. Neighbor searches with tied distances no longer degrade badly in runtime. By Arthur Lacote. #33252

`sklearn.neural_network`#

Fix neural_network.MLPClassifier with early_stopping=True no longer raises a TypeError when y contains non-numeric class labels (e.g. strings): validation scoring now checks finiteness only for floating predictions. By Guillaume Lemaitre. #33774

`sklearn.pipeline`#

Fix Fixed a bug in pipeline.FeatureUnion with set_output(transform="polars") when transformers produce duplicate column names. By Jérémie du Boisberranger and Levente Csibi. #32106
Fix pipeline.Pipeline now raises an AttributeError when accessing attributes that are not available on an empty pipeline. It’s therefore possible to call dir on an empty pipeline. By Jérémie du Boisberranger. #33362

`sklearn.preprocessing`#

Fix PowerTransformer and QuantileTransformer no longer raise a warning related to feature names in inverse_transform if fit was called on data with feature names. By Thibault and Mohammad Ahmadullah Khan. #33268
API Change The shuffle and the random_state parameters are deprecated on TargetEncoder and will be removed in version 1.11. Pass a cross-validation generator as cv argument to specify the shuffling behaviour instead. By Stefanie Senger. #33453

`sklearn.svm`#

Fix Raise more informative error when fitting svm.NuSVR with all zero sample weights. By Lucy Liu and John Hendricks. #32212
API Change The probability parameter of sklearn.svm.SVC and sklearn.svm.NuSVC is deprecated due to not being thread-safe and will be removed in 1.11. Use sklearn.calibration.CalibratedClassifierCV with the respective estimator and ensemble=False instead. By Shruti Nath. #32050
API Change The probA_ and probB_ attributes of sklearn.svm.SVC and sklearn.svm.NuSVC are deprecated due to deprecation of the probability parameter and will be removed in 1.11. By Shruti Nath. #33388
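The replacement suggested for the probability parameter can be sketched as follows, on a made-up dataset. The choice of cv=5 is an assumption made here; the calibration method is left at its default.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Wrap a plain SVC (without probability=True) in a calibrator.
# ensemble=False fits a single calibrator on cross-validated predictions.
calibrated = CalibratedClassifierCV(SVC(), ensemble=False, cv=5).fit(X, y)

proba = calibrated.predict_proba(X[:5])
print(proba.shape)  # (5, 2)
```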

`sklearn.tree`#

Feature In tree.DecisionTreeRegressor and ensemble.RandomForestRegressor, criterion="absolute_error" — and, consequently, all criterion options — now support missing values for dense training data X. By Arthur Lacote. #32119
Enhancement tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier, tree.ExtraTreeRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, and ensemble.ExtraTreesRegressor now support combining monotonic_cst with missing values in dense training data. This builds on the improvements to missing-value support for dense training data in #32119. By Samuel O. Ronsin. #27630
Fix Fix calculation of node impurity in tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, ensemble.RandomForestRegressor and ensemble.ExtraTreesRegressor when missing values are present for the Poisson criterion. The Poisson criterion was returning invalid impurities (including negative values) when missing values were present. By Arthur Lacote. #32119
Fix Fixed feature-wise NaN detection in trees. Features could be seen as NaN-free for some edge-case patterns, which led to not considering splits with NaNs assigned to the left node for those features. This affects tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, ensemble.RandomForestRegressor and ensemble.ExtraTreesRegressor. By Arthur Lacote. #32193
Fix Fixed color conversion in tree export so RGB values with zero channels are correctly converted to two-digit hexadecimal components (for example, (0, 255, 0) now yields #00ff00). By Simon-Martin Schröder. #33845
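The intended conversion can be sketched with a zero-padded format specifier. The helper name below is hypothetical and this is not the code used in the tree export module; it only shows why padding each channel to two hexadecimal digits matters.

```python
def rgb_to_hex(rgb):
    """Illustrative helper: format an (R, G, B) tuple of 0-255 integers as #rrggbb."""
    red, green, blue = (int(channel) for channel in rgb)
    # "02x" pads every channel to two hex digits, so 0 becomes "00" rather than "0".
    return f"#{red:02x}{green:02x}{blue:02x}"

print(rgb_to_hex((0, 255, 0)))  # #00ff00
```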
API Change criterion="friedman_mse" is now deprecated. This criterion was intended for gradient boosting but was incorrectly implemented in scikit-learn’s trees and was actually behaving identically to criterion="squared_error". Use criterion="squared_error" instead. This affects tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, ensemble.RandomForestRegressor and ensemble.ExtraTreesRegressor. By Arthur Lacote. #32708

`sklearn.utils`#

Enhancement utils.get_tags now provides a clearer error message when a class is passed instead of an estimator instance. By Achyuthan S and Anne Beyer. #32565
Fix The parameter table in the HTML representation of all scikit-learn estimators inheriting from base.BaseEstimator displays each parameter’s documentation as a tooltip. The last tooltip of a parameter in the last table of any HTML representation was partially hidden. This issue has been fixed. By Dea María Léon. #32887
Fix Fixed utils.stats._weighted_percentile with average=True so zero-weight samples just before the end of the array are handled correctly. This can change results when using sample_weight with preprocessing.KBinsDiscretizer (strategy="quantile", quantile_method="averaged_inverted_cdf") and in metrics.median_absolute_error, metrics.d2_pinball_score, and metrics.d2_absolute_error_score. By Arthur Lacote. #33127
Fix utils.check_array now correctly rejects pandas StringDtype columns when dtype="numeric" is requested. In pandas 3, string columns use StringDtype instead of object dtype, which caused check_array to silently accept string data instead of raising a ValueError. By Olivier Grisel. #33491
Fix The code path for polars dataframes in utils.validation.validate_data was made independent of the dataframe interchange protocol __dataframe__. This change was necessary to adapt to the recent deprecation of the interchange protocol in polars version 1.40. By Christian Lorentzen. #33789
API Change utils.multiclass.unique_labels now accepts ys_types parameter, which allows avoiding duplicate calls to utils.multiclass.type_of_target. By Lucy Liu. #33086

Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.8, including:

AAAZZZR, ABHISHEK, Adrin Jalali, Agnus Paul, Albert Dorador Chalar, Alex Kuleshov, alexshacked, ANAND VENUGOPAL, Andres Nayeem Mejia, Andy, Anne Beyer, antoinebaker, Anvay, Arthur, Arthur Lacote, Arturo Amor, Ashutosh Devpura, Auguste Baum, Balaji Seshadri, baynecheke, Ben Pedigo, Bharat Raghunathan, Bodhi Russell Silberling, Bodhi Silberling, Chaitanya Dasari, Chani Fainendler, Charlie Tonneslan, Christian Lorentzen, Christian Veenhuis, Christine P. Chai, CipherCat, clijo, Copilot, C. Titus Brown, cui, Daniel Agyapong, danilo-silva-ufsc, Dan Schult, david-cortes-intel, Dea María Léon, Dhruv Sharma, DhyeyTeraiya, Dimitri Papadopoulos Orfanos, Dmitry Kobak, EdenRochmanSharabi, Emily (Xinyi) Chen, Eric Prestat, fabianhenning, Florian Bourgey, François Paugam, Gaetan, GarimaGarg222, GAUTAM V DATLA, Guillaume Lemaitre, holodata-ej, Ho Yin Chau, Isaacc, Itamar Turner-Trauring, Jake Blitch, James Dean, James Lamb, Jérémie du Boisberranger, Jim Crist-Harif, John Hendricks, Junteng Li, Karthik, Kiyarash Fazeli, Kunle, Lev, Levente Csibi, Loic Esteve, Lucas Colley, Lucas Oliveira, Lucy Liu, Marco Edward Gorelli, marikabergengren, Matthias De Lozzo, Mohammad Ahmadullah Khan, Nguyen Cat Luong, Nikita, Nithurshen, Olivier Grisel, Omar Salman, pavitra danappa byali, pomrakna, prakritim01, Quentin Barthélemy, Ralf Gommers, Ram, Remi Gau, Reshama Shaikh, Riya Jha, Robert Pollak, Rudrendu Paul, Samuel O. Ronsin, Sarvesh V, sauravyadav1008, Seyi Kuforiji, shifanaaaa, Shruti Nath, Shyan Paul, Simon-Martin Schröder, Sophia Houhamdi, Stanislav Terliakov, Stefanie Senger, Taoufik KEHAL, Tejas, TejasAnalyst, Thomas Moreau, Thomas S., Tim Head, Unique Shrestha, Varun Agnihotri, Virgil Chan, Wiktor Olszowy, Xiao Yuan, Yann Lechelle