Version 1.9#
Legend for changelogs
Major Feature something big that you couldn’t do before.
Feature something that you couldn’t do before.
Efficiency an existing feature now may not require as much computation or memory.
Enhancement a miscellaneous minor improvement.
Fix something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.9.dev0#
May 2026
Changed models#
Enhancement The transform method of preprocessing.PowerTransformer with method="yeo-johnson" now uses the numerically more stable function scipy.stats.yeojohnson instead of its own implementation. The results may deviate in numerical edge cases or within the precision of floating-point arithmetic. By Christian Lorentzen. #33272
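As a rough illustration of the entry above, the following sketch compares the output of PowerTransformer with a direct call to scipy.stats.yeojohnson using the lambda estimated by scikit-learn. The toy data and variable names are assumptions made for this example; the exact size of the difference depends on the installed versions.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 1))  # skewed toy data, illustrative only

# standardize=False so that only the power transform itself is compared
pt = PowerTransformer(method="yeo-johnson", standardize=False).fit(X)
X_sklearn = pt.transform(X)[:, 0]

# Apply SciPy's Yeo-Johnson transform with the lambda fitted by scikit-learn
X_scipy = stats.yeojohnson(X[:, 0], lmbda=pt.lambdas_[0])

# Largest absolute deviation between the two results
max_abs_diff = np.max(np.abs(X_sklearn - X_scipy))
print(max_abs_diff)
```

Any remaining difference should be at the level of floating-point precision, which is the kind of deviation the entry warns about.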
Changes impacting many modules#
Major Feature Introduced a new config key, “sparse_interface”, to control whether functions return sparse objects as SciPy sparse matrices or SciPy sparse arrays. Use sklearn.set_config(sparse_interface="sparray") to have sklearn return sparse arrays. See more at the SciPy Sparse Migration Guide. The scikit-learn config “sparse_interface” initially defaults to sparse matrix (“spmatrix”). The plan is to have the default change to sparse array (“sparray”) in a few releases. By Dan Schult. #31177
Enhancement Scikit-learn accepted a new library dependency: narwhals. This is a very lightweight dependency that simplifies the support of dataframe input X and dataframe output as specified in the set_output API. Examples are pandas and polars dataframes. Narwhals can also help to support more dataframe libraries. Another reason for its adoption was that the dataframe interchange protocol (__dataframe__) on which scikit-learn relied so far for non-pandas dataframes got deprecated by polars and has run its course. By Christian Lorentzen and Marco Gorelli. #31127
Enhancement The HTML representation of all scikit-learn estimators inheriting from base.BaseEstimator now displays a new block showing the number and names of the output features when using a compose.ColumnTransformer or a pipeline.FeatureUnion. A copy-paste button is available for the output feature names. By Dea María Léon, Guillaume Lemaitre, Jérémie du Boisberranger, Olivier Grisel, Antoine Baker. #31937
Enhancement pipeline.Pipeline, pipeline.FeatureUnion and compose.ColumnTransformer now raise a clearer error message when an estimator class is passed instead of an instance. By Anne Beyer. #32888
Enhancement Checks for response values now provide a clearer error message when an estimator does not implement the given response_method. By Quentin Barthélemy. #33126
Enhancement The HTML representation of all scikit-learn estimators inheriting from base.BaseEstimator now includes a table displaying their fitted attributes. These are all the public estimator attributes that are computed during the call to fit with a name that ends with an underscore. By Dea María Léon, Jérémie du Boisberranger, Olivier Grisel, Guillaume Lemaitre, Antoine Baker. #33399
Fix Raise ValueError when sample_weight contains only zero values to prevent meaningless input data during fitting. This change applies to all estimators that support the parameter sample_weight. This change also affects metrics that validate sample weights. By Lucy Liu and John Hendricks. #32212
Fix Some parameter descriptions in the HTML representation of estimators were not properly escaped, which could lead to malformed HTML if the description contains characters like < or >. By Olivier Grisel. #32942
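To make the difference between the two sparse interfaces mentioned in the “sparse_interface” entry more concrete, here is a small sketch that uses only SciPy. It deliberately does not call the new scikit-learn config key; it only shows one well-known behavioural difference between sparse matrices and sparse arrays that downstream code may need to account for.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 2], [3, 4]])
as_matrix = sparse.csr_matrix(dense)  # legacy "spmatrix" interface
as_array = sparse.csr_array(dense)    # newer "sparray" interface

# With sparse matrices, `*` means matrix multiplication
matrix_product = (as_matrix * as_matrix).toarray()

# With sparse arrays, `*` is element-wise, as for NumPy arrays
array_product = (as_array * as_array).toarray()

# `@` is matrix multiplication for both interfaces
matmul = (as_array @ as_array).toarray()

print(matrix_product)
print(array_product)
```

Code that relies on `*` for matrix products is therefore worth reviewing before switching to sparse arrays; using `@` behaves the same with both interfaces.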
Support for Array API#
Additional estimators and functions have been updated to include support for all Array API compliant inputs.
See Array API support (experimental) for more details.
Feature sklearn.metrics.d2_absolute_error_score and sklearn.metrics.d2_pinball_score now support array API compatible inputs. By Virgil Chan. #31671
Feature linear_model.LogisticRegression now supports array API compatible inputs with solver="lbfgs". By Omar Salman and Olivier Grisel. #32644
Feature metrics.average_precision_score now supports Array API compliant inputs. By Stefanie Senger. #32909
Feature sklearn.metrics.pairwise.paired_manhattan_distances now supports array API compatible inputs. By Bharat Raghunathan. #32979
Feature metrics.pairwise_distances_argmin now supports array API compatible inputs. By Bharat Raghunathan. #32985
Feature linear_model.LinearRegression, linear_model.Ridge, linear_model.RidgeClassifier, linear_model.LogisticRegression, and discriminant_analysis.LinearDiscriminantAnalysis now raise a more informative error message when arrays passed at fit and prediction time use different array API namespaces or devices. A new sklearn.utils._array_api.move_estimator_to utility is provided to move an estimator’s fitted array attributes to a different namespace and device. By Jérôme Dockès and Tim Head. #33076
Feature pipeline.FeatureUnion now supports Array API compliant inputs when all its transformers do. By Olivier Grisel. #33263
Feature linear_model.PoissonRegressor now supports array API compatible inputs with solver="lbfgs". By Christian Lorentzen and Omar Salman. #33348
Enhancement kernel_approximation.Nystroem now supports array API compatible inputs. By Emily Chen. #29661
Enhancement linear_model.RidgeCV now accepts array API compliant arrays with gcv_mode set to auto or eigen. By Antoine Baker. #33020
Enhancement Internal NumPy CPU conversions now always attempt a generic DLPack-based transfer and only fall back to library-specific methods when necessary. This should ease support for additional array API and DLPack compliant input types without extending the ad hoc conversion helpers. By Olivier Grisel. #33623
Fix Fixed a bug that would cause Cython-based estimators to fail when fit on NumPy inputs when setting sklearn.set_config(array_api_dispatch=True). By Olivier Grisel. #32846
Fix Fixes how pos_label is inferred when pos_label is set to None, in sklearn.metrics.brier_score_loss and sklearn.metrics.d2_brier_score. By Lucy Liu. #32923
Fix linear_model.ridge_regression now correctly passes a Python scalar as fill_value to xp.full when broadcasting alpha for multi-target regression, ensuring compliance with the array API specification. This fixes compatibility issues with some array API backends. By Olivier Grisel. #33437
Fix metrics.pairwise_distances no longer emits spurious cross-library dtype comparison warnings when called with Array API inputs under config_context(array_api_dispatch=True). By Olivier Grisel. #33873
Fix Fixed support for integer Array API inputs on devices that do not support float64 in preprocessing.MinMaxScaler, preprocessing.MaxAbsScaler, preprocessing.KernelCenterer, preprocessing.normalize, utils.extmath.randomized_range_finder, and internal linear-model preprocessing and log-sum-exp utilities. By Arthur Lacote. #33898
Metadata routing#
Refer to the Metadata Routing User Guide for more details.
Enhancement TargetEncoder now routes groups to the CV splitter internally used for cross fitting in its fit_transform. By Samruddhi Baviskar and Stefanie Senger. #33089
Fix Scorers now correctly request metadata, and their set_score_request methods correctly detect metadata available in the signature of their score_func. Also, sklearn.linear_model.LogisticRegressionCV now correctly routes metadata to the underlying scorer when its .score(...) method is called. By Adrin Jalali #30859
Fix If a class explicitly defines a set_{method}_request method, it will not be overridden by the metadata routing machinery. By Adrin Jalali #32111
Callbacks#
Major Feature This release introduces a new callback API to invoke callbacks during the fitting of estimators that support them. It comes with two built-in callbacks:
sklearn.callback.ProgressBar, to display progress bars.
sklearn.callback.ScoringMonitor, to compute and log a scoring metric at the end of each iteration.
The following estimators support callbacks:
LogisticRegression (only with solver="lbfgs").
It also provides a public API to implement callback support in custom estimators or to implement custom callbacks; see the developer’s guide.
This API is experimental for now and may change without the usual deprecation cycle.
By Jérémie du Boisberranger, François Paugam and Stefanie Senger. #33322
sklearn.cluster#
Enhancement cluster.AgglomerativeClustering and cluster.FeatureAgglomeration now accept metric="l2" together with linkage="ward". metric="l2" is equivalent to metric="euclidean". By Guillaume Lemaitre. #24681
Fix cluster.MiniBatchKMeans now correctly handles sample weights during fitting. When sample weights are not None, mini-batch indices are created by sub-sampling with replacement using the normalized sample weights as probabilities. By Shruti Nath, Olivier Grisel, and Jeremie du Boisberranger. #30751
Fix Fixed a bug in cluster.BisectingKMeans when using a custom callable init with n_clusters > 2. By Mohammad Ahmadullah Khan. #33148
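The weighted sub-sampling described in the cluster.MiniBatchKMeans entry above can be sketched in plain NumPy. This is only an illustration of the idea (sampling indices with replacement, with probabilities proportional to the weights), not the library's actual code; the weights and batch size are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, batch_size = 10, 1000
sample_weight = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 0.0, 1.0, 3.0, 0.0, 2.0])

# Normalize the weights so that they form a probability distribution
probabilities = sample_weight / sample_weight.sum()

# Draw mini-batch indices with replacement, proportionally to the weights
batch_indices = rng.choice(
    n_samples, size=batch_size, replace=True, p=probabilities
)

# Count how often each sample was drawn
counts = np.bincount(batch_indices, minlength=n_samples)
print(counts)
```

Samples with zero weight are never drawn, and samples with larger weights appear more often in the mini-batch.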
sklearn.compose#
Fix The dotted line for compose.ColumnTransformer in its HTML display now includes only its elements. The behaviour when a remainder is used has also been corrected. By Dea María Léon. #32713
Fix Fixes the regression that a KeyError was thrown when using compose.ColumnTransformer.fit_transform with metadata routing and remainder="passthrough". By Anne Beyer. #33665
sklearn.datasets#
Efficiency Re-enabled compressed caching for datasets.fetch_kddcup99, reducing on-disk cache size without changing the public API. By Unique Shrestha. #33118
Fix Fixed datasets.fetch_openml to issue OpenML API calls to https://www.openml.org/api/v1/ instead of https://api.openml.org/api/v1/, which no longer resolves or redirects correctly. By Olivier Grisel. #33868
sklearn.decomposition#
Efficiency FastICA with algorithm='deflation' and fun='logcosh' is now an order of magnitude faster. By Mohammad Ahmadullah Khan. #33269
Fix Fixed a typo (from "OR" to "QR") in the list of allowed values for power_iteration_normalizer in decomposition.TruncatedSVD. By Olivier Grisel. #33492
sklearn.ensemble#
Fix Fixed the way ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor compute their bin edges to properly and consistently handle sample_weight. When sample_weights=None is passed to fit and the number of distinct feature values is less than the specified max_bins, the edges are still set to midpoints between consecutive feature values. Otherwise, the bin edges are set to weight-aware quantiles computed using the averaged inverted CDF method. If n_samples is larger than the subsample parameter, the weights are instead used to subsample the data (with replacement) and the bin edges are set using unweighted quantiles of the subsampled data. By Shruti Nath and Olivier Grisel. #29641
Fix ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now use sample_weight to draw the samples instead of forwarding them multiplied by a uniformly sampled mask to the underlying estimators. Furthermore, when max_samples is a float, it is now interpreted as a fraction of sample_weight.sum() instead of X.shape[0]. As sampling is done with replacement, a float max_samples greater than 1.0 is now allowed, as well as an integer max_samples greater than X.shape[0]. The default max_samples=None draws X.shape[0] samples, irrespective of sample_weight. By Antoine Baker. #31529
Fix Both ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier with the default "friedman_mse" criterion were computing impurity values with an incorrect scaling, leading to unexpected trees in some cases. The implementation now uses "squared_error", which is exactly equivalent to "friedman_mse" up to floating-point error discrepancies but computes correct impurity values. By Arthur Lacote. #32708
API Change The criterion parameter is now deprecated for classes ensemble.GradientBoostingRegressor and ensemble.GradientBoostingClassifier, as both options ("friedman_mse" and "squared_error") were producing the same results, up to floating-point rounding discrepancies and a bug in "friedman_mse". By Arthur Lacote. #32708
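The equivalence between the two criteria mentioned in the gradient boosting entries above follows from a standard variance decomposition: the reduction in the sum of squared errors obtained by a split equals Friedman's improvement score n_left * n_right / (n_left + n_right) * (mean_left - mean_right)^2. The snippet below checks this numerically for the unweighted case on random data; it is a sketch of the identity, not scikit-learn's tree code, and the helper name sse is chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=50)
y_left, y_right = y[:20], y[20:]  # an arbitrary candidate split
n_left, n_right = y_left.size, y_right.size


def sse(values):
    """Sum of squared deviations from the mean (hypothetical helper)."""
    return np.sum((values - values.mean()) ** 2)


# Improvement measured as the reduction of the sum of squared errors
squared_error_gain = sse(y) - sse(y_left) - sse(y_right)

# Friedman's improvement score for the same split
friedman_gain = (
    n_left * n_right / (n_left + n_right)
    * (y_left.mean() - y_right.mean()) ** 2
)

print(squared_error_gain, friedman_gain)
```

Since both quantities agree up to rounding, both criteria rank candidate splits identically, which is consistent with the deprecation described above.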
sklearn.feature_extraction#
Fix feature_extraction.image.reconstruct_from_patches_2d now produces correct results when a patch dimension equals the corresponding image dimension. By Eden Rochman. #33643
sklearn.feature_selection#
Enhancement feature_selection.SelectFromModel and feature_selection.RFE now support estimators whose feature importance is a sparse matrix or array, notably by passing a user-defined callable to the parameter importance_getter. By andymucyo-ops and isaacambrogetti. #33786
Fix feature_selection.RFE now uses stable sorting when ranking feature importances. This ensures that the feature selection is deterministic and consistent across runs when feature importances are tied. By blitchj. #29532
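To illustrate why a stable sort makes the ranking deterministic in the feature_selection.RFE entry above, consider tied importances sorted with NumPy's stable algorithm. This is a generic NumPy illustration with made-up importances, not the code used inside RFE.

```python
import numpy as np

# Features 0 and 2 are tied, as are features 1 and 3
importances = np.array([0.2, 0.1, 0.2, 0.1])

# A stable sort keeps tied elements in their original index order
ranking_order = np.argsort(importances, kind="stable")
print(ranking_order)  # array([1, 3, 0, 2])
```

With a stable sort, tied features always come out in the same relative order, so the set of eliminated features does not depend on implementation details of the sorting algorithm.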
sklearn.gaussian_process#
Efficiency Constructor signature of Gaussian process kernels is now cached, improving performance on small and medium datasets. By Stanislav Terliakov. #33067
Fix The hyperparameters of the default kernel of GaussianProcessRegressor, namely ConstantKernel() * RBF(), are now optimized when optimizer is not None. Thus, gpr = GaussianProcessRegressor().fit(X, y) uses optimized kernel hyperparameters. By Matthias De Lozzo. #32964
sklearn.inspection#
Enhancement In inspection.DecisionBoundaryDisplay, multiclass_colors now defaults to the more accessible Petroff color sequence for multiclass problems with up to 10 classes. By Anne Beyer. #33709
Fix In inspection.DecisionBoundaryDisplay, multiclass_colors is now also used for multiclass plotting when response_method="predict". By Anne Beyer. #33015
Fix In inspection.DecisionBoundaryDisplay, n_classes is now inferred more robustly from the estimator. If it fails for custom estimators, a comprehensive error message is shown. By Anne Beyer. #33202
Fix inspection.DecisionBoundaryDisplay now displays all class boundaries when using plot_method="contour" with all response_methods, and displays all classes in distinct colors when using plot_method="contourf" with response_method="predict". By Anne Beyer and Levente Csibi. #33300
Fix In inspection.DecisionBoundaryDisplay, a ValueError is now raised if the colormap passed to multiclass_colors contains fewer colors than there are classes in multiclass problems. By Anne Beyer. #33419
Fix For multiclass data, inspection.DecisionBoundaryDisplay with plot_method="contour" now also displays class-specific contours for response_method="predict_proba" and response_method="decision_function". Multiclass class boundary contour lines are now displayed in black by default for all response methods to avoid confusion. By Anne Beyer. #33471
Fix In inspection.DecisionBoundaryDisplay, multiclass_colors_ now always stores the colors for multiclass problems as a numpy array. By Anne Beyer. #33651
sklearn.linear_model#
Feature linear_model.MultiTaskElasticNet, linear_model.MultiTaskElasticNetCV, linear_model.MultiTaskLasso, and linear_model.MultiTaskLassoCV now support fitting on sparse X as well as fitting with sample_weight. By Christian Lorentzen. #33440
Efficiency linear_model.LogisticRegression with solver="lbfgs" now estimates the gradient of the loss at float32 precision when fitted with float32 data (X) to improve training speed and memory efficiency. Previously, the input data would be implicitly cast to float64. If you relied on the previous behavior for numerical reasons, you can explicitly cast your data to float64 before fitting to reproduce it. By Omar Salman and Olivier Grisel. #32644
Efficiency The linear_model.LinearRegression, linear_model.Ridge, linear_model.Lasso, linear_model.LassoCV, linear_model.ElasticNet, linear_model.ElasticNetCV and linear_model.BayesianRidge classes no longer make an unnecessary copy of dense X, y input during preprocessing when copy_X=False and sample_weight is provided. By Junteng Li. #33041
Enhancement linear_model.LogisticRegressionCV now correctly handles the case when the scoring parameter is set (to something not None) and when the CV splits result in folds where some class labels are missing. By Christian Lorentzen. #32828
Enhancement linear_model.ElasticNet, linear_model.ElasticNetCV and linear_model.enet_path are now able to fit Ridge regression, i.e. with l1_ratio=0. Before this change, the stopping criterion was a formulation of the dual gap that breaks down for l1_ratio=0. Now, an alternative dual gap formulation is used for this setting. This reduces the noise of raised warnings. By Christian Lorentzen. #32845
Enhancement Efficiency linear_model.ElasticNet, linear_model.ElasticNetCV, linear_model.Lasso, linear_model.LassoCV, linear_model.MultiTaskElasticNet, linear_model.MultiTaskElasticNetCV, linear_model.MultiTaskLasso, linear_model.MultiTaskLassoCV as well as linear_model.lasso_path and linear_model.enet_path are now faster when fit with strong L1 penalty and many features. During gap safe screening of features, the update of the residual is now only performed if the coefficient is not zero. By Christian Lorentzen. #33161
Fix linear_model.LassoCV and linear_model.ElasticNetCV now take the positive parameter into account to compute the maximum alpha parameter, where all coefficients are zero. This impacts the search grid for the internally tuned alpha hyper-parameter stored in the attribute alphas_. By Junteng Li. #32768
Fix Correct the formulation of alpha within linear_model.SGDOneClassSVM. The corrected value is alpha = nu instead of alpha = nu / 2. Note: This might result in changed values for the fitted attributes like coef_ and offset_ as well as the predictions made using this class. By Omar Salman. #32778
Fix linear_model.enet_path now correctly handles the precompute parameter when check_input=False. Previously, the value of precompute was not properly treated, which could lead to a ValueError. This also affects linear_model.ElasticNetCV, linear_model.LassoCV, linear_model.MultiTaskElasticNetCV and linear_model.MultiTaskLassoCV. By Albert Dorador. #33014
Fix The leave-one-out errors and model parameters estimated in linear_model.RidgeCV and linear_model.RidgeClassifierCV when cv=None are now numerically stable in the small alpha regime. The default auto option is now equivalent to eigen and picks the cheaper option: eigendecomposition of the covariance matrix when n_features <= n_samples, or of the Gram matrix when n_features > n_samples. When store_cv_results=True and X is an integer array, the cv_results_ attribute was wrongly coerced to the integer dtype of X; it now always has a float dtype. By Antoine Baker. #33020
Fix Fixed a bug in linear_model.SGDClassifier for multiclass settings where large negative values of linear_model.SGDClassifier.decision_function could lead to NaN values. In this case, this fix assigns equal probability for each class. By Christian Lorentzen. #33168
Fix Fix unsigned integer overflow in linear_model.RidgeClassifier when fitting with unsigned integer inputs. Internal label binarisation now avoids wrapping -1 for unsigned integer target dtypes. By Virgil Chan. #33441
Fix The tol parameter in linear_model.LinearRegression is now set as the cond parameter of the scipy.linalg.lstsq solver when fitting on dense data. Some tests involving linear_model.LinearRegression were brittle with the default cond values from scipy or numpy. Here at least the user has control over the cond value and can change it if necessary. By Antoine Baker. #33565
Fix linear_model.LogisticRegressionCV no longer raises a TypeError when refit=False and use_legacy_attributes=False are set together with a non-elasticnet penalty like l1_ratios=[0.0]. Previously, None was stored in l1_ratio_ instead of 0.0, which caused float() to fail during post-processing. By Mohamad Fazeli. #33902
Fix linear_model.BayesianRidge and linear_model.ARDRegression now center test features during predict to correctly compute predictive variance. By Danilo Silva. #33918
API Change Passing sample_weight as a positional argument to linear_model.LogisticRegressionCV.score is deprecated and will be removed in version 1.11. Pass it as a keyword argument instead. By Adrin Jalali #30859
API Change The default value of the scoring parameter in linear_model.LogisticRegressionCV will change in version 1.11 from None, i.e. accuracy, to "neg_log_loss". This is a much better default scoring function as it aligns with the log loss that logistic regression is minimizing (with regularization). For the meantime, you can silence the warning for this change by explicitly passing a value to scoring. By Christian Lorentzen. #33333
API Change The parameter n_alphas has been deprecated for linear_model.lasso_path and linear_model.enet_path. This deprecation follows the same deprecation that has happened for linear_model.ElasticNetCV and linear_model.LassoCV. The parameter alphas now supports both integers and array-likes, removing the need for n_alphas. From now on, only alphas should be set, either to an integer to indicate the number of automatically generated alphas or to an array-like of values for the regularization path. By Christian Lorentzen. #33855
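The role of the cond parameter mentioned in the linear_model.LinearRegression entry above can be seen by calling scipy.linalg.lstsq directly. The sketch below uses a deliberately ill-conditioned toy matrix (an assumption made for this example) and does not go through LinearRegression itself, so it only shows what the cutoff does inside the SciPy solver.

```python
import numpy as np
from scipy import linalg

# Toy design matrix with singular values 1 and 1e-10 (ill-conditioned)
X = np.array([[1.0, 0.0], [0.0, 1e-10], [0.0, 0.0]])
y = np.array([1.0, 1.0, 0.0])

# Default cutoff: the tiny singular value is kept, effective rank is 2
coef_default, _, rank_default, singular_values = linalg.lstsq(X, y)

# Looser cutoff: singular values below cond * largest one are treated as zero
coef_loose, _, rank_loose, _ = linalg.lstsq(X, y, cond=1e-6)

print(rank_default, coef_default)
print(rank_loose, coef_loose)
```

With the looser cutoff the solver ignores the nearly degenerate direction and returns a small-norm solution, whereas the default cutoff yields a very large coefficient. This is the kind of sensitivity that a user-controlled tolerance helps with.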
sklearn.manifold#
Efficiency The way the ARPACK eigensolver is called in manifold.SpectralEmbedding and cluster.SpectralClustering was improved, resulting in faster runtimes. By Dmitry Kobak. #33262
Fix manifold.MDS.fit_transform returns the correct number of components when using init="classical_mds". By Ben Pedigo. #33318
sklearn.metrics#
Major Feature metrics.metric_at_thresholds has been added to compute a metric’s values across all possible thresholds. By Carlo Lemos and Lucy Liu. #32732
Feature Add class method from_cv_results to metrics.PrecisionRecallDisplay, which allows easy plotting of multiple precision-recall curves from model_selection.cross_validate results. By Lucy Liu. #30508
Enhancement cohen_kappa_score now has a replace_undefined_by param that can be set to define the function’s return value when the metric is undefined (division by zero). By Stefanie Senger. #31172
Fix metrics.d2_pinball_score and metrics.d2_absolute_error_score now always use the "averaged_inverted_cdf" quantile method, both with and without sample weights. Previously, the "linear" quantile method was used only for the unweighted case, leading to surprising discrepancies when comparing the results with unit weights. Note that all quantile interpolation methods are asymptotically equivalent in the large sample limit, but this fix can cause score value changes on small evaluation sets (without weights). By Virgil Chan. #31671
Fix metrics.accuracy_score, metrics.hamming_loss, metrics.zero_one_loss, metrics.matthews_corrcoef and metrics.confusion_matrix (when labels is not None) now raise an error when y_true is string and y_pred is numeric, for all array-like inputs. Previously, lists and numpy arrays not of object dtype did not raise an error for this mixed input case. The above metrics will also raise an error for label indicator matrix inputs of inconsistent size, except for metrics.confusion_matrix which does not accept label indicator matrix inputs. By Lucy Liu. #33086
Fix Fixed metrics.pairwise_distances_argmin and metrics.pairwise_distances_argmin_min to avoid a quadratic-time path when many distances are identical, which could lead to severe slowdowns or even a stack overflow (segmentation fault) on large inputs. By Arthur Lacote. #33252
Fix metrics.PrecisionRecallDisplay.from_estimator and metrics.PrecisionRecallDisplay.from_predictions now correctly plot the chance level line when y_true is a pytorch tensor. By Lucas Oliveira. #33405
Fix y_pred was deprecated in favor of y_proba for metrics.log_loss and metrics.d2_log_loss_score as predicted probabilities are expected, not predicted labels. By Lucy Liu. #33740
Fix metrics.pairwise_distances no longer raises an error for the euclidean metric when called with Y_norm_squared and n_jobs > 1. By Kunle Li. #33876
API Change Passing the pos_label and sample_weight parameters of metrics.confusion_matrix_at_thresholds as positional arguments is deprecated and will be removed in v1.11. By Jérémie du Boisberranger. #33357
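The difference between the "linear" and "averaged_inverted_cdf" quantile methods mentioned in the metrics.d2_pinball_score entry above can be reproduced with NumPy alone (assuming NumPy 1.22 or later, where the method argument is available). The four-element sample is made up for the example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 10.0])

# Linear interpolation between order statistics (NumPy's default)
q_linear = np.quantile(x, 0.25, method="linear")

# Inverted CDF with averaging at discontinuities
q_averaged = np.quantile(x, 0.25, method="averaged_inverted_cdf")

print(q_linear)    # 1.75
print(q_averaged)  # 1.5
```

On such a small sample the two methods give visibly different values, which is why scores computed on small evaluation sets may change, while the difference becomes negligible for large samples.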
sklearn.model_selection#
Enhancement GroupKFold now uses stable sorting when doing the group distribution. This ensures that the splits are consistent across runs. By marikabergengren and Adrin Jalali. #28464
Fix model_selection.StratifiedGroupKFold now raises a ValueError when n_splits is greater than the number of unique groups, preventing degenerate folds. By Chani Fainendler. #33176
Fix Fixed incorrect ValueError when using scoring="average_precision" or similar in model selection utilities such as model_selection.GridSearchCV or model_selection.cross_validate with multiclass classifiers. The pos_label parameter is only relevant for binary classification and was incorrectly being validated for scorers used on multiclass problems. By Olivier Grisel. #33473
sklearn.neighbors#
Fix neighbors.KNeighborsClassifier and neighbors.RadiusNeighborsClassifier now work with string labels when algorithm="brute". By AAAZZZR. #33048
Fix Fixed a quadratic-time path in the internal simultaneous_sort used by neighbors.BallTree and neighbors.KDTree queries when many distances are identical, which could lead to severe slowdowns or even a stack overflow (segmentation fault) on large inputs. Neighbor searches with tied distances no longer degrade badly in runtime. By Arthur Lacote. #33252
sklearn.neural_network#
Fix neural_network.MLPClassifier with early_stopping=True no longer raises a TypeError when y contains non-numeric class labels (e.g. strings): validation scoring now checks finiteness only for floating predictions. By Guillaume Lemaitre. #33774
sklearn.pipeline#
Fix Fixed pipeline.FeatureUnion to properly handle column renaming when using Polars output, preventing duplicate column names. By Levente Csibi. #32853
Fix pipeline.Pipeline now raises an AttributeError when accessing attributes that are not available on an empty pipeline. It’s therefore possible to call dir on an empty pipeline. By Jérémie du Boisberranger. #33362
sklearn.preprocessing#
Fix PowerTransformer and QuantileTransformer no longer raise a warning related to feature names in inverse_transform if fit is called using data with feature names. By Thibault and Mohammad Ahmadullah Khan. #33268
API Change The shuffle and random_state parameters are deprecated on TargetEncoder and will be removed in version 1.11. Pass a cross-validation generator as the cv argument to specify the shuffling behaviour instead. By Stefanie Senger. #33453
sklearn.svm#
Fix Raise a more informative error when fitting svm.NuSVR with all zero sample weights. By Lucy Liu and John Hendricks. #32212
API Change The probability parameter of sklearn.svm.SVC and sklearn.svm.NuSVC is deprecated due to not being thread-safe and will be removed in 1.11. Use sklearn.calibration.CalibratedClassifierCV with the respective estimator and ensemble=False instead. By Shruti Nath. #32050
API Change The probA_ and probB_ attributes of sklearn.svm.SVC and sklearn.svm.NuSVC are deprecated due to deprecation of the probability parameter and will be removed in 1.11. By Shruti Nath. #33388
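The replacement suggested in the probability deprecation entry above could look like the following sketch. The synthetic dataset and variable names are assumptions for illustration; the remaining CalibratedClassifierCV options are left at their defaults.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# SVC without the deprecated `probability` parameter, wrapped for calibration
calibrated_svc = CalibratedClassifierCV(SVC(), ensemble=False)
calibrated_svc.fit(X, y)

# Calibrated class probabilities for the first five samples
proba = calibrated_svc.predict_proba(X[:5])
print(proba)
```

With ensemble=False, a single calibrator is fitted on cross-validated predictions and the final estimator is trained on all the data, which is the setup the entry recommends.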
sklearn.tree#
Feature In tree.DecisionTreeRegressor and ensemble.RandomForestRegressor, criterion="absolute_error" (and, consequently, all criterion options) now supports missing values for dense training data X. By Arthur Lacote. #32119
Enhancement tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier, tree.ExtraTreeRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, and ensemble.ExtraTreesRegressor now support combining monotonic_cst with missing values in dense training data. This builds on the improvements to missing-value support for dense training data in #32119. By Samuel O. Ronsin. #27630
Fix Fix calculation of node impurity in tree.DecisionTreeRegressor, ensemble.RandomForestRegressor, tree.ExtraTreeRegressor and ensemble.ExtraTreesRegressor when missing values are present for the Poisson criterion. The Poisson criterion was returning invalid impurities (including negative values) when missing values were present. By Arthur Lacote. #32119
Fix Fixed feature-wise NaN detection in trees. Features could be seen as NaN-free for some edge-case patterns, which led to not considering splits with NaNs assigned to the left node for those features. This affects tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, ensemble.RandomForestRegressor and ensemble.ExtraTreesRegressor. By Arthur Lacote. #32193
Fix Fixed color conversion in tree export so RGB values with zero channels are correctly converted to two-digit hexadecimal components (for example, (0, 255, 0) now yields #00ff00). By Simon-Martin Schröder. #33845
API Change criterion="friedman_mse" is now deprecated. This criterion was intended for gradient boosting but was incorrectly implemented in scikit-learn’s trees and was actually behaving identically to criterion="squared_error". Use criterion="squared_error" instead. This affects tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, ensemble.RandomForestRegressor and ensemble.ExtraTreesRegressor. By Arthur Lacote. #32708
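As an illustration of the color conversion fix described above, the following sketch shows one way to format RGB channels as two-digit hexadecimal components. The helper rgb_to_hex is hypothetical and written for this example; it is not the function used by the tree export code.

```python
def rgb_to_hex(rgb):
    """Format an (R, G, B) tuple of 0-255 integers as '#rrggbb'.

    Hypothetical helper for illustration only.
    """
    # `02x` pads each channel to two hexadecimal digits, so 0 becomes "00"
    return "#" + "".join(f"{channel:02x}" for channel in rgb)


green = rgb_to_hex((0, 255, 0))
print(green)  # #00ff00
```

Without the zero padding, a channel value of 0 would be rendered as a single digit and produce an invalid color string.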
sklearn.utils#
Enhancement utils.get_tags now provides a clearer error message when a class is passed instead of an estimator instance. By Achyuthan S and Anne Beyer. #32565
Fix The parameter table in the HTML representation of all scikit-learn estimators inheriting from base.BaseEstimator displays each parameter documentation as a tooltip. The last tooltip of a parameter in the last table of any HTML representation was partially hidden. This issue has been fixed. By Dea María Léon. #32887
Fix Fixed utils.stats._weighted_percentile with average=True so zero-weight samples just before the end of the array are handled correctly. This can change results when using sample_weight with preprocessing.KBinsDiscretizer (strategy="quantile", quantile_method="averaged_inverted_cdf") and in metrics.median_absolute_error, metrics.d2_pinball_score, and metrics.d2_absolute_error_score. By Arthur Lacote. #33127
Fix utils.check_array now correctly rejects pandas StringDtype columns when dtype="numeric" is requested. In pandas 3, string columns use StringDtype instead of object dtype, which caused check_array to silently accept string data instead of raising a ValueError. By Olivier Grisel. #33491
Fix The code path for polars dataframes in utils.validation.validate_data was made independent of the dataframe interchange protocol __dataframe__. This change was necessary to adapt to the recent deprecation of the interchange protocol in polars version 1.40. By Christian Lorentzen. #33789
API Change utils.multiclass.unique_labels now accepts the ys_types parameter, which allows avoiding duplicate calls to utils.multiclass.type_of_target. By Lucy Liu. #33086
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.8, including:
TODO: update at the time of the release.