.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _release_notes_1_8:

===========
Version 1.8
===========

..
  -- UNCOMMENT WHEN 1.8.0 IS RELEASED --
  For a short description of the main highlights of the release, please refer to
  :ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_8_0.py`.

..
  DELETE WHEN 1.8.0 IS RELEASED
  Since October 2024, DO NOT add your changelog entry in this file.

..
  Instead, create a file named `..rst` in the relevant sub-folder in
  `doc/whats_new/upcoming_changes/`. For full details, see:
  https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md

.. include:: changelog_legend.inc

.. towncrier release notes start

.. _changes_1_8_0:

Version 1.8.0
=============

**November 2025**

Changes impacting many modules
------------------------------

- |Efficiency| Improved CPU and memory usage in estimators and metric functions
  that rely on weighted percentiles, and better matched the NumPy and SciPy
  (un-weighted) percentile implementations.
  By :user:`Lucy Liu ` :pr:`31775`

Support for Array API
---------------------

Additional estimators and functions have been updated to include support for all
`Array API `_ compliant inputs. See :ref:`array_api` for more details.
A short usage sketch follows the list below.

- |Feature| :class:`sklearn.preprocessing.StandardScaler` now supports Array API
  compliant inputs.
  By :user:`Alexander Fabisch `, :user:`Edoardo Abati `, :user:`Olivier Grisel `
  and :user:`Charles Hill `. :pr:`27113`

- |Feature| :class:`linear_model.RidgeCV`, :class:`linear_model.RidgeClassifier`
  and :class:`linear_model.RidgeClassifierCV` now support array API compatible
  inputs with `solver="svd"`.
  By :user:`Jérôme Dockès `. :pr:`27961`

- |Feature| :func:`metrics.pairwise.pairwise_kernels` for any kernel except
  "laplacian", and :func:`metrics.pairwise_distances` for the metrics "cosine",
  "euclidean" and "l2", now support array API inputs.
  By :user:`Emily Chen ` and :user:`Lucy Liu ` :pr:`29822`

- |Feature| :func:`sklearn.metrics.confusion_matrix` now supports Array API
  compatible inputs.
  By :user:`Stefanie Senger ` :pr:`30562`

- |Feature| :class:`sklearn.mixture.GaussianMixture` with `init_params="random"`
  or `init_params="random_from_data"` and `warm_start=False` now supports Array
  API compatible inputs.
  By :user:`Stefanie Senger ` and :user:`Loïc Estève ` :pr:`30777`

- |Feature| :func:`sklearn.metrics.roc_curve` now supports Array API compatible
  inputs.
  By :user:`Thomas Li ` :pr:`30878`

- |Feature| :class:`preprocessing.PolynomialFeatures` now supports array API
  compatible inputs.
  By :user:`Omar Salman ` :pr:`31580`

- |Feature| :class:`calibration.CalibratedClassifierCV` now supports array API
  compatible inputs with `method="temperature"` and when the underlying
  `estimator` also supports the array API.
  By :user:`Omar Salman ` :pr:`32246`

- |Feature| :func:`sklearn.metrics.precision_recall_curve` now supports array API
  compatible inputs.
  By :user:`Lucy Liu ` :pr:`32249`

- |Feature| :func:`sklearn.model_selection.cross_val_predict` now supports array
  API compatible inputs.
  By :user:`Omar Salman ` :pr:`32270`

- |Feature| :func:`sklearn.metrics.brier_score_loss`, :func:`sklearn.metrics.log_loss`,
  :func:`sklearn.metrics.d2_brier_score` and :func:`sklearn.metrics.d2_log_loss_score`
  now support array API compatible inputs.
  By :user:`Omar Salman ` :pr:`32422`

- |Feature| :class:`naive_bayes.GaussianNB` now supports array API compatible
  inputs.
  By :user:`Omar Salman ` :pr:`32497`

- |Feature| :func:`sklearn.metrics.det_curve` now supports Array API compliant
  inputs.
  By :user:`Josef Affourtit `. :pr:`32586`

- |Feature| :func:`sklearn.metrics.pairwise.manhattan_distances` now supports
  array API compatible inputs.
  By :user:`Omar Salman `. :pr:`32597`

- |Feature| :func:`sklearn.metrics.calinski_harabasz_score` now supports Array
  API compliant inputs.
  By :user:`Josef Affourtit `. :pr:`32600`

- |Feature| :func:`sklearn.metrics.balanced_accuracy_score` now supports array
  API compatible inputs.
  By :user:`Omar Salman `. :pr:`32604`

- |Feature| :func:`sklearn.metrics.pairwise.laplacian_kernel` now supports array
  API compatible inputs.
  By :user:`Zubair Shakoor `. :pr:`32613`

- |Feature| :func:`sklearn.metrics.cohen_kappa_score` now supports array API
  compatible inputs.
  By :user:`Omar Salman `. :pr:`32619`
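As a minimal usage sketch (the data is illustrative, and a plain NumPy array is
used so the example stays self-contained), Array API dispatch is enabled through
:func:`sklearn.config_context` and can be combined with the newly supported
:class:`preprocessing.StandardScaler`; any Array API compliant array, for example
a PyTorch tensor, could be substituted, provided the optional array API
dependencies are available in your installation::

    import numpy as np

    import sklearn
    from sklearn.preprocessing import StandardScaler

    X = np.asarray([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

    # Enable Array API dispatch only within this context; computations then stay
    # in the namespace of the input array.
    with sklearn.config_context(array_api_dispatch=True):
        X_scaled = StandardScaler().fit_transform(X)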
Metadata routing
----------------

Refer to the :ref:`Metadata Routing User Guide ` for more details.

- |Fix| Fixed an issue where passing `sample_weight` to a :class:`Pipeline` inside
  a :class:`GridSearchCV` would raise an error with metadata routing enabled.
  By `Adrin Jalali`_. :pr:`31898`

Free-threaded CPython 3.14 support
----------------------------------

scikit-learn supports free-threaded CPython; in particular, free-threaded wheels
are available for all of our supported platforms on Python 3.14.

Free-threaded (also known as nogil) CPython is a version of CPython that aims at
enabling efficient multi-threaded use cases by removing the Global Interpreter
Lock (GIL).

If you want to try out free-threaded Python, the recommendation is to use
Python 3.14, which has fixed a number of issues compared to Python 3.13. Feel
free to try free-threaded Python on your use case and report any issues!

For more details about free-threaded CPython see the
`py-free-threading doc `_, in particular
`how to install a free-threaded CPython `_ and
`Ecosystem compatibility tracking `_.

By :user:`Loïc Estève ` and :user:`Olivier Grisel ` and many other people in the
wider Scientific Python and CPython ecosystem, for example
:user:`Nathan Goldbaum `, :user:`Ralf Gommers `,
:user:`Edgar Andrés Margffoy Tuay `. :pr:`custom-top-level-32079`

:mod:`sklearn.base`
-------------------

- |Feature| Refactored :meth:`dir` in :class:`BaseEstimator` to recognize the
  condition check in :meth:`available_if`.
  By :user:`John Hendricks ` and :user:`Miguel Parece `. :pr:`31928`

- |Fix| Fixed the handling of pandas missing values in the HTML display of all
  estimators.
  By :user:`Dea María Léon `. :pr:`32341`

:mod:`sklearn.calibration`
--------------------------

- |Feature| Added the temperature scaling method in
  :class:`calibration.CalibratedClassifierCV`.
  By :user:`Virgil Chan ` and :user:`Christian Lorentzen `. :pr:`31068`
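As a minimal sketch of the new option (the wrapped estimator and data are
illustrative only), temperature scaling is selected through the `method`
parameter, the same value referenced in the Array API entry above::

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, random_state=0)

    # Temperature scaling is chosen like the existing "sigmoid" and "isotonic"
    # methods; the wrapped estimator here is only for illustration.
    calibrated = CalibratedClassifierCV(LogisticRegression(), method="temperature")
    calibrated.fit(X, y)
    proba = calibrated.predict_proba(X)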
:mod:`sklearn.cluster`
----------------------

- |Efficiency| :func:`cluster.kmeans_plusplus` now uses `np.cumsum` directly,
  without extra numerical stability checks and without casting to `np.float64`.
  By :user:`Tiziano Zito ` :pr:`31991`

- |Fix| The default value of the `copy` parameter in :class:`cluster.HDBSCAN`
  will change from `False` to `True` in 1.10 to avoid data modification and
  maintain consistency with other estimators.
  By :user:`Sarthak Puri `. :pr:`31973`

:mod:`sklearn.compose`
----------------------

- |Fix| The :class:`compose.ColumnTransformer` now correctly fits on data
  provided as a `polars.DataFrame` when any transformer has a sparse output.
  By :user:`Phillipp Gnan `. :pr:`32188`

:mod:`sklearn.covariance`
-------------------------

- |Efficiency| :class:`sklearn.covariance.GraphicalLasso`,
  :class:`sklearn.covariance.GraphicalLassoCV` and
  :func:`sklearn.covariance.graphical_lasso` with `mode="cd"` benefit from the
  fit time performance improvement of :class:`sklearn.linear_model.Lasso` by
  means of gap safe screening rules.
  By :user:`Christian Lorentzen `. :pr:`31987`

- |Fix| Fixed uncontrollable randomness in :class:`sklearn.covariance.GraphicalLasso`,
  :class:`sklearn.covariance.GraphicalLassoCV` and
  :func:`sklearn.covariance.graphical_lasso`. For `mode="cd"`, they now use
  cyclic coordinate descent. Before, it was random coordinate descent with
  uncontrollable random number seeding.
  By :user:`Christian Lorentzen `. :pr:`31987`

- |Fix| Added a correction to :class:`covariance.MinCovDet` to adjust for
  consistency at the normal distribution. This reduces the bias present when
  applying this method to normally distributed data.
  By :user:`Daniel Herrera-Esposito ` :pr:`32117`

:mod:`sklearn.decomposition`
----------------------------

- |Efficiency| :class:`sklearn.decomposition.DictionaryLearning` and
  :class:`sklearn.decomposition.MiniBatchDictionaryLearning` with
  `fit_algorithm="cd"`, :class:`sklearn.decomposition.SparseCoder` with
  `transform_algorithm="lasso_cd"`, :class:`sklearn.decomposition.MiniBatchSparsePCA`,
  :class:`sklearn.decomposition.SparsePCA`, :func:`sklearn.decomposition.dict_learning`
  and :func:`sklearn.decomposition.dict_learning_online` with `method="cd"`, and
  :func:`sklearn.decomposition.sparse_encode` with `algorithm="lasso_cd"` all
  benefit from the fit time performance improvement of
  :class:`sklearn.linear_model.Lasso` by means of gap safe screening rules.
  By :user:`Christian Lorentzen `. :pr:`31987`

- |Enhancement| :class:`decomposition.SparseCoder` now follows the transformer
  API of scikit-learn. In addition, the :meth:`fit` method now validates the
  input and parameters (see the sketch after this section).
  By :user:`François Paugam `. :pr:`32077`

- |Fix| Added input checks to the `inverse_transform` method of
  :class:`decomposition.PCA` and :class:`decomposition.IncrementalPCA`.
  By :user:`Ian Faust `. :pr:`29310`
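As a brief sketch of the transformer-style usage (the dictionary and data below
are made-up illustrative values), :class:`decomposition.SparseCoder` can now be
used like other transformers, with `fit` validating the input before `transform`
computes the sparse codes::

    import numpy as np

    from sklearn.decomposition import SparseCoder

    rng = np.random.RandomState(0)
    dictionary = rng.randn(3, 4)   # 3 atoms in 4 dimensions, illustrative values
    X = rng.randn(5, 4)

    coder = SparseCoder(dictionary=dictionary, transform_algorithm="lasso_lars")
    # `fit` now validates the input and parameters; `transform` computes the codes.
    codes = coder.fit(X).transform(X)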
:mod:`sklearn.discriminant_analysis`
------------------------------------

- |Feature| Added the `solver`, `covariance_estimator` and `shrinkage` parameters
  to :class:`discriminant_analysis.QuadraticDiscriminantAnalysis`. The resulting
  class is more similar to :class:`discriminant_analysis.LinearDiscriminantAnalysis`
  and allows for more flexibility in the estimation of the covariance matrices.
  By :user:`Daniel Herrera-Esposito `. :pr:`32108`

:mod:`sklearn.ensemble`
-----------------------

- |Fix| :class:`ensemble.BaggingClassifier`, :class:`ensemble.BaggingRegressor`
  and :class:`ensemble.IsolationForest` now use `sample_weight` to draw the
  samples instead of forwarding them, multiplied by a uniformly sampled mask, to
  the underlying estimators. Furthermore, `max_samples` is now interpreted as a
  fraction of `sample_weight.sum()` instead of `X.shape[0]` when passed as a
  float.
  By :user:`Antoine Baker `. :pr:`31414`

:mod:`sklearn.feature_selection`
--------------------------------

- |Enhancement| :class:`feature_selection.SelectFromModel` no longer forces
  `max_features` to be less than or equal to the number of input features.
  By :user:`Thibault ` :pr:`31939`

:mod:`sklearn.gaussian_process`
-------------------------------

- |Efficiency| Made :meth:`gaussian_process.GaussianProcessRegressor.predict`
  faster when `return_cov` and `return_std` are both `False`.
  By :user:`Rafael Ayllón Gavilán `. :pr:`31431`

:mod:`sklearn.linear_model`
---------------------------

- |Efficiency| :class:`linear_model.ElasticNet` and :class:`linear_model.Lasso`
  with `precompute=False` use less memory for dense `X` and are a bit faster.
  Previously, they used twice the memory of `X`, even for Fortran-contiguous `X`.
  By :user:`Christian Lorentzen ` :pr:`31665`

- |Efficiency| :class:`linear_model.ElasticNet` and :class:`linear_model.Lasso`
  avoid double input checking and are therefore a bit faster.
  By :user:`Christian Lorentzen `. :pr:`31848`

- |Efficiency| :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV`,
  :class:`linear_model.Lasso`, :class:`linear_model.LassoCV`,
  :class:`linear_model.MultiTaskElasticNet`, :class:`linear_model.MultiTaskElasticNetCV`,
  :class:`linear_model.MultiTaskLasso` and :class:`linear_model.MultiTaskLassoCV`
  are faster to fit by avoiding a BLAS level 1 (axpy) call in the innermost loop.
  The same applies to the functions :func:`linear_model.enet_path` and
  :func:`linear_model.lasso_path`.
  By :user:`Christian Lorentzen ` :pr:`31956` and :pr:`31880`

- |Efficiency| :class:`linear_model.ElasticNetCV`, :class:`linear_model.LassoCV`,
  :class:`linear_model.MultiTaskElasticNetCV` and
  :class:`linear_model.MultiTaskLassoCV` avoid an additional copy of `X` with the
  default `copy_X=True`.
  By :user:`Christian Lorentzen `. :pr:`31946`

- |Efficiency| :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV`,
  :class:`linear_model.Lasso`, :class:`linear_model.LassoCV`,
  :class:`linear_model.MultiTaskElasticNetCV`, :class:`linear_model.MultiTaskLassoCV`
  as well as :func:`linear_model.lasso_path` and :func:`linear_model.enet_path`
  now implement gap safe screening rules in the coordinate descent solver for
  dense and sparse `X`. The speedup in fitting time is particularly pronounced
  (a 10-fold speedup is possible) when computing regularization paths, as the
  \*CV variants of the above estimators do. There is now an additional check of
  the stopping criterion before entering the main loop of descent steps. As the
  stopping criterion requires the computation of the dual gap, the screening
  happens whenever the dual gap is computed.
  By :user:`Christian Lorentzen ` :pr:`31882`, :pr:`31986`, :pr:`31987` and
  :pr:`32014`

- |Enhancement| :class:`linear_model.ElasticNet`, :class:`linear_model.ElasticNetCV`,
  :class:`linear_model.Lasso`, :class:`linear_model.LassoCV`,
  :class:`MultiTaskElasticNet`, :class:`MultiTaskElasticNetCV`,
  :class:`MultiTaskLasso`, :class:`MultiTaskLassoCV`, as well as
  :func:`linear_model.enet_path` and :func:`linear_model.lasso_path` now use
  `dual gap <= tol` instead of `dual gap < tol` as the stopping criterion. The
  resulting coefficients might differ from previous versions of scikit-learn in
  rare cases.
  By :user:`Christian Lorentzen `. :pr:`31906`

- |Fix| Fixed the convergence criteria of SGD models to avoid premature
  convergence when `tol != None`. This primarily impacts :class:`SGDOneClassSVM`
  but also affects :class:`SGDClassifier` and :class:`SGDRegressor`. Before this
  fix, only the loss function without penalty was used for the convergence
  check, whereas now the full objective with regularization is used.
  By :user:`Guillaume Lemaitre ` and :user:`kostayScr ` :pr:`31856`

- |Fix| The allowed parameter range for the initial learning rate `eta0` in
  :class:`linear_model.SGDClassifier`, :class:`linear_model.SGDOneClassSVM`,
  :class:`linear_model.SGDRegressor` and :class:`linear_model.Perceptron` changed
  from non-negative numbers to strictly positive numbers. As a consequence, the
  default `eta0` of :class:`linear_model.SGDClassifier` and
  :class:`linear_model.SGDOneClassSVM` changed from 0 to 0.01. Note, however,
  that `eta0` is not used by the default learning rate "optimal" of those two
  estimators.
  By :user:`Christian Lorentzen `. :pr:`31933`

- |Fix| :class:`linear_model.LogisticRegressionCV` is now able to handle CV
  splits where some class labels are missing in some folds. Before, it raised an
  error whenever a class label was missing in a fold.
  By :user:`Christian Lorentzen `. :pr:`32747`

- |API| :class:`linear_model.PassiveAggressiveClassifier` and
  :class:`linear_model.PassiveAggressiveRegressor` are deprecated and will be
  removed in 1.10. Equivalent estimators are available with
  :class:`linear_model.SGDClassifier` and :class:`linear_model.SGDRegressor`,
  both of which expose the options `learning_rate="pa1"` and `"pa2"` (see the
  sketch at the end of this section). The parameter `eta0` can be used to
  specify the aggressiveness parameter of the Passive-Aggressive algorithms,
  called C in the reference paper.
  By :user:`Christian Lorentzen ` :pr:`31932` and :pr:`29097`

- |API| :class:`linear_model.SGDClassifier`, :class:`linear_model.SGDRegressor`,
  and :class:`linear_model.SGDOneClassSVM` now deprecate negative values for the
  `power_t` parameter. Using a negative value will raise a warning in version 1.8
  and will raise an error in version 1.10. A value in the range [0.0, inf) must
  be used instead.
  By :user:`Ritvi Alagusankar ` :pr:`31474`

- |API| :class:`sklearn.linear_model.LogisticRegression` now raises an error
  when the liblinear solver is used and input `X` values are larger than 1e30,
  as the liblinear solver freezes otherwise.
  By :user:`Shruti Nath `. :pr:`31888`

- |API| :class:`linear_model.LogisticRegressionCV` got a new parameter
  `use_legacy_attributes` to control the types and shapes of the fitted
  attributes `C_`, `l1_ratio_`, `coefs_paths_`, `scores_` and `n_iter_`. The
  current default value `True` keeps the legacy behaviour. If `False` then:

  - ``C_`` is a float.
  - ``l1_ratio_`` is a float.
  - ``coefs_paths_`` is an ndarray of shape
    (n_folds, n_l1_ratios, n_cs, n_classes, n_features). For binary problems
    (n_classes=2), the second to last dimension is 1.
  - ``scores_`` is an ndarray of shape (n_folds, n_l1_ratios, n_cs).
  - ``n_iter_`` is an ndarray of shape (n_folds, n_l1_ratios, n_cs).

  In version 1.10, the default will change to `False` and `use_legacy_attributes`
  will be deprecated. In 1.12, `use_legacy_attributes` will be removed.
  By :user:`Christian Lorentzen `. :pr:`32114`

- |API| The `n_jobs` parameter of :class:`linear_model.LogisticRegression` is
  deprecated and will be removed in 1.10. It has no effect since 1.8.
  By :user:`Loïc Estève `. :pr:`32742`
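As a minimal sketch of the replacement suggested in the deprecation entry above
(the dataset and loss choice are illustrative, and exact hyperparameter
equivalence is not claimed here), the Passive-Aggressive update rules are
requested through `learning_rate`, while `eta0` plays the role of the
aggressiveness parameter C::

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=100, random_state=0)

    # Roughly corresponds to PassiveAggressiveClassifier(C=1.0); "pa2" would
    # select the PA-II variant instead.
    clf = SGDClassifier(loss="hinge", learning_rate="pa1", eta0=1.0).fit(X, y)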
:mod:`sklearn.manifold`
-----------------------

- |MajorFeature| :class:`manifold.ClassicalMDS` was implemented to perform
  classical MDS (an eigendecomposition of the double-centered distance matrix);
  see the sketch below.
  By :user:`Dmitry Kobak ` and :user:`Meekail Zain ` :pr:`31322`

- |Feature| :class:`manifold.MDS` now supports arbitrary distance metrics (via
  the `metric` and `metric_params` parameters) and initialization via classical
  MDS (via the `init` parameter). The `dissimilarity` parameter was deprecated.
  The old `metric` parameter was renamed to `metric_mds`.
  By :user:`Dmitry Kobak ` :pr:`32229`

- |Feature| :class:`manifold.TSNE` now supports PCA initialization with sparse
  input matrices.
  By :user:`Arturo Amor `. :pr:`32433`
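A minimal sketch of the new estimator follows; the `n_components` parameter and
passing a raw feature matrix mirror the existing :class:`manifold.MDS` API and
are assumptions here, and the data is made up for illustration::

    import numpy as np

    from sklearn.manifold import ClassicalMDS

    rng = np.random.RandomState(0)
    X = rng.randn(20, 5)

    # Classical MDS embeds the data via an eigendecomposition of the
    # double-centered distance matrix.
    embedding = ClassicalMDS(n_components=2).fit_transform(X)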
:mod:`sklearn.metrics`
----------------------

- |Feature| Added :func:`metrics.d2_brier_score`, which computes the D^2 score
  based on the Brier score (see the sketch at the end of this section).
  By :user:`Omar Salman `. :pr:`28971`

- |Feature| Added the :func:`metrics.confusion_matrix_at_thresholds` function,
  which returns the number of true negatives, false positives, false negatives
  and true positives per threshold.
  By :user:`Success Moses `. :pr:`30134`

- |Efficiency| Avoid redundant input validation in
  :func:`metrics.d2_log_loss_score`, leading to a 1.2x speedup in large scale
  benchmarks.
  By :user:`Olivier Grisel ` and :user:`Omar Salman ` :pr:`32356`

- |Enhancement| :func:`metrics.median_absolute_error` now supports Array API
  compatible inputs.
  By :user:`Lucy Liu `. :pr:`31406`

- |Enhancement| Improved the error message for sparse inputs for the following
  metrics: :func:`metrics.accuracy_score`,
  :func:`metrics.multilabel_confusion_matrix`, :func:`metrics.jaccard_score`,
  :func:`metrics.zero_one_loss`, :func:`metrics.f1_score`,
  :func:`metrics.fbeta_score`, :func:`metrics.precision_recall_fscore_support`,
  :func:`metrics.class_likelihood_ratios`, :func:`metrics.precision_score`,
  :func:`metrics.recall_score`, :func:`metrics.classification_report`,
  :func:`metrics.hamming_loss`.
  By :user:`Lucy Liu `. :pr:`32047`

- |Fix| :func:`metrics.median_absolute_error` now uses
  `_averaged_weighted_percentile` instead of `_weighted_percentile` to calculate
  the median when `sample_weight` is not `None`. This is equivalent to using the
  "averaged_inverted_cdf" instead of the "inverted_cdf" quantile method, which
  gives results equivalent to `numpy.median` if equal weights are used.
  By :user:`Lucy Liu ` :pr:`30787`

- |Fix| Additional `sample_weight` checking has been added to
  :func:`metrics.accuracy_score`, :func:`metrics.balanced_accuracy_score`,
  :func:`metrics.brier_score_loss`, :func:`metrics.class_likelihood_ratios`,
  :func:`metrics.classification_report`, :func:`metrics.cohen_kappa_score`,
  :func:`metrics.confusion_matrix`, :func:`metrics.f1_score`,
  :func:`metrics.fbeta_score`, :func:`metrics.hamming_loss`,
  :func:`metrics.jaccard_score`, :func:`metrics.matthews_corrcoef`,
  :func:`metrics.multilabel_confusion_matrix`,
  :func:`metrics.precision_recall_fscore_support`, :func:`metrics.precision_score`,
  :func:`metrics.recall_score` and :func:`metrics.zero_one_loss`.
  `sample_weight` can only be 1D, must be consistent with `y_true` and `y_pred`
  in length, and all values must be finite and not complex.
  By :user:`Lucy Liu `. :pr:`31701`

- |Fix| `y_pred` is deprecated in favour of `y_score` in
  :func:`metrics.DetCurveDisplay.from_predictions` and
  :func:`metrics.PrecisionRecallDisplay.from_predictions`. `y_pred` will be
  removed in v1.10.
  By :user:`Luis ` :pr:`31764`

- |Fix| `repr` on a scorer created with a `partial` `score_func` now works
  correctly and uses the `repr` of the given `partial` object.
  By `Adrin Jalali`_. :pr:`31891`

- |Fix| Kwargs specified in the `curve_kwargs` parameter of
  :meth:`metrics.RocCurveDisplay.from_cv_results` now only overwrite their
  corresponding default value before being passed to Matplotlib's `plot`.
  Previously, passing any `curve_kwargs` would overwrite all default kwargs.
  By :user:`Lucy Liu `. :pr:`32313`

- |Fix| Registered named scorer objects for :func:`metrics.d2_brier_score` and
  :func:`metrics.d2_log_loss_score` and updated their input validation to be
  consistent with related metric functions.
  By :user:`Olivier Grisel ` and :user:`Omar Salman ` :pr:`32356`

- |Fix| :meth:`metrics.RocCurveDisplay.from_cv_results` now infers `pos_label`
  as `estimator.classes_[-1]`, using the estimator from `cv_results`, when
  `pos_label=None`. Previously, an error was raised when `pos_label=None`.
  By :user:`Lucy Liu `. :pr:`32372`

- |Fix| All classification metrics now raise a `ValueError` when required input
  arrays (`y_pred`, `y_true`, `y1`, `y2`, `pred_decision`, or `y_proba`) are
  empty. Previously, `accuracy_score`, `class_likelihood_ratios`,
  `classification_report`, `confusion_matrix`, `hamming_loss`, `jaccard_score`,
  `matthews_corrcoef`, `multilabel_confusion_matrix`, and
  `precision_recall_fscore_support` did not raise this error consistently.
  By :user:`Stefanie Senger `. :pr:`32549`

- |API| :func:`metrics.cluster.entropy` is deprecated and will be removed in
  v1.10.
  By :user:`Lucy Liu ` :pr:`31294`

- |API| The `estimator_name` parameter is deprecated in favour of `name` in
  :class:`metrics.PrecisionRecallDisplay` and will be removed in 1.10.
  By :user:`Lucy Liu `. :pr:`32310`
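A minimal sketch of the new D^2 Brier score follows; the `(y_true, y_proba)` call
convention is assumed here by analogy with :func:`metrics.brier_score_loss`, and
the values are made up for illustration::

    from sklearn.metrics import d2_brier_score

    y_true = [0, 1, 1, 0, 1]
    y_proba = [0.1, 0.8, 0.7, 0.3, 0.9]

    # D^2 compares the Brier score of the predictions with that of a constant
    # baseline: 1.0 is a perfect score, 0.0 matches the baseline.
    score = d2_brier_score(y_true, y_proba)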
:mod:`sklearn.model_selection`
------------------------------

- |Enhancement| :class:`model_selection.StratifiedShuffleSplit` now specifies
  which classes have too few members when raising a ``ValueError`` because a
  class has fewer than 2 members. This is useful to identify which classes are
  causing the error.
  By :user:`Marc Bresson ` :pr:`32265`

- |Fix| Fixed the shuffle behaviour in :class:`model_selection.StratifiedGroupKFold`.
  Stratification among folds is now also preserved when `shuffle=True`.
  By :user:`Pau Folch `. :pr:`32540`

:mod:`sklearn.multiclass`
-------------------------

- |Fix| Fixed the tie-breaking behavior in :class:`multiclass.OneVsRestClassifier`
  to match the tie-breaking behavior of `np.argmax`.
  By :user:`Lakshmi Krishnan `. :pr:`15504`

:mod:`sklearn.naive_bayes`
--------------------------

- |Fix| :class:`naive_bayes.GaussianNB` preserves the dtype of the fitted
  attributes according to the dtype of `X`.
  By :user:`Omar Salman ` :pr:`32497`

:mod:`sklearn.preprocessing`
----------------------------

- |Enhancement| :class:`preprocessing.SplineTransformer` can now handle missing
  values with the parameter `handle_missing`.
  By :user:`Stefanie Senger `. :pr:`28043`

- |Enhancement| :class:`preprocessing.PowerTransformer` now raises a warning
  when NaN values are encountered in `inverse_transform`, typically caused by
  extremely skewed data.
  By :user:`Roberto Mourao ` :pr:`29307`

- |Enhancement| :class:`preprocessing.MaxAbsScaler` can now clip out-of-range
  values in held-out data with the parameter `clip`.
  By :user:`Hleb Levitski `. :pr:`31790`
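A minimal sketch of the new option follows; the boolean form `clip=True` is
assumed here by analogy with :class:`preprocessing.MinMaxScaler`, and the data is
made up for illustration::

    import numpy as np

    from sklearn.preprocessing import MaxAbsScaler

    X_train = np.array([[1.0], [-2.0], [4.0]])
    X_test = np.array([[8.0], [-6.0]])  # values outside the training range

    scaler = MaxAbsScaler(clip=True).fit(X_train)
    # With clipping enabled, transformed held-out values are limited to [-1, 1].
    X_test_scaled = scaler.transform(X_test)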
:mod:`sklearn.semi_supervised`
------------------------------

- |Fix| User-written kernel results are now normalized in
  :class:`semi_supervised.LabelPropagation` so that all row sums equal 1, even
  if the kernel gives asymmetric or non-uniform row sums.
  By :user:`Dan Schult `. :pr:`31924`

:mod:`sklearn.tree`
-------------------

- |Efficiency| :class:`tree.DecisionTreeRegressor` with
  `criterion="absolute_error"` now runs much faster: O(n log n) complexity
  instead of the previous O(n^2), allowing it to scale to millions of data
  points, even hundreds of millions.
  By :user:`Arthur Lacote ` :pr:`32100`

- |Fix| Made :func:`tree.export_text` thread-safe.
  By :user:`Olivier Grisel `. :pr:`30041`

- |Fix| :func:`~sklearn.tree.export_graphviz` now raises a `ValueError` if the
  given feature names are not all strings.
  By :user:`Guilherme Peixoto ` :pr:`31036`

- |Fix| :class:`tree.DecisionTreeRegressor` with `criterion="absolute_error"`
  would sometimes make sub-optimal splits (i.e. splits that do not minimize the
  absolute error). This is now fixed; hence retraining trees might give slightly
  different results.
  By :user:`Arthur Lacote ` :pr:`32100`

- |Fix| Fixed a regression in :ref:`decision trees ` where almost constant
  features were not handled properly.
  By :user:`Sercan Turkmen `. :pr:`32259`

- |Fix| Fixed the handling of missing values in the :func:`decision_path` method
  of trees (:class:`tree.DecisionTreeClassifier`, :class:`tree.DecisionTreeRegressor`,
  :class:`tree.ExtraTreeClassifier` and :class:`tree.ExtraTreeRegressor`).
  By :user:`Arthur Lacote `. :pr:`32280`

- |Fix| Fixed decision tree splitting with missing values present in some
  features. In some cases the last non-missing sample would not be partitioned
  correctly.
  By :user:`Tim Head ` and :user:`Arthur Lacote `. :pr:`32351`

:mod:`sklearn.utils`
--------------------

- |Efficiency| The function :func:`sklearn.utils.extmath.safe_sparse_dot` was
  improved by a dedicated Cython routine for the case of `a @ b` with sparse
  2-dimensional `a` and `b` when a dense output is required, i.e.,
  `dense_output=True`. This improves several algorithms in scikit-learn when
  dealing with sparse arrays (or matrices).
  By :user:`Christian Lorentzen `. :pr:`31952`

- |Enhancement| The parameter table in the HTML representation of all
  scikit-learn estimators, and more generally of estimators inheriting from
  :class:`base.BaseEstimator`, now displays the parameter description as a
  tooltip and has a link to the online documentation for each parameter.
  By :user:`Dea María Léon `. :pr:`31564`

- |Enhancement| ``sklearn.utils._check_sample_weight`` now raises a clearer
  error message when the provided weights are neither a scalar nor a 1-D
  array-like of the same size as the input data.
  By :user:`Kapil Parekh `. :pr:`31873`

- |Enhancement| :func:`sklearn.utils.estimator_checks.parametrize_with_checks`
  now lets you configure strict mode for xfailing checks. Tests that
  unexpectedly pass will lead to a test failure. The default behaviour is
  unchanged.
  By :user:`Tim Head `. :pr:`31951`

- |Enhancement| Fixed the alignment of the "?" and "i" symbols and improved the
  color style of the HTML representation of estimators.
  By :user:`Guillaume Lemaitre `. :pr:`31969`

- |Fix| Changed the way colors are chosen when displaying an estimator as an
  HTML representation. Colors are no longer adapted to the user's theme, but are
  chosen based on the color scheme (light or dark) declared by the theme for
  VSCode and JupyterLab. If the theme does not declare a color scheme, the
  scheme is chosen according to the default text color of the page and, failing
  that, falls back to a media query.
  By :user:`Matt J. `. :pr:`32330`

- |API| :func:`utils.extmath.stable_cumsum` is deprecated and will be removed in
  v1.10. Use `np.cumulative_sum` with the desired dtype directly instead.
  By :user:`Tiziano Zito `. :pr:`32258`

.. rubric:: Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of the
project since version 1.7, including:

TODO: update at the time of the release.