Version 1.1.0

In Development

Legend for changelogs

  • Major Feature : something big that you couldn’t do before.

  • Feature : something that you couldn’t do before.

  • Efficiency : an existing feature now may not require as much computation or memory.

  • Enhancement : a miscellaneous minor improvement.

  • Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.

  • API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

Minimal dependencies

Version 1.1.0 of scikit-learn requires Python 3.7+, numpy 1.14.6+ and scipy 1.1.0+. The optional minimal dependency is matplotlib 2.2.3+.

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

  • Efficiency cluster.KMeans now defaults to algorithm="lloyd" instead of algorithm="auto", which was equivalent to algorithm="elkan". Lloyd’s algorithm and Elkan’s algorithm converge to the same solution, up to numerical rounding errors, but in general Lloyd’s algorithm uses much less memory, and it is often faster.

  • Fix The eigenvector initialization for cluster.SpectralClustering and manifold.SpectralEmbedding now samples from a Gaussian distribution when using the 'amg' or 'lobpcg' solver. This change improves the numerical stability of the solver, but may result in a different model.

  • Fix feature_selection.f_regression and feature_selection.r_regression now return finite scores by default instead of np.nan and np.inf in some corner cases. Use force_finite=False if you need the non-finite values of the previous behavior.

Changelog

  • Enhancement All scikit-learn models now generate a more informative error message when some input contains unexpected NaN or infinite values. In particular the message contains the input name (“X”, “y” or “sample_weight”) and if an unexpected NaN value is found in X, the error message suggests potential solutions. #21219 by Olivier Grisel.

  • Enhancement All scikit-learn models now generate a more informative error message when setting invalid hyper-parameters with set_params. #21542 by Olivier Grisel.

sklearn.calibration

sklearn.cluster

  • Enhancement cluster.SpectralClustering and cluster.spectral_clustering now include the new 'cluster_qr' method from cluster.cluster_qr (selected with assign_labels='cluster_qr'), which clusters samples in the embedding space as an alternative to the existing 'kmeans' and 'discrete' methods. See cluster.spectral_clustering for more details. #21148 by Andrew Knyazev.

  • Efficiency In cluster.KMeans, the default algorithm is now "lloyd", the full classical EM-style algorithm. Both "auto" and "full" are deprecated and will be removed in version 1.3; they are now aliases for "lloyd". The previous default was "auto", which relied on Elkan’s algorithm. Lloyd’s algorithm uses less memory than Elkan’s, is faster on many datasets, and converges to the same results up to numerical rounding errors, hence the change. #21735 by Aurélien Geron.

  • Enhancement cluster.SpectralClustering now raises consistent error messages when passed invalid values for n_clusters, n_init, gamma, n_neighbors, eigen_tol or degree. #21881 by Hugo Vassard.

sklearn.cross_decomposition

sklearn.datasets

sklearn.decomposition

sklearn.discriminant_analysis

sklearn.ensemble

sklearn.feature_extraction

sklearn.feature_extraction.text

sklearn.feature_selection

  • Feature Added an 'auto' mode to feature_selection.SequentialFeatureSelector. If n_features_to_select is 'auto', features are selected until the score improvement does not exceed tol. The default value of n_features_to_select changed from None to 'warn' in 1.1 and will become 'auto' in 1.3. None and 'warn' will be removed in 1.3. #20145 by @murata-yu.

sklearn.gaussian_process

sklearn.impute

sklearn.linear_model

sklearn.metrics

  • Feature metrics.r2_score and metrics.explained_variance_score have a new force_finite parameter. Setting this parameter to False returns the actual non-finite score when y_true is constant, instead of the finite approximation returned by default (1.0 for perfect predictions, 0.0 otherwise). #17266 by Sylvain Marié.

  • API Change metrics.DistanceMetric has been moved from sklearn.neighbors to sklearn.metrics. Importing it as neighbors.DistanceMetric is still valid for backward compatibility, but this alias will be removed in 1.3. #21177 by Julien Jerphanion.

  • API Change Parameters sample_weight and multioutput of metrics.mean_absolute_percentage_error are now keyword-only, in accordance with SLEP009. A deprecation cycle was introduced. #21576 by Paul-Emile Dugnat.

  • API Change The "wminkowski" metric of sklearn.metrics.DistanceMetric is deprecated and will be removed in version 1.3. Instead, the existing "minkowski" metric now takes an optional w parameter for weights. This deprecation keeps scikit-learn consistent with the SciPy 1.8 convention. #21873 by Yar Khine Phyo.

  • Fix metrics.silhouette_score now supports integer input for precomputed distances. #22108 by Thomas Fan.

sklearn.manifold

sklearn.model_selection

sklearn.mixture

sklearn.neighbors

sklearn.neural_network

sklearn.pipeline

sklearn.preprocessing

  • Enhancement Added a subsample parameter to preprocessing.KBinsDiscretizer. It sets the maximum number of samples used to fit the model. The option is only available when strategy is set to 'quantile'. #21445 by Felipe Bidu and Amanda Dsouza.

  • Enhancement Added the get_feature_names_out method and a new parameter feature_names_out to preprocessing.FunctionTransformer. You can set feature_names_out to 'one-to-one' to use the input feature names as the output feature names, or set it to a callable that returns the output feature names. This is especially useful when the transformer changes the number of features. If feature_names_out is None (the default), then get_feature_names_out is not defined. #21569 by Aurélien Geron.

  • Fix preprocessing.LabelBinarizer now validates input parameters in fit instead of __init__. #21434 by Krum Arnaudov.

sklearn.random_projection

sklearn.svm

sklearn.utils

Code and Documentation Contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.0, including:

TODO: update at the time of the release.