.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _changes_1_2:

Version 1.2.0
=============

**In Development**

.. include:: changelog_legend.inc

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- |Enhancement| The default `eigen_tol` for :class:`cluster.SpectralClustering`,
  :class:`manifold.SpectralEmbedding`, :func:`cluster.spectral_clustering`,
  and :func:`manifold.spectral_embedding` is now `None` when using the `'amg'`
  or `'lobpcg'` solvers. This change improves numerical stability of the
  solver, but may result in a different model.

- |Fix| :class:`manifold.TSNE` now raises a `ValueError` when fit with
  `perplexity>=n_samples` to ensure mathematical correctness of the algorithm.
  :pr:`10805` by :user:`Mathias Andersen ` and
  :pr:`23471` by :user:`Meekail Zain `.

Changes impacting all modules
-----------------------------

- |Enhancement| Finiteness checks (detection of NaN and infinite values) in all
  estimators are now significantly more efficient for float32 data by leveraging
  NumPy's SIMD-optimized primitives.
  :pr:`23446` by :user:`Meekail Zain `.

Changelog
---------

..
    Entries should be grouped by module (in alphabetic order) and prefixed with
    one of the labels: |MajorFeature|, |Feature|, |Efficiency|, |Enhancement|,
    |Fix| or |API| (see whats_new.rst for descriptions).
    Entries should be ordered by those labels (e.g. |Fix| after |Efficiency|).
    Changes not specific to a module should be listed under *Multiple Modules*
    or *Miscellaneous*.
    Entries should end with:
    :pr:`123456` by :user:`Joe Bloggs `.
    where 123456 is the *pull request* number, not the issue number.

:mod:`sklearn.calibration`
..........................

- |API| Rename `base_estimator` to `estimator` in
  :class:`calibration.CalibratedClassifierCV` to improve readability and
  consistency. The parameter `base_estimator` is deprecated and will be removed
  in 1.4.
  :pr:`22054` by :user:`Kevin Roice `.

:mod:`sklearn.cluster`
......................

- |Enhancement| The `predict` and `fit_predict` methods of
  :class:`cluster.OPTICS` now accept sparse input data.
  :pr:`14736` by :user:`Hunt Zhan `,
  :pr:`20802` by :user:`Brandon Pokorny `,
  and :pr:`22965` by :user:`Meekail Zain `.

- |Enhancement| :class:`cluster.Birch` now preserves dtype for `numpy.float32`
  inputs.
  :pr:`22968` by :user:`Meekail Zain `.

- |Enhancement| :class:`cluster.KMeans` and :class:`cluster.MiniBatchKMeans`
  now accept a new `'auto'` option for `n_init`, which reduces the number of
  random initializations to one when using `init='k-means++'`, for efficiency.
  This begins the deprecation of the current default values of `n_init` in both
  classes; both defaults will change to `n_init='auto'` in 1.4.
  :pr:`23038` by :user:`Meekail Zain `.

- |Enhancement| :class:`cluster.SpectralClustering` and
  :func:`cluster.spectral_clustering` now propagate the `eigen_tol` parameter
  to all choices of `eigen_solver`. This includes a new option
  `eigen_tol="auto"` and begins the deprecation that will change the default
  from `eigen_tol=0` to `eigen_tol="auto"` in version 1.3.
  :pr:`23210` by :user:`Meekail Zain `.

:mod:`sklearn.datasets`
.......................

- |Enhancement| Introduce the new parameter `parser` in
  :func:`datasets.fetch_openml`. `parser="pandas"` uses the CPU- and
  memory-efficient `pandas.read_csv` parser to load dense ARFF-formatted
  dataset files. Pass `parser="liac-arff"` to use the old LIAC parser.
  When `parser="auto"`, dense datasets are loaded with "pandas" and sparse
  datasets are loaded with "liac-arff".
  Currently, the default is `parser="liac-arff"`; it will change to
  `parser="auto"` in version 1.4.
  :pr:`21938` by :user:`Guillaume Lemaitre `.

- |Enhancement| :func:`datasets.dump_svmlight_file` is now accelerated with a
  Cython implementation, providing 2-4x speedups.
  :pr:`23127` by :user:`Meekail Zain `.

:mod:`sklearn.decomposition`
............................

- |Efficiency| :meth:`decomposition.FastICA.fit` has been optimized with
  respect to its memory footprint and runtime.
  :pr:`22268` by :user:`MohamedBsh `.

- |Enhancement| :class:`decomposition.FastICA` now allows the user to select
  how whitening is performed through the new `whiten_solver` parameter, which
  supports `svd` and `eigh`. `whiten_solver` defaults to `svd`, although `eigh`
  may be faster and more memory efficient in cases where
  `num_features > num_samples`. An additional `sign_flip` parameter is added.
  When `sign_flip=True`, the outputs of both solvers are reconciled during
  `fit` so that they match. This may change the output of the default solver,
  and hence may not be backwards compatible.
  :pr:`11860` by :user:`Pierre Ablin `,
  :pr:`22527` by :user:`Meekail Zain ` and `Thomas Fan`_.

:mod:`sklearn.ensemble`
.......................

- |Efficiency| Improve runtime performance of :class:`ensemble.IsolationForest`
  by avoiding data copies.
  :pr:`23252` by :user:`Zhehao Liu `.

:mod:`sklearn.impute`
.....................

- |Fix| :class:`impute.SimpleImputer` uses the dtype seen in `fit` for
  `transform` when the dtype is object.
  :pr:`22063` by `Thomas Fan`_.

:mod:`sklearn.linear_model`
...........................

- |Fix| Use dtype-aware tolerances for the validation of Gram matrices (passed
  by users or precomputed).
  :pr:`22059` by :user:`Malte S. Kurz `.

- |Fix| Fixed an error in :class:`linear_model.LogisticRegression` with
  `solver="newton-cg"`, `fit_intercept=True`, and a single feature.
  :pr:`23608` by `Tom Dupre la Tour`_.

- |API| The default value for the `solver` parameter in
  :class:`linear_model.QuantileRegressor` will change from `"interior-point"`
  to `"highs"` in version 1.4.
  :pr:`23637` by :user:`Guillaume Lemaitre `.

:mod:`sklearn.metrics`
......................

- |Feature| :func:`metrics.class_likelihood_ratios` is added to compute the
  positive and negative likelihood ratios derived from the confusion matrix
  of a binary classification problem.
  :pr:`22518` by :user:`Arturo Amor `.

- |Fix| :func:`metrics.ndcg_score` now triggers a warning when `y_true`
  contains a negative value. Users may still use negative values, but the
  result may not be between 0 and 1. Starting in v1.4, passing negative
  values for `y_true` will raise an error.
  :pr:`22710` by :user:`Conroy Trinh ` and
  :pr:`23461` by :user:`Meekail Zain `.

- |Fix| Fixed the error message of :func:`metrics.coverage_error` for 1D array
  input.
  :pr:`23548` by :user:`Hao Chun Chang `.

:mod:`sklearn.neighbors`
........................

- |Enhancement| The `bandwidth` parameter of :class:`neighbors.KernelDensity`
  can now be defined using Scott's and Silverman's estimation methods.
  :pr:`10468` by :user:`Ruben ` and
  :pr:`22993` by :user:`Jovan Stojanovic `.

- |Feature| Adds new function :func:`neighbors.sort_graph_by_row_values` to
  sort a CSR sparse graph such that each row is stored with increasing values.
  This improves efficiency when using precomputed sparse distance matrices in
  a variety of estimators and avoids an `EfficiencyWarning`.
  :pr:`23139` by `Tom Dupre la Tour`_.

:mod:`sklearn.svm`
..................

- |API| The `class_weight_` attribute is now deprecated for
  :class:`svm.NuSVR`, :class:`svm.SVR`, and :class:`svm.OneClassSVM`.
  :pr:`22898` by :user:`Meekail Zain `.

:mod:`sklearn.tree`
...................

- |Enhancement| :func:`tree.plot_tree` and :func:`tree.export_graphviz` now
  use a lower case `x[i]` to represent feature `i`.
  :pr:`23480` by `Thomas Fan`_.

- |Fix| Fixed an invalid memory access bug during fit in
  :class:`tree.DecisionTreeRegressor` and :class:`tree.DecisionTreeClassifier`.
  :pr:`23273` by `Thomas Fan`_.

:mod:`sklearn.utils`
....................

- |Enhancement| :func:`utils.extmath.randomized_svd` now accepts an argument,
  `svd_lapack_driver`, to specify the LAPACK driver used in the internal
  deterministic SVD used by the randomized SVD algorithm.
  :pr:`20617` by :user:`Srinath Kailasa `.

:mod:`sklearn.manifold`
.......................

- |Enhancement| Adds `eigen_tol` parameter to
  :class:`manifold.SpectralEmbedding`. Both :func:`manifold.spectral_embedding`
  and :class:`manifold.SpectralEmbedding` now propagate `eigen_tol` to all
  choices of `eigen_solver`. This includes a new option `eigen_tol="auto"` and
  begins the deprecation that will change the default from `eigen_tol=0` to
  `eigen_tol="auto"` in version 1.3.
  :pr:`23210` by :user:`Meekail Zain `.

- |Fix| :class:`manifold.TSNE` now raises a `ValueError` when fit with
  `perplexity>=n_samples` to ensure mathematical correctness of the algorithm.
  :pr:`10805` by :user:`Mathias Andersen ` and
  :pr:`23471` by :user:`Meekail Zain `.

Code and Documentation Contributors
-----------------------------------

Thanks to everyone who has contributed to the maintenance and improvement of
the project since version 1.1, including:

TODO: update at the time of the release.