API Reference

This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the class and function raw specifications may not be enough to give full guidelines on their uses. For reference on concepts repeated across the API, see Glossary of Common Terms and API Elements.

sklearn: Settings and information tools

The sklearn module includes functions to configure global settings and get information about the working environment.

config_context(*[, assume_finite, ...])

Context manager for global scikit-learn configuration.


get_config()

Retrieve current values for configuration set by set_config.

set_config([assume_finite, working_memory, ...])

Set global scikit-learn configuration.


show_versions()

Print useful debugging information.
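A minimal sketch of how these configuration helpers interact (illustrative only, assuming a default scikit-learn installation):

```python
from sklearn import config_context, get_config, set_config

# config_context applies a setting only inside the with-block.
with config_context(assume_finite=True):
    inside = get_config()["assume_finite"]   # True inside the block

# Outside the block the previous (default) value is restored.
outside = get_config()["assume_finite"]

# set_config changes the setting globally until changed again.
set_config(assume_finite=True)
now_global = get_config()["assume_finite"]
set_config(assume_finite=False)              # restore the default
```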

sklearn.base: Base classes and utility functions

Base classes for all estimators.

Base classes


base.BaseEstimator

Base class for all estimators in scikit-learn.

base.BiclusterMixin

Mixin class for all bicluster estimators in scikit-learn.

base.ClassifierMixin

Mixin class for all classifiers in scikit-learn.

base.ClusterMixin

Mixin class for all cluster estimators in scikit-learn.

base.DensityMixin

Mixin class for all density estimators in scikit-learn.

base.RegressorMixin

Mixin class for all regression estimators in scikit-learn.

base.TransformerMixin

Mixin class for all transformers in scikit-learn.

base.MetaEstimatorMixin

Mixin class for all meta estimators in scikit-learn.

base.OneToOneFeatureMixin

Provides get_feature_names_out for simple transformers.

base.OutlierMixin

Mixin class for all outlier detection estimators in scikit-learn.

base.ClassNamePrefixFeaturesOutMixin

Mixin class for transformers that generate their own names by prefixing.

feature_selection.SelectorMixin

Transformer mixin that performs feature selection given a support mask.


base.clone(estimator, *[, safe])

Construct a new unfitted estimator with the same parameters.


base.is_classifier(estimator)

Return True if the given estimator is (probably) a classifier.


base.is_regressor(estimator)

Return True if the given estimator is (probably) a regressor.
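A quick illustrative sketch of clone and the estimator-type helpers (not part of the reference tables):

```python
from sklearn.base import clone, is_classifier, is_regressor
from sklearn.linear_model import LogisticRegression

est = LogisticRegression(C=0.5)
# clone builds a new, unfitted estimator with the same hyperparameters.
copy = clone(est)
```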

sklearn.calibration: Probability Calibration

Calibration of predicted probabilities.

User guide: See the Probability calibration section for further details.


calibration.CalibratedClassifierCV([estimator, ...])

Probability calibration with isotonic regression or logistic regression.

calibration.calibration_curve(y_true, y_prob, *)

Compute true and predicted probabilities for a calibration curve.
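A minimal sketch of calibration_curve on hand-made probabilities (illustrative only):

```python
import numpy as np
from sklearn.calibration import calibration_curve

y_true = np.array([0, 0, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

# One (prob_pred, prob_true) pair per bin; a perfectly calibrated model
# would fall on the diagonal prob_true == prob_pred.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=2)
```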

sklearn.cluster: Clustering

The sklearn.cluster module gathers popular unsupervised clustering algorithms.

User guide: See the Clustering and Biclustering sections for further details.


cluster.AffinityPropagation(*[, damping, ...])

Perform Affinity Propagation Clustering of data.


cluster.AgglomerativeClustering([n_clusters, ...])

Agglomerative Clustering.

cluster.Birch(*[, threshold, ...])

Implements the BIRCH clustering algorithm.

cluster.DBSCAN([eps, min_samples, metric, ...])

Perform DBSCAN clustering from vector array or distance matrix.

cluster.HDBSCAN([min_cluster_size, ...])

Cluster data using hierarchical density-based clustering.

cluster.FeatureAgglomeration([n_clusters, ...])

Agglomerate features.

cluster.KMeans([n_clusters, init, n_init, ...])

K-Means clustering.

cluster.BisectingKMeans([n_clusters, init, ...])

Bisecting K-Means clustering.

cluster.MiniBatchKMeans([n_clusters, init, ...])

Mini-Batch K-Means clustering.

cluster.MeanShift(*[, bandwidth, seeds, ...])

Mean shift clustering using a flat kernel.

cluster.OPTICS(*[, min_samples, max_eps, ...])

Estimate clustering structure from vector array.

cluster.SpectralClustering([n_clusters, ...])

Apply clustering to a projection of the normalized Laplacian.

cluster.SpectralBiclustering([n_clusters, ...])

Spectral biclustering (Kluger, 2003).

cluster.SpectralCoclustering([n_clusters, ...])

Spectral Co-Clustering algorithm (Dhillon, 2001).


cluster.affinity_propagation(S, *[, ...])

Perform Affinity Propagation Clustering of data.

cluster.cluster_optics_dbscan(*, ...)

Perform DBSCAN extraction for an arbitrary epsilon.

cluster.cluster_optics_xi(*, reachability, ...)

Automatically extract clusters according to the Xi-steep method.

cluster.compute_optics_graph(X, *, ...)

Compute the OPTICS reachability graph.

cluster.dbscan(X[, eps, min_samples, ...])

Perform DBSCAN clustering from vector array or distance matrix.

cluster.estimate_bandwidth(X, *[, quantile, ...])

Estimate the bandwidth to use with the mean-shift algorithm.

cluster.k_means(X, n_clusters, *[, ...])

Perform K-means clustering algorithm.

cluster.kmeans_plusplus(X, n_clusters, *[, ...])

Init n_clusters seeds according to k-means++.

cluster.mean_shift(X, *[, bandwidth, seeds, ...])

Perform mean shift clustering of data using a flat kernel.

cluster.spectral_clustering(affinity, *[, ...])

Apply clustering to a projection of the normalized Laplacian.

cluster.ward_tree(X, *[, connectivity, ...])

Ward clustering based on a feature matrix.
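As a quick illustration of the common estimator pattern in this module, a minimal KMeans sketch (synthetic toy data):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```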

sklearn.compose: Composite Estimators

Meta-estimators for building composite models with transformers.

In addition to its current contents, this module will eventually be home to refurbished versions of Pipeline and FeatureUnion.

User guide: See the Pipelines and composite estimators section for further details.

compose.ColumnTransformer(transformers, *[, ...])

Applies transformers to columns of an array or pandas DataFrame.


compose.TransformedTargetRegressor([regressor, ...])

Meta-estimator to regress on a transformed target.


compose.make_column_transformer(*transformers)

Construct a ColumnTransformer from the given transformers.

compose.make_column_selector([pattern, ...])

Create a callable to select columns to be used with ColumnTransformer.
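A minimal sketch of ColumnTransformer applying a transformer to a subset of columns (toy data, illustrative only):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
ct = ColumnTransformer(
    [("scale", StandardScaler(), [0])],  # standardize column 0 only
    remainder="passthrough",             # keep remaining columns as-is
)
Xt = ct.fit_transform(X)
```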

sklearn.covariance: Covariance Estimators

The sklearn.covariance module includes methods and algorithms to robustly estimate the covariance of features given a set of points. The precision matrix defined as the inverse of the covariance is also estimated. Covariance estimation is closely related to the theory of Gaussian Graphical Models.

User guide: See the Covariance estimation section for further details.

covariance.EmpiricalCovariance(*[, ...])

Maximum likelihood covariance estimator.

covariance.EllipticEnvelope(*[, ...])

An object for detecting outliers in a Gaussian distributed dataset.

covariance.GraphicalLasso([alpha, mode, ...])

Sparse inverse covariance estimation with an l1-penalized estimator.

covariance.GraphicalLassoCV(*[, alphas, ...])

Sparse inverse covariance w/ cross-validated choice of the l1 penalty.

covariance.LedoitWolf(*[, store_precision, ...])

LedoitWolf Estimator.

covariance.MinCovDet(*[, store_precision, ...])

Minimum Covariance Determinant (MCD): robust estimator of covariance.

covariance.OAS(*[, store_precision, ...])

Oracle Approximating Shrinkage Estimator.

covariance.ShrunkCovariance(*[, ...])

Covariance estimator with shrinkage.

covariance.empirical_covariance(X, *[, ...])

Compute the Maximum likelihood covariance estimator.

covariance.graphical_lasso(emp_cov, alpha, *)

L1-penalized covariance estimator.

covariance.ledoit_wolf(X, *[, ...])

Estimate the shrunk Ledoit-Wolf covariance matrix.

covariance.ledoit_wolf_shrinkage(X[, ...])

Estimate the shrunk Ledoit-Wolf covariance matrix.

covariance.oas(X, *[, assume_centered])

Estimate covariance with the Oracle Approximating Shrinkage.

covariance.shrunk_covariance(emp_cov[, ...])

Calculate covariance matrices shrunk on the diagonal.
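These estimators share the usual fit API; a minimal sketch comparing the empirical and Ledoit-Wolf estimates on random data:

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf

rng = np.random.RandomState(0)
X = rng.randn(500, 3)

emp = EmpiricalCovariance().fit(X)   # maximum-likelihood estimate
lw = LedoitWolf().fit(X)             # shrunk estimate
```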

sklearn.cross_decomposition: Cross decomposition

User guide: See the Cross decomposition section for further details.

cross_decomposition.CCA([n_components, ...])

Canonical Correlation Analysis, also known as "Mode B" PLS.


cross_decomposition.PLSCanonical([n_components, ...])

Partial Least Squares transformer and regressor.


cross_decomposition.PLSRegression([n_components, ...])

PLS regression.

cross_decomposition.PLSSVD([n_components, ...])

Partial Least Square SVD.

sklearn.datasets: Datasets

The sklearn.datasets module includes utilities to load datasets, including methods to load and fetch popular reference datasets. It also features some artificial data generators.

User guide: See the Dataset loading utilities section for further details.



datasets.clear_data_home([data_home])

Delete all the content of the data home cache.

datasets.dump_svmlight_file(X, y, f, *[, ...])

Dump the dataset in svmlight / libsvm file format.

datasets.fetch_20newsgroups(*[, data_home, ...])

Load the filenames and data from the 20 newsgroups dataset (classification).

datasets.fetch_20newsgroups_vectorized(*[, ...])

Load and vectorize the 20 newsgroups dataset (classification).

datasets.fetch_california_housing(*[, ...])

Load the California housing dataset (regression).

datasets.fetch_covtype(*[, data_home, ...])

Load the covertype dataset (classification).

datasets.fetch_kddcup99(*[, subset, ...])

Load the kddcup99 dataset (classification).

datasets.fetch_lfw_pairs(*[, subset, ...])

Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).

datasets.fetch_lfw_people(*[, data_home, ...])

Load the Labeled Faces in the Wild (LFW) people dataset (classification).

datasets.fetch_olivetti_faces(*[, ...])

Load the Olivetti faces data-set from AT&T (classification).

datasets.fetch_openml([name, version, ...])

Fetch dataset from openml by name or dataset id.

datasets.fetch_rcv1(*[, data_home, subset, ...])

Load the RCV1 multilabel dataset (classification).

datasets.fetch_species_distributions(*[, ...])

Loader for species distribution dataset from Phillips et al. (2006).


datasets.get_data_home([data_home])

Return the path of the scikit-learn data directory.

datasets.load_breast_cancer(*[, return_X_y, ...])

Load and return the breast cancer wisconsin dataset (classification).

datasets.load_diabetes(*[, return_X_y, ...])

Load and return the diabetes dataset (regression).

datasets.load_digits(*[, n_class, ...])

Load and return the digits dataset (classification).

datasets.load_files(container_path, *[, ...])

Load text files with categories as subfolder names.

datasets.load_iris(*[, return_X_y, as_frame])

Load and return the iris dataset (classification).

datasets.load_linnerud(*[, return_X_y, as_frame])

Load and return the physical exercise Linnerud dataset.


datasets.load_sample_image(image_name)

Load the numpy array of a single sample image.


datasets.load_sample_images()

Load sample images for image manipulation.

datasets.load_svmlight_file(f, *[, ...])

Load datasets in the svmlight / libsvm format into sparse CSR matrix.

datasets.load_svmlight_files(files, *[, ...])

Load dataset from multiple files in SVMlight format.

datasets.load_wine(*[, return_X_y, as_frame])

Load and return the wine dataset (classification).
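The loaders above share a common interface; a minimal sketch with the iris dataset:

```python
from sklearn.datasets import load_iris

# return_X_y=True returns the (data, target) pair directly
# instead of a Bunch object.
X, y = load_iris(return_X_y=True)
```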

Samples generator

datasets.make_biclusters(shape, n_clusters, *)

Generate a constant block diagonal structure array for biclustering.

datasets.make_blobs([n_samples, n_features, ...])

Generate isotropic Gaussian blobs for clustering.

datasets.make_checkerboard(shape, n_clusters, *)

Generate an array with block checkerboard structure for biclustering.

datasets.make_circles([n_samples, shuffle, ...])

Make a large circle containing a smaller circle in 2d.

datasets.make_classification([n_samples, ...])

Generate a random n-class classification problem.

datasets.make_friedman1([n_samples, ...])

Generate the "Friedman #1" regression problem.

datasets.make_friedman2([n_samples, noise, ...])

Generate the "Friedman #2" regression problem.

datasets.make_friedman3([n_samples, noise, ...])

Generate the "Friedman #3" regression problem.

datasets.make_gaussian_quantiles(*[, mean, ...])

Generate isotropic Gaussian and label samples by quantile.

datasets.make_hastie_10_2([n_samples, ...])

Generate data for binary classification used in Hastie et al. 2009, Example 10.2.

datasets.make_low_rank_matrix([n_samples, ...])

Generate a mostly low rank matrix with bell-shaped singular values.

datasets.make_moons([n_samples, shuffle, ...])

Make two interleaving half circles.


datasets.make_multilabel_classification([n_samples, ...])

Generate a random multilabel classification problem.

datasets.make_regression([n_samples, ...])

Generate a random regression problem.

datasets.make_s_curve([n_samples, noise, ...])

Generate an S curve dataset.

datasets.make_sparse_coded_signal(n_samples, ...)

Generate a signal as a sparse combination of dictionary elements.

datasets.make_sparse_spd_matrix([n_dim, ...])

Generate a sparse symmetric definite positive matrix.


datasets.make_sparse_uncorrelated([n_samples, ...])

Generate a random regression problem with sparse uncorrelated design.

datasets.make_spd_matrix(n_dim, *[, ...])

Generate a random symmetric, positive-definite matrix.

datasets.make_swiss_roll([n_samples, noise, ...])

Generate a swiss roll dataset.
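A minimal sketch of the generator interface shared by these functions:

```python
from sklearn.datasets import make_blobs, make_classification

# Reproducible synthetic data via random_state.
X1, y1 = make_blobs(n_samples=60, centers=3, random_state=0)
X2, y2 = make_classification(n_samples=100, n_features=5, random_state=0)
```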

sklearn.decomposition: Matrix Decomposition

The sklearn.decomposition module includes matrix decomposition algorithms, including among others PCA, NMF or ICA. Most of the algorithms of this module can be regarded as dimensionality reduction techniques.

User guide: See the Decomposing signals in components (matrix factorization problems) section for further details.


decomposition.DictionaryLearning([n_components, ...])

Dictionary learning.

decomposition.FactorAnalysis([n_components, ...])

Factor Analysis (FA).

decomposition.FastICA([n_components, ...])

FastICA: a fast algorithm for Independent Component Analysis.

decomposition.IncrementalPCA([n_components, ...])

Incremental principal components analysis (IPCA).

decomposition.KernelPCA([n_components, ...])

Kernel Principal component analysis (KPCA) [R396fc7d924b8-1].


decomposition.LatentDirichletAllocation([n_components, ...])

Latent Dirichlet Allocation with online variational Bayes algorithm.


decomposition.MiniBatchDictionaryLearning([n_components, ...])

Mini-batch dictionary learning.


decomposition.MiniBatchSparsePCA([n_components, ...])

Mini-batch Sparse Principal Components Analysis.

decomposition.NMF([n_components, init, ...])

Non-Negative Matrix Factorization (NMF).

decomposition.MiniBatchNMF([n_components, ...])

Mini-Batch Non-Negative Matrix Factorization (NMF).

decomposition.PCA([n_components, copy, ...])

Principal component analysis (PCA).

decomposition.SparsePCA([n_components, ...])

Sparse Principal Components Analysis (SparsePCA).

decomposition.SparseCoder(dictionary, *[, ...])

Sparse coding.

decomposition.TruncatedSVD([n_components, ...])

Dimensionality reduction using truncated SVD (aka LSA).

decomposition.dict_learning(X, n_components, ...)

Solve a dictionary learning matrix factorization problem.

decomposition.dict_learning_online(X[, ...])

Solve a dictionary learning matrix factorization problem online.

decomposition.fastica(X[, n_components, ...])

Perform Fast Independent Component Analysis.


decomposition.non_negative_factorization(X[, ...])

Compute Non-negative Matrix Factorization (NMF).

decomposition.sparse_encode(X, dictionary, *)

Sparse coding.
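All of these decompositions follow the fit/transform pattern; a minimal PCA sketch on random data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
pca = PCA(n_components=2).fit(X)
Xt = pca.transform(X)   # project onto the two leading components
```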

sklearn.discriminant_analysis: Discriminant Analysis

Linear Discriminant Analysis and Quadratic Discriminant Analysis.

User guide: See the Linear and Quadratic Discriminant Analysis section for further details.


discriminant_analysis.LinearDiscriminantAnalysis([solver, ...])

Linear Discriminant Analysis.

discriminant_analysis.QuadraticDiscriminantAnalysis(*[, ...])

Quadratic Discriminant Analysis.

sklearn.dummy: Dummy estimators

User guide: See the Metrics and scoring: quantifying the quality of predictions section for further details.

dummy.DummyClassifier(*[, strategy, ...])

DummyClassifier makes predictions that ignore the input features.

dummy.DummyRegressor(*[, strategy, ...])

Regressor that makes predictions using simple rules.
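Dummy estimators give the baseline any real model should beat; a minimal sketch:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((6, 1))                  # features are ignored entirely
y = np.array([0, 0, 0, 0, 1, 1])
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = clf.predict(X)                 # always predicts the majority class
```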

sklearn.ensemble: Ensemble Methods

The sklearn.ensemble module includes ensemble-based methods for classification, regression and anomaly detection.

User guide: See the Ensembles: Gradient boosting, random forests, bagging, voting, stacking section for further details.

ensemble.AdaBoostClassifier([estimator, ...])

An AdaBoost classifier.

ensemble.AdaBoostRegressor([estimator, ...])

An AdaBoost regressor.

ensemble.BaggingClassifier([estimator, ...])

A Bagging classifier.

ensemble.BaggingRegressor([estimator, ...])

A Bagging regressor.


ensemble.ExtraTreesClassifier([n_estimators, ...])

An extra-trees classifier.

ensemble.ExtraTreesRegressor([n_estimators, ...])

An extra-trees regressor.

ensemble.GradientBoostingClassifier(*[, ...])

Gradient Boosting for classification.

ensemble.GradientBoostingRegressor(*[, ...])

Gradient Boosting for regression.

ensemble.IsolationForest(*[, n_estimators, ...])

Isolation Forest Algorithm.


ensemble.RandomForestClassifier([n_estimators, ...])

A random forest classifier.


ensemble.RandomForestRegressor([n_estimators, ...])

A random forest regressor.


ensemble.RandomTreesEmbedding([n_estimators, ...])

An ensemble of totally random trees.

ensemble.StackingClassifier(estimators[, ...])

Stack of estimators with a final classifier.

ensemble.StackingRegressor(estimators[, ...])

Stack of estimators with a final regressor.

ensemble.VotingClassifier(estimators, *[, ...])

Soft Voting/Majority Rule classifier for unfitted estimators.

ensemble.VotingRegressor(estimators, *[, ...])

Prediction voting regressor for unfitted estimators.


ensemble.HistGradientBoostingRegressor([loss, ...])

Histogram-based Gradient Boosting Regression Tree.


ensemble.HistGradientBoostingClassifier([loss, ...])

Histogram-based Gradient Boosting Classification Tree.
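A minimal sketch of fitting one of these ensembles on synthetic data (illustrative, not a tuning recipe):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
train_acc = clf.score(X, y)   # training accuracy, not a generalization estimate
```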

sklearn.exceptions: Exceptions and warnings

The sklearn.exceptions module includes all custom warnings and error classes used across scikit-learn.


exceptions.ConvergenceWarning

Custom warning to capture convergence problems.

exceptions.DataConversionWarning

Warning used to notify implicit data conversions happening in the code.

exceptions.DataDimensionalityWarning

Custom warning to notify potential issues with data dimensionality.

exceptions.EfficiencyWarning

Warning used to notify the user of inefficient computation.

exceptions.FitFailedWarning

Warning class used if there is an error while fitting the estimator.

exceptions.InconsistentVersionWarning(*, ...)

Warning raised when an estimator is unpickled with an inconsistent version.

exceptions.NotFittedError

Exception class to raise if estimator is used before fitting.

exceptions.UndefinedMetricWarning

Warning used when the metric is invalid.

sklearn.experimental: Experimental

The sklearn.experimental module provides importable modules that enable the use of experimental features or estimators.

The features and estimators that are experimental aren't subject to deprecation cycles. Use them at your own risk!


experimental.enable_iterative_imputer

Enables IterativeImputer.

experimental.enable_halving_search_cv

Enables Successive Halving search-estimators.

sklearn.feature_extraction: Feature Extraction

The sklearn.feature_extraction module deals with feature extraction from raw data. It currently includes methods to extract features from text and images.

User guide: See the Feature extraction section for further details.

feature_extraction.DictVectorizer(*[, ...])

Transforms lists of feature-value mappings to vectors.


feature_extraction.FeatureHasher([n_features, ...])

Implements feature hashing, aka the hashing trick.

From images

The sklearn.feature_extraction.image submodule gathers utilities to extract features from images.


feature_extraction.image.extract_patches_2d(image, ...)

Reshape a 2D image into a collection of patches.

feature_extraction.image.grid_to_graph(n_x, n_y)

Graph of the pixel-to-pixel connections.

feature_extraction.image.img_to_graph(img, *)

Graph of the pixel-to-pixel gradient connections.


feature_extraction.image.reconstruct_from_patches_2d(patches, image_size)

Reconstruct the image from all of its patches.

feature_extraction.image.PatchExtractor(*[, ...])

Extracts patches from a collection of images.

From text

The sklearn.feature_extraction.text submodule gathers utilities to build feature vectors from text documents.

feature_extraction.text.CountVectorizer(*[, ...])

Convert a collection of text documents to a matrix of token counts.


feature_extraction.text.HashingVectorizer(*[, ...])

Convert a collection of text documents to a matrix of token occurrences.


feature_extraction.text.TfidfTransformer(*[, ...])

Transform a count matrix to a normalized tf or tf-idf representation.

feature_extraction.text.TfidfVectorizer(*[, ...])

Convert a collection of raw documents to a matrix of TF-IDF features.

sklearn.feature_selection: Feature Selection

The sklearn.feature_selection module implements feature selection algorithms. It currently includes univariate filter selection methods and the recursive feature elimination algorithm.

User guide: See the Feature selection section for further details.


feature_selection.GenericUnivariateSelect([...])

Univariate feature selector with configurable strategy.

feature_selection.SelectPercentile([...])

Select features according to a percentile of the highest scores.

feature_selection.SelectKBest([score_func, k])

Select features according to the k highest scores.

feature_selection.SelectFpr([score_func, alpha])

Filter: Select the pvalues below alpha based on a FPR test.

feature_selection.SelectFdr([score_func, alpha])

Filter: Select the p-values for an estimated false discovery rate.

feature_selection.SelectFromModel(estimator, *)

Meta-transformer for selecting features based on importance weights.

feature_selection.SelectFwe([score_func, alpha])

Filter: Select the p-values corresponding to Family-wise error rate.


feature_selection.SequentialFeatureSelector(estimator, *[, ...])

Transformer that performs Sequential Feature Selection.

feature_selection.RFE(estimator, *[, ...])

Feature ranking with recursive feature elimination.

feature_selection.RFECV(estimator, *[, ...])

Recursive feature elimination with cross-validation to select features.


feature_selection.VarianceThreshold([threshold])

Feature selector that removes all low-variance features.

feature_selection.chi2(X, y)

Compute chi-squared stats between each non-negative feature and class.

feature_selection.f_classif(X, y)

Compute the ANOVA F-value for the provided sample.

feature_selection.f_regression(X, y, *[, ...])

Univariate linear regression tests returning F-statistic and p-values.

feature_selection.r_regression(X, y, *[, ...])

Compute Pearson's r for each feature and the target.

feature_selection.mutual_info_classif(X, y, *)

Estimate mutual information for a discrete target variable.

feature_selection.mutual_info_regression(X, y, *)

Estimate mutual information for a continuous target variable.
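A minimal sketch of univariate selection with SelectKBest, where by construction only the first feature is informative:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.RandomState(0)
X = rng.randn(100, 4)
y = (X[:, 0] > 0).astype(int)   # only feature 0 carries signal
sel = SelectKBest(f_classif, k=1).fit(X, y)
picked = sel.get_support()      # boolean mask over the 4 features
```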

sklearn.gaussian_process: Gaussian Processes

The sklearn.gaussian_process module implements Gaussian Process based regression and classification.

User guide: See the Gaussian Processes section for further details.


gaussian_process.GaussianProcessClassifier([kernel, ...])

Gaussian process classification (GPC) based on Laplace approximation.

gaussian_process.GaussianProcessRegressor([kernel, ...])

Gaussian process regression (GPR).


The sklearn.gaussian_process.kernels module implements a set of kernels that can be combined by operators and used in Gaussian processes.


gaussian_process.kernels.CompoundKernel(kernels)

Kernel which is composed of a set of other kernels.

gaussian_process.kernels.ConstantKernel([...])

Constant kernel.

gaussian_process.kernels.DotProduct([...])

Dot-Product kernel.

gaussian_process.kernels.ExpSineSquared([...])

Exp-Sine-Squared kernel (aka periodic kernel).

gaussian_process.kernels.Exponentiation(kernel, exponent)

The Exponentiation kernel takes one base kernel and a scalar parameter \(p\) and combines them via \(k_{exp}(X, Y) = k(X, Y)^p\).

gaussian_process.kernels.Hyperparameter(name, ...)

A kernel hyperparameter's specification in form of a namedtuple.

gaussian_process.kernels.Kernel

Base class for all kernels.

gaussian_process.kernels.Matern([...])

Matern kernel.

gaussian_process.kernels.PairwiseKernel([...])

Wrapper for kernels in sklearn.metrics.pairwise.

gaussian_process.kernels.Product(k1, k2)

The Product kernel takes two kernels \(k_1\) and \(k_2\) and combines them via \(k_{prod}(X, Y) = k_1(X, Y) \cdot k_2(X, Y)\).

gaussian_process.kernels.RBF([length_scale, ...])

Radial basis function kernel (aka squared-exponential kernel).

gaussian_process.kernels.RationalQuadratic([...])

Rational Quadratic kernel.

gaussian_process.kernels.Sum(k1, k2)

The Sum kernel takes two kernels \(k_1\) and \(k_2\) and combines them via \(k_{sum}(X, Y) = k_1(X, Y) + k_2(X, Y)\).

gaussian_process.kernels.WhiteKernel([...])

White kernel.
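A minimal sketch of combining kernels with the + operator and fitting a Gaussian process regressor (toy sine data, illustrative hyperparameters):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()

# Sum kernel: smooth RBF component plus a small white-noise term.
kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-5)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)
mean, std = gpr.predict(X, return_std=True)
```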

sklearn.impute: Impute

Transformers for missing value imputation.

User guide: See the Imputation of missing values section for further details.

impute.SimpleImputer(*[, missing_values, ...])

Univariate imputer for completing missing values with simple strategies.

impute.IterativeImputer([estimator, ...])

Multivariate imputer that estimates each feature from all the others.

impute.MissingIndicator(*[, missing_values, ...])

Binary indicators for missing values.

impute.KNNImputer(*[, missing_values, ...])

Imputation for completing missing values using k-Nearest Neighbors.
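A minimal SimpleImputer sketch filling NaNs with the column mean:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])
imp = SimpleImputer(strategy="mean")   # fill NaNs with the column mean
Xt = imp.fit_transform(X)              # column means: 2.0 and 5.0
```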

sklearn.inspection: Inspection

The sklearn.inspection module includes tools for model inspection.

inspection.partial_dependence(estimator, X, ...)

Partial dependence of features.

inspection.permutation_importance(estimator, ...)

Permutation importance for feature evaluation.


inspection.DecisionBoundaryDisplay(*, xx0, ...)

Decision boundary visualization.

inspection.PartialDependenceDisplay(...[, ...])

Partial Dependence Plot (PDP).
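A minimal permutation_importance sketch where, by construction, only the first feature matters:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = 5 * X[:, 0] + 0.1 * rng.randn(200)   # only feature 0 matters
model = LinearRegression().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
```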

sklearn.isotonic: Isotonic regression

User guide: See the Isotonic regression section for further details.

isotonic.IsotonicRegression(*[, y_min, ...])

Isotonic regression model.

isotonic.check_increasing(x, y)

Determine whether y is monotonically correlated with x.

isotonic.isotonic_regression(y, *[, ...])

Solve the isotonic regression model.
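A minimal sketch: isotonic regression replaces a noisy sequence with the closest non-decreasing fit.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

x = np.arange(6)
y = np.array([1.0, 2.0, 1.5, 4.0, 3.5, 5.0])   # noisy but increasing trend
y_fit = IsotonicRegression().fit_transform(x, y)
```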

sklearn.kernel_approximation: Kernel Approximation

The sklearn.kernel_approximation module implements several approximate kernel feature maps based on Fourier transforms and Count Sketches.

User guide: See the Kernel Approximation section for further details.


Approximate feature map for additive chi2 kernel.

kernel_approximation.Nystroem([kernel, ...])

Approximate a kernel map using a subset of the training data.


Polynomial kernel approximation via Tensor Sketch.

kernel_approximation.RBFSampler(*[, gamma, ...])

Approximate a RBF kernel feature map using random Fourier features.

kernel_approximation.SkewedChi2Sampler(*[, ...])

Approximate feature map for "skewed chi-squared" kernel.

sklearn.kernel_ridge: Kernel Ridge Regression

Module sklearn.kernel_ridge implements kernel ridge regression.

User guide: See the Kernel ridge regression section for further details.

kernel_ridge.KernelRidge([alpha, kernel, ...])

Kernel ridge regression.

sklearn.linear_model: Linear Models

The sklearn.linear_model module implements a variety of linear models.

User guide: See the Linear Models section for further details.

The following subsections are only rough guidelines: the same estimator can fall into multiple categories, depending on its parameters.

Linear classifiers

linear_model.LogisticRegression([penalty, ...])

Logistic Regression (aka logit, MaxEnt) classifier.

linear_model.LogisticRegressionCV(*[, Cs, ...])

Logistic Regression CV (aka logit, MaxEnt) classifier.


linear_model.PassiveAggressiveClassifier(*[, C, ...])

Passive Aggressive Classifier.

linear_model.Perceptron(*[, penalty, alpha, ...])

Linear perceptron classifier.

linear_model.RidgeClassifier([alpha, ...])

Classifier using Ridge regression.

linear_model.RidgeClassifierCV([alphas, ...])

Ridge classifier with built-in cross-validation.

linear_model.SGDClassifier([loss, penalty, ...])

Linear classifiers (SVM, logistic regression, etc.) with SGD training.

linear_model.SGDOneClassSVM([nu, ...])

Solves linear One-Class SVM using Stochastic Gradient Descent.

Classical linear regressors

linear_model.LinearRegression(*[, ...])

Ordinary least squares Linear Regression.

linear_model.Ridge([alpha, fit_intercept, ...])

Linear least squares with l2 regularization.

linear_model.RidgeCV([alphas, ...])

Ridge regression with built-in cross-validation.

linear_model.SGDRegressor([loss, penalty, ...])

Linear model fitted by minimizing a regularized empirical loss with SGD.
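A minimal sketch of the shared regressor API, on data that is exactly linear:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])       # exactly y = 2x + 1
reg = LinearRegression().fit(X, y)       # recovers slope 2 and intercept 1
```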

Regressors with variable selection

The following estimators have built-in variable selection fitting procedures, but any estimator using an L1 or elastic-net penalty also performs variable selection: typically SGDRegressor or SGDClassifier with an appropriate penalty.

linear_model.ElasticNet([alpha, l1_ratio, ...])

Linear regression with combined L1 and L2 priors as regularizer.

linear_model.ElasticNetCV(*[, l1_ratio, ...])

Elastic Net model with iterative fitting along a regularization path.

linear_model.Lars(*[, fit_intercept, ...])

Least Angle Regression model a.k.a. LAR.

linear_model.LarsCV(*[, fit_intercept, ...])

Cross-validated Least Angle Regression model.

linear_model.Lasso([alpha, fit_intercept, ...])

Linear Model trained with L1 prior as regularizer (aka the Lasso).

linear_model.LassoCV(*[, eps, n_alphas, ...])

Lasso linear model with iterative fitting along a regularization path.

linear_model.LassoLars([alpha, ...])

Lasso model fit with Least Angle Regression a.k.a. Lars.

linear_model.LassoLarsCV(*[, fit_intercept, ...])

Cross-validated Lasso, using the LARS algorithm.

linear_model.LassoLarsIC([criterion, ...])

Lasso model fit with Lars using BIC or AIC for model selection.

linear_model.OrthogonalMatchingPursuit(*[, ...])

Orthogonal Matching Pursuit model (OMP).


linear_model.OrthogonalMatchingPursuitCV(*[, ...])

Cross-validated Orthogonal Matching Pursuit model (OMP).

Bayesian regressors

linear_model.ARDRegression(*[, max_iter, ...])

Bayesian ARD regression.

linear_model.BayesianRidge(*[, max_iter, ...])

Bayesian ridge regression.

Multi-task linear regressors with variable selection

These estimators fit multiple regression problems (or tasks) jointly, while inducing sparse coefficients. While the inferred coefficients may differ between the tasks, they are constrained to agree on the features that are selected (non-zero coefficients).

linear_model.MultiTaskElasticNet([alpha, ...])

Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer.

linear_model.MultiTaskElasticNetCV(*[, ...])

Multi-task L1/L2 ElasticNet with built-in cross-validation.

linear_model.MultiTaskLasso([alpha, ...])

Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.

linear_model.MultiTaskLassoCV(*[, eps, ...])

Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.

Outlier-robust regressors

Any estimator using the Huber loss would also be robust to outliers, e.g. SGDRegressor with loss='huber'.

linear_model.HuberRegressor(*[, epsilon, ...])

L2-regularized linear regression model that is robust to outliers.

linear_model.QuantileRegressor(*[, ...])

Linear regression model that predicts conditional quantiles.

linear_model.RANSACRegressor([estimator, ...])

RANSAC (RANdom SAmple Consensus) algorithm.

linear_model.TheilSenRegressor(*[, ...])

Theil-Sen Estimator: robust multivariate regression model.

Generalized linear models (GLM) for regression

These models allow for response variables to have error distributions other than a normal distribution:

linear_model.PoissonRegressor(*[, alpha, ...])

Generalized Linear Model with a Poisson distribution.

linear_model.TweedieRegressor(*[, power, ...])

Generalized Linear Model with a Tweedie distribution.

linear_model.GammaRegressor(*[, alpha, ...])

Generalized Linear Model with a Gamma distribution.


linear_model.PassiveAggressiveRegressor(*[, ...])

Passive Aggressive Regressor.

linear_model.enet_path(X, y, *[, l1_ratio, ...])

Compute elastic net path with coordinate descent.

linear_model.lars_path(X, y[, Xy, Gram, ...])

Compute Least Angle Regression or Lasso path using the LARS algorithm.

linear_model.lars_path_gram(Xy, Gram, *, ...)

The lars_path in the sufficient stats mode.

linear_model.lasso_path(X, y, *[, eps, ...])

Compute Lasso path with coordinate descent.

linear_model.orthogonal_mp(X, y, *[, ...])

Orthogonal Matching Pursuit (OMP).

linear_model.orthogonal_mp_gram(Gram, Xy, *)

Gram Orthogonal Matching Pursuit (OMP).

linear_model.ridge_regression(X, y, alpha, *)

Solve the ridge equation by the method of normal equations.

sklearn.manifold: Manifold Learning

The sklearn.manifold module implements data embedding techniques.

User guide: See the Manifold learning section for further details.

manifold.Isomap(*[, n_neighbors, radius, ...])

Isomap Embedding.

manifold.LocallyLinearEmbedding(*[, ...])

Locally Linear Embedding.

manifold.MDS([n_components, metric, n_init, ...])

Multidimensional scaling.

manifold.SpectralEmbedding([n_components, ...])

Spectral embedding for non-linear dimensionality reduction.

manifold.TSNE([n_components, perplexity, ...])

T-distributed Stochastic Neighbor Embedding.

manifold.locally_linear_embedding(X, *, ...)

Perform a Locally Linear Embedding analysis on the data.

manifold.smacof(dissimilarities, *[, ...])

Compute multidimensional scaling using the SMACOF algorithm.

manifold.spectral_embedding(adjacency, *[, ...])

Project the sample on the first eigenvectors of the graph Laplacian.

manifold.trustworthiness(X, X_embedded, *[, ...])

Indicate to what extent the local structure is retained.
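A minimal sketch of the shared embedding API, using Isomap on random data (illustrative dimensions only):

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.RandomState(0)
X = rng.randn(30, 5)
emb = Isomap(n_neighbors=5, n_components=2).fit_transform(X)
```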

sklearn.metrics: Metrics

See the Metrics and scoring: quantifying the quality of predictions section and the Pairwise metrics, Affinities and Kernels section of the user guide for further details.

The sklearn.metrics module includes score functions, performance metrics, pairwise metrics, and distance computations.

Model Selection Interface

See the The scoring parameter: defining model evaluation rules section of the user guide for further details.

metrics.check_scoring(estimator[, scoring, ...])

Determine scorer from user options.


metrics.get_scorer(scoring)

Get a scorer from string.

metrics.get_scorer_names()

Get the names of all available scorers.

metrics.make_scorer(score_func, *[, ...])

Make a scorer from a performance metric or loss function.
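A minimal sketch of retrieving a named scorer and applying it to a fitted estimator:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
scorer = get_scorer("accuracy")   # callable with signature scorer(est, X, y)
acc = scorer(clf, X, y)
```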

Classification metrics

See the Classification metrics section of the user guide for further details.

metrics.accuracy_score(y_true, y_pred, *[, ...])

Accuracy classification score.

metrics.auc(x, y)

Compute Area Under the Curve (AUC) using the trapezoidal rule.

metrics.average_precision_score(y_true, ...)

Compute average precision (AP) from prediction scores.

metrics.balanced_accuracy_score(y_true, ...)

Compute the balanced accuracy.

metrics.brier_score_loss(y_true, y_prob, *)

Compute the Brier score loss.

metrics.class_likelihood_ratios(y_true, ...)

Compute binary classification positive and negative likelihood ratios.

metrics.classification_report(y_true, y_pred, *)

Build a text report showing the main classification metrics.

metrics.cohen_kappa_score(y1, y2, *[, ...])

Compute Cohen's kappa: a statistic that measures inter-annotator agreement.

metrics.confusion_matrix(y_true, y_pred, *)

Compute confusion matrix to evaluate the accuracy of a classification.

metrics.dcg_score(y_true, y_score, *[, k, ...])

Compute Discounted Cumulative Gain.

metrics.det_curve(y_true, y_score[, ...])

Compute error rates for different probability thresholds.

metrics.f1_score(y_true, y_pred, *[, ...])

Compute the F1 score, also known as balanced F-score or F-measure.

metrics.fbeta_score(y_true, y_pred, *, beta)

Compute the F-beta score.

metrics.hamming_loss(y_true, y_pred, *[, ...])

Compute the average Hamming loss.

metrics.hinge_loss(y_true, pred_decision, *)

Average hinge loss (non-regularized).

metrics.jaccard_score(y_true, y_pred, *[, ...])

Jaccard similarity coefficient score.

metrics.log_loss(y_true, y_pred, *[, eps, ...])

Log loss, aka logistic loss or cross-entropy loss.

metrics.matthews_corrcoef(y_true, y_pred, *)

Compute the Matthews correlation coefficient (MCC).

metrics.multilabel_confusion_matrix(y_true, ...)

Compute a confusion matrix for each class or sample.

metrics.ndcg_score(y_true, y_score, *[, k, ...])

Compute Normalized Discounted Cumulative Gain.

metrics.precision_recall_curve(y_true, ...)

Compute precision-recall pairs for different probability thresholds.

metrics.precision_recall_fscore_support(...)

Compute precision, recall, F-measure and support for each class.

metrics.precision_score(y_true, y_pred, *[, ...])

Compute the precision.

metrics.recall_score(y_true, y_pred, *[, ...])

Compute the recall.

metrics.roc_auc_score(y_true, y_score, *[, ...])

Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

metrics.roc_curve(y_true, y_score, *[, ...])

Compute Receiver operating characteristic (ROC).

metrics.top_k_accuracy_score(y_true, y_score, *)

Top-k Accuracy classification score.

metrics.zero_one_loss(y_true, y_pred, *[, ...])

Zero-one classification loss.

Regression metrics

See the Regression metrics section of the user guide for further details.

metrics.explained_variance_score(y_true, ...)

Explained variance regression score function.

metrics.max_error(y_true, y_pred)

The max_error metric calculates the maximum residual error.

metrics.mean_absolute_error(y_true, y_pred, *)

Mean absolute error regression loss.

metrics.mean_squared_error(y_true, y_pred, *)

Mean squared error regression loss.

metrics.mean_squared_log_error(y_true, y_pred, *)

Mean squared logarithmic error regression loss.

metrics.median_absolute_error(y_true, y_pred, *)

Median absolute error regression loss.

metrics.mean_absolute_percentage_error(...)

Mean absolute percentage error (MAPE) regression loss.

metrics.r2_score(y_true, y_pred, *[, ...])

\(R^2\) (coefficient of determination) regression score function.

metrics.root_mean_squared_log_error(y_true, ...)

Root mean squared logarithmic error regression loss.

metrics.root_mean_squared_error(y_true, ...)

Root mean squared error regression loss.

metrics.mean_poisson_deviance(y_true, y_pred, *)

Mean Poisson deviance regression loss.

metrics.mean_gamma_deviance(y_true, y_pred, *)

Mean Gamma deviance regression loss.

metrics.mean_tweedie_deviance(y_true, y_pred, *)

Mean Tweedie deviance regression loss.

metrics.d2_tweedie_score(y_true, y_pred, *)

\(D^2\) regression score function, fraction of Tweedie deviance explained.

metrics.mean_pinball_loss(y_true, y_pred, *)

Pinball loss for quantile regression.

metrics.d2_pinball_score(y_true, y_pred, *)

\(D^2\) regression score function, fraction of pinball loss explained.

metrics.d2_absolute_error_score(y_true, ...)

\(D^2\) regression score function, fraction of absolute error explained.

Multilabel ranking metrics

See the Multilabel ranking metrics section of the user guide for further details.

metrics.coverage_error(y_true, y_score, *[, ...])

Coverage error measure.

metrics.label_ranking_average_precision_score(...)

Compute ranking-based average precision.

metrics.label_ranking_loss(y_true, y_score, *)

Compute Ranking loss measure.

Clustering metrics

See the Clustering performance evaluation section of the user guide for further details.

The sklearn.metrics.cluster submodule contains evaluation metrics for cluster analysis results. There are two forms of evaluation:

  • supervised, which uses ground truth class values for each sample.

  • unsupervised, which does not and measures the ‘quality’ of the model itself.
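The distinction can be illustrated with a small sketch on toy data (the points and labels below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score

labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]  # the same partition under permuted label names

# Supervised: compared against ground truth, invariant to label permutation,
# so an identical partition scores a perfect 1.0.
ari = adjusted_rand_score(labels_true, labels_pred)

# Unsupervised: needs only the data and the candidate labels.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
sil = silhouette_score(X, labels_pred)
print(ari, sil)
```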

metrics.adjusted_mutual_info_score(...[, ...])

Adjusted Mutual Information between two clusterings.

metrics.adjusted_rand_score(labels_true, ...)

Rand index adjusted for chance.

metrics.calinski_harabasz_score(X, labels)

Compute the Calinski and Harabasz score.

metrics.davies_bouldin_score(X, labels)

Compute the Davies-Bouldin score.

metrics.completeness_score(labels_true, ...)

Compute completeness metric of a cluster labeling given a ground truth.

metrics.cluster.contingency_matrix(...[, ...])

Build a contingency matrix describing the relationship between labels.

metrics.cluster.pair_confusion_matrix(...)

Pair confusion matrix arising from two clusterings.

metrics.fowlkes_mallows_score(labels_true, ...)

Measure the similarity of two clusterings of a set of points.

metrics.homogeneity_completeness_v_measure(...)

Compute the homogeneity and completeness and V-Measure scores at once.

metrics.homogeneity_score(labels_true, ...)

Homogeneity metric of a cluster labeling given a ground truth.

metrics.mutual_info_score(labels_true, ...)

Mutual Information between two clusterings.

metrics.normalized_mutual_info_score(...[, ...])

Normalized Mutual Information between two clusterings.

metrics.rand_score(labels_true, labels_pred)

Rand index.

metrics.silhouette_score(X, labels, *[, ...])

Compute the mean Silhouette Coefficient of all samples.

metrics.silhouette_samples(X, labels, *[, ...])

Compute the Silhouette Coefficient for each sample.

metrics.v_measure_score(labels_true, ...[, beta])

V-measure cluster labeling given a ground truth.

Biclustering metrics

See the Biclustering evaluation section of the user guide for further details.

metrics.consensus_score(a, b, *[, similarity])

The similarity of two sets of biclusters.

Distance metrics

metrics.DistanceMetric

Uniform interface for fast distance metric functions.

Pairwise metrics

See the Pairwise metrics, Affinities and Kernels section of the user guide for further details.

metrics.pairwise.additive_chi2_kernel(X[, Y])

Compute the additive chi-squared kernel between observations in X and Y.

metrics.pairwise.chi2_kernel(X[, Y, gamma])

Compute the exponential chi-squared kernel between X and Y.

metrics.pairwise.cosine_similarity(X[, Y, ...])

Compute cosine similarity between samples in X and Y.

metrics.pairwise.cosine_distances(X[, Y])

Compute cosine distance between samples in X and Y.

metrics.pairwise.distance_metrics()

Valid metrics for pairwise_distances.

metrics.pairwise.euclidean_distances(X[, Y, ...])

Compute the distance matrix between each pair from a vector array X and Y.

metrics.pairwise.haversine_distances(X[, Y])

Compute the Haversine distance between samples in X and Y.

metrics.pairwise.kernel_metrics()

Valid metrics for pairwise_kernels.

metrics.pairwise.laplacian_kernel(X[, Y, gamma])

Compute the laplacian kernel between X and Y.

metrics.pairwise.linear_kernel(X[, Y, ...])

Compute the linear kernel between X and Y.

metrics.pairwise.manhattan_distances(X[, Y])

Compute the L1 distances between the vectors in X and Y.

metrics.pairwise.nan_euclidean_distances(X[, Y, ...])

Calculate the euclidean distances in the presence of missing values.

metrics.pairwise.pairwise_kernels(X[, Y, ...])

Compute the kernel between arrays X and optional array Y.

metrics.pairwise.polynomial_kernel(X[, Y, ...])

Compute the polynomial kernel between X and Y.

metrics.pairwise.rbf_kernel(X[, Y, gamma])

Compute the rbf (gaussian) kernel between X and Y.

metrics.pairwise.sigmoid_kernel(X[, Y, ...])

Compute the sigmoid kernel between X and Y.

metrics.pairwise.paired_euclidean_distances(X, Y)

Compute the paired euclidean distances between X and Y.

metrics.pairwise.paired_manhattan_distances(X, Y)

Compute the paired L1 distances between X and Y.

metrics.pairwise.paired_cosine_distances(X, Y)

Compute the paired cosine distances between X and Y.

metrics.pairwise.paired_distances(X, Y, *[, ...])

Compute the paired distances between X and Y.

metrics.pairwise_distances(X[, Y, metric, ...])

Compute the distance matrix from a vector array X and optional Y.

metrics.pairwise_distances_argmin(X, Y, *[, ...])

Compute minimum distances between one point and a set of points.

metrics.pairwise_distances_argmin_min(X, Y, *)

Compute minimum distances between one point and a set of points.

metrics.pairwise_distances_chunked(X[, Y, ...])

Generate a distance matrix chunk by chunk with optional reduction.

Plotting

See the Visualizations section of the user guide for further details.

metrics.ConfusionMatrixDisplay(...[, ...])

Confusion Matrix visualization.

metrics.DetCurveDisplay(*, fpr, fnr[, ...])

DET curve visualization.

metrics.PrecisionRecallDisplay(precision, ...)

Precision Recall visualization.

metrics.PredictionErrorDisplay(*, y_true, y_pred)

Visualization of the prediction error of a regression model.

metrics.RocCurveDisplay(*, fpr, tpr[, ...])

ROC Curve visualization.

calibration.CalibrationDisplay(prob_true, ...)

Calibration curve (also known as reliability diagram) visualization.

sklearn.mixture: Gaussian Mixture Models

The sklearn.mixture module implements mixture modeling algorithms.

User guide: See the Gaussian mixture models section for further details.

mixture.BayesianGaussianMixture(*[, ...])

Variational Bayesian estimation of a Gaussian mixture.

mixture.GaussianMixture([n_components, ...])

Gaussian Mixture.

sklearn.model_selection: Model Selection

User guide: See the Cross-validation: evaluating estimator performance, Tuning the hyper-parameters of an estimator and Learning curve sections for further details.

Splitter Classes

model_selection.GroupKFold([n_splits])

K-fold iterator variant with non-overlapping groups.

model_selection.GroupShuffleSplit([n_splits, ...])

Shuffle-Group(s)-Out cross-validation iterator.

model_selection.KFold([n_splits, shuffle, ...])

K-Fold cross-validator.

model_selection.LeaveOneGroupOut()

Leave One Group Out cross-validator.

model_selection.LeavePGroupsOut(n_groups)

Leave P Group(s) Out cross-validator.

model_selection.LeaveOneOut()

Leave-One-Out cross-validator.

model_selection.LeavePOut(p)

Leave-P-Out cross-validator.

model_selection.PredefinedSplit(test_fold)

Predefined split cross-validator.

model_selection.RepeatedKFold(*[, n_splits, ...])

Repeated K-Fold cross validator.

model_selection.RepeatedStratifiedKFold(*[, ...])

Repeated Stratified K-Fold cross validator.

model_selection.ShuffleSplit([n_splits, ...])

Random permutation cross-validator.

model_selection.StratifiedKFold([n_splits, ...])

Stratified K-Fold cross-validator.

model_selection.StratifiedShuffleSplit([n_splits, ...])

Stratified ShuffleSplit cross-validator.

model_selection.StratifiedGroupKFold([n_splits, ...])

Stratified K-Fold iterator variant with non-overlapping groups.

model_selection.TimeSeriesSplit([n_splits, ...])

Time Series cross-validator.

Splitter Functions

model_selection.check_cv([cv, y, classifier])

Input checker utility for building a cross-validator.

model_selection.train_test_split(*arrays[, ...])

Split arrays or matrices into random train and test subsets.

Hyper-parameter optimizers

model_selection.GridSearchCV(estimator, ...)

Exhaustive search over specified parameter values for an estimator.

model_selection.HalvingGridSearchCV(...[, ...])

Search over specified parameter values with successive halving.

model_selection.ParameterGrid(param_grid)

Grid of parameters with a discrete number of values for each.

model_selection.ParameterSampler(...[, ...])

Generator on parameters sampled from given distributions.

model_selection.RandomizedSearchCV(...[, ...])

Randomized search on hyper parameters.

model_selection.HalvingRandomSearchCV(...[, ...])

Randomized search on hyper parameters.

Model validation

model_selection.cross_validate(estimator, X)

Evaluate metric(s) by cross-validation and also record fit/score times.

model_selection.cross_val_predict(estimator, X)

Generate cross-validated estimates for each input data point.

model_selection.cross_val_score(estimator, X)

Evaluate a score by cross-validation.

model_selection.learning_curve(estimator, X, ...)

Learning curve.

model_selection.permutation_test_score(...)

Evaluate the significance of a cross-validated score with permutations.

model_selection.validation_curve(estimator, ...)

Validation curve.

Visualization
model_selection.LearningCurveDisplay(*, ...)

Learning Curve visualization.

model_selection.ValidationCurveDisplay(*, ...)

Validation Curve visualization.

sklearn.multiclass: Multiclass classification

Multiclass classification strategies

This module implements multiclass learning algorithms:
  • one-vs-the-rest / one-vs-all

  • one-vs-one

  • error correcting output codes

The estimators provided in this module are meta-estimators: they require a base estimator to be provided in their constructor. For example, it is possible to use these estimators to turn a binary classifier or a regressor into a multiclass classifier. It is also possible to use these estimators with multiclass estimators in the hope that their accuracy or runtime performance improves.

All classifiers in scikit-learn implement multiclass classification; you only need to use this module if you want to experiment with custom multiclass strategies.

The one-vs-the-rest meta-classifier also implements a predict_proba method, so long as such a method is implemented by the base classifier. This method returns probabilities of class membership in both the single label and multilabel case. Note that in the multilabel case, probabilities are the marginal probability that a given sample falls in the given class. As such, in the multilabel case the sum of these probabilities over all possible labels for a given sample will not sum to unity, as they do in the single label case.

User guide: See the Multiclass classification section for further details.
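A minimal multilabel sketch (synthetic data, with a logistic regression base estimator chosen for illustration) showing that per-row `predict_proba` outputs are marginal probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Multilabel targets as a binary indicator matrix: one column per label.
X = np.array([[0.0], [1.0], [2.0], [3.0]] * 5)
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 1]] * 5)

# One binary classifier is fitted per label column.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

proba = clf.predict_proba(X[:2])
# Each entry is the marginal probability of that label; in the multilabel
# case the rows need not sum to 1.
print(proba.sum(axis=1))
```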

multiclass.OneVsRestClassifier(estimator, *)

One-vs-the-rest (OvR) multiclass strategy.

multiclass.OneVsOneClassifier(estimator, *)

One-vs-one multiclass strategy.

multiclass.OutputCodeClassifier(estimator, *)

(Error-Correcting) Output-Code multiclass strategy.

sklearn.multioutput: Multioutput regression and classification

This module implements multioutput regression and classification.

The estimators provided in this module are meta-estimators: they require a base estimator to be provided in their constructor. The meta-estimator extends single output estimators to multioutput estimators.

User guide: See the Multilabel classification, Multiclass-multioutput classification, and Multioutput regression sections for further details.
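A minimal sketch (synthetic data, with Ridge as an arbitrary single-output base estimator) of extending a single-output regressor to two targets:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
Y = X @ rng.rand(3, 2)  # two regression targets

# One independent Ridge model is fitted per output column.
est = MultiOutputRegressor(Ridge()).fit(X, Y)
print(est.predict(X[:2]).shape)  # one prediction per sample and per target
```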

multioutput.ClassifierChain(base_estimator, *)

A multi-label model that arranges binary classifiers into a chain.

multioutput.MultiOutputRegressor(estimator, *)

Multi target regression.

multioutput.MultiOutputClassifier(estimator, *)

Multi target classification.

multioutput.RegressorChain(base_estimator, *)

A multi-label model that arranges regressions into a chain.

sklearn.naive_bayes: Naive Bayes

The sklearn.naive_bayes module implements Naive Bayes algorithms. These are supervised learning methods based on applying Bayes’ theorem with strong (naive) feature independence assumptions.

User guide: See the Naive Bayes section for further details.
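A quick sketch on the iris dataset (chosen here only as a convenient built-in example):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Each feature is modeled as an independent Gaussian per class.
clf = GaussianNB().fit(X, y)
print(clf.predict(X[:1]))
```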

naive_bayes.BernoulliNB(*[, alpha, ...])

Naive Bayes classifier for multivariate Bernoulli models.

naive_bayes.CategoricalNB(*[, alpha, ...])

Naive Bayes classifier for categorical features.

naive_bayes.ComplementNB(*[, alpha, ...])

The Complement Naive Bayes classifier described in Rennie et al. (2003).

naive_bayes.GaussianNB(*[, priors, ...])

Gaussian Naive Bayes (GaussianNB).

naive_bayes.MultinomialNB(*[, alpha, ...])

Naive Bayes classifier for multinomial models.

sklearn.neighbors: Nearest Neighbors

The sklearn.neighbors module implements the k-nearest neighbors algorithm.

User guide: See the Nearest Neighbors section for further details.

neighbors.BallTree(X[, leaf_size, metric])

BallTree for fast generalized N-point problems

neighbors.KDTree(X[, leaf_size, metric])

KDTree for fast generalized N-point problems

neighbors.KernelDensity(*[, bandwidth, ...])

Kernel Density Estimation.

neighbors.KNeighborsClassifier([n_neighbors, ...])

Classifier implementing the k-nearest neighbors vote.

neighbors.KNeighborsRegressor([n_neighbors, ...])

Regression based on k-nearest neighbors.

neighbors.KNeighborsTransformer(*[, mode, ...])

Transform X into a (weighted) graph of k nearest neighbors.

neighbors.LocalOutlierFactor([n_neighbors, ...])

Unsupervised Outlier Detection using the Local Outlier Factor (LOF).

neighbors.RadiusNeighborsClassifier([radius, ...])

Classifier implementing a vote among neighbors within a given radius.

neighbors.RadiusNeighborsRegressor([radius, ...])

Regression based on neighbors within a fixed radius.

neighbors.RadiusNeighborsTransformer(*[, ...])

Transform X into a (weighted) graph of neighbors nearer than a radius.

neighbors.NearestCentroid([metric, ...])

Nearest centroid classifier.

neighbors.NearestNeighbors(*[, n_neighbors, ...])

Unsupervised learner for implementing neighbor searches.

neighbors.NeighborhoodComponentsAnalysis([...])

Neighborhood Components Analysis.

neighbors.kneighbors_graph(X, n_neighbors, *)

Compute the (weighted) graph of k-Neighbors for points in X.

neighbors.radius_neighbors_graph(X, radius, *)

Compute the (weighted) graph of Neighbors for points in X.

neighbors.sort_graph_by_row_values(graph[, ...])

Sort a sparse graph such that each row is stored with increasing values.

sklearn.neural_network: Neural network models

The sklearn.neural_network module includes models based on neural networks.

User guide: See the Neural network models (supervised) and Neural network models (unsupervised) sections for further details.

neural_network.BernoulliRBM([n_components, ...])

Bernoulli Restricted Boltzmann Machine (RBM).

neural_network.MLPClassifier([hidden_layer_sizes, ...])

Multi-layer Perceptron classifier.

neural_network.MLPRegressor([hidden_layer_sizes, ...])

Multi-layer Perceptron regressor.

sklearn.pipeline: Pipeline

The sklearn.pipeline module implements utilities to build a composite estimator, as a chain of transforms and estimators.

User guide: See the Pipelines and composite estimators section for further details.
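A minimal sketch chaining a scaler and a classifier on synthetic data (the particular steps are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, random_state=0)

# fit/predict on the pipeline run the whole chain: the scaler is fitted and
# applied before the classifier sees the data.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)
print(pipe.score(X, y))
```

`make_pipeline` names each step after its lowercased class name, so the fitted scaler is reachable as `pipe.named_steps["standardscaler"]`.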

pipeline.FeatureUnion(transformer_list, *[, ...])

Concatenates results of multiple transformer objects.

pipeline.Pipeline(steps, *[, memory, verbose])

A sequence of data transformers with an optional final predictor.

pipeline.make_pipeline(*steps[, memory, verbose])

Construct a Pipeline from the given estimators.

pipeline.make_union(*transformers[, n_jobs, ...])

Construct a FeatureUnion from the given transformers.

sklearn.preprocessing: Preprocessing and Normalization

The sklearn.preprocessing module includes scaling, centering, normalization, binarization methods.

User guide: See the Preprocessing data section for further details.

preprocessing.Binarizer(*[, threshold, copy])

Binarize data (set feature values to 0 or 1) according to a threshold.

preprocessing.FunctionTransformer([func, ...])

Constructs a transformer from an arbitrary callable.

preprocessing.KBinsDiscretizer([n_bins, ...])

Bin continuous data into intervals.

preprocessing.KernelCenterer()

Center an arbitrary kernel matrix \(K\).

preprocessing.LabelBinarizer(*[, neg_label, ...])

Binarize labels in a one-vs-all fashion.

preprocessing.LabelEncoder()

Encode target labels with value between 0 and n_classes-1.

preprocessing.MultiLabelBinarizer(*[, ...])

Transform between iterable of iterables and a multilabel format.

preprocessing.MaxAbsScaler(*[, copy])

Scale each feature by its maximum absolute value.

preprocessing.MinMaxScaler([feature_range, ...])

Transform features by scaling each feature to a given range.

preprocessing.Normalizer([norm, copy])

Normalize samples individually to unit norm.

preprocessing.OneHotEncoder(*[, categories, ...])

Encode categorical features as a one-hot numeric array.

preprocessing.OrdinalEncoder(*[, ...])

Encode categorical features as an integer array.

preprocessing.PolynomialFeatures([degree, ...])

Generate polynomial and interaction features.

preprocessing.PowerTransformer([method, ...])

Apply a power transform featurewise to make data more Gaussian-like.

preprocessing.QuantileTransformer(*[, ...])

Transform features using quantiles information.

preprocessing.RobustScaler(*[, ...])

Scale features using statistics that are robust to outliers.

preprocessing.SplineTransformer([n_knots, ...])

Generate univariate B-spline bases for features.

preprocessing.StandardScaler(*[, copy, ...])

Standardize features by removing the mean and scaling to unit variance.

preprocessing.TargetEncoder([categories, ...])

Target Encoder for regression and classification targets.

preprocessing.add_dummy_feature(X[, value])

Augment dataset with an additional dummy feature.

preprocessing.binarize(X, *[, threshold, copy])

Boolean thresholding of array-like or scipy.sparse matrix.

preprocessing.label_binarize(y, *, classes)

Binarize labels in a one-vs-all fashion.

preprocessing.maxabs_scale(X, *[, axis, copy])

Scale each feature to the [-1, 1] range without breaking the sparsity.

preprocessing.minmax_scale(X[, ...])

Transform features by scaling each feature to a given range.

preprocessing.normalize(X[, norm, axis, ...])

Scale input vectors individually to unit norm (vector length).

preprocessing.quantile_transform(X, *[, ...])

Transform features using quantiles information.

preprocessing.robust_scale(X, *[, axis, ...])

Standardize a dataset along any axis.

preprocessing.scale(X, *[, axis, with_mean, ...])

Standardize a dataset along any axis.

preprocessing.power_transform(X[, method, ...])

Parametric, monotonic transformation to make data more Gaussian-like.

sklearn.random_projection: Random projection

Random Projection transformers.

Random Projections are a simple and computationally efficient way to reduce the dimensionality of the data by trading a controlled amount of accuracy (as additional variance) for faster processing times and smaller model sizes.

The dimensions and distribution of Random Projections matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset.

The main theoretical result behind the efficiency of random projection is the Johnson-Lindenstrauss lemma (quoting Wikipedia):

In mathematics, the Johnson-Lindenstrauss lemma is a result concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection.

User guide: See the Random Projection section for further details.
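A sketch on random data showing how the lemma bounds the target dimensionality (the sample counts and `eps` below are arbitrary):

```python
import numpy as np
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

# Minimum number of components needed to preserve pairwise distances
# within a relative distortion eps for a given number of samples.
n_dims = johnson_lindenstrauss_min_dim(n_samples=1000, eps=0.5)

rng = np.random.RandomState(0)
X = rng.rand(100, 10000)

# With n_components='auto' (the default), the transformer applies the
# same bound to pick the output dimensionality.
X_new = GaussianRandomProjection(random_state=0).fit_transform(X)
print(n_dims, X_new.shape)
```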

random_projection.GaussianRandomProjection([n_components, ...])

Reduce dimensionality through Gaussian random projection.

random_projection.SparseRandomProjection([n_components, ...])

Reduce dimensionality through sparse random projection.

random_projection.johnson_lindenstrauss_min_dim(n_samples, *[, eps])

Find a 'safe' number of components to randomly project to.

sklearn.semi_supervised: Semi-Supervised Learning

The sklearn.semi_supervised module implements semi-supervised learning algorithms. These algorithms utilize small amounts of labeled data and large amounts of unlabeled data for classification tasks. This module includes Label Propagation.

User guide: See the Semi-supervised learning section for further details.

semi_supervised.LabelPropagation([kernel, ...])

Label Propagation classifier.

semi_supervised.LabelSpreading([kernel, ...])

LabelSpreading model for semi-supervised learning.

semi_supervised.SelfTrainingClassifier(...)

Self-training classifier.

sklearn.svm: Support Vector Machines

The sklearn.svm module includes Support Vector Machine algorithms.

User guide: See the Support Vector Machines section for further details.

Estimators
svm.LinearSVC([penalty, loss, dual, tol, C, ...])

Linear Support Vector Classification.

svm.LinearSVR(*[, epsilon, tol, C, loss, ...])

Linear Support Vector Regression.

svm.NuSVC(*[, nu, kernel, degree, gamma, ...])

Nu-Support Vector Classification.

svm.NuSVR(*[, nu, C, kernel, degree, gamma, ...])

Nu Support Vector Regression.

svm.OneClassSVM(*[, kernel, degree, gamma, ...])

Unsupervised Outlier Detection.

svm.SVC(*[, C, kernel, degree, gamma, ...])

C-Support Vector Classification.

svm.SVR(*[, kernel, degree, gamma, coef0, ...])

Epsilon-Support Vector Regression.

svm.l1_min_c(X, y, *[, loss, fit_intercept, ...])

Return the lowest bound for C.

sklearn.tree: Decision Trees

The sklearn.tree module includes decision tree-based models for classification and regression.

User guide: See the Decision Trees section for further details.

tree.DecisionTreeClassifier(*[, criterion, ...])

A decision tree classifier.

tree.DecisionTreeRegressor(*[, criterion, ...])

A decision tree regressor.

tree.ExtraTreeClassifier(*[, criterion, ...])

An extremely randomized tree classifier.

tree.ExtraTreeRegressor(*[, criterion, ...])

An extremely randomized tree regressor.

tree.export_graphviz(decision_tree[, ...])

Export a decision tree in DOT format.

tree.export_text(decision_tree, *[, ...])

Build a text report showing the rules of a decision tree.

Plotting
tree.plot_tree(decision_tree, *[, ...])

Plot a decision tree.

sklearn.utils: Utilities

The sklearn.utils module includes various utilities.

Developer guide: See the Utilities for Developers page for further details.

utils.Bunch(**kwargs)

Container object exposing keys as attributes.

utils.as_float_array(X, *[, copy, ...])

Convert an array-like to an array of floats.

utils.assert_all_finite(X, *[, allow_nan, ...])

Throw a ValueError if X contains NaN or infinity.

utils.deprecated([extra])

Decorator to mark a function or class as deprecated.

utils.estimator_html_repr(estimator)

Build a HTML representation of an estimator.

utils.gen_batches(n, batch_size, *[, ...])

Generator to create slices containing batch_size elements from 0 to n.

utils.gen_even_slices(n, n_packs, *[, n_samples])

Generator to create n_packs evenly spaced slices going up to n.

utils.indexable(*iterables)

Make arrays indexable for cross-validation.

utils.murmurhash3_32(key[, seed, positive])

Compute the 32bit murmurhash3 of key at seed.

utils.resample(*arrays[, replace, ...])

Resample arrays or sparse matrices in a consistent way.

utils._safe_indexing(X, indices, *[, axis])

Return rows, items or columns of X using indices.

utils.safe_mask(X, mask)

Return a mask which is safe to use on X.

utils.safe_sqr(X, *[, copy])

Element wise squaring of array-likes and sparse matrices.

utils.shuffle(*arrays[, random_state, n_samples])

Shuffle arrays or sparse matrices in a consistent way.

Input and parameter validation

The sklearn.utils.validation module includes functions to validate input and parameters within scikit-learn estimators.
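A small sketch of the typical entry point, `check_array` (the inputs are illustrative):

```python
import numpy as np
from sklearn.utils import check_array

# Coerce a nested list into a validated 2D ndarray.
X = check_array([[1.0, 2.0], [3.0, 4.0]])

# By default, non-finite values are rejected with a ValueError.
try:
    check_array([[np.nan, 1.0]])
    rejected = False
except ValueError:
    rejected = True
print(X.shape, rejected)
```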

utils.check_X_y(X, y[, accept_sparse, ...])

Input validation for standard estimators.

utils.check_array(array[, accept_sparse, ...])

Input validation on an array, list, sparse matrix or similar.

utils.check_scalar(x, name, target_type, *)

Validate scalar parameters type and value.

utils.check_consistent_length(*arrays)

Check that all arrays have consistent first dimensions.

utils.check_random_state(seed)

Turn seed into a np.random.RandomState instance.

utils.validation.check_is_fitted(estimator[, ...])

Perform is_fitted validation for estimator.

utils.validation.check_memory(memory)

Check that memory is joblib.Memory-like.

utils.validation.check_symmetric(array, *[, ...])

Make sure that array is 2D, square and symmetric.

utils.validation.column_or_1d(y, *[, dtype, ...])

Ravel column or 1d numpy array, else raises an error.

utils.validation.has_fit_parameter(...)

Check whether the estimator's fit method supports the given parameter.

Utilities used in meta-estimators

The sklearn.utils.metaestimators module includes utilities for meta-estimators.

utils.metaestimators.available_if(check)

An attribute that is available only if check returns a truthy value.

Utilities to handle weights based on class labels

The sklearn.utils.class_weight module includes utilities for handling weights based on class labels.

utils.class_weight.compute_class_weight(...)

Estimate class weights for unbalanced datasets.

utils.class_weight.compute_sample_weight(...)

Estimate sample weights by class for unbalanced datasets.

Utilities to deal with multiclass target in classifiers

The sklearn.utils.multiclass module includes utilities to handle multiclass/multioutput target in classifiers.

utils.multiclass.type_of_target(y[, input_name])

Determine the type of data indicated by the target.

utils.multiclass.is_multilabel(y)

Check if y is in a multilabel format.

utils.multiclass.unique_labels(*ys)

Extract an ordered array of unique labels.

Utilities for optimal mathematical operations

The sklearn.utils.extmath module includes utilities to perform optimal mathematical operations in scikit-learn that are not available in SciPy.

utils.extmath.safe_sparse_dot(a, b, *[, ...])

Dot product that handle the sparse matrix case correctly.

utils.extmath.randomized_range_finder(A, *, ...)

Compute an orthonormal matrix whose range approximates the range of A.

utils.extmath.randomized_svd(M, n_components, *)

Compute a truncated randomized SVD.

utils.extmath.fast_logdet(A)

Compute logarithm of determinant of a square matrix.

utils.extmath.density(w)

Compute density of a sparse vector.

utils.extmath.weighted_mode(a, w, *[, axis])

Return an array of the weighted modal (most common) value in the passed array.

Utilities to work with sparse matrices and arrays

The sklearn.utils.sparsefuncs module includes a collection of utilities to work with sparse matrices and arrays.

utils.sparsefuncs.incr_mean_variance_axis(X, ...)

Compute incremental mean and variance along an axis on a CSR or CSC matrix.

utils.sparsefuncs.inplace_column_scale(X, scale)

Inplace column scaling of a CSC/CSR matrix.

utils.sparsefuncs.inplace_row_scale(X, scale)

Inplace row scaling of a CSR or CSC matrix.

utils.sparsefuncs.inplace_swap_row(X, m, n)

Swap two rows of a CSC/CSR matrix in-place.

utils.sparsefuncs.inplace_swap_column(X, m, n)

Swap two columns of a CSC/CSR matrix in-place.

utils.sparsefuncs.mean_variance_axis(X, axis)

Compute mean and variance along an axis on a CSR or CSC matrix.

utils.sparsefuncs.inplace_csr_column_scale(X, ...)

Inplace column scaling of a CSR matrix.

The sklearn.utils.sparsefuncs_fast module includes a collection of utilities to work with sparse matrices and arrays written in Cython.

utils.sparsefuncs_fast.inplace_csr_row_normalize_l1(X)

Inplace row normalize using the l1 norm.

utils.sparsefuncs_fast.inplace_csr_row_normalize_l2(X)

Inplace row normalize using the l2 norm.

Utilities to work with graphs

The sklearn.utils.graph module includes graph utilities and algorithms.

utils.graph.single_source_shortest_path_length(...)

Return the length of the shortest path from source to all reachable nodes.

Utilities for random sampling

The sklearn.utils.random module includes utilities for random sampling.

utils.random.sample_without_replacement(...)

Sample integers without replacement.

Utilities to operate on arrays

The sklearn.utils.arrayfuncs module includes a small collection of auxiliary functions that operate on arrays.

utils.arrayfuncs.min_pos(X)

Find the minimum value of an array over positive values.

Metadata routing

The sklearn.utils.metadata_routing module includes utilities to route metadata within scikit-learn estimators.

utils.metadata_routing.get_routing_for_object([obj])

Get a Metadata{Router, Request} instance from the given object.

utils.metadata_routing.process_routing(_obj, ...)

Validate and route input parameters.

utils.metadata_routing.MetadataRouter(owner)

Stores and handles metadata routing for a router object.

utils.metadata_routing.MetadataRequest(owner)

Contains the metadata request info of a consumer.

utils.metadata_routing.MethodMapping()

Stores the mapping between callee and caller methods for a router.

Scikit-learn object discovery

The sklearn.utils.discovery module includes utilities to discover objects (i.e. estimators, displays, functions) from the sklearn package.

utils.discovery.all_estimators([type_filter])

Get a list of all estimators from sklearn.

utils.discovery.all_displays()

Get a list of all displays from sklearn.

utils.discovery.all_functions()

Get a list of all functions from sklearn.

Scikit-learn compatibility checker

The sklearn.utils.estimator_checks module includes various utilities to check the compatibility of estimators with the scikit-learn API.

utils.estimator_checks.check_estimator(...)

Check if estimator adheres to scikit-learn conventions.

utils.estimator_checks.parametrize_with_checks(estimators)

Pytest specific decorator for parametrizing estimator checks.

Utilities for parallel computing

The sklearn.utils.parallel module customizes joblib tools for scikit-learn usage.

utils.parallel.delayed(function)

Decorator used to capture the arguments of a function.

utils.parallel_backend(backend[, n_jobs, ...])

Change the default backend used by Parallel inside a with block.

utils.register_parallel_backend(name, factory)

Register a new Parallel backend factory.

utils.parallel.Parallel([n_jobs, backend, ...])

Tweak of joblib.Parallel that propagates the scikit-learn configuration.

Recently deprecated