API Reference

This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the raw specifications of classes and functions may not be enough to give full guidelines on their uses. For reference on concepts repeated across the API, see Glossary of Common Terms and API Elements.

Object

Description

config_context

Context manager for global scikit-learn configuration.

get_config

Retrieve current values for configuration set by set_config.

set_config

Set global scikit-learn configuration.

show_versions

Print useful debugging information.
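A minimal sketch of how the configuration helpers above fit together. The `assume_finite` option is used only as an illustrative setting; any other configuration key would behave the same way.

```python
from sklearn import config_context, get_config

# get_config returns the current global configuration as a dict.
original = get_config()["assume_finite"]

# config_context overrides a setting only inside the with-block.
with config_context(assume_finite=True):
    inside = get_config()["assume_finite"]

# After the block, the previous value is restored.
after = get_config()["assume_finite"]
print(inside, after == original)
```

`set_config` changes the same settings globally rather than for the duration of a block.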

BaseEstimator

Base class for all estimators in scikit-learn.

BiclusterMixin

Mixin class for all bicluster estimators in scikit-learn.

ClassNamePrefixFeaturesOutMixin

Mixin class for transformers that generate their own names by prefixing.

ClassifierMixin

Mixin class for all classifiers in scikit-learn.

ClusterMixin

Mixin class for all cluster estimators in scikit-learn.

DensityMixin

Mixin class for all density estimators in scikit-learn.

MetaEstimatorMixin

Mixin class for all meta estimators in scikit-learn.

OneToOneFeatureMixin

Provides get_feature_names_out for simple transformers.

OutlierMixin

Mixin class for all outlier detection estimators in scikit-learn.

RegressorMixin

Mixin class for all regression estimators in scikit-learn.

TransformerMixin

Mixin class for all transformers in scikit-learn.

clone

Construct a new unfitted estimator with the same parameters.

is_classifier

Return True if the given estimator is (probably) a classifier.

is_clusterer

Return True if the given estimator is (probably) a clusterer.

is_regressor

Return True if the given estimator is (probably) a regressor.

is_transformer

Return True if the given estimator is (probably) a transformer.

is_outlier_detector

Return True if the given estimator is (probably) an outlier detector.
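As a small illustration of `clone` and the `is_*` helpers listed above, the sketch below fits a throwaway `LogisticRegression` on made-up data and then clones it. The data and the `C=0.5` setting are arbitrary choices for the example.

```python
from sklearn.base import clone, is_classifier, is_regressor
from sklearn.linear_model import LogisticRegression

# Fit an estimator on a tiny toy dataset.
fitted = LogisticRegression(C=0.5).fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

# clone copies the constructor parameters but none of the fitted state.
fresh = clone(fitted)
has_coef = hasattr(fresh, "coef_")

print(fresh.get_params()["C"], has_coef)
print(is_classifier(fresh), is_regressor(fresh))
```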

CalibratedClassifierCV

Probability calibration with isotonic regression or logistic regression.

calibration_curve

Compute true and predicted probabilities for a calibration curve.

CalibrationDisplay

Calibration curve (also known as reliability diagram) visualization.

AffinityPropagation

Perform Affinity Propagation Clustering of data.

AgglomerativeClustering

Agglomerative Clustering.

Birch

Implements the BIRCH clustering algorithm.

BisectingKMeans

Bisecting K-Means clustering.

DBSCAN

Perform DBSCAN clustering from vector array or distance matrix.

FeatureAgglomeration

Agglomerate features.

HDBSCAN

Cluster data using hierarchical density-based clustering.

KMeans

K-Means clustering.

MeanShift

Mean shift clustering using a flat kernel.

MiniBatchKMeans

Mini-Batch K-Means clustering.

OPTICS

Estimate clustering structure from vector array.

SpectralBiclustering

Spectral biclustering (Kluger, 2003).

SpectralClustering

Apply clustering to a projection of the normalized Laplacian.

SpectralCoclustering

Spectral Co-Clustering algorithm (Dhillon, 2001).

affinity_propagation

Perform Affinity Propagation Clustering of data.

cluster_optics_dbscan

Perform DBSCAN extraction for an arbitrary epsilon.

cluster_optics_xi

Automatically extract clusters according to the Xi-steep method.

compute_optics_graph

Compute the OPTICS reachability graph.

dbscan

Perform DBSCAN clustering from vector array or distance matrix.

estimate_bandwidth

Estimate the bandwidth to use with the mean-shift algorithm.

k_means

Perform K-means clustering algorithm.

kmeans_plusplus

Init n_clusters seeds according to k-means++.

mean_shift

Perform mean shift clustering of data using a flat kernel.

spectral_clustering

Apply clustering to a projection of the normalized Laplacian.

ward_tree

Ward clustering based on a Feature matrix.
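The clustering estimators above share the usual `fit` / `labels_` pattern. A minimal sketch with `KMeans` on six hand-written points (the data and `n_clusters=2` are assumptions made for the example):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of points along the first axis.
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Cluster ids are arbitrary; only the grouping is meaningful.
print(labels, km.cluster_centers_)
```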

ColumnTransformer

Applies transformers to columns of an array or pandas DataFrame.

TransformedTargetRegressor

Meta-estimator to regress on a transformed target.

make_column_selector

Create a callable to select columns to be used with ColumnTransformer.

make_column_transformer

Construct a ColumnTransformer from the given transformers.
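A short sketch of `make_column_transformer`, assuming a plain NumPy array with two columns: the first column is standardized and the second is passed through unchanged. The choice of `StandardScaler` is illustrative only.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]])

# Scale column 0; keep every other column as-is.
ct = make_column_transformer((StandardScaler(), [0]), remainder="passthrough")
Xt = ct.fit_transform(X)

print(Xt)
```

With a pandas DataFrame, `make_column_selector` can replace the explicit column list, for example to pick columns by dtype.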

EllipticEnvelope

An object for detecting outliers in a Gaussian distributed dataset.

EmpiricalCovariance

Maximum likelihood covariance estimator.

GraphicalLasso

Sparse inverse covariance estimation with an l1-penalized estimator.

GraphicalLassoCV

Sparse inverse covariance with cross-validated choice of the l1 penalty.

LedoitWolf

LedoitWolf Estimator.

MinCovDet

Minimum Covariance Determinant (MCD): robust estimator of covariance.

OAS

Oracle Approximating Shrinkage Estimator.

ShrunkCovariance

Covariance estimator with shrinkage.

empirical_covariance

Compute the Maximum likelihood covariance estimator.

graphical_lasso

L1-penalized covariance estimator.

ledoit_wolf

Estimate the shrunk Ledoit-Wolf covariance matrix.

ledoit_wolf_shrinkage

Estimate the shrunk Ledoit-Wolf covariance matrix.

oas

Estimate covariance with the Oracle Approximating Shrinkage.

shrunk_covariance

Calculate covariance matrices shrunk on the diagonal.

CCA

Canonical Correlation Analysis, also known as “Mode B” PLS.

PLSCanonical

Partial Least Squares transformer and regressor.

PLSRegression

PLS regression.

PLSSVD

Partial Least Squares SVD.

clear_data_home

Delete all the content of the data home cache.

dump_svmlight_file

Dump the dataset in svmlight / libsvm file format.

fetch_20newsgroups

Load the filenames and data from the 20 newsgroups dataset (classification).

fetch_20newsgroups_vectorized

Load and vectorize the 20 newsgroups dataset (classification).

fetch_california_housing

Load the California housing dataset (regression).

fetch_covtype

Load the covertype dataset (classification).

fetch_file

Fetch a file from the web if not already present in the local folder.

fetch_kddcup99

Load the kddcup99 dataset (classification).

fetch_lfw_pairs

Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).

fetch_lfw_people

Load the Labeled Faces in the Wild (LFW) people dataset (classification).

fetch_olivetti_faces

Load the Olivetti faces data-set from AT&T (classification).

fetch_openml

Fetch dataset from openml by name or dataset id.

fetch_rcv1

Load the RCV1 multilabel dataset (classification).

fetch_species_distributions

Loader for species distribution dataset from Phillips et al. (2006).

get_data_home

Return the path of the scikit-learn data directory.

load_breast_cancer

Load and return the breast cancer wisconsin dataset (classification).

load_diabetes

Load and return the diabetes dataset (regression).

load_digits

Load and return the digits dataset (classification).

load_files

Load text files with categories as subfolder names.

load_iris

Load and return the iris dataset (classification).

load_linnerud

Load and return the physical exercise Linnerud dataset.

load_sample_image

Load the numpy array of a single sample image.

load_sample_images

Load sample images for image manipulation.

load_svmlight_file

Load datasets in the svmlight / libsvm format into sparse CSR matrix.

load_svmlight_files

Load dataset from multiple files in SVMlight format.

load_wine

Load and return the wine dataset (classification).

make_biclusters

Generate a constant block diagonal structure array for biclustering.

make_blobs

Generate isotropic Gaussian blobs for clustering.

make_checkerboard

Generate an array with block checkerboard structure for biclustering.

make_circles

Make a large circle containing a smaller circle in 2d.

make_classification

Generate a random n-class classification problem.

make_friedman1

Generate the “Friedman #1” regression problem.

make_friedman2

Generate the “Friedman #2” regression problem.

make_friedman3

Generate the “Friedman #3” regression problem.

make_gaussian_quantiles

Generate isotropic Gaussian and label samples by quantile.

make_hastie_10_2

Generate data for binary classification used in Hastie et al. 2009, Example 10.2.

make_low_rank_matrix

Generate a mostly low rank matrix with bell-shaped singular values.

make_moons

Make two interleaving half circles.

make_multilabel_classification

Generate a random multilabel classification problem.

make_regression

Generate a random regression problem.

make_s_curve

Generate an S curve dataset.

make_sparse_coded_signal

Generate a signal as a sparse combination of dictionary elements.

make_sparse_spd_matrix

Generate a sparse symmetric positive-definite matrix.

make_sparse_uncorrelated

Generate a random regression problem with sparse uncorrelated design.

make_spd_matrix

Generate a random symmetric, positive-definite matrix.

make_swiss_roll

Generate a swiss roll dataset.
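The dataset helpers above fall into loaders for bundled data (`load_*`), downloaders (`fetch_*`), and synthetic generators (`make_*`). A brief sketch of the first and last kind; the sample and feature counts passed to `make_classification` are arbitrary:

```python
from sklearn.datasets import load_iris, make_classification

# Bundled dataset: no download needed.
X, y = load_iris(return_X_y=True)

# Synthetic dataset: reproducible via random_state.
Xs, ys = make_classification(n_samples=100, n_features=5, random_state=0)

print(X.shape, Xs.shape)
```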

DictionaryLearning

Dictionary learning.

FactorAnalysis

Factor Analysis (FA).

FastICA

FastICA: a fast algorithm for Independent Component Analysis.

IncrementalPCA

Incremental principal components analysis (IPCA).

KernelPCA

Kernel Principal component analysis (KPCA).

LatentDirichletAllocation

Latent Dirichlet Allocation with online variational Bayes algorithm.

MiniBatchDictionaryLearning

Mini-batch dictionary learning.

MiniBatchNMF

Mini-Batch Non-Negative Matrix Factorization (NMF).

MiniBatchSparsePCA

Mini-batch Sparse Principal Components Analysis.

NMF

Non-Negative Matrix Factorization (NMF).

PCA

Principal component analysis (PCA).

SparseCoder

Sparse coding.

SparsePCA

Sparse Principal Components Analysis (SparsePCA).

TruncatedSVD

Dimensionality reduction using truncated SVD (aka LSA).

dict_learning

Solve a dictionary learning matrix factorization problem.

dict_learning_online

Solve a dictionary learning matrix factorization problem online.

fastica

Perform Fast Independent Component Analysis.

non_negative_factorization

Compute Non-negative Matrix Factorization (NMF).

sparse_encode

Sparse coding.
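Most decomposition classes above act as transformers. As a rough sketch, `PCA` can project random 3-dimensional data onto its first two principal components (the data here is synthetic noise, chosen only to keep the example self-contained):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))

pca = PCA(n_components=2).fit(X)
Xt = pca.transform(X)

# Fraction of variance captured by each retained component.
ratios = pca.explained_variance_ratio_
print(Xt.shape, ratios)
```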

LinearDiscriminantAnalysis

Linear Discriminant Analysis.

QuadraticDiscriminantAnalysis

Quadratic Discriminant Analysis.

DummyClassifier

DummyClassifier makes predictions that ignore the input features.

DummyRegressor

Regressor that makes predictions using simple rules.

AdaBoostClassifier

An AdaBoost classifier.

AdaBoostRegressor

An AdaBoost regressor.

BaggingClassifier

A Bagging classifier.

BaggingRegressor

A Bagging regressor.

ExtraTreesClassifier

An extra-trees classifier.

ExtraTreesRegressor

An extra-trees regressor.

GradientBoostingClassifier

Gradient Boosting for classification.

GradientBoostingRegressor

Gradient Boosting for regression.

HistGradientBoostingClassifier

Histogram-based Gradient Boosting Classification Tree.

HistGradientBoostingRegressor

Histogram-based Gradient Boosting Regression Tree.

IsolationForest

Isolation Forest Algorithm.

RandomForestClassifier

A random forest classifier.

RandomForestRegressor

A random forest regressor.

RandomTreesEmbedding

An ensemble of totally random trees.

StackingClassifier

Stack of estimators with a final classifier.

StackingRegressor

Stack of estimators with a final regressor.

VotingClassifier

Soft Voting/Majority Rule classifier for unfitted estimators.

VotingRegressor

Prediction voting regressor for unfitted estimators.

ConvergenceWarning

Custom warning to capture convergence problems.

DataConversionWarning

Warning used to notify implicit data conversions happening in the code.

DataDimensionalityWarning

Custom warning to notify potential issues with data dimensionality.

EfficiencyWarning

Warning used to notify the user of inefficient computation.

FitFailedWarning

Warning class used if there is an error while fitting the estimator.

InconsistentVersionWarning

Warning raised when an estimator is unpickled with an inconsistent version.

NotFittedError

Exception class to raise if estimator is used before fitting.

UndefinedMetricWarning

Warning used when the metric is invalid.

EstimatorCheckFailedWarning

Warning raised when an estimator check from the common tests fails.
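As an example of how these exception classes surface in practice, calling `predict` on an estimator that has not been fitted raises `NotFittedError`. The sketch below uses `LinearRegression` as an arbitrary stand-in:

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression

caught = False
try:
    # predict before fit is an error for scikit-learn estimators.
    LinearRegression().predict([[0.0]])
except NotFittedError as exc:
    caught = True
    print(type(exc).__name__)
```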

enable_halving_search_cv

Enables Successive Halving search-estimators.

enable_iterative_imputer

Enables IterativeImputer.

DictVectorizer

Transforms lists of feature-value mappings to vectors.

FeatureHasher

Implements feature hashing, aka the hashing trick.

PatchExtractor

Extracts patches from a collection of images.

extract_patches_2d

Reshape a 2D image into a collection of patches.

grid_to_graph

Graph of the pixel-to-pixel connections.

img_to_graph

Graph of the pixel-to-pixel gradient connections.

reconstruct_from_patches_2d

Reconstruct the image from all of its patches.

CountVectorizer

Convert a collection of text documents to a matrix of token counts.

HashingVectorizer

Convert a collection of text documents to a matrix of token occurrences.

TfidfTransformer

Transform a count matrix to a normalized tf or tf-idf representation.

TfidfVectorizer

Convert a collection of raw documents to a matrix of TF-IDF features.

GenericUnivariateSelect

Univariate feature selector with configurable strategy.

RFE

Feature ranking with recursive feature elimination.

RFECV

Recursive feature elimination with cross-validation to select features.

SelectFdr

Filter: Select the p-values for an estimated false discovery rate.

SelectFpr

Filter: Select the p-values below alpha based on a FPR test.

SelectFromModel

Meta-transformer for selecting features based on importance weights.

SelectFwe

Filter: Select the p-values corresponding to Family-wise error rate.

SelectKBest

Select features according to the k highest scores.

SelectPercentile

Select features according to a percentile of the highest scores.

SelectorMixin

Transformer mixin that performs feature selection given a support mask.

SequentialFeatureSelector

Transformer that performs Sequential Feature Selection.

VarianceThreshold

Feature selector that removes all low-variance features.

chi2

Compute chi-squared stats between each non-negative feature and class.

f_classif

Compute the ANOVA F-value for the provided sample.

f_regression

Univariate linear regression tests returning F-statistic and p-values.

mutual_info_classif

Estimate mutual information for a discrete target variable.

mutual_info_regression

Estimate mutual information for a continuous target variable.

r_regression

Compute Pearson’s r for each feature and the target.

FrozenEstimator

Estimator that wraps a fitted estimator to prevent re-fitting.

GaussianProcessClassifier

Gaussian process classification (GPC) based on Laplace approximation.

GaussianProcessRegressor

Gaussian process regression (GPR).
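Kernel objects from `sklearn.gaussian_process.kernels` can be combined with the Python operators `+`, `*`, and `**`, which build `Sum`, `Product`, and `Exponentiation` kernels respectively. The sketch below fits a `GaussianProcessRegressor` to a noiseless sine curve; the data and the initial hyperparameter values are assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, ConstantKernel, Exponentiation, Product, Sum, WhiteKernel,
)

X = np.linspace(0, 5, 20).reshape(-1, 1)
y = np.sin(X).ravel()

# (constant * RBF) + white noise: operators build composite kernels.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
squared = RBF() ** 2  # an Exponentiation kernel

gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)
mean, std = gpr.predict(X, return_std=True)

print(type(kernel).__name__, type(squared).__name__)
print(gpr.kernel_)  # kernel with optimized hyperparameters
```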

CompoundKernel

Kernel which is composed of a set of other kernels.

ConstantKernel

Constant kernel.

DotProduct

Dot-Product kernel.

ExpSineSquared

Exp-Sine-Squared kernel (aka periodic kernel).

Exponentiation

The Exponentiation kernel takes one base kernel \(k\) and a scalar parameter \(p\) and combines them via \(k_{exp}(X, Y) = k(X, Y)^p\).

Hyperparameter

A kernel hyperparameter’s specification in form of a namedtuple.

Kernel

Base class for all kernels.

Matern

Matern kernel.

PairwiseKernel

Wrapper for kernels in sklearn.metrics.pairwise.

Product

The Product kernel takes two kernels \(k_1\) and \(k_2\) and combines them via \(k_{prod}(X, Y) = k_1(X, Y) \cdot k_2(X, Y)\).

RBF

Radial basis function kernel (aka squared-exponential kernel).

RationalQuadratic

Rational Quadratic kernel.

Sum

The Sum kernel takes two kernels \(k_1\) and \(k_2\) and combines them via \(k_{sum}(X, Y) = k_1(X, Y) + k_2(X, Y)\).

WhiteKernel

White kernel.

IterativeImputer

Multivariate imputer that estimates each feature from all the others.

KNNImputer

Imputation for completing missing values using k-Nearest Neighbors.

MissingIndicator

Binary indicators for missing values.

SimpleImputer

Univariate imputer for completing missing values with simple strategies.
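A small sketch of `SimpleImputer` with the `"mean"` strategy on a hand-written array, where each missing entry is replaced by the mean of the observed values in its column:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])

imp = SimpleImputer(strategy="mean")
Xt = imp.fit_transform(X)

# Column means of the observed values are 2.0 and 6.0.
print(Xt)
```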

partial_dependence

Partial dependence of features.

permutation_importance

Permutation importance for feature evaluation [BRE].

DecisionBoundaryDisplay

Decision boundary visualization.

PartialDependenceDisplay

Partial Dependence Plot (PDP).

IsotonicRegression

Isotonic regression model.

check_increasing

Determine whether y is monotonically correlated with x.

isotonic_regression

Solve the isotonic regression model.

AdditiveChi2Sampler

Approximate feature map for additive chi2 kernel.

Nystroem

Approximate a kernel map using a subset of the training data.

PolynomialCountSketch

Polynomial kernel approximation via Tensor Sketch.

RBFSampler

Approximate a RBF kernel feature map using random Fourier features.

SkewedChi2Sampler

Approximate feature map for “skewed chi-squared” kernel.

KernelRidge

Kernel ridge regression.

LogisticRegression

Logistic Regression (aka logit, MaxEnt) classifier.

LogisticRegressionCV

Logistic Regression CV (aka logit, MaxEnt) classifier.

PassiveAggressiveClassifier

Passive Aggressive Classifier.

Perceptron

Linear perceptron classifier.

RidgeClassifier

Classifier using Ridge regression.

RidgeClassifierCV

Ridge classifier with built-in cross-validation.

SGDClassifier

Linear classifiers (SVM, logistic regression, etc.) with SGD training.

SGDOneClassSVM

Solves linear One-Class SVM using Stochastic Gradient Descent.

LinearRegression

Ordinary least squares Linear Regression.

Ridge

Linear least squares with l2 regularization.

RidgeCV

Ridge regression with built-in cross-validation.

SGDRegressor

Linear model fitted by minimizing a regularized empirical loss with SGD.

ElasticNet

Linear regression with combined L1 and L2 priors as regularizer.

ElasticNetCV

Elastic Net model with iterative fitting along a regularization path.

Lars

Least Angle Regression model a.k.a. LAR.

LarsCV

Cross-validated Least Angle Regression model.

Lasso

Linear Model trained with L1 prior as regularizer (aka the Lasso).

LassoCV

Lasso linear model with iterative fitting along a regularization path.

LassoLars

Lasso model fit with Least Angle Regression a.k.a. Lars.

LassoLarsCV

Cross-validated Lasso, using the LARS algorithm.

LassoLarsIC

Lasso model fit with Lars using BIC or AIC for model selection.

OrthogonalMatchingPursuit

Orthogonal Matching Pursuit model (OMP).

OrthogonalMatchingPursuitCV

Cross-validated Orthogonal Matching Pursuit model (OMP).

ARDRegression

Bayesian ARD regression.

BayesianRidge

Bayesian ridge regression.

MultiTaskElasticNet

Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer.

MultiTaskElasticNetCV

Multi-task L1/L2 ElasticNet with built-in cross-validation.

MultiTaskLasso

Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.

MultiTaskLassoCV

Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.

HuberRegressor

L2-regularized linear regression model that is robust to outliers.

QuantileRegressor

Linear regression model that predicts conditional quantiles.

RANSACRegressor

RANSAC (RANdom SAmple Consensus) algorithm.

TheilSenRegressor

Theil-Sen Estimator: robust multivariate regression model.

GammaRegressor

Generalized Linear Model with a Gamma distribution.

PoissonRegressor

Generalized Linear Model with a Poisson distribution.

TweedieRegressor

Generalized Linear Model with a Tweedie distribution.

PassiveAggressiveRegressor

Passive Aggressive Regressor.

enet_path

Compute elastic net path with coordinate descent.

lars_path

Compute Least Angle Regression or Lasso path using the LARS algorithm.

lars_path_gram

The lars_path in the sufficient stats mode.

lasso_path

Compute Lasso path with coordinate descent.

orthogonal_mp

Orthogonal Matching Pursuit (OMP).

orthogonal_mp_gram

Gram Orthogonal Matching Pursuit (OMP).

ridge_regression

Solve the ridge equation by the method of normal equations.

Isomap

Isomap Embedding.

LocallyLinearEmbedding

Locally Linear Embedding.

MDS

Multidimensional scaling.

SpectralEmbedding

Spectral embedding for non-linear dimensionality reduction.

TSNE

T-distributed Stochastic Neighbor Embedding.

locally_linear_embedding

Perform a Locally Linear Embedding analysis on the data.

smacof

Compute multidimensional scaling using the SMACOF algorithm.

spectral_embedding

Project the sample on the first eigenvectors of the graph Laplacian.

trustworthiness

Indicate to what extent the local structure is retained.

check_scoring

Determine scorer from user options.

get_scorer

Get a scorer from string.

get_scorer_names

Get the names of all available scorers.

make_scorer

Make a scorer from a performance metric or loss function.

accuracy_score

Accuracy classification score.

auc

Compute Area Under the Curve (AUC) using the trapezoidal rule.

average_precision_score

Compute average precision (AP) from prediction scores.

balanced_accuracy_score

Compute the balanced accuracy.

brier_score_loss

Compute the Brier score loss.

class_likelihood_ratios

Compute binary classification positive and negative likelihood ratios.

classification_report

Build a text report showing the main classification metrics.

cohen_kappa_score

Compute Cohen’s kappa: a statistic that measures inter-annotator agreement.

confusion_matrix

Compute confusion matrix to evaluate the accuracy of a classification.

d2_log_loss_score

\(D^2\) score function, fraction of log loss explained.

dcg_score

Compute Discounted Cumulative Gain.

det_curve

Compute error rates for different probability thresholds.

f1_score

Compute the F1 score, also known as balanced F-score or F-measure.

fbeta_score

Compute the F-beta score.

hamming_loss

Compute the average Hamming loss.

hinge_loss

Average hinge loss (non-regularized).

jaccard_score

Jaccard similarity coefficient score.

log_loss

Log loss, aka logistic loss or cross-entropy loss.

matthews_corrcoef

Compute the Matthews correlation coefficient (MCC).

multilabel_confusion_matrix

Compute a confusion matrix for each class or sample.

ndcg_score

Compute Normalized Discounted Cumulative Gain.

precision_recall_curve

Compute precision-recall pairs for different probability thresholds.

precision_recall_fscore_support

Compute precision, recall, F-measure and support for each class.

precision_score

Compute the precision.

recall_score

Compute the recall.

roc_auc_score

Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

roc_curve

Compute Receiver operating characteristic (ROC).

top_k_accuracy_score

Top-k Accuracy classification score.

zero_one_loss

Zero-one classification loss.
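The classification metrics above all take true labels first and predictions second. A brief sketch on five made-up binary labels:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)    # 4 of 5 correct
cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted
f1 = f1_score(y_true, y_pred)           # F1 for the positive class (label 1)

print(acc, f1)
print(cm)
```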

d2_absolute_error_score

\(D^2\) regression score function, fraction of absolute error explained.

d2_pinball_score

\(D^2\) regression score function, fraction of pinball loss explained.

d2_tweedie_score

\(D^2\) regression score function, fraction of Tweedie deviance explained.

explained_variance_score

Explained variance regression score function.

max_error

The max_error metric calculates the maximum residual error.

mean_absolute_error

Mean absolute error regression loss.

mean_absolute_percentage_error

Mean absolute percentage error (MAPE) regression loss.

mean_gamma_deviance

Mean Gamma deviance regression loss.

mean_pinball_loss

Pinball loss for quantile regression.

mean_poisson_deviance

Mean Poisson deviance regression loss.

mean_squared_error

Mean squared error regression loss.

mean_squared_log_error

Mean squared logarithmic error regression loss.

mean_tweedie_deviance

Mean Tweedie deviance regression loss.

median_absolute_error

Median absolute error regression loss.

r2_score

\(R^2\) (coefficient of determination) regression score function.

root_mean_squared_error

Root mean squared error regression loss.

root_mean_squared_log_error

Root mean squared logarithmic error regression loss.
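The regression metrics follow the same `(y_true, y_pred)` convention. A sketch with four illustrative values:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = mean_squared_error(y_true, y_pred)   # mean of error squared
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(mae, mse, r2)
```

Losses such as these are minimized, while scores such as `r2_score` are maximized; `make_scorer` has a `greater_is_better` parameter for that distinction.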

coverage_error

Coverage error measure.

label_ranking_average_precision_score

Compute ranking-based average precision.

label_ranking_loss

Compute Ranking loss measure.

adjusted_mutual_info_score

Adjusted Mutual Information between two clusterings.

adjusted_rand_score

Rand index adjusted for chance.

calinski_harabasz_score

Compute the Calinski and Harabasz score.

contingency_matrix

Build a contingency matrix describing the relationship between labels.

pair_confusion_matrix

Pair confusion matrix arising from two clusterings.

completeness_score

Compute completeness metric of a cluster labeling given a ground truth.

davies_bouldin_score

Compute the Davies-Bouldin score.

fowlkes_mallows_score

Measure the similarity of two clusterings of a set of points.

homogeneity_completeness_v_measure

Compute the homogeneity and completeness and V-Measure scores at once.

homogeneity_score

Homogeneity metric of a cluster labeling given a ground truth.

mutual_info_score

Mutual Information between two clusterings.

normalized_mutual_info_score

Normalized Mutual Information between two clusterings.

rand_score

Rand index.

silhouette_samples

Compute the Silhouette Coefficient for each sample.

silhouette_score

Compute the mean Silhouette Coefficient of all samples.

v_measure_score

V-measure cluster labeling given a ground truth.

consensus_score

The similarity of two sets of biclusters.

DistanceMetric

Uniform interface for fast distance metric functions.

additive_chi2_kernel

Compute the additive chi-squared kernel between observations in X and Y.

chi2_kernel

Compute the exponential chi-squared kernel between X and Y.

cosine_distances

Compute cosine distance between samples in X and Y.

cosine_similarity

Compute cosine similarity between samples in X and Y.

distance_metrics

Valid metrics for pairwise_distances.

euclidean_distances

Compute the distance matrix between each pair from a vector array X and Y.

haversine_distances

Compute the Haversine distance between samples in X and Y.

kernel_metrics

Valid metrics for pairwise_kernels.

laplacian_kernel

Compute the laplacian kernel between X and Y.

linear_kernel

Compute the linear kernel between X and Y.

manhattan_distances

Compute the L1 distances between the vectors in X and Y.

nan_euclidean_distances

Calculate the euclidean distances in the presence of missing values.

paired_cosine_distances

Compute the paired cosine distances between X and Y.

paired_distances

Compute the paired distances between X and Y.

paired_euclidean_distances

Compute the paired euclidean distances between X and Y.

paired_manhattan_distances

Compute the paired L1 distances between X and Y.

pairwise_kernels

Compute the kernel between arrays X and optional array Y.

polynomial_kernel

Compute the polynomial kernel between X and Y.

rbf_kernel

Compute the rbf (gaussian) kernel between X and Y.

sigmoid_kernel

Compute the sigmoid kernel between X and Y.

pairwise_distances

Compute the distance matrix from a vector array X and optional Y.

pairwise_distances_argmin

Compute minimum distances between one point and a set of points.

pairwise_distances_argmin_min

Compute minimum distances between one point and a set of points.

pairwise_distances_chunked

Generate a distance matrix chunk by chunk with optional reduction.

ConfusionMatrixDisplay

Confusion Matrix visualization.

DetCurveDisplay

DET curve visualization.

PrecisionRecallDisplay

Precision Recall visualization.

PredictionErrorDisplay

Visualization of the prediction error of a regression model.

RocCurveDisplay

ROC Curve visualization.

BayesianGaussianMixture

Variational Bayesian estimation of a Gaussian mixture.

GaussianMixture

Gaussian Mixture.

GroupKFold

K-fold iterator variant with non-overlapping groups.

GroupShuffleSplit

Shuffle-Group(s)-Out cross-validation iterator.

KFold

K-Fold cross-validator.

LeaveOneGroupOut

Leave One Group Out cross-validator.

LeaveOneOut

Leave-One-Out cross-validator.

LeavePGroupsOut

Leave P Group(s) Out cross-validator.

LeavePOut

Leave-P-Out cross-validator.

PredefinedSplit

Predefined split cross-validator.

RepeatedKFold

Repeated K-Fold cross validator.

RepeatedStratifiedKFold

Repeated Stratified K-Fold cross validator.

ShuffleSplit

Random permutation cross-validator.

StratifiedGroupKFold

Stratified K-Fold iterator variant with non-overlapping groups.

StratifiedKFold

Stratified K-Fold cross-validator.

StratifiedShuffleSplit

Stratified ShuffleSplit cross-validator.

TimeSeriesSplit

Time Series cross-validator.

check_cv

Input checker utility for building a cross-validator.

train_test_split

Split arrays or matrices into random train and test subsets.
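To make the splitter interface concrete, the sketch below applies `train_test_split` and `KFold` to ten dummy samples. The sizes and the `random_state` are arbitrary.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Single random hold-out split: 30% of the samples go to the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Cross-validators yield (train_indices, test_indices) pairs.
folds = list(KFold(n_splits=5).split(X))

print(X_tr.shape, X_te.shape)
print([len(test) for _, test in folds])
```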

GridSearchCV

Exhaustive search over specified parameter values for an estimator.

HalvingGridSearchCV

Search over specified parameter values with successive halving.

HalvingRandomSearchCV

Randomized search on hyper parameters.

ParameterGrid

Grid of parameters with a discrete number of values for each.

ParameterSampler

Generator on parameters sampled from given distributions.

RandomizedSearchCV

Randomized search on hyper parameters.
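A minimal sketch of an exhaustive search with `GridSearchCV`. The estimator (`KNeighborsClassifier`), the candidate values, and the iris data are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try each candidate value with 3-fold cross-validation.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5]}, cv=3)
search.fit(X, y)

best_k = search.best_params_["n_neighbors"]
best_score = search.best_score_
print(best_k, best_score)
```

`RandomizedSearchCV` has the same interface but samples a fixed number of candidates instead of trying every combination.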

FixedThresholdClassifier

Binary classifier that manually sets the decision threshold.

TunedThresholdClassifierCV

Classifier that post-tunes the decision threshold using cross-validation.

cross_val_predict

Generate cross-validated estimates for each input data point.

cross_val_score

Evaluate a score by cross-validation.

cross_validate

Evaluate metric(s) by cross-validation and also record fit/score times.

learning_curve

Learning curve.

permutation_test_score

Evaluate the significance of a cross-validated score with permutations.

validation_curve

Validation curve.

LearningCurveDisplay

Learning Curve visualization.

ValidationCurveDisplay

Validation Curve visualization.

OneVsOneClassifier

One-vs-one multiclass strategy.

OneVsRestClassifier

One-vs-the-rest (OvR) multiclass strategy.

OutputCodeClassifier

(Error-Correcting) Output-Code multiclass strategy.

ClassifierChain

A multi-label model that arranges binary classifiers into a chain.

MultiOutputClassifier

Multi target classification.

MultiOutputRegressor

Multi target regression.

RegressorChain

A multi-label model that arranges regressions into a chain.

BernoulliNB

Naive Bayes classifier for multivariate Bernoulli models.

CategoricalNB

Naive Bayes classifier for categorical features.

ComplementNB

The Complement Naive Bayes classifier described in Rennie et al. (2003).

GaussianNB

Gaussian Naive Bayes (GaussianNB).

MultinomialNB

Naive Bayes classifier for multinomial models.

BallTree

BallTree for fast generalized N-point problems.

KDTree

KDTree for fast generalized N-point problems.

KNeighborsClassifier

Classifier implementing the k-nearest neighbors vote.

KNeighborsRegressor

Regression based on k-nearest neighbors.

KNeighborsTransformer

Transform X into a (weighted) graph of k nearest neighbors.

KernelDensity

Kernel Density Estimation.

LocalOutlierFactor

Unsupervised Outlier Detection using the Local Outlier Factor (LOF).

NearestCentroid

Nearest centroid classifier.

NearestNeighbors

Unsupervised learner for implementing neighbor searches.

NeighborhoodComponentsAnalysis

Neighborhood Components Analysis.

RadiusNeighborsClassifier

Classifier implementing a vote among neighbors within a given radius.

RadiusNeighborsRegressor

Regression based on neighbors within a fixed radius.

RadiusNeighborsTransformer

Transform X into a (weighted) graph of neighbors nearer than a radius.

kneighbors_graph

Compute the (weighted) graph of k-Neighbors for points in X.

radius_neighbors_graph

Compute the (weighted) graph of Neighbors for points in X.

sort_graph_by_row_values

Sort a sparse graph such that each row is stored with increasing values.

BernoulliRBM

Bernoulli Restricted Boltzmann Machine (RBM).

MLPClassifier

Multi-layer Perceptron classifier.

MLPRegressor

Multi-layer Perceptron regressor.

FeatureUnion

Concatenates results of multiple transformer objects.

Pipeline

A sequence of data transformers with an optional final predictor.

make_pipeline

Construct a Pipeline from the given estimators.

make_union

Construct a FeatureUnion from the given transformers.
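As a sketch of how `make_pipeline` chains a transformer and a final predictor, the example below scales the iris features and then fits a `LogisticRegression`. Both steps are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Step names are generated from the lowercased class names.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
pipe.fit(X, y)

step_names = [name for name, _ in pipe.steps]
train_acc = pipe.score(X, y)  # accuracy on the training data
print(step_names, train_acc)
```

Using `Pipeline` directly gives the same behavior with explicitly chosen step names.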

Binarizer

Binarize data (set feature values to 0 or 1) according to a threshold.

FunctionTransformer

Constructs a transformer from an arbitrary callable.

KBinsDiscretizer

Bin continuous data into intervals.

KernelCenterer

Center an arbitrary kernel matrix \(K\).

LabelBinarizer

Binarize labels in a one-vs-all fashion.

LabelEncoder

Encode target labels with value between 0 and n_classes-1.

MaxAbsScaler

Scale each feature by its maximum absolute value.

MinMaxScaler

Transform features by scaling each feature to a given range.

MultiLabelBinarizer

Transform between iterable of iterables and a multilabel format.

Normalizer

Normalize samples individually to unit norm.

OneHotEncoder

Encode categorical features as a one-hot numeric array.

OrdinalEncoder

Encode categorical features as an integer array.

PolynomialFeatures

Generate polynomial and interaction features.

PowerTransformer

Apply a power transform featurewise to make data more Gaussian-like.

QuantileTransformer

Transform features using quantiles information.

RobustScaler

Scale features using statistics that are robust to outliers.

SplineTransformer

Generate univariate B-spline bases for features.

StandardScaler

Standardize features by removing the mean and scaling to unit variance.

TargetEncoder

Target Encoder for regression and classification targets.

add_dummy_feature

Augment dataset with an additional dummy feature.

binarize

Boolean thresholding of array-like or scipy.sparse matrix.

label_binarize

Binarize labels in a one-vs-all fashion.

maxabs_scale

Scale each feature to the [-1, 1] range without breaking the sparsity.

minmax_scale

Transform features by scaling each feature to a given range.

normalize

Scale input vectors individually to unit norm (vector length).

power_transform

Parametric, monotonic transformation to make data more Gaussian-like.

quantile_transform

Transform features using quantiles information.

robust_scale

Standardize a dataset along any axis, centering to the median and scaling by the interquartile range.

scale

Standardize a dataset along any axis.

GaussianRandomProjection

Reduce dimensionality through Gaussian random projection.

SparseRandomProjection

Reduce dimensionality through sparse random projection.

johnson_lindenstrauss_min_dim

Find a ‘safe’ number of components to randomly project to.

LabelPropagation

Label Propagation classifier.

LabelSpreading

LabelSpreading model for semi-supervised learning.

SelfTrainingClassifier

Self-training classifier.

LinearSVC

Linear Support Vector Classification.

LinearSVR

Linear Support Vector Regression.

NuSVC

Nu-Support Vector Classification.

NuSVR

Nu-Support Vector Regression.

OneClassSVM

Unsupervised Outlier Detection.

SVC

C-Support Vector Classification.

SVR

Epsilon-Support Vector Regression.

l1_min_c

Return the lowest bound for C.

DecisionTreeClassifier

A decision tree classifier.

DecisionTreeRegressor

A decision tree regressor.

ExtraTreeClassifier

An extremely randomized tree classifier.

ExtraTreeRegressor

An extremely randomized tree regressor.

export_graphviz

Export a decision tree in DOT format.

export_text

Build a text report showing the rules of a decision tree.

plot_tree

Plot a decision tree.

Bunch

Container object exposing keys as attributes.

_safe_indexing

Return rows, items or columns of X using indices.

as_float_array

Convert an array-like to an array of floats.

assert_all_finite

Throw a ValueError if X contains NaN or infinity.

deprecated

Decorator to mark a function or class as deprecated.

estimator_html_repr

Build an HTML representation of an estimator.

gen_batches

Generator to create slices containing batch_size elements from 0 to n.

gen_even_slices

Generator to create n_packs evenly spaced slices going up to n.

indexable

Make arrays indexable for cross-validation.

murmurhash3_32

Compute the 32-bit murmurhash3 of key at seed.

resample

Resample arrays or sparse matrices in a consistent way.

safe_mask

Return a mask which is safe to use on X.

safe_sqr

Element-wise squaring of array-likes and sparse matrices.

shuffle

Shuffle arrays or sparse matrices in a consistent way.

Tags

Tags for the estimator.

InputTags

Tags for the input data.

TargetTags

Tags for the target data.

ClassifierTags

Tags for the classifier.

RegressorTags

Tags for the regressor.

TransformerTags

Tags for the transformer.

get_tags

Get estimator tags.

check_X_y

Input validation for standard estimators.

check_array

Input validation on an array, list, sparse matrix or similar.

check_consistent_length

Check that all arrays have consistent first dimensions.

check_random_state

Turn seed into a np.random.RandomState instance.

check_scalar

Validate scalar parameters type and value.

check_is_fitted

Perform is_fitted validation for estimator.

check_memory

Check that memory is joblib.Memory-like.

check_symmetric

Make sure that array is 2D, square and symmetric.

column_or_1d

Ravel column or 1d numpy array, else raises an error.

has_fit_parameter

Check whether the estimator’s fit method supports the given parameter.

validate_data

Validate input data and set or check feature names and counts of the input.

available_if

An attribute that is available only if check returns a truthy value.

compute_class_weight

Estimate class weights for unbalanced datasets.

compute_sample_weight

Estimate sample weights by class for unbalanced datasets.

is_multilabel

Check if y is in a multilabel format.

type_of_target

Determine the type of data indicated by the target.

unique_labels

Extract an ordered array of unique labels.

density

Compute density of a sparse vector.

fast_logdet

Compute logarithm of determinant of a square matrix.

randomized_range_finder

Compute an orthonormal matrix whose range approximates the range of A.

randomized_svd

Compute a truncated randomized SVD.

safe_sparse_dot

Dot product that handles the sparse matrix case correctly.

weighted_mode

Return an array of the weighted modal (most common) value in the passed array.

incr_mean_variance_axis

Compute incremental mean and variance along an axis on a CSR or CSC matrix.

inplace_column_scale

Inplace column scaling of a CSC/CSR matrix.

inplace_csr_column_scale

Inplace column scaling of a CSR matrix.

inplace_row_scale

Inplace row scaling of a CSR or CSC matrix.

inplace_swap_column

Swap two columns of a CSC/CSR matrix in-place.

inplace_swap_row

Swap two rows of a CSC/CSR matrix in-place.

mean_variance_axis

Compute mean and variance along an axis on a CSR or CSC matrix.

inplace_csr_row_normalize_l1

Normalize the rows of a CSR matrix or array in place by their L1 norm.

inplace_csr_row_normalize_l2

Normalize the rows of a CSR matrix or array in place by their L2 norm.

single_source_shortest_path_length

Return the length of the shortest path from source to all reachable nodes.

sample_without_replacement

Sample integers without replacement.

min_pos

Find the minimum value of an array over positive values.

MetadataRequest

Contains the metadata request info of a consumer.

MetadataRouter

Stores and handles metadata routing for a router object.

MethodMapping

Stores the mapping between caller and callee methods for a router.

get_routing_for_object

Get a Metadata{Router, Request} instance from the given object.

process_routing

Validate and route input parameters.

all_displays

Get a list of all displays from sklearn.

all_estimators

Get a list of all estimators from sklearn.

all_functions

Get a list of all functions from sklearn.

check_estimator

Check if estimator adheres to scikit-learn conventions.

parametrize_with_checks

Pytest specific decorator for parametrizing estimator checks.

estimator_checks_generator

Iteratively yield all check callables for an estimator.

Parallel

Tweak of joblib.Parallel that propagates the scikit-learn configuration.

delayed

Decorator used to capture the arguments of a function.

parallel_backend

Change the default backend used by Parallel inside a with block.

register_parallel_backend

Register a new Parallel backend factory.
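
The gen_batches entry above is described as a generator of slices containing batch_size elements from 0 to n. As a rough illustration of that idea only, here is a minimal pure-Python sketch. The helper name batch_slices is hypothetical and is not part of scikit-learn; this is not the library's implementation, and the real function may accept additional parameters not shown here.

```python
from typing import Iterator


def batch_slices(n: int, batch_size: int) -> Iterator[slice]:
    """Yield consecutive slices of at most batch_size elements covering range(n).

    Hypothetical helper for illustration; not scikit-learn code.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, n, batch_size):
        # The last slice may be shorter when n is not a multiple of batch_size.
        yield slice(start, min(start + batch_size, n))


data = list(range(7))
# Apply each slice to the data to obtain the batches themselves.
batches = [data[s] for s in batch_slices(len(data), 3)]
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Yielding slice objects rather than copies of the data keeps the generator cheap, since the caller decides when (and on which array) each slice is applied.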