Reference

This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the class and function raw specifications may not be enough to give full guidelines on their uses.

sklearn.base: Base classes and utility functions

Base classes for all estimators.

Base classes

base.BaseEstimator Base class for all estimators in scikit-learn
base.ClassifierMixin Mixin class for all classifiers in scikit-learn.
base.ClusterMixin Mixin class for all cluster estimators in scikit-learn.
base.RegressorMixin Mixin class for all regression estimators in scikit-learn.
base.TransformerMixin Mixin class for all transformers in scikit-learn.
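As a minimal sketch of how these base classes combine, the example below defines a small custom transformer. The class name MeanCenterer and its with_mean parameter are hypothetical and not part of scikit-learn; the sketch only assumes that BaseEstimator supplies get_params and that TransformerMixin supplies fit_transform.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that subtracts the per-column training mean."""

    def __init__(self, with_mean=True):
        # Constructor arguments are stored unchanged so get_params can find them.
        self.with_mean = with_mean

    def fit(self, X, y=None):
        # Learn the column means from the training data.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return X - self.mean_ if self.with_mean else X


centerer = MeanCenterer()
# fit_transform is inherited from TransformerMixin.
centered = centerer.fit_transform([[1.0, 2.0], [3.0, 6.0]])
# get_params is inherited from BaseEstimator.
params = centerer.get_params()
```

Listing the mixin before BaseEstimator is a design choice in this sketch so that the mixin's behaviour takes precedence in the method resolution order.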

Functions

base.clone(estimator[, safe]) Constructs a new estimator with the same parameters.
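A short sketch of what clone does in practice: the copy carries the same constructor parameters but none of the fitted state. The toy data and the choice of LogisticRegression are assumptions made only for illustration.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

# Fit an estimator on a tiny, made-up dataset.
fitted = LogisticRegression(C=0.5).fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

# clone builds a new, unfitted estimator with identical parameters.
fresh = clone(fitted)
same_params = fresh.get_params() == fitted.get_params()
is_unfitted = not hasattr(fresh, "coef_")
```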

sklearn.cluster: Clustering

The sklearn.cluster module gathers popular unsupervised clustering algorithms.

User guide: See the Clustering section for further details.

Classes

Functions

sklearn.cluster.bicluster: Biclustering

Spectral biclustering algorithms.

Authors: Kemal Eren. License: BSD 3 clause.

User guide: See the Biclustering section for further details.

Classes

SpectralBiclustering([n_clusters, method, ...]) Spectral biclustering (Kluger, 2003).
SpectralCoclustering([n_clusters, ...]) Spectral Co-Clustering algorithm (Dhillon, 2001).

sklearn.covariance: Covariance Estimators

The sklearn.covariance module includes methods and algorithms to robustly estimate the covariance of features given a set of points. The precision matrix defined as the inverse of the covariance is also estimated. Covariance estimation is closely related to the theory of Gaussian Graphical Models.

User guide: See the Covariance estimation section for further details.

sklearn.cross_validation: Cross Validation

The sklearn.cross_validation module includes utilities for cross-validation and performance evaluation.

User guide: See the Cross-validation: evaluating estimator performance section for further details.

cross_validation.KFold(n[, n_folds, ...]) K-Folds cross validation iterator.
cross_validation.LeaveOneLabelOut(labels[, ...]) Leave-One-Label-Out cross-validation iterator
cross_validation.LeaveOneOut(n[, indices]) Leave-One-Out cross validation iterator.
cross_validation.LeavePLabelOut(labels, p[, ...]) Leave-P-Label-Out cross-validation iterator
cross_validation.LeavePOut(n, p[, indices]) Leave-P-Out cross validation iterator
cross_validation.PredefinedSplit(test_fold) Predefined split cross validation iterator
cross_validation.StratifiedKFold(y[, ...]) Stratified K-Folds cross validation iterator
cross_validation.ShuffleSplit(n[, n_iter, ...]) Random permutation cross-validation iterator.
cross_validation.StratifiedShuffleSplit(y[, ...]) Stratified ShuffleSplit cross validation iterator
cross_validation.train_test_split(*arrays, ...) Split arrays or matrices into random train and test subsets
cross_validation.cross_val_score(estimator, X) Evaluate a score by cross-validation
cross_validation.cross_val_predict(estimator, X) Generate cross-validated estimates for each input data point
cross_validation.permutation_test_score(...) Evaluate the significance of a cross-validated score with permutations
cross_validation.check_cv(cv[, X, y, classifier]) Input checker utility for building a CV in a user friendly way.
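The following sketch shows a typical call to cross_val_score. It assumes the iris dataset and a GaussianNB classifier purely for illustration. The fallback import is also an assumption about the installed version: later scikit-learn releases provide cross_val_score in sklearn.model_selection instead of sklearn.cross_validation.

```python
try:
    # Module documented in this reference.
    from sklearn.cross_validation import cross_val_score
except ImportError:
    # Assumed fallback for releases where the module has moved.
    from sklearn.model_selection import cross_val_score

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# One accuracy score per fold, using 5 folds.
scores = cross_val_score(GaussianNB(), iris.data, iris.target, cv=5)
mean_accuracy = scores.mean()
```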

sklearn.datasets: Datasets

The sklearn.datasets module includes utilities to load datasets, including methods to load and fetch popular reference datasets. It also features some artificial data generators.

User guide: See the Dataset loading utilities section for further details.

Loaders

Samples generator

sklearn.decomposition: Matrix Decomposition

The sklearn.decomposition module includes matrix decomposition algorithms, including among others PCA, NMF or ICA. Most of the algorithms of this module can be regarded as dimensionality reduction techniques.

User guide: See the Decomposing signals in components (matrix factorization problems) section for further details.

sklearn.dummy: Dummy estimators

User guide: See the Model evaluation: quantifying the quality of predictions section for further details.

dummy.DummyClassifier([strategy, ...]) DummyClassifier is a classifier that makes predictions using simple rules.
dummy.DummyRegressor([strategy, constant, ...]) DummyRegressor is a regressor that makes predictions using simple rules.
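A dummy estimator is mostly useful as a baseline to compare real models against. The sketch below uses made-up data and the "most_frequent" strategy, which always predicts the majority class seen during fit.

```python
from sklearn.dummy import DummyClassifier

# Toy data: three samples of class 1, two of class 0.
X = [[0], [1], [2], [3], [4]]
y = [1, 1, 1, 0, 0]

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
predictions = baseline.predict(X)   # the features are ignored
accuracy = baseline.score(X, y)     # fraction of samples in the majority class
```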

sklearn.ensemble: Ensemble Methods

The sklearn.ensemble module includes ensemble-based methods for classification and regression.

User guide: See the Ensemble methods section for further details.

partial dependence

Partial dependence plots for tree ensembles.

ensemble.partial_dependence.partial_dependence(...) Partial dependence of target_variables.
ensemble.partial_dependence.plot_partial_dependence(...) Partial dependence plots for features.

sklearn.feature_extraction: Feature Extraction

The sklearn.feature_extraction module deals with feature extraction from raw data. It currently includes methods to extract features from text and images.

User guide: See the Feature extraction section for further details.

From images

The sklearn.feature_extraction.image submodule gathers utilities to extract features from images.

feature_extraction.image.img_to_graph(img[, ...]) Graph of the pixel-to-pixel gradient connections
feature_extraction.image.grid_to_graph(n_x, n_y) Graph of the pixel-to-pixel connections
feature_extraction.image.extract_patches_2d(...) Reshape a 2D image into a collection of patches
feature_extraction.image.reconstruct_from_patches_2d(...) Reconstruct the image from all of its patches.
feature_extraction.image.PatchExtractor([...]) Extracts patches from a collection of images

From text

The sklearn.feature_extraction.text submodule gathers utilities to build feature vectors from text documents.

feature_extraction.text.CountVectorizer([...]) Convert a collection of text documents to a matrix of token counts
feature_extraction.text.HashingVectorizer([...]) Convert a collection of text documents to a matrix of token occurrences
feature_extraction.text.TfidfTransformer([...]) Transform a count matrix to a normalized tf or tf-idf representation
feature_extraction.text.TfidfVectorizer([...]) Convert a collection of raw documents to a matrix of TF-IDF features.
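As an illustration of turning text into token counts, the sketch below runs CountVectorizer with its default settings on two made-up documents. The learned vocabulary_ attribute maps each token to its column index.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the cat sat on the mat"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)   # sparse matrix, one row per document
vocab = vectorizer.vocabulary_            # token -> column index

# Number of times "the" appears in the second document.
the_count = counts[1, vocab["the"]]
```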

sklearn.feature_selection: Feature Selection

The sklearn.feature_selection module implements feature selection algorithms. It currently includes univariate filter selection methods and the recursive feature elimination algorithm.

User guide: See the Feature selection section for further details.

sklearn.gaussian_process: Gaussian Processes

The sklearn.gaussian_process module implements scalar Gaussian Process based predictions.

User guide: See the Gaussian Processes section for further details.

gaussian_process.correlation_models.absolute_exponential(...) Absolute exponential autocorrelation model.
gaussian_process.correlation_models.squared_exponential(...) Squared exponential correlation model (Radial Basis Function).
gaussian_process.correlation_models.generalized_exponential(...) Generalized exponential correlation model.
gaussian_process.correlation_models.pure_nugget(...) Spatial independence correlation model (pure nugget).
gaussian_process.correlation_models.cubic(...) Cubic correlation model:
gaussian_process.correlation_models.linear(...) Linear correlation model:
gaussian_process.regression_models.constant(x) Zero order polynomial (constant, p = 1) regression model.
gaussian_process.regression_models.linear(x) First order polynomial (linear, p = n+1) regression model.
gaussian_process.regression_models.quadratic(x) Second order polynomial (quadratic, p = n*(n-1)/2+n+1) regression model.

sklearn.isotonic: Isotonic regression

User guide: See the Isotonic regression section for further details.

isotonic.IsotonicRegression([y_min, y_max, ...]) Isotonic regression model.
isotonic.isotonic_regression(y[, ...]) Solve the isotonic regression model:
isotonic.check_increasing(x, y) Determine whether y is monotonically correlated with x.
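A small sketch of IsotonicRegression on made-up data: the fitted values are the closest non-decreasing sequence to y in the least-squares sense, so the out-of-order pair (3, 2) is replaced by its average.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 4.0])   # not monotone: 3 is followed by 2

# Fit with the default increasing constraint and evaluate at the training points.
fitted = IsotonicRegression().fit(x, y).predict(x)
```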

sklearn.kernel_approximation: Kernel Approximation

The sklearn.kernel_approximation module implements several approximate kernel feature maps based on Fourier transforms.

User guide: See the Kernel Approximation section for further details.

kernel_approximation.AdditiveChi2Sampler([...]) Approximate feature map for additive chi2 kernel.
kernel_approximation.Nystroem([kernel, ...]) Approximate a kernel map using a subset of the training data.
kernel_approximation.RBFSampler([gamma, ...]) Approximates feature map of an RBF kernel by Monte Carlo approximation of its Fourier transform.
kernel_approximation.SkewedChi2Sampler([...]) Approximates feature map of the “skewed chi-squared” kernel by Monte Carlo approximation of its Fourier transform.
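To make the idea of an approximate feature map concrete, the sketch below compares inner products of RBFSampler features with the exact RBF kernel. The random data, gamma value and number of components are arbitrary assumptions; the approximation error depends on them.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))

# Map the data to an explicit, randomized feature space.
sampler = RBFSampler(gamma=0.5, n_components=2000, random_state=0)
Z = sampler.fit_transform(X)

# Inner products of the mapped features approximate the exact kernel values.
approx_kernel = Z.dot(Z.T)
exact_kernel = rbf_kernel(X, gamma=0.5)
max_error = np.abs(approx_kernel - exact_kernel).max()
```

Increasing n_components generally reduces max_error at the cost of a wider feature matrix.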

sklearn.kernel_ridge: Kernel Ridge Regression

Module sklearn.kernel_ridge implements kernel ridge regression.

User guide: See the Kernel ridge regression section for further details.

kernel_ridge.KernelRidge([alpha, kernel, ...]) Kernel ridge regression.

sklearn.lda: Linear Discriminant Analysis

Linear Discriminant Analysis (LDA)

User guide: See the Linear and quadratic discriminant analysis section for further details.

lda.LDA([solver, shrinkage, priors, ...]) Linear Discriminant Analysis (LDA).

sklearn.learning_curve: Learning curve evaluation

Utilities to evaluate models with respect to a variable

learning_curve.learning_curve(estimator, X, y) Learning curve.
learning_curve.validation_curve(estimator, ...) Validation curve.

sklearn.linear_model: Generalized Linear Models

The sklearn.linear_model module implements generalized linear models. It includes Ridge regression, Bayesian Regression, Lasso and Elastic Net estimators computed with Least Angle Regression and coordinate descent. It also implements Stochastic Gradient Descent related algorithms.

User guide: See the Generalized Linear Models section for further details.

sklearn.manifold: Manifold Learning

The sklearn.manifold module implements data embedding techniques.

User guide: See the Manifold learning section for further details.

sklearn.metrics: Metrics

See the Model evaluation: quantifying the quality of predictions section and the Pairwise metrics, Affinities and Kernels section of the user guide for further details.

The sklearn.metrics module includes score functions, performance metrics and pairwise metrics and distance computations.

Model Selection Interface

See the section The scoring parameter: defining model evaluation rules of the user guide for further details.

Classification metrics

See the Classification metrics section of the user guide for further details.

Regression metrics

See the Regression metrics section of the user guide for further details.

Multilabel ranking metrics

See the Multilabel ranking metrics section of the user guide for further details.

Clustering metrics

See the Clustering performance evaluation section of the user guide for further details.

The sklearn.metrics.cluster submodule contains evaluation metrics for cluster analysis results. There are two forms of evaluation:

  • supervised, which uses ground truth class values for each sample.
  • unsupervised, which does not and measures the ‘quality’ of the model itself.

Biclustering metrics

See the Biclustering evaluation section of the user guide for further details.

Pairwise metrics

See the Pairwise metrics, Affinities and Kernels section of the user guide for further details.

metrics.pairwise.additive_chi2_kernel(X[, Y]) Computes the additive chi-squared kernel between observations in X and Y
metrics.pairwise.chi2_kernel(X[, Y, gamma]) Computes the exponential chi-squared kernel between X and Y.
metrics.pairwise.distance_metrics() Valid metrics for pairwise_distances.
metrics.pairwise.euclidean_distances(X[, Y, ...]) Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors.
metrics.pairwise.kernel_metrics() Valid metrics for pairwise_kernels
metrics.pairwise.linear_kernel(X[, Y]) Compute the linear kernel between X and Y.
metrics.pairwise.manhattan_distances(X[, Y, ...]) Compute the L1 distances between the vectors in X and Y.
metrics.pairwise.pairwise_distances(X[, Y, ...]) Compute the distance matrix from a vector array X and optional Y.
metrics.pairwise.pairwise_kernels(X[, Y, ...]) Compute the kernel between arrays X and optional array Y.
metrics.pairwise.polynomial_kernel(X[, Y, ...]) Compute the polynomial kernel between X and Y:
metrics.pairwise.rbf_kernel(X[, Y, gamma]) Compute the rbf (gaussian) kernel between X and Y:
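The sketch below relates a few of these functions on two made-up points: the RBF kernel with a given gamma equals exp(-gamma * d^2), where d is the Euclidean distance, and pairwise_distances selects a metric by name.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    euclidean_distances,
    pairwise_distances,
    rbf_kernel,
)

X = np.array([[0.0, 0.0], [3.0, 4.0]])

# Euclidean distance between the two points is 5 (a 3-4-5 triangle).
D = euclidean_distances(X)

# L1 distance between the same points is 3 + 4 = 7.
D_l1 = pairwise_distances(X, metric="manhattan")

# The RBF kernel can be rebuilt by hand from the Euclidean distances.
gamma = 0.1
K = rbf_kernel(X, gamma=gamma)
K_manual = np.exp(-gamma * D ** 2)
```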

sklearn.mixture: Gaussian Mixture Models

The sklearn.mixture module implements mixture modeling algorithms.

User guide: See the Gaussian mixture models section for further details.

sklearn.multiclass: Multiclass and multilabel classification

Multiclass and multilabel classification strategies

This module implements multiclass learning algorithms:
  • one-vs-the-rest / one-vs-all
  • one-vs-one
  • error correcting output codes

The estimators provided in this module are meta-estimators: they require a base estimator to be provided in their constructor. For example, it is possible to use these estimators to turn a binary classifier or a regressor into a multiclass classifier. It is also possible to use these estimators with multiclass estimators in the hope that their accuracy or runtime performance improves.

All classifiers in scikit-learn implement multiclass classification; you only need to use this module if you want to experiment with custom multiclass strategies.

The one-vs-the-rest meta-classifier also implements a predict_proba method, so long as such a method is implemented by the base classifier. This method returns probabilities of class membership in both the single label and multilabel case. Note that in the multilabel case, probabilities are the marginal probability that a given sample falls in the given class. As such, in the multilabel case these probabilities over all possible labels for a given sample will not necessarily sum to unity, as they do in the single label case.

User guide: See the Multiclass and multilabel algorithms section for further details.

multiclass.OneVsRestClassifier(estimator[, ...]) One-vs-the-rest (OvR) multiclass/multilabel strategy
multiclass.OneVsOneClassifier(estimator[, ...]) One-vs-one multiclass strategy
multiclass.OutputCodeClassifier(estimator[, ...]) (Error-Correcting) Output-Code multiclass strategy
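A brief sketch of the one-vs-the-rest strategy in the single label case, assuming the iris dataset and a LogisticRegression base estimator for illustration. One binary model is fitted per class, and predict_proba is available because the base classifier implements it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()

# The base estimator is passed to the meta-estimator's constructor.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(iris.data, iris.target)

n_binary_models = len(ovr.estimators_)   # one binary classifier per class
proba = ovr.predict_proba(iris.data[:5])
row_sums = proba.sum(axis=1)             # single label case: rows sum to one
```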

sklearn.naive_bayes: Naive Bayes

The sklearn.naive_bayes module implements Naive Bayes algorithms. These are supervised learning methods based on applying Bayes’ theorem with strong (naive) feature independence assumptions.

User guide: See the Naive Bayes section for further details.

naive_bayes.GaussianNB Gaussian Naive Bayes (GaussianNB)
naive_bayes.MultinomialNB([alpha, ...]) Naive Bayes classifier for multinomial models
naive_bayes.BernoulliNB([alpha, binarize, ...]) Naive Bayes classifier for multivariate Bernoulli models.

sklearn.neighbors: Nearest Neighbors

The sklearn.neighbors module implements the k-nearest neighbors algorithm.

User guide: See the Nearest Neighbors section for further details.

sklearn.neural_network: Neural network models

The sklearn.neural_network module includes models based on neural networks.

User guide: See the Neural network models (unsupervised) section for further details.

sklearn.calibration: Probability Calibration

Calibration of predicted probabilities.

User guide: See the Probability calibration section for further details.

calibration.CalibratedClassifierCV([...]) Probability calibration with isotonic regression or sigmoid.
calibration.calibration_curve(y_true, y_prob) Compute true and predicted probabilities for a calibration curve.
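As a worked sketch of calibration_curve, the made-up scores below are split into two equal-width bins. For each bin the function reports the observed fraction of positives and the mean predicted probability; a well calibrated model would have the two close to each other.

```python
import numpy as np
from sklearn.calibration import calibration_curve

y_true = np.array([0, 0, 1, 0, 1, 1])
y_prob = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])

# Bin 1 holds probabilities below 0.5, bin 2 holds the rest.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=2)
```

Here the first bin contains one positive out of three samples and the second contains two out of three.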

sklearn.cross_decomposition: Cross decomposition

User guide: See the Cross decomposition section for further details.

sklearn.pipeline: Pipeline

The sklearn.pipeline module implements utilities to build a composite estimator, as a chain of transforms and estimators.

pipeline.Pipeline(steps) Pipeline of transforms with a final estimator.
pipeline.FeatureUnion(transformer_list[, ...]) Concatenates results of multiple transformer objects.
pipeline.make_pipeline(*steps) Construct a Pipeline from the given estimators.
pipeline.make_union(*transformers) Construct a FeatureUnion from the given transformers.
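The sketch below chains a scaler and a classifier with make_pipeline, which names each step after its lowercased class name. The iris dataset and the two chosen estimators are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Transforms come first; the final step is the estimator.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
step_names = [name for name, _ in pipe.steps]

# fit scales the data, then trains the classifier on the scaled data.
pipe.fit(iris.data, iris.target)
train_accuracy = pipe.score(iris.data, iris.target)
```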

sklearn.preprocessing: Preprocessing and Normalization

The sklearn.preprocessing module includes scaling, centering, normalization, binarization and imputation methods.

User guide: See the Preprocessing data section for further details.

sklearn.qda: Quadratic Discriminant Analysis

Quadratic Discriminant Analysis

User guide: See the Linear and quadratic discriminant analysis section for further details.

qda.QDA([priors, reg_param]) Quadratic Discriminant Analysis (QDA)

sklearn.random_projection: Random projection

Random Projection transformers

Random Projections are a simple and computationally efficient way to reduce the dimensionality of the data by trading a controlled amount of accuracy (as additional variance) for faster processing times and smaller model sizes.

The dimensions and distribution of Random Projections matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset.

The main theoretical result behind the efficiency of random projection is the Johnson-Lindenstrauss lemma (quoting Wikipedia):

In mathematics, the Johnson-Lindenstrauss lemma is a result concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection.

User guide: See the Random Projection section for further details.

random_projection.GaussianRandomProjection([...]) Reduce dimensionality through Gaussian random projection
random_projection.SparseRandomProjection([...]) Reduce dimensionality through sparse random projection
random_projection.johnson_lindenstrauss_min_dim(...) Find a ‘safe’ number of components to randomly project to
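To give a feel for the bound behind the "safe" number of components, the sketch below compares johnson_lindenstrauss_min_dim with a hand-computed value. The formula 4 log(n_samples) / (eps^2 / 2 - eps^3 / 3) is assumed here to be the one the function applies; the sample count and eps are arbitrary.

```python
import numpy as np
from sklearn.random_projection import johnson_lindenstrauss_min_dim

n_samples, eps = 1000000, 0.5

# Minimum dimension reported by the library for a distortion of at most eps.
library_bound = johnson_lindenstrauss_min_dim(n_samples, eps=eps)

# The same bound computed by hand under the assumed formula.
manual_bound = int(4 * np.log(n_samples) / (eps ** 2 / 2 - eps ** 3 / 3))
```

Note that the bound depends only on the number of samples and the tolerated distortion, not on the original number of features.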

sklearn.semi_supervised: Semi-Supervised Learning

The sklearn.semi_supervised module implements semi-supervised learning algorithms. These algorithms utilize small amounts of labeled data and large amounts of unlabeled data for classification tasks. This module includes Label Propagation.

User guide: See the Semi-Supervised section for further details.

sklearn.svm: Support Vector Machines

The sklearn.svm module includes Support Vector Machine algorithms.

User guide: See the Support Vector Machines section for further details.

Estimators

Low-level methods

svm.libsvm.fit Train the model using libsvm (low-level method)
svm.libsvm.decision_function Predict margin (libsvm name for this is predict_values)
svm.libsvm.predict Predict target values of X given a model (low-level method)
svm.libsvm.predict_proba Predict probabilities
svm.libsvm.cross_validation Binding of the cross-validation routine (low-level routine)

sklearn.tree: Decision Trees

The sklearn.tree module includes decision tree-based models for classification and regression.

User guide: See the Decision Trees section for further details.

sklearn.utils: Utilities

The sklearn.utils module includes various utilities.

Developer guide: See the Utilities for Developers page for further details.

utils.resample(*arrays, **options) Resample arrays or sparse matrices in a consistent way
utils.shuffle(*arrays, **options) Shuffle arrays or sparse matrices in a consistent way
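"In a consistent way" means that the same row permutation or selection is applied to every array passed in. The sketch below checks this on made-up data in which each label equals the first feature of its row.

```python
import numpy as np
from sklearn.utils import resample, shuffle

X = np.arange(10).reshape(5, 2)    # rows [0, 1], [2, 3], ..., [8, 9]
y = np.array([0, 2, 4, 6, 8])      # y[i] equals X[i, 0]

# Both arrays are permuted with the same random order.
X_shuffled, y_shuffled = shuffle(X, y, random_state=0)

# Draw 3 rows with replacement, again keeping X and y aligned.
X_boot, y_boot = resample(X, y, n_samples=3, random_state=0)
```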