# Reference

This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the class and function raw specifications may not be enough to give full guidelines on their uses.

## `sklearn.base`: Base classes and utility functions

Base classes for all estimators.

### Base classes

`base.BaseEstimator` |
Base class for all estimators in scikit-learn |

`base.ClassifierMixin` |
Mixin class for all classifiers in scikit-learn. |

`base.ClusterMixin` |
Mixin class for all cluster estimators in scikit-learn. |

`base.RegressorMixin` |
Mixin class for all regression estimators in scikit-learn. |

`base.TransformerMixin` |
Mixin class for all transformers in scikit-learn. |

### Functions

`base.clone` (estimator[, safe]) |
Constructs a new estimator with the same parameters. |
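To make "a new estimator with the same parameters" concrete, here is a minimal pure-Python sketch of the idea. `ToyEstimator` and `clone_sketch` are hypothetical names introduced only for illustration; this is not the library's implementation. It assumes an estimator exposes its constructor parameters through a `get_params` method.

```python
import copy


class ToyEstimator:
    """Hypothetical estimator that exposes its constructor parameters."""

    def __init__(self, alpha=1.0, tags=None):
        self.alpha = alpha
        self.tags = tags

    def get_params(self):
        # Constructor parameters only; fitted state is deliberately excluded.
        return {"alpha": self.alpha, "tags": self.tags}

    def fit(self, values):
        # State learned from data is stored separately from the parameters.
        self.mean_ = sum(values) / len(values)
        return self


def clone_sketch(estimator):
    """Build an unfitted copy whose parameters are deep copies of the original's."""
    params = copy.deepcopy(estimator.get_params())
    return type(estimator)(**params)


fitted = ToyEstimator(alpha=0.5, tags=["a"]).fit([1.0, 2.0, 3.0])
fresh = clone_sketch(fitted)
```

In this sketch `fresh` carries the same `alpha` and `tags` values as `fitted` but has no `mean_` attribute, since nothing learned from data is copied.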

## `sklearn.cluster`: Clustering

The `sklearn.cluster` module gathers popular unsupervised clustering algorithms.

**User guide:** See the Clustering section for further details.

### Classes

### Functions

## `sklearn.cluster.bicluster`: Biclustering

Spectral biclustering algorithms.

Authors: Kemal Eren

License: BSD 3 clause

**User guide:** See the Biclustering section for further details.

### Classes

`SpectralBiclustering` ([n_clusters, method, ...]) |
Spectral biclustering (Kluger, 2003). |

`SpectralCoclustering` ([n_clusters, ...]) |
Spectral Co-Clustering algorithm (Dhillon, 2001). |

## `sklearn.covariance`: Covariance Estimators

The `sklearn.covariance` module includes methods and algorithms to robustly estimate the covariance of features given a set of points. The precision matrix, defined as the inverse of the covariance, is also estimated. Covariance estimation is closely related to the theory of Gaussian Graphical Models.

**User guide:** See the Covariance estimation section for further details.

## `sklearn.cross_validation`: Cross Validation

The `sklearn.cross_validation` module includes utilities for cross-validation and performance evaluation.

**User guide:** See the Cross-validation: evaluating estimator performance section for further details.

`cross_validation.KFold` (n[, n_folds, ...]) |
K-Folds cross validation iterator. |

`cross_validation.LeaveOneLabelOut` (labels[, ...]) |
Leave-One-Label-Out cross-validation iterator |

`cross_validation.LeaveOneOut` (n[, indices]) |
Leave-One-Out cross validation iterator. |

`cross_validation.LeavePLabelOut` (labels, p[, ...]) |
Leave-P-Label-Out cross-validation iterator |

`cross_validation.LeavePOut` (n, p[, indices]) |
Leave-P-Out cross validation iterator |

`cross_validation.PredefinedSplit` (test_fold) |
Predefined split cross validation iterator |

`cross_validation.StratifiedKFold` (y[, ...]) |
Stratified K-Folds cross validation iterator |

`cross_validation.ShuffleSplit` (n[, n_iter, ...]) |
Random permutation cross-validation iterator. |

`cross_validation.StratifiedShuffleSplit` (y[, ...]) |
Stratified ShuffleSplit cross validation iterator |

`cross_validation.train_test_split` (*arrays, ...) |
Split arrays or matrices into random train and test subsets |

`cross_validation.cross_val_score` (estimator, X) |
Evaluate a score by cross-validation |

`cross_validation.cross_val_predict` (estimator, X) |
Generate cross-validated estimates for each input data point |

`cross_validation.permutation_test_score` (...) |
Evaluate the significance of a cross-validated score with permutations |

`cross_validation.check_cv` (cv[, X, y, classifier]) |
Input checker utility for building a CV in a user friendly way. |
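The iterators listed above all yield pairs of train and test indices. As a rough illustration of what a K-fold iterator produces, the following pure-Python sketch splits `n` consecutive sample indices into folds. The function name `k_fold_indices` is hypothetical, and the sketch omits shuffling and any other options the library iterators may offer.

```python
def k_fold_indices(n, n_folds=3):
    """Yield (train, test) index lists for n samples split into consecutive folds."""
    if not 2 <= n_folds <= n:
        raise ValueError("n_folds must be between 2 and n")
    # The first n % n_folds folds get one extra sample, so sizes differ by at most 1.
    fold_sizes = [n // n_folds + (1 if i < n % n_folds else 0) for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        yield train, test
        start = stop


splits = list(k_fold_indices(7, n_folds=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Every sample lands in exactly one test fold. Setting `n_folds` equal to `n` in this sketch gives the leave-one-out scheme, where each test set holds a single sample.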

## `sklearn.datasets`: Datasets

The `sklearn.datasets` module includes utilities to load datasets, including methods to load and fetch popular reference datasets. It also features some artificial data generators.

**User guide:** See the Dataset loading utilities section for further details.

### Loaders

### Samples generator

## `sklearn.decomposition`: Matrix Decomposition

The `sklearn.decomposition` module includes matrix decomposition algorithms, including, among others, PCA, NMF and ICA. Most of the algorithms of this module can be regarded as dimensionality reduction techniques.

**User guide:** See the Decomposing signals in components (matrix factorization problems) section for further details.

## `sklearn.dummy`: Dummy estimators

**User guide:** See the Model evaluation: quantifying the quality of predictions section for further details.

`dummy.DummyClassifier` ([strategy, ...]) |
DummyClassifier is a classifier that makes predictions using simple rules. |

`dummy.DummyRegressor` ([strategy, constant, ...]) |
DummyRegressor is a regressor that makes predictions using simple rules. |

## `sklearn.ensemble`: Ensemble Methods

The `sklearn.ensemble` module includes ensemble-based methods for classification and regression.

**User guide:** See the Ensemble methods section for further details.

### Partial dependence

Partial dependence plots for tree ensembles.

`ensemble.partial_dependence.partial_dependence` (...) |
Partial dependence of `target_variables` . |

`ensemble.partial_dependence.plot_partial_dependence` (...) |
Partial dependence plots for `features` . |

## `sklearn.feature_extraction`: Feature Extraction

The `sklearn.feature_extraction` module deals with feature extraction from raw data. It currently includes methods to extract features from text and images.

**User guide:** See the Feature extraction section for further details.

### From images

The `sklearn.feature_extraction.image` submodule gathers utilities to extract features from images.

`feature_extraction.image.img_to_graph` (img[, ...]) |
Graph of the pixel-to-pixel gradient connections |

`feature_extraction.image.grid_to_graph` (n_x, n_y) |
Graph of the pixel-to-pixel connections |

`feature_extraction.image.extract_patches_2d` (...) |
Reshape a 2D image into a collection of patches |

`feature_extraction.image.reconstruct_from_patches_2d` (...) |
Reconstruct the image from all of its patches. |

`feature_extraction.image.PatchExtractor` ([...]) |
Extracts patches from a collection of images |

### From text

The `sklearn.feature_extraction.text` submodule gathers utilities to build feature vectors from text documents.

`feature_extraction.text.CountVectorizer` ([...]) |
Convert a collection of text documents to a matrix of token counts |

`feature_extraction.text.HashingVectorizer` ([...]) |
Convert a collection of text documents to a matrix of token occurrences |

`feature_extraction.text.TfidfTransformer` ([...]) |
Transform a count matrix to a normalized tf or tf-idf representation |

`feature_extraction.text.TfidfVectorizer` ([...]) |
Convert a collection of raw documents to a matrix of TF-IDF features. |

## `sklearn.feature_selection`: Feature Selection

The `sklearn.feature_selection` module implements feature selection algorithms. It currently includes univariate filter selection methods and the recursive feature elimination algorithm.

**User guide:** See the Feature selection section for further details.

## `sklearn.gaussian_process`: Gaussian Processes

The `sklearn.gaussian_process` module implements scalar Gaussian Process based predictions.

**User guide:** See the Gaussian Processes section for further details.

`gaussian_process.correlation_models.absolute_exponential` (...) |
Absolute exponential autocorrelation model. |

`gaussian_process.correlation_models.squared_exponential` (...) |
Squared exponential correlation model (Radial Basis Function). |

`gaussian_process.correlation_models.generalized_exponential` (...) |
Generalized exponential correlation model. |

`gaussian_process.correlation_models.pure_nugget` (...) |
Spatial independence correlation model (pure nugget). |

`gaussian_process.correlation_models.cubic` (...) |
Cubic correlation model: |

`gaussian_process.correlation_models.linear` (...) |
Linear correlation model: |

`gaussian_process.regression_models.constant` (x) |
Zero order polynomial (constant, p = 1) regression model. |

`gaussian_process.regression_models.linear` (x) |
First order polynomial (linear, p = n+1) regression model. |

`gaussian_process.regression_models.quadratic` (x) |
Second order polynomial (quadratic, p = n*(n-1)/2+n+1) regression model. |

## `sklearn.grid_search`: Grid Search

The `sklearn.grid_search` module includes utilities to fine-tune the parameters of an estimator.

**User guide:** See the Grid Search: Searching for estimator parameters section for further details.

`grid_search.GridSearchCV` (estimator, param_grid) |
Exhaustive search over specified parameter values for an estimator. |

`grid_search.ParameterGrid` (param_grid) |
Grid of parameters with a discrete number of values for each. |

`grid_search.ParameterSampler` (...[, random_state]) |
Generator on parameters sampled from given distributions. |

`grid_search.RandomizedSearchCV` (estimator, ...) |
Randomized search on hyper parameters. |
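A grid "with a discrete number of values for each" parameter amounts to the Cartesian product of the candidate lists. The sketch below shows that expansion in plain Python. The function name `parameter_grid` and the parameter names `C` and `kernel` are illustrative assumptions, and the iteration order of the library class is not claimed to match.

```python
from itertools import product


def parameter_grid(param_grid):
    """Yield one dict per combination of the candidate values in param_grid."""
    names = sorted(param_grid)  # a fixed key order keeps the iteration deterministic
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


grid = list(parameter_grid({"C": [1, 10], "kernel": ["linear", "rbf"]}))
for params in grid:
    print(params)
```

An exhaustive search evaluates every such combination, so its cost grows multiplicatively with the number of values per parameter. A randomized search instead draws a fixed number of candidates from the parameter distributions.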

## `sklearn.isotonic`: Isotonic regression

**User guide:** See the Isotonic regression section for further details.

`isotonic.IsotonicRegression` ([y_min, y_max, ...]) |
Isotonic regression model. |

`isotonic.isotonic_regression` (y[, ...]) |
Solve the isotonic regression model: |

`isotonic.check_increasing` (x, y) |
Determine whether y is monotonically correlated with x. |
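Isotonic regression fits a non-decreasing sequence that is as close as possible to the data in the least-squares sense. One classical way to compute it is the pool adjacent violators approach, sketched below in plain Python. The name `isotonic_fit` is hypothetical, and the sketch assumes unit weights and no bounds on the fitted values. It is an illustration of the technique, not the library routine.

```python
def isotonic_fit(y):
    """Return the non-decreasing sequence closest to y in the least-squares sense."""
    # Each block stores [sum of values, number of values]; its fitted level is the mean.
    blocks = []
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


result = isotonic_fit([1, 3, 2, 4, 3, 5])
print(result)  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

Each pair of adjacent values that violates the ordering is replaced by its mean, which is why the out-of-order pairs (3, 2) and (4, 3) collapse to 2.5 and 3.5.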

## `sklearn.kernel_approximation`: Kernel Approximation

The `sklearn.kernel_approximation` module implements several approximate kernel feature maps based on Fourier transforms.

**User guide:** See the Kernel Approximation section for further details.

`kernel_approximation.AdditiveChi2Sampler` ([...]) |
Approximate feature map for additive chi2 kernel. |

`kernel_approximation.Nystroem` ([kernel, ...]) |
Approximate a kernel map using a subset of the training data. |

`kernel_approximation.RBFSampler` ([gamma, ...]) |
Approximates feature map of an RBF kernel by Monte Carlo approximation of its Fourier transform. |

`kernel_approximation.SkewedChi2Sampler` ([...]) |
Approximates feature map of the “skewed chi-squared” kernel by Monte Carlo approximation of its Fourier transform. |
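The phrase "Monte Carlo approximation of its Fourier transform" can be illustrated with random Fourier features for the RBF kernel. The sketch below assumes the kernel form exp(-gamma * ||x - y||^2) and draws random frequencies from a Gaussian with variance 2 * gamma. All function names are hypothetical, and this is a plain-Python illustration of the general technique rather than the library's code.

```python
import math
import random


def rbf_kernel_exact(x, y, gamma):
    """exp(-gamma * ||x - y||^2) for two equal-length sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def make_feature_map(n_features, n_components, gamma, seed=0):
    """Return z(.) such that dot(z(x), z(y)) approximates the RBF kernel."""
    rng = random.Random(seed)
    # Frequencies follow the kernel's Fourier transform: N(0, 2 * gamma) per coordinate.
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_features)]
               for _ in range(n_components)]
    # Random phase offsets, uniform on [0, 2 * pi].
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_components)]
    norm = math.sqrt(2.0 / n_components)

    def feature_map(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


gamma = 0.5
z = make_feature_map(n_features=2, n_components=5000, gamma=gamma)
x, y = [0.0, 1.0], [1.0, 0.5]
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = rbf_kernel_exact(x, y, gamma)
print(approx, exact)
```

The inner product of the two feature vectors is a sample average, so its error shrinks roughly with the square root of `n_components`. A linear model trained on such features behaves approximately like a kernel method.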

## `sklearn.kernel_ridge`: Kernel Ridge Regression

Module `sklearn.kernel_ridge` implements kernel ridge regression.

**User guide:** See the Kernel ridge regression section for further details.

`kernel_ridge.KernelRidge` ([alpha, kernel, ...]) |
Kernel ridge regression. |

## `sklearn.lda`: Linear Discriminant Analysis

Linear Discriminant Analysis (LDA)

**User guide:** See the Linear and quadratic discriminant analysis section for further details.

`lda.LDA` ([solver, shrinkage, priors, ...]) |
Linear Discriminant Analysis (LDA). |

## `sklearn.learning_curve`: Learning curve evaluation

Utilities to evaluate models with respect to a variable.

`learning_curve.learning_curve` (estimator, X, y) |
Learning curve. |

`learning_curve.validation_curve` (estimator, ...) |
Validation curve. |

## `sklearn.linear_model`: Generalized Linear Models

The `sklearn.linear_model` module implements generalized linear models. It includes Ridge regression, Bayesian Regression, Lasso and Elastic Net estimators computed with Least Angle Regression and coordinate descent. It also implements Stochastic Gradient Descent related algorithms.

**User guide:** See the Generalized Linear Models section for further details.

## `sklearn.manifold`: Manifold Learning

The `sklearn.manifold` module implements data embedding techniques.

**User guide:** See the Manifold learning section for further details.

## `sklearn.metrics`: Metrics

See the Model evaluation: quantifying the quality of predictions section and the Pairwise metrics, Affinities and Kernels section of the user guide for further details.

The `sklearn.metrics` module includes score functions, performance metrics, pairwise metrics and distance computations.

### Model Selection Interface

See the section The scoring parameter: defining model evaluation rules of the user guide for further details.

### Classification metrics

See the Classification metrics section of the user guide for further details.

### Multilabel ranking metrics

See the Multilabel ranking metrics section of the user guide for further details.

### Clustering metrics

See the Clustering performance evaluation section of the user guide for further details.

The `sklearn.metrics.cluster` submodule contains evaluation metrics for cluster analysis results. There are two forms of evaluation:

- supervised, which uses ground truth class values for each sample.
- unsupervised, which does not and measures the ‘quality’ of the model itself.

### Biclustering metrics

See the Biclustering evaluation section of the user guide for further details.

### Pairwise metrics

See the Pairwise metrics, Affinities and Kernels section of the user guide for further details.

`metrics.pairwise.additive_chi2_kernel` (X[, Y]) |
Computes the additive chi-squared kernel between observations in X and Y |

`metrics.pairwise.chi2_kernel` (X[, Y, gamma]) |
Computes the exponential chi-squared kernel between X and Y. |

`metrics.pairwise.distance_metrics` () |
Valid metrics for pairwise_distances. |

`metrics.pairwise.euclidean_distances` (X[, Y, ...]) |
Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors. |

`metrics.pairwise.kernel_metrics` () |
Valid metrics for pairwise_kernels |

`metrics.pairwise.linear_kernel` (X[, Y]) |
Compute the linear kernel between X and Y. |

`metrics.pairwise.manhattan_distances` (X[, Y, ...]) |
Compute the L1 distances between the vectors in X and Y. |

`metrics.pairwise.pairwise_distances` (X[, Y, ...]) |
Compute the distance matrix from a vector array X and optional Y. |

`metrics.pairwise.pairwise_kernels` (X[, Y, ...]) |
Compute the kernel between arrays X and optional array Y. |

`metrics.pairwise.polynomial_kernel` (X[, Y, ...]) |
Compute the polynomial kernel between X and Y: |

`metrics.pairwise.rbf_kernel` (X[, Y, gamma]) |
Compute the rbf (gaussian) kernel between X and Y: |
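Several of the summaries above end where a formula would normally follow. As a hedged illustration, the sketch below evaluates the usual textbook forms of these kernels, and the L1 distance, for a single pair of vectors. The function names ending in `_pair` are hypothetical, the default values for `gamma`, `coef0` and `degree` are assumptions chosen for the example, and the library functions operate on whole arrays X and Y rather than on one pair.

```python
import math


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel_pair(x, y):
    # <x, y>
    return dot(x, y)


def polynomial_kernel_pair(x, y, degree=3, gamma=1.0, coef0=1.0):
    # (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel_pair(x, y, gamma=1.0):
    # exp(-gamma * ||x - y||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance_pair(x, y):
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))


x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel_pair(x, y))                  # 11.0
print(polynomial_kernel_pair(x, y, degree=2))    # 144.0
print(rbf_kernel_pair(x, y, gamma=0.5))          # exp(-4)
print(manhattan_distance_pair(x, y))             # 4.0
```

Applying such a function to every pair of rows from X and Y yields the kernel or distance matrix that the pairwise utilities return.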

## `sklearn.mixture`: Gaussian Mixture Models

The `sklearn.mixture` module implements mixture modeling algorithms.

**User guide:** See the Gaussian mixture models section for further details.

## `sklearn.multiclass`: Multiclass and multilabel classification

### Multiclass and multilabel classification strategies

This module implements multiclass learning algorithms:

- one-vs-the-rest / one-vs-all
- one-vs-one
- error correcting output codes

The estimators provided in this module are meta-estimators: they require a base estimator to be provided in their constructor. For example, it is possible to use these estimators to turn a binary classifier or a regressor into a multiclass classifier. It is also possible to use these estimators with multiclass estimators in the hope that their accuracy or runtime performance improves.

All classifiers in scikit-learn implement multiclass classification; you only need to use this module if you want to experiment with custom multiclass strategies.

The one-vs-the-rest meta-classifier also implements a `predict_proba` method, so long as such a method is implemented by the base classifier. This method returns probabilities of class membership in both the single label and multilabel case. Note that in the multilabel case, probabilities are the marginal probability that a given sample falls in the given class. As such, in the multilabel case these probabilities over all possible labels for a given sample *will not* sum to unity, as they do in the single label case.
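The difference between marginal and normalized probabilities can be seen with three hypothetical binary classifiers, one per label. The sketch below is plain Python. The decision scores are made-up values, the sigmoid link is an assumption, and the rescaling step only illustrates how single-label probabilities can be made to sum to one.

```python
import math


def sigmoid(score):
    """Map a real-valued decision score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def marginal_probabilities(scores):
    """Multilabel case: one independent probability per label."""
    return [sigmoid(s) for s in scores]


def normalized_probabilities(scores):
    """Single-label case: per-class probabilities rescaled to sum to one."""
    probs = marginal_probabilities(scores)
    total = sum(probs)
    return [p / total for p in probs]


scores = [2.0, 0.0, -1.0]  # hypothetical scores from three one-vs-rest classifiers
multilabel = marginal_probabilities(scores)
single_label = normalized_probabilities(scores)
print(sum(multilabel))    # greater than 1 for these scores
print(sum(single_label))  # 1 up to rounding
```

In the multilabel setting a sample may belong to several classes at once, so each entry answers a separate yes/no question and the total is unconstrained.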

**User guide:** See the Multiclass and multilabel algorithms section for further details.

`multiclass.OneVsRestClassifier` (estimator[, ...]) |
One-vs-the-rest (OvR) multiclass/multilabel strategy |

`multiclass.OneVsOneClassifier` (estimator[, ...]) |
One-vs-one multiclass strategy |

`multiclass.OutputCodeClassifier` (estimator[, ...]) |
(Error-Correcting) Output-Code multiclass strategy |

## `sklearn.naive_bayes`: Naive Bayes

The `sklearn.naive_bayes` module implements Naive Bayes algorithms. These are supervised learning methods based on applying Bayes’ theorem with strong (naive) feature independence assumptions.

**User guide:** See the Naive Bayes section for further details.

`naive_bayes.GaussianNB` |
Gaussian Naive Bayes (GaussianNB) |

`naive_bayes.MultinomialNB` ([alpha, ...]) |
Naive Bayes classifier for multinomial models |

`naive_bayes.BernoulliNB` ([alpha, binarize, ...]) |
Naive Bayes classifier for multivariate Bernoulli models. |

## `sklearn.neighbors`: Nearest Neighbors

The `sklearn.neighbors` module implements the k-nearest neighbors algorithm.

**User guide:** See the Nearest Neighbors section for further details.

## `sklearn.neural_network`: Neural network models

The `sklearn.neural_network` module includes models based on neural networks.

**User guide:** See the Neural network models (unsupervised) section for further details.

## `sklearn.calibration`: Probability Calibration

Calibration of predicted probabilities.

**User guide:** See the Probability calibration section for further details.

`calibration.CalibratedClassifierCV` ([...]) |
Probability calibration with isotonic regression or sigmoid. |

`calibration.calibration_curve` (y_true, y_prob) |
Compute true and predicted probabilities for a calibration curve. |

## `sklearn.cross_decomposition`: Cross decomposition

**User guide:** See the Cross decomposition section for further details.

## `sklearn.pipeline`: Pipeline

The `sklearn.pipeline` module implements utilities to build a composite estimator, as a chain of transforms and estimators.

`pipeline.Pipeline` (steps) |
Pipeline of transforms with a final estimator. |

`pipeline.FeatureUnion` (transformer_list[, ...]) |
Concatenates results of multiple transformer objects. |

`pipeline.make_pipeline` (*steps) |
Construct a Pipeline from the given estimators. |

`pipeline.make_union` (*transformers) |
Construct a FeatureUnion from the given transformers. |

## `sklearn.preprocessing`: Preprocessing and Normalization

The `sklearn.preprocessing` module includes scaling, centering, normalization, binarization and imputation methods.

**User guide:** See the Preprocessing data section for further details.

## `sklearn.qda`: Quadratic Discriminant Analysis

Quadratic Discriminant Analysis

**User guide:** See the Linear and quadratic discriminant analysis section for further details.

`qda.QDA` ([priors, reg_param]) |
Quadratic Discriminant Analysis (QDA) |

## `sklearn.random_projection`: Random projection

Random Projection transformers

Random Projections are a simple and computationally efficient way to reduce the dimensionality of the data by trading a controlled amount of accuracy (as additional variance) for faster processing times and smaller model sizes.

The dimensions and distribution of Random Projections matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset.

The main theoretical result behind the efficiency of random projection is the Johnson-Lindenstrauss lemma (quoting Wikipedia):

In mathematics, the Johnson-Lindenstrauss lemma is a result concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection.

**User guide:** See the Random Projection section for further details.

`random_projection.GaussianRandomProjection` ([...]) |
Reduce dimensionality through Gaussian random projection |

`random_projection.SparseRandomProjection` ([...]) |
Reduce dimensionality through sparse random projection |

`random_projection.johnson_lindenstrauss_min_dim` (...) |
Find a ‘safe’ number of components to randomly project to |
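To give a feel for what a "safe" number of components means, the sketch below evaluates a commonly cited form of the Johnson-Lindenstrauss bound, n_components >= 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3), where `eps` is the tolerated relative distortion of pairwise distances. The function name `jl_min_dim` is hypothetical, and the exact formula and the truncation to an integer are assumptions of this sketch rather than something stated in the text above.

```python
import math


def jl_min_dim(n_samples, eps):
    """Evaluate 4 * ln(n_samples) / (eps^2 / 2 - eps^3 / 3), truncated to an int."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    denominator = eps ** 2 / 2.0 - eps ** 3 / 3.0
    return int(4.0 * math.log(n_samples) / denominator)


print(jl_min_dim(1e6, eps=0.5))  # 663
print(jl_min_dim(1e6, eps=0.1))  # 11841
```

The bound depends on the number of samples and the distortion only, not on the original dimensionality. Tightening `eps` from 0.5 to 0.1 raises the required dimension by more than an order of magnitude.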

## `sklearn.semi_supervised`: Semi-Supervised Learning

The `sklearn.semi_supervised` module implements semi-supervised learning algorithms. These algorithms utilize small amounts of labeled data and large amounts of unlabeled data for classification tasks. This module includes Label Propagation.

**User guide:** See the Semi-Supervised section for further details.

## `sklearn.svm`: Support Vector Machines

The `sklearn.svm` module includes Support Vector Machine algorithms.

**User guide:** See the Support Vector Machines section for further details.

### Low-level methods

`svm.libsvm.fit` |
Train the model using libsvm (low-level method) |

`svm.libsvm.decision_function` |
Predict margin (libsvm name for this is predict_values) |

`svm.libsvm.predict` |
Predict target values of X given a model (low-level method) |

`svm.libsvm.predict_proba` |
Predict probabilities |

`svm.libsvm.cross_validation` |
Binding of the cross-validation routine (low-level routine) |

## `sklearn.tree`: Decision Trees

The `sklearn.tree` module includes decision tree-based models for classification and regression.

**User guide:** See the Decision Trees section for further details.

## `sklearn.utils`: Utilities

The `sklearn.utils` module includes various utilities.

**Developer guide:** See the Utilities for Developers page for further details.

`utils.resample` (*arrays, **options) |
Resample arrays or sparse matrices in a consistent way |

`utils.shuffle` (*arrays, **options) |
Shuffle arrays or sparse matrices in a consistent way |
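"In a consistent way" means that the same random indices are applied to every array passed in, so that samples and their labels stay aligned. The plain-Python sketch below illustrates the idea with lists. The names `shuffle_together` and `resample_together` and their keyword arguments are hypothetical, and drawing with replacement in the resampling step is an assumption of this sketch.

```python
import random


def _check_lengths(arrays):
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    return n


def shuffle_together(*arrays, seed=None):
    """Apply one random permutation to every array so rows stay aligned."""
    n = _check_lengths(arrays)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [[a[i] for i in order] for a in arrays]


def resample_together(*arrays, n_samples=None, seed=None):
    """Draw the same indices, with replacement, from every array."""
    n = _check_lengths(arrays)
    rng = random.Random(seed)
    size = n if n_samples is None else n_samples
    picks = [rng.randrange(n) for _ in range(size)]
    return [[a[i] for i in picks] for a in arrays]


X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0, 1, 2, 3]
X_shuffled, y_shuffled = shuffle_together(X, y, seed=0)
X_boot, y_boot = resample_together(X, y, n_samples=6, seed=0)
```

After either call, row `i` of the first result still corresponds to entry `i` of the second, which is the property that matters when features and targets are stored in separate arrays.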