sklearn.pipeline.Pipeline

class sklearn.pipeline.Pipeline(steps)

Pipeline of transforms with a final estimator.

Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit.

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. To this end, the parameters of each step can be set using the step name and the parameter name separated by a double underscore (‘__’), as in the example below.
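Because step parameters are exposed as `<step>__<parameter>`, the whole pipeline can be tuned and cross-validated as one estimator. A minimal sketch, assuming a recent scikit-learn with `GridSearchCV` in `sklearn.model_selection`; the step names `scaler` and `svc` and the grid values are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Step names are chosen freely; they become the parameter prefixes.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# "svc__C" addresses parameter C of the step named "svc".
param_grid = {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]}

# The whole pipeline (scaler included) is refit on the training part of
# each CV split, so the scaler never sees the validation fold.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```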

Read more in the User Guide.

Parameters:

steps : list

List of (name, transform) tuples, chained in the order in which they appear in the list. Every object except the last must implement fit and transform; the last object is the final estimator.
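Since every object except the last must be a transformer, an estimator without `transform` in an intermediate position is rejected. A sketch assuming a recent scikit-learn, where this check runs when the pipeline is fit (older releases may raise it already in the constructor), so both calls sit inside the `try`; the step names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)

# Valid: StandardScaler implements fit and transform; the last step only needs fit.
ok = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
ok.fit(X, y)

# Invalid: SVC has no transform method, so it cannot be an intermediate step.
try:
    bad = Pipeline([("svc", SVC()), ("clf", LogisticRegression())])
    bad.fit(X, y)
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```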

Attributes:

named_steps : dict

Read-only attribute to access any step by its user-given name. Keys are step names and values are the step objects.
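A short sketch of `named_steps`, assuming a recent scikit-learn; the step names `scaler` and `clf` are illustrative. It shows that the keys follow the pipeline order, that the values are the objects that were passed in, and that fitted attributes can be read through it after `fit`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)  # 100 samples, 20 features
scaler = StandardScaler()
pipe = Pipeline([("scaler", scaler), ("clf", LogisticRegression())])

# Values are the step objects themselves, not copies.
same_object = pipe.named_steps["scaler"] is scaler
print(same_object)  # True

pipe.fit(X, y)

# Keys are the user-given names, in pipeline order.
print(list(pipe.named_steps))  # ['scaler', 'clf']

# Fitted attributes of a step are reachable by name.
print(pipe.named_steps["scaler"].mean_.shape)  # (20,)
```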

Examples

>>> from sklearn import svm
>>> from sklearn.datasets import make_classification
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.feature_selection import f_regression
>>> from sklearn.pipeline import Pipeline
>>> # generate some data to play with
>>> X, y = make_classification(
...     n_informative=5, n_redundant=0, random_state=42)
>>> # ANOVA SVM-C
>>> anova_filter = SelectKBest(f_regression, k=5)
>>> clf = svm.SVC(kernel='linear')
>>> anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
>>> # Parameters are set using the step name and the parameter name
>>> # separated by '__'. For instance, fit using k=10 in the
>>> # SelectKBest step and C=0.1 in the SVC step
>>> anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
Pipeline(steps=[...])
>>> prediction = anova_svm.predict(X)
>>> anova_svm.score(X, y)
0.77...
>>> # get the features selected by anova_filter
>>> anova_svm.named_steps['anova'].get_support()
array([ True,  True,  True, False, False,  True, False,  True,  True, True,
       False, False,  True, False,  True, False, False, False, False,
       True], dtype=bool)

Methods

decision_function(X) Applies transforms to the data, and the decision_function method of the final estimator.
fit(X[, y]) Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.
fit_predict(X[, y]) Applies fit_predict of the last step in the pipeline after the transforms.
fit_transform(X[, y]) Fit all the transforms one after the other and transform the data, then call fit_transform of the final estimator on the transformed data.
get_params([deep]) Get parameters for this estimator.
inverse_transform(X) Applies inverse transforms to the data.
predict(X) Applies transforms to the data, and the predict method of the final estimator.
predict_log_proba(X) Applies transforms to the data, and the predict_log_proba method of the final estimator.
predict_proba(X) Applies transforms to the data, and the predict_proba method of the final estimator.
score(X[, y]) Applies transforms to the data, and the score method of the final estimator.
set_params(**params) Set the parameters of this estimator.
transform(X) Applies transforms to the data, and the transform method of the final estimator.
__init__(steps)
decision_function(X)

Applies transforms to the data, and the decision_function method of the final estimator. Valid only if the final estimator implements decision_function.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.
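To make the chaining concrete, the following sketch (assuming a recent scikit-learn; step names are illustrative) compares `Pipeline.decision_function` with the same computation done by hand on the fitted steps.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())]).fit(X, y)

# Transform X with every intermediate step, then call
# decision_function of the final estimator.
scores = pipe.decision_function(X)

# The same computation written out with the fitted steps.
scaler = pipe.named_steps["scaler"]
svc = pipe.named_steps["svc"]
manual = svc.decision_function(scaler.transform(X))

print(np.allclose(scores, manual))  # True
```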

fit(X, y=None, **fit_params)

Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.

Parameters:

X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

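Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.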
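Keyword arguments to fit are routed to individual steps by prefixing them with the step name and a double underscore. A sketch assuming a recent scikit-learn with metadata routing left disabled (the default); the weights are arbitrary and exist only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)
# Arbitrary per-sample weights, for illustration only.
weights = np.where(y == 1, 2.0, 1.0)

pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
# "clf__sample_weight" reaches the fit method of step "clf" as sample_weight.
pipe.fit(X, y, clf__sample_weight=weights)

# Equivalent manual fit: scale, then fit the classifier with the weights.
X_scaled = StandardScaler().fit_transform(X)
manual = LogisticRegression().fit(X_scaled, y, sample_weight=weights)

print(np.allclose(pipe.named_steps["clf"].coef_, manual.coef_))  # True
```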

fit_predict(X, y=None, **fit_params)

Applies fit_predict of the last step in the pipeline after the transforms.

Applies fit_transform of each intermediate step to the data, followed by the fit_predict method of the final estimator. Valid only if the final estimator implements fit_predict.

Parameters:

X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.
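A clustering model is a typical final estimator with `fit_predict`. The sketch below assumes a recent scikit-learn; `n_init=10` is set explicitly only to avoid version-dependent default warnings, and the blob data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# KMeans implements fit_predict, so the pipeline exposes it as well.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

# fit_transform on the scaler, then fit_predict on KMeans; no y needed.
labels = pipe.fit_predict(X)

print(labels.shape)  # (150,)
print(np.array_equal(labels, pipe.named_steps["kmeans"].labels_))  # True
```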

fit_transform(X, y=None, **fit_params)

Fit all the transforms one after the other and transform the data, then call fit_transform of the final estimator on the transformed data.

Parameters:

X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.
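A sketch of `fit_transform` on a pipeline that ends in a transformer, assuming a recent scikit-learn; `svd_solver="full"` is chosen only so the decomposition is deterministic. The single call gives the same output as fitting first and then transforming the same data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_classification(random_state=0)  # 100 samples, 20 features
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2, svd_solver="full")),
])

# One call fits both steps and returns the output of the last one.
Z = pipe.fit_transform(X)
print(Z.shape)  # (100, 2)

# Same result as fitting, then transforming the training data.
Z_again = pipe.fit(X).transform(X)
print(np.allclose(Z, Z_again))  # True
```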

inverse_transform(X)

Applies inverse transforms to the data. Starts with the last step of the pipeline and applies inverse_transform in reverse order of the pipeline steps. Valid only if all steps of the pipeline implement inverse_transform.

Parameters:

X : iterable

Data to inverse transform. Must fulfill output requirements of the last step of the pipeline.
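A round-trip sketch, assuming a recent scikit-learn and random illustrative data. Both steps implement `inverse_transform`, and keeping all PCA components makes that step lossless, so undoing the pipeline recovers the input.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

# PCA() keeps all 4 components here, so no information is discarded.
pipe = Pipeline([("scaler", StandardScaler()), ("pca", PCA())]).fit(X)

Z = pipe.transform(X)
# Undo PCA first, then the scaler: the steps are inverted in reverse order.
X_back = pipe.inverse_transform(Z)
print(np.allclose(X, X_back))  # True
```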

predict(X)

Applies transforms to the data, and the predict method of the final estimator. Valid only if the final estimator implements predict.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.
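The "valid only if" condition can be observed directly: in recent scikit-learn versions, methods the final estimator lacks are hidden on the pipeline, so `hasattr` reports them as missing. A sketch with illustrative step names:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)

clf_pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())]).fit(X, y)
y_pred = clf_pipe.predict(X)

# A pipeline whose last step is a transformer has no predict method.
transform_pipe = Pipeline([("scaler", StandardScaler())]).fit(X)
print(hasattr(clf_pipe, "predict"))        # True
print(hasattr(transform_pipe, "predict"))  # False
```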

predict_log_proba(X)

Applies transforms to the data, and the predict_log_proba method of the final estimator. Valid only if the final estimator implements predict_log_proba.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.
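A sketch assuming a recent scikit-learn and a `LogisticRegression` final step (an illustrative choice): the log-probabilities have one column per class, and exponentiating them gives back `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)  # binary target
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())]).fit(X, y)

log_proba = pipe.predict_log_proba(X)
proba = pipe.predict_proba(X)

# One column per class; exp(log p) recovers p.
print(log_proba.shape)                        # (100, 2)
print(np.allclose(np.exp(log_proba), proba))  # True
```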

predict_proba(X)

Applies transforms to the data, and the predict_proba method of the final estimator. Valid only if the final estimator implements predict_proba.

Parameters:

X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.
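A sketch of `predict_proba`, again assuming a `LogisticRegression` final step chosen for illustration: each row is a probability distribution over the classes of the final estimator, and for this classifier the most probable class agrees with `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())]).fit(X, y)

proba = pipe.predict_proba(X)
classes = pipe.named_steps["clf"].classes_

# Each row sums to one; columns follow the order of classes_.
print(np.allclose(proba.sum(axis=1), 1.0))  # True
print(np.array_equal(classes[proba.argmax(axis=1)], pipe.predict(X)))  # True
```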

score(X, y=None)

Applies transforms to the data, and the score method of the final estimator. Valid only if the final estimator implements score.

Parameters:

X : iterable

Data to score. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.
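For a classifier as the final step, `score` returns whatever that estimator's own `score` computes, which for scikit-learn classifiers is mean accuracy. A sketch with an illustrative `LogisticRegression` step:

```python
import math
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())]).fit(X, y)

# X is scaled first, then LogisticRegression.score (mean accuracy) is called.
score = pipe.score(X, y)
print(math.isclose(score, accuracy_score(y, pipe.predict(X))))  # True
```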

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self

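A sketch of the nested naming, assuming a recent scikit-learn; the step names and the replacement `SVC` are illustrative. It shows that nested keys appear only in the deep parameter listing, that `set_params` returns the pipeline itself, and that passing a bare step name replaces the whole step.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])

# Nested parameters are listed as <step>__<parameter> only when deep=True.
print("clf__C" in pipe.get_params())            # True
print("clf__C" in pipe.get_params(deep=False))  # False

# Update one parameter of one step; the pipeline itself is returned.
same = pipe.set_params(clf__C=10.0)
print(same is pipe, pipe.named_steps["clf"].C)  # True 10.0

# Passing a bare step name swaps out the whole step.
pipe.set_params(clf=SVC())
X, y = make_classification(random_state=0)
print(type(pipe.fit(X, y).named_steps["clf"]).__name__)  # SVC
```
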
transform(X)

Applies transforms to the data, and the transform method of the final estimator. Valid only if the final estimator implements transform.

Parameters:

X : iterable

Data to transform. Must fulfill input requirements of first step of the pipeline.
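A sketch, assuming a recent scikit-learn and illustrative step names, showing that `transform` on a fitted pipeline is the same as calling `transform` on each fitted step in order.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, _ = make_classification(random_state=0)  # 100 samples, 20 features
pipe = Pipeline([("scaler", StandardScaler()), ("pca", PCA(n_components=3))]).fit(X)

Z = pipe.transform(X)

# Apply each fitted step's transform in pipeline order.
manual = pipe.named_steps["pca"].transform(pipe.named_steps["scaler"].transform(X))
print(Z.shape, np.allclose(Z, manual))  # (100, 3) True
```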