StackingRegressor

class sklearn.ensemble.StackingRegressor(estimators, final_estimator=None, *, cv=None, n_jobs=None, passthrough=False, verbose=0)

Stack of estimators with a final regressor.

Stacked generalization consists of stacking the outputs of the individual estimators and using a regressor to compute the final prediction. Stacking makes it possible to exploit the strength of each individual estimator by using its output as input to a final estimator.

Note that estimators_ are fitted on the full X, while final_estimator_ is trained on cross-validated predictions of the base estimators obtained with cross_val_predict.
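The fitting scheme described above can be reproduced by hand. The following is a minimal sketch rather than the library's implementation: the synthetic data set, the choice of Ridge and DecisionTreeRegressor as base estimators, LinearRegression as final estimator, and cv=5 are all illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic data and estimator choices are illustrative assumptions.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
base = [
    ("ridge", Ridge()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
]

# 1. Out-of-fold predictions of each base estimator form the training
#    features of the final estimator.
X_meta = np.column_stack(
    [cross_val_predict(clone(est), X, y, cv=5) for _, est in base]
)
final = LinearRegression().fit(X_meta, y)

# 2. Each base estimator is refit on the full training set for prediction.
fitted = [clone(est).fit(X, y) for _, est in base]
manual_pred = final.predict(np.column_stack([est.predict(X) for est in fitted]))

# The same setup expressed with StackingRegressor.
reg = StackingRegressor(estimators=base, final_estimator=LinearRegression(), cv=5)
stacked_pred = reg.fit(X, y).predict(X)

print(np.allclose(manual_pred, stacked_pred))
```

Under these assumptions (deterministic estimators and an unshuffled 5-fold split), the two prediction vectors are expected to agree up to floating-point error.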

See also

StackingClassifier: Stack of estimators with a final classifier.

References

[1] Wolpert, David H. "Stacked generalization." Neural Networks 5.2 (1992): 241-259.

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import RidgeCV
>>> from sklearn.svm import LinearSVR
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.ensemble import StackingRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> estimators = [
...     ('lr', RidgeCV()),
...     ('svr', LinearSVR(random_state=42))
... ]
>>> reg = StackingRegressor(
...     estimators=estimators,
...     final_estimator=RandomForestRegressor(n_estimators=10,
...                                           random_state=42)
... )
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=42
... )
>>> reg.fit(X_train, y_train).score(X_test, y_test)
0.3...

fit(X, y, **fit_params)

Fit the estimators.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,)
    Target values.
**fit_params : dict
    Parameters to pass to the underlying estimators.

Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:

self : object
    Returns a fitted instance.

fit_transform(X, y, **fit_params)

Fit the estimators and return the predictions for X for each estimator.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,)
    Target values.
**fit_params : dict
    Parameters to pass to the underlying estimators.

Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:

y_preds : ndarray of shape (n_samples, n_estimators)
    Prediction outputs for each estimator.

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features : array-like of str or None, default=None

Input features. The input feature names are only used when passthrough is True.

If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then names are generated: [x0, x1, ..., x(n_features_in_ - 1)].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

If passthrough is False, then only the names of estimators are used to generate the output feature names.

Returns:

feature_names_out : ndarray of str objects
    Transformed feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Added in version 1.6.

Returns:

routing : MetadataRouter
    A MetadataRouter encapsulating routing information.

get_params(deep=True)

Get the parameters of an estimator from the ensemble.

Returns the parameters given in the constructor as well as the estimators contained within the estimators parameter.

Parameters:

deep : bool, default=True
    Setting it to True gets the various estimators and the parameters of the estimators as well.

Returns:

params : dict
    Parameter and estimator names mapped to their values or parameter names mapped to their values.

property named_estimators

Dictionary to access any fitted sub-estimators by name.

Returns:

Bunch

predict(X, **predict_params)

Predict target for X.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Input vectors, where n_samples is the number of samples and n_features is the number of features.

**predict_params : dict of str -> obj

Parameters passed to the predict method called by the final_estimator. Note that this may be used to return uncertainties from some estimators with return_std or return_cov. Be aware that this only accounts for uncertainty in the final estimator.

If enable_metadata_routing=False (default): Parameters directly passed to the predict method of the final_estimator.
If enable_metadata_routing=True: Parameters safely routed to the predict method of the final_estimator. See Metadata Routing User Guide for more details.

Changed in version 1.6: **predict_params can be routed via metadata routing API.

Returns:

y_pred : ndarray of shape (n_samples,) or (n_samples, n_outputs)
    Predicted targets.
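As a sketch of forwarding **predict_params with metadata routing left at its default (disabled), the example below assumes BayesianRidge as the final estimator, because its predict method accepts return_std. The data set and base estimators are likewise illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import BayesianRidge, Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic data and estimator choices are illustrative assumptions.
X, y = make_regression(n_samples=150, n_features=4, noise=5.0, random_state=0)

reg = StackingRegressor(
    estimators=[
        ("ridge", Ridge()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ],
    final_estimator=BayesianRidge(),
).fit(X, y)

# return_std is handed to BayesianRidge.predict, which then returns a
# (mean, standard deviation) pair instead of a single array.
y_mean, y_std = reg.predict(X[:5], return_std=True)
print(y_mean.shape, y_std.shape)
```

The returned standard deviation reflects the final estimator only; any uncertainty in the base estimators' predictions is not propagated.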

score(X, y, sample_weight=None)

Return coefficient of determination on test data.

The coefficient of determination, $R^2$, is defined as $(1 - \frac{u}{v})$, where $u$ is the residual sum of squares ((y_true - y_pred) ** 2).sum() and $v$ is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an $R^2$ score of 0.0.

Parameters:

X : array-like of shape (n_samples, n_features)
    Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True values for X.
sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.

Returns:

score : float
    $R^2$ of self.predict(X) w.r.t. y.

Notes

The $R^2$ score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to stay consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
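The definition of $R^2$ given for score can be checked numerically for a single-output target. This is a minimal sketch; the data set, the train/test split, and the base estimators are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data and estimator choices are illustrative assumptions.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = StackingRegressor(
    estimators=[
        ("ridge", Ridge()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
).fit(X_train, y_train)

y_pred = reg.predict(X_test)
u = ((y_test - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_test - y_test.mean()) ** 2).sum()  # total sum of squares
manual_r2 = 1 - u / v

print(np.isclose(manual_r2, reg.score(X_test, y_test)))
```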

set_output(*, transform=None)

Set output container.

Refer to the user guide for more details and Introducing the set_output API for an example on how to use the API.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:

self : estimator instance
    Estimator instance.

set_params(**params)

Set the parameters of an estimator from the ensemble.

Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of the estimators contained in estimators.

Parameters:

**params : keyword arguments
    Specific parameters using e.g. set_params(parameter_name=new_value). In addition to setting the parameters of the estimator, the individual estimators contained in estimators can also be set, or removed by setting them to 'drop'.

Returns:

self : object
    Estimator instance.
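The two uses of set_params described above, setting a nested parameter and dropping a base estimator, might look like the following sketch. The estimator names "ridge" and "tree" and the data set are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic data and estimator choices are illustrative assumptions.
X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

reg = StackingRegressor(
    estimators=[
        ("ridge", Ridge()),
        ("tree", DecisionTreeRegressor(max_depth=2, random_state=0)),
    ]
)

# Set a parameter of a contained estimator with the <name>__<parameter> syntax.
reg.set_params(ridge__alpha=10.0)
ridge_alpha = reg.get_params()["ridge__alpha"]

# Remove a base estimator from the stack by setting it to 'drop'.
reg.set_params(tree="drop")
reg.fit(X, y)

print(ridge_alpha, len(reg.estimators_), reg.transform(X).shape)
```

After the drop, only the remaining base estimator is fitted, so transform returns a single column.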

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → StackingRegressor

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
    Metadata routing for sample_weight parameter in score.

Returns:

self : object
    The updated object.

transform(X)

Return the predictions for X for each estimator.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns:

y_preds : ndarray of shape (n_samples, n_estimators)
    Prediction outputs for each estimator.
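The shape of the transform output can be illustrated with a short sketch. The data set, the split into fitting and new samples, and the two base estimators are illustrative assumptions, and passthrough is left at its default of False.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic data and estimator choices are illustrative assumptions.
X, y = make_regression(n_samples=120, n_features=4, noise=5.0, random_state=0)
X_fit, y_fit, X_new = X[:100], y[:100], X[100:]

reg = StackingRegressor(
    estimators=[
        ("ridge", Ridge()),
        ("tree", DecisionTreeRegressor(max_depth=2, random_state=0)),
    ]
).fit(X_fit, y_fit)

# One column per base estimator, one row per sample.
y_preds = reg.transform(X_new)

# Each column holds the predictions of the corresponding fitted estimator.
first_column_matches = np.allclose(y_preds[:, 0], reg.estimators_[0].predict(X_new))
print(y_preds.shape, first_column_matches)
```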

Gallery examples

Combine predictors using stacking