sklearn.multiclass.OutputCodeClassifier

class sklearn.multiclass.OutputCodeClassifier(estimator, code_size=1.5, random_state=None, n_jobs=None)[source]

(Error-Correcting) Output-Code multiclass strategy

Output-code based strategies consist in representing each class with a binary code (an array of 0s and 1s). At fitting time, one binary classifier per bit in the code book is fitted. At prediction time, the classifiers are used to project new points in the class space and the class closest to the points is chosen. The main advantage of these strategies is that the number of classifiers used can be controlled by the user, either for compressing the model (0 < code_size < 1) or for making the model more robust to errors (code_size > 1). See the documentation for more details.

Read more in the User Guide.

Parameters
estimatorestimator object

An estimator object implementing fit and one of decision_function or predict_proba.

code_sizefloat

Percentage of the number of classes to be used to create the code book. A number between 0 and 1 will require fewer classifiers than one-vs-the-rest. A number greater than 1 will require more classifiers than one-vs-the-rest.

random_stateint, RandomState instance or None, optional, default: None

The generator used to initialize the codebook. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

n_jobsint or None, optional (default=None)

The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes
estimators_list of int(n_classes * code_size) estimators

Estimators used for predictions.

classes_numpy array of shape [n_classes]

Array containing labels.

code_book_numpy array of shape [n_classes, code_size]

Binary array containing the code of each class.

References

R2eddaeec0849-1

“Solving multiclass learning problems via error-correcting output codes”, Dietterich T., Bakiri G., Journal of Artificial Intelligence Research 2, 1995.

R2eddaeec0849-2

“The error coding method and PICTs”, James G., Hastie T., Journal of Computational and Graphical statistics 7, 1998.

R2eddaeec0849-3

“The Elements of Statistical Learning”, Hastie T., Tibshirani R., Friedman J., page 606 (second-edition) 2008.

Examples

>>> from sklearn.multiclass import OutputCodeClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = OutputCodeClassifier(
...     estimator=RandomForestClassifier(random_state=0),
...     random_state=0).fit(X, y)
>>> clf.predict([[0, 0, 0, 0]])
array([1])

Methods

fit(self, X, y)

Fit underlying estimators.

get_params(self[, deep])

Get parameters for this estimator.

predict(self, X)

Predict multi-class targets using underlying estimators.

score(self, X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(self, \*\*params)

Set the parameters of this estimator.

__init__(self, estimator, code_size=1.5, random_state=None, n_jobs=None)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(self, X, y)[source]

Fit underlying estimators.

Parameters
X(sparse) array-like of shape (n_samples, n_features)

Data.

ynumpy array of shape [n_samples]

Multi-class targets.

Returns
self
get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
paramsmapping of string to any

Parameter names mapped to their values.

predict(self, X)[source]

Predict multi-class targets using underlying estimators.

Parameters
X(sparse) array-like of shape (n_samples, n_features)

Data.

Returns
ynumpy array of shape [n_samples]

Predicted multi-class targets.

score(self, X, y, sample_weight=None)[source]

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns
scorefloat

Mean accuracy of self.predict(X) wrt. y.

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**paramsdict

Estimator parameters.

Returns
selfobject

Estimator instance.