sklearn.preprocessing.LabelBinarizer

class sklearn.preprocessing.LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)[source]

Binarize labels in a one-vs-all fashion

Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.

At learning time, this simply consists of learning one regressor or binary classifier per class. Doing so requires converting multi-class labels to binary labels (belongs or does not belong to the class). LabelBinarizer makes this process easy with the transform method.

At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
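The two steps above can be sketched as follows. This is a minimal illustration, not a full one-vs-all implementation: the per-class models are skipped, and the `scores` array is a made-up stand-in for the confidences such models would produce.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

y = np.array(["cat", "dog", "bird", "dog"])

# Learning time: one binary target column per class.
lb = LabelBinarizer()
Y = lb.fit_transform(y)  # columns follow lb.classes_ (sorted labels)

# Prediction time: hypothetical per-class confidences, one row per sample
# and one column per class, in the order of lb.classes_.
scores = np.array([
    [0.1, 0.7, 0.2],
    [0.2, 0.1, 0.9],
    [0.8, 0.3, 0.1],
    [0.3, 0.4, 0.6],
])

# For a multiclass fit, inverse_transform picks the highest-scoring column.
pred = lb.inverse_transform(scores)
print(lb.classes_)  # ['bird' 'cat' 'dog']
print(pred)         # ['cat' 'dog' 'bird' 'dog']
```

Each column of `Y` could serve as the target for one binary classifier or regressor.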

Parameters:

neg_label : int (default: 0)

Value with which negative labels must be encoded.

pos_label : int (default: 1)

Value with which positive labels must be encoded.

sparse_output : boolean (default: False)

True if the returned array from transform is desired to be in sparse CSR format.
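As a small sketch of how these parameters affect the output, the snippet below encodes the same labels once with neg_label=-1 and once with sparse_output=True. The variable names are illustrative only.

```python
from scipy import sparse
from sklearn.preprocessing import LabelBinarizer

labels = [1, 2, 6, 4, 2]

# Encode negatives as -1 instead of 0.
lb_signed = LabelBinarizer(neg_label=-1, pos_label=1)
Y_signed = lb_signed.fit_transform(labels)
print(Y_signed[0])  # [ 1 -1 -1 -1]

# Ask for a sparse result; this sketch keeps the default neg_label=0.
lb_sparse = LabelBinarizer(sparse_output=True)
Y_sparse = lb_sparse.fit_transform(labels)
print(sparse.issparse(Y_sparse))  # True
print(Y_sparse.toarray()[0])      # [1 0 0 0]
```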

Attributes:

classes_ : array of shape [n_class]

Holds the label for each class.

y_type_ : str

Represents the type of the target data as evaluated by utils.multiclass.type_of_target. Possible types are ‘continuous’, ‘continuous-multioutput’, ‘binary’, ‘multiclass’, ‘multiclass-multioutput’, ‘multilabel-sequences’, ‘multilabel-indicator’, and ‘unknown’.

multilabel_ : boolean

True if the transformer was fitted on a multilabel rather than a multiclass set of labels. The multilabel_ attribute is deprecated and will be removed in 0.18

sparse_input_ : boolean

True if the input data to transform is given as a sparse matrix, False otherwise.

indicator_matrix_ : str

‘sparse’ when the input data to transform is a multilabel-indicator and is sparse, None otherwise. The indicator_matrix_ attribute is deprecated as of version 0.16 and will be removed in 0.18
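To see how y_type_ relates to type_of_target, the following sketch calls the utility directly and then inspects the fitted attributes. It avoids the deprecated attributes and assumes only classes_ and y_type_ are available.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils.multiclass import type_of_target

# type_of_target inspects the labels and names the kind of problem.
t_multi = type_of_target([1, 2, 6, 4, 2])                        # 'multiclass'
t_binary = type_of_target(["yes", "no", "no", "yes"])            # 'binary'
t_indicator = type_of_target(np.array([[0, 1, 1], [1, 0, 0]]))   # 'multilabel-indicator'

# After fit, the same information is stored on the estimator.
lb = LabelBinarizer().fit([1, 2, 6, 4, 2])
print(lb.y_type_)   # multiclass
print(lb.classes_)  # [1 2 4 6]
```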

See also

label_binarize
function to perform the transform operation of LabelBinarizer with fixed classes.
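A short sketch of label_binarize, where the classes are supplied up front instead of being learned by fit. The column order follows the order of the classes argument.

```python
from sklearn.preprocessing import label_binarize

# Classes are fixed by the caller; no fitting step is needed.
Y_sorted = label_binarize([1, 6], classes=[1, 2, 4, 6])
print(Y_sorted)
# [[1 0 0 0]
#  [0 0 0 1]]

# The class ordering given by the caller is preserved in the columns.
Y_custom = label_binarize([1, 6], classes=[1, 6, 4, 2])
print(Y_custom)
# [[1 0 0 0]
#  [0 1 0 0]]
```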

Examples

>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])

Binary targets transform to a column vector

>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
array([[1],
       [0],
       [0],
       [1]])

Passing a 2D matrix for multilabel classification

>>> import numpy as np
>>> lb.fit(np.array([[0, 1, 1], [1, 0, 0]]))
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([0, 1, 2])
>>> lb.transform([0, 1, 2, 1])
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]])

Methods

fit(y) Fit label binarizer
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
inverse_transform(Y[, threshold]) Transform binary labels back to multi-class labels
set_params(**params) Set the parameters of this estimator.
transform(y) Transform multi-class labels to binary labels
__init__(neg_label=0, pos_label=1, sparse_output=False)[source]
fit(y)[source]

Fit label binarizer

Parameters:

y : numpy array of shape (n_samples,) or (n_samples, n_classes)

Target values. A 2-d matrix should contain only 0 and 1 and represents multilabel classification.

Returns:

self : returns an instance of self.

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:

X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

indicator_matrix_

DEPRECATED: Attribute indicator_matrix_ is deprecated and will be removed in 0.17. Use y_type_ == 'multilabel-indicator' instead

inverse_transform(Y, threshold=None)[source]

Transform binary labels back to multi-class labels

Parameters:

Y : numpy array or sparse matrix with shape [n_samples, n_classes]

Target values. All sparse matrices are converted to CSR before inverse transformation.

threshold : float or None

Threshold used in the binary and multi-label cases.

Use 0 when:
  • Y contains the output of decision_function (classifier)
Use 0.5 when:
  • Y contains the output of predict_proba

If None, the threshold is assumed to be half way between neg_label and pos_label.

Returns:

y : numpy array or CSR matrix of shape [n_samples]

Target values.

Notes

When the binary labels are fractional (probabilistic), inverse_transform chooses the class with the greatest value. Typically, this makes it possible to use the output of a linear model’s decision_function method directly as the input of inverse_transform.
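The threshold parameter can be illustrated in the binary case with the sketch below. The score arrays are invented for the example and stand in for the output of a classifier's decision_function or predict_proba.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit(["yes", "no", "no", "yes"])  # classes_ is ['no', 'yes']

# Hypothetical decision_function-style scores: the sign decides the class.
decision_scores = np.array([[-1.2], [0.3], [2.0], [-0.1]])
from_decision = lb.inverse_transform(decision_scores, threshold=0)
print(from_decision)  # ['no' 'yes' 'yes' 'no']

# Hypothetical predict_proba-style scores for the positive class.
proba_scores = np.array([[0.1], [0.6], [0.9], [0.4]])
from_proba = lb.inverse_transform(proba_scores, threshold=0.5)
print(from_proba)  # ['no' 'yes' 'yes' 'no']

# With threshold=None the midpoint of neg_label and pos_label is used,
# which is 0.5 for the default neg_label=0 and pos_label=1.
from_default = lb.inverse_transform(proba_scores)
```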

multilabel_

DEPRECATED: Attribute multilabel_ is deprecated and will be removed in 0.17. Use y_type_.startswith('multilabel') instead

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self :

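As a brief sketch, get_params and set_params can be used together to inspect and update an estimator. The pipeline and its step name "scale" are assumptions added only to show the <component>__<parameter> form; they are not part of LabelBinarizer.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer, StandardScaler

lb = LabelBinarizer()
params = lb.get_params()
print(sorted(params))  # ['neg_label', 'pos_label', 'sparse_output']

# set_params returns the estimator itself, so calls can be chained.
same_lb = lb.set_params(neg_label=-1)
print(same_lb is lb, lb.neg_label)  # True -1

# Nested object: address a step's parameter as <component>__<parameter>.
pipe = Pipeline([("scale", StandardScaler())])  # hypothetical step name
pipe.set_params(scale__with_mean=False)
print(pipe.get_params()["scale__with_mean"])  # False
```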
transform(y)[source]

Transform multi-class labels to binary labels

The output of transform is sometimes referred to as the 1-of-K coding scheme.

Parameters:

y : numpy array or sparse matrix of shape (n_samples,) or (n_samples, n_classes)

Target values. A 2-d matrix should contain only 0 and 1 and represents multilabel classification. Sparse matrix can be CSR, CSC, COO, DOK, or LIL.

Returns:

Y : numpy array or CSR matrix of shape [n_samples, n_classes]

Shape will be [n_samples, 1] for binary problems.
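The output shapes described above, and the round trip through inverse_transform, can be checked with a short sketch. The variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Multiclass: one column per class, exactly one positive entry per row.
y_multi = np.array([1, 2, 6, 4, 2])
lb_multi = LabelBinarizer().fit(y_multi)
Y_multi = lb_multi.transform(y_multi)
print(Y_multi.shape)  # (5, 4)

# Binary: a single column.
y_binary = np.array(["yes", "no", "no", "yes"])
lb_binary = LabelBinarizer().fit(y_binary)
Y_binary = lb_binary.transform(y_binary)
print(Y_binary.shape)  # (4, 1)

# inverse_transform recovers the original labels in both cases.
back_multi = lb_multi.inverse_transform(Y_multi)
back_binary = lb_binary.inverse_transform(Y_binary)
```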
