sklearn.preprocessing
.MultiLabelBinarizer¶
- class sklearn.preprocessing.MultiLabelBinarizer(*, classes=None, sparse_output=False)[source]¶
Transform between iterable of iterables and a multilabel format.
Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.
- Parameters:
- classesarray-like of shape (n_classes,), default=None
Indicates an ordering for the class labels. All entries should be unique (cannot contain duplicate classes).
- sparse_outputbool, default=False
Set to True if output binary array is desired in CSR sparse format.
- Attributes:
- classes_ndarray of shape (n_classes,)
A copy of the
classes
parameter when provided. Otherwise it corresponds to the sorted set of classes found when fitting.
See also
OneHotEncoder
Encode categorical features using a one-hot aka one-of-K scheme.
Examples
>>> from sklearn.preprocessing import MultiLabelBinarizer >>> mlb = MultiLabelBinarizer() >>> mlb.fit_transform([(1, 2), (3,)]) array([[1, 1, 0], [0, 0, 1]]) >>> mlb.classes_ array([1, 2, 3])
>>> mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}]) array([[0, 1, 1], [1, 0, 0]]) >>> list(mlb.classes_) ['comedy', 'sci-fi', 'thriller']
A common mistake is to pass in a list, which leads to the following issue:
>>> mlb = MultiLabelBinarizer() >>> mlb.fit(['sci-fi', 'thriller', 'comedy']) MultiLabelBinarizer() >>> mlb.classes_ array(['-', 'c', 'd', 'e', 'f', 'h', 'i', 'l', 'm', 'o', 'r', 's', 't', 'y'], dtype=object)
To correct this, the list of labels should be passed in as:
>>> mlb = MultiLabelBinarizer() >>> mlb.fit([['sci-fi', 'thriller', 'comedy']]) MultiLabelBinarizer() >>> mlb.classes_ array(['comedy', 'sci-fi', 'thriller'], dtype=object)
Methods
fit
(y)Fit the label sets binarizer, storing classes_.
Fit the label sets binarizer and transform the given label sets.
get_params
([deep])Get parameters for this estimator.
Transform the given indicator matrix into label sets.
set_output
(*[, transform])Set output container.
set_params
(**params)Set the parameters of this estimator.
transform
(y)Transform the given label sets.
- fit(y)[source]¶
Fit the label sets binarizer, storing classes_.
- Parameters:
- yiterable of iterables
A set of labels (any orderable and hashable object) for each sample. If the
classes
parameter is set,y
will not be iterated.
- Returns:
- selfobject
Fitted estimator.
- fit_transform(y)[source]¶
Fit the label sets binarizer and transform the given label sets.
- Parameters:
- yiterable of iterables
A set of labels (any orderable and hashable object) for each sample. If the
classes
parameter is set,y
will not be iterated.
- Returns:
- y_indicator{ndarray, sparse matrix} of shape (n_samples, n_classes)
A matrix such that
y_indicator[i, j] = 1
iffclasses_[j]
is iny[i]
, and 0 otherwise. Sparse matrix will be of CSR format.
- get_params(deep=True)[source]¶
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- inverse_transform(yt)[source]¶
Transform the given indicator matrix into label sets.
- Parameters:
- yt{ndarray, sparse matrix} of shape (n_samples, n_classes)
A matrix containing only 1s ands 0s.
- Returns:
- ylist of tuples
The set of labels for each sample such that
y[i]
consists ofclasses_[j]
for eachyt[i, j] == 1
.
- set_output(*, transform=None)[source]¶
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”}, default=None
Configure output of
transform
andfit_transform
."default"
: Default output format of a transformer"pandas"
: DataFrame outputNone
: Transform configuration is unchanged
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- transform(y)[source]¶
Transform the given label sets.
- Parameters:
- yiterable of iterables
A set of labels (any orderable and hashable object) for each sample. If the
classes
parameter is set,y
will not be iterated.
- Returns:
- y_indicatorarray or CSR matrix, shape (n_samples, n_classes)
A matrix such that
y_indicator[i, j] = 1
iffclasses_[j]
is iny[i]
, and 0 otherwise.