sklearn.impute.MissingIndicator
- class sklearn.impute.MissingIndicator(*, missing_values=nan, features='missing-only', sparse='auto', error_on_new=True)
Binary indicators for missing values.
Note that this component typically should not be used in a vanilla Pipeline consisting of transformers and a classifier, but rather could be added using a FeatureUnion or ColumnTransformer.
Read more in the User Guide.
New in version 0.20.
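As a rough illustration of the note above, the sketch below places a MissingIndicator next to an imputer inside a FeatureUnion, so the downstream estimator sees both the completed values and the missingness mask. The toy array and the step names "imputed" and "indicator" are assumptions made for this example only.

```python
import numpy as np
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import FeatureUnion

# Toy data (assumed for illustration): columns 0 and 2 contain NaN.
X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# Stack the mean-imputed features and the binary mask side by side.
union = FeatureUnion([
    ("imputed", SimpleImputer(strategy="mean")),
    ("indicator", MissingIndicator()),
])
Xt = union.fit_transform(X)

# 3 imputed columns + 2 mask columns (one per feature with missing values).
print(Xt.shape)
```

The union output can then be passed to a classifier as the final pipeline step, which is one way to follow the recommendation without using MissingIndicator as a standalone pipeline stage.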
- Parameters
- missing_values : int, float, string, np.nan or None, default=np.nan
The placeholder for the missing values. All occurrences of missing_values will be flagged in the indicator mask. For pandas' dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
- features : {'missing-only', 'all'}, default='missing-only'
Whether the imputer mask should represent all or a subset of features.
If 'missing-only' (default), the imputer mask will only represent features containing missing values during fit time.
If 'all', the imputer mask will represent all features.
- sparse : bool or 'auto', default='auto'
Whether the imputer mask format should be sparse or dense.
If 'auto' (default), the imputer mask will be of same type as input.
If True, the imputer mask will be a sparse matrix.
If False, the imputer mask will be a numpy array.
- error_on_new : bool, default=True
If True, transform will raise an error when there are features with missing values in transform that have no missing values in fit. This is applicable only when features='missing-only'.
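The parameters above can be compared on a small example. This is a minimal sketch with made-up arrays (X_fit and X_new are assumptions, not data from the documentation); it shows how features, sparse and error_on_new change what transform returns or whether it raises.

```python
import numpy as np
from scipy import sparse
from sklearn.impute import MissingIndicator

# Only column 0 has a missing value at fit time.
X_fit = np.array([[np.nan, 1.0, 3.0],
                  [4.0, 0.0, 5.0]])
# At transform time, column 1 is missing instead.
X_new = np.array([[5.0, np.nan, 1.0]])

# features='all': one mask column per input feature.
mask_all = MissingIndicator(features="all").fit(X_fit).transform(X_new)

# sparse=True: the mask is returned as a sparse matrix even for dense input.
mask_sparse = MissingIndicator(sparse=True).fit_transform(X_fit)
is_sparse = sparse.issparse(mask_sparse)

# error_on_new=False: the newly missing column 1 is ignored.
lenient = MissingIndicator(error_on_new=False).fit(X_fit)
mask_lenient = lenient.transform(X_new)

# error_on_new=True (default): the newly missing column 1 raises ValueError.
strict = MissingIndicator().fit(X_fit)
try:
    strict.transform(X_new)
    raised = False
except ValueError:
    raised = True
```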
- Attributes
- features_ : ndarray, shape (n_missing_features,) or (n_features,)
The feature indices which will be returned when calling transform. They are computed during fit. For features='all', it is equal to range(n_features).
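To make the attribute concrete, the following sketch fits two indicators on the same assumed toy array and inspects features_ for each setting of features.

```python
import numpy as np
from sklearn.impute import MissingIndicator

# Toy data (assumed for illustration): columns 0 and 2 contain NaN.
X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

subset = MissingIndicator().fit(X)                 # features='missing-only'
everything = MissingIndicator(features="all").fit(X)

print(subset.features_)      # indices of columns with missing values at fit time
print(everything.features_)  # indices of every column
```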
Examples
>>> import numpy as np
>>> from sklearn.impute import MissingIndicator
>>> X1 = np.array([[np.nan, 1, 3],
...                [4, 0, np.nan],
...                [8, 1, 0]])
>>> X2 = np.array([[5, 1, np.nan],
...                [np.nan, 2, 3],
...                [2, 4, 0]])
>>> indicator = MissingIndicator()
>>> indicator.fit(X1)
MissingIndicator()
>>> X2_tr = indicator.transform(X2)
>>> X2_tr
array([[False,  True],
       [ True, False],
       [False, False]])
Methods
fit(X[, y])            Fit the transformer on X.
fit_transform(X[, y])  Generate missing values indicator for X.
get_params([deep])     Get parameters for this estimator.
set_params(**params)   Set the parameters of this estimator.
transform(X)           Generate missing values indicator for X.
- fit(X, y=None)
Fit the transformer on X.
- Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
Input data, where n_samples is the number of samples and n_features is the number of features.
- Returns
- self : object
Returns self.
- fit_transform(X, y=None)
Generate missing values indicator for X.
- Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
The input data for which to generate the missing values indicator.
- Returns
- Xt : {ndarray or sparse matrix}, shape (n_samples, n_features) or (n_samples, n_features_with_missing)
The missing indicator for input data. The data type of Xt will be boolean.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : estimator instance
Estimator instance.
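The <component>__<parameter> convention can be sketched as follows. The example uses a FeatureUnion rather than a Pipeline as the nested object, and the step names "imputed" and "indicator" are assumptions chosen for illustration.

```python
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import FeatureUnion

# Simple estimator: parameters are set by their plain names.
indicator = MissingIndicator()
indicator.set_params(features="all", error_on_new=False)

# Nested object: address a component's parameter as <component>__<parameter>.
union = FeatureUnion([
    ("imputed", SimpleImputer()),
    ("indicator", MissingIndicator()),
])
union.set_params(indicator__features="all")

# get_params exposes the same nested names.
nested_value = union.get_params()["indicator__features"]
```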
- transform(X)
Generate missing values indicator for X.
- Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
The input data for which to generate the missing values indicator.
- Returns
- Xt : {ndarray or sparse matrix}, shape (n_samples, n_features) or (n_samples, n_features_with_missing)
The missing indicator for input data. The data type of Xt will be boolean.
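The return value can be checked with a short sketch, again on an assumed toy array. It confirms the boolean dtype, compares fit_transform with fit followed by transform, and shows that with the default sparse='auto' a sparse input yields a sparse mask.

```python
import numpy as np
from scipy import sparse
from sklearn.impute import MissingIndicator

# Toy data (assumed for illustration): columns 0 and 2 contain NaN.
X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# Dense input gives a dense boolean mask.
mask = MissingIndicator().fit_transform(X)
print(mask.dtype)

# fit_transform matches fit followed by transform on the same data.
same = np.array_equal(mask, MissingIndicator().fit(X).transform(X))

# Sparse input with sparse='auto' gives a sparse mask with the same content.
X_sparse = sparse.csr_matrix(X)
mask_sparse = MissingIndicator().fit_transform(X_sparse)
print(sparse.issparse(mask_sparse))
```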