sklearn.impute.MissingIndicator¶
-
class
sklearn.impute.MissingIndicator(*, missing_values=nan, features='missing-only', sparse='auto', error_on_new=True)[source]¶ Binary indicators for missing values.
Note that this component typically should not be used in a vanilla
Pipelineconsisting of transformers and a classifier, but rather could be added using aFeatureUnionorColumnTransformer.Read more in the User Guide.
New in version 0.20.
- Parameters
- missing_valuesint, float, string, np.nan or None, default=np.nan
The placeholder for the missing values. All occurrences of
missing_valueswill be imputed. For pandas’ dataframes with nullable integer dtypes with missing values,missing_valuesshould be set tonp.nan, sincepd.NAwill be converted tonp.nan.- features{‘missing-only’, ‘all’}, default=’missing-only’
Whether the imputer mask should represent all or a subset of features.
If ‘missing-only’ (default), the imputer mask will only represent features containing missing values during fit time.
If ‘all’, the imputer mask will represent all features.
- sparsebool or ‘auto’, default=’auto’
Whether the imputer mask format should be sparse or dense.
If ‘auto’ (default), the imputer mask will be of same type as input.
If True, the imputer mask will be a sparse matrix.
If False, the imputer mask will be a numpy array.
- error_on_newbool, default=True
If True, transform will raise an error when there are features with missing values in transform that have no missing values in fit. This is applicable only when
features='missing-only'.
- Attributes
- features_ndarray, shape (n_missing_features,) or (n_features,)
The features indices which will be returned when calling
transform. They are computed duringfit. Forfeatures='all', it is torange(n_features).
Examples
>>> import numpy as np >>> from sklearn.impute import MissingIndicator >>> X1 = np.array([[np.nan, 1, 3], ... [4, 0, np.nan], ... [8, 1, 0]]) >>> X2 = np.array([[5, 1, np.nan], ... [np.nan, 2, 3], ... [2, 4, 0]]) >>> indicator = MissingIndicator() >>> indicator.fit(X1) MissingIndicator() >>> X2_tr = indicator.transform(X2) >>> X2_tr array([[False, True], [ True, False], [False, False]])
Methods
fit(X[, y])Fit the transformer on X.
fit_transform(X[, y])Generate missing values indicator for X.
get_params([deep])Get parameters for this estimator.
set_params(**params)Set the parameters of this estimator.
transform(X)Generate missing values indicator for X.
-
fit(X, y=None)[source]¶ Fit the transformer on X.
- Parameters
- X{array-like, sparse matrix}, shape (n_samples, n_features)
Input data, where
n_samplesis the number of samples andn_featuresis the number of features.
- Returns
- selfobject
Returns self.
-
fit_transform(X, y=None)[source]¶ Generate missing values indicator for X.
- Parameters
- X{array-like, sparse matrix}, shape (n_samples, n_features)
The input data to complete.
- Returns
- Xt{ndarray or sparse matrix}, shape (n_samples, n_features) or (n_samples, n_features_with_missing)
The missing indicator for input data. The data type of
Xtwill be boolean.
-
get_params(deep=True)[source]¶ Get parameters for this estimator.
- Parameters
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- paramsdict
Parameter names mapped to their values.
-
set_params(**params)[source]¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline). The latter have parameters of the form<component>__<parameter>so that it’s possible to update each component of a nested object.- Parameters
- **paramsdict
Estimator parameters.
- Returns
- selfestimator instance
Estimator instance.
-
transform(X)[source]¶ Generate missing values indicator for X.
- Parameters
- X{array-like, sparse matrix}, shape (n_samples, n_features)
The input data to complete.
- Returns
- Xt{ndarray or sparse matrix}, shape (n_samples, n_features) or (n_samples, n_features_with_missing)
The missing indicator for input data. The data type of
Xtwill be boolean.