sklearn.impute.MissingIndicator

class sklearn.impute.MissingIndicator(*, missing_values=nan, features='missing-only', sparse='auto', error_on_new=True)

Binary indicators for missing values.

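Note that this component typically should not be used as a step in a plain Pipeline of transformers followed by a classifier, because its output contains only the indicator mask and not the original features. Instead, add it alongside the other features using a FeatureUnion or ColumnTransformer.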
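As a minimal sketch of the FeatureUnion approach mentioned above, the snippet below stacks imputed values and the indicator mask side by side. The toy array and the mean strategy are illustrative assumptions, not a recommendation.

```python
import numpy as np
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import make_union

# Toy data (assumed for illustration): columns 0 and 2 contain NaN.
X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# The union concatenates the imputed features with the missing-value mask,
# so a downstream estimator sees both.
union = make_union(SimpleImputer(strategy="mean"), MissingIndicator())
Xt = union.fit_transform(X)

# 3 imputed columns followed by 2 indicator columns (features 0 and 2).
print(Xt.shape)
```

The resulting union can then be used as the first step of a Pipeline in front of a classifier.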

Read more in the User Guide.

New in version 0.20.

Parameters:
missing_values : int, float, str, np.nan or None, default=np.nan

The placeholder for the missing values. All occurrences of missing_values will be flagged in the indicator mask. For pandas dataframes with nullable integer dtypes that contain missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
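A short sketch of a non-default placeholder follows. It assumes a hypothetical dataset in which the integer -1 stands for a missing entry.

```python
import numpy as np
from sklearn.impute import MissingIndicator

# Assumption: -1 is the sentinel this hypothetical dataset uses for "missing".
X = np.array([[-1, 1],
              [2, -1],
              [3, 4]])

indicator = MissingIndicator(missing_values=-1)
mask = indicator.fit_transform(X)

# mask is True wherever X equals the placeholder.
print(mask)
```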

features : {'missing-only', 'all'}, default='missing-only'

Whether the imputer mask should represent all or a subset of features.

  • If 'missing-only' (default), the imputer mask will only represent features containing missing values during fit time.

  • If 'all', the imputer mask will represent all features.
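The difference between the two options can be sketched on a toy array (assumed here) in which only the first and last columns contain missing values.

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# Only the columns that had missing values during fit are kept.
mask_missing_only = MissingIndicator(features="missing-only").fit_transform(X)

# Every input column gets an indicator column, even if it is all False.
mask_all = MissingIndicator(features="all").fit_transform(X)

print(mask_missing_only.shape, mask_all.shape)
```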

sparse : bool or 'auto', default='auto'

Whether the imputer mask format should be sparse or dense.

  • If 'auto' (default), the imputer mask will be of same type as input.

  • If True, the imputer mask will be a sparse matrix.

  • If False, the imputer mask will be a numpy array.
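The following sketch compares the three settings on a dense input. The array is an assumption for illustration, and scipy is used only to check the output type.

```python
import numpy as np
from scipy import sparse
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1.0],
              [4.0, np.nan],
              [8.0, 1.0]])

dense_mask = MissingIndicator(sparse=False).fit_transform(X)
sparse_mask = MissingIndicator(sparse=True).fit_transform(X)
# With 'auto', a dense input yields a dense mask.
auto_mask = MissingIndicator(sparse="auto").fit_transform(X)

print(type(dense_mask).__name__, sparse.issparse(sparse_mask), type(auto_mask).__name__)
```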

error_on_new : bool, default=True

If True, transform will raise an error when a feature that had no missing values during fit contains missing values at transform time. This applies only when features='missing-only'.
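A minimal sketch of both behaviours, assuming toy arrays in which the second feature is complete during fit but has a missing value at transform time:

```python
import numpy as np
from sklearn.impute import MissingIndicator

X_train = np.array([[np.nan, 1.0],
                    [4.0, 0.0]])
# Feature 1 had no missing values in X_train but has one here.
X_new = np.array([[5.0, np.nan],
                  [np.nan, 2.0]])

strict = MissingIndicator(error_on_new=True).fit(X_train)
try:
    strict.transform(X_new)
    raised = False
except ValueError:
    raised = True

# With error_on_new=False the new missing value in feature 1 is ignored,
# and only the feature selected during fit (feature 0) is reported.
lenient = MissingIndicator(error_on_new=False).fit(X_train)
mask = lenient.transform(X_new)

print(raised, mask.shape)
```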

Attributes:
features_ : ndarray of shape (n_missing_features,) or (n_features,)

The indices of the features that will be returned when calling transform. They are computed during fit. If features='all', features_ is equal to range(n_features).
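As a quick illustration on an assumed toy array, features_ can be inspected after fit to see which input columns the mask refers to:

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# Columns 0 and 2 contain NaN, so only those indices are selected.
selected = MissingIndicator().fit(X).features_

# With features='all', every column index is listed.
everything = MissingIndicator(features="all").fit(X).features_

print(selected, everything)
```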

n_features_in_ : int

Number of features seen during fit.

New in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

New in version 1.0.

See also

SimpleImputer

Univariate imputation of missing values.

IterativeImputer

Multivariate imputation of missing values.

Examples

>>> import numpy as np
>>> from sklearn.impute import MissingIndicator
>>> X1 = np.array([[np.nan, 1, 3],
...                [4, 0, np.nan],
...                [8, 1, 0]])
>>> X2 = np.array([[5, 1, np.nan],
...                [np.nan, 2, 3],
...                [2, 4, 0]])
>>> indicator = MissingIndicator()
>>> indicator.fit(X1)
MissingIndicator()
>>> X2_tr = indicator.transform(X2)
>>> X2_tr
array([[False,  True],
       [ True, False],
       [False, False]])

Methods

fit(X[, y])

Fit the transformer on X.

fit_transform(X[, y])

Generate missing values indicator for X.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Generate missing values indicator for X.

fit(X, y=None)

Fit the transformer on X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Input data, where n_samples is the number of samples and n_features is the number of features.

y : Ignored

Not used, present for API consistency by convention.

Returns:
self : object

Fitted estimator.

fit_transform(X, y=None)

Generate missing values indicator for X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data for which to generate the missing-value indicator.

y : Ignored

Not used, present for API consistency by convention.

Returns:
Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing)

The missing indicator for input data. The data type of Xt will be boolean.

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:
input_features : array-like of str or None, default=None

Input features.

  • If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].

  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:
feature_names_out : ndarray of str objects

Transformed feature names.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {"default", "pandas"}, default=None

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • None: Transform configuration is unchanged

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
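The two forms described above can be sketched as follows. The union built with make_union is an assumption for illustration; it names each step after its lowercased class name, which gives the missingindicator__ prefix.

```python
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import make_union

# Simple estimator: parameters are set directly by name.
indicator = MissingIndicator()
returned = indicator.set_params(features="all", error_on_new=False)

# Nested object: use the <component>__<parameter> form.
union = make_union(SimpleImputer(), MissingIndicator())
union.set_params(missingindicator__features="all")

print(indicator.get_params()["features"],
      union.get_params()["missingindicator__features"])
```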

transform(X)

Generate missing values indicator for X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data for which to generate the missing-value indicator.

Returns:
Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing)

The missing indicator for input data. The data type of Xt will be boolean.