validate_data

sklearn.utils.validation.validate_data(_estimator, /, X='no_validation', y='no_validation', reset=True, validate_separately=False, skip_check_array=False, **check_params)

Validate input data and set or check feature names and counts of the input.

This helper function should be used in an estimator that requires input validation. This mutates the estimator and sets the n_features_in_ and feature_names_in_ attributes if reset=True.

Added in version 1.6.

Parameters:
_estimator : estimator instance

The estimator to validate the input for.

X : {array-like, sparse matrix, dataframe} of shape (n_samples, n_features), default='no_validation'

The input samples. If 'no_validation', no validation is performed on X. This is useful for meta-estimators that delegate input validation to their underlying estimator(s). In that case, y must be passed, and the only accepted check_params are multi_output and y_numeric.

y : array-like of shape (n_samples,), default='no_validation'

The targets.

  • If None, check_array is called on X. If the estimator’s requires_y tag is True, then an error will be raised.

  • If 'no_validation', check_array is called on X and the estimator’s requires_y tag is ignored. This is a default placeholder and is never meant to be explicitly set. In that case X must be passed.

  • Otherwise, either only y is checked with _check_y (when X is 'no_validation'), or both X and y are checked, using check_array or check_X_y depending on validate_separately.

reset : bool, default=True

Whether to reset the n_features_in_ attribute. If False, the input will be checked for consistency with data provided when reset was last True.

Note

It is recommended to set reset=True in fit and in the first call to partial_fit. All other methods that validate X should set reset=False.

validate_separately : False or tuple of dicts, default=False

Only used if y is not None. If False, call check_X_y. Else, it must be a tuple of kwargs to be used for calling check_array on X and y respectively.

estimator=self is automatically added to these dicts to generate more informative error messages in case of invalid input data.

skip_check_array : bool, default=False

If True, X and y are unchanged and only feature_names_in_ and n_features_in_ are checked. Otherwise, check_array is called on X and y.

**check_params : kwargs

Parameters passed to check_array or check_X_y. Ignored if validate_separately is not False.

estimator=self is automatically added to these params to generate more informative error messages in case of invalid input data.

Returns:
out : {ndarray, sparse matrix} or tuple of these

The validated input. A tuple is returned if both X and y are validated.