sklearn.feature_selection.chi2(X, y)

Compute chi-squared statistics between each non-negative feature and the class.

This score can be used to select the n_features features of X with the highest values of the chi-squared test statistic, relative to the classes. X must contain only non-negative features such as booleans or frequencies (e.g., term counts in document classification).

Recall that the chi-squared test measures dependence between stochastic variables, so using this function “weeds out” the features that are the most likely to be independent of class and therefore irrelevant for classification.
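The selection step described above can be sketched as follows. This is a minimal illustration rather than an official example: it assumes chi2 is passed as the score function to SelectKBest from the same module, and the toy term-count matrix, the labels, and k=2 are made up for the purpose.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical term-count matrix: 6 documents, 4 terms, all counts non-negative.
X = np.array([
    [3, 0, 1, 2],
    [4, 0, 0, 2],
    [5, 1, 1, 2],
    [0, 3, 1, 2],
    [0, 4, 0, 2],
    [1, 5, 1, 2],
])
# Hypothetical class labels: first three documents are class 0, the rest class 1.
y = np.array([0, 0, 0, 1, 1, 1])

# Keep the k features with the highest chi-squared statistic.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

# Column indices of the features that were kept.
selected = selector.get_support(indices=True)
print(X_new.shape, selected)
```

In this toy data the first two terms are concentrated in one class each, while the last two are spread evenly across classes, so the first two columns are the ones retained.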

Parameters

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Sample vectors.

y : array-like of shape (n_samples,)

Target vector (class labels).

Returns

chi2 : ndarray of shape (n_features,)

Chi2 statistics for each feature.

p_values : ndarray of shape (n_features,)

P-values for each feature.
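A direct call might look like the sketch below. The data is invented for illustration, and the second half assumes that negative input is rejected with a ValueError, which is how the non-negativity requirement appears to be enforced.

```python
import numpy as np
from sklearn.feature_selection import chi2

# Hypothetical non-negative data: 6 samples, 3 features, 3 classes.
X = np.array([
    [1, 1, 3],
    [0, 1, 5],
    [5, 4, 1],
    [6, 6, 2],
    [1, 4, 0],
    [0, 0, 0],
])
y = np.array([1, 1, 0, 0, 2, 2])

# Two arrays come back, each with one entry per feature (column of X).
chi2_stats, p_values = chi2(X, y)
print(chi2_stats.shape, p_values.shape)

# Negative values violate the non-negativity requirement on X.
X_bad = X.astype(float)
X_bad[0, 0] = -1.0
try:
    chi2(X_bad, y)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

A larger statistic (and a smaller p-value) for a feature indicates stronger evidence of dependence between that feature and the class labels.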

See also


ANOVA F-value between label/feature for classification tasks.


F-value between label/feature for regression tasks.


Complexity of this algorithm is O(n_classes * n_features).


