sklearn.feature_selection.chi2

sklearn.feature_selection.chi2(X, y)

Compute chi-squared stats between each non-negative feature and class.
This score can be used to select the features of X with the highest values of the chi-squared test statistic relative to the classes. X must contain only non-negative features, such as booleans or frequencies (e.g., term counts in document classification).
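The sketch below shows that selection step using `SelectKBest` from the same module. The toy count matrix, the labels and `k=2` are illustrative only.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Illustrative term counts: 6 documents x 4 terms (all non-negative).
X = np.array([
    [3, 0, 1, 0],
    [4, 0, 0, 1],
    [2, 1, 1, 0],
    [0, 3, 1, 1],
    [0, 4, 0, 0],
    [1, 2, 1, 1],
])
y = np.array([0, 0, 0, 1, 1, 1])  # class label of each document

# Score every term with chi2 and keep the 2 highest-scoring ones.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(selector.scores_)        # chi-squared statistic per term
print(selector.get_support())  # boolean mask of the kept terms
print(X_new.shape)             # (6, 2)
```

Terms 0 and 1 are concentrated in one class each, so they score highest. Term 2 has the same total in both classes, so its score is 0.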
Recall that the chi-squared test measures dependence between stochastic variables, so using this function “weeds out” the features that are most likely to be independent of the class and therefore irrelevant for classification.
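To make the dependence measure concrete, the sketch below recomputes the statistic by hand. It assumes each score is a Pearson chi-squared sum over classes, comparing the observed per-class totals of a feature against expected totals of (class frequency) × (feature total). `chi2_by_hand` is a hypothetical helper, not part of scikit-learn. The comparison with `chi2` checks this assumption on toy data.

```python
import numpy as np
from sklearn.feature_selection import chi2

def chi2_by_hand(X, y):
    """Hypothetical helper: per-feature chi-squared score computed from
    the (n_classes, n_features) table of summed feature values."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    classes = np.unique(y)
    # One-hot class indicators, shape (n_samples, n_classes).
    Y = (y[:, None] == classes[None, :]).astype(np.float64)
    observed = Y.T @ X                            # summed values per class
    feature_total = X.sum(axis=0, keepdims=True)  # (1, n_features)
    class_freq = Y.mean(axis=0, keepdims=True)    # (1, n_classes)
    expected = class_freq.T @ feature_total       # totals under independence
    return ((observed - expected) ** 2 / expected).sum(axis=0)

X = np.array([
    [3, 0, 1],
    [4, 0, 1],
    [2, 1, 2],
    [0, 3, 2],
    [1, 4, 1],
    [1, 2, 1],
])
y = np.array(["a", "a", "a", "b", "b", "b"])

manual = chi2_by_hand(X, y)
library, _ = chi2(X, y)
print(np.allclose(manual, library))  # True
print(manual[2])                     # 0.0: feature 2 is spread evenly over classes
```

Feature 2 has the same total in each class. It therefore looks independent of the class and gets the lowest possible score.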
Read more in the User Guide.
Parameters
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Sample vectors.
y : array-like of shape (n_samples,)
Target vector (class labels).
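The following sketch illustrates the input contract described above. A SciPy sparse matrix and the equivalent dense array should give the same scores, and class labels need not be integers. Negative entries in X are rejected with a `ValueError`. The data and labels are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_selection import chi2

X_dense = np.array([[1, 0, 2], [0, 3, 0], [2, 0, 1], [0, 1, 0]])
y = np.array(["spam", "ham", "spam", "ham"])  # string class labels

scores_dense, _ = chi2(X_dense, y)
scores_sparse, _ = chi2(csr_matrix(X_dense), y)
print(np.allclose(scores_dense, scores_sparse))

# Negative values violate the non-negativity requirement on X.
rejected = False
try:
    chi2(np.array([[1, -1], [0, 2]]), np.array([0, 1]))
except ValueError as err:
    rejected = True
    print("rejected:", err)
```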
Returns
chi2 : ndarray of shape (n_features,)
Chi2 statistics for each feature.
p_values : ndarray of shape (n_features,)
P-values for each feature.
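Both return values are indexed by feature. The sketch below assumes that each p-value is the upper tail of a chi-squared distribution with `n_classes - 1` degrees of freedom, and checks that `p_values` can be reproduced with `scipy.stats.chi2.sf`. The three-class data are illustrative.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import chi2

X = np.array([
    [1, 0, 2],
    [2, 1, 0],
    [0, 3, 1],
    [1, 2, 0],
    [0, 0, 3],
    [1, 1, 2],
])
y = np.array([0, 0, 1, 1, 2, 2])  # three classes

chi2_stats, p_values = chi2(X, y)
print(chi2_stats.shape, p_values.shape)  # (3,) (3,)

# Assumed reference distribution: chi-squared with n_classes - 1 dof.
n_classes = np.unique(y).size
p_check = stats.chi2.sf(chi2_stats, df=n_classes - 1)
print(np.allclose(p_values, p_check))
```

Importing `scipy.stats` as a module avoids a name clash with `chi2` from scikit-learn.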
See also
f_classif
ANOVA F-value between label/feature for classification tasks.
f_regression
F-value between label/feature for regression tasks.
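The sketch below shows one practical difference from `f_classif` on illustrative data. The ANOVA F-test makes no non-negativity assumption, and its scores are unchanged when each feature is shifted by a constant (here, mean-centering). `chi2`, by contrast, needs non-negative inputs such as counts. The two tests can also rank features differently, so the rankings are printed side by side rather than assumed equal.

```python
import numpy as np
from sklearn.feature_selection import chi2, f_classif

X = np.array([
    [3, 0, 1],
    [4, 0, 2],
    [2, 1, 1],
    [0, 3, 2],
    [1, 4, 0],
    [1, 2, 1],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

chi2_scores, _ = chi2(X, y)
f_scores, _ = f_classif(X, y)

# Feature indices from highest to lowest score under each test.
print("chi2 ranking:     ", np.argsort(chi2_scores)[::-1])
print("f_classif ranking:", np.argsort(f_scores)[::-1])

# Mean-centering introduces negative values; f_classif accepts them and
# returns the same F statistics, while chi2 would raise a ValueError.
X_centered = X - X.mean(axis=0)
f_centered, _ = f_classif(X_centered, y)
print(np.allclose(f_scores, f_centered))
```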
Notes
Complexity of this algorithm is O(n_classes * n_features).