sklearn.feature_selection.chi2¶
- sklearn.feature_selection.chi2(X, y)¶
Compute χ² (chi-squared) statistic for each class/feature combination.
This score can be used to select the n_features features with the highest values for the χ² (chi-square) statistic from X, which must contain booleans or frequencies (e.g., term counts in document classification), relative to the classes.
Recall that the χ² statistic measures dependence between stochastic variables, so using this function “weeds out” the features that are the most likely to be independent of class and therefore irrelevant for classification.
Parameters : X : {array-like, sparse matrix}, shape = (n_samples, n_features_in)
Sample vectors.
y : array-like, shape = (n_samples,)
Target vector (class labels).
Returns : chi2 : array, shape = (n_features,)
chi2 statistics of each feature.
pval : array, shape = (n_features,)
p-values of each feature.
Notes
Complexity of this algorithm is O(n_classes * n_features).