.. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code or to run this example in your browser via Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_feature_selection_plot_feature_selection.py: ============================ Univariate Feature Selection ============================ An example showing univariate feature selection. Noisy (non informative) features are added to the iris data and univariate feature selection is applied. For each feature, we plot the p-values for the univariate feature selection and the corresponding weights of an SVM. We can see that univariate feature selection selects the informative features and that these have larger SVM weights. In the total set of features, only the 4 first ones are significant. We can see that they have the highest score with univariate feature selection. The SVM assigns a large weight to one of these features, but also Selects many of the non-informative features. Applying univariate feature selection before the SVM increases the SVM weight attributed to the significant features, and will thus improve classification. .. image:: /auto_examples/feature_selection/images/sphx_glr_plot_feature_selection_001.png :alt: Comparing feature selection :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Classification accuracy without selecting features: 0.789 Classification accuracy after univariate feature selection: 0.868 | .. code-block:: default print(__doc__) import numpy as np import matplotlib.pyplot as plt from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.preprocessing import MinMaxScaler from sklearn.svm import LinearSVC from sklearn.pipeline import make_pipeline from sklearn.feature_selection import SelectKBest, f_classif # ############################################################################# # Import some data to play with # The iris dataset X, y = load_iris(return_X_y=True) # Some noisy data not correlated E = np.random.RandomState(42).uniform(0, 0.1, size=(X.shape[0], 20)) # Add the noisy data to the informative features X = np.hstack((X, E)) # Split dataset to select feature and evaluate the classifier X_train, X_test, y_train, y_test = train_test_split( X, y, stratify=y, random_state=0 ) plt.figure(1) plt.clf() X_indices = np.arange(X.shape[-1]) # ############################################################################# # Univariate feature selection with F-test for feature scoring # We use the default selection function to select the four # most significant features selector = SelectKBest(f_classif, k=4) selector.fit(X_train, y_train) scores = -np.log10(selector.pvalues_) scores /= scores.max() plt.bar(X_indices - .45, scores, width=.2, label=r'Univariate score ($-Log(p_{value})$)') # ############################################################################# # Compare to the weights of an SVM clf = make_pipeline(MinMaxScaler(), LinearSVC()) clf.fit(X_train, y_train) print('Classification accuracy without selecting features: {:.3f}' .format(clf.score(X_test, y_test))) svm_weights = np.abs(clf[-1].coef_).sum(axis=0) svm_weights /= svm_weights.sum() plt.bar(X_indices - .25, svm_weights, width=.2, label='SVM weight') clf_selected = make_pipeline( SelectKBest(f_classif, k=4), MinMaxScaler(), LinearSVC() ) clf_selected.fit(X_train, y_train) print('Classification accuracy after univariate feature selection: {:.3f}' .format(clf_selected.score(X_test, y_test))) svm_weights_selected = np.abs(clf_selected[-1].coef_).sum(axis=0) svm_weights_selected /= svm_weights_selected.sum() plt.bar(X_indices[selector.get_support()] - .05, svm_weights_selected, width=.2, label='SVM weights after selection') plt.title("Comparing feature selection") plt.xlabel('Feature number') plt.yticks(()) plt.axis('tight') plt.legend(loc='upper right') plt.show() .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 0.107 seconds) .. _sphx_glr_download_auto_examples_feature_selection_plot_feature_selection.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: binder-badge .. image:: https://mybinder.org/badge_logo.svg :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/0.23.X?urlpath=lab/tree/notebooks/auto_examples/feature_selection/plot_feature_selection.ipynb :width: 150 px .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_feature_selection.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_feature_selection.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_