.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_feature_selection_plot_select_from_model_diabetes.py>` to download the full example code or to run this example in your browser via Binder.
    .. rst-class:: sphx-glr-example-title

    .. _sphx_glr_auto_examples_feature_selection_plot_select_from_model_diabetes.py:


===================================================
Feature selection using SelectFromModel and LassoCV
===================================================

Use the SelectFromModel meta-transformer along with LassoCV to select the two
best features from the diabetes dataset.

Since the L1 penalty used by Lasso promotes sparsity (it drives some
coefficients to exactly zero), we might be interested in selecting only a
subset of the most important features from the dataset. This example shows how
to select the two most important features from the diabetes dataset.

The diabetes dataset consists of 10 variables (features) collected from 442
diabetes patients. This example shows how to use SelectFromModel and LassoCV to
find the two features that best predict disease progression one year after
baseline.

Authors: `Manoj Kumar <mks542@nyu.edu>`_,
`Maria Telenczuk <https://github.com/maikia>`_

License: BSD 3 clause

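To see the sparsity-inducing effect of the L1 penalty directly, the sketch below fits a plain `Lasso` (not `LassoCV`) for a few hand-picked values of `alpha` and counts the coefficients that are not exactly zero. The alpha values and the helper `n_nonzero_coefs` are illustrative assumptions and are not part of the original example. A stronger penalty generally leaves fewer non-zero coefficients, and a large enough `alpha` sets all of them to zero.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Lasso

    X, y = load_diabetes(return_X_y=True)


    def n_nonzero_coefs(alpha):
        """Hypothetical helper: fit a Lasso with the given alpha and count
        how many coefficients are not exactly zero."""
        coef = Lasso(alpha=alpha).fit(X, y).coef_
        return np.count_nonzero(coef)


    # Illustrative alphas only; a larger alpha means a stronger L1 penalty.
    for alpha in [0.01, 0.1, 1.0, 10.0]:
        print("alpha={:<5} non-zero coefficients: {}".format(
            alpha, n_nonzero_coefs(alpha)))

Choosing `alpha` by hand trades goodness of fit for fewer active features. `LassoCV`, which this example uses, instead picks `alpha` by cross-validation.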

.. code-block:: default


    print(__doc__)

    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn.datasets import load_diabetes
    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import LassoCV








Load the data
---------------------------------------------------------

First, let's load the diabetes dataset, which is available from within
sklearn. Then, we will look at which features are collected for the diabetes
patients:


.. code-block:: default


    diabetes = load_diabetes()

    X = diabetes.data
    y = diabetes.target

    feature_names = diabetes.feature_names
    print(feature_names)





.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

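As an optional sanity check that is not part of the original example, you can also get the same arrays in one call with `return_X_y=True`. Checking their shapes confirms the 442 patients and 10 features described in the introduction.

.. code-block:: python

    from sklearn.datasets import load_diabetes

    # return_X_y=True returns the (data, target) arrays directly instead of a Bunch.
    X, y = load_diabetes(return_X_y=True)

    print(X.shape)  # (n_samples, n_features) -> (442, 10)
    print(y.shape)  # one disease-progression value per patient -> (442,)
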



Find importance of the features
---------------------------------------------------------

To decide on the importance of the features, we are going to use the LassoCV
estimator. The features with the highest absolute `coef_` values are
considered the most important.


.. code-block:: default


    clf = LassoCV().fit(X, y)
    importance = np.abs(clf.coef_)
    print(importance)





.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [  6.49684455 235.99640534 521.73854261 321.06689245 569.4426838
     302.45627915   0.         143.6995665  669.92633112  66.83430445]

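The raw importance array is easier to read when each value is paired with its feature name. The minimal sketch below assumes the same default `LassoCV()` as above. It sorts the features by absolute coefficient and also prints `alpha_`, the regularization strength that `LassoCV` selected by cross-validation. The exact numbers may differ between scikit-learn versions, because defaults such as the number of CV folds have changed over time.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LassoCV

    diabetes = load_diabetes()
    X, y = diabetes.data, diabetes.target
    feature_names = np.array(diabetes.feature_names)

    clf = LassoCV().fit(X, y)
    importance = np.abs(clf.coef_)

    # argsort sorts in ascending order; reverse it so the most important
    # feature comes first.
    order = importance.argsort()[::-1]
    for name, value in zip(feature_names[order], importance[order]):
        print("{:>4}: {:10.3f}".format(name, value))

    # Regularization strength chosen by cross-validation.
    print("alpha_ selected by LassoCV:", clf.alpha_)
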



Select the features with the highest importance
---------------------------------------------------------

Now we want to select the two most important features.
SelectFromModel() allows setting a threshold: only the features whose
importance (here, the absolute value of `coef_`) is greater than or equal to
the threshold are kept. Here, we set the threshold slightly above the
third-highest absolute `coef_` calculated by LassoCV() from our data, so that
only the top two features pass.


.. code-block:: default


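    # Importance of the third most important feature; the threshold is set
    # just above it so that only the top two features pass.
    idx_third = importance.argsort()[-3]
    threshold = importance[idx_third] + 0.01

    # Indices of the two most important features, most important first.
    idx_features = (-importance).argsort()[:2]
    name_features = np.array(feature_names)[idx_features]
    print('Selected features: {}'.format(name_features))

    # Fit the selector and keep only the columns that pass the threshold.
    sfm = SelectFromModel(clf, threshold=threshold)
    sfm.fit(X, y)
    X_transform = sfm.transform(X)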





.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Selected features: ['s5' 's1']

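You can check the threshold rule directly. `SelectFromModel` keeps a feature when its importance (for a linear model, the absolute value of `coef_`) is greater than or equal to `threshold`, and `get_support()` returns the resulting boolean mask. The sketch also shows a possibly simpler alternative for when the number of features is known in advance: passing `max_features=2` together with `threshold=-np.inf`, so that only the ranking decides. Both selectors refit a clone of `clf` on the same data and store it as `estimator_`.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import LassoCV

    diabetes = load_diabetes()
    X, y = diabetes.data, diabetes.target
    feature_names = np.array(diabetes.feature_names)

    clf = LassoCV().fit(X, y)
    importance = np.abs(clf.coef_)

    # Same rule as the example: just above the third-largest importance.
    threshold = np.sort(importance)[-3] + 0.01
    sfm = SelectFromModel(clf, threshold=threshold).fit(X, y)

    # The support mask marks features whose importance is >= threshold;
    # sfm.estimator_ is the clone of clf that SelectFromModel refitted.
    mask = sfm.get_support()
    print("Selected by threshold:", feature_names[mask])

    # Alternative: keep a fixed number of features. threshold=-np.inf
    # disables the threshold so that only max_features decides.
    sfm_top2 = SelectFromModel(clf, threshold=-np.inf, max_features=2).fit(X, y)
    print("Selected by max_features:", feature_names[sfm_top2.get_support()])

Note that the mask follows the original column order, not the importance ranking.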



Plot the two most important features
---------------------------------------------------------

Finally, we plot the two selected features against each other.


.. code-block:: default


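    # The columns of X_transform follow the original feature order, not the
    # importance ranking, so take the axis labels from the selector's
    # support mask.
    selected_names = np.array(feature_names)[sfm.get_support()]

    plt.title(
        "Features from diabetes using SelectFromModel with "
        "threshold %0.3f." % sfm.threshold)
    feature1 = X_transform[:, 0]
    feature2 = X_transform[:, 1]
    plt.plot(feature1, feature2, 'r.')
    plt.xlabel("First feature: {}".format(selected_names[0]))
    plt.ylabel("Second feature: {}".format(selected_names[1]))
    plt.ylim([np.min(feature2), np.max(feature2)])
    plt.show()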



.. image:: /auto_examples/feature_selection/images/sphx_glr_plot_select_from_model_diabetes_001.png
    :alt: Features from diabetes using SelectFromModel with threshold 521.749.
    :class: sphx-glr-single-img






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.187 seconds)


.. _sphx_glr_download_auto_examples_feature_selection_plot_select_from_model_diabetes.py:


.. only:: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: binder-badge

    .. image:: https://mybinder.org/badge_logo.svg
      :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/0.23.X?urlpath=lab/tree/notebooks/auto_examples/feature_selection/plot_select_from_model_diabetes.ipynb
      :width: 150 px


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_select_from_model_diabetes.py <plot_select_from_model_diabetes.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_select_from_model_diabetes.ipynb <plot_select_from_model_diabetes.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_