.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/linear_model/plot_lasso_model_selection.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_linear_model_plot_lasso_model_selection.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_linear_model_plot_lasso_model_selection.py:

=================================================
Lasso model selection: AIC-BIC / cross-validation
=================================================

This example focuses on model selection for Lasso models, that is, linear
models with an L1 penalty for regression problems. Several strategies can be
used to select the value of the regularization parameter: cross-validation or
an information criterion, namely the AIC or the BIC. In what follows, we
discuss the different strategies in detail.
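For reference (the formula is not spelled out in the original example), all
estimators used below minimize, in scikit-learn's parametrization, the Lasso
objective

.. math::

    \min_{w} \; \frac{1}{2 n_{\text{samples}}} \|y - X w\|_2^2 + \alpha \|w\|_1,

and the question addressed here is how to choose the regularization strength
:math:`\alpha`.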
.. GENERATED FROM PYTHON SOURCE LINES 15-22

.. code-block:: default


    # Author: Olivier Grisel
    #         Gael Varoquaux
    #         Alexandre Gramfort
    #         Guillaume Lemaitre
    # License: BSD 3 clause

.. GENERATED FROM PYTHON SOURCE LINES 23-27

.. code-block:: default


    import sklearn

    # Display estimators as HTML diagrams when rendered in a notebook.
    sklearn.set_config(display="diagram")

.. GENERATED FROM PYTHON SOURCE LINES 28-31

Dataset
-------

In this example, we will use the diabetes dataset.

.. GENERATED FROM PYTHON SOURCE LINES 31-36

.. code-block:: default


    from sklearn.datasets import load_diabetes

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    X.head()

*(Output: the first five rows of the ten original features:* ``age``, ``sex``,
``bmi``, ``bp``, ``s1`` *to* ``s6``.)*

.. GENERATED FROM PYTHON SOURCE LINES 37-39

In addition, we add some random features to the original data to better
illustrate the feature selection performed by the Lasso model.

.. GENERATED FROM PYTHON SOURCE LINES 39-52

.. code-block:: default


    import numpy as np
    import pandas as pd

    rng = np.random.RandomState(42)
    n_random_features = 14
    X_random = pd.DataFrame(
        rng.randn(X.shape[0], n_random_features),
        columns=[f"random_{i:02d}" for i in range(n_random_features)],
    )
    X = pd.concat([X, X_random], axis=1)
    # Show only a subset of the columns
    X[X.columns[::3]].head()

*(Output: the first five rows of every third column:* ``age``, ``bp``, ``s3``,
``s6``, ``random_02``, ``random_05``, ``random_08``, ``random_11``.)*

.. GENERATED FROM PYTHON SOURCE LINES 53-66

Selecting Lasso via an information criterion
--------------------------------------------

:class:`~sklearn.linear_model.LassoLarsIC` provides a Lasso estimator that
uses the Akaike information criterion (AIC) or the Bayesian information
criterion (BIC) to select the optimal value of the regularization parameter
alpha.

Before fitting the model, we will standardize the data with a
:class:`~sklearn.preprocessing.StandardScaler`. In addition, we will measure
the time needed to fit and tune the hyperparameter alpha, in order to compare
with the cross-validation strategy.

We will first fit a Lasso model with the AIC criterion.
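As a reminder (these definitions are not spelled out in the original example;
the exact degrees-of-freedom and noise-variance handling is described in the
:class:`~sklearn.linear_model.LassoLarsIC` documentation), the two criteria
trade off in-sample fit against model complexity roughly as

.. math::

    \mathrm{AIC} = -2 \log(\hat{L}) + 2 d, \qquad
    \mathrm{BIC} = -2 \log(\hat{L}) + d \log(N),

where :math:`\hat{L}` is the maximum likelihood of the model, :math:`d` its
number of effective parameters (here, the number of non-zero coefficients),
and :math:`N` the number of samples. BIC penalizes complexity more strongly
than AIC once :math:`\log(N) > 2` (roughly eight samples or more), and
therefore tends to select sparser models.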
.. GENERATED FROM PYTHON SOURCE LINES 66-77

.. code-block:: default


    import time

    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LassoLarsIC
    from sklearn.pipeline import make_pipeline

    start_time = time.time()
    lasso_lars_ic = make_pipeline(
        StandardScaler(), LassoLarsIC(criterion="aic", normalize=False)
    ).fit(X, y)
    fit_time = time.time() - start_time

.. GENERATED FROM PYTHON SOURCE LINES 78-79

We store the AIC metric for each value of alpha used during `fit`.

.. GENERATED FROM PYTHON SOURCE LINES 79-87

.. code-block:: default


    results = pd.DataFrame(
        {
            "alphas": lasso_lars_ic[-1].alphas_,
            "AIC criterion": lasso_lars_ic[-1].criterion_,
        }
    ).set_index("alphas")
    alpha_aic = lasso_lars_ic[-1].alpha_

.. GENERATED FROM PYTHON SOURCE LINES 88-89

Now, we perform the same analysis using the BIC criterion.

.. GENERATED FROM PYTHON SOURCE LINES 89-94

.. code-block:: default


    lasso_lars_ic.set_params(lassolarsic__criterion="bic").fit(X, y)
    results["BIC criterion"] = lasso_lars_ic[-1].criterion_
    alpha_bic = lasso_lars_ic[-1].alpha_

.. GENERATED FROM PYTHON SOURCE LINES 95-96

We can check which value of `alpha` leads to the minimum AIC and BIC.

.. GENERATED FROM PYTHON SOURCE LINES 96-103

.. code-block:: default


    def highlight_min(x):
        x_min = x.min()
        return ["font-weight: bold" if v == x_min else "" for v in x]


    results.style.apply(highlight_min)

The resulting table of criterion values, with the minimum of each column
shown in bold:

.. csv-table::
   :header: "alphas", "AIC criterion", "BIC criterion"

   45.160030, 5244.765686, 5244.765686
   42.300448, 5208.252839, 5212.344149
   21.542302, 4928.021378, 4936.203998
   15.034110, 4869.678327, 4881.952257
   6.189693, 4815.437203, 4831.802443
   5.329590, 4810.422702, 4830.879251
   4.305930, 4803.572132, **4828.119992**
   4.124325, 4804.126582, 4832.765751
   3.820656, 4803.620474, 4836.350953
   3.750361, 4805.011535, 4841.833324
   3.570653, 4805.289336, 4846.202435
   3.550197, 4807.075010, 4852.079419
   3.358374, 4806.878236, 4855.973955
   3.259335, 4807.705694, 4860.892722
   3.237724, 4809.439868, 4866.718206
   2.850023, 4805.988366, 4867.358015
   2.384495, 4801.703302, 4867.164260
   2.296438, 4802.592070, 4872.144338
   2.031600, 4801.236396, 4874.879974
   1.618202, 4798.482569, 4876.217457
   1.526599, 4799.542941, 4881.369138
   0.586815, **4794.237899**, 4880.155406
   0.446051, 4795.589066, 4885.597883
   0.259021, 4796.966000, 4891.066127
   0.032175, 4794.661549, 4888.761677
   0.019066, 4794.651882, 4888.752009
   0.000000, 4796.625435, 4894.816872
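The selected alphas can also be read off programmatically. A minimal sketch
(not part of the original example) using the ``results`` dataframe defined
above:

.. code-block:: default


    # For each column, ``idxmin`` returns the index label (here: the alpha)
    # at which the criterion is minimal.
    print(results.idxmin())

    # These values should match the alphas selected by the fitted estimators.
    print(f"alpha selected by AIC: {alpha_aic:.6f}")
    print(f"alpha selected by BIC: {alpha_bic:.6f}")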
.. GENERATED FROM PYTHON SOURCE LINES 104-108

Finally, we can plot the AIC and BIC values for the different alpha values.
The vertical lines in the plot correspond to the alpha chosen for each
criterion. The selected alpha corresponds to the minimum of the AIC or BIC
criterion.

.. GENERATED FROM PYTHON SOURCE LINES 108-133

.. code-block:: default


    ax = results.plot()
    ax.vlines(
        alpha_aic,
        results["AIC criterion"].min(),
        results["AIC criterion"].max(),
        label="alpha: AIC estimate",
        linestyles="--",
        color="tab:blue",
    )
    ax.vlines(
        alpha_bic,
        results["BIC criterion"].min(),
        results["BIC criterion"].max(),
        label="alpha: BIC estimate",
        linestyles="--",
        color="tab:orange",
    )
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel("criterion")
    ax.set_xscale("log")
    ax.legend()
    _ = ax.set_title(
        f"Information-criterion for model selection (training time {fit_time:.2f}s)"
    )

.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_lasso_model_selection_001.png
   :alt: Information-criterion for model selection (training time 0.01s)
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_lasso_model_selection_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 134-164

Model selection with an information criterion is very fast: it only requires
computing the criterion on the in-sample data passed to `fit`. Both criteria
estimate the model generalization error from the training-set error and
penalize this overly optimistic error. The penalty, however, relies on a
proper estimate of the degrees of freedom and of the noise variance. In
addition, both criteria are derived for large samples (asymptotic results)
and assume that the model is correct, i.e. that the data are actually
generated by this model. They also tend to break down when the problem is
badly conditioned (more features than samples); in that case, an estimate of
the noise variance has to be provided.

Selecting Lasso via cross-validation
------------------------------------

The Lasso estimator can be implemented with different solvers: coordinate
descent and least angle regression. They differ with regard to their
execution speed and their sources of numerical errors.

In scikit-learn, two different estimators are available with integrated
cross-validation: :class:`~sklearn.linear_model.LassoCV` and
:class:`~sklearn.linear_model.LassoLarsCV`, which solve the problem with
coordinate descent and least angle regression, respectively.

In the remainder of this section, we will present both approaches. For both
algorithms, we will use a 20-fold cross-validation strategy.

Lasso via coordinate descent
............................

Let's start by performing the hyperparameter tuning using
:class:`~sklearn.linear_model.LassoCV`.

.. GENERATED FROM PYTHON SOURCE LINES 164-170

.. code-block:: default


    from sklearn.linear_model import LassoCV

    start_time = time.time()
    model = make_pipeline(StandardScaler(), LassoCV(cv=20)).fit(X, y)
    fit_time = time.time() - start_time

.. GENERATED FROM PYTHON SOURCE LINES 171-193

.. code-block:: default


    import matplotlib.pyplot as plt

    ymin, ymax = 2300, 3800
    lasso = model[-1]
    plt.semilogx(lasso.alphas_, lasso.mse_path_, linestyle=":")
    plt.plot(
        lasso.alphas_,
        lasso.mse_path_.mean(axis=-1),
        color="black",
        label="Average across the folds",
        linewidth=2,
    )
    plt.axvline(lasso.alpha_, linestyle="--", color="black", label="alpha: CV estimate")

    plt.ylim(ymin, ymax)
    plt.xlabel(r"$\alpha$")
    plt.ylabel("Mean square error")
    plt.legend()
    _ = plt.title(
        f"Mean square error on each fold: coordinate descent (train time: {fit_time:.2f}s)"
    )

.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_lasso_model_selection_002.png
   :alt: Mean square error on each fold: coordinate descent (train time: 0.19s)
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_lasso_model_selection_002.png
   :class: sphx-glr-single-img
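Although not part of the original example, we can also inspect which
coefficients survive at the alpha selected by cross-validation, to confirm
that the Lasso indeed drops most of the purely random features. A minimal
sketch, assuming the ``model`` pipeline fitted just above:

.. code-block:: default


    # Map the fitted coefficients back to the feature names.  Note that the
    # coefficients are expressed in the standardized feature space because of
    # the StandardScaler step in the pipeline.
    coefs = pd.Series(model[-1].coef_, index=X.columns)
    print("Features kept by the Lasso (non-zero coefficients):")
    print(coefs[coefs != 0].sort_values(key=abs, ascending=False))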
.. GENERATED FROM PYTHON SOURCE LINES 194-198

Lasso via least angle regression
................................

We now repeat the hyperparameter tuning, this time using
:class:`~sklearn.linear_model.LassoLarsCV`.

.. GENERATED FROM PYTHON SOURCE LINES 198-204

.. code-block:: default


    from sklearn.linear_model import LassoLarsCV

    start_time = time.time()
    model = make_pipeline(StandardScaler(), LassoLarsCV(cv=20, normalize=False)).fit(X, y)
    fit_time = time.time() - start_time

.. GENERATED FROM PYTHON SOURCE LINES 205-222

.. code-block:: default


    lasso = model[-1]
    plt.semilogx(lasso.cv_alphas_, lasso.mse_path_, ":")
    plt.semilogx(
        lasso.cv_alphas_,
        lasso.mse_path_.mean(axis=-1),
        color="black",
        label="Average across the folds",
        linewidth=2,
    )
    plt.axvline(lasso.alpha_, linestyle="--", color="black", label="alpha CV")

    plt.ylim(ymin, ymax)
    plt.xlabel(r"$\alpha$")
    plt.ylabel("Mean square error")
    plt.legend()
    _ = plt.title(f"Mean square error on each fold: Lars (train time: {fit_time:.2f}s)")

.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_lasso_model_selection_003.png
   :alt: Mean square error on each fold: Lars (train time: 0.07s)
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_lasso_model_selection_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 223-259

Summary of cross-validation approach
....................................

Both algorithms give roughly the same results.

Lars computes the solution path only at each kink of the path. As a result,
it is very efficient when there are only a few kinks, which is the case when
there are few features or samples. It is also able to compute the full path
without setting any hyperparameter. By contrast, coordinate descent computes
the path points on a pre-specified grid (here we use the default). It is thus
more efficient when the number of grid points is smaller than the number of
kinks in the path. Such a strategy can be interesting when the number of
features is really large and there are enough samples in each of the
cross-validation folds. In terms of numerical errors, Lars accumulates more
errors for heavily correlated variables, while coordinate descent only
samples the path on a grid.

Note how the optimal value of alpha varies from fold to fold. This
illustrates why nested cross-validation is a good strategy when evaluating
the performance of a method whose hyperparameter is chosen by
cross-validation: that choice of hyperparameter may not be optimal for a
final evaluation on an unseen test set.
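As a rough illustration of this point (not part of the original example and
reusing the imports from the earlier cells), a nested cross-validation can be
sketched by wrapping the tuning pipeline in an outer
:func:`~sklearn.model_selection.cross_val_score` loop: the inner 20-fold CV
inside :class:`~sklearn.linear_model.LassoCV` selects alpha, while the outer
5-fold CV scores the whole procedure on held-out data.

.. code-block:: default


    from sklearn.model_selection import cross_val_score

    # Each outer training fold re-runs the inner 20-fold search for alpha, so
    # the reported scores account for the tuning step as well.
    nested_scores = cross_val_score(
        make_pipeline(StandardScaler(), LassoCV(cv=20)), X, y, cv=5
    )
    print(f"Nested CV R^2: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")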
Conclusion
----------

In this tutorial, we presented two approaches for selecting the best
hyperparameter `alpha`: one strategy finds the optimal value of `alpha` using
only the training set and some information criterion, and the other strategy
is based on cross-validation.

In this example, both approaches work similarly. The in-sample
hyperparameter selection even shows its efficacy in terms of computational
performance. However, it can only be used when the number of samples is
large enough compared to the number of features. That is why hyperparameter
optimization via cross-validation is a safe strategy: it works in many
different settings.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.787 seconds)

.. _sphx_glr_download_auto_examples_linear_model_plot_lasso_model_selection.py:

.. only:: html

  .. container:: sphx-glr-footer
     :class: sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.0.X?urlpath=lab/tree/notebooks/auto_examples/linear_model/plot_lasso_model_selection.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_lasso_model_selection.py <plot_lasso_model_selection.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_lasso_model_selection.ipynb <plot_lasso_model_selection.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_