.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/exercises/plot_cv_diabetes.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_exercises_plot_cv_diabetes.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_exercises_plot_cv_diabetes.py:


===============================================
Cross-validation on Diabetes Dataset Exercise
===============================================

A tutorial exercise that uses cross-validation to choose the regularization
strength ``alpha`` of a linear model (Lasso).

This exercise is used in the :ref:`cv_estimators_tut` part of the
:ref:`model_selection_tut` section of the :ref:`stat_learn_tut_index`.

.. GENERATED FROM PYTHON SOURCE LINES 14-16

Load dataset and apply GridSearchCV
-----------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 16-38

.. code-block:: Python

    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn import datasets
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import GridSearchCV

    X, y = datasets.load_diabetes(return_X_y=True)
    X = X[:150]
    y = y[:150]

    lasso = Lasso(random_state=0, max_iter=10000)
    alphas = np.logspace(-4, -0.5, 30)

    tuned_parameters = [{"alpha": alphas}]
    n_folds = 5

    clf = GridSearchCV(lasso, tuned_parameters, cv=n_folds, refit=False)
    clf.fit(X, y)
    scores = clf.cv_results_["mean_test_score"]
    scores_std = clf.cv_results_["std_test_score"]

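Because the search is run with ``refit=False``, no ``best_estimator_`` is
fitted, so the selected ``alpha`` has to be read from ``cv_results_``. The
snippet below is a minimal sketch of that lookup and is not part of the
original exercise; the names ``search``, ``best_idx``, ``best_alpha`` and
``best_score`` are illustrative choices.

.. code-block:: Python

    import numpy as np

    from sklearn import datasets
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import GridSearchCV

    # Same data subset and alpha grid as in the exercise above.
    X, y = datasets.load_diabetes(return_X_y=True)
    X, y = X[:150], y[:150]
    alphas = np.logspace(-4, -0.5, 30)

    search = GridSearchCV(
        Lasso(random_state=0, max_iter=10000), {"alpha": alphas}, cv=5, refit=False
    )
    search.fit(X, y)

    # Position of the grid point with the highest mean test score.
    mean_scores = search.cv_results_["mean_test_score"]
    best_idx = int(np.argmax(mean_scores))

    # "params" holds one dict per grid point, in the same order as the scores.
    best_alpha = search.cv_results_["params"][best_idx]["alpha"]
    best_score = mean_scores[best_idx]
    print(f"best alpha: {best_alpha:.5f}, mean CV score: {best_score:.5f}")
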

.. GENERATED FROM PYTHON SOURCE LINES 39-41

Plot error lines showing +/- standard errors of the scores
----------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 41-58

.. code-block:: Python


    plt.figure().set_size_inches(8, 6)
    plt.semilogx(alphas, scores)

    # standard error of the mean score: std across folds divided by sqrt(n_folds)
    std_error = scores_std / np.sqrt(n_folds)

    plt.semilogx(alphas, scores + std_error, "b--")
    plt.semilogx(alphas, scores - std_error, "b--")

    # alpha=0.2 controls the translucency of the fill color
    plt.fill_between(alphas, scores + std_error, scores - std_error, alpha=0.2)

    plt.ylabel("CV score +/- std error")
    plt.xlabel("alpha")
    plt.axhline(np.max(scores), linestyle="--", color=".5")
    plt.xlim([alphas[0], alphas[-1]])


.. image-sg:: /auto_examples/exercises/images/sphx_glr_plot_cv_diabetes_001.png
   :alt: plot cv diabetes
   :srcset: /auto_examples/exercises/images/sphx_glr_plot_cv_diabetes_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    (9.999999999999999e-05, 0.31622776601683794)

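The band in the plot is the mean score plus or minus one standard error. As a
small worked sketch of that arithmetic for a single value of ``alpha``, the
snippet below uses made-up per-fold scores; the numbers in ``fold_scores`` are
hypothetical and do not come from the diabetes data.

.. code-block:: Python

    import numpy as np

    # Hypothetical scores from 5 folds for one alpha (illustrative values only).
    fold_scores = np.array([0.40, 0.50, 0.45, 0.35, 0.55])
    n_folds = fold_scores.size

    mean_score = fold_scores.mean()
    # np.std uses ddof=0 by default, i.e. the population standard deviation.
    score_std = fold_scores.std()
    # The standard error shrinks with the square root of the number of folds.
    std_error = score_std / np.sqrt(n_folds)

    lower, upper = mean_score - std_error, mean_score + std_error
    print(f"mean: {mean_score:.4f}, std error: {std_error:.4f}")
    print(f"band: [{lower:.4f}, {upper:.4f}]")
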

.. GENERATED FROM PYTHON SOURCE LINES 59-61

Bonus: how much can you trust the selection of alpha?
-----------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 61-91

.. code-block:: Python


    # To answer this question we use the LassoCV object that sets its alpha
    # parameter automatically from the data by internal cross-validation (i.e. it
    # performs cross-validation on the training data it receives).
    # We use external cross-validation to see how much the automatically obtained
    # alphas differ across different cross-validation folds.

    from sklearn.linear_model import LassoCV
    from sklearn.model_selection import KFold

    lasso_cv = LassoCV(alphas=alphas, random_state=0, max_iter=10000)
    k_fold = KFold(3)

    print("Answer to the bonus question:", "how much can you trust the selection of alpha?")
    print()
    print("Alpha parameters maximising the generalization score on different")
    print("subsets of the data:")
    for k, (train, test) in enumerate(k_fold.split(X, y)):
        lasso_cv.fit(X[train], y[train])
        print(
            f"[fold {k}] alpha: {lasso_cv.alpha_:.5f}, "
            f"score: {lasso_cv.score(X[test], y[test]):.5f}"
        )
    print()
    print("Answer: Not very much since we obtained different alphas for different")
    print("subsets of the data and moreover, the scores for these alphas differ")
    print("quite substantially.")

    plt.show()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Answer to the bonus question: how much can you trust the selection of alpha?

    Alpha parameters maximising the generalization score on different
    subsets of the data:
    [fold 0] alpha: 0.05968, score: 0.54209
    [fold 1] alpha: 0.04520, score: 0.15521
    [fold 2] alpha: 0.07880, score: 0.45192

    Answer: Not very much since we obtained different alphas for different
    subsets of the data and moreover, the scores for these alphas differ
    quite substantially.

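The same nested check can also be written with ``cross_validate``, which can
return the estimator fitted on each outer fold. This is a sketch of an
alternative formulation, not the exercise's own code; ``results``,
``fold_alphas`` and ``fold_scores`` are illustrative names, and the ratio
printed at the end is just one possible way to summarize the spread.

.. code-block:: Python

    import numpy as np

    from sklearn import datasets
    from sklearn.linear_model import LassoCV
    from sklearn.model_selection import KFold, cross_validate

    # Same data subset and alpha grid as in the exercise above.
    X, y = datasets.load_diabetes(return_X_y=True)
    X, y = X[:150], y[:150]
    alphas = np.logspace(-4, -0.5, 30)

    lasso_cv = LassoCV(alphas=alphas, random_state=0, max_iter=10000)

    # Outer loop: 3 folds. Inner loop: LassoCV's own cross-validation, run on
    # the training part of each outer fold.
    results = cross_validate(lasso_cv, X, y, cv=KFold(3), return_estimator=True)

    # One fitted LassoCV per outer fold, each with its own selected alpha.
    fold_alphas = np.array([est.alpha_ for est in results["estimator"]])
    fold_scores = results["test_score"]

    print("alpha per fold:", fold_alphas)
    print("score per fold:", fold_scores)
    print("largest / smallest alpha:", fold_alphas.max() / fold_alphas.min())

A ratio well above 1, together with widely varying scores, would point to the
same conclusion as the loop above: the selected ``alpha`` depends noticeably on
which subset of the data is used.
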

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.516 seconds)


.. _sphx_glr_download_auto_examples_exercises_plot_cv_diabetes.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.4.X?urlpath=lab/tree/notebooks/auto_examples/exercises/plot_cv_diabetes.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/exercises/plot_cv_diabetes.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_cv_diabetes.ipynb <plot_cv_diabetes.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_cv_diabetes.py <plot_cv_diabetes.py>`


.. include:: plot_cv_diabetes.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_