.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_train_error_vs_test_error.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_selection_plot_train_error_vs_test_error.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_train_error_vs_test_error.py:


=========================
Train error vs Test error
=========================

This example illustrates how the performance of an estimator on unseen data
(test data) differs from its performance on the training data. As the
regularization increases, the performance on the training set decreases, while
the performance on the test set is optimal within a range of values of the
regularization parameter. The example uses an Elastic-Net regression model, and
the performance is measured with the coefficient of determination, R^2.
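
As a minimal sketch of the performance measure, the ``score`` method of a
scikit-learn regressor returns the coefficient of determination,
``1 - SS_res / SS_tot``. The toy arrays below are illustrative values only and
are not part of the example data.

.. code-block:: Python

    import numpy as np
    from sklearn.metrics import r2_score

    # Illustrative targets and predictions (assumed values, not from the example)
    y_true = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([2.5, 0.0, 2.0, 8.0])

    # Residual sum of squares and total sum of squares around the mean
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2_manual = 1.0 - ss_res / ss_tot

    # The manual value should agree with scikit-learn's r2_score
    print(r2_manual, r2_score(y_true, y_pred))

A score of 1 corresponds to a perfect fit, and the score can become negative
when the model predicts worse than the mean of the targets.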

.. GENERATED FROM PYTHON SOURCE LINES 14-18

.. code-block:: Python


    # Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>
    # License: BSD 3 clause


.. GENERATED FROM PYTHON SOURCE LINES 19-21

Generate sample data
--------------------

.. GENERATED FROM PYTHON SOURCE LINES 21-39

.. code-block:: Python

    import numpy as np

    from sklearn import linear_model
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    n_samples_train, n_samples_test, n_features = 75, 150, 500
    X, y, coef = make_regression(
        n_samples=n_samples_train + n_samples_test,
        n_features=n_features,
        n_informative=50,
        shuffle=False,
        noise=1.0,
        coef=True,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=n_samples_train, test_size=n_samples_test, shuffle=False
    )
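
The following sketch, on a smaller problem with assumed sizes and an assumed
``random_state``, suggests what ``shuffle=False`` implies in the two calls
above: ``make_regression`` keeps the informative features in the first
columns, and ``train_test_split`` takes the first rows as the training set.

.. code-block:: Python

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    # Smaller, seeded problem for illustration only (sizes are assumptions)
    X_demo, y_demo, coef_demo = make_regression(
        n_samples=30,
        n_features=20,
        n_informative=5,
        shuffle=False,
        noise=1.0,
        coef=True,
        random_state=0,
    )

    # Indices of the nonzero ground-truth coefficients
    print(np.flatnonzero(coef_demo))

    # Without shuffling, the split is a plain slice of the rows
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, train_size=10, test_size=20, shuffle=False
    )
    print(X_tr.shape, X_te.shape)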


.. GENERATED FROM PYTHON SOURCE LINES 40-42

Compute train and test errors
-----------------------------

.. GENERATED FROM PYTHON SOURCE LINES 42-60

.. code-block:: Python

    alphas = np.logspace(-5, 1, 60)
    enet = linear_model.ElasticNet(l1_ratio=0.7, max_iter=10000)
    # Note: despite the variable names, these lists hold R^2 scores
    # (higher is better), not errors.
    train_errors = list()
    test_errors = list()
    for alpha in alphas:
        enet.set_params(alpha=alpha)
        enet.fit(X_train, y_train)
        train_errors.append(enet.score(X_train, y_train))
        test_errors.append(enet.score(X_test, y_test))

    i_alpha_optim = np.argmax(test_errors)
    alpha_optim = alphas[i_alpha_optim]
    print("Optimal regularization parameter : %s" % alpha_optim)

    # Estimate the coef_ on full data with optimal regularization parameter
    enet.set_params(alpha=alpha_optim)
    coef_ = enet.fit(X, y).coef_


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Optimal regularization parameter : 0.00020991037201085544
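
The loop above scores each value of ``alpha`` on a single train/test split. As
a hedged alternative sketch, ``validation_curve`` from
``sklearn.model_selection`` computes comparable train and validation scores
with cross-validation. The data sizes, the ``alpha`` grid, ``cv=5`` and the
``random_state`` below are assumptions chosen to keep the sketch fast; they are
not the settings of this example.

.. code-block:: Python

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNet
    from sklearn.model_selection import validation_curve

    # Small synthetic problem (assumed sizes, seeded for repeatability)
    X_cv, y_cv = make_regression(
        n_samples=100, n_features=50, n_informative=10, noise=1.0, random_state=0
    )
    alphas_cv = np.logspace(-3, 1, 10)

    # Each returned array has shape (n_alphas, n_folds); the default scoring
    # is the estimator's own score method, i.e. R^2 for a regressor.
    train_scores, val_scores = validation_curve(
        ElasticNet(l1_ratio=0.7, max_iter=10000),
        X_cv,
        y_cv,
        param_name="alpha",
        param_range=alphas_cv,
        cv=5,
    )

    # Pick the alpha with the best mean validation score across folds
    alpha_cv = alphas_cv[np.argmax(val_scores.mean(axis=1))]
    print("Alpha selected by cross-validation:", alpha_cv)

Averaging over folds tends to give a less noisy estimate than a single split,
at the cost of fitting the model once per fold and per value of ``alpha``.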


.. GENERATED FROM PYTHON SOURCE LINES 61-63

Plot results
------------

.. GENERATED FROM PYTHON SOURCE LINES 63-89

.. code-block:: Python


    import matplotlib.pyplot as plt

    plt.subplot(2, 1, 1)
    plt.semilogx(alphas, train_errors, label="Train")
    plt.semilogx(alphas, test_errors, label="Test")
    plt.vlines(
        alpha_optim,
        plt.ylim()[0],
        np.max(test_errors),
        color="k",
        linewidth=3,
        label="Optimum on test",
    )
    plt.legend(loc="lower right")
    plt.ylim([0, 1.2])
    plt.xlabel("Regularization parameter")
    plt.ylabel("Performance")

    # Show estimated coef_ vs true coef
    plt.subplot(2, 1, 2)
    plt.plot(coef, label="True coef")
    plt.plot(coef_, label="Estimated coef")
    plt.legend()
    plt.subplots_adjust(
        left=0.09, bottom=0.04, right=0.94, top=0.94, wspace=0.26, hspace=0.26
    )
    plt.show()


.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_train_error_vs_test_error_001.png
   :alt: plot train error vs test error
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_train_error_vs_test_error_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 6.680 seconds)


.. _sphx_glr_download_auto_examples_model_selection_plot_train_error_vs_test_error.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/model_selection/plot_train_error_vs_test_error.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/model_selection/plot_train_error_vs_test_error.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_train_error_vs_test_error.ipynb <plot_train_error_vs_test_error.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_train_error_vs_test_error.py <plot_train_error_vs_test_error.py>`


.. include:: plot_train_error_vs_test_error.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_