.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/linear_model/plot_ols.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_linear_model_plot_ols.py>`
        to download the full example code, or to run this example in your browser
        via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_linear_model_plot_ols.py:


==============================
Ordinary Least Squares Example
==============================

This example shows how to use the ordinary least squares (OLS) model called
:class:`~sklearn.linear_model.LinearRegression` in scikit-learn.

For this purpose, we use a single feature from the diabetes dataset and try to
predict the diabetes progression using this linear model. We therefore load the
diabetes dataset and split it into training and test sets.

Then, we fit the model on the training set, evaluate its performance on the
test set, and finally visualize the results on the test set.

.. GENERATED FROM PYTHON SOURCE LINES 16-20

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

.. GENERATED FROM PYTHON SOURCE LINES 21-26

Data Loading and Preparation
----------------------------

Load the diabetes dataset. For simplicity, we only keep a single feature in the data.

Then, we split the data and target into training and test sets.

.. GENERATED FROM PYTHON SOURCE LINES 26-33

.. code-block:: Python

    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X = X[:, [2]]  # Use only one feature
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=20, shuffle=False)

.. GENERATED FROM PYTHON SOURCE LINES 34-40

Linear regression model
-----------------------

We create a linear regression model and fit it on the training data. Note that by
default, an intercept is added to the model. We can control this behavior by
setting the `fit_intercept` parameter.

.. GENERATED FROM PYTHON SOURCE LINES 40-44

.. code-block:: Python

    from sklearn.linear_model import LinearRegression

    regressor = LinearRegression().fit(X_train, y_train)
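
As a quick check, we can also inspect the parameters the fitted model has learned.
The snippet below is a small illustrative sketch and not part of the generated
example output: it reads the fitted slope and intercept from the `coef_` and
`intercept_` attributes, and refits the model with `fit_intercept=False` to show
how the intercept term can be disabled. The rest of the example keeps the default
`fit_intercept=True`.

.. code-block:: Python

    # Inspect the fitted parameters: with a single feature, coef_ holds the
    # slope of the regression line and intercept_ its offset.
    print(f"Slope: {regressor.coef_[0]:.2f}")
    print(f"Intercept: {regressor.intercept_:.2f}")

    # Setting fit_intercept=False forces the regression line through the
    # origin; this refit is shown only to illustrate the parameter.
    regressor_no_intercept = LinearRegression(fit_intercept=False).fit(X_train, y_train)
    print(f"Slope without an intercept: {regressor_no_intercept.coef_[0]:.2f}")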
.. GENERATED FROM PYTHON SOURCE LINES 45-50

Model evaluation
----------------

We evaluate the model's performance on the test set using the mean squared error
and the coefficient of determination.

.. GENERATED FROM PYTHON SOURCE LINES 50-57

.. code-block:: Python

    from sklearn.metrics import mean_squared_error, r2_score

    y_pred = regressor.predict(X_test)

    print(f"Mean squared error: {mean_squared_error(y_test, y_pred):.2f}")
    print(f"Coefficient of determination: {r2_score(y_test, y_pred):.2f}")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Mean squared error: 2548.07
    Coefficient of determination: 0.47

.. GENERATED FROM PYTHON SOURCE LINES 58-62

Plotting the results
--------------------

Finally, we visualize the results on the train and test data.

.. GENERATED FROM PYTHON SOURCE LINES 62-86

.. code-block:: Python

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(ncols=2, figsize=(10, 5), sharex=True, sharey=True)

    ax[0].scatter(X_train, y_train, label="Train data points")
    ax[0].plot(
        X_train,
        regressor.predict(X_train),
        linewidth=3,
        color="tab:orange",
        label="Model predictions",
    )
    ax[0].set(xlabel="Feature", ylabel="Target", title="Train set")
    ax[0].legend()

    ax[1].scatter(X_test, y_test, label="Test data points")
    ax[1].plot(X_test, y_pred, linewidth=3, color="tab:orange", label="Model predictions")
    ax[1].set(xlabel="Feature", ylabel="Target", title="Test set")
    ax[1].legend()

    fig.suptitle("Linear Regression")

    plt.show()

.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_ols_001.png
   :alt: Linear Regression, Train set, Test set
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_ols_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 87-98

Conclusion
----------

The trained model corresponds to the estimator that minimizes the mean squared
error between the predicted and the true target values on the training data.
We therefore obtain an estimator of the conditional mean of the target given
the data.

Note that in higher dimensions, minimizing only the squared error might lead to
overfitting. Therefore, regularization techniques are commonly used to prevent
this issue, such as those implemented in :class:`~sklearn.linear_model.Ridge` or
:class:`~sklearn.linear_model.Lasso`.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.146 seconds)


.. _sphx_glr_download_auto_examples_linear_model_plot_ols.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.6.X?urlpath=lab/tree/notebooks/auto_examples/linear_model/plot_ols.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/linear_model/plot_ols.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_ols.ipynb <plot_ols.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_ols.py <plot_ols.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_ols.zip <plot_ols.zip>`

.. include:: plot_ols.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_