.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/gaussian_process/plot_gpr_co2.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_gaussian_process_plot_gpr_co2.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_gaussian_process_plot_gpr_co2.py:


=====================================================================================
Forecasting of CO2 level on Mauna Loa dataset using Gaussian process regression (GPR)
=====================================================================================

This example is based on Section 5.4.3 of "Gaussian Processes for Machine
Learning" [RW2006]_. It illustrates an example of complex kernel engineering
and hyperparameter optimization using gradient ascent on the
log-marginal-likelihood. The data consists of the monthly average atmospheric
CO2 concentrations (in parts per million by volume (ppm)) collected at the
Mauna Loa Observatory in Hawaii, between 1958 and 2001. The objective is to
model the CO2 concentration as a function of the time :math:`t` and extrapolate
for years after 2001.

.. topic:: References

    .. [RW2006] `Carl Edward Rasmussen and Christopher K. I. Williams,
       "Gaussian Processes for Machine Learning",
       MIT Press, 2006
       <http://www.gaussianprocess.org/gpml/chapters/RW.pdf>`_.

.. GENERATED FROM PYTHON SOURCE LINES 22-29

.. code-block:: Python


    print(__doc__)

    # Authors: Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
    #          Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: BSD 3 clause








.. GENERATED FROM PYTHON SOURCE LINES 30-37

Build the dataset
-----------------

We derive a dataset from air samples collected at the Mauna Loa Observatory.
We are interested in estimating the CO2 concentration and extrapolating it to
later years. First, we load the original dataset available in OpenML.

.. GENERATED FROM PYTHON SOURCE LINES 37-42

.. code-block:: Python

    from sklearn.datasets import fetch_openml

    co2 = fetch_openml(data_id=41187, as_frame=True)
    co2.frame.head()






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>year</th>
          <th>month</th>
          <th>day</th>
          <th>weight</th>
          <th>flag</th>
          <th>station</th>
          <th>co2</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1958</td>
          <td>3</td>
          <td>29</td>
          <td>4</td>
          <td>0</td>
          <td>MLO</td>
          <td>316.1</td>
        </tr>
        <tr>
          <th>1</th>
          <td>1958</td>
          <td>4</td>
          <td>5</td>
          <td>6</td>
          <td>0</td>
          <td>MLO</td>
          <td>317.3</td>
        </tr>
        <tr>
          <th>2</th>
          <td>1958</td>
          <td>4</td>
          <td>12</td>
          <td>4</td>
          <td>0</td>
          <td>MLO</td>
          <td>317.6</td>
        </tr>
        <tr>
          <th>3</th>
          <td>1958</td>
          <td>4</td>
          <td>19</td>
          <td>6</td>
          <td>0</td>
          <td>MLO</td>
          <td>317.5</td>
        </tr>
        <tr>
          <th>4</th>
          <td>1958</td>
          <td>4</td>
          <td>26</td>
          <td>2</td>
          <td>0</td>
          <td>MLO</td>
          <td>316.4</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 43-45

Next, we process the original dataframe to create a date index and select
only the CO2 column.

.. GENERATED FROM PYTHON SOURCE LINES 45-52

.. code-block:: Python

    import pandas as pd

    co2_data = co2.frame
    co2_data["date"] = pd.to_datetime(co2_data[["year", "month", "day"]])
    co2_data = co2_data[["date", "co2"]].set_index("date")
    co2_data.head()






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>co2</th>
        </tr>
        <tr>
          <th>date</th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>1958-03-29</th>
          <td>316.1</td>
        </tr>
        <tr>
          <th>1958-04-05</th>
          <td>317.3</td>
        </tr>
        <tr>
          <th>1958-04-12</th>
          <td>317.6</td>
        </tr>
        <tr>
          <th>1958-04-19</th>
          <td>317.5</td>
        </tr>
        <tr>
          <th>1958-04-26</th>
          <td>316.4</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 53-55

.. code-block:: Python

    co2_data.index.min(), co2_data.index.max()





.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    (Timestamp('1958-03-29 00:00:00'), Timestamp('2001-12-29 00:00:00'))



.. GENERATED FROM PYTHON SOURCE LINES 56-59

We see that we have CO2 concentrations for some days between March 1958 and
December 2001. We can plot this raw information to get a better
understanding.

.. GENERATED FROM PYTHON SOURCE LINES 59-65

.. code-block:: Python

    import matplotlib.pyplot as plt

    co2_data.plot()
    plt.ylabel("CO$_2$ concentration (ppm)")
    _ = plt.title("Raw air samples measurements from the Mauna Loa Observatory")




.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_co2_001.png
   :alt: Raw air samples measurements from the Mauna Loa Observatory
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_co2_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 66-69

We preprocess the dataset by taking a monthly average and dropping the months
for which no measurements were collected. This processing has a
smoothing effect on the data.
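
As a toy illustration of this step, the sketch below resamples three
hypothetical samples (the values and dates are made up for the example) and
shows how a month without any measurement first appears as ``NaN`` and is then
dropped. It uses the ``"MS"`` (month start) alias only because that alias is
accepted by both older and recent pandas versions.

.. code-block:: Python

    import pandas as pd

    # Hypothetical samples: two in January 2000, none in February, one in March.
    samples = pd.Series(
        [370.0, 372.0, 374.0],
        index=pd.to_datetime(["2000-01-05", "2000-01-20", "2000-03-10"]),
        name="co2",
    )

    # One bin per calendar month, labelled by the first day of the month.
    monthly = samples.resample("MS").mean()
    # Remove the months for which the average is undefined.
    monthly_clean = monthly.dropna()

    print(monthly)
    print(monthly_clean)

The two January samples are averaged into a single value, which is the
smoothing effect mentioned above.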

.. GENERATED FROM PYTHON SOURCE LINES 69-84

.. code-block:: Python


    try:
        co2_data_resampled_monthly = co2_data.resample("ME")
    except ValueError:
        # pandas < 2.2 uses M instead of ME
        co2_data_resampled_monthly = co2_data.resample("M")


    co2_data = co2_data_resampled_monthly.mean().dropna(axis="index", how="any")
    co2_data.plot()
    plt.ylabel("Monthly average of CO$_2$ concentration (ppm)")
    _ = plt.title(
        "Monthly average of air samples measurements\nfrom the Mauna Loa Observatory"
    )




.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_co2_002.png
   :alt: Monthly average of air samples measurements from the Mauna Loa Observatory
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_co2_002.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 85-91

The idea in this example is to predict the CO2 concentration as a function
of the date. We are also interested in extrapolating to the years
after 2001.

As a first step, we separate the data from the target to estimate. Since the
data are dates, we convert them into numeric values.

.. GENERATED FROM PYTHON SOURCE LINES 91-94

.. code-block:: Python

    X = (co2_data.index.year + co2_data.index.month / 12).to_numpy().reshape(-1, 1)
    y = co2_data["co2"].to_numpy()
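
To make the encoding concrete, here is a small sketch applying the same
formula to three hypothetical month-end dates. With this convention the input
is a decimal year in which the end of December maps to the start of the
following year.

.. code-block:: Python

    import pandas as pd

    # Hypothetical month-end timestamps, similar to the resampled index.
    dates = pd.to_datetime(["1958-03-31", "1958-12-31", "1959-01-31"])

    # Same conversion as above: year plus the fraction of the year elapsed.
    decimal_year = (dates.year + dates.month / 12).to_numpy()
    X_demo = decimal_year.reshape(-1, 1)  # 2D array, as expected by scikit-learn

    print(decimal_year)
    print(X_demo.shape)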








.. GENERATED FROM PYTHON SOURCE LINES 95-109

Design the proper kernel
------------------------

To design the kernel to use with our Gaussian process, we can make some
assumptions regarding the data at hand. We observe that they have several
characteristics: a long-term rising trend, a pronounced seasonal
variation and some smaller irregularities. We can use a different appropriate
kernel to capture each of these features.

First, the long-term rising trend could be fitted using a radial basis
function (RBF) kernel with a large length-scale parameter. The RBF kernel
with a large length-scale enforces this component to be smooth. An increasing
trend is not enforced, so as to give a degree of freedom to our model. The
specific length-scale and the amplitude are free hyperparameters.

.. GENERATED FROM PYTHON SOURCE LINES 109-113

.. code-block:: Python

    from sklearn.gaussian_process.kernels import RBF

    long_term_trend_kernel = 50.0**2 * RBF(length_scale=50.0)
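
The effect of the length-scale can be checked by hand. The sketch below
re-implements the RBF correlation, :math:`\exp(-d^2 / (2 l^2))`, in plain
NumPy (the helper ``rbf_correlation`` is introduced here for illustration
only) and compares a length-scale of 50 years with a length-scale of 1 year.

.. code-block:: Python

    import numpy as np


    def rbf_correlation(d, length_scale):
        """RBF correlation between two inputs that are a distance d apart."""
        return np.exp(-0.5 * (d / length_scale) ** 2)


    gaps_in_years = np.array([1.0, 10.0, 40.0])

    # Large length-scale: values decades apart remain strongly correlated.
    smooth = rbf_correlation(gaps_in_years, length_scale=50.0)
    # Small length-scale: the correlation vanishes after a few years.
    wiggly = rbf_correlation(gaps_in_years, length_scale=1.0)

    print(smooth)
    print(wiggly)

With a length-scale of 50, observations 10 years apart still have a
correlation of about 0.98, which is what makes this component vary slowly.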








.. GENERATED FROM PYTHON SOURCE LINES 114-121

The seasonal variation is explained by the periodic exponential sine squared
kernel with a fixed periodicity of 1 year. The length-scale of this periodic
component, controlling its smoothness, is a free parameter. In order to allow
decaying away from exact periodicity, the product with an RBF kernel is
taken. The length-scale of this RBF component controls the decay time and is
a further free parameter. This type of kernel is also known as a locally
periodic kernel.
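
As a quick numerical check, the sketch below evaluates such a product kernel
(without the amplitude factor) at a few hypothetical lags. At whole-year lags
the periodic factor equals one, so only the slow RBF decay remains, while at a
half-year lag the correlation drops sharply.

.. code-block:: Python

    import numpy as np

    from sklearn.gaussian_process.kernels import RBF, ExpSineSquared

    locally_periodic = RBF(length_scale=100.0) * ExpSineSquared(
        length_scale=1.0, periodicity=1.0
    )

    origin = np.array([[0.0]])
    lags = np.array([[0.5], [1.0], [10.0]])  # lags expressed in years

    # Kernel value between the origin and each lag.
    correlations = locally_periodic(origin, lags).ravel()
    print(correlations)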

.. GENERATED FROM PYTHON SOURCE LINES 121-129

.. code-block:: Python

    from sklearn.gaussian_process.kernels import ExpSineSquared

    seasonal_kernel = (
        2.0**2
        * RBF(length_scale=100.0)
        * ExpSineSquared(length_scale=1.0, periodicity=1.0, periodicity_bounds="fixed")
    )








.. GENERATED FROM PYTHON SOURCE LINES 130-135

The small irregularities are to be explained by a rational quadratic kernel
component, whose length-scale and alpha parameter, which quantifies the
diffuseness of the length-scales, are to be determined. A rational quadratic
kernel is equivalent to a scale mixture of RBF kernels with different
length-scales and can therefore better accommodate the different
irregularities.

.. GENERATED FROM PYTHON SOURCE LINES 135-139

.. code-block:: Python

    from sklearn.gaussian_process.kernels import RationalQuadratic

    irregularities_kernel = 0.5**2 * RationalQuadratic(length_scale=1.0, alpha=1.0)
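
The role of ``alpha`` can be illustrated with a short comparison, given here
as a sketch with arbitrarily chosen lags. For a small ``alpha`` the rational
quadratic kernel has a heavier tail than an RBF kernel with the same
length-scale, and for a very large ``alpha`` the two become numerically
indistinguishable.

.. code-block:: Python

    import numpy as np

    from sklearn.gaussian_process.kernels import RBF, RationalQuadratic

    origin = np.array([[0.0]])
    lags = np.array([[0.5], [1.0], [3.0]])

    rbf_values = RBF(length_scale=1.0)(origin, lags).ravel()
    rq_small_alpha = RationalQuadratic(length_scale=1.0, alpha=1.0)(
        origin, lags
    ).ravel()
    rq_large_alpha = RationalQuadratic(length_scale=1.0, alpha=1e5)(
        origin, lags
    ).ravel()

    print(rbf_values)
    print(rq_small_alpha)  # decays more slowly than the RBF kernel
    print(rq_large_alpha)  # very close to the RBF kernel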








.. GENERATED FROM PYTHON SOURCE LINES 140-145

Finally, the noise in the dataset can be accounted for with a kernel
consisting of an RBF kernel contribution, which shall explain the correlated
noise components such as local weather phenomena, and a white kernel
contribution for the white noise. The relative amplitudes and the RBF's
length-scale are further free parameters.
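
The difference between the two contributions shows up in the covariance
matrix. In the sketch below, evaluated on three hypothetical inputs, the white
kernel only adds variance on the diagonal, whereas the short RBF term also
correlates inputs that are a few weeks apart.

.. code-block:: Python

    import numpy as np

    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    noise = 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.1**2)

    # Hypothetical inputs in decimal years: two close dates and one a year later.
    X_demo = np.array([[2000.0], [2000.05], [2001.0]])

    # Covariance of the inputs with themselves: white noise sits on the diagonal.
    K_train = noise(X_demo)
    # Cross-covariance between two sets of inputs: the white kernel contributes 0.
    K_cross = noise(X_demo, X_demo)

    print(K_train)
    print(K_cross)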

.. GENERATED FROM PYTHON SOURCE LINES 145-151

.. code-block:: Python

    from sklearn.gaussian_process.kernels import WhiteKernel

    noise_kernel = 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(
        noise_level=0.1**2, noise_level_bounds=(1e-5, 1e5)
    )








.. GENERATED FROM PYTHON SOURCE LINES 152-153

Thus, our final kernel is the sum of all the previous kernels.

.. GENERATED FROM PYTHON SOURCE LINES 153-158

.. code-block:: Python

    co2_kernel = (
        long_term_trend_kernel + seasonal_kernel + irregularities_kernel + noise_kernel
    )
    co2_kernel





.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    50**2 * RBF(length_scale=50) + 2**2 * RBF(length_scale=100) * ExpSineSquared(length_scale=1, periodicity=1) + 0.5**2 * RationalQuadratic(alpha=1, length_scale=1) + 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.01)
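
To see which hyperparameters the optimizer is allowed to tune, one can inspect
the composite kernel directly. The sketch below rebuilds an equivalent kernel
in a single expression (so that it runs on its own) and lists its
hyperparameters. Note that ``theta`` stores the free hyperparameters on a log
scale.

.. code-block:: Python

    import numpy as np

    from sklearn.gaussian_process.kernels import (
        RBF,
        ExpSineSquared,
        RationalQuadratic,
        WhiteKernel,
    )

    kernel = (
        50.0**2 * RBF(length_scale=50.0)
        + 2.0**2
        * RBF(length_scale=100.0)
        * ExpSineSquared(length_scale=1.0, periodicity=1.0, periodicity_bounds="fixed")
        + 0.5**2 * RationalQuadratic(length_scale=1.0, alpha=1.0)
        + 0.1**2 * RBF(length_scale=0.1)
        + WhiteKernel(noise_level=0.1**2, noise_level_bounds=(1e-5, 1e5))
    )

    # Every hyperparameter, with a flag telling whether it is held fixed.
    for hyperparameter in kernel.hyperparameters:
        print(hyperparameter.name, "fixed" if hyperparameter.fixed else "free")

    n_total = len(kernel.hyperparameters)
    n_free = kernel.theta.shape[0]  # fixed hyperparameters are excluded
    first_value = np.exp(kernel.theta[0])  # back from the log scale

    print(n_total, n_free, first_value)

Only the periodicity is fixed, so all the other hyperparameters are optimized
when the model is fitted.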



.. GENERATED FROM PYTHON SOURCE LINES 159-168

Model fitting and extrapolation
-------------------------------

Now, we are ready to use a Gaussian process regressor and fit the available
data. To follow the example from the literature, we will subtract the mean
from the target. We could have used `normalize_y=True`. However, doing so
would have also scaled the target (dividing `y` by its standard deviation).
Thus, the hyperparameters of the different kernels would have had a different
meaning since they would not have been expressed in ppm.
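
The difference between the two options can be seen on a handful of
hypothetical target values. This is a plain NumPy sketch of the two
transformations described above, not a call into the estimator itself.

.. code-block:: Python

    import numpy as np

    # Hypothetical CO2 values in ppm.
    y_demo = np.array([315.0, 330.0, 345.0, 360.0, 375.0])

    # Centering only: the values are shifted but still expressed in ppm.
    centered = y_demo - y_demo.mean()
    # Centering and scaling: the values become unitless.
    standardized = (y_demo - y_demo.mean()) / y_demo.std()

    print(centered.std())  # same spread as the original target
    print(standardized.std())  # unit spread

After centering, a kernel amplitude can still be read in ppm, while after
standardization it would be expressed in units of the target's standard
deviation.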

.. GENERATED FROM PYTHON SOURCE LINES 168-174

.. code-block:: Python

    from sklearn.gaussian_process import GaussianProcessRegressor

    y_mean = y.mean()
    gaussian_process = GaussianProcessRegressor(kernel=co2_kernel, normalize_y=False)
    gaussian_process.fit(X, y - y_mean)






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>#sk-container-id-29 {
      /* Definition of color scheme common for light and dark mode */
      --sklearn-color-text: black;
      --sklearn-color-line: gray;
      /* Definition of color scheme for unfitted estimators */
      --sklearn-color-unfitted-level-0: #fff5e6;
      --sklearn-color-unfitted-level-1: #f6e4d2;
      --sklearn-color-unfitted-level-2: #ffe0b3;
      --sklearn-color-unfitted-level-3: chocolate;
      /* Definition of color scheme for fitted estimators */
      --sklearn-color-fitted-level-0: #f0f8ff;
      --sklearn-color-fitted-level-1: #d4ebff;
      --sklearn-color-fitted-level-2: #b3dbfd;
      --sklearn-color-fitted-level-3: cornflowerblue;

      /* Specific color for light theme */
      --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));
      --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));
      --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));
      --sklearn-color-icon: #696969;

      @media (prefers-color-scheme: dark) {
        /* Redefinition of color scheme for dark theme */
        --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));
        --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));
        --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));
        --sklearn-color-icon: #878787;
      }
    }

    #sk-container-id-29 {
      color: var(--sklearn-color-text);
    }

    #sk-container-id-29 pre {
      padding: 0;
    }

    #sk-container-id-29 input.sk-hidden--visually {
      border: 0;
      clip: rect(1px 1px 1px 1px);
      clip: rect(1px, 1px, 1px, 1px);
      height: 1px;
      margin: -1px;
      overflow: hidden;
      padding: 0;
      position: absolute;
      width: 1px;
    }

    #sk-container-id-29 div.sk-dashed-wrapped {
      border: 1px dashed var(--sklearn-color-line);
      margin: 0 0.4em 0.5em 0.4em;
      box-sizing: border-box;
      padding-bottom: 0.4em;
      background-color: var(--sklearn-color-background);
    }

    #sk-container-id-29 div.sk-container {
      /* jupyter's `normalize.less` sets `[hidden] { display: none; }`
         but bootstrap.min.css set `[hidden] { display: none !important; }`
         so we also need the `!important` here to be able to override the
         default hidden behavior on the sphinx rendered scikit-learn.org.
         See: https://github.com/scikit-learn/scikit-learn/issues/21755 */
      display: inline-block !important;
      position: relative;
    }

    #sk-container-id-29 div.sk-text-repr-fallback {
      display: none;
    }

    div.sk-parallel-item,
    div.sk-serial,
    div.sk-item {
      /* draw centered vertical line to link estimators */
      background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));
      background-size: 2px 100%;
      background-repeat: no-repeat;
      background-position: center center;
    }

    /* Parallel-specific style estimator block */

    #sk-container-id-29 div.sk-parallel-item::after {
      content: "";
      width: 100%;
      border-bottom: 2px solid var(--sklearn-color-text-on-default-background);
      flex-grow: 1;
    }

    #sk-container-id-29 div.sk-parallel {
      display: flex;
      align-items: stretch;
      justify-content: center;
      background-color: var(--sklearn-color-background);
      position: relative;
    }

    #sk-container-id-29 div.sk-parallel-item {
      display: flex;
      flex-direction: column;
    }

    #sk-container-id-29 div.sk-parallel-item:first-child::after {
      align-self: flex-end;
      width: 50%;
    }

    #sk-container-id-29 div.sk-parallel-item:last-child::after {
      align-self: flex-start;
      width: 50%;
    }

    #sk-container-id-29 div.sk-parallel-item:only-child::after {
      width: 0;
    }

    /* Serial-specific style estimator block */

    #sk-container-id-29 div.sk-serial {
      display: flex;
      flex-direction: column;
      align-items: center;
      background-color: var(--sklearn-color-background);
      padding-right: 1em;
      padding-left: 1em;
    }


    /* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is
    clickable and can be expanded/collapsed.
    - Pipeline and ColumnTransformer use this feature and define the default style
    - Estimators will overwrite some part of the style using the `sk-estimator` class
    */

    /* Pipeline and ColumnTransformer style (default) */

    #sk-container-id-29 div.sk-toggleable {
      /* Default theme specific background. It is overwritten whether we have a
      specific estimator or a Pipeline/ColumnTransformer */
      background-color: var(--sklearn-color-background);
    }

    /* Toggleable label */
    #sk-container-id-29 label.sk-toggleable__label {
      cursor: pointer;
      display: block;
      width: 100%;
      margin-bottom: 0;
      padding: 0.5em;
      box-sizing: border-box;
      text-align: center;
    }

    #sk-container-id-29 label.sk-toggleable__label-arrow:before {
      /* Arrow on the left of the label */
      content: "▸";
      float: left;
      margin-right: 0.25em;
      color: var(--sklearn-color-icon);
    }

    #sk-container-id-29 label.sk-toggleable__label-arrow:hover:before {
      color: var(--sklearn-color-text);
    }

    /* Toggleable content - dropdown */

    #sk-container-id-29 div.sk-toggleable__content {
      max-height: 0;
      max-width: 0;
      overflow: hidden;
      text-align: left;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    #sk-container-id-29 div.sk-toggleable__content.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    #sk-container-id-29 div.sk-toggleable__content pre {
      margin: 0.2em;
      border-radius: 0.25em;
      color: var(--sklearn-color-text);
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    #sk-container-id-29 div.sk-toggleable__content.fitted pre {
      /* unfitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    #sk-container-id-29 input.sk-toggleable__control:checked~div.sk-toggleable__content {
      /* Expand drop-down */
      max-height: 200px;
      max-width: 100%;
      overflow: auto;
    }

    #sk-container-id-29 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {
      content: "▾";
    }

    /* Pipeline/ColumnTransformer-specific style */

    #sk-container-id-29 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    #sk-container-id-29 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator-specific style */

    /* Colorize estimator box */
    #sk-container-id-29 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    #sk-container-id-29 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    #sk-container-id-29 div.sk-label label.sk-toggleable__label,
    #sk-container-id-29 div.sk-label label {
      /* The background is the default theme color */
      color: var(--sklearn-color-text-on-default-background);
    }

    /* On hover, darken the color of the background */
    #sk-container-id-29 div.sk-label:hover label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    /* Label box, darken color on hover, fitted */
    #sk-container-id-29 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator label */

    #sk-container-id-29 div.sk-label label {
      font-family: monospace;
      font-weight: bold;
      display: inline-block;
      line-height: 1.2em;
    }

    #sk-container-id-29 div.sk-label-container {
      text-align: center;
    }

    /* Estimator-specific */
    #sk-container-id-29 div.sk-estimator {
      font-family: monospace;
      border: 1px dotted var(--sklearn-color-border-box);
      border-radius: 0.25em;
      box-sizing: border-box;
      margin-bottom: 0.5em;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    #sk-container-id-29 div.sk-estimator.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    /* on hover */
    #sk-container-id-29 div.sk-estimator:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    #sk-container-id-29 div.sk-estimator.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Specification for estimator info (e.g. "i" and "?") */

    /* Common style for "i" and "?" */

    .sk-estimator-doc-link,
    a:link.sk-estimator-doc-link,
    a:visited.sk-estimator-doc-link {
      float: right;
      font-size: smaller;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-background);
      border-radius: 1em;
      height: 1em;
      width: 1em;
      text-decoration: none !important;
      margin-left: 1ex;
      /* unfitted */
      border: var(--sklearn-color-unfitted-level-1) 1pt solid;
      color: var(--sklearn-color-unfitted-level-1);
    }

    .sk-estimator-doc-link.fitted,
    a:link.sk-estimator-doc-link.fitted,
    a:visited.sk-estimator-doc-link.fitted {
      /* fitted */
      border: var(--sklearn-color-fitted-level-1) 1pt solid;
      color: var(--sklearn-color-fitted-level-1);
    }

    /* On hover */
    div.sk-estimator:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover,
    div.sk-label-container:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      color: var(--sklearn-color-background);
      text-decoration: none;
    }

    div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover,
    div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
      color: var(--sklearn-color-background);
      text-decoration: none;
    }

    /* Span, style for the box shown on hovering the info icon */
    .sk-estimator-doc-link span {
      display: none;
      z-index: 9999;
      position: relative;
      font-weight: normal;
      right: .2ex;
      padding: .5ex;
      margin: .5ex;
      width: min-content;
      min-width: 20ex;
      max-width: 50ex;
      color: var(--sklearn-color-text);
      box-shadow: 2pt 2pt 4pt #999;
      /* unfitted */
      background: var(--sklearn-color-unfitted-level-0);
      border: .5pt solid var(--sklearn-color-unfitted-level-3);
    }

    .sk-estimator-doc-link.fitted span {
      /* fitted */
      background: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-3);
    }

    .sk-estimator-doc-link:hover span {
      display: block;
    }

    /* "?"-specific style due to the `<a>` HTML tag */

    #sk-container-id-29 a.estimator_doc_link {
      float: right;
      font-size: 1rem;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-background);
      border-radius: 1rem;
      height: 1rem;
      width: 1rem;
      text-decoration: none;
      /* unfitted */
      color: var(--sklearn-color-unfitted-level-1);
      border: var(--sklearn-color-unfitted-level-1) 1pt solid;
    }

    #sk-container-id-29 a.estimator_doc_link.fitted {
      /* fitted */
      border: var(--sklearn-color-fitted-level-1) 1pt solid;
      color: var(--sklearn-color-fitted-level-1);
    }

    /* On hover */
    #sk-container-id-29 a.estimator_doc_link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      color: var(--sklearn-color-background);
      text-decoration: none;
    }

    #sk-container-id-29 a.estimator_doc_link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
    }
    </style><div id="sk-container-id-29" class="sk-top-container"><div class="sk-text-repr-fallback"><pre>GaussianProcessRegressor(kernel=50**2 * RBF(length_scale=50) + 2**2 * RBF(length_scale=100) * ExpSineSquared(length_scale=1, periodicity=1) + 0.5**2 * RationalQuadratic(alpha=1, length_scale=1) + 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.01))</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item"><div class="sk-estimator fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-112" type="checkbox" checked><label for="sk-estimator-id-112" class="sk-toggleable__label fitted sk-toggleable__label-arrow fitted">&nbsp;&nbsp;GaussianProcessRegressor<a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.4/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html">?<span>Documentation for GaussianProcessRegressor</span></a><span class="sk-estimator-doc-link fitted">i<span>Fitted</span></span></label><div class="sk-toggleable__content fitted"><pre>GaussianProcessRegressor(kernel=50**2 * RBF(length_scale=50) + 2**2 * RBF(length_scale=100) * ExpSineSquared(length_scale=1, periodicity=1) + 0.5**2 * RationalQuadratic(alpha=1, length_scale=1) + 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.01))</pre></div> </div></div></div></div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 175-182

Now, we will use the Gaussian process to predict on:

- training data to inspect the goodness of fit;
- future data to see the extrapolation done by the model.

Thus, we create synthetic data from 1958 to the current month. In addition,
we need to add back the mean that was subtracted during training.

.. GENERATED FROM PYTHON SOURCE LINES 182-192

.. code-block:: Python

    import datetime

    import numpy as np

    today = datetime.datetime.now()
    current_month = today.year + today.month / 12
    X_test = np.linspace(start=1958, stop=current_month, num=1_000).reshape(-1, 1)
    mean_y_pred, std_y_pred = gaussian_process.predict(X_test, return_std=True)
    mean_y_pred += y_mean
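
The same pattern can be tried on a small synthetic problem. The sketch below
uses made-up data and a fixed kernel (``optimizer=None``, chosen only to keep
the example fast and deterministic). It shows that far away from the training
inputs the prediction falls back to the training mean and the predicted
standard deviation grows.

.. code-block:: Python

    import numpy as np

    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X_train = np.linspace(0.0, 10.0, 30).reshape(-1, 1)
    y_train = 300.0 + np.sin(X_train).ravel() + 0.1 * rng.normal(size=30)

    y_train_mean = y_train.mean()
    gpr = GaussianProcessRegressor(
        kernel=1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01),
        optimizer=None,  # keep the kernel hyperparameters as given
    )
    gpr.fit(X_train, y_train - y_train_mean)

    # One input inside the training range and one far outside of it.
    X_new = np.array([[5.0], [100.0]])
    mean_pred, std_pred = gpr.predict(X_new, return_std=True)
    mean_pred = mean_pred + y_train_mean  # add back the subtracted mean

    print(mean_pred)
    print(std_pred)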








.. GENERATED FROM PYTHON SOURCE LINES 193-209

.. code-block:: Python

    plt.plot(X, y, color="black", linestyle="dashed", label="Measurements")
    plt.plot(X_test, mean_y_pred, color="tab:blue", alpha=0.4, label="Gaussian process")
    plt.fill_between(
        X_test.ravel(),
        mean_y_pred - std_y_pred,
        mean_y_pred + std_y_pred,
        color="tab:blue",
        alpha=0.2,
    )
    plt.legend()
    plt.xlabel("Year")
    plt.ylabel("Monthly average of CO$_2$ concentration (ppm)")
    _ = plt.title(
        "Monthly average of air samples measurements\nfrom the Mauna Loa Observatory"
    )




.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_co2_003.png
   :alt: Monthly average of air samples measurements from the Mauna Loa Observatory
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_co2_003.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 210-217

Our fitted model is able to fit the past data properly and to extrapolate to
future years with confidence.

Interpretation of kernel hyperparameters
----------------------------------------

Now, we can have a look at the hyperparameters of the kernel.

.. GENERATED FROM PYTHON SOURCE LINES 217-219

.. code-block:: Python

    gaussian_process.kernel_





.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    44.8**2 * RBF(length_scale=51.6) + 2.64**2 * RBF(length_scale=91.5) * ExpSineSquared(length_scale=1.48, periodicity=1) + 0.536**2 * RationalQuadratic(alpha=2.89, length_scale=0.968) + 0.188**2 * RBF(length_scale=0.122) + WhiteKernel(noise_level=0.0367)



.. GENERATED FROM PYTHON SOURCE LINES 220-228

Thus, most of the target signal, with the mean subtracted, is explained by a
long-term rising trend with an amplitude of ~45 ppm and a length-scale of ~52
years. The periodic component has an amplitude of ~2.6 ppm, a decay time of
~90 years and a length-scale of ~1.5. The long decay time indicates that this
component is very close to an exact seasonal periodicity. The correlated
noise has an amplitude of ~0.2 ppm with a length-scale of ~0.12 years, and the
white-noise contribution has a variance of ~0.04 :math:`\text{ppm}^2`, that
is, a standard deviation of ~0.2 ppm. Thus, the overall noise level is very
small, indicating that the data can be very well explained by the model.
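
The amplitudes quoted above are square roots of the variance-like parameters
shown in the kernel representation. As a sketch, the snippet below rebuilds
only the noise part of the fitted kernel, with the values copied by hand from
the printed output, and converts them to standard deviations in ppm.

.. code-block:: Python

    import numpy as np

    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Noise part of the fitted kernel, with values copied from the printed output.
    fitted_noise = 0.188**2 * RBF(length_scale=0.122) + WhiteKernel(
        noise_level=0.0367
    )

    # A sum kernel exposes its two terms as `k1` and `k2`.
    correlated_part, white_part = fitted_noise.k1, fitted_noise.k2

    # `constant_value` and `noise_level` are variances, hence the square roots.
    correlated_std = np.sqrt(correlated_part.k1.constant_value)
    white_std = np.sqrt(white_part.noise_level)
    total_noise_std = np.sqrt(correlated_std**2 + white_std**2)

    print(correlated_std, white_std, total_noise_std)

Both noise terms have a standard deviation of roughly 0.2 ppm, which is small
compared with the ~45 ppm amplitude of the long-term trend.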


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 45.839 seconds)


.. _sphx_glr_download_auto_examples_gaussian_process_plot_gpr_co2.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.4.X?urlpath=lab/tree/notebooks/auto_examples/gaussian_process/plot_gpr_co2.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/gaussian_process/plot_gpr_co2.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_gpr_co2.ipynb <plot_gpr_co2.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_gpr_co2.py <plot_gpr_co2.py>`


.. include:: plot_gpr_co2.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_