.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/gaussian_process/plot_gpr_noisy_targets.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_gaussian_process_plot_gpr_noisy_targets.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_gaussian_process_plot_gpr_noisy_targets.py:


=========================================================
Gaussian Processes regression: basic introductory example
=========================================================

A simple one-dimensional regression example computed in two different ways:

1. A noise-free case
2. A noisy case with known noise-level per datapoint

In both cases, the kernel's parameters are estimated using the maximum
likelihood principle.

The figures illustrate the interpolating property of the Gaussian Process model
as well as its probabilistic nature in the form of a pointwise 95% confidence
interval.

Note that ``alpha`` is a parameter to control the strength of the Tikhonov
regularization on the assumed training points' covariance matrix.

.. GENERATED FROM PYTHON SOURCE LINES 21-28

.. code-block:: default


    # Author: Vincent Dubourg
    #         Jake Vanderplas
    #         Jan Hendrik Metzen
    #         Guillaume Lemaitre
    # License: BSD 3 clause

.. GENERATED FROM PYTHON SOURCE LINES 29-34

Dataset generation
------------------

We will start by generating a synthetic dataset. The true generative process
is defined as :math:`f(x) = x \sin(x)`.

.. GENERATED FROM PYTHON SOURCE LINES 34-39

.. code-block:: default

    import numpy as np

    X = np.linspace(start=0, stop=10, num=1_000).reshape(-1, 1)
    y = np.squeeze(X * np.sin(X))

.. GENERATED FROM PYTHON SOURCE LINES 40-48

.. code-block:: default

    import matplotlib.pyplot as plt

    plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
    plt.legend()
    plt.xlabel("$x$")
    plt.ylabel("$f(x)$")
    _ = plt.title("True generative process")

.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_001.png
   :alt: True generative process
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 49-58

We will use this dataset in the next experiment to illustrate how Gaussian
Process regression works.

Example with noise-free target
------------------------------

In this first example, we will use the true generative process without
adding any noise. For training the Gaussian Process regression, we will only
select a few samples.

.. GENERATED FROM PYTHON SOURCE LINES 58-62

.. code-block:: default

    rng = np.random.RandomState(1)
    training_indices = rng.choice(np.arange(y.size), size=6, replace=False)
    X_train, y_train = X[training_indices], y[training_indices]

.. GENERATED FROM PYTHON SOURCE LINES 63-66

Now, we fit a Gaussian process on these few training data samples. We will
use a radial basis function (RBF) kernel and a constant parameter to fit the
amplitude.

.. GENERATED FROM PYTHON SOURCE LINES 66-74

.. code-block:: default

    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
    gaussian_process = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
    gaussian_process.fit(X_train, y_train)
    gaussian_process.kernel_

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    5.02**2 * RBF(length_scale=1.43)

.. GENERATED FROM PYTHON SOURCE LINES 75-78

After fitting our model, we see that the hyperparameters of the kernel have
been optimized.
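
To make the effect of this optimization concrete, the following minimal sketch
compares the log-marginal likelihood of the initial kernel with that of the
fitted kernel. It rebuilds the same training set so that it runs on its own.
The ``random_state=0`` argument and the variable names ``initial_lml`` and
``optimized_lml`` are additions of this sketch and are not part of the original
example; the printed values may vary slightly across scikit-learn versions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Rebuild the noise-free training set used in this example.
X = np.linspace(start=0, stop=10, num=1_000).reshape(-1, 1)
y = np.squeeze(X * np.sin(X))
rng = np.random.RandomState(1)
training_indices = rng.choice(np.arange(y.size), size=6, replace=False)
X_train, y_train = X[training_indices], y[training_indices]

kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
# Assumption of this sketch: random_state fixes the optimizer restarts.
gaussian_process = GaussianProcessRegressor(
    kernel=kernel, n_restarts_optimizer=9, random_state=0
)
gaussian_process.fit(X_train, y_train)

# `kernel` keeps the initial hyperparameters, `kernel_` holds the fitted ones.
# `theta` stores the hyperparameters on a log scale.
initial_lml = gaussian_process.log_marginal_likelihood(gaussian_process.kernel.theta)
optimized_lml = gaussian_process.log_marginal_likelihood_value_

print(f"Initial kernel:   {gaussian_process.kernel}")
print(f"Optimized kernel: {gaussian_process.kernel_}")
print(f"Log-marginal likelihood: {initial_lml:.2f} -> {optimized_lml:.2f}")
```

Since the first optimizer run starts from the initial hyperparameters, the
fitted kernel should reach a log-marginal likelihood at least as high as the
initial one.
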
Now, we will use the fitted model to compute the mean prediction over the
full dataset and plot the 95% confidence interval.

.. GENERATED FROM PYTHON SOURCE LINES 78-95

.. code-block:: default

    mean_prediction, std_prediction = gaussian_process.predict(X, return_std=True)

    plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
    plt.scatter(X_train, y_train, label="Observations")
    plt.plot(X, mean_prediction, label="Mean prediction")
    plt.fill_between(
        X.ravel(),
        mean_prediction - 1.96 * std_prediction,
        mean_prediction + 1.96 * std_prediction,
        alpha=0.5,
        label=r"95% confidence interval",
    )
    plt.legend()
    plt.xlabel("$x$")
    plt.ylabel("$f(x)$")
    _ = plt.title("Gaussian process regression on noise-free dataset")

.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_002.png
   :alt: Gaussian process regression on noise-free dataset
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 96-109

We see that for a prediction made on a data point close to one from the
training set, the 95% confidence interval is narrow. When a sample lies far
from the training data, the model's prediction is less accurate and less
precise (higher uncertainty).

Example with noisy targets
--------------------------

We can repeat a similar experiment, this time adding noise to the target.
This lets us see the effect of the noise on the fitted model.

We add some random Gaussian noise to the target with an arbitrary
standard deviation.

.. GENERATED FROM PYTHON SOURCE LINES 109-112

.. code-block:: default

    noise_std = 0.75
    y_train_noisy = y_train + rng.normal(loc=0.0, scale=noise_std, size=y_train.shape)

.. GENERATED FROM PYTHON SOURCE LINES 113-116

We create a similar Gaussian process model. In addition to the kernel, this
time, we specify the parameter ``alpha``, which can be interpreted as the
variance of Gaussian noise on the targets.

.. GENERATED FROM PYTHON SOURCE LINES 116-122

.. code-block:: default

    gaussian_process = GaussianProcessRegressor(
        kernel=kernel, alpha=noise_std**2, n_restarts_optimizer=9
    )
    gaussian_process.fit(X_train, y_train_noisy)
    mean_prediction, std_prediction = gaussian_process.predict(X, return_std=True)

.. GENERATED FROM PYTHON SOURCE LINES 123-124

Let's plot the mean prediction and the uncertainty region as before.

.. GENERATED FROM PYTHON SOURCE LINES 124-149

.. code-block:: default

    plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
    plt.errorbar(
        X_train,
        y_train_noisy,
        noise_std,
        linestyle="None",
        color="tab:blue",
        marker=".",
        markersize=10,
        label="Observations",
    )
    plt.plot(X, mean_prediction, label="Mean prediction")
    plt.fill_between(
        X.ravel(),
        mean_prediction - 1.96 * std_prediction,
        mean_prediction + 1.96 * std_prediction,
        color="tab:orange",
        alpha=0.5,
        label=r"95% confidence interval",
    )
    plt.legend()
    plt.xlabel("$x$")
    plt.ylabel("$f(x)$")
    _ = plt.title("Gaussian process regression on a noisy dataset")

.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_003.png
   :alt: Gaussian process regression on a noisy dataset
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 150-154

The noise affects the predictions close to the training samples: the
predictive uncertainty near the training samples is larger because we
explicitly model a given level of target noise that is independent of the
input variable.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.438 seconds)

.. _sphx_glr_download_auto_examples_gaussian_process_plot_gpr_noisy_targets.py:

.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example

  .. container:: binder-badge

    .. image:: images/binder_badge_logo.svg
      :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.1.X?urlpath=lab/tree/notebooks/auto_examples/gaussian_process/plot_gpr_noisy_targets.ipynb
      :alt: Launch binder
      :width: 150 px

  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_gpr_noisy_targets.py <plot_gpr_noisy_targets.py>`

  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_gpr_noisy_targets.ipynb <plot_gpr_noisy_targets.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_