.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_gaussian_process_plot_compare_gpr_krr.py:

==========================================================
Comparison of kernel ridge and Gaussian process regression
==========================================================

Both kernel ridge regression (KRR) and Gaussian process regression (GPR) learn
a target function by internally employing the "kernel trick". KRR learns a
linear function in the space induced by the respective kernel, which
corresponds to a nonlinear function in the original space. The linear function
in the kernel space is chosen based on the mean-squared error loss with
ridge regularization. GPR uses the kernel to define the covariance of
a prior distribution over the target functions and uses the observed training
data to define a likelihood function. Based on Bayes' theorem, a (Gaussian)
posterior distribution over target functions is defined, whose mean is used
for prediction.

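
The relationship between the two predictors can be checked numerically. The
following is a minimal NumPy sketch, not the scikit-learn implementation: it
assumes a squared-exponential kernel (chosen here only for brevity), a toy
dataset, and a single value ``noise`` that serves both as the ridge penalty of
KRR and as the noise variance of GPR. Under those assumptions the KRR
prediction and the GPR posterior mean coincide, while only GPR also yields a
posterior variance.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale ** 2)


# Toy data (assumption for illustration, not the dataset of this example).
rng = np.random.RandomState(0)
X_train = 5 * rng.rand(20, 1)
y_train = np.sin(X_train).ravel() + 0.1 * rng.randn(20)
X_test = np.linspace(0, 5, 50)[:, None]

noise = 0.1  # ridge penalty for KRR, noise variance for GPR (assumed value)

K = rbf_kernel(X_train, X_train)
K_star = rbf_kernel(X_test, X_train)
K_reg = K + noise * np.eye(len(X_train))

# KRR: dual coefficients of the ridge-regularized least-squares problem.
dual_coef = np.linalg.solve(K_reg, y_train)
y_krr = K_star @ dual_coef

# GPR: posterior mean and variance through a Cholesky factorization.
L = np.linalg.cholesky(K_reg)
weights = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
gp_mean = K_star @ weights
v = np.linalg.solve(L, K_star.T)
gp_var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance k(x, x) is 1 here

print("KRR prediction equals GPR posterior mean:", np.allclose(y_krr, gp_mean))
```

The two methods differ in how the kernel hyperparameters and the noise level
are chosen, and in the additional uncertainty estimate ``gp_var`` that GPR
provides.
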
A major difference is that GPR can choose the kernel's hyperparameters based
on gradient ascent on the marginal likelihood function, while KRR needs to
perform a grid search on a cross-validated loss function (mean-squared error
loss). A further difference is that GPR learns a generative, probabilistic
model of the target function and can thus provide meaningful confidence
intervals and posterior samples along with the predictions, while KRR only
provides predictions.

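
A short sketch of the probabilistic outputs described above, using
scikit-learn's ``GaussianProcessRegressor``. The toy data, the ``RBF`` plus
``WhiteKernel`` kernel, and the 1.96 multiplier for an approximate 95%
interval are assumptions made for illustration; they are not the settings of
this example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy data (assumption for illustration).
rng = np.random.RandomState(0)
X_train = 10 * rng.rand(30, 1)
y_train = np.sin(X_train).ravel() + 0.3 * rng.randn(30)

# Initial hyperparameters; fitting tunes them by maximizing the
# log marginal likelihood with a gradient-based optimizer.
initial_kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=initial_kernel, random_state=0)
gpr.fit(X_train, y_train)

# Log marginal likelihood before and after hyperparameter optimization.
lml_initial = gpr.log_marginal_likelihood(initial_kernel.theta)
lml_fitted = gpr.log_marginal_likelihood_value_
print("Initial kernel:", initial_kernel, "LML: %.3f" % lml_initial)
print("Fitted kernel: ", gpr.kernel_, "LML: %.3f" % lml_fitted)

# Predictive mean, standard deviation, and an approximate 95% interval.
X_query = np.linspace(0, 10, 200)[:, None]
y_mean, y_std = gpr.predict(X_query, return_std=True)
lower, upper = y_mean - 1.96 * y_std, y_mean + 1.96 * y_std

# Three functions drawn from the posterior, one per column.
samples = gpr.sample_y(X_query, n_samples=3, random_state=0)
```

``KernelRidge`` has no counterpart to ``return_std`` or ``sample_y``; its
``predict`` method returns point predictions only.
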
This example illustrates both methods on an artificial dataset, which
consists of a sinusoidal target function and strong noise. The figure compares
the learned models of KRR and GPR based on an ExpSineSquared kernel, which is
suited for learning periodic functions. The kernel's hyperparameters control
the smoothness (l) and the periodicity (p) of the kernel. Moreover, the noise
level of the data is learned explicitly: by GPR through an additional
WhiteKernel component in the kernel, and by KRR through its regularization
parameter alpha.

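
To make the roles of l and p concrete, here is a minimal NumPy sketch of the
ExpSineSquared formula as given in the scikit-learn documentation,
``k(x, x') = exp(-2 * sin(pi * |x - x'| / p)**2 / l**2)``. The helper name
``exp_sine_squared`` and the sample values are hypothetical and for
illustration only.

```python
import numpy as np


def exp_sine_squared(x1, x2, length_scale=1.0, periodicity=1.0):
    """Periodic kernel matrix between two 1-D arrays of inputs."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    dists = np.abs(x1[:, None] - x2[None, :])
    return np.exp(
        -2.0 * np.sin(np.pi * dists / periodicity) ** 2 / length_scale ** 2
    )


x = np.array([0.0, 1.0, 2.0])
p = 2 * np.pi

# Inputs exactly one period apart are perfectly correlated: the diagonal
# of this matrix is 1 up to floating-point rounding.
K_shifted = exp_sine_squared(x, x + p, length_scale=1.0, periodicity=p)

# A smaller length scale makes the correlation decay faster within a period.
k_smooth = exp_sine_squared([0.0], [1.0], length_scale=2.0, periodicity=p)[0, 0]
k_rough = exp_sine_squared([0.0], [1.0], length_scale=0.5, periodicity=p)[0, 0]

# A white-noise component adds its noise level to the diagonal of the
# training covariance only (0.1 is an assumed value).
K_noisy = exp_sine_squared(x, x, periodicity=p) + 0.1 * np.eye(len(x))
```
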
The figure shows that both methods learn reasonable models of the target
function. GPR correctly identifies the periodicity of the function to be
roughly 2*pi (6.28), while KRR chooses the doubled periodicity 4*pi. Besides
that, GPR provides reasonable confidence bounds on the prediction, which are
not available for KRR. A major difference between the two methods is the time
required for fitting and predicting: while fitting KRR is fast in principle,
the grid search for hyperparameter optimization scales exponentially with the
number of hyperparameters ("curse of dimensionality"). The gradient-based
optimization of the parameters in GPR does not suffer from this exponential
scaling and is thus considerably faster on this example with its 3-dimensional
hyperparameter space. The time for predicting is similar; however, generating
the variance of the predictive distribution of GPR takes longer than just
predicting the mean.

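
The cost of the grid search can be made explicit by counting model fits. The
sketch below uses only the standard library and mirrors the grid sizes of
this example (4 values of alpha, 10 values each of l and p, 5 folds); the
helper ``grid_size`` is hypothetical and assumes the same number of values on
every axis.

```python
from itertools import product

alphas = [1e0, 1e-1, 1e-2, 1e-3]
n_length_scales = 10
n_periodicities = 10
n_folds = 5

# Every combination of hyperparameter values is one candidate model.
candidates = list(
    product(alphas, range(n_length_scales), range(n_periodicities))
)
n_candidates = len(candidates)

# Each candidate is fitted once per cross-validation fold.
n_cv_fits = n_candidates * n_folds
print(n_candidates, n_cv_fits)  # prints "400 2000"


def grid_size(values_per_axis, n_hyperparameters):
    """Number of grid points when every axis has the same resolution."""
    return values_per_axis ** n_hyperparameters


# With 10 values per axis, each extra hyperparameter multiplies the cost by 10.
sizes = [grid_size(10, d) for d in (1, 2, 3, 4)]
print(sizes)  # prints "[10, 100, 1000, 10000]"
```
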
.. image:: /auto_examples/gaussian_process/images/sphx_glr_plot_compare_gpr_krr_001.png
    :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Time for KRR fitting: 3.180
    Time for GPR fitting: 0.096
    Time for KRR prediction: 0.009
    Time for GPR prediction: 0.010
    Time for GPR prediction with standard-deviation: 0.014

.. code-block:: python

    print(__doc__)

    # Authors: Jan Hendrik Metzen
    # License: BSD 3 clause

    import time

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import GridSearchCV
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import WhiteKernel, ExpSineSquared

    rng = np.random.RandomState(0)

    # Generate sample data
    X = 15 * rng.rand(100, 1)
    y = np.sin(X).ravel()
    y += 3 * (0.5 - rng.rand(X.shape[0]))  # add uniform noise of amplitude 1.5

    # Fit KernelRidge with parameter selection based on 5-fold cross validation
    param_grid = {"alpha": [1e0, 1e-1, 1e-2, 1e-3],
                  "kernel": [ExpSineSquared(l, p)
                             for l in np.logspace(-2, 2, 10)
                             for p in np.logspace(0, 2, 10)]}
    kr = GridSearchCV(KernelRidge(), cv=5, param_grid=param_grid)
    stime = time.time()
    kr.fit(X, y)
    print("Time for KRR fitting: %.3f" % (time.time() - stime))

    # Fit GPR; the WhiteKernel term lets the noise level be learned as well
    gp_kernel = ExpSineSquared(1.0, 5.0, periodicity_bounds=(1e-2, 1e1)) \
        + WhiteKernel(1e-1)
    gpr = GaussianProcessRegressor(kernel=gp_kernel)
    stime = time.time()
    gpr.fit(X, y)
    print("Time for GPR fitting: %.3f" % (time.time() - stime))

    # Predict using kernel ridge
    X_plot = np.linspace(0, 20, 10000)[:, None]
    stime = time.time()
    y_kr = kr.predict(X_plot)
    print("Time for KRR prediction: %.3f" % (time.time() - stime))

    # Predict using gaussian process regressor (mean only)
    stime = time.time()
    y_gpr = gpr.predict(X_plot, return_std=False)
    print("Time for GPR prediction: %.3f" % (time.time() - stime))

    # Predict again, this time also requesting the standard deviation
    stime = time.time()
    y_gpr, y_std = gpr.predict(X_plot, return_std=True)
    print("Time for GPR prediction with standard-deviation: %.3f"
          % (time.time() - stime))

    # Plot results
    plt.figure(figsize=(10, 5))
    lw = 2
    plt.scatter(X, y, c='k', label='data')
    plt.plot(X_plot, np.sin(X_plot), color='navy', lw=lw, label='True')
    plt.plot(X_plot, y_kr, color='turquoise', lw=lw,
             label='KRR (%s)' % kr.best_params_)
    plt.plot(X_plot, y_gpr, color='darkorange', lw=lw,
             label='GPR (%s)' % gpr.kernel_)
    # Shade one standard deviation around the GPR mean
    plt.fill_between(X_plot[:, 0], y_gpr - y_std, y_gpr + y_std,
                     color='darkorange', alpha=0.2)
    plt.xlabel('data')
    plt.ylabel('target')
    plt.xlim(0, 20)
    plt.ylim(-4, 4)
    plt.title('GPR versus Kernel Ridge')
    plt.legend(loc="best", scatterpoints=1, prop={'size': 8})
    plt.show()

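
The script above measures elapsed time with ``time.time()``. As a possible
variation, the sketch below wraps the measurement in a small context manager
built on ``time.perf_counter()``, a monotonic clock intended for measuring
short durations. The helper ``timed`` and the toy workload are hypothetical
and not part of the example.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block under ``label``."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

# Toy workload standing in for a call such as ``kr.fit(X, y)``.
with timed("toy workload", timings):
    total = sum(i * i for i in range(100_000))

print("Time for toy workload: %.3f" % timings["toy workload"])
```
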
**Total running time of the script:** ( 0 minutes 3.377 seconds)