.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/applications/plot_digits_denoising.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_applications_plot_digits_denoising.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_applications_plot_digits_denoising.py:


================================
Image denoising using kernel PCA
================================

This example shows how to use :class:`~sklearn.decomposition.KernelPCA` to
denoise images. In short, we take advantage of the approximation function
learned during `fit` to reconstruct the original image.

We will compare the results with an exact reconstruction using
:class:`~sklearn.decomposition.PCA`.

We will use the USPS digits dataset to reproduce the experiment presented in
Sect. 4 of [1]_.

.. topic:: References

   .. [1] `Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf.
      "Learning to find pre-images."
      Advances in neural information processing systems 16 (2004): 449-456.
      <https://papers.nips.cc/paper/2003/file/ac1ad983e08ad3304a97e147f522747e-Paper.pdf>`_

.. GENERATED FROM PYTHON SOURCE LINES 23-27

.. code-block:: default


    # Authors: Guillaume Lemaitre <guillaume.lemaitre@inria.fr>
    # Licence: BSD 3 clause








.. GENERATED FROM PYTHON SOURCE LINES 28-34

Load the dataset via OpenML
---------------------------

The USPS digits dataset is available on OpenML. We use
:func:`~sklearn.datasets.fetch_openml` to get this dataset. In addition, we
normalize the dataset such that all pixel values are in the range [0, 1].

.. GENERATED FROM PYTHON SOURCE LINES 34-43

.. code-block:: default

    import numpy as np

    from sklearn.datasets import fetch_openml
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    X, y = fetch_openml(data_id=41082, as_frame=False, return_X_y=True, parser="pandas")
    X = MinMaxScaler().fit_transform(X)
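
As a minimal sketch of what this normalization does, the snippet below applies
:class:`~sklearn.preprocessing.MinMaxScaler` to a small toy array and compares
the result with the manual formula ``(x - min) / (max - min)``. The array
``X_toy`` and its values are hypothetical and only stand in for pixel
intensities. Note that the scaling is computed feature by feature, that is,
per pixel position.

.. code-block:: default

    import numpy as np

    from sklearn.preprocessing import MinMaxScaler

    # Toy data standing in for pixel intensities (hypothetical values).
    X_toy = np.array([[-1.0, 2.0], [0.0, 4.0], [1.0, 10.0]])

    X_scaled = MinMaxScaler().fit_transform(X_toy)

    # Equivalent manual computation, done column by column.
    X_min, X_max = X_toy.min(axis=0), X_toy.max(axis=0)
    X_manual = (X_toy - X_min) / (X_max - X_min)

    # Both computations agree, and every column now spans [0, 1].
    print(np.allclose(X_scaled, X_manual))
    print(X_scaled.min(axis=0), X_scaled.max(axis=0))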








.. GENERATED FROM PYTHON SOURCE LINES 44-57

The idea will be to learn a PCA basis (with and without a kernel) on
noisy images and then use these models to reconstruct and denoise these
images.

Thus, we split our dataset into a training set of 1,000 samples and a test
set of 100 samples. These images are noise-free, and we will use them to
evaluate how well each approach denoises. In addition, we create a copy of
each set and add Gaussian noise to it.

The goal of this application is to show that we can denoise corrupted images
by learning a PCA basis on a separate set of noisy training images. We will
use both a PCA and a kernel-based PCA to solve this problem.

.. GENERATED FROM PYTHON SOURCE LINES 57-68

.. code-block:: default

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0, train_size=1_000, test_size=100
    )

    rng = np.random.RandomState(0)
    noise = rng.normal(scale=0.25, size=X_test.shape)
    X_test_noisy = X_test + noise

    noise = rng.normal(scale=0.25, size=X_train.shape)
    X_train_noisy = X_train + noise
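
Because the added noise has zero mean and a standard deviation of 0.25, the
expected mean squared error between a clean image and its noisy copy is
``0.25 ** 2 = 0.0625``, whatever the image content. The sketch below checks
this on placeholder data; ``X_clean`` is a hypothetical array of uniform
random values with the same shape as the test set, not the USPS images.

.. code-block:: default

    import numpy as np

    rng = np.random.RandomState(0)
    sigma = 0.25

    # Placeholder for 100 clean images of 16 x 16 = 256 pixels (hypothetical).
    X_clean = rng.uniform(size=(100, 256))
    X_noisy = X_clean + rng.normal(scale=sigma, size=X_clean.shape)

    # The empirical MSE should be close to the noise variance sigma ** 2.
    mse = np.mean((X_clean - X_noisy) ** 2)
    print(f"empirical MSE: {mse:.4f}, sigma ** 2: {sigma ** 2:.4f}")

This value is a useful baseline: a denoising method is only helpful if its
reconstruction error falls below the noise variance.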








.. GENERATED FROM PYTHON SOURCE LINES 69-71

In addition, we will create a helper function to qualitatively assess the
image reconstruction by plotting the test images.

.. GENERATED FROM PYTHON SOURCE LINES 71-83

.. code-block:: default

    import matplotlib.pyplot as plt


    def plot_digits(X, title):
        """Small helper function to plot 100 digits."""
        fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(8, 8))
        for img, ax in zip(X, axs.ravel()):
            ax.imshow(img.reshape((16, 16)), cmap="Greys")
            ax.axis("off")
        fig.suptitle(title, fontsize=24)









.. GENERATED FROM PYTHON SOURCE LINES 84-89

In addition, we will use the mean squared error (MSE) to quantitatively
assess the image reconstruction.

Let's first look at the difference between the noise-free and noisy images,
using the test set.

.. GENERATED FROM PYTHON SOURCE LINES 89-94

.. code-block:: default

    plot_digits(X_test, "Uncorrupted test images")
    plot_digits(
        X_test_noisy, f"Noisy test images\nMSE: {np.mean((X_test - X_test_noisy) ** 2):.2f}"
    )




.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_001.png
         :alt: Uncorrupted test images
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_002.png
         :alt: Noisy test images MSE: 0.06
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_002.png
         :class: sphx-glr-multi-img





.. GENERATED FROM PYTHON SOURCE LINES 95-100

Learn the `PCA` basis
---------------------

We can now learn our PCA basis using both a linear PCA and a kernel PCA that
uses a radial basis function (RBF) kernel.

.. GENERATED FROM PYTHON SOURCE LINES 100-115

.. code-block:: default

    from sklearn.decomposition import PCA, KernelPCA

    pca = PCA(n_components=32, random_state=42)
    kernel_pca = KernelPCA(
        n_components=400,
        kernel="rbf",
        gamma=1e-3,
        fit_inverse_transform=True,
        alpha=5e-3,
        random_state=42,
    )

    pca.fit(X_train_noisy)
    _ = kernel_pca.fit(X_train_noisy)
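
The kernel PCA above uses ``n_components=400``, which is more than the 256
pixels of each image. This is possible because kernel PCA decomposes the
kernel matrix, whose size is set by the number of training samples, whereas
linear PCA is limited by the number of features. The following sketch
illustrates the difference on synthetic data; ``X_toy`` and the chosen
``gamma`` are assumptions made for illustration only.

.. code-block:: default

    import numpy as np

    from sklearn.decomposition import PCA, KernelPCA

    rng = np.random.RandomState(0)
    X_toy = rng.normal(size=(50, 4))  # 50 samples, 4 features (synthetic)

    # Linear PCA can return at most min(n_samples, n_features) components.
    pca_toy = PCA(n_components=4).fit(X_toy)

    # Kernel PCA works on the 50 x 50 kernel matrix, so asking for 10
    # components is fine even though the data only has 4 features.
    kpca_toy = KernelPCA(n_components=10, kernel="rbf", gamma=0.5).fit(X_toy)

    print(pca_toy.transform(X_toy).shape)
    print(kpca_toy.transform(X_toy).shape)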








.. GENERATED FROM PYTHON SOURCE LINES 116-126

Reconstruct and denoise test images
-----------------------------------

Now, we can transform and reconstruct the noisy test set. Since PCA keeps
fewer components (32) than the number of original features (256), the
reconstruction is only an approximation of the original set. Indeed, by
dropping the components that explain the least variance in PCA, we hope to
remove noise. The same reasoning applies to kernel PCA; however, we expect a
better reconstruction because we use a non-linear kernel to learn the PCA
basis and a kernel ridge to learn the mapping function.

.. GENERATED FROM PYTHON SOURCE LINES 126-131

.. code-block:: default

    X_reconstructed_kernel_pca = kernel_pca.inverse_transform(
        kernel_pca.transform(X_test_noisy)
    )
    X_reconstructed_pca = pca.inverse_transform(pca.transform(X_test_noisy))
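
For readers without network access, the whole fit, transform, and
reconstruct sequence can be sketched on the 8 x 8 digits bundled with
scikit-learn. This is an assumption-laden stand-in and not the USPS
experiment: the dataset, the variable names, and the hyperparameters below
are illustrative guesses that have not been tuned.

.. code-block:: default

    import numpy as np

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA, KernelPCA
    from sklearn.model_selection import train_test_split

    # Assumption: the bundled 8 x 8 digits (pixel values 0..16) replace USPS.
    X_small = load_digits().data / 16.0
    X_fit, X_eval = train_test_split(
        X_small, train_size=1_000, test_size=100, random_state=0
    )

    rng = np.random.RandomState(0)
    X_fit_noisy = X_fit + rng.normal(scale=0.25, size=X_fit.shape)
    X_eval_noisy = X_eval + rng.normal(scale=0.25, size=X_eval.shape)

    # Hyperparameters are illustrative guesses, not tuned values.
    pca_small = PCA(n_components=16, random_state=0).fit(X_fit_noisy)
    kpca_small = KernelPCA(
        n_components=100,
        kernel="rbf",
        gamma=1e-2,
        fit_inverse_transform=True,
        alpha=5e-3,
        random_state=0,
    ).fit(X_fit_noisy)


    def mse(a, b):
        """Mean squared error over all pixels of all images."""
        return float(np.mean((a - b) ** 2))


    X_rec_pca = pca_small.inverse_transform(pca_small.transform(X_eval_noisy))
    X_rec_kpca = kpca_small.inverse_transform(kpca_small.transform(X_eval_noisy))

    mse_noisy = mse(X_eval, X_eval_noisy)
    mse_pca = mse(X_eval, X_rec_pca)
    mse_kpca = mse(X_eval, X_rec_kpca)
    print(f"noisy: {mse_noisy:.4f}, PCA: {mse_pca:.4f}, kernel PCA: {mse_kpca:.4f}")

Comparing each reconstruction error with the error of the noisy input shows
whether a model removes more noise than the signal it discards.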








.. GENERATED FROM PYTHON SOURCE LINES 132-145

.. code-block:: default

    plot_digits(X_test, "Uncorrupted test images")
    plot_digits(
        X_reconstructed_pca,
        f"PCA reconstruction\nMSE: {np.mean((X_test - X_reconstructed_pca) ** 2):.2f}",
    )
    plot_digits(
        X_reconstructed_kernel_pca,
        (
            "Kernel PCA reconstruction\n"
            f"MSE: {np.mean((X_test - X_reconstructed_kernel_pca) ** 2):.2f}"
        ),
    )




.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_003.png
         :alt: Uncorrupted test images
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_003.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_004.png
         :alt: PCA reconstruction MSE: 0.01
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_004.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_005.png
         :alt: Kernel PCA reconstruction MSE: 0.03
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_005.png
         :class: sphx-glr-multi-img





.. GENERATED FROM PYTHON SOURCE LINES 146-152

PCA has a lower MSE than kernel PCA. However, the qualitative analysis might
not favor PCA over kernel PCA. We observe that kernel PCA is able to remove
background noise and provide a smoother image.

Note, however, that the denoising results obtained with kernel PCA depend on
the parameters `n_components`, `gamma`, and `alpha`.
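
One way to explore this dependence is to refit the model for several values
of a parameter and compare the reconstruction error. The sketch below sweeps
`gamma` only, on the small 8 x 8 digits bundled with scikit-learn rather than
USPS; the data split, the candidate values, and the other settings are
assumptions chosen for illustration. The same loop could vary `n_components`
or `alpha`.

.. code-block:: default

    import numpy as np

    from sklearn.datasets import load_digits
    from sklearn.decomposition import KernelPCA

    rng = np.random.RandomState(0)

    # Assumption: bundled 8 x 8 digits, with a simple slice-based split.
    X_small = load_digits().data / 16.0
    X_fit, X_eval = X_small[:500], X_small[500:600]
    X_fit_noisy = X_fit + rng.normal(scale=0.25, size=X_fit.shape)
    X_eval_noisy = X_eval + rng.normal(scale=0.25, size=X_eval.shape)

    results = {}
    for gamma in (1e-3, 1e-2, 1e-1):  # hypothetical candidate values
        model = KernelPCA(
            n_components=100,
            kernel="rbf",
            gamma=gamma,
            fit_inverse_transform=True,
            alpha=5e-3,
            random_state=0,
        ).fit(X_fit_noisy)
        X_rec = model.inverse_transform(model.transform(X_eval_noisy))
        results[gamma] = float(np.mean((X_eval - X_rec) ** 2))

    for gamma, value in results.items():
        print(f"gamma={gamma:g}: MSE={value:.4f}")

Since the MSE and the visual impression can disagree, as seen above, it is
worth inspecting the reconstructed images for the best few settings as well.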


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 8.583 seconds)


.. _sphx_glr_download_auto_examples_applications_plot_digits_denoising.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.3.X?urlpath=lab/tree/notebooks/auto_examples/applications/plot_digits_denoising.ipynb
        :alt: Launch binder
        :width: 150 px



    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/applications/plot_digits_denoising.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_digits_denoising.py <plot_digits_denoising.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_digits_denoising.ipynb <plot_digits_denoising.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_