.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cross_decomposition/plot_compare_cross_decomposition.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_cross_decomposition_plot_compare_cross_decomposition.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cross_decomposition_plot_compare_cross_decomposition.py:


===================================
Compare cross decomposition methods
===================================

Simple usage of various cross decomposition algorithms:

- PLSCanonical
- PLSRegression, with multivariate response, a.k.a. PLS2
- PLSRegression, with univariate response, a.k.a. PLS1
- CCA

Given two multivariate, covarying datasets, X and Y, PLS extracts the
'directions of covariance', i.e. the components of each dataset that explain
the most shared variance between both datasets. This is apparent in the
**scatterplot matrix** display: component 1 in dataset X and component 1 in
dataset Y are maximally correlated (the points lie around the first diagonal).
This is also true for component 2 in both datasets. However, the correlation
across datasets for different components is weak: the point cloud is very
spherical.

.. GENERATED FROM PYTHON SOURCE LINES 25-27

Dataset based latent variables model
------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 27-49

.. code-block:: default

    import numpy as np

    n = 500
    # 2 latent variables:
    l1 = np.random.normal(size=n)
    l2 = np.random.normal(size=n)

    latents = np.array([l1, l1, l2, l2]).T
    X = latents + np.random.normal(size=4 * n).reshape((n, 4))
    Y = latents + np.random.normal(size=4 * n).reshape((n, 4))

    X_train = X[: n // 2]
    Y_train = Y[: n // 2]
    X_test = X[n // 2 :]
    Y_test = Y[n // 2 :]

    print("Corr(X)")
    print(np.round(np.corrcoef(X.T), 2))
    print("Corr(Y)")
    print(np.round(np.corrcoef(Y.T), 2))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Corr(X)
    [[ 1.    0.44 -0.06 -0.01]
     [ 0.44  1.   -0.01 -0.06]
     [-0.06 -0.01  1.    0.5 ]
     [-0.01 -0.06  0.5   1.  ]]
    Corr(Y)
    [[ 1.    0.47 -0.05  0.02]
     [ 0.47  1.   -0.01  0.03]
     [-0.05 -0.01  1.    0.47]
     [ 0.02  0.03  0.47  1.  ]]


.. GENERATED FROM PYTHON SOURCE LINES 50-55

Canonical (symmetric) PLS
-------------------------

Transform data
~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 55-63

.. code-block:: default

    from sklearn.cross_decomposition import PLSCanonical

    plsca = PLSCanonical(n_components=2)
    plsca.fit(X_train, Y_train)
    X_train_r, Y_train_r = plsca.transform(X_train, Y_train)
    X_test_r, Y_test_r = plsca.transform(X_test, Y_test)
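Before plotting, the claim from the introduction can be checked numerically.
The following is an editorial sketch (not part of the generated example): it
reuses the ``X_test_r`` and ``Y_test_r`` scores computed just above and prints
the cross-dataset correlation of matching and non-matching components.
Same-index components should be strongly correlated, while mixed-index pairs
should be close to uncorrelated.

.. code-block:: python

    # Sketch only: assumes X_test_r / Y_test_r from the PLSCanonical transform above.
    import numpy as np

    # Correlation between the X and Y test scores of the same component.
    for k in range(2):
        corr = np.corrcoef(X_test_r[:, k], Y_test_r[:, k])[0, 1]
        print(f"Comp. {k + 1}: corr(X score, Y score) = {corr:.2f}")

    # Cross-component correlation, expected to be weak.
    cross = np.corrcoef(X_test_r[:, 0], Y_test_r[:, 1])[0, 1]
    print(f"Comp. 1 (X) vs comp. 2 (Y): corr = {cross:.2f}")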
.. GENERATED FROM PYTHON SOURCE LINES 64-66

Scatter plot of scores
~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 66-125

.. code-block:: default

    import matplotlib.pyplot as plt

    # On the diagonal, plot X vs Y scores for each component
    plt.figure(figsize=(12, 8))
    plt.subplot(221)
    plt.scatter(X_train_r[:, 0], Y_train_r[:, 0], label="train", marker="o", s=25)
    plt.scatter(X_test_r[:, 0], Y_test_r[:, 0], label="test", marker="o", s=25)
    plt.xlabel("x scores")
    plt.ylabel("y scores")
    plt.title(
        "Comp. 1: X vs Y (test corr = %.2f)"
        % np.corrcoef(X_test_r[:, 0], Y_test_r[:, 0])[0, 1]
    )
    plt.xticks(())
    plt.yticks(())
    plt.legend(loc="best")

    plt.subplot(224)
    plt.scatter(X_train_r[:, 1], Y_train_r[:, 1], label="train", marker="o", s=25)
    plt.scatter(X_test_r[:, 1], Y_test_r[:, 1], label="test", marker="o", s=25)
    plt.xlabel("x scores")
    plt.ylabel("y scores")
    plt.title(
        "Comp. 2: X vs Y (test corr = %.2f)"
        % np.corrcoef(X_test_r[:, 1], Y_test_r[:, 1])[0, 1]
    )
    plt.xticks(())
    plt.yticks(())
    plt.legend(loc="best")

    # Off the diagonal, plot components 1 vs 2 for X and Y
    plt.subplot(222)
    plt.scatter(X_train_r[:, 0], X_train_r[:, 1], label="train", marker="*", s=50)
    plt.scatter(X_test_r[:, 0], X_test_r[:, 1], label="test", marker="*", s=50)
    plt.xlabel("X comp. 1")
    plt.ylabel("X comp. 2")
    plt.title(
        "X comp. 1 vs X comp. 2 (test corr = %.2f)"
        % np.corrcoef(X_test_r[:, 0], X_test_r[:, 1])[0, 1]
    )
    plt.legend(loc="best")
    plt.xticks(())
    plt.yticks(())

    plt.subplot(223)
    plt.scatter(Y_train_r[:, 0], Y_train_r[:, 1], label="train", marker="*", s=50)
    plt.scatter(Y_test_r[:, 0], Y_test_r[:, 1], label="test", marker="*", s=50)
    plt.xlabel("Y comp. 1")
    plt.ylabel("Y comp. 2")
    plt.title(
        "Y comp. 1 vs Y comp. 2 (test corr = %.2f)"
        % np.corrcoef(Y_test_r[:, 0], Y_test_r[:, 1])[0, 1]
    )
    plt.legend(loc="best")
    plt.xticks(())
    plt.yticks(())
    plt.show()


.. image-sg:: /auto_examples/cross_decomposition/images/sphx_glr_plot_compare_cross_decomposition_001.png
   :alt: Comp. 1: X vs Y (test corr = 0.60), Comp. 2: X vs Y (test corr = 0.67), X comp. 1 vs X comp. 2 (test corr = -0.17), Y comp. 1 vs Y comp. 2 (test corr = -0.05)
   :srcset: /auto_examples/cross_decomposition/images/sphx_glr_plot_compare_cross_decomposition_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 126-128

PLS regression, with multivariate response, a.k.a. PLS2
--------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 128-148

.. code-block:: default

    from sklearn.cross_decomposition import PLSRegression

    n = 1000
    q = 3
    p = 10
    X = np.random.normal(size=n * p).reshape((n, p))
    B = np.array([[1, 2] + [0] * (p - 2)] * q).T
    # each Yj = 1*X1 + 2*X2 + noise
    Y = np.dot(X, B) + np.random.normal(size=n * q).reshape((n, q)) + 5

    pls2 = PLSRegression(n_components=3)
    pls2.fit(X, Y)
    print("True B (such that: Y = XB + Err)")
    print(B)
    # compare pls2.coef_ with B
    print("Estimated B")
    print(np.round(pls2.coef_, 1))
    pls2.predict(X)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    True B (such that: Y = XB + Err)
    [[1 1 1]
     [2 2 2]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]]
    Estimated B
    /home/circleci/project/sklearn/cross_decomposition/_pls.py:503: FutureWarning: The attribute `coef_` will be transposed in version 1.3 to be consistent with other linear models in scikit-learn. Currently, `coef_` has a shape of (n_features, n_targets) and in the future it will have a shape of (n_targets, n_features).
    [[ 1.   1.   1. ]
     [ 2.   1.9  1.9]
     [-0.1 -0.   0. ]
     [ 0.   0.  -0. ]
     [-0.  -0.   0. ]
     [ 0.   0.   0. ]
     [ 0.   0.   0. ]
     [ 0.  -0.  -0. ]
     [ 0.   0.   0. ]
     [ 0.   0.   0.1]]

    array([[ 3.50210309,  3.55301008,  3.72528805],
           [10.03429511,  9.83576671,  9.74902647],
           [ 8.03916339,  7.84652988,  7.78629756],
           ...,
           [ 2.11231897,  2.1905275 ,  2.33508757],
           [ 5.35433161,  5.32686504,  5.39877158],
           [ 5.47827435,  5.38004088,  5.35574845]])


.. GENERATED FROM PYTHON SOURCE LINES 149-151

PLS regression, with univariate response, a.k.a. PLS1
------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 151-162

.. code-block:: default

    n = 1000
    p = 10
    X = np.random.normal(size=n * p).reshape((n, p))
    y = X[:, 0] + 2 * X[:, 1] + np.random.normal(size=n * 1) + 5
    pls1 = PLSRegression(n_components=3)
    pls1.fit(X, y)
    # note that the number of components exceeds 1 (the dimension of y)
    print("Estimated betas")
    print(np.round(pls1.coef_, 1))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Estimated betas
    /home/circleci/project/sklearn/cross_decomposition/_pls.py:503: FutureWarning: The attribute `coef_` will be transposed in version 1.3 to be consistent with other linear models in scikit-learn. Currently, `coef_` has a shape of (n_features, n_targets) and in the future it will have a shape of (n_targets, n_features).
    [[ 1.]
     [ 2.]
     [-0.]
     [ 0.]
     [-0.]
     [ 0.]
     [-0.]
     [ 0.]
     [-0.]
     [ 0.]]


.. GENERATED FROM PYTHON SOURCE LINES 163-165

CCA (PLS mode B with symmetric deflation)
-----------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 165-172

.. code-block:: default

    from sklearn.cross_decomposition import CCA

    cca = CCA(n_components=2)
    cca.fit(X_train, Y_train)
    X_train_r, Y_train_r = cca.transform(X_train, Y_train)
    X_test_r, Y_test_r = cca.transform(X_test, Y_test)
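The CCA block above fits and transforms the same train / test split as the
canonical PLS model, but does not print anything. As an editorial sketch (not
part of the generated example), the per-component correlation between the X
and Y test scores obtained by CCA can be compared with the one obtained by
``PLSCanonical``; it assumes ``plsca``, the train / test split, and the CCA
scores computed just above are still in scope.

.. code-block:: python

    # Sketch only: reuses cca / plsca and the train-test split defined earlier.
    import numpy as np

    # Re-derive the PLSCanonical test scores for comparison.
    X_test_r_pls, Y_test_r_pls = plsca.transform(X_test, Y_test)
    for k in range(2):
        cca_corr = np.corrcoef(X_test_r[:, k], Y_test_r[:, k])[0, 1]
        pls_corr = np.corrcoef(X_test_r_pls[:, k], Y_test_r_pls[:, k])[0, 1]
        print(
            f"Comp. {k + 1}: CCA test corr = {cca_corr:.2f}, "
            f"PLSCanonical test corr = {pls_corr:.2f}"
        )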
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.279 seconds)


.. _sphx_glr_download_auto_examples_cross_decomposition_plot_compare_cross_decomposition.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.2.X?urlpath=lab/tree/notebooks/auto_examples/cross_decomposition/plot_compare_cross_decomposition.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_compare_cross_decomposition.py <plot_compare_cross_decomposition.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_compare_cross_decomposition.ipynb <plot_compare_cross_decomposition.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_