.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cross_decomposition/plot_compare_cross_decomposition.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code, or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cross_decomposition_plot_compare_cross_decomposition.py:


===================================
Compare cross decomposition methods
===================================

Simple usage of various cross decomposition algorithms:

- PLSCanonical
- PLSRegression, with multivariate response, a.k.a. PLS2
- PLSRegression, with univariate response, a.k.a. PLS1
- CCA

Given two multivariate, covarying two-dimensional datasets, X and Y,
PLS extracts the 'directions of covariance', i.e. the components of each
dataset that explain the most shared variance between both datasets.
This is apparent on the **scatterplot matrix** display: components 1 in
dataset X and dataset Y are maximally correlated (points lie around the
first diagonal). This is also true for components 2 in both datasets;
however, the correlation between different components of the same dataset
is weak: the point clouds in the off-diagonal panels are nearly spherical.

.. GENERATED FROM PYTHON SOURCE LINES 23-27

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

.. GENERATED FROM PYTHON SOURCE LINES 28-30

Dataset based on a latent variable model
----------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 30-52

.. code-block:: Python


    import numpy as np

    n = 500
    # 2 latent variables:
    l1 = np.random.normal(size=n)
    l2 = np.random.normal(size=n)

    latents = np.array([l1, l1, l2, l2]).T
    X = latents + np.random.normal(size=4 * n).reshape((n, 4))
    Y = latents + np.random.normal(size=4 * n).reshape((n, 4))

    X_train = X[: n // 2]
    Y_train = Y[: n // 2]
    X_test = X[n // 2 :]
    Y_test = Y[n // 2 :]

    print("Corr(X)")
    print(np.round(np.corrcoef(X.T), 2))
    print("Corr(Y)")
    print(np.round(np.corrcoef(Y.T), 2))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Corr(X)
    [[ 1.    0.45 -0.04  0.  ]
     [ 0.45  1.   -0.1  -0.02]
     [-0.04 -0.1   1.    0.42]
     [ 0.   -0.02  0.42  1.  ]]
    Corr(Y)
    [[ 1.    0.48 -0.12 -0.05]
     [ 0.48  1.    0.07  0.04]
     [-0.12  0.07  1.    0.5 ]
     [-0.05  0.04  0.5   1.  ]]

.. GENERATED FROM PYTHON SOURCE LINES 53-58

Canonical (symmetric) PLS
-------------------------

Transform data
~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 58-66

.. code-block:: Python


    from sklearn.cross_decomposition import PLSCanonical

    plsca = PLSCanonical(n_components=2)
    plsca.fit(X_train, Y_train)
    X_train_r, Y_train_r = plsca.transform(X_train, Y_train)
    X_test_r, Y_test_r = plsca.transform(X_test, Y_test)

.. GENERATED FROM PYTHON SOURCE LINES 67-69

Scatter plot of scores
~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 69-128

.. code-block:: Python


    import matplotlib.pyplot as plt

    # On the diagonal, plot X vs Y scores for each component
    plt.figure(figsize=(12, 8))
    plt.subplot(221)
    plt.scatter(X_train_r[:, 0], Y_train_r[:, 0], label="train", marker="o", s=25)
    plt.scatter(X_test_r[:, 0], Y_test_r[:, 0], label="test", marker="o", s=25)
    plt.xlabel("x scores")
    plt.ylabel("y scores")
    plt.title(
        "Comp. 1: X vs Y (test corr = %.2f)"
        % np.corrcoef(X_test_r[:, 0], Y_test_r[:, 0])[0, 1]
    )
    plt.xticks(())
    plt.yticks(())
    plt.legend(loc="best")

    plt.subplot(224)
    plt.scatter(X_train_r[:, 1], Y_train_r[:, 1], label="train", marker="o", s=25)
    plt.scatter(X_test_r[:, 1], Y_test_r[:, 1], label="test", marker="o", s=25)
    plt.xlabel("x scores")
    plt.ylabel("y scores")
    plt.title(
        "Comp. 2: X vs Y (test corr = %.2f)"
        % np.corrcoef(X_test_r[:, 1], Y_test_r[:, 1])[0, 1]
    )
    plt.xticks(())
    plt.yticks(())
    plt.legend(loc="best")

    # Off the diagonal, plot components 1 vs 2 for X and for Y
    plt.subplot(222)
    plt.scatter(X_train_r[:, 0], X_train_r[:, 1], label="train", marker="*", s=50)
    plt.scatter(X_test_r[:, 0], X_test_r[:, 1], label="test", marker="*", s=50)
    plt.xlabel("X comp. 1")
    plt.ylabel("X comp. 2")
    plt.title(
        "X comp. 1 vs X comp. 2 (test corr = %.2f)"
        % np.corrcoef(X_test_r[:, 0], X_test_r[:, 1])[0, 1]
    )
    plt.legend(loc="best")
    plt.xticks(())
    plt.yticks(())

    plt.subplot(223)
    plt.scatter(Y_train_r[:, 0], Y_train_r[:, 1], label="train", marker="*", s=50)
    plt.scatter(Y_test_r[:, 0], Y_test_r[:, 1], label="test", marker="*", s=50)
    plt.xlabel("Y comp. 1")
    plt.ylabel("Y comp. 2")
    plt.title(
        "Y comp. 1 vs Y comp. 2 , (test corr = %.2f)"
        % np.corrcoef(Y_test_r[:, 0], Y_test_r[:, 1])[0, 1]
    )
    plt.legend(loc="best")
    plt.xticks(())
    plt.yticks(())
    plt.show()




.. image-sg:: /auto_examples/cross_decomposition/images/sphx_glr_plot_compare_cross_decomposition_001.png
   :alt: Comp. 1: X vs Y (test corr = 0.67), Comp. 2: X vs Y (test corr = 0.64), X comp. 1 vs X comp. 2 (test corr = -0.02), Y comp. 1 vs Y comp. 2 , (test corr = -0.12)
   :srcset: /auto_examples/cross_decomposition/images/sphx_glr_plot_compare_cross_decomposition_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 129-131

PLS regression, with multivariate response, a.k.a. PLS2
-------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 131-151

.. code-block:: Python


    from sklearn.cross_decomposition import PLSRegression

    n = 1000
    q = 3
    p = 10
    X = np.random.normal(size=n * p).reshape((n, p))
    B = np.array([[1, 2] + [0] * (p - 2)] * q).T
    # each Yj = 1*X1 + 2*X2 + noise (plus a constant offset of 5)
    Y = np.dot(X, B) + np.random.normal(size=n * q).reshape((n, q)) + 5

    pls2 = PLSRegression(n_components=3)
    pls2.fit(X, Y)
    print("True B (such that: Y = XB + Err)")
    print(B)
    # compare pls2.coef_ with B; coef_ has shape (n_targets, n_features),
    # so it is printed transposed relative to B
    print("Estimated B")
    print(np.round(pls2.coef_, 1))
    pls2.predict(X)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    True B (such that: Y = XB + Err)
    [[1 1 1]
     [2 2 2]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]
     [0 0 0]]
    Estimated B
    [[ 1.   2.   0.  -0.  -0.   0.   0.  -0.   0.   0. ]
     [ 1.   1.9  0.  -0.  -0.   0.1  0.   0.   0.   0. ]
     [ 1.   2.1 -0.   0.  -0.   0.   0.   0.   0.  -0. ]]

    array([[4.11693539, 4.19803308, 4.12190903],
           [8.77322639, 8.77777215, 9.04995982],
           [5.34990341, 5.37257991, 5.27597342],
           ...,
           [5.95433992, 5.9403917 , 6.02818216],
           [5.06880943, 5.08604995, 5.05216586],
           [9.72295655, 9.70432034, 9.79769376]])

.. GENERATED FROM PYTHON SOURCE LINES 152-154

PLS regression, with univariate response, a.k.a. PLS1
-----------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 154-165

.. code-block:: Python


    n = 1000
    p = 10
    X = np.random.normal(size=n * p).reshape((n, p))
    y = X[:, 0] + 2 * X[:, 1] + np.random.normal(size=n * 1) + 5
    pls1 = PLSRegression(n_components=3)
    pls1.fit(X, y)
    # note that the number of components exceeds 1 (the dimension of y)
    print("Estimated betas")
    print(np.round(pls1.coef_, 1))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Estimated betas
    [[ 1.   2.  -0.1  0.  -0.  -0.  -0.   0.   0.  -0.1]]

.. GENERATED FROM PYTHON SOURCE LINES 166-168

CCA (PLS mode B with symmetric deflation)
-----------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 168-175

.. code-block:: Python


    from sklearn.cross_decomposition import CCA

    cca = CCA(n_components=2)
    cca.fit(X_train, Y_train)
    X_train_r, Y_train_r = cca.transform(X_train, Y_train)
    X_test_r, Y_test_r = cca.transform(X_test, Y_test)


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.198 seconds)


.. _sphx_glr_download_auto_examples_cross_decomposition_plot_compare_cross_decomposition.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.6.X?urlpath=lab/tree/notebooks/auto_examples/cross_decomposition/plot_compare_cross_decomposition.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/cross_decomposition/plot_compare_cross_decomposition.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_compare_cross_decomposition.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_compare_cross_decomposition.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_compare_cross_decomposition.zip `


.. include:: plot_compare_cross_decomposition.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
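
The CCA section fits the model and transforms the data but does not inspect
the resulting scores. A minimal sketch of one way to compare the two
symmetric methods: it rebuilds the same latent-variable dataset and reports,
for ``PLSCanonical`` and ``CCA``, the test-set correlation between the X and Y
scores of each component. Assumptions: a fixed seed via
``np.random.default_rng(0)`` (the gallery script itself is unseeded, and the
value 0 is arbitrary), and ``component_test_corrs`` is a hypothetical helper
introduced here for illustration only.

.. code-block:: Python


    import numpy as np
    from sklearn.cross_decomposition import CCA, PLSCanonical

    # Assumption: fixed, arbitrary seed for reproducibility; the gallery
    # script draws from the unseeded global NumPy generator instead.
    rng = np.random.default_rng(0)

    n = 500
    # Same latent-variable model: two latent variables, each shared by two
    # columns of X and two columns of Y, plus unit-variance Gaussian noise.
    l1 = rng.normal(size=n)
    l2 = rng.normal(size=n)
    latents = np.array([l1, l1, l2, l2]).T
    X = latents + rng.normal(size=(n, 4))
    Y = latents + rng.normal(size=(n, 4))

    X_train, X_test = X[: n // 2], X[n // 2 :]
    Y_train, Y_test = Y[: n // 2], Y[n // 2 :]


    def component_test_corrs(model, n_components=2):
        """Hypothetical helper: fit on the training half, then return, for
        each component, the correlation between the X and Y test scores."""
        model.fit(X_train, Y_train)
        X_test_r, Y_test_r = model.transform(X_test, Y_test)
        return np.array(
            [
                np.corrcoef(X_test_r[:, k], Y_test_r[:, k])[0, 1]
                for k in range(n_components)
            ]
        )


    results = {
        "PLSCanonical": component_test_corrs(PLSCanonical(n_components=2)),
        "CCA": component_test_corrs(CCA(n_components=2)),
    }
    for name, corrs in results.items():
        print(name, np.round(corrs, 2))

In this symmetric setup both latent variables contribute equally, so the two
methods are expected to give similar per-component correlations. They can
diverge on other data: ``PLSCanonical`` maximizes the covariance between the X
and Y scores, whereas ``CCA`` maximizes their correlation.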