1.8. Cross decomposition¶
The cross decomposition module contains supervised estimators for dimensionality reduction and regression, belonging to the “Partial Least Squares” family.
Cross decomposition algorithms find the fundamental relations between two matrices (X and Y). They are latent variable approaches to modeling the covariance structures in these two spaces. They will try to find the multidimensional direction in the X space that explains the maximum multidimensional variance direction in the Y space. In other words, PLS projects both X and Y into a lower-dimensional subspace such that the covariance between transformed(X) and transformed(Y) is maximal.
PLS draws similarities with Principal Component Regression (PCR), where the samples are first projected into a lower-dimensional subspace, and the targets y are predicted using transformed(X). One issue with PCR is that the dimensionality reduction is unsupervised, and may lose some important variables: PCR would keep the features with the most variance, but it is possible that features with small variance are relevant for predicting the target. In a way, PLS allows for the same kind of dimensionality reduction, but it takes the targets y into account. An illustration of this fact is given in the following example:
* Principal Component Regression vs Partial Least Squares Regression.
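As a rough sketch of this comparison (not the example linked above), PCR can be built as a PCA + LinearRegression pipeline and contrasted with PLSRegression; the data below is synthetic and for illustration only:

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Illustrative synthetic data: many features, only a few of them informative.
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCR: unsupervised projection (PCA) followed by a linear regression.
pcr = make_pipeline(PCA(n_components=2), LinearRegression())
pcr.fit(X_train, y_train)

# PLS: the projection is computed while taking the targets y into account.
pls = PLSRegression(n_components=2)
pls.fit(X_train, y_train)

print("PCR r^2:", pcr.score(X_test, y_test))
print("PLS r^2:", pls.score(X_test, y_test))
```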
Apart from CCA, the PLS estimators are particularly suited when the matrix of predictors has more variables than observations, and when there is multicollinearity among the features. By contrast, standard linear regression would fail in these cases unless it is regularized.
Classes included in this module are PLSRegression, PLSCanonical, CCA and PLSSVD.
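A minimal sketch showing how these estimators are fitted on a pair of (toy, randomly generated) matrices X and Y, and how both matrices are projected with transform:

```python
import numpy as np
from sklearn.cross_decomposition import CCA, PLSCanonical, PLSRegression, PLSSVD

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 10))   # (n_samples, n_features)
Y = rng.normal(size=(50, 3))    # (n_samples, n_targets)

for Estimator in (PLSRegression, PLSCanonical, CCA, PLSSVD):
    est = Estimator(n_components=2).fit(X, Y)
    X_t, Y_t = est.transform(X, Y)
    print(Estimator.__name__, X_t.shape, Y_t.shape)
```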
1.8.1. PLSCanonical¶
We here describe the algorithm used in PLSCanonical. The other estimators use variants of this algorithm, and are detailed below. We recommend [1] for more details and comparisons between these algorithms. In [1], PLSCanonical corresponds to "PLSW2A".
Given two centered matrices \(X \in \mathbb{R}^{n \times d}\) and \(Y \in \mathbb{R}^{n \times t}\), and a number of components \(K\), PLSCanonical proceeds as follows:

Set \(X_1 = X\) and \(Y_1 = Y\). Then, for each \(k \in [1, K]\):

a) compute \(u_k \in \mathbb{R}^d\) and \(v_k \in \mathbb{R}^t\), the first left and right singular vectors of the cross-covariance matrix \(C = X_k^T Y_k\). \(u_k\) and \(v_k\) are called the weights. By definition, \(u_k\) and \(v_k\) are chosen so that they maximize the covariance between the projected \(X_k\) and the projected target, that is \(\mathrm{Cov}(X_k u_k, Y_k v_k)\).

b) Project \(X_k\) and \(Y_k\) on the singular vectors to obtain scores: \(\xi_k = X_k u_k\) and \(\omega_k = Y_k v_k\).

c) Regress \(X_k\) on \(\xi_k\), i.e. find a vector \(\gamma_k \in \mathbb{R}^d\) such that the rank-1 matrix \(\xi_k \gamma_k^T\) is as close as possible to \(X_k\). Do the same on \(Y_k\) with \(\omega_k\) to obtain \(\delta_k\). The vectors \(\gamma_k\) and \(\delta_k\) are called the loadings.

d) Deflate \(X_k\) and \(Y_k\), i.e. subtract the rank-1 approximations: \(X_{k+1} = X_k - \xi_k \gamma_k^T\) and \(Y_{k+1} = Y_k - \omega_k \delta_k^T\).
At the end, we have approximated \(X\) as a sum of rank-1 matrices: \(X = \Xi \Gamma^T\), where \(\Xi \in \mathbb{R}^{n \times K}\) is the matrix of scores and \(\Gamma^T \in \mathbb{R}^{K \times d}\) is the matrix of loadings. Similarly for \(Y\), we have \(Y = \Omega \Delta^T\).

Note that the scores matrices \(\Xi\) and \(\Omega\) correspond to the projections of the training data \(X\) and \(Y\), respectively.
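To make the four steps concrete, here is a minimal NumPy sketch of the iteration described above (using a full SVD for step a); it is illustrative only and omits the centering, scaling and convergence details handled by the actual estimators:

```python
import numpy as np

def pls_w2a(X, Y, n_components):
    """Illustrative sketch of the PLSW2A iteration described above.

    X and Y are assumed to be already centered.
    """
    Xk, Yk = X.copy(), Y.copy()
    Xi, Omega = [], []     # score vectors
    Gamma, Delta = [], []  # loading vectors
    for _ in range(n_components):
        # a) first left/right singular vectors of the cross-covariance matrix
        U, _, Vt = np.linalg.svd(Xk.T @ Yk)
        u, v = U[:, 0], Vt[0, :]
        # b) project X_k and Y_k on the singular vectors to obtain the scores
        xi, omega = Xk @ u, Yk @ v
        # c) regress X_k on xi and Y_k on omega to obtain the loadings
        gamma = Xk.T @ xi / (xi @ xi)
        delta = Yk.T @ omega / (omega @ omega)
        # d) deflate, i.e. subtract the rank-1 approximations
        Xk = Xk - np.outer(xi, gamma)
        Yk = Yk - np.outer(omega, delta)
        Xi.append(xi)
        Omega.append(omega)
        Gamma.append(gamma)
        Delta.append(delta)
    return (np.column_stack(Xi), np.column_stack(Omega),
            np.column_stack(Gamma), np.column_stack(Delta))
```

In the notation above, the returned matrices correspond to \(\Xi\), \(\Omega\), \(\Gamma\) and \(\Delta\).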
Step a) may be performed in two ways: either by computing the whole SVD of \(C\) and retaining only the singular vectors associated with the biggest singular value, or by computing the singular vectors directly with the power method, which corresponds to the 'nipals' option of the algorithm parameter.
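For reference, the power method of step a) can be sketched as follows (again only an illustration; the actual 'nipals' implementation handles normalization options, convergence checks and edge cases differently):

```python
import numpy as np

def first_singular_vectors_power_method(X, Y, max_iter=500, tol=1e-06):
    """Power iteration for the first singular vectors of C = X.T @ Y.

    Avoids forming the full SVD of the cross-covariance matrix.
    """
    omega = Y[:, 0]  # arbitrary starting "y score"
    u_old = None
    for _ in range(max_iter):
        u = X.T @ omega
        u /= np.linalg.norm(u)
        xi = X @ u
        v = Y.T @ xi
        v /= np.linalg.norm(v)
        omega = Y @ v
        if u_old is not None and np.linalg.norm(u - u_old) < tol:
            break
        u_old = u
    return u, v
```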
1.8.1.1. Transforming data¶
To transform \(X\), i.e. to compute \(\mathrm{transformed}(X)\), we need a projection matrix \(P\) such that \(\mathrm{transformed}(X) = XP\). We know that for the training data, \(\Xi = XP\) and \(X = \Xi \Gamma^T\). Setting \(P = U(\Gamma^T U)^{-1}\), where \(U\) is the matrix with the \(u_k\) in the columns, we have \(XP = XU(\Gamma^T U)^{-1} = \Xi (\Gamma^T U)(\Gamma^T U)^{-1} = \Xi\), as desired. The rotation matrix \(P\) can be accessed from the x_rotations_ attribute.

Similarly, \(Y\) can be transformed using the rotation matrix \(V(\Delta^T V)^{-1}\), accessible via the y_rotations_ attribute.
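A small check of this relation (a sketch that assumes scale=False, so that the only preprocessing applied by the estimator is centering):

```python
import numpy as np
from sklearn.cross_decomposition import PLSCanonical

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 6))
Y = rng.normal(size=(50, 4))

pls = PLSCanonical(n_components=2, scale=False).fit(X, Y)
X_t, Y_t = pls.transform(X, Y)

# transform(X) is the centered X multiplied by the rotation matrix P
print(np.allclose(X_t, (X - X.mean(axis=0)) @ pls.x_rotations_))
print(np.allclose(Y_t, (Y - Y.mean(axis=0)) @ pls.y_rotations_))
```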
1.8.1.2. Predicting the targets Y¶
To predict the targets of some data \(X\), we are looking for a coefficient matrix \(\beta \in \mathbb{R}^{d \times t}\) such that \(Y = X\beta\).

The idea is to try to predict the transformed targets \(\Omega\) as \(\Xi \alpha\), i.e. \(\Omega = \Xi \alpha\). Then, we have \(Y = \Omega \Delta^T = \Xi \alpha \Delta^T\), and since \(\Xi\) is the transformed training data, \(Y = XP\alpha \Delta^T\), so the coefficient matrix is \(\beta = P \alpha \Delta^T\).

\(\beta\) can be accessed through the coef_ attribute.
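Again as a hedged sketch (it assumes scale=False and that, in this release, coef_ has shape (n_features, n_targets)), the relation between predict and coef_ can be checked as follows:

```python
import numpy as np
from sklearn.cross_decomposition import PLSCanonical

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 6))
Y = rng.normal(size=(50, 4))

pls = PLSCanonical(n_components=2, scale=False).fit(X, Y)

# predict(X) applies coef_ to the centered data and adds back the mean of Y
manual = (X - X.mean(axis=0)) @ pls.coef_ + Y.mean(axis=0)
print(np.allclose(pls.predict(X), manual))
```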
1.8.2. PLSSVD¶
PLSSVD is a simplified version of PLSCanonical described earlier: instead of iteratively deflating the matrices \(X_k\) and \(Y_k\), PLSSVD computes the SVD of \(C = X^T Y\) only once, and stores the n_components singular vectors corresponding to the biggest singular values in the matrices U and V, which correspond to the x_weights_ and y_weights_ attributes. Here, the transformed data is simply transformed(X) = XU and transformed(Y) = YV.
If n_components == 1, PLSSVD and PLSCanonical are strictly equivalent.
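A sketch of this equivalence (algorithm='svd' is used so that both estimators rely on an exact SVD; the comparison is done up to sign since sign conventions may differ):

```python
import numpy as np
from sklearn.cross_decomposition import PLSCanonical, PLSSVD

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 6))
Y = rng.normal(size=(50, 4))

svd = PLSSVD(n_components=1).fit(X, Y)
can = PLSCanonical(n_components=1, algorithm='svd').fit(X, Y)

X_svd, Y_svd = svd.transform(X, Y)
X_can, Y_can = can.transform(X, Y)

# With a single component there is no deflation, so both estimators compute
# the same first pair of singular vectors.
print(np.allclose(np.abs(X_svd), np.abs(X_can)))
print(np.allclose(np.abs(Y_svd), np.abs(Y_can)))
```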
1.8.3. PLSRegression¶
The PLSRegression estimator is similar to PLSCanonical with algorithm='nipals', with 2 significant differences:
* at step a), in the power method used to compute \(u_k\) and \(v_k\), \(v_k\) is never normalized.
* at step c), the targets \(Y_k\) are approximated using the projection of \(X_k\) (i.e. \(\xi_k\)) instead of the projection of \(Y_k\) (i.e. \(\omega_k\)). In other words, the loadings computation is different. As a result, the deflation in step d) will also be affected.
These two modifications affect the output of predict and transform, which are not the same as for PLSCanonical. Also, while the number of components is limited by min(n_samples, n_features, n_targets) in PLSCanonical, here the limit is the rank of \(X^T X\), i.e. min(n_samples, n_features).
PLSRegression is also known as PLS1 (single targets) and PLS2 (multiple targets). Much like Lasso, PLSRegression is a form of regularized linear regression where the number of components controls the strength of the regularization.
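A sketch of this use (synthetic, correlated features for illustration only; the number of components acts as the regularization knob and is typically chosen by cross-validation):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
# Correlated features driven by a small number of latent factors.
latent = rng.normal(size=(100, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(100, 20))
y = latent[:, 0] + 0.1 * rng.normal(size=100)

# Fewer components means stronger regularization.
for n_components in (1, 2, 5):
    pls = PLSRegression(n_components=n_components)
    scores = cross_val_score(pls, X, y, cv=5)
    print(n_components, scores.mean())
```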
1.8.4. Canonical Correlation Analysis¶
Canonical Correlation Analysis was developed prior to and independently of PLS. But it turns out that CCA is a special case of PLS, and corresponds to PLS in "Mode B" in the literature.

CCA differs from PLSCanonical in the way the weights \(u_k\) and \(v_k\) are computed in the power method of step a). Since CCA involves the inversion of \(X_k^T X_k\) and \(Y_k^T Y_k\), this estimator can be unstable if the number of features or targets is greater than the number of samples.
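A small sketch of CCA on two toy views sharing a common latent signal, printing the correlation between each pair of canonical variates:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.RandomState(0)
# Two views generated from the same latent factors, plus noise.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(200, 5))
Y = latent @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(200, 4))

cca = CCA(n_components=2).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)

# Correlation between each pair of canonical variates.
for k in range(2):
    print(np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1])
```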
Reference: