sklearn.random_projection.GaussianRandomProjection

class sklearn.random_projection.GaussianRandomProjection(n_components='auto', *, eps=0.1, random_state=None)[source]

Reduce dimensionality through Gaussian random projection.

The components of the random matrix are drawn from N(0, 1 / n_components).

Read more in the User Guide.

New in version 0.13.

Parameters
n_componentsint or ‘auto’, default=’auto’

Dimensionality of the target projection space.

n_components can be automatically adjusted according to the number of samples in the dataset and the bound given by the Johnson-Lindenstrauss lemma. In that case the quality of the embedding is controlled by the eps parameter.

It should be noted that Johnson-Lindenstrauss lemma can yield very conservative estimated of the required number of components as it makes no assumption on the structure of the dataset.

epsfloat, default=0.1

Parameter to control the quality of the embedding according to the Johnson-Lindenstrauss lemma when n_components is set to ‘auto’. The value should be strictly positive.

Smaller values lead to better embedding and higher number of dimensions (n_components) in the target projection space.

random_stateint, RandomState instance or None, default=None

Controls the pseudo random number generator used to generate the projection matrix at fit time. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes
n_components_int

Concrete number of components computed when n_components=”auto”.

components_ndarray of shape (n_components, n_features)

Random matrix used for the projection.

Examples

>>> import numpy as np
>>> from sklearn.random_projection import GaussianRandomProjection
>>> rng = np.random.RandomState(42)
>>> X = rng.rand(100, 10000)
>>> transformer = GaussianRandomProjection(random_state=rng)
>>> X_new = transformer.fit_transform(X)
>>> X_new.shape
(100, 3947)

Methods

fit(X[, y])

Generate a sparse random projection matrix.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Project the data by using matrix product with the random matrix

fit(X, y=None)[source]

Generate a sparse random projection matrix.

Parameters
X{ndarray, sparse matrix} of shape (n_samples, n_features)

Training set: only the shape is used to find optimal random matrix dimensions based on the theory referenced in the afore mentioned papers.

y

Ignored

Returns
self
fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
Xarray-like of shape (n_samples, n_features)

Input samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_paramsdict

Additional fit parameters.

Returns
X_newndarray array of shape (n_samples, n_features_new)

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
paramsdict

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**paramsdict

Estimator parameters.

Returns
selfestimator instance

Estimator instance.

transform(X)[source]

Project the data by using matrix product with the random matrix

Parameters
X{ndarray, sparse matrix} of shape (n_samples, n_features)

The input data to project into a smaller dimensional space.

Returns
X_new{ndarray, sparse matrix} of shape (n_samples, n_components)

Projected array.