sklearn.decomposition.DictionaryLearning

class sklearn.decomposition.DictionaryLearning(n_components=None, alpha=1, max_iter=1000, tol=1e-08, fit_algorithm='lars', transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, n_jobs=None, code_init=None, dict_init=None, verbose=False, split_sign=False, random_state=None, positive_code=False, positive_dict=False, transform_max_iter=1000)[source]

Dictionary learning

Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem:

(U^*,V^*) = argmin 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
            (U,V)
            with || V_k ||_2 = 1 for all  0 <= k < n_components

Read more in the User Guide.

Parameters
n_componentsint, default=n_features

number of dictionary elements to extract

alphafloat, default=1.0

sparsity controlling parameter

max_iterint, default=1000

maximum number of iterations to perform

tolfloat, default=1e-8

tolerance for numerical error

fit_algorithm{‘lars’, ‘cd’}, default=’lars’

lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path) cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.

New in version 0.17: cd coordinate descent method to improve speed.

transform_algorithm{‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}, default=’omp’

Algorithm used to transform the data lars: uses the least angle regression method (linear_model.lars_path) lasso_lars: uses Lars to compute the Lasso solution lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse. omp: uses orthogonal matching pursuit to estimate the sparse solution threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'

New in version 0.17: lasso_cd coordinate descent method to improve speed.

transform_n_nonzero_coefsint, default=0.1*n_features

Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm='lars' and algorithm='omp' and is overridden by alpha in the omp case.

transform_alphafloat, default=1.0

If algorithm='lasso_lars' or algorithm='lasso_cd', alpha is the penalty applied to the L1 norm. If algorithm='threshold', alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm='omp', alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.

n_jobsint or None, default=None

Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

code_initarray of shape (n_samples, n_components), default=None

initial value for the code, for warm restart

dict_initarray of shape (n_components, n_features), default=None

initial values for the dictionary, for warm restart

verbosebool, default=False

To control the verbosity of the procedure.

split_signbool, default=False

Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.

random_stateint, RandomState instance or None, default=None

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

positive_codebool, default=False

Whether to enforce positivity when finding the code.

New in version 0.20.

positive_dictbool, default=False

Whether to enforce positivity when finding the dictionary

New in version 0.20.

transform_max_iterint, default=1000

Maximum number of iterations to perform if algorithm='lasso_cd' or lasso_lars.

New in version 0.22.

Attributes
components_array, [n_components, n_features]

dictionary atoms extracted from the data

error_array

vector of errors at each iteration

n_iter_int

Number of iterations run.

Notes

References:

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (https://www.di.ens.fr/sierra/pdfs/icml09.pdf)

Methods

fit(self, X[, y])

Fit the model from data in X.

fit_transform(self, X[, y])

Fit to data, then transform it.

get_params(self[, deep])

Get parameters for this estimator.

set_params(self, \*\*params)

Set the parameters of this estimator.

transform(self, X)

Encode the data as a sparse combination of the dictionary atoms.

__init__(self, n_components=None, alpha=1, max_iter=1000, tol=1e-08, fit_algorithm='lars', transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, n_jobs=None, code_init=None, dict_init=None, verbose=False, split_sign=False, random_state=None, positive_code=False, positive_dict=False, transform_max_iter=1000)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(self, X, y=None)[source]

Fit the model from data in X.

Parameters
Xarray-like, shape (n_samples, n_features)

Training vector, where n_samples in the number of samples and n_features is the number of features.

yIgnored
Returns
selfobject

Returns the object itself

fit_transform(self, X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
Xnumpy array of shape [n_samples, n_features]

Training set.

ynumpy array of shape [n_samples]

Target values.

**fit_paramsdict

Additional fit parameters.

Returns
X_newnumpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
paramsmapping of string to any

Parameter names mapped to their values.

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**paramsdict

Estimator parameters.

Returns
selfobject

Estimator instance.

transform(self, X)[source]

Encode the data as a sparse combination of the dictionary atoms.

Coding method is determined by the object parameter transform_algorithm.

Parameters
Xarray of shape (n_samples, n_features)

Test data to be transformed, must have the same number of features as the data used to train the model.

Returns
X_newarray, shape (n_samples, n_components)

Transformed data