# dict_learning

sklearn.decomposition.dict_learning(X, n_components, *, alpha, max_iter=100, tol=1e-08, method='lars', n_jobs=None, dict_init=None, code_init=None, callback=None, verbose=False, random_state=None, return_n_iter=False, positive_dict=False, positive_code=False, method_max_iter=1000)

Solve a dictionary learning matrix factorization problem.

Finds the best dictionary and the corresponding sparse code for approximating the data matrix X by solving:

```
(U^*, V^*) = argmin_{(U,V)} 0.5 || X - U V ||_Fro^2 + alpha * || U ||_1,1

             with || V_k ||_2 = 1 for all 0 <= k < n_components
```

where V is the dictionary and U is the sparse code. ||.||_Fro stands for the Frobenius norm and ||.||_1,1 stands for the entry-wise matrix norm which is the sum of the absolute values of all the entries in the matrix.

Read more in the User Guide.

Parameters:
X : array-like of shape (n_samples, n_features)

Data matrix.

n_components : int

Number of dictionary atoms to extract.

alpha : int or float

Sparsity controlling parameter.

max_iter : int, default=100

Maximum number of iterations to perform.

tol : float, default=1e-8

Tolerance for the stopping condition.

method : {'lars', 'cd'}, default='lars'

The method used:

• `'lars'`: uses the least angle regression method to solve the lasso problem (`linear_model.lars_path`);

• `'cd'`: uses the coordinate descent method to compute the Lasso solution (`linear_model.Lasso`). Lars will be faster if the estimated components are sparse.
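Both solvers minimize the same lasso subproblem, so either choice should yield a usable factorization; runtimes and the exact solutions may differ. A small comparison sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.decomposition import dict_learning

X, _, _ = make_sparse_coded_signal(
    n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
    random_state=42,
)

# Same problem solved with each lasso solver
U_lars, V_lars, _ = dict_learning(
    X, n_components=15, alpha=0.1, method="lars", random_state=42
)
U_cd, V_cd, _ = dict_learning(
    X, n_components=15, alpha=0.1, method="cd", random_state=42
)

# Relative reconstruction error; both should capture most of the signal
err_lars = np.linalg.norm(X - U_lars @ V_lars) / np.linalg.norm(X)
err_cd = np.linalg.norm(X - U_cd @ V_cd) / np.linalg.norm(X)
```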

n_jobs : int, default=None

Number of parallel jobs to run. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors. See Glossary for more details.

dict_init : ndarray of shape (n_components, n_features), default=None

Initial value for the dictionary for warm restart scenarios. Only used if `code_init` and `dict_init` are not None.

code_init : ndarray of shape (n_samples, n_components), default=None

Initial value for the sparse code for warm restart scenarios. Only used if `code_init` and `dict_init` are not None.

callback : callable, default=None

Callable that gets invoked every five iterations.
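Based on the scikit-learn implementation, the callback receives the solver's local variables as a single dict; treat the exact payload as an assumption and verify it against your installed version. A minimal monitoring sketch:

```python
import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.decomposition import dict_learning

X, _, _ = make_sparse_coded_signal(
    n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
    random_state=42,
)

# Collect one entry per callback invocation; the argument is a dict of
# the solver's local state (assumption: callback(locals()) in the loop)
snapshots = []

def record(local_vars):
    snapshots.append(len(local_vars))

U, V, errors = dict_learning(X, n_components=15, alpha=0.1,
                             random_state=42, callback=record)
```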

verbose : bool, default=False

Controls the verbosity of the procedure.

random_state : int, RandomState instance or None, default=None

Used for randomly initializing the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.

return_n_iter : bool, default=False

Whether or not to return the number of iterations.

positive_dict : bool, default=False

Whether to enforce positivity when finding the dictionary.

positive_code : bool, default=False

Whether to enforce positivity when finding the code.
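The two flags can be combined for a fully non-negative factorization. Note that a positive code requires the coordinate-descent solver: `sparse_encode` rejects the positivity constraint for the `'lars'` method. A hedged sketch:

```python
import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.decomposition import dict_learning

X, _, _ = make_sparse_coded_signal(
    n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
    random_state=42,
)

# method='cd' is required here: 'lars' does not support positive codes
U, V, _ = dict_learning(X, n_components=15, alpha=0.1, method="cd",
                        positive_code=True, positive_dict=True,
                        random_state=42)
```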

method_max_iter : int, default=1000

Maximum number of iterations to perform within the chosen lasso solver (`method`).

Returns:
code : ndarray of shape (n_samples, n_components)

The sparse code factor in the matrix factorization.

dictionary : ndarray of shape (n_components, n_features)

The dictionary factor in the matrix factorization.

errors : array

Vector of errors at each iteration.

n_iter : int

Number of iterations run. Returned only if `return_n_iter` is set to True.

See also

`dict_learning_online`

Solve a dictionary learning matrix factorization problem online.

`DictionaryLearning`

Find a dictionary that sparsely encodes data.

`MiniBatchDictionaryLearning`

A faster, less accurate version of the dictionary learning algorithm.

`SparsePCA`

Sparse Principal Components Analysis.

`MiniBatchSparsePCA`

Mini-batch Sparse Principal Components Analysis.

Examples

```
>>> import numpy as np
>>> from sklearn.datasets import make_sparse_coded_signal
>>> from sklearn.decomposition import dict_learning
>>> X, _, _ = make_sparse_coded_signal(
...     n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
...     random_state=42,
... )
>>> U, V, errors = dict_learning(X, n_components=15, alpha=0.1, random_state=42)
```

We can check the level of sparsity of `U`:

```
>>> np.mean(U == 0)
0.6...
```

We can compare the average squared Euclidean norm of the reconstruction error of the sparse coded signal to the squared Euclidean norm of the original signal:

```
>>> X_hat = U @ V
>>> np.mean(np.sum((X_hat - X) ** 2, axis=1) / np.sum(X ** 2, axis=1))
0.01...
```