sklearn.decomposition.dict_learning_online(X, n_components=2, *, alpha=1, max_iter=100, return_code=True, dict_init=None, callback=None, batch_size=256, verbose=False, shuffle=True, n_jobs=None, method='lars', random_state=None, positive_dict=False, positive_code=False, method_max_iter=1000, tol=0.001, max_no_improvement=10)[source]#

Solve a dictionary learning matrix factorization problem online.

Finds the best dictionary and the corresponding sparse code for approximating the data matrix X by solving:

(U^*, V^*) = argmin 0.5 || X - U V ||_Fro^2 + alpha * || U ||_1,1
             with || V_k ||_2 = 1 for all  0 <= k < n_components

where V is the dictionary and U is the sparse code. ||.||_Fro stands for the Frobenius norm and ||.||_1,1 stands for the entry-wise matrix norm which is the sum of the absolute values of all the entries in the matrix. This is accomplished by repeatedly iterating over mini-batches by slicing the input data.

Read more in the User Guide.

Xarray-like of shape (n_samples, n_features)

Data matrix.

n_componentsint or None, default=2

Number of dictionary atoms to extract. If None, then n_components is set to n_features.

alphafloat, default=1

Sparsity controlling parameter.

max_iterint, default=100

Maximum number of iterations over the complete dataset before stopping independently of any early stopping criterion heuristics.

Added in version 1.1.

Deprecated since version 1.4: max_iter=None is deprecated in 1.4 and will be removed in 1.6. Use the default value (i.e. 100) instead.

return_codebool, default=True

Whether to also return the code U or just the dictionary V.

dict_initndarray of shape (n_components, n_features), default=None

Initial values for the dictionary for warm restart scenarios. If None, the initial values for the dictionary are created with an SVD decomposition of the data via randomized_svd.

callbackcallable, default=None

A callable that gets invoked at the end of each iteration.

batch_sizeint, default=256

The number of samples to take in each batch.

Changed in version 1.3: The default value of batch_size changed from 3 to 256 in version 1.3.

verbosebool, default=False

To control the verbosity of the procedure.

shufflebool, default=True

Whether to shuffle the data before splitting it in batches.

n_jobsint, default=None

Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

method{‘lars’, ‘cd’}, default=’lars’
  • 'lars': uses the least angle regression method to solve the lasso problem (linear_model.lars_path);

  • 'cd': uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.

random_stateint, RandomState instance or None, default=None

Used for initializing the dictionary when dict_init is not specified, randomly shuffling the data when shuffle is set to True, and updating the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.

positive_dictbool, default=False

Whether to enforce positivity when finding the dictionary.

Added in version 0.20.

positive_codebool, default=False

Whether to enforce positivity when finding the code.

Added in version 0.20.

method_max_iterint, default=1000

Maximum number of iterations to perform when solving the lasso problem.

Added in version 0.22.

tolfloat, default=1e-3

Control early stopping based on the norm of the differences in the dictionary between 2 steps.

To disable early stopping based on changes in the dictionary, set tol to 0.0.

Added in version 1.1.

max_no_improvementint, default=10

Control early stopping based on the consecutive number of mini batches that does not yield an improvement on the smoothed cost function.

To disable convergence detection based on cost function, set max_no_improvement to None.

Added in version 1.1.

codendarray of shape (n_samples, n_components),

The sparse code (only returned if return_code=True).

dictionaryndarray of shape (n_components, n_features),

The solutions to the dictionary learning problem.


Number of iterations run. Returned only if return_n_iter is set to True.

See also


Solve a dictionary learning matrix factorization problem.


Find a dictionary that sparsely encodes data.


A faster, less accurate, version of the dictionary learning algorithm.


Sparse Principal Components Analysis.


Mini-batch Sparse Principal Components Analysis.


>>> import numpy as np
>>> from sklearn.datasets import make_sparse_coded_signal
>>> from sklearn.decomposition import dict_learning_online
>>> X, _, _ = make_sparse_coded_signal(
...     n_samples=30, n_components=15, n_features=20, n_nonzero_coefs=10,
...     random_state=42,
... )
>>> U, V = dict_learning_online(
...     X, n_components=15, alpha=0.2, max_iter=20, batch_size=3, random_state=42
... )

We can check the level of sparsity of U:

>>> np.mean(U == 0)

We can compare the average squared euclidean norm of the reconstruction error of the sparse coded signal relative to the squared euclidean norm of the original signal:

>>> X_hat = U @ V
>>> np.mean(np.sum((X_hat - X) ** 2, axis=1) / np.sum(X ** 2, axis=1))