sklearn.decomposition.dict_learning_online

sklearn.decomposition.dict_learning_online(X, n_components=2, *, alpha=1, n_iter=100, return_code=True, dict_init=None, callback=None, batch_size=3, verbose=False, shuffle=True, n_jobs=None, method='lars', iter_offset=0, random_state=None, return_inner_stats=False, inner_stats=None, return_n_iter=False, positive_dict=False, positive_code=False, method_max_iter=1000)[source]

Solves a dictionary learning matrix factorization problem online.

Finds the best dictionary and the corresponding sparse code for approximating the data matrix X by solving:

(U^*, V^*) = argmin 0.5 || X - U V ||_2^2 + alpha * || U ||_1
             (U,V)
             with || V_k ||_2 = 1 for all  0 <= k < n_components

where V is the dictionary and U is the sparse code. This is accomplished by repeatedly iterating over mini-batches by slicing the input data.

Read more in the User Guide.

Parameters
Xndarray of shape (n_samples, n_features)

Data matrix.

n_componentsint, default=2

Number of dictionary atoms to extract.

alphafloat, default=1

Sparsity controlling parameter.

n_iterint, default=100

Number of mini-batch iterations to perform.

return_codebool, default=True

Whether to also return the code U or just the dictionary V.

dict_initndarray of shape (n_components, n_features), default=None

Initial value for the dictionary for warm restart scenarios.

callbackcallable, default=None

callable that gets invoked every five iterations.

batch_sizeint, default=3

The number of samples to take in each batch.

verbosebool, default=False

To control the verbosity of the procedure.

shufflebool, default=True

Whether to shuffle the data before splitting it in batches.

n_jobsint, default=None

Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

method{‘lars’, ‘cd’}, default=’lars’
  • 'lars': uses the least angle regression method to solve the lasso problem (linear_model.lars_path);

  • 'cd': uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.

iter_offsetint, default=0

Number of previous iterations completed on the dictionary used for initialization.

random_stateint, RandomState instance or None, default=None

Used for initializing the dictionary when dict_init is not specified, randomly shuffling the data when shuffle is set to True, and updating the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.

return_inner_statsbool, default=False

Return the inner statistics A (dictionary covariance) and B (data approximation). Useful to restart the algorithm in an online setting. If return_inner_stats is True, return_code is ignored.

inner_statstuple of (A, B) ndarrays, default=None

Inner sufficient statistics that are kept by the algorithm. Passing them at initialization is useful in online settings, to avoid losing the history of the evolution. A (n_components, n_components) is the dictionary covariance matrix. B (n_features, n_components) is the data approximation matrix.

return_n_iterbool, default=False

Whether or not to return the number of iterations.

positive_dictbool, default=False

Whether to enforce positivity when finding the dictionary.

New in version 0.20.

positive_codebool, default=False

Whether to enforce positivity when finding the code.

New in version 0.20.

method_max_iterint, default=1000

Maximum number of iterations to perform when solving the lasso problem.

New in version 0.22.

Returns
codendarray of shape (n_samples, n_components),

The sparse code (only returned if return_code=True).

dictionaryndarray of shape (n_components, n_features),

The solutions to the dictionary learning problem.

n_iterint

Number of iterations run. Returned only if return_n_iter is set to True.