sklearn.decomposition
.dict_learning_online¶
-
sklearn.decomposition.
dict_learning_online
(X, n_components=2, *, alpha=1, n_iter=100, return_code=True, dict_init=None, callback=None, batch_size=3, verbose=False, shuffle=True, n_jobs=None, method='lars', iter_offset=0, random_state=None, return_inner_stats=False, inner_stats=None, return_n_iter=False, positive_dict=False, positive_code=False, method_max_iter=1000)[source]¶ Solves a dictionary learning matrix factorization problem online.
Finds the best dictionary and the corresponding sparse code for approximating the data matrix X by solving:
(U^*, V^*) = argmin 0.5 || X - U V ||_2^2 + alpha * || U ||_1 (U,V) with || V_k ||_2 = 1 for all 0 <= k < n_components
where V is the dictionary and U is the sparse code. This is accomplished by repeatedly iterating over mini-batches by slicing the input data.
Read more in the User Guide.
- Parameters
- Xndarray of shape (n_samples, n_features)
Data matrix.
- n_componentsint, default=2
Number of dictionary atoms to extract.
- alphafloat, default=1
Sparsity controlling parameter.
- n_iterint, default=100
Number of mini-batch iterations to perform.
- return_codebool, default=True
Whether to also return the code U or just the dictionary
V
.- dict_initndarray of shape (n_components, n_features), default=None
Initial value for the dictionary for warm restart scenarios.
- callbackcallable, default=None
callable that gets invoked every five iterations.
- batch_sizeint, default=3
The number of samples to take in each batch.
- verbosebool, default=False
To control the verbosity of the procedure.
- shufflebool, default=True
Whether to shuffle the data before splitting it in batches.
- n_jobsint, default=None
Number of parallel jobs to run.
None
means 1 unless in ajoblib.parallel_backend
context.-1
means using all processors. See Glossary for more details.- method{‘lars’, ‘cd’}, default=’lars’
'lars'
: uses the least angle regression method to solve the lasso problem (linear_model.lars_path
);'cd'
: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso
). Lars will be faster if the estimated components are sparse.
- iter_offsetint, default=0
Number of previous iterations completed on the dictionary used for initialization.
- random_stateint, RandomState instance or None, default=None
Used for initializing the dictionary when
dict_init
is not specified, randomly shuffling the data whenshuffle
is set toTrue
, and updating the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.- return_inner_statsbool, default=False
Return the inner statistics A (dictionary covariance) and B (data approximation). Useful to restart the algorithm in an online setting. If
return_inner_stats
isTrue
,return_code
is ignored.- inner_statstuple of (A, B) ndarrays, default=None
Inner sufficient statistics that are kept by the algorithm. Passing them at initialization is useful in online settings, to avoid losing the history of the evolution.
A
(n_components, n_components)
is the dictionary covariance matrix.B
(n_features, n_components)
is the data approximation matrix.- return_n_iterbool, default=False
Whether or not to return the number of iterations.
- positive_dictbool, default=False
Whether to enforce positivity when finding the dictionary.
New in version 0.20.
- positive_codebool, default=False
Whether to enforce positivity when finding the code.
New in version 0.20.
- method_max_iterint, default=1000
Maximum number of iterations to perform when solving the lasso problem.
New in version 0.22.
- Returns
- codendarray of shape (n_samples, n_components),
The sparse code (only returned if
return_code=True
).- dictionaryndarray of shape (n_components, n_features),
The solutions to the dictionary learning problem.
- n_iterint
Number of iterations run. Returned only if
return_n_iter
is set toTrue
.