OPTICS#

class sklearn.cluster.OPTICS(*, min_samples=5, max_eps=inf, metric='minkowski', p=2, metric_params=None, cluster_method='xi', eps=None, xi=0.05, predecessor_correction=True, min_cluster_size=None, algorithm='auto', leaf_size=30, memory=None, n_jobs=None)[source]#

Estimate clustering structure from vector array.

OPTICS (Ordering Points To Identify the Clustering Structure), closely related to DBSCAN, finds core sample of high density and expands clusters from them [1]. Unlike DBSCAN, keeps cluster hierarchy for a variable neighborhood radius. Better suited for usage on large datasets than the current sklearn implementation of DBSCAN.

Clusters are then extracted using a DBSCAN-like method (cluster_method = ‘dbscan’) or an automatic technique proposed in [1] (cluster_method = ‘xi’).

This implementation deviates from the original OPTICS by first performing k-nearest-neighborhood searches on all points to identify core sizes, then computing only the distances to unprocessed points when constructing the cluster order. Note that we do not employ a heap to manage the expansion candidates, so the time complexity will be O(n^2).

See also

DBSCAN: A similar clustering for a specified neighborhood radius (eps). Our implementation is optimized for runtime.

References

[1] (1,2)

Ankerst, Mihael, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander. “OPTICS: ordering points to identify the clustering structure.” ACM SIGMOD Record 28, no. 2 (1999): 49-60.

[2]

Schubert, Erich, Michael Gertz. “Improving the Cluster Structure Extracted from OPTICS Plots.” Proc. of the Conference “Lernen, Wissen, Daten, Analysen” (LWDA) (2018): 318-329.

Examples

>>> from sklearn.cluster import OPTICS
>>> import numpy as np
>>> X = np.array([[1, 2], [2, 5], [3, 6],
...               [8, 7], [8, 8], [7, 3]])
>>> clustering = OPTICS(min_samples=2).fit(X)
>>> clustering.labels_
array([0, 0, 0, 1, 1, 1])

For a more detailed example see Demo of OPTICS clustering algorithm.

For a comparison of OPTICS with other clustering algorithms, see Comparing different clustering algorithms on toy datasets

fit(X, y=None)[source]#

Perform OPTICS clustering.

Extracts an ordered list of points and reachability distances, and performs initial clustering using max_eps distance specified at OPTICS object instantiation.

Parameters:

X{ndarray, sparse matrix} of shape (n_samples, n_features), or (n_samples, n_samples) if metric=’precomputed’: A feature array, or array of distances between samples if metric=’precomputed’. If a sparse matrix is provided, it will be converted into CSR format.
yIgnored: Not used, present for API consistency by convention.

Returns:

selfobject: Returns a fitted instance of self.

fit_predict(X, y=None, **kwargs)[source]#

Perform clustering on X and returns cluster labels.

Parameters:

Xarray-like of shape (n_samples, n_features): Input data.
yIgnored: Not used, present for API consistency by convention.
**kwargsdict: Arguments to be passed to fit.

Added in version 1.4.

Returns:

labelsndarray of shape (n_samples,), dtype=np.int64: Cluster labels.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: