sklearn.cluster.dbscan

sklearn.cluster.dbscan(X, eps=0.5, min_samples=5, metric='minkowski', algorithm='auto', leaf_size=30, p=2, random_state=None)

Perform DBSCAN clustering from vector array or distance matrix.

Parameters:

X : array [n_samples, n_samples] or [n_samples, n_features]

Array of distances between samples, or a feature array. The array is treated as a feature array unless the metric is given as ‘precomputed’.

eps : float, optional (default = 0.5)

The maximum distance between two samples for them to be considered part of the same neighborhood.

min_samples : int, optional (default = 5)

The minimum number of samples in a neighborhood for a point to be considered a core point.

metric : string or callable, optional (default = ‘minkowski’)

The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by metrics.pairwise.pairwise_distances for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix and must be square.

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional (default = ‘auto’)

The algorithm to be used by the NearestNeighbors module to compute pointwise distances and find nearest neighbors. See NearestNeighbors module documentation for details.

leaf_size : int, optional (default = 30)

Leaf size passed to BallTree or cKDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

p : float, optional (default = 2)

The power of the Minkowski metric used to calculate the distance between points. p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.

random_state : numpy.RandomState, optional (default = None)

The generator used to shuffle the order in which samples are visited; DBSCAN has no centers to initialize. Defaults to numpy.random.
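
As a rough illustration of how eps, min_samples, and p interact, the following pure-Python sketch marks core points by counting the samples that lie within eps of each point under a Minkowski distance of order p. The helper names minkowski and core_point_indices are hypothetical, and the sketch assumes that a point counts toward its own neighborhood; it is not the library's implementation.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def core_point_indices(points, eps=0.5, min_samples=5, p=2):
    """Return indices of points that have at least min_samples points within eps.

    Assumption: each point is counted as a member of its own neighborhood.
    """
    core = []
    for i, a in enumerate(points):
        # Count every point (including a itself) no farther than eps from a.
        neighbors = sum(1 for b in points if minkowski(a, b, p) <= eps)
        if neighbors >= min_samples:
            core.append(i)
    return core


# Four points on a small square plus one far-away point.
points = [(0.0, 0.0), (0.0, 0.3), (0.3, 0.0), (0.3, 0.3), (5.0, 5.0)]

# Euclidean distance (p=2): the square's diagonal is about 0.42, within eps.
core_euclidean = core_point_indices(points, eps=0.5, min_samples=4, p=2)

# Manhattan distance (p=1): the diagonal is 0.6, outside eps.
core_manhattan = core_point_indices(points, eps=0.5, min_samples=4, p=1)
```

With these toy values the four square corners are core points under p = 2, while under p = 1 no point reaches four neighbors, which shows how the choice of p changes which neighborhoods count as dense.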

Returns:

core_samples : array [n_core_samples]

Indices of core samples.

labels : array [n_samples]

Cluster labels for each point. Noisy samples are given the label -1.
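
A minimal usage sketch follows, assuming a scikit-learn installation in which dbscan accepts these keyword arguments and returns the pair described above. The toy data and variable names are illustrative only. The second call passes a square distance matrix with metric set to 'precomputed', as described for the X and metric parameters.

```python
import numpy as np
from sklearn.cluster import dbscan
from sklearn.metrics import pairwise_distances

# Two tight groups of four points each, plus one isolated point.
X = np.array([
    [0.0, 0.0], [0.0, 0.3], [0.3, 0.0], [0.3, 0.3],
    [5.0, 5.0], [5.0, 5.3], [5.3, 5.0], [5.3, 5.3],
    [10.0, 0.0],
])

# Feature-array input: distances are computed internally.
core_samples, labels = dbscan(X, eps=0.5, min_samples=3)

# Distance-matrix input: X is replaced by a square matrix of pairwise distances.
D = pairwise_distances(X, metric="euclidean")
core_samples_pre, labels_pre = dbscan(D, eps=0.5, min_samples=3, metric="precomputed")

# Samples labelled -1 were not assigned to any cluster.
noise_indices = np.where(labels == -1)[0]
```

The numbering of the clusters is arbitrary, so it is safer to compare which samples share a label than to rely on specific label values.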

Notes

See examples/cluster/plot_dbscan.py for an example.

References

Ester, M., H. P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231. 1996
