
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cluster/plot_dbscan.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here `
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_dbscan.py:


===================================
Demo of DBSCAN clustering algorithm
===================================

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core
samples in regions of high density and expands clusters from them. This
algorithm works well on data that contains clusters of similar density.

See the :ref:`sphx_glr_auto_examples_cluster_plot_cluster_comparison.py` example
for a demo of different clustering algorithms on 2D datasets.

.. GENERATED FROM PYTHON SOURCE LINES 16-20

Data generation
---------------

We use :class:`~sklearn.datasets.make_blobs` to create 3 synthetic clusters.

.. GENERATED FROM PYTHON SOURCE LINES 20-31

.. code-block:: default


    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler

    centers = [[1, 1], [-1, -1], [1, -1]]
    X, labels_true = make_blobs(
        n_samples=750, centers=centers, cluster_std=0.4, random_state=0
    )

    X = StandardScaler().fit_transform(X)


.. GENERATED FROM PYTHON SOURCE LINES 32-33

We can visualize the resulting data:

.. GENERATED FROM PYTHON SOURCE LINES 33-39

.. code-block:: default


    import matplotlib.pyplot as plt

    plt.scatter(X[:, 0], X[:, 1])
    plt.show()


.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_dbscan_001.png
   :alt: plot dbscan
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_dbscan_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 40-45

Compute DBSCAN
--------------

One can access the labels assigned by :class:`~sklearn.cluster.DBSCAN` using
the `labels_` attribute. Noisy samples are given the label :math:`-1`.

.. GENERATED FROM PYTHON SOURCE LINES 45-60

.. code-block:: default


    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn import metrics

    db = DBSCAN(eps=0.3, min_samples=10).fit(X)
    labels = db.labels_

    # Number of clusters in labels, ignoring noise if present.
    n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise_ = list(labels).count(-1)

    print("Estimated number of clusters: %d" % n_clusters_)
    print("Estimated number of noise points: %d" % n_noise_)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Estimated number of clusters: 3
    Estimated number of noise points: 18


.. GENERATED FROM PYTHON SOURCE LINES 61-76

Clustering algorithms are fundamentally unsupervised learning methods.
However, since :class:`~sklearn.datasets.make_blobs` gives access to the true
labels of the synthetic clusters, it is possible to use evaluation metrics
that leverage this "supervised" ground truth information to quantify the
quality of the resulting clusters. Examples of such metrics are the
homogeneity, completeness, V-measure, Rand-Index, Adjusted Rand-Index and
Adjusted Mutual Information (AMI).

If the ground truth labels are not known, evaluation can only be performed
using the model results themselves. In that case, the Silhouette Coefficient
comes in handy.

For more information, see the
:ref:`sphx_glr_auto_examples_cluster_plot_adjusted_for_chance_measures.py`
example or the :ref:`clustering_evaluation` module.

.. GENERATED FROM PYTHON SOURCE LINES 76-87

.. code-block:: default


    print(f"Homogeneity: {metrics.homogeneity_score(labels_true, labels):.3f}")
    print(f"Completeness: {metrics.completeness_score(labels_true, labels):.3f}")
    print(f"V-measure: {metrics.v_measure_score(labels_true, labels):.3f}")
    print(f"Adjusted Rand Index: {metrics.adjusted_rand_score(labels_true, labels):.3f}")
    print(
        "Adjusted Mutual Information:"
        f" {metrics.adjusted_mutual_info_score(labels_true, labels):.3f}"
    )
    print(f"Silhouette Coefficient: {metrics.silhouette_score(X, labels):.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Homogeneity: 0.953
    Completeness: 0.883
    V-measure: 0.917
    Adjusted Rand Index: 0.952
    Adjusted Mutual Information: 0.916
    Silhouette Coefficient: 0.626


.. GENERATED FROM PYTHON SOURCE LINES 88-94

Plot results
------------

Core samples (large dots) and non-core samples (small dots) are color-coded
according to the assigned cluster. Samples tagged as noise are represented in
black.

.. GENERATED FROM PYTHON SOURCE LINES 94-129

.. code-block:: default


    unique_labels = set(labels)
    core_samples_mask = np.zeros_like(labels, dtype=bool)
    core_samples_mask[db.core_sample_indices_] = True

    colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
    for k, col in zip(unique_labels, colors):
        if k == -1:
            # Black used for noise.
            col = [0, 0, 0, 1]

        class_member_mask = labels == k

        xy = X[class_member_mask & core_samples_mask]
        plt.plot(
            xy[:, 0],
            xy[:, 1],
            "o",
            markerfacecolor=tuple(col),
            markeredgecolor="k",
            markersize=14,
        )

        xy = X[class_member_mask & ~core_samples_mask]
        plt.plot(
            xy[:, 0],
            xy[:, 1],
            "o",
            markerfacecolor=tuple(col),
            markeredgecolor="k",
            markersize=6,
        )

    plt.title(f"Estimated number of clusters: {n_clusters_}")
    plt.show()


.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_dbscan_002.png
   :alt: Estimated number of clusters: 3
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_dbscan_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.174 seconds)


.. _sphx_glr_download_auto_examples_cluster_plot_dbscan.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.2.X?urlpath=lab/tree/notebooks/auto_examples/cluster/plot_dbscan.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_dbscan.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_dbscan.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
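
As a supplementary illustration of the description in the introduction (core
samples found in dense regions, clusters expanded from them), the following is
a minimal pure-Python sketch of the DBSCAN idea. It is not scikit-learn's
implementation: it uses a brute-force quadratic neighbor search, and the
function name ``dbscan_sketch`` and the toy points are hypothetical, introduced
here only for illustration. It assumes, as in scikit-learn, that a sample's
neighborhood includes the sample itself when comparing against
``min_samples``.

```python
from math import dist

NOISE = -1


def dbscan_sketch(points, eps, min_samples):
    """Return (labels, is_core); label -1 marks noise. Teaching sketch only."""
    n = len(points)
    # Brute-force eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]

    labels = [NOISE] * n
    cluster_id = 0
    for i in range(n):
        # Only an unlabeled core sample can seed a new cluster.
        if labels[i] != NOISE or not is_core[i]:
            continue
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                # Border samples join a cluster but do not extend it.
                continue
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster_id
                    stack.append(q)
        cluster_id += 1
    return labels, is_core


# Hypothetical toy data: two dense groups, one border point, one outlier.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense group A
    (0.25, 0.0),                                      # border point near A
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # dense group B
    (10.0, 0.0),                                      # isolated point
]
labels, is_core = dbscan_sketch(points, eps=0.2, min_samples=4)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the border point receives the label of group A without being a
core sample, and the isolated point keeps the noise label ``-1``, mirroring the
large dots, small dots and black dots in the plot above.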