.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here <sphx_glr_download_auto_examples_cluster_plot_linkage_comparison.py>` to download the full example code
.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_linkage_comparison.py:


================================================================
Comparing different hierarchical linkage methods on toy datasets
================================================================

This example shows characteristics of different linkage
methods for hierarchical clustering on datasets that are
"interesting" but still in 2D.

The main observations to make are:

- single linkage is fast, and can perform well on
  non-globular data, but it performs poorly in the
  presence of noise.
- average and complete linkage perform well on
  cleanly separated globular clusters, but have mixed
  results otherwise.
- Ward is the most effective method for noisy data.

While these examples give some intuition about the
algorithms, this intuition might not apply to very high
dimensional data.


.. code-block:: default

    print(__doc__)

    import time
    import warnings

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn import cluster, datasets
    from sklearn.preprocessing import StandardScaler
    from itertools import cycle, islice

    np.random.seed(0)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none


Generate datasets. We choose the size big enough to see the scalability
of the algorithms, but not too big to avoid too long running times


.. code-block:: default


    n_samples = 1500
    noisy_circles = datasets.make_circles(n_samples=n_samples, factor=.5,
                                          noise=.05)
    noisy_moons = datasets.make_moons(n_samples=n_samples, noise=.05)
    blobs = datasets.make_blobs(n_samples=n_samples, random_state=8)
    no_structure = np.random.rand(n_samples, 2), None

    # Anisotropicly distributed data
    random_state = 170
    X, y = datasets.make_blobs(n_samples=n_samples, random_state=random_state)
    transformation = [[0.6, -0.6], [-0.4, 0.8]]
    X_aniso = np.dot(X, transformation)
    aniso = (X_aniso, y)

    # blobs with varied variances
    varied = datasets.make_blobs(n_samples=n_samples,
                                 cluster_std=[1.0, 2.5, 0.5],
                                 random_state=random_state)


Run the clustering and plot


.. code-block:: default


    # Set up cluster parameters
    plt.figure(figsize=(9 * 1.3 + 2, 14.5))
    plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96, wspace=.05,
                        hspace=.01)

    plot_num = 1

    default_base = {'n_neighbors': 10,
                    'n_clusters': 3}

    datasets = [
        (noisy_circles, {'n_clusters': 2}),
        (noisy_moons, {'n_clusters': 2}),
        (varied, {'n_neighbors': 2}),
        (aniso, {'n_neighbors': 2}),
        (blobs, {}),
        (no_structure, {})]

    for i_dataset, (dataset, algo_params) in enumerate(datasets):
        # update parameters with dataset-specific values
        params = default_base.copy()
        params.update(algo_params)

        X, y = dataset

        # normalize dataset for easier parameter selection
        X = StandardScaler().fit_transform(X)

        # ============
        # Create cluster objects
        # ============
        ward = cluster.AgglomerativeClustering(
            n_clusters=params['n_clusters'], linkage='ward')
        complete = cluster.AgglomerativeClustering(
            n_clusters=params['n_clusters'], linkage='complete')
        average = cluster.AgglomerativeClustering(
            n_clusters=params['n_clusters'], linkage='average')
        single = cluster.AgglomerativeClustering(
            n_clusters=params['n_clusters'], linkage='single')

        clustering_algorithms = (
            ('Single Linkage', single),
            ('Average Linkage', average),
            ('Complete Linkage', complete),
            ('Ward Linkage', ward),
        )

        for name, algorithm in clustering_algorithms:
            t0 = time.time()

            # catch warnings related to kneighbors_graph
            with warnings.catch_warnings():
                warnings.filterwarnings(
                    "ignore",
                    message="the number of connected components of the " +
                    "connectivity matrix is [0-9]{1,2}" +
                    " > 1. Completing it to avoid stopping the tree early.",
                    category=UserWarning)
                algorithm.fit(X)

            t1 = time.time()
            if hasattr(algorithm, 'labels_'):
                y_pred = algorithm.labels_.astype(np.int)
            else:
                y_pred = algorithm.predict(X)

            plt.subplot(len(datasets), len(clustering_algorithms), plot_num)
            if i_dataset == 0:
                plt.title(name, size=18)

            colors = np.array(list(islice(cycle(['#377eb8', '#ff7f00', '#4daf4a',
                                                 '#f781bf', '#a65628', '#984ea3',
                                                 '#999999', '#e41a1c', '#dede00']),
                                          int(max(y_pred) + 1))))
            plt.scatter(X[:, 0], X[:, 1], s=10, color=colors[y_pred])

            plt.xlim(-2.5, 2.5)
            plt.ylim(-2.5, 2.5)
            plt.xticks(())
            plt.yticks(())
            plt.text(.99, .01, ('%.2fs' % (t1 - t0)).lstrip('0'),
                     transform=plt.gca().transAxes, size=15,
                     horizontalalignment='right')
            plot_num += 1

    plt.show()


.. image:: /auto_examples/cluster/images/sphx_glr_plot_linkage_comparison_001.png
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  2.338 seconds)


.. _sphx_glr_download_auto_examples_cluster_plot_linkage_comparison.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download

     :download:`Download Python source code: plot_linkage_comparison.py <plot_linkage_comparison.py>`


  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: plot_linkage_comparison.ipynb <plot_linkage_comparison.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_