.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/cluster/plot_mini_batch_kmeans.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. or to run this example in your browser via JupyterLite or Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py: ==================================================================== Comparison of the K-Means and MiniBatchKMeans clustering algorithms ==================================================================== We want to compare the performance of the MiniBatchKMeans and KMeans: the MiniBatchKMeans is faster, but gives slightly different results (see :ref:`mini_batch_kmeans`). We will cluster a set of data, first with KMeans and then with MiniBatchKMeans, and plot the results. We will also plot the points that are labelled differently between the two algorithms. .. GENERATED FROM PYTHON SOURCE LINES 16-20 .. code-block:: Python # Authors: The scikit-learn developers # SPDX-License-Identifier: BSD-3-Clause .. GENERATED FROM PYTHON SOURCE LINES 21-25 Generate the data ----------------- We start by generating the blobs of data to be clustered. .. GENERATED FROM PYTHON SOURCE LINES 25-37 .. code-block:: Python import numpy as np from sklearn.datasets import make_blobs np.random.seed(0) batch_size = 45 centers = [[1, 1], [-1, -1], [1, -1]] n_clusters = len(centers) X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7) .. GENERATED FROM PYTHON SOURCE LINES 38-40 Compute clustering with KMeans ------------------------------ .. GENERATED FROM PYTHON SOURCE LINES 40-50 .. code-block:: Python import time from sklearn.cluster import KMeans k_means = KMeans(init="k-means++", n_clusters=3, n_init=10) t0 = time.time() k_means.fit(X) t_batch = time.time() - t0 .. GENERATED FROM PYTHON SOURCE LINES 51-53 Compute clustering with MiniBatchKMeans --------------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 53-68 .. code-block:: Python from sklearn.cluster import MiniBatchKMeans mbk = MiniBatchKMeans( init="k-means++", n_clusters=3, batch_size=batch_size, n_init=10, max_no_improvement=10, verbose=0, ) t0 = time.time() mbk.fit(X) t_mini_batch = time.time() - t0 .. GENERATED FROM PYTHON SOURCE LINES 69-75 Establishing parity between clusters ------------------------------------ We want to have the same color for the same cluster from both the MiniBatchKMeans and the KMeans algorithm. Let's pair the cluster centers per closest one. .. GENERATED FROM PYTHON SOURCE LINES 75-85 .. code-block:: Python from sklearn.metrics.pairwise import pairwise_distances_argmin k_means_cluster_centers = k_means.cluster_centers_ order = pairwise_distances_argmin(k_means.cluster_centers_, mbk.cluster_centers_) mbk_means_cluster_centers = mbk.cluster_centers_[order] k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers) mbk_means_labels = pairwise_distances_argmin(X, mbk_means_cluster_centers) .. GENERATED FROM PYTHON SOURCE LINES 86-88 Plotting the results -------------------- .. GENERATED FROM PYTHON SOURCE LINES 88-148 .. code-block:: Python import matplotlib.pyplot as plt fig = plt.figure(figsize=(8, 3)) fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9) colors = ["#4EACC5", "#FF9C34", "#4E9A06"] # KMeans ax = fig.add_subplot(1, 3, 1) for k, col in zip(range(n_clusters), colors): my_members = k_means_labels == k cluster_center = k_means_cluster_centers[k] ax.plot(X[my_members, 0], X[my_members, 1], "w", markerfacecolor=col, marker=".") ax.plot( cluster_center[0], cluster_center[1], "o", markerfacecolor=col, markeredgecolor="k", markersize=6, ) ax.set_title("KMeans") ax.set_xticks(()) ax.set_yticks(()) plt.text(-3.5, 1.8, "train time: %.2fs\ninertia: %f" % (t_batch, k_means.inertia_)) # MiniBatchKMeans ax = fig.add_subplot(1, 3, 2) for k, col in zip(range(n_clusters), colors): my_members = mbk_means_labels == k cluster_center = mbk_means_cluster_centers[k] ax.plot(X[my_members, 0], X[my_members, 1], "w", markerfacecolor=col, marker=".") ax.plot( cluster_center[0], cluster_center[1], "o", markerfacecolor=col, markeredgecolor="k", markersize=6, ) ax.set_title("MiniBatchKMeans") ax.set_xticks(()) ax.set_yticks(()) plt.text(-3.5, 1.8, "train time: %.2fs\ninertia: %f" % (t_mini_batch, mbk.inertia_)) # Initialize the different array to all False different = mbk_means_labels == 4 ax = fig.add_subplot(1, 3, 3) for k in range(n_clusters): different += (k_means_labels == k) != (mbk_means_labels == k) identical = np.logical_not(different) ax.plot(X[identical, 0], X[identical, 1], "w", markerfacecolor="#bbbbbb", marker=".") ax.plot(X[different, 0], X[different, 1], "w", markerfacecolor="m", marker=".") ax.set_title("Difference") ax.set_xticks(()) ax.set_yticks(()) plt.show() .. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_mini_batch_kmeans_001.png :alt: KMeans, MiniBatchKMeans, Difference :srcset: /auto_examples/cluster/images/sphx_glr_plot_mini_batch_kmeans_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.197 seconds) .. _sphx_glr_download_auto_examples_cluster_plot_mini_batch_kmeans.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/cluster/plot_mini_batch_kmeans.ipynb :alt: Launch binder :width: 150 px .. container:: lite-badge .. image:: images/jupyterlite_badge_logo.svg :target: ../../lite/lab/index.html?path=auto_examples/cluster/plot_mini_batch_kmeans.ipynb :alt: Launch JupyterLite :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_mini_batch_kmeans.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_mini_batch_kmeans.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_mini_batch_kmeans.zip ` .. include:: plot_mini_batch_kmeans.recommendations .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_