.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cluster/plot_inductive_clustering.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_cluster_plot_inductive_clustering.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_inductive_clustering.py:


====================
Inductive Clustering
====================

Clustering can be expensive, especially when our dataset contains millions
of datapoints. Many clustering algorithms are not :term:`inductive` and so
cannot be directly applied to new data samples without recomputing the
clustering, which may be intractable. Instead, we can use clustering to then
learn an inductive model with a classifier, which has several benefits:

- it allows the clusters to scale and apply to new data
- unlike re-fitting the clusters to new samples, it makes sure the labelling
  procedure is consistent over time
- it allows us to use the inferential capabilities of the classifier to
  describe or explain the clusters

This example illustrates a generic implementation of a meta-estimator which
extends clustering by inducing a classifier from the cluster labels.

.. GENERATED FROM PYTHON SOURCE LINES 22-130



.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_inductive_clustering_001.png
   :alt: Ward Linkage, Unknown instances, Classify unknown instances
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_inductive_clustering_001.png
   :class: sphx-glr-single-img





.. code-block:: default


    # Authors: Chirag Nagpal
    #          Christos Aridas

    import matplotlib.pyplot as plt
    from sklearn.base import BaseEstimator, clone
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import make_blobs
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import DecisionBoundaryDisplay
    from sklearn.utils.metaestimators import available_if
    from sklearn.utils.validation import check_is_fitted


    N_SAMPLES = 5000
    RANDOM_STATE = 42


    def _classifier_has(attr):
        """Check if we can delegate a method to the underlying classifier.

        First, we check the first fitted classifier if available, otherwise we
        check the unfitted classifier.
        """
        return lambda estimator: (
            hasattr(estimator.classifier_, attr)
            if hasattr(estimator, "classifier_")
            else hasattr(estimator.classifier, attr)
        )


    class InductiveClusterer(BaseEstimator):
        def __init__(self, clusterer, classifier):
            self.clusterer = clusterer
            self.classifier = classifier

        def fit(self, X, y=None):
            self.clusterer_ = clone(self.clusterer)
            self.classifier_ = clone(self.classifier)
            y = self.clusterer_.fit_predict(X)
            self.classifier_.fit(X, y)
            return self

        @available_if(_classifier_has("predict"))
        def predict(self, X):
            check_is_fitted(self)
            return self.classifier_.predict(X)

        @available_if(_classifier_has("decision_function"))
        def decision_function(self, X):
            check_is_fitted(self)
            return self.classifier_.decision_function(X)


    def plot_scatter(X, color, alpha=0.5):
        return plt.scatter(X[:, 0], X[:, 1], c=color, alpha=alpha, edgecolor="k")


    # Generate some training data from clustering
    X, y = make_blobs(
        n_samples=N_SAMPLES,
        cluster_std=[1.0, 1.0, 0.5],
        centers=[(-5, -5), (0, 0), (5, 5)],
        random_state=RANDOM_STATE,
    )


    # Train a clustering algorithm on the training data and get the cluster labels
    clusterer = AgglomerativeClustering(n_clusters=3)
    cluster_labels = clusterer.fit_predict(X)

    plt.figure(figsize=(12, 4))

    plt.subplot(131)
    plot_scatter(X, cluster_labels)
    plt.title("Ward Linkage")


    # Generate new samples and plot them along with the original dataset
    X_new, y_new = make_blobs(
        n_samples=10, centers=[(-7, -1), (-2, 4), (3, 6)], random_state=RANDOM_STATE
    )

    plt.subplot(132)
    plot_scatter(X, cluster_labels)
    plot_scatter(X_new, "black", 1)
    plt.title("Unknown instances")


    # Declare the inductive learning model that it will be used to
    # predict cluster membership for unknown instances
    classifier = RandomForestClassifier(random_state=RANDOM_STATE)
    inductive_learner = InductiveClusterer(clusterer, classifier).fit(X)

    probable_clusters = inductive_learner.predict(X_new)


    ax = plt.subplot(133)
    plot_scatter(X, cluster_labels)
    plot_scatter(X_new, probable_clusters)

    # Plotting decision regions
    DecisionBoundaryDisplay.from_estimator(
        inductive_learner, X, response_method="predict", alpha=0.4, ax=ax
    )
    plt.title("Classify unknown instances")

    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  1.619 seconds)


.. _sphx_glr_download_auto_examples_cluster_plot_inductive_clustering.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.1.X?urlpath=lab/tree/notebooks/auto_examples/cluster/plot_inductive_clustering.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_inductive_clustering.py <plot_inductive_clustering.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_inductive_clustering.ipynb <plot_inductive_clustering.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_