.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/bicluster/plot_spectral_biclustering.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_bicluster_plot_spectral_biclustering.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_bicluster_plot_spectral_biclustering.py:


=============================================
A demo of the Spectral Biclustering algorithm
=============================================

This example demonstrates how to generate a checkerboard dataset and bicluster
it using the :class:`~sklearn.cluster.SpectralBiclustering` algorithm. The
spectral biclustering algorithm is specifically designed to cluster data by
simultaneously considering both the rows (samples) and columns (features) of a
matrix. It aims to identify patterns not only between samples but also within
subsets of samples, allowing for the detection of localized structure within the
data. This makes spectral biclustering particularly well-suited for datasets
where the order or arrangement of features is fixed, such as in images, time
series, or genomes.

The data is generated, then shuffled and passed to the spectral biclustering
algorithm. The rows and columns of the shuffled matrix are then rearranged to
plot the biclusters found.

.. GENERATED FROM PYTHON SOURCE LINES 20-24

.. code-block:: default


    # Author: Kemal Eren <kemal@kemaleren.com>
    # License: BSD 3 clause








.. GENERATED FROM PYTHON SOURCE LINES 25-35

Generate sample data
--------------------
We generate the sample data using the
:func:`~sklearn.datasets.make_checkerboard` function. Each pixel of the
`shape=(300, 300)` matrix encodes, through its color, a value drawn from a
uniform distribution. Gaussian noise is then added, with `noise` giving its
standard deviation.

As the plot shows, the data is distributed over 12 cluster cells (4 row
clusters times 3 column clusters) that are relatively easy to distinguish.

.. GENERATED FROM PYTHON SOURCE LINES 35-48

.. code-block:: default

    from matplotlib import pyplot as plt

    from sklearn.datasets import make_checkerboard

    n_clusters = (4, 3)
    data, rows, columns = make_checkerboard(
        shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=42
    )

    plt.matshow(data, cmap=plt.cm.Blues)
    plt.title("Original dataset")
    _ = plt.show()




.. image-sg:: /auto_examples/bicluster/images/sphx_glr_plot_spectral_biclustering_001.png
   :alt: Original dataset
   :srcset: /auto_examples/bicluster/images/sphx_glr_plot_spectral_biclustering_001.png
   :class: sphx-glr-single-img
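
The `rows` and `columns` arrays returned by
:func:`~sklearn.datasets.make_checkerboard` are the ground-truth bicluster
indicators. As a minimal sketch of their layout (the names `n_biclusters`,
`memberships_per_row` and `memberships_per_column` are illustrative and not
part of the example), each of the 4 x 3 = 12 biclusters gets one boolean row:

.. code-block:: python

    import numpy as np

    from sklearn.datasets import make_checkerboard

    n_clusters = (4, 3)
    data, rows, columns = make_checkerboard(
        shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=42
    )

    # One boolean indicator row per bicluster: 4 row clusters x 3 column clusters.
    n_biclusters = n_clusters[0] * n_clusters[1]
    print(n_biclusters, rows.shape, columns.shape)

    # Every data row lies in exactly one row cluster, which is paired with each
    # of the 3 column clusters, so it belongs to 3 biclusters. By the same
    # argument every data column belongs to 4 biclusters.
    memberships_per_row = rows.sum(axis=0)
    memberships_per_column = columns.sum(axis=0)
    print(np.unique(memberships_per_row), np.unique(memberships_per_column))
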





.. GENERATED FROM PYTHON SOURCE LINES 49-51

We shuffle the data; the goal is to recover its block structure afterwards
using :class:`~sklearn.cluster.SpectralBiclustering`.

.. GENERATED FROM PYTHON SOURCE LINES 51-58

.. code-block:: default

    import numpy as np

    # Creating lists of shuffled row and column indices
    rng = np.random.RandomState(0)
    row_idx_shuffled = rng.permutation(data.shape[0])
    col_idx_shuffled = rng.permutation(data.shape[1])








.. GENERATED FROM PYTHON SOURCE LINES 59-61

We apply the shuffled indices to the data and plot the result. The structure
of the original data matrix is no longer visible.

.. GENERATED FROM PYTHON SOURCE LINES 61-67

.. code-block:: default

    data = data[row_idx_shuffled][:, col_idx_shuffled]

    plt.matshow(data, cmap=plt.cm.Blues)
    plt.title("Shuffled dataset")
    _ = plt.show()




.. image-sg:: /auto_examples/bicluster/images/sphx_glr_plot_spectral_biclustering_002.png
   :alt: Shuffled dataset
   :srcset: /auto_examples/bicluster/images/sphx_glr_plot_spectral_biclustering_002.png
   :class: sphx-glr-single-img
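
Shuffling with index permutations loses no information: the permutation can
be inverted exactly. The following minimal sketch, on a small hypothetical
matrix (`toy`, `row_perm` and `col_perm` are illustrative names), shows that
indexing with `np.argsort` of a permutation undoes it. The biclustering model
does not have access to the permutation and must infer a suitable ordering
from the data alone.

.. code-block:: python

    import numpy as np

    rng = np.random.RandomState(0)
    toy = np.arange(20).reshape(4, 5)

    # Shuffle rows and columns the same way as in the example.
    row_perm = rng.permutation(toy.shape[0])
    col_perm = rng.permutation(toy.shape[1])
    shuffled = toy[row_perm][:, col_perm]

    # argsort of a permutation is its inverse permutation.
    restored = shuffled[np.argsort(row_perm)][:, np.argsort(col_perm)]
    print(np.array_equal(restored, toy))
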





.. GENERATED FROM PYTHON SOURCE LINES 68-74

Fitting `SpectralBiclustering`
------------------------------
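We fit the model and compare the obtained clusters with the ground truth. Note
that when creating the model we specify the same number of clusters that we
used to create the dataset (`n_clusters = (4, 3)`), which contributes to
obtaining a good result. Because the data was shuffled, the ground-truth
indicators `rows` and `columns` are permuted with the same indices before
they are compared with the model output.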

.. GENERATED FROM PYTHON SOURCE LINES 74-86

.. code-block:: default

    from sklearn.cluster import SpectralBiclustering
    from sklearn.metrics import consensus_score

    model = SpectralBiclustering(n_clusters=n_clusters, method="log", random_state=0)
    model.fit(data)

    # Compute the similarity of two sets of biclusters
    score = consensus_score(
        model.biclusters_, (rows[:, row_idx_shuffled], columns[:, col_idx_shuffled])
    )
    print(f"consensus score: {score:.1f}")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    consensus score: 1.0




.. GENERATED FROM PYTHON SOURCE LINES 87-89

The score is between 0 and 1, where 1 corresponds to a perfect matching. It
shows the quality of the biclustering.
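
To get a feel for the score, here is a minimal sketch on two hand-written
biclusters of a 4 x 4 matrix (the indicator arrays below are made up for
illustration). It shows that the score does not depend on the order in which
the biclusters are listed, and that it drops below 1 when a row is assigned
to the wrong bicluster.

.. code-block:: python

    import numpy as np

    from sklearn.metrics import consensus_score

    # Boolean indicators of shape (n_biclusters, n_rows) and (n_biclusters, n_columns).
    true_rows = np.array([[True, True, False, False], [False, False, True, True]])
    true_cols = np.array([[True, True, False, False], [False, False, True, True]])

    # Identical sets of biclusters match perfectly.
    same = consensus_score((true_rows, true_cols), (true_rows, true_cols))

    # Listing the same biclusters in reverse order does not change the score,
    # since the best one-to-one matching between the two sets is used.
    reordered = consensus_score(
        (true_rows, true_cols), (true_rows[::-1], true_cols[::-1])
    )

    # Moving row 1 from the first to the second bicluster lowers the score.
    wrong_rows = np.array([[True, False, False, False], [False, True, True, True]])
    partial = consensus_score((true_rows, true_cols), (wrong_rows, true_cols))

    print(same, reordered, partial)
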

.. GENERATED FROM PYTHON SOURCE LINES 91-98

Plotting results
----------------
Now, we rearrange the data so that the row and column labels assigned by the
:class:`~sklearn.cluster.SpectralBiclustering` model appear in ascending
order, and plot again. The `row_labels_` range from 0 to 3, while the
`column_labels_` range from 0 to 2, representing a total of 4 row clusters and
3 column clusters.
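
The fitted attributes can be inspected on a smaller problem. This is a
minimal sketch, assuming a 60 x 60 checkerboard chosen only to keep the fit
fast (`small_data` and `small_model` are illustrative names, not part of the
example):

.. code-block:: python

    import numpy as np

    from sklearn.cluster import SpectralBiclustering
    from sklearn.datasets import make_checkerboard

    n_clusters = (4, 3)
    small_data, _, _ = make_checkerboard(
        shape=(60, 60), n_clusters=n_clusters, noise=10, random_state=42
    )
    small_model = SpectralBiclustering(
        n_clusters=n_clusters, method="log", random_state=0
    )
    small_model.fit(small_data)

    # One integer label per row and one per column.
    print(small_model.row_labels_.shape, small_model.column_labels_.shape)

    # Row labels are drawn from range(4), column labels from range(3).
    print(np.unique(small_model.row_labels_), np.unique(small_model.column_labels_))

    # Boolean indicators with one row per bicluster (4 * 3 = 12).
    print(small_model.rows_.shape, small_model.columns_.shape)
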

.. GENERATED FROM PYTHON SOURCE LINES 98-107

.. code-block:: default


    # Reordering first the rows and then the columns.
    reordered_rows = data[np.argsort(model.row_labels_)]
    reordered_data = reordered_rows[:, np.argsort(model.column_labels_)]

    plt.matshow(reordered_data, cmap=plt.cm.Blues)
    plt.title("After biclustering; rearranged to show biclusters")
    _ = plt.show()




.. image-sg:: /auto_examples/bicluster/images/sphx_glr_plot_spectral_biclustering_003.png
   :alt: After biclustering; rearranged to show biclusters
   :srcset: /auto_examples/bicluster/images/sphx_glr_plot_spectral_biclustering_003.png
   :class: sphx-glr-single-img
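
The reordering relies on `np.argsort` returning the indices that would sort
the labels. A minimal sketch on a hand-written 4 x 3 matrix (the matrix and
labels are made up for illustration, and `kind="stable"` is used here only to
make the order within each cluster deterministic):

.. code-block:: python

    import numpy as np

    toy = np.arange(12).reshape(4, 3)
    toy_row_labels = np.array([1, 0, 1, 0])
    toy_col_labels = np.array([2, 0, 1])

    # Indices that bring rows (columns) with equal labels next to each other.
    row_order = np.argsort(toy_row_labels, kind="stable")
    col_order = np.argsort(toy_col_labels, kind="stable")

    # Reorder the rows first, then the columns.
    reordered = toy[row_order][:, col_order]
    print(row_order, col_order)
    print(reordered)

    # np.ix_ performs both selections in a single indexing step.
    same_result = toy[np.ix_(row_order, col_order)]
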





.. GENERATED FROM PYTHON SOURCE LINES 108-113

As a last step, we want to demonstrate the relationships between the row
and column labels assigned by the model. To do so, we sort `row_labels_` and
`column_labels_`, add 1 to each so that the labels start from 1 instead of 0
(a label of 0 would zero out an entire row or column of the product), and
build a grid from the two vectors with :func:`numpy.outer`.

.. GENERATED FROM PYTHON SOURCE LINES 113-120

.. code-block:: default

    plt.matshow(
        np.outer(np.sort(model.row_labels_) + 1, np.sort(model.column_labels_) + 1),
        cmap=plt.cm.Blues,
    )
    plt.title("Checkerboard structure of rearranged data")
    plt.show()




.. image-sg:: /auto_examples/bicluster/images/sphx_glr_plot_spectral_biclustering_004.png
   :alt: Checkerboard structure of rearranged data
   :srcset: /auto_examples/bicluster/images/sphx_glr_plot_spectral_biclustering_004.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 121-124

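The outer product of the row and column label vectors gives a representation
of the checkerboard structure, where combinations of row and column labels are
rendered as shades of blue. Because the shade encodes the product of the two
labels, combinations with equal products (for example 2 x 3 and 3 x 2) share
the same shade.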
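
To see how the label grid is built, here is a minimal sketch on hand-written
labels (the label vectors and the `cell_ids` encoding are illustrative
assumptions, not part of the example). It counts how many distinct values the
products take for 4 row clusters and 3 column clusters, and shows one
alternative encoding that gives each of the 12 cells its own value.

.. code-block:: python

    import numpy as np

    # Already sorted labels for 6 rows and 4 columns.
    toy_row_labels = np.array([0, 0, 1, 1, 2, 3])
    toy_col_labels = np.array([0, 1, 1, 2])

    # Same construction as in the example: shift labels to start at 1,
    # then take the outer product.
    grid = np.outer(toy_row_labels + 1, toy_col_labels + 1)
    print(grid)

    # Products such as 1 * 2 and 2 * 1 coincide, so fewer than 12 values occur.
    distinct_products = np.unique(grid)
    print(len(distinct_products))

    # Alternative (not used in the example): a unique id per (row, column) pair.
    n_col_clusters = 3
    cell_ids = toy_row_labels[:, None] * n_col_clusters + toy_col_labels[None, :]
    print(len(np.unique(cell_ids)))
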


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.520 seconds)


.. _sphx_glr_download_auto_examples_bicluster_plot_spectral_biclustering.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.3.X?urlpath=lab/tree/notebooks/auto_examples/bicluster/plot_spectral_biclustering.ipynb
        :alt: Launch binder
        :width: 150 px



    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/bicluster/plot_spectral_biclustering.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_spectral_biclustering.py <plot_spectral_biclustering.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_spectral_biclustering.ipynb <plot_spectral_biclustering.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_