SpectralCoclustering

class sklearn.cluster.SpectralCoclustering(n_clusters=3, *, svd_method='randomized', n_svd_vecs=None, mini_batch=False, init='k-means++', n_init=10, random_state=None)

Spectral Co-Clustering algorithm (Dhillon, 2001) [1].

Clusters rows and columns of an array X to solve the relaxed normalized cut of the bipartite graph created from X as follows: the edge between row vertex i and column vertex j has weight X[i, j].
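The procedure can be sketched by hand to make the graph formulation concrete. The snippet below is a rough illustration of the steps as described in Dhillon (2001), not the estimator's actual code: it scales X by the row and column sums, takes singular vectors of the scaled matrix, and runs k-means on the stacked row and column embeddings. The helper name `spectral_cocluster_sketch` and the toy data are hypothetical, and a dense `np.linalg.svd` stands in for whatever `svd_method` selects.

```python
import numpy as np
from sklearn.cluster import KMeans


def spectral_cocluster_sketch(X, n_clusters, random_state=0):
    """Illustrative helper (hypothetical, not part of scikit-learn)."""
    X = np.asarray(X, dtype=float)
    # Degree scaling of the bipartite graph: R^{-1/2} X C^{-1/2},
    # where R and C hold the row sums and column sums of X.
    r = 1.0 / np.sqrt(X.sum(axis=1))
    c = 1.0 / np.sqrt(X.sum(axis=0))
    Xn = r[:, None] * X * c[None, :]

    # Singular vectors of the scaled matrix; the first pair is discarded.
    U, _, Vt = np.linalg.svd(Xn, full_matrices=False)
    n_vecs = int(np.ceil(np.log2(n_clusters)))
    U = U[:, 1:1 + n_vecs]
    V = Vt.T[:, 1:1 + n_vecs]

    # Embed rows and columns in the same space and cluster them together.
    Z = np.vstack([r[:, None] * U, c[:, None] * V])
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(Z)
    n_rows = X.shape[0]
    return labels[:n_rows], labels[n_rows:]


# Toy data (assumed for illustration): two dense blocks on a weak background.
X = np.full((6, 4), 0.1)
X[:3, :2] += 5.0
X[3:, 2:] += 5.0

row_labels, col_labels = spectral_cocluster_sketch(X, n_clusters=2)
```

On this toy input, rows 0-2 end up with columns 0-1 and rows 3-5 with columns 2-3; which group receives label 0 is arbitrary.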

The resulting bicluster structure is block-diagonal, since each row and each column belongs to exactly one bicluster.
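The "exactly one bicluster" property can be checked on a fitted estimator. The following is a small sketch on made-up data; it relies on the fitted `rows_` and `columns_` attributes being boolean indicator arrays with one row per bicluster.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Toy data (assumed for illustration): two dense blocks on a weak background.
X = np.full((6, 4), 0.1)
X[:3, :2] += 5.0
X[3:, 2:] += 5.0

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)

# Count, for every row and column, how many biclusters it belongs to.
row_membership = model.rows_.sum(axis=0)
col_membership = model.columns_.sum(axis=0)

# Sorting by label groups each bicluster's rows and columns together,
# which exposes the block-diagonal layout.
X_reordered = X[np.argsort(model.row_labels_)][:, np.argsort(model.column_labels_)]
```

Every entry of `row_membership` and `col_membership` is 1, which is what makes the reordered matrix block-diagonal rather than overlapping.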

Supports sparse matrices, as long as they are nonnegative.
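As a quick illustration of sparse input, the sketch below fits the estimator on a SciPy CSR matrix. The matrix values are invented for the example; every row and column is given at least one positive entry so that the row and column sums are nonzero.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import SpectralCoclustering

# Made-up nonnegative data with two loosely separated blocks.
dense = np.array(
    [
        [5, 4, 0, 0],
        [4, 5, 0, 0],
        [5, 5, 1, 0],
        [0, 0, 6, 5],
        [0, 1, 5, 6],
        [0, 0, 6, 6],
    ],
    dtype=float,
)
X_sparse = csr_matrix(dense)

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X_sparse)
row_labels = model.row_labels_        # one label per row
column_labels = model.column_labels_  # one label per column
```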

See also

SpectralBiclustering: Partitions rows and columns under the assumption that the data has an underlying checkerboard structure.

References

[1] Dhillon, Inderjit S, 2001. Co-clustering documents and words using bipartite spectral graph partitioning.

Examples

>>> from sklearn.cluster import SpectralCoclustering
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [1, 0],
...               [4, 7], [3, 5], [3, 6]])
>>> clustering = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)
>>> clustering.row_labels_
array([0, 1, 1, 0, 0, 0], dtype=int32)
>>> clustering.column_labels_
array([0, 0], dtype=int32)
>>> clustering
SpectralCoclustering(n_clusters=2, random_state=0)

For a more detailed example, see the following: A demo of the Spectral Co-Clustering algorithm.

fit(X, y=None)

Create a biclustering for X.

Parameters:

X (array-like of shape (n_samples, n_features)): Training data.
y (Ignored): Not used, present for API consistency by convention.

Returns:

self (object): The fitted SpectralCoclustering instance.

get_indices(i)

Row and column indices of the i’th bicluster.

Only works if the rows_ and columns_ attributes exist, that is, after the estimator has been fitted.

Parameters:

i (int): The index of the cluster.

Returns:

row_ind (ndarray, dtype=np.intp): Indices of rows in the dataset that belong to the bicluster.
col_ind (ndarray, dtype=np.intp): Indices of columns in the dataset that belong to the bicluster.
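A short usage sketch, on made-up data, of how the returned index arrays relate to the fitted `rows_` and `columns_` indicators:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Toy data (assumed for illustration): two dense blocks on a weak background.
X = np.full((6, 4), 0.1)
X[:3, :2] += 5.0
X[3:, 2:] += 5.0

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)

# Integer positions of the rows and columns assigned to bicluster 0.
row_ind, col_ind = model.get_indices(0)

# The same positions, recovered from the boolean indicator arrays.
rows_from_mask = np.flatnonzero(model.rows_[0])
cols_from_mask = np.flatnonzero(model.columns_[0])
```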

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.
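For instance, the returned dictionary mirrors the constructor arguments listed in the class signature:

```python
from sklearn.cluster import SpectralCoclustering

model = SpectralCoclustering(n_clusters=2, random_state=0)

# Keys are constructor argument names; values are the current settings.
params = model.get_params()
n_clusters = params["n_clusters"]
svd_method = params["svd_method"]
```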

get_shape(i)

Shape of the i’th bicluster.

Parameters:

i (int): The index of the cluster.

Returns:

n_rows (int): Number of rows in the bicluster.
n_cols (int): Number of columns in the bicluster.
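A minimal sketch, on made-up data, showing that the shape agrees with the lengths of the index arrays from `get_indices`:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Toy data (assumed for illustration): two dense blocks on a weak background.
X = np.full((6, 4), 0.1)
X[:3, :2] += 5.0
X[3:, 2:] += 5.0

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)

# Number of rows and columns assigned to bicluster 0.
n_rows, n_cols = model.get_shape(0)
row_ind, col_ind = model.get_indices(0)
```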

get_submatrix(i, data)

Return the submatrix corresponding to bicluster i.

Parameters:

i (int): The index of the cluster.
data (array-like of shape (n_samples, n_features)): The data.

Returns:

submatrix (ndarray of shape (n_rows, n_cols)): The submatrix corresponding to bicluster i.

Notes

Works with sparse matrices. Only works if rows_ and columns_ attributes exist.
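As a usage sketch on made-up dense data, the submatrix is the selection of `data` at the bicluster's row and column indices, so it can be compared against an explicit `np.ix_` selection:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Toy data (assumed for illustration): two dense blocks on a weak background.
X = np.full((6, 4), 0.1)
X[:3, :2] += 5.0
X[3:, 2:] += 5.0

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)

# Values of X restricted to the rows and columns of bicluster 0.
sub = model.get_submatrix(0, X)

# Equivalent manual selection using the index arrays.
row_ind, col_ind = model.get_indices(0)
manual = X[np.ix_(row_ind, col_ind)]
```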

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.
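A brief sketch of both forms. The pipeline and its step names (`scale`, `cocluster`) are hypothetical and exist only to show the `<component>__<parameter>` syntax; the pipeline is not fitted here.

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Simple estimator: set_params returns the same instance, so calls can be chained.
model = SpectralCoclustering()
returned = model.set_params(n_clusters=4, svd_method="arpack")

# Nested object: address a step's parameter as <step name>__<parameter>.
pipe = Pipeline([("scale", MinMaxScaler()), ("cocluster", SpectralCoclustering())])
pipe.set_params(cocluster__n_clusters=5)
nested_value = pipe.named_steps["cocluster"].n_clusters
```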

Gallery examples

Biclustering documents with the Spectral Co-clustering algorithm

A demo of the Spectral Co-Clustering algorithm