sklearn.cluster.spectral_clustering

sklearn.cluster.spectral_clustering(affinity, *, n_clusters=8, n_components=None, eigen_solver=None, random_state=None, n_init=10, eigen_tol=0.0, assign_labels='kmeans', verbose=False)[source]

Apply clustering to a projection of the normalized Laplacian.

In practice Spectral Clustering is very useful when the structure of the individual clusters is highly non-convex or more generally when a measure of the center and spread of the cluster is not a suitable description of the complete cluster. For instance, when clusters are nested circles on the 2D plane.

If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts [1], [2].

Read more in the User Guide.

Parameters
affinity{array-like, sparse matrix} of shape (n_samples, n_samples)

The affinity matrix describing the relationship of the samples to embed. Must be symmetric.

Possible examples:
  • adjacency matrix of a graph,

  • heat kernel of the pairwise distance matrix of the samples,

  • symmetric k-nearest neighbours connectivity matrix of the samples.

n_clustersint, default=None

Number of clusters to extract.

n_componentsint, default=n_clusters

Number of eigenvectors to use for the spectral embedding

eigen_solver{None, ‘arpack’, ‘lobpcg’, or ‘amg’}

The eigenvalue decomposition strategy to use. AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities. If None, then 'arpack' is used. See [4] for more details regarding 'lobpcg'.

random_stateint, RandomState instance, default=None

A pseudo random number generator used for the initialization of the lobpcg eigenvectors decomposition when eigen_solver == 'amg', and for the K-Means initialization. Use an int to make the results deterministic across calls (See Glossary).

Note

When using eigen_solver == 'amg', it is necessary to also fix the global numpy seed with np.random.seed(int) to get deterministic results. See https://github.com/pyamg/pyamg/issues/139 for further information.

n_initint, default=10

Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia. Only used if assign_labels='kmeans'.

eigen_tolfloat, default=0.0

Stopping criterion for eigendecomposition of the Laplacian matrix when using arpack eigen_solver.

assign_labels{‘kmeans’, ‘discretize’}, default=’kmeans’

The strategy to use to assign labels in the embedding space. There are two ways to assign labels after the Laplacian embedding. k-means can be applied and is a popular choice. But it can also be sensitive to initialization. Discretization is another approach which is less sensitive to random initialization [3].

verbosebool, default=False

Verbosity mode.

New in version 0.24.

Returns
labelsarray of integers, shape: n_samples

The labels of the clusters.

Notes

The graph should contain only one connect component, elsewhere the results make little sense.

This algorithm solves the normalized cut for k=2: it is a normalized spectral clustering.

References

1

Normalized cuts and image segmentation, 2000 Jianbo Shi, Jitendra Malik

2

A Tutorial on Spectral Clustering, 2007 Ulrike von Luxburg

3

Multiclass spectral clustering, 2003 Stella X. Yu, Jianbo Shi

4

Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001. A. V. Knyazev SIAM Journal on Scientific Computing 23, no. 2, pp. 517-541.

Examples using sklearn.cluster.spectral_clustering