`sklearn.cluster`.KMeans

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

K-Means clustering

See also

MiniBatchKMeans: Alternative online implementation that incrementally updates the center positions using mini-batches. For large-scale learning (say n_samples > 10k), MiniBatchKMeans is probably much faster than the default batch implementation.
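
As a rough illustration of the alternative mentioned above, the following sketch fits MiniBatchKMeans on synthetic data. The data, the `batch_size` value and the `n_init` value are assumptions chosen for the example, not recommendations, and no timing claim is made here.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic data, assumed here purely for illustration.
rng = np.random.RandomState(0)
X_large = rng.normal(size=(20000, 2))

# batch_size and n_init are set explicitly; the values are illustrative, not tuned.
mbk = MiniBatchKMeans(n_clusters=8, batch_size=1000, n_init=3, random_state=0)
mbk.fit(X_large)

# The fitted estimator exposes the same cluster_centers_ attribute as KMeans.
print(mbk.cluster_centers_.shape)
```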

Notes

The k-means problem is solved using Lloyd’s algorithm.

The average complexity is O(k n T), where k is the number of clusters, n is the number of samples, and T is the number of iterations.
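
To make the iteration and its per-step cost concrete, here is a minimal NumPy sketch of a Lloyd-style loop. It is not scikit-learn's implementation: the function name `lloyd`, the random initialisation and the stopping rule are assumptions made for the example. Each pass computes k × n sample-to-center distances, which is where the O(k n T) figure comes from.

```python
import numpy as np

def lloyd(X, k, max_iter=300, tol=1e-4, seed=0):
    """Plain NumPy sketch of Lloyd's iteration (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    # Assumption: start from k distinct random samples (not k-means++).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for n_iter in range(1, max_iter + 1):
        # Assignment step: squared distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its assigned samples.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        shift = ((new_centers - centers) ** 2).sum()
        centers = new_centers
        if shift <= tol:  # centers barely moved: stop iterating
            break
    return centers, labels, n_iter

X_demo = np.array([[1, 2], [1, 4], [1, 0],
                   [4, 2], [4, 4], [4, 0]], dtype=float)
centers, labels, n_iter = lloyd(X_demo, k=2)
```

Depending on the initial centers, this loop may stop at different solutions for the same data.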

The worst case complexity is given by O(n^(k+2/p)) with n = n_samples, p = n_features. (D. Arthur and S. Vassilvitskii, ‘How slow is the k-means method?’ SoCG2006)

In practice, the k-means algorithm is very fast (one of the fastest clustering algorithms available), but it can converge to a local minimum. That is why it can be useful to restart it several times.
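
The effect of restarts can be explored with a sketch like the one below, which compares single runs from different seeds with a run that uses n_init=10. The synthetic data from `make_blobs` and the choice of init="random" are assumptions for the example; the printed values depend on the data and the library version.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data, assumed for illustration only.
X_blobs, _ = make_blobs(n_samples=300, centers=5, cluster_std=1.0, random_state=0)

# Single runs from different random initialisations may end in different local minima.
single_runs = [
    KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X_blobs).inertia_
    for seed in range(5)
]

# With n_init=10 the algorithm is run ten times and the lowest-inertia result is kept.
best_of_ten = KMeans(n_clusters=5, init="random", n_init=10,
                     random_state=0).fit(X_blobs).inertia_

print(single_runs)
print(best_of_ten)
```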

Examples

>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [4, 2], [4, 4], [4, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
>>> kmeans.labels_
array([0, 0, 0, 1, 1, 1], dtype=int32)
>>> kmeans.predict([[0, 0], [4, 4]])
array([0, 1], dtype=int32)
>>> kmeans.cluster_centers_
array([[ 1.,  2.],
       [ 4.,  2.]])

Methods

`fit`(X[, y])	Compute k-means clustering.
`fit_predict`(X[, y])	Compute cluster centers and predict cluster index for each sample.
`fit_transform`(X[, y])	Compute clustering and transform X to cluster-distance space.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict the closest cluster each sample in X belongs to.
`score`(X[, y])	Opposite of the value of X on the K-means objective.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Transform X to a cluster-distance space.

__init__(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

fit(X, y=None)

Compute k-means clustering.

Parameters:

X : array-like or sparse matrix, shape=(n_samples, n_features)

Training instances to cluster.

y : Ignored

fit_predict(X, y=None)

Compute cluster centers and predict cluster index for each sample.

Convenience method; equivalent to calling fit(X) followed by predict(X).

Parameters:

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Data to cluster and assign labels to.

y : Ignored

Returns:

labels : array, shape [n_samples,]

Index of the cluster each sample belongs to.
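
The equivalence stated above can be checked with a small sketch. The toy array and the explicit n_init=10 are assumptions for the example; both estimators use the same random_state so that the two fits are comparable.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]], dtype=float)

# One call: fit and return the label of each training sample.
labels_one_step = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Two calls: fit first, then predict on the same data.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels_two_step = km.predict(X)
```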

fit_transform(X, y=None)

Compute clustering and transform X to cluster-distance space.

Equivalent to fit(X).transform(X), but more efficiently implemented.

Parameters:

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

New data to transform.

y : Ignored

Returns:

X_new : array, shape [n_samples, k]

X transformed in the new space.
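
A short sketch of the returned shape, assuming a toy array and an explicit n_init=10: the result has one row per sample and one column per cluster, and should agree with fit(X).transform(X) up to floating-point rounding.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]], dtype=float)

# Fit and transform in one call.
X_dist = KMeans(n_clusters=2, n_init=10, random_state=0).fit_transform(X)

# The two-step form, using the same seed so the fits are comparable.
X_dist_two_step = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).transform(X)

print(X_dist.shape)
```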

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.
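
For instance, the returned mapping can be inspected or passed back to the constructor to build an unfitted estimator with the same settings. This is a minimal sketch; the parameter values are arbitrary.

```python
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, max_iter=100)
params = kmeans.get_params()

# The keys are the constructor argument names.
print(params["n_clusters"], params["max_iter"])

# Rebuild an unfitted estimator with identical settings.
same_settings = KMeans(**params)
```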

predict(X)

Predict the closest cluster each sample in X belongs to.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters:

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

New data to predict.

Returns:

labels : array, shape [n_samples,]

Index of the cluster each sample belongs to.
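
The code book view described above can be sketched as follows: the indices returned by predict are used to look up rows of cluster_centers_, which replaces each sample by its nearest center. The toy data and the name `X_new` are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]], dtype=float)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

X_new = np.array([[0, 0], [4, 4]], dtype=float)

codes = kmeans.predict(X_new)              # index into the code book
quantized = kmeans.cluster_centers_[codes]  # each sample replaced by its code

# Cross-check: the chosen code should be the center with the smallest distance.
d2 = ((X_new[:, None, :] - kmeans.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
nearest = d2.argmin(axis=1)
```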

score(X, y=None)

Negative of the k-means objective evaluated on X, that is, minus the sum of squared distances from each sample to its closest cluster center.

Parameters:

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

New data.

y : Ignored

Returns:

score : float

Negative of the k-means objective evaluated on X.
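
On the training data, the score should therefore match the negated inertia_ attribute up to floating-point rounding. A minimal check, assuming a toy array and an explicit n_init=10:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]], dtype=float)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

train_score = kmeans.score(X)

# Recompute the objective by hand: squared distance to the closest center, summed.
d2 = ((X[:, None, :] - kmeans.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
manual_objective = d2.min(axis=1).sum()
```

Because the sign is flipped, a larger (less negative) score means a tighter clustering.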

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self : the estimator instance
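
The nested form can be illustrated with a small pipeline. The step names "scale" and "kmeans" are chosen for this example; any names given to the pipeline steps work the same way.

```python
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by name.
kmeans = KMeans(n_clusters=8)
kmeans.set_params(n_clusters=3, max_iter=100)

# Nested object: <component>__<parameter> addresses a step of the pipeline.
pipe = Pipeline([("scale", StandardScaler()), ("kmeans", KMeans(n_clusters=8))])
returned = pipe.set_params(kmeans__n_clusters=4)
```

Since the method returns the estimator itself, calls can be chained, for example set_params(...).fit(X).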

transform(X)

Transform X to a cluster-distance space.

In the new space, each dimension is the distance to one of the cluster centers. Note that even if X is sparse, the array returned by transform will typically be dense.

Parameters:

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

New data to transform.

Returns:

X_new : array, shape [n_samples, k]

X transformed in the new space.
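
The sketch below compares the output of transform with Euclidean distances computed by hand, assuming a toy array and an explicit n_init=10. Column j of the result corresponds to cluster center j, so the column with the smallest value should agree with predict.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]], dtype=float)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

distances = kmeans.transform(X)  # shape (n_samples, n_clusters)

# Euclidean distance from each sample to each center, computed directly.
manual = np.linalg.norm(
    X[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)

# Index of the closest center for each sample.
nearest = distances.argmin(axis=1)
```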

Examples using `sklearn.cluster.KMeans`

K-means Clustering

Color Quantization using K-Means

Vector Quantization Example

Demonstration of k-means assumptions

A demo of K-Means clustering on the handwritten digits data

Selecting the number of clusters with silhouette analysis on KMeans clustering

Empirical evaluation of the impact of k-means initialization

Comparison of the K-Means and MiniBatchKMeans clustering algorithms

Clustering text documents using k-means
