`sklearn.cluster`.MiniBatchKMeans

class sklearn.cluster.MiniBatchKMeans(n_clusters=8, *, init='k-means++', max_iter=100, batch_size=100, verbose=0, compute_labels=True, random_state=None, tol=0.0, max_no_improvement=10, init_size=None, n_init=3, reassignment_ratio=0.01)

Mini-Batch K-Means clustering.

See also

KMeans: The classic implementation of the clustering method, based on Lloyd's algorithm. It consumes the whole set of input data at each iteration.

Notes

See https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf

Examples

>>> from sklearn.cluster import MiniBatchKMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [4, 2], [4, 0], [4, 4],
...               [4, 5], [0, 1], [2, 2],
...               [3, 2], [5, 5], [1, -1]])
>>> # manually fit on batches
>>> kmeans = MiniBatchKMeans(n_clusters=2,
...                          random_state=0,
...                          batch_size=6)
>>> kmeans = kmeans.partial_fit(X[0:6,:])
>>> kmeans = kmeans.partial_fit(X[6:12,:])
>>> kmeans.cluster_centers_
array([[2. , 1. ],
       [3.5, 4.5]])
>>> kmeans.predict([[0, 0], [4, 4]])
array([0, 1], dtype=int32)
>>> # fit on the whole data
>>> kmeans = MiniBatchKMeans(n_clusters=2,
...                          random_state=0,
...                          batch_size=6,
...                          max_iter=10).fit(X)
>>> kmeans.cluster_centers_
array([[3.95918367, 2.40816327],
       [1.12195122, 1.3902439 ]])
>>> kmeans.predict([[0, 0], [4, 4]])
array([1, 0], dtype=int32)

Methods

`fit`(X[, y, sample_weight])	Compute the centroids on X by chunking it into mini-batches.
`fit_predict`(X[, y, sample_weight])	Compute cluster centers and predict cluster index for each sample.
`fit_transform`(X[, y, sample_weight])	Compute clustering and transform X to cluster-distance space.
`get_params`([deep])	Get parameters for this estimator.
`partial_fit`(X[, y, sample_weight])	Update the k-means estimate on a single mini-batch X.
`predict`(X[, sample_weight])	Predict the closest cluster each sample in X belongs to.
`score`(X[, y, sample_weight])	Opposite of the value of X on the K-means objective.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Transform X to a cluster-distance space.


fit(X, y=None, sample_weight=None)

Compute the centroids on X by chunking it into mini-batches.

Parameters

X : array-like or sparse matrix of shape (n_samples, n_features)
    Training instances to cluster. The data will be converted to C ordering, which causes a memory copy if the given data is not C-contiguous.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.

New in version 0.20.

Returns

self
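
As a minimal sketch of calling fit with per-sample weights, the snippet below reuses a few points from the class example above. The weight values are made up for illustration, and n_init is passed explicitly only so the call behaves the same across releases.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 0], [4, 4]], dtype=float)
# Hypothetical weights: the last three observations count twice as much.
weights = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])

model = MiniBatchKMeans(n_clusters=2, random_state=0,
                        batch_size=6, n_init=3)
fitted = model.fit(X, sample_weight=weights)

# fit returns the estimator itself, so calls can be chained.
returns_self = fitted is model
centers_shape = model.cluster_centers_.shape  # (n_clusters, n_features)
```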

fit_predict(X, y=None, sample_weight=None)

Compute cluster centers and predict cluster index for each sample.

Convenience method; equivalent to calling fit(X) followed by predict(X).

Parameters

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to transform.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.

Returns

labels : ndarray of shape (n_samples,)
    Index of the cluster each sample belongs to.
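
The stated equivalence with fit(X) followed by predict(X) can be checked roughly as follows. This is a sketch on the toy data from the class example, not a statement about internals; the variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 0], [4, 4],
              [4, 5], [0, 1], [2, 2],
              [3, 2], [5, 5], [1, -1]], dtype=float)

model = MiniBatchKMeans(n_clusters=2, random_state=0,
                        batch_size=6, n_init=3)
labels = model.fit_predict(X)

# Predicting on the training data with the fitted model
# should reproduce the labels returned by fit_predict.
labels_again = model.predict(X)
```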

fit_transform(X, y=None, sample_weight=None)

Compute clustering and transform X to cluster-distance space.

Equivalent to fit(X).transform(X), but more efficiently implemented.

Parameters

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to transform.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.

Returns

X_new : array of shape (n_samples, n_clusters)
    X transformed in the new space.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : mapping of string to any
    Parameter names mapped to their values.

partial_fit(X, y=None, sample_weight=None)

Update the k-means estimate on a single mini-batch X.

Parameters

X : array-like of shape (n_samples, n_features)
    Coordinates of the data points to cluster. X will be copied if it is not C-contiguous.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.

Returns

self
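
A typical use of partial_fit is to feed data that arrives in chunks. The sketch below simulates such a stream with synthetic two-blob data; the data, the chunk count, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
# Synthetic data: two blobs around (0, 0) and (5, 5), shuffled row-wise.
X = np.vstack([rng.normal(0.0, 0.5, size=(60, 2)),
               rng.normal(5.0, 0.5, size=(60, 2))])
rng.shuffle(X)

model = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3)

# Update the centers one chunk at a time, as if the chunks were streamed.
for chunk in np.array_split(X, 6):
    result = model.partial_fit(chunk)

centers = model.cluster_centers_
returns_self = result is model
```

Because each call only sees one chunk, the full dataset never has to be held by the estimator at once.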

predict(X, sample_weight=None)

Predict the closest cluster each sample in X belongs to.

In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

Parameters

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to predict.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.

Returns

labels : ndarray of shape (n_samples,)
    Index of the cluster each sample belongs to.
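
The code-book reading of predict can be illustrated by looking up the nearest center by hand. The following is a sketch using NumPy broadcasting on the toy data from the class example; it mirrors the description above and is not the library's internal implementation.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 0], [4, 4],
              [4, 5], [0, 1], [2, 2],
              [3, 2], [5, 5], [1, -1]], dtype=float)
model = MiniBatchKMeans(n_clusters=2, random_state=0,
                        batch_size=6, n_init=3).fit(X)

new_points = np.array([[0, 0], [4, 4]], dtype=float)
labels = model.predict(new_points)

# Manual lookup: squared distance from each point to each code,
# then the index of the smallest distance per point.
diffs = new_points[:, None, :] - model.cluster_centers_[None, :, :]
nearest = np.argmin((diffs ** 2).sum(axis=2), axis=1)
```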

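score(X, y=None, sample_weight=None)

Opposite of the value of X on the K-means objective, that is, the negative of the sum of squared distances from each sample to its closest cluster center (weighted by sample_weight if given).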

Parameters

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data.
y : Ignored
    Not used, present here for API consistency by convention.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight.

Returns

score : float
    Opposite of the value of X on the K-means objective.
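
To make the sign convention concrete, the sketch below recomputes the score by hand as the negated sum of squared distances to the nearest center, assuming unweighted samples. Data and variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 0], [4, 4],
              [4, 5], [0, 1], [2, 2],
              [3, 2], [5, 5], [1, -1]], dtype=float)
model = MiniBatchKMeans(n_clusters=2, random_state=0,
                        batch_size=6, n_init=3).fit(X)

score = model.score(X)

# Squared distance from every sample to every center: (n_samples, n_clusters).
sq_dists = ((X[:, None, :] - model.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
# K-means objective on X, negated so that higher is better.
manual_score = -sq_dists.min(axis=1).sum()
```

A score closer to zero therefore indicates samples that lie nearer to their assigned centers.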

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params : dict
    Estimator parameters.

Returns

self : object
    Estimator instance.
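
The <component>__<parameter> form can be sketched with a small pipeline. The step names "scale" and "cluster" are arbitrary labels chosen for this example.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by their plain names.
est = MiniBatchKMeans(n_clusters=8)
returned = est.set_params(n_clusters=4)

# Nested object: the step name is prefixed, separated by a double underscore.
pipe = Pipeline([("scale", StandardScaler()),
                 ("cluster", MiniBatchKMeans(n_clusters=8))])
pipe.set_params(cluster__n_clusters=3, cluster__batch_size=50)

nested_params = pipe.named_steps["cluster"].get_params()
```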

transform(X)

Transform X to a cluster-distance space.

In the new space, each dimension is the distance to one of the cluster centers: entry (i, j) of the result is the distance from sample i to cluster center j. Note that even if X is sparse, the array returned by transform will typically be dense.

Parameters

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    New data to transform.

Returns

X_new : ndarray of shape (n_samples, n_clusters)
    X transformed in the new space.
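
As a rough check of what the cluster-distance space contains, the sketch below compares transform against Euclidean distances computed directly with NumPy on the toy data from the class example. Variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 0], [4, 4],
              [4, 5], [0, 1], [2, 2],
              [3, 2], [5, 5], [1, -1]], dtype=float)
model = MiniBatchKMeans(n_clusters=2, random_state=0,
                        batch_size=6, n_init=3).fit(X)

distances = model.transform(X)  # shape (n_samples, n_clusters)

# Euclidean distance from each sample to each center, computed by hand.
manual = np.linalg.norm(
    X[:, None, :] - model.cluster_centers_[None, :, :], axis=2)

# The closest column per row matches the predicted cluster index.
closest = np.argmin(distances, axis=1)
```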

Examples using `sklearn.cluster.MiniBatchKMeans`

Biclustering documents with the Spectral Co-clustering algorithm

Online learning of a dictionary of parts of faces

Compare BIRCH and MiniBatchKMeans

Empirical evaluation of the impact of k-means initialization

Comparison of the K-Means and MiniBatchKMeans clustering algorithms

Comparing different clustering algorithms on toy datasets

Faces dataset decompositions

Clustering text documents using k-means
