sklearn.cluster.mean_shift

sklearn.cluster.mean_shift(X, bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True, max_iter=300, max_iterations=None, n_jobs=1)[source]

Perform mean shift clustering of data using a flat kernel.

Read more in the User Guide.

Parameters:

X : array-like, shape=[n_samples, n_features]

Input data.

bandwidth : float, optional

Kernel bandwidth.

If bandwidth is not given, it is determined using a heuristic based on the median of all pairwise distances. This will take quadratic time in the number of samples. The sklearn.cluster.estimate_bandwidth function can be used to do this more efficiently.

seeds : array-like, shape=[n_seeds, n_features] or None

Point used as initial kernel locations. If None and bin_seeding=False, each data point is used as a seed. If None and bin_seeding=True, see bin_seeding.

bin_seeding : boolean, default=False

If true, initial kernel locations are not locations of all points, but rather the location of the discretized version of points, where points are binned onto a grid whose coarseness corresponds to the bandwidth. Setting this option to True will speed up the algorithm because fewer seeds will be initialized. Ignored if seeds argument is not None.

min_bin_freq : int, default=1

To speed up the algorithm, accept only those bins with at least min_bin_freq points as seeds.

cluster_all : boolean, default True

If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.

max_iter : int, default 300

Maximum number of iterations, per seed point before the clustering operation terminates (for that seed point), if has not converged yet.

n_jobs : int

The number of jobs to use for the computation. This works by computing each of the n_init runs in parallel.

If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.

New in version 0.17: Parallel Execution using n_jobs.

Returns:

cluster_centers : array, shape=[n_clusters, n_features]

Coordinates of cluster centers.

labels : array, shape=[n_samples]

Cluster labels for each point.

Notes

See examples/cluster/plot_meanshift.py for an example.