estimate_bandwidth
- sklearn.cluster.estimate_bandwidth(X, *, quantile=0.3, n_samples=None, random_state=0, n_jobs=None)
Estimate the bandwidth to use with the mean-shift algorithm.
This function takes time at least quadratic in n_samples. For large datasets, it is wise to subsample by setting n_samples. Alternatively, the parameter bandwidth can be set to a small value without estimating it.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input points.
- quantile : float, default=0.3
Should be in the range [0, 1]. A value of 0.5 means that the median of all pairwise distances is used.
- n_samples : int, default=None
The number of samples to use. If not given, all samples are used.
- random_state : int, RandomState instance, default=0
The generator used to randomly select the samples from the input points for bandwidth estimation. Pass an int for reproducible results across calls. See Glossary.
- n_jobs : int, default=None
The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
- Returns:
- bandwidth : float
The bandwidth parameter.
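One way to read the returned value is as the average, over all points, of the distance to each point's k-th nearest neighbor, with k taken as int(n_samples * quantile). This reading is an assumption about the implementation rather than something stated in the parameter descriptions, and the helper name knn_bandwidth_sketch is hypothetical. The minimal sketch below compares that computation with the library call on a small array, without subsampling.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth
from sklearn.neighbors import NearestNeighbors


def knn_bandwidth_sketch(X, quantile=0.3):
    """Hypothetical helper: mean distance to the k-th nearest neighbor."""
    X = np.asarray(X, dtype=float)
    # Assumed rule: k is the quantile fraction of the sample count, at least 1.
    k = max(int(X.shape[0] * quantile), 1)
    nbrs = NearestNeighbors(n_neighbors=k).fit(X)
    # The query points are the fitted points, so each point counts as its
    # own nearest neighbor (distance 0) among the k returned neighbors.
    distances, _ = nbrs.kneighbors(X)
    # Largest of the k distances per point, averaged over all points.
    return float(distances.max(axis=1).mean())


X = np.array([[1, 1], [2, 1], [1, 0], [4, 7], [3, 5], [3, 6]])
sketch_value = knn_bandwidth_sketch(X, quantile=0.5)
library_value = estimate_bandwidth(X, quantile=0.5)
print(sketch_value, library_value)
```

Under this reading, a larger quantile looks further out from each point and so yields a larger bandwidth.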
Examples
>>> import numpy as np
>>> from sklearn.cluster import estimate_bandwidth
>>> X = np.array([[1, 1], [2, 1], [1, 0],
...               [4, 7], [3, 5], [3, 6]])
>>> estimate_bandwidth(X, quantile=0.5)
1.61...
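For larger inputs, the subsampling advice above can be sketched as follows. The synthetic dataset, the quantile of 0.2, and the subset size of 300 are illustrative assumptions, not recommended settings. The estimate is computed on a random subset and then passed to MeanShift as its bandwidth.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic two-blob data, chosen for illustration only.
X, _ = make_blobs(
    n_samples=2000, centers=[[0, 0], [6, 6]], cluster_std=0.6, random_state=0
)

# Estimate on a random subset of 300 points to limit the quadratic cost.
# A fixed random_state makes the subset, and so the estimate, reproducible.
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=300, random_state=0)

# Use the estimate as the kernel bandwidth for mean-shift clustering.
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
labels = ms.fit_predict(X)
n_clusters = len(np.unique(labels))
print(bandwidth, n_clusters)
```

The number of clusters found depends on the estimated bandwidth, so it is worth inspecting the result when changing quantile or n_samples.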
Gallery examples
A demo of the mean-shift clustering algorithm
Comparing different clustering algorithms on toy datasets