sklearn.manifold.trustworthiness
sklearn.manifold.trustworthiness(X, X_embedded, n_neighbors=5, metric='euclidean')
Expresses to what extent the local structure is retained.
The trustworthiness is within [0, 1]. It is defined as
\[T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1} \sum_{j \in \mathcal{N}_{i}^{k}} \max(0, (r(i, j) - k))\]
where n is the number of samples and, for each sample i, \(\mathcal{N}_{i}^{k}\) are its k nearest neighbors in the output space, and every sample j is its \(r(i, j)\)-th nearest neighbor in the input space. In other words, any unexpected nearest neighbors in the output space are penalised in proportion to their rank in the input space.
“Neighborhood Preservation in Nonlinear Projection Methods: An Experimental Study” J. Venna, S. Kaski
“Learning a Parametric Embedding by Preserving Local Structure” L.J.P. van der Maaten
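The definition above can be evaluated directly on small data sets. The following is a minimal sketch in pure Python, not the library's implementation: the helper name trustworthiness_sketch is hypothetical, it assumes Euclidean distances in both spaces, and it breaks distance ties by sample index.

```python
import math


def trustworthiness_sketch(X, X_embedded, k=5):
    """Evaluate T(k) from the formula, using Euclidean distances (assumption)."""
    n = len(X)

    def neighbor_order(points, i):
        # All samples except i, sorted from nearest to farthest from sample i.
        # sorted() is stable, so equal distances keep index order.
        others = [j for j in range(n) if j != i]
        return sorted(others, key=lambda j: math.dist(points[i], points[j]))

    penalty = 0
    for i in range(n):
        # r(i, j): 1-based rank of sample j among the input-space neighbors of i.
        rank = {j: r for r, j in enumerate(neighbor_order(X, i), start=1)}
        # N_i^k: the k nearest neighbors of i in the output space.
        for j in neighbor_order(X_embedded, i)[:k]:
            penalty += max(0, rank[j] - k)

    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


# Illustrative data: six one-dimensional samples with distinct pairwise distances.
points = [(0.0,), (1.0,), (3.0,), (7.0,), (15.0,), (31.0,)]
# Comparing the data with itself: every output-space neighbor has rank <= k
# in the input space, so nothing is penalised.
score_identity = trustworthiness_sketch(points, points, k=1)
print(score_identity)
```

The sketch sorts all neighbors of every sample, so it scales poorly with n and is meant only to make the roles of \(\mathcal{N}_{i}^{k}\) and \(r(i, j)\) concrete.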
- Parameters
- X : array, shape (n_samples, n_features) or (n_samples, n_samples)
If the metric is ‘precomputed’, X must be a square distance matrix. Otherwise, it contains one sample per row.
- X_embedded : array, shape (n_samples, n_components)
Embedding of the training data in low-dimensional space.
- n_neighbors : int, optional (default: 5)
Number of neighbors k that will be considered.
- metric : string or callable, optional (default: ‘euclidean’)
Which metric to use for computing pairwise distances between samples from the original input space. If metric is ‘precomputed’, X must be a matrix of pairwise distances or squared distances. Otherwise, see the documentation of argument metric in sklearn.metrics.pairwise.pairwise_distances for a list of available metrics.
- Returns
- trustworthiness : float
Trustworthiness of the low-dimensional embedding.
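A short usage sketch, assuming NumPy is installed and the installed scikit-learn version accepts the metric argument shown in the signature; the random data and the projection onto the first two features are illustrative choices only.

```python
import numpy as np
from sklearn.manifold import trustworthiness
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(0)
X = rng.rand(50, 5)        # 50 samples in a 5-dimensional input space
X_embedded = X[:, :2]      # crude "embedding": keep the first two features

# Comparing the data with itself preserves every neighborhood.
score_same = trustworthiness(X, X, n_neighbors=5)

# Score of the 2-D projection against the original 5-D data.
score_proj = trustworthiness(X, X_embedded, n_neighbors=5)

# Same comparison, passing the input space as a precomputed distance matrix.
D = pairwise_distances(X, metric="euclidean")
score_pre = trustworthiness(D, X_embedded, n_neighbors=5, metric="precomputed")

print(score_same, score_proj, score_pre)
```

Passing metric='precomputed' affects only how X is interpreted; X_embedded is always given as coordinates.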