sklearn.manifold.trustworthiness

sklearn.manifold.trustworthiness(X, X_embedded, n_neighbors=5, metric='euclidean')

Expresses to what extent the local structure of the input data is retained in the embedding.

The trustworthiness is within [0, 1]. It is defined as

\[T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1} \sum_{j \in \mathcal{N}_{i}^{k}} \max(0, (r(i, j) - k))\]

where for each sample i, \(\mathcal{N}_{i}^{k}\) are its k nearest neighbors in the output space, and every sample j is its \(r(i, j)\)-th nearest neighbor in the input space. In other words, any unexpected nearest neighbors in the output space (those that are not among the k nearest neighbors in the input space) are penalised in proportion to how far their input-space rank \(r(i, j)\) exceeds k.
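To make the definition concrete, the following is a minimal NumPy sketch that transcribes T(k) literally, assuming Euclidean distances in both spaces and no tied distances. The helpers `pairwise_euclidean` and `trustworthiness_reference` are hypothetical illustrations, not part of scikit-learn; they build dense n × n matrices and are only suitable for small n. The random data and the "keep the first two features" embedding are also illustrative assumptions. The result is compared against `sklearn.manifold.trustworthiness`.

```python
import numpy as np
from sklearn.manifold import trustworthiness


def pairwise_euclidean(A):
    """Hypothetical helper: dense (n, n) Euclidean distance matrix of the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def trustworthiness_reference(X, X_embedded, k=5):
    """Hypothetical helper: literal transcription of T(k), O(n^2) memory."""
    n = X.shape[0]
    d_in = pairwise_euclidean(X)
    d_out = pairwise_euclidean(X_embedded)
    # Push each sample to the end of its own ordering so it is never
    # counted as its own neighbor.
    np.fill_diagonal(d_in, np.inf)
    np.fill_diagonal(d_out, np.inf)

    # rank_in[i, j] = r(i, j): 1-based position of j when the samples are
    # sorted by input-space distance to i.
    rows = np.arange(n)[:, None]
    order_in = np.argsort(d_in, axis=1)
    rank_in = np.empty((n, n), dtype=int)
    rank_in[rows, order_in] = np.arange(1, n + 1)

    # neighbors_out[i] holds N_i^k, the k nearest neighbors of i in the output space.
    neighbors_out = np.argsort(d_out, axis=1)[:, :k]

    # Sum of max(0, r(i, j) - k) over all i and all j in N_i^k.
    penalty = np.maximum(0, rank_in[rows, neighbors_out] - k).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


rng = np.random.RandomState(0)
X = rng.randn(60, 5)
X_embedded = X[:, :2]  # crude 2-D "embedding": keep the first two features

ours = trustworthiness_reference(X, X_embedded, k=5)
theirs = trustworthiness(X, X_embedded, n_neighbors=5)
print(ours, theirs)
```

Only the input-space ranks enter the penalty; the output space is used solely to pick the k neighbors of each sample, which is why the sketch never ranks beyond position k in `d_out`.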

References
  • J. Venna, S. Kaski. “Neighborhood Preservation in Nonlinear Projection Methods: An Experimental Study.”

  • L.J.P. van der Maaten. “Learning a Parametric Embedding by Preserving Local Structure.”

Parameters
X : array, shape (n_samples, n_features) or (n_samples, n_samples)

If metric is ‘precomputed’, X must be a square distance matrix. Otherwise, it contains one sample per row.

X_embedded : array, shape (n_samples, n_components)

Embedding of the training data in low-dimensional space.

n_neighbors : int, optional (default: 5)

Number of neighbors k that will be considered.
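The effect of k can be explored by scoring a single embedding at several neighborhood sizes. This is a sketch only: the bundled digits dataset, the 2-D PCA embedding, and the particular k values are illustrative choices, and no specific trend in the resulting scores is implied. Arguments are passed by keyword, which works whether or not the installed version makes `n_neighbors` keyword-only.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# Illustrative data and embedding: 64-D digit images projected to 2-D with PCA.
X, _ = load_digits(return_X_y=True)
X_embedded = PCA(n_components=2, random_state=0).fit_transform(X)

# Score the same embedding at several neighborhood sizes k.
scores = {}
for k in (5, 10, 20):
    scores[k] = trustworthiness(X, X_embedded, n_neighbors=k)
    print(f"n_neighbors={k}: {scores[k]:.3f}")
```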

metric : string, or callable, optional, default ‘euclidean’

Which metric to use for computing pairwise distances between samples from the original input space. If metric is ‘precomputed’, X must be a matrix of pairwise distances or squared distances; because only the ordering of the distances enters the score, both give the same result. Otherwise, see the documentation of the metric argument in sklearn.metrics.pairwise.pairwise_distances for a list of available metrics.
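A sketch comparing the ways of supplying input-space distances, assuming random Gaussian data and a crude "first two features" embedding chosen purely for illustration. The distance matrix `D` is built with `sklearn.metrics.pairwise_distances`, so passing it with `metric='precomputed'` should reproduce the score obtained from the raw features.

```python
import numpy as np
from sklearn.manifold import trustworthiness
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(42)
X = rng.randn(80, 10)
X_embedded = X[:, :2]  # illustrative embedding: keep the first two features

# Raw features: distances are computed internally with the default 'euclidean'.
t_features = trustworthiness(X, X_embedded, n_neighbors=5)

# Square (n_samples, n_samples) distance matrix passed as precomputed.
D = pairwise_distances(X, metric="euclidean")
t_precomputed = trustworthiness(D, X_embedded, n_neighbors=5, metric="precomputed")

# Squaring is monotone on non-negative values, so neighbor ranks are unchanged.
t_squared = trustworthiness(D ** 2, X_embedded, n_neighbors=5, metric="precomputed")

# Any other metric name understood by pairwise_distances, e.g. cosine.
t_cosine = trustworthiness(X, X_embedded, n_neighbors=5, metric="cosine")

print(t_features, t_precomputed, t_squared, t_cosine)
```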

Returns
trustworthiness : float

Trustworthiness of the low-dimensional embedding.
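As a quick sanity check of the returned value, here is a sketch on synthetic data (the random points and both embeddings are illustrative assumptions). An embedding identical to the input incurs no penalty, so it should score at the top of the [0, 1] range, while an embedding unrelated to X should score noticeably lower.

```python
import numpy as np
from sklearn.manifold import trustworthiness

rng = np.random.RandomState(0)
X = rng.randn(50, 3)

# Embedding that reproduces the input: every output-space neighbor is also
# among the k nearest input-space neighbors, so no penalty accrues.
t_identity = trustworthiness(X, X.copy(), n_neighbors=5)

# Embedding unrelated to X: output-space neighbors are essentially random
# with respect to the input space, so many of them are penalised.
t_random = trustworthiness(X, rng.rand(50, 2), n_neighbors=5)

print(t_identity, t_random)
```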