sklearn.metrics.pairwise
.nan_euclidean_distances¶

sklearn.metrics.pairwise.
nan_euclidean_distances
(X, Y=None, squared=False, missing_values=nan, copy=True)[source]¶ Calculate the euclidean distances in the presence of missing values.
Compute the euclidean distance between each pair of samples in X and Y, where Y=X is assumed if Y=None. When calculating the distance between a pair of samples, this formulation ignores feature coordinates with a missing value in either sample and scales up the weight of the remaining coordinates:
dist(x,y) = sqrt(weight * sq. distance from present coordinates) where, weight = Total # of coordinates / # of present coordinates
For example, the distance between
[3, na, na, 6]
and[1, na, 4, 5]
is:\[\sqrt{\frac{4}{2}((31)^2 + (65)^2)}\]If all the coordinates are missing or if there are no common present coordinates then NaN is returned for that pair.
Read more in the User Guide.
New in version 0.22.
 Parameters
 Xarraylike, shape=(n_samples_1, n_features)
 Yarraylike, shape=(n_samples_2, n_features)
 squaredbool, default=False
Return squared Euclidean distances.
 missing_valuesnp.nan or int, default=np.nan
Representation of missing value
 copyboolean, default=True
Make and use a deep copy of X and Y (if Y exists)
 Returns
 distancesarray, shape (n_samples_1, n_samples_2)
See also
paired_distances
distances between pairs of elements of X and Y.
References
John K. Dixon, “Pattern Recognition with Partly Missing Data”, IEEE Transactions on Systems, Man, and Cybernetics, Volume: 9, Issue: 10, pp. 617  621, Oct. 1979. http://ieeexplore.ieee.org/abstract/document/4310090/
Examples
>>> from sklearn.metrics.pairwise import nan_euclidean_distances >>> nan = float("NaN") >>> X = [[0, 1], [1, nan]] >>> nan_euclidean_distances(X, X) # distance between rows of X array([[0. , 1.41421356], [1.41421356, 0. ]])
>>> # get distance to origin >>> nan_euclidean_distances(X, [[0, 0]]) array([[1. ], [1.41421356]])