jaccard_score#

sklearn.metrics.jaccard_score(y_true, y_pred, *, labels=None, pos_label=1, average='binary', sample_weight=None, zero_division='warn')[source]#

Jaccard similarity coefficient score.

The Jaccard index [1], or Jaccard similarity coefficient, defined as the size of the intersection divided by the size of the union of two label sets, is used to compare set of predicted labels for a sample to the corresponding set of labels in y_true.

Support beyond binary targets is achieved by treating multiclass and multilabel data as a collection of binary problems, one for each label. For the binary case, setting average='binary' will return the Jaccard similarity coefficient for pos_label. If average is not 'binary', pos_label is ignored and scores for both classes are computed, then averaged or both returned (when average=None). Similarly, for multiclass and multilabel targets, scores for all labels are either returned or averaged depending on the average parameter. Use labels specify the set of labels to calculate the score for.

See also

accuracy_score: Function for calculating the accuracy score.
f1_score: Function for calculating the F1 score.
multilabel_confusion_matrix: Function for computing a confusion matrix for each class or sample.

Notes

jaccard_score may be a poor metric if there are no positives for some samples or classes. Jaccard is undefined if there are no true or predicted labels, and our implementation will return a score of 0 with a warning.

References

[1]

Wikipedia entry for the Jaccard index.

Examples

>>> import numpy as np
>>> from sklearn.metrics import jaccard_score
>>> y_true = np.array([[0, 1, 1],
...                    [1, 1, 0]])
>>> y_pred = np.array([[1, 1, 1],
...                    [1, 0, 0]])

In the binary case:

>>> jaccard_score(y_true[0], y_pred[0])
0.6666

In the 2D comparison case (e.g. image similarity):

>>> jaccard_score(y_true, y_pred, average="micro")
0.6

In the multilabel case:

>>> jaccard_score(y_true, y_pred, average='samples')
0.5833
>>> jaccard_score(y_true, y_pred, average='macro')
0.6666
>>> jaccard_score(y_true, y_pred, average=None)
array([0.5, 0.5, 1. ])

In the multiclass case:

>>> y_pred = [0, 2, 1, 2]
>>> y_true = [0, 1, 2, 2]
>>> jaccard_score(y_true, y_pred, average=None)
array([1. , 0. , 0.33])

Gallery examples#

Multilabel classification using a classifier chain