This is documentation for an old release of Scikit-learn (version 0.24). Try the latest stable release (version 1.6) or development (unstable) versions.

sklearn.metrics.cohen_kappa_score

sklearn.metrics.cohen_kappa_score(y1, y2, *, labels=None, weights=None, sample_weight=None)[source]

Cohen’s kappa: a statistic that measures inter-annotator agreement.

This function computes Cohen’s kappa [1], a score that expresses the level of agreement between two annotators on a classification problem. It is defined as

κ=(pope)/(1pe)

where po is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and pe is the expected agreement when both annotators assign labels randomly. pe is estimated using a per-annotator empirical prior over the class labels [2].

Read more in the User Guide.

Parameters
y1array of shape (n_samples,)

Labels assigned by the first annotator.

y2array of shape (n_samples,)

Labels assigned by the second annotator. The kappa statistic is symmetric, so swapping y1 and y2 doesn’t change the value.

labelsarray-like of shape (n_classes,), default=None

List of labels to index the matrix. This may be used to select a subset of labels. If None, all labels that appear at least once in y1 or y2 are used.

weights{‘linear’, ‘quadratic’}, default=None

Weighting type to calculate the score. None means no weighted; “linear” means linear weighted; “quadratic” means quadratic weighted.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns
kappafloat

The kappa statistic, which is a number between -1 and 1. The maximum value means complete agreement; zero or lower means chance agreement.

References

1

J. Cohen (1960). “A coefficient of agreement for nominal scales”. Educational and Psychological Measurement 20(1):37-46. doi:10.1177/001316446002000104.

2

R. Artstein and M. Poesio (2008). “Inter-coder agreement for computational linguistics”. Computational Linguistics 34(4):555-596.

3

Wikipedia entry for the Cohen’s kappa.