cohen_kappa_score

sklearn.metrics.cohen_kappa_score(y1, y2, *, labels=None, weights=None, sample_weight=None, replace_undefined_by=nan)

Compute Cohen’s kappa: a statistic that measures inter-annotator agreement.

This function computes Cohen’s kappa [1], a score that expresses the level of agreement between two annotators on a classification problem. It is defined as

\[\kappa = (p_o - p_e) / (1 - p_e)\]

where \(p_o\) is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and \(p_e\) is the expected agreement when both annotators assign labels randomly. \(p_e\) is estimated using a per-annotator empirical prior over the class labels [2].
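To make \(p_o\) and \(p_e\) concrete, here is a minimal plain-Python sketch that computes the unweighted statistic directly from the definition above. It uses exact fractions to avoid rounding. It is an illustration, not scikit-learn's implementation. `cohen_kappa` is a hypothetical name, and the sketch ignores `labels`, `weights` and `sample_weight`.

```python
from collections import Counter
from fractions import Fraction


def cohen_kappa(y1, y2):
    """Unweighted Cohen's kappa computed from p_o and p_e (illustrative sketch)."""
    if len(y1) != len(y2) or len(y1) == 0:
        raise ValueError("y1 and y2 must be non-empty and of equal length")
    n = len(y1)

    # p_o: observed agreement, the fraction of samples given the same label.
    p_o = Fraction(sum(a == b for a, b in zip(y1, y2)), n)

    # p_e: chance agreement under each annotator's own label frequencies,
    # i.e. the sum over labels of P1(label) * P2(label).
    counts1, counts2 = Counter(y1), Counter(y2)
    p_e = Fraction(sum(counts1[lab] * counts2[lab] for lab in counts1), n * n)

    # Fraction raises ZeroDivisionError when p_e == 1 (kappa undefined).
    return float((p_o - p_e) / (1 - p_e))


y1 = ["negative", "positive", "negative", "neutral", "positive"]
y2 = ["negative", "positive", "negative", "neutral", "negative"]
print(cohen_kappa(y1, y2))  # 0.6875
```

On the data from the Examples section, \(p_o = 4/5\) and \(p_e = (2 \cdot 3 + 1 \cdot 1 + 2 \cdot 1)/25 = 9/25\). This gives \(\kappa = (11/25)/(16/25) = 11/16 = 0.6875\), the same value reported there.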

Read more in the User Guide.

Parameters:

y1 : array-like of shape (n_samples,)

Labels assigned by the first annotator.

y2 : array-like of shape (n_samples,)

Labels assigned by the second annotator. The kappa statistic is symmetric, so swapping y1 and y2 doesn’t change the value.

labels : array-like of shape (n_classes,), default=None

List of labels used to index the confusion matrix. This may be used to select a subset of labels. If None, all labels that appear at least once in y1 or y2 are used, in sorted order. Note that at least one label in labels must be present in y1, even though this function is otherwise agnostic to the order of y1 and y2.
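As we understand it, a labels subset works through the confusion matrix: a sample pair is counted only if both annotators' labels are in labels. This page does not spell that out, so treat it as an assumption. The hypothetical helper `restrict_to_labels` below sketches only that filtering step, in plain Python.

```python
def restrict_to_labels(y1, y2, labels):
    """Keep only the sample pairs whose two labels both appear in `labels`.

    Hypothetical helper; assumes pairs with any out-of-list label are dropped
    before the confusion matrix is built.
    """
    keep = set(labels)
    pairs = [(a, b) for a, b in zip(y1, y2) if a in keep and b in keep]
    return [a for a, _ in pairs], [b for _, b in pairs]


y1 = ["negative", "positive", "negative", "neutral", "positive"]
y2 = ["negative", "positive", "negative", "neutral", "negative"]
r1, r2 = restrict_to_labels(y1, y2, labels=["negative", "positive"])
print(r1)  # ['negative', 'positive', 'negative', 'positive']
print(r2)  # ['negative', 'positive', 'negative', 'negative']
```

With the Examples data, selecting only "negative" and "positive" drops the single pair in which both annotators chose "neutral". Kappa would then be computed on the four remaining pairs.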

weights : {‘linear’, ‘quadratic’}, default=None

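Weighting type used to calculate the score. None means unweighted, so every disagreement counts equally. “linear” and “quadratic” penalize a disagreement between the labels at positions \(i\) and \(j\) of the label ordering by \(|i - j|\) and \((i - j)^2\) respectively, which suits ordinal labels.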
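The weighting can be made concrete with the standard disagreement-weight formulation \(\kappa_w = 1 - \sum_{ij} w_{ij} O_{ij} / \sum_{ij} w_{ij} E_{ij}\). Here \(O\) is the observed confusion matrix and \(E\) is the matrix expected if the annotators labelled independently. With 0/1 weights this reduces to the unweighted formula above. The code below is a plain-Python sketch of that formulation, not scikit-learn's source. `weighted_kappa` is a hypothetical name, and each label's rank is assumed to be its position in `labels`.

```python
from fractions import Fraction


def weighted_kappa(y1, y2, labels, weights=None):
    """Cohen's kappa with optional disagreement weights (illustrative sketch).

    Assumes every label in y1 and y2 appears in `labels`; a label's rank is
    its position in `labels`.
    """
    k, n = len(labels), len(y1)
    rank = {label: i for i, label in enumerate(labels)}

    # Observed confusion matrix O: rows index y1's label, columns y2's label.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(y1, y2):
        observed[rank[a]][rank[b]] += 1

    # Expected matrix E under independence: outer product of the marginals / n.
    rows = [sum(observed[i]) for i in range(k)]
    cols = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    expected = [[Fraction(rows[i] * cols[j], n) for j in range(k)] for i in range(k)]

    # Disagreement weights w_ij: zero on the diagonal in every scheme.
    def weight(i, j):
        if weights is None:
            return 0 if i == j else 1
        if weights == "linear":
            return abs(i - j)
        if weights == "quadratic":
            return (i - j) ** 2
        raise ValueError(f"unknown weights: {weights!r}")

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagree_exp = sum(weight(i, j) * expected[i][j] for i, j in cells)
    # Raises ZeroDivisionError when the weighted expected disagreement is zero.
    return float(1 - Fraction(disagree_obs) / disagree_exp)


grades1 = [0, 1, 2, 2]
grades2 = [0, 2, 2, 1]
for scheme in (None, "linear", "quadratic"):
    print(scheme, weighted_kappa(grades1, grades2, labels=[0, 1, 2], weights=scheme))
```

In this example the two annotators disagree only on adjacent grades, so the weighted variants give more credit than the unweighted score. The results are 0.2 unweighted, 3/7 (about 0.43) linear and 7/11 (about 0.64) quadratic.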

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

replace_undefined_by : np.nan or float in [-1.0, 1.0], default=np.nan

Sets the return value when the metric is undefined. This can happen when the second annotator assigns no label of interest (as defined by the labels parameter), or when y1 and y2 both use only one and the same label from labels. In these cases, an UndefinedMetricWarning is raised. Can take the following values:

np.nan to return np.nan
a floating point value in the range of [-1.0, 1.0] to return a specific value

Added in version 1.9.
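The fallback can be sketched for the unweighted case. There the statistic is undefined when \(p_e = 1\), which happens when both annotators assign one and the same label to every sample. The function `kappa_with_fallback` below is hypothetical. It ignores `labels`, `weights` and `sample_weight`, covers only that single-shared-label case, and emits a RuntimeWarning as a stand-in for UndefinedMetricWarning.

```python
import math
import warnings
from collections import Counter
from fractions import Fraction


def kappa_with_fallback(y1, y2, replace_undefined_by=math.nan):
    """Unweighted kappa returning `replace_undefined_by` when undefined (sketch)."""
    if not (math.isnan(replace_undefined_by) or -1.0 <= replace_undefined_by <= 1.0):
        raise ValueError("replace_undefined_by must be nan or a float in [-1.0, 1.0]")
    n = len(y1)
    p_o = Fraction(sum(a == b for a, b in zip(y1, y2)), n)
    counts1, counts2 = Counter(y1), Counter(y2)
    p_e = Fraction(sum(counts1[lab] * counts2[lab] for lab in counts1), n * n)

    if p_e == 1:
        # Both annotators used one and the same label throughout, so the
        # denominator 1 - p_e is zero. RuntimeWarning stands in for
        # scikit-learn's UndefinedMetricWarning in this sketch.
        warnings.warn("kappa is undefined; returning replace_undefined_by", RuntimeWarning)
        return replace_undefined_by
    return float((p_o - p_e) / (1 - p_e))


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(kappa_with_fallback(["yes", "yes"], ["yes", "yes"]))       # nan
    print(kappa_with_fallback(["yes", "yes"], ["yes", "yes"], 0.0))  # 0.0
```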

Returns:

kappa : float

The kappa statistic, a number between -1.0 and 1.0. A value of 1.0 means complete agreement, 0.0 means the agreement is no better than expected by chance, and negative values mean the annotators agree less often than chance would predict.

References

[1] J. Cohen (1960). “A coefficient of agreement for nominal scales”. Educational and Psychological Measurement 20(1):37-46.

[2] R. Artstein and M. Poesio (2008). “Inter-coder agreement for computational linguistics”. Computational Linguistics 34(4):555-596.

[3] Wikipedia entry for Cohen’s kappa.

Examples

>>> from sklearn.metrics import cohen_kappa_score
>>> y1 = ["negative", "positive", "negative", "neutral", "positive"]
>>> y2 = ["negative", "positive", "negative", "neutral", "negative"]
>>> cohen_kappa_score(y1, y2)
0.6875