

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Compute the F1 score, also known as the balanced F-score or F-measure.

The F1 score can be interpreted as a weighted average of precision and recall (specifically, their harmonic mean). It reaches its best value at 1 and its worst value at 0. Precision and recall contribute equally to the F1 score. The formula for the F1 score is:

F1 = 2 * (precision * recall) / (precision + recall)

In the multi-class and multi-label case, the returned value is, by default, the weighted average of the F1 scores of the individual classes.
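As a minimal sketch of the formula above, the following pure-Python helper counts true positives, false positives, and false negatives for one class and combines them into precision, recall, and F1. The helper name `binary_f1` and the choice to return 0.0 when a denominator is zero are assumptions made for illustration; they are not part of scikit-learn's API.

```python
def binary_f1(y_true, y_pred, pos_label=1):
    """Illustrative F1 for a single positive class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)

    # Assumption: an undefined ratio (zero denominator) is treated as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
score = binary_f1(y_true, y_pred)  # precision = 2/3, recall = 1/2
print(round(score, 4))  # 0.5714
```

Here the score (4/7) lies between recall (1/2) and precision (2/3), and closer to the smaller of the two, as expected from a harmonic mean.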


Parameters:

y_true : array-like or label indicator matrix

Ground truth (correct) target values.

y_pred : array-like or label indicator matrix

Estimated targets as returned by a classifier.

labels : array

Integer array of labels.

pos_label : str or int, 1 by default

If average is not None and the classification target is binary, only this class’s scores will be returned.

average : string, [None, ‘micro’, ‘macro’, ‘samples’, ‘weighted’ (default)]

If None, the scores for each class are returned. Otherwise, unless pos_label is given in binary classification, this determines the type of averaging performed on the data:


‘micro’:

Calculate metrics globally by counting the total true positives, false negatives and false positives.

‘macro’:

Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

‘weighted’:

Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

‘samples’:

Calculate metrics for each instance, and find their average (only meaningful for multilabel classification, where this differs from accuracy_score).

sample_weight : array-like of shape = [n_samples], optional

Sample weights.
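The averaging modes described for the average parameter can be sketched in plain Python as follows. This is an illustration under stated assumptions, not scikit-learn's implementation: the helpers `per_class_counts` and `f1_from_counts` are hypothetical names, sample weights are ignored, and an undefined score is taken to be 0.0.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn)} (hypothetical helper)."""
    counts = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def f1_from_counts(tp, fp, fn):
    # Algebraically equal to 2 * P * R / (P + R).
    # Assumption: the score is 0.0 when the denominator is zero.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
labels = sorted(set(y_true) | set(y_pred))
counts = per_class_counts(y_true, y_pred, labels)

# average=None: one score per label
per_class = [f1_from_counts(*counts[label]) for label in labels]
# support: number of true instances of each label
support = [sum(1 for t in y_true if t == label) for label in labels]

# 'macro': unweighted mean of the per-label scores
macro = sum(per_class) / len(per_class)
# 'weighted': mean of the per-label scores, weighted by support
weighted = sum(f * s for f, s in zip(per_class, support)) / sum(support)
# 'micro': pool the counts over all labels, then compute a single score
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro = f1_from_counts(tp, fp, fn)

print(per_class)               # [0.8, 0.0, 0.0]
print(macro, micro, weighted)  # roughly 0.267, 0.333, 0.267
```

Because every label has the same support (2) in this data, the macro and weighted averages coincide; they diverge once the classes are imbalanced.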


Returns:

f1_score : float or array of float, shape = [n_unique_labels]

F1 score of the positive class in binary classification or weighted average of the F1 scores of each class for the multiclass task.


[R155] Wikipedia entry for the F1-score


>>> from sklearn.metrics import f1_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> f1_score(y_true, y_pred, average='macro')
0.26...
>>> f1_score(y_true, y_pred, average='micro')
0.33...
>>> f1_score(y_true, y_pred, average='weighted')
0.26...
>>> f1_score(y_true, y_pred, average=None)
array([ 0.8,  0. ,  0. ])
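The ‘samples’ option described for the average parameter applies to multilabel targets given as a label indicator matrix. The following is a rough sketch of that idea in plain Python: it computes one F1 score per instance (row) and averages them. The helper name `samples_f1`, the example matrices, and the handling of rows with no labels at all are assumptions for illustration, not scikit-learn's implementation.

```python
def samples_f1(Y_true, Y_pred):
    """Mean of per-instance F1 over a label indicator matrix (hypothetical helper)."""
    scores = []
    for true_row, pred_row in zip(Y_true, Y_pred):
        tp = sum(1 for t, p in zip(true_row, pred_row) if t == 1 and p == 1)
        n_true = sum(true_row)  # labels actually present on this instance
        n_pred = sum(pred_row)  # labels predicted for this instance
        # Per-instance F1 = 2 * tp / (n_true + n_pred).
        # Assumption: a row with no true and no predicted labels scores 0.0.
        denom = n_true + n_pred
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Rows are instances, columns are labels (1 = label present).
Y_true = [[1, 0, 1],
          [0, 1, 0],
          [1, 1, 0]]
Y_pred = [[1, 0, 0],
          [0, 1, 0],
          [0, 1, 1]]

result = samples_f1(Y_true, Y_pred)  # per-row scores: 2/3, 1, 1/2
print(round(result, 4))  # 0.7222
```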
