sklearn.cross_validation.LabelKFold

class sklearn.cross_validation.LabelKFold(labels, n_folds=3)[source]

K-fold iterator variant with non-overlapping labels.

The same label will not appear in two different folds (the number of distinct labels has to be at least equal to the number of folds).

The folds are approximately balanced in the sense that the number of distinct labels is approximately the same in each fold.

New in version 0.17.

Parameters:

labels : array-like with shape (n_samples, )

Contains a label for each sample. The folds are built so that the same label does not appear in two different folds.

n_folds : int, default=3

Number of folds. Must be at least 2.

See also

LeaveOneLabelOut, domain-specific

Examples

>>> from sklearn.cross_validation import LabelKFold
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> y = np.array([1, 2, 3, 4])
>>> labels = np.array([0, 0, 2, 2])
>>> label_kfold = LabelKFold(labels, n_folds=2)
>>> len(label_kfold)
2
>>> print(label_kfold)
sklearn.cross_validation.LabelKFold(n_labels=4, n_folds=2)
>>> for train_index, test_index in label_kfold:
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
...     print(X_train, X_test, y_train, y_test)
...
TRAIN: [0 1] TEST: [2 3]
[[1 2]
 [3 4]] [[5 6]
 [7 8]] [1 2] [3 4]
TRAIN: [2 3] TEST: [0 1]
[[5 6]
 [7 8]] [[1 2]
 [3 4]] [3 4] [1 2]
.. automethod:: __init__