LeavePGroupsOut#

class sklearn.model_selection.LeavePGroupsOut(n_groups)[source]#

Leave P Group(s) Out cross-validator.

Provides train/test indices to split data according to a third-party provided group. This group information can be used to encode arbitrary domain-specific groupings of the samples as integers.

For instance the groups could be the year of collection of the samples and thus allow for cross-validation against time-based splits.

The difference between LeavePGroupsOut and LeaveOneGroupOut is that the former builds the test sets with all the samples assigned to p different values of the groups while the latter uses samples all assigned the same groups.

Read more in the User Guide.

Parameters:
n_groupsint

Number of groups (p) to leave out in the test split.

See also

GroupKFold

K-fold iterator variant with non-overlapping groups.

Examples

>>> import numpy as np
>>> from sklearn.model_selection import LeavePGroupsOut
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> y = np.array([1, 2, 1])
>>> groups = np.array([1, 2, 3])
>>> lpgo = LeavePGroupsOut(n_groups=2)
>>> lpgo.get_n_splits(groups=groups)
3
>>> print(lpgo)
LeavePGroupsOut(n_groups=2)
>>> for i, (train_index, test_index) in enumerate(lpgo.split(X, y, groups)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}, group={groups[train_index]}")
...     print(f"  Test:  index={test_index}, group={groups[test_index]}")
Fold 0:
  Train: index=[2], group=[3]
  Test:  index=[0 1], group=[1 2]
Fold 1:
  Train: index=[1], group=[2]
  Test:  index=[0 2], group=[1 3]
Fold 2:
  Train: index=[0], group=[1]
  Test:  index=[1 2], group=[2 3]
get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_n_splits(X=None, y=None, groups=None)[source]#

Returns the number of splitting iterations in the cross-validator.

Parameters:
Xarray-like of shape (n_samples, n_features), default=None

Always ignored, exists for API compatibility.

yarray-like of shape (n_samples,), default=None

Always ignored, exists for API compatibility.

groupsarray-like of shape (n_samples,), default=None

Group labels for the samples used while splitting the dataset into train/test set. This ‘groups’ parameter must always be specified to calculate the number of splits, though the other parameters can be omitted.

Returns:
n_splitsint

Returns the number of splitting iterations in the cross-validator.

set_split_request(*, groups: bool | None | str = '$UNCHANGED$') LeavePGroupsOut[source]#

Configure whether metadata should be requested to be passed to the split method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to split if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to split.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
groupsstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for groups parameter in split.

Returns:
selfobject

The updated object.

split(X, y=None, groups=None)[source]#

Generate indices to split data into training and test set.

Parameters:
Xarray-like of shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

yarray-like of shape (n_samples,), default=None

The target variable for supervised learning problems.

groupsarray-like of shape (n_samples,)

Group labels for the samples used while splitting the dataset into train/test set.

Yields:
trainndarray

The training set indices for that split.

testndarray

The testing set indices for that split.