sklearn.utils.resample

sklearn.utils.resample(*arrays, **options)

Resample arrays or sparse matrices in a consistent way: the same sample indices are drawn for every input.

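With the default settings (replace=True, n_samples=None), this implements one step of the bootstrapping procedure: it draws as many samples as the input contains, with replacement.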
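Because each call is a single bootstrap step, a full bootstrap simply repeats it. The sketch below, assuming scikit-learn and NumPy are installed, estimates the standard error of a sample mean and a 95% percentile interval. The data, the 1000 replicates and the percentile interval are illustrative choices of this sketch, not something `resample` itself provides.

```python
import numpy as np
from sklearn.utils import resample

# Illustrative sample (assumption); any 1-D array would do.
rng = np.random.RandomState(42)
data = rng.normal(loc=5.0, scale=2.0, size=200)

# One shared RandomState keeps the whole loop reproducible while each
# call still draws a different bootstrap sample.
rs = np.random.RandomState(0)
n_boot = 1000
boot_means = np.array(
    [resample(data, random_state=rs).mean() for _ in range(n_boot)]
)

# The spread of the bootstrap means estimates the standard error of the
# mean; the 2.5th/97.5th percentiles give a simple 95% percentile interval.
std_err = boot_means.std(ddof=1)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"SE ~ {std_err:.3f}, 95% CI ~ [{ci_low:.3f}, {ci_high:.3f}]")
```

Passing a RandomState instance rather than a fixed int matters here: a fixed int would make every replicate identical.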

Parameters:

*arrays : sequence of indexable data-structures

Indexable data-structures can be arrays, lists, dataframes or scipy sparse matrices with a consistent first dimension (the same number of samples).

replace : boolean, True by default

If True, resample with replacement. If False, this implements (sliced) random permutations, so each sample is drawn at most once.

n_samples : int, None by default

Number of samples to generate. If left as None, this is automatically set to the first dimension of the arrays. If replace is False, it must not exceed that dimension.

random_state : int, RandomState instance or None, None by default

Controls the shuffling. Pass an int for reproducible output across calls; None uses NumPy's global random state.
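The parameters above can be exercised together in a short sketch, assuming scikit-learn and NumPy are installed. The arrays, seeds and flag names (`oversample_rejected`, `mismatch_rejected`) are hypothetical, chosen only to show each behavior: permutation without replacement, sub- and oversampling via n_samples, the two ValueError cases, and int versus RandomState seeding.

```python
import numpy as np
from sklearn.utils import resample

X = np.arange(12).reshape(6, 2)   # 6 samples, 2 features (illustrative)
y = np.arange(6)

# replace=False: a random permutation, so every sample appears exactly once.
perm = resample(y, replace=False, random_state=0)
assert sorted(perm.tolist()) == y.tolist()

# n_samples: fewer rows (subsampling) or, with replacement, more rows.
small = resample(X, n_samples=3, random_state=0)
big = resample(X, n_samples=10, random_state=0)
print(small.shape, big.shape)  # (3, 2) (10, 2)

# Without replacement you cannot draw more rows than exist.
try:
    resample(X, replace=False, n_samples=10)
    oversample_rejected = False
except ValueError:
    oversample_rejected = True

# All inputs must agree on their first dimension.
try:
    resample(X, y[:4])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# An int seed gives the same draw on every call; a shared RandomState
# advances between calls, so a seeded sequence is reproducible as a whole.
assert np.array_equal(resample(y, random_state=7), resample(y, random_state=7))
rs_a, rs_b = np.random.RandomState(7), np.random.RandomState(7)
seq_a = [resample(y, random_state=rs_a) for _ in range(3)]
seq_b = [resample(y, random_state=rs_b) for _ in range(3)]
assert all(np.array_equal(a, b) for a, b in zip(seq_a, seq_b))
```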

Returns:

resampled_arrays : sequence of indexable data-structures

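Sequence of resampled copies of the collections, one per input. If a single collection is passed, it is returned directly rather than wrapped in a sequence. The original arrays are not impacted.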
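A small check of the return behavior, assuming scikit-learn and NumPy are installed; the arrays mirror the example data used on this page and the other names are illustrative. It shows that several inputs unpack into one result per input, a single input comes back unwrapped, and writing into a result leaves the inputs unchanged.

```python
import numpy as np
from sklearn.utils import resample

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
X_before, y_before = X.copy(), y.copy()

# Several inputs: one resampled collection per input, in the same order.
out = resample(X, y, random_state=0)
X_res, y_res = out

# A single input: the resampled collection itself, not a 1-element list.
y_only = resample(y, random_state=0)
print(type(y_only).__name__)  # ndarray

# The results are copies, so writing into them leaves the inputs intact.
X_res[:] = -1.0
y_res[:] = -1
assert np.array_equal(X, X_before) and np.array_equal(y, y_before)
```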

Examples

It is possible to mix sparse and dense arrays in the same run; sparse inputs are returned in CSR format:

>>> import numpy as np
>>> X = np.array([[1., 0.], [2., 1.], [0., 0.]])
>>> y = np.array([0, 1, 2])

>>> from scipy.sparse import coo_matrix
>>> X_sparse = coo_matrix(X)

>>> from sklearn.utils import resample
>>> X, X_sparse, y = resample(X, X_sparse, y, random_state=0)
>>> X
array([[ 1.,  0.],
       [ 2.,  1.],
       [ 1.,  0.]])

>>> X_sparse
<3x2 sparse matrix of type '<... 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>

>>> X_sparse.toarray()
array([[ 1.,  0.],
       [ 2.,  1.],
       [ 1.,  0.]])

>>> y
array([0, 1, 0])

>>> resample(y, n_samples=2, random_state=0)
array([0, 1])
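Since the drawn indices depend only on the number of input rows, n_samples, replace and random_state, resampling an index array with the same settings recovers which rows were selected. This is a sketch under that reading, assuming scikit-learn and NumPy are installed; `idx` and `oob` are hypothetical names, and the out-of-bag set is simply the rows this bootstrap step never drew.

```python
import numpy as np
from sklearn.utils import resample

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])

# Resampling an index array with the same random_state (and the same
# number of input rows) reproduces the row selection used for X and y.
idx = resample(np.arange(X.shape[0]), random_state=0)
X_res, y_res = resample(X, y, random_state=0)
assert np.array_equal(X_res, X[idx]) and np.array_equal(y_res, y[idx])

# Rows never drawn form the "out-of-bag" set of this bootstrap step.
oob = np.setdiff1d(np.arange(X.shape[0]), idx)
print(idx, oob)  # [0 1 0] [2]
```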