sklearn.datasets.fetch_kddcup99(*, subset=None, data_home=None, shuffle=False, random_state=None, percent10=True, download_if_missing=True, return_X_y=False)

Load the kddcup99 dataset (classification).

Download it if necessary.



Classes: 23
Samples total: 4898431
Dimensionality: 41
Features: discrete (int) or continuous (float)

Read more in the User Guide.

New in version 0.18.

subset : {‘SA’, ‘SF’, ‘http’, ‘smtp’}, default=None

Return the corresponding classical subset of kddcup 99. If None, return the entire kddcup 99 dataset.

data_home : str, default=None

Specify another download and cache folder for the datasets. By default, all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

New in version 0.19.

shuffle : bool, default=False

Whether to shuffle the dataset.

random_state : int or RandomState instance, default=None

Determines random number generation for dataset shuffling and for selection of abnormal samples if subset='SA'. Pass an int for reproducible output across multiple function calls. See Glossary.
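The reproducibility promise of an int random_state can be illustrated on toy arrays without downloading anything. The sketch below uses sklearn.utils.shuffle as a stand-in; that fetch_kddcup99 shuffles in the same way internally is an assumption, and X_toy and y_toy are made-up data, not rows of kddcup 99.

```python
import numpy as np
from sklearn.utils import shuffle

# Toy stand-ins (hypothetical data, not taken from kddcup 99).
X_toy = np.arange(10).reshape(5, 2)
y_toy = np.array([b"normal.", b"smurf.", b"normal.", b"neptune.", b"normal."])

# Shuffling twice with the same int seed yields the same permutation.
X_a, y_a = shuffle(X_toy, y_toy, random_state=0)
X_b, y_b = shuffle(X_toy, y_toy, random_state=0)
same_order = np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b)

# Samples and labels are permuted together: the first feature of each
# toy row encodes its original index (row i starts with 2 * i).
rows_stay_paired = all(
    y_toy[row[0] // 2] == label for row, label in zip(X_a, y_a)
)

print(same_order, rows_stay_paired)
```

With the real loader, the equivalent call would be fetch_kddcup99(shuffle=True, random_state=0), which may download the data on first use.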

percent10 : bool, default=True

Whether to load only 10 percent of the data.

download_if_missing : bool, default=True

If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.

return_X_y : bool, default=False

If True, return (data, target) instead of a Bunch object. See below for more information about the data and target objects.

New in version 0.20.
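As a minimal sketch of how data_home and download_if_missing interact, the helper below only reads from a local cache and never triggers a download. The name load_cached_kddcup99 is hypothetical, and catching OSError relies on IOError being an alias of OSError in Python 3.

```python
import tempfile

from sklearn.datasets import fetch_kddcup99


def load_cached_kddcup99(data_home=None):
    """Return the cached Bunch, or None when no local copy exists.

    Hypothetical helper: it never downloads anything.
    """
    try:
        return fetch_kddcup99(data_home=data_home, download_if_missing=False)
    except OSError:  # IOError is an alias of OSError in Python 3
        return None


# A fresh temporary folder holds no cached copy, so the helper is
# expected to fall through to the except branch.
empty_cache = tempfile.mkdtemp()
result = load_cached_kddcup99(data_home=empty_cache)
print(result is None)
```

Calling the helper with data_home=None checks the default ‘~/scikit_learn_data’ folder instead of the temporary one.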


Returns a Bunch, a dictionary-like object with the following attributes.

data : ndarray of shape (494021, 41)

The data matrix to learn.

target : ndarray of shape (494021,)

The class label for each sample.


DESCR : str

The full description of the dataset.

(data, target) : tuple if return_X_y is True

New in version 0.20.
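A short sketch of consuming the (data, target) tuple follows. The helper names summarize and load_and_summarize are hypothetical, and the toy arrays are made-up stand-ins so that the block runs without a download; only load_and_summarize touches the real dataset.

```python
import numpy as np
from sklearn.datasets import fetch_kddcup99


def summarize(X, y):
    """Return the data shape and a {label: count} mapping."""
    labels, counts = np.unique(y, return_counts=True)
    return X.shape, dict(zip(labels.tolist(), counts.tolist()))


def load_and_summarize(**fetch_kwargs):
    """Fetch with return_X_y=True (may download data), then summarize."""
    X, y = fetch_kddcup99(return_X_y=True, **fetch_kwargs)
    return summarize(X, y)


# Toy stand-in with 41 feature columns (hypothetical data).
X_toy = np.zeros((4, 41))
y_toy = np.array([b"normal.", b"smurf.", b"normal.", b"normal."], dtype=object)
toy_shape, toy_counts = summarize(X_toy, y_toy)
print(toy_shape, toy_counts)
```

For the real data, load_and_summarize(subset="http") would return the shape and label counts of that subset, downloading the files first if they are not cached.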