sklearn.datasets.fetch_kddcup99

sklearn.datasets.fetch_kddcup99(*, subset=None, data_home=None, shuffle=False, random_state=None, percent10=True, download_if_missing=True, return_X_y=False, as_frame=False)[source]

Load the kddcup99 dataset (classification).

Download it if necessary.

Classes

23

Samples total

4898431

Dimensionality

41

Features

discrete (int) or continuous (float)

Read more in the User Guide.

New in version 0.18.

Parameters:
subset{‘SA’, ‘SF’, ‘http’, ‘smtp’}, default=None

To return the corresponding classical subsets of kddcup 99. If None, return the entire kddcup 99 dataset.

data_homestr or path-like, default=None

Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

New in version 0.19.

shufflebool, default=False

Whether to shuffle dataset.

random_stateint, RandomState instance or None, default=None

Determines random number generation for dataset shuffling and for selection of abnormal samples if subset='SA'. Pass an int for reproducible output across multiple function calls. See Glossary.

percent10bool, default=True

Whether to load only 10 percent of the data.

download_if_missingbool, default=True

If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.

return_X_ybool, default=False

If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.

New in version 0.20.

as_framebool, default=False

If True, returns a pandas Dataframe for the data and target objects in the Bunch returned object; Bunch return object will also have a frame member.

New in version 0.24.

Returns:
dataBunch

Dictionary-like object, with the following attributes.

data{ndarray, dataframe} of shape (494021, 41)

The data matrix to learn. If as_frame=True, data will be a pandas DataFrame.

target{ndarray, series} of shape (494021,)

The regression target for each sample. If as_frame=True, target will be a pandas Series.

framedataframe of shape (494021, 42)

Only present when as_frame=True. Contains data and target.

DESCRstr

The full description of the dataset.

feature_nameslist

The names of the dataset columns

target_names: list

The names of the target columns

(data, target)tuple if return_X_y is True

A tuple of two ndarray. The first containing a 2D array of shape (n_samples, n_features) with each row representing one sample and each column representing the features. The second ndarray of shape (n_samples,) containing the target samples.

New in version 0.20.

Examples using sklearn.datasets.fetch_kddcup99

Evaluation of outlier detection estimators

Evaluation of outlier detection estimators