This is documentation for scikit-learn version 0.23.

sklearn.datasets.fetch_20newsgroups_vectorized

sklearn.datasets.fetch_20newsgroups_vectorized(*, subset='train', remove=(), data_home=None, download_if_missing=True, return_X_y=False, normalize=True)

Load the 20 newsgroups dataset and vectorize it into token counts (classification).

Download it if necessary.

This is a convenience function; the transformation is done using the default settings for sklearn.feature_extraction.text.CountVectorizer. For more advanced usage (stopword filtering, n-gram extraction, etc.), combine fetch_20newsgroups with a custom sklearn.feature_extraction.text.CountVectorizer, sklearn.feature_extraction.text.HashingVectorizer, sklearn.feature_extraction.text.TfidfTransformer or sklearn.feature_extraction.text.TfidfVectorizer.
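As a rough illustration of the "more advanced usage" route, the sketch below chains a custom CountVectorizer (English stop-word filtering, unigrams and bigrams) with a TfidfTransformer. The helper names `build_custom_vectorizer` and `load_newsgroup_texts`, the chosen vectorizer settings, and the three stand-in sentences are assumptions made for this example. The real corpus would come from fetch_20newsgroups, which downloads the data on first use.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline


def build_custom_vectorizer():
    """Counts with stop-word filtering and bigrams, followed by TF-IDF weighting."""
    return make_pipeline(
        CountVectorizer(stop_words="english", ngram_range=(1, 2)),
        TfidfTransformer(),
    )


def load_newsgroup_texts(subset="train"):
    """Hypothetical helper: fetch the raw posts (downloads the data on first call)."""
    from sklearn.datasets import fetch_20newsgroups

    bunch = fetch_20newsgroups(subset=subset)
    return bunch.data, bunch.target


# Tiny stand-in corpus so the sketch runs without downloading anything.
sample_texts = [
    "The graphics card renders the image quickly.",
    "The hockey team won the game last night.",
    "Space shuttle launch delayed by weather.",
]

vectorizer = build_custom_vectorizer()
X_sample = vectorizer.fit_transform(sample_texts)  # sparse matrix, one row per text
print(X_sample.shape)
```

To apply the same pipeline to the actual dataset, pass the texts returned by `load_newsgroup_texts()` to `vectorizer.fit_transform` in place of `sample_texts`.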

The resulting counts are normalized using sklearn.preprocessing.normalize unless normalize is set to False.
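To make the normalization step concrete, the sketch below applies sklearn.preprocessing.normalize to a small hand-made count matrix. It assumes the function's default arguments (L2 norm, applied per row). The matrix values are invented for illustration and are not taken from the dataset.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import normalize

# Toy token-count matrix: 3 documents, 3 vocabulary terms (illustrative values).
counts = csr_matrix(
    np.array([[3, 4, 0], [0, 0, 5], [1, 2, 2]], dtype=np.float64)
)

# Defaults assumed here: norm="l2", axis=1, so each row is scaled to unit length.
normalized = normalize(counts)

# Euclidean length of each row after scaling.
row_norms = np.sqrt(
    np.asarray(normalized.multiply(normalized).sum(axis=1))
).ravel()

print(normalized.toarray())
print(row_norms)
```

Under these defaults, the first row (3, 4, 0) has length 5 and becomes (0.6, 0.8, 0). Passing normalize=False to the fetch function would skip this scaling and leave the raw counts.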

Classes           20
Samples total     18846
Dimensionality    130107
Features          real
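A minimal sketch of calling the function with the keyword arguments from the signature is shown below. The wrapper name `load_newsgroups_xy` is hypothetical. The function is only defined here, not called, because the first call downloads the dataset.

```python
from sklearn.datasets import fetch_20newsgroups_vectorized


def load_newsgroups_xy(subset="train", normalize=True):
    """Return (X, y) for one subset; the data is downloaded on the first call."""
    # return_X_y=True asks for a (data, target) tuple instead of a Bunch object.
    X, y = fetch_20newsgroups_vectorized(
        subset=subset,
        return_X_y=True,
        normalize=normalize,
    )
    return X, y
```

With the other parameters left at their defaults, `load_newsgroups_xy("all")` should return a sparse matrix whose shape matches the totals listed above (18846 samples by 130107 features), together with an array of integer class labels.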

Examples using sklearn.datasets.fetch_20newsgroups_vectorized

Model Complexity Influence

Multiclass sparse logistic regression on 20newgroups

The Johnson-Lindenstrauss bound for embedding with random projections
