sklearn.dummy.DummyClassifier

class sklearn.dummy.DummyClassifier(*, strategy='prior', random_state=None, constant=None)[source]

DummyClassifier is a classifier that makes predictions using simple rules.

This classifier is useful as a simple baseline to compare with other (real) classifiers. Do not use it for real problems.

Read more in the User Guide.

New in version 0.13.

Parameters
strategy{“stratified”, “most_frequent”, “prior”, “uniform”, “constant”}, default=”prior”

Strategy to use to generate predictions.

  • “stratified”: generates predictions by respecting the training set’s class distribution.

  • “most_frequent”: always predicts the most frequent label in the training set.

  • “prior”: always predicts the class that maximizes the class prior (like “most_frequent”) and predict_proba returns the class prior.

  • “uniform”: generates predictions uniformly at random.

  • “constant”: always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class

    Changed in version 0.24: The default value of strategy has changed to “prior” in version 0.24.

random_stateint, RandomState instance or None, default=None

Controls the randomness to generate the predictions when strategy='stratified' or strategy='uniform'. Pass an int for reproducible output across multiple function calls. See Glossary.

constantint or str or array-like of shape (n_outputs,)

The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.

Attributes
classes_ndarray of shape (n_classes,) or list of such arrays

Class labels for each output.

n_classes_int or list of int

Number of label for each output.

class_prior_ndarray of shape (n_classes,) or list of such arrays

Probability of each class for each output.

n_outputs_int

Number of outputs.

sparse_output_bool

True if the array returned from predict is to be in sparse CSC format. Is automatically set to True if the input y is passed in sparse format.

Examples

>>> import numpy as np
>>> from sklearn.dummy import DummyClassifier
>>> X = np.array([-1, 1, 1, 1])
>>> y = np.array([0, 1, 1, 1])
>>> dummy_clf = DummyClassifier(strategy="most_frequent")
>>> dummy_clf.fit(X, y)
DummyClassifier(strategy='most_frequent')
>>> dummy_clf.predict(X)
array([1, 1, 1, 1])
>>> dummy_clf.score(X, y)
0.75

Methods

fit(X, y[, sample_weight])

Fit the random classifier.

get_params([deep])

Get parameters for this estimator.

predict(X)

Perform classification on test vectors X.

predict_log_proba(X)

Return log probability estimates for the test vectors X.

predict_proba(X)

Return probability estimates for the test vectors X.

score(X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.

fit(X, y, sample_weight=None)[source]

Fit the random classifier.

Parameters
Xarray-like of shape (n_samples, n_features)

Training data.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

Target values.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns
selfobject
get_params(deep=True)[source]

Get parameters for this estimator.

Parameters
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
paramsdict

Parameter names mapped to their values.

predict(X)[source]

Perform classification on test vectors X.

Parameters
Xarray-like of shape (n_samples, n_features)

Test data.

Returns
yarray-like of shape (n_samples,) or (n_samples, n_outputs)

Predicted target values for X.

predict_log_proba(X)[source]

Return log probability estimates for the test vectors X.

Parameters
X{array-like, object with finite length or shape}

Training data, requires length = n_samples

Returns
Pndarray of shape (n_samples, n_classes) or list of such arrays

Returns the log probability of the sample for each class in the model, where classes are ordered arithmetically for each output.

predict_proba(X)[source]

Return probability estimates for the test vectors X.

Parameters
Xarray-like of shape (n_samples, n_features)

Test data.

Returns
Pndarray of shape (n_samples, n_classes) or list of such arrays

Returns the probability of the sample for each class in the model, where classes are ordered arithmetically, for each output.

score(X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
XNone or array-like of shape (n_samples, n_features)

Test samples. Passing None as test samples gives the same result as passing real test samples, since DummyClassifier operates independently of the sampled observations.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns
scorefloat

Mean accuracy of self.predict(X) wrt. y.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**paramsdict

Estimator parameters.

Returns
selfestimator instance

Estimator instance.