sklearn.dummy.DummyClassifier
class sklearn.dummy.DummyClassifier(strategy='warn', random_state=None, constant=None)
DummyClassifier is a classifier that makes predictions using simple rules.
This classifier is useful as a simple baseline to compare with other (real) classifiers. Do not use it for real problems.
Read more in the User Guide.
New in version 0.13.
- Parameters
- strategy : str, default="stratified"
Strategy to use to generate predictions.
"stratified": generates predictions by respecting the training set's class distribution.
"most_frequent": always predicts the most frequent label in the training set.
"prior": always predicts the class that maximizes the class prior (like "most_frequent"), and predict_proba returns the class prior.
"uniform": generates predictions uniformly at random.
"constant": always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.
Changed in version 0.22: The default value of strategy will change to "prior" in version 0.24. Starting from version 0.22, a warning will be raised if strategy is not explicitly set.
New in version 0.17: DummyClassifier now supports the "prior" fitting strategy.
- random_state : int, RandomState instance or None, default=None
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- constant : int or str or array-like of shape (n_outputs,)
The explicit constant as predicted by the "constant" strategy. This parameter is useful only for the "constant" strategy.
- Attributes
- classes_ : array or list of array of shape (n_classes,)
Class labels for each output.
- n_classes_ : int or list of int
Number of labels for each output.
- class_prior_ : array or list of array of shape (n_classes,)
Probability of each class for each output.
- n_outputs_ : int
Number of outputs.
- sparse_output_ : bool
True if the array returned from predict is to be in sparse CSC format. It is automatically set to True if the input y is passed in sparse format.
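The fitted attributes listed above can be inspected directly. The following is a minimal sketch, not part of the original reference; the toy arrays and the two-output target y_multi are made up for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))  # placeholder features; their values are not used
y = np.array([0, 1, 1, 1])

single = DummyClassifier(strategy="prior").fit(X, y)
print(single.classes_)      # class labels seen during fit
print(single.n_classes_)    # number of distinct labels
print(single.class_prior_)  # empirical class frequencies
print(single.n_outputs_)    # number of target columns

# Hypothetical two-output target: per-output attributes are then lists,
# with one entry per output column.
y_multi = np.column_stack([y, np.array([2, 2, 3, 3])])
multi = DummyClassifier(strategy="prior").fit(X, y_multi)
print(len(multi.classes_), multi.n_outputs_)
```

With a single output the attributes are plain arrays or scalars; with several outputs they hold one entry per output.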
Examples
>>> import numpy as np
>>> from sklearn.dummy import DummyClassifier
>>> X = np.array([-1, 1, 1, 1])
>>> y = np.array([0, 1, 1, 1])
>>> dummy_clf = DummyClassifier(strategy="most_frequent")
>>> dummy_clf.fit(X, y)
DummyClassifier(strategy='most_frequent')
>>> dummy_clf.predict(X)
array([1, 1, 1, 1])
>>> dummy_clf.score(X, y)
0.75
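To see how the strategies differ as baselines, the sketch below scores each one on a small imbalanced label vector. The data and the results dictionary are illustrative assumptions, not taken from the reference itself.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((10, 1))            # features are ignored by DummyClassifier
y = np.array([0] * 8 + [1] * 2)  # 80% class 0, 20% class 1

results = {}
for strategy in ("most_frequent", "prior", "stratified", "uniform"):
    clf = DummyClassifier(strategy=strategy, random_state=0)
    clf.fit(X, y)
    results[strategy] = clf.score(X, y)  # mean accuracy on the training labels

# The "constant" strategy needs the label to predict; here the minority class.
constant_clf = DummyClassifier(strategy="constant", constant=1).fit(X, y)
results["constant"] = constant_clf.score(X, y)

for name, accuracy in results.items():
    print(f"{name:>13}: {accuracy:.2f}")
```

On imbalanced data the majority-class strategies already reach high accuracy, which is the number a real classifier has to beat. The "stratified" and "uniform" scores depend on the random seed.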
Methods
fit(self, X, y[, sample_weight]): Fit the random classifier.
get_params(self[, deep]): Get parameters for this estimator.
predict(self, X): Perform classification on test vectors X.
predict_log_proba(self, X): Return log probability estimates for the test vectors X.
predict_proba(self, X): Return probability estimates for the test vectors X.
score(self, X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.
set_params(self, **params): Set the parameters of this estimator.
__init__(self, strategy='warn', random_state=None, constant=None)
Initialize self. See help(type(self)) for accurate signature.
fit(self, X, y, sample_weight=None)
Fit the random classifier.
- Parameters
- X : {array-like, object with finite length or shape}
Training data; must have length n_samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
Target values.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns
- self : object
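The reference does not spell out how sample_weight is used. The sketch below assumes that the weights enter the class prior as weighted label counts, so a heavily weighted minority sample can change the predicted class; the arrays are invented for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))  # placeholder features
y = np.array([0, 0, 0, 1])
weights = np.array([1.0, 1.0, 1.0, 5.0])  # the single class-1 sample counts five times

unweighted = DummyClassifier(strategy="most_frequent").fit(X, y)
weighted = DummyClassifier(strategy="most_frequent").fit(X, y, sample_weight=weights)

print(unweighted.class_prior_)  # priors from plain counts
print(weighted.class_prior_)    # priors from weighted counts (assumed behaviour)
print(unweighted.predict(X[:1]), weighted.predict(X[:1]))
```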
get_params(self, deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : mapping of string to any
Parameter names mapped to their values.
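As a brief illustration, the mapping returned by get_params can be fed back to the constructor to build an unfitted copy with the same configuration. The variable names are illustrative.

```python
from sklearn.dummy import DummyClassifier

clf = DummyClassifier(strategy="prior", random_state=0)

params = clf.get_params()
print(params)  # maps each constructor argument name to its current value

# Rebuild an equivalent, unfitted estimator from the same parameters.
rebuilt = DummyClassifier(**params)
```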
predict(self, X)
Perform classification on test vectors X.
- Parameters
- X : {array-like, object with finite length or shape}
Test data; must have length n_samples.
- Returns
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
Predicted target values for X.
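For the random strategies, predictions depend on random_state rather than on the contents of X. The sketch below assumes an integer seed, which should make two identically configured estimators agree; the data are invented.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X_train = np.zeros((6, 1))
y_train = np.array([0, 0, 0, 1, 1, 2])

# Two separately constructed estimators sharing the same integer seed.
clf_a = DummyClassifier(strategy="stratified", random_state=42).fit(X_train, y_train)
clf_b = DummyClassifier(strategy="stratified", random_state=42).fit(X_train, y_train)

X_new = np.ones((5, 1))  # only the number of rows matters here
pred_a = clf_a.predict(X_new)
pred_b = clf_b.predict(X_new)

print(pred_a)
print(np.array_equal(pred_a, pred_b))
```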
predict_log_proba(self, X)
Return log probability estimates for the test vectors X.
- Parameters
- X : {array-like, object with finite length or shape}
Test data; must have length n_samples.
- Returns
- P : array-like or list of array-like of shape (n_samples, n_classes)
Returns the log probability of the sample for each class in the model, where classes are ordered arithmetically, for each output.
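A short sketch of the relationship between the two probability methods, using the "prior" strategy so that no class has zero probability. The toy data are illustrative.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))
y = np.array([0, 1, 1, 1])

clf = DummyClassifier(strategy="prior").fit(X, y)

proba = clf.predict_proba(X[:3])
log_proba = clf.predict_log_proba(X[:3])

# The log estimates are the elementwise natural log of the probabilities.
print(np.allclose(log_proba, np.log(proba)))
```

With strategies that assign zero probability to some classes, the corresponding log values would be negative infinity.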
predict_proba(self, X)
Return probability estimates for the test vectors X.
- Parameters
- X : {array-like, object with finite length or shape}
Test data; must have length n_samples.
- Returns
- P : array-like or list of array-like of shape (n_samples, n_classes)
Returns the probability of the sample for each class in the model, where classes are ordered arithmetically, for each output.
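The "prior" and "most_frequent" strategies predict the same label but differ in predict_proba. The sketch below assumes that "most_frequent" returns a one-hot row for the majority class, while "prior" returns the class prior as stated in the strategy description.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))
y = np.array([0, 1, 1, 1])

prior_clf = DummyClassifier(strategy="prior").fit(X, y)
frequent_clf = DummyClassifier(strategy="most_frequent").fit(X, y)

# Same hard predictions from both strategies.
same_labels = np.array_equal(prior_clf.predict(X), frequent_clf.predict(X))

prior_proba = prior_clf.predict_proba(X[:1])        # class prior
frequent_proba = frequent_clf.predict_proba(X[:1])  # one-hot on the majority class

print(same_labels, prior_proba, frequent_proba)
```

This difference matters for probability-based metrics such as log loss, where "prior" is usually the more sensible baseline.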
score(self, X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric because it requires each sample's entire label set to be predicted correctly.
- Parameters
- X : {array-like, None}
Test samples with shape = (n_samples, n_features) or None. Passing None as test samples gives the same result as passing real test samples, since DummyClassifier operates independently of the sampled observations.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns
- score : float
Mean accuracy of self.predict(X) with respect to y.
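Because X may be None, a baseline accuracy can be computed from the labels alone. A minimal sketch with made-up labels:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))
y = np.array([0, 1, 1, 1])

clf = DummyClassifier(strategy="most_frequent").fit(X, y)

score_with_X = clf.score(X, y)
score_without_X = clf.score(None, y)  # X=None is accepted by score

print(score_with_X, score_without_X)
```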
set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : object
Estimator instance.
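The <component>__<parameter> form can be sketched with a two-step pipeline. The step names "scale" and "clf" are arbitrary labels chosen for this example.

```python
from sklearn.dummy import DummyClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Direct use on a simple estimator.
clf = DummyClassifier(strategy="prior")
clf.set_params(strategy="uniform", random_state=0)

# Nested use: "clf__strategy" targets the strategy parameter of the "clf" step.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", DummyClassifier(strategy="prior")),
])
returned = pipe.set_params(clf__strategy="most_frequent")

print(clf.strategy, pipe.named_steps["clf"].strategy)
```

set_params returns the estimator itself, so calls can be chained with fit.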