sklearn.dummy.DummyClassifier
- class sklearn.dummy.DummyClassifier(*, strategy='prior', random_state=None, constant=None)
DummyClassifier is a classifier that makes predictions using simple rules.
This classifier is useful as a simple baseline to compare with other (real) classifiers. Do not use it for real problems.
Read more in the User Guide.
New in version 0.13.
- Parameters
- strategy : {“stratified”, “most_frequent”, “prior”, “uniform”, “constant”}, default=“prior”
Strategy to use to generate predictions.
“stratified”: generates random predictions by sampling from the training set’s class distribution.
“most_frequent”: always predicts the most frequent label in the training set.
“prior”: always predicts the class that maximizes the class prior (like “most_frequent”), and predict_proba returns the class prior.
“uniform”: generates predictions uniformly at random.
“constant”: always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.
Changed in version 0.24: The default value of strategy changed to “prior”.
- random_state : int, RandomState instance or None, default=None
Controls the randomness to generate the predictions when strategy='stratified' or strategy='uniform'. Pass an int for reproducible output across multiple function calls. See Glossary.
- constant : int or str or array-like of shape (n_outputs,)
The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.
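The differences between the strategies are easiest to see side by side. The following is a minimal sketch on hypothetical toy data (four samples of class 0, two of class 1); the variable names are illustrative and not part of the library.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical toy data: the feature values are ignored by DummyClassifier.
X = np.zeros((6, 1))
y = np.array([0, 0, 0, 0, 1, 1])

most_frequent = DummyClassifier(strategy="most_frequent").fit(X, y)
prior = DummyClassifier(strategy="prior").fit(X, y)
constant = DummyClassifier(strategy="constant", constant=1).fit(X, y)

# "most_frequent" and "prior" both predict the majority class ...
pred_most_frequent = most_frequent.predict(X)
pred_prior = prior.predict(X)

# ... but their probability estimates differ.
proba_most_frequent = most_frequent.predict_proba(X[:1])  # one-hot on class 0
proba_prior = prior.predict_proba(X[:1])  # empirical class frequencies

# "constant" always predicts the user-supplied label.
pred_constant = constant.predict(X)

# "stratified" is random; a fixed integer random_state makes it repeatable.
first = DummyClassifier(strategy="stratified", random_state=0).fit(X, y).predict(X)
second = DummyClassifier(strategy="stratified", random_state=0).fit(X, y).predict(X)
same_draws = np.array_equal(first, second)
```

Here `proba_prior` holds the class frequencies 4/6 and 2/6, while `proba_most_frequent` puts all mass on class 0.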
- Attributes
- classes_ : ndarray of shape (n_classes,) or list of such arrays
Class labels for each output.
- n_classes_ : int or list of int
Number of labels for each output.
- class_prior_ : ndarray of shape (n_classes,) or list of such arrays
Probability of each class for each output.
- n_outputs_ : int
Number of outputs.
- n_features_in_ : int
Number of features seen during fit.
New in version 0.24.
- sparse_output_ : bool
True if the array returned from predict is to be in sparse CSC format. Is automatically set to True if the input y is passed in sparse format.
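As a rough illustration of how these attributes look after fitting, the sketch below fits one single-output and one two-output classifier on hypothetical toy targets. With several outputs, the per-output attributes are returned as lists.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))  # placeholder features; their values are not used

# Single output: attributes are plain arrays / ints.
single = DummyClassifier(strategy="prior").fit(X, np.array([0, 0, 0, 1]))
print(single.classes_, single.n_classes_, single.class_prior_, single.n_outputs_)

# Two outputs: attributes become lists with one entry per output.
y_multi = np.array([[0, 2], [0, 2], [0, 3], [1, 4]])
multi = DummyClassifier(strategy="prior").fit(X, y_multi)
print(multi.classes_, multi.n_classes_, multi.class_prior_, multi.n_outputs_)
```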
Examples
>>> import numpy as np
>>> from sklearn.dummy import DummyClassifier
>>> X = np.array([-1, 1, 1, 1])
>>> y = np.array([0, 1, 1, 1])
>>> dummy_clf = DummyClassifier(strategy="most_frequent")
>>> dummy_clf.fit(X, y)
DummyClassifier(strategy='most_frequent')
>>> dummy_clf.predict(X)
array([1, 1, 1, 1])
>>> dummy_clf.score(X, y)
0.75
Methods
fit(X, y[, sample_weight]): Fit the baseline classifier.
get_params([deep]): Get parameters for this estimator.
predict(X): Perform classification on test vectors X.
predict_log_proba(X): Return log probability estimates for the test vectors X.
predict_proba(X): Return probability estimates for the test vectors X.
score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.
set_params(**params): Set the parameters of this estimator.
- fit(X, y, sample_weight=None)
Fit the baseline classifier.
- Parameters
- X : array-like of shape (n_samples, n_features)
Training data.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
Target values.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns
- self : object
Returns the instance itself.
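Sample weights change the class prior that fit estimates. The sketch below, on hypothetical toy data with illustrative weights, shows a heavily weighted minority sample flipping the predicted class under the “prior” strategy.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))  # placeholder features
y = np.array([0, 0, 0, 1])

unweighted = DummyClassifier(strategy="prior").fit(X, y)

# Illustrative weights: the single class-1 sample counts nine times.
weights = np.array([1.0, 1.0, 1.0, 9.0])
weighted = DummyClassifier(strategy="prior").fit(X, y, sample_weight=weights)

print(unweighted.class_prior_)  # class frequencies without weights
print(weighted.class_prior_)    # weighted class frequencies
print(weighted.predict(X[:1]))  # class with the largest weighted prior
```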
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : dict
Parameter names mapped to their values.
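A short sketch of what get_params returns for this estimator, and one possible use: rebuilding an unfitted estimator with the same configuration. The variable names are illustrative.

```python
from sklearn.dummy import DummyClassifier

clf = DummyClassifier(strategy="constant", constant=1)
params = clf.get_params()
print(params)

# The returned dict can be passed back to the constructor.
rebuilt = DummyClassifier(**params)
```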
- predict(X)
Perform classification on test vectors X.
- Parameters
- X : array-like of shape (n_samples, n_features)
Test data.
- Returns
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
Predicted target values for X.
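The two return shapes can be illustrated as follows. This is a sketch on hypothetical toy targets using the “most_frequent” strategy; the output shape follows the shape of the y passed to fit.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X_train = np.zeros((4, 1))  # placeholder features
X_test = np.zeros((3, 1))

# Single output: predictions have shape (n_samples,).
y_single = np.array([0, 1, 1, 1])
pred_single = DummyClassifier(strategy="most_frequent").fit(X_train, y_single).predict(X_test)

# Two outputs: predictions have shape (n_samples, n_outputs), and the
# most frequent label is chosen independently for each output column.
y_multi = np.array([[0, 1], [0, 1], [1, 1], [0, 0]])
pred_multi = DummyClassifier(strategy="most_frequent").fit(X_train, y_multi).predict(X_test)

print(pred_single.shape, pred_multi.shape)
```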
- predict_log_proba(X)
Return log probability estimates for the test vectors X.
- Parameters
- X : {array-like, object with finite length or shape}
Test data; only its length, n_samples, is required.
- Returns
- P : ndarray of shape (n_samples, n_classes) or list of such arrays
Returns the log probability of the sample for each class in the model, where classes are ordered arithmetically, for each output.
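As a quick check of the relationship between the two probability methods, the sketch below uses the “prior” strategy on hypothetical toy data and compares predict_log_proba with the logarithm of predict_proba.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))  # placeholder features
y = np.array([0, 0, 0, 1])

clf = DummyClassifier(strategy="prior").fit(X, y)
log_proba = clf.predict_log_proba(X)
proba = clf.predict_proba(X)

# Exponentiating the log probabilities recovers the probabilities.
print(np.allclose(np.exp(log_proba), proba))
```

With strategies that assign probability zero to some class (for example “most_frequent”), the corresponding log probability is negative infinity.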
- predict_proba(X)
Return probability estimates for the test vectors X.
- Parameters
- X : array-like of shape (n_samples, n_features)
Test data.
- Returns
- P : ndarray of shape (n_samples, n_classes) or list of such arrays
Returns the probability of the sample for each class in the model, where classes are ordered arithmetically, for each output.
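The probability estimates depend on the chosen strategy. A minimal sketch on hypothetical two-class toy data, covering the “uniform”, “constant”, and “stratified” strategies:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))  # placeholder features
y = np.array([0, 0, 0, 1])

# "uniform": every class gets the same probability.
uniform_proba = DummyClassifier(strategy="uniform").fit(X, y).predict_proba(X)

# "constant": all probability mass on the user-supplied label.
constant_proba = (
    DummyClassifier(strategy="constant", constant=1).fit(X, y).predict_proba(X)
)

# "stratified": each row is a random one-hot draw from the class prior.
stratified_proba = (
    DummyClassifier(strategy="stratified", random_state=0).fit(X, y).predict_proba(X)
)

print(uniform_proba[0], constant_proba[0], stratified_proba.sum(axis=1))
```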
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted.
- Parameters
- X : None or array-like of shape (n_samples, n_features)
Test samples. Passing None as test samples gives the same result as passing real test samples, since DummyClassifier operates independently of the sampled observations.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns
- score : float
Mean accuracy of self.predict(X) with respect to y.
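Because the features are ignored, the baseline accuracy can be computed from the labels alone. The sketch below reuses the labels from the Examples section and compares scoring with real features against scoring with X=None.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((4, 1))  # placeholder features
y = np.array([0, 1, 1, 1])

clf = DummyClassifier(strategy="most_frequent").fit(X, y)

score_with_X = clf.score(X, y)
score_without_X = clf.score(None, y)  # X=None is accepted by score
print(score_with_X, score_without_X)
```

Both calls give 0.75 here, the fraction of samples that belong to the majority class. Any real classifier evaluated on the same labels should beat this number to be considered useful.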
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : estimator instance
Estimator instance.
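The <component>__<parameter> form can be sketched with a small pipeline. The step names "scale" and "clf" below are arbitrary choices for illustration, not names defined by the library.

```python
from sklearn.dummy import DummyClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by name.
clf = DummyClassifier().set_params(strategy="uniform", random_state=0)

# Nested object: prefix the parameter with the step name and "__".
pipe = Pipeline([("scale", StandardScaler()), ("clf", DummyClassifier())])
pipe.set_params(clf__strategy="constant", clf__constant=1)

print(clf.get_params())
print(pipe.named_steps["clf"].get_params())
```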