TunedThresholdClassifierCV

class sklearn.model_selection.TunedThresholdClassifierCV(estimator, *, scoring='balanced_accuracy', response_method='auto', thresholds=100, cv=None, refit=True, n_jobs=None, random_state=None, store_cv_results=False)

Classifier that post-tunes the decision threshold using cross-validation.

This estimator post-tunes the decision threshold (cut-off point) that is used for converting posterior probability estimates (i.e. output of predict_proba) or decision scores (i.e. output of decision_function) into a class label. The tuning is done by optimizing a binary metric, potentially constrained by another metric.
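The idea can be approximated by hand. The sketch below is a simplified illustration, not the estimator's actual implementation: it assumes a LogisticRegression base model, pools out-of-fold probabilities from cross_val_predict, and scores a grid of candidate cut-offs with balanced_accuracy_score. TunedThresholdClassifierCV itself scores each cross-validation split separately and averages the resulting curves, so its threshold will generally differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)

# Out-of-fold posterior probabilities for the positive class (label 1).
proba = cross_val_predict(
    LogisticRegression(), X, y, cv=5, method="predict_proba"
)[:, 1]

# Candidate cut-off points spanning the observed probability range.
candidate_thresholds = np.linspace(proba.min(), proba.max(), num=100)

# Turn scores into labels at each cut-off and evaluate the binary metric.
scores = np.array(
    [balanced_accuracy_score(y, (proba >= t).astype(int)) for t in candidate_thresholds]
)

best_threshold = candidate_thresholds[scores.argmax()]
default_score = balanced_accuracy_score(y, (proba >= 0.5).astype(int))
print(f"best cut-off: {best_threshold:.3f}, balanced accuracy: {scores.max():.3f}")
print(f"balanced accuracy at the usual 0.5 cut-off: {default_score:.3f}")
```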

See also

sklearn.model_selection.FixedThresholdClassifier: Classifier that uses a constant threshold.
sklearn.calibration.CalibratedClassifierCV: Estimator that calibrates probabilities.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.metrics import classification_report
>>> from sklearn.model_selection import TunedThresholdClassifierCV, train_test_split
>>> X, y = make_classification(
...     n_samples=1_000, weights=[0.9, 0.1], class_sep=0.8, random_state=42
... )
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, stratify=y, random_state=42
... )
>>> classifier = RandomForestClassifier(random_state=0).fit(X_train, y_train)
>>> print(classification_report(y_test, classifier.predict(X_test)))
              precision    recall  f1-score   support

           0       0.94      0.99      0.96       224
           1       0.80      0.46      0.59        26

    accuracy                           0.93       250
   macro avg       0.87      0.72      0.77       250
weighted avg       0.93      0.93      0.92       250

>>> classifier_tuned = TunedThresholdClassifierCV(
...     classifier, scoring="balanced_accuracy"
... ).fit(X_train, y_train)
>>> print(
...     f"Cut-off point found at {classifier_tuned.best_threshold_:.3f}"
... )
Cut-off point found at 0.342
>>> print(classification_report(y_test, classifier_tuned.predict(X_test)))
              precision    recall  f1-score   support

           0       0.96      0.95      0.96       224
           1       0.61      0.65      0.63        26

    accuracy                           0.92       250
   macro avg       0.78      0.80      0.79       250
weighted avg       0.92      0.92      0.92       250

decision_function(X)

Decision function for samples in X using the fitted estimator.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Input samples, where n_samples is the number of samples and n_features is the number of features.

Returns:

decisions : ndarray of shape (n_samples,)
    The decision function computed by the fitted estimator.

fit(X, y, **params)

Fit the classifier.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Training data.
y : array-like of shape (n_samples,)
    Target values.
**params : dict
    Parameters to pass to the fit method of the underlying classifier.

Returns:

self : object
    Returns an instance of self.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRouter
    A MetadataRouter encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict
    Parameter names mapped to their values.

predict(X)

Predict the target of new samples.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The samples, as accepted by estimator.predict.

Returns:

class_labels : ndarray of shape (n_samples,)
    The predicted class.

predict_log_proba(X)

Predict the logarithm of class probabilities for X using the fitted estimator.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Input samples, where n_samples is the number of samples and n_features is the number of features.

Returns:

log_probabilities : ndarray of shape (n_samples, n_classes)
    The logarithm of the class probabilities of the input samples.

predict_proba(X)

Predict class probabilities for X using the fitted estimator.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Input samples, where n_samples is the number of samples and n_features is the number of features.

Returns:

probabilities : ndarray of shape (n_samples, n_classes)
    The class probabilities of the input samples.

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be predicted correctly.

Parameters:

X : array-like of shape (n_samples, n_features)
    Test samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.

Returns:

score : float
    Mean accuracy of self.predict(X) with respect to y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict
    Estimator parameters.

Returns:

self : estimator instance
    Estimator instance.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → TunedThresholdClassifierCV

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
    Metadata routing for sample_weight parameter in score.

Returns:

self : object
    The updated object.

Gallery examples

Post-tuning the decision threshold for cost-sensitive learning

Post-hoc tuning the cut-off point of decision function

Release Highlights for scikit-learn 1.5