.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/model_selection/plot_likelihood_ratios.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end <sphx_glr_download_auto_examples_model_selection_plot_likelihood_ratios.py>` to download the full example code or to run this example in your browser via JupyterLite or Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_model_selection_plot_likelihood_ratios.py: ============================================================= Class Likelihood Ratios to measure classification performance ============================================================= This example demonstrates the :func:`~sklearn.metrics.class_likelihood_ratios` function, which computes the positive and negative likelihood ratios (`LR+`, `LR-`) to assess the predictive power of a binary classifier. As we will see, these metrics are independent of the proportion between classes in the test set, which makes them very useful when the available data for a study has a different class proportion than the target application. A typical use is a case-control study in medicine, which has nearly balanced classes while the general population has large class imbalance. In such application, the pre-test probability of an individual having the target condition can be chosen to be the prevalence, i.e. the proportion of a particular population found to be affected by a medical condition. The post-test probabilities represent then the probability that the condition is truly present given a positive test result. In this example we first discuss the link between pre-test and post-test odds given by the :ref:`class_likelihood_ratios`. Then we evaluate their behavior in some controlled scenarios. In the last section we plot them as a function of the prevalence of the positive class. .. GENERATED FROM PYTHON SOURCE LINES 27-30 .. code-block:: default # Authors: Arturo Amor <david-arturo.amor-quiroz@inria.fr> # Olivier Grisel <olivier.grisel@ensta.org> .. GENERATED FROM PYTHON SOURCE LINES 31-38 Pre-test vs. post-test analysis =============================== Suppose we have a population of subjects with physiological measurements `X` that can hopefully serve as indirect bio-markers of the disease and actual disease indicators `y` (ground truth). Most of the people in the population do not carry the disease but a minority (in this case around 10%) does: .. GENERATED FROM PYTHON SOURCE LINES 38-44 .. code-block:: default from sklearn.datasets import make_classification X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0) print(f"Percentage of people carrying the disease: {100*y.mean():.2f}%") .. rst-class:: sphx-glr-script-out .. code-block:: none Percentage of people carrying the disease: 10.37% .. GENERATED FROM PYTHON SOURCE LINES 45-48 A machine learning model is built to diagnose if a person with some given physiological measurements is likely to carry the disease of interest. To evaluate the model, we need to assess its performance on a held-out test set: .. GENERATED FROM PYTHON SOURCE LINES 48-53 .. code-block:: default from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0) .. GENERATED FROM PYTHON SOURCE LINES 54-57 Then we can fit our diagnosis model and compute the positive likelihood ratio to evaluate the usefulness of this classifier as a disease diagnosis tool: .. GENERATED FROM PYTHON SOURCE LINES 57-66 .. code-block:: default from sklearn.linear_model import LogisticRegression from sklearn.metrics import class_likelihood_ratios estimator = LogisticRegression().fit(X_train, y_train) y_pred = estimator.predict(X_test) pos_LR, neg_LR = class_likelihood_ratios(y_test, y_pred) print(f"LR+: {pos_LR:.3f}") .. rst-class:: sphx-glr-script-out .. code-block:: none LR+: 12.617 .. GENERATED FROM PYTHON SOURCE LINES 67-77 Since the positive class likelihood ratio is much larger than 1.0, it means that the machine learning-based diagnosis tool is useful: the post-test odds that the condition is truly present given a positive test result are more than 12 times larger than the pre-test odds. Cross-validation of likelihood ratios ===================================== We assess the variability of the measurements for the class likelihood ratios in some particular cases. .. GENERATED FROM PYTHON SOURCE LINES 77-97 .. code-block:: default import pandas as pd def scoring(estimator, X, y): y_pred = estimator.predict(X) pos_lr, neg_lr = class_likelihood_ratios(y, y_pred, raise_warning=False) return {"positive_likelihood_ratio": pos_lr, "negative_likelihood_ratio": neg_lr} def extract_score(cv_results): lr = pd.DataFrame( { "positive": cv_results["test_positive_likelihood_ratio"], "negative": cv_results["test_negative_likelihood_ratio"], } ) return lr.aggregate(["mean", "std"]) .. GENERATED FROM PYTHON SOURCE LINES 98-100 We first validate the :class:`~sklearn.linear_model.LogisticRegression` model with default hyperparameters as used in the previous section. .. GENERATED FROM PYTHON SOURCE LINES 100-106 .. code-block:: default from sklearn.model_selection import cross_validate estimator = LogisticRegression() extract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10)) .. raw:: html <div class="output_subarea output_html rendered_html output_result"> <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>positive</th> <th>negative</th> </tr> </thead> <tbody> <tr> <th>mean</th> <td>16.718894</td> <td>0.724619</td> </tr> <tr> <th>std</th> <td>4.321091</td> <td>0.054054</td> </tr> </tbody> </table> </div> </div> <br /> <br /> .. GENERATED FROM PYTHON SOURCE LINES 107-113 We confirm that the model is useful: the post-test odds are between 12 and 20 times larger than the pre-test odds. On the contrary, let's consider a dummy model that will output random predictions with similar odds as the average disease prevalence in the training set: .. GENERATED FROM PYTHON SOURCE LINES 113-119 .. code-block:: default from sklearn.dummy import DummyClassifier estimator = DummyClassifier(strategy="stratified", random_state=1234) extract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10)) .. raw:: html <div class="output_subarea output_html rendered_html output_result"> <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>positive</th> <th>negative</th> </tr> </thead> <tbody> <tr> <th>mean</th> <td>1.108843</td> <td>0.986989</td> </tr> <tr> <th>std</th> <td>0.268147</td> <td>0.034278</td> </tr> </tbody> </table> </div> </div> <br /> <br /> .. GENERATED FROM PYTHON SOURCE LINES 120-125 Here both class likelihood ratios are compatible with 1.0 which makes this classifier useless as a diagnostic tool to improve disease detection. Another option for the dummy model is to always predict the most frequent class, which in this case is "no-disease". .. GENERATED FROM PYTHON SOURCE LINES 125-129 .. code-block:: default estimator = DummyClassifier(strategy="most_frequent") extract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10)) .. raw:: html <div class="output_subarea output_html rendered_html output_result"> <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>positive</th> <th>negative</th> </tr> </thead> <tbody> <tr> <th>mean</th> <td>NaN</td> <td>1.0</td> </tr> <tr> <th>std</th> <td>NaN</td> <td>0.0</td> </tr> </tbody> </table> </div> </div> <br /> <br /> .. GENERATED FROM PYTHON SOURCE LINES 130-145 The absence of positive predictions means there will be no true positives nor false positives, leading to an undefined `LR+` that by no means should be interpreted as an infinite `LR+` (the classifier perfectly identifying positive cases). In such situation the :func:`~sklearn.metrics.class_likelihood_ratios` function returns `nan` and raises a warning by default. Indeed, the value of `LR-` helps us discard this model. A similar scenario may arise when cross-validating highly imbalanced data with few samples: some folds will have no samples with the disease and therefore they will output no true positives nor false negatives when used for testing. Mathematically this leads to an infinite `LR+`, which should also not be interpreted as the model perfectly identifying positive cases. Such event leads to a higher variance of the estimated likelihood ratios, but can still be interpreted as an increment of the post-test odds of having the condition. .. GENERATED FROM PYTHON SOURCE LINES 145-150 .. code-block:: default estimator = LogisticRegression() X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0) extract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10)) .. raw:: html <div class="output_subarea output_html rendered_html output_result"> <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>positive</th> <th>negative</th> </tr> </thead> <tbody> <tr> <th>mean</th> <td>17.8000</td> <td>0.373333</td> </tr> <tr> <th>std</th> <td>8.5557</td> <td>0.235430</td> </tr> </tbody> </table> </div> </div> <br /> <br /> .. GENERATED FROM PYTHON SOURCE LINES 151-168 Invariance with respect to prevalence ===================================== The likelihood ratios are independent of the disease prevalence and can be extrapolated between populations regardless of any possible class imbalance, **as long as the same model is applied to all of them**. Notice that in the plots below **the decision boundary is constant** (see :ref:`sphx_glr_auto_examples_svm_plot_separating_hyperplane_unbalanced.py` for a study of the boundary decision for unbalanced classes). Here we train a :class:`~sklearn.linear_model.LogisticRegression` base model on a case-control study with a prevalence of 50%. It is then evaluated over populations with varying prevalence. We use the :func:`~sklearn.datasets.make_classification` function to ensure the data-generating process is always the same as shown in the plots below. The label `1` corresponds to the positive class "disease", whereas the label `0` stands for "no-disease". .. GENERATED FROM PYTHON SOURCE LINES 168-194 .. code-block:: default from collections import defaultdict import matplotlib.pyplot as plt import numpy as np from sklearn.inspection import DecisionBoundaryDisplay populations = defaultdict(list) common_params = { "n_samples": 10_000, "n_features": 2, "n_informative": 2, "n_redundant": 0, "random_state": 0, } weights = np.linspace(0.1, 0.8, 6) weights = weights[::-1] # fit and evaluate base model on balanced classes X, y = make_classification(**common_params, weights=[0.5, 0.5]) estimator = LogisticRegression().fit(X, y) lr_base = extract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10)) pos_lr_base, pos_lr_base_std = lr_base["positive"].values neg_lr_base, neg_lr_base_std = lr_base["negative"].values .. GENERATED FROM PYTHON SOURCE LINES 195-198 We will now show the decision boundary for each level of prevalence. Note that we only plot a subset of the original data to better assess the linear model decision boundary. .. GENERATED FROM PYTHON SOURCE LINES 198-228 .. code-block:: default fig, axs = plt.subplots(nrows=3, ncols=2, figsize=(15, 12)) for ax, (n, weight) in zip(axs.ravel(), enumerate(weights)): X, y = make_classification( **common_params, weights=[weight, 1 - weight], ) prevalence = y.mean() populations["prevalence"].append(prevalence) populations["X"].append(X) populations["y"].append(y) # down-sample for plotting rng = np.random.RandomState(1) plot_indices = rng.choice(np.arange(X.shape[0]), size=500, replace=True) X_plot, y_plot = X[plot_indices], y[plot_indices] # plot fixed decision boundary of base model with varying prevalence disp = DecisionBoundaryDisplay.from_estimator( estimator, X_plot, response_method="predict", alpha=0.5, ax=ax, ) scatter = disp.ax_.scatter(X_plot[:, 0], X_plot[:, 1], c=y_plot, edgecolor="k") disp.ax_.set_title(f"prevalence = {y_plot.mean():.2f}") disp.ax_.legend(*scatter.legend_elements()) .. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_likelihood_ratios_001.png :alt: prevalence = 0.22, prevalence = 0.34, prevalence = 0.45, prevalence = 0.60, prevalence = 0.76, prevalence = 0.88 :srcset: /auto_examples/model_selection/images/sphx_glr_plot_likelihood_ratios_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 229-230 We define a function for bootstrapping. .. GENERATED FROM PYTHON SOURCE LINES 230-245 .. code-block:: default def scoring_on_bootstrap(estimator, X, y, rng, n_bootstrap=100): results_for_prevalence = defaultdict(list) for _ in range(n_bootstrap): bootstrap_indices = rng.choice( np.arange(X.shape[0]), size=X.shape[0], replace=True ) for key, value in scoring( estimator, X[bootstrap_indices], y[bootstrap_indices] ).items(): results_for_prevalence[key].append(value) return pd.DataFrame(results_for_prevalence) .. GENERATED FROM PYTHON SOURCE LINES 246-247 We score the base model for each prevalence using bootstrapping. .. GENERATED FROM PYTHON SOURCE LINES 247-267 .. code-block:: default results = defaultdict(list) n_bootstrap = 100 rng = np.random.default_rng(seed=0) for prevalence, X, y in zip( populations["prevalence"], populations["X"], populations["y"] ): results_for_prevalence = scoring_on_bootstrap( estimator, X, y, rng, n_bootstrap=n_bootstrap ) results["prevalence"].append(prevalence) results["metrics"].append( results_for_prevalence.aggregate(["mean", "std"]).unstack() ) results = pd.DataFrame(results["metrics"], index=results["prevalence"]) results.index.name = "prevalence" results .. raw:: html <div class="output_subarea output_html rendered_html output_result"> <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead tr th { text-align: left; } .dataframe thead tr:last-of-type th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr> <th></th> <th colspan="2" halign="left">positive_likelihood_ratio</th> <th colspan="2" halign="left">negative_likelihood_ratio</th> </tr> <tr> <th></th> <th>mean</th> <th>std</th> <th>mean</th> <th>std</th> </tr> <tr> <th>prevalence</th> <th></th> <th></th> <th></th> <th></th> </tr> </thead> <tbody> <tr> <th>0.2039</th> <td>4.507943</td> <td>0.113516</td> <td>0.207667</td> <td>0.009778</td> </tr> <tr> <th>0.3419</th> <td>4.445329</td> <td>0.125197</td> <td>0.198280</td> <td>0.008907</td> </tr> <tr> <th>0.4809</th> <td>4.422287</td> <td>0.123864</td> <td>0.192630</td> <td>0.006340</td> </tr> <tr> <th>0.6196</th> <td>4.410507</td> <td>0.163975</td> <td>0.193761</td> <td>0.005864</td> </tr> <tr> <th>0.7578</th> <td>4.335398</td> <td>0.175224</td> <td>0.189120</td> <td>0.005820</td> </tr> <tr> <th>0.8963</th> <td>4.198284</td> <td>0.238943</td> <td>0.185496</td> <td>0.005020</td> </tr> </tbody> </table> </div> </div> <br /> <br /> .. GENERATED FROM PYTHON SOURCE LINES 268-271 In the plots below we observe that the class likelihood ratios re-computed with different prevalences are indeed constant within one standard deviation of those computed with on balanced classes. .. GENERATED FROM PYTHON SOURCE LINES 271-326 .. code-block:: default fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(15, 6)) results["positive_likelihood_ratio"]["mean"].plot( ax=ax1, color="r", label="extrapolation through populations" ) ax1.axhline(y=pos_lr_base + pos_lr_base_std, color="r", linestyle="--") ax1.axhline( y=pos_lr_base - pos_lr_base_std, color="r", linestyle="--", label="base model confidence band", ) ax1.fill_between( results.index, results["positive_likelihood_ratio"]["mean"] - results["positive_likelihood_ratio"]["std"], results["positive_likelihood_ratio"]["mean"] + results["positive_likelihood_ratio"]["std"], color="r", alpha=0.3, ) ax1.set( title="Positive likelihood ratio", ylabel="LR+", ylim=[0, 5], ) ax1.legend(loc="lower right") ax2 = results["negative_likelihood_ratio"]["mean"].plot( ax=ax2, color="b", label="extrapolation through populations" ) ax2.axhline(y=neg_lr_base + neg_lr_base_std, color="b", linestyle="--") ax2.axhline( y=neg_lr_base - neg_lr_base_std, color="b", linestyle="--", label="base model confidence band", ) ax2.fill_between( results.index, results["negative_likelihood_ratio"]["mean"] - results["negative_likelihood_ratio"]["std"], results["negative_likelihood_ratio"]["mean"] + results["negative_likelihood_ratio"]["std"], color="b", alpha=0.3, ) ax2.set( title="Negative likelihood ratio", ylabel="LR-", ylim=[0, 0.5], ) ax2.legend(loc="lower right") plt.show() .. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_likelihood_ratios_002.png :alt: Positive likelihood ratio, Negative likelihood ratio :srcset: /auto_examples/model_selection/images/sphx_glr_plot_likelihood_ratios_002.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 2.151 seconds) .. _sphx_glr_download_auto_examples_model_selection_plot_likelihood_ratios.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.3.X?urlpath=lab/tree/notebooks/auto_examples/model_selection/plot_likelihood_ratios.ipynb :alt: Launch binder :width: 150 px .. container:: lite-badge .. image:: images/jupyterlite_badge_logo.svg :target: ../../lite/lab/?path=auto_examples/model_selection/plot_likelihood_ratios.ipynb :alt: Launch JupyterLite :width: 150 px .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_likelihood_ratios.py <plot_likelihood_ratios.py>` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_likelihood_ratios.ipynb <plot_likelihood_ratios.ipynb>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_