.. GENERATED FROM PYTHON SOURCE LINES 153-168

One column shows all models evaluated by the same metric. The minimum number on
a column should be obtained when the model is trained and measured with the
same metric. This should always be the case on the training set if the
training converged.

Note that because the target distribution is asymmetric, the expected
conditional mean and conditional median are significantly different. One
therefore cannot use the squared error model to get a good estimate of the
conditional median, nor the converse.

If the target distribution were symmetric and had no outliers (e.g. with
Gaussian noise), then the median estimator and the least squares estimator
would have yielded similar predictions.

We then do the same on the test set.

.. GENERATED FROM PYTHON SOURCE LINES 168-180

.. code-block:: default

    # Score every fitted model on the test set with each pinball loss and the MSE.
    results = []
    for name, gbr in sorted(all_models.items()):
        metrics = {"model": name}
        y_pred = gbr.predict(X_test)
        for alpha in [0.05, 0.5, 0.95]:
            metrics["pbl=%1.2f" % alpha] = mean_pinball_loss(y_test, y_pred, alpha=alpha)
        metrics["MSE"] = mean_squared_error(y_test, y_pred)
        results.append(metrics)

    # One row per model, one column per metric; highlight the minimum per column.
    pd.DataFrame(results).set_index("model").style.apply(highlight_min)

.. raw:: html

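The pinball loss reported in the table above can also be written out by hand.
The following is a minimal NumPy sketch, not scikit-learn's implementation: the
helper name ``pinball_loss_manual`` and the toy arrays are assumptions made for
illustration only. It shows that under-predictions and over-predictions are
weighted asymmetrically by ``alpha``, and that for ``alpha=0.5`` the loss
reduces to half the mean absolute error.

.. code-block:: python

    import numpy as np


    def pinball_loss_manual(y_true, y_pred, alpha):
        """Mean pinball loss written out explicitly (illustrative helper)."""
        diff = np.asarray(y_true) - np.asarray(y_pred)
        # Under-predictions (diff > 0) cost alpha * diff,
        # over-predictions (diff < 0) cost (1 - alpha) * |diff|.
        return np.mean(np.maximum(alpha * diff, (alpha - 1) * diff))


    # Toy values, chosen only for illustration (not the data of this example).
    y_true_toy = np.array([1.0, 2.0, 3.0, 10.0])
    y_pred_toy = np.array([1.5, 2.0, 2.0, 4.0])

    losses = {
        alpha: pinball_loss_manual(y_true_toy, y_pred_toy, alpha)
        for alpha in [0.05, 0.5, 0.95]
    }
    half_mae = 0.5 * np.mean(np.abs(y_true_toy - y_pred_toy))

    for alpha, loss in losses.items():
        print("pbl=%1.2f: %.5f" % (alpha, loss))
    print("half MAE: %.5f" % half_mae)

With these toy values the predictions mostly fall below the targets, so the
loss is much larger for ``alpha=0.95`` than for ``alpha=0.05``. This is why
each quantile model should be scored with its own ``alpha``.
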
.. GENERATED FROM PYTHON SOURCE LINES 181-201

Errors are higher, meaning the models slightly overfitted the data. It still
shows that the best test metric is obtained when the model is trained by
minimizing this same metric.

Note that the conditional median estimator is competitive with the squared
error estimator in terms of MSE on the test set: this can be explained by
the fact that the squared error estimator is very sensitive to large outliers,
which can cause significant overfitting. This can be seen on the right hand
side of the previous plot. The conditional median estimator is biased
(underestimation for this asymmetric noise) but is also naturally robust to
outliers and overfits less.

Calibration of the confidence interval
--------------------------------------

We can also evaluate the ability of the two extreme quantile estimators to
produce a well-calibrated conditional 90%-confidence interval.

To do this we can compute the fraction of observations that fall between the
predictions:

.. GENERATED FROM PYTHON SOURCE LINES 201-211

.. code-block:: default

    def coverage_fraction(y, y_low, y_high):
        # Fraction of observations lying inside [y_low, y_high].
        return np.mean(np.logical_and(y >= y_low, y <= y_high))


    coverage_fraction(
        y_train,
        all_models["q 0.05"].predict(X_train),
        all_models["q 0.95"].predict(X_train),
    )

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.9

.. GENERATED FROM PYTHON SOURCE LINES 212-214

On the training set the calibration is very close to the expected coverage
value for a 90% confidence interval.

.. GENERATED FROM PYTHON SOURCE LINES 214-219

.. code-block:: default

    coverage_fraction(
        y_test, all_models["q 0.05"].predict(X_test), all_models["q 0.95"].predict(X_test)
    )

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.868

.. GENERATED FROM PYTHON SOURCE LINES 220-237

On the test set, the estimated confidence interval is slightly too narrow.
Note, however, that we would need to wrap those metrics in a cross-validation
loop to assess their variability under data resampling.

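A cross-validation loop of the kind mentioned above could look roughly like
the following. This is only a sketch under stated assumptions: the synthetic
data (``X_toy``, ``y_toy``), the number of folds and the small
``n_estimators`` value are illustrative choices and are not the dataset or
settings used in this example.

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import KFold


    def coverage_fraction(y, y_low, y_high):
        # Fraction of observations lying inside [y_low, y_high].
        return np.mean(np.logical_and(y >= y_low, y <= y_high))


    # Synthetic data, assumed here only to keep the sketch self-contained.
    rng = np.random.RandomState(0)
    X_toy = rng.uniform(0, 10, size=(300, 1))
    y_toy = X_toy.ravel() * np.sin(X_toy.ravel()) + rng.normal(scale=1.0, size=300)

    coverages = []
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in cv.split(X_toy):
        # Fit one model per extreme quantile on the training fold.
        fitted = {}
        for alpha in (0.05, 0.95):
            gbr = GradientBoostingRegressor(
                loss="quantile", alpha=alpha, n_estimators=50, random_state=0
            )
            fitted[alpha] = gbr.fit(X_toy[train_idx], y_toy[train_idx])
        # Measure the coverage of the predicted interval on the held-out fold.
        coverages.append(
            coverage_fraction(
                y_toy[test_idx],
                fitted[0.05].predict(X_toy[test_idx]),
                fitted[0.95].predict(X_toy[test_idx]),
            )
        )

    coverages = np.array(coverages)
    print("coverage per fold:", coverages)
    print("mean = %.3f, std = %.3f" % (coverages.mean(), coverages.std()))

The spread of the per-fold values gives a rough idea of how much a single
train/test coverage estimate can move under data resampling.
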

Tuning the hyper-parameters of the quantile regressors
------------------------------------------------------

In the plot above, we observed that the 5th percentile regressor seems to
underfit and could not adapt to the sinusoidal shape of the signal.

The hyper-parameters of the model were approximately hand-tuned for the median
regressor and there is no reason to expect the same hyper-parameters to be
suitable for the 5th percentile regressor.

To confirm this hypothesis, we tune the hyper-parameters of a new regressor of
the 5th percentile by selecting the best model parameters by cross-validation
on the pinball loss with alpha=0.05:

.. GENERATED FROM PYTHON SOURCE LINES 239-269

.. code-block:: default

    from sklearn.experimental import enable_halving_search_cv  # noqa
    from sklearn.model_selection import HalvingRandomSearchCV
    from sklearn.metrics import make_scorer
    from pprint import pprint

    # Candidate values sampled by the randomized search.
    param_grid = dict(
        learning_rate=[0.05, 0.1, 0.2],
        max_depth=[2, 5, 10],
        min_samples_leaf=[1, 5, 10, 20],
        min_samples_split=[5, 10, 20, 30, 50],
    )
    alpha = 0.05
    neg_mean_pinball_loss_05p_scorer = make_scorer(
        mean_pinball_loss,
        alpha=alpha,
        greater_is_better=False,  # maximize the negative loss
    )
    gbr = GradientBoostingRegressor(loss="quantile", alpha=alpha, random_state=0)
    # Successive halving uses the number of boosting iterations as the resource.
    search_05p = HalvingRandomSearchCV(
        gbr,
        param_grid,
        resource="n_estimators",
        max_resources=250,
        min_resources=50,
        scoring=neg_mean_pinball_loss_05p_scorer,
        n_jobs=2,
        random_state=0,
    ).fit(X_train, y_train)
    pprint(search_05p.best_params_)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {'learning_rate': 0.2,
     'max_depth': 2,
     'min_samples_leaf': 20,
     'min_samples_split': 10,
     'n_estimators': 150}

.. GENERATED FROM PYTHON SOURCE LINES 270-278

We observe that the hyper-parameters that were hand-tuned for the median
regressor are in the same range as the hyper-parameters suitable for the 5th
percentile regressor.

Let's now tune the hyper-parameters for the 95th percentile regressor.
We need to redefine the `scoring` metric used to select the best model, along
with adjusting the alpha parameter of the inner gradient boosting estimator
itself:

.. GENERATED FROM PYTHON SOURCE LINES 278-293

.. code-block:: default

    from sklearn.base import clone

    alpha = 0.95
    neg_mean_pinball_loss_95p_scorer = make_scorer(
        mean_pinball_loss,
        alpha=alpha,
        greater_is_better=False,  # maximize the negative loss
    )
    # Reuse the search configuration, changing only the quantile and the scorer.
    search_95p = clone(search_05p).set_params(
        estimator__alpha=alpha,
        scoring=neg_mean_pinball_loss_95p_scorer,
    )
    search_95p.fit(X_train, y_train)
    pprint(search_95p.best_params_)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {'learning_rate': 0.05,
     'max_depth': 2,
     'min_samples_leaf': 5,
     'min_samples_split': 20,
     'n_estimators': 150}

.. GENERATED FROM PYTHON SOURCE LINES 294-302

The result shows that the hyper-parameters for the 95th percentile regressor
identified by the search procedure are roughly in the same range as the
hand-tuned hyper-parameters for the median regressor and the hyper-parameters
identified by the search procedure for the 5th percentile regressor. However,
the hyper-parameter searches did lead to an improved 90% confidence interval,
formed by the predictions of those two tuned quantile regressors. Note that
the prediction of the upper 95th percentile has a much coarser shape than the
prediction of the lower 5th percentile because of the outliers:

.. GENERATED FROM PYTHON SOURCE LINES 302-320

.. code-block:: default

    y_lower = search_05p.predict(xx)
    y_upper = search_95p.predict(xx)

    fig = plt.figure(figsize=(10, 10))
    plt.plot(xx, f(xx), "g:", linewidth=3, label=r"$f(x) = x\,\sin(x)$")
    plt.plot(X_test, y_test, "b.", markersize=10, label="Test observations")
    plt.plot(xx, y_upper, "k-")
    plt.plot(xx, y_lower, "k-")
    plt.fill_between(
        xx.ravel(), y_lower, y_upper, alpha=0.4, label="Predicted 90% interval"
    )
    plt.xlabel("$x$")
    plt.ylabel("$f(x)$")
    plt.ylim(-10, 25)
    plt.legend(loc="upper left")
    plt.title("Prediction with tuned hyper-parameters")
    plt.show()

.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_gradient_boosting_quantile_002.png
   :alt: Prediction with tuned hyper-parameters
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_gradient_boosting_quantile_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 321-326

The plot looks qualitatively better than for the untuned models, especially
for the shape of the lower quantile.

We now quantitatively evaluate the joint-calibration of the pair of
estimators:

.. GENERATED FROM PYTHON SOURCE LINES 326-327

.. code-block:: default

    coverage_fraction(y_train, search_05p.predict(X_train), search_95p.predict(X_train))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.9026666666666666

.. GENERATED FROM PYTHON SOURCE LINES 328-329

.. code-block:: default

    coverage_fraction(y_test, search_05p.predict(X_test), search_95p.predict(X_test))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.796

.. GENERATED FROM PYTHON SOURCE LINES 330-335

The calibration of the tuned pair is sadly not better on the test set: the
estimated confidence interval is still too narrow. Again, we would need to
wrap this study in a cross-validation loop to better assess the variability of
those estimates.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 9.798 seconds)

.. _sphx_glr_download_auto_examples_ensemble_plot_gradient_boosting_quantile.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.3.X?urlpath=lab/tree/notebooks/auto_examples/ensemble/plot_gradient_boosting_quantile.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/ensemble/plot_gradient_boosting_quantile.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_gradient_boosting_quantile.py