.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/tree/plot_tree_regression.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_tree_plot_tree_regression.py>`
        to download the full example code, or to run this example in your browser
        via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_tree_plot_tree_regression.py:

========================
Decision Tree Regression
========================

In this example, we demonstrate the effect of changing the maximum depth of a
decision tree on how it fits the data. We perform this once on a 1D regression
task and once on a multi-output regression task.

.. GENERATED FROM PYTHON SOURCE LINES 9-13

.. code-block:: Python

    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

.. GENERATED FROM PYTHON SOURCE LINES 14-29

Decision Tree on a 1D Regression Task
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we fit a tree on a 1D regression task. A :ref:`decision tree <tree>` is
used to fit a sine curve with additional noisy observations. As a result, it
learns a piecewise constant approximation of the sine curve.

We can see that if the maximum depth of the tree (controlled by the
`max_depth` parameter) is set too high, the tree learns overly fine details of
the training data and fits the noise, i.e. it overfits.

Create a random 1D dataset
--------------------------

.. GENERATED FROM PYTHON SOURCE LINES 29-36

.. code-block:: Python

    import numpy as np

    rng = np.random.RandomState(1)
    # 80 samples drawn uniformly from [0, 5), sorted along the feature axis
    X = np.sort(5 * rng.rand(80, 1), axis=0)
    y = np.sin(X).ravel()
    # add uniform noise in (-1.5, 1.5] to every 5th target
    y[::5] += 3 * (0.5 - rng.rand(16))

.. GENERATED FROM PYTHON SOURCE LINES 37-40

Fit regression model
--------------------

Here we fit two models with different maximum depths.

.. GENERATED FROM PYTHON SOURCE LINES 40-47

.. code-block:: Python

    from sklearn.tree import DecisionTreeRegressor

    regr_1 = DecisionTreeRegressor(max_depth=2)
    regr_2 = DecisionTreeRegressor(max_depth=5)
    regr_1.fit(X, y)
    regr_2.fit(X, y)
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    DecisionTreeRegressor(max_depth=5)
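Before predicting, it can be instructive to inspect what the trees learned. A
decision tree regressor stores one constant value per leaf, which is why its
predictions form a step function. Below is a minimal sketch using
:func:`sklearn.tree.export_text`; the feature name `x` is our own
illustrative label, not something defined by the example above.

.. code-block:: Python

    from sklearn.tree import export_text

    # Print the learned splits of the shallow tree; each leaf line shows
    # the single constant value predicted in that region of the input.
    print(export_text(regr_1, feature_names=["x"]))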
.. GENERATED FROM PYTHON SOURCE LINES 48-51

Predict
-------

Get predictions on the test set.

.. GENERATED FROM PYTHON SOURCE LINES 51-55

.. code-block:: Python

    # a dense grid over the feature range, for plotting smooth prediction curves
    X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
    y_1 = regr_1.predict(X_test)
    y_2 = regr_2.predict(X_test)

.. GENERATED FROM PYTHON SOURCE LINES 56-58

Plot the results
----------------

.. GENERATED FROM PYTHON SOURCE LINES 58-70

.. code-block:: Python

    import matplotlib.pyplot as plt

    plt.figure()
    plt.scatter(X, y, s=20, edgecolor="black", c="darkorange", label="data")
    plt.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
    plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
    plt.xlabel("data")
    plt.ylabel("target")
    plt.title("Decision Tree Regression")
    plt.legend()
    plt.show()

.. image-sg:: /auto_examples/tree/images/sphx_glr_plot_tree_regression_001.png
   :alt: Decision Tree Regression
   :srcset: /auto_examples/tree/images/sphx_glr_plot_tree_regression_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 71-76

As you can see, the model with a depth of 5 (yellow) learns the details of the
training data to the point that it overfits to the noise. On the other hand,
the model with a depth of 2 (blue) learns the major tendencies in the data
well and does not overfit. In real use cases, you need to make sure that the
tree does not overfit the training data, which can be done with
cross-validation; a minimal sketch follows the multi-output fit below.

.. GENERATED FROM PYTHON SOURCE LINES 78-89

Decision Tree Regression with Multi-Output Targets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here a :ref:`decision tree <tree>` is used to simultaneously predict the noisy
`x` and `y` observations of a circle given a single underlying feature. As a
result, it learns a piecewise constant approximation of the circle.

We can see that if the maximum depth of the tree (controlled by the
`max_depth` parameter) is set too high, the tree learns overly fine details of
the training data and fits the noise, i.e. it overfits.

.. GENERATED FROM PYTHON SOURCE LINES 91-93

Create a random dataset
-----------------------

.. GENERATED FROM PYTHON SOURCE LINES 93-98

.. code-block:: Python

    rng = np.random.RandomState(1)
    # 100 samples drawn uniformly from [-100, 100), sorted along the feature axis
    X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
    # two targets per sample: the coordinates pi*sin(x) and pi*cos(x) of a circle
    y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
    # add uniform noise in (-0.5, 0.5] to every 5th row of targets
    y[::5, :] += 0.5 - rng.rand(20, 2)

.. GENERATED FROM PYTHON SOURCE LINES 99-101

Fit regression model
--------------------

.. GENERATED FROM PYTHON SOURCE LINES 101-108

.. code-block:: Python

    regr_1 = DecisionTreeRegressor(max_depth=2)
    regr_2 = DecisionTreeRegressor(max_depth=5)
    regr_3 = DecisionTreeRegressor(max_depth=8)
    regr_1.fit(X, y)
    regr_2.fit(X, y)
    regr_3.fit(X, y)
.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    DecisionTreeRegressor(max_depth=8)
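As in the 1D task, the depth would normally be selected by cross-validation
rather than by eye. The sketch below, referenced earlier, is one minimal way
to do this with :class:`~sklearn.model_selection.GridSearchCV`; the candidate
depths and the `random_state` value are our own illustrative choices, not part
of the original example.

.. code-block:: Python

    from sklearn.model_selection import GridSearchCV

    # Cross-validate a few candidate depths; the default R^2 score of
    # DecisionTreeRegressor also handles multi-output targets.
    param_grid = {"max_depth": [2, 5, 8, None]}
    search = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_)

For trees, cost-complexity pruning via the `ccp_alpha` parameter is a common
alternative to capping the depth.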
.. GENERATED FROM PYTHON SOURCE LINES 109-112

Predict
-------

Get predictions on the test set.

.. GENERATED FROM PYTHON SOURCE LINES 112-117

.. code-block:: Python

    X_test = np.arange(-100.0, 100.0, 0.01)[:, np.newaxis]
    y_1 = regr_1.predict(X_test)
    y_2 = regr_2.predict(X_test)
    y_3 = regr_3.predict(X_test)

.. GENERATED FROM PYTHON SOURCE LINES 118-120

Plot the results
----------------

.. GENERATED FROM PYTHON SOURCE LINES 120-141

.. code-block:: Python

    plt.figure()
    s = 25
    plt.scatter(y[:, 0], y[:, 1], c="yellow", s=s, edgecolor="black", label="data")
    plt.scatter(
        y_1[:, 0],
        y_1[:, 1],
        c="cornflowerblue",
        s=s,
        edgecolor="black",
        label="max_depth=2",
    )
    plt.scatter(y_2[:, 0], y_2[:, 1], c="red", s=s, edgecolor="black", label="max_depth=5")
    plt.scatter(y_3[:, 0], y_3[:, 1], c="blue", s=s, edgecolor="black", label="max_depth=8")
    plt.xlim([-6, 6])
    plt.ylim([-6, 6])
    plt.xlabel("target 1")
    plt.ylabel("target 2")
    plt.title("Multi-output Decision Tree Regression")
    plt.legend(loc="best")
    plt.show()

.. image-sg:: /auto_examples/tree/images/sphx_glr_plot_tree_regression_002.png
   :alt: Multi-output Decision Tree Regression
   :srcset: /auto_examples/tree/images/sphx_glr_plot_tree_regression_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 142-145

As you can see, the higher the value of `max_depth`, the more detail of the
data the model captures. However, the deeper trees also overfit the data and
are influenced by the noise.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.370 seconds)

.. _sphx_glr_download_auto_examples_tree_plot_tree_regression.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/tree/plot_tree_regression.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/tree/plot_tree_regression.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_tree_regression.ipynb <plot_tree_regression.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_tree_regression.py <plot_tree_regression.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_tree_regression.zip <plot_tree_regression.zip>`

.. include:: plot_tree_regression.recommendations

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_