.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ensemble/plot_forest_importances.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_ensemble_plot_forest_importances.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ensemble_plot_forest_importances.py:


==========================================
Feature importances with a forest of trees
==========================================

This example shows the use of a forest of trees to evaluate the importance of
features on an artificial classification task. The blue bars are the feature
importances of the forest, and the error bars show their inter-tree
variability.

As expected, the plot suggests that 3 features are informative, while the
remaining ones are not.

.. GENERATED FROM PYTHON SOURCE LINES 14-17

.. code-block:: default

    print(__doc__)
    import matplotlib.pyplot as plt


.. GENERATED FROM PYTHON SOURCE LINES 18-24

Data generation and model fitting
---------------------------------
We generate a synthetic dataset with only 3 informative features. We
explicitly do not shuffle the dataset, so that the informative features
correspond to the first three columns of X. In addition, we split the
dataset into training and testing subsets.

.. GENERATED FROM PYTHON SOURCE LINES 24-33

.. code-block:: default

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(
        n_samples=1000, n_features=10, n_informative=3, n_redundant=0,
        n_repeated=0, n_classes=2, random_state=0, shuffle=False)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=42)

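
As a quick sanity check on the data generation and split described above, the following sketch (an illustration added here, not part of the original example script) regenerates the same dataset and confirms the shapes of the two subsets and that ``stratify=y`` keeps the class proportions nearly identical in both. It relies on the default ``test_size`` of ``train_test_split``, which holds out 25% of the samples; the names ``train_pos`` and ``test_pos`` are hypothetical helpers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same settings as the example: with shuffle=False the informative
# features stay in the first three columns of X.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=3, n_redundant=0,
    n_repeated=0, n_classes=2, random_state=0, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# The default test_size holds out 25% of the 1000 samples.
print(X_train.shape, X_test.shape)

# Stratification keeps the fraction of class 1 close in both subsets.
train_pos = np.mean(y_train == 1)
test_pos = np.mean(y_test == 1)
print(f"class 1 fraction: train={train_pos:.3f}, test={test_pos:.3f}")
```

If the two fractions differed noticeably, accuracy-based importances computed on the test set would be harder to compare with training-set behaviour.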
.. GENERATED FROM PYTHON SOURCE LINES 34-35

A random forest classifier will be fitted to compute the feature importances.

.. GENERATED FROM PYTHON SOURCE LINES 35-41

.. code-block:: default

    from sklearn.ensemble import RandomForestClassifier

    feature_names = [f'feature {i}' for i in range(X.shape[1])]
    forest = RandomForestClassifier(random_state=0)
    forest.fit(X_train, y_train)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    RandomForestClassifier(random_state=0)


.. GENERATED FROM PYTHON SOURCE LINES 42-52

Feature importance based on mean decrease in impurity
-----------------------------------------------------
Feature importances are provided by the fitted attribute
`feature_importances_`. They are computed as the mean, across the trees of
the forest, of the impurity decrease accumulated within each tree. The
standard deviation across trees, computed below, is used for the error bars.

.. warning::
    Impurity-based feature importances can be misleading for high cardinality
    features (many unique values). See :ref:`permutation_importance` as
    an alternative below.

.. GENERATED FROM PYTHON SOURCE LINES 52-64

.. code-block:: default

    import time
    import numpy as np

    start_time = time.time()
    importances = forest.feature_importances_
    std = np.std([
        tree.feature_importances_ for tree in forest.estimators_], axis=0)
    elapsed_time = time.time() - start_time

    print(f"Elapsed time to compute the importances: "
          f"{elapsed_time:.3f} seconds")


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Elapsed time to compute the importances: 0.008 seconds


.. GENERATED FROM PYTHON SOURCE LINES 65-66

Let's plot the impurity-based importance.

.. GENERATED FROM PYTHON SOURCE LINES 66-75

.. code-block:: default

    import pandas as pd
    forest_importances = pd.Series(importances, index=feature_names)

    fig, ax = plt.subplots()
    forest_importances.plot.bar(yerr=std, ax=ax)
    ax.set_title("Feature importances using MDI")
    ax.set_ylabel("Mean decrease in impurity")
    fig.tight_layout()

.. image:: /auto_examples/ensemble/images/sphx_glr_plot_forest_importances_001.png
    :alt: Feature importances using MDI
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 76-83

We observe that, as expected, the first three features are found to be
important.

Feature importance based on feature permutation
-----------------------------------------------
Permutation feature importance overcomes limitations of the impurity-based
feature importance: it does not have a bias toward high-cardinality
features and can be computed on a left-out test set.

.. GENERATED FROM PYTHON SOURCE LINES 83-94

.. code-block:: default

    from sklearn.inspection import permutation_importance

    start_time = time.time()
    result = permutation_importance(
        forest, X_test, y_test, n_repeats=10, random_state=42, n_jobs=2)
    elapsed_time = time.time() - start_time
    print(f"Elapsed time to compute the importances: "
          f"{elapsed_time:.3f} seconds")

    forest_importances = pd.Series(result.importances_mean, index=feature_names)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Elapsed time to compute the importances: 0.929 seconds


.. GENERATED FROM PYTHON SOURCE LINES 95-99

The computation for full permutation importance is more costly. Each feature
is shuffled ``n_repeats`` times (10 here), and the already fitted model is
re-scored on the shuffled test data to estimate the importance of that
feature; the model itself is not refitted. Please see
:ref:`permutation_importance` for more details. We can now plot the
importance ranking.

.. GENERATED FROM PYTHON SOURCE LINES 99-107

.. code-block:: default

    fig, ax = plt.subplots()
    forest_importances.plot.bar(yerr=result.importances_std, ax=ax)
    ax.set_title("Feature importances using permutation on full model")
    ax.set_ylabel("Mean accuracy decrease")
    fig.tight_layout()
    plt.show()


.. image:: /auto_examples/ensemble/images/sphx_glr_plot_forest_importances_002.png
    :alt: Feature importances using permutation on full model
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 108-111

The same features are detected as most important using both methods, although
their relative importances vary. As the plots show, MDI is less likely than
permutation importance to fully omit a feature.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  1.488 seconds)


.. _sphx_glr_download_auto_examples_ensemble_plot_forest_importances.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: binder-badge

    .. image:: images/binder_badge_logo.svg
      :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/0.24.X?urlpath=lab/tree/notebooks/auto_examples/ensemble/plot_forest_importances.ipynb
      :alt: Launch binder
      :width: 150 px


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_forest_importances.py <plot_forest_importances.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_forest_importances.ipynb <plot_forest_importances.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
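
As a supplement to the section on feature permutation, here is a minimal sketch of what the procedure does conceptually: shuffle one column of the test set at a time, re-score the already fitted forest, and record the drop in accuracy. This is a hand-written illustration, not the implementation behind ``permutation_importance``; the names ``rng``, ``baseline``, ``manual_importances`` and ``mean_drop`` are hypothetical, and because the random permutations differ, the numbers will not match ``result.importances_mean`` exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Recreate the data and model used in the example.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=3, n_redundant=0,
    n_repeated=0, n_classes=2, random_state=0, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)

rng = np.random.default_rng(42)  # illustrative seed (assumption)
n_repeats = 10
baseline = forest.score(X_test, y_test)  # accuracy on unshuffled test data

# manual_importances[j, r] is the accuracy drop when column j is shuffled
# in repeat r. The forest is only re-scored here, never refitted.
manual_importances = np.zeros((X_test.shape[1], n_repeats))
for j in range(X_test.shape[1]):
    for r in range(n_repeats):
        X_permuted = X_test.copy()
        X_permuted[:, j] = rng.permutation(X_permuted[:, j])
        manual_importances[j, r] = baseline - forest.score(X_permuted, y_test)

mean_drop = manual_importances.mean(axis=1)
for j, drop in enumerate(mean_drop):
    print(f"feature {j}: mean accuracy decrease = {drop:.3f}")
```

Shuffling a column breaks its relationship with the target while keeping its marginal distribution, so a large accuracy drop suggests the model relies on that feature.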