.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/ensemble/plot_isolation_forest.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end <sphx_glr_download_auto_examples_ensemble_plot_isolation_forest.py>` to download the full example code or to run this example in your browser via JupyterLite or Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_ensemble_plot_isolation_forest.py: ======================= IsolationForest example ======================= An example using :class:`~sklearn.ensemble.IsolationForest` for anomaly detection. The :ref:`isolation_forest` is an ensemble of "Isolation Trees" that "isolate" observations by recursive random partitioning, which can be represented by a tree structure. The number of splittings required to isolate a sample is lower for outliers and higher for inliers. In the present example we demo two ways to visualize the decision boundary of an Isolation Forest trained on a toy dataset. .. GENERATED FROM PYTHON SOURCE LINES 20-32 Data generation --------------- We generate two clusters (each one containing `n_samples`) by randomly sampling the standard normal distribution as returned by :func:`numpy.random.randn`. One of them is spherical and the other one is slightly deformed. For consistency with the :class:`~sklearn.ensemble.IsolationForest` notation, the inliers (i.e. the gaussian clusters) are assigned a ground truth label `1` whereas the outliers (created with :func:`numpy.random.uniform`) are assigned the label `-1`. .. GENERATED FROM PYTHON SOURCE LINES 32-51 .. code-block:: Python import numpy as np from sklearn.model_selection import train_test_split n_samples, n_outliers = 120, 40 rng = np.random.RandomState(0) covariance = np.array([[0.5, -0.1], [0.7, 0.4]]) cluster_1 = 0.4 * rng.randn(n_samples, 2) @ covariance + np.array([2, 2]) # general cluster_2 = 0.3 * rng.randn(n_samples, 2) + np.array([-2, -2]) # spherical outliers = rng.uniform(low=-4, high=4, size=(n_outliers, 2)) X = np.concatenate([cluster_1, cluster_2, outliers]) y = np.concatenate( [np.ones((2 * n_samples), dtype=int), -np.ones((n_outliers), dtype=int)] ) X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42) .. GENERATED FROM PYTHON SOURCE LINES 52-53 We can visualize the resulting clusters: .. GENERATED FROM PYTHON SOURCE LINES 53-63 .. code-block:: Python import matplotlib.pyplot as plt scatter = plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k") handles, labels = scatter.legend_elements() plt.axis("square") plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class") plt.title("Gaussian inliers with \nuniformly distributed outliers") plt.show() .. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_001.png :alt: Gaussian inliers with uniformly distributed outliers :srcset: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 64-66 Training of the model --------------------- .. GENERATED FROM PYTHON SOURCE LINES 66-72 .. code-block:: Python from sklearn.ensemble import IsolationForest clf = IsolationForest(max_samples=100, random_state=0) clf.fit(X_train) .. raw:: html <div class="output_subarea output_html rendered_html output_result"> <style>#sk-container-id-21 { /* Definition of color scheme common for light and dark mode */ --sklearn-color-text: black; --sklearn-color-line: gray; /* Definition of color scheme for unfitted estimators */ --sklearn-color-unfitted-level-0: #fff5e6; --sklearn-color-unfitted-level-1: #f6e4d2; --sklearn-color-unfitted-level-2: #ffe0b3; --sklearn-color-unfitted-level-3: chocolate; /* Definition of color scheme for fitted estimators */ --sklearn-color-fitted-level-0: #f0f8ff; --sklearn-color-fitted-level-1: #d4ebff; --sklearn-color-fitted-level-2: #b3dbfd; --sklearn-color-fitted-level-3: cornflowerblue; /* Specific color for light theme */ --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black))); --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white))); --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black))); --sklearn-color-icon: #696969; @media (prefers-color-scheme: dark) { /* Redefinition of color scheme for dark theme */ --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white))); --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111))); --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white))); --sklearn-color-icon: #878787; } } #sk-container-id-21 { color: var(--sklearn-color-text); } #sk-container-id-21 pre { padding: 0; } #sk-container-id-21 input.sk-hidden--visually { border: 0; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px); height: 1px; margin: -1px; overflow: hidden; padding: 0; position: absolute; width: 1px; } #sk-container-id-21 div.sk-dashed-wrapped { border: 1px dashed var(--sklearn-color-line); margin: 0 0.4em 0.5em 0.4em; box-sizing: border-box; padding-bottom: 0.4em; background-color: var(--sklearn-color-background); } #sk-container-id-21 div.sk-container { /* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */ display: inline-block !important; position: relative; } #sk-container-id-21 div.sk-text-repr-fallback { display: none; } div.sk-parallel-item, div.sk-serial, div.sk-item { /* draw centered vertical line to link estimators */ background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background)); background-size: 2px 100%; background-repeat: no-repeat; background-position: center center; } /* Parallel-specific style estimator block */ #sk-container-id-21 div.sk-parallel-item::after { content: ""; width: 100%; border-bottom: 2px solid var(--sklearn-color-text-on-default-background); flex-grow: 1; } #sk-container-id-21 div.sk-parallel { display: flex; align-items: stretch; justify-content: center; background-color: var(--sklearn-color-background); position: relative; } #sk-container-id-21 div.sk-parallel-item { display: flex; flex-direction: column; } #sk-container-id-21 div.sk-parallel-item:first-child::after { align-self: flex-end; width: 50%; } #sk-container-id-21 div.sk-parallel-item:last-child::after { align-self: flex-start; width: 50%; } #sk-container-id-21 div.sk-parallel-item:only-child::after { width: 0; } /* Serial-specific style estimator block */ #sk-container-id-21 div.sk-serial { display: flex; flex-direction: column; align-items: center; background-color: var(--sklearn-color-background); padding-right: 1em; padding-left: 1em; } /* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is clickable and can be expanded/collapsed. - Pipeline and ColumnTransformer use this feature and define the default style - Estimators will overwrite some part of the style using the `sk-estimator` class */ /* Pipeline and ColumnTransformer style (default) */ #sk-container-id-21 div.sk-toggleable { /* Default theme specific background. It is overwritten whether we have a specific estimator or a Pipeline/ColumnTransformer */ background-color: var(--sklearn-color-background); } /* Toggleable label */ #sk-container-id-21 label.sk-toggleable__label { cursor: pointer; display: block; width: 100%; margin-bottom: 0; padding: 0.5em; box-sizing: border-box; text-align: center; } #sk-container-id-21 label.sk-toggleable__label-arrow:before { /* Arrow on the left of the label */ content: "▸"; float: left; margin-right: 0.25em; color: var(--sklearn-color-icon); } #sk-container-id-21 label.sk-toggleable__label-arrow:hover:before { color: var(--sklearn-color-text); } /* Toggleable content - dropdown */ #sk-container-id-21 div.sk-toggleable__content { max-height: 0; max-width: 0; overflow: hidden; text-align: left; /* unfitted */ background-color: var(--sklearn-color-unfitted-level-0); } #sk-container-id-21 div.sk-toggleable__content.fitted { /* fitted */ background-color: var(--sklearn-color-fitted-level-0); } #sk-container-id-21 div.sk-toggleable__content pre { margin: 0.2em; border-radius: 0.25em; color: var(--sklearn-color-text); /* unfitted */ background-color: var(--sklearn-color-unfitted-level-0); } #sk-container-id-21 div.sk-toggleable__content.fitted pre { /* unfitted */ background-color: var(--sklearn-color-fitted-level-0); } #sk-container-id-21 input.sk-toggleable__control:checked~div.sk-toggleable__content { /* Expand drop-down */ max-height: 200px; max-width: 100%; overflow: auto; } #sk-container-id-21 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before { content: "▾"; } /* Pipeline/ColumnTransformer-specific style */ #sk-container-id-21 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label { color: var(--sklearn-color-text); background-color: var(--sklearn-color-unfitted-level-2); } #sk-container-id-21 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label { background-color: var(--sklearn-color-fitted-level-2); } /* Estimator-specific style */ /* Colorize estimator box */ #sk-container-id-21 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label { /* unfitted */ background-color: var(--sklearn-color-unfitted-level-2); } #sk-container-id-21 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label { /* fitted */ background-color: var(--sklearn-color-fitted-level-2); } #sk-container-id-21 div.sk-label label.sk-toggleable__label, #sk-container-id-21 div.sk-label label { /* The background is the default theme color */ color: var(--sklearn-color-text-on-default-background); } /* On hover, darken the color of the background */ #sk-container-id-21 div.sk-label:hover label.sk-toggleable__label { color: var(--sklearn-color-text); background-color: var(--sklearn-color-unfitted-level-2); } /* Label box, darken color on hover, fitted */ #sk-container-id-21 div.sk-label.fitted:hover label.sk-toggleable__label.fitted { color: var(--sklearn-color-text); background-color: var(--sklearn-color-fitted-level-2); } /* Estimator label */ #sk-container-id-21 div.sk-label label { font-family: monospace; font-weight: bold; display: inline-block; line-height: 1.2em; } #sk-container-id-21 div.sk-label-container { text-align: center; } /* Estimator-specific */ #sk-container-id-21 div.sk-estimator { font-family: monospace; border: 1px dotted var(--sklearn-color-border-box); border-radius: 0.25em; box-sizing: border-box; margin-bottom: 0.5em; /* unfitted */ background-color: var(--sklearn-color-unfitted-level-0); } #sk-container-id-21 div.sk-estimator.fitted { /* fitted */ background-color: var(--sklearn-color-fitted-level-0); } /* on hover */ #sk-container-id-21 div.sk-estimator:hover { /* unfitted */ background-color: var(--sklearn-color-unfitted-level-2); } #sk-container-id-21 div.sk-estimator.fitted:hover { /* fitted */ background-color: var(--sklearn-color-fitted-level-2); } /* Specification for estimator info (e.g. "i" and "?") */ /* Common style for "i" and "?" */ .sk-estimator-doc-link, a:link.sk-estimator-doc-link, a:visited.sk-estimator-doc-link { float: right; font-size: smaller; line-height: 1em; font-family: monospace; background-color: var(--sklearn-color-background); border-radius: 1em; height: 1em; width: 1em; text-decoration: none !important; margin-left: 1ex; /* unfitted */ border: var(--sklearn-color-unfitted-level-1) 1pt solid; color: var(--sklearn-color-unfitted-level-1); } .sk-estimator-doc-link.fitted, a:link.sk-estimator-doc-link.fitted, a:visited.sk-estimator-doc-link.fitted { /* fitted */ border: var(--sklearn-color-fitted-level-1) 1pt solid; color: var(--sklearn-color-fitted-level-1); } /* On hover */ div.sk-estimator:hover .sk-estimator-doc-link:hover, .sk-estimator-doc-link:hover, div.sk-label-container:hover .sk-estimator-doc-link:hover, .sk-estimator-doc-link:hover { /* unfitted */ background-color: var(--sklearn-color-unfitted-level-3); color: var(--sklearn-color-background); text-decoration: none; } div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover, .sk-estimator-doc-link.fitted:hover, div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover, .sk-estimator-doc-link.fitted:hover { /* fitted */ background-color: var(--sklearn-color-fitted-level-3); color: var(--sklearn-color-background); text-decoration: none; } /* Span, style for the box shown on hovering the info icon */ .sk-estimator-doc-link span { display: none; z-index: 9999; position: relative; font-weight: normal; right: .2ex; padding: .5ex; margin: .5ex; width: min-content; min-width: 20ex; max-width: 50ex; color: var(--sklearn-color-text); box-shadow: 2pt 2pt 4pt #999; /* unfitted */ background: var(--sklearn-color-unfitted-level-0); border: .5pt solid var(--sklearn-color-unfitted-level-3); } .sk-estimator-doc-link.fitted span { /* fitted */ background: var(--sklearn-color-fitted-level-0); border: var(--sklearn-color-fitted-level-3); } .sk-estimator-doc-link:hover span { display: block; } /* "?"-specific style due to the `<a>` HTML tag */ #sk-container-id-21 a.estimator_doc_link { float: right; font-size: 1rem; line-height: 1em; font-family: monospace; background-color: var(--sklearn-color-background); border-radius: 1rem; height: 1rem; width: 1rem; text-decoration: none; /* unfitted */ color: var(--sklearn-color-unfitted-level-1); border: var(--sklearn-color-unfitted-level-1) 1pt solid; } #sk-container-id-21 a.estimator_doc_link.fitted { /* fitted */ border: var(--sklearn-color-fitted-level-1) 1pt solid; color: var(--sklearn-color-fitted-level-1); } /* On hover */ #sk-container-id-21 a.estimator_doc_link:hover { /* unfitted */ background-color: var(--sklearn-color-unfitted-level-3); color: var(--sklearn-color-background); text-decoration: none; } #sk-container-id-21 a.estimator_doc_link.fitted:hover { /* fitted */ background-color: var(--sklearn-color-fitted-level-3); } </style><div id="sk-container-id-21" class="sk-top-container"><div class="sk-text-repr-fallback"><pre>IsolationForest(max_samples=100, random_state=0)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item"><div class="sk-estimator fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-97" type="checkbox" checked><label for="sk-estimator-id-97" class="sk-toggleable__label fitted sk-toggleable__label-arrow fitted"> IsolationForest<a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.4/modules/generated/sklearn.ensemble.IsolationForest.html">?<span>Documentation for IsolationForest</span></a><span class="sk-estimator-doc-link fitted">i<span>Fitted</span></span></label><div class="sk-toggleable__content fitted"><pre>IsolationForest(max_samples=100, random_state=0)</pre></div> </div></div></div></div> </div> <br /> <br /> .. GENERATED FROM PYTHON SOURCE LINES 73-80 Plot discrete decision boundary ------------------------------- We use the class :class:`~sklearn.inspection.DecisionBoundaryDisplay` to visualize a discrete decision boundary. The background color represents whether a sample in that given area is predicted to be an outlier or not. The scatter plot displays the true labels. .. GENERATED FROM PYTHON SOURCE LINES 80-97 .. code-block:: Python import matplotlib.pyplot as plt from sklearn.inspection import DecisionBoundaryDisplay disp = DecisionBoundaryDisplay.from_estimator( clf, X, response_method="predict", alpha=0.5, ) disp.ax_.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k") disp.ax_.set_title("Binary decision boundary \nof IsolationForest") plt.axis("square") plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class") plt.show() .. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_002.png :alt: Binary decision boundary of IsolationForest :srcset: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 98-111 Plot path length decision boundary ---------------------------------- By setting the `response_method="decision_function"`, the background of the :class:`~sklearn.inspection.DecisionBoundaryDisplay` represents the measure of normality of an observation. Such score is given by the path length averaged over a forest of random trees, which itself is given by the depth of the leaf (or equivalently the number of splits) required to isolate a given sample. When a forest of random trees collectively produce short path lengths for isolating some particular samples, they are highly likely to be anomalies and the measure of normality is close to `0`. Similarly, large paths correspond to values close to `1` and are more likely to be inliers. .. GENERATED FROM PYTHON SOURCE LINES 111-124 .. code-block:: Python disp = DecisionBoundaryDisplay.from_estimator( clf, X, response_method="decision_function", alpha=0.5, ) disp.ax_.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k") disp.ax_.set_title("Path length decision boundary \nof IsolationForest") plt.axis("square") plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class") plt.colorbar(disp.ax_.collections[1]) plt.show() .. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_003.png :alt: Path length decision boundary of IsolationForest :srcset: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_003.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.436 seconds) .. _sphx_glr_download_auto_examples_ensemble_plot_isolation_forest.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.4.X?urlpath=lab/tree/notebooks/auto_examples/ensemble/plot_isolation_forest.ipynb :alt: Launch binder :width: 150 px .. container:: lite-badge .. image:: images/jupyterlite_badge_logo.svg :target: ../../lite/lab/?path=auto_examples/ensemble/plot_isolation_forest.ipynb :alt: Launch JupyterLite :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_isolation_forest.ipynb <plot_isolation_forest.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_isolation_forest.py <plot_isolation_forest.py>` .. include:: plot_isolation_forest.recommendations .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_