.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/miscellaneous/plot_set_output.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_miscellaneous_plot_set_output.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_miscellaneous_plot_set_output.py:


================================
Introducing the `set_output` API
================================

.. currentmodule:: sklearn

This example demonstrates how to use the `set_output` API to configure
transformers to output pandas DataFrames. The output container can be configured
per estimator by calling the `set_output` method, or globally by calling
`set_config(transform_output="pandas")`. For details, see
`SLEP018 <https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep018/proposal.html>`__.

.. GENERATED FROM PYTHON SOURCE LINES 16-17

First, we load the iris dataset as a DataFrame to demonstrate the `set_output` API.

.. GENERATED FROM PYTHON SOURCE LINES 17-24

.. code-block:: default

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(as_frame=True, return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
    X_train.head()


..
raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>sepal length (cm)</th>
          <th>sepal width (cm)</th>
          <th>petal length (cm)</th>
          <th>petal width (cm)</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>60</th>
          <td>5.0</td>
          <td>2.0</td>
          <td>3.5</td>
          <td>1.0</td>
        </tr>
        <tr>
          <th>1</th>
          <td>4.9</td>
          <td>3.0</td>
          <td>1.4</td>
          <td>0.2</td>
        </tr>
        <tr>
          <th>8</th>
          <td>4.4</td>
          <td>2.9</td>
          <td>1.4</td>
          <td>0.2</td>
        </tr>
        <tr>
          <th>93</th>
          <td>5.0</td>
          <td>2.3</td>
          <td>3.3</td>
          <td>1.0</td>
        </tr>
        <tr>
          <th>106</th>
          <td>4.9</td>
          <td>2.5</td>
          <td>4.5</td>
          <td>1.7</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 25-27

To configure an estimator such as :class:`preprocessing.StandardScaler` to return
DataFrames, call `set_output`. This feature requires pandas to be installed.

.. GENERATED FROM PYTHON SOURCE LINES 27-36

.. code-block:: default

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler().set_output(transform="pandas")

    scaler.fit(X_train)
    X_test_scaled = scaler.transform(X_test)
    X_test_scaled.head()


..
raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>sepal length (cm)</th>
          <th>sepal width (cm)</th>
          <th>petal length (cm)</th>
          <th>petal width (cm)</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>39</th>
          <td>-0.894264</td>
          <td>0.798301</td>
          <td>-1.271411</td>
          <td>-1.327605</td>
        </tr>
        <tr>
          <th>12</th>
          <td>-1.244466</td>
          <td>-0.086944</td>
          <td>-1.327407</td>
          <td>-1.459074</td>
        </tr>
        <tr>
          <th>48</th>
          <td>-0.660797</td>
          <td>1.462234</td>
          <td>-1.271411</td>
          <td>-1.327605</td>
        </tr>
        <tr>
          <th>23</th>
          <td>-0.894264</td>
          <td>0.576989</td>
          <td>-1.159419</td>
          <td>-0.933197</td>
        </tr>
        <tr>
          <th>81</th>
          <td>-0.427329</td>
          <td>-1.414810</td>
          <td>-0.039497</td>
          <td>-0.275851</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 37-38

`set_output` can be called after `fit` to configure `transform` after the fact.

.. GENERATED FROM PYTHON SOURCE LINES 38-48

.. code-block:: default

    scaler2 = StandardScaler()

    scaler2.fit(X_train)
    X_test_np = scaler2.transform(X_test)
    print(f"Default output type: {type(X_test_np).__name__}")

    scaler2.set_output(transform="pandas")
    X_test_df = scaler2.transform(X_test)
    print(f"Configured pandas output type: {type(X_test_df).__name__}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Default output type: ndarray
    Configured pandas output type: DataFrame


.. GENERATED FROM PYTHON SOURCE LINES 49-51

In a :class:`pipeline.Pipeline`, `set_output` configures all steps to output
DataFrames.

.. GENERATED FROM PYTHON SOURCE LINES 51-61

..
code-block:: default

    from sklearn.feature_selection import SelectPercentile
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    clf = make_pipeline(
        StandardScaler(), SelectPercentile(percentile=75), LogisticRegression()
    )
    clf.set_output(transform="pandas")
    clf.fit(X_train, y_train)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Pipeline(steps=[('standardscaler', StandardScaler()),
                    ('selectpercentile', SelectPercentile(percentile=75)),
                    ('logisticregression', LogisticRegression())])


.. GENERATED FROM PYTHON SOURCE LINES 62-64

Each transformer in the pipeline is configured to return DataFrames. This
means that the final logistic regression step records the names of the features
it was fitted on, here the three features kept by `SelectPercentile`.

.. GENERATED FROM PYTHON SOURCE LINES 64-66

.. code-block:: default

    clf[-1].feature_names_in_


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array(['sepal length (cm)', 'petal length (cm)', 'petal width (cm)'],
          dtype=object)


.. GENERATED FROM PYTHON SOURCE LINES 67-69

Next, we load the Titanic dataset to demonstrate `set_output` with
:class:`compose.ColumnTransformer` and heterogeneous data.

.. GENERATED FROM PYTHON SOURCE LINES 69-76

.. code-block:: default

    from sklearn.datasets import fetch_openml

    X, y = fetch_openml(
        "titanic", version=1, as_frame=True, return_X_y=True, parser="pandas"
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)


.. GENERATED FROM PYTHON SOURCE LINES 77-79

The `set_output` API can be configured globally by using :func:`set_config` and
setting `transform_output` to `"pandas"`.

.. GENERATED FROM PYTHON SOURCE LINES 79-105

..
code-block:: default

    from sklearn import set_config
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    set_config(transform_output="pandas")

    num_pipe = make_pipeline(SimpleImputer(), StandardScaler())
    num_cols = ["age", "fare"]
    ct = ColumnTransformer(
        (
            ("numerical", num_pipe, num_cols),
            (
                "categorical",
                OneHotEncoder(
                    sparse_output=False, drop="if_binary", handle_unknown="ignore"
                ),
                ["embarked", "sex", "pclass"],
            ),
        ),
        verbose_feature_names_out=False,
    )
    clf = make_pipeline(ct, SelectPercentile(percentile=50), LogisticRegression())
    clf.fit(X_train, y_train)
    clf.score(X_test, y_test)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.7621951219512195


.. GENERATED FROM PYTHON SOURCE LINES 106-108

With the global configuration, all transformers output DataFrames. This allows us to
easily plot the logistic regression coefficients with the corresponding feature names.

.. GENERATED FROM PYTHON SOURCE LINES 108-114

.. code-block:: default

    import pandas as pd

    log_reg = clf[-1]
    coef = pd.Series(log_reg.coef_.ravel(), index=log_reg.feature_names_in_)
    _ = coef.sort_values().plot.barh()


.. image-sg:: /auto_examples/miscellaneous/images/sphx_glr_plot_set_output_001.png
   :alt: plot set output
   :srcset: /auto_examples/miscellaneous/images/sphx_glr_plot_set_output_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 115-117

To demonstrate the :func:`config_context` functionality below, let us
first reset `transform_output` to its default value.

.. GENERATED FROM PYTHON SOURCE LINES 117-119

.. code-block:: default

    set_config(transform_output="default")


.. GENERATED FROM PYTHON SOURCE LINES 120-124

When configuring the output type with :func:`config_context`, the
configuration in effect when `transform` or `fit_transform` is
called is what counts. Setting it only when you construct or fit
the transformer has no effect.

..
GENERATED FROM PYTHON SOURCE LINES 124-129

.. code-block:: default

    from sklearn import config_context

    scaler = StandardScaler()
    scaler.fit(X_train[num_cols])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    StandardScaler()


.. GENERATED FROM PYTHON SOURCE LINES 130-135

.. code-block:: default

    with config_context(transform_output="pandas"):
        # the output of transform will be a pandas DataFrame
        X_test_scaled = scaler.transform(X_test[num_cols])
    X_test_scaled.head()


..
raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>age</th>
          <th>fare</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>1088</th>
          <td>0.151101</td>
          <td>-0.479229</td>
        </tr>
        <tr>
          <th>1001</th>
          <td>NaN</td>
          <td>-0.188153</td>
        </tr>
        <tr>
          <th>660</th>
          <td>-0.393297</td>
          <td>-0.263234</td>
        </tr>
        <tr>
          <th>657</th>
          <td>-1.975455</td>
          <td>-0.263234</td>
        </tr>
        <tr>
          <th>285</th>
          <td>2.532843</td>
          <td>3.546068</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 136-137

Outside of the context manager, the output will be a NumPy array.

.. GENERATED FROM PYTHON SOURCE LINES 137-139

.. code-block:: default

    X_test_scaled = scaler.transform(X_test[num_cols])
    X_test_scaled[:5]


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([[ 0.1511007 , -0.47922861],
           [        nan, -0.18815268],
           [-0.39329747, -0.26323428],
           [-1.97545464, -0.26323428],
           [ 2.53284267,  3.54606834]])


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.150 seconds)


.. _sphx_glr_download_auto_examples_miscellaneous_plot_set_output.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.3.X?urlpath=lab/tree/notebooks/auto_examples/miscellaneous/plot_set_output.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/miscellaneous/plot_set_output.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_set_output.py <plot_set_output.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_set_output.ipynb <plot_set_output.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_