.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/compose/plot_digits_pipe.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. or to run this example in your browser via JupyterLite or Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_compose_plot_digits_pipe.py: ========================================================= Pipelining: chaining a PCA and a logistic regression ========================================================= The PCA does an unsupervised dimensionality reduction, while the logistic regression does the prediction. We use a GridSearchCV to set the dimensionality of the PCA .. GENERATED FROM PYTHON SOURCE LINES 12-86 .. image-sg:: /auto_examples/compose/images/sphx_glr_plot_digits_pipe_001.png :alt: plot digits pipe :srcset: /auto_examples/compose/images/sphx_glr_plot_digits_pipe_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Best parameter (CV score=0.874): {'logistic__C': np.float64(21.54434690031882), 'pca__n_components': 60} | .. code-block:: Python # Authors: The scikit-learn developers # SPDX-License-Identifier: BSD-3-Clause import matplotlib.pyplot as plt import numpy as np import polars as pl from sklearn import datasets from sklearn.decomposition import PCA from sklearn.linear_model import LogisticRegression from sklearn.model_selection import GridSearchCV from sklearn.pipeline import Pipeline from sklearn.preprocessing import StandardScaler # Define a pipeline to search for the best combination of PCA truncation # and classifier regularization. pca = PCA() # Define a Standard Scaler to normalize inputs scaler = StandardScaler() # set the tolerance to a large value to make the example faster logistic = LogisticRegression(max_iter=10000, tol=0.1) pipe = Pipeline(steps=[("scaler", scaler), ("pca", pca), ("logistic", logistic)]) X_digits, y_digits = datasets.load_digits(return_X_y=True) # Parameters of pipelines can be set using '__' separated parameter names: param_grid = { "pca__n_components": [5, 15, 30, 45, 60], "logistic__C": np.logspace(-4, 4, 4), } search = GridSearchCV(pipe, param_grid, n_jobs=2) search.fit(X_digits, y_digits) print("Best parameter (CV score=%0.3f):" % search.best_score_) print(search.best_params_) # Plot the PCA spectrum pca.fit(X_digits) fig, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(6, 6)) ax0.plot( np.arange(1, pca.n_components_ + 1), pca.explained_variance_ratio_, "+", linewidth=2 ) ax0.set_ylabel("PCA explained variance ratio") ax0.axvline( search.best_estimator_.named_steps["pca"].n_components, linestyle=":", label="n_components chosen", ) ax0.legend(prop=dict(size=12)) # For each number of components, find the best classifier results components_col = "param_pca__n_components" is_max_test_score = pl.col("mean_test_score") == pl.col("mean_test_score").max() best_clfs = ( pl.LazyFrame(search.cv_results_) .filter(is_max_test_score.over(components_col)) .unique(components_col) .sort(components_col) .collect() ) ax1.errorbar( best_clfs[components_col], best_clfs["mean_test_score"], yerr=best_clfs["std_test_score"], ) ax1.set_ylabel("Classification accuracy (val)") ax1.set_xlabel("n_components") plt.xlim(-1, 70) plt.tight_layout() plt.show() .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 1.159 seconds) .. _sphx_glr_download_auto_examples_compose_plot_digits_pipe.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.6.X?urlpath=lab/tree/notebooks/auto_examples/compose/plot_digits_pipe.ipynb :alt: Launch binder :width: 150 px .. container:: lite-badge .. image:: images/jupyterlite_badge_logo.svg :target: ../../lite/lab/index.html?path=auto_examples/compose/plot_digits_pipe.ipynb :alt: Launch JupyterLite :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_digits_pipe.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_digits_pipe.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_digits_pipe.zip ` .. include:: plot_digits_pipe.recommendations .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_