.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/manifold/plot_lle_digits.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_manifold_plot_lle_digits.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_manifold_plot_lle_digits.py:


=============================================================================
Manifold learning on handwritten digits: Locally Linear Embedding, Isomap...
=============================================================================

We illustrate various embedding techniques on the digits dataset.

.. GENERATED FROM PYTHON SOURCE LINES 9-18

.. code-block:: default

    # Authors: Fabian Pedregosa
    #          Olivier Grisel
    #          Mathieu Blondel
    #          Gael Varoquaux
    #          Guillaume Lemaitre
    # License: BSD 3 clause (C) INRIA 2011

.. GENERATED FROM PYTHON SOURCE LINES 19-22

Load digits dataset
-------------------
We will load the digits dataset and only use the first six of the ten
available classes.

.. GENERATED FROM PYTHON SOURCE LINES 22-29

.. code-block:: default

    from sklearn.datasets import load_digits

    digits = load_digits(n_class=6)
    X, y = digits.data, digits.target
    n_samples, n_features = X.shape
    n_neighbors = 30

.. GENERATED FROM PYTHON SOURCE LINES 30-31

We can plot the first hundred digits from this data set.

.. GENERATED FROM PYTHON SOURCE LINES 31-39

.. code-block:: default

    import matplotlib.pyplot as plt

    fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(6, 6))
    for idx, ax in enumerate(axs.ravel()):
        ax.imshow(X[idx].reshape((8, 8)), cmap=plt.cm.binary)
        ax.axis("off")
    _ = fig.suptitle("A selection from the 64-dimensional digits dataset", fontsize=16)

..

.. image-sg:: /auto_examples/manifold/images/sphx_glr_plot_lle_digits_001.png
   :alt: A selection from the 64-dimensional digits dataset
   :srcset: /auto_examples/manifold/images/sphx_glr_plot_lle_digits_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 40-46

Helper function to plot embedding
---------------------------------
Below, we will use different techniques to embed the digits dataset. We will
plot the projection of the original data onto each embedding. This will allow
us to check whether digits are grouped together in the embedding space, or
scattered across it.

.. GENERATED FROM PYTHON SOURCE LINES 46-81

.. code-block:: default

    import numpy as np
    from matplotlib import offsetbox
    from sklearn.preprocessing import MinMaxScaler


    def plot_embedding(X, title, ax):
        X = MinMaxScaler().fit_transform(X)
        for digit in digits.target_names:
            ax.scatter(
                *X[y == digit].T,
                marker=f"${digit}$",
                s=60,
                color=plt.cm.Dark2(digit),
                alpha=0.425,
                zorder=2,
            )
        shown_images = np.array([[1.0, 1.0]])  # just something big
        for i in range(X.shape[0]):
            # plot every digit on the embedding
            # show an annotation box for a group of digits
            dist = np.sum((X[i] - shown_images) ** 2, 1)
            if np.min(dist) < 4e-3:
                # don't show points that are too close
                continue
            shown_images = np.concatenate([shown_images, [X[i]]], axis=0)
            imagebox = offsetbox.AnnotationBbox(
                offsetbox.OffsetImage(digits.images[i], cmap=plt.cm.gray_r), X[i]
            )
            imagebox.set(zorder=1)
            ax.add_artist(imagebox)

        ax.set_title(title)
        ax.axis("off")

.. GENERATED FROM PYTHON SOURCE LINES 82-100

Embedding techniques comparison
-------------------------------

Below, we compare different techniques. However, there are a couple of things
to note:

* the :class:`~sklearn.ensemble.RandomTreesEmbedding` is not
  technically a manifold embedding method, as it learns a high-dimensional
  representation on which we apply a dimensionality reduction method.
  However, it is often useful to cast a dataset into a representation in
  which the classes are linearly separable.
* the :class:`~sklearn.discriminant_analysis.LinearDiscriminantAnalysis` and
  the :class:`~sklearn.neighbors.NeighborhoodComponentsAnalysis` are supervised
  dimensionality reduction methods, i.e. they make use of the provided labels,
  unlike the other methods.
* the :class:`~sklearn.manifold.TSNE` is initialized with the embedding that is
  generated by PCA in this example. It ensures global stability of the embedding,
  i.e., the embedding does not depend on random initialization.

.. GENERATED FROM PYTHON SOURCE LINES 100-157

.. code-block:: default

    from sklearn.decomposition import TruncatedSVD
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.ensemble import RandomTreesEmbedding
    from sklearn.manifold import (
        Isomap,
        LocallyLinearEmbedding,
        MDS,
        SpectralEmbedding,
        TSNE,
    )
    from sklearn.neighbors import NeighborhoodComponentsAnalysis
    from sklearn.pipeline import make_pipeline
    from sklearn.random_projection import SparseRandomProjection

    embeddings = {
        "Random projection embedding": SparseRandomProjection(
            n_components=2, random_state=42
        ),
        "Truncated SVD embedding": TruncatedSVD(n_components=2),
        "Linear Discriminant Analysis embedding": LinearDiscriminantAnalysis(
            n_components=2
        ),
        "Isomap embedding": Isomap(n_neighbors=n_neighbors, n_components=2),
        "Standard LLE embedding": LocallyLinearEmbedding(
            n_neighbors=n_neighbors, n_components=2, method="standard"
        ),
        "Modified LLE embedding": LocallyLinearEmbedding(
            n_neighbors=n_neighbors, n_components=2, method="modified"
        ),
        "Hessian LLE embedding": LocallyLinearEmbedding(
            n_neighbors=n_neighbors, n_components=2, method="hessian"
        ),
        "LTSA LLE embedding": LocallyLinearEmbedding(
            n_neighbors=n_neighbors, n_components=2, method="ltsa"
        ),
        "MDS embedding": MDS(n_components=2, n_init=1, max_iter=120, n_jobs=2),
        "Random Trees embedding": make_pipeline(
            RandomTreesEmbedding(n_estimators=200,
                                 max_depth=5, random_state=0),
            TruncatedSVD(n_components=2),
        ),
        "Spectral embedding": SpectralEmbedding(
            n_components=2, random_state=0, eigen_solver="arpack"
        ),
        "t-SNE embedding": TSNE(
            n_components=2,
            init="pca",
            learning_rate="auto",
            n_iter=500,
            n_iter_without_progress=150,
            n_jobs=2,
            random_state=0,
        ),
        "NCA embedding": NeighborhoodComponentsAnalysis(
            n_components=2, init="pca", random_state=0
        ),
    }

.. GENERATED FROM PYTHON SOURCE LINES 158-161

Once we have declared all the methods of interest, we can compute the
projection of the original data. We will store the projected data as well as
the computational time needed to perform each projection.

.. GENERATED FROM PYTHON SOURCE LINES 161-176

.. code-block:: default

    from time import time

    projections, timing = {}, {}
    for name, transformer in embeddings.items():
        if name.startswith("Linear Discriminant Analysis"):
            data = X.copy()
            data.flat[:: X.shape[1] + 1] += 0.01  # Make X invertible
        else:
            data = X

        print(f"Computing {name}...")
        start_time = time()
        projections[name] = transformer.fit_transform(data, y)
        timing[name] = time() - start_time

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Computing Random projection embedding...
    Computing Truncated SVD embedding...
    Computing Linear Discriminant Analysis embedding...
    Computing Isomap embedding...
    Computing Standard LLE embedding...
    Computing Modified LLE embedding...
    Computing Hessian LLE embedding...
    Computing LTSA LLE embedding...
    Computing MDS embedding...
    Computing Random Trees embedding...
    Computing Spectral embedding...
    Computing t-SNE embedding...
    /home/circleci/project/sklearn/manifold/_t_sne.py:982: FutureWarning: The PCA initialization in TSNE will change to have the standard deviation of PC1 equal to 1e-4 in 1.2. This will ensure better convergence.
      warnings.warn(
    Computing NCA embedding...

.. GENERATED FROM PYTHON SOURCE LINES 177-178

Finally, we can plot the resulting projection given by each method.

..

.. GENERATED FROM PYTHON SOURCE LINES 178-190

.. code-block:: default

    from itertools import zip_longest

    fig, axs = plt.subplots(nrows=7, ncols=2, figsize=(17, 24))

    for name, ax in zip_longest(timing, axs.ravel()):
        if name is None:
            ax.axis("off")
            continue
        title = f"{name} (time {timing[name]:.3f}s)"
        plot_embedding(projections[name], title, ax)

    plt.show()

.. image-sg:: /auto_examples/manifold/images/sphx_glr_plot_lle_digits_002.png
   :alt: Random projection embedding (time 0.001s), Truncated SVD embedding (time 0.002s), Linear Discriminant Analysis embedding (time 0.010s), Isomap embedding (time 0.887s), Standard LLE embedding (time 0.174s), Modified LLE embedding (time 0.444s), Hessian LLE embedding (time 0.538s), LTSA LLE embedding (time 0.401s), MDS embedding (time 3.365s), Random Trees embedding (time 0.234s), Spectral embedding (time 0.196s), t-SNE embedding (time 2.938s), NCA embedding (time 1.970s)
   :srcset: /auto_examples/manifold/images/sphx_glr_plot_lle_digits_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 15.867 seconds)

.. _sphx_glr_download_auto_examples_manifold_plot_lle_digits.py:

.. only :: html

  .. container:: sphx-glr-footer
     :class: sphx-glr-footer-example

     .. container:: binder-badge

        .. image:: images/binder_badge_logo.svg
           :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.0.X?urlpath=lab/tree/notebooks/auto_examples/manifold/plot_lle_digits.ipynb
           :alt: Launch binder
           :width: 150 px

     .. container:: sphx-glr-download sphx-glr-download-python

        :download:`Download Python source code: plot_lle_digits.py <plot_lle_digits.py>`

     .. container:: sphx-glr-download sphx-glr-download-jupyter

        :download:`Download Jupyter notebook: plot_lle_digits.ipynb <plot_lle_digits.ipynb>`

.. only:: html

  .. rst-class:: sphx-glr-signature

     `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
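
The final plotting loop pairs the embedding names with the axes of a 7 x 2 grid using ``zip_longest``. The example declares thirteen embeddings but the grid has fourteen axes, so one axis is left without a name and is switched off. The following is a minimal, standard-library-only sketch of that pairing. The ``names`` and ``axes`` lists are hypothetical string stand-ins for the ``timing`` keys and the Matplotlib axes, not objects from the example itself.

```python
from itertools import zip_longest

# Hypothetical stand-ins: 13 embedding names and 14 axis slots (7 rows x 2 columns).
names = [f"embedding {i}" for i in range(13)]
axes = [f"axis {i}" for i in range(7 * 2)]

# zip_longest pads the shorter iterable with None, so the last slot gets no name.
pairs = list(zip_longest(names, axes))

# Slots paired with None correspond to the axes the example hides via ax.axis("off").
unused = [ax for name, ax in pairs if name is None]
print(len(pairs), unused)
```

With a plain ``zip`` the loop would stop after the thirteenth pair and the last axis would remain visible as an empty frame, which is presumably why the example uses ``zip_longest`` together with the ``name is None`` check.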