Label Propagation circles: Learning a complex structure

This example shows label propagation learning a complex internal structure, to demonstrate “manifold learning”. The model used here is LabelSpreading. Only one point on the outer circle and one point on the inner circle start with a known label. Because each labeled point lies on its own distinct circle, its label propagates correctly around that circle. In the plots, the outer circle is drawn in navy and the inner circle in cyan.

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

We generate a dataset with two concentric circles. Each sample is assigned a label: 0 (outer circle), 1 (inner circle), or -1 (unknown). Here, all labels but two are set to unknown: only the first sample (on the outer circle) and the last sample (on the inner circle) keep a known label.

import numpy as np

from sklearn.datasets import make_circles

n_samples = 200
X, y = make_circles(n_samples=n_samples, shuffle=False)
outer, inner = 0, 1
labels = np.full(n_samples, -1.0)
labels[0] = outer
labels[-1] = inner
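
As a quick sanity check on the setup above, the following sketch verifies that, with shuffle=False, make_circles returns the outer-circle samples first and the inner-circle samples last, so that labels[0] and labels[-1] sit on different circles. The radii it prints assume the default factor=0.8 and noise-free data; the names radii and n_labeled are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_circles

n_samples = 200
X, y = make_circles(n_samples=n_samples, shuffle=False)

# Same partial labelling as in the example: -1 marks an unknown label.
outer, inner = 0, 1
labels = np.full(n_samples, -1.0)
labels[0] = outer
labels[-1] = inner

# Distance of every sample from the origin
# (assumes the default factor=0.8 and no added noise).
radii = np.linalg.norm(X, axis=1)
print("radius of first sample:", radii[0])
print("radius of last sample:", radii[-1])

# True classes of the two labelled samples, for comparison with `labels`.
print("true class of first / last sample:", y[0], y[-1])

# Count the samples that carry a known label.
n_labeled = int((labels != -1).sum())
print("labelled samples:", n_labeled, "of", n_samples)
```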

Plot raw data

import matplotlib.pyplot as plt

plt.figure(figsize=(4, 4))
plt.scatter(
    X[labels == outer, 0],
    X[labels == outer, 1],
    color="navy",
    marker="s",
    lw=0,
    label="outer labeled",
    s=10,
)
plt.scatter(
    X[labels == inner, 0],
    X[labels == inner, 1],
    color="c",
    marker="s",
    lw=0,
    label="inner labeled",
    s=10,
)
plt.scatter(
    X[labels == -1, 0],
    X[labels == -1, 1],
    color="darkorange",
    marker=".",
    label="unlabeled",
)
plt.legend(scatterpoints=1, shadow=False, loc="center")
_ = plt.title("Raw data (2 classes=outer and inner)")

The aim of LabelSpreading is to assign a label to each sample whose label is initially unknown.

from sklearn.semi_supervised import LabelSpreading

label_spread = LabelSpreading(kernel="knn", alpha=0.8)
label_spread.fit(X, labels)

LabelSpreading(alpha=0.8, kernel='knn')

Now we can check which label was assigned to each sample whose label was initially unknown.

output_labels = label_spread.transduction_
output_label_array = np.asarray(output_labels)
outer_numbers = (output_label_array == outer).nonzero()[0]
inner_numbers = (output_label_array == inner).nonzero()[0]

plt.figure(figsize=(4, 4))
plt.scatter(
    X[outer_numbers, 0],
    X[outer_numbers, 1],
    color="navy",
    marker="s",
    lw=0,
    s=10,
    label="outer learned",
)
plt.scatter(
    X[inner_numbers, 0],
    X[inner_numbers, 1],
    color="c",
    marker="s",
    lw=0,
    s=10,
    label="inner learned",
)
plt.legend(scatterpoints=1, shadow=False, loc="center")
plt.title("Labels learned with Label Spreading (KNN)")
plt.show()

Figure: Labels learned with Label Spreading (KNN)
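
Because make_circles also returns the true class of every sample, the visual check can be complemented with a numeric one. The sketch below refits the same model and compares transduction_ with the ground truth y. The names accuracy and proba are illustrative, and no particular score is claimed here.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.semi_supervised import LabelSpreading

n_samples = 200
X, y = make_circles(n_samples=n_samples, shuffle=False)
labels = np.full(n_samples, -1.0)
labels[0] = 0   # one labelled sample on the outer circle
labels[-1] = 1  # one labelled sample on the inner circle

label_spread = LabelSpreading(kernel="knn", alpha=0.8)
label_spread.fit(X, labels)

# Fraction of samples whose inferred label equals the true class.
accuracy = float(np.mean(label_spread.transduction_ == y))
print(f"transductive accuracy: {accuracy:.3f}")

# Per-class label distributions, one row per sample and one column per class.
proba = label_spread.label_distributions_
print("label_distributions_ shape:", proba.shape)
```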

Total running time of the script: (0 minutes 0.129 seconds)

Parameters of the fitted LabelSpreading estimator (the value used in this example is given in parentheses):

kernel: {'knn', 'rbf'} or callable, default='rbf' (here: 'knn'). String identifier for the kernel function to use, or the kernel function itself. Only 'rbf' and 'knn' strings are valid inputs. The function passed should take two inputs, each of shape (n_samples, n_features), and return a (n_samples, n_samples) shaped weight matrix.
alpha: float, default=0.2 (here: 0.8). Clamping factor. A value in (0, 1) that specifies the relative amount that an instance should adopt the information from its neighbors as opposed to its initial label. alpha=0 means keeping the initial label information; alpha=1 means replacing all initial information.
gamma: float, default=20 (here: 20). Parameter for the rbf kernel.
n_neighbors: int, default=7 (here: 7). Parameter for the knn kernel, which is a strictly positive integer.
max_iter: int, default=30 (here: 30). Maximum number of iterations allowed.
tol: float, default=1e-3 (here: 0.001). Convergence tolerance: threshold to consider the system at steady state.
n_jobs: int, default=None (here: None). The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See the scikit-learn Glossary entry for n_jobs for more details.
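
The kernel entry above also allows a callable. As a minimal sketch of that option, the hypothetical function rbf_weights below wraps sklearn.metrics.pairwise.rbf_kernel with gamma fixed at 20 (chosen to mirror the gamma default listed above, not taken from the example). It accepts two arrays of shape (n_samples, n_features) and returns their pairwise weight matrix.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.semi_supervised import LabelSpreading


def rbf_weights(A, B):
    """Hypothetical custom kernel: RBF similarities between rows of A and B."""
    return rbf_kernel(A, B, gamma=20)


n_samples = 200
X, y = make_circles(n_samples=n_samples, shuffle=False)
labels = np.full(n_samples, -1.0)
labels[0] = 0   # one labelled sample on the outer circle
labels[-1] = 1  # one labelled sample on the inner circle

# Pass the callable instead of the 'knn' or 'rbf' string.
model = LabelSpreading(kernel=rbf_weights, alpha=0.8)
model.fit(X, labels)

weights = rbf_weights(X, X)
print("weight matrix shape:", weights.shape)
print("inferred labels shape:", model.transduction_.shape)
```

This sketch only shows the calling convention for a custom kernel; it makes no claim about how well this kernel separates the two circles compared with the knn kernel used in the example.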
