7. Callbacks

Note

The callback API is experimental and may change without the usual deprecation cycle. It is not yet implemented for all estimators; please refer to the list of callback-compatible estimators for more information.

This guide demonstrates how to use scikit-learn’s callbacks on compatible estimators. For information about how to implement the callback API, you can refer to the following sections of the developer’s guide:

the Implementing callback support in estimators section for making estimators compatible with callbacks.
the Developing callbacks section for how to implement a new callback.

In scikit-learn, callbacks are objects from the callback module that can be registered on an estimator to insert custom logic like monitoring progress or metrics, without modifying the underlying learning algorithm. The registered callbacks are invoked at specific steps of the fitting process.

7.1. Registering callbacks

Estimators that support callbacks expose a set_callbacks method to register callbacks on them. The following example shows how to register a ProgressBar callback on a LogisticRegression:

>>> from sklearn.callback import ProgressBar
>>> from sklearn.linear_model import LogisticRegression
>>> progress_bar = ProgressBar()
>>> logreg = LogisticRegression(max_iter=200)
>>> logreg.set_callbacks(progress_bar)
LogisticRegression(max_iter=200)

Now that the progress bar is registered on the estimator, calling its fit method will display a progress bar:

>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> logreg.fit(X, y)
LogisticRegression - fit ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
LogisticRegression(max_iter=200)

Multiple callbacks can be registered on the same estimator. For example, a ScoringMonitor callback can be registered in addition to the ProgressBar:

>>> from sklearn.callback import ScoringMonitor
>>> scoring_monitor = ScoringMonitor(scoring="accuracy")
>>> logreg.set_callbacks(progress_bar, scoring_monitor)
LogisticRegression(max_iter=200)
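
Because callback support is not yet available for every estimator, code that handles arbitrary estimators may want to check for set_callbacks before registering anything. The helper below is a minimal sketch: register_if_supported is a hypothetical name, not part of scikit-learn, and it relies only on the statement above that compatible estimators expose a set_callbacks method. The two stand-in classes exist only to keep the example self-contained.

```python
def register_if_supported(estimator, *callbacks):
    """Register callbacks if the estimator exposes ``set_callbacks``.

    Hypothetical helper, not part of scikit-learn. Returns True when the
    callbacks were registered and False when the estimator was left untouched.
    """
    set_callbacks = getattr(estimator, "set_callbacks", None)
    if not callable(set_callbacks):
        return False
    set_callbacks(*callbacks)
    return True


class WithCallbacks:
    """Stand-in for a callback-compatible estimator (illustration only)."""

    def set_callbacks(self, *callbacks):
        self.registered = list(callbacks)
        return self


class WithoutCallbacks:
    """Stand-in for an estimator that does not support callbacks yet."""


supported = WithCallbacks()
unsupported = WithoutCallbacks()
print(register_if_supported(supported, "cb_a", "cb_b"))  # True
print(register_if_supported(unsupported, "cb_a"))        # False
```

With real objects, the same call would take an estimator and callback instances such as the progress_bar and scoring_monitor created above.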

7.2. Callback invocation

During fit, the callbacks are invoked at the start and end of each task, where tasks are arbitrary units of work defined by the estimator. Usually, tasks correspond to iterations of the estimator’s learning algorithm, but they can also correspond to more abstract operations like fitting an estimator, steps of a pipeline, cross-validation folds, etc. Within fit, tasks are divided into subtasks, which can themselves be divided and so on, giving them a natural tree structure where fitting the estimator is the root task.
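
The invocation order described above can be pictured with a toy model. The sketch below is not scikit-learn's implementation: run_task, the Recorder class and its on_task_begin and on_task_end hooks are hypothetical names chosen for illustration. It walks a nested task description and calls each hook at the start and end of every task, so the events of subtasks nest inside those of their parent.

```python
class Recorder:
    """Toy callback that records every event it receives (hypothetical API)."""

    def __init__(self):
        self.events = []

    def on_task_begin(self, name, task_id):
        self.events.append(("begin", name, task_id))

    def on_task_end(self, name, task_id):
        self.events.append(("end", name, task_id))


def run_task(name, task_id, subtasks, callbacks):
    """Run one task: notify callbacks, run the subtasks, notify again.

    ``subtasks`` is a list of ``(name, subtasks)`` pairs. Sibling tasks get
    consecutive integer ids starting from 0.
    """
    for callback in callbacks:
        callback.on_task_begin(name, task_id)
    for child_id, (child_name, grandchildren) in enumerate(subtasks):
        run_task(child_name, child_id, grandchildren, callbacks)
    for callback in callbacks:
        callback.on_task_end(name, task_id)


recorder = Recorder()
# Root task "fit" split into two iterations, as an iterative solver might do.
run_task("fit", 0, [("iter", []), ("iter", [])], [recorder])
for event in recorder.events:
    print(event)
```

In this model the root task's "begin" event comes first and its "end" event comes last, with each iteration's pair of events in between.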

This tree structure is usually reflected in the objects that a callback generates: tasks are identified by their name, their id, and a reference to their parent task. For tasks that have a natural ordering, like the iterations of a learning algorithm, the ids are consecutive integers starting from 0. Some callbacks may provide additional contextual information about the tasks. Here’s an example of the logs of the ScoringMonitor:

>>> logreg.fit(X, y)
LogisticRegression - fit ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
LogisticRegression(max_iter=200)
>>> scoring_monitor.get_logs().data_as_pandas[["task_name", "task_id", "accuracy"]]
      task_name  task_id  accuracy
0           fit        0  0.973...
1    lbfgs-iter        0  0.333...
2    lbfgs-iter        1  0.333...
3    lbfgs-iter        2  0.666...
4    lbfgs-iter        3  0.926...
...
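
To make the identification scheme concrete, here is a small self-contained sketch of a task tree in which each task carries a name, an id and a reference to its parent. The Task class, its add_subtask method and its id_path property are hypothetical and only mirror the description above; they are not scikit-learn objects.

```python
class Task:
    """Toy task node identified by its name, id and parent (hypothetical)."""

    def __init__(self, name, task_id, parent=None):
        self.name = name
        self.task_id = task_id
        self.parent = parent
        self.children = []

    def add_subtask(self, name):
        # Siblings get consecutive integer ids starting from 0.
        child = Task(name, len(self.children), parent=self)
        self.children.append(child)
        return child

    @property
    def id_path(self):
        """Tuple of ids from the root task down to this task."""
        if self.parent is None:
            return (self.task_id,)
        return self.parent.id_path + (self.task_id,)


root = Task("fit", 0)
iterations = [root.add_subtask("lbfgs-iter") for _ in range(3)]
for task in [root, *iterations]:
    parent_name = task.parent.name if task.parent is not None else None
    print(task.name, task.task_id, parent_name, task.id_path)
```

A task id on its own is only unique among siblings, which is why this toy model also keeps the parent reference: the pair of parent and id is what pins down a task in the tree.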

7.3. Usage with meta-estimators

When callbacks are used in compositions of estimators and meta-estimators, we distinguish two types of callbacks: regular and auto-propagated. They serve different purposes and are meant to be registered on different estimators or meta-estimators in the composition.

7.3.1. Regular callbacks

Regular callbacks are meant to be invoked within the fit of a given estimator. Their goal is usually to track the learning process of that estimator. ScoringMonitor, for example, records the scores at each iteration of a model. A regular callback can be registered on an estimator at any level of a composition. If a regular callback is registered on an estimator that is cloned by a meta-estimator, possibly multiple times, that callback will be invoked in the fit of each clone.

For example, when tuning the hyperparameters of a LogisticRegression using a GridSearchCV, a ScoringMonitor can be registered on the LogisticRegression to monitor the scores of the logistic regression model for each parameter combination and each fold of the grid search:

>>> from sklearn.model_selection import GridSearchCV
>>> scoring_monitor = ScoringMonitor(scoring="accuracy")
>>> logreg = LogisticRegression(max_iter=200).set_callbacks(scoring_monitor)
>>> grid_search = GridSearchCV(logreg, {"C": [10, 1, 0.1]})
>>> grid_search.fit(X, y)
GridSearchCV(estimator=LogisticRegression(max_iter=200),
             param_grid={'C': [10, 1, 0.1]})
>>> log = scoring_monitor.get_logs().data_as_pandas
>>> # show the scores at the end of each fit of the search
>>> log[log["parent_task_id_path"] == (0, 0)][["task_name", "task_id", "accuracy"]]
   task_name  task_id  accuracy
1        fit        0  0.975...
2        fit        1  0.975...
3        fit        2  0.991...
4        fit        3  0.991...
5        fit        4  0.975...
6        fit        5  0.966...
7        fit        6  0.966...
8        fit        7  0.983...
9        fit        8  0.983...
10       fit        9  0.975...
11       fit       10  0.958...
12       fit       11  0.958...
13       fit       12  0.958...
14       fit       13  0.958...
15       fit       14  0.941...
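
The cloning behaviour described in this section can be illustrated with a toy model. Everything below is hypothetical (ToyEstimator, ToySearch, FitCounter and the on_fit hook are not scikit-learn names). The sketch only assumes, as stated above, that a callback registered on an estimator stays attached to the clones a meta-estimator makes of it.

```python
import copy


class FitCounter:
    """Toy regular callback recording the parameter of every fit it sees."""

    def __init__(self):
        self.seen = []

    def on_fit(self, estimator):
        self.seen.append(estimator.C)


class ToyEstimator:
    def __init__(self, C=1.0):
        self.C = C
        self._callbacks = []

    def set_callbacks(self, *callbacks):
        self._callbacks = list(callbacks)
        return self

    def clone(self, **params):
        # Copy the estimator but keep references to the same callback objects.
        new = copy.copy(self)
        new._callbacks = list(self._callbacks)
        for key, value in params.items():
            setattr(new, key, value)
        return new

    def fit(self):
        for callback in self._callbacks:
            callback.on_fit(self)
        return self


class ToySearch:
    """Toy meta-estimator fitting one clone per candidate and per split."""

    def __init__(self, estimator, candidates, n_splits):
        self.estimator = estimator
        self.candidates = candidates
        self.n_splits = n_splits

    def fit(self):
        for C in self.candidates:
            for _ in range(self.n_splits):
                self.estimator.clone(C=C).fit()
        return self


counter = FitCounter()
base = ToyEstimator().set_callbacks(counter)
ToySearch(base, candidates=[10, 1, 0.1], n_splits=5).fit()
print(len(counter.seen))  # 15 fits: 3 candidates x 5 splits
```

The single callback object sees all 15 fits, which matches the number of fit rows in the grid search log above (3 candidates, 5 folds).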

7.3.2. Auto-propagated callbacks

Auto-propagated callbacks are meant to be invoked within the fit of all (meta-)estimators in an estimator composition. Their goal is usually to report more general information about the status at each step of the composition. ProgressBar, for instance, displays nested progress bars for the meta-estimators, their sub-estimators, and so on. When registered on a meta-estimator, an auto-propagated callback will automatically be registered on all its sub-estimators that support callbacks.

Let’s add progress bars to the example of the previous section, tuning the hyperparameters of a LogisticRegression. The ProgressBar callback needs to be registered on the grid search:

>>> grid_search.set_callbacks(ProgressBar())
GridSearchCV(estimator=LogisticRegression(max_iter=200),
             param_grid={'C': [10, 1, 0.1]})
>>> grid_search.fit(X, y)
GridSearchCV - fit                                                           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
  GridSearchCV - search #0                                                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #0  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #3  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #6  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #9  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #12 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #1  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #4  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #7  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #10 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #13 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #2  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #5  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #8  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #11 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #14 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
  GridSearchCV - refit-with-best-params | LogisticRegression - fit #1        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV(estimator=LogisticRegression(max_iter=200),
             param_grid={'C': [10, 1, 0.1]})
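
The propagation rule can be sketched with a toy model. The classes below and the auto_propagate flag are hypothetical, and the choice to propagate at the start of fit is an assumption: this guide states that propagation happens, not when or how. The sketch shows an auto-propagated callback being handed down to a sub-estimator while a regular callback stays where it was registered.

```python
class ToyCallback:
    def __init__(self, name, auto_propagate):
        self.name = name
        self.auto_propagate = auto_propagate  # hypothetical flag


class ToyLeaf:
    """Toy estimator that supports callbacks."""

    def __init__(self):
        self.callbacks = []

    def set_callbacks(self, *callbacks):
        self.callbacks = list(callbacks)
        return self


class ToyMeta(ToyLeaf):
    """Toy meta-estimator wrapping a single sub-estimator."""

    def __init__(self, sub_estimator):
        super().__init__()
        self.sub_estimator = sub_estimator

    def fit(self):
        # Assumption: propagation happens at the start of fit.
        for callback in self.callbacks:
            already_there = callback in self.sub_estimator.callbacks
            if callback.auto_propagate and not already_there:
                self.sub_estimator.callbacks.append(callback)
        return self


progress = ToyCallback("progress", auto_propagate=True)
monitor = ToyCallback("monitor", auto_propagate=False)

leaf = ToyLeaf().set_callbacks(monitor)       # regular callback on the leaf
meta = ToyMeta(leaf).set_callbacks(progress)  # auto-propagated on the meta
meta.fit()

print([cb.name for cb in leaf.callbacks])  # ['monitor', 'progress']
print([cb.name for cb in meta.callbacks])  # ['progress']
```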

7.4. Scikit-learn’s built-in callbacks

The built-in callbacks currently available in scikit-learn are the following:

Callback	Description
`ProgressBar`	Display progress bars.
`ScoringMonitor`	Log a scoring metric at the end of each task during fit.

7.5. Callback Support Status

The development of support for callbacks in estimators is in progress. Here is a list of the estimators that support callbacks: