7. Callbacks
Note
The callback API is experimental and may change without the usual deprecation cycle. It is not yet implemented for all estimators; please refer to the list of callback-compatible estimators for more information.
This guide demonstrates how to use scikit-learn’s callbacks on compatible estimators. For information about how to implement the callback API, you can refer to the following sections of the developer’s guide:
the Implementing callback support in estimators section for making estimators compatible with callbacks.
the Developing callbacks section for how to implement a new callback.
In scikit-learn, callbacks are objects from the sklearn.callback module that can
be registered on an estimator to insert custom logic like monitoring progress or
metrics, without modifying the underlying learning algorithm. The registered callbacks
are invoked at specific steps of the fitting process.
7.1. Registering callbacks
Estimators that support callbacks expose a set_callbacks
method to register callbacks on them. The following example shows how to register a
ProgressBar callback on a LogisticRegression:
>>> from sklearn.callback import ProgressBar
>>> from sklearn.linear_model import LogisticRegression
>>> progress_bar = ProgressBar()
>>> logreg = LogisticRegression(max_iter=200)
>>> logreg.set_callbacks(progress_bar)
LogisticRegression(max_iter=200)
Now that the progress bar is registered on the estimator, calling its fit method will
display a progress bar:
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> logreg.fit(X, y)
LogisticRegression - fit ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
LogisticRegression(max_iter=200)
Multiple callbacks can be registered on the same estimator. For example, a
ScoringMonitor callback can be registered in addition to the
ProgressBar:
>>> from sklearn.callback import ScoringMonitor
>>> scoring_monitor = ScoringMonitor(scoring="accuracy")
>>> logreg.set_callbacks(progress_bar, scoring_monitor)
LogisticRegression(max_iter=200)
7.2. Callback invocation
During fit, the callbacks are invoked at the start and end of each task, where tasks
are arbitrary units of work defined by the estimator. Usually, tasks correspond to
iterations of the estimator’s learning algorithm, but they can also correspond to more
abstract operations like fitting an estimator, steps of a pipeline, cross-validation
folds, etc. Within fit, tasks are divided into subtasks, which can themselves be
divided and so on, giving them a natural tree structure where
fitting the estimator is the root task.
This tree structure is usually reflected in the objects a callback produces: tasks
are identified by their name, their id, and a reference to their parent task.
For tasks that have a natural ordering, like the iterations of a learning algorithm, the
ids are consecutive integers starting from 0. Some callbacks may provide additional
contextual information about the tasks. Here’s an example of the logs of the
ScoringMonitor:
>>> logreg.fit(X, y)
LogisticRegression - fit ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
LogisticRegression(max_iter=200)
>>> scoring_monitor.get_logs().data_as_pandas[["task_name", "task_id", "accuracy"]]
task_name task_id accuracy
0 fit 0 0.973...
1 lbfgs-iter 0 0.333...
2 lbfgs-iter 1 0.333...
3 lbfgs-iter 2 0.666...
4 lbfgs-iter 3 0.926...
...
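The task tree described in this section can be pictured with a small, self-contained sketch. It does not use scikit-learn's callback API: `Task`, `RecordingCallback`, `on_task_begin`, `on_task_end`, and `toy_fit` are hypothetical names introduced only for illustration. The sketch assumes one root task for the fit and one subtask per iteration, and shows a callback being invoked at the start and end of each task.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Task:
    """Toy stand-in for a task node; not scikit-learn's internal class."""

    name: str
    task_id: int
    parent: Optional["Task"] = field(default=None, repr=False, compare=False)
    children: List["Task"] = field(default_factory=list)

    def subtask(self, name: str, task_id: int) -> "Task":
        """Create a child task and attach it to this task."""
        child = Task(name, task_id, parent=self)
        self.children.append(child)
        return child

    def id_path(self) -> Tuple[int, ...]:
        """Ids from the root task down to this task."""
        node, path = self, []
        while node is not None:
            path.append(node.task_id)
            node = node.parent
        return tuple(reversed(path))


class RecordingCallback:
    """Hypothetical callback that records when tasks start and end."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str, Tuple[int, ...]]] = []

    def on_task_begin(self, task: Task) -> None:
        self.events.append(("begin", task.name, task.id_path()))

    def on_task_end(self, task: Task) -> None:
        self.events.append(("end", task.name, task.id_path()))


def toy_fit(callback: RecordingCallback, n_iter: int = 3) -> Task:
    """Mimic a fit: one root task with one subtask per iteration."""
    root = Task("fit", 0)
    callback.on_task_begin(root)
    for i in range(n_iter):
        iteration = root.subtask("iter", i)  # ids are consecutive, from 0
        callback.on_task_begin(iteration)
        # ... one step of the learning algorithm would run here ...
        callback.on_task_end(iteration)
    callback.on_task_end(root)
    return root


recorder = RecordingCallback()
root = toy_fit(recorder)
for event in recorder.events:
    print(event)
```

The first and last recorded events belong to the root `fit` task, and every iteration event carries an id path that starts at the root, which is the kind of parent reference the logs expose.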
7.3. Usage with meta-estimators
When callbacks are used in compositions of estimators and meta-estimators, we distinguish two types of callbacks: regular and auto-propagated. They serve different purposes and are meant to be registered on different estimators or meta-estimators in the composition.
7.3.1. Regular callbacks
Regular callbacks are meant to be invoked within the fit of a given estimator. Their
goal is usually to track the learning process of that estimator.
ScoringMonitor, for example, records the scores at each iteration of a model.
A regular callback can be registered on an estimator at any level of a composition. If a
regular callback is registered on an estimator that is cloned by a
meta-estimator, possibly multiple times, that callback will be invoked in each one of
the fit executions of the clones.
For example, when tuning the hyperparameters of a
LogisticRegression using a
GridSearchCV, a ScoringMonitor can be
registered on the LogisticRegression to monitor the
scores of the logistic regression model for each parameter combination and each fold of
the grid search:
>>> from sklearn.model_selection import GridSearchCV
>>> scoring_monitor = ScoringMonitor(scoring="accuracy")
>>> logreg = LogisticRegression(max_iter=200).set_callbacks(scoring_monitor)
>>> grid_search = GridSearchCV(logreg, {"C": [10, 1, 0.1]})
>>> grid_search.fit(X, y)
GridSearchCV(estimator=LogisticRegression(max_iter=200),
param_grid={'C': [10, 1, 0.1]})
>>> log = scoring_monitor.get_logs().data_as_pandas
>>> # show the scores at the end of each fit of the search
>>> log[log["parent_task_id_path"] == (0,0)][["task_name", "task_id", "accuracy"]]
task_name task_id accuracy
1 fit 0 0.975...
2 fit 1 0.975...
3 fit 2 0.991...
4 fit 3 0.991...
5 fit 4 0.975...
6 fit 5 0.966...
7 fit 6 0.966...
8 fit 7 0.983...
9 fit 8 0.983...
10 fit 9 0.975...
11 fit 10 0.958...
12 fit 11 0.958...
13 fit 12 0.958...
14 fit 13 0.958...
15 fit 14 0.941...
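The 15 `fit` rows above (task ids 0 to 14) match the number of fits performed during the search: one per parameter candidate and cross-validation fold. The following plain-Python sketch reproduces that count, assuming the default 5-fold cross-validation that GridSearchCV uses when `cv` is not specified. The variable names are illustrative only, and the sketch makes no claim about which task id corresponds to which candidate or fold.

```python
from itertools import product

param_grid = {"C": [10, 1, 0.1]}
n_splits = 5  # assumption: GridSearchCV's default when cv is not specified

# Every combination of parameter values is one candidate.
names = sorted(param_grid)
candidates = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

# During the search, each candidate is fitted once per fold.
n_search_fits = len(candidates) * n_splits
print(n_search_fits)  # 15
```

The final refit with the best parameters is a separate task and is not part of this count.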
7.3.2. Auto-propagated callbacks
Auto-propagated callbacks are meant to be invoked within the fit of all
(meta-)estimators in an estimator composition. Their goal is usually to report more
general information about the status at each step of the composition.
ProgressBar, for instance, displays nested progress bars for the
meta-estimators, their sub-estimators, and so on. When registered on a meta-estimator,
an auto-propagated callback will automatically be registered on all its sub-estimators
that support callbacks.
Registration restrictions
Auto-propagated callbacks are designed to be registered on the top-level meta-estimator of an estimator composition. If some of its sub-estimators already have auto-propagated callbacks registered on them, an error will be raised.
If the top-level meta-estimator doesn’t itself support callbacks, then its sub-estimators are allowed to have auto-propagated callbacks registered on them. However, be aware that this is not optimal and that callbacks might not be able to deliver their full capabilities.
Let’s add progress bars to the example of the previous section, tuning the
hyperparameters of a LogisticRegression. The
ProgressBar callback needs to be registered on the grid search:
>>> grid_search.set_callbacks(ProgressBar())
GridSearchCV(estimator=LogisticRegression(max_iter=200),
param_grid={'C': [10, 1, 0.1]})
>>> grid_search.fit(X, y)
GridSearchCV - fit ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - search #0 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #0 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #3 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #6 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #9 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #12 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #4 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #7 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #10 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #13 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #5 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #8 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #11 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - candidate-split-evaluation | LogisticRegression - fit #14 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV - refit-with-best-params | LogisticRegression - fit #1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
GridSearchCV(estimator=LogisticRegression(max_iter=200),
param_grid={'C': [10, 1, 0.1]})
Control the propagation depth
Auto-propagated callbacks can be distinguished from regular callbacks by the fact
that they have a max_propagation_depth attribute. This attribute indicates the
maximum depth of nested estimators at which the callback will be propagated. In the
previous example, for instance, the grid search is at depth 0 and the logistic
regression is at depth 1.
If max_propagation_depth is set to 0, the callback will not be propagated to any
sub-estimators and will only be invoked for the (meta-)estimator it is registered
on. If it is set to None, it will be propagated to all nested levels of the
estimator composition.
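The depth rule can be illustrated with a toy model of a composition. This is a sketch under stated assumptions, not scikit-learn's implementation: `ToyEstimator`, `ToyAutoPropagatedCallback`, `propagate`, and `build_composition` are hypothetical names, and only the `max_propagation_depth` attribute comes from the text above. The behavior for an intermediate value such as 1 is inferred from the description of the values 0 and None.

```python
from typing import List, Optional


class ToyEstimator:
    """Hypothetical estimator holding callbacks and nested sub-estimators."""

    def __init__(
        self, name: str, sub_estimators: Optional[List["ToyEstimator"]] = None
    ) -> None:
        self.name = name
        self.sub_estimators = sub_estimators or []
        self.callbacks: list = []


class ToyAutoPropagatedCallback:
    """Hypothetical callback carrying a max_propagation_depth attribute."""

    def __init__(self, max_propagation_depth: Optional[int] = None) -> None:
        self.max_propagation_depth = max_propagation_depth


def propagate(
    callback: ToyAutoPropagatedCallback, estimator: ToyEstimator, depth: int = 0
) -> List[str]:
    """Register the callback on the estimator and, depth permitting, below it.

    Returns the names of the estimators the callback reached.
    """
    estimator.callbacks.append(callback)
    reached = [estimator.name]
    limit = callback.max_propagation_depth
    # None means no limit; otherwise stop once the limit depth is reached.
    if limit is None or depth < limit:
        for sub in estimator.sub_estimators:
            reached.extend(propagate(callback, sub, depth + 1))
    return reached


def build_composition() -> ToyEstimator:
    """A three-level composition: search (0) > pipeline (1) > model (2)."""
    return ToyEstimator("search", [ToyEstimator("pipeline", [ToyEstimator("model")])])


print(propagate(ToyAutoPropagatedCallback(max_propagation_depth=0), build_composition()))
# ['search']
print(propagate(ToyAutoPropagatedCallback(max_propagation_depth=1), build_composition()))
# ['search', 'pipeline']
print(propagate(ToyAutoPropagatedCallback(max_propagation_depth=None), build_composition()))
# ['search', 'pipeline', 'model']
```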
7.4. Scikit-learn’s built-in callbacks
The built-in callbacks currently available in scikit-learn are the following:
| Callback | Description |
|---|---|
| ProgressBar | Display progress bars. |
| ScoringMonitor | Log a scoring metric at the end of each task during fit. |
7.5. Callback support status
The development of support for callbacks in estimators is in progress. Here is a list of the estimators that support callbacks: