- sklearn.utils.parallel_backend(backend, n_jobs=-1, inner_max_num_threads=None, **backend_params)
Change the default backend used by Parallel inside a with block.
It is advised to use the parallel_config context manager instead, which allows more fine-grained control over the backend configuration.
If backend is a string, it must match a previously registered implementation using the register_parallel_backend function.
By default the following backends are available:
‘loky’: single-host, process-based parallelism (used by default),
‘threading’: single-host, thread-based parallelism,
‘multiprocessing’: legacy single-host, process-based parallelism.
‘loky’ is recommended to run functions that manipulate Python objects. ‘threading’ is a low-overhead alternative that is most efficient for functions that release the Global Interpreter Lock: e.g. I/O-bound code or CPU-bound code in a few calls to native code that explicitly releases the GIL. Note that on some rare systems (such as Pyodide), multiprocessing and loky may not be available, in which case joblib defaults to threading.
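As a minimal sketch of the ‘threading’ case described above, the snippet below runs a few blocking calls in threads. It assumes joblib is installed; fake_io is a hypothetical helper that uses time.sleep as a stand-in for real I/O, and the choice of 4 workers is arbitrary.

```python
import time

from joblib import Parallel, delayed, parallel_backend


def fake_io(i):
    # Hypothetical helper: time.sleep releases the GIL, so it stands in
    # for a blocking I/O call such as a network request.
    time.sleep(0.05)
    return i * 2


# Inside the with block, Parallel() uses the 'threading' backend with 4 workers.
with parallel_backend("threading", n_jobs=4):
    results = Parallel()(delayed(fake_io)(i) for i in range(8))

print(results)
```

Results come back in the order the tasks were submitted, regardless of which thread finishes first.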
You can also use the Dask joblib backend to distribute work across machines. This works well with scikit-learn estimators with the n_jobs parameter, for example:
>>> import joblib
>>> from sklearn.model_selection import GridSearchCV
>>> from dask.distributed import Client, LocalCluster

>>> # create a local Dask cluster
>>> cluster = LocalCluster()
>>> client = Client(cluster)
>>> grid_search = GridSearchCV(estimator, param_grid, n_jobs=-1)
>>> with joblib.parallel_backend("dask", scatter=[X, y]):
...     grid_search.fit(X, y)
It is also possible to use the distributed ‘ray’ backend for distributing the workload to a cluster of nodes. To use the ‘ray’ joblib backend add the following lines:
>>> from ray.util.joblib import register_ray
>>> register_ray()
>>> with parallel_backend("ray"):
...     print(Parallel()(delayed(neg)(i + 1) for i in range(5)))
[-1, -2, -3, -4, -5]
Alternatively the backend can be passed directly as an instance.
By default all available workers will be used (n_jobs=-1) unless the caller passes an explicit value for the n_jobs parameter.
This is an alternative to passing a backend='backend_name' argument to the Parallel class constructor. It is particularly useful when calling into library code that uses joblib internally but does not expose the backend argument in its own API.
>>> from operator import neg
>>> from joblib import Parallel, delayed, parallel_backend
>>> with parallel_backend('threading'):
...     print(Parallel()(delayed(neg)(i + 1) for i in range(5)))
...
[-1, -2, -3, -4, -5]
Joblib also tries to limit oversubscription by capping the number of threads usable in some third-party library thread pools such as OpenBLAS, MKL or OpenMP. The default limit in each worker is set to max(cpu_count() // effective_n_jobs, 1), but this limit can be overridden with the inner_max_num_threads argument, which is used to set the limit in the child processes.
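The default limit quoted above is plain integer arithmetic, and a small sketch may make it concrete. The function name default_inner_thread_limit is hypothetical and not part of joblib, and os.cpu_count() is used here only as a stand-in for the CPU count joblib detects, which may differ on systems with restricted CPU affinity.

```python
import os


def default_inner_thread_limit(effective_n_jobs, cpu_count=None):
    """Hypothetical helper mirroring max(cpu_count() // effective_n_jobs, 1)."""
    if cpu_count is None:
        # Assumption: os.cpu_count() approximates the count joblib would use.
        cpu_count = os.cpu_count() or 1
    # Integer division splits the CPUs among workers; never go below 1 thread.
    return max(cpu_count // effective_n_jobs, 1)


# 16 CPUs shared by 4 workers: each worker may use 4 threads.
four_workers = default_inner_thread_limit(4, cpu_count=16)
# More workers than CPUs: the limit bottoms out at 1 thread per worker.
many_workers = default_inner_thread_limit(32, cpu_count=8)
# Uneven split: 8 // 3 rounds down to 2.
uneven_split = default_inner_thread_limit(3, cpu_count=8)

print(four_workers, many_workers, uneven_split)
```

Passing an explicit inner_max_num_threads value to the context manager replaces this computed default in the child processes.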
New in version 0.10.
See also: parallel_config, a context manager to change the backend configuration.