Cython Best Practices, Conventions and Knowledge
This document gives tips for developing Cython code in scikit-learn.
Tips for developing with Cython in scikit-learn
Tips to ease development
Time spent reading Cython’s documentation is not time lost.
If you intend to use OpenMP: on macOS, the system’s distribution of clang does not implement OpenMP. You can install the compilers package available on conda-forge, which comes with an implementation of OpenMP.
Activating checks might help. For example, to activate boundscheck, use:
export SKLEARN_ENABLE_DEBUG_CYTHON_DIRECTIVES=1
Start from scratch in a notebook to understand how to use Cython and to get feedback on your work quickly. If you plan to use OpenMP for your implementations in a Jupyter Notebook, add extra compiler and linker arguments to the Cython magic:
# For GCC and for clang
%%cython --compile-args=-fopenmp --link-args=-fopenmp
# For Microsoft's compilers
%%cython --compile-args=/openmp --link-args=/openmp
To debug C code (e.g. a segfault), use gdb with:
gdb --ex r --args python ./entrypoint_to_bug_reproducer.py
To access a value in place while debugging in a cdef (nogil) context, use:
with gil:
    print(state_to_print)
Note that Cython cannot parse f-strings with {var=} expressions, e.g.:
print(f"{test_val=}")
The scikit-learn codebase has many non-unified (fused) type (re)definitions. There is ongoing work to simplify and unify them across the codebase. For now, make sure you understand which concrete types are ultimately used.
You might find this alias handy for compiling individual Cython extensions:
# You might want to add this alias to your shell script config.
alias cythonX="cython -X language_level=3 -X boundscheck=False -X wraparound=False -X initializedcheck=False -X nonecheck=False -X cdivision=True"
# This generates `source.c` as if you had recompiled scikit-learn entirely.
cythonX --annotate source.pyx
Using the --annotate option with this alias generates an HTML report of code annotation. This report indicates interactions with the CPython interpreter on a line-by-line basis. Interactions with the CPython interpreter must be avoided as much as possible in the computationally intensive sections of the algorithms. For more information, please refer to this section of Cython’s tutorial.
# This generates an HTML report (`source.html`) for `source.c`.
cythonX --annotate source.pyx
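Regarding the f-string note above: in plain Python, `f"{test_val=}"` is shorthand for the variable name, an equals sign, and the `repr` of the value, so the same debug output can be spelled out explicitly. The sketch below is plain Python with a hypothetical `test_val`. It assumes the explicit form is what you would write in a `.pyx` file instead, which is worth confirming against your Cython version.

```python
# Plain-Python sketch; `test_val` is a hypothetical value used for illustration.
test_val = 3.5

# Self-documenting form (Python 3.8+), which the note above says Cython cannot parse.
shorthand = f"{test_val=}"

# Explicit equivalent: spell out the name and apply repr() via the !r conversion.
explicit = f"test_val={test_val!r}"

print(explicit)
```

Both strings are identical in CPython, so switching to the explicit form loses no information in the debug output.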
Tips for performance
Understand the GIL in the context of CPython (which problems it solves and what its limitations are), and get a good understanding of when Cython code will be mapped to C code free of interactions with CPython, when it will not, and when it cannot (e.g. in the presence of interactions with Python objects, which include functions). In this regard, PEP 703 provides a good overview, context, and pathways for removal.
Make sure you have deactivated checks.
Always prefer memoryviews over cnp.ndarray when possible: memoryviews are lightweight.
Avoid memoryview slicing: it might be costly or misleading in some cases, so it is better not to use it, even if handling fewer dimensions would be preferable in some contexts.
Decorate final classes or methods with @final (this allows removing virtual tables when needed).
Inline methods and functions when it makes sense.
Make sure your Cython compilation units use NumPy’s recent C API.
When in doubt, read the generated C or C++ code if you can: “The fewer C instructions and indirections for a line of Cython code, the better” is a good rule of thumb.
nogil declarations are just hints: declaring cdef functions as nogil means that they can be called without holding the GIL, but it does not release the GIL when entering them. You have to do that yourself, either by passing nogil=True to cython.parallel.prange explicitly, or by using an explicit context manager:
cdef inline int my_func(self) nogil:
    # Some logic interacting with CPython, e.g. allocating arrays via NumPy.
    with nogil:
        # The code here is run as if it were written in C.
        pass
    return 0
This item is based on this comment from Stefan Behnel.
Direct calls to BLAS routines are possible via interfaces defined in sklearn.utils._cython_blas.
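To make the GIL and nogil items above more concrete, here is a plain-Python analogy rather than Cython code. `time.sleep` releases the GIL while it waits, much as a `with nogil:` block does around C-level work, so several threads can wait at the same time. The helper names `run_in_threads` and `wait_without_gil` and the chosen durations are illustrative assumptions.

```python
import threading
import time


def run_in_threads(target, n_threads):
    """Run `target` in `n_threads` threads and return the wall-clock time in seconds."""
    threads = [threading.Thread(target=target) for _ in range(n_threads)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


def wait_without_gil():
    # time.sleep releases the GIL while waiting, so other threads can proceed.
    time.sleep(0.2)


# Four threads each wait 0.2 s; because the GIL is released during the wait,
# the total is close to 0.2 s rather than the 0.8 s a serial run would take.
elapsed = run_in_threads(wait_without_gil, n_threads=4)
print(f"4 threads finished in {elapsed:.2f} s")
```

A CPU-bound pure-Python function would not overlap this way on a standard CPython build, which is why the computationally intensive sections should avoid CPython interactions and run with the GIL released.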
Using OpenMP
Since scikit-learn can be built without OpenMP, it’s necessary to protect each direct call to OpenMP.
The _openmp_helpers module, available in sklearn/utils/_openmp_helpers.pyx, provides protected versions of the OpenMP routines. To use OpenMP routines, they must be cimported from this module and not from the OpenMP library directly:
from sklearn.utils._openmp_helpers cimport omp_get_max_threads
max_threads = omp_get_max_threads()
The parallel loop, prange, is already protected by Cython and can be used directly from cython.parallel.
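Because scikit-learn can be built without OpenMP, it can help to check from Python what the installed build reports before benchmarking parallel code. This is a minimal sketch, assuming scikit-learn's private helper `_openmp_parallelism_enabled` is importable from `sklearn.utils._openmp_helpers` in your version. It is private API and may change, and the wrapper name `openmp_status` is hypothetical.

```python
def openmp_status():
    """Return True or False as reported by scikit-learn, or None if unavailable."""
    try:
        # Private helper: an assumption about the installed scikit-learn version.
        from sklearn.utils._openmp_helpers import _openmp_parallelism_enabled
    except ImportError:
        # scikit-learn is not installed, or the helper moved in this version.
        return None
    return bool(_openmp_parallelism_enabled())


print("OpenMP enabled:", openmp_status())
```

A `False` result would suggest that `prange` loops run sequentially in that build, so timing comparisons should be read with that in mind.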
Types
Cython code requires explicit types. This is one of the reasons you get a
performance boost. To avoid code duplication, we have a central place
for the most used types in sklearn/utils/_typedefs.pxd.
Ideally, start by having a look there and cimport the types you need, for example:
from sklearn.utils._typedefs cimport float32_t, float64_t