.. _cython:

Cython Best Practices, Conventions and Knowledge
================================================

This document gives tips for developing Cython code in scikit-learn.

Tips for developing with Cython in scikit-learn
-----------------------------------------------

Tips to ease development
^^^^^^^^^^^^^^^^^^^^^^^^

* Time spent reading `Cython's documentation `_ is not time lost.

* If you intend to use OpenMP: on macOS, the system's distribution of ``clang``
  does not implement OpenMP. You can install the ``compilers`` package available
  on ``conda-forge``, which comes with an implementation of OpenMP.

* Activating `checks `_ might help. E.g. to activate ``boundscheck``, use:

  .. code-block:: bash

      export SKLEARN_ENABLE_DEBUG_CYTHON_DIRECTIVES=1

* `Start from scratch in a notebook `_ to understand how to use Cython and to get
  feedback on your work quickly. If you plan to use OpenMP for your
  implementations in your Jupyter Notebook, add extra compiler and linker
  arguments in the Cython magic.

  .. code-block:: python

      # For GCC and for clang
      %%cython --compile-args=-fopenmp --link-args=-fopenmp

      # For Microsoft's compilers
      %%cython --compile-args=/openmp --link-args=/openmp

* To debug C code (e.g. a segfault), use ``gdb`` with:

  .. code-block:: bash

      gdb --ex r --args python ./entrypoint_to_bug_reproducer.py

* To inspect a value while debugging in a ``cdef (nogil)`` context, use:

  .. code-block:: cython

      with gil:
          print(state_to_print)

* Note that Cython cannot parse f-strings with ``{var=}`` expressions, e.g.

  .. code-block:: python

      print(f"{test_val=}")

* The scikit-learn codebase has a lot of non-unified (fused) type (re)definitions.
  There currently is `ongoing work to simplify and unify that across the codebase `_.
  For now, make sure you understand which concrete types are ultimately used.

* You might find this alias for compiling individual Cython extensions handy:

  .. code-block::

      # You might want to add this alias to your shell script config.
      alias cythonX="cython -X language_level=3 -X boundscheck=False -X wraparound=False -X initializedcheck=False -X nonecheck=False -X cdivision=True"

      # This generates `source.c` as if you had recompiled scikit-learn entirely.
      cythonX --annotate source.pyx

* Using the ``--annotate`` option with this alias generates an HTML report of
  code annotation. This report indicates interactions with the CPython
  interpreter on a line-by-line basis. Interactions with the CPython interpreter
  must be avoided as much as possible in the computationally intensive sections
  of the algorithms. For more information, please refer to
  `this section of Cython's tutorial `_

  .. code-block::

      # This generates a HTML report (`source.html`) for `source.c`.
      cythonX --annotate source.pyx

Tips for performance
^^^^^^^^^^^^^^^^^^^^

* Understand the GIL in the context of CPython (which problems it solves, what
  its limitations are) and get a good understanding of when Cython code will be
  mapped to C code free of interactions with CPython, when it will not, and when
  it cannot (e.g. in the presence of interactions with Python objects, which
  include functions). In this regard, `PEP 703 `_ provides a good overview,
  context, and pathways for the GIL's removal.

* Make sure you have deactivated `checks `_.

* Always prefer memoryviews over ``cnp.ndarray`` when possible: memoryviews are
  lightweight.

* Avoid memoryview slicing: memoryview slicing might be costly or misleading in
  some cases, and it is better not to use it, even if handling fewer dimensions
  in some contexts would be preferable.

* Decorate final classes or methods with ``@final`` (this allows removing
  virtual tables when needed).

* Inline methods and functions when it makes sense.

* When in doubt, read the generated C or C++ code if you can: "The fewer C
  instructions and indirections for a line of Cython code, the better" is a good
  rule of thumb.
* ``nogil`` declarations are just hints: declaring a ``cdef`` function as
  ``nogil`` means that it can be called without holding the GIL, but it does not
  release the GIL when the function is entered. You have to do that yourself,
  either by passing ``nogil=True`` to ``cython.parallel.prange`` explicitly, or
  by using an explicit context manager:

  .. code-block:: cython

      cdef inline int my_func(self) nogil:

          # Some logic interacting with CPython, e.g. allocating arrays via NumPy.

          with nogil:
              # The code here is run as if it were written in C.
              pass

          return 0

  This item is based on `this comment from Stefan Behnel `_

* Direct calls to BLAS routines are possible via interfaces defined in
  ``sklearn.utils._cython_blas``.

Using OpenMP
^^^^^^^^^^^^

Since scikit-learn can be built without OpenMP, it's necessary to protect each
direct call to OpenMP.

The ``_openmp_helpers`` module, available in
`sklearn/utils/_openmp_helpers.pyx `_
provides protected versions of the OpenMP routines. To use OpenMP routines, they
must be ``cimported`` from this module and not from the OpenMP library directly:

.. code-block:: cython

    from sklearn.utils._openmp_helpers cimport omp_get_max_threads
    max_threads = omp_get_max_threads()

The parallel loop, ``prange``, is already protected by Cython and can be used
directly from ``cython.parallel``.

Types
~~~~~

Cython code requires explicit types. This is one of the reasons you get a
performance boost. To avoid code duplication, we have a central place for the
most used types in
`sklearn/utils/_typedefs.pxd `_.
Ideally, you start by having a look there and ``cimport`` the types you need,
for example:

.. code-block:: cython

    from sklearn.utils._typedefs cimport float32_t, float64_t
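
As a minimal sketch of how the centrally defined types, typed memoryviews and an
explicit GIL release can fit together, consider the hypothetical helper below.
The function names ``_sum_of_squares`` and ``sum_of_squares`` are invented for
illustration and are not part of scikit-learn. The sketch assumes that
``_typedefs`` exposes ``float64_t`` and ``intp_t``, and that Cython 3 is used
(for the ``noexcept`` keyword).

.. code-block:: cython

    # Assumption: these names are exposed by sklearn/utils/_typedefs.pxd.
    from sklearn.utils._typedefs cimport float64_t, intp_t


    cdef float64_t _sum_of_squares(const float64_t[::1] values) noexcept nogil:
        # Hypothetical helper. The ``nogil`` declaration only states that this
        # function may be called without the GIL; it does not release the GIL.
        cdef:
            intp_t i
            intp_t n = values.shape[0]
            float64_t total = 0.0

        for i in range(n):
            total += values[i] * values[i]
        return total


    def sum_of_squares(const float64_t[::1] values):
        # Python-visible wrapper; expects a C-contiguous float64 array.
        cdef float64_t result

        # The GIL is released explicitly here, around the call.
        with nogil:
            result = _sum_of_squares(values)
        return result

Running ``cython --annotate`` on such a file is one way to check that the loop
body shows no interaction with the CPython interpreter.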