4. Developers’ Tips for Debugging
4.1. Memory errors: debugging Cython with valgrind
While Python/NumPy’s built-in memory management is relatively robust, it can lead to performance penalties for some routines. For this reason, much of the high-performance code in scikit-learn is written in Cython. This performance gain comes with a tradeoff, however: it is very easy for memory bugs to crop up in Cython code, especially when that code relies heavily on pointer arithmetic.
Memory errors can manifest themselves in a number of ways. The easiest ones to debug are often segmentation faults and related glibc errors. Uninitialized variables, by contrast, can lead to unexpected behavior that is difficult to track down. A very useful tool when debugging these sorts of errors is valgrind.
Valgrind is a command-line tool that can trace memory errors in a variety of code. Follow these steps:
Install valgrind on your system.
Download the python valgrind suppression file: valgrind-python.supp.
Follow the directions in the README.valgrind file to customize your Python suppressions. If you don’t, you will get spurious output related to the Python interpreter instead of your own code.
Run valgrind as follows:
$> valgrind -v --suppressions=valgrind-python.supp python my_test_script.py
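If you run valgrind often, it can help to assemble the command in a small script. The following is a minimal sketch, not part of scikit-learn: the helper name `build_valgrind_command` is hypothetical, and it assumes that `valgrind-python.supp` sits in the current directory. It only uses the flags shown in the command above.

```python
import shlex
import sys


def build_valgrind_command(script, suppressions="valgrind-python.supp",
                           python=sys.executable):
    """Return the argument list for running `script` under valgrind.

    Hypothetical helper: it mirrors the command shown above and adds
    no valgrind options beyond -v and --suppressions.
    """
    return [
        "valgrind",
        "-v",
        f"--suppressions={suppressions}",
        python,
        script,
    ]


if __name__ == "__main__":
    cmd = build_valgrind_command("my_test_script.py")
    # Print the command instead of executing it, so the sketch also
    # works on machines where valgrind is not installed.
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once valgrind is installed. Defaulting to `sys.executable` keeps the interpreter under valgrind the same as the one that imported your compiled extension.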
The result will be a list of all the memory-related errors, each of which references lines in the C code that Cython generated from your .pyx file. If you examine the referenced lines in the .c file, you will see comments that indicate the corresponding location in your .pyx source file. Hopefully the output will give you clues as to the source of your memory error.
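Looking up the matching .pyx location by hand can get tedious for long reports. The sketch below is a hypothetical helper, not part of scikit-learn or Cython. It assumes the source-location comments in the generated C file start with `/* "path/to/file.pyx":LINE`, and it searches upward from the C line that valgrind reports until it finds the nearest such comment.

```python
import re

# Assumed marker format in Cython-generated C, for example:
#   /* "my_module.pyx":42
PYX_MARKER = re.compile(r'/\*\s*"(?P<path>[^"]+\.pyx)":(?P<line>\d+)')


def find_pyx_location(c_lines, c_line_number):
    """Map a 1-based line number in the generated C file to .pyx source.

    Returns (pyx_path, pyx_line) for the nearest marker comment at or
    above the given C line, or None if no marker is found.
    """
    start = min(c_line_number, len(c_lines)) - 1
    for index in range(start, -1, -1):
        match = PYX_MARKER.search(c_lines[index])
        if match:
            return match.group("path"), int(match.group("line"))
    return None


if __name__ == "__main__":
    # Hand-written stand-in for a fragment of generated C code.
    sample = [
        '  /* "my_module.pyx":10',
        ' *     data[i] = 0.0',
        ' */',
        '  (__pyx_v_data[__pyx_v_i]) = 0.0;',
    ]
    print(find_pyx_location(sample, 4))
```

In practice you would read the real file with `open("my_module.c").read().splitlines()` and pass the line number from the valgrind stack trace. The pattern only matches `.pyx` paths, so it would need widening if the error originates in an included `.pxd` or `.pxi` file.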
For more information on valgrind and the array of options it has, see the tutorials and documentation on the valgrind web site.