For quite some time, I have wanted to add code coverage to the C part of numpy. The upcoming port to python 3k will make this even more useful, and besides, Stefan Van Der Walt promised me a beer if I could do it.
There are several tools for code coverage of C code; the best known is gcov (I obviously discard non-free tools, which tend to be fairly expensive anyway). The problem with gcov is that it cannot measure coverage of dynamically loaded code such as python extensions. The solution is thus to build numpy and statically link it into python, which is not totally straightforward.
Statically linking simple extensions
I first looked into simpler extensions: the basic solution is to add the source files of the extensions into Modules/Setup.local in python sources. For example, to build the zlib module statically, you add
zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz
And run make; this will statically link the zlib module into python. One simple way to check whether the extension is indeed statically linked is to look at its __file__ attribute: in the dynamically loaded case, __file__ gives the location of the .so file, whereas the attribute does not exist in the static case.
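This check can be wrapped in a small helper (a sketch; the function name is mine):

```python
import sys

def is_statically_linked(module):
    """Return True if `module` is compiled into the interpreter.

    Built-in (statically linked) modules expose no __file__ attribute,
    while dynamically loaded extensions report the path to their .so.
    """
    return not hasattr(module, '__file__')

# sys is always compiled into the interpreter binary
print(is_statically_linked(sys))  # True
```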
To use gcov, two compilation flags are needed, and one link flag:
gcc -c -fprofile-arcs -ftest-coverage …
gcc … -lgcov
Note that -lgcov must come near the end of the link command (after the other library flags). To get code coverage of e.g. the zlib module, the following line works in Modules/Setup.local:
zlib zlibmodule.c -I$(prefix)/include -fprofile-arcs -ftest-coverage -L$(exec_prefix)/lib -lz -lgcov
If everything goes right, after a make call you should have two files, zlibmodule.gcda and zlibmodule.gcno, in your Modules directory. You can now run gcov in Modules to get code coverage:
cd Modules && gcov zlibmodule
Of course, since nothing has been run yet, the code coverage is 0. After running the zlib test suite, things look better:
./python Lib/test/test_zlib.py && gcov -o Modules Modules/zlibmodule
The -o option tells gcov where to look for the coverage data (the .gcda and .gcno files), and the output is
Lines executed:74.55% of 448
Building numpy statically
I quickly added a hack to build the numpy C code statically instead of dynamically in numscons, in the static_build branch, available on github. As it is, numpy will not work; some source code modifications are needed to make it work. Those modifications reside in the static_link branch on github as well.
Then, to statically build numpy with code coverage:
LINKFLAGSEND="-lgcov" CFLAGS="-pg -fprofile-arcs -ftest-coverage" $PYTHON setupscons.py scons --static=1
where $PYTHON refers to the python you built from sources. This builds every extension as a static library. To link them into the python binary, I simply added a fake source file and linked the numpy extensions as libraries against that fake source in Modules/Setup.local:
multiarray fake.c -L$LIBPATH -lmultiarray -lnpymath
umath fake.c -L$LIBPATH -lumath -lnpymath
_sort fake.c -L$LIBPATH -l_sort -lnpymath
where LIBPATH refers to the path containing the static numpy libraries (e.g. build/scons/numpy/core in your numpy source tree). To run the test suite, one has to make sure to import a numpy from which the multiarray, umath and _sort extensions have been removed; it will crash otherwise (as the extensions would be present twice in the python process, once as dynamically loaded code and once as statically linked code). The test suite more or less runs (~1500 tests), and one can get code coverage afterwards. For the multiarray extension, here is what I get:
Lines executed:52.56% of 293
Lines executed:50.00% of 12
Lines executed:62.23% of 1030
Lines executed:68.38% of 117
Lines executed:81.48% of 189
Lines executed:47.43% of 350
Lines executed:61.96% of 1028
Lines executed:42.31% of 208
Lines executed:64.69% of 1583
Lines executed:70.41% of 774
Lines executed:77.95% of 721
Lines executed:51.80% of 361
Lines executed:44.09% of 372
Lines executed:50.00% of 60
Lines executed:47.35% of 942
Lines executed:56.11% of 442
Lines executed:66.67% of 183
Lines executed:76.81% of 345
Lines executed:55.07% of 937
Lines executed:59.08% of 523
Lines executed:0.00% of 111
Lines executed:66.67% of 129
Lines executed:59.49% of 316
Lines executed:56.00% of 25
Lines executed:42.42% of 877
Lines executed:89.36% of 47
Lines executed:58.75% of 514
Lines executed:49.12% of 1134
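gcov prints one such summary line per source file (the file names are omitted above). To turn them into a single aggregate figure, the percentages can be weighted by each file's line count; a quick sketch, using the first two summary lines above as sample input:

```python
import re

# the first two per-file summary lines from above, as sample input
gcov_output = """\
Lines executed:52.56% of 293
Lines executed:50.00% of 12
"""

executed = total = 0
for pct, n in re.findall(r"Lines executed:([\d.]+)% of (\d+)", gcov_output):
    # weight each file's percentage by its number of lines
    executed += float(pct) / 100.0 * int(n)
    total += int(n)

print("Overall: %.2f%% of %d lines" % (100.0 * executed / total, total))
```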
The figures themselves are not that meaningful at the moment, since the test suite does not run completely, and the built numpy is a quite bastardized version of the real numpy.
The numpy modifications, although small, are very hackish; I just wanted to see whether this could work at all. If time permits, I hope to automate most of this, and have a system which can be integrated in the trunk. I am still not sure about the best way to build the extensions themselves. I can see other solutions, such as producing a single file per extension, with every internal numpy header/source included, so that they could be easily built from Setup.local. Or maybe a patch to the python sources so that make in the python tree would automatically build numpy.
One thought on “First steps toward C code coverage in numpy”
Kudos, David! Remember to cash in on that beer at SciPy2009!