This will be the first post of a serie about numscons, a project I have been working now for a bit more than 6 months. Simply put, numscons is an alternative build system to build numpy/scipy and other python softwares which heavily rely on compiled code. Before talking about numscons, this first post will be a list of problems with the current build system.
Current flaws in distutils/numpy.distutils:
Here are some things that several people, including, would like to be able to do:
- If a package depends on a library, it is difficult to test for the dependency (header, library). In autoconf, it is one line to test for the headers/libraries. With numpy.distutils, you have to use 50 lines of code, and it is quite fragile.
- Not possible to build ctypes extensions in a portable way.
- Not possible to compile different part of a package with different compilation options.
- No dependency system: if you change some C code, the only reliable way to build correctly is to start from scratch.
- CFLAGS/FFLAGS/LDFLAGS do not have the expected semantics : instead of prepending options to the one used for actual compilation, they override the flags, which means that doing something like CFLAGS=”-O3″ will break, since -fPIC and all necessary options to build python extensions are missing.
- The way to use different BLAS/LAPACK/Compilers is arcane, with too many options, which may fail in different ways.
Why not improving the current build system ?
I sent last year an email on the numpy ML explaining the problems I got with distutils and its extensions numpy.distutils. The majority agreed that the current situation was less than ideal, but the people who knew enough about the current system to improve it could not spend a lot of time on it. The current build system is a set of extensions around distutils, the standard package for build/distribution under python. Here lies the first problem: distutils is a big mess. The code is ugly, badly designed, and not documented. In particular:
- Difficult to extend: although in theory, distutils has the Command class which can be inherited from, a lot of magic is going on, and there is not clear public API. Depending on the way you call distutils, the classes have different attributes !!!
- Distutils fundamentally works as a set of commands. You first do that, then that, then that. That’s a wrong model for building softwares; the right model is a DAG of dependencies (ala make). In particular, for numpy/scipy, when you change some C code, the only way to reliably rebuild the package is to start from scratch.
- the compilation options are spread everywhere in the code. Depending on the platform, it is available in distutils.sysconfig (UNIX) or not (windows). On the later, it is not possible to retrieve the options for compilation. This, combined with the lack of extensibility means simple things like building ctypes extensions is much more difficult than it should be.
Using scons to build compiled extensions:
For this reason, I thought it may be better to use a build system which knows about dependencies and compiled code, and preferably in python. The most known contender with those characteristics is scons. scons is a make replacement, written 100% in python. In particular:
- scons is built around the DAG concept. Its dependency system is top-notch: if you change link option, it will only relink; if header files change, scons automatically detects it.
- scons has a primitive but working system to check for dependencies (check for headers, libraries, etc…). It works like autoconf, that is instead of looking for files, it tries to build code snippets. This is much more robust than the current numpy.distutils ways, because if for example your blas/lapack is buggy, you can detect it. Since many people build their own blas/lapack for numpy/scipy, and get it wrong, this is important
- scons is heavily commented, reasonably well documented, and some relatively high-profiles companies are using it, so it is a proven software (vmware uses for some of its main products, Intel uses it, Doom and all Id-softwares on Linux are built with scons; it seems that generally, scons is quite popular in the gaming community, both open source and proprietary).
Scons has also some disadvantages:
- It uses ancient python (compatible with 1.5.2). This has many consequences which are unfortunate IMO, and the advantages of compatibility are outweights by its disadvantages IMO. In particular, some code is quite arcane because of it (use of apply instead of the foo(*args, **kw) idiom).
- A lot of things are ‘unpythonic’, and a lot of the logic in harcoded in the main callee, meaning you cannot really use it as a library within your project. You have to let scons drive the whole process.
- It misses a lot of essential features for packaging, meaning it is not often used for open source projects.
- It is relatively slow, although this is not a big problem for numpy/scipy.
- scons developers community is not large: it is mainly the job of 2-3 people, and I believe this is partly a consequence of 1 and 3.
Nevertheless, I decided to use scons, and I believe it was the right choice. One thing which pleased me is that instead of improving numpy.distutils, a fragile system that nobody outside numpy/scipy will use anyway, I instead spend time implementing missing features in scons, some of which are already integrated upstream (better fortran support, better support of some fortran compilers, etc…). This way, everybody can benefit of those new features.
Next post in the serie will be about the features I was interested in implementing in numscons, and how I implemented them.