This is the 2nd part of the serie about numscons. This part will present scons in more details, to show it can solve problems mentioned in part 1.
scons is a software intended as a replacement to the venerable make software. It is written in python, making it a logical candidate to build complex extension code like numpy and scipy. The scons process is driven by a scons script, as make process is driven by a Makefile. As makefiles, scons scripts are declarative, and scons automatically builds the Directed Acyclic Graph (DAG) from the description in scons scripts to build the software in a correct order. The comparison stops here, though, because scons is fundamentally different than make in many aspects.
Scons scripts are python scripts
Not only Scons itself is written in python, but scons scripts themselves are also python scripts. Almost anything possible in python is possible in scons script; rules in makefiles are mostly replaced by Builders in scons parlance, which are python functions. This also means that anything fancy done in numpy.distutils can be used in scons script if the need arises, which is not a small feat.
Scons has a top notch dependency system
This is one of the reason people go from make to scons. Although make does handle dependency, you have to set up the dependencies in the rules, for example, for a simple object file hello.c which has a header hello.h:
hello.o : hello.c hello.h $(CC) -c hello.c -o hello.o
If you don’t set the hello.h, and changes hello.h later, make will not detect it as a change, and will consider hello.o as up to date. This is quickly becoming intractable for large projects, and thus several softwares exist to automatically handle dependency and generate rules for make. Automake (used in most projects using autotools) does this, for example; distutils itself does this, but it is not really reliable. With make files, you have to regenerate the make files every time the dependency changes.
On the contrary, scons does this automatically: if you have #include “hello.h” in your source file, scons will automatically add hello.h as a dependency to hello.c. It does though by scanning hello.c content. Even better, scons automatically adds for each target a dependency on the code and commands used to build the target; concretely, if you build some C code, and the compiler changes, scons detects it.
Thus, scons solves for free the dependency problem, one of the fundamental problem of distutils for extension code (this problem is the first in the list of distutils revamp goals).
build configurations are handled in objects, not in code:
Another fundamental problem of distutils is the way it stores knowledge about build a particular kind of target: the compilation flags, compilers, paths are embedded in the code of distutils itself, and not available programmatically. Some of it is available through distutils.sysconfig, but not always (in particular, it is not available for python built with MS Visual Studio).
On the other hand, Scons stores compiler flags and any kind of build specific knowledge in environment objects. In that regard, Environment instances are like python dictionaries, which store compiler, compiler flags, etc… Those environment can be copied, modified at will. They can also be used to compile differently different source files, for example with different optimization or warning level. For example
warnflags = ['-Wall', '-W'] env = Environment() warnenv = env.Clone(CFLAGS = warnflags)
Will create two environments, and any build command related to env will use the default compiler flags, whereas warnenv will use the warning flags. This also makes customization by the user much easier. People often have trouble compiling numpy with different options, for example for more agressive compilation:
CFLAGS="-O3 -funroll-loops" python setup.py build
Does not work because CFLAGS overrides CFLAGS as used by distutils, and all compiler flags are kept in the same variable (Flags from distutils and flags from the user are stored at the same place). With scons, those can easily be put in different locations. With numscons, those work out of the box:
python setup.py build # Default build CFLAGS="-W -Wall -Wextra -DDEBUG -g" python setup.py build # Unoptimized, debug build CFLAGS="-funroll-loops -O3" python setup.py build # Agressive build
scons enables straightforward compilation customization through the command line. This is important for users who like to build numpy/scipy on special configuration (which is quite common in the scientific community), and also for packagers, who complain a lot about distutils and its weird arguments handling.
Scons is extensible
scons is also extensible. Although it has some quircks, in particular some unpythonic way of doing things, it is built with customization in mind. As mentionned earlier, scons generate targets from source (for example hello.o from hello.c) through special methods called Builders. It is possible and relatively easy to create your own builder. Builders can be complex, though, but that’s because they can be very flexible:
- Builders can have their own scanner. For example, the f2py builder in numscons has its own scanner to automatically handle dependencies in <include_file=…> f2py directives.
- Builders can have their own emitters: an emitter is a function which generate the list of targets from the list of sources. It can be used to dynamically add new source files, and modify the list of targets. For example, when building f2py extensions, some extra files are needed, and emitter is a way to do it.
- Builders have many other options which I won’t talk about here.
The scons wiki also contains a vast range of builders for different kind of tasks (building documentation, tarballs, etc…). With builders, building code using swig, cython, ctypes is possible, and does not require some distutils magic: if you know how to build them from the command line, implementing builders for them is relatively straifgtforward, as long as they fit in the DAG view (f2py for example was quite difficult to fit there).
Scons has a configure subsystem
When building numpy/scipy, we need to check for dependencies such as BLAS/LAPACK, fft libraries, etc… The way numpy.distutils does it is to look for files in some paths. This is highly unreliable, because the mere existence of a file does not mean it is usable; in particular, maybe it is too old, or nor usable by the used compiler, etc… Scons has a configure subsystem which works in a manner similar to autotools: to check for libfoo with the foo.h header, scons will try to compile a code snippet including foo.h, and try to link it with -lfoo (or /LIB:foo.lib with MS compiler). This is much more robust. Robustness is important here because people often try to build their own blas/lapack, make some mistake in the process, and then can build numpy successfully. Only once they try to run numpy do they have some problems. Another problem with the current scheme in numpy.distutils is that it is fragile, and difficult to modify by people with unusual configuration (Using Intel or AMD optimized libraries for example); thus, only the few people who know enough about numpy.distutils can do it. Finally, the scons subsystem is much easier to use:
config = Configure() config.CheckLibraryWithHeader('foo', 'foo.h') config.Finish()
Is straightfoward, whereas the same thing in numpy.distutils takes around 50 lines of code. Out of the box, the scons configure subsystem has the following checks:
- CheckHeader: to check for the availability of a C/C++ header
- CheckLib: to check for the availability of a library
- CheckType/CheckTypeSize: to check for the availability of a type and its size
- CheckDeclaration: to check for #define
An example I find striking is to compare the setup.py and the scons script for numpy.core. Because of the configure subsystem, the scons script is much easier to understsand IMHO.
Now, the scons subsystem is not ideal either: internally, it relies heavily on some obscure features of scons itself for the dependency handling, which means it is quite fragile. For most usages (in particular checking for libraries/headers, which is the only thing that the vast majority of numscons users will use), this works perfectly. For some advanced uses of the subsystem, this is problematic: the fortran configuration subsystem of numscons for example requires grepping through the output (both stdout/stderr) of the builders inside the checkers, and this does not work well in scons (I have to bypass the configure buidlers, basically).
When looking at the list prepared by David M. Cook for distutils improvements, one can see that scons already solve most of them:
- better dependency handling: done by scons DAG handling
- make it easier to use a specific compiler or compiler option: through scons environments
- allow .c files to specify what options they should/shouldn’t be compiled with (such as using -O1 when optimization screws up, or not using -Wall for .c made from Pyrex files: through scons environments
- simplify system_info so that adding checks for libraries, etc., is easier: through scons configure subsytem
- a more “pluggable” architecture: adding source file generators (such as Pyrex or SWIG) should be easy: through builders, actions, etc..
And more interesting for me, when I see some problems in scons, I can solve them upstream, so that it benefit other people, not just numpy/scipy. In particular, the fortran support was problematic in scons, and since scons 0.98.2, my work for a new fortran support is available. CheckTypeSize and CheckDeclaration, as well as some configuration header generation improvements were also committed upstream.
In Part 3, I will explain the basic design of numscons, and how it brings scons power into numpy build system.