Numscons: current state, future, alternative build tools for numpy

Several people in the numpy/scipy community have raised build issues recently. Brian Granger has wondered whether numscons is still maintained, and Ondrej recently asked why numscons is not the default build tool for numpy. I thought I owed some explanation of the current state of numscons and of how I see its future.

First, I am still interested in working on numscons, but lately, time has become a scarcer resource: I am at the end of my PhD, and as such cannot spend too much time on it. Also, numscons is more or less done, in the sense that it does what it was supposed to do. Of course, many problems remain. Most of them are either implementation details or platform-specific bugs, which can only be dealt with if numscons is integrated into numpy. That raises the question of integrating numscons into numpy.

Currently, I see one big limitation in numscons: the current architecture is based on launching a scons subprocess for every subpackage, sequentially. As stupid as this decision may sound, there are relatively strong rationales for it. First, scipy is designed as a set of almost independent packages, and that's true for the build process as well. Every subpackage declares its requirements (blas, lapack, etc…), and can be built independently of the others: if you launch the build process at the top of the source tree, the whole of scipy is built; if you launch it in scipy/sparse/sparsetools, only sparsetools is built. This is a strong requirement, and it is impossible to do with autotools, for example, unless each subpackage has its own configure (as gcc does).
It is possible in theory with scons, but in practice it is almost impossible to do while staying compatible with distutils, because of build directory issues (the build directory cannot be a subdirectory of the source tree). When I started numscons, I could see only two solutions: launching independent scons builds for each subpackage, or having a whole source tree with the configuration at the top, but scons is too slow for the latter solution (although it is certainly the best one from a design POV).

The second problem is that scons cannot be used as a library. You cannot do something like "from scons import *; build('package1'); build('package2')". This means that the only simple solution for having independent builds for each package is to launch independent scons processes. Having to use subprocesses to launch scons is the single fundamental numscons issue; its consequences are listed after the sketch below.
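
To make this concrete, here is a rough sketch of what the subprocess-based approach looks like. This is illustrative only, not the actual numscons code; the wrapper script name and the package list are made up:

import subprocess
import sys

# Illustrative only: drive one scons process per subpackage, sequentially.
subpackages = ["scipy/linalg", "scipy/sparse/sparsetools"]
for pkg in subpackages:
    ret = subprocess.call([sys.executable, "scons_wrapper.py", "pkg_name=%s" % pkg])
    if ret != 0:
        # Error handling is hard: little information comes back from the
        # child process besides its exit code.
        raise RuntimeError("scons failed for %s (exit code %d)" % (pkg, ret))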

1. Because scons is slow to start (it needs to check for tools, etc…), no-op builds are slow: it takes 20 seconds to complete a no-op full scipy build on a more than decent machine. This is why numscons has an option --package-list to list the packages to rescan, but that is nothing more than an ugly hack.

2. Error handling is hard: if scons fails, it is difficult to pass useful information back to the calling process.

3. Since distutils still handles installation and tarball generation, it needs to know about the source files. But since only scons knows about them, it is hard to pass this information back to distutils. Currently, it only works because distutils learns the sources from the conventional setup.py files.

Another limitation I see with scons is code quality: scons is a relatively old project which has focused a lot on backward compatibility, and it carries a lot of cruft (scons still supports python 1.5). There is still a lot of development happening, and it is still supported; scons is used in several high profile projects: some vmware products are built with scons, Intel acknowledges its use internally, and Google uses it (Steve Knight, the first author of scons, works at Google on the Chrome project, and Chrome's sources include scons scripts). But there is a lot of tight coupling, and changing core implementation details is extremely challenging. It is definitely much better than distutils (in the sense that in distutils, everything is wrong: the implementation, the documentation, the UI, the concepts). But fixing scons' tight coupling is a huge task, to the point where rewriting some core parts from scratch may be easier (see here). There are also some design decisions in scons which are not great, like options handling (everything is passed through Environment instances, which are nothing more than a big global variable).

A potential solution would be to use waf instead of scons. Waf started as a scons fork, and dropped backward compatibility. Waf has several advantages:

  • it is much smaller and nicer implementation-wise than scons (core waf is ~4000 LOC, scons is ten times that). There are some things I can do today in waf that I still have no idea how to do in scons, although I am much more familiar with the latter.
  • waf is much faster than scons (see here for some benchmarks)
  • it seems like waf can be used as a library

But:

  • waf is not stable (the API kept changing; the main waf developer said he would focus on stability from now on)
  • waf does not support Fortran – this one should be relatively easy to solve (I have some code working already for most fortran requirements in scipy)
  • I am still not satisfied with its tool handling. That's a hard problem, though; I have not seen a single build tool which handles this well. It is not a fundamental issue preventing the use of waf.
  • support on windows looks flaky

There may also be hope for more cross-pollination between scons and waf, but that is a long-term goal, unless someone can work on it full-time for at least several weeks IMHO. Porting numscons to waf should be relatively easy once fortran is handled; I think a basic port could be done in a day or two.

Ondrej also mentioned cmake: I think it would be very hard to build numpy with cmake, because it means giving up distutils entirely. How would it work with easy_install? How would you generate tarballs, windows installers, etc…? If anyone wants to try something else, here is my suggestion: ignoring tarball/install issues, try to build numpy.core alone on windows, mac os X and linux, with either Atlas, Accelerate or nothing. Use the scons scripts as information for the checks to do (they are much more readable than the distutils setup.py files). If you can do this, you will have solved most of the build details necessary to build the whole of scipy. It will certainly give you a good overview of the difficulty of the task.


numscons and cython

numscons 0.9.2 has just been released. The main feature of this release is cython support: I implemented a small cython tool during the cython tutorial at scipy08, and now you can build a cython extension from a .py or .pyx file:

from numscons import GetNumpyEnvironment
env = GetNumpyEnvironment(ARGUMENTS)
# cython tool not loaded by default
name = "cython"
env.Tool(name)
# Build a python extension from yop.py
env.DistutilsPythonExtension(source = ["yop.py"])

The example can be found in test/examples/cython in the numscons sources. This is preliminary, since there is no way yet to pass options to the cython code generation.


numscons, part 2: Why scons?

This is the 2nd part of the series about numscons. This part presents scons in more detail, to show how it can solve the problems mentioned in part 1.

scons is a piece of software intended as a replacement for the venerable make. It is written in python, making it a logical candidate to build complex extension code like numpy and scipy. The scons process is driven by a scons script, as the make process is driven by a Makefile. Like makefiles, scons scripts are declarative, and scons automatically builds the Directed Acyclic Graph (DAG) from the description in the scons scripts to build the software in the correct order. The comparison stops here, though, because scons is fundamentally different from make in many aspects.

Scons scripts are python scripts

Not only is Scons itself written in python, but scons scripts are also python scripts. Almost anything possible in python is possible in a scons script; rules in makefiles are mostly replaced by Builders in scons parlance, which are python functions. This also means that anything fancy done in numpy.distutils can be used in a scons script if the need arises, which is no small feat.
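
As a small illustration (a minimal sketch, not taken from numpy or numscons), a complete scons script can use ordinary python to compute its inputs:

# SConstruct: a scons script is plain python
import glob

env = Environment()              # Environment() is provided by scons in the script's scope
sources = glob.glob("src/*.c")   # any python code can compute the source list
env.Program("hello", sources)    # Program is a standard scons builder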

Scons has a top notch dependency system

This is one of the reasons people move from make to scons. Although make does handle dependencies, you have to set up the dependencies in the rules yourself; for example, for a simple object file hello.o built from hello.c, which includes a header hello.h:

hello.o : hello.c hello.h
        $(CC) -c hello.c -o hello.o

If you don't list hello.h and change hello.h later, make will not detect the change, and will consider hello.o up to date. This quickly becomes intractable for large projects, and thus several tools exist to automatically handle dependencies and generate rules for make. Automake (used in most projects using autotools) does this, for example; distutils itself does this too, but not really reliably. With makefiles, you have to regenerate the makefiles every time the dependencies change.

Scons, on the contrary, does this automatically: if you have #include "hello.h" in your source file, scons will automatically add hello.h as a dependency of hello.c. It does so by scanning the content of hello.c. Even better, scons automatically adds, for each target, a dependency on the code and commands used to build the target; concretely, if you build some C code and the compiler changes, scons detects it.
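
For comparison with the make rule above, here is a minimal scons equivalent; hello.h is not mentioned anywhere, the C scanner finds the #include and tracks it automatically:

env = Environment()
# hello.h is not listed: scons scans hello.c and adds the header
# as a dependency by itself
env.Object("hello.o", "hello.c")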

Thus, scons solves the dependency problem for free, one of the fundamental problems of distutils for extension code (this problem is the first in the list of distutils revamp goals).

Build configurations are handled in objects, not in code

Another fundamental problem of distutils is the way it stores the knowledge about building a particular kind of target: the compilation flags, compilers and paths are embedded in the code of distutils itself, and not available programmatically. Some of it is available through distutils.sysconfig, but not always (in particular, it is not available for python built with MS Visual Studio).

Scons, on the other hand, stores compiler flags and any other build-specific knowledge in environment objects. In that regard, Environment instances are like python dictionaries, which store the compiler, compiler flags, etc… Those environments can be copied and modified at will. They can also be used to compile different source files differently, for example with different optimization or warning levels. For example:

warnflags = ['-Wall', '-W']
env = Environment()
warnenv = env.Clone(CFLAGS = warnflags)

This will create two environments; any build command related to env will use the default compiler flags, whereas warnenv will use the warning flags. This also makes customization by the user much easier. People often have trouble compiling numpy with different options, for example for more aggressive compilation:

CFLAGS="-O3 -funroll-loops" python setup.py build

This does not work, because CFLAGS overrides the flags used by distutils: all compiler flags are kept in the same variable (flags from distutils and flags from the user are stored in the same place). With scons, those can easily be kept separate. With numscons, the following work out of the box:

python setup.py build # Default build
CFLAGS="-W -Wall -Wextra -DDEBUG -g" python setup.py build # Unoptimized, debug build
CFLAGS="-funroll-loops -O3" python setup.py build # Aggressive build

scons enables straightforward compilation customization through the command line. This is important for users who like to build numpy/scipy on special configurations (which is quite common in the scientific community), and also for packagers, who complain a lot about distutils and its weird argument handling.

Scons is extensible

scons is also extensible. Although it has some quirks, in particular some unpythonic ways of doing things, it is built with customization in mind. As mentioned earlier, scons generates targets from sources (for example hello.o from hello.c) through special methods called Builders. It is possible and relatively easy to create your own builder (a small sketch follows the list below). Builders can be complex, but that's because they can be very flexible:

  • Builders can have their own scanner. For example, the f2py builder in numscons has its own scanner to automatically handle dependencies in <include_file=…> f2py directives.
  • Builders can have their own emitter: an emitter is a function which generates the list of targets from the list of sources. It can be used to dynamically add new source files, and to modify the list of targets. For example, when building f2py extensions, some extra files are needed, and an emitter is the way to add them.
  • Builders have many other options which I won’t talk about here.
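
As an illustration, here is a minimal sketch of a custom builder wrapping a hypothetical command-line tool named frob, which turns .frob files into .c files (the tool name and file names are made up for the example):

# A builder is little more than a command template plus suffix information
frob = Builder(action = "frob $SOURCE -o $TARGET",
               suffix = ".c", src_suffix = ".frob")
env = Environment(BUILDERS = {"Frob": frob})
# generated.c will be rebuilt whenever input.frob changes
env.Frob("generated.c", "input.frob")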

The scons wiki also contains a vast range of builders for different kinds of tasks (building documentation, tarballs, etc…). With builders, building code using swig, cython or ctypes is possible, and does not require distutils magic: if you know how to build them from the command line, implementing builders for them is relatively straightforward, as long as they fit in the DAG view (f2py, for example, was quite difficult to fit there).

Scons has a configure subsystem

When building numpy/scipy, we need to check for dependencies such as BLAS/LAPACK, fft libraries, etc… The way numpy.distutils does it is to look for files in some paths. This is highly unreliable, because the mere existence of a file does not mean it is usable; it may be too old, or not usable by the compiler being used, etc… Scons has a configure subsystem which works in a manner similar to autotools: to check for libfoo with the foo.h header, scons will try to compile a code snippet including foo.h, and try to link it with -lfoo (or /LIB:foo.lib with the MS compiler). This is much more robust. Robustness matters here because people often try to build their own blas/lapack, make some mistake in the process, and still manage to build numpy successfully; only once they try to run numpy do they hit problems. Another problem with the current scheme in numpy.distutils is that it is fragile, and difficult to modify for people with unusual configurations (using Intel or AMD optimized libraries, for example); thus, only the few people who know enough about numpy.distutils can do it. Finally, the scons configure subsystem is much easier to use:


env = Environment()
config = Configure(env)
config.CheckLibWithHeader('foo', 'foo.h', 'c')
env = config.Finish()

This is straightforward, whereas the same thing in numpy.distutils takes around 50 lines of code. Out of the box, the scons configure subsystem has the following checks (a short usage sketch follows the list):

  • CheckHeader: to check for the availability of a C/C++ header
  • CheckLib: to check for the availability of a library
  • CheckType/CheckTypeSize: to check for the availability of a type and its size
  • CheckDeclaration: to check for #define
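
As a short sketch of how these checks are used inside a scons script (the header, library and symbol names here are just examples):

env = Environment()
config = Configure(env)
if not config.CheckHeader("stdint.h"):
    print("stdint.h not found")
config.CheckLib("m")                    # can libm be linked?
config.CheckTypeSize("long double")     # size of a type, 0 if unavailable
config.CheckDeclaration("SIZE_MAX", includes = "#include <stdint.h>")
env = config.Finish()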

An example I find striking is to compare the setup.py and the scons script for numpy.core. Because of the configure subsystem, the scons script is much easier to understand IMHO.

Now, the scons configure subsystem is not ideal either: internally, it relies heavily on some obscure features of scons itself for the dependency handling, which makes it quite fragile. For most usages (in particular checking for libraries/headers, which is the only thing the vast majority of numscons users will use), it works perfectly. For some advanced uses of the subsystem, it is problematic: the fortran configuration subsystem of numscons, for example, requires grepping through the output (both stdout and stderr) of the builders inside the checkers, and this does not work well in scons (basically, I have to bypass the configure builders).

Conclusion

When looking at the list prepared by David M. Cook for distutils improvements, one can see that scons already solves most of them:

  • better dependency handling: done by scons DAG handling
  • make it easier to use a specific compiler or compiler option: through scons environments
  • allow .c files to specify what options they should/shouldn’t be compiled with (such as using -O1 when optimization screws up, or not using -Wall for .c files generated from Pyrex): through scons environments
  • simplify system_info so that adding checks for libraries, etc., is easier: through the scons configure subsystem
  • a more “pluggable” architecture: adding source file generators (such as Pyrex or SWIG) should be easy: through builders, actions, etc…

And more interestingly for me, when I see problems in scons, I can solve them upstream, so that they benefit other people, not just numpy/scipy. In particular, fortran support was problematic in scons, and since scons 0.98.2, my work on a new fortran support is available. CheckTypeSize and CheckDeclaration, as well as some configuration header generation improvements, were also committed upstream.

In Part 3, I will explain the basic design of numscons, and how it brings scons' power into the numpy build system.

numscons, part 1: the problems with building numpy/scipy with distutils

This will be the first post of a series about numscons, a project I have been working on for a bit more than 6 months now. Simply put, numscons is an alternative build system for numpy/scipy and other python software which relies heavily on compiled code. Before talking about numscons itself, this first post lists the problems with the current build system.

Current flaws in distutils/numpy.distutils:

Here are some things that several people, including myself, would like to be able to do:

  1. If a package depends on a library, it is difficult to test for the dependency (header, library). In autoconf, it is one line to test for headers/libraries. With numpy.distutils, you have to use 50 lines of code, and it is quite fragile.
  2. Not possible to build ctypes extensions in a portable way.
  3. Not possible to compile different parts of a package with different compilation options.
  4. No dependency system: if you change some C code, the only reliable way to build correctly is to start from scratch.
  5. CFLAGS/FFLAGS/LDFLAGS do not have the expected semantics: instead of prepending options to the ones used for the actual compilation, they override the flags, which means that doing something like CFLAGS="-O3" will break, since -fPIC and all the other options necessary to build python extensions go missing.
  6. The way to use different BLAS/LAPACK/compilers is arcane, with too many options, which may fail in different ways.

Why not improve the current build system?

Last year, I sent an email to the numpy mailing list explaining the problems I had with distutils and its extension numpy.distutils. The majority agreed that the current situation was less than ideal, but the people who knew enough about the current system to improve it could not spend much time on it. The current build system is a set of extensions around distutils, the standard python package for building/distribution. Here lies the first problem: distutils is a big mess. The code is ugly, badly designed, and not documented. In particular:

  1. It is difficult to extend: although in theory distutils has the Command class which can be inherited from, a lot of magic is going on, and there is no clear public API. Depending on the way you call distutils, the classes have different attributes!
  2. Distutils fundamentally works as a set of commands: you first do this, then that, then that. That's the wrong model for building software; the right model is a DAG of dependencies (a la make). In particular, for numpy/scipy, when you change some C code, the only way to reliably rebuild the package is to start from scratch.
  3. The compilation options are spread everywhere in the code. Depending on the platform, they are available in distutils.sysconfig (UNIX) or not (windows). On the latter, it is not possible to retrieve the options for compilation. This, combined with the lack of extensibility, means that simple things like building ctypes extensions are much more difficult than they should be.

Using scons to build compiled extensions:

For this reason, I thought it might be better to use a build system which knows about dependencies and compiled code, and which is preferably written in python. The best-known contender with those characteristics is scons, a make replacement written 100% in python. In particular:

  1. scons is built around the DAG concept. Its dependency system is top-notch: if you change a link option, it will only relink; if header files change, scons automatically detects it.
  2. scons has a primitive but working system to check for dependencies (headers, libraries, etc…). It works like autoconf, that is, instead of looking for files, it tries to build code snippets. This is much more robust than the current numpy.distutils way, because if for example your blas/lapack is buggy, you can detect it. Since many people build their own blas/lapack for numpy/scipy, and get it wrong, this is important.
  3. scons is heavily commented, reasonably well documented, and some relatively high-profile companies are using it, so it is proven software (vmware uses it for some of its main products, Intel uses it, and Doom and the other Id Software titles on Linux are built with scons; it seems that generally, scons is quite popular in the gaming community, both open source and proprietary).

Scons also has some disadvantages:

  1. It uses ancient python (compatible with 1.5.2). This has many unfortunate consequences, and the advantages of such compatibility are outweighed by its disadvantages IMO. In particular, some code is quite arcane because of it (use of apply instead of the foo(*args, **kw) idiom).
  2. A lot of things are ‘unpythonic’, and a lot of the logic is hardcoded in the main entry point, meaning you cannot really use it as a library within your project. You have to let scons drive the whole process.
  3. It misses a lot of essential features for packaging, meaning it is not often used for open source projects.
  4. It is relatively slow, although this is not a big problem for numpy/scipy.
  5. The scons developer community is not large: it is mainly the work of 2-3 people, and I believe this is partly a consequence of 1 and 3.

Nevertheless, I decided to use scons, and I believe it was the right choice. One thing which pleased me is that instead of improving numpy.distutils, a fragile system that nobody outside numpy/scipy will use anyway, I could spend time implementing missing features in scons, some of which are already integrated upstream (better fortran support, better support for some fortran compilers, etc…). This way, everybody can benefit from those new features.

The next post in the series will be about the features I was interested in implementing in numscons, and how I implemented them.

Why setuptools does not matter to me

It is that time of the year when packaging questions resurface in the open (on python-dev and by Armin).

Armin wrote an article on why he loves setuptools, and one of the main takeaways of his text is that one should not replace X with Y without understanding why X was created in the first place. There is another takeaway, though: none of the features Armin mentioned matters much to me. This is not to say they are not important: given the success of setuptools and pip, it would be stupid not to recognize that they fill an important gap for a lot of people.

About tradeoffs

But while those solutions provide a useful set of features, it is important to realize what they prevent as well. Nick touches on this topic a bit on python-dev, but I mean something slightly different here. Some examples:

  • First, the way setuptools installs eggs, by adding entries to sys.path, causes a lot of additional stat calls on the filesystem. In the scientific community (and in corporate environments as well), people often have to use NFS. This can cause imports to take a lot of time (above 1 minute is not unheard of).
  • Setuptools monkey-patches distutils. This has serious consequences for people who have their own distutils extensions, since you essentially have to deal with two code paths for anything that setuptools monkey-patches.

As mentioned by Armin, setuptools had to do the things it did to support multi-versioning. But this means it has a significant cost for people who do not care about having multiple versions of the same package. This matters less today than it used to, though, thanks to virtualenv and to pip, which installs things as non-eggs.

A similar argument can be made about monkey-patching: distutils is not designed to be extensible, especially because of how commands are tightly coupled together. You effectively can NOT extend distutils without monkey-patching it significantly.

Hackable solutions

A couple of years ago, I decided that I could not put up with the numpy.distutils extensions and the aforementioned distutils issues anymore. I started working on Bento sometime around fall 2009, with the intent of bootstrapping it by reusing the low-level distutils code, and getting rid of commands and distribution. I also wanted to experiment with simpler solutions to some of the more questionable setuptools designs, such as data resources with pkg_resources.

I think hackable solutions are the key to helping people solve their packaging problems. There is no solution that will work for everyone, because the use cases are so different and clash with each other. Personally, having a system that works like apt-get (reliable and fast metadata search, reliable install/uninstall, etc…) is the holy grail, but I understand that that's not what other people are after.

What matters the most is to only put in the stdlib what is uncontroversial and battle-tested in the wild. The efforts of Tarek and the rest of the packaging team to specify and write PEPs around the metadata are a very good step in that direction. The metadata PEP works well because it essentially specifies things that have been used successfully (and are relatively uncontroversial).

But an elusive PEP around compilers, as has been suggested, is not that interesting IMO: I could write something pointing out every API issue with how compilation works in distutils, but that sounds pointless without a proposal for a better system. And I don't want to design a better system, I want to be able to use one (waf, scons, fbuild, gyp, whatever). Writing bento is my way of discovering a good design to do just that.

Bento 0.0.4 released!

I have just released the new version of Bento, 0.0.4. You can get it on github as usual.

Bento itself did not change too much, except for the support of sub-packages and a few other things. But bento can now build both numpy and scipy on the “easy” platforms (linux + Atlas + gcc/clang). This post shows a few cool things that you can do now with bento.

Full distribution check

The best way to use this version of bento is to do the following:

# Download bento and create bentomaker
git clone http://github.com/cournape/Bento.git bento-git
cd bento-git && python bootstrap.py && cd ..
# Download the _bento_build branch from numpy
git clone http://github.com/cournape/numpy.git numpy-git
cd numpy-git && git checkout -b bento_build origin/_bento_build
# Create a source tarball from numpy, configure, build and test numpy
# from that tarball
../bento-git/bentomaker distcheck

For reasons I am still unclear about, the test suite fails to run from distcheck for scipy, but that seems to be more of a nose issue than a bento issue proper.

Building numpy with clang

Assuming you are on Linux, you can try building numpy with clang, the LLVM-based C compiler. Clang compiles faster than gcc, and generally gives better error messages. Although bento itself does not have any support for clang yet, you can easily play with the bento scripts to use it. In the top bscript file from numpy, at the end of the post_configure hook, replace every compiler with clang, i.e.:

for flag in ["CC", "PYEXT_CC"]:
    yctx.env[flag] = ["clang"]

Once the project is configured, you can also get a detailed look at the configured options in the file build/default.env.py. You should not modify this file, but it is very useful for debugging build issues. Another aid for debugging configuration options is the build/config.log file. Not only does it list every configuration command (both successes and failures), but it also shows the source content as well as the command output.

What’s coming next?

Version 0.0.5 will hopefully have a shorter release period than 0.0.4. The goal for 0.0.5 is to make bento good enough so that other people can jump into bento development.

The main features I am thinking about are windows and python 3 support, plus a lot of code cleaning and documentation. Windows should not be too difficult: it is mainly about ripping the Visual Studio support code out of numscons/scons and adapting it for yaku. I have already started working on python 3 support as well; the main issue is bootstrapping bento, and finding an efficient process to work on both python 2 and 3 at the same time. Depending on the difficulty, I will also try to add proper dependency handling in yaku for compiled libraries and dependent headers: at the moment, yaku does not detect header changes, nor does it rebuild an extension if the linked libraries change. An alternative is to bite the bullet and start working on integration with waf, which already does all this internally.

Recent progress on bento – build numpy!

I have spent the last few days on a relatively big feature for bento: recursive package description. The idea is to be able to simply describe packages in a deeply nested hierarchy without having to write long paths, and to split complicated package descriptions into several files.

At the bento.info level, the addition is easy:

Subento: numpy/core, numpy/lib ...

It took me more time to figure out a way to do it in the hook file. I ended up with a recurse decorator:

@recurse(["numpy/core/bscript", "numpy/lib/bscript"])
@post_configure
def some_func(ctx):
    pass  # hook body omitted

I am not sure it is the right solution yet, but it works for now. My first idea was to simply use a recurse function attached to hook contexts (the ctx argument), but I did not find a good way to guarantee the execution order (declaration order == execution order), and it was a bit unintuitive to integrate the hook decorator and the recurse call together.

The reason why I am tackling this now is that bento is at a stage where it needs to be used on “real” builds to get a feeling of what works and what does not. The target is numpy, and hopefully later scipy. Although I still hope to integrate waf or scons into bento as the canonical way of building numpy/scipy with bento, this also gives a good test for yaku (my simple build system).

It took me less than half a day to port the scons scripts to bento/yaku. A full, unoptimized build of numpy with clang takes less than 10 seconds. A no-op build is ~150 ms, but as yaku does not have all the infrastructure for header dependency tracking yet, the number for a no-op build is rather meaningless.

Bento (ex-toydist): what’s coming for 0.0.3

A lot has happened feature-wise since the 0.0.2 release of toydist. This is a
short summary of what is about to come in the 0.0.3 release.

Toydist renamed to bento

I have finally found a not too sucky name for toydist: bento. As you may know, bento is the Japanese word for a lunch box (see the picture if you have no idea what I am talking about). The idea is that those are often nicely prepared, and bentomaker becomes the command to get nicely packaged software :)

Integration of yaku, a micro build framework

The 0.0.2 release of toydist was still dependent on distutils to build C
extensions. I have since integrated a small package to build things, yaku
(“grill, bake” in Japanese). This gives the following features when building
C extensions:

  • basic dependency handling (soon auto-detection
    of header file dependencies through compiler-specific extensions)
  • reliable out-of-date detection through file content checksums
  • reliable parallel execution

I still think complex packages should use a real build system like waf or
scons, and in that regard, bento will remain completely agnostic (the distutils
build is still available as a configuration option).

Hooks

Any command may now be overridden, and some hooks have been added as well.
Here is a list of possible customizations through hooks:

  • adding custom commands (for example build_doc to build doc)
  • adding dynamically generated files in sdist
  • using waf as a build tool
  • adding autoconf-like tests in configure

This opens a lot of possibilities. Some examples can be found in the hook subdirectory.

Distcheck command

This command configures, builds, installs and optionally tests a package from the
tarball generated by sdist. This is very useful for testing a release.

This command is still very much in its infancy, but it is quite useful already.

One file distribution

Since bento is still in the planning phase, its API is subject to significant
changes, and I obviously don’t care about backward compatibility at this stage.
Nevertheless, several people want to use it already, so I intend to provide a
waf-like one-file distribution: a self-extracting file which looks like a
python script, and which could be included in a project to avoid any extra
dependency. This would solve both distribution and compatibility issues until
bento stabilizes. There is a nice explanation of how this works on the
waf-devel blog.

Bug fixes, python 2.4 support

I have started to fix the numerous but mostly trivial issues under
python 2.4. Bento 0.0.3 should be compatible with any python version from 2.4
to 2.7. Although python 3.x support should not be too difficult, it is rather
low priority. Let me know if you think otherwise.

Yaku, a simple python build system for toydist

[EDIT] Of course, just after having written this post, I came across two
interesting projects: mem and fbuild. That’s what I get for not having had
Internet for weeks now… Both projects are based on memoization instead of a
dependency graph, and seem quite advanced feature-wise. Unfortunately, fbuild
requires python 3.1. Maybe mem would do. If so, consider yaku dead.[/EDIT]

While working on toydist, I was at first considering reusing the distutils
ability to build C code, with the idea that people would use waf/scons/etc…
if they had involved compilation needs. But distutils is so horrendous that I
realized that implementing something significantly better and simpler would be
possible. After a few hours of coding, I had something which could build
extensions on a few platforms: yaku (“bake” in Japanese).

Yaku’s main design goal is simplicity: I don’t want the core code to be more
than ~1000 LOC. Fortunately, this is more than enough to create something
significantly better than distutils. The current codebase is strongly inspired
by waf (and scons to some extent), and has the following features:

  • Task-based: a yaku task is like a rule in make, with a list of
    targets, dependencies, and a list of executable commands (see the
    sketch after this list for an illustration of the concept)
  • Each task knows about its environment (e.g. flags for C compilation),
    and environment changes as well as dependency changes trigger a
    task (re-)execution
  • Extension through callbacks: adding support for new source files
    (cython, swig, fortran, etc…) requires neither monkey-patching nor
    inheritance. This is one of my biggest gripes with distutils
  • Primitive autoconf-like features to check for headers, libraries, etc…
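
To give an idea of the task model (this is an illustrative sketch only, not the actual yaku API), a task groups targets, dependencies, commands and an environment, and its signature is computed from the dependency contents and the environment, so that changing either triggers re-execution:

import hashlib

class Task(object):
    # Illustrative sketch of the task concept, not the actual yaku API
    def __init__(self, targets, dependencies, commands, env):
        self.targets = targets            # files produced by the task
        self.dependencies = dependencies  # files the task depends on
        self.commands = commands          # callables run to build the targets
        self.env = env                    # e.g. {"CFLAGS": ["-O2"]}

    def signature(self):
        # checksum of dependency contents and environment: changing a source
        # file or a compilation flag changes the signature
        h = hashlib.md5()
        for dep in self.dependencies:
            h.update(open(dep, "rb").read())
        h.update(repr(sorted(self.env.items())).encode("utf-8"))
        return h.hexdigest()

    def run(self, previous_signature=None):
        # re-execute the commands only when the task is out of date
        if self.signature() != previous_signature:
            for command in self.commands:
                command(self)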

Besides polishing the API, I intend to add the following features:

  • Parallel build
  • Automatically find header dependencies for C/C++ code (through
    scanning sources)

I want to emphasize that yaku is not meant as a replacement for a real build
tool. To keep it simple, yaku has no abstraction of the filesystem (the node
concept in scons and waf), which has a serious impact on its reliability and
power as a build tool. The graph of dependencies is also built in one shot, and
cannot be changed dynamically (so yaku won’t ever be able to detect dependencies
on generated code, for example foo.c which depends on foo.h generated from
foo.h.in).

Nevertheless, I believe yaku’s features are significant enough to warrant the
project. If the project takes off, it may be possible to integrate yaku within
the Distribute project, for example, whereas integrating waf or scons is out of
the question.