What’s coming for bento 0.0.4

I initially intended to release the new version (0.0.4) of bento around mid-to-late
August, but it now seems like the end of September is more likely. The reason is
that I have been working pretty hard on bento and yaku for complex builds in
the last few weeks, where complex means numpy/scipy.

The main reason for making scipy buildable with bento is to get a “feel” of how
extensible bento really is. I think that since 0.0.3, bento is fairly usable,
but extensibility is really what bento is about. The only way I know of to
have an extensible design is to actually extend it in as many scenarios as
possible, and as far as complex distutils-based builds go, scipy is a pretty
good scenario.

The bottom line: I expect a fully working bento build of scipy within a few
days (most of the hairy fortran stuff now builds and runs the tests OK).

Major changes from 0.0.3

No backward-incompatible changes are required for the bento.info format. The
major change is recursive package support, which ended up being more complex
than anticipated. I already described this feature in a previous post: it
mostly boils down to splitting a big bento.info into several “sub-bento” files
in subdirectories.

Implementation-wise, it required a redesign of the internal representation of
files. The issue is knowing when two file names refer to the same file: I
quickly realized that using raw filenames is too complex and too fragile, and I
decided to re-use the Node class from waf, which builds an internal
representation of the filesystem. The conversion is still going on, but it has
already simplified a lot of the hairy code I used to write in bento (and in
distutils previously). It particularly helps to compute the relative path
between two paths:

relpos = node.path_from(othernode)

If node is /foo/bar and othernode is /foo, relpos will be bar, and .. if node
and othernode are inverted. Doing this from the filenames alone has many
corner cases, and path name computations are surprisingly slow in python (the
waf Node class caches things like absolute path computation).
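To illustrate the kind of logic the Node class encapsulates, here is a minimal
sketch of a node tree with a path_from method and cached absolute paths (purely
illustrative, not waf’s actual implementation):

import os

class Node(object):
    # Minimal filesystem node: each node knows its parent and caches its
    # absolute path, so repeated path computations stay cheap.
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self._abspath = None

    def abspath(self):
        if self._abspath is None:
            if self.parent is None:
                self._abspath = self.name
            else:
                self._abspath = os.path.join(self.parent.abspath(), self.name)
        return self._abspath

    def path_from(self, other):
        # relative path from other to self, computed on the node tree
        return os.path.relpath(self.abspath(), other.abspath())

root = Node("/")
foo = Node("foo", root)
bar = Node("bar", foo)
print(bar.path_from(foo))  # bar
print(foo.path_from(bar))  # ..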

Thanks to the waf Node class, I can now easily list the packages, extensions,
etc… specific to one sub-bento relative to the sub-bento directory, and
translate them as seen from the top directory.

I am happy with the internals, but the “API” for recursive build description is
not good, to put it mildly. To add a subpackage description with associated
bscript (hook file), you need to:

  • add the sub bento.info to the Subento field in the parent bento.info
  • add the bscript file to the list passed to the recurse decorator in
    the parent bscript file. Even though the decorator may be put on e.g.
    the configure hook, the build command will also look there for sub
    bscript files, which is not intuitive at all.

You can see some examples there. I am still looking for a good solution to
this issue.
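Concretely, the two steps look something like this (the hook name and package
path are just examples; the recurse and post_configure decorators are the ones
described in a previous post, included below):

In the parent bento.info:

Subento: numpy/core

In the parent bscript (hook file):

@recurse(["numpy/core/bscript"])
@post_configure
def pconfigure(ctx):
    ...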

Yaku enhancements

Except for recursive package description, not much has changed in bento, and
most of the work has happened in yaku. The first big change is that yaku itself
also uses a waf-like Node class: although I resisted this at first, I think it
is for the best, and it also simplified a lot of hairy corner cases inside
yaku.

The other big change in yaku is the ability to override and extend it. I am
interested in the following cases:

  • adding new tools (clang, intel compiler, etc…)
  • adding a new step in the processing chain (say building extensions from
    .c.src instead of .c, without monkey-patching the original code)
  • overriding flags for some extensions (say building one extension with -Os
    instead of -O2)
  • overriding the extension hook for some extensions. For example, fortran
    source files are generally compiled directly into .o for “pure” libraries
    (those not using the python C API), but f2py allows building a python
    extension from the .f files directly. Yaku now allows temporarily
    overriding the command associated with .f files.

All four cases are now implemented. Chaining a templating system to
cython (for .pyx.in -> .pyx -> .c -> .o -> .so/.pyd) is now very simple,
supporting new compilers can be done easily, and playing with compilation
options is straightforward internally. There are a few issues, though. Besides
the question of what the API should look like, a thorny situation is dealing
with dictionaries of configurations. In yaku, each task has an environment
attached to it, which is a simple dictionary containing things like CFLAGS,
CC, etc… Most of the time, you want to share those dictionaries across tasks.
Unfortunately, python semantics for dictionaries don’t make that easy, and
deepcopy is too expensive. A Copy-On-Write dictionary, which would internally
share common parts between dictionaries, would be ideal, but I am afraid
implementing one in python would be very difficult.
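To give an idea of one possible direction (a toy sketch, not part of yaku), an
environment could delegate lookups to a shared parent and only store local
overrides, which makes “copies” cheap and keeps common parts shared:

class SharedEnv(object):
    # Toy copy-on-write-ish environment: reads fall through to a shared
    # parent, writes only touch the local layer.
    def __init__(self, parent=None):
        self._parent = parent
        self._local = {}

    def __getitem__(self, key):
        if key in self._local:
            return self._local[key]
        if self._parent is not None:
            return self._parent[key]
        raise KeyError(key)

    def __setitem__(self, key, value):
        self._local[key] = value

    def derive(self):
        # cheap "copy": shares everything until something is overridden
        return SharedEnv(parent=self)

base = SharedEnv()
base["CC"] = "clang"
base["CFLAGS"] = ["-O2"]

# per-extension override, e.g. building one extension with -Os
small = base.derive()
small["CFLAGS"] = ["-Os"]

print(base["CFLAGS"])   # ['-O2']
print(small["CFLAGS"])  # ['-Os']
print(small["CC"])      # clang, shared with the parent

This does not handle in-place mutation of shared values, which is exactly where
a real copy-on-write implementation gets hairy.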

I am also still not entirely convinced that yaku is warranted:
fbuild would be nearly the ideal system if it were
not limited to python 3, and the new waf 1.6 looks great (T. Nagy, the waf
maintainer, recently updated fortran support for 1.6). Fortunately, bento has
been build-tool agnostic from the start, and trying waf inside bento for a real
project is on the TODO list.

Other bento features

I have put other features planned for 0.0.4 on hold. The main missing features
for bento are:

  • distutils compatibility mode (so that bento.info may be used within distutils)
  • wininst <-> egg conversion
  • good documentation
  • python 3 compatibility
  • virtualenv and pip support
  • automatic command dependency (e.g. automatically re-run configure before
    build if necessary)

Python 3 support will definitely not go into 0.0.4. Virtualenv/pip support
should not be difficult, and automatic dependency between commands is badly
needed.

All that being said, I think bento is shaping up quite ok. At my work, I
constantly have to deal with distutils idiosyncrasies for the most trivial
things, and I am looking forward to seeing it replaced with something saner.

Recent progress on bento – build numpy!

I have spent the last few days on a relatively big feature for bento: recursive package description. The idea is to be able to simply describe packages in a deeply nested hierarchy without having to write long paths, and to split complicated package descriptions into several files.

At the bento.info level, the addition is easy:

Subento: numpy/core, numpy/lib ...

It took me more time to figure out a way to do it in the hook file. I ended up with a recurse decorator:

@recurse(["numpy/core/bscript", "numpy/lib/bscript"])
@post_configure
def some_func(ctx):
    ....

I am not sure it is the right solution yet, but it works for now. My first idea was to simply use a recurse function attached to hook contexts (the ctx argument), but I did not find a good way to guarantee an execution order (declaration order == execution order), and it was a bit unintuitive to integrate the hook decorator and the recurse call together.

The reason I tackled this now is that bento is at a stage where it needs to be used on “real” builds to get a feeling of what works and what does not. The target is numpy and hopefully later scipy. Although I still hope to integrate waf or scons in bento as the canonical way of building numpy/scipy, this also gives a good test for yaku (my simple build system).

It took me less than half a day to port the scons scripts to bento/yaku. A full, unoptimized build of numpy with clang takes less than 10 seconds. A no-op build takes ~150 ms, but as yaku does not have all the infrastructure for header dependency tracking yet, the no-op number is rather meaningless.

Bento (ex-toydist): what’s coming for 0.0.3

A lot has happened feature-wise since the 0.0.2 release of toydist. This is a
short summary of what is about to come in the 0.0.3 release.

Toydist renamed to bento

I have finally found a not-too-sucky name for toydist: bento. As you may know, a bento is a Japanese lunch box (see the picture if you have no idea what I am talking about). The idea is that those are often nicely prepared, and bentomaker becomes the command to get nicely packaged software :)

Integration of yaku, a micro build framework

The 0.0.2 release of toydist was still dependent on distutils to build C
extensions. I have since integrated a small package to build things, yaku
(“grill, bake” in Japanese). This gives the following features when building C
extensions:

  • basic dependency handling (soon auto-detection
    of header file dependency through compiler-specific extensions)
  • reliable out-of-date detection through file content checksums
  • reliable parallel execution

I still think complex packages should use a real build system like waf or
scons, and in that regard, bento will remain completely agnostic (the distutils
build is still available as a configuration option).

Hooks

Any command may now be overridden, and some hooks have been added as well.
Here is a list of possible customizations through hooks:

  • adding custom commands (for example build_doc to build doc)
  • adding dynamically generated files in sdist
  • using waf as a build tool
  • adding autoconf-like tests in configure

This opens a lot of possibilities. Some examples are found in the hook subdirectory.

Distcheck command

This command configures, builds, installs and optionally tests a package from
the tarball generated by sdist. This is very useful to test a release.

This command is still very much in its infancy, but it is quite useful already.

One-file distribution

Since bento is still in the planning phase, its API is subject to significant
changes, and I obviously don’t care about backward compatibility at this stage.
Nevertheless, several people want to use it already, so I intend to support a
waf-like one-file distribution. It would be a self-extracting file which looks
like a python script, and could be included in a project to avoid any extra
dependency. This would solve both distribution and compatibility issues until
bento stabilizes. There is a nice explanation of how this works on the
waf-devel blog.
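The general idea behind such a one-file distribution, as a rough sketch (this
is not bento’s actual mechanism, and all the names here are made up), is to
embed a compressed archive of the library inside a python script that unpacks
it at runtime before importing:

import base64, io, os, tarfile, textwrap

def make_onefile_script(libdir, outpath):
    # archive the library directory, then embed it base64-encoded in a script
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as t:
        t.add(libdir, arcname=os.path.basename(libdir))
    payload = base64.b64encode(buf.getvalue()).decode("ascii")
    with open(outpath, "w") as f:
        f.write(textwrap.dedent('''\
            import base64, io, sys, tarfile, tempfile
            PAYLOAD = "%s"
            target = tempfile.mkdtemp()
            data = io.BytesIO(base64.b64decode(PAYLOAD))
            tarfile.open(fileobj=data, mode="r:bz2").extractall(target)
            # the embedded library is now importable from the unpacked copy
            sys.path.insert(0, target)
            ''') % payload)

# e.g. make_onefile_script("mylib", "mylib_onefile.py") produces a single
# script that makes the mylib package importable wherever it is copied.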

Bug fixes, python 2.4 support

I have started to fix the numerous but mostly trivial issues under
python 2.4. Bento 0.0.3 should be compatible with any python version from 2.4
to 2.7. Although python 3.x support should not be too difficult, it is rather
low priority. Let me know if you think otherwise.

Yaku, a simple python build system for toydist

[EDIT] Of course, just after having written this post, I came across two
interesting projects: mem and fbuild. That’s what I get for not
having had Internet for weeks … Both projects are based on memoization
instead of a dependency graph, and seem quite advanced feature-wise.
Unfortunately, fbuild requires python 3.1. Maybe mem would do; if so, consider
yaku dead. [/EDIT]

While working on toydist, I at first considered re-using the distutils ability
to build C code, with the idea that people would use waf/scons/etc… if they
had involved compilation needs. But distutils is so horrendous that I realized
that implementing something significantly better and simpler would be possible.
After a few hours of coding, I had something which could build extensions on a
few platforms: yaku (“bake” in Japanese).

Yaku’s main design goal is simplicity: I don’t want the core code to be more
than ~1000 LOC. Fortunately, this is more than enough to create something
significantly better than distutils. The current codebase is strongly inspired
by waf (and scons to some extent), and has the following features (a small
illustrative sketch follows the list):

  • Task-based: a yaku task is like a rule in make, with a list of
    targets, dependencies, and a list of executable commands
  • Each task knows about its environment (e.g. flags for C compilation),
    and environment changes as well as dependency changes trigger a
    task (re-)execution
  • Extension through callbacks: adding support for new source files
    (cython, swig, fortran, etc…) requires neither monkey patching nor
    inheritance. This is one of my biggest gripes with distutils
  • Primitive autoconf-like features to check for headers, libraries, etc…
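A minimal sketch of what this task model looks like (purely illustrative, not
yaku’s actual code): a task is re-executed when the checksum of its
dependencies or its environment changes:

import hashlib
import os

class Task(object):
    # Make-like task: targets, dependencies, an environment, and a function
    # that produces the targets (e.g. by invoking the C compiler).
    def __init__(self, targets, deps, env, func):
        self.targets = targets
        self.deps = deps
        self.env = env
        self.func = func

    def signature(self):
        # out-of-date detection based on file content checksums + environment
        m = hashlib.md5()
        for dep in self.deps:
            with open(dep, "rb") as f:
                m.update(f.read())
        m.update(repr(sorted(self.env.items())).encode("utf-8"))
        return m.hexdigest()

def run_if_needed(task, sig_cache):
    key = tuple(task.targets)
    sig = task.signature()
    missing = not all(os.path.exists(t) for t in task.targets)
    if missing or sig_cache.get(key) != sig:
        task.func(task)        # (re-)execute the task
        sig_cache[key] = sig   # remember the signature for the next build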

Besides polishing the API, I intend to add the following features:

  • Parallel build
  • Automatically find header dependencies for C/C++ code (through
    scanning sources)

I want to emphasize that yaku is not meant as a replacement for a real build
tool. To keep it simple, yaku has no abstraction of the filesystem (the node
concept in scons and waf), which has a serious impact on its reliability and
power as a build tool. The graph of dependencies is also built in one shot, and
cannot be changed dynamically (so yaku won’t ever be able to detect dependencies
on generated code, for example foo.c which depends on a foo.h generated from
foo.h.in).

Nevertheless, I believe yaku’s features are significant enough to warrant the
project. If the project takes off, it may be possible to integrate yaku within
the Distribute project, for example, whereas integrating waf or scons is out of
the question.

First public release of toydist

Toydist 0.0.2 has just been announced, and since this is the first public release since I announced the project at Scipy India 2009, I thought it would be a good occasion to summarize the current status of toydist, and where I see it going in the next few months.

Toydist is an experimental alternative to distutils/setuptools, and aims at replacing the whole packaging infrastructure for python software, without requiring people to throw away their current infrastructure. The main philosophy of toydist is simplicity + extensibility:

  • simple: it should be simpler than distutils for simple packages, to the point where it is difficult to get it wrong. Although packaging is difficult, there are known good practices, and the tools should at least hint at those practices.
  • extensible: it should be possible to do things as complex as wanted in some parts of packaging, while still benefiting from toydist capabilities otherwise.

In other words, making toydist more pythonic, with OOWTDI (one obvious way to do it), without getting in your way.

The present

The focus of this first release has been the design of a declarative package description, and implementing just enough features so that toydist can install itself. A simple command line interface, called toymaker, is provided as well. Installing a package with toymaker is very similar to the autotools way:

toymaker configure
toymaker build
toymaker install
toymaker sdist # Assemble a tarball

I have also implemented preliminary support for building eggs and windows installers (.exe-based), through the buildegg and buildwininst commands.

This first release also brings a few distribution-related features which have been big pain points in distutils/setuptools. First, the flexibility of the autotools installation scheme is available at the configuration stage:

    toymaker configure --prefix=somepath --libdir=someotherpath --mandir=yetanotherpath

works as expected, and every customized path is available inside toydist from the beginning, instead of being available only at install time as in distutils.

Secondly, data files are correctly handled, instead of the distutils/setuptools mess. Toydist makes the difference between extra source files, which are not intended to be installed (say .rst source documentation), and data files, which are installed. For the latter, you can declare as many data files sections as you want, and each data files section potentially has a different installation path:

DataFiles: manpath
    SourceDir: doc/
    TargetDir: $manpath
    Files: man1/foo.1, man3/foo.3

This syntax, inspired by automake, will cause doc/man1/foo.1 to be installed as $manpath/man1/foo.1 and doc/man3/foo.3 as $manpath/man3/foo.3. As the TargetDir field accepts non-expanded path variables, and because you can define new path variables, you can be as flexible as needed.
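As a small illustration of how such a mapping could be computed (not toydist’s
implementation; the configured value of manpath is made up here):

import os

paths = {"manpath": "/usr/local/share/man"}  # hypothetical configured value

source_dir = "doc"
target_dir = "$manpath"
files = ["man1/foo.1", "man3/foo.3"]

def expand(path, variables):
    # expand $var occurrences using the configured path variables
    for name, value in variables.items():
        path = path.replace("$" + name, value)
    return path

for f in files:
    src = os.path.join(source_dir, f)
    dest = os.path.join(expand(target_dir, paths), f)
    print("%s -> %s" % (src, dest))
# doc/man1/foo.1 -> /usr/local/share/man/man1/foo.1
# doc/man3/foo.3 -> /usr/local/share/man/man3/foo.3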

For toydist to be successful at all, the transition from a setup.py-based build must be straightforward. For simple packages, this is as simple as running

toymaker convert

inside the same directory as setup.py. Packages such as Jinja2 and Sphinx can already be converted pretty accurately using this method. Packages which rely heavily on distutils extensions, like NumPy or Twisted, will most likely never be convertible this way.

As there is a lot of existing infrastructure based on distutils (and setuptools), with tools like virtualenv, pip or buildout, going from toydist to setup.py is also desirable. This can be done manually at the moment:

from distutils.core import setup
from toydist.core import PackageDescription

pkg = PackageDescription.from_file("toysetup.info")

DESCR = pkg.description
CLASSIFIERS = pkg.classifiers

METADATA = {
            'name': pkg.name,
            'version': pkg.version,
            'description': pkg.summary,
            'url': pkg.url,
            'author': pkg.author,
            'author_email': pkg.author_email,
            'license': pkg.license,
            'long_description': pkg.description,
            'platforms': 'any',
            'classifiers': pkg.classifiers,
}

PACKAGE_DATA = {
            'packages': pkg.packages,
}

if __name__ == '__main__':
        config = {}
        for d in (METADATA, PACKAGE_DATA):
                for k, v in d.items():
                        config[k] = v
        setup(**config)

Toydist’s own setup.py is basically the one above. The next version of toydist will have a distutils compatibility layer, so that this will look as follows:

from toydist.distutils_compat import setup

if __name__ == '__main__':
        setup("toysetup.info")

Depending on the required compatibility level with distutils, one can write distutils commands to support some toydist features.

What’s coming next?

Easy interoperation with distutils, setuptools, etc…

For toydist 0.0.3, I intend to add support for a single-file distribution of toydist, à la waf. Integrating the full code of the packaging program in a source distribution is sometimes quite useful in my experience (that’s how the autotools manage their cross-platformness, to some degree), and this would make distributing toydist-enabled packages easier.

Except on windows, it should be possible to keep this single bootstrapping file under 100-200 kB, so space would not be an issue. Windows needs more, as building windows installers requires binaries which take a lot of space.

Extensibility through command hooks

My minimal threshold for considering toydist successful is the ability to build numpy and scipy. I am convinced that a packaging tool should leverage existing build tools for complex extension builds, be it scons, waf or even the venerable make. Toydist started as a prototype to make writing things like numscons easier, and it is still a major design principle I intend to follow throughout toydist development.

I am currently working on a hook API so that any toymaker command can be customized in an auxiliary python file. Toydist 0.0.3 will contain examples showing how to build simple python C extensions with waf in a couple of lines of code. Building extensions with a real build system like waf brings automatic dependency handling, parallel builds and other features which are near impossible to implement correctly in distutils.

Replacement for pkg_resources

There are currently only two ways to retrieve data files from an installed python package: through __file__ and pkg_resources. __file__ has the advantage of simplicity, but it is inflexible. pkg_resources is too complicated, it significantly slows down everything which uses it, and I have no use for its other features (plugins).

Using something akin to autoheader to generate data file locations at install time should be easy to implement (see the sketch after the list below):

  • no more import slowdown (pkg_resources can easily increase import times by a factor of 2 to 3)
  • much more robust, without the possibility of breaking other packages (pkg_resources is a single point of failure for every package which uses it – I have had installing one setuptools package break unrelated existing packages on my system).
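A rough sketch of the idea, with made-up module and variable names: at install
time, the tool writes a tiny python module containing the configured paths, and
the package retrieves its data files through a plain import:

# What the install step would generate, e.g. as foo/_install_config.py
# (hypothetical name), from the paths chosen at configure time:
generated = '''\
DATADIR = "/usr/local/share/foo"
MANDIR = "/usr/local/share/man"
'''
with open("_install_config.py", "w") as f:
    f.write(generated)

# The installed package then simply does:
#
#     import os
#     from foo._install_config import DATADIR
#     path = os.path.join(DATADIR, "some.dat")
#
# i.e. a plain import and a path join, with none of the runtime scanning that
# makes pkg_resources slow.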

Solving huge memory use on Ubuntu 9.10

I noticed huge (~500 MB) memory usage on my workstation running Ubuntu 9.10 (amd64), causing issues when running large numerical numpy scripts. It ended up being related to ureadahead.

Removing ureadahead and rebooting seems to fix the issue for now. I am baffled that such a bug stayed in a release version. It seems that every new version of Ubuntu is slightly worse than the previous one. It is still better than many other low-hassle distributions, but I worry about the trend.

Progress for numpy on windows 64 bits

The numpy 1.3.0 installer for windows 64 bits does not work very well. On some configurations, numpy does not even import without crashing. The crashes are most likely due to bad interactions between the 64-bit mingw compilers and python (which is built with Visual Studio 2008). Although I know it works, I had no interest in building numpy with the MS compiler, because gfortran does not work with VS 2008: the fortran runtime from gfortran is incompatible with the VS 2008 C runtime (I get some scary linking errors).

So the situation is either building numpy with the MS compiler, but with no hope of getting scipy afterwards, or building a numpy with crashes which are very difficult to track down. Today, I realized that I might get somewhere if, somehow, I could use gfortran without using the gfortran runtime (e.g. libgfortran.a). I first tried calling a gfortran-built blas/lapack from a C program built with VS 2008, and after a couple of hours, I managed to get it working. Building numpy itself with full blas/lapack was a no-brainer then.

Now, there is the problem of scipy. Since scipy has some fortran code, which itself depends on the gfortran runtime when built with gfortran, I am trying to “fake” a minimal gfortran runtime built with the C compiler. Since this mini runtime is built with the MS compiler and with the same C runtime as used by python, it should work as long as it is ABI-compatible with the gfortran one. As gfortran is open source, this may not be intractable :)

With this technique, I could go relatively far in a short time. Among the packages which build and pass most of the test suite:
 – scipy.fftpack
 – scipy.lapack
 – some scipy.sparse

Some packages like cluster or spatial are not ANSI C compatible, so they fail to build. This should not be too hard to fix. The main problem is scipy.special: the C code is horrible, and many hacks are needed to build it. The Fortran code needs quite a few functions from the fortran runtime, so this needs some work. But ~300 unit tests of scipy pass, so this is encouraging.