A few remarks on distutils2

Disclaimer: I am working on a project which may be seen as a competitor to
the distutils2 effort, and I am quite biased against the existing packaging
tools in python. On the other hand, I know distutils extremely well, having
maintained the numpy.distutils extensions for several years, and most of my
criticisms should stand on their own.

There is a strong consensus in the python community that the current packaging
tools (distutils) are too limited. There have been various attempts to improve
the situation, through setuptools, the distribute fork, etc. Since the
beginning of this year, the focus has shifted toward distutils2, which is
scheduled to be part of the stdlib for python 3.3, while staying compatible
with python 2.4 onwards. A first alpha has been released recently, and I
thought it was a good occasion to look at what has happened in that space.

As far as I can see, distutils2 has at least the following three goals:

  • standardize a lot of setuptools practices through PEPs and implement them.
  • refactor distutils code and add a test suite with significant coverage.
  • get rid of setup.py for most packages, while adding hooks for people who
    need to customize their build/installation/deployment process.

I won’t say much about the first point: most setuptools features are
useless to the scipy community, and are generally poor reimplementations of
existing solutions anyway. As far as I can see, the third point is still being
discussed, and is not present in the mainline.

The second point is more interesting: distutils code quality was pretty low,
but the main issue was (and still is) the overall design. Unfortunately, adding
tests does not address the reliability issues which have plagued the scipy
community (and I am sure other communities as well). The main issues w.r.t.
build and installation remain:

  • unreliable installation: distutils installs things by simply copying trees
    built into a build directory (build/ by default). This is a problem when
    you decide to change your source code (e.g. renaming some modules), as
    distutils will add things to the existing build tree, and hence install
    will copy both old and new targets. As with distutils, the only way to have
    a reliable build will be to first rm -rf build. This alone is a consistent
    source of issues for numpy/scipy, as many end-users are bitten by this. We
    somewhat alleviate this by distributing binary installers (which know how
    to uninstall things and are built by people familiar with distutils idiocy)
  • Inconsistencies between compiler classes. For example, the MSVCCompiler
    class defines the compiler executable as a string, stored in the attribute
    cc. On the other hand, most other compiler classes define the compiler_so
    attribute (which is a list in that case). They also don’t have the same
    methods.
  • No consistent, centralized API to obtain basic compilation options (CC
    flags, etc…)
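The first point in the list above can be reproduced without distutils at all. The following is a minimal sketch, not distutils code: `build` and `install` are hypothetical stand-ins that only mimic the "copy into build/, then copy build/ to the install location" behaviour described above, and the file names are made up for the illustration.

```python
import os
import shutil
import tempfile


def build(src_dir, build_dir):
    """Copy every .py file from src_dir into build_dir, never deleting anything."""
    os.makedirs(build_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if name.endswith(".py"):
            shutil.copy(os.path.join(src_dir, name), os.path.join(build_dir, name))


def install(build_dir, install_dir):
    """Install by copying the whole build tree, stale files included."""
    shutil.copytree(build_dir, install_dir, dirs_exist_ok=True)  # Python 3.8+


def demo():
    root = tempfile.mkdtemp()
    try:
        src = os.path.join(root, "src")
        bld = os.path.join(root, "build")
        dst = os.path.join(root, "site-packages")
        os.makedirs(src)

        # First build: the package contains old_name.py.
        with open(os.path.join(src, "old_name.py"), "w") as f:
            f.write("x = 1\n")
        build(src, bld)

        # The module is renamed in the source tree, then rebuilt and installed.
        os.rename(os.path.join(src, "old_name.py"), os.path.join(src, "new_name.py"))
        build(src, bld)
        install(bld, dst)
        return sorted(os.listdir(dst))
    finally:
        shutil.rmtree(root)


installed = demo()
print(installed)  # ['new_name.py', 'old_name.py']
```

Both the old and the new module end up installed, because nothing ever removes `old_name.py` from the build tree. Deleting the build directory before the second build is the only thing that avoids it in this toy model.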

Even more significantly, it means that the fundamental issue of extensibility
has not been addressed at all, because the command-based design is still there.
This is by far the worst part of the original distutils design, and I fail to
see the point of a backward-incompatible successor to distutils which does not
address this issue.

Issues with command-based design

Distutils is built around commands, which correspond almost one-to-one to
command-line commands: when you run “python setup.py install”, distutils will
essentially call the install.run method after some initialization. This by
itself is a relatively common pattern, but the issue lies elsewhere.

Options handling

First, each command has its own set of options, but the options of one command
often affect the other commands, and there is no easy way for one command to
know the options of another. For example, you may want to know the options of
the install command at build time. The usual pattern is to instantiate the
command whose options you need, finalize it and read its attributes, e.g. with
get_finalized_command:

install = self.get_finalized_command("install")
install_lib = install.install_lib

This is hard to use correctly because every command can be reset by other
commands, and some commands cannot be instantiated this way depending on the
context. Worse, this can cause unexpected issues later on if you are calling a
command which has not already been run (like the install command in a build
command). Quite a few subtle bugs in setuptools and in numpy.distutils were/are
caused by this.
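To make the failure mode more concrete, here is a toy model of it. None of these classes are distutils classes: `ToyCommand`, `ToyInstall` and `ToyBuild` are hypothetical stand-ins, and the paths are invented. The only thing the sketch borrows is the idea that a command is finalized once, and that finalizing computes derived options from whatever values are set at that moment.

```python
class ToyCommand:
    """Stand-in for a command object: finalized at most once."""

    def __init__(self, registry):
        self.registry = registry
        self.finalized = False

    def finalize_options(self):
        pass

    def ensure_finalized(self):
        if not self.finalized:
            self.finalize_options()
            self.finalized = True

    def get_finalized_command(self, name):
        cmd = self.registry[name]
        cmd.ensure_finalized()
        return cmd


class ToyInstall(ToyCommand):
    def __init__(self, registry):
        super().__init__(registry)
        self.prefix = None
        self.install_lib = None

    def finalize_options(self):
        if self.prefix is None:
            self.prefix = "/usr/local"
        # Derived option, computed once from the prefix known at this point.
        self.install_lib = self.prefix + "/lib/site-packages"


class ToyBuild(ToyCommand):
    def run(self):
        # Peeking at install options from build finalizes install early.
        return self.get_finalized_command("install").install_lib


registry = {}
registry["install"] = ToyInstall(registry)
registry["build"] = ToyBuild(registry)

seen_by_build = registry["build"].run()

# Something sets the prefix later (another command, a hook, ...).
registry["install"].prefix = "/opt/myapp"
registry["install"].ensure_finalized()  # no-op: already finalized

print(seen_by_build)                    # /usr/local/lib/site-packages
print(registry["install"].install_lib)  # /usr/local/lib/site-packages
```

In this model the build command silently froze the install options, so the later prefix change never reaches `install_lib`. Real distutils has more moving parts, but the ordering hazard is of the same kind.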

 

According to Tarek Ziade (the main maintainer of distutils2), this is addressed in a distutils2 development branch. I cannot comment on it as I have not looked at the code yet.

Sub-commands

Distutils has a notion of commands and “sub-commands”. Sub-commands may
override each other’s options through the set_undefined_options method, which
creates new attributes on the fly. This is every bit as bad as it sounds.

Moreover, the hardcoding of dependencies between commands and sub-commands
significantly hampers extensibility. For example, in numpy, we use some
templated source files which are processed into .c files: this is done in the
build_src command. Now, because the build command of distutils does not know
about build_src, we need to override build as well to call build_src. Then
came setuptools, which of course did not know about build_src, so we had to
conditionally subclass from setuptools to run build_src too [1]. Every command
which may potentially trigger this command may need to be overridden, with all
the complexity that follows. This is completely insane.
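The override cascade can be shown with a small toy model. This is a sketch under stated assumptions, not distutils or numpy.distutils code: the `Command` base class and the way sub-commands are run are simplified, and `develop` is only a hypothetical example of a third-party command that triggers build_ext without going through build.

```python
class Command:
    """Toy command: runs its hardcoded sub-commands, then records itself."""

    name = "command"
    sub_commands = ()

    def run(self, log):
        for sub in self.sub_commands:
            sub().run(log)
        log.append(self.name)


class build_py(Command):
    name = "build_py"


class build_ext(Command):
    name = "build_ext"


class build_src(Command):
    name = "build_src"  # the extra step that generates the .c files


class build(Command):
    name = "build"
    sub_commands = (build_py, build_ext)  # build_src is unknown here


class develop(Command):
    """Hypothetical third-party command that triggers build_ext directly."""

    name = "develop"
    sub_commands = (build_ext,)


# Every entry point that must run build_src needs its own override.
class numpy_build(build):
    sub_commands = (build_src,) + build.sub_commands


class numpy_develop(develop):
    sub_commands = (build_src,) + develop.sub_commands


def run(cmd_class):
    log = []
    cmd_class().run(log)
    return log


print(run(build))          # ['build_py', 'build_ext', 'build']
print(run(develop))        # ['build_ext', 'develop']
print(run(numpy_build))    # ['build_src', 'build_py', 'build_ext', 'build']
print(run(numpy_develop))  # ['build_src', 'build_ext', 'develop']
```

Because the dependency lists live inside each command, adding one new step means subclassing every command that can start a build, including commands from packages that did not exist when the step was written.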

Hooks

Distutils2 has added the notion of hooks, which are functions run before or
after the command they hook into. But because they interact with distutils2
through the command instances, they share all the issues mentioned above, and
I suspect they won’t be of much use.

More concretely, let’s consider a simple example: a file generated from a
template (say config.pkg.in), containing some information known only when the
build runs (like the version and build time). Doing this correctly is
surprisingly difficult:

  • you need to generate the file in a build command, and put it at the right
    place in the build directory
  • you need to install it at the right place (in-place vs normal build, egg
    install vs non-egg install vs externally_managed install)
  • you may want to automatically include the config.pkg.in template in sdist
  • you may want the file to be installed in bdist/msi/mpkg, so you may need to
    know all the details of those commands

Each of these steps may be quite complex and error-prone. Some are impossible
with a simple hook: it is currently impossible to add files to sdist without
rewriting the sdist.run function AFAIK.
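For contrast, the generation step itself is trivial once taken out of the command machinery. The sketch below covers only the first bullet above, under several assumptions: the `$version` and `$build_time` placeholders, the `render_template` helper and the output path inside the build tree are all invented for the illustration, and `string.Template` is simply one convenient way to fill a template.

```python
import os
import tempfile
import time
from string import Template


def render_template(template_path, output_path, values):
    """Fill $placeholders in template_path and write the result to output_path."""
    with open(template_path) as f:
        rendered = Template(f.read()).substitute(values)
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w") as f:
        f.write(rendered)
    return rendered


with tempfile.TemporaryDirectory() as root:
    template = os.path.join(root, "config.pkg.in")
    with open(template, "w") as f:
        f.write("version = $version\nbuild_time = $build_time\n")

    # Hypothetical location inside the build tree.
    output = os.path.join(root, "build", "lib", "pkg", "config.pkg")
    rendered = render_template(
        template,
        output,
        {"version": "1.0.0", "build_time": time.strftime("%Y-%m-%d %H:%M:%S")},
    )
    generated_exists = os.path.exists(output)

print(rendered)
```

Everything else in the list (choosing the right install location, getting the template into sdist, making the binary installers aware of the file) is where the difficulty lies, and none of it is addressed by this snippet.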

To deal with this correctly, the whole command business needs a significant
redesign. Several extremely talented people in the scipy community have
independently attempted to improve this in the last decade or so, without any
success. Nothing short of a rewrite will work there, and commands constitute a
good third of distutils code.

Build customization

distutils2 does not improve the situation w.r.t. building compiled code, but I
guess that’s relatively specific to big packages like numpy, scipy or
pywin32. Needless to say, the compiler classes are practically impossible to
extend (they don’t even share a consistent interface), and very few people know
how to add support for new compilers, new tools or new binaries (ctypes
extensions, for example).
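A small sketch of what the lack of a consistent interface means for anyone writing generic code on top of the compiler classes. The two classes below are hypothetical stand-ins, not the real distutils classes: they only reproduce the attribute shapes mentioned earlier (a string in `cc` versus a list in `compiler_so`), and the executable names and flags are made up.

```python
class ToyMSVCCompiler:
    """Stand-in for MSVCCompiler: the executable is a plain string in `cc`."""

    def __init__(self):
        self.cc = "cl.exe"


class ToyUnixCompiler:
    """Stand-in for most other compiler classes: `compiler_so` is a list."""

    def __init__(self):
        self.compiler_so = ["gcc", "-fPIC"]


def compiler_command(compiler):
    """Return the compile command as a list, whatever the class looks like."""
    if hasattr(compiler, "compiler_so"):
        return list(compiler.compiler_so)
    if hasattr(compiler, "cc"):
        return [compiler.cc]
    raise AttributeError("unknown compiler interface: %s" % type(compiler).__name__)


msvc_cmd = compiler_command(ToyMSVCCompiler())
unix_cmd = compiler_command(ToyUnixCompiler())
print(msvc_cmd)  # ['cl.exe']
print(unix_cmd)  # ['gcc', '-fPIC']
```

Any helper of this kind has to probe for attributes class by class, and a new compiler class with yet another convention falls through to the error branch.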

Overall, I don’t quite understand the rationale for distutils2. It seems that
most setuptools-standardization could have happened without breaking backward
compatibility, and the improvements are too minor for people with significant
distutils extensions to switch. Certainly, I don’t see myself porting
numpy.distutils to distutils2 anytime soon.

[1]: it should be noted that most setuptools issues are really distutils
issues, in the sense that distutils does not provide the right abstractions to
be extended.


7 responses to “A few remarks on distutils2”

  1. email

    I’m coming to Python from the Ruby community. This past month I’ve been trying to get my head around how Python software is packaged; setuptools, distribute, distutils, easy_install, pip, etc. I’m not impressed so far. My biggest complaint is the inability to uninstall packages. Also I’ve not found a way to search PyPI from the command line.

    Ruby gems were created outside of the Ruby core group. Once it was mature enough it became part of the Ruby distribution. I suppose the Python community could at least have a look at it to get some inspiration. It works so much nicer than what I’ve experienced so far.

    I do hope the tooling improves. :)

    • cournape

      Well, actually, if python-specific packaging and uninstall is what you care about, distutils2 will improve the situation compared to the current mess.

      But although language-specific packaging solutions have their place, I am much more interested in integration with native ones myself, and I believe they solve many more issues than most people pushing setuptools/distribute believe.

    • I think looking at Ruby’s packaging is great. How does Ruby deal with compilation/compiler choices/compiler options across platforms when the gem contains a mixture of Ruby and native code? I have actually found it easier to compile libs for various platforms we support separately and then just include them as files in a Python egg. I can then install into one clean shared location.

  2. Kurt Smith

    Long-term, do you plan to replace numpy.distutils with Bento + pluggable-build-system? I went with waf for fwrap’s build needs and I am very happy with it–a major improvement over the mess that is extending distutils.

    I have not used waf with Bento yet, but will try to do so if I can scrape together a spare afternoon.

    • cournape

      Yes, exactly. Waf is pretty cool, although a bit too magic in some parts for my taste. Overall, it has the right balance between simplicity and power IMO. I looked a bit at bento/waf integration a while ago (http://github.com/cournape/Bento/blob/master/examples/hooks/waf/bscript), but the example is most likely broken – it uses waf 1.5, and bento has significantly changed since.

      Right now, bento + yaku already builds numpy and scipy on the most significant platforms, and I would say it is already usable for developers (it can build numpy/scipy much faster than distutils and numscons, up to one order of magnitude thanks to the ease of setting up debug builds).

  3. Pingback: Distutils 2 alpha 4 – work in progress « Fetchez le Python
