Ian Bicking: the old part of his blog

Re: Packaging Python

Luckily I don't think that Debian is going to listen to you about this :-) but if they did, I think it would be a disaster waiting to happen.

Basically, what you're saying is that any time you want to integrate 2 python libraries on a given system, you should have to create two entirely new "virtual installations". If everyone writes libraries this way, why bother with namespaces? I know that in my "virtual installations", "n.py" is the networking module. It's a lot less work to type "import n" than to create an __init__.py, a package directory, and do 'from my_networking import n'.
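
(For concreteness, the two layouts look roughly like this, using the names from above:

    n.py                          # flat module: import n

    my_networking/
        __init__.py
        n.py                      # package: from my_networking import n

The package form costs you the directory and the __init__.py.)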

Now, working-env.py looks valuable. In fact, it looks identical (from what I can see) to Divmod's Combinator, something I wrote myself to handle similar deployment issues. However, real installations and real packaging are going to beat this kind of ad-hoc code slinging any day. The fact that they're still more work is unfortunate, but in the end it's worth it to have all your code in one place so it's easily loadable from one interpreter.

It sounds like when you boil it down, this is the argument between static and dynamic linking. Dynamic linking may be less important due to disk-space concerns today, but it is still clearly the superior option for security reasons. If ten of your "working environments" have the same library installed and it has a security flaw, it is going to be a lot more work to make sure they're all properly updated (by hand, by copying files) rather than having your distro install a new (but tested to be compatible by distro QA) version of the library.

Comment on Packaging Python
by Glyph Lefkowitz

Comments:

Basically, what you're saying is that any time you want to integrate 2 python libraries on a given system, you should have to create two entirely new "virtual installations". If everyone writes libraries this way, why bother with namespaces? I know that in my "virtual installations", "n.py" is the networking module. It's a lot less work to type "import n" than to create an __init__.py, a package directory, and do 'from my_networking import n'.

I don't know what you mean by "integrate 2 python libraries". Libraries don't integrate. They are integrated. What is integrating them? A developer, or an application. If it's a developer, they should be doing it in a development sandbox. If it is an application, then it is a package.

Underlying the system-level packages are python-level packages. I'm not arguing against Python packages (though my original post didn't make the distinction as it should have). In this model a system-level package (e.g., an rpm) contains a set of python-level packages. There is some kind of point of entry into this set of packages; typically it will be a script that changes sys.path. In other environments where there isn't an executable point of entry, I'm not sure how the activation happens. For plugins it is a little fuzzy too, but plugin systems are already fuzzy in these packaging systems.
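
As a rough sketch of what that point-of-entry script might look like (the paths and the application package name here are hypothetical):

    #!/usr/bin/env python
    # Hypothetical launcher installed by the system-level package; it puts
    # the bundled python-level packages on sys.path and then hands control
    # to the application's real entry point.
    import os
    import sys

    here = os.path.dirname(os.path.abspath(__file__))
    sys.path.insert(0, os.path.join(here, 'lib', 'python'))

    from myapp.main import main   # hypothetical application package
    main()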

However, real installations and real packaging are going to beat this kind of ad-hoc code slinging any day. The fact that they're still more work is unfortunate, but in the end it's worth it to have all your code in one place so it's easily loadable from one interpreter.

How? Why is it easier? It's a heck of a lot more implicit. It's a heck of a lot harder to debug. It's harder to control dependencies. It's harder to develop, harder to branch. We have a modest number of tools that work well with the current system; more tools than we currently have to work with a new isolated system. /usr/bin/python is just one tool, it's not something magical. These tools are the only advantage I see to the status quo. But it's not an impressive number of tools on either side.

"Real" packaging leaves many problems unsolved. It doesn't solve the problem of different applications requiring different versions of code. The packaging systems will whine and complain and get in your way if you have conflicting requirements, but they won't help you in any way. "Real" packaging systems force you to upgrade even when you don't want to, they force you to create a globally consistent environment. This is everything that pisses deployers off. This is the style of development that makes it hard for Debian to make releases. "Real" packages are an enabler for coupled software, and promote cultures that are getting in the way of building properly decoupled software.

It sounds like when you boil it down, this is the argument between static and dynamic linking. Dynamic linking may be less important due to disk-space concerns today, but it is still clearly the superior option for security reasons. If ten of your "working environments" have the same library installed and it has a security flaw, it is going to be a lot more work to make sure they're all properly updated (by hand, by copying files) rather than having your distro install a new (but tested to be compatible by distro QA) version of the library.

To me that sounds like bad tool support being a justification for complex architecture.

Honestly, why is it so much worse if 10 packages have to be upgraded in response to a security flaw instead of 1? Because we can't keep track of what libraries are embedded in what packages? That doesn't seem hard to resolve. Because computers are incapable of doing repetitive tasks? Because the network bandwidth is so valuable? It doesn't make sense to me.

I'm not arguing that everything should be entirely ad hoc, though for my own uses system-level packages are not useful. I'm arguing that developers use libraries, but deployers never do. Current systems kind of suck for developers. And to the degree they don't suck for developers, they suck for deployers. It's a back-and-forth where neither is happy. So I'm saying developers shouldn't bother using system packages for libraries at all. Setuptools and sandbox environments are already a much better experience for them. And deployers don't care about libraries, so we shouldn't waste our time trying to expose that level of granularity to them.

# Ian Bicking

Okay, I can see where you're coming from. The deployment system within my company uses a "virtual environment" or "sandbox" model, and it does have its advantages. However, it also creates its own complexity and coordination problems, which I think are easy to underestimate if you haven't used such a system on a large scale. If I'm a library developer in this system, it can be quite hard to get updates to my library deployed to all the relevant application environments, even with the help of an extensive system for tracking deployments and dependencies.

Ubuntu uses the same packaging system (and almost all of the same packages) as Debian, and they have no problem releasing on schedule. Debian does have a problem (note: I am a Debian developer), but it's not a technical problem and does not have a technical solution.

# Matt Brubeck

First of all, let me be clear that there is a point of agreement here. I think there is a happy medium between what you're saying and the status quo. Right now, per-user installation is screwed, and there is no generally accepted facility for virtual installations. site.py only reads user-installed directories on OS X, for some reason. That sucks. There are many times where virtualized installations are really handy, and certainly per-user installation is important (at the very least, if your installer does not have system-administrator privileges), and it is one of Python's strengths that this kind of configuration is relatively easy to do from the interpreter itself, from any application. The fact that there are no standard or accepted conventions for doing this is unfortunate, especially in Python, where it is only a matter of convention; all the technical problems are solved.
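
For example, any application (or a shared site hook) can add a per-user library directory with a couple of lines; a minimal sketch, with the directory name just an example:

    import os
    import site

    # Hypothetical per-user library directory.  site.addsitedir() also
    # processes any .pth files found there, just like site-packages.
    user_libs = os.path.expanduser('~/lib/python')
    if os.path.isdir(user_libs):
        site.addsitedir(user_libs)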

I'm arguing that developers use libraries, but deployers never do.

If that's the core argument, then I'll argue against just that and I won't bother with defending the other things I was saying :). Debian developers (who I think are generally considered authoritative on the topic of deployment, at least) have already responded here saying why deployers care about libraries. However, another good indication of the fact that deployers deal with libraries directly is the phenomenon of library configuration; for example, /etc/fonts/fonts.conf - someone deploying a GNOME desktop will edit that file, and it doesn't configure the X server "application" - it is a common configuration file read by every application that uses fontconfig, generally invoked by the Xft library. There's also a pile of library-related configuration in /etc/gnome* and /etc/gconf. If fontconfig were installed for every application, first of all, there would be a lot of copies of fontconfig:

% apt-cache rdepends libfontconfig1 | wc -l

1091

and second of all, it would be a package management nightmare making sure that all 1091 applications on this system with their own copy of fontconfig were able to read the same configuration format.

(By the way: fontconfig is about 1M all told, so although disk space isn't as big a deal as it used to be, all those packages would have 1G of just copies of fontconfig. Once you add in the inevitable X and Gnome dependencies in each package too, that number would explode into impracticality really fast.)

Speaking in terms of the strength of tools, good libraries are tools in their own right, not just for developers. Deployers can use them to configure and customize large groups of applications at a time. In the best case, a library can allow you to tweak its behavior independently of an application, to affect an application's behavior (or many applications' behavior) without the applications having explicitly coded any support for it.
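
In Python terms the pattern is just a library reading shared configuration on its own, so every application that imports it picks up the deployer's settings without any application-level code. A sketch, with the file and environment variable names invented:

    import os

    _DEFAULT_CONF = '/etc/mylib/mylib.conf'   # hypothetical system-wide config

    def load_settings():
        # Inside the library, not the application: the deployer can edit the
        # system file (or point MYLIB_CONF elsewhere) and change the behavior
        # of every application using this library at once.
        path = os.environ.get('MYLIB_CONF', _DEFAULT_CONF)
        settings = {}
        if os.path.exists(path):
            for line in open(path):
                line = line.strip()
                if line and not line.startswith('#'):
                    key, _, value = line.partition('=')
                    settings[key.strip()] = value.strip()
        return settings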

# Glyph Lefkowitz

However, another good indication of the fact that deployers deal with libraries directly is the phenomenon of library configuration; for example, /etc/fonts/fonts.conf - someone deploying a GNOME desktop will edit that file, and it doesn't configure the X server "application" - it is a common configuration file read by every application that uses fontconfig, generally invoked by the Xft library.

That's an example of a situation where discovery and registration of resources is necessary. That is definitely a problem with bundled software, though it is not trivial in a centralized system either. /etc/fonts/fonts.conf has policy associated with it, and various scripts that do things to that file -- all of which must work properly for the entire system to work properly -- and all as an augmentation of the (fairly slow) index of package metadata that exists elsewhere.

It's harder with software bundles, but I don't think that isolated installs need to be entirely isolated from the system either. I think it's better to start from a default of full isolation and add conventions from there. It's not easy regardless of what you are doing.

(By the way: fontconfig is about 1M all told, so although disk space isn't as big a deal as it used to be, all those packages would have 1G of just copies of fontconfig. Once you add in the inevitable X and Gnome dependencies in each package too, that number would explode into impracticality really fast.)

Perhaps caching of shared content needs to be a central concept of this. Something I've meant to add to working-env.py, for instance, is a way of linking in libraries from elsewhere -- probably using Setuptools .egg-link files (which are largely equivalent to platform-independent symlinks). In that case it would be opt-in sharing -- which is better than implicit sharing without any opt-out option at all (except for careful manipulation and stacking of the entries on sys.path). But a more implicit sharing of resources that are identical would also be possible.
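
An .egg-link file is essentially a tiny text file whose first line is the path of the linked project directory. Resolving one is roughly this -- a sketch of the idea, not setuptools' actual code:

    import sys

    def add_egg_link(egg_link_path):
        # The first line of a Foo.egg-link file is the directory that
        # actually contains the Foo package (e.g. a development checkout).
        f = open(egg_link_path)
        try:
            target = f.readline().strip()
        finally:
            f.close()
        if target and target not in sys.path:
            sys.path.insert(0, target)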

I don't think fontconfig is actually an example of a resource at all, but the problem certainly exists. I'm also not sure how far down to push this isolation. Right now most Python applications depend on a set of libraries that mostly should be bundled. Should everything be bundled everywhere? Thinking about what that would mean would be an interesting thought experiment ;) I'm not even sure what it would look like.

Deployers can use them to configure and customize large groups of applications at a time. In the best case, a library can allow you to tweak its behavior independently of an application, to affect an application's behavior (or many applications' behavior) without the applications having explicitly coded any support for it.

Applications should delegate to their component libraries when possible and reasonable, and let information pass down. This usually has nothing to do with the packaging used. Applications would still use libraries, and those libraries can still look at their environment; nothing changes with respect to that.

# Ian Bicking

Applications should delegate to their component libraries when possible and reasonable, and let information pass down. This usually has nothing to do with the packaging used.

In fact it does. Default configuration is very much part of the library package, at least on Debian.

Applications would still use libraries, and those libraries can still look at their environment; nothing changes with respect to that.

The thing that changes with respect to that is that, within a compatible version of a library, the format of the system configuration for the library may change, or features may be added, without necessarily alerting applications to that fact. Or an entirely new version might be released which provides a compatibility layer.

Now, there are ways to design around this, future-proofing your format etc, but demonstrably library authors do not always do this. Configuration formats do change, and will continue to change whether application authors start bundling everything under the sun with their application or not.

Right now the user experience of this is: you upgrade the library, and Debian prompts you if you want to upgrade your system config file. You (and your users) have to infer that you also must upgrade files under ~/. as well, but at least after one upgrade to that file you're done with it. If every application packages every library, all of a sudden you've got 1091 copies of fonts.conf, under /etc/gaim/fonts/fonts.conf, /etc/gimp/fonts/fonts.conf, and so on, and you have to track the version of fontconfig used by every single one of those apps manually. Even if you make no modifications, the package maintainer for each of those applications suddenly has to become a fontconfig expert, whereas before they didn't even have to know this file existed.

If you include .egg-link files with your application that "link" to other libraries, how is that different from an import statement "linking" to another library? It's just adding additional work to your import lines. What if I want to write a plugin for application X which imports a library from application Y? What is my "application"? How do I install under package X in such a way that I can then "link" to package Y? Perhaps each project should also come with an XML config file which describes all its dependencies? The pygtk project had thoughts along these lines before, and their solution has mainly made people unhappy: http://www.tortall.net/mu/blog/2006/01/18/pyxyz_require_sense
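
(For reference, the pygtk mechanism mentioned above asks calling code to declare a version before gtk is first imported:

    import pygtk
    pygtk.require('2.0')   # must run before "import gtk" selects a version
    import gtk

which is roughly the kind of per-import bookkeeping I'm objecting to.)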

I've been talking a lot about random C libraries as if they might be in Python (and I hope in the future more will be), but let me speak directly and practically about Python as it is now. The status quo may not be perfect, but it effortlessly allows me to write Python plugins for nautilus or gaim which import twisted, gtk, vte, sqlite, ogg, BitTorrent, or any other library on my system. It seems you want to break that by unbundling every library (I have over 100 Python libraries installed through Ubuntu's packaging system, and a half-dozen installed in my user environment) from my system, putting them into the applications which use them, and making me re-install or re-declare the use of those libraries in my "working environment" for the plugin, all apparently to prevent some hypothetical breakage. (Is the plugin an "application"? How does its environment differ from that of the (usually non-Python) application it's hosted in?)

To get to the bottom of this, though: what's the real problem you're trying to solve here? Is it just making side-by-side installation so that applications don't break when subtly different versions of libraries are installed?

# Glyph Lefkowitz