Ian Bicking: the old part of his blog

site-packages Considered Harmful

The more I've worked with installation and deployment and setuptools, the more I think site-packages (as in having /usr/lib/python2.4/site-packages on the path) is a bad idea, and that it should be empty and maybe not even on sys.path at all.

Why should anything get installed for the "site" aka "the computer"? The standard library gets that treatment, but only kind of, since there is actually a separate standard library for every major Python version, and those are kept carefully separate.

And the standard library is certainly not developed like most other libraries in use. It gets only bugfixes between major releases, and even major releases pay considerable attention to backward compatibility. There are a few libraries that are similarly stable, like mx.DateTime, but not that many. (And in practice it is okay to put that kind of highly stable and infrequently-updated library in site-packages.)

For everything else site-packages is just a mess. What happens when you upgrade a library there? Who knows... maybe nothing (if there's another package earlier on the path that covers up the package in site-packages). Or maybe several packages will now need upgrades, as they depended on an interface that has now changed. Or maybe the system is just broken, because upgrades aren't available for everything that depended on the library.

And the whole thing is very implicit. Debugging import errors and sys.path is a pain in the butt. It leads to problems of software appearing to be fixed or upgraded or edited, when the actual code being loaded isn't what you intended. There are a lot of different things you can do with $PYTHONPATH, sitecustomize, .pth files, and other sys.path modifications. The order in which you do things can matter a lot. This all sucks.
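To make the "which code is actually being loaded?" problem concrete, here is a minimal sketch of the kind of check one ends up doing by hand. It uses `importlib.util.find_spec` from the modern standard library (which postdates this post); the helper names `where_is` and `path_report` are made up for illustration.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


def path_report():
    """Return the sys.path entries, numbered in the order they are searched."""
    return ["%d: %s" % (i, entry) for i, entry in enumerate(sys.path)]


if __name__ == "__main__":
    # Shows which copy of a module wins, and the search order that decided it.
    print(where_is("json"))
    print("\n".join(path_report()))
```

If the printed location is not the copy you just edited or upgraded, an earlier entry in the path report is shadowing it.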

Instead there should be a global installation of the standard library, plus those boring/stable libraries that can practically be considered "standard". And that's the only globally installed thing at all. Everything else is only installed in a specific context.

What is that context? I'm not exactly sure. I think all the pieces are in place to make the technical issues easy to resolve, and to make it easy to manage and work with localized contexts. But exactly how the user/developer works with these pieces, I'm not yet sure. But I am pretty sure that once we ditch site-packages completely -- get rid of the crufty old highly-coupled installations that regularly confound even the most experienced Python developers -- a whole lot of things surrounding installation will suddenly become a lot easier.

Created 08 Feb '06

Comments:

IMHO, what is worse is Python based web applications which try and store the modules corresponding to distinct web pages into the standard Python module system. Throw in concepts like automatic reloading of the modules generating a web page when the module code file on disk changes and you have a good recipe for disaster.... ;-)

# Graham Dumpleton

FYI, I've found your virtual-python.py script invaluable.

# Chris McDonough

site-packages should be used for system-wide, stable packages. I try to use mostly things that are in Debian. That way I am pretty sure the package is useful, stable, easily installable (and upgradable, removable), and of a certain level of quality.

I recommend any finished package go through the getting into debian experience. It is a good peer review, and an extra set of bug testers.

Requiring root to install random-buggy-incomplete-python-package is quite bad too.

Have fun!

# Rene Dudfield

The more I've worked with installation and deployment and setuptools, the more I think site-packages (as in having /usr/lib/python2.4/site-packages on the path) is a bad idea,

setuptools is the problem, not site-packages. You're trying to do the work of a distribution -- managing having compatible libraries and applications all on one system. I was quite literally angry when I installed recent versions of SQLObject and saw that config.py was all of a sudden deciding to download setuptools, then setuptools downloaded formencode. It gave me a 15 second window, but that's no excuse for doing something you should never ever do. It was almost infuriating enough to stop using SQLObject, and I do think that setuptools should be banned.

Here's the thing, Python tools shouldn't address problems already solved. The issue of handling multiple versions of libraries on a system is one that is far outside the scope of Python, and exists for pretty much every library and system. There are also solutions outside of the scope of Python -- the biggest being (1) naming major version differences, like pysqlite2, as modules themselves, (2) using a distribution. If your library isn't backwards compatible, put the version in its name -- problem solved.

As for the rest of it, apt-get, RPM, emerge -- all these tools manage dependencies and conflicts very well, and they work for every application, not just Python ones. They'll always do a better job than setuptools, and they handle all development platforms, unlike setuptools. Setuptools is a horrible, categorically bad idea. Stop using it right now.

Two cents.

-Ken

# Ken Kinder

Distributions aren't doing this work, they aren't trying to do this work, and they are so far from doing this work that it is laughable to expect any resolution to come from that direction. They aren't even trying!

What major packaging system allows for multiple parallel installations of the same library? Not just in Python, in anything? Certainly Debian does not. The only way they do it is by changing the name of the package itself, and then because of site-packages they not only have to change the name of the distribution package, but the actual Python package. That's not something you do incrementally or lightly.

Current packaging systems are good at maintaining the global state and coupled software we've already created. That is good, because that software can be hard to maintain otherwise. But in fixing that problem they introduce many limitations, limitations that anyone who has used those systems knows about. And honestly, they don't offer that much in return. How often do you find the library you want available in a current version? In my experience it is quite uncommon.

Maybe I would be more optimistic about current distribution formats if I felt people involved in these packaging projects -- Redhat, Debian, Ubuntu developers, and so forth -- were trying a little harder to address these problems. (Well, credit to Ubuntu for trying harder than the others.) But I only find out when a library I write shows up in a distribution indirectly, and I don't get any feedback about synchronizing build processes, creating consistent metadata, or anything else. I don't see them trying to push for better standards about how packages get installed, or providing general feedback on these issues.

And distributions themselves would save a lot of work by going down exactly the path I describe. When there's a Python app you want to install, just package it up as one big bundle, and don't try to separate it out into its respective libraries. That saves everyone time and effort.

# Ian Bicking

Plus, waiting for distributions to solve it introduces a massive time lag. I'm perpetually annoyed with installation of packages, because I really don't like putting non-distribution packages into distribution directories like /usr. Putting things in /usr/local still means, essentially, that you need to be root, and giving one particular project a Python module that it needs is a massive pain. I too use virtual-python.py, and it's a solution, but (and do please take this the way it's meant) it's a nasty hack rather than a proper way to solve the problem. It ought to be easy to, say, drop TurboGears stuff in a directory of my project and have it get used, and it completely is not easy. Setuptools does not help here one little bit, because it thinks it's managing the One Central Python and has to be contorted into a per-user basis; it can't do per-project installation of extra packages without further contortions.
# Stuart Langridge

What major packaging system allows for multiple parallel installations of the same library? Not just in Python, in anything?

I'm not sure if you would consider it a "major packaging system," but GoboLinux does this fairly well. It installs each package in its own directory and links to the current 'default' version. But, other than that, I haven't seen anything.

MWM

# Matthew Marshall

What major packaging system allows for multiple parallel installations of the same library? Not just in Python, in anything?

Gentoo's portage has SLOTs. AFAIK this is exactly it. I have many standard libs in multiple versions on my system.

With Portage different versions of a single package can coexist on a system. While other distributions tend to name their package to those versions (like freetype and freetype2) Portage uses a technology called SLOTs. An ebuild declares a certain SLOT for its version. Ebuilds with different SLOTs can coexist on the same system. For instance, the freetype package has ebuilds with SLOT="1" and SLOT="2" [1].
[1]http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=2&chap=1#doc_chap4
# Fabian

I don't know if it counts as a "major packaging system" (I guess not if you're referring to OS systems), but RubyGems seems to be designed to do just this:

http://onestepback.org/articles/rubygemsfacets/multipleversions.html

But it of course requires doing things with require since it's not built into the language (at this point). I can imagine a similar tack in Python giving people fits.

# ToddG

Setuptools has a very similar scope to Gems, with very similar motivations, and probably several similar techniques.

# Ian Bicking

I should also note that there is work on this kind of isolation in a general way, in the form of people setting up full OS virtualization. This is kind of heavy handed, but then that's where the work is happening, to make this a reasonably efficient and manageable way to deploy things. It still can't be as granular as a per-script setup. But it provides many of the same advantages on a larger scale.

# Ian Bicking

The idea is less to have every single version of a library installed that each package might have been developed with. The entire purpose of shared libraries is that the entire system uses one library. You can always just bundle your own private libraries with your application, which is kind of wasteful on disk space, but useful when you don't know the destination environment...

# Ken Kinder

Debian tries to limit the number of separate versions of a library in order to avoid the need to update many different versions when (I daren't say if) there is a need for a security patch. If the upstream developer keeps breaking source compatibility between versions, the library probably doesn't belong in Debian (or in any non-toy application).

# Ben Hutchings

So what you're saying is that Zope is on to something with its software home and instance home? For those uninitiated, a Zope instance is what you would start and can respond to requests. A single piece of installed Zope software can be used to create multiple instances. Packages can either be placed in a particular instance (instance home), in which case they're instance specific, or in the software home, in which case they are available to all instances of that version of Zope. Rarely packages are placed in site-packages.

New versions of Zope installed result in their own software homes, and can have their own instances.

So what we could do is formalize some of these concepts and create a more generic way to do all this going beyond Zope. Zope can then fit into that pattern.

# Martijn Faassen

Yes, very much like the instance home. Jim was just asking about how that and setuptools would interact recently as well, and Phillip was talking about local plugin directories, and I realized site-packages was just becoming a global plugin directory, and that this global plugin directory causes me nothing but pain. So figuring out how to use setuptools with Zope instance homes, and then how this interacts with plugins and development and other use cases, could lead to a much more pleasant experience for everyone.

One of the issues with the Zope instance home is that Zope "owns" the home, more or less. It specifically is not part of the home. But every other piece of software is part of it. But there are many situations where no one piece owns the working set of packages, it is something more abstract. But if you install a command-line tool, I would expect it to be contained by the home similarly to how anything else might be contained. But if you are installing a web application that uses Paste, I would expect PasteScript (and the paster command) to be part of that same home, and not have a home of its own.

So I think these instance homes -- or "working sets" -- need to be more abstract than belonging to any one product. For some packages -- Zope being notably large (nay, absolutely huge) -- you also don't literally want a complete copy of the package in each instance home. So we have to consider how these can be centrally managed in some fashion. I think setuptools .egg-link provides the tool for saving disk space. The reporting tools to manage all the links and instance homes are still a big deal, and don't exist yet.

# Ian Bicking

+1 on the idea of INSTANCE_HOME. However, I think you do want a globally-installed executable that "owns" the instance. Think of the instance as a word processing document, and the executable as your word processor. Files and applications, man. The zen is in defining a generic-yet-coherent problem domain for the application: word processing, spreadsheets, presentations.

<horn-tooting>

httpy hits the generic-yet-coherent sweet spot for the "website" problem domain. The httpy executable "owns" an instance. Instances can contain multiple sub-apps, each with their own libraries located in situ. So, for example, a single site could integrate a CMS written with Zope 3, a Roundup issue tracker, and a custom contact form, let's say. You also get sitewide in- and outbound hooks for centralized security, templating, etc.

Full docs at http://www.zetadev.com/software/httpy/.

</horn-tooting>

"[Y]ou also don't literally want a complete copy of the package in each instance home."

Why not? Disk space is definitely not the limiting reagent.

# Chad Whitacre

Totally agreed.

One of the big things I have to deal with are black-box "appliance" installations of hosts. In this case, my application should not have to modify site-packages to install things it needs (like sqlobject, postgresql packages, etc) on the remote host. Once I modify anything outside of my $CWD (i.e: the installation path) I have altered the system in such a way that in all future installations I have to walk down into that site-packages directory and check for versions/updates/etc.

I typically work around this by dropping everything into a /lib directory of my app - and then hacking sys.path to suck in the directory relative to my application's path. This makes execution outside of that directory difficult as I have to assume that /lib is relative to the binary location. I realize there are other ways around this - but not having something built into the interpreter that says "I am only going to check dir X for modules (of a set revision)", and instead forcing users/applications to hack with sys.path and $PYTHONPATH, is sort of kludgy.
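For readers who haven't seen this workaround, a minimal sketch of it follows. The helper name `add_local_lib` and the default directory name `lib` are assumptions for illustration, not anyone's actual code from this thread.

```python
import os
import sys


def add_local_lib(anchor_file, dirname="lib"):
    """Put <directory containing anchor_file>/<dirname> at the front of sys.path."""
    base = os.path.dirname(os.path.abspath(anchor_file))
    lib_dir = os.path.join(base, dirname)
    if lib_dir not in sys.path:
        # Front of the path, so bundled copies win over anything installed globally.
        sys.path.insert(0, lib_dir)
    return lib_dir


# In the application's entry script one would call, before other imports:
#   add_local_lib(__file__)
```

Anchoring on the script's own location rather than the current working directory is what lets the script run from elsewhere, but it still bakes in the assumption that the lib directory sits next to the entry script.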

I don't have any good alternatives - but I do know that working with an appliance where I shouldn't be modifying anything outside of $MYDIR, and potentially with multiple versions of Python on the same machine (2.3/2.4), is really frustrating with the current site-packages methodology.

# Jesse

But site-packages IS empty!! Stuff ends up there when users put it there! I don't see what the problem is? If you have trouble with site-packages, why do you install stuff there?

Personally, i usually bundle third-party modules with my software. Some things that are more "standard library-like" and that i trust to be stable i install through the packaging system. Kind of like what you propose. I've been doing this for years without problems. So i don't see what you are complaining about.

And frankly, a lot of your arguments are not really specific to python, and can be applied to /usr/lib and shared libraries in general. So this is just the old static versus dynamic linking argument all over again, and the same arguments apply.

# Fredrik

And on the flip-side -- if you don't want to trust site-packages, your startup script can rip it out of sys.path...
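A minimal sketch of what such a startup-script guard might look like. The helper name `without_site_packages` is hypothetical, and treating Debian's `dist-packages` the same way as `site-packages` is an assumption of this sketch.

```python
import sys


def without_site_packages(paths):
    """Return a copy of paths with site-packages/dist-packages entries dropped."""
    blocked = ("site-packages", "dist-packages")
    kept = []
    for entry in paths:
        # Normalise separators so Windows-style paths are handled too.
        parts = entry.replace("\\", "/").split("/")
        if not any(part in blocked for part in parts):
            kept.append(entry)
    return kept


# At the top of a startup script, before importing any third-party code:
#   sys.path[:] = without_site_packages(sys.path)
```

Starting the interpreter with the `-S` option, which skips importing the `site` module, is another way to get a similar effect without editing sys.path by hand.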

# Patrick Maupin

+1 to Fredrik! I also have to throw in agreement with Ken that the whole setuptools thing is quite frustrating when it doesn't work. I actually chose to write a scraper with Perl's WWW::Mechanize over the Python equivalent because setuptools did not work correctly on OS X 10.4.4. It simply failed, downloading the same packages over and over, never recognizing that things were installed. This is better than distutils how?! I like Python. I prefer Python. It physically hurt that I had to use Perl to get my job done.

Frankly, I really don't care where things are installed as long as it's consistent, customizable, and usable. setuptools and ez_install failed for me. It may have been designed to be easy to use, but it wasn't consistent and certainly wasn't usable. Perhaps in time, they'll work the bugs out. However, I still prefer having the option to use simple distutils setups to install my Python software.

# Chad

Does this not work for you?

python setup.py easy_install --no-deps .

I'm curious why not. (Just didn't know it existed perhaps?... it is documented in the very module you're complaining about, but perhaps I should make it more prominent.)

# John Lee

Two other points:

  1. The project you complain about is mine and hasn't been releasing very often recently (I must fix my crappy release scripts). Also, the use of setuptools by that project is new. So, I put my hands up and admit that the failure could easily be my fault, not setuptools'.
  2. If you get time, a bug report is welcome! (Send it to me -- if it's setuptools' fault, I'll forward the bug report along)
# John Lee

Certainly a lot of these issues exist outside of Python. However, we can solve them in Python, whether or not they are solved elsewhere.

I can figure this out for myself, and I feel we're getting close to where we need to be at work. The problem though is that, having figured it out, the only way to express this is as a bunch of additions and configurations to the system, some practices, and some guards. I just want us to agree on a good way that people should do this -- it's not a technical issue. And the tools should not just support that practice, but encourage it, with one set of conventions.

Right now one of the reasons some people balk at setuptools is because it solves problems they have already been solving on their own with hacked setup.py files and sys.path hacks. Setuptools breaks a lot of these hacks (it tries really hard not to break too many, but it is inevitable), since it is actually trying to come up with an inclusive way to do these things. I think some people have become comfortable with their own practices, and have forgotten that other people don't use those practices, don't know what those practices are, and don't have any tools to help them go down that path. So, that's what I'm complaining about -- the default distutils configuration, and the default sys.path, and the default site.py, all encourage bad practice. They don't stop you from doing things the right way, but they don't help either.

# Ian Bicking

I have multiple versions of the same Python packages on my machine and I want installations of these things to remain separate in most cases, but I also want it to be easy to use one of them when I need to, and when I need to install them, I want them to be easy to install. To get there now, I create a virtual python for each "context" and just use "setup.py install" that puts the package in the virtual python's site packages dir. I have maybe five or six of these littering one project directory right now (slightly different versions of the same codebase combining different versions of different packages), mostly because it's the simplest thing that works. I can also just delete the virtual python directory and the things that happened to get installed to them just go away, which is nice, because it means I don't need to clean up the "master" site-packages.
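As a rough illustration of this one-environment-per-context workflow, here is a sketch using the standard-library `venv` module. Note that `venv` postdates this discussion and is not the virtual-python.py script mentioned above; the function name `make_context` and the directory names are made up.

```python
import os
import tempfile
import venv


def make_context(parent_dir, name):
    """Create an isolated environment directory and return its path."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False keeps creation fast and offline; this is a choice of the sketch.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    # Several contexts side by side under one (temporary) project directory.
    with tempfile.TemporaryDirectory() as project:
        for name in ("context-a", "context-b"):
            print(make_context(project, name))
```

As with the virtual pythons described above, cleaning up a context is just a matter of deleting its directory.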

One simple thing that could be done that might prevent vpython from being the simplest thing that can work is to have another distutils setup script flag that would just install packages to a particular place. The --home arg to distutils setup scripts is typically a bit confusing because it doesn't work on Windows, and it also appends "lib/python" to the path I give it (IIRC, in older versions, I think it used to do "lib/pythonX.X", which was even worse; I think I stopped using it then). Presumably it appends lib/python to the path because the package might need to install something in a "bin" directory, but that's quite rare in practice.

If there was some flag to a distutils setup script that just said "install all the packages/modules in this distribution exactly where I tell you to on the filesystem and don't guess anything else and don't worry about scripts and non-package data files", that would be helpful in some circumstances.

# Chris McDonough

"If there was some flag to a distutils setup script that just said ..."

+1. My method is basically the same.

# Chad Whitacre

Slightly off-topic, but I really wish more people who use distutils would test that their packages work properly with bdist_rpm. It may not solve the problem for distro packagers, who have their own constraints, but for end-users at the very least it allows easier package tracking and updates. I understand that bdist_rpm isn't the easiest thing to use (the MANIFEST stuff is just ... arcane), but the benefits are pretty big.

# taj