Ian Bicking: the old part of his blog

Versioned Imports

I started thinking about versioned imports tonight (after thinking about the standard library), and thought I'd write down some of my ideas. These ideas are very half-baked.

Here's the syntax I imagine:

import cgi >= 2.3
import foo.bar == 1.0
And so on. In turn the package foo would actually reside in site-packages/foo-1.0. However, the version number in the path would only be part of the import process.

Each versioned module or package would have a variable __supported_versions__ which would be a list of versions this module was compatible with. Thus if foo version 1.1 was backward compatible, it might have a line like:

__supported_versions__ = [(1, 1), (1, 0)]
So if foo 1.1 was installed, import foo.bar == 1.0 would actually work. Thus, when you try to import a module, you find the nearest/newest version of the library that might work, check its __supported_versions__, and if that's not a match, you look again and try another version. This way you can support old versions either by retaining backward compatibility, or by having multiple versions of the module installed. This should keep an explosion of files from occurring, while still allowing multiple versions. (Problems: what if the module incorrectly indicates the versions it supports? What if you depend on internal interfaces which aren't taken into account when evaluating backward compatibility? Maybe import foo.bar is 1.0? And does 1.0 apply to foo or foo.bar?)
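A minimal sketch of that lookup, assuming the version tuples and __supported_versions__ lists have already been gathered from each installed copy (the dictionary shape here is illustrative, not a proposed API):

```python
def resolve(wanted, available):
    """available maps each installed version of a package to its
    __supported_versions__ list (e.g. collected by scanning
    site-packages/foo-1.0, site-packages/foo-1.1, ...).
    Try the newest installed copy first and fall back to older
    copies until one claims support for the wanted version."""
    for version in sorted(available, reverse=True):
        if wanted in available[version]:
            return version
    return None
```

For example, with foo 1.1 installed and declaring support for 1.0, a request for 1.0 resolves to the 1.1 copy; a request for a version nothing supports resolves to None (an ImportError, presumably).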

Note also that all versions are tuples of numbers and strings. Strings (like "2.3") would be parsed into tuples by finding all the contiguous digits and turning them into numbers, and throwing away all punctuation. So 2.3a1 becomes (2, 3, 'a', 1), much like sys.version_info. This makes comparison work correctly (e.g., "1.10" < "1.2" as strings, but (1, 10) > (1, 2) as tuples).
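The parsing rule above is simple enough to sketch in a few lines (this is just one plausible reading of "contiguous digits become numbers, punctuation is thrown away"):

```python
import re

def version_tuple(s):
    """Split a version string into runs of digits and runs of letters,
    discarding punctuation: '2.3a1' -> (2, 3, 'a', 1)."""
    return tuple(int(part) if part.isdigit() else part
                 for part in re.findall(r'\d+|[a-zA-Z]+', s))
```

So version_tuple('1.10') > version_tuple('1.2'), even though the raw strings compare the other way around.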

In turn, you could indicate a default version for imports, if no version was explicitly specified. This would be in sys.default_library_versions, and would be a dictionary of package names to default versions, with "stdlib" for the Python standard library. Thus you might have a value like {'stdlib': (2, 1)} to run in a 2.1 compatibility mode. Explicit versions would always override this. You might also allow an environment variable, like PYTHONVERSION="2.4:zope 3.1" to run in 2.4-compatibility mode, and with the zope package in 3.1-compatibility mode.

There should be some way of annotating compatibility outside of either module. E.g., version 0.9 of foo was written when version 1.0 of bar was released. Version 2.0 of bar is incompatible, but foo wasn't specific about what version should be imported. Since 1.0 is released and over with, the system administrator has to somehow indicate that the default version of bar only for foo 0.9 is (1, 0). I suppose this could be done with something like sys.default_library_versions, except it would be a similar dictionary that would only apply to ('foo', (0, 9)).
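The two tables described above could look something like this (all names and values here are hypothetical, following the sys.default_library_versions idea, with a second table keyed by the importing package and version):

```python
# Hypothetical sys-level tables, per the proposal above.
default_library_versions = {'stdlib': (2, 1), 'bar': (2, 0)}

# Per-importer overrides: foo 0.9 should get bar 1.0 by default.
per_importer_defaults = {('foo', (0, 9)): {'bar': (1, 0)}}

def default_version_for(package, importer=None):
    """Look up the default version for `package`; an override keyed by
    the importing (name, version) pair wins over the global default."""
    if importer in per_importer_defaults:
        override = per_importer_defaults[importer].get(package)
        if override is not None:
            return override
    return default_library_versions.get(package)
```

So an import of bar from inside foo 0.9 would default to (1, 0), while everyone else gets (2, 0).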

It might be possible to do something like this now, just using import hooks. Instead of the syntax using operators, it would look like:

import cgi__gt__2_3 as cgi
import foo__eq__1_0.bar as bar
Where this import hook would look for *__operator__version. Distutils wouldn't be compatible with this, of course, so you'd have to manually install and annotate modules to test it out.

Some advantages: you wouldn't generally need multiple versions of Python installed. You could install multiple versions of libraries. You wouldn't have to worry (as much) about breaking code when upgrading Python or your libraries; and at least it should be relatively clear how to fix problems that occur (easier than it is right now to mess around with sys.path?). In turn, library authors (including for the standard library) could break backward compatibility more freely, knowing that their users have coping strategies.

Thoughts?

Created 09 Oct '04
Modified 13 Aug '05

Comments:

I think about this a lot. I disagree with the way you're conceptualizing it though.

IMHO ">=" is a useless operator for imports. The only useful thing is to say, "I want interface version X" - any package that provides version X may then claim to *be* version X. "1 < x < 5" may be useful for some kind of packaging metadata - for example, to say that versions 1 - 7 claim to implement interface version 1, but only versions 1 - 5 actually do.

This can already be done simply by having a rigorous naming convention for packages, and providing some kind of implementation-hiding for python packages, so you can be sure of some narrow band that your program is implemented in terms of, and that your understanding of what is used publicly is the same as that of the module maintainer.

"Only what is hidden can be changed without risk." A module can only be expressed in terms of one version of an interface - although it may be a locally-defined subset which actually exists in multiple other interfaces, that itself should be encapsulated somewhere, so you don't accidentally import stuff you shouldn't be. If it is expressed in terms of more than one, then you need something like:

try:
    from foo_api_1 import thingy
except ImportError:
    from foo_api_2 import thing1, thing2
    from my_compat_hacks import frobnicate
    def thingy():
        return frobnicate(thing1(), thing2())
# Glyph Lefkowitz

Sorry, but I think the "import foo == 2.0" is possibly the ugliest idea I've ever seen. Should I parse that as (import (foo == 2.0)) or ((import foo) == 2.0)? One way, you should be getting a NameError, and even if you didn't, now modules evaluate to FLOATS? The other way, 'import' becomes some new half-statement-half-expression monster that should frankly be drowned at birth.

One of the real problems, I think, is exactly how the module versioning would work. Are we importing each possible matching module one at a time until we find the one with the right version? That could get expensive, and lord help us if any of those modules have complicated initialization code.

Maybe we'd be better off extending the bytecode to make it easier to pull out a '__version__' variable without importing, and just extending the 'imp' module with a 'find_all_modules' function which returns a list of matching modules we can pick at the bytecode of.
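Something close to this is already possible without touching the bytecode: parse the source with the ast module (rather than scanning compiled bytecode as suggested above) and pull out a literal __version__ assignment without ever executing the module. A sketch:

```python
import ast

def version_from_source(source):
    """Extract a top-level literal __version__ from module source
    without importing it -- no initialization code runs. Returns
    None if there is no literal __version__ assignment."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == '__version__':
                    try:
                        return ast.literal_eval(node.value)
                    except ValueError:
                        return None  # __version__ is computed, not a literal
    return None
```

Note the print in a module's body never fires; only the parse tree is inspected.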

# t e whalen

I am in favor of versioned imports. I do not have a problem with a directive with "==" or any other boolean operator. I somehow think an 'if' needs to be in there, as well as some try/except mechanism. Maybe all built into some easy-to-understand, easy-to-read syntax.

What concerns me most is a standard way to specify the version. With so many places to indicate this, and in so many different ways, I am afraid that it will be very confusing to remember, find, and keep the version. I almost feel that it would be better to have a __version__.py file, much like the __init__.py file, in the root of a module. Then the file would always contain a float of the major and minor version number. Likewise, I do not see a reason why distutils could not generate this file (when it does not already exist) from one of the many other sources. Lastly, it may be helpful to allow the float to be calculated from a single string containing a Revision number, or by other means. Example content of a __version__.py file:

float( "$Revision 1.11 $"[10:-2] )

or just:

1.11

# Brian Ray

Glyph: But there's no way to apply package naming to the standard library, which has no packages (mostly), only modules. I think the standard library needs versioning just as much as other packages... well, it doesn't need it now, but if the standard library is ever going to grow in a way that isn't strictly backward compatible then it will.

Using comparison operators is a little fishy, though I think there's a case for at least an operator for "provides a version", and one for "is a version". Code that is tightly coupled to an implementation (not an interface) is, IMHO, sometimes justified. It's just what happens when you are trying to get stuff done, not necessarily get stuff done the right way. So version requirements don't belong entirely to either the importer or the importee -- it's an agreement, where either side can be more or less strict about it. E.g., a critical bug or security fix should probably be possible to install even when the module is tightly coupled to another module. Or, the importer might determine that one of several versions of the imported package will work, but doesn't respect the imported package's assertion that it provides a certain interface.

So I guess I don't trust simply using conventions, I'd like more than that. Formal interfaces are a good way to think about this, but the reality is that versioning needs to apply to all packages, not just the ones that think about this situation formally. It needs to apply to packages and modules we have right now -- because practically, we will continue to have modules produced in the current style for several years, despite any new conventions that are added. I think we need tools, not just conventions -- a good tool can get stuff done even when other parts of the system aren't assisting it.

TE Whalen: Hopefully we could use the metadata in the directory to get the import right on the first try, most times. A clearer kind of metadata might also be called for -- right now filenames are the only metadata we have short of importing the module and looking for magic variables. This can be problematic; maybe we need a real index of what packages and what versions are installed, and where they are installed. Right now distutils setups are pretty much write-only, they don't seem to keep track of what they've done. I think there would be a number of things to be gained from changing that.
# Ian Bicking

> Thoughts?

I think you're over-engineering it. Go for the simplest thing that works for 99% of cases, and leave the underlying API (__import__, etc.) sufficiently open that folk who want to be obtuse can code up custom importing behaviours themselves. Far as I can tell, the simplest thing that works is to have three version numbers:

- one supplied in your script's import statement:

import foo version 1.1.0

- one supplied in the module/package file/folder name:

foo 1.1.5

- and one supplied in the module/package metadata:

__oldest_supported_version__ = 1.0.5


1. The 'import NAME version NUMBER' statement indicates the module version the script was authored against. (If the script author _knows_ their script will work with an older version of the same module, they may use the older version number instead.)

2. The version number in the module filename allows you to store multiple versions of the same module in the same directory. It is also the first number used when importing a module: the __import__ function should start with the newest module first.

3. When the __import__ function imports a module, it checks the module's __oldest_supported_version__ against the version supplied by the import statement. If the desired version is lower than __oldest_supported_version__, the __import__ function should load the next newest version of the module; and so on, until it finds one that is acceptable.

Notes:

i. If no suitable versioned module is found, import a non-versioned module (if available).

ii. If no version is given in the import statement, import the newest module available.

iii. If a versioned module has no __oldest_supported_version__ attribute, it's backwards compatible with all previous releases.

It's loose, simple, can comfortably co-exist with non-versioned modules and involves no fancy comparison operators or other bloat. The various underlying operations (list filenames and versions of all modules of a given name, compare version numbers, etc.) can be exposed as magic functions for the benefit of those control freaks who simply _must_ bind against version 0.133.45a72r4 and nothing else, but for the vast majority of users the system has a dirt simple interface and in practice simply does 'the right thing'.
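The selection rules above (newest-first, fall back when the request predates __oldest_supported_version__, plus notes ii and iii) can be sketched in a few lines, assuming versions have already been parsed into tuples from the file/folder names:

```python
def pick_version(requested, installed):
    """installed maps each on-disk version to its
    __oldest_supported_version__, or None when the module has no such
    attribute (note iii: compatible with everything). Walk newest-first
    (rule 2); accept the first copy whose oldest-supported bound covers
    the request (rule 3). No request at all takes the newest (note ii)."""
    for version in sorted(installed, reverse=True):
        oldest = installed[version]
        if requested is None or oldest is None or oldest <= requested:
            return version
    return None  # rule i: caller falls back to a non-versioned module
```

With foo 1.1.5 (oldest supported 1.0.5) and foo 1.0.0 (no attribute) installed, a request for 1.1.0 gets 1.1.5, while a request for 1.0.0 falls back to the 1.0.0 copy.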

...

BTW, regarding metadata: Python's module system just plain sucks at it. As T E Whalen points out, you really don't want to have to import modules to get their metadata. Magic attributes are a Bad Thing and should be gotten rid of. Burying metadata in comments at the top of a module (where it can be retrieved by reading the module as a plain text file) would work, but that smells a bit hackish to me; a better option would be to use the package format for all modules and have a standard 'meta.txt' file included in the package folder that contains all metadata in RFC-822 or another human-readable/writable format. This would actually be a good foundation for a unified, decentralised module metadata scheme: a single location within each package containing all the information describing that package, accessible via a standard API, rather than having it spread all over module attributes, distutils scripts, pkginfo files, etc., where it's frequently duplicated and hard to access. This is getting off-topic though and would be better discussed in a separate thread, so I'll leave it there.
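For what it's worth, the standard library can already parse RFC-822-style text, so reading such a meta.txt would be trivial (the field names here are made up for illustration; nothing defines them):

```python
import email

# Hypothetical contents of a package's meta.txt file.
META = """\
Name: foo
Version: 1.1.5
Oldest-Supported-Version: 1.0.5
"""

# email's parser handles RFC-822 header syntax; no import of the
# package itself is needed to read its metadata.
meta = email.message_from_string(META)
```

meta['Version'] then gives the version string without executing any package code.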

Regards
# has

http://claytonbrown.com/cbwiki/ModuleVersioning

And here's one I prepared earlier....

Clay ;-)

The biggest problem I have found to date with this is packages that import their own submodules by absolute, package-qualified name rather than locally,

eg.

foo.bar
8<------8<------8<------8<------8<------
import foo.abc
import foo.xyz
8<------8<------8<------8<------8<------

I solved this in the few incidents I had of this by adjusting the code to

8<------8<------8<------8<------8<------
import abc
import xyz
8<------8<------8<------8<------8<------
# Clayton Brown

An extra 2 cents, I might add,

I feel the biggest problem is establishing a CPAN-like (or PyPI, if you like) repository that allows such versioning, as well as handling binary compatibility issues, and implementing this logic in Python's core through a PEP so that Python innately assesses which versions are specified/available/downloadable etc.

This could allow Python to satisfy its import dependencies on the fly in future, along with moving forward to newer versions automatically, with the ability to stop at any level, warn, halt, etc.
# Clayton Brown

Oh yeah, and wrapping this all up nicely with the PEP for importing Python modules from zipfiles was, I thought, the last logical step that would add to this solution.

OK, so it's been a while since I thought about this problem... I'm a bit rusty.

Clay
# Clayton Brown

On a related note, I like the idea of a 'v' prefix for strings, such that v'1.0_a2' would return either a tuple (1, 0, 'a2') as exemplified in your post, or a value of type 'version'. I think this is a cute, concise way to support arbitrary version formats, but not generally useful enough to warrant the effort. There's also the problem of combinations with the existing string prefixes (do you just disallow them, or do 'vr', 'rv', 'vu', 'uv', 'uvr', 'urv', 'ruv', 'rvu', 'vur', and 'vru' become builtin identifiers?).

Another option would be a builtin 'version' method (or type). version('1.0_a2') is quite clear, and probably not too verbose. Again, this could return either a tuple, or a value of type 'version'.
# ajrw

I really like the perl syntax: require :)
# Jkx