Ian Bicking: the old part of his blog

More on Python Metaprogramming

David Heinemeier Hansson has some comments related to the Snakes and Rubies event. I have more I want to say in reaction to that post, and specifically to his comparison of Django models and Rails models. But I thought I'd at least start with a simple discussion of what can be done in Python. This is his Rails example:

class Person < ActiveRecord::Base
    belongs_to :project_manager
    has_many   :milestones
    has_and_belongs_to_many :categories
end

How do you implement this in Python? Here's the Django syntax:

class Project(meta.Model):
    project_manager = meta.ForeignKey(ProjectManager)
    milestones = meta.OneToManyField(Milestone)
    categories = meta.ManyToManyField(Category)

Here's the SQLObject syntax:

class Project(SQLObject):
    project_manager = ForeignKey('ProjectManager')
    milestones = MultipleJoin('Milestone')
    categories = RelatedJoin('Category')

To me they all look pretty similar. So maybe it's not such a good example. But anyway, here would be the direct syntactic port of ActiveRecord:

class Person(ActiveRecord):
    belongs_to('project_manager')
    has_many('milestones')
    has_and_belongs_to_many('categories')

This is quite hard to implement in Python. There are only two cases I know of that implement this kind of function in a class body -- implements() and advise() in Zope and PEAK respectively. Doing this involves deep tricks, including putting the interpreter temporarily into trace mode and changing metaclasses around.

But it's still hard because belongs_to() in Python won't get the class as an argument (as it does in Ruby, as an implicit self aka @). And it's yet harder because the class doesn't exist; classes in Python only come into existence after their bodies are evaluated.
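A small sketch of those two obstacles, written for Python 3. The `calls` list and the `existed_during_body` attribute are names made up for this illustration; they only record what a function called from a class body actually sees.

```python
calls = []

def belongs_to(*args):
    # Record exactly what the function receives: only the explicit
    # argument, with no implicit reference to the class being defined.
    calls.append(args)

class Person:
    belongs_to("project_manager")
    try:
        Person  # the name is not bound until the body has finished
        existed_during_body = True
    except NameError:
        existed_during_body = False

print(calls)                       # [('project_manager',)]
print(Person.existed_during_body)  # False
```

So anything belongs_to() wants to do to the class has to be deferred until some later point when the class object exists.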

So the semantic (not syntactic) equivalent of the ActiveRecord code would be:

class Person(ActiveRecord):
    pass
Person.belongs_to('project_manager')
Person.has_many('milestones')
Person.has_and_belongs_to_many('categories')

I think there are some Python ORMs that look like this. Basically you are running some class methods, and those methods modify the class. This is not complicated in Python at all, but admittedly it doesn't look as pretty. Some people have proposed that decorators should be allowed on classes, at which point this might look like:

@belongs_to('project_manager')
@has_many('milestones')
@has_and_belongs_to_many('categories')
class Person(ActiveRecord):
    pass

A minor change here is that belongs_to and friends will probably not be in the ActiveRecord class. Also, I think if anything this looks worse than the last example (which unlike this one actually works). Now I think I'm -1 on class decorators.
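To make the comparison concrete, here is a minimal sketch of both forms in Python 3. Class decorators were later accepted into the language (Python 2.6 and 3.0), so the second form can be run there. The `_relations` tuple, the `_add_relation` helper, and the module-level `belongs_to` decorator factory are assumptions made up for this sketch, not the API of any real ORM.

```python
class ActiveRecord:
    _relations = ()

    @classmethod
    def _add_relation(cls, kind, name):
        # Build a new tuple on this class so subclasses never mutate a
        # container shared with their parent.
        cls._relations = cls._relations + ((kind, name),)

    @classmethod
    def belongs_to(cls, name):
        cls._add_relation("belongs_to", name)

    @classmethod
    def has_many(cls, name):
        cls._add_relation("has_many", name)


# Form 1: class methods called once the class exists.
class Person(ActiveRecord):
    pass

Person.belongs_to("project_manager")
Person.has_many("milestones")


# Form 2: a decorator factory that lives outside the class (hypothetical).
def belongs_to(name):
    def decorate(cls):
        cls.belongs_to(name)  # delegate to the class method above
        return cls
    return decorate

@belongs_to("project_manager")
class Project(ActiveRecord):
    pass

print(Person._relations)        # (('belongs_to', 'project_manager'), ('has_many', 'milestones'))
print(Project._relations)       # (('belongs_to', 'project_manager'),)
print(ActiveRecord._relations)  # ()
```

Either way the work happens after the class body has been evaluated; the decorator only moves the calls above the class statement.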

Of course, neither SQLObject nor Django uses these techniques. They use attributes to store these relations, because in Python the only easy annotations you can make on a class are with attributes. Everything else gets thrown away.

Attributes can be somewhat magic, but there is a limit in Python. That limit is what descriptors can do, and basically descriptors can respond to attribute access. They can't tell the class that they exist (until someone tries to access them), they never know what attribute name they are bound to, and they don't know what class they are bound to until they are accessed. In ORMs this causes some problems, because classes really want to know what columns they have, and columns want to know what name they were given.
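A minimal sketch of that limit, in Python 3. The `Column` class is hypothetical and deliberately gets no help from its owning class. (Python 3.6 later added a `__set_name__` hook for descriptors; this sketch leaves it out in order to show the situation described above.)

```python
class Column:
    """A bare descriptor with no cooperation from the owning class."""

    def __init__(self):
        self.owner = None  # unknown when the descriptor is created

    def __get__(self, instance, owner):
        # The owning class is only revealed here, on attribute access.
        self.owner = owner
        return self


class Person:
    name = Column()


col = Person.__dict__["name"]  # fetch without triggering __get__
owner_before = col.owner
Person.name                    # attribute access calls __get__
owner_after = col.owner

print(owner_before)           # None
print(owner_after is Person)  # True
print(hasattr(col, "name"))   # False: nothing ever told it its attribute name
```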

Some systems use a model that's just a bunch of dead data, until a compilation phase that turns it into Python source code. MiddleKit does that; Django used to do that but doesn't anymore.

Other systems, SQLObject and now Django, use metaclasses. Metaclasses can be magic, but they aren't magic here. With a metaclass you can trigger some code to be run every time a class with that metaclass is created (and subclasses inherit the superclass's metaclass).

In effect, it's like:

class Person(ActiveRecord):
    project_manager = belongs_to
    milestones = has_many
    categories = has_and_belongs_to_many
# Implicitly:
Person.setup_everything()

Where setup_everything looks at all the attributes for "magic" attributes, and lets those attributes modify the class.
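Spelled out as a rough Python 3 sketch, that could look like the following. The `Relation` marker class, the `RecordMeta` metaclass, and the `relations` dict are all invented for illustration; this is not how SQLObject or Django actually store things.

```python
class Relation:
    """Marker for a 'magic' attribute (hypothetical)."""

    def __init__(self, kind):
        self.kind = kind


belongs_to = Relation("belongs_to")
has_many = Relation("has_many")
has_and_belongs_to_many = Relation("has_and_belongs_to_many")


class RecordMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Runs once for every class created with this metaclass,
        # subclasses included, right after the body is evaluated.
        cls.setup_everything(namespace)


class ActiveRecord(metaclass=RecordMeta):
    relations = {}

    @classmethod
    def setup_everything(cls, namespace):
        # Start from the inherited relations, then add the magic
        # attributes defined directly in this class body.
        found = dict(cls.relations)
        for attr_name, value in namespace.items():
            if isinstance(value, Relation):
                found[attr_name] = value.kind
        cls.relations = found


class Person(ActiveRecord):
    project_manager = belongs_to
    milestones = has_many
    categories = has_and_belongs_to_many


print(Person.relations)
# {'project_manager': 'belongs_to', 'milestones': 'has_many', 'categories': 'has_and_belongs_to_many'}
```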

SQLObject doesn't work quite like this, but it's moving that way. Now SQLObject uses a simple metaclass with a class method __classinit__ that takes the place of setup_everything. It's also (in the svn trunk) starting to use a convention for those magic attributes. Up until now SQLObject just knew that *Col values were columns. And then joins and indexes were also added. I don't want to add any more new things, so I've made __classinit__ look for any new attributes with a method to call like __addtoclass__(cls, attr_name), and the attribute can in turn modify the class however it chooses. I'd rather it not modify the class too much, but it's necessary at times. The other thing that's come up is that this all gets hairy with subclassing (and probably does in Ruby too), and I've found events to be important. But they are also more complicated than they should be just to fix things up on subclassing.
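The general shape of that convention can be sketched in Python 3 as follows. Only the hook name `__addtoclass__(cls, attr_name)` comes from the description above; `HookMeta`, `Record`, `Col`, and the `columns` tuple are stand-ins made up for this example, and SQLObject's real code differs.

```python
class HookMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Offer each new attribute the chance to modify the class,
        # without the metaclass knowing what kind of object it is.
        for attr_name, value in list(namespace.items()):
            hook = getattr(value, "__addtoclass__", None)
            if hook is not None:
                hook(cls, attr_name)


class Record(metaclass=HookMeta):
    columns = ()


class Col:
    def __addtoclass__(self, cls, attr_name):
        # The attribute learns its name and class, then registers itself.
        self.name = attr_name
        self.owner = cls
        cls.columns = cls.columns + (attr_name,)


class Person(Record):
    first_name = Col()
    last_name = Col()


class Employee(Person):
    salary = Col()


print(Person.columns)                      # ('first_name', 'last_name')
print(Employee.columns)                    # ('first_name', 'last_name', 'salary')
print(Person.__dict__["first_name"].name)  # first_name
```

Subclassing works here only because each class builds a fresh tuple from the inherited one; anything more elaborate than appending names is where it starts to get hairy.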

But anyway, in the end Python can be pretty much like that ActiveRecord example, except just slightly inverted, and using a superclass that supports that kind of programming. I would like for those specific techniques to become more widespread and idiomatic, and even better for us all to agree on some simple conventions. But there are no dramatic differences here from Ruby, just like similar original sources make clear.

Update: there's another technique for ActiveRecord-like functions which I hadn't thought of; since it showed up three times in the comments, clearly it needs acknowledgement: a Django experiment (wasn't actually applied to the codebase), and two examples with a mockup of the sort of code you'd write.

To summarize these, they involve having the belongs_to function (and any similar functions like has_many) save an object to some global variable. Then, when the class is actually created, a metaclass pulls out any values it finds in that global variable and applies them to the class.

This is distinct from Zope's implements() (and PEAK's advise()) which can operate on any class. It's this ability to apply a function to a class that was not specifically designed for such application that makes it unique (and hard to implement). But with some cooperation from the class it is not hard to implement. Of course, it's a little funny that these two are equivalent using these recipes:

belongs_to('project_manager')
class Person(ActiveRecord): pass

and:

class Person(ActiveRecord):
    belongs_to('project_manager')

But oh well.

Created 06 Dec '05
Modified 08 Dec '05

Comments:

The "attribute creation without assignment" thing (like the implements() and advise() examples you brought up) was considered for Django, but we ended up deciding it was too much magic and could confuse experienced Python programmers. Check out http://code.djangoproject.com/attachment/ticket/122/proof_of_concept.py for the proof-of-concept implementation by Robin Munn, if only because it's an interesting hack. :)

# Adrian Holovaty

Pretty interesting. I'm working on some django stuff atm, and I have ended up using events (cheesy homegrown atm) and making the attributes do most of the work as well. So I think django and sqlobject will continue to be pretty similar...

# rjwittams

I'd use syntax like this:

class Project(ActiveRecord):
    _db_attributes = dict(
        project_manager = meta.ForeignKey(ProjectManager),
        milestones = meta.OneToManyField(Milestone),
        categories = meta.ManyToManyField(Category))

This makes the job of the metaclass slightly easier (and quite a bit more efficient) and satisfies EIBTI.

# Christian Tanzer

>EIBTI Nice DETLA. ;)

# Chris

I think one of the key differences the snakes&rubies brought out was David's (and by extension, Ruby's) love of DomainSpecificLanguages. Which seems to be a hot item these days, and as David says in the post you linked to about the event, not a focus of as much interest or support in Python.

And part of the reason for that is highlighted in the Magic & Backtracking post: backtracking and clear indication of where something came from (via namespaces etc). I'm not saying Ruby doesn't do this nicely (I have no idea), but DSLs of the ruby-style in python seem to require metaprogramming monkeying, which immediately feels "unpythonic". But I'm not convinced that the ActiveRecord DSL is the only sane one for representing Database models.

Say we rename "ForeignKey->Is_A", "OneToMany->Are_Many" "ManyToMany->Is_Many_And_Are_Many", ignoring the awkward is_a inheritance meaning. Now, in the case where we actually have to specify the class that a column maps to:

Only slightly tortured English:

A Project belongs to a project manager who is a Person.
A Project's projectManager is a Person

A Project has many Milestones which are (of) Milestone.
A Project's milestones are many (of) Milestone.

(I personally think ManyToMany breaks down in both cases, because in both cases we are really specifying the attribute for this class to access via: I would rather specify just "Many" on both sides.):

A Project has many and belongs to many categories which are (of) Category
A Project's categories are many (of) Category (and by the way, a Category's projects are many (of) Project)

Rails: Project belongs_to a projectManager (class Person)
SQLObject: Project projectManager IsA Person

Rails: Project has_many milestones (class Milestone).
SQLObject:  Project milestones Are_Many Milestone

... (getting off track, leaving it in here to show my train of thought)

At which point Rails people will scream DRY, and I'll probably agree, so allow the class to be derived from the name.

Ok, so basically I'm saying with python we're stuck with parentheses and attributes. And because of attributes, we're somewhat stuck with = being our verb. But that just means our language for models is different:

class Project(SQLObject):
 projectManager = A(Person)
 milestones = Many()  # explicitly: Many(Milestone)
 categories = ManyAndMany() # again, from the Project side I'd ideally like if this was just Many()

(ie, to some extent we're getting pulled into a debate constrained by ActiveRecord's words in their DSL for models.)

# Luke Opperman

> That limit is what descriptors can do, and basically descriptors can respond to attribute access. They can't tell the class that they exist (until someone tries to access them)

I don't understand what you mean by that. Classes (and even instances) can certainly know that a descriptor exists in its own dict, like any other attribute.

> ...they never know what attribute name they are bound to

Unless they are told what name they are bound to, which is easy enough to do in the constructor. In the case when you define a descriptor in the class body, you can use a metaclass to tell the descriptor what name it's bound to, and skip repeating the name.

> ...and they don't know what class they are bound to until they are accessed.

Again, unless they are told, which is easy enough.

> In ORMs this causes some problems, because classes really want to know what columns they have, and columns want to know what name they were given.

In Dejavu*, I got around the second issue (column names) by giving each descriptor a "key" attribute, which is either provided in the constructor or (more commonly) by the metaclass. I got around the first issue by storing a _properties dict in the owner class, of the form: {descriptor.key: instance_value}. Each instance makes a copy of that dict for itself. In this way, both the owner class and its instances know which of their attributes are UnitProperty descriptors.

# Robert Brewer

> I don't understand what you mean by that. Classes (and even instances) can certainly know that a descriptor exists in its own dict, like any other attribute.

The point is that the class has to do some extra stuff to find the attribute, where ActiveRecord/Ruby's technique doesn't require anything special in the class. You can't know that a column object exists until you search your __dict__. And follow MRO, for that matter.

I do explain right after this how you can make a class look for these things, and give attributes access to some of that extra information. I'd like for this to become a natural idiom in Python, not viewed as special magic. So I'd like a consistent set of practices, which doesn't presuppose what kind of object you are looking for (i.e., doesn't involve looking for all column attributes).

# Ian Bicking

Good discussion. I might add that while associations can be reasonably assigned to attributes mentally, it's harder to do something like that for acts_* and other meta-programming wrap-ups in Rails. So for example:

class Story < ActiveRecord::Base
  belongs_to :iteration

  acts_as_taggable
  acts_as_list :scope => :iteration
end

# David Heinemeier Hansson

If I were to do these in SQLObject (and someone else has started some of these), I'd do:

class Tag(SQLObject):
    name = StringCol(alternateID=True)
    # Assuming joins.Polymorphic is a signifier to take the place of a class name:
    items = joins.ManyToMany(joins.Polymorphic)

class Story(SQLObject):
    iteration = ForeignKeyCol('Iteration')
    sort = IntCol()
    tags = joins.ManyToMany('Tag', polymorphic=True)

class Iteration(SQLObject):
    stories = OneToMany('Story', mutableOrderBy='sort')

i = Iteration.get(1)
i.stories.moveUp(1) # or maybe...
i.stories.swap(0, 1)

s = Story.get(1)
fiction = Tag.byName('fiction')
s.tags.add(fiction)

Joins in SQLObject (unlike Django, I think) are defined on both sides of the relation. I'm okay with that for a few reasons:

  • Each class is a fairly complete description, and nothing magically appears because of something else in the system.
  • belongs_to or has_many? There's no reason to use one over the other. I dislike forcing people to make arbitrary decisions.
  • Each side of the join has certain options which aren't symmetrically tied to the other end. Cascading makes sense on the ForeignKey. Ordering makes sense on the target of the key. You'd have to duplicate all these options to both ends for both ways of expressing the relation, but possibly with subtly different language.
  • Less name generation means fewer rules. Rails pluralization rules certainly don't seem appealing. I get the impression from what I've heard from you and others that a lot of time has been spent on that feature and probably even more time on discussion of that feature. So (to me) it's just as well that the code includes both iterations and 'Iteration', and both stories and 'Story'.

Generally speaking, I'd like to see functionality like the acts_* functions in SQLObject, but for each case having the functions grouped under a single attribute. So where acts_as_nested_set adds a handful of methods, in SQLObject all those methods would live under a single attribute.
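As a rough Python 3 sketch of that grouping idea: a single descriptor hands back a helper object that carries all the related methods. Nothing here is SQLObject API; `ListBehavior`, `BoundList`, `ordering`, and the in-memory `Story` class are invented for illustration, with a plain integer standing in for the sort column.

```python
class BoundList:
    """Helper bound to one instance; holds the list-related methods."""

    def __init__(self, obj, sort_attr):
        self.obj = obj
        self.sort_attr = sort_attr

    def _shift(self, delta):
        current = getattr(self.obj, self.sort_attr)
        setattr(self.obj, self.sort_attr, current + delta)

    def move_up(self):
        self._shift(-1)

    def move_down(self):
        self._shift(+1)


class ListBehavior:
    """One class attribute that exposes the whole group of methods."""

    def __init__(self, sort_attr):
        self.sort_attr = sort_attr

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        return BoundList(instance, self.sort_attr)


class Story:
    ordering = ListBehavior("sort")

    def __init__(self, sort):
        self.sort = sort


story = Story(sort=3)
story.ordering.move_up()
story.ordering.move_up()
story.ordering.move_down()
print(story.sort)  # 2
```

The class gains exactly one new name (`ordering`), and everything the behavior adds is reachable through it.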

# Ian Bicking

During that part of the presentation, I was quite curious what the database generated underneath looks like for those polymorphic relationships. From the example in the presentation, how does it represent "Taggings"? Does it actually create:

  Person               Message
    |                     |
 PersonTaggings     MessageTaggings
(person_id, tag_id) (message_id, tag_id)
            \      /
              Tag

And then Tag.taggings.collect knows all the intermediate tables to combine?

Guess I have to dig into the ActiveRecord code.

# Luke Opperman

I don't know whether it's really desirable, but it doesn't seem to be hard to achieve the 'direct syntactic port of ActiveRecord' using an auxiliary queue and a registration callback, e.g.:

_RegisterPropertyQueue = []

class ActiveRecord(object):
    class __metaclass__(type):
        def __init__(cls, name, bases, dct):
            setattr(cls, "_has_many", set())
            setattr(cls, "_belongs_to", set())
            setattr(cls, "_has_and_belongs_to_many", set())
            for obj in _RegisterPropertyQueue:
                obj.__register__(cls)
            del _RegisterPropertyQueue[:]


class RegisterProperty(object):
    def __new__(cls, *args, **kw):
        o = object.__new__(cls)
        _RegisterPropertyQueue.append(o)
        return o
    def __register__(self, cls):
        raise NotImplementedError

class Relation(RegisterProperty):
    def __init__(self, other):
        self.other = other

class has_many(Relation):
    def __register__(self, cls):
        other = self.other
        cls._has_many.add(other)
        other._belongs_to.add(cls)

class belongs_to(Relation):
    def __register__(self, cls):
        other = self.other
        cls._belongs_to.add(other)
        other._has_many.add(cls)

class has_and_belongs_to_many(Relation):
    def __register__(self, cls):
        other = self.other
        other._has_and_belongs_to_many.add(cls)
        cls._has_and_belongs_to_many.add(other)



class ProjectManager(ActiveRecord): pass

class Milestones(ActiveRecord): pass

class Categories(ActiveRecord): pass

class Person(ActiveRecord):
    belongs_to(ProjectManager)
    has_many(Milestones)
    has_and_belongs_to_many(Categories)


assert ProjectManager in Person._belongs_to
assert Milestones in Person._has_many
assert Person in Categories._has_and_belongs_to_many

# Michael Spencer

Oooh, nice Python code. I'm really interested in how these techniques develop in the Python community.

There's a third trick you can do with Ruby's metaclasses, but it depends on a block-form lambda.

The following example is inspired by UnrealScript, which is a Java-like game programming language. UnrealScript objects can be in different states, and each state can selectively override methods:

class Monster < Actor
  on :make_noise do
    "Growl!"
  end

  # "default_" should be a decorator.
  default_state :awake do
    on :drink_warm_milk do |glasses|
      self.state = :asleep if glasses >= 1
    end
  end

  state :asleep do
    on :make_noise do
      "Snore!"
    end
  end
end

monster = Monster.new
monster.make_noise # -> "Growl!"
monster.drink_warm_milk 2
monster.make_noise # -> "Snore!"

There's more syntactic clutter here than I'd like, but the code works. The biggest limitation is that Ruby doesn't support default values for block parameters.

Is there an obvious way to approximate this in Python? I suspect there might be something with decorators that would get pretty close.

# Eric Kidd

Perhaps a stupid question....

Can't those relations/constraints be defined right in the SQL schema?

Perhaps that's too much coupling to a specific storage model? I kinda like that because sometimes you end up with multiple toolkits (in multiple languages - including command-line sql) in place that could interact with the data storage, so having one consistent set of rules with no back-door seems like a good thing....

# Bill Seitz

It's not so hard as you'd think

# Crack fingers

# File orm.py:
belongs_to_gatherer = []
def belongs_to(what):
    belongs_to_gatherer.append(what)

has_many_gatherer = []
def has_many(what):
    has_many_gatherer.append(what)

class MetaRecord(type):
    def __new__(cls, name, bases, dictionary):
        global belongs_to_gatherer, has_many_gatherer
        Record = super(MetaRecord, cls).__new__(cls, name, bases, dictionary)

        # Now we grab what we've gathered and run the respective methods on them:
        for i in belongs_to_gatherer:
            Record.belongs_to(i)
        belongs_to_gatherer = []

        for i in has_many_gatherer:
            Record.has_many(i)
        has_many_gatherer = []

        return Record

class ActiveRecord(object):
    __metaclass__ = MetaRecord

    @classmethod
    def belongs_to(cls, what):
        # Set it to belong to: what
        print cls, "belongs to", what

    @classmethod
    def has_many(cls, what):
        # Set it to have many: what
        print cls, "has many", what

# __all__ must list names as strings, or "from orm import *" fails
__all__ = ['ActiveRecord', 'belongs_to', 'has_many']

# In another file:
from orm import *

class Person(ActiveRecord):
    belongs_to('project_manager')
    has_many('milestones')

# brantley

Note that all these examples (this is the third along these lines) require cooperation on the part of the class. I hadn't really considered that, but it is different from Zope's implements() which can be applied to any class without any cooperation on the part of that class.

# Ian Bicking

I liked that! Pretty cool. What about?

def belongs_to_classmethod(cls, what):
    print cls.__name__, " Belongs to ", what

class MetaActiveRecord(type):
    def __new__(cls, name, bases, dictionary):
        Record = super(MetaActiveRecord, cls).__new__(cls, name, bases, dictionary)
        if dictionary.has_key('belongs_to'):
            Record.belongs_to_classmethod = classmethod(belongs_to_classmethod)
            Record.belongs_to_classmethod(dictionary['belongs_to'])
        return Record

class ActiveRecord(object):
    __metaclass__ = MetaActiveRecord

class Person(ActiveRecord):
    belongs_to = 'project_manager'
    def __init__(self):
        print "Person __init__"

a = Person()

prints:

Person Belongs to project_manager
Person __init__

# Gordon Scott

By the way, Zope 3 has a few more things like 'implements' that are being used more and more as Zope 3 matures. Most commonly, these things are used on interfaces, which is the biggest in-language 'DSL' that Zope 3 has. A common thing is to make constraints about what kind of objects a container might hold, or what kind of containers an object might be placed in.

from zope.interface import Interface, Attribute
from zope.app.container.constraints import contains, containers
from zope.app.container.interfaces import IContainer, IContained

class IThing(Interface):
    """ Just a thing. """
    about = Attribute("What this thing is about.")

class IThingHolder(IContainer):
    """ A container that can hold things. """
    # This sets up a constraint on the container so that it can only
    # hold objects that implement the IThing interface
    contains(IThing)

class IThingContained(IContained):
    """
    An interface stating that things can be contained in thing holders.
    """
    containers(IThingHolder)

This is much better than an older way of doing the same thing.

from zope.app.container.constraints import ContainerTypesConstraint
from zope.app.container.constraints import ItemTypePrecondition
import zope.schema

class IThingHolder(IContainer):
    """ A container that can hold things. """
    def __setitem__(name, object):
        """
        Redefine __setitem__  in the interface so that a precondition
        can be set to restrict contained objects to IThings
        """
    __setitem__.precondition = ItemTypePrecondition(IThing)

class IThingContained(IContained):
    """ Put a constraint on the __parent__ field """
    __parent__ = zope.schema.Field(
        constraint=ContainerTypesConstraint(IThingHolder),
        )

There are other little things around that also have made life a little bit nicer, such as object adaptation.

from zope.interface import implements
from zope.app import zapi
from example.interfaces import ISquarePeg, IRoundHole

# Old Way
class OldSquarePegToRoundHoleAdapter(object):
    # even older style would be __implements__ = IRoundHole
    implements(IRoundHole)
    __used_for__ = ISquarePeg

    # implementation....

# New Way
class SquarePegToRoundHoleAdapter(object):
    implements(IRoundHole)
    zapi.adapts(ISquarePeg)

It's a nice showcase for what is possible. What's nicest about this is that 'implements()', 'adapts()', and so on can be looked up easily in an API documentation tool. Trying to remember all of the __funny_names__ is not so easy.

# Jeff Shell