Schevo and Durus

I saw two presentations on the second day of PyCon that interested me in combination: one on Schevo and another on Durus. I've always been wary of object databases. ZODB... well, it works, and I'm sure it works well -- I haven't really had problems -- but it doesn't make me feel safe.

This isn't because of integrity or reliability issues so much as data stability, portability, upgradability, and queriability (is that a word?). I want to be able to ask questions about my data (like, say, how many people registered in a certain date range), questions that I didn't think of when I was setting up my data. And I want to control how my data changes as my application and the data itself evolve; generally it should be easy, and it should always be possible -- ALTER TABLE and other operations (like multi-record UPDATE) mostly do that well in an RDBMS.
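To make that concrete, here's a minimal sketch using Python's standard sqlite3 module. The people table, its columns, and the sample rows are all made up for illustration; the point is only that the date-range question, the ALTER TABLE, and the multi-record UPDATE are each a single statement.

```python
import sqlite3


def count_registrations(conn, start, end):
    """Count people whose registration date falls within [start, end]."""
    row = conn.execute(
        "SELECT COUNT(*) FROM people WHERE registered_on BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    return row[0]


# Hypothetical schema and data; ISO date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, registered_on TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("ann", "2005-01-15"), ("bob", "2005-02-20"), ("cy", "2005-03-24")],
)

# A question nobody planned for when the table was set up.
in_range = count_registrations(conn, "2005-02-01", "2005-03-31")

# The data evolves: add a column, then fill it for many rows at once.
conn.execute("ALTER TABLE people ADD COLUMN status TEXT")
conn.execute(
    "UPDATE people SET status = 'active' WHERE registered_on >= '2005-02-01'"
)
active = conn.execute(
    "SELECT COUNT(*) FROM people WHERE status = 'active'"
).fetchone()[0]
```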

Schevo is a fairly restricted object model built on top of a database. It builds indexes and relations and maintains integrity for you, and seems to have some conventions to control upgrades. These are all the aspects of an RDBMS that I care about... well, at least most of the things I care about.

Schevo is really just the schema -- it builds on top of an object database (PyPerSyst, Durus, or ZODB). That's where Durus comes into play -- Durus is kind of a simpler reimplementation of ZODB. The only real way that ZODB sounded better was that it's threadsafe, so you can run it in-process in a threaded environment. But Durus is client-server -- like ZEO (the client-server extension to ZODB) -- and that's good enough for me.

I like RDBMSes for their many practical advantages -- they're really good infrastructure. But I'm not in love with them. I want data that can last for years and across systems; that doesn't have to mean an RDBMS. Right now it's practically the only good option -- there's XML persistence, or some other simple flat-file system, but those are painful to work with. I still don't feel good about object databases, but at least this feels like a move in the right direction.

I'm trying to wrap my head around some of these issues also. I'm working on two projects now that both use RDBMS backends. One of them seems like a natural fit for an OODB. It's a DB of network devices (routers, switches, APs, etc.) and I've built my own simple ORM for it (could have used SQLObject, I know!). In many places, I already act like I'm using an OODB. E.g. instead of using crafted SQL to pull specific devices out of the DB, I loop over them all ('SELECT * FROM devices'), wrap each row up as an object instance, then check attributes on them to find matches (device.Model == 'Router M10'). I'll be examining this one to see how well something like Durus could replace MySQL. I imagine it would be quite easy.
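Roughly, the two styles look like this. It's only a sketch: sqlite3 stands in for MySQL so it runs anywhere, and the Device class, the column names, and the sample rows are invented for the example rather than taken from the real project.

```python
import sqlite3


class Device:
    """Thin wrapper around one row of the (hypothetical) devices table."""

    def __init__(self, name, model):
        self.name = name
        self.Model = model


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (name TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [
        ("core-1", "Router M10"),
        ("sw-1", "Switch S24"),
        ("edge-2", "Router M10"),
        ("ap-7", "AP 11g"),
    ],
)


def find_by_scan(conn, model):
    """OODB-ish style: load every row as an object, then filter in Python."""
    devices = [Device(*row) for row in conn.execute("SELECT name, model FROM devices")]
    return [d.name for d in devices if d.Model == model]


def find_by_query(conn, model):
    """RDBMS style: let the database do the filtering."""
    rows = conn.execute("SELECT name FROM devices WHERE model = ?", (model,))
    return [name for (name,) in rows]
```

Both functions return the same device names; the first one just does the matching on the Python side, which is the part an object database would take over.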

OTOH, the second DB is much larger and I think of it as containing dumb data. A few well-crafted indexes and queries are used to dump out search results as quickly as possible. There's nothing OO about it.

For what it's worth, whenever I've dabbled with OODBs I haven't been able to quickly answer questions which involve aggregation, and I always want to do that to a greater or lesser extent. You know, the sort of queries like "Give me all of the transactions in the third quarter last year over five thousand dollars from suppliers in Western Australia".
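In a relational database that question is one join with a few conditions. Here's a sketch using Python's sqlite3 module; the suppliers and transactions tables, the 'WA' state code, and the sample rows are assumptions made up for illustration, and "last year" is pinned to 2004.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, supplier_id INTEGER, made_on TEXT, amount REAL
    );
    """
)
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?, ?)",
    [(1, "Perth Parts", "WA"), (2, "Sydney Supply", "NSW")],
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2004-08-10", 7500.0),  # WA, Q3, over the threshold
        (2, 1, "2004-08-12", 1200.0),  # WA, Q3, too small
        (3, 2, "2004-07-30", 9000.0),  # Q3 and large, but wrong state
        (4, 1, "2004-11-02", 8000.0),  # WA and large, but Q4
    ],
)


def big_q3_transactions(conn, year, state, minimum):
    """Transactions over `minimum` in July-September of `year` for one state."""
    start, end = f"{year}-07-01", f"{year}-09-30"
    return conn.execute(
        """
        SELECT t.id, s.name, t.amount
        FROM transactions AS t
        JOIN suppliers AS s ON s.id = t.supplier_id
        WHERE s.state = ? AND t.amount > ? AND t.made_on BETWEEN ? AND ?
        ORDER BY t.id
        """,
        (state, minimum, start, end),
    ).fetchall()


matches = big_q3_transactions(conn, 2004, "WA", 5000)
```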

This is the strength that catapulted relational databases over and above their network and hierarchical competitors in the '70s and '80s. Removing the need to reorganise your data when you want to look at it in a slightly different way made a big difference in the real world, where change is the only constant. When object databases crack it (perhaps with a consistent query language) then I confidently predict adoption will skyrocket.

There is a lively thread on this very subject on the db-sig mailing list; look for 'Pycon2005 and database divide' on this page:

http://mail.python.org/pipermail/db-sig/2005-March/thread.html

Ian Bicking: the old part of his blog
