Ian Bicking: the old part of his blog

PyPI and small code

In some of my more recent developments I've been trying to build smaller, more reusable modules. A lot of them aren't necessarily terribly robust or generalized, but (for me at least) they are an improvement. Some of them might be useful to other people. And I'm sure there are lots of people who have done the same.

These files are part of larger projects, but aren't really related to them. For instance, the wiki has some modules related to file uploading (FTP and SSH) which are decoupled and pretty much unrelated to the larger application. There exist more complete libraries (like Paramiko), but there are many small-scope problems that call for small-scope libraries. I would probably have written fewer of these modules myself if I had a good place to find other people's work.

Right now the only place we have to keep this code is the Python Cookbook, which has a whole host of problems. It would be great if PyPI could take over that role.

Some issues:

With this in mind, the structure of PyPI doesn't seem right. But the Python Cookbook has too little structure. Safe Expression Evaluation is not a recipe. It's not a package either. It's a module. Maybe I don't want to "release" my code, but I do want to update it and provide the most up-to-date version. I do want to document it and provide it as a consistent module, not just a recipe. I want to provide both the module and information about its relationships to other modules (e.g., those ftpupload and sshupload modules go together, but can be used independently). As a user I want PyPI to keep the code around when the informal releases go away. I want to see what other people think of the code, and which other modules might be a better (or worse) choice, or are symbiotic. I want other people to be able to take over or fork the module in a structured way (i.e., make the forking relationship clear).

When I think about the concrete features, most of them could apply to normal packages as well. So maybe this is just a generalization of PyPI.

Created 15 Jun '04
Modified 14 Dec '04

Comments:

PyPI is intended to be an index of metadata that is generated by distutils. I'm not sure I'm comfortable extending that scope to include actual code fragments. It would confuse the meta-data schema and user interfaces considerably.

Having said that, PyPI now has a reasonable user database which could be useful to you. Code like the Trove handling could also be useful. I've always wanted some form of comment / rating system for packages too - and that'd be essential for a code fragment database.

Finally, PyPI is bordering on being too large for the technologies it's built on; sqlite will need to be replaced by postgresql some time soon and the cgi.py-based web ui scales very poorly. Development such as you're proposing would push those technologies over the edge :)
# Richard Jones

Richard Jones:
"Finally, PyPI is bordering on being too large for the technologies it's built on; sqlite will need to be replaced by postgresql some time soon"

Eh? Could you elaborate on what you mean? What exactly is too large for sqlite?

From the sqlite FAQ:

"(10) Are there any known size limits to SQLite databases?

As of version 2.7.4, SQLite can handle databases up to 2^41 bytes (2 terabytes) in size on both Windows and Unix. Older version of SQLite were limited to databases of 2^31 bytes (2 gigabytes).

SQLite arbitrarily limits the amount of data in one row to 1 megabyte. There is a single #define in the source code that can be changed to raise this limit as high as 16 megabytes if desired.

There is a theoretical limit of about 2^32 (4 billion) rows in a single table, but this limit has never been tested.
There is also a theoretical limit of about 2^32 tables and indices.

The name and "CREATE TABLE" statement for a table must fit entirely within a 1-megabyte row of the SQLITE_MASTER table. Other than this, there are no constraints on the length of the name of a table, or on the number of columns, etc. Indices are similarly unconstrained.

The names of tables, indices, view, triggers, and columns can be as long as desired. However, the names of SQL functions (as created by the sqlite_create_function() API) may not exceed 255 characters in length."
# Don Wong

The "too large" is not disk space, but more related to the limits of it being a cgi app that opens the database each time.

sqlite also only allows a single writer per database, and even that is less useful than it could be. If you have a writer and one or more readers on the same DB, and the writer tries to modify a table while a reader is reading from it, the writer will fail.

Don't get me wrong - sqlite is great for small single-user prototypes, and it's certainly much better as an SQL engine than *spit* mysql, but it doesn't scale in the way that PyPI needs.

Unfortunately getting a PG instance up and running is subject to round tuit availability on the part of the relevant python.org folks.
# Anthony Baxter
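
The reader/writer conflict described in the comment above can be reproduced in a few lines. This is only a rough sketch, and it makes assumptions: it uses Python's standard sqlite3 module with a current SQLite 3 in rollback-journal mode (not the SQLite of the original discussion), the `packages` table is made up, and in this setup the failure shows up when the writer tries to commit rather than at the moment it modifies the table.

```python
import os
import sqlite3
import tempfile


def writer_blocked_by_reader():
    """Return (blocked, row_count) for a two-connection locking experiment."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        # timeout=0: fail immediately instead of waiting for a lock.
        reader = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            # Use the classic rollback journal (not WAL) for this experiment.
            writer.execute("PRAGMA journal_mode=DELETE").fetchall()
            writer.execute("CREATE TABLE packages (name TEXT)")  # hypothetical table
            writer.execute("INSERT INTO packages VALUES ('ftpupload')")

            # The reader opens a transaction and reads; it now holds a shared lock.
            reader.execute("BEGIN")
            reader.execute("SELECT name FROM packages").fetchall()

            # The writer can stage its change, but cannot commit while
            # the reader's transaction is still open.
            writer.execute("BEGIN IMMEDIATE")
            writer.execute("INSERT INTO packages VALUES ('sshupload')")
            try:
                writer.execute("COMMIT")
                blocked = False
            except sqlite3.OperationalError:  # "database is locked"
                blocked = True

            # Once the reader finishes, the writer's commit can be retried.
            reader.execute("COMMIT")
            if blocked:
                writer.execute("COMMIT")
            count = writer.execute("SELECT COUNT(*) FROM packages").fetchone()[0]
            return blocked, count
        finally:
            reader.close()
            writer.close()
```

With a nonzero timeout the writer would wait and retry instead of failing at once, which hides the problem under light load but does not remove it.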

Surely the code fragments could live somewhere else, and everyone could be encouraged to upload their packages/modules to that location and to specify it as the download URL in PyPI. Obviously, it would be nicer if one could upload the package using the PyPI interface, however.

And as for the comments and ratings systems, is something like http://www.kde-look.org/ of interest?
# Paul Boddie

Pretty much anyone with spare time and bandwidth allowance could step up to that, sure. Not hard. All you need to do is make available a small script which imports a module, runs some obvious function (__minidist__?) to get a dictionary containing a subset of the usual setup keywords but without the py_modules, and lobs it at the minidist site with an XML-RPC call. The minidist site can then register more extensively with PyPI and make the module available as an automatically stitched together zip or tarchive with an appropriately generated setup.py. Overall, not a bad weekend hack. If only I weren't moving house. :|
# Garth T Kidd
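
A rough sketch of the client half of that idea, to make it concrete. Almost everything here is an assumption for illustration: the `__minidist__` hook is only the name floated in the comment above, and the `register` method name, the set of allowed keys, and the server URL passed in are invented. Nothing like this exists in PyPI or distutils.

```python
import importlib
import xmlrpc.client

# Hypothetical subset of setup() keywords a minidist site might accept.
ALLOWED_KEYS = {"name", "version", "description", "author", "author_email", "url"}


def collect_metadata(module):
    """Call the module's (hypothetical) __minidist__ hook and filter the result."""
    raw = module.__minidist__()
    # Keep only the allowed keys; this also drops py_modules, since the
    # module being submitted is itself the only code.
    return {key: value for key, value in raw.items() if key in ALLOWED_KEYS}


def build_request(module):
    """Return the XML-RPC request body that would be sent to the site."""
    metadata = collect_metadata(module)
    return xmlrpc.client.dumps((metadata,), methodname="register")


def submit(module_name, server_url):
    """Import a module by name and send its metadata to a minidist server.

    server_url is whatever such a site would publish; no such site exists.
    """
    module = importlib.import_module(module_name)
    proxy = xmlrpc.client.ServerProxy(server_url)
    return proxy.register(collect_metadata(module))
```

A module opts in by defining a `__minidist__()` function that returns a dictionary such as `{"name": "ftpupload", "version": "0.1"}`. Generating the setup.py and the downloadable archive would be the server's job and is not sketched here.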

Please note that this discussion has moved to the catalog-sig mailing list. Please join in there!
# Richard Jones

What about the Vaults of Parnassus?
It's maintained by hand - but it has a good set of links to small modules of the sort mentioned by Ian. Only links though, no repository.

What's the address of the catalog-sig? I'd be willing to do some work on maintaining... be nice to put something back?
# Fuzzyman

The Vaults are almost identical in scope to PyPI, which reduces the value of both of them. For catalog-sig: http://www.python.org/sigs/catalog-sig/
# Ian Bicking