Ian Bicking: the old part of his blog

path module

I've been using Jason Orendorff's path module (doh, I'd been misspelling Jason's name) lately in a couple command-line scripts. A while ago I stopped writing many scripts in shell, since I always found it hard to grow the scripts, and the quoting always causes problems. How many shell scripts can deal with filenames with spaces in them? Shell scripting is a crappy language. I always thought Tcl would make a good shell -- both simpler and more reliable than shell, but just as string-oriented. But it never seemed to happen... so it goes.
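To illustrate the quoting point: when Python starts another program with an argument list rather than a shell command string, each filename reaches the child as a single argument, spaces and all, with no quoting needed. Below is a minimal sketch using the standard `subprocess` module. The `echo_arg` helper is hypothetical. The child is the current Python interpreter, so the example does not depend on any particular shell tool.

```python
import subprocess
import sys


def echo_arg(filename):
    """Run a child process that prints its first argument back (hypothetical helper)."""
    # The list goes to the child without a shell, so no word splitting happens:
    # "my file.txt" arrives as one argument.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


print(echo_arg("my file.txt"))  # prints: my file.txt
```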

Anyway, when writing command-line programs you do lots of file operations, and typically use os.path a lot. os.path is a pain to use -- too much os.path.join(os.path.dirname(...)) and the like. It makes Python rather painful for this kind of scripting.
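To show the kind of nesting meant here, the first function below uses os.path to build the path of a file that sits next to a script. The function names and the `data/config.ini` layout are hypothetical. The second version uses `pathlib`, which was added to the standard library much later (Python 3.4). It is not Jason's module, but it shows the same object-based style.

```python
import os.path
from pathlib import Path


def data_file(script):
    # os.path style: every step is another nested function call.
    return os.path.join(os.path.dirname(os.path.abspath(script)), "data", "config.ini")


def data_file_pathlib(script):
    # Object style: attributes and the / operator read left to right.
    return Path(script).absolute().parent / "data" / "config.ini"
```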

Despite wanting a path module for a long time, I keep forgetting to use the one I have (standard or no). After using it some, I find it very pleasant. Besides avoiding os.path, I get nice functions like text() and write_text() which read and write text to the file in a single chunk, no open() required. And I'm sure there's more good stuff in there that I haven't assimilated yet either.
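The read-it-all and write-it-all helpers are easy to picture. Here is a minimal sketch of what such helpers do with plain `open()`. `read_text` and `write_text` here are hypothetical standalone functions, not path.py's `text()` and `write_text()`, whose exact signatures and encoding handling are not described here. UTF-8 is an assumption.

```python
def read_text(filename, encoding="utf-8"):
    """Return the whole file as one string (hypothetical helper)."""
    with open(filename, encoding=encoding) as f:
        return f.read()


def write_text(filename, text, encoding="utf-8"):
    """Replace the file's contents with text in a single call (hypothetical helper)."""
    with open(filename, "w", encoding=encoding) as f:
        f.write(text)
```

In modern Python, `pathlib.Path` offers methods with the same names (Python 3.5+), e.g. `Path("notes.txt").write_text("hello\n")` followed by `Path("notes.txt").read_text()`.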

There's an outstanding (pre) PEP, and a second discussion on comp.lang.python surrounding a path module. Personally Jason's module fits me well, and I think it's worth the time to learn even if you've become accustomed (as I have) to os.path.

(And I like the use of / for os.path.join -- purists say that it's not division, but I say who cares; % substitution isn't modulo either)
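Here is a rough sketch of how a str subclass can map `/` to `os.path.join`, with a named method alongside it for people who prefer one. The class name `SketchPath` is hypothetical and this is not path.py's code. In Python 3 the operator hook is `__truediv__`; Python 2's was `__div__`. The last line is a reminder that strings already repurpose `%`.

```python
import os.path


class SketchPath(str):
    """Hypothetical minimal path type: a str that knows how to join components."""

    def joinpath(self, *parts):
        # Named-method spelling of the same operation.
        return SketchPath(os.path.join(str(self), *parts))

    def __truediv__(self, other):
        # "/" means "join as path components", not division.
        return self.joinpath(other)


base = SketchPath("src")
print(base / "pkg" / "mod.py")  # src/pkg/mod.py on POSIX
print("%s.bak" % "mod.py")      # "%" on strings is formatting, not modulo: mod.py.bak
```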

Update: Jason wrote at length about some of his thoughts on the module in the comments

Created 24 Jan '04
Modified 14 Dec '04

Comments:

No experience with this module, but I agree with you. / makes sense. I'm for it (as long as there's a method way to do it too and it's not called __div__).

# Bob Ippolito

There's a .joinpath() method as well. (Since path subclasses str, the .join() method is already taken)
# Ian Bicking

Coooooool. Thanks for the pointer.
# John P Speno

There is a bug in modulefinder, which struggles on import of Jason Jorendorff's path module.

That is especially bad when using py2exe - maybe also with other distutils stuff (but I did not experience that, because I only use distutils with py2exe)

One solution is to rename path.py to jpath.py.

Anyway - path.py from Jason Jorendorff is GREAT. I even use it to access contents of a file via

import path
p = path.path("somefile.extension")
content = p.bytes()  # the whole file, read in one call

The other nice thing: when dealing with Windows, path.py keeps a lot of the filename troubles neatly wrapped.

Harald
# Harald Armin Massa

My favourite features in Jason's path module are the iterator-style walk* methods. Code like:

from path import path
for pyfile in path('src').walkfiles('*.py'):
    print(pyfile)  # or whatever per-file work is needed

is just a pleasure to write (and to read!).
# Graham Fawcett
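The appeal is that recursion and filtering disappear behind an iterator. A minimal generator with the same flavour can be built from the standard `os.walk` and `fnmatch`. The name `walkfiles` mirrors the method in the comment, but this is a standalone sketch, not path.py's implementation: it yields plain strings, and the order is whatever `os.walk` produces.

```python
import fnmatch
import os


def walkfiles(top, pattern=None):
    """Yield paths of files under top, recursively, optionally filtered by a wildcard."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # Match the wildcard against the bare filename, not the full path.
            if pattern is None or fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)
```

Usage mirrors the comment: `for pyfile in walkfiles('src', '*.py'): ...`.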

What a wonderfully pythonic path module! As already said: thanks for the pointer. And thanks to Jason Jorendorff (and contributors) for writing it! Is there any mandate for including it in the Python Standard Library some day (presumably under a different name)? In my eyes (after just quickly checking it out) it seems as close to the core and as useful as optik, PyUnit and other fairly recently included modules. Having it along with the other batteries by default might provide yet another 'first glance at Python' epiphany.
# Niklas Lindström

I have a couple links in the original post to a PEP (which has lost momentum for the moment), and an older discussion where I suggested the module has a place in the core as well.

The PEP summarizes some issues. I can't say I agree with some of the conclusions people had in the thread that followed (e.g., that path shouldn't subclass string, __div__ shouldn't be provided, and some of the other stuff).

I think the next step, though, is to get some more people using the module. People shouldn't (and don't have to) theorize about what's a good API. An API exists (two, actually -- the other is linked from the PEP), and people should use them and see what they think, THEN comment. That process is appropriate for adoption in the standard library as well.
# Ian Bicking

Since several comments have repeated the original mistake, it would seem worthwhile to point out that the module is actually by Jason Orendorff, and not Jason Jorendorff...
#

This rocks! But it does not seem to work on Win98. I tested the walk function with this little script:
8<---------
from path import path
import sys
p = path('c:\\')
for f in p.walk():
    print(f)
------->8

It fails with this trace back:
8<---------
Traceback (most recent call last):
  File "test.py", line 4, in ?
    for f in p.walk():
  File "D:\download\path\path.py", line 346, in walk
    if child.isdir():
AttributeError: 'str' object has no attribute 'isdir'
------->8

Can anyone confirm this?
# Fredrik

> Since path subclasses str, the .join() method is already taken

It's a bad idea to leave the original str.join method in the path class: path.join(...) makes no sense there, so it should be replaced or "removed" (made to raise an exception).
# ods
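One way ods's suggestion could look: override the inherited `join` so that calling it on a path fails loudly instead of doing string joining. This is a hypothetical sketch (`StrictPath` is an invented name), and raising `TypeError` is just one possible choice. The trade-off, as Jason's later comment notes, is that a subclass that disables a superclass method no longer honours what `str` promises: any code that treats a `StrictPath` as an ordinary string and calls `join` on it will break.

```python
import os.path


class StrictPath(str):
    """Hypothetical path type that refuses the inherited str.join."""

    def join(self, *args, **kwargs):
        # str.join would use self as a separator; for a path that is almost
        # always a mistake, so fail loudly and point at the intended method.
        raise TypeError("use joinpath() to join path components")

    def joinpath(self, *parts):
        return StrictPath(os.path.join(str(self), *parts))
```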

I'm answering my own question about the walk bug.
I changed line 342:
for child in self:
to:
for child in self.listdir():
and now it works. Maybe it should be fixed in another way, but this seems to work.
I think this bug applies to all walk methods.
Cheers,
Fredrik
# Fredrik
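The traceback makes sense once you recall that iterating over a string yields its characters. `for child in self` walks the letters of the path, and each letter is a plain `str` with no `isdir()`. Iterating over the directory listing instead, as in Fredrik's change, gives real entries. Below is a small standalone sketch of both points. `walk_paths` is a hypothetical function built on `os.listdir`, not the patched path.py method, and it follows symlinked directories.

```python
import os


def walk_paths(top):
    """Yield every file and directory below top, depth first (hypothetical helper)."""
    # Iterate over the directory listing, not over the string 'top' itself.
    for name in os.listdir(top):
        child = os.path.join(top, name)
        yield child
        if os.path.isdir(child):
            yield from walk_paths(child)


# The bug in miniature: a string iterates character by character.
print(list("c:\\"))  # ['c', ':', '\\']
```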

Thanks for the comments here, especially Fredrik's bug report. I guess I should use my unit tests. Will fix tonight.

=== What I've learned from path ===


1. path is useful, much more useful than I expected. Now when I use os.path, supposedly to save myself the pain/time/complexity of installing path.py, I often end up switching to path later (fortunately that's pretty easy, by design). Is Guido really "lukewarm" about this? I have to wonder if he's tried it...


2. Which features I use: Roughly in order, bytes(), files(), .name, lines(), isdir(), isfile(), .parent, .size, + operator, / operator, .mtime, makedir(), walkfiles(), followed by everything else.

walkfiles() makes me smile.

listdir() is not useful. It's superseded by all those other more specific methods, particularly files().

The optional wildcard argument to files(), et al, is really cool. I forget who suggested it, but I'm glad I decided to try it out instead of instantly showering them with scorn (always my first impulse).

On modern Windows systems, paths are Unicode. So on Windows, path subclasses unicode. Maybe someday I'll use this. :)
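On the wildcard argument to files(): here is a sketch of a files()-style helper that lists one directory (no recursion) and keeps only regular files whose names match an optional pattern. `files` here is a hypothetical standalone function built on `os.listdir` and `fnmatch`, not path.py's method.

```python
import fnmatch
import os


def files(directory, pattern=None):
    """Return the regular files directly inside directory, optionally filtered by a wildcard."""
    result = []
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        # Skip subdirectories and anything else that is not a plain file.
        if os.path.isfile(full) and (pattern is None or fnmatch.fnmatch(name, pattern)):
            result.append(full)
    return result
```

Compared with a bare listdir(), the caller gets only the files and only the names they asked for, e.g. `files('.', '*.txt')`.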


3. path subclasses str. A path is-a string. I don't really like it. But it's inescapable. The precedent is overwhelming: the entire tradition of Unix, Windows, C, and Python programming. I've come to see the wisdom of it.

There's a practical aspect, too. Why would you write a path class that you can't use with existing functions that take paths as arguments? I like writing win32file.FileCopy(path1, path2, True), not win32file.FileCopy(str(path1), str(path2), True). The latter is offensive because I'm taking an object that is already the "right" type and explicitly converting it to a type that makes less sense in order to pass it in. Plus it breaks Unicode. (Gotcha.)
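The compatibility argument in miniature: a str subclass can be handed to any function that expects a filename, with no `str()` call. For contrast, Python 3.6 later added a different route (PEP 519). An object that does not subclass str can implement `__fspath__`, and `open()`, `os.path.join()` and friends then accept it. Both class names and the `first_line` helper below are hypothetical.

```python
import os


class StrPath(str):
    """Hypothetical path type that simply is a str."""


class FsPath:
    """Hypothetical path type that is not a str but implements the os.PathLike protocol."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        # open(), os.stat(), os.path.* and friends call this to get the real path.
        return self._raw


def first_line(filename):
    # Stands in for any existing function that takes a filename.
    with open(filename) as f:
        return f.readline().rstrip("\n")
```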

That said, my approach has a big, unsightly wart. path inherits three decidedly unwanted features from str: join(), split(), and iterability. This sucks, but it sucks worse when subclasses break promises made by their superclasses (something else I learned from path).
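The inherited methods are a trap because they keep their string meaning. For example, `split()` on a path splits on whitespace, while `os.path.split()` separates the directory from the final component. Below is a small demonstration with a bare str subclass. `BarePath` is a hypothetical name, and POSIX separators are assumed via `posixpath` so the result is the same on any OS.

```python
import posixpath


class BarePath(str):
    """Hypothetical path type that adds nothing to str."""


p = BarePath("/home/me/my notes.txt")

print(p.split())           # str.split: ['/home/me/my', 'notes.txt']
print(posixpath.split(p))  # path split: ('/home/me', 'my notes.txt')
```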


4. Using / and + as syntactic sugar for .joinpath(): I don't use / much. But if we're to have an operator that does this, I'm convinced + is a very bad choice. (1) It's confusable with string concatenation, especially since variables and function arguments are untyped. (2) It introduces bugs when you port old str-based code to use path objects. Code like x = mydir + "/foo" looks right, but it's broken. (3) I think string concatenation is actually a more common operation than joinpath() for paths. bakfile = myfile + '.bak' and so forth.
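Both hazards can be shown with `os.path.join` semantics, which is what a joinpath-style `+` would presumably follow. Joining with a component that starts with a separator discards everything before it, so `mydir + "/foo"` would yield just `/foo`. A `'.bak'` suffix would become a new path component. The sketch calls `posixpath.join` directly (POSIX separators assumed), and `mydir` and `myfile` are illustrative values.

```python
import posixpath

mydir = "/home/me/project"
myfile = "/home/me/project/data.txt"

# Old str-based code: concatenation builds the intended paths.
print(mydir + "/foo")                  # /home/me/project/foo
print(myfile + ".bak")                 # /home/me/project/data.txt.bak

# If "+" meant joinpath(), the same expressions would behave like this:
print(posixpath.join(mydir, "/foo"))   # /foo  (an absolute component resets the path)
print(posixpath.join(myfile, ".bak"))  # /home/me/project/data.txt/.bak
```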


5. Personal preferences I suppressed while making path:

I dislike relative paths. They're an error-prone shortcut. Great for ad hoc shell commands, of course.

Ugh, a lot of these method names stink. But I kept them exactly as in the standard library, where applicable. I hope this makes it easier for people to grab path and start using it.

The dothisdothat() naming convention is Python's worst wart. I'd prefer either doThisDoThat() or do_this_do_that(). But standard library style it is. No point being inconsistent.

I usually don't like "big" classes. path is too big for my taste. But people wanted all these features, and I admit I use them too.
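On the relative-paths point in item 5: one reason they are error-prone is that the same string names a different file whenever the current working directory changes. Below is a minimal demonstration with `os.chdir` and two temporary directories. The function `resolve_in` is hypothetical, and it restores the original working directory afterwards.

```python
import os
import tempfile


def resolve_in(directory, relative):
    """Return the absolute path that `relative` means while `directory` is the cwd."""
    old = os.getcwd()
    os.chdir(directory)
    try:
        return os.path.abspath(relative)
    finally:
        # Always put the working directory back.
        os.chdir(old)


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    print(resolve_in(a, "out.log") == resolve_in(b, "out.log"))  # False
```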

# Jason Orendorff