Kata 19: an optimization anecdote

I spent some time last night working on Kata 19 (these Katas are little programming exercises, a nice programming break from writing Useful Code). This was definitely the hardest one I've done so far -- you find paths between words, where words are connected if they only differ by one letter. For example: beast to least to leant to leans to loads to loons to loops.

The algorithm I used was pretty quick. Searches take at most 0.1sec, return the shortest path, and do not block on impossible paths. It does a breadth-first search, starting from both the first and terminal words (finding where they meet somewhere in the middle).

The problem was that it was really slow to start up. Reading my dictionary of 96 thousand words took about 53sec (and 166Mb of memory) on my Athlon 650MHz PC. Once it is loaded it is really fast. Perfect for a word-path-finding-server. But I don't think this justifies a persistent server.

Here is this original code. I then set upon optimizing the start time. Here's the things that worked well:

I was creating a dictionary that looked like {'hat': Set(['_at', 'h_t', 'ha_']), ...}. Instead of calculating all these permutations up-front, I calculated them as needed. Down to 31sec (22sec gained) and 124Mb.
Garbage collection was hitting. I could tell because the reading would stall at random times (when the GC was doing work). I didn't need this garbage collection while I was reading data in, because I knew no circular garbage was being created (and not much non-circular garbage was created either). Using the gc module, I turned this off. Down to 17sec (14sec gained), doesn't help memory.
I used sets in several places. One of the central data structures was a dictionary that looked like {'_at': Set(['hat', 'bat', 'sat', ...]), ...}. But I never used set operations on those values, I only iterated over them. I changed it to a list. Down to 9sec (8sec gained), 75Mb memory.

Those were all the effective changes I made. I tried some other things, to little or no effect:

Using .setdefault(word_with_unders, []).append(word) for creating the dictionary. Gained about 0.5sec over using .has_key(word_with_unders). Using exceptions here lost me 7sec -- that should only be done when you are expecting very few misses.
I iterate over range(len(word)) for each word. Since words show up in the dictionary in sorted order, I tried caching the range result (since I'd only need a handful of different ranges for the entire run). No significant effect.
range vs. xrange: no difference.
Avoiding globals, by putting range=range in the method signature. No significant gain.
"%s_%s" % (word[:i], word[i+1:]) vs. word[:i] + word[i+1:]. Actually lost 2sec. Probably using % instead of + for stings will only help when the strings are long (these are very short).
The dictionary has 's at the end of some words. There was no significant diffence between if "'" in word: word = word.split("'")[0] and if word.endswith("'s"): word = word[:-2]. Using line.strip() to get rid of \n wasn't significantly faster than line[:-1] either.

The other big optimization changed the nature of the program slightly. If you only want to test the path between two words (say hat and bee), then you only need to load words of that length (length 3 in this case). You can ignore all the other words. By doing this I could load the necessary parts of the dictionary in about 1.5sec, and use only 36Mb (it depends somewhat on what length words, of course). As a baseline, I can read the entire word list into a Python dictionary in about 0.8sec.

By using Psyco I can cut the 8sec (with all the little optimizations) down to 3.5sec, using psyco.full() It only brings the original down to 41sec (and increases memory to 180Mb), which is still pretty slow. Pysco plus turning GC off gives us 11sec, which is pretty good -- it does better when we tell Psyco psyco.bind(load_words) (i.e., tell Psyco to only try to optimize the one function: 11sec vs 15sec). In contast, psyco.bind doesn't even effect our optimized load_words, only psyco.full helps there.

I'm pretty happy with the finished program. For the most part all the important, time-critical parts are already in C -- they are dictionaries, sets, and lists. These data structures make algorithm implementation in Python really easy and scalable all at once. While you could write this in C for more performance, it would take a lot of work just to get something that worked, and a lot more work to match the speed of Python's built-in data structures. Maybe a language like OCaml or Haskell could match (or beat) Python's agility and performance. I'd be curious.

For more on Python optimization: loop optimization, another anecdote, and an anecdote using images. Some specific tips, and some general tips

Created 26 Oct '03
Modified 14 Dec '04

Comments:

Just a minor nit: sets aren't in C yet. They're a 100% Python library module. If there's enough interest (and I think there is) they'll become a builtin (see http://python.org/peps/pep-0218.html).
# Johannes Gijsbers
I didn't bother optimising the initial dictionary loading: instead I built a precompiled dictionary for each word-length. As graphs of linked word-nodes, they're pretty small.

[epiphany:~/Code/kata19] cmiller% ./nodef.rb ruby gold
dictionary loaded in: 0.397383s
ruby => rubs => cubs => cues => cued => cued => coed => cold => gold
261 nodes touched in 0.020318s

# Charles Miller
Erk. Just noticed the bug in the output. :)
# Charles Miller
My coworker and I had a good time with this kata.

After finishing it I created a corollary. Find the longest word chain. For example, the longest word chain for all three letter words is [asp ask ark are abe aye rye rae ray say sky ski sri uri uzi]. That took 210 milliseconds to find.

I highly recommend trying the corollary.

My longest time are for the 8 letter words. The chain is 30 words long and it takes 991 milliseconds to find.

# Chris Grindstaff

Ian Bicking: the old part of his blog

Kata 19: an optimization anecdote

Comments: