Speed is a Process, not an Attribute

or: Python vs. C Extensions

Any reader of the Python list/group (comp.lang.python) will have become used to the "Is Python Right For Me?" queries, typically with a question about performance. The typical answer is that if Python's performance is problematic, you can code key portions in C -- at an extreme, Python can be a scripting language for your C (or C++, or now even Objective-C) application.

I don't think this is a very good answer, because it's not the answer that most Python programmers come to. Extension programming is a sort of release valve -- it means there's always a way out if you need it. Python can never be worse than C, because you can always rewrite more and more of your code in C until your application performs sufficiently. But it really is a release valve, and like a release valve it's only there in case of emergency.

I think the real answer to these people should be: Python is fast enough just as it is. Your application may not be fast enough when you first write it in Python, but you can almost always make Python -- just Python -- perform sufficiently. This isn't true for every domain, but it's true for the most common programs.

C is good for one kind of performance -- running code really fast. But this is only one kind of performance, and now that CPUs are so fast it's not the most important kind of performance. Programs have bottlenecks, and nothing but the bottleneck matters. CPUs are not the bottleneck!

Two applications that reminded me of this: Evolution and Mail.app. At least for my usage, Mail.app runs much faster than Evolution (and Sylpheed and any other mail reader I've tried). This isn't because of language or runtime differences. Mail.app runs faster because it is smarter. I have big IMAP mailboxes, and Mail.app caches messages intelligently, checks those caches intelligently, and does its best to queue IMAP queries for quick response to what I'm most interested in.

Mail.app isn't written in Python, but it shows what good optimization is -- it's not just executing code fast, it's executing only as much code as necessary, and being responsive.
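
To make "executing only as much code as necessary" concrete, here is a minimal sketch of the caching idea in pure Python. It is not how Mail.app works internally; the `MessageCache` class and the `slow_fetch` function are hypothetical names invented for illustration, and the `time.sleep` call merely stands in for a network round trip.

```python
import time


class MessageCache:
    """Keep fetched message bodies so each one is requested at most once."""

    def __init__(self, fetch):
        self._fetch = fetch    # hypothetical slow call, e.g. a server round trip
        self._bodies = {}      # message id -> body we already have
        self.fetch_count = 0   # how many times the slow call actually ran

    def get(self, message_id):
        # Only pay for the slow call when the body is not cached yet.
        if message_id not in self._bodies:
            self.fetch_count += 1
            self._bodies[message_id] = self._fetch(message_id)
        return self._bodies[message_id]


def slow_fetch(message_id):
    # Stand-in for network latency; a real client would talk to a server here.
    time.sleep(0.01)
    return "body of message %d" % message_id


cache = MessageCache(slow_fetch)

# Six lookups, but only three distinct messages.
for message_id in [1, 2, 1, 1, 2, 3]:
    cache.get(message_id)

print("lookups: 6, slow fetches: %d" % cache.fetch_count)
```

The loop does six lookups but pays for the slow call only once per distinct message. No amount of faster code execution would recover the time spent on the repeated round trips that the cache avoids.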

So when someone asks if Python performance is an issue, I think you should tell them no, and not apologize for Python by pointing to the release valve of C extensions. If you can write a full-featured program in Python in half the time it would take in C (probably more like a tenth, but for argument...), that leaves the Python programmer a lot of time to think about how to make the program more responsive. Some of that time spent in optimization could be used to mitigate Python's relatively slow computation, but more likely it will be spent avoiding expensive operations with disks and networks. In the end, I would expect the pure Python program to be faster than the C program (given an equal amount of programmer time).

Created 19 Oct '03
Modified 25 Jan '05

Comments:

"In the end, I would expect the pure Python program to be faster than the C program (given an equal amount of programmer time)."

Another way to look at it: I'd expect the Python program to work successfully and the C program to be about 5-10% finished. After all, time-to-market counts for more than anything else in a software-for-sale or consulting-by-the-hour job.
# Peter Herndon

In my limited experience, the people who obsess about performance and therefore only write things in C/C++ have been the ones most likely to write bad and inefficient C/C++ code. Usually most of their bad coding was done in the name of up-front optimization.

For example, they would decide to use a 3MB two-D array of structs because "it would be faster than using STL containers and classes." It didn't matter to them that their data could be represented in other ways (even in plain C) that would be faster and more memory efficient, and it didn't matter that it might be easier to code if you let a library juggle some of the details for you. I think the real issue was that they didn't know how to do anything other than use two for-loops to iterate over a static array. And I guess that's my point - these arguments usually get made because someone is slightly ignorant of their "favorite fast language" and completely ignorant of what you gain by using a higher level language.

This is just my experience; I'm not extrapolating it to say all performance-minded arguments are motivated by ignorance. There are cases where C/C++ might be necessary, but I think they are few and far between.
# Alan McIntyre

It does seem like optimizing the Python code would be a better approach, but is there a comprehensive document somewhere that describes how one can best optimize their Python code? A tutorial or something would be a big help. If you could just give Python newbies a link to that page, that would go a long way toward helping out.
# Corey Coughlin

Corey: psyco (http://psyco.sf.net) can give some wonderful speedups with just two lines of code inserted at the beginning of your module. It only works on the i386 architecture for now, though.

Now to answer your question more precisely: there is no such comprehensive document, though http://www.python.org/doc/essays/list2str.html is one of the references on the topic. http://trific.ath.cx/resources/python/optimization/ is also a nice one (and both come up on the first page when googling for "python optimization"). The key thing is that you should never, ever optimize blindly. Learn how to use the profile module to find the bottleneck of your application; that is the only place where you should ever optimize. Make your code work in the first place. Make it clean. Make it readable. Then, and not before, find the bottleneck (if any).
# Alexandre Fayolle
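
As a rough illustration of the "profile first" advice above, here is a small sketch using `cProfile` and `pstats` from the standard library. The choice of `cProfile` is an assumption (the comment only says to use the profiler), and `slow_join`, `fast_join`, and `main` are made-up functions that stand in for a real application.

```python
import cProfile
import io
import pstats


def slow_join(words):
    # Suspected hotspot: builds the result one concatenation at a time.
    result = ""
    for word in words:
        result = result + word + " "
    return result


def fast_join(words):
    # Alternative that hands the whole job to a single built-in call.
    return " ".join(words)


def main():
    # Hypothetical workload standing in for the real program.
    words = ["spam"] * 20000
    slow_join(words)
    fast_join(words)


# Profile one run of the program.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Write the report to a string, sorted by cumulative time per function.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists each function with its call count and time, so you can check whether the function you suspected is really where the time goes before rewriting anything.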

Ian Bicking: the old part of his blog
