Ian Bicking: the old part of his blog

Re: Firsttryatgenericfunctions comment 000

1.
If the patching is done first, it will work.  The code needs to be changed to
use the full form import jsonify, and then patch with jsonify.jsonify = jsonify_mod
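The binding issue above can be sketched without the real jsonify module; the module object and functions below are stand-ins made up for illustration (modern Python syntax):

```python
import types

# A throwaway module object standing in for a jsonify module.
mod = types.ModuleType("jsonify")

def original(obj):
    return "original"

mod.jsonify = original

# A "from jsonify import jsonify" style binding, taken BEFORE patching:
bound_early = mod.jsonify

def jsonify_mod(obj):
    return "patched"

mod.jsonify = jsonify_mod  # the monkeypatch

bound_early(None)   # still calls the original: early binding misses the patch
mod.jsonify(None)   # attribute lookup through the module sees the patch
```

This is why the text above says the full import jsonify form is needed: only attribute lookup through the module observes a later patch.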

2.
As the code stands, if/else is quicker: quicker to load (fewer imports) and quicker to run.
I will take your word for it that generic functions might give a performance improvement in other cases... however my speed tests (see below) seem to point otherwise.

Also note that if/else translates easily into fast code (psyco or C).  So if you do have a performance problem, if/else will be faster, or you could easily change it to use some other dispatch method.  The if/else code with psyco runs 5.2 times faster, making it about 150 times faster than the generic dispatch version.  The generic function version fails to run at all under psyco.


3.
You also need to know what other rules exist with generic functions, or at least what order they are applied in.

Good point about only being able to put it at the top, unless you add extra code.


4.
Interesting optimization.  However generic functions are still slower in my tests.

5.
Thanks for spotting the error!  P.S. there are lots of warnings in your src/protocols/_speedups.c file ;) with the default Debian compiler, gcc 4.0.2.


For the case where you cannot change the existing source, generic functions do seem to have a use.  However there are other methods available which can work fine (arguably better).  Maybe there are other use cases where they make more sense; for the cases here, if/else works fine.





Below is a version of jsonify with a speed test at the bottom.  The if/else version is about 30 times faster, not including the extra module load time.  jsonify2.py is the if/else version; jsonify.py is the generic function version, with the timing code attached at the bottom.

In case there is something odd about my machine, you can repeat the test yourself to check if you like.


time python2.4 jsonify.py 
Time for just the running parts.  Not including module load time.  9.05578112602

real	0m9.659s
user	0m9.428s
sys	0m0.167s

time python2.4 jsonify2.py 
Time for just the running parts.  Not including module load time.  0.386996984482

real	0m0.805s
user	0m0.721s
sys	0m0.070s



import sqlobject

def jsonify(obj):
    # @@: Should this match all iterables?
    if isinstance(obj, (int, float, str, unicode)):
        return jsonify_simple(obj)

    elif isinstance(obj, (list, tuple)):
        return jsonify_list(obj)

    elif isinstance(obj, dict):
        return jsonify_dict(obj)

    elif isinstance(obj, sqlobject.SQLObject):
        return jsonify_sqlobject(obj)

    elif hasattr(obj, "__json__"):
        return jsonify_explicit(obj)

    elif isinstance(obj, sqlobject.SQLObject.SelectResultsClass):
        return jsonify_select_results(obj)

    # Note: unmatched objects fall through and implicitly return None.


def jsonify_simple(obj):
    return obj

def jsonify_list(obj):
    return [jsonify(v) for v in obj]

def jsonify_dict(obj):
    result = {}
    for name in obj:
        if not isinstance(name, (str, unicode)):
            raise ValueError(
                "Cannot represent dictionaries with non-string keys in JSON (found: %r)" % name)
        result[name] = jsonify(obj[name])
    return result

def jsonify_sqlobject(obj):
    result = {}
    result['id'] = obj.id
    for name in obj.sqlmeta.columns.keys():
        result[name] = getattr(obj, name)
    return result

def jsonify_explicit(obj):
    return obj.__json__()

def jsonify_select_results(obj):
    return jsonify_list(obj)



if __name__ == "__main__":
    import time
    a = range(100)
    b = {"asdf":2}
    class AA:
        def __json__(self):
            return "asdf"
    c = AA()

    t1 = time.time()
    for x in range(10000):
        r1 = jsonify(a)
        r2 = jsonify(b)
        r3 = jsonify(c)

    t2 = time.time()
    print "Time for just the running parts.  Not including module load time.  %s" % (t2 - t1)
Comment on Firsttryatgenericfunctions comment 000
by Rene Dudfield

Comments:

Would you mind using reStructuredText for your comments? Whatever you're doing instead is very hard to read.

You seem to be missing the point, by the way. If a library is trying to provide an extensible function, it's silly to make everyone monkeypatch it and disallow anybody from importing it. And, with the linear reduction in speed caused by monkeypatching, there will sooner or later be a breakeven point where the generic function is faster.

Right now, very little of RuleDispatch is in C, so that breakpoint might take several people adding their own type lookups. But as time goes on and RuleDispatch gets faster, the breakeven point comes sooner as well. Also, with more expensive tests than simple isinstance() checks, the repeated checking will reach the breakeven point faster as well. Import time also won't matter with code that is using dozens of generic functions, not just one.

You also are wrong about not needing generic functions in cases where you can change the source. There are still reasons why you'd use them:

  1. Writing optimal if-then trees for large and complex rule sets is difficult to do correctly.
  2. A collection of rules is more likely to have a 1-to-1 correspondence with business rules or requirements, and is therefore easier to track/verify.
  3. An application that gets customized rules for different customers will need a modularly-extensible rule system that doesn't require all the rules to be in a single hand-tuned tree.

In other words, generic functions are useful for the same reason Python is more useful than assembly language. You can always write the same code in assembly, but it's easier with a high-level specification that's automatically translated. The key is simply that it have performance that's "good enough", and for complex code with expensive tests, generic functions can already do better than a human for optimization, in terms of time to produce a correct and efficient decision tree.
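The extensibility point can be sketched with functools.singledispatch from later Python versions: this is not RuleDispatch, but it illustrates the same generic-function idea of registering new cases from outside, without editing the original dispatch tree. The functions here are illustrative only:

```python
from functools import singledispatch
import decimal

@singledispatch
def jsonify(obj):
    # Default case: fall back to an explicit __json__ hook.
    return obj.__json__()

@jsonify.register(int)
@jsonify.register(float)
@jsonify.register(str)
def _(obj):
    return obj

@jsonify.register(list)
@jsonify.register(tuple)
def _(obj):
    return [jsonify(v) for v in obj]

# A third party can extend the function without touching the code above:
@jsonify.register(decimal.Decimal)
def _(obj):
    return float(obj)

jsonify([1, "a", decimal.Decimal("2.5")])  # -> [1, "a", 2.5]
```

The point is that the Decimal case is added by registration, not by editing or wrapping the original function.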

Eventually, they'll beat even your example here, which by the way is a bit like saying, "see, I can write C code that runs faster than Python". Well, duh. Try the test with a dozen extra types added by monkeypatching (not by-hand inlining as shown), and see what the speed difference is. I don't know what the breakeven point for this example is, but I can guarantee there'll be one. Past that breakpoint, monkeypatching will keep getting slower and slower, but the generic function will continue to have basically the same performance. And the more complex your ruleset and tests, the sooner that breakpoint will be reached.

# Phillip J. Eby

'''You seem to be missing the point, by the way. If a library is trying to provide an extensible function, it's silly to make everyone monkeypatch it and disallow anybody from importing it. And, with the linear reduction in speed caused by monkeypatching, there will sooner or later be a breakeven point where the generic function is faster.'''

No. Other modules can still use the function after it is changed at runtime, as long as they look it up through the module attribute (import jsonify; jsonify.jsonify(...)) rather than binding it early with from jsonify import jsonify.

'''Eventually, they'll beat even your example here, which by the way is a bit like saying, "see, I can write C code that runs faster than Python". Well, duh. Try the test with a dozen extra types added by monkeypatching (not by-hand inlining as shown), and see what the speed difference is.'''

Performance for generic functions seems to be slower until you have 40 different source files for if/else.  Note that is 40 source files, not 40 different types.  If you put the rules in fewer source files, then it is much quicker still.  From my tests, the break-even point if they are all in the same file would be about 1000 rules.  With psyco, the break-even point for if/else would be about 120 source files... not that I tested it with 120, just that the performance behaviour is consistent with this when using 40.

So generic functions need to be optimized for the cases where you do not have 40 different source files; otherwise the claim of better performance cannot be made.  I can't think of a real-life example where people actually have 40 different source files for these kinds of rules, but I'm sure there are cases.  The other place where generic functions are faster is if you have lots of duplicated slow rules.

For simplicity of reading, sit ten random python programmers in front of the two examples, and see which one is understood more easily.

For simplicity of debugging, try debugging your C code compared to some if/else rules written in python.

I think if people understand generic functions, and if there are lots of rules that can not be kept in the same place, then they would be easier to maintain.

Thanks for pointing out why generic functions are useful. It has been interesting playing around with them. I hope I have helped you see some reasons why they are not useful.

# Rene Dudfield

"""Performance for generic functions seems to be slower until you have 40 different source files for if/else."""

Not true - see my test results below. Adding just two monkeypatches to your hand-tuned jsonify brings your benchmark down to being 50% slower than the generic function version.

# Phillip J. Eby

Oops. I mean 35% slower. I was looking at 2.xx vs 3.xx without including the 'xx's in the division. :)

# Phillip J. Eby

Regarding the benchmark, I tried it out this evening and noticed a couple of things. First, the timing figures you quote above show a 23.4X slowdown, not 30X.

Second, I sped up the test by a factor of three by changing one rather simple thing. Ian's rule checking for hasattr(obj,"__json__") is unnecessary; I simply made jsonify_explicit the default and let it raise an error otherwise, since that's what happens anyway. (Making the same change to your version doesn't result in as much of a speedup, but I did it anyway so that it's an apples to apples comparison.)
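The refactoring described above (dropping the hasattr check and making jsonify_explicit the default) would look roughly like this; a sketch only, not the exact code benchmarked, and the SQLObject branches are omitted to keep it self-contained:

```python
def jsonify(obj):
    if isinstance(obj, (int, float, str)):
        return obj
    elif isinstance(obj, (list, tuple)):
        return [jsonify(v) for v in obj]
    elif isinstance(obj, dict):
        return dict((k, jsonify(v)) for k, v in obj.items())
    # Default case: rely on an explicit __json__ method.  An object
    # without one raises AttributeError, which is what happened anyway
    # when the hasattr check failed and nothing matched.
    return obj.__json__()
```

Removing the hasattr test saves one attribute probe on every call for the common built-in types, which is where the speedup comes from.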

Also, by doing a bit of profiling, I saw that much of the remaining difference in time comes from the fact that RuleDispatch's equivalent to isinstance() is still in Python, not C. So, I spent another 30 minutes or so whipping up a Pyrex version, and thus sped up the test by another factor of 2.

My final stats, having modified both your version and Ian's version to have the simpler logic, and using the C isinstance code, are 1.58 seconds for your hand-tuned version, vs. 2.88 seconds for the generic function version. (I used time.clock(), which is more accurate for timing purposes, and for both measurements I took the best of 3 runs.) So, the end result is that the generic function version takes about 1.37 times as long to run, which is not bad for stacking up against your hand-optimized code.
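As an aside, time.clock() was removed in Python 3.8; the same best-of-3 measurement style can be reproduced today with timeit.repeat. The workload below is a trivial stand-in, not the jsonify benchmark itself:

```python
import timeit

def workload():
    # Stand-in for the jsonify benchmark loop; illustrative only.
    return [x * 2 for x in range(100)]

# repeat=3 gives three independent timings; take the best of the three,
# as in the methodology described above.
best = min(timeit.repeat(workload, number=10000, repeat=3))
```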

But this still isn't a fair comparison, since you proposed to extend the function via monkeypatching. So, I tested a simple version using only one level of monkeypatching:

def jsonify(obj):
    # @@: Should this match all iterables?
    if isinstance(obj, (list, tuple)):
        return jsonify_list(obj)
    return jsonify2(obj)

def jsonify2(obj):
    if isinstance(obj, (int, float, str, unicode)):
        return jsonify_simple(obj)
    elif isinstance(obj, dict):
        return jsonify_dict(obj)
    elif isinstance(obj, sqlobject.SQLObject):
        return jsonify_sqlobject(obj)
    elif isinstance(obj, sqlobject.SQLObject.SelectResultsClass):
        return jsonify_select_results(obj)
    return jsonify_explicit(obj)

This code runs in 2.91 seconds (best of 3), so it's already at breakeven compared to the generic function! So, I then added a Decimal case wrapper to both scripts, and reran the timings. Result: the monkeypatching version takes 3.88 seconds (best of 3), and the generic function version took 2.88 seconds - exactly the same as the case without Decimal. (I did not modify the test to actually include any Decimal values, but of course the monkeypatching approach slowed down anyway.)
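The Decimal wrapper mentioned above isn't shown in the comment; one more layer of monkeypatching would presumably look something like the sketch below. The base function and the conversion via float are my guesses, not from the original:

```python
import decimal

def jsonify_base(obj):
    # Stand-in for the previously patched jsonify chain.
    if isinstance(obj, (int, float, str)):
        return obj
    raise TypeError("cannot jsonify %r" % (obj,))

_next = jsonify_base  # keep a reference to the wrapped function

def jsonify(obj):
    if isinstance(obj, decimal.Decimal):
        return float(obj)  # assumed conversion; the comment doesn't say
    return _next(obj)      # everything else falls through one more level
```

Each such wrapper adds one more isinstance check and one more function call to every non-matching dispatch, which is exactly the linear slowdown being measured.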

Anyway, it appears that as soon as you monkeypatch more than once or twice for something like this, generic functions will start coming out ahead. And if they don't, you can always post the results on a blog somewhere and I'll tune up the implementation so that they do come out ahead. ;)

(The C speedup for isinstance() checks is now in the RuleDispatch SVN version, by the way.)

# Phillip J. Eby

I am glad you have sped up your generic functions. If they can be within 5-10x of the if/else method, generic functions become much more worthwhile imho. At 1.3x the speed they become awesome!

Can you please post the code to your changes so that I can verify them?

btw, my version was not 'hand optimized'.  I copy-pasted the rules from the other version.  Your version is hand optimized... changing the rules, and adding a C version for some of them.  Changing the order of the __json__ check changed the behaviour of the code, possibly breaking it in some circumstances (when a sqlobject.SQLObject.SelectResultsClass object has a __json__ method).

Or did I get the way the rules work wrong? I assumed they would be tested from first declared to last declared.

Is there a way you can make that particular type of rule faster?  That is, the hasattr(obj, "__json__") kind of condition.  It would seem to be a very common one for use with types; I have seen it used elsewhere, e.g. __str__, __html__, etc.  Or maybe that is not why the original generic function was slower.

The interesting thing to note here, is that the slowness of one condition had more effect on the generic function version than the if/else version. Is that right? I can't understand why that condition accounts for such a slowdown with generic functions.

To be clear, my proposed approach is to keep things in one file, with the rules in the one place. Only if absolutely necessary do you override functionality in different files. I think it is better to refactor than to patch with conditions hidden in different files.

I also see the usefulness of generic functions when there are lots of rules spread across lots of files you may not own.

If you can get them as fast or faster than if/else rules, then I can definitely see myself using them for game programming. Especially if you can get it to work with psyco.

P.S. Are you aiming to have these in Python at some point? Is it not released yet? I can't find the documentation for it on your page. I did find http://peak.telecommunity.com/PyCon05Talk/ though. Pretty good that I could figure it out (I think) without documentation, and only a blog post to go by.

# Rene Dudfield