I realized that this was your objection after I made my post. It doesn't change my opinion, but does give me something concrete to prepare an argument against. Your example hinged on a few "not quite a tail-call" cases and the idea that you would have to understand the underlying implementation to be able figure out why one runs out of memory and the other doesn't. I doubt it would take long to find several examples like those ALREADY IN THE LANGUAGE. The fact that Python is iteration-friendly and recursion-hostile is something that developers have to understand today.
Every language implementation has correct programs that run so poorly that they fail or effectively fail due to running out of space or time (an "infinite loop" is merely one that exceeds the patience of the developer/user). By your logic, any optimization that would make one of these correct programs run acceptably is suspect.
The funny thing is, I'm opposed to cluttering up Python. I'm not sure it was a good idea to add decorators to the language. It's certainly going to make for more challenging code than tail-call elimination ever will.
BY THE WAY: The stack trace problem is not intractable:
In debugging mode, when the interpreter makes a tail-call, it could push a debugging frame onto the stack. The debugging frame contains a pointer to a circular list in which each entry represents a parameter set for a tail call. Subsequent tail-calls would use the existing frame (as long as it is at the top of the stack) and place their parameter sets on the list. The circular list is of finite size and wraps around when that limit is reached, incrementing a counter of abridged calls.
If a non-tail-call is made, it makes use of the stack as normal. Note that the debugging frame is now no longer at the top of the stack. A subsequent tail-call will push a new debugging frame onto the stack (as above).
The return through the frame "returns" to code which cleans up the circular list and returns to the context calling the context that made the tail-call just liek it should.
The stack trace display function needs to know about the debugging frames and how to read the circular lists. If the circular list was not exhausted, then the trace will be no different than for non-optimized tail-calls. If it was, it will need to say something like " ... 95 Calls abridged ... ". Frankly, I think this will make the stack traces MORE readable for deeply recursive code.
Note1: Due to co-recursion the circular list has to support multiple parameter set signatures (A tcalls B tcalls A tcalls B ...). Note2: I would keep the initial parameter set for the tail-call creating the frame, overwriting tail calls 2 thru N (where N is the maximum size of the circular list). This would have the advantage that the "abridgement" would be bracketed nicely.