I haven't looked at it in a while, so I could be wrong, but I think that with small enough programs you can still squeeze some actual payload into L1 in long, tight loops where you're not jumping up and down the Python stack a lot.
But your overall point stands: if you're writing non-trivial Python programs, your L1 cache is usually taken up by language/runtime overhead.
The interpreter's stack?