I'm not saying 50-100% growth doesn't happen; I'm saying it's an outlier (based on complete system data from multiple platforms).
If it really is a common issue with Firefox's JS implementation for some reason, nothing stops them from using a compressed pointer scheme for JS that uses 32b pointers and a known offset; it's a common technique. You give up one register to hold the offset, but you've still netted 7 registers from the switch to the 64b ISA, and adding the offset is either a simple OR or can be folded into an LEA (nearly free or free).
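To make the compressed pointer idea concrete, here is a minimal sketch in C. It assumes the engine reserves a single 4 GiB heap region whose base is 4 GiB-aligned, which is what lets decompression be a plain OR. All names (`cptr_t`, `heap_set_base`, `compress`, `decompress`) are made up for illustration and are not taken from Firefox or any other engine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not any real engine's API. */

/* Compressed reference: a 32-bit offset from the heap base. */
typedef uint32_t cptr_t;

/* Assumption: the whole JS heap lives in one 4 GiB region whose base
   address is 4 GiB-aligned, so the low 32 bits of the base are zero.
   In generated code this value would sit in a reserved register. */
static uint64_t heap_base;

static void heap_set_base(uint64_t base) {
    /* The OR trick in decompress() only works with an aligned base. */
    assert((base & 0xFFFFFFFFull) == 0);
    heap_base = base;
}

/* Full 64-bit address -> 32-bit offset stored in object fields. */
static cptr_t compress(uint64_t addr) {
    assert(addr >= heap_base && addr - heap_base <= 0xFFFFFFFFull);
    return (cptr_t)(addr - heap_base);
}

/* 32-bit offset -> full 64-bit address. Because the base has zero low
   bits, OR and ADD give the same result; with an unaligned base you
   would use ADD, which a compiler can fold into an addressing mode. */
static uint64_t decompress(cptr_t c) {
    return heap_base | (uint64_t)c;
}
```

With this layout every reference field in an object costs 4 bytes instead of 8, so the pointer-heavy part of the heap stays the same size as on 32-bit, at the price of capping the heap at 4 GiB.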
Let's not confuse "they haven't had a chance to tune the 64b JS engine, and the 32b one has been worked on for years" with "x86_64 is slower than x86_32".
Is your "complete system data from multiple platforms" from JS apps?
Let's also not confuse "a sufficiently smart compiler..." with real-world observed performance :)
BTW, data layout transformations like compressed fields would also be useful on 32-bit. The vast majority of object graphs would be happy with 16-bit or even 8-bit identifiers, not to mention JS numbers, which are all 64-bit floats even on 32-bit platforms.