Heat issues are a very valid concern. But the "per square inch" density is measured relative to a square inch of fab wafer. So if the stacking can be done on one wafer, it counts. If it's stacking discrete dies, not so much.
Stacking likely wouldn't save substantial cost compared to producing multiple separate wafers. It could even increase cost if it decreases yield. That's very different from making the transistors smaller, where the cost per transistor decreased exponentially in the past. People focus too much on Moore's law (transistors per area), when the only interesting quantities are 1) price per performance and 2) power draw per performance.
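To put rough numbers on the yield point, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the die cost, the per-die yield, the bonding yield, and the simplification that dies are stacked untested, so one bad layer or one bad bond scraps the whole stack. Real processes that test dies before stacking would do better than this.

```python
# Toy model: cost of N good separate dies vs. one good N-layer stack.
# All figures are hypothetical, not data from any real fab.

def cost_per_good_die(die_cost, die_yield):
    """Average spend per working die when only a fraction die_yield works."""
    return die_cost / die_yield

def cost_per_good_stack(die_cost, die_yield, layers, bond_yield):
    """Average spend per working stack, assuming untested dies are stacked,
    so every layer and every bond between layers must be good."""
    stack_yield = die_yield ** layers * bond_yield ** (layers - 1)
    return layers * die_cost / stack_yield

die_cost = 10.0    # assumed cost to produce one die (arbitrary units)
die_yield = 0.9    # assumed fraction of dies that work
layers = 4         # assumed number of stacked layers
bond_yield = 0.95  # assumed success rate of each bonding step

separate = layers * cost_per_good_die(die_cost, die_yield)
stacked = cost_per_good_stack(die_cost, die_yield, layers, bond_yield)

print(f"{layers} good separate dies: {separate:.2f}")
print(f"one good {layers}-layer stack: {stacked:.2f}")
```

With these made-up inputs the stack comes out more expensive for the same transistor count, because the yield losses multiply across layers. That's the opposite of what shrinking transistors used to do to cost per transistor.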