Thread stacks are reserved as virtual address space, but not backed by physical memory up front (unless you're running on OSv, apparently[0]).
1000 concurrent threads will not take stack size * 1000 of actual memory; they'll take roughly (stack pages each thread has touched, at least 1) * 1000, plus the kernel overhead of the thread structures.
That's trivial to check, even in a high-level language (with its own additional overhead): spawning 1000 threads in Python on OS X takes ~25MB. OS X uses 512 KB stacks for non-main threads, so fully allocated stacks would come to ~500MB.
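For anyone who wants to repeat the check, here is a rough sketch of how it might be done. It is not necessarily how the ~25MB figure above was obtained. The helper names (`peak_rss_bytes`, `spawn_parked_threads`, `measure`) are made up for illustration. It assumes a Unix-like system where the `resource` module is available, and that `ru_maxrss` is reported in bytes on OS X and in kilobytes elsewhere (true on Linux).

```python
import resource
import sys
import threading


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Assumption: ru_maxrss is in bytes on OS X and in kilobytes on Linux.
    return rss if sys.platform == "darwin" else rss * 1024


def spawn_parked_threads(count):
    """Start `count` threads that block until the returned event is set."""
    stop = threading.Event()
    started = threading.Semaphore(0)

    def worker():
        started.release()  # signal that this thread is actually running
        stop.wait()        # park, touching as little stack as possible

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for _ in range(count):
        started.acquire()  # wait until every thread is alive at once
    return stop, threads


def measure(count=1000):
    """Growth in peak RSS (bytes) caused by `count` concurrent threads."""
    before = peak_rss_bytes()
    stop, threads = spawn_parked_threads(count)
    after = peak_rss_bytes()
    stop.set()
    for t in threads:
        t.join()
    return after - before


if __name__ == "__main__":
    grown = measure(1000)
    print(f"peak RSS grew by about {grown / 2**20:.1f} MiB for 1000 threads")
```

Compare the printed figure with stack size * 1000 for your platform. Note that this measures peak RSS of the whole process, so it includes the interpreter's per-thread bookkeeping as well as the touched stack pages.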
Yes, 1000 concurrent threads will reserve stack size * 1000 of virtual address space, but that is address space, not memory in use.