It's quite obvious if you look at the spec. It retains a great deal of information that is not needed just to execute programs, but is quite useful for compiler backends and tools.
It's quite smart to write an LLVM backend and frontend for a custom (compact) bytecode.
"Finalized" and gzipped PNaCl code is about the same size as gzipped x86 or gzipped Emscripten-generated JS code. PNaCl code seems to have a static overhead of about 400 kB, probably statically linked library code. Emscripten handles the equivalent functionality through Web APIs, and native executables get it from a dynamically linked CRT.