I did some emscripten/WebGL performance tests a while ago that reached several hundred thousand 3D-shape particles at 60fps, but the results depend very much on the target hardware and much less on the browser or operating system.
Both tests use simple 3D-shape particles (5-vertex diamonds) and hardware-instanced rendering (the ANGLE_instanced_arrays extension). The first updates particle positions on the CPU. The second updates them on the GPU: a fragment shader writes the particle positions to an offscreen render target, which the vertex shader then samples to displace the particles.
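To make the GPU variant a little more concrete: if each particle's position lives in one texel of the offscreen render target, the vertex shader has to turn a particle index into a texture coordinate. Below is a minimal sketch of that mapping, written in C++ rather than GLSL for readability. It assumes a row-major width x height texture sampled at texel centers. particleTexCoord is a hypothetical name, not taken from the demos. WebGL1's GLSL ES 1.00 has no gl_InstanceID, so the index would have to reach the shader some other way, for example as a per-instance attribute (also an assumption).

```cpp
#include <cassert>
#include <cstdint>

// Normalized texture coordinate of the texel holding one particle's state.
struct TexCoord {
    float u;
    float v;
};

// Hypothetical helper: maps a particle index to the center of its texel in a
// width x height position texture laid out row by row (assumed layout).
// Adding 0.5 samples the texel center instead of a texel boundary.
TexCoord particleTexCoord(std::uint32_t index, std::uint32_t width, std::uint32_t height) {
    assert(width > 0 && height > 0);
    assert(index < width * height);  // one texel per particle
    const std::uint32_t x = index % width;  // column within the row
    const std::uint32_t y = index / width;  // row
    return TexCoord{
        (static_cast<float>(x) + 0.5f) / static_cast<float>(width),
        (static_cast<float>(y) + 0.5f) / static_cast<float>(height),
    };
}
```

For particle 5 in a 4x2 texture this gives (0.375, 0.75), the center of column 1, row 1.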
The first (CPU updates) goes up to around 450k particles on my Windows7 desktop machine before frame times consistently exceed 16ms. The second (GPU updates) goes to about 800k particles before dropping below 60fps:
http://floooh.github.io/oryol/Instancing.html
http://floooh.github.io/oryol/GPUParticles.html
These numbers are not much different from desktop GL (2.1 with extensions). The real gains for this type of scenario would come from updating persistently mapped buffers, which are not available in WebGL.
What's impressive is the raw math performance of asm.js: the particle update loop is plain glm code (https://github.com/g-truc/glm) without any fancy optimizations.
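For reference, here is a plain-C++ sketch of what such an update loop might look like. It is not the demo's code. It uses a tiny Vec3 struct instead of glm so it stays self-contained, and the gravity-plus-floor-bounce behaviour is a hypothetical stand-in. The packed xyz output array is the kind of buffer that would be re-uploaded to the per-instance vertex buffer every frame (for example with glBufferSubData). That upload step is presumably what persistently mapped buffers would make cheaper on desktop GL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain 3-component vector standing in for glm::vec3 so the sketch is self-contained.
struct Vec3 {
    float x, y, z;
};

struct Particle {
    Vec3 pos;
    Vec3 vel;
};

// One simulation step: explicit Euler integration with constant gravity and a
// bounce off the plane y = 0 (hypothetical behaviour chosen for illustration).
// Positions are also written to a tightly packed xyz array, the layout a
// per-instance vertex buffer would expect when uploaded once per frame.
void updateParticles(std::vector<Particle>& particles, float dt, std::vector<float>& instanceData) {
    assert(dt >= 0.0f);
    const Vec3 gravity{0.0f, -9.81f, 0.0f};
    instanceData.resize(particles.size() * 3);
    for (std::size_t i = 0; i < particles.size(); ++i) {
        Particle& p = particles[i];
        p.vel.x += gravity.x * dt;
        p.vel.y += gravity.y * dt;
        p.vel.z += gravity.z * dt;
        p.pos.x += p.vel.x * dt;
        p.pos.y += p.vel.y * dt;
        p.pos.z += p.vel.z * dt;
        if (p.pos.y < 0.0f) {  // reflect off the floor
            p.pos.y = -p.pos.y;
            p.vel.y = -p.vel.y;
        }
        instanceData[i * 3 + 0] = p.pos.x;
        instanceData[i * 3 + 1] = p.pos.y;
        instanceData[i * 3 + 2] = p.pos.z;
    }
}
```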
Running the instancing demo on an iPhone 6+, I can get to 150k particles before the frame rate drops below 60fps. There's an obvious speed-up about three seconds in, which I guess is where the JIT decides to get serious.
So even though there's no official asm.js support in mobile Safari, it still works surprisingly well!
Try it without instancing. I was working on a rendering approach and initially went with instanced rendering, but after reading about how badly some GPUs handle it, I switched it off and saw a pretty decent perf gain. Basically, I traded some startup time (to mass-create the attribute buffers) for better ongoing perf.
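The comment doesn't show code, so here is one way the "mass-create the attribute buffers" approach could look, sketched under assumptions. At startup, the base mesh is replicated once per particle into a single vertex/index buffer, and every vertex is tagged with its particle index, so all particles go out in one ordinary (non-instanced) draw call. replicateMesh and ReplicatedVertex are hypothetical names. How the shader turns the index into that particle's position is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One vertex of the replicated (non-instanced) mesh: the mesh-local position
// plus the index of the particle this copy belongs to. A float index is exact
// up to 2^24, far beyond the particle counts discussed here.
struct ReplicatedVertex {
    float x, y, z;
    float particleIndex;
};

struct ReplicatedMesh {
    std::vector<ReplicatedVertex> vertices;
    std::vector<std::uint32_t> indices;  // 32-bit indices need OES_element_index_uint in WebGL1
};

// Builds, once at startup, a single mesh containing `count` copies of the base
// mesh so that all particles can be drawn with one non-instanced draw call.
ReplicatedMesh replicateMesh(const std::vector<float>& baseXYZ,
                             const std::vector<std::uint32_t>& baseIndices,
                             std::uint32_t count) {
    assert(baseXYZ.size() % 3 == 0);
    const std::uint32_t baseVertexCount = static_cast<std::uint32_t>(baseXYZ.size() / 3);
    ReplicatedMesh mesh;
    mesh.vertices.reserve(static_cast<std::size_t>(baseVertexCount) * count);
    mesh.indices.reserve(baseIndices.size() * count);
    for (std::uint32_t p = 0; p < count; ++p) {
        const std::uint32_t firstVertex = p * baseVertexCount;
        for (std::uint32_t v = 0; v < baseVertexCount; ++v) {
            mesh.vertices.push_back({baseXYZ[v * 3], baseXYZ[v * 3 + 1], baseXYZ[v * 3 + 2],
                                     static_cast<float>(p)});
        }
        for (std::uint32_t idx : baseIndices) {
            mesh.indices.push_back(firstVertex + idx);  // shift into this copy's vertex range
        }
    }
    return mesh;
}
```

This matches the trade-off described: the buffers are count times larger than the base mesh and take time to build, but nothing depends on the instancing extension at draw time. With 16-bit indices instead, a single buffer is capped at roughly 65k vertices.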
Not what you mean, but here's the same demo without instancing, using lots of draw calls instead. Careful, though, it will grind your browser to a halt pretty quickly:
This is one uniform update and one draw call per particle in a loop.
Unfortunately, draw call overhead in WebGL is really bad compared to desktop GL. On my Windows7/NVIDIA setup I can go up to about 75k draw calls before the frame rate drops below 60fps, and on other platforms it's worse. So in this particular (very simple) scenario, hardware instancing actually pays off.
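A quick back-of-the-envelope check of what that number means: at 60fps a frame lasts about 16.67ms, so fitting 75k uniform-update-plus-draw pairs into one frame leaves roughly 222ns for each pair. In practice the budget is smaller, because the frame does other work too. A minimal C++ helper for the same arithmetic (nanosecondsPerDrawCall is a hypothetical name):

```cpp
#include <cassert>

// Upper bound on the average cost of one draw call (uniform update + draw)
// if `drawCalls` of them must fit into a single frame at `fps`. Everything
// else the frame does is ignored, so the real per-call budget is smaller.
double nanosecondsPerDrawCall(double fps, double drawCalls) {
    assert(fps > 0.0 && drawCalls > 0.0);
    const double frameBudgetNs = 1e9 / fps;  // 60 fps -> ~16.67 ms per frame
    return frameBudgetNs / drawCalls;
}
```

With the 75k draw calls at 60fps quoted here, nanosecondsPerDrawCall(60.0, 75000.0) returns about 222.2.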
For quad particle systems, it is indeed usually better to write the vertices directly into a dynamic buffer instead of using instancing, but I don't have a demo for that particular case (yet) :)
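For the quad case, here is a minimal sketch of writing the vertices directly. It assumes uniformly sized camera-facing billboards, the camera's world-space right and up vectors supplied by the caller, and an unindexed triangle list with 6 vertices per quad. writeQuads is a hypothetical helper. The resulting array would be uploaded each frame into a buffer created with GL_DYNAMIC_DRAW (for example via glBufferSubData) and drawn with one non-instanced draw call.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Expands each particle center into a camera-facing quad made of two triangles
// (6 vertices, xyz each) written straight into `out`. `right` and `up` are the
// camera's world-space basis vectors (assumed to be supplied by the caller).
void writeQuads(const std::vector<Vec3>& centers, float halfSize, Vec3 right, Vec3 up,
                std::vector<float>& out) {
    assert(halfSize > 0.0f);
    out.clear();
    out.reserve(centers.size() * 6 * 3);
    const Vec3 r = scale(right, halfSize);
    const Vec3 u = scale(up, halfSize);
    for (const Vec3& c : centers) {
        const Vec3 bl = add(c, add(scale(r, -1.0f), scale(u, -1.0f)));  // bottom-left
        const Vec3 br = add(c, add(r, scale(u, -1.0f)));                 // bottom-right
        const Vec3 tr = add(c, add(r, u));                                // top-right
        const Vec3 tl = add(c, add(scale(r, -1.0f), u));                 // top-left
        // Triangles bl-br-tr and bl-tr-tl: counter-clockwise when right x up
        // points toward the viewer.
        for (const Vec3& v : {bl, br, tr, bl, tr, tl}) {
            out.push_back(v.x);
            out.push_back(v.y);
            out.push_back(v.z);
        }
    }
}
```

Using 4 vertices per quad plus a static index buffer would cut the upload by a third, at the cost of the 16-bit index limit (about 16k quads per draw call) unless OES_element_index_uint is available.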
Yeah, definitely keep draw calls low. I'm not surprised to see instancing beating that.
I'd be curious to see how it compares with a non-instanced, single-draw-call version, though. I threw the adapter code I wrote for my instanced version up here, in case it helps: http://pastebin.com/1gjvreBs