> Why isn’t std::simd in stable yet? Leaving aside any *specific* blockers: - It...

colonial · 2025-11-05T23:02:27 1762383747

> Many people already use non-portable SIMD for the 1-3 targets they care about, instead.

This is something a lot of people (myself included) have gotten tripped up by. Non-portable SIMD intrinsics have been stable under std::arch for a long time. Obviously they aren't nearly as nice to hold, but if you're in a place where you need explicit SIMD speed-ups, that probably isn't a killer.
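For a concrete picture of what "stable but non-portable" looks like, here is a minimal sketch that adds four `f32` lanes with `std::arch` on x86_64 and falls back to scalar code elsewhere. The function name `add4` is made up for illustration; the intrinsics (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`) are the stable ones from `std::arch::x86_64`.

```rust
// x86_64 path: SSE is part of the baseline for this target, so these
// intrinsics can be used without runtime feature detection.
#[cfg(target_arch = "x86_64")]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    let mut out = [0.0f32; 4];
    // SAFETY: SSE is always available on x86_64, and the unaligned
    // load/store intrinsics only need valid pointers to four f32 values.
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
    }
    out
}

// Every other target: plain scalar code with the same signature.
// This per-target duplication is the cost of skipping a portable layer.
#[cfg(not(target_arch = "x86_64"))]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling `add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])` gives the lane-wise sum on either path. A NEON or WebAssembly version would need its own `cfg` branch with that target's intrinsics.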

JoshTriplett · 2025-11-05T23:14:34 1762384474

Exactly. Many parts of SIMD are entirely stable, for x86, ARM, WebAssembly...

The thing that isn't stable in the standard library is the portable abstraction layer atop those. But several of those exist in the community.

exDM69 · 2025-11-06T12:10:11 1762431011

Despite all of these issues you mention, std::simd is perfectly usable in the state it is in today in nightly Rust.

I've written thousands and thousands of lines of Rust SIMD code over the last ~4 years and it's, in my opinion, a pretty nice way of doing SIMD code that is portable.

I don't know about the specific issues in stabilization, but the API has been relatively stable, although there were some breaking changes a few years ago.

Maybe you can't extract 100% of your CPU's capabilities using it, but I don't find that a problem because there's a zero-cost fallback to CPU-specific intrinsics when necessary.

I recently wrote some computer graphics code and I could get really nice performance (~20x my scalar code, 5x from just a naive translation). And the same codebase can be compiled to AVX2, SSE2 and ARM NEON. It uses f32x8's (256b vector width), which are not available on SSE or NEON, but the compiler can split those vectors. The f32x8 version was faster than f32x4 even on 128b hardware. I would've needed to painstakingly port this codebase to each CPU, so it was at least a 3x reduction in lines of code (and more in programmer time).

camel-cdr · 2025-11-06T14:50:51 1762440651

An f32x16 version would also be faster on 256b hardware, but would spill on SSE. For Zen5 you probably want to use f32x32.

I'd prefer if std::simd would encourage scaling relative to the native SIMD width (and support scalable SIMD ISAs).

exDM69 · 2025-11-06T15:38:01 1762443481

> An f32x16 version would also be faster on 256b hardware, but would spill on SSE. For Zen5 you probably want to use f32x32.

Yeah, exceeding native vector width is kinda just adding another round of loop unrolling. Sometimes it helps, sometimes it doesn't. This is probably mostly about register pressure.

And architecture-specific benchmarking is required if you want to get the most performance out of it.

> I'd prefer if std::simd would encourage scaling relative to the native SIMD width (and support scalable SIMD ISAs).

It is possible to write width-generic SIMD code (i.e. have the vector width as a generic parameter) in Rust std::simd (or C++ templates and vector extensions) and make it relative to the native vector width (albeit you need to explicitly define that).
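A rough sketch of that idea, written so it compiles on stable Rust: it is a stand-in, not std::simd itself. The lane count is a const generic over fixed-size chunks of a slice, and the compiler is left to vectorize the fixed-length inner loop. With nightly std::simd the same shape would take the lane count as the generic parameter of the vector type. The names `axpy`, `axpy_native` and `NATIVE_F32_LANES` are hypothetical, and the mapping from target features to a lane count is an assumption you would define and benchmark yourself.

```rust
/// Computes y[i] = a * x[i] + y[i], LANES elements per step.
/// LANES must be non-zero (chunks_exact panics on a chunk size of 0).
fn axpy<const LANES: usize>(a: f32, x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len(), y.len());

    // Split off the part that divides evenly into LANES-wide chunks.
    let main = x.len() / LANES * LANES;
    let (x_head, x_tail) = x.split_at(main);
    let (y_head, y_tail) = y.split_at_mut(main);

    for (xs, ys) in x_head.chunks_exact(LANES).zip(y_head.chunks_exact_mut(LANES)) {
        // Fixed trip count known at compile time: a good candidate for
        // auto-vectorization, though that is not guaranteed.
        for i in 0..LANES {
            ys[i] = a * xs[i] + ys[i];
        }
    }

    // Scalar loop for the leftover elements.
    for (xv, yv) in x_tail.iter().zip(y_tail.iter_mut()) {
        *yv = a * *xv + *yv;
    }
}

// Assumed, explicitly defined "native" f32 lane count, chosen from
// compile-time target features. Extend with more cfg arms as needed.
#[cfg(target_feature = "avx2")]
const NATIVE_F32_LANES: usize = 8;
#[cfg(not(target_feature = "avx2"))]
const NATIVE_F32_LANES: usize = 4;

/// Same kernel, instantiated at the assumed native width.
fn axpy_native(a: f32, x: &[f32], y: &mut [f32]) {
    axpy::<NATIVE_F32_LANES>(a, x, y)
}
```

Instantiating `axpy::<4>`, `axpy::<8>` or `axpy::<16>` on the same data gives the same result; only the chunk width changes, which is what makes it easy to benchmark widths per architecture.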

In my problem domain (computer graphics etc) the vector width is often mandated by the task at hand (e.g. 2d vs 3d). It's often not about doing something on an array of size N. This does not lead to optimal HW utilization, but it's convenient and still a lot faster than scalar code.

Scalable SIMD ISAs are kind of a new thing, so not sure how well current std::simd or C vector extensions (or LLVM IR SIMD ops) map to the HW. Maybe they would be better served by another kind of API? I don't really know, haven't had the privilege of writing any scalable vector code yet.

What I'm trying to say is IMO std::simd works well enough and should probably be stabilized (almost) as is, barring any show stopper issues. It's already useful and has been for many years.

vlovich123 · 2025-11-05T21:00:17 1762376417

> we can't fix any API issues.

Can’t APIs be fixed between editions?

JoshTriplett · 2025-11-05T21:02:19 1762376539

Partially (with upcoming support for renaming things across editions), but it's a pain if the types change (because then they're no longer common vocabulary), and all the old APIs still have to exist.