
They're asking for sum() on a slice of f32s. sum() actually works via a trait for exactly this purpose, Sum, so you could go like this...

Make a newtype wrapper for f32, called something like FastFloat, marked #[repr(transparent)] so that (if necessary; I'm not sure it is) the compiler promises you're getting the same in-memory representation as an actual f32.

Implement Sum for FastFloat by having it use the faster SIMD intrinsics for this work, accepting the potential loss of accuracy.

Now, unsafely transmute the f32 slice into a FastFloat slice (in principle this is zero instructions; it just satisfies the type checker) and ordinary sum() goes real fast because it's now Sum on a slice of FastFloats.
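A minimal sketch of those three steps. The comment doesn't specify which SIMD intrinsics to use, so the body below is a stand-in: a 4-lane reassociated accumulation that the optimizer can vectorize, which already makes the same accuracy trade-off. FastFloat and fast_sum are the names from this thread / my own invention respectively:

```rust
#[derive(Clone, Copy)]
#[repr(transparent)] // same in-memory representation as f32
struct FastFloat(f32);

impl std::iter::Sum for FastFloat {
    fn sum<I: Iterator<Item = FastFloat>>(iter: I) -> FastFloat {
        // Accumulate in 4 independent lanes, then combine. This reassociates
        // the additions (unlike a strict left-to-right f32 sum), which is
        // exactly the accuracy-for-speed trade the comment describes.
        let mut lanes = [0.0f32; 4];
        for (i, x) in iter.enumerate() {
            lanes[i % 4] += x.0;
        }
        FastFloat(lanes.iter().sum())
    }
}

fn fast_sum(xs: &[f32]) -> f32 {
    // The unsafe reinterpretation, justified by #[repr(transparent)]:
    // zero instructions, it only satisfies the type checker.
    let fast: &[FastFloat] =
        unsafe { std::slice::from_raw_parts(xs.as_ptr() as *const FastFloat, xs.len()) };
    fast.iter().copied().sum::<FastFloat>().0
}
```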




If you want to go the newtype + Sum impl route, you don't have to make it `#[repr(transparent)]` or transmute the slice. You can just `impl Sum<FastFloat> for f32` and do `f.iter().copied().map(FastFloat).sum()`

https://rust.godbolt.org/z/b9s3dna6r


Oh, I didn't think of that, clever.

EtA: The attraction of a newtype plus trait impl is that it's re-usable. You could imagine (particularly once it's stable, which your approach isn't yet) packaging up several speed-ups like this in a crate, enabling people to get faster arithmetic wherever they can afford the accuracy trade-off, without needing to know anything about SIMD and without (as the C or C++ compiler flags do) affecting unrelated code where accuracy may be critical.



Nice, although I notice it doesn't implement Sum or Product :D



