
Decoding in libjpeg-turbo is mostly implemented in ASM. Rust, like C, is slower than hand-optimized SIMD ASM.



If anyone's interested, this is the (epic!) discussion thread for "Getting explicit SIMD on stable Rust":

https://internals.rust-lang.org/t/getting-explicit-simd-on-s...

It still wouldn't compete with really good hand-tuned asm, but it might help reduce the perf/usability tradeoff a bit.


I'd just like to see 2-, 3-, and 4-element vectors as first-class citizens in C, C++, or Rust. These are incredibly common for so many things that it's hard for me to understand the omission. I want to be able to pass them by value and return them by value from functions. I want to do operations like a = b + c with vectors without creating classes or overloading operators. For a lot of people the SIMD instructions are about parallel computation, but for me they represent vector primitives.


What would baking them into the language add that a library solution couldn't? I'm not keeping track of Rust very closely these days, but it supports arithmetic operator overloading and the Copy trait. What's missing?

I agree that this needs to be standardized for vector-ey crates to be able to talk to each other seamlessly. Otherwise we'll end up with a rerun of the C++ strings fiasco, with char* and wchar_t* and std::string and std::wstring and BSTR and _bstr_t and CString and CComBSTR and QString and GString and...


>> What would baking them into the language add that a library solution couldn't?

Standardization. Optimal performance. Better syntax.


OOP often involves overhead. If someone defined a really clean vector type/object in a library that let me write expressions the natural way, that could be added to the language I suppose. And that's what I want.


Rust, C and C++ all possess ways (it's even the default/only way in C and Rust, and almost so in C++) to write types like this that don't involve the typical pointer-soup/dynamic-call overhead of typical OOP.

People can and do write vector in libraries right now, usually using existing compiler support that gives them guarantees about SIMD.


What I find odd about your reply is that you downgrade C++, yet it's really the only language with operator overloading sufficient to do what the poster was requesting (transparent support without OO overhead).

In other words, it's completely possible, and there are quite a number of C++ libraries that make vectors look like native types, complete with long lists of global operator overloads for interaction with other base types. Generally these libraries are just thin wrappers around inline assembly or Intel intrinsics for SSE/AVX and don't bring any form of OO syntax into the picture.

That said, even with C++ the libraries tend to fall down a bit when it comes to individual element operations/masking, because the closest mechanism is generally array syntax for the elements, which limits operations to a single element of the vector at a time. That means you end up with OO method-call syntax for element masking (or creative solutions with operator()).


I didn't denigrate C++; I'm specifically talking about the behaviour of the data types and the mention of OOP, which is orthogonal to surface syntax like using symbolic operators instead of textual names. It is a fact that the default style of data declaration in C++ involves a little of the "pointer soup", due to methods and `this`. This often gets inlined away and so is usually irrelevant (hence "almost so"), but the poster did emphasise their desire for pass-by-value. It can be avoided by e.g. using friends more than methods, but that isn't the default way a lot of people write C++.

(Also, Rust has pretty transparent support in the same manner also without OO overhead, but differs slightly because methods can and often do use pass-by-value. This is the distinction I was drawing.)


You could also wrap a reference to a plain old struct and operate on it using a class. That's often how I write C++ when I need to be explicit about how the data is structured. Not everything in C++ requires an OO approach.

OP was specifically recommending operator overloading on a plain old struct, which can be done without declaring a class. Indeed, operator overloads can be declared as global functions in C++. The this pointer doesn't enter into it at all in that case.


I think you're also missing my point. Techniques like what you describe are exactly what I was referring to when I said "C++ possess[es] ways" and "using friends rather than methods", although you don't even need a wrapper for a reference (which is actually a pointer, and so also part of what I mean by "pointer soup"!) for this sort of code: just the struct works fine (although a `class` does too; the only difference is the `struct` keyword has different privacy defaults).

The only "downgrade" I made was saying that it is only "almost" the default in C++, versus the other two where they are completely the default.

To be clear, like C and Rust, C++ has great semantic attributes for this sort of thing:

- classes/structs that don't require allocation/pointers

- precise control over pass-by-value (for everything except the `this` pointer of methods)

- pervasive static dispatch of methods/functions (including operators, which can be considered to be method/function with an unusual name and call syntax)

The only downside, and the reason I said "almost" (which is what the C++ETF (C++ Evangelism Task Force) seems to be up in arms about), is methods are what most people reach for by default and so the `this` pointer comes into play. But as you point out, and as I implied in my original comment, this isn't required, just the default.


Look into the Clang vector extensions. They're very similar to GLSL/OpenCL vectors. They implement some very basic operator overloading too, so they're much less annoying than calling a function for everything. I made a simple linear algebra library using them, if you want to see an example: https://github.com/GavinHigham/glla


Have you tried Halide? http://halide-lang.org/

I've seen it used for various video/photo processing operations.


Off-topic: the "Halide Talk" video in that page was very good: https://www.youtube.com/watch?v=3uiEyEKji0M


Do you mean something like this[0] (probably the worst possible implementation but I was in a rush)? I suppose you still have to create a "class" but you could set it up to use syntax like `V4(1, 2, 3, 4)` if you prefer.

[0]: https://play.rust-lang.org/?gist=989fe1c05e3e68df89642032743...



What if your target architecture has a different vector width and hence you're using 4 element vectors but wasting 12?

If you want to use vectors explicitly, you can do so using intrinsics for most vector architectures.


The sizes I mentioned are extremely common in 2d and 3d geometry, owing to the number of dimensions visible in our world. While someone may want to run 11-dimensional calculations in string theory, there is a large number of common real-world applications for lengths 2, 3, and 4. In C and C++ you can often use intrinsics, but the big 3 - ARM, Intel, PPC - all define them differently. I want this common stuff to be part of the language. Sure, go ahead and support general vectors via class definitions and such, but give me direct support for the common sizes.


Your issue with intrinsics is that the different ISAs have different specs. Fine. But if that is your only issue, then your use case is that you're manually vectorizing hot loops, correct?

Assuming that's true, you want to maximise performance by using as much of the parallelism that vectors give you. So if you're dealing with [4 x int32], on a 128 bit vector ISA you would be fully utilising your vector registers, but moving to say AVX-512 you're now only using 1/4 of your potential parallelism.

Your architecture-independent vector types would have to target the lowest common denominator, completely defeating the purpose of vectorisation.


I want to do math with 2,3, or 4-element vectors. These will typically represent 2d or 3d coordinates or velocities. 4-element vectors may be used for homogeneous coordinates or similar. My point is that these are very common mathematical entities and should be explicitly supported by the language.

How these map to any particular processor's resources is not my problem - though the three major vector extensions today all have 4-element vectors. Some support more, but that's not terribly relevant to the math I want to do. A smart compiler could pack multiple small vectors into a wide vector register, just like compilers try to pack multiple scalars in there today.

I am not interested in vectorizing loops. I want to write code like this:

    Vec3double position = {5.8, 3.9, 2.1};
    Vec3double velocity = {1.0, 0.0, 0.0};
    double timestep = 0.01;
    position += timestep * velocity;

and so on. Yes, I also want the common use case of multiplication of vector and a scalar to be that easy.

Any paint program or graphics library (including font rendering) does a ton of this stuff. So does every 2d or 3d physics engine. Ray tracing. FEA software. CAD. The list of uses for these vector sizes is long and has nothing to do with auto-vectorizing loops. Of course there are plenty of applications where loop vectorization is valuable and I don't want to take anything away from that. I just want built-in support for these common mathematical entities in the base language.
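
For what it's worth, a library-level sketch of that exact syntax is possible in today's Rust via operator overloading (Vec3 here is a hypothetical type, not anything from std, and there's no guarantee the compiler emits SIMD for it):

```rust
use std::ops::{AddAssign, Mul};

// A hypothetical 3-element vector type, defined entirely in a library.
#[derive(Clone, Copy, Debug)]
struct Vec3 { x: f64, y: f64, z: f64 }

// Enables `position += velocity`.
impl AddAssign for Vec3 {
    fn add_assign(&mut self, rhs: Vec3) {
        self.x += rhs.x;
        self.y += rhs.y;
        self.z += rhs.z;
    }
}

// Enables `scalar * vector`.
impl Mul<Vec3> for f64 {
    type Output = Vec3;
    fn mul(self, v: Vec3) -> Vec3 {
        Vec3 { x: self * v.x, y: self * v.y, z: self * v.z }
    }
}

fn main() {
    let mut position = Vec3 { x: 5.8, y: 3.9, z: 2.1 };
    let velocity = Vec3 { x: 1.0, y: 0.0, z: 0.0 };
    let timestep = 0.01;
    position += timestep * velocity;
    println!("{:?}", position);
}
```

The remaining gap, per the parent comments, is that nothing standardizes this type across libraries or guarantees it lives in a SIMD register.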


In another comment you say that you want these to turn into SIMD instructions. FYI, vectorizing element vectors laid out in array-of-structs fashion is usually less performant than a struct-of-arrays layout.

On ARM you have specialised load and store instructions that can de-interleave into vector registers, such that register a contains VecType.x and register b contains VecType.y etc., but they are a bit slower.

If you don't care about SIMD performance then fair enough, but if you care enough about this issue to want the compiler to generate SIMD instructions, you had better be willing to change your code to be performant on your particular target, because even small changes can affect whether or not it's worth vectorizing vs leaving it as scalar code.
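
To illustrate the layout difference (Particle/Particles are hypothetical types made up for this sketch):

```rust
// Array-of-structs: x, y, z interleaved in memory; a SIMD add over all
// the xs first needs loads/shuffles to de-interleave them.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Particle { x: f32, y: f32, z: f32 }

// Struct-of-arrays: each component is contiguous, so a loop over `xs`
// maps directly onto wide vector loads and stores.
#[allow(dead_code)]
struct Particles { xs: Vec<f32>, ys: Vec<f32>, zs: Vec<f32> }

impl Particles {
    fn advance_x(&mut self, vx: &[f32], dt: f32) {
        // Contiguous, same-length slices: a prime autovectorization target.
        for (x, v) in self.xs.iter_mut().zip(vx) {
            *x += dt * *v;
        }
    }
}

fn main() {
    let mut p = Particles { xs: vec![0.0, 1.0], ys: vec![], zs: vec![] };
    p.advance_x(&[2.0, 2.0], 0.5);
    println!("{:?}", p.xs);
}
```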



I'm familiar with that. The problem is that it's not standard. Having the actual types v4float or v4double as part of the language would also make it easier to mix and match code from different places/libraries.


Fortran?


It doesn't really make sense in C, as any modern optimizing compiler will turn such code into SIMD. IIRC Rust needs explicit SIMD due to bounds checking.


Bounds checking gets in the way for simple cases, but anything even slightly more complicated needs to be written extremely carefully in any language for autovectorisation to work. It is essentially writing explicit SIMD without using intrinsics, and without any guarantee it will work as desired.

And that is assuming the autovectoriser is able to synthesise the desired instructions at all. E.g. I believe packssdw and packusdw ("pack with signed/unsigned saturation", from SSE2 and SSE4.1 respectively) and pmaddwd ("multiply and add packed integers") are useful in a JPEG codec, but I find it extremely unlikely that any compiler will autovectorise to them.


There are thousands of vendor intrinsics and no compiler that I'm aware of is able to just automatically use all of them in a reliable way. The idea that "Rust needs explicit SIMD due to bounds checking" is very wrong.


Because SIMD instruction throughput is highly processor-specific? Rust will also not "automatically use all of them"; no magic abstraction would make any compiler use some of the really fancy and useful SIMD instructions.


I don't know what you're talking about unfortunately. My statement about compilers and SIMD isn't Rust-specific. My point was that "rust needs explicit SIMD due to bounds checking" is factually wrong.


No it isn't; it is one of the reasons that Rust is getting explicit SIMD. If the compiler cannot elide the bounds checking, then obviously it will not vectorize the code in question.


I'm one of the people working on adding SIMD to Rust, so I'm telling you, you're wrong. If you want better vectorization and bounds checking is standing in your way, then you can elide the bounds checks explicitly. That doesn't require explicit SIMD.


How do you safely elide bounds checks for something the compiler cannot reason about? How would Rust handle SIMD differences when trying to generate specific code as you would in C?


> How do you safely elide bounds for something the compiler cannot reason about?

Who said anything about doing it safely? You can elide the bounds checks explicitly with calls to get_unchecked (or whatever) using unsafe.
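
For example (a hypothetical hot loop; `get_unchecked` performs no bounds check, which is exactly why it requires unsafe):

```rust
// Sum a slice while explicitly eliding the per-access bounds checks.
fn sum_unchecked(xs: &[f32]) -> f32 {
    let mut total = 0.0;
    for i in 0..xs.len() {
        // SAFETY: i < xs.len() is guaranteed by the loop bound.
        total += unsafe { *xs.get_unchecked(i) };
    }
    total
}

fn main() {
    println!("{}", sum_unchecked(&[1.0, 2.0, 3.0, 4.0])); // 10
}
```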

> How would Rust handle SIMD differences when trying to generate specific code as you would in C?

Please be more specific. This question is so broad that it's impossible to answer. At some levels, this is the responsibility of the code generator (i.e., LLVM). At other levels, it's the responsibility of the programmer to write code that checks what the current CPU supports and then calls the correct code. Both Clang and gcc support the former using conditional compilation, and both support the latter by annotating specific function definitions with specific target features. In the latter case, it can be UB to call those functions on CPUs that don't support those features. (Most often the worst that will happen is a SIGILL, but if you somehow muck up the ABIs between functions, then you're in for some pain.) The plan for Rust is to basically do what Clang does.
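
As a sketch of the runtime-dispatch half of that plan (using what eventually stabilized in Rust 1.27: `std::arch` intrinsics, `#[target_feature]`, and the `is_x86_feature_detected!` macro):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Portable scalar fallback.
fn add_scalar(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    let mut out = [0.0; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}

// AVX version. Calling this on a CPU without AVX is UB, hence unsafe.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_avx(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    let va = _mm256_loadu_ps(a.as_ptr());
    let vb = _mm256_loadu_ps(b.as_ptr());
    let mut out = [0.0; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), _mm256_add_ps(va, vb));
    out
}

// Check the current CPU once at runtime and dispatch accordingly.
fn add(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: we just verified the CPU supports AVX.
            return unsafe { add_avx(a, b) };
        }
    }
    add_scalar(a, b)
}

fn main() {
    println!("{:?}", add(&[1.0; 8], &[2.0; 8]));
}
```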

The question of safety in Rust and SIMD is a completely different story from auto-vectorization. Figuring out how to make calling arbitrary vendor intrinsics safe is an open question that we probably won't be able to solve in the immediate future, so we'll make it unsafe to call them.

And even that is all completely orthogonal to a nice platform independent SIMD API (like you might find in Javascript's support for SIMD[1]), since most of that surface area is handled by LLVM and we should be able to enable using SIMD at that level in safe Rust.

And all of that is still completely and utterly orthogonal to whether bounds checks are elided. Even with the cross platform abstractions, you still might want to write unsafe code to elide bounds checks when copying data from a slice into a vector in a tight loop.

[1] - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


Optimizing compilers are rarely successful at turning non-trivial code into SIMD.


> IIRC rust needs explicit SIMD due to bounds checking.

Bounds checks can be eliminated and code can be vectorized if the optimizer can prove it; explicit SIMD is useful for the cases where it can't.
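
A sketch of what "prove it" can look like in practice (a hypothetical helper; hoisting one explicit length computation is often enough for LLVM to drop the per-iteration checks and vectorize):

```rust
// Indexed loop: naively, each dst[i] / src[i] access carries a bounds
// check. Computing `n` as the minimum of both lengths up front lets
// the optimizer prove every index is in range for both slices.
fn add_indexed(dst: &mut [f32], src: &[f32]) {
    let n = dst.len().min(src.len());
    for i in 0..n {
        dst[i] += src[i];
    }
}

fn main() {
    let mut dst = [1.0f32, 2.0, 3.0];
    add_indexed(&mut dst, &[1.0, 1.0]);
    println!("{:?}", dst); // [2.0, 3.0, 3.0]
}
```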


I want it to turn into SIMD instructions. What I don't want is to write classes, functions, or loops that have to be automatically converted to SIMD. I want a simple built-in type for these three vector sizes. I also mentioned that they should be passed by value and returned by value in a (SIMD) register. This is the most efficient way to write and execute vector math.


Can't a library just add that? Make some types, implement some functions and/or overload some ops. If you're defining the special type anyway, I'm not sure why it has to be built in.




