
I respectfully disagree.

> to use different allocators on a per-collection basis

It's far more than that. There are memory management optimization patterns that are hard to implement in Rust, e.g. splitting allocations into different arenas based on allocation lifetime.
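To make that pattern concrete, here's a minimal sketch using the bumpalo crate (my choice of example, not something specified above); crates like this approximate the pattern, though tying borrows to arena lifetimes is exactly where the friction shows up:

    use bumpalo::Bump;

    fn handle_request(long_lived: &Bump) {
        // Short-lived arena: everything allocated here dies together
        // when `per_request` goes out of scope.
        let per_request = Bump::new();
        let scratch = per_request.alloc([0u8; 64]);
        scratch[0] = 1;

        // Long-lived arena: outlives this call, for data that must survive.
        let cached = long_lived.alloc(123u64);
        let _ = (scratch, cached);
    } // `per_request` and all its allocations are freed here in one shot.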

> But Rust programs aren't allocating that much in the first place

Are Rust programs allocating mostly from the stack then?

> is going to triple the performance of arbitrary programs

The difference between malloc and a hand-rolled O(1) allocator is enormous; it's just that people rarely benchmark this particular aspect.
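For a sense of what a hand-rolled O(1) allocator even looks like, here's a hedged Rust sketch of a bump arena (names and sizing are mine): allocation is an offset bump, and freeing is a single reset.

    // Minimal bump arena: O(1) alloc (offset bump), O(1) free (reset all).
    struct BumpArena {
        buf: Vec<u8>,
        offset: usize,
    }

    impl BumpArena {
        fn with_capacity(cap: usize) -> Self {
            BumpArena { buf: vec![0; cap], offset: 0 }
        }

        // `align` must be a power of two.
        fn alloc(&mut self, size: usize, align: usize) -> Option<*mut u8> {
            let start = (self.offset + align - 1) & !(align - 1);
            if start + size > self.buf.len() {
                return None; // out of space; a real arena might grow or chain
            }
            self.offset = start + size;
            Some(unsafe { self.buf.as_mut_ptr().add(start) })
        }

        fn reset(&mut self) {
            self.offset = 0; // frees every allocation at once
        }
    }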

There is a reason why Rust used to ship jemalloc and why #[global_allocator] is a thing. I would argue there's even more reason to avoid touching the heap in the first place.
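For reference, swapping the global allocator is a two-liner today; a sketch using the tikv-jemallocator crate (one drop-in option among several; mimalloc works the same way):

    use tikv_jemallocator::Jemalloc;

    // Every Box, Vec, String, etc. in the program now allocates via jemalloc.
    #[global_allocator]
    static GLOBAL: Jemalloc = Jemalloc;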

Edit: formatting.




> Are Rust programs allocating mostly from the stack then?

Yes, overwhelmingly so.

The basic rule of manual optimization is that the upper bound for the improvement that can be gained by optimizing any one "thing" is equal to the proportion of the program's runtime that is devoted to that thing. E.g. if a flamegraph shows that 10% of a program's runtime is devoted to a specific function, then improving the performance of that function is going to improve the performance of that program by at most 10%. (We might call this "generalized Amdahl's law": https://en.wikipedia.org/wiki/Amdahl%27s_law )

Therefore, improving the performance of a program by 2-3x by changing allocators requires that the program be spending at least 50% to 66% of its runtime on allocation. And it would be a highly anomalous Rust program that was allocator-bound in this way.
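Spelling out the arithmetic, with p the fraction of runtime spent allocating and assuming (best case) the new allocator drives that cost all the way to zero:

    \text{speedup} \le \frac{1}{1 - p}, \qquad
    2\times \Rightarrow p \ge \tfrac{1}{2}, \qquad
    3\times \Rightarrow p \ge \tfrac{2}{3}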


How do you measure cache line utilization with a flame graph? I'm not sure it's as simple as you're describing. Poor memory access patterns from heap allocations can kill performance via death by a thousand cuts. It's not about spending "X% of the time on allocation"; it's about every computation spending unnecessary time reading from and writing to the cache due to a poor choice of allocation strategy. Custom allocators can ensure the data layout fits the access pattern.

More specifically, arrays are about the fastest thing around, and handing out objects from within a preallocated array tends to give the best access performance by a large margin. From what I understand, you basically have to sidestep the Rust borrow checker to achieve this, which does raise some interesting questions.
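As an illustration of the pattern (a hedged sketch, not anyone's production code): in Rust this is usually expressed by handing out indices into a preallocated array instead of references, precisely to avoid holding multiple mutable borrows.

    // Object pool over a preallocated array; callers hold indices, not &muts.
    struct Pool<T> {
        items: Vec<T>,
        free: Vec<usize>, // indices of free slots
    }

    impl<T: Default> Pool<T> {
        fn new(capacity: usize) -> Self {
            Pool {
                items: (0..capacity).map(|_| T::default()).collect(),
                free: (0..capacity).rev().collect(),
            }
        }

        fn acquire(&mut self) -> Option<usize> {
            self.free.pop() // O(1), no allocator call
        }

        fn release(&mut self, idx: usize) {
            self.free.push(idx);
        }

        fn get_mut(&mut self, idx: usize) -> &mut T {
            &mut self.items[idx]
        }
    }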

I think the original poster was also saying that heap allocations are slow, which is true, but I agree it'd be easier to tell from a profile whether your program is actually having trouble with that.


I'm unclear how cache line utilization is relevant here. In both Zig and Rust, the choice of allocator has no effect on the layout of data in a given collection. In other words, this Zig code:

    const ArrayList = @import("std").ArrayList;

    var foo_list = ArrayList(u8).init(foo_allocator);
    try foo_list.append(1);
    try foo_list.append(2);
    try foo_list.append(3);
...and this Rust code:

    let mut foo_list = Vec::new();
    foo_list.push(1);
    foo_list.push(2);
    foo_list.push(3);
...are both going to produce an array of [1,2,3] stored on the heap. The choice of allocator only affects where that array itself ends up being stored. Fetching an element of foo_list is going to pull its neighboring elements into the cache regardless of where the array lives in memory, so the choice of allocator doesn't matter for that purpose. The only thing that could make a difference is if you want multiple different collections to be fetched in the same cache line. But if your collections are so small that several of them fit in one cache line, your data isn't big enough to be worth optimizing cache utilization for (and even if you did carefully lay out your collections that way, as soon as any of your growable arrays needs to be reallocated, that careful organization goes completely out the window).

> More specifically, arrays are about the fastest thing around, and handing out objects from within a preallocated array tends to give the best access performance by a large margin. From what I understand you basically have to sidestep the Rust borrow checker to achieve this.

Where did you get this impression? Rust has stack-allocated arrays, and you can hand out references to them just as well as to any other owned type. I can think of no distinction that the borrow checker makes between stack- and heap-allocated types.
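A quick sketch of what I mean, since the borrow checker is indifferent to where the array lives:

    fn main() {
        let mut buf = [0u32; 16]; // stack-allocated array
        let slot: &mut u32 = &mut buf[3]; // borrow-checked like any owned value
        *slot = 42;
        println!("{}", buf[3]); // prints 42
    }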


While I agree with you, my point was specifically about the cost of a malloc. As someone who has actually profiled a pool vs. malloc, this is kind of a salient point for me.

In some workloads, short-lived allocations are common: memory is allocated for milliseconds and then deallocated, and X% of the time will be spent just inside the allocator trying to find the best block to allocate. At that point you want to make sure that you don't spend more time in the allocator than the allocations actually live.
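A hedged micro-benchmark sketch of what that looks like (numbers are machine-dependent; this only illustrates "time inside the allocator" for short-lived allocations versus reusing a buffer):

    use std::hint::black_box;
    use std::time::Instant;

    fn main() {
        let n = 1_000_000;

        let t = Instant::now();
        for i in 0..n {
            let b = vec![(i % 256) as u8; 64]; // alloc, use, free each iteration
            black_box(&b);
        }
        println!("fresh allocation each time: {:?}", t.elapsed());

        let mut buf = vec![0u8; 64]; // allocated once, reused
        let t = Instant::now();
        for i in 0..n {
            buf.fill((i % 256) as u8);
            black_box(&buf);
        }
        println!("reused buffer:              {:?}", t.elapsed());
    }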


> Yes, overwhelmingly so.

I wonder how well common business tasks can be coerced into this model.

Also, I believe this is not a good excuse: the language should still provide a way to properly manage allocations, even if it isn't needed most of the time.

> The basic rule of manual optimization is that the upper bound for the improvement that can be gained by optimizing any one "thing" is equal to the proportion of the program's runtime that is devoted to that thing.

Yep, and this is why you should always measure before optimizing. But then again, you can save yourself a lot of time by not doing stupid things like allocating from the heap in a hot loop, rather than profiling just to discover such basic mistakes.



