> three issues: "How big is it", "Who owns it", and "Who locks it"
The issue with this kind of thinking is the belief that there is an "it" rather than a "they".
Say you are writing an HTTP server [1]. A beginner mindset (which is what seems to be demonstrated in this post) will start allocating left and right: for every HTTP header, for every string, for every metadata record, etc. Thus, "how big is it" becomes "no bigger than it needs to be". The lifetimes of these allocations and deallocations will be made as "narrow" as possible, meaning the programmer will try to deallocate as soon as they stop needing the stored data; "who owns it" becomes the user of the data itself. And so on.
However, consider an alternative strategy, something game and embedded developers have been using for decades. You allocate a memory arena when the HTTP request comes in. From that point on, every "allocation" is equivalent to bumping a pointer in that arena. If the arena gets full, we allocate another and chain it to the previous one, as a linked list. No deallocations are performed (or could possibly be performed) until the Request is parsed, relevant processing is done, and the Response is constructed and sent. At this point, the entire arena chain associated with that particular request-response cycle is deallocated in one go.
How big is it? We don't care, as long as it's smaller than the arena.
Who owns it? That particular request-response cycle.
Who locks it? No one, if different request-response cycles are parallelised with respect to each other; otherwise, it depends on the nature of the concurrency/parallelism.
[1] Usually, I would have given a gamedev example, and then people would have said that it only works in gamedev. That's why I have tried to give a webdev example, considering the majority demographic on this website.
> You allocate a memory arena when the HTTP request comes.
The per-request-arena concept is one of the most powerful and IMO underappreciated tools out there for dealing with memory-lifetime issues. I've used it or advocated for its use at multiple jobs and projects, quite successfully, even for kernel code. Windows NT uses it in IRPs. Network code in both BSD and SysV did something pretty similar, at least in the long-before-Linux days. It's tried and true. Having a clear moment when things should be freed and an equally clear description of what those things are is wonderful.
That said, even games and network servers and storage drivers often have other complex data structures not related to individual requests. It's usually not hard to come up with rules for each of these, but the rules forced on you by something like STL might not be as convenient, performant, and/or verifiable as those you come up with yourself.
I hope that some day programmers will stop using memory-unsafe languages except when they really need to, and that when it is necessary those languages will give them the tools to solve those problems themselves instead of pretending they're already solved. They're not. Rust's borrow checker plus Zig's arena concept plus a couple more pieces might actually get us there. C++'s feature salad never will.
Maybe I didn't express myself properly (English is not my native language), but the point wasn't that everyone should use Memory Arena™ instead of Smart Pointers™ or Borrow Checker™. I am not trying to sell a conference here :)
The point was, to put it in Mike Acton's words, "when there is one, there are many". Usually, things are grouped, perhaps semantically, and it make sense to think of them as a group rather than breaks them in pieces. Arenas are one rather simple grouping mechanism, but they are not the only one. Of course, explaining complex grouping will require explaining the associated problem-solution pair, which is obviously out of scope of a forum reply.
All I would say is, if one steps out of one-at-a-time thinking, and statically plans the dynamically happening data transformation, and therefore exploit patterns and relationships inherent in that data flow, programming in low level languages becomes much easier.
But you need to know the problem (really really know it), and you need to know the solution. Temet Nosce.
Um, yeah, but there are other issues that make it not so nice. And it's an inoperative concept for most kernel/embedded work. I'd say in most contexts where forking would be OK you should be using a GC language anyway.
> And it's an inoperative concept for most kernel/embedded work
I recently integrated mini_httpd (a small forking server) into an embedded system as part of a solution for distributing updates across a cluster of these systems.
You must be talking about 15 cent microcontrollers (that still have a TCP/IP stack with SSL) or something, not ARM cores with MMU's running Linux.
I said most kernel or embedded. It's lovely that there are all sorts of embedded devices nowadays capable of supporting a style of programming indistinguishable from a general-purpose system, but that's not the only or even most relevant case. There are also billions of devices out there - not just 15-cent microcontrollers either - that don't lend themselves to that style because of real-time or other requirements. Even within a conventional system, there's a ton of stuff in the kernel - pre init, board support, most storage and networking - that can still use a per-request-arena approach to good effect but can't fork a process.
As I said, if you can fork all the time you probably shouldn't be using a memory-unsafe language anyway. Nothing you've said so far has suggested otherwise. The basic techniques for implementing your own memory safety are still valuable and worth discussing for people who aren't in Easy Mode all the time.
This subthread is strictly about web serving. Requirements combinations like "HTTPS serving in pre-init or board-support code" or "real-time HTTPS serving in under-powered embedded board" are not practical or relevant, unless we replace "HTTPS serving" with something else.
With regard to the other point, if we are using a memory-safe language, then we can use fork to get an instant arena, regardless of how that language performs resource management under the hood. Whatever allocations happen in the request, of memory or file descriptors, are reliably gone when that exits. If there is any problem in the implementation of the memory safety, the failure is contained to a process. Thus we have an additional reason to use process containment: not having control over the memory management, and not trusting it 100%.
A meta comment, not really related to your point, but I found it amusing in context.
You're writing about how doing lots of little things in an HTTP server is inefficient; and you're replying to John Nagle, who contributed the Nagle TCP optimization that debounces tiny sends with a little bit of delay, in the hope of doing more work in one go!
(On topic of your HTTP example: this is where generational GC can shine. Ideally one of your younger generations covers the full allocation cycle for a request & response, so that nothing survives the collection and it's ultra-cheap to collect since tracing the roots doesn't find anything. The nice thing about GC instead of an explicit arena is that it's global, so you don't need to contort all your library calls to ensure they're using the right allocator.)
Game dev here. Try reading the Godot code base. It's tiny allocations all the way through.
It's really quite bad.
BUT
I'm using it professionally and it's useful. It's hard for me to mentally reconcile how good and bad the creator of Godot simultaneously was (is?).
Casey also goes way too far in his take (as usual).
RAII is a method to guarantee safety and correctness. It's useful in many many scenarios.
Game dev and c++ is a great area to hone your performance skills. And it's awesome to really squeeze the power out of a CPU, but doing that is only a small small piece of the game development journey. Ultimately, you must make a product that is entertaining.
PS: Mike Acton's talk is really only tangentially related. He's just talking about uber perf in general.
As productive and useful as Godot is (it is an awesome engine), its design resembles game engines of the late 90s to early 2010s when OOP was all the rage. Back then, most game engines were written that way (lots of tiny heap-allocated objects, connected by shared pointers). It's really just since the 2010s that the CPU/memory gap is the main driving force of game engine design (which wasn't much of an issue in the late 90's).
On the other hand, many games don't need to juggle more than a few dozen to a few hundred dynamic instances of one "thing" (outside of the particle- and animation-systems at least), so for most games a modern ECS design is definitely overkill (except for some specific parts of the game). Providing a good "game building workflow" in the editor is definitely more important.
Hey Floh, love your work! It's validating to hear you say these things. For me, the biggest annoyance with Godot's code base is the lack of clear ownership semantics (no use of smart pointers), and the use of fully hand-rolled collection types for everything. It makes it hard to inspect variables in a debugger! Also, the performance of the collection types is bad because of all the allocations. Seems like those problems will never get solved, as it would be too expensive to rectify.
A fairly common way is to first implement experimental gameplay logic in a scripting or visual language, and once it works, extract the performance-critical parts into "proper" C++. Basically, the common lower-level gameplay building blocks are written in C++, glued together by some higher-level mechanism (like noodle graphs or a scripting language), and common features move into C++ as needed. Unless it's Unity, in which case, replace C++ with C#.
Wow, Casey doesn't hold back. He clearly, without hesitation, says that 100% of code written in RAII* style sucks. Interesting, considering Stroustrup considers RAII to be "the basis of some of the most effective modern C++ design techniques."
I'd like to hear Casey's comments on TDD.
*RAII style is lumped together with try/catch, smart pointers, tons of malloc/free new/delete
Casey hasn't a clue. He's only written games and hasn't actually had to solve the issues that are being solved in other domains. If he actually had that experience he'd see his ideas don't scale. Games are, in general, vastly different from other apps (word processors, video editors, browsers, etc.) in that, for one, they get to choose all of their data upfront. If you're making "The Last of Us 2" there is no user data. There's no "some people will use this to write a letter to grandma, some other people will write an 800 page book on physics with mathematical diagrams, and yet another will write a report on the market with linked live data".
Consider "games" like Minecraft, Roblox or Dreams; those are entirely user-data-driven. Different types of games are at least as different from each other as from other types of applications (or rather, they are not less diverse than other applications; they usually just have a higher focus on performance).
How are they different? You have some primitives like a block, and meaning is given by users to a group of them. The program has to care only about the primitives.
(Of course making it performant, not rendering/“simulating” everything and the like is exceedingly hard, but it is true that it is more of a “closed world” as opposed to some other areas of software development.)
This type of "creative game" lets users combine basic building blocks in the same way a word processor allows users to write entire books by combining a limited set of characters. The limitations of strictly linear games like The Last of Us are not due to technological restrictions but because it's hard to tell a cinematic story while still giving the user complete freedom.
Minecraft was created in Java with duct tape and hacks and a lot of ugliness, and with perf characteristics that wouldn't fly past a pedantic programmer like Casey.
Is your argument that arenas don't scale because user-provided data is variable in size?
Although arena memory is casually described as "allocate one huge chunk of memory up front," you are not literally only allocating one block ever and praying it never runs out. If you run out, you allocate another block. The point is that you don't call malloc for every string, object, list, etc. Adhering to this largely eliminates the need for RAII. What about this doesn't scale?
Personal anecdote: I'm building an IDE, where literally all of my data is provided by the user, and arenas have worked perfectly. I don't think I have a single destructor except for dealing with things like file descriptors, etc.
Bjarne Stroustrup is a Managing Director at Morgan Stanley, a major investment bank in New York where software difficulties may cost millions of dollars per minute. He has described his workday as people coming to him with a difficult software engineering problem, about which he asks increasingly detailed questions until light dawns, and they go away ready to rewrite the badly designed subsystem causing the trouble.
When using a "rubber duck" I think you just force yourself to turn loose thoughts into coherent ideas by having to express them clearly. This can reveal some problems that were not apparent before the idea was expressed. It's a "rubber duck" because it's really just a monologue and an inanimate object could do the job of the listener.
What's described above seems more like Socratic questioning, where a person asks questions that reveal facets of an idea that the person being asked may not have considered, thereby prompting the asked to rethink their assumptions and draw new conclusions.
I don't see grouped resources as an alternative to the way we use RAII. It doesn't mean that the approach doesn't have its own merits, but it doesn't cover the things we do with RAII. The architecture of a DAW isn't much like a server that responds to requests, even if there are elements of it that do correspond to that pattern.
Alternatively, you are referring to something I'm not aware of.
Does TDD here mean ‘Type Driven Development’ a la Idris or, as I assume (and seems more likely) does it mean Test Driven Development?
If you mean test driven, I don’t think I have heard Casey discuss anything remotely close to the standard presentation of TDD strategies. I would imagine, based on watching a large amount of Casey streaming, both alone and with Blow, that he would view testing (as in TDD) as unproductive for his particular style of development and his goals as a programmer. But, since it’s in my brain now, I will ask him next time I catch a live stream just to satisfy my curiosity.
I can’t speak to any discussions specifically about TypeDD. But I would bet Casey would be wholly uninterested and consider it theoretical academic fluff that pulls away from the fundamental data transformation work of development.
Your comment reads like you disagreed in some way with the parent, but then you showed a scenario and defined it in a way you can answer the important questions. Just drop the "we don't care" part and they're all good answers that can be used to model the ownership and usage.
The lack of locks in this scenario is important in itself and can be expressed in types. (It shows for example that as long as the whole process is migrated to another thread, you can have safe N:M scheduling) The process ownership of the arena can be well defined as well.
The "alternative strategy" as you described it doesn't have different questions - just different answers.
No, the point wasn't a scenario. The point was to stop focussing on individual allocation-deallocation and start thinking about the data transformation pipeline. And every program is a data transformer — because data transformation is all a computer does or can possibly do.
When one starts thinking in terms of the data transformation pipeline, and starts to relate the lifetime of data to the lifetime of the various phases of that pipeline, then one stops worrying about individual "objects" and starts thinking in terms of aggregates. Suddenly, there is no need to track the ownership or size of each object, because those properties are now shared with the same (or isomorphic) properties of the phases of the pipeline itself. As long as the position in the pipeline is tracked (e.g., via the call stack), all other lifetimes will automatically be tracked.
I get the idea you're describing and use it for various purposes. But I don't agree "there is no need to track the ownership or size for each object". You can offload some of that thinking to the arenas, the same way you'd do it with GC, sure. But you can't just ignore ownership - sure request context is owned by the arena - but do you need to copy the values you pass to logging? do you need to wait for log flush before destroying request context?
What about the more common cached data / process state?
Arenas simplify the processing just like GC but they're not magic and don't solve everything.
EDIT: And about data sharing (with logging, etc.), that's part of the data pipeline too. So yes, your aggregation mechanism will take it into account. These are not two separate problems, irritatingly coupled due to reality; these are two parts of the same problem.
> From that point on, every "allocation" is equivalent to bumping a pointer in that arena.
I'm not a C++ developer, so I have a hard time imagining how this would be implemented in real code. I know how to allocate a blob of memory, but how do I redirect all the later allocations (that use `new` or that are hidden in std::string and other containers) to use parts of that blob? How do I know when the blob is going to overflow, so that I need to allocate another one? Is it possible to give a 5-10 line example showing the basics of the technique?
It’s not an either-or thing; there is probably no need to allocate strings in the arena.
Some cpp structures do allow for custom allocators, but you are more likely to have an arena instance and call some specific function on it and it will do the allocation for you. It usually operates on only a few types, not meant for arbitrary allocations.
> Some cpp structures do allow for custom allocators,
Yeah, that's what I was thinking about, but my only knowledge of allocators comes from gcc error messages which include all the template parameters - I have no idea how they work or how to switch to another one :(
I get the idea, though, it's basically what Erlang does for its processes - each one has a "private heap" just for it. If the process exits, the whole chunk of memory can be reclaimed and there's no need to run GC on it anymore (there's no sharing of memory between processes in Erlang, at all, so it's safe).
You either pass the allocator around to every stl container, and everything that uses an stl container. Or you override new/delete and have a sidechannel that defines what arena to allocate in.
Another optimization would be to keep the request buffer around and use it as backing storage for all the strings objects that result. The weakness of this strategy is that it doesn't make sense to keep it around if you only need a small part of it. It's a common source of memory leaks in Java where `String.substring()` just creates a new object with different start and end indexes for the same backing buffer.
A possible disadvantage of arena allocators is that everything is now tied to the lifetime of the arena, even though it might not be required for that long. You'd need to practice a strategy like subarenas, RAII, garbage collection or reference counting within the request as well.
The occurrence and advantages of all these strategies unfortunately are heavily workload-dependent. Most strategies work well for 80% of the workload. Different workloads of course.
This is why I'm a strong advocate of hiring former game developers: they know what matters when it matters, not before, and not after. How? Pain and tears of being there, trying the obvious wrong way first, and then fixing it during production while kiddies shout at you. Former game devs are the Marines of software.
Muratori has demonstrated to me one time too many that he hasn't a got a clue about the things he's talking about.
To be precise: He may or may not have a clue about some things he's talking about, but I don't know enough about them to form an opinion. But on more than one occasion he went on to talk (sometimes at great length) about things where I know he's mostly or completely wrong in what he said.
I don't have a way to know the scope of these things, the things he doesn't know or understand yet talks about. Maybe it's just when it comes to "systems programming". Maybe it's everything that isn't game design and programming. And maybe he's wrong about games too. As I said, I don't know enough to judge everything he says, but the things I was able to judge convinced me not to trust him and his opinions.
And since I can't set the scope safely, I have to not trust him regarding everything.
For example, while his post[1][2] on ETW being "the worst API ever made" is slightly amusing and has _some_ good points, it mostly demonstrates:
(a) Inability or unwillingness to read the documentation. It would have saved most of his problems. But let's say that to criticize the _design_ of an API we can disregard that for a moment.
(b) Inability or unwillingness to pick the right tool for the task or pure trolling.
(c) Worst of all: Complete lack of understanding of the goals, purposes, limitations, design constraints and tradeoffs of the system.
You can't criticize the design as being too complicated, while offering an irrelevant alternative for a "glorified memcpy()" (his words), when you don't understand what it does or what it's supposed to do.
(I emphasize again: I'm not saying the API is a paragon API design and that all is perfect there. I am saying the criticism is both grossly exaggerated and demonstrates misunderstanding of what the API actually does.)
After a couple of those I don't trust him and you shouldn't either but you're welcome to.
> You can't criticize the design as being too complicated...
I haven't finished reading the ETW rant yet, but I don't see anything wrong with it. All he seems to be saying is that for communicating with the OS, ioctl-like all-in-one APIs are bad while epoll-like APIs are better. Not really a controversial statement.
> After a couple of those I don't trust him
Who said anything about trust? I have made enough mistakes and learnt enough lessons, and it's nice to see other well-respected programmers (like Casey and Mike) come to the same conclusions as mine. To paraphrase your post: "You shouldn't repeat those mistakes but you're welcome to".
EDIT: Actually, after finishing reading the blog post, wow! I thought Linux's perf_event API was shit, but this ETW crap takes the cake. Well done, Microsoft; you had one job.
While I agree with the sentiment, my impression is that the GP's point is about memory safety rather than performance. So yes, this applies for common patterns like per-frame memory in games, in which case the "it" is the arena. Otherwise, as a general rule, profile first, then optimize.
I made a special-purpose HTTP server in C++. I had a function called request() and just allocated everything on the stack of that function. Only a few KB, e.g. a vector of the headers, etc. Of course, after the function was done the memory was freed.
> You allocate a memory arena when the HTTP request comes. From that point on, every "allocation" is equivalent to bumping a pointer in that arena. If the arena gets full, we allocate another and chain it to the previous one, as a linked list.
Didn't you just describe regular memory management as applied by the OS at process scope? At the completion of the process, all memory allocated by the process is reclaimed.
On some systems this approach is combined with granular quotas to allow multi-process coexistence.
Shifting this sort of memory management over to application scope may yield benefits now, but ultimately it's an OS-level concern.
Asking the OS for memory is a low-level operation, which means it is not portable and should be done by a library or framework. Also, since it's a syscall (involving a context switch), you don't want to do it in performance-sensitive code.
Some applications might benefit from allocating huge pages. By default, the page size is on the order of 4KB or 16KB, and large numbers of pages result in significant overhead. Allocating fewer, larger pages on the order of MBs can help.
Interestingly, it's exactly what the PHP memory model does: each request is a shared-nothing VM space, and nothing survives the end of the request, even if allocated memory objects never get freed.
Some reference:
1. Mike Acton's CPPCon Talk: https://m.youtube.com/watch?v=rX0ItVEVjHc
2. Rant by Casey Muratori (Brusqueness Allergy Warning): https://www.youtube.com/watch?v=f4ioc8-lDc0&t=4407s