
My language selection checklist:

1. Does the program need to be fast or complicated? If so, don't use a scripting language like Python, Bash, or JavaScript.

2. Does the program handle untrusted input data? If so, don't use a memory-unsafe language like C or C++.

3. Does the program need to accomplish a task in a deterministic amount of time or with tight memory requirements? If so, don't use anything with a garbage collector, like Go or Java.

4. Is there anything left besides Rust?




By 'complicated' in point 1, do you mean 'large'? Because a complex algorithm should be fine -- heck, it should be better in something like Python, because it's relatively easy to write: you have an easier time thinking about what you're doing, you're less likely to make a mistake that leads to an O(n³) runtime instead of the one you were going for, it takes less development time, etc.

I assume you meant 'large' because, as software like WordPress beautifully demonstrates, you can have the simplest program (from a user's perspective) in the fastest language, but by using a billion function calls for the default page in a default installation you can make anything slow. If using a slow language for large software is what you meant to avoid, then I agree.

And as another note, point number 2 basically excludes all meaningful software. Not that I necessarily disagree, but it's a bit on the heavy-handed side.


By complicated I guess I mean "lots of types". Static typing makes up for its cost once I can't keep all the types in my head at the same time.

Point number 2 excludes pretty much all network-connected software, and that's intentional. I suppose single-player games are ok to write in C or C++.


> 2 excludes pretty much all network-connected software

Not caring about local privilege escalation, I see ;). The attack surface might be lower, but from lock screens to terminals there is a lot of stuff out there that doesn't itself need to be connected to the internet for me to consider it quite relevant whether it was written in a dumb language.


I suspect Ada would make the cut, with the number of times it's been referenced in these contexts, but I haven't actually taken the time to learn Ada properly. It seems like a language before its time.



As I understand it, it's only memory-safe if you never free your allocations, which is better than C but not an especially high bar. Basically the same as GC'd languages but without actually running the GC.

It does have support for formal verification, though, unlike most languages.


> if you never free your allocations

Technically, it used to be memory safe before this and they rolled back the restrictions to allow "unchecked deallocation".

Pointers ("Accesses") are also typed, e.g. you can have two incompatible flavors of "Widget*" which can't get exchanged which helps reduce errors, and pointers can only point at their type of pointer unless specified otherwise, are null'd out on free automatically and checked at runtime. In practice, you just wrap your allocations/deallocations in a smart pointer or management type, and any unsafe usages can be found in code by looking for "Unchecked" whether "Unchecked_Access" (escaping an access check) or "Unchecked_Deallocation".

The story is quite different in Ada because of access type checks: it doesn't use null-terminated strings, it uses bounded arrays, it has protected types for ensuring exclusive access, and the language implicitly passes by reference when needed or directed.

My experience with writing concurrent Ada code has been extremely positive and I'd highly recommend it.


Ada has improved a lot since Ada 83; it has been quite easy to use RAII since Ada 2005.


Re 3, people have known how to build real-time GCs since like the 70s and 80s. Lots of Lisp systems were built to handle real-time embedded systems with a lot less memory than our equivalent-environment ones have today. Even Java was originally built for embedded. While it's curious that mainstream GC implementations don't tend to include real-time versions (and for harder guarantees they need to have all their primitives documented with how long they'll execute as a function of their input, which I don't think Rust has), it might be worth scheduling 3-6 months of your project's planning to make such a GC for your language of choice if you need it. If you need to be hard real-time, though, as opposed to soft, you're likely in for a lot of work regardless of what you do. And you're not likely to be building a mass-market application like a browser on top of various mass-market OSes like Windows, Mac, etc.


If your "deterministic amount of time" can tolerate single-digit microsecond pauses, then Go's GC is just fine. If you're building hard real time systems then you probably want to steer clear of GCs. Also, "developer velocity" is an important criteria for a lot of shops, and in my opinion that rules out Rust, C, C++, and every dynamically typed language I've ever used (of course, this is all relative, but in my experience, those languages are an order of magnitude "slower" than Go, et al with respect to velocity for a wide variety of reasons).


My impression was that Go's GC was a heck of a lot slower than "single-digit microsecond pauses." I would love a source for that claim.


I had seen some benchmarks several years ago, around the time the significant GC optimizations were made, and I could've sworn they were on the order of single-digit microseconds; however, I can't find any of those benchmarks today, and indeed benchmarks are hard to come by except for some pathological cases with enormous heaps. Maybe that single-digit µs figure was a misremembering on my part. Even if it's sub-millisecond, that's plenty for a 60 Hz video game.
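
For anyone who'd rather measure than guess: a minimal Go sketch (the helper name is mine) that reads the runtime's own record of recent stop-the-world pauses via runtime.ReadMemStats:

    package main

    import (
        "fmt"
        "runtime"
    )

    // reportLastPause prints the most recent stop-the-world GC pause.
    // MemStats.PauseNs is a circular buffer of recent pause durations in
    // nanoseconds; the latest entry sits at index (NumGC+255)%256.
    func reportLastPause() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        if m.NumGC == 0 {
            fmt.Println("no GC cycles yet")
            return
        }
        last := m.PauseNs[(m.NumGC+255)%256]
        fmt.Printf("GC cycles: %d, last pause: %d µs\n", m.NumGC, last/1000)
    }

    func main() {
        // Churn some garbage, then force a cycle so there is something to report.
        for i := 0; i < 1_000_000; i++ {
            _ = make([]byte, 1024)
        }
        runtime.GC()
        reportLastPause()
    }

Running with GODEBUG=gctrace=1 also prints per-cycle pause times to stderr, which is handier for watching a real workload than a toy loop like this.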


If it can really guarantee single-digit microsecond pauses in my realtime thread no matter what happens in other threads of my application, that is indeed a game changer. But I'll believe it when I see it with my own eyes. I've never even used a garbage collector that can guarantee single-digit millisecond pauses.


Have you measured the pause times of free()? Because they are not deterministic, and I have met few people who understand in detail how complex it can be in practice. In the limit, free() can be as bad as GC pause times because of chained deallocation--i.e. not statically bounded.


People don't call free from their realtime threads.


This is true, but for performance's sake, you should not alloc/free in a busy loop, especially not on a real time system.

Allocate in advance, reuse allocated memory.
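
A minimal sketch of that pattern, in Go purely for illustration (the same idea applies in C or anything else): allocate the working buffer once, then reset and reuse it instead of reallocating inside the loop:

    package main

    import "fmt"

    // collectEven appends the even values from batch to out and returns out.
    // The caller owns out and reuses its backing array across calls.
    func collectEven(out, batch []int) []int {
        for _, v := range batch {
            if v%2 == 0 {
                out = append(out, v)
            }
        }
        return out
    }

    func main() {
        // One allocation up front, sized for the worst case.
        buf := make([]int, 0, 1024)

        for i := 0; i < 3; i++ {
            buf = buf[:0] // reset the length, keep the allocation
            buf = collectEven(buf, []int{i, i + 1, i + 2})
            fmt.Println(buf)
        }
    }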


> Allocate in advance, reuse allocated memory.

In practice, almost all real-time systems use this strategy, even going so far as to allocate all memory at compile time. The first versions of Virgil (I and II) used this strategy and compiled to C.

When doing this, the whole debate of memory safe (automatic) vs unsafe (manual) is completely orthogonal.


But you can generally control when free is called.


Not sure about the current state of the art, but Go's worst-case pause time five years ago was 100µs: https://groups.google.com/g/golang-dev/c/Ab1sFeoZg_8


Discord was consistently seeing pauses in the range of several hundred ms every 2 minutes a couple years ago.

https://blog.discord.com/why-discord-is-switching-from-go-to...


Hard to say without more details, but those graphs look very similar to nproc-many goroutines interacting with the CFS CPU scheduler in the Linux of the time. I've seen significant (sometimes total) improvements in latency graphs simply by setting GOMAXPROCS to account for the CFS behavior. Unfortunately the blog post doesn't even make a passing mention of this.


Anecdotally, the main slowdown we saw with Go code running in Kubernetes at my previous job was not "GC stalls" but "CFS throttling". By default[1], the runtime will set GOMAXPROCS to the number of cores on the machine, not the CPU allocation for the cgroup that the container runs in. When you hand out 1 core on a 96-core machine, bad things happen. Well, you end up with non-smooth progress. Setting GOMAXPROCS to ceil(cpu allocation) alleviated a LOT of problems.

Similar problems with certain versions of Java and C#[1]. Java was exacerbated by a tendency to make everything wake up in certain situations, so you could get to a point where the runtime was dominated by CFS throttling, with only occasional work being done.

I did some experiments with a roughly 100 Hz increment of a Prometheus counter metric, and with a GOMAXPROCS of 1, the rate was steady at ~100 Hz down to a CPU allocation of about 520 millicores, then dropped off (~80 Hz down to about 410 millicores, ~60 Hz down to about 305 millicores, at which point I stopped doing test runs).

[1] This MAY have changed; this was a while (and multiple compiler/runtime versions) ago. I know that C# had a runtime release sometime in 2020 that should've improved things, and I think Java now also does the right thing when in a cgroup.
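
For reference, a rough sketch of the ceil(cpu allocation) workaround described above; it assumes cgroup v1 paths (cgroup v2 exposes /sys/fs/cgroup/cpu.max instead) and keeps error handling minimal:

    package main

    import (
        "fmt"
        "math"
        "os"
        "runtime"
        "strconv"
        "strings"
    )

    // cgroupCPULimit returns the container's CPU allocation (e.g. 0.5 for
    // 500 millicores) from the cgroup v1 CFS quota and period, or 0 if the
    // quota is unlimited or the files can't be read.
    func cgroupCPULimit() float64 {
        quota, err1 := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
        period, err2 := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
        if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
            return 0 // a quota of -1 means "no limit"; treat errors the same way
        }
        return float64(quota) / float64(period)
    }

    func readInt(path string) (int64, error) {
        b, err := os.ReadFile(path)
        if err != nil {
            return 0, err
        }
        return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
    }

    func main() {
        if limit := cgroupCPULimit(); limit > 0 {
            // Round up, i.e. GOMAXPROCS = ceil(cpu allocation).
            runtime.GOMAXPROCS(int(math.Ceil(limit)))
        }
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    }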


AFAIK it hasn't changed; this exact situation with cgroups is still something I have to tell fellow developers about. Some of them have started using [automaxprocs] to detect and set it automatically.

[automaxprocs]: https://github.com/uber-go/automaxprocs
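
For anyone who hasn't seen it: the package is imported for its side effect, and its init sets GOMAXPROCS from the container's CPU quota at startup, so usage is roughly:

    package main

    import (
        "fmt"
        "runtime"

        // Blank import: the package's init adjusts GOMAXPROCS to the
        // cgroup CPU quota (and leaves it alone when no quota is set).
        _ "go.uber.org/automaxprocs"
    )

    func main() {
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    }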


Ah, note: said program also had one goroutine trying the stupidest possible way of finding primes (then not actually doing anything with the found primes, apart from appending them to a slice). It literally trial-divided (well, modded) each candidate by all numbers between 2 and isqrt(n) to see if it was a multiple. Not designed to be clever; explicitly designed to suck up about one core.


I found this go.dev blog entry from 2018. It looks like the average pause time they were able to achieve was significantly less than 1ms back then.

"The SLO whisper number here is around 100-200 microseconds and we will push towards that. If you see anything over a couple hundred microseconds then we really want to talk to you.."

https://go.dev/blog/ismmkeynote


I believe Java’s ZGC has max pause times of a few milliseconds


Shenandoah is in the same latency category as well. I haven't seen recent numbers, but a few years ago it was a little better on latency and a little worse on throughput.


3b. Does your program need more than 100 MB of memory?

If no, then just use a GC'd language and preallocate everything and use object pooling. You won't have GC pauses because if you don't dynamically allocate memory, you don't need to GC anything. And don't laugh. Pretty much all realtime systems, especially the hardest of the hard real time systems, preallocate everything.
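
A rough Go sketch of that strategy (names and sizes are made up): allocate a fixed pool of objects at startup and hand them out from a free list, so the steady state allocates nothing and the GC has nothing to collect:

    package main

    import "fmt"

    // Message is the unit of work; everything it needs is inline, so a
    // pooled Message never triggers further allocation.
    type Message struct {
        Payload [256]byte
        Len     int
    }

    // Pool is a fixed-size free list, filled once at startup.
    type Pool struct {
        free []*Message
    }

    func NewPool(n int) *Pool {
        p := &Pool{free: make([]*Message, 0, n)}
        for i := 0; i < n; i++ {
            p.free = append(p.free, &Message{})
        }
        return p
    }

    // Get returns a free Message, or nil if the pool is exhausted;
    // the caller decides how to degrade (drop, block, etc.).
    func (p *Pool) Get() *Message {
        if len(p.free) == 0 {
            return nil
        }
        m := p.free[len(p.free)-1]
        p.free = p.free[:len(p.free)-1]
        return m
    }

    // Put resets the Message and returns it to the free list.
    func (p *Pool) Put(m *Message) {
        *m = Message{}
        p.free = append(p.free, m)
    }

    func main() {
        pool := NewPool(1024) // a few hundred KB, all allocated up front
        m := pool.Get()
        m.Len = copy(m.Payload[:], "hello")
        fmt.Println(string(m.Payload[:m.Len]))
        pool.Put(m)
    }

(sync.Pool isn't a substitute here, by the way: the GC is allowed to drop pooled objects between cycles, so strict preallocation usually means a hand-rolled free list like this.)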


> My language selection checklist:

1. What are the people who are going to implement this experts in?

Choose that. Nothing else matters.


pretty sure the people who wrote the vulnerable code were experts.


Answering a question with a sincere question: if the answer to 3 is yes to deterministic time, but no to tight memory constraints, does Swift become viable in question 4? I suspect it does, but I don’t know nearly enough about the space to say so with much certainty.


I'm not super familiar with Swift, but I don't see how it could be memory-safe in a multi-threaded context without some sort of borrow checker or GC. So I think it is rejected by question #2.


Swift uses automatic reference counting. From some cursory reading, the major difference from Rust in this regard is that Swift references are always tracked atomically, whereas in Rust they may not be atomic in a single-owner context.

To my mind (again, with admittedly limited familiarity), I would think:

- Atomic operations in general don’t necessarily provide deterministic timing, but I'm assuming (maybe wrongly?) for Rust’s case they’re regarded as a relatively fixed overhead?

- That would seem to hold for Swift as well, just… with more overhead.

To the extent any of this is wrong or missing some nuance, I’m happy to be corrected.


Incrementing an atomic counter every time a reference is copied is a significant amount of overhead, which is why most runtimes prefer garbage collection to reference counting (that, and the inability of reference counting to handle cycles elegantly).

Rust doesn't rely on reference counting unless explicitly used by the program, and even then you can choose between atomically-reference-counted pointers (Arc) vs non-atomic-reference-counted pointers (Rc) that the type system prevents from being shared between threads.


I promise I’m not trying to be obtuse or argumentative, but I think apart from cycles your response restates exactly what I took from my reading on the subject and tried to articulate. So I’m not sure if what I should take away is:

- ARC is generally avoided by GC languages, which puts Swift in a peculiar position for a language without manual memory management (without any consideration of Swift per se for the case I asked about)

- Swift’s atomic reference counting qualitatively eliminates it from consideration because it’s applied even in single threaded workloads, negating determinism in a way I haven’t understood

- It’s quantitatively eliminated because that overhead has such a performance impact that it’s not worth considering


Swift has a similar memory model to Rust, except that where Rust forbids things Swift automatically copies them to make it work.

People using other languages appear terrified of reference count slowness for some reason, but it usually works well, and atomics are fast on ARM anyway.


It's important to note that while Swift often allows code similar to Rust to be written, the fact that it silently inserts ARC traffic or copies often means that people will write code that does things Rust wouldn't let them do, and realize after the fact that their bottleneck is something they would never have written in Rust. I wouldn't necessarily call this a language failure, but it's something worth looking out for: idiomatic Swift code often diverges from what might be optimally efficient from a memory-management perspective.



