This article demonstrates well something I've seen a lot with people coming from GC languages getting into Rust: they just write the code the way they're used to and work around the borrow checker by slapping `Arc<Mutex<_>>` all over the place.
Then that leads to frustration and wondering why you even got rid of the GC in the first place if you end up with a crappier, non-transparent, reference-counted garbage collector built out of all these Arc/Rc. Some devs even seem to think that non-GC devs are silly poseurs who refuse to have a garbage collector only to reimplement it manually with reference counting. We are silly poseurs alright, but we're more efficient than that!
I share the author's conclusions: don't do that. If you find yourself slapping Mutexes and Arc/Rc all over the place it probably means that there's something messed up with the way you modeled data ownership within your program.
In fairness, "just use Rc<>/Arc<>/clone()/etc" is common advice from the Rust community in response to criticism that the borrow checker puts an undue burden on code paths which aren't performance sensitive.
> If you find yourself slapping Mutexes and Arc/Rc all over the place it probably means that there's something messed up with the way you modeled data ownership within your program.
It only means that the data model doesn't agree with Rust's rules for modeling data (which exist to ensure memory safety in the absence of a GC). This doesn't mean that the programs the user wants to express are invalid. And this really matters because very often it doesn't make economic sense to appease the borrow checker--there are a lot of code paths for which the overhead of a GC is just fine but lots of time spent battling the borrow checker is not fine, and I think Rust could use a better story for "gracefully degrading" here. I say this as a Rust enthusiast.
EDIT: I can also appreciate that Rust is waiting some years to figure out how far it can get on its ownership system alone, as opposed to building some poorly-conceived escape hatch that people slap on everything.
> In fairness, "just use Rc<>/Arc<>/clone()/etc" is common advice from the Rust community in response to criticism that the borrow checker puts an undue burden on code paths which aren't performance sensitive.
Yes, and I think it's good to push back on that. I personally feel it's pretty misguided. I've been meaning to write about this, but haven't... so I'll just leave a comment, heh. Someday...
Thank you for that example. You really should put that somewhere public.
However, that example commits the Rust "sin" I always see with "spawn" explanations in that it packs everything into a closure.
Please, please, please ... for the sake of newbies everywhere ... please define a function and call that function from inside the spawn closure. This is one of the Rust things I spend the most time explaining and unpacking for junior engineers.
The separate function makes explicit the variables that are moving/borrowing/cloning around, which names are external/passed/closure-specific, and how it all ties together. It breaks apart the machinery that gets all conflated with the closure. They can always make it succinct again later once they understand the machinery more completely.
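A minimal sketch of what that refactoring might look like (the `worker` function and its arguments are hypothetical, purely for illustration):

    use std::thread;

    // Naming the worker makes the inputs explicit: `id` and `data` are
    // moved into the thread, and the clone happens visibly at the call site.
    fn worker(id: usize, data: Vec<u32>) {
        let sum: u32 = data.iter().sum();
        println!("thread {id}: sum = {sum}");
    }

    fn main() {
        let data = vec![1, 2, 3];
        let handles: Vec<_> = (0..3)
            .map(|id| {
                let data = data.clone(); // explicit clone, moved into the closure
                thread::spawn(move || worker(id, data))
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
    }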
This gets particularly bad with "event loops" (I have a loooooong rant I need to write up about event loops and why they are evil--and it's not Rust specific) and deep "builder chains" (which I consider a Rust misfeature that desperately needs to go away).
The issue in Rust is that "spawn" is generally the first time that a programmer is forced to confront closures--before that they can completely ignore them.
It is a quirk of Rust that someone who "just wants to spawn a thread" suddenly gets all this language overhead dumped on them in a block.
What about iterators? My instinct is that people would run into wanting to use `map` or `filter` or something before they feel the need to spawn threads when using a new language, although that might be my bias coming from a more functional background before learning Rust. The types of closures used as predicates tend to be a lot simpler though, so I guess this may not be what you mean by "confronting" them.
Iterators? Nope. You can happily live in "for x in foo" land without ever touching map/filter/collect.
> might be my bias coming from a more functional background before learning Rust
This is exactly the problem. The people I'm dealing with are coming from "imperative land" and haven't had a functional background. Someone stepping up to Rust as "C with memory safety" does not have any of that functional background.
Please do remember that a "closure" is built on a lot of prior abstractions. What is a "scope"? What is "variable capture"? What is "heap" and "stack"? Why does that matter here and not normally?
No programming language is only used by experts, and Rust is no exception.
The issue is that "spawn" smacks you with a bunch of that baggage all at once.
Yes! Back when I tried to learn Rust, every time I saw a complicated closure my mental parser/linter crashed (from a lifetime C developer's PoV).
I don't understand how a function that is defined in-place is clearer than a nice function on its own. Perhaps because it gives a sense of continuity (like in async code), but man, it's really painful to read (at least to me). In fact, if I had to write some Rust right now, all my closures would be defined elsewhere, whenever possible.
I guess someone coming from only C might not have encountered closures, but most languages have them these days (modern JavaScript code uses them everywhere), and I don't think they should be considered especially advanced. "How do closures work?" was a question I was expected to be able to answer when going for my first ever junior programming job, with zero professional experience (and no CS degree either).
Someone coming from C also would certainly know about the difference between the stack and the heap too, so it's a little strange to me that GP called that out
A lot of embedded code NEVER calls malloc (thus the need for no_std in Rust). Consequently, you can go a long way in embedded without really knowing the difference between stack and heap.
"Embedded programmer" does not imply "Linux Kernel Hacker".
There’s a pretty wide chasm between kernel hackers and people who have used malloc before. :) And an embedded programmer ought to have the ability to intuit (a naive) malloc on their own (you can have malloc without a kernel).
Map/filter closures almost never actually close over variables; instead they just act on the elements from the iterator. They're basically a one-off nameless function. On the other hand, closures used for spawning threads almost always close over variables from the outer context. That's an important distinction that makes them seem quite different even if they're the same underneath.
When I started with Rust, I actually thought that the common
    spawn(|| …)
syntax was something special, not just a closure with no variables.
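For what it's worth, the difference is easy to see side by side (a tiny sketch; `doubled` and `name` are made-up names):

    fn main() {
        // A map closure typically only touches its argument; nothing is captured:
        let doubled: Vec<i32> = vec![1, 2, 3].iter().map(|x| x * 2).collect();
        println!("{:?}", doubled);

        // A spawn closure typically captures (here: moves) outer variables:
        let name = String::from("worker");
        std::thread::spawn(move || println!("hello from {name}"))
            .join()
            .unwrap();
    }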
"Just use clone" seems like reasonable advice to give people on day 1 or week 1. I guess the hazard there is that they might end up writing `fn foo(s: String)` everywhere instead of `fn foo(s: &str)`, but they can gradually learn the better thing through case-by-case feedback, and correcting this is usually a small change at each callsite rather than a total program rewrite.
On the other hand "just use Arc" is definitely smelly advice. I think going down that route, a new Rust programmer is likely to try to do something that really won't ever compile, and wind up really frustrated. Maybe we can distinguish this often-really-bad advice from the other mostly-ok-at-first advice?
"Just use clone" is absolutely fine. I even put it in the book!
Yes, I guess to me they're distinct, but I can totally see how it may seem similar. I'll try to make sure to make that explicit when I talk about this, thanks!
The recommended solution uses "scoped_threadpool".
But "scoped_threadpool" uses "unsafe".[1] They could not actually do this in Rust. They had to cheat. The language has difficulty expressing that items in an array can be accessed in parallel. You can't write that loop in Rust itself without getting the error "error[E0499]: cannot borrow `v` as mutable more than once at a time".
And, sure enough, the "unsafe" code once had a hole in it.[2] It's supposedly fixed.
If you look at the example with "excessive boilerplate" closely, you can see that it doesn't achieve concurrency at all. It locks the entire array to work on one element of the array, so all but one of the threads are blocked at any time. To illustrate that, I put in a "sleep" and some print statements.[3]
You could put a lock on each array element. That would be a legitimate solution that did not require unsafe code.
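A sketch of that per-element-lock approach, reusing the scoped_threadpool API from the example under discussion (assuming the crate is available):

    use scoped_threadpool::Pool;
    use std::sync::Mutex;

    fn main() {
        // One lock per element instead of one lock around the whole Vec.
        let v: Vec<Mutex<i32>> = (1..=3).map(Mutex::new).collect();
        let mut pool = Pool::new(3);
        pool.scoped(|scope| {
            for elem in &v {
                // Shared references are fine here; the Mutex provides
                // the interior mutability.
                scope.execute(move || {
                    *elem.lock().unwrap() += 1;
                });
            }
        });
        let values: Vec<i32> = v.iter().map(|m| *m.lock().unwrap()).collect();
        println!("v: {:?}", values);
    }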
To do this right you need automatic aliasing analysis, which is hard but not impossible. (Been there, done that.) Or a special case for "for foo in bar", which needs to be sure that all elements of "bar" are disjoint.
> But "scoped_threadpool" uses "unsafe".[1] They could not actually do this in Rust. They had to cheat.
Using unsafe isn't "cheating". The whole point of using Rust is that you can encapsulate very small amounts of "unsafe" code within a safe abstraction, and know that there is no way to abuse the safe APIs to produce undefined behavior. The formal verification project "RustBelt" [0] has proven exactly this -- safe Rust code composes arbitrarily to produce safe behavior.
There is still a burden of proof on anyone writing a safe function that uses unsafe functionality internally. Keep in mind that "unsafe" Rust still poses more restrictions on the programmer compared to raw C/C++.
> The language has difficulty expressing that items in an array can be accessed in parallel.
Rust's borrow semantics are based on `exclusive XOR shared` references. To a first approximation, it's not possible to mutate a value through a shared reference. Interior mutability complicates this picture somewhat, but it's enough to explain why what you wrote is wrong.
Spawning `n` threads and giving each thread a mutable reference to the vector `v` goes against the above borrow semantics because that would result in `n` exclusive references. Only through exclusive references is mutation allowed. Since this is not allowed it becomes a compile-time error.
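A minimal sketch of the rejected pattern (this intentionally does not compile; the error in the comment is what rustc reports for it):

    use scoped_threadpool::Pool;

    fn main() {
        let mut v = vec![1, 2, 3];
        let mut pool = Pool::new(2);
        pool.scoped(|scope| {
            scope.execute(|| v.push(4));
            scope.execute(|| v.push(5)); // error[E0499]: cannot borrow `v` as
                                         // mutable more than once at a time
        });
    }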
In other words, it's not that the language has "difficulty expressing" that scenario. It was explicitly designed to not allow it.
> Using unsafe isn't "cheating". The whole point of using Rust is that you can encapsulate very small amounts of "unsafe" code within a safe abstraction, and know that there is no way to abuse the safe APIs to produce undefined behavior. The formal verification project "RustBelt" [0] has proven exactly this -- safe Rust code composes arbitrarily to produce safe behavior.
"Within a safe abstraction" is the question. The question is whether a piece of encapsulated code is always memory-safe for all uses. There's no guarantee of that from the language. The RustBelt people are working on tools for that, but it will probably require proof work in Coq for each bit of code containing "unsafe" to get there.
Look at the scoped_threadpools example again [2]
    use scoped_threadpool::Pool;

    fn main() {
        let mut pool = Pool::new(3);
        let mut v = vec![1, 2, 3];
        pool.scoped(|scope| {
            for i in &mut v {
                scope.execute(move || {
                    *i += 1;
                });
            }
        });
        println!("v: {:?}", v);
    }
There's an implicit assumption here that all the values returned from the iterator at
    for i in &mut v
are disjoint. But iterators, in general, do not guarantee disjoint outputs. An iterator which returned each reference value twice, for example, is a valid iterator. But used in the context above, two threads would receive mutable access to the same element of a vector. That seems to violate a core Rust safety assumption.
This would be a nice test to run through the version of the Miri interpreter that implements the dynamic "stacked borrows" checks from the RustBelt group.[2] That tool should catch this.
> There's an implicit assumption here that all the values returned from the iterator at
> for i in &mut v
> are disjoint. But iterators, in general, do not guarantee disjoint outputs. An iterator which returned each reference value twice, for example, is a valid iterator. But used in the context above, two threads would receive mutable access to the same element of a vector. That seems to violate a core Rust safety assumption.
Your assumptions here are wrong. "An iterator which returned each reference value twice" could only be implemented using unsafe Rust, and code emitting multiple co-occurring mutable references to the same value is instant UB. In other words, a "safe abstraction" being able to do this is actually not a safe abstraction at all. This would be a bug in the implementation of the custom iterator.
In fact, if you tried to prove such an implementation using the formal tools (Iris) from RustBelt you wouldn't be able to do it. I know, because I've done it as part of my course work under Lars Birkedal.
I'm not sure if you're misunderstanding how `&mut v` turns into an iterator, which values it spits out, etc., or what is happening. Let's look at how that code is translated into a program and typechecked. First, let's look at the for-loop itself. Anything of the form
    for x in y
must have an implementation of the `IntoIterator` trait for the type of the expression `y`. Whatever that type might be. In this case `y` is actually `&mut v`, where `v`'s type is `Vec<i32>`. Luckily enough, an implementation of `IntoIterator` exists for any `&mut Vec<T>`. [0]
Looking at the expanded signature we can see that the associated type `Item` is `&mut T`. This tells us that the types the resulting iterator will produce are `&mut i32`. Keep in mind that these references point into the backing memory of `v`'s `Vec<i32>`. This is where Rust's semantics help us.
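Roughly, the loop desugars to something like this (a sketch, not the exact compiler expansion):

    fn main() {
        let mut v: Vec<i32> = vec![1, 2, 3];
        // What `for i in &mut v` boils down to:
        let mut it = <&mut Vec<i32> as IntoIterator>::into_iter(&mut v);
        while let Some(i) = it.next() {
            // i: &mut i32 -- only one is live per iteration
            *i += 1;
        }
        println!("{:?}", v); // [2, 3, 4]
    }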
You say the following:
> But iterators, in general, do not guarantee disjoint outputs.
The issue with that statement is that iterators don't have to. Rust guarantees that condition. Safe Rust can't return multiple mutable references to the same value. Further, any unsafe implementation, even when exposed through a safe interface, giving this behavior is a bug. It would be great if the compiler could statically verify unsafe code too, but if it could there wouldn't be a need to call it unsafe.
> The question is whether a piece of encapsulated code is always memory-safe for all uses. There's no guarantee of that from the language.
That is ultimately impossible to prove completely and statically. Turing and Gödel made sure of that. However, some things that the compiler can't prove we can still prove as outside observers. Often these things will boil down to controlling and checking the states of several variables before performing an unsafe operation, but knowing that in a given context the operation is safe.
The first part of the above quote is partially true. However, if we assume a given unsafe implementation is indeed safe then the rules of Rust's semantics make the composition of any number of safe APIs a completely safe composition. This is a strong, useful statement because it means the very few places where unsafe is required can be easily checked, and every other piece of code doesn't have to spend mental energy for the programmer to consider the safety of their implementation.
> The issue with that statement is that iterators don't have to. Rust guarantees that condition. Safe Rust can't return multiple mutable references to the same value.
So how does Vec return mutable references to elements? Unsafe code, apparently. You can't write your own safe collection class with an iterator and return mutable references. You'd need a proof of disjointness system to do that.
> The question is whether a piece of encapsulated code is always memory-safe for all uses. There's no guarantee of that from the language. That is ultimately impossible to prove completely and statically.
One can construct code for which memory safety is not decidable. That's a good reason to reject it. The Microsoft Static Driver Verifier has a simple solution - if symbolic execution can't verify safety within some time limit, it rejects the driver.
> So how does Vec return mutable references to elements? Unsafe code, apparently. You can't write your own safe collection class with an iterator and return mutable references. You'd need a proof of disjointness system to do that.
If you have a mutable binding to a vector `v` you can get a single mutable reference to one of the elements of that vector at a time. Because getting a single mutable reference to an element requires taking a mutable reference to the vector, the disjointness invariant is upheld. This is the core of the Rust borrow checker. If you're unsure about the semantics you should be reading about them, not trying to suss them out from a fairly surface-level discussion on HackerNews.
This code:
    for x in &mut v {
        *x = // ...
    }
is perfectly valid. For every iteration of the loop only a single mutable reference to an element exists, and the reference is dropped before the next mutable reference is taken. The following piece of code is a full example that showcases how taking two mutable references is not allowed.
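    fn main() {
        let mut v = vec![1, 2, 3];
        let mut_one = &mut v[0];
        let mut_two = &mut v[1]; // error[E0499]: cannot borrow `v` as mutable
                                 // more than once at a time
        *mut_two += 1;
        *mut_one += 1; // `mut_one` is still used here, so the borrows overlap
    }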
Do note that using `mut_one` after `mut_two` is important, as otherwise the compiler will infer that `mut_one` can be dropped before `mut_two` is created, removing the clash.
> So how does Vec return mutable references to elements? Unsafe code, apparently.
The implementation can be found quite easily[0]. It's essentially invariants upheld by runtime checks and knowledge about the state of the given references/pointers. Notice the comments starting with "SAFETY". They explain the assumptions/conditions/reasons that make the code inside the `unsafe` block safe. If these assumptions can never be violated with malicious input or calling patterns then the code is actually safe and it can be a safe wrapper.
We're talking about Vec's iterator, not simple slice access. That "SAFETY" stuff in slice.rs is just a subscript check. Vec's iterator is in [1]. The iterator which returns a mutable reference is unsafe code, because you can't express that concept in safe code.
To recap: the interesting issue is the thread pool example which spins off a new thread with a mutable reference to each element of an array. That's a rather unusual thing to do. It is only safe if each iteration returns a reference disjoint from all other references. Because Rust at the language level does not understand disjointness within an array, it needs a hack using "unsafe" to make this work. The borrow checker would not allow this in safe code.
chunks_mut uses the same trick, with more unsafe code.
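From the safe side, using it looks something like this (a sketch reusing the scoped_threadpool API from the example above): each chunk is a disjoint `&mut [i32]` handed to a different thread.

    use scoped_threadpool::Pool;

    fn main() {
        let mut v = vec![1, 2, 3, 4, 5, 6];
        let mut pool = Pool::new(3);
        pool.scoped(|scope| {
            // chunks_mut hands out non-overlapping &mut [i32] windows,
            // so each thread gets provably disjoint data.
            for chunk in v.chunks_mut(2) {
                scope.execute(move || {
                    for x in chunk {
                        *x *= 10;
                    }
                });
            }
        });
        println!("{:?}", v); // [10, 20, 30, 40, 50, 60]
    }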
My point in all this is that because the language doesn't have syntax for talking about disjointness of parts of an array, each time this comes up, another hack in unsafe code is required. But enough here; I may say more on the Rust forums.
> They could not actually do this in Rust. They had to cheat.
That isn't true at all. The implementation of scoped_threadpool being in Rust is all the evidence you need that they did not cheat. unsafe is a perfectly valid construct in Rust that does not subvert its benefits. It is just a reversal of the defaults in other languages (unsafe by default, with opt-in safer constructs like smart pointers).
People freak out too much over this stuff. If you are learning, feel free to use escape hatches from "idiomatic Rust", but recognize that you aren't being idiomatic (and therefore may be handicapping your learning!). It's okay to use unsafe even though idiomatic Rust tries to avoid it unless necessary (and even then provides safe wrappers over it). It's also okay to use Arc<Mutex> and the like if you need to; just recognize there may be a better way.
unsafe is not cheating. It is part of the language for a reason, there to be used when appropriate. It is literally impossible to make a useful program without any unsafe code anywhere, because there aren’t Rust CPUs. By this definition, “Rust” does not exist.
thread::spawn uses unsafe code. sleep uses unsafe code. Is your own example not in Rust? Are you cheating?
(And yes, the spawn version isn’t optimally concurrent. The comment was long enough without getting into that. People struggle to get examples to compile before they worry about things like this. That would be next steps.)
If you need "unsafe" in pure Rust code, not to talk to something external, the language has failed you at some point.
There are a relatively small number of trouble spots. They include, at least:
- No way to talk about partially initialized arrays in Rust, which leads to Vec needing "unsafe".
- Back references, which makes some kinds of trees, and of course doubly-linked lists, difficult.
Those I've mentioned before. From the discussion above, add:
- Problems around interior mutability within arrays, where you need some way to say, and prove, "a is disjoint from b", before working separately on a and b.
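For that last point, today's standard library answers one special case with `split_at_mut`, whose disjointness argument lives in unsafe code behind a safe signature:

    fn main() {
        let mut a = [1, 2, 3, 4];
        // Safe surface over an unsafe disjointness proof: the two halves
        // are guaranteed non-overlapping, so both &mut slices may live at once.
        let (left, right) = a.split_at_mut(2);
        left[0] += 10;
        right[0] += 100;
        println!("{:?}", a); // [11, 2, 103, 4]
    }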
These are classic proof of correctness areas. It's fairly straightforward to state the conditions that have to be proven, and usually not that hard to prove them. The RustBelt crowd seems to be working on this.
Maybe they'll come up with a way to prove out conditions like that, short of grinding away by hand in Coq. Most of those problems are within reach of a fully automatic prover, like a SAT solver.
> These are classic proof of correctness areas. It's fairly straightforward to state the conditions that have to be proven, and usually not that hard to prove them.
You're right, but current Rust cannot express logical preconditions in code, much less compiler-checkable proofs. Many groups are working on this via a variety of approaches, but it will take some time before there's a common standard for expressing these things as part of the Rust language itself.
From the point of view of a language like Modula-3 or Ada, unsafe should be used only for low-level systems programming stuff, like passing data into DMA buffers, OS APIs, Assembly and similar.
I agree with Animats here: having to work around type-system issues means that the type system still isn't expressive enough to define certain properties in a safe way.
> It only means that the data model doesn't agree with Rust's rules for modeling data (which exist to ensure memory safety in the absence of a GC).
This doesn’t seem correct at all?
GC merely solves freeing memory when it is no longer needed. But it does not solve parallel access.
One of the beauties of Rust is that you can write highly parallel code and if it compiles it works. Meanwhile Python languishes behind the GIL. GC does not even attempt to solve multithreaded memory access.
I’m happy to be corrected if I’m wrong or missing something here.
You're right on the local point that Rust's ownership rules guarantee correctness in the case of parallel access while GCs do not; however, the broader point is that Rust's ownership rules also reject many programs which would be correct with GC.
For example:
    fn foo<'a>() -> &'a [u8] {
        let v = vec![0, 1, 2];
        // returns a value referencing data owned by the current function
        v.as_slice()
    }
vs Go's:
    func foo() []uint8 {
        // subject to escape analysis, but code like this would likely return
        // a fat pointer into the heap--no complaints because there's a GC.
        return []uint8{0, 1, 2}
    }
> One of the beauties of Rust is that you can write highly parallel code and if it compiles it works. Meanwhile Python languishes behind the GIL. GC does not even attempt to solve multithreaded memory access.
Python's GIL is unrelated to GC, but yes, Rust's borrow checker guarantees correct parallel access of memory. But I think this benefit is overblown for a couple reasons:
1. contrary to popular opinion, if you've learned how to write parallel programs, it's not tremendously difficult to write them correctly without a borrow checker. In my experience, whatever time I've saved debugging pernicious data races is lost by the upfront cost of pacifying the borrow checker. Maybe this wouldn't hold for people who aren't experienced with writing parallel code (but I imagine such people would have a harder time grokking the borrow checker as well).
2. most data races in my experience aren't single threads on a host accessing a piece of shared memory, but rather many threads on many hosts accessing some network resource (e.g., an S3 object). The borrow checker doesn't help here at all, but you still have to "pay the borrow checker tax".
Again, this isn't a tirade against the borrow checker, but an insistence that tradeoffs exist and it's not just a "you're just doing it wrong" sort of thing.
Ehhh I’d have to hard disagree with #1. But we’ll likely just have to agree to disagree.
Maybe I’m just a bad enough programmer I write parallel bugs. But C++ certainly doesn’t make it easy to write correct code in any way.
I personally think it’s pretty darn difficult to ensure correctness in a large program. Especially when multiple programmers are involved. And especially when you are adding features to an existing system.
However I will also admit that I haven’t written a large Rust program so I can’t claim to have run into all of its warts.
There are no silver bullets in life. I work primarily in video games. GC is the bane of my existence and is something that provides seriously negative value.
We definitely agree it’s all a trade off. GC provides some value and some costs. Borrow checkers provide some value and some cost.
> It only means that the data model doesn't agree with Rust's rules for modeling data (which exist to ensure memory safety in the absence of a GC).
I think my initial response was that Rust's model exists for more than just that single reason. Whether those reasons or trade-offs are useful depends on the program in question.
In my work I never want a GC, but damn would I love a borrow checker.
C++ provides the tools to build robust parallelism that is also optimal, with the giant caveat that implementing it is left to the programmer (and good libraries are not abundant). Rust offers built-in correctness but not optimality, and C++ offers optimality but no built-in correctness. Many massively parallel codes and virtually all massively concurrent codes necessarily lean on latency hiding mechanics for concurrency and safety. Ownership-based safety can’t be determined at compile-time in these cases, but you can prove in many cases that safety can be dynamically resolved by the scheduler at runtime regardless of how many mutable references exist. This has a lot of similarities to agreement-free consistency in HPC, where no part of the system has the correct state of the system but you can prove that the “system” will always converge on a consistent computational result (another nice set of mechanics from the HPC world).
The problem with ownership-based safety for massive parallelism is that the mechanics of agreeing on and determining ownership don’t scale and often can’t be determined at compile-time. Some other safety mechanics don’t have these limitations. C++ doesn’t have them built-in but you can implement them.
> Maybe I’m just a bad enough programmer I write parallel bugs. But C++ certainly doesn’t make it easy to write correct code in any way. I personally think it’s pretty darn difficult to ensure correctness in a large program.
IMO the key to writing parallel code is to keep the parallelism confined to a small kernel rather than sprawling throughout your codebase. If you try to bolt on parallelism then you’re going to have a bad time. It needs to be part of your architecture. It’s not easy, but it’s easier than writing parallel code that will pass the borrow checker IMHO. But yes, we may have to agree to disagree.
> We definitely agree it’s all a trade off. GC provides some value and some costs. Borrow checkers provide some value and some cost.
Agreed!
> I work primarily in video games. GC is the bane of existence and is something that provides seriously negative value.
I’m very curious about videogames development. In particular, I get the impression that aversion to GC in videogames comes down largely to experiences with Java back when pauses could be 300ms. I’m very curious if Go’s sub-millisecond GC (and its ability to minimize allocations, etc) would be amenable to videogame development. Thoughts?
> I think my initial response was that Rust’s model exist for more than just that single reason. Whether those reasons or trade offs are useful depend on the program in question.
Heartily agree.
> In my work I never want a GC, but damn would I love a borrow checker.
In my line of work, I like the idea of using Rust but realistically the economic sweet spot is something like “Go with sum types” or “Rust-lite”.
> I’m very curious about videogames development. In particular, I get the impression that aversion to GC in videogames comes down largely to experiences with Java back when pauses could be 300ms. I’m very curious if Go’s sub-millisecond GC (and its ability to minimize allocations, etc) would be amenable to videogame development. Thoughts?
It really just comes down to average/worst case time.
Modern games are expected to run anywhere from 60 to 240 frames per second. 60 is the new baseline. VR runs anywhere from 72 to 120. Gaming monitors regularly hit 144Hz. And esports goes as high as 240 and even 360.
In Unity, C# GC can take tens of milliseconds. This is, uh, obviously very bad. High-tier Unity games spend a LOT of time avoiding all allocs. This is not fun in a GC language. Most indie games just hitch and it's pretty obvious. I'm not sure if Unity's incremental GC has graduated from experimental mode.
If a GC had a worst case time of less than a millisecond that’d be fine. That’s actually a pretty big chunk of a 7ms frame, but hey probably worth it. If it’s usually 250us but once a minute spikes to 3ms that’ll cause a frame to miss. If once every 5 minutes it’s a 50ms GC that’s a huge hitch. For a single player game it’s sloppy. For a competitive multiplayer game it’s catastrophic.
Unreal Engine actually has a custom garbage collector. But it's only for certain data types and not all memory. That's a nice compromise. Games in particular are good at knowing if the lifetime of an allocation is short, per-frame, long-term, etc.
> I’m very curious about videogames development. In particular, I get the impression that aversion to GC in videogames comes down largely to experiences with Java back when pauses could be 300ms. I’m very curious if Go’s sub-millisecond GC (and its ability to minimize allocations, etc) would be amenable to videogame development. Thoughts?
Well, Minecraft is written in Java, and it runs fine from what I’ve heard. In .NET land, there was a short lived toolkit for C# called XNA - Terraria is (was?) written in it. Both Java and C# are garbage collected.
I haven’t looked at Unity too deeply, but isn’t Unity (and the games made in it) built in C#?
Game programmers mostly want tight control of object layout and lifecycle. GC doesn't matter much when you use ECS all over your codebase. As long as they can run the GC when they want and can lay out the objects flat without needing pointer indirection, it would be very suitable.
I think C# is popular because it allows the above. When Java has proper value types it might be suitable for writing games.
> As long as they can run the GC when they want and can layout the objects flat without needing pointer indirection it would be very suitable.
To be clear, the problem isn’t pointer indirection, but rather lots of objects on the heap, right? Pointers should be fine as long as there aren’t many allocations (e.g., pointers into an arena)?
> I think C# is popular because it allows the above. When Java has proper value types it might be suitable for writing games.
Go also has value types, FWIW, and they are a lot more idiomatic than in C# from what I’ve observed.
Yeah, in theory, but I’ve never had much luck with OCaml and every time I dig into my problems here it ends with the OCaml fanboys berating me so I’ve pretty much given up on it.
My example is probably not very good. Let's say you have a function which implements a callback. The function takes a &str and returns some subslice. Let's say you want to implement another function with that same interface, but which returns the string lowercased. In Rust, you have to update the interface, you have to update all implementations, and you have to jump through hoops to avoid unnecessary cloning in the subslice case. In Go or other GC languages, there aren't "owned" vs "unowned" types, so you don't have to make any updates to the interface or other implementations. In a sense, GC allows us to abstract over ownership.
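A toy illustration of that interface friction (`first_word` and `lowercased` are hypothetical callbacks): the borrow-in/borrow-out signature fits the subslice case, but the lowercasing implementation forces the shared interface to widen to an owned-or-borrowed type like Cow, and every implementer changes with it.

    use std::borrow::Cow;

    // The original interface: borrow in, borrow out. A subslice fits.
    fn first_word(s: &str) -> &str {
        s.split_whitespace().next().unwrap_or("")
    }

    // A lowercased result must be owned, so the common interface has to
    // become Cow (and the subslice implementation gets rewrapped too).
    fn lowercased(s: &str) -> Cow<'_, str> {
        Cow::Owned(s.to_lowercase())
    }

    fn first_word_cow(s: &str) -> Cow<'_, str> {
        Cow::Borrowed(first_word(s))
    }

    fn main() {
        println!("{}", first_word_cow("Hello World"));
        println!("{}", lowercased("Hello World"));
    }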
> Rust's ownership rules guarantee correctness in the case of parallel access while GCs do not
That’s a dangerous misconception. The Rust ownership model only guarantees that the program is free of data races. That’s a necessary but not sufficient condition of program correctness.
> Rust's ownership rules guarantee correctness in the case of parallel access while GCs do not
Only for threads accessing in-memory data structures, it does nothing for other kinds of data access scenarios either to external resources or via OS IPC mechanisms.
I'd add: some GC'd languages solve this with sending around deeply immutable objects (functional languages mostly), in a way that can be more flexible than the way Rust handles immutable objects.
> In fairness, "just use Rc<>/Arc<>/clone()/etc" is common advice from the Rust community in response to criticism that the borrow checker puts an undue burden on code paths which aren't performance sensitive.
I hypothesize that this mostly comes from laziness: it's an easy response to give to people who are used to having a garbage collector. I come from the C world (still learning Rust) and every time I see one of these pieces of advice given I'm forced to facepalm.
Clone is "okish", but I don't remember a case where I had to use Rc or Arc (sure, there are use cases for that, but not for basic stuff)
Some people just try to force their way into a new language and don't realize that if you keep doing something that looks stupid or weird it probably is (and no, yours is not a special case)
> Clone is "okish", but I don't remember a case where I had to use Rc or Arc (sure, there are use cases for that, but not for basic stuff)
I think the use case is "I haven't yet completely grokked the borrow checker and/or I don't have time to pacify it, but I would prefer not to copy potentially large data structures all over with Clone".
> Some people just try to force their way into a new language and don't realize that if you keep doing something that looks stupid or weird it probably is (and no, yours is not a special case)
You're responding to my comment which is about the Rust community prescribing this as a solution to newcomers. We're not talking about newcomers obstinately refusing to learn new idioms in the language they allegedly want to learn (although no doubt this happens, especially if the language in question is Go :p ).
> I think the use case is "I haven't yet completely grokked the borrow checker and/or I don't have time to pacify it, but I would prefer not to copy potentially large data structures all over with Clone".
For basic stuff I agree, though I wouldn't use it.
> about the Rust community prescribing this as a solution to newcomers.
There's probably a sweet spot for using those constructs in not so obvious places while going through the simpler stuff in a more idiomatic way
> very often it doesn't make economic sense to appease the borrow checker--there are a lot of code paths for which the overhead of a GC is just fine but lots of time spent battling the borrow checker is not fine
To me it's not about performance. A little bit of time spent now appeasing the borrow checker will pay off ten fold later when you don't have to deal with exploding memory usage and GC stalls in production.
GC is great for quick hack jobs, scripts, or niches like machine learning, but I believe at this point it's a failed experiment for anything else.
> To me it's not about performance. A little bit of time spent now appeasing the borrow checker will pay off ten fold later when you don't have to deal with exploding memory usage and GC stalls in production.
I'm confused by the "it's not about performance. [reasons why it is, in fact, about performance]" phrasing, but in general a lot of applications aren't bottlenecked by memory and a GC works just fine. Even when that's not entirely the case, they often only have one or two critical paths that are bottlenecked on memory, and those paths can be optimized to reduce allocations.
> GC is great for quick hack jobs, scripts, or niches like machine learning, but I believe at this point it's a failed experiment for anything else.
That sounds kind of crazy considering how much of the world runs on GC (certainly much more than the other way around). I feel the need to reiterate that I'm not a GC purist by any means--I've done a fair amount of C and C++ including some embedded real time. But the idea that GC is a failed experiment is utterly unsupported.
That's the story that GC sold us. History has proven it wrong. Citation: the fire hose of articles on HN about how GC bit people in the ass, and how they now have to go back into their code and write a bunch of duct-tape code to work around Garbage Collector Quirk #4018 du jour that results in hitching, insane memory usage, and random OOMs.
> That sounds kind of crazy considering how much of the world runs on GC
And much of HN runs on comments complaining about the _absurd_ amounts of memory all those non-bottlenecked applications use to do otherwise simple tasks. Or the monthly front page articles about developers and companies working to fix their otherwise straightforward, non performance critical production services that are choking themselves because the GC is going wild.
I say GC is a failed experiment because it promised that programmers would be able to write code without worrying about memory. But ever since its popularization 26 years ago with the dawn of Java, coders writing in garbage collected languages have been doing nothing but worrying about memory. The experiment failed. It's time to move on.
The borrow checker is an infantile incarnation of a bigger idea that is finally panning out. Rather than garbage collecting at run-time, garbage collect during compilation using static analysis. Being in its infancy it's not as easy and free to use as we'd like. But it's the path forward. And just like garbage collection before it, in the vast majority of cases, programmers don't care whether it's more or less performant. Garbage collection was vastly less performant than manual management. But it required _so_ much less developer time to build the same applications. My argument is that Rust's borrow checker, as painful as it is, results in more developer time up front, but less developer time overall when you consider the long tail of code upkeep that garbage collected applications demand.
Hence my comment: "A little bit of time spent now appeasing the borrow checker will pay off ten fold later when you don't have to deal with exploding memory usage and GC stalls in production."
It's not about performance; it's about saving yourself the time of having to come back to your code a month later because your TODO app is using a gig of RAM and randomly hitching.
I’ve been working in tech for a decade; I’ve scaled several large products and worked on distributed complex systems with a lot of users and some serious workloads. Memory use has been a major issue perhaps two or three times total.
> It's not about performance; it's about saving yourself the time of having to come back to your code a month later because your TODO app is using a gig of RAM and randomly hitching.
If your GC program is using excessive RAM, that’s because of a memory leak, not the garbage collector. This can happen in C/C++ as well; just malloc/new and forget to free/delete. Last I checked, C and C++ aren’t garbage collected languages.
It's rarer, and you can rule it out entirely by just not using types that let you leak memory. Afaik, circular `Rc` references, `Box::leak` (and friends), `MaybeUninit` and overzealous thread spawning are the only ways of leaking memory in safe Rust.
Even if you avoid those things, how does safe Rust make leaks more rare than in a GC language? Presumably leaks in a GC language or safe Rust are almost always going to be stuff like “pushing things into a vector repeatedly even though you only care about the last item in the vector”, and clearly safe Rust doesn’t stop you from doing this any more than a GC.
Note also that GC languages don’t even have the circular references case to worry about since they don’t have any need for reference counting in general.
> That's the story that GC sold us. History has proven it wrong. Citation: the fire hose of articles on HN about how GC bit people in the ass, and they now have to go back into their code and write a bunch of duct tape code to work around Garbage Collector Quirk #4018 de jure that results in hitching, insane memory usage, and random OOMs.
I follow HN daily and very rarely do I see articles lamenting GC (I'm only familiar with a small handful of incidents, including some pathological cases with Go's GC on enormous heaps (many terabytes) and some complaints about Java's GC having too many knobs), certainly not in the general case. Indeed, for the most part people seem quite happy with GC languages, especially Go's sub-ms GC. In particular, memory usage (and thus OOMs) has nothing to do with GC--it's every bit as easy to use a lot of memory in a language that lacks GC altogether. This is incorrect, full stop.
> I say GC is a failed experiment because it promised that programmers would be able to write code without worrying about memory.
GC promises that programmers don't have to worry about freeing memory correctly, and it delivers on that promise. I'm not a GC purist--there's lots of criticism to be had for GC, but we don't need to point at patently false criticisms.
> The borrow checker is an infantile incarnation of a bigger idea that is finally panning out. Rather than garbage collecting during run-time; garbage collect during compilation using static analysis. Being in its infancy it's not as easy and free to use as we'd like. But it's the path forward.
Maybe. I like the idea, but I'm skeptical that putting constraints on the programmer is going to be an economical solution, at least for so long as the economics favor rapid development over performance. Conceivably rather than rejecting code that aggrieves the borrow checker, we could picture a language that converts those references into garbage collected pointers transparently, but we kind of have this already today via escape analysis--and indeed, I think this is the economic sweet spot for memory management because it lets users have a GC by default but also minimize their allocations for hot paths.
> Hence my comment: "A little bit of time spent now appeasing the borrow checker will pay off ten fold later when you don't have to deal with exploding memory usage and GC stalls in production."
But the borrow checker is strictly less effective in preventing memory leaks than a (tracing) GC (borrow checker will happily allow circular refcounts). More importantly, having to pacify the borrow checker on every single codepath when only 1-2% of code paths are ever going to be problematic is not a good use of your time, especially when you can do some light refactoring to optimize. With respect to GC stalls, these are particularly rare if you have a GC that is tuned to low-latency (Go's GC can free all memory in less than a millisecond in most cases).
> It's not about performance; it's about saving yourself the time of having to come back to your code a month later because your TODO app is using a gig of RAM and randomly hitching.
That sounds like the textbook definition of a performance concern, but again memory usage is orthogonal to GC and random hitching isn't a problem for latency-tuned GCs. Even while Rust is faster than many of its GC counterparts, this difference typically comes down to the ability of the compiler to output optimized code--not the memory management system. That said, for realtime applications, nondeterministic GCs aren't appropriate.
The situation is a bit more nuanced than this. Yes, a GC intrinsically incurs an integer factor performance cost for many codes relative to e.g. optimized C++ (I’ve optimized both). However, while ownership-based safety models are a material improvement in some cases, in other cases the borrow-checker encourages materially suboptimal software architecture e.g. in cases where ownership can only be optimally evaluated at run-time. There are other deterministic safety models often used in these cases.
tl;dr: there are significant GC performance problems that a borrow-checker doesn’t solve. I can imagine cases where the performance improvement is significantly less than you might hope.
> Yes, a GC intrinsically incurs an integer factor performance cost for many codes relative to e.g. optimized C++ (I’ve optimized both)
I'm guessing this performance difference isn't caused by GC but rather correlated with GC. I.e., GC languages tend to output relatively poorly-optimized code compared to the absolutely beastly C/C++/Rust compilers, or else their idiomatic code results in objects scattered all over the heap (killing cache locality), while idiomatic C, C++, Rust, etc. tend to allocate related objects (objects which tend to be accessed in succession) next to each other in memory.
TL;DR: No doubt a GC can be slower than manually managed memory in some cases, but it's insufficient to conclude that the entire performance gap between C/C++/Rust and Java/Go/etc is attributable to GC.
No, there are major classes of macro-optimizations that are impossible to implement with a GC even in theory. Some of these optimizations are idiomatic for state-of-the-art systems code. The impact on performance is integer factor.
In my opinion, Rust is optimized to be a systems programming language. As such, we'd expect that it shouldn't really be your first choice for writing an "application". (I'm not referring to the OP at all here)
HOWEVER, Rust is such an excellent language that we all want to use it to write applications ANYWAY. That's kind of amazing in and of itself- that people want to deal with a non-GC'd language at all to write applications. Because, really, garbage collection is awesome and there's almost zero reason to avoid it unless you need extremely predictable performance, or very low runtime overhead, etc.
As far as wrapping everything in Arcs and Mutexes is concerned: Yes, that's ugly and it's a lot of extra typing. On the other hand, your performance is still likely to be orders of magnitude faster than Python for some general application-type tasks, and it will likely avoid headaches with the borrow checker, etc.
So, honestly, I don't know if I recommend doing that or not. What I want is to be able to tell people not to use Rust for what they're doing. I'd like to have a different recommendation for a "garbage collected Rust" but there really isn't anything that I think is good enough for the title. Maybe Scala 3 (I haven't played with it yet) or OCaml when its multi-core stuff is done. Maybe F# or C# are good enough, too.
> HOWEVER, Rust is such an excellent language that we all want to use it to write applications ANYWAY.
Is this really the reason, or is it because people think they will get free performance wins by choosing the correct language? The plethora of articles detailing a developer's story trying to write X-style language in Rust shows that most are not approaching Rust with the correct mindset.
I have limited experience in Rust, but it seems like most people are attracted by shiny new toys in the language. Many people have issues with OO languages, and the separation of data and functions is attractive. But it lacks the runtime most application developers have come to know, specifically GC and easy references.
I can only speak for myself, obviously. But, I think it's obviously about more than performance. Otherwise, people have always had C, C++, LISP, and even Java and Go that will be generally faster than Python, PHP, and JavaScript.
I'd say there are three things that most people rave about with Rust:
* traits as type classes (even if they don't know the term "type class")
* enums and pattern matching (not even as flexible as some other languages like MLs, but way more than is possible in most popular languages with only Swift coming close)
* the Iterator trait. I might write a sonnet about how much I love Iterator in Rust. It's lazy iteration, but it's also optimized at compile time to basically just be turned into a for loop.
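On that last point, a small sketch of why Iterator earns the sonnet: the chain below is lazy until collect() drives it, and the optimizer typically lowers the whole thing to a plain loop.

    fn main() {
        let evens_squared: Vec<i32> = (1..=10)
            .filter(|n| n % 2 == 0) // lazy: nothing runs yet
            .map(|n| n * n)         // still lazy
            .collect();             // drives the whole chain once
        assert_eq!(evens_squared, vec![4, 16, 36, 64, 100]);
    }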
The problem with OCaml is not multicore. If it had an easier-to-use build system and a larger community (so more libraries), it could be great for server-side software. But dune is way harder to use than Cargo, and you can often find libraries missing. Here's an example of Dark moving away from OCaml to F# https://blog.darklang.com/leaving-ocaml/; F# doesn't have this problem as much because you can use C# stuff. Scala also doesn't suffer as much from it, partly because it was really popular at some point, and partly because you can use Java anyway.
I hope at some point we'll have this "garbage collected Rust", but for now OCaml, Scala and F# all have a worse developer experience than Rust. I could add Haskell to that too.
I've been thinking for some time that something like Go but based on an ML would fill this "garbage collected Rust" niche quite well. Maybe something built on top of Rust itself to leverage all the ecosystem? You could write all of the glue code in this language, have access to a large ecosystem of libraries, and have an option for high-performance code. This would also complement Rust nicely: I know that OCaml has a really fast compiler, which would be a breath of fresh air for the community.
> I hope at some point we'll have this "garbage collected Rust", but for now OCaml, Scala and F# all have a worse developer experience than Rust. I could add Haskell to that too.
F# on Visual Studio + .NET ecosystem (which includes C++/CLI, C# and first class support for COM/UWP) is worse than Rust?!?
How many of those things work well on Linux/macOS? Rust, Cargo, rust-analyzer, and VSCode work on every major platform. You also have access to the .NET ecosystem, but it's not as easy as the Rust ecosystem since most of the packages are C#, not F#, and idiomatic C# and idiomatic F# are quite different. In terms of resources too, F# is quite poor compared to Rust.
Edit: another thing, compiling to native is not really a first-class citizen on .NET. That makes it a bit worse for distributing tools compared to Rust, Go or OCaml. Scala also suffers from this.
You can use C# from F#, of course, but then you can run into issues with nulls and things like that. There's a reason every programming language has their own implementation of JSON/XML parsing, HTTP and things like that: offering an idiomatic API is important.
Last time I used the VSCode F# plugin (around a year ago), the experience was strictly worse than rust-analyzer. The compiler errors are also not as good. I've never used Rider so I can't comment on that.
You're right about the 20 years of production code for C#. NuGet lists 241418 packages compared to crates.io 61579 crates, and there's a good chance most of them are more mature than the Rust alternatives. However that's C#, not F#, and I doubt F# represents more than 10% of the packages. That's still better than doing it yourself if there's no package or writing bindings to C though.
I still think F# and Scala are solid choices if you want a functional language today, but it's not the "garbage collected Rust" ideal I wish we had.
I think you're talking about Swift. Alas it doesn't seem to have caught on outside of the Apple ecosystem, so library support isn't great there either.
I do agree that language built on top of the Rust ecosystem (kind of like how PHP exposes a lot of C libraries) could make a lot of sense.
I don't think I am talking about Swift. The compilation is slow because of LLVM and you still have to handle cyclic references because of ref counting. As you said too, library support isn't great.
I don't know exactly what the dev experience is like, at least outside of Xcode. Last time I played a bit with it, adding the language server to VSCode was a bit of a pain. Still, I think Swift is a step in the right direction.
I've heard good things about Nim too for this niche but never tried it. I think it'll suffer from the same problem of not being able to leverage an existing ecosystem though.
Nim compiles to C (through C?), so at least leveraging the C/C++ ecosystem is straightforward/first class.
It is certainly difficult to justify breaking away from the huge established multiple decade ecosystems and standard libraries seen with the JVM/.NET/Python platforms for instance.
OCaml is also pretty good at leveraging the C ecosystem, but that's not what people are looking for in a "garbage collected Rust". If they want something close to C they'll just use Rust. What they're looking for is something easier to write that still looks like Rust and has all the developer experience that Rust has. For this, the Nim ecosystem is way too small and can't leverage an existing platform. The same does apply to OCaml and Swift, so it's not a Nim-specific problem; it's just that Nim doesn't bring much compared to OCaml.
For what it's worth, OCaml is a great under-appreciated language forever dented by ugly (in my opinion) syntax and oddities like different operators for floats and ints. There's a good reason the folks behind the Reason ML fork changed things up.
Swift is another great language - forever dented by being tied hand and foot to the Apple walled garden and the Objective-C legacy and runtime, with memory-related keywords such as 'weak' and 'unowned'.
Nim (and perhaps D) are the closest in my mind to the "garbage collected Rust" in that they are not only garbage collected, but they are procedural-oriented, can write highly generic code and macros, and come with near-C performance. Given the "Pythonic Rust" title, another edge Nim has is its semantic whitespace, very Python-esque syntax.
I've looked a bit at Gluon before and it was a bit too much like Haskell for what I imagined. I don't think using monads for IO is a good idea if you want to keep the Rust approach of pragmatism first.
I don't know anything about dyon but a quick look tells me it's more meant for game development.
> I'd like to have a different recommendation for a "garbage collected Rust" but there really isn't anything that I think is good enough for the title.
Not the person you replied to, but OCaml, Scala, and F# all suffer from the potential drawback of being functional-first, and thus difficult for people to adapt to. If you are someone who is an OO programmer and are considering Rust for its potential performance and different model, it would be hard for me to recommend a language whose model is as different as OCaml's. Learning Rust entails learning the borrow checker and unlearning some of your OO habits; learning OCaml requires a sea change in your whole mentality & approach to programming if you've never experienced a functional lang before.
Meanwhile, both D and Nim have the benefits of being procedurally grounded, and hence more familiar to someone coming from (say) C# or Java.
Having said that, I think both of those languages may suffer from a lack of tooling/library breadth & depth. Nim in particular seems really cool, but it seems to be struggling to catch enough attention & interest to build a larger community -- probably in large part because it targets many of the same people who are currently high on Rust.
I don't really agree here: even if those styles aren't used much by the community, OCaml can be written as a regular procedural or OO language. There are a few things that can look weird to people new to this (immutable by default, the type system) but they're precisely what people looking for a GC'd Rust want, I think. Rust also takes a lot from functional languages, and idiomatic Rust code usually looks like functional code. I think the biggest shock coming from Rust to OCaml would be the compiler errors and the quantity of online resources. There's a world of difference between the two, and they matter a lot when learning. I think the same is true for Scala. I don't know if F# is the same, but since it takes from OCaml it may keep the procedural capabilities, and it seems it still has the OO features of OCaml.
I do agree that D and Nim may be easier to approach from C# or Java - or even from Python for Nim, and from C and C++ for D. These already represent a huge chunk of programmers, but not really the typical person looking for a "garbage collected Rust", in my opinion. Especially since Java and C# programmers already have Scala and F#. While both are different from what they're used to, they leverage the same platform and can use the same libraries, which is a huge plus.
I agree with your conclusion too, and I think OCaml suffers from this too. Scala and F# have at least the libraries part covered.
> I'd like to have a different recommendation for a "garbage collected Rust"
I have exactly the same problem. Rust is nearly perfect to me, thanks to its non-nullable types, traits, and package manager. It's just that losing garbage collection is not justifiable in my use cases. I've researched several languages but nothing "clicks" the way Rust did for me several years ago. C# might be my best bet, but it lacks statically compiled binaries. Go is my next bet, but it lacks a sophisticated type system. Kotlin is my third bet, but the Java toolchains are slow and too "enterprise-y". Swift is too Apple-centric and has weird corner cases - like not being able to change the order of keyword parameters, due to its Objective-C origin. Life is full of trade-offs, I guess!
This is also a problem with Rust conversations and tutorials online: There's a lot of talk about what not to do or what you can't do in Rust, but it's rarely followed up with an explanation of what users should do.
Mutexes aren't an entirely foreign concept to many programmers. Obviously if someone can architect their system in a way that doesn't require Arc<Mutex<_>> then go for it, but we need to be careful about giving blanket advice without alternatives.
Indeed, this blanket advice does not apply in situations where shared resources that require mutable access need to be synchronized. If my async HTTP router is serving two requests that are using the same database client to persist changes, that needs to be synchronized. How does one do this without a mutex?
The lifetime/borrow-checking is the mutex. If you've got a database client that must only be used by one caller at a time, make a function that takes the client as &mut.
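To make that concrete, a minimal sketch (DbClient and persist are made-up names): because the function takes the client by &mut, the compiler statically guarantees no other reference to it is live during the call.

    struct DbClient;

    // `&mut` means the caller must hold the only live reference to the
    // client for the duration of the call - the borrow checker enforces it.
    fn persist(client: &mut DbClient, change: &str) {
        // ... write `change` to the database through `client` ...
        let _ = (client, change); // placeholder body for the sketch
    }

    fn main() {
        let mut client = DbClient;
        persist(&mut client, "INSERT ...");
        // Taking a second `&mut client` while one is live would not compile.
    }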
By acquiring and releasing resources. I’m not intimately familiar with Rust, so I’ll use Go as an example.
Create a one-buffered channel: `make(chan T, 1)`. To acquire, receive from the channel. If the object is in use, the receive will block (goroutine put to sleep) until it’s available. To release, send the acquired object to the channel.
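For comparison, a rough Rust rendition of the same pattern. This sketch assumes the crossbeam_channel crate (std's mpsc receiver is single-consumer, so it doesn't fit as directly); DbClient is a stand-in type.

    use crossbeam_channel::bounded;

    struct DbClient;

    fn main() {
        // A one-slot channel acts as the "pool" holding the single client.
        let (give, take) = bounded(1);
        give.send(DbClient).unwrap();

        // Acquire: blocks while the client is in use elsewhere.
        let client = take.recv().unwrap();
        // ... use `client` ...

        // Release: put it back for the next user.
        give.send(client).unwrap();
        // Both ends are Clone, so each worker thread can hold a pair.
    }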
>Folks really ought to read the C/C++ literature to understand why Rust evolved in a unique direction. That gives better compare/contrast.
Sounds like a good idea for the post-singularity age of infinite lifespans, but what folks really need to do is learn what they need to learn as best they can when they need it.
Maybe if one just wants to scrape by. I wouldn't want systems code from those folks, to be honest... Maybe that's fine up the layers, but the whole point of Rust is correctness and stability - to get to that point, maybe invest a month hitting the books?
> Folks really ought to read the C/C++ literature to understand why Rust evolved in a unique direction. That gives better compare/contrast.
Unfortunately the C/C++ literature is dense and not at all approachable. The Rust literature is much, much better for newcomers to systems programming (partly because it doesn't have to cover a load of weird failure cases that simply don't exist in Rust).
> one _cannot_ just arbitrarily “go” from a higher level language to a systems language.
I mean, it's like learning anything new. You have to do a bit of unlearning, and grasp the core concepts. I don't think there is anything especially difficult about systems languages. I had a background of JavaScript and PHP, and I was able to pick up Rust well enough to use it in my day job in a couple of weeks.
C/C++ has been around for a while and there are gobs of literature. For this I'm guessing you mean "the website literature."
Many developers these days scarcely read books on subject matter. But if they did read books, they would find there’s actually more C/C++ literature and established best practices than there is for Rust (naturally).
The Effective C++ series comes to mind as something not just approachable, but enjoyable and largely insightful.
The truth is... systems engineering is a discipline which requires an understanding of historical context. You get that with the C/C++ books! I highly recommend reading up on it to better appreciate Rust.
Whenever I see the word "pythonic" I am cautious. Not all languages are Python, nor should they all become Python, nor are they worse for _not being Python_.
Python has a reputation as "executable pseudocode", a language that makes it easy to express the essence of what you're actually doing without clutter or ceremony. That's a worthy design goal for every language IMO.
I suspect that every language in the world can be clear and direct. I am sure that some people can write unintelligible pseudocode.
We don't expect speakers of other languages to write more Englishonic phrases, although of course that does happen. (English speakers often criticise people for not speaking good English.)
Every language is readable by and intelligible to people who know the language.
Pseudocode is a useful concept. There is no "correct" pseudocode because it is meant as a DSL for _thinking_. Any non-trivial implementation of thinking will quickly become more detailed and more subtle in notation than a shopping list (for example). Opinions about the subtlety of notation abound.
I am not aware of any high-level language that makes a shopping list hard to understand. Above that, we are deeply into opinion about what defines "clutter", "ceremony", and effective notation.
We pretend all human languages are equal for political reasons, not because it's reality. I would hope that programming language preference is still neutral enough that we can talk about what things different languages do better or worse. There are plenty of things wrong with the Python ecosystem, but the language syntax is widely known as a success.
> Hmm, what is the "non-political" reality in the question of equality of languages?
Human languages aren't equal; plenty of languages have problems either in specific areas or just generally, and any academic linguist can tell you (off the record) which.
> There are other successful languages, too.
Python is probably the most successful teaching language; I can't think of another language with a better reputation syntax-wise. (And if another language has such a reputation, it's probably because that language is also "Pythonic").
Just want to point out that Rust and Python live on opposite sides of the spectrum. A systems language requires a good understanding of systems (whether Rust or C/C++) and also some assembly language. So Python's “ease of teaching” actually could yield bad systems engineers, because it shields users from the system...
Honestly, I would personally teach plain C for algorithms because it has the simple for loops and while loops, tail-call recursion, arrays and structures.
> Python's “ease of teaching” actually could yield bad systems engineers, because it shields users from the system...
Not necessarily. Most of what it "shields" users from is unnecessary ceremony and clutter rather than relevant system behaviour.
> Honestly, I would personally teach plain C for algorithms because it has the simple for loops and while loops, tail-call recursion, arrays and structures.
C's loops are not simple, C implementations do not generally have tail calls, C makes the distinction between arrays and pointers far too subtle (which not only confuses learners but also causes bugs in real code), C's structure support is bad (primitive unchecked unions, but no support for proper disjoint sum types). It may have been useful as a portable assembler at one point (back when it was compiled in simple fashion), but it's not a good language for anything these days.
> any academic linguist can tell you (off the record) which.
Hmm, once again you allude to unspoken truth. What is your understanding of this unspoken truth about which languages are "better"? I assume that if you hold an opinion, you can present it clearly without vague references to unnamed sources.
> Python is probably the most successful teaching language; I can't think of another language with a better reputation syntax-wise. (And if another language has such a reputation, it's probably because that language is also "Pythonic").
In pseudocode, what you have written is: I assert P because I think P is best and prefer P; also, any other language that meets my requirements is P.
> I assume that if you hold an opinion, you can present it clearly without vague references to unnamed sources.
I don't want to get banned.
> In pseudocode, what you have written is: I assert P because I think P is best and prefer P; also, any other language that meets my requirements is P.
Your pseudocode has a type error.
I don't much care for Python myself. But I believe languages with reputations for good syntax are Python-like. If this isn't so, it should be easy to provide a counterexample: a language that has a reputation for a good syntax that isn't Pythonic.
> I believe languages with reputations for good syntax are Python-like.
"Only syntax like P has a good reputation". That is your opinion and I uphold your right to hold it. I would however like to understand it better.
Given this dependency on the "reputation" of "pythonic syntax", whom do you accept as "recommenders" for reputation? What constitutes a "good reputation" to your mind?
Does the importance of a project, or perhaps the durability of code, have any value in this reputation?
Is it possible that more detailed notation has a purpose, or is it always "clutter" because it is not "pythonic" or "Englishonic"? Math notation itself is eloquent in the extreme, yet it is not "pythonic" and certainly not "Englishonic".
Is it enough that millions of other people use different tools? Literally billions of other people speak and write a language that does not have the "Englishonic" properties of English, for example. Some of these languages have a notation that is superior to English. [see GB Shaw on English notation]
You say quite clearly in another comment that "C" is not a good language, despite its importance as a language and its influence on several other important, long-serving programming languages. I will just cite the TIOBE index as something tangible; if imperfect, it is at least not a mysterious allusion.
Take one of these languages. Javascript, for example, is a very successful language. Apart from the "ecosystems" of languages, I assume you accept that the notation of Javascript is -- by any measure other than aspersions -- a successful notation system.
> Given this dependency on the "reputation" of "pythonic syntax", whom do you accept as "recommenders" for reputation? What constitutes a "good reputation" to your mind?
Do you actually disagree with me about the reputation of different language syntaxes? I don't have a specific list of influences, just a general impression from e.g. HN-like discussion sites, programming meetups, work colleagues...
> Do the importance of project, or perhaps durability of code have any value in this reputation?
No - we're talking solely about syntax.
> Is it enough that millions of other people use different tools? Literally billions of other people speak and write a language that does not have the "Englishonic" properties of English, for example. Some of these languages have a notation that is superior to English. [see GB Shaw on English notation]
I see your last two sentences as the demonstration that the answer to your question is "no". A language may become very popular despite having a very poor syntax, and even clear improvements to syntax are often not adopted.
> You say quite clearly in another comment that "C" is not a good language, despite its importance as a language and its influence on several other important, long-serving programming languages. I will just cite the TIOBE index as something tangible; if imperfect, it is at least not a mysterious allusion.
Sure, and I'm aware that this is a controversial view. I think few would defend C's syntax; rather they tend to claim that it has good performance or is close-to-the-machine (views that I disagree with, but would acknowledge the popularity of, and be prepared to argue my case against).
> Take one of these languages. Javascript, for example, is a very successful language. Apart from the "ecosystems" of languages, I assume you accept that the notation of Javascript is -- by any measure other than aspersions -- a successful notation system.
I don't accept that a language being popular means it has good syntax, if that's what you're saying.
Your notion of "pythonic" syntax being superior remains vague. You haven't clearly said what non-biased measure supports your claim.
"Pseudo-code" itself can be anything, as it is a term coined for a loose notation of ideas.
If your objection were "line noise" (i.e. non-Pythonic or non-Englishonic characters), the bias in this is striking. There are other languages that are as effective or even better. The fact that we may not understand these other languages is not an elevated argument for converting them all to a language that we do understand, except perhaps as a study exercise.
As far as English itself is concerned, it is a disastrous confusion of phonemic artefacts. People learn to speak and write English despite the major problems with its writing conventions.
You say "reputation" is your measure, but this is nothing more than "group opinion". You responded that professional popularity is not your measure, so C, Java, and Javascript do not receive your approval. Yet they are successful languages. They are not perfect, but neither is Python.
As far as human languages are concerned, your position is untenable. A fluent speaker speaks $language and understands $language in its subtleties. A person who does not speak $language understands little (or perhaps nothing) and is either trying to learn or has installed opinions instead of knowledge.
> You say "reputation" is your measure, but this is nothing more than "group opinion".
One could say the same about e.g. the scientific consensus on a given topic. Ultimately anything nontrivial in today's world relies on other people.
> You responded that professional popularity is not your measure, so C, Java, and Javascript do not receive your approval. Yet they are successful languages.
They are popular but not for their syntax. Their fans and advocates largely admit as much. Plenty of professionals will say things like "I use language X despite its cumbersome syntax, because ...".
> As far as human languages are concerned, your position is untenable. A fluent speaker speaks $language and understands $language in its subtleties. A person who does not speak $language understands little (or perhaps nothing) and is either trying to learn or has installed opinions instead of knowledge.
And yet it's possible to be fluent in multiple languages, and also to study languages in an objective way without being fluent in them. Linguistics is a legitimate field of study with a wide body of existing research (computer language design, on the other hand, has not yet reached that level of maturity).
Much of Python's influence is actually from Modula-3, not C++. The only reason C++ is similar is because C++ also borrowed liberally from Modula-3.
In particular, Python's import, exception handling, and object systems are based on Modula-3's (though IIRC it did also borrow some C++ innovations for the object system).
(Modula-3 is the Velvet Underground of programming languages: your average programmer/listener has never heard of it, but it influenced so many languages/musicians you have heard of)
>Anyway, to reiterate: folks interested in systems languages should read the C/C++ literature and actually become a systems engineer first.
This is an extremely weird take. So before I can write some Rust I should first get a new career and master C/C++? Talk about barriers to entry and gatekeeping. I may have toyed around a bit with Rust - because it's fun - and as I am not a systems engineer (just another type of engineer), I am deeply sorry.
Couldn't have summed it up better if I tried. Rust is an awesome language, but at the end of the day it's a systems language, and as such it's not particularly suitable for writing just any type of application. It seems to share one of my many problems with JavaScript: everything is being written and re-written in it (even if for different reasons in each case). As I've said a million times, "just because you can, doesn't mean you should". The endless Arc<Mutex<_>> is a symptom of this. In larger applications it turns into an endless hell of read/write locks, and sooner rather than later it blows up right in your face. This is further amplified by the somewhat recently introduced async/await. And let's not overlook the many incompatible tokio versions, and the fact that tons and tons of crates use different versions of tokio: just as everything looks buttery smooth, BOOM: thread 'main' panicked at 'not currently running on the Tokio runtime.' Damn it... Awesome... Which one is it this time...
On the subject of Python, I think Rust has a very powerful niche it could take by slightly modifying an old Google philosophy (which they can't stick to anymore, for a million and one reasons of course): "Python where we can, Rust where we must". To my mind it could be a very pleasant recipe for a large number of startups.
I come from C# and Rust absolutely drives me bonkers. I try to reference all the things all the time, because it's ingrained in my brain as more efficient than copying stuff around and/or requesting things from hashmaps. The first thing I tried to implement was a string-interning module, and I was stuck on it for three days (I still don't know how to do it without storing each string twice). Rust is completely orthogonal to the concept of a nice garbage-collected graph of objects.
On this very specific point, for this specific problem, I think that `Rc<str>` is criminally underused. It's a great type if you need a heap-allocated string but want to hold on to copies of it in various places; and it derefs to `&str`, so you can use it transparently with most of `std`.
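A quick sketch of what that buys you:

    use std::rc::Rc;

    fn main() {
        let s: Rc<str> = Rc::from("some shared text");
        let t = Rc::clone(&s); // cheap: copies a pointer, not the string data

        // Derefs to &str, so it works wherever a string slice is expected.
        fn print_len(s: &str) {
            println!("{} bytes", s.len());
        }
        print_len(&s);
        print_len(&t);
    }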
Maybe check out one of the existing string-interning modules? There are some good ones. I believe it's common to internally use indexes instead of references to get around the borrow checker in these kinds of situations.
That's my impression of Rust for now: you use indexes instead and borrow as needed. Long-lived borrows are a no-go; I was shocked to discover you cannot return a new thing and a reference to it from the same function.
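For reference, a minimal sketch of the index idiom (all names invented): nodes refer to each other by position in a backing Vec, so the container owns everything and no long-lived borrows are needed.

    struct Graph {
        nodes: Vec<Node>,
    }

    struct Node {
        value: u32,
        edges: Vec<usize>, // indexes into `Graph::nodes`, not references
    }

    impl Graph {
        // Returns an index rather than a reference, sidestepping the
        // "return a new thing and a reference to it" problem.
        fn add_node(&mut self, value: u32) -> usize {
            self.nodes.push(Node { value, edges: Vec::new() });
            self.nodes.len() - 1
        }

        fn connect(&mut self, from: usize, to: usize) {
            self.nodes[from].edges.push(to);
        }
    }

    fn main() {
        let mut g = Graph { nodes: Vec::new() };
        let a = g.add_node(1);
        let b = g.add_node(2);
        g.connect(a, b);
    }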
I have a similar experience. I initially naively thought that implementing data structures would be a good exercise in grasping syntax and ownership. It's usually one of the earlier exercises I do in learning a new language.
However, data structures in Rust are not at all straightforward - especially implementing traits, such as an Iterator for a HashMap. And the std implementations use loads of unsafe, so beginner me became increasingly confused as to whether these things were even expressible in safe Rust.
> I try to reference all the things all the time, because it's ingrained in my brain as more efficient than copying stuff around and/or requesting things from hashmaps.
Many people don't realize that referencing all the things all the time is actually LESS efficient than copying stuff around on modern architectures, unless you copy really big things. This is because most of the improvement in memory access in recent years has been in sequential speed, not latency.
I've never tried that, so I don't know the challenge, but my first thought was that it doesn't seem incompatible with Rust's ownership model - assume an interned string has a static lifetime and then hand out immutable references?
I wanted to drop strings which are not used anymore - you do need Rc for that (no way around it, I think) - but my problem was that I also needed the string as a key in a HashMap<String, Weak<String>>. I use that HashMap to check whether a string is already interned and hand out a new counted reference as needed, and I also periodically sweep any weak refs that should be dropped. This works, but the key had better be immutable (I think?), so you can't do HashMap<Weak<String>, Weak<String>> (HashSet?), and hence you need a second copy as the key.
That's an interesting case. This won't solve your whole problem, but you should be able to make the key the hash of the string and then have a HashMap<u64, Weak<String>>. I feel like there's a pretty solution where you stick the whole map in a Rc and stick that in every interned string. Then strings can remove/lookup themselves. To make that work you'll want some sort of concurrent hash map.
But how would you handle hash collisions if the value is just a Weak<String> rather than some kind of nested container type?
I started thinking along the lines of an arena allocator, which would allow a shorter lifetime than static, so you could remove unused references by dropping the arena and moving the still-live references over to a new arena. I guess arena is the wrong term for that behaviour; maybe generational mark-and-sweep garbage collector is closer to describing this approach.
It avoids the double reference of the Rc<> model, though.
What about using a HashSet<Rc<str>> and returning a type that implements Drop? The drop method can remove the string from the cache when the last reference is dropped. Since the cache must be accessed by both the interning function and drop, I used interior mutability, putting the cache behind an Rc<RefCell<>>.
With the same principle, we can use a static with an Arc and a Mutex to make the cache a global variable, and implement thread-safe string interning. Here, locking the cache at the wrong time during drop can lead to a race condition.
To make InternedStr behave a bit more like &str or an immutable String, I implemented Clone, Display, Pointer, and conversion from &InternedStr to &str. Ideally it should probably mirror how String dereferences to &str for everything (Deref<Target = str>).
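Roughly, the single-threaded version of that design looks like this - a simplified sketch, not the exact code:

    use std::cell::RefCell;
    use std::collections::HashSet;
    use std::rc::Rc;

    type Cache = Rc<RefCell<HashSet<Rc<str>>>>;

    pub struct InternedStr {
        s: Rc<str>,
        cache: Cache,
    }

    pub fn intern(cache: &Cache, s: &str) -> InternedStr {
        let mut set = cache.borrow_mut();
        // `Rc<str>: Borrow<str>`, so the set can be probed with a plain &str.
        let interned = match set.get(s) {
            Some(existing) => Rc::clone(existing),
            None => {
                let rc: Rc<str> = Rc::from(s);
                set.insert(Rc::clone(&rc));
                rc
            }
        };
        InternedStr { s: interned, cache: Rc::clone(cache) }
    }

    impl Drop for InternedStr {
        fn drop(&mut self) {
            // Two strong counts left means: this handle plus the cache's
            // copy, i.e. we are the last user - evict the entry.
            if Rc::strong_count(&self.s) == 2 {
                self.cache.borrow_mut().remove(&self.s);
            }
        }
    }

Clone can simply be derived: cloning the inner Rc bumps the count that Drop later checks.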
> I share the author's conclusions: don't do that. If you find yourself slapping Mutexes and Arc/Rc all over the place it probably means that there's something messed up with the way you modeled data ownership within your program.
That's just not a very useful statement to a beginner. Rc/RefCell and Arc/Mutex have their use, for data with patterns of lifetime or mutability that cannot be derived at compile time based on program syntax - something that comes up all the time in practical programs. Of course one should seek to refactor these things out whenever possible (among other things, programs that use shared data heavily are also harder to survey wrt. correctness), but to say as a generalized statement that there's no case for this feature is just not correct. The feature is also intentionally heavy on syntax so there's no missing where you're using it.
Yup, definitely a frustration I have reading through Rust tutorials as well. So many of them get themselves out of memory problems with "Oh, just add Arc around it".
I’m new to Rust, and definitely guilty of all this :) do you know of any good resources which discuss transforming your data model to align with the borrow checker? Or is this just something that comes naturally with time?
I don’t agree with a lot of this post but I think it does a good job pointing out that porting a library from ecosystem A to ecosystem B is not always a good idea. A good API design in one language can be a horrible one in another.
Python doesn’t expect you to share references to things - it’s just that the API that was being copied follows an imperative style that mutates objects.
But the predominant concern in a lot of porting efforts is interoperability with developers who are already familiar with the current API. Porting the API to feel very Rusty would be a mistake when your audience is people who already know the Python API. Python itself has adopted some decidedly non-pythonic APIs for the same reason.
In the Python world, if this bothers you, you're supposed to use the adapter pattern to Pythonify the mechanically ported API.
I would first start by asking: why am I porting an identical API? Why am I not going back to first principles and rethinking from ground zero? Usually there is a compelling reason to switch from A to B - likely performance in this case (going from Python to Rust). Any programmer worth their salt knows that nothing in life is free and everything has a tradeoff. In this case the tradeoff should be that the API might change, but the performance benefits will be worthwhile.
For instance, let's look at sequential operations against a DB versus batches. In a sequential style, you might iterate your items and just write them one by one. Alternatively, in a batched style you might need to do things like prepare queries or store your queries alongside the actual data values, then hand it all to a batch mechanism that performs the write. The ergonomics are completely different, but at the end of the day the result (rows in the table) will be the same. So even in the same language you will see totally different ways to go about solving the same problem. This is why I don't really see this as a Python/Rust argument so much as a generic program-architecture argument.
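To make the contrast concrete, a schematic sketch - Db, execute, and execute_batch are stand-ins here, not any real driver's API:

    struct Db;

    impl Db {
        fn execute(&self, _query: &str, _value: &str) {
            // imagine: one network round trip per call
        }
        fn execute_batch(&self, _query: &str, _values: &[&str]) {
            // imagine: one round trip for the whole batch
        }
    }

    // Sequential style: iterate and write one by one.
    fn write_sequential(db: &Db, items: &[&str]) {
        for item in items {
            db.execute("INSERT INTO t (v) VALUES (?)", item);
        }
    }

    // Batched style: assemble everything up front, hand it off in one go.
    // Different ergonomics, same rows in the table.
    fn write_batched(db: &Db, items: &[&str]) {
        db.execute_batch("INSERT INTO t (v) VALUES (?)", items);
    }

    fn main() {
        let db = Db;
        let items = ["a", "b", "c"];
        write_sequential(&db, &items);
        write_batched(&db, &items);
    }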
In the above DB case, the programmer isn't disappointed that they need to rewrite their iterative loop approach, because they know that the batching approach is going to be much faster and will achieve their goals.
I think the same parallel can be drawn here: it is not a bad thing to have a slightly different API. Especially in the context of going from a dynamic language to one that is compiled and more performant.
All of this is kind of a moot point though because the Rust lib in question doesn't have any instructions/documentation or design docs so it is hard to say for certain what the intention is here other than a port for the sake of porting.
Mentioned this briefly below, but this was largely an academic exercise: there is a lot of existing code written (in python) against the python API in question, and I was curious to see how feasible it would be to reimplement that API in such a way that this existing code would continue to work with minimal modification, but now running on top of a rust implementation.
What's additionally challenging in this case is that the design of the underlying rust API (the norad crate[1]) was also more or less done, so this really was just a matter of trying to shim.
In any case, I think we more or less agree; just trying to provide a bit more background on the motivation. This was originally just circulated as a gist between a few interested parties, who were largely familiar with the motivations; it certainly didn't occur to me that it might be interesting to a general audience.
Arguably you'd want to write in the idiomatic style for the ecosystem you're working in. It doesn't help when the standard library plus any other libraries you're using offer Rusty APIs and this one library is Pythonic.
> interoperability with developers who are already familiar
I wouldn't say it's "predominant". Some instances are code authors trying to bring their library to new audiences (c/c++ frameworks with multiple language bindings), or developers unfamiliar with the language but want that specific API in their own (Python's requests library is a great example, cloned in many languages now).
font = Font.open("MyFont.ufo")
glyphA = font.layers.defaultLayer["A"]
point = glyphA.contours[0].points[0]
point.x = 404
assert point.x == glyphA.contours[0].points[0].x
The author apparently wants to be able to perform ad hoc modifications to an existing font.
Unfortunately, it's not clear from the article what the use case is. Most of the time I've worked with fonts, they're treated as immutable values.
If the idea is that a font needs to be built from a serialization format, then an alternative approach would be to eliminate the python mutable interface and replace it with a build function that calls the underlying Rust to build a Font. That way Python never needs to deal with mutating a font.
Rust lets you take unlimited numbers of references to an immutable value. That's not a problem, other than defining lifetimes if you pass those references around and/or hold them in structs.
The problem arises the instant you want to mutate the value behind a reference. Hold even one active immutable reference at the same time, and mutation won't be possible. Note that mutable and immutable references can in some situations coexist in the same block, thanks to non-lexical lifetimes.
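A tiny demonstration (uncommenting the marked line makes the program fail to compile):

    fn main() {
        let mut v = vec![1, 2, 3];

        let r = &v;            // shared (immutable) borrow begins...
        println!("{}", r[0]);  // ...and ends at its last use (non-lexical lifetimes)
        v.push(4);             // so this mutation is fine

        let r2 = &v;
        // v.push(5);          // error: cannot borrow `v` as mutable while
        println!("{}", r2[0]); // `r2` is still used here
    }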
There are a lot of existing python scripts that run on top of the existing python `ufoLib2` library, and the initial motivation for this work was seeing what would be involved in creating a wrapper API that would allow all of these existing scripts to work transparently on top of an existing rust crate.
Exactly. If I do glyphA.contours[0].points[0].x = 1
what happens to someone else holding a reference to the same glyph? To a Python programmer it might not be a surprise to see such an API (a shared mutable value), but to most OO programmers I hope this API looks very strange.
The conclusion in the article that using a getter/setter would fix most of the issues seems much more straight forward. Maybe I don't write enough python, but I rarely think modifying properties on a library object is preferable over clearly named methods.
How the object is mutated isn't really relevant to the smell. It's that an object can be mutated while someone else has a reference to it (and that the API is designed that way).
Honestly, the Arc<Mutex<T>> approach is the right one here. We need to hand out shared references, and that’s what that is. Otherwise there’s a ton of bookkeeping, and throwing exceptions in code that could otherwise reasonably work.
This approach assumes of course that there are no reference cycles in the graph. If there are, you’ll need some way to clean them up if Weak isn’t sufficient.
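For the tree/DAG case with back-references, the usual shape is to make child-to-parent links Weak so they can't form a keep-alive cycle. A sketch with hypothetical Font/Glyph types, not the actual API from the article:

    use std::sync::{Arc, Mutex, Weak};

    struct Font {
        glyphs: Vec<Arc<Mutex<Glyph>>>, // strong: the font owns its glyphs
    }

    struct Glyph {
        name: String,
        font: Weak<Mutex<Font>>, // weak: back-reference, breaks the cycle
    }

    // `upgrade` yields None once the font has been dropped.
    fn font_of(glyph: &Glyph) -> Option<Arc<Mutex<Font>>> {
        glyph.font.upgrade()
    }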
Alternatively, is there a way to ask Python to allocate memory for you, which is then subject to Python’s GC? If so, you could make a smart pointer type for this and use that everywhere. Though you’ll still need to figure out how thread safety works. So this really would only be for handling reference cycles.
> python expects you to share references to things. When you create a binding to a point in a contour, that binding refers to the same data as the original point, and modifying the binding modifies the collection.
Honestly, as someone who has written way more Python than Rust, this seems like bad API design regardless of language. This screams "it's impossible to take a font, make two separate modifications to it, and then work with those separate modifications at the same time", because deepcopying objects is usually very difficult.
Passing pointers around is a pretty common pattern. It avoids allocating memory and copying data unnecessarily. Seemed like a decent choice in the late 80's and early 90's, probably all the way up through mid-2000's. It's still fine as the default for many situations.
> It avoids allocating memory and copying data unnecessarily
My experience is the exact opposite. Pervasive shared mutability leads to developers making lots of unnecessary copies out of fear that some other part of the code will change values out from under them.
Maybe it's just the kind of work I have done (web services, where objects don't live longer than a request), but I have never had a problem of "some other part of the code changing values out from under me" in Python or Go.
I go with the "we're all adults here" philosophy and trust my colleagues not to inappropriately mutate the things they receive. It's worked well enough so far. I understand why it might not work so well for other people.
Naive question: in Python, isn't this solved by .copy and .deepcopy if you really have to get a copy of an object rather than create a new instance of something? I'm curious where the bad API design is, unless you're saying that Python's assignment behavior itself is bad and everything should be immutable by default.
Even in Rust, the `Clone` trait produces a deep copy.
Copying only up to a specific depth in a general way is what is difficult.
I would argue that shallow copies, id est the depth being 1, are far more difficult to realize than copies of unbounded depth that recurse until a value of trait `Copy` is reached - id est a type that is purely encoded by the data it takes up on the stack, owns no further data, and can thus be fully `Clone`d by copying its stack bits.
> Even in Rust, the `Clone` trait produces a deep copy.
Clone is not really a deep copy. I like the description that says it is "deep enough to give you an object of identical/independent ownership (of what you started with), but no deeper".
Example: when an Rc<String> is cloned, you only get another handle to the Rc; the string data is not duplicated (for this reason we don't call it a real deep copy). You get a new Rc handle that is on equal footing with the old one.
There are plenty of Rust types that consist of tree-shaped ownership relations with no frills - in these cases clone and deep copy are identical. Take, for example, a HashMap<i32, String>.
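A small demonstration of the difference:

    use std::collections::HashMap;
    use std::rc::Rc;

    fn main() {
        let a: Rc<String> = Rc::new("hello".to_string());
        let b = a.clone();           // clones the handle only
        assert!(Rc::ptr_eq(&a, &b)); // both point at the same String

        let m: HashMap<i32, String> = HashMap::from([(1, "one".to_string())]);
        let n = m.clone();           // tree-shaped ownership: this clone
        assert_eq!(m, n);            // really is an independent deep copy
    }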
I think every "deep" copy ultimately has some limitation like this. Even in Python, an object can implement __deepcopy__() to prevent copy.deepcopy() from "copying too much", and things like file objects, classes, and modules get special treatment.
I found that in Rust you don't have to think about the distinction so much - not like you do in, for example, Python and C# (pervasive/transparent object-reference languages). In Rust it's predictable and mostly obvious which data is under shared ownership and which is not. We don't need to talk about deep vs shallow copy semantics much in Rust. :)
It's not intrinsically difficult, but in Python it's arguably un-Pythonic and somewhat awkward. Like, sharing references has built-in syntax that's taught to most Python programmers within their first hour with the language, whereas copying - without writing a tedious manual implementation - requires importing the `copy` module [0], which outlines a number of pitfalls, quirks, and caveats with the process, and that I've literally never used so far as I can recall in a decade of professional Python development.
You have to pass around the whole object you want to copy for that to work. For example, in a compiler, if your function only sees an AstNode, it can't easily make a deep copy of the whole AST. You also have to be careful with things like objects that contain refcounted pointers.
Speaking of... is there an article about writing Pythonic Python? I think that would be extremely useful to various team members. Sometimes the official documentation is too verbose.
Raymond Hettinger's talks are great. I know it's not a succinct blog post, but they are out there, and watching a video is very approachable.
Maybe https://www.youtube.com/watch?v=T-TwcmT6Rcw (Dataclasses! We could cheekily say Python gets better at something Rust does - dataclasses makes Python better at records.)
Another (2013) classic: https://www.youtube.com/watch?v=HTLu2DFOdTg - it is very well known, and it has the very memorable advice: what's a class that only has one method? That should be a function!
> Dataclasses! We could cheekily say Python gets better at something Rust does - dataclasses makes Python better at records
It's not clear if you mean python got better at storing records than previous python or if it got better at storing records than rust.
To be clear, dataclasses make python better at storing records than what was previously available in python but still not as good as Rust (serde, immutable defaults, no perf penalty with immutability).
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
I've often wished Python indexed more on "simple is better than complex" and "There should be one-- and preferably only one --obvious way to do it". Python has become quite complex over the years, including its object system, its tooling, ecosystem, and even many of the standard library APIs (thinking of you, `subprocess`!).
It’s a journey, to be honest. Very hard to prescribe one solution because there are different ways people think about this question (syntax or architecture?) and different cultures around tools (like basing a web app on a heavier tool such as Django vs a lighter tool like Flask) and then of course battles over line lengths, types vs no types, etc…
(I also divide Python into two big camps: building software/apps vs. data science and analysis, which further subdivides the community - if you ever read a post on how to do XYZ from the perspective of a data hacker it will usually fall into the non-pythonic category)
There is a fantastic book in the Ruby world called “Eloquent Ruby” but I have yet to encounter an analog for Python.
RealPython has some cool posts, though. I think they’re doing the best job pushing more modern/clean practices forward at the moment.
I checked this thread and the first thing I searched for was nim-lang[1]. Looks like nobody mentioned it yet.
Nim/Nim-lang makes it much easier to write Pythonic code. Minus all the complex things that only smart(er) programmers get (like GC, etc.), with Nim you can write code that is much more readable.
Another win for it is that it has very limited support for OOP, which IMHO sucks in terms of readability and understanding what the code is doing (as compared to imperative code).
What is the idiomatic way to mimic inheritance in Rust, when writing bindings for OO languages? I'm new to Rust but I imagine there are ways to mimic upcasting/downcasting its types?
With the 'as' keyword you can cast between types [0]. Instead of inheritance, you can implement Traits for objects [1], which works more like composition.
How expensive is this kind of casting in Rust? Like, if I have a struct which has a superset of attributes that I need for the type I'm casting to, do I need to construct a new struct instance just to have something to treat as my target type, or is there a simpler way that doesn't involve new allocation?
Given Superset struct A, some other struct B that happens to have a subset of A’s fields, and some function foo which takes an argument of type B, you could
1) implement From<A> for B, and use that to convert instances of A when you want to call foo. This would probably involve some copying and/or allocation.
2) back up and turn foo into something that takes a trait argument instead. Now, instead of taking instances of A, what foo needs the struct to provide is defined by a trait, which you can implement for both A and B. Now you can pass either to foo.
3) potentially, depending on the layout of structs, create a union and do some type punning to convert between them. this requires unsafe and you’d better be right about the struct layouts.
4) just make foo take a tagged union of the two structs. this isn’t unsafe but has different storage tradeoffs. also foo would need to handle each case separately.
5) find some other way to skin this cat. there’s plenty
#2 would be idiomatic in most circumstances I think?
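A minimal sketch of option 2, with invented names:

    trait Named {
        fn name(&self) -> &str;
    }

    struct A { name: String, extra: u32 } // the "superset" struct
    struct B { name: String }

    impl Named for A {
        fn name(&self) -> &str { &self.name }
    }
    impl Named for B {
        fn name(&self) -> &str { &self.name }
    }

    // foo no longer cares which struct it receives - no conversion, no copy.
    fn foo(x: &impl Named) {
        println!("name: {}", x.name());
    }

    fn main() {
        foo(&A { name: "a".into(), extra: 1 });
        foo(&B { name: "b".into() });
    }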
It really depends on how you're implementing it - whether you're doing a clone/deepcopy-style conversion, or consuming the value you're casting from. However, since most values in Rust are placed on the stack, "allocating" them is incredibly cheap. Moving them around is often also cheap, involving just a few `mov`s from register to register, or stack address to stack address.
I actually end up generally only using polymorphism in Rust and have never needed inheritance. For polymorphism, as other comments have mentioned, you should look at trait objects (`&dyn T`) or generic parameters for monomorphization.
If absolutely necessary, objects can be downcast using Any (though I've personally not needed to write Rust code that needs this): https://doc.rust-lang.org/std/any/trait.Any.html. Edit: instead, I tend to rely on Rust enums (for their algebraic-data-type features).
Also, in some cases you may end up wanting to implement Deref or DerefMut, https://doc.rust-lang.org/std/ops/trait.Deref.html, but this shouldn't be used to create inheritance; it's more for getting references from one type to another, similar type (the way String derefs to &str, for example).
I think OP's first approach (with some combination of Arc/Rc and Mutex or just plain RefCell) was probably on the right track and I'm not sure why they chose something else. They don't really elaborate on this?
> This was my initial approach, but it started to become pretty verbose, pretty quickly. In hindsight it may have ultimately been simpler than where I did end up, but, well, that’s hindsight.
If you have Python code and are trying to generate equivalent Rust for performance, please give py2many a try. Looking for feedback.
Also experimental support for pyo3 extensions. But really the main idea of the project is static python and complete recompilation (not interoperability with dynamic python code).
Aggressively shared-xor-mutable. If an object A has a reference to some other object B, either B is (deeply) immutable or A has the only reference to B. This means that in general, data is only held by some code as long as is needed to operate on it, at which point the data is relinquished.
Personally, I find OO designs to be enhanced by this principle, so I don't think it's only something one does in Rust. I certainly learned it from Rust though.
For what it's worth, following this pattern informally is usually a good idea anyway in Python (and Lua and Ruby and Javascript etc).
Even if B is technically mutable (which 99% of the time it is because almost everything in Python is mutable), just don't mutate it and pretend like you're not allowed to do so.
> Even if B is technically mutable (which 99% of the time it is because almost everything in Python is mutable), just don't mutate it and pretend like you're not allowed to do so.
This gets you pretty far, but it's hard to ensure that nobody else ever mutates B. If B is technically mutable, and there are multiple mutable references to B, then when you invoke some other object in the course of your work, control might come back to you with B mutated without you realizing it. This is why the XOR is so important: if B is shared (with you) and mutable (not by you), B could change under you while you've passed control temporarily to someone else.
This is certainly less problematic than having multiple mutable references, but it's still a source of complexity. As a very small toy example, consider iterating forward over an array while deleting elements.
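For what it's worth, Rust turns that toy example into a compile error rather than a runtime surprise, and the idiomatic replacement does the filtering in one pass:

    fn main() {
        let mut v = vec![1, 2, 3, 4];

        // for x in &v { v.remove(0); } // rejected: `v` is already borrowed
        //                              // by the iterator

        v.retain(|x| x % 2 != 0); // the safe way to "delete while iterating"
        assert_eq!(v, vec![1, 3]);
    }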
One of my frustrations when working with Python is that it has been adding very useful FP-inspired tools, but any random library I invoke can pull the rug out from underneath me.
I have long felt that retrofitting a language to add support for immutability/FP constructs is better than nothing but significantly worse than starting with immutability as a core principle.
Lua makes it relatively easy to do this to a table, by assigning a __newindex metamethod which throws an error any time code attempts to assign to a field of a given table.
You can get around it with rawset - that's what it's there for - but it will catch any idiomatic attempt to mutate the data.
I do this! Dataclasses and Enums as the foundation of program layout. No inheritance. No getters/setters. No @property. Judicious use of typing and function / class doc comments.
The `Arc<Mutex<T>>` approach here would make me worried about reference cycles, unless the type relationships here look like a tree/DAG. Out of curiosity, if I store references to other things as opaque `pyo3::PyObject` objects, would the CPython cycle collector be able to see these references and collect their cycles?
Can someone explain why they need to implement this in Rust rather than python? What operation is the author doing to a font that is compute intensive enough and happens often enough that you would gain anything by implementing this in a complicated low level language like Rust?
Font projects (especially multi-script ones) can amass (tens of) thousands of small XML files in the UFO format. Rust can load and save through them in seconds, Python can't :)
The world needs fewer python idioms rather than more. Python is a sprawling language with an identity crisis stemming from its orthogonal object oriented and functional features. Why impose this mindset on Rust? Perhaps python, or at the very minimum the python developer, has much to gain from other languages instead.