> One of them is called `error-chain` and another one is called `failure`. We didn’t realize these existed and we aren’t sure if there’s a standard approach.
`error-chain` is in maintenance mode these days; `failure` is its spiritual successor and seems to be on the path to becoming a community standard eventually.
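For anyone who hasn't used it, here's a minimal sketch of what the `failure` approach looks like; the error type and function name are made up for illustration (and it assumes `failure = "0.1"` as a dependency), not taken from the article:

```rust
use failure::{Error, Fail};

// Hypothetical error type: the derive generates the Fail impl and Display text.
#[derive(Debug, Fail)]
enum DocumentError {
    #[fail(display = "document {} not found", id)]
    NotFound { id: u64 },
}

// `failure::Error` is a catch-all that any `Fail` type converts into,
// so callers can use `?` without hand-written conversions.
fn load_document(id: u64) -> Result<Vec<u8>, Error> {
    Err(DocumentError::NotFound { id }.into())
}

fn main() {
    if let Err(e) = load_document(42) {
        eprintln!("error: {}", e);
    }
}
```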
I'll add that the Lua bindings Chucklefish wrote, rlua[1], are some of the best I've seen. I had Lua up and running from scratch in a project within 5 minutes, which is pretty darn awesome.
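If it helps anyone, the general shape is something like the following (from memory, based on rlua's context-based API around 0.16; the exact surface has shifted between versions, so treat this as a sketch rather than gospel):

```rust
use rlua::Lua;

fn main() -> rlua::Result<()> {
    let lua = Lua::new();
    lua.context(|ctx| {
        // Expose a value to Lua, then run a chunk and pull the result back out.
        ctx.globals().set("greeting", "hello from Rust")?;
        let n: i64 = ctx.load("return 2 + 2").eval()?;
        println!("lua says {}", n);
        Ok(())
    })
}
```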
Given the other comment dismissing it, I'll do a quick review of that Chucklefish paper. Here's what I noticed:
1. A case study of Rust used in something performance-critical. People pushing it, or just assessing it, like to see those.
2. They say they're all in on modern C++ but still trigger undefined behavior and crashes a lot in practice. Members of the Rust team regularly claim that happens despite C++'s safety improvements over time. In this case, it was happening in C++ but not in Rust, with C++ developers using both. As far as learning complexity goes, they're both complex enough that C++ coders should be able to learn Rust, so that main drawback is shared by both languages. Looking at complexity vs reliable iterations, Rust provided an advantage in knocking out problems that were slowing the C++ coders down, sometimes by "hours" of debugging.
3. On parallelism and concurrency, their prior method was getting a single-core implementation working first and then transforming it into something for multi-core. This was giving them a lot of trouble. Since there are lots of ways to implement concurrency in C++, I can't be sure if it was due to the language, the library/framework they were using, their own techniques, or some combo. Also, I'm not going to guess since I don't use C++. :) Regardless of what they were doing, the alternative approach in Rust let them get a lot of stuff right on the first try. So, it was easier for those C++ coders to do multicore in Rust than in C++ (there's a short, illustrative sketch of what Rust enforces here below the list). It's evidence in favor of the Rust team's claim that Rust makes concurrency easier with less debugging, given they were newcomers immediately getting good results. However, I don't think it's evidence of anything comparative between Rust and C++ concurrency without knowing what they were doing in C++. As in, some C++ coders might be getting better results with different methods, where the gap between them and Rust might be anywhere from zero to smaller than in this case.
4. Finally, handling platform differences was so much easier for them as newcomers using Rust's tooling than it was for them as C++ veterans that they still saved time overall in Rust, despite having to implement Rust support for their game platforms themselves. That's strong evidence that Rust's package manager and platform tooling are excellent, with some anecdotal evidence that they're better than C++'s for this cross-platform use case.
So, there's a summary of what I got out of the case study for anyone that might find it useful.
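To make point 3 a little more concrete, here's a tiny, hypothetical example of the kind of shared-state code where Rust forces you to get the synchronization right before it will even compile (this is my own illustration, nothing to do with Chucklefish's actual codebase):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // The compiler won't let threads mutate `counter` without a synchronization
    // wrapper like Mutex, so the data race is ruled out before the program runs.
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("total = {}", *counter.lock().unwrap());
}
```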
EDIT: That Reddit thread has some great comments about floats, too. Problems, solutions, Rust vs C++ handling, and so on.
How did it break? If it was just a matter of syntax then error-chain does have support for attributes on the enum variants, or you can look at derive-error-chain for a more regular syntax instead.
I'm not sure what you mean - this looks like a very conventional whitepaper to me.
Wikipedia says
> In business, a white paper is closer [than the government meaning of a policy document] to a form of marketing presentation, a tool meant to persuade customers and partners and promote a product or viewpoint.
If you have more specific feedback about what you'd expect a whitepaper to look like, we'd love to hear it. We surveyed some stuff and didn't really find much commonality, but would prefer to follow conventions if there are any!
I do think that some data would be good, but at the same time, the kind of data that's relevant here is really hard to get in any concrete terms, so might actually be harmful.
> a "paper" of any sort
Remember that this isn't a paper of just any sort: it's a whitepaper. So it has a specific audience: CTOs, VPs, and other technical management types. Audience is important for what you have to include in any work.
Oh! Actually, I'm paging through "Rust in production at Figma" (https://blog.figma.com/rust-in-production-at-figma-e10a0ec31...) and I think this is closer to what I would expect in a "paper" - while acknowledging that I am predisposed toward thinking "graphs and data" when I think of a paper, I do think these kinds of blog posts have more of an impact on me (and seem more true to the idea of a "paper") than the current Rust whitepapers.
That said, I will happily concede that I have no idea who the target audience is for the whitepapers and probably don't know the accepted industry definition of a whitepaper. However, I'm very happy to see Rust flourish either way.
It's not supposed to be a scientific paper - it's a narrative report on their experience doing something. Obviously not all papers have data in them - not even all computer science research papers have data in them!
Tangentially related, but if any of you work with designers or are yourselves designers, please give Figma a try. It is identical to Sketch in so many ways, better in some others (particularly editing vectors and dealing with nested "symbols" or other components), and only falls behind in a few areas. If any of you have had to deal with the nightmare of keeping design files in sync - designers forgetting to push changes to Dropbox or InVision, or multiple designers working together and having their changes fall out of sync - the value of being able to edit and view the same document together cannot be overstated.
> We chose Rust for this rewrite because it combines best-in-class speed with low resource usage while still offering the safety of standard server languages. Low resource usage was particularly important to us because some of the performance issues with the old server were caused by the garbage collector.
Reading between the lines here, they didn't go with a more mature language like Java because they were worried GC tuning would be a problem?
Given all the other issues they noted with using a less mature language like Rust in production, that's a pretty heavy load to take on in exchange for not having to tune GC. Isn't GC tuning a fairly well understood problem? Is there something about encoding large documents that makes it a significantly greater obstacle?
Or are there other unstated considerations at play here? For example, I mean this completely earnestly and not cynically, but there is a lot more PR and recruitment value in blogging about a hot new cutting-edge language than "how we rewrote our TypeScript server in Java".
Language maturity is not a one-dimensional problem and it's also not equal to the age of the language.
To name just two ways in which I consider Rust more mature than Java:
* A lot of its fundamentals are based on more academic languages that have explored specific PLT space for a long time and let those ideas mature. The fruits of that are now in Rust (many aspects of its type system, for example).
* Rust's community has an almost absurd ability (contrasted with most other languages) to focus on specific core libraries and tooling. For example serde[1], the Rust serialization library, is well-understood, simple, mature and supported in basically every Rust library out there.
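To illustrate that second point, here's a minimal serde sketch (the struct is made up; it assumes `serde` with the derive feature plus `serde_json` as dependencies):

```rust
use serde::{Deserialize, Serialize};

// Hypothetical document metadata; serde derives all the (de)serialization code.
#[derive(Serialize, Deserialize, Debug)]
struct DocMeta {
    id: u64,
    title: String,
}

fn main() -> Result<(), serde_json::Error> {
    let meta = DocMeta { id: 7, title: "mockup".to_string() };
    let json = serde_json::to_string(&meta)?;        // {"id":7,"title":"mockup"}
    let back: DocMeta = serde_json::from_str(&json)?;
    println!("{} -> {:?}", json, back);
    Ok(())
}
```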
The second point is extremely impressive when contrasted with the state of things in Java-land, where you often have many different solutions for the same problems. Sometimes even the well-known and used ones are of very questionable quality[2].
Jackson's status as the de facto Java standard for reading and writing JSON really reflects poorly on the Java community. Too often the biggest, hairiest solutions get anointed as the safe "best practices" choice for all new development when really they should only be used if you know specific reasons why they will pull their weight relative to simpler solutions. Jackson is an 800lb gorilla with an impressive arsenal of features, but there's no way in hell it should be the default choice for a simple greenfield CRUD application.
That's a performance comparison that ignores all other aspects such as simplicity, readability, and ease of maintenance. It also shows Jackson coming in last for parsing small files.
One thing that we've been hearing for a while is that even if you can tune a GC, it still takes up significantly more memory than the equivalent Rust program. GCs can be memory hungry. Sometimes, you care a lot about RSS. It just depends.
GC still brings its own set of problems, like finalizers and the like.
GCs also tend to have cliffs where you hit some unknown threshold and they start thrashing, whereas deterministic release of memory tends to degrade much more linearly.
> GCs also tend to have cliffs where you hit some unknown threshold and they start thrashing, whereas deterministic release of memory tends to degrade much more linearly.
I've experienced this, having previously worked on Java services.
It can be quite a pain and can throw a wrench into your understanding of how you intended to scale.
Rust might be immature, but it's rock solid. I don't think lack of maturity of the core language is a reason not to use it in production anymore. Lack of available libraries may be an issue, but if it's not, then Rust seems like a pretty good choice to me.
Their old Node.js system. Java GC tuning is more robust and well understood. So I'm wondering if it was truly a case of "Java GC is going to be a nightmare, let's take on a very early-stage language with lots of unknown risks instead," or if perhaps there's some champing at the bit to adopt a hot new language for its own sake.
FWIW, I'm excited for Rust and also Swift for exactly these reasons. I just want to see hard justifications for using them in production this early.
Java GC is -- in my experience -- every bit as much of a nightmare as Node.js, if not more so. Even for those for whom this is their domain of expertise, "GC tuning" boils down to superstition, voodoo and witchcraft -- which are euphemisms for simply changing the GC algorithm until things seem to improve. These changes (in my experience) often move the problem around without solving it, and when deep in a nightmare Cassandra tuning exercise, one exasperated domain expert said to me: "Why would one ever write a database in a garbage collected language?!"
I think the question was profound (if accidentally so), and I agree with him: when performance and footprint matter, it is reasonable to expect software engineers to manage their resources in a predictable manner -- which excludes garbage collected environments.
Rust is a very interesting candidate for such environments, and I think the experience described here is quite helpful for those of us seriously contemplating the language.
> "GC tuning" boils down to superstition, voodoo and witchcraft -- which are euphemisms for simply changing the GC algorithm until things seem to improve
Then, that is what "tuning" usually means in engineering. See also "PID loop tuning".
Problem: We're running into an issue with the garbage collector which, given the nature of the task, is not actually adding any value.
Potential solution 1: Replace with code that doesn't have garbage collector overhead in the first place
Potential solution 2: Replace with code that has different, robust garbage collector
I get where you're coming from, but even I would be inclined to choose solution 1 in this case. That they chose Rust instead of C++ is their own business, but that they chose either of those over Java just seems sensible to me.
That's not really a fair summary. The issue is that a less mature language has lots of other issues that come along with the good. Just in this article they noted immature libraries, difficulties with async that kept them from migrating fully off of node.js, and difficulties with error handling. So you have to weigh those in the comparison, not just focus on the one major benefit.
They were able to resolve a performance-critical issue affecting a key part of their application by basically writing a script, rather than needing to refactor their code base.
They can benefit further by choosing something more performance-oriented than Node.js, sure. But they weren't having issues with their whole application failing - just a specific piece. When you want to expand your driveway your first thought shouldn't be what materials you want to build your new house in.
Interesting, I got the impression that the problem was being restricted to a single thread. “being single-threaded, couldn’t process operations in parallel”.
Wonder what the article would have said in a parallel universe where the problem was migrated to something like the Akka framework on the JVM?
I'd imagine that if they had switched from Node.js to Akka they likely would've solved some of their issues with single-threading - while I like Node.js, I think that at the point where performance matters more than feature iteration, it should be refactored into something that suits your domain better.
That being said, they mention being drawn to low resource usage specifically because of garbage collection issues and resource overhead. While the JVM is lighter than the Node.js runtime when considered across multiple cores, I think that would just have given them more runway before hitting the same underlying problem.
"Instead of going all-in on Rust, we decided to keep the network handling in node.js for now. The node.js process creates a separate Rust child process per document and communicates with it using a message-based protocol over stdin and stdout. All network traffic is passed between processes using these messages."
This is not entirely surprising, Rust's async I/O story isn't settled yet. tokio and futures exist, but there's some churn there that means it's a bit difficult to deal with at the moment. An RFC for async/await in Rust has been proposed and it's likely that that will be implemented this year, and then tokio should be reworked atop that, which should provide a good basis for things moving forward. It'll take some time for the crates ecosystem to settle around that, though.
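For a sense of what the child-process side of such a message-based protocol could look like, here's a bare-bones, purely illustrative sketch using only the standard library (it is not Figma's actual protocol):

```rust
use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut out = stdout.lock();
    // One newline-delimited message per line in, one response line out.
    for line in stdin.lock().lines() {
        let msg = line?;
        // ... apply the operation encoded in `msg` to the in-memory document ...
        writeln!(out, "{{\"ack\":{}}}", msg.len())?;
        out.flush()?;
    }
    Ok(())
}
```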
I’m curious to know if they ever looked at https://github.com/Microsoft/napajs from Microsoft or any other in-nodejs solutions to deal with the need for multithreading vs launching multiple processes...
Honestly, it's pretty encouraging. All of the problems they listed are things I see as being very nearly solved.
For example, I ran into the futures issues. Now I use nightly's async/await, and it's a massive improvement while still being early days.
I also use the Non Lexical Lifetimes feature, and see a lot of ergonomic wins there.
The other stuff is being worked through as well - Failure is coming along, and I hear good things. Libraries are always improving.
I would be a lot more terrified if these issues were surprising, or not being dealt with, or were extremely hard to fix without breaking the language, etc. Instead, it's a list of things I've run into and can even solve today with a few features.
It depends on perspective, I guess. After all, they say
> While we hit some speed bumps, I want to emphasize that our experience with Rust was very positive overall. It’s an incredibly promising project with a solid core and a healthy community. I’m confident these issues will end up being solved over time.
Rust is still a relatively new language, and most of the cons list fits in with that. They also talked about how they worked around these issues, and what we're doing to address them, which is pretty wonderful.
I'm also happy to talk more about any of the specific points here to add more context; for example, the comment about error-chain and failure is spot on. error-chain is the older, more battle-tested library; failure is the newer one that's trying to address some issues with it. The ecosystem is still shaking out. (I personally love failure.)
It does match expectations for a language of Rust's maturity and complexity though.
One needs to have a robust instinct for separating the excited statements of early adopters and fans from the more mundane reality: developing a language ecosystem takes a long time and other languages have had decades to iterate.
" each document lives exclusively on one specific worker"
Would decoupling the workers and the documents they work on not solve this problem? Granted, this might be non-trivial, but it might have solved the fundamental issue, which is arguably more interesting.
Somebody has to be responsible for reconciling multiple workers' changes to a document and arriving at a consistent state. Sounds like they're probably doing it in memory within an application worker. You could imagine a solution where they move this to the storage layer and have multiple workers acquire locks to mutate it, but I'm not sure how that wouldn't just move the problem.
The part I did not get is why a worker was holding on to other files when it was working on a single doc.
"Throwing more hardware at the problem wouldn’t have solved this issue because a single slow operation would still lock up the worker for all files associated with that worker"
Admittedly, the blog is light on details here and I am unfamiliar with the product. With the rewrite, they also just moved the problem to Rust. So I think that your suggestion to move the problem to the storage layer could have been another viable solution.
They didn't do this before because the memory overhead of having a separate worker for each document would have made the infrastructure costs exorbitant, presumably (since you'd have to pay for whatever fixed overhead costs Node has). But the lower overhead of using a Rust process per doc instead of a JS process per doc has allowed them to move to that model.
There can be an M:N mapping - an update queue can be consumed by a fixed pool of workers that can lock a file and commit the update. This is crude but won't explode the number of workers. If you have a hashing scheme for the worker pool and implement linear probing, you can achieve some degree of preference for the same worker.
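A rough, hypothetical sketch of that hashing-plus-linear-probing idea (not from the article, just to make the shape concrete):

```rust
// Pick a worker for a document: hash to a preferred slot, then probe linearly
// so the document usually lands on the same worker but can spill when busy.
fn pick_worker(doc_id: u64, busy: &[bool]) -> usize {
    let n = busy.len();
    let start = (doc_id as usize) % n;
    (0..n)
        .map(|i| (start + i) % n)
        .find(|&w| !busy[w])
        .unwrap_or(start) // everyone busy: fall back to the preferred worker
}

fn main() {
    let busy = vec![false, true, false, false];
    println!("doc 42 -> worker {}", pick_worker(42, &busy));
}
```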
If their system was such that a worker maintains state for a bunch of docs and persists them at checkpoints, while also doing compute-intensive tasks per document on a single-threaded Node instance, I would ask why it was designed like that in the first place.