Rust in production at Figma (figma.com)
230 points by steveklabnik on May 2, 2018 | 66 comments



On the topic of Rust in production, game studio Chucklefish (makers of Starbound) are developing a cross-platform (including Xbox/PS4/Switch) game in Rust, and recently released a whitepaper: https://www.rust-lang.org/pdfs/Rust-Chucklefish-Whitepaper.p... Last year they also did an AMA with more technical details: https://www.reddit.com/r/rust/comments/78bowa/hey_this_is_ky...

EDIT: Quote from the OP:

> One of them is called `error-chain` and another one is called `failure`. We didn’t realize these existed and we aren’t sure if there’s a standard approach.

`error-chain` is in maintenance mode these days; `failure` is its spiritual successor and seems to be on the path to becoming the community standard eventually.
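For anyone curious what `failure` looks like in practice, here's a minimal sketch using its 0.1-era derive API (the error type and values here are invented for illustration):

```rust
#[macro_use]
extern crate failure; // failure = "0.1"

// Derive `Fail` for a custom error type; `display` controls its message.
#[derive(Debug, Fail)]
#[fail(display = "invalid document id: {}", id)]
struct InvalidDocId {
    id: u32,
}

// `failure::Error` is the catch-all type; anything implementing `Fail`
// converts into it.
fn load_document(id: u32) -> Result<String, failure::Error> {
    if id == 0 {
        return Err(InvalidDocId { id }.into());
    }
    Ok(format!("document {}", id))
}

fn main() {
    match load_document(0) {
        Ok(doc) => println!("{}", doc),
        Err(e) => eprintln!("error: {}", e),
    }
}
```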


I'll add that the bindings Chucklefish wrote for Lua via rlua[1] are some of the best I've seen (minimal sketch below). I had Lua up and running from scratch in a project within 5 minutes, which is pretty darn awesome.

[1] https://crates.io/crates/rlua
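For reference, here's roughly what that looks like; a minimal sketch against rlua's 2018-era API (later versions moved to a `lua.context(|ctx| ...)` style, so details may differ):

```rust
extern crate rlua; // rlua = "0.9"
use rlua::Lua;

fn main() -> rlua::Result<()> {
    let lua = Lua::new();
    // Expose a Rust value to Lua...
    lua.globals().set("greeting", "hello from Rust")?;
    // ...then evaluate a Lua expression that uses it.
    let msg: String = lua.eval("greeting .. '!'", None)?;
    println!("{}", msg);
    Ok(())
}
```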


Given the other comment dismissing it, I'll do a quick review of that Chucklefish paper. Here's what I noticed:

1. A case study of Rust used in something performance-critical. People pushing or just assessing it like to see those.

2. They say they're all-in on modern C++ but still trigger undefined behavior and crashes a lot in practice. Members of the Rust team regularly claim that happens despite C++'s safety improvements over time. In this case, it was happening in C++ but not in Rust, with the same C++ developers using both. As far as learning complexity goes, they're both complex enough that C++ coders should be able to learn Rust, so that drawback is shared by both languages. Weighing complexity against reliable iteration, Rust provided an advantage by knocking out problems that were sometimes slowing the C++ coders down by "hours" of debugging (see the sketch after this list).

3. On parallelism and concurrency, their prior method was getting a single-core implementation working first and then transforming it into something multi-core. This was giving them a lot of trouble. Since there are lots of ways to implement concurrency in C++, I can't be sure if it was due to the language, what library/framework they were using, their own techniques, or some combo. Also, I'm not going to guess since I don't use C++. :) Regardless of what they were doing, the alternative approach in Rust let them get a lot of stuff right on the first try. So, it was easier for those C++ coders to do multicore in Rust than in C++. It's evidence in favor of the Rust team's claim that Rust makes concurrency easier with less debugging, given they were newcomers immediately getting good results. However, I don't think it's evidence of anything comparative between Rust and C++ concurrency without knowing what they were doing in C++. Some C++ coders might be getting better results with different methods, where the gap between them and Rust might be anywhere from zero to something smaller than in this case.

4. Finally, handling platform differences was so much easier as newcomers using Rust's tooling than it was for them as C++ veterans that they still saved time overall in Rust, despite having to implement Rust support for game platforms themselves. That's strong evidence that Rust's package manager and platform tooling are excellent, with some anecdotal evidence that they're better than C++'s for this cross-platform use case.

So, there's a summary of what I got out of the case study for anyone that might find it useful.
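To make point 2 concrete, here's a sketch (my own, not from the paper) of the class of bug involved: holding a reference into a vector across a reallocation. The equivalent C++ compiles and is undefined behavior; Rust's borrow checker rejects it at compile time.

```rust
fn main() {
    let mut items = vec![1, 2, 3];
    let first = &items[0];
    // Uncommenting the next line fails to compile with error[E0502]:
    // cannot borrow `items` as mutable because it is also borrowed as
    // immutable. In C++, the push could reallocate and dangle `first`.
    // items.push(4);
    println!("{}", first);
}
```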

EDIT: That Reddit thread has some great comments about floats, too. Problems, solutions, Rust vs C++ handling, and so on.


Oh that's good to hear. I tried `error-chain` and dropped it when it broke `#[serde(flatten)]` for me. I'll keep an eye on Failure.


How did it break? If it was just a matter of syntax then error-chain does have support for attributes on the enum variants, or you can look at derive-error-chain for a more regular syntax instead.
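For context, plain error-chain (without the variant attributes) looks like this; a minimal sketch, with the error variant invented for illustration:

```rust
#[macro_use]
extern crate error_chain; // error-chain = "0.11"

// The macro generates `Error`, `ErrorKind`, `Result`, and `ResultExt`.
error_chain! {
    errors {
        ParseFailed(line: usize) {
            description("parse failed")
            display("parse failed at line {}", line)
        }
    }
}

fn parse_count(input: &str) -> Result<u32> {
    // `chain_err` wraps the underlying error with our own variant.
    input
        .trim()
        .parse::<u32>()
        .chain_err(|| ErrorKind::ParseFailed(1))
}

fn main() {
    if let Err(e) = parse_count("not a number") {
        eprintln!("error: {}", e);
    }
}
```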


Are they using Vulkano?


They’re using a wrapper they wrote themselves over the gl crate.


That looks more like a very short PDF with a white background than a 'whitepaper'.


I'm not sure what you mean - this looks like a very conventional whitepaper to me.

Wikipedia says

> In business, a white paper is closer [than the government meaning of a policy document] to a form of marketing presentation, a tool meant to persuade customers and partners and promote a product or viewpoint.


Much of what I've done professionally over the past decade was to produce white papers in the following sense:

https://en.wikipedia.org/wiki/Grey_literature


If you have more specific feedback about what you'd expect a whitepaper to look like, we'd love to hear it. We surveyed some stuff and didn't really find much commonality, but would prefer to follow conventions if there are any!


I think one thing that I expect to see in something labeled a "paper" of any sort is data, but that may be a personal bias.


Thanks!

I do think that some data would be good, but at the same time, the kind of data that's relevant here is really hard to get in any concrete terms, so might actually be harmful.

> a "paper" of any sort

Remember that this isn't a paper of just any sort: it's a whitepaper. So it has a specific audience: CTOs, VPs, and other technical management types. Audience is important for what you have to include in any work.


That makes sense, thanks!


No problem. I bet CTOs would love “we had a 30% reduction in defect rate” or data like that, for sure, it’s just so hard to collect...


Oh! Actually, I'm paging through "Rust in production at Figma" (https://blog.figma.com/rust-in-production-at-figma-e10a0ec31...) and I think this is closer to what I would expect in a "paper". While acknowledging that I am predisposed toward thinking "graphs and data" when I think of a paper, I do think these kinds of blog posts have more of an impact on me (and seem more true to the idea of a "paper") than the current Rust whitepapers.

That said, I will happily concede that I have no idea who the target audience is for the whitepapers and probably don't know the accepted industry definition of a whitepaper. However, I'm very happy to see Rust flourish either way.


It's not supposed to be a scientific paper - it's a narrative report on their experience doing something. Obviously not all papers have data in them - not even all computer science research papers do!


Tangentially related, but if any of you work with designers or are yourselves designers, please give Figma a try. It is identical to Sketch in so many ways, better in some others (particularly editing vectors and dealing with nested "symbols" or other components), and only falls behind in a few areas. If you have ever dealt with the nightmare of keeping design files in sync - designers forgetting to push changes to Dropbox or InVision, or multiple designers working together and having their changes fall out of sync - the value of editing and viewing the same document together cannot be overstated.


Are you related to the company?


No relation, I just really like it.


I'm not but +1 to their comment. Good tools are good.


He's not related to Figma.


Sketch keeps working after the subscription expires.

Affinity is pay once, use as long as you want.

It looks like Figma is a SaaS. Might as well pay for Adobe if one wants to give up even the meagre amount of ownership they had over their software.


> It is identical to Sketch in so many ways, better in some others

Is there anything you miss from Sketch? What do you think Sketch does better?


Sketch has good plugins, like Craft. Figma only just introduced an analogous concept, so if you rely heavily on any you might miss it.


Can Figma import/edit .sketch files?


Yes it can import them, but not export back.


> We chose Rust for this rewrite because it combines best-in-class speed with low resource usage while still offering the safety of standard server languages. Low resource usage was particularly important to us because some of the performance issues with the old server were caused by the garbage collector.

Reading between the lines here, they didn't go with a more mature language like Java because they were worried GC tuning would be a problem?

Given all the other issues they noted with using a less mature language like Rust in production, that's a pretty heavy load to take on in exchange for not having to tune GC. Isn't GC tuning a fairly well understood problem? Is there something about encoding large documents that makes it a significantly greater obstacle?

Or are there other unstated considerations at play here? For example, I mean this completely earnestly and not cynically, but there is a lot more PR and recruitment value in blogging about a hot new cutting-edge language than "how we rewrote our TypeScript server in Java".


Language maturity is not a one-dimensional problem and it's also not equal to the age of the language.

To name just two ways in which I consider Rust more mature than Java:

* It has a lot of fundamentals that are based on more academic languages which have explored some specific PLT space for a long time and let it mature. The fruits of that are now in Rust (many aspects of its type system for example).

* Rust's community has an almost absurd ability (contrasted with most other languages) to focus on specific core libraries and tooling. For example serde[1], the Rust serialization library, is well-understood, simple, mature, and supported in basically every Rust library out there (see the sketch below).

The second point is extremely impressive when contrasted with the state of things in Java-land, where you often have many different solutions for the same problems. Sometimes even the well-known and used ones are of very questionable quality[2].

[1]: https://serde.rs/ [2]: https://github.com/FasterXML/jackson-databind/pull/1423
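To illustrate that second point, a minimal serde sketch (the struct is invented for illustration); one derive line covers both serialization and deserialization:

```rust
extern crate serde; // serde = "1", serde_derive = "1", serde_json = "1"
#[macro_use]
extern crate serde_derive;
extern crate serde_json;

// One derive gives both directions of the conversion.
#[derive(Serialize, Deserialize, Debug)]
struct Document {
    id: u64,
    name: String,
}

fn main() {
    let doc = Document { id: 1, name: "mockup".to_string() };
    let json = serde_json::to_string(&doc).expect("serialize failed");
    let back: Document = serde_json::from_str(&json).expect("deserialize failed");
    println!("{} -> {:?}", json, back);
}
```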


Jackson's status as the de facto Java standard for reading and writing JSON really reflects poorly on the Java community. Too often the biggest, hairiest solutions get anointed as the safe "best practices" choice for all new development when really they should only be used if you know specific reasons why they will pull their weight relative to simpler solutions. Jackson is an 800lb gorilla with an impressive arsenal of features, but there's no way in hell it should be the default choice for a simple greenfield CRUD application.


Without turning this into a long tangent subthread.. could you point to some support for this? The comparisons I've seen are more nuanced.[1]

[1] https://blog.takipi.com/the-ultimate-json-library-json-simpl...


That's a performance comparison that ignores all other aspects such as simplicity, readability, and ease of maintenance. It also shows Jackson coming in last for parsing small files.


The type system has no relationship to language maturity. It's just a different type system with different trade-offs.

Your second point is invalidated by the blog itself: "Many libraries are still early".


One thing that we've been hearing for a while is that even if you can tune a GC, it still takes up significantly more memory than the equivalent Rust program. GCs can be memory hungry. Sometimes, you care a lot about RSS. It just depends.


GC still brings its own set of problems, finalizers and the like.

GCs also tend to have cliffs: you hit some unknown threshold and they start thrashing, whereas deterministic release of memory tends to degrade much more linearly.


> GCs also tend to have cliffs: you hit some unknown threshold and they start thrashing, whereas deterministic release of memory tends to degrade much more linearly.

I've experienced this, having previously worked on Java services.

It can be quite a pain and can throw a wrench into your understanding of how you intended to scale.


Rust might be immature, but it's rock solid. I don't think lack of maturity of the core language is a reason not to use it in production anymore. Lack of available libraries may be an issue, but if it's not, then Rust seems like a pretty good choice to me.


>Reading between the lines here, they didn't go with a more mature language like Java because they were worried GC tuning would be a problem?

The post says GC was already a problem in their old system.


Their old Nodejs system. Java GC tuning is more robust and well understood. So I'm wondering if it was truly a case of "Java GC is going to be a nightmare, let's take on a very early stage language with lots of unknown risks instead." Or if perhaps there's some champing at the bit to adopt a hot new language for its own sake.

FWIW, I'm excited for Rust and also Swift for exactly these reasons. I just want to see hard justifications for using them in production this early.


Java GC is -- in my experience -- every bit the nightmare that node.js is, if not more so. Even for those for whom this is their domain of expertise, "GC tuning" boils down to superstition, voodoo and witchcraft -- which are euphemisms for simply changing the GC algorithm until things seem to improve. These changes (in my experience) often move the problem around without solving it, and when deep in a nightmare Cassandra tuning exercise, one exasperated domain expert said to me: "Why would one ever write a database in a garbage collected language?!"

I think the question was profound (if accidentally so), and I agree with him: when performance and footprint matter, it is reasonable to expect software engineers to manage their resources in a predictable manner -- which excludes garbage collected environments.

Rust is a very interesting candidate for such environments, and I think the experience described here is quite helpful for those of us seriously contemplating the language.


> "GC tuning" boils down to superstition, voodoo and witchcraft -- which are euphemisms for simply changing the GC algorithm until things seem to improve

That is what "tuning" usually means in engineering, though. See also "PID loop tuning".


Problem: We're running into an issue with the garbage collector which, given the nature of the task, is not actually adding any value.

Potential solution 1: Replace with code that doesn't have garbage collector overhead in the first place

Potential solution 2: Replace with code that has different, robust garbage collector

I get where you're coming from, but even I would be inclined to choose solution 1 in this case. That they chose Rust instead of C++ is their own business, but that they chose either of those over Java just seems sensible to me.


That's not really a fair summary. The issue is that less mature languages have lots of other issues that come along with the good. Just in this article they noted immature libraries, difficulties with async that kept them from migrating fully off of node.js, and difficulties with error handling. So you have to weigh those in the comparison, not just focus on the one major benefit.


They were able to resolve a performance problem in a critical part of their application by basically writing a script, rather than needing to refactor their code base.

They can benefit further by choosing something more performance-oriented than Node.js, sure. But they weren't having issues with their whole application failing - just a specific piece. When you want to expand your driveway your first thought shouldn't be what materials you want to build your new house in.


Interesting, I got the impression that the problem was being restricted to a single thread. “being single-threaded, couldn’t process operations in parallel”.

Wonder what the article would have said in a parallel universe where the problem was migrated to something like the Akka framework on the JVM?


I'd imagine that if they had switched from Node.js to Akka they likely would've solved some of their issues with single-threading. While I like Node.js, I think at the point where performance matters more than feature iteration, it should be refactored into something that suits your domain better.

That being said, they mention being drawn to low resource usage specifically because of garbage collection issues and resource overhead. While the JVM is lighter than the Node.js runtime when considered across multiple cores, I think that would just have bought them more runway before hitting the same underlying problem.


Lots of companies pick X because it's fancy, not because it's the best solution. Of course, Java would have been a great solution here.


Java still has GC overhead and higher resource consumption in general when compared to Rust or C++.


Interesting:

"Instead of going all-in on Rust, we decided to keep the network handling in node.js for now. The node.js process creates a separate Rust child process per document and communicates with it using a message-based protocol over stdin and stdout. All network traffic is passed between processes using these messages."


This is not entirely surprising, Rust's async I/O story isn't settled yet. tokio and futures exist, but there's some churn there that means it's a bit difficult to deal with at the moment. An RFC for async/await in Rust has been proposed and it's likely that that will be implemented this year, and then tokio should be reworked atop that, which should provide a good basis for things moving forward. It'll take some time for the crates ecosystem to settle around that, though.
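For those who haven't touched it, the churn refers to the futures 0.1-era combinator style, roughly like this toy sketch; every step produces a new future type, and nesting them is where much of the pain came from:

```rust
extern crate futures; // futures = "0.1"
use futures::{future, Future};

fn main() {
    let work = future::ok::<u32, ()>(20)
        .map(|n| n * 2)
        .and_then(|n| {
            // Each combinator wraps the previous future in a new type;
            // deeply nested chains get unwieldy fast.
            future::ok(n + 2)
        });
    // `wait` blocks the current thread until the future resolves.
    assert_eq!(work.wait(), Ok(42));
}
```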


Yeah, waiting for Rust's async/await to mature is probably a good idea. But I'm not sure why they didn't just use threaded I/O instead.
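For comparison, the threaded alternative needs nothing beyond std; a minimal sketch of a thread-per-connection echo server:

```rust
use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // One OS thread per connection: simple and predictable, but each
        // thread costs a stack, so it scales worse than async I/O.
        thread::spawn(move || {
            let mut buf = [0u8; 1024];
            while let Ok(n) = stream.read(&mut buf) {
                if n == 0 {
                    break;
                }
                let _ = stream.write_all(&buf[..n]);
            }
        });
    }
    Ok(())
}
```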


Sounds like they use processes for fault isolation.


I’m curious to know if they ever looked at https://github.com/Microsoft/napajs from Microsoft or any other in-nodejs solutions to deal with the need for multithreading vs launching multiple processes...


That's a really good review of Rust, especially of the async side.


I look forward to using rust, but will probably wait a few more years for the language to mature. Interested in seeing what it will become!


Man, is that "cons" section ever terrifying.


Honestly, it's pretty encouraging. All of the problems they listed are things I see as being very nearly solved.

For example, I ran into the futures issues. Now I use nightly's async/await, and it's a massive improvement while still being early days.

I also use the Non Lexical Lifetimes feature, and see a lot of ergonomic wins there.

The other stuff is being worked through as well - Failure is coming along, and I hear good things. Libraries are always improving.

I would be a lot more terrified if these issues were surprising, or not being dealt with, or were extremely hard to fix without breaking the language, etc. Instead, it's a list of things I've run into and can even solve today with a few features.


It depends on perspective, I guess. After all, they say

> While we hit some speed bumps, I want to emphasize that our experience with Rust was very positive overall. It’s an incredibly promising project with a solid core and a healthy community. I’m confident these issues will end up being solved over time.

Rust is still a relatively new language, and most of the cons list fits in with that. They also talked about how they worked around these issues, and what we're doing to address them, which is pretty wonderful.

I'm also happy to talk more about any of the specific points here to add more context; for example, the comment about error-chain and failure is spot on. error-chain is the older, more battle-tested library; failure is the newer one that's trying to address some issues with it. The ecosystem is still shaking out. (I personally love failure.)


That cons list needs to be put in the context of C/C++ tackling the same set of issues.


It does match expectations for a language of Rust's maturity and complexity though.

One needs to have a robust instinct for separating the excited statements of early adopters and fans from the more mundane reality: developing a language ecosystem takes a long time and other languages have had decades to iterate.


What were the reasons you decided to go with a child_process model instead of wrapping your Rust in a Node native module?


" each document lives exclusively on one specific worker"

Would decoupling the workers and the documents they work on not solve this problem? Granted, this might be non-trivial, but it would have addressed the fundamental issue, which is arguably more interesting.


Somebody has to be responsible for reconciling multiple workers' changes to a document and arriving at a consistent state. Sounds like they're probably doing it in memory within an application worker. You could imagine a solution where they move this to the storage layer and have multiple workers acquire locks to mutate it, but I'm not sure how that wouldn't just move the problem.


The part I did not get is why a worker was holding on to other files when it was working on a single doc. "Throwing more hardware at the problem wouldn’t have solved this issue because a single slow operation would still lock up the worker for all files associated with that worker"

Admittedly, the blog is light on details here and I am unfamiliar with the product. With the rewrite, they also just moved the problem to Rust. So I think that your suggestion to move the problem to the storage layer could have been another viable solution.


They didn't do this before because the memory overhead of having a separate worker for each document would have made the infrastructure costs exorbitant, presumably (since you'd have to pay for whatever fixed overhead costs Node has). But the lower overhead of using a Rust process per doc instead of a JS process per doc has allowed them to move to that model.


There can be an M:N mapping: an update queue consumed by a fixed pool of workers that can lock a file and commit the update. This is crude but won't explode the number of workers. If you have a hashing scheme for the worker pool and implement linear probing, you can achieve some degree of preference for the same worker. If their system was such that a worker maintains state for a bunch of docs and persists them at checkpoints, while also doing compute-intensive tasks per document on a single-threaded node instance, I would ask why it was designed like that in the first place.
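A hypothetical sketch of that hashing-plus-linear-probing idea (all names invented): each document id hashes to a "home" worker, and we probe past busy workers while keeping a preference for the home slot.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const POOL_SIZE: usize = 8;

fn pick_worker(doc_id: &str, busy: &[bool; POOL_SIZE]) -> usize {
    let mut hasher = DefaultHasher::new();
    doc_id.hash(&mut hasher);
    let home = (hasher.finish() as usize) % POOL_SIZE;

    // Linear probing: prefer the home worker, else take the next free slot.
    for offset in 0..POOL_SIZE {
        let candidate = (home + offset) % POOL_SIZE;
        if !busy[candidate] {
            return candidate;
        }
    }
    home // everyone is busy: queue on the home worker anyway
}

fn main() {
    let busy = [false; POOL_SIZE];
    println!("doc-42 -> worker {}", pick_worker("doc-42", &busy));
}
```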



