Rust explained using easy English (github.com/dhghomon)
335 points by Santosh83 on July 23, 2020 | 79 comments



Very nice work.

It's a good exercise to put complicated ideas into very simple language.

George Orwell had that job during World War II. The BBC used to broadcast news to the British Empire in India and Hong Kong, using Basic English. Orwell's job was to translate the news into Basic English. He later wrote that this is a political act. Converting news to Basic English means taking all the ambiguity out. If a political statement was ambiguous, removing the ambiguity means making a political decision. Someone has to decide what the statement really means.

That's where Newspeak, in "1984", came from.[1]

[1] "Orwell, The Lost Writings", https://www.amazon.com/Orwell-Lost-Writings-George/dp/087795...


That was a pretty cool TIL, and really explains his obsession with language in 1984!


thank you for this, that is a fascinating snippet of information that sent me down a rabbit hole


Similarly, Room 101 took its name from a room where he had to sit through long, boring meetings.

Many years back I used to suffer long meetings in another 101 meeting room - I suspect that numbering scheme might be quite common.

[Edit: Mind you, in the same building, a lecture room where I used to suffer through maths lectures is now a bar! The building has since been converted to a hotel.]


Thanks for mentioning the book, it seems really interesting!

The book can be borrowed at https://archive.org/details/orwelllostwritin00orwe



This is great, thanks for sharing.


I think the paragraphs are simple enough, but my personal preference is that some paragraphs would be better coming before others.

I am not sure if it's the same for others, but in the first 15 minutes of learning a language I am really not interested in the differences between signed and unsigned integers. I can look that up when I get stuck somewhere later.

I think a better order would be:

1. playground

2. hello world

3. control flow

4. function declaration

5. string, list and dict

6. features specific to this language (e.g. threading, async calls)

I should be able to write basic stuff and play around with the above. Then I would try to sort out all kinds of remaining pitfalls myself :)


I'm picking up Rust again, and one of our lead architects tried to pick it up but ran into the same issues I've run into all along: third party libraries (even extremely popular ones) have outdated example code or docs that are inconsistent with actually released versions. So in the meantime I will focus on learning the out-of-the-box features of Rust.

It is a shame, because I love the potential of Rust, but it seems that not much has changed in the two years since I last touched it with regard to documentation for external projects. As a result I am still not good enough with Rust to start producing my own projects, which in other languages you can do rather quickly. I hope the ecosystem matures in this regard, because I love everything else: the tooling and the build system are very modern and mature, as best I can tell.


The issues with documentation aren't really something I'd attribute to the Rust ecosystem, but just to people/humans in general. I've had this problem with every language I worked with: JS/TS, Haskell, C/C++, Go.

E.g., the C++ Boost docs are IMO completely decrepit, but people still manage to be productive using them.

If you see a mistake in the docs, open an issue (or a PR); it only costs you a little time.


> I've had this problem with every language I worked with: ... Haskell ...

By the way, if you see outdated Haskell docs please report them on https://github.com/tomjaguarpaw/tilapia. I am (slowly) collecting examples and trying to improve the ecosystem.


Most of my contributions on GitHub are tickets, so I just might have to. Thing is I have a rule of thumb that if I can't get a language to work in about 15 minutes, I ditch it. But now that I do want to get into Rust I may just have to write up tickets when I run into issues.


This happened to me the other day with sqlx. Tried to duplicate the quick start example and got a slew of errors. I should make a ticket, maybe I did something wrong.


>third party libraries (even extremely popular ones) have outdated example code or docs that are inconsistent with actually released versions.

That's actually one of my big pet peeves about OSS. This is especially annoying because the Rust ecosystem already has a really nice solution to the problem - skeptic ([0]).

Skeptic finds Rust snippets in markdown files and dynamically builds tests to run them. So if your docs go out of sync with your code, `cargo test` fails.
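Roughly how it gets wired up, if I remember its README correctly (treat this as a sketch, not gospel): a build script generates the tests and a test file includes them.

  // build.rs (assumes `skeptic` is listed under [build-dependencies])
  fn main() {
      // Generate a test for every Rust code block found in README.md
      skeptic::generate_doc_tests(&["README.md"]);
  }

  // tests/skeptic.rs (pulls the generated tests into `cargo test`)
  include!(concat!(env!("OUT_DIR"), "/skeptic-tests.rs"));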

[0] - https://github.com/brson/rust-skeptic


Kinda sad it looks like that was last updated 2 years ago


Skeptic should only be necessary for arbitrary markdown files though; `cargo test` already runs the snippets in doc comments. The example snippets are more difficult to craft though, as they need to be "self-contained" and you need to hide the bits you don't want to show by marking them as a pseudo-comment using `#`. Rustdoc also has a `--test` flag.

Tested it on clap (no specific reason, just have a checkout locally) and it ran 314 snippets from the docs so seems to work.
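For anyone who hasn't seen the `#` trick, a doc test looks roughly like this (crate and function names made up):

  /// Counts the words in a sentence.
  ///
  /// ```
  /// # // Lines starting with `#` still compile and run, but are hidden in the rendered docs.
  /// let n = mycrate::word_count("hello rustdoc");
  /// assert_eq!(n, 2);
  /// ```
  pub fn word_count(s: &str) -> usize {
      s.split_whitespace().count()
  }

`cargo test` picks these up automatically, so the snippet fails the build if it goes stale.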


My main concern is if Rust changes significantly in a few years and this utility breaks then.


That's a reasonable concern. IMO, a tool like Skeptic would add a lot of value to the ecosystem if deployed widely. It would be really cool if something equivalent was integrated into cargo.


File bugs for these!

I wouldn't be surprised if readmes were rotten, but you should be able to rely on examples in docs and example programs in Cargo projects, because they are automatically built and run during unit tests.


That's interesting, I didn't even think about that; now I'm not sure why it doesn't work. One case I still remember was where Actix (web framework) had docs pointing to severely older versions of the software, while the readme pointed to another, and none of them matched up quite right at the time... It was just not useful. If the examples work but you don't know where to go from there, what can you even do at that point...


Actix had a major rewrite when it moved from old futures to new async/await model, so for a while there was a big difference between released/stable 0.x version and new shiny/dev 1.x version. Now it has mostly settled down.


That's great to know, might have to revisit it now that I'm back experimenting with Rust.


The discussion about shadowing variables doesn't seem to make a very good case for this feature to me. Why would you want to introduce all those variables on the stack, each of a different type (even if they have the same name, presumably they are new pieces of stack memory)? Why not just mutate an existing variable each time?


Sometimes you need a different type and you don't care about the intermediate values. Parsing a string into an integer is a good example. Once you have the integer, generally you don't care to use the string anymore.

  let num = String::from("10");
  let num: i32 = num.parse().unwrap();


For me, understanding shadowing adds to one's understanding of what 'let' means. Whether the shadowed variables each get new memory I'd be happy to leave to the compiler/optimiser.


Each variable is of one type. If you change the type, you must declare a new variable.


I like efforts like this that try to use different wording to explain the same concepts. That said, this document still uses words I would not consider to be "simple English".

There are many such cases in the document, but let's take "inference" as an example. There is a definition of what "type inference" is in the first paragraph of the section, but the reader must still learn this phrase at the same time as learning the programming language. Maybe a simpler English statement would be "automatic types", using the word automatic, which is more widespread and has been adopted into many languages. The meaning is not as precise, but the document can then further explain what it is.
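Whatever word is chosen, the concept itself fits in a couple of lines; something like:

  fn main() {
      let x = 5;         // no type written: the compiler picks ("infers") i32
      let y: u64 = 5;    // type written out explicitly
      println!("{} {}", x, y);
  }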

I am not a native English speaker but have become so used to using it that it is sometimes difficult to know which terms cause issues to programming language learners. Perhaps HN is the wrong place to ask this but I would be very interested in knowing which terms other non-native speakers have problems with.


Unfortunately, we still need technical terms to define precise concepts. I'd argue that "inference" in "type inference" is not something a non-technical native English speaker would understand either, so it's less a matter of "simple English" vs. "technical vocabulary".


I agree. The problem is that most people learn these precise technical terms first in their own language and must then learn them also in English (well, in truth, for me it was the other way around :) ).

I think that this write-up could use simpler English, as its title says; the reader can later proceed to learning the more precise technical terms in other material.


Personally I don't know what half of my technical vocabulary translates to in my native language :P I know what type inference means and I'm pretty sure I could explain it in Finnish but I have no clue what the "standard" jargon is if it even exists.

I have lots of vocabulary that is "English-only"; it's especially noticeable if I'm asked to translate something, as coming up with a translation can take a while even though I can effortlessly understand the source sentence.


> The problem is that most people learn these precise technical terms first in their own language and must then learn them also in English

I think this is only possibly true for people who speak languages with large programming communities - Chinese and maybe Russian - but I doubt it's true for anyone else.


I am from a small country of two million people. We have our own language and the faculties have a mandate to coin new technical terms in our language. Our theses all have to be written using these terms (they are updated annually).

Unfortunately, we in the industry largely don't follow these updates so in our conversations we use English terms. People just graduating, however, may be exposed to the English terms anew when they start working.

A diligent student would have studied material in English as well, but it is possible for people to get to a Master's in Computer Science and do all their studying in our native language.


Interesting! I guess I am proven wrong. If you don’t mind saying, what country are you from?


I am from Slovenia. We sometimes get confused with Slovakia, which is a source of a few jokes here, so just to disambiguate :) https://en.m.wikipedia.org/wiki/Slovenia


Inference works in all Romance languages, as well — it's a Latin word and this use is extremely close to its use in logic (going back to Aristotle).

Anyway, I'd consider simple English a tool for introducing new concepts.

Learning programming still means that the concepts must be introduced, and having a new word to bind the new concept seems good to me.


But not in Slavic languages :)

I guess my point was just that I find the write-up in "simple English" the same as the Rust book (I have just been going through it).

I would not have any objections if the term is introduced later on. To rely on it from the start just seems... well, the same as the book :)


As a non-native speaker, I'd find "type deduction" much easier to grasp than "type inference".


Agreed. I asked a few people around here and "deduce" is more widespread than "infer". A lot of people don't know the meaning of either, though, and it doesn't help that deduction's first translation is related to the mathematical operation :(

I would still suggest to go with automatic :)


This could really use a Table of contents


It has one now


I feel this is like the condensed version of a book. Very useful; it gets the job done so you can quickly form a picture of what it is like to program in Rust.

This is not, IMO, newbie programmer material, so I think a caveat could be added.


Do you have specific examples of what is difficult to understand in the standard books for a non-native speaker?

Just asking out of curiosity and maybe readers can learn a thing or two about making documentation more accessible.


I'm not overly familiar with rust's documentation in particular but in my limited experience of teaching programming to a few unfortunate souls, the issue is usually figuring out what framing device to use.

The actual level of English in good documentation is usually fairly "low" - i.e. keep it simple, assume your reader wants to learn (Too many people suffer from I'll let you google that for me). That and a lot of non-native speakers probably speak better English than me (I know more words but I'm lazy)

I know how to program, I've been bitten by memory bugs - rust's design makes sense to me (i.e. the borrow checker is relatively intuitive because I've written and worked on compiler/s). That is difficult to convey in 10 minutes to someone who only knows how to plot a graph in python (for example).

My approach is usually to state all my assumptions as I go, while naming as much as I can as I go (e.g. googling for operators can be very difficult) when explaining things. In writing, I find that writing documentation away from the code can be quite fruitful otherwise the overall structure and meter of the code can be hard to divine.


> I know how to program, I've been bitten by memory bugs - rust's design makes sense to me (i.e. the borrow checker is relatively intuitive because I've written and worked on compiler/s). That is difficult to convey in 10 minutes to someone who only knows how to plot a graph in python (for example).

I think you're getting at an important issue here. I've been programming (as a hobbyist, not a professional) for over a decade, have contributed to a bunch of open source projects, written software of my own. Mostly in languages like Python, but I've dabbled with C, Rust, etc. The stuff about lifetimes in this guide is still borderline incomprehensible to me. The reason for that is that the basic issue here isn't the difficulty of the language used to describe the concept, but that the concept itself is complicated. Maybe it needs to be complicated, but I don't think there's any way to distract from the fact that Rust is a very hard language to learn.


> someone who only knows how to plot a graph in python (for example).

Hey, I've been writing Python daily for 12 years and I struggle with that!

First you have to get matplotlib installed. That often works nowadays, but you have to be sure you've installed numpy first, and on a bad day you'll find yourself looking at some sort of C compiler error and copying stuff from stackoverflow to make libgfortran be found or whatever.

Then you try to remember the bizarre way people import matplotlib. Your mind briefly flits to memories of blog posts explaining that matplotlib has two parallel APIs, and you start feeling annoyed about that. Then you start feeling annoyed that one of the APIs is based on matlab; why should a key part of the python ecosystem be apeing matlab? It's not like anyone has ever thought matlab had APIs, or anything really, worth emulating. Anyway, it's something like `from matplotlib.pyplot import plt`, or some nonsense like that. So you spend a few minutes failing to remember, and then googling for it with a rising sense of frustration.

Then you get this error:

_RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends._

And now you're really quite annoyed. What the fuck is a framework? It's some MacOS thing but it's always been clear that it's a non-UNIXy concept that you don't want to and shouldn't need to know and you kind of suspect it shouldn't even exist. But holding that opinion would require finding out what it is.

So you google for how to overcome that. And then... there's some weird thing where you have to put a line in the middle of your import statements before importing matplotlib... and finally you wish you hadn't been so stubborn and had just used an IPython notebook, but you just kind of hate them despite how beautiful and well-engineered they are. They can't be version-controlled, you hate typing code into your browser, you always end up with an incomprehensible mess of mutable state, etc.

I used to use R and, back then, I enjoyed making plots and did so efficiently and reasonably well.


Ignorance is bliss when you can just copy and paste example code and swap out the plotted function (is what I was alluding to)


I like the general idea. I haven’t got the time to do more than skimming right now but I already feel that a single markdown file (especially without a table of contents) isn’t the right medium for this.


I assume the markdown is the source and that other presentational forms will follow.


You can try your hand at writing in this style here:

https://splasho.com/upgoer5/


Interesting tool - I tried pasting in your comment and it seems like the word "style" is not permitted!

  Uh oh! You have used a non-permitted word (style)


Because it isn't in the list of accepted words[0] (most used words) as inspired by XKCD[1].

[0] https://splasho.com/upgoer5/phpspellcheck/dictionaries/1000....

[1] https://xkcd.com/1133/

Edit: I just figured out you might fully realize that and were probably just pointing it out to OP. Ignore me. :)


Indeed I did :) but still useful to have the links for other interested readers!


I'm confused. Only u8 can be cast to char, but char is 4 bytes long? How do those two things work together?


Because char's range has a hole from 0xD800 to 0xDFFF (surrogate pairs [1]), so not all u16 values (or larger) can be safely converted to char.
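You can see the hole with the checked conversion; a quick sketch:

  fn main() {
      let a = 0x41u8 as char;                       // fine: every u8 value is a valid char
      println!("{}", a);                            // prints 'A'
      assert_eq!(char::from_u32(0x61), Some('a'));  // checked conversion from u32
      assert_eq!(char::from_u32(0xD800), None);     // surrogate range: not a valid char
  }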

[1] https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF


Additionally, Unicode only goes up to U+10FFFF (1,114,112 code points), which is far short of a full 32 bits (4,294,967,296 possible values).


I found the paragraph about char a bit unclear; especially when explaining that all chars are 4 bytes long, and std::mem::size_of_val("Hello!") is 6 bytes. I would have expected it to be 24 bytes long!


A Rust String is encoded as UTF-8; it is not an array of chars. You can get an iterator over the individual code points using the chars() method.
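A small illustration of the difference (just a sketch):

  fn main() {
      assert_eq!(std::mem::size_of::<char>(), 4);       // every char takes 4 bytes
      assert_eq!(std::mem::size_of_val("Hello!"), 6);    // the str data is UTF-8: 6 bytes
      assert_eq!("Hello!".chars().count(), 6);           // 6 code points once decoded
  }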

https://doc.rust-lang.org/std/string/struct.String.html#meth...


Don't most other languages use utf16 for strings?


It's fairly complicated in other languages.

Languages that were developed before the advent of Unicode, like C and C++, frequently leave the encoding of strings up to the system; but this can be complicated, because there are a number of different legacy 8 bit encodings, along with legacy encodings that don't fit into 8 bits. So they have two character types, char and wchar_t; char is a type that can always encode at least 8 bits, and wchar_t is an implementation defined wide character type, which is 16 bits on some platforms and 32 bits on others.

When Unicode was first designed, and released as Unicode 1.0, it was envisioned as a 16-bit universal encoding, with the expectation that you would use it just like 8 bit character encodings but with 16 bit character types instead. This encoding is known as UCS-2.

However, experience and unification of the Unicode standard with ISO 10646 showed that 16 bits would not be sufficient for all of the characters in all of the world's writing systems; in order to fit CJK characters into that 16 bit encoding, a large effort was made to unify characters from different languages that represented basically the same thing but might be written slightly differently in Traditional Chinese, Simplified Chinese, Korean, Japanese or the historical Vietnamese writing system. Additionally, a lot of decisions had to be made about inclusion of historical or obscure characters, and so there were many characters which couldn't be encoded properly in early Unicode.

This led to Unicode 2.0 introducing a surrogate character mechanism, now known as UTF-16, in which certain unused ranges of code points were used in pairs to represent a wider range of code points, in order to expand Unicode from a single 16 bit plane of characters to 17 16-bit planes.

Meanwhile, Ken Thompson and Rob Pike adapted an earlier 8-bit encoding of ISO-10646 called UTF-1 into UTF-8, which had a number of desirable properties; it was a superset of ASCII, the interpretation of the ASCII subset of bytes never depended on state, and it was self-synchronizing, so if you started reading in the middle of a multi-byte character, you could always find the start of the next character.

This meant it was possible to use UTF-8 in all of the traditional APIs and file formats which were compatible with 8-bit extensions of ASCII; which meant that you could migrate to Unicode support without significant changes to APIs and file formats. Also, for text which consists mostly of the ASCII subset, which includes many ASCII-based markup languages, it was much more efficient than UTF-16.

In the early days, however, many languages and systems which adopted Unicode started doing so via UCS-2, and then, with the advent of Unicode 2.0 and surrogate pairs, needed to adapt to UTF-16. This was done somewhat inconsistently, and it can cause a lot of pain and confusion.

I've already probably written more than is necessary, so I won't go into the history of each system, but a quick survey of a few popular platforms and languages and their native string encoding:

* C and C++: Support 8-bit unspecified encoding strings (char * , std::string), unspecified width wide characters (wchar_t * , std::wstring, which are 16-bit UTF-16 on Windows and 32-bit UCS-4 on most other platforms), and explicit widths (char8_t * , std::u8string, char16_t * , etc...) which are explicitly UTF-8, UTF-16, and UCS-4.

* Windows uses 16-bit wchar_t * , nominally in UTF-16 but not enforced, as its native string type, and also provides 8 bit APIs with character encoding which can vary at runtime

* Linux and other Unix systems generally use 8-bit char * in an unspecified encoding (which can vary at runtime) as their native string type; most systems these days are using UTF-8 as that 8-bit encoding, but it's not guaranteed

* Java, JavaScript, and C# all use UTF-16 as their native encoding, though in many cases there are still APIs which make most sense for UCS-2 rather than UTF-16.

* Python 2 used unspecified 8-bit encodings for the str type, and a custom encoding that supports all of Unicode for the unicode type. Python 3 changed so that the bytes type was 8-bit with unspecified encoding, and str type was Unicode in a custom encoding; a bytes type alias for str was added to Python 2 to ease the transition, as was a unicode type alias for str in Python 3

* Go and Rust use UTF-8 as their native string types

* Most text-based file formats, such as HTML, have standardized on UTF-8 as the backwards compatibility with ASCII is a big benefit for such formats.

There are a few references on why UTF-8 is generally a better choice for languages and APIs than UTF-16, despite some of the legacy use of UTF-16 in languages and APIs:

* http://utf8everywhere.org/

* https://www.cl.cam.ac.uk/~mgk25/unicode.html


Somewhat of a Rust noob here, but as I recall there is a difference between a string and an array of chars.

Strings use UTF-8, which is 1 byte for the most common characters.

A char is Unicode.


They are both Unicode. A `String` is Unicode encoded as UTF-8, whereas a `char` is simply a single, un-encoded code point.

An array of `char` would technically be UTF-32, which isn't particularly useful, imho.
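E.g., collecting into a `Vec<char>` makes every code point take 4 bytes; a quick sketch with a made-up string:

  fn main() {
      let s = "héllo";                             // stored as UTF-8: 'é' takes 2 bytes
      let chars: Vec<char> = s.chars().collect();  // decoded code points, 4 bytes each
      println!("{} bytes as UTF-8, {} chars", s.len(), chars.len());  // 6 bytes, 5 chars
  }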


A char is a scalar value, rather: https://doc.rust-lang.org/std/primitive.char.html


You're right, of course. Unicode terminology always trips me up. I have to remember that a scalar value is almost the same as a code point except that it excludes surrogates.


ok, thnx for clarifying.


Nice!

Sometimes I wish we had a basic "algorithms" language that was used absolutely everywhere, and every book everywhere teaching another language would assume the person knows how to program using the basic language and then teach the new language as a diff of the basic one.

Why do I say this? Because as someone who is versed in quite a few languages, I find the signal-to-noise ratio of these texts to be uncomfortable. For example, I know C really well, so the explanations of integer sizes are irrelevant and I know what shadowing is, but the explanation of the return statements is really useful and the variable declaration is interesting, so I keep having paragraphs I can skip followed by paragraphs I have to read, followed by paragraphs I can skip, etc. I get tired quickly when reading like this.


Word choices are very important. Yes, we want to use smaller, common words, but not to the point of being misleading.

> Type inference means that if you don't tell the compiler the type, but it can decide by itself, it will decide.

The word 'decide' implies that a choice was made. I would think of inference more as 'determining', as the process is deterministic, just not always successful. Or are there actual cases where the compiler infers a type that's close but not entirely correct?
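As far as I can tell it just refuses rather than guessing when it can't determine a single type; something like:

  fn main() {
      // let n = "10".parse().unwrap();      // error: the compiler can't tell which type to parse into
      let n: i32 = "10".parse().unwrap();    // works once the target type is written
      println!("{}", n);
  }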


I think a wiki format might better suit this presentation. It's tough because you want to be comprehensive, but when you have one page, that means you'll have to clutter your document with all the comprehensive details. In a wiki, you can hide those details in deeper pages for only the more curious reader to find (or for those on their second read-through). Nevertheless, this document is phenomenal and I will refer to it in the future.


As the author of a programming language ; ) I find this article very inspiring.

I just want those new to programming to understand the joy of systems and simple graphics programming like in the ol' Turbo Pascal days!

http://www.adapplang.com/


The most interesting thing I’ve learned yet from this, and I’m not quite done with it, is that you can pass a constant (let) to a function, which can accept it as mutable, and therefore actually mutate data that the caller thinks cannot be mutated! I’m not sure how I feel about this.


Note that that is passing by value, so it's moving/copying ownership. It's like having an immutable integer variable and copying it to a different mutable variable.
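A minimal sketch of what's going on (function name made up):

  fn takes_ownership(mut v: Vec<i32>) {  // `mut` here only affects this local binding
      v.push(4);
      println!("{:?}", v);               // [1, 2, 3, 4]
  }

  fn main() {
      let v = vec![1, 2, 3];             // immutable binding in the caller
      takes_ownership(v);                // `v` is moved; the caller can't use it afterwards
      // println!("{:?}", v);            // error: `v` was moved
  }

The caller's data isn't mutated behind its back; it simply stops being the caller's data.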


As a non-native English speaker, I can read this faster than my usual technical content! Great work!


Such simple language allows my mind to instantly start composing syntax. Excellent work.


this is great. it really helped demystify rust and get me started writing some. thank you.


only `u8` can be cast as `char`, not `i32`

Can someone explain that in basic english?


Say you have eight books of equal size on a shelf. It's easy to put these eight books on another shelf of equal size or larger. If you have a shelf holding thirty-two books, it is impossible to fit them all on the smaller shelf, so it's better to simply not do it.


It can fail. Use `try_into` instead.
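Something like this, if I have the conversion traits right:

  use std::convert::TryInto;

  fn main() {
      let small: u8 = 65;
      let a = small as char;                  // always fine: every u8 fits in char
      let big: u32 = 0x1F600;
      let b: char = big.try_into().unwrap();  // checked: fails for values like 0xD800
      println!("{} {}", a, b);
  }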


found this on reddit for this post:

"I applaud this. And it reminds me of the up-goer five. https://xkcd.com/1133/"


Your link doesn't work because of the double quote at the end.

Working link: https://xkcd.com/1133/


Randall Munroe did a whole book in this style: https://xkcd.com/thing-explainer/

It's a lot of fun.



