So you want to design a programming language (2017) (lmu.edu)
177 points by behnamoh on Feb 26, 2022 | 58 comments



If you want to design a programming language, and are okay with building off of an existing language, I suggest giving Racket a try. It can do more than just create DSLs and compile down to Racket code. I recently learned about sham* which allows interfacing with LLVM. I've linked some examples and references below.

References 1. https://docs.racket-lang.org/guide/languages.html 2. https://docs.racket-lang.org/turnstile/

Examples 1. https://github.com/soegaard/urlang 2. https://github.com/rjnw/sham 3. https://github.com/ShawSumma/lure 4. https://github.com/racket/rhombus-prototype 5. https://github.com/lexi-lambda/hackett


Or do none of that. I understand that this article is meant to encourage people to learn some cool stuff within programming languages, but it’s also discouraging. The better answer in my view is to learn how to build a tree walking interpreter with Crafting Interpreters, then build whatever comes to mind. Want functional programming? Add immutability and first class functions. How about concurrency? Add an event queue. Then try to make a bytecode interpreter or a compiler. Sure these ideas require some of the knowledge detailed in the article, but you can learn that info JIT. Don’t try to learn it all before you get started. Just start and see what happens.
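To make that concrete, a tree-walking evaluator really is just a recursive function over an AST. A throwaway Python sketch (hypothetical node shapes I made up for illustration, not anything from Crafting Interpreters):

  # Hypothetical AST as nested tuples: ("num", 3), ("var", "x"),
  # ("add", lhs, rhs), ("let", name, value, body). Illustration only.
  def evaluate(node, env):
      kind = node[0]
      if kind == "num":                 # literal number
          return node[1]
      if kind == "var":                 # variable lookup in the environment
          return env[node[1]]
      if kind == "add":                 # recurse on both operands
          return evaluate(node[1], env) + evaluate(node[2], env)
      if kind == "let":                 # bind a name, then evaluate the body
          _, name, value, body = node
          return evaluate(body, {**env, name: evaluate(value, env)})
      raise ValueError(f"unknown node kind: {kind}")

  # let x = 2 in x + 40  =>  42
  print(evaluate(("let", "x", ("num", 2), ("add", ("var", "x"), ("num", 40))), {}))

Closures, an event queue, or a bytecode pass are then just more node kinds and more passes over the same tree.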


There’s a difference between building a toy language and designing something people would actually use.

I think the article is more aimed at the latter, whereas if you just want to get something working and have fun, your approach makes more sense.


I think most people who are interested in programming languages would do well to start by making toy languages. There's a lot of pressure in trying to make something "useful", and you're almost certainly not going to succeed at that on your first attempt anyway.

The list of prerequisites in the article is extensive, overwhelming, and largely unnecessary if you're just getting started. Most of the concepts that it lists as "prerequisites" are ones that I finally understood while implementing a toy language, not stuff that I studied ahead of time.


Part of the usefulness of the guide might be to dissuade people from even trying, by showing them how hard language design is.

I don't think I would ever want to do anything beyond a DSL or bytecode for a tiny, 128-byte-ish embedded config or the like, and even then only if nothing else worked.

The odds of usefulness are low even with pro backing. Understanding all this stuff puts you in a better position to decide if you actually want to try.

With toy projects you can just keep iterating and iterating and never get anywhere, and never really fully realize why it's not successful.


> Part of the usefulness of the guide might be to dissuade people from even trying, by showing them how hard language design is.

That is a bad thing. Going deep in any domain is hard; the thing to do is make the on-ramp gentler so more people can try it out, not make it steeper to discourage new learners.

If someone wants to start making games, we tell them to start off by cloning Pong or Breakout. We don’t say “be careful, here’s the long list of topics you’ll need to understand before you can make Fortnite.”


Yes, but if you approach it as a toy then "success" doesn't mean "wide usage" it means "I had fun and learned stuff".

I love Crafting Interpreters in part because he's explicit about this being mostly about learning and about play. I love languages (spoken, written, programming) and it was a great introduction that really got me rolling.

Articles like this consistently overwhelmed me, with the result being that I didn't really try it until years after I wanted to. Yeah, languages aren't for everyone, but specifically trying to scare people off of it seems harsh and unnecessary.


> Yeah, languages aren't for everyone, but specifically trying to scare people off of it seems harsh and unnecessary.

That depends. It's really hard to articulate just how much work is involved in writing a programming language. It's one of the classic infinite time sinks that exists in the CS field; no matter how much work you put into it, there's more to be done. It's very hard to ever call it "finished". I've seen a lot of language projects start as "I was working on this other thing and then I thought to myself, gee, here's a great opportunity for a programming language to make this easier." 8 years later they're still working on that programming language, the original project a distant memory.

Programming languages have a way of sucking in developers without them realizing it until it's too late. So while I wouldn't dissuade anyone from writing a PL per se, I would warn them that writing a PL is a *huge* undertaking in its own right, and should really never be undertaken to help you solve your own problems. If you want to go this route, then you need to abandon your problem and focus on the PL instead.


I'd almost agree, but at the same time such work can be a lot of fun.

Personally I think an average programmer could knock out a BASIC, FORTH, or similar programming language - with extensions - in a few months of part-time work, if not less.

The harder part, the part that I can see taking a lot of time, is trying to create a "batteries included" language which has a very complete and complex standard library.

Sure, there are cheats if you're able to use dlopen and FFI to access C stuff, but it's still not easy to create client libraries for MySQL, Postgres, Redis, HTTP, etc. Those kinds of extensions/support make your language very, very useful for users though. So there's a trade-off.
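To illustrate the cheat: a quick Python ctypes sketch, assuming a Linux box with glibc (library names differ on other platforms). It's the same dlopen-style trick a new language's runtime can use to borrow existing C libraries instead of rewriting them:

  # FFI "cheat": borrow an existing C library instead of rewriting it.
  # Assumes a Linux system with glibc; library names differ elsewhere.
  import ctypes

  libc = ctypes.CDLL("libc.so.6")
  libm = ctypes.CDLL("libm.so.6")

  libm.sqrt.restype = ctypes.c_double       # declare the C signature
  libm.sqrt.argtypes = [ctypes.c_double]

  print(libc.strlen(b"hello"))              # 5
  print(libm.sqrt(2.0))                     # 1.4142135623730951

It's the wiring above that layer (connection handling, protocols, error mapping) that makes a MySQL or Redis client library the real work.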

I've put together a couple of simple languages, with a standard-library of maybe 30 functions. I'd draw the line at anything more no matter how appealing it might seem because I can just imagine how much of a time-sink it might become.


The article is aimed at people hoping to make an actual useful language.

Lots of people could make a BASIC or a FORTH in a week. It would almost certainly be rather useless, unless the language itself is the point of the project using it.

99% of GitHub repos appear to be useless. They show up because someone wanted to make a simpler version of something that they can understand.

They don't teach much about the mainstream professional way. They often don't perform better, since things like GPU and SIMD are more important than simplicity.

They are perfectly fine as toy projects. I have a few myself (although none of them use any really interesting algorithms or new concepts or anything). But that's... all they are.

At most they will become super niche things like suckless.

But all programmers should know that Electron apps run just fine even on cheap modern hardware, and that existing solutions are probably going to be better than anything you can do, unless you spend months to years like they did.

The programming community doesn't really seem to respect just doing everything by the book, the way a Microsoft dev would, and it's cool to see someone reminding people that it's a perfectly good option to just grab some npm packages and make your thing and not bog yourself down with how it all works.


Yeah you’re right. I guess I’m saying that you gotta build a few toy languages before the big one. I don’t think you can read all of the material in the blog, then go build a language that people would actually use. Ideally you should concurrently read the material, work on a toy language, and work on an existing language.


Or just write your own operational semantics and grammar in Racket's Redex :). Lots of ways to Rome.


An article that leverages LLVM like this helps:

https://mukulrathi.com/create-your-own-programming-language/...


It's a page from Loyola Marymount University's Compilers course, which involves building a compiler for a language of your own design. This is basically a syllabus for a course that has a lot more detail at each stage.



One of the sections in the article is headed "study existing languages", but why not also study languages that shaped programming language history? No mention of the Pascal-family of languages (Pascal, Modula-2, Oberon), no mention of Ada. Even Algol 68 is full of ideas that would reward study.

Also, the language descriptions are too brief to be helpful (Java and C# for being enterprisey, C++ and Rust for pointers and other system constructs).


I think it's even more important to study the history of PLs if you want to make a new one. The history of programming languages was shaped heavily by Moore's law, and the fact that if you waited 18 months, your previous slow program would likely be twice as fast on newer chips. The result was that programming languages rose to the top by riding this exponential wave through mechanical sympathy.

Today, however, Moore's law has stalled, and a different kind of power law is on the rise: core counts. Languages which previously saw success by being imperative and optimized for single core execution are falling over trying to keep up. We see increased efforts to harness the power of multiprocessing through asynchrony and parallelism being first-class citizens in newer languages, while others struggle to stay relevant.

Languages of the future will be built from the ground up to harness massive CPU core counts that will be available on consumer desktop machines in the near future. My advice to any budding language designer is to stop looking to reinvent C and C++. There's currently a wave of programming languages that came of age circa 2010-2020 which are focused on just that (Go/Rust/Zig etc.), but if you're just looking to get into that space now, you're late to the party. Instead look back to Erlang and the original promise of Object Oriented programming as the basis for a new language in 2022.


I expect an extreme version of this to be the future. Not just many cores, but many machines. My ETLs at work run on a spark cluster and are specified across programs organized in a separate DAG. That’s the kind of “program” with lots of head room to improve.

I’d bet future many-machine heterogeneous-resource languages will make that a lot easier.


Erlang and Elixir are already pretty strong in this regard.


Any ideas why the strengths of erlang and elixir haven’t made their way into the mainstream?


I would say that the strengths of Erlang have made their way into the mainstream, but into infrastructure and tooling rather than languages. I believe that's because the current crop of popular mainstream languages will never forsake their imperative roots. Erlang's true strength is that it approaches the distributed computing problem from first principles, which yields a language that feels right to work with in that domain; whereas the mainstream attempts to shoehorn distributed computing onto an imperative foundation, which has always been very awkward since it's really just trying to square a circle.


The wiki page is rather informative in the sense of choosing which historical languages may be worth a closer look:

https://en.m.wikipedia.org/wiki/History_of_programming_langu...


Yeah that list is glib. To include Brainfuck but not Pascal is telling.


The context of this page is that it's part of a university course, a prerequisite for which would have been already studying a lot of older languages. The examples listed there are a survey of modern languages, trying to hit as many paradigms/language design choices as possible.


...and no mention of Nim, which shares a lot with Pascal, Python, C and Ada and is more innovative.


This is a great resource for pointing out the things you need to think about in designing a language.

To see the nitty gritty line-by-line walkthrough of everything that goes into actually building a language (all the way down to writing your own VM) I HIGHLY recommend reading Crafting Interpreters[1] by Bob Nystrom. I’m not a language hacker but found everything about this book worthwhile and very interesting.

[1] https://craftinginterpreters.com/


I'd also recommend Writing An Interpreter In Go [1] by Thorsten Ball, and its companion, Writing A Compiler In Go.

[1] https://interpreterbook.com/


We did a programming language for a project. It was a clinical rules language.

The distinguishing features of it were the data model (it had first class access to the temporal data model we had on the back end), and that we kept the tri-state model of SQL with values and NULL.

Other than that it didn't have any real structured types, just a few high-level actions.

It was a straightforward project, ALGOL-esque, basic control structures. It compiled to Java source code, and the resulting classes loaded in to the app server.

We used a "compiler compiler" tool, rather than doing it ourselves. It gave us an AST that we then walked. First time doing anything like that, especially with a tool like that.

Especially in an ALGOL like language, getting the expressions to work is the "hard" part. That's where your "napkin to white board to syntax file" falls apart with reduce errors and what not.

But when you get it to work, it's magic. When you get "a = a + 1" to work, and know that where 1 line of code works, 10000 lines of code will work, it's an amazing feeling. You just build the things a piece at a time, testing all the way.

In the end, we had an unresolved precedence issue (as I recall), but we never bothered to fix it.
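For anyone hand-rolling a parser instead of fighting a parser generator, precedence climbing is the usual trick for exactly this. A rough Python sketch of the idea, purely for illustration and not what we actually used:

  # Precedence-climbing sketch for binary expressions (illustration only,
  # not the compiler-compiler approach described above).
  PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

  def parse_expr(tokens, min_prec=1):
      lhs = tokens.pop(0)               # assume a number or identifier operand
      while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
          op = tokens.pop(0)
          # the +1 makes operators of equal precedence left-associative
          rhs = parse_expr(tokens, PRECEDENCE[op] + 1)
          lhs = (op, lhs, rhs)
      return lhs

  # 1 + 2 * 3 - 4  =>  ('-', ('+', 1, ('*', 2, 3)), 4)
  print(parse_expr([1, "+", 2, "*", 3, "-", 4]))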

The funniest unexpected outcome was the classic SQL model of using NULL. Simply, 1 + NULL = NULL, and we expanded that to all expressions.

It kind of fell apart in something like: IF A = 1 AND B = 2 OR C > 3 THEN...

If any of those variables (A, B, C) was NULL, the entire expression evaluated to NULL, which was false. It took me by surprise when it happened, but, "duh", of course.

I simply changed the NULL rule to not apply to boolean operators. Instead of evaluating to NULL, I had them all evaluate to FALSE, and that fixed that.
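Roughly, the change amounted to something like this; a hypothetical Python sketch of the rule only, not the actual Java-emitting compiler:

  # Sketch of the NULL rule described above (hypothetical evaluator,
  # not the real compiler). Arithmetic and comparison operators
  # propagate NULL; boolean operators coerce NULL to FALSE.
  NULL = None

  def eval_binop(op, a, b):
      if op in ("AND", "OR"):
          a = False if a is NULL else a     # the fix: NULL acts as FALSE here
          b = False if b is NULL else b
          return (a and b) if op == "AND" else (a or b)
      if a is NULL or b is NULL:            # SQL-style propagation everywhere else
          return NULL
      if op == "+":
          return a + b
      if op == "=":
          return a == b
      raise ValueError(op)

  print(eval_binop("+", 1, NULL))                          # None, i.e. NULL
  print(eval_binop("OR", eval_binop("=", NULL, 1), True))  # True instead of NULL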

Looking at Wirth's work (Pascal, Oberon, etc.), you'll see that compilers can be simple. They're work, but they're simple work. We obviously make them more and more sophisticated all the time, but your compilers don't need to be that way to be effective, productive, and useful. We created thousands of lines of code using that language, and it solved the problem it was supposed to solve very nicely.

Runtimes can be hard, but that's a separate problem.


> Simply, 1 + NULL = NULL, and we expanded that to all expressions.

That's not quite right. In SQL NULL represents an unknown value, so 1 + unknown => unknown.

  TRUE OR NULL => TRUE
  FALSE OR NULL => NULL
  FALSE AND NULL => FALSE
  TRUE AND NULL => NULL
Which is also why NULL = NULL => NULL, i.e. unknown == unknown => unknown.
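It falls out naturally if you implement the three-valued logic directly; a small Python sketch with None standing in for NULL, just to illustrate the truth table above:

  # SQL-style three-valued logic, with None standing in for NULL/unknown.
  def tri_and(a, b):
      if a is False or b is False:
          return False                  # FALSE AND anything => FALSE
      if a is None or b is None:
          return None                   # otherwise unknown poisons the result
      return True

  def tri_or(a, b):
      if a is True or b is True:
          return True                   # TRUE OR anything => TRUE
      if a is None or b is None:
          return None
      return False

  def tri_eq(a, b):
      if a is None or b is None:
          return None                   # unknown == unknown => unknown
      return a == b

  print(tri_or(True, None), tri_or(False, None), tri_eq(None, None))  # True None None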


NULLs in SQL are a mess so I don't think one should try to attach any general meaning to them. Nor should SQL's NULL handling be a model for any other language. For example, 1 + NULL is NULL, but summing the same two values with the SUM aggregate (which ignores NULLs) returns 1.


> The funniest unexpected outcome was the classic SQL model of using NULL. Simply, 1 + NULL = NULL, and we expanded that to all expressions.

> It kind of fell apart in something like: IF A = 1 AND B = 2 OR C > 3 THEN...

That is similar to how NaN values are propagated in the IEEE 754 floating-point standard.

However, a comparison with a NaN would cause an exception (in CPU or programming language, unless you're using a language such as C where FP exceptions are disabled by default).

And then there are several min and max operators that don't cause an exception on every NaN but which have different preferences for propagating NaN or values. The distinction is that a comparison affects control flow whereas min/max are still considered data flow.
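For what it's worth, the behaviour is easy to poke at from Python, where the invalid-operation exception is masked, so comparisons quietly return False rather than trapping:

  nan = float("nan")

  print(1.0 + nan)      # nan   -- arithmetic propagates NaN (data flow)
  print(nan < 1.0)      # False -- every ordered comparison with NaN is False
  print(nan == nan)     # False -- NaN is not equal even to itself

  # Python's builtin min() just uses <, so NaN makes the result order-dependent,
  # one reason IEEE 754 ended up defining several distinct min/max operations.
  print(min(nan, 1.0))  # nan
  print(min(1.0, nan))  # 1.0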


> But when you get it to work, it's magic. When you get "a = a + 1" to work, and know that where 1 line of code works, 10000 lines of code will work, it's an amazing feeling. You just build the things a piece at a time, testing all the way.

This is it right here. Ever since I started dipping into this stuff, it's been one of the most intoxicating drinks for me in all of programming. You just get a few little pieces working, and then you know you've created infinite permutations of working programs. Each feature you implement is another infinity of possibilities. There's nothing quite like it.


This article addresses the technical aspects of writing a programming language. However, regardless of the approach you take, it can easily consume you since there is no true target to reach. I wrote about that recently: https://write.as/loke/dont-write-a-programming-language


As someone 2 years deep into implementing a language, I enjoyed your article

A minor typo spotted:

> it is impossible [not] to keep thinking about ways to improve your project.

Indeed the most productive days are those when I decide a major feature is unnecessary


Thank you very much. Fixed.

And your comment resonates strongly with me. I have experienced exactly the same thing.


One "taks" instead of "task" typo I spotted too.


Surprised that the article doesn't mention Forth, which is probably one of the easiest languages to implement, and one of the easiest to extend.


This seems to be written by someone who is bragging about their technical capabilities, not someone interested in invention, discovery, inspiration, and trial and error.

There is a time for 'the right' approach, and there is a time for crazy innovations that can take something to the next level.


I can confirm, I'm building a language with all sorts of crazy innovations and did not take the approach that this article does. However, now that I'm 80-90% of the way to v1 with my syntax and features finalized, I'm actually starting to go back through and clean up the code in ways that formalize it a bit more.


Study Java and C# to be "enterprisey"...

What's their definition of "enterprisey"? It feels somewhat like an implied pejorative.

Maybe this could be rephrased as "study Java/C# to understand why many businesses choose these languages to build things that make them money"?


I take “enterprisey” to mean “optimizing for adoption by businesses, rather than for other qualities”.


I wrote the equivalent article 8 years ago:

https://www.digitalmars.com/articles/b89.html


And a rebuttal (or really just an article that takes a very different stance) written in response to Dr. Bright's article in the parent comment. While I don't necessarily agree with it, I thought it brought out some very interesting food for thought.

https://medium.com/hackernoon/considerations-for-programming...


I'm flattered you addressed me as Dr, but I only have a BS degree and am not entitled to be called Dr. But you may call me Ruler of the Universe if you prefer.

Thanks for pointing out the riposte, I did not know it existed.

Amusingly, C++ has adopted a number of D's innovations!


Err, I'm sorry, but why are you entitled to be called Ruler of the Universe? I have seen photos of you and there are no equidistant markings anywhere on you.


This is a lot of discussion about how to design a programming language without a mention of why you're designing a programming language.

I mention why because different whys have different orderings of hows. (For example, if you're interested in some new semantics, working on a parser first is probably counter-productive.)


More than a programming language, what about new paradigms :)


To those who have created languages: should I target LLVM, or is transpiling to C (as I intend to) a viable way?

Transpilers to C seem rare, maybe for a reason...


You could compile to C in the first revision of the compiler, but write it so that you could change to an LLVM back-end in the future. That is, unless your language does something that isn't a good fit for C, such as exceptions or continuations.

But do also look into other parts of the LLVM frameworks, especially MLIR.

Historically, there have been many compilers that produced C code, but some may have done so because of a lack of compiler frameworks. The first C++ compilers compiled to C. Eiffel compiled to C. Nim compiles to C. I've also used a compiler framework that compiled to Java.
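As a toy illustration of the compile-to-C-first route, here is a hypothetical Python emitter that turns a tiny expression AST into a C function. It has nothing to do with any of the compilers mentioned above, and a real one would also have to handle declarations, types, and C's overflow and promotion rules:

  # Toy "transpile to C" sketch: turn a tiny, made-up expression AST
  # into a C function (illustration only).
  def emit_expr(node):
      kind = node[0]
      if kind == "num":
          return str(node[1])
      if kind == "var":
          return node[1]
      if kind == "bin":                     # ("bin", "+", lhs, rhs)
          _, op, lhs, rhs = node
          return f"({emit_expr(lhs)} {op} {emit_expr(rhs)})"
      raise ValueError(kind)

  def emit_function(name, params, body):
      args = ", ".join(f"long {p}" for p in params)
      return f"long {name}({args}) {{\n    return {emit_expr(body)};\n}}\n"

  ast = ("bin", "+", ("var", "x"), ("bin", "*", ("num", 2), ("var", "y")))
  print(emit_function("f", ["x", "y"], ast))
  # long f(long x, long y) {
  #     return (x + (2 * y));
  # }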


One of the bigger problems of transpiling to C is inheriting its oddities, such as the definition of overflow (or lack thereof) depending on the signedness of integers, etc., etc. Personally, I'd argue that transpiling to C combines most of the downsides of a new language with most of the downsides of C. I will admit, though, that the zero-effort FFI support of the approach is quite enticing.


You can use C to create any system you want, including your preferred overflow, signedness, etc. Yes, it's that powerful. Then transpile to what you've created... which would still be transpiling to C. C lives outside the box; try to think that way.


Without compiler extensions galore, no, C is not that powerful. Until C11, you couldn't even control alignment without hacky union workarounds that weren't great. You simply can't model exotic integer types in C like you can in LLVM IR. You can't disable struct padding without a GNU extension. The variance and portability of libc is pretty annoying too; you'd be surprised how many GNUisms you thought were in the spec. But worst of all, C's integer behavior is extremely unintuitive and basically forces you to cast every operand and every binop result to ensure desired behavior in all cases. You totally could transpile to C, but you'll basically have to master the C spec & its edge cases, write everything out in single-static-assignment form with explicit casts everywhere, avoid most of libc, use at least C11, accept some shortcomings, and avoid using most C language constructs and basically write assembly in C. Either do all that, or you'll inherit some of the quirks and shortcomings of C (and have your C compiler misinterpret your code). And even then, it probably won't optimize that well. Please, just output LLVM IR or something.


Yes, that's how I see it. C can do anything - albeit arduously - and in the process of writing that transpiler to C in C I hope to make a huge leap in my understanding of C's model and of constructs in general.


I'm creating a language and transpiling to C. At this point I'm about 80-90% of the way to v1 and transpiling to C is working well so far but I'm aware of the possibility that I may need to change it in the future. However, I absolutely love that a programmer using my language has the ability to write C code when needed, similar to how Python has C extensions.


Transpiling to C is certainly possible but the infrastructure for things (e.g. DWARF data for debugging) is much better when using LLVM IR directly. IIRC some operations are more pleasant (as in LLVM's definitions are more consistent/have fewer gotchas).

That said, C transpilation is certainly a viable route (e.g. Idris does this).


Will Crichton at Stanford argued for targeting Rust rather than LLVM to get the benefits of a safer intermediate language and a standard library for free.

It is interesting to consider the trade-offs even if you go with C or LLVM:

https://willcrichton.net/notes/rust-the-new-llvm/


I think a useful article in this vein would be building useful DSLs for real world applications.


+1 I think this is a much more valuable exercise, and much harder, because it requires knowledge of the domain.



