Reference implementations of medium sized applications are incredibly useful for leveling up as a programmer. While there are many large successful open source applications, many are overwhelming to read and learn from.
Having something that outlines the key features and components and which ignores the important but complicated edge cases assists in keeping the attention focused.
Now if there are annotations within the source code, that would be truly incredible.
I found that another hard thing that feels like a prerequisite for leveling up further is getting the feel for design/architecture of medium-sized systems. There's a pretty awesome resource for that particular need - the series of books called "The Architecture of Open Source Applications".
I'm just through the few first chapters of the first book, and I must say it's absolutely amazing. Each chapter gives some understanding of the thought process people designing (and iterating on) a known open-source project had.
> getting the feel for design/architecture of medium-sized systems
Yes. This is exactly the kind of thing I was once looking for, only I didn't know how best to phrase it at the time. I think I've since found some of the details I needed then, but I will look into these nonetheless and see what I missed, if anything.
Corrode is absolutely incredible. This file is literate Haskell, which means there's more documentation than code (I guess), and it transforms C into Rust.
Wholeheartedly agree. Making my own lisp was an eye opener on how programming languages work - and that's after I've read a lot of theory on the matter.
By the way, is there a similar resource for building your own relational SQL-based database?
> While there are many large successful open source applications, many are overwhelming to read and learn from.
As long as the first commit isn't something like 'import to git' or 'add the code' (which seem to be tragically frequent), I find VCS a huge help here. The problem is that large OSS applications are (tautologically) large. A VCS allows cutting that back and showing the evolution.
Minix (mini Unix) is a teaching operating system that goes with the well-known OS book "Operating Systems Design and Implementation". We used it at Cal. Very fun.
A trivial thing, but README.md for me says "Micro Emacs in D", where it should say "Micro Emacs in C" (the repo description on Github is correct though).
It's my primary editor. I'm so used to it I don't even know what the commands are - I just watch my fingers do it.
The first thing I do with a new system is port ME to it. I'll use vi to do what's necessary to get there (usually get ssh working so I can edit the files with ME on another machine).
I've worked out how to do syntax highlighting on it, but never got around to doing it.
Another thing to like is it has no configuration files. This mattered a great deal in the DOS days because floppies were so slow! But even today, who wants to futz with such? If it needs configuring, I just tweak the code and recompile it.
Back in the 80s, I handled configuration by having ME directly patch the ME executable. (This was a trick I learned from the old ADVENT Fortran game.) It was marvelously simple and bulletproof to do that.
Unfortunately, programs that patched their own executables became huge no-nos as malware took off, and I had to abandon that.
I wrote quite a few DOS programs in the late 80's/early 90's that used ini files appended to the .exe as the configuration. All you had to do was run `copy prog.exe + config.ini program.exe` to build a single file that could be passed around without worrying about either a lost or changeable config file. Your code just needed to read its own exe header to figure out where to offset into it to pull out the ini contents after opening the file in read-only mode. It worked very well for apps where you didn't want users to mess with the config.
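For anyone curious what that offset calculation looks like, here is a minimal sketch, in Python rather than the DOS-era code. It assumes a plain MZ executable, where the header stores the number of 512-byte pages and the number of bytes used in the last page. The names `mz_image_size` and `split_appended_config` are made up for illustration and are not taken from the programs described above.

```python
import struct

PAGE = 512  # the MZ header counts the image in 512-byte pages


def mz_image_size(data):
    """Size of the executable image according to the MZ header fields."""
    # Offset 0: "MZ" signature, offset 2: bytes used in the last page,
    # offset 4: total number of pages (all little-endian 16-bit values).
    signature, last_page_bytes, pages = struct.unpack_from("<2sHH", data, 0)
    if signature != b"MZ":
        raise ValueError("not an MZ executable")
    if last_page_bytes == 0:
        return pages * PAGE  # the last page is completely full
    return (pages - 1) * PAGE + last_page_bytes


def split_appended_config(data):
    """Split file contents into (executable image, whatever was appended)."""
    size = mz_image_size(data)
    return data[:size], data[size:]
```

Everything past the size the header reports is the appended ini text, so the program only needs to open its own file read-only and seek to that offset.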
Mine was a matter of grouping the global variables for configuration together. Take the address of it, compute the offset of that address in the .exe file, and write.
It's my birthday today and after a peaceful morning routine I hadn't yet decided what I would do today except that I would do whatever I felt like doing. I felt like writing a text editor in C.
I think Kilo is a great little project, well-structured and very educational.
Another useful resource I've relied upon in the past, dates back to the 1990s: Freyja, which is Craig Finseth's emacs-like editor written in C.
Here is a list of features:
* deletions are automatically saved into a "kill buffer"
* ability to edit up to 11 files at once
* ability to view two independent windows at once
* integrated help facility
* integrated menu facility, with help on all commands
* can record and play back keyboard macros
* supports file completion and limited directory operations
* includes a fully-integrated RPN type calculator
It was designed for MS-DOS with the Cygwin terminal library.
I found the architecture to be very clean, and it is well explained in Finseth's classic book ("The Craft of Text Editing"). The book is worth reading even if you never touch the code: http://www.finseth.com/craft/
It uses a multi-buffer architecture roughly similar to Walter Bright's text editor (see sibling posting). (I knew about Finseth's editor years ago, but was not aware of Bright's work until now, thanks Walter!)
SciTE is a bare-bones open source text editor created by Neil Hodgson to exercise his "Scintilla" text-editor C++ library, which is used in others like Notepad++.
In hindsight, I would have a bit more caution programming text editors. I started tweaking and modifying SciTE years ago. It was very interesting, but it was no small undertaking, and I came to understand why Neil advised in the support forum to customise it using the inbuilt Lua scripting. I'm still using this 6 year old customised version of SciTE that I never managed to sync with the latest version, and it has 10 thousand lines of custom Lua facilities like file encryption, navigation panels, multi-edit mode etc., which I wrote and stabilised a few years ago. I rarely venture to alter it now that I am at last comfortable with it, but it's going to need serious attention sooner or later...
Okay, I won't lie, this is a bit of a side-swipe but...
... it sounds like you kind of wanted emacs. One of the most impressive things I find about emacs (especially since semi-proper packages became a thing) is just how easy it is to get stuff that is 5-10-15 years old working on it. No word of a lie, it's amazing how they've managed to break so little over the years.
No, I appreciate the tip and have often considered diving into Emacs or Vim. I'm very fond of the SciTE and Lua combo as well now, and I would like to have the time to share and develop it more than to set up on a new system. But yes, those other systems do look good...
Yes, it's great: a super efficient and fast little VM with comfortable syntax and scoping. I realised how great Lua is for separating user complexity from an application's core as my customisations got more substantial.
Over at howl.io we've been working on another Lua/moonscript based editor. Incidentally it used scintilla as well until we switched to an in-house engine called aullar. The author wrote a blog post about it: https://howl.io/blog/2016/05/26/introducing-aullar.html
Yes it's great. Interfacing with new libraries is easy with LuaJIT's FFI and implementing low level stuff works well too (e.g. something like reverse find on a string basically gets JITted to C speed).
I wrote an open source editor with some buddies 10 years ago and still use it on a daily basis. That was built around codemax. A month ago, me and one of those friends picked it up again for a new revision and started retooling using Scintilla. Scintilla is an amazing control, very well documented and it has a lot of features.
Some of the standard features from codemax have to be rebuilt, but I'm still very impressed.
Oh and before somebody plugs their favorite editor and why I should be using that instead, if it doesn't know DataFlex it won't help ;)
I also tried to make an editor 5 years ago, because I was not satisfied by either notepad++ or sublime text. I was checking out notepad++'s code, and I was quite amazed by the fact that making a text editor was not some simple task.
The syntax highlighting functionality alone is a combination of something as complex as what a compiler/parser does, and you have to do it almost in real time.
So if you want to color every different part like operators, symbols, braces, with a different color, and you try to do this on C++, it is not going to be a small task...
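To make the "almost a compiler front end" point concrete, here is a rough sketch of the per-line tokenizing step, in Python for brevity. The token classes, the tiny keyword set and the name `highlight_line` are assumptions for illustration. This is not how Notepad++ or Scintilla do it, and a real highlighter also has to carry state across lines (block comments, multi-line strings) and re-run only on the lines that changed.

```python
import re

# Hypothetical keyword set for a C-like language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char"}

# One alternative per token class; earlier alternatives win.
TOKEN_RE = re.compile(r'''
    (?P<comment>//[^\n]*)              # line comment
  | (?P<string>"(?:\\.|[^"\\])*")      # double-quoted string with escapes
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<brace>[{}()\[\]])
  | (?P<operator>[<>=!]=|&&|\|\||[-+*/%=<>!&|^~?:;,.])
  | (?P<space>\s+)
  | (?P<other>.)                       # anything unrecognised, one char
''', re.VERBOSE)


def highlight_line(line):
    """Return (kind, text) pairs; an editor would map each kind to a colour."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens
```

Even this toy version has to make lexer-style decisions (is `//` a comment or two operators?), which is why doing it on every keystroke gets expensive quickly.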
> So if you want to color every different part like operators, symbols, braces, with a different color, and you try to do this on C++, it is not going to be a small task...
Quickly looked at your source without the slightest knowledge of Nim language. It was certainly Python-like in ease of reading like some here told me. Nice, clean look to it vs the C stuff. Exception is types.nim file that just looks syntactically rougher than others with the asterisks. It looks like class or struct definitions but I'm curious if the asterisk has an effect like it does in C or what's it do?
The asterisks are what define a symbol as "exported" from a module -- every symbol is private to the module scope by default, but able to be exported by applying the asterisks to it; types.nim is funky because I was exploring a slightly different file/module structure (and having a singular place for top-level shared types that most other modules import is quite useful at times, and gets around circular dependencies), except it does end up looking slightly odd!
That's it. For the most part, one would just export the symbols from given modules as needed (and a lot of the properties on some of the ref object types are needlessly exported, too, which makes it worse)
One other thing is that the {.compile.} pragma in term.nim is not needed anymore, as Nim's stdlib has added a lot of those in[0], but it does show how easy it is to bridge between the two languages (and I'm not much of a C developer!)
Under 50 lines for a text editor written in K by the language's author. Way beyond my present understanding, but the promise of very small, powerful code is incredibly attractive.
It is, but the little I've seen of K implies that its small size comes from a combination of two things:
1) A standard library / set of operators that are a very good fit for the typical domains it works on.
2) Minimizing symbol length.
E.g. I translated one example I saw into Ruby, and ended up with something of similar length once I 1) implemented equivalent methods, 2) dispensed with all idioms for how to write Ruby and went for single character variable names and method names etc.
In other words, there's nothing particularly "magic" there.
K code gets to where it is largely because its author and users are willing to violate every convention from other languages in terms of how to write and structure code, in pursuit of a philosophy that is fundamentally different in terms of e.g. focusing more on code size.
That could be a good thing, but I'm not convinced that the extremely sparse code is worth the (to me at least) extreme lack in readability.
Code golf can be fun in any language, but it seems few of us have gone back to writing other code and decided it's worth aiming for code that small. In fact, I've more than once rewritten code to be longer because it made it easier to read.
That said, there are good parts in terms of language constructs etc. that'd be worth learning from. If only it wasn't so incredibly annoying to decipher the code (yes, I'm sure it gets faster when you get used to it).
Just to be that guy. Yeah, it's cool that the language author built a text editor in under 50 lines, and there's a sort of geeky hacky appeal to trying to cram as much functionality as possible into as small a space as possible.
But what does it actually mean in terms of programming? How maintainable is "small code"? How readable is it? (Well, you answered that question already.) How hackable is it?
It's a curiosity, and a really fun one! But it's not practical.
I wouldn't say the code is really that small in its language context, which is why comparing code by lines is inherently fallacious.
It's just that a large percentage of languages are quite similar in how they're structured. Since Pascal and at least until Java/C#, most mainstream languages ended up roughly doing "one thing" per line. Quite often one function call or simple mathematical operation. Then each function or block of a larger one does one larger thing. And so forth.
Code and/or languages that break that paradigm are often confusing to "switchers" and thus often abandoned or maligned way too early.
But most often, they're just scanned differently.
Assembly would be the opposite end. It basically takes "do one thing per line" to the extreme. But quite often, experienced asm programmers scan the program by blocks, as some patterns are quite common or you find some constant/string to attach your focus to, then continue from there into the details.
Forth programmers obviously read slightly differently, due to the high level of decomposition and the stack based nature (fewer parameters).
Lisp looks a bit weirder at the first glance, but I wouldn't even say that it's read all that differently from the Algol family, if written imperatively enough.
Functional code, in almost any language, often has to be read differently, as a lot of things can happen in one line.
APL and some DSLs (regular expressions for example) are the opposite end of the spectrum. But line length would be the wrong axis to judge things, the amount of operations isn't necessarily that much smaller.
Data exchange (arrays vs. stacks vs. function parameters) is the bigger change, as is symbolic density (APL symbols vs J shorthand vs. function names vs. HiHowYouDoingIAmAnAbstractJavaBeansFactoryConsumerImplementationNiceToMeetYou)
> But what does it actually mean in terms of programming?
Without trying to be too trite, what does this question mean in terms of English?
> How maintainable is "small code"?
Once you get the hang of it it's as maintainable as any other code base
> How readable is it? (Well, you answered that question already.)
There is a learning curve, but the code is actually reasonably readable. It takes time to get used to many operations occurring in one line but there are benefits (eg you can see everything that the CTRL-Z function for undo does at a glance).
> How hackable is it?
Again, I'm not sure what you're asking here.
The editor is very bare bones so isn't a great example of production code. In a real system generally people are a bit more verbose.
But super-compact code could be very practical/readable/hackable, as long as there's the prerequisite knowledge. Fewer symbols means less complexity to parse. This works, provided that those symbols map to powerful operators that can be really understood and effectively combined to produce the desired outcome.
And there's no need to scroll: perceive everything in one glance!
If you are looking for an interesting project that uses terminal input modes try writing a command line utility that reads in a password correctly. (eg. Masks the password input so it does not appear on screen whilst it is being typed, prevents someone from scrolling up in terminal and copy/pasting the typed password and zeros out the password after use to prevent it being visible in process address space).
It is surprisingly tricky to get right. There used to be a POSIX function in the C standard library, "getpass", to handle this, but the implementation was not thread-safe and thus it was deprecated from the POSIX spec. The only portable way I know to do this is to roll your own code using termios, which is not ideal.
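A minimal sketch of the "roll your own with termios" approach, written in Python (whose `termios` module wraps the same POSIX calls) rather than C. It only covers turning echo off and restoring the terminal afterwards. The helper names `without_echo` and `read_password` are made up for illustration, and the sketch does nothing about zeroing memory, since Python strings are immutable.

```python
import sys
import termios


def without_echo(attrs):
    """Return a copy of a tcgetattr() result with local echo switched off."""
    new = list(attrs)
    new[3] &= ~termios.ECHO  # index 3 is the local-modes field (c_lflag)
    return new


def read_password(prompt="Password: "):
    """Read one line from the terminal without echoing it."""
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        # TCSAFLUSH also discards any input typed before echo was disabled.
        termios.tcsetattr(fd, termios.TCSAFLUSH, without_echo(saved))
        sys.stderr.write(prompt)
        sys.stderr.flush()
        line = sys.stdin.readline()
    finally:
        # Always restore the terminal, even if the user hits Ctrl-C.
        termios.tcsetattr(fd, termios.TCSAFLUSH, saved)
        sys.stderr.write("\n")
    return line.rstrip("\n")
```

Because nothing is echoed, there is nothing in the scrollback to copy. Clearing the secret from memory would need a mutable buffer (a `bytearray` filled via `os.read`, say) and is left out here.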
Yeah, this is still a problem. Trying to write a simple CLI in node and the library I'm using to prompt a user for their credentials only replaces the user's input with asterisks. There is still an open issue for properly masking the password input: https://github.com/SBoudrias/Inquirer.js/issues/177
I'm curious if anyone here regularly codes in a text editor they wrote themselves?
I've often thought of coding one for fun, with no intention to share it, just for the purpose of having a long-term project that evolves along with my skills. I've never made time for it, but I still consider it once in a while.
I do, sort of. I bought a text editor a number of years ago that was very popular. I still maintain it and use it daily and have often thought about putting out a new version. I have it running on many OS's and devices now. I have also added a lot of features specific to what I wanted. I'm not really sure if they are useful to others or not.
Example: I have an iPad version where I can use an Apple Pencil and handwrite my code on the screen. To me it is very useful when my wife is driving me someplace and I still want to work. Recline in the passenger seat and code.
I don't know of others that do this at this time. I didn't actually do any research. It is something that I wanted for myself and the way that I like to work. I prefer to hand write as much as possible. I was trying out a Surface Book just because I could hand write my e-mails!
It is working pretty well for me. I used OpenAI and basically started asking my friends to hand write samples for the NN to learn from. They write them various levels of neatness and various slants, font size, etc. I store the handwriting strokes as a series of points and convert to digital text when they are ready. I save the handwriting along side the digital version. It is possible to hand write, then come back days later and your work is still there. Maybe I could make a video to show if it is useful.
QEdit's configurability was amazing. Back in the early 90s I had a completely hand built config, a feat I have never dared to repeat with subsequent editors. I wrote a lot of code in it at home, and nagged my employer into buying some licenses for the sophisticated programmers among the staff (it was a bank, so not many, but a few!). I would edit locally and FTP to the server rather than use vi directly there!
When my computer illiterate aunt decided to write a book back in the early 90s, I set her up with a minimalistic QEdit, a few bat files to perform backups, versioning, etc automatically and a floppy disk for each day of the week plus a daily backup one. Simple instructions and process, simple editing setup, and two years later the 670-page book was finished and published, and she was ready to actually learn how to use a computer.
So a big thank you for creating such an excellent tool!
I made significant changes to antirez's kilo editor myself, adding in support for Lua-based scripting. I use that for composing emails, again in a lua-based mail client that I wrote myself.
For coding though? I usually use emacs. It's just so damn customizable..
I used to ~20 years ago. It was a curses-based mini-emacs in C; the fun part was a notebook-style interface to a Forth-like language I added in. Eventually I wanted the same kind of interface for Python programming in Emacs, and moved on to that with https://github.com/darius/halp.
I would not say that I use it regularly, but I have a somewhat working version of what I remember to be the Norton Editor in C++ / ncurses. It was 100% motivated by seeing antirez's artistic triumph, and my effort is sadly lacking in comparison. Still, I look at it every few weeks and add a new feature, and now I am mostly editing the code in the editor itself, which is gratifying.
I actually need to get off my duff and code what I really want, which is a version of what I remember the best parts of the Norton Editor implemented as an emacs mode. I have great memories of writing pascal code in ne.com, so it is a big nostalgia item for me.
I had a copy that I found on an old floppy about 15 years ago and I tried to see if I could use it in a Windows command window, but unfortunately it did not seem to work at all. Now that you mention a virtual machine though, I regret that I did not try and setup a FreeDOS VM.
Googling "norton editor manual pdf" led me to a few old copies of the manual, so that is what I have used as a guide in my work, but most of it is just driven by how I remember it.
Regularly? No, but I got https://github.com/oconnor0/build-your-own-editor to the point that I use it for small editing tasks or occasional remote work. I have a few commits that were done entirely in it. I don't think I'd use it over vi except that I wrote it and want to use it. It also has keybindings far closer to what I'm used to in Sublime Text 3.
I got fed up with the standard offerings back in '07 or so and hammered out my idea of a code editor that's perfect for me over a long weekend. It mmaps files initially for instant response, it has a small sensible command set that I can remember completely, it depends on nothing more than the standard C library and a terminal emulator, and it's not many more lines of code than some Emacs configuration files that I've seen. I still use it for everything.
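For readers wondering why mmap gives "instant response": the file is mapped rather than read, so the OS only pages in what the editor actually touches. Below is a small Python sketch under that assumption. It is not the editor described above, and `map_file` and `line_starts` are hypothetical names.

```python
import mmap


def map_file(path):
    """Map a file read-only; pages are loaded by the OS on first access."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Note that mmap rejects empty files.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def line_starts(buf, limit=None):
    """Byte offsets where lines begin, stopping once `limit` lines are found.

    Works on any bytes-like object with find(), including an mmap.
    """
    starts = [0] if len(buf) else []
    pos = 0
    while limit is None or len(starts) < limit:
        newline = buf.find(b"\n", pos)
        if newline == -1 or newline + 1 >= len(buf):
            break
        pos = newline + 1
        starts.append(pos)
    return starts
```

To draw the first screen, an editor would call something like `line_starts(map_file(path), limit=rows)`, which touches only the first few pages of even a very large file.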
If you're so sensitive to superficial syntactic issues you're gonna miss out on a lot of reading/learning pleasure in this life. (It's one of my pet peeves: http://akkartik.name/post/readable-bad)
I was talking about the ability to debug code. Various IDEs/compilers sometimes have issues and bugs with multi-statement lines in DWARF etc. I've worked on a low-level debugger, so it was a big problem, even a few years back.
Two function calls plus arithmetics plus step-return.
Depending on the debugger interface (Eclipse with gdb, VS, etc.), one will have to press some form of Step 1 to 4 times to advance the line. Depending on the compiler that was supposed to provide correct debug info (GCC vs. MSVC vs. xlc) and its bugs, these steps may or may not work.
It is short, but when you do multi platform code, a real pain to deal with.
Consider AIX and Linux multi-platform code for one of my past projects. My choices would have been:
1. Step and maybe hit a bug, message the compiler team, proceed writing register values on paper.
2. Break up the line, recompile. A few minutes on Linux, over half an hour on AIX.
This is a pretty common way to write C, though, it's not something specific to this particular codebase. You just had a non-standard use case where you were constantly running into low-level bugs. (In an embedded platform?) If you aren't in that domain anymore, it's worth revisiting the trade-off. In most domains, gdb's continue command is super useful.
No. Parent is correct. Writing code just to show how clever you are, at the expense of others who need to debug and understand it later, is poor taste.
It's pretty weird to see "correct" justified by calling the opposite "poor taste". We're talking about a mismatch between a style of code and certain tools; a better answer is to use tools that work well with the style of code you like.
I agree that naming expressions often helps understanding, and that's worthwhile -- I guess I just disagree about this code at first glance. Maybe if I tried to delve into it.
Thanks for making this! It is obviously a labor of love. This kind of incremental building is a lot of work, but adds a revealing dimension not seen in a flat annotated source.
The next fun experiment is to handle gigabyte files without undue performance troubles with a lively ui thread! This is where scintilla et al run into limits.
Scintilla is an editor for source code. Normally code isn't gigabytes in size and doesn't need a lexer, code folding or syntax highlighting. Not saying that it can't be done, but if you open a file like that then turning off those features would already make it a lot easier to handle huge files.
Just have to say, this is ridiculously detailed, which is great. I'm not looking to build a text editor in C, but I learned a good amount just skimming the first few chapters.
Great site, great presentation, thanks for digging into it with so much detail.
Would something like this be a way for a beginner to C (barely any experience) to get their feet wet? Or is there an expectation of basic familiarity with C already?
Yep, lots of Ruby/PHP/Python/Javascript, some Java. So I can comfortably read syntax in most languages, but obviously none of the languages I've used are as low level as C.
Yeah, brush up on pointers a tad bit (just have a look at some youtube videos, honestly), if you are uncomfortable with them, and then you should be good to go.
I'm working on an editor in JavaScript. You would be surprised how fast string operations, like concatenation, are in JavaScript! You can hold the entire buffer in a String! While browsers render text very well, the DOM is relatively slow to interact with, but there are other ways to render in JavaScript, for example the Canvas, or into a terminal, or even stream a video, or talk directly to a display.
marginally relevant: I was looking for a terminal text editor for git commits and other similarly simple tasks: my only requirement is that I can save&leave with ^D. Any suggestions?
You can add simple Emacs-like keybindings by changing that into
$ echo 'rlwrap cat > "$@"' >editor
Or just learn ed:
$ GIT_EDITOR=ed git commit --amend -a
256
1c
stuff and more stuff
.
w
271
[master c5092a6] stuff and more stuff
Date: Thu Apr 6 12:58:18 2017 +0200
1 file changed, 1 insertion(+)
(the numbers are ed telling me how much was read and written; `1c` means change the first line; `.` means I'm done inserting (go back to command mode) and `w` means write/save; exit with ^D)
for some strange reason the 'Fira Mono' google font is displaying all &s as |s. Anyone else seeing this? Cool book though, I'm going to work through it this weekend.
You may have a broken version of the font locally. If you download the offline version of the tutorial, it'll come with all the fonts it uses and hopefully will work.
For better portability, terminfo should be used instead of hard-coded terminal sequences. Otherwise, this is a really great intro. I liked the beginning with how to configure your terminal.
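As a rough illustration of the terminfo suggestion, here is a sketch using Python's `curses` bindings to the terminfo database instead of the tutorial's C. The function names are made up. It looks up the `cup` (cursor position) capability for the current `$TERM` and falls back to the hard-coded VT100 sequence only when no usable entry is found.

```python
import curses


def ansi_cursor_to(row, col):
    """Hard-coded VT100/ANSI sequence; row and col are 0-based here."""
    return b"\x1b[%d;%dH" % (row + 1, col + 1)


def cursor_to(row, col):
    """Ask terminfo how the current terminal wants the cursor moved."""
    try:
        curses.setupterm()            # loads the terminfo entry for $TERM
        cup = curses.tigetstr("cup")  # the "cursor position" capability
        if cup:
            return curses.tparm(cup, row, col)
    except (curses.error, OSError, ValueError):
        pass  # no usable terminfo entry, e.g. TERM is unset
    return ansi_cursor_to(row, col)
```

On an xterm-like terminal both paths should produce the same bytes. The difference only shows up on terminals whose escape sequences differ, which is exactly the portability case.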