Roll Your Own UNIX Clone

jacquesm · on Aug 4, 2009

Do it! By all means, really, seriously. You'll learn more about coding complex systems than from any other project you've ever done.

It's absolutely doable.

If you want to make your life easier, go for a microkernel with message passing. That will partition the tasks to the point where a single determined person can go all the way from MBR to prompt.

Runner-up in the experience category: make a game from scratch that has multiple threads of execution, but use a single-threaded process. That's almost the same experience but completely in userland, which should make your debugging life a lot easier.
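To make the "multiple threads of execution in a single-threaded process" idea concrete, here is a minimal sketch of a cooperative round-robin scheduler. All names (sched_add, sched_run_once, countdown) and the convention that a task returns nonzero while it still has work are assumptions made up for this illustration, not anything from the article.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

/* A task does one small slice of work per call and returns nonzero
 * while it still has work left (hypothetical convention). */
typedef int (*task_fn)(void *state);

struct task {
    task_fn fn;
    void   *state;
    int     alive;
};

struct scheduler {
    struct task tasks[MAX_TASKS];
    size_t      count;
};

/* Register a task; returns 0 on success, -1 if the table is full. */
int sched_add(struct scheduler *s, task_fn fn, void *state)
{
    assert(s != NULL && fn != NULL);
    if (s->count == MAX_TASKS)
        return -1;
    s->tasks[s->count++] = (struct task){ fn, state, 1 };
    return 0;
}

/* Give every live task one slice; returns how many are still alive. */
size_t sched_run_once(struct scheduler *s)
{
    size_t alive = 0;
    for (size_t i = 0; i < s->count; i++) {
        struct task *t = &s->tasks[i];
        if (!t->alive)
            continue;
        t->alive = t->fn(t->state);
        if (t->alive)
            alive++;
    }
    return alive;
}

/* Example task: count an int down to zero, one step per slice. */
int countdown(void *state)
{
    int *n = state;
    if (*n > 0)
        (*n)--;
    return *n > 0;
}
```

A game loop would call sched_run_once() once per frame. Because tasks yield only by returning, there is no preemption and no saved register context, which is exactly the part a real kernel has to add.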

But if you want to learn all there is to know about memory management, interrupts, and interfacing with hardware, as well as process scheduling, build a kernel.

Older Dr. Dobb's issues contain tons of useful information.

Here is some food for thought: http://ww.com/task.cc.html and http://ww.com/task.h.html

Other sources of inspiration: the early Minix kernels and the early work on what is now known as FreeBSD by Bill and Lynne Jolitz (also described in Dr. Dobb's, IIRC).

Locke1689 · on Aug 4, 2009

> If you want to make your life easier, go for a microkernel with message passing. That will partition the tasks to the point where a single determined person can go all the way from MBR to prompt.

I don't know if I'd say that... For one, unless you are actually going to build vdevs and stuff, any kernel you build is going to be a microkernel. In my experience, however, I would not say that message passing makes your life that much easier. In addition, it's probably a good idea to write an ELF-compatible OS if you want a prompt. Writing a shell isn't that hard, but I wouldn't want to replicate the basic userland tools if I can just statically compile a busybox library. You should also plan to implement some sort of ramdisk if you want a usable environment (or virtual devs with drivers, but that would probably be harder). Actually, I'd recommend just building off QEMU, since that'll give you a rudimentary device framework and keep you from having to debug on physical hardware (like I do).

jacquesm · on Aug 4, 2009

Message passing will make the kernel dead simple; after that it is all userland. This makes it much, much harder for one device driver you're writing to crash another (or the kernel) by data corruption. A simple postmortem crash inspection will tell you what went wrong. Instead of doing that inspection on a whole kernel, you'll just have that one little process that went under, and the rest of the machine is still working.
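As a rough illustration of why message passing limits data corruption, here is a sketch of a fixed-size mailbox where messages are copied by value, so sender and receiver never share memory. The struct layout, the slot count, and the names msg_send and msg_receive are all assumptions for this example; a real microkernel would add blocking, per-process mailboxes, and permission checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_PAYLOAD   32
#define MAILBOX_SLOTS 4

/* Hypothetical message format: who sent it, what kind it is, raw bytes. */
struct message {
    uint32_t sender;
    uint32_t type;
    uint8_t  payload[MSG_PAYLOAD];
};

/* A small ring buffer owned by the receiving process. */
struct mailbox {
    struct message slots[MAILBOX_SLOTS];
    unsigned head;   /* index of the next message to read */
    unsigned count;  /* number of messages currently queued */
};

/* Copy a message into the mailbox. Returns 0, or -1 if it is full. */
int msg_send(struct mailbox *mb, const struct message *m)
{
    assert(mb != NULL && m != NULL);
    if (mb->count == MAILBOX_SLOTS)
        return -1;
    unsigned tail = (mb->head + mb->count) % MAILBOX_SLOTS;
    mb->slots[tail] = *m;   /* copy by value: no shared pointers */
    mb->count++;
    return 0;
}

/* Copy the oldest message out. Returns 0, or -1 if the mailbox is empty. */
int msg_receive(struct mailbox *mb, struct message *out)
{
    assert(mb != NULL && out != NULL);
    if (mb->count == 0)
        return -1;
    *out = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_SLOTS;
    mb->count--;
    return 0;
}
```

The copy in msg_send is the design point: a buggy driver can send a garbage message, but it cannot scribble over the receiver's data structures.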

I don't think I could have ever gotten my little OS to the self hosting stage without that.

It still took more time than I care to remember though :)

A virtual machine is a great tool as well, especially if it gives you access to the cpu contents of the running VM.

That will save you days if not weeks trying to find out why your switch from 'real' to 'protected' mode during the boot didn't work...

Locke1689 · on Aug 4, 2009

Yes, that's why I recommended QEMU (which is a fast system emulator). A simple postmortem crash inspection will tell you what went wrong anyway. Simply write a normal micro handler to dump the registers at the end. Since you have the RIP, just objdump your binary and read the assembly from where it took the #GP or #PF. If you blew the stack away, you should automatically know from RSP, etc, etc.
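A sketch of the register-dump idea, written so it can be tried in userland: the handler's job boils down to formatting whatever the fault stub saved. The fault_frame layout and the name format_fault are hypothetical; the real frame depends entirely on your own entry code. Vector numbers 13 (#GP) and 14 (#PF) and the use of CR2 for the faulting address are standard x86.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot saved by the fault entry stub. */
struct fault_frame {
    uint64_t vector;      /* 13 = #GP, 14 = #PF */
    uint64_t error_code;
    uint64_t rip;         /* instruction that faulted */
    uint64_t rsp;         /* stack pointer at the fault */
    uint64_t cr2;         /* faulting address, meaningful for #PF only */
};

/* Format the frame into buf; returns snprintf's result. */
int format_fault(char *buf, size_t len, const struct fault_frame *f)
{
    assert(buf != NULL && f != NULL);
    const char *name = f->vector == 13 ? "#GP"
                     : f->vector == 14 ? "#PF"
                     : "fault";
    return snprintf(buf, len,
                    "%s err=0x%" PRIx64
                    " RIP=%016" PRIx64
                    " RSP=%016" PRIx64
                    " CR2=%016" PRIx64,
                    name, f->error_code, f->rip, f->rsp, f->cr2);
}
```

In a kernel you would send the string to the serial port or the screen instead of a buffer, then look up the printed RIP in the output of objdump -d on your kernel binary.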

Kitten[1] is the LWK which I work with, although my focus is on the Palacios VMM. It's quite a simple OS that is by no means out of a good programmer's reach.

[1] https://software.sandia.gov/trac/kitten

Locke1689 · on Aug 4, 2009

I'll just leave these here...

http://www.intel.com/products/processor/manuals/index.htm

[disclaimer: I'm an OS/kernel dev]

sown · on Aug 4, 2009

I gotta ask... if I go do this sort of thing on my own, I might produce something that won't segfault or oops()... if I were to walk into a job interview for an OS/kernel dev job and say I did write a crappy primitive kernel, would that look good or bad?

I've been working for almost 3 years now and this is an area that I would like to move into.

Locke1689 · on Aug 4, 2009

I'm not in industry but in academia, so I'm hesitant to say anything concerning commercial development. I can't ever see it being a negative thing, but the difference between a toy kernel like he is proposing and a real system is larger than many people think. For example, knowing how to use the GDT and the LDT is definitely required, but knowledge of SMP design and general hardware architecture (think chip design, memory hierarchy, etc.) is equally important. If you were looking for a career in systems design, definitely read the Intel manuals cover to cover -- then go look at the Linux/BSD kernel to see where things were explained so poorly that they may as well be wrong. I would also highly encourage you to take part in a systems open source project (since I work on virtual machines, I would of course suggest Xen or QEMU).

If you like, there is Kitten OS[1], a much simpler relative of the Linux kernel that is run at Sandia in conjunction with the Palacios academic virtual machine[2], which is what I'm working on. It may be worth looking at, although I still highly recommend that you look into actual production code.

[1] https://software.sandia.gov/trac/kitten

[2] http://v3vee.org/

dkersten · on Aug 4, 2009

The System Programming manual and the Instruction Set references (I do not have a copy of the Application Programming manual, unfortunately, and don't really want to read it in PDF form) are exceptionally good and have helped me back when I used to mess with toy kernels and also helped with Uni assignments. I'd definitely recommend them to anyone who is interested in kernel or assembly development.

mahmud · on Aug 4, 2009

But you can bootstrap a cheap Unix with chapters 5-10 of this:

http://www.logix.cz/michal/doc/i386/

Getting into protected mode and exec'ing processes in their own private space has been done with ~4 pages of asm.
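One of the first things the protected-mode switch needs is a GDT, and the descriptor bit layout is a classic source of boot bugs. The following is a minimal sketch of packing a legacy 8-byte segment descriptor; the function name gdt_entry is made up, while the field positions follow the layout documented in the Intel manuals.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a legacy segment descriptor:
 *   bits  0-15  limit[15:0]
 *   bits 16-39  base[23:0]
 *   bits 40-47  access byte
 *   bits 48-51  limit[19:16]
 *   bits 52-55  flags (granularity, size, long mode, available)
 *   bits 56-63  base[31:24]
 */
uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF);   /* limit is a 20-bit field */
    assert(flags <= 0xF);       /* flags is a 4-bit field */

    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);
    d |= (uint64_t)(base & 0xFFFFFF) << 16;
    d |= (uint64_t)access << 40;
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;
    d |= (uint64_t)(flags & 0xF) << 52;
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;
    return d;
}
```

For example, gdt_entry(0, 0xFFFFF, 0x9A, 0xC) yields 0x00CF9A000000FFFF, the familiar flat 4 GiB ring-0 code segment, and the same call with access 0x92 gives the matching data segment.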

Locke1689 · on Aug 4, 2009

Don't run in v8086 mode unless you have to (BIOS boot in Intel VMX). Just go directly to PE or LME as the new docs tell you. It may be interesting to look at the history though.

Also, while devs have to write portable code -- don't bother. It'll save you a bunch of time if you just write the IA32e specific asm code.

If anyone has any interest, I can also post my adaptation of an asmx86 vim syntax file I hacked up from something I found on Stack Overflow. It's not that great, but it does at least fix some of the register highlighting.

michael_dorfman · on Aug 4, 2009

I'd think that Tanenbaum's book Operating Systems: Design and Implementation (OSDI) and the included MINIX3 source would be high on the list of reading for any kernel-curious developers.

Locke1689 · on Aug 4, 2009

Good, but a little outdated. For general systems overview, as well as introduction to UNIX architecture, I'd actually recommend my textbook, which is the CMU intro systems book, Computer Systems: A Programmer's Perspective[1]. For OS theory, our OS/advanced OS textbooks are fine, but for actual implementation my coworkers recommended The Design of the UNIX Operating System[2], Linux System Programming[3], and Understanding the Linux Kernel[4].

[1] http://www.amazon.com/Computer-Systems-Programmers-Randal-Br...

[2] http://www.amazon.com/Design-Operating-System-Prentice-Softw...

[3] http://oreilly.com/catalog/9780596009588/

[4] http://oreilly.com/catalog/9780596005658/

ericmc · on Aug 4, 2009

I also recommend Computer Systems: A Programmer's Perspective to anyone who wants an intro to systems. One of the best textbooks I've read.

michael_dorfman · on Aug 4, 2009

Outdated? I thought the Linux Kernel was outdated in comparison. The Minix kernel was rewritten from scratch in 2006 for version 3.

Locke1689 · on Aug 4, 2009

OSDI (2nd) was written in 1997, perhaps you were thinking of this? http://www.amazon.com/Modern-Operating-Systems-3rd-GOAL/dp/0...

Minix 3 is a whole different ball game though. It's an interesting kernel with some interesting ideas, but I still recommend Linux as it's both more popular and reflects the majority of UNIX design decisions today. Even the OSs with praise for microkernel design tend to incorporate a number of monolithic features.

EDIT: Sorry! I had no idea there was a Third Edition of OSDI out http://www.pearsonhighered.com/educator/academic/product/0,,.... However, I should mention that I meant those Linux books to be read in conjunction with the kernel code -- the book is still valid today despite the additions to the kernel in the past couple years.

michael_dorfman · on Aug 4, 2009

The 3rd edition of OSDI is definitely worth reading. The MINIX3 kernel is only 4000 lines of code; very clean, and easy to wrap your head around.

christopherolah · on Aug 4, 2009

Writing a kernel is fun... I started writing one a few weeks ago. It's not a real one, just something to play with. But it's very interesting, probably because I'd never worked at such a low level before.

Great resource: http://www.osdev.org/

daeken · on Aug 4, 2009

OSDev and OSDever ( http://osdever.net/ ) are both great resources in general, but I can't help but mention #osdev on Freenode. Incredibly useful resource if you're in need of assistance.

asciilifeform · on Aug 4, 2009

Please don't! Build something original. It really is possible!

Once you've built a UNIX clone, you have polluted your mind as an OS designer. I did so (as a standard college homework assignment) and it took me years to shake the crud out of my head.

As for the tutorial, it is very well written, but x86-32-centric and therefore worthless. There are nontrivial differences between x86-32 and x86-64 from the standpoint of an OS author who wants to make full use of the latter's capabilities.
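One concrete difference is how a virtual address is split for the page-table walk: classic 32-bit paging (without PAE) uses two 10-bit indices, while 4-level long-mode paging uses four 9-bit indices. The sketch below only does the bit slicing; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit paging without PAE: 10-bit directory, 10-bit table, 12-bit offset. */
struct va32 { unsigned dir, table, offset; };

struct va32 split_va32(uint32_t va)
{
    struct va32 r = { (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF };
    return r;
}

/* 4-level long-mode paging: four 9-bit indices plus a 12-bit offset. */
struct va64 { unsigned pml4, pdpt, pd, pt, offset; };

struct va64 split_va64(uint64_t va)
{
    /* Bits 63:47 must be all zeros or all ones (canonical address). */
    uint64_t top = va >> 47;
    assert(top == 0 || top == 0x1FFFF);

    struct va64 r = {
        (unsigned)((va >> 39) & 0x1FF),
        (unsigned)((va >> 30) & 0x1FF),
        (unsigned)((va >> 21) & 0x1FF),
        (unsigned)((va >> 12) & 0x1FF),
        (unsigned)(va & 0xFFF),
    };
    return r;
}
```

The slicing is the easy part to port; the bigger changes are the entry format (8 bytes instead of 4), the extra table levels, and the canonical-address rule checked by the assert.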

daeken · on Aug 4, 2009

Building an 'original' OS when you don't know how to build an OS is like asking a kindergartener to design a car.

As for the x86/x64 differences, they're by no means insurmountable once you have the concepts down, which is exactly what this tutorial provides.

asciilifeform · on Aug 4, 2009

What makes you assume that blindly copying Unix is necessarily the easiest way to learn OS building?

daeken · on Aug 4, 2009

It very well could not be, but what's important here isn't what's built, but how it's built. The concepts are important, not the end result. Compared to the vast majority of intro to OS tutorials out there, this is great.

As I said, you can't expect someone to create something original unless they understand the mistakes of the past. I started off implementing toy kernels with no structure, moved on to Unix-like kernels, and eventually ended up in a realm entirely different. Just because someone starts out copying Unix doesn't mean they're going to be blind to other competing designs or completely unique ones.

Expecting someone to build something new without any basis is just foolish.

asciilifeform · on Aug 4, 2009

> Expecting someone to build something new without any basis is just foolish

Base it on your mental model of how your CPU's internals ought to be used, as inferred from the latter's manuals - like 1980s microcomputer users did.

Creativity exists.

daeken · on Aug 4, 2009

I want to become a musician. If I listen to existing music, my creativity will be limited by what I've learned. Does that mean that listening to music is a bad thing if you want to become a musician?

asciilifeform · on Aug 4, 2009

Music is a poor example: a field with unusually rigid cultural norms, where true originality is punished far more often than it is rewarded. So, taming one's creativity by exposure to current norms is exactly how you become a successful (permissibly mildly creative) musician.

jacquesm · on Aug 4, 2009

Who knows, sometimes being versed in the status-quo is a hindrance.

daeken · on Aug 4, 2009

If you don't understand the mistakes of the past, you're bound to repeat them. This is especially true if you haven't implemented a real-esque kernel already. Pushing someone away from a tutorial that does a great job of teaching the concepts while telling them to do something original is just ludicrous.

OS development is already difficult enough, why make it harder? For every ten people that read and follow this and decide to make the Next Big OS (TM) following Unix concepts, you might get one that sees something bigger.

jacquesm · on Aug 4, 2009

Ok, I can agree with that. So how do you feel about studying some of the more exotic variations on the theme (say, Plan 9) ?

daeken · on Aug 4, 2009

If you're serious about getting into OS development and doing something different, you have to study everything available. Plan 9 (and the closely related Inferno), L4 (my personal favorite microkernel), NT, BeOS, Singularity, etc. If you're going to build something unique, you have to sample everything.

asciilifeform · on Aug 4, 2009

This is terrifyingly bad advice, all the more so because no one here seems to understand why or even thinks there is anything to understand.

Learning has a direct cost in creativity. Once you have mired your brain in cached thoughts (http://lesswrong.com/lw/k5/cached_thoughts/) of Unix brokenness, it is very difficult to dislodge them. They won't feel like cached thoughts, or like anything special for that matter - just "the way you write operating systems."

Not one of the systems you mentioned deviates in any fundamental way from the mistakes of the original Unix. Not one. One of the reasons for this is that each was designed by people who have been steeped in Unix internals.

daeken · on Aug 4, 2009

Yes, learning has an impact (negative and positive) on creativity, but not learning has an impact on your ability to design and implement in general.

In addition, what Unix mistakes do you see in Singularity, for instance? It's drastically different from Unix-like kernels in effectively every way.

What OSes would you recommend that budding OS developers study, if not these? Amoeba is one of the few I can think of off the top of my head that might fit what you're looking for.

asciilifeform · on Aug 4, 2009

> What OSes would you recommend that budding OS developers study

Try this one:

http://www.memetech.com/

It is just a 512-byte bootblock demo, and yet it does something which no braindead Unix clone can: orthogonal persistence.

daeken · on Aug 4, 2009

Building a demo and building a full kernel are two very, very different things. In addition, it really says nothing about the design, but the implementation (again, we come back to implementation). There's no reason you couldn't implement orthogonal persistence in a Unix-like system, even if it's not optimal.

Honestly, I'm sort of baffled we're still arguing about this. I can't stand Unix-like kernels, I just believe that this particular tutorial is excellent at teaching the basic concepts required to put together any OS. If you find a tutorial of this sort of quality for any other design, submit it and I'll be certain to upvote it.

The world needs more OS designers, and tutorials of this sort lower the barrier to entry.

Edit: Also, if you're on Freenode by any chance, shoot me a PM (my nick is my username here). I always enjoy talking with someone who's as passionate about OS technology as I am.

jacquesm · on Aug 4, 2009

Thank you for that link, very interesting stuff!

Any insight on how this compares to other technologies out there? Forth?

JamesM · on Aug 9, 2009

As to your previous point, the tutorial series is aimed to be one thing and one thing only: A set of tutorials to help bridge the gap between the practical and the theoretical as far as OS implementation is concerned.

There are plenty of tutorials on ASM coding, and plenty on low-level C coding. There are plenty of books about OS internals. The books and articles I've found before tend to stay firmly in the theoretical, in the abstract. The aim of this series is to show how the theory can be implemented. Then, the reader has the knowledge to (possibly) implement their own algorithms and know how they link to the CPU's internals.

As to your second point - do you think that the memory manager in this OS is optimal? Do you think that the linked-lists everywhere are optimal? Of course not. Everything was chosen for its simplicity - I failed in some areas I know, and I'm rewriting them as I speak (they're stalled at the moment due to a lack of time); the heap for example is a mess in that series. The new one is much easier to understand.

The series does not create an optimal OS for IA32. So your point about transitioning to amd64 to "make full use of the latter's capabilities" is moot - you have to do more research to optimise for IA32 anyway, let alone amd64!

(And let's not forget that all amd64 CPUs are backwards compatible with IA32).

dkersten · on Aug 4, 2009

That's great and all, but he's forgetting two very important aspects of a real OS: drivers and software.

OK, he's not building a real OS, but rather a toy OS (presumably for the learning experience), so these things are not so much of an issue. It's actually a good article, IMHO, and I wish it had been available a few years back when I tinkered with writing my own toy kernel. The "Roll your own unix clone" title given here is pretty misleading though.

JamesM · on Aug 9, 2009

I wouldn't say it was misleading - the front page of the article clearly states that a kernel is being developed. Drivers and software belong in the rest of the operating system. Given the availability of GNU software, a UNIX userspace is fairly easy given a smallish set of system calls.

Yes, I'm the article's author. Yes, I'm biased.

GeneralMaximus · on Aug 4, 2009

Whenever I think about doing something like this, I always get stuck on x86 ASM. There don't seem to be many books on ASM out there, and the ones I could find in the bookstores were all written around MASM which, of course, does not work on Mac OS X.

Any ideas?

limmeau · on Aug 4, 2009

The difference between Intel/MASM syntax and AT&T/GNU syntax is rather superficial[1], so with a little squinting you can use a MASM book with the GNU assembler. Newer versions of the GNU assembler also come with a directive .intel_syntax so you can write your learning programs in MASM syntax.

[1]http://www.ibm.com/developerworks/linux/library/l-gas-nasm.h...

jacquesm · on Aug 4, 2009

Here's a trick to help get you started:

Write a relatively simple C program that does something useful that you understand thoroughly, then generate intermediate assembly code from your C compiler, with optimization turned off.

That way you get a problem that you already know how to solve in a .s listing that you can inspect and modify to your heart's content.

Then try to optimize it, make it run quicker by rearranging stuff.

You'll learn lots that way and the barrier to entry is low.
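A small starting point for the trick above, assuming GCC (or a compatible compiler that accepts the same -S and -O0 flags). The file name sum.c and the function sum_to are made up for this example.

```c
#include <assert.h>

/* Sum the integers 1..n with a plain loop, so the generated assembly
 * shows a recognizable counter, compare, and branch.
 *
 * To get the listing, assuming this is saved as sum.c:
 *     gcc -S -O0 sum.c -o sum.s
 */
int sum_to(int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

With optimization off, every C variable tends to live on the stack, so the loop is easy to follow in sum.s. Regenerating with -O2 and comparing the two listings is a cheap way to see what the compiler rearranges before you try it by hand.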

klipt · on Aug 4, 2009

Also, check out http://webster.cs.ucr.edu/AoA/

JamesM · on Aug 9, 2009

The Netwide Assembler (NASM), which I mention on the article's front page, has the same syntax as MASM without the nasty licence. Some of the assembler directives may differ, but 90% of your program will be identical.

sown · on Aug 4, 2009

http://www.ipdatacorp.com/mmurtl.html

This is a neat book. I read the first six chapters a while ago and it seemed interesting.