I read through the entire article. Very fascinating. I admit that I don't understand more than about 5% of what is being explained here, but it is a VERY in-depth look at how things are mapped out. I wasn't aware that operating systems were so wasteful in the area of redundant resource management. Mad props on this article.
There's no wasteful redundant resource management. It might look like that, but for every data structure in the kernel there's a use-case that absolutely needs the corresponding lookup/access to be fast. A bunch of examples from memory management:
* You need your pagetables for the CPU to find your memory.
* You need your vma tree in each struct mm to quickly go from a virtual address to the backing store when handling missing pagetable entries. People routinely allocate insane amounts of virtual memory and use just a few bytes, so prefilling all the pagetables is a no-go.
* You need a struct page for every physical page to link it to its backing store (file or anonymous memory) and to keep track of its state (e.g. locked in place due to I/O, or dirty and in need of eventual writeback to the filesystem).
* You need the radix pagecache tree to quickly go from a file offset to its cached page (read/write and page fault handling). You also need to walk that tree in order to write back dirty pages efficiently (i.e. avoid tons of random seeks).
* All the connections are used, too. E.g. when shrinking the size of a file, all the vmas that map it are looked up via the file's address_space so those mappings can be shot down (and SIGBUS delivered in case userspace accesses these now-invalid virtual addresses).
* Anonymous memory (i.e. memory that is not a mapping of a file with its pages in the file page cache) also needs to be tracked in a similar way. It's slightly different because anonymous memory has a few corner-cases of its own with Fancy Behaviour (tm), distinct from, say, the shoot-down-all-mappings-on-truncate dance necessary for file-backed memory.
And often-used data structures are insanely optimized: fields that are not used at the same time overlap (struct page is the most obvious example). And because pointers are always aligned to at least 4 bytes, the low 2 bits of each pointer (3 on 64-bit, where alignment is 8 bytes) are free and are used to store some information or the type of the object pointed to (e.g. the node color of the red-black tree implementation is stored like this). Furthermore, structures are embedded into each other as often as possible to avoid the cache-miss when chasing a pointer.
Because it's all extremely optimized, it's easy to assume all memory access is O(1). It's not, and every time someone claims their hash-table has O(1) access times my brain immediately goes "BS! ... not on any real hardware and not on any real kernel." Of course, you need pretty big hash-tables to notice the effect ...