> On a system which is configured for kexec crash dumps, some memory is reserved at boot time for a second Linux kernel (the “kdump kernel”). On startup, the system uses kexec_load(2) to load a kernel image into this reserved memory region. If a panic occurs, all CPUs are halted and control is transferred to the kdump kernel.
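(For reference, that staging step is the kexec_load(2) system call with the KEXEC_ON_CRASH flag; newer setups use kexec_file_load(2). Below is a minimal C sketch of the call shape, assuming the uapi <linux/kexec.h> header; it passes an empty segment list, so it does not actually stage anything, and it needs CAP_SYS_BOOT to succeed. In practice distros just run the kexec-tools front end at boot, something like `kexec -p /boot/vmlinuz --initrd=<kdump initramfs> --append="..."`, which builds the real segment list from the kernel image.)

    /* Sketch only: the syscall shape kexec-tools uses to stage a crash
     * kernel.  A real loader fills the segment list with the kernel image,
     * initramfs and purgatory code, placed inside the region reserved by
     * the crashkernel= boot parameter. */
    #define _GNU_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/kexec.h>        /* KEXEC_ON_CRASH, struct kexec_segment */

    int main(void)
    {
        struct kexec_segment segs[1] = {{ 0 }};   /* left empty in this sketch */

        /* KEXEC_ON_CRASH marks this image as the panic kernel: it goes into
         * the reserved region and is only jumped to when the system crashes. */
        long ret = syscall(SYS_kexec_load, 0UL, 0UL, segs,
                           (unsigned long)KEXEC_ON_CRASH);
        if (ret != 0)
            fprintf(stderr, "kexec_load: %s\n", strerror(errno));
        return ret != 0;
    }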
Is there really no better way to do this than to have an entire second kernel ready to take over? Like a more specialized piece of code that only handles kernel coredumps?
> Like a more specialized piece of code that only handles kernel coredumps?
Can't the kdump kernel be exactly that?
You can build a second kernel, separate from the normal one, that has only the options needed for handling the coredump enabled. This way it can be relatively small, and you can reuse the required driver code.
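As a rough illustration (these are real option names from mainline Linux, but the exact set depends on what hardware you need at dump time), a dedicated capture kernel mostly needs something like:

    CONFIG_CRASH_DUMP=y      # run as a kexec-ed dump-capture kernel
    CONFIG_PROC_VMCORE=y     # expose the crashed kernel's memory as /proc/vmcore
    CONFIG_RELOCATABLE=y     # let the image run from the reserved memory region

plus the drivers for whatever disk or NIC the dump goes to, with modules and most other subsystems turned off. Many distros don't even bother with a separate image: since the normal kernel is relocatable anyway, they load the same vmlinuz into the reserved region and pair it with a stripped-down kdump initramfs.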
That is in fact how crash dump works on illumos systems. Once the kernel panics, interrupts are disabled, all other CPUs are brought to rest, etc. Then the panic code takes the pages to be written to the dump and passes them to a special routine in the driver for the dump device (usually either a slice on a disk or a zvol). That driver is responsible for getting the device into a state where polled I/O can be performed and for writing the pages out. At the end the system reboots, and on the next boot a program called savecore inspects the dump device to see if there is a new dump; if there is, it copies it out into a file.
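The driver-side hook for that on illumos is the dump(9E) entry point, roughly int dump(dev_t, caddr_t, daddr_t, int). Purely as a toy illustration of its shape (user-space C with stand-in types and a hypothetical polled-write helper, not the real DDI):

    /* Toy sketch of a dump(9E)-style routine: the panic path hands the
     * driver a buffer and a run of blocks, and the driver must write them
     * with polled I/O, since interrupts are off and nothing may sleep. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define DEV_BSIZE 512                 /* 512-byte disk blocks */

    typedef uint64_t sim_dev_t;           /* stand-in for dev_t   */
    typedef int64_t  sim_daddr_t;         /* stand-in for daddr_t */

    /* Hypothetical helper: a real driver would poke controller registers
     * directly and spin on a status bit instead of waiting for interrupts. */
    static int polled_write_block(sim_dev_t dev, sim_daddr_t blkno, const char *buf)
    {
        (void)buf;
        printf("dev %llu: wrote block %lld\n",
               (unsigned long long)dev, (long long)blkno);
        return 0;
    }

    /* Write nblk blocks of dump data starting at blkno, taking the pages
     * from addr.  Returns 0 on success, an error number otherwise. */
    static int mydrv_dump(sim_dev_t dev, const char *addr, sim_daddr_t blkno, int nblk)
    {
        for (int i = 0; i < nblk; i++) {
            int err = polled_write_block(dev, blkno + i,
                                         addr + (size_t)i * DEV_BSIZE);
            if (err != 0)
                return err;
        }
        return 0;
    }

    int main(void)
    {
        static char pages[2 * DEV_BSIZE];     /* pretend dump data */
        return mydrv_dump(1, pages, 100, 2);
    }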
FreeBSD does the same thing (probably some shared 80s BSD heritage?). One challenge is actually providing that special routine for the dump device in a way that is safe at crash time. You don't know how bad a state your crashed kernel is in, and some buses may not have a polled I/O mode (USB?).
The Linux kexec process kind of elegantly restarts into a known state, with the known drawback of hogging some memory. It's definitely a different set of tradeoffs and both seem vaguely reasonable.
> Is there really no better way to do this than to have an entire second kernel
> ready to take over? Like a more specialized piece of code that only handles
> kernel coredumps?
So the requirements of this software are: it must be able to boot the system, write only to specific sections of memory, have some method of initializing hardware, and have some ability to write the rest of memory out to disk or over a network connection. It sounds like you're talking about an OS.
A crash dumper does not have to be implemented with a reboot; this is just something Linux chooses to do. I have actually worked on kernel code that performs crash dumps to disk and over the network, if you find that kind of expertise relevant.
For what it's worth, Oracle is one of the largest contributors to the kernel, especially if you filter out the drivers subtree, which tends to skew the metrics for various reasons (lots of large generated code, almost identical ARM SoC variants, etc.).
It would be really odd if they couldn't teach you about the kernel.
As a counter-example, Zstd was created and is primarily developed by a Meta employee.
I also felt the same way upon opening the article, but companies are made of individuals, and often individuals do valuable and insightful work, even inside of bureaucratic and Kafkaesque organizations like Oracle.