I imported the full Linux kernel git history into pgit

tonnydourado · 2026-04-09T08:48:20 1775724500

That was an informative post, but Jesus Christ on a bicycle, rein in the LLM a bit. The whole thing was borderline painful to read, with so many "GPTisms" that I almost bailed out a couple of times. If you're gonna use this stuff to write for you, at least *try* to make it match a style of your own.

vidarh · 2026-04-09T09:00:12 1775725212

To add a tip on how to make it match your own style: you can get decently far by pointing it to a page or so of your own writing and simply telling it to review the post section by section and edit it to match the tone and style of the example. It's not perfect by any means, but it will tend to edit out the type of language you're not likely to use. So really, to make it sound less LLM-like, almost any writing sample from a human author works.

mplanchard · 2026-04-09T12:03:07 1775736187

You can also just write it.

I’d much rather read someone’s imperfect writing than the soulless regression-to-the-mean that LLMs produce. If you’re not a native speaker or don’t have confidence in your writing, I’d urge you to first ask for an edit by another human, but if that’s not an option, to be extremely firm in your LLM prompting to just have it fix issues of grammar, spelling, etc.

erichanson · 2026-04-09T20:33:01 1775766781

"soulless regression-to-the-mean", damn, that's the quote of the day.

vidarh · 2026-04-09T13:15:48 1775740548

Almost nobody recognises well-written AI text. I've seen plenty of AI-written text pass right by people who are sure they can always tell. It takes very little, because the vast majority of AI writing you spot comes from people doing nothing to clean up the style.

vidarh · 2026-04-09T23:39:03 1775777943

I find it quite funny how this got downvoted. My statement is based on concrete knowledge of a project that tested this and demonstrated quite conclusively that most people consistently fail to detect AI-written text that's gone through even very basic measures to seem more human.

seizethegdgap · 2026-04-10T01:31:49 1775784709

Is it really worth your time to complain about fake internet points on a comment nested 4 deep?

mplanchard · 2026-04-09T11:57:57 1775735877

I did bail out because of this, despite being pretty interested in the content. I love reading, but I cannot stand LLM “writing” output, and few things are important enough for me to force myself through the misery of ingesting ChatGPT “prose.” I only made it to the second section of this one.

darkwater · 2026-04-09T09:02:18 1775725338

100% agreed. Maybe this inner reaction will disappear after years of exposure to the GPT writing style, or maybe LLMs will get "smarter" in this regard and be able to use different styles even by default. But I had exactly the same feelings as you reading this piece.

vidarh · 2026-04-09T09:03:57 1775725437

It's really simple to fix by asking an LLM to apply a style from a sample, so my guess is that a lot of products will build in style selection, and some providers will add more aggressive rules to their system prompts over time.

jillesvangurp · 2026-04-09T12:37:36 1775738256

I would recommend using guard rails to guide tone, phrasing, etc. This helps prevent whole categories of bad phrasing. It also helps if you provide good inputs for what you actually want to write about and don't rely too much on it just filling empty space with word soup. And iterate on both the guard rails and the text.

multjoy · 2026-04-09T12:53:07 1775739187

Or, you know, just write it yourself.

mplanchard · 2026-04-09T12:11:09 1775736669

It’s not even just about the style. It’s a matter of respect for your readers. If you can’t be bothered to take the time to write it, why on earth should I care enough to take the time to read it?

vidarh · 2026-04-09T13:11:14 1775740274

If the content has value, I could not care less.

darkwater · 2026-04-09T12:52:24 1775739144

Yes, but you need a style first :) But in the case of TFA's author, he actually had a few other blog posts that don't feel LLM-generated and could serve as examples, I agree.

vidarh · 2026-04-09T13:13:18 1775740398

But for plenty of applications it doesn't need to be your personal style. It only needs to be your personal style if you want to present it as your own writing. Otherwise it just matters that it's well written. A catalogue of styles would work well for lots of uses.

47282847 · 2026-04-09T15:02:06 1775746926

"Rewrite in a style appealing to Hacker News users critical of AI slop".

vidarh · 2026-04-09T23:26:20 1775777180

I mean, there are lots of people here who write well enough that giving it some style samples and telling it to adapt the text to "this style: [insert post]" wouldn't be the worst idea.

consp · 2026-04-09T09:22:48 1775726568

I stopped at "pgit handled it.". The tldr was appreciated, though, as now I don't have to sift through the LLM bloat.

tombert · 2026-04-08T21:16:49 1775683009

If I recall correctly, the Fossil SCM uses SQLite under the covers for a lot of its stuff.

Obviously that's not surprising considering its creator, but hearing that was kind of the first time I had ever considered that you could translate something like Git semantics to a relational database.

I haven't played with Pgit...though I kind of think that I should now.

anitil · 2026-04-09T01:26:49 1775698009

The sqlite project actually benefited from this dogfooding. Interestingly, recursive CTEs [0] were added to sqlite out of a need to trace commit history [1].

[0] https://sqlite.org/lang_with.html#recursive_query_examples

[1] https://fossil-scm.org/forum/forumpost/5631123d66d96486 - My memory was roughly correct, the title of the discussion is 'Is it possible to see the entire history of a renamed file?'
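For anyone who hasn't used recursive CTEs: the query seeds a result set with one row, then repeatedly joins a table against the rows it has produced so far until nothing new turns up, which is exactly the shape of walking a commit graph from a commit back to its ancestors. A minimal sketch with Python's stdlib `sqlite3` module follows. The `commit_parent` table, the `ancestors` helper, and the commit IDs are all hypothetical; Fossil's real schema is different, and this is not how Fossil itself implements history or rename tracking.

```python
import sqlite3

# Hypothetical minimal schema: one row per (child commit, parent commit) edge.
# Fossil's real schema differs; this only illustrates the recursive CTE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE commit_parent (child TEXT NOT NULL, parent TEXT NOT NULL);
    INSERT INTO commit_parent (child, parent) VALUES
        ('c2', 'c1'),
        ('c3', 'c2'),
        ('c4', 'c2'),
        ('c5', 'c3'),
        ('c5', 'c4');
""")
# c3 and c4 both branch off c2; c5 merges them back together.


def ancestors(conn, start):
    """Return every commit reachable from `start` by following parent links."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestry(id) AS (
            SELECT ?
            UNION
            SELECT cp.parent
            FROM commit_parent AS cp
            JOIN ancestry AS a ON cp.child = a.id
        )
        SELECT id FROM ancestry ORDER BY id
        """,
        (start,),
    ).fetchall()
    # UNION (rather than UNION ALL) discards rows already produced, so c2,
    # which is reachable through both sides of the merge, appears only once.
    return [row[0] for row in rows]


print(ancestors(conn, "c5"))  # ['c1', 'c2', 'c3', 'c4', 'c5']
print(ancestors(conn, "c4"))  # ['c1', 'c2', 'c4']
```

Before recursive CTEs, the same walk needed a loop in application code issuing one query per generation; the CTE pushes that loop into the database engine.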

anitil · 2026-04-09T01:27:24 1775698044

Oh, and of course, the discussion board is itself hosted in a sqlite file!

20after4 · 2026-04-09T07:17:16 1775719036

When you import a repository into Phabricator, it parses everything into a MySQL database. That's how it manages to support multiple version control systems seamlessly as well as providing a more straightforward path to implementing all of the web-based user interface around repo history.

ImGajeed76 · 2026-04-10T00:00:03 1775779203

you should! "go install" it and you're up in a minute.

adastra22 · 2026-04-09T09:56:03 1775728563

Git was a (poor) imitation of the monotone DVCS, which stored its data in sqlite.

xeubie · 2026-04-09T10:05:21 1775729121

True, git poorly imitated monotone's performance problems.

gjvc · 2026-04-08T22:43:51 1775688231

"If I recall correctly, the Fossil SCM uses SQLite under the covers for a lot of its stuff."

a fossil repository file is an SQLite database file, yes

ptdorf · 2026-04-09T00:50:21 1775695821

So SQLite is versioned in SQLite.

yjftsjthsd-h · 2026-04-09T04:31:36 1775709096

Yep :) To be fair, I expect git to be stored in git, mercurial to be in mercurial, and... Actually, now I wonder how svn/cvs are developed/versioned.

deepsun · 2026-04-09T06:20:39 1775715639

SVN in SVN for sure; it's a well-made product. The market just didn't like its architecture/UX, which dictates what features are available.

CVS is not much different from copying files around, so I would not be surprised if they copied the files around to mimic what CVS does. CVS revolutionized how we think of code versioning, so its main contribution is to the processes, not the architecture/features.

vidarh · 2026-04-09T09:02:25 1775725345

The market did like it just fine until Git came around. It just had a very brief moment in the sun....

tombert · 2026-04-09T14:41:56 1775745716

My first software job, I was a junior person, and every Friday, we would have The Merge, where we'd merge every SVN branch into trunk. We always spoke of it like it was this dreadful proper noun, like Voldemort or something.

The junior engineers were the ones doing this, and generally my entire day would be spent fixing merge conflicts. Usually they were easy to resolve, but occasionally I'd hit one that would take me a very long time (it didn't help that I was still pretty inexperienced and consequently these things were just sort of inherently harder for me). I just assumed that this was the way that the world was until I found `git-svn`.

`git-svn` made a task that often took an entire day take something like 45 minutes, usually much less. It was like a light shining down from heaven; I absolutely hated doing The Merge, and this just made it mostly a solved problem.

After that job, I sort of drew a soft line in the sand that I will not work with SVN again, because at that point I knew that merging could be less terrible. I wasn't necessarily married to git in particular, but I knew that whatever the hell it was that SVN was doing, I didn't like it.

vidarh · 2026-04-09T23:30:13 1775777413

By the time git became well known, sure, SVN fell from favour very quickly for good reason. But it had a few years in the sun. Not nearly as long as Git has had at this point.

There were also many holdouts in places that didn't need complex merges.

Having a fixed merge cadence strikes me as both utter madness and a totally self-inflicted nightmare, though. If you're going to merge on a fixed cadence rather than when things are ready, you might almost as well have people push straight to trunk.

tombert · 2026-04-08T23:06:43 1775689603

Makes sense, I haven't used the software in quite a while.

spit2wind · 2026-04-09T11:22:32 1775733752

> only a handful of VCS besides git have ever managed a full import of the kernel's history. Fossil (SQLite-based, by the SQLite team) never did.

I find this hard to believe. I searched the Fossil forums and found no mention of such an attempt (and failure). Unfortunately, I don't have a computer handy to verify or disprove. Is there any evidence for this claim?

gritzko · 2026-04-09T12:28:28 1775737708

I used to give students an assignment to import a git repo into fossil and the other way around. git was a tad faster, but not dramatically.

ImGajeed76 · 2026-04-09T13:46:50 1775742410

I did look into this before writing the post. There's a fossil-users mailing list post by Isaac Jurado where he reported that importing Django took ~20 minutes and importing glibc on a 16GB machine had to be interrupted after a couple of hours. He explicitly warned against trying the Linux kernel. The largest documented import on the Fossil site itself was NetBSD pkgsrc (~550MB), which already showed scaling issues. So "never did" is fair: not because anyone tried and failed, but because it was known to be impractical and explicitly discouraged.

corbet · 2026-04-09T13:19:34 1775740774

I hate to blow our own horn, but I'm gonna...if you are interested in seeing this kind of kernel-development data mining, fully human-written, LWN posts it every development cycle. The 6.17 version (https://lwn.net/Articles/1038358/) included the buggiest commit and much surrounding material. See our kernel index (https://lwn.net/Kernel/Index/#Releases) for information on every kernel release since 2.6.20.

Or see LWN on Monday for the 7.0 version :)

ImGajeed76 · 2026-04-09T13:42:51 1775742171

Thanks! LWN's development cycle reports are incredible and were actually an inspiration. The goal here wasn't to replace that kind of expert analysis but to show what becomes possible when you can just write SQL against the raw history. Your reports add the context and understanding that no database query can provide.

anonair · 2026-04-09T18:57:21 1775761041

I wish tools like gitlab and forgejo would one day ditch filesystem storage for git repos and put everything in an SQL database. I'm tired of replicating files for DR.

niobe · 2026-04-09T00:20:13 1775694013

Very cool

gurjeet · 2026-04-08T20:59:49 1775681989

The technically correct title would be: s/Kernel into/Kernel Git History into/

    Pgit: I Imported the Linux Kernel Git History into PostgreSQL

worldsayshi · 2026-04-08T22:57:33 1775689053

Wow that has a very different meaning from what I thought.

JodieBenitez · 2026-04-08T21:08:11 1775682491

Read the title and immediately thought "what a weird way to solve the performance loss with kernel 7..." The mind tricking itself :)