Postgres: The next generation

pjungwir · on Oct 10, 2023

I am hoping to be part of that "Next Generation"---although I'm already 46. I'm sure there are others a bit younger.

My last talk at PGCon tried to fill some of the gaps around what you must know to hack on Postgres, especially the Executor Phase and TupleTableSlots. I'm not the most qualified, but sometimes it takes a learner to know what learners need. Just the other day I wrote a table of contents for a book about how to contribute to Postgres. I figure it would sell at least ten copies. Maybe a serial publication online would be better. Curious if anyone would be interested in either of those?

For now Postgres is more like a hobby for me, but if someone is hiring to do open source Postgres contributions full time, I'm open to a chat. ;-)

craigkerstiens · on Oct 10, 2023

Just coming to say hi Paul! I recall you being very excited and getting the email after you submitted your first patch and loved it.

I think a ton of the current set of "next generation" came to Postgres a good bit later. Even Tom himself will tell you he did some stuff with images for a few years which undersells himself (tiff, jpg, png - in some form involved in creation of each of those), then found this Postgres thing and started working on it.

pjungwir · on Oct 10, 2023

Craig, you really made me feel welcome in the Postgres community. Maybe people have. It's such an amazing group. Thanks for all the pg advocacy you've done!

gavinray · on Oct 10, 2023

If you haven't already, you might be interested in reaching out to PG contributor Andrey Borodin.

He does a lot of content focused around how to get started contributing to Postgres and is very friendly + interested in speaking with other PG enthusiasts.

He might be open to collaboration or offering some helpful advice =)

For example: https://www.youtube.com/watch?v=rihfAnd_leM

You can reach him either via email (x4mmm@<repeat-the-previous>.ru) or Twitter (@x4mmmmmm)

pjungwir · on Oct 10, 2023

Thank you!

andatki · on Oct 10, 2023

Thanks. This looks great!

mch82 · on Oct 11, 2023

I’ve met a bunch of top engineers that kept working into their 70s and beyond because they couldn’t do the cool stuff at home. I’m very excited to see if more people may begin to retire earlier & get involved with open source now that participation is more broadly accessible.

fanf2 · on Oct 10, 2023

I get the impression that publishing the work in progress online can be a good way to market the book, eg https://www.cl.cam.ac.uk/~rja14/book.html

Some publishers allow early-access readers to help with reporting mistakes https://nostarch.com/early-access-program

davidthewatson · on Oct 10, 2023

+1 I'll buy your book. What's it called?

pjungwir · on Oct 10, 2023

Thank you! It's just a table of contents. :-) And there is a book about temporal data I want to write first. . . . But it's awesome to hear someone is interested.

EDIT: to plug a couple other independent-author Postgres books on my short-term reading list, these look awesome (but not quite the same focus):

- https://postgrespro.com/community/books/internals

- https://theartofpostgresql.com/

biehl · on Oct 10, 2023

I would be happy to have a look at your book drafts.

pjungwir · on Oct 10, 2023

Thank you! May I keep you in mind once I have something? My email is in my profile, so maybe send me a note. (I don't see yours.)

biehl · on Oct 10, 2023

I sent you a note and added my email to my profile. Best of luck with the writing.

wkoszek · on Oct 10, 2023

My aspirational goal is to be in a position to retire early and hack on Postgres full-time, it’s so interesting. It has networking, storage, data, algorithms etc.

C is less of a problem tbh. Postgres has good code style and its pretty consistent. The complexity of the internals is, and having a small community may infer the speed in which people help you.

teaearlgraycold · on Oct 11, 2023

We should have an NSF for FOSS contributions

nikita · on Oct 10, 2023

You can work full time for someone as a Postgres committer.

1500100900 · on Oct 10, 2023

Unless you're Tom, you'll have to work on tickets assigned to you by some manager.

samaysharma · on Oct 10, 2023

That's not really true any more.

I personally know several Postgres contributors / committers who have a very high amount of control on what projects they work on.

candiddevmike · on Oct 10, 2023

I wonder if C codebases are going to have problems finding maintainers in the future. Postgres has commercial backing and inertia, but it seems like we are lacking a pipeline of (proficient) C developers.

swatcoder · on Oct 10, 2023

In some future, of course, but that future is decades away. C is a living language with no shortage of active users and a learning curve that's not very steep for systems programmers working in other language.

It can feel daunting to today's web and application developers who are used to an opaque curtain between themselves and the underlying system architecture, but systems programmers using C++, Rust, etc already work behind that curtain (but with thicker gloves). They've often worked with C in the past, at least in education or experiment, and can ramp up on the footgun quirks with some intentional study when taking it up professionally.

There are arguments against picking C for some new systems projects, but -- outside a lack of systems programmers more broadly -- there's no pressing concern around finding maintainers for what already exists.

anarazel · on Oct 10, 2023

(postgres hacker)

Yes, I think that will be the case. Obviously not a scientifically measured, by I think we're already seeing that average C skills for new contributors are lower than what they used to be - of course that might just be my slowly greying beard speaking. So far I think people just "learn on the job", but how much delta that bridge I am not sure.

I think eventually we'll have to make it easier to use some other language in parts of the system (e.g. in-core data type implementations). But realistically I think that's still a bit off.

koolba · on Oct 10, 2023

> ... I think we're already seeing that average C skills for new contributors are lower than what they used to be - of course that might just be my slowly greying beard speaking.

That's definitely true. Just like everything else, actual experience matters and the C-share of public codebases (out of the total set) is less and less every year. So while someone may have studied the docs and overall language mechanics, he'd be less likely to have actually worked on a different C code base. Even less so for people who have worked on portable C software.

> I think eventually we'll have to make it easier to use some other language in parts of the system (e.g. in-core data type implementations). But realistically I think that's still a bit off.

There is an elegance to having the C structs directly match the data layout. But I suppose that it's only elegant if your mind has been wired from the start to think about byte alignment and word sizes. Coming from the world of interpreted languages with dynamic objects as bags of properties, I bet it's nowhere near as magical.

eatonphil · on Oct 10, 2023

> There is an elegance to having the C structs directly match the data layout.

Doesn't this ability significantly depend on the system (and also becomes less relevant when you have variable length data)?

petergeoghegan · on Oct 10, 2023

> Postgres has commercial backing and inertia, but it seems like we are lacking a pipeline of (proficient) C developers.

It's difficult to get into Postgres development, but that has little to do with C expertise. The hard part is having the right domain expertise. Building up general familiarity with the system just takes a long time.

hedora · on Oct 10, 2023

I'm guessing this won't be a major problem. I don't know any proficient Rust or C++ developers that are not also proficient C developers.

I do wonder when even more C code bases are going to start seeing modules ripped out and replaced with Rust.

This is already happening with Linux, curl, and with C++ stuff like Chrome, many MS products, Amazon S3, etc, etc. The biggest explicit holdout I know of is OpenBSD, and that's because they're trying to keep the bootstrap / base install toolchain small.

phamilton · on Oct 11, 2023

> I'm guessing this won't be a major problem. I don't know any proficient Rust or C++ developers that are not also proficient C developers.

There's a growing trend in high level languages like Ruby/Elixir/node to use Rust + FFI to optimize specific hot code paths. Likewise, pgrx is a much more approachable path towards postgres extension development.

All of these development paths have always been available with C, but Rust has made them more accessible to a wider range of developers. I'd wager a non-trivial number of Rust developers don't feel comfortable in C.

shrimp_emoji · on Oct 11, 2023

> I'd wager a non-trivial number of Rust developers don't feel comfortable in C.

https://img.ifunny.co/images/5dd3adca9d0da5310faa752ddd53565...

rafaelmn · on Oct 10, 2023

I'm not proficient at C++ but I have written a decent amount of it (although not in years)

While I'm sure I'd be able to pick up on the style of the project, it would take a lot of time. Large scale C looks completely different to C++. No RAII, no classes, virtual dispatch, smart pointers. Containers are completely different, templates/generics only via preprocessor.

I think C requires a lot more conventions and experience to get correct code than C++, and especially Rust.

beautron · on Oct 11, 2023

> No RAII, no classes, virtual dispatch, smart pointers.

I enjoy programming more when the above concepts are absent. RAII and smart pointers tend toward a fragmented and confused layout of a program's memory—there are much simpler ways!

The arena concept for managing your program's memory is more straightforward. It's easier to think about (not confused) and it becomes natural to have your memory laid out nice and orderly (not fragmented, which can be horrible for performance). See the recent article by Ryan Fleury:

https://www.rfleury.com/p/untangling-lifetimes-the-arena-all...

I also think life is easier without classes or virtual dispatch. I value a sort of "mathematical elegance" in programming languages, and prefer to create programs from a small set of fundamental language primitives. Classes and virtual dispatch don't earn their keep in that set.

rafaelmn · on Oct 11, 2023

I'm not arguing C++ is better than C since I have never worked on a non-trivial pure C codebase. I'm just saying enough things from C++ are not available that even tough the language is roughly a subset of C++, how you end up using it is completely different and not a lot of experience/patterns carry over. Even less for Rust I'd say.

wg0 · on Oct 10, 2023

Now this might be idiotic take but isn't C easier and simpler to learn than Rust? Even Typescript's type system is pretty complicated compared to C.

Or I might have revealed my ignorance about how much complicated C is.

steveklabnik · on Oct 10, 2023

Brainfuck is easier and simpler to learn than C. That doesn't mean that writing an application in Brainfuck is easier than writing one in C.

"which is easier to learn" is basically an inherently subjective concept at this point, we as a field do not have an objective way to answer this question, except for extreme cases like the one I am drawing above. For any "real" language, it is much, much, much less clear-cut.

wg0 · on Oct 10, 2023

C isn't brainfuck. Modern tooling (linters, valgrind and friends) should remedy shortcomings if any.

There aren't classes, generics, exceptions etc but that also means there isn't much to learn either.

fsociety · on Oct 11, 2023

Modern tooling and lack of language features does not make it easier to design a program. I have seen self-proclaimed C developers struggle with managing the ownership and lifecycle of memory in a sane way.

dajtxx · on Oct 11, 2023

I started writing commercial software in c in the late 80s, did my own memory management system to work around the limited memory handles in win3, got sdcc going for a custom z80 system where I had to do the startup code, and i have no idea how const works.

steveklabnik · on Oct 11, 2023

> C isn't brainfuck.

I never said it was.

I specifically contrasted the two, saying that C is easier to build an application in than Brainfuck would be. Because it is. That means I think they're different, not the same.

jmull · on Oct 10, 2023

Are there fewer C programmers than their used to be?

Are they getting generally dumber for some reason?

I get that younger devs often don't have the low-level background older devs necessarily have, but all of the useful skills are learnable (and they have less bad things to unlearn, too). All good devs, of any age, have learned to learn what they need when they need it. I don't think anything has changed there.

I think the issue is more that key positions in these more prominent well established projects are usually filled and haven't been opening up.

lmm · on Oct 11, 2023

Fewer new programmers are learning or being taught C. And fewer mid-career programmers are choosing it as a language to learn.

Of course any competent programmer can pick up most languages. But projects written in a language that's in decline still have a harder time attracting developers (look at anything written in COBOL or Perl or TCL): it's less good for their CV than a more popular language, and the level of support in tooling and the library ecosystem tends to be worse, which means you spend more time working on the scaffolding and less time doing interesting work. And frankly C is already a language where you spend a lot of time on ceremony and bookkeeping and relatively little on the essence of the problem.

thesnide · on Oct 12, 2023

Yet, I am mentoring some young devs. And i see a pattern as I advise them to learn C.

Some do have "eureka" moments and begin to understand many things about their current higher level language of choice. Others don't and struggle all along.

Most often than not, the 1rst category will have a much brighter career in programming than the 2nd one. And that is perfectly fine, as none can do everything.

So, i really think that C is my canary. Much like ancient greek and latin are in the academics. Except that C is actually still very useful by itself

lmm · on Oct 12, 2023

Any arbitrary obtuse puzzle will distinguish the more intelligent students from the less so. Doesn't mean it represents any deep insight or useful thing. (Have you tried the same exercise with any other "hard" language in place of C - Assembly, Brainfuck, Forth, APL, Haskell?)

OJFord · on Oct 10, 2023

I doubt it, universities basically choose between C & Java as the main if not only teaching language. Sure the former might evolve into C++, but it somewhat inherently starts as C, and I suspect you didn't mean 'C developers' to exclude people with C++ experience anyway?

An organisation worried about it can easily hire grads/juniors with a corresponding proficiency, and ensure that they still have that by the time they're seniors by employing them to work in C.

packetlost · on Oct 10, 2023

If people are worried about Postgres, they should be more worried about Linux maintainership.

piaste · on Oct 10, 2023

There are many more billion-dollar companies that rely on the Linux kernel than there are that rely on Postgres.

packetlost · on Oct 10, 2023

I mean, fair.

craigkerstiens · on Oct 10, 2023

So many thoughts on this. The community has definitely ebbed and flowed, on this for a while. A few varying pieces of insight with no intention other than to share a bit more on the PG community. And I'm sure some current and former colleagues already in comment threads are going to correct me on nuance of a lot of this.

For several years there were no new committers at all. In recent years the team has tried to be a little more intentional about adding new ones and culling those no longer involved.

About 15 years there was a phase of letting a lot of younger people earn their commit bit. I can recall 3 people by name that all got a commit bit before the age of 25, and they may have actually all been under 22. One of those three shortly after moved on to work outside of the Postgres community, another quietly was busy on other things for over 10 years before coming back, and the third was actively involved going forward. I suspect there was some unease of folks getting a commit bit and then sort of falling off a cliff so it slowed for a few years on adding new folks. Edit - sounds like it was less age driven but maybe still slightly related to some folks falling off that there was a slow down in new committers – tldr - you're not getting a commit bit right out of college for Postgres.

What to me would be interesting but likely hard to gather is what age to people become a committer to Postgres. It wouldn't surprise me if the average age of getting a commit bit is closer to 45 than not. Many folks contributing come to Postgres after other systems work or just don't consider contributing to they're a bit more seasoned because it feels intimidating–I mean patches sent on a mailing list who does that any more? Postgres thats who.

andrenotgiant · on Oct 10, 2023

I have the honor of working with a Postgres ~committer~ contributor who was just over 25 when they first contributed! The story about their first commit is great:

They were testing SQL behavior for Materialize and thought to check that both systems handle interval functions identically. Being thorough, they tried something like:

  select interval '0.5 months 2147483647 days';

You can try it yourself on dbfiddle[0] Instead of erroring, Postgres returned a bogus value `{"days":-2147483634}` you can read why here[1]

So naturally they decided to fix it in Postgres, which is why they contributed and why it's handled properly in 15+ [2]

[0] https://www.db-fiddle.com/f/ijT76fsmL99bHvXxhAtf7j/0 [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit... [2] https://www.db-fiddle.com/f/i3KikCb72AN1EZpywErZvr/1

koolba · on Oct 10, 2023

> I have the honor of working with a Postgres committer ...

That's not a committer, that's someone who submitted a patch that got committed. A committer is the one who actually applies the patch and can push the branch into the mainline repo. Committers decide if something is worthy of being merged.

Now that aside, yes this plus reviewing patches to get a wider feel for the codebase is how you eventually become a committer.

Best way to eat an elephant is one bite at at time.

craigkerstiens · on Oct 10, 2023

This is a common source of confusion for a ton of folks. Anyone can submit a patch, but commit bits are reserved for a much smaller list. The attitude is something like you commit it, you maintain it–so if bugs come in you'll spend your time fixing those for whatever time it takes vs. working on the next shiny feature that you're excited about for the next release.

There was sort of a fuzzy "major" contributors (https://www.postgresql.org/community/contributors/) which were people that contributed major features and then a list of other contributors. Depending on who you talk to this is either dated or a pretty close attempt at reflection of reality but not perfect. In recent years they expanded the contributors to include others that were contributing in non-code ways though it's still a decent place to find people contributing to major feature sets.

Of course this is not to be confused with the core team–which is more like a steering committee. But not so much steering committee of code and feature sets.

andrenotgiant · on Oct 10, 2023

Ahh thanks for clarifying - now I better understand the significance of the OP's point about the rarity of younger COMMITTER's.

gavinray · on Oct 10, 2023

The thing about becoming a PG contributor is that the barrier to entry is fairly high.

I love Postgres so much I have a PG tattoo, but from the perspective of the two ways you can contribute:

- As a random user, in your free time: There's not a ton of "Good first issue" type tickets. Where you can ease your way into PG dev by working on something that doesn't require you to have context on many parts of the PG architecture and at least a little historical knowledge on why things are written the way they are. Also, it can be a bit intimidating to have your patches reviewed by the likes of Tom or Andres.

- As a developer for a paid PG company like EDB/PG Pros/Crunchy etc: It's a sort of Catch-22 scenario here, where it's difficult to get hired as a junior without having previous PG hacking experience, but the path to doing that is not the easiest thing in the world.

If I was going to work somewhere that wasn't $CURRENT_CO, it'd be somewhere doing PG work, but there's not a lot of viable avenues/inroads there.

hlinnaka · on Oct 10, 2023

PostgreSQL isn't that special as a codebase. Every codebase has its quirks, every project has its own processes and there's a learning curve. When you switch to a new job as a software engineer, you pick it up. PostgreSQL is no different: you can hire an engineer to work on PostgreSQL.

I'm not sure how well that path works in growing new contributors, though. In a usual company setting, the goals are better defined, and the company is in control. Once you reach the goals, mission accomplished. With an open source project it's more nebulous. Others might have different criteria and different priorities. You are not in control. Choosing the right problems to work on is important.

Other storage or database projects would be a good source of new contributors. If you have worked on another DBMS, you're already familiar with the domain, and the usual techniques and tradeoffs. But to stick around, you need some internal desire to contribute, not just achieve some specific goals.

harikb · on Oct 10, 2023

The biggest hurdle I see is that it is a C project, unfortunately something we can do nothing about. It is so much harder to trust a random code not have to have serious implications for the database. It will take ages for someone to get comfortable with the pg-code-base way of handling errors, basic string manipulation, memory alloc/free etc.

I want to highlight the difference in "making a non-core contribution" to "understanding database internals". I am highlighting it is not the latter, but the former that is the first hurdle.

I wanted to reuse builtin pg code to parse the printed statements from logs - I ended up writing a parser (in a non-C language) myself which was faster.

gavinray · on Oct 10, 2023

Couple of points in this post, so will address a few of them:

  "(Paraphrased) C is bad, and it takes forever to pick up the PG-specific C idioms"

There's probably not a productive conversation to be had about C as a language. I will say that as of C23, the language is not quite as barebones as it used to be and incorporates a lot of modern improvements.

On the topic of PG-specific C -- there are a handful of replacements for common operations that you use in PG. Things like "palloc/pfree", and the built-in macros for error and warning logging, etc.

I genuinely don't think it would take a motivated party more than a day or two to pick all of these up -- there aren't that many of them and they tend to map to things you're already used to.

  "I wanted to reuse builtin pg code to parse the printed statements from logs - I ended up writing a parser (in a non-C language) myself which was faster."

It's true that the core PG code isn't written in a modular way that's friendly to integration piecemeal in other projects (outside of libpq).

For THIS PARTICULAR case, the pganalyze team has actually extracted out the parser of PG for including in your own projects:

https://github.com/pganalyze/libpg_query

zxexz · on Oct 10, 2023

libpg_query is a godsend of a library. I spent a lot of time writing a custom parser before I found it - was very happy to replace the whole thing. A major boon was the fingerprinting ability - one of my needs was to track query versions in metadata.

craigkerstiens · on Oct 10, 2023

I disagree on this. Yes it's C. But I've heard people comment "I don't like writing C, but I don't mind Postgres C".

The bigger hurdle which Peter mentioned in another thread is simply building up enough expertise with the system and having the right level of domain expertise.

stouset · on Oct 10, 2023

> Yes it's C. But I've heard people comment "I don't like writing C, but I don't mind Postgres C".

While "Postgres C" might be wonderful, in practice learning the project's unique idioms is yet another hurdle for newcomers to overcome.

eatonphil · on Oct 10, 2023

Every project has unique idioms. Let alone ones that are 30+ years old.

Idioms are a baked in cost of learning to contribute to any project.

fanf2 · on Oct 10, 2023

I found that I learned a lot when trying to write a logical decoding plugin. So I guess if you are a user of Postgres and there’s some small friction you could reduce by writing a plugin, it’s a good way to get started. Scratch your own itch, you don’t have to publish the results :-)

samaysharma · on Oct 10, 2023

I don't have the data for the average age, but I was recently in a conversation around how long does it take to become a committer since getting involved in Postgres by writing code for it.

So, I wrote a couple git commands like below [1] to figure out when someone was first named in a commit message vs when they made their first commit (as a committer) for the last 10 people who became committers.

The average time of involvement was ~8.9 years (just comparing month / year), with the lowest being ~6.5 years.

Obviously one could do better analysis but my goal was just to get an approximate understanding.

[1] git log --grep 'Name' --format=%cs | sort | head -1

git log --author 'Name' --format=%cs | sort | head -1

dist-epoch · on Oct 10, 2023

How much bigger (in lines of code) is Postgres now versus the one from 15 years ago?

Maybe it was more approachable for a 22yo then, you could figure out more of it.

Also, C was a standard language back then, today the kids are more likely to program in Rust than in C.

anarazel · on Oct 10, 2023

> How much bigger (in lines of code) is Postgres now versus the one from 15 years ago?

I was curious as well and wrote a, very crude, script to measure it:

  for t in $(git tag -l|grep -E 'REL.*_0$|REL[67]_[0-4]$'|grep -v REL2);do echo -ne "$t\t"; git ls-tree -r $t --object-only |xargs git show |grep -a -v '^\s+$'|wc -l;done
  REL6_1          270033
  REL6_2          320297
  REL6_3          386532
  REL7_0          630771
  REL7_1          843219
  REL7_2          986991
  REL7_3          1363668
  REL7_4          1492418
  REL8_0_0        1649775
  REL8_1_0        1702325
  REL8_2_0        1806170
  REL8_3_0        2017685
  REL8_4_0        1924918
  REL9_0_0        2011704
  REL9_1_0        2225796
  REL9_2_0        2290872
  REL9_3_0        2405598
  REL9_4_0        2487304
  REL9_5_0        2527906
  REL9_6_0        2632559
  REL_10_0        2534653
  REL_11_0        2771914
  REL_12_0        2697892
  REL_13_0        2822066
  REL_14_0        2980221
  REL_15_0        3054963
  REL_16_0        3351147

This is counting non-empty lines. It's definitely not a good measure of overall code size, as it includes things like regression tests "expected" files. But as that's true for all versions, it should still allow for a decent comparison.

8.3.0 was released 2008-02-01, with 2M non-empty lines, we're now at 3.4M.

pcthrowaway · on Oct 10, 2023

I suspect you'd get much more useful results by checking out the version tags and running `cloc` - https://github.com/AlDanial/cloc

monkchips · on Oct 16, 2023

great contribution here from Craig, in terms of the ebbs and flows and useful history. i had no idea about that cluster of folks under 22 with commit bits.

tristan957 · on Oct 10, 2023

I am a new contributor to Postgres as of 5 months ago. I turn 27 later this month. While I haven't contributed too much of value, I have a few commits here and there with plans to do other things, like make it easy to build Postgres extensions with Meson and hopefully drop the autotools build of Postgres ASAP. You might also catch me in the pgbouncer or pgvector repositories very soon.

The story of how I came to contribute to Postgres is that I got tired of working at a software consultancy where I had worked for 3 years, 2 years during college, 1 year full-time. I had always more envisioned myself as an open-source systems software guy. I found a job at Micron working on an open-source storage engine. It kind of felt like luck to have found that job to be honest, but I felt like the job description was written for me, so I applied. I worked on that project for 2.5 years and loved it. Unfortunately, Micron laid off my entire team back at the end of February. I began searching for a job, and eventually received an offer from MongoDB to work on the C/C++ drivers, but that was rescinded. Then, I really started to lean into my network, and one guy I knew from #mesonbuild on Libera.Chat/Matrix worked on Postgres, so I messaged him and asked if there were any positions open in the Postgres space for someone with my background that he might be aware of. He told me that Neon was hiring, so I sent an application to work on their storage engine, but in the initial interview, my eventual manager thought that I would fit better in the new Postgres team they were forming, which would contribute upstream to Postgres. I am very appreciative to Neon for taking a chance on me.

The topic of this blog post is interesting because it just came up during some discussions with other young Postgres contributors at PGConf NYC. Some points that we brought up:

- It is hard to get patches looked at, even small ones. With better name recognition in the community, comes more patch reviews it seems, which is most likely the case in most projects, but still, it is a circular issue.

- The organization of the Postgres mailing list is not very good. You are forced to drink from the firehose that is pgsql-hackers, whereas the LKML is organized into various subsystems. Modern code forges have value in that you can subscribe to certain tags on PRs/issues, which isn't the case with the current state of pgsql-hackers.

- Adding things to a commitfest is a little burdensome. The only way to get a patch through the entire Postgres CI is adding it to the commitfest, and even then, you have to be proactive about checking it, or hope that a committer will tell you to look at the CI failure.

- Bug reports are also sent to a mailing list (pgsql-bugs). There is no equivalent to the Linux bugzilla for Postgres for instance.

- Patches are sent as attachments to emails, and not necessarily git-format-patched either, whereas the LKML uses git-send-email exclusively from what I can tell.

All in all, it kind of seems like tooling in the Postgres contributor community works best for those that have been ingrained in it for 15+ years, which I guess is the case for most things. I don't want this to turn into a "Use GitHub/GitLab" post. Let it be known that I actually think email is the superior way to communicate about patches, but the tooling around the mailing list could improve. Everything seems very disjoint. SourceHut, I think, has done a good job of making mailing list development more approachable for the everyday contributor. Issues, mailing lists, CI/CD, and repositories are all connected to each other. There aren't separate services like they currently are for Postgres.

This comment is probably worth a blog post of its own at some point, but I'll end it here. If you are also new to contributing to Postgres, perhaps we can share experiences. Email me at tristan <at> neon.tech or tristan <at> partin.io. Another Postgres contributor that I talked to thought it might be useful to have monthly meetings among non-committer contributors where we can talk about the patches we are working or have posted in order to get reviews from peers.

andatki · on Oct 11, 2023

That last idea is really good Tristan and I bet you’ll have interest. You could lead an online meetup. Melanie Plageman was interested in ideas like that as well. We briefly discussed different types of office hours.

tristan957 · on Oct 11, 2023

Yeah, Melanie was who I had that conversation with actually.

gurjeet · on Oct 14, 2023

I don't this recurring meeting proposed in any of the mailing lists I'm part of. If you start this conversation, please include me as well.

monkchips · on Oct 16, 2023

very cool that you found Neon/they found you. this does indeed look like a good post brewing. sounds like potentially low hanging fruit for the postgres community in terms of organisation and processes.

anarazel · on Oct 10, 2023

> - It is hard to get patches looked at, even small ones. With better name recognition in the community, comes more patch reviews it seems, which is most likely the case in most projects, but still, it is a circular issue.

I agree that this is a significant issue. I'm less sure about the "better name recognition" bit, I feel there's also a significant drop off at the other end. But that might just be biased by my level of experience.

> - The organization of the Postgres mailing list is not very good. You are forced to drink from the firehose that is pgsql-hackers, whereas the LKML is organized into various subsystems. Modern code forges have value in that you can subscribe to certain tags on PRs/issues, which isn't the case with the current state of pgsql-hackers.

Yep. And it has gotten a lot worse in the last couple years, I'd say.

> - Adding things to a commitfest is a little burdensome. The only way to get a patch through the entire Postgres CI is adding it to the commitfest, and even then, you have to be proactive about checking it, or hope that a committer will tell you to look at the CI failure.

My main reason to reply here was this: You can also enable CI in your repository: https://github.com/postgres/postgres/blob/master/src/tools/c... - that's the same CI that happens for commitfest entries.

> - Bug reports are also sent to a mailing list (pgsql-bugs). There is no equivalent to the Linux bugzilla for Postgres for instance.

Yep, I hate this. I loose track of things all the time, I'm way too easily distractable. I think the kernel bugzilla is pretty useless, but it's not that hard to do better than that.

> - Patches are sent as attachments to emails, and not necessarily git-format-patched either, whereas the LKML uses git-send-email exclusively from what I can tell.

I find lkml style patch handling bad as well, particularly with every patchset revision getting its own thread. Very easy to loose track.

> All in all, it kind of seems like tooling in the Postgres contributor community works best for those that have been ingrained in it for 15+ years, which I guess is the case for most things.

I personally wouldn't say it works particularly well, even after participating in development for about 15 years... I'd also say that the development process has evolved some in that time, just not as far as it'd be good. It's a lot of hard work to get a community as grey-beardy as the PG community to evolve. Not impossible, but ...

> I don't want this to turn into a "Use GitHub/GitLab" post.

Personally I strongly dislike using either for nontrivial work. But: I still think we ought to accept PRs/MRs via one of the two, just to make it easier for newer contributors. But it isn't just my call...

> Let it be known that I actually think email is the superior way to communicate about patches, but the tooling around the mailing list could improve.

I suspect you'd actually have a hard time finding more than 2-3 people disagreeing with that notion. One of the problems is that many of us end up preferring to spend time hacking on postgres than on development-process tooling / integration....

tristan957 · on Oct 11, 2023

> My main reason to reply here was this: You can also enable CI in your repository: https://github.com/postgres/postgres/blob/master/src/tools/c... - that's the same CI that happens for commitfest entries.

I need to get around to this...

> I find lkml style patch handling bad as well, particularly with every patchset revision getting its own thread. Very easy to loose track.

This is a good point, but if it works for the largest open-source software project in the world, I think it could work for Postgres too. I find that being able to reply inline to an email without having to copy-paste huge sections of an attachment is pretty valuable.

> One of the problems is that many of us end up preferring to spend time hacking on postgres than on development-process tooling / integration....

Completely agree, which is why I think we need to investigate migrating to a self-hosted SourceHut instance or something, so we can just use the tools that are provided by a company whose job it is to write those tools.

anarazel · on Oct 11, 2023

> > I find lkml style patch handling bad as well, particularly with every patchset revision getting its own thread. Very easy to loose track.

> This is a good point, but if it works for the largest open-source software project in the world, I think it could work for Postgres too. I find that being able to reply inline to an email without having to copy-paste huge sections of an attachment is pretty valuable.

I had actually forgotten that issue, having scripted it years and years ago to be automatic. My beef around this is gmail attachment being randomly ordered...

tristan957 · on Oct 13, 2023

My beef with gmail is that it completely destroys mimetypes. Everything that gmail postgres contributors send (from the web interface) seems to always be application/octet-stream, which is really annoying for configuring my email client.

https://git.sr.ht/~tristan957/dotfiles/tree/master/item/aerc...

anarazel · on Oct 13, 2023

Yea. I've long since given up on that ever being correct. Too many people with too many mail clients. Thus for many mime types I just force that to be redone:

  mime_lookup application/octet-stream application/x-gzip application/gzip application/x-bzip2 binary/octet-stream application/x-download invalid/octet-stream

Not that that's a good answer...

gurjeet · on Oct 14, 2023

> I need to get around to this

Being a long-timer in the Postgres community, I was hesitant to enable this. But trust me, the 5 minutes you'd spend enabling this would be completely worth it. The breakage reports for your commits, if any, are worth it because they save community's time finding and reporting those bugs to you.

I only wish this system was not tied to GitHub and CirrusCI, though.

DrDroop · on Oct 10, 2023

I've done a little bit of postgres work with the help of pgrx and I can recommend it as a platform to build data solutions. Another great resource has been the CMU channel: https://www.youtube.com/@CMUDatabaseGroup

gavinray · on Oct 10, 2023

pgrx unfortunately has essentially zero documentation or samples on use outside of extensions.

For instance, say you want to write a new Table Access Method handler. There are bindings to TableAM stuff in the core pg-sys SDK but no docs or examples on using it in Rust.

zombodb · on Oct 10, 2023

I don’t think we even expose the TableAM APIs? They are incredibly hard to generate bindings for from the C headers — lots of inline functions and complex #define macros.

We have an ambitious goal with pgrx and it’s going to take many years and countless hours of developer effort to get there.

It can, however, serve as a way for newcomers to gain experience with Postgres internals.

Regarding the TableAM specifically, when we are able to create a safe Rust wrapper around it, that wrapper will be documented.

gavinray · on Oct 10, 2023

Ah, curious what this bit does then, I think I might have misunderstood it:

https://github.com/pgcentralfoundation/pgrx/blob/c2eac033856...

zombodb · on Oct 11, 2023

I think maybe what you’re really looking for are the files here: https://github.com/pgcentralfoundation/pgrx/tree/c2eac033856...

Those are the internals we currently expose as unsafe “sys” bindings.

As we/contributors identify more that are desired we add them.

pgrx’ focus is on providing safe wrappers and general interfaces to the Postgres internals, which is the bulk of our work and is what will take many years.

As unsafe bindings go, we could just expose everything, and likely eventually will. There’s just some practical management concerns around doing that without a better namespace organization —- something we’ve been working.

The Postgres sources are not small. They are very complex, inconsistent in places, and often follow patterns that are specific to Postgres and not easy to generalize.

If you’ve never built an extension with pgrx, give it a shot one afternoon. It’s very exciting to see your own code running in your database.

misiek08 · on Oct 11, 2023

My observation is: most of people coming to IT these days are just money greedy, no enthusiasts anymore. It’s very sad and it seems like many open source projects die because of that - no back contributions or any help, only “copy and paste from SO and get paycheck”. I’m not saying everyone is like that by my close observations and conversations in people from few companies I contributed in show me 19:1 ratio…

Disclaimer: I work for two companies every day because had so much free time because did my job too fast for standards and only wasted time waiting for meetings. Did a lot of side jobs to do something interesting, many time for free or to just use some new hardware and experiment with.

nijave · on Oct 11, 2023

I think companies are partially to blame. Contracts filled with invention assignment clause and outside activities clauses raise the barrier to contribution.

ideamotor · on Oct 12, 2023

Should be illegal.

bagasme · on Oct 11, 2023

> Postgres isn’t in any sense struggling to attract new users – there are plenty of 22 year olds defaulting to it today.

Yes, I do use Postgres for a lot of my self-hosted apps. For PHP applications, though, I keep using MariaDB as their default/only DB.

throwjabdb · on Oct 10, 2023

TL;DR PostgreSQL are aging and Neon os hiring and training juniors instead of existing commiters so the developer base grows.

kumarvvr · on Oct 11, 2023

As an interested programmer, but with no experience in C/C++, I would really love to have a series of videos explaining the code in detail, so that I can begin to contribute.

alberth · on Oct 10, 2023

What limitations does Postgres have that would cause you to use another data store?

Or what big feature / functionally is Postgres missing (if any, like replication?)

mattashii · on Oct 11, 2023

Assuming you mean replication (as defined in https://www.postgresql.org/docs/current/warm-standby.html#ST... or https://www.postgresql.org/docs/current/logical-replication....), then PostgreSQL has it (see those docs for details). If you mean something else, could you elaborate?

nerdbaggy · on Oct 10, 2023

Replication for sure

cannonpalms · on Oct 11, 2023

Would you mind expanding on this? In what ways is PG's replication support lacking versus the competition?

choffman · on Oct 10, 2023

Article says Postgres launched in 1986. Wikipedia says 1996.

tristan957 · on Oct 10, 2023

> The implementation of POSTGRES began in 1986

https://www.postgresql.org/docs/current/history.html

Postgres started as a project at Berkeley.

fanf2 · on Oct 10, 2023

The POSTGRES project at Berkeley started in 1986, the open source successor became PostgreSQL in 1996.

dist-epoch · on Oct 10, 2023

> Who is going to be doing the work in 2043?

ChatGPT. It won't matter anyway since we won't be around to worry about this question.

DiggyJohnson · on Oct 11, 2023

2043 is only 20 years away. Presumably many of us will be around and productive in that year.