Ask HN: Any C programming Postgres wizards willing to port 460 lines of code?
124 points by andrewstuart on July 5, 2015 | 81 comments
I'd really like to use Postgres as a backing store for libgit2 https://libgit2.github.com/

I wonder if there are any kind C programming wizards who know Postgres and might consider doing the open source port? It's beyond my Python programming skills and I dare not write crappy C code for fear of creating something nasty and insecure.

I can repay either by reciprocating with Python/web development/Linux/AWS knowledge, or if I have nothing of value to offer then I can offer thanks and praise.

The existing MySQL implementation is 460 lines of code.

There's a MySQL implementation here: https://github.com/libgit2/libgit2-backends/blob/master/mysql/mysql.c

There's a sqlite implementation too: https://github.com/libgit2/libgit2-backends/blob/master/sqlite/sqlite.c

Some relevant links: http://blog.deveo.com/your-git-repository-in-a-database-pluggable-backends-in-libgit2/




Lucky for you I actually did exactly this over a year ago. We're not using it anymore so I'll just open source it for you: https://gist.github.com/mhodgson/d29bbd35e1a8db5e0800

Please note that I also don't know much C, but this implementation does work. Also included is a Postgres version of the Ref DB backend (so nothing hits the filesystem). There are a few bits that are not implemented since we didn't have use for the reflog and those parts are technically optional.

Would probably be good to get another set of eyes on this from someone much more familiar with C.

Hope this helps!


How awesome is that!

What is the license?

Any reason you didn't use it in the end? What was your use case for it?

Question for you... and I'll read the source in the morning... but just quickly, does it prevent storage of duplicate objects? One of the main things I'm interested in is saving space when multiple git repos contain exactly the same object.

And THANKS again. Awesome. Can I do anything for you? Send you a bottle of wine? Help with some Python or Linux? If you put your contact in your profile I'll drop you an email.


Happy to help. The license is MIT (just added).

In terms of duplicating objects, I believe that if you do choose to store objects from many repos in the same table, they will NOT be duplicated and you will get your space savings. Don't take my word for it though.

We actually did use this code in production for a period of time. In the end we realized that one of the main features of Git, immutability, didn't suit our needs well and we designed a versioning system based closely on Git, but built on Postgres directly. The main benefit of this is using primary keys as the object ids, instead of hashes of the content. This means we can change the content without changing the object's id (which in normal Git then means changing the tree, commit, and every parent commit).
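
To make the trade-off above concrete, here is a minimal sketch of content-addressed storage. It assumes a toy in-memory table and uses FNV-1a as a stand-in for Git's SHA-1; the names content_id, store_put and struct store are hypothetical and do not come from the gist or from libgit2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a: a stand-in for Git's SHA-1, used only to keep the sketch self-contained. */
static uint64_t content_id(const char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

#define STORE_CAP 16

/* Hypothetical object table shared by many repositories. */
struct store {
    uint64_t ids[STORE_CAP];
    size_t count;
};

/* Insert an object keyed by the hash of its content.
 * Returns 1 if a new row was added, 0 if the same content was
 * already stored, and -1 if the table is full. */
static int store_put(struct store *s, const char *data)
{
    assert(s != NULL && data != NULL);
    uint64_t id = content_id(data, strlen(data));
    for (size_t i = 0; i < s->count; i++) {
        if (s->ids[i] == id)
            return 0;                       /* duplicate content, nothing stored */
    }
    if (s->count == STORE_CAP)
        return -1;
    s->ids[s->count++] = id;
    return 1;
}
```

Because the key is derived from the content, a second store_put() of identical data adds no row, which is the space saving described above. The flip side is that any edit produces a different id, so everything that referenced the old id has to be rewritten; a surrogate primary key avoids that.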

Good luck!


Did you consider leveraging the refdb to offer immutable primary keys?

I had been hacking together a Kyoto Tycoon-backed implementation for a project (since dropped); our design exposed the ref id to the user (e.g. 'master', 'master/mhodgson', etc) and branch/merge as necessary. This way, our primary keys remained a constant refName that pointed to the HEAD of a commit chain, each of which referenced immutable commits/trees/blobjects.

Although my days of libgit2 hacking are long past, I'm very curious if/how our design could have been improved; immutable pkeys were important for us as well.

Github: https://github.com/anulman/libgit2/tree/kyoto/src/backends/k...


I'm not sure I follow. Our use case required the ability to easily update blobs (in this case formatted written text) without having to rewrite history every time. I don't think immutable ref ids address that particular requirement...


Not sure they would either, though perhaps a use case for git_commit_amend [1]?

Regardless, sounds fairly implementation-specific. Think I just followed you on Twitter, happy to discuss further offline.

[1] https://libgit2.github.com/libgit2/#HEAD/group/commit/git_co...


Kinda thinking it might be beneficial adding this to the libgit2 project itself (eg via GitHub PR).

Any objections to that?


You could certainly try. I believe the core contributors moved away from the idea of pluggable backends once they realized the performance limitations. It still works great for some use cases, but I think the folks at GitHub quickly realized it wouldn't work for them.


I'd be interested to hear more about the performance limitations.

My naive thoughts were that it would perform extremely well as I had thought that Postgres scales extremely well with multicore.

Is there anything I can read anywhere about such performance limitations? Am I correct in understanding that you found performance limitations - I assume when compared to file system?

Any pointers to info on where GitHub tried this?


Once you understand how git works under the hood it's actually fairly easy to predict that performance will be poor. A simple checkout involves accessing 100s if not 1000s of objects. Also, you can't fetch these all at once because the objects you need to fetch are determined based on a nested tree. So you have to query the tree all the way down, getting each nested tree or blob based on the previous tree's contents. So ultimately you're doing 100s-1000s of queries for any given git command. Each query is fast, but even at 1-2 ms per query it adds up quickly.
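
A rough sketch of that arithmetic, assuming one query per object and a hypothetical in-memory tree in place of a real repository (struct node, count_lookups and estimated_ms are illustrative names, not libgit2 APIs):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Git object: a tree has children, a blob has none. */
struct node {
    const struct node *children;  /* NULL for a blob */
    size_t child_count;           /* 0 for a blob */
};

/* Count the lookups a full checkout needs: one per object. The children of a
 * tree are only known once the tree itself has been read, so these lookups
 * cannot all be issued up front. */
static size_t count_lookups(const struct node *n)
{
    assert(n != NULL);
    size_t total = 1;             /* fetch this object */
    for (size_t i = 0; i < n->child_count; i++)
        total += count_lookups(&n->children[i]);
    return total;
}

/* Back-of-the-envelope latency if every lookup is a separate database query. */
static double estimated_ms(size_t lookups, double ms_per_query)
{
    return (double)lookups * ms_per_query;
}
```

With the figures from the comment, estimated_ms(1000, 1.5) comes to 1500 ms for a single command, before any work is done on the data itself.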


Also, note that some of the code will need updating since it was written specifically with Ruby bindings in mind. Should be easy to spot and update/remove the rb_* method calls.


Any Postgres/C experts passing this way, if you have comments on the code that would be amazingly appreciated.


Notes/review:

- The Ruby error reporting can be ported to giterr_set() https://libgit2.github.com/libgit2/#HEAD/group/giterr/giterr...

- Uses prepared statements √

- Requires the libgit2 source tree to be in the include path to build as it uses some internal headers.

- Should probably escape input to git_buf_printf() before it's passed to the DB.

- Should change return values from magic (0, -1, etc) to constants (like GIT_OK, GIT_ERROR, GITERR_NOMEMORY)

- Memory allocation is very light (mostly uses stack buffers) and seems sane at a glance.

- I'd recommend a -Wall -Wextra -Wpedantic compile on clang or a clang static analyzer run to see if there's anything weird or undefined I missed.

Update 2: Never mind, what I thought was a bug in read_prefix() is probably just a poorly documented libgit2 interface. I believe read_prefix() operates on GIT_OID_HEXSZ due to the git_oid_ncmp() function, which does memcmp with 4-bit precision (so you can use a short hex id with an odd length).
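
For readers unfamiliar with comparing ids at 4-bit precision, a minimal sketch of the idea follows. It is not libgit2's source; oid_prefix_differs and the OID_* macros are hypothetical names, and the only assumption is a 20-byte raw SHA-1 id.

```c
#include <assert.h>
#include <stddef.h>

#define OID_RAWSZ 20               /* raw SHA-1 size in bytes */
#define OID_HEXSZ (OID_RAWSZ * 2)  /* the same id written as hex digits */

/* Compare the first hex_len hex digits of two raw ids.
 * Returns 0 when the prefixes match and 1 when they differ. */
static int oid_prefix_differs(const unsigned char *a,
                              const unsigned char *b,
                              size_t hex_len)
{
    assert(a != NULL && b != NULL);
    if (hex_len > OID_HEXSZ)
        hex_len = OID_HEXSZ;

    size_t full_bytes = hex_len / 2;    /* two hex digits per byte */
    for (size_t i = 0; i < full_bytes; i++) {
        if (a[i] != b[i])
            return 1;
    }

    /* Odd length: only the high nibble of the next byte is part of the prefix. */
    if (hex_len % 2)
        return ((a[full_bytes] ^ b[full_bytes]) & 0xf0) != 0;

    return 0;
}
```

Two ids beginning abcde1 and abcde2 match on a five-digit prefix and differ on a six-digit one, which is why a prefix length counted in hex digits rather than bytes makes sense.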


I have no problem with such requests, and I think there should be more. But there is one issue with that. What's your motive?

Are you looking for this to complete a $xxK client project? Or are you looking for this to help students understand a C implementation?

It makes a world of difference. If you are getting paid for the result, then you should share with the developer. If you are not, and doing this for the greater good, then it's fine.

We should really define the line here. There is enough exploitation in the dev world and the last thing we want is developers exploiting other developers.


Have you tried https://github.com/davidbalbert/libgit2-postgresql? I know nothing about libgit2, but came across this repo while looking it up. If it's got some issues it's a great place to start from, since it also gives you somebody who could code review and you don't need to start completely from scratch.


I did a quick review.

This project uses PQescapeLiteral() and asprintf() to build queries. I'd recommend prepared statements over this approach (which is how the MySQL backend builds queries).

It doesn't implement anything but write() and free() yet, so it's not actually functional.

Unfortunately I believe this would clobber the library license (GPLv3 on this Postgres plugin vs GPLv2 with a linking exception for libgit2). With this in mind I'd probably just recommend a rewrite. It's not that much code.

Update: it looks like the libgit2 backends aren't well maintained: https://github.com/libgit2/libgit2-backends/issues/13#issuec...


I did notice it's a different license. Do you think there is value in porting the MySQL or SQLite version? They would share the libgit2 license.


Would you mind casting your expert eye please over the solution that has magically arrived in this thread? I'd be interested in your thoughts.


I wish there were more Ask HN posts like this.

It's also a great chance for someone who perhaps is looking for a gig to put some code up here that quite a few people will review.


If it's so short, why not give it a shot and put it up for code review so others can catch the nasty and insecure bits? Great learning experience :)


It would be good if it was awesomely and expertly done rather than amateurish, which is what my fumbling would result in.

Good idea though, time I started getting some C skills but on a less important bit of code.


This doesn't look too difficult to do (definitely not a "wizard" level task), and I think there will be more work in setting up tests, etc than actually writing the code. A good learning opportunity so if you've got the time go ahead and do it yourself.

If you end up doing it yourself, I'll volunteer to do a quick review.


Courage! You can do it! It would also be a great learning opportunity since you have the other 2 libs to use as an example. You should go for it and then put it up for code reviews before using it. Seriously! Give it a shot and then ask HN to review it in another Ask HN.


Unless you're doing this for a business use case (which your insistence on expert quality and security earlier kinda suggests), it shouldn't matter.

Don't fish for free work when you could turn this into a learning opportunity.

Besides, why would you want to use PG as a backing store unless you were doing weird multitenant stuff?


I'm a sole developer in pyjamas coding in the lounge room trying to come up with an idea that will make an income when I'm not doing my day job to pay the bills, in case you're asking if I am a rich company trying to get free development services. I'm happy to reciprocate technical services if what I know is of value. I write open source code too so I'm contributing although not to the libgit2 project.

Same as many other developers here on HN I suspect. Coding in the hope of building something people want to use.


So, you're looking for somebody else to do at least some of the technical work for a passive income project, without some sort of profit-sharing arrangement or transparency about that fact (until I brought this up)?

Truly rich companies are more likely just to pay for talented developers or services than small folks are; see also acquihires.

Anyways, enough of the business/philosophy stuff...what are you looking to gain by using PG as a backing store?


It would be good if all code was expertly done. Yet that costs money or effort.


I had a quick look and agree: as a learning exercise this would be fantastic. Given that you have two existing examples to compare with, it should be possible to work it out and learn some C at the same time. Even if you did not know C, you would be able to hack it together without taxing your brain much.


Where would you put it up for review?


The PostgreSQL developers team lives on the pgsql-hackers mailing list (http://www.postgresql.org/list/pgsql-hackers/) which is appropriate for discussion of current development issues, problems and bugs, and PROPOSED NEW FEATURES. =)

See also http://www.postgresql.org/developer/


But it's not a PostgreSQL feature, it's a libgit2 feature. That said, you might still get a review out of it, but it's certainly not the obviously correct place.


Oops. Thanks for the correction.



Put it up as a gist on github. Or put it up as a PR on github. Branches are cheap! :)


I decided to do it just for the fun of it.

https://github.com/cbdevnet/libgit2-backends/blob/master/pos...

Caveat Emptor: I did not test this since somehow the Debian package of libgit2-dev seems not to include the GIT_* constants. It should probably work, though.


It probably isn't the most efficient way, but were an ODBC backend written for it, libgit2 would be able to use any database that has an ODBC driver - which AFAIK is almost all SQL-based ones.


Wow! My C knowledge is dated, but my PostgreSQL knowledge is good. Would be fun to attempt if I had time.


Didn't know that libgit2 has a MySQL backend, thanks. :D


Performance is an interesting question, I wonder how a Postgres backend would compare to serving git from file system.


As an aside, is there any crossover or anticipated convergence between libgit2 and git?


Presume from the downvotes that's a contentious question? I never knew you could add different back ends to libgit2; maybe this would be a useful feature for the command-line version? I'll answer my own question in case anyone else is interested in the diversity of how git is or can be implemented: http://thread.gmane.org/gmane.comp.version-control.git/20421...


You know what?

I propose we use "Task HN:" and do these requests more frequently. We'll surely learn a lot, come to ingenious solutions, and maybe, just maybe, make our time on HN more productive. I, for one, want to at least see, if not help, what small but interesting byte-size hurdles others encounter and how others can solve it in different ways, and all the discussion around it.

Working on something together binds communities even tighter.


Sounds like that would devolve into a situation where people are trying to exploit others for free work. There are plenty of places on the Internet where you can go to find people asking others to write software for them for little or no money. Should this really be one of them?


>Sounds like that would devolve into a situation where people are trying to exploit others for free work.

And it sounds like this is one of them. The OP is "trying to come up with an idea that will make an income when I'm not doing my day job to pay the bills":

https://news.ycombinator.com/item?id=9833548

If you can't pay someone to do it, even at some of the low rates I see on elance, then learn to do it yourself. It would be a different matter if this were someone asking for input on the code he had written, but just asking someone to do it for him is unseemly.

I'm 100% in favor of a feature on HN where existing open source projects ask for help and/or contributions, but someone who's starting up a for-profit business and asking for free labor leaves a very bad taste.

I've had would-be clients like this, who ask me to do work for free or at a significant discount since "it's related to your open source work, anyway, and you could even use it in your project". Made the mistake of accepting work from a client like that once, but never again.


If the result is open source I don't see that as a huge problem.


The majority of top open source contributors are working in some kind of paid position. The idea that "open source = cost free labor" is not beneficial to the OS community or to software engineers in general.


There's a difference between "Write my code for me plz?" and "Many people like myself would love to use x, but it's not ported over. Can anyone help?" I believe this is the latter.


No. As you say there are enough places where that shit happens.

Should hacker news be the place where interesting suggestions for stimulating open source tasks were shared? I think yes, more so than all the politics that creeps up here.

And if the requirement is that it must be open source and not directly something the asker will make money off, then it stands very little chance of being abused. And if it is, is it any worse than all the other spam we downvote/report?


Upvoting/downvoting should, at least in theory, prevent that from happening.


The best form of moderation is not encouraging behavior that requires moderation. Downvotes are a form of community feedback but not really a moderation tool. For example, in cases where I am in doubt about the benefit of something to HN, my threshold for downvoting is much larger than my threshold for upvoting: I am more likely to upvote when "it might be useful" than to downvote a "might not be useful" because there is a person's karma at the other end of the arrow.


Absolutely. Apply judgement.


There is no downvoting at HN


I believe the same trend as in most other non-news submissions will follow: intellectually stimulating tasks get upvoted.


Agreed. Maybe a "Help FOSS" tag would be more relevant than just posting any task. Open source contributors could ask for help in their specific fields, and HN has plenty of excellent developers.


Excellent developers (and even non-excellent developers) have an abundance of opportunities to contribute to open source projects. I think that "Show HN" provides a reasonable exposure mechanism for a project to reach HN's audience and one with a roadmap has the chance to highlight areas where people might lend their expertise.


Why be cynical? If I were a C expert and had some time to kill (perhaps between jobs) I'd happily jump at the chance to advertise my skills and help someone out.


I had an idea a while ago to build a website where open source projects could make very specific requests. If it was something that another project needed as well, it would get a "bump" of some sort.

If it was in small bite-sized chunks, I'd definitely surf it looking for easy things I could contribute that could help a few people.

Even better is if the site could certify it helped them, and I could get some kind of tax credit for it :D


> I, for one, want to at least see, if not help, what small but interesting byte-size hurdles others encounter and how others can solve it in different ways, and all the discussion around it.

I've thought about this idea and I think without proper "project managers" (I don't mean someone with a project management degree - but just someone to coordinate all the efforts) it seems like it could be a total failure.

Here is an example: I have a C++ game engine and I need help with feature X. I think any C++ developer could come up with an implementation of X, but does the style fit my game engine? Does it interconnect with the rest of the engine? (Of course, ignoring the fact that any discussion of C++ would generate gigabytes' worth of comments. I've read some of the newsgroup discussions on style alone...) You can't just say "create a logging interface" without having studied the rest of the code base. It would be like trying to create a feature for Apache or the Linux kernel: if we didn't study their code base and style, anything we submitted would be laughed out of the room, not because what we did doesn't work or is wrong, but because it doesn't fit with the rest of the code.

This is where the project managers come in: they already know the style and inner workings, so they take what you submit, hack it into the right style, and push it into an SCM or upstream (or reject it). Personally I would feel more inclined to make contributions to the Linux kernel if there was a friendly middle man who could look over my work before it gets to Linus, only because I fear that if I submit something stupid I'll get chewed out by Linus.

Even if this were to happen - I would not want to read 10k comments of tail call recursion optimization, smart pointer usage, or discussion of non-portable code that will work on 99% of systems except for AIX Unix and Blue Gene/Q.


This is such a good god damn idea.


I agree with the sentiment. Leveraging the skill and goodwill of the community is a great idea and anything that reduces the friction for volunteers is a good idea.

On the other hand, I think there is little to be gained from frictionless requests for volunteers. That invites spam and low quality requests because there's nothing to lose. A linked blog post with technical details and relevant context is probably the ideal level of impedance with the HN format. The format of this request is a nice hack, but standardizing on a kludge is not the way to go.

Maybe the structure is a monthly "Can you help?" thread...and perhaps a complementary "Can I help?" thread.


> That invites spam and low quality requests because there's nothing to lose.

I think you may be underestimating the desire for geek cred. Non-throwaway accounts would have little incentive to spam with trivial tasks, because those won't be upvoted, might even be flagged, and would tarnish the individual's reputation.

You may have a point about format not being ideal and having too little friction. Not sure.

> Maybe the structure is a monthly "Can you help?" thread...and perhaps a complementary "Can I help?" thread.

This most certainly wouldn't work for these types of requests, as they can rarely wait to be scheduled for the next month, or at the very least will have been solved by then with ugly hacks.

Again, not sure what would emerge out of it really.


I've seen plenty of behavior outside community norms from non-throwaway accounts on HN. Including my own.

If an open source project can't schedule major features a month out, then throwing more bodies at it won't solve the fundamental problem of disorganization, and the project isn't prepared to make use of volunteers' time in ways that really value that time. Structuring policy around a continuous stream of "emergencies" invites low quality requests. A policy which parallels HN's job listing policy favors HN members rather than people for whom everything above zero euros is paying too much. The HN community benefits from a slower process with a higher barrier to posting requests.


A lot of open source projects are just a single person doing it in their spare time. Throwing more bodies at the problem is exactly what is useful there. If you're developing on your own in your spare time it can hardly be called disorganisation.


Exactly useful to whom? The tossing of warm bodies seems a rather suboptimal way to build teams, much less communities; perhaps because being tossed provides so little utility to the warm body. Building sustainable teams and communities around an open source project is always going to be hard work and a low barrier policy on HN that encourages requests for help doesn't change that.


Useful to anyone even remotely involved in the process. The random helping hand gets that warm fuzzy feeling of helping, and nerd credz. I've done this before, it really is a motivator. The sole maintainer gets assisted with their roadblock. All users/watchers of the project get to rejoice because the project has progressed.

I do not see why you focus so much on team building when that is completely unrelated to what is being discussed. Not all open source projects are big enough to warrant any kind of team. They're literally just someone's side project, which may or may not be useful to other people.


I don't really think either of us can reliably substantiate predictions of how it would pan out but, considering that in this very submission, help was asked for, received and open-sourced at that behest at no cost to anyone (how awesome is that?), the idea stands validated at 1:0. :)


A few years ago, around the US Thanksgiving holiday, there was an "Offer HN:..." thread. Over the next few weeks, there were a number of genuine offers from very capable people. It was really awesome.

But eventually, it became what any cynic would expect: a mechanism for "We will build your iOS app for $2500" offers and today it's dead. This isn't a 1.0 release. It's a 0.1 alpha. Nobody has written the 400 lines of good expert code requested. The existing C code that has been offered may or may not blow up under the OP's load or not meet their exact needs. No one is claiming it is bet-your-business ready. The fundamental open-source project problems of maintenance and further development are not solved or even addressed.

More importantly, it hasn't been demonstrated that this mechanism works for getting people to substantially support the project with more than the goodwill of contributing existing code developed for another purpose. Don't misunderstand me, that's a great thing. Which is why a once a month format is a reasonable starting point.


Taking this great idea a step further, I'd love to see projects.ycombinator.com which could eventually be submitted as YC proposals. I wonder if YC need any help building such a tool... I'd be game to help if others want to?


Yeah, what harm could come of asking people interested in news aggregation to complete tasks they have little interest in and little reason to do so?

Surely the overall quality of HN won't suffer even more from people asking other hackers to write their C for them, what a great idea!

fucking /s


Uninteresting tasks won't get upvoted. Interesting tasks get upvoted, and become a story in itself.

Surely it won't have as detrimental an effect as anonymous throwaway accounts rich with shallow sarcasm and curse words.


If you don't like it then don't participate. Some of these projects might be interesting. Unlike you apparently I actually like programming and do have an interest in taking part.


I share your concern, but if the tasks are of little interest and there is no great reason to do them, surely they will not be upvoted?

HN has always been a bit more than a news aggregator, there are "Tell HN"s and "Show HN"s too.


Seeing the same problem solved in different ways is one of the best ways to learn. Cynicism is intellectual junk food.


It's sort of telling that you refuse to put any effort into showing anyone you can at least put together a crappy implementation to demonstrate understanding and contribute back to the same knowledge space (beginner knowledge is knowledge) you wish to extract from.

The demand for the highest quality code for what is essentially begging is also not becoming. Even contributing a terrible implementation would tickle people's motivation button (so they can teach someone without building your entire project, show they are measurably better than someone, etc.).

"Beggars can't be choosers"


I don't see why I shouldn't ask if anyone is willing to help.

1: it's open source and presumably would be committed to the libgit2 project.

2: it's likely to be useful to others

3: it's central, critical code that isn't well suited to beginners. Of course it should be quality code, it's responsible for storing data.

4: it would strongly benefit from the eye of a Postgres expert, which I am not

5: I do not have money to hire developers

6: I write open source code too so I contribute my time to the public


Don't let the haters fool you. They're the same guys who told Dropbox that they didn't see the point of its service over sharing USB flash drives.

This is a good idea. Other people could abuse it but this is certainly not abuse, any more than it would be to just file a feature request as an issue on a project. Hope it comes through.


This comparison is lost on me. What do the Dropbox naysayers have to do with people submitting code requests?


There are just some people who initially hate nearly every good idea.


No one said you had to run your for-profit idea on whatever code you come up with. There is something to be said about someone who asks for something but isn't willing to put in the work personally (or only the work they want to do).


Reminds me of the 'Stone Soup' proverb.

https://en.m.wikipedia.org/wiki/Stone_Soup


Is there a reason it couldn't be done with C++11/14 with an extern interface to C?

Here is a lib to match python's string functions in C++.

https://code.google.com/p/pystring/

When working with strings and vectors, C++11 can actually be pretty straightforward, productive, and clear, with no manual memory management. It definitely doesn't have as many string functions out of the box as modern scripting languages though.



