The story of Google Guava and patches (plus.google.com)
87 points by michaelneale on March 11, 2012 | 49 comments



I'll call bullshit. Either they care about external developers or they don't. This is saying they don't.

Google's culture seems insular and elitist. Besides Guava, they did the same thing with GWT (which, as much as I love GWT, didn't work out in the project's best interests, IMO), and now are doing the same thing with Dart (AFAICT).

Maybe in the 90s you could get away with this. But now if you don't have an active external community, whenever your old guard of Guava/GWT/Dart developers gets bored and leaves, the new guys that come in behind them aren't going to care nearly as much about the Google internal technologies vs. the true open source technologies they've been hacking on before/after their time at Google.

So the Google/internal technologies will eventually stagnate.

Perhaps internally-driven projects can get more stuff done in the short term (thanks to dedicated resources), but I think in the long term the external community out-innovates internal projects (due to the internal teams getting burdened with legacy requirements (cough GWT), politics, etc.). Dunno, that's my impression.


I used to be an external contributor to open-source GWT and then became a Google employee to work on the GWT team, and the situation really has nothing to do with elitism or culture.

It basically boils down to a matter of resources. When I started as an open source contributor to GWT, it was used by external developers, but not really used internally by Google, so changes made by external committers couldn't possibly break anything.

Slowly over time, more and more Google properties started using GWT, and suddenly you had the situation where an external user could submit a patch that passed all the GWT unit tests but broke major Google properties (e.g. AdWords, Google Groups, Wallet/Checkout, etc.). Google builds everything from head, so when you do an internal commit, not only do your unit tests run, but so do the unit tests from every project that depends on GWT, so you find out very quickly if your patch broke real applications. This happens all the time. I commit, the pre-submit queue for GWT is green (all tests pass), then hundreds of other projects get their chance, and there are always 1 or 2 that break, not always because of GWT per se, sometimes because of bad code in those projects.
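
(Editorial sketch, not part of the original comment.) The build-from-head behavior described above amounts to walking the dependency graph in reverse from the changed library and running the tests of everything reached. A minimal Python illustration, assuming a made-up graph: `DEPENDS_ON`, its project names and edges, and `affected_projects` are all hypothetical and say nothing about Google's actual tooling.

```python
from collections import deque

# Hypothetical graph: project -> libraries it depends on directly.
DEPENDS_ON = {
    "gwt": [],
    "widgets": ["gwt"],
    "adwords": ["widgets"],
    "groups": ["gwt"],
    "search": [],
}


def affected_projects(changed, depends_on):
    """Return every project that depends, directly or transitively, on `changed`."""
    # Invert the graph: library -> projects that depend on it directly.
    dependents = {name: [] for name in depends_on}
    for project, libs in depends_on.items():
        for lib in libs:
            dependents.setdefault(lib, []).append(project)

    # Breadth-first walk over the inverted edges.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for project in dependents.get(current, []):
            if project not in seen:
                seen.add(project)
                queue.append(project)
    return sorted(seen)


# Tests of all of these would run for a change to "gwt".
print(affected_projects("gwt", DEPENDS_ON))  # ['adwords', 'groups', 'widgets']
```

In this toy graph a change to `gwt` triggers tests in three other projects, even though only two depend on it directly, which is why a green GWT test run alone says little about downstream breakage.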

The problem is, there is no way for external committers to get notified of internal (potentially confidential) apps breaking on their changes. This meant that every external commit would need to be reviewed and proxied by someone on the GWT team.

Now, back when we had 20+ people working on GWT, it wasn't hard. Now there are only 5 full-time committers, and it has become a lot more difficult to keep up with the external community and support internal users.

I have been internally advocating that we "re-open source GWT". That is, we make the "source of truth" be an external repository, possibly re-hosting it on GitHub or Google Code, and fork it off from the internal version. We grant all of our best and most dedicated contributors rights to administer and commit on equal footing with Google employees, and we run external continuous build systems for it.

On the innovation front, I think it's true for gwt-user, but for the compiler, I've hardly gotten any external contributions for optimizations and almost all of the improvements in speed and code size have arisen internally. This may indicate that we should run separate open source projects for the compiler/tools and the libraries, splitting them up and separately managing them.

But the GWT team IMHO was never elitist, just saddened that contributions were piling up and we lacked the bandwidth to review and commit them in a timely manner. I feel bad about it, given the time people put in, and I've been spending time recently trying to collect all outstanding patches for landing into GWT 2.5.


Hi Ray. You make good points, more than I can justly respond to right now.

Briefly, I did not know GWT was initially not used widely within Google--I had assumed it was from near-day-1.

The re-open sourcing sounds cool, although it would be interesting to see how it could play nicely with the internal build-from-head system you guys have. E.g. to avoid effectively forked projects.

Speaking of building from head, I'm sure it's a net win, but, as an outsider, the handcuffs of backwards compatibility seem overly tight. More frequent major-point releases that could clean up cruft might be nice. Not sure how you guys handle major-point releases internally? If at all?

And, yeah, elitist was too strong, especially to apply to individual developers. However, not just recently, but over the lifespan of GWT, it hasn't spawned an external dev community (AFAIK), so it seems like something is off.


The history of GWT was that it was started by Joel and Bruce and acquired by Google. Internally, at the time, Google had been using Closure Compiler and millions of lines of JavaScript code, so just from inertia, there would not have been much use in the beginning, because it's not like the GMail team is going to rewrite GMail in GWT overnight. Really, the first high-profile consumer-facing project done with GWT was Wave. AdWords is also GWT, but not very sexy.

I come from a background of using Maven to build my projects, and Google's internal build system is somewhat Maven-like, but it doesn't let you specify versions in dependencies, so you always end up depending on HEAD. To me, this is the root problem making it hard for projects that live simultaneously in the open and closed worlds.
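
(Editorial sketch, not part of the original comment.) The contrast drawn above between pinned versions and always depending on HEAD can be shown with a toy resolver. Everything here is an assumption for illustration: the `RELEASES` table, the library name `lib`, its version strings, and the `resolve` function do not correspond to Maven's or Google's real resolution logic.

```python
# Hypothetical release history, oldest first; the last entry stands in for HEAD.
RELEASES = {"lib": ["1.0", "1.1", "2.0-dev"]}


def resolve(library, pinned=None, releases=RELEASES):
    """Return the version a build would use.

    With `pinned` set (version-pinning style), the build keeps using that
    version until the consumer changes it. Without it (build-from-HEAD
    style), the build always picks up the newest entry.
    """
    versions = releases[library]
    if pinned is None:
        return versions[-1]
    if pinned not in versions:
        raise ValueError(f"unknown version {pinned!r} for {library}")
    return pinned


print(resolve("lib", pinned="1.1"))  # 1.1
print(resolve("lib"))                # 2.0-dev
```

A pinned consumer is insulated from new commits until it chooses to upgrade, while a HEAD consumer sees every change immediately, so a single breaking commit reaches it at once.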

It would be interesting to see how the Guice team handles it, but maybe their patch velocity is small.

As for why GWT didn't get a huge external community of committers? I do think it has something to do with the fact that it is a gated community, that people feel like they don't "own" it, Google does. Maybe re-open sourcing it and rebranding it as "Open Web Toolkit" or "Community Web Toolkit" would somewhat remove those mental blocks.

I would love for the open community to be true owners of GWT, and Google as just a contributor. I've been lobbying to make it happen, and I hope it does. Too many external people have put in a lot of work, they deserve it.


Why was the GWT team reduced? Were developers moved to the Dart project?

Is there any schedule for 2.5? We are waiting for SourceMaps, since debug mode runs slowly in Chrome & IE, and with Firefox there is always a version gap. The Safari plugin has also been broken since version 5.


I can't fully explain why it was reduced, but it's several factors:

1) Some people left, for various reasons (like being at Google for 6+ years), for example, 5 left to go to different startups.

2) Some went to work on Dart

3) Some went to work on Chrome

4) Some went to work on a secret project

5) Maturity. At Google, 20 is considered a "big" team. The Closure Compiler, on which GMail, Docs, Google+, Google Search, Google Maps, etc. are based, is not even staffed; it's just the result of 20% contributions. GWT was, from an organizational standpoint, even in the wrong division. It should have been in infrastructure/dev-tools/etc., but instead it was under a management chain that cared about other stuff.

GWT 2.5 will be released very soon. I am trying to collect all external contributions that have been languishing and land them first.


Google could carry this suggestion further and re-open source all projects in the same manner, including Android.

Not that it would be an easy transition, but the long-term value from all the communities would be in Google's favor.


If only I was a senior VP. :) There are many many Googlers who would like projects to be more open source than they are, since the company has aggressively hired people who value open culture. It's not always possible, I don't know the Android team's constraints, but I would gather if you're working with an OEM on a new secret device, they don't want hints of it leaking in the public repository. :(


I'm with stephen on that. You say each of you working in the GWT team, as a person, didn't have enough time to do review of submissions. To you there was not enough bandwidth.

I say your employer didn't allocate enough bandwidth. It prioritized internal needs so far over the free software approach that the free software aspect died off completely.

You are right in saying none of the developers was an elitist; you are not right in saying that Google in itself wasn't elitist. Your manager's manager's manager knew exactly what he was doing when he allocated this many people to your team, and not more; when he didn't allocate people to do full-time community submission review when there was obviously enough work for multiple such positions.

That's one issue with "sponsored but free" projects: they never are really free because the work gets prioritized that benefits the sponsor, not the work that benefits the project as a whole. Projects are forcibly steered to keep their center of gravity and biggest source of submission inhouse, under the control of the sponsor. It's a very simple strategy of making a "free" project non-free; you can immediately pull the plug on a project which misbehaves: first you make sure the developers not under your control get stalled and discouraged, and that their work becomes harder; they cannot keep up. They couldn't continue the work being done on their own. Then, you make sure the others are so engrossed in work you, the sponsor, assign to them that they cannot keep a project healthy through intellectual exchange with externals. You build a glass wall around the project. You make internal documentation that does not need to be redistributed with your "free software" because it is not a part of it according to the law. The (complicated) build system isn't, either. Finally you have complete control of the direction, progress, and life of the project. You, the sponsor, can submit the project to your neurotic desires, which we, developers, all know. The typical corporate bullshit-o-rama that never fails to make a good project go to hell.

Open source is not a salvation.

The only way to prevent really good projects which have received huge amounts of contribution from skilled developers from becoming useless wastes of talent is to make sure that external requests, contributions, and requirements are taken at face value and are given a chance to be evaluated with the same attention as internal problems. This cannot happen in a corporation where the management performance is counted as a function of successfully resolved projects coming from internal clients. This cannot happen at all if there are internal clients, because "internal client" is a term made exactly with the point of being able to work better with "internal" clients than "external" clients. Sponsored software is not free software unless it is something akin to a donation, where the sponsor is not given creative or otherwise rule over the project.


Not all of Google is that way. The Chromium project, I think, is a notable exception. Gerrit Code Review is happy to accept outside contributions. And technically the Git maintainer (Junio Hamano) is a Google employee these days.

Sadly, Kevin doesn't sound too good here:

> And here's the last thing. Be honest: if you were going to sign yourself up for doing all that work above... wouldn't you at least want to have the pleasure of writing the code for it yourself?

Code is code. (Well, as long as it's not awful/ugly code.) I'm as happy to marshal through someone else's code as write my own.


No it's not. Code as patch becoming code now becomes immediately your problem and headache. Unless it's exactly how the core devs would write it.

Don't even get me started on what I think about accepting or submitting patches for code I don't actually plan on using on a day to day basis.


Highlight: Stop submitting patches to Guava - it is too much work for us.

My Favorite response (from Martijn Verburg): "...could you guys work with the community to teach them to submit better proposals/patches? Many open source projects are able to do this from the Linux Kernel through to hobby projects like PCGen. Perhaps talking to their committer teams might give you some insights."


Interestingly, I heard that the Java people at Google rail against using Python for large projects because they supposedly get out of hand.


What does it mean for a project to "get out of hand" in this context?


Java starts off out of hand, and just gets worse when things like Guice are thrown into the mix.


Sounds like Martijn doesn't actually know much about Linux kernel development.

Patches get rejected from the kernel all the time for many of the same reasons listed in the linked post. It takes a long time for most new kernel contributors to get anything substantive in, and major changes almost never go in without huge reviews, fights, competing proposals, etc.

The only difference between that and what this post talked about is that for the kernel, a lot of the sausage making goes on in public (though far from all of it).


I actually think Martijn does understand that many patches will still get rejected - but he would like to still see some community contributions getting in (as opposed to the current message of 'stop contributing').

I personally think that a project with source available under an OSI license but no effort to respond to or build a community is not really an open source project.


But the Linux community does a pretty good job of describing how to submit good patches. The kernel sources include README files for SubmittingPatches and CodingStyle (and scripts for checking patch style).


Maybe I'm alone in thinking this but having been in the position of reviewing more than a few non-trivial bugfix patches myself I think I might tend to agree with Kevin.

Sure it's great people are excited and want to contribute but all that excitement is due to the love and care people sweated in to making every single line in that codebase as perfect / performant / easy to understand as possible.

Patches almost never add to something like that. Especially not on such a small, focused library. Truth be told, most of the time on open source projects you're accepting patches simply to get more community involvement and acceptance. Guava doesn't need acceptance; it has been lovingly accepted already. If you want open-armed love, go to Apache Commons.

If you want perfect performant code you can use and trust consistently go to guava.

I'm grateful and happy that it exists and it is a pleasure and delight every time I incorporate a little bit more into my codebase, slowly.


Gee, how does Linux ever make it! All of it is contributed! Oh: Torvalds just sits down and comes through on looking at the submissions.



Am I the only one who is really annoyed by links to Google+ that can only be seen after signing in? I thought it was considered bad style to do that for NYT links here. Maybe the same holds true for G+ links?


This post should be publicly visible without being logged in (at least it is for me). But this is the second HN thread in a week where I've seen a comment like this, so can you send me your details (web browser, etc), so I can debug, please?


I also get the sign in redirect in the Android Browser unless I choose to get the desktop site.


Visit the link on an iPad.


Thanks—will take a look.


Asked to sign in on an iPhone 4S. Quite annoying and uncalled for.


You really have to take the good with the bad when it comes to Google sponsored libraries.

I'm usually won over by them because Google does truly great Java API design, their releases are relatively high quality (there is some assurance of quality when it's used internally at Google) and their libraries almost always enforce good design (they don't accept things that you would consider helpful if they think it will be easy to use incorrectly or abuse).

But with that comes the bad. If something's not helpful to Google, it won't have sponsorship to be added to the library. The library will always support only versions of Java that Google internally uses (Kevin has said before that it is unlikely that Guava will be expanded to cover even Java 6 any time soon).

So I'm enjoying using their libraries while they are current, but am fully aware that they might need to be forked eventually.


The point that seems to be missed here is that Google is eating their own dog food. In doing so, they are hesitant to fix what "ain't broke." Were this merely code that was being thrown over the fence from time to time, I'm sure you'd see a higher patch adoption rate.


It seems there is open source which is set up for community contribution and open source which isn't. We tend to only really think of open source, ideally at least, in terms of projects that allow community contribution.

From what I have read Android is pretty similar? Very hard for developers to actually get some of their code merged.

Wondering now what other Google projects are like for outside contributions, Chromium etc.


Very few Google projects allow outside contributions. Go is one that does; no surprise there of course. There is an attitude pervasive at Google that open source is a great marketing tool, but that it's a one-way street. I think it's because of the high employment standards at Google; they cannot fathom how someone without an @google.com email address can do better than them.

Dart is the perfect example of this: developed in secret, in a dark room, and then dumped on the open source community. When no one was excited about it they shrugged their shoulders, confused about what they had done wrong.

I have no interest in Java, and I really hope someone forks this project and treats it like a real open source project.


I'm not sure where you came up with any of these assertions, as they are all simply false.

First, plenty of Google projects allow outside contributions. There are over 1400 open source Google projects (the number is actually much larger, but I've only counted those that wanted to be identified specifically as Google projects), and >98% of them allowed contributions the last time I looked. In fact, it's easier to simply list the ones that don't than the ones that do. I have to imagine you have your own list of what you think are "Google projects", and were working off of that when you made your assertion.

Second, there is definitely no "attitude pervasive at Google that open source is a great marketing tool, but that it's a one-way street". As the guy generally responsible for helping teams that want to open source stuff, I can tell you that in 6 years, I've run into this attitude maybe 5 times out of the (again) 1400+ projects that got released. That's not to say everyone open sources stuff at Google for the same reason, but there is certainly no pervasive attitude like you describe. The reality is a lot of folks at Google have released open source projects for a lot of reasons, and of those reasons, "marketing tool" is pretty far down the list.

BTW, none of this is to say I agree with Kevin's approach to running Guava; I don't, for various reasons. But in the end, he (and the rest of the guava folks) are the ones doing the work, and the issues here will work themselves out in the normal way (either people will grudgingly accept it and keep using guava, or some fork will become more popular eventually and take over). In either case, Kevin making a clear statement on the situation helps move things along one way or the other, and is a lot more than you can get out of other OSS projects that do something similar.


What do you mean exactly by "allow outside contributions"?

Do you have stats on the number of projects with at least one committer that doesn't work for Google? The number of projects with non-trivial LoC written by non-employees? The number of projects with an external mailing list as the place where project decisions are hashed out?


1. I mean are willing to accept outside contributions of code if people submit them, and put them in the codebase if they are acceptable.

2. I have the stats, but given that the vast majority of open source projects (Google or otherwise) don't ever grow past a few people, I don't see why it would be relevant? I also don't see why it's relevant whether they work for Google or not. It's not like we require projects follow a different process for Googler committers vs non, so I don't see how it's any different from a project where the committers are all really good friends who work on an OSS project together. We also hire a lot of committers to our open source projects. I'm guessing you want to make a distinction between "corporate open source projects" and "non-corporate open source projects", but in reality, making such a distinction would be a mistake, because the typical differences are in policies and preferences, and they apply equally well to either. I.e., what matters is the policies the project applies to committers and contributors, not whether they all work for the same company.

3. Again, I have stats, but for the vast majority of open source projects, just because most are willing to accept them doesn't mean anyone ever contributes. This has nothing to do with Google, of course. If you look at the hundreds of thousands of projects on, say, SourceForge, you will find the number that have either at least one not-same-email-domain-as-owner committer or non-trivial LOC written by a not-same-email-domain-as-owner committer is quite low.

4. This seems to be a governance and social issue; I don't track it formally, as it would be quite difficult to do so. It would also be wrong for us to try to force a model on folks. We give folks info about what we think are best practices, and in fact, free copies of the "Producing OSS" book (Karl used to work with us :P), but how they run their projects is generally up to them. We are happy to give them advice when asked, and are happy to consult in general.


Thanks for your reply. I appreciate the engagement.

---

I think you set up a false dichotomy between corporate open source and non-corporate open source. A better division would be between single organization projects and multi-organization projects. For example, OpenJDK is very much a corporate project - but in addition to Oracle, IBM, Apple and RedHat all have people heavily involved in development. That means that if Oracle were for whatever reason to lose interest the project wouldn't necessarily die (leaving aside patent issues).

On the other hand, look at GWT; see in particular your colleague cromwellian's excellent comments upthread. Here was a technology that Google was at one time devoting a great deal of resources to and that was moving quickly and in exciting directions. A lot of companies built businesses on top of the GWT library. Now, for what seem like very good and valid reasons, Google has scaled back its efforts on the project. That's fine, but there is no community ready to pick up the slack. The reasons there is no one to pick up the slack are not technical or legal, but as you describe it "governance and social issue[s]".

When there is no path to becoming a committer, you are walled off from discussions about the future of the project and your patches are accepted reluctantly at best - why spend the time to deeply familiarize yourself with a codebase?

Except by being hired away by Google, which isn't exactly going to make your company thrilled to sponsor your work on a project.


tldr: It's more difficult to maintain a Java util library than it is to maintain the Linux kernel, so patches are not welcome. They'd love the community to do their bitch work though.


a better point is the preface to the guava project docs:

"The Guava project contains several of Google's core libraries that we rely on in our Java-based projects"

given that, it is fair to say "this is essentially an internal codebase, and we would prefer to develop it ourselves so that it fits our internal practices and standards; however, it is an extremely useful set of libraries, and we are happy to share it with the open source community so that you can use it too, if you like"


Sounds like a good article, wish I could read it on my mobile iOS device.



For those who can't read the G+ post (yes, it can be a pain):

----------

The story with #guava and your patches

Guava users,

Many of you, when you request a feature for Guava, have submitted a patch to us with the implementation (or even pasted code directly into bug reports).

And we have almost never accepted any of these patches, let alone even read them. And I know this makes us look all manner of self-absorbed, arrogant and unappreciative. That's what I'd think in your shoes. So it's time I tried to explain to you more fully why it's like this.

I realize that from your perspective, you're handing us a shiny new feature on a silver platter. It should be making our decision easy, since the work is already done. It's a gift of your time and effort and you've already solved the problem and all we need to do is just accept it! Looked at that way, we're either idiots or jerks for not being interested.

But here's the part that I don't think many of you understand: the work you've done to produce that patch is actually minuscule compared to the total amount of work we have to do to put it in Guava. I know that it feels to you like you've certainly gotten us more than halfway there, but trust me, it's only scratched the surface.

- We have to work out whether the problem it's trying to solve is truly the right problem
- We have to work out whether the solution presented is truly the best solution we can come up with
- We have to find evidence in the internal Google codebase that users will actually use the proposed feature if we create it. If we are adding methods to our libraries that don't get used, it hurts our case when we try to argue to management that we're doing important work and need more staff.
- We have to figure out how it relates to the piles of legacy code we have floating around our libraries (that you, lucky folks you are, don't even see!), and how we would deal with migrating those users if they exist.
- We have to decide the best name and location for the new API. This is hard! We spend a lot of time in our API review meetings just batting names around.
- We have to review the code deeply. Our code reviews are grueling and go on for many rounds. When you look at the code in Guava it tends to look "obvious", but we work very hard to achieve that quality. It's only obvious in hindsight.
- In almost every case we have to completely rewrite the javadoc that first gets submitted to us. And this is very hard. Writing good documentation is probably the biggest challenge we ever face.
- The tests that were first written are rarely sufficient; we're going to need to add more. When we do, some usually fail.
- If the change touches on any existing functionality, we have to submit it to Google's global submit queue and analyze test results from many thousands of projects to make sure we won't break any internal users with it.
- If the change goes in, we have to deal with the machinery that gets that change integrated out to you in Guava.
- We then become responsible for fixing any bugs with it that come up over time, and dealing with the related feature requests it will touch off.
- And the code never "stays finished" in general; we are constantly performing various maintenance tasks over our whole library (or even the whole codebase of Google), to make various cross-cutting improvements, and every bit of new code added increases that burden.

There's more I'm leaving out, but you get the idea. Guava did not get to the level of quality it has by accident, but by us being really obsessive about these things.

Now, when the patch comes from outside Google, we have additional mechanical overhead. One of us has to sponsor the patch as if it's their own, converting it into an internal patch that can merge correctly (which isn't always as trivial as it sounds), and sending it for review to another member of the team. And because we are the ones most familiar with our own style, conventions, practices and pitfalls to avoid, etc., sometimes just doing that plus "cleaning up" the code to get it ready for review is already more time-consuming than if we had written it ourselves from the start. That doesn't even mean that the code sent to us in the patch was bad. It can be very good by most standards but still need a lot of rework for our purposes.

Remember, if your feature is valuable, then we're going to want it in Guava whether you provided a patch or not. Providing the patch doesn't make it more likely that we'll decide it's a good fit for Guava -- if anything it just puts us more on guard against that seductive temptation to think "but it's already mostly done anyway, might as well!"

And here's the last thing. Be honest: if you were going to sign yourself up for doing all that work above... wouldn't you at least want to have the pleasure of writing the code for it yourself? I love writing code -- that's why I do this! -- but such a large majority of my time goes into activities like those described above. If my job were all about just applying other people's patches, I would inevitably start hating it after a while. Let me have some fun sometimes, okay? :-)

I really hope this helps to understand why your patches seem to go into a black hole. I know that no matter what I say it will probably continue to seem unappreciative and condescending, and I apologize. I do recognize that you are just trying to help. But, if you really want to help, then keep an eye out for the times when we will ask for help on a particular issue, because that's where your time and energy will really do the most good!

Rantingly yours, KB


Branching.

A simple feature everyone is trying to avoid, but sometimes it's like an open door from a golden cage.

Guava is the "from inside out" project presented as "take it or leave it". I would not personally beat Google because of their attitude towards changes. But they have to state it explicitly, otherwise more contributors - after so much work they invested - will feel betrayed!


> If the change touches on any existing functionality, we have to submit it to Google's global submit queue and analyze test results from many thousands of projects to make sure we won't break any internal users with it.

Is there any public info about "Google's global submit queue"? I would love to learn more about such a huge automated test system.


Guava is a work of art created by the team that maintains it; what's the big deal if you can't add your own code to it? Merely a lost opportunity to advance your own vanity?

Respect their boundaries and let your experience inform your feedback to them. If you read what Kevin is saying, it is clear they are interested in hearing if, how, and why their library is helping or hurting your own project. They will probably listen if your feedback provides the answers they seek.


The hard part of guava is deciding what to include and exclude and they base those decisions mostly on what is useful at Google. The code is easy and handing them a patch is pointless.


"Sam Berlin - Disclaimer: I don't know what patches to the Linux Kernel are typically like, nor PCGen. But, +Martijn Verburg, there's a pretty big difference between submitting a patch to a library and submitting a patch to a "project". Patches to libraries are typically changes/additions/removals to the API, whereas patches to 'projects' are typically changes to the internals. It's a whole lot easier to change the internals of something than it is to change the API. Changing the API means the effects can bubble outwards. Changing the internals is usually just optimizations or bug-fixes."

haha wow. this is why I love java programmers


How is this Java-specific? Same problem exists in any language.
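
(Editorial sketch, not part of the original thread.) The distinction quoted above, that internal changes stay contained while API changes "bubble outwards", can be illustrated in any language. Here is a minimal Python version; the functions `join_v1`, `join_internal_change`, `join_api_change`, and `caller` are hypothetical and not taken from Guava or any real library.

```python
# Version 1 of a hypothetical library function.
def join_v1(parts, sep):
    result = ""
    for i, part in enumerate(parts):
        if i:
            result += sep
        result += part
    return result


# Internal change: same signature and behaviour, different implementation.
def join_internal_change(parts, sep):
    return sep.join(parts)


# API change: the argument order is swapped.
def join_api_change(sep, parts):
    return sep.join(parts)


def caller(join):
    """Stand-in for downstream code written against version 1."""
    return join(["a", "b", "c"], ", ")


print(caller(join_v1))               # a, b, c
print(caller(join_internal_change))  # a, b, c

try:
    caller(join_api_change)
except AttributeError as exc:
    # The existing call site now passes a list where a string is expected.
    print("downstream code broke:", exc)
```

The rewritten internals are invisible to `caller`, while the signature change breaks it without `caller` itself having changed, which is the part a library maintainer has to weigh before accepting an API patch.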


The simple solution for dealing with Google-scale bureaucracy is to fork and continue pushing forward. Then when that fork gets locked down, create a new one. Not every project is going to be bureaucracy impaired, but there is a correct procedure when it is. Multi-round "grueling" code reviews, and API review meetings? W.T.F.


Java is stupid.


Guava is one of the best Java libraries available today, and the fact that the bar for submitting patches is so high is a simple consequence of that.

You can't have it both ways.



