Not one word about what I consider the most toxic aspect of working at Google: blind allocation.
Unless you truly don't care what you work on, joining Google means throwing your future into the Hogwarts Sorting Hat of an ill-defined cabal of billionaires who pick your job for you. Sometimes that works out, but most of the time you are allocated to whatever mission-critical project is currently leaking buttcheeks.
In my brief time there, I was placed on a 7-person team that lost one person per month. That is the single worst retention rate I have ever seen. I left after 4 months, partly because of that and partly because I realized the powers-that-be were in denial about the coming impact of an emerging technology, one they embraced a year or so after my departure.
That said, the SCCS and the build process were top-notch.
But there are good reasons why, despite all the perks and smart people, Google's overall retention rate is barely a month longer than Amazon's.
When I joined Google, I was given 4 teams in 2 different locations to choose from. I had lunch with each of the teams and chose the one that best suited my style.
To be fair, your experience at Google will depend a lot on what team you land in. Not everyone's given a choice, but now I know better: talk to your recruiter and make it clear to them what you want.
That's interesting. When did you join, and where? There was no such option in 2011; you just ended up wherever you ended up, and were expected to make the best of it.
It largely depends on how impressive you are when you come in, and how in-demand your skillset is across the company. After the hiring decision is made, managers then "bid" on Nooglers they want for their teams, and higher-priority teams (like Android, Ads or Search) will beat out lower-priority teams (like Blogger or Finance). If multiple managers bid on you but you don't like the team you're assigned to, you have the option to work for one of the other ones, e.g. I was assigned to Search when I joined in 2009, but my recruiter made it clear that if I really didn't want to do it, there were teams in Gmail and Docs that would be happy to have me. If only one manager has bid on you, then you're stuck there.
The bidding process seems like something that could be brimming with discrimination.
For example, if you didn't go to a top tier school you could be hired on only to find out that a single team wants you. It's in bumfuck nowhere 2000 miles away and its primary purpose is to program the road sign outside Google HQ.
It could lead to very different Googles for different people, and makes me wonder if applying to work there would really be worthwhile for someone like me.
This reminds me that I don't like employment anti-discrimination laws, especially for jobs highly unlike manual labor. One of the reasons is PIPs (performance improvement plans), which IMO best show the kinds of problems I am talking about, and I think there are a lot of laws there that vary by state (with California being the strictest, I think).
Sure, that would be great, but as a result of leaving, I have been blacklisted by HR from ever returning. And the one guy who tried to bring me back in a year ago met me in the lobby of his building and then walked me out to the marsh behind the Googleplex before saying a word to me.
I then asked if I could use said technology, in which I am a recognized expert; he said no, and that was the end of that.
I have been tempted to print myself a "Bad Cultural Fit - Google HR" T-shirt in response, but I have better things to do.
People aren't blacklisted for leaving, that's blatantly false. Your story is obviously missing some key elements.
Median tenure does not necessarily indicate retention issues for core roles at any company. It could simply be rapid growth or lots of roles that typically come with high turnover (e.g. Amazon's warehouse employees).
You get classified as "regretted" or "non-regretted" attrition. If you're "regretted", you can come back anytime within (I think) 2 years without interviewing, or on favorable interview terms after that. If you're "non-regretted", it can become very difficult to come back unless you have a really strong recommendation from some other manager within Google, or come in through a company they acquire. They already have data on your job performance that's a lot more reliable than an interview, and if it's negative, it'll take a lot to overcome that.
That sounds right. I was on a team that was leaking personnel, run by a manager who clearly wanted to be doing other things with his other team. He forgot I existed for 3 of the 4 months I was there, until I reminded him, whereupon I was told I was failing to meet his expectations.
When I complained that it was unfair to judge me as behind the curve when there had been zero feedback whatsoever up to that point, he reported me to HR and offered to let me leave the team. There were two other potential teams making use of the technology in which I have expertise. One was new and had zero openings at the time, but it is now arguably one of the most prominent teams at Google. The other one's manager asked me one question, "Where did you get your degree?", and (I suspect) didn't like my answer because it wasn't Stanford (the desirable attributes for the position listed a Stanford degree). I don't know for sure, because he cut off communication at that point.
And yes, in a perfect world I should have reminded my manager of my existence sooner, but he was also off on paternity leave, with no one running things in his absence. So I really don't know what the winning move was other than what I did: leave for one of Google's competitors, where I spent the next 4.5 years or so focusing exclusively on the aforementioned technology.
If Google HR is dismissing the AUC of my entire career over 4 blind-allocated months at Google, that's asinine, but it would explain a lot and fit the facts. Thanks for the info.
Indeed, "blacklisted for leaving Google" seems pretty far-fetched. In reality people who leave Google with reasonable performance ratings have the option to return within 6 months after they leave.
The thing is that it didn't seem like that at the time. Someone high up went around the company to see if there was a fit for me, claimed there wasn't, and then said I could either give up on the technology or go work somewhere else. An offer came in around then to do so, I took it. Ironically, that person ultimately left Google arguably over the consequences of adopting that technology later on.
There are ways to hack this if you know your way around the system.
What you need to do is find a team doing work you find interesting within a higher priority focus area with headcount, make contact with their manager, and start 20%ing. If you do a good job with your 20% project - high quality, and done timely - then you can float the idea of transferring permanently to their team. If a manager with headcount in a high-priority focus area requests a transfer from a low-priority focus area, and the employee is enthusiastic about it and has already done work for them, they're going to get it.
I've seen this work for several people who were there only 4-6 months. Officially, you need to be on your starting team for 18 months before transferring, but this is bullshit. Unfortunately, nobody in an official capacity is going to tell you this as a Noogler, so I've also seen a few talented Nooglers leave the company because they ended up on a poor fit for their skills.
I didn't even apply until finding a team I was really excited about. I had coffee with a couple TPMs and managers, found the right group, applied, and have been really happy. The median tenure of the folks I work with is 5 years, and so far it's been great.
The above link appears to be based on using the tenure of current employees to measure "retention". That is going to skew heavily low for companies that are growing, which makes it unsurprising that Kodak has the best retention and that growing tech companies are generally low.
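To see why a growing company's current staff skews toward short tenure even when nobody leaves any faster, here is a minimal deterministic sketch. It assumes a hypothetical model in which every hiring cohort loses the same fraction of people each year, and the two companies differ only in how many people they hire per year. All numbers are made up for illustration and are not data about Google, Amazon, or Kodak.

```python
from statistics import median

def current_tenures(hires_per_year, annual_attrition):
    """Return the tenure (in years) of every employee still on staff today.

    hires_per_year[i] is the number of people hired i years ago
    (index 0 = this year). Every cohort shrinks by the same
    annual_attrition fraction per year, so retention behaviour is
    identical across companies by construction.
    """
    tenures = []
    for years_ago, hired in enumerate(hires_per_year):
        survivors = round(hired * (1 - annual_attrition) ** years_ago)
        tenures.extend([years_ago] * survivors)
    return tenures

YEARS = 10
ATTRITION = 0.10  # hypothetical: 10% of each cohort leaves every year

# Stable company: hires the same number of people every year.
stable = current_tenures([100] * YEARS, ATTRITION)

# Growing company: hiring grows 30% per year, so recent cohorts are the largest.
growing = current_tenures(
    [round(100 * 1.3 ** (YEARS - 1 - i)) for i in range(YEARS)], ATTRITION
)

print("stable median tenure:", median(stable))
print("growing median tenure:", median(growing))
```

With these invented parameters the stable company's median tenure is 3 years and the growing company's is 1 year, even though both lose exactly 10% of every cohort annually: the difference comes entirely from how recently most of the growing company's staff were hired.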
So I wouldn't put much stock in that as a meaningful comparison.
That tenure report looks pretty useless. You can't tell how much of it is not hiring, and how much is people leaving. Any metric where Kodak is #1 is probably not a metric that indicates good places to work.
> Engineers are permitted to spend up to 20% of their time working on any project of their choice, without needing approval from their manager or anyone else.
From what I understand talking to current employees, this is now bullshit. You could spend 20% on other stuff, but the culture in many of the groups is such that you are putting your peer review at risk by doing so because of your reduced output.
Any current Googlers that spend 1 day of every week working on something completely unrelated to their main job want to comment?
This (like a lot of things at Google) varies wildly across PAs and even among managers within a PA. All my managers so far have been extremely supportive of 20% time, and I've personally worked on three different 20% projects over my last four years (including two which have been open-sourced). That being said, my engagement model has generally been less "one solid day a week" and more "a few days in a row once a month or two" in quieter times for my 80% gig, or "an hour or two every day" in busier ones.
My understanding is that while I'm a little bit of an outlier, in general 20% time at Google is nowhere near as dead as people on the internet tend to claim (at least for engineers).
I spend 10-20% of my time at Google (depending on the needs) developing and maintaining an internal tool that is used by 100-1000 engineers across 10-100 teams (not including my current team). Not only have my last 3 managers been extremely supportive of this, I've netted a few peer bonuses from this, and some nice feedback from senior people that appreciated my work.
The "approval from my manager" amounted to telling them in our weekly 1:1 meeting that I wanted to spend my 20% time on that project.
Of course, if you spend 20% of your time working alone on something that produces 0 results in the span of several quarters, I suspect your experience is not going to be the same.
You've made yourself an integral part of the company and gotten your name out there to boot.
Back when I worked at Compaq there was a tool everyone used in the build process, and it had a splash screen that mentioned the author. That guy was a legend at Compaq because everyone had at least heard of him. When my buddy took over the build process, he ended up emulating Mr. Legend and put his name on his tool, which was used thousands of times a day. The same thing happened to him: "Oh, you're Mr. Coder?" He had his pick of projects for a very long time.
I'm not currently, but I was doing that for the last couple years with no ill consequences. I'm only not doing that now because I don't have a 20% idea that's particularly exciting to me.
Although there is a well-defined process for launch approvals, Google does not have a well-defined process for project approval or cancellation. Despite having been at Google for nearly 10 years, and now having become a manager myself, I still don’t fully understand how such decisions are made.
The rest of the paragraph you quote is fairly important as it goes on to explain cases and how a project can be cancelled.
The rest of the quote:
In part this is because the approach to this is not uniform across the company. Managers at every level are responsible and accountable for what projects their teams work on, and exercise their discretion as they see fit. In some cases, this means that such decisions are made in a quite bottom-up fashion, with engineers being given freedom to choose which projects to work on, within their team’s scope. In other cases, such decisions are made in a much more top-down fashion, with executives or managers making decisions about which projects will go ahead, which will get additional resources, and which will get cancelled.
And on the other end of the spectrum we had Jobs-era Apple, with One True God overseeing all the demigods who oversee projects. IMO the things that made Apple products so successful were this structure, which left no room for confusion, and the fact that the company was run by QA people. My impression of Jobs is that he was a QA guy himself.
Jobs was notoriously a capricious micromanager at NeXT; see e.g. "The NeXT Big Thing", Randall Stross.
Thanks to the secrecy that shrouds everything Apple, it's hard to get a handle on whether he became a more evolved manager (or, let's not euphemise: less of a flaming asshole) later on at Apple.
Is the practice of shoving all disparate pieces of proprietary software (or individual projects) in the same repo a common occurrence? I have found that pulling unrelated changes just so that I can push my changes is an inconvenience. Furthermore, tracking the history of a particular project is confounded by unrelated commits. I am sure that their VCS (Piper?) makes this a feasible task, but for Git, it seems like it would suck.
The article posted by kyrra mentions this.
Given the value gained from the existing tools Google has built and the many advantages of the monolithic codebase structure, it is clear that moving to more and smaller repositories would not make sense for Google's main repository. The alternative of moving to Git or any other DVCS that would require repository splitting is not compelling for Google.
It seems like they have just too much invested in this "shove it in the same repo" style. Or is this the more appropriate way to do things in a large organization?
Coming from companies that use reasonably-sized git repos, I absolutely hated Google's VCS.
Here are some of my pain points with it:
* No branches. If you want to make a temporary code branch, you create a CL (Google's version of a pull request), but never submit it. This means nobody else can collaborate on it with you, and it must be manually updated to HEAD.
* No CL collaboration. Unlike Git branches, CLs can only contain changes from one user.
* No stable branch. Since everything is essentially on one long branch, it's a real hassle when a project is broken at HEAD. Sure, integration tests should ideally prevent this. In practice, HEAD is often broken. Teams have created bash scripts and mailing lists to determine 'stable' old versions that can be checked out for development.
* Single versions of libraries. Any library that is used is also checked into the VCS. However, only one version of the library can exist in the codebase, which is rarely updated. However, there are exceptions to this.
At one point, Sergey mentioned bringing Google "up to industry standards" regarding VCS's. However, that would be a monumental task and I doubt it will happen.
It's interesting. This is exactly how it's worked at every large company I've been at. You have these enormous source trees that are essentially unbranchable. Then you have load build teams that build the latest checked-in version. When you want to do anything that uses a component you aren't working on, you grab the built version and use it locally. Then you hope like crazy that any changes you make don't break any downstream users (who are integrating with yesterday's build). If the build breaks (which happens frequently), then people use builds from 1, 2, or 3 days ago, leading to inevitable "integration hell".
Although I seriously doubt that Google has this problem, one of the biggest drawbacks of the scheme is that nobody knows how to build anything -- not even their own project. If you are coordinating with another project, then you're always having to wait days for the load build to finish so you can get their changes. If the build breaks for an unrelated reason, you can lose a whole week of development to screwing around.
When working in that kind of environment, I tend to maintain my own VCS, then squash my commits when integrating with the main VCS. I also do all my own load build. Everywhere I've worked, I've been heavily criticised (usually by management) for doing this, but productivity has been on my side, so people reluctantly accept it. I often find it strange how so many people prefer doing it the other way...
> If the build breaks for an unrelated reason, you can lose a whole week of development to screwing around.
This happens often on some projects at Google (although 2 days is the longest I've seen it broken). Others have told me they use this time for documenting code and writing design-documents.
> When working in that kind of environment, I tend to maintain my own VCS, then squash my commits when integrating with the main VCS.
Google actually has a tool that allows developers to use Git on their local machine, which is squashed into a CL when pushed. However, some projects are too reliant on the old system for this to work.
Everything you posted is wrong, which makes me believe you know everything you posted is wrong. Nobody writes things that inaccurate by accident.
For the benefit of YC, here are corrections:
> No branches. If you want to make a temporary code branch,
> you create a CL (Google's version of a pull request), but
> never submit it. This means nobody else can collaborate on
> it with you, and it must be manually updated to HEAD.
Piper has branches, which can be committed to as normal by any number of engineers. It's common to have both branches for developing certain features ("dev branches"), and branches pinned to a stable base version with cherry-picked bug fixes ("release branches").
> No CL collaboration. Unlike Git branches, CLs can only
> contain changes from one user.
CLs are equivalent to Git commits. Collaboration is expected to occur via a series of CLs, just like Git-based projects have large changes made via a series of smaller commits.
> No stable branch. Since everything is essentially on one
> long branch, it's a real hassle when a project is broken
> at HEAD. Sure, integration tests should ideally prevent
> this. In practice, HEAD is often broken. Teams have
> created bash scripts and mailing lists to determine
> 'stable' old versions that can be checked out for
> development.
There is a global testing system for the entire repository, which is used to decide whether a particular project's tests pass. Commits on which all relevant tests pass are the branch point for releases. This is similar to the Linux kernel's dev model, where stable releases are cut at known-healthy points in an evolving codebase.
Important libraries define more rigorous releases, similar to Git tags, which are updated automatically every day or two. These both reduce the number of tests that need to run and reduce the chances of errors in low-level code affecting many teams.
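The release-branching rule described above (cut from a commit at which every relevant test passed) can be sketched as a walk backwards from HEAD. This is a minimal illustration with assumed inputs: the commit numbers, the Bazel-style test labels, the result map, and the function name are all hypothetical and do not reflect the actual interfaces of Piper or Google's testing system.

```python
from typing import Dict, List, Optional

def latest_green_commit(
    commits: List[int],
    results: Dict[int, Dict[str, bool]],
    relevant_tests: List[str],
) -> Optional[int]:
    """Return the newest commit at which every relevant test is known to pass.

    commits is ordered oldest -> newest. results maps commit -> {test: passed}.
    A test with no recorded result counts as "not known to pass", so a commit
    whose test run has not finished is never chosen. Returns None if no commit
    qualifies.
    """
    for commit in reversed(commits):
        outcome = results.get(commit, {})
        if all(outcome.get(test, False) for test in relevant_tests):
            return commit
    return None

history = [101, 102, 103, 104]
results = {
    101: {"//proj:unit": True, "//proj:integration": True},
    102: {"//proj:unit": True, "//proj:integration": True},
    103: {"//proj:unit": True, "//proj:integration": False},  # HEAD broke here
    104: {"//proj:unit": True},  # integration run not finished yet
}
print(latest_green_commit(history, results, ["//proj:unit", "//proj:integration"]))
```

Because only the project's relevant tests are consulted, a project that depends on unit tests alone could branch from 104 in this toy history, while one that also needs the integration suite falls back to 102.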
> Single versions of libraries. Any library that is used is
> also checked into the VCS. However, only one version of
> the library can exist in the codebase, which is rarely
> updated. However, there are exceptions to this.
Many third-party open-source libraries have multiple versions, and new upstream releases are added when there's either a security/bug fix, or someone wants a new feature.
Three of the four languages most often used at Google (Python, C++, Go) do not allow multiple versions of a library to be linked into a single process due to symbol conflicts. This is a limitation of those languages, not of the Google repository, and it affects any company that allows use of third-party code. The standard recommendation at Google is to avoid dependency hell by sharding large binaries and using RPCs to communicate. This development model has many advantages that have been documented elsewhere.
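For Python, the one-version-per-process restriction falls out of the import system: loaded modules are cached in `sys.modules` by name, so once a module called, say, `mylib` has been imported, a later import of a different `mylib` from another directory just returns the cached copy. The sketch below writes two versions of a hypothetical toy module `mylib` into temporary directories to show this; it illustrates the language-level behaviour only, not anything specific to Google's repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def write_version(root: Path, version: str) -> Path:
    """Create <root>/mylib.py that declares the given version string."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "mylib.py").write_text(f'VERSION = "{version}"\n')
    return root

with tempfile.TemporaryDirectory() as tmp:
    v1_dir = write_version(Path(tmp) / "v1", "1.0")
    v2_dir = write_version(Path(tmp) / "v2", "2.0")

    sys.path.insert(0, str(v1_dir))
    first = importlib.import_module("mylib")

    # Put the 2.0 directory first on the search path and import by name again.
    sys.path.insert(0, str(v2_dir))
    second = importlib.import_module("mylib")

    # The second import is served from sys.modules, so it is still version 1.0.
    print(first.VERSION, second.VERSION, first is second)

    # Clean up so the toy module does not leak into the rest of the process.
    sys.path.remove(str(v1_dir))
    sys.path.remove(str(v2_dir))
    sys.modules.pop("mylib", None)
```

Loading 2.0 next to 1.0 in one interpreter would require renaming one of them; running the two consumers in separate processes, as the comment recommends, sidesteps the clash entirely.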
I've worked on two teams so far, neither of which used the branches you described. They opted to either flag off features or keep long-running CLs.
However, I'll try to learn more about Piper branches. I doubt my team will make large workflow shifts, but it would still be good to understand!
> CLs are equivalent to Git commits
I'd argue that CLs are not equivalent to Git commits, the equivalent would be a CL snapshot. Best practices in git are to have frequent, small commits. CLs tend to be much larger, and the review process means that having small CLs would greatly slow workflow.
> There is a global testing system for the entire repository, which is used to decide whether a particular project's tests pass. Commits on which all relevant tests pass are the branch point for releases.
Yes, but this doesn't help with development. When HEAD is broken, a developer has to choose between developing on an outdated codebase, or developing on a broken codebase.
> Many third-party open-source libraries have multiple versions, and new upstream releases are added when there's either a security/bug fix, or someone wants a new feature.
Popular libraries may be updated more often, however other libraries don't have many resources dedicated to them. After a brief correspondence with the team that manages third-party libraries, I decided it would be easier to implement the feature myself instead of following whatever process was required to update the library. And no, I wasn't trying to use 2 versions of the same library.
Despite your assertion, I'm not trying to write anything incorrect, and I appreciate your response.
While Piper does technically have SVN-like branches, they're so unwieldy and poorly supported by lots of tooling (our team could never get TAP and our slightly-unusual testing to work on branches) that they're a much more niche tool than something like Git branches.
I agree with you with respect to CLs/snapshots. I would sometimes try chains of DIFFBASE-linked CLs in a crude emulation of linear Git branches (I want to experiment with taking things different ways, and I want the version control to store/back up my work), which sort-of worked when you're writing the code, but merging could be nasty. But there was also ad-hoc Git hosting available internally. I started using that for my experimenting, squashing into Piper commits when it was ready to share. It wasn't terrible, though still not optimal for collaboration.
> Best practices in git are to have frequent, small commits.
I'd qualify that: only on a working branch. It's preferable to squash it all into 1 commit when merging upstream, so that reverting is painless if something turns out to be wrong with the proposed change.
Lots of small commits cause massive pain with git :( .
I don't know which team you work on, so this advice might not be appropriate for your situation. Feel free to email me and I'll try to help you get things sorted. I won't make any attempt to link your corp username to these comments. Same for any other Googlers reading this chain.
> I've worked on two teams so far, neither of which used the
> branches you described. They opted to either flag off
> features or keep long-running CLs.
> However, I'll try to learn more about Piper branches. I
> doubt my team will make large workflow shifts, but it
> would still be good to understand!
Putting new features or changed behavior behind a flag is a good process. Encourage your teammates to avoid long-running CLs in favor of submitting flag-guarded code. You should treat every run of [g4 patch] as a warning that the CL is dragging on too long.
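A minimal sketch of the flag-guarded approach: the new code path is submitted next to the old one and stays dormant until the flag is flipped, so nothing has to live in a long-running CL. It uses the standard library's `argparse` as a stand-in for a real flags library; the flag `--use_new_ranking` and both ranking functions are hypothetical, and Google's internal flag tooling is not shown.

```python
import argparse
from typing import List, Optional

def rank_old(scores: List[float]) -> List[float]:
    """Existing behaviour: highest score first."""
    return sorted(scores, reverse=True)

def rank_new(scores: List[float]) -> List[float]:
    """New behaviour under development: drop negative scores, then rank."""
    return sorted((s for s in scores if s >= 0), reverse=True)

def parse_flags(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Off by default, so submitting this code changes nothing for users.
    parser.add_argument("--use_new_ranking", action="store_true")
    return parser.parse_args(argv)

def rank(scores: List[float], flags: argparse.Namespace) -> List[float]:
    """Single switch point between the old and new behaviour."""
    return rank_new(scores) if flags.use_new_ranking else rank_old(scores)

flags = parse_flags([])  # default: flag off, old behaviour
print(rank([0.3, -1.0, 0.9], flags))
```

Keeping the switch in one function means the old path can be deleted in a small follow-up CL once the new behaviour has been fully rolled out.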
Dev branches require a change in workflow, and can be unhelpful if a team has bigger process issues (as yours sounds like it does). I recommend looking into them and trying out the codelab, but don't start advocating for them just yet.
Release branches are very important. If your team is not currently using them, that needs to be fixed ASAP. Look into Rapid[1], try it out for a small CLI utility. Advocate for all releases to be done via Rapid. Despite the name it can be slower than plain [g4 sync; blaze build] but the extra steps it runs are important and useful.
> I'd argue that CLs are not equivalent to Git commits, the
> equivalent would be a CL snapshot. Best practices in git
> are to have frequent, small commits. CLs tend to be much
> larger, and the review process means that having small
> CLs would greatly slow workflow.
I'm a Go readability reviewer, so I review a lot of CLs written by people in other parts of the company. My firm belief is that CLs should be small and frequent. CLs start to get hard to review at around 300 lines of feature code. If you are regularly reviewing large CLs, push back and ask the authors to split them up. Often these CLs are trying to do too many things at once and you can find a good fracture point to split them into 2-3 parts.
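As a rough illustration of the ~300-line guideline, here is a hypothetical check that counts added lines in a unified diff and suggests splitting once the total passes a threshold. The threshold default, the function names, and the rule of skipping files ending in `_test.py` are assumptions made for this sketch; it is not a Google tool, and a reviewer's judgment of "feature code" is more nuanced than a line count.

```python
from typing import Iterable

def added_feature_lines(diff_lines: Iterable[str]) -> int:
    """Count added lines in a unified diff, skipping files ending in _test.py.

    This is a naive parser: "+++ " lines are treated as file headers and
    any other line starting with "+" as an added line.
    """
    count = 0
    in_test_file = False
    for line in diff_lines:
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0].strip()
            in_test_file = path.endswith("_test.py")
        elif line.startswith("+") and not in_test_file:
            count += 1
    return count

def review_hint(diff_text: str, threshold: int = 300) -> str:
    n = added_feature_lines(diff_text.splitlines())
    if n > threshold:
        return f"added lines: {n}; consider splitting into 2-3 smaller CLs"
    return f"added lines: {n}; OK"

sample = "\n".join([
    "--- a/server.py",
    "+++ b/server.py",
    "@@ -1,1 +1,2 @@",
    " import os",
    "+import sys",
    "--- a/server_test.py",
    "+++ b/server_test.py",
    "@@ -0,0 +1 @@",
    "+def test_ok(): pass",
])
print(review_hint(sample))
```

Test files are excluded here only to mirror the "lines of feature code" phrasing; whether tests should count toward a CL's size is a team judgment call.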
If you are regularly writing large CLs, or long chains of diffbase'd CLs, that's a sign that your codebase may be poorly factored. Take a step back from the tactical level and look at what your CLs are touching. Is the UI's HTML formatting mixed into the business logic? Are you touching the same file over and over? Move things around, use includes, use data embeds. Replace long param lists with data structures. All this standard software engineering advice applies 2x when working with other devs.
> Yes, but this doesn't help with development. When HEAD is
> broken, a developer has to choose between developing on an
> outdated codebase, or developing on a broken codebase.
If HEAD is broken, your first priority should be getting HEAD fixed. Whether that means fixing the code or rolling back to an earlier version, you should not accept a broken HEAD.
After it's fixed, look at why it broke. Why was a CL that broke things allowed to be submitted? Do you have proper presubmit tests? Consider adding your TAP project(s) to METADATA files so Piper will make sure tests pass and binaries build before allowing the submit.
If other teams' changes are breaking your code, add your TAP project(s) to their METADATA or help them improve their own test coverage.
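The METADATA advice boils down to one rule: a change may only be submitted if the tests of every project that has registered interest in the touched directories pass. Below is a minimal sketch of that gating logic. The `watchers` mapping, the `run_tests` callback, and every path and project name are hypothetical stand-ins; they do not model the real METADATA format or TAP's API.

```python
from typing import Callable, Dict, Iterable, List, Set

def projects_to_test(changed_files: Iterable[str],
                     watchers: Dict[str, List[str]]) -> Set[str]:
    """Collect every project registered on a directory containing a changed file."""
    projects: Set[str] = set()
    for path in changed_files:
        for directory, registered in watchers.items():
            if path.startswith(directory.rstrip("/") + "/"):
                projects.update(registered)
    return projects

def may_submit(changed_files: List[str], watchers: Dict[str, List[str]],
               run_tests: Callable[[str], bool]) -> bool:
    """Allow the submit only if every affected project's tests pass."""
    return all(run_tests(p) for p in sorted(projects_to_test(changed_files, watchers)))

watchers = {
    # A downstream team has opted in to testing changes to base/strings.
    "base/strings": ["strings_tests", "search_frontend_tests"],
    "search/frontend": ["search_frontend_tests"],
}
broken = {"search_frontend_tests"}
print(may_submit(["base/strings/split.py"], watchers, lambda p: p not in broken))
```

The downstream team's registration on `base/strings` is what turns their breakage into a blocked submit instead of a broken HEAD.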
> After a brief correspondence with the team that manages
> third-party libraries, I decided it would be easier to
> implement the feature myself instead of following
> whatever process was required to update the library.
Third-party code has special rules that might prevent you from doing something reasonable. Ask the team for help with the process. It's a natural developer instinct to write new code instead of trying to update shared dependencies. If you can fight that instinct and get the new dep version imported, it will improve not just your project but the projects of everyone who might use that dep in the future.
Other than "works at Google scale"? That's not a dig at Perforce; it scales well, but not to the absurd scale Google dials it up to. That is the major feature that drives any differences.
It actually works quite nicely. Most of the google software is built internally. Ensuring everyone is running at head is a blessing (rarely a curse), because when you make a change you immediately see if it breaks something. You can also be sure everything gets all the bug fixes in their new release. This is good for various reasons, including the fact that it makes security audits significantly simpler.
Some google orgs, like android, don't use the existing infra, and they kind of struggle and have to reinvent most wheels, because of that.
The company I work for has a monorepo and a team dedicated to developing lots of tooling around it to make it manageable.
Do you want to check out only the app you're working on? There is a script that grabs it and its dependencies.
Do you want to do continuous integration? We have plugins to our CI server that understand when the app code or one of the dependencies has been updated.
Building always off HEAD is nice, and it solved the issues we had with diamond dependencies, but I am not completely convinced this is the right approach.
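Both tools described above rest on the same graph computation: the transitive dependencies of an app (what to check out) and, inverted, the apps whose dependency closure contains a changed package (what CI should rebuild). Here is a minimal sketch over a hypothetical in-memory dependency map; the package names are made up, and a real tool would read the graph from build files instead.

```python
from typing import Dict, List, Set

def closure(target: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return target plus everything it depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in seen:  # also guards against dependency cycles
            seen.add(node)
            stack.extend(deps.get(node, []))
    return seen

def affected_apps(changed: Set[str], apps: List[str],
                  deps: Dict[str, List[str]]) -> List[str]:
    """Apps whose dependency closure includes any changed package."""
    return [app for app in apps if closure(app, deps) & changed]

deps = {
    "billing_app": ["payments_lib", "logging_lib"],
    "search_app": ["logging_lib"],
    "payments_lib": ["crypto_lib"],
}
print(sorted(closure("billing_app", deps)))  # what to check out for billing_app
print(affected_apps({"crypto_lib"}, ["billing_app", "search_app"], deps))  # what CI rebuilds
```

A change to `crypto_lib` triggers only `billing_app`, which reaches it through `payments_lib`, while a change to `logging_lib` would trigger both apps.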
I am pretty sure that engineers there don't sync the entire repo to their workspace. Look up the talk that introduced Piper and Client in the Cloud (I think it was at the BUILD conference 1-2 years ago).
No, it's not common. It's something that a couple of big companies have done, and while it might work for them it does NOT work for most people. It's a really, really bad way of doing development.
Because smaller shops want the benefits of a monorepo without paying the cost and discipline it requires. A monorepo without excellent unit tests and integration tests, review processes and various tools is a disaster. And, in fairness to Google and Facebook, when they talk about the benefits of a monorepo, they always mention these costs (they just get ignored: Headline Driven Development).
And I'm sure that the number of companies doing really good integration tests is somewhere in the 1/10000 range (or worse). Every dependency, external provider, internal API, data store, taking into account versioning... it's hard.
I'm a huge believer in automated testing, I've written about it and done it for almost two decades now. My views on it have continuously evolved...and comprehensive and effective integration testing is still a complete mystery to me. The only reason I'd want to work at BigTechCo would be to learn about that specific aspect of software development.
One True Version forces the whole company to move at the same pace. Even a legacy project needs to keep being updated, because its dependencies are moving. At the same time, if you want to refactor or redesign an API, you need to wait for all your users and communicate changes, since you cannot have a beta/alpha release.
I am guessing that people at Google try to avoid the above issue by creating and then deprecating lots of projects instead of shipping a new major release. Older projects become frozen because too many things depend on them.
This is visible in Google products that take years to change in any way. Gmail and Search have been basically the same for as long as I can remember. Given the number of engineers at Google, it is hard to see much output.
Well I suppose my statement was a little strong. The Linux kernel is essentially a monorepo: it contains the kernel and many, many dependent projects (drivers/modules). They do what Google claims to do with their monorepo: change all dependent packages at once if they change an internal API.
Google is highly non-representative of businesses and technology businesses in particular. It has a near monopoly on the search business and has enormous amounts of money from a single source -- advertising.
Google Not a Startup/Has Not Been a Startup for Many Years
Founded: September 4, 1998 (19 years ago)
IPO: August 19, 2004 (13 years ago)
Number of Employees (2015): 57,100
Revenues per Employee: $1.3 Million
Profits per Employee: $409,000
The issue is that Google has so much money and is so successful that it can do all sorts of things that are extremely inefficient, even very harmful and still do fine, unlike smaller companies or startups in particular.
For example, the arXiv article states:
2.11. Frequent rewrites
Most software at Google gets rewritten every few years.
This may seem incredibly costly. Indeed, it does consume a large fraction of Google’s resources.
Google has the money to do this. As the article argues, it may work for Google. On the other hand, Google has so much money and such a dominant market position, it probably can keep succeeding even if continual code rewriting is actively harmful to Google.
In orthodox software engineering theory, competent software engineers, let alone the best of the best that Google claims to hire, should write modular highly reusable code that does not need to be rewritten.
Many rewrites and "refactorings" are justified by claiming the "legacy" code is "bad code" that is not reusable and must be rewritten by "real software engineers" to be reusable/maintainable/scalable etc. One rewrite ought then to be enough.
Even highly successful businesses, for example one with $1 billion in revenues, $50 million in profits and 5000 employees (revenues per employee of $200,000), have nowhere near these vast resources, whether in total dollars, per employee, or per product unit shipped. Blindly copying highly expensive software development processes from Google or other super-unicorn companies like Apple or Facebook is likely a prescription for failure.
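The size of the gap is easy to quantify from the figures quoted in this comment. This sketch only redoes the arithmetic on those numbers (Google's per-employee figures above and the hypothetical $1 billion company); it adds no new data.

```python
# Figures quoted above for Google.
google_revenue_per_employee = 1_300_000
google_profit_per_employee = 409_000
google_employees = 57_100

# The hypothetical "highly successful" company from the comment.
example_revenue = 1_000_000_000
example_profit = 50_000_000
example_employees = 5_000

example_revenue_per_employee = example_revenue / example_employees  # 200,000
example_profit_per_employee = example_profit / example_employees    # 10,000

print(google_revenue_per_employee / example_revenue_per_employee)  # 6.5
print(google_profit_per_employee / example_profit_per_employee)    # 40.9
# Implied total revenue from the quoted per-employee figure (about $74 billion).
print(google_revenue_per_employee * google_employees)
```

Per employee, that is 6.5 times the revenue and roughly 41 times the profit of the example company, which is the scale of difference the argument turns on.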
There's a meta-lesson about the structure of the software industry, though:
Different market niches in software have wildly different sizes & profitabilities. It's worth doing everything you can to get into a market that is both huge and wildly profitable, and once you're there, you should do everything you can to defend that niche from new entrants.
Pretty much everything in Google's strategy - the constant rewriting, the expansion into adjacent markets like web browsers and mobile phones and ISPs, the highly-paid employees, the 20% time - follows from this. Also, most of the counter-intuitive advice in the startup world - the obsessive focus on growth, the companies that shut down because they aren't growing fast enough, the existence of venture capital and willingness to accept significant losses of control in exchange for capital infusions, the use of disposable coding practices and technical debt - also follows from this principle. A lot of things about the software industry that seem stupid, irrational, or short-sighted become perfectly rational when you understand the market structure of technology.
I agree. Many companies, including mine, run critical stuff on applications that are over a decade old and do just fine. The worst thing you can say about them is that some weren't designed well enough for easy extension or integration with newer apps. They do their job, though. The backbone of ours is mainframe code running on mainframes and AS/400s with simple terminal interfaces. Terminal stuff is ultra-fast & reliable but sometimes ugly. Some have GUI apps that basically hide the terminal details they interact with to be a bit easier to use. Those terminal apps, probably 20+ years old, still work and get periodically updated. New people learn them easily, too, since the interface was well designed for the time. Can't goof off on the thin clients either as there's no web browser or native apps. ;)
I've seen Google do some rewrites that make sense when one has the money for them. The shift from eventual to stronger consistency in their databases via the F1 RDBMS was impressive, and worth some rewrites across the apps to knock out, once and for all, major problems that could affect them. After developing Go, they might also want to rewrite performance-critical apps written in, say, Python in Go. There are definitely benefits to such rewrites. For a lot of the other stuff, I'm betting they could've taken a more long-term approach, especially by using and extending FOSS solutions.
>Blindly copying highly expensive software development processes from Google or other super-unicorn companies like Apple or Facebook is likely a prescription for failure.
Yes, this can't be stressed enough. Out here with the little people like non-Google-employed me, there's almost a desperation to copy the things that Google and Facebook do, and it's often justified with "Well, Google, Facebook, or Netflix use it!"
A huge amount of tech fads over the last decade have been the direct result of Google's publications and public discussions. Google might need BigTable, but that doesn't mean we should all bury our SQL RDBMS and use a crappy open-source knockoff "based on the BigTable paper". More than likely, Google would've been happy to keep a SQL RDBMS.
Google has the engineering staff, including two extremely technical founder-CxOs whose PhD research Google's initial technology was based on, and the hundreds of millions of dollars necessary to do things The Google Way. They can rewrite everything that's slightly inconvenient and do a lot of cutting-edge theoretical work. They have the star power and the raw cash to hire Bell Labs veterans, including Unix co-creator Ken Thompson, and ask them to make a next-gen C on their behalf.
Google has the technical and financial resources to back this type of stuff up, provide redundancy, put out fires, and ensure a robust deployment that meets their needs. They keep the authors of these systems on-staff so they can implement any necessary changes ASAP. In many cases, Google is operating at a truly unprecedented scale and existing technologies don't work well, so that theoretical cutting-edge is necessary for them.
None of those things are going to be true for any normal company, even other large public ones. Google's solutions are not necessarily going to meet your needs. Their solutions are not necessarily even good at meeting their own needs. Stop copying them!
I'm so. sick. of sitting through meetings that have barely-passable devs making silly conjectures about something they heard Google/Facebook are doing or that they read about on a random hyper-inflated unicorn's engineering blog.
You want your stack? OK, here it is: a reasonably flexible, somewhat established server-side language (usually Java, .NET, Python, Ruby, or PHP; special applications may need a language that better targets their specific niche, like Erlang), a caching layer (Redis or Memcached) that's actually used as cache, NOT a datastore, a mainstream SQL database with semi-reasonable schemas and good performance tuning, and a few instances of each running behind a load balancer. That's it, all done. No graph databases, no MongoDB, no quadruple-multi-layered-massively-scalable-Cassandra-or-Riak-clusters, no "blockchain integration", no super-distributed AMQP exchanges, ABSOLUTELY NO Kubernetes or Docker (use Ansible), none of that stuff.
Just sit down and get something done instead of wasting millions of your employer's dollars trying to make yourself feel important. If you do need an AMQP exchange, bolt it on after the fact to address the specific issue at hand, once you know why you need it, and make sure you know what you're doing before you put it in production (it's apparently difficult to grasp that AMQP servers are NOT data stores and that you can't just write directly to them and wait for some worker to pick it up if you care about data integrity).
Don't start a project with the primary intention of making a convoluted mess; let it get that way on its own. ;)
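The rule of thumb above, a caching layer "actually used as cache, NOT a datastore", is usually implemented as the cache-aside pattern. Below is a minimal sketch, assuming plain Python dicts stand in for the SQL database and for Redis/Memcached; the class and method names are hypothetical, not a real client API. What it shows: every write goes to the database, and the cache can be flushed at any moment without losing data.

```python
import time


class CacheAside:
    """Cache-aside reads: the database is the source of truth and the
    cache only holds disposable copies that expire after a TTL."""

    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self.db = db          # stand-in for the SQL database
        self.cache = {}       # stand-in for Redis/Memcached: key -> (value, expires_at)
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so expiry can be tested

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                        # fresh cache hit
        value = self.db[key]                       # miss or expired: read the database
        self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key, value):
        self.db[key] = value                       # write the source of truth first
        self.cache.pop(key, None)                  # then drop the cached copy

    def flush_cache(self):
        self.cache.clear()                         # losing the cache loses no data


db = {"user:1": "alice"}
store = CacheAside(db)
store.get("user:1")            # miss: read from db, now cached
store.put("user:1", "alicia")  # db updated, cache entry dropped
store.flush_cache()            # simulate a cache restart
print(store.get("user:1"))     # re-read from the db
```

Deleting the cached entry on write, instead of updating it, keeps the database as the single source of truth; the TTL bounds how long any stale copy left behind by a race can survive.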
While I agree with the spirit of your post, I think you are taking the argument too far in places. For instance, we are a small team of devs and Docker solved a real problem for us. Our deployment once ran directly on the host OS, and it was a nightmare: different distributions, different library versions (with their own sets of bugs)... Unfortunately we couldn't demand a specific environment from our customers, and at the same time it drained our resources trying to accommodate all installation variants. When Docker came out we tried it, saw that it solves all these issues beautifully, and never looked back. But we use it only because it insulates our installation from the host Linux system (apart from the kernel, that is), not because it is hip.
More general advice is: take G, A, FB, NF... papers (and HN posts, while we are at it ;) ) with a grain of salt, test before use and make sure it solves more problems than it creates - for you.
Yeah, I don't mean it to be an all-encompassing prohibition. It's just important that people start with the basics and complicate their software with esoteric solutions only as a matter of necessity. This also ensures that the engineers understand the functionality that the esoteric solution needs to provide instead of making grossly incorrect assumptions and misuses that lead to major disasters (e.g., treating AMQP servers as a data store; a client lost data last week due to this silly assumption).
I think there can be many rational reasons for 're-writes' that preclude any assumptions about 'bad code'.
New infrastructure requirements, languages, performance considerations, markets, customers etc. etc. - they all change.
An expanding and adaptive business will likely have to refactor code that was 100% perfect in 'context ABC' because they have some new 'context XYZ'.
There are so many examples to think of.
When they went from one datacenter to many; whenever they replace the hardware architecture in their datacenters or change networking configurations; when they decided to integrate G+ into everything; when they decide on a new common UI framework.
The 'roll on' effects may necessitate refactoring of other code.
And this is not including more opportunistic things in a given area.
Maybe they started doing some stuff in 'Go' and wanted to refactor other modules into 'Go' for greater maintainability?
And I don't think anyone will be 'blindly copying' Google's processes.
So much snark and negativity on this thread, when the paper is just a factual description of a pretty impressive feat of software engineering (still way better than most companies I have seen the inside of).
I really like the first part of the paper and it describes a lot of their best practices, but the last 1/3 or so of the paper is more about Google's culture, no? IMO that opens them up for the snark here.
Google has some amazing perks and it's a wonderful overall employer, but I really wonder how much cash is left on the table by not addressing its shockingly low retention rate or at least explaining why it's not a problem.
There are a lot of articles estimating that the cost of a departure is anywhere from 6 to 12 months of the lost employee's salary. Does that seem like something that should be ignored, as they seemingly do? Why do shareholders even tolerate this?
Fair point - I was mostly focused on the build system. It is quite something and a marvel of engineering. The culture is actually good in parts of the company -- like core systems infra -- but yes it is hard to spread that throughout, it seems. A separate article about culture and what's good and bad would be interesting, but probably not likely to be seen in public, particularly if it's honest...
Google has gotten so large so quickly in the past 3 years that I wonder how much damage has been done to their engineering culture. A lot of "less than stellar people" have joined in recent years, according to several of my friends who work there (in infrastructure and some ML groups).
It seems the push to golang is entirely to sustain large projects with average engineers. Maybe somewhere high up they decided that it's better to just have a massive engineering workforce rather than only hiring top talent? At what point does brain drain start as the best people get sick of dealing with mediocrity?
I've only been there 13 months but I've found the quality of previous and new hires are ridiculously high. Given the number of talented people applying every day I think Google could increase hiring by 100% with no appreciable drop in quality.
Google has a lot of imperfections addressed elsewhere in the comments here but the engineering culture is extremely strong and one of the best parts about being there. I think the overall "Googley culture" is suffering but it isn't for technical reasons.
Then again maybe I'm one of those C players who snuck in the past year.
>I've only been there 13 months but I've found the quality of previous and new hires are ridiculously high.
Seek out the people that were there 6+ years ago and compare their quality with the average Googler. You'll likely notice a difference.
Google now has over 50,000 employees. It's not possible to get that big that quickly without a reduction in hiring rigor. There wasn't a sudden influx of geniuses to fill that demand.
They're pretty much the same raw talent, just with 6+ extra years of navigating the Goog and collecting all the arcana that makes you an effective engineer there. Of all of the complaints I have heard from older and newer Googlers, not a single one (in person or on Memegen) has ever complained about a drop in engineer quality. If anything it is a continuing frustration over how high the hiring bar continues to be and how we can't get referral bonuses for any of our friends.
Except no, I saw stuff like that too when I was there along with childish crap like throwing gum in the urinals and googlers acting like entitled jerks towards the cleaning staff.
I met some great people at Google, but I'd be remiss not to mention I also met too many #IAmGoogle sorts (who were all dudes BTW).
I read most of the paper. For the most part, it struck a nice tone as being mostly descriptive and not too promotional. However, the final paragraph in the conclusion section differs:
> "For those in other organizations who are advocating for the use of a particular practice that happens to be described in this paper, perhaps it will help to say “it’s good enough for Google”."
In my opinion, this style of writing doesn't fit or belong.
I would leave that paragraph out. Instead, let's judge an engineering practice on its merits and applicability, based on thinking, reasoning, and experimentation.
That said, as I've read various comments about Google's processes, I'm struck by the cognitive dissonance. On one hand, I see bandwagoning; e.g. "monolithic source control is nuts; we don't do that; no one I know does that". There is also some appeal to authority; e.g. "well, Google is the best, they do X, so we should too." I'm glad to see different argumentation fallacies colliding here.
With one-or-two exceptions, what the paper describes is very familiar and sounds like most software teams, but most teams don't achieve Google-like performance/stability/success.
The differentiation is in details that the paper doesn't explore.
I agree, this paper will just lead to more monolithic Git repos "because it's good enough for Google", without the appreciation of Google's other tooling and processes.
The single repo is quite surprising. How do they manage to store that much data in a central place? I guess they're using some sort of distributed network file system; this seems overly complex, though. It would be interesting to know whether this was intentional (if there is a reason for it) or whether things just evolved this way out of habit.
I think that engineers in most large software companies don't actually accomplish much on a day-to-day basis. I'm saying this after having worked in both corporations and startups. Large companies are laughably inefficient: engineers tend to focus all of their energy on very narrow (often tedious) problems at great depth.
These huge companies never try to rein in the complexity because they don't really need to. Soon enough, the employees at these companies get used to the enormous complexity and start thinking that it's normal, and when they switch jobs, they contaminate other companies with that attitude. That's why I think startups will always have an advantage over corporations when it comes to productivity.
In these big companies, the most clever, complex solutions tend to win.
See my reply elsewhere in this post to the ACM article. It's custom and in-house, built on their own distributed databases. The front-end UI is based on Perforce, but there is a Git front-end (though logically it behaves a lot like Perforce).
As the ACM paper calls out, files are loaded on demand, as they are accessed, so you only have the files you need on your desktop. From a developer's point of view, you can see all the files in the entire repo at all times. And with how CITC (Clients in the Cloud) works, I can move from a laptop to a desktop without any effort (all of the files I have in a working state on one are immediately available everywhere I can access CITC).
As far as complexity goes... how would a large company solve these problems without these large and complex systems? Google is probably one of the more efficient large companies I've seen. When you want all your devs to share code, things get hard at scale.
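The behaviour described above (the whole repo is visible, but file contents arrive only when touched, and your working edits sit on top of them) can be sketched as a small overlay. This is a toy model under stated assumptions, not how Google's system is actually implemented: `fetch` is a hypothetical callback standing in for the central repository, and the edits overlay, which in the real system would live server-side so it can follow you between machines, is an ordinary in-memory dict here.

```python
from typing import Callable, Dict, Iterable, List


class LazyWorkspace:
    """Every path is listed up front; contents are fetched on first read;
    local edits overlay the fetched copies."""

    def __init__(self, manifest: Iterable[str], fetch: Callable[[str], bytes]):
        self._paths = set(manifest)            # full listing, no contents yet
        self._fetch = fetch                    # hypothetical remote fetch
        self._cache: Dict[str, bytes] = {}     # contents pulled so far
        self._edits: Dict[str, bytes] = {}     # this client's working changes

    def listdir(self) -> List[str]:
        return sorted(self._paths | set(self._edits))

    def read(self, path: str) -> bytes:
        if path in self._edits:                # working state wins
            return self._edits[path]
        if path not in self._cache:
            if path not in self._paths:
                raise FileNotFoundError(path)
            self._cache[path] = self._fetch(path)  # first touch: one round trip
        return self._cache[path]

    def write(self, path: str, data: bytes) -> None:
        self._edits[path] = data

    def fetched(self) -> List[str]:
        return sorted(self._cache)


remote = {"a/x.cc": b"int x;", "b/y.py": b"y = 1\n"}
calls = []


def fetch(path):
    calls.append(path)                         # record each trip to the "server"
    return remote[path]


ws = LazyWorkspace(remote, fetch)
print(ws.listdir())                            # every path visible, nothing fetched
print(ws.read("b/y.py"))                       # first read triggers one fetch
ws.write("b/y.py", b"y = 2\n")                 # local edit overlays the fetched copy
```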
"Most software at Google gets rewritten every few years" at incredible expense. That sounds crazy, but the article claims it has benefits including keeping up with product requirements, eliminating complexity, and transferring ownership. Would be interesting to see some kind of metric indicating how much of an outlier Google really is here, and what measures it takes to make sure rewrites aren't worse (second system).
I spent ~18 months at Google, and one of the annoying aspects was the diversity of build systems. I built with Ninja, emerge, Blaze, and Android's Rube-Goldberg-shell-script system.
> Software engineers at Google are strongly encouraged to program in one of four officially-approved programming languages at Google: C++, Java, Python, or Go.
I wonder which of these languages they use to develop the google front page or any other frontend when no Javascript is allowed...
Those are the primary, general four languages, but other languages are used as necessary for specific domains (e.g. Javascript, Objective-C/Swift), they're just not guaranteed to get the same level of internal tooling and infrastructure support (though in practice they do).
Clearly there is some use of Dart and JavaScript (vanilla, Angular and Polymer). It would be interesting to hear percentages across Google development.
Interesting: compared to what's common in the automotive industry, it doesn't even mention the terms "requirements", "specifications", "estimations", "project plan", "traceability", "UML", etc...
Right, if it mentioned those, search would still be the version we saw in 2007. None of the other products would exist yet because they would still be conducting user studies based on wire frame mock ups of workflows designed by committees of behavioral psychologists.
That's a bit dramatic, but when your product development has a fast turnaround for fixes (git push vs 100 million dollar recall) and it won't kill people when it breaks, you should immediately throw most of that process shit out the window.
You can't be competitive in consumer SaaS if you get bogged down in 'real engineering' processes.
A clear statement of what you're going to do, some constraints on the design à la Design-by-Contract, and languages/libraries that mitigate errors by design are so easy to do that small shops do them on a regular basis. Ada/SPARK, Eiffel, OCaml, and Haskell are examples with steady business in industry, with the last three used on relatively fast-moving projects. Add in static analysis, spec-based generation of tests, and/or fuzzers to get lots of reliability for free. Guess what? This method also scales excellently if the company has access to a huge pile of servers and engineers whose build system can automate all the checks with every submitted change.
Your idea that it has to be as ridiculous as process junkies is a strawman. A strawman that happens in a lot of places for sure but doesn't have to. Google can just take the few, easy-to-apply practices from high-integrity systems to get tons of benefit. It's the 80/20 rule for increasing assurance.
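To make two of those "few, easy-to-apply practices" concrete, here is a small Python sketch: a function whose pre- and postconditions are checked at runtime (Design-by-Contract in spirit only; Python has no built-in contract syntax like Eiffel or SPARK), plus a randomized, spec-based test that generates inputs from the function's domain and checks results against an independent oracle. The integer square root is just a hypothetical example function; `math.isqrt` (Python 3.8+) serves as the oracle.

```python
import math
import random


def isqrt(n: int) -> int:
    """Integer square root with its contract checked at runtime."""
    # Precondition: callers must pass a non-negative int.
    if not isinstance(n, int) or n < 0:
        raise ValueError("precondition violated: n must be a non-negative int")
    # Integer Newton iteration; for n >= 2 it converges to floor(sqrt(n)).
    if n < 2:
        r = n
    else:
        x, y = n, (n + 1) // 2
        while y < x:
            x, y = y, (y + n // y) // 2
        r = x
    # Postcondition: r is the largest integer whose square is <= n.
    if not (r * r <= n < (r + 1) * (r + 1)):
        raise AssertionError("postcondition violated")
    return r


def random_spec_test(trials: int = 2000, seed: int = 0) -> int:
    """Edge cases plus random inputs from the spec's domain."""
    rng = random.Random(seed)
    edge_cases = [0, 1, 2, 3, 4, 15, 16, 17, 2**52, 2**64 - 1]
    inputs = edge_cases + [rng.randrange(0, 2**64) for _ in range(trials)]
    for n in inputs:
        if isqrt(n) != math.isqrt(n):
            raise AssertionError(f"mismatch for {n}")
    # The precondition must reject inputs outside the domain.
    for bad in (-1, 2.5):
        try:
            isqrt(bad)
        except ValueError:
            continue
        raise AssertionError(f"precondition not enforced for {bad!r}")
    return len(inputs)


print(random_spec_test())
```

Explicit `raise` is used for the checks instead of `assert`, because Python drops `assert` statements when run with `-O`. In a build system that runs every check on every submitted change, a test like this costs the engineer nothing after it is written.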
All of the processes you just described are antithetical to the processes I was referring to. Writing UML, design specs, etc. is not the same thing as automated test generation, static analysis, and an API contract agreement. What you just described is exactly what I'm saying replaces all of the "on paper" crap that was taught as "software engineering" in schools as recently as 4 years ago.
Or 40 years ago! I'm no fan of "process", but if you don't at least write down what you're trying to do (requirements) and some metrics for judging success, you open yourself up to two equally troublesome outcomes (depending on who's doing the evaluating): it's all a success or everything's a failure. Well written requirements and metrics (even brief ones) remove some of this evaluation ambiguity.
Except this shows very much in Google's reliability, which is horrible.
Just look at Google Nest and their massive outage in December 2015, that’s usually enough to kill a company, and if Nest was independent, it’d have hurt them massively.
Their reliability is nice compared to competing websites, but compared to other infrastructure, it’s quite bad. Hell, I’ve seen two orders of magnitude more Google outages than power outages in my entire life (a combined 29 min of power outage vs. several days of Google outage since '96)
A lot of the things they had to build in-house for repo/build/test/deploy as described in chapter two, all of us are fortunate enough to get for (almost) free with all the tools these days.
> Engineers are permitted to spend up to 20% of their time working on any project of their choice, without needing approval from their manager or anyone else.
Can someone tell more about what they did and what things are permitted (even without needing approval)?
It's all kinds of things: I did mine on another team I was thinking of transferring to (as a sort of pilot program), a friend of mine in ads worked on a cloud robotics team (this was like 5 years ago), another friend spent some time on a research team for the stuff he had done his thesis on, etc.
EDIT: one personal example is that at some point I took some external classes and used 20% time for them. But most cases are for developing some sort of tool or product.
In theory basically anything software-related is possible, and I've never heard of a legitimately suggested 20% project being rejected, but these days it's most common for people to spend their 20% building a feature that interests them for another project, rather than an entirely new project.
In my experience, yes. Most people find it courteous to inform their manager, but I did a lot of 20% time that led to getting hired onto Google Brain, and there was never any doubt in my mind that my manager would approve.
That being said, there were times when I was really excited about the project so I would work 80% time plus 40% time and stay late.
I don't want to get too specific but I did a 20% project that led to a couple of conference abstracts with one subteam. This built relationships which helped when I applied internally to work on a different subteam.
20% time aiding in team transfer is very common I think.
You guys seem to think that Google is benevolent by giving engineers that 20%, while in fact they do that for very selfish reasons: it is all about copyrights.
They hire the smartest Software Engineers in the world, so it is only a matter of time that some of them will create new, disruptive product.
If they weren't given that 20% of time at Google they would do that anyway, on weekends, but since they do it in company-sponsored time then Google owns the copyrights to all of their work.
What New York State law prohibits this? My contract says that as a salaried employee I work for the company 23:59 hours Monday to Saturday, so it's only Sunday that's off the clock.
And from what I understood, they own all inventions, but I am still very iffy on what qualifies as an invention.
A coworker even asked for the boss's blessing to sell some stuff on the side unrelated to the company; the boss said no because, per the contract, that wouldn't let the coworker give full, undivided attention to the company.
> Relate at the time of conception or reduction to practice of the invention to the employer’s business, or actual or demonstrably anticipated research or development of the employer
For google, that's pretty much all software related projects.
> You guys seem to think that Google is benevolent by giving engineers that 20%, while in fact they do that for very selfish reasons: it is all about copyrights.
> The maintenance of operational systems is done by software engineering teams, rather than traditional sysadmin types, but the hiring requirements for software engineering skills for the SRE are slightly lower than the requirements for the Software Engineering position.
I'm an SRE. Both the OP article and the YouTube video are making generalizations that aren't really accurate.
There are two positions called "SREs": SRE-SWE and SRE-SA. SRE-SWEs have to pass a full software interview and are equivalent to SWEs working in other departments. SRE-SAs are not expected to have Google-level software engineering skills, but make up for that in other knowledge areas.
Because they aren't measures of how good you are at developing, testing, and releasing software, they're just measures of how good you are at estimating the amount of work it'll take to complete a project 3 months in advance.
> Mostly just stuff any competent company would/should be doing. it's google though, so they act like it's super awesome.
Yes, you're absolutely correct. But here's the thing: it was actually Google that pioneered many of these. Many of the big/competent companies follow these practices because of Google's "DNA" leaking into them (via former employees bringing along the best practices learned at Google, etc.)
They may have done a better job instituting these practices across a large organization, and some of their tools have very useful and novel features, but I very much doubt there is a single practice that they actually invented. If you think there is one, please be specific. I think what Google contributed is evidence that these practices can be instituted at scale, which really was sorely lacking in some cases. This helped the industry disseminate them.
Of course it's hard to say if they completely, 100% invented anything from scratch. But they sure did "pioneer" a lot of unique practices that other software companies were not following at the time.
A specific example: the practice of keeping the company's entire codebase in a single "source" repo. Pre-Google, it would've been considered outrageous for a sophisticated software company to keep its entire codebase in a single repo. But Google did it, and other companies have followed suit successfully (as Google DNA has leaked to other companies).
Yes, of course keeping code in a single repo is not a "new invention". Linux is a single repo; many smaller companies have only a single repo because their only product is a single web app. Google keeps nearly 100% of their entire codebase in a single repo - and that was definitely a novel approach at the time.
As someone who worked at both companies for a long time, I assure you that Google's best practices (circa when I switched) were a generation ahead of Microsoft's. Mostly due to MSFT having much longer software release cycles, a more primitive, Windows-based internal cloud, many legacy build systems, less inter-group trust, and little company-wide desire to improve things.
It says right in the article: various config and dependency files, presumably both as caches (where everyone would generate the same product) and as a record of where things stood at time t.
For example:
> In some cases, notably Go programs, build files can be generated (and updated) automatically, since the dependency information in the BUILD files is (often) an abstraction of the dependency information in the source files. But they are nevertheless checked in to the repository.
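The idea in that quote, that the dependency list in a BUILD file can be derived from the `import` statements already present in Go sources, can be sketched as below. This is a deliberately simplified, hypothetical illustration (regex parsing, in-repo dependencies only, a made-up `example.com/repo` module path and `//path` label scheme); real generators such as Gazelle for Bazel handle far more, and the output is not Google's internal BUILD format.

```python
import re

IMPORT_BLOCK = re.compile(r"^import\s*\((.*?)\)", re.S | re.M)   # import ( ... )
IMPORT_LINE = re.compile(r'^import\s+(?:[\w.]+\s+)?"([^"]+)"', re.M)  # import [alias] "path"
QUOTED = re.compile(r'"([^"]+)"')


def go_imports(src: str) -> set:
    """Collect import paths from single-line and parenthesized imports."""
    found = set(IMPORT_LINE.findall(src))
    for block in IMPORT_BLOCK.findall(src):
        found.update(QUOTED.findall(block))
    return found


def build_rule(name: str, srcs: dict, repo_prefix: str) -> str:
    """Emit a BUILD-like rule whose deps come only from in-repo imports."""
    deps = set()
    for text in srcs.values():
        for imp in go_imports(text):
            if imp.startswith(repo_prefix + "/"):
                deps.add("//" + imp[len(repo_prefix) + 1:])
    lines = [
        "go_library(",
        f'    name = "{name}",',
        "    srcs = [" + ", ".join(f'"{s}"' for s in sorted(srcs)) + "],",
        "    deps = [",
        *(f'        "{d}",' for d in sorted(deps)),
        "    ],",
        ")",
    ]
    return "\n".join(lines)


srcs = {
    "main.go": 'package main\n\nimport (\n\t"fmt"\n\t"example.com/repo/lib/log"\n)\n',
    "util.go": 'package main\n\nimport auth "example.com/repo/lib/auth"\n',
}
print(build_rule("server", srcs, "example.com/repo"))
```

Standard-library imports such as `fmt` are skipped here because they do not start with the repo prefix.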
They don't use git or any other distributed version control system, so there is no incentive to keep it small. And anything outside the source control system isn't accessible to all the tools that use it, so it would introduce complexity.
Nah, because it would cost the company billions of dollars in lost productivity waiting for these files to get re-processed every time someone built the thing. Google's general philosophy is that humans are expensive and computers are cheap, so pretty much anything that helps the humans go faster is going to be a net benefit in the long run.
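The "humans are expensive, computers are cheap" trade-off is usually cashed out by never redoing work whose inputs have not changed. Here is a minimal sketch of a content-addressed action cache, assuming an in-memory dict as the shared store and a hypothetical `execute` callback as the build step. Real build systems (Bazel's remote cache, for instance) are far more involved, and this is not Google's implementation.

```python
import hashlib
from typing import Callable, Dict

Inputs = Dict[str, bytes]


def action_key(command: str, inputs: Inputs) -> str:
    """Digest of the command plus every input's path and content hash."""
    h = hashlib.sha256(command.encode() + b"\0")
    for path in sorted(inputs):                  # key does not depend on dict order
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


class ActionCache:
    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}       # stand-in for a shared remote cache
        self.hits = 0
        self.misses = 0

    def run(self, command: str, inputs: Inputs,
            execute: Callable[[Inputs], bytes]) -> bytes:
        key = action_key(command, inputs)
        if key in self._store:
            self.hits += 1                       # someone already built exactly this
        else:
            self.misses += 1
            self._store[key] = execute(inputs)   # do the expensive work once
        return self._store[key]


def compile_step(inputs: Inputs) -> bytes:
    # Hypothetical "compiler" that just concatenates sources in path order.
    return b"".join(inputs[p] for p in sorted(inputs))


cache = ActionCache()
cache.run("cc -O2", {"a.c": b"int a;", "b.c": b"int b;"}, compile_step)
cache.run("cc -O2", {"b.c": b"int b;", "a.c": b"int a;"}, compile_step)      # same inputs: hit
cache.run("cc -O2", {"a.c": b"int a = 1;", "b.c": b"int b;"}, compile_step)  # changed input: miss
print(cache.hits, cache.misses)  # 1 2
```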
They sure do at Amazon. Frugality is one of the explicit leadership principles and initiatives often have cost saving as a primary goal and always as a secondary goal.
It was very eye-opening and helpful for me. Given that our startup is just starting to grow and is trying to set software development processes and standards for the growing number of devs, this info provides good guidance on what to aim for, and it also showed me that we are on the right path in several ways.
No need to be needlessly sarcastic.
Data-driven means that you collect various metrics on dev workflow, what slows productivity, or on the product side (user patterns, retention, etc.) and use those when making decisions.
Unfortunately, many companies still base their decisions on very simplistic metrics and/or on "instinct".
Sorry, it was late and I didn't want to write a more substantive response.
The issue here is that politics are unavoidable. Being more data-driven is just another way of running your political process. And yes, it's a better way as long as you know its limitations. Collecting data and sifting through it to extract useful information takes time, creative thinking, and even "instinct" to figure out the right questions and hypotheses. Furthermore if you're going to collect data on dev workflow you better not have incentives there for employees or they will be gamed.
One of my pet peeves is technical people who worship so strongly at the altar of rationality that they are blind to their own biases. Even the most guileless and logical engineer still has an emotional life and worldview that forms the building blocks of what turns into "politics" when you get a large group of people together.
http://www.slate.com/blogs/business_insider/2013/07/28/turno...