[DISCLAIMER: I used to work at Google, though not at Google Cloud]
I'm not sure whether this has been discussed here before, but I'd like to use this thread to share an angle from the tech side of things:
IMO, Google is _cursed_ to keep deprecating its products and services. It's cursed by Google's famous choice of mono-repo tech stack.
The monorepo makes a lot of sense and has real benefits. But it comes at a cost: every single line of code has to stay in active development mode. Whenever someone changes a line in a random file three steps away on your dependency chain, you get a ticket to understand what changed, make changes, and re-run every test (and fix them, in 99% of cases).
Yeah, the "Fuck You. Drop whatever you are doing because it’s not important. What is important is OUR time. It’s costing us time and money to support our shit, and we’re tired of it, so we’re not going to support it anymore." is kind of true story for internal engineers.
We once had a shipped product (which took about 20 engineer-months to develop in the first place) in maintenance mode, and it still required a full-time engineer to deal with those random things all the time. It would have saved 90% of that person's time if it were on a separate branch and we only needed to focus on security patches. (NO, there is no such concept of branching in Google's dev system.)
We kept doing this for a while and soon realized there was no way we could sustain it, especially after the only people who understood how everything worked switched teams. Thus, it became obvious that deprecation was the only "responsible" and "reasonable" choice.
Honestly, I think Google's engineering practice is somewhat flawed for its lack of a good way to support shipped products in maintenance mode. As a result, there are only massively successful products under active development, or deprecated ones.
I have also worked at Google (in an unrelated department) and completely disagree with this. Maintaining old code & services is a problem everywhere. Monorepo vs multirepo, monolithic service vs microservices, etc. all have nothing to do with it. There will always be a broken dependency, a service/API/library you rely on that is about to be deprecated, new urgent security patches, an outage somewhere upstream or downstream which you have to investigate, an important customer hitting an edge case which was hidden for years. You will always need a dedicated team to support a live product regardless of how it is engineered.
The problem at Google was (and maybe still is) with lack of incentives at the product level to do any of this. You don't get a fat bonus and promotion for saying that you kept things working as they should, made an incremental update or fixed bugs. When your packet goes up to the committee (who don't know you and know nothing about your team or background), the only thing that works in your favor is a successful new product launch.
And as an engineer you still have multiple avenues to showcase your skills. That new product manager you just hired from Harvard Business School who is eager to climb the ladder does not. And due to the lack of a central cohesive product strategy, this PM also has complete control of your team's annual roadmap.
The "you must make something new to get promoted" was a common meme (literally) at Google, but I never saw that myself. I got promoted, and I sat on promotion committees, and it didn't seem that important. I did sort of start a new project to get from 4 to 5 (rather a prototype was handed to me by my more senior team members), but it was clear to me that the path to 6 was not starting a new project -- it was increasing the reach and value of my existing project. I left before that happened (Google Fiber got canceled, found a new team but it wasn't really my thing), so I'll never know for sure, but I didn't feel any pressure to make something new for the sake of making something new. There was, of course, pressure to make the thing that you did work on good, and nobody was really going to stop you from making something new.
Basically, that whole eng ladder thing is really important. I looked at that a lot for my own promotions and for evaluating candidates for promotions. Just dealing with churn isn't really on there, so it's probably not something you should focus too much on. I'd say that's true at any job; customers aren't going to purchase your SaaS because you upgraded from Postgres 12 to 13. They give zero fucks about things like that. You do upgrades like that because they're just something you have to do to make actual progress on your project. Maybe unfortunate, but also unavoidable. Finding a balance is the key, as with anything in engineering.
The biggest problem I found with promotions is that people wanted one because they thought they were doing their current job well. That isn't promotion, that's calibration, and doing well in calibration certainly opens up good raise / bonus options. Promotion is something different -- it's interviewing for a brand new job, by proving you're already doing that job. Whether or not that's fair is debatable, but the model does make a lot of sense to me.
Things could have changed; I haven't worked at Google for 4 years. But this was a common complaint back then, and it just wasn't my experience in actually evaluating candidates for promotion.
"The biggest problem I found with promotions is that people wanted one because they thought they were doing their current job well. That isn't promotion, that's calibration, and doing well in calibration certainly opens up good raise / bonus options. Promotion is something different -- it's interviewing for a brand new job, by proving you're already doing that job. Whether or not that's fair is debatable, but the model does make a lot of sense to me."
Thanks for articulating this distinction so clearly; it's a simple enough idea, but it seems to elude so many.
> Promotion is something different -- it's interviewing for a brand new job, by proving you're already doing that job.
Every large corporation has a concept of levels. It makes sense to use levels as a progression (they are numeric after all) rather than a new job each time. That’s what job titles/roles are for.
I’m not convinced by this summary, and it seems anecdotal rather than realistic.
The reason things like this elude so many people is that they never get properly explained by anyone. What jrockway explained might be simple, but it is quite rare to see such an explanation.
With respect to your experience, the impact of promotion chasing was heavily felt by product teams, and I wouldn't expect it to be that visible to people on the promo committees. I watched multiple fellow Googlers rush project work and cut corners in order to be able to "ship" and put the project in their promo package (and be frustrated when they missed promo). In some cases I got to watch them abandon the project and move on to something else even though it badly needed additional maintenance and cleanup due to all the corner-cutting. In one specific case, all the corner-cutting led to multiple significant exploit chains (one of them delivered persistent root on Chromebooks).
> I watched multiple fellow Googlers rush project work and cut corners in order to be able to "ship" and put the project in their promo package (and be frustrated when they missed promo)
This kind of confirms my point -- the committee isn't looking for "created a disaster area a month before promo packets were due". They want a consistent track record of success at the next level.
I definitely encountered this problem at Google (there was a reason it was a meme), but it was far more prevalent at the EM/PM/director level, and so still directly affected the overall product strategy for the org and what you as an IC got to work on.
>Promotion is something different -- it's interviewing for a brand new job, by proving you're already doing that job
I've worked at a couple of places where getting a "meets expectations" on your annual review was expected.
Their review processes were calibrated such that you should [almost] never get a 5 ("always exceeds")
A handful of 4s ("sometimes exceeds") was good - but not a requirement ('too many' 4s indicated you were in the wrong role, so titles/pay/etc would be adjusted)
More than one 2 ("sometimes doesn't meet") was reason for extra mentoring, one-on-ones, etc
There were no 1s ("fails to meet") - if you would otherwise have earned a 1 in any category, you'd've been let go already
I think monorepo makes it easier to update downstream dependencies atomically as part of an upstream change, and thus encourages a culture of unstable APIs.
That is, careful evolution of internal APIs is not given much weight, so modularity - in the sense of containing change - suffers.
I don't think monorepos must necessarily go this way, but expressing dependencies in terms of build targets rather than versioned artifacts has a strong gravitational effect. Change flows through the codebase quickly. That has upsides - killing off legacy dependencies more quickly - and downsides - pressure to delete code that isn't pulling its weight because of the labour it induces.
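To make the contrast concrete, here's a toy sketch in Python; the target and package names are made up, and this is nothing like Google's actual tooling:

    # Toy sketch of the two dependency styles; names are made up.
    head = {"//base/strings": "head-2024-06-01"}  # monorepo: one version, at head

    def monorepo_dep(target: str) -> str:
        return head[target]  # every consumer sees whatever is at head right now

    # Versioned artifacts isolate consumers until they choose to bump.
    published = {"base-strings": {"1.4.0": "old-but-stable", "2.0.0": "latest"}}

    def pinned_dep(name: str, version: str) -> str:
        return published[name][version]  # stays on 1.4.0 until someone bumps it

    print(monorepo_dep("//base/strings"))       # head-2024-06-01
    print(pinned_dep("base-strings", "1.4.0"))  # old-but-stable

In the first style an upstream change is visible to every consumer immediately; in the second, upstream can move without breaking anyone until they opt in.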
[I currently work at Google but I've only been here a few weeks. I certainly don't speak for the company.]
I think this is absolutely it. A lot of best practices evaporate by the time the compiler has emitted the bytecode; as such, a lot of best practices are about people, not the computer. APIs can be just as stable behind a network as behind a library call, but way more people are on board with "never break your API" than with "never break your function".
Concerning the promotion thing, I hear this a lot from Googlers but isn't it the same everywhere else? Most tech companies (big tech at least) will promote on product achievements, not maintenance.
The main difference was/is that at Google your immediate manager, director, PM, peers and everyone else in your product unit (who you work with every day) have almost zero say in whether you get promoted or not. You have to essentially summarize everything you did in bullet points and send it over to an anonymous committee who don't know who you are. They will base their decision on this piece of paper without any additional background or context.
This does help in various ways – the process is more objective, there is less bias, less departmental/managerial politics etc. The drawback is that a lot gets lost in translation. There is too much burden on you as an engineer to pick and choose what you spend your time on so it looks good to the committee.
In other companies I have worked at getting promoted was a byproduct of doing a good job. At Google getting promoted is the job.
There have been some changes that make this entirely untrue for earlier promos and partially untrue for promotions to L6/Staff. There's considerably more locality at this point.
That’s not true anymore unless you are going for 6/7 (depending on your org). Now committees are based in your org, and members are expected to be somewhat familiar with your work.
This sounds like any large organization, where each engineer is only one tooth on a very large gear.
I'm guessing, but do not have enough anecdotal experience myself, that just about any large tech company employee here is reading your description and thinking "sounds like my company."
I'm curious how sound my hypothesis/guess is. Can other large organization employees answer with a claim that this does NOT describe their situation?
This makes sense. I do not work at Google, we do not have monorepo, and yet many problems feel similar.
The guys who maintain the company infrastructure introduce some changes, send an e-mail notification, and call it a day. The maintenance you need to do at your existing project to keep up with these changes does not count as important work, because it is not adding new features. Therefore it is important to run away from the project as soon as it stops being actively developed.
I work for Google for the 3rd time; 12+ years in total. This is the first time this issue has been brought to my attention. I think the issue exists, but it's not due to the monorepo; it's due to internal APIs changing.
I learned how to avoid the Google3 tax that you mentioned, where the old thing is deprecated and the new one is not working yet.
Surprisingly, the answer for me was to embrace Google Cloud: its APIs have stability guarantees. My current project depends on Google Cloud Storage, Cloud Spanner, Cloud Firestore and very few internal technologies.
I believe that this is in general a trend at Google: increasing reliance on Google Cloud for new projects. In this sense, both internal and external developers are in the same boat.
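For what it's worth, that stability is visible from the outside too; a minimal sketch using the public Cloud Storage Python client (the bucket and object names here are hypothetical):

    # Minimal sketch with the public google-cloud-storage client;
    # bucket and object names are made up.
    from google.cloud import storage

    client = storage.Client()  # picks up ambient credentials
    bucket = client.bucket("my-team-artifacts")
    blob = bucket.blob("releases/build-1234.tar.gz")
    blob.upload_from_filename("build-1234.tar.gz")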
As for the monorepo - it's a blessing, from my perspective. Less rotten code, and it's much easier to contribute improvements across the stack.
I worked as a SWE at Google and also at Google Cloud. I both disagree with this and find it a very perplexing angle.
I think the issue is a mis-aligned (financial) incentive structure.
With the right incentive structure, challenges in either monorepo or federated repo can be overcome.
With the wrong incentive structure, problems will grow in both monorepo and federated repo.
The choice of repo simply manifests the way in which the thorns of the incentive structure arise, but it's the incentive structure which is the root cause.
I think the monorepo has an effect, but the bigger effect is dependencies expressed as build targets rather than versioned artifacts. When your dependencies live in the eternal present, you're forced to upgrade, or you're a blocker. And I don't think you can do dependencies as build targets well without a monorepo.
I think this is a little inaccurate. Google's monorepo does have branching, but it is a second-class citizen with minimal support that most engineers aren't aware of unless they've carried a product through many releases.
That being said, every release can stand on its own and be iteratively changed without taking on changes from the rest of the company.
The highest potential for breakage is at the boundaries where your long-running services depend on another team's service(s), but this problem is not unique to Google.
Google can choose to maintain a long running maintenance project or deprecate it, and I won't claim to know what plays the biggest factor in that decision (it's likely unique to every team), but having a monorepo definitely is not part of the equation.
We actually tried that route (i.e. using a branch) before deprecating the project, but the infra team told us the branch could only live for about 6 months before they would have to deprecate the toolchain that supports branches more than 6 months old.
How does a multi-repo codebase solve that problem? You would still need to keep up with your infra at a minimum, unless you run everything yourself too. Now you have another problem...
IMO the monorepo has little to do with it; it's more just an eng culture of shipping above all else (heavily influenced by their promo process).
I don't know exactly what's going on at Google, but the key feature request seems to be that one chunk of code be able to depend on a consistent version of the library interfaces. If it weren't a monorepo, you could specify a dependency as a particular version of the other repository. But if everything is in the same repository, and one directory of code depends on a past version of library code, then everything falls apart. Keeping code that doesn't work in the monorepo, with its tests failing, is worse than deleting the code at the point that the API change breaks the other chunk of code.
Maybe in the Java world it was different, but when I wrote C++ and Go code there, breaking existing APIs was extremely frowned upon, and if people had to do it they usually sent you automated code-edit PRs for it.
I can attest. I work at a similar megacorp with a very large megarepo. If you commit a change that breaks any kind of test, anywhere, that shit is getting reverted very rapidly. If you MUST make a breaking change, congratulations, you get to update all of your users' code too.
This is about company culture, not the programming language. When someone breaks their API and someone else's code stops working as a consequence, will the first person get told to fix their API, or will the second person get told to fix their application?
Some companies may have a consistent policy about this, in other companies it may depend on which team happens to have more political power.
At least during my time there, nobody was introducing breaking (at compile time) API changes willy-nilly. What did happen was people would deprecate (or “sunset”, as PMs loved to call it) a runtime API that people depended on - i.e. shutting down some servers. So splitting the monorepo would do nothing here unless you're willing to run those services yourself.
It's easier to park a codebase and let it rot when it's not in a monorepo where everything is assumed to be consistent/working all the time.
I'd have thought you could just pull maintenance-mode products out of the monorepo tree and stash them somewhere else. Let them rot by choice. That's basically what everyone else does; it lets you perform maintenance tasks on your schedule, not the other monorepo participants' schedule.
I would assume multi-repo also means depending on packages rather than code, so code changes that are not backwards compatible yield a new version of the package that doesn't have to be applied across all uses of that repo.
I can tell you that having a multi-branch code management system doesn't make this easier. You will only pay the tax at a different point in time.
In the monorepo you are forced to update things immediately if something breaks. In a multi-branch system things go unnoticed for a while, until you have to update dependency A (due to a bug, a security issue, or because you want a new feature) and then observe that everything around it has moved too. Now a lot of investigation starts into how to update all those changed packages at once without one breaking another. I experienced several occasions where those conflicts required more than 2 weeks of engineering time to resolve - and I can also tell you that it isn't a very gratifying task. Try starting a new build which just updates dependency D, then notice 8 hours later that something very, very far downstream breaks and you also need to update E and F, but not G.
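A toy illustration of that fan-out, with hypothetical package names: each consumer pins dependency D to a version range, and a single bump can violate several ranges at once:

    # Hypothetical consumers E, F, G, each constraining dependency D to a
    # (min, max) major-version range.
    constraints = {
        "E": {"D": (1, 2)},
        "F": {"D": (1, 3)},
        "G": {"D": (1, 1)},  # G never moved past D 1.x
    }

    def broken_by(dep: str, new_version: int) -> list[str]:
        return [c for c, deps in constraints.items()
                if dep in deps and not (deps[dep][0] <= new_version <= deps[dep][1])]

    print(broken_by("D", 3))  # ['E', 'G']: both need work before D can move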
Multiple times I would actually have preferred changes to lead to breakages earlier, so that the work to fix them would be smaller too. So that's the contrarian view.
Overall, software maintenance will always take a significant amount of time, and managers and teams need to account for that. And, similar to oncall duties, it also makes a lot of sense to distribute the maintenance chores across the team, so that no single person ends up doing all the work.
If you make changes to a large shared module, is it your responsibility to chase down each and every usage of it? For example, if you are upgrading a dependency due to a somewhat-breaking security issue, such as Jackson 2.8->2.12?
At Google it mostly _is_ your responsibility to do that, yes.
There is substantial tooling to assist with this, and it's common to make changes by adding new functionality, writing an automated transform to shift usage from old to new, sharding out the reviews to the suitable OWNERS, and finally removing the old functionality.
Very heavily used common code tends to be owned by teams that are used to this burden. That said, it does complicate upgrades to third party components.
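A toy version of the kind of automated transform described above, using only the Python standard library (old_api and new_api are made-up names; the real internal tooling is far more sophisticated):

    import ast

    class RenameCall(ast.NodeTransformer):
        """Rewrites every reference to old_api into new_api."""
        def visit_Name(self, node: ast.Name) -> ast.Name:
            if node.id == "old_api":
                node.id = "new_api"
            return node

    source = "result = old_api(x, timeout=5)\n"
    tree = RenameCall().visit(ast.parse(source))
    print(ast.unparse(tree))  # result = new_api(x, timeout=5)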
You do not chase them down; the build system detects all affected modules and runs their tests. That's the advantage of the monorepo - continuous integration that includes all dependent modules.
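Concretely, "detects all affected modules" amounts to walking the reverse dependency graph from the changed target and running every test it reaches; a sketch with made-up target names (real systems expose this as queries, e.g. Bazel's rdeps):

    from collections import deque

    deps = {  # target -> direct dependencies (made-up names)
        "//app:server":   ["//lib:strings"],
        "//app:tests":    ["//app:server"],
        "//tools:linter": [],
    }

    def affected(changed: str) -> set[str]:
        out: set[str] = set()
        queue = deque([changed])
        while queue:
            cur = queue.popleft()
            for target, ds in deps.items():
                if cur in ds and target not in out:
                    out.add(target)
                    queue.append(target)
        return out

    print(affected("//lib:strings"))  # {'//app:server', '//app:tests'}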
It's also a disadvantage, to be clear. Tests take longer to run when you need to rebuild your dependencies and not just your own code. There's no easy way to put something in maintenance mode and only take changes for bug fixes, because maintaining forks isn't really a supported thing. Thus downstream consumers must pay not just for bug fixes but also for feature improvements, deprecations, etc.
It works well enough if everything is making money and is being actively developed.
> Yeah, the "Fuck You. Drop whatever you are doing because it’s not important. What is important is OUR time. It’s costing us time and money to support our shit, and we’re tired of it, so we’re not going to support it anymore." is kind of true story for internal engineers.
Oh man that explains everything, I can totally relate to that.
The monorepo makes everything worse, but it's only part of the problem, I believe.
The big problem is forcing everyone to keep everything “updated”.
What is really needed is a way, given a certain state (branch, etc.), to reliably reproduce the build artifacts, AND a way for your software to depend on those packages at specific versions.
This way you can make an informed decision about when, or if, you upgrade something, and you know for a fact that (setting security issues aside) you will not have to touch the code and can keep running it forever.
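A sketch of what that buys you: a lockfile pinning exact versions and content hashes, so a build either reproduces exactly or fails loudly (the package name, version, and artifact below are all made up):

    import hashlib

    # Made-up lockfile entry: package -> (pinned version, sha256 prefix).
    # The prefix below is the start of sha256(b"test").
    lockfile = {"libfoo": ("2.3.1", "9f86d081884c7d65")}

    def verify(name: str, artifact: bytes) -> None:
        version, expected = lockfile[name]
        if not hashlib.sha256(artifact).hexdigest().startswith(expected):
            raise RuntimeError(f"{name}=={version}: artifact hash mismatch")

    verify("libfoo", b"test")  # passes; any other bytes would raise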
Look at virtually any modern programming language: the way packages work makes or breaks the language. I never understood why Google seems to believe they are special and that basic stuff does not apply to them, but it does.
Also, IMHO there is a huge difference between how things are run and work inside Google and how things work “in the wild”.
> It would have saved 90% of that person's time if it were on a separate branch and we only needed to focus on security patches. (NO, there is no such concept of branching in Google's dev system.)
I wonder if this is why you have so many different programming languages being used under the hood at Google? Essentially, people using a programming language as a branch: if you're working in a completely different language, in theory you could shelter your team's product?
I'm not sure I agree. Almost everything major is still written in Java or C++. And I'll disagree that the issue is with libraries at all; it's with other services changing out from underneath you.