What do you mean by "Unix"? Are you talking about some specific Unix version, or is there something in the POSIX spec that says that libc isn't a library?
It's not that libc is supposed to not be a library, but those functions are the POSIX-defined interfaces to the OS. Linux is unusual in that it defines its stable interfaces in terms of the syscall ABI, enabling different implementations of the libc that can work semi-reliably across kernel versions.
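To make that concrete, here's a minimal sketch (assuming Linux with glibc and a GNU toolchain) of the same kernel facility reached two ways: through the POSIX/libc interface, and through the raw syscall ABI that Linux actually promises to keep stable:

    // Sketch only: assumes Linux + glibc; build with g++ demo.cpp
    #include <cstdio>
    #include <unistd.h>       // getpid() -- the POSIX/libc interface
    #include <sys/syscall.h>  // SYS_getpid -- the raw syscall number

    int main() {
        pid_t via_libc = getpid();                                 // libc wrapper
        pid_t via_raw  = static_cast<pid_t>(syscall(SYS_getpid));  // syscall ABI directly
        std::printf("libc: %d, raw syscall: %d\n", via_libc, via_raw);
        return 0;
    }

Both lines print the same PID; the second one never depends on which libc (glibc, musl, etc.) happens to be installed.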
I remember some funny problems with Glibc, like, 20 years ago, but it's been invisible to me (as a user) since then. You get a new Glibc, old binaries still work, it's fine.
Just like with Windows, the challenges affect developers rather than users.
> You get a new Glibc, old binaries still work, it's fine.
Indeed, but when you need to build for an older glibc it's not so simple. This is a common use case, since e.g. AWS's environments are on glibc 2.26.
Ideally you'd like to build for all targets, including older systems, from a single, modern environment (this is trivial in Windows) -- and you can do some gymnastics to make that happen[1] -- but in practice it's easier to just manage different build environments for different targets. This is partly why building Linux wheels is so convoluted for Python[2].
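For what it's worth, one flavor of those gymnastics looks like this -- a sketch only, assuming x86-64 glibc and the GNU toolchain, and the exact version tags differ per architecture:

    #include <cstring>

    // Top-level asm directive understood by the GNU assembler: make our
    // references to memcpy bind to the old memcpy@GLIBC_2.2.5 symbol instead
    // of the newer versioned one (e.g. memcpy@GLIBC_2.14), so a binary built
    // on a new system can still load on an older glibc.
    __asm__(".symver memcpy, memcpy@GLIBC_2.2.5");

    // Runtime-sized copy so the compiler emits a real call rather than inlining it.
    void copy_bytes(char* dst, const char* src, std::size_t n) {
        std::memcpy(dst, src, n);
    }

    int main(int argc, char**) {
        char src[64] = "built on new glibc, runs on old";
        char dst[64];
        copy_bytes(dst, src, sizeof(src) - static_cast<std::size_t>(argc));
        return 0;
    }

You have to do that for every versioned symbol you touch, which is why most people give up and just build inside an old sysroot or container -- i.e. the "different build environments for different targets" outcome above.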
Hardly a world-ending problem, but my point is simply that C runtimes are a pain everywhere.
> Ideally you'd like to build for all targets, including older systems, from a single, modern environment (this is trivial in Windows)
https://github.com/sjmulder/netwake does what you're talking about, but it takes a lot of gymnastics to make that work, and it has to use MinGW rather than MSVC for it to be possible at all.
I'm pretty sure I've run into binaries breaking on new versions of Glibc, but maybe that's because the architecture or calling convention changed. I've never really gotten the sense that GNU cares much about binary compatibility (which makes sense: they argue that sharing binaries is mostly counterproductive).
> How many ways can you think of would there be for your statement to be false?
Not as many as you might think.
The systems at Google may seem incredibly complicated--and they are--but when I worked there, the scenarios in which somebody could intercept and exfiltrate data without your knowledge were extreme.
> If someone "higher clearance" than you decided to make you believe the above, but actually retain it somewhere in some way you weren't allowed to see.
The way this data is stored is designed so that access to it is logged, and the logs have various alerts / auditing procedures to catch exfiltration attempts. SREs will periodically create user data and try out clever ways of destroying or exfiltrating it to test that these controls work. The Snowden leaks also cast a long shadow over work at Google, and since then basically all the traffic and data in storage has been encrypted in ways that make it difficult for state-level actors to surreptitiously intercept it. These systems are a bit nightmarish to design, because there are competing legal/compliance reasons why data must be retained or must be purged. For example, certain data must be retained for SOX compliance, data may be flagged as part of an ongoing investigation, data may be selected for deletion for GDPR compliance, etc.
Obviously, it is POSSIBLE that someone is still exfiltrating data, but you have hundreds or thousands of smart engineers trying to prevent "insider risk" and "state-level actors". People within the company are a big part of the threat model, and agencies like the CIA, Mossad, KGB, etc. are also part of the threat model.
The stack may be complicated, but it's also designed with defense-in-depth to prevent people at lower levels in the stack from subverting controls at higher levels in the stack. For example, people who work on storage systems may be completely unable to decrypt the data that their storage systems contain.
If you're going to get pissy about it, it's obviously true that we are not 100% certain that data is destroyed when we say it is. But this invokes a standard for "knowing" that precludes knowing the truth of any statement which is not an analytic statement.
You don't have to believe it, even for a second, if you didn't work with the wipeout systems. That's fine. I'm not trying to convince you that wipeout works as intended, because I know that I can't provide the evidence to you.
However, you seem to be arguing that other people don't know that the wipeout systems work--that it's somehow impossible to know.
From what I remember, Chubby was one of those services that was a bit more sensitive to poor client behavior. You could cause problems for other users with poor Chubby behavior, and you could accidentally end up with code that causes a high Chubby load.
Chubby is a lot like etcd, except without the hindsight. Distributed locking is pretty complicated, and if I remember correctly there was a great episode of Google's Kubernetes podcast about distributed locking that goes into a lot of this. [0]
If you are going to do any distributed locking, I strongly recommend listening to it; it covers a lot of the pain points.
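To give a flavor of one of those pain points (this is my own sketch, not from the podcast): a client whose lease quietly expired can still believe it holds the lock, so the protected resource usually has to check a fencing token -- Chubby's paper calls these sequencers, IIRC -- and reject stale holders:

    #include <cstdint>
    #include <cstdio>

    // Hypothetical resource guarded by fencing tokens: the lock service hands out
    // a monotonically increasing token with every grant, and the resource refuses
    // any request carrying a token older than one it has already seen.
    class FencedStore {
    public:
        bool write(std::uint64_t token, int value) {
            if (token < highest_token_seen_) {
                return false;  // stale holder: its lock was lost and re-granted
            }
            highest_token_seen_ = token;
            value_ = value;
            return true;
        }

    private:
        std::uint64_t highest_token_seen_ = 0;
        int value_ = 0;
    };

    int main() {
        FencedStore store;
        // Client A was granted the lock with token 33; it pauses (GC, network
        // blip), its lease expires, and client B is granted the lock with 34.
        std::printf("B, token 34: %s\n", store.write(34, 7) ? "accepted" : "rejected");
        // A wakes up, still convinced it holds the lock.
        std::printf("A, token 33: %s\n", store.write(33, 9) ? "accepted" : "rejected");
        return 0;
    }

The hard part, of course, is that every resource behind the lock has to cooperate with the token check, which is exactly the kind of thing that's easy to get wrong.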
Collateral damage is an interesting way to put it. I've heard the internal story from people who worked at Google at the time, and it sounds like the rough sequence of events goes like this:
1. Google Reader is launched, built on internal Google technologies (the distributed database and filesystem technologies available at the time, like GFS).
2. Headcount is not allocated to Google Reader to do ongoing engineering work. Headcount is instead allocated to projects like Google+.
3. The technologies underneath Google Reader (like GFS) are shut down. Without the engineering headcount to migrate, Google Reader is shut down.
Google+ was reportedly shut down for the same reasons (but different technologies). The internal tech stack at Google is always changing, and projects without sufficient headcount for ongoing engineering will eventually get shut down. The timing of the Google Reader and Google+ shutdowns reflects the timing of changes in Google's tech stack more than it reflects any strategic direction by Google.
[Edit: Just to be clear, this doesn't explain the reason why these projects get shut down. It just explains the timing.]
As someone who has seen org-level decisions interpreted very differently by different people, I am not sure how much weight we can give to this insider's account -- this sounds like what an eng manager (who might not know the real reason himself) would tell their disgruntled engineers.
Also, the real question is why Google didn't assign an eng team to a product used by millions, not why products without engineers die.
Just to be clear... I wasn't explaining the reason these projects get shut down, just the timeline of events and some of the contributing technical factors, since these factors are a little different at Google than at other companies.
The decision to deallocate headcount and stop ongoing engineering effort on a project will eventually cause that project to get shut down, no matter what company you work at. However, at many of the software companies I've worked at, projects that run on "industry-standard" or at least mundane tech stacks can run for a very long time with a relatively low amount of effort. At Google, the timeline is shorter.
For example, if you have a web app that runs on Rails or PHP, or something that runs on the JVM, maybe with a Postgres, MySQL, or MS SQL backend, you might be able to shove it onto different machines or VMs for years, only making occasional / minor changes to the code base. If, in 2008, you had a JVM app which used PostgreSQL and ran in Apache Tomcat, there's a good chance you could still run it today with minor changes.
At Google, the internal tech stack--filesystems, databases, monitoring, etc.--changes in ways that are large enough and frequent enough that the situation is different, and projects are shut down on stricter timelines.
This is a really good story about how not to be a customer-centric organisation and how not to handle user feedback.
What I take away is that just because they’re not paying customers doesn’t mean they won’t remember and judge you. And clearly people hold grudges for a long time (witness the number of people who still maintain “Micro$oft is evil” from their 90s experiences).
“Big cloud” has had fires take out clusters, and somehow they manage to keep it out of the news. In spite of the redundancy and failover procedures, keeping your data centers running when one of the clusters was recently *on fire* is something that is often only possible due to heroic efforts.
When I say “heroic efforts”, that’s in contrast to “ordinary error recovery and failover”, which is the way you’d want to handle a DC fire, because DC fires happen often enough.
The thing is, while these big companies have a much larger base of expertise to draw on and simply more staff time to throw at problems, there are factors which incentivize these employees to *increase risk* rather than reduce it.
These big companies put pressure on all their engineers to figure out ways to drive down costs. So, while a big cloud provider won’t make a rookie mistake—they won’t forget to run disaster recovery drills, they won’t forget to make backups and run test restores—they *will* do a bunch of calculations to figure out how close to disaster they can run in order to save money. The real disaster will then reveal some false, hidden assumption in their error recovery models.
Or in other words, the big companies solve all the easy problems and then create new, hard problems.
You know, those are excellent observations. But they don’t change the decision calculus in this case. Using bigger cloud providers doesn’t eliminate all risk, it just creates a different kind of risk.
What we call “progress” in humanity is just putting our best efforts into reducing or eliminating the problems we know how to solve, without realizing the problems those solutions may create further down the line. The only way to know for sure is to try it, see how it goes, and then re-evaluate later.
California had issues with many forest fires. They put out all fires. Turns out, that solution creates a bigger problem down the line with humongous uncontrollable fires which would not have happened if the smaller fires had not been put out so frequently. Oops.
> There was this Oregon company - Mentor Graphics, I think they were called - really caught a cold trying to rewrite everything in C++ in about '90 or '91. I felt sorry for them really, but I thought people would learn from their mistakes.
I've talked to some of the people at Mentor Graphics who were there during that period. The company basically went from #1 in the industry to #3 or so during the course of the C++ refactor (and the EDA industry isn't exactly big). Bjarne Stroustrup showed up at the company now and then because the company was such a major early adopter. Inheritance chains were 5 or 10 classes deep. A full build took a week. The company hosted barbecues on weekends and invited the employees' families so they could see each other.
I only worked there somewhat later, so I just heard the stories from people who were there at the time, and only after I had been there a while. Take some old-timers out to lunch now and then; you'll learn a lot. I ended up leaving: I was more than a bit frustrated by the organizational culture, and the build system my team used was by far the worst I have ever seen in my entire life.
But Oregon is an awesome place to live, the salary was good, and the hours were normal.
30 years later, I still run into the same problem regularly. I'm not sure why, but this seems to be an anti-pattern that everyone needs to learn about the hard way.
If it's anything like when I was in school, as soon as the curriculum trots out its first object oriented language, you get a lecture about how is-a relationships are the greatest invention since the compiler, and deep inheritance hierarchies are both the most practical and the most morally righteous way to organize your abstractions.
(Meanwhile, ironically enough, I'm not sure I've ever heard a CS instructor even mention the Liskov Substitution Principle.)
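For anyone who never got that lecture either, the canonical illustration -- a sketch with made-up class names, not anything from a real curriculum -- is the square that "is-a" rectangle in everyday speech but can't honor the rectangle's contract:

    #include <cassert>

    class Rectangle {
    public:
        virtual ~Rectangle() = default;
        virtual void set_width(int w)  { width_ = w; }
        virtual void set_height(int h) { height_ = h; }
        int area() const { return width_ * height_; }

    protected:
        int width_ = 0;
        int height_ = 0;
    };

    // "A square is a rectangle", says the lecture -- but to stay square, the
    // setters must change both sides, breaking the base class's contract.
    class Square : public Rectangle {
    public:
        void set_width(int w) override  { width_ = height_ = w; }
        void set_height(int h) override { width_ = height_ = h; }
    };

    // Written against Rectangle's contract: set 4x5, expect area 20.
    void resize_to_4_by_5(Rectangle& r) {
        r.set_width(4);
        r.set_height(5);
        assert(r.area() == 20);
    }

    int main() {
        Rectangle rect;
        resize_to_4_by_5(rect);    // fine

        Square square;
        resize_to_4_by_5(square);  // assertion fails: area is 25, not 20
        return 0;
    }

Now multiply that by a hierarchy 5 or 10 classes deep, where every subclass has that many contracts to honor, and you get some idea of what those week-long builds were rebuilding.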
JWZ told a similar story about Netscape being rewritten in C++ between versions 3 and 4. The full story is in the book Coders at Work, but part of it appears here: