This is the part I was most looking forward to reading about in the article, but there were no figures on how large the "largest Google Dataflow job ever" actually is. There are a bunch of relative figures (5x the 2018 job), but what does that translate to in absolute terms? How long did it take?
Ya, concrete details were conspicuously missing. Like, petabytes? Exabytes? I suspect that the "largest Dataflow job ever" is significantly smaller than the kind of crap Google regularly throws at the backend that Dataflow runs on. With that infrastructure at their fingertips, I suspect engineers regularly fire off jobs orders of magnitude larger than necessary, simply because it's not worth the 3 hours of human effort it'd take to narrow down the input set.