Can't GitHub just keep the old archives as they are for existing releases and use the new format only for new releases? Over time the old releases phase out and the new format takes full effect. You could even use a time-based cut-off date if you want everything to switch over in sync.
The article explicitly says, "Internally, this archive is created at request time by the `git archive` subcommand". In other words, there is no pre-existing archive and apparently no cache of generated archives. So each request for an archive gets one generated with whatever format is in effect at that moment.
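To illustrate why generating on the fly makes checksums fragile, here is a minimal Python sketch (not GitHub's actual pipeline). It builds a reproducible tar stream and then gzips it two ways. The two gzip compression levels are only a stand-in, assumed for illustration, for a change in compressor implementation. The unpacked contents are identical, but the compressed bytes, and therefore their SHA-256 checksums, differ.

```python
import gzip
import hashlib
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar stream with fixed metadata so it is reproducible."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp so repeated runs give identical bytes
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compress(tar_bytes: bytes, level: int) -> bytes:
    # mtime=0 keeps the gzip header reproducible; only the level differs between calls.
    return gzip.compress(tar_bytes, compresslevel=level, mtime=0)


files = {"README": b"hello\n" * 1000, "src/main.c": b"int main(void) { return 0; }\n" * 200}
tar_bytes = make_tar(files)

old = compress(tar_bytes, level=9)  # stand-in for the "old" compressor
new = compress(tar_bytes, level=1)  # stand-in for the "new" compressor

print(hashlib.sha256(old).hexdigest() == hashlib.sha256(new).hexdigest())  # False
print(gzip.decompress(old) == gzip.decompress(new))  # True
```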
Why GitHub doesn't cache archives instead of regenerating them on the fly is unclear; maybe it's something the developers should address. Or maybe there was a cache, and it got blown away by the same change that made the archive checksums change.
Yes, they can. The thing is that the tarballs are not part of the release artifacts, even though they appear alongside them. If you look closely, user-uploaded artifacts and generated tarballs are even served from different endpoints.
GitHub could just generate the tarball once and store it the same way as other release artifacts, but for some reason they chose not to.
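A rough sketch of that "generate once, then store" approach in Python, assuming a hypothetical `generate` callback that wraps `git archive`; the class and method names are made up for illustration. Once a release's tarball has been stored, later changes to the generator can no longer change its checksum.

```python
import hashlib
from typing import Callable


class ReleaseTarballStore:
    """Hypothetical store: each release tarball is generated once, then served as-is."""

    def __init__(self, generate: Callable[[str, str], bytes]) -> None:
        self._generate = generate  # e.g. a wrapper around `git archive` (assumed)
        self._blobs: dict[tuple[str, str], bytes] = {}  # (repo, tag) -> stored bytes

    def get(self, repo: str, tag: str) -> bytes:
        key = (repo, tag)
        if key not in self._blobs:
            # First request: generate and persist, like an uploaded release artifact.
            self._blobs[key] = self._generate(repo, tag)
        return self._blobs[key]

    def checksum(self, repo: str, tag: str) -> str:
        return hashlib.sha256(self.get(repo, tag)).hexdigest()


calls: list[str] = []


def fake_generate(repo: str, tag: str) -> bytes:
    # Output changes on every call, simulating a generator whose format keeps changing.
    calls.append(tag)
    return f"{repo}@{tag} v{len(calls)}".encode()


store = ReleaseTarballStore(fake_generate)
first = store.checksum("octo/demo", "v1.0")
second = store.checksum("octo/demo", "v1.0")
print(first == second, len(calls))  # True 1
```

In this sketch, storage grows with the number of releases actually requested, not with the number of commits.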
To clarify: I am proposing to keep using the old archival method for existing releases, for example by passing the necessary arguments to `git archive` forever. Releases that were never archived before, or that are created after some cut-off date in the future, would call `git archive` with the new arguments. No significant extra storage should be needed.
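A minimal Python sketch of that dispatch. The cut-off date and the exact arguments are assumptions: the "old" method is modeled as piping the tar output through an external `gzip -cn` via git's `tar.<format>.command` setting, and the "new" method as git's default. The function only builds the argument list, which would then be handed to something like `subprocess.run` inside the repository.

```python
from datetime import datetime, timezone

# Hypothetical cut-off date: already-archived releases created before it keep the old method.
CUTOFF = datetime(2023, 6, 1, tzinfo=timezone.utc)

# Assumption: the old output came from piping tar through external `gzip -cn`,
# selected here via git's tar.<format>.command config; the new method is git's default.
OLD_CONFIG = ["-c", "tar.tar.gz.command=gzip -cn"]
NEW_CONFIG: list[str] = []


def archive_command(tag: str, created_at: datetime, previously_archived: bool) -> list[str]:
    """Build the `git archive` argv, pinning already-archived old releases to the old method."""
    use_old = previously_archived and created_at < CUTOFF
    config = OLD_CONFIG if use_old else NEW_CONFIG
    return ["git", *config, "archive", "--format=tar.gz", tag]


old_release = archive_command("v1.0", datetime(2020, 5, 1, tzinfo=timezone.utc), True)
new_release = archive_command("v2.0", datetime(2024, 5, 1, tzinfo=timezone.utc), True)
print(old_release)  # ['git', '-c', 'tar.tar.gz.command=gzip -cn', 'archive', '--format=tar.gz', 'v1.0']
print(new_release)  # ['git', 'archive', '--format=tar.gz', 'v2.0']
```

Only the arguments differ per release, so nothing beyond the release's creation date and an "already archived" flag has to be stored.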
Once the accompanying gzip version is old or unsafe enough, breaking checksums for very old releases no longer matters, which makes the transition seamless.
I'd imagine they create it on demand and just cache it for some time. That needs far less storage than keeping a tarball of every single commit stored somewhere.
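A minimal sketch of the "create on demand, cache for some time" model in Python. The class name, the `ttl` parameter and the injectable clock are hypothetical, chosen so that expiry can be shown without waiting. Once an entry expires it is regenerated with whatever format is current, so a cache like this on its own would not keep checksums stable across a format change.

```python
import time
from typing import Callable


class TtlArchiveCache:
    """Hypothetical cache: archives are generated on request and kept for `ttl` seconds."""

    def __init__(self, generate: Callable[[str], bytes], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._generate = generate  # e.g. a wrapper around `git archive` (assumed)
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, bytes]] = {}  # commit -> (created, bytes)

    def get(self, commit: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(commit)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve the cached bytes
        data = self._generate(commit)  # missing or expired: regenerate with the current format
        self._entries[commit] = (now, data)
        return data

    def evict_expired(self) -> None:
        """Drop stale entries so storage stays bounded to recently requested commits."""
        now = self._clock()
        self._entries = {k: v for k, v in self._entries.items() if now - v[0] < self._ttl}


fake_now = [0.0]
generated: list[str] = []


def fake_generate(commit: str) -> bytes:
    generated.append(commit)
    return commit.encode()


cache = TtlArchiveCache(fake_generate, ttl=60, clock=lambda: fake_now[0])
cache.get("abc123")
cache.get("abc123")  # served from the cache
fake_now[0] = 120.0
cache.get("abc123")  # expired, so regenerated
print(len(generated))  # 2
```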