But this means that Microsoft is publishing a black box (Copilot) that contains GPL code.
If we think of Copilot as a (de)compression algorithm plus the compressed blob that the algorithm uses as its database, the algorithm is fine but the contents of the database pretty clearly violate GPL.
While I do believe that thinking and compression will turn out to be fundamentally the same thing, the split you propose is unclear with NN-based models. Code and data are fundamentally the same thing. The distinction we usually make between them is just a simplification, that's mostly useful but sometimes misleading. Transformer models are one of those cases where the distinction clearly doesn't make any sense.
If we think of Copilot as a (de)compression algorithm plus the compressed blob that the algorithm uses as its database, the algorithm is fine but the contents of the database pretty clearly violate GPL.