If you take "cost-effective" to mean "feasible with today's tech", then maybe. As in: if we fed a model all the raw data without tokenizing, we'd need more powerful, more expensive hardware, and training on that raw data set would take years or decades.
But without it being done, it's an unproven hypothesis at best.
It wouldn't take years or decades of compute to train a language model that doesn't tokenize text first. And it's not an "unproven hypothesis", because it's already been done. Tokenizing is just a good deal more cost-effective, which is why those exercises haven't gone beyond research novelty.