I am one of the people who worked on Google's PaLM model.
Having skimmed the GitHub readme and the Medium article, I find this announcement very focused on the parameter count and on the engineering challenges of scaling the model, but it does not contain any details about the architecture, the training setup (learning rate schedules, etc.), or the data composition.
It is great that more models are getting released publicly, but I would not get excited before some evaluations have been published. Having a lot of parameters should not be a goal in and of itself. For all we know, this model is not well trained and worse than EleutherAI's 20B-parameter model (GPT-NeoX-20B), while also being inconveniently large.
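The bar here is low: even a sliding-window perplexity number on held-out text would be something. A minimal sketch with Hugging Face transformers, along the lines of their documented perplexity recipe; "gpt2" and "heldout.txt" are stand-ins (how the released checkpoint actually loads is exactly what the readme doesn't say), and the window/stride values are arbitrary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; swap in the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

enc = tokenizer(open("heldout.txt").read(), return_tensors="pt")
seq_len = enc.input_ids.size(1)
max_len, stride = 1024, 512  # context window and hop size

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_len, seq_len)
    input_ids = enc.input_ids[:, begin:end]
    target_ids = input_ids.clone()
    # Score only tokens not already scored by the previous window.
    target_ids[:, : -(end - prev_end)] = -100
    with torch.no_grad():
        nlls.append(model(input_ids, labels=target_ids).loss)
    prev_end = end
    if end == seq_len:
        break

print(f"perplexity: {torch.exp(torch.stack(nlls).mean()).item():.2f}")
```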
1. The OP did not criticize the headline; they criticized the content. If you read the article you linked, you will find that they do, in fact, evaluate the model's performance.
2. 540 billion parameters is a notable size in its own right, which is likely why they led with that figure in the headline.
The difference is that PaLM was extensively benchmarked, and it performed as well as it should have, which is to say, amazingly well. The irony here is that you should instead be invoking the other ~500B model, Nvidia and Microsoft's Megatron-Turing NLG 530B, which was undertrained, only cursorily evaluated (no interest in any new capabilities, or even in examining known ones like inner monologues), and promptly forgotten by everyone once the headlines about it being the largest dense model faded: https://arxiv.org/abs/2201.11990#microsoftnvidia
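For concreteness, the kind of capability check that never got run on Megatron-Turing NLG: zero-shot "inner monologue" prompting, i.e. appending "Let's think step by step" and seeing whether the model produces usable intermediate reasoning. A rough sketch with Hugging Face transformers; the model id is a stand-in, and whether any given checkpoint shows the behavior is exactly the open question:

```python
from transformers import pipeline

# Stand-in checkpoint; swap in the model you want to probe.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Let's think step by step."
)
out = generator(prompt, max_new_tokens=80, do_sample=False)
print(out[0]["generated_text"])
# A capable model lays out intermediate steps (5 + 2*3 = 11) before the
# answer; an undertrained one tends to ramble or repeat the prompt.
```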
It's in there; look for this sentence: "Training details and best practices on acceleration and stabilizations can be found on Medium (English)." And they did some top-tier stuff to pull that off.
Given that Yandex is a crucial part of Russia's propaganda arm, we should consider the whole range of possibilities, from good to bad:
* Good. These are great researchers helping the community by sharing great work. (This is what I'd like to assume until I have proof to the contrary.)
* Bad. This very expensive training run was approved by Ya leadership (which is under Western personal sanctions) because they have secretly built Russia's propaganda talking points into the model, such as "the war in Ukraine is not a war but a special operation", etc. (One crude way to probe for this is sketched below.)
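A hypothetical sketch of such a probe: sample completions for neutral prefixes on sensitive topics and inspect them for the talking points in question. The model id and the prompts are purely illustrative:

```python
from transformers import pipeline

# Stand-in checkpoint; swap in the released model to actually probe it.
generator = pipeline("text-generation", model="gpt2")

# Illustrative neutral prefixes on sensitive topics.
probes = [
    "The events in Ukraine in 2022 are best described as",
    "Western sanctions against Russia are",
]
for prefix in probes:
    samples = generator(prefix, max_new_tokens=40, do_sample=True,
                        temperature=0.8, num_return_sequences=3)
    for s in samples:
        print(prefix, "->", s["generated_text"][len(prefix):].strip())
```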
No, read my message again. As I said, we should assume good intentions until proven otherwise.
But we should have better tools to test for biases/toxicity. Perspective API is a great tool for toxicity detection, but I'm not aware of any "propaganda" detection tool.
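For reference, scoring model outputs with Perspective API looks roughly like this (a sketch: you need your own API key from Google Cloud, and the available attributes depend on your project's access):

```python
import requests

API_KEY = "YOUR_API_KEY"  # hypothetical; create one in Google Cloud
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

body = {
    "comment": {"text": "some model output to score"},
    "requestedAttributes": {"TOXICITY": {}},
}
resp = requests.post(URL, json=body).json()
score = resp["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"toxicity: {score:.3f}")  # 0.0 (benign) .. 1.0 (toxic)
# Note: there is no "PROPAGANDA" attribute; that's exactly the gap.
```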