> In this section, we aim to understand the sources of GPT’s predictive ability.
Oh boy... I wonder how a neural net trained with unsupervised learning ends up with predictive ability. I wonder where that comes from... Unfortunately, the article doesn't seem to reach a conclusion.
> We implement the CoT prompt as follows. We instruct the model to take on the role of a financial analyst whose task is to perform financial statement analysis. The model is then instructed to (i) identify notable changes in certain financial statement items, and (ii) compute key financial ratios without explicitly limiting the set of ratios that need to be computed. When calculating the ratios, we prompt the model to state the formulae first, and then perform simple computations. The model is also instructed to (iii) provide economic interpretations of the computed ratios.
Who will tell them how an LLM works, and that the neural net does not actually calculate anything? It only predicts the next token in the text of a calculation, and only if it has been loss-minimized for that specific calculation.
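For concreteness, here is a rough sketch of what a prompt along the lines they describe might look like when assembled for a generic chat-completion API. The wording, message structure, and the toy statement format are my own guesses, not the paper's actual prompt; from the model's side it is all just text to be continued token by token.

```python
# Sketch of a chain-of-thought prompt in the spirit of the quoted description.
# Everything here (wording, format, field names) is assumed for illustration.

SYSTEM_PROMPT = (
    "You are a financial analyst. Your task is to perform financial "
    "statement analysis on the standardized statements provided."
)

USER_TEMPLATE = """\
Standardized balance sheet and income statement:

{financial_statements}

Step 1: Identify notable changes in the financial statement items.
Step 2: Compute key financial ratios (your choice which). For each ratio,
        state the formula first, then perform the computation.
Step 3: Provide an economic interpretation of each computed ratio.
"""


def build_messages(financial_statements: str) -> list[dict]:
    """Assemble chat messages for a generic chat-completion API."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": USER_TEMPLATE.format(
                financial_statements=financial_statements
            ),
        },
    ]


if __name__ == "__main__":
    demo = "Revenue: 100 -> 120\nCOGS: 60 -> 80\nTotal assets: 500 -> 510"
    for message in build_messages(demo):
        print(f"--- {message['role']} ---\n{message['content']}")
```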
It looks like these authors are discovering large language models as if they were some alien animal, when they are in fact mathematically describable and not-so-mysterious prediction machines.
At least the article is fairly benign. It's the type of article that would pass as research at my MBA school as well... It doesn't reach any groundbreaking conclusions beyond demonstrating that the authors have "probed" the model, which I think is good. It's uninformed but not very misleading.
I had heard of generalization vs. memorization before, but the article you shared is very high quality. Thank you.
I do not think that SOTA LLMs demonstrate grokking for most math problems. While I am a bit surprised to read how little training is necessary to achieve grokking in a toy setting (one specific math problem), the domain of all math problems is much larger. Also, the complexity of an applied mathematics problem is much higher than a simple mod problem. That seems to be what the author of the first article you quoted thinks as well.
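For reference, the toy setting in question is usually something like this: a small network trained on modular addition with strong weight decay, where validation accuracy jumps long after training accuracy has saturated. The sketch below is my own minimal reconstruction with guessed hyperparameters (modulus, architecture, optimizer settings), not the setup from the article, and it may need longer training or tuning to show the sharp transition.

```python
# Minimal sketch of the modular-arithmetic toy setting used in grokking
# experiments: learn (a + b) mod P from half of all pairs.
# All hyperparameters here are illustrative guesses.
import torch
import torch.nn as nn

P = 97  # modulus; the dataset is every pair (a, b) with a, b in [0, P)
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # (P*P, 2)
labels = (pairs[:, 0] + pairs[:, 1]) % P

perm = torch.randperm(len(pairs))
split = len(pairs) // 2  # train on half the pairs, hold out the rest
train_idx, val_idx = perm[:split], perm[split:]


class ToyNet(nn.Module):
    """Embed the two operands, concatenate, and classify the sum mod P."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(P, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, P)
        )

    def forward(self, ab: torch.Tensor) -> torch.Tensor:
        e = self.embed(ab)                       # (batch, 2, dim)
        return self.mlp(e.flatten(start_dim=1))  # (batch, P) logits


model = ToyNet()
# Heavy weight decay is commonly reported as important for grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()


def accuracy(idx: torch.Tensor) -> float:
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()


for epoch in range(10_000):  # the interesting part only shows up after long training
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if epoch % 500 == 0:
        print(
            f"epoch {epoch:5d}  train acc {accuracy(train_idx):.2f}  "
            f"val acc {accuracy(val_idx):.2f}"
        )
```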
Our public models fail a lot in that larger domain, for example on tasks like counting the elements of a set (the words in a paragraph), not to mention complex applied-mathematics tasks. If they have been loss-minimized for a specific calculation to the point that they exhibit this phase change, then that would be an exception.
But in the financial statement analysis article, the authors say explicitly that they do not limit the types of calculations they ask the model to perform. That set of problems is very, very irregular, and there is no guarantee that the model has generalized over it. In fact, it is much more likely that it hasn't, in my opinion.
In any case, thank you again for the article. It's just such a massive contrast with the MBA article above.
Phase changes and grokking make me nervous... It seems that once you reach a certain threshold of training, you can continually "phase-change" and generate these emergent capabilities. This does not bode well for alignment.