I am a second-year university student learning the basic mathematics and statistics behind neural networks. One thing that surprises me is that there doesn't seem to be an "incremental" way to build a larger AI model (more parameters, like GPT-4) out of an existing smaller one, e.g. GPT-3.5 (I see the term "incremental (compilation)" nearly everywhere in the software engineering industry). I am curious why this isn't possible, at least theoretically?



It is possible, just not practical in many cases. For incremental computation you need to be able either to reverse the computation or to store the inputs _and_ the intermediate results. And you still have to repeat some non-trivial share of the computation anyway, possibly all of it. For AI training this is prohibitively expensive, so it is simpler to train from scratch. Not saying it is impossible, but so far the demand isn't there.
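
For what it's worth, the closest practical analogue is "warm-starting": initialising part of a bigger network from a smaller trained one and then continuing training. Here is a minimal sketch in PyTorch, assuming two toy MLPs; the layer sizes and the plain block-copy of weights are purely illustrative, not how production LLMs are actually grown (those also change depth, attention heads, embedding width, and training data):

    import torch
    import torch.nn as nn

    # Toy "small" and "large" models; only the hidden width differs.
    small = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
    large = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))

    with torch.no_grad():
        # Copy the small model's weights into the overlapping block of each
        # larger layer; everything outside that block stays randomly initialised.
        for s_layer, l_layer in zip(small, large):
            if isinstance(s_layer, nn.Linear):
                out_s, in_s = s_layer.weight.shape
                l_layer.weight[:out_s, :in_s] = s_layer.weight
                l_layer.bias[:out_s] = s_layer.bias

Even with this kind of warm start, the new parameters (and their interactions with the copied ones) still need a huge amount of further training, which is a big part of why it is often simpler to train the bigger model from scratch.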



