
It's funny because GPT-4 is actually a pile of 3.5s. You just need to set it up correctly.



I guess it's the difference between an ensemble and a mixture of experts, i.e. aggregating outputs from models trained on the same data vs. models trained on different data (GPT-4). Though GPT-4 presumably doesn't aggregate; it routes.
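
For what it's worth, here's a minimal toy sketch of that distinction (nothing to do with GPT-4's actual architecture, just the textbook difference): an ensemble runs every expert on every input and averages, while a hard mixture of experts uses a learned gate to route each input to a single expert.

    import torch
    import torch.nn as nn

    class Expert(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.ff = nn.Linear(dim, dim)

        def forward(self, x):
            return self.ff(x)

    class Ensemble(nn.Module):
        """All experts see every input; their outputs are averaged."""
        def __init__(self, dim, n_experts):
            super().__init__()
            self.experts = nn.ModuleList(Expert(dim) for _ in range(n_experts))

        def forward(self, x):
            return torch.stack([e(x) for e in self.experts]).mean(dim=0)

    class MixtureOfExperts(nn.Module):
        """A gate picks one expert per input; only that expert runs."""
        def __init__(self, dim, n_experts):
            super().__init__()
            self.experts = nn.ModuleList(Expert(dim) for _ in range(n_experts))
            self.gate = nn.Linear(dim, n_experts)

        def forward(self, x):  # x: (batch, dim)
            choice = self.gate(x).argmax(dim=-1)  # one expert index per input
            out = torch.empty_like(x)
            for i, expert in enumerate(self.experts):
                mask = choice == i
                if mask.any():
                    out[mask] = expert(x[mask])
            return out

In the ensemble, compute grows with the number of experts; in the MoE, each input only pays for one expert, which is the usual argument for why routing scales better.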


> GPT-4 is actually a pile of 3.5s

I understand the intention and the reference you're making, and I'd bet the implementation of GPT-4 is something along those lines. However, spreading speculation in definitive language like that when the truth is unknown is dishonest, wouldn't you agree?


Sure, I could put it less definitively, but realistically, what else could it be? The transformer won't change much, and all of the models use it at their core. It's a closely guarded secret precisely because it would be easy to replicate.



