jamala1 7 months ago | on: More Agents Is All You Need: LLMs performance scal...

I guess it's the difference between an ensemble and a mixture of experts, i.e. aggregating the outputs of one or more models trained on the same data vs. models trained on different data (GPT-4). Though GPT-4 presumably doesn't aggregate; it routes.
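A rough sketch of the aggregate-vs-route distinction, assuming each "model" is just a hypothetical callable from prompt to answer. The ensemble queries every model and takes a majority vote (the sampling-and-voting idea the linked paper scales up). The router asks a hypothetical gate to score the experts and runs only the top one. This is a whole-model simplification: real MoE layers route per token inside the network, often to the top-k experts, and blend their outputs with the gate weights.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical: a "model" is any function mapping a prompt to an answer.
Model = Callable[[str], str]


def ensemble_vote(models: Sequence[Model], prompt: str) -> str:
    """Ensemble: query every model, then aggregate by majority vote."""
    answers = [m(prompt) for m in models]
    # most_common(1) returns [(answer, count)]; ties go to the first-seen answer.
    return Counter(answers).most_common(1)[0][0]


def route(experts: Sequence[Model],
          gate: Callable[[str], Sequence[float]],
          prompt: str) -> str:
    """Routing: a gate scores each expert and only the top-scoring one runs."""
    scores = gate(prompt)
    best = max(range(len(experts)), key=lambda i: scores[i])
    return experts[best](prompt)


if __name__ == "__main__":
    models = [lambda p: "4", lambda p: "4", lambda p: "5"]
    print(ensemble_vote(models, "2+2?"))  # 4

    experts = [lambda p: "math answer", lambda p: "code answer"]
    # Hypothetical toy gate: prompts containing digits go to the "math" expert.
    gate = lambda p: [1.0, 0.0] if any(c.isdigit() for c in p) else [0.0, 1.0]
    print(route(experts, gate, "2+2?"))  # math answer
```

The cost difference is the point: the ensemble pays for every model on every query, while the router pays for one expert plus the gate.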