ModuleFormer: Modularity Emerges from Mixture-of-Experts (arxiv.org)
1 point by dhruvdh on Sept 17, 2023 | 1 comment


GitHub Repo: https://github.com/IBM/ModuleFormer

GitHub Description: ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.
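For anyone unfamiliar with the MoE part, here is a minimal sketch of generic top-k expert routing, the usual way a mixture-of-experts layer picks which experts process a token. This is not taken from the ModuleFormer repo: the function names, the toy scalar "experts", and the choice of k=2 are all assumptions for illustration, and in a real model the router logits would come from a learned layer rather than being passed in by hand.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    # Hypothetical helper: pick the k highest-scoring experts and
    # renormalize their scores so the selected weights sum to 1.
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_logits, k=2):
    # Only the selected experts run; their outputs are mixed by weight.
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Toy experts standing in for feedforward modules (illustrative only).
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]

routes = top_k_route(logits, k=2)
output = moe_forward(3.0, experts, logits, k=2)
print(routes, output)
```

The point of the sketch is that only k of the experts are evaluated per input, so compute stays roughly constant as more experts are added.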



