Here Is Google DeepMind’s New Research To Build Massive LLMs With A Mixture Of A Million Experts
A deep dive into the development of the Mixture-of-a-Million-Experts (MoME) architecture, which surpasses traditional LLMs in both performance and computational efficiency
There is an LLM war happening around us.
It might not be immediately obvious, but all big tech companies are in a rush to develop better LLMs that outperform existing ones.
Increase the model size, the dataset size, and the amount of compute, and voila, you have a better model than before.
Beyond this scaling law, researchers at Google DeepMind found that tweaking the model architecture in a particular way can also significantly improve its performance and training efficiency.
Their insight arose from the fact that the Transformer architecture at the core of an LLM stores most of its factual knowledge in its dense feedforward (FFW) layers.
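To see why these dense FFW layers matter, here is a minimal sketch of the standard feedforward block inside a Transformer layer. The dimensions (`d_model=768`, `d_hidden=3072`) are illustrative assumptions, not values from the DeepMind paper; the point is that every parameter is activated for every token, which is exactly what sparse expert architectures aim to avoid.

```python
import torch
import torch.nn as nn

class DenseFFW(nn.Module):
    """A minimal sketch of a standard Transformer feedforward (FFW) block."""

    def __init__(self, d_model: int = 768, d_hidden: int = 3072):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)    # expand hidden dimension
        self.down = nn.Linear(d_hidden, d_model)  # project back down
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every weight in both linear layers is used for every token,
        # which makes dense FFW layers the dominant cost as models scale.
        return self.down(self.act(self.up(x)))

# Example: a batch of 2 sequences, 16 tokens each, hidden size 768
x = torch.randn(2, 16, 768)
y = DenseFFW()(x)
print(y.shape)  # torch.Size([2, 16, 768])
```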