Here Is Google DeepMind’s New Research To Build Massive LLMs With A Mixture Of Million Experts

A deep dive into the Mixture-of-a-Million-Experts (MoME) architecture, which outperforms traditional LLMs in both performance and computational efficiency

Dr. Ashish Bamania
Jul 28, 2024
Image generated with DALL-E 3

There is an LLM war happening around us.

It might not be immediately obvious, but all the big tech companies are in a rush to develop LLMs that outperform the existing ones.

Increase the model size, the dataset size, and the amount of compute, and voila, you have a better model than before.
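This recipe is what the neural scaling laws formalize. As a rough illustration (the constants are fitted empirically and are not discussed in this post), the Chinchilla-style law of Hoffmann et al. (2022) models the pre-training loss as a function of the parameter count N and the number of training tokens D:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

Shrinking the last two terms means growing N and D together, which is why bigger models trained on more data keep winning.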

Given this scaling law, researchers at Google DeepMind found that tweaking the model architecture in a certain way also significantly improves its performance and training efficiency.

Their insight arose from the fact that the Transformer architecture at the core of an LLM stores most of its factual knowledge in its dense feed-forward (FFW) layers.
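To make that concrete, here is a minimal PyTorch sketch of the dense FFW block that sits inside every Transformer layer (the dimensions are illustrative, roughly GPT-2-small-sized, and are not taken from the paper):

```python
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    """A standard dense feed-forward (FFW) block of a Transformer layer."""

    def __init__(self, d_model: int = 768, d_hidden: int = 3072):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)    # expand: d_model -> d_hidden
        self.act = nn.GELU()                      # non-linearity between the two projections
        self.down = nn.Linear(d_hidden, d_model)  # project back: d_hidden -> d_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, d_model); every token passes through
        # the same two large weight matrices, i.e. every parameter is used
        # for every token.
        return self.down(self.act(self.up(x)))


ffw = FeedForward()
tokens = torch.randn(2, 16, 768)  # a dummy batch of 16 token embeddings
out = ffw(tokens)                 # same shape as the input: (2, 16, 768)
```

Because this block is dense, packing more knowledge into it means making it bigger, and a bigger block makes every token proportionally more expensive to process. That tension is what mixture-of-experts designs, including the one in this research, set out to resolve.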
