Superfast Matrix-Multiplication-Free LLMs Are Finally Here
A deep dive into Matrix-Multiplication-Free LLMs that could drastically reduce AI's heavy reliance on GPUs
A recent research article published on arXiv proposes a massive change to LLMs as we know them today.
The researchers eliminated Matrix Multiplication (MatMul), a core mathematical operation that LLMs perform constantly.
They showed that their new MatMul-free LLMs perform strongly even at billion-parameter scales, and can even beat traditional LLMs on certain tasks!
This is huge, and it all comes from this single optimisation!
This is because the MatMul operation, although central to how LLMs work, is highly computationally expensive. It is what makes today's LLMs so reliant on Graphics Processing Units (GPUs) for both training and inference.
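To get an intuition for how MatMul can be eliminated at all, here is a minimal Python sketch of the general idea (my own illustration, not code from the paper). It assumes the ternary-weight trick this line of work builds on: when every weight is constrained to -1, 0, or +1, each multiply-accumulate in a dense layer collapses into a plain addition, subtraction, or skip, operations that modest hardware can handle without GPU-grade multipliers.

```python
import numpy as np

def dense_layer(x, W):
    # A standard dense layer: one float multiply per weight.
    return x @ W

def matmul_free_layer(x, W_ternary):
    # Illustrative sketch: with weights restricted to {-1, 0, +1},
    # each output is just "add the inputs with weight +1, subtract
    # the inputs with weight -1, skip the zeros". No multiplies at all.
    out = np.zeros(W_ternary.shape[1], dtype=x.dtype)
    for j in range(W_ternary.shape[1]):
        col = W_ternary[:, j]
        out[j] = x[col == 1].sum() - x[col == -1].sum()
    return out

# Sanity check: both layers agree whenever the weights are ternary.
rng = np.random.default_rng(0)
x = rng.standard_normal(8).astype(np.float32)
W = rng.choice(np.float32([-1, 0, 1]), size=(8, 4))
assert np.allclose(dense_layer(x, W), matmul_free_layer(x, W))
```

The two functions produce identical outputs for ternary weights, but the second one never multiplies, which is precisely the kind of workload that no longer demands a GPU.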
But this might not be true anymore!
Here’s a story where we take a deep dive into how these new MatMul-free LLMs were made possible and how they could positively influence the future of AI.