Cache-Augmented Generation (CAG) Is Here To Replace RAG
A deep dive into how a novel technique called Cache-Augmented Generation (CAG) works and how it reduces or eliminates the need for Retrieval-Augmented Generation (RAG).
LLMs respond to user queries based on their training data.
But this data can become outdated pretty quickly.
To make sure that LLMs can answer a query using up-to-date information, the following techniques are commonly used:
Fine-tuning the entire model
Fine-tuning a model using Low-Rank Adaptation (LoRA)
Retrieval-Augmented Generation (RAG), where relevant documents are retrieved from an external knowledge store and added to the prompt at inference time
In a recent arXiv pre-print, researchers introduced a new technique called Cache-Augmented Generation (CAG) that could reduce the need for RAG (and, with it, all of RAG's drawbacks).
CAG works by pre-loading all relevant knowledge into the extended context of an LLM, rather than retrieving it from an external knowledge store at inference time, and then answering queries directly from this pre-loaded context.
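To make this concrete, here is a minimal sketch of the pre-load-then-reuse idea using Hugging Face transformers and greedy decoding. The model name, the placeholder document files, the prompt wording, and the answer() helper are illustrative assumptions for this sketch, not code from the paper.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # assumed long-context model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

# Step 1 (pre-load): run the entire knowledge corpus through the model once
# and keep the resulting key/value (KV) cache. "doc1.txt"/"doc2.txt" are
# placeholder files standing in for the knowledge store.
knowledge = "\n\n".join(open(path).read() for path in ["doc1.txt", "doc2.txt"])
prefix = f"Answer questions using only the documents below.\n\n{knowledge}\n\n"
prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    cached_kv = model(prefix_ids, use_cache=True).past_key_values

# Step 2 (inference): for each query, reuse the cached keys/values so the
# documents are never re-encoded and no retrieval step is needed.
def answer(question: str, max_new_tokens: int = 200) -> str:
    query_ids = tokenizer(
        f"Question: {question}\nAnswer:",
        return_tensors="pt",
        add_special_tokens=False,  # avoid inserting a second BOS mid-sequence
    ).input_ids.to(model.device)
    input_ids = torch.cat([prefix_ids, query_ids], dim=-1)
    output = model.generate(
        input_ids,
        past_key_values=copy.deepcopy(cached_kv),  # keep the original cache pristine
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    # Return only the newly generated tokens.
    return tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)

print(answer("What is Cache-Augmented Generation?"))
```

Copying the cache for each query is one simple way to keep the pre-loaded context from accumulating question and answer tokens across queries; in practice, truncating the cache back to the document prefix after each query achieves the same effect with less memory overhead.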
Surprisingly, when CAG is used with long-context LLMs, the results show that it either outperforms or complements RAG across multiple benchmarks.
Here is a story in which we di…