Cache-Augmented Generation (CAG) Is Here To Replace RAG
A deep dive into how a novel technique called Cache-Augmented Generation (CAG) works and how it reduces or eliminates the need for Retrieval-Augmented Generation (RAG).
LLMs respond to user queries based on their training data.
But this data could get outdated pretty soon.
To make sure that LLMs can answer a query using up-to-date information, the following techniques are popularly used:
Fine-tuning the entire model
Fine-tuning a model using Low-Rank Adaptation (LoRA)
Retrieval-Augmented Generation (RAG)
In a recent ArXiv pre-print, researchers just introduced a new technique called Cache-Augmented Generation (CAG) that could reduce the need for RAG (and, therefore, all its drawbacks).
CAG works by preloading all relevant knowledge into the extended context of an LLM ahead of time, instead of retrieving it from a knowledge store at inference time, and using this preloaded context to answer queries.
Surprisingly, when paired with long-context LLMs, this technique either outperforms or complements RAG across multiple benchmarks.
Here is a story in which we dive deep to understand how Cache-Augmented Generation (CAG) works and how it performs compared to RAG.
Let’s begin!
But First, What Is RAG?
Various techniques exist to ensure that LLMs can respond to user queries using up-to-date information.
This includes techniques like LLM fine-tuning and Low-Rank Adaptation (LoRA) that rely on embedding knowledge directly into model parameters.
Unfortunately, these approaches are time-consuming and expensive to implement, and must be repeated frequently as knowledge changes, which makes them far from ideal.
RAG, or Retrieval-Augmented Generation, was developed to solve this issue.
It is a knowledge integration and information retrieval technique that allows an LLM to produce more accurate and up-to-date responses using private datasets specific to the use case.
The terms in RAG mean the following:
Retrieval: the process of retrieving relevant information/documents from a knowledge base or use-case-specific private dataset.
Augmentation: the process where the retrieved information is added to the input context.
Generation: the process of the LLM generating a response based on the original query and the augmented context (a minimal code sketch of these three stages follows this list).
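To make these three stages concrete, here is a minimal sketch of the retrieve-augment-generate loop. The helpers retrieve_top_k and llm_generate are hypothetical stand-ins for a real retriever (BM25, dense embeddings, etc.) and a real LLM client, not any specific library's API.

```python
# Minimal sketch of the retrieve -> augment -> generate loop.
# retrieve_top_k and llm_generate are hypothetical stand-ins for a real
# retriever and a real LLM client.

def retrieve_top_k(query: str, knowledge_base: list[str], k: int = 3) -> list[str]:
    # Retrieval: score every document against the query and keep the top-k.
    # A naive word-overlap score stands in for BM25 or dense similarity here.
    query_terms = set(query.lower().split())
    return sorted(
        knowledge_base,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )[:k]

def llm_generate(prompt: str) -> str:
    # Placeholder for an actual LLM call (API or local model).
    raise NotImplementedError("plug in your LLM client here")

def answer_with_rag(query: str, knowledge_base: list[str]) -> str:
    docs = retrieve_top_k(query, knowledge_base)                          # Retrieval
    prompt = "Context:\n" + "\n\n".join(docs) + f"\n\nQuestion: {query}"  # Augmentation
    return llm_generate(prompt)                                           # Generation
```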
A typical RAG pipeline looks like the following.
But RAG is not a perfect technique.
Here are some of its drawbacks.
Retrieval latency: Fetching information from an external knowledge base during inference takes time.
Retrieval Errors: Inaccurate or incomplete responses can occur when irrelevant or incomplete documents are picked during retrieval.
Knowledge Fragmentation: Improper chunking and incorrect ranking can cause the retrieved documents to be disjointed and lack coherence.
Increased Complexity: Building a RAG pipeline requires additional, complex infrastructure and involves significant maintenance and update overhead.
Here Comes Cache-Augmented Generation (CAG)
Modern-day LLMs have enormous context lengths, with some popular LLMs and their context length as follows:
GPT-3.5: 4k tokens
Mixtral and DBRX: 32k tokens
Llama 3.1: 128k tokens
GPT-4-turbo and GPT-4o: 128k tokens
Claude 2: 200k tokens
Gemini 1.5 Pro: 2 million tokens
The researchers leverage these long context windows to achieve retrieval-free knowledge integration.
Their technique, called CAG or Cache-Augmented Generation, works in a three-step process as follows:
1. Preloading External Knowledge
All of the required relevant documents are first preprocessed and transformed into a precomputed key-value (KV) cache.
This KV cache is stored on disk or in memory for future use.
This saves on computational costs as the processing of documents occurs just once, regardless of the number of user queries.
This also allows the LLM to have a more holistic and coherent understanding of the documents, which results in improved response quality.
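As a rough illustration of the preloading step, here is a minimal sketch using Hugging Face Transformers. It assumes a causal LLM that exposes a reusable past_key_values cache; the model name and document paths are placeholders, not the exact setup from the paper.

```python
# Sketch of CAG knowledge preloading: run all documents through the model
# once and keep the resulting key-value (KV) cache for every later query.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder long-context model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

# Concatenate the relevant documents into a single knowledge prompt.
documents = [open(path).read() for path in ["doc1.txt", "doc2.txt"]]  # placeholder files
knowledge_prompt = "<documents>\n" + "\n\n".join(documents) + "\n</documents>\n"
knowledge_ids = tokenizer(knowledge_prompt, return_tensors="pt").input_ids.to(model.device)

# A single forward pass over the documents yields the precomputed KV cache.
with torch.no_grad():
    kv_cache = model(input_ids=knowledge_ids, use_cache=True).past_key_values

knowledge_len = knowledge_ids.shape[1]  # remembered so the cache can be reset later
```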
2. Inference
During inference, the precomputed KV cache is loaded alongside the user’s query, and the LLM uses both to generate responses.
This step eliminates retrieval latency and reduces retrieval errors, as the LLM understands the preloaded knowledge and query within its context.
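Continuing the sketch above, inference can then reuse that cache so the document tokens are never re-processed. Again, this is a hedged illustration rather than the paper's exact code; the query string is a placeholder.

```python
# Sketch of CAG inference, continuing the preloading snippet above:
# the query is appended to the cached knowledge, and the cached document
# tokens are reused rather than recomputed.
query = "Which article introduced CAG?"  # placeholder query
query_ids = tokenizer(
    f"\nQuestion: {query}\nAnswer:", return_tensors="pt", add_special_tokens=False
).input_ids.to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids=torch.cat([knowledge_ids, query_ids], dim=-1),  # full sequence for position bookkeeping
        past_key_values=kv_cache,  # preloaded KV cache, so documents are not re-encoded
        max_new_tokens=128,
    )

prompt_len = knowledge_ids.shape[1] + query_ids.shape[1]
answer = tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```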
3. Cache Reset
The KV cache grows sequentially during inference, with the new tokens appended to the previous ones.
To maintain system performance across extended or repeated inference sessions, the cache can be reset by simply truncating these newly appended tokens.
This enables fast reinitialization since the complete cache doesn't need to be reloaded from disk.
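Here is a hedged sketch of that reset, assuming the cache follows the transformers DynamicCache layout with per-layer key/value tensors of shape (batch, heads, seq_len, head_dim); knowledge_len comes from the preloading snippet above.

```python
# Sketch of the cache reset, assuming a transformers DynamicCache-style object
# with per-layer key/value tensors of shape (batch, heads, seq_len, head_dim).
def reset_kv_cache(kv_cache, knowledge_len: int) -> None:
    """Drop every token appended after the preloaded knowledge."""
    if hasattr(kv_cache, "crop"):  # recent transformers versions ship a crop() helper
        kv_cache.crop(knowledge_len)
    else:  # manual fallback: truncate each layer's key/value tensors in place
        for i in range(len(kv_cache.key_cache)):
            kv_cache.key_cache[i] = kv_cache.key_cache[i][:, :, :knowledge_len, :]
            kv_cache.value_cache[i] = kv_cache.value_cache[i][:, :, :knowledge_len, :]

# After each answered query:
# reset_kv_cache(kv_cache, knowledge_len)
```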
The following diagram shows the differences between the workflows of RAG (top) and CAG (bottom).
In RAG (top), the knowledge (K1, K2) is retrieved dynamically for each query (Q1, Q2) from a knowledge base. It is then combined with the query and used by the LLM to generate answers (A1, A2).
In CAG (bottom), all relevant knowledge is preloaded into a knowledge cache. Queries (Q1, Q2) are appended to this cache, and responses (A1, A2) are generated based on it.
But How Good Is CAG?
Two question-answering benchmarks are considered for evaluating the performance of CAG.
These are:
Stanford Question Answering Dataset (SQuAD) 1.0: Consists of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a text segment from the corresponding reading passage.
HotPotQA: Consists of 113,000 Wikipedia-based question-answer pairs that require multi-hop reasoning across multiple documents.
Three test sets are created from each dataset with different sizes of reference text, as increasing the length of the reference text makes retrieval more challenging.
The researchers use the Llama 3.1 8B Instruct model (with a context length of 128k tokens) to test RAG and CAG.
The goal for each method is to generate accurate and contextually relevant answers for these benchmark questions.
BERT-Score is used to evaluate the performance of these methods, measuring how similar their generated answers are to the ground-truth answers.
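For illustration, computing BERT-Score is straightforward with the open-source bert-score package; the candidate and reference strings below are placeholders.

```python
# Sketch of the evaluation step with the open-source bert-score package
# (pip install bert-score); candidates and references are placeholders.
from bert_score import score

candidates = ["CAG preloads all knowledge into the LLM's context."]  # generated answers
references = ["CAG preloads the documents into the model context."]  # ground-truth answers

precision, recall, f1 = score(candidates, references, lang="en")
print(f"BERT-Score F1: {f1.mean().item():.4f}")
```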
RAG is implemented with LlamaIndex using two retrieval strategies.
Given a query, these retrieve the top-k passages from the document collection and pass them to the LLM to generate answers.
Sparse Retrieval with BM25
BM25 is a sparse retrieval method that ranks and returns documents based on term frequency, inverse document frequency (as in TF-IDF), and document length normalization.
Dense Retrieval with OpenAI Indexes
Dense embeddings are created using OpenAI’s embedding models, which represent the query and documents in a shared semantic space; the passages most semantically aligned with the query are returned.
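For reference, here is a rough sketch of how such sparse and dense retrievers can be set up with LlamaIndex. Import paths and class names vary across LlamaIndex versions, and the document folder and query are placeholders, so treat this as an outline rather than the paper's exact configuration.

```python
# Rough sketch of the two RAG baselines with LlamaIndex. Import paths differ
# between LlamaIndex versions; the "data/" folder and query are placeholders.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.retrievers.bm25 import BM25Retriever

documents = SimpleDirectoryReader("data/").load_data()
nodes = SentenceSplitter().get_nodes_from_documents(documents)

# Sparse retrieval: BM25 over the chunked nodes.
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=3)

# Dense retrieval: OpenAI embeddings behind a vector index.
dense_index = VectorStoreIndex(nodes, embed_model=OpenAIEmbedding())
dense_retriever = dense_index.as_retriever(similarity_top_k=3)

query = "Which stadium hosted the final?"  # placeholder query
top_passages = [n.node.get_content() for n in dense_retriever.retrieve(query)]
# top_passages are then passed to the LLM along with the query to generate an answer.
```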
Coming back to our question: Is CAG really good enough?
Surprisingly, the results show that CAG outperforms both sparse (BM25) and dense (OpenAI Indexes) RAG systems, achieving the highest BERT-Score in most evaluations.
Moreover, CAG dramatically reduces generation time, particularly as the reference text length increases.
For the largest HotpotQA test dataset, CAG is about 40.5 times faster than RAG. This is such a big boost!
CAG sounds like a very promising approach for getting up-to-date, knowledge-grounded answers from LLMs (on its own or as a complement to RAG) as their context lengths grow further in the future.
Would you plan on replacing your RAG pipelines with it? Let me know in the comments below.