A deep dive into how a novel technique called Cache-Augmented Generation (CAG) works and reduces/ eliminates the need for Retrieval-augmented generation (RAG).
but using CAG also means hosting LLM locally, correct?🤔
Not always. We could add the KV cache to any LLM and query it, but hosting it locally would help with the other operations apart from this.
Let’s see how big companies decide to provide it into their services
but using CAG also means hosting LLM locally, correct?🤔
Not always. We could add the KV cache to any LLM and query it, but hosting it locally would help with the other operations apart from this.
Let’s see how big companies decide to provide it into their services