Tiny Recursive Model (TRM): A Deep Dive
Deep dive into the architecture and inner workings of the 7M parameter Tiny Recursive Model (TRM) that beats the most advanced reasoning LLMs on complex problems.
A new class of AI models is emerging. It’s called the Recursive Reasoning Model.
The architecture of the earliest recursive reasoning model, called the Hierarchical Reasoning Model (HRM), was published by Sapient Intelligence in early 2025.
The biologically inspired HRM consists of two interdependent neural networks operating at different frequencies (one updating faster than the other).
With just 27 million parameters, it outperforms powerful LLMs on complex tasks such as solving challenging Sudoku puzzles, finding optimal paths in large mazes, and ARC-AGI, despite being trained on only 1,000 examples.
While these results are already impressive, new research from a researcher at Samsung SAIT AI Lab improves on HRMs to reach even better performance.
The newly proposed Tiny Recursive Model (TRM) uses a single small network with only two layers.
With only 7 million parameters, the TRM achieves a test accuracy of 45% on ARC-AGI-1 and 8% on ARC-AGI-2. (If you’re new to them, the ARC-AGI benchmarks consist of abstract reasoning puzzles that are easy for humans but hard for AI, designed to act as a useful signal of progress toward AGI.)
These results are better than those of most advanced reasoning LLMs available today (including DeepSeek-R1, o3-mini, and Gemini 2.5 Pro), and they are achieved with less than 0.01% of those models’ parameters.
In this deep dive, we explore the architecture and inner workings of the Tiny Recursive Model (TRM) and the reasons why it beats most advanced reasoning LLMs available to us today.
Let’s begin!
Before we start, I want to introduce you to my new book called ‘LLMs In 100 Images’.
It is a collection of 100 easy-to-follow visuals that describe the most important concepts you need to master LLMs today.
Grab your copy today at a special early bird discount using this link.
But First, How Do LLMs Reason?
Reasoning in LLMs is a popular area of AI research.
To ensure that LLMs can reliably answer complex queries, they use a technique called Chain-of-Thought (CoT) reasoning. CoT imitates human reasoning by having the LLM produce step-by-step reasoning traces before giving its final answer.
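As a purely illustrative toy example (the prompt wording and numbers below are made up for this article, not taken from any particular model), CoT prompting nudges the model to spell out intermediate steps before committing to a final answer:

```python
# Illustrative only: a direct prompt vs. a Chain-of-Thought-style prompt.
direct_prompt = (
    "Q: A train travels 120 km in 2 hours. "
    "How far does it travel in 5 hours?\nA:"
)

# The CoT variant asks the model to produce a reasoning trace first.
cot_prompt = (
    "Q: A train travels 120 km in 2 hours. "
    "How far does it travel in 5 hours?\nA: Let's think step by step."
)

# A typical CoT-style completion spells out intermediate steps before the answer.
cot_completion = (
    "The train's speed is 120 km / 2 h = 60 km/h. "
    "In 5 hours it travels 60 km/h * 5 h = 300 km. "
    "Final answer: 300 km."
)
print(cot_completion)
```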
Using CoT involves generating more tokens during inference (known as inference-time scaling), which in turn means using more inference compute. Producing high-quality CoT traces also requires training the LLM to do so, using high-quality training datasets and expensive RL techniques.
Despite their extensive use, CoT-based LLMs still fail on benchmarks like ARC-AGI. As an example, while humans can fully solve the tasks in ARC-AGI-2, OpenAI’s o3 (high) achieves merely 6.5% accuracy on it.
A ray of hope emerged with the introduction of the Hierarchical Reasoning Model (HRM) in early 2025, which, with only 27 million parameters and using only 1000 training samples, achieved exceptional performance on complex reasoning tasks, including ARC-AGI.

To understand Tiny Recursive Models, you’ll have to understand HRMs well. Let’s learn about them in depth before proceeding further.
What Is The Hierarchical Reasoning Model (HRM)?
The HRM architecture is inspired by the human brain and consists of four components:
Input network (f(I)), which converts a given input into embeddings and passes them to the low-level module
A faster, low-level module (f(L)) for detailed computations (the “Worker” module)
A slower, high-level module (f(H)) for abstract, deliberate reasoning (the “Controller” module)
Output head (f(O)), which takes the output of the high-level module and produces the final output
Both the low-level and high-level modules follow a 4-layer Transformer architecture (see the code sketch after this list), with:
No bias in linear layers (following the PaLM architecture)
Rotary embeddings, and
SwiGLU activation function
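To make the structure concrete, here is a minimal, hypothetical PyTorch sketch of the four components and one plausible way they could fit together. It is not the authors’ code: the module sizes, the way the latents are combined (simple summation), and the use of a stock TransformerEncoder (instead of bias-free linear layers, rotary embeddings, and SwiGLU) are simplifying assumptions made for brevity.

```python
import torch
import torch.nn as nn

def make_module(d_model: int, n_layers: int = 4) -> nn.TransformerEncoder:
    # Stand-in for the 4-layer Transformer blocks used by f(L) and f(H).
    layer = nn.TransformerEncoderLayer(
        d_model=d_model,
        nhead=4,
        dim_feedforward=4 * d_model,
        batch_first=True,
        norm_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=n_layers)

class HRMSketch(nn.Module):
    def __init__(self, vocab_size: int = 128, d_model: int = 64,
                 n_cycles: int = 2, t_low_steps: int = 4):
        super().__init__()
        self.f_input = nn.Embedding(vocab_size, d_model)  # f(I): tokens -> embeddings
        self.f_low = make_module(d_model)                  # f(L): fast "Worker" module
        self.f_high = make_module(d_model)                 # f(H): slow "Controller" module
        self.f_out = nn.Linear(d_model, vocab_size)        # f(O): output head
        self.n_cycles = n_cycles        # number of high-level (slow) updates
        self.t_low_steps = t_low_steps  # low-level (fast) steps per high-level update

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.f_input(tokens)          # (batch, seq, d_model)
        z_low = torch.zeros_like(x)       # initial low-level latent z(L)
        z_high = torch.zeros_like(x)      # initial high-level latent z(H)
        for _ in range(self.n_cycles):
            # The low-level module updates z(L) at a higher frequency...
            for _ in range(self.t_low_steps):
                z_low = self.f_low(z_low + z_high + x)
            # ...while the high-level module updates z(H) at a lower frequency.
            z_high = self.f_high(z_high + z_low)
        return self.f_out(z_high)         # final answer produced from z(H)

# Usage: predict one output token per input position for a toy sequence.
model = HRMSketch()
logits = model(torch.randint(0, 128, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 128])
```

The nested loops in `forward` already hint at the “frequency” difference described next: the low-level module takes several steps for every single high-level update.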

Understanding A Forward Pass In An HRM
Given an input x̃, the high-level and low-level modules start with their initial latent vectors (z(H) and z(L), respectively) and recursively update them.
The low-level module updates z(L) at a higher frequency, and the high-level module updates z(H) at a lower frequency.
After the recursion, the latent vector of the high-level module, z(H), is passed through the output head to produce the final answer ŷ.
Mathematically, a forward pass of HRM looks as follows:
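In rough form, writing T for the number of low-level steps per high-level update and N for the number of high-level updates, the recursion can be sketched as follows (the paper’s exact formulation may differ slightly in indexing):

```latex
\begin{aligned}
z_L^{(i)} &= f_L\!\left(z_L^{(i-1)},\, z_H^{(i-1)},\, \tilde{x}\right),
  \qquad i = 1, \dots, N \cdot T \\[4pt]
z_H^{(i)} &=
  \begin{cases}
    f_H\!\left(z_H^{(i-1)},\, z_L^{(i)}\right) & \text{if } i \text{ is a multiple of } T \\
    z_H^{(i-1)} & \text{otherwise}
  \end{cases} \\[4pt]
\hat{y} &= f_O\!\left(z_H^{(N T)}\right)
\end{aligned}
```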
Let’s understand this step by step.