Into AI

Hierarchical Reasoning Model: A Deep Dive

A deep dive into the Hierarchical Reasoning Model to understand its internals that help it outperform powerful reasoning models available to us today.

Dr. Ashish Bamania
Aug 10, 2025
Image generated with Google ImageFX

Reasoning well is one of the biggest challenges for AI models available today.

Most popular LLMs use Chain-of-Thought (CoT) prompting and inference-time scaling for reasoning, but they still aren’t good enough.

On top of this imperfect reasoning approach, these models have very high latency and are too expensive for everyday use.

Check out the performance of today’s most powerful LLMs on the ARC-AGI benchmarks, which contain tasks that are easy for humans to solve, yet hard or impossible for AI.

ARC-AGI benchmarks leaderboard (Source: Arc Prize)

But this is about to change.

A small Singapore-based AI lab called Sapient Intelligence, founded in 2024, has just open-sourced and published a new AI architecture called the Hierarchical Reasoning Model (HRM) that has shocked the AI community.

The HRM architecture is inspired by the human brain and uses two interdependent recurrent networks:

  • a slower, high-level module for abstract, deliberate reasoning (the “Controller” module)

  • a faster, low-level module for detailed computations (the “Worker” module)

These two networks work at different paces, mimicking how the brain processes information to solve problems.

With only 27 million parameters and just 1,000 training samples, an HRM achieves nearly perfect performance on complex Sudoku puzzles and optimal pathfinding challenges in large mazes.

In comparison, o3-mini-high, Claude 3.7 8K, and DeepSeek-R1 all score zero accuracy on these tasks.

Alongside this, an HRM outperforms all of these models on ARC-AGI-1 and ARC-AGI-2 benchmarks, directly from the inputs without any pre-training or CoT data.

HRM outperforms all other strong reasoning LLMs on complex Sudoku puzzles, optimal path finding in large mazes, and ARC-AGI benchmarks.

In this deep dive, we explore how the Hierarchical Reasoning Model works and understand the internals that help it outperform the powerful reasoning models available to us today.


But First, Why Can’t Our Current LLMs Reason Well?

Deep neural networks are the backbone of all the Artificial Intelligence popularly available to us today.

Deep neural networks operate on a fundamental principle: the deeper a network (the more layers it has), the better it performs.

The Transformer, the most successful architecture and the one that powers our LLMs, is itself a deep neural network that follows this principle.

However, there’s a problem: an LLM’s architecture is fixed, so its depth doesn’t grow with the complexity of the problem being solved.

This makes them incapable of solving problems that require polynomial time.

LLMs also aren’t Turing complete.

(Turing-complete systems can perform any computation that can be described by an algorithm, given enough time and memory.)

To work around these limitations, LLMs rely on Chain-of-Thought (CoT) prompting, a reasoning technique that breaks a complex task down into simpler intermediate steps before solving it.
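As a rough illustration (the example question and wording below are my own, not from the paper or any specific model), a CoT prompt simply asks the model to spell out intermediate steps before giving its final answer:

```python
# A minimal illustration of Chain-of-Thought prompting.
# The wording is a hypothetical example; any chat or completion API
# would simply be called with these strings.

direct_prompt = (
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A:"
)

cot_prompt = (
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A: Let's think step by step.\n"
    "1. 45 minutes is 0.75 hours.\n"
    "2. Speed = distance / time = 60 / 0.75 = 80 km/h.\n"
    "So the answer is 80 km/h.\n\n"
    "Q: A cyclist covers 36 km in 90 minutes. What is their speed in km/h?\n"
    "A: Let's think step by step.\n"
)
```

The model is nudged to emit the same kind of intermediate steps for the new question, and those extra tokens are exactly what makes CoT slow and expensive at inference time.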

However, CoT prompting involves reasoning using human language. This is different from how humans do it. For humans, language is primarily a tool for communication rather than reasoning or thought.

This also means that a single misstep can derail the reasoning completely.

Furthermore, training reasoning LLMs requires a massive amount of long CoT data, which makes this process expensive, raising a concern about whether we will run out of data to train future LLMs on.

Alongside this, generating numerous tokens for complex reasoning tasks results in slow response times and increased use of computational resources at inference/test time.


What Can We Learn About Reasoning From The Human Brain?

While LLMs use explicit natural language for reasoning, humans reason in a latent space without constant translation back and forth to language.

Following this insight, researchers from Meta published a technique called Chain of Continuous Thought (Coconut) in 2024, which outperformed CoT on many logical reasoning tasks while using fewer reasoning tokens during inference.
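To make the latent-reasoning idea concrete, here is a minimal sketch of the core loop. This is a simplification under my own assumptions: a tiny GRU cell stands in for the LLM backbone, and Coconut’s staged training curriculum is omitted. The point is simply that the hidden state is fed straight back in as the next input, with no detour through language:

```python
import torch
import torch.nn as nn

hidden_size = 64
backbone = nn.GRUCell(hidden_size, hidden_size)  # stand-in for the LLM backbone
to_vocab = nn.Linear(hidden_size, 1000)          # output head over a toy vocabulary

h = torch.zeros(1, hidden_size)        # state after reading the question
thought = torch.zeros(1, hidden_size)  # first "continuous thought"

# Latent reasoning: feed the hidden state back as the next input embedding
# instead of decoding it into a word at every step.
for _ in range(4):                     # four latent reasoning steps
    h = backbone(thought, h)
    thought = h                        # the thought stays in latent space

logits = to_vocab(h)                   # decode only the final answer
```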

Since then, many such techniques have been introduced, but they all suffer from a limitation.

The LLMs trained for latent reasoning aren’t deep enough, because stacking more and more layers leads to vanishing gradients, which prevents the model from learning.

These models also rely on Backpropagation Through Time (BPTT), which research suggests is not how the human brain learns.

The next natural question from here is: So, how does the human brain really learn and reason?

We do not have a complete answer to this question, but we know that the brain is structured in layers or different levels, and these levels process information at different speeds.

The low-level regions react quickly to sensory inputs such as vision and control rapid movements, while the high-level regions integrate information over longer timescales and handle slow computations, like abstract planning.

The slow, higher-level areas guide the fast, lower-level circuits that then execute a task. This is reflected in the brain’s different rhythms (slow theta waves and fast gamma waves).

Both areas also use feedback loops that help refine thoughts, change decisions, and learn from experience.

This hierarchical model in the brain gives it sufficient “computational depth” for solving challenging tasks.

Could we borrow these concepts and create an AI architecture that can replicate what we know about how the human brain works?


Here Comes ‘Hierarchical Reasoning Model’

Inspired by the human brain, the Hierarchical Reasoning Model (HRM) architecture consists of four components:

  • Input network (f(I))

  • Low-level recurrent module (L-module represented by f(L) or the “Worker” module)

  • High-level recurrent module (H-module represented by f(H) or the “Controller” module)

  • Output network (f(O))

An HRM performs reasoning over N high-level cycles, each containing T low-level timesteps. This makes the total timesteps per forward pass N × T.
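Putting these four components together, here is a minimal, simplified sketch of one forward pass. This is my own illustrative code, not the official implementation: toy GRU cells stand in for the Transformer-based modules the paper uses, and training details are not shown.

```python
import torch
import torch.nn as nn

class TinyHRM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, N=2, T=4):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.f_I = nn.Linear(input_dim, hidden_dim)        # input network
        self.f_L = nn.GRUCell(2 * hidden_dim, hidden_dim)  # low-level "Worker" module
        self.f_H = nn.GRUCell(hidden_dim, hidden_dim)      # high-level "Controller" module
        self.f_O = nn.Linear(hidden_dim, output_dim)       # output network
        self.N, self.T = N, T                              # N cycles x T timesteps

    def forward(self, x):
        x_tilde = self.f_I(x)                              # encode the input once
        z_L = torch.zeros(x.size(0), self.hidden_dim)      # low-level (Worker) state
        z_H = torch.zeros(x.size(0), self.hidden_dim)      # high-level (Controller) state
        for _ in range(self.N):        # slow, high-level cycles
            for _ in range(self.T):    # fast, low-level timesteps
                # The Worker updates at every timestep, conditioned on the
                # current Controller state and the encoded input.
                z_L = self.f_L(torch.cat([z_H, x_tilde], dim=-1), z_L)
            # The Controller updates only once per cycle, reading the Worker's
            # final state and steering the next cycle.
            z_H = self.f_H(z_L, z_H)
        return self.f_O(z_H)                               # decode the answer

# Hypothetical usage: a batch of 16 flattened 9x9 Sudoku grids (81 cells),
# predicting 9 candidate digits per cell.
model = TinyHRM(input_dim=81, hidden_dim=128, output_dim=81 * 9)
out = model(torch.randn(16, 81))  # shape: (16, 729)
```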
