Revisiting The Basics: Rotary Position Embeddings (RoPE)

A lesson on Positional Embeddings from the ground up.

Dr. Ashish Bamania
Feb 08, 2025

Transformers process tokens in parallel rather than sequentially.

This is what gives them their computational advantage over RNNs, which must process tokens one at a time.


However, this also makes Transformers position-agnostic, meaning they do not have a sense of the order of the tokens they process.

Consider these two sentences:

  1. “The cat sits on the mat.”

  2. “The mat sits on the cat.”

To a Transformer, both of them are the same.

This isn’t good for language processing, where word order changes meaning.
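To see why, here is a minimal NumPy sketch (my own illustration, not code from this post) of single-head self-attention with identity query/key/value projections. Shuffling the input tokens simply shuffles the outputs, so without positional information the model treats every ordering of a sentence identically.

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                        # (seq_len, seq_len) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ X                                   # weighted sum of token embeddings

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))                         # 6 token embeddings, dimension 8
perm = rng.permutation(6)                                # a reshuffled "sentence"

out = self_attention(tokens)
out_permuted = self_attention(tokens[perm])

# Each token's output is identical; only its position in the output changes.
print(np.allclose(out[perm], out_permuted))              # True
```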

Therefore, positional information in the form of positional embeddings (vectors) is added to the token embeddings before Transformers process them.

The Transformer architecture, where the addition of Positional Encoding to the Input Embedding can be seen in the lower left and right corners (image from the research paper titled ‘Attention Is All You Need’)
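For reference, here is a minimal sketch (my own illustration, not from the post) of the sinusoidal positional encoding used in ‘Attention Is All You Need’, added element-wise to the token embeddings before they enter the Transformer.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                  # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)    # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                              # even dimensions
    pe[:, 1::2] = np.cos(angles)                              # odd dimensions
    return pe

seq_len, d_model = 6, 8
token_embeddings = np.random.default_rng(0).normal(size=(seq_len, d_model))

# Positional information is simply added to the token embeddings.
model_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
print(model_input.shape)                                      # (6, 8)
```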
