September Recap On "Into AI"

Here are the three interesting AI topics that we discussed this month.

Oct 04, 2025

Image generated using Gemini 2.5 Flash Image

Hey there! 👋🏻

Thanks for being a curious subscriber to this publication. I’m glad you’re one of the individuals who love staying at the forefront of AI research and development.

September was a month of some exciting discussions on Into AI. Here are three topics that we discussed in depth this month.

Before we move forward, I wanted to share that my latest book, called “LLMs In 100 Images”, has become my best-read book of all time!

It is a collection of 100 easy-to-follow visuals that describe the most important concepts you need to master LLMs today.

Grab your copy today at a special early bird discount using this link.

1. Proximal Supervised Fine-tuning

Proximal SFT is a new algorithm that combines traditional supervised fine-tuning (SFT) of LLMs with reinforcement learning methods, specifically PPO.

Although post-training with SFT is straightforward and more efficient compared to RL, LLMs trained using it tend to have poor generalization capabilities (i.e., they memorize the training dataset).

Supervised fine-tuned models also show diminished exploration capabilities when further post-trained with RL.

To address these issues in SFT, researchers introduced Proximal Supervised Fine-Tuning (PSFT), which supercharges SFT by stabilizing the optimization it brings to the model parameters.

This leads to better generalization capabilities in an LLM while leaving room for further exploration and improvement in subsequent RL post-training stages.

Here is the post where we discussed all the details about this algorithm.

Proximal SFT: SFT Supercharged By RL Is Here

Dr. Ashish Bamania

Sep 5

Read full story

2. Learn To Train A Reasoning Model From Scratch Using GRPO

GRPO (Group Relative Policy Optimization) is a reinforcement learning method popularly used for training Reasoning LLMs today.

GRPO was first introduced in the DeepSeekMath paper.

The main difference between GRPO and PPO is that GRPO does not use a Value model to estimate Advantage.

Instead, it calculates the Advantage using the relative scoring between a group of outputs from the model for a given prompt (hence, the ‘relative’ term in GRPO).

In this post, we learned to train a non-reasoning Qwen model to reason using GRPO.

Learn To Train A Reasoning Model From Scratch Using GRPO

Dr. Ashish Bamania

Sep 19

Read full story

3. 11 System Design Concepts Explained, Simply

Systems design is an important part of all software engineering tasks you will work on in your career.

While junior developers may get by with writing functional code, senior engineers are valued for their ability to keep the bigger picture in mind.

In this post, I collaborated with the brilliant

Neo Kim

on his newsletter

The System Design Newsletter

and discussed the 11 key concepts that you need to understand systems design and become a better engineer.

The System Design Newsletter

11 System Design Concepts Explained, Simply

Get my system design playbook for FREE on newsletter signup…

2 months ago · 187 likes · 9 comments · Neo Kim and Dr. Ashish Bamania

There’s also an update regarding the publication's price, which has been simplified to just $50/ year.

Consider becoming a paid subscriber to this publication and never miss an update!

I plan to write many more interesting research-based articles and hands-on tutorials this month.

Thanks again for being a curious reader of ‘Into AI’. It’s great to have you here! 👋🏻

Into AI