Into AI

An LLM With A Visual Sketchpad Can Now Smash Its Competitors Without One (Even GPT-4o)

A deep dive into the “Sketchpad” framework that enables LLMs to draw and reason via the “Visual Chain-of-Thought Prompting” approach

Dr. Ashish Bamania
Aug 16, 2024
Image generated with DALL-E 3

For ages, humans have used sketching to formulate ideas, communicate them, and solve problems.

Think of the cave paintings whose meaning we can still make out today.

Or the first images you created as a child, haphazardly drawing with multiple crayons on a blank canvas when you did not yet know how to speak.

Sketching somehow preserves and propagates knowledge in a way that text never can.

This insight stuck with the authors of a recent preprint on arXiv.

They introduced a framework called Sketchpad, which gives multi-modal LLMs a visual sketchpad and the tools to draw on it.

The framework allows these LLMs to draw intermediate sketches when prompted, which boosts their reasoning ability.
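To make the idea concrete, here is a minimal sketch of a "draw, look, then answer" loop. It is not the paper's implementation: `Sketchpad`, `stub_model`, and `run` are hypothetical names, the canvas is a plain character grid rather than an image, and the "model" is a hard-coded stand-in for a multi-modal LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Sketchpad:
    """Hypothetical canvas: a character grid standing in for a real image."""
    width: int
    height: int
    rows: list = field(init=False)

    def __post_init__(self):
        self.rows = [["."] * self.width for _ in range(self.height)]

    def draw_hline(self, y, x0, x1):
        """Drawing tool: mark the cells from x0 to x1 (inclusive) on row y."""
        for x in range(x0, x1 + 1):
            self.rows[y][x] = "#"

    def render(self):
        """Return the canvas as text so the model can 'look' at it."""
        return "\n".join("".join(row) for row in self.rows)


def stub_model(question, observation):
    """Assumed stand-in for a multi-modal LLM; returns the next action."""
    if observation is None:
        # First turn: choose to sketch before answering.
        return {"action": "draw_hline", "args": {"y": 1, "x0": 0, "x1": 3}}
    # Later turn: inspect the sketch and answer from it.
    return {"action": "answer", "text": f"{observation.count('#')} cells drawn"}


def run(question, max_steps=5):
    """Alternate between model decisions and sketchpad tool calls."""
    pad = Sketchpad(width=5, height=3)
    tools = {"draw_hline": pad.draw_hline}
    observation = None
    for _ in range(max_steps):
        step = stub_model(question, observation)
        if step["action"] == "answer":
            return step["text"], pad
        tools[step["action"]](**step["args"])
        observation = pad.render()  # feed the updated sketch back to the model
    return None, pad


answer, pad = run("How long is the line?")
print(pad.render())
print(answer)
```

The point of the loop is that the sketch becomes part of the model's context: each drawing step produces a new observation that the next reasoning step can use.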

And yes, it works wonders!

Sketchpad significantly enhances task performance compared to other LLMs that do not utilize sketching, …

© 2025 Dr. Ashish Bamania