An LLM With A Visual Sketchpad Can Now Smash Its Competitors Without One (Even GPT-4o)
A deep dive into the “Sketchpad” framework that enables LLMs to draw and reason via the “Visual Chain-of-Thought Prompting” approach
Humans have used sketching for ages to formulate ideas, communicate them, and solve problems.
Think about the cave paintings whose meaning we can still make out today.
Or the first images you created as a child, haphazardly drawing with multiple crayons on a blank canvas when you did not yet know how to speak.
Sketching somehow preserves and propagates knowledge in ways that text alone cannot.
This insight stuck with the authors of a recent preprint on arXiv.
They introduced a framework called Sketchpad, which gives multi-modal LLMs a visual sketchpad and the tools to draw on it.
When prompted, these LLMs can draw intermediate sketches to boost their reasoning ability.
And yes, it works wonders!
Sketchpad significantly enhances task performance compared to other LLMs that do not utilize sketching, …