Is OpenAI’s o1 The AI Doctor We’ve Always Been Waiting For? (Surprisingly, Yes!)

A deep dive into o1’s performance in Medicine, its strengths and weaknesses, and how it could be enhanced further into an early, promising candidate for an AI doctor.

Dr. Ashish Bamania
Oct 24, 2024

Image generated with DALL-E 3

OpenAI’s o1 is out, and its performance on STEM tasks is mind-bending!

Quoted from OpenAI’s research article titled ‘Learning to Reason with LLMs’:

OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).

The model has been trained with reinforcement learning and uses a long internal chain-of-thought approach to think through a problem before generating an output.

Its performance scales incredibly with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute).

OpenAI o1’s performance (American Invitational Mathematics Examination (AIME) accuracy) improves with both train and test-time compute (Image from the article titled ‘Learning to Reason with LLMs’ by OpenAI)

Whether it’s mathematics, competitive programming, or PhD-level questions in Physics, Chemistry, and Biology, it answers them all with a high degree of correctness.

Performance of o1 compared with o1-preview and GPT-4o on different STEM benchmarks, where solid bars show Pass@1 accuracy and the shaded regions show the performance of the majority-vote (consensus) approach. (Image from the article titled ‘Learning to Reason with LLMs’ by OpenAI)
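To make the difference between these two numbers concrete, here is a toy Python sketch (not OpenAI’s evaluation code) of how Pass@1 and majority-vote accuracy are typically estimated from repeated samples of a model’s answer to the same question. The `sample_answer` function, the per-sample accuracy `p`, and the pool of wrong answers are all made up for illustration.

```python
import random
from collections import Counter

# Toy illustration: estimating Pass@1 and majority-vote ("consensus")
# accuracy from repeated samples of a model's answer to one question.
# The "model" here is a hypothetical stand-in that is right with
# probability p and otherwise picks a plausible wrong answer.

random.seed(0)

def sample_answer(correct_answer: str, p_correct: float) -> str:
    """Hypothetical model call: correct with probability p_correct."""
    if random.random() < p_correct:
        return correct_answer
    return random.choice(["41", "43", "47"])  # made-up wrong answers

correct = "42"
k = 64          # number of samples per question (more test-time compute)
p = 0.55        # assumed per-sample chance the model is right

samples = [sample_answer(correct, p) for _ in range(k)]

# Pass@1: the accuracy of a single sample, estimated here as the
# fraction of individual samples that are correct.
pass_at_1 = sum(s == correct for s in samples) / k

# Majority vote / consensus: take the most common answer across samples.
consensus = Counter(samples).most_common(1)[0][0]

print(f"Pass@1 estimate: {pass_at_1:.2f}")
print(f"Majority-vote answer: {consensus} (correct: {consensus == correct})")
```

When each individual sample is right more often than not, the most common answer across many samples is usually the correct one even though a large share of individual samples are wrong, which helps explain why the shaded consensus regions sit above the solid Pass@1 bars as more test-time compute is spent.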

And its performance is substantially higher than that of the previous state-of-the-art, GPT-4o.

Performance improvements of o1 over GPT-4o across different benchmarks (Image from the article titled ‘Learning to Reason with LLMs’ by OpenAI)

But what about Medicine?

Researchers behind a new preprint on arXiv set out to answer precisely this.
