We went from models not being able to do even the most basic math to “look, they can’t even reliably pass the most difficult mathematics benchmark on earth!” in 18 months, and you’re saying “stop the hype!”?? They’re putting up new SOTA results every 3 weeks with no sign of slowing down, and you think that’s a reason to *temper* expectations??
Where do you think these models will be one year from now? Why should we overfit to their capabilities at the time of writing this article instead of gawking at a nearly vertical trend line in capabilities?
Conclusion: Stop the Hype TrAIn LOL, thanks for sharing and testing!
Where can I find the solutions?
They can be found here:
https://epoch.ai/frontiermath/benchmark-problems