'FANformer' Is The New Game-Changing Architecture For LLMs
A deep dive into how the FANformer architecture works and what makes it so powerful compared to Transformers
LLMs have always surprised us with their capabilities, with many speculating that scaling them would lead to AGI.
But those expectations have met with disappointment in the last few days: GPT-4.5, OpenAI's largest and best model for chat, performs worse than many smaller models on multiple benchmarks.
While DeepSeek-V3 scores 39.2% Pass@1 accuracy on AIME 2024 and 42% accuracy on SWE-bench Verified, GPT-4.5 scores 36.7% and 38% on these benchmarks, respectively.