AI Just Made Data Compression Algorithms Multiple Times Better Than Ever
A deep dive into 'LMCompress', a new LLM-based algorithm that outperforms today's best traditional compression algorithms, supercharging data compression like never before.
A massive 403 million terabytes of data is generated on the internet every day.
Given this rate, the global data pool is estimated to grow to 163 zettabytes (one zettabyte is a billion terabytes) by 2025.

While the rate of data creation keeps growing exponentially, the data compression methods available to us are hitting their limits.
Can AI help us solve it?
The answer is yes!
A recent research paper published in Nature Machine Intelligence has proposed a new compression algorithm called LMCompress, which uses LLMs to compress data.
This algorithm is so effective that it doubles the lossless compression ratios of JPEG-XL for images, FLAC for audio, and H.264 for videos, and quadruples the compression ratio of bz2 for text.
This isn’t an episode of Silicon Valley; it is actually happening!
In this story, we take a deep dive into how this algorithm works to supercharge data compression like never before.
Let’s begin!
But First, How Do We Currently Compress Data?
There are two types of data compression methods:
Lossless: These methods compress data and later decompress/reconstruct it perfectly, without losing any information
Lossy: These methods discard some data during compression to reach a higher compression ratio
For those new to this term, the data compression ratio measures the relative reduction in data size achieved by a compression method/algorithm, typically computed as the uncompressed size divided by the compressed size.
A higher data compression ratio means better compression.
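To make both ideas concrete, here is a minimal sketch using Python's built-in bz2 module (one of the traditional text compressors mentioned above). It round-trips a toy string losslessly and computes its compression ratio; the sample text is purely illustrative.

```python
import bz2

# Illustrative placeholder text; any byte string would work here.
original = ("Data compression removes statistical redundancy. " * 200).encode("utf-8")

# Lossless compression: the round trip reconstructs the input exactly.
compressed = bz2.compress(original)
restored = bz2.decompress(compressed)
assert restored == original  # no information was lost

# Compression ratio = uncompressed size / compressed size (higher is better).
ratio = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes, ratio ~ {ratio:.1f}x")
```

Because the repeated sentence is extremely redundant, bz2 reaches a very high ratio on this toy input; real-world text compresses far less dramatically, which is exactly where LMCompress's gains matter.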