Into AI

What Is Class Imbalance In Machine Learning & How To Fix It

Avoid the frustration of dealing with imbalanced real-world datasets and learn to fix them with ease

Dr. Ashish Bamania
Sep 10, 2023
Credits: Midjourney

What Is Class Imbalance?

Real-world datasets are messy (unlike the clean datasets bundled with scikit-learn).

Class imbalance arises when the distribution of examples across different classes is not uniform. 

In other words, some classes have a lot more samples than others. 

For example, consider a binary classification dataset where the task is to detect a rare lung disease on chest X-rays. Out of 10,000 patients, only 50 (0.5%) might have the disease, while 9,950 do not.
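To make those numbers concrete, here is a minimal sketch that builds a label list with the same counts as the chest X-ray example and measures how lopsided it is. The label encoding (1 for disease, 0 for no disease) and the variable names are assumptions made for illustration.

```python
from collections import Counter

# Labels mirroring the chest X-ray example (assumed encoding):
# 1 = has the disease, 0 = does not.
labels = [1] * 50 + [0] * 9_950

counts = Counter(labels)
total = sum(counts.values())

# Fraction of the dataset that belongs to each class.
shares = {cls: n / total for cls, n in counts.items()}

# Imbalance ratio: size of the largest class divided by the smallest.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(f"Class counts: {dict(counts)}")
print(f"Minority share: {shares[1]:.1%}")
print(f"Imbalance ratio: {imbalance_ratio:.0f} to 1")
```

With these counts, the minority class makes up 0.5% of the data, and there are 199 healthy patients for every patient with the disease.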

A similar imbalance can appear in a regression problem, such as predicting house prices in a city. Most houses are priced between $100,000 and $500,000, but a few luxury mansions are priced at over $10 million, so the high end of the target range is represented by very few examples.
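The house price case can be sketched the same way. The prices below are synthetic and made up purely for illustration: 995 typical houses spread evenly across the $100,000 to $500,000 range, plus 5 mansions at a hypothetical $12 million each. Comparing the mean with the median is one simple way to see how a handful of extreme targets distorts the distribution.

```python
from statistics import mean, median

# Synthetic prices (assumption, not real data): 995 typical houses spread
# evenly between $100,000 and $500,000.
typical = [100_000 + i * 400_000 / 994 for i in range(995)]

# A few luxury mansions at a hypothetical $12 million each.
mansions = [12_000_000] * 5

prices = typical + mansions

# Fraction of examples that sit in the extreme high-price range.
share_luxury = len(mansions) / len(prices)

print(f"Luxury share: {share_luxury:.1%}")
print(f"Median price: ${median(prices):,.0f}")
print(f"Mean price:   ${mean(prices):,.0f}")
```

The median stays inside the typical range, while the mean is pulled upward by only five examples, which is the regression counterpart of a rare minority class.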

Trained on such data, a model tends to become good at detecting the majority class while failing to detect the minority class.
