What Is Class Imbalance In Machine Learning & How To Fix It
Avoid the frustration of dealing with imbalanced real-world datasets and learn to fix them with ease
What Is Class Imbalance?
Real-world datasets are messy (unlike the curated toy datasets bundled with scikit-learn).
Class imbalance arises when the distribution of examples across different classes is not uniform.
In other words, some classes have a lot more samples than others.
For example, consider a dataset where the task is to detect a rare lung disease on chest X-rays (binary classification). Out of 10,000 patients, only 50 (0.5%) might have the disease, while 9,950 do not.
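To make those numbers concrete, here is a minimal sketch in plain Python. It assumes the 50 vs. 9,950 split from the example above; the "model" that always predicts "healthy" is a hypothetical baseline used only for illustration, not a real classifier.

```python
from collections import Counter

# Labels for the hypothetical cohort: 1 = disease, 0 = healthy.
labels = [1] * 50 + [0] * 9_950

counts = Counter(labels)
minority_share = counts[1] / len(labels)   # fraction of positive cases
imbalance_ratio = counts[0] / counts[1]    # majority samples per minority sample

# A baseline "model" that always predicts the majority class (healthy).
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / counts[1]        # share of diseased patients found

print(f"minority share:  {minority_share:.1%}")
print(f"imbalance ratio: {imbalance_ratio:.0f}:1")
print(f"accuracy:        {accuracy:.1%}")
print(f"recall:          {recall:.1%}")
```

The baseline is right for 9,950 of 10,000 patients, so its accuracy is 99.5%, yet it finds none of the 50 patients who actually have the disease. That is why accuracy alone can look excellent on an imbalanced dataset while the model is useless for the minority class.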
The same can happen in a regression problem, such as predicting house prices in a city: most houses are priced between $100,000 and $500,000, but a few luxury mansions are priced at over $10 million.
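The regression case can be sketched the same way. The prices below are made up for illustration (995 typical houses drawn uniformly from the $100,000 to $500,000 range, plus 5 mansions fixed at a hypothetical $12 million each); they are not real market data.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: 995 typical houses and 5 luxury mansions.
typical = [rng.uniform(100_000, 500_000) for _ in range(995)]
mansions = [12_000_000.0] * 5
prices = typical + mansions

rare_share = len(mansions) / len(prices)   # fraction of extreme targets
mean_price = statistics.mean(prices)       # pulled upward by the mansions
median_price = statistics.median(prices)   # barely affected by them

print(f"share of mansions: {rare_share:.1%}")
print(f"mean price:   ${mean_price:,.0f}")
print(f"median price: ${median_price:,.0f}")
```

Only 0.5% of the samples sit in the luxury range, so a model sees very few examples of that part of the target distribution, while those few values are still large enough to pull the mean well above the median.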
Training on such datasets tends to bias the model toward the majority class: it learns to predict the common cases well but often fails to detect the minority class.