Why the Median is a Better Choice Than the Mean for Outlier Detection in Z-Scores

Pranit Pawar
5 min read · Feb 11, 2025

Outlier detection is an essential step in data analysis, particularly when we need to ensure the accuracy and reliability of our models. One popular method is the z-score, which measures how many standard deviations a data point lies from the average. The traditional z-score uses the mean and standard deviation for this. But here’s the problem: the mean (and with it the standard deviation) is sensitive to outliers, so the very points you are trying to detect can skew the calculation and make true outliers harder to identify.

In this blog post, we will explore how replacing the mean with the median, a more robust measure of central tendency, can improve the effectiveness of outlier detection.

The Problem with the Mean in Outlier Detection

Before diving into the benefits of using the median, let’s first understand the limitations of using the mean when applying the z-score for outlier detection:

1. Mean is Sensitive to Outliers

The mean is calculated by summing all data points and dividing the total by the number of points. Because every value contributes to that sum, a single extreme value can pull the mean toward itself. The outliers therefore distort the very baseline the z-score measures against, which makes it harder to identify the data points that are truly unusual.

2. Distorted Analysis

For example, imagine you have a dataset of student exam scores where most students score between 50 and 80, but one student scores 10. If you calculate the mean, it will be dragged down by that single low score, making it less representative of the majority. The same score also inflates the standard deviation, which shrinks every z-score. As a result, you may wrongly classify ordinary data points as outliers or miss the actual outliers.
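To make the exam example concrete, here is a minimal sketch using Python's standard `statistics` module. The scores are made up for illustration (they are not from a real dataset); the point is only to compare how far the mean and the median move when the single score of 10 is added.

```python
from statistics import mean, median

# Hypothetical exam scores: most fall between 50 and 80.
typical_scores = [52, 58, 61, 65, 68, 70, 73, 77, 80]

# The same class with one student who scored 10.
scores_with_outlier = typical_scores + [10]

# How far does each measure of the "average" move because of one score?
mean_shift = mean(typical_scores) - mean(scores_with_outlier)
median_shift = median(typical_scores) - median(scores_with_outlier)

print(f"mean:   {mean(typical_scores):.1f} -> {mean(scores_with_outlier):.1f}")
print(f"median: {median(typical_scores):.1f} -> {median(scores_with_outlier):.1f}")
print(f"mean moved by {mean_shift:.1f}, median moved by {median_shift:.1f}")
```

With these numbers the mean drops by several points while the median moves by 1.5, so the median stays closer to what most students actually scored.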

Enter the Median: A Robust Alternative

The median is a far more robust measure of central tendency than the mean. It is simply the middle value of a dataset when the values are ordered from lowest to highest. If there is an even number of values, the median is the average of the two middle values. Here’s why the median can be a better choice for detecting outliers:

Resistant to Extreme Values

Since the median depends only on the middle value(s) of the ordered dataset, it does not matter how extreme the values at either end are. As long as the outliers make up fewer than half of the data points, the median will remain largely unchanged.

More Accurate Outlier Detection

Using the median in the z-score formula allows you to detect outliers more accurately, especially in datasets with skewed distributions or when there are significant extreme values.

How to Use the Median in Z-Score Calculation

Typically, the z-score is calculated as:

$$Z = \frac{X - \mu}{\sigma}$$

Where:

X is a data point.

μ is the mean of the data.

σ is the standard deviation.
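As a rough sketch of this formula in Python, the function below computes X minus μ, divided by σ, for every point. The data are the same hypothetical exam scores as before, the helper name `z_scores` is made up for this example, and the cutoff of 3 is a common convention assumed here, not something the formula itself prescribes.

```python
from statistics import mean, stdev

def z_scores(values):
    """Classic z-score for each value: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - mu) / sigma for x in values]

# Hypothetical exam scores; the last one (10) is the suspicious value.
scores = [52, 58, 61, 65, 68, 70, 73, 77, 80, 10]

z = z_scores(scores)
outlier_z = z[-1]

CUTOFF = 3.0  # assumed threshold; pick one that suits your data
flagged = [x for x, score in zip(scores, z) if abs(score) > CUTOFF]

print(f"z-score of the 10: {outlier_z:.2f}")
print(f"flagged at |z| > {CUTOFF}: {flagged}")
```

In this small sample the score of 10 inflates the standard deviation so much that its own z-score stays inside the cutoff, and nothing is flagged. This is the distortion described earlier.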

However, to make the Z-score more robust, we can replace the mean (μ) and standard deviation (σ) with the median (M) and median absolute deviation (MAD), respectively:

$$Z_{\text{robust}} = \frac{X - M}{\mathrm{MAD}}$$

Where:

M is the Median of the dataset.

MAD is the Median Absolute Deviation, which measures the spread of data relative to the median, defined as the median of the absolute differences between each data point and the median.

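The Median Absolute Deviation (MAD) is a measure of variability that is also resistant to outliers. It’s calculated as:

$$\mathrm{MAD} = \mathrm{median}\left(\lvert X_i - M \rvert\right)$$

where X_i denotes each data point in the dataset.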

By using the median and MAD, we reduce the influence of outliers on the detection process, making our outlier detection method more robust and reliable.
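The median-based calculation can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not a definitive one: the helper names `median_abs_deviation` and `robust_z_scores` are made up for this example, the data are hypothetical exam scores, and the cutoff of 3 is an assumed convention. The sketch also refuses to divide when the MAD is zero, which happens when more than half of the values are identical.

```python
from statistics import median

def median_abs_deviation(values):
    """MAD: the median of the absolute differences from the median."""
    m = median(values)
    return median([abs(x - m) for x in values])

def robust_z_scores(values):
    """Median-based z-score for each value: (x - median) / MAD."""
    m = median(values)
    mad = median_abs_deviation(values)
    if mad == 0:
        # More than half the values equal the median, so the scale is undefined.
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(x - m) / mad for x in values]

# Hypothetical exam scores; the last one (10) is the suspicious value.
scores = [52, 58, 61, 65, 68, 70, 73, 77, 80, 10]

robust_z = robust_z_scores(scores)

CUTOFF = 3.0  # assumed threshold; pick one that suits your data
flagged = [x for x, score in zip(scores, robust_z) if abs(score) > CUTOFF]

print(f"MAD: {median_abs_deviation(scores)}")
print(f"robust z-score of the 10: {robust_z[-1]:.2f}")
print(f"flagged at |z| > {CUTOFF}: {flagged}")
```

Here only the score of 10 is flagged, because the median and MAD are set by the bulk of the class. Some references scale the MAD by a constant before dividing (a widely cited "modified z-score" multiplies by 0.6745 and uses a cutoff of 3.5); the sketch follows the unscaled formula used in this post.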

Benefits of Using Median and MAD for Outlier Detection

  • Reduced Sensitivity to Outliers: As mentioned, the median and MAD are resistant to outliers. This helps ensure that the z-scores you calculate are more accurate and reliable.
  • Better Handling of Skewed Data: In datasets where the distribution is not normal or is heavily skewed (such as income data, test scores, etc.), using the mean can lead to misleading results. The median provides a better measure of the “typical” value in these cases.
  • Improved Robustness: In datasets with a mix of outliers and normal values, replacing the mean with the median allows you to get a more accurate sense of what constitutes an outlier, even when the data contains significant noise.

Example: How Median-Based Z-Score Helps in Real-World Applications

Imagine you’re analyzing a set of real estate prices in a city, and you have a dataset with a few properties that are priced much higher than the rest (luxury properties). These high prices could skew the mean price, making it seem like most properties are priced higher than they really are. By using the Median and MAD, you can better identify which properties are outliers based on their price without being overly influenced by the extreme values of luxury estates.
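Here is a small sketch of that scenario in Python. The prices (in thousands) are invented for illustration, the function names are hypothetical, and the cutoff of 3 is an assumed convention. It runs the mean-based and the median-based checks side by side on the same list.

```python
from statistics import mean, median, stdev

def classic_outliers(values, cutoff=3.0):
    """Flag values whose mean/standard-deviation z-score exceeds the cutoff."""
    mu, sigma = mean(values), stdev(values)
    return [x for x in values if abs(x - mu) / sigma > cutoff]

def robust_outliers(values, cutoff=3.0):
    """Flag values whose median/MAD z-score exceeds the cutoff."""
    m = median(values)
    mad = median([abs(x - m) for x in values])
    if mad == 0:
        return []  # score is undefined when MAD is zero; this sketch flags nothing
    return [x for x in values if abs(x - m) / mad > cutoff]

# Hypothetical property prices in thousands; the last two are luxury estates.
prices = [250, 265, 270, 280, 290, 300, 310, 320, 335, 350, 2500, 3200]

classic_flags = classic_outliers(prices)
robust_flags = robust_outliers(prices)

print(f"mean price:   {mean(prices):.1f}")
print(f"median price: {median(prices):.1f}")
print(f"flagged by mean-based z-score:   {classic_flags}")
print(f"flagged by median-based z-score: {robust_flags}")
```

On this made-up list the two luxury properties pull the mean far above the typical price and inflate the standard deviation enough that the mean-based check flags nothing, while the median-based check picks out both of them.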

Conclusion

When a dataset contains extreme values, the median and median absolute deviation provide a more reliable basis for outlier detection than traditional z-scores that rely on the mean and standard deviation. By using the median instead of the mean, you reduce the risk that the outliers themselves distort your analysis, so the detection of unusual data points is more robust.

If you’re working with datasets that may contain outliers, consider switching to the median-based z-score for more accurate outlier detection. This small adjustment can have a significant impact on the quality of your analysis and the results of your machine learning models or statistical analysis.
