Start With The Problem
You trained a house price prediction model. Now you want to know — how good is my model?
Your model made these predictions:
| House | Actual Price | Predicted Price |
|-------|--------------|-----------------|
| 1     | ₹50 L        | ₹45 L           |
| 2     | ₹80 L        | ₹85 L           |
| 3     | ₹60 L        | ₹58 L           |
| 4     | ₹90 L        | ₹95 L           |
| 5     | ₹70 L        | ₹65 L           |
You need one single number that tells you — on average, by how much is my model wrong?
That number is MAE.
What is MAE?
MAE = Average of absolute differences between actual and predicted values
Three steps only:
- Find the error (Actual − Predicted) for each row
- Make all errors positive (take absolute value)
- Take the average
The Formula
MAE = (1/n) × Σ |Actual - Predicted|
- n = total number of predictions
- | | = absolute value (just remove the minus sign)
- Σ = sum of everything
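The formula translates directly into a few lines of Python. A minimal sketch, using the five house prices (in Lakhs) from the table above:

```python
# Actual and predicted prices (in Lakhs), from the table above
actual = [50, 80, 60, 90, 70]
predicted = [45, 85, 58, 95, 65]

# MAE = (1/n) * Σ |Actual - Predicted|
n = len(actual)
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
print(mae)  # 4.4
```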
Manual Walkthrough — Step by Step
| House | Actual (Lakhs) | Predicted (Lakhs) | Error (A − P) | \|Error\| |
|-------|----------------|-------------------|---------------|-----------|
| 1     | 50             | 45                | +5            | 5         |
| 2     | 80             | 85                | -5            | 5         |
| 3     | 60             | 58                | +2            | 2         |
| 4     | 90             | 95                | -5            | 5         |
| 5     | 70             | 65                | +5            | 5         |
Step 1 — Sum of absolute errors:
5 + 5 + 2 + 5 + 5 = 22
Step 2 — Divide by n (5 houses):
MAE = 22 / 5 = 4.4
Result: MAE = 4.4 Lakhs
This means — on average, your model is wrong by ₹4.4 Lakhs per house. Simple and clear.
Why Absolute Value? Why Not Just Average the Errors?
Without absolute value:
Errors = +5, -5, +2, -5, +5
Sum = +5 - 5 + 2 - 5 + 5 = 2
Avg = 2 / 5 = 0.4
This says the model is almost perfect — but it's clearly not! Positive and negative errors cancel each other out and give a false picture.
Absolute value fixes this — every error counts as positive, no cancellation.
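You can see the cancellation directly in code. A small sketch reusing the same five errors:

```python
errors = [5, -5, 2, -5, 5]

# Signed mean: positives and negatives cancel
signed_mean = sum(errors) / len(errors)          # 0.4 — misleading
# MAE: every error counts as positive
mae = sum(abs(e) for e in errors) / len(errors)  # 4.4 — honest

print(signed_mean)  # 0.4
print(mae)          # 4.4
```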
Python Program
```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
import matplotlib.pyplot as plt

# --- Data ---
actual = [50, 80, 60, 90, 70]
predicted = [45, 85, 58, 95, 65]

# --- Manual Calculation ---
errors = [a - p for a, p in zip(actual, predicted)]
absolute_errors = [abs(e) for e in errors]
mae_manual = sum(absolute_errors) / len(absolute_errors)

print("=== Manual Calculation ===")
print(f"Errors          : {errors}")
print(f"Absolute Errors : {absolute_errors}")
print(f"MAE (manual)    : {mae_manual}")

# --- Using NumPy ---
mae_numpy = np.mean(np.abs(np.array(actual) - np.array(predicted)))
print(f"\nMAE (numpy)   : {mae_numpy}")

# --- Using Scikit-learn ---
mae_sklearn = mean_absolute_error(actual, predicted)
print(f"MAE (sklearn) : {mae_sklearn}")

# --- DataFrame view ---
df = pd.DataFrame({
    'Actual'         : actual,
    'Predicted'      : predicted,
    'Error'          : errors,
    'Absolute Error' : absolute_errors
})
print(f"\n{df.to_string(index=False)}")

# --- Plot ---
x = range(1, len(actual) + 1)
plt.figure(figsize=(10, 5))
plt.plot(x, actual, label='Actual', marker='o', linewidth=2)
plt.plot(x, predicted, label='Predicted', marker='s', linewidth=2, linestyle='--')

# Red vertical lines = the error for each house
for i in x:
    plt.vlines(i, min(actual[i-1], predicted[i-1]),
               max(actual[i-1], predicted[i-1]),
               colors='red', linewidth=2, alpha=0.6)

plt.title(f'Actual vs Predicted (MAE = {mae_sklearn})')
plt.xlabel('House')
plt.ylabel('Price (Lakhs)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.savefig('mae_plot.png')
plt.show()
print("\nPlot saved!")
```
Output:
```
=== Manual Calculation ===
Errors          : [5, -5, 2, -5, 5]
Absolute Errors : [5, 5, 2, 5, 5]
MAE (manual)    : 4.4

MAE (numpy)   : 4.4
MAE (sklearn) : 4.4

 Actual  Predicted  Error  Absolute Error
     50         45      5               5
     80         85     -5               5
     60         58      2               2
     90         95     -5               5
     70         65      5               5

Plot saved!
```
The red vertical lines in the plot show the error for each prediction — MAE is just the average length of those red lines.
How to Read MAE
MAE is in the same unit as your target variable.
| Target               | MAE = 4.4 means             |
|----------------------|-----------------------------|
| House Price (Lakhs)  | Wrong by ₹4.4 L on average  |
| Temperature (°C)     | Wrong by 4.4 °C on average  |
| Sales (units)        | Wrong by 4.4 units on average |
This is MAE's biggest strength — it's directly interpretable. No unit conversion needed.
MAE vs Other Metrics — When to Use What
| Metric | Penalizes Big Errors? | Interpretable?   | Use When                                            |
|--------|-----------------------|------------------|-----------------------------------------------------|
| MAE    | No (equal weight)     | ✅ Yes, same unit | Outliers exist, you want simple average error       |
| MSE    | Yes (squares errors)  | ❌ Squared unit   | Big errors are very bad, want to penalize them hard |
| RMSE   | Yes                   | ✅ Yes, same unit | Big errors are bad but want interpretable result    |
The One Weakness of MAE
MAE treats all errors equally. A ₹2L error and a ₹20L error both just get added as-is.
Error of 2 → contributes 2
Error of 20 → contributes 20
If you want your model to heavily penalize large mistakes, use MSE or RMSE instead. But if outliers exist in your data and you don't want them to dominate the metric — MAE is safer.
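This difference is easy to demonstrate. A quick sketch using the five houses from earlier plus one hypothetical sixth house with a large ₹20 L error (an illustrative value, not from the data above):

```python
import numpy as np

actual    = np.array([50, 80, 60, 90, 70], dtype=float)
predicted = np.array([45, 85, 58, 95, 65], dtype=float)

# Add one hypothetical house with a big ₹20 L error
actual_out    = np.append(actual, 100.0)
predicted_out = np.append(predicted, 80.0)

def mae(a, p):  return np.mean(np.abs(a - p))
def mse(a, p):  return np.mean((a - p) ** 2)
def rmse(a, p): return np.sqrt(mse(a, p))

print(f"Without outlier: MAE={mae(actual, predicted):.2f}, "
      f"RMSE={rmse(actual, predicted):.2f}")
print(f"With outlier   : MAE={mae(actual_out, predicted_out):.2f}, "
      f"RMSE={rmse(actual_out, predicted_out):.2f}")
```

One big error moves MAE from 4.4 to 7.0, while RMSE roughly doubles from about 4.56 to about 9.17 — RMSE reacts much more strongly to the single large mistake.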
Real ML Project Usage
```python
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Split your data, train, then evaluate on the held-out set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Evaluate
mae = mean_absolute_error(y_test, y_pred)
print(f"Model MAE: {mae:.2f}")

# Rule of thumb — MAE should be small relative to your target's range
print(f"Target range     : {y_test.max() - y_test.min():.2f}")
print(f"MAE as % of range: {(mae / (y_test.max() - y_test.min())) * 100:.1f}%")
```
MAE as % of range is a great sanity check — if MAE is 5% of the range, your model is decent. If it's 40%, your model needs work.
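The snippet above assumes `X` and `y` already exist. For a version you can run end to end, here is a sketch on synthetic data (`make_regression` and the seed 42 are illustrative stand-ins for your real dataset):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for real features and prices
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
target_range = y_test.max() - y_test.min()
print(f"Model MAE        : {mae:.2f}")
print(f"Target range     : {target_range:.2f}")
print(f"MAE as % of range: {mae / target_range * 100:.1f}%")
```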
One Line Summary
MAE tells you — on average, by how much is your model's prediction off from the real value — in the same unit as your data, making it the most human-readable regression metric.

