EDA – Automobile Data 

Data Scaling & Normalization

Learning Outcome

1. Define Data Scaling and explain why different units

2. Differentiate between Min-Max Scaling (Normalization) & Standardization (Z-score Scaling).

3. Identify which scaling technique to use based on the data distribution.

4. Understand the impact of feature scaling on algorithm convergence & distance-based models.

Learner already knows

All text is now represented as numbers.

 Extreme values are identified and handled.

No more holes in our car records.


Why can't we just leave it?

Imagine comparing a 2-inch ant that can lift 50x its body weight to a 10-foot elephant that can lift 1,000 lbs.

We need to "scale" their strength so we can compare them fairly.

In Automobile data, Curb-Weight might be 4,000 lbs, while Engine-Size is only 3.0 Liters.

Without scaling, the larger number "suffocates" the smaller one: a distance-based algorithm treats a 50 lb change in Curb-Weight as far bigger than a 1.5 Liter change in Engine-Size, simply because the number is larger.
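To see the problem in numbers, here is a minimal sketch in plain Python. The three cars and their values are made up for illustration, and `euclidean` is a helper defined here, not part of any library.

```python
import math

# Hypothetical cars as (curb_weight_lbs, engine_size_litres); values are illustrative only.
car_a = (4000.0, 3.0)
car_b = (4000.0, 1.5)  # same weight as A, engine half the size
car_c = (4050.0, 3.0)  # same engine as A, only 50 lbs heavier


def euclidean(p, q):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


dist_ab = euclidean(car_a, car_b)  # differs only in engine size
dist_ac = euclidean(car_a, car_c)  # differs only in curb weight

print(f"A vs B (engine halved): {dist_ab:.1f}")  # 1.5
print(f"A vs C (+50 lbs):       {dist_ac:.1f}")  # 50.0
```

On the raw numbers, a car with half the engine looks far "closer" to car A than one that is just 50 lbs heavier. A distance-based model such as KNN would see it the same way.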

Scaling doesn't change the information in your data; it changes the ruler you use to measure it.

It’s like converting currencies so you can compare the wealth of two people from different countries.

We are translating all our car specs into a shared, neutral "language of importance".
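A quick way to check the "same information, different ruler" idea is to confirm that the ordering of the cars does not change after scaling. This is a small sketch with made-up curb weights; `rank_order` is a helper written for this example.

```python
# Hypothetical curb weights in lbs (illustrative values, not from a real dataset).
weights_lbs = [2548.0, 2823.0, 2337.0, 3053.0, 4066.0]

# Rescale to the 0-1 range.
lo, hi = min(weights_lbs), max(weights_lbs)
scaled = [(w - lo) / (hi - lo) for w in weights_lbs]


def rank_order(values):
    """Indices of the values, sorted from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])


print(rank_order(weights_lbs))  # [2, 0, 1, 3, 4]
print(rank_order(scaled))       # [2, 0, 1, 3, 4]
```

The lightest car is still the lightest and the heaviest is still the heaviest. Only the units on the ruler have changed.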

 Data Scaling

Data Scaling is the process of transforming the range of the independent variables (features) of a dataset, ensuring that no single feature dominates the model’s learning process because of its magnitude.

Essential for:

 Min-Max Scaling

Mechanism

Rescales numerical data to a fixed range, most commonly 0 to 1, while preserving the relative relationships between values.

Formula :

$$X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

Where:

  • $X$ = original value

  • $X_{\min}$ = minimum value of the feature

  • $X_{\max}$ = maximum value of the feature

Best Used For

  •  Data with known and fixed bounds 
    (e.g., Fuel Efficiency in MPG, percentages, ratings).

  •  Algorithms that do not assume a normal distribution 
    (e.g., KNN, Neural Networks).
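Min-Max Scaling can be sketched in a few lines of plain Python. The function name `min_max_scale` and the MPG values are illustrative assumptions. In practice, scikit-learn's `MinMaxScaler` is the usual tool for this.

```python
def min_max_scale(values):
    """Rescale values to the 0-1 range using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the formula would divide by zero.
        # Returning 0.0 for every value is one possible convention (an assumption here).
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


mpg = [20.0, 25.0, 30.0, 40.0]  # hypothetical fuel-efficiency values
scaled_mpg = min_max_scale(mpg)
print(scaled_mpg)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value maps to 0, the largest to 1, and everything else keeps its relative position in between.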

Standardization (Z-Score Scaling)

Mechanism

Centers numerical data around a mean of 0 and rescales it to have a standard deviation of 1, without bounding the values to a fixed range.

Formula :

$$Z = \frac{X - \mu}{\sigma}$$

Where:

  • $X$ = original value

  • $\mu$ = mean of the feature

  • $\sigma$ = standard deviation of the feature

Best Used For

  • Data that approximately follows a Gaussian (Normal) distribution

  • Algorithms such as Linear Regression, Logistic Regression, SVM, PCA
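Standardization can be sketched the same way with Python's built-in `statistics` module. The function name `standardize` and the horsepower values are illustrative assumptions. This sketch uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses.

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        # Constant feature: every value equals the mean, so use 0.0 (an assumed convention).
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


horsepower = [100.0, 120.0, 140.0, 160.0, 180.0]  # hypothetical values
z = standardize(horsepower)
print([round(v, 3) for v in z])
```

After the transform the values have a mean of 0 and a standard deviation of 1, but they are not confined to any fixed range.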

Normalization vs. Standardization

Feature             | Min-Max Scaling           | Standardization
Output Range        | Strictly 0 to 1           | No fixed range (centered at 0)
Outlier Sensitivity | Very Sensitive            | Less Sensitive
Distribution Shape  | Compressed into a box     | Maintains the shape of distribution
Automobile Use Case | Safety Rating (0-5 stars) | Engine Horsepower or Price
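The outlier row of the comparison can be tried out on a small example. This is a rough sketch with made-up prices, where the last car is a deliberately extreme outlier; the variable names are illustrative.

```python
import statistics

# Hypothetical car prices; the last value is a deliberate extreme outlier.
prices = [10000.0, 12000.0, 14000.0, 16000.0, 18000.0, 200000.0]

# Min-Max Scaling: the outlier becomes the maximum and defines the top of the range.
lo, hi = min(prices), max(prices)
minmax = [(p - lo) / (hi - lo) for p in prices]

# Standardization: the outlier shifts the mean and inflates the standard deviation.
mu = statistics.mean(prices)
sigma = statistics.pstdev(prices)
zscores = [(p - mu) / sigma for p in prices]

print("min-max:", [round(v, 3) for v in minmax])
print("z-score:", [round(v, 3) for v in zscores])
```

Min-Max pins the outlier at 1 and squeezes the five ordinary cars into the bottom few percent of the 0 to 1 range. The Z-scores are not forced into a fixed box, but the outlier still pulls the mean and standard deviation, so "less sensitive" does not mean immune.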

Summary

1. Scaling is about fairness: It ensures features with small numbers have an equal voice.

2. Normalization (Min-Max) is great for features with known, fixed ranges.

3. Standardization is the "Go-to" for most machine learning algorithms.

4. Always Scale AFTER Splitting: Don't let your test data "leak" into your training scale.
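The "scale after splitting" rule can be sketched as follows: learn the mean and standard deviation from the training rows only, then reuse those same numbers on the test rows. The data here is randomly generated for illustration, and the simple 15/5 split is an assumption. With scikit-learn, the same pattern is to call `fit` on the training data and `transform` on the test data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the example is repeatable

# Hypothetical curb weights in lbs, generated only for this illustration.
curb_weights = [random.uniform(2000, 4500) for _ in range(20)]

# 1) Split first.
train, test = curb_weights[:15], curb_weights[15:]

# 2) Learn the scaling parameters from the training data only.
mu = statistics.mean(train)
sigma = statistics.pstdev(train)

# 3) Apply the same parameters to both sets.
train_scaled = [(x - mu) / sigma for x in train]
test_scaled = [(x - mu) / sigma for x in test]

print(round(statistics.mean(train_scaled), 6))  # effectively 0
print(round(statistics.mean(test_scaled), 3))   # usually not exactly 0, and that is expected
```

The test set never influences `mu` or `sigma`, so nothing about it leaks into the training scale.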

Quiz

Which scaling technique is most affected by a single extreme outlier in the Car Price column?

A.  Standardization

B. Min-Max Scaling (Normalization)

C. Categorical Encoding

D. Log Transformation

Quiz-Answer

B. Min-Max Scaling (Normalization): a single extreme price becomes the new maximum, which squeezes every other value toward 0.

Data Scaling & Normalization

By Content ITV
