What do Centering and Scaling mean? What is the individual effect of each?

When preparing our ‘Training Data’, two basic pre-processing techniques applicable to Numerical Features are ‘Centering’ and ‘Scaling’. These are usually applied together and may be necessary to transform raw numerical data into a format that is suitable for the algorithms of choice.

Centering our data means that we shift the position of its mean by subtracting the same constant from every data point, which moves the whole distribution up or down without changing its shape. In Standardization, the constant subtracted is the Feature's own mean, so that the centered data has a mean equal to zero. When the data is only ‘Centered’, its variance and the distances between data points remain the same, as does the unit; only the mean is altered.
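The effect of Centering described above can be checked with a minimal Python sketch. The function name `center` and the sample values are made up for illustration; the sketch uses only the standard library and the population variance.

```python
from statistics import mean, pvariance


def center(values):
    """Subtract the mean from every value so the result has a mean of zero."""
    mu = mean(values)
    return [v - mu for v in values]


# Hypothetical sample of a single numerical Feature (illustrative values only).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
centered = center(heights_cm)

print(centered)             # every value shifted by the same constant
print(mean(centered))       # the mean is now zero
print(pvariance(heights_cm), pvariance(centered))  # the variance is unchanged
```

Because every point moves by the same amount, the spread and the unit of the Feature are untouched; only its location changes.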

Scaling our data means that its values are multiplied or divided by a constant so that the Feature has a chosen spread or fits within a specific range. It is a useful technique for ensuring that different Features can be compared, without the risk that a Feature with a large range overshadows others that have a smaller range. It is common to scale Features, as in Standardization, so that they have a Standard Deviation of 1. Alternatively, in ‘Min-Max Scaling’, a Feature is rescaled so that its minimum and maximum values map to 0 and 1 (or to another fixed range such as -1 to 1, which is sometimes chosen when negative values are present).
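The two kinds of Scaling mentioned above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names `standardize` and `min_max_scale` and the sample values are made up, the population standard deviation is used, and the code assumes the Feature is not constant (so there is no division by zero).

```python
from statistics import mean, pstdev


def standardize(values):
    """Center on the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation; assumed > 0
    return [(v - mu) / sigma for v in values]


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Map the smallest value to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


# Hypothetical Feature with a large range (illustrative values only).
incomes = [20_000.0, 35_000.0, 50_000.0, 65_000.0, 80_000.0]

z_scores = standardize(incomes)                   # mean 0, Standard Deviation 1
unit_range = min_max_scale(incomes)               # values between 0 and 1
symmetric = min_max_scale(incomes, -1.0, 1.0)     # values between -1 and 1

print(z_scores)
print(unit_range)
print(symmetric)
```

Standardization fixes the spread but not the range, so standardized values can fall outside -1 to 1, while Min-Max Scaling fixes the range but leaves the Standard Deviation dependent on the data.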
