DCPY.STANDARDSCALER(with_mean, with_std, column)

The Standard Scaler standardizes the data by removing the mean and scaling to unit variance. Because both the mean and the variance are sensitive to outliers, it does not guarantee balanced feature scales when outliers are present.

Parameters
  • with_mean – Specifies whether to center the data by subtracting the mean before scaling, Boolean (for example, True).
  • with_std – Specifies whether to scale the data to unit variance, Boolean (for example, True).
  • column – Dataset column or custom calculation that you want to scale.
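
The effect of the two Boolean parameters can be sketched in Python with NumPy. This is a minimal illustration, not the product's implementation: the function name standard_scale is hypothetical, and the use of the population standard deviation (ddof=0) and the handling of constant columns are assumptions.

```python
import numpy as np

def standard_scale(values, with_mean=True, with_std=True):
    """Hypothetical stand-in for DCPY.STANDARDSCALER on one numeric column."""
    x = np.asarray(values, dtype=float)
    # with_mean=True subtracts the column mean; otherwise nothing is subtracted.
    mean = x.mean() if with_mean else 0.0
    # with_std=True divides by the standard deviation (ddof=0 is an assumption).
    std = x.std() if with_std else 1.0
    if std == 0.0:
        std = 1.0  # assumed handling of a constant column: avoid dividing by zero
    return (x - mean) / std

column = [2.0, 4.0, 6.0, 8.0]
scaled = standard_scale(column)                          # centered and scaled
centered_only = standard_scale(column, with_std=False)   # mean removed only
scaled_only = standard_scale(column, with_mean=False)    # divided by std only
```

With both flags set to True, the result has a mean of zero and a standard deviation of one. Setting with_std to False only shifts the values, and setting with_mean to False only rescales them.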

Example: DCPY.STANDARDSCALER(True, True, [Discount])

Input data
  • A numeric column
  • Rows that contain missing values are dropped before calculations
Result
  • A numeric column of transformed values with the same length as the input column
  • Missing values remain at the same indices as in the input column
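
The handling of missing values described above can be sketched as follows. The function name scale_keep_missing is hypothetical, missing values are assumed to be represented as NaN, and the population standard deviation is again an assumption.

```python
import numpy as np

def scale_keep_missing(values):
    """Scale the non-missing entries and leave NaN at its original positions."""
    x = np.asarray(values, dtype=float)
    present = ~np.isnan(x)                # rows that take part in the calculation
    result = np.full(x.shape, np.nan)     # output has the same length as the input
    kept = x[present]
    if kept.size > 0:
        std = kept.std()
        if std == 0.0:
            std = 1.0                     # assumed handling of a constant column
        result[present] = (kept - kept.mean()) / std
    return result

out = scale_keep_missing([1.0, np.nan, 3.0])
```

The mean and standard deviation are computed only from the rows that are present, so a missing value does not influence the scaled values around it.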
Key usage points
  • Use it when you need to normalize data that does not contain many outliers.
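
The sensitivity to outliers can be seen in a small sketch. The values below are made up for illustration, and standard_scale is a hypothetical helper that assumes the population standard deviation.

```python
import numpy as np

def standard_scale(values):
    """Hypothetical helper: subtract the mean and divide by the standard deviation."""
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()

clean = standard_scale([10, 11, 12, 13, 14])
with_outlier = standard_scale([10, 11, 12, 13, 1000])   # last value is an outlier

# Range covered by the first four (ordinary) points after scaling.
clean_spread = clean[:4].max() - clean[:4].min()
outlier_spread = with_outlier[:4].max() - with_outlier[:4].min()
print(clean_spread, outlier_spread)
```

A single extreme value inflates the standard deviation, so the ordinary points are squeezed into a far narrower range than in the clean column.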
Example

Dataset values can vary in magnitude, unit, and range. If we scale two columns to comparable values, we can easily see how one value relates to the other. For example, we can scale the weight of a car and its fuel economy by using the following calculations: DCPY.STANDARDSCALER(True, True, [WT]) and DCPY.STANDARDSCALER(True, True, [MPG]).

Both calculations are visualized in the Butterfly chart.
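
Outside the product, the two calculations in this example can be approximated with the following sketch. The WT and MPG values are made up for illustration and are not taken from any dataset, and standard_scale is a hypothetical helper that assumes the population standard deviation.

```python
import numpy as np

def standard_scale(values):
    """Hypothetical helper: subtract the mean and divide by the standard deviation."""
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()

# Made-up values for illustration only.
wt = [2.6, 2.9, 3.2, 3.4, 5.3]          # weight (hypothetical units)
mpg = [21.0, 22.8, 18.7, 14.3, 10.4]    # fuel economy (hypothetical units)

wt_scaled = standard_scale(wt)
mpg_scaled = standard_scale(mpg)

# Both columns are now unitless and centered on zero, so they can be compared row by row.
for w, m in zip(wt_scaled, mpg_scaled):
    print(f"WT: {w:+.2f}   MPG: {m:+.2f}")
```

After scaling, each value expresses how many standard deviations a car lies above or below the average of its own column, which is what makes the two measures comparable side by side.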

For the whole list of algorithms, see Data science built-in algorithms.