MLLIB.PEARSONCOR(columns)

Calculates the Pearson correlation coefficient between selected columns to assess the linear relationship of two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable. Pearson’s correlation coefficient is a measure based on the actual data values, and thus, it is sensitive to outliers.
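To illustrate the sensitivity to outliers described above, here is a minimal Python sketch. It uses the textbook Pearson formula (covariance divided by the product of the standard deviations) and is not the function's internal implementation. The `pearson` helper and all data values are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Textbook Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: customers vs. sales, almost perfectly linear.
customers = [10, 20, 30, 40, 50]
sales = [105, 195, 310, 390, 505]
r_clean = pearson(customers, sales)

# Add one hypothetical outlier: many customers, very low sales.
r_outlier = pearson(customers + [60], sales + [50])

print(f"without outlier: {r_clean:.3f}")
print(f"with outlier:    {r_outlier:.3f}")
```

With this made-up data, the coefficient is close to 1 before the outlier is added and drops sharply afterwards, even though only one of six points changed.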

Input data
  • Two numeric variables
  • No limit on the size of the input data
  • No missing values

Example: MLLIB.PEARSONCOR(sum([Gross Sales]), sum([No of Customers]))

Result

The correlation coefficient measures the strength of the linear relationship between two variables and ranges from -1 to 1. A value of -1 indicates a perfect negative correlation, a value of 1 indicates a perfect positive correlation, and a value of 0 indicates no linear relationship between the two variables.

Example

In the Scatterplot widget, add a calculation that uses MLLIB.PEARSONCOR(sum([Gross Sales]), sum([No of Customers])) and set it as a dimension. In the dataset manager, drag the calculation into the Color field. Because the function returns a single value, only one color is used. The coefficient value is shown in the legend and in the tooltip for each point of the visualization. In this example, the coefficient of 0.93 indicates a strong positive correlation.

For the full list of algorithms, see Data science built-in algorithms.