Descriptive Statistics - II

Paddy
3 min read · Apr 15, 2020

If you have landed on this page and wonder what was discussed in Part I, you can find it here

Deviation

Let's consider the same marks example from Part I, sorted in ascending order:

36, 45, 51, 58, 67, 72, 88, 88, 97, 99

The mean of these values is 70.1

The deviation of any given number is (the given number - mean).

For 72, the deviation is 72 - 70.1 = 1.9

Similarly, for 67 it is 67 - 70.1 = -3.1

The sum of the deviations of all the points from the mean is 0. For the example above:

  • (-34.1) + (-25.1) + (-19.1) + (-12.1) + (-3.1) + 1.9 + 17.9 + 17.9 + 26.9 + 28.9 = 0 (the negative deviations total -93.5 and the positive ones total 93.5, so they cancel exactly)
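To check the arithmetic, here is a minimal Python sketch that recomputes the deviations for the marks above. It uses only the standard `statistics` module; the variable names are just illustrative.

```python
from statistics import mean

marks = [36, 45, 51, 58, 67, 72, 88, 88, 97, 99]
avg = mean(marks)

# Deviation of each mark: the value minus the mean,
# rounded to one decimal to hide floating-point noise.
deviations = [round(m - avg, 1) for m in marks]
total = sum(deviations)

print("mean:", avg)
print("deviations:", deviations)
# Rounded because floating-point sums can leave a tiny residue.
print("sum of deviations:", round(total, 10))
```

The negative and positive deviations cancel out, which holds for any dataset, not just this one.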

Sensitivity — Mean vs Median

Let's consider the training distances of two runners over a month:

x => 6, 8, 12, 20, 22, 23, 25, 26, 30, 32 (Distance in km)

y => 2, 3, 5, 8, 12, 13, 14, 16, 19, 180

Mean of x = 20.4

Mean of y = 27.2

Going by the mean alone, runner y has the higher average distance and would be considered the better runner.

Now calculate the median for both of them:

Median of x = 22.5

Median of y = 12.5

The median gives the opposite picture: runner x is better than y. If you look closely at runner y's data, he has one outlier run of 180 km, which inflates his mean. In such scenarios we can recalculate the mean and median after excluding the outliers. One simple way to do that is to remove the first and last values of the sorted data. Hence:

x => 8, 12, 20, 22, 23, 25, 26, 30 (Distance in km)

y => 3, 5, 8, 12, 13, 14, 16, 19

Trimmed Mean of x = 20.75

Trimmed Median of x = 22.5

Trimmed Mean of y = 11.25

Trimmed Median of y = 12.5
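The before-and-after figures for both runners can be reproduced with a short Python sketch. The helper `trim_ends` is a hypothetical name introduced here for illustration; it simply drops the smallest and largest value, as described above, and is not a general-purpose outlier detector.

```python
from statistics import mean, median

def trim_ends(values):
    """Sort the values and drop the smallest and the largest one."""
    return sorted(values)[1:-1]

x = [6, 8, 12, 20, 22, 23, 25, 26, 30, 32]
y = [2, 3, 5, 8, 12, 13, 14, 16, 19, 180]

for name, data in (("x", x), ("y", y)):
    trimmed = trim_ends(data)
    # Print each statistic before and after trimming.
    print(name, "mean:  ", mean(data), "->", mean(trimmed))
    print(name, "median:", median(data), "->", median(trimmed))
```

Dropping one value from each end is the simplest possible trimming rule. It works here because each list has at most one extreme value at the top.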

Outlier

Compared with the previous values, removing the outlier changed the mean of y a lot (from 27.2 to 11.25), whereas its median did not change at all (12.5). For x, who has no outlier, the mean barely moved (20.4 to 20.75). The conclusion is that the mean is sensitive to outliers, whereas the median is not. In most cases the mode is also not sensitive to outliers.

Types of Distribution

1) Symmetric distribution

The values are distributed evenly on either side of the median. For a symmetric distribution with a single peak, Mean = Median = Mode.

2) Bimodal symmetric distribution

There are two modes, one on each side of the center. The mean and median are still equal to each other (Mean = Median), but the modes differ from them.

3) Left-skewed distribution

Long tail of small values on the left. The mean is pulled toward the left, into the tail, and the order is mostly Mean < Median < Mode.

4) Right-skewed distribution

Long tail of large values on the right. The mean is pulled toward the right, into the tail, and the order is mostly Mode < Median < Mean.
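The typical ordering of mean, median and mode in skewed data can be seen with a small Python sketch. The two samples below are made-up numbers chosen only to show a long tail on one side; they do not come from the examples above.

```python
from statistics import mean, median, mode

# Hypothetical samples, invented for illustration only.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]        # long tail of large values
left_skewed = [1, 7, 11, 12, 13, 13, 14, 14, 14, 15]  # long tail of small values

for label, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(label, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

In the right-skewed sample the mean ends up largest, and in the left-skewed sample it ends up smallest, because the mean is the measure that gets dragged toward the tail. This ordering is a common pattern rather than a strict rule, so real datasets can deviate from it.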
