This short tutorial shows how to calculate the standard deviation in Python using NumPy. First, we generate random data with a mean of 5 and a standard deviation (SD) of 1. Then, we use NumPy's `std()` function. As you can see, the SD of the sample is close to 1.

```
import numpy as np
# mean and standard deviation
mu, sigma = 5, 1
y = np.random.normal(mu, sigma, 100)
print(np.std(y))
```

1.084308455964664
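The numbers above change on every run because no random seed is set. The following is a minimal sketch of a reproducible version. It assumes NumPy's `default_rng` Generator API (available since NumPy 1.17) and an arbitrary seed of 42. It also prints the sample mean, so you can compare both statistics with the parameters used to generate the data.

```python
import numpy as np

# Assumption: a fixed, arbitrary seed so the run is reproducible;
# the printed numbers will differ from the unseeded example above.
rng = np.random.default_rng(42)

mu, sigma = 5, 1
y = rng.normal(mu, sigma, 100)  # 100 draws from a normal distribution N(5, 1)

print(np.mean(y))  # sample mean, expected to be near mu = 5
print(np.std(y))   # sample SD (ddof=0), expected to be near sigma = 1
```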

By default, `np.std()` calculates the population standard deviation, because `ddof` defaults to zero. To calculate the sample standard deviation instead, set `ddof=1`.

```
import numpy as np
# reuse the same y generated above; regenerating it would give different data
print(np.std(y, ddof=1))
```

1.0897710016498157
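Because only the denominator changes, the `ddof=1` result is always the `ddof=0` result multiplied by \(\sqrt{N/(N-1)}\). The sketch below checks this numerically using an arbitrary seed. It also cross-checks against Python's standard-library `statistics.pstdev` (population SD) and `statistics.stdev` (sample SD), which draw the same distinction.

```python
import math
import statistics

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
y = rng.normal(5, 1, 100)
n = len(y)

pop_sd = np.std(y)             # ddof=0: divides by N
sample_sd = np.std(y, ddof=1)  # ddof=1: divides by N - 1

# The two results differ only by the constant factor sqrt(N / (N - 1)).
print(sample_sd / pop_sd, math.sqrt(n / (n - 1)))

# Standard-library equivalents, equal up to floating-point rounding.
print(statistics.pstdev(y.tolist()), pop_sd)
print(statistics.stdev(y.tolist()), sample_sd)
```

The outputs shown earlier follow the same pattern: 1.084308455964664 × \(\sqrt{100/99}\) ≈ 1.08977, which matches the `ddof=1` result.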

## Why `ddof=1` in NumPy `np.std()`

You might wonder why `ddof=1` is needed to calculate the standard deviation (SD) in NumPy. To begin, here is the formula that `np.std()` uses:

\[\sqrt{\frac{1}{N-ddof} \sum_{i=1}^N (x_i - \overline{x})^2}\]
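The following worked example plugs a small, made-up dataset into this formula. The eight values were chosen only to keep the arithmetic simple. The example shows that `ddof` changes nothing except the denominator.

\[x = (2, 4, 4, 4, 5, 5, 7, 9), \quad N = 8, \quad \overline{x} = \frac{40}{8} = 5\]

\[\sum_{i=1}^{8} (x_i - \overline{x})^2 = 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32\]

\[ddof = 0: \ \sqrt{\frac{32}{8 - 0}} = 2, \qquad ddof = 1: \ \sqrt{\frac{32}{8 - 1}} \approx 2.138\]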

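Here \(N\) is the number of observations, \(\overline{x}\) is their mean, and `ddof` ("delta degrees of freedom") is subtracted from \(N\) in the denominator. Setting `ddof=1` matters because, typically, we only have a random sample of data from the population, not data for the whole population. The SD we calculate is therefore an estimate of the population SD from a random sample (e.g., the one we generate with `np.random.normal()`). The deviations are measured from the sample mean \(\overline{x}\) rather than the unknown population mean, so they tend to be slightly too small. Dividing by \(N-1\) instead of \(N\) corrects this bias in the variance (Bessel's correction).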
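The small simulation sketch below shows the bias that `ddof=1` corrects. The sample size, number of repetitions, and seed are arbitrary assumptions. It compares variances rather than SDs, because Bessel's correction makes the *variance* estimate unbiased; the square root of that estimate is still slightly biased as an SD estimate.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Assumption: 100,000 samples of size n = 5 from a population with variance 1.
n_samples, n = 100_000, 5
samples = rng.normal(0, 1, size=(n_samples, n))

# Variance of each sample (one row = one sample) under both conventions.
var_ddof0 = np.var(samples, axis=1)          # divides by n
var_ddof1 = np.var(samples, axis=1, ddof=1)  # divides by n - 1

# Averaged over many samples, ddof=0 lands near (n - 1) / n = 0.8,
# while ddof=1 lands near the true population variance of 1.
print(var_ddof0.mean(), var_ddof1.mean())
```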

On the other hand, if you have data for the entire population, you do NOT need `ddof=1`. For instance, if you have the GPA of every student in the university, you have the whole population of the university, and your SD calculation does not need `ddof=1`. In this case, `ddof=0`, and the formula below gives the SD of the population data.

\[\sqrt{\frac{1}{N-ddof} \sum_{i=1}^N (x_i - \overline{x})^2}=\sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2}\]

However, if you do not have data for the whole population, you need to set `ddof=1`. For instance, if you only have the GPAs of Business School students and want to estimate the SD of GPA across the whole university based on that sample, you need to set `ddof=1`.

\[\sqrt{\frac{1}{N-ddof} \sum_{i=1}^N (x_i - \overline{x})^2}=\sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2}\]
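The two GPA scenarios can be sketched in code as follows. All numbers are hypothetical: a made-up "university" of 20,000 simulated GPAs stands in for the population. The sketch uses a simple random sample of 50 students, in line with the random-sample assumption discussed above.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed

# Hypothetical data: GPAs of an entire made-up university of 20,000 students,
# clipped to the 0.0-4.0 scale. None of these numbers come from real records.
population = np.clip(rng.normal(3.0, 0.5, 20_000), 0.0, 4.0)

# With the whole population in hand, ddof=0 gives the population SD directly.
population_sd = np.std(population)

# With only a random sample of 50 students, ddof=1 estimates that same quantity.
sample = rng.choice(population, size=50, replace=False)
estimated_sd = np.std(sample, ddof=1)

print(population_sd, estimated_sd)
```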

## Write the `np.std()` Formula from Scratch in Python

We can also check our understanding by writing the SD formula from scratch in Python. The following code implements it with `ddof=0`. Its output (`1.084308455964664`) matches `np.std(y, ddof=0)`, or equivalently `np.std(y)`.

\[\sqrt{\frac{1}{N-ddof} \sum_{i=1}^N (x_i - \overline{x})^2}=\sqrt{\frac{1}{N-0} \sum_{i=1}^N (x_i - \overline{x})^2}\]

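```
import numpy as np
# y is the sample generated in the first example
mean_number = np.mean(y)
# setting ddof=0: divide the sum of squared deviations by N
sd_from_scratch1 = np.sqrt((1 / len(y)) * np.sum(np.square(y - mean_number)))
print(sd_from_scratch1)
```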

1.084308455964664
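The expression inside `np.sqrt()` in the code above is the variance, so `np.std()` is simply the square root of `np.var()` computed with the same `ddof`. The sketch below checks this relationship. It regenerates `y` with an arbitrary seed so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, for reproducibility only
y = rng.normal(5, 1, 100)

# The quantity under the square root in the from-scratch formula is the variance.
mean_number = np.mean(y)
var_from_scratch = np.sum(np.square(y - mean_number)) / len(y)

# Each pair below is equal up to floating-point rounding.
print(var_from_scratch, np.var(y))                    # variance with ddof=0
print(np.sqrt(var_from_scratch), np.std(y))           # SD = sqrt(variance)
print(np.std(y, ddof=1), np.sqrt(np.var(y, ddof=1)))  # same relationship for ddof=1
```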

The following code implements the formula with `ddof=1`. As expected, the output matches `np.std(y, ddof=1)` (i.e., `1.0897710016498157`).

\[\sqrt{\frac{1}{N-ddof} \sum_{i=1}^N (x_i - \overline{x})^2}=\sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2}\]

```
# setting ddof=1: divide the sum of squared deviations by N - 1
sd_from_scratch2 = np.sqrt((1 / (len(y) - 1)) * np.sum(np.square(y - mean_number)))
print(sd_from_scratch2)
```

1.0897710016498157
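Both from-scratch versions can be folded into a single function that takes `ddof` as a parameter, mirroring the general formula. The sketch below uses plain Python with no NumPy, and `std_from_scratch` is a hypothetical helper written only for illustration. It raises an error when \(N - ddof \le 0\), because the denominator would otherwise be zero or negative. The standard-library `statistics` module serves as a cross-check.

```python
import math
import statistics


def std_from_scratch(values, ddof=0):
    """Hypothetical helper: SD with a configurable ddof, in plain Python.

    Mirrors sqrt(sum((x - mean) ** 2) / (N - ddof)).
    """
    n = len(values)
    if n - ddof <= 0:
        # Avoid dividing by zero or by a negative number of degrees of freedom.
        raise ValueError("N - ddof must be positive")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # small made-up dataset
print(std_from_scratch(data))          # population SD (ddof=0) → 2.0
print(std_from_scratch(data, ddof=1))  # sample SD (ddof=1)
print(statistics.pstdev(data), statistics.stdev(data))  # standard-library cross-check
```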