To get the mean value of column 1 (or any other column; change $1 to the field you want), type:
$ awk 'BEGIN{s=0;}{s=s+$1;}END{print s/NR;}' file
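To average a column other than the first without editing the awk program, the field number can be passed in as a variable. This is a minimal sketch: the file name data.txt, its contents, and the helper name mean_col are made up for illustration, and the NR > 0 guard is an addition that avoids a division by zero on empty input.

```shell
# Hypothetical sample data: two whitespace-separated columns.
printf '1 10\n2 20\n3 30\n4 40\n' > data.txt

# mean_col COLUMN FILE
# -v col=... sets an awk variable; $col then selects that field on each line.
mean_col() {
    awk -v col="$1" '{ s += $col } END { if (NR > 0) print s / NR }' "$2"
}

mean_col 1 data.txt   # prints 2.5
mean_col 2 data.txt   # prints 25
```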
To get the standard deviation of column 1, type:
$ awk '{sum+=$1; sumsq+=$1*$1} END {print sqrt(sumsq/NR - (sum/NR)^2)}' file
or
$ awk '{delta = $1 - avg; avg += delta / NR; mean2 += delta * ($1 - avg); } END { print sqrt(mean2 / NR); }' file
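The second option (Welford's online algorithm) works better with large amounts of data or large values: it never accumulates a huge sum of squares, so it avoids both overflow and the loss of precision that comes from subtracting two nearly equal large numbers.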
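The difference between the two one-liners shows up when the values are large compared with their spread. The sketch below runs both on a made-up file, big.txt; the function names naive_std and welford_std are only labels for the two commands above. With double-precision arithmetic, the first form typically loses the spread to rounding, so its output should not be trusted here, while the second prints the expected population standard deviation.

```shell
# Hypothetical data: three large values with a tiny spread.
printf '1000000001\n1000000002\n1000000003\n' > big.txt

# Sum-of-squares form: subtracts two huge, nearly equal numbers.
naive_std() {
    awk '{ sum += $1; sumsq += $1 * $1 }
         END { print sqrt(sumsq / NR - (sum / NR) ^ 2) }' "$1"
}

# Running-update form: only ever works with small deviations from the mean.
welford_std() {
    awk '{ delta = $1 - avg; avg += delta / NR; mean2 += delta * ($1 - avg) }
         END { print sqrt(mean2 / NR) }' "$1"
}

naive_std big.txt     # unreliable here; the exact output depends on rounding
welford_std big.txt   # prints 0.816497
```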
Sources: utah.edu/awk , commandlinefu.com/standard deviation with awk
I replaced NR with (NR-1) in the final division to get the right result:
awk '{delta = $1 - avg; avg += delta / NR; mean2 += delta * ($1 - avg); } END { print sqrt(mean2 / (NR-1)); }' file
Indeed! Thanks for the input.
That’s only if it’s a “sample” standard deviation (i.e. the data is a sample and does not cover the entire population).
Indeed! Dividing by (N-1) gives the sample standard deviation, used when the data is only a sample of a larger population, while dividing by N gives the population standard deviation, used when the data covers the whole population.
Thanks for pointing that out, as this was written a long time ago!
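Both variants can be printed in one pass, since they differ only in the final divisor. This is a minimal sketch: the file values.txt, its contents, and the function name std_both are made up for illustration, and the NR guards are an addition that skips a result when there are too few rows to compute it.

```shell
# Hypothetical data: mean 5, sum of squared deviations 32.
printf '2\n4\n4\n4\n5\n5\n7\n9\n' > values.txt

std_both() {
    awk '{
        delta = $1 - avg              # deviation from the running mean
        avg  += delta / NR            # update the running mean
        m2   += delta * ($1 - avg)    # running sum of squared deviations
    }
    END {
        if (NR > 0) printf "population: %.4f\n", sqrt(m2 / NR)        # divide by N
        if (NR > 1) printf "sample: %.4f\n", sqrt(m2 / (NR - 1))      # divide by N-1
    }' "$1"
}

std_both values.txt
# population: 2.0000
# sample: 2.1381
```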