Mean value and standard deviation of a column using awk

To get the mean of column 1 (replace $1 with $N for any other column), run:
$ awk 'BEGIN{s=0;}{s=s+$1;}END{print s/NR;}' file
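
As a minimal sketch of the "any other column" case, the column number can be passed in as an awk variable instead of being edited into the program. The variable name col and the three sample rows are assumptions made for illustration; the NR > 0 guard only avoids a division by zero on empty input.

```shell
# Hypothetical input: a label in column 1, a value in column 2.
# col is an illustrative variable name; -v sets it before the program runs.
mean=$(printf '%s\n' 'a 10' 'b 20' 'c 60' |
  awk -v col=2 '{ s += $col } END { if (NR > 0) print s / NR }')
echo "$mean"
```

Changing col=2 to another number selects a different column without touching the awk program itself.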

To get the (population) standard deviation of column 1, run:
$ awk '{sum+=$1; sumsq+=$1*$1} END {print sqrt(sumsq/NR - (sum/NR)^2)}' file

or

$ awk '{delta = $1 - avg; avg += delta / NR; mean2 += delta * ($1 - avg); } END { print sqrt(mean2 / NR); }' file

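The second option (Welford's online algorithm) behaves better with large data sets or large values. It never builds a large sum of squares, so it is less exposed to overflow and to the loss of precision that occurs when two nearly equal large numbers are subtracted.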
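
The difference between the two approaches can be sketched on a small, made-up data set whose values sit on a large offset. The four numbers below are an assumption chosen for illustration, and the first command clamps a negative variance to 0 (which the one-liner above does not do) so that sqrt is never called on a negative number.

```shell
# Hypothetical data: four values sitting on a large offset of 1e9.
data=$(printf '%s\n' 1000000004 1000000007 1000000013 1000000016)

# Sum-of-squares formula: the two terms are about 1e18 and nearly equal.
# A negative difference is clamped to 0 here to avoid sqrt of a negative.
naive=$(echo "$data" | awk '{ sum += $1; sumsq += $1 * $1 }
  END { d = sumsq / NR - (sum / NR)^2; print (d > 0 ? sqrt(d) : 0) }')

# Running update: population SD (divide by NR), then sample SD (divide by NR-1).
welford=$(echo "$data" | awk '{ delta = $1 - avg; avg += delta / NR; mean2 += delta * ($1 - avg) }
  END { printf "%.4f %.4f\n", sqrt(mean2 / NR), sqrt(mean2 / (NR - 1)) }')

echo "sum of squares: $naive"
echo "running update: $welford"
```

For these values the exact population standard deviation is sqrt(22.5), about 4.7434. The running update should reproduce it, while the sum-of-squares result will likely be off on awk builds that use double-precision arithmetic.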

Sources: utah.edu/awk , commandlinefu.com/standard deviation with awk

2 Comments

  1. Comment by Jeff Tjon:

    I replaced NR with (NR-1) to get the right result

    awk '{delta = $1 - avg; avg += delta / NR; mean2 += delta * ($1 - avg); } END { print sqrt(mean2 / (NR-1)); }' file

  2. Comment by grigoris:

    Indeed! Thanks for the input.
