Mean value and standard deviation of a column using awk

To get the mean value of column 1 (replace $1 with another field, such as $2, for any other column), type:
$ awk 'BEGIN{s=0;}{s=s+$1;}END{print s/NR;}' file
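
If you need this for different columns, the field number can be passed in as a variable instead of editing the script. The following is only a sketch: the function name col_mean and the variable name col are made up for illustration, and it assumes whitespace-separated numeric columns.

```shell
# col_mean COL [FILE...]: print the mean of column COL.
# (col_mean and col are hypothetical names chosen for this example.)
col_mean() {
    col="$1"; shift
    awk -v col="$col" '
        { s += $col }                      # accumulate the chosen column
        END { if (NR > 0) print s / NR }   # print nothing on empty input
    ' "$@"
}
```

For example, "col_mean 3 file" prints the mean of the third column, and with no file argument the function reads from standard input.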

To get the standard deviation of column 1, type:
$ awk '{sum+=$1; sumsq+=$1*$1} END {print sqrt(sumsq/NR - (sum/NR)^2)}' file

or

$ awk '{delta = $1 - avg; avg += delta / NR; mean2 += delta * ($1 - avg); } END { print sqrt(mean2 / NR); }' file

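The second option works better on large data sets: it updates a running mean and a running sum of squared deviations instead of accumulating large sums of squares, so it is less prone to overflow and to loss of precision.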
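
Both commands above divide by NR, which gives the population standard deviation; dividing by NR-1 instead gives the sample standard deviation. As a rough sketch, the running-update approach can report the mean and both variants in one pass. The function name col_stats and the output labels are invented for this example, and it assumes column 1 holds at least two numeric values.

```shell
# col_stats [FILE...]: print mean, population SD and sample SD of column 1.
# (col_stats and the output labels are hypothetical, for illustration only.)
col_stats() {
    awk '
        {
            delta = $1 - avg            # deviation from the running mean
            avg  += delta / NR          # update the running mean
            m2   += delta * ($1 - avg)  # running sum of squared deviations
        }
        END {
            if (NR < 2) exit 1          # sample SD needs at least 2 values
            printf "mean=%g pop_sd=%g sample_sd=%g\n",
                   avg, sqrt(m2 / NR), sqrt(m2 / (NR - 1))
        }
    ' "$@"
}
```

Run it as "col_stats file". It exits with status 1 and prints nothing when there are fewer than two lines of input.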

Sources: utah.edu/awk, commandlinefu.com/standard deviation with awk

4 Comments

  1. Comment by Jeff Tjon:

    I replaced NR with (NR-1) to get the right result

    awk '{delta = $1 - avg; avg += delta / NR; mean2 += delta * ($1 - avg); } END { print sqrt(mean2 / (NR-1)); }' file

  2. Comment by grigoris:

    Indeed! Thanks for the input.

  3. Comment by Antonio Sanchez:

    That’s only if it’s a “sample” standard deviation (i.e. the data does not represent the entire dataset).

  4. Comment by grigoris:

    Indeed! Dividing by (N-1) gives the sample standard deviation, for when the data are only a sample of the population, while dividing by N gives the population standard deviation, for when the data cover the entire population.
    Thanks for pointing that out, as this was written a long time ago!
