Tag Archive


Merging catalogs and creating unique identifier in bash

For a project I had created a number of photometric catalogs, each corresponding to a specific observing field. I wanted to construct the final (merged) catalog, but for this I needed to add a unique source identifier at the beginning of each row. I decided to create an F#-**** tag for each source, with “F#” corresponding to the field id and **** to a per-field source counter. The final command was:

for i in {1,2,4,5,6,7,8,9,10,11,12,13,16};do echo F$i.matches.all.cat;awk -v id="$i" 'FNR>1 {print "F"id"-"1+c++, $0}' F$i.matches.all.cat >> results.tmp; done

The loop runs over the field numbers for which a catalog named F*.matches.all.cat exists. The number of each field ($i) is passed to awk as an external variable (id), and awk uses it to build the unique identifier “Fid-counter”. The incremental counter (1+c++) is effectively the row number: c starts at 0, so 1+c++ begins from 1, and the condition FNR>1 skips the first line of each catalog, which is a column description. All results are appended to the output file results.tmp, which is created automatically if it does not exist. Because the output is appended, remove any old results.tmp before rerunning the loop.

Then, we can use sed to add the header:

sed -i '1i\#SourceID ...' results.tmp
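The whole procedure can be tried on toy data. The following is only a minimal sketch: the two small catalogs, their column names (RA, Dec, Mag) and the temporary directory are hypothetical stand-ins, not the real catalogs. It writes the header first instead of inserting it afterwards with sed, and it uses FNR-1 as the counter, which gives the same numbering as 1+c++ when each catalog has exactly one header line.

```shell
# Hypothetical demo data: two tiny catalogs, each with one header line.
workdir=$(mktemp -d)
printf '#RA Dec Mag\n10.1 -72.3 15.2\n10.2 -72.4 16.8\n' > "$workdir/F1.matches.all.cat"
printf '#RA Dec Mag\n11.5 -73.0 14.1\n' > "$workdir/F2.matches.all.cat"

# Write the header first, so no in-place edit is needed afterwards.
printf '#SourceID RA Dec Mag\n' > "$workdir/results.tmp"

for i in 1 2; do
    # FNR>1 skips the header of each catalog; (FNR-1) numbers the sources from 1.
    awk -v id="$i" 'FNR>1 {print "F" id "-" (FNR-1), $0}' \
        "$workdir/F$i.matches.all.cat" >> "$workdir/results.tmp"
done

cat "$workdir/results.tmp"
```

With this toy input the merged file has one header line followed by the rows tagged F1-1, F1-2 and F2-1.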

Log and awk

The log function of awk returns the natural logarithm of its input:

...$ awk 'BEGIN{print log(100)}'
...$ 4.60517

So, to obtain the base-10 logarithm (or the logarithm in any other base), we just need to divide the result by the natural logarithm of the base:

...$ awk 'BEGIN{print log(100)/log(10)}'
...$ 2
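The same change of base can be wrapped in a small shell function. This is just a sketch: the function name logb and its argument order (value first, base second) are my own choices for illustration, not something awk provides.

```shell
# logb VALUE BASE: logarithm of VALUE in the given BASE,
# computed as log(VALUE)/log(BASE) with awk's natural log.
logb() {
    awk -v x="$1" -v b="$2" 'BEGIN { print log(x) / log(b) }'
}

logb 100 10
logb 8 2
```

The values are passed with -v, so no shell text is pasted into the awk program itself.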

External variable inside awk

Sometimes a shell variable is needed inside awk (within a for or while loop, for example). To pass the variable to awk, use the -v option and quote the value:
awk -v i="$i" '{ .. }' test
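As a small illustration of -v inside a loop, the sketch below counts how many rows of a file have a given value in column 1. The data file and the helper name count_matches are hypothetical, made up for this example.

```shell
# Hypothetical demo file: the first column holds the value to match.
data=$(mktemp)
printf '1 a\n2 b\n3 c\n2 d\n' > "$data"

# count_matches VALUE FILE: number of rows whose first column equals VALUE.
count_matches() {
    # n+0 forces a numeric 0 when no row matched.
    awk -v i="$1" '$1 == i {n++} END {print n+0}' "$2"
}

for i in 1 2 3; do
    echo "value $i: $(count_matches "$i" "$data") row(s)"
done
```

Inside the awk program the variable is used as plain i, without the shell's $ sign.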

Mean value and standard deviation of a column using awk

In order to get the mean value of column 1 (or any other) you type:
$ awk 'BEGIN{s=0;}{s=s+$1;}END{print s/NR;}' file

To get the standard deviation of column 1, type either:
$ awk '{sum+=$1; sumsq+=$1*$1} END {print sqrt(sumsq/NR - (sum/NR)^2)}' file

or:

$ awk '{delta = $1 - avg; avg += delta / NR; mean2 += delta * ($1 - avg); } END { print sqrt(mean2 / NR); }' file

Both commands give the population standard deviation (dividing by N rather than N-1). The second option updates a running mean instead of accumulating large sums, so it behaves better with large data sets or large values, where the first can lose precision or overflow.
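For a quick check, the running-mean approach can report the count, mean and standard deviation in one pass. This is a sketch under some assumptions: the function name colstats and the sample values are made up for illustration, and the empty-input guard is my addition.

```shell
# colstats FILE [COLUMN]: print "N mean stddev" for the given column
# (default 1), using the running-mean update and dividing by N.
colstats() {
    awk -v col="${2:-1}" '
        {
            delta = $col - avg          # distance from the old mean
            avg  += delta / NR          # updated running mean
            m2   += delta * ($col - avg)
        }
        END {
            if (NR == 0) { print "0 nan nan"; exit }   # guard for empty input
            printf "%d %.3f %.3f\n", NR, avg, sqrt(m2 / NR)
        }' "$1"
}

# Hypothetical sample: eight values with mean 5 and population stddev 2.
sample=$(mktemp)
printf '%s\n' 2 4 4 4 5 5 7 9 > "$sample"
colstats "$sample"
```

For this sample the sum of squared deviations is 32, so the standard deviation is sqrt(32/8) = 2.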

Sources: utah.edu/awk , commandlinefu.com/standard deviation with awk