Weird awk error when messing around with making a GFF from a TXT file
23 January, 2017
I hit a strange error while working with a text file exported from STATA on Windows. When I used awk to build the 9-column GFF file for SignalMap, the output came out garbled. The original data looks like this:
CHR1 101 .17989999
CHR1 151 .083400011
CHR1 301 -.125
CHR1 451 0
CHR1 501 .16670001
CHR1 601 .69999999
CHR1 651 .33329999
CHR1 751 .75
CHR1 801 0
CHR1 901 .25099999
And when you try to create the GFF file with awk, the output comes out garbled like this:
$ head sample.txt | awk '{if($3>=0) print $1"\t.\tSAMPLE\t"$2"\t"$2+49"\t"$3"\t.\t.\t."}'
CHR1 . .AMPLE .01 150 .17989999
CHR1 . .AMPLE .51 200 .083400011
CHR1 . .AMPLE .51 500 0
CHR1 . .AMPLE .01 550 .16670001
CHR1 . .AMPLE .01 650 .69999999
CHR1 . .AMPLE .51 700 .33329999
CHR1 . .AMPLE .51 800 .75
CHR1 . .AMPLE .01 850 0
CHR1 . .AMPLE .01 950 .25099999
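After 10 minutes of banging my head on the table I realised it was probably down to Windows/Unix line endings. Windows ends each line with a carriage return followed by a line feed, so the last field ($3) still carries a trailing carriage return. When the terminal prints it, the cursor jumps back to the start of the line and the remaining columns overwrite what was already there. Converting the file with dos2unix solved it: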
$ dos2unix -n sample.txt sample_new.txt
$ head sample_new.txt | awk '{if($3>=0) print $1"\t.\tSAMPLE\t"$2"\t"$2+49"\t"$3"\t.\t.\t."}'
CHR1 . SAMPLE 101 150 .17989999 . . .
CHR1 . SAMPLE 151 200 .083400011 . . .
CHR1 . SAMPLE 451 500 0 . . .
CHR1 . SAMPLE 501 550 .16670001 . . .
CHR1 . SAMPLE 601 650 .69999999 . . .
CHR1 . SAMPLE 651 700 .33329999 . . .
CHR1 . SAMPLE 751 800 .75 . . .
CHR1 . SAMPLE 801 850 0 . . .
CHR1 . SAMPLE 901 950 .25099999 . . .
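If dos2unix is not installed, something along these lines should also work. This is only a rough sketch, not what I actually ran: it builds a small throwaway file with Windows line endings (the file comes from mktemp and the variable names are made up for the example), counts the lines that contain a carriage return, and then strips the carriage return inside awk before printing the 9 columns.

```shell
#!/bin/sh
# Throwaway demo file with Windows (CRLF) line endings; the path comes from
# mktemp and is purely illustrative.
demo=$(mktemp)
printf 'CHR1 101 .17989999\r\nCHR1 301 -.125\r\nCHR1 451 0\r\n' > "$demo"

# Count lines containing a carriage return. Anything above 0 suggests the
# file still has Windows line endings.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr" "$demo")
echo "lines with CR: $cr_lines"

# Strip the trailing carriage return inside awk (this also re-splits the
# fields), then print the 9-column GFF line for non-negative scores.
gff=$(awk '{ sub(/\r$/, "") }
           $3 >= 0 { print $1 "\t.\tSAMPLE\t" $2 "\t" ($2 + 49) "\t" $3 "\t.\t.\t." }' "$demo")
printf '%s\n' "$gff"

rm -f "$demo"
```

The grep count is a quick sanity check to run on any file that has come from a Windows machine before feeding it to awk.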