
Compare data in four consecutive lines using awk


All times are UTC - 6 hours


Posted: Fri May 13, 2016 3:20 pm

Joined: Fri May 13, 2016 3:19 pm
Posts: 1
Hi

I have a trajectory file that looks like this:
3LIG C1 1 -0.531 3.372 3.189
3LIG C1 2 -0.598 3.333 3.325
3LIG O2 3 -0.521 3.246 3.124
3LIG O2 6 -0.596 3.194 3.331
4LIG C1 12 -0.471 -1.170 3.326
4LIG C1 13 -0.483 -1.195 3.179
4LIG O2 14 -0.533 -1.043 3.347
4LIG O2 17 -0.589 -1.105 3.143
14LIG C1 23 3.300 -1.089 3.161
14LIG C1 24 3.279 -0.942 3.180
14LIG O2 25 3.258 -1.145 3.277
14LIG O2 28 3.236 -0.925 3.312
15LIG C1 34 1.808 3.160 3.227
15LIG C1 35 1.722 3.285 3.230
15LIG O2 36 1.933 3.216 3.240
15LIG O2 39 1.792 3.386 3.178
16LIG C1 45 -3.325 -0.288 3.188
16LIG C1 46 -3.197 -0.365 3.199
16LIG O2 47 -3.276 -0.156 3.176
16LIG O2 50 -3.114 -0.297 3.114
19LIG C1 56 -3.643 -0.138 3.289
19LIG C1 57 -3.616 0.009 3.313
19LIG O2 58 -3.575 -0.193 3.402
19LIG O2 61 -3.492 0.018 3.378
22LIG O2 67 -4.063 -2.776 3.958
26LIG C1 72 -1.888 -3.464 3.919
29LIG C1 75 1.965 4.140 5.273
29LIG O2 76 2.085 4.063 5.253
29LIG O2 78 2.054 4.159 5.054
31LIG C1 81 -3.715 -0.470 3.157
31LIG C1 82 -3.731 -0.522 3.297
31LIG O2 83 -3.794 -0.567 3.094
31LIG O2 86 -3.867 -0.562 3.303
33LIG C1 92 -2.117 4.064 3.277
33LIG C1 93 -1.987 4.078 3.354
33LIG O2 94 -2.068 4.043 3.145
33LIG O2 97 -1.890 4.091 3.254
35LIG C1 103 -1.360 -1.957 3.171
35LIG C1 104 -1.351 -1.970 3.325
35LIG O2 105 -1.226 -1.939 3.132
35LIG O2 108 -1.216 -2.019 3.338
36LIG C1 114 -3.480 -4.514 3.349
36LIG C1 115 -3.332 -4.523 3.349
36LIG O2 116 -3.507 -4.397 3.273
36LIG O2 118 -3.288 -4.446 3.241
42LIG C1 120 0.413 -2.912 3.190
42LIG C1 121 0.438 -2.781 3.124
42LIG O2 122 0.529 -2.923 3.272
42LIG O2 125 0.578 -2.785 3.098
47LIG C1 131 -2.571 -0.985 3.402
47LIG C1 132 -2.448 -0.902 3.413
47LIG O2 133 -2.620 -0.955 3.271
47LIG O2 136 -2.409 -0.890 3.281

The file consists of repeating sets of 4 lines whose first-column values are exactly the same. Some lines do not follow this pattern.

How do I get rid of the lines that do not follow the pattern, using awk or sed?

Kindly help me out with this.

Thanks in advance.

- Aniruddha M Dive


Posted: Mon May 16, 2016 11:49 am

Joined: Mon Oct 20, 2014 9:53 am
Posts: 574
Code:
data_file=./datafile


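sort "$data_file" | \
   awk 'BEGIN { # Read the first record by hand and start the first group with it
                counter = 1
                getline
                array_records[counter] = $0
                last_record = $1
              }

         { if (last_record == $1) {
              # Same first field: append the record to the current group
              counter++
              array_records[counter] = $0
           }
           else {
              # First field changed: print the finished group only if it has 4 records
              if (length(array_records) == 4) {
                   for (e = 1; e <= 4; e++)
                       print array_records[e]
              }
              # Start a new group with the current record
              delete array_records
              counter = 1
              array_records[counter] = $0
              last_record = $1
           }
         }

         END { # The last group is still buffered when the input ends
               if (length(array_records) == 4)
                   for (e = 1; e <= 4; e++)
                       print array_records[e]
             }
       '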
First we sort the data file; this may not be necessary if the lines are already grouped by the first column.
The result is piped to awk, where two blocks do the main work.
The first block, BEGIN{...}, initializes a counter, reads the first line with getline, and saves it in the array array_records. Finally, the first field $1 is saved in the variable last_record, which is used later for comparison.

The second block starts with an if. As long as the first field matches last_record, we just increase the counter and save the record in the array.
If the record read differs from the last one, we check whether the stored group has 4 records. If so, we print the array in a for loop, one entry per line.
Then we reinitialize the variables with the current record and keep going.
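
If the file is already grouped by the first column, as the sample appears to be, the same buffering idea can be sketched without sort and without length(). This is only a minimal sketch: the names filter_groups_of_four, emit_group, buffer and last_key are made up for illustration, and it assumes that "following the pattern" means exactly 4 consecutive lines with the same first field.

```shell
# Hypothetical wrapper: reads records on stdin, prints only complete groups of 4.
filter_groups_of_four() {
    awk '
        # Print the buffered group if it holds exactly 4 records, then reset it.
        function emit_group(   i) {
            if (counter == 4)
                for (i = 1; i <= 4; i++)
                    print buffer[i]
            counter = 0
        }

        # The first field changed, so the previous group is complete.
        NR > 1 && $1 != last_key { emit_group() }

        # Buffer the current record and remember its first field.
        { buffer[++counter] = $0; last_key = $1 }

        # The last group is still buffered at end of input.
        END { emit_group() }
    '
}

# Excerpt of the sample data: 22LIG and 26LIG have one line each and are dropped.
filter_groups_of_four <<'EOF'
19LIG C1 56 -3.643 -0.138 3.289
19LIG C1 57 -3.616 0.009 3.313
19LIG O2 58 -3.575 -0.193 3.402
19LIG O2 61 -3.492 0.018 3.378
22LIG O2 67 -4.063 -2.776 3.958
26LIG C1 72 -1.888 -3.464 3.919
EOF
```

Because the records are printed by index 1 to 4, they come out in the order they were read, and the original line order of the file is kept since nothing is sorted.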

Note that calling length(someArray) on an array started out as a gawk extension.
It may not work in other awks. In that case, just write a function that counts the array elements.
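
Such a count function could look roughly like this. It is a sketch only: array_count and count_demo are made-up names, and the demo simply counts the records it has stored.

```shell
# Hypothetical demo wrapper: stores every input line, then prints how many were stored.
count_demo() {
    awk '
        # Count the elements of arr by walking its keys; key and n are locals.
        function array_count(arr,   key, n) {
            n = 0
            for (key in arr)
                n++
            return n
        }

        { records[NR] = $0 }

        END { print array_count(records) }
    '
}

printf 'a\nb\nc\n' | count_demo
```

In the script above, array_count(array_records) would then take the place of length(array_records).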

