Register
It is currently Thu Aug 21, 2014 2:00 pm

Parse CSV File


All times are UTC - 6 hours


Post new topic Reply to topic  [ 6 posts ] 
Author Message
 PostPosted: Sat Jul 31, 2010 7:02 pm   

Joined: Sat Jul 31, 2010 6:50 pm
Posts: 5
My task is to parse a csv file and remove certain plugin id's (there are multiples to remove - determined to be false positives) which are located in column one but I still need to keep the first row headings and then write the results minus the plugins removed (entire rows) to a new file.
I also need to parse the same file above for the same plugin id's that are excluded and write all of those to a new file as well.

Any help would be greatly appreciated.

Here is some sample data (nessus scan results):

Code:
"Id","Port","Protocol","Address","Severity","Plugin","Family","FirstSeen","LastSeen","Raw-Offset","Raw-Length","Desc","Plugin Output","CVE","Bugtraq","Conversation"
"10180","0","tcp","xxx.xxx.xxx.xxx","Low","Ping the remote host","Port scanners"," on 2010-07-03"," on 2010-07-03","2953397","110","The remote host is up  "," The remote host replied to a TCP SYN packet (built-in port list). ","NOCVE","NOBID",""
"10114","0","icmp","xxx.xxx.xxx.xxx","Low","ICMP Timestamp Request Remote Date Disclosure","General"," on 2010-07-03"," on 2010-07-03","2952825","572","Synopsis :  It is possible to determine the exact time set on the remote host.  Description :  The remote host answers to an ICMP timestamp request.  This allows an attacker to know the date which is set on your machine.   This may help him to defeat all your time based authentication protocols.  Solution :  Filter out the ICMP timestamp requests (13)  and the outgoing ICMP timestamp replies (14).  Risk factor :  None   "," The remote clock is synchronized with the local clock.  ","CVE-1999-0524","NOBID",""


Here is what I have so far.
Script for stripping certain "Plugin-IDs" from the list:
Code:
# Variables
ORIG=original_file_vulndata.csv
NEW=original_file_vulndata_clean.csv

egrep -v "40802|40803|40804|40805|42119|43875|44643|4550|44060|18405" $ORIG > $NEW


Script for stripping certain "Plugin-IDs" from the list and placing them in their own file:
Code:
ORIG=original_file_vulndata.csv
NEW=original_file_vulndata_exceptions.csv

egrep "Raw-Offset|40802|40803|40804|40805|42119|43875|44643|4550|44060|18405" $ORIG > $NEW


Now the problem, with the code above, is that some rows contain the search string elsewhere in the data set (ie... not just in the plugin-id column) which in turn is removing data that should not be.
I need the script to search only the plugin-id column (first column) for the search strings in question. Sed or awk could do this but I stink at those two.


Top
 Profile  
 PostPosted: Sat Jul 31, 2010 9:01 pm   

Joined: Mon Mar 02, 2009 3:03 am
Posts: 535
Hi,

this should do it
Code:
egrep -v '^"(40802|40803|40804|40805|42119|43875|44643|4550|44060|18405)"' $ORIG > $NEW


Top
 Profile  
 PostPosted: Sun Aug 01, 2010 12:34 am   

Joined: Sat Jul 31, 2010 6:50 pm
Posts: 5
@Watael

Thanks but I got strange results when I tried your suggestion. I ended up going with this:

Code:
egrep -v "^(.40802|.40803|.40804|.40805|.42119|.43875|.44643|.4550|.44060|.18405)" $ORIG > $NEW


I think the " " in the csv file are causing the errors and hence the reason I just used the . as a wildcard.


Top
 Profile  
 PostPosted: Sun Aug 01, 2010 3:59 am   
User avatar

Joined: Sun Jun 27, 2010 12:57 am
Posts: 192
BKJ wrote:
I ended up going with this:
Code:
egrep -v "^(.40802|.40803|.40804|.40805|.42119|.43875|.44643|.4550|.44060|.18405)" $ORIG > $NEW
I think the " " in the csv file are causing the errors and hence the reason I just used the . as a wildcard.

You can put the wildcard character outside the group, ie:
Code:
egrep -v "^.(num1|num2|num3)"

I would suggest looking into Watael's solution though, since a short number like '4550' will in your case also match a larger number like '45500' or '45509'.
Could you please try the following?
Code:
egrep -v "^\"(40802|40803|40804|40805|42119|43875|44643|4550|44060|18405)\"" $ORIG > $NEW

This should better match the numbers, now that they are enclosed in double-quotes.


Top
 Profile  
 PostPosted: Sun Aug 01, 2010 6:27 am   

Joined: Mon Mar 02, 2009 3:03 am
Posts: 535
Hi BKJ,

Quote:
I got strange results when I tried your suggestion

because you didn't copy exactly what I wrote!

it's
single-quote circumflex-accent double-quote ...


Top
 Profile  
 PostPosted: Sun Aug 01, 2010 10:15 am   

Joined: Sat Jul 31, 2010 6:50 pm
Posts: 5
@Watael - I swear I did because it was a copy and paste. :)

@Patsie - Thanks for the suggestion
Code:
egrep -v "^.(num1|num2|num3)"

I did end up using it.

Thank you all for your suggestions.


Top
 Profile  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 6 posts ] 

All times are UTC - 6 hours


Who is online

Users browsing this forum: No registered users and 9 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Jump to:  


BashScripts | Promote Your Page Too
Powered by phpBB © 2011 phpBB Group
© 2003 - 2011 USA LINUX USERS GROUP