
Pattern matching


All times are UTC - 6 hours


Posted: Tue Jan 18, 2011 6:27 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
Hi,
I need some help with a pretty basic script.
I'm trying to monitor a text file (a non-standard log file, actually) for a field whose value DOESN'T match an expected pattern.
E.g.
My log looks similar to this:
INFO [ajp-8009-532] [3B3B05] 05 Jan 2011 15:18:06,175 txtype=null type=null ddx=null format=XXX XXX XXX XXX ssx=14345

I need to ensure "format" always equals XXX XXX XXX XXX, and if at any time it doesn't, I need to send an email reporting the name of the file and the line number(s) where it was found.

I've been using grep and awk, but I seem to be making this far more complicated than it should be; I think it can be done in far fewer lines.

Does anyone have any suggestion how I might tackle this task?


Posted: Tue Jan 18, 2011 8:37 am

Joined: Mon Mar 02, 2009 3:03 am
Posts: 511
Hi,

This could easily be rewritten in pure awk, but here it is in bash:
Code:
#!/bin/bash

logFile= #put your own value
mailSubject= #put your own value
yourUser= #put your own value
yourMailServer= #put your own value

while read -r logLine
do ((++nbline))
   format="${logLine#*format=}"
   format="${format% *}"
   if [[ ! $format =~ ^.{3}\ .{3}\ .{3}\ .{3}$ ]]
   then echo "${logFile}: Error line n° $nbLine"
   fi
done < "$logFile" | mail -s "${mailSubject}" ${yourUser}@${yourMailServer}


Last edited by Watael on Tue Jan 18, 2011 10:04 am, edited 1 time in total.
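The script above pipes the loop's output into mail unconditionally, so a clean log can still trigger an (empty) alert; how an empty body is handled depends on the mail implementation. A minimal sketch that only sends when there is something to report. `send_report` is a hypothetical helper name, not from the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mail the report only if it contains at least one line.
#   $1 = report text, $2 = subject, $3 = recipient address
send_report() {
    local report=$1 subject=$2 recipient=$3
    if [[ -n $report ]]; then
        printf '%s\n' "$report" | mail -s "$subject" "$recipient"
    fi
}
```

Usage, assuming the loop's output is first captured into a variable instead of being piped straight to mail: report=$(while read -r logLine; do ...; done < "$logFile") followed by send_report "$report" "$mailSubject" "${yourUser}@${yourMailServer}".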

Posted: Tue Jan 18, 2011 8:57 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
Wow.
You make me feel stupid.
Very elegant compared to my useless attempt.
Thank you.

I'm basically trying to ensure that the value is masked, as in format=XXX XXX XXX XXX (this masks some numbers),
instead of exposing the numbers, as in format=1234 5678 9012 345. It must always be masked.
How would I define this in your example?


Posted: Tue Jan 18, 2011 10:03 am

Joined: Mon Mar 02, 2009 3:03 am
Posts: 511
Is it always three groups of four digits and one group of three digits?
Code:
[[ ! $format =~ ^[0-9]{4}\ [0-9]{4}\ [0-9]{4}\ [0-9]{3}$ ]]


man bash says:
Quote:
When =~ is used with [[, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3))
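A small sketch of =~ with the digit pattern from the post above. Keeping the regex in a variable avoids escaping every space, and the expansion must stay unquoted on the right-hand side: since bash 3.2, quoting it makes bash match the text literally instead of as a regex. The names `re` and `is_exposed` are illustrative, not from the thread.

```shell
#!/usr/bin/env bash
# ERE for an unmasked 4-4-4-3 digit number, stored in a variable so the
# spaces need no backslashes.
re='^[0-9]{4} [0-9]{4} [0-9]{4} [0-9]{3}$'

# is_exposed VALUE: succeeds if VALUE matches the digit pattern.
is_exposed() {
    [[ $1 =~ $re ]]     # $re unquoted: treated as a regex, not a literal string
}

for value in '1234 5678 9012 345' 'XXXX XXXX XXXX XXX'; do
    if is_exposed "$value"; then
        echo "exposed: $value"
    else
        echo "masked:  $value"
    fi
done
```

For the masked form discussed later in the thread, the same test works with X in place of [0-9].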


Posted: Tue Jan 18, 2011 10:29 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
Sorry, my mistake (typo).
It must always be
XXXX XXXX XXXX XXX (4,4,4,3).
It must always be X's and never numbers.
If it's X's as above, then all is good;
if it contains any number, I need to be alerted with the log file name and line number.


Posted: Tue Jan 18, 2011 11:10 am

Joined: Mon Mar 02, 2009 3:03 am
Posts: 511
Then simply replace every [0-9] with X.


Posted: Wed Jan 19, 2011 2:56 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
Hi,
I have done as you suggested but I'm just getting:
/var/log/gw.log: Error line no
This is echoed many times, I suspect once for each error it finds.
My script looks like this (I've taken out the email part for now):
I'm running it against a log that has no errors for now, so no errors should be detected.
Code:
#!/bin/bash

logFile=/var/log/gw.log
mailSubject=Alert
yourUser=john
yourMailServer=mail.domain.tld

while read -r logLine
do ((++nbline))
   format="${logLine#*format=}"
   format="${format% *}"
   if [[ ! $format=~ ^[X]{4}\ [X]{4}\ [X]{4}\ [X]{3}$ ]]
   then echo "${logFile}: Error line n° $nbLine"
   fi
done < "$logFile"


Posted: Wed Jan 19, 2011 3:43 am

Joined: Mon Mar 02, 2009 3:03 am
Posts: 511
I said to replace [0-9] with X :)
Code:
if [[ ! $format =~ ^X{4}\ X{4}\ X{4}\ X{3}$ ]]
#              ^space is important
then echo "${logFile}: Error line n° $nbLine"
fi

Also, the counter should be:
Code:
((++nbLine))
My bad.
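The counter typo (nbline vs nbLine) went unnoticed because bash variable names are case-sensitive and an unset variable silently expands to an empty string. A sketch of how set -u (nounset) turns that kind of typo into a visible error; the variable names mirror the thread, the rest is illustrative.

```shell
#!/usr/bin/env bash
set -u   # treat expansion of an unset variable as an error

nbLine=0
((++nbLine))              # increments nbLine (capital L)
echo "count: $nbLine"

# nbline (lowercase l) was never assigned. The expansion runs in a subshell
# so that the "unbound variable" error ends only the subshell.
if ( echo "count: $nbline" ) 2>/dev/null; then
    echo "typo went unnoticed"
else
    echo "set -u caught the unset variable nbline"
fi
```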


Posted: Wed Jan 19, 2011 4:08 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
Thanks for the update, but it's still not working.
I have:
Code:
#!/bin/bash

logFile=/srv/gw.log
mailSubject=Alert
yourUser=john
yourMailServer=mail.domain.tld

while read -r logLine
do ((++nbLine))
   format="${logLine#*format=}"
   format="${format% *}"
   if [[ ! $format=~ ^X{4}\ X{4}\ X{4}\ X{3}$ ]]
   then echo "${logFile}: Error line no: ((++nbLine))"
   fi
done < "$logFile"


What does logLine (in while read -r logLine) do?
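In while read -r logLine, read takes one line from the loop's standard input (here the file redirected in with done < "$logFile") and stores it in the variable logLine; the loop stops when read reaches end of file. A sketch of the same loop in a slightly more defensive form; number_lines is an illustrative name.

```shell
#!/usr/bin/env bash
# number_lines FILE: print every line of FILE prefixed with its line number.
#   -r    keeps backslashes literal instead of treating them as escapes
#   IFS=  stops read from trimming leading/trailing whitespace
#   || [[ -n $logLine ]]  also handles a last line with no trailing newline,
#                         which a plain "while read" loop would skip
number_lines() {
    local logLine nbLine=0
    while IFS= read -r logLine || [[ -n $logLine ]]; do
        ((++nbLine))
        printf '%d: %s\n' "$nbLine" "$logLine"
    done < "$1"
}
```

Usage: number_lines /srv/gw.log prints each log line with its number, which is the same counter the scripts in this thread report.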


Posted: Wed Jan 19, 2011 4:24 am

Joined: Mon Mar 02, 2009 3:03 am
Posts: 511
How does it not work? Does it throw any error message?

You still don't have a space between $format and =~.

If you want to see what's happening, add set -x under the shebang.


Posted: Wed Jan 19, 2011 5:04 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
Hi, apologies.
The space is there; I must have messed it up somehow when transferring the script to the forum.
I left only one line in the log.
I ran bash -x test.sh, and it appears to be trying to match the whole rest of the line after format=.

Code:
bash -x test.sh
+ logFile=/srv/gw.log
+ mailSubject=Alert
+ yourUser=john
+ yourMailServer=mail.domain.tld
+ read -r logLine
+ (( ++nbLine ))
+ format='XXXX XXXX XXXX XXX startdate=XXXX expirydate=XXXX issuenumber=XX cardtype=null cardholder=null cv2=XXXX threedsecurestatus=null cavv=null eci=null xid=null clientipaddress=null paypalcallbackurl=null'
+ format='XXXX XXXX XXXX XXX startdate=XXXX expirydate=XXXX issuenumber=XX cardtype=null cardholder=null cv2=XXXX threedsecurestatus=null cavv=null eci=null xid=null clientipaddress=null'
+ [[ ! XXXX XXXX XXXX XXX startdate=XXXX expirydate=XXXX issuenumber=XX cardtype=null cardholder=null cv2=XXXX threedsecurestatus=null cavv=null eci=null xid=null clientipaddress=null =~ ^X{4} X{4} X{4} X{3}$ ]]
+ echo '/srv/gw.log: Error line no: ((++nbLine))'
/srv/gw.log: Error line no: ((++nbLine))
+ read -r logLine
+ (( ++nbLine ))
+ format=
+ format=
+ [[ ! '' =~ ^X{4} X{4} X{4} X{3}$ ]]
+ echo '/srv/gw.log: Error line no: ((++nbLine))'
/srv/gw.log: Error line no: ((++nbLine))


Posted: Wed Jan 19, 2011 6:25 am

Joined: Mon Mar 02, 2009 3:03 am
Posts: 511
There was not supposed to be anything after ssx=14345, but there may be more, so:
Code:
#!/bin/bash

logFile=/srv/gw.log
mailSubject=Alert
yourUser=john
yourMailServer=mail.domain.tld

while read -r logLine
do ((++nbLine)) # the count is then available as $nbLine
   format="${logLine#*format=}"
#   format="${format% *}" #not used anymore
   if [[ ! $format =~ ^X{4}\ X{4}\ X{4}\ X{3}\  ]] # dropped the $ anchor; added a backslash-escaped space at the end
   then echo "${logFile}: Error line no: $nbLine"
   fi
done < "$logFile"


Posted: Wed Jan 19, 2011 7:32 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
I am now getting the error line number, but it still appears to be trying to match everything from XXXX to the end of the line.

Code:
bash -x test.sh
+ logFile=/srv/gw.log
+ mailSubject=Alert
+ yourUser=john
+ yourMailServer=mail.domain.tld
+ read -r logLine
+ (( ++nbLine ))
+ format='XXXX XXXX XXXX XXX startdate=XXXX expirydate=XXXX issuenumber=XX cardtype=null cardholder=null cv2=XXXX threedsecurestatus=null cavv=null eci=null xid=null clientipaddress=null paypalcallbackurl=null'
+ [[ ! XXXX XXXX XXXX XXX startdate=XXXX expirydate=XXXX issuenumber=XX cardtype=null cardholder=null cv2=XXXX threedsecurestatus=null cavv=null eci=null xid=null clientipaddress=null paypalcallbackurl=null =~ ^X{4} X{4} X{4} X{3}  ]]
+ read -r logLine
+ (( ++nbLine ))
+ format=
+ [[ ! '' =~ ^X{4} X{4} X{4} X{3}  ]]
+ echo '/srv/gw.log: Error line no: 2'
/srv/gw.log: Error line no: 2
+ read -r logLine
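The earlier trace shows why the tail of the line survived: ${format% *} removes only the shortest suffix matching " *", i.e. the last word (paypalcallbackurl=null), whereas %% removes the longest one. A sketch that extracts just the value, assuming, as in the sample lines, that the next field name starts with a lowercase letter and the value itself never contains a space followed by a lowercase word and "="; extract_format is a hypothetical name.

```shell
#!/usr/bin/env bash
# extract_format LINE: print the value of the format= field, or fail if the
# line has no format= field. Assumption: the following field name starts
# with a lowercase letter, as in the sample log lines.
extract_format() {
    local value
    [[ $1 == *format=* ]] || return 1   # no format= field: nothing to extract
    value=${1#*format=}                 # drop everything up to "format="
    # ${value% *} would drop only the last word (shortest match).
    # %% drops the longest match of " <lowercase>...=...", i.e. everything
    # from the first following key=value field to the end of the line.
    value=${value%% [[:lower:]]*=*}
    printf '%s\n' "$value"
}
```

In the read loop, value=$(extract_format "$logLine") || continue skips unrelated lines, and [[ $value == 'XXXX XXXX XXXX XXX' ]] then compares only the value itself.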


Posted: Wed Jan 19, 2011 4:00 pm

Joined: Sun Jun 27, 2010 12:57 am
Posts: 192
My suggestion:
Code:
# The 3-argument form of match() used here is a GNU awk (gawk) extension.
awk '{
  if ( match($0, /format=(.+) ssx=/, format) && format[1] !~ /XXXX XXXX XXXX XXX/ )
    printf("Error in file %s on line #%d \"%s\"\n", FILENAME, NR, format[1]) | "mailx -s \"Error found\" john.doe@example.com";
}' "logfile.txt"

You can of course remove the pipe to mailx to get the output on stdout instead of in your mailbox :)
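The three-argument match() in the suggestion above is specific to GNU awk, so other awks such as mawk may reject it. A sketch of the same check using only POSIX awk features (index, substr, two-argument match with RSTART). It assumes the value ends where the next " key=" field begins (or at end of line) and that the only acceptable value is exactly XXXX XXXX XXXX XXX; check_format is a hypothetical wrapper name.

```shell
#!/bin/sh
# check_format FILE...: report every format= value that is not fully masked.
check_format() {
    awk '
    {
        i = index($0, "format=")
        if (i == 0) next                      # line has no format= field
        value = substr($0, i + 7)             # text after "format=" (7 chars)
        if (match(value, / [^ =]+=/))         # start of the next " key=" field
            value = substr(value, 1, RSTART - 1)
        if (value != "XXXX XXXX XXXX XXX")
            printf("Error in file %s on line #%d \"%s\"\n", FILENAME, FNR, value)
    }' "$@"
}
```

The output can be piped to mailx just as in the original one-liner.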


Posted: Wed Jan 19, 2011 9:23 pm

Joined: Mon Mar 02, 2009 3:03 am
Posts: 511
I have the feeling that I overcomplicated my script a bit.
Here's a simplification:
Code:
#!/bin/bash

logFile=/srv/gw.log
mailSubject=Alert
yourUser=john
yourMailServer=mail.domain.tld

while read -r logLine
do ((++nbLine))
   if [[ $logLine != *format=XXXX\ XXXX\ XXXX\ XXX\ * ]]
# now it tests the whole line against the right-hand glob pattern
   then echo "${logFile}: Error line no: $nbLine"
   fi
done < "$logFile"


Posted: Wed Jan 19, 2011 11:53 pm

Joined: Sun Jun 27, 2010 12:57 am
Posts: 192
But won't this also report an error for lines that contain log entries from completely different applications? It works nicely only if there is absolutely no other data in the logfile except the lines you want to check.
If the logfile contains more than that, you really need two checks (is this a line from the app I want to check, and does the format match?).
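A sketch of those two checks in bash, reusing the glob test from the simplified script: lines without format= are skipped, and a line is reported only when format= is present but not followed by the masked value. It assumes the masked value is either followed by a space or ends the line; check_log is an illustrative name.

```shell
#!/usr/bin/env bash
# check_log FILE: report lines that contain format= but not the masked value.
# Lines without format= (e.g. from other applications) are ignored.
check_log() {
    local logFile=$1 logLine nbLine=0
    while IFS= read -r logLine || [[ -n $logLine ]]; do
        ((++nbLine))
        # check 1: is this a line we care about at all?
        [[ $logLine == *format=* ]] || continue
        # check 2: is the value masked? (glob patterns, spaces escaped)
        [[ $logLine == *format=XXXX\ XXXX\ XXXX\ XXX\ * ]] && continue
        [[ $logLine == *format=XXXX\ XXXX\ XXXX\ XXX ]] && continue
        echo "${logFile}: Error line no: $nbLine"
    done < "$logFile"
}
```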


Posted: Thu Jan 20, 2011 10:10 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
Thanks for all your suggestions.
My log file has many items on one line.
I need to ensure that format=XXXX XXXX XXXX XXX;
if it contains anything other than X's, e.g. numbers, then it needs to alert.
All other text on the line should be ignored and must not be matched.


Posted: Thu Jan 20, 2011 1:21 pm

Joined: Sun Jun 27, 2010 12:57 am
Posts: 192
Have you tried the awk lines in my suggestion? As far as I know, and until proven otherwise, that should work.


Posted: Fri Jan 21, 2011 3:07 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
No, sorry, I haven't yet.
I was on the train when I wrote my reply yesterday, but I will try your suggestion now.


Posted: Fri Jan 21, 2011 9:44 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
Hi, and thanks again for your suggestions.
Watael: your script still tries to match the whole line, but it reports as expected.
Patsie: yours works perfectly!!! YAY!!!

I owe you both a beer for your help.


Posted: Thu Jan 27, 2011 5:20 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
I'd like to extend this script to match a number of patterns.
Basically, I need to check that certain fields are not exposed but masked from view.
If that is not the case, I need to be notified of the file name and line number.
The awk script above works perfectly for one pattern, but I need to extend it.
Unfortunately, I have tried and failed, and was hoping for some assistance.
These are the fields I need to check (they must look exactly like this, otherwise there is a problem):

cardnumber=XXXX XXXX XXXX XXX
cv2=XXXX
<pan>xxxxxxxxxxx</pan>

Here are two sample log entries:

INFO [ajp-8009-264] [049C74B9-1A41-3D21-E4C6-6D1F077CE69C] 27 Jan 2011 08:18:52,709 (BeanHelper.java:98) - vpsprotocol=2.22 txtype=PAYMENT paymenttype=PAYMENT vendor=acme vendorname=null accounttype=null username=null vendortxcode=ABC1234 amount=111.22 currency=GBP description=Loan Repayment vspterminal=null address=null postcode=null billingpostcode=XYZ 123 billingaddress=123 test street deliveryaddress=null deliverypostcode=null contactnumber=null contactfax=null customeremail=null basket=null allowgiftaid=0 applyavscv2=1 billingsurname=null billingfirstnames=null billingaddress1=null billingaddress2=null billingcity=null billingcountry=null billingstate=null billingphone=null deliverysurname=null deliveryfirstnames=null deliveryaddress1=null deliveryaddress2=null deliverycity=null deliverycountry=null deliverystate=null deliveryphone=null referrerid=null billingagreement=null token=null storetoken=null vendorname=accessmu clientnumber=null cardnumber=XXXX XXXX XXXX XXX startdate=XXXX expirydate=XXXX issuenumber=XX cardtype=VISA cardholder=Mr John Doe cv2=XXXX threedsecurestatus=null cavv=null eci=null xid=null clientipaddress=null paypalcallbackurl=null
INFO [Thread-12345] [] 27 Jan 2011 08:34:48,193 (SSLPost.java:115) - VEREQ :<ThreeDSecure><Message id="1234567899776543"><VEReq><version>1.0.2</version><pan>xxxxxxxxxxx</pan><Merchant><acqBIN>12345</acqBIN><merID>12345678901234</merID><password>password</password></Merchant><Browser><deviceCategory>0</deviceCategory><accept></accept><userAgent></userAgent></Browser></VEReq></Message></ThreeDSecure>
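One way to extend the check to several fields, sketched in bash: for each rule, if the field appears on the line, its value must match the masked form, otherwise the line is reported. The masked forms are taken from the list above and are assumptions about the real log format. Note that the first sample line also contains applyavscv2=1, which includes the text cv2=; anchoring each key= with (^| ) keeps it from being mistaken for the cv2 field. check_fields and the array names are hypothetical.

```shell
#!/usr/bin/env bash
# One rule per sensitive field: a line matching present[i] but not masked[i]
# is reported under names[i]. Masked forms are assumptions from the examples.
# "(^| )" anchors key= names, so applyavscv2=1 does not count as cv2=.
names=(cardnumber cv2 pan)
present=(
    '(^| )cardnumber='
    '(^| )cv2='
    '<pan>'
)
masked=(
    '(^| )cardnumber=XXXX XXXX XXXX XXX( |$)'
    '(^| )cv2=XXXX( |$)'
    '<pan>x+</pan>'
)

check_fields() {
    local logFile=$1 logLine nbLine=0 i
    while IFS= read -r logLine || [[ -n $logLine ]]; do
        ((++nbLine))
        for i in "${!names[@]}"; do
            # Regexes stay unquoted so that [[ =~ ]] treats them as EREs.
            if [[ $logLine =~ ${present[i]} && ! $logLine =~ ${masked[i]} ]]; then
                echo "${logFile}: line $nbLine: ${names[i]} is not masked"
            fi
        done
    done < "$logFile"
}
```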


Posted: Thu Jan 27, 2011 6:43 am

Joined: Tue Jan 18, 2011 6:18 am
Posts: 14
In fact, I think it would be easier to just search for a pattern that matches a credit card number.
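A sketch of that idea: grep -noE finds runs of 13 to 19 digits, optionally separated by single spaces or hyphens, and a Luhn checksum (the check-digit scheme used by payment card numbers) then filters out many other long numbers; IDs such as the Message id and merID in the second sample line would also match the digit pattern on their own. This is a heuristic, not a guarantee: unrelated numbers can pass the Luhn check by chance, and numbers split in other ways are missed. luhn_ok and scan_for_pans are hypothetical names.

```shell
#!/usr/bin/env bash
# luhn_ok DIGITS: succeeds if the digit string passes the Luhn checksum.
luhn_ok() {
    local digits=$1 sum=0 alt=0 i d
    for (( i = ${#digits} - 1; i >= 0; i-- )); do
        d=${digits:i:1}
        if (( alt )); then              # double every second digit from the right
            d=$(( d * 2 ))
            (( d > 9 )) && d=$(( d - 9 ))
        fi
        sum=$(( sum + d ))
        alt=$(( 1 - alt ))
    done
    (( sum % 10 == 0 ))
}

# scan_for_pans FILE: report lines holding 13-19 digits (optionally separated
# by single spaces or hyphens) that also pass the Luhn check.
scan_for_pans() {
    local logFile=$1 lineNo match
    grep -noE '[0-9]([ -]?[0-9]){12,18}' "$logFile" |
    while IFS=: read -r lineNo match; do
        if luhn_ok "${match//[ -]/}"; then    # strip separators before checking
            echo "${logFile}: possible card number on line $lineNo"
        fi
    done
}
```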


© 2003 - 2011 USA LINUX USERS GROUP