pull email from a file


All times are UTC - 6 hours


Posted: Sat Jun 11, 2011 8:25 pm

Joined: Thu Feb 04, 2010 8:06 am
Posts: 11
I have a file of several thousand emails, just plain text. I need to sort out emails from certain people. I am trying to pull the relevant emails and put them in a separate file. I can use the following:
Code:
sed -n  '/From:/,/Version:/p' /storage/Tfiles/Tmail.txt >> Redo1/Tfiles/newmail.txt

to get all emails with a From line (all of them), but when I try to use an address like
From: TAB Joe Doe <jdoe@company.com>
I get nothing (TAB stands for a \t tab character). I have tried a regex like From:\t\.*jdoe@company\.com\> but still get nothing.
I guess I'm getting too tired to see what is wrong. Any help will be appreciated.
Thanks
Jim
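
A note on the failed pattern above: one likely culprit (an assumption, since the exact command was not posted) is `\.*`, which in sed and grep means "zero or more literal dots" rather than "any characters", so nothing can match the "Joe Doe <" between the tab and the address. A minimal sketch with a made-up one-line sample, using `[[:space:]]` in place of `\t` and leaving out the trailing `\>` so that only one thing changes between the two patterns:

```shell
# Build a one-line sample; printf turns \t into a real tab character.
sample=$(printf 'From:\tJoe Doe <jdoe@company.com>\n')

# Original idea: "\.*" is "zero or more literal dots", so the pattern
# cannot bridge the gap between the tab and "jdoe".
broken=$(printf '%s\n' "$sample" | grep -c 'From:[[:space:]]\.*jdoe@company\.com' || true)

# ".*" (any characters) lets the pattern skip over "Joe Doe <".
fixed=$(printf '%s\n' "$sample" | grep -c 'From:[[:space:]].*jdoe@company\.com' || true)

echo "broken=$broken fixed=$fixed"
```

The same regular expressions can be used unchanged as the first address of a sed range.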


Posted: Sat Jun 11, 2011 9:05 pm

Joined: Wed Jun 08, 2011 8:27 am
Posts: 189
Location: outer Shpongolia
Can you please post some sample lines? I don't get the /From:/,/Version:/ part: why do you need to specify a range?


Last edited by jsz on Sun Jun 12, 2011 7:39 am, edited 1 time in total.

Posted: Sat Jun 11, 2011 9:17 pm

Joined: Thu Feb 04, 2010 8:06 am
Posts: 11
I am specifying a range because that is the only way I know to pull the whole email. From: starts the email and Version: starts the last line of the email. There is no other unique way to identify the end of the email that I know of. There is no unique separator between two emails, just an empty line, but many emails contain empty lines, so that is not unique. You see, I don't want just the address or header but the whole email. I really hope that there is a better way to do this than my approach. I'm wide open to any suggestions.


Posted: Sat Jun 11, 2011 9:28 pm

Joined: Wed Jun 08, 2011 8:27 am
Posts: 189
Location: outer Shpongolia
Just guessing...

Code:
awk -F '[<>]' '/From:[ \t]/ { print $2 }' /storage/Tfiles/Tmail.txt > Redo1/Tfiles/newmail.txt


But please, show some sample lines that reproduce the format of your file. It would be much easier to give you a proper solution.


Last edited by jsz on Sun Jul 10, 2011 8:06 pm, edited 1 time in total.

Posted: Sat Jun 11, 2011 11:40 pm

Joined: Thu Feb 04, 2010 8:06 am
Posts: 11
Here are 2 sample emails from the file. Sorry about the redactions, but they are necessary for privacy. As you can see, "From:" starts an email but can also appear inside an email that has been forwarded or replied to. The Version: line (from the antivirus software) ends each email, and the next From: starts the next one. These are just the first two from the file. All the rest have the same format, just with varying message length and some with attachments. The file is one continuous text file of about 1,500,000 lines.
Quote:
From: xxxxxxx xxxxxxx <sxxxxxxx@company.com>
Sent: Monday, November 13, 2006 9:13 AM
To: 'xxxxxxx xxxxxxx'
Subject: RE: Reading Tutor

Is she looking for a volunteer or is this paid?



Hope you are doing ok. Thought about you this weekend.



- xxxxxxx



_____

From: xxxxxxx xxxxxxx [mailto:xxxxxxx@company.com]
Sent: Monday, November 13, 2006 9:06 AM
To: 'xxxxxxx xxxxxxx'
Subject: FW: Reading Tutor



any ideas on the email below?







xxxxxxx J. xxxxxxx


_____

From: xxxxxxx, xxxxxxx [mailto:cxxxxxxx@comp2.com]
Sent: Friday, November 10, 2006 10:40 AM
To: xxxxxxx@company.com
Subject: Reading Tutor

Hello! I am looking for someone who would be interested in volunteering to do in-home literacy tutoring for a woman who is disabled and cannot reasonably get to a tutoring site. This woman has difficulty keeping track of her medical appointments and following medical orders due to her inability to read and write, which results in medical emergencies and hospitalizations.



If you have anyone who could help, it would be tremendously appreciated.



xxxxxxx M. xxxxxxx, LMHC







_____

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may include member information that is legally privileged. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy copies of the original message. Any views expressed in this message are those of the individual sender, except where the message states otherwise and the sender is authorized to state them to be the views of any such entity.

_____


Internal Virus Database
Checked by AVG Free Edition.
Version: 7.5.516 / Virus Database: 269.20.4/1275 - Release Date: 2/12/2008 3:20 PM

From: xxxxxxx xxxxxxx <xxxxxxx@company.com>
Sent: Wednesday, November 29, 2006 11:15 AM
To: 'xxxxxxx xxxxxxx'
Subject: RE: xxxxxxx

Very well done



_____

From: xxxxxxx xxxxxxx [mailto:xxxxxxx@company.com]
Sent: Wednesday, November 29, 2006 11:08 AM
To: 'xxxxxxx xxxxxxx'
Subject: xxxxxxx



October and November income.

I like our pattern thus far.. if we can double what we bring in each month.. that would be nice huh?



xxxxxxx



Internal Virus Database
Checked by AVG Free Edition.
Version: 7.5.516 / Virus Database: 269.20.4/1275 - Release Date: 2/12/2008 3:20 PM


Hope this helps because I need help.
thanks Jim


Posted: Sun Jun 12, 2011 12:02 am

Joined: Thu Feb 04, 2010 8:06 am
Posts: 11
I tried your awk statement and it worked well for the email addresses. However, my problem is not addresses. I need the whole email. This file is composed of emails sent by many different people. I have to sort out emails "From:" certain individuals and put them (the entire email text and any attachments) into files for each individual. For example the first 95 lines in the sample I provided are one email and must be put in a file for that individual. Any other emails from him must be added to the same file. The second email in the sample is from a different individual and must be stored in a separate file from the first email and anything else from that person added to his file, and on and on for all the emails in the main file. Hope this helps in understanding what I am trying to achieve.
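
The per-sender split described above could be sketched with awk roughly as follows. This is only a sketch under assumptions drawn from the two sample emails: every message ends with a line starting with Version:, top-level From: lines carry the address in angle brackets, and forwarded From: lines always sit before the closing Version: line, so they simply stay inside the current message. The function name split_by_sender, the output directory argument, and the address-based file names are hypothetical.

```shell
# split_by_sender INPUT OUTDIR  (hypothetical helper name)
# Writes each whole message to OUTDIR/<sender address>.txt.
split_by_sender() {
    mkdir -p "$2" || return 1
    awk -v outdir="$2" '
        # Outside a message, a From: line opens a new one.
        !inside && /^From:[ \t]/ {
            inside = 1
            addr = "unknown"
            # Take the text between < and > as the sender address.
            if (match($0, /<[^<>]+>/))
                addr = tolower(substr($0, RSTART + 1, RLENGTH - 2))
            gsub(/[^a-z0-9@._-]/, "_", addr)   # keep file names safe
            out = outdir "/" addr ".txt"
        }
        # Every line of the open message goes to that sender file.
        inside { print >> out }
        # The "Version:" footer line closes the message.
        inside && /^Version:/ {
            print "" >> out                    # blank line between messages
            close(out)
            inside = 0
        }
    ' "$1"
}
```

Calling `split_by_sender /storage/Tfiles/Tmail.txt Redo1/Tfiles/by_sender` (the by_sender directory is a made-up name) would then produce one file per address. Because the output is appended, running it twice would duplicate messages, so the output directory should be emptied before a rerun.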


Posted: Sun Jun 12, 2011 8:07 am

Joined: Thu Feb 04, 2010 8:06 am
Posts: 11
Ok got it figured out. This works by pulling the range between the first and last keywords. As you can see, I got the first part of the range to accept a phrase that includes the name being searched for. The key is the [[:space:]] bracket expression, which matches any single whitespace character, including a tab. This took care of the tab before the name, which was probably my issue on earlier tries. If anyone wants to clean this up or extend it, please feel free.
Code:
sed -n -e '/From:[[:space:]]First Last/,/Version:/p' /storage/Tfiles/Tmail.txt >> Redo1/Tfiles/FLast.txt
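
One possible tightening of the command above, offered as a sketch rather than a tested fix for the real file: in the two sample emails, a top-level From: line shows the address in angle brackets while forwarded headers use [mailto:...], so matching on <address> should avoid starting a range in the middle of someone else's email when the name appears in a forwarded header. The helper name extract_sender is hypothetical, and the angle-bracket assumption comes only from those two samples.

```shell
# extract_sender ADDRESS INPUT  (hypothetical helper name)
# Prints every message whose top-level From: line contains <ADDRESS>.
# Assumes ADDRESS contains no "/" and no regex characters other than ".".
extract_sender() {
    # Escape the dots so they match literally in the sed pattern.
    addr=$(printf '%s\n' "$1" | sed 's/\./\\./g')
    # Range: from the matching From: line through the next Version: line.
    sed -n "/^From:[[:space:]].*<$addr>/,/^Version:/p" "$2"
}
```

For example, `extract_sender jdoe@company.com /storage/Tfiles/Tmail.txt > Redo1/Tfiles/FLast.txt` would rebuild the output file from scratch; using `>` instead of `>>` avoids duplicated messages when the command is run again.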


Posted: Mon Jun 13, 2011 3:06 pm

Joined: Wed Jun 08, 2011 8:27 am
Posts: 189
Location: outer Shpongolia
You never mentioned the First Last pattern before. And if I understand correctly, your sed(1) line doesn't do what you want.

You want to make a file for each email address found, don't you?


© 2003 - 2011 USA LINUX USERS GROUP