
Link extractor


All times are UTC - 6 hours


Posted: Tue Dec 02, 2008 11:14 pm

Joined: Tue Dec 02, 2008 11:06 pm
Posts: 2
Hello all,

First, let me just say I am thrilled to have found this community!

I have what seems like a simple request, but I think it will require some sed-ing or awk-ing, which I am not very good at...

All I want to do is read an HTML file (pulled down with wget) and pull out only the first URL that matches my criterion (domain of http://www.mininova.org/*).

I then want to pass that URL to transmissioncli (I can handle that part, I think).

If you can help me out, I'd greatly appreciate it!


Posted: Wed Dec 03, 2008 3:39 pm

Joined: Tue Dec 02, 2008 11:06 pm
Posts: 2
Got it:

Code:
sed -e 's/<a href="/\n/g' -e 's/<\/a>/\n/g' filename.txt | grep -m 1 '^http://www\.mininova\.org/' | sed -e 's/".*//' -e 's/?.*//'


A little ugly, but works like a champ.
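
For anyone reading along, here is a rough breakdown of what each stage of that pipeline does, written as a small function. It is only a sketch: it assumes GNU sed (for `\n` in the replacement) and GNU grep (for `-m`), and the function name `first_link` and the sample HTML are made up for illustration.

```shell
#!/bin/sh
# Sketch only: first_link and the sample input below are hypothetical.
first_link() {
    # 1. Break the text before every href value and after every </a>,
    #    so each link target starts its own line (GNU sed understands \n here).
    sed -e 's/<a href="/\n/g' -e 's/<\/a>/\n/g' "$1" |
    # 2. Keep the first line that starts with the wanted domain.
    grep -m 1 '^http://www\.mininova\.org/' |
    # 3. Cut everything from the closing quote on, then drop any query string.
    sed -e 's/".*//' -e 's/?.*//'
}

# Hypothetical sample page with two links on one line
sample=$(mktemp)
printf '%s\n' '<p><a href="http://example.com/x">x</a> <a href="http://www.mininova.org/get/123?ref=1">t</a></p>' > "$sample"
first_link "$sample"   # prints http://www.mininova.org/get/123
rm -f "$sample"
```

Because the first sed puts every link on its own line, this picks the first matching link even when several links share one line of HTML.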


Posted: Thu Dec 04, 2008 7:24 am

Joined: Mon Nov 17, 2008 7:25 am
Posts: 221
Code:
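file=filename.txt
# First line that mentions the domain
line=$(grep -m 1 'http://www\.mininova\.org/' "$file")
# Print just the URL; [^"]* stops at the href's closing quote
printf '%s\n' "$line" | grep -o 'http://www\.mininova\.org/[^"]*' | head -n 1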


This should work too, but it is untested :P

Best regards
Fredrik Eriksson
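
As a follow-up to the original question about handing the result to another program, here is a minimal sketch that checks a URL was actually found before passing it on. The names `extract_first`, `handoff` and `fetch_cmd` are hypothetical; `fetch_cmd` defaults to `echo` as a stand-in, since the exact transmissioncli invocation is not given in this thread. It assumes a grep that supports `-o` (GNU and BSD grep do).

```shell
#!/bin/sh
# Sketch only: extract_first, handoff and fetch_cmd are hypothetical names.

# Print the first URL on the wanted domain found in the file, or nothing.
extract_first() {
    # -o prints only the matched text; [^"?]* stops at the closing quote
    # or at the start of a query string.
    grep -o 'http://www\.mininova\.org/[^"?]*' "$1" | head -n 1
}

# Pass the URL to a downstream command, failing if none was found.
handoff() {
    url=$(extract_first "$1")
    if [ -z "$url" ]; then
        echo "no matching link in $1" >&2
        return 1
    fi
    # fetch_cmd is a stand-in; set it to the real downloader instead of echo.
    ${fetch_cmd:-echo} "$url"
}
```

Calling `handoff filename.txt` prints the URL with the default stand-in, and returns a non-zero status when the page contains no matching link, so a calling script can stop instead of starting a download with an empty argument.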

