Hi!
How are you getting your data? Is it handed to you in a file, or can you choose how it's parsed from the web page? The problem with what we're working with here is that it's all split up across separate lines. If you could parse the page with something like this, it would grab each "a" tag individually and dump them into a file, one per line:
Code:
lynx -source google.com | grep -Po '<a.*?/a>' > dump.txt
I'm intrigued by fredrik's regex in his sed syntax, and I'm going to have to poke at it and see how it works.

In the meantime, I typically use perl for something like this, because sed and awk only do "greedy" matching (their regex flavors have no non-greedy quantifier like .*?), which makes it hard to parse between two tags (like <a and /a>). That's actually why I used "-P" in the grep statement above: it turns on perl-style regex. Here's something quick and dirty (the regex would need a lot of refinement for anything beyond a basic example... it even misses some things in my google example, but it parses your output perfectly as long as each tag is all on one line):
Code:
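use strict;
use warnings;

# Read the dump file one line at a time (one <a> tag per line).
open(my $fh, '<', 'dump.txt') or die "Cannot open dump.txt: $!";
while (my $line = <$fh>) {
    # $1 = href value, $2 = link text; skip lines that do not match.
    if ($line =~ /<a.href="(.*?)".*?>(.*?)<.*$/) {
        print "$2\n$1\n\n";
    }
}
close($fh);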
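If you want to see the greedy vs. non-greedy difference for yourself, here's a rough sketch. The sample line is made up (it's not from your data), and it assumes GNU grep built with "-P" support, which not every system has. The first grep uses plain extended regex and swallows both links as one match; the second uses perl-style .*? and gives one match per link.
Code:

```shell
# Made-up sample: two links on a single line.
line='<a href="/one">One</a> and <a href="/two">Two</a>'

# Greedy (extended regex): .* runs to the LAST /a>, so both links come out as one match.
greedy=$(printf '%s\n' "$line" | grep -Eo '<a.*/a>')

# Non-greedy (perl-style, needs grep -P): .*? stops at the FIRST /a>, one match per link.
lazy=$(printf '%s\n' "$line" | grep -Po '<a.*?/a>')

printf 'greedy:\n%s\n\nnon-greedy:\n%s\n' "$greedy" "$lazy"
```

The non-greedy version is the one that leaves you with one tag per line in dump.txt, which is what the perl loop above expects.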
Hope this helps... I need to learn more about POSIX regex.
-J