
extract text using sed


All times are UTC - 6 hours


 PostPosted: Sun Feb 06, 2011 2:42 pm   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
I'm using curl to get the output below, but I only want the text between the <LocalityName></LocalityName> tags. I can whittle it down using nawk 'NR==10', but I would prefer to use sed or similar. Can anyone help?

Code:
<?xml version="1.0" encoding="UTF-8" ?>
<kml xmlns=""><Response>
  <name></name>
  <Status>
    <code>200</code>
    <request>geocode</request>
  </Status>
  <Placemark id="p1">
    <address></address>
    <AddressDetails Accuracy="5" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0"><Country><CountryNameCode></CountryNameCode><CountryName>UK</CountryName><Locality><LocalityName>EXTRACT</LocalityName><PostalCode><PostalCodeNumber></PostalCodeNumber></PostalCode></Locality></Country></AddressDetails>
    <ExtendedData>
      <LatLonBox north="0000000000" south="00000000" east="00000000" west="000000" />
    </ExtendedData>
    <Point><coordinates>00000000</coordinates></Point>
  </Placemark>
</Response></kml>


 PostPosted: Sun Feb 06, 2011 4:29 pm   

Joined: Sat Jan 16, 2010 5:53 pm
Posts: 1
Hi,
I'm sure there's a more elegant way to do this but this is the first thing that popped into my head and it should get the job done for now. In a nutshell, the grep statement below grabs only the lines that contain "<LocalityName>" and then that output is piped into sed, which deletes everything from the beginning of the line (^) to "<LocalityName>" and then from "</LocalityName>" (in which the "/" is escaped by preceding it with a "\") to the end of the line.

$ grep "<LocalityName>" filename.xml | sed -r 's/^.*<LocalityName>//g; s/<\/LocalityName>.*$//g'


 PostPosted: Sun Feb 06, 2011 11:36 pm

Joined: Sun Jun 27, 2010 12:57 am
Posts: 192
Print only what is between the 2 named tags:
Code:
sed -n 's/.*<LocalityName>\(.*\)<\/LocalityName>.*/\1/p'
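
A small sketch of how this one-liner behaves, run against a made-up two-line sample (the `sample` and `result` variables are assumptions for illustration only). `-n` turns off automatic printing and the `p` flag prints a line only when the substitution matched, so every line without the tag disappears. Because `.*` is greedy, this assumes there is just one <LocalityName> pair per line.

```shell
# Hypothetical sample: two lines, only the second contains the tag.
sample='<Status><code>200</code></Status>
<Locality><LocalityName>EXTRACT</LocalityName><PostalCode></PostalCode></Locality>'

# -n: do not print lines by default.
# s/.../\1/p: replace the whole line with the captured text and print
# the line only if the substitution actually happened.
result=$(printf '%s\n' "$sample" | sed -n 's/.*<LocalityName>\(.*\)<\/LocalityName>.*/\1/p')

echo "$result"
```

With the sample above this prints EXTRACT and nothing else.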


 PostPosted: Mon Feb 07, 2011 4:14 am   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
Patsie wrote:
Print only what is between the 2 named tags:
Code:
sed -n 's/.*<LocalityName>\(.*\)<\/LocalityName>.*/\1/p'


Works a treat, thanks.

I'm extracting GPS info using

Code:
sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.google.com/maps?q=\1,\2%;p}'


which returns http://maps.google.com/maps?q=0.0000000,-0.000000

After the last 0 I'd like it to add &output=xml, so can anybody help me tweak the above code to do this?

Using the returned URL, I'd like to be able to extract the Locality info, i.e.

Code:
curl -s "URL" | sed -n 's/.*<LocalityName>\(.*\)<\/LocalityName>.*/\1/p'


and finally put it into a script, i.e.

GPS="blah blah"
echo "$GPS"


 PostPosted: Fri Feb 11, 2011 10:29 am   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
anyone?


 PostPosted: Fri Feb 11, 2011 1:00 pm
Moderator

Joined: Wed May 03, 2006 2:05 pm
Posts: 242
Hi BrianUK!

I think I might be missing something here. Could we see some sample data for the file or output you're using to generate the URL?

Thanks!
-Jeo


 PostPosted: Sat Feb 12, 2011 3:22 pm   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
In bash, using my GPS location software along with sed, I get back

http://maps.google.com/maps/maps?q=00.0000,-0.00000

At the end of this URL I'd like to add &output=xml

On the output of that URL I'd like to run this command

Code:
curl -s "URL" | sed -n 's/.*<LocalityName>\(.*\)<\/LocalityName>.*/\1/p'


which would give me, for example, 'London'


 PostPosted: Sun Feb 13, 2011 1:12 am

Joined: Sun Jun 27, 2010 12:57 am
Posts: 192
So, let's assume your Google Maps URL is stored in $URL.
Then you just run:
Code:
curl -s "${URL}&output=xml" | sed -n 's/.*<LocalityName>\(.*\)<\/LocalityName>.*/\1/p'

And you should be done, right?
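
For the "add &output=xml" part, here is a sketch of two ways to build the string, using a made-up track point line in place of the real gps output (`line`, `full1` and `full2` are illustrative names, and `sed -E` is the portable spelling of GNU sed's `-r`). It only shows the string handling and makes no claim about what the server returns for that URL.

```shell
# Hypothetical track point line standing in for the real gps output.
line='<trkpt lat="51.5000" lon="-0.1200">'

# Option 1: build the base URL, then append the parameter in the shell.
# The double quotes keep "&" from being read as the background operator.
URL=$(printf '%s\n' "$line" | sed -En 's%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.google.com/maps?q=\1,\2%p')
full1="${URL}&output=xml"

# Option 2: append inside the sed replacement. There "&" has to be
# written as "\&", because a bare "&" means "the whole matched text".
full2=$(printf '%s\n' "$line" | sed -En 's%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.google.com/maps?q=\1,\2\&output=xml%p')

echo "$full1"
echo "$full2"
```

Both lines printed should be identical. Option 1 keeps the sed expression untouched, which is probably the easier one to maintain.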


 PostPosted: Sun Feb 13, 2011 5:56 am   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
getting close

Code:
#!/bin/bash
URL=gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.google.com/maps/map?q=\1,\2%;p}'

LOCATION=curl -s "${URL}&output=xml" | sed -n 's/.*<LocalityName>\(.*\)<\/LocalityName>.*/\1/p'

while true
do

echo "$LOCATION"


Would this be the right way to do it?

./gps: line 4: -s: command not found


 PostPosted: Sun Feb 13, 2011 7:40 am

Joined: Sun Jun 27, 2010 12:57 am
Posts: 192
If you want to put the output of a command in a variable, use something like this:
Code:
var=$(command -args)

The way you do it, var=command -args, doesn't work.
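
To make the difference concrete, here is a minimal sketch with made-up names (`greeting` and the `echo hello | tr` pipeline are assumptions for illustration). The shell reads `NAME=value command args` as "run command with NAME set just for that one command", which is why `LOCATION=curl -s ...` ends up trying to run a command called `-s`.

```shell
# Without $(...): the shell would treat this as "run the command hello
# with the temporary variable greeting=echo in its environment",
# mirroring LOCATION=curl -s ... from the script above.
# greeting=echo hello    # would fail with a "command not found" error for hello

# With $(...): the pipeline runs first, and its output (minus trailing
# newlines) becomes the value of the variable.
greeting=$(echo hello | tr 'a-z' 'A-Z')

echo "$greeting"
```

This prints HELLO. The same wrapping applies to both the URL line and the LOCATION line.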


Last edited by Patsie on Sun Feb 13, 2011 7:42 am, edited 1 time in total.

 PostPosted: Sun Feb 13, 2011 8:08 am   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
Now I'm confused, ha. Is this for the URL or the location bit? Could you place it into the script for me?


 PostPosted: Mon Feb 14, 2011 8:06 am
Moderator

Joined: Wed May 03, 2006 2:05 pm
Posts: 242
What Patsie is saying is that in order to store the output of a command in a variable, you have to surround the command in $() or ``

So for our example:
Code:
# In your script, you're doing this, which will not work:
VARIABLE=commands to execute here

# The proper way looks more like this:
VARIABLE=$(commands to execute here)

# Here's an alternative way, with `back-ticks`
# But the one above is usually preferred
VARIABLE=`commands to execute here`

# To use YOUR script as an example:
URL=$(gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.google.com/maps/map?q=\1,\2%;p}')
LOCATION=$(curl -s "${URL}&output=xml" | sed -n 's/.*<LocalityName>\(.*\)<\/LocalityName>.*/\1/p')


Everything inside "$()" will be executed, and the output will be stored in the variable you specified.

I hope this helps!
-J


 PostPosted: Mon Feb 14, 2011 2:13 pm   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
Ahh, I see what you mean. I remember I've previously used backticks and forgot them in my example. I'm trying your version with

Code:
echo "$LOCATION"


at the end, but I'm not getting any output?


 PostPosted: Thu Feb 17, 2011 7:42 am   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
?


 PostPosted: Thu Feb 17, 2011 10:43 am
Moderator

Joined: Wed May 03, 2006 2:05 pm
Posts: 242
Hi BrianUK!

Try checking the output of your curl command by itself, without running it through sed. In my script, I get an error:

The requested URL <code>/maps/map?q=37.646182,-115.750827&amp;output=xml</code> was not found on this server.

BUT if I change the URL a bit, by replacing /maps/map with /maps, I get the source of the map page, but it doesn't seem to understand the output=xml parameter.

It looks like you may have intended to use the google maps api?


 PostPosted: Thu Feb 17, 2011 11:29 am
Moderator

Joined: Wed May 03, 2006 2:05 pm
Posts: 242
The script below gives us some XML output, but it's not structured the way your sed expression expects.

Instead of <LocalityName> we have something like:

Code:
<address_component>
    <long_name></long_name>
    <short_name></short_name>
    <type>locality</type>
</address_component>


Here's what I've got so far:
Code:
#!/bin/bash

# I don't know what the gps command is/does so I
# created a dummy file with data that matches the regex
gps () {
    cat tmp.txt
}

URL=$(gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.googleapis.com/maps/api/geocode/xml?latlng=\1,\2\&sensor=false%;p}')
OUTPUT=$(curl -s "${URL}" | grep -B2 locality)

echo "$URL"
echo "$OUTPUT"
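
One possible next step, sketched against a made-up response fragment (the `response` text below is illustrative, not real API output): once `grep -B2` has kept the two lines before the locality type line, the earlier sed idea can pull out the long_name value. This relies on <long_name> sitting two lines above <type>locality</type>, as in the fragment shown above.

```shell
# Illustrative stand-in for the XML the geocoding request returns.
response='<address_component>
    <long_name>United Kingdom</long_name>
    <short_name>GB</short_name>
    <type>country</type>
</address_component>
<address_component>
    <long_name>London</long_name>
    <short_name>London</short_name>
    <type>locality</type>
</address_component>'

# Keep the locality line plus the two lines before it, then print
# only the text between the long_name tags.
LOCATION=$(printf '%s\n' "$response" \
    | grep -B2 '<type>locality</type>' \
    | sed -n 's/.*<long_name>\(.*\)<\/long_name>.*/\1/p')

echo "$LOCATION"
```

With this sample it prints London. In the real script the `printf` would be replaced by the `curl -s "${URL}"` call.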


 PostPosted: Thu Feb 17, 2011 1:32 pm   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
OK, for whatever reason my 'gps' software doesn't work when called from a script. This works fine on the command line:

Code:
gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.google.com/maps?q=\1,\2%;p}'


but in a bash script using just

Code:
#!/bin/bash

URL=`gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.google.com/maps?q=\1,\2%;p}'`

echo "$URL"


it doesn't do anything, so it looks like I'll have to try it a different way.

Ah, I just ran it using -x, and it appears to be running gps as one command and sed as another:

++ gps
++ sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.google.com/maps?q=\1,\2%;p}'

Any idea how to fix this?


 PostPosted: Thu Feb 17, 2011 1:46 pm
Moderator

Joined: Wed May 03, 2006 2:05 pm
Posts: 242
Does your gps command require any parameters? That bash -x output looks normal (well... the two lines that you pasted anyway). Here are the first few lines of mine:

Code:
$ bash -x tmp.sh
++ gps
++ cat tmp.txt
++ sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.googleapis.com/maps/api/geocode/xml?latlng=\1,\2\&sensor=false%;p}'
+ URL='http://maps.googleapis.com/maps/api/geocode/xml?latlng=37.646182,-115.750827&sensor=false'


Does your bash -x run get as far as saying '+ URL=...'?
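
For reading `bash -x` traces like the one above, here is a tiny self-contained sketch (the variable `v` and the `echo hi | tr` pipeline are made up for illustration). Commands traced with `++` run inside the command substitution, and the single-`+` assignment line only shows up once that whole pipeline has finished. So if `+ URL=...` never appears, one possibility, which is an assumption and not something confirmed in this thread, is that a command inside `$(...)` or the backticks has not exited yet.

```shell
# Run a one-line script under bash -x and capture the trace.
# The trace goes to stderr, so 2>&1 folds it into the captured text.
trace=$(bash -xc 'v=$(echo hi | tr a-z A-Z); echo "$v"' 2>&1)

# "++" lines are the commands inside $(...); "+ v=HI" is printed only
# after that pipeline has exited and its output has been collected.
printf '%s\n' "$trace"
```

The order of the two `++` lines can vary between runs, since both sides of the pipe start at about the same time.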


 PostPosted: Thu Feb 17, 2011 2:03 pm   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
No, it doesn't need any parameters.

No, I only get the 2 lines I posted.

It works OK from the command line:

Code:
gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.googleapis.com/maps/api/geocode/xml?latlng=\1,\2\&sensor=false%;p}'

http://maps.googleapis.com/maps/api/geocode/xml?latlng=00.000000,-0.00000&sensor=false


 PostPosted: Thu Feb 17, 2011 2:42 pm
Moderator

Joined: Wed May 03, 2006 2:05 pm
Posts: 242
Interesting... Here's a simpler test... What happens if you just run 'gps' from a script, and don't try to parse the output?

Code:
echo -e '#!/bin/bash\ngps' > mytest.sh
chmod +x mytest.sh
./mytest.sh


 PostPosted: Thu Feb 17, 2011 2:58 pm   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
Works fine, even if I do

Code:
#!/bin/bash

gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.googleapis.com/maps/api/geocode/xml?latlng=\1,\2\&sensor=false%;p}'


but add a bit more and a couple of backticks

Code:
#!/bin/bash

URL=`gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.google.com/maps?q=\1,\2%;p}'`

echo "$URL"


and I get nothing


 PostPosted: Thu Feb 17, 2011 3:26 pm
Moderator

Joined: Wed May 03, 2006 2:05 pm
Posts: 242
There's only one more thing I can think of to try. Instead of `backticks`, try enclosing it in $(here) like in our examples above.


 PostPosted: Thu Feb 17, 2011 3:58 pm   

Joined: Fri May 16, 2008 4:58 am
Posts: 94
Code:
./location5: line 5: syntax error near unexpected token `|'
./location5: line 5: `URL=(gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.googleapis.com/maps/api/geocode/xml?latlng=\1,\2\&sensor=false%;p}')'


 PostPosted: Fri Feb 18, 2011 10:31 am
Moderator

Joined: Wed May 03, 2006 2:05 pm
Posts: 242
You're missing a "$" :)
It should look like one of the many examples above:

Code:
## Generic example (thanks Patsie!):
var=$(command -args)

## Specific example:
URL=$(gps | sed -rn '/<trkpt lat="/{s%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.googleapis.com/maps/api/geocode/xml?latlng=\1,\2\&sensor=false%;p}')


I hope this helps!
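
Putting the pieces from this thread together, here is a hedged sketch that also checks for an empty result before going any further. The `gps` function is a stand-in like the dummy one used earlier in the thread, the coordinates are made up, and `sed -E` is the portable spelling of GNU sed's `-r`. The `{...;p}` block is folded into the `p` flag of the substitution, so a line is printed only when the substitution matched.

```shell
#!/bin/bash

# Stand-in for the real gps command; the coordinates are made up.
gps () {
    printf '%s\n' '<trkpt lat="37.646182" lon="-115.750827">'
}

# $( ... ) captures the output of the whole pipeline into URL.
URL=$(gps | sed -En 's%.* lat="([^"]+)" lon="([^"]+)">.*%http://maps.googleapis.com/maps/api/geocode/xml?latlng=\1,\2\&sensor=false%p')

# Stop early with a message on stderr if nothing was captured, so an
# empty URL is not silently handed on to curl.
if [ -z "$URL" ]; then
    echo "no <trkpt> line found in gps output" >&2
    exit 1
fi

echo "$URL"
```

If the message on stderr shows up with the real `gps` command, the problem is in the capture step and not in the later curl or sed stages.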

