Hi, the script below wikifies a page, but it gives me some duplicate output. I think it is picking up text inside tags such as <div> elements, as well as titles in the HTML. Does anyone know a way to solve this?
echo "<html><title>Wikipedia search for '$1'</title><body>"
echo "<center><h1>Wikipedia search for '$1'</h1></center><br>"
# Pull the capitalised words out of the <body> section (uniq only drops adjacent repeats)
sed -n '/<body>/,/<\/body>/p' "$1" | grep -o '[A-Z][a-z]*' | uniq > file.txt
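while read -r file
do
    # Wrap the word in a Wikipedia link
    echo "$file" | sed 's/[A-Z][a-z]*/<a href="http:\/\/en.wikipedia.org\/wiki\/&">&<\/a>/'
    # Print the first line of the page that contains the word
    grep -m 1 "$file" "$1"
done < file.txt
echo "</body></html>"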
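
One possible way to avoid both problems, as a rough sketch rather than a tested fix: strip the tags before looking for words, and de-duplicate the whole list instead of only adjacent lines. The function name extract_words is made up for illustration, and the sketch assumes every tag opens and closes on the same line and that grep supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): print each
# capitalised word found in the <body> of the HTML file "$1" exactly once.
# Assumption: every tag opens and closes on a single line.
extract_words() {
    # 1. keep only the lines from <body> through </body>
    # 2. replace each tag (including its attributes) with a space
    # 3. print every capitalised word on its own line (C locale so that
    #    [A-Z] and [a-z] mean plain ASCII ranges)
    # 4. keep only the first occurrence of each word, preserving order
    sed -n '/<body>/,/<\/body>/p' "$1" |
        sed 's/<[^>]*>/ /g' |
        LC_ALL=C grep -o '[A-Z][a-z]*' |
        awk '!seen[$0]++'
}
```

The word list could then be fed to the link-building loop, for example with extract_words "$1" > file.txt. Unlike uniq, the awk filter also drops repeats that are not next to each other, so no sort is needed and the words stay in page order.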