
cross-reference word matching


All times are UTC - 6 hours


Posted: Thu Feb 05, 2009 2:37 pm

Joined: Thu Feb 05, 2009 1:34 pm
Posts: 2
Hello! I have a simple bash script that matches words with letters missing. I use it to help me solve crosswords, like this:

Code:
grep "\<${mysteryword}\>" /usr/share/dict/words


You enter the word you want to find, using full stops for the characters you don't know, and grepping the dictionary returns all possible matches. I'd like to add a command-line argument, -s or something, to match only synonyms of a chosen word or series of words. The excellent package "dict" has the thesaurus add-on "dict-moby-thesaurus". Its output looks something like this:

Code:
user@compy ~> dict hello
1 definition found

From Moby Thesaurus II by Grady Ward, 1.0 [moby-thesaurus]:

  19 Moby Thesaurus words for "hello":
     accost, address, bob, bow, curtsy, embrace, greeting, hail,
     hand-clasp, handshake, how-do-you-do, hug, kiss, nod, salutation,
     salute, smile, smile of recognition, wave


The command would be something like

Code:
user@compy ~> crossword .r..t..g -s hello


It would scan for words that fit the dotty word, then keep only those that also appear in the thesaurus list for the given word, and return those as possibilities. I suppose it'd be more efficient to do these things the other way around: fetch the synonyms first and then match the pattern against that much shorter list.
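
The "other way around" ordering could be sketched roughly as follows: pull the synonyms out of the thesaurus output first, then match the dotted pattern against that short list. This is only a sketch. `filter_synonyms` is a made-up name, and it assumes the dict output has the shape shown above (a "... words for" header line followed by a comma-separated list). The sample text is fed in from a here-document so the example runs without dict installed.

```bash
#!/bin/bash
# filter_synonyms is a hypothetical helper, not part of dict.
# It reads dict-style thesaurus output on stdin and prints the synonyms
# that match the dotted pattern given as its first argument.
filter_synonyms() {
    local pattern=$1
    sed -n '/words for/,$p' |   # keep the header line and everything after it
        sed '1d' |              # drop the header line itself
        tr '\n' ' ' |           # join wrapped lines into one
        tr ',' '\n' |           # one synonym per line
        sed -e 's/^ *//' -e 's/ *$//' |   # trim surrounding spaces
        grep -x -- "$pattern"   # the pattern must match the whole entry
}

# Sample input copied from the dict output above (assumed format).
sample_output=$(cat <<'EOF'
1 definition found

From Moby Thesaurus II by Grady Ward, 1.0 [moby-thesaurus]:

  19 Moby Thesaurus words for "hello":
     accost, address, bob, bow, curtsy, embrace, greeting, hail,
     hand-clasp, handshake, how-do-you-do, hug, kiss, nod, salutation,
     salute, smile, smile of recognition, wave
EOF
)

printf '%s\n' "$sample_output" | filter_synonyms '.r..t..g'
```

With dict installed, the same function would be used as `dict hello | filter_synonyms '.r..t..g'`, which for the output shown above should print greeting.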


I guess this might be a good time to start learning how to use awk or something, but I don't really know how to proceed. The tutorials I've found look pretty imposing. Has anyone any pointers for me?


Alternatively, I could use the thesaurus source file, which lists the synonyms of each word line-by-line. The line for "A-Bomb" looks a bit like this:


A-bomb,H-bomb,atomic bomb,atomic warhead,clean bomb,cobalt bomb,dirty bomb,fusion bomb,hell bomb,hydrogen bomb...(etc)


I suppose grepping here might work, but that file is pretty massive, and there would be way too many hits for the filter to work with. Hrmm. What do you think?
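
One way to keep the hits down might be to anchor the grep to the start of the line, so that only the entry whose headword is the chosen word is read, rather than every line that mentions it. A rough sketch, assuming each line of the source file starts with its headword followed by its synonyms, all comma-separated, as in the A-bomb line above. `source_synonyms` and `match_in_source` are made-up names, and the file path is passed as an argument because the real location depends on where the thesaurus source is unpacked.

```bash
#!/bin/bash
# Hypothetical helpers for a comma-separated thesaurus source file.
# Assumed layout: headword,synonym,synonym,... with one entry per line.

# Print the synonyms of WORD from FILE, one per line.
source_synonyms() {
    local word=$1 file=$2
    # "^word," only matches the line whose headword is WORD, so a word
    # that merely appears in other entries produces no extra hits.
    # Note that WORD is treated as a regular expression here.
    grep -i -m 1 -- "^${word}," "$file" |
        cut -d, -f2- |     # drop the headword
        tr ',' '\n'        # one synonym per line
}

# Print the synonyms of WORD in FILE that match the dotted PATTERN.
match_in_source() {
    local pattern=$1 word=$2 file=$3
    source_synonyms "$word" "$file" | grep -x -- "$pattern"
}

# Demo with a shortened copy of the A-bomb line quoted above.
demo_file=$(mktemp)
echo 'A-bomb,H-bomb,atomic bomb,atomic warhead,clean bomb,cobalt bomb,dirty bomb,fusion bomb,hell bomb,hydrogen bomb' > "$demo_file"
match_in_source 'c.... bomb' 'A-bomb' "$demo_file"
rm -f "$demo_file"
```

For the shortened sample line this prints clean bomb. Because only one line of the big file is ever split up, the pattern filter has a short list to work with however large the file is.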


Grateful for any advice!


Posted: Thu Feb 05, 2009 3:47 pm

Joined: Mon Nov 17, 2008 7:25 am
Posts: 221
Well, you should be able to do something like this:
Code:
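#!/bin/bash
# Usage: ./script.sh PATTERN WORD
pattern=$1
# Keep the indented synonym lines, join them, and drop the spaces after commas.
synonyms=$(dict "$2" | grep "^    " | tr '\n' ' ' | sed -e "s/^ *//" -e "s/ *$//" -e "s/, */,/g")

# Split on commas only, so multi-word entries stay intact.
IFS="," read -r -a words <<< "$synonyms"
for word in "${words[@]}"; do
   # grep -x anchors the pattern to the whole entry.
   if grep -qx -- "$pattern" <<< "$word"; then
      echo "$word"
   fi
done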


usage: ./script.sh "w..e" hello
This should return "wave"

Best regards
Fredrik Eriksson


Posted: Thu Feb 05, 2009 9:36 pm

Joined: Thu Feb 05, 2009 1:34 pm
Posts: 2
Thanks for the help. Unfortunately, it doesn't seem to work. I'm not much of a sed user, so I don't know how to fix it. I made a version that parses the contents of the dict look-up to a temporary file using some of your techniques (and bastardizing them horribly). Hardly an ideal solution. Still, here's what I managed. Sorry it isn't very cogent; it's been the subject of a few rewrites, and I often put stuff in functions even when it's pointless to do so, as my editor can hide it from my eyes whilst I work on other stuff:

Code:
#!/bin/bash
# A simple cross word helping thingy.
   

function usage ()
   { # How to use this script.
      echo """USAGE.

This helps you do crosswords. Type the word you're looking for, and
replace the letters you do not know with full stops.

You can also use the -s option to reduce the number of hits to words
that match only synonyms, like this:

   crossword -s .av. hello
   
This will return a word that fits with .av., but only if it is a synonym
of "hello"."""
   }
   
function look ()
   { # Simple look up with no snazzy thesaurus-checking.
      xy=$(grep "\<${mysteryword}\>" /usr/share/dict/words)
      echo "$xy"
   }

function look-with-bells-and-whistles ()
   { # Fancy look-up stuff.


   # Strip spaces and hyphens, then put one synonym per line in a temporary file.
   tmpfile=$(mktemp)
   dict "$string" | grep "^   " | sed -e "s/[ ]//g" -e "s/-//g" -e "s/,/\n/g" > "$tmpfile"

   xy=$(grep "\<${pattern}\>" "$tmpfile")
   echo "$xy"
   rm "$tmpfile"

   }
   
# With a single argument that is not an option, do a plain dictionary look-up.
if [ $# -eq 1 ] && [[ $1 != -* ]]; then
      mysteryword=$1
      look
      exit 0
      fi
      
while getopts "sah" options; do
   case "$options" in
      s)    pattern=$2
         string=$3
         look-with-bells-and-whistles
         exit 0
         ;;
      a)    echo "This option doesn't do anything."
         exit 0
         ;;
      h)    usage
         exit 0
         ;;
      *)    echo "WRONG!" >&2
         exit 1
         ;;
   esac
done


Running script -s .av. hello returns wave. That's what I was after. Thanks again.
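
For anyone adapting this, the option handling might be tidied up along these lines, so that -s acts as a plain flag and the pattern and word are read after the options have been shifted away, instead of relying on the fixed positions $2 and $3. This is only a sketch: `parse_args` and the variable `use_thesaurus` are made-up names, not part of the script above.

```bash
#!/bin/bash
# parse_args is a hypothetical helper. It sets three globals:
#   use_thesaurus (0 or 1), pattern, and word.
parse_args() {
    local OPTIND=1 opt      # reset OPTIND so the function can be called again
    use_thesaurus=0
    while getopts "sh" opt; do
        case "$opt" in
            s) use_thesaurus=1 ;;
            h) echo "usage: crossword [-s] PATTERN [WORD]"; return 0 ;;
            *) echo "usage: crossword [-s] PATTERN [WORD]" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))   # discard the options that getopts consumed
    pattern=${1:-}
    word=${2:-}
}

parse_args -s .av. hello
echo "pattern=$pattern word=$word use_thesaurus=$use_thesaurus"
```

The script would then call the thesaurus look-up when use_thesaurus is 1 and the plain dictionary look-up otherwise. Note that getopts stops at the first non-option argument, so the flag has to come before the pattern.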


Posted: Fri Feb 06, 2009 2:43 am

Joined: Mon Nov 17, 2008 7:25 am
Posts: 221
Well, I'm not sure I typed it entirely correctly... I had to manually "copy" it from my terminal to my browser, since my clipboard seems a little off in my new Linux installation.

Anyway, if you found what you're looking for and my post did some good, I'm happy :)

Best regards
Fredrik Eriksson

