
Pick out fields from each input line simply


All times are UTC - 6 hours


Posted: Sun Oct 21, 2012 4:20 pm

Joined: Sun Oct 21, 2012 3:23 pm
Posts: 7
The script below provides the command 'fieldz', which can pick out fields/words from the input.
It has a number of options, but as a simple example:

$ echo "yes we can" | fieldz 2 3 indeed 1
==> we can indeed yes

Immediately below is the output of its help message, followed by the script itself, which you can copy and paste into a file named fieldz and then make executable with "chmod +x fieldz".

Quote:
Usage: fieldz [-d field-delimiter] [-r record-separator] [ -z ] [[-]column-number[+]|-fieldname|text]...[filename]

Print given field column(s) from input. Default is last field.
Use a minus number to print counting from the last field.
First field is numbered from 1 not from zero. Zero refers to all fields.
Use -<fieldname> to print the value of a named field.
Ordinary text is just copied to output.
Use a plus sign, '+', after a field number to print all following fields.
Use a minus sign, '-', after a field number to print all prior fields.
-d <delimiter> will use given delimiter instead of spaces to split input fields
-r <separator> will use given separator instead of newlines to split input records (newlines are swapped for spaces)
-z Output is not padded with space, eg, fieldz 1 2 3 will print fields joined together without spaces.

eg, $ echo yes we can | fieldz 2 3 3 1
==> we can can yes

$ echo b="item1" bcost="99" | fieldz 'It costs: ' -bcost=
==> It costs: 99

$ fieldz -d: userid: 1 shell: -1 /etc/passwd
==> userid: guest shell: /bin/sh

$ find . -printf %T+' ' -print |sort|fieldz 2+ | tr '\n' ' ';echo
--gives a single-line output with the names of all files sorted by modification time, oldest first.

$ cat bookmarks.xml | fieldz -r "<bookmark" -d url 2 | fieldz -d\" 2
--extracts the url attribute value from an xml file
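The -r option works by handing record splitting to awk's RS variable and then turning the newlines inside each record into spaces. A minimal sketch of just that step, assuming a hypothetical helper name split_records; note that a multi-character RS such as "<bookmark" is a gawk/mawk extension, and a strictly POSIX awk only uses its first character.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the record-splitting step behind `fieldz -r SEP`.
# Text between occurrences of SEP becomes one record; newlines inside a record
# are replaced by spaces so later stages see exactly one line per record.
# Caveat: multi-character RS is a gawk/mawk extension (POSIX awk uses only the first char).
split_records() {
    awk -v RS="$1" '{ gsub(/\n/, " "); print }'
}
```

For example, `split_records '<bookmark' < bookmarks.xml` prints the text before the first `<bookmark`, then one line per bookmark element, ready to be passed to a field picker.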



Code:

#!/bin/bash
#
main() {
    if [[ -t 0 ]] || regexp '^-?-help$' "$1"; then echo -e $msg; exit 1; fi   # help if stdin is a terminal or -help/--help is given
    #log "fieldz main111 debug col:$col. fsep:$fsep.  arg1:$1.  arg2:$2."
    local fsep=' ' rsep;
    space=" ";
    checkparams() {
   if test "$1" = -d;then
       fsep=$2;return 2;
   else
       if regexp "^-d" "$1"; then
      fsep=${1#-d};return 1;fi
   fi
   if test "$1" = -r;then
       rsep=$2;return 2;
   else
       if regexp "^-r" "$1"; then
      rsep=${1#-r};return 1;fi
   fi
   if test "$1" = -z; then
       space="";
       return 1;fi
    }
    checkparams "$@";shift $?
    checkparams "$@";shift $?
    fs=$fsep
    while isNaN "$1";do if test "${1:0:1}" = "-" || regexp "[0-9]+-" "$1"; then break;fi;
            prefix=$prefix"$1$space"; shift;
    done
    #log have prefix:$prefix, fsep:$fsep, rsep:$rsep, star:$*, space=$space.
    if test "$rsep";then awk -v RS="$rsep" '{gsub("\n"," ");print}';else cat;fi |
   while IFS= read -r line; do
       #log "main() read line from stdin as: $line "
       #wchar0=`wchar`;   # debug only: wchar() runs sync, which is slow on every input line
       if test "$prefix";then echo -n "$prefix";fi
       if test -z "$1";then field;fi;                #default, i.e., last field
       i=0;
       for col in "$@";do      #process each user specification of a field then call field() to print the required one.
      fs=$fsep
      if isNaN $col;then
          #log "for loop,  NaN, col=$col"
          if regexp "-[^0-9]+" "$col"; then          #find if a dash<name> in spec.
         fs=${col:1}; col=2;flag=\"\|\';            #it is a named field
          else if regexp "[0-9]+-" "$col"; then         #look for <num><dash>, eg, 2-, print all prior fields
              col=${col:0:-1};
              if test $col -lt 0;then if test $col -eq -1;then col=''; else col=$col+1;fi; col=NF$col;fi #as below, copy to func
              andon=true;upto=true;
          else echo -n "$col$space"; continue;          #is just text in spec copy to output
          fi
          fi
      else #is numeric col(s)
          if [[ $col = *+ ]];then
         andon=true;col=${col/+};
          fi
          if test $col -lt 0;then
         if test $col -eq -1;then
             col='';
         else col=$col+1;fi;
         col=NF$col;fi
      fi #endif isNaN
      field $flag;
      flag=''
      let i=i+1
      if test $# -ne $i && test -z $andon;then echo -n "$space";fi 2>/dev/null;
      unset andon upto
       done #end of: for col in $@
       #log "main(), end for col loop, wchar0:$wchar0 "`wchar`
       echo 2>/dev/null # newline, also in case of broken pipe.
   done #end of: while read line
}

field() {
    #log Call field2 with "$@."
    local res=$(field2 $@)
    if test "$res"; then echo -n "$res"; fi
}

field2() { #print the particular field , uses globals: andon, col, line
    #log "field():$fs.  col:$col.  arg1:$1. line:$line. andon:$andon."
    if test -z $col;then col=NF;fi
    echo   "$line"|
   if test -z $andon; then
       if test -z $fs;then
      awk '{printf "%s", $('$col')}'          # "%s" so a field containing % is not treated as a format
       else
      if test -z $1; then
          awk -F"$fs" '{ printf "%s", $('$col') }'
         else                                     #flag for named field $1
          after=$(awk -F"$fs" '{ printf "%s", $2 }')  # >1 field match in? print $3, $4?
          if regexp "^ *[\"']" "$after"; then     # value starts with a quote
         echo -n "$after" | awk -F"$1" '{ printf "%s", $2 }'
          else
         after=${after## }
         echo -n "${after%% *}"
          fi
      fi
       fi
   else  # if $andon is not empty, !-z$andon,
       #log "field(): Do cut with fs:$fs,col:$col,upto:$upto."
       tr '\t' ' '|
      if test -z $upto; then
          awk -v space="$space" -F"$fs" '  { for(i=('$col');i<=NF;i++) {printf "%s%s", $i, space; }; }';
      else
          awk -v space="$space" -F"$fs" '{
                         for(i=1;i<=('$col');i++) {printf "%s%s", $i, space;  }; }';
      fi     |
      tr -d "\n"
   fi
    #log "field() end."
}

regexp() { #see if $1 as grep regex is in $2
    rexp=$1;shift
    printf '%s\n' "$*" | grep -Eq -e "$rexp"    # printf, unlike echo, never treats "-n"/"-e" as options
}

isNaN() {
    if test -z "$*";then return 1;fi
    if regexp "^-?[0-9]+\+?$" "$*";then return 1;else return 0;fi
}

wchar() {
    sync
    grep wchar /proc/$$/io|grep -o '[0-9]*';
}

msg="\nUsage: fieldz [-d field-delimiter] [-r record-separator] [ -z ] [[-]column-number[+]|-fieldname|text]...[filename]
\n
\nPrint given field column(s) from input.  Default is last field.  \
\nUse a minus number to print counting from the last field.  \
\nFirst field is numbered from 1 not from zero.  Zero refers to all fields.\
\nUse -<fieldname> to print the value of a named field.
\nOrdinary text is just copied to output.
\nUse a plus sign, '+', after a field number to print all following fields.
\nUse a minus sign, '-', after a field number to print all prior fields.
\n-d <delimiter> will use given delimiter instead of spaces to split input fields
\n-r <separator> will use given separator instead of newlines to split input records (newlines are swapped for spaces)
\n-z output is not padded with space, eg, fieldz 1 2 3 will output three fields joined together without spaces.
\n\neg,\t\$ echo yes we can | fieldz 2 3 3 1
\n\t==> we can can yes
\n
\n\t\$ echo b=\"item1\" bcost=\"99\"   | fieldz 'It costs: ' -bcost=
\n\t==> It costs: 99
\n
\n\t\$ fieldz -d: userid: 1  shell: -1 /etc/passwd \
\n\t==> userid: guest shell:  /bin/sh
\n
\n\t$ find . -printf %T+' '  -print |sort|fieldz 2+ | tr '\n' ' ';echo
\n\t--gives a single-line output with the names of all files sorted by modification time, oldest first.
\n
\n\t$ cat bookmarks.xml | fieldz -r \<bookmark -d url=\" 2 | fieldz -d\" 1
\n\t--extracts the url attribute value from an xml file
\n"

# debugfile=/tmp/fieldz.log
# exec > >(tee -a $debugfile)
# exec 2>&1
# rm -f $debugfile

log() { echo "{./fieldz debug, $*}" > /dev/tty; } # or redirect to $debugfile instead

for lastarg; do :;done;   #trick to get last arg
if test -f "$lastarg"; then    # quoted: with no arguments, an unquoted "test -f" is always true
    set -- "${@:1:$#-1}"   # drop the last argument (the input file name), keeping $1 $2 ...
else
    unset lastarg
fi
#log "Call main, args:$@."
main "$@" < "${lastarg:-/dev/stdin}" | sed '/^$/d'
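The named-field lookup done with awk in field2() (the value following a name such as bcost=, either quoted or up to the next space) can be approximated in pure bash with parameter expansion. This is a sketch only; named_field is a hypothetical name, and it matches the same quote character rather than splitting on either quote as the awk code does.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of fieldz's "-name" lookup (not the script's own code).
# Prints the value after the first occurrence of NAME in LINE:
# a quoted value runs to the matching quote, a bare value runs to the next space.
named_field() {
    local name=$1 line=$2 rest
    [[ $line == *"$name"* ]] || return 1      # NAME does not occur in LINE
    rest=${line#*"$name"}                     # text after the first occurrence of NAME
    rest=${rest#"${rest%%[! ]*}"}             # strip leading spaces
    case $rest in
        \"*) rest=${rest#\"}; rest=${rest%%\"*} ;;   # double-quoted value
        \'*) rest=${rest#\'}; rest=${rest%%\'*} ;;   # single-quoted value
        *)   rest=${rest%% *} ;;                     # bare value up to the next space
    esac
    printf '%s\n' "$rest"
}
```

With the help-text example, `named_field bcost= 'b=item1 bcost=99'` prints 99.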



Last edited by jay on Wed Jun 10, 2015 4:39 pm, edited 12 times in total.

Posted: Mon Oct 22, 2012 1:33 am

Joined: Mon Mar 02, 2009 3:03 am
Posts: 605
hi,

that's nice.

but:
why does bash have to be in posix mode?
main is useless: functions are mostly used for repeated code.
instead of testing the whole $* variable, test just $1, or whether $# is 1 or more.
variables in tests should always be quoted.
expr is not bash; you can probably do the same thing using double square brackets and BASH_REMATCH
regex for [[ is easier to use in a variable
Code:
reg="^ *[\"']"
[[ $var =~ $reg ]] ...
you see that lhs var doesn't need to be quoted.
instead of using awk, you could read the line into an array, based on a defined IFS, and print its fields in any order.
use more quotes
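A minimal sketch of the array approach suggested above, assuming a hypothetical function pick_fields that supports only positive and negative field numbers, 0 for all fields, literal text, and an optional leading -d; it is not a drop-in replacement for fieldz.

```shell
#!/usr/bin/env bash
# Hypothetical pure-bash field picker (a sketch, not the fieldz script itself).
# Each input line is split into an array with read -a, using IFS as the delimiter set.
#   N -> Nth field (1-based)    -N -> Nth field from the end (-1 = last)
#   0 -> all fields             text -> copied to the output unchanged
# An optional leading "-d CHARS" replaces the default blank delimiters.
pick_fields() {
    local delim=$' \t' line spec idx
    local num_re='^(0|-?[1-9][0-9]*)$'      # regex kept in a variable, used unquoted
    local -a fields out
    if [[ $1 == -d ]]; then
        delim=$2
        shift 2
    fi
    while IFS= read -r line; do
        IFS=$delim read -r -a fields <<< "$line"   # fields[0..n-1]
        out=()
        for spec in "$@"; do
            if [[ ! $spec =~ $num_re ]]; then
                out+=("$spec")                     # ordinary text
            elif (( spec == 0 )); then
                out+=("${fields[*]}")              # all fields, joined by spaces
            else
                if (( spec < 0 )); then
                    idx=$(( ${#fields[@]} + spec ))   # count back from the last field
                else
                    idx=$(( spec - 1 ))               # user numbering starts at 1
                fi
                # An out-of-range position yields an empty field rather than an error.
                if (( idx >= 0 && idx < ${#fields[@]} )); then
                    out+=("${fields[idx]}")
                else
                    out+=("")
                fi
            fi
        done
        printf '%s\n' "${out[*]}"                  # join with a space (first char of IFS)
    done
}
```

`echo "yes we can" | pick_fields 2 3 indeed 1` prints "we can indeed yes", like the fieldz example. One difference from awk -F: with a non-blank IFS, read drops a trailing empty field, so "a:b:" yields two fields rather than three.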


Posted: Mon Oct 22, 2012 7:56 am

Joined: Sun Oct 21, 2012 3:23 pm
Posts: 7
Thanks for your useful comments.

Posix is given since it will run in bash too; it just means that posix shells may also run it. I can't find a general shell script repository on the internet.

Calling main() at the end is useful in that it allows the use of low-level functions in the main body. You cannot pre-declare functions in bash.

Yes, using $# may be quicker, but how much is a microsecond worth these days?
Quotes are not needed for "test -z", nor if you know the variable is non-empty.
As to regexp, your way is clearer.
I'm not sure how to use REMATCH but I've not noticed an absence of expr anywhere.

Thanks,J.
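For reference, a minimal sketch of [[ =~ ]] with BASH_REMATCH as the reply suggests, applied to fieldz-style column specs such as 2+, 3- and -1; parse_spec is a hypothetical helper, not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a column spec ("2+", "3-", "-1", ...) into its
# number and optional suffix with [[ =~ ]] and BASH_REMATCH, without grep or expr.
parse_spec() {
    local re='^(-?[0-9]+)([+-]?)$'   # regex kept in a variable, used unquoted below
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[0] holds the whole match; [1] and [2] hold the two groups.
        printf 'number=%s suffix=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        printf 'text=%s\n' "$1"      # not a column spec: plain text
        return 1
    fi
}
```

`parse_spec 2+` prints "number=2 suffix=+", and `parse_spec indeed` prints "text=indeed" with a non-zero status.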


Posted: Mon Oct 22, 2012 3:38 pm

Joined: Mon Mar 02, 2009 3:03 am
Posts: 605
[[ is not posix, so your script won't work with strictly posix shells.
--posix doesn't exactly mimic a posix shell
Quote:
Change the behavior of bash where the default operation differs from the POSIX standard to match the standard (posix mode).
you can't rely on this option to test whether your code is strictly posix; better to use (d)ash.
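As an illustration of avoiding [[ entirely, here is a sketch of how the script's isNaN() test might be written with only POSIX case patterns and parameter expansion, so it can be run under dash. The name is_nan is hypothetical, and the variable n is global because POSIX sh has no local.

```shell
#!/bin/sh
# Hypothetical POSIX-only version of isNaN(): no [[ ]], no grep.
# Returns 0 (true) when $1 is NOT a column number such as 3, -2 or 4+.
is_nan() {
    case $1 in
        '') return 1 ;;                 # empty counts as a number, like the original isNaN()
    esac
    n=${1#-}                            # drop one leading minus
    n=${n%+}                            # drop one trailing plus
    case $n in
        ''|*[!0-9]*) return 0 ;;        # empty or contains a non-digit: not a number
        *) return 1 ;;
    esac
}
```

Under these rules `is_nan 2-` is true (that spec is handled as "prior fields" elsewhere), while 3, -2 and 4+ are numbers.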

what do you mean «you can't pre-declare functions in bash» ?
functions can be declared at the top of the script, just like variables.
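A minimal sketch of the point: bash looks a function up when it is called, not when its caller is defined, so main() may use helpers that appear later in the file as long as the call to main comes after all of them. The names main and greet are illustrative only.

```shell
#!/usr/bin/env bash
# main() refers to greet() before greet() is defined; this works because
# the lookup happens only when main actually runs.
main() {
    greet "$@"     # greet() is not defined yet when this line is read
}

greet() {
    printf 'hello, %s\n' "${1:-world}"
}

# Only the call itself must come after every definition it needs.
main "$@"
```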

Code:
$ var="foo bar"
$ test -z $var
bash: test: foo: binary operator expected
$ test -z "$var"
$
see!

`expr` is not a shell builtin, so why use an external command when the shell can do it by itself?
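To make the comparison concrete, a small sketch stripping the "-d" prefix from an option such as "-d:" both ways. The "x" prefix in the expr call is a common portability idiom so that a value starting with "-" is not mistaken for an operator or option.

```shell
#!/usr/bin/env bash
arg='-d:'

# External command: expr runs as a separate process and prints the \( \) group.
via_expr=$(expr "x$arg" : 'x-d\(.*\)')

# Shell only: ${var#pattern} removes the shortest matching prefix, no fork needed.
via_shell=${arg#-d}

printf 'expr: %s  shell: %s\n' "$via_expr" "$via_shell"
```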


Posted: Tue Oct 23, 2012 12:47 pm

Joined: Sun Oct 21, 2012 3:23 pm
Posts: 7
The code above has been updated in situ.

Point taken, thanks.

"Predeclare" is a term from C programming.


Last edited by jay on Mon Dec 15, 2014 8:39 am, edited 1 time in total.

Posted: Mon Dec 15, 2014 8:25 am

Joined: Sun Oct 21, 2012 3:23 pm
Posts: 7
---


Last bumped by jay on Mon Dec 15, 2014 8:25 am.




BashScripts
© 2003 - 2011 USA LINUX USERS GROUP