
Pick out fields from each input line simply


All times are UTC - 6 hours


Posted: Sun Oct 21, 2012 4:20 pm

Joined: Sun Oct 21, 2012 3:23 pm
Posts: 7
The script below provides the command 'fieldz', which can pick out fields/words from the input.
It has a number of options but as a simple example:

$ echo "yes we can" | fieldz 2 3 indeed 1
==> we can indeed yes

Immediately below is the output of its help message, followed by the script itself. To use it, copy and paste the code into a file named fieldz and run "chmod +x fieldz" to make it executable.

Quote:
$ fieldz --help

Usage: fieldz [-d field-delimiter] [-r record-separator] [[-]column-number[+]|-fieldname|text]...

Print given field column(s) from input. Default is last field, -1.
Use a minus number to print counting from the last field.
First field is numbered from 1 not from zero. Zero refers to all fields.
Use -<fieldname> to print the value of a named field.
Ordinary text is just copied to output.
Use a plus sign, '+', after a field number to print all following fields.
-d <delimiter> will use given delimiter instead of spaces to split input fields
-r <separator> will use given separator instead of newlines to split input records (newlines are swapped for spaces)

eg, $ echo yes we can | fieldz 2 3 3 1
==> we can can yes

$ echo b="item1" bcost="99" | fieldz 'It costs: ' -bcost=
==> It costs: 99

$ cat /etc/passwd | fieldz -d: userid: 1 shell: -1
==> userid: guest shell: /bin/sh

$ find . -printf %T+' ' -print |sort|fieldz 2+
--gives a list of files sorted by modified time, oldest first.

$ cat bookmarks.xml | fieldz -r "<bookmark" -d url 2 | fieldz -d\" 2
--extracts the url attribute value from an xml file
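To make the column notation concrete, here is a rough sketch of plain-awk one-liners that should give the same output as three of the examples above. It reflects a reading of the help text and of how the script builds its awk expressions (a negative column -N becomes $(NF-N+1)), not output captured from fieldz; the function names and the sample passwd line are made up.

```shell
#!/bin/bash
# Rough plain-awk equivalents of three fieldz examples (an interpretation, not fieldz itself).
# The function names are hypothetical.

# fieldz 2 3 indeed 1: fields 2 and 3, some literal text, then field 1
pick_2_3_1() { awk '{ print $2, $3, "indeed", $1 }'; }

# fieldz -d: userid: 1 shell: -1: column -N means $(NF-N+1), so -1 is $NF
user_and_shell() { awk -F: '{ print "userid:", $1, "shell:", $NF }'; }

# fieldz 2+: field 2 through the last field, rejoined with single spaces
from_field_2() {
  awk '{ out = ""; for (i = 2; i <= NF; i++) out = (i == 2 ? $i : out " " $i); print out }'
}

echo "yes we can" | pick_2_3_1
echo "guest:x:1001:1001::/home/guest:/bin/sh" | user_and_shell
echo "2014-12-15+10:00 ./my file.txt" | from_field_2
```

Unlike fieldz 2+, which uses cut, the from_field_2 sketch collapses runs of spaces, so a file name containing two consecutive spaces would come out differently.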



Code:


#!/bin/bash
#
main() {
    # Show help when stdin is a terminal (nothing piped in) or the first argument is -help/--help.
    if [[ -t 0 ]] || regexp '^-?-help$' "$1"; then echo -e $msg; exit 1; fi

    isNaN() {
   if test -z "$*";then return 1;fi
   if regexp "^-?[0-9]+\+?$" "$*";then return 1;else return 0;fi
    }
    #echo main debug col:$col. fsep:$fsep.  arg1:$1.
    local fsep=' ' rsep;
    checkparams() {
    if test "$1" = -d;then
   fsep=$2;return 2;
    else
   if expr match "$1" -d.* >/dev/null; then
       fsep=${1#-d};return 1;fi
    fi
    if test "$1" = -r;then
   rsep=$2;return 2;
    else
   if expr match "$1" -r.* >/dev/null; then
       rsep=${1#-r};return 1;fi
    fi
    }
    checkparams $*;shift $?
    checkparams $*;shift $?
    fs=$fsep
    while isNaN $1;do if test ${1:0:1} = -; then break;fi;
   prefix=$prefix"$1 "; shift;
    done
    #echo Field have fsep $fsep, rsep $rsep, star $*
    # With -r, split records on $rsep instead of newlines (newlines inside a record become spaces).
    if test "$rsep";then awk -v RS="$rsep" '{gsub("\n"," ");print}';else cat;fi |
    while read -r input; do
   #echo -n debug read input as: $input'  '
   if test "$prefix";then echo -n "$prefix";fi
   if test -z $1;then field;fi; #default.
   i=0;
   for col in "$@";do
       fs=$fsep
       andon=''   # reset, so a "+" on one column does not carry over to later columns
       if isNaN $col;then
      if expr match "$col" - >/dev/null; then
          fs=${col:1}; col=2;flag=\"\|\'; #is named field
      else echo -n  "$col ";continue; #is just text
      fi
       else #is numeric col(s)
      if [[ $col = *+ ]];then andon=true;col=${col/+};fi
      if test $col -lt 0;then if test $col -eq -1;then
         col='';
          else col=$col+1;fi; col=NF$col;fi
       fi
       field $flag;
       flag=''
       let i=i+1
       if test $# -ne $i;then echo -n ' ';fi 2>/dev/null;
   done #end of: for col in $@
   echo 2>/dev/null # in case of broken pipe.
    done #end of: while read input
}

field() {
    #echo -n field debug fs:$fs.  col:$col.  arg1:$1. input:$input.'  '
    if test -z $col;then col=NF;fi
    echo   "$input"|
    if test -z $andon; then
   if test -z $fs;then
       awk '{printf "%s", $('$col')}'   # fixed "%s" format: a % in the field prints literally
   else
       if test -z $1; then
      awk -F"$fs" '{ printf "%s", $('$col') }'
          else
      after=`awk -F"$fs" '{ printf "%s", $2 }'`  # >1 field match in? print $3, $4?
      if `regexp "^\ *[\"\']" "$after"`;then
          echo -n "$after" | awk -F"$1" '{ printf "%s", $2 }'
      else
          after=${after## }
          echo -n ${after%% *}
      fi
       fi
   fi
    else #$andon is not -z
   tr '\t' ' '|
   cut -d"$fs" -f $col-|
   tr -d "\n"
#     awk -F"$fs" '{ for ( i = '$col' ; i <= NF; i++ ) printf $i" " }'
#   cut -d"'"$fs"'" -f$col-;  ## this doesn't strip spaces from rest of input.
    fi
}

regexp() {
  rexp=$1;shift
  echo "$*"|egrep -q -e "$rexp"
}

msg="\nUsage: fieldz [-d field-delimiter] [-r record-separator] [[-]column-number[+]|-fieldname|text]...
\n
\nPrint given field column(s) from input.  Default is last field, -1.  \
\nUse a minus number to print counting from the last field.  \
\nFirst field is numbered from 1 not from zero.  Zero refers to all fields.\
\nUse -<fieldname> to print the value of a named field.
\nOrdinary text is just copied to output.
\nUse a plus sign, '+', after a field number to print all following fields.
\n-d <delimiter> will use given delimiter instead of spaces to split input fields
\n-r <separator> will use given separator instead of newlines to split input records (newlines are swapped for spaces)
\n\neg,\t\$ echo yes we can | fieldz 2 3 3 1
\n\t==> we can can yes
\n
\n\t\$ echo b=\"item1\" bcost=\"99\"   | fieldz 'It costs: ' -bcost=
\n\t==> It costs: 99
\n
\n\t\$ cat /etc/passwd | fieldz -d: userid: 1  shell: -1\
\n\t==> userid: guest shell:  /bin/sh
\n
\n\t$ find . -printf %T+' '  -print |sort|fieldz 2+
\n\t--gives a list of files sorted by modified time, oldest first.
\n
\n\t$ cat bookmarks.xml | fieldz -r \"<bookmark\" -d url 2 | fieldz -d\\\" 2
\n\t--extracts the url attribute value from an xml file
\n"

main $*
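A note on awk one-liners of the form awk '{ printf $2 }': the field itself becomes printf's format string, so a field containing % (a file name such as 50%d.txt, say) is interpreted as a conversion and, depending on the awk implementation, prints something wrong or aborts. A minimal sketch of the safer pattern with a fixed "%s" format; print_field is a hypothetical helper that takes a positive column number.

```shell
#!/bin/bash
# print_field (hypothetical helper): print column $1 of each input line literally.
print_field() {
  # -v hands the column number to awk; $n then selects that field.
  # The fixed "%s" format means a '%' inside the field is printed as-is.
  awk -v n="$1" '{ printf "%s\n", $n }'
}

echo 'save 50%d today' | print_field 2
```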




Last edited by jay on Mon Dec 15, 2014 8:35 am, edited 10 times in total.

Posted: Mon Oct 22, 2012 1:33 am

Joined: Mon Mar 02, 2009 3:03 am
Posts: 579
hi,

that's nice.

but:
why does bash have to be in posix mode?
main is useless: functions are mostly used for repeated code.
instead of testing the whole $* variable, test just $1, or check that $# is 1 or more.
variables in tests should always be quoted.
expr is not a bash builtin; you can probably do the same thing using double square brackets and BASH_REMATCH.
a regex for [[ is easier to use when it's stored in a variable:
Code:
reg="^ *[\"']"
[[ $var =~ $reg ]] ...
note that the left-hand-side variable doesn't need to be quoted.
instead of using awk, you could read the line into an array, split on a chosen IFS, and print its fields in any order.
use more quotes.
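A minimal sketch of that array approach, assuming bash (read -a is not POSIX). pick is a hypothetical name; it takes a delimiter followed by column numbers (1 is the first field, -1 the last, 0 the whole line) or literal text, loosely mirroring the fieldz arguments. The column regex is kept in a variable, as suggested above.

```shell
#!/bin/bash
# Sketch of the array approach: no awk, just read -a with a chosen IFS.
# pick is a hypothetical name. Usage: pick DELIM ARG...
pick() {
  local delim=$1 line f idx
  local num='^(0|-?[1-9][0-9]*)$'   # column numbers; anything else is literal text
  local -a fields out
  shift
  while IFS= read -r line; do
    # IFS applies only to this read: split the line into the fields array.
    IFS=$delim read -r -a fields <<< "$line"
    out=()
    for f in "$@"; do
      if [[ $f =~ $num ]]; then
        if (( f == 0 )); then
          out+=("$line")                 # 0 means the whole line
          continue
        elif (( f < 0 )); then
          idx=$(( ${#fields[@]} + f ))   # -1 is the last field
        else
          idx=$(( f - 1 ))               # arrays start at index 0
        fi
        if (( idx >= 0 )); then out+=("${fields[idx]}"); else out+=(""); fi
      else
        out+=("$f")                      # ordinary text is copied
      fi
    done
    printf '%s\n' "${out[*]}"            # joined with a space, the first char of IFS
  done
}

echo "yes we can" | pick ' ' 2 3 indeed 1
echo "guest:x:1001:1001::/home/guest:/bin/sh" | pick : userid: 1 shell: -1
```

With a space as the delimiter, read collapses runs of blanks, much like awk's default splitting; with a non-blank delimiter such as ':' empty fields are kept.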


Posted: Mon Oct 22, 2012 7:56 am

Joined: Sun Oct 21, 2012 3:23 pm
Posts: 7
Thanks for your useful comments.

The posix option is given because the script will still run in bash; it just means that POSIX shells may also be able to run it. I can't find a general shell script repository on the internet.

Calling main() at the end is useful because it lets the main body use lower-level functions that are defined after it. You cannot pre-declare functions in bash.

Yes, using $# may be quicker, but how much is a microsecond worth these days?
Quotes are not needed for "test -z", nor if you know the variable is non-empty.
As for regexp, your way is clearer.
I'm not sure how to use BASH_REMATCH, but I've never come across a system without expr.

Thanks,J.


Posted: Mon Oct 22, 2012 3:38 pm

Joined: Mon Mar 02, 2009 3:03 am
Posts: 579
[[ is not posix, so your script won't work with strictly posix shells.
--posix doesn't exactly mimic a posix shell
Quote:
Change the behavior of bash where the default operation differs from the POSIX standard to match the standard (posix mode).
you can't rely on this option to test whether your code is strictly posix; it's better to use (d)ash.
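For comparison, here is a sketch of how the -d/-r handling could be written with only POSIX features (case patterns and ${1#-d}), so it can be checked by running it under dash rather than bash --posix. It is an illustration, not the author's code; parse_opt and consumed are made-up names, while fsep and rsep mirror the script's variables.

```shell
#!/bin/sh
# POSIX-only sketch of the -d/-r option parsing: no [[ ]] and no expr.
# parse_opt and consumed are hypothetical names; fsep/rsep mirror the script.
fsep=' '
rsep=''

# Looks at one leading option and sets consumed to the number of args it used.
parse_opt() {
  consumed=0
  case $1 in
    -d) [ $# -ge 2 ] || return 1; fsep=$2; consumed=2 ;;   # -d X
    -d?*) fsep=${1#-d}; consumed=1 ;;                      # -dX
    -r) [ $# -ge 2 ] || return 1; rsep=$2; consumed=2 ;;   # -r X
    -r?*) rsep=${1#-r}; consumed=1 ;;                      # -rX
  esac
}

set -- -d: -r '<bookmark' 2
parse_opt "$@" && shift "$consumed"
parse_opt "$@" && shift "$consumed"
printf 'fsep=%s rsep=%s rest=%s\n' "$fsep" "$rsep" "$*"
```

A missing argument after -d or -r makes parse_opt return 1, so the shift is skipped instead of failing.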

what do you mean «you can't pre-declare functions in bash»?
functions can be declared at the top of the script, just like variables.
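On the pre-declaration point: bash has no separate declaration step; it looks a function up when the call actually executes. A function body may therefore call functions defined further down the file, as long as the call happens after all of them have been defined, which is what calling main on the last line achieves. Defining functions at the top, as suggested, works just as well. A small illustration with made-up function names:

```shell
#!/bin/bash
# Bash has no forward declarations: a function name is looked up when the call runs.
# main, helper and later are illustrative names.

main() {
  helper "called from main"   # helper is defined below; that is fine
}

helper() {
  printf '%s\n' "$1"
}

# Calling a function before its definition has executed fails (status 127).
later 2>/dev/null || echo "later: not defined yet"
later() { :; }

main   # works: main and helper both exist by the time this line runs
```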

Code:
$ var="foo bar"
$ test -z $var
bash: test: foo: binary operator expected
$ test -z "$var"
$
see!
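The -z case only looks safe: the same word-dropping makes an unquoted -n test silently give the wrong answer when the variable is empty. A small demonstration:

```shell
#!/bin/bash
# Unquoted variables in test can give wrong answers without any error message.
empty=''

# Unquoted: the empty value vanishes, test sees only "-n", a non-empty
# string, and returns true.
if test -n $empty; then echo "unquoted -n: true (wrong)"; fi

# test -z $empty is likewise reduced to the single word "-z" and is true,
# which is the right answer here only by accident.
if test -z $empty; then echo "unquoted -z: true"; fi

# Quoted: test sees "-n" and "", so the result is false as expected.
if test -n "$empty"; then echo "quoted -n: true"; else echo "quoted -n: false"; fi
```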

`expr` is not a shell builtin, so why use an external command when the shell can do it by itself?
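As an example of doing it inside the shell, the script's expr match "$1" -d.* test followed by ${1#-d} can be expressed with [[ =~ ]] and BASH_REMATCH, the combination mentioned earlier in the thread. A minimal sketch; get_delim is an illustrative name, and a plain glob test such as [[ $1 == -d?* ]] would also do if the captured text isn't needed.

```shell
#!/bin/bash
# Builtin-only replacement for: expr match "$1" '-d.*' >/dev/null && fsep=${1#-d}
# get_delim is a hypothetical name: it prints X for an argument of the form -dX.
get_delim() {
  local re='^-d(.+)$'   # regex kept in a variable, so no quoting trouble inside [[ ]]
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"   # the part captured by (.+)
  else
    return 1                             # not a -dX argument
  fi
}

get_delim -d:
get_delim -x || echo "no delimiter in -x"
```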


Posted: Tue Oct 23, 2012 12:47 pm

Joined: Sun Oct 21, 2012 3:23 pm
Posts: 7
The code above has been updated in situ.

Point taken, thanks.

"Pre-declare" is a term from C programming (forward declarations).


Last edited by jay on Mon Dec 15, 2014 8:39 am, edited 1 time in total.

Posted: Mon Dec 15, 2014 8:25 am

Joined: Sun Oct 21, 2012 3:23 pm
Posts: 7
---


Last bumped by jay on Mon Dec 15, 2014 8:25 am.


© 2003 - 2011 USA LINUX USERS GROUP