Download xkcd comics


All times are UTC - 6 hours


 PostPosted: Sat Jan 23, 2010 8:08 am   

Joined: Sat Jan 23, 2010 7:54 am
Posts: 5
My first script on this site. I hope it's useful to someone.
Possibly interesting parts: Adding the mouseover text to the bottom of the image and the title to the top in the local copy.

Read before running:
1. The directory where you'll store the comic has to exist. The script doesn't check. I could add it but it's trivial to make the dir yourself.
2. You need to have imagemagick installed. convert must be runnable from your PATH variable.
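
A minimal sketch of how both prerequisites could be checked up front. `check_prereqs` is a hypothetical helper, not part of the posted script; it assumes that creating the directory automatically is acceptable and that `convert` is the only external tool worth testing for.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): make sure the target
# directory exists and that the required tool can be found via PATH.
check_prereqs() {
    local dir=$1 tool=${2:-convert}
    # mkdir -p creates missing parent directories and is a no-op if the
    # directory already exists.
    mkdir -p -- "$dir" || return 1
    # 'command -v' succeeds only if the tool is runnable from PATH.
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool not found in PATH" >&2
        return 2
    fi
    return 0
}
```

It could be called near the top of the script, for example `check_prereqs "$XKCDIR" || exit 1`.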

How to download every comic in the archives:
for i in {1..692} ; do echo "Downloading $i" ; xkcd $i ; done

It will fail for comic #404, which leads to the site's 404 page, but it doesn't break. It just creates an entirely blank comic.
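
One way the blank comic for #404 could be avoided is to test whether an image tag was found before going any further. The sketch below assumes the page layout the script already relies on (an `<img src="http://imgs.xkcd.com/comics/...` tag on one line); `extract_img_url` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read a comic page on stdin, print the image URL,
# and return non-zero when the page has no comic image (e.g. the 404 page).
extract_img_url() {
    local line url
    # grep exits non-zero when nothing matches; pass that on to the caller.
    line=$(grep -m1 '<img src="http://imgs.xkcd.com/comics/') || return 1
    url=${line##*<img src=\"}   # drop everything up to the src value
    url=${url%%\"*}             # drop everything from the closing quote on
    printf '%s\n' "$url"
}
```

Used as `IMGNAME=$(extract_img_url < index.html) || exit 1`, the script would stop before calling convert on a page that has no comic.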

Code:
#!/bin/bash
# Variables: $1 as number of comic to download

TODOW=$(date +%a)
XKCDIR=/home/manu/comics/xkcd

cd $XKCDIR
CURRENTNUMBER=0
   for i in {0..50000} ; do
      [ -f $i.png ] && CURRENTNUMBER=$i
   done
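let CURRENTNUMBER=$CURRENTNUMBER+1
# A comic number given as $1 overrides the default (newest local copy + 1)
[ -n "$1" ] && CURRENTNUMBER=$1
[ -f index.html ] && rm index.html
# $i is left at 50000 by the loop above, so request $CURRENTNUMBER instead
wget -q -O index.html http://www.xkcd.com/$CURRENTNUMBER/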
WORKSTRING=$(cat index.html | grep \<img\ src=\"http://imgs.xkcd.com/com)
IMGNAME=${WORKSTRING##*\<img\ src=\"}
IMGNAME=${IMGNAME%%\"*}
FILENAME=${IMGNAME##http://imgs.xkcd.com/comics/}
TITLE=${WORKSTRING##*title=\"}
TITLE=${TITLE%%\"*}
ALT=${WORKSTRING##*alt=\"}
ALT=${ALT%%\"*}
wget -q $IMGNAME
convert $FILENAME -background White -pointsize 36 label:"$ALT\n" +swap -gravity Center -append $CURRENTNUMBER.png
convert $CURRENTNUMBER.png -background White -pointsize 11 -size 420x caption:"\n\n\n$TITLE" -gravity Center -append $CURRENTNUMBER.png
[ -f $FILENAME ] && rm $FILENAME
[ -f index.html ] && rm index.html
exit 0


I did some trivial adapting in this forum's input box without testing, because I actually run it as a cron job to download the new comic automatically on Monday, Wednesday and Friday. I'm fairly certain it will run, but if someone tries it and it doesn't, let me know.


 PostPosted: Sun Jan 24, 2010 5:25 am

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
Hi apethedog,

That is some funny stuff. I like 690, "Semicontrolled Demolition".

What a cool script! I like the 'for' loop, "for i in {0..5000}; do". This is the first time I have seen that. It seems much better than using (( i=0; i <= 5000; i++ )). Too bad it cannot be used with a variable (e.g. "for i in {0..$x}")! ...Or can it?

The package 'imagemagick' looks cool. I had to install it and it works great. I like how you used it to add a label and a title.

I love the idea of using 'bash' to get information from the internet. I like how you used the webpage to isolate the picture ("wget -q http://www.xkcd.com/$i"... "wget -q $IMGNAME"). I saw something like this in "Wicked Cool Shell Scripts", but your script is better (IMO).

It's cool that you run this as a 'cron' job. (I still haven't tried playing with 'cron').

Thanks a lot. I really like it and look forward to trying something like this for myself.

:D

g


 PostPosted: Sun Jan 24, 2010 7:01 am

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
gofree wrote:
That is some funny stuff. I like 690, "Semicontrolled Demolition".


Actually, they are all really good. Better than the strips in the newspaper!


 PostPosted: Sun Jan 24, 2010 12:10 pm   

Joined: Sat Jan 23, 2010 7:54 am
Posts: 5
I'm glad you like it, and that you think the comics are good. I kind of like having them on my hard disk.

It hadn't occurred to me to try the expansion with a variable. I checked just now, and I can't immediately find a way to make it expand properly with: for i in {0..$x}. That sort of thing is really more something you'd use a while loop for.
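
For reference, bash performs brace expansion before it expands variables, which is why {0..$x} is left as literal text. The sketch below shows three common ways to loop up to a variable bound; the function names are made up for illustration, and the `seq` variant assumes the `seq` utility is installed.

```shell
#!/bin/bash
# Three ways to count from 0 to a bound held in a variable.
# All function names here are hypothetical.

count_for() {        # C-style arithmetic loop (a bash feature)
    local x=$1 i
    for (( i = 0; i <= x; i++ )); do echo "$i"; done
}

count_while() {      # plain while loop
    local x=$1 i=0
    while [ "$i" -le "$x" ]; do
        echo "$i"
        i=$(( i + 1 ))
    done
}

count_seq() {        # let the external 'seq' command generate the list
    local x=$1 i
    for i in $(seq 0 "$x"); do echo "$i"; done
}
```

Each function prints one number per line, so `count_for 3` and `count_seq 3` produce the same list.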

cron jobs are very useful for things like this. They do have one problem you should be aware of: if your pc is not running when their time comes to trigger, they never run at all. You can remedy this by using anacron, or avoid doing that by using cron.hourly and using a timestamp/check before it does anything to ensure it only runs once every day.

Usually your distro will have cron running, and will have cron.hourly, cron.daily, cron.weekly and cron.monthly directories made under /etc
You shouldn't have to do anything with the daemon at all - just put your script in the directory and it will run. It's very handy if you're forgetful like I am :)
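
A rough sketch of the timestamp check described above, for a script dropped into cron.hourly. `run_once_daily` and the stamp file location are assumptions for illustration, not part of anyone's posted setup.

```shell
#!/bin/bash
# Hypothetical guard for a cron.hourly script: run the given command at
# most once per calendar day, remembering the last run in a stamp file.
run_once_daily() {
    local stamp=$1; shift
    local today
    today=$(date +%F)                     # e.g. 2010-01-24
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$today" ]; then
        return 0                          # already ran today, do nothing
    fi
    # Write the stamp only if the command succeeded, so that a failed
    # run is retried at the next hourly trigger.
    "$@" && printf '%s\n' "$today" > "$stamp"
}
```

The cron.hourly script would then end with something like `run_once_daily /var/tmp/xkcd.stamp /home/manu/bin/xkcd`, where both paths are placeholders.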


 PostPosted: Mon Jan 25, 2010 12:18 pm

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
Thanks again for this! This was just the sort of exercise that I wanted.

Since I have a long way to catch up (so far I have read the first 40 or so comics), I decided to take a different approach than you. Using your code as the basis, I rewrote the script to get and process comics within a specified range ( xkcd [start] [end]). This way I can process only a few comics at a time, rather than download the whole lot. I also broke down the process into three functions: 'xkcd-web' (to download the webpages), 'xkcd-tag' (to get the image and process the details), and 'xkcd-com' (to put together the finished comic). This approach saves files at each stage of the process (to be removed later) and allows me to process the whole batch at once from one stage to the next. I like how it is turning out.

I wonder which way is more efficient--processing each comic from start to finish or moving the whole batch from one stage to the next. What do you say?

~~~~~

It's funny that you mentioned using 'while' loops instead of variable expansion in a 'for/next' loop. I have not actually used a (single) 'while' loop yet. Guess I should try!

Thanks for the advice for 'cron' and 'anacron'. I will have to keep this in mind. When I get caught up with the 'xkcd' comics, I will try to implement a scheduled task like you have done.

Q:)


 PostPosted: Mon Jan 25, 2010 3:29 pm   

Joined: Sat Jan 23, 2010 7:54 am
Posts: 5
That's cool. Be sure to post the finished script here, if you'd like.

You can definitely speed it up if you process all the comics the way you do. You could download a bunch of them at the same time that way in the background.
Code:
wget -q "$COMIC1" -O comic1 &
wget -q "$COMIC2" -O comic2 &
wget -q "$COMIC3" -O comic3 &
wait   # block until all three background downloads have finished
... (I don't know how far you can go with this. I tested it with three and it works fine but you can probably go higher.)
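
If the number of simultaneous downloads should be capped rather than guessed, one option is to let xargs manage the jobs. This is only a sketch: `run_parallel` is a hypothetical wrapper, and it assumes an xargs that supports the `-P` option (GNU and BSD xargs do; it is not required by POSIX).

```shell
#!/bin/bash
# Hypothetical wrapper: read one item per line on stdin and run the given
# command once per item, with at most $1 copies running at the same time.
run_parallel() {
    local jobs=$1; shift
    # -n 1 passes a single item to each invocation; -P caps the number
    # of processes that run concurrently.
    xargs -n 1 -P "$jobs" "$@"
}
```

For example, `printf '%s\n' "$COMIC1" "$COMIC2" "$COMIC3" | run_parallel 3 wget -q` would fetch three URLs at once, each saved under its remote file name.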


I'll post my weird cron script out here a bit later... I'm kind of ashamed of how much of a hack it is.


 PostPosted: Wed Jan 27, 2010 8:11 am

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
Cool Stuff!

Okay, I am just about done. Learned a few more things along the way too!

I found out that I can bypass the stage of downloading the webpages and just read the webpage with 'curl'.
Code:
IMG_TAG=$(curl -s "$WEBPAGE" | grep '<img src="http://imgs.xkcd.com/comics/')


I found a way to read text files line-by-line to an array at the Mandriva user board:
Quote:
Code:
old_IFS=$IFS
IFS=$'\n'
lines=($(cat FILE)) # array
IFS=$old_IFS

echo ${lines[4]} # will echo the fifth line (array indexes start at 0)
echo ${lines[@]} # will print all the lines.
echo ${#lines[@]} # will print the size of the array (the total number of lines)

http://mandrivausers.org/index.php?/topic/21998-reading-a-text-file-line-by-line-with-bash/

This made it much easier to read and write the details of the html image tag. Previously I had been writing the URL, Title, Alt, etc. to a string separated by colons, and using 'cut -d: -f' to retrieve the fields. I didn't see another neat way around the fact that there is a colon in the URL. Writing the descriptors out line by line works very well.
Code:
   LINES=($(cat $DETAILS_FILE))
   COMIC_NUMBER=${LINES[0]}
   IMG_URL=${LINES[1]}
   FILE_ORIG=${LINES[2]}
   FILENAME=${LINES[3]}
   EXT=${LINES[4]}
   TITLE=${LINES[5]}
   ALT=${LINES[6]}   

Code:
690
http://imgs.xkcd.com/comics/semicontrolled_demolition.png
semicontrolled_demolition.png
semicontrolled_demolition
png
I believe the truth always lies halfway between the most extreme claims.
Semicontrolled Demolition
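
The same line-by-line read can also be done with the `mapfile` builtin, which needs bash 4 or later. Unlike the `($(cat FILE))` form, it keeps empty lines, so the indexes do not shift when a field such as the title is empty, and it needs no IFS juggling. `read_details` and the array name `DETAILS` are hypothetical; a name other than LINES is used because bash treats LINES as a special variable (the terminal height).

```shell
#!/bin/bash
# Hypothetical helper: load a details file into the global array DETAILS,
# one array element per line of the file (requires bash 4+ for mapfile).
read_details() {
    # -t strips the trailing newline from every element.
    mapfile -t DETAILS < "$1"
}

# Example use inside the script (index layout as in the details file):
#   read_details "$DETAILS_FILE"
#   IMG_URL=${DETAILS[1]}
#   TITLE=${DETAILS[5]}
#   ALT=${DETAILS[6]}
```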


Yeah!

g o'free


 PostPosted: Wed Jan 27, 2010 11:47 am

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
Code:
#!/bin/bash
# xkcd
# Usage: xkcd [range START] [range END]
# Downloads and assembles 'webcomics'  by Randall Munroe from xkcd.com
# (http://xkcd.com) in batches (from START to END).
# (See Body below)
# G O Free =:)
# geoffreywarne@gmail.com
# Friday, January 29, 2010
# [all xkcd comics by Randall Munroe (http://xkcd.com/about/)]
# [see also: http://en.wikipedia.org/wiki/Xkcd]
# [Thank you]
#======================================================================
#
# Functions [5]: (get_current_number, get_range, read_tag, get_image,
# assemble_comic)
#
#----------------------------------------------------------------------
# 1) get_current_number: Called from main body, gets the number of the
# latest xkcd comic
#----------------------------------------------------------------------
#
function get_current_number
{
            # Signal function start
echo "[fn: get_current_number]"
            # Read current webpage and extract number
            # of current comic
CURRENT_LINE=$(curl -s http://xkcd.com | grep "<h3>Permanent link to this comic:")
CURRENT_LINE=${CURRENT_LINE%*/</h3>}
CURRENT_LINE=${CURRENT_LINE#<h3>*}
CURRENT_NUMBER=${CURRENT_LINE##*/}
            # Display link and number of current
            # comic
echo "CURRENT_LINE: $CURRENT_LINE"
echo "CURRENT_NUMBER: $CURRENT_NUMBER"
echo
}
#
#----------------------------------------------------------------------
# 2) get_range: Called from main body, checks that given command
# arguments give a valid range (START to END)
#----------------------------------------------------------------------
#
function get_range
{
            # Signal function start
echo "[fn: get_range]"
            # Define standard error message
arg_error="Usage: xkcd [range START] [range END]"
            # Check that at least one argument was
            # given, and ignore more than two
arg_numb=$#
if [ $arg_numb = 0 ]; then echo "NO RANGE: $arg_error"; exit; fi
if [ $arg_numb -gt 2 ]; then echo "TOO MANY ARGUMENTS: $arg_error"; echo "IGNORING: ${@:3}";fi   # no 'shift' here, or $1 and $2 below would be the 3rd and 4th arguments
            # Set start and end of range
START=$1
END=$2
END=${END:=$START}
            # Check that range is valid: integers
            # only, not less than one or greater
            # than current number, END is greater
            # than START
if ! [[ $START =~ ^[0-9]+$ && $END =~ ^[0-9]+$ ]]; then echo "INTEGERS ONLY: $arg_error"; exit; fi
if [ $START -lt 1 ]; then echo "START must be greater than zero: $arg_error"; exit; fi
if [ $START -gt $CURRENT_NUMBER ]; then echo "START of range larger then CURRENT... Setting START to CURRENT: $CURRENT_NUMBER"; START=$CURRENT_NUMBER; fi
if [ $END -gt $CURRENT_NUMBER ]; then echo "END of range larger then CURRENT... Setting END to CURRENT: $CURRENT_NUMBER"; END=$CURRENT_NUMBER; fi
NUMBER=$(( $END - $START + 1))
if [ $START -gt $END ]; then echo "START must be less than END: $arg_error"; exit; fi


            # Display range
echo "Start: $START    End: $END     Number: $NUMBER"
echo
}
#
#----------------------------------------------------------------------
# 3) read_tag: Called from main body, for each comic in range, read the
# web page ('curl', 'grep'), parse the image tag for descriptors (image
# URL, file name, extension, title, and alt), and stores the details in
# a text file (DETAILS_FILE).
#----------------------------------------------------------------------
#
function read_tag
{
            # Signal function start
echo "[fn: read_tag]"
echo "Reading xkcd webpage(s)... Comics: $START to $END..."; echo ">>>"
            # Count comics processed
count=0
for (( COMIC_NUMBER = $START; COMIC_NUMBER <= $END; COMIC_NUMBER++ )); do
   let count=$count+1
            # Set target URL and location of
            # details text file (DETAILS_FILE)
   WEBPAGE="http://xkcd.com/$COMIC_NUMBER/"
   DETAILS_FILE="$XKCD_parts/$COMIC_NUMBER.txt"
            # Display target URL
   echo "$count) [$COMIC_NUMBER] Reading WEBPAGE: $WEBPAGE"
            # Read webpage and extract image tag
   IMG_TAG=$(curl -s "$WEBPAGE" | grep '<img src="http://imgs.xkcd.com/comics/')
            # Parse image tag for descriptors
   IMG_TAG=${IMG_TAG%*<br/>}
   IMG_URL=${IMG_TAG##*\<img\ src=\"}
   IMG_URL=${IMG_URL%%\"*}
   FILE_ORIG=${IMG_URL##http://imgs.xkcd.com/comics/}
   FILENAME=${FILE_ORIG%.*}      # strip the last dot and everything after it
   EXT=${FILE_ORIG##*.}          # keep only what follows the last dot
   TITLE=${IMG_TAG##*title=\"}
   TITLE=${TITLE%%\"*}
   ALT=${IMG_TAG##*alt=\"}
   ALT=${ALT%%\"*}
            # Display details of comic
   echo "IMG_TAG: $IMG_TAG"
   echo "IMG_URL: $IMG_URL"
   echo "FILE_ORIG: $FILE_ORIG"
   echo "FILENAME: $FILENAME"
   echo "EXT: $EXT"
   echo "TITLE: $TITLE"
   echo "ALT: $ALT"
            # Store details in a text file
   echo "Creating DETAILS_FILE: $DETAILS_FILE"
   printf '%s\n' "$COMIC_NUMBER" "$IMG_URL" "$FILE_ORIG" "$FILENAME" "$EXT" "$TITLE" "$ALT" > "$DETAILS_FILE"
   echo
done
}
#
#----------------------------------------------------------------------
# 4) get_image: Called from main body, this function reads the URL for
# each comic from the details file, downloads the image ('wget'), and
# saves it with the details file (in $XKCD/parts, i.e. $XKCD_parts).
#----------------------------------------------------------------------
#
function get_image
{
            # Signal function start
echo "[fn: get_image]"
echo "Getting picture(s)... Comics: $START to $END..."; echo ">>>"
            # SET the Internal Field Separator
            # (IFS) to new line (\n) only
            # After saving the current setting
            # http://en.wikipedia.org/wiki/Internal_field_separator
old_IFS=$IFS
IFS=$'\n'
            # Count comics processed
count=0
for (( COMIC_NUMBER = $START; COMIC_NUMBER <= $END; COMIC_NUMBER++ )); do
   let count=$count+1
            # Get location of details file
   DETAILS_FILE="$XKCD_parts/$COMIC_NUMBER.txt"
            # Display target details file for each
            # comic
   echo "$count) [$COMIC_NUMBER] Reading DETAILS_FILE: $DETAILS_FILE"
            # Read lines of details file into an
            # array
   LINES=($(cat $DETAILS_FILE))
            # Assign values in array to relevant
            # variables: here: image URL,
            # extension (with comic number)
#   COMIC_NUMBER=${LINES[0]}   #Here for reference only
   IMG_URL=${LINES[1]}   # IMAGE URL
#   FILE_ORIG=${LINES[2]}      #Here for reference only
#   FILENAME=${LINES[3]}      #Here for reference only
   EXT=${LINES[4]}      # EXTENSION
#   TITLE=${LINES[5]}      #Here for reference only
#   ALT=${LINES[6]}         #Here for reference only
            # Display target image URL
   echo "IMG_URL: $IMG_URL"
#   echo "FILE_ORIG: $FILE_ORIG"   #Here for reference only
#   echo "FILENAME: $FILENAME"   #Here for reference only
            # Display original image extension
   echo "EXT: $EXT"
            # Set destination file
   IMG_FILE="$XKCD_parts/$COMIC_NUMBER.$EXT"
            # Signal attempt to get image
   echo "Getting Image: $IMG_URL"
   echo "Saving To: $IMG_FILE"
            # Get image from the internet
            # Save it to destination file
   wget -q -O $IMG_FILE $IMG_URL
done
            # Restore original IFS
IFS=$old_IFS
#IFS=$' \t\n'
}
#
#----------------------------------------------------------------------
# 5) assemble_comic: Called from main body, this function runs the
# 'convert' program to assemble the image with a title and caption and
# saves it in the main directory ($XKCD) with a number, original
# filename and (possibly altered) extension (.png).
#----------------------------------------------------------------------
#
function assemble_comic
{
            # Signal function start
echo; echo "[fn: assemble_comic]"
echo "Assembling comic(s) (image+title+alt)... Comics: $START to $END... "; echo ">>>"
            # SET the Internal Field Separator
            # (IFS) to new line (\n) only
old_IFS=$IFS
IFS=$'\n'
            # Count comics processed
count=0
for (( COMIC_NUMBER = $START; COMIC_NUMBER <= $END; COMIC_NUMBER++ )); do
   let count=$count+1
            # Get location of details file
   DETAILS_FILE="$XKCD_parts/$COMIC_NUMBER.txt"
            # Display target details file for each
            # comic
   echo "$count) [$COMIC_NUMBER] Reading DETAILS_FILE: $DETAILS_FILE"
            # Read lines of details file into an
            # array
   LINES=($(cat $DETAILS_FILE))
            # Assign values in array to relevant
            # variables: filename, extension,
            # image file, title, and alt.
   FILENAME=${LINES[3]}
   EXT=${LINES[4]}
   IMG_FILE="$XKCD_parts/$COMIC_NUMBER.$EXT"
   TITLE=${LINES[5]}
   ALT=${LINES[6]}
            # Display parts to assemble
   echo "Using IMG_FILE: $IMG_FILE"
   echo "Using TITLE: $TITLE"
   echo "Using ALT: $ALT"
            # Set filename for finished comic
            # with its number and original
            # filename
   COMIC_FILE="$XKCD/$COMIC_NUMBER--$FILENAME.png"
            # Signal attempt to assemble comic
   echo "Assembling to COMIC_FILE: $COMIC_FILE"
            # Assemble comic: Apply title and
            # caption to image
   convert $IMG_FILE -background White -pointsize 20 label:"$ALT\n" +swap -gravity Center -append $COMIC_FILE
   convert $COMIC_FILE -background White -pointsize 16 -size 420x caption:"\n\n\n$TITLE" -gravity Center -append $COMIC_FILE
            # Signal assembly completed
   echo "[convert] $IMG_FILE --> $COMIC_FILE"
   echo
done

IFS=$old_IFS
}
#
#======================================================================
# Body: This is the main body of the program. It first sets the proper
# directories. It then runs the functions: 'get_current_number',
# 'get_range', 'read_tag', 'get_image', and 'assemble_comic'.
#======================================================================
#
            # Print title
echo; echo "[xkcd: Get xkcd Comics]"

            # Set directories
XKCD="/home/geoffrey/Comics/xkcd"
XKCD_parts="/home/geoffrey/Comics/xkcd/parts"
echo "Destination Directory: $XKCD"
echo "Components Directory: $XKCD_parts"
echo

            # Run function to get the number of
            # current (latest) xkcd comic
get_current_number

            # Check that arguments and range are
            # valid (for xkcd archives)
get_range "$@"

            # Read the webpage for each xkcd comic
            # and extract details from its html tag
read_tag

            # Get the image for each xkcd comic
get_image

            # Use 'imagemagick' ('convert') to
            # assemble xkcd comic (add title and
            # caption to image)
assemble_comic

            # Signal end of program
echo $0 $@; echo "done"; echo
#[End of file]


Last edited by gofree on Fri Jan 29, 2010 1:20 am, edited 3 times in total.

 PostPosted: Fri Jan 29, 2010 1:07 am

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
Error! :-O

When I tried a single number greater than current number, I got this:
Code:
geoffrey@Presario-Laptop:~$ xkcd 700

[xkcd: Get xkcd Comics]
Destination Directory: /home/geoffrey/Comics/xkcd
Components Directory: /home/geoffrey/Comics/xkcd/parts

[fn: get_current_number]
CURRENT_LINE: Permanent link to this comic: http://xkcd.com/695
CURRENT_NUMBER: 695

[fn: get_range]
END of range larger then CURRENT... Setting END to CURRENT: 695
Start: 700    End: 695     Number: -4

[fn: read_tag]
Reading xkcd webpage(s)... Comics: 700 to 695...
>>>
[fn: get_image]
Getting pictures... Comics: 700 to 695...
>>>
[fn: assemble_comic]
Assembling comic(s) (image+title+alt)... Comics: 700 to 695...
>>>
/home/geoffrey/bin/xkcd 700
done


I left a line out:
Code:
if [ $START -gt $CURRENT_NUMBER ]; then echo "START of range larger then CURRENT... Setting START to CURRENT: $CURRENT_NUMBER"; START=$CURRENT_NUMBER; fi
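
The clamping logic can also be tried out on its own, without fetching anything. The function below is a hypothetical stand-alone sketch of the same checks, not the code from the script; it takes the current comic number as its first argument instead of reading a global.

```shell
#!/bin/bash
# Hypothetical stand-alone version of the range checks.
# Usage: clamp_range CURRENT START [END]
# Prints "START END" after clamping; returns 1 if the range is invalid.
clamp_range() {
    local current=$1 start=$2 end=${3:-$2}
    # Pull both ends back to the newest comic if they overshoot it.
    if (( start > current )); then start=$current; fi
    if (( end > current )); then end=$current; fi
    # Reject ranges that start below 1 or run backwards.
    if (( start < 1 || start > end )); then return 1; fi
    echo "$start $end"
}
```

With the numbers from the failed run, `clamp_range 695 700` prints `695 695`.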


Last edited by gofree on Fri Jan 29, 2010 1:17 am, edited 2 times in total.

 PostPosted: Fri Jan 29, 2010 1:15 am

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
Okay, I fixed it:
Code:
geoffrey@Presario-Laptop:~$ xkcd 700

[xkcd: Get xkcd Comics]
Destination Directory: /home/geoffrey/Comics/xkcd
Components Directory: /home/geoffrey/Comics/xkcd/parts

[fn: get_current_number]
CURRENT_LINE: Permanent link to this comic: http://xkcd.com/695
CURRENT_NUMBER: 695

[fn: get_range]
START of range larger then CURRENT... Setting START to CURRENT: 695
END of range larger then CURRENT... Setting END to CURRENT: 695
Start: 695    End: 695     Number: 1

[fn: read_tag]
Reading xkcd webpage(s)... Comics: 695 to 695...
>>>
1) [695] Reading WEBPAGE: http://xkcd.com/695/
IMG_TAG: <img src="http://imgs.xkcd.com/comics/spirit.png" title="On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown." alt="Spirit" />
IMG_URL: http://imgs.xkcd.com/comics/spirit.png
FILE_ORIG: spirit.png
FILENAME: spirit
EXT: png
TITLE: On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown.
ALT: Spirit
Creating DETAILS_FILE: /home/geoffrey/Comics/xkcd/parts/695.txt

[fn: get_image]
Getting picture(s)... Comics: 695 to 695...
>>>
1) [695] Reading DETAILS_FILE: /home/geoffrey/Comics/xkcd/parts/695.txt
IMG_URL: http://imgs.xkcd.com/comics/spirit.png
EXT: png
Getting Image: http://imgs.xkcd.com/comics/spirit.png
Saving To: /home/geoffrey/Comics/xkcd/parts/695.png


[fn: assemble_comic]
Assembling comic(s) (image+title+alt)... Comics: 695 to 695...
>>>
1) [695] Reading DETAILS_FILE: /home/geoffrey/Comics/xkcd/parts/695.txt
Using IMG_FILE: /home/geoffrey/Comics/xkcd/parts/695.png
Using TITLE: On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown.
Using ALT: Spirit
Assembling to COMIC_FILE: /home/geoffrey/Comics/xkcd/695--spirit.png
[convert] /home/geoffrey/Comics/xkcd/parts/695.png --> /home/geoffrey/Comics/xkcd/695--spirit.png

/home/geoffrey/bin/xkcd 700
done



 PostPosted: Fri Jan 29, 2010 1:47 am

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
For current comic: 695
Code:
[fn: read_tag]
Reading xkcd webpage(s)... Comics: 695 to 695...
>>>
1) [695] Reading WEBPAGE: http://xkcd.com/695/
IMG_TAG: <img src="http://imgs.xkcd.com/comics/spirit.png" title="On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown." alt="Spirit" />
IMG_URL: http://imgs.xkcd.com/comics/spirit.png
FILE_ORIG: spirit.png
FILENAME: spirit
EXT: png
TITLE: On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown.
ALT: Spirit
Creating DETAILS_FILE: /home/geoffrey/Comics/xkcd/parts/695.txt

Details File: 695.txt
Code:
695
http://imgs.xkcd.com/comics/spirit.png
spirit.png
spirit
png
On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown.
Spirit

Code:
[fn: get_image]
Getting pictures... Comics: 695 to 695...
>>>
1) [695] Reading DETAILS_FILE: /home/geoffrey/Comics/xkcd/parts/695.txt
IMG_URL: http://imgs.xkcd.com/comics/spirit.png
EXT: png
Getting Image: http://imgs.xkcd.com/comics/spirit.png

http://imgs.xkcd.com/comics/spirit.png
Code:
Saving To: /home/geoffrey/Comics/xkcd/parts/695.png

Code:
[fn: assemble_comic]
Assembling comic(s) (image+title+alt)... Comics: 695 to 695...
>>>
1) [695] Reading DETAILS_FILE: /home/geoffrey/Comics/xkcd/parts/695.txt
Using IMG_FILE: /home/geoffrey/Comics/xkcd/parts/695.png
Using TITLE: On January 26th, 2213 days into its mission, NASA declared Spirit a 'stationary research station', expected to operational for several more months until the dust buildup on its solar panels forces a final shutdown.
Using ALT: Spirit
Assembling to COMIC_FILE: /home/geoffrey/Comics/xkcd/695--spirit.png
[convert] /home/geoffrey/Comics/xkcd/parts/695.png --> /home/geoffrey/Comics/xkcd/695--spirit.png


Assembled Comic (695):




 PostPosted: Fri Jan 29, 2010 2:51 pm   

Joined: Sat Jan 23, 2010 7:54 am
Posts: 5
Nice work!!

I like your layout of the text better. I'm borrowing :)


 PostPosted: Sat Jan 30, 2010 12:49 am

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
apethedog,

Quote:
I like your layout of the text better. I'm borrowing :)
:-/

Heh, heh... :)
Of course, the composition part is all yours. I hardly touched a thing. I just copied the 'convert' lines as you posted them.

Are you still thinking about showing me the 'anacron' configuration?

I am thinking about writing another script to process each file in a batch one at a time from start to finish, in order to compare it to the stage-by-stage method of my first bash script.

Maybe...

.


 PostPosted: Tue Feb 02, 2010 11:13 pm

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
gofree wrote:
Quote:
I wonder which way is more efficient--processing each comic from start to finish or moving the whole batch from one stage to the next. What do you say?


apethedog wrote:
Quote:
You can definitely speed it up if you process all the comics the way you do. You could download a bunch of them at the same time that way in the background.


Hello again,

Okay, since we have this question before us, we might as well test it out. It makes intuitive sense that a "lateral" processing of batches stage-by-stage would be faster than a "linear" processing file-by-file, but I would like to know for sure. (To indulge the "little professor" within me.)
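
A simple way to put numbers on such a comparison is to time each variant over the same range. The helper below is a hypothetical sketch using bash's built-in SECONDS counter, so it only resolves whole seconds; the script names in the usage note are placeholders.

```shell
#!/bin/bash
# Hypothetical timing helper: run a command and print how many whole
# seconds it took, using bash's SECONDS counter.
elapsed_seconds() {
    local t0=$SECONDS
    # Output and exit status of the command are ignored on purpose;
    # only the elapsed time matters for the comparison.
    "$@" >/dev/null 2>&1 || true
    echo $(( SECONDS - t0 ))
}
```

Running, say, `elapsed_seconds xkcd-lateral 1 20` and then `elapsed_seconds xkcd-linear 1 20` would give one figure per method. Network conditions vary between runs, so repeating each measurement a few times would make the comparison fairer.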

I rewrote my version of the xkcd download script (the "lateral" stage-by-stage method) to process the comics in a "linear" (file-by-file) fashion. I moved the loops from the individual function "stages" to the main body of the program.

Code:
#!/bin/bash
# xkcd
# Usage: xkcd [range START] [range END]
# Downloads and assembles 'webcomics'  by Randall Munroe from xkcd.com
# (http://xkcd.com) in batches (from START to END).
# (See Body below)
# G O Free =:)
# geoffreywarne@gmail.com
# Friday, January 29, 2010
# [all xkcd comics by Randall Munroe (http://xkcd.com/about/)]
# [see also: http://en.wikipedia.org/wiki/Xkcd]
# [Thank you]
#======================================================================
#
# Functions [5]: (get_current_number, get_range, read_tag, get_image,
# assemble_comic)
#
#----------------------------------------------------------------------
# 1) get_current_number: Called from main body, gets the number of the
# latest xkcd comic
#----------------------------------------------------------------------
#
function get_current_number
{
            # Signal function start
echo "[fn: get_current_number]"
            # Read current webpage and extract number
            # of current comic
CURRENT_LINE=$(curl -s http://xkcd.com | grep "<h3>Permanent link to this comic:")
CURRENT_LINE=${CURRENT_LINE%*/</h3>}
CURRENT_LINE=${CURRENT_LINE#<h3>*}
CURRENT_NUMBER=${CURRENT_LINE##*/}
            # Display link and number of current
            # comic
echo "CURRENT_LINE: $CURRENT_LINE"
echo "CURRENT_NUMBER: $CURRENT_NUMBER"
echo
}
#
#----------------------------------------------------------------------
# 2) get_range: Called from main body, checks that given command
# arguments give a valid range (START to END)
#----------------------------------------------------------------------
#
function get_range
{
            # Signal function start
echo "[fn: get_range]"
            # Define standard error message
arg_error="Usage: xkcd [range START] [range END]"
            # Check that at least one argument was
            # given, and ignore more than two
arg_numb=$#
if [ $arg_numb = 0 ]; then echo "NO RANGE: $arg_error"; exit; fi
if [ $arg_numb -gt 2 ]; then echo "TOO MANY ARGUMENTS: $arg_error"; echo "IGNORING: ${@:3}";fi   # no 'shift' here, or $1 and $2 below would be the 3rd and 4th arguments
            # Set start and end of range
START=$1
END=$2
END=${END:=$START}
            # Check that range is valid: integers
            # only, not less than one or greater
            # than current number, END is greater
            # than START
if ! [[ $START =~ ^[0-9]+$ && $END =~ ^[0-9]+$ ]]; then echo "INTEGERS ONLY: $arg_error"; exit; fi
if [ $START -lt 1 ]; then echo "START must be greater than zero: $arg_error"; exit; fi
if [ $START -gt $CURRENT_NUMBER ]; then echo "START of range larger then CURRENT... Setting START to CURRENT: $CURRENT_NUMBER"; START=$CURRENT_NUMBER; fi
if [ $END -gt $CURRENT_NUMBER ]; then echo "END of range larger then CURRENT... Setting END to CURRENT: $CURRENT_NUMBER"; END=$CURRENT_NUMBER; fi
NUMBER=$(( $END - $START + 1))
if [ $START -gt $END ]; then echo "START must be less than END: $arg_error"; exit; fi


            # Display range
echo "Start: $START    End: $END     Number: $NUMBER"
echo
}
#
#----------------------------------------------------------------------
# 3) read_tag: Called from main body, for each comic in range, read the
# web page ('curl', 'grep'), parse the image tag for descriptors (image
# URL, file name, extension, title, and alt), and stores the details in
# a text file (DETAILS_FILE).
#----------------------------------------------------------------------
#
function read_tag
{
            # Signal function start
echo "[fn: read_tag]"
echo "Reading xkcd webpage(s)... Comic: $COMIC_NUMBER..."; echo ">>>"
            # Count comics processed

            # Set target URL and location of
            # details text file (DETAILS_FILE)
   WEBPAGE="http://xkcd.com/$COMIC_NUMBER/"
   DETAILS_FILE="$XKCD_parts/$COMIC_NUMBER.txt"
            # Display target URL
   echo "$count) [$COMIC_NUMBER] Reading WEBPAGE: $WEBPAGE"
            # Read webpage and extract image tag
   IMG_TAG=$(curl -s "$WEBPAGE" | grep '<img src="http://imgs.xkcd.com/comics/')
            # Parse image tag for descriptors
   IMG_TAG=${IMG_TAG%*<br/>}
   IMG_URL=${IMG_TAG##*\<img\ src=\"}
   IMG_URL=${IMG_URL%%\"*}
   FILE_ORIG=${IMG_URL##http://imgs.xkcd.com/comics/}
   FILENAME=${FILE_ORIG%.*}      # strip the last dot and everything after it
   EXT=${FILE_ORIG##*.}          # keep only what follows the last dot
   TITLE=${IMG_TAG##*title=\"}
   TITLE=${TITLE%%\"*}
   ALT=${IMG_TAG##*alt=\"}
   ALT=${ALT%%\"*}
            # Display details of comic
   echo "IMG_TAG: $IMG_TAG"
   echo "IMG_URL: $IMG_URL"
   echo "FILE_ORIG: $FILE_ORIG"
   echo "FILENAME: $FILENAME"
   echo "EXT: $EXT"
   echo "TITLE: $TITLE"
   echo "ALT: $ALT"
            # Store details in a text file
   echo "Creating DETAILS_FILE: $DETAILS_FILE"
   printf '%s\n' "$COMIC_NUMBER" "$IMG_URL" "$FILE_ORIG" "$FILENAME" "$EXT" "$TITLE" "$ALT" > "$DETAILS_FILE"
   echo
}
#
#----------------------------------------------------------------------
# 4) get_image: Called from main body, this function reads the URL for
# each comic from the details file, downloads the image ('wget'), and
# saves it with the details file (in $XKCD/parts, i.e. $XKCD_parts).
#----------------------------------------------------------------------
#
function get_image
{
            # Signal function start
echo "[fn: get_image]"
echo "Getting pictures... Comic: $COMIC_NUMBER..."; echo ">>>"
            # SET the Internal Field Separator
            # (IFS) to new line (\n) only,
            # after saving the current setting
            # http://en.wikipedia.org/wiki/Internal_field_separator
old_IFS=$IFS
IFS=$'\n'
            # Count comics processed

            # Get location of details file
   DETAILS_FILE="$XKCD_parts/$COMIC_NUMBER.txt"
            # Display target details file for each
            # comic
   echo "$count) [$COMIC_NUMBER] Reading DETAILS_FILE: $DETAILS_FILE"
            # Read lines of details file into an
            # array
   LINES=($(cat $DETAILS_FILE))
            # Assign values in array to relevant
            # variables: here: image URL,
            # extension (with comic number)
#   COMIC_NUMBER=${LINES[0]}   #Here for reference only
   IMG_URL=${LINES[1]}   # IMAGE URL
#   FILE_ORIG=${LINES[2]}      #Here for reference only
#   FILENAME=${LINES[3]}      #Here for reference only
   EXT=${LINES[4]}      # EXTENSION
#   TITLE=${LINES[5]}      #Here for reference only
#   ALT=${LINES[6]}         #Here for reference only
            # Display target image URL
   echo "IMG_URL: $IMG_URL"
#   echo "FILE_ORIG: $FILE_ORIG"   #Here for reference only
#   echo "FILENAME: $FILENAME"   #Here for reference only
            # Display original image extension
   echo "EXT: $EXT"
            # Set destination file
   IMG_FILE="$XKCD_parts/$COMIC_NUMBER.$EXT"
            # Signal attempt to get image
   echo "Getting Image: $IMG_URL"
   echo "Saving To: $IMG_FILE"
            # Get image from the internet
            # Save it to destination file
   wget -q -O "$IMG_FILE" "$IMG_URL"

            # Restore original IFS
IFS=$old_IFS
#IFS=$' \t\n'
}
#
#----------------------------------------------------------------------
# 5) assemble_comic: Called from main body, this function runs the
# 'convert' program to assemble the image with a title and caption and
# saves it in the main directory ($XKCD) with a number, original
# filename and (possibly altered) extension (.png).
#----------------------------------------------------------------------
#
function assemble_comic
{
            # Signal function start
echo; echo "[fn: assemble_comic]"
echo "Assembling comic(s) (image+title+alt)... Comic: $COMIC_NUMBER... "; echo ">>>"
            # SET the Internal Field Separator
            # (IFS) to new line (\n) only
old_IFS=$IFS
IFS=$'\n'

            # Get location of details file
   DETAILS_FILE="$XKCD_parts/$COMIC_NUMBER.txt"
            # Display target details file for each
            # comic
   echo "$count) [$COMIC_NUMBER] Reading DETAILS_FILE: $DETAILS_FILE"
            # Read lines of details file into an
            # array
   LINES=($(cat $DETAILS_FILE))
            # Assign values in array to relevant
            # variables: filename, extension,
            # image file, title, and alt.
   FILENAME=${LINES[3]}
   EXT=${LINES[4]}
   IMG_FILE="$XKCD_parts/$COMIC_NUMBER.$EXT"
   TITLE=${LINES[5]}
   ALT=${LINES[6]}
            # Display parts to assemble
   echo "Using IMG_FILE: $IMG_FILE"
   echo "Using TITLE: $TITLE"
   echo "Using ALT: $ALT"
            # Set filename for finished comic
            # with its number and original
            # filename
   COMIC_FILE="$XKCD/$COMIC_NUMBER--$FILENAME.png"
            # Signal attempt to assemble comic
   echo "Assembling to COMIC_FILE: $COMIC_FILE"
            # Assemble comic: Apply title and
            # caption to image
   convert "$IMG_FILE" -background White -pointsize 20 label:"$ALT\n" +swap -gravity Center -append "$COMIC_FILE"
   convert "$COMIC_FILE" -background White -pointsize 16 -size 420x caption:"\n\n\n$TITLE" -gravity Center -append "$COMIC_FILE"
            # Signal assembly completed
   echo "[convert] $IMG_FILE --> $COMIC_FILE"
   echo


IFS=$old_IFS
}
#
#======================================================================
# Body: This is the main body of the program. It first sets the proper
# directories. It then runs the functions: 'get_current_number',
# 'get_range', 'read_tag', 'get_image', and 'assemble_comic'.
#======================================================================
#
            # Print title
echo; echo "[xkcd: Get xkcd Comics]"

            # Set directories
XKCD="/home/geoffrey/Comics/xkcd/bigloop"
XKCD_parts="/home/geoffrey/Comics/xkcd/bigloop-parts"
echo "Destination Directory: $XKCD"
echo "Components Directory: $XKCD_parts"
echo

            # Run function to get the number of
            # current (latest) xkcd comic
get_current_number

            # Check that arguments and range are
            # valid (for xkcd archives)
get_range "$@"

let count=0
for (( COMIC_NUMBER = $START; COMIC_NUMBER <= $END; COMIC_NUMBER++ )); do
let count=count+1
            # Read the webpage for each xkcd comic
            # and extract details from its html tag
read_tag

            # Get the image for each xkcd comic
get_image

            # Use 'imagemagick' ('convert') to
            # assemble xkcd comic (add title and
            # caption to image)
assemble_comic

done

            # Signal end of program
echo $0 $@; echo "done"; echo
#[End of file]

I am calling the two versions 'xkcd--sl' (for "small loops") and 'xkcd--bl' (for "big loop").

Both programs work fine and are ready to be put to the test.
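
For anyone who wants to try the tag-parsing step from read_tag without hitting the website, here is a minimal offline sketch of the same parameter-expansion approach. The sample tag is made up for illustration (it is not fetched from xkcd.com), and the 'parse_img_tag' name is hypothetical. It only assumes the tag carries 'src', 'title' and 'alt' attributes in the form the script greps for.
Code:
```shell
#!/bin/bash
# Sketch only: parse an <img> tag with shell parameter expansion.
# SAMPLE_TAG is a made-up example, not real output from xkcd.com.
SAMPLE_TAG='<img src="http://imgs.xkcd.com/comics/some_comic.png" title="Hover text here" alt="Some Comic" /><br/>'

# parse_img_tag (hypothetical helper): sets IMG_URL, FILE_ORIG,
# FILENAME, EXT, TITLE and ALT from the tag passed as $1
parse_img_tag() {
   local tag="$1"
   IMG_URL=${tag##*\<img\ src=\"}    # drop everything up to src="
   IMG_URL=${IMG_URL%%\"*}           # drop from the closing quote on
   FILE_ORIG=${IMG_URL##*/}          # keep the part after the last /
   FILENAME=${FILE_ORIG%.*}          # strip the last .extension
   EXT=${FILE_ORIG##*.}              # keep only the last extension
   TITLE=${tag##*title=\"}           # same two-step cut for title
   TITLE=${TITLE%%\"*}
   ALT=${tag##*alt=\"}               # and for alt
   ALT=${ALT%%\"*}
}

parse_img_tag "$SAMPLE_TAG"
printf '%s\n' "IMG_URL: $IMG_URL" "FILENAME: $FILENAME" "EXT: $EXT" "TITLE: $TITLE" "ALT: $ALT"
```

The '##' form removes the longest match from the front and '%%' the longest match from the back, so each attribute is cut out in two steps. If the page layout changes, the patterns would need adjusting.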


 PostPosted: Tue Feb 02, 2010 11:38 pm   

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
Here is the speed test that I will use:
Code:
#!/bin/bash
#=======================================================================#
#   NAME:       speed_test: 'xkcd--sl' vs 'xkcd--bl'      
#   DESCRIPTION:    runs two programs, 'xkcd--sl' and 'xkcd--bl',   
#         one after the other, timing the duration of     
#         run-time over five runs, and outputs total     
#         times for each batch for each program. This     
#         program loops each of the test programs for     
#         five batches of five and calculates total times 
#         for all five batches            
#   BY:   G O Free =:)                  
#   FOR: bashscripts.org                  
#   DATE: February 1, 2010                  
#=======================================================================#
#
               # Display name of program
echo; echo "speed_test: 'xkcd--sl' vs 'xkcd--bl'"; echo
               # Set the format for 'time'
               # output (real time only, in
               # seconds, no letters/labels)
TIMEFORMAT='%3R'
               # Test that a start point was
               # given
START_POINT=${1:?"No start point given: Usage: speed_test [start point]"}
               # Run time test for five batches
for (( RUN = 1; RUN <= 5; RUN++ )); do
   echo "RUN: $RUN"
               # Set range for batch
               # (for batches of five, use these
               # two lines instead of the pair
               # below)
#   let START=$((RUN-1))*5+$START_POINT
#   let END=$START+4
               # (batches of one)
   let START=$((RUN-1))+$START_POINT
   END=$START
               # Display test range
   echo "START: $START"
   echo "END: $END"
   echo

### Run test with 'xkcd--sl' first

               # Signal start of 'xkcd--sl'
               # test
   echo "xkcd--sl"
               # Remove previous records and
               # start new record for first
               # program, 'xkcd--sl'
   RECORD="sl--time$RUN.txt"
   echo "RECORD file: $RECORD"
   if [ -e $RECORD ]; then rm $RECORD; fi
   touch $RECORD
               # Identify command to time
   echo "time xkcd--sl $START $END:"
               # Run 'xkcd--sl' on test range
               # five times
   for (( i = 1; i <= 5; i++)); do
               # Identify run number
      echo -n -e "$i)\t "
               # Run 'xkcd--sl' on test range,
               # Suppress 'xkcd--sl' display,
               # and record run-times
      { time xkcd--sl $START $END > /dev/null 2>&1 ; } 2>> $RECORD
               # Use this line to display
               # 'xkcd--sl' output
   #   { time xkcd--sl $START $END ; } 2>> $RECORD
               # Display run-time
      cat $RECORD | tail -n 1
   done
   echo
               # Start count of recorded
               # run-times
   let COUNT=1
               # Start run-time total at zero,
               # sending value to 'bc' for
               # decimal numbers
   SL_TOTAL=$(echo "0" | bc)
               # Signal start of summary
   echo "total time for 'xkcd--sl':"
               # Read each run-time in
               # recorded list
   for TIME in $(cat $RECORD); do
               # Convert recorded times to
               # decimal numbers with 'bc'
      DECT=$(echo "$TIME" | bc)
               # Display times and decimal
               # values
      echo -e "$COUNT) TIME: $TIME\t DECT: $DECT"
               # Use 'bc' to calculate the sum
               # of 'xkcd--sl' run-times
      SL_TOTAL=$(echo "$SL_TOTAL+$DECT" | bc)
      let COUNT=$COUNT+1
   done
               # Display run-time total,
               # '$SL_TOTAL'
   echo -e "\t\tTOTAL: $SL_TOTAL"
   echo $SL_TOTAL > "sl--totals$RUN"
   echo

### Run the test a second time with 'xkcd--bl'

               # Signal start of 'xkcd--bl'
               # test
   echo "xkcd--bl"
               # Remove previous records and
               # start new record for second
               # program, 'xkcd--bl'
   RECORD="bl--time$RUN.txt"
   echo "RECORD file: $RECORD"
   if [ -e $RECORD ]; then rm $RECORD; fi
   touch $RECORD
               # Identify command to time
   echo "time xkcd--bl $START $END:"
               # Run 'xkcd--bl' on test range
               # five times
   for (( i = 1; i <= 5; i++)); do
               # Identify run number
      echo -n -e "$i)\t "
               # Run 'xkcd--bl' on test range,
               # Suppress 'xkcd--bl' display,
               # and record run-times
      { time xkcd--bl $START $END > /dev/null 2>&1 ; } 2>> $RECORD
               # Use this line to display
               # 'xkcd--bl' output
   #   { time xkcd--bl $START $END ; } 2>> $RECORD
               # Display run-time
      cat $RECORD | tail -n 1
   done
   echo
               # Start count of recorded
               # run-times
   let COUNT=1
               # Start run-time total at zero,
               # sending value to 'bc' for
               # decimal numbers
   BL_TOTAL=$(echo "0" | bc)
               # Signal start of summary
   echo "total time for 'xkcd--bl':"
               # Read each run-time in
               # recorded list
   for TIME in $(cat $RECORD); do
               # Convert recorded times to
               # decimal numbers with 'bc'
      DECT=$(echo "$TIME" | bc)
               # Display times and decimal
               # values
      echo -e "$COUNT) TIME: $TIME\t DECT: $DECT"
               # Use 'bc' to calculate the sum
               # of run-times
      BL_TOTAL=$(echo "$BL_TOTAL+$DECT" | bc)
      let COUNT=$COUNT+1
   done
               # Display run-time total,
               # '$BL_TOTAL'
   echo -e "\t\tTOTAL: $BL_TOTAL"
   echo $BL_TOTAL > "bl--totals$RUN"
   echo

### Results
               # Decide which program had
               # shorter total run time
               # Display total time
               # difference, if any
   if [ $(echo "$SL_TOTAL<$BL_TOTAL" | bc) = 1 ]; then
      DIFF=$(echo "$BL_TOTAL-$SL_TOTAL" | bc)
      echo "'xkcd--sl' took $DIFF seconds less time than 'xkcd--bl'"
   elif [ $(echo "$BL_TOTAL<$SL_TOTAL" | bc) = 1 ]; then
      DIFF=$(echo "$SL_TOTAL-$BL_TOTAL" | bc)
      echo "'xkcd--bl' took $DIFF seconds less time than 'xkcd--sl'"
   else
      echo "'xkcd--sl' and 'xkcd--bl' took the same time"
   fi
echo
done
echo

## RESULTS AFTER FIVE BATCHES (OF FIVE)

               # Initialize batch totals
SL_BATCH_TOTAL=$(echo "0" | bc)
BL_BATCH_TOTAL=$(echo "0" | bc)   
               # Signal start of summary
echo "Calculating batch total for 'xkcd--sl'..."
echo "Calculating batch total for 'xkcd--bl'..."
echo
               # Display header
printf "%-5s" "Run"; printf "%+11s" "sl Batch"; printf "%+11s\n" "bl Batch"
printf "%-5s" "~~~"; printf "%+11s" "~~~~~~~~"; printf "%+11s\n" "~~~~~~~~"
               # Read each recorded batch
               # time, convert to decimal
               # number with 'bc', display
               # values in a chart, and
               # calculate total time over
               # five batches
for (( BATCH = 1; BATCH <= 5; BATCH++ )); do
               # Read batch times for each
               # run for both programs
   SL_BATCH_TIME=$(cat sl--totals$BATCH)
   BL_BATCH_TIME=$(cat bl--totals$BATCH)
               # Convert recorded times to
               # decimal numbers with 'bc'
   SL_BATCH_DECT=$(echo "$SL_BATCH_TIME" | bc)
   BL_BATCH_DECT=$(echo "$BL_BATCH_TIME" | bc)
               # Display batch times for both
               # programs
   printf "%-5s" "$BATCH"; printf "%+11s" "$SL_BATCH_TIME"; printf "%+11s\n" "$BL_BATCH_TIME"
               # Use 'bc' to calculate the sum
               # of batch times
   SL_BATCH_TOTAL=$(echo "$SL_BATCH_TOTAL+$SL_BATCH_DECT" | bc)
   BL_BATCH_TOTAL=$(echo "$BL_BATCH_TOTAL+$BL_BATCH_DECT" | bc)
done
               # Display batch totals,
               # '$SL_BATCH_TOTAL'
               # '$BL_BATCH_TOTAL'
printf "%-8s" "Total:"; printf "%+8s" "$SL_BATCH_TOTAL"; printf "%+11s\n" "$BL_BATCH_TOTAL"
               # Reset default 'time' format
TIMEFORMAT=$'\nreal\t%3lR\nuser\t%3lU\nsys\t%3lS'
               # Signal end of program
echo "done"; echo
exit 0


So far I have only been using batches of one. The times are very close:
Code:
geoffrey@Presario-Laptop:~/Desktop$ speed_test_xkcd 135

speed_test: 'xkcd--sl' vs 'xkcd--bl'

RUN: 1
START: 135
END: 135

xkcd--sl
RECORD file: sl--time1.txt
time xkcd--sl 135 135:
1)    2.390
2)    1.777
3)    1.810
4)    1.789
5)    1.769

total time for 'xkcd--sl':
1) TIME: 2.390    DECT: 2.390
2) TIME: 1.777    DECT: 1.777
3) TIME: 1.810    DECT: 1.810
4) TIME: 1.789    DECT: 1.789
5) TIME: 1.769    DECT: 1.769
      TOTAL: 9.535

xkcd--bl
RECORD file: bl--time1.txt
time xkcd--bl 135 135:
1)    1.836
2)    1.777
3)    1.763
4)    1.782
5)    1.794

total time for 'xkcd--bl':
1) TIME: 1.836    DECT: 1.836
2) TIME: 1.777    DECT: 1.777
3) TIME: 1.763    DECT: 1.763
4) TIME: 1.782    DECT: 1.782
5) TIME: 1.794    DECT: 1.794
      TOTAL: 8.952

'xkcd--bl' took .583 seconds less time than 'xkcd--sl'

RUN: 2
START: 136
END: 136

xkcd--sl
RECORD file: sl--time2.txt
time xkcd--sl 136 136:
1)    2.082
2)    2.051
3)    2.008
4)    2.011
5)    2.036

total time for 'xkcd--sl':
1) TIME: 2.082    DECT: 2.082
2) TIME: 2.051    DECT: 2.051
3) TIME: 2.008    DECT: 2.008
4) TIME: 2.011    DECT: 2.011
5) TIME: 2.036    DECT: 2.036
      TOTAL: 10.188

xkcd--bl
RECORD file: bl--time2.txt
time xkcd--bl 136 136:
1)    1.940
2)    1.847
3)    1.844
4)    1.846
5)    1.855

total time for 'xkcd--bl':
1) TIME: 1.940    DECT: 1.940
2) TIME: 1.847    DECT: 1.847
3) TIME: 1.844    DECT: 1.844
4) TIME: 1.846    DECT: 1.846
5) TIME: 1.855    DECT: 1.855
      TOTAL: 9.332

'xkcd--bl' took .856 seconds less time than 'xkcd--sl'

RUN: 3
START: 137
END: 137

xkcd--sl
RECORD file: sl--time3.txt
time xkcd--sl 137 137:
1)    2.731
2)    2.685
3)    2.714
4)    2.680
5)    2.689

total time for 'xkcd--sl':
1) TIME: 2.731    DECT: 2.731
2) TIME: 2.685    DECT: 2.685
3) TIME: 2.714    DECT: 2.714
4) TIME: 2.680    DECT: 2.680
5) TIME: 2.689    DECT: 2.689
      TOTAL: 13.499

xkcd--bl
RECORD file: bl--time3.txt
time xkcd--bl 137 137:
1)    2.746
2)    2.947
3)    2.659
4)    2.755
5)    2.709

total time for 'xkcd--bl':
1) TIME: 2.746    DECT: 2.746
2) TIME: 2.947    DECT: 2.947
3) TIME: 2.659    DECT: 2.659
4) TIME: 2.755    DECT: 2.755
5) TIME: 2.709    DECT: 2.709
      TOTAL: 13.816

'xkcd--sl' took .317 seconds less time than 'xkcd--bl'

RUN: 4
START: 138
END: 138

xkcd--sl
RECORD file: sl--time4.txt
time xkcd--sl 138 138:
1)    1.630
2)    1.642
3)    1.632
4)    1.560
5)    1.567

total time for 'xkcd--sl':
1) TIME: 1.630    DECT: 1.630
2) TIME: 1.642    DECT: 1.642
3) TIME: 1.632    DECT: 1.632
4) TIME: 1.560    DECT: 1.560
5) TIME: 1.567    DECT: 1.567
      TOTAL: 8.031

xkcd--bl
RECORD file: bl--time4.txt
time xkcd--bl 138 138:
1)    2.767
2)    1.623
3)    1.558
4)    1.561
5)    1.572

total time for 'xkcd--bl':
1) TIME: 2.767    DECT: 2.767
2) TIME: 1.623    DECT: 1.623
3) TIME: 1.558    DECT: 1.558
4) TIME: 1.561    DECT: 1.561
5) TIME: 1.572    DECT: 1.572
      TOTAL: 9.081

'xkcd--sl' took 1.050 seconds less time than 'xkcd--bl'

RUN: 5
START: 139
END: 139

xkcd--sl
RECORD file: sl--time5.txt
time xkcd--sl 139 139:
1)    1.705
2)    1.672
3)    1.636
4)    1.643
5)    1.650

total time for 'xkcd--sl':
1) TIME: 1.705    DECT: 1.705
2) TIME: 1.672    DECT: 1.672
3) TIME: 1.636    DECT: 1.636
4) TIME: 1.643    DECT: 1.643
5) TIME: 1.650    DECT: 1.650
      TOTAL: 8.306

xkcd--bl
RECORD file: bl--time5.txt
time xkcd--bl 139 139:
1)    1.700
2)    1.639
3)    1.644
4)    1.634
5)    1.637

total time for 'xkcd--bl':
1) TIME: 1.700    DECT: 1.700
2) TIME: 1.639    DECT: 1.639
3) TIME: 1.644    DECT: 1.644
4) TIME: 1.634    DECT: 1.634
5) TIME: 1.637    DECT: 1.637
      TOTAL: 8.254

'xkcd--bl' took .052 seconds less time than 'xkcd--sl'


Calculating batch total for 'xkcd--sl'...
Calculating batch total for 'xkcd--bl'...

Run     sl Batch   bl Batch
~~~     ~~~~~~~~   ~~~~~~~~
1          9.535      8.952
2         10.188      9.332
3         13.499     13.816
4          8.031      9.081
5          8.306      8.254
Total:    49.559     49.435
done


Next, I will try running the speed test with (5) batches of five comics. Let's see if there is any marked difference.
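
For reference, here is a small sketch of how the range for each run can be worked out from the start point and the batch size. The 'batch_range' helper is hypothetical and not part of speed_test; the start point of 140 and batch size of 5 are just example values.
Code:
```shell
#!/bin/bash
# Sketch only: compute the comic range for each of five runs.
# batch_range is a hypothetical helper, not part of speed_test.
# Usage: batch_range START_POINT BATCH_SIZE RUN  (sets START and END)
batch_range() {
   START=$(( ($3 - 1) * $2 + $1 ))
   END=$(( $START + $2 - 1 ))
}

for RUN in 1 2 3 4 5; do
   batch_range 140 5 "$RUN"
   echo "RUN: $RUN  START: $START  END: $END"
done
```

With a batch size of 1 the same helper gives START equal to END, which is the batches-of-one case.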


 PostPosted: Wed Feb 03, 2010 12:45 am   

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
Well, here it is...
Code:
geoffrey@Presario-Laptop:~/Desktop$ speed_test_xkcd 140

speed_test: 'xkcd--sl' vs 'xkcd--bl'

RUN: 1
START: 140
END: 144

xkcd--sl
RECORD file: sl--time1.txt
time xkcd--sl 140 144:
1)    9.589
2)    9.623
3)    9.738
4)    9.787
5)    9.276

total time for 'xkcd--sl':
1) TIME: 9.589    DECT: 9.589
2) TIME: 9.623    DECT: 9.623
3) TIME: 9.738    DECT: 9.738
4) TIME: 9.787    DECT: 9.787
5) TIME: 9.276    DECT: 9.276
      TOTAL: 48.013

xkcd--bl
RECORD file: bl--time1.txt
time xkcd--bl 140 144:
1)    10.075
2)    9.055
3)    9.113
4)    9.062
5)    8.995

total time for 'xkcd--bl':
1) TIME: 10.075    DECT: 10.075
2) TIME: 9.055    DECT: 9.055
3) TIME: 9.113    DECT: 9.113
4) TIME: 9.062    DECT: 9.062
5) TIME: 8.995    DECT: 8.995
      TOTAL: 46.300

'xkcd--bl' took 1.713 seconds less time than 'xkcd--sl'

RUN: 2
START: 145
END: 149

xkcd--sl
RECORD file: sl--time2.txt
time xkcd--sl 145 149:
1)    6.349
2)    6.860
3)    6.259
4)    6.259
5)    6.257

total time for 'xkcd--sl':
1) TIME: 6.349    DECT: 6.349
2) TIME: 6.860    DECT: 6.860
3) TIME: 6.259    DECT: 6.259
4) TIME: 6.259    DECT: 6.259
5) TIME: 6.257    DECT: 6.257
      TOTAL: 31.984

xkcd--bl
RECORD file: bl--time2.txt
time xkcd--bl 145 149:
1)    6.367
2)    6.321
3)    6.251
4)    6.269
5)    6.251

total time for 'xkcd--bl':
1) TIME: 6.367    DECT: 6.367
2) TIME: 6.321    DECT: 6.321
3) TIME: 6.251    DECT: 6.251
4) TIME: 6.269    DECT: 6.269
5) TIME: 6.251    DECT: 6.251
      TOTAL: 31.459

'xkcd--bl' took .525 seconds less time than 'xkcd--sl'

RUN: 3
START: 150
END: 154

xkcd--sl
RECORD file: sl--time3.txt
time xkcd--sl 150 154:
1)    7.889
2)    7.883
3)    8.383
4)    8.027
5)    8.463

total time for 'xkcd--sl':
1) TIME: 7.889    DECT: 7.889
2) TIME: 7.883    DECT: 7.883
3) TIME: 8.383    DECT: 8.383
4) TIME: 8.027    DECT: 8.027
5) TIME: 8.463    DECT: 8.463
      TOTAL: 40.645

xkcd--bl
RECORD file: bl--time3.txt
time xkcd--bl 150 154:
1)    7.960
2)    7.899
3)    7.866
4)    7.952
5)    7.926

total time for 'xkcd--bl':
1) TIME: 7.960    DECT: 7.960
2) TIME: 7.899    DECT: 7.899
3) TIME: 7.866    DECT: 7.866
4) TIME: 7.952    DECT: 7.952
5) TIME: 7.926    DECT: 7.926
      TOTAL: 39.603

'xkcd--bl' took 1.042 seconds less time than 'xkcd--sl'

RUN: 4
START: 155
END: 159

xkcd--sl
RECORD file: sl--time4.txt
time xkcd--sl 155 159:
1)    6.046
2)    6.045
3)    6.096
4)    6.067
5)    6.047

total time for 'xkcd--sl':
1) TIME: 6.046    DECT: 6.046
2) TIME: 6.045    DECT: 6.045
3) TIME: 6.096    DECT: 6.096
4) TIME: 6.067    DECT: 6.067
5) TIME: 6.047    DECT: 6.047
      TOTAL: 30.301

xkcd--bl
RECORD file: bl--time4.txt
time xkcd--bl 155 159:
1)    6.087
2)    6.016
3)    6.037
4)    6.059
5)    7.102

total time for 'xkcd--bl':
1) TIME: 6.087    DECT: 6.087
2) TIME: 6.016    DECT: 6.016
3) TIME: 6.037    DECT: 6.037
4) TIME: 6.059    DECT: 6.059
5) TIME: 7.102    DECT: 7.102
      TOTAL: 31.301

'xkcd--sl' took 1.000 seconds less time than 'xkcd--bl'

RUN: 5
START: 160
END: 164

xkcd--sl
RECORD file: sl--time5.txt
time xkcd--sl 160 164:
1)    9.177
2)    8.998
3)    8.608
4)    8.585
5)    8.918

total time for 'xkcd--sl':
1) TIME: 9.177    DECT: 9.177
2) TIME: 8.998    DECT: 8.998
3) TIME: 8.608    DECT: 8.608
4) TIME: 8.585    DECT: 8.585
5) TIME: 8.918    DECT: 8.918
      TOTAL: 44.286

xkcd--bl
RECORD file: bl--time5.txt
time xkcd--bl 160 164:
1)    8.643
2)    8.572
3)    8.611
4)    8.679
5)    8.932

total time for 'xkcd--bl':
1) TIME: 8.643    DECT: 8.643
2) TIME: 8.572    DECT: 8.572
3) TIME: 8.611    DECT: 8.611
4) TIME: 8.679    DECT: 8.679
5) TIME: 8.932    DECT: 8.932
      TOTAL: 43.437

'xkcd--bl' took .849 seconds less time than 'xkcd--sl'


Calculating batch total for 'xkcd--sl'...
Calculating batch total for 'xkcd--bl'...

Run     sl Batch   bl Batch
~~~     ~~~~~~~~   ~~~~~~~~
1         48.013     46.300
2         31.984     31.459
3         40.645     39.603
4         30.301     31.301
5         44.286     43.437
Total:   195.229    192.100
done



After 25 comics (five batches of five, with each batch run five times per program), the "big loop" method of processing the comics turned out to be slightly faster than the "small loops" method: 3.129 seconds over the whole test (192.100 vs 195.229 seconds).

The times are very close. So close that the difference might be due to nothing more than the "big loop" method having one loop where the "small loops" method has three: two fewer loops to process.

That makes sense...

If there was any time saved in doing all the downloading at once, before assembling the comics, it seems to have been negated by having extra loops to process.

(Not what I expected...)
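
To put the totals in proportion, here is a rough sketch that works out the difference, the relative gap and the gap per comic using whole milliseconds. The 'to_ms' helper is hypothetical, and it assumes times printed with exactly three decimals, as TIMEFORMAT='%3R' gives.
Code:
```shell
#!/bin/bash
# Sketch only: compare two run-time totals using integer milliseconds.
# to_ms is a hypothetical helper; it assumes exactly three decimals.
to_ms() {
   local t="$1"
   case $t in .*) t="0$t" ;; esac    # bc prints ".583" without the 0
   local whole="${t%.*}"
   local frac="${t#*.}"
   # Prefix a 1 and subtract 1000 so "052" is not read as octal
   echo $(( $whole * 1000 + 1$frac - 1000 ))
}

SL_MS=$(to_ms 195.229)     # 'xkcd--sl' total from the chart above
BL_MS=$(to_ms 192.100)     # 'xkcd--bl' total from the chart above
DIFF_MS=$(( $SL_MS - $BL_MS ))
# Gap as a share of the slower total, in hundredths of a percent
PCT_X100=$(( $DIFF_MS * 10000 / $SL_MS ))
# Each total covers 25 comics processed five times = 125 comic runs
PER_COMIC_MS=$(( $DIFF_MS / 125 ))

printf 'difference: %d.%03d s\n' $(( $DIFF_MS / 1000 )) $(( $DIFF_MS % 1000 ))
printf 'relative:   %d.%02d %%\n' $(( $PCT_X100 / 100 )) $(( $PCT_X100 % 100 ))
printf 'per comic:  about %d ms\n' "$PER_COMIC_MS"
```

The integer division rounds down, so the figures are approximate. They are small next to the run-to-run spread in the individual timings.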


 PostPosted: Wed Feb 03, 2010 12:49 am   

Joined: Mon Jan 18, 2010 8:10 pm
Posts: 40
Here's another one that I particularly like:




 PostPosted: Sun Feb 21, 2010 4:03 am   

Joined: Sun Nov 01, 2009 8:59 am
Posts: 23
Location: Try to guess!
Downloading xkcd comics was my first (failed) attempt at bash scripting a long time ago! o:|
Thanks, I'm definitely using your script! ;)

