Very small Bash scripts to retrieve multiple PDFs and create a book

March 20, 2006

The National Academies Press are putting some of their books on-line. I was particularly interested in the Guidelines for the Care and Use of Mammals in Neuroscience and Behavioral Research. The only “trick” is that they provide the book one page at a time (either in HTML or in PDF format). If you want entire chapters or the whole book in one file, you have to purchase it. I think it is a fair deal (how many publishers do that?).

Now, I was sure I could automate the retrieval of the PDFs and obtain one file containing the whole book. The pages are numbered from 1 to 209. So I wrote this small Bash script to retrieve all the pages (all the PDFs):

#!/bin/bash
# ./getbook.sh -> retrieve PDFs from nap.edu/

c=1

while [ $c -lt 210 ]
do
        wget -c http://print.nap.edu/pdf/0309089034/pdf_image/$c.pdf
        c=$((c+1))
done

In a few minutes, I had all the PDFs. 🙂 Now I wanted them all in a single PDF. Here, I’ll use pdfjoin (from PDFJam) to … join them. Of course, I could type one big command like “pdfjoin 1.pdf 2.pdf 3.pdf ...” but I was sure there was a better solution. Again, I used a Bash script:

#!/bin/bash
# ./joinpdf.sh -> join PDFs from nap.edu/

c=2
s="1.pdf"

while [ $c -lt 210 ]
do
        s="$s $c.pdf"
        c=$((c+1))
done

pdfjoin $s --fitpaper false --paper a4paper --outfile book.pdf

Now, I have a wonderful book.pdf that I can read on my computer or print on a printer. 🙂
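
As another way to build the list of files, Bash brace expansion can generate the names directly, already in page order. This is only a sketch: join_book is a hypothetical helper name, and the pdfjoin options are simply the ones used in joinpdf.sh above. Note that brace expansion happens before variable expansion, so the range has to be written with literal numbers (something like {1..$n} would not expand).

```bash
#!/bin/bash
# Sketch: build the file list with brace expansion instead of a loop.

# Expands to 1.pdf 2.pdf ... 209.pdf, in numeric order.
pages=( {1..209}.pdf )

# Hypothetical helper: join the files given as arguments into book.pdf,
# with the same pdfjoin options as in joinpdf.sh.
join_book() {
        pdfjoin "$@" --fitpaper false --paper a4paper --outfile book.pdf
}

echo "${#pages[@]} files, from ${pages[0]} to ${pages[208]}"
```

Calling join_book "${pages[@]}" in the directory that holds the downloaded pages should then produce the same book.pdf.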

P.S.1: it’s not Perl but I am sure there is more than one way to do it
P.S.2: you can join the two Bash scripts to do everything in one go. In this case, it would be interesting to create a variable for the number of the last PDF available (209; the two scripts above hard-code the loop limit 210).
P.S.3: as usual, explanations around these scripts are longer than the scripts themselves!
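
For what it is worth, the combined script suggested in P.S.2 might look like the sketch below. The names last_page, base_url, run and page_files are my own inventions for illustration, and so is the DRY_RUN switch: by default the script only prints the wget and pdfjoin commands, and it executes them only when DRY_RUN=0 is set.

```bash
#!/bin/bash
# ./book.sh -> retrieve the page PDFs from nap.edu/ and join them (sketch)

last_page=209   # number of the last page available on-line
base_url="http://print.nap.edu/pdf/0309089034/pdf_image"
dry_run="${DRY_RUN:-1}"   # assumption: DRY_RUN=0 really runs the commands

# Run a command, or only print it when dry_run is 1.
run() {
        if [ "$dry_run" = "1" ]; then
                echo "$*"
        else
                "$@"
        fi
}

# Print the file names 1.pdf ... N.pdf, one per line (N is the argument).
page_files() {
        local c=1
        while [ "$c" -le "$1" ]
        do
                echo "$c.pdf"
                c=$((c+1))
        done
}

# Retrieve every page.
for f in $(page_files "$last_page")
do
        run wget -c "$base_url/$f"
done

# Join the pages in page order; the list is left unquoted on purpose
# so that each file name becomes its own argument.
run pdfjoin $(page_files "$last_page") --fitpaper false --paper a4paper --outfile book.pdf
```

Running DRY_RUN=0 ./book.sh would then download and join in one go, and a book with a different number of pages only needs a new value for last_page.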
