Monday, April 19, 2010

BHL poster for AETFAT2010

Due to the volcano in Iceland, I may or may not be leaving for the AETFAT conference in Madagascar on Thursday as planned. I'm routed through London and everything else is full, so I'll have to make a go/no-go decision on Thursday morning. One of my main reasons for going to the conference was to present this poster (and another one for Tropicos), and I hope I get to display it because I think it turned out really well! Using a variety of open software and open data, I made a photomosaic of Africa and Madagascar from the title pages of books tagged with "Africa" or "Madagascar" in the Biodiversity Heritage Library. Here's how I did it:

1. Downloaded the BHL schema from http://www.biodiversitylibrary.org/data/BHLExportSchema.pdf and the following data exports:

Title: http://www.biodiversitylibrary.org/data/title.txt (10MB+)

Subject: http://www.biodiversitylibrary.org/data/subject.txt (3MB+)

Item: http://www.biodiversitylibrary.org/data/item.txt (14MB+)


2. Imported those text files into tables in a simple database application (MySQL or Access). I set up one-to-many relationships from Title.TitleID to Subject.TitleID and to Item.TitleID, so that one title ("Flore de Madagascar") links to its subjects ("Madagascar") and its items ("Volume 25"). Note the field Item.ThumbnailPageID, which holds the pageID of the image marked as the Title Page or, if no Title Page has been selected, a representative page of interest from the book.


3. Using a simple query editor, I created a SQL statement to select the ThumbnailPageID from digitized items whose titles are tagged with subjects matching "%Africa%" or "%Madagascar%". The wildcards mean the query also picks up subjects like "South Africa" and "Madagascar, Central."


4. Using BHL's API documentation for images, I prepended "http://biodiversitylibrary.org/pagethumb/" to each of the pageIDs from step 3. Each value in that field is now a link to the page image for one of the 851 title pages.


5. I used a download manager (Speed Download for Mac OS X; there are plenty for Win/Unix) to grab those 851 JPGs. At the default size returned, each tile was small: 200 pixels wide and about 8 KB on average.


6. I used the map of Africa and Madagascar from UiO as a reference image because it doesn't show sea terrain, which had muddled my first few attempts. I blew that image up using *proprietary software alert* Adobe Photoshop. You can use other imaging software to do the same, but I like Photoshop. I made a blank image roughly 3'x4' at 300 dpi, pasted in the source image, then scaled it to the size of the poster.


7. I then used MacOSaiX to build the photomosaic. This is where all the magic happens, and where I did the least. I just told the app to use the reference image from step 6 and the thumbnails from step 5 to build the mosaic, and off it went. After 40 minutes or so it beeped and said it was done. Voila! A photomosaic of Africa and Madagascar made from title pages of open access science books.


8. To make the poster I pasted the JPG into *proprietary software alert* Microsoft PowerPoint, because it's surprisingly easy to use for poster layout. Dropped in some text, logo, & a URL and there you have it - a cool poster using open data and (mostly) open software.
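If you'd rather script steps 2 through 5 than click through them, here is a minimal sketch in Python. It swaps in SQLite (via the standard library) for MySQL or Access, and it runs against a few made-up sample rows rather than the real exports. The column names FullTitle, SubjectText, and ItemID, the function names, and the "pageID.jpg" file naming are assumptions for illustration only. Check the schema PDF for the real column layout before loading title.txt, subject.txt, and item.txt. The only things taken from the steps above are the TitleID joins, the ThumbnailPageID field, the two wildcard patterns, and the pagethumb URL prefix.

```python
import sqlite3
import urllib.request
from pathlib import Path

# URL prefix quoted in step 4.
THUMB_BASE = "http://biodiversitylibrary.org/pagethumb/"


def thumbnail_urls(titles, subjects, items,
                   patterns=("%Africa%", "%Madagascar%")):
    """Return thumbnail URLs for items whose title has a matching subject.

    titles:   (TitleID, FullTitle) rows        -- FullTitle is an assumed name
    subjects: (TitleID, SubjectText) rows      -- SubjectText is an assumed name
    items:    (ItemID, TitleID, ThumbnailPageID) rows
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Title (TitleID INTEGER PRIMARY KEY, FullTitle TEXT);
        CREATE TABLE Subject (TitleID INTEGER, SubjectText TEXT);
        CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, TitleID INTEGER,
                           ThumbnailPageID INTEGER);
    """)
    con.executemany("INSERT INTO Title VALUES (?, ?)", titles)
    con.executemany("INSERT INTO Subject VALUES (?, ?)", subjects)
    con.executemany("INSERT INTO Item VALUES (?, ?, ?)", items)

    # One LIKE clause per wildcard pattern, joined with OR (step 3).
    where = " OR ".join("s.SubjectText LIKE ?" for _ in patterns)
    sql = f"""
        SELECT DISTINCT i.ThumbnailPageID
        FROM Title t
        JOIN Subject s ON s.TitleID = t.TitleID
        JOIN Item i ON i.TitleID = t.TitleID
        WHERE i.ThumbnailPageID IS NOT NULL AND ({where})
        ORDER BY i.ThumbnailPageID
    """
    page_ids = [row[0] for row in con.execute(sql, tuple(patterns))]
    con.close()

    # Step 4: prepend the thumbnail prefix to each pageID.
    return [f"{THUMB_BASE}{page_id}" for page_id in page_ids]


def download_thumbnails(urls, dest_dir):
    """Step 5: save each thumbnail as <pageID>.jpg (naming is an assumption)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        page_id = url.rsplit("/", 1)[-1]
        with urllib.request.urlopen(url) as response:
            (dest / f"{page_id}.jpg").write_bytes(response.read())


# Made-up sample rows standing in for the real exports.
titles = [(1, "Flore de Madagascar"), (2, "Flora of South Africa"),
          (3, "Birds of Europe")]
subjects = [(1, "Madagascar"), (1, "Botany"), (2, "South Africa"),
            (2, "Africa"), (3, "Europe")]
items = [(10, 1, 1001), (11, 1, 1002), (12, 2, 2001), (13, 3, 3001)]

urls = thumbnail_urls(titles, subjects, items)
for url in urls:
    print(url)
```

The DISTINCT keeps a page from being listed twice when a title carries more than one matching subject. To fetch the images, you would call download_thumbnails(urls, "tiles") and then point the mosaic app at that folder.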


You can download the finished poster here as a 1MB JPG.


I'm purposefully documenting how I did this to encourage others to incorporate BHL data into their visualizations & presentations. BHL is an incredibly rich dataset with open access policies and open APIs, and this is but one simple example of how I was able to filter the data and extract compelling images from the millions we have scanned.