Playing with Python, EXIF tags and Flickr API

A few days ago, I was quite amused by Flagrant Disregard’s Top Digital Cameras: every day, these people take 10,000 photos uploaded to Flickr and look at their camera makes and models. This kind of study is interesting because one can see what people are actually using and which camera models can give good results (with a good photographer, of course). I was just disappointed that they say nothing about their sampling method or the statistics they apply to their data. I then thought that I could do a similar survey and publish the results along with the method.

Once again, I’ll do this with Python. Instead of reading binary data from JPG files myself to look at EXIF tags, I’ll use external “modules” (wrappers). After a quick search on the web, GeneCash’s EXIF.py seemed to be the best solution. Indeed, to get the camera make and model of a test image, the code is simply:

import EXIF

# EXIF.py expects a file opened in binary mode
f = open('testimage.jpg', 'rb')
tags = EXIF.process_file(f)
f.close()
print "Image Make: %s - Image Model: %s" % (tags['Image Make'], tags['Image Model'])
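
Not every photo carries both tags, and a direct `tags['Image Make']` lookup raises a KeyError when one is missing. Here is a minimal sketch of a more tolerant lookup, assuming `tags` behaves like the dictionary returned by EXIF.process_file. The `describe_camera` helper is a hypothetical name of mine, not part of EXIF.py, and the sketch is written to run under both Python 2 and 3.

```python
def describe_camera(tags):
    """Build a one-line camera description from an EXIF tag mapping.

    `tags` is assumed to behave like the dictionary returned by
    EXIF.process_file(); describe_camera is a hypothetical helper.
    """
    # str() because the values may be tag objects rather than plain strings
    make = str(tags.get('Image Make', '')).strip()
    model = str(tags.get('Image Model', '')).strip()
    if make and model:
        return "Image Make: %s - Image Model: %s" % (make, model)
    if make:
        return "Image Make: %s" % make
    if model:
        return "Image Model: %s" % model
    return "No Image Make nor Model available"
```

With this helper, an image without EXIF data yields a message instead of an exception.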

Now, to access Flickr’s most recent photos, I had two options:

  1. I open the Flickr most recent photos page and parse the HTML to get the photos. This can be done with regular expressions or XML parsing.
  2. I use the Flickr API, which has a method designed for exactly this: flickr.photos.getRecent
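
For reference, the second option boils down to a plain HTTP request. The sketch below only builds the request URL and does not send it. The endpoint address and the `build_getrecent_url` helper are my assumptions (check the current Flickr API documentation), and you need your own API key.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# REST endpoint of the Flickr API (assumed; check the current Flickr docs)
FLICKR_REST_URL = 'https://api.flickr.com/services/rest/'


def build_getrecent_url(api_key, per_page=10, page=1):
    """Return the URL of a flickr.photos.getRecent request.

    build_getrecent_url is a hypothetical helper, not part of any kit.
    """
    # A list of pairs keeps the parameters in a predictable order
    params = [
        ('method', 'flickr.photos.getRecent'),
        ('api_key', api_key),
        ('per_page', per_page),
        ('page', page),
    ]
    return FLICKR_REST_URL + '?' + urlencode(params)
```

The Python kits discussed next wrap this kind of request and parse the response for you.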

I chose the second option and looked at the three kits for Python referenced by Flickr:

  • The author of FlickrClient admits his kit is outdated and links to Beej’s Python Flickr API
  • Beej’s Python Flickr API looks interesting, but there isn’t much documentation and, being a beginner in Python, I was quickly lost
  • Finally, James Clarke’s flickr.py seemed to be a nice and easy-to-use wrapper, so I decided to go with it.

Unfortunately, the getRecent method isn’t implemented (James Clarke has not maintained this wrapper since 2005). I tried the photos_search method (a wrapper for flickr.photos.search), hoping that calling it without any tag would give me the most recent photos. But some people probably thought of this before me, because Flickr has disabled parameterless searches. Look at the error:

import flickr
z = flickr.photos_search('', False, '', '', '','', '', '', '', '', '2', '', '')

Traceback (most recent call last):
[...]
FlickrError: ERROR [3]: Parameterless searches have been disabled. Please use flickr.photos.getRecent instead.

So, I was forced to implement the getRecent method. Fortunately, it wasn’t too difficult. Here is the code you can insert at line 589 in James Clarke’s flickr.py (or download my flickr.py here):

def photos_getrecent(extra='', per_page='', page=''):
    """Returns a list of Photo objects.

    """
    method = 'flickr.photos.getRecent'

    data = _doget(method, API_KEY, extra=extra, per_page=per_page, page=page)
    photos = []
    if isinstance(data.rsp.photos.photo, list):
        for photo in data.rsp.photos.photo:
            photos.append(_parse_photo(photo))
    else:
        photos = [_parse_photo(data.rsp.photos.photo)]
    return photos
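
The isinstance test in this function is there because, as the code implies, the parsed response holds a single object when only one photo comes back and a list otherwise. The same normalisation can be sketched as a small helper. `as_list` is a hypothetical name, not something flickr.py provides.

```python
def as_list(value):
    """Return value unchanged if it is already a list, else wrap it in one."""
    if isinstance(value, list):
        return value
    return [value]
```

With it, the body of photos_getrecent could be reduced to a single loop over `as_list(data.rsp.photos.photo)`.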

Now that I have Python, an EXIF wrapper and a Flickr wrapper with a getRecent method, I can write a small script that fetches the 10 most recent images from Flickr and displays their camera make and model, if they have one (flickrCameraQuantifier.py):

#!/usr/bin/python
import urllib2

import EXIF
import flickr

recentimgs = flickr.photos_getrecent('', '10', '1')

# collect the URL of the original-size file of each photo
imgurls = []
for img in recentimgs:
    try:
        imgurls.append(str(img.getURL(size='Original', urlType='source')))
    except Exception:
        print 'Error while getting an image URL'

for imgurl in imgurls:
    # save the image to a temporary file
    imgstream = urllib2.urlopen(imgurl)
    f = open('tmp.jpg', 'wb')
    f.write(imgstream.read())
    f.close()
    imgstream.close()
    # get the tags
    f = open('tmp.jpg', 'rb')
    try:
        tags = EXIF.process_file(f)
        if len(str(tags['Image Make'])) > 0:
            if len(str(tags['Image Model'])) > 0:
                print "Image Make: %s - Image Model: %s" % (tags['Image Make'], tags['Image Model'])
            else:
                print "Image Make: %s" % (tags['Image Make'])
        else:
            print "No Image Make nor Model available"
    except Exception:
        # also reached when a tag is missing (KeyError)
        print 'Error while getting tags from an image'
    f.close()

print "Done!"

Out of 10 images, it usually gives 7 to 9 camera models. I haven’t checked yet whether the errors are due to my script or to missing EXIF tags in the submitted images. The EXIF tag detection is a bit slow (imho) but it’s ok. It is also a “one shot” script: once it finishes its work, nothing remains in memory. So the next step is to use a flat file or a database connection to remember the details found.
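
As a rough idea of what the database route could look like, here is a minimal sketch using the sqlite3 module from the standard library. The table name, the column names and the three helper functions are all hypothetical choices of mine. Nothing here comes from EXIF.py or flickr.py.

```python
import sqlite3
import time


def open_store(path):
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sightings ("
        "seen_at REAL, make TEXT, model TEXT)")
    return conn


def record_camera(conn, make, model, seen_at=None):
    """Store one camera make/model with the time it was seen."""
    if seen_at is None:
        seen_at = time.time()
    conn.execute("INSERT INTO sightings VALUES (?, ?, ?)",
                 (seen_at, make, model))
    conn.commit()


def count_by_camera(conn):
    """Return (make, model, count) rows, most frequent camera first."""
    rows = conn.execute(
        "SELECT make, model, COUNT(*) FROM sightings "
        "GROUP BY make, model ORDER BY COUNT(*) DESC, make, model")
    return rows.fetchall()
```

In the script, `record_camera(conn, str(tags['Image Make']), str(tags['Image Model']))` would replace the print statement, and passing ':memory:' as the path gives a throwaway database for experiments.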

I suggest the following method: every 5 minutes, the script retrieves the most recent photo uploaded to Flickr and stores its camera make and model somewhere. Each day, one would then be able to do some decent statistics. I prefer sampling one photo at regular intervals rather than 10 photos at one precise moment because people usually upload their pictures in batches. There is then a risk that these 10 photos come from the same person and were taken with the same device.
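
At one photo every 5 minutes, a day yields 24 × 60 / 5 = 288 samples. Here is a sketch of the kind of daily statistics this allows: the share of each camera and a rough 95% margin of error. The margin uses the usual normal approximation for a proportion, which is my choice and assumes the samples are independent (precisely why batch uploads are a problem). `camera_shares` is a hypothetical helper.

```python
from collections import Counter
from math import sqrt


def camera_shares(samples):
    """Return (camera, count, share, margin) tuples, most common first.

    share and margin are fractions between 0 and 1; margin is the
    half-width of an approximate 95% interval (normal approximation).
    """
    counts = Counter(samples)
    total = sum(counts.values())
    results = []
    for camera, n in counts.most_common():
        share = float(n) / total
        margin = 1.96 * sqrt(share * (1.0 - share) / total)
        results.append((camera, n, share, margin))
    return results
```

Calling `camera_shares` on the list of "make model" strings collected during the day gives the ranking directly. With small counts the margin is only indicative.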