Category: Open Source

Playing with Python, EXIF tags and Flickr API

A few days ago, I was quite amused by Flagrant Disregard’s Top Digital Cameras: every day, these people took 10,000 photos uploaded to Flickr and looked at their camera makes and models. This kind of study is interesting because it shows what people are actually using and which camera models can give good results (with a good photographer, of course). I was just disappointed that they say nothing about their sampling method or about the statistics they apply to their data. So I thought I could run a similar survey and publish the results along with the method.

Once again, I’ll do this with Python. Instead of reading the binary data of JPEG files myself to get the EXIF tags, I’ll use external modules (wrappers). After a quick search on the web, Gene Cash’s EXIF.py seemed to be the best solution. Indeed, to get the camera make and model of a test image, the code is simply:

import EXIF

# EXIF.py reads the image as binary data, so open it in 'rb' mode
f = open('testimage.jpg', 'rb')
tags = EXIF.process_file(f)
f.close()
print "Image Make: %s - Image Model: %s" % (tags['Image Make'], tags['Image Model'])
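For comparison, here is roughly what EXIF.py saves me from doing by hand. This is a simplified sketch in Python 3, not EXIF.py’s code: it only looks for the EXIF APP1 segment of a JPEG and reads the ASCII Make (tag 0x010F) and Model (tag 0x0110) entries of the first IFD, ignoring thumbnails, maker notes and every other tag. The function names `read_make_model` and `_parse_tiff` are hypothetical.

```python
import struct

MAKE_TAG, MODEL_TAG = 0x010F, 0x0110  # standard TIFF/EXIF tags stored in IFD0
ASCII_TYPE = 2                        # TIFF field type for ASCII strings


def read_make_model(data):
    """Return (make, model) from JPEG bytes, or (None, None) if absent.

    Simplified sketch: only scans IFD0 of the first EXIF APP1 segment.
    """
    if data[:2] != b'\xff\xd8':  # every JPEG starts with the SOI marker
        raise ValueError('not a JPEG file')
    pos = 2
    while pos + 4 <= len(data):
        # each segment: 2-byte marker, then a 2-byte length that counts itself
        marker, length = struct.unpack('>HH', data[pos:pos + 4])
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xFFE1 and segment[:6] == b'Exif\x00\x00':
            return _parse_tiff(segment[6:])
        if marker == 0xFFDA:  # start of scan: image data follows, no more metadata
            break
        pos += 2 + length
    return None, None


def _parse_tiff(tiff):
    # byte order: 'II' = little-endian (Intel), 'MM' = big-endian (Motorola)
    endian = '<' if tiff[:2] == b'II' else '>'
    ifd0 = struct.unpack(endian + 'I', tiff[4:8])[0]
    count = struct.unpack(endian + 'H', tiff[ifd0:ifd0 + 2])[0]
    found = {}
    for i in range(count):
        # each IFD entry is 12 bytes: tag, type, count, value-or-offset
        entry = tiff[ifd0 + 2 + 12 * i:ifd0 + 14 + 12 * i]
        tag, typ, n = struct.unpack(endian + 'HHI', entry[:8])
        if tag in (MAKE_TAG, MODEL_TAG) and typ == ASCII_TYPE:
            if n <= 4:  # values of 4 bytes or less are stored inline
                raw = entry[8:8 + n]
            else:       # otherwise the entry holds an offset into the TIFF data
                offset = struct.unpack(endian + 'I', entry[8:12])[0]
                raw = tiff[offset:offset + n]
            found[tag] = raw.rstrip(b'\x00').decode('ascii', 'replace').strip()
    return found.get(MAKE_TAG), found.get(MODEL_TAG)
```

Handling byte order, inline versus offset values and dozens of other tag types is exactly the tedious part that a wrapper like EXIF.py takes care of.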

Now, to access Flickr’s most recent photos, I had two options:

  1. Open Flickr’s most recent photos page and parse the HTML to extract the photos. This can be done with regular expressions or with an HTML/XML parser.
  2. Use the Flickr API, which has a method designed specifically for this: flickr.photos.getRecent
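Option 2 boils down to an HTTP GET request on Flickr’s REST endpoint. The sketch below (Python 3, standard library only) just builds the request URL. The endpoint and the `method`, `api_key`, `per_page`, `page` and `extras` arguments come from Flickr’s API documentation; `YOUR_API_KEY` and the helper name `getrecent_url` are placeholders of mine.

```python
from urllib.parse import urlencode

REST_ENDPOINT = 'https://api.flickr.com/services/rest/'


def getrecent_url(api_key, per_page=10, page=1, extras=''):
    """Build the GET URL for flickr.photos.getRecent (hypothetical helper)."""
    params = {
        'method': 'flickr.photos.getRecent',
        'api_key': api_key,
        'per_page': per_page,
        'page': page,
    }
    if extras:  # optional comma-separated list, e.g. 'original_format'
        params['extras'] = extras
    return REST_ENDPOINT + '?' + urlencode(params)


# The XML answer could then be fetched with urllib.request.urlopen(url)
# and parsed with xml.etree.ElementTree.
print(getrecent_url('YOUR_API_KEY', per_page=10, page=1))
```

A wrapper such as flickr.py does the same thing behind the scenes, plus parsing the answer into Python objects.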

I chose the second option and looked at the three Python kits referenced by Flickr:

  • The FlickrClient author admits his kit is outdated and links to Beej’s Python Flickr API
  • Beej’s Python Flickr API seems interesting, but there isn’t much documentation and, as a Python beginner, I was quickly lost
  • Finally, James Clarke’s flickr.py seemed to be a nice, easy-to-use wrapper, so I decided to go with it.

Unfortunately, the getRecent method isn’t implemented (James Clarke has not maintained this wrapper since 2005). I tried the photos_search method (the wrapper for flickr.photos.search), hoping that calling it without any tag would return all the most recent photos. But others had probably thought of this before me, because Flickr has disabled parameterless searches. Look at the error:

import flickr
z = flickr.photos_search('', False, '', '', '','', '', '', '', '', '2', '', '')

Traceback (most recent call last):
[...]
FlickrError: ERROR [3]: Parameterless searches have been disabled. Please use flickr.photos.getRecent instead.
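The traceback above is flickr.py passing on Flickr’s own error answer. In Flickr’s REST (XML) format, a failed call comes back as an `<rsp stat="fail">` element with an `<err code="…" msg="…"/>` child. Here is a Python 3 sketch that detects it without any wrapper; the `FlickrAPIError` class and `check_response` helper are hypothetical names, and the sample XML is hand-written to mirror the error message above.

```python
import xml.etree.ElementTree as ET


class FlickrAPIError(Exception):
    """Raised when Flickr answers with stat="fail" (hypothetical class)."""

    def __init__(self, code, msg):
        super().__init__('ERROR [%s]: %s' % (code, msg))
        self.code = code


def check_response(xml_text):
    """Return the parsed <rsp> element, or raise FlickrAPIError on failure."""
    rsp = ET.fromstring(xml_text)
    if rsp.get('stat') != 'ok':
        err = rsp.find('err')
        if err is None:  # malformed failure answer without an <err> child
            raise FlickrAPIError(None, 'unexpected status %r' % rsp.get('stat'))
        raise FlickrAPIError(int(err.get('code')), err.get('msg'))
    return rsp


sample = ('<rsp stat="fail"><err code="3" msg="Parameterless searches have '
          'been disabled. Please use flickr.photos.getRecent instead."/></rsp>')
try:
    check_response(sample)
except FlickrAPIError as e:
    print(e.code, e)
```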

So I had to implement the getRecent method myself. Fortunately, it wasn’t too difficult. Here is the code you can insert at line 589 of James Clarke’s flickr.py (or download my modified flickr.py here):

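def photos_getrecent(extras='', per_page='', page=''):
    """Return a list of Photo objects for the most recent public photos
    (wrapper for the flickr.photos.getRecent method).
    """
    method = 'flickr.photos.getRecent'

    # the Flickr API argument is named "extras" (plural)
    data = _doget(method, API_KEY, extras=extras, per_page=per_page, page=page)
    photos = []
    if isinstance(data.rsp.photos.photo, list):
        for photo in data.rsp.photos.photo:
            photos.append(_parse_photo(photo))
    else:
        # a lone <photo> element is returned as a single object, not a list
        photos = [_parse_photo(data.rsp.photos.photo)]
    return photos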
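The isinstance() test above is there because flickr.py’s XML wrapper yields a single object instead of a list when only one photo comes back. With the standard library’s ElementTree, findall() always returns a list, so that special case disappears. Here is a Python 3 sketch turning a getRecent XML answer into plain dictionaries; `parse_recent` is a hypothetical name, the attribute names follow Flickr’s documented response and the sample values are made up.

```python
import xml.etree.ElementTree as ET


def parse_recent(xml_text):
    """Return a list of dicts, one per <photo> in a getRecent response."""
    rsp = ET.fromstring(xml_text)
    photos = rsp.find('photos')
    if photos is None:  # e.g. an error answer with no <photos> element
        return []
    # findall() returns a list even for zero or one match
    return [dict(p.attrib) for p in photos.findall('photo')]


sample = ('<rsp stat="ok"><photos page="1" pages="1" perpage="1" total="1">'
          '<photo id="123" owner="1@N01" secret="abc" server="7" farm="1" '
          'title="test" ispublic="1" isfriend="0" isfamily="0"/>'
          '</photos></rsp>')
print(parse_recent(sample)[0]['id'])  # → 123
```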

Now that I have Python, an EXIF wrapper and a Flickr wrapper with a getRecent method, I can write a small script that fetches the 10 most recent images from Flickr and displays their camera make and model, if available (flickrCameraQuantifier.py):

#!/usr/bin/python
# flickrCameraQuantifier.py (Python 2): print the camera make and model
# of the 10 most recent photos uploaded on Flickr
import urllib2

import EXIF
import flickr

# arguments: extras, per_page, page
recentimgs = flickr.photos_getrecent('', '10', '1')

imgurls = []
for img in recentimgs:
    try:
        imgurls.append(str(img.getURL(size='Original', urlType='source')))
    except Exception as e:
        print 'Error while getting an image URL: %s' % e

for imgurl in imgurls:
    # download the whole image as binary data
    try:
        imgstream = urllib2.urlopen(imgurl)
        data = imgstream.read()
        imgstream.close()
    except urllib2.URLError as e:
        print 'Error while downloading %s: %s' % (imgurl, e)
        continue
    # save the image
    f = open('tmp.jpg', 'wb')
    f.write(data)
    f.close()
    # get the tags
    f = open('tmp.jpg', 'rb')
    try:
        tags = EXIF.process_file(f)
    except Exception as e:
        print 'Error while getting tags from an image: %s' % e
        continue
    finally:
        f.close()
    # a missing tag is treated like an empty one
    make = str(tags.get('Image Make', '')).strip()
    model = str(tags.get('Image Model', '')).strip()
    if make and model:
        print "Image Make: %s - Image Model: %s" % (make, model)
    elif make:
        print "Image Make: %s" % make
    else:
        print "No Image Make or Model available"

print "Done!"

Out of 10 images, it usually finds 7 to 9 camera models. I haven’t checked yet whether the failures come from my script or from images submitted without EXIF tags. EXIF tag detection is a bit slow (imho), but it’s OK. And it’s a “one-shot” script: once it has finished its work, nothing is kept. So the next step is to store the details found in a flat file or a database.
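As a sketch of the database option, Python’s built-in sqlite3 module is enough. The table layout and the names `open_store`, `record` and `counts_by_model` are my own assumptions, not part of the script above; the example uses an in-memory database and dummy values so it runs without leaving files behind (Python 3).

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path='cameras.db'):
    """Open (or create) the sample database; ':memory:' works for tests."""
    conn = sqlite3.connect(path)
    conn.execute('CREATE TABLE IF NOT EXISTS samples ('
                 ' taken_at TEXT NOT NULL,'   # UTC time of the sample
                 ' photo_id TEXT,'            # Flickr photo id, if known
                 ' make TEXT, model TEXT)')   # NULL when the tag is missing
    return conn


def record(conn, photo_id, make, model, when=None):
    """Store one sampled photo; empty strings are stored as NULL."""
    when = when or datetime.now(timezone.utc)
    conn.execute('INSERT INTO samples VALUES (?, ?, ?, ?)',
                 (when.isoformat(), photo_id, make or None, model or None))
    conn.commit()


def counts_by_model(conn):
    """Return [(make, model, n), ...] sorted by decreasing count."""
    return conn.execute('SELECT make, model, COUNT(*) AS n FROM samples '
                        'GROUP BY make, model '
                        'ORDER BY n DESC, make, model').fetchall()


conn = open_store(':memory:')
record(conn, '1', 'Canon', 'Canon EOS 350D DIGITAL')  # dummy values
record(conn, '2', 'Canon', 'Canon EOS 350D DIGITAL')
record(conn, '3', '', '')                             # photo without EXIF
print(counts_by_model(conn))
```

Keeping photos without EXIF tags as NULL rows, instead of dropping them, also answers the 7-to-9-out-of-10 question: the share of missing tags becomes one more line in the counts.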

I suggest the following method: at a regular interval (say, every 5 minutes), the script retrieves the single most recent photo uploaded to Flickr and stores its camera make and model somewhere. At the end of each day, one can then compute some decent statistics. I prefer sampling one photo at regular intervals rather than 10 photos at one precise moment, because people usually upload their pictures in batches: there is a risk that those 10 photos come from the same person and were taken with the same device.
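Once a day of samples is stored, “decent statistics” can start with the share of each camera make plus a confidence interval, which is exactly what the survey mentioned at the top does not report. Here is a minimal Python 3 sketch using the Wilson score interval for a binomial proportion; that choice of method is mine, not something this post prescribes, and it assumes the samples are independent (the reason for spreading them out in time). The counts in the example are invented.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n == 0:
        raise ValueError('no samples')
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def shares(counts, z=1.96):
    """Map each camera make to (share, low, high), given {make: count}."""
    n = sum(counts.values())
    return {make: (c / n,) + wilson_interval(c, n, z)
            for make, c in counts.items()}


# invented counts for one day of samples
for make, (p, lo, hi) in sorted(shares({'Canon': 50, 'Nikon': 30, 'Other': 20}).items()):
    print('%-6s %5.1f%%  [%5.1f%%, %5.1f%%]' % (make, 100 * p, 100 * lo, 100 * hi))
```

With one sample every 5 minutes, a day yields at most 24 × 60 / 5 = 288 photos, so the intervals will stay a few percentage points wide; pooling several days narrows them.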

"A closed mind about an open world"

Under this title, James Boyle, professor of law at Duke Law School (USA), wrote a comment article in the Financial Times [1]. According to him, we all share a cognitive bias regarding intellectual property and the internet: openness aversion, i.e. the tendency to undervalue the importance and productive power of open systems, open networks and non-proprietary production. With three examples (the internet, free software and Wikipedia), he shows how attitudes toward these “open things” have evolved. In 1991, scholars, businessmen and bureaucrats (and maybe even us) would have scoffed at the internet as a business product. At that time, control and ownership seemed the right way to go.

Attitudes have since changed and many of us love the internet, free software and Wikipedia. But openness aversion is still there, and some people are trying to restrict freedom (think of net neutrality, the DMCA, DADVSI, DRM, TCPA/TPM, etc.).

[1] Boyle J., “A closed mind about an open world”. Financial Times, August 8th, 2006, p. 9.

P.S. By the way, I discovered Prof. Boyle and his articles on his website. I’ll now have plenty of interesting things to read (as if I didn’t already have enough articles and books to read…).

Screen recording software for GNU/Linux

For a long time, I have been looking for video capture software for GNU/Linux. From time to time, I search the web to see whether things have improved in this field. A recent NewsForge article sparked my curiosity once again…

If you accept proprietary formats, you can use vnc2swf: your film will be in Flash format, a proprietary format. Also based on VNC, there is vncrec, which produces its own video format (this one seems to be free and can easily be exported with transcode).

If you don’t use VNC, work with KDE and don’t like proprietary formats, the recently released ScreenKast could be very helpful. This software is written by a Belgian (cocorico!) and records your screen as a video, in any of the formats supported by FFmpeg. This NewsForge article explains a little bit about ScreenKast.

But if you use neither VNC nor KDE, xvidcap can help you. And finally, if you are working with GNOME, Istanbul is a rather new project, but some parts seem to work. If I have time this weekend, I will install it on my Fedora Core 5.

Now I think GNU/Linux has some decent screen recording software 🙂


Release of IPGPhor2Reader

IPGPhor is a device from GE Healthcare (formerly Amersham Biosciences) that performs isoelectric focusing of proteins. Version 2 of the IPGPhor can be connected to any computer via a serial cable. GE Healthcare provides monitoring software but no post-hoc analysis software. IPGPhor 2 Reader fills this gap.

Today, I am releasing “IPGPhor 2 Reader”, which I wrote. Its goal is to parse the log (text) files produced by an IPGPhor experiment and to plot graphs. This software (for MS-Windows, since IPGPhor logs are collected on an MS-Windows computer) is available here.

screenshot IPGPhor2Reader

GooDiff monitors (changes in legal documents of) service providers

GooDiff started a week ago and I haven’t seen many news items or blog posts about it. If I understood correctly, the idea behind GooDiff is to monitor changes in the legal documents of (internet) service providers such as Google or Yahoo!. Indeed, service providers often change their legal documents on the fly, especially critical sections such as privacy, copyright and the like. With GooDiff, consumers and users can now keep track of these changes. Thanks Alexandre!
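I don’t know how GooDiff is implemented, but its core idea (comparing two snapshots of a legal page and showing what changed) can be sketched with Python’s standard difflib module. The function name `policy_diff` and the two policy texts are invented for illustration (Python 3).

```python
import difflib


def policy_diff(old_text, new_text, old_label='policy (old)', new_label='policy (new)'):
    """Return the lines of a unified diff between two versions of a document."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=''))


old = "We never share your data.\nYou may delete your account."
new = "We may share your data with partners.\nYou may delete your account."
print('\n'.join(policy_diff(old, new)))
```

Removed lines start with “-” and added ones with “+”, which is all a reader needs to spot a quietly rewritten privacy clause.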

P.S. Although the name and logo can mislead you (and misled me), the name “GooDiff” does not primarily come from Google. The “Goo” part comes from the grey goo (in science fiction, “goo” means a large mass of replicating nanomachines lacking large-scale structure, which may or may not actually look like a drippy, shapeless mass). I am learning new words every day!

Presentations "Instant Messaging" and "OOo Impress" at the Namur Linux Days 2006

The aim of the Namur Linux Days was to present free applications running on GNU/Linux and available to end users, along with their usability, their state of development and their diversity.

My first presentation was about instant messaging on GNU/Linux (including Jabber!) and you can download it here (this page gathers a range of information, including the presentation in PDF).

First slide of the IM presentation

My second presentation was about OpenOffice.org Impress. This page has more information, as well as the presentation to download.

First slide of the Impress presentation

Why bother with denunciations? Just use free software!

I really had a hard day at work, moving my desk from one room to another and coping with unexpected problems. But I finally found some time to look for a new graphics card for my desktop PC (btw, the OpenGraphics project has released the schematics of its first FPGA). While reading an article on Tom’s Hardware, I saw a Flash animation from the BSA explicitly asking people to report unlicensed software. It was so farcical that I captured the animation and added a small message at the end. You can download the AVI file here (.avi, 2 MB).

first frame of the movie

last frame of the movie

Namur Linux Days 2006: March 18-19th

On the 18th and 19th of March, 2006, the Namur LUG will organise the “Namur Linux Days 2006”. Despite the English title and this post being in English, all the talks will be given in French.

On the 18th (Saturday), there will be two main keynotes: an introduction to Free Software by Maxime Morge and a presentation about intellectual property and free software by Philippe Laurent. Between these two keynotes, there will be many talks about free office, multimedia and internet software for the general public. I will give a talk about OpenOffice.org Impress. The complete schedule is here.

On the 19th, there will be a more “classical” Linux Install Party (LIP). If you intend to attend the LIP, please register.

Some thoughts on the Saturday session at FOSDEM 2006

I went to FOSDEM 2006 on Saturday the 25th (schedule here). This year, I went with my brother Laurent (as usual) and my wife, Nandini. It was her first time at FOSDEM, and also the first time she had seen so many geeks; I am not sure she enjoyed her day…

In the morning, after a short introduction, Richard M. Stallman gave his keynote on software patents. Of course, he was preaching to the converted (i.e. everyone in the room is against software patents). And even if we didn’t learn anything new about what is going on, it is always interesting to hear someone else’s opinion (even if it is the same as ours) and a formal presentation on the subject. Two things turned Nandini against Richard Stallman. At one point, RMS rudely asked someone to “remove this source of noise” (referring to a baby making some noise). Then, during the questions, RMS brusquely rebuked someone trying to ask a question, because he was not speaking loudly enough (from the middle of the audience) and because he “dared” to use the words “Open Source” in front of “Him”. I must say that she is right: we seem to forgive his behaviour easily because we know the character. But, imho, you can be a great man and the father of the GNU project and still be polite.

At the end of this keynote, someone from the FFII (I think it was Hartmut Pilch) took the microphone for a short, 10-minute speech. Unfortunately, a lot of people were leaving the room at that moment and we could not hear much. A sign that very few people were listening to his speech (or could hear it): at one point, he made a small joke (something like “politicians aren’t used to listening to people wearing geek T-shirts, so I am wearing a business suit”, but funnier) and no one laughed!

We skipped the discussion about GPLv3. In the afternoon, we followed the talks about voice-over-IP (VoIP) in the Chavanne room.

We first listened to Jan Janak talking about SIP Express Router (SER, a SIP server). It was a good talk, a bit too technical for me.

Then we listened to Mark Spencer talking about Asterisk, an Open Source PBX (a PBX is a privately owned telephone switch). While the room was quite full for Jan Janak’s talk, there weren’t enough seats for Mark Spencer’s! His talk was slick, full of acronyms I don’t even have a clue about, and full of humorous audio clips from a (hopefully fake) PBX. But it was still accessible to non-technicians.

Finally, we listened to Jean-Marc Valin’s talk about Speex, an Open Source/Free Software, patent-free audio compression format designed for speech. Only about 30-40 of us listened to his great, technical-but-not-too-much talk. From the specificities of human speech to various compression samples, Jean-Marc Valin explained how Speex processes human speech without going into too many technical details (even Nandini understood how Speex works, although she is a molecular biologist and is less interested in computer-related things). With simple audio samples, clear charts and block diagrams, his talk was a good one.

As usual, besides the official talks and tutorials, there were “dev rooms” and stands held by various free software projects (*BSD, Debian, Mozilla Foundation, Fedora, …). We didn’t have much time to look at them this year. I guess you’ll find more information on the web pages dedicated to the dev rooms (or on blogs like Laurent Richard’s, since he co-organised the GNOME room). A last thought? I think the free software scene is slowly evolving because, besides the usual geeky men in T-shirts, I noticed more people in their thirties and forties, and more women, than in previous editions.