Following my previous post where I retrieved EXIF tags from photos posted on Flickr, here is the next step: my script now stores data in a database.
There are plenty of free database wrappers for Python. Although I first thought of using pysqlite (because I am already using SQLite in another project), I decided to use Gadfly, a real SQL relational database system written entirely in Python. It does not need a separate server, it complies with the Python DB-API (which makes it easy to switch to another database system later) and it’s free.
#!/usr/bin/python
# test for Gadfly

import os
import gadfly
import time

DBdir = 'testDB'
DBname = 'testDB'

if os.path.exists(DBdir):
    print 'Database already exists. I will just open it'
    connection = gadfly.gadfly(DBname, DBdir)
    cursor = connection.cursor()
else:
    print 'Database not present. I will create it'
    os.mkdir(DBdir)
    connection = gadfly.gadfly()
    connection.startup(DBname, DBdir)
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE camera (t FLOAT, make VARCHAR, model VARCHAR)")

print 'Add some items'
t = float(time.time())
cmake = 'Nikon'
cmodel = 'D400'
cmd = ("INSERT into camera (t, make, model) VALUES "
       "('" + str(t) + "','" + cmake + "','" + cmodel + "')")
cursor.execute(cmd)

print 'Retrieve all items'
cursor.execute("SELECT * FROM camera")
for x in cursor.fetchall():
    print x

connection.commit()
print 'Done!'
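To illustrate the "easy changes of DB system" point, here is a rough sketch of the same test against SQLite instead of Gadfly. It is not part of my script: it assumes Python 3 and the sqlite3 module from the standard library, and the function names (open_camera_db, add_camera, all_cameras) are made up for the example. The cursor calls are the usual DB-API ones; the '?' placeholders are what sqlite3 expects, and another driver may use a different parameter style.

```python
import os
import sqlite3
import tempfile
import time


def open_camera_db(path):
    """Open the SQLite file at `path`, creating the camera table on first use."""
    is_new = not os.path.exists(path)
    connection = sqlite3.connect(path)
    if is_new:
        cursor = connection.cursor()
        cursor.execute("CREATE TABLE camera (t FLOAT, make VARCHAR, model VARCHAR)")
        connection.commit()
    return connection


def add_camera(connection, make, model, t=None):
    """Insert one row; the placeholders let the driver handle the quoting."""
    if t is None:
        t = time.time()
    cursor = connection.cursor()
    cursor.execute(
        "INSERT INTO camera (t, make, model) VALUES (?, ?, ?)", (t, make, model)
    )
    connection.commit()


def all_cameras(connection):
    """Return every stored row as a list of (t, make, model) tuples."""
    cursor = connection.cursor()
    cursor.execute("SELECT t, make, model FROM camera")
    return cursor.fetchall()


if __name__ == "__main__":
    # Throwaway database file, so the example leaves nothing behind in the cwd.
    db_path = os.path.join(tempfile.mkdtemp(), "testDB.sqlite")
    conn = open_camera_db(db_path)
    add_camera(conn, "Nikon", "D400")
    for row in all_cameras(conn):
        print(row)
    conn.close()
```

Passing the values as a tuple instead of gluing them into the SQL string also avoids trouble with quotes inside a make or model name.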
Regarding the initial project, the script became too long to paste in this post, but you can download it here: flickrCameraQuantifier2.py (5 KB). To run it, you need to have installed the EXIF and Flickr wrappers and the Gadfly DB system. At the beginning of the script, you can define the total number of iterations (sets of queries) you want (variable niterations), the sleep duration between queries (variable sleepduration) and the number of photos to get for each query (variable nphotostoget). Everything is then stored in a Gadfly database (default name: cameraDB). If you want to read what is stored, here is a very basic script: flickrCQ2Reader.py.
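For readers who do not want to download the script, the three settings interact roughly as in the sketch below. This is a simplified illustration (Python 3), not the downloadable script itself: fetch_photos is a hypothetical stand-in for the Flickr and EXIF lookups, and I assume here that one iteration is a single request for nphotostoget photos.

```python
import time


def collect(fetch_photos, niterations, sleepduration, nphotostoget, sleep=time.sleep):
    """Run `niterations` sets of queries and return everything that was fetched.

    `fetch_photos` is a hypothetical callable: it receives the number of photos
    wanted and returns a list of records (for example one dict per photo).
    """
    records = []
    for i in range(niterations):
        records.extend(fetch_photos(nphotostoget))
        if i < niterations - 1:  # no need to wait after the last set of queries
            sleep(sleepduration)
    return records
```

The sleep function is passed as a parameter only so that the loop can be tried without actually waiting.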
For example, I’ve just run 125 queries (with 5 s between each query). I got 88 photos (70.4% of queries), 27 of which had no EXIF tags (30.68% of all the photos). Among the camera makers, Canon has 27%, Fuji 11%, Nikon 18% and Sony 21% of all the photos with EXIF tags at that moment. This is approximately what Flagrant disregard found. I don’t have time anymore, but one could improve the data retrieval script to automate the statistics and their presentation …
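As a starting point for automating these statistics, here is a minimal sketch (Python 3). The helper names percentage and maker_shares are invented for the example, and it assumes the list of makes has already been read from the database, keeping only the photos that have EXIF tags.

```python
from collections import Counter


def percentage(part, whole):
    """Return part/whole as a percentage rounded to two decimals (0.0 if whole is 0)."""
    return round(100.0 * part / whole, 2) if whole else 0.0


def maker_shares(makes):
    """Map each camera maker to its share (in %) of the photos with EXIF tags."""
    counts = Counter(makes)
    total = sum(counts.values())
    return {make: percentage(n, total) for make, n in counts.items()}


# Figures quoted above: 125 queries, 88 photos, 27 of them without EXIF tags.
print(percentage(88, 125))  # 70.4
print(percentage(27, 88))   # 30.68

# Made-up sample of 'make' values, only to show the call.
print(maker_shares(["Canon", "Canon", "Nikon", "Sony"]))
```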
Edit on October 9th, 2006: added the links to the missing scripts