Month: October 2006

Looking for a good free UML2 modelling editor …

I was using Poseidon as a modelling editor for my UML2 diagrams. It was based on Java, so I could run it on both GNU/Linux and MS-Windows. It was not free software, but the Community Edition was free (as in “free beer”) and had all the tools I modestly needed. The only catch: all the diagrams had a string at the bottom stating they were not meant to be used for commercial purposes (for educational purposes, I wrote a small piece of software that removes it).

Today, Gentleware’s boss announced that Poseidon will go away. He said it will be replaced by Apollo for Eclipse and by a new licensing model (renting) starting at 5€ per month. An unregistered version will be available, but it won’t be possible to export, print, save, etc.

First, this shows one of the problems of using free-as-in-free-beer-but-proprietary software (as opposed to really free software): the owner can change the licence, the software’s availability and its usage conditions at any time. Secondly, although I understand the move from a commercial/business point of view (if they need money), I wonder if they are not depriving themselves of a potential user base (Community Edition users who would recommend the paid version in a professional environment).

Anyway, I am now looking for a new, good and free UML2 modelling editor. After a quick search, I’ve found:

  • ArgoUML, a Java-based editor supporting UML1.4, able to import/export Java code (*) (BSD licence)
  • Umbrello UML Modeller, written for KDE only; it can import/generate code from/for Python, Java, C++, … (GPL)
  • BOUML, written in C++ (with Qt); I think it’s the only one in this list that supports UML2; it can generate code for C++, Java (and IDL) (GPL)
  • PyUT, a class diagram editor written in Python that supports UML1.3; it can import/export Python and Java source code and export C++ code (GPL)

I really don’t have time at the moment to test all this software. As soon as I have some time, I’ll give them a try. Meanwhile, if you have other suggestions and/or any experience with one of them, please feel free to post a comment.

(*) Although I am not comfortable with code auto-generation tools, the ability to import/generate code for a programming language is a good indication of the modelling tool’s ability to understand and take into account that language’s specificities. You don’t want Java syntax highlighting when developing a Python application.

RNA-oriented Nobel Prizes

Of this year’s 6 Nobel Prizes, 2 were awarded to people involved in research on RNA. The 2006 Nobel Prize in Medicine was awarded to Andrew Fire and Craig Mello “for their discovery of RNA interference – gene silencing by double-stranded RNA”. And the 2006 Nobel Prize in Chemistry was awarded to Roger Kornberg “for his studies of the molecular basis of eukaryotic transcription”.

RNA interference is a mechanism in which a “double-stranded ribonucleic acid (dsRNA) interferes with the expression of a particular gene”. And transcription is basically the process through which a DNA sequence is copied to produce a complementary RNA.
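
Just for fun, here is what that complementarity looks like in a few lines of Python (a toy example on a made-up sequence, ignoring all the real-world machinery):

# toy transcription: DNA template strand -> complementary mRNA
# (A pairs with U, T with A, G with C and C with G)
complement = {'A': 'U', 'T': 'A', 'G': 'C', 'C': 'G'}
dna_template = 'TACGGATTC'   # made-up sequence
mrna = ''.join([complement[base] for base in dna_template])
print mrna                   # prints AUGCCUAAG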

Some years ago, everyone was talking about genomics, the study of genes. Now people working on RNA win Nobel Prizes. Knowing that DNA is transcribed into RNAs and that some RNAs (mRNAs) are later translated into proteins, I predict that we’ll see a future Nobel Prize in proteomics, the study of proteins. 😉 OK, Fenn, Tanaka and Wüthrich already won a Nobel Prize in 2002 for mass spectrometry (MALDI) and NMR spectroscopy, techniques used, among others, to identify proteins. And Blobel won a Nobel Prize in 1999 for protein targeting.

Playing with Python and Gadfly

Following my previous post where I retrieved EXIF tags from photos posted on Flickr, here is the next step: my script now stores data in a database.

There are a lot of free database wrappers for Python. Although I first thought of using pysqlite (because I am already using SQLite in another project), I decided to use Gadfly, a real SQL relational database system entirely written in Python. It does not need a separate server, it complies with the Python DB-API (allowing easy changes of DB system later) and it’s free.
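
To illustrate that last point, switching later from Gadfly to, say, pysqlite should mostly be a matter of changing the connection code, the DB-API calls that follow staying the same (a sketch, assuming pysqlite 2.x):

# opening a connection with Gadfly ...
import gadfly
connection = gadfly.gadfly('testDB', 'testDB')

# ... versus opening one with pysqlite; the cursor(), execute(),
# fetchall() and commit() calls afterwards are the same
from pysqlite2 import dbapi2 as sqlite
connection = sqlite.connect('testDB.sqlite')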

Using Gadfly is very easy and its tutorial is very comprehensible. Putting it all together, here is an example of creating a database, adding some data and retrieving them (testGadfly.py):

#!/usr/bin/python
# test for Gadfly
import os
import gadfly
import time

DBdir = 'testDB'
DBname = 'testDB'

if os.path.exists(DBdir):
    print 'Database already exists. I will just open it'
    connection = gadfly.gadfly(DBname, DBdir)
    cursor = connection.cursor()
else:
    print 'Database not present. I will create it'
    os.mkdir(DBdir)
    connection = gadfly.gadfly()
    connection.startup(DBname, DBdir)
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE camera (t FLOAT, make VARCHAR, model VARCHAR)")

print 'Add some items'
t = float(time.time())
cmake = 'Nikon'
cmodel = 'D400'
cmd = "INSERT into camera (t, make, model) VALUES\
    ('" + str(t) + "','" + cmake + "','" + cmodel + "')"
cursor.execute(cmd)

print 'Retrieve all items'
cursor.execute("SELECT * FROM camera")
for x in cursor.fetchall():
    print x

connection.commit()  # write the pending inserts to the database files
print 'Done!'
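
Since Gadfly complies with the DB-API, one can also pass the values as qmark-style dynamic parameters instead of building the SQL string by hand (a sketch of the same INSERT, reusing the cursor defined above):

# same INSERT, with DB-API dynamic parameters instead of string
# concatenation (no manual quoting of the values)
cursor.execute("INSERT INTO camera (t, make, model) VALUES (?, ?, ?)",
               (t, cmake, cmodel))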

Regarding the initial project, the script became too long to be pasted in this post, but you can download it here: flickrCameraQuantifier2.py (5 kB). To run it, you need the EXIF and Flickr wrappers and the Gadfly DB system installed. At the beginning of the script, you can define the total number of iterations (sets of queries) you want (variable niterations), the sleep duration between queries (variable sleepduration) and the number of photos to get for each query (variable nphotostoget). Everything will then be stored in a Gadfly database (default name: cameraDB). If you want to read what is stored, here is a very basic script: flickrCQ2Reader.py.
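
Roughly, these three variables drive a loop like the following (a simplified sketch of the script’s structure, not the actual code):

import time

niterations = 125     # total number of sets of queries
sleepduration = 5     # seconds to sleep between queries
nphotostoget = 1      # photos to fetch for each query

for i in range(niterations):
    # fetch the nphotostoget most recent photos, extract their
    # EXIF make/model and INSERT them into the Gadfly database
    time.sleep(sleepduration)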

For example, I’ve just run 125 queries (with 5 s between each query). I got 88 photos (70.4% of the queries), 27 of which had no EXIF tags (30.68% of all the photos). Among all the camera makers, Canon has 27%, Fuji has 11%, Nikon has 18% and Sony has 21% of all the photos with EXIF tags at that moment. This is approximately what Flagrant Disregard found. I don’t have time anymore, but one could improve the data retrieval script in order to automate the statistics and their presentation …
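
As a starting point for that automation, the counting could be delegated to the database itself (a sketch, assuming the database created by the script; Gadfly supports GROUP BY and aggregates such as COUNT):

import gadfly

# the database name is the script default; the directory is
# hypothetical, adapt it to where the script stores its data
connection = gadfly.gadfly('cameraDB', 'cameraDBdir')
cursor = connection.cursor()
cursor.execute("SELECT make, COUNT(*) FROM camera GROUP BY make")
for (make, n) in cursor.fetchall():
    print "%s: %d photos" % (make, n)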

Edit on October 9th, 2006: added the links to the missing scripts

Playing with Python, EXIF tags and Flickr API

Some days ago, I was quite amused by Flagrant Disregard’s Top Digital Cameras: every day, these people took 10,000 photos uploaded on Flickr and looked at the camera makes and models of these photos. This kind of study is interesting because one can see what people are actually using and which camera models can give good results (with a good photographer, of course). I was just disappointed by the fact that they say nothing about their sampling method nor about the statistics they apply to their data. I then thought that I could do a survey like this one and publish the results along with the method.

One more time, I’ll do this with Python. Instead of reading binary data from JPG files to look at EXIF tags, I’ll use external “modules” (wrappers). After a small survey on the web, it seemed that GeneCash’s EXIF.py was the best solution. Indeed, to get the camera make and model of a test image, the code is simply:

import EXIF
f = open('testimage.jpg', 'rb')
tags = EXIF.process_file(f)
print "Image Make: %s - Image Model: %s" % (tags['Image Make'], tags['Image Model'])

Now, to access Flickr’s most recent photos, I had two options:

  1. I open the Flickr most recent photos page and parse the HTML in order to get the photos. This can be done with regular expressions or XML parsing.
  2. I use the Flickr API where there is a specially designed method: flickr.photos.getRecent

I chose the second option and looked at the three kits for Python referenced by Flickr:

  • The FlickrClient author admits his kit is outdated and gives a link to Beej’s Python Flickr API
  • Beej’s Python Flickr API seems interesting, but there isn’t much documentation and, being a beginner in Python, I was quickly lost
  • Finally, James Clarke’s flickr.py seemed to be a nice and easy-to-use wrapper, so I decided to go with it

Unfortunately, the getRecent method isn’t implemented (James Clarke has not maintained this wrapper since 2005). I tried to use the photos_search method (the wrapper for the flickr.photos.search method), hoping that using it without any tag would give me all the most recent photos. But some people probably thought of it before me, because Flickr disabled parameterless searches. Look at the error:

import flickr
z = flickr.photos_search('', False, '', '', '','', '', '', '', '', '2', '', '')

Traceback (most recent call last):
[...]
FlickrError: ERROR [3]: Parameterless searches have been disabled. Please use flickr.photos.getRecent instead.

So, I was forced to implement the getRecent method myself. Fortunately, it wasn’t too difficult. Here is the code you can insert at line 589 in James Clarke’s flickr.py (or download my flickr.py here):

def photos_getrecent(extra='', per_page='', page=''):
    """Returns a list of Photo objects for the most recent photos."""
    method = 'flickr.photos.getRecent'

    data = _doget(method, API_KEY, extra=extra, per_page=per_page, page=page)
    photos = []
    # the API returns a list of photo elements, or a single element
    # when only one photo is present
    if isinstance(data.rsp.photos.photo, list):
        for photo in data.rsp.photos.photo:
            photos.append(_parse_photo(photo))
    else:
        photos = [_parse_photo(data.rsp.photos.photo)]
    return photos

Now that I have Python, an EXIF wrapper and a Flickr wrapper with a getRecent method, I can write a small script that fetches the 10 most recent images from Flickr and displays their camera make and model (if they have one) (flickrCameraQuantifier.py):

#!/usr/bin/python
import urllib2

import EXIF
import flickr

recentimgs = flickr.photos_getrecent('', '10', '1')

imgurls = []
for img in recentimgs:
    try:
        imgurls.append(str(img.getURL(size='Original', urlType='source')))
    except Exception:
        print 'Error while getting an image URL'

for imgurl in imgurls:
    # save the image to a temporary file
    imgstream = urllib2.urlopen(imgurl)
    f = open('tmp.jpg', 'wb')
    f.write(imgstream.read())
    f.close()
    # get the tags back from the file
    f = open('tmp.jpg', 'rb')
    try:
        tags = EXIF.process_file(f)
        # tags.get avoids a KeyError when a tag is missing
        make = str(tags.get('Image Make', ''))
        model = str(tags.get('Image Model', ''))
        if make and model:
            print "Image Make: %s - Image Model: %s" % (make, model)
        elif make:
            print "Image Make: %s" % make
        else:
            print "No Image Make nor Model available"
    except Exception:
        print 'Error while getting tags from an image'
    f.close()

print "Done!"

Out of 10 images, it can usually give 7-9 camera models. I haven’t checked yet whether the errors are due to my script or to the lack of EXIF tags in submitted images. The EXIF tag detection is a bit slow (imho) but it’s OK. And it’s a “one shot” script: once it finishes its work, nothing remains in memory. So, the next step is to use a flat file or a database connection to remember the details found.

I suggest the following method: every 5 minutes, the script retrieves the most recent photo uploaded on Flickr and stores its camera make and model somewhere. Each day, one would then be able to do some decent statistics. I prefer sampling 1 photo every 5 minutes rather than 10 photos at one precise moment, because people usually upload their pictures in batches. There is then a risk that these 10 photos are from the same person and taken with the same device.
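
A minimal sketch of that sampling loop, reusing the photos_getrecent wrapper from above (the storage part, flat file or database, is left out):

#!/usr/bin/python
# sketch: sample the most recent Flickr photo every 5 minutes
import time

import flickr   # the patched wrapper with photos_getrecent

while True:
    try:
        photo = flickr.photos_getrecent('', '1', '1')[0]
        # download the photo, extract its EXIF make and model
        # (as in flickrCameraQuantifier.py) and store them
    except Exception:
        print 'Error while getting the most recent photo'
    time.sleep(5 * 60)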