People love free e-mail services

… at least in my biased population. The data came from a file listing everyone interested in ISAL, which I had to parse (the file, not the people). It’s a tab-delimited file with names, e-mail addresses, locations, interests, etc. (in total, 379 unique e-mail IDs). I used Python for that purpose. Since I had all the e-mail addresses in a Python set (*), I decided to compute some statistics. I know it’s useless, but here are the results: 30.34% Hotmail accounts, 27.18% Yahoo, 7.39% Gmail, 7.12% Rediffmail (a popular e-mail service in India) and “only” 7.65% KULeuven. As we can see, members mainly use free e-mail accounts (probably because the majority of them are students, the “S” in ISAL). And fewer than 10% of members come from KULeuven, although ISAL is a student organisation from Leuven (the “L” in ISAL). Of course, R can produce nice charts. And since its documentation states that pie charts are “a very bad way of displaying information”, I also produced a regular bar plot. ...
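As a rough illustration of that kind of count, here is a minimal sketch that computes the share of each domain in a set of addresses. The function name `domain_shares` and the sample addresses are made up for this example, and it groups by exact domain, which is an assumption: the original figures may have merged several domains of the same provider.

```python
from collections import Counter


def domain_shares(addresses):
    """Return {domain: percentage} for a collection of unique e-mail addresses."""
    # Keep only the part after the last "@", lower-cased, and count it.
    domains = Counter(
        addr.rsplit("@", 1)[1].lower() for addr in addresses if "@" in addr
    )
    total = sum(domains.values())
    # most_common() orders the result from the most to the least frequent domain.
    return {domain: round(100.0 * n / total, 2) for domain, n in domains.most_common()}


# Hypothetical sample data, not the ISAL list.
emails = {"a@hotmail.com", "b@hotmail.com", "c@yahoo.com", "d@gmail.com"}
shares = domain_shares(emails)
```

The resulting dictionary can be written to a file and read from R for the bar plot.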

December 6, 2006 · 2 min · jepoirrier

Plugins for Digital Object Identifier lookup

I’ve just written some “search plugins” for Firefox (1.x and 2.x) that let you quickly look up a specific Digital Object Identifier (DOI). DOIs are increasingly used in the biomedical sciences. One of their interesting features is that they allow direct linking to a scientific article. The plugins are available here. If you already have Firefox 2, installation is very easy: go to the plugins page, click on the small arrow near your Firefox search box and choose the “Add DOI lookup” option; the plugin will then be installed automatically. ...

November 17, 2006 · 1 min · jepoirrier

How to remove files ending with '~'

The vim text editor always produces a file ending with a tilde (~) as a kind of backup of the file being modified (this is the default behaviour). On my MS-Windows machine (Pentium M, 1.73GHz), I was tired of deleting these files manually, so I first used the “Search” option in the File Explorer. After some time, I got tired of waiting for the results, so I wrote a Python script and a batch script to find all these files. They run much faster than the Search GUI. The first time I launched them, they were still slow (but faster than the GUI). As you can see in the graph below, the second time I launched these scripts, they ran at least 10 times faster. I’m not a specialist but I guess it has something to do with caching at the OS level. For the first run, the batch script is 20% slower than the Python script. After that, the Python script is 50% slower than the batch script (but between 3.7s and 5.6s, the difference is not big). ...
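For reference, a search like this can be sketched in a few lines of Python. This is not the script from the post, only a minimal version built on `os.walk`; the name `find_backup_files` is made up for the example.

```python
import os


def find_backup_files(root):
    """Yield the path of every file under root whose name ends with '~'."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith("~"):
                yield os.path.join(dirpath, name)
```

To actually clean up, loop over the result and call `os.remove(path)` on each entry, preferably after printing the list once to check what would be deleted.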

November 15, 2006 · 2 min · jepoirrier

Simple Sitemap.xml builder

In a recent post, Alexandre wrote about web indexing and pointed to a nice tool for webmasters: the sitemap. The Sitemap Protocol “allows you to inform search engines about URLs on your websites that are available for crawling” (since it’s a Google creation, it seems that only Google is using it, according to Alexandre). If you have shell access to your webserver and Python on it, Google has a nice Python script to automatically create your sitemap. ...
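To give an idea of what such a file contains, here is a minimal sketch that builds a sitemap from a list of URLs with the standard library. It is neither Google's script nor the builder from this post. The namespace is the one from the current sitemaps.org 0.9 schema, which may differ from what was in use when this was written, and `build_sitemap` is a name chosen for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 schema (assumed here, see the note above).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs for illustration.
sitemap = build_sitemap(["https://example.org/", "https://example.org/about.html"])
```

Optional elements such as `lastmod` can be added the same way with further `ET.SubElement` calls on each entry.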

November 5, 2006 · 2 min · jepoirrier

Automated Pubmed reference to BibTeX

In biology, we often need to use PubMed, a search engine for biomedical article citations from MEDLINE and other life science journals. In the MS-Windows world, you have nice, proprietary tools (like Reference Manager or Endnote) that retrieve citations from PubMed, store them in a database and allow you to use them in proprietary word processing software (in fact, in MS-Word only, since neither WordPerfect nor OpenOffice.org is supported). If you are using BibTeX (for LaTeX) as your citations repository, there aren’t many tools. The best one, imho, is JabRef, a free reference manager written in Java (for me, the only “problem” is that it adds custom, non-BibTeX tags). Or you can edit the BibTeX file yourself with any text editor. ...

October 22, 2006 · 3 min · jepoirrier

Dasher: where do you want to write today?

Hannah Wallash put her slides about Dasher on the web (much the same as these ones from her mentor). Dasher is an “information-efficient text-entry interface”. What got me interested in Dasher is her introduction about the way we communicate with computers and how they help us to communicate with them. There are keyboards (even reduced ones), gesture alphabets, text entry prediction, etc. I am interested in the ways people can enter text on a touch-screen, without a physical keyboard. Usually, people use a virtual keyboard (as in kiosks for tourists or in handheld devices). But virtual keyboards are apparently not the best solution. ...

October 18, 2006 · 2 min · jepoirrier

Looking for a good free UML2 modelling editor ...

I was using Poseidon as a modelling editor for my UML2 diagrams. It was based on Java and I was able to run it on both GNU/Linux and MS-Windows. It was not free software but the Community Edition was free (as in “free beer”) and had all the tools I modestly needed. The only catch: all the diagrams had a string at the bottom, stating that they were not meant to be used for commercial purposes (for educational purposes, I’ve written a small program that removes it). ...

October 4, 2006 · 2 min · jepoirrier

Playing with Python and Gadfly

Following my previous post where I retrieved EXIF tags from photos posted on Flickr, here is the next step: my script now stores data in a database. There are many free database wrappers for Python. Although I first thought of using pysqlite (because I am already using SQLite in another project), I decided to use Gadfly, a real SQL relational database system entirely written in Python. It does not need a separate server, it complies with the Python DBAPI (allowing easy changes of DB system) and it’s free. ...
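To show what DBAPI-style code looks like, here is a small sketch that stores camera data through a connection and a cursor. Note the swap: it uses the standard `sqlite3` module as a stand-in, not Gadfly, and the table name, column names and sample rows are all invented for the example. SQL dialect details (such as `IF NOT EXISTS` or the `?` placeholder style) are those of SQLite and may need adapting for another DBAPI module.

```python
import sqlite3


def store_exif(conn, rows):
    """Insert (photo_id, make, model) tuples using plain DB-API calls."""
    cur = conn.cursor()
    # Hypothetical schema, for illustration only.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS exif (photo_id TEXT, make TEXT, model TEXT)"
    )
    cur.executemany(
        "INSERT INTO exif (photo_id, make, model) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def count_by_make(conn):
    """Return {make: number of photos} from the exif table."""
    cur = conn.cursor()
    cur.execute("SELECT make, COUNT(*) FROM exif GROUP BY make")
    return dict(cur.fetchall())


# In-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
store_exif(conn, [("1", "Canon", "EOS 350D"), ("2", "Nikon", "D70"), ("3", "Canon", "A520")])
makes = count_by_make(conn)
```

Because only `connect`, `cursor`, `execute`, `executemany`, `fetchall` and `commit` are used, most of the code stays the same when the underlying module changes.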

October 1, 2006 · 3 min · jepoirrier

Playing with Python, EXIF tags and Flickr API

Some days ago, I was quite amused by Flagrant Disregard Top Digital Cameras: every day, these people take 10,000 photos uploaded to Flickr and look at the camera makes and models behind them. This kind of study is interesting because one can see what people are actually using and which camera models can give good results (with a good photographer, of course). I was just disappointed that they say nothing about their sampling method or the statistics they apply to their data. I then thought that I could do a similar survey and publish the results along with the method. ...

October 1, 2006 · 4 min · jepoirrier

Stream redirection in Python

Two computers are on the same network. A firewall separates the internet from the intranet. Both computers can access everything on the intranet but only one of them is allowed to access the internet. The problem is to listen to a music stream from the computer that cannot access the internet. A possible solution is to run stream redirection software on the computer that can access the internet. The computer that cannot access the internet can then get the stream from the intranet (figure below). ...
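The idea can be sketched as a plain TCP relay running on the computer that has internet access. This is only an illustration under assumptions, not the script from the post: it forwards raw bytes in both directions without looking at the stream protocol, and the names `forward`, `bridge` and `relay` are invented for the example.

```python
import socket
import threading


def forward(src, dst):
    """Copy bytes from src to dst until src reports the end of the stream."""
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                break
            dst.sendall(chunk)
    except OSError:
        pass  # one side went away; stop forwarding
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream on
        except OSError:
            pass


def bridge(client, upstream_addr):
    """Connect one intranet client to the upstream server and relay both ways."""
    with client, socket.create_connection(upstream_addr) as upstream:
        back = threading.Thread(target=forward, args=(upstream, client), daemon=True)
        back.start()
        forward(client, upstream)
        back.join()


def relay(listener, upstream_addr):
    """Accept clients on a listening socket and bridge each one in its own thread."""
    while True:
        client, _ = listener.accept()
        threading.Thread(target=bridge, args=(client, upstream_addr), daemon=True).start()
```

On the gateway machine, this would be started with something like `relay(socket.create_server(("0.0.0.0", 8000)), ("stream.example.org", 80))`, where the port and the host name are placeholders. The other computer then points its player at the gateway's intranet address and port.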

September 6, 2006 · 2 min · jepoirrier