How to remove files ending with '~'

The vim text editor always produces a file ending with a tilde (~) as a kind of backup of the file you are modifying (this is the default behaviour). On my MS-Windows machine (Pentium M, 1.73GHz), I was tired of deleting these files manually, so I first used the “Search” option in the File Explorer. After some time, I got tired of waiting for the results.

So I wrote a Python script and a batch script to find all these files. They run much faster than the Search GUI. The first time I launched them, they were still slow (but faster than the GUI). As you can see in the graph below, the second time I launched these scripts, they ran at least 10 times faster. I’m not a specialist but I guess it has something to do with caching at the OS level. On the first run, the batch script is 20% slower than the Python script. After that, the Python script is 50% slower than the batch script (but between 3.7s and 5.6s, the difference is not big).

Comparison of .bat and .py files

Here are the scripts: find files ending with ~ in Batch (the problem is that you have to compute the duration yourself), find files ending with ~ in Python and remove files ending with ~ in Python (all scripts are 1 kB).

Each of these scripts was run as the first application after my computer was turned on. I didn’t repeat the measurements (doing real statistics wasn’t the goal anyway). Deleting all the files (after having found them) took 5.4s. It just goes to show what you can do right before the beginning of a lab seminar.
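For the curious, here is a minimal sketch of what such a find/remove script can look like with Python’s os.walk; the starting directory, the DELETE switch and the timing are my own placeholders, not the original 1 kB scripts:

# Minimal sketch: walk a directory tree, list every file ending with '~'
# and optionally delete it. ROOT and DELETE are placeholders to adapt.
import os
import time

ROOT = "C:\\"        # hypothetical starting directory
DELETE = False       # set to True to actually remove the files

start = time.time()
found = []
for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        if name.endswith("~"):
            path = os.path.join(dirpath, name)
            found.append(path)
            if DELETE:
                os.remove(path)

print("%d files ending with '~' found in %.1fs" % (len(found), time.time() - start))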

OSS/FS players about GPL Java

Sun opened Java in the most elegant way possible (imho): the licence is the GPL. This move was analysed and commented on by many people. Even some important Open Source/Free Software players gave their comments on a Sun website. Unfortunately, their comments are only available in a proprietary video format.

You can now have access to audio recordings of these interviews (Brian Behlendorf, Paul Cormier, Eben Moglen, Tim O’Reilly, Mark Shuttleworth, Richard Stallman and Dr. Marcelo K. Zuffo), to a text transcript and even to SHA1 sums of the audio files!

Double quotes!

GGGRRrrrrrrr … I was quietly using R to analyse my data when, suddenly, I wasn’t able to open the file containing these data anymore. It’s just a plain text file! How can it be corrupted? Here is the error message:

t <- read.table('ratsdata.csv', header=TRUE, sep=",")
Warning message:
incomplete final line found by readTableHeader on 'ratsdata.csv'

For hours, I tried everything: I counted the number of separators on each line, I counted the number of decimal points on each line, I removed double quotes around factors, I examined the final line in detail, etc. (well, Python scripts did the job for me because my file already has more than 700 lines). Finally, the solution was so dumb: I had mistakenly deleted one double quote before a header. My first line looked like (1) when it should have looked like (2):

1: "id", "group", "trial", "durTot", durExt" [...]
2: "id", "group", "trial", "durTot", "durExt" [...]

Ok, next time, I’ll add this check to my list …
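For the record, here is a minimal sketch of such a check (not the scripts I actually used), assuming comma separators, double-quoted factors and no commas inside fields: it flags any line with an odd number of double quotes or an unexpected number of fields.

# Sketch of a sanity check for the CSV file: report lines with unbalanced
# double quotes or a field count different from the header's.
expected_fields = None
with open("ratsdata.csv") as f:
    for lineno, line in enumerate(f, start=1):
        if line.count('"') % 2 != 0:
            print("line %d: unbalanced double quote" % lineno)
        nfields = line.count(",") + 1
        if expected_fields is None:
            expected_fields = nfields      # the header sets the reference
        elif nfields != expected_fields:
            print("line %d: %d fields instead of %d"
                  % (lineno, nfields, expected_fields))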

White & Nerdy

It’s Sunday, let’s rest a little bit … I really liked Al Yankovic’s video “White & Nerdy”. To fully understand it, you need some basic technical background and a friend who looks like the white & nerdy guy in the video. Because, of course, you are not like him 😉

“Don’t Download This Song” is also great (some background about DRM helps).

“It’s All About the Pentiums” is not my music style but some of the lyrics are good.

If you don’t like Flash movies, you can use the Firefox OOk video plug-in to obtain the FLV file and then play it with VLC or convert it with ffmpeg, for example.

Simple Sitemap.xml builder

In a recent post, Alexandre wrote about web indexing and pointed to a nice tool for webmasters: the sitemap. The Sitemap Protocol “allows you to inform search engines about URLs on your websites that are available for crawling” (since it’s a Google creation, it seems that only Google is using it, according to Alexandre).

If you have shell access to your webserver and Python on it, Google has a nice Python script to automatically create your sitemap.

I don’t have any shell access to my webserver 😦 But I can write a simple Python script 🙂 Here it is: sitemapbuilder.py, 4 kB. After specifying the local directory where all your files are and the base URL of your on-line website (yes, you need to edit the script), launch the script and voilà! You can now upload your sitemap.xml file and tell web crawlers where to find the information on your website.

Interested in the other options you can specify?

  • You can specify an array of accepted file extensions. By default, I’ve put ‘htm’, ‘html’ and ‘php’ but you can add ‘pdf’ if you want.
  • You can specify an array of filenames to strip. By default, I strip all ‘index.*’ (with * = one of the accepted extensions) because http://www.poirrier.be is the same as http://www.poirrier.be/index.html but easier to remember.
  • You can specify a change frequency (it will be the same for all files).
  • You can specify a priority (it will also be the same for all files, and even omitted if it equals the default value).

On the technical side, there is nothing fancy (I don’t even use any XML tool to generate the file). I was impressed by how easy it is to walk through a directory tree with Python’s os.walk function (it could be interesting to use it in the future blog system I mentioned earlier).
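To give an idea, here is a minimal sketch of the approach (not sitemapbuilder.py itself), using os.walk and the sitemaps.org schema; the local path, base URL and option values below are placeholders you would edit:

# Sketch of a sitemap builder: walk a local copy of the website and write
# a sitemap.xml with one <url> entry per accepted file.
import os

LOCAL_ROOT = "/path/to/local/site"      # local directory with all your files
BASE_URL = "http://www.example.org/"    # base URL of the on-line website
EXTENSIONS = ("htm", "html", "php")     # accepted file extensions
STRIP = tuple("index." + e for e in EXTENSIONS)  # index.* -> directory URL
CHANGEFREQ = "monthly"                  # same change frequency for all files

urls = []
for dirpath, dirnames, filenames in os.walk(LOCAL_ROOT):
    for name in filenames:
        if name.rsplit(".", 1)[-1] not in EXTENSIONS:
            continue
        rel = os.path.relpath(os.path.join(dirpath, name), LOCAL_ROOT)
        rel = rel.replace(os.sep, "/")
        if name in STRIP:               # strip index.* filenames
            rel = rel[:-len(name)]
        urls.append(BASE_URL + rel)

with open("sitemap.xml", "w") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        out.write("  <url>\n    <loc>%s</loc>\n    <changefreq>%s</changefreq>\n  </url>\n"
                  % (url, CHANGEFREQ))
    out.write("</urlset>\n")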

Finally, you can see the sitemap.xml generated for my family website.

Edit on November 17th: it seems that Google, Yahoo! and Microsoft teamed up to release a “more official” sitemaps specification: http://www.sitemaps.org/.

Search for images by sketching

On his blog, Laurent wanted to know who this guy is. I thought it was an interesting starting point to see how good Retrievr is, “an experimental service which lets you search and explore in a selection of Flickr images by drawing a rough sketch”.

Although my drawing skills really need to be improved (and their drawing tools could be more refined – always blame the others for your weaknesses 😉 ), a first sketch gives some interesting results (see screenshot below): 7 retrieved photos (44%) show a b/w human face in “frontal view” (if you count the dog, it’s even 8 correct images).


If I just give the photo URL, the results are not as good (see screenshot below). I am nearly 100% sure that it’s because it’s a greyscale photo scanned as a color image.


When I save the image on my hard disk, convert it to a greyscale image and upload it to Retrievr, it gives results closer to what I expected: 10 images show a person with his/her hair (63%), either from the front or from the back.


So this photo is not among the “most interesting” ones on Flickr (it is probably not even on Flickr). I suppose that if one applied Retrievr to a larger subset of photos, the probability of finding it would be higher (but it would also increase the noise, i.e. the number of similar photos). If you like playing with Flickr, other interesting Flickr mashups can be found here.

Automated Pubmed reference to BibTeX

In biology, we often need to use PubMed, a search engine for biomedical articles with citations from MEDLINE and other life science journals.

In the MS-Windows world, you have nice, proprietary tools (like Reference Manager or EndNote) that retrieve citations from PubMed, store them in a database and let you use them in proprietary word processing software (in fact, in MS-Word only, since neither WordPerfect nor OpenOffice.org is supported). If you are using BibTeX (for LaTeX) as your citation repository, there aren’t many tools. The best one, imho, is JabRef, a free reference manager written in Java (for me, the only “problem” is that it adds custom, non-BibTeX tags). Or you can edit the BibTeX file yourself with any text editor.

The problem with manual editing is that it is error-prone (even when copying/pasting from the web). Since Python programming is my hobby horse at the moment, there are two solutions to this problem:

  1. Use Biopython to get a reference from PubMed but are you ready to have a huge module dependency just to use 1 function?
  2. Write your own Python script, using a PubMed URL to download your reference and a little bit of XML parsing to extract the relevant info (one can use the ESearch and EFetch tools but my lazy nature tells me to simply use the URL).

Obviously, I chose to write my own Python script. Each reference, taken from this PubMed XML format example (full DTDs), should be converted into something like this:

@article{poirrier06,
  author = {Poirrier, J.E. and Poirrier, L. and Leprince, P. and
Maquet, P.},
  title = {Gemvid, an open source, modular, automated activity
recording system for rats using digital video},
  year = 2006,
  journal = {Journal of circadian rhythms},
  volume = 4,
  pages = {10},
  pmid = 16934136,
  doi = {10.1186/1740-3391-4-10}
}

The script is here (4 kB). First, use PubMed to check the reference you want, then take its PubMed ID (PMID) and launch the program, appending the output to your BibTeX file, for example:

./pyP2B.py 16934136 >> myrefs.bib

If you like, you can edit the script to change the tab size (here = 2).

How does it work?

  1. With PubMed, I do not use the official tools but a plain HTTP query. It is much simpler. The script requests the citation for the given PMID. Since it gets an HTTP answer, I need to parse this answer and replace a few character entities to obtain a valid XML file.
  2. Once I have the XML file and after some checking, I use XPaths from lxml (for me, XPaths are quick and dirty compared to writing a DOM/SAX parser, but they work!).
  3. Then the script simply prints the result to the standard output (even if it is an error; a possible improvement would be to print errors to the error output). You simply need to redirect this output into your BibTeX file (see the sketch below).
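If you are curious about the plumbing, here is a rough sketch of these three steps (not the actual pyP2B.py): fetch the PubMed XML for one PMID over HTTP and pull out the fields with lxml XPath expressions. The EFetch URL and the element names are assumptions based on NCBI’s E-utilities and the PubMed DTD, and error handling is left out.

# Sketch: PMID -> BibTeX entry via an HTTP query and lxml XPaths.
import sys
import urllib.request
from lxml import etree

def pubmed_to_bibtex(pmid):
    url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
           "?db=pubmed&retmode=xml&id=%s" % pmid)
    root = etree.fromstring(urllib.request.urlopen(url).read())

    def first(xpath):
        hits = root.xpath(xpath)
        return hits[0] if hits else ""

    # Authors formatted as "Lastname, I.N." and joined with " and "
    authors = []
    for author in root.xpath("//AuthorList/Author"):
        last = author.findtext("LastName") or ""
        initials = ".".join(author.findtext("Initials") or "")
        authors.append("%s, %s." % (last, initials))

    year = first("//JournalIssue/PubDate/Year/text()")
    # Citation key like "poirrier06" (a guess at the convention used above)
    key = (authors[0].split(",")[0].lower() if authors else "ref") + year[-2:]
    return ("@article{%s,\n"
            "  author = {%s},\n"
            "  title = {%s},\n"
            "  year = %s,\n"
            "  journal = {%s},\n"
            "  volume = %s,\n"
            "  pages = {%s},\n"
            "  pmid = %s,\n"
            "  doi = {%s}\n"
            "}" % (key, " and ".join(authors),
                   first("//ArticleTitle/text()").rstrip("."),
                   year,
                   first("//Journal/Title/text()"),
                   first("//JournalIssue/Volume/text()"),
                   first("//Pagination/MedlinePgn/text()"),
                   pmid,
                   first("//ArticleId[@IdType='doi']/text()")))

if __name__ == "__main__":
    # usage: ./sketch.py 16934136 >> myrefs.bib
    print(pubmed_to_bibtex(sys.argv[1]))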

Edit on October 23rd: this script has errors when dealing with non-ASCII characters like “ö” in Angelika Görg. I won’t fix it for the moment.

Diwali 2006 @ ISAL

On Saturday, after the Kolam ritual, we went to Diwali, the Hindu Festival of Lights, organized by the ISAL. It was very nice to meet people we had already met at previous ISAL “functions” and to talk with them. And I think that ISAL is attracting more and more people, both of Indian origin (working in Belgium, for example) and of non-Indian origin: this time, people from Belgium, China, Poland, Russia, Spain, etc. were there. As usual, I took some photos.