Nothing new on the Open Access front

Peter Murray-Rust (University of Cambridge) discovered that he cannot access his own article, although he paid for Open Access publication in an Oxford University Press journal. This caused some discussion on /. but, as usual, it’s better to first have a look at Peter Suber’s blog to get an objective view on this.

Why did Sun choose Derby?

I’m wondering why Sun chose Derby for its JavaDB …

I used JavaDB on a project and my main reason was that it’s embedded in the latest Java Runtime Environment (JRE). But I saw a clear degradation of performance (my main criterion is speed) when I had to access the embedded database. And it became worse when I ran my project from a CD-ROM (because it has to be distributed).

So I decided to run a small, rough test and compare JavaDB with two other free Java database engines: H2 and HSQLDB. And the results are astonishing: JavaDB seems to be the slowest, hence the worst choice (except for the license). Here are the results (click to show the normal size graphs):

On the graph, you can see the duration in ns (nanoseconds) versus database activity steps (complete protocol below; a second graph showing the step durations in logarithmic scale is available here). JavaDB took nearly 7 s to create the database while all the steps are performed in less than 0.5 s for all the other engines! JavaDB was also the engine that took the most space on the hard disk (1.63 MB; H2 took 384 kB and HSQLDB 5 kB).

Of course, this test is a bit subjective since I’m not using any tuning (for any engine) and use sequential retrievals (SELECT * FROM tablename). For the moment I don’t care too much about random retrievals since I don’t need them. Anyway, I’ll quickly switch my application database engine!

Protocol. All tests were done on an Intel Pentium 4 at 2.4 GHz with 512 MB of RAM running MS-Windows XP Pro and Java 6.0. The table contains three fields: an ID, a string (“title”) and an integer (the same int as the ID). The initialisation step only creates an object that will handle the database. The step “creation” creates the embedded database and table (i.e. the files on the disk). The insertion step … inserts 100 entries in the table and the retrieval step … retrieves and prints these entries on the command line. One test successively creates a database with the three engines. In order to minimize “memory effects”, each test is repeated three times and I only take results from the third test. In addition, these three tests are repeated with each database engine in each successive position: for example, graphs include data from tests with H2 in 1st, 2nd and 3rd position. On the graph, each step was tested 11 times (n=11).
If you want to test it yourself, sources are here (3 kB) and the executable jar is here (20 kB).
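As an illustration only, the protocol above can be sketched in a few lines of Python. This is not the Java code I used: it swaps in Python’s built-in sqlite3 module (in memory, so nothing is written to disk) as a stand-in engine, it fetches the rows instead of printing them, and the names `timed`, `run_once` and `benchmark` are made up for the sketch.

```python
import sqlite3
import time


def timed(step):
    """Run step() and return (result, duration in nanoseconds)."""
    start = time.perf_counter_ns()
    result = step()
    return result, time.perf_counter_ns() - start


def run_once():
    """One pass over the four steps; returns ({step name: ns}, rows)."""
    durations = {}

    # Initialisation: only create the object that will handle the database.
    conn, durations["initialisation"] = timed(
        lambda: sqlite3.connect(":memory:"))

    # Creation: create the table (assumption: in memory, no files on disk).
    _, durations["creation"] = timed(lambda: conn.execute(
        "CREATE TABLE entries "
        "(id INTEGER PRIMARY KEY, title TEXT, number INTEGER)"))

    # Insertion: 100 entries, the integer being the same as the ID.
    def insert():
        conn.executemany(
            "INSERT INTO entries VALUES (?, ?, ?)",
            [(i, f"title {i}", i) for i in range(100)])
        conn.commit()

    _, durations["insertion"] = timed(insert)

    # Retrieval: sequential read of the whole table (SELECT *).
    rows, durations["retrieval"] = timed(
        lambda: conn.execute("SELECT * FROM entries").fetchall())

    conn.close()
    return durations, rows


def benchmark(repeats=3):
    """Repeat the whole pass and keep only the last one (the third by default)."""
    result = None
    for _ in range(repeats):
        result = run_once()
    return result


if __name__ == "__main__":
    durations, rows = benchmark()
    for step, ns in durations.items():
        print(f"{step}: {ns} ns")
```

Keeping only the last pass is the part that matters here: the first passes mostly warm things up, so their timings are thrown away.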

Download YouTube videos

There are many websites around that allow you to download videos from YouTube. But it’s not possible to do it directly from YouTube. And you end up with a proprietary Flash video file. Although you can install the Flash plug-in on your computer, there are cases when you don’t want to do so or you are not even able to do so.

YouTube without Flash player

So, for whatever reason, you want a video from YouTube on your computer in a file format suitable for any kind of multimedia viewer? Here is a small (15 lines) bash script to download a YouTube video you like and convert it into standard MPEG format. For that purpose, you’ll need wget (usually, you already have it on your GNU/Linux box) and ffmpeg.

Now, suppose you want to watch a video about the Morris water maze: just look at the URL (http://youtube.com/watch?v=y2kJ2Zw9ZgI) and you’ll see the video ID (y2kJ2Zw9ZgI). Now, copy this ID and choose a proper filename for your file. Simply type “./youtubedownload.sh y2kJ2Zw9ZgI MorrisWaterMazeVideo” and after a few seconds, you’ll get a file called MorrisWaterMazeVideo.mpeg you can watch with the player you want. 🙂
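If you’d rather not copy the ID by hand, it can be pulled out of the URL automatically. Here is a minimal sketch in Python (not part of the bash script; the function name `video_id` is only an example), assuming the ID is always carried by the “v” parameter as in the URL above:

```python
from urllib.parse import parse_qs, urlparse


def video_id(url):
    """Return the value of the "v" query parameter of a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("v")
    if not ids:
        raise ValueError(f"no video ID found in {url!r}")
    return ids[0]


if __name__ == "__main__":
    print(video_id("http://youtube.com/watch?v=y2kJ2Zw9ZgI"))
```

The result can then be given as the first argument to the script.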

Note 1: it doesn’t work with all the files on YouTube but almost all of them
Note 2: Google Video gives the opportunity to directly download videos in mp4 format (which is standardized)

Un-published in Nature (NRSC)

In the last post, I told you one of my photos on Flickr was published in an article from Nature Reports Stem Cells. After some discussions with three friends, I decided to write an e-mail to the journal editors basically stating that, although I enjoyed my photo being shown in their journal, they did not comply with one of the two conditions of the CC-by-sa license (the “Share-Alike” part, more details in the copy of my e-mail). I chose this licence for this photo because it gives other people freedom to use the material, and this freedom stays attached to the work even if it is modified.

The answer quickly came from Matthew Day, database publisher:

From: Matthew Day
To: Jean-Etienne Poirrier
Date: Wed, 22 Aug 2007 20:55:39 +0200
Subject: RE: Photo license issue

Dear Jean-Etienne,

I’m sorry that we have published a derivative of your image without putting the new work under a Creative Commons license. As you adroitly guessed, we cannot publish the derivative work under CC as it contains components from other sources.

I would be willing to discuss with you a way of keeping the image, and your credit, on the NRSC article. However, an additional complexity is that I see that you are posting a PDF of the article on your blog. The cleanest solution for us may be to simply remove the image and PDF from our websites.

If you’d like to discuss this, I could call you tomorrow or Friday if you are free and can email me your contact number.

With best wishes,

Matthew Day,
Database Publisher
Nature Publishing Group

Matthew also removed the image from the website (before my answer). So I removed the PDF of the article (a personal copy but I don’t think I could consider it as a self-archived article) from this blog.

From an initial error, I think the Nature Publishing Group reacted correctly.

Published in Nature!

I was very pleased to see my first publication in Nature ⁽¹⁾, the scientific journal with an impact factor of 26! Well, it’s not really what you can expect (especially if you are one of my two mentors): one of my photos on Flickr, representing a rat eating (or praying?), was chosen to illustrate a summary of UK Academy of Medical Sciences report on animal-human chimeras 🙂

Click on the thumbnail above to see the full screenshot

Here is the article’s full reference: DeWitt, N. “Animal-human chimeras: Summary of UK Academy of Medical Sciences Report” Nature Reports Stem Cells, published online on August 2nd, 2007.

Note that I don’t know if they completely comply with the photo license since the Creative Commons Attribution-ShareAlike 2.0 allows them to re-use the photo and do the modifications, provided they give credit (ok) and distribute the resulting work only under a licence identical to the CC-by-sa. They are not clearly stating to others the licence terms of their new work …

⁽¹⁾ As Jan Schoones wrote in his comment, it’s not published in Nature itself but in Nature Reports Stem Cells, a journal published by the same company as Nature but which does not have an impact factor! (Edited on August 20th)

Some news

Things are better now: Nandini is back, ankle is better (not 100% though), parents-in-law moved into their new home and all the family is fine 🙂

Do your laptop fans produce a lot of noise?

Someone hoped my laptop doesn’t make too much noise after I posted a photo of the Tecra logo on Flickr. The short answer is no, it doesn’t make too much noise. At 10cm from the fan output, I can measure 42dB when the fan is off and 52dB when it’s on.

Besides the fact that I don’t hear that noise when I have my headphones on, this answer was not sufficient for me. I wrote small Python and gnuplot scripts to collect and display temperature, fan status and load (.tar.gz file, 1.3 kB). During the 2 hours after boot, I checked my e-mails, read news on the web and wrote the OPML output in catrss (that’s why load averages increase at the end, when I’m debugging the software). Here are the results (click on an image to see a larger version):

Graph of temperature and load on a Toshiba Tecra S1 for 2 hours after boot

Graph of temperature and fan activity on a Toshiba Tecra S1 for 2 hours after boot (see text for explanation on fan activity)

One strange thing is that the status of both fans (/proc/acpi/fan/FAN0/state and /proc/acpi/fan/FAN1/state) is always off. But, when you ask for the trip points (/proc/acpi/thermal_zone/THZN/trip_points), you see that FAN1 should be active above 45°C and FAN0 should be active above 104°C! Practically, after some observations, I realized that FAN1 is activated if the temperature is equal to or above 50°C and it doesn’t stop until temperature is equal to or below 45°C. That’s the behaviour displayed in green on the second chart.
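This on/off behaviour with two thresholds is a hysteresis. A minimal sketch of the rule I deduced from my observations (this is not the code from my scripts; the function name `fan_states` and its parameters are made up for the example, and the thresholds are the observed ones, not values read from ACPI):

```python
def fan_states(temperatures, on_at=50, off_at=45, running=False):
    """Infer the fan state for each temperature sample (in degrees Celsius).

    The fan starts at or above on_at and only stops at or below off_at;
    between the two thresholds it keeps its previous state.
    """
    states = []
    for temperature in temperatures:
        if temperature >= on_at:
            running = True
        elif temperature <= off_at:
            running = False
        states.append(running)
    return states


if __name__ == "__main__":
    samples = [44, 49, 50, 47, 46, 45, 49]
    print(list(zip(samples, fan_states(samples))))
```

Between the two thresholds the state depends on what happened before: 49 leaves the fan off while the laptop warms up, but 47 leaves it running once it has been switched on.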

OPML output in catrss

A few days ago, I released the first version of catrss, a tool used to concatenate RSS file(s) to standard output. Today, I added OPML output to this tool. Here it is in version 0.2 (.tar.gz file, 16 kB).

OPML is a file format first used in a commercial application. Now it’s widely used for the exchange of links between news aggregators. Because of that, I had to implement it in catrss: it’s a potential format for the output of catrss.
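To give an idea of what such a file looks like, here is a minimal sketch in Python (it is not the catrss code, and the function name `feeds_to_opml` and the feed URL are made up for the example). It assumes the usual layout of a subscription list: a head with a title, and a body with one outline element per feed carrying the text, type and xmlUrl attributes.

```python
import xml.etree.ElementTree as ET


def feeds_to_opml(title, feeds):
    """Build an OPML document from (name, feed URL) pairs; returns a string."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # One outline per feed; no look-and-feel attributes are written.
        ET.SubElement(body, "outline", text=name, type="rss", xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical feed, for illustration only.
    print(feeds_to_opml("My feeds", [("Example", "http://example.org/rss.xml")]))
```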

But I’m not happy with this format. It’s nice, it does its job, it’s already available, … ok. Other people already complained about this format (see here e.g.). Besides the fact that it’s not standardized (it’s difficult to know exactly what software will correctly parse what you put in your outline tags), I dislike the fact that the official specifications include tags that describe what I consider as the look, feel and behaviour of the content (expansionState, vertScrollState and the window*) and not the content itself. Of course, I’ve read that they are optional. Nevertheless, I don’t think it’s a good design decision to handle things like this and mix look/behaviour and content (not in this case).

I’m sorry for people not in computer science but this is — again — another entry about programming and I understand the title and content are a bit strange for them.

Picklist Editor 0.2

I’ve just released version 0.2 of Picklist Editor. Now you have a table of all the proteins on the right of the gel. If you double-click on a cell, you can edit it (note this is not a recommended behaviour). After revalidating the table, your new spot will be included in the gel (and saved to your picklist if you like it). For me, this version is stable and fully functional 🙂

Picklist Editor 0.2 screenshot

The hardware side of Picklist Editor 0.1

This morning, I released Picklist Editor 0.1 with a text introduction … Hmmm … on my photos on Flickr you can see the hardware side of the picking process … (click on pictures to see details).

On the photo on the left, you can see a gel on a low-fluorescent glass plate. This plate partly sits in a tray that firmly holds it when the robot is doing its job. The holes everywhere result from the picking process but there are proteins everywhere and you can’t see them in visible light since they are labelled with fluorescent Cy dyes. You can see two white round stickers on each side of the gel: these are the picking references.

Here, the picker head is in the process of taking a part of the gel with some proteins inside. Exact positions were computed according to the fluorescent images, revealing the proteins. As you can see again: the gel is perfectly transparent for our non-bionic eyes.

Finally, you can see the spot picking robot in action. The picking head moves along two axes thanks to the horizontal bar at the back and the perpendicular arm holding the picking head and camera. On the left, you have a pumping station: in addition to some jazz when the picking head is on the gel, the station is aspirating water through the head in order to help get a plug out of the gel. After that the arm moves to the right of the photo where you have two 96-well plates to collect samples. When the head is above a well, the pumping station is “blowing” water into the head in order to eject the plug into the well. Everything is under control of a computer and software that is on the right, outside of the camera angle.

This is the GE Healthcare “Typhoon 9400” scanner used to scan fluorescent gels. It’s a huge beast but it doesn’t make a lot of noise (well, I don’t want to stay the whole day next to it!). And this unit only has the red and green lasers inside. There is a second (smaller) unit below with only a blue laser source in it! You can see two gels ready to be scanned (the upper door has to be closed before!).

Next is the analysis “workstation” where images of the gels are analyzed (after scanning and before spot picking). The software (DeCyder) helps to create pick lists.

Other photos from my labs can be seen in my laboratory photostream.