Category: Projects

SQLite+JDBC, worse than Derby!

Following a comment from Alexandre on a previous post, I included SQLite in my performance test of database engines running under Java.

What prevented me from using SQLite in the previous test is that it’s not a pure Java database: one has to use a third-party JDBC driver and implementation classes in order to manage this database engine. IMHO, there is another thing I dislike: SQLite does not enforce column data types (and that’s a feature, not a bug), so any column can hold a value of any type, and there are only a handful of storage classes (NULL, INTEGER, REAL, TEXT and BLOB).
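To see what “no enforced data types” means in practice, here is a minimal sketch using Python’s built-in sqlite3 module. It stands in for the JDBC driver only because it needs no extra library, and the table and column names are made up for the example:

```python
import sqlite3


def storage_classes():
    """Return the storage class SQLite picked for each value inserted
    into a column declared INTEGER."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    # The declared type is only an "affinity": SQLite converts a value
    # when it can do so losslessly and otherwise stores it as-is.
    for value in (42, "42", "abc"):
        conn.execute("INSERT INTO t (x) VALUES (?)", (value,))
    rows = conn.execute("SELECT typeof(x) FROM t ORDER BY rowid").fetchall()
    conn.close()
    return [row[0] for row in rows]


print(storage_classes())  # ['integer', 'integer', 'text']
```

The text '42' becomes an integer because the conversion is lossless, while 'abc' is silently kept as text in a column declared INTEGER.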

In order to include SQLite in my test, I had to rely on a third-party library: David Crawshaw’s JDBC driver for SQLite. Some minor adaptations also had to be made to the code (see the SqliteTest class in the source code below). Brandon T. provides a good tutorial on how to use this driver on MS-Windows here.

The result? SQLite’s performance is worse than Derby’s! In this case, the slowest step is the data insertion: more than 11s for 100 insertions (see graph below). The database creation step is also slower than with H2 or HSQL (but much faster than with Derby). If you compare the whole process (initialisation + creation + insertion + retrieval), SQLite (with its JDBC driver) is the worst database engine. The only good point is that it creates a single file (the other engines create at least 2 files) and this file is only 5kB.


Graph: step durations for the four database engines (a version with a logarithmic y-axis is available here).
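One plausible explanation for the slow insertions, which I haven’t verified with the JDBC driver used here, is that each INSERT ran in its own auto-committed transaction, and SQLite commits (and waits for the disk) every time. The sketch below uses Python’s sqlite3 module as a stand-in for the JDBC driver (file and table names are arbitrary) and times 100 insertions both ways, so the difference can be checked on your own machine:

```python
import os
import sqlite3
import tempfile
import time


def time_inserts(path, n=100, single_transaction=False):
    """Insert n rows into a fresh table; return (elapsed ns, row count)."""
    # isolation_level=None puts the connection in autocommit mode: outside
    # an explicit BEGIN, every INSERT is committed on its own.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, num INTEGER)")
    start = time.perf_counter_ns()
    if single_transaction:
        conn.execute("BEGIN")
    for i in range(n):
        conn.execute("INSERT INTO items VALUES (?, ?, ?)", (i, f"title {i}", i))
    if single_transaction:
        conn.execute("COMMIT")  # a single commit for all n rows
    elapsed = time.perf_counter_ns() - start
    count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for batched in (False, True):
            db = os.path.join(tmp, f"batched_{batched}.db")
            ns, count = time_inserts(db, single_transaction=batched)
            print(f"single_transaction={batched}: {count} rows, {ns / 1e6:.1f} ms")
```

With JDBC, the equivalent would be calling setAutoCommit(false) on the Connection before the insertions and commit() after them.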

Protocol. Same as in the previous post, except that now I have 4 engines to rotate. The source code is here (4kB) and the jar file and libraries are here (5MB).
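The post doesn’t say exactly how the test orders were chosen. Cyclic rotation is one simple scheme (an assumption on my part) that puts every engine in every position exactly once; here it is with the four engines from this test:

```python
def rotations(engines):
    """Return every cyclic rotation of `engines`, so that each engine
    occupies each position in exactly one test run."""
    return [engines[i:] + engines[:i] for i in range(len(engines))]


for order in rotations(["SQLite", "Derby", "H2", "HSQL"]):
    print(" -> ".join(order))
# SQLite -> Derby -> H2 -> HSQL
# Derby -> H2 -> HSQL -> SQLite
# H2 -> HSQL -> SQLite -> Derby
# HSQL -> SQLite -> Derby -> H2
```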

Why did Sun choose Derby?

I’m wondering why Sun chose Derby for its JavaDB.

I used JavaDB on a project, and my main reason was that it ships with Sun’s latest Java Development Kit (JDK 6). But I saw a clear performance degradation (my main criterion is speed) when I had to access the embedded database. And it became worse when I ran my project from a CD-ROM (it has to be distributed on CD-ROM).

So I decided to run a small, rough test comparing JavaDB with two other free Java database engines: H2 and HSQLDB. The results are astonishing: JavaDB seems to be the slowest, hence the worst choice (except for the license). Here is the comparison:

Graph: speed comparison of three Java database engines (JavaDB, H2 and HSQL)

The graph shows the duration in ns (nanoseconds) of each database activity step (the complete protocol is below; a second graph showing the step durations on a logarithmic scale is available here). JavaDB took nearly 7s to create the database, whereas every step took less than 0.5s with the other engines! JavaDB was also the engine that took the most space on the hard disk (1.63MB; H2 took 384kB and HSQL 5kB).

Of course, this test is a bit subjective since I’m not using any tuning (for any engine) and only use sequential retrievals (SELECT * FROM tablename). For the moment I don’t care much about random retrievals since I don’t need them. Anyway, I’ll soon switch my application’s database engine!

Protocol. All tests were done on an Intel Pentium 4 at 2.4GHz with 512MB of RAM running MS-Windows XP Pro and Java 6.0. The table contains three fields: an ID, a string (“title”) and an integer (the same int as the ID). The initialisation step only creates an object that will handle the database. The “creation” step creates the embedded database and table (i.e. the files on the disk). The insertion step … inserts 100 entries in the table, and the retrieval step … retrieves and prints these entries on the command line. One test successively creates a database with each of the three engines. In order to minimize “memory effects”, each step is repeated three times and I only keep the results of the third run. In addition, these tests are repeated with each database engine in every successive position: for example, the graphs include data from tests with H2 in 1st, 2nd and 3rd position. On the graph, each step was tested 11 times (n=11).
If you want to test it yourself, the sources are here (3kB) and the executable jar is here (20kB).
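For readers who don’t want to download the jar, here is a minimal sketch of the same four-step protocol. It uses SQLite through Python’s sqlite3 module as the only engine (the real test used Java engines), and DbHandler is a hypothetical stand-in for the object created at initialisation. I also read “each step is repeated three times” as running the whole sequence three times on fresh files and keeping the third, which is an assumption:

```python
import os
import sqlite3
import tempfile
import time


class DbHandler:
    """Hypothetical object built during the initialisation step."""

    def __init__(self, path):
        self.path = path
        self.conn = None  # no file is touched yet


def run_protocol(path, n=100, echo=True):
    """Time the four protocol steps once; return ({step: ns}, rows read)."""
    timings = {}

    start = time.perf_counter_ns()
    handler = DbHandler(path)
    timings["initialisation"] = time.perf_counter_ns() - start

    start = time.perf_counter_ns()
    handler.conn = sqlite3.connect(path)  # creates the database file
    handler.conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, num INTEGER)")
    handler.conn.commit()
    timings["creation"] = time.perf_counter_ns() - start

    start = time.perf_counter_ns()
    for i in range(n):
        # The integer field holds the same value as the ID.
        handler.conn.execute(
            "INSERT INTO items VALUES (?, ?, ?)", (i, f"title {i}", i))
    handler.conn.commit()
    timings["insertion"] = time.perf_counter_ns() - start

    start = time.perf_counter_ns()
    rows = handler.conn.execute("SELECT * FROM items").fetchall()  # sequential read
    if echo:
        for row in rows:
            print(row)
    timings["retrieval"] = time.perf_counter_ns() - start

    handler.conn.close()
    return timings, len(rows)


def third_run(n=100, echo=True):
    """Run the protocol three times on fresh files and keep only the third."""
    with tempfile.TemporaryDirectory() as tmp:
        runs = [run_protocol(os.path.join(tmp, f"run{k}.db"), n, echo)
                for k in range(3)]
    return runs[-1]


if __name__ == "__main__":
    timings, count = third_run(echo=False)
    for step, ns in timings.items():
        print(f"{step:>15}: {ns} ns")
```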

Download YouTube videos

There are many websites that let you download videos from YouTube (it’s not possible to do it directly from YouTube), but you end up with a proprietary Flash video file. Although you can install the Flash plug-in on your computer, there are cases where you don’t want to, or simply can’t.

YouTube without Flash player

So, for whatever reason, you want a video from YouTube on your computer in a file format suitable for any kind of multimedia player? Here is a small (15-line) bash script to download a YouTube video you like and convert it to standard MPEG format. For that purpose, you’ll need wget (usually already installed on your GNU/Linux box) and ffmpeg.

Now, suppose you want to watch a video about the Morris water maze: just look at the URL (http://youtube.com/watch?v=y2kJ2Zw9ZgI) and you’ll see the video ID (y2kJ2Zw9ZgI). Copy this ID and choose a proper name for your file. Simply type “./youtubedownload.sh y2kJ2Zw9ZgI MorrisWaterMazeVideo” and after a few seconds, you’ll get a file called MorrisWaterMazeVideo.mpeg that you can watch with any player you want. 🙂

Note 1: it doesn’t work with every video on YouTube, but it does with almost all of them.
Note 2: Google Video lets you directly download videos in MP4 format (which is standardized).
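The script itself is in bash, but two of its steps are easy to sketch in Python: pulling the video ID out of a watch URL, and building the ffmpeg conversion call. The post doesn’t show which URL the script downloads from, so that step is left out, and the intermediate name `<ID>.flv` is a hypothetical choice:

```python
from urllib.parse import parse_qs, urlparse


def video_id(watch_url):
    """Extract the video ID (the v= parameter) from a YouTube watch URL."""
    return parse_qs(urlparse(watch_url).query)["v"][0]


def conversion_command(vid, basename):
    """Build the ffmpeg call turning a downloaded Flash video into MPEG.

    The input name <ID>.flv is hypothetical; ffmpeg picks the output
    format from the .mpeg extension.
    """
    return ["ffmpeg", "-i", f"{vid}.flv", f"{basename}.mpeg"]


vid = video_id("http://youtube.com/watch?v=y2kJ2Zw9ZgI")
print(vid)  # y2kJ2Zw9ZgI
print(" ".join(conversion_command(vid, "MorrisWaterMazeVideo")))
# ffmpeg -i y2kJ2Zw9ZgI.flv MorrisWaterMazeVideo.mpeg
```

Once the .flv file exists, the list can be passed as-is to subprocess.run(cmd, check=True).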

Do your laptop fans produce a lot of noise?

Someone hoped my laptop didn’t make too much noise after I posted a photo of the Tecra logo on Flickr. The short answer is no, it doesn’t make too much noise. At 10cm from the fan outlet, I measured 42dB when the fan is off and 52dB when it’s on.

Besides the fact that I don’t hear that noise when I wear my headphones, a simple noise measurement was not enough for me. I wrote small Python and gnuplot scripts to collect and display temperature, fan status and load (.tar.gz file, 1.3kB), and I recorded them for 2 hours after boot. During those 2 hours, I checked my e-mails, read news on the web and wrote the OPML output in catrss (that’s why the load averages increase at the end, when I was debugging the software). Here are the results:

Graph of temperature and load on a Toshiba Tecra S1 for 2 hours after boot

Graph of temperature and fan activity on a Toshiba Tecra S1 for 2 hours after boot (see text for explanation on fan activity)
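My own scripts are in the archive above; as a rough idea of what the collection side can look like, here is a sketch that reads the /proc/acpi files mentioned below and prints one line per sample. The “key: value” layout of those files (e.g. `temperature: 45 C`, `status: off`) is an assumption based on the old ACPI /proc interface, and the 10-second interval is arbitrary:

```python
import os
import time

# Paths from this post (old ACPI /proc interface on a Toshiba Tecra S1).
TEMP_FILE = "/proc/acpi/thermal_zone/THZN/temperature"
FAN_FILES = ("/proc/acpi/fan/FAN0/state", "/proc/acpi/fan/FAN1/state")


def parse_field(text, key):
    """Return the value after 'key:' in /proc/acpi-style 'key: value' text."""
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return value.strip()
    raise KeyError(key)


def parse_temperature(text):
    """'temperature:             45 C' -> 45 (assumed file format)."""
    return int(parse_field(text, "temperature").split()[0])


def parse_fan(text):
    """'status:                  off' -> False (assumed file format)."""
    return parse_field(text, "status") == "on"


def read(path):
    with open(path) as f:
        return f.read()


def sample():
    """Return one log line: epoch seconds, °C, FAN0, FAN1 (0/1), 1-min load."""
    temp = parse_temperature(read(TEMP_FILE))
    fans = " ".join(str(int(parse_fan(read(p)))) for p in FAN_FILES)
    load1 = os.getloadavg()[0]  # Unix only
    return f"{int(time.time())} {temp} {fans} {load1:.2f}"


def log(samples, interval=10):
    """Print `samples` lines, one every `interval` seconds."""
    for _ in range(samples):
        print(sample(), flush=True)
        time.sleep(interval)
```

For two hours at one sample every 10 seconds, `log(720)` redirected to a file gives whitespace-separated columns that gnuplot can plot directly, e.g. `plot "acpi.log" using 1:2 with lines` for the temperature.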

One strange thing is that the status of both fans (/proc/acpi/fan/FAN0/state and /proc/acpi/fan/FAN1/state) always reads off. But when you ask for the trip points (/proc/acpi/thermal_zone/THZN/trip_points), you see that FAN1 should be active above 45°C and FAN0 above 104°C! In practice, after some observation, I realized that FAN1 is activated when the temperature is equal to or above 50°C and doesn’t stop until the temperature is equal to or below 45°C. That’s the behaviour displayed in green on the second chart.
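The observed FAN1 behaviour is a simple hysteresis: switch on at 50°C, switch off only at 45°C. Here is a sketch that reproduces this rule from a list of temperature readings, e.g. to derive a fan-activity curve from a temperature log. It models what I observed, not the actual ACPI or firmware logic:

```python
def fan1_states(temperatures, on_at=50, off_at=45):
    """Simulate the observed FAN1 behaviour: switch on when the temperature
    reaches on_at °C and stay on until it drops to off_at °C or below."""
    fan_on = False
    states = []
    for t in temperatures:
        if not fan_on and t >= on_at:
            fan_on = True
        elif fan_on and t <= off_at:
            fan_on = False
        states.append(fan_on)
    return states


print(fan1_states([44, 48, 50, 47, 46, 45, 48]))
# [False, False, True, True, True, False, False]
```

At 48°C the fan is off on the way up but would be on on the way down, which is why the fan curve can’t be read from the temperature alone.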

OPML output in catrss

A few days ago, I released the first version of catrss, a tool that concatenates RSS file(s) to standard output. Today, I added OPML output to this tool. Here is version 0.2 (.tar.gz file, 16kB).

OPML is a file format first used in a commercial application. It’s now widely used for exchanging lists of links between news aggregators. That’s why I had to implement it in catrss: it’s a natural output format for the tool.

But I’m not happy with this format. It’s nice, it does its job, it’s already available, … ok. Other people have already complained about this format (see here, e.g.). Besides the fact that it’s not standardized (it’s difficult to know exactly which software will correctly parse what you put in your outline tags), I dislike the fact that the official specification includes tags that describe what I consider the look, feel and behaviour of the content (expansionState, vertScrollState and the window* tags) and not the content itself. Of course, I’ve read that they are optional. Nevertheless, I don’t think it’s a good design decision to mix look/behaviour and content like this (at least not in this case).
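For the curious, here is a minimal sketch of an OPML 2.0 feed list written with ElementTree, keeping only the content and leaving out the optional look-and-feel tags (expansionState, vertScrollState, window*). It is not the actual catrss code, and the feed title and URL in the example are placeholders:

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Build an OPML 2.0 document listing RSS feeds.

    feeds: iterable of (text, xml_url) pairs.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, xml_url in feeds:
        # One <outline type="rss"> per feed; content only, no display hints.
        ET.SubElement(body, "outline", type="rss", text=text, xmlUrl=xml_url)
    return ET.tostring(opml, encoding="unicode")


print(build_opml("My feeds", [("Example blog", "http://example.org/rss.xml")]))
```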

I’m sorry for readers outside computer science: this is, once again, an entry about programming, and I understand the title and content may seem a bit strange to them.

Picklist Editor 0.2

I’ve just released version 0.2 of Picklist Editor. There is now a table of all the proteins to the right of the gel. If you double-click on a cell, you can edit it (note that this is not a recommended behaviour). After revalidating the table, your new spot will be included in the gel (and saved to your picklist if you like). For me, this version is stable and fully functional 🙂

Picklist Editor 0.2 screenshot

Picklist Editor 0.1

When you work with 2D gel electrophoresis in proteomics, you end up dealing with "pick lists". For this purpose, I wrote Picklist Editor, a tool to help visualize and modify these pick lists.

Picklist Editor 0.1 screenshot

As usual, the software and source code are available here. Feel free to use it and to report any bugs or send your wish list 🙂

(and if you didn’t understand everything above because you’re not in the proteomics field, visit the page anyway: I also wrote a short introduction there)

I should change this blog title to “Version 0.1” since I recently released version 0.1 of many software tools 😉

catrss 0.1

One day, one has to sit at his/her table and try to really understand how to deal with XML. Since I think I can only learn with a project in mind, I took Alexandre Dulaunoy’s mergerss suggestion and tried to develop my own catrss.

As the name implies, catrss is one of the many descendants of the cat command: it concatenates RSS file(s) to standard output. In its simplest form, you just give it some RSS files to parse and it concatenates them for you; the command is:

./catrss rssfile1.xml rssfile2.xml ...

If you want to see all the parameters you can set, just type “./catrss --help”. You’ll probably want to set your own title, link and description parameters, since they are the only mandatory elements. One important point to keep in mind is that, by default, catrss only takes the 10 most recent items (blog entries, e.g.) from all the files. You can change this value with the “-n” option.

For the moment, catrss is only available here (.tar.gz file, 16kB). The archive contains the catrss program, its source code and two example RSS files. The code is licensed under the GNU GPL. You only need Python 2.5 to run catrss (it’s probably already installed on any GNU/Linux computer).

Currently, it only works with RSS 2.0 files and it’s very picky with dates (for example, it doesn’t work with this blog’s RSS feed; what a shame!). But all this could be improved in version 0.2. Suggestions, bug reports and patches are welcome.
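To give an idea of the core of such a tool, here is a minimal sketch of the concatenation logic with ElementTree: collect the items of several RSS 2.0 documents, sort them by their RFC 822 pubDate and keep the n most recent under a new channel. It is written for a modern Python 3, not the actual catrss source, and the function names are mine. Like catrss, it is picky: an item without a parseable pubDate makes it fail.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime


def pub_date(item):
    """Parse an item's RFC 822 <pubDate>; naive results are treated as UTC.

    An item without a parseable date raises an exception.
    """
    dt = parsedate_to_datetime(item.findtext("pubDate"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def cat_rss(documents, title, link, description, n=10):
    """Merge the <item>s of several RSS 2.0 documents into one feed,
    keeping only the n most recent ones."""
    items = []
    for doc in documents:
        items.extend(ET.fromstring(doc).iter("item"))
    items.sort(key=pub_date, reverse=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the mandatory channel elements.
    for tag, value in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = value
    channel.extend(items[:n])
    return ET.tostring(rss, encoding="unicode")
```

The n parameter plays the role of the “-n” option; reading the files named on the command line and printing the result would be the remaining glue.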

Finally, dealing with XML in Python is very easy, and the ElementTree documentation is quite good. And besides Unix-minded tools like this one, there is plenty of other cool stuff one can do with XML: parse answers from the Yahoo API, deal with XML-RPC and other web services, …

Of course, it’s when you are struggling to feed XML into your program that you realize other people have already developed what you are just starting: I’ve found at least 5 RSS parsers/generators [1, 2, 3, 4, 5] and 3 tutorials [1, 2, 3]. But I’m proud to say I didn’t use any of these references for catrss.

oncolour – a Flickr add-on for background

After discovering the Flickr API, I started coding oncolour tonight, and here is the result … oncolour is a PHP script that allows you to display your photos from Flickr on your own website with a specific background colour. It’s easier with an example: this URL, http://www.epot.org/flickr/oncolour.php?id=860125589, gives this result:

oncolour screenshot

Some other solutions have been developed and used (example) but this one is really free:

  • free for use (see here how to use it – even for your photos – and a description of all the options) and
  • free to re-use (see here how to download the script and use it on your own server).

Feel free to use it! 🙂

Getting some TV news programmes

I told you it’s boring to lie down all day (see previous post). And even with a laptop, it’s very uncomfortable to type when you are in bed with a leg on top of 3 pillows. Anyway, I’m not here to talk about my life but to share two small Python scripts. Their goal is to retrieve two television evening news programmes (from RTBF1 and France2, both in French). With them, I can watch the evening news directly on my laptop (no need to browse their websites or install an ad hoc Firefox plugin; everything can be done from the command line).

For the RTBF1 evening news, it was quite easy since they just rely on a 3rd-party hosting company to provide the video stream. One just has to find the correct URL and voilà: getRTBFvideo.py (446 bytes).

For the France2 evening news, it was a little trickier since the programme date is part of the URL. My script gives yesterday’s URL if it’s launched before 9.00pm (their programme starts at 8.00pm every evening, so it’s reasonable not to try downloading today’s programme before 9.00pm). Here is the script: getF2video.py (889 bytes).
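The date rule is the only non-trivial part, so here it is as a small sketch. The post doesn’t show the URL pattern, so this hypothetical helper only computes the date; formatting it into the URL (for example with date.strftime) depends on the site’s actual scheme:

```python
from datetime import date, datetime, timedelta


def programme_date(now=None, cutoff_hour=21) -> date:
    """Return the date of the most recent France2 evening news to fetch.

    The programme starts at 8.00pm, so before the 9.00pm cutoff the
    current day's edition is not expected online yet and yesterday's
    date is returned instead.
    """
    now = now or datetime.now()
    if now.hour < cutoff_hour:
        return (now - timedelta(days=1)).date()
    return now.date()


print(programme_date(datetime(2007, 3, 1, 20, 30)))  # 2007-02-28
```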

Both scripts only give you the URL of the video stream; you then need to feed this URL into your favorite multimedia player (mplayer, e.g.). Please note you’ll still need proprietary codecs in order to view these streams.