Month: September 2007

More on Java DBs comparison

Following a comment from Alexandre on a previous post, I went a little further with my performance test of database engines running under Java. This evening, I tried a profiling tool and a variable number of insertions/retrievals (I didn’t test transactions).

Starting from the code used last time, I simply changed the number of elements to be inserted/retrieved. As expected, the durations of object initialization (except for 2 points for Derby and H2) and of database creation did not change with the number of elements, Derby still being the slowest engine to create a simple database (1 table only). The duration of the insertion step increased slowly for all the database engines except SQLite+JDBC, whose curve has a much steeper initial slope in the graph below (be careful: the x-axis is logarithmic).
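The sweep itself is simple: the same steps are run for several, logarithmically spaced numbers of elements, and each step is timed with `System.nanoTime()`. Here is a minimal sketch of that loop; it is not the original test code, the insertion step is a placeholder, and the counts (10 to 10,000 elements) are only an example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

final class ElementSweep {
    /** Returns 10^fromExponent, ..., 10^toExponent: logarithmically spaced element counts. */
    static List<Integer> logSpacedCounts(int fromExponent, int toExponent) {
        List<Integer> counts = new ArrayList<>();
        int value = 1;
        for (int e = 0; e <= toExponent; e++) {
            if (e >= fromExponent) {
                counts.add(value);
            }
            value *= 10;
        }
        return counts;
    }

    /** Runs one step with the given number of elements and returns its duration in ns. */
    static long timeStep(IntConsumer step, int elements) {
        long start = System.nanoTime();
        step.accept(elements);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Placeholder step: the real test would insert `n` rows with one engine here.
        IntConsumer fakeInsert = n -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                sb.append("title ").append(i);
            }
        };
        for (int n : logSpacedCounts(1, 4)) {
            System.out.println(n + " elements: " + timeStep(fakeInsert, n) + " ns");
        }
    }
}
```

Because the counts grow by a factor of 10, plotting them on a logarithmic x-axis spreads the points evenly.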

For retrieval, the time spent in this step increased in approximately the same way for all the engines. All the graphs can be seen here.

I also profiled the code with jRat, a free profiling tool (lists of such tools are available here and here). There is a big difference with the previous measurements: jRat measures the time spent in each function, and these functions only approximately match the previous “steps”. I also ran into memory problems with jRat when inserting more than 100 elements (hence the limit of 100 here).
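A profiler like jRat reports the time spent in each method. As a rough illustration of what such a per-function figure means (this is neither jRat’s API nor how it works internally), the sketch below times explicitly wrapped calls and accumulates the durations per function name. `insertData` is a hypothetical name; `showData` is the function mentioned in the results.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

final class FunctionTimer {
    private final Map<String, Long> totalNs = new LinkedHashMap<>();
    private final Map<String, Integer> calls = new LinkedHashMap<>();

    /** Runs the task, adds its duration to the total recorded under `name`, returns its result. */
    <T> T time(String name, Supplier<T> task) {
        long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            totalNs.merge(name, System.nanoTime() - start, Long::sum);
            calls.merge(name, 1, Integer::sum);
        }
    }

    long totalNanos(String name) {
        return totalNs.getOrDefault(name, 0L);
    }

    int callCount(String name) {
        return calls.getOrDefault(name, 0);
    }

    /** Prints one line per function: number of calls, total and mean time in ns. */
    void report() {
        for (Map.Entry<String, Long> e : totalNs.entrySet()) {
            int n = calls.get(e.getKey());
            System.out.printf("%-12s calls=%d total=%d ns mean=%d ns%n",
                    e.getKey(), n, e.getValue(), e.getValue() / n);
        }
    }

    public static void main(String[] args) {
        FunctionTimer timer = new FunctionTimer();
        for (int run = 0; run < 3; run++) {
            // Placeholders standing in for the real function bodies.
            timer.time("insertData", () -> "inserted");
            timer.time("showData", () -> "shown");
        }
        timer.report();
    }
}
```

Unlike a real profiler, this only sees the calls you wrap by hand, which is also why its “functions” never match the earlier steps exactly unless you choose them to.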

Derby and SQLite+JDBC always performed worse than the other engines (except for the showData() function). H2 and HSQLDB usually had more stable results (smaller standard deviations). And SQLite+JDBC was still the worst engine for data insertion (see graph below).
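“More stable” here means a smaller spread of the repeated measurements. For reference, a minimal sketch of the mean and the sample standard deviation (dividing by n - 1) over the durations of one step; the input values in `main` are made up for the example, not measured data.

```java
final class Stats {
    /** Arithmetic mean of the durations. */
    static double mean(long[] values) {
        double sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample standard deviation (divides by n - 1); needs at least two values. */
    static double sampleStdDev(long[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double m = mean(values);
        double squares = 0;
        for (long v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / (values.length - 1));
    }

    public static void main(String[] args) {
        long[] durationsNs = {1, 2, 3, 4, 5}; // made-up values, not measured data
        System.out.println(mean(durationsNs) + " " + sampleStdDev(durationsNs));
    }
}
```

For these five values, the mean is 3.0 and the sample standard deviation is the square root of 2.5, about 1.58.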

It was also very strange to see that H2 and HSQLDB took approximately the same time to insert 100 or 1000 elements (note that for HSQLDB, I did not take into account the fact that one needs to explicitly close the connection, which allows HSQLDB to keep data temporarily in memory before committing it to the file; but closing the connection didn’t take much time). All the graphs can be seen here.
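If that closing cost has to be counted, one option is to time it separately and add it to the insertion step. The sketch below assumes an already open connection to a file-based HSQLDB database (for instance through a URL such as `jdbc:hsqldb:file:testdb`, where `testdb` is a hypothetical name) and issues HSQLDB’s `SHUTDOWN` SQL statement, which writes the data to the database files and closes the database. The class and method names are hypothetical, and the HSQLDB jar is needed at run time.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class HsqldbClose {
    static final String SHUTDOWN_SQL = "SHUTDOWN";

    /**
     * Issues SHUTDOWN on an HSQLDB connection, then closes it, and returns the
     * elapsed time in nanoseconds so it can be added to the insertion step.
     */
    static long shutdownAndClose(Connection conn) throws SQLException {
        long start = System.nanoTime();
        try (Statement st = conn.createStatement()) {
            st.execute(SHUTDOWN_SQL);
        } finally {
            // SHUTDOWN already closes the connection; close() on a closed Connection is a no-op.
            conn.close();
        }
        return System.nanoTime() - start;
    }
}
```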

One conclusion of all this: if you write a Java application and need a fast Java database engine, use HSQLDB (BSD-like license) or H2 (modified MPL). Next time, I’ll test transactions (but I don’t know when it’ll be).

Published in Schmap

One of my photos on Flickr is now on Schmap, a website providing travel guides for some destinations in the world (only Europe, North America and Australasia for now). See here what it looks like.

What was interesting for me was the way they did it. I learned about it from an e-mail from Emma Williams (from Schmap) telling me my photo had been included. And, at first sight (*), they correctly understood the conditions of the Creative Commons Attribution-ShareAlike 2.0 license: the attribution is on their website, as well as the “cc” logo next to the image. And they link to the image on Flickr 🙂

(*) If you read the terms of use, you’ll notice that all the material (including third-party material) “are protected by copyright laws. You may only access and use the Materials for personal or educational purposes and not for resell or commercial purposes by You or any third parties”. My photo is also protected by copyright laws, but anyone can use it for purposes other than personal or educational ones, and even sell it (CC BY-SA allows it). Since the only transformation they made is a resizing, does it really matter, given that the original material is linked and resizing is easy to redo?

SQLite+JDBC, worse than Derby!

Following a comment from Alexandre on a previous post, I included SQLite in my performance test of database engines running under Java.

What prevented me from using SQLite in the previous test is that it’s not a pure Java database: one has to use a third-party JDBC driver and implementation classes to manage this database engine. IMHO, there is another drawback: SQLite does not enforce column data types (and it’s a feature, not a bug), so a column declared as INTEGER will happily store a string; values only fall into a handful of storage classes (NULL, INTEGER, REAL, TEXT and BLOB).

In order to include SQLite in my test, I had to rely on a third-party library: David Crawshaw’s JDBC driver for SQLite. Some minor adaptations also had to be made to the code (see the SqliteTest class in the source code below). Brandon T. provides a good tutorial on how to use this driver on MS-Windows here.
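For SQLite, the main adaptations are loading a different driver class and using a different JDBC URL. The enum below gathers one plausible set of driver classes and URL forms for embedded, file-based use of the four engines; it is a sketch, not the SqliteTest code. The database names, the `./` prefix for H2 and the `.db` extension for SQLite are assumptions, and each engine’s jar (including David Crawshaw’s SQLite driver) must be on the classpath for `connect` to succeed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

enum Engine {
    // Driver class, URL prefix and URL suffix wrapped around a database name.
    DERBY("org.apache.derby.jdbc.EmbeddedDriver", "jdbc:derby:", ";create=true"),
    H2("org.h2.Driver", "jdbc:h2:./", ""),
    HSQLDB("org.hsqldb.jdbcDriver", "jdbc:hsqldb:file:", ""),
    SQLITE("org.sqlite.JDBC", "jdbc:sqlite:", ".db");

    final String driverClass;
    private final String prefix;
    private final String suffix;

    Engine(String driverClass, String prefix, String suffix) {
        this.driverClass = driverClass;
        this.prefix = prefix;
        this.suffix = suffix;
    }

    /** Builds the JDBC URL for an embedded, file-based database called dbName. */
    String url(String dbName) {
        return prefix + dbName + suffix;
    }

    /** Loads the driver class (its jar must be on the classpath) and opens a connection. */
    Connection connect(String dbName) throws ClassNotFoundException, SQLException {
        Class.forName(driverClass);
        return DriverManager.getConnection(url(dbName));
    }
}
```

Keeping the differences in one place means the rest of the test (creation, insertion, retrieval) can stay identical for every engine.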

The result? SQLite performs worse than Derby! Here, the slowest step is data insertion: more than 11 s for 100 insertions (see graph below). The database creation step is also slower than with H2 or HSQL (but much faster than with Derby). If you compare the whole process (initialisation + creation + insertion + retrieval), SQLite (with its JDBC driver) is the worst database engine. The only good point is that it creates a single file (the other engines create at least 2 files) and this file is only 5 kB.

A graph with a log y-axis is available here
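Why is insertion so slow? I haven’t measured it, but a likely cause is that, in autocommit mode, SQLite commits each INSERT as its own transaction and waits for the data to reach the disk every time (the SQLite FAQ describes this). The usual workaround is to wrap all the inserts in a single transaction; the sketch below does it with standard JDBC calls. It was not part of the test above, and the table name `items` and its columns `id`, `title` and `int_value` are hypothetical, modeled on the three-field table of the protocol.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class TransactionalInsert {
    // Hypothetical table and columns: ID, title string, integer equal to the ID.
    static final String INSERT_SQL = "INSERT INTO items (id, title, int_value) VALUES (?, ?, ?)";

    /** Title stored for a given ID (illustrative format). */
    static String titleFor(int id) {
        return "title " + id;
    }

    /**
     * Inserts `count` rows in a single transaction instead of one transaction per row,
     * then restores the connection's previous auto-commit setting.
     */
    static void insertRows(Connection conn, int count) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (int id = 0; id < count; id++) {
                ps.setInt(1, id);
                ps.setString(2, titleFor(id));
                ps.setInt(3, id);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit(); // one commit for all rows instead of one per INSERT
        } catch (SQLException e) {
            conn.rollback(); // leave the table unchanged if any insert fails
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```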

Protocol. Same as in the previous post, except that now I have 4 engines to rotate. The source code is here (4 kB) and the jar file + libraries are here (5 MB).

Microsoft Research to sponsor Open Access awards

In a somewhat strange move, Microsoft Research is going to sponsor the BioMed Central 2007 Research Awards.

Lee Dirks, director, scholarly communications, Microsoft Research: “We are very supportive of the open science movement and recognize that open access publication is an important component of overall scholarly communications.”

I hope other Microsoft divisions will follow this move and sponsor (or release their products as) Open Source and free software projects … More details on the announcement here.

Nothing new on the Open Access front

Peter Murray-Rust (Cambridge University) discovered that he could not access his own article in an Oxford University Press journal, even though he had paid for it to be published as Open Access. This caused some discussion on /. but, as usual, it’s better to first have a look at Peter Suber’s blog for an objective view on this.

Why did Sun choose Derby?

I’m wondering why Sun chose Derby for its JavaDB …

I used JavaDB on a project, and my main reason was that it’s bundled with the latest Java release (Java 6). But I saw a clear drop in performance (my main criterion is speed) when I had to access the embedded database. And it got even worse when I ran my project from a CD-ROM (because it has to be distributed on CD-ROM).

So I decided to run a small, rough test comparing JavaDB with two other free Java database engines: H2 and HSQLDB. And the results are astonishing: JavaDB seems to be the slowest, hence the worst choice (except for the license). Here are the results:

The graph shows the duration in ns (nanoseconds) of each database activity step (complete protocol below; a second graph showing the step durations on a logarithmic scale is available here). JavaDB took nearly 7 s to create the database, while all the steps took less than 0.5 s with the other engines! JavaDB was also the engine that took the most space on the hard disk (1.63 MB; H2 took 384 kB and HSQL 5 kB).

Of course, this test is a bit limited, since I didn’t tune any engine and I only use sequential retrievals (SELECT * FROM tablename). For the moment, I don’t care much about random retrievals since I don’t need them. Anyway, I’ll switch my application’s database engine soon!
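In JDBC terms, a sequential retrieval like this boils down to one query and a loop over the `ResultSet`. The sketch below is a guess at what such a retrieval step looks like, not the test’s actual code; the table name `items` (standing in for “tablename”) and the three-column layout (ID, title, integer, as in the protocol) are assumptions.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class SequentialRetrieval {
    // Hypothetical table name standing in for "tablename".
    static final String SELECT_ALL = "SELECT * FROM items";

    /** Formats one row (ID, title, integer) as a tab-separated line. */
    static String formatRow(int id, String title, int value) {
        return id + "\t" + title + "\t" + value;
    }

    /** Reads every row in the order returned by the engine, prints it, returns the row count. */
    static int printAll(Connection conn) throws SQLException {
        int rows = 0;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(SELECT_ALL)) {
            while (rs.next()) {
                // Columns read by position, assuming the ID, title, integer layout.
                System.out.println(formatRow(rs.getInt(1), rs.getString(2), rs.getInt(3)));
                rows++;
            }
        }
        return rows;
    }
}
```

Printing each row, as the protocol’s retrieval step does, means the measured time includes console output as well as the database read.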

Protocol. All tests were run on an Intel Pentium 4 at 2.4 GHz with 512 MB of RAM, running MS-Windows XP Pro and Java 6.0. The table contains three fields: an ID, a string (“title”) and an integer (equal to the ID). The initialisation step only creates an object that will handle the database. The “creation” step creates the embedded database and the table (i.e. the files on the disk). The insertion step … inserts 100 entries in the table, and the retrieval step … retrieves these entries and prints them on the command line. One test successively creates a database with each of the three engines. To minimize “memory effects”, each step is repeated three times and I only keep the results of the third run. In addition, these tests are repeated with each database engine in every position: for example, the graphs include data from tests with H2 in 1st, 2nd and 3rd position. On the graph, each step was tested 11 times (n=11).
If you want to test it yourself, the sources are here (3 kB) and the executable jar is here (20 kB).
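The ordering scheme above (every engine tested in every position, each step run three times and only the third run kept) can be sketched as follows. This is an illustration, not the original source: the class and method names are hypothetical and the step passed to `thirdRunNanos` is a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Protocol {
    /** All cyclic rotations of the engine list, so each engine appears once in each position. */
    static <T> List<List<T>> rotations(List<T> engines) {
        List<List<T>> result = new ArrayList<>();
        int n = engines.size();
        for (int shift = 0; shift < n; shift++) {
            List<T> order = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                order.add(engines.get((shift + i) % n));
            }
            result.add(order);
        }
        return result;
    }

    /** Runs a step three times and returns only the duration of the third run, in ns. */
    static long thirdRunNanos(Runnable step) {
        long duration = 0;
        for (int run = 1; run <= 3; run++) {
            long start = System.nanoTime();
            step.run();
            duration = System.nanoTime() - start; // earlier runs are overwritten
        }
        return duration;
    }

    public static void main(String[] args) {
        for (List<String> order : rotations(Arrays.asList("JavaDB", "H2", "HSQLDB"))) {
            for (String engine : order) {
                // Placeholder: the real test would create/insert/retrieve with `engine` here.
                long ns = thirdRunNanos(() -> { });
                System.out.println(order + " -> " + engine + ": " + ns + " ns");
            }
        }
    }
}
```

Cyclic rotations give each engine every position exactly once, so any warm-up advantage of running later is shared equally among the engines.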

Download YouTube videos

There are many websites that allow you to download videos from YouTube, since it’s not possible to do it directly from YouTube. But you end up with a proprietary Flash video file. Although you can install the Flash plug-in on your computer, there are cases when you don’t want to, or simply can’t.

YouTube without Flash player

So, for whatever reason, you want a video from YouTube on your computer in a file format suitable for any multimedia player? Here is a small (15-line) bash script that downloads a YouTube video you like and converts it to standard MPEG format. For that purpose, you’ll need wget (usually already installed on your GNU/Linux box) and ffmpeg.

Now, suppose you want to watch a video about the Morris water maze: just look at the URL (http://youtube.com/watch?v=y2kJ2Zw9ZgI) and you’ll see the video ID (y2kJ2Zw9ZgI). Copy this ID and choose a proper name for your file. Simply type “./youtubedownload.sh y2kJ2Zw9ZgI MorrisWaterMazeVideo” and after a few seconds, you’ll get a file called MorrisWaterMazeVideo.mpeg that you can watch with any player you want. 🙂
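Finding the ID just means reading the `v` parameter of the watch URL. In Java, that part could look like the sketch below; it only covers the ID and the output file name (the download and conversion are done by wget and ffmpeg in the script), and the class and method names are hypothetical.

```java
import java.net.URI;

final class YouTubeId {
    /** Returns the value of the "v" query parameter of a watch URL, or null if there is none. */
    static String videoId(String watchUrl) {
        String query = URI.create(watchUrl).getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("v")) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    /** Output file name: the chosen name plus the .mpeg extension. */
    static String outputFile(String name) {
        return name + ".mpeg";
    }

    public static void main(String[] args) {
        String id = videoId("http://youtube.com/watch?v=y2kJ2Zw9ZgI");
        System.out.println(id + " -> " + outputFile("MorrisWaterMazeVideo"));
    }
}
```

Running `main` prints `y2kJ2Zw9ZgI -> MorrisWaterMazeVideo.mpeg`. Scanning all the query parameters, rather than assuming `v` comes first, also handles URLs where other parameters precede it.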

Note 1: it doesn’t work with every video on YouTube, but it does with almost all of them.
Note 2: Google Video lets you download videos directly in MP4 format (which is standardized).