Category: Open Source

Alt+e, g, a

This is the key sequence that opens the list of changes in a text document in OpenOffice.org. It also works very nicely with MS-Word documents, a useful feature when you are obliged to exchange work with colleagues, mentors, etc. who only use the proprietary word processor.

IMHO, the only problem is the way the list of changes is shown to the end-user in OpenOffice.org. As in other word processors, changes are underlined in a different color for each contributor and a small hint tells you what happened to the hovered block of text, who did it and when. Unlike other word processors, you can’t accept/refuse a change by right-clicking on it (you have to do it from the separate window). I do not find this intuitive and it is sometimes annoying …

Although I really appreciate the list of all the changes (notably for bulk acceptance/refusal), I think the end-user should also have the opportunity to accept/reject changes one at a time, with a right-click of the mouse or any other means (a keyboard shortcut, for example). This is, I think, especially important when you are not reviewing the last version of a document, when a reviewer asks questions in the text (you can neither accept nor reject them, you have to manually edit the text) or when you are still modifying text that has already been modified.

Talking about modifications of modified text, OpenOffice.org doesn’t update the list of changes when you modify your text while this list is open. You have to close the list and then re-open it to see your new changes.

My “dream functionality” would be that, next to the actual list of changes, the end-user would be able to:

either right-click to obtain a pop-up menu showing the accept and reject options
or use two keyboard shortcuts: one for acceptance of the modification under the cursor and one for its rejection

I already looked for such an add-on on the web, without success. If anyone finds something interesting, let me know …

GNU tools on MS-Windows

When you are used to working on a computer with GNU/Linux and are obliged to process your files on an MS-Windows system for some time, the GnuWin32 project can come in handy. They provide a lot of command-line tools from the GNU collection (sed, iconv, tar, bzip2, … see the whole list of packages they provide).

This evening, I needed to convert a lot of files from UTF-8 to ISO-8859-1 (because it seems no decent Windows text editor can correctly translate text between these two encodings). Apparently, the GnuWin32 project removed the recode tool, but iconv easily replaces it. The -c option drops the characters that cannot be represented in the target encoding:

iconv -c -f utf-8 -t iso-8859-1 utf8file.txt > iso8859file.txt
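For a whole folder of files, the same conversion can also be scripted. The following is only a minimal sketch in Python, not part of GnuWin32: the function names and the *.txt pattern are assumptions made for the example, and errors="ignore" is used to imitate the -c option by dropping characters that ISO-8859-1 cannot represent.

```python
from pathlib import Path


def convert_file(src: Path, dst: Path) -> None:
    """Re-encode one file from UTF-8 to ISO-8859-1."""
    # Work on bytes so that line endings (CRLF or LF) are left untouched.
    text = src.read_bytes().decode("utf-8")
    # errors="ignore" silently drops characters with no ISO-8859-1
    # equivalent, similar to what "iconv -c" does.
    dst.write_bytes(text.encode("iso-8859-1", errors="ignore"))


def convert_tree(src_dir: Path, dst_dir: Path, pattern: str = "*.txt") -> int:
    """Convert every file matching `pattern` (hypothetical default) and
    return the number of files written to dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_dir.glob(pattern)):
        convert_file(src, dst_dir / src.name)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical folder names, adapt them to your own files.
    print(convert_tree(Path("utf8"), Path("iso8859")), "file(s) converted")
```

Unlike iconv -c, this sketch stops with an error if an input file is not valid UTF-8, which is probably what you want to know before overwriting anything.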

Photo credit: “In the beginning…it was the command line” by Dick Mooran on Flickr

Two nice schemes about Open Source

I don’t know how I stumbled upon this report of a conference (English translation) from Avi Alkalay but I liked 2 schemes he showed.

In this first scheme (left), I like the way it reminds you that “Open” is not only about software, source code. But now that more and more people are aware of the benefits of Open Source software, it’s interesting to also stress the other sides of openness: open standards (like OpenDocument), open hardware, open architecture.
The second scheme (below) is about the trend from private control / closed access to public control / open access (apparently from Rebecca Henderson; it could be interesting to find this whole presentation from 2004).

There is a third scheme in Avi’s post but, although it’s visually appealing, there is something I don’t like in it. I understand that proprietary and open innovations should collaborate for the time being, but I think that Open Innovation is the model to follow. Moreover, the “speed-to-market” criterion is, imho, better in the Open Innovation model (but maybe I should see Rebecca Henderson’s presentation).

One more Open Source software at ULg

After the promotion of Open Access (see Bernard Rentier’s blog) and a history of publications in Open Access journals (see this latest article from the Cyclotron Research Center in PLoS), the University of Liege is slowly publishing Open Source software too.

The latest free software published is exams, an assessment management system (for on-line exams, …). They chose the GNU GPL 2, apparently without the possibility to upgrade to version 3 (I don’t know if it’s deliberate or not). And you can download the source code here.

What is even more interesting is that they provide a demonstration website if you want to test it in a nearly real setup (as an examiner or a student; only in French). And the demonstration system is hosted by a commercial hosting company (OVH), which suggests that this system can be used on very common platforms (only PHP/MySQL are required).

Now, we can dream of other software from the ULg released as free software, a Subversion repository and a users/developers community around exams …

P.S.: of course, we already did all that 😉 since we published Gemvid in an Open Access journal (the Journal of Circadian Rhythms) and published it along with a lot of other tools as free software. But I don’t count this as an institutional push towards free software since it was mainly my decision and the development didn’t involve other people.

Vertical badge

I was writing the next version of my badge counting the number of days without a Belgian government when Laurent added his comment requesting a vertical version. You can see it on the right.

Since the original release, I also added translations of the sentence in Dutch and German (after all, Belgium has 3 official languages). And I approximately centered the text on the vertical version (I personally prefer the text on the right for the horizontal version but you can easily modify this yourself).

As usual, here is the HTML code to include this vertical version in your page, blog, etc.:

<img src="http://www.epot.org/belgov/belgovv.php" alt="belgov counter on epot.org" />

And here is the source code (for both versions): belgov-0.3.tar.gz (20kb).

How many days without government?

Now it’s not a secret anymore: more than 148 days have passed since we, Belgians, went to vote (it was on the 10th of June 2007) and we still don’t have any government!

If you want to count the number of days without a Belgian government, it’s easy: just have a look at Belgian newspapers. Or … have a look at the counter below (in French, Vlaams or German) 😉

And if you want the same on your website or blog, it’s very easy, just copy/paste the HTML code below:

<img src="http://www.epot.org/belgov/belgov.php" alt="belgov counter on epot.org" />

Enjoy!

P.S. For those who could be interested, here is the source code: belgov-0.2.tar.gz (6kb). It’s written in PHP and under the GNU GPL (so it’s free!). Each small animal (Lion of Flanders or Rooster of Wallonia) represents 2 days without government. On the last line, there is a small gradation of transparency.

P.P.S. If you want to specifically support the unity of Belgium (because quite a number of politicians and citizens want to split Belgium), Pilok has an “I love Belgium” banner. Here I just wrote a counter of days without government, whatever your opinion is about Belgium.

Edit on Nov. 7th: I added translations in Vlaams and German for the line on the bottom.
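For the curious, the counting rule from the P.S. (days elapsed since the 10th of June 2007, one animal per 2 days) can be sketched in a few lines. This is a Python illustration and not the PHP code from the archive: the function names are made up, and the way an odd remaining day is handled is an assumption.

```python
from datetime import date

ELECTION_DAY = date(2007, 6, 10)  # date of the vote, from the post
DAYS_PER_ANIMAL = 2               # each lion or rooster stands for 2 days


def days_without_government(today):
    """Number of days elapsed since the election."""
    return (today - ELECTION_DAY).days


def animals_to_draw(days):
    """Return (full_animals, has_leftover_day).

    Assumption: an odd day is reported separately; the real badge may
    draw it differently (or not at all).
    """
    full, rest = divmod(days, DAYS_PER_ANIMAL)
    return full, rest == 1


if __name__ == "__main__":
    days = days_without_government(date.today())
    print(days, "days,", animals_to_draw(days)[0], "animals")
```

With the 5th of November 2007 as input, this gives the 148 days mentioned above, hence 74 animals.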

More on Java DBs comparison

Following a comment from Alexandre on a previous post, I went a little bit further with my performance test of database engines running under Java. This evening, I tested a profiling tool and a variable number of insertions/retrievals (I didn’t test transactions).

Taking the code from the previous time, I simply changed the number of elements to be inserted/retrieved. As expected, the durations of object initialization (except for 2 points for Derby and H2) and database creation did not change with the number of elements to be inserted, Derby still being the slowest engine to create a simple database (1 table only). The duration of the insertion step increased slowly with all the database engines, except for SQLite+JDBC: you can see a much steeper initial slope in the graph below (be careful: the x-axis shows logarithmic values).

For the retrieval, all the engines increased their time spent in this step in the same way (approximately). All the graphs can be seen here.

Performance analysis was completed using a free profiling tool, jRat (list of tools available here and here). There is a big difference here since jRat measures the time spent in each function. These functions approximately match the previous “steps” but not exactly. And I had memory problems using jRat with a number of elements inserted > 100 (hence the limit here).

Derby and SQLite+JDBC always performed worse than the other engines (except for the showData() function). Usually, H2 and HSQLDB had more stable results (smaller standard deviations). And SQLite+JDBC was still the worst engine regarding data insertion (see graph below).

It was also very strange to see that H2 and HSQLDB took approximately the same time to insert 100 or 1000 elements. Note that for HSQLDB, I did not take into account the fact that one needs to explicitly close the connection, which allows HSQLDB to temporarily store data in memory before committing them to the file (but closing the connection didn’t take much time). All the graphs can be seen here.

One conclusion of all this: if you write a Java application and need a fast Java database engine, use HSQLDB (BSD-like license) or H2 (modified MPL). Next time, I’ll test transactions (but I don’t know when it’ll be).

SQLite+JDBC, worse than Derby!

Following a comment from Alexandre on a previous post, I included SQLite in my performance test of database engines running under Java.

What prevented me from using SQLite in the previous test is that it’s not a pure Java database: one has to use a third-party JDBC driver and implementation classes in order to manage this database engine. I also dislike another fact: SQLite does not enforce data type constraints (and it’s a feature, not a bug), so any column can end up holding a value of any type (a string in an integer column, for example), even though a handful of declared data types exist.
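To check what SQLite really does with a value that doesn’t match the declared column type, here is a minimal sketch. It uses Python’s standard sqlite3 module rather than the Java driver of the test, and the table and function names are made up for the example.

```python
import sqlite3


def stored_types(values):
    """Insert each value into a column declared INTEGER and return the
    storage class SQLite actually used for it (via typeof())."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical one-column table, declared as INTEGER.
        conn.execute("CREATE TABLE t (n INTEGER)")
        conn.executemany("INSERT INTO t (n) VALUES (?)", [(v,) for v in values])
        rows = conn.execute("SELECT typeof(n) FROM t ORDER BY rowid")
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    # A real integer, a numeric-looking string, and a plain string.
    print(stored_types([42, "42", "abc"]))
```

The string "abc" is accepted and kept as text in the INTEGER column, while the string "42" is converted to an integer: the declared type acts as a preference, not as a constraint.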

In order to include SQLite in my test, I had to rely on a third-party library: David Crawshaw’s JDBC driver for SQLite. Some minor adaptations also had to be made to the code (see the SqliteTest class in the source code below). Brandon T. provides a good tutorial on how to use this driver on MS-Windows here.

The result? SQLite’s performance is worse than Derby’s! In this case, the slowest step is the data insertion: more than 11s for 100 insertions (see graph below). The database creation step is also slower than with H2 or HSQL (but much faster than Derby). If you compare the whole process (initialisation + creation + insertion + retrieval), SQLite (and its JDBC driver) is the worst database engine. The only good point is that it only creates one file (the other engines create at least 2 files) and this file is only 5kb.

Click to show the normal size graph.
A graph with a log y-axis is available here

Protocol. Same as in the previous post except that now I have 4 engines to rotate. Source code is here (4kb) and the jar file + libraries is here (5Mb).

Why did Sun choose Derby?

I’m wondering why Sun chose Derby for its JavaDB …

I used JavaDB on a project and my main reason was that it’s bundled with the latest Java Runtime Environment (JRE). But I saw a clear degradation of performance (my main criterion is speed) when I had to access the embedded database. And it became worse when I ran my project from a CD-ROM (because it has to be distributed).

So I decided to run a small, rough test and compare JavaDB with two other free Java database engines: H2 and HSQLDB. And the results are astonishing: JavaDB seems to be the slowest, hence the worst choice (except for the license). Here are the results (click to show the normal size graphs):

On the graph, you can see the duration in ns (nanoseconds) of each database activity step (complete protocol below; a second graph showing the step durations in logarithmic scale is available here). JavaDB took nearly 7s to create the database while all the steps are performed in less than 0.5s for all the other engines! JavaDB was also the engine that took the most space on the hard disk (1.63Mb; H2 took 384kb and HSQL 5kb).

Of course, this test is a bit subjective since I’m not using any tuning (for any engine) and use sequential retrievals (SELECT * FROM tablename). For the moment I don’t care too much about random retrievals since I don’t need them. Anyway, I’ll quickly switch my application database engine!

Protocol. All tests were done on an Intel Pentium 4 at 2.4GHz with 512Mb of RAM running MS-Windows XP Pro and Java 6.0. The table contains three fields: an ID, a string (“title”) and an integer (the same int as the ID). The initialisation step only creates an object that will handle the database. The step “creation” creates the embedded database and table (i.e. the files on the disk). The insertion step … inserts 100 entries in the table and the retrieval step … retrieves and prints these entries on the command line. One test successively creates a database with the three engines. In order to minimize “memory effects”, each step is repeated three times and I only take results from the third test. In addition, these three tests are repeated with each database engine in all the successive positions: for example, graphs include data from tests with H2 in 1st, 2nd and 3rd position. On the graph, each step was tested 11 times (n=11).
If you want to test it yourself, sources are here (3kb) and jar executable is here (20kb).
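For readers who just want the shape of the protocol without downloading the Java sources, here is a rough sketch of the four timed steps. It is not the code used for the graphs: it is written in Python, it uses the standard sqlite3 module with an in-memory database as a stand-in engine, it doesn’t print the retrieved rows, and all the names are assumptions.

```python
import sqlite3
import time


def timed(step):
    """Run step() and return (its result, elapsed time in nanoseconds)."""
    start = time.perf_counter_ns()
    result = step()
    return result, time.perf_counter_ns() - start


def run_once(n_rows=100):
    """One pass over the four steps; returns {step name: duration in ns}."""
    durations = {}
    conn, durations["initialisation"] = timed(lambda: sqlite3.connect(":memory:"))
    # Three fields as in the protocol: an ID, a title and an integer.
    _, durations["creation"] = timed(lambda: conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, num INTEGER)"))

    def insert():
        for i in range(n_rows):
            conn.execute("INSERT INTO items VALUES (?, ?, ?)", (i, f"title {i}", i))
        conn.commit()

    _, durations["insertion"] = timed(insert)
    rows, durations["retrieval"] = timed(
        lambda: conn.execute("SELECT * FROM items").fetchall())
    conn.close()
    assert len(rows) == n_rows
    return durations


def benchmark(repeats=3, n_rows=100):
    """Repeat the pass and keep only the last one, to limit warm-up effects."""
    return [run_once(n_rows) for _ in range(repeats)][-1]


if __name__ == "__main__":
    for step, ns in benchmark().items():
        print(f"{step}: {ns} ns")
```

Rotating the position of each engine, as described above, would simply mean calling such a function for every engine in every order and pooling the results.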

Do your laptop fans produce a lot of noise?

Someone hoped my laptop doesn’t make too much noise after I posted a photo of the Tecra logo on Flickr. The short answer is no, it doesn’t make too much noise. At 10cm from the fan output, I can measure 42dB when the fan is off and 52dB when it’s on.

Besides the fact that I don’t hear that noise when I have my headphones on, this answer was not sufficient for me. I wrote small Python and gnuplot scripts to collect and display temperature, fan status and load (.tar.gz file, 1.3 kB). During the 2 hours of measurement, I checked my e-mails, read news on the web and wrote the OPML output in catrss (that’s why load averages increase at the end, when I’m debugging the software). Here are the results (click on an image to see a larger version):

Graph of temperature and load on a Toshiba Tecra S1 for 2 hours after boot

Graph of temperature and fan activity on a Toshiba Tecra S1 for 2 hours after boot (see text for explanation on fan activity)

One strange thing is that the status of both fans (/proc/acpi/fan/FAN0/state and /proc/acpi/fan/FAN1/state) is always off. But, when you ask for the trip points (/proc/acpi/thermal_zone/THZN/trip_points), you see that FAN1 should be active above 45°C and FAN0 should be active above 104°C! Practically, after some observations, I realized that FAN1 is activated if the temperature is equal to or above 50°C and it doesn’t stop until temperature is equal to or below 45°C. That’s the behaviour displayed in green on the second chart.
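Since the fan state files always report off, the green curve has to be deduced from the temperature alone. The rule observed above (on at 50°C or more, off again at 45°C or less) can be sketched as follows; this is an illustration and not the script from the archive, and the function names are made up.

```python
FAN_ON_AT = 50   # °C, FAN1 starts at or above this temperature (observed)
FAN_OFF_AT = 45  # °C, FAN1 stops at or below this temperature (observed)


def next_fan_state(is_on, temperature):
    """Apply the observed hysteresis to one temperature reading."""
    if temperature >= FAN_ON_AT:
        return True
    if temperature <= FAN_OFF_AT:
        return False
    # Between the two thresholds the fan keeps its previous state.
    return is_on


def fan_activity(temperatures, initially_on=False):
    """Deduce the fan state for each reading of a temperature series."""
    states = []
    is_on = initially_on
    for temperature in temperatures:
        is_on = next_fan_state(is_on, temperature)
        states.append(is_on)
    return states


if __name__ == "__main__":
    print(fan_activity([44, 48, 50, 47, 46, 45, 49]))
```

Note that a reading of 48°C gives a different answer depending on whether the laptop is warming up or cooling down, which is why the previous state has to be carried along.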