Tag: database

Implications of Oracle buying Sun for Open Source projects?

Oracle and Sun announced a few days ago that Oracle will buy Sun. Others are better placed than I am to comment on the financial and strategic impact of this move (for example, in the Guardian, the New York Times, the Wall Street Journal or on Slashdot). I'm more interested in the potential implications for some of the Open Source projects that were backed by Sun. I do believe Oracle will continue developing its own contributions to Open Source software, whether they are notable (Btrfs or Oracle Enterprise Linux) or less visible.

In the last few years, Sun opened, or started to open, some of its (key) software: OpenOffice.org, Netbeans, OpenSolaris, Java, … Sometimes these moves were seen as a last hope to keep the software used (and developed) at a lower cost for Sun. Very often they were criticised because the "opening" was only partial (non-free licenses, a stranglehold on the development process, …) or merely announced (Java still needs to be fully opened). However, the openings of OpenOffice.org and Netbeans can be seen as successes: OpenOffice.org is an increasingly used office suite and Netbeans competes fairly with another open source editor-cum-development-platform, Eclipse. At the beginning of 2008, Sun acquired MySQL AB, the company behind MySQL, probably the most widely used database system for website development. Unfortunately, rumors spread that Sun would close some of the MySQL features, leading to forks like Maria(DB) (the rumors were later dismissed). Anyway, these projects are (nearly) free. But they may not fit into Oracle's strategic plans.

Oracle now owns two database management systems: Oracle and MySQL. Although they arguably do not compete at the same level, and although I don't see Oracle dropping either RDBMS (because of their respective user bases), it could become expensive to maintain two code bases for the same goal.

Oracle now owns two operating systems too: Oracle Enterprise Linux and (Open)Solaris. And here they do compete at the same level: on enterprise desktops and servers. The beauty of Open Source is that OpenSolaris could survive, thanks to its community, even if Oracle abandoned it.

Oracle now leads the development of an IDE, Netbeans, while it extensively uses and promotes its rival, Eclipse. Fortunately for Netbeans, it has a strong community behind it … I guess it's roughly the same story for Sun's virtualisation software, VirtualBox (no immediate use for Oracle), but I'm not really following these technologies so I won't bet anything on this.

Oracle now also leads the development of Java, a programming language cherished by a lot of companies around the world (some say Java is the COBOL of the 1990s …). Oracle uses Java for its own tools, so I guess it will continue its development. Whether the opening of Java will continue, and if so at what speed, will presumably depend on the financial and/or reputational benefits Oracle can gain from it.

Oracle now owns an office suite. I don't really see how it fits into Oracle's software portfolio, unless Oracle pushes really hard for its adoption in companies where Microsoft Office has a monopoly. Or unless Oracle intends to beat Microsoft by offering a complete solution, from corporate servers (with Oracle DB, Enterprise Linux, BEA/Tomcat application servers and Sun hardware) to corporate desktops (with OpenSolaris (?) and OpenOffice.org), Oracle's CEO Larry Ellison being known for forecasting the end of Microsoft. Providing such top-to-toe solutions would make Oracle the next IBM, but that is another subject.

So, except for Java (and maybe OpenOffice.org), I'm rather pessimistic about the future of these Open Source / free software projects. Does this mean they will not survive? I don't think so. Their user/fan base is sometimes huge. And similar high-quality Open/Free projects live very well without one big corporation behind them; think of PostgreSQL, Linux, Eclipse, Python/Ruby, etc.

Ryan Paul wrote an article in ArsTechnica on the same topic, for those who are interested.

Belgian police are storing personal details in a database

If you live in Belgium, you probably noticed a small buzz about a database the police are building about Belgian citizens and, more precisely, about who controls access to this database. The "problem" is that this database already exists and has had a legal basis since … 1998 (10 years!). But the mainstream media won't tell you that (or I'm unaware of it). I don't think there is a conspiracy; it's just that, sadly, the current economic climate doesn't leave much room in the news for this kind of information. The Minister of Justice's website has more info on this database and its content (excerpt of a translation below):

The database already appears in a royal decree. This decree states that the police can store a range of sensitive data about certain categories of Belgian citizens from the age of 14.
These include information about family ties, consumption habits, ethnicity, physical and mental health, political and religious beliefs, membership of trade unions and political parties, and suspicions of criminal offences.

So what can we do about it? Human rights organisations as well as members of Parliament (La Chambre, look for "P0499") questioned the Minister of Justice, Jo Vandeurzen. He agreed that there should be both internal and external controls on what is inserted, who has access to the data, who can check the data and that access, … He promised that the "Committee P", the privacy committee and a supervisory body headed by a magistrate will be consulted. Let's see …

More on the comparison of Java database engines

Following a comment from Alexandre on a previous post, I went a little bit further with my performance test of database engines running under Java. This evening, I tested a profiling tool and a variable number of insertions/retrievals (I didn't test transactions).

Taking the code from the previous post, I simply changed the number of elements to be inserted/retrieved. As expected, the durations of object initialization (except for two points for Derby and H2) and of database creation did not change with the number of elements to be inserted, Derby still being the slowest engine at creating a simple database (one table only). The duration of the insertion step increased slowly for all the database engines except SQLite+JDBC, whose curve rises much more steeply in the graph below (be careful: the x-axis is on a logarithmic scale).
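To give an idea of the setup, here is a minimal sketch of the kind of timing loop I mean, with the element count varied around the same insert/retrieve code. This is not the actual code from the linked sources: the table definition and the JDBC URL (passed on the command line, e.g. "jdbc:h2:./benchdb") are assumptions based on the protocol described in the previous post.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class VariableSizeBench {
    public static void main(String[] args) throws Exception {
        String url = args[0]; // e.g. "jdbc:h2:./benchdb", driver jar on the classpath
        int[] sizes = {10, 100, 1000, 10000};
        try (Connection c = DriverManager.getConnection(url)) {
            try (Statement st = c.createStatement()) {
                st.execute("CREATE TABLE bench (id INT PRIMARY KEY, title VARCHAR(100), num INT)");
            }
            for (int n : sizes) {
                try (Statement st = c.createStatement()) {
                    st.execute("DELETE FROM bench"); // start each size from an empty table
                }
                long t0 = System.nanoTime();
                try (PreparedStatement ps = c.prepareStatement("INSERT INTO bench VALUES (?, ?, ?)")) {
                    for (int i = 0; i < n; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "title " + i);
                        ps.setInt(3, i);
                        ps.executeUpdate();
                    }
                }
                long t1 = System.nanoTime();
                int rows = 0;
                try (Statement st = c.createStatement();
                     ResultSet rs = st.executeQuery("SELECT * FROM bench")) {
                    while (rs.next()) rows++;
                }
                long t2 = System.nanoTime();
                System.out.printf("n=%d insert=%d ns retrieve=%d ns (%d rows)%n",
                                  n, t1 - t0, t2 - t1, rows);
            }
        }
    }
}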

Click for bigger graph

For the retrieval step, the time spent increased in approximately the same way for all the engines. All the graphs can be seen here.

Performance was also analysed with a free profiling tool, jRat (lists of such tools are available here and here). There is a big difference here: jRat measures the time spent in each method, and these methods only approximately match the previous "steps". I also had memory problems when using jRat with more than 100 inserted elements (hence the limit here).

Derby and SQLite+JDBC always performed worse than the other engines (except for the showData() function). H2 and HSQLDB usually had more stable results (smaller standard deviations). And SQLite+JDBC was still the worst engine for data insertion (see graph below).

Click for bigger graph

It was also very strange to see that H2 and HSQLDB took approximately the same time to insert 100 or 1000 elements (note that for HSQLDB, I did not take into account the fact that one needs to explicitly close the connection, which lets HSQLDB temporarily keep data in memory before committing it to the file; closing the connection didn't take that much time anyway). All the graphs can be seen here.
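For what it's worth, here is a small sketch of what "explicitly closing" looks like with HSQLDB: issuing SHUTDOWN (or simply closing the connection) makes the engine write its pending in-memory data to the database files. The database name "benchdb" and the table are placeholders, not the actual benchmark code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HsqldbCloseExample {
    public static void main(String[] args) throws Exception {
        // Class.forName("org.hsqldb.jdbcDriver"); // may be needed with older driver jars
        // Embedded, file-based HSQLDB; the default user is "SA" with an empty password.
        try (Connection c = DriverManager.getConnection("jdbc:hsqldb:file:benchdb", "SA", "");
             Statement st = c.createStatement()) {
            st.executeUpdate("CREATE TABLE bench (id INT PRIMARY KEY, title VARCHAR(100))");
            st.executeUpdate("INSERT INTO bench VALUES (1, 'hello')");
            // Without this, an in-process HSQLDB may keep recent changes in memory
            // and in its .log file; SHUTDOWN flushes and closes the database files.
            st.execute("SHUTDOWN");
        }
    }
}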

One conclusion of all this: if you write a Java application and need a fast Java database engine, use HSQLDB (BSD-like license) or H2 (modified MPL). Next time, I’ll test transactions (but I don’t know when it’ll be).
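In practice, switching between these two engines is mostly a matter of driver class and JDBC URL. A minimal illustration, with arbitrary file names and the default credentials of each engine:

import java.sql.Connection;
import java.sql.DriverManager;

public class EmbeddedEngineChoice {
    public static void main(String[] args) throws Exception {
        // Explicit registration; recent driver jars register themselves automatically.
        Class.forName("org.h2.Driver");
        Class.forName("org.hsqldb.jdbcDriver");

        try (Connection h2 = DriverManager.getConnection("jdbc:h2:./appdb", "sa", "")) {
            System.out.println("Connected to " + h2.getMetaData().getDatabaseProductName());
        }
        try (Connection hsql = DriverManager.getConnection("jdbc:hsqldb:file:appdb_hsql", "SA", "")) {
            System.out.println("Connected to " + hsql.getMetaData().getDatabaseProductName());
        }
    }
}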

SQLite+JDBC, worse than Derby!

Following a comment from Alexandre on a previous post, I included SQLite in my performance test of database engines running under Java.

What prevented me from using SQLite in the previous test is that it's not a pure Java database: one has to use a third-party JDBC driver and implementation classes in order to manage this database engine. I also dislike another thing: SQLite does not enforce data type constraints (and that's a feature, not a bug), so everything can effectively be stored as a string, even though a few other "artificial" data types exist.

In order to include SQLite in my test, I had to rely on a third-party library: David Crawshaw's JDBC driver for SQLite. Some minor adaptations also had to be made to the code (see the SqliteTest class in the source code below). Brandon T. provides a good tutorial on how to use this driver on MS-Windows here.
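As an illustration of both points (the third-party driver and the loose typing mentioned above), here is a minimal sketch; the file name and table are placeholders, and it assumes the sqlitejdbc jar (which provides org.sqlite.JDBC) is on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqliteAdaptation {
    public static void main(String[] args) throws Exception {
        Class.forName("org.sqlite.JDBC");   // driver class from the sqlitejdbc library
        try (Connection c = DriverManager.getConnection("jdbc:sqlite:bench.db");
             Statement st = c.createStatement()) {
            st.executeUpdate("CREATE TABLE bench (id INTEGER, title TEXT)");
            st.executeUpdate("INSERT INTO bench VALUES (1, 'a title')");
            // A declared INTEGER column happily accepts text: no type enforcement.
            st.executeUpdate("INSERT INTO bench VALUES ('not a number', 'still accepted')");
            try (ResultSet rs = st.executeQuery("SELECT id, title FROM bench")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " | " + rs.getString(2));
                }
            }
        }
    }
}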

The result? SQLite's performance is worse than Derby's! Here the slowest step is data insertion: more than 11s for 100 insertions (see graph below). The database creation step is also slower than with H2 or HSQL (but much faster than with Derby). If you compare the whole process (initialisation + creation + insertion + retrieval), SQLite (and its JDBC driver) is the worst database engine. The only good point is that it creates only one file (the other engines create at least two files) and this file is only 5kb.


Click to show the normal size graph.
A graph with a log y-axis is available here

Protocol. Same as in the previous post except that now I have 4 engines to rotate. Source code is here (4kb) and the jar file + libraries is here (5Mb).

Why did Sun choose Derby?

I'm wondering why Sun chose Derby for its JavaDB.

I used JavaDB in a project, mainly because it's embedded in the latest Java Runtime Environment (JRE). But I saw a clear degradation of performance (my main criterion is speed) whenever I had to access the embedded database. And it became worse when I ran my project from a CD-ROM (because it has to be distributed that way).
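For context, this is roughly what using the bundled Derby engine (JavaDB) in embedded mode looks like; the database name "projectdb" and the table are placeholders, not my actual project code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class JavaDbEmbeddedExample {
    public static void main(String[] args) throws Exception {
        // Explicit driver loading; with JDBC 4.0 drivers this can be optional.
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        try (Connection c = DriverManager.getConnection("jdbc:derby:projectdb;create=true");
             Statement st = c.createStatement()) {
            st.executeUpdate("CREATE TABLE item (id INT PRIMARY KEY, title VARCHAR(100))");
            st.executeUpdate("INSERT INTO item VALUES (1, 'hello')");
        }
        // Embedded Derby is shut down explicitly; a SQLException (state XJ015)
        // signals that the shutdown succeeded.
        try {
            DriverManager.getConnection("jdbc:derby:;shutdown=true");
        } catch (SQLException expected) {
            // normal outcome of a clean shutdown
        }
    }
}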

So I decided to run a small, rough test and compare JavaDB with two other free Java database engines: H2 and HSQLDB. And the results are astonishing: JavaDB seems to be the slowest, hence the worst choice (except for the license). Here are the results (click to show the normal size graphs):

Speed comparison between three Java database engines: JavaDB, H2 and HSQL

The graph shows the duration in ns (nanoseconds) versus the database activity steps (complete protocol below; a second graph showing the step durations on a logarithmic scale is available here). JavaDB took nearly 7s to create the database, while all the steps are performed in less than 0.5s by the other engines! JavaDB was also the engine that took the most space on the hard disk (1.63Mb; H2 took 384kb and HSQL 5kb).

Of course, this test is a bit rough since I'm not applying any tuning (to any engine) and I only use sequential retrievals (SELECT * FROM tablename). For the moment I don't care too much about random retrievals since I don't need them. Anyway, I'll quickly switch my application's database engine!

Protocol. All tests were done on an Intel Pentium 4 at 2.4GHz with 512Mb of RAM running MS-Windows XP Pro and Java 6.0. The table contains three fields: an ID, a string ("title") and an integer (the same int as the ID). The initialisation step only creates an object that will handle the database. The "creation" step creates the embedded database and its table (i.e. the files on disk). The insertion step … inserts 100 entries into the table, and the retrieval step … retrieves these entries and prints them on the command line. One test successively creates a database with each of the three engines. In order to minimize "memory effects", each step is repeated three times and I only take the results from the third run. In addition, these tests are repeated with each database engine in every position: for example, the graphs include data from tests with H2 in 1st, 2nd and 3rd position. On the graph, each step was tested 11 times (n=11).
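A rough sketch of this repetition-and-rotation logic, with illustrative names (EngineTest, runOnce) rather than the actual classes from the sources linked below:

import java.util.Arrays;
import java.util.List;

public class RotationSketch {
    /** One engine's test run; returns the step durations (in ns) for
        initialisation, creation, insertion and retrieval. */
    interface EngineTest {
        String name();
        long[] runOnce() throws Exception;
    }

    static void runAll(List<EngineTest> engines) throws Exception {
        int k = engines.size();
        // Rotate the starting engine so each one is tested in every position.
        for (int offset = 0; offset < k; offset++) {
            for (int pos = 0; pos < k; pos++) {
                EngineTest e = engines.get((offset + pos) % k);
                long[] kept = null;
                for (int rep = 0; rep < 3; rep++) {
                    kept = e.runOnce();   // repeat three times, keep only the 3rd run
                }
                System.out.println(e.name() + " (position " + (pos + 1) + "): "
                        + Arrays.toString(kept));
            }
        }
    }
}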
If you want to test it yourself, the sources are here (3kb) and the executable jar is here (20kb).