
Implications of Oracle buying Sun for Open Source projects?

Oracle and Sun announced a few days ago that Oracle will buy Sun. Others are better placed than I am to comment on the financial and strategic impact of this move (for example, in the Guardian, the New York Times, the Wall Street Journal or on Slashdot). I'm more interested in the potential implications this move could have for some Open Source projects that were backed by Sun. I do believe Oracle will continue developing its own contributions to Open Source software, whether they are notable (Btrfs or Oracle Enterprise Linux) or less visible.

In the last few years, Sun opened, or started to open, some of its (key) software: OpenOffice.org, NetBeans, OpenSolaris, Java, … Sometimes these moves were seen as a last-ditch effort to keep this software used (and developed) at a lower cost for Sun. Very often, they were criticised because the "opening" was only partial (non-free licenses, a stranglehold on the development processes, …) or merely announced (Java still needs to be fully opened). However, the openings of OpenOffice.org and NetBeans can be seen as successes: OpenOffice.org is an increasingly used office suite, and NetBeans competes on fair terms with another open source editor-cum-development platform, Eclipse. At the beginning of 2008, Sun acquired MySQL AB, the company behind what is probably the most widely used database system for website development, MySQL. Unfortunately, rumors spread that Sun would close off some MySQL features, leading to forks like Maria(DB) (the rumors were later dismissed). Anyway, all this software is (nearly) free. But it may not figure in Oracle's strategic plans.

Oracle now owns two database management systems: Oracle and MySQL. Although they arguably do not compete at the same level, and although I don't see Oracle dumping either RDBMS (because of their respective user bases), it could become expensive to maintain two code bases serving the same purpose.

Oracle now owns two operating systems too: Oracle Enterprise Linux and (Open)Solaris. And here they do compete at the same level: on enterprise desktops and servers. The beauty of Open Source is that OpenSolaris may survive thanks to its community, should it be abandoned by Oracle.

Oracle now leads the development of an IDE, NetBeans, while it extensively uses and promotes its rival, Eclipse. Fortunately for NetBeans, it has a strong community behind it… I guess the situation is roughly the same for Sun's virtualisation software, VirtualBox (no immediate use for Oracle), but I'm not really following these technologies so I won't bet anything on this.

Oracle now also leads the development of Java, a programming language cherished by a lot of companies around the world (some say Java is the COBOL of the 1990s…). Oracle also uses Java for its own tools, so I guess it will continue its development. Whether the opening of Java will continue, and if so at what pace, will presumably depend on the financial and/or reputational benefits Oracle can gain from it.

Oracle now owns an office suite. I don't really see how it fits into Oracle's software portfolio, unless Oracle pushes hard for its adoption in companies where Microsoft Office has a monopoly. Or perhaps Oracle intends to beat Microsoft by offering a complete solution, from corporate servers (with Oracle DB, Enterprise Linux, BEA/Tomcat application servers and Sun hardware) to corporate desktops (with OpenSolaris (?) and OpenOffice.org), Oracle's CEO Larry Ellison being known for forecasting the end of Microsoft. Providing top-to-toe solutions would make Oracle the next IBM, but that is another subject.

So, except for Java (and maybe OpenOffice.org), I'm rather pessimistic about the future of these Open Source / free software projects. Does this mean they will not survive? I don't think so. Their user/fan base is sometimes huge. And similar high-quality Open/Free projects live very well without one big corporation behind them; think of PostgreSQL, Linux, Eclipse, Python/Ruby, etc.

Ryan Paul wrote an article in ArsTechnica on the same topic, for those who are interested.

Post-publishing editing

Brad Burnham recently wrote a post on the editorial process on the web, where the work happens after the publish button is pushed, not before it. It's a report on a forum session, and you can read some stakeholders' opinions in the post. There is a series of good points in the post and its comments but, imho, there are also some questions left unanswered.

Basically, blog posts are edited after their publication: if I write something wrong, people will tend to post comments correcting it. That's why Robert Scoble doesn't agree with Andrew Keen when the latter argued that "the recent rise of user generated content is lowering the overall quality of programming on the web". I think there is a "population effect": the more visitors you have, the more editing and discussion you get.

Now I would like to know whether, and how, people edit their original posts after getting new input and/or corrections. I try to add a note at the bottom clearly stating the edit. What if I don't do that? The content will vary over time. How can you trust such content? Since RSS feeds do not reflect those changes, people using RSS are not aware of them. As stated in the post I referred to, people need new tools to continuously assess the relevance of the information they read on the internet (this was already the case with static content, but the need becomes more urgent as dynamic content is more easily created and modified). Or bloggers must adopt "rules of conduct" not to edit posts, but in that case it's up to the visitor to read the comments and work out the truth from all of this.

I think we are all used to the traditional media editing process: once something is written, it should be right, or corrected in the next publication (the process is, mutatis mutandis, the same in mainstream media and in the scientific literature). Ditto for television: once something is aired, it should be either correct or corrected later. The main difference is that a blog post, and dynamic content in general, appears at an instant i and then persists for some duration (delta-i) in either an unmodified or a modified form. If you read the same popular Wikipedia page (or any other wiki, for example) every day, you'll never read exactly the same information twice (though the main facts will remain the same).

Finally, unlike Brad Burnham and the people who commented on his post, I don't consider this "post-publishing editing" a problem for decision making (be it at the business, personal or scientific level). Fact-checking and multiple sources of information are not yet obsolete.

Electronic voting (vote électronique)

(This post mainly concerns French voters and French-speaking Belgians, and links to websites written in French.)

With the French elections approaching (they are tomorrow!), a number of people have raised strong doubts about electronic voting, doubts relayed by the press (example). I just wanted to single out the blog of Laurent Pieuchot, a municipal councillor in Issy-les-Moulineaux (near Paris, France). There he describes the muddles, blunders, uncertainties and outright lies about electronic voting and voting machines in several French municipalities. And these machines will be used to vote for the highest office of the French state…

Some interesting links are also listed there, such as ordinateurs-de-vote.org, a website bringing together "citizens and computer scientists for voter-verified voting". Or the electronic voting section of the blog of François Nonnenmacher (an IT consultant), which tracks the latest French news on the subject (some of it is quite edifying!). To get a more practical sense of the problems posed by electronic voting, a virtual voting machine will show you that a set of votes can be falsified while keeping up every appearance of security, verification (verifiability?), approval, certification, etc. (read the FAQ afterwards to see whether your doubts can be dispelled or your certainties shaken).

Finally, for Belgians: if you think you are safe from this kind of story, you can visit the pour Eva website. You will see that even Eva is not very sure that her electronic vote is correctly counted in Belgium (I know, the pun is an easy one).

Unlimited storage in online apps

Although I liked Bill Burnham's post about the "storage explosion", I think he forgot one thing in one of his latest posts. In "YahooMail, Storage, and the Battle For Personal Data" he explains that Yahoo!'s announcement of free, unlimited e-mail storage points to two trends: for him, the obvious one is that storage is cheap, and the less obvious one is that there will be a battle to control user data in such "web applications".

IMHO, he (probably unintentionally) forgot to mention one question that matters to me: will users agree to let big companies manage/control/look at their (personal/private/whatever) data? And will users still have control over their own data?

First, imagine a day without an internet connection. It would already be difficult now; it will be even more difficult when you can no longer access your data at all (Firefox 3 and its support for offline applications could be the beginning of a solution). Second, by putting data into online applications such as webmail, online word processors, spreadsheets, etc., users are giving these companies control over their data. Who reads the terms of service, privacy and IP policies? Everyone should; nearly no one does (and GooDiff is there to help ;-)). Moreover, everything is free today (or advertisement-based, to be precise), but will it still be free tomorrow? Will the features you enjoy today still be there in one or two years? Finally, in the offline computer world, you are no longer locked in by big companies and proprietary file formats. While it is possible to save your data from big web applications to your hard disk (and sometimes in standardised formats), you cannot easily or automatically retrieve your data from most of these "Web 2.0" toys. I'll end this early post with a quotation from Peter Rip's blog (emphasis is mine):

Much of the “easy” innovation seems to have been wrung out of the Web 2.0 wave. […] Now the hard work begins, again. The next wave of innovation isn’t going to be as easy. […] Now the hard part is moving from Web-as-Digital-Printing-Press to true Web-as-Platform. To make the Web a platform there has to [be] a level of content and services interoperability that really doesn’t exist today.

Google -vs- Copiepresse, II

The Belgian courts confirmed the original judgement, ordering the Google News service to remove all article excerpts from certain French-speaking newspapers. The Google cache is also considered illegal in Belgium (see the beginning of the story here).

In an interview with the Belgian newspaper "Echo", Alain Strowel, a lawyer specialised in authors' rights, said the judgement is correct but also raised several questions:

  • What exactly is behind the word "cache"? If a cached document is still formatted like the original document, I understand it could be forbidden by law. But I guess all search engines use indexes in which they store the words from every web page (regular web page or newspaper article: it's all just HTML). What about these indexes (see the small sketch after this list)? If they are considered a cache, then no page from these newspapers should be indexed, and those pages will then be impossible to find. Since the newspapers also sued Yahoo! and MSN (with much less buzz), this would mean they won't be visible on the internet at all, unless you type their URL directly. Is that what they want?
  • Alain Strowel said this judgement could reopen the debate about the exceptions to authors' rights. The current exceptions are these (in French). With all the so-called "anti-terrorism laws", I fear this will mean a reduction in the number of exceptions.
  • It's difficult to obtain web statistics for these newspapers' websites. A lot of people guessed that the number of visits would go down, but this is the first time a journalist (the interviewer) has said the number actually decreased (and the lawyer agreed).
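
To make the distinction concrete, here is a minimal, purely illustrative Python sketch (my own example, not how Google is actually implemented) of an index versus a cache: the index only maps words to the pages that contain them, while a cache keeps a full copy of each page.

```python
# Illustrative sketch only: index vs cache (URLs and texts are made up).
from collections import defaultdict

pages = {
    "https://newspaper.example/article1": "google sued over cached articles",
    "https://newspaper.example/article2": "belgian court rules on news indexing",
}

index = defaultdict(set)  # word -> set of URLs; no page layout is kept
cache = {}                # URL -> full copy of the page (the disputed part)

for url, text in pages.items():
    cache[url] = text
    for word in text.split():
        index[word].add(url)

print(index["cached"])    # {'https://newspaper.example/article1'}
```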

Finally, this whole thing won’t make me change my opinion:

  1. If you want to play and publish on the internet, get to know its rules (robots.txt, the no-cache/noarchive HTML meta tag, …; see the sketch after this list) and only then complain if something is still wrong (they don't even know how to use the simple robots.txt file correctly)
  2. The internet is open by nature: if you really don't want something to be read, cited, copied, etc., don't put it on the internet.
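
For the first point above, here is a small, hypothetical Python sketch of how one can check whether a page already carries the no-cache/noarchive robots meta tag; the URL is a placeholder and none of this reflects the newspapers' actual setup.

```python
# Hypothetical illustration: does a page opt out of search-engine caching
# via a <meta name="robots" content="noarchive"> tag?
from html.parser import HTMLParser
from urllib.request import urlopen

class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots" content="..."> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]

def cache_disallowed(url):
    """Return True if the page asks search engines not to archive/index it."""
    parser = RobotsMetaParser()
    parser.feed(urlopen(url).read().decode("utf-8", errors="replace"))
    return "noarchive" in parser.directives or "noindex" in parser.directives

# Example with a placeholder URL:
# print(cache_disallowed("https://www.example-newspaper.be/article/123"))
```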

You can read an interesting point of view on ArsTechnica: "Google defeated in Belgian copyright case; everyone but Google loses". And one of the newspapers' news editors is inviting people to discuss online newspapers; it could be interesting 🙂

Links to some interesting documents

Some interesting links for today (I didn't have time to read everything, which is why I'm storing them here):

Some links about companies:

And, finally, two lighter websites:

Personal storage is the future

While everyone (interested in this topic) is looking at internet applications, using and abusing buzzwords like "Web 2.0", "mashup" or "Ajax", I think the next wave of cool software applications will be related to personal, local storage and the organisation of documents.

Of course, you have more than 2 GB of storage in most free e-mail services. Of course, you have broadband access at home, at work and nearly everywhere you go. Of course, you can watch movies on the web. Of course, you can share photos on the web. Of course, you can download songs and books on the web. Etc.

But you need personal, local storage to keep all this data (photos, videos, music; even MS Office documents are becoming bigger and bigger). Some companies (like BeInSync, FilesAnywhere or Nomadisk (*)) are trying to sell you "hyper-cool" (emphasis is not mine) remote storage solutions. These solutions are fine for small amounts of data or for personal use, but not for companies with IP concerns or for companies/individuals with privacy concerns (data not encrypted in transit, or left on the host computer's hard disk or cache, for example).

In this respect, Bill Burnham wrote a rather interesting blog entry, "the storage explosion" (based on a Tom's Hardware article). After recalling the "scarcity and abundance" theory of IT development, he noted that computer storage capacity has increased 5,907-fold in 15 years (more than CPU performance or network bandwidth, if such a comparison can be made) while storage costs have dropped by more than 99%! After a short review of current and (near) future hard/flash disk specifications, the obvious question is: "What Happens In A World Awash With Storage?". The forecast is, unsurprisingly, "very interesting implications for both software and internet related businesses".

Some applications are already taking advantage of this flood of storage… Look, for example, at all the GNU/Linux systems and applications you can run from a simple flash drive: Knoppix, Flash Linux, Slax, Ubuntu, … It even works with embedded systems and Windows! Other applications will crawl your hard disks and let you search them for documents: Beagle, Google Desktop, Spotlight, … Some companies are starting to sell a kind of virtualisation package built on the increasing amount of space on a USB drive: see MojoPac for an example on Windows, or simply portable apps (without an OS). Etc. Where will your imagination stop?

Since it costs nothing, doesn't hurt, and plenty of people have done it before me, here is my prediction for 2007 (in the IT world): the advent of personal storage solutions. What's yours?

(*) By the way, Nomadisk is just repackaging free software in a proprietary product. They provide some sources upon request, but I wonder whether they comply with the other open source licences, since they don't provide the modified source code. Or is their application so trivial that any software engineer could write the same software without even modifying the original components?

P.S.: Ok, now, let’s concentrate on my Ph.D. dissertation …

The Digital Ice Age

In this article, Brad Reagan gives many examples where the use of electronic data is starting to cause problems from a preservation perspective. The causes can be new software that is not fully compatible with previous data models, new physical formats (with no way left to play the old ones), too much raw information, etc. For the moment, free projects like the Internet Archive or the Free Archive (among others) are trying to cope with this problem.

Although the dangers of a "digital blackout" really exist, I think the author forgets one important aspect of information from the past: we have already lost a lot of it. What is left is what time has left us, often damaged. It survived by taking many different forms and paths, through different storage procedures, different media, different locations, etc.

Instead of trying to store everything, maybe we should look at storing only the most relevant information. But then the question becomes: how do we know what is relevant and what is not? I vaguely feel that one can also add a notion of time: the e-mails I receive about an upcoming party or a new product are not worth keeping for more than one or two years, whereas the electronic exchanges that led to the discovery of siRNA, for example, are much more important.

The Belgian press is fighting for its rights (really?)

A lot of blogs, Belgian or not, are talking about the fact that the Belgian French-speaking press (led by Copiepresse, a Belgian rights-management company) successfully sued Google in Belgium over indexing, authors' rights, content copying, etc. The full order is available on the Belgian Google homepage (in French).

I am not a lawyer. So I read the order:

  • Copiepresse wanted the Belgian court to examine the lawfulness of the Google News and Google Cache services under Belgian law
  • Copiepresse wanted Google to remove all links to any content from Copiepresse's clients
  • Copiepresse wanted Google to publish the order on the front page of its Belgian website

So, Copiepresse won the first case (it will be heard again on appeal). I assume the Belgian justice system is doing its job, so let us take it that Google broke Belgian law with its services. If you want to know more about the legal side, P. Van den Bulck, E. Wery and M. de Bellefroid wrote an article about which Belgian laws Google seems to have broken (in French).

I am not a lawyer, but I grew up with the internet. In my opinion, the internet was technically not designed for the kind of use Copiepresse wants. The internet was designed to share information in a decentralised way. All TCP/IP requests are equal (i.e. there is no intrinsic difference between paid/unpaid or subscribed/unsubscribed access). Search engines were "invented" later, when it became difficult to find a piece of information on the internet. Later still, people invented technical solutions to avoid being indexed by robots (the robots.txt convention) or to prevent unpaid access to "protected" content. For instance, Le Soir's robots file is useless (it disallows nothing), and La Libre Belgique's robots file is only there to protect access statistics and advertisement images. And LeMonde.fr successfully protected its report on interns: no direct access, no Google cache.
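
As an aside, checking what a robots.txt file actually disallows is easy with Python's standard urllib.robotparser; the sketch below uses a placeholder domain and path, not the newspapers' real URLs.

```python
# A minimal sketch: ask a site's robots.txt whether Googlebot may fetch a page.
# The domain and path are placeholders for illustration only.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example-newspaper.be/robots.txt")
rp.read()

# True here means the robots.txt does nothing to keep Googlebot away from articles.
print(rp.can_fetch("Googlebot", "https://www.example-newspaper.be/article/123"))
```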

Like many other people (including commenters on these newspapers' own blogs, and even journalists working for them), I think these newspapers will lose readers (hits) and will lose credibility with the young generation of readers who, rightly or wrongly, love all these free web services (Google, Flickr, YouTube, Skyblog, etc.). At least they have lost mine, because I am sure there are other ways to keep Google off their pages, and because I am asking myself some questions that suggest they just want free advertising, or are even hiding something else (see below).

Why aren't they suing other search engines? Yahoo! indexes pages and articles from these newspapers, and it even keeps cached copies of them. MSN Newsbot also indexes their pages and articles, with direct links to the articles (no roundabout route through the front page and its ads). Etc. I suppose Google is the big player on the internet, the leading search engine for the moment, and they want to catch the public's attention.

A very good article by D. Sullivan suggests that they are doing this for money only. Here is their new business plan: if we don't succeed in selling our products, we'll sue a big internet player for money!

Why didn't the Flemish-language newspapers launch such a lawsuit against Google? Either they like what Google is doing, or they don't care (or they are preparing such a lawsuit).

Finally, these French-language newspapers launched this lawsuit at the very moment a French-speaking professional journalists' association was launching a press campaign against these newspapers' practices with freelance journalists: minimum pay, undefined conditions, etc. That's strange, because the Google cache has existed for at least two years; hadn't they noticed it before?

In summary, I am sure there are other ways to make search engines "friendly" to your news website. This lawsuit gives a bad impression of Belgium and its French-language press in the electronic world. I wonder how long it will take before they complain again that their readership is down. I am not defending Google; I'm just criticising the French-language newspapers' lawsuit.