The Belgian press is fighting for its rights (really?)

A lot of blogs, Belgian or not, are talking about the fact that the Belgian French-speaking press (led by CopiePress, a Belgian rights management company) successfully sued Google in Belgium over indexing, authors' rights, content copying, etc. The full order is available on the Belgian Google homepage (in French).

I am not a lawyer. So I read the order:

  • CopiePress wanted the Belgian court to examine the lawfulness of the Google News and Google Cache services under Belgian law
  • CopiePress wanted Google to remove all links to any content from CopiePress's clients
  • CopiePress wanted Google to publish the order on the front page of its Belgian website

So, CopiePress won the first case (it will be heard again on appeal). I assume that the Belgian justice system is doing its job, so let us consider that Google broke Belgian law with these services. If you want to know more about the legal side, P. Van den Bulck, E. Wery and M. de Bellefroid wrote an article about which Belgian laws Google seems to have broken (in French).

I am not a lawyer, but I grew up with the internet. In my opinion, the internet was technically not designed for the kind of use CopiePress wants. The internet was designed to share information in a decentralised way. All TCP/IP requests are equal (there is no intrinsic difference between paid/unpaid or subscribed/unsubscribed access, for instance). Search engines were “invented” later, when it became difficult to find a piece of information on the internet. Later on, people invented technical solutions to avoid being indexed by robots (the robots.txt convention) or to prevent anyone from accessing “protected” content without paying. For instance, Le Soir's robots file is useless (it disallows nothing), and La Libre Belgique's robots file only protects access statistics and advertisement images. Meanwhile, LeMonde.fr successfully protected its report on interns: no direct access, no Google cache.
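To make that concrete, here is a minimal sketch of what a robots file blocking Google's crawler could look like, and how to check it with Python's standard robotparser module. The site and paths are hypothetical, not what these newspapers actually publish:

#!/usr/bin/python
# Hypothetical robots.txt that would keep Googlebot out of a whole site:
#
#   User-agent: Googlebot
#   Disallow: /
#
# What a well-behaved crawler would conclude from it:
import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")   # hypothetical newspaper site
rp.read()
print rp.can_fetch("Googlebot", "http://www.example.com/archives/article1234.html")
# prints False if the robots.txt above is in place: no indexing, hence no cache

No court order is needed for that part.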

Like many other people (including bloggers at these newspapers and even journalists working for them), I think these newspapers will lose readers (hits) and will lose credibility with the young generation of readers who, rightly or wrongly, love all these free web services (Google, Flickr, YouTube, Skyblog, etc.). At least they have lost mine, because I am sure there are other ways to keep Google off their pages, and because some of the questions I keep asking myself suggest that they just want free advertising, or are even hiding something else (see below).

Why aren’t they suing other search engines? Yahoo! indexes pages and articles from these newspapers; it even keeps cached copies of them. MSN Newsbot also indexes pages and articles from these newspapers, with direct links to the articles (no detour via the front page and its ads). And so on. I suppose it is because Google is the internet's big player, the leading search engine at the moment, and they want to catch the public's attention.

A very good article by D. Sullivan suggests that they are doing this for money only. Here is their new business plan: if we don’t succeed in selling our products, we’ll sue a big internet player for money!

Why didn’t the Flemish newspapers launch such a lawsuit against Google? Either they like what Google is doing, or they don’t care (or they are preparing such a lawsuit).

Finally, these French-language newspapers launched this lawsuit at the very moment a French-speaking professional journalists' association is running a press campaign against these newspapers' practices with freelance journalists: minimum pay, undefined conditions, etc. That is strange, because Google Cache has existed for at least two years; didn’t they notice it before?

In summary, I am sure there are other ways to make search engines “friendly” to your news website. This lawsuit gives a bad impression of Belgium and its French-language press in the electronic world. I wonder how long it will take before they complain again that their readership is down. I am not defending Google; I am just criticising the French-language newspapers' lawsuit.

EURON Ph.D. days in Maastricht

For the last day and a half, I was in Maastricht (NL) for the 10th EURON Ph.D. days. EURON is the “European graduate school of neuroscience”. I presented a poster and gave a 15-minute oral presentation of my latest results. It was a good meeting in the first sense of the word: I met interesting people. I also enjoyed listening to the other Ph.D. students' presentations, since it always gives you i) a glimpse of what other people (in other universities) are interested in (by other means than printed/digital articles) and ii) the impression that you are not the only one having problems with your protocol, your animals, your proteins, … The location was great (Fort Sint Pieter) and the sun was out. The ULg team was very small (only 4 Ph.D. students and 2 senior scientists out of about 100 participants), but this was an opportunity to get to know the other students better.

Btw, the new EURON website uses Joomla, a free CMS, as a backend (look at the favicon and the meta tags in the HTML code).

Recognition

A media outlet was looking for someone with experience with scientific mazes. They contacted Rudy D’Hooge, from the Laboratory of Neurochemistry & Behaviour, University of Antwerp (with Prof. De Deyn, he wrote an authoritative review on the subject). He gave my name and my lab as a reference for the Morris water maze (*). Maybe he gave other names and labs too, but …

Nearly 4 years ago, I took the train to visit his laboratory in order to see how we could install a water maze in our lab, which protocol we needed to use, which pitfalls to avoid, … We were starting from nothing, I learned from them (**), and now they cite us as a reference lab. After so much toil and trouble, it is heart-warming. Thank you.

(*) By the way, the photo illustrating the Wikipedia article on the Morris water maze is mine 🙂
(**) To be complete, I also learned about the maze from Prof. C. Smith and from Prof. Steinbusch’s lab.

Digital access to the ULg libraries

Although the University of Liege (ULg) library network webpage is very old and ugly, the network is starting to use new, technologically advanced tools to give digital access to its content (articles, books, theses and other media). Three tools have recently become available:

  • Source gives access to all media currently available in the libraries (it replaces the Telnet-based Liber, for those who used it before). Source is based on Aleph from ExLibris, proprietary software.
  • PoPuPS is a publication platform for scientific journals from the ULg and the FSAGx. PoPuPS is based on the Lodel CMS, free (GPL) web publishing software. Articles in this database seem to be Open Access, although no precise licence is defined (and some articles look strange: see the second picture in this geological article).
  • BICTEL/e is an institutional repository of Ph.D. theses. It seems to have been developed internally by the UCL.

With these tools, the ULg is trying to catch up with the Open Access movement. Source is already connected to other types of databases, but it seems that PoPuPS and BICTEL are not (yet) connected to cross-referencing systems like DOI, nor do they use standardised metadata as Eprints does.
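Just to make the metadata point concrete, here is a minimal sketch of what talking to a standards-compliant repository could look like: an OAI-PMH Identify request, the kind of interface Eprints-style systems expose. The endpoint below is a placeholder; I do not know whether PoPuPS or BICTEL/e offer one:

#!/usr/bin/python
# Ask a repository to identify itself via OAI-PMH (metadata harvesting protocol).
# The base URL is hypothetical, not a real ULg address.
import urllib2

base = "http://repository.example.org/oai"
print urllib2.urlopen(base + "?verb=Identify").read()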

P.S. An old tool is still very useful: Antilope gives you the location (in Belgium) of publications. If your library doesn’t have a subscription to a specific journal, maybe another library in Belgium has it.

Lightweight installation of a computer

This evening, I prepared a computer for the lab. Don’t blame me, but it has to run MS-Windows and MS-Office. Knowing it’s only an Intel Pentium II MMX (“x86 Family 5 Model 4 Stepping 3”) with 64 MB of RAM and a 2.4 GB hard disk, I needed to find general-purpose software with the smallest footprint in terms of both memory and hard disk usage. Here is a small list of software I found interesting (mainly as a reminder for myself):

  • 7-zip, to create and read archives (free software, GPL)
  • Foxit Reader, to view PDF files (freeware, closed source)
  • IrfanView for basic image manipulation (freeware, closed source)
  • Mozilla Firefox to browse the web + webmail (free software, GPL)
  • PDF Creator to create PDF files (free software, GPL)
  • FreeRAM XP Pro to free and optimise RAM
  • Since it won’t be my computer, I didn’t install the following, but they could be useful: Vim to edit text (free software, GPL-compatible charityware), Psi for Jabber (and other IM gateways) (free software, GPL) and R for statistics (free software, GPL)

It boots in 205 seconds; that’s quite long but, hey, it’s Windows! When Linux was on this computer, I could work less than a minute after boot (with more software installed and more hard disk space left). Hard disk space left for data: 914 MB (it should be enough for an M.Sc. thesis).

Stream redirection in Python

Two computers are on the same network. A firewall segregates the internet from the intranet. Both computers can access everything on the intranet but only one of them is allowed to access the internet. The problem is to listen to a music stream from the computer that cannot access the internet.

A possible solution is to run stream redirection software on the computer that can access the internet. The computer that cannot access the internet can then get the stream from the intranet (figure below).

[Figure: illustration of the redirection]

Since I have recently begun playing with Python, I tried to write such a program in that language. Here is the code (preset for a Belgian radio, Pure FM):

#!/usr/bin/python
import socket
import traceback
import urllib2

# get the right input stream (in case it changes every day)
# change the address to suit your need (yes: user input needed!)
address = "http://old.rtbf.be/rtbf_2000/radios/pure128.m3u"
content = urllib2.urlopen(address)
playlist = content.readlines()
stream = playlist[0].strip()                  # first line of the .m3u playlist
stream = stream[len("http://"):]              # drop the "http://" prefix
inHost = stream[:stream.index(":")]
inPort = int(stream[stream.index(":")+1:stream.index("/")])
inPath = stream[stream.index("/"):]

# set output stream (default is localhost:50008)
outHost = ''
outPort = 50008

# get the in/out sockets
inSock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
inSock.connect((inHost, inPort))
outSock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
outSock.bind((outHost, outPort))

outSock.listen(1)
outNewSock, outAddress = outSock.accept()

# get the info from a *file*, not a simple host URL ...
inSock.send("GET " + inPath + " HTTP/1.0\r\nHost: " + inHost + "\r\n\r\n")

try:
    while 1:
        inData = inSock.recv(2048)
        if not inData:
            # the radio server closed the connection: stop relaying
            print "No data"
            break
        print "Read", len(inData), "bytes"
        outNewSock.send(inData)
        print "Sent data to out"
except Exception:
    traceback.print_exc()

# not really needed since the program will usually be stopped with Ctrl+C or the like:
outNewSock.close()
outSock.close()
inSock.close()

Now the computer that doesn’t have access to the internet can connect to port 50008 on the other computer to get the stream and listen to music. It was quite simple.
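For the record, here is a minimal client sketch to check that bytes are actually flowing; I call the machine running the script above “gateway”, but any intranet name or IP will do. A real player pointed at http://gateway:50008/ should also work, since the script above never reads the client’s request anyway:

#!/usr/bin/python
# Connect to the redirector and dump a couple of hundred kilobytes to a file.
# Note: the dump starts with the HTTP/ICY headers relayed from the radio server.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("gateway", 50008))      # "gateway" = the machine with internet access
out = open("test.mp3", "wb")
for i in range(100):
    data = sock.recv(2048)
    if not data:
        break
    out.write(data)
out.close()
sock.close()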

Note that if you are trying this within IDLE on MS-Windows, you’ll get some errors because of problems in the synchronisation of the mpeg stream.

Another scientific paper from the Poirrier-Falisse family!

Finally, a second scientific paper has been published by the Poirrier-Falisse family (a first paper for me):

Poirrier JE., Poirrier L., Leprince P., Maquet P. “Gemvid, an open source, modular, automated activity recording system for rats using digital video“. Journal of Circadian Rhythms 2006, 4:10 (full text, doi)

It is still a provisional PDF version, but it is already available on the web and Open Access (of course)! Here is my BibTeX entry. I will upload the source code to the project website tonight.
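For the curious, a minimal entry rebuilt from the citation above looks roughly like this (the article number stands in for page numbers; I leave the DOI field out here):

@article{Poirrier2006Gemvid,
  author  = {Poirrier, J. E. and Poirrier, L. and Leprince, P. and Maquet, P.},
  title   = {Gemvid, an open source, modular, automated activity recording
             system for rats using digital video},
  journal = {Journal of Circadian Rhythms},
  year    = {2006},
  volume  = {4},
  pages   = {10}
}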

Done some spot picking

[Photo: spot picker camera]

Today, I did some “spot picking”. In 2D electrophoresis, you separate proteins in a gel according to their electric charge and mass. You obtain a kind of map of proteins and, if you stain these proteins, you get a map of spots (example here). After some analysis, it can be worth identifying some proteins of interest. The problem is that they are in the gel! So, today, I used a robot called a “spot picker” that … picks the spots representing proteins of interest out of the gel. You can see what a spot picker looks like in my proteomic set on Flickr.

How to fight spam in a wiki?

On Friday, while waiting for a librarian to fetch the old articles I wanted to read, I spent a few minutes removing spam from the AEL wiki. This form of spam is very easy to spot because it is always the same: <small> HTML tags enclosing some 30 links, and the text linking to these sites contains well-known, adult-oriented spam words (see the end of the MsSecurity page, where I didn’t have time to remove the spam).

After the librarian gave me my articles, I went back to my lab thinking about a possible solution … This kind of spam is constant. Why not write a simple software bot that fetches each and every page of a wiki, checks whether there is litigious content in it, and then either moves on to the next page or cleans the content? This bot wouldn’t prevent spam, but it could act quickly after the fact (e.g. if you launch it every hour/day/week with cron).
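Something along these lines, in the same Python as the stream redirector above; this is only a rough sketch: the page list and the blacklist are made up, and the actual cleaning step is left out:

#!/usr/bin/python
# Walk a list of wiki page URLs and flag those whose raw HTML matches a small
# blacklist of patterns; a real bot would then log in and revert/clean the page.
import re
import urllib2

blacklist = re.compile(r"<small>.*(viagra|casino|poker).*</small>", re.I | re.S)

pages = ["http://wiki.example.org/SandBox",       # hypothetical page list; a real
         "http://wiki.example.org/MsSecurity"]    # bot would walk the whole wiki

for url in pages:
    html = urllib2.urlopen(url).read()
    if blacklist.search(html):
        print url, "looks spammed"
    else:
        print url, "looks clean"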

This afternoon, I figured I could not be the only one trying to find ways to fight spam on wikis. Indeed, fighting wiki spam already has a verb, “to chongq” (although it also includes retaliation), a Wikipedia page (even two) and many other dedicated pages.

Basically, there are three types of approaches to fighting spam: wiki-specific methods, general HTTP/web methods and manual actions.

  1. Wiki-specific methods are add-ons to your wiki system that help prevent spammers from modifying your wiki. For example, MediaWiki has its own anti-spam features and a Spam blacklist extension, TWiki has a Black List plugin, etc. Once set up, you generally do not need to care about them (except to check that they are working properly, to update them, etc.).
  2. General HTTP/web methods use general web mechanisms and/or special features independent of the wiki software you use. These systems are also automated: Bad Behaviour, CAPTCHA images, the “rel=nofollow” attribute on link tags (see the small example after this list), etc.
  3. Finally, manual actions can be taken by any human: removing spam as I did, renaming well-known wiki pages such as the sandbox, etc. The only advantage of this method is that the human brain easily adapts to new forms of spam. Otherwise, it’s rather time-consuming …
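As an illustration of the second category, here is roughly what the “rel=nofollow” trick amounts to, as a toy Python regex; real wiki and blog engines do this more carefully when they render user-submitted links:

#!/usr/bin/python
# Add rel="nofollow" to outbound links so that search engines ignore them,
# which makes the links worthless to spammers.
import re

html = '<p>Try <a href="http://spam.example.com/">this great site</a>!</p>'
print re.sub(r'<a href="(http://[^"]+)">',
             r'<a href="\1" rel="nofollow">', html)
# -> <p>Try <a href="http://spam.example.com/" rel="nofollow">this great site</a>!</p>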

Finally, I read that some spam bots remove spam, but only part of it. This is the kind of thing I would like to do, except that it should remove all the spam. (But before that one, I should start the simple, geeky blog software.)

Addendum on August 21st, 2006: independently of this post, Ploum wrote an interesting summary of a post by Mark Pilgrim (a rather old post: 2002!). In his post, Mark Pilgrim sees two ways of fighting spam: Club solutions and Lojack solutions.

With a Club solution, your wiki is protected against lazy spammers. Clubs are technical solutions that make it harder for spammers to vandalise your website/wiki/blog/etc. The Club works as long as not everyone has it. Once everyone has a Club, spammers will think a little and update their software to circumvent most of the Clubs. In conclusion, “the Club doesn’t deter theft, it only deflects it.”

With a Lojack solution, your wiki isn’t necessarily protected, but spammers who vandalise it can be traced back. “Although it does nothing to stop individual crimes, by making it easier to catch criminals after the fact, Lojack may make auto theft less attractive overall.”

My bot that completely removes spam is definitely not a Lojack. But it’s not a Club either. This tool would let you be spammed, and it would not trace spammers back. Still, it would make it less attractive for spammers to add links to wikis, since the links would be removed soon after being added.

(By the way, I’ve just noticed that comments were automatically forbidden on every post. That was not intentional.)