Month: November 2006

Double quotes!

GGGRRrrrrrrr … I was quietly using R to analyse my data when, suddenly, I wasn’t able to open the file containing these data anymore. It’s just a plain text file! How can it be corrupted? Here is the error message:

t <- read.table('ratsdata.csv', header=TRUE, sep=",")
Warning message:
incomplete final line found by readTableHeader on 'ratsdata.csv'

For hours, I tried everything: I counted the number of separators on each line, I counted the number of decimal points on each line, I removed the double quotes around factors, I examined the final line in detail, etc. (well, Python scripts did the job for me because my file already has more than 700 lines). In the end, the cause was really dumb: I had mistakenly deleted one double quote before a header. My first line looked like (1) and should have looked like (2):

1: "id", "group", "trial", "durTot", durExt" [...]
2: "id", "group", "trial", "durTot", "durExt" [...]

OK, next time, I’ll add this check to my list …
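For the record, here is a minimal sketch of what such a check could look like in Python. It is not the script I used during the debugging session; it simply flags every line that contains an odd number of double quotes, which is what the broken header (1) looks like. The function name and the sample rows are made up for the illustration.

```python
import io


def find_unbalanced_quotes(lines):
    """Return (line_number, text) pairs for lines with an odd number of double quotes."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        if line.count('"') % 2 != 0:
            suspects.append((number, line.rstrip("\n")))
    return suspects


# Hypothetical sample: the broken header from (1) followed by an invented data row.
sample = io.StringIO(
    '"id", "group", "trial", "durTot", durExt"\n'
    '1, "A", 1, 12.5, 3.2\n'
)

for number, text in find_unbalanced_quotes(sample):
    print("line %d has an odd number of double quotes: %s" % (number, text))
```

On a real file, the same function can be fed an open file object, e.g. `find_unbalanced_quotes(open('ratsdata.csv'))`. It will not catch every quoting problem (two missing quotes on the same line cancel out), but it would have saved me a few hours here.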

White & Nerdy

It’s Sunday, let’s rest a little bit … I really liked Al Yankovic’s video “White & Nerdy”. To fully understand it, you need some basic technical background and a friend who looks like the white & nerdy guy in the video. Because, of course, you are not like him 😉

“Don’t download this song” is also great (some background about DRM helps).

“It’s all about the Pentiums” is not my style of music but some of the lyrics are good.

If you don’t like Flash movies, you can use the Firefox OOk video plug-in to obtain the FLV file and then play it with VLC or convert it with ffmpeg, for example.

Simple Sitemap.xml builder

In a recent post, Alexandre wrote about web indexing and pointed to a nice tool for webmasters: the sitemap. The Sitemap Protocol “allows you to inform search engines about URLs on your websites that are available for crawling” (since it’s a Google creation, it seems that only Google is using it, according to Alexandre).

If you have shell access to your webserver and Python installed on it, Google has a nice Python script to automatically create your sitemap.

I don’t have any shell access to my webserver 😦 But I can write a simple Python script 🙂 Here it is: sitemapbuilder.py, 4 KB. After specifying the local directory where all your files are and the base URL of your online website (yes, you need to edit the script), launch the script and voilà! You can now upload your sitemap.xml file and tell web crawlers where to find the information on your website.

Interested in the other options you can specify?

  • You can specify an array of accepted file extensions. By default, I’ve put ‘htm’, ‘html’ and ‘php’ but you can add ‘pdf’ if you want.
  • You can specify an array of filenames to strip. By default, I strip all ‘index.*’ (with * = one of the accepted extensions) because http://www.poirrier.be is the same as http://www.poirrier.be/index.html but easier to remember
  • You can specify a change frequency (it will be the same for all files)
  • You can specify a priority (it will also be the same for all files and even omitted if equal to the default value)

On the technical side, there is nothing fancy (I don’t even use any XML tool to generate the file). I was impressed by how easy it is to walk through a directory tree with the Python os.walk function (it could be interesting to use it in the future blog system I mentioned earlier).
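To give an idea of the approach, here is a stripped-down sketch. It is not the actual sitemapbuilder.py: the function name, the parameter names and the "monthly" default are assumptions made for this illustration, and the namespace is the one published on sitemaps.org. It walks the local directory with os.walk, keeps the files with an accepted extension, strips the index.* filenames and writes the XML by hand.

```python
import os
from xml.sax.saxutils import escape


def build_sitemap(local_dir, base_url,
                  extensions=("htm", "html", "php"),
                  strip_names=("index",),
                  changefreq="monthly",
                  priority=None):
    """Return the text of a sitemap.xml for the files found under local_dir."""
    urls = []
    for root, dirs, files in os.walk(local_dir):
        dirs.sort()  # visit sub-directories in a stable order
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lstrip(".").lower() not in extensions:
                continue  # not an accepted file type
            rel = os.path.relpath(os.path.join(root, name), local_dir)
            rel = rel.replace(os.sep, "/")
            if stem in strip_names:
                rel = rel[:-len(name)]  # "sub/index.html" becomes "sub/"
            urls.append(base_url.rstrip("/") + "/" + rel)

    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in urls:
        lines.append("  <url>")
        lines.append("    <loc>%s</loc>" % escape(url))
        lines.append("    <changefreq>%s</changefreq>" % changefreq)
        # 0.5 is the protocol's default priority, so it can be left out.
        if priority is not None and priority != 0.5:
            lines.append("    <priority>%.1f</priority>" % priority)
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

A call such as `build_sitemap("/path/to/local/site", "http://www.example.org")` (both values are placeholders) returns the XML as a string, ready to be written to sitemap.xml and uploaded.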

Finally, you can see the sitemap.xml generated for my family website.

Edit on November 17th: it seems that Google, Yahoo! and Microsoft teamed up to release a “more official” sitemaps specification: http://www.sitemaps.org/.