Category: Open Source

I give up!

OK, I am giving up the Jadoo project. It could have been a very interesting project, but if I don’t give up now, it will stay in my mind and prevent me from starting new projects or continuing more important ones. It’s only a temporary goodbye, though: who knows how much free time I’ll have in 2 or 3 months.

For those interested in Jadoo, here is the short story: everything started with a post on Alexandre Dulaunoy’s blog, then I tried a first version and finally I wrote a small update. Files are still here: jadwrite.py and jadpub.py.

I think the two main drawbacks of Jadoo are that I have to update files manually (a Python script could have done the job) and that I cannot post from any computer (I need Python, a text editor and all the previous posts, or at least the last 10 of them, in order to update the blog). These are points to think about before a possible comeback.

Some news about Jadoo

Here is some news about the Jadoo blog engine …

  • I updated the CSS file (2 KB) and corrected some mistakes; now, all HTML/CSS tags are used correctly.
  • I updated the main script so that all tags now link to Technorati.
  • I also updated the footer (i.e. the sidebar on the published page); it now includes the Technorati search box.
  • I added the blogroll. It does not show links in random order like many other blog engines do. But do we really need that feature?

Some tasks still need to be done:

  • Recover all my previous posts and add them here (this will be done in PHP since everything is still in the DB on this website).
  • Figure out how to show previous posts. I want it to be fast and simple for the engine (no DB, remember?), and simple and easy to navigate for the reader.
  • Correct the RSS file (1 KB) by adding the post content to it, and see why it doesn’t seem to work in most RSS feed readers.
  • Think about how people can send comments. For the moment, readers can’t send comments. It’s a feature, not a bug, and it’s also like that in some other blog engines (like oddmuse, which uses “comment pages” instead). The vast majority of blog engines do include a comment tool. Do I (or you) need it?

Although I didn’t do an in-depth (Merise/UML) analysis of what a blog engine should do and how it should do it (programmatically), I can say that a blog engine is really easy to write, at least for its most common tasks like publishing posts. Nearly everything else is static text (which could be dynamically updated with PHP if you want) plus a CSS style sheet. That’s why Jadoo can even work with non-PHP hosting plans. And you have a backup of all your posts, so you don’t need to worry about dumping a database, etc. 🙂

Any further comments and/or advice are welcome (by e-mail).

First tryout of Jadoo

Hi

This is my first post with the Jadoo blog engine. As I stated before, I was planning to write my own blog software with these goals:

  • Simplicity
  • No PHP or any other client-side script
  • All the processing done in Python, offline
  • No DB
  • (maybe some other goals, but I don’t remember them right now)

I’ll try to apply the “release early, release often” principle (it comes from Eric S. Raymond’s “The Cathedral and the Bazaar”, if I remember correctly). The workflow is: before writing an entry, launch jadwrite.py; to create the HTML files, launch jadpub.py; then upload the HTML files with your FTP client (the scripts are highly customized for my blog for the moment, and not everything respects the standards yet). I’ll also try to retrieve all my previous posts (but their URLs will change; the RSS URL has also changed). For the moment, though, I have other important work to do … There is no comment system for the moment (I don’t know if there will be one in the future), but you can send me comments and requests at jepoirrier@gmail.com.
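
For the curious, here is roughly the idea behind jadpub.py. This is only a minimal sketch under my own assumptions (the entries/ and html/ directories, the one-file-per-post layout and the HTML template below are invented for the example; the real script obviously does more): it simply turns plain-text entries into static HTML pages, offline and without any DB.

#!/usr/bin/python
# Minimal sketch of an offline publishing step (NOT the real jadpub.py):
# read plain-text entries from entries/ and write static HTML pages to html/.
import os

ENTRIES_DIR = 'entries'   # one .txt file per post: first line = title, rest = body
OUTPUT_DIR = 'html'

PAGE = """<html>
<head><title>%(title)s</title>
<link rel="stylesheet" type="text/css" href="style.css" /></head>
<body><h1>%(title)s</h1><div class="post">%(body)s</div></body>
</html>"""

if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)

for name in os.listdir(ENTRIES_DIR):
    if not name.endswith('.txt'):
        continue
    f = open(os.path.join(ENTRIES_DIR, name))
    lines = f.readlines()
    f.close()
    title = lines[0].strip()
    paragraphs = [l.strip() for l in lines[1:] if l.strip()]
    body = '<p>' + '</p>\n<p>'.join(paragraphs) + '</p>'
    out = open(os.path.join(OUTPUT_DIR, name.replace('.txt', '.html')), 'w')
    out.write(PAGE % {'title': title, 'body': body})
    out.close()

You then upload the content of html/ with your FTP client and that’s it: the published site is pure HTML and CSS.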

Oh yes, and I’ll try to add a decent CSS 🙂

Plugins for Digital Object Identifier lookup

I’ve just written some “search plugins” for Firefox (1.x and 2.x) that allow you to quickly look up a specific Digital Object Identifier (DOI). DOIs are used more and more in the biomedical sciences. One of their interesting features is that they allow direct linking to the scientific article.

The plugins are available here. If you already have Firefox 2, the installation procedure is very easy: just go to the plugins page, click on the small arrow next to the Firefox search box and choose the “Add DOI lookup” option; the plugin will then be installed automatically.
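
Behind the scenes, a DOI lookup is nothing more than building a URL on the dx.doi.org resolver, which then redirects you to the publisher’s page for the article (presumably, that is what the search plugin’s URL template does). Just to illustrate the idea in Python (doi2url.py is a hypothetical name, not one of the plugins):

#!/usr/bin/python
# doi2url.py (hypothetical): resolve a DOI to the publisher's article URL
# by following the redirection of the dx.doi.org resolver.
import sys
import urllib

doi = sys.argv[1]                        # e.g. 10.1186/1740-3391-4-10
lookup = 'http://dx.doi.org/' + urllib.quote(doi)
page = urllib.urlopen(lookup)            # the resolver redirects us to the article page
print page.geturl()                      # final URL after the redirection

For example, ./doi2url.py 10.1186/1740-3391-4-10 should print the URL of the article at the publisher’s website.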

OSS/FS players about GPL Java

Sun opened Java in the most elegant way possible (imho): the licence is the GPL. This move was analysed and commented on by many people. Even some important Open Source/Free Software players gave their comments on a Sun website. Unfortunately, those comments are only available in a proprietary video format.

You can now have access to audio recordings of these interviews (Brian Behlendorf, Paul Cormier, Eben Moglen, Tim O’Reilly, Mark Shuttleworth, Richard Stallman and Dr. Marcelo K. Zuffo), to a text transcript and even to SHA1 sums of the audio files!

Simple Sitemap.xml builder

In a recent post, Alexandre wrote about web indexing and pointed to a nice tool for webmasters: the sitemap. The Sitemap Protocol “allows you to inform search engines about URLs on your websites that are available for crawling” (since it’s a Google creation, it seems that only Google is using it, according to Alexandre).

If you have shell access to your web server and Python on it, Google provides a nice Python script to automatically create your sitemap.

I don’t have any shell access to my web server 😦 But I can write a simple Python script 🙂 Here it is: sitemapbuilder.py (4 KB). After you have specified the local directory where all your files are and the base URL of your online website (yes, you need to edit the script), launch the script and voilà! You can then upload the resulting sitemap.xml file and tell web crawlers where to find the information on your website.

Interested in the other options you can specify?

  • You can specify an array of accepted file extensions. By default, I’ve put ‘htm’, ‘html’ and ‘php’, but you can add ‘pdf’ if you want.
  • You can specify an array of filenames to strip. By default, I strip all ‘index.*’ (with * = one of the accepted extensions) because http://www.poirrier.be is the same as http://www.poirrier.be/index.html but easier to remember.
  • You can specify a change frequency (it will be the same for all files).
  • You can specify a priority (also the same for all files, and even omitted if it is equal to the default value).

On the technical side, there is nothing fancy (I don’t even use an XML library to generate the file). I was impressed by how easy it is to walk through a directory tree with Python’s os.walk function (it could be interesting to use it in the future blog system I mentioned earlier). A stripped-down sketch of the approach is shown below.
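
This sketch is not the actual sitemapbuilder.py: the local directory, the base URL and the defaults below are placeholders, it ignores the priority option and it simply prints to the standard output:

#!/usr/bin/python
# Stripped-down sitemap builder sketch (not the real sitemapbuilder.py):
# walk a local copy of the website and print a <url> entry for each matching file.
import os

localdir = '/home/me/website'            # local copy of the site (placeholder)
baseurl = 'http://www.example.org/'      # base URL of the online site (placeholder)
extensions = ['htm', 'html', 'php']      # accepted file extensions
strip = ['index.' + e for e in extensions]   # index.* == the directory itself
changefreq = 'monthly'                   # same change frequency for all files

print '<?xml version="1.0" encoding="UTF-8"?>'
print '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
for root, dirs, files in os.walk(localdir):
    for name in files:
        if name.split('.')[-1] not in extensions:
            continue
        relpath = os.path.join(root, name)[len(localdir):].lstrip('/')
        if name in strip:
            relpath = relpath[:-len(name)]   # keep only the directory part
        print '  <url>'
        print '    <loc>' + baseurl + relpath + '</loc>'
        print '    <changefreq>' + changefreq + '</changefreq>'
        print '  </url>'
print '</urlset>'

Redirect the output to a sitemap.xml file and you are done; the real script also handles the priority option described above.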

Finally, you can see the sitemap.xml generated for my family website.

Edit on November 17th: it seems that Google, Yahoo! and Microsoft teamed up to release a “more official” sitemaps specification: http://www.sitemaps.org/.

Automated Pubmed reference to BibTeX

In biology, we often need to use PubMed, a search engine for biomedical citations from MEDLINE and other life science journals.

In the MS-Windows world, there are nice, proprietary tools (like Reference Manager or EndNote) that retrieve citations from PubMed, store them in a database and let you use them in proprietary word processing software (in fact, in MS-Word only, since neither WordPerfect nor OpenOffice.org is supported). If you are using BibTeX (for LaTeX) as your citation repository, there aren’t many tools. The best one, imho, is JabRef, a free reference manager written in Java (for me, the only “problem” is that it adds custom, non-BibTeX tags). Or you can edit the BibTeX file yourself with any text editor.

The problem with manual editing is that it is error-prone (even when copying/pasting from the web). Since Python programming is my hobby of the moment, there are two solutions to this problem:

  1. Use Biopython to get a reference from PubMed, but are you ready to pull in a huge module dependency just to use one function?
  2. Write your own Python script, using a PubMed URL to download the reference and a little bit of XML parsing to extract the relevant info (one could use the ESearch and EFetch tools properly, but my lazy nature tells me to simply use the URL).

Obviously, I chose to write my own Python script. Each reference retrieved in the PubMed XML format (see this example and the full DTDs) should become a BibTeX entry like this:

@article{poirrier06,
  author = {Poirrier, J.E. and Poirrier, L. and Leprince, P. and
Maquet, P.},
  title = {Gemvid, an open source, modular, automated activity
recording system for rats using digital video},
  year = 2006,
  journal = {Journal of circadian rhythms},
  volume = 4,
  pages = {10},
  pmid = 16934136,
  doi = {10.1186/1740-3391-4-10}
}

The script is here (4 KB). First, use PubMed to check the reference you want, then take its PubMed ID (PMID) and launch the program, redirecting the output to your BibTeX file, for example:

./pyP2B.py 16934136 >> myrefs.bib

If you like, you can edit the script to change the tab size (here = 2).

How does it work?

  1. With PubMed, I do not use the official query tools but a plain HTTP query; it is much simpler. The script asks PubMed for the citation corresponding to the given PMID. Since it gets an HTTP answer, I need to parse this answer and replace some HTML entities in order to obtain a valid XML file.
  2. Once I have the XML file, and after some checking, I use XPath expressions with lxml (for me, XPath is quick and dirty compared to writing a DOM/SAX parser, but it works!).
  3. Then the script simply prints the result to the standard output (even if it is an error; an improvement would be to print errors to the error output). You simply need to redirect this output into your BibTeX file. A rough sketch of the whole process is shown after this list.
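
This sketch is not pyP2B.py itself: it assumes NCBI’s EFetch URL and the usual element names of the PubMed XML format, it extracts only a few fields, and it does no entity clean-up, checking or error handling:

#!/usr/bin/python
# Rough sketch of the PMID -> BibTeX idea (not the real pyP2B.py):
# download the PubMed XML for one PMID and print a minimal BibTeX entry.
import sys
import urllib
from lxml import etree

pmid = sys.argv[1]
url = ('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
       '?db=pubmed&retmode=xml&id=' + pmid)
xml = urllib.urlopen(url).read()
tree = etree.fromstring(xml)

article = tree.find('.//MedlineCitation/Article')
title = article.findtext('ArticleTitle')
journal = article.findtext('Journal/Title')
year = article.findtext('Journal/JournalIssue/PubDate/Year')
volume = article.findtext('Journal/JournalIssue/Volume')
pages = article.findtext('Pagination/MedlinePgn')
authors = [a.findtext('LastName') + ', ' + a.findtext('Initials')
           for a in article.findall('AuthorList/Author')]

print '@article{pmid' + pmid + ','
print '  author = {' + ' and '.join(authors) + '},'
print '  title = {' + title + '},'
print '  journal = {' + journal + '},'
print '  year = ' + year + ','
print '  volume = ' + volume + ','
print '  pages = {' + pages + '},'
print '  pmid = ' + pmid
print '}'

The real script obviously does more: it checks the answer, replaces the entities, adds the DOI when it is available and formats the entry like the example above.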

Edit on October 23rd: this script has errors when dealing with non-ASCII characters like “ö” in Angelika Görg. I won’t fix it for the moment.

Playing with Python and Gadfly

Following my previous post where I retrieved EXIF tags from photos posted on Flickr, here is the next step: my script now stores data in a database.

There are a lot of free database wrappers for Python. Although I first thought of using pysqlite (because I am already using SQLite in another project), I decided to use Gadfly, a real SQL relational database system entirely written in Python. It does not need a separate server, it complies with the Python DB-API (allowing easy changes of DB system later) and it’s free.

Using Gadfly is very easy and its tutorial is very clear. Put together, here is an example of creating a database, adding some data and retrieving it (testGadfly.py):

#!/usr/bin/python
# test for Gadfly: create (or open) a small database, insert one row, read everything back
import os
import gadfly
import time

DBdir = 'testDB'    # directory where Gadfly stores its files
DBname = 'testDB'   # name of the database

if os.path.exists(DBdir):
    # the database already exists: simply connect to it
    print 'Database already exists. I will just open it'
    connection = gadfly.gadfly(DBname, DBdir)
    cursor = connection.cursor()
else:
    # first run: create the directory, the database and the table
    print 'Database not present. I will create it'
    os.mkdir(DBdir)
    connection = gadfly.gadfly()
    connection.startup(DBname, DBdir)
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE camera (t FLOAT, make VARCHAR, model VARCHAR)")

print 'Add some items'
t = float(time.time())
cmake = 'Nikon'
cmodel = 'D400'
# note: only the VARCHAR values are quoted, not the FLOAT one
cmd = ("INSERT INTO camera (t, make, model) VALUES (" +
       str(t) + ", '" + cmake + "', '" + cmodel + "')")
cursor.execute(cmd)

print 'Retrieve all items'
cursor.execute("SELECT * FROM camera")
for x in cursor.fetchall():
    print x

# do not forget to commit, otherwise the inserted row is not saved
connection.commit()
print 'Done!'

Regarding the initial project, the script became too long to be pasted in this post, but you can download it here: flickrCameraQuantifier2.py (5 KB). To run it, you need the EXIF and Flickr wrappers as well as the Gadfly DB system installed. At the beginning of the script, you can define the total number of iterations (sets of queries) you want (variable niterations), the sleep duration between queries (variable sleepduration) and the number of photos to get for each query (variable nphotostoget). Everything will then be stored in a Gadfly database (default name: cameraDB). If you want to read what is stored, here is a very basic script: flickrCQ2Reader.py.

For example, I’ve just run 125 queries (with 5 s between each query). I got 88 photos (70.4% of the queries), of which 27 photos had no EXIF tags (30.68% of all the photos). Among the camera makers, Canon has 27%, Fuji 11%, Nikon 18% and Sony 21% of all the photos with EXIF tags at that moment. This is approximately what Flagrant disregard found. I don’t have time anymore, but one could improve the data-reading script in order to automate the statistics and their presentation; a rough sketch of that idea follows …
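
As a starting point for such an improvement, here is a sketch of the reading side (it assumes the default cameraDB name for both the database and its directory, and the same camera(t, make, model) table layout as in testGadfly.py above; the real script may differ):

#!/usr/bin/python
# Sketch of an automated statistics step (assumptions: cameraDB database and
# directory, and a camera table with a make column, as in testGadfly.py above).
import gadfly

connection = gadfly.gadfly('cameraDB', 'cameraDB')
cursor = connection.cursor()
cursor.execute("SELECT make FROM camera")
rows = cursor.fetchall()

# count the photos per camera maker in plain Python (no SQL aggregates needed)
counts = {}
for (make,) in rows:
    counts[make] = counts.get(make, 0) + 1

total = len(rows)
print '%i photos with EXIF tags' % total
for make in sorted(counts, key=counts.get, reverse=True):
    print '%-10s %3i (%.1f%%)' % (make, counts[make], 100.0 * counts[make] / total)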

Edit on October 9th, 2006: added the links to the missing scripts