Tag: catrss

OPML output in catrss

A few days ago, I released the first version of catrss, a tool that concatenates RSS files to standard output. Today I added OPML output to it. Here it is in version 0.2 (.tar.gz file, 16 KB).

OPML is a file format first used in a commercial application. Now it’s widely used for the exchange of links between news aggregators. Because of that, I had to implement it in catrss: it’s a potential format for the output of catrss.
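To make the idea concrete, here is a minimal sketch of what writing an OPML list of feeds can look like with ElementTree. It is not the code from catrss: the function name `feeds_to_opml` and the layout (one `outline` per feed with `text`, `type` and `xmlUrl` attributes, the usual convention for subscription lists) are assumptions made for illustration, and it is written for a current Python 3 rather than Python 2.5.

```python
import xml.etree.ElementTree as ET


def feeds_to_opml(title, feeds):
    """Return an OPML document (as a string) for (feed_title, xml_url) pairs.

    Hypothetical helper: the element layout follows the common
    subscription-list convention, not necessarily what catrss emits.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for feed_title, xml_url in feeds:
        # One outline element per feed; all data lives in attributes.
        ET.SubElement(body, "outline", text=feed_title, type="rss", xmlUrl=xml_url)
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    print(feeds_to_opml("My feeds", [("Example blog", "http://example.org/rss.xml")]))
```

Note that only content goes into the document here; none of the optional window or scroll-state elements are written.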

But I’m not happy with this format. It’s nice, it does its job, it’s already available… OK. Other people have already complained about it (see here, for example). Besides the fact that it’s not standardized (it’s difficult to know exactly which software will correctly parse what you put in your outline tags), I dislike the fact that the official specification includes tags that describe what I consider to be the look, feel and behaviour of the content (expansionState, vertScrollState and the window* tags) and not the content itself. Of course, I’ve read that they are optional. Nevertheless, I don’t think it’s a good design decision to mix look and behaviour with content (not in this case).

I’m sorry for people not in computer science, but this is (again) another entry about programming, and I understand that the title and content may look a bit strange to them.

catrss 0.1

One day, one has to sit at his/her table and try to really understand how to deal with XML. Since I think I can only learn with a project in mind, I took Alexandre Dulaunoy’s mergerss suggestion and tried to develop my own catrss.

As the name implies, catrss is one of the many descendants of the cat command. It concatenates RSS files to standard output. In its simplest form, you just give it some RSS files to parse and it will concatenate them for you; the command is:

./catrss rssfile1.xml rssfile2.xml ...

If you want to see all the parameters you can set, just type “./catrss --help”. You’ll probably prefer to set your own title, link and description parameters since they are the only mandatory elements. One important point to keep in mind is that, by default, catrss only takes the 10 most recent items (e.g. blog entries) from all the files combined. You can change this value with the “-n” option.
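The behaviour described above (merge several feeds, keep only the most recent items) can be sketched in a few lines. This is an illustration under stated assumptions, not the actual catrss source: `merge_rss` is a hypothetical function, it takes the feeds as XML strings instead of file names, it silently skips items without a readable pubDate, and it targets a current Python 3.

```python
import xml.etree.ElementTree as ET
from email.utils import mktime_tz, parsedate_tz


def merge_rss(sources, title, link, description, limit=10):
    """Merge RSS 2.0 documents (XML strings), keeping the `limit` newest items."""
    dated_items = []
    for xml_text in sources:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            parsed = parsedate_tz(item.findtext("pubDate") or "")
            if parsed is None:
                continue  # assumption: items without a usable date are dropped
            dated_items.append((mktime_tz(parsed), item))
    # Newest first; sort on the timestamp only.
    dated_items.sort(key=lambda pair: pair[0], reverse=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the mandatory channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for _, item in dated_items[:limit]:
        channel.append(item)
    return ET.tostring(rss, encoding="unicode")
```

The default of `limit=10` mirrors the default described above; passing another value plays the role of the “-n” option.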

For the moment, catrss is only available here (.tar.gz file, 16 KB). The archive contains the catrss program, its source code and two example RSS files. The code is licensed under the GNU GPL. You only need Python 2.5 to run catrss (it’s probably already installed on any GNU/Linux computer).

Currently, it only works with RSS 2.0 files and it’s very picky about dates (for example, it doesn’t work with this blog’s own RSS feed, what a shame!). But all this could be improved for version 0.2. Suggestions, bug reports and patches are welcome.
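One possible way to be less picky about dates is to try the RFC 822 form that RSS 2.0 expects first, then fall back to a few other common layouts. The sketch below is only an illustration of that idea: `parse_feed_date` and the list of fallback formats are assumptions, not what catrss does, and the fallbacks treat the date as UTC.

```python
import time
from calendar import timegm
from email.utils import mktime_tz, parsedate_tz

# Hypothetical fallback layouts, all interpreted as UTC.
FALLBACK_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_feed_date(text):
    """Return seconds since the epoch (UTC), or None if the date is unreadable."""
    text = text.strip()
    parsed = parsedate_tz(text)  # RFC 822 style, e.g. "Wed, 03 Jan 2007 10:00:00 GMT"
    if parsed is not None:
        return mktime_tz(parsed)
    for fmt in FALLBACK_FORMATS:
        try:
            return timegm(time.strptime(text, fmt))
        except ValueError:
            continue  # try the next layout
    return None
```

Returning None instead of raising lets the caller decide whether to skip the item or stop with an error.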

Finally, dealing with XML in Python is very easy. The ElementTree documentation is quite good. And, apart from other Unix-minded tools, there is plenty of other cool stuff one can do with XML: parse answers from the Yahoo API, deal with XML-RPC and other web services, …

Of course, it’s when you are struggling to feed XML into your program that you realize other people have already developed what you are working on: I’ve found at least 5 RSS parsers/generators [1, 2, 3, 4, 5] and 3 tutorials [1, 2, 3]. But I’m proud to say I didn’t use any of these references for catrss.