Sooner or later, you have to sit down and really try to understand how to deal with XML. Since I think I can only learn with a project in mind, I took Alexandre Dulaunoy’s mergerss suggestion and tried to develop my own tool, catrss.
As the name implies, catrss is one of the many descendants of the cat command: it concatenates RSS files to standard output. In its simplest form, you just give it some RSS files to parse and it concatenates them for you; the command is:
./catrss rssfile1.xml rssfile2.xml ...
If you want to see all the parameters you can set, just type “./catrss --help”. You’ll probably want to set your own title, link and description parameters, since they are the only mandatory elements of an RSS channel. One important point to keep in mind is that, by default, catrss only takes the 10 most recent items (blog entries, for example) from all the files. You can change this value with the “-n” option.
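To make the idea concrete, here is a minimal sketch of that kind of merge. It is not the actual catrss source: the function name merge_rss and its parameters are made up for illustration, it is written for Python 3 rather than the Python 2.5 that catrss targets, and it assumes every pubDate follows the RFC 822 style that RSS 2.0 expects.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def merge_rss(documents, title, link, description, limit=10):
    """Merge RSS 2.0 documents (given as strings) into one feed string.

    Hypothetical helper for illustration, not part of catrss itself.
    """
    items = []
    for text in documents:
        root = ET.fromstring(text)
        # RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
        for item in root.findall("./channel/item"):
            pub_date = item.findtext("pubDate")
            if pub_date is None:
                continue  # undated items cannot be ranked, so skip them
            items.append((parsedate_to_datetime(pub_date), item))

    # Newest first; only the dates are compared, never the elements.
    items.sort(key=lambda pair: pair[0], reverse=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the mandatory channel elements.
    for tag, value in (("title", title), ("link", link),
                       ("description", description)):
        ET.SubElement(channel, tag).text = value
    # Keep only the `limit` most recent entries across all inputs.
    for _, item in items[:limit]:
        channel.append(item)
    return ET.tostring(rss, encoding="unicode")
```

The limit argument plays the role of the “-n” option: all items from all inputs go into one list, and only the most recent ones survive.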
For the moment, catrss is only available here (.tar.gz file, 16 KB). The file contains the catrss program, its source code and two example RSS files. The code is licensed under the GNU GPL. You only need Python 2.5 to run catrss (it’s probably already installed on any GNU/Linux computer).
Currently, it only works with RSS 2.0 files and it’s very picky about dates (for example, it doesn’t work with this blog’s RSS feed, what a shame!). But all this could be improved in version 0.2. Suggestions, bug reports and patches are welcome.
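One possible direction for the date problem is to try more than one format before giving up. The sketch below is only an idea, not what catrss does: parse_feed_date is a hypothetical helper, written for Python 3, that tries the RFC 822 style first and falls back to an ISO 8601 style string. Treating dates without a time zone as UTC is an assumption of mine.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Return an aware datetime for `text`, or None if no format matches."""
    text = text.strip()
    try:
        # RFC 822 style, e.g. "Wed, 03 Jan 2007 10:00:00 GMT".
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Depending on the Python version, a failed parse raises one
        # or the other, so both are caught here.
        try:
            # ISO 8601 style fallback, e.g. "2007-01-03T10:00:00+00:00".
            parsed = datetime.fromisoformat(text)
        except ValueError:
            return None
    if parsed.tzinfo is None:
        # Assumption: dates without a zone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Returning None instead of raising lets the caller decide whether to skip an item with an unreadable date or to stop with an error.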
Finally, dealing with XML in Python is very easy. The ElementTree documentation is quite good. And, apart from Unix-minded tools like this one, there are plenty of other cool things one can do with XML: parse answers from the Yahoo API, deal with XML-RPC and other web services, …
Of course, it’s when you are struggling to feed XML into your program that you realize other people have already developed what you are working on: I’ve found at least 5 RSS parsers/generators [1, 2, 3, 4, 5] and 3 tutorials [1, 2, 3]. But I’m proud to say I didn’t use any of these references for catrss.