Category: Open Source

Visualizing categorical data in mosaic with R

A few posts ago I wrote about my discomfort with stacked bar graphs and my preference for simple tables with gradient backgrounds. My only regret then was that the table was built in a spreadsheet. I would have liked to keep the data as it is and still get a nice representation of these categorical data.

This evening I spent some time analysing results from a survey and took the opportunity to build these representations in R.

The exact topic of the survey doesn’t matter here. Let’s just say it was a survey collecting opinions about, and recommendations for, some people. The two questions were:

  1. What did you think of these persons last year? Possible answers were: very bad, bad, average, good or very good.
  2. Would you recommend these persons for next year? Possible answers were simply yes or no.

For the first question, the data was collected in a text file with three fields: Person, Opinion and Count. The data looked similar to this:

Person,Opinion,Count
Person 1,Very bad,0
Person 1,Bad,0
Person 1,Average,4
Person 1,Good,9
Person 1,Very good,3
Person 2,Very bad,3
Person 2,Bad,4
Person 2,Average,4
Person 2,Good,5
Person 2,Very good,0
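Before plotting, it can help to check what the tiles will actually show. Here is a minimal base-R sketch (no ggplot2 needed) that types in the sample values above instead of reading resultsQ1.txt, turns Opinion into a factor carrying the natural order of the answers, and cross-tabulates the counts into the Person × Opinion table that the tiles colour:

```r
# Sample answers to question 1, typed in by hand (same values as the
# resultsQ1.txt excerpt above) so that the sketch runs without the file.
data1 <- data.frame(
  Person  = rep(c("Person 1", "Person 2"), each = 5),
  Opinion = rep(c("Very bad", "Bad", "Average", "Good", "Very good"), times = 2),
  Count   = c(0, 0, 4, 9, 3, 3, 4, 4, 5, 0)
)

# Natural order of the answers; as plain strings they would be sorted
# alphabetically ("Average", "Bad", "Good", "Very bad", "Very good").
scale_count <- c("Very bad", "Bad", "Average", "Good", "Very good")
data1$Opinion <- factor(data1$Opinion, levels = scale_count)

# One row per person, one column per answer, columns in the declared order.
tab1 <- xtabs(Count ~ Person + Opinion, data = data1)
print(tab1)
```

Once Opinion is a factor with these levels, ggplot2 also uses that order for the x axis, which is an alternative to fixing the order with xlim(). The name tab1 is only illustrative.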

The trick is to use geom_tile() (from ggplot2) to display each count as a coloured tile. Some additional work is needed to get the Opinion categories in the right order, because by default they are sorted alphabetically. The code is the following:

library(ggplot2)
# Read the comma-separated answers (columns: Person, Opinion, Count)
data1 <- read.table("resultsQ1.txt", header = TRUE, sep = ",")
# Answer categories in their natural (not alphabetical) order
scale_count <- c("Very bad", "Bad", "Average", "Good", "Very good")
ggplot(data1, aes(x = Opinion, y = Person)) +
  geom_tile(aes(fill = Count)) +
  # Discrete x limits: draw the columns in the order given above
  xlim(scale_count) +
  scale_fill_gradient(low = "white", high = "blue") +
  theme_bw() +
  ggtitle("Opinion on persons")  # opts(title = ...) no longer exists in ggplot2

And the graph looks like this:

For the second question, the data was also collected in a text file with three fields: Person, Reco and Count. The data looked similar to this:

Person,Reco,Count
Person 1,Recommend,16
Person 1,Do not recommend,0
Person 2,Recommend,5
Person 2,Do not recommend,11
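Here both persons received 16 answers in total, so raw counts compare fairly. If the number of respondents differed from one person to another, colouring the tiles by share rather than by count could be safer. A minimal base-R sketch, again typing in the sample values above instead of reading resultsQ2.txt (the names tab2 and share2 are only illustrative):

```r
# Sample answers to question 2 (same values as the resultsQ2.txt excerpt above).
data2 <- data.frame(
  Person = c("Person 1", "Person 1", "Person 2", "Person 2"),
  Reco   = c("Recommend", "Do not recommend", "Recommend", "Do not recommend"),
  Count  = c(16, 0, 5, 11)
)
data2$Reco <- factor(data2$Reco, levels = c("Recommend", "Do not recommend"))

# Counts as a Person x Reco table.
tab2 <- xtabs(Count ~ Person + Reco, data = data2)

# Divide each row by its total (margin = 1): share of each answer per person.
share2 <- prop.table(tab2, margin = 1)
print(round(share2, 2))
```

as.data.frame(share2) turns the table back into a data frame whose shares sit in a column named Freq, which could then be used as the fill of the tiles.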

And we use almost the same code:

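library(ggplot2)
# Read the answers to question 2 (columns: Person, Reco, Count)
data2 <- read.table("resultsQ2.txt", header = TRUE, sep = ",")
ggplot(data2, aes(x = Reco, y = Person)) +
  geom_tile(aes(fill = Count)) +
  scale_fill_gradient(low = "white", high = "darkblue") +
  theme_bw() +
  ggtitle("Recommendations")  # replaces the removed opts(title = ...)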

And the graph for the second question looks like this:

Easy, isn’t it? Do you have other types of visualization for this kind of data?

Funny update of ql2400 and ql2500 devices in Fedora 14

ql2400 and ql2500 update in Fedora 14

Some people think it’s a joke (see kalev’s comment on 2011-09-17 19:13:44 in the bugfix report). I agree it’s funny, but I won’t install this update: refusing it at least gives me the feeling that I still have a say over my own system (that’s also what free software is for, isn’t it?).

I also like what the (same) submitter wrote for the first update:

Updated qlogic 2400 and 2500 firmware to 5.03.13. What does 5.03.13 do? No one knows, except for QLogic, and they’re not telling. I asked, and they told me that information was only available under NDA. So, I encourage you to imagine what this firmware does, and the bugs it fixes. While you’re at it, imagine a world where vendors release source code for their firmware.

References, references, references!

Both when I studied biology and when I did my Ph.D., our professors were always after us about references. I think that, with their precious help, we learnt the art of referencing: choosing good references, citing them at the appropriate place in a text and, of course, giving enough information at the bottom of the text to allow the reader to find these references.

I just finished reading two articles in a recent edition of The Economist and they reminded me how important these references are. These articles are What would Jesus hack? and Worrying about wireless.

First, an aside: it may be an editorial choice, but I would rather know who wrote an article than read it anonymously. I don’t have (and won’t have) anything personal against any author. I just like to know whether I’m reading something written by a young Mr. I-know-everything with no background in the topic of the article or by a Mrs Specialist who appears to work in the field she’s writing about. On this blog, you can find out who I am in the “About” section in the bar above.

In What would Jesus hack? the anonymous author throws together everything and anything to make a story. And actually it works: the sequence of statements in the article has some logic. From an outside point of view you may even think it’s a nice article. You discover news and organisations that you may have missed: an opinion from Antonio Spadaro in “Hacker ethics and Christian vision” (Google translation of the abstract), the reply from Eric S. Raymond, Elèutheros, … But you will also be staggered by the hotchpotch mixing Open Source, internet, Twitter, … Why not add Facebook then, the archetypal anti-privacy web service?

Richard Stallman changes my life

The only point that the article might get right is that some software programmers see themselves, and/or are seen by others, as gods: Richard Stallman, Linus Torvalds, Bill Gates (god turned philanthropist), Steve Jobs (god turned designer), etc. On top of that, every programmer has had her/his Eureka moment when she/he finally solved a bug after hours of trying to fix the code. Otherwise, I agree with what the unnamed author puts in the mouth of Kevin Kelly, which I can summarize as: “with more power comes more responsibility”.

And, as I pointed out at the beginning, there isn’t a single reference at the bottom of the paper version, nor any link in the digital version. Statements and people in this article could have been 100% fictional and no one would have known it (until looking them up on the web).

I have the same issue with Worrying about wireless: no sources, no references. I don’t deny the anonymous writer the right to an opinion on the topic. Just let others form their own opinion too, by citing the sources you use. This article merely shapes the opinion of readers in a hurry, using partisan language and citing no sources. Even when indirectly citing sources (e.g. the WHO IARC classification), the anonymous coward manages to use negative wording to dismiss whatever doesn’t fit his/her theory. I would have liked more information about the potential long-term adverse effects of wifi waves, for instance. Unfortunately, I can’t believe such one-sided gibberish.

Now you’ll tell me I don’t have to read The Economist and you’ll be right 🙂

Illustration credit: Duty calls by xkcd and Richard Stallman by Pladour on Flickr (CC-by-nc)

ForbidSleepingMode updated

Following some comments on the dependency on version 4 of the .NET Framework, I rewrote ForbidSleepingMode in C++. You can open and compile the project with Qt (open source). The source code has of course been updated, and so has the mandatory screenshot 🙂

forbidSleepingMode screenshot

As you can see, I took the opportunity to add a small field where you can set the interval at which the program “tickles” your computer.

forbidSleepingMode

I just put my first small tool on GitHub: forbidSleepingMode. It prevents your (Windows) computer from entering sleep mode, acting as if there were activity all the time. I’m sure you can think of 1001 productive uses for such a tool.

Technically, it just sends a “tickle” to the computer every 10 minutes, forcing the display to stay on (so don’t set your screensaver delay below 10 minutes). Build it with Visual Studio 10 (I know, I know …).

The mandatory screenshot (very, very useful):

forbidSleepingMode screenshot

I intend to re-publish old tools on GitHub as I find them.

Installing Fedora 13 on a Toshiba Satellite L670-10K

I needed a new laptop quickly to keep working, and I found a Toshiba Satellite L670-10K. It’s a nice entry-level laptop with a dual-core processor (I didn’t know Intel was still making Pentium-branded processors) and a 17″ screen (read the specs for other details). I downloaded the latest Fedora Linux (version 13, 64-bit; version 14 is coming soon) and installed it from the LiveCD. Nearly everything was recognized out of the box: screen resolution, graphics card (Intel, with 3D effects), wired network, webcam, card reader, sound card, etc.

The only thing that was not recognized was the wireless network card, a Realtek RTL8191SE. Here is how to get it working. On the Toshiba website (Windows wireless drivers), this card is always listed together with the RTL8192SE model. So don’t be surprised if the driver downloaded from the Realtek website is a file with RTL8192 in its name, although you clicked on the link for the RTL8191SE-VA2 model. Unpack this file. A system installed from the LiveCD lacks a few packages needed to build the driver, so you have to install them (via the System menu, Administration, Add/Remove Software): kernel-devel, gcc and make. Once that’s done, run a simple “make; make install” as root in the unpacked directory and reboot the laptop. Your wireless connection is now up and running!

Wireless UFO?

If you want Flash on your 64-bit Linux, Adobe released version 10.1 of its Flash Player with native 64-bit support. Download Flash Player “Square”, unpack the archive, copy the (only) file “libflashplayer.so” into the directory /home/yourusername/.mozilla/plugins and restart Firefox. You now have a Flash-enabled browser!

Finally, I must have done something wrong somewhere, but the first-boot configuration screens kept appearing after installation, even after subsequent reboots. A quick search didn’t turn up anyone with the same issue. YMMV. To skip these screens (once you have gone through them a first time), just add the line “RUN_FIRSTBOOT=NO” to the file /etc/sysconfig/firstboot and voilà!

In conclusion, I’m very pleased with this laptop and Fedora. My Linux desktop was ready in just a few minutes. Let’s work, now! 🙂

Happy Software Freedom Day 2010!

Today, September 18th 2010, it’s Software Freedom Day all over the world. It is an annual worldwide celebration of Free Software, a public education effort that aims to increase awareness of Free Software and its virtues, and to encourage its use.

On the SFD website, there aren’t many events registered for Belgium. In fact there is only one, in Oostende (LiLiT is holding an install party in Liege, but I can’t see any reference to SFD; still, it’s a good initiative!). Well, holding SFD on September 18th in Belgium might not have been a good idea if the goal is to increase awareness of Free Software: more than half of the population is celebrating the Walloon Region or preparing a car-free Sunday in Brussels (while others have been looking for a government since April 2010!). So, on a personal level, I decided to give Ubuntu a try (10.04 LTS).

In terms of user experience, you can’t beat Ubuntu’s installation process (my points of comparison are Fedora 13 and any version of Windows XP, Vista or 7 not installed from a PC-specific image disc). Seven configuration screens with rather simple questions and that’s it. There are some choices you can’t make, such as selecting the software to be installed and available on the next reboot. But most general-purpose software is there: a web browser, a word processor, some games, a rudimentary movie player and a music player. The “Software Center” is also clearly visible, so you can’t miss it, and it seems the obvious place to go if you want to install any other software.

New Ubuntu desktop for Freedom Software Day 2010

The real test will now be whether one can actually work with it. If I don’t post any furious comment against some features, or anything about installing some software, in the coming days / weeks, you’ll know I’m still working with this Linux flavour.

Bittorrent used to deploy updates

I just watched a video by Larry Gadea, who works at Twitter: Twitter – Murder Bittorrent Deploy System (a talk given at CUSEC 2010).

Briefly, the problem Twitter was facing was deploying updates to thousands of servers in a short amount of time while dealing with errors (e.g. broken servers). A nice, simple, cool and free way of solving this issue was to use the Bittorrent protocol (via Python and a stack of other free software) to actually deploy updates. In summary, you go from a single repository facing thousands of requests at approximately the same time:

And you end up with a nice “distribution chain”:

The beautiful thing is that deployments are now 75 times faster than before!
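Why does a distribution chain help so much? Here is a back-of-the-envelope model in R. It rests on idealised assumptions that are not how Murder or Bittorrent actually schedule transfers: sending the whole update to one machine takes one time unit, and each machine uploads to at most one other machine per unit. The function names are hypothetical, chosen for the illustration.

```r
# Toy model under idealised assumptions (not Murder's or Bittorrent's real
# scheduling): one transfer of the whole update takes one time unit, and
# each machine uploads to at most one other machine per unit.

# Central repository: it is the only source, so it serves the servers one by one.
rounds_central <- function(n_servers) {
  n_servers
}

# Distribution chain: every machine that already has the update passes it on,
# so the number of copies (the seed included) doubles at each round.
rounds_swarm <- function(n_servers) {
  copies <- 1  # only the seed has the update at the start
  rounds <- 0
  while (copies < n_servers + 1) {
    copies <- 2 * copies
    rounds <- rounds + 1
  }
  rounds
}

for (n in c(10, 100, 1000)) {
  cat(n, "servers:", rounds_central(n), "rounds centrally vs",
      rounds_swarm(n), "rounds with a chain\n")
}
```

With 1,000 servers this gives 1,000 rounds against 10. The model is not meant to reproduce the 75× measured at Twitter (real Bittorrent also splits the update into pieces that peers exchange in parallel, and bandwidths vary); it only shows that the cost of the chain grows with the logarithm of the number of servers instead of linearly.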

And now, the video:

http://vimeo.com/moogaloop.swf?clip_id=11280885&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=00ADEF&fullscreen=1

The Murder software is hosted on GitHub (Apache 2 license).

Why do I blog this? First, I like seeing simple ideas that no one had thought of before implemented like this. I also wonder how other companies facing the same problems handle them (status.net, for example; I don’t think it could be useful for Forban). Finally, you see, Bittorrent is sometimes about good stuff too!