Month: May 2012

Visualizing categorical data in mosaic with R

A few posts ago I wrote about my discomfort about stacked bar graphs and the fact I prefer to use simple table with gradients as background. My only regret then was that the table was built in a spreadsheet. I would have liked to keep the data as it is but also have a nice representation of these categorical data.

This evening I spent some time analysing results from a survey and took the opportunity to buid these representations in R.

The exact topic of the survey doesn’t matter here. Let just say it was a survey about opinion and recommendations on some people. The two questions were:

  1. How do you think these persons were, last year? Possible answers were: very bad, bad, average, good or very good.
  2. Would you recommend these persons for next year? Possible answers were just yes or no.

For the first question, the data was collected in a text file according to these three fields: Person, Opinion, Count. Data was similar to this:

Person,Opinion,Count
Person 1,Very bad,0
Person 1,Bad,0
Person 1,Average,4
Person 1,Good,9
Person 1,Very good,3
Person 2,Very bad,3
Person 2,Bad,4
Person 2,Average,4
Person 2,Good,5
Person 2,Very good,0

The trick to represent this is to use  geom_tiles (from ggplot2) to display each count. There is an additional work to be done in order to have the Opinion categories in the right order. The code is the following:

library(ggplot2)
data1 <- read.table("resultsQ1.txt", header=T, sep=",")
scale_count <- c("Very bad", "Bad", "Average", "Good", "Very good")
scale_rep <- c("1", "2", "3", "4", "5")
names(scale_count) <- scale_rep
ggplot(data1, aes(x=Opinion, y=Person)) +
geom_tile(aes(fill=Count)) +
xlim(scale_count) +
scale_fill_gradient(low="white", high="blue")+theme_bw() +
opts(title = "Opinion on persons")

And the graph looks like this:

For the second question, the data was collected in a text file according to these three fields too: Person, Reco, Count. Data was similar to this:

Person,Reco,Count
Person 1,Recommend,16
Person 1,Do not recommend,0
Person 2,Recommend,5
Person 2,Do not recommend,11

And we use approximately the same code:

library(ggplot2)
data2 <- read.table("resultsQ2.txt", header=T, sep=",")
ggplot(data2, aes(x=Reco, y=Person)) +
geom_tile(aes(fill=Count)) +
scale_fill_gradient(low="white", high="darkblue")+theme_bw() +
opts(title = "Recommendations")

And the graph for the second question looks like this:

Easy isn’t it? Do you have other types of visualization for this kind of data?