Tag: visualization

Android is catching up with iOS

Well, there is nothing new in this statement. The smartphone OS Android is catching up with and even overtaking its rival iOS in many domains:

more activated products per day and per year in 2011,
more Samsung Galaxy S3 units (running Android) sold in Q3 2012 than iPhone 4S and 5 (running iOS),
more devices worldwide,
catching up with Apple’s market share in tablets,
…

All this is summarised in an infographic designed by MBA Online (the original address is here: http://www.mbaonline.com/android/ – click at your own risk). It is sweet and colorful, with lots of numbers and some references at the end. Unfortunately these references are embedded in the image, so you cannot click on them if you want to read more.

Also, as I mentioned previously (for an infographic coming from a similar type of website), I didn’t much like the fact that it is very, very long (see reduced copy on the right). That makes it easy to read while scrolling down but, YMMV, I would have liked something a bit different. For instance, I would see this more as a succession of slides, à la PechaKucha maybe (except there is a lot of text). But the restrictive license (CC-by-nc-nd) prohibits derivative works.

So I like my Android device. I like when people promote it, are proud that Android is a success and talk about it. And the web is full of these infographics: a similar story about taking over the world, the successive Android versions (again very long), tastes of Android users (versus iOS users’), a broader smartphone comparison (again very long), a Google search for it, … Choose the one you like!

Effects of Tobacco on health – visualized

As you probably know, I am interested in diseases (and health in general) as well as in visualization. Recently Online Nursing Programs (*) invited me to have a look at their latest infographic about the effects of tobacco on health (directly to figure).

Although the numbers seem correct (references are at the bottom) and although they cleverly re-use the presentation style of some well-known tobacco companies, there is one thing that I don’t like that much: like this sentence, the figure is very, very long. You have to scroll through many pages in order to see everything. It may look like a story but it is not presented as such (I mean: there are no clear marks of different steps in the story, except the three “chapters”). On the right is the complete figure in exactly 800 pixels of height – can you read something? GOOD.is solved this issue by using a Flash player that allows the viewer to zoom in/out and go to different sections of the figure (see here for instance).

Now, about smoking … Smokers do what they want with their health. Of course, I criticise the physical dependency, the effects on social security and, indirectly, on everyone’s capacity to react to other health issues. And of course I hope that people can stop smoking. But in my opinion the most disgusting thing about tobacco is secondhand smoking (a.k.a. passive smoking): the inhalation of smoke by persons other than the active smoker. This passive smoking is especially harmful to young children. The CDC estimated that it is responsible for 150,000–300,000 new cases of bronchitis and pneumonia annually, as well as approximately 7,500–15,000 hospitalizations annually in the United States – both in children below 18 months. And in adults, passive smoking increases the risk of heart disease and lung cancer by 20-30%. Without doing anything – just inhaling smoke from your neighbour.

So it was a very nice idea from them to draw people’s attention to these health issues. It would have been better if the figure had been more “readable”, IMHO.

(*) Unfortunately for them, “Online Nursing Programs” sounds like a website that will just ask for your credit card number, although they publish nice infographics – like this other one about sanitation. The About page, which doesn’t say who they are, adds to these doubts.

Created by Online Nursing Programs, license CC-…

Visualizing categorical data in mosaic with R

A few posts ago I wrote about my discomfort with stacked bar graphs and the fact that I prefer to use a simple table with gradients as background. My only regret then was that the table was built in a spreadsheet. I would have liked to keep the data as it is but also have a nice representation of these categorical data.

This evening I spent some time analysing results from a survey and took the opportunity to build these representations in R.

The exact topic of the survey doesn’t matter here. Let’s just say it was a survey about opinions on, and recommendations for, some people. The two questions were:

How do you think these persons were, last year? Possible answers were: very bad, bad, average, good or very good.
Would you recommend these persons for next year? Possible answers were just yes or no.

For the first question, the data was collected in a text file according to these three fields: Person, Opinion, Count. Data was similar to this:

Person,Opinion,Count
Person 1,Very bad,0
Person 1,Bad,0
Person 1,Average,4
Person 1,Good,9
Person 1,Very good,3
Person 2,Very bad,3
Person 2,Bad,4
Person 2,Average,4
Person 2,Good,5
Person 2,Very good,0

The trick to represent this is to use geom_tile (from ggplot2) to display each count. Some additional work is needed to have the Opinion categories in the right order. The code is the following:

library(ggplot2)
# Read the survey counts (one row per Person/Opinion pair)
data1 <- read.table("resultsQ1.txt", header=TRUE, sep=",")
# Order of the Opinion categories on the x axis, from worst to best
scale_count <- c("Very bad", "Bad", "Average", "Good", "Very good")
ggplot(data1, aes(x=Opinion, y=Person)) +
  geom_tile(aes(fill=Count)) +
  xlim(scale_count) +
  scale_fill_gradient(low="white", high="blue") +
  theme_bw() +
  ggtitle("Opinion on persons")

And the graph looks like this:

For the second question, the data was collected in a text file according to these three fields too: Person, Reco, Count. Data was similar to this:

Person,Reco,Count
Person 1,Recommend,16
Person 1,Do not recommend,0
Person 2,Recommend,5
Person 2,Do not recommend,11

And we use approximately the same code:

library(ggplot2)
# Read the recommendation counts (one row per Person/Reco pair)
data2 <- read.table("resultsQ2.txt", header=TRUE, sep=",")
ggplot(data2, aes(x=Reco, y=Person)) +
  geom_tile(aes(fill=Count)) +
  scale_fill_gradient(low="white", high="darkblue") +
  theme_bw() +
  ggtitle("Recommendations")

And the graph for the second question looks like this:

Easy, isn’t it? Do you have other types of visualization for this kind of data?
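For readers who don't have the survey files, here is a minimal self-contained sketch of the first plot. It uses the sample rows shown earlier in this post (not the full survey) and, as a variation, fixes the category order through factor levels rather than through xlim(). The variable name p is just a placeholder chosen for this sketch.

```r
library(ggplot2)

# Sample rows copied from the post, so the sketch runs without resultsQ1.txt
data1 <- read.table(text = "Person,Opinion,Count
Person 1,Very bad,0
Person 1,Bad,0
Person 1,Average,4
Person 1,Good,9
Person 1,Very good,3
Person 2,Very bad,3
Person 2,Bad,4
Person 2,Average,4
Person 2,Good,5
Person 2,Very good,0", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Encode the worst-to-best order in the factor levels themselves,
# so that ggplot2 does not fall back to alphabetical order
scale_count <- c("Very bad", "Bad", "Average", "Good", "Very good")
data1$Opinion <- factor(data1$Opinion, levels = scale_count)

# Same tile plot as above, stored in an object before printing
p <- ggplot(data1, aes(x = Opinion, y = Person)) +
  geom_tile(aes(fill = Count)) +
  scale_fill_gradient(low = "white", high = "blue") +
  theme_bw() +
  ggtitle("Opinion on persons")
print(p)
```

Putting the order in the factor has the side effect that any other plot or table built from data1 will use the same order.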

About stacked bar graphs

This afternoon I received a bunch of data accompanied by stacked bar graphs for each dataset. For example, this one:

The chart shows the incidence of disease X in various age ranges. That incidence is split by 8 severity levels. The chart shows that the disease especially affects age ranges 4 and 5, at different severity levels. However I didn’t feel comfortable …

what are the different levels of severity in age ranges 1, 2 and 3?
how can we compare levels C, D and E in age ranges 4 and 5?
is there anywhere some severity A?
(it’s even worse when some age ranges don’t have any incidence at all: what is happening?)
etc.

I looked on the web but couldn’t find much information apart from the fact “The Economist says they’re so bad at conveying information, that they’re a great way to hide a bad number amongst good ones” (but are still using them in their graphic detail section) or “a stacked column chart with percentages should always extend to 100%” (this doesn’t really apply here). Then in a post on Junk Charts, someone mentioned Stephen Few who would have said “not to use stacked bar charts because you cannot compare individual values very easily and as a rule [he] avoid[s] stacked bars with more than six or seven divisions”. And Stephen Few also participated in his forum here.

This reminded me that I read a book written by Stephen Few a few years ago: Information Dashboard Design (O’Reilly Media, 2006). Inside, on pages 135-136, one can read that stacked bar graphs are the right choice only when you must display multiple instances of a whole and its parts, with emphasis primarily on the whole. And that this type of graph shouldn’t be used if the distribution changes must be shown more precisely.

If one wants to clearly display both the whole and its parts, Stephen Few recommends either using two graphs next to each other or a combination bar and line graph (with two quantitative scales).

As I’m not really interested in the whole but mainly in the parts and their relative distribution, I suggest another way to present the data. This isn’t really new. Actually everything was already in the table. You just format the table nicely and add some colour gradient. And voilà:

You still see where the incidence is the highest (in age ranges 4 and 5) and which levels of severity are the most important (C, with lower but approximately similar levels of D, E and H). Compared with the graph above, one can also notice that severity levels A, B, F and G are not represented at all, and we can quickly grasp the proportions between the different incidences.

Of course, if your criterion for “sexiness” is that there shouldn’t be any digit on your chart, then this chart is not sexy. But I find this presentation really more appealing and meaningful than the stacked bar graph. Don’t you think?
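The table above was formatted in a spreadsheet, but the same idea (digits in the cells, a colour gradient behind them) can be sketched in R with ggplot2. The numbers below are invented purely for illustration and only mimic the overall shape described in the text (nothing in A, B, F and G, most cases in age ranges 4 and 5); they are not the real dataset. The names cases, incidence and p are placeholders as well.

```r
library(ggplot2)

# Hypothetical incidence values, invented for illustration only:
# rows are severity levels A-H, columns are age ranges 1-5
cases <- matrix(0, nrow = 8, ncol = 5,
                dimnames = list(Severity = LETTERS[1:8],
                                AgeRange = as.character(1:5)))
cases["C", c("4", "5")] <- c(40, 35)
cases["D", c("4", "5")] <- c(12, 10)
cases["E", c("4", "5")] <- c(11, 9)
cases["H", c("4", "5")] <- c(10, 12)
cases["C", "3"] <- 3

# Long format: one row per (Severity, AgeRange) cell
incidence <- as.data.frame(as.table(cases), responseName = "Cases")

# One tile per cell, shaded by the count, with the count printed on top
p <- ggplot(incidence, aes(x = AgeRange, y = Severity)) +
  geom_tile(aes(fill = Cases), colour = "grey80") +
  geom_text(aes(label = Cases)) +
  scale_fill_gradient(low = "white", high = "red") +
  scale_y_discrete(limits = rev(LETTERS[1:8])) +  # A at the top, like a table
  theme_bw() +
  ggtitle("Incidence by age range and severity (made-up data)")
print(p)
```

Empty rows and columns stay visible as white cells, which is exactly the information the stacked bars were hiding.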

Visualizing how a population grows to 7 billion (NPR)

NPR has produced a nice visualization / video showing how the population grew to 7 billion (original article):

If you want to model the improvement in child survival, you just turn the birth tap off (or nearly). Then, with wealth, prevention, healthcare and better food, the population will also grow older (death tap also turned off, or nearly) and, for a certain time, lots of adults will be economically active (i.e. they will work and consume). This is a demographic dividend. But it comes with a risk: at the next stage, there might be a disproportionately high number of people depending on a small number of active adults (the next generation). In addition, if you fill the container slowly but also empty it slowly, it risks being full soon. It all depends on the various rates …

Note that this representation is also very effective to understand the basics of compartmental models in epidemiology 🙂
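To make the link with compartmental models a bit more concrete, here is a minimal sketch of a single container with one tap filling it and one tap emptying it. It assumes that both flows are proportional to the current population, which is the usual compartmental-model convention and not necessarily what the NPR video shows. The function name and the rates are hypothetical, picked only for illustration.

```r
# One compartment, two taps:
# population[t + 1] = population[t] + births - deaths
simulate_population <- function(initial, birth_rate, death_rate, years) {
  population <- numeric(years + 1)
  population[1] <- initial
  for (t in seq_len(years)) {
    inflow  <- birth_rate * population[t]  # the "birth tap"
    outflow <- death_rate * population[t]  # the "death tap"
    population[t + 1] <- population[t] + inflow - outflow
  }
  population
}

# Hypothetical rates: both taps equally open, then the death tap partly closed
balanced <- simulate_population(1000, birth_rate = 0.02, death_rate = 0.02, years = 50)
growing  <- simulate_population(1000, birth_rate = 0.03, death_rate = 0.01, years = 50)

# Plot both trajectories on the same axes
plot(growing, type = "l", xlab = "Year", ylab = "Population")
lines(balanced, lty = 2)
```

With equal rates the level stays flat. As soon as the death tap closes faster than the birth tap, the container fills up, and how fast it does so depends only on the difference between the two rates.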

Road traffic: real-life and virtual visualization

During lunch time, I discovered an old street art video (well, old = 2010) where people poured hundreds of liters of paint on Rosenthaler Platz (Berlin, Germany) to visualize traffic patterns (below: screenshot and video).

This reminded me that I recently discovered that Google Maps now includes traffic for Brussels. This has been the case for Berlin for a long time, and Rosenthaler Platz looks quite quiet for the moment:

It’s not exactly the same colours 😉

(found via Olybop ; original page)

Cognitive Surplus visualised

In the 300-and-more RSS items in my aggregator this week, there are 2 great ones from Information is Beautiful, a blog gathering (and publishing its own) nice ways to visualise data.

The first one is based on a talk by Clay Shirky who, in turn, was referencing his book Cognitive Surplus. In Cognitive Surplus visualized, David McCandless just represented one of Shirky’s ideas: 200 billion hours are spent each year by US adults just watching TV whereas only 100 million hours were necessary to create Wikipedia (I guess the platform + the content) …

Cognitive Surplus visualised from Information Is Beautiful

It makes you think about either the waste that television helps to produce or the potential of human brain(s) if relieved from the burden of television.
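To put the two figures quoted above side by side, a quick back-of-the-envelope calculation (the variable names are mine, the two numbers are the ones cited in this post):

```r
# Figures quoted in the post, in hours
tv_hours_per_year <- 200e9   # US adults watching TV, per year
wikipedia_hours   <- 100e6   # estimated effort to create Wikipedia

# How many "Wikipedias" fit in one year of US television watching
wikipedias_per_year <- tv_hours_per_year / wikipedia_hours
print(wikipedias_per_year)
```

In other words, one year of television in the US represents the time needed for about 2,000 projects the size of Wikipedia.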

The second interesting post in fact appeared in information aesthetics, a blog where form follows data (referencing Information is Beautiful but I can’t find this post). In Top Secret America: Visualizing the National Security Buildup in the U.S., Andrew Vande Moere relates “an extensive investigative project of the Washington Post that describes the huge national security buildup in the United States after the September 11 attacks”. The project website contains all the ingredients for a well-documented investigation, with the addition of interactive maps and Flash-based interfaces allowing users to build their own view of the project.

Top Secret America from the Washington Post

It’s nice to see investigative journalism combined with beautiful data visualisation and handling!