Tag: data

About stacked bar graphs

This afternoon I received a bunch of data accompanied by stacked bar graphs for each dataset. For example, this one:

Stacked bar graph example

The chart shows the incidence of disease X in various age ranges, split into 8 severity levels. It shows that the disease especially affects age ranges 4 and 5, at different severity levels. However, I didn’t feel comfortable …

  • what are the different levels of severity in age ranges 1, 2 and 3?
  • how can we compare levels C, D and E in age ranges 4 and 5?
  • is there anywhere some severity A?
  • (it’s even worse when some age ranges don’t have any incidence at all: what is happening?)
  • etc.

I looked on the web but couldn’t find much information, apart from the remark that “The Economist says they’re so bad at conveying information, that they’re a great way to hide a bad number amongst good ones” (yet they still use them in their graphic detail section) and the rule that “a stacked column chart with percentages should always extend to 100%” (which doesn’t really apply here). Then, in a post on Junk Charts, someone mentioned Steven Few, who reportedly advised “not to use stacked bar charts because you cannot compare individual values very easily and as a rule [he] avoid[s] stacked bars with more than six or seven divisions”. Steven Few also weighed in on his forum.

This reminded me that I read a book by Steven Few a few years ago: Information Dashboard Design (O’Reilly Media, 2006). On pages 135-136, one can read that stacked bar graphs are the right choice only when you must display multiple instances of a whole and its parts, with emphasis primarily on the whole, and that this type of graph shouldn’t be used if changes in the distribution must be shown more precisely.

If one wants to clearly display both the whole and its parts, Steven Few recommends using either two graphs next to each other or a combination bar and line graph (with two quantitative scales).

As I’m not really interested in the whole but mainly in the parts and their relative distribution, I suggest another way to present the data. It isn’t really new; in fact, everything was already in the table. You just format the table nicely and add a colour gradient. And voilà:

Simple table with data - instead of stacked bar graph

You still see where the incidence is highest (in age ranges 4 and 5) and which levels of severity matter most (C, with lower but approximately similar levels of D, E and H). Beyond what the graph above shows, one can notice that severity levels A, B, F and G are not represented at all, and one can quickly grasp the proportions between the different incidences.

Of course, if your criterion for “sexiness” is that there shouldn’t be any digit on your chart, then this chart is not sexy. But I find this presentation much more appealing and meaningful than the stacked bar graph. Don’t you?
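For readers who want to try the table-with-gradient idea, here is a minimal sketch in plain Python. The incidence numbers, the subset of severity levels and the helper names (`shade`, `to_html`) are all invented for illustration; they are not the data behind the chart above. It only shows one possible way to shade each cell according to its value.

```python
# Hypothetical incidence counts: one row per severity level, one column per age range.
# These numbers are made up for the example; they are NOT the original dataset.
age_ranges = ["1", "2", "3", "4", "5"]
incidence = {
    "A": [0, 0, 0, 0, 0],
    "C": [2, 3, 5, 40, 35],
    "D": [1, 1, 2, 15, 12],
    "E": [0, 1, 2, 14, 13],
}


def shade(value, vmax):
    """Map a value to a white-to-red CSS colour; larger values are more saturated."""
    if vmax == 0:
        return "rgb(255,255,255)"
    fade = round(255 * (1 - value / vmax))
    return f"rgb(255,{fade},{fade})"


def to_html(table, columns):
    """Render the table as HTML, colouring each cell relative to the overall maximum."""
    vmax = max(max(row) for row in table.values())
    header = "".join(f"<th>Age range {c}</th>" for c in columns)
    lines = ["<table>", f"<tr><th>Severity</th>{header}</tr>"]
    for level, row in table.items():
        cells = "".join(
            f'<td style="background:{shade(v, vmax)}">{v}</td>' for v in row
        )
        lines.append(f"<tr><th>{level}</th>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


html = to_html(incidence, age_ranges)
print(html)
```

Saving the printed output to an .html file and opening it in a browser gives a table where empty rows stay white and the highest cells stand out, while every digit remains readable.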

The Top 5 Killers of Men

On Delicious, I saw that Yahoo had an article about the top 5 killers of men. I thought it would be nice to see where they get their data.

First, I have to mention that the article is really about American men, nothing else (not about mankind, not about men around the world, not about women, children, etc.). The article is related to the US National Men’s Health Week (the US National Women’s Health Week was on May 8-14, 2011). Although the article gives advice, it cites no sources of information.

However, it’s rather easy to obtain these numbers …

For the US, the CDC FastStats website is a hub for health data. Here is the CDC ranking for the top 5 killers in 2007 (in both US women and men):

  1. Heart disease: 616,067 deaths
  2. Cancer: 562,875 deaths
  3. Stroke (cerebrovascular diseases): 135,952 deaths
  4. Chronic lower respiratory diseases: 127,924 deaths
  5. Accidents (unintentional injuries): 123,706 deaths
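To put these five figures in proportion, here is a small sketch that computes each cause’s share of the top-5 total. Note the assumption: the shares are relative to the sum of these five causes only, not to all US deaths in 2007 (that total is not given here). The variable names are my own.

```python
# Deaths in 2007 for the top 5 causes, copied from the CDC list above.
us_top5_2007 = {
    "Heart disease": 616_067,
    "Cancer": 562_875,
    "Stroke (cerebrovascular diseases)": 135_952,
    "Chronic lower respiratory diseases": 127_924,
    "Accidents (unintentional injuries)": 123_706,
}

# Sum of the five causes only; this is NOT the total number of US deaths.
top5_total = sum(us_top5_2007.values())

# Share of each cause within the top 5.
shares = {cause: deaths / top5_total for cause, deaths in us_top5_2007.items()}

for cause, share in shares.items():
    print(f"{cause}: {share:.1%} of top-5 deaths")
```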

If you look at the whole world (data from the UN), the picture is somewhat different! The UN ranking for the top 5 killers in 2008 (in both women and men) is:

  1. Lower respiratory infections: 1.05 million deaths
  2. Diarrhoeal diseases: 0.76 million deaths
  3. HIV/AIDS: 0.72 million deaths
  4. Ischaemic heart disease: 0.57 million deaths
  5. Malaria: 0.48 million deaths

Together, they cause more than 45% of deaths around the world. The leading causes of death differ considerably between the USA and the whole world. The main caveat is that the data I presented above are for men and women combined. It would be interesting to use the UN data API project to dig further into the details.

Cognitive Surplus visualised

Among the 300-plus RSS items in my aggregator this week, there are 2 great ones from Information is Beautiful, a blog that gathers (and publishes its own) nice ways to visualise data.

The first one is based on a talk by Clay Shirky who, in turn, was referencing his book Cognitive Surplus. In Cognitive Surplus visualized, David McCandless just represented one of Shirky’s ideas: 200 billion hours are spent each year by US adults just watching TV whereas only 100 million hours were necessary to create Wikipedia (I guess the platform + the content) …

Cognitive Surplus visualised from Information Is Beautiful
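The comparison boils down to a single division. As a quick check, using only the two figures quoted above (the variable names are mine):

```python
# Figures quoted from the visualisation above.
tv_hours_per_year = 200_000_000_000  # 200 billion hours of TV watched per year by US adults
wikipedia_hours = 100_000_000        # 100 million hours to create Wikipedia

# How many Wikipedia-sized projects fit into one year of US TV watching.
wikipedias_per_year = tv_hours_per_year // wikipedia_hours
print(wikipedias_per_year)  # prints 2000
```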

It makes you think about either the waste that television helps to produce or the potential of the human brain(s) if relieved of the burden of television.

The second interesting post in fact appeared on information aesthetics, a blog where form follows data (it references Information is Beautiful, but I can’t find this post there). In Top Secret America: Visualizing the National Security Buildup in the U.S., Andrew Vande Moere relates “an extensive investigative project of the Washington Post that describes the huge national security buildup in the United States after the September 11 attacks”. The project website contains all the ingredients of a well-documented investigation, with the addition of interactive maps and Flash-based interfaces that allow users to build their own view of the project.

Top Secret America from the Washington Post

It’s nice to see investigative journalism combined with beautiful data visualisation and handling!