Tag: chart

Digitize your charts with Engauge Digitizer

A few words of appreciation for a piece of open source software that can help you a lot in your work: Engauge Digitizer (ED) from Mark Mitchell. ED is a simple, straightforward curve digitizer: it takes images of graphs like the one below and transforms them (with a little help) into data you can use later on.

170804-Engauge-survival0
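
As a rough illustration of what the “little help” can amount to in a curve digitizer in general, here is a minimal Python sketch of axis calibration: the user identifies two known ticks on each axis, and every clicked pixel is then converted to a data value by linear interpolation. This is not Engauge Digitizer's actual code; the function name `make_axis_mapper` and all pixel and axis values are made up for the example, and it assumes linear axes and an image that is not rotated.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    Assumes a linear axis calibrated by two known points (hypothetical helper).
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration points must be at different pixels")

    def to_value(pixel):
        # Linear interpolation between the two calibration points.
        return value_a + (value_b - value_a) * (pixel - pixel_a) / (pixel_b - pixel_a)

    return to_value


# Invented calibration: pixel column 100 is x = 0 and column 500 is x = 40;
# pixel row 400 is y = 0 and row 50 is y = 1 (image rows grow downwards).
to_x = make_axis_mapper(100, 0.0, 500, 40.0)
to_y = make_axis_mapper(400, 0.0, 50, 1.0)

# Invented pixel positions clicked along a curve, as (column, row) pairs.
clicked = [(100, 50), (300, 225), (500, 400)]
points = [(to_x(col), to_y(row)) for col, row in clicked]
```

With these invented numbers, `points` holds the curve in data units, ready to be exported or plotted again.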


About stacked bar graphs

This afternoon I received a bunch of data accompanied by stacked bar graphs for each dataset. For example, this one:

Stacked bar graph example

The chart shows the incidence of disease X in various age ranges. That incidence is split into 8 severity levels. The chart shows that the disease especially affects age ranges 4 and 5, at different severity levels. However, I didn’t feel comfortable …

  • what are the different levels of severity in age ranges 1, 2 and 3?
  • how can we compare levels C, D and E in age ranges 4 and 5?
  • is there any severity A anywhere?
  • (it’s even worse when some age ranges don’t have any incidence at all: what is happening?)
  • etc.

I looked on the web but couldn’t find much information, apart from the claim that “The Economist says they’re so bad at conveying information, that they’re a great way to hide a bad number amongst good ones” (although it still uses them in its Graphic Detail section) and the advice that “a stacked column chart with percentages should always extend to 100%” (which doesn’t really apply here). Then, in a post on Junk Charts, someone mentioned Stephen Few, who reportedly advised “not to use stacked bar charts because you cannot compare individual values very easily and as a rule [he] avoid[s] stacked bars with more than six or seven divisions”. Stephen Few also joined the discussion in his forum here.

This reminded me that I read a book by Stephen Few a few years ago: Information Dashboard Design (O’Reilly Media, 2006). On pages 135-136, one can read that stacked bar graphs are the right choice only when you must display multiple instances of a whole and its parts, with emphasis primarily on the whole, and that this type of graph shouldn’t be used if changes in the distribution must be shown more precisely.

If one wants to clearly display both the whole and its parts, Stephen Few recommends using either two graphs next to each other or a combined bar and line graph (with two quantitative scales).

As I’m not really interested in the whole but mainly in the parts and their relative distribution, I suggest another way to present the data. It isn’t really new: everything was already in the table. You just format the table nicely and add a colour gradient. And voilà:

Simple table with data - instead of stacked bar graph

You still see where the incidence is the highest (in age ranges 4 and 5) and which levels of severity are the most important (C, followed by lower but roughly similar levels of D, E and H). Beyond what the graph above shows, one can also notice that severity levels A, B, F and G are not represented at all, and we can quickly grasp the proportions between the different incidences.
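
For readers who would like to try the same idea, here is a minimal Python sketch of a table with a colour gradient, rendered as plain HTML. It is only one possible way to do it, not the way the table above was produced. The helper names `shade` and `heat_table` are hypothetical, and the counts are invented for illustration (they only mimic the general shape described above, they are not the real data).

```python
from html import escape


def shade(value, vmax):
    """Map a value in [0, vmax] to a white-to-red hex colour."""
    ratio = 0.0 if vmax <= 0 else max(0.0, min(1.0, value / vmax))
    level = round(255 * (1 - ratio))  # green and blue fade out as the value rises
    return f"#ff{level:02x}{level:02x}"


def heat_table(row_labels, col_labels, rows):
    """Build an HTML table whose cell backgrounds follow the cell values."""
    vmax = max(max(row) for row in rows)
    header = "".join(f"<th>{escape(c)}</th>" for c in col_labels)
    lines = ["<table>", f"<tr><th></th>{header}</tr>"]
    for label, row in zip(row_labels, rows):
        cells = "".join(
            f'<td style="background:{shade(v, vmax)}">{v}</td>' for v in row
        )
        lines.append(f"<tr><th>{escape(label)}</th>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Invented counts: rows are severity levels A to H, columns are age ranges 1 to 5.
severity = ["A", "B", "C", "D", "E", "F", "G", "H"]
ages = ["1", "2", "3", "4", "5"]
counts = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 2, 12, 10],
    [0, 1, 1, 4, 5],
    [0, 0, 1, 5, 4],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 4, 4],
]
html = heat_table(severity, ages, counts)
```

Writing `html` to a file and opening it in a browser gives a table in which empty severity levels stay white and the highest count is the darkest cell. The gradient is scaled to the largest value in the whole table, so colours are comparable across both age ranges and severity levels.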

Of course, if your criterion for “sexiness” is that there shouldn’t be any digits on your chart, then this chart is not sexy. But I find this presentation far more appealing and meaningful than the stacked bar graph. Don’t you?

Software license and use of end-product

In one of his Buzz posts, Cédric Bonhomme drew my attention to the Highcharts JavaScript library. This library can produce beautiful charts of various types with some Ajax interaction. The only negative point, imho, is that it is dual-licensed and in both cases you are deprived of some freedom:

  • the first is a Creative Commons Attribution-NonCommercial 3.0 License: you can use the library for your non-profit website (see details on the licensing page);
  • the second is a commercial license for any other website.

Now what if we only need the end-product, i.e. the resulting chart, in a commercial environment? What the license covers is just the re-use of the JavaScript library in a website, not the resulting chart. If a company chooses to use Highcharts internally to render some beautiful charts and just publishes (*) the resulting image, I guess it can simply download the library and use it (* by “publishing”, I mean publishing a scientific paper in a peer-reviewed journal, not publishing on its own website). On the other hand, no one has ever questioned the fact that commercial companies hold licenses for all the proprietary software they use to produce anything else, from charts to statistical data, simply because they publish results obtained with this software. So the “trick” here would be that, by changing the medium on which you display the end results (from website to paper, even if the paper is a PDF on the journal website), you can use the free-to-download license, even in a commercial environment, for an article from a commercial company. I’m not sure this was the original intention of Highslide Software.