Month: February 2012

The maximum length of a Windows path is 260 characters

A Java project compilation went berserk and I ended up with a directory structure whose path was longer than 260 characters. I stopped the runaway process, but it had already created more than 50 nested pairs of “build/classes” directories …

Duo of build/classes directories in path created by Netbeans

Now I had to delete this structure. And, to my surprise, it was impossible. When you just press the “Delete” key with the root directory selected in the File Explorer, you get a Path Too Long exception. The reason is that the maximum length of a path according to the Windows API (the MAX_PATH constant) is defined as 260 characters. I tried some other methods but all of them failed:

  • write a small Java program that tried to delete the whole path: Netbeans (Java) was able to create this mess, why shouldn’t another Java program be able to delete it? Impossible.
  • write a small C++ program that tried to delete the whole path: as long as you stick with the Windows API, it’s impossible (I read that it could be possible using the boost::filesystem library but didn’t try).
  • try some Portable Apps utilities for file management: impossible (even when the software was using another framework like Qt).

Finally, I just opened a Cygwin terminal, went to the right location and ran a simple “rm -rf libtest”. And voilà. So, next time Windows forbids me from doing something, it might be a good idea to go straight to a proper terminal from a Unix-like environment. I didn’t try a live CD (I didn’t have one to hand) but that might also work.
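For reference, the Windows API also documents an extended-length path syntax: the Unicode versions of many file functions accept absolute paths prefixed with “\\?\”, which lifts the MAX_PATH limit. This was not tested on the directory above, so take the following as a minimal sketch only. The helper names to_extended_path and delete_long_tree are made up for illustration, and the example path is hypothetical.

```python
import ntpath
import shutil

# Prefix that tells the Windows API to skip MAX_PATH parsing.
EXTENDED_PREFIX = "\\\\?\\"


def to_extended_path(path: str) -> str:
    """Return an absolute Windows path with the extended-length prefix.

    Hypothetical helper: pure string manipulation, touches no files.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already prefixed
    if not ntpath.isabs(path):
        # Extended-length paths cannot be relative.
        raise ValueError("extended-length paths must be absolute")
    # normpath turns "/" into "\" and resolves "." and "..",
    # which the prefixed form does not do for us.
    normalized = ntpath.normpath(path)
    if normalized.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_PREFIX + "UNC\\" + normalized[2:]
    return EXTENDED_PREFIX + normalized


def delete_long_tree(path: str) -> None:
    """Recursively delete a directory through its extended-length form.

    Only meaningful on Windows; assumed, not verified, to handle the
    build/classes nesting described in this post.
    """
    shutil.rmtree(to_extended_path(path))
```

On Windows, something like delete_long_tree("C:\\work\\libtest") would then be the equivalent of the “rm -rf libtest” above, where C:\work is a placeholder for wherever the directory actually lives.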

About stacked bar graphs

This afternoon I received a bunch of data accompanied by stacked bar graphs for each dataset. For example, this one:

Stacked bar graph example

The chart shows the incidence of disease X in various age ranges. That incidence is split by 8 severity levels. The chart shows that the disease especially affects age ranges 4 and 5, at different severity levels. However I didn’t feel comfortable …

  • what are the different levels of severity in age ranges 1, 2 and 3?
  • how can we compare levels C, D and E in age ranges 4 and 5?
  • is there anywhere some severity A?
  • (it’s even worse when some age ranges don’t have any incidence at all: what is happening?)
  • etc.

I looked on the web but couldn’t find much information apart from the remark that “The Economist says they’re so bad at conveying information, that they’re a great way to hide a bad number amongst good ones” (although they still use them in their graphic detail section) or that “a stacked column chart with percentages should always extend to 100%” (this doesn’t really apply here). Then, in a post on Junk Charts, someone mentioned Steven Few, who reportedly said “not to use stacked bar charts because you cannot compare individual values very easily and as a rule [he] avoid[s] stacked bars with more than six or seven divisions”. Steven Few also took part in a discussion on his own forum.

This reminded me that I read a book by Steven Few a few years ago: Information Dashboard Design (O’Reilly Media, 2006). On pages 135-136, one can read that stacked bar graphs are the right choice only when you must display multiple instances of a whole and its parts, with emphasis primarily on the whole. This type of graph shouldn’t be used if changes in the distribution must be shown more precisely.

If one wants to clearly display both the whole and its parts, Steven Few recommends using either two graphs next to each other or a combined bar and line graph (with two quantitative scales).

As I’m not really interested in the whole but mainly in the parts and their relative distribution, I suggest another way to present the data. This isn’t really new. Actually everything was already in the table. You just format the table nicely and add some colour gradient. And voilà:

Simple table with data - instead of stacked bar graph

You still see where the incidence is the highest (in age ranges 4 and 5) and which levels of severity are the most important (C, with lower but approximately similar levels of D, E and H). Beyond what the graph above shows, one can notice that severity levels A, B, F and G aren’t represented at all, and we can quickly grasp the proportions between the different incidences.

Of course, if your criterion for “sexiness” is that there shouldn’t be any digits on your chart, then this chart is not sexy. But I find this presentation much more appealing and meaningful than the stacked bar graph. Don’t you?
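The colour-gradient table described above can also be generated rather than formatted by hand. The sketch below is one possible way to do it, not how the table in this post was made: it linearly interpolates a background colour between white and a red for each cell and emits a plain HTML table. The function names, the choice of red and the HTML output are all assumptions for illustration.

```python
def gradient_colour(value, vmin, vmax,
                    low=(255, 255, 255), high=(214, 39, 40)):
    """Map value in [vmin, vmax] to a hex colour between low and high (RGB)."""
    if vmax == vmin:
        t = 0.0  # flat data: use the low colour everywhere
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def heat_table(rows, row_labels, col_labels):
    """Render a list of numeric rows as an HTML table with shaded cells."""
    values = [v for row in rows for v in row]
    vmin, vmax = min(values), max(values)
    header = "".join(f"<th>{c}</th>" for c in col_labels)
    lines = ["<table>", f"<tr><th></th>{header}</tr>"]
    for label, row in zip(row_labels, rows):
        cells = "".join(
            f'<td style="background:{gradient_colour(v, vmin, vmax)}">{v}</td>'
            for v in row
        )
        lines.append(f"<tr><th>{label}</th>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder numbers only, NOT the data from this post.
    print(heat_table([[0, 3], [12, 7]],
                     row_labels=["Level C", "Level D"],
                     col_labels=["Range 4", "Range 5"]))
```

The digits stay visible in every cell, so the shading only adds a quick visual cue on top of the exact values.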