Tag: data

2013 with Fitbits

2013 is near its end and it’s time to see what happened during the last 360 days or so. Many things happened (I graduated from my MBA, moved to a new house, went on holidays, was ill for a few days, …) but I wanted to know if one could quantify these changes and how they would impact my daily physical activity.

For that purpose I bought a Fitbit One in March 2013. I chose Fitbit over other devices available because of the price (99 USD at the time) and because it was available in Europe (via a Dutch vendor). At that time the Jawbone Up was unavailable (even in the USA) and the Nike Fuelband couldn’t track my sleep.

Basically the One is a pedometer (it tracks the number of steps you take per day) that also tracks the number of floors climbed and the time asleep. Note that you have to tell the device when you go to sleep and when you wake up; it will automatically subtract the times you were awake. The rest of the data presented are derived from these few observed variables: distance traveled, calories burnt, … The Fitbit website also categorizes your activity from ‘sedentary’ to ‘very active’.

Of course there is an app (for both iOS and Android) where you can also enter what you eat (it automatically calculates the number of calories ingested) and your weight (unless you buy a wifi scale from them). You can set goals on the website and then it tells you how many steps you have to take per day. All this data is stored on a Fitbit server and you can access it via your personal dashboard (yes, your data is kept away from you but there are ways to get it …).

Fitbit dashboard beta

I liked the Fitbit One mainly because it is easy to use: you take it and forget it, it works in the pocket. There is a nice, easy to use web interface – great for immediate consumption (not really for long trend analysis). The device is quite cheap (well, it is quite small anyway). It works with desktop software as well as with the mobile app (incl. synchronisation). The One can easily be forgotten in a pocket (which gives peace of mind) but it doesn’t work when you don’t have pockets (shower, pyjamas, changing clothes, …; I didn’t use the clip/holder at the waist).

That leads me to its disadvantages …

  • First it’s a proprietary system: you need to pay 50 USD in order to get the data you generate yourself. Although it makes perfect sense from a business perspective, the device then really costs about 150 USD (and not just the 99 USD purchase price).
  • Then it also uses a proprietary interface to charge the device. This is problematic when you move house (the cable is somewhere in a box) or simply when the cable is lost (see the messages on Twitter asking for such a cable). Most mobile phone manufacturers understood that and provide a regular USB interface (for both charging and syncing, by the way). I guess the small form factor comes at a price.
  • Tracking activities other than movement is tedious, especially as an internet connection is needed to enter food eaten in the app (but that’s the general drawback of logging: automatic vs. manual).
  • Then tracking is sometimes not practical, e.g. between waking up and getting dressed, or in the shower. So is there always some under-reporting? Probably, as I don’t wear it when changing or in pyjamas (no pocket). Of course the One comes with an armband holder but I guess it records data differently when worn there.

But the last and main disadvantage that comes to my mind is linked with its advantage: it is so easy to use and to forget (in the washing machine) that it can fall out and you won’t notice.

So of course I lost it. It was on a business trip in South-East Asia. I thought I had put it in my suitcase when changing pants but I couldn’t find it anymore. So after some hesitation I chose to get a Fitbit Flex.

Fitbit Flex with charger and armband

The Flex comes in another format: it’s like a small pill that you put in a plastic armband holder, a bit like the bracelet you receive at some concerts. Therefore it is closer to the body (but not to the legs, where steps happen) and you don’t need pockets. However it doesn’t show the time (if you have a watch, will you wear 2 devices on your left wrist? Fitbit now sells an evolution of the Flex – the Force – with LEDs displaying the time, among other things). As it is always in its armband I feel it is less likely to be forgotten. The battery life is approximately the same: around 7 days. You can read here another comparison of the two.

So, what about 2013?

In order to dig into the past I could:

  1. use the Fitbit dashboard (see the first picture of this post) and visually track what I did, taking screenshots as I want to keep some results offline;
  2. shell out 50 USD for the Premium reports that can be downloaded and use whatever software to look at the data – note that you get more than just reports for that;
  3. use the Fitbit API and figure out how to get my data out of it.

Of course I chose the third option. It is a bit more complicated but, with the help of one of Ben Sidders’ posts, I started coding my “app” in R, the statistical language. As there is a bit more to it than Ben explains, I posted all my code on the Github repository of my app, jepsfitbitapp.

The first thing I wanted to see is the most obvious one: my steps. As you can see in the figure below I started to collect data in March 2013 (with the One), I stopped collecting data around October 2013 (when I lost the One) and I re-started later on (with the Flex). I usually walk between 5,000 and 10,000 steps per day, with a maximum on July 1st (the day we moved). 10,000 steps is the daily goal Fitbit gave me. There is a significant difference in the number of steps measured by the One (before October) and the Flex (after October): I cannot really say if it is due to the change in tracking device (and their different location on the body) or if I kind of reduced my physical activity (mainly because of more work, sitting in the office).
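One rough way to frame the One-versus-Flex question is to compare the average daily steps for each device period. The sketch below is in Python (not the R code of jepsfitbitapp) and the step counts are made-up placeholders, not my 2013 data; the names `steps_by_device` and `average_steps` are purely illustrative.

```python
from statistics import mean

# Hypothetical daily step counts per device (placeholders, NOT real data).
steps_by_device = {
    "One": [8200, 9500, 7100, 10400, 6800],
    "Flex": [5600, 6100, 4900, 7000, 5400],
}


def average_steps(samples):
    """Return the mean number of daily steps for each device."""
    return {device: mean(values) for device, values in samples.items()}


averages = average_steps(steps_by_device)
difference = averages["One"] - averages["Flex"]

for device, avg in averages.items():
    print(f"{device}: {avg:.0f} steps/day on average")
print(f"Difference (One - Flex): {difference:.0f} steps/day")
```

Such a comparison only quantifies the gap: it cannot tell whether the gap comes from the device (and where it sits on the body) or from a real change in activity.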

Fitbit steps over time - 2013

As always, I promise to add some physical activity on top of this baseline as a New Year resolution. We’ll see next year how things evolve. In the meantime I’ll explore more of what I can extract from my Fitbits in the following posts. Stay tuned!

Belgium doesn’t score well in the Open Data Index (not speaking about health!)

The Open Knowledge Foundation (OKF) released the Open Data Index, along with details on their methodology. The index contains 70 countries, with the UK having the best score and Cyprus the worst. In fact the first places are taken by the UK, the USA and the Northern European countries (Denmark, Norway, Finland, Sweden).

And Belgium? Well, Belgium did not score very well: 265 / 1,000. The figure below shows its aggregated score (with green: yes, red: no, blue: unsure).

Open Data Index - Belgium

The issue with this graph is that you may first think it’s a kind of progress bar. For instance, in transport timetables, it seems Belgium reached 60% of a maximum. But the truth is that each bar represents the answer to a specific question. So the 9 questions are, from left to right:

  1. Does the data exist?
  2. Is it in digital form?
  3. Is it publicly available?
  4. Is it free of charge?
  5. Is it online?
  6. Is it machine readable (e.g. spreadsheet, not PDF)?
  7. Is it available in bulk?
  8. Is it open licensed?
  9. Is it up-to-date?
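To make the “one bar per question” reading concrete, here is a minimal Python sketch that treats a dataset as a list of nine answers instead of a progress bar. The answers are illustrative for one hypothetical dataset, not copied from the index, and `summarize` is just a helper name I made up.

```python
from collections import Counter

QUESTIONS = [
    "Does the data exist?",
    "Is it in digital form?",
    "Is it publicly available?",
    "Is it free of charge?",
    "Is it online?",
    "Is it machine readable?",
    "Is it available in bulk?",
    "Is it open licensed?",
    "Is it up-to-date?",
]

# Illustrative answers for ONE hypothetical dataset (not taken from the index).
answers = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "unsure"]


def summarize(questions, answers):
    """Count the answers and list the questions answered with 'no'."""
    if len(questions) != len(answers):
        raise ValueError("one answer per question is required")
    counts = Counter(answers)
    failed = [q for q, a in zip(questions, answers) if a == "no"]
    return counts, failed


counts, failed = summarize(QUESTIONS, answers)
print(dict(counts))
for question in failed:
    print("No:", question)
```

Five green bars out of nine therefore do not mean “55% done”: they mean five specific questions were answered with yes.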

With the notable exceptions of government spending and postcodes/zipcodes, nearly all Belgian data is available in one way or another. That’s already a start – but … none of the datasets is available in bulk, machine readable or openly licensed, and only a few of them are up to date. Be sure to read the information bubbles on the right of the table if you are interested in more details.

The national statistics category leads to a page of the Belgian National Bank. And here is one improvement that the OKF could bring to this index: there should be a category about health data. For Belgium we are stuck with some financial data from the INAMI (in PDF, not at all useful as is) but otherwise we have to rely on specific databases or the WHO, the OECD or the World Bank. The painful point is that these supranational bodies often rely on statistics from the states themselves – but Belgium doesn’t publish these data by itself!

If you are interested in the topic, three researchers from the Belgian Scientific Institute of Public Health published a study about health indicators in publicly available databases, 2 years ago [1]. Their conclusions were already that Belgium should improve on Belgian mortality and health status data. And the conclusion goes on about politically created issues for data collection, case definition, data presentation, etc.

I was recently in a developing country (Vietnam) where we try to improve data collection: without reliable data collection it is difficult to know what the issues are and to track potential improvements. In the end, this also applies to Belgium: we feel proud of our healthcare system, but on the other hand it is difficult to find health-related data in a uniform way. It is therefore difficult to track trends or improvements.

[1] Vanthomme K, Walckiers D, Van Oyen H. Belgian health-related data in three international databases. Arch Public Health. 2011 Nov 1;69(1):6.

About stacked bar graphs

This afternoon I received a bunch of data accompanied by stacked bar graphs for each dataset. For example, this one:

Stacked bar graph example

The chart shows the incidence of disease X in various age ranges. That incidence is split by 8 severity levels. The chart shows that the disease especially affects age ranges 4 and 5, at different severity levels. However I didn’t feel comfortable …

  • what are the different levels of severity in age ranges 1, 2 and 3?
  • how can we compare levels C, D and E in age ranges 4 and 5?
  • is there anywhere some severity A?
  • (it’s even worse when some age ranges don’t have any incidence at all: what is happening?)
  • etc.

I looked on the web but couldn’t find much information apart from the fact that “The Economist says they’re so bad at conveying information, that they’re a great way to hide a bad number amongst good ones” (but they are still using them in their graphic detail section) or that “a stacked column chart with percentages should always extend to 100%” (this doesn’t really apply here). Then in a post on Junk Charts, someone mentioned Steven Few, who would have said “not to use stacked bar charts because you cannot compare individual values very easily and as a rule [he] avoid[s] stacked bars with more than six or seven divisions”. Steven Few also discussed the topic in his forum here.

This reminded me that I read a book written by Steven Few a few years ago: Information Dashboard Design (O’Reilly Media, 2006). Inside, on pages 135-136, one can read that stacked bar graphs are the right choice only when you must display multiple instances of a whole and its parts, with emphasis primarily on the whole, and that this type of graph shouldn’t be used if the changes in distribution must be shown more precisely.

If one wants to clearly display both the whole and its parts, Steven Few recommends using either two graphs next to each other or a combined bar and line graph (with two quantitative scales).

As I’m not really interested in the whole but mainly in the parts and their relative distribution, I suggest another way to present the data. This isn’t really new. Actually everything was already in the table. You just format the table nicely and add some colour gradient. And voilà:

Simple table with data - instead of stacked bar graph
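The “table plus colour gradient” idea can be sketched in a few lines of Python. This is only an illustration: the incidence values are placeholders (not the numbers from my table), and `shade` and `to_html` are names I chose for the example. Each cell gets a background from white (lowest) to red (highest value in the table).

```python
# Hypothetical incidence counts per age range and severity level
# (placeholders, NOT the values from the table in this post).
incidence = {
    "Age 1": {"C": 2, "D": 0, "E": 1, "H": 0},
    "Age 4": {"C": 40, "D": 12, "E": 10, "H": 9},
    "Age 5": {"C": 35, "D": 15, "E": 11, "H": 8},
}


def shade(value, max_value):
    """Map a value to a white-to-red hex colour (higher value = more red)."""
    if max_value <= 0:
        return "#ffffff"
    channel = round(255 * (1 - value / max_value))
    channel = max(0, min(255, channel))  # clamp to a valid colour channel
    return f"#ff{channel:02x}{channel:02x}"


def to_html(table):
    """Render the nested dict as an HTML table with shaded cells."""
    levels = sorted({level for row in table.values() for level in row})
    top = max(value for row in table.values() for value in row.values())
    header = "".join(f"<th>{level}</th>" for level in levels)
    lines = ["<table>", f"<tr><th></th>{header}</tr>"]
    for age, row in table.items():
        cells = "".join(
            f'<td style="background:{shade(row.get(level, 0), top)}">'
            f"{row.get(level, 0)}</td>"
            for level in levels
        )
        lines.append(f"<tr><th>{age}</th>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


html = to_html(incidence)
print(html)
```

The digits stay visible in every cell, so the reader gets both the exact value and a quick visual impression of where the incidence is concentrated.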

You still see where the incidence is the highest (in age ranges 4 and 5) and what levels of severity are the most important (C, with lower but approximately similar levels of D, E and H). Compared to the graph above, one can also notice that severity levels A, B, F and G are not represented at all, and we can quickly grasp the proportions between the different incidences.

Of course, if your criterion for “sexiness” is that there shouldn’t be any digit on your chart, then this chart is not sexy. But I find this presentation really more appealing and meaningful than the stacked bar graph. Don’t you?

The Top 5 Killers of Men

From Delicious, I saw that Yahoo had an article about the top 5 killers of men. I thought it would be nice to see where they get their data from.

First, I have to mention that the article is really about American men, nothing else (not about mankind, not about men around the world, not about women, children, etc.). The article is related to the US National Men’s Health Week (the US National Women’s Health Week was on May 8-14, 2011). Although the article gives advice, it does not cite any source of information.

However, it’s rather easy to obtain these numbers …

For the US, the CDC FastStats website is a hub to data about health in the US. Here is the CDC ranking for the top 5 killers in 2007 (in both US women and men):

  1. Heart disease: 616,067 deaths
  2. Cancer: 562,875 deaths
  3. Stroke (cerebrovascular diseases): 135,952 deaths
  4. Chronic lower respiratory diseases: 127,924 deaths
  5. Accidents (unintentional injuries): 123,706 deaths
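As a quick piece of arithmetic on the CDC figures above, the following Python sketch sums the five counts and computes each cause’s share of that top-five total. Note the assumption: the shares are relative to the sum of these five causes only, not to all US deaths in 2007 (that total is not given here).

```python
# CDC 2007 counts as listed above (US women and men together).
us_top5 = {
    "Heart disease": 616_067,
    "Cancer": 562_875,
    "Stroke": 135_952,
    "Chronic lower respiratory diseases": 127_924,
    "Accidents": 123_706,
}

total_top5 = sum(us_top5.values())
print(f"Top-five total: {total_top5:,} deaths")

# Share of each cause within the top five (NOT within all deaths).
shares = {cause: deaths / total_top5 for cause, deaths in us_top5.items()}
for cause, share in shares.items():
    print(f"{cause}: {share:.1%} of the top-five total")
```

Heart disease and cancer together clearly dominate the three other causes in this list.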

If you look at the whole world (data from the UN), the picture is somewhat different! The UN ranking for the top 5 killers in 2008 (in both women and men) is:

  1. Lower respiratory infections: 1.05 million deaths
  2. Diarrhoeal diseases: 0.76 million deaths
  3. HIV/AIDS: 0.72 million deaths
  4. Ischaemic heart disease: 0.57 million deaths
  5. Malaria: 0.48 million deaths

Together they cause more than 45% of deaths around the world. The diseases with the highest mortality differ considerably when we compare the USA and the whole world. The main caveat is that the data I presented above are for men and women together. It would be interesting to use the UN data API project to dig further into the details.

Cognitive Surplus visualised

In the 300-and-more RSS items in my aggregator this week, there are 2 great ones from Information is Beautiful, a blog gathering (and publishing its own) nice ways to visualise data.

The first one is based on a talk by Clay Shirky who, in turn, was referencing his book Cognitive Surplus. In Cognitive Surplus visualized, David McCandless just represented one of Shirky’s ideas: 200 billion hours are spent each year by US adults just watching TV whereas only 100 million hours were necessary to create Wikipedia (I guess the platform + the content) …
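To get a feeling for the two orders of magnitude, here is the arithmetic as a tiny Python sketch. The two inputs are the figures quoted above; the conversion to calendar time is my own back-of-the-envelope step and assumes TV watching is spread evenly over a 365-day year.

```python
# Figures quoted from Shirky's talk, in hours.
tv_hours_per_year = 200 * 10**9   # US adults watching TV, per year
wikipedia_hours = 100 * 10**6     # estimated effort to create Wikipedia

# How many "Wikipedias" fit into one year of TV watching?
ratio = tv_hours_per_year // wikipedia_hours
print(f"One year of TV = {ratio} Wikipedias")

# Assumption: viewing is spread evenly over a 365-day year.
hours_in_year = 365 * 24
calendar_hours = hours_in_year / ratio
print(f"One Wikipedia = about {calendar_hours:.2f} hours of nationwide TV time")
```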

Cognitive Surplus visualised from Information Is Beautiful

It makes you think about either the waste television helps to produce or the potential of human brain(s) if relieved from the burden of television.

The second interesting post appeared in fact in information aesthetics, a blog where form follows data (referencing Information is Beautiful but I can’t find this post). In Top Secret America: Visualizing the National Security Buildup in the U.S., Andrew Vande Moere relates “an extensive investigative project of the Washington Post that describes the huge national security buildup in the United States after the September 11 attacks”. The project website contains all the ingredients for a well-documented investigation with the addition of interactive maps and flash-based interfaces allowing the user to build his/her own view on the project.

Top Secret America from the Washington Post

It’s nice to see investigative journalism combined with beautiful data visualisation and handling!