A contact matrix is a representation of contacts between groups of individuals. For instance, to model the spread of rumors on social media, you would ideally rely on contact matrices to compute the strength of the bonds between types of agents. In the infectious disease world, a contact matrix is used to approximate contacts between individuals of different groups, e.g. between grandparents and grandchildren.
In this blog post, after a short explanation of POLYMOD contact matrices, I will show how to get the data, process it and 3D print these matrices. Ready?
1. Finding contact matrices
The most widely used contact matrices in epidemiological modelling come from the POLYMOD study, published by Mossong et al. in 2008. The study is a population-based prospective survey of mixing patterns in eight European countries (Belgium, Germany, Finland, Great Britain, Italy, Luxembourg, The Netherlands, and Poland). Participants recorded information about their daily contacts in paper diaries (you might think this is old-fashioned, but nobody has reproduced this study or done better so far!).
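To make the idea concrete, here is a minimal Python sketch of how diary records could be turned into a contact matrix: for each pair of age groups, the number of reported contacts divided by the number of participants in the row group. This is a simplification under my own assumptions (the published POLYMOD matrices also involve weighting and corrections that this ignores), and the age groups and records below are made up.

```python
from collections import Counter


def contact_matrix(participants, contacts, groups):
    """Mean number of reported contacts per participant, by age group.

    participants: dict mapping participant id -> age group
    contacts: iterable of (participant id, contact age group) pairs,
              one pair per contact written in a diary
    groups: ordered list of age-group labels (rows and columns)
    Returns m where m[i][j] is the mean number of contacts that a
    participant in groups[i] reported with people in groups[j].
    """
    n_participants = Counter(participants.values())
    counts = Counter((participants[pid], group) for pid, group in contacts)
    return [
        [
            counts[(gi, gj)] / n_participants[gi] if n_participants[gi] else 0.0
            for gj in groups
        ]
        for gi in groups
    ]


# Made-up example: one child and two adults keeping a diary
participants = {1: "0-14", 2: "30-44", 3: "30-44"}
contacts = [(1, "0-14"), (1, "30-44"), (2, "0-14"), (2, "30-44"), (3, "30-44")]
print(contact_matrix(participants, contacts, ["0-14", "30-44"]))
# [[1.0, 1.0], [0.5, 1.0]]
```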
So what does it look like (I’ll take Belgium as an example here)?
You can see above a heatmap of physical contacts between participants and their contacts. Bluer cells indicate fewer contacts; whiter cells indicate more contacts. The diagonal running towards the top right therefore shows that most Belgian participants have contacts with people of the same age. This diagonal has two “wings”, representing interactions between parents in their 30s and their children. There are also two “bumps”, representing interactions between grandparents and their grandchildren.
So these heatmaps are already pleasant to the eye. But what if you could actually touch them and physically play with them? This is possible thanks to 3D printing, a manufacturing process that transforms practically any 3D model created on a computer into a physical artifact.
We’ll first need to get the data, process it into a suitable format and finally print it …
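For the last step, a 3D printer needs a mesh file. Below is a rough Python sketch, under my own assumptions, that turns a matrix into an ASCII STL file with one rectangular column per cell, its height proportional to the cell value. The cell size, maximum height and base thickness are arbitrary choices, and STL has no units (slicers usually read them as millimetres).

```python
def matrix_to_stl(matrix, cell=5.0, max_height=30.0, base=2.0, name="contacts"):
    """Return an ASCII STL string with one rectangular column per matrix cell.

    Column (i, j) covers [j*cell, (j+1)*cell] x [i*cell, (i+1)*cell] and is
    base + max_height * value / peak high, where peak is the largest value.
    """
    peak = max(max(row) for row in matrix) or 1.0  # avoid dividing by zero
    facets = []

    def quad(a, b, c, d, normal):
        # Split a rectangular face into two triangles, keeping the
        # counter-clockwise order (seen from outside) that STL expects.
        facets.append((normal, (a, b, c)))
        facets.append((normal, (a, c, d)))

    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            x0, x1 = j * cell, (j + 1) * cell
            y0, y1 = i * cell, (i + 1) * cell
            h = base + max_height * value / peak
            quad((x0, y0, 0), (x0, y1, 0), (x1, y1, 0), (x1, y0, 0), (0, 0, -1))  # bottom
            quad((x0, y0, h), (x1, y0, h), (x1, y1, h), (x0, y1, h), (0, 0, 1))   # top
            quad((x0, y0, 0), (x1, y0, 0), (x1, y0, h), (x0, y0, h), (0, -1, 0))  # y = y0 side
            quad((x1, y1, 0), (x0, y1, 0), (x0, y1, h), (x1, y1, h), (0, 1, 0))   # y = y1 side
            quad((x0, y1, 0), (x0, y0, 0), (x0, y0, h), (x0, y1, h), (-1, 0, 0))  # x = x0 side
            quad((x1, y0, 0), (x1, y1, 0), (x1, y1, h), (x1, y0, h), (1, 0, 0))   # x = x1 side

    lines = [f"solid {name}"]
    for normal, triangle in facets:
        lines.append("  facet normal {} {} {}".format(*normal))
        lines.append("    outer loop")
        for vertex in triangle:
            lines.append("      vertex {} {} {}".format(*vertex))
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file ending in .stl gives something a slicer can open. Adjacent columns touch but are written as separate closed boxes, so some slicers may want a mesh-repair pass before printing.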
There Dr. Gammino describes the hurdles faced by healthcare workers in countries where census data is often missing and where political, seasonal and geographical variations make the work even more difficult. The description of the different social structures in urban and rural areas was also interesting. The post also highlights how “social mapping” and geographic information systems (GIS) help to understand where the population resides and to reach it (here for polio vaccination, but this could serve other purposes: maternity care, child care, etc.).
In this respect, modelling could help determine the best strategy to reach still unknown populations, anticipate where settlers could move or decide where to concentrate efforts. A few papers actually address these issues. For instance, Rahmandad et al. studied the impact of network types (networks between individuals) on the dynamics of a polio outbreak, and Tony Wragg reported the influence of information campaigns on polio eradication in India (one could treat information as an infectious agent).
Now it would be interesting to see the two worlds collide: feeding this geotagged information into a prediction model and sending the predictions back to healthcare workers in the field to point them to areas worth visiting. This would have logistical implications, and such efforts should also address privacy questions. But it could help eradicate polio too.
Within a few days of each other, I received two very different reviews of data collected by users of “activity trackers”.
The first one came from Jawbone (although I don’t own the UP, I might have subscribed to one of their mailing lists earlier) and is also publicly available here. Named “2013, the big sleep”, it is a kind of infographic showing how public (and mostly American) events influenced the sleep of the “UP Community”. Data about all (or at least a lot of) UP users were aggregated and shown. This is Big Data! It is a wonderful, quantitative insight into the impact of public events on sleep! But it is also a public display of (aggregated) individual data (something UP users most probably agreed to by default when accepting the policy, sometimes when they first used their device).
The second one came from Fitbit, also via e-mail. It stated how many steps I took in total as well as my most and least active periods / days of 2013. At the bottom there was a link to a public page comparing distances traveled in general with what they could mean in the animal kingdom (see below or here). This is not Big Data (although I am sure Fitbit has access to all these data). At the same time, (aggregated) individual data are not shared with the general public (although here again I am sure a similar policy applies to Fitbit users).
Different companies, different ways to handle the data … I hope people will realise the implications of sharing their data automatically with such centralized services.
After a bit less than 2 hours, jepsfitbitapp retrieved my sleep data from Fitbit for the whole of 2013 (read the previous post for the why (*)). Since this dataset covers periods when I didn’t have a tracking device and, more broadly, since I always slept at least a little bit at night, I removed all data points indicating that I didn’t sleep.
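The filtering step above, followed by a basic summary, could look like this in Python (my scripts are in R; this is only an illustrative sketch). It assumes one value of minutes asleep per calendar day, with 0 meaning no sleep was recorded; the sample values are made up.

```python
def sleep_summary(minutes_asleep):
    """Mean, shortest and longest night, ignoring days recorded as 0 minutes."""
    nights = [m for m in minutes_asleep if m > 0]  # drop days without sleep data
    return sum(nights) / len(nights), min(nights), max(nights)


def as_hours_minutes(minutes):
    """Format a number of minutes as 'H h MM min'."""
    hours, rest = divmod(round(minutes), 60)
    return f"{hours} h {rest:02d} min"


# Made-up sample: two untracked days (0) and four nights
mean, shortest, longest = sleep_summary([0, 92, 337, 764, 0, 400])
print(as_hours_minutes(mean), as_hours_minutes(shortest), as_hours_minutes(longest))
# 6 h 38 min 1 h 32 min 12 h 44 min
```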
So I slept 5 hours and 37 minutes on average in 2013, with one very short night of 92 minutes and one very nice night of 12 hours and 44 minutes. Fitbit devices do not detect when you go to sleep and when you wake up: you have to tell them (for instance by tapping 5 times on the Flex) that you are going to sleep or waking up (by the way, this is a very clever way to use the Flex, which has no button). Once told you are in bed, the Flex determines the number of minutes to fall asleep, minutes after wake-up, minutes asleep, minutes awake, … The duration mentioned here is the actual duration the Fitbit device considers I slept (variable
Visually there seems to be a tendency to sleep more as 2013 passes. But although the best linear fit shows a slope, the difference between sleep in March and sleep in December is not significant.
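One way to check whether a trend like this is significant is to fit a least-squares line of minutes asleep against the day of the year and look at the slope’s t statistic. A minimal standard-library Python sketch, assuming the nights without data were already removed (so the days are passed explicitly); the sample values are made up:

```python
import math


def linear_trend(days, minutes):
    """Least-squares line minutes ~ intercept + slope * day.

    Returns (slope, intercept, t) where t is the slope divided by its
    standard error (to be compared with a t distribution, n - 2 df).
    """
    n = len(days)
    mean_x, mean_y = sum(days) / n, sum(minutes) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, minutes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(days, minutes))
    stderr = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, intercept, slope / stderr


slope, intercept, t = linear_trend([0, 1, 2, 3, 4], [1, 3, 2, 5, 4])
print(round(slope, 3), round(intercept, 3), round(t, 3))
```

With a few hundred nights, |t| below roughly 2 means the slope is not significant at the 5% level. In R, summary(lm(...)) reports the same t value for the slope.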
R allows you to study the data in many different ways (of course!). When plotting the distribution of durations asleep, it looks like it could follow a normal (Gaussian) distribution (see the graph below). But the Shapiro-Wilk normality test rejects the hypothesis that the data come from a normal distribution.
The simple plot of the number of awakenings over time shows the same non-significant trend as the sleep duration (above). The histogram of awakenings is more lopsided than the sleep duration histogram, with most nights concentrated at a low number of awakenings. There is, however, a relation between the two variables: the more I sleep, the more awakenings the Flex detects (see second graph below).
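The relation between the two variables can be quantified with a correlation coefficient. A small Pearson sketch in Python, assuming two aligned lists (minutes asleep and number of awakenings, one entry per night); the sample nights are made up:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up nights: minutes asleep and awakenings detected
asleep = [250, 300, 340, 380, 420]
awakenings = [5, 7, 8, 10, 12]
print(round(pearson(asleep, awakenings), 2))
```

On Python 3.10+, statistics.correlation computes the same coefficient.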
Sleep efficiency is the ratio of the total time asleep to the total time in bed, counted from the moment I fell asleep. It is therefore not related to the different sleep stages. However, it may point to an issue worth investigating with a real doctor. In my case, although I woke up 9 times per night on average in 2013, my sleep efficiency is very high (93.7% on average) …
… or very low. There are indeed some nights where my sleep efficiency is below 10% (see the 4 points at the bottom of the chart). These correspond to nights when I didn’t sleep a lot and also had very few awakenings (since these are related).
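The definition of sleep efficiency given above translates into a one-line ratio. The parameter names below are hypothetical and not necessarily Fitbit’s own fields, and Fitbit’s exact formula may differ; this is only a sketch of the definition as stated in this post.

```python
def sleep_efficiency(minutes_asleep, minutes_in_bed, minutes_to_fall_asleep=0):
    """Percentage of the time in bed, counted from sleep onset, spent asleep.

    Parameter names are illustrative, not Fitbit's API fields.
    """
    window = minutes_in_bed - minutes_to_fall_asleep  # time in bed after falling asleep
    if window <= 0:
        raise ValueError("no time in bed after falling asleep")
    return 100.0 * minutes_asleep / window


# Made-up night: 480 min in bed, 15 min to fall asleep, 436 min asleep
print(round(sleep_efficiency(436, 480, 15), 1))
```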
There is no mood tracking with Fitbit (except for one additional tracker that you define yourself and for which you must enter values manually): everything tracked has to be a numerical value, either tracked automatically or entered manually. It would be interesting to couple these tracked variables with the level of fatigue at wake-up time or with your mood during the following day. I guess there are apps for that too …
(*) Note: I just discovered that there is in fact a specific API call for time series … That’s for a future post!
The data in the previous post belong to the “activities” category. In this category it is easy to get data about a specific activity over several days in one request. Sleep parameters are not in this category, and I couldn’t find a way to get all the sleep durations (for instance) in one query (*). So I updated the code to request all sleep parameters for each and every day of 2013 … and I hit the limit of 150 requests per hour.
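Requesting one day at a time means 365 requests for 2013, i.e. three hourly batches under a 150-requests-per-hour limit. A Python sketch of that scheduling, where fetch_day is a hypothetical callable (for instance a wrapper around an authenticated call to the sleep endpoint) and the one-hour pause is a conservative assumption about when the quota resets:

```python
import time
from datetime import date, timedelta


def daily_batches(start, end, per_hour=150):
    """Every date from start to end (inclusive), split into hourly batches."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [days[i:i + per_hour] for i in range(0, len(days), per_hour)]


def fetch_all(fetch_day, start, end, per_hour=150, pause=3600):
    """Call the (hypothetical) fetch_day(date) once per day, pausing between batches."""
    results = {}
    for k, batch in enumerate(daily_batches(start, end, per_hour)):
        if k > 0:
            time.sleep(pause)  # wait for the hourly quota to reset
        for day in batch:
            results[day] = fetch_day(day)
    return results


print([len(b) for b in daily_batches(date(2013, 1, 1), date(2013, 12, 31))])
# [150, 150, 65]
```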
This graph shows what I have achieved so far. I didn’t sleep much in March-April 2013: 4.9 hours per night on average. The interesting thing is that I can understand why by going back to my agenda for that period (work, study, family …). As soon as I can get more data, it will be interesting to see whether sleep durations increased later on.
(*) If you know how to get all sleep durations for 2013 in one query, let me know!
I continue to explore data about my physical activity in 2013 (see part 1). On July 1st, 2013, we moved from an apartment (on the third floor of a building) to a house (with two floors). I was wondering whether the change would have an impact on the number of floors I climbed: I now have to climb stairs to reach the bedrooms and go down to reach the living room. A standard house.
Two things before diving into the data … First, I sometimes used to climb the stairs to the 3rd floor of my building (and at the office I always worked on the same floor). Second, only the Fitbit One records the number of floors you climb, not the Flex (you can enter them in the web interface, but I don’t). So I don’t consider the data after I lost my Fitbit One (Sep. 16). I don’t really know how the One determines the number of floors I climb, but it felt fairly accurate. For instance, when I climbed the 3 floors in my building, the One always added 3 floors to its counter.
So now the data. I updated the R scripts and here is what I get for the number of floors.
On average I did not climb a lot of stairs: on most days it was below 20 floors. And if I compare the data before and after the move, there is indeed a significant difference (p=2.49e-06)! But I climbed more floors when I lived in the apartment than since I have lived in the house (means of 12.59 and 7.37 floors, respectively)!
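The post doesn’t say which test produced p=2.49e-06. Assuming it was a Welch two-sample t-test (the default of R’s t.test), the statistic can be sketched with the Python standard library as follows; the p-value itself needs the t distribution, e.g. scipy.stats.ttest_ind(a, b, equal_var=False). The floor counts below are made up.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up daily floor counts before and after a move
apartment = [12, 15, 10, 14, 11]
house = [7, 8, 6, 9, 7]
t, df = welch_t(apartment, house)
print(round(t, 2), round(df, 1))
```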
There are a few outliers, days when I climbed relatively more than others. Going back to my agenda, it corresponded to:
- one day I took off just after the move to arrange things at home (strangely, the days of the move itself don’t show more of that activity);
- one day when I came back from a business trip (I had to walk a lot to/in/from airports);
- two days with no particular event.
The lessons I draw are that you don’t necessarily need stairs where you live to climb more floors (in my case it appears to be the opposite), and that I don’t need a specific activity to climb more floors: it’s a question of willingness more than anything else.
Next post: how much sleep did I get in 2013!
2013 is near its end and it’s time to see what happened during the last 360 days or so. Many things happened (MBA graduation, new house, holidays, a few sick days, …) but I wanted to know whether these changes could be quantified and how they affected my daily physical activity.
For that purpose I bought a Fitbit One in March 2013. I chose Fitbit over other devices available because of the price (99 USD at the time) and because it was available in Europe (via a Dutch vendor). At that time the Jawbone Up was unavailable (even in the USA) and the Nike Fuelband couldn’t track my sleep.
Basically the One is a pedometer (it counts the steps you take per day), but it also tracks the number of floors climbed and the time asleep. Note that you have to tell your device when you go to sleep and when you wake up; it automatically subtracts the times you were awake. The rest of the data presented is derived from these few observed variables: distance traveled, calories burnt, … The Fitbit website also categorizes your activity from ‘sedentary’ to ‘very active’.
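Distance is a good example of a derived variable: pedometers commonly estimate it as steps times stride length. A tiny Python sketch, where the 75 cm stride is a hypothetical value and Fitbit’s exact formula isn’t described here:

```python
def distance_km(steps, stride_cm):
    """Estimate distance walked from a step count and a stride length in cm."""
    return steps * stride_cm / 100_000  # 100,000 cm per km


print(distance_km(10_000, 75))  # 7.5
```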
Of course there is an app (for both iOS and Android) where you can also enter what you eat (it automatically calculates the number of calories ingested) and your weight (unless you buy a wifi scale from them). You can set goals on the website, and it then tells you how many steps you have to take per day. All this data is stored on a Fitbit server and you can access it via your personal dashboard (yes, your data is kept away from you, but there are ways to get it …).
I liked the Fitbit One mainly because it is easy to use: you take it and forget it, it works in the pocket. There is a nice, easy-to-use web interface, great for immediate consumption (not really for long-term trend analysis). The device is quite cheap to acquire (well, it is quite small anyway). It works with desktop software as well as with a mobile app (incl. synchronisation). The One can easily be forgotten in a pocket (which gives peace of mind), but it doesn’t work when you don’t have pockets (shower, pyjamas, changing clothes, …; I didn’t use the clip/holder at the waist).
That leads me to its disadvantages …
- First, it’s a proprietary system: you need to pay 50 USD to get the data you generate, i.e. your own data. Although this makes perfect sense from a business perspective, the device then effectively costs about 150 USD (and not only the 99 USD purchase price).
- Then it also uses a proprietary connector to charge the device. This is problematic when you move house (the cable is somewhere in a box) or simply when the cable is lost (see the messages on Twitter from people looking for a lost cable). Most mobile phone manufacturers understood that and provide a standard USB interface (for charging and syncing, by the way). I guess that is the price of the small form factor.
- Tracking activities other than movement is tedious, especially since entering the food you ate in the app requires an internet connection (but that is the general drawback of manual vs automatic logging).
- Tracking is also sometimes impractical, e.g. between waking up and getting dressed, or in the shower. So is there always some under-reporting? Probably, as I don’t wear it when changing or in pyjamas (no pocket). Of course the One comes with an armband holder, but I guess it records data differently there.
But the last and main disadvantage that comes to mind is linked to its main advantage: it is so easy to use and to forget (e.g. in the washing machine) that it can fall without you noticing.
So of course I lost it. It was on a business trip in South-East Asia. I thought I had put it in my suitcase when changing pants, but I couldn’t find it anymore. So after some hesitation I chose to get a Fitbit Flex.
The Flex comes in another format: it’s like a small pill that you put in a plastic armband. It is therefore closer to the body (though not to the legs, for counting steps) and you don’t need pockets: it’s like the bracelets you receive at some concerts. However, it doesn’t display the time (if you wear a watch, will you have 2 devices on your left wrist? Fitbit now sells an evolution of the Flex, the Force, with LEDs displaying the time among other things). As it is always in its armband, I feel it is less likely to be forgotten. The battery life is approximately the same: around 7 days. You can read another comparison of the two here.
So, what about 2013?
To dig into the past I could:
- use the Fitbit dashboard (see first picture of this post) and visually track what I did, taking screenshots since I want to keep some results offline;
- shell out 50 USD for the Premium reports, which can be downloaded, and use whatever software I like to look at the data (note that you get more than just reports for that);
- use the Fitbit API and figure out how to get my data out of it.
Of course I chose the third option. It is a bit more complicated, but with the help of one of Ben Sidders’ posts I started coding my “app” in R, the statistical language. As there is a bit more to it than what Ben explains, I posted all my code on the GitHub repository of my app, jepsfitbitapp.
The first thing I wanted to see is the most obvious one: my steps. As you can see in the figure below, I started to collect data in March 2013 (with the One), stopped around October 2013 (when I lost the One) and re-started later on (with the Flex). I usually walk between 5,000 and 10,000 steps per day, with a maximum on July 1st (the day we moved). 10,000 steps is the daily goal Fitbit gave me. There is a significant difference between the number of steps measured by the One (before October) and by the Flex (after October): I cannot really say whether it is due to the change of tracking device (and their different locations on the body) or whether I actually reduced my physical activity (mainly because of more work, sitting in the office).
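To put numbers on the One vs Flex comparison, a Python sketch could split days at the device switch and compare mean steps and the share of days reaching the 10,000-step goal. The switch date is hypothetical (the post only says “around October”), days with 0 steps are treated as untracked, and the sample data are made up.

```python
from datetime import date
from statistics import mean


def compare_devices(steps_by_day, switch_day, goal=10_000):
    """Mean steps and share of days reaching the goal, before/after switch_day.

    Days with 0 steps are treated as untracked and ignored.
    """
    def summarize(values):
        return mean(values), sum(v >= goal for v in values) / len(values)

    before = [s for d, s in steps_by_day.items() if d < switch_day and s > 0]
    after = [s for d, s in steps_by_day.items() if d >= switch_day and s > 0]
    return summarize(before), summarize(after)


steps = {
    date(2013, 9, 1): 12_000,
    date(2013, 9, 2): 8_000,
    date(2013, 10, 1): 0,  # no device that day
    date(2013, 11, 1): 6_000,
    date(2013, 11, 2): 10_000,
}
print(compare_devices(steps, switch_day=date(2013, 10, 15)))
```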
As always, I promise to add some physical activity on top of this baseline as a New Year’s resolution. We’ll see next year how things evolve. In the meantime I’ll explore what else I can extract from my Fitbits in the following posts. Stay tuned!
I work in a company that shifted from being R&D-driven to being project-driven. It has been official since 2013, but we saw it coming: for a few years now, the main pieces of memory have been PowerPoint slides.
Everything is in PowerPoint, from agendas, discussions and presentations to minutes. Even when modelers want to show some results, they put them on a slide deck first …
For presentations I used to use Beamer but installing the LaTeX toolchain on a restricted, company-owned Windows laptop was a long and cumbersome process. I made a first presentation in Reveal.js this week. And I love it!
I prefer Beamer, Reveal.js and similar tools because a) they force you to think of your message and its structure first and b) they force you to reuse material that already exists (rather than creating new things in / for the presentation medium). Therefore the presentation is a real presentation of something that really exists, that was really thought through outside of the context of the presentation (and before it!).
The additional benefit (IMHO) of Reveal.js is that you just need a text editor and a browser. All restricted company laptops provide you with these tools.
The memory of my projects remains in my models, my notes and reports. The context, next steps and consequences are there. The project evolution is better understood (in reviews, audits and simple chats with colleagues).
The Open Knowledge Foundation (OKF) released the Open Data Index, along with details on its methodology. The index covers 70 countries, with the UK having the best score and Cyprus the worst. In fact the top places are held by the UK, the USA and the Northern European countries (Denmark, Norway, Finland, Sweden).
And Belgium? Well, Belgium did not score very well: 265 / 1,000. The figure below shows its aggregated score (with green: yes, red: no, blue: unsure).
The issue with this graph is that you may at first think it’s a kind of progress bar. For instance, for transport timetables, it seems Belgium reached 60% of a maximum. But in fact each bar represents the answer to a specific question. The 9 questions are, from left to right:
- Does the data exist?
- Is it in digital form?
- Is it publicly available?
- Is it free of charge?
- Is it online?
- Is it machine readable (e.g. spreadsheet, not PDF)?
- Is it available in bulk?
- Is it open licensed?
- Is it up-to-date?
With the notable exceptions of government spending and postcodes/zipcodes, nearly all Belgian data is available in one way or another. That’s already a start, but … None of it is available in bulk, machine readable or openly licensed, and only a few datasets are up to date. Be sure to read the information bubbles on the right of the table if you are interested in more details.
The national statistics category leads to a page of the Belgian National Bank. And here is one improvement the OKF could bring to this index: there should be a category about health data. For Belgium we are stuck with some financial data from the INAMI (in PDF, not at all useful as is); otherwise we have to rely on specific databases or on the WHO, the OECD or the World Bank. The painful point is that these supranational bodies often rely on statistics from the states themselves, but Belgium doesn’t publish these data itself!
If you are interested in the topic, three researchers from the Belgian Scientific Institute of Public Health published a study about health indicators in publicly available databases 2 years ago. They already concluded that Belgium should improve its mortality and health status data. Their conclusion goes on about politically created issues for data collection, case definition, data presentation, etc.
I was recently in a developing country (Vietnam) where we are trying to improve data collection: without reliable data collection it is difficult to know what the issues are and to track potential improvements. In the end, this also applies to Belgium: we are proud of our healthcare system, but it is difficult to find health-related data in a uniform way. It is therefore difficult to track trends or improvements.
Vanthomme K, Walckiers D, Van Oyen H. Belgian health-related data in three international databases. Arch Public Health. 2011 Nov 1;69(1):6.
In my opinion, privacy issues are a by-product of information retention times approaching infinity.
For centuries and more, humans were used to their own type of memory. When information reaches the brain, it is stored in short-term memory. When relevant and/or repeated, it is gradually consolidated into long-term memory (this is roughly the process).
Oral transmission of knowledge, then written transmission (incl. Gutenberg) and, to a certain extent, the internet each successively increased how long information shared with others is retained. The switch from oral to written transmission also sped up the dissemination of information and gave it a fixed, un- (or less) interpreted nature.
With the internet (“1.0”, to throw in a buzzword) the lifetime of information is also extended but still limited; it was merely a copy of printing (apart from the speed of transmission). Take this blog, for instance: information stored here will stay as long as I maintain it and keep the engine alive. The day I decide to delete it, the information is gone. And the goal of the internet was to be able to reach information where it is published, even when there are problems in the communication pipes.
However, on top of this internet came a series of tools like search engines (“Google”) and centralized social networks (“Facebook”). Now information is copied, duplicated and reproduced, partly because the digital nature of the medium makes this easy, but also because these services deliberately concentrate information that would otherwise be spread out. Google concentrates (part of) the information in its own datacenters in order to extract other types of information and to serve searches faster. Facebook (and other centralized social networks) asks users to voluntarily keep their (private) information in its own data repository. And apparently the NSA is also building its own database about us on its premises.
In my opinion, whenever we shared information before, privacy issues were already there (what do you share? with whom? in which context? …). But the lifetime of information is now becoming an issue.