Tag: data

Time commuting in Belgium

DISO1 – Data I Sit On, episode 1. This post is the first in a short series revisiting data I collected in the past and found interesting to look at again … (I have already posted about data I collected; see the Quantified Self tag on this blog.)

Life is short and full of different experiences. One experience I don’t particularly enjoy, but which is an integral part of life, is commuting. Although I tried to minimize commuting (mainly by choosing a home close to the office) and benefit(ed) from good working conditions (flexible working hours, working from home, etc.), a big change occurred in 2015, when I took a new opportunity to work in the Belgian capital, Brussels.

Traffic jam in Brussels – one of my pictures on Flickr (CC-by-sa)

From where I lived at that time, public transportation was unfortunately not a viable option: it meant roughly 2 hours each way and at least 2 changes between bus, train and metro. Anyway, Belgium is known for having lots of cars, and I benefited from a company car. As I have been interested in the Quantified Self for some time, I started collecting data about my daily commute.

What I wanted to see was the seasonality of commuting (I initially expected shorter commute times during school breaks) and the difference between leaving for work after driving the children to school and leaving without them, … There is also an extensive literature on the impact of commuting on quality of life …

So, how did I do that?

The route I usually took between my home at the time (in Wavre) and my office at the time (in Brussels, both in Belgium) is 28km long, and the fastest time I ever saw on Google Maps for this distance is about 20-25 minutes.

I noted the following elements in whatever default note-taking app was on my phone at the time (Keep on Android, Notes on iOS). The first field in each row is the date in %y%m%d format, i.e. year, month and day of month as zero-padded decimal numbers, 2 digits each. The second field is the start time in %H%M format, i.e. hours (24-hour clock) and minutes, also as zero-padded decimal numbers, 2 digits each. The start time is when I get into my car at home in the morning. The third field is the arrival time (same format as the start time), defined as when I stop the engine at work. The fourth and fifth fields are the start and arrival times for the trip back home, defined and formatted the same way, mutatis mutandis. Any missing start/arrival time is marked as “na” or “NA”. This happens, for instance, when I leave the office but stop to meet a client (or, more prosaically, to do some grocery shopping) before heading home. I may have missed one or two whole days at most. The data is on Github.
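To make the row format concrete, here is a minimal Python sketch of a parser for one line of the log (the actual analysis was done in R). It assumes the five fields are separated by commas and/or whitespace, which the post does not specify; `parse_hhmm`, `trip_minutes` and `parse_row` are hypothetical names.

```python
import re
from datetime import datetime
from typing import Optional


def parse_hhmm(field: str) -> Optional[int]:
    """Convert a zero-padded %H%M field (e.g. '0611') to minutes since midnight.

    Missing values are logged as 'na' or 'NA' and are returned as None.
    """
    if field.lower() == "na":
        return None
    t = datetime.strptime(field, "%H%M")
    return t.hour * 60 + t.minute


def trip_minutes(start: Optional[int], end: Optional[int]) -> Optional[int]:
    """Duration of one trip in minutes, or None if either time is missing."""
    if start is None or end is None:
        return None
    return end - start


def parse_row(line: str) -> dict:
    """Parse one log row: a %y%m%d date followed by four %H%M times."""
    # ASSUMPTION: commas and/or whitespace separate the five fields.
    day, *times = re.split(r"[,\s]+", line.strip())
    leave_home, arrive_work, leave_work, arrive_home = (parse_hhmm(t) for t in times)
    return {
        "date": datetime.strptime(day, "%y%m%d").date(),
        "leave_home": leave_home,
        "arrive_work": arrive_work,
        "leave_work": leave_work,
        "arrive_home": arrive_home,
        "to_work": trip_minutes(leave_home, arrive_work),
        "to_home": trip_minutes(leave_work, arrive_home),
    }
```

With a made-up row such as `150907,0730,0812,1705,NA`, `parse_row` gives a 42-minute trip to work and `None` for the trip home, since the arrival time is missing.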

On a daily basis, the little game is to try to figure out which lane is the fastest and whether there is a pattern in the journey that makes it faster (I think there is). However, there are so many little things to track in this game that I did not record these small differences: the journey is assumed to follow more or less the same route every day.

In the end, the complete log is saved on my computer and analysed in R (version 3.3.2). The typical measures I’m interested in are departure/arrival times over time, commute duration over time, and commute duration per month, per day of the week or per season, … for both the morning and afternoon journeys where applicable. Some fun measures would be the earliest I left for work, the latest I arrived at work, the earliest I left work, the latest I left work, the shortest journey ever (to compare with Google’s estimate) and the longest journey ever …
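The “fun measures” above boil down to minima and maxima over the log, once missing values are skipped. A minimal Python sketch, assuming each day has already been reduced to a tuple of four times in minutes since midnight (None for “na”); the names are hypothetical and this is not the R code behind the post.

```python
from typing import Optional, Sequence, Tuple

# One logged day: (leave_home, arrive_work, leave_work, arrive_home) in minutes
# since midnight; None stands for a value logged as "na"/"NA".
Day = Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]


def trip(start: Optional[int], end: Optional[int]) -> Optional[int]:
    """Trip duration in minutes, or None when a time is missing."""
    return None if start is None or end is None else end - start


def clock(minutes: int) -> str:
    """Format minutes since midnight as in the post, e.g. 371 -> '6.11'."""
    return f"{minutes // 60}.{minutes % 60:02d}"


def extremes(days: Sequence[Day]) -> dict:
    """Earliest/latest departures and shortest/longest trips, skipping missing values."""
    def known(values):
        return [v for v in values if v is not None]

    left_home = known(d[0] for d in days)
    arrived_work = known(d[1] for d in days)
    left_work = known(d[2] for d in days)
    to_work = known(trip(d[0], d[1]) for d in days)
    to_home = known(trip(d[2], d[3]) for d in days)
    return {
        "earliest_left_home": clock(min(left_home)),
        "latest_left_home": clock(max(left_home)),
        "latest_arrived_work": clock(max(arrived_work)),
        "earliest_left_work": clock(min(left_work)),
        "latest_left_work": clock(max(left_work)),
        "shortest_to_work": min(to_work),
        "longest_to_work": max(to_work),
        "shortest_to_home": min(to_home),
        "longest_to_home": max(to_home),
    }
```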

An unintended measure here is the amount of time actually spent in the office (on a side note, this is different from productivity – but I haven’t found any unambiguous or flawless measure of productivity so far …). Some interesting variations would be the average and median duration of my work days, and the shortest and longest days I had, … (I don’t know whether my former employer would be happy or angry to see these results 😉 but note this doesn’t take into account the numerous times I worked from home, even in the evening after a whole day at the office …).
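Time in the office is simply the gap between stopping the engine at work and leaving work. A minimal Python sketch computing the average, median, shortest and longest stay from (arrival, departure) pairs in minutes since midnight; the function names are hypothetical.

```python
from statistics import mean, median
from typing import Iterable, Optional, Tuple


def office_minutes(arrive_work: Optional[int], leave_work: Optional[int]) -> Optional[int]:
    """Time at the office: from stopping the engine at work to leaving work."""
    if arrive_work is None or leave_work is None:
        return None
    return leave_work - arrive_work


def hm(minutes: float) -> str:
    """Render a duration in minutes as hours and minutes, e.g. 242 -> '4h02'."""
    m = round(minutes)
    return f"{m // 60}h{m % 60:02d}"


def office_summary(pairs: Iterable[Tuple[Optional[int], Optional[int]]]) -> dict:
    """Mean, median, shortest and longest stay, ignoring days with a missing time."""
    stays = [
        s for s in (office_minutes(arrive, leave) for arrive, leave in pairs)
        if s is not None
    ]
    return {
        "mean": hm(mean(stays)),
        "median": hm(median(stays)),
        "shortest": hm(min(stays)),
        "longest": hm(max(stays)),
    }
```

Days where the departure from work is logged as “na” (e.g. a stop at a client’s) are dropped rather than guessed.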

In theory, the fastest I could go is an average of 84km/h (28km in 20 minutes according to Google Maps, so this reflects traffic, not maximum speed limits). In practice, it is a whole different story …
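The 84km/h figure is just distance over time. A small Python sketch of the arithmetic; the 160-minute line assumes the longest trip followed the usual 28km route, which the post does not state.

```python
def average_speed_kmh(distance_km: float, minutes: float) -> float:
    """Average speed over a trip, in km/h."""
    return distance_km * 60 / minutes


ROUTE_KM = 28  # the usual route described above

print(average_speed_kmh(ROUTE_KM, 20))   # Google Maps best case: 84.0
print(average_speed_kmh(ROUTE_KM, 160))  # longest trip to work (same route assumed): 10.5
```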

In a bit more than a year of collected data:

  • the earliest I left home was 6.11 and the latest 10.11;
  • consequently, the earliest I arrived at work was 6.32 and the latest 10.36;
  • the shortest trip to work was 18 minutes and the longest 160 minutes (on 22 March 2016, the day of the Brussels Airport bombing; the office is close to the airport – I still remember);
  • the earliest I left work was 12.34 (I assume a half-day off) and the latest 21.24 (I assume a lot of work that day);
  • consequently, the earliest I arrived back home was 12.59 and the latest 21.43;
  • the shortest trip back home was 7 minutes (this must be an input error!!!) and the longest 128 minutes (nothing surprising here, given Brussels traffic jams).

Finally, the shortest stay at the office was 242 minutes (4 hours and 2 minutes) – that was the half-day off. And the longest stay at the office was 754 minutes (12 hours and 34 minutes).

As always, these things are nice when rendered as graphs …

180215-BxlAllTrafficPoints

A first note is that none of these graphs shows any seasonality in the data. At first, I thought I would go faster during school holidays but, as the data show, this was more a feeling than anything else. And although the time spent at work varied widely from day to day, the average time spent at work seems pretty constant over the year, which surprised me:

180215-BxlTimeSpentAtWork
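The absence of seasonality can also be checked numerically rather than visually, by averaging trip durations per calendar month. A minimal Python sketch, assuming (date, duration in minutes) pairs with None for missing trips; `monthly_means` is a hypothetical name and this is not the R code used for the graphs.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def monthly_means(trips: Iterable[Tuple[date, Optional[int]]]) -> Dict[int, float]:
    """Average trip duration per calendar month (1-12), pooling all years.

    Days whose duration is missing (None) are skipped.
    """
    by_month = defaultdict(list)
    for day, minutes in trips:
        if minutes is not None:
            by_month[day.month].append(minutes)
    return {month: mean(values) for month, values in sorted(by_month.items())}
```

A roughly flat profile across months, including July and August (the Belgian summer school holidays), would be consistent with the graphs above.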

Finally, the time spent in car depending on the departure time is interesting:

180215-BxlTimeSpentInCar

Going to work was clearly split into 2 periods: leaving home (“Start Time”) before 8.30 or after 8.30. That’s because either I left early (and avoided the morning rush hour) or I drove the kids to school and then drove to work at the end of the rush hour. But although I tried to minimize it, the journey after driving the kids to school still took longer.
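The two periods can be compared directly by splitting morning trips on the 8.30 departure time. A minimal Python sketch, assuming (departure, duration) pairs in minutes with missing values already removed; `split_by_departure` is a hypothetical name.

```python
from statistics import mean
from typing import Iterable, Tuple

CUTOFF = 8 * 60 + 30  # 8.30, in minutes since midnight


def split_by_departure(
    trips: Iterable[Tuple[int, int]], cutoff: int = CUTOFF
) -> Tuple[float, float]:
    """Mean trip duration for departures before vs. at/after the cutoff.

    trips: (leave_home, duration) pairs, both in minutes. Raises
    statistics.StatisticsError if one of the two groups is empty.
    """
    trips = list(trips)  # allow a generator to be read twice
    early = [d for t, d in trips if t < cutoff]
    late = [d for t, d in trips if t >= cutoff]
    return mean(early), mean(late)
```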

For the evening, the trip home became shorter if I was able to delay it: the later I left, the shorter the trip. (However, if I didn’t drive the kids to school in the morning, the deal was that I would pick them up in the afternoon – fortunately, after-school care is cheap in Belgium.)

All this brings me to quality of life … I didn’t measure anything related to quality of life. I just remember that the first few weeks were very tiring. However, commuting came on top of other tiring factors: learning a new job, adjusting to a new environment, etc. But there is a body of scientific work on the impact of commuting on quality of life (I really like this paper as a starter [1], probably because it was published during that period): fatigue, stress, reduced sleep time, heart disease, absenteeism, BMI (weight), … are all linked, in one way or another, to commuting (whether driving or just sitting in public transport).

[1] Künn-Nelen, A. (2016) Does Commuting Affect Health? Health Econ., 25: 984–1004. doi: 10.1002/hec.3199

And one last point: privacy. This data is from 2015-2016. People who know me (even former colleagues!) know where I worked. And even without knowing me, you know when I left home, when I left the office, how I organized my days, etc. Do I want that? Part of the answer is that I only post this data now, 2-3 years later. On the other hand, here is another free, small dataset!

Next steps? I’m continuing to track my journeys to work, even now that we have moved to the USA. For privacy reasons, I will not publish that data immediately. But it will be interesting, later, to compare the different patterns and try to understand at least some of the differences … It would also be interesting to devote more time to this small experiment and, for instance, try to capture any impact on mood, productivity, … But that would be a whole different story!

Increasing certainty in flu vaccine effectiveness

According to CDC data, studies are getting better at estimating the influenza vaccine effectiveness.

With the 2017-2018 flu season still going on in the USA, there are already some indications that the vaccine has some effectiveness (although its target strains were mismatched). The CDC reports how it measures vaccine effectiveness here, and I was interested in the confidence intervals (the intervals that account for the uncertainty of extrapolating from the study sample to the broader, unknown population).
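For readers wondering how VE and its interval relate: in test-negative studies (the design that, as far as I know, the CDC’s US Flu VE Network uses), VE is commonly estimated as (1 − OR) × 100%, where OR is the odds ratio of vaccination among flu-positive versus flu-negative patients. A minimal Python sketch with hypothetical function names; the detail worth noticing is that the interval bounds swap.

```python
from typing import Tuple


def ve_percent(odds_ratio: float) -> float:
    """Vaccine effectiveness (%) from an odds ratio: VE = (1 - OR) * 100."""
    return (1 - odds_ratio) * 100


def ve_with_ci(or_point: float, or_low: float, or_high: float) -> Tuple[float, float, float]:
    """Return (VE, lower VE bound, upper VE bound).

    VE decreases as OR increases, so the upper OR bound gives the lower VE bound.
    """
    return ve_percent(or_point), ve_percent(or_high), ve_percent(or_low)
```

For instance, a made-up OR of 0.6 (95% CI 0.5 to 0.7) maps to a VE of about 40% (30% to 50%).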

Here is the same graph as on the CDC page, but with confidence intervals:

180223-Flu-vaccine-effectiveness-USA-influenza-season
* 2016-2017 VE are still estimates. ** 2017-2018 interim early estimates may differ from final end-of-season estimates.

You can already notice it above, but the graph below confirms that the confidence interval becomes narrower over successive flu seasons. This can have several causes. One obvious reason is that early seasons (before 2007-08) had a very small sample size (fewer than 1,000). But overall, we can see a gain in certainty about the effectiveness (the lower the line below, the greater the certainty).
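The quantity plotted as “certainty” is presumably the interval width, upper bound minus lower bound. The sketch below computes it and, separately, illustrates why larger samples help: for a simple proportion, the Wald 95% interval shrinks like 1/√n. VE intervals come from regression models, not from this formula, so the second function only illustrates the sample-size effect; it is not the CDC’s method, and both names are hypothetical.

```python
import math


def ci_width(lower: float, upper: float) -> float:
    """Width of a confidence interval: the smaller, the more certain the estimate."""
    return upper - lower


def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a Wald 95% CI for a proportion p estimated on n subjects."""
    return z * math.sqrt(p * (1 - p) / n)


# Quadrupling the sample size halves the half-width (1/sqrt(n) scaling).
print(wald_half_width(0.5, 1000) / wald_half_width(0.5, 4000))  # ~2.0
```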

180223-Flu-vaccine-effectiveness-USA-confidence-interval-influenza-season
* 2016-2017 VE are still estimates. ** 2017-2018 interim early estimates may differ from final end-of-season estimates.

As usual, the dataset (and the code to generate the graphs above) is on my Github repo.

Euthanasia in the Netherlands and Belgium, 1990-2015

While browsing the general literature, I found this paper by van der Heide et al. (2017) giving some numbers on end-of-life decisions in the Netherlands over the past 25 years. I wondered whether a similar evolution could be seen in Belgium. And I didn’t have to look very far: van der Heide cited another NEJM paper with Belgian numbers (Chambaere et al., 2015; an attentive reader will notice that the “Belgian” data is “only” about Flanders, not the whole of Belgium).

If you put together the data on euthanasia itself (not counting other types of end-of-life assistance), you obtain approximately the same proportion and evolution:

euthanasia_NL_BE

I’m not aware of more recent Belgian data using the same methodology (i.e. physician interviews). The Belgian Commission fédérale de Contrôle et d’Évaluation de l’Euthanasie (CFCEE) presented its latest report in October 2016. This report contained numbers for 2014 and 2015. But these numbers relate to euthanasia cases officially declared to the Commission. For instance, the Commission registered 1 928 euthanasia cases for a total of 104 723 deaths in Belgium in 2014 (i.e. 1.84%; death counts from the Belgian Open Data repository). If we focus only on declarations written in Flemish, we find 2.59% of euthanasia cases in 2014 (1 523 euthanasia cases for a total of 58 858 deaths) (note: Flemish is the language spoken in Flanders, the region targeted by the interviews in the Chambaere et al. paper, but declarations in Flemish might have originated from other regions). One might have found different numbers using interviews, as van der Heide or Chambaere did.
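The two percentages above are plain shares of deaths; a short Python check using only the figures quoted in the paragraph (`share_percent` is a hypothetical helper name):

```python
def share_percent(part: int, total: int) -> float:
    """Percentage of `total` represented by `part`."""
    return 100 * part / total


# Figures quoted above (CFCEE report and Belgian open data, 2014).
print(f"All of Belgium: {share_percent(1928, 104723):.2f}%")   # 1.84%
print(f"Flemish-language: {share_percent(1523, 58858):.2f}%")  # 2.59%
```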

Dataset (note there is more data in a Wikipedia article)

Evolution of the number and causes of death in Belgium (2010-2014)

Statbel, the Belgian governmental organisation for data and statistics, just released mortality data for 2014 (press release in French, dataset). The headline of their press release was that, for the first time, tumors were the leading cause of death for Belgian men. Diseases of the circulatory system remain the main cause of death in Belgium for women and for both sexes combined.

While someone’s death is bad news in itself, I’m more interested here in the evolution of causes of death, because it might reflect the evolution of Belgian society and, as a proxy, of most developed Western countries.

If you look at the data, the number of Belgians dying each year is stable, and natural death is still the main cause (also stable, around 93%). Note that if we look at data before 2010, mortality seems to have been slightly increasing since around 2005.

Evolution of the number of deaths in Belgium, all causes, 2010-2014

While the total number of deaths seems stable, the press release indicated that tumors (cancers) are on the rise, especially in men. The breakdown into categories follows the international classification ICD-10 and, because the names of the chapters are quite long for graphs, I will use the corresponding chapter numbers instead. Here is the key:

Chapter Header
I Certain infectious and parasitic diseases (A00-B99)
II Neoplasms (C00-D48)
III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism (D50-D89)
IV Endocrine, nutritional and metabolic diseases (E00-E90)
V Mental and behavioural disorders (F00-F99)
VI Diseases of the nervous system (G00-G99)
VII Diseases of the eye and adnexa (H00-H59)
VIII Diseases of the ear and mastoid process (H60-H95)
IX Diseases of the circulatory system (I00-I99)
X Diseases of the respiratory system (J00-J99)
XI Diseases of the digestive system (K00-K93)
XII Diseases of the skin and subcutaneous tissue (L00-L99)
XIII Diseases of the musculoskeletal system and connective tissue (M00-M99)
XIV Diseases of the genitourinary system (N00-N99)
XV Pregnancy, childbirth and the puerperium (O00-O99)
XVI Certain conditions originating in the perinatal period (P00-P96)
XVII Congenital malformations, deformations and chromosomal abnormalities (Q00-Q99)
XVIII Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)
XX External causes of morbidity and mortality (V01-Y98)
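To map detailed cause-of-death codes to the chapter numbers in this key, a lookup on the 3-character category is enough. A minimal Python sketch using only the ranges listed above; chapter XIX (S00-T98) is absent from the key, so such codes return None. The function name is hypothetical.

```python
from typing import Optional

# (first category, last category, chapter) taken from the key above.
ICD10_CHAPTERS = [
    ("A00", "B99", "I"), ("C00", "D48", "II"), ("D50", "D89", "III"),
    ("E00", "E90", "IV"), ("F00", "F99", "V"), ("G00", "G99", "VI"),
    ("H00", "H59", "VII"), ("H60", "H95", "VIII"), ("I00", "I99", "IX"),
    ("J00", "J99", "X"), ("K00", "K93", "XI"), ("L00", "L99", "XII"),
    ("M00", "M99", "XIII"), ("N00", "N99", "XIV"), ("O00", "O99", "XV"),
    ("P00", "P96", "XVI"), ("Q00", "Q99", "XVII"), ("R00", "R99", "XVIII"),
    ("V01", "Y98", "XX"),
]


def icd10_chapter(code: str) -> Optional[str]:
    """Return the chapter (Roman numeral) of an ICD-10 code such as 'C34.1'.

    Categories are one letter plus two digits, so plain string comparison
    orders them correctly within and across letters.
    """
    category = code.strip().upper()[:3]
    for first, last, chapter in ICD10_CHAPTERS:
        if first <= category <= last:
            return chapter
    return None
```

Note how the D and H letters are each split across two chapters, which is why the lookup uses full ranges rather than the first letter.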

One thing to notice is that, for chapter IV, Statbel only counts categories E00 to E88, while the WHO includes 2 more (E00 to E90); I assume this has no significant impact. Also note that, below, R ordered the chapters in a strange way – I’ll see how to fix that.
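My guess (not verified against the original R code) is that the chapter labels were sorted as plain text, which puts “IX” before “V” and “XX” after “XVIII” for the wrong reasons. One fix is to sort on the numeric value of each Roman numeral; in R itself, the usual fix would be a factor with explicitly ordered levels. A minimal Python sketch:

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral (e.g. 'XVIII') to an integer."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Subtractive notation: a smaller symbol before a larger one (IV, IX) counts negatively.
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


chapters = ["I", "II", "IV", "IX", "V", "X", "XVIII", "XX"]
print(sorted(chapters))                    # text order: IX comes before V
print(sorted(chapters, key=roman_to_int))  # numeric order
```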

Excluding natural causes, we see that diseases of the circulatory system (chapter IX) are indeed still the first cause of death, followed by neoplasms (chapter II) and diseases of the respiratory system (chapter X). If we compare the relative share of each cause (second graph below), we reach the same conclusion – but the relative decline in deaths due to diseases of the circulatory system is shown more clearly. And we can see that in 2014 neoplasms regain approximately the same relative share of deaths (although, in absolute numbers, they returned approximately to the 2012 level).
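The relative graph divides each chapter’s deaths by the year’s total. A minimal Python sketch for one year, assuming a dictionary of deaths per chapter (`relative_shares` is a hypothetical name):

```python
from typing import Dict


def relative_shares(deaths: Dict[str, int]) -> Dict[str, float]:
    """Turn one year's deaths per ICD-10 chapter into percentages of that year's total."""
    total = sum(deaths.values())
    return {chapter: 100 * n / total for chapter, n in deaths.items()}
```

A chapter’s share can stay flat while its absolute count changes, as long as the total moves in the same proportion; this is why both the absolute and the relative graphs are useful.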

Causes of death in Belgium, 2010-2014

Causes of death in Belgium, 2010-2014, relative numbers

The available dataset doesn’t go into more detail than numbers by ICD-10 chapter. Therefore we cannot tell which kind of neoplasm is the most prevalent or which infectious disease is the most common in Belgium, for instance. The press release, however, mentions that respiratory, colorectal and breast cancers are the top three killers and that flu was not very present in 2014.

As cancer occurrence increases with age, and as the Belgian population is aging, age may be one explanation for the high number of deaths due to neoplasms; however, we don’t see a dramatic increase in neoplasms (fortunately!). Another potential factor is the impact of cancer screening. Due to a very intelligent political split (sarcasm!), prevention (and therefore screening) is not a federal responsibility. Regions therefore started different screening programs, at different times, with different results. Screening data and results are therefore difficult to obtain. The Belgian Cancer Registry doesn’t publish data on screening in oncology – although its latest report (revised version of April 2016) very often mentions screening as a main factor of change in the number of cases diagnosed. In its 2016 report (PDF), the Flemish Center for the Detection of Cancer (Centrum voor kankeropsporing) indicates that it increased the number of women screened for breast cancer by more than 8% between 2011 and 2015 (especially in 2015), with a test quality between 90% and 95%. It also showed an increase in cancer diagnoses (without linking it directly to the increase in screening).

screening-flanders

This is by no means an exhaustive review of the data. There are other potentially interesting things to look at: the geographical disparities between the three regions, the evolution of the gender ratio (as some of these diseases are known to affect, or by definition affect, one sex more than the other), etc.

It would also be interesting to follow these trends, as some changes occurred recently in the Belgian curative landscape. New cancer immunotherapy drugs were recently authorised and reimbursed for melanoma and lung cancer, and other indications will follow. These drugs come at a price (lower than what is reported in the press, however; I may come back to this in a future post), but they delay death (unfortunately they don’t prevent it). However, for some of them, in some indications, administration and reimbursement are also linked to screening, testing and prior treatment failure; that might decrease their impact on overall mortality. New drugs for hepatitis C also arrived in 2015 and 2016, and the Belgian health minister decided to reimburse them for patients at the early stage 2 of the disease. Studies showed that treating at this stage may prevent hepatitis C from progressing to later stages and, in some cases, studies showed patients cured of the disease. This is an opportunity to see a decline in mortality due to this infectious disease (although it is already quite low compared to other diseases).

2013 in review: how to use your users’ collected data

A few days apart, I received two very different reviews of data collected by users of “activity trackers”.

The first one came from Jawbone (although I don’t own the UP, I may have subscribed to one of their mailing lists earlier) and is also publicly available here. Named “2013, the big sleep”, it is a kind of infographic showing how public (and mostly American) events influenced the sleep of the “UP Community”. Here, data about all (or at least a lot of) UP users were aggregated and shown. This is Big Data! This is a wonderful, quantitative insight into the impact of public events on sleep! But this is also a public display of (aggregated) individual data (something that UP users most probably agreed to by default when accepting the policy, sometimes when they first used their device).

The second came from Fitbit, also via e-mail. It said how many steps I took in total, as well as my most and least active periods/days of 2013. At the bottom there was a link to a public page comparing the distances traveled in general with what they could mean in the animal kingdom (see below or here). This is not Big Data (although I am sure Fitbit has access to all these data). At the same time, (aggregated) individual data are not shared with the general public (although here again I am sure a similar policy applies to Fitbit users).

Different companies, different ways of handling the data … I hope people will realise the implications of automatically sharing their data with such centralized services.

Fitbit2_20140117-075745

More sleep with Fitbits

After a bit less than 2 hours, jepsfitbitapp retrieved my sleep data from Fitbit for the whole of 2013 (read the previous post for the why (*)). Since this dataset also covers the period when I didn’t have a tracking device and, more broadly, since I always slept at least a little at night, I removed all data points indicating that I didn’t sleep.

Hours asleep with Fitbit

So I slept 5 hours and 37 minutes on average in 2013, with one very short night of 92 minutes and one very nice night of 12 hours and 44 minutes. Fitbit devices do not detect when you go to sleep and when you wake up: you have to tell them (for instance by tapping the Flex 5 times) that you are going to sleep or waking up (by the way, this is a very clever way to use the Flex, which has no button). Once told you are in bed, the Flex determines the number of minutes to fall asleep, after wake-up, asleep, awake, … The duration mentioned here is the actual duration the Fitbit device considers I slept (variable minutesAsleep).
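The filtering and the three figures above (average, shortest and longest night) can be sketched in Python as follows, assuming a plain list of daily minutesAsleep values has already been extracted from the API responses. This is an illustration with hypothetical names, not the code in sleep.R.

```python
from statistics import mean
from typing import Dict, Iterable


def hm(minutes: float) -> str:
    """Format a duration in minutes as e.g. '5h37'."""
    m = round(minutes)
    return f"{m // 60}h{m % 60:02d}"


def sleep_summary(minutes_asleep: Iterable[int]) -> Dict[str, str]:
    """Mean, shortest and longest night from daily minutesAsleep values.

    Days logged as 0 minutes (e.g. before I had a tracker) are dropped first,
    mirroring the filtering described above.
    """
    nights = [m for m in minutes_asleep if m > 0]
    return {
        "mean": hm(mean(nights)),
        "shortest": hm(min(nights)),
        "longest": hm(max(nights)),
    }
```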

Visually, there seems to be a tendency to sleep more as 2013 goes on. But, although the best linear fit shows a slope, the difference between sleep in March and sleep in December is not significant.

R allows studying the data in many different ways (of course!). When plotting the distribution of durations asleep, it looks like it might follow a normal (Gaussian) distribution (see the graph below). But the Shapiro-Wilk normality test rejects the hypothesis that the data come from a normal distribution.

Histogram of hours asleep in 2013

Hours asleep in 2013 - Normal?

As mentioned above, Fitbit devices also track other sleep parameters, among them the number of awakenings and the sleep efficiency.

Awakenings in 2013

The simple plot of the number of awakenings over time shows the same non-significant trend as the sleep duration (above). The histogram of these awakenings is more concentrated on the left (towards a low number of awakenings) than that of the sleep duration. There is, however, a relation between the two variables: the more I sleep, the more awakenings the Flex detects (see second graph below).

Number of awakenings in 2013 (histogram)

Relation between sleep duration and awakenings with Fitbit Flex

Sleep efficiency is the ratio of the total time asleep to the total time in bed from the moment I fell asleep. It is therefore not related to the different sleep stages. However, it may indicate an issue worth investigating with a real doctor. In my case, although I woke up 9 times per night on average in 2013, my sleep efficiency is very high (93.7% on average) …
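Following the definition above, a minimal Python sketch of sleep efficiency. The parameter names mirror fields I believe the Fitbit sleep log returns (minutesAsleep, timeInBed, minutesToFallAsleep), but treat them as assumptions; Fitbit’s own efficiency value may be computed differently.

```python
def sleep_efficiency(minutes_asleep: int, time_in_bed: int, minutes_to_fall_asleep: int) -> float:
    """Sleep efficiency (%) as defined above: time asleep divided by the time
    in bed counted from the moment I fell asleep."""
    return 100 * minutes_asleep / (time_in_bed - minutes_to_fall_asleep)
```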

Sleep efficiency in 2013

… or very low. There are indeed some nights where my sleep efficiency is below 10% (see the 4 points at the bottom of the chart). These correspond to nights when I didn’t sleep much and also had very few awakenings (since these are related).

There is no mood tracking with Fitbit (except for one custom tracker that you define yourself and fill in manually): everything tracked has to be a numerical value, either tracked automatically or entered manually. It would be interesting to couple these tracked variables with the level of fatigue at wake-up time or the mood during the following day. I guess there are apps for that too …

The code is updated on Github (this post is in the sleep.R file).

(*) Note: I just discovered that there is in fact a specific API call for time series … That’s for a future post!

Getting some sleep out of Fitbits

After previous posts playing with the Fitbit API (part 1, part 2), I stumbled upon something a bit harder for sleep …

The previous data belong to the “activities” category. In this category it is easy to get data about a specific activity over several days in one request. The sleep parameters are not in that category, and I couldn’t find a way to get all the sleep durations (for instance) in one query (*). So I updated the code to request all sleep parameters for each and every day of 2013 … and I hit the limit of 150 requests per hour.
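One way to live with a per-hour limit is to send requests in batches and pause between them. A conservative Python sketch: `fetch_day` stands for a hypothetical wrapper around the per-day sleep request (no real Fitbit client call is shown), and the one-hour pause after every 150 calls is an assumption about how to respect the limit, not Fitbit’s documented reset behaviour.

```python
import time
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, TypeVar

T = TypeVar("T")
REQUESTS_PER_HOUR = 150  # the limit mentioned above


def days_of(year: int) -> Iterator[date]:
    """Every calendar day of `year`, in order."""
    day = date(year, 1, 1)
    while day.year == year:
        yield day
        day += timedelta(days=1)


def fetch_year(
    fetch_day: Callable[[date], T],
    year: int = 2013,
    limit: int = REQUESTS_PER_HOUR,
    wait: Callable[[float], None] = time.sleep,
) -> Dict[date, T]:
    """Call fetch_day once per day of `year`, pausing one hour after every `limit` calls.

    fetch_day is a hypothetical wrapper around the per-day sleep request.
    """
    results: Dict[date, T] = {}
    for i, day in enumerate(days_of(year)):
        if i > 0 and i % limit == 0:
            wait(3600)  # conservative: wait a full hour before the next batch
        results[day] = fetch_day(day)
    return results
```

For 2013 (365 days), this makes three batches (150 + 150 + 65) separated by two one-hour pauses. Passing a different `wait` function makes it possible to test the logic without actually sleeping.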

Hours asleep (March-April 2013)

This graph is what I have achieved so far. I didn’t sleep much in March-April 2013: 4.9 hours per night on average. The interesting thing is that I can understand why by going back to my agenda at that time (work, study, family …). As soon as I can get additional data, it will be interesting to see whether sleep durations increase later on.

(*) If you know how to get all sleep durations for 2013 in one query, let me know!