I had wanted to use the twitteR package for R for a long time; I tried it but never did much with it. Today I found a few minutes, followed simple recipes (I admit), and looked at the number of tweets about flu today (November 13, 2018). Result: 283 tweets in English (I wanted to focus on the USA but, for some reason, I couldn’t … yet!). That’s not a lot. But remember we are only at the beginning of the 2018-2019 influenza season in the Northern hemisphere.
After some very basic cleaning, here are the words most used: flu, influenza (obviously: I was looking for them!), rt (note to self: remove this indication of a retweet), vaccine, health and get. As I mentioned: we are at the beginning of the flu season in the Northern hemisphere, it’s still time to get vaccinated and protected against flu!
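For readers who want to try something similar, here is a minimal sketch of that "very basic cleaning" and word count. It is written in Python rather than with the R package used for this post, the sample tweets are made up, and the stop-word list is an assumption; it only shows the idea of dropping URLs, mentions and the "rt" marker before counting.

```python
import re
from collections import Counter

# Hypothetical example tweets; NOT taken from the 283 tweets collected for this post.
SAMPLE_TWEETS = [
    "RT @someone: Get your flu vaccine today! https://example.com/flu",
    "Flu season is here. Get the vaccine.",
    "Influenza and health: get vaccinated",
]

# Small illustrative stop-word list (an assumption), including the "rt" retweet marker.
STOP_WORDS = {"rt", "the", "is", "and", "your", "a", "to", "of", "in"}


def word_counts(tweets, stop_words=STOP_WORDS):
    """Count words across tweets after lowercasing and removing URLs and mentions."""
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        text = re.sub(r"https?://\S+", " ", text)  # drop URLs
        text = re.sub(r"@\w+", " ", text)          # drop user mentions
        words = re.findall(r"[a-z']+", text)       # keep alphabetic tokens only
        counts.update(w for w in words if w not in stop_words)
    return counts


counts = word_counts(SAMPLE_TWEETS)
print(counts.most_common(5))
```

Adding "rt" to the stop words is one simple way to handle the note to self above.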
Now of course, I wanted a word cloud 😉 Here it is:
It’s basically the same graph as above. You don’t get the count but you get the feeling of how important each word is (and you get more words).
I also recently read the WIRED article about the need for fewer stats and more stories about the success of vaccines. And I was wondering if, by following tweets and people tweeting about flu on Twitter, we could reconstruct stories about influenza and vaccination against it. I’ll try to dedicate a few minutes every now and then, during this season, to this. In the meantime, if you have additional ideas, don’t hesitate to send them to me, comment below, or contact me … on Twitter, obviously! (I’m @jepoirrier)
DISO1 – Data I Sit On, episode 1. This post is the first of a short series exploring data I collected in the past and that I found interesting to look at again … (I already posted about data I collected, see the Quantified Self tag on this blog)
Life is short and full of different experiences. One of the experiences I don’t specifically enjoy but that is an integral part of life is commuting. Although I tried to minimize commuting (mainly by choosing a home close to the office) and benefit(ed) from good work conditions (flexible working hours, home working, etc.), a big change occurred when I took a new opportunity, in 2015, to work in the Belgian capital, Brussels.
From where I lived at that time, using public transportation was not a viable option, unfortunately: it implied roughly 2 hours to go one way and changing at least 2 times between bus, train and metro. Anyway, Belgium is known for having lots of cars and I benefited from a company car. For some time, I have also been interested in the Quantified Self, so I started collecting data about my daily commute.
What I try to see is the seasonality of commuting (I would initially expect shorter commute time during school breaks), the differences between leaving for work after driving children to school or without driving them, … There is also an extensive literature on the impact of commuting on the quality of life …
So, how did I do that?
The route usually taken, between my home then (in Wavre) and my office then (in Brussels, both in Belgium), is 28km long, and the fastest I ever saw on Google Maps to drive this distance is about 20-25 minutes.
I took note of the following elements in whatever default note-taking app was on my phone at that moment (Keep on Android, Notes on iOS). The first field in each row is the date in a %y%m%d format, i.e. year, month and day of month as zero-padded decimal numbers, 2 digits only for each. The second field is the start time in a %H%M format, i.e. hours (24-hour clock) and minutes also as zero-padded decimal numbers, 2 digits only for each. Start time is defined as when I enter my car at home, in the morning. The third field is the arrival time (same format as start time), defined as when I stop the engine at work. The fourth and fifth fields are start and arrival times when I go back home, defined and formatted the same way, mutatis mutandis. Any missed start/arrival time is marked as “na” or “NA”. This corresponds, for instance, to times when I leave the office but stop to meet a client (or more prosaically, to do grocery shopping) before coming back home. I may have missed one or two whole days at most. The data is on Github.
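To make the row format concrete, here is a minimal sketch of how one row could be parsed. The analysis itself was done in R; this Python version is only an illustration. The comma separator, the example row and the function names (`parse_time`, `minutes_between`, `parse_row`) are assumptions, not the actual file layout or code.

```python
from datetime import datetime


def parse_time(day, hhmm):
    """Combine a %y%m%d date and a %H%M time; return None for 'na'/'NA'."""
    if hhmm.strip().lower() == "na":
        return None
    return datetime.strptime(f"{day} {hhmm}", "%y%m%d %H%M")


def minutes_between(start, end):
    """Whole minutes between two datetimes, or None if either is missing."""
    if start is None or end is None:
        return None
    return int((end - start).total_seconds() // 60)


def parse_row(row, sep=","):
    """Parse one log row: date, home start, work arrival, work start, home arrival."""
    day, *times = [field.strip() for field in row.split(sep)]
    home_dep, work_arr, work_dep, home_arr = (parse_time(day, t) for t in times)
    return {
        "date": datetime.strptime(day, "%y%m%d").date(),
        "morning_min": minutes_between(home_dep, work_arr),
        "office_min": minutes_between(work_arr, work_dep),
        "evening_min": minutes_between(work_dep, home_arr),
    }


# Hypothetical row (not from the real dataset): the evening arrival was missed.
example = parse_row("160104,0745,0830,1715,NA")
print(example)
```

Missing values simply propagate as `None`, so a day with a detour on the way home still contributes its morning commute and time in the office.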
On a daily basis, the little game is to try to figure out which lane is the fastest, if there is a pattern in the journey that makes it faster (I think there is). However, there are so many little things to track in this game that I did not track these small differences. The journey is assumed to take more or less the same route.
In the end, the complete log is saved on my computer and analysed in R (version 3.3.2). The typical measures I’m interested in are departure/arrival times over time, commute duration over time, commute duration per month or per day of the week or per season, … for both the morning and afternoon journeys if applicable. Some fun measures would be the earliest I left for work, the latest I arrived at work, the earliest I left work, the latest I left work, the shortest journey ever (to compare to the Google estimate) and the longest journey ever …
An unintended measure here is the amount of time actually spent in the office (on a side note, this is different than productivity – but I didn’t find any unambiguous or flawless measure of productivity so far …). Some interesting variations could be to see the average and median duration of my work days, the shortest day or longest day I had, … (I don’t know if my former employer would be happy or angry to see these results 😉 but note this doesn’t take into account the numerous times I worked from home, even in evenings after having worked the whole day in the office …).
In theory, the fastest I could go is at an average 84km/h (28km in 20 minutes, according to Google Maps, so this is according to traffic, not maximum speed limits). In practice, this is a whole different story …
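The arithmetic behind that figure, as a tiny Python sketch (the function name `average_speed_kmh` is just for illustration): 28 km in 20 minutes gives 84 km/h, and the slower end of the Google Maps range, 25 minutes, gives about 67 km/h.

```python
def average_speed_kmh(distance_km, duration_min):
    """Average speed in km/h for a trip of distance_km lasting duration_min minutes."""
    return distance_km * 60 / duration_min


ROUTE_KM = 28  # length of the usual route, from the post

fastest = average_speed_kmh(ROUTE_KM, 20)  # best Google Maps estimate
slower = average_speed_kmh(ROUTE_KM, 25)   # upper end of the 20-25 minute range
print(f"{fastest:.1f} km/h, {slower:.1f} km/h")
```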
In a bit more than a year of collected data:
the earliest I left home was 6.11 and the latest 10.11;
consequently, the earliest I arrived at work was 6.32 and the latest 10.36;
the shortest trip to work was 18 minutes and the longest one was 160 minutes (it was on March 22, 2016, the day of the Brussels airport bombing, because the office is close to the airport – I still remember);
the earliest I left work was 12.34 (I assume half-day of holidays) and the latest 21.24 (I assume lots of work then);
consequently, the earliest I arrived back home was 12.59 and the latest 21.43;
the shortest trip back home was 7 minutes (there should be some input error here!!!) and the longest trip was 128 minutes (nothing surprising, here, with Brussels traffic jams).
Finally, the shortest stay in office was 242 minutes (4 hours and 2 minutes) – it was that half-day of holidays. And the longest stay in office was 754 minutes (12 hours and 34 minutes).
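As a small illustration of the conversions and sanity checks behind these numbers, here is a hedged Python sketch. The 15-minute plausibility threshold is an assumption chosen for the example (somewhat below the best Google Maps estimate of about 20 minutes), not a rule used in the original analysis, and the helper names are hypothetical.

```python
def to_hours_minutes(total_minutes):
    """Split a duration in minutes into (hours, minutes)."""
    return divmod(total_minutes, 60)


def suspicious_trips(durations_min, min_plausible=15):
    """Return durations below an assumed plausibility threshold (possible input errors)."""
    return [d for d in durations_min if d < min_plausible]


shortest_stay = to_hours_minutes(242)  # shortest stay in the office, from the post
longest_stay = to_hours_minutes(754)   # longest stay in the office, from the post
print(shortest_stay, longest_stay)

# Extreme trip durations quoted above (minutes): morning 18 and 160, evening 7 and 128.
flagged = suspicious_trips([18, 160, 7, 128])
print(flagged)
```

With this threshold, only the 7-minute trip back home is flagged, which matches the suspicion of an input error.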
As always, these things are nice when rendered as graphs …
A first note is that none of these graphs show any seasonality in the data. At first, I thought I would go faster during school holidays – but it was more a feeling than anything else, as the data show. And although the time at work varied widely over time, the average time spent at work seems to be pretty constant over the year, which surprised me:
Finally, the time spent in car depending on the departure time is interesting:
Going to work was clearly split into 2 periods: leaving home (“Start Time”) before 8.30 and after 8.30. That’s because either I went early (and avoided the morning rush hour) or I drove the kids to school and drove to work at the end of rush hour. But although I tried to minimize the journey, the journey after driving the kids to school was still taking more time.
For the evening, going home became a shorter trip if I was able to delay it: the later I came back, the shorter the trip. (However, if I didn’t drive the kids to school in the morning, the deal was that I would pick them up in the afternoon – fortunately, afterschool care is cheap in Belgium).
All this to come to the quality of life … I didn’t measure anything related to quality of life. I just remember that the first few weeks were very tiring. However, this commuting factor should be added to other tiring factors: learning a new job, adjusting to a new environment, etc. But there is a body of scientific work looking at the impact of commuting on quality of life (I really like this paper as a starter, probably because it was published during that period): fatigue, stress, reduced sleep time, heart disease, absenteeism, BMI (weight), … are all linked – in a way – to commuting (either driving or just sitting in public transport).
Künn‐Nelen, A. (2016) Does Commuting Affect Health? Health Econ., 25: 984–1004. doi: 10.1002/hec.3199
And a last point: privacy. This data is from 2015-2016. People who know me (even former colleagues!) know where I worked. And even without knowing me, you know when I leave home, when I leave the office, my pattern of organization, etc. Do I want that? Part of the answer is that I only post this data now, 2-3 years later. On the other hand, here is another free, small dataset!
Next steps? I’m continuing to track my journeys to work, even now that we have moved to the USA. For privacy reasons, I will not publish those data immediately. But it will be interesting, later, to compare the different patterns and try to understand at least some differences … It would also be interesting to give more time to this small experiment and, for instance, try to capture any impact on mood, productivity, … But this would become a whole different story!
According to CDC data, studies are getting better at estimating the influenza vaccine effectiveness.
With the 2017-2018 flu season still going on in the USA, there are already some indications that vaccines have some effectiveness (although their target strains were mismatched). The CDC reports how it measures vaccine effectiveness here and I was interested in their confidence intervals (the interval that takes into account uncertainties to extrapolate to the broader, unknown population).
Here is the same graph as on the CDC page, but with confidence interval:
* 2016-2017 VE are still estimates. ** 2017-2018 interim early estimates may differ from final end-of-season estimates.
You can already notice it above, but the graph below confirms that the confidence interval becomes narrower over successive flu seasons. There can be several reasons for this. One obvious reason is that early seasons (before 2007-08) had a very small sample size (< 1,000). But overall, we can notice a gain of certainty around the effectiveness (the lower the line below, the more certainty).
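To see why sample size matters so much, here is a deliberately simplified Python sketch. It uses the textbook 95% normal-approximation (Wald) interval for a plain proportion, which is NOT how the CDC computes vaccine effectiveness intervals; the proportion of 0.5 and the sample sizes are made-up values, used only to show that the interval width shrinks roughly with the square root of the sample size.

```python
import math


def wald_ci_width(p, n, z=1.96):
    """Width of the normal-approximation 95% interval for a proportion p with n subjects."""
    return 2 * z * math.sqrt(p * (1 - p) / n)


# Illustrative values only: same proportion, increasing (hypothetical) sample sizes.
for n in (500, 1000, 4000):
    print(n, round(wald_ci_width(0.5, n), 3))

small = wald_ci_width(0.5, 1000)
large = wald_ci_width(0.5, 4000)
```

Quadrupling the sample size halves the width of the interval, which is consistent with the narrower intervals seen in later, larger seasons.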
* 2016-2017 VE are still estimates. ** 2017-2018 interim early estimates may differ from final end-of-season estimates.
As usual, the dataset (and code to generate the graphs above) are on my Github repo.
While parsing the general literature, I found this paper from van der Heide et al. (2017) giving some numbers about end-of-life decisions in the Netherlands over the past 25 years. I was wondering if one could see a similar evolution in Belgium. And I didn’t have to look very far: van der Heide cited another NEJM paper with Belgian numbers (Chambaere et al., 2015; an attentive reader will notice that the “Belgian” data is “only” about Flanders, not the whole of Belgium).
If you put together the data about euthanasia itself (not counting other types of end-of-life assistance), you obtain approximately the same proportion and evolution:
I’m not aware of more recent Belgian data using the same methodology (i.e. physician interviews). The Belgian Commission fédérale de Contrôle et d’Évaluation de l’Euthanasie (CFCEE) presented its last report in October 2016. This report contained numbers for the years 2014 and 2015. But these numbers relate only to euthanasia officially requested from (and granted by) the Commission. For instance, the Commission granted 1 928 euthanasia requests for a total of 104 723 deaths in Belgium in 2014 (i.e. 1.84%; deaths in Belgium in the Open Data repository). If we focus only on requests written in Flemish, we find 2.59% of euthanasia in 2014 (1 523 euthanasia requests for a total of 58 858 deaths) (note: Flemish is the language spoken in Flanders – the region targeted by interviews in the Chambaere et al. paper – but requests in Flemish might have originated from other regions). One might have found different numbers if one had used interviews like van der Heide or Chambaere.
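The two percentages above can be checked with a couple of lines of Python (the helper name `share_percent` is just for illustration; the counts are the ones quoted in the paragraph):

```python
def share_percent(cases, total_deaths):
    """Cases as a percentage of total deaths, rounded to 2 decimals."""
    return round(100 * cases / total_deaths, 2)


belgium_2014 = share_percent(1928, 104723)  # all requests, all deaths in Belgium
flemish_2014 = share_percent(1523, 58858)   # requests written in Flemish
print(belgium_2014, flemish_2014)
```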
A few weekends ago, I was challenged by a friend to do more steps than him. Of course, I won 😉 But I noticed he was wearing his activity tracker on his wrist while I was wearing mine on my waist. As I noticed several times before, when I had an activity tracker on my wrist, these devices tend to capture some movements even if you don’t actually walk (while typing energetically on the computer or while driving for instance).
A few words of appreciation for a piece of open source software that can help you a lot in your work: Engauge Digitizer (ED) from Mark Mitchell. ED is a simple, straightforward curve digitizer: it takes images with graphs like the one below and transforms them (with a little help) into data you can use later on.
After abandoning my Fitbit device in January because using it brought no improvement in my weight (see previous post), I was wondering if I could still measure my risk of developing cardiovascular diseases and other preventable chronic diseases (e.g. diabetes). So, still sitting at my desk (something I do for more than 8 hours a day in theory – probably more in practice), I looked into the ways to monitor my risk for these diseases …