For a few weeks, I have been reporting the raw number of COVID-19 deaths in Maryland counties. While this gives an idea of the cumulative number of deaths – which is interesting – it doesn’t reflect the fact that some counties have more inhabitants than others. That’s why I plotted below the number of COVID-19 deaths adjusted for the population (i.e. the COVID-19-specific death rate):
Today (May 16, 2020), in terms of absolute number of deaths, Montgomery, Prince George’s and Baltimore County are the top 3 counties (the same three as for cases, but in a different order). In terms of confirmed deaths per 100,000 population, the top 3 counties are Kent, Prince George’s and Montgomery.
For a few weeks, I have been reporting the raw number of COVID-19 cases in Maryland counties. While this gives an idea of the cumulative number of cases – which is interesting – it doesn’t reflect the fact that some counties have more inhabitants than others. That’s why I plotted below the number of COVID-19 cases adjusted for the population:
Today (May 11, 2020), in terms of absolute number of cases, Prince George’s, Montgomery and Baltimore County are the top 3 counties. In terms of confirmed cases per 100,000 population, the top 3 counties are Prince George’s, Montgomery and Wicomico (due to a recent surge in cases).
Rank on May 11, 2020
Absolute # of COVID-19 cases
COVID-19 cases per 100,000 population
Prince George’s (9,687)
Prince George’s (1,057)
Baltimore County (3,948)
Baltimore City (3,353)
Anne Arundel (2,492)
Baltimore City (544)
This is a lot given that, today, the average for Maryland is 401/100,000 (source: CDC) and the average for the US is 552/100,000 (source: OurWorldInData).
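The population adjustment used for these rankings is a simple ratio. Here is a minimal Python sketch of it (not the code behind the charts); the two counties, their counts and their populations are purely hypothetical and only illustrate how a small county can outrank a large one once numbers are expressed per 100,000 inhabitants.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Return the number of events (cases or deaths) per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical counties: name -> (cumulative cases, population).
# These are NOT official Maryland figures.
counties = {
    "Large County": (2_000, 1_000_000),
    "Small County": (150, 20_000),
}

rates = {name: rate_per_100k(c, pop) for name, (c, pop) in counties.items()}

# Rank once by raw count and once by population-adjusted rate.
by_count = sorted(counties, key=lambda name: counties[name][0], reverse=True)
by_rate = sorted(rates, key=rates.get, reverse=True)

print(by_count)  # ['Large County', 'Small County']
print(by_rate)   # ['Small County', 'Large County']
```

The same function works for deaths: only the numerator changes.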
Following up on my two previous posts (here and here), I am writing a third post on COVID-19 in Maryland because I believe we are entering a new phase.
Before continuing, please note that the same disclaimer as in my previous post applies here (in short: read the CDC and MDH websites for official information).
In the first phase, the priority was to detect COVID-19 patients and make sure they were treated (also: make sure not to overwhelm the healthcare system, flatten the curve, lower the baseline, & stay at home!). My two previous posts followed these efforts, thanks to daily data released by the Maryland Department of Health (MDH) on its dashboard. My second post will still be updated with the latest data from there; go read it!
This first phase is not over yet, but we are starting to see the metrics that states and governments will consider in order to “reopen”. Hence this post, for the second phase, specifically adds these metrics, again thanks to the Maryland Department of Health (MDH) dashboard (and probably other data sources that will be linked as I use them).
In Maryland, the Governor issued a Roadmap to Recovery on April 24, 2020. This (easy to read) document introduces many aspects; here is what will be tracked and for how long:
“state public health officials should review the numbers of new COVID-19 daily case counts, hospitalizations, and deaths carefully” and “The results of reopening decisions will take 2 to 3 weeks to be reflected in those numbers.“
“the White House’s gating guidelines state that a 14-day downward trajectory of benchmark metrics – or at least a plateauing of rates – is required before recovery steps can begin, and before each additional recovery step can move forward“
That’s why Governor Larry Hogan tweeted his focus on April 24:
States should consider initiating the reopening process when (1) the number of new cases has declined for at least 14 days; (2) rapid diagnostic testing capacity is sufficient to test, at minimum, all people with COVID-19 symptoms, including mild cases, as well as close contacts and those in essential roles; (3) the healthcare system is able to safely care for all patients, including providing appropriate personal protective equipment for healthcare workers; and (4) there is sufficient public health capacity to conduct contact tracing for all new cases and their close contacts
On April 27, 2020, this is what we currently have … On the first chart, the number of positive tests is increasing (probably due to the increase in testing), and hospitalizations and deaths are slowly going up overall. On the third chart, it seems the number of people in ICU is plateauing. Below these charts, I’ll post updated charts as the days pass …
Updated charts (look at the date at the bottom right):
Since the Maryland Department of Health (MDH) started to display the number of COVID-19 cases for each Zip code in its dashboard, I have been wondering how to display this information in a nice way. MDH displays the information as a map – very nice, but it doesn’t show where each Zip code is coming from: is the number of cases increasing or decreasing?
Following on my busy chart with the evolution of all Zip codes (and highlighting just one of them – that may not be the one you are interested in, see previous post), I created a simple dashboard where you can select the Zip code you are interested in and see how cases are evolving. You can play with it here: https://jepoirrier.shinyapps.io/md-coronavirus-zip-app/ (screenshot below). Enjoy!
Following up on my previous post, here are updated trends in Coronavirus cases in Maryland (USA), the state I live in. I am writing a second post because the Maryland Department of Health (MDH) updated its dashboard with way more data than before (more on this below). Before continuing, please note that the same disclaimer as in my previous post applies here (in short: read the CDC and MDH websites for official information).
The new type of data that the MDH released is: the total number of hospitalizations and releases, more granular age categories and the number of cases by sex. And on March 28, we saw the return of the number of negative tests!
Here are the plots that I will try to update daily (check the bottom right of charts to see when it was last updated):
On March 28, MDH reintroduced the total number of negative tests (11,516). Having the total number of tests done is important because it allows us to understand the disease dynamics better than the number of positive cases alone …
Suppose you have 992 positive cases (like on March 28) but no total number of people tested. It’s a lot – or maybe it’s not much, who knows? It depends on how many were tested. Imagine that, up to that day, only 1,000 people had been tested – this becomes a lot of positive cases because 99% of people tested turned out to be positive. Now, MDH said they actually tested 12,508 people – this means that 7.9% of people tested turned out to be positive. Given the few tests available, testing is reserved for people who are believed to be at risk (more or less; read the MDH testing FAQ here). So less than 10% of people tested (thought to be at risk) turned out to be actually infected. That’s good!
At the end of March, MDH also released more granular data on the age categories of the people who tested positive. Age groups 30-39 and 40-49 have the most cases. Therefore, mostly adults are impacted, probably among working people (who don’t or can’t practice social distancing). Given hospitalization and death rates are lower in these age groups than in older adults (most hospitalizations, ICU admissions and deaths occurred among adults aged ≥65 years, with the highest percentage of severe outcomes among persons aged ≥85 years, according to the March 26 CDC paper), we’ll hopefully see fewer dreadful cases in these adults than in older adults.
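The reasoning above boils down to one division. As a small Python sketch (the function name is only illustrative), using the March 28 numbers quoted in the text: 992 positive tests, 11,516 negative tests, and the imaginary scenario of only 1,000 people tested.

```python
def percent_positive(positive: int, total_tested: int) -> float:
    """Share of tested people with a positive result, as a percentage."""
    if total_tested <= 0:
        raise ValueError("total_tested must be positive")
    return 100 * positive / total_tested


positive = 992       # positive tests reported on March 28
negative = 11_516    # negative tests reported on March 28
total = positive + negative  # 12,508 people tested

print(f"{percent_positive(positive, total):.1f}%")   # 7.9%
print(f"{percent_positive(positive, 1_000):.1f}%")   # 99.2% (imaginary scenario)
```

The same 992 positive cases read very differently depending on the denominator, which is why the return of the negative counts matters.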
On March 30, the MD governor decided that everyone should stay at home, except for food and medicine shopping. The recent days see an increase in cases but especially an increase of deaths, due to an outbreak in a nursing home.
We entered April and the number of cases continued to increase. On April 3rd, the MDH page on coronavirus got enriched with a very nice dashboard with a lot of data:
On April 5, we could see that all numbers are continuing to climb. Frederick County and Baltimore County are shooting up (with Frederick County now being the first county in number of cases). I’ve added a chart with the daily number of cases and it’s hard to grasp that nearly 500 people received a positive COVID-19 test result today (in 1 day) (that’s about an entire elementary school, except the age category doesn’t match)! And we are not even in a state where the number of cases explodes … We also see that adults [30-59] have most of the cases, with fewer among older adults and even fewer among children and teenagers.
On April 7, Maryland continues to see an increase in all cases and hospitalizations, in all counties and all age groups. Since yesterday, we have more than 1,000 hospitalizations. We went above 100 deaths today. And, for the first time, Prince George’s County has more than 1,000 positive cases of COVID-19 (+104 from yesterday).
On April 9, Maryland continues to see an increase in all cases and hospitalizations, in all counties and all age groups. Today, the MDH started to display the number of cases and number of deaths by race/ethnicity. The African-American community has the highest number of cases, followed by the White community – but we shouldn’t forget about the “Other” and “Data Not Available” categories. On top of that, as usual, although it’s unfortunately not surprising that the African-American community is harshly impacted, one should keep in mind that without the total number of tests done by community, there is little we can say. Given the percentage of positive tests is about 15% overall, one should see if this percentage is similar by community or not.
As the number of cases continues to increase and the data made available by the Maryland Department of Health also increases, I went back to the code and changed a few things (mainly to help maintain it on a daily basis). One choice I made is kind of breaking things: from now on, the trend by age group will not start at the same date as all charts. That’s because MDH changed their reporting of age structure on March 27. I no longer report the previous data (for age groups). I also added the trends of number of deaths by county and the gender distribution.
On April 12, the total number of positive cases in Prince George’s County is now above 2,000. I added the number of patients hospitalized. This is approximately the number of patients in hospital each day (equation: total # hospitalized – total # released) as a proxy for the number of patients currently sick (this is not perfect because of the lag in reporting, the data not being available since the beginning, etc.). MDH also started to report the # of positive tests by zip code (it is very labor-intensive to transcribe these; I won’t do any chart with this data unless there is an easy way to download the data).
On April 13, cases continued to increase, with a small dip (maybe due to the weekend). Dorchester and St. Mary’s counties reported their first death due to COVID-19. On the good news side, the number of new patients released today (+147) was higher than the number of new patients hospitalized (+115). This slightly reduced the number of patients currently hospitalized.
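The proxy described above (total # hospitalized minus total # released) can be sketched in a few lines of Python. This is only an illustration, not the code behind the chart: the cumulative totals below are hypothetical, chosen so that the daily increments are +115 hospitalized and +147 released, like the April 13 figures.

```python
def approx_in_hospital(total_hospitalized, total_released):
    """Approximate the number of patients in hospital each day.

    Both arguments are cumulative totals, one value per day, in the same order.
    """
    if len(total_hospitalized) != len(total_released):
        raise ValueError("both series must cover the same days")
    return [h - r for h, r in zip(total_hospitalized, total_released)]


# Hypothetical cumulative totals for two consecutive days.
ever_hospitalized = [1_860, 1_975]  # +115 on the second day
ever_released = [456, 603]          # +147 on the second day

in_hospital = approx_in_hospital(ever_hospitalized, ever_released)
print(in_hospital)                      # [1404, 1372]
print(in_hospital[1] - in_hospital[0])  # -32: more releases than admissions
```

Because it subtracts two cumulative series, the proxy inherits any reporting lag in either of them.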
On April 14, I found a way to get data by ZIP code without too much hassle. I added the data to the Github repository and added the chart above. Any idea to improve the chart is welcome! (Straightforward idea: displaying ZIP codes as a map – but then we lose the temporal aspect)
On April 15, MDH added probable deaths. According to the CDC, a death due to COVID-19 with certainty should use ICD-10 code U07.1. Where the cause of death is established with a clinical or epidemiological diagnosis (but with inconclusive or without lab results), the ICD-10 code U07.2 is proposed. There is a 3rd level of uncertainty, when the cause of death is “probable” or “likely” COVID-19. There, this guidance doesn’t specify what to do (it doesn’t mean that future guidance won’t refine the algorithm). I first assumed here that MDH “probable deaths” are coded with U07.2. Actually, it’s easier than that: Kata D. Hall mentioned in a tweet that a “death is classified as probable if the person’s death certificate lists #COVID19 as the cause of death, but it has not yet been confirmed by a laboratory test”. In total, today, probable deaths due to COVID-19 represent 18.3% of deaths due with certainty to COVID-19.
This had implications for the death count by county. Confirmed deaths diminished in some counties (e.g. Prince George’s: -7, Montgomery: -10, Baltimore County: -14) while the number of “probable” deaths in most counties doesn’t counterbalance these losses (e.g. Prince George’s: 11, Montgomery: 14, Baltimore County: 5). This is because new deaths occurred and a new category appeared: Data Not Available (i.e. deaths for which we don’t know the county, strange).
On April 15, we also see the cumulative number of positive cases to be over 10,000 for the first time (10,032). It took 38 days to reach that number. MDH also started to report cases and deaths in the Hispanic community. One can see they were reported in the “Other” community before.
On April 16, cases and deaths are still up. Montgomery County is now over 2,000 confirmed cases. Following this article in NPR (The New Coronavirus Appears To Take A Greater Toll On Men Than On Women), I was interested to see what we could see in Maryland. We have more positive cases in women than in men; that would go in the same direction as in the article: if women seek more testing than men (something we can’t see with MDH data), it’s normal that more women would test positive than men. What the article didn’t show (and what we can see at least in Maryland) is that, of all positive tests, men seem to die in greater proportion (4.1% instead of 3.2%) – see table below. Note that it can be due to a higher risk of dying for men and/or simply due to the lower number of tests done in men (increasing the proportion of more urgent cases).
On April 17, we reached a total of more than 50,000 negative tests, more than 11,000 positive cases, more than 400 deaths and no new patients released. Not a good day. I stopped reporting the negative tests on the charts with cases in order to better look at the cases. Negative tests are indirectly reflected in the % of positive tests.
On April 18, all numbers are still going up. However, even if testing is progressing (but not accelerating yet), the percentage of positive cases seems to have difficulties going above 18-19% of all tests.
On April 19, numbers are still going up. At the bottom of the graph, today we see more deaths due to COVID-19 among 30-39 year olds than among 40-49 year olds (2 more), although this age category “benefited” from the inclusion of “age not available” (DnA below). I don’t know if it’s a temporary glitch or an actual trend. Also, there seems to be a cycle of ~6 days where the daily approximate # of COVID-19 patients remaining in hospital decreases (fewer hospitalizations and/or more releases). But maybe it’s an effect of the weekend?
On April 20, Maryland Governor Hogan announced he procured 500,000 COVID-19 tests from South Korea. Hopefully this will increase the testing capabilities in Maryland. All data still going up.
On April 21, MDH released current and past data on the daily number of patients in hospital, broken down by acute care and ICU. I therefore removed the previous way of computing this number as it’s not useful anymore (and I overestimated this number in recent days).
On April 22, the number of total positive ever done in MD was above 60,000 (but still no drastic improvement due to the 500k tests delivered last weekend). And we now have more than 600 deaths. Baltimore County now has more than 2,000 cases since the beginning of the pandemic. And Montgomery and Prince George’s counties both have the same number of deaths (58). The number of confirmed deaths in the 80+ age group is growing fast while we still don’t see any deaths below 20 years old (that’s great!). Also, MDH played a bit with past hospitalization data on the dashboard. Thanks to Tyler Fogarty for spotting it and correcting it directly in the GitHub repo (no need to check the dashboard and re-copy past data :-))!
On April 23, cases continue to accrue and there is no sign yet that the tests ordered by Gov. Hogan from South Korea are impacting results. However, if hospitalization and discharge procedures didn’t change, we can see a second day of decline in the number of people in hospital. Also, and it has been visible for a few days now: the zip code 20783 (Hyattsville, MD: NE of Washington DC) is surging in number of cases and overtook 21215 (Baltimore, MD: NW region) today (309 cases vs. 293).
On April 26, we see the trend in testing increasing again (+7,542; we are now close to 100,000 tests since the beginning of this count). In the last 24 hours, we saw a huge increase in negative reports and a decrease in positive reports, bringing the % of positive tests to 19.22% of all tests. However, acute care beds were increasing (+63 in the last 24h) and therefore so did the total number of beds occupied today. I added the proportion of positive and negative cases and it seems that, on a daily basis, it slowly decreases.
Disclaimer: Although I work in infectious diseases, I’m not a specialist in Coronavirus. For the most up-to-date information on Coronavirus in the US, please visit the CDC website. For the most up-to-date information on Coronavirus in Maryland, please visit the Maryland Department of Health. That being said, now you can proceed at your own risk 😉
Living in Maryland during the Coronavirus pandemic, I am interested in following the number of cases that my state has so far. The Maryland Department of Health (MDH) now has a dashboard representing the count of positive cases and the breakdown by county. It’s nice but it only includes the latest update and the past trend is forgotten. So I decided to plot the number of cases with whatever numbers are given on the dashboard.
Now, a bit of background … I started this as a simple exercise with no other intention than plotting the trend of cases tested positive since Maryland started reporting cases (March 9, 2020). After a few days, it stopped reporting the total of negative testing done daily. The reason was: “Now that COVID-19 testing has expanded and is available through commercial laboratories, MDH is no longer reporting negative and pending numbers of tests in Maryland. All positive results obtained by commercial laboratories are reported to MDH and included in the confirmed cases count“. Although the reason is certainly understandable, this doesn’t allow us to follow the evolution of testing in general in the state. Testing and the availability of tests is a sensitive topic in the US …
On March 15, the MDH started to report the number of positive tests in each county. Initially, only 8 (of 24) counties had cases. On March 16, the total count for Anne Arundel county dropped from 2 to 1. I don’t know the reason.
On March 17, MDH reported a larger increase in detected positive cases than previously. This can be due to a lot of things (increase in testing, increase in cases per se, …). Frederick County reported its first case.
On March 19, MDH unfortunately reported the first death due to Coronavirus. The total number of positive tests reported is now above 100. Allegany County and Calvert County reported their first positive tests. Today, I also started to split my graph in 3: one for the total number of cases, one for the cases by counties and the last one by age group. This reporting by age group was started on March 17 by MDH. We see the burden is mainly in adults younger than 65 – but this may be simply due to higher level of testing in this population (again, without the number of tests done, you can’t really conclude anything).
On March 20, first cases appeared in Wicomico County and Worcester County. It seems that the number of positive tests is increasing faster (again, without the total number of tests done, you can’t really conclude anything).
On March 23, we see that all counties, except Allegany, Kent and Dorchester, have cases now. I changed the y-axis of the total number of cases to a log scale (therefore it gives a “flatter” look to the curve). But we are still in a rapidly increasing phase of the disease …
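Why does a log scale give a “flatter” look? With hypothetical numbers (cases doubling every day, which is not the Maryland data), this short Python sketch shows that exponential growth adds the same amount each day once you take the logarithm, so the curve becomes a straight line.

```python
import math

# Hypothetical series: 10 cases on day 0, doubling every day.
cases = [10 * 2 ** day for day in range(6)]  # 10, 20, 40, 80, 160, 320

# On a linear axis, the daily increase itself keeps growing ...
linear_steps = [b - a for a, b in zip(cases, cases[1:])]

# ... while on a log10 axis every day adds the same amount (log10 of 2).
log_cases = [math.log10(c) for c in cases]
log_steps = [b - a for a, b in zip(log_cases, log_cases[1:])]

print(linear_steps)                       # [10, 20, 40, 80, 160]
print([round(s, 3) for s in log_steps])   # [0.301, 0.301, 0.301, 0.301, 0.301]
```

So a straight line on a log-scale chart still means the number of cases is multiplying at a constant rate.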
On March 24, we now have more than 300 positive tests, among which 107 in Montgomery County alone.
This year, my elder son graduated from Cub Scouts to Scouts (time flies very fast!) and I signed up to be a counselor for Programming (and Public Health) in his troop.
Today, February 1st, 2020, was Merit Badge Day and I taught 6 scouts what programming is and the basics of programming in Python (and Scratch – but they all knew that already) (and nobody chose Public Health …).
I am now sharing my presentation and a few tips and tricks. Feel free to re-use, improve and give me any feedback to make it better.
It was the first time I gave this Merit Badge and having 6 scouts is a good number. You’ll face some issues helping them start programming, especially if all of them are new to programming. Also, it’s interesting to have scouts of approximately the same age: they will have similar reactions and they will be at a similar level of programming. I had five 6th graders and one older scout: the older scout already had a higher level of programming (and he kindly helped younger scouts). Also, a big mistake from a first-time counselor: do not give them the WiFi password at the beginning of the session! 🙂 Ask them to pre-install Python (if they bring their Windows laptop) and only allow them on the internet when coding … You’ll thank me later 😉
I went through Safety, History of programming and Programming today in about 1 hour and 20 minutes, which was a bit too long (despite the good interaction and participation).
Then I programmed with them a converter from degrees Fahrenheit to degrees Celsius. Typing with them and running the script line by line was a good way for them to understand basic programming concepts like variables, case-sensitivity, functions and branching. The files we used as examples and code are on GitHub. From no knowledge of Python to this temperature converter: about 1 hour.
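For readers who want to try something similar, here is a minimal sketch of such a converter in Python. It is not the file from the GitHub repository: the function names and the "F"/"C" unit letters are assumptions made for this illustration. It touches the same concepts: variables, functions and branching.

```python
def fahrenheit_to_celsius(degrees_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (degrees_f - 32) * 5 / 9


def celsius_to_fahrenheit(degrees_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return degrees_c * 9 / 5 + 32


def convert(value: float, unit: str) -> float:
    """Convert `value` from the given unit ("F" or "C") to the other one."""
    unit = unit.upper()  # accept "f" as well as "F"
    if unit == "F":
        return fahrenheit_to_celsius(value)
    elif unit == "C":
        return celsius_to_fahrenheit(value)
    else:
        raise ValueError("unit must be 'F' or 'C'")


print(convert(212, "F"))  # 100.0 (water boils)
print(convert(0, "c"))    # 32.0 (water freezes)
```

Note that Python names stay case-sensitive (`Convert` would not work); only the unit letter is made forgiving here, thanks to `unit.upper()`.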
Finally, I covered Intellectual Property and Career in 10-15 minutes. That’s a little bit short. We had no time to enter into too many details. But scouts will have the additional pointers at the end of the slides and this will be a good introduction already.
Final thought? It’s time consuming to prepare all this material (and I thank the other counselors who shared their material!) but it’s also very rewarding to see children (well, teens) discover programming! I encourage you to share things you like as a Scout Counselor!
I have wanted to use the twitteR package for R for a long time; I tried but didn’t do much with it. Today I found a few minutes, followed simple recipes (I admit), and looked at the number of tweets about flu today (November 13, 2018). Result: 283 tweets in English (I wanted to focus on the USA but, for some reason, I couldn’t … yet!). That’s not a lot. But remember we are only at the beginning of the influenza season 2018-2019 in the Northern hemisphere.
After some very basic cleaning, here are the words most used: flu, influenza (obviously: I was looking for them!), rt (note to self: remove this indication of a retweet), vaccine, health and get. As I mentioned: we are at the beginning of the flu season in the Northern hemisphere, it’s still time to get vaccinated and protected against flu!
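To give an idea of what “very basic cleaning” and counting can look like, here is a small sketch. It is written in Python with the standard library only, not with the R package used for this post, and the three tweets and the stopword list are made up for the example.

```python
import re
from collections import Counter

# Small, hand-made stopword list (an assumption for this sketch);
# "rt" is included so that the retweet marker is not counted as a word.
STOPWORDS = {"the", "a", "to", "of", "and", "is", "are", "rt", "your", "this", "it"}


def top_words(tweets, n=3):
    """Return the n most frequent words after basic cleaning."""
    words = []
    for tweet in tweets:
        # Lowercase, then drop URLs and @mentions.
        text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
        words += [w for w in re.findall(r"[a-z]+", text) if w not in STOPWORDS]
    return Counter(words).most_common(n)


# Hypothetical tweets, not real data.
tweets = [
    "RT @someone: Get your flu vaccine this week",
    "Flu season is starting, get the vaccine https://example.org",
    "Influenza and flu are the same thing",
]

print(top_words(tweets))  # [('flu', 3), ('get', 2), ('vaccine', 2)]
```

A word cloud is then just another rendering of the same counts.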
Now of course, I wanted a word cloud 😉 Here it is:
It’s basically the same graph as above. You don’t get the count but you get the feeling of how important each word is (and you get more words).
I also recently read the WIRED article about the need for fewer stats and more stories about the success of vaccines. And I was wondering if, by following tweets and people tweeting about flu on Twitter, we could reconstruct stories about influenza and vaccination against it. I’ll try to dedicate a few minutes every now and then, during this season, to this. In the meantime, if you have additional ideas, don’t hesitate to send them to me, comment below, or contact me … on Twitter, obviously! (I’m @jepoirrier)
DISO1 – Data I Sit On, episode 1. This post is the first of a series of a few exploring data I collected in the past and that I found interesting to look at again … (I already posted about data I collected, see the Quantified Self tag on this blog)
Life is short and full of different experiences. One of the experiences I don’t specifically enjoy but that is an integral part of life is commuting. Although I tried to minimize commuting (mainly by choosing a home close to the office) and benefit(ed) from good work conditions (flexible working hours, home working, etc.), a big change occurred when I took a new opportunity, in 2015, to work in the Belgian capital, Brussels.
From where I lived at that time, using public transportation was not a viable option, unfortunately: it implied roughly 2 hours to go one way and changing at least 2 times between bus, train and metro. Anyway, Belgium is known for having lots of cars and I benefited from a company car. For some time, I have also been interested in Quantified Self, so I started collecting data about my daily commute.
What I try to see is the seasonality of commuting (I would initially expect shorter commute time during school breaks), the differences between leaving for work after driving children to school or without driving them, … There is also an extensive literature on the impact of commuting on the quality of life …
So, how did I do that?
The route usually taken, between my home then (in Wavre) and my office then (in Brussels, both in Belgium), is 28km long and the fastest I ever saw on Google maps to drive this distance is about 20-25 minutes.
I took note of the following elements in whatever default note-taking app was on my phone at that moment (Keep on Android, Notes on iOS). The first field in each row is the date in a %y%m%d format, i.e. year, month and day of month as zero-padded decimal numbers, 2 digits only for each. The second field is the start time in a %H%M format, i.e. hours (24-hour clock) and minutes, also as zero-padded decimal numbers, 2 digits only for each. Start time is defined as when I enter my car at home, in the morning. The third field is the arrival time (same format as start time), defined as when I stop the engine at work. The fourth and fifth fields are start and arrival times when I go back home, defined and formatted the same way, mutatis mutandis. Any missed start/arrival time is marked as “na” or “NA”. This corresponds, for instance, to times when I leave the office but stop to meet a client (or, more prosaically, to do grocery shopping) before coming back home. I may have missed one or two whole days at most. The data is on Github.
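To make the format concrete, here is a minimal Python sketch that reads one such row and computes the morning and evening commute durations. The actual analysis was done in R; the comma separator, the dictionary keys and the example row are assumptions made for this illustration, not the real file layout or real data.

```python
from datetime import datetime


def parse_time(date_field: str, time_field: str):
    """Combine a %y%m%d date and a %H%M time; return None for "na"/"NA"."""
    if time_field.strip().lower() == "na":
        return None
    return datetime.strptime(f"{date_field.strip()} {time_field.strip()}", "%y%m%d %H%M")


def parse_row(row: str, sep: str = ","):
    """Parse 'date, am start, am arrival, pm start, pm arrival' (separator assumed)."""
    date, am_start, am_end, pm_start, pm_end = row.split(sep)
    return {
        "am_start": parse_time(date, am_start),
        "am_end": parse_time(date, am_end),
        "pm_start": parse_time(date, pm_start),
        "pm_end": parse_time(date, pm_end),
    }


def minutes_between(start, end):
    """Duration in whole minutes, or None if either time is missing."""
    if start is None or end is None:
        return None
    return int((end - start).total_seconds() // 60)


# Hypothetical row: November 3, 2015; evening arrival not recorded.
row = parse_row("151103,0815,0850,1730,na")
print(minutes_between(row["am_start"], row["am_end"]))  # 35
print(minutes_between(row["pm_start"], row["pm_end"]))  # None
```

Returning `None` for missing values keeps incomplete days in the log without inventing durations for them.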
On a daily basis, the little game is to try to figure out which lane is the fastest, if there is a pattern in the journey that makes it faster (I think there is). However, there are so many little things to track in this game that I did not track these small differences. The journey is assumed to take more or less the same route.
At the end, the complete log is saved on my computer and analysed in R (version 3.3.2). The typical measures I’m interested in are departure/arrival times over time, commute duration over time, commute duration per month or per day of the week or per season, … for both the morning and afternoon journeys if applicable. Some funny measures should be the earliest I left for work, the latest I arrived at work, the earliest I left work, the latest I left work, the shortest journey ever (to compare to Google estimate) and the longest journey ever …
An unintended measure here is the amount of time actually spent in the office (on a side note, this is different than productivity – but I didn’t find any unambiguous or flawless measure of productivity so far …). Some interesting variations could be to see the average and median duration of my work days, the shortest day or longest day I had, … (I don’t know if my former employer would be happy or angry to see these results 😉 but note this doesn’t take into account the numerous times I worked from home, even in evenings after having worked the whole day in the office …).
In theory, the fastest I could go is at an average 84km/h (28km in 20 minutes, according to Google Maps, so this is according to traffic, not maximum speed limits). In practice, this is a whole different story …
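The 84 km/h figure is simply distance divided by time. A tiny Python sketch of that arithmetic (the function name is only illustrative), applied to the 28 km route with the 20-minute Google estimate and with the shortest (18 minutes) and longest (160 minutes) morning trips reported in this post:

```python
def average_speed_kmh(distance_km: float, minutes: float) -> float:
    """Average speed in km/h for a trip of `distance_km` lasting `minutes`."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return distance_km * 60 / minutes


ROUTE_KM = 28  # length of the usual route

print(average_speed_kmh(ROUTE_KM, 20))             # 84.0 (Google's best estimate)
print(round(average_speed_kmh(ROUTE_KM, 18), 1))   # 93.3 (shortest morning trip)
print(average_speed_kmh(ROUTE_KM, 160))            # 10.5 (longest morning trip)
```

These are averages over the whole route, so they say nothing about the speed on any particular stretch.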
In a bit more than a year of collected data:
the earliest I left home was 6.11 and the latest 10.11;
consequently, the earliest I arrived at work was 6.32 and the latest 10.36;
the shortest trip to work was 18 minutes and the longest one was 160 minutes (it was on March 22, 2016, the day of the Brussels airport bombing, because the office is close to the airport – I still remember);
the earliest I left work was 12.34 (I assume half-day of holidays) and the latest 21.24 (I assume lots of work then);
consequently, the earliest I arrived back home was 12.59 and the latest 21.43;
the shortest trip back home was 7 minutes (there should be some input error here!!!) and the longest trip was 128 minutes (nothing surprising, here, with Brussels traffic jams).
Finally, the shortest stay in office was 242 minutes (4 hours and 2 minutes) – it was that half-day of holidays. And the longest stay in office was 754 minutes (12 hours and 34 minutes).
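The conversions from minutes to hours and minutes above can be checked with integer division. A short Python sketch (the function name is just for illustration):

```python
def format_minutes(total_minutes: int) -> str:
    """Render a duration in minutes as 'H hours and M minutes'."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours and {minutes} minutes"


print(format_minutes(242))  # 4 hours and 2 minutes
print(format_minutes(754))  # 12 hours and 34 minutes
```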
As always, these things are nice when rendered as graphs …
A first note is that none of these graphs show any seasonality in the data. At first, I thought I would go faster during school holidays – but it was more a feeling than anything else, as the data show. And although the time at work varied widely over time, the average time spent at work seems to be pretty constant over the year; I was surprised by this:
Finally, the time spent in car depending on the departure time is interesting:
Going to work was clearly split into 2 periods: leaving home (“Start Time”) before 8.30 and after 8.30. That’s because either I went early (and avoided the morning rush hour) or I drove the kids to school and drove to work at the end of rush hour. But although I tried to minimize the journey, the journey after driving the kids to school was still taking more time.
For the evening, going home became a shorter trip if I was able to delay it. And the later I came back, the shorter the trip. (However, if I didn’t drive the kids to school in the morning, the deal was that I would pick them up in the afternoon – fortunately, afterschool care is cheap in Belgium).
All this to come to the quality of life … I didn’t measure anything related to quality of life. I just remember that the first few weeks were very tiring. However, this commuting factor should be added to other tiring factors: learning a new job, adjusting to a new environment, etc. But there is a body of scientific work looking at the impact of commuting on the quality of life (I really like this paper as a starter, probably because it was published during that period): fatigue, stress, reduced sleep time, heart disease, absenteeism, BMI (weight), … are all linked – in a way – to commuting (either driving or just sitting in public transport).
Künn‐Nelen, A. (2016) Does Commuting Affect Health? Health Econ., 25: 984–1004. doi: 10.1002/hec.3199
And a last point: privacy. This data is from 2015-2016. People who know me (even former colleagues!) know where I worked. And even without knowing me, you know when I leave home, when I leave the office, my pattern of organization, etc. Do I want that? Part of the answer is that I only post this data now, 2-3 years later. On the other hand, here is another free, small dataset!
Next steps? I’m continuing to track my journeys to work, even now that we have moved to the USA. For privacy reasons, I will not publish those data immediately. But it will be interesting, later, to compare the different patterns and try to understand at least some differences … It would also be interesting to give more time to this small experiment and, for instance, try to capture any impact on mood, productivity, … But this would become a whole different story!
According to CDC data, studies are getting better at estimating the influenza vaccine effectiveness.
With the 2017-2018 flu season still going on in the USA, there are already some indications that vaccines have some effectiveness (although their target strains were mismatched). The CDC reports how it measures vaccine effectiveness here and I was interested in their confidence intervals (the interval that takes into account uncertainties to extrapolate to the broader, unknown population).
Here is the same graph as on the CDC page, but with confidence interval:
* 2016-2017 VE are still estimates. ** 2017-2018 interim early estimates may differ from final end-of-season estimates.
You can already notice it above but the graph below confirms that the confidence interval becomes narrower over successive flu seasons. This can have various causes. One obvious reason is that early seasons (< 2007-08) had a very small sample size (< 1,000). But overall, we can notice a gain of certainty around the effectiveness (the lower the line below, the more certainty).
* 2016-2017 VE are still estimates. ** 2017-2018 interim early estimates may differ from final end-of-season estimates.
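The “certainty” plotted above can be read as the width of each confidence interval, i.e. the upper bound minus the lower bound. Here is a minimal Python sketch of that calculation; the actual graphs come from the code in the repository, and the two seasons and their bounds below are hypothetical placeholders, not CDC estimates.

```python
def ci_width(lower: float, upper: float) -> float:
    """Width of a confidence interval, in percentage points."""
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    return upper - lower


# Hypothetical vaccine effectiveness intervals (lower %, upper %).
intervals = {
    "early season (small sample)": (10, 82),
    "recent season (large sample)": (41, 55),
}

widths = {season: ci_width(lo, hi) for season, (lo, hi) in intervals.items()}
for season, width in widths.items():
    print(f"{season}: {width} points wide")
```

A narrower interval means the point estimate is more precise; it does not by itself say anything about whether the effectiveness is high or low.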
As usual, the dataset (and code to generate the graphs above) are on my Github repo.