Sometimes, you think that you found something interesting but the Maryland Department of Health is already presenting it on its COVID-19 dashboard 😀
For instance, I calculated the percentage of residents of each county who were ever tested (regardless of the test result). I found that a third of Maryland counties (8/24) have tested more than 25% of their residents at least once. Indeed, as of yesterday (August 10), here are the counties in that category:
County (alphabetical order)
% of population ever tested
Maryland counties with more than 25% of their population tested for COVID-19 on August 10, 2020
While we are at it, here are the 5 counties with less than 20% of their population tested (still as of August 10, 2020):
County (alphabetical order)
% of population ever tested
Maryland counties with less than 20% of their population tested for COVID-19 on August 10, 2020
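The percentage behind these two tables is a simple ratio of people ever tested to county population. Here is a minimal sketch of that calculation in Python; the county names and all figures are invented for illustration (they are not the MDH or Department of Planning numbers), and the function name is my own.

```python
# Hypothetical figures for illustration only; not the real MDH or Planning numbers.
population = {"County A": 100_000, "County B": 50_000, "County C": 250_000}
ever_tested = {"County A": 27_000, "County B": 9_000, "County C": 55_000}


def percent_tested(tested, pop):
    """Return the share of residents ever tested, in percent, for each county."""
    return {county: 100 * tested[county] / pop[county] for county in pop}


rates = percent_tested(ever_tested, population)

# Same two thresholds as in the tables: more than 25% and less than 20%.
above_25 = sorted(county for county, rate in rates.items() if rate > 25)
below_20 = sorted(county for county, rate in rates.items() if rate < 20)

for county in sorted(rates):
    print(f"{county}: {rates[county]:.1f}% of population ever tested")
print("More than 25%:", above_25)
print("Less than 20%:", below_20)
```

Small differences in the population source shift these percentages a little, so a county sitting close to a threshold can change category depending on the denominator used.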
Graphically, we see that all counties are testing more and more, and increasing at approximately the same speed:
As you can see, there are 2 minor issues with the dataset from the MDH API. First, Somerset reported more than double the normal number of tests on June 18, 2020; it went back to “normal” on the next day (I suspect an encoding error here, see highlight below). Then, there is no data after July 7; data resumes on July 13 (a posteriori, I don’t recall reading any issue about county data collection during that time). None of these prevents looking at the current data.
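Both issues can be spotted automatically before plotting. The sketch below is only an illustration: the numbers are invented, the function names are mine, and the "more than double the neighboring days" rule is an assumption I chose to mimic the Somerset case, not something defined by the MDH.

```python
from datetime import date, timedelta


def find_gaps(days):
    """Return (last day before gap, first day after gap) for every missing stretch."""
    days = sorted(days)
    return [(a, b) for a, b in zip(days, days[1:]) if b - a > timedelta(days=1)]


def find_spikes(series, factor=2.0):
    """Flag days whose value exceeds `factor` times the mean of the two neighboring days.

    `series` is a list of (day, value) tuples sorted by day.
    """
    flagged = []
    for (_, prev), (day, value), (_, nxt) in zip(series, series[1:], series[2:]):
        if value > factor * (prev + nxt) / 2:
            flagged.append(day)
    return flagged


# Hypothetical reporting days with a hole between July 7 and July 13.
reported_days = [date(2020, 7, 5), date(2020, 7, 6), date(2020, 7, 7),
                 date(2020, 7, 13), date(2020, 7, 14)]
gaps = find_gaps(reported_days)

# Hypothetical values with a one-day jump on June 18.
county_series = [(date(2020, 6, 16), 100), (date(2020, 6, 17), 110),
                 (date(2020, 6, 18), 260), (date(2020, 6, 19), 120),
                 (date(2020, 6, 20), 125)]
spikes = find_spikes(county_series)

print("Gaps:", gaps)
print("Suspicious days:", spikes)
```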
Now, as I mentioned, the official dashboard already has this data, presented by quartile, as a kind of competition between counties 😉 … (the % are slightly different, probably because we are using different sources for the population totals – I’m using the population projections from the Maryland Department of Planning).
Since mid-July 2020 in Maryland, we understood that the 20-59 yr age group was problematic, especially the 20-29 yr age group that is racing to overtake all age groups in terms of number of COVID-19 cases (relative to their population, see top chart below).
In terms of COVID-19 hospitalizations, we also saw a small rebound (see chart below; it seems to have subsided since the beginning of August).
But what we didn’t know (for this small peak as well as since the beginning) was the age of these hospitalized populations. Did these hospitalizations affect older adults more? The younger ones? Or the children? The Maryland Department of Health COVID-19 dashboard doesn’t report that information (nor does the API).
Now, the CDC also has an interactive graph where you can see and filter the data by yourself. Here is the situation up to August 9, 2020, for Maryland:
The peak of April-May is well represented, with the 85+ population reaching a peak at nearly 100 weekly hospitalizations per 100,000 pop. All the other age groups increased during that time, the older the higher (unfortunately).
Now, since July, we see some of these age groups increase again. At the end of July:
Weekly hospitalization rate
Weekly hospitalization rates for the week of July 27, 2020 in Maryland, MD, USA
This, in my opinion, reinforces the view that cases might be increasing in the younger population (also thanks to testing being more available) and that children and young adults might be less impacted when infected. But the older population is the first impacted by any increase in cases. It was true in April-May. It is again the case with this small peak. If we should take preventative measures to contain COVID-19, it is for us – but especially for the older population, our parents.
We moved our family from the US (Maryland, just in case you didn’t know yet) to Belgium – no big deal. During the COVID-19 pandemic, in July-August 2020 – now we’re talking …
I wrote this post to document our journey. We were (and still are) extremely privileged to have been able to do this, in the conditions we did it. The journey is not over. I’ll update and continue to document it until we fall back into something more “normal” … [long post]
Total Number Released from Isolation data layer is a collection of the statewide cumulative total of individuals who tested positive for COVID-19 that have been reported each day by each local health department via the ESSENCE system as having been released from home isolation. As “recovery” can mean different things as people experience COVID-19 disease to varying degrees of severity, MDH reports on individuals released from isolation. “Released from isolation” refers to those who have met criteria and are well enough to be released from home isolation. Some of these individuals may have been hospitalized at some point.
Definition of “release from isolation” according to the MDH API (emphasis is mine)
Therefore, mentioning the number of patients released from (home) isolation just below the current number of patients in hospital (as it is currently the case on the MDH dashboard) is a bit misleading: this metric is related to home isolation, which is very different than isolation in hospital.
According to the MDH FAQ on isolation and quarantine, there is no mandate to isolate newly diagnosed positive COVID-19 cases. These cases should follow their healthcare provider’s guidance. In the same document, it also appears that there is no mandate to be notified of the end of isolation. There is a guidance with 3 conditions from the CDC (≥ 10 days since first symptoms, ≥ 24 hours without fever and all COVID-19 symptoms are better – note they don’t need to disappear). These are exactly the conditions written in the CDC Guidance on Discontinuing Home Isolation for Persons with COVID-19 (consulted on August 5, 2020). So I don’t know exactly how this “released from isolation” data is collected.
To add to the confusion, the API page indicates that the data is provided by Maryland’s ESSENCE (Electronic Surveillance System for the Early Notification of Community-based Epidemics). But this system takes most of its data from Maryland acute care hospitals. So would that mean that hospitals direct the end of home isolation and report these numbers? It could be patients released from hospital and asked to isolate at home: so far, there were a total of 12,888 hospitalizations due to COVID-19 in Maryland. It would mean less than half of these patients would have been asked to continue to isolate at home after their hospital release (up to August 4, 2020, the data says that a total of 5,740 COVID-19 cases were released from home isolation). This is only about 1/16th of the total of positive cases so far (91,854) so I’m not sure we can link these two metrics.
On a daily basis, the second chart (above) shows a kind of cycle with a peak around mid-May to early June and a trend that increases again at the end of July. This blue, smoothed curve really looks like the curve of positive cases so I plotted them both below (cases in blue, releases in red). We can see the 2 peaks for both curves but we can’t really distinguish any delay in patients released (red), compared to the positive cases (blue). A delay would have made sense since releases follow the reporting of cases, by definition; a confounding factor may be the delay in case reporting, which may blur the time difference. But the graph also shows us the difference in magnitude between the number of cases (high) and, indirectly, the number of people that were in isolation at home (then released – low).
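One way to look for such a delay more formally is to shift one curve against the other and see at which shift they correlate best. The sketch below is a generic lagged-correlation check, not the method used for the chart; the two series are invented (the "releases" are simply the "cases" moved 3 days later) so that the expected answer is known.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def lag_scores(cases, releases, max_lag):
    """Correlate cases on day t with releases on day t + lag, for each lag in days."""
    scores = {}
    for lag in range(max_lag + 1):
        earlier_cases = cases[:len(cases) - lag]
        later_releases = releases[lag:]
        scores[lag] = pearson(earlier_cases, later_releases)
    return scores


# Hypothetical daily counts; releases are the cases shifted by 3 days.
cases = [1, 2, 4, 7, 9, 7, 4, 2, 1, 1, 2, 3]
releases = [0, 0, 0] + cases[:-3]

scores = lag_scores(cases, releases, max_lag=5)
best_lag = max(scores, key=scores.get)
print("Best lag (days):", best_lag)
```

On real data the peak is rarely this clean: reporting delays on the case side would flatten the scores across several lags, which is the blurring effect mentioned above.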
So I still don’t really know what to think of this metric. If you have any idea, please tell me! Thanks!
Note: if you just look for where to get tested in Maryland, the official information is here.
On the graph above, one can see that, up to the week ending on July 25, a bit more than 19,000 COVID-19 cases were recorded in covidLINK, Maryland’s contact tracing initiative. However, only a bit more than 18,000 of these cases had a phone number registered and only 11,504 were successfully interviewed (this takes into account people who refused to be interviewed, who could not be reached, etc.). Although the number of cases registered and the number of cases ultimately interviewed keep increasing (something good), one should remember that, on July 25, Maryland counted 83,054 positive cases (reported in ESSENCE). This means that only 22% of all positive cases were recorded in covidLINK (blue bars below) and only 13% of all positive cases were interviewed (pink bars below). These shares are increasing (good) but if only 1/7th of all cases are interviewed, that’s not a lot and we are missing a lot of potential transmissions.
The MDH also reports some information on the contacts of these cases. One can see below that more and more contacts are … contacted (!) and their interviews are also increasing. At the end of the week ending on July 25, 24,260 contacts had been registered by all cases and 11,816 of these contacts were interviewed. Assuming that cases who are also contacts are not counted in both datasets, that makes so far a total of more than 23,000 cases and contacts who were interviewed about their symptoms and contacts in relation to COVID-19!
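For readers who want to redo the arithmetic, here is a small sketch using only the exact figures quoted above (11,504 cases interviewed, 83,054 positive cases, 24,260 contacts registered, 11,816 contacts interviewed). The variable names are mine, and the combined total assumes that nobody is counted both as a case and as a contact.

```python
# Figures quoted in the text for the week ending on July 25, 2020.
positive_cases = 83_054        # all positive cases reported in ESSENCE
cases_interviewed = 11_504     # cases successfully interviewed in covidLINK
contacts_registered = 24_260   # contacts named by cases
contacts_interviewed = 11_816  # contacts successfully interviewed

# Share of all positive cases that were interviewed (roughly 1 in 7).
share_cases_interviewed = 100 * cases_interviewed / positive_cases

# Share of registered contacts that were interviewed.
share_contacts_interviewed = 100 * contacts_interviewed / contacts_registered

# Total people interviewed, assuming no overlap between cases and contacts.
total_interviewed = cases_interviewed + contacts_interviewed

print(f"Cases interviewed: {share_cases_interviewed:.1f}% of all positive cases")
print(f"Contacts interviewed: {share_contacts_interviewed:.1f}% of registered contacts")
print(f"Total interviewed: {total_interviewed:,}")
```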
Now, we reach an average of 2.85 contacts declared per positive case (that’s not a lot! The covidLINK FAQ doesn’t mention how far in the past contact tracers go back, just that they “may ask about your whereabouts for a specific period of time”). And on average, 1.39 contacts per case are interviewed …
That’s already a gigantic amount of work that has been done by the 1,350 contact tracers! More resources and more cooperation will increase these metrics (and it’s badly needed!). But that’s already a first look at the necessary contact tracing operation in Maryland.
But also every day, there is one thing that constantly changes: how everyone is calculating the COVID-19 positivity rate. Today (July 26), for instance, the different daily positivity rates announced are: 3.77% (Hogan), 4.47% (Hogan again in the same tweet, Hall, Ricci, MD Health Department, Fogarty) and ~6% (for me, the exact number behind the ~ is 6.14%). This doesn’t include the 7-day (or n-day) averages and other measures. And this is only on Twitter.
Why are these numbers different? Which one is correct?
First, let me get rid of the second question: all of them are mathematically correct. What value you give to them depends heavily on what you are looking at or looking for.
So how are these numbers different? Let’s calculate all of them … Today, the Maryland Department of Health COVID-19 dashboard indicates:
This gives a total of 838,572 cumulative unique tests (# confirmed cases + # persons tested negative) since the beginning of the pandemic. And it gives an overall unique positivity rate of 10.06% (# unique confirmed cases / # unique total). I added “unique” as all these numbers only count each person once per test result (if someone tests negative several times, he/she will show up for only 1 negative test). As shown in the chart below, this overall unique positivity rate rises quickly when cases are increasing but is very slow to go down when cases are diminishing. As a consequence, this overall positivity rate will only approach 0% in a very, very distant future (and will almost never reach it, as we will always have the cases from the beginning).
Now we may be interested in the total testing volume (1,097,361 today): this is the total of all tests, whether results are always the same or different for the same person. Imagine a doctor being tested every week for COVID-19; for 3 weeks, she is negative (= 1 unique negative test but 3 negative tests in total) until she is found positive on week 4 (= 1 unique positive test = 1 positive test in total); after 2 weeks in quarantine at home, she again tests negative before returning to work (= 1 unique negative test but 1 positive and 4 negative tests in total). The total testing volume is simply the sum of all tests ever done in Maryland. If you divide the # of unique confirmed cases by the total testing volume, you have an overall positivity rate of 7.64%. I personally don’t like this metric because it mixes unique positive cases with a total that counts repeated tests. That said, as the plausible example above suggests, the number of unique positive tests and the number of all positive tests are probably very close (unless positive people are tested positive several times), so it could still give a good estimate of the positivity rate.
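The doctor example can be written out as a tiny dataset to make the difference between "unique" counts and total testing volume concrete. This is only a sketch of the counting logic as I describe it above (one person can appear once as a unique positive and once as a unique negative); it is not how the MDH systems actually deduplicate records.

```python
# One record per test: (person, week, result). Only the doctor from the example.
tests = [
    ("doctor", 1, "negative"),
    ("doctor", 2, "negative"),
    ("doctor", 3, "negative"),
    ("doctor", 4, "positive"),
    ("doctor", 6, "negative"),  # retested after 2 weeks of quarantine at home
]

total_volume = len(tests)
total_positive = sum(1 for _, _, result in tests if result == "positive")
total_negative = sum(1 for _, _, result in tests if result == "negative")

# "Unique" counts: each person is counted once per type of result.
unique_positive = len({person for person, _, result in tests if result == "positive"})
unique_negative = len({person for person, _, result in tests if result == "negative"})

# Rate based on unique counts only.
unique_rate = 100 * unique_positive / (unique_positive + unique_negative)
# Mixed rate: unique positives divided by the total testing volume.
mixed_rate = 100 * unique_positive / total_volume

print(f"Total tests: {total_volume} ({total_positive} positive, {total_negative} negative)")
print(f"Unique: {unique_positive} positive, {unique_negative} negative")
print(f"Unique positivity rate: {unique_rate:.0f}%")
print(f"Unique positives / total volume: {mixed_rate:.0f}%")
```

With a single person the gap between the two rates is exaggerated, but the mechanism is the same at state level: repeated negative tests inflate the denominator of the second rate without touching the first.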
But to create even more confusion, positive cases are reported from ESSENCE (Electronic Surveillance System for the Early Notification of Community-based Epidemics, click on Biosurveillance here; they write weekly reports that are a trove of information – this may be for a later post). And negative cases are reported from NEDSS (National Electronic Disease Surveillance System, also from the CDC). And the total testing volume comes from all lab results transmitted electronically to the state. It is clearly stated that results transmitted non-electronically are not taken into account. Having 3 different sources, counting cases differently, doesn’t help reporting – but this highlights the difficulty of presenting a comprehensive figure. If we plot them all on the same figure, this is what it gives:
As discussed above, the % positive from the cumulative count (green line) will always be high and go down slowly. The % positive of daily reported (violet line) fluctuates a lot and often seems to be higher than the % positive of daily reported electronically (blue line). This high level of fluctuation is the reason why the MD Health Department uses a 5-day average of the % positive of daily reported electronically (red line).
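To see why the three kinds of curves behave so differently, here is a small sketch with invented daily numbers where the daily positivity drops sharply. I assume a trailing 5-day mean of the daily rates for the smoothed line; the exact formula used by the MD Health Department for its average is not something I can confirm.

```python
from itertools import accumulate


def rolling_mean(values, window):
    """Trailing mean over `window` points; None until enough points are available."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            result.append(sum(chunk) / window)
    return result


# Hypothetical daily counts: positivity falls from 30% to 5% within a week.
daily_positives = [30, 20, 10, 5, 5, 5, 5]
daily_tests = [100, 100, 100, 100, 100, 100, 100]

# Daily rate: positives reported that day / tests reported that day.
daily_rate = [100 * p / t for p, t in zip(daily_positives, daily_tests)]

# Cumulative rate: all positives so far / all tests so far.
cumulative_rate = [100 * p / t for p, t in
                   zip(accumulate(daily_positives), accumulate(daily_tests))]

# Smoothed rate: trailing 5-day mean of the daily rate (assumed formula).
smoothed_rate = rolling_mean(daily_rate, window=5)

print("Daily:     ", [round(r, 1) for r in daily_rate])
print("Cumulative:", [round(r, 1) for r in cumulative_rate])
print("5-day mean:", smoothed_rate)
```

On the last day the daily rate is already at 5% while the cumulative rate is still above 11%: the early cases keep weighing on it, which is the slow decline of the green line.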
Understanding the positivity rate is important because it gives an indication of the severity of the disease. In this respect, we see that Maryland did well to reduce the severity of this disease, so far, with a positivity rate going down since early May. But the positivity rate can also be read as an indicator that the state is doing relatively well on testing (usually, a high positivity rate is associated with too little testing, where only the most severe cases are tested). But the positivity rate can be influenced by many factors that cannot be understood from these graphs only … One of these factors is the test selection: now that Maryland allows anyone to be tested, one could reasonably think that the samples tested are more representative of the disease in the state than when only a very restricted set of patients could be tested (before May 19, 2020). Another key parameter is how long testing takes before giving results. All the numbers above are for when test results were reported. When these tests were performed is not disclosed (there are discussions online that test results take several days to several weeks to arrive – if this is true, the % positive we see now is merely a snapshot of what happened mid-July and not now or last week). And to add to the confusion, I’m sure test results from different labs are reported at different speeds.
All in all, data we see here is a fuzzy picture of what happened in a relatively close past. If figures go down, fine. If they tend to go up, we’ll have to be careful that we are not further up than estimated here.
Indeed, in a nutshell, in Maryland (like in the rest of the world), women are more impacted than men by the disease. But men are dying of the disease a little bit more than women.
Note: this post was updated on July 15, 2020, to fix an error in my code!
Now for the details …
In terms of positive COVID-19 tests / cases, the difference between men and women started early in April, with the number of positive tests or cases in women increasing faster than men over time. Today (July 15, 2020), Maryland counted a cumulative 39k positive cases for women and a cumulative 35.9k positive cases for men. The number of new cases in men and women in Maryland follows (of course) the trend in new cases, with peaks in May, a decrease until now and a fear for new increase of cases now (see bottom graph, below).
Even if we take into account the number of cases relative to the population of each gender, because there is approximately the same number of men and women in Maryland (2.9 million men, 3.1 million women, from the MD Department of Planning), women always saw more cases than men (even if by just a little bit). Today, here is the data (also see graph below):
Cumulative COVID-19 cases / 100,000 pop.
July 15, 2020
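As a rough check of the "cases per 100,000" figures, here is a small sketch that uses only the rounded numbers quoted in the text (39k and 35.9k cumulative cases, 3.1 and 2.9 million residents). Because these inputs are rounded, the results are approximations and will not exactly match values computed from the full data.

```python
def per_100k(count, population):
    """Express a count relative to 100,000 people in the given population."""
    return 100_000 * count / population


# Rounded figures quoted in the text for July 15, 2020.
women_rate = per_100k(39_000, 3_100_000)
men_rate = per_100k(35_900, 2_900_000)

print(f"Women: about {women_rate:.0f} cumulative cases per 100,000")
print(f"Men:   about {men_rate:.0f} cumulative cases per 100,000")
```

Even with these rounded inputs, women come out slightly ahead once population is taken into account, which matches the "just a little bit" above.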
In terms of deaths, we see the opposite trend: since the beginning of data reporting, there were always more men who died of COVID-19 than women. On a daily basis, it’s less clear (and since I’m not smoothing nor averaging anything, it’s a bit jagged) but the overall result remains the same.
Even when we consider deaths relative to the respective populations, men die in larger numbers relative to their population (than women) and this has consistently been the case since the beginning of data availability (see also chart below):
After my previous post on the age of COVID-19 cases in Maryland, it was logical that I write about the age of COVID-19 deaths in Maryland. So far, media and State Departments of Health all agreed that the older someone is, the more risk this person has to die from coronavirus.
So far, this is unfortunately also true in Maryland. In the graph below, we clearly see that people 50-59 years old have more than 250 deaths, people 60-69 have more than 500 deaths, people 70-79 have more than 750 deaths and people 80+ have nearly … 1,500 deaths! The graph at the bottom also clearly shows that people in age categories 60 and above account for most of the new daily deaths due to COVID-19 (even if we came back down from a peak at about 40 deaths in 80+ at the end of April).
The simpler section at the latest date for which death data by age is available (i.e. today, July 9th, 2020) also shows this curve highly skewed towards older age groups (at the bottom, compare that to cumulative cases, on top):
The two graphs below confirm that people in old age are at much higher risk of death due to COVID-19. On top, if we relate the deaths in each age group to that group’s actual population in Maryland, we also see that deaths disproportionately affect the 80+ age group, reaching a COVID-19-specific mortality rate of 629 per 100,000 pop.!!! The table under the graph gives all the data points.
And when we look at the relative importance of each age group compared to the total number of deaths, we see again that people aged 80+ have 46% of all deaths, followed by people 70-79 (25%) and people 60-69 (16%).
COVID-19-specific mortality rate, by age group, in Maryland, on July 9th, 2020
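The share of each age group in the total is computed the same way whatever the numbers are. Here is a minimal sketch with invented death counts, chosen only so the shares come out round; they are not the Maryland figures.

```python
# Hypothetical death counts by age group, for illustration only.
deaths_by_age = {"0-59": 20, "60-69": 30, "70-79": 50, "80+": 100}


def shares_of_total(counts):
    """Return each group's share of the total, in percent."""
    total = sum(counts.values())
    return {group: 100 * count / total for group, count in counts.items()}


shares = shares_of_total(deaths_by_age)

# Print groups from the largest share to the smallest.
for group, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{group}: {share:.0f}% of all deaths")
```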
As opposed to cases by age, we don’t see here any shift in the most affected age group: the older someone is, the higher the risk of dying from COVID-19 (and part of the problem is the close living conditions in nursing homes). There aren’t 1,000 solutions to protect them: wear a mask and practice physical distancing, especially when there is a risk to meet elderly people and transmit the disease to them!
We recently heard in the US media that, if COVID-19 affected more the older population, beginning of 2020, the younger population was now more affected, especially young adults (various reasons were mentioned: the various academic breaks, being more active or “forced” to work, the sentiment of invincibility …). I wanted to see if one could see a similar trend in Maryland.
If you look at the cross-section of the Maryland population by age (graph below), as of today (July 9, 2020), you see that cumulatively, people 30-39 have the most cases, followed by people aged 40-49, 50-59 and 20-29 years old. There are relatively few cases above 70 years old and even fewer cases below 20 years old.
This snapshot doesn’t show a trend we indeed saw in the past few weeks. In the chart below, representing the cumulative cases by age categories, one can see a faster increase of cases in 20-29 years old (than the increase in, let’s say, 40-49 years old) – since mid-May. This fast increase is such that one could predict that 20-29 years old will soon have more cases than 40-49 years old and become the 3rd age group with most cases.
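A back-of-the-envelope version of that prediction: if both groups keep adding cases at their recent daily pace, the time to crossover is the current gap divided by the difference in pace. The sketch below makes that constant-pace assumption explicit; the counts and paces are invented and the function name is mine.

```python
import math


def days_until_overtake(current_a, daily_a, current_b, daily_b):
    """Days until group A's cumulative count reaches group B's.

    Assumes both groups keep adding cases at a constant daily pace.
    Returns 0 if A is already ahead (or level) and None if A is not catching up.
    """
    gap = current_b - current_a
    if gap <= 0:
        return 0
    closing_speed = daily_a - daily_b
    if closing_speed <= 0:
        return None
    return math.ceil(gap / closing_speed)


# Hypothetical numbers: the younger group is 1,000 cases behind
# but adds 50 more cases per day than the older group.
estimate = days_until_overtake(current_a=10_000, daily_a=150,
                               current_b=11_000, daily_b=100)
print("Estimated days until overtake:", estimate)
```

Daily paces change from week to week, so this kind of estimate is only a rough indication, not a forecast.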
Two other age groups also saw their number of new cases accelerate, at a lower rate than 20-29 but still: children (both groups below 20 years old) seem to catch up with the older groups (both groups above 70 years old). This needs to be watched and, ideally, prevented!
Note the bottom graph shows the number of daily new cases. Although it’s messy, we can see that all age groups are now adding fewer cases than in May but the middle-aged groups (20-59) still add more cases every day than the younger (< 20) or older (> 70) ones. I could smooth it with a 7- or 14-day average but then we wouldn’t see new trends emerge.
The direct impact of COVID-19 cases on each age category can be better grasped in the next chart, where the evolution of cases is again displayed but this time relative to the respective population in each age category. These populations by age were found from a projection from 2018, for 2020 by the Maryland Department of Planning. This demographic spread is a bit odd because all age groups below 70 years old are between 700k and 800k (I would have expected more a bell/Gaussian distribution):
Age group (years old)
Projected total population by 2020
Age pyramid of Maryland, projection from 2018 for the year 2020 From the Maryland Department of Planning, August 2018 / OpenData Maryland
In the top chart, below, one can see the evolution of cumulative cases relative to the total number of people (sick and healthy) in each age category (for instance: how many cases in 70-79 years old relative to 100,000 individuals in this age category). Because of the relatively constant number of people in each age category (see table above), we find approximately the same mix of curves. However, we should first note the high toll on people 80+, who have the highest number of cases per 100,000. We should also note the fast increase of the 20-29 years old population: they were just above the under-20 groups at the beginning of the pandemic; they are now the 4th age group in relative cases. The table below indicates the relative cases for yesterday (July 8, 2020):
Age group (years old)
Relative COVID-19 cases (cases / 100,000 pop.)
Cumulative number of COVID-19 cases relative to population, by age group, in Maryland, on July 8th, 2020.
Another way to look at it is to see the relative importance of each age group compared to the total number of cases. This is done in the last chart, above. We can see that around mid-April, COVID-19 cases in adults 80+ “carved” their share of the number of cases. Starting in May, the share of COVID-19 cases in children below 20 also started to increase (from 1.9% on March 29 to 8.5% on July 8). Despite this, 20-29 increased their share of cases (from 13.3% on March 29 to 15.1% on July 8); 30-39 also increased their share of cases (from 16.3% on March 29 to 18.7% on July 8).
All this indicates a shift in new cases, with more and more new cases being discovered in the young adult population. This can be due to a number of factors … The first one is probably that tests were no longer restricted (or became widely available, without restriction) from mid-May: this would have allowed younger people to be tested and therefore would have increased their share of cases. Another parameter could be that younger adults are still in the workforce and therefore more exposed and more often exposed than older adults. A last parameter could also be that some younger adults may care less about their health, may be less willing to follow state and federal rules, may be composed of more Hispanics or African-Americans – two populations specifically at risk for COVID-19 … Nevertheless, this increase / these populations should be watched carefully and reminded that they are also at risk from COVID-19 (maybe fewer deaths – that’s for a follow-up post – but the disease itself and its long-term consequences).
Since the beginning of the COVID-19 pandemic, we suspected and saw that nursing homes and other facilities where people are grouped together (prisons, …) could be at higher risk of transmission. The focus on nursing homes was because deaths seem to disproportionately affect the older population that also resides there. And nursing homes are also home for frail people with comorbidities.
Besides the weekly update (contrasting with the daily update on the main dashboard), the strange thing is that curves are going down! If it were a true cumulative curve, it would either keep growing (new cases are added) or go flat at the level it reached (no new case, we keep the total from the last day or week).
Then you read the note below the dashboard (before the tables) and it says:
Facilities listed above report at least one confirmed case of COVID-19 as of the current reporting period. Facilities are removed from the list when health officials determine 14 days have passed with no new cases and no tests pending.
I could imagine that the reason is pragmatic: somewhere, someone stops adding cases (or deaths) if the facility doesn’t send a new case (or new death) count for 14 days. But it doesn’t make sense to actively remove the facility from the list and therefore remove the cases (or deaths) that were reported earlier. Especially if the dashboard misleads viewers by labeling the y-axis “Total # of Cases”:
The article quotes the Department of Health mentioning that the other data presented is cumulative but I couldn’t find this … Indeed all datasets available include the same caveat that facilities not reporting within 14 days are removed:
If I take an example among the first few facilities that reported cases, we clearly see that this one (whichever it is, it doesn’t matter here) reported cases up to June 10. Since I’m writing this on June 25, more than 14 days have passed since they stopped reporting, so the dataset doesn’t include this facility anymore (the latest data points in the dataset are for June 24):
This is a pity because, besides the difference between residents and staff, these datasets also present cases and deaths among youth and inmates. It would have been nice to understand the evolution of the burden of COVID-19 in these populations. But the curve is clearly not cumulative, as we can see on the charts below: after about June 2nd-10th, curves going down probably indicate the removal of facilities from the total count.
As mentioned in the Baltimore Sun article, with this kind of reporting, you cannot know the real toll in nursing homes, prisons and other congregate facility settings and therefore you cannot respond to it appropriately (i.e. the toll is now underestimated).
Also, you can’t put things in perspective because you can’t have a reliable proportion of cases in congregate facility settings compared to the total number of COVID-19 cases in Maryland. This total number of cases is cumulative and we see an artificial decrease in % of cases in these facilities, as illustrated below:
Now, what can we do? One clear solution is for the Maryland Department of Health to change its reporting and really report the correct cumulative number of cases in congregate facility settings. Besides that, I have a technical solution in mind but I had no time today to code it yet …
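One possible technical workaround (an illustration of the general idea, not necessarily the solution alluded to above) is to keep the last known count of every facility that disappears from the list instead of letting it drop to zero. The sketch below assumes that each facility's counts are cumulative while it is listed, and that archived copies of the weekly datasets are available; the facility names, dates and numbers are invented.

```python
def rebuild_totals(reports, dates):
    """Compare the published total with a total that carries forward dropped facilities.

    `reports` maps facility -> {date: cumulative cases}; a date missing for a
    facility means the facility was no longer on the list that week.
    Returns two lists aligned with `dates`: (published, carried_forward).
    """
    published, carried_forward = [], []
    last_seen = {}  # latest known cumulative count for each facility
    for day in dates:
        published_total = 0
        for facility, series in reports.items():
            if day in series:
                last_seen[facility] = series[day]
                published_total += series[day]
        published.append(published_total)
        carried_forward.append(sum(last_seen.values()))
    return published, carried_forward


# Hypothetical weekly data: Facility A is removed from the list after June 10.
dates = ["2020-06-03", "2020-06-10", "2020-06-17", "2020-06-24"]
reports = {
    "Facility A": {"2020-06-03": 10, "2020-06-10": 12},
    "Facility B": {"2020-06-03": 5, "2020-06-10": 8,
                   "2020-06-17": 9, "2020-06-24": 11},
}

published, carried_forward = rebuild_totals(reports, dates)
print("As published:   ", published)
print("Carried forward:", carried_forward)
```

The published total drops when Facility A is removed, while the carried-forward total never decreases, which is what one would expect from a cumulative count.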
Post-scriptum on June 26, 2020: the day after I posted this, Maryland Governor Larry Hogan announced a safe and phased reopening plan for Maryland’s assisted living facilities. Although I welcome any initiative targeting the protection of everyone and especially the most vulnerable populations, the first 2 prerequisites are still tied to this absence of new cases in 14 days (which is fine) – this is still not a reason to intentionally remove facilities from the count. And I couldn’t see the phased approach – but I guess this will be followed up in another post here. To be continued …