Author: Jean-Etienne Poirrier

Notes en passant: how AI could unlearn in HEOR Modelling

In a recent paper, Tinglong Dai, Risa Wolf, and Haiyang Yang wrote about unlearning in Medical AI.

With more and more CPU and storage thrown at Large Language Models (LLMs) and Generative Artificial Intelligence (GenAI) in general, the capacity to “memorise” information grows larger and larger with each generation of LLM. This is further accelerated by the capacity to add specific details to generalist LLMs using Retrieval Augmented Generation (RAG) and agents (e.g., with the ability to query real-world systems at the interface with the physical world).

LLMs are learning more, but what about unlearning? Dai and colleagues didn’t evoke the analogy with human memory: our capacity to learn more relies, in part, on our capacity to forget, to reorganise, to summarise, and to prioritise the learnt information. Sleep and stress play a role in this reorganisation of information; this was the overarching topic of my Ph.D. thesis [link]. I will de-prioritise the visual cues along the path leading to a bakery if I no longer go to this bakery (“unlearning”). However, practising navigation to the bakery improved this skill, and this improvement will serve me later when I need to go to another place (something I could call “secondary learning”). It may seem we diverge from AI, but Dai and colleagues actually start their paper with the EU GDPR possibility for a patient to remove their data from a database, wondering how this is technically possible with LLMs (where data is not structured like in a traditional relational database and where the way data is retrieved is often unknown).

The “unlearning” process in LLMs can be considered at three nested levels: algorithmic, legal, and ethical.


What to look for at ISPOR25 – Artificial Intelligence

After Modelling and Regulations & Pricing, and just a few days before ISPOR25, here is my take on the potentially interesting sessions on Artificial Intelligence (AI, which generally means: the use of Generative AI, or GenAI, in HEOR).

First, Sven Klijn, William Rawlinson, and Tim Reason are again offering their introductory course on Applied Generative AI for HEOR. Last year, I followed it in Barcelona, and it was nice. In my opinion, “nice” means that although I didn’t learn much more than previous presentations by the authors and my own experience, it was a great course for beginners because it struck the right balance between theory (which too many sessions end up only covering) and practical examples. Don’t expect hands-on exercises (that would be too long, and the course synopsis doesn’t mention that either). But “nice” to me means that the presenters dared to show actual working code, with all the humility that it implies. This year, they mention they’ll cover Retrieval-Augmented Generation (RAG) and agents. Hopefully, their coverage of these aspects will be as good as last year’s on the other topics.

Note that there is another course on AI and its use in Real-World Evidence (RWE) Research. I never attended this one, but I hope the instructors will give the audience practical instructions, independent of the AI tool their company is selling.

ISPOR’s key areas of focus for AI in HEOR (source)

Now on to the sessions! After several ISPOR sessions filled with hype from AI-enthusiasts and AI-deniers, we are slowly coming to some “âge de raison.” However, GenAI is still relatively new, and sessions reflect the need to cater to all audiences.

For the beginners (in AI), a few sessions will introduce GenAI and its use in HEOR. Even if hidden (intentionally or not), GenAI relies on prompt and prompt engineering; one session will present an overview of this technique. A second session will present an overview of progress and challenges brought by GenAI.

For the more advanced AI users, most sessions will talk about the newest tools. One session will talk about reliability in LLMs and prompting (as a side note, I will be interested in the Aide solutions that have been teased for some time now). Judging by another session’s title, advances in GenAI should be presented; however, the abstract lacks the latest trends, like agents and functions. For these, one should probably attend the session specifically on agents or this other session on RAG (both with one of the same presenters).

Another approach is from the perspective of AI applications. Literature reviews and AI will be covered in two sessions (AI-Assisted Literature Reviews: Requirements and Advances and Leveraging Automated Tools for Literature Reviews in Health Economics and Outcomes Research: Opportunities, Challenges, and Best Practices). RWD/RWE will be covered in three sessions: Identifying Gaps and Establishing a Development Plan for Consensus Real-World Data Standards and two commercial sessions (054 and 048; disclaimer: this last session is from my current employer). Health Preferences and AI have their own sessions, as do rare diseases and AI (with no fewer than three pharma companies as presenters!).

Finally, the most practical sessions, IMHO, will be the Research Podiums, as they should marry the technological approaches with the domain approaches. Interestingly, the first of these sessions, The Power and Pitfalls of AI in Health Data Analysis, only presents posters using NLP and Machine Learning (i.e. no GenAI per se). The second session, AI-Assisted Literature Reviews: Requirements and Advances, is focused on literature reviews. This year, it looks like there will be no sessions specifically focused on Modelling; my opinion is that either no significant progress was made (compared to previous ISPOR conferences) or this progress is now kept internally (for pharma’s own use or for consultants’ clients’ use).

Did I miss any important sessions? Do you have another take on sessions at this ISPOR conference or AI in HEOR? Although I enjoy a good quasi-philosophical debate on the good and evil of AI in HEOR, I’m happy to see practical applications being presented and discussed 🙂

What to look for at ISPOR 2025 – Regulations & pricing

ISPOR 2025 is now in about two weeks and in this second post about sessions to look for, I’ll talk about regulations and pricing (this is part of a series: last week, I wrote about Modelling; next week, I will write about AI).

IRA letters on a pile of scientific papers with a view of an American city at night

As the main edition of ISPOR is held in North America, the Inflation Reduction Act (IRA) will naturally attract considerable attention. It will start with the very first Plenary Session, promising to explore the impact of price limits on innovation and provide real-world examples.

Sessions of the “lessons learnt” or “(unintended) consequences” type are in vogue, with sessions like “Year 1 learnings”, “Implications for Providers”, “Did the IRA spook the industry” (from the perspective of rare diseases), “Unintended consequences”, and a discussion “Beyond Drug Negotiation”.

Here is one session to remind you that timing is essential … The IRA was introduced in 2022, under US President Biden, but the first Maximum Fair Prices will be implemented in 2026, under US President Trump. The latter has already introduced sweeping changes in the pharmaceutical research landscape in the US. But I found one interesting issue panel somewhat prophetic: “IRA Under Trump: What Is Next?”. Given the timelines for panel submission for ISPOR (before the 2024 elections), I have so many scenarios in my head about the authors trying to write the abstract broad enough to encompass potential futures while navigating the possible sensitivity of the situation, even before it happens. It will be an interesting panel to attend …

A panel also takes another perspective, wondering if the IRA became the US HTA. From the abstract, I understand the authors’ perspective, but I don’t agree: although the IRA will impact millions of Americans, it still lacks a direct impact on commercial plans. But it’s obviously a broader discussion than a terse comment in a blog post.

JCA letters on a pile of scientific papers with a view of a European countryside in the afternoon

It will also be the first ISPOR conference after the European Joint Clinical Assessment (JCA) process started in Europe. Therefore, it is a bit early to draw conclusions and look at lessons learnt. However, two interesting sessions will examine it from the outside: one will examine the global (i.e. ex-EU) impact of JCA, and another will consider JCA as an enabler of cross-border collaborations.

Finally, because I recently contributed to projects supporting investors with health economics tools and assessments, I will be interested in the Input/Output Modelling panel: it will look at the broader impact of investments in health and pharmaceutical products. Their abstract reminded me of some early work published by a former boss on the societal impact of vaccination (I wonder if vaccines will be mentioned, by the way). The last workshop (the last one mentioned here) is titled “HEOR meets investing”, and it is precisely what we recently did: early health economics modelling can greatly help secure investments by providing reassurance about the potential cost-effectiveness of a drug, and by justifying studies to fill crucial input data gaps.

I missed some panels in this short helicopter view. Do you have any other suggestions?

Next week, I will look at AI’s potential progress in health economics. Stay tuned!

What to look for at ISPOR25 – Modelling

ISPOR25, the annual North American conference for the International Society for Pharmacoeconomics and Outcomes Research, is in three weeks. As usual, I’m planning for it by browsing its program. This time, I decided to share a few of my interests on my blog. ISPOR usually covers many topics, from “hardcore” statistical methods to top-level overviews of some issues, so I will focus on only a few topics. Feel free to connect with me if you want to discuss anything at or around the conference (or virtually). (And before we start, full disclaimer: I’m currently working for Parexel, but opinions shared here are only mine; otherwise, I would have written them on the company blog.)

Notes from Jep preparing the ISPOR25 conference with its program on a laptop screen in the background
As usual, browsing the ISPOR conference program brings pages of potentially interesting topics

This first post will be about my primary interest: HEOR modelling, what input data we use, the impact of broader frameworks and regulations, and how it is used. Stay tuned for the next posts: they will be about higher-level regulations and pricing, and one specific to AI.

Despite a huge number of posters (as usual) and a somewhat inefficient official search engine, we can still find interesting posters by following poster tours. For instance, the HEOR impact case poster tour (027) on Wednesday is presenting two takes on managed-entry agreements: Zemplenyi et al. will look at three outcome-based agreement models for sickle-cell disease while Arcand will argue that the epidemiological approach is still more valid than RWD for the re-evaluation of CAR T-cell therapies in Quebec. The Methodology Research in HEOR Poster Tour (55), the Cost-Effectiveness Evaluation of Medical Therapies (100) and the session on Novel Concepts and Frameworks in Health Economic Evaluations (131) will have some interesting presentations of real uses of (sometimes new) modelling methods.

One aspect of HEOR Modelling that has not yet become mainstream (i.e. required by HTA agencies or taught before basic cost-effectiveness modelling) is the Generalized Cost-Effectiveness Analysis (GCEA) and other frameworks looking at incorporating other elements of value (other than costs and effects, broadly speaking). These frameworks are great and necessary to study the value of a healthcare intervention in a broader perspective than “just” the reimbursement perspective. However, in my opinion, some issues hinder their wider adoption: the lack of wide agreement on their definition (and their usefulness, for a start) and the lack of established methods to collect input data. Two panels will be revisiting this: Challenges in the Implementation of Generalized Cost-Effectiveness Analysis (GCEA): Debating a Path Forward (059) and Global Guidance for Evidence-Based Value Assessment of Innovative Health Technologies: Feasible Reality or Idealistic Dream? (138).

Somewhat related, three other panels will explore specific value elements: cost of inequality, family spillover, and financial risk protection. And one session revisits the societal perspective and one the cost-benefit analysis.

From a tools perspective, one session will explore New Tools Facilitating Health Economics and Outcomes (078). It contains 4 interesting posters:

  1. One will seemingly introduce a benchmark to assess a Large Language Model (LLM) performance at extracting information from models and literature reviews. I wrote “seemingly” because, if the intent is great, the rest of the abstract is not clear about how this system will assess the next model (not only the ones currently contained in the LLM database). And I am a bit doubtful about the use of the number of tokens as a measure of quality. Hopefully, the presentation will clarify these points.
  2. One will present a review of the literature on the use of AI in health economics models. From the abstract, it looks a bit like the first review I presented with my previous boss, last year at WorldEPA 2024. Note I will present some interesting sessions on AI in HEOR in a following post.
  3. Two posters/presentations will show tools to improve efficiencies: a VBA/R automator (I wonder about the sustainability of maintaining this type of program) and the use of metamodels.

This last poster is interesting because we will also present a poster about visual programming and try to convince the audience that this way of programming has many benefits for specific uses of modelling: brainstorming, early modelling, strategy (and introducing newcomers to complex topics in modelling).
Blurred poster from Poirrier et al., to be presented at ISPOR 2025 (hence blurred before the conference)
In a real geek way, it is interesting to note that most of these sessions still rely on MS Excel and venture, from time to time, into R. Our poster introduces TypeScript, but it’s more a side effect (due to the framework used). In addition, our solution can be extended to any programming language, including Python, for instance (a programming language used a lot in data science; apart from the use of LLMs for HEOR, Python is not used very much in our domain).

To end this post, I will also follow with interest the session asking to Flip (to) the Script: Is It Time to Rethink Health Economic Modeling for HTAs? (073). It has been a decade since R was tested for modelling. There are now great packages, video tutorials (example) and advanced training, a fast-growing group and a recently created HTA working group within the R Consortium. Still, modellers are mainly using MS Excel …

I didn’t mention sessions on causal inference, survival analysis, surrogate endpoints, … They are all worth attending. In your opinion, what session(s) did I miss in this brief overview?

Start with a PyPortal in 2021

The Adafruit PyPortal is a great device, with a few bells and whistles already integrated in order to start small electronic projects (but expensive, ok ;-)). As usual, Adafruit wrote a nice introductory guide. But some parts are outdated. Therefore, here are a few steps to get you started with CircuitPython on a PyPortal in 2021 …


COVID-19 cases in Maryland congregate living facilities

Five months ago, I was wondering why Maryland removed COVID-19 cases from its count in congregate living facilities (nursing homes, prisons, …). I still don’t have any answer but I found a technical solution 🙂

The Python script (in src/ in the MD-coronavirus repo on GitHub) just fills in the latest data for days where data is missing. On a side note, it also fixes some basic issues like a reporting date in year “0200” (instead of “2020”). You can play with the fixed data file here.
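To illustrate the idea, here is a minimal sketch of the two fixes described above. It is not the actual script from the repo: the function names, the ISO date format and the counts are assumptions made up for this example.

```python
from datetime import date, timedelta


def fix_report_date(raw: str) -> date:
    """Parse a YYYY-MM-DD string, repairing the year typo "0200" -> "2020"."""
    year, month, day = raw.split("-")
    if year == "0200":
        year = "2020"
    return date(int(year), int(month), int(day))


def fill_missing_days(reports: dict[date, int]) -> dict[date, int]:
    """Carry the latest reported count forward over days without a report."""
    if not reports:
        return {}
    filled = {}
    current = min(reports)
    last_count = reports[current]
    while current <= max(reports):
        # Use the reported value if there is one, else repeat the latest one.
        last_count = reports.get(current, last_count)
        filled[current] = last_count
        current += timedelta(days=1)
    return filled


# Hypothetical rows (date, cumulative cases) for one facility.
raw_rows = [("2020-06-10", 9), ("0200-06-11", 9), ("2020-06-14", 11)]
reports = {fix_report_date(day): cases for day, cases in raw_rows}
filled = fill_missing_days(reports)
```

With these made-up rows, June 12 and 13 receive the count of June 11, so a cumulative sum over all facilities no longer drops when one facility stops reporting for a while.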

To take the same example again, below is the graph of the number of cases in Sterling Care Frostburg Village according to the official data file (“GH” means “group housing”). Between mid-June and mid-September, there is no data point. Therefore, it’s impossible to calculate a cumulative number of cases in all congregate living facilities. You can see in the old post that the cumulative curve is actually going down after June.

On the fixed version below, you can see data points added between mid-June and mid-September:

Note also that the MDH could have reset the count of cases between periods of 14+ days without reporting. Fortunately, it didn’t do that and you can see the facility re-appears in the file, mid-September, with 11 cases (or 2 more than in June), instead of just 2 cases in residents.

This version now makes it possible to correctly display the cumulative count of COVID-19 cases in congregate living facilities:

We can see that, during the first wave, in May 2020, the number of cases increased a lot, especially among residents of nursing homes. Then the curves increased at a slower pace. Since the beginning, nursing homes have accounted for the bulk of cases in congregate living facilities. But the increase in cases is happening in all facilities.

There are still some issues to be solved. For instance, some facilities seem coded under different names. Our example above is coded in 2 different ways (and I need time to go through the 200+ facilities in the list):

  • Sterling Care – Frostburg Village
  • Sterling Care Frostburg Village

For a human, they are clearly the same facility. For a computer script, it still needs to be told so. And talking about computer script, this one still needs to be cleaned …
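One possible way to tell the script, sketched below under assumptions: build a comparison key that ignores dashes, extra spaces and letter case. The function name is hypothetical, and this is not what the current script does; names that differ in other ways (abbreviations, typos) would still need a manual mapping.

```python
import re


def normalise_facility_name(name: str) -> str:
    """Return a comparison key: dashes removed, spaces collapsed, lower case."""
    # Replace ASCII hyphens and Unicode dashes (U+2010 to U+2015) with a space.
    key = re.sub(r"[\u2010-\u2015-]", " ", name)
    # Collapse the runs of whitespace left behind.
    key = re.sub(r"\s+", " ", key)
    return key.strip().lower()


names = ["Sterling Care – Frostburg Village", "Sterling Care Frostburg Village"]
keys = {normalise_facility_name(name) for name in names}
```

Here both spellings end up with the same key, so their counts can be grouped together before plotting.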

To be continued …

As usual, you’ll find other graphs on my page about COVID-19 in Maryland (and figures above are updated with new data as they appear) and the data, code and figures are on Github (including these ones).

COVID-19 cases in Wallonia schools

In Wallonia (Southern part of Belgium), universities are already back to only giving online classes, schools will be closed two additional days after the Autumn holidays (so November 2-11), and secondary schools (12-18 years-old children) will be virtual for the 3 days before the Autumn holidays (so October 28-30). The reason? The exploding number of COVID-19 cases in schools.

In Wallonia, education is in the hands of the French-speaking Community (along with Brussels) but its statistics department doesn’t seem to provide public data on COVID-19. For that, we have to look at ONE (roughly: “Office for births and infancy”), which communicates weekly numbers of cases and quarantines in children in schools via press releases (forcing us to parse PDFs but it’s better than no data).

So far, the students in secondary school (12-18 years old) are the worst hit with a total of 6,258 positive cases since September 2020 (I’m writing this on October 27), followed by teachers and other personnel (total: 2,497 positive cases).

Is it a lot? Consider this: for the week ending on October 18, incidence in primary school (6-11 years old) is 365 / 100,000, incidence in secondary school (12-17 years old) is 1,117 / 100,000 while the average incidence over the last 14 days in the whole Belgian population is 1,289 / 100,000 (epidemiological bulletin of Oct. 26). Adolescents are therefore a driver of the incidence.
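For readers who want to redo this kind of figure, an incidence per 100,000 is simply the number of new cases over a period divided by the population at risk, times 100,000. A minimal sketch follows; the function name and the numbers are made up for the example and are not the ONE or Sciensano data.

```python
def incidence_per_100k(new_cases: int, population: int) -> float:
    """New cases over a given period, expressed per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases * 100_000 / population


# Hypothetical example: 73 new cases in one week among 20,000 pupils.
weekly_incidence = incidence_per_100k(73, 20_000)
```

Keep in mind that the period matters as much as the population: the school figures quoted above are weekly, while the national figure covers 14 days.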

But one can see on the charts below that new cases are growing exponentially in all age categories:

Unfortunately, when you read the press releases, you realize that these numbers are minima. Indeed, the situation is actually worse but there are several reasons why numbers are not completely reported:

  • Health services in schools are not staffed to face a pandemic; they were not prepared and now some personnel have also caught the virus.
  • As a consequence, data has not been completely transmitted to ONE since mid-October (it’s apparently worse for quarantine data, not shown here: at least 21% of cases don’t have data associated with potential follow-up quarantine in the last (7th) report).
  • Since October 1st, protocols (quarantine decreased to 7 days, definitions of close contact, etc.) have changed.
  • Children below 6 years are tested only exceptionally.
  • Children between 6 and 12 years (primary school) are tested only if they meet some conditions (symptoms, contacts in the family, or if 2 cases in the class).
  • It seems there are issues with reporting in students 18+ (“écoles supérieures“).
  • Universities are not reported in this count.
  • For adults (here: 18+ students, teachers and personnel), Belgium is back at testing only symptomatic patients since October 19, 2020.

So the additional days of holidays and the few additional days of virtual school for secondary students are meant to try to break transmission of COVID-19 in schools.

Talking about transmission, it seems there is some exploration of the sources of infection in the ONE reports. It is not reported systematically nor in a consistent way, but the source of infection for reported cases is the school (close contact with a student, a teacher or another staff member) in 16-20% of cases.

I really hope these extended holidays will reduce transmission. It seems that the younger a child is, the fewer symptoms he/she will display; it therefore seems OK for them to get the disease. But children remain important transmission vectors, and we don’t want them to transmit the disease to more vulnerable groups of the population, like grandparents but also adults and children with co-morbidities or immune diseases. Let’s not add a COVID-19 burden to the usual diseases associated with winter (like flu).

To be continued …

As usual, you’ll find other graphs on my page about COVID-19 in Belgium (and figures above are updated with new data as they appear) and the data, code and figures are on Github (including the AVIQ one in this post).

COVID-19 clusters in Belgium

Recently (I’m writing this on October 20), the (new) Belgian government decided to apply more stringent prophylaxis measures to contain COVID-19. One of the controversial measures is to close bars and restaurants for a month.

Unfortunately, in a way, at approximately the same time, AVIQ released its latest poll on COVID-19 clusters in Wallonia (AVIQ is the Walloon agency for well-being, health, handicap and family). I wrote it was unfortunate because I read and heard several people who criticized the closing of bars and restaurants by citing this poll. But this poll cannot answer in favor or against this closure; it doesn’t look at that …

Here are the results:

From the meager press release, here is what we can reconstruct … AVIQ looked at the 5,043 COVID-19 clusters in Wallonia so far and went to interview one or several patients from these clusters (AVIQ defines a cluster as a place where there are 2 or more confirmed COVID-19 cases). The question was, more or less, where did you go before getting COVID-19? (in French: “collectivités que les personnes covid-19 positives ont déclaré avoir fréquentées“).

From there, nearly 84% of clusters were families, far ahead of schools (4%), companies/bars/restaurants (3%), and other places (note schools are still open in Belgium, except universities starting today).

First, bars and restaurants are amalgamated with companies (where home working was encouraged). One cannot easily disentangle them, unfortunately. Then all places are linked and the virus didn’t suddenly appear in the family – but one is more inclined to remember it’s in the family because it is close to dear people (spouse, children, parents, …). Also, there is the potential recollection bias (a classical limitation of interviews), interviewees willing to please the interviewer or simply not willing to disclose behaviors that may be frowned upon. A recent example of this was when the previous Belgian Prime Minister announced she was positive:

This tweet was quickly put in perspective with a plenary meeting of Mrs Wilmès’s party where the recommended precautions were not all followed:

Well, back to our clusters … My last point about this AVIQ poll is that, unfortunately, there are no more details than this. We don’t know much about the methodology, it was minimally put in context, and there was little caution against wild interpretations (just a “[Ces données] restent toutefois parcellaires compte tenu de ce qu’elles sont déclaratives et tributaires des délais de testing”, i.e. “[These data] nevertheless remain fragmentary, given that they are self-reported and depend on testing delays”).

On the other side of Belgium, Zorg en Gezondheid (~AVIQ in the Flemish Region) did a similar poll but gave a bit more details about how they did it and provided more explanations in the results. For instance, they started by asking the index patient where he/she thinks he/she was contaminated: in the chart below, most patients didn’t know (“onbekend” – at least it was an option) but family (“gezin“) and workplace (“werk“) are respectively second and third in the places where they think they most likely got infected (but quite behind “unknown”).

What is interesting is that Zorg en Gezondheid then asked which social places these patients had visited before self-isolating. And then we see (below) that most mention bars (“cafés”), restaurants, sports and only then the rather vague “public activities”. It is striking to note that none of these activities are related to school (maybe they only interviewed adults?).

And again, as was mentioned elsewhere, these are interesting results but they don’t show the contagiousness or risk of contamination of these places.

For that, you’ll need serious tracing studies following known outbreaks. But that’s another story …

To be continued …

As usual, you’ll find other graphs on my page about COVID-19 in Belgium (and figures above are updated with new data as they appear) and the data, code and figures are on Github (including the AVIQ one in this post).

A third of Maryland counties tested more than 25% of residents

Sometimes, you think that you found something interesting but the Maryland Department of Health is already presenting it on its COVID-19 dashboard 😀

For instance, I calculated the percentage of residents of the different counties ever tested (regardless of the test result). I found out that a third of Maryland counties (8/24) have tested more than 25% of their residents at least once. Indeed, as of yesterday (August 10), here are the counties in that category:

County (alphabetical order) | % of population ever tested
Baltimore | 25.8%
Baltimore City | 30.2%
Dorchester | 30.4%
Kent | 30.6%
Somerset | 30.8%
Talbot | 28.6%
Washington | 27.7%
Wicomico | 25%
Maryland counties with more than 25% of their population tested for COVID-19 on August 10, 2020

While we are at it, here are the 5 counties with less than 20% of their population tested (still as of August 10, 2020):

County (alphabetical order) | % of population ever tested
Calvert | 14.9%
Cecil | 15.3%
Charles | 18.6%
Harford | 18.1%
Queen Anne’s | 19.4%
Maryland counties with less than 20% of their population tested for COVID-19 on August 10, 2020
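The calculation behind both tables is a simple ratio. Here is a minimal sketch, assuming we have, for each county, the number of people ever tested and a population estimate. The county names and counts below are placeholders, not the MDH or Maryland Department of Planning figures.

```python
def percent_ever_tested(people_tested: int, population: int) -> float:
    """Share of residents tested at least once, in percent (one decimal)."""
    return round(100 * people_tested / population, 1)


# Hypothetical (people tested, population) pairs, for illustration only.
counties = {
    "County A": (30_200, 100_000),
    "County B": (14_900, 100_000),
    "County C": (22_000, 100_000),
}

shares = {
    name: percent_ever_tested(tested, population)
    for name, (tested, population) in counties.items()
}
above_25 = sorted(name for name, share in shares.items() if share > 25)
below_20 = sorted(name for name, share in shares.items() if share < 20)
```

The choice of the population source changes the denominator, so small differences with other dashboards are to be expected.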

Graphically, we see that all counties are testing more and more, and increasing at approximately the same speed:

Evolution of COVID-19 tests in Maryland Counties, as of August 11, 2020

As you can see, there are 2 minor issues with the dataset from the MDH API. First, Somerset reported more than double the normal number of tests on June 18, 2020; it went back to “normal” on the next day (I suspect an encoding error here, see highlight below). Then, there is no data after July 7; data resumes on July 13 (a posteriori, I don’t recall reading any issue about county data collection during that time). None of these prevents looking at the current data.
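Both issues can be spotted automatically. Below is a minimal sketch under assumptions: the function names, the “more than double the previous value, then back down” rule and the sample values are mine, not something provided by the MDH API.

```python
from datetime import date


def find_gaps(days: list[date]) -> list[tuple[date, date]]:
    """Return (last day before the gap, first day after the gap) pairs."""
    ordered = sorted(days)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if (b - a).days > 1]


def find_spikes(series: list[tuple[date, int]], factor: float = 2.0) -> list[date]:
    """Return days whose value exceeds `factor` times the previous value
    and is followed by a lower value (a one-day jump that does not last)."""
    ordered = sorted(series)
    spikes = []
    for (_, previous), (day, value), (_, following) in zip(
        ordered, ordered[1:], ordered[2:]
    ):
        if value > factor * previous and following < value:
            spikes.append(day)
    return spikes


# Hypothetical values, for illustration only.
spike_series = [
    (date(2020, 6, 17), 100),
    (date(2020, 6, 18), 250),
    (date(2020, 6, 19), 105),
]
gap_days = [date(2020, 7, 6), date(2020, 7, 7), date(2020, 7, 13), date(2020, 7, 14)]

spikes = find_spikes(spike_series)
gaps = find_gaps(gap_days)
```

Flagging these points is enough here; whether to correct or drop them is a separate decision.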

Evolution of COVID-19 tests in Somerset county, as of August 11, 2020

Now, as I mentioned, the official dashboard already has this data, presented by quartile, as a kind of competition between counties 😉 … (the % are slightly different, probably because we are using different sources for the population totals – I’m using the population projections from the Maryland Department of Planning).

To be continued …

As usual, you’ll find other graphs on my page about COVID-19 in Maryland (and figures above are updated with new data as they appear) and the data, code and figures are on Github (including these ones).

COVID-19 hospitalization by age in Maryland

Since mid-July 2020 in Maryland, we understood that the 20-59 yr age group was problematic, especially the 20-29 yr age group that is racing to overtake all age groups in terms of number of COVID-19 cases (relative to their population, see top chart below).

In terms of COVID-19 hospitalizations, we also saw a small rebound (see chart below; it seems to have been subsiding since the beginning of August).

But what we didn’t know (for this small peak as well as since the beginning) was the age of these hospitalized populations. Did these hospitalizations affect older adults more? Younger ones? Or children? The Maryland Department of Health COVID-19 dashboard doesn’t report that information (nor does the API).

Despite the recent issue about switching hospitalization reporting from CDC to HHS, it seems that CDC is still reporting hospitalization data at COVID-NET (Coronavirus Disease 2019-Associated Hospitalization Surveillance Network), at least until the end of July. There, it is interesting to note that Maryland is the only state whose reporting represents 100% of the population (24 counties) – that’s good!

Screenshot of COVID-NET method description showing that 100% of the Maryland population is represented

Now, the CDC also has an interactive graph where you can see and filter the data by yourself. Here is the situation up to August 9, 2020, for Maryland:

The peak of April-May is well represented, with the 85+ population reaching a peak at nearly 100 weekly hospitalizations per 100,000 pop. All the other age groups increased during that time, the older the higher (unfortunately).

Now, since July, we see some of these age groups increase again. At the end of July:

Age group | Weekly hospitalization rate (per 100,000 population)
65-74 yr | 16.0
75-84 yr | 21.6
85+ yr | 17.6
Weekly hospitalization rates for the week of July 27, 2020 in Maryland, MD, USA

This, in my opinion, reinforces the view that cases might be increasing in the younger population (also thanks to testing being more available) and that children and young adults might be less impacted when infected. But the older population is the first impacted by any increase in cases. It was true in April-May. It is again the case with this small peak. If we should take preventative measures to contain COVID-19, it is for us – but especially for the older population, our parents.

To be continued …

As usual, you’ll find other graphs on my page about COVID-19 in Maryland (and figures above are updated with new data as they appear) and the data, code and figures are on Github (including these ones).