Are we using data in the best way to manage the COVID-19 Pandemic?

We are living through the largest, most dramatic and most unexpected social experiment ever, as governments launch a variety of “social distancing” measures to slow down the coronavirus contagion. The use of AI and Big Data is also being tested as never before: China, Taiwan and South Korea are using massive data collection and smartphones for “track and trace” systems. In Europe nothing similar has been launched yet, while quantitative models to forecast the spread of the virus are multiplying. In Italy, where the pandemic is raging, there is a heated debate on whether to follow the Asian example. Nobody has all the answers.

The BDVA community wants to open a discussion to share experiences and evidence about the best way to use data and AI to manage the pandemic, and to help identify the best practices to recommend and the best ideas to share. In this webinar we start the discussion with a few expert speakers reporting on the Asian and ongoing European experiences. We also call on all interested members of the community to participate and share other experiences and ideas, so that we can learn from each other.

Date/Time: March 26th, 14h CET

Registration: here (closed)

Speakers:

Main organisers: Gabriella Cattaneo (IDC, TF2.Impact lead), Tiblets Demewez (Philips, TF7.Healthcare lead), Ana Garcia (BDVA Secretary General)

Video recording of the session 

Q&A from the session:

Q. How many meters should be the criterion for proximity?

[LF] Epidemiology does not actually have a proper scientific answer to this question for SARS-CoV-2. 1-2 meters is regarded as the riskiest distance, which is within a good range for Bluetooth-based technologies, but may be an issue for GPS. We expect data retrieved from the app to be useful to recalibrate models and answer many questions, including this one.

Q. Why are data-driven solutions used mainly at national level with the obvious limitations to solve a problem that is global?

[LF] AFAIK, some coordination at European level exists, but many national initiatives are currently proceeding faster than the political process needed for coordination allows. In the end, any solution will need to be coordinated at supra-national scale.

Q. Most mobile phones already have a health related app installed. All that needs to be done is an additional functionality added to the already installed apps.

[LF] It would be possible with some time to prepare, but not in the middle of an epidemic. Also, the app should be under exclusive control of national health services, so there are additional hurdles.

Q. The mobile app idea lacks practicality. In a pandemic as we have it now, reliable test kits will not be available, and they will not be available in numbers.

[LF] In principle, test kits are not needed for a "smart lockdown" where the app can be triggered by self-referrals (filtered by the health system). The app could be deployed tomorrow with this functionality, and would still represent a significant improvement over the current lockdown in many countries, despite its limitations. This scenario is being considered. Also, testing capabilities are being dramatically scaled up as we speak, with the aim of testing more than 1 in every 1,000 residents per day. That could be enough even in the current phase of the epidemic.

Q. There is a recurrent requirement that has been brought up in most of the presentations: The importance of the quality of the data. When datasets are collected with plenty of time, it is easier to analyse them and do our best to clean them or evaluate the impact of low-quality data. However, the coronavirus started spreading not many months ago. How do we detect these low-quality data or measure their impact?

[GC] I am not a statistician, but improving the quality of data is a systemic issue. The first step is to improve data collection, making sure that at every step of the testing-tracing-tracking-curing process the data are collected and sent to a comprehensive central database. There must be coordination between multiple sources, and the countries that leverage collaboration between citizens and institutions deal with this better. In my opinion, using smartphone apps is a key step to improve the collection of data. Once healthcare systems are overwhelmed (as is the case now in Lombardy), it is very difficult to collect the right type of data. For example, in Lombardy testing capacity is insufficient, so the baseline number of contagious individuals vastly underestimates reality.

[PG] Very good question. An almost perfect model requires many data points. However, we cannot wait that long: decisions must be taken. Our PAR model can quickly learn the shape of the curve. It is initially unstable but then, as a reasonable number of data points come in (more than 30), it stabilises.

Q. A possible solution is to train predictive models to predict the missing fields from the other variables that are present, and then fill each missing field with the predicted value. If little or no data is available to train those supporting models, we can rely on expert knowledge to create the expected conditional distributions of the missing data. Other models handle missing data by including additional control variables. In any case, the accuracy must be compared. [To GC]

[GC] Yes, there are several methods to estimate missing data points or fields but with a completely new disease applying expertise from previous pandemics is risky.
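
The imputation idea raised in this question can be sketched in a few lines. This is only a minimal illustration, assuming a single numeric predictor and a straight-line relationship; the field names (`age`, `days_in_hospital`), the helper names and the records are all hypothetical and made up for the example, not taken from any dataset discussed in the session.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_missing(records, predictor, target):
    """Return copies of the records with missing targets filled by the fitted line."""
    # Train only on the rows where the target was actually observed.
    complete = [r for r in records if r[target] is not None]
    intercept, slope = fit_line(
        [r[predictor] for r in complete],
        [r[target] for r in complete],
    )
    filled = []
    for r in records:
        row = dict(r)
        # Flag imputed values so they can be told apart from observed ones.
        row["imputed"] = r[target] is None
        if row["imputed"]:
            row[target] = intercept + slope * r[predictor]
        filled.append(row)
    return filled


# Made-up illustrative records, not real data.
records = [
    {"age": 30, "days_in_hospital": 4.0},
    {"age": 50, "days_in_hospital": 8.0},
    {"age": 70, "days_in_hospital": 12.0},
    {"age": 60, "days_in_hospital": None},
]
filled = impute_missing(records, "age", "days_in_hospital")
```

Keeping the `imputed` flag makes it possible to compare model accuracy with and without the filled-in values, which is the comparison the question asks for.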

Q. These models focus on the infection as a whole, and in particular on confirmed cases. How can we use them to understand demand for ICU admissions and lethality vs. mortality rates? I would expect those to be important insights for resource planning.

[PG] Our PAR model can similarly be applied to the prediction of ICU admissions and mortality rates, as long as daily data are available. It can also be applied to different countries, regions and time periods.

Q. How invasive is your data collection on infected or potentially infected people? Thanks.

[PG] The PAR model just uses publicly available data (from the World Health Organisation)

Q. Are there public datasets with health data from any country? (Age, gender, etc) 

[PG] Yes, indeed. Eurostat in Europe, for example, has such data.

[GC] However, for the accuracy of models it is very important to have access to comparable data across the different countries. Currently, each country publishes its datasets on positives and testing numbers in a different way, and there is insufficient data about age, gender and other socio-demographic factors, for example of those who survive vs. those who die. This is understandable given the current emergency, but I still think the EU should push for some common criteria for providing data to help these efforts.

Q. Is there a way to predict how many real cases are not detected based on the data? For instance, taking into account the number of confirmed deaths or severe cases and their ages, etc.

[PG] Yes, indeed. A simple method would be to get daily mortality data from the countries and subtract the corresponding data from last year. The difference could give an approximate estimate of the cases. There are also more advanced methods, such as capture-recapture statistical methods (used in population ecology, for example).
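
The two approaches mentioned in this answer can be illustrated with a rough sketch. The first function is the plain year-on-year subtraction of daily deaths. The second uses the Lincoln-Petersen estimator, which is one standard two-source capture-recapture formula chosen here for illustration; the answer does not say which capture-recapture method was meant. Function names and all numbers are made up for the example.

```python
def excess_deaths(deaths_this_year, deaths_last_year):
    """Day-by-day difference between this year's and last year's death counts."""
    if len(deaths_this_year) != len(deaths_last_year):
        raise ValueError("both series must cover the same days")
    return [now - before for now, before in zip(deaths_this_year, deaths_last_year)]


def lincoln_petersen(n_first, n_second, n_both):
    """Two-source capture-recapture estimate of the total population size.

    n_first:  individuals found by the first source
    n_second: individuals found by the second source
    n_both:   individuals found by both sources
    """
    if n_both == 0:
        raise ValueError("the two sources must overlap")
    return n_first * n_second / n_both


# Made-up numbers for illustration only.
daily_excess = excess_deaths([120, 135, 150], [100, 102, 98])
total_excess = sum(daily_excess)

# Hypothetical example: two independent case registers with 60 cases in common.
estimated_total = lincoln_petersen(200, 150, 60)
```

Both estimates are crude: the subtraction assumes last year is a fair baseline, and the Lincoln-Petersen formula assumes the two sources are independent and cover the same closed population.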

[GC] The paper “Inferring cases from recent deaths” addresses this. Estimates suggest that for each death in a city there are more than 1000 infections.

Authors: Thibaut Jombart, Sam Abbott, Amy Gimma, Christopher I Jarvis, Timothy W Russell, Kevin van Zandvoort, Sam Clifford, Sebastian Funk, Hamish Gibbs, Yang Liu, Rosalind Eggo, Adam J Kucharski, CMMID nCov working group & W John Edmunds. https://cmmid.github.io/topics/covid19/current-patterns-transmission/cases-from-deaths.html

Q. Are any of the presented models being used by authorities or governments, or are they all theoretical studies?

[PG] Work on our PAR model is now in progress with the CEPS centre in Brussels, with the aim of testing it further for EU authorities. We would like it to be adopted by governments, as we believe it is quite effective in predictive performance (given the issues with data quality and length) and offers a good trade-off between tailoring the model to specific situations and interpretability.

Q. PAR Model: Interesting model. It seems to be a model that can be fitted nicely to the national evolution of the infection numbers. Is there any way to identify influencing factors such as date of isolation, effectiveness of isolation, reduction of public transport, or change in contact distance between people?

[PG] Yes, we are doing this in our current work, inserting decision times in the model and checking whether "change points" are detected.

Q. Professor Giudici talked about explainability, could he please elaborate more?

[PG] It means that a typical AI model is a black box: it may well predict the number of new infections, but you don't know the reason why. The PAR model is explainable, as it tells which are the possible causes behind the contagion growth: a constant growth rate (intercept); dependence on previous counts (short-term reproduction rate); and dependence on the whole past series (long-term dependence, having to do with a country's characteristics and its policies for measuring cases).
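
To make the three components concrete, here is a minimal sketch of a Poisson autoregressive recursion. It assumes a log-linear form, log(lam_t) = omega + alpha * log(1 + y_{t-1}) + beta * log(lam_{t-1}), and assumes that omega is the intercept, alpha the short-term term and beta the long-term term; neither the exact equation nor this mapping of symbols was stated in the session, so treat both as assumptions. The function names and the starting value `lam0` are hypothetical.

```python
import math


def par_intensity(counts, omega, alpha, beta, lam0=None):
    """Expected daily counts lam_t under the assumed log-linear recursion.

    omega: intercept (constant growth component)
    alpha: weight on yesterday's observed count (short-term dependence)
    beta:  weight on yesterday's expected count, which itself summarises
           the whole past series (long-term dependence)
    """
    # Starting intensity: an assumption, defaulting to the first count (at least 1).
    lam = [float(lam0) if lam0 is not None else max(float(counts[0]), 1.0)]
    for t in range(1, len(counts)):
        log_lam = omega + alpha * math.log(1 + counts[t - 1]) + beta * math.log(lam[-1])
        lam.append(math.exp(log_lam))
    return lam


def poisson_log_likelihood(counts, lam):
    """Log-likelihood of the observed counts given the expected counts."""
    return sum(y * math.log(l) - l - math.lgamma(y + 1) for y, l in zip(counts, lam))


# Made-up daily counts for illustration only.
counts = [3, 7, 2]
lam = par_intensity(counts, omega=0.0, alpha=1.0, beta=0.0)
```

In practice the three parameters would be estimated by maximising `poisson_log_likelihood` over the observed series; because each parameter multiplies a named component, the fitted values can be read directly, which is the sense of explainability described above.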

Q. We need reliable data in open access mode to let people experiment with AI. Any success on this activity needs to be provided open source so that others can learn and validate. Is there any platform that is able to provide such a service, provide data, allow submission of results, track open review and commenting? 

[PG] Yes, indeed. Our PAR model is based on publicly available data; its results (alpha, beta, omega) can similarly be made available for the awareness of the general public.