Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Recent work has reported that AI classifiers trained on audio recordings can accurately predict SARS-CoV-2 infection status
  • 67,842 individuals with linked metadata were studied, of whom 23,514 tested positive for SARS-CoV-2
  • Subjects were recruited via the UK government's National Health Service Test-and-Trace programme and the REACT randomised surveillance survey
  • In unadjusted analysis, AI classifiers predict SARS-CoV-2 infection status with high accuracy (ROC-AUC=0.846)
  • After adjusting for confounders, classifier performance is weaker (ROC-AUC=0.619)
  • Audio-based classifiers are outperformed by simple predictive scores based on user-reported symptoms

Paper Content

Results

Study design

  • Invited volunteers to participate in the study from March 2021 to March 2022
  • Collected audio recordings of four respiratory audio modalities
  • Final dataset consisted of 23,514 COVID+ and 44,328 SARS-CoV-2 PCR-negative (COVID-) individuals
  • Acoustic target should be causally linked to COVID-19
  • Acoustic target should not be self-identifiable
  • Acoustic target should enable high-utility COVID-19 screening

Characterising and controlling recruitment bias

  • Audio-based COVID-19 classification results can be affected by the characteristics of the enrolled population.

Primary analyses

  • Pre-specified analysis plan was designed and fixed to increase replicability of conclusions
  • Audio-based COVID-19 prediction performance is presented in Table 1
  • SSAST and BNN classifiers outperformed the baseline SVM
  • Under the participant-disjoint Randomised data split, the SSAST classifier achieved a high COVID-19 predictive accuracy of ROC-AUC=0.846
  • When controlling for enrolment bias, predictive accuracy dropped to ROC-AUC=0.619
  • Significant differences in predictive scores between COVID- and COVID+ individuals were observed in 28 strata
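The ROC-AUC figures quoted above can be read as the probability that a randomly chosen COVID+ participant receives a higher classifier score than a randomly chosen COVID- participant. The following is a minimal sketch of that pairwise definition, not the evaluation code used in the paper; the function name `roc_auc` and the toy scores are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    labels: 1 for COVID+, 0 for COVID-. Tied scores count as half a win.
    This is an O(n_pos * n_neg) illustration, not an optimised routine.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores invented for illustration only (not data from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to chance-level ranking, which is why the drop from 0.846 to 0.619 after controlling for enrolment bias represents a large loss of discriminative signal.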

Confirmatory analyses and validation

  • Audio-based classifiers can be useful if they improve performance compared to classifiers based on self-identifiable symptoms.
  • Test sets should reflect real-life settings.
  • Balanced subsampling used to create “general population” test set.
  • Benchmarking audio-based classifier against classifiers based on self-identifiable symptoms.
  • Audio-based classifier offers small increase in predictive accuracy.
  • Utility of classifier depends on context.
  • Illustrative utility function specified.
  • Exploratory methods used to identify influence of unmeasured confounders.
  • Residual predictive variation persists after mapping to COVID- individuals.
  • Points to unmeasured confounder bias contributing to classifier performance.
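The point that a classifier's utility depends on context can be made concrete with an expected-utility calculation. The sketch below is an assumption-laden illustration, not the utility function specified in the paper (which this summary does not reproduce): it assumes four outcome utilities (`u_tp`, `u_fn`, `u_tn`, `u_fp`) and combines them with prevalence, sensitivity and specificity. All numeric values are hypothetical.

```python
def expected_utility(prevalence, sensitivity, specificity,
                     u_tp, u_fn, u_tn, u_fp):
    """Expected utility of testing one person drawn at random from a population.

    The four u_* values are hypothetical utilities for true positive,
    false negative, true negative and false positive outcomes.
    """
    infected_branch = sensitivity * u_tp + (1 - sensitivity) * u_fn
    uninfected_branch = specificity * u_tn + (1 - specificity) * u_fp
    return prevalence * infected_branch + (1 - prevalence) * uninfected_branch


# Hypothetical numbers: the same test evaluated at two prevalence levels.
low_prev = expected_utility(0.02, 0.7, 0.9, u_tp=1.0, u_fn=-5.0, u_tn=0.0, u_fp=-0.5)
high_prev = expected_utility(0.20, 0.7, 0.9, u_tp=1.0, u_fn=-5.0, u_tn=0.0, u_fp=-0.5)
print(low_prev, high_prev)
```

Holding the classifier fixed and varying only prevalence changes the result, which is one way to see why a single accuracy figure does not determine whether deployment is worthwhile.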

Discussion

  • Collected largest PCR-validated dataset of its kind to date
  • Quantified accuracy of audio-based classifiers for predicting COVID-19
  • High accuracy before accounting for recruitment bias (ROC-AUC=0.85)
  • Little residual predictive variation after controlling for recruitment bias (ROC-AUC=0.62)
  • Self-reported symptoms outperform audio-based AI classifiers
  • Recruitment bias can artificially inflate association between COVID-19 and its symptoms
  • Audio-based classifiers can augment and complement self-screening
  • Collect and disseminate metadata to filter data for quality and relevance
  • Characterise and control recruitment bias
  • Design studies with bias control in mind
  • Focus on added predictive value of classifiers
  • Assess classifier performance in targeted settings
  • Examine classifier’s expected utility in applied setting
  • Out-of-study replication

Methods

Dataset collection and characteristics

  • Three papers accompany this work
  • Main sources of recruitment were REACT study and NHS T+T system
  • Enrolment was opt-in
  • Participants asked to record four audio clips
  • Demographic and clinical/health metadata collected
  • Final dataset of 23,514 COVID+ and 44,328 COVID- individuals
  • Data split into three training sets and five test sets
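Because each participant contributes several recordings, a participant-disjoint split has to assign whole participants, not individual clips, to the training or test side. The sketch below shows one way to do this; the field name `participant_id`, the `test_fraction` value and the function name are assumptions for illustration and do not describe the study's actual splitting code or split sizes.

```python
import random


def participant_disjoint_split(records, test_fraction=0.3, seed=0):
    """Split recordings so that no participant appears on both sides.

    records: list of dicts, each with a 'participant_id' key (hypothetical schema).
    """
    ids = sorted({r["participant_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [r for r in records if r["participant_id"] not in test_ids]
    test = [r for r in records if r["participant_id"] in test_ids]
    return train, test


# Toy data: 10 participants with 4 clips each (clip names are placeholders).
toy_records = [
    {"participant_id": pid, "clip": clip}
    for pid in range(10)
    for clip in ("clip_a", "clip_b", "clip_c", "clip_d")
]
train_records, test_records = participant_disjoint_split(toy_records)
print(len(train_records), len(test_records))
```

Splitting at clip level instead would let the same voice appear in both training and test data, which can inflate measured accuracy.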

Machine learning models

  • Three models were implemented to detect COVID-19 from audio
  • Models cover a range of machine learning research
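  • Baseline model is an SVM trained on 6,373 hand-crafted openSMILE features
  • BNN (Bayesian neural network) model uses ResNet-50 and estimates uncertainty
  • Input features for the BNN are Mel filterbank features
  • SSAST (Self-Supervised Audio Spectrogram Transformer) model is transformer-based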

Matching methodology

  • Constructed 1:1 Matched test set with 907 COVID+ and 907 COVID- participants
  • Constructed Matched training set with 2,599 COVID+ and 2,599 COVID- participants
  • For the utility analysis, considered the action of applying a testing protocol to an individual randomly selected from a population, and scored it by the combined utility of the consequences of each possible outcome
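The idea behind a 1:1 Matched set can be sketched as exact matching within strata: participants are grouped by their measured covariates, and each COVID+ participant is paired with a COVID- participant from the same group. This is a simplified illustration under stated assumptions, not the paper's matching procedure; the covariate names (`age_band`, `gender`), the `covid_positive` field and the function name are hypothetical.

```python
import random
from collections import defaultdict


def match_one_to_one(participants, covariates, seed=0):
    """Pair each COVID+ participant with a COVID- one sharing all covariates.

    Unmatched participants are dropped. Exact matching is assumed here
    purely for illustration.
    """
    rng = random.Random(seed)
    strata = defaultdict(lambda: {"pos": [], "neg": []})
    for p in participants:
        key = tuple(p[c] for c in covariates)
        strata[key]["pos" if p["covid_positive"] else "neg"].append(p)
    pairs = []
    for key in sorted(strata, key=repr):
        group = strata[key]
        rng.shuffle(group["pos"])
        rng.shuffle(group["neg"])
        # zip stops at the shorter list, so surplus participants are discarded.
        pairs.extend(zip(group["pos"], group["neg"]))
    return pairs


# Toy participants invented for illustration.
toy_participants = [
    {"id": 1, "age_band": "18-39", "gender": "F", "covid_positive": True},
    {"id": 2, "age_band": "18-39", "gender": "F", "covid_positive": False},
    {"id": 3, "age_band": "18-39", "gender": "F", "covid_positive": False},
    {"id": 4, "age_band": "40-64", "gender": "M", "covid_positive": True},
    {"id": 5, "age_band": "40-64", "gender": "F", "covid_positive": False},
]
toy_pairs = match_one_to_one(toy_participants, ["age_band", "gender"])
print([(pos["id"], neg["id"]) for pos, neg in toy_pairs])
```

Within a matched set, the measured covariates are balanced between COVID+ and COVID- groups by construction, so a classifier can no longer score well by detecting those covariates alone.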

Data availability

  • Dataset can be requested through UK Data Service
  • Audio data provided in .wav format
  • Metadata provided in .csv files
  • Data is anonymised and safeguarded
  • Authentication process required for access
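Once access has been granted, the .wav and .csv files could be inspected with the Python standard library alone. The sketch below is a hedged example: the helper names are hypothetical, no real file names or column names from the dataset are assumed, and the `wave` module reads only uncompressed PCM WAV files.

```python
import csv
import wave


def load_metadata(csv_path):
    """Read a metadata CSV into a list of dicts keyed by the header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes() / w.getframerate()
```

Checking durations and metadata completeness before training is one simple way to filter recordings for quality.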

Exploratory approaches to identify the influence of unmeasured confounders

  • Matching can be used to control for bias in data.
  • Unmeasured confounders can lead to inflated classification performance.
  • Two exploratory approaches are introduced to quantify the effects of unmeasured confounding.
  • Weak-Robust approach uses a low-capacity model to identify cases with confounding signal.
  • Calibration step evaluates the weak model on an easier task than COVID-19 detection.
  • Project COVID+ and COVID- openSMILE features onto the first k principal components of the COVID- cases.
  • Remove correctly classified individuals from Curated Matched test set.
  • Evaluate SSAST model on Curated Matched test set.
  • Graphical representation of enrolment effects.
  • Predictive accuracy within Matched strata.
  • Schematic demonstrating the importance of ascertainment bias adjustment.
  • Symptomatic vs Asymptomatic for other COVID-19 datasets.
  • Results of the Weak-Robust approach.
  • SSAST performance when trained and evaluated on the COVID-19 sounds publicly available dataset.
  • t-SNE plots of the final layer representations of the SSAST.
  • Calibration plot for the SSAST for each respiratory modality.
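The curation step of the Weak-Robust approach described in this section can be sketched as follows: cases that a weak model classifies correctly are removed, and the stronger model is then evaluated on what remains. This is a minimal sketch under stated assumptions; the function names, the feature names `noise` and `signal`, the two toy models and the data are all invented for illustration and are not the paper's models or results.

```python
def curate_test_set(test_set, weak_predict):
    """Keep only the examples that the weak model classifies incorrectly."""
    return [ex for ex in test_set if weak_predict(ex["features"]) != ex["label"]]


def accuracy(test_set, predict):
    """Fraction of examples whose predicted label equals the true label."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(predict(ex["features"]) == ex["label"] for ex in test_set)
    return correct / len(test_set)


# Hypothetical stand-ins: the weak model keys on a confounder-like feature,
# the robust model on a feature meant to represent the target signal.
def weak_model(features):
    return 1 if features["noise"] > 0.5 else 0


def robust_model(features):
    return 1 if features["signal"] > 0.5 else 0


toy_test_set = [
    {"features": {"noise": 0.9, "signal": 0.8}, "label": 1},
    {"features": {"noise": 0.1, "signal": 0.2}, "label": 0},
    {"features": {"noise": 0.2, "signal": 0.7}, "label": 1},
    {"features": {"noise": 0.8, "signal": 0.6}, "label": 0},
]
curated = curate_test_set(toy_test_set, weak_model)
print(accuracy(toy_test_set, robust_model), accuracy(curated, robust_model))
```

If the stronger model's accuracy falls sharply on the curated set, that suggests part of its original performance came from the same easily learned signal the weak model was exploiting.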