August 19, 2020 — We would like to share
with the community the data we have been collecting. Because the data is sensitive, we have to share it
through a formal Data Access Agreement, which can be signed by institutionally authorized
signatories. The data, as per the agreement, can be shared for academic research purposes only.
At the moment the data we have preprocessed and are able to
share is the data described in the paper published in ACM
KDD, also available publicly at
Please cite this paper when you use the data in your publications.
This data contains 459 cough and breathing samples from 378 users (from Web and Android applications until
22 May, 2020). We have carefully checked samples to guarantee the quality of the data. We have collected
voice, symptoms and other metadata, but at this stage we only share cough and breathing as we used in the
paper. First of all, there were 62 users who said they had tested positive for COVID-19 (in the last 14
days or before that) contributing 141 cough and breathing samples, with 54 of these samples from users
who reported dry or wet cough. In addition, we define non-COVID users as those who had a clean medical
history, had never smoked, and had not tested positive for COVID-19. These users contributed 298 samples,
with 32 samples declaring cough as a symptom. Finally, we also selected an asthma-with-cough group: users who
have asthma, had not tested positive for COVID-19, and had a cough; these gave us 20 samples. More
processing details can be found in our paper above.
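As a quick sanity check on the breakdown above, the three groups can be tallied against the stated total of 459 samples. This is only an illustrative sketch: the counts are taken from the text, while the group labels and variable names are assumptions and do not reflect how the released data is actually organized.

```python
# Sample counts as reported in the description above.
# The dictionary keys are hypothetical labels, not names used in the dataset.
samples_by_group = {
    "covid_positive": 141,
    "non_covid": 298,
    "asthma_with_cough": 20,
}

total_samples = sum(samples_by_group.values())

# Print each group's count and its share of all samples.
for group, count in samples_by_group.items():
    share = 100 * count / total_samples
    print(f"{group}: {count} samples ({share:.1f}%)")

print("total:", total_samples)
```

The three groups add up to the 459 samples reported for the shared data.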
If you are interested in obtaining the data, please email
firstname.lastname@example.org. A data
template will be sent to you to be signed by the authorized signatory of your institution and then we will
be able to share the data with you.