A County-level Dataset for Informing the United States Response to COVID-19

Published in arXiv, 2020

Recommended citation: B. D. Killeen*, J. Y. Wu*, K. Shah, A. Zapaishchykova, P. Nikutta, A. Tamhane, S. Chakraborty, J. Wei, T. Gao, M. Thies, M. Unberath. A County-level Dataset for Informing the United States Response to COVID-19. arXiv preprint, 2020, arXiv:2004.00756. https://arxiv.org/abs/2004.00756


As the coronavirus disease 2019 (COVID-19) continues to be a global pandemic, policy makers have enacted and reversed non-pharmaceutical interventions with various levels of restrictions to limit its spread. Data-driven approaches that analyze temporal characteristics of the pandemic and its dependence on regional conditions might supply information to support the implementation of mitigation and suppression strategies. To facilitate research in this direction, using the United States as an example, we present a machine-readable dataset that aggregates relevant data from governmental, journalistic, and academic sources at the U.S. county level. In addition to county-level time-series data from the JHU CSSE COVID-19 Dashboard, our dataset contains more than 300 variables that summarize population estimates, demographics, ethnicity, housing, education, employment and income, climate, transit scores, and healthcare system-related metrics. Furthermore, we present aggregated out-of-home activity information for various points of interest in each county, including grocery stores and hospitals, summarizing data from SafeGraph and Google mobility reports. We compile the dates on which non-pharmaceutical interventions were enacted and reversed from IHME, state and county-level governments, and newspapers. By collecting these data and providing tools to read them, we hope to accelerate research that investigates how the disease spreads and why its spread may differ across regions. Our dataset and associated code are available at github.com/JieYingWu/COVID-19_US_County-level_Summaries.
