Project Tycho 2.0: a repository to improve the integration and reuse of data for global population health

J Am Med Inform Assoc. 2018 Dec 1;25(12):1608-1617. doi: 10.1093/jamia/ocy123.

Abstract

Objective: In 2013, we released Project Tycho, an open-access database comprising 3.6 million counts of infectious disease cases and deaths reported for over a century by public health surveillance in the United States. Our objective is to describe how Project Tycho version 1 (v1) data has been used to create new knowledge and technology and to present improvements made in the newly released version 2.0 (v2).

Materials and methods: We analyzed our user database and conducted online searches to assess how Project Tycho v1 data had been used. For v2, we added new US data and dengue data for other countries, and grouped data into 360 datasets, each with a digital object identifier and rich metadata. In addition, we used standard vocabularies to encode data where possible, improving compliance with FAIR (findable, accessible, interoperable, reusable) guiding principles for data management.
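
As an illustration only, the grouping step described above might be sketched as follows. The abstract does not state the grouping key, the record layout, or the vocabularies used, so the field names, the (condition, country) key, the "COND-A"/"COND-B" codes, and the identifier format are all hypothetical placeholders, not Project Tycho's actual schema or DOIs.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One group of counts plus descriptive metadata (hypothetical layout)."""
    condition_code: str  # code from a standard vocabulary (placeholder values below)
    country_code: str    # two-letter country code (assumed encoding)
    identifier: str      # placeholder where a DOI would go; not a real DOI
    counts: list = field(default_factory=list)


def group_counts(records):
    """Group raw count records into one Dataset per (condition, country) pair."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["condition_code"], rec["country_code"])].append(rec)
    return [
        Dataset(cond, country, f"placeholder-id/{country}.{cond}", rows)
        for (cond, country), rows in sorted(grouped.items())
    ]


# Invented example records; the numbers carry no epidemiological meaning.
records = [
    {"condition_code": "COND-A", "country_code": "US", "period": "1950-W01", "count": 12},
    {"condition_code": "COND-A", "country_code": "US", "period": "1950-W02", "count": 9},
    {"condition_code": "COND-B", "country_code": "BR", "period": "2010-W05", "count": 40},
]
datasets = group_counts(records)
```

Keeping the identifier and the coded fields on the dataset rather than on each count is one way to make every group findable and citable on its own, which is the aim the FAIR principles name.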

Results: Since release, 3174 people have registered to use Project Tycho data, leading to 18 new peer-reviewed papers and 27 other creative works, such as conference papers, student theses, and software applications. Project Tycho v2 comprises 5.7 million counts of infectious diseases in the United States and of dengue-related conditions in 98 additional countries.

Discussion: Project Tycho v2 contributes to improving FAIR compliance of global health data, but more work is needed to develop community-accepted standard representations for global health data.

Conclusion: FAIR principles are a valuable guide for improving the integration and reuse of data in global health to improve disease control and save lives.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Communicable Diseases / epidemiology
  • Data Aggregation
  • Databases, Factual*
  • Epidemiologic Methods
  • Global Health*
  • Humans
  • Information Storage and Retrieval
  • Metadata*
  • Public Health Surveillance