Case Studies for Overcoming Challenges in Using Big Data in Cancer

Cancer Res. 2023 Apr 14;83(8):1183-1190. doi: 10.1158/0008-5472.CAN-22-1277.

Abstract

The analysis of big healthcare data has enormous potential as a tool for advancing oncology drug development and patient treatment, particularly in the context of precision medicine. However, there are challenges in organizing, sharing, and integrating these data and in making them readily accessible to the research community. This review presents five case studies illustrating various successful approaches to addressing such challenges. These efforts are CancerLinQ, the American Association for Cancer Research Project GENIE, Project Data Sphere, the National Cancer Institute Genomic Data Commons, and the Veterans Health Administration Clinical Data Initiative. Critical factors in the development of these systems include robust pipelines for data aggregation, common data models, data deidentification to enable multiple uses, integration of data collection into physician workflows, terminology standardization and interoperability, extensive quality assurance and quality control, incorporation of multiple data types, and an understanding of how data resources can best be applied. By describing some of these emerging resources, we hope to inspire consideration of the secondary use of such data at the earliest possible step, so that data are shared properly and generate insights that advance the understanding and treatment of cancer.

Publication types

  • Review

MeSH terms

  • Big Data*
  • Delivery of Health Care
  • Humans
  • Medical Oncology
  • Neoplasms* / genetics
  • Neoplasms* / therapy
  • United States / epidemiology