Migrating a research data warehouse to a public cloud: challenges and opportunities

J Am Med Inform Assoc. 2022 Mar 15;29(4):592-600. doi: 10.1093/jamia/ocab278.

Abstract

Objective: Clinical research data warehouses (RDWs) linked to genomic pipelines and open data archives are being created to support innovative, complex data-driven discoveries. The computing and storage needs of these research environments may quickly exceed the capacity of on-premises systems. New RDWs are migrating to cloud platforms for the scalability and flexibility needed to meet these challenges. We describe our experience in migrating a multi-institutional RDW to a public cloud.

Materials and methods: This study is descriptive. Primary materials included internal and public presentations before and after the transition, analysis documents, and actual billing records. Findings were aggregated into topical categories.

Results: Eight categories of migration issues were identified. Unanticipated challenges included legacy system limitations; network, computing, and storage architectures that realize performance and cost benefits in the face of hyper-innovation; complex security reviews and approvals; and limited cloud consulting expertise.

Discussion: Cloud architectures enable previously unavailable capabilities, but numerous pitfalls can impede realizing the full benefits of a cloud environment. Rapid changes in cloud capabilities can quickly render existing architectures and associated institutional policies obsolete. Touchpoints with on-premises networks and systems can add unforeseen complexity. Governance, resource management, and cost oversight are critical to allow rapid innovation while minimizing wasted resources and unnecessary costs.

Conclusions: Migrating our RDW to the cloud has enabled capabilities and innovations that would not have been possible with an on-premises environment. Notwithstanding the challenges of managing cloud resources, the resulting RDW capabilities have been highly beneficial to our institution, research community, and partners.

Keywords: big data; cloud computing; data warehousing; research data governance.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cloud Computing*
  • Data Warehousing*