Objective: Clinical research data warehouses (RDWs) linked to genomic pipelines and open data archives are being created to support innovative, complex data-driven discoveries. The computing and storage needs of these research environments may quickly exceed the capacity of on-premises systems. New RDWs are migrating to cloud platforms for the scalability and flexibility needed to meet these challenges. We describe our experience in migrating a multi-institutional RDW to a public cloud.
Materials and methods: This study is descriptive. Primary materials included internal and public presentations before and after the transition, analysis documents, and actual billing records. Findings were aggregated into topical categories.
Results: Eight categories of migration issues were identified. Unanticipated challenges included legacy system limitations; network, computing, and storage architectures that realize performance and cost benefits in the face of hyper-innovation, complex security reviews and approvals, and limited cloud consulting expertise.
Discussion: Cloud architectures enable previously unavailable capabilities, but numerous pitfalls can impede realizing the full benefits of a cloud environment. Rapid changes in cloud capabilities can quickly obsolete existing architectures and associated institutional policies. Touchpoints with on-premise networks and systems can add unforeseen complexity. Governance, resource management, and cost oversight are critical to allow rapid innovation while minimizing wasted resources and unnecessary costs.
Conclusions: Migrating our RDW to the cloud has enabled capabilities and innovations that would not have been possible with an on-premises environment. Notwithstanding the challenges of managing cloud resources, the resulting RDW capabilities have been highly positive to our institution, research community, and partners.
Keywords: big data; cloud computing; data warehousing; research data governance.
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.