Privacy-Preserving Workflow for the Cross-Border Federated Analysis of Clinical Data

Stud Health Technol Inform. 2024 Aug 22:316:1637-1641. doi: 10.3233/SHTI240737.

Abstract

This research aims to enable privacy-preserving analysis of data held at remote sites in different jurisdictions, where individual-level information cannot be shared. We present key findings from a requirements analysis and the resulting federated data analysis workflow, built using open-source research software, in which patient-level information is securely stored and never exposed during analysis. We also present additional improvements that further strengthen the security of the workflow, and we emphasize and showcase the use of data harmonization in the analysis. The analysis is performed using the R language for statistical computing and the DataSHIELD libraries for non-disclosive analysis of sensitive data. The workflow was validated in two data analysis scenarios, confirming the results obtained with a centralized analysis approach. The clinical datasets are part of the large pan-European SARS-CoV-2 cohort collected and managed by the ORCHESTRA project. We demonstrate that it is viable to establish a cross-border federated data analysis framework and to conduct an analysis without exposing patient-level information, achieving results equivalent to a centralized, non-secure analysis. However, it is vital to meet the requirements associated with data harmonization, anonymization and IT infrastructure in order to maintain availability, usability and data security.
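
The principle behind the equivalence of federated and centralized results can be illustrated with a minimal sketch. It is written in Python for illustration only; it is not the DataSHIELD API and not the authors' workflow, which uses R. It assumes each site releases only aggregate statistics (count, sum, sum of squares) and refuses to answer when the group is smaller than a disclosure threshold. The names `SiteSummary`, `summarize_site`, `pooled_mean_and_variance` and the threshold value `MIN_CELL_COUNT` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical disclosure threshold: a site refuses to release
# aggregates computed over fewer than this many records.
MIN_CELL_COUNT = 3


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a site is willing to release (no patient rows)."""
    n: int
    total: float
    total_sq: float


def summarize_site(values: Sequence[float],
                   min_count: int = MIN_CELL_COUNT) -> SiteSummary:
    """Runs at the data site: reduce local patient-level values to aggregates."""
    n = len(values)
    if n < min_count:
        raise ValueError("too few records to release a non-disclosive summary")
    return SiteSummary(n=n,
                       total=float(sum(values)),
                       total_sq=float(sum(v * v for v in values)))


def pooled_mean_and_variance(summaries: Sequence[SiteSummary]) -> tuple[float, float]:
    """Runs at the analyst: combine site aggregates into pooled statistics."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from pooled sums (adequate for a sketch; a numerically
    # more stable formula would be preferable for large or skewed data).
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Toy data standing in for a numeric variable held at two sites.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0, 6.0, 7.0]

summaries = [summarize_site(site_a), summarize_site(site_b)]
mean, variance = pooled_mean_and_variance(summaries)
```

Only the three aggregates per site cross the site boundary in this sketch, yet the pooled mean and variance match what a centralized computation over the combined rows would give.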

Keywords: Clinical Cohorts; Distributed Data Analysis; Federated Learning; Privacy-Preserving Data Analysis.

MeSH terms

  • COVID-19 / prevention & control
  • Computer Security*
  • Confidentiality
  • Electronic Health Records
  • Humans
  • SARS-CoV-2
  • Software
  • Workflow*