Background: In recent years, research data warehouses have increasingly become a focus of medical research. Such warehouses aim to provide a consolidated view of medical data from various sources such as clinical trials, electronic health records, epidemiological registries or longitudinal cohorts. Nevertheless, only a few center-independent infrastructure solutions are available. The i2b2 framework is a well-established solution for such repositories, but it lacks support for importing and integrating clinical data and metadata.
Objectives: The goal of this project was to develop a platform for easy integration and administration of data from heterogeneous sources, to provide capabilities for linking these data to medical terminologies, and to allow data streams to be transformed and mapped for user-specific views.
Methods: A suite of three tools has been developed: the i2b2 Wizard for simplifying administration of i2b2; the IDRT Import and Mapping Tool for loading clinical data from various sources and formats such as CSV, SQL, CDISC ODM or biobanks; and the IDRT i2b2 Web Client Plugin for advanced export options. The Import and Mapping Tool also includes an ontology editor for rearranging and mapping patient data and structures, as well as for annotating clinical data with medical terminologies, primarily those used in Germany (ICD-10-GM, OPS, ICD-O, etc.).
Results: With the three tools functional, new i2b2-based research projects can be created, populated and customized to researchers' needs in a few hours. Amalgamating data and metadata from different databases can be managed easily. With regard to data privacy, a pseudonymization service can be plugged in. Using common ontologies and reference terminologies rather than project-specific ones leads to a consistent understanding of the data semantics.
Conclusions: i2b2's promise is to enable clinical researchers to devise and test new hypotheses even without a deep knowledge of statistical programming. The approach presented here has been tested in a number of scenarios with millions of observations and tens of thousands of patients. Researchers who initially mostly observed were, once trained, able to construct new analyses on their own. Early feedback indicates that timely and extensive access to their "own" data is appreciated most, but that the approach also lowers the barrier for other tasks, for instance checking data quality and completeness (missing data, wrong coding).
Keywords: Clinical data warehouse; controlled vocabularies; data integration; information storage and retrieval; secondary use.