Data integration and genomic medicine

J Biomed Inform. 2007 Feb;40(1):5-16. doi: 10.1016/j.jbi.2006.02.007. Epub 2006 Mar 9.

Abstract

Genomic medicine aims to revolutionize health care by applying our growing understanding of the molecular basis of disease. Research in this arena is data intensive: data sets are large and highly heterogeneous. To create knowledge from data, researchers must integrate these large and diverse data sets. This presents daunting informatics challenges, such as representing data in a form suitable for computational inference (knowledge representation) and linking heterogeneous data sets (data integration). Fortunately, many of these challenges can be classified as data integration problems, and existing data integration technologies may be applied to them. In this paper, we discuss the opportunities of genomic medicine and identify the informatics challenges in this domain. We also review concepts and methodologies in the field of data integration. These concepts and methodologies are then aligned with the informatics challenges in genomic medicine and presented as potential solutions. We conclude with challenges not yet addressed in genomic medicine and gaps that remain in the data integration research needed to facilitate genomic medicine.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Biomedical Research / methods*
  • Biomedical Research / trends
  • Database Management Systems*
  • Databases, Genetic*
  • Genomics / methods*
  • Genomics / trends
  • Information Storage and Retrieval / methods*
  • Information Storage and Retrieval / trends
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Array Sequence Analysis / trends
  • Systems Integration
  • User-Computer Interface*