Analysis of array CGH data: from signal ratio to gain and loss of DNA regions

Bioinformatics. 2004 Dec 12;20(18):3413-22. doi: 10.1093/bioinformatics/bth418. Epub 2004 Sep 20.

Abstract

Motivation: Genomic DNA regions are frequently lost or gained during tumor progression. Array Comparative Genomic Hybridization (array CGH) technology makes it possible to assess these DNA changes in cancers by comparison with a normal reference. Identifying genomic regions that are systematically deleted or amplified in a set of tumors helps biologists find genes involved in cancer progression, because tumor suppressor genes are thought to be located in lost genomic regions and oncogenes in gained regions. Array CGH profiles should also improve the classification of tumors. Achieving these goals requires a methodology for detecting the breakpoints that delimit altered regions in genomic patterns and for assigning a status (normal, gained or lost) to each chromosomal region.

Results: We have developed a methodology for the automatic detection of breakpoints from array CGH profiles and for the assignment of a status to each chromosomal region. The breakpoint detection step is based on the Adaptive Weights Smoothing (AWS) procedure and performs well: our algorithm detects 97%, 100% and 94% of breakpoints in simulated data, karyotyping results and manually analyzed profiles, respectively. The percentage of correctly assigned statuses ranges from 98.9% to 99.8% for simulated data and is 100% for karyotyping results. Our algorithm also outperforms other solutions on a public reference dataset.
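
The two steps named above (breakpoint detection, then status assignment per region) can be illustrated with a minimal Python sketch. It is not the AWS procedure and not the GLAD implementation: breakpoints are found with a simple sliding-window comparison of means used here as a stand-in, and the function names, the window size, the thresholds and the toy log-ratio profile are all assumptions made for illustration.

```python
from statistics import mean, median


def detect_breakpoints(log_ratios, window=3, min_jump=0.3):
    """Return indices at which a new region starts.

    Stand-in for AWS: compare the mean of the `window` probes before
    position i with the mean of the `window` probes starting at i.
    """
    n = len(log_ratios)
    candidates = []
    for i in range(window, n - window + 1):
        left = mean(log_ratios[i - window:i])
        right = mean(log_ratios[i:i + window])
        jump = abs(right - left)
        if jump >= min_jump:
            candidates.append((jump, i))

    # Keep the strongest jumps first and drop weaker candidates that lie
    # closer than `window` probes to an already accepted breakpoint.
    kept = []
    for _, i in sorted(candidates, reverse=True):
        if all(abs(i - j) >= window for j in kept):
            kept.append(i)
    return sorted(kept)


def assign_status(log_ratios, breakpoints, threshold=0.25):
    """Label each region between breakpoints as normal, gained or lost.

    The region level is the median log-ratio; `threshold` is a
    hypothetical cut-off, not a value taken from the paper.
    """
    bounds = [0, *breakpoints, len(log_ratios)]
    regions = []
    for start, end in zip(bounds, bounds[1:]):
        level = median(log_ratios[start:end])
        if level > threshold:
            status = "gained"
        elif level < -threshold:
            status = "lost"
        else:
            status = "normal"
        regions.append((start, end, status))
    return regions


# Toy profile (invented values): 6 normal, 5 gained and 6 lost probes.
profile = [
    0.02, -0.03, 0.01, 0.00, -0.02, 0.03,
    0.58, 0.62, 0.60, 0.57, 0.61,
    -0.71, -0.68, -0.70, -0.73, -0.69, -0.70,
]
breaks = detect_breakpoints(profile)
regions = assign_status(profile, breaks)
print(breaks)
print(regions)
```

Splitting the work into two functions mirrors the structure described in the abstract: region boundaries are fixed first, and a status is then derived from a robust summary (here the median) of each region rather than from individual probes.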

Availability: The R package GLAD (Gain and Loss Analysis of DNA) is available upon request.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Chromosome Mapping / methods*
  • DNA Mutational Analysis / methods*
  • Gene Dosage
  • Genetic Variation
  • Humans
  • In Situ Hybridization / methods*
  • Models, Genetic
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*
  • Software
  • Stochastic Processes
  • Urinary Bladder Neoplasms / diagnosis
  • Urinary Bladder Neoplasms / genetics