Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions

IEEE/ACM Trans Comput Biol Bioinform. 2010 Jan-Mar;7(1):37-49. doi: 10.1109/TCBB.2008.56.

Abstract

Segmentation aims to separate homogeneous areas in sequential data and plays a central role in data mining. Its applications range from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation: locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requiring user-given parameters. To perform the segmentation within a reasonable time, we use heuristics. Most heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished with asymptotic model selection methods such as the Bayesian information criterion. Such methods are based on simplifications, which can limit their usage. In this paper, we propose a Bayesian model selection approach for choosing the most appropriate result from heuristic segmentation. Our Bayesian model uses a simple prior for segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. Applying our method to yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
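
The general idea of scoring heuristic segmentations with a Bayesian criterion can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not specify the "modified" Dirichlet prior or the prior over segmentations, so the sketch assumes a plain symmetric Dirichlet prior (`alpha`), a uniform prior over boundary placements for each segment count, and a greedy top-down splitting heuristic. All function names are hypothetical, and the input is assumed to be a non-empty sequence of category codes in `0..n_cat-1` (for example, discretized expression states ordered along a chromosome).

```python
import math
from typing import List, Sequence


def log_marginal(counts: Sequence[int], alpha: float = 1.0) -> float:
    """Log Dirichlet-multinomial marginal likelihood of one segment.

    Assumes a symmetric Dirichlet(alpha) prior (an illustrative choice).
    """
    k, n = len(counts), sum(counts)
    out = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out


def segment_score(data: Sequence[int], start: int, end: int,
                  n_cat: int, alpha: float) -> float:
    """Score data[start:end] as a single homogeneous segment."""
    counts = [0] * n_cat
    for x in data[start:end]:
        counts[x] += 1
    return log_marginal(counts, alpha)


def greedy_segmentations(data: Sequence[int], n_cat: int,
                         max_segments: int, alpha: float = 1.0) -> List[List[int]]:
    """Heuristic: repeatedly add the single cut with the largest score gain.

    Returns one boundary list per segment count K = 1, 2, ...
    """
    bounds = [0, len(data)]
    results = [list(bounds)]
    while len(bounds) - 1 < max_segments:
        best_gain, best_cut = None, None
        for a, b in zip(bounds, bounds[1:]):
            base = segment_score(data, a, b, n_cat, alpha)
            for cut in range(a + 1, b):
                gain = (segment_score(data, a, cut, n_cat, alpha)
                        + segment_score(data, cut, b, n_cat, alpha) - base)
                if best_gain is None or gain > best_gain:
                    best_gain, best_cut = gain, cut
        if best_cut is None:  # every segment has length 1
            break
        bounds = sorted(bounds + [best_cut])
        results.append(list(bounds))
    return results


def log_posterior(data: Sequence[int], bounds: Sequence[int],
                  n_cat: int, alpha: float = 1.0) -> float:
    """Unnormalized log posterior of one segmentation.

    Assumed prior: all placements of K-1 cuts among n-1 positions equally likely.
    """
    n, k = len(data), len(bounds) - 1
    log_prior = -math.log(math.comb(n - 1, k - 1))
    log_lik = sum(segment_score(data, a, b, n_cat, alpha)
                  for a, b in zip(bounds, bounds[1:]))
    return log_prior + log_lik


def select_segmentation(data: Sequence[int], n_cat: int,
                        max_segments: int = 10, alpha: float = 1.0) -> List[int]:
    """Pick the candidate segmentation with the highest log posterior."""
    if not data:
        raise ValueError("data must be non-empty")
    candidates = greedy_segmentations(data, n_cat, max_segments, alpha)
    return max(candidates, key=lambda b: log_posterior(data, b, n_cat, alpha))
```

For a toy sequence such as `[0] * 30 + [1] * 30` with `n_cat=2`, `select_segmentation` returns the boundaries `[0, 30, 60]`, that is, two segments with no user-supplied segment count. The sketch recounts categories for every candidate cut, which is acceptable for short sequences only; cumulative counts would be the usual speed-up.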

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence
  • Bayes Theorem
  • Chromosome Mapping / methods*
  • Genome / genetics*
  • Molecular Sequence Data
  • Multigene Family / genetics*
  • Pattern Recognition, Automated / methods*
  • Sequence Analysis, DNA / methods*