An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs

Nucleic Acids Res. 2006 Jun 6;34(10):3150-60. doi: 10.1093/nar/gkl396. Print 2006.

Abstract

Reconstructing full-length transcript isoforms from sequence fragments (such as ESTs) is a major interest and challenge for bioinformatic analysis of pre-mRNA alternative splicing. This problem has been formulated as finding traversals across the splice graph, which is a directed acyclic graph (DAG) representation of gene structure and alternative splicing. In this manuscript we introduce a probabilistic formulation of the isoform reconstruction problem, and provide an expectation-maximization (EM) algorithm for its maximum likelihood solution. Using a series of simulated datasets and expressed sequences from real human genes, we demonstrate that our EM algorithm can correctly handle various situations of fragmentation and coupling in the input data. Our work establishes a general probabilistic framework for splice graph-based reconstructions of full-length isoforms.
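The abstract does not spell out the likelihood model, so the following is only a minimal sketch of how an EM estimate of isoform frequencies might look, not the authors' implementation. It assumes that candidate isoforms have already been enumerated as paths (ordered exon lists) through the splice graph, that a fragment is compatible with an isoform when its exons form a contiguous run of that isoform, and that every compatible fragment is equally likely given its isoform. The names is_compatible and em_isoform_frequencies are hypothetical.

```python
def is_compatible(fragment, isoform):
    """True if the fragment's exons appear as a contiguous run in the isoform.

    Assumption: this simple rule stands in for whatever compatibility
    test a real splice-graph model would use.
    """
    frag, iso = tuple(fragment), tuple(isoform)
    m = len(frag)
    return any(iso[i:i + m] == frag for i in range(len(iso) - m + 1))


def em_isoform_frequencies(isoforms, fragments, n_iter=200, tol=1e-9):
    """Estimate relative isoform frequencies from fragments with a basic EM loop.

    Assumption: a plain mixture model in which each fragment is drawn from
    one isoform and all compatible fragments are equally likely.
    """
    k = len(isoforms)
    # Compatibility matrix: one row per fragment, one column per isoform.
    compat = [[1.0 if is_compatible(f, iso) else 0.0 for iso in isoforms]
              for f in fragments]
    # Fragments that fit no candidate isoform carry no information here.
    compat = [row for row in compat if sum(row) > 0]
    if not compat:
        raise ValueError("no fragment is compatible with any isoform")

    theta = [1.0 / k] * k  # start from uniform frequencies
    for _ in range(n_iter):
        counts = [0.0] * k
        # E-step: split each fragment across its compatible isoforms
        # in proportion to the current frequency estimates.
        for row in compat:
            weights = [t * c for t, c in zip(theta, row)]
            total = sum(weights)
            for j in range(k):
                counts[j] += weights[j] / total
        # M-step: new frequencies are the normalized expected counts.
        new_theta = [c / len(compat) for c in counts]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy gene with exons 1, 2, 3: one isoform includes exon 2, one skips it.
isoforms = [(1, 2, 3), (1, 3)]
fragments = [(1, 2), (2, 3), (1, 3), (1,), (3,)]
theta = em_isoform_frequencies(isoforms, fragments)
```

In this toy input, the fragments (1, 2) and (2, 3) fit only the exon-including isoform, (1, 3) fits only the exon-skipping isoform, and the single-exon fragments are ambiguous, so the E-step shares them between both isoforms according to the current estimates. A fuller model would also account for fragment length and position along the transcript.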

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Alternative Splicing*
  • Chromosomes, Human, Pair 22
  • Computer Simulation
  • Expressed Sequence Tags
  • HLA-D Antigens / genetics
  • HLA-D Antigens / metabolism
  • Humans
  • Likelihood Functions
  • Protein Isoforms / genetics*
  • Protein Isoforms / metabolism
  • Tropomyosin / genetics
  • Tropomyosin / metabolism

Substances

  • HLA-D Antigens
  • HLA-DM antigens
  • Protein Isoforms
  • TPM1 protein, human
  • Tropomyosin