ASElux: an ultra-fast and accurate allelic reads counter

Bioinformatics. 2018 Apr 15;34(8):1313-1320. doi: 10.1093/bioinformatics/btx762.

Abstract

Motivation: Mapping bias causes preferential alignment to the reference allele, forming a major obstacle in allele-specific expression (ASE) analysis. The existing methods, such as simulation and SNP-aware alignment, are either inaccurate or relatively slow. To fast and accurately count allelic reads for ASE analysis, we developed a novel approach, ASElux, which utilizes the personal SNP information and counts allelic reads directly from unmapped RNA-sequence (RNA-seq) data. ASElux significantly reduces runtime by disregarding reads outside single nucleotide polymorphisms (SNPs) during the alignment.

Results: When compared to other tools on simulated and experimental data, ASElux achieves a higher accuracy on ASE estimation than non-SNP-aware aligners and requires a much shorter time than the benchmark SNP-aware aligner, GSNAP with just a slight loss in performance. ASElux can process 40 million read-pairs from an RNA-sequence (RNA-seq) sample and count allelic reads within 10 min, which is comparable to directly counting the allelic reads from alignments based on other tools. Furthermore, processing an RNA-seq sample using ASElux in conjunction with a general aligner, such as STAR, is more accurate and still ∼4× faster than STAR + WASP, and ∼33× faster than the lead SNP-aware aligner, GSNAP, making ASElux ideal for ASE analysis of large-scale transcriptomic studies. We applied ASElux to 273 lung RNA-seq samples from GTEx and identified a splice-QTL rs11078928 in lung which explains the mechanism underlying an asthma GWAS SNP rs11078927. Thus, our analysis demonstrated ASE as a highly powerful complementary tool to cis-expression quantitative trait locus (eQTL) analysis.

Availability and implementation: The software can be downloaded from https://github.com/abl0719/ASElux.

Contact: zmiao@ucla.edu or a5ko@ucla.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles*
  • Gene Expression Profiling / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait Loci*
  • Sequence Analysis, RNA / methods
  • Software*