The Essential Component in DNA-Based Information Storage System: Robust Error-Tolerating Module

Aldrin Kay-Yuen Yim; Allen Chi-Shing Yu; Jing-Woei Li; Ada In-Chun Wong; Jacky F C Loo; King Ming Chan; S K Kong; Kevin Y Yip; Ting-Fung Chan

doi:10.3389/fbioe.2014.00049

Front Bioeng Biotechnol. 2014 Nov 6:2:49. doi: 10.3389/fbioe.2014.00049. eCollection 2014.

Authors

Aldrin Kay-Yuen Yim¹, Allen Chi-Shing Yu², Jing-Woei Li², Ada In-Chun Wong³, Jacky F C Loo³, King Ming Chan³, S K Kong³, Kevin Y Yip⁴, Ting-Fung Chan¹

Affiliations

¹ School of Life Sciences, The Chinese University of Hong Kong, Hong Kong, China; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Hong Kong, China; State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong, China; Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
² School of Life Sciences, The Chinese University of Hong Kong, Hong Kong, China; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Hong Kong, China.
³ School of Life Sciences, The Chinese University of Hong Kong, Hong Kong, China.
⁴ Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.

Abstract

The volume of digital data is ever increasing and is expected to grow to 40,000 EB by 2020, yet the estimated global information storage capacity in 2011 was <300 EB, indicating that most of the data are transient. DNA, as a very stable nano-molecule, is an ideal massive storage device for long-term data archiving. The two most notable illustrations are from Church et al. and Goldman et al., whose approaches are well optimized for most sequencing platforms: short synthesized DNA fragments without homopolymers. Here, we suggest improvements to the error-handling methodology that could enable the integration of DNA-based computational processes, e.g., algorithms based on self-assembly of DNA. As a proof of concept, a picture of size 438 bytes was encoded to DNA with a low-density parity-check error-correction code. We salvaged a significant portion of the sequencing reads carrying mutations generated during DNA synthesis and sequencing, and successfully reconstructed the entire picture. A modular programming framework, DNAcodec, with an eXtensible Markup Language-based data format was also introduced. Our experiments demonstrated the practicability of long DNA message recovery with high error tolerance, which opens the field to biocomputing and synthetic biology.

Keywords: DNA-based computational process; DNA-based information storage; biocomputing; error-tolerating module; synthetic biology.
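
The "without homopolymers" constraint mentioned in the abstract can be illustrated with a minimal sketch. It is not the DNAcodec implementation and not the exact scheme of Church et al. or Goldman et al. It assumes a simplified rotating code in which each byte is written as six base-3 digits and each digit selects one of the three nucleotides that differ from the previous one. The function names, the fixed six-digit width, and the virtual starting nucleotide "A" are all hypothetical choices made for illustration, and no error correction (such as the low-density parity-check code) is included.

```python
NUCLEOTIDES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: fixed width, since 3**6 = 729 >= 256


def byte_to_trits(value):
    """Write one byte as six base-3 digits, most significant first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        value, rem = divmod(value, 3)
        trits.append(rem)
    return trits[::-1]


def trits_to_byte(trits):
    """Inverse of byte_to_trits."""
    value = 0
    for t in trits:
        value = value * 3 + t
    return value


def encode(data, start="A"):
    """Map bytes to DNA; each trit picks a base different from the previous one."""
    prev = start  # virtual predecessor, not part of the output strand
    out = []
    for byte in data:
        for trit in byte_to_trits(byte):
            choices = [n for n in NUCLEOTIDES if n != prev]
            prev = choices[trit]
            out.append(prev)
    return "".join(out)


def decode(dna, start="A"):
    """Recover the bytes from an error-free strand produced by encode."""
    prev = start
    trits = []
    for base in dna:
        choices = [n for n in NUCLEOTIDES if n != prev]
        trits.append(choices.index(base))
        prev = base
    return bytes(
        trits_to_byte(trits[i:i + TRITS_PER_BYTE])
        for i in range(0, len(trits), TRITS_PER_BYTE)
    )


def has_homopolymer(dna):
    """True if any two adjacent bases are identical."""
    return any(a == b for a, b in zip(dna, dna[1:]))
```

Under these assumptions, `decode(encode(b"DNA"))` returns the original bytes and `has_homopolymer(encode(b"DNA"))` is false by construction, because a base is never allowed to repeat its predecessor. A single substitution in the strand would corrupt the decoded data, which is the kind of failure an error-tolerating module is meant to handle.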

Publication types

Review