The genetics and heredity of complex human traits have been studied for over a century, and many genes have been implicated in these traits. Genome-wide association studies (GWAS) were designed to investigate the association between common genetic variation and complex human traits using high-throughput platforms that measure hundreds of thousands of common single-nucleotide polymorphisms (SNPs). Using a univariate regression-based approach, GWAS have successfully identified many novel genetic loci associated with complex traits. Yet even for traits with a large number of identified variants, only a small fraction of the interindividual variation in risk phenotypes has been explained. In biological systems, proteins, DNA, RNA, and metabolites frequently interact with each other to perform their biological functions and to respond to environmental factors. The complex interactions among genes, and between genes and the environment, may partially explain this "missing heritability." Traditional regression-based methods are limited in their ability to address the complex interactions among hundreds of thousands of SNPs and their environmental context, because of both modeling and computational challenges. Random Forests (RF), a powerful machine learning method, is regarded as a useful alternative for capturing complex interaction effects in GWAS data and for potentially addressing the genetic heterogeneity underlying these complex traits within a computationally efficient framework. In this chapter, the prediction and variable selection features of RF, and their applications in genetic association studies, are reviewed and discussed. Further improvements to the original RF method are warranted to make its application to GWAS more successful.
Copyright © 2010 Elsevier Inc. All rights reserved.
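The limitation of univariate tests noted in the abstract can be illustrated with a small sketch. The toy cohort, the carrier coding (0/1), the exclusive-or phenotype, and the names `pearson` and `two_split_tree` are all assumptions made for illustration and do not come from the chapter. The hand-written tree is not the Random Forests algorithm itself; it only shows that a tree splitting on both SNPs can represent an interaction that single-SNP statistics miss.

```python
from itertools import product


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical balanced cohort: 25 individuals for each of the four
# carrier combinations of two SNPs (0 = non-carrier, 1 = carrier).
genotypes = [g for g in product((0, 1), repeat=2) for _ in range(25)]
snp_a = [a for a, _ in genotypes]
snp_b = [b for _, b in genotypes]

# Assumed pure interaction: affected only when exactly one SNP is carried.
phenotype = [a ^ b for a, b in genotypes]

# Univariate view: each SNP on its own is uncorrelated with the phenotype.
r_a = pearson(snp_a, phenotype)
r_b = pearson(snp_b, phenotype)


def two_split_tree(a, b):
    """Hand-written depth-two tree: split on SNP A, then on SNP B."""
    if a == 0:
        return 1 if b == 1 else 0
    return 0 if b == 1 else 1


# Joint view: the two-split tree classifies every individual correctly.
accuracy = sum(
    two_split_tree(a, b) == y for (a, b), y in zip(genotypes, phenotype)
) / len(genotypes)

print(f"marginal correlation, SNP A: {r_a:.3f}")
print(f"marginal correlation, SNP B: {r_b:.3f}")
print(f"two-split tree accuracy:     {accuracy:.2f}")
```

In this constructed case both marginal correlations are zero while the joint rule is perfectly predictive. Whether a greedily grown forest actually finds such splits in real GWAS data depends on sample size, marginal signal, and tuning, which is one reason refinements of the original method are of interest.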