CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data

Ryan Bressler; Richard B Kreisberg; Brady Bernard; John E Niederhuber; Joseph G Vockley; Ilya Shmulevich; Theo A Knijnenburg

doi:10.1371/journal.pone.0144820

PLoS One. 2015 Dec 17;10(12):e0144820. doi: 10.1371/journal.pone.0144820. eCollection 2015.

Authors

Ryan Bressler¹, Richard B Kreisberg¹, Brady Bernard¹, John E Niederhuber², Joseph G Vockley²,³, Ilya Shmulevich¹, Theo A Knijnenburg¹

Affiliations

¹ Institute for Systems Biology, Seattle, WA, United States of America.
² Inova Translational Medicine Institute, Inova Health System and Inova Fairfax Medical Center, Falls Church, VA, United States of America.
³ Virginia Commonwealth University, School of Medicine, Richmond, VA, United States of America.

Abstract

Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as dealing with unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times through effective use of the CPU cache, optimizations for different classes of features, and efficient multi-threading. CloudForest is available at https://github.com/ilyalab/CloudForest.
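As an illustration of two ideas named in the abstract, handling unbalanced classes and multi-threading, the following is a minimal Go sketch of one common approach: drawing a class-balanced bootstrap sample for each tree, with one goroutine per tree. The abstract does not say which balancing method CloudForest uses, and this is not CloudForest's API; the function names `balancedBootstrap` and `samplesForTrees` and all parameters are hypothetical, chosen only for this example.

```go
package main

import (
	"math/rand"
	"sort"
	"sync"
)

// balancedBootstrap (hypothetical helper) draws perClass case indices,
// with replacement, from every class, so each class contributes equally
// to the sample regardless of its frequency in the data.
// It assumes labels is non-empty.
func balancedBootstrap(labels []string, perClass int, rng *rand.Rand) []int {
	// Group case indices by class label.
	byClass := make(map[string][]int)
	for i, l := range labels {
		byClass[l] = append(byClass[l], i)
	}

	// Visit classes in sorted order so a fixed seed gives a reproducible sample
	// (Go map iteration order is randomized).
	classes := make([]string, 0, len(byClass))
	for c := range byClass {
		classes = append(classes, c)
	}
	sort.Strings(classes)

	sample := make([]int, 0, perClass*len(classes))
	for _, c := range classes {
		members := byClass[c]
		for k := 0; k < perClass; k++ {
			sample = append(sample, members[rng.Intn(len(members))])
		}
	}
	return sample
}

// samplesForTrees (hypothetical helper) builds one balanced bootstrap
// sample per tree, running each draw in its own goroutine. Every goroutine
// owns a separate random source and writes only to its own slot of out,
// so no locking is needed.
func samplesForTrees(labels []string, perClass, nTrees int, seed int64) [][]int {
	out := make([][]int, nTrees)
	var wg sync.WaitGroup
	for t := 0; t < nTrees; t++ {
		wg.Add(1)
		go func(t int) {
			defer wg.Done()
			rng := rand.New(rand.NewSource(seed + int64(t)))
			out[t] = balancedBootstrap(labels, perClass, rng)
		}(t)
	}
	wg.Wait()
	return out
}
```

For example, with one "case" and five "control" labels, `samplesForTrees(labels, 3, 4, 1)` would return four samples of six indices each, three pointing at the single case and three at controls. A real implementation would then grow one tree from each sample.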

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Classification
Computational Biology / methods*
Data Interpretation, Statistical
Programming Languages
Regression Analysis
Software
