Polycomb group (PcG) proteins are important epigenetic regulators, yet the mechanism that targets them to specific genes in mammals is still poorly understood. We have developed a computational approach to predict genome-wide PcG target genes in mouse embryonic stem cells. We use transcription factor (TF) binding and motif information as predictors and apply the Bayesian Additive Regression Trees (BART) model for classification. The model achieves good prediction accuracy, most of which can be explained by five TF features (Zf5, Tcfcp2l1, Ctcf, E2f1, Myc). Our analysis of H3K27me3 and gene expression data suggests that genomic sequence is highly correlated with overall PcG target plasticity. We have also compared PcG target sequence signatures between mouse and Drosophila and found them to be strikingly different. Our predictions may be useful for de novo searches for Polycomb response elements (PREs) in mammals.
Copyright (c) 2010 Elsevier Inc. All rights reserved.