Abnormal DNA methylation is a major early contributor to the development of colon adenocarcinoma (COAD). We conducted a cohort-based systematic investigation of genome-wide DNA methylation using 299 COAD and 38 normal tissue samples from The Cancer Genome Atlas (TCGA). Through conditional screening and machine learning in a training cohort, we identified one hypomethylated and nine hypermethylated differentially methylated CpG sites as potential diagnostic biomarkers and used them to construct a COAD-specific diagnostic model. Unlike previous models, our model precisely distinguished COAD from nine other cancer types (e.g., breast cancer and liver cancer; error rate ≤ 0.05) and from normal tissues in the training cohort (AUC = 1). The diagnostic model was verified in a TCGA validation cohort (AUC = 1) and in five independent cohorts from the Gene Expression Omnibus (AUC ≥ 0.951). Using Cox regression analyses, we established a prognostic model based on six CpG sites in the training cohort and verified it in the validation cohort. The prognostic model predicted patient survival (p ≤ 0.00011, AUC ≥ 0.792) independently of important clinicopathological characteristics of COAD (e.g., gender and age). Thus, our DNA methylation analysis provides precise biomarkers and models for the early diagnosis and prognostic evaluation of COAD.
Keywords: COAD; DMP; diagnosis; pan-cancer; prognosis.