One challenge in engineering biological systems is recognising the cellular stress states of bacterial hosts, as these states can lead to suboptimal growth and lower yields of target products. To enable the design of genetic circuits that report or mitigate stress states, it is important to identify a relatively small set of gene biomarkers that reliably indicate relevant cellular growth states in bacteria. Recent advances in high-throughput omics technologies have enhanced the identification of molecular biomarkers for specific states in bacteria, motivating computational methods that can identify robust biomarkers for experimental characterisation and verification. Focusing on gene expression biomarkers that sense various stress states in Bacillus subtilis, this study aimed to design a knowledge integration strategy for selecting a robust biomarker panel that generalises to external datasets and experiments. We developed a recommendation system that ranks candidate biomarker panels using complementary information from a machine learning model, a gene regulatory network (GRN) and a co-expression network (CEN). The recommended biomarker panel showed high stress sensing power across a variety of conditions, both in the dataset used for biomarker identification (mean F1-score of 0.99) and in a range of independent datasets (mean F1-score of 0.98). We found a significant correlation between stress sensing power and evaluation metrics such as the number of associated regulators in a B. subtilis GRN and the number of associated modules in a B. subtilis CEN. GRNs and CENs provide information on the diversity of biological processes encoded by biomarker genes.
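The abstract does not say how the recommendation system combines its three information sources. The Python sketch below is therefore a toy illustration built on our own assumptions. It is not the authors' method. Each gene is assumed to map to a set of regulators (GRN) and to a single co-expression module (CEN). A panel is assumed to be characterised by its F1-score, its number of distinct regulators and its number of distinct modules, and panels are assumed to be ranked by averaging their per-metric ranks. All gene, regulator, module and panel names and all scores are hypothetical. The sketch also computes Spearman's rank correlation between F1-score and regulator count, which is one possible way to quantify the kind of correlation reported.

```python
from statistics import mean

def num_regulators(panel, grn):
    """Number of distinct regulators (GRN) targeting any gene in the panel."""
    return len(set().union(*(grn.get(g, set()) for g in panel)))

def num_modules(panel, cen):
    """Number of distinct co-expression modules (CEN) covered by the panel's genes."""
    return len({cen[g] for g in panel if g in cen})

def rank_desc(values):
    """Rank values so the largest gets rank 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = rank_desc(x), rank_desc(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def recommend(panels, f1_scores, grn, cen):
    """Order panels best-first by mean rank over F1, #regulators and #modules.

    Assumption: equal-weight rank averaging is an illustrative choice only.
    """
    names = list(panels)
    metrics = [
        [f1_scores[n] for n in names],
        [num_regulators(panels[n], grn) for n in names],
        [num_modules(panels[n], cen) for n in names],
    ]
    ranks = [rank_desc(m) for m in metrics]
    score = {n: mean([r[i] for r in ranks]) for i, n in enumerate(names)}
    return sorted(names, key=lambda n: score[n])

# Hypothetical toy data (not from the study).
grn = {"g1": {"R1"}, "g2": {"R1", "R2"}, "g3": {"R3"},
       "g4": {"R1"}, "g5": {"R4"}, "g6": {"R2"}}
cen = {"g1": "M1", "g2": "M2", "g3": "M3", "g4": "M1", "g5": "M4", "g6": "M2"}
panels = {"P1": ["g1", "g4"], "P2": ["g2", "g3", "g5"], "P3": ["g1", "g6"]}
f1_scores = {"P1": 0.80, "P2": 0.95, "P3": 0.85}  # e.g. classifier F1 per panel

print(recommend(panels, f1_scores, grn, cen))  # ['P2', 'P3', 'P1']
f1 = [f1_scores[n] for n in panels]
regs = [num_regulators(panels[n], grn) for n in panels]
print(round(spearman(f1, regs), 3))  # 1.0
```

With only three toy panels the correlation is purely illustrative. A real analysis would need many candidate panels and a significance test, for example `scipy.stats.spearmanr`, which also returns a p-value.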
We demonstrate that quantitatively relating meaningful evaluation metrics to stress sensing power has the potential to identify biomarkers with better sensitivity and robustness across an extended set of stress conditions, enabling more reliable biomarker panel selection.
Keywords: Biomarker discovery; Machine learning; Systems biology; Transcriptomics analysis.
© 2022 The Authors.