Computational Strategies for the Identification of a Transcriptional Biomarker Panel to Sense Cellular Growth States in Bacillus subtilis

Yiming Huang; Wendy Smith; Colin Harwood; Anil Wipat; Jaume Bacardit

doi:10.3390/s21072436

Sensors (Basel). 2021 Apr 1;21(7):2436. doi: 10.3390/s21072436.

Authors

Yiming Huang¹, Wendy Smith¹, Colin Harwood², Anil Wipat¹, Jaume Bacardit¹

Affiliations

¹ Interdisciplinary Computing and Complex BioSystems (ICOS) Group, School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
² Centre for Bacterial Cell Biology, Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.

Abstract

A goal of the biotechnology industry is to recognise detrimental cellular states that may lead to suboptimal or anomalous growth in a bacterial population. Our current knowledge of how different environmental treatments modulate gene regulation and bring about physiological adaptations is limited, and hence it is difficult to determine the mechanisms that lead to their effects. Patterns of gene expression, revealed using technologies such as microarrays or RNA-seq, can provide useful biomarkers of the gene regulatory states that indicate a bacterium's physiological status. It is desirable to use only a few key genes as biomarkers, because a small panel opens the way for methods such as quantitative RT-PCR and amplicon panels and so reduces the cost of determining the transcriptional state. In this paper, we used unsupervised machine learning to construct a transcriptional landscape model from condition-dependent transcriptome data, from which we identified 10 clusters of samples that have differentiated gene expression profiles and are linked to different cellular growth states. Using an iterative feature elimination strategy, we identified a minimal panel of 10 biomarker genes that achieved 100% cross-validation accuracy in predicting the cluster assignment. Moreover, we designed and evaluated a variety of data processing strategies to ensure that our methods generated meaningful transcriptional landscape models that capture relevant biological processes. Overall, the computational strategies introduced in this study facilitate the identification of a detailed set of relevant cellular growth states and show how to sense them using a reduced biomarker panel.
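
The two-stage workflow summarised above (cluster the samples without labels, then shrink the gene set that predicts the cluster assignment) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the classifier, or the elimination schedule, so everything here is an assumption made for illustration: k-means stands in for the unsupervised step, a random forest ranks genes by importance, 20% of the remaining genes are dropped per round, and the expression matrix is synthetic. The helper names `make_toy_expression` and `eliminate_features` are hypothetical and do not come from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def make_toy_expression(n_samples=120, n_genes=60, n_states=3,
                        n_informative=6, seed=0):
    """Synthetic samples-by-genes matrix (illustrative data, not from the paper)."""
    rng = np.random.default_rng(seed)
    states = rng.integers(0, n_states, size=n_samples)
    X = rng.normal(0.0, 1.0, size=(n_samples, n_genes))
    # Only the first n_informative genes shift with the hidden state.
    offsets = rng.normal(0.0, 4.0, size=(n_states, n_informative))
    X[:, :n_informative] += offsets[states]
    return X


def eliminate_features(X, y, panel_size, drop_fraction=0.2, seed=0):
    """Repeatedly refit a classifier and drop the least important genes."""
    kept = np.arange(X.shape[1])
    while kept.size > panel_size:
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        model.fit(X[:, kept], y)
        # Drop a fraction of the remaining genes, but never overshoot the target.
        n_drop = max(1, int(kept.size * drop_fraction))
        n_drop = min(n_drop, kept.size - panel_size)
        order = np.argsort(model.feature_importances_)  # ascending importance
        kept = np.sort(kept[order[n_drop:]])
    return kept


# Step 1 (assumed algorithm): derive cluster labels without supervision.
X = make_toy_expression()
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: reduce the gene set to a small panel that predicts the clusters.
panel = eliminate_features(X, clusters, panel_size=5)

# Step 3: estimate how well the reduced panel predicts cluster assignment.
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X[:, panel], clusters, cv=5,
)
print("panel gene indices:", panel)
print("mean cross-validation accuracy:", scores.mean())
```

In this sketch the panel is selected on all samples before cross-validation, which can make the accuracy estimate optimistic; nesting the selection inside each fold avoids that at the cost of extra computation.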

Keywords: Bacillus subtilis; biomarker identification; machine learning; transcriptional landscape.

MeSH terms

Bacillus subtilis* / genetics
Biomarkers
Gene Expression Profiling*
Microarray Analysis

Substances

Biomarkers

Grants and funding

EP/N031962/1/Engineering and Physical Sciences Research Council