Sampling and ranking spatial transcriptomics data embeddings to identify tissue architecture

Yu Lin; Yan Wang; Yanchun Liang; Yang Yu; Jingyi Li; Qin Ma; Fei He; Dong Xu

doi:10.3389/fgene.2022.912813

Front Genet. 2022 Aug 12:13:912813. doi: 10.3389/fgene.2022.912813. eCollection 2022.

Authors

Yu Lin¹ ², Yan Wang¹ ³, Yanchun Liang³ ⁴, Yang Yu⁵, Jingyi Li⁵, Qin Ma⁶, Fei He⁵, Dong Xu²

Affiliations

¹ School of Artificial Intelligence, Jilin University, Changchun, China.
² Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States.
³ Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.
⁴ School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China.
⁵ School of Information Science and Technology, Northeast Normal University, Changchun, China.
⁶ Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States.

Abstract

Spatial transcriptomics is an emerging technology widely applied to the analysis of tissue architecture and the corresponding biological functions. Numerous computational methods have been developed for analyzing spatial transcriptomics data. These methods generate embeddings from gene expression and spatial locations for spot clustering or tissue architecture segmentation. Although the hyperparameters used to produce an embedding can be tuned for a given training set, a fixed embedding performs differently from case to case because data distributions vary. Therefore, selecting an effective embedding for new data in advance would be useful. For this purpose, we developed an embedding evaluation method named message passing-Moran's I with maximum filtering (MP-MIM), which combines message passing-based embedding transformation with spatial autocorrelation analysis. We applied a graph convolution to aggregate spatial transcriptomics data and employed global Moran's I to measure spatial autocorrelation and select the most effective embedding for inferring tissue architecture. Sixteen spatial transcriptomics samples generated from the human brain were used to validate our method. The results show that MP-MIM can accurately identify high-quality embeddings that produce a high correlation between the predicted tissue architecture and the ground truth. Overall, our study provides a novel method for selecting embeddings for new test data and enhances the usability of deep learning tools for spatial transcriptome analyses.

Keywords: deep learning; embedding evaluation; message passing; spatial autocorrelation; spatial transcriptomics; tissue architecture.
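
The scoring idea summarized in the abstract (aggregate each spot's embedding over its spatial neighbours, then measure spatial autocorrelation with global Moran's I) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The k-nearest-neighbour spot graph, the single mean-aggregation step, the function names, and the choice to summarize an embedding by the maximum Moran's I over its dimensions are all assumptions made for illustration; the abstract does not specify these details. Only the Moran's I formula, I = (N / S0) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², is standard.

```python
import numpy as np


def knn_adjacency(coords, k=4):
    """Binary, symmetrised k-nearest-neighbour graph over spot coordinates.

    Assumption: the abstract does not say how the spot graph is built.
    """
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a spot is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(w, w.T)                 # make the graph undirected


def message_pass(x, w):
    """One mean-aggregation step with self-loops: row-normalised (W + I) @ X."""
    a = w + np.eye(w.shape[0])
    return (a / a.sum(axis=1, keepdims=True)) @ x


def morans_i(x, w):
    """Global Moran's I of every column of x under spatial weights w.

    Columns must not be constant, otherwise the denominator is zero.
    """
    z = x - x.mean(axis=0)
    numerator = (z * (w @ z)).sum(axis=0)     # sum_ij w_ij * z_i * z_j
    denominator = (z ** 2).sum(axis=0)
    return (x.shape[0] / w.sum()) * numerator / denominator


def score_embedding(x, coords, k=4):
    """Hypothetical score: largest Moran's I among the aggregated dimensions."""
    w = knn_adjacency(coords, k)
    return morans_i(message_pass(x, w), w).max()


if __name__ == "__main__":
    # Toy data: a 10 x 10 grid of spots and two candidate 2-D embeddings.
    gx, gy = np.meshgrid(np.arange(10), np.arange(10))
    coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
    rng = np.random.default_rng(0)
    candidates = {
        "smooth": coords.copy(),                    # varies smoothly in space
        "noise": rng.normal(size=coords.shape),     # no spatial structure
    }
    ranked = sorted(candidates,
                    key=lambda name: score_embedding(candidates[name], coords),
                    reverse=True)
    print(ranked)
```

Under these assumptions, candidate embeddings for a new sample would be ranked by `score_embedding` and the top-ranked one passed on to clustering. Note that mean aggregation by itself raises the autocorrelation of any signal, so scores are only meaningful when compared across embeddings that were processed identically.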

Grants and funding

R35 GM126985/GM/NIGMS NIH HHS/United States