Clarion is a multi-label problem transformation method for identifying mRNA subcellular localizations

Brief Bioinform. 2022 Nov 19;23(6):bbac467. doi: 10.1093/bib/bbac467.

Abstract

Subcellular localization of messenger RNAs (mRNAs) plays a key role in the spatial regulation of gene activity. The functions of mRNAs have been shown to be closely linked with their localizations. As such, understanding of the subcellular localizations of mRNAs can help elucidate gene regulatory networks. Despite several computational methods that have been developed to predict mRNA localizations within cells, there is still much room for improvement in predictive performance, especially for the multiple-location prediction. In this study, we proposed a novel multi-label multi-class predictor, termed Clarion, for mRNA subcellular localization prediction. Clarion was developed based on a manually curated benchmark dataset and leveraged the weighted series method for multi-label transformation. Extensive benchmarking tests demonstrated Clarion achieved competitive predictive performance and the weighted series method plays a crucial role in securing superior performance of Clarion. In addition, the independent test results indicate that Clarion outperformed the state-of-the-art methods and can secure accuracy of 81.47, 91.29, 79.77, 92.10, 89.15, 83.74, 80.74, 79.23 and 84.74% for chromatin, cytoplasm, cytosol, exosome, membrane, nucleolus, nucleoplasm, nucleus and ribosome, respectively. The webserver and local stand-alone tool of Clarion is freely available at http://monash.bioweb.cloud.edu.au/Clarion/.

Keywords: mRNA; machine learning; multi-class classification; multi-label prediction; sequence analysis; subcellular localization.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Cell Nucleus* / genetics
  • Computational Biology / methods
  • Databases, Protein
  • Proteins*
  • RNA, Messenger / genetics

Substances

  • Proteins
  • RNA, Messenger