Automated annotation of functional imaging experiments via multi-label classification

Matthew D Turner; Chayan Chakrabarti; Thomas B Jones; Jiawei F Xu; Peter T Fox; George F Luger; Angela R Laird; Jessica A Turner

doi:10.3389/fnins.2013.00240

Front Neurosci. 2013 Dec 16;7:240. doi: 10.3389/fnins.2013.00240. eCollection 2013.

Authors

Matthew D Turner¹, Chayan Chakrabarti², Thomas B Jones², Jiawei F Xu², Peter T Fox³, George F Luger², Angela R Laird⁴, Jessica A Turner⁵

Affiliations

¹ Department of Computer Science, University of New Mexico, Albuquerque, NM, USA; Mind Research Network, Albuquerque, NM, USA; Conjectural Systems, Atlanta, GA, USA.
² Department of Computer Science, University of New Mexico, Albuquerque, NM, USA.
³ Research Imaging Center, University of Texas Health Science Center, San Antonio, TX, USA.
⁴ Department of Physics, Florida International University, Miami, FL, USA.
⁵ Mind Research Network, Albuquerque, NM, USA; Conjectural Systems, Atlanta, GA, USA; Department of Psychology and Neuroscience Institute, Georgia State University, Atlanta, GA, USA.

Abstract

Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert's annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text.
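
As an illustration of the approach named in the abstract, the following is a minimal sketch of a binary relevance transformation combined with naive Bayes, scored by exact match. It is not the authors' implementation: the abstract mentions only off-the-shelf software components, so the class names, the whitespace tokenizer, the add-one smoothing, the toy abstracts, and the label names (`visual`, `auditory`, `button_press`) are all assumptions made for this example and are not CogPO terms taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no other pre-processing.
    return text.lower().split()


class BinaryNB:
    """Multinomial naive Bayes for one yes/no label, with add-one smoothing."""

    def fit(self, docs, targets):
        self.vocab = {w for d in docs for w in d}
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for doc, target in zip(docs, targets):
            self.counts[target].update(doc)
            self.n_docs[target] += 1
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def _log_score(self, doc, c):
        n_all = self.n_docs[True] + self.n_docs[False]
        score = math.log((self.n_docs[c] + 1) / (n_all + 2))  # smoothed prior
        denom = self.totals[c] + len(self.vocab)
        for w in doc:
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.counts[c][w] + 1) / denom)
        return score

    def predict(self, doc):
        return self._log_score(doc, True) > self._log_score(doc, False)


class BinaryRelevance:
    """Binary relevance: one independent yes/no classifier per label."""

    def fit(self, texts, label_sets):
        docs = [tokenize(t) for t in texts]
        self.labels = sorted(set().union(*label_sets))
        self.models = {
            label: BinaryNB().fit(docs, [label in s for s in label_sets])
            for label in self.labels
        }
        return self

    def predict(self, text):
        doc = tokenize(text)
        return {label for label in self.labels if self.models[label].predict(doc)}


def exact_match(predicted, expected):
    """Fraction of documents whose predicted label set equals the expert's set."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Hypothetical toy corpus; the labels are illustrative only.
TRAIN_TEXTS = [
    "subjects viewed faces and pressed a button",
    "participants heard tones and pressed a button",
    "subjects viewed words silently",
    "participants heard speech passively",
]
TRAIN_LABELS = [
    {"visual", "button_press"},
    {"auditory", "button_press"},
    {"visual"},
    {"auditory"},
]

if __name__ == "__main__":
    model = BinaryRelevance().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(sorted(model.predict("subjects viewed pictures and pressed a button")))
```

Exact match is a strict criterion: an abstract counts as correct only if every one of its labels is predicted and no extra label is added. This strictness may help explain why reported exact match figures can be low even when individual per-label classifiers perform reasonably.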

Keywords: CogPO; annotations; bioinformatics; data mining; multi-label classification; neuroimaging; text mining.
