Learning anchor verbs for biological interaction patterns from published text articles

Int J Med Inform. 2002 Dec 4;67(1-3):19-32. doi: 10.1016/s1386-5056(02)00054-0.

Abstract

Much of knowledge modeling in the molecular biology domain involves interactions between proteins, genes, various forms of RNA, small molecules, etc. Interactions between these substances are typically extracted and codified manually, which increases the cost and time of modeling and substantially limits the coverage of the resulting knowledge base. In this paper, we describe an automatic system that learns interaction verbs from text; these verbs can then form the core of automatically retrieved patterns that model classes of biological interactions. We investigate text features relating verbs to genes and proteins, and apply statistical tests and a logistic regression model to determine whether a given verb belongs to the class of interaction verbs. Our system, AVAD, achieves over 87% precision and 82% recall when tested on an 11-million-word corpus of journal articles. In addition, we compare the automatically obtained results with a manually constructed database of interaction verbs and show that the automatic approach can significantly enrich the manual list by detecting rarer interaction verbs that were omitted from the database.
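
The classification step described in the abstract (a logistic regression over per-verb text features, scored by precision and recall) can be illustrated with a minimal Python sketch. It is not the AVAD implementation: the abstract does not specify the features, so the two used here (how often a verb occurs near a gene or protein name, and how often it sits between two such names) are hypothetical, the feature values and labels are invented toy data, and the plain gradient-descent fit stands in for whatever estimation procedure the authors used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights (bias stored last) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x, threshold=0.5):
    # True if the verb is classified as an interaction verb.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) >= threshold

def precision_recall(predicted, actual):
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented toy data: verb -> ([near_gene_rate, between_genes_rate], label).
# Both features and all values are hypothetical, chosen only for illustration.
TOY_VERBS = {
    "bind":          ([0.90, 0.70], 1),
    "phosphorylate": ([0.95, 0.80], 1),
    "inhibit":       ([0.80, 0.60], 1),
    "activate":      ([0.85, 0.65], 1),
    "show":          ([0.30, 0.05], 0),
    "suggest":       ([0.25, 0.05], 0),
    "report":        ([0.10, 0.02], 0),
    "use":           ([0.20, 0.10], 0),
}

X = [features for features, _ in TOY_VERBS.values()]
y = [label for _, label in TOY_VERBS.values()]

w = train_logistic(X, y)
preds = [predict(w, x) for x in X]
precision, recall = precision_recall(preds, [bool(label) for label in y])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Scoring on the training verbs, as done here, only checks that the sketch runs end to end; a meaningful estimate of precision and recall would need held-out verbs, as in the corpus evaluation the abstract reports.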

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Artificial Intelligence*
  • Databases as Topic
  • Humans
  • Information Storage and Retrieval / methods*
  • Logistic Models
  • Models, Statistical
  • Molecular Biology*
  • Natural Language Processing*
  • Periodicals as Topic
  • Protein Interaction Mapping
  • Vocabulary