Assemble the shallow or integrate a deep? Toward a lightweight solution for glyph-aware Chinese text classification

Jingrui Hou; Ping Wang

doi:10.1371/journal.pone.0289204

PLoS One. 2023 Jul 28;18(7):e0289204. doi: 10.1371/journal.pone.0289204. eCollection 2023.

Authors

Jingrui Hou¹, Ping Wang²,³

Affiliations

¹ Department of Computer Science, School of Science, Loughborough University, Loughborough, Leicestershire, United Kingdom.
² Center for the Studies of Information Resources, Wuhan University, Wuhan, Hubei, China.
³ School of Information Management, Wuhan University, Wuhan, Hubei, China.

Abstract

Because hieroglyphic languages such as Chinese differ from alphabetic languages, researchers have long been interested in using internal glyph features to enhance semantic representation. However, the models used in such studies are becoming increasingly computationally expensive, even for simple tasks like text classification. In this paper, we aim to balance model performance and computational cost in glyph-aware Chinese text classification tasks. To this end, we propose a lightweight ensemble learning method for glyph-aware Chinese text classification (LEGACT) that consists of typical shallow networks as base learners and machine learning classifiers as meta-learners. Through model design and a series of experiments, we demonstrate that an ensemble approach integrating shallow neural networks can achieve results comparable to those of large-scale transformer models. The contributions of this paper include a lightweight yet powerful solution for glyph-aware Chinese text classification and empirical evidence of the significance of glyph features for hieroglyphic language representation. Moreover, this paper emphasizes the importance of assembling shallow neural networks with proper ensemble strategies to reduce computational workload in predictive tasks.
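
The base-learner / meta-learner arrangement described in the abstract can be illustrated with a small stacking sketch. This is not the LEGACT implementation: the paper's base learners are shallow neural networks, whereas the hypothetical `CentroidClassifier` below is a toy nearest-centroid stand-in, the "text" and "glyph" feature views are synthetic, and the meta-learner is trained on a simple hold-out split rather than whatever ensemble strategy the authors use. It only shows how base-learner class probabilities could be concatenated into meta-features for a second-stage classifier.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CentroidClassifier:
    """Toy nearest-centroid classifier (hypothetical stand-in for a real model).

    `view` lists the feature indices this learner may see; None means all.
    """

    def __init__(self, view=None):
        self.view = view

    def _select(self, x):
        return x if self.view is None else [x[i] for i in self.view]

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = []
        for c in self.classes_:
            rows = [self._select(x) for x, label in zip(X, y) if label == c]
            self.centroids_.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, x):
        v = self._select(x)
        # Closer centroid -> higher (less negative) score.
        scores = [-sum((a - b) ** 2 for a, b in zip(v, cen))
                  for cen in self.centroids_]
        return softmax(scores)

    def predict(self, x):
        probs = self.predict_proba(x)
        return self.classes_[probs.index(max(probs))]


def meta_features(base_learners, x):
    """Concatenate every base learner's class probabilities for one sample."""
    feats = []
    for learner in base_learners:
        feats.extend(learner.predict_proba(x))
    return feats


def make_data(n, rng):
    """Synthetic two-class data: features 0-1 play the 'text' view,
    features 2-3 the 'glyph' view (both are assumptions for illustration)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(4.0 * label, 0.5) for _ in range(4)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_base, y_base = make_data(100, rng)   # trains the base learners
X_meta, y_meta = make_data(100, rng)   # trains the meta-learner
X_test, y_test = make_data(100, rng)   # held-out evaluation

base_learners = [
    CentroidClassifier(view=[0, 1]).fit(X_base, y_base),  # "text" view
    CentroidClassifier(view=[2, 3]).fit(X_base, y_base),  # "glyph" view
]

# The meta-learner only ever sees base-learner outputs, never raw features.
Z_meta = [meta_features(base_learners, x) for x in X_meta]
meta_learner = CentroidClassifier().fit(Z_meta, y_meta)


def ensemble_predict(x):
    return meta_learner.predict(meta_features(base_learners, x))


accuracy = sum(ensemble_predict(x) == t
               for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In a realistic setting each base learner would be replaced by a trained shallow network over text or glyph inputs, and the meta-learner by a conventional classifier fitted on their stacked outputs; the data flow would stay the same.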

Copyright: © 2023 Hou, Wang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Grants and funding

This work was financially supported by the National Natural Science Foundation of China under grant number 72074171. The funder did not participate in the study design, data collection and analysis, decision to publish, or in the preparation of the manuscript.