Large language models have been shown to excel at many tasks across disciplines and research contexts, and they offer novel opportunities to enhance educational research and instruction, for example in assessment. However, these models also have fundamental limitations, including hallucinated knowledge, limited explainability of model decisions, and high resource expenditure. Consequently, more conventional machine learning algorithms may be preferable for specific research problems because they give researchers greater control over the modeling process. Yet the circumstances under which conventional machine learning or large language models are the better choice are not well understood. This study investigates to what extent conventional machine learning algorithms or a recently advanced large language model perform better in assessing students' concept use in a physics problem-solving task. We found that the conventional machine learning algorithms, used in combination, outperformed the large language model. We then analyzed the models' decisions through closer examination of their classifications. We conclude that, in specific contexts, conventional machine learning can supplement large language models, especially when labeled data are available.
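The abstract does not specify which conventional algorithms were combined, so the following is a minimal sketch only, not the authors' actual pipeline: one common way to combine conventional classifiers for labeling student text is a soft-voting ensemble (here, logistic regression, an SVM, and a random forest over TF-IDF features, via scikit-learn). The example responses, labels, and model choices are all hypothetical.

```python
# Minimal sketch (assumed setup, not the study's actual pipeline):
# a soft-voting ensemble of conventional classifiers over TF-IDF
# features for labeling concept use in student responses.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

# Hypothetical labeled data: 1 = target physics concept used, 0 = not used.
texts = [
    "The kinetic energy converts into potential energy as the ball rises.",
    "The ball just slows down on its own near the top.",
]
labels = [1, 0]

ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("svm", SVC(probability=True)),  # probability=True enables soft voting
            ("rf", RandomForestClassifier(n_estimators=200)),
        ],
        voting="soft",  # average predicted class probabilities across models
    ),
)

ensemble.fit(texts, labels)
print(ensemble.predict(["Energy is conserved as the cart rolls down the ramp."]))
```

With labeled data available, such an ensemble can be trained, inspected, and validated with full control over features and decision boundaries, which is the kind of transparency the abstract contrasts with large language models.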
Keywords: explainable AI; large language models; machine learning; natural language processing; problem solving.