Automated Scoring of the Speech Intelligibility Test Using Autoscore

Am J Speech Lang Pathol. 2024 Dec 12:1-12. doi: 10.1044/2024_AJSLP-24-00276. Online ahead of print.

Abstract

Purpose: This study developed and tested extensions to Autoscore, an automated approach for scoring listener transcriptions against target stimuli, so that it can score the Speech Intelligibility Test (SIT), a widely used test for quantifying intelligibility in individuals with dysarthria.

Method: Three main extensions to Autoscore were created: a compound rule, a contractions rule, and a numbers rule. We used two sets of previously collected listener SIT transcripts (N = 4,642) from databases of dysarthric speakers to evaluate the accuracy of the Autoscore SIT extensions. A human scorer and SIT-extended Autoscore each scored the sentence transcripts in both data sets. Scoring performance was determined by (a) comparing Autoscore and human scores using intraclass correlations (ICCs) at the individual sentence and speaker levels and (b) comparing SIT-extended Autoscore performance to that of the original Autoscore using ICCs.
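The abstract names the three rules but does not define them. As an illustration only, the sketch below assumes that each rule normalizes equivalent spellings before words are matched: contractions are expanded, digits are spelled out, and split compounds are joined. The lookup tables, the function names `normalize` and `score`, and the matching strategy are all hypothetical and are not taken from Autoscore itself. Note that this sketch counts matches over normalized words, so an expanded contraction counts as two words.

```python
import re

# Hypothetical lookup tables (assumptions for illustration; the abstract
# does not list the actual entries used by SIT-extended Autoscore).
CONTRACTIONS = {"don't": "do not", "it's": "it is", "i'm": "i am"}
NUMBERS = {"1": "one", "2": "two", "3": "three", "4": "four", "5": "five"}
COMPOUNDS = {"base ball": "baseball", "some one": "someone"}


def normalize(text, use_rules=True):
    """Lowercase, strip punctuation, and optionally apply the three rules."""
    text = re.sub(r"[^a-z0-9' ]", " ", text.lower())
    if not use_rules:
        return text.split()
    words = []
    for word in text.split():
        word = CONTRACTIONS.get(word, word)  # contractions rule (assumed form)
        word = NUMBERS.get(word, word)       # numbers rule (assumed form)
        words.extend(word.split())
    joined = " ".join(words)
    for split_form, joined_form in COMPOUNDS.items():  # compound rule (assumed form)
        joined = re.sub(r"\b" + re.escape(split_form) + r"\b", joined_form, joined)
    return joined.split()


def score(target, response, use_rules=True):
    """Count target words found in the response; each response word is used once."""
    remaining = normalize(response, use_rules)
    correct = 0
    for word in normalize(target, use_rules):
        if word in remaining:
            remaining.remove(word)
            correct += 1
    return correct


target = "Don't drop the baseball at 2"
response = "do not drop the base ball at two"
print(score(target, response, use_rules=False))  # 3
print(score(target, response, use_rules=True))   # 7
```

Under these assumptions, a transcript that differs from the target only in spelling conventions is no longer penalized, which is the kind of disagreement with a human scorer that such rules would be expected to reduce.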

Results: At both the individual sentence and speaker levels, scores from Autoscore and the human scorer were nearly identical for both Data Set 1 (ICC = .9922 and ICC = .9767, respectively) and Data Set 2 (ICC = .9934 and ICC = .9946, respectively). Where disagreements between Autoscore and the human scorer occurred, the differences were often small (i.e., within 1 or 2 points). Across the two data sets (N = 4,642 sentences), SIT-extended Autoscore produced 510 disagreements with the human scorer (vs. 571 disagreements for the original Autoscore).
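The abstract does not state which ICC form was used. As a minimal sketch, the code below assumes a two-way random-effects, absolute-agreement, single-rater ICC (often written ICC(2,1)) and also converts the reported disagreement counts into rates. The function names `icc2_1` and `disagreement_rate` and the example scores are hypothetical; only the counts 510, 571, and 4,642 come from the abstract.

```python
def icc2_1(ratings):
    """ICC(2,1) for a list of rows, one row per item and one column per rater.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)      # number of items (sentences or speakers)
    k = len(ratings[0])   # number of raters (here: human and Autoscore)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def disagreement_rate(disagreements, total):
    """Share of sentences on which two scorers gave different scores."""
    return disagreements / total


# Made-up scores for illustration: [human, automated] words correct per sentence.
example = [[5, 5], [7, 6], [3, 3], [9, 9], [6, 7], [8, 8]]
print(round(icc2_1(example), 4))

# Counts reported in the abstract.
print(f"{disagreement_rate(510, 4642):.1%}")  # 11.0% (SIT-extended Autoscore)
print(f"{disagreement_rate(571, 4642):.1%}")  # 12.3% (original Autoscore)
```

On the reported counts, the extensions removed 61 of the 571 disagreements between the original Autoscore and the human scorer.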

Discussion: Overall, SIT-extended Autoscore performed as well as human scorers and substantially improved scoring accuracy relative to the original version of Autoscore. Together with the substantial time and effort savings that Autoscore provides, the extensions developed and tested here strengthen its utility.