Purpose: Patient-reported outcomes (PROs) are widely used in clinical trials, epidemiological research, quality of life (QOL) studies, routine clinical care, and medical surveillance. The Patient-Reported Outcomes Measurement Information System (PROMIS) is a set of reliable, standardized PRO measures developed with item response theory (IRT) and scored on a latent scale. Power estimation is critical to the design of clinical trials and other research studies. However, in clinical trials with PROs as endpoints, power is often calculated from observed scores rather than latent scores.
Methods: We conducted a series of simulations to compare the power obtained with three scoring approaches: Bayesian IRT latent scores, frequentist IRT latent scores, and observed scores. We focused on the small sample sizes common in pilot studies and Phase I/II trials. Taking the PROMIS depression measures as an example, we simulated data and estimated power for two-armed clinical trials while manipulating sample size, effect size, and number of items. We also examined how misspecification of the effect size affected power estimation.
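As an illustration of the kind of simulation described above, the following is a minimal sketch of Monte Carlo power estimation for the observed-score approach only. It is not the authors' code. It assumes a graded response model with five response categories, and the item parameters are hypothetical random draws rather than PROMIS depression calibrations. The function names, the Welch t-test, and all default values are assumptions made for the example. The IRT-based approaches would replace the sum score with a latent score estimate before testing.

```python
import numpy as np
from scipy import stats


def simulate_grm(theta, a, b, rng):
    """Draw graded response model item responses.

    theta: (n,) latent scores; a: (J,) discriminations;
    b: (J, K-1) thresholds sorted within each item.
    Returns an (n, J) integer array with values in 0..K-1.
    """
    # Cumulative probabilities P(X >= k | theta) for k = 1..K-1
    z = a[None, :, None] * (theta[:, None, None] - b[None, :, :])
    p_ge = 1.0 / (1.0 + np.exp(-z))
    # One uniform draw per person-item; the response is the number of
    # cumulative probabilities that exceed the draw.
    u = rng.random((theta.size, a.size, 1))
    return (u < p_ge).sum(axis=2)


def observed_score_power(n_per_arm, effect_size, n_items=8,
                         n_reps=500, alpha=0.05, seed=0):
    """Estimate power of a two-arm comparison of observed sum scores."""
    rng = np.random.default_rng(seed)
    # Hypothetical item parameters (assumption, not PROMIS values)
    a = rng.uniform(1.5, 3.0, n_items)
    b = np.sort(rng.normal(0.5, 1.0, (n_items, 4)), axis=1)

    rejections = 0
    for _ in range(n_reps):
        # The treatment arm is shifted by the effect size on the latent scale
        theta_c = rng.normal(0.0, 1.0, n_per_arm)
        theta_t = rng.normal(-effect_size, 1.0, n_per_arm)
        y_c = simulate_grm(theta_c, a, b, rng).sum(axis=1)
        y_t = simulate_grm(theta_t, a, b, rng).sum(axis=1)
        # Welch two-sample t-test on the observed sum scores
        _, p = stats.ttest_ind(y_c, y_t, equal_var=False)
        rejections += p < alpha
    return rejections / n_reps


if __name__ == "__main__":
    for d in (0.0, 0.5, 0.8):
        print(d, observed_score_power(n_per_arm=20, effect_size=d))
```

Setting the effect size to zero gives the empirical type I error rate, which is a useful check before comparing power across sample sizes or item counts.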
Results: Bayesian IRT, which incorporates prior information into latent score estimation, yielded the highest power, especially when the sample size was small. The effect of effect-size misspecification on power estimation diminished as sample size increased.
Conclusion: For power estimation in two-armed clinical trials with standardized PRO endpoints, if a medium or larger effect size is expected, we recommend Bayesian IRT simulation with well-grounded informative priors and a total sample size of at least 40.
Keywords: Bayesian IRT; Clinical trials; Misspecification; PRO; Power; Small sample.
© 2025. The Author(s), under exclusive licence to Springer Nature Switzerland AG.