Observer studies involving detection and localization: modeling, analysis, and validation

Dev P Chakraborty; Kevin S Berbaum

doi:10.1118/1.1769352


Med Phys. 2004 Aug;31(8):2313-30. doi: 10.1118/1.1769352.

Authors

Dev P Chakraborty¹, Kevin S Berbaum

Affiliation

¹ Department of Radiology, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA. dpc10@pitt.edu

PMID: 15377098

Abstract

Although the receiver operating characteristic (ROC) paradigm is the accepted method for evaluating diagnostic imaging systems, it has serious shortcomings because it is restricted to one observer report per image. By contrast, the free-response ROC (FROC) paradigm and its associated analysis method allow the observer to report multiple abnormalities within each imaging study, and they use the locations of the reported abnormalities to improve the measurement. Because the ROC method cannot accommodate multiple responses or use location information, its statistical power suffers. The FROC paradigm and its analysis have not enjoyed widespread acceptance because of concern about whether responses made to the same diagnostic study can be treated as independent. We propose a new jackknife FROC analysis method (JAFROC) that does not make the independence assumption. The new analysis method combines elements of the FROC and Dorfman-Berbaum-Metz (DBM) methods. To compare JAFROC with an earlier free-response analysis method (specifically, the alternative free-response, or AFROC, method) and with the DBM method, which uses conventional ROC scoring, we developed a model for generating simulated FROC data. The simulation model is based on an eye-movement model of how experts evaluate images, and it allowed us to examine the null hypothesis (NH) behavior and statistical power of the different methods. We found that AFROC analysis failed the NH test because it was unduly conservative. Both the JAFROC and DBM methods passed the NH test, but JAFROC had greater statistical power than DBM. These results suggest that future studies of diagnostic performance may achieve greater statistical power, or require smaller sample sizes, by using the JAFROC method.
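The difference between conventional ROC scoring and location-aware free-response scoring can be illustrated with a minimal Python sketch. It assumes, rather than reproduces from this abstract, a JAFROC-style figure of merit: a Wilcoxon-type statistic that compares each lesion's rating with the highest-rated false positive on each normal image, with unmarked lesions and unmarked normal images given a rating of -inf and ties counted as one-half. It also assumes that the "inferred" ROC rating of an image is its highest mark, and that jackknife pseudovalues are formed by deleting one case at a time, as in DBM. The functions `jafroc_fom`, `inferred_roc_auc`, and `pseudovalues` and the toy ratings are hypothetical.

```python
# Minimal sketch (assumptions, not the authors' code): FROC marks are stored per
# case as (false_positive_ratings, lesion_ratings); a normal case has no lesions,
# and an unmarked lesion is recorded with rating -inf.
NEG_INF = float("-inf")


def psi(x, y):
    """Wilcoxon kernel: 1 if y is rated above x, 0.5 on a tie, 0 otherwise."""
    if y > x:
        return 1.0
    if y == x:
        return 0.5
    return 0.0


def wilcoxon(xs, ys):
    """Average kernel value over all (x, y) pairs."""
    return sum(psi(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def jafroc_fom(cases):
    """Assumed JAFROC-style figure of merit: each lesion rating is compared with
    the highest-rated false positive on each normal image (-inf if unmarked).
    False positives on abnormal images are ignored in this sketch."""
    xs = [max(fps, default=NEG_INF) for fps, lesions in cases if not lesions]
    ys = [r for _, lesions in cases for r in lesions]
    return wilcoxon(xs, ys)


def inferred_roc_auc(cases):
    """Conventional ROC scoring: one rating per image, taken here as the
    highest mark on the image regardless of where it was placed."""
    def image_rating(fps, lesions):
        return max(list(fps) + list(lesions), default=NEG_INF)
    xs = [image_rating(fps, les) for fps, les in cases if not les]
    ys = [image_rating(fps, les) for fps, les in cases if les]
    return wilcoxon(xs, ys)


def pseudovalues(fom, cases):
    """Case-deletion jackknife pseudovalues: n*theta - (n-1)*theta_(-k)."""
    n = len(cases)
    theta = fom(cases)
    return [n * theta - (n - 1) * fom(cases[:k] + cases[k + 1:])
            for k in range(n)]


# Hypothetical toy data: three normal and two abnormal cases.
cases = [
    ([2.0], []),            # normal, one false positive rated 2
    ([], []),               # normal, no marks
    ([1.0], []),            # normal, one false positive rated 1
    ([], [3.0, NEG_INF]),   # abnormal, one lesion marked at 3, one missed
    ([4.0], [2.5]),         # abnormal, false positive at 4, lesion at 2.5
]

print(round(jafroc_fom(cases), 4))   # 0.7222
print(inferred_roc_auc(cases))       # 1.0
print([round(p, 4) for p in pseudovalues(jafroc_fom, cases)])
```

In this toy example the inferred-ROC score is perfect even though one lesion was missed and the highest mark on the second abnormal image was a false positive, whereas the location-aware figure of merit is lower. In a DBM-style analysis, pseudovalues like these would then enter an analysis of variance across readers and modalities; that step is not implemented here.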

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms*
Humans
Mammography / methods
Models, Theoretical*
Observer Variation
Reproducibility of Results
Software*
