Objective: Large-scale receiver operating characteristic (ROC) studies are expensive and time-consuming. If most of the difference in diagnostic accuracy occurs in a subset of subtle cases, considerable effort could be saved by restricting comparisons to this subset. We investigate the effect of subtle cases on diagnostic accuracy, the magnitude of error that can occur because of an imbalance of subtle cases in two groups, and the potential for sample size reductions if only subtle cases are used.
Methods: Data from a previous study of posteroanterior chest radiographs were reanalyzed separately for subsets of typical cases and subsets of subtle cases. Actually positive and actually negative cases were classified as subtle or typical and as difficult or easy for diagnosis of the specific abnormality. The area under the ROC curve (Az) was used as the measure of diagnostic accuracy. Pairwise comparisons among three techniques were made for the detection of nodules and of interstitial disease.
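As an illustration of how an area-under-the-curve index can be computed separately for case subsets, the sketch below uses the empirical (Mann-Whitney) area rather than the Az estimate used in the study, which it only approximates. The function name and all rating values are hypothetical and are not data from the study.

```python
def empirical_auc(negative_scores, positive_scores):
    """Empirical ROC area: the probability that a randomly chosen positive
    case is rated higher than a randomly chosen negative case, with ties
    counted as one half (the Mann-Whitney statistic, normalized)."""
    if not negative_scores or not positive_scores:
        raise ValueError("both groups need at least one rating")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical 5-point confidence ratings, invented for illustration only.
typical_negatives, typical_positives = [1, 2, 1, 2], [4, 5, 5, 4]
subtle_negatives, subtle_positives = [1, 2, 3, 2], [2, 3, 4, 3]

auc_typical = empirical_auc(typical_negatives, typical_positives)
auc_subtle = empirical_auc(subtle_negatives, subtle_positives)
print(f"typical subset: {auc_typical:.3f}, subtle subset: {auc_subtle:.3f}")
```

Computing the index per subset in this way makes the gap between typical and subtle cases explicit, which pooling all cases into a single figure would hide.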
Results: The performance index (Az) was significantly lower (≥ 25%) for the subset of subtle cases than for the subset of typical cases. The difference in observer performance between two techniques was more often greater in the subset of subtle cases than in the subset of typical cases.
Conclusion: The difference in diagnostic accuracy between the subset of typical cases and the subset of subtle cases is large enough that a difference in the proportion of subtle cases in two samples could result in clinically significant false differences in observer performance. Furthermore, the generally larger difference observed in the group of subtle cases suggests that sample sizes for some experiments could be reduced by 45-90% if the experiment were restricted to subtle cases.
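The link between a larger observed difference and a smaller required sample can be sketched with the standard approximation that required sample size scales with the inverse square of the difference to be detected. This is a simplified illustration, not the calculation used in the study: it assumes the variance of the Az estimate is the same in both case sets, which may not hold for subtle cases, and the function names are hypothetical.

```python
import math


def sample_size_reduction(effect_ratio: float) -> float:
    """Fractional reduction in required sample size when the detectable
    difference grows by `effect_ratio`, assuming n is proportional to
    1 / difference**2 and the variance is unchanged (an assumption)."""
    if effect_ratio <= 0:
        raise ValueError("effect_ratio must be positive")
    return 1.0 - 1.0 / effect_ratio ** 2


def effect_ratio_for_reduction(reduction: float) -> float:
    """Inverse relation: how many times larger the difference must be
    to permit a given fractional reduction in sample size."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return 1.0 / math.sqrt(1.0 - reduction)


# Difference ratios implied by the reductions quoted in the conclusion,
# under the equal-variance assumption stated above.
for reduction in (0.45, 0.90):
    ratio = effect_ratio_for_reduction(reduction)
    print(f"{reduction:.0%} reduction needs a difference {ratio:.2f}x larger")
```

Under this approximation, a difference twice as large in the subtle subset would correspond to a 75% smaller sample, since 1 - 1/2² = 0.75.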