Background: Several studies, limited by small sample sizes and by the selection of difficult cases for review, have reported substantial variability among radiologists in the interpretation of mammographic examinations. In the largest study to date, we determined intraobserver and interobserver agreement in interpreting screening mammography, and the accuracy of mammography, by use of the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS).
Methods: The mammographic examinations were randomly selected on the basis of original mammographic interpretation and cancer outcome from 71,713 screening examinations performed by the Mobile Mammography Screening Program of the University of California, San Francisco, during the period from April 1985 through February 1995. The final sample included 786 abnormal examinations with no cancer detected, 267 abnormal examinations with cancer detected, and 1563 normal examinations. Films were read separately by two radiologists according to BI-RADS. Cancer status was determined by contacting women's physicians and by linkage to the regional Surveillance, Epidemiology, and End Results Program.
Results: There was moderate agreement between radiologists in reporting the presence of a finding when cancer was present (kappa = 0.54) and substantial agreement when cancer was not present (kappa = 0.62). Agreement in assigning one of the five assessment categories was moderate but was statistically significantly lower when cancer was present than when it was not (kappa = 0.46 versus 0.56; two-sided P = .02). Agreement in reporting the presence of a finding and in mammographic assessment was twofold more likely for examinations of less dense breasts. Agreement was higher on repeat readings by the same radiologists than between radiologists. The sensitivity of mammography was lower with BI-RADS than with the original system for mammographic interpretation, but the positive predictive value of mammography was higher.
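For readers unfamiliar with the kappa statistic reported above, the following is a minimal Python sketch of Cohen's kappa for two readers. The function names and the reader data are hypothetical and chosen only for illustration. The descriptive bands follow the commonly used Landis and Koch scale, which is an assumption here: the abstract's wording ("moderate", "substantial") is consistent with that scale but does not name it, nor does it state how kappa was computed in the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same cases (illustrative sketch)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(ratings_a)

    # Observed agreement: fraction of cases given the same category by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Descriptive band on the Landis and Koch scale (assumed, not stated in the source)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical readings for 100 examinations (not data from the study).
reader_a = ["finding"] * 50 + ["none"] * 50
reader_b = ["finding"] * 40 + ["none"] * 10 + ["finding"] * 10 + ["none"] * 40

kappa = cohen_kappa(reader_a, reader_b)
print(f"kappa = {kappa:.2f}")
```

In this hypothetical example the two readers agree on 80 of 100 examinations, but half of that agreement would be expected by chance given their marginal rates, which is why kappa is well below the raw agreement. Under the assumed scale, `landis_koch_label(0.54)` gives "moderate" and `landis_koch_label(0.62)` gives "substantial", matching the wording used for those values above.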
Conclusion: Considerable variability in interpretation of mammographic examinations exists; this variability and the accuracy of mammography are neither improved nor diminished with use of BI-RADS.