Objective: Comparisons of the performance of multiple health care providers are often based on hypothesis tests, with providers whose P-values fall below some critical threshold being identified as potentially extreme. Because of the multiple testing involved, the classical P-value threshold of, say, 0.05 may not be considered strict enough, as it will tend to lead to too many "false positives." However, we argue that the commonly used Bonferroni-corrected threshold is in general too strict for the problem at hand. The purpose of this article is to demonstrate a suitable alternative thresholding procedure that is already well established in other fields.
Study design and setting: The suggested procedure involves control of an error measure called the "false discovery rate" (FDR). We present a worked example involving a comparison of risk-adjusted mortality rates following heart surgery in New York State hospitals during 2000-2002. It is shown that the FDR critical threshold lines can be drawn on a "funnel plot," providing a simple graphical presentation of the results.
Results: The FDR procedure identified more providers as potentially extreme than the Bonferroni correction, while maintaining control of an intuitively sensible error measure.
Conclusion: Control of the FDR offers a simple guideline for determining where to draw critical thresholds when comparing multiple health care providers.
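
The contrast between the two thresholding rules can be illustrated with a minimal sketch. The abstract does not name a specific FDR-controlling procedure, so the code below assumes the Benjamini-Hochberg step-up rule, which is the most widely used one. The function names and the P-values are hypothetical and are not taken from the New York State data.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Flag providers whose P-value is at or below alpha / m (Bonferroni)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_flags(p_values, alpha=0.05):
    """Flag providers using the Benjamini-Hochberg step-up rule.

    Assumption: this is the FDR procedure intended; the abstract only
    says "control of the FDR".
    """
    m = len(p_values)
    # Indices of the providers, ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank

    # Every provider ranked at or below the cutoff is flagged.
    flags = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            flags[idx] = True
    return flags


# Hypothetical P-values for ten providers (illustrative only).
p_values = [0.0004, 0.003, 0.009, 0.018, 0.04, 0.2, 0.45, 0.6, 0.8, 0.95]

n_bonferroni = sum(bonferroni_flags(p_values))
n_fdr = sum(benjamini_hochberg_flags(p_values))
print("Bonferroni flags:", n_bonferroni)
print("FDR (BH) flags:", n_fdr)
```

With these made-up values, the Bonferroni threshold of 0.05/10 = 0.005 flags two providers, whereas the assumed step-up rule flags four, because its threshold rises with the rank of the ordered P-value instead of staying fixed at the strictest level.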