Categorizing prognostic variables is essential for their use in clinical decision-making. Often a single cutpoint that stratifies patients into high-risk and low-risk categories is sought. These categories may be used for making treatment recommendations, determining study eligibility, or controlling for varying patient prognoses in the design of a clinical trial. Methods used to categorize variables include: biological determination (most desirable but often unavailable); arbitrary selection of a cutpoint at the median value; graphical examination of the data for a threshold effect; and exploration of all observed values for the one that best separates the risk groups according to a chi-squared test. The last method, called the minimum p-value approach, involves multiple testing, which inflates the type I error rate. Several methods for adjusting the inflated p-values have been proposed but remain infrequently used. Exploratory methods for categorization and the minimum p-value approach with its various p-value corrections are reviewed, and code for their easy implementation is provided. The combined use of these methods is recommended and demonstrated in the context of two cancer-related examples that highlight a variety of the issues involved in the categorization of prognostic variables.
Copyright 2000 John Wiley & Sons, Ltd.
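The minimum p-value approach described in the abstract can be illustrated with a minimal Python sketch. It is not the code provided with the article. The function names (`chi2_p_2x2`, `min_p_cutpoint`, `bonferroni`), the binary event outcome, the 10% minimum group size, and the toy data are all assumptions made for illustration. The Bonferroni adjustment is used only as a simple, conservative stand-in and is not necessarily one of the corrections the article reviews.

```python
import math


def chi2_p_2x2(a, b, c, d):
    """Pearson chi-squared p-value (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 1.0  # degenerate table: no evidence of separation
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def min_p_cutpoint(values, events, min_frac=0.10):
    """Try every observed value as a cutpoint (low: value <= cut,
    high: value > cut) and return (best_cut, min_p, n_tests).

    min_frac is a hypothetical safeguard: cutpoints that leave fewer
    than min_frac of the patients in either group are skipped.
    """
    n = len(values)
    min_size = max(1, math.ceil(min_frac * n))
    best_cut, best_p, n_tests = None, 1.0, 0
    for cut in sorted(set(values)):
        low = [e for v, e in zip(values, events) if v <= cut]
        high = [e for v, e in zip(values, events) if v > cut]
        if len(low) < min_size or len(high) < min_size:
            continue
        n_tests += 1
        p = chi2_p_2x2(sum(low), len(low) - sum(low),
                       sum(high), len(high) - sum(high))
        if best_cut is None or p < best_p:
            best_cut, best_p = cut, p
    return best_cut, best_p, n_tests


def bonferroni(p, n_tests):
    """Conservative adjustment: multiply by the number of cutpoints tested."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Made-up toy data: marker values and a binary event indicator.
    marker = [2.1, 3.4, 3.9, 4.2, 5.0, 5.5, 6.1, 6.8, 7.3, 8.0, 8.6, 9.2]
    event = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
    cut, p_min, k = min_p_cutpoint(marker, event)
    print("cutpoint:", cut)
    print("unadjusted minimum p-value:", p_min)
    print("Bonferroni-adjusted p-value over", k, "tests:", bonferroni(p_min, k))
```

Reporting the unadjusted minimum p-value alone would overstate the evidence, because it is the smallest of many correlated tests. The adjusted value is the one to compare against the nominal significance level.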