Machine Learning and External Validation of the IDENTIFY Risk Calculator for Patients with Haematuria Referred to Secondary Care for Suspected Urinary Tract Cancer

Eur Urol Focus. 2024 Jun 21:S2405-4569(24)00093-2. doi: 10.1016/j.euf.2024.06.004. Online ahead of print.

Abstract

Background: The IDENTIFY study developed a model to predict urinary tract cancer using patient characteristics from a large multicentre, international cohort of patients referred with haematuria. In addition to calculating an individual's cancer risk, it proposes thresholds to stratify patients into very-low-risk (<1%), low-risk (1-<5%), intermediate-risk (5-<20%), and high-risk (≥20%) groups.
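The proposed thresholds can be illustrated with a minimal sketch that maps a predicted probability to a risk group. The function name `risk_group` is hypothetical and is not part of the IDENTIFY calculator; only the cut-offs (1%, 5%, and 20%) come from the text above.

```python
def risk_group(probability: float) -> str:
    """Map a predicted cancer probability (0 to 1) to a risk group.

    Cut-offs follow the thresholds quoted in the abstract:
    <1%, 1-<5%, 5-<20%, and >=20%. The function itself is illustrative.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < 0.01:
        return "very-low-risk"
    if probability < 0.05:
        return "low-risk"
    if probability < 0.20:
        return "intermediate-risk"
    return "high-risk"


if __name__ == "__main__":
    for p in (0.005, 0.03, 0.12, 0.35):
        print(f"{p:.3f} -> {risk_group(p)}")
```

Each lower bound is inclusive and each upper bound exclusive, so a predicted risk of exactly 5% falls in the intermediate-risk group.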

Objective: To externally validate the IDENTIFY haematuria risk calculator and compare traditional regression with machine learning algorithms.

Design, setting, and participants: Prospective data were collected on patients referred to secondary care with new haematuria. Data were collected for patient variables included in the IDENTIFY risk calculator, cancer outcome, and TNM staging. Machine learning methods were used to evaluate whether better models than those developed with traditional regression methods existed.

Outcome measurements and statistical analysis: The area under the receiver operating characteristic curve (AUC) for the detection of urinary tract cancer, calibration slope, calibration in the large (CITL), and Brier score were determined.
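For readers unfamiliar with these measures, the sketch below shows one common way to compute three of them from binary outcomes and predicted risks, using only the Python standard library. It is not the authors' analysis code. All function names are hypothetical, and CITL is assumed here to be the intercept of a logistic model that takes the logit of the predicted risk as a fixed offset. The calibration slope is omitted for brevity.

```python
import math


def auc(outcomes, risks):
    """AUC as the probability that a case is ranked above a non-case.

    Ties count as half. Pairwise comparison is O(n^2), adequate for a sketch.
    """
    cases = [r for y, r in zip(outcomes, risks) if y == 1]
    controls = [r for y, r in zip(outcomes, risks) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


def brier_score(outcomes, risks):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((r - y) ** 2 for y, r in zip(outcomes, risks)) / len(outcomes)


def _logit(p, eps=1e-12):
    # Clip to avoid infinite values at exactly 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def calibration_in_the_large(outcomes, risks, iterations=50, tol=1e-10):
    """Intercept of a logistic model with logit(risk) as a fixed offset.

    Solved by one-dimensional Newton iteration. A value of 0 means the
    average predicted risk matches the observed event rate. The outcomes
    must include both cases and non-cases for the iteration to converge.
    """
    offsets = [_logit(r) for r in risks]
    intercept = 0.0
    for _ in range(iterations):
        fitted = [_sigmoid(intercept + o) for o in offsets]
        gradient = sum(y - f for y, f in zip(outcomes, fitted))
        curvature = sum(f * (1.0 - f) for f in fitted)
        step = gradient / curvature
        intercept += step
        if abs(step) < tol:
            break
    return intercept


if __name__ == "__main__":
    y = [0, 0, 1, 1]          # toy outcomes, not study data
    p = [0.1, 0.4, 0.35, 0.8]  # toy predicted risks, not study data
    print(auc(y, p), brier_score(y, p), calibration_in_the_large(y, p))
```

A positive CITL under this definition indicates that predicted risks are, on average, lower than the observed event rate, and a negative value indicates the reverse.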

Results and limitations: There were 3582 patients in the validation cohort. The development and validation cohorts were well matched. The AUC of the IDENTIFY risk calculator on the validation cohort was 0.78. This improved to 0.80 in a subanalysis restricted to urothelial cancer-prevalent countries, with a calibration slope of 1.04, CITL of 0.24, and Brier score of 0.14. The best machine learning model was Random Forest, which achieved an AUC of 0.76 on the validation cohort. There were no cancers stratified to the very-low-risk group in the validation cohort. Most cancers were stratified to the intermediate- and high-risk groups, with more aggressive cancers in higher-risk groups.

Conclusions: The IDENTIFY risk calculator performed well at predicting cancer in patients referred with haematuria on external validation. This tool can be used by urologists to better counsel patients on their cancer risks, to prioritise diagnostic resources for appropriate patients, and to avoid unnecessary invasive procedures in those with a very low risk of cancer.

Patient summary: We previously developed a calculator that predicts patients' risk of cancer when they have blood in their urine, based on their personal characteristics. We have validated this risk calculator by testing it on a separate group of patients to ensure that it works as expected. Most patients found to have cancer were in the higher-risk groups, and cancers in the higher-risk groups tended to be more aggressive. This tool can be used by clinicians to fast-track high-risk patients based on the calculator and investigate them more thoroughly.

Keywords: Bladder cancer; Cancer risk; Haematuria; Prediction; Predictive model; Renal cancer; Risk calculator; Upper tract urothelial cancer; Urinary tract cancer; Validation.