Introduction
Monitoring the clinical performance of healthcare units (e.g. hospitals, surgeons) is a central component of national audits, enabling the identification of ‘outlier’ units whose clinical performance, e.g. in-hospital mortality, deviates significantly from expected performance. Accurate detection and subsequent management of outliers are critical for improving healthcare quality.
Two frequently implemented statistical frameworks for outlier detection are the Common Mean Model (CMM) and Random-Effects Logistic Regression (RELR). Our study uses simulation to evaluate their performance when their underlying distributional assumptions are violated, and provides recommendations for their appropriate use.
Methods
CMM assumes that the probability of death is the same in all units, attributing any differences to random variation. As the observed variability is often larger than expected (overdispersion), CMM is applied with an overdispersion correction. Outliers are detected using test-statistics based on differences between observed and expected unit probabilities. In contrast, RELR uses test-statistics based on estimated random effects on the logit scale. Both methods assume that the test-statistics for in-control units follow a normal distribution, but on different scales: the probability scale for CMM and the logit scale for RELR. Because the logit function is non-linear, the two assumptions cannot hold simultaneously unless the outcome prevalence is near 0.5, where the logit is approximately linear.
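As an illustration of the CMM test-statistic described above, the following is a minimal sketch rather than the study's implementation. It assumes a multiplicative overdispersion correction (dividing each z-score by the square root of the mean squared z-score, floored at 1); the abstract does not state which correction was used, and the function name `cmm_z_scores` is hypothetical.

```python
import math

def cmm_z_scores(deaths, sizes):
    """Overdispersion-adjusted CMM z-scores on the probability scale.

    Assumption (not from the abstract): a multiplicative correction,
    phi = mean of squared unadjusted z-scores, floored at 1.
    """
    # Common mean: pooled event probability across all units
    p0 = sum(deaths) / sum(sizes)
    # Unadjusted z: (observed - expected proportion) / binomial standard error
    z = [(d / n - p0) / math.sqrt(p0 * (1 - p0) / n)
         for d, n in zip(deaths, sizes)]
    # Estimated overdispersion factor
    phi = max(1.0, sum(v * v for v in z) / len(z))
    return [v / math.sqrt(phi) for v in z]

# Three hypothetical units of 100 patients each
z = cmm_z_scores([5, 10, 15], [100, 100, 100])
print([round(v, 3) for v in z])
```

A unit would then be flagged as a ‘bad’ or ‘good’ outlier when its adjusted z-score falls above or below a chosen normal critical value.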
We simulated scenarios varying the number of units, unit sizes, outcome prevalences, and between-unit variability. Two data-generating mechanisms (DGMs) were used, based on CMM and RELR, respectively. For each method, we assessed the overall false positive rate (FPR) and, separately, the FPRs for ‘good’ (low mortality) and ‘bad’ (high mortality) outliers. We further evaluated how well QQ plots performed in selecting the method whose normality assumption was better satisfied in each scenario.
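One cell of such a simulation might be sketched as follows. This is an assumption-laden illustration, not the study's code: unit effects are drawn from a normal distribution on the logit scale (an RELR-type DGM), CMM z-scores are computed with an assumed multiplicative overdispersion correction, and the tail-specific FPRs are the shares of units flagged at a two-sided 5% level. All function names and parameter values are hypothetical, and `prevalence` here denotes the event probability of a unit with zero random effect.

```python
import math
import random

def cmm_z_scores(deaths, sizes):
    # CMM z-scores with an assumed multiplicative overdispersion correction
    p0 = sum(deaths) / sum(sizes)
    z = [(d / n - p0) / math.sqrt(p0 * (1 - p0) / n)
         for d, n in zip(deaths, sizes)]
    phi = max(1.0, sum(v * v for v in z) / len(z))
    return [v / math.sqrt(phi) for v in z]

def simulate_relr_units(n_units, unit_size, prevalence, sigma, seed=0):
    """Draw unit-level deaths from a random-intercept logistic model."""
    rng = random.Random(seed)
    mu = math.log(prevalence / (1 - prevalence))  # logit of baseline probability
    deaths = []
    for _ in range(n_units):
        u = rng.gauss(0.0, sigma)                 # unit effect on the logit scale
        p = 1.0 / (1.0 + math.exp(-(mu + u)))     # unit-specific probability
        deaths.append(sum(rng.random() < p for _ in range(unit_size)))
    return deaths, [unit_size] * n_units

def tail_fprs(z, crit=1.96):
    """Share of units flagged as 'good' (low) and 'bad' (high) outliers."""
    n = len(z)
    return sum(v < -crit for v in z) / n, sum(v > crit for v in z) / n

# All simulated units are in-control by construction, so every flag is a
# false positive; the nominal rate is 2.5% in each tail.
deaths, sizes = simulate_relr_units(1000, 200, prevalence=0.02, sigma=0.5, seed=1)
fpr_good, fpr_bad = tail_fprs(cmm_z_scores(deaths, sizes))
print(f"FPR good: {fpr_good:.3f}, FPR bad: {fpr_bad:.3f}")
```

Repeating this over many replicates and over a grid of unit counts, unit sizes, prevalences, and values of `sigma` would give the kind of scenario comparison described above.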
Results
Both methods maintained the nominal overall FPR. However, FPRs for good and bad outliers deviated from nominal levels when the DGM and outlier-detection method were misaligned. Under low outcome prevalence, applying CMM to RELR-DGM data caused severe over-detection of bad outliers and under-detection of good outliers (and vice versa). These discrepancies grew as unit size decreased and between-unit variability increased. Application to real datasets with low prevalence showed consistent patterns. Quantifying departures from normality in QQ plots was effective in identifying the most appropriate method for a given scenario.
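The abstract does not say how departures from normality in the QQ plots were quantified. One possible summary, shown here purely as an assumption, is the correlation between the ordered test-statistics and the corresponding standard normal quantiles, with values closer to 1 indicating better agreement. The function name `qq_correlation` is hypothetical.

```python
import math
import random
import statistics

def qq_correlation(z):
    """Correlation between sorted values and standard normal quantiles."""
    n = len(z)
    nd = statistics.NormalDist()
    # Theoretical quantiles at plotting positions (i + 0.5) / n
    theo = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    obs = sorted(z)
    mo, mt = sum(obs) / n, sum(theo) / n
    cov = sum((o - mo) * (t - mt) for o, t in zip(obs, theo))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_t = sum((t - mt) ** 2 for t in theo)
    return cov / math.sqrt(var_o * var_t)

# Illustrative comparison: a symmetric sample versus a right-skewed one
rng = random.Random(0)
normal_like = [rng.gauss(0, 1) for _ in range(500)]
skewed = [rng.expovariate(1.0) for _ in range(500)]
print(qq_correlation(normal_like), qq_correlation(skewed))
```

Under this sketch, the test-statistics from CMM and RELR would each be passed to `qq_correlation`, and the method with the higher value would be preferred for that dataset.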
Conclusion
Violating the normality assumption in CMM or RELR can have serious implications, potentially leading to unfair scrutiny of healthcare units or failure to detect underperformance. The most appropriate method can be chosen through checks of the distribution of test-statistics.