18–21 May 2026
Europe/Warsaw timezone

How many subgroup analyses are false (-positives or negatives)? – Evidence from p-value distributions of interaction tests and mixture models in diabetes research

21 May 2026, 16:39
18m
Room 13 A

Speaker

Oliver Kuss (German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich Heine University Düsseldorf, Institute for Biometrics and Epidemiology)

Description

Subgroup analyses are frequently reported results from randomized trials. They help to identify heterogeneity in the average treatment effect, which occurs when this average effect varies across different categories of a subgroup factor, like age, sex, or disease severity. If treatment effects are different across subgroups, this information can help to personalize treatment decisions. However, the limitations of subgroup analyses are also well known. Since each subgroup analysis involves a separate statistical test, performing many of them increases the likelihood of finding false-positive results, that is, statistically significant results for actually true null hypotheses (type I error). On the other hand, subgroups have smaller sample sizes than the overall trial population, which means that they often lack the statistical power to detect a true subgroup effect, leading to a higher risk of false-negative results (type II error). Despite the high risk of false findings in subgroup analyses, there is surprisingly little empirical research quantifying the actual proportion of false subgroup analyses.

We revisit an approach for estimating this proportion that was developed more than 20 years ago for microarray analyses but has apparently never been applied to subgroup analyses in clinical research. The basic building blocks for our analyses are the p-values from the subgroups' interaction tests. These p-values are assumed to originate from two component distributions: the first describes the true null hypotheses (yielding a uniform distribution of p-values), and the second describes the false null hypotheses, that is, the true subgroup effects. The two distributions are combined in a 2-component mixture model. We fit the model to estimate the proportion of true null hypotheses and the parameters of the second distribution, the one for p-values from false null hypotheses. From the same fit, the proportions of false-positive and false-negative results are estimated as predictive values, treating the subgroup interaction test as a standard diagnostic test.
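The two-component idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used beta-uniform form, in which the density of a p-value is pi0 + (1 - pi0) * a * p^(a - 1) with 0 < a < 1, it uses a crude grid search in place of a proper optimizer, it takes a significance level of 0.05, and it runs on simulated p-values rather than the trial data. All function names (`simulate_p_values`, `fit_bum`, `false_result_proportions`) are hypothetical.

```python
import math
import random


def simulate_p_values(n, pi0, a, seed=0):
    """Draw n p-values from pi0 * Uniform(0, 1) + (1 - pi0) * Beta(a, 1)."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n):
        u = rng.random()
        if rng.random() < pi0:
            p_values.append(u)               # true null: uniform p-value
        else:
            p_values.append(u ** (1.0 / a))  # false null: inverse CDF of Beta(a, 1)
    return p_values


def fit_bum(p_values, grid_size=19):
    """Maximum-likelihood estimate of (pi0, a) by grid search (assumed, simplified)."""
    # Work with log p-values; clip to avoid log(0).
    logs = [math.log(min(max(p, 1e-12), 1.0)) for p in p_values]
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]  # 0.05, ..., 0.95
    best = None
    for pi0 in grid:
        for a in grid:
            # Mixture density: pi0 * 1 + (1 - pi0) * a * p^(a - 1)
            ll = sum(
                math.log(pi0 + (1.0 - pi0) * a * math.exp((a - 1.0) * lp))
                for lp in logs
            )
            if best is None or ll > best[0]:
                best = (ll, pi0, a)
    return best[1], best[2]


def false_result_proportions(pi0, a, alpha=0.05):
    """Treat the interaction test as a diagnostic test and return
    (1 - positive predictive value, 1 - negative predictive value)."""
    power = alpha ** a  # P(p <= alpha | false null) under Beta(a, 1)
    false_positive = pi0 * alpha / (pi0 * alpha + (1.0 - pi0) * power)
    false_negative = (1.0 - pi0) * (1.0 - power) / (
        (1.0 - pi0) * (1.0 - power) + pi0 * (1.0 - alpha)
    )
    return false_positive, false_negative


# Usage on simulated data (assumed true values: pi0 = 0.8, a = 0.3).
p_values = simulate_p_values(1000, pi0=0.8, a=0.3, seed=1)
pi0_hat, a_hat = fit_bum(p_values)
fp_hat, fn_hat = false_result_proportions(pi0_hat, a_hat)
print(f"pi0 = {pi0_hat:.2f}, a = {a_hat:.2f}, "
      f"false positives = {fp_hat:.2f}, false negatives = {fn_hat:.2f}")
```

Here the false-positive proportion is the share of significant interaction tests whose null hypothesis is true, and the false-negative proportion is the share of non-significant tests whose null hypothesis is false. In a real analysis the grid search would be replaced by a numerical optimizer, and uncertainty in the estimates would need to be quantified.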

In our motivating example, we collected 292 p-values of interaction tests from 17 large randomized trials, utilizing data from 141,695 study participants. We further introduce some new distributions with varying numbers of parameters to extend the initially proposed restricted beta-uniform mixture (BUM) model. Depending on the mixture model, the proportion of false-positive results lies between 53% and 60%, and the proportion of false-negative results between 13% and 25%, signaling that exaggerating subgroup effects is a more serious problem than missing them in diabetes research.

Author

Oliver Kuss (German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich Heine University Düsseldorf, Institute for Biometrics and Epidemiology)

Co-author

Tim Mori (German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich Heine University Düsseldorf, Institute for Biometrics and Epidemiology)
