Introduction:
Machine learning (ML) validation studies can often be tackled with standard statistical inference methods, i.e. confidence intervals and statistical tests. While this is reasonable in many situations, there are also conditions under which the usual assumption of independent and identically distributed (IID) observations is not met, and operating characteristics (coverage probability, type I error rate) may deteriorate as a result. For instance, hierarchical data structures (multiple medical images per patient, multiple cells per blood sample, …) may introduce dependencies between (lower-level) observations.
Methods:
We applied a flexible hierarchical bootstrap approach, which has been proposed before in other contexts, to the outlined problem (1, 2). In this approach, sampling with replacement is conducted sequentially for each level of the (assumed) hierarchical data structure (e.g. first patients, then medical images per patient). To compare confidence intervals based on different variants of this approach and on the traditional bootstrap with regard to relevant operating characteristics (coverage probability, average width), we conducted a simulation study in the ADEMP framework (3). We investigated different data-generating mechanisms in the context of ML validation studies.
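The sequential resampling described above can be illustrated with a minimal Python sketch. It shows only one possible variant, under assumptions not stated in the abstract: two levels (clusters such as patients, then observations within each drawn cluster), resampling with replacement at both levels, and a percentile interval. The names `hierarchical_bootstrap_ci` and `accuracy`, and all parameters, are hypothetical and chosen for illustration.

```python
import numpy as np


def hierarchical_bootstrap_ci(y_true, y_pred, groups, metric,
                              n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI from a two-level bootstrap (illustrative variant only)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)

    # Row indices belonging to each cluster (e.g. all images of one patient).
    cluster_rows = [np.flatnonzero(groups == g) for g in np.unique(groups)]
    n_clusters = len(cluster_rows)

    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Level 1: draw clusters with replacement.
        drawn = rng.integers(0, n_clusters, size=n_clusters)
        # Level 2: draw observations with replacement within each drawn cluster.
        idx = np.concatenate([
            rng.choice(cluster_rows[c], size=cluster_rows[c].size, replace=True)
            for c in drawn
        ])
        stats[b] = metric(y_true[idx], y_pred[idx])

    # Assumption: percentile interval; other interval types are possible.
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


def accuracy(y_true, y_pred):
    """Fraction of correct predictions; any other metric could be passed instead."""
    return float(np.mean(y_true == y_pred))


# Toy data: 10 hypothetical patients with 5 images each.
groups = np.repeat(np.arange(10), 5)
y_true = np.tile([0, 1, 1, 0, 1], 10)
y_pred = y_true.copy()
y_pred[::7] = 1 - y_pred[::7]  # flip some predictions to create errors

print(hierarchical_bootstrap_ci(y_true, y_pred, groups, accuracy, n_boot=500))
```

Because the metric is passed in as a function, the same resampling scheme can be reused for other performance or error measures. Skipping the second level (keeping each drawn cluster intact) would give a cluster-only variant.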
Results:
Our simulation results indicate that the hierarchical bootstrap usually outperforms the standard bootstrap, as its coverage probability is typically much closer to the target level. The only exception is the IID case, where a hierarchical data structure is assumed but is not truly part of the data-generating process. In this scenario, the hierarchical bootstrap is rather conservative.
Discussion:
We conclude that the hierarchical bootstrap investigated in this work is valuable for ML practitioners, as it shows promising results while still being simple to apply. It is also widely applicable, as it can be used with arbitrary performance or error metrics. We currently see this method in phase 2 of the methodological research framework (4). Consequently, more extensive and diverse simulation studies will be needed in the future to better characterize and understand its operating characteristics in various scenarios.
References:
(1) Ren, Shiquan, et al. "Nonparametric bootstrapping for hierarchical data." Journal of Applied Statistics 37.9 (2010): 1487-1498.
(2) Saravanan, Varun, et al. "Application of the hierarchical bootstrap to multi-level data in neuroscience." Neurons, Behavior, Data Analysis, and Theory 3.5 (2020).
(3) Morris, Tim P., Ian R. White, and Michael J. Crowther. "Using simulation studies to evaluate statistical methods." Statistics in Medicine 38.11 (2019): 2074-2102.
(4) Heinze, Georg, et al. "Phases of methodological research in biostatistics—building the evidence base for new methods." Biometrical Journal 66.1 (2024): 2200222.