18–21 May 2026
Europe/Warsaw timezone

Living Synthetic Benchmarks: A Neutral and Cumulative Framework for Simulation Studies

21 May 2026, 16:21
18m
Room 13 A

Speaker

František Bartoš (University of Amsterdam)

Description

Simulation studies are widely used to evaluate statistical methods. However, new methods are often introduced and evaluated using data-generating mechanisms (DGMs) devised by the same authors. This coupling creates misaligned incentives, e.g., the need to demonstrate the superiority of new methods, potentially compromising the neutrality of simulation studies. Furthermore, results of simulation studies are often difficult to compare due to differences in DGMs, competing methods, and performance measures. This fragmentation can lead to conflicting conclusions, hinder methodological progress, and delay the adoption of effective methods. To address these challenges, we introduce the concept of living synthetic benchmarks. The key idea is to disentangle method and simulation study development and continuously update the benchmark whenever a new DGM, method, or performance measure becomes available. This separation benefits the neutrality of method evaluation, emphasizes the development of both methods and DGMs, and enables systematic comparisons. In this paper, we outline a blueprint for building and maintaining such benchmarks, discuss the technical and organizational challenges of implementation, and demonstrate feasibility with a prototype benchmark for publication bias adjustment methods. We conclude that living synthetic benchmarks have the potential to foster neutral, reproducible, and cumulative evaluation of methods, benefiting both method developers and users.
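The separation described above can be illustrated with a minimal sketch. It is not the authors' prototype: the class `LivingBenchmark`, its `add` and `update` methods, and the toy DGMs, methods, and measures (estimating a location parameter rather than adjusting for publication bias) are all hypothetical names chosen for illustration. The sketch assumes that each component is registered independently and that the benchmark fills in only the combinations that have not been evaluated yet.

```python
import math
import random
import statistics
from itertools import product


class LivingBenchmark:
    """Hypothetical registry: DGMs, methods, and measures are added independently."""

    def __init__(self, n_reps=200, seed=2026):
        self.registry = {"dgm": {}, "method": {}, "measure": {}}
        self.n_reps, self.seed = n_reps, seed
        self.results = {}  # (dgm, method, measure) -> score

    def add(self, kind, name, fn):
        """Register a new component and bring the benchmark up to date."""
        self.registry[kind][name] = fn
        self.update()

    def update(self):
        """Evaluate only the (DGM, method, measure) cells that are still missing."""
        reg = self.registry
        for d, m, p in product(reg["dgm"], reg["method"], reg["measure"]):
            if (d, m, p) in self.results:
                continue  # earlier results are kept, not recomputed
            # Re-seed per cell so every method sees identical data from a given DGM.
            rng = random.Random(self.seed)
            errors = []
            for _ in range(self.n_reps):
                data, truth = reg["dgm"][d](rng)
                errors.append(reg["method"][m](data) - truth)
            self.results[(d, m, p)] = reg["measure"][p](errors)


# Toy DGMs (assumptions for illustration): each returns (data, true value).
def normal_dgm(rng):
    return [rng.gauss(0.5, 1.0) for _ in range(30)], 0.5


def contaminated_dgm(rng):
    # Roughly 10% of observations come from a much wider distribution.
    return [rng.gauss(0.5, 10.0 if rng.random() < 0.1 else 1.0) for _ in range(30)], 0.5


# Toy performance measures computed from estimation errors.
def bias(errors):
    return statistics.fmean(errors)


def rmse(errors):
    return math.sqrt(statistics.fmean([e * e for e in errors]))


bench = LivingBenchmark()
bench.add("dgm", "normal", normal_dgm)
bench.add("method", "mean", statistics.fmean)
bench.add("measure", "bias", bias)
snapshot = dict(bench.results)  # state before later contributions arrive

# Later contributions, possibly by different people, extend the same benchmark.
bench.add("method", "median", statistics.median)
bench.add("measure", "rmse", rmse)
bench.add("dgm", "contaminated", contaminated_dgm)

for key, score in sorted(bench.results.items()):
    print(key, round(score, 3))
```

In this sketch, whoever contributes a method does not choose the DGMs it is scored on, and each new DGM or measure is automatically applied to every method already registered.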

Author

František Bartoš (University of Amsterdam)

Co-authors

Björn S. Siepe (University of Marburg)
Samuel Pawel (University of Zurich)
