Description
In field studies, measurements are often collected over extended periods, during which subtle shifts in data quality or instrument performance can occur. Recognizing and quantifying such measurement heterogeneity over time is essential to ensure the validity of study results and to intervene at an early stage where possible. However, the performance of available statistical approaches for identifying temporal variability that signals problems with measurement accuracy remains poorly understood.
We present a simulation study designed to systematically compare seven statistical methods (ARIMA, fused LASSO signal approximator [FLSA], GAM, LOWESS, moving average, PELT, and piecewise regression) for assessing measurement heterogeneity across the data collection period. All methods were implemented using default parameter settings to reflect typical application scenarios. We generated 70,720 datasets covering a wide range of sample sizes, from 30 to 1,000 observations, to evaluate the performance of each method under 136 data conditions and systematic error patterns. The conditions varied in sample size, distribution, signal-to-noise ratio, and the type and magnitude of systematic change. Four estimands were defined to investigate the ability of each method to detect temporal variability and change points in the simulated scenarios.
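As a rough illustration of one such simulated scenario, the following Python sketch generates a normally distributed series with a single jump in the mean and summarizes it with a centered moving average. It is not the study's implementation: the function names, the window of 21, the jump size, the noise level, and the use of the smoothed curve's range as a summary are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_series(n, jump_at, jump_size, noise_sd=1.0, seed=0):
    """Simulate n normal measurements with one systematic jump in the mean.

    All parameter values are hypothetical and not taken from the study.
    """
    rng = random.Random(seed)
    return [
        (jump_size if t >= jump_at else 0.0) + rng.gauss(0.0, noise_sd)
        for t in range(n)
    ]


def moving_average(values, window):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(statistics.fmean(values[lo:hi]))
    return smoothed


def smoothed_range(smoothed):
    """Range of the smoothed curve, a crude summary of temporal variability."""
    return max(smoothed) - min(smoothed)


# One scenario with a jump halfway through, one without any true change.
series_jump = simulate_series(n=200, jump_at=100, jump_size=3.0, seed=1)
series_flat = simulate_series(n=200, jump_at=200, jump_size=0.0, seed=1)

range_jump = smoothed_range(moving_average(series_jump, window=21))
range_flat = smoothed_range(moving_average(series_flat, window=21))
print(f"range with jump: {range_jump:.2f}, without: {range_flat:.2f}")
```

Even without true change, the smoothed curve has a nonzero range driven by noise alone, which is why a no-change scenario is a useful baseline when comparing methods.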
Our results demonstrate that method performance depends strongly on both data distribution and sample size. LOWESS and GAM consistently delivered the most stable results across all performance measures and error patterns, making them suitable default choices for routine monitoring. For scenarios without true change, PELT performed best with normally distributed data, while FLSA excelled with log-normal data. Change-point detection revealed method-specific strengths: the moving average was superior for detecting single jumps, whereas FLSA handled more complex change patterns most effectively. As sample size increased, the moving average (for normal data) and PELT (for log-normal data) tended to systematically overestimate the range. Notably, ARIMA and PELT exhibited considerable instability and sensitivity to sample size, with performance degrading at larger N.
Based on these findings, we recommend LOWESS and GAM as robust general-purpose methods for detecting measurement heterogeneity across diverse field study conditions. For change-point enumeration specifically, FLSA provides consistent performance with minimal sensitivity to sample size. ARIMA and PELT should be avoided as default approaches due to their inconsistent bias behavior and performance degradation with increasing sample size. These recommendations provide practical guidance for researchers implementing quality control procedures during data collection.