Description
Prediction in the presence of missing values is a complex and still poorly understood problem, particularly when future records also contain missing values.
Mertens et al. (2020) demonstrate that, with non-linear models (such as logistic regression or Cox survival models) fitted to imputed data, averaging the predictions obtained from the separate models fitted to each imputed dataset should be preferred to using a single pooled model. However, imputation is often regarded as computationally cumbersome, and it tends to be poorly understood by applied researchers. For these reasons, the method is often avoided. This raises the question of whether other approaches could reasonably be used to handle missing values in prediction problems.
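Why the two strategies can differ for a logistic model can be seen in a minimal sketch. The coefficients below are hypothetical stand-ins for models fitted to M = 2 imputed datasets (no fitting is shown), the new record is assumed fully observed, and "pooled model" is taken here to mean averaging the coefficients before predicting once. Because the inverse-logit link is non-linear, the mean of the predicted risks is generally not the risk at the mean coefficients.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic (inverse-logit) function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Risk from one logistic model: coefs[0] is the intercept, coefs[1:] the slopes."""
    eta = coefs[0] + sum(b * xj for b, xj in zip(coefs[1:], x))
    return sigmoid(eta)


def predictive_average(coef_sets: List[Sequence[float]], x: Sequence[float]) -> float:
    """Predictive averaging: mean of the risks from each imputation-specific model."""
    return sum(predict_risk(c, x) for c in coef_sets) / len(coef_sets)


def pooled_prediction(coef_sets: List[Sequence[float]], x: Sequence[float]) -> float:
    """Pooled model (assumed here): average the coefficients first, then predict once."""
    m = len(coef_sets)
    pooled = [sum(c[j] for c in coef_sets) / m for j in range(len(coef_sets[0]))]
    return predict_risk(pooled, x)


# Hypothetical coefficients (intercept, one slope) from M = 2 imputed datasets.
coef_sets = [[0.0, 3.0], [0.0, 1.0]]
x_new = [1.0]
print(round(predictive_average(coef_sets, x_new), 4))  # prints 0.8418
print(round(pooled_prediction(coef_sets, x_new), 4))   # prints 0.8808
```

With a single model the two functions coincide; the gap appears only because the imputation-specific models disagree and the link function bends.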
In this talk we contrast predictive averaging with some potential alternatives, such as complete-case-based model calibration (CC) and the missing-indicator (IDX) and pattern submodel (PS) approaches. Connections between these methods are discussed. We focus on the problem of risk prediction. Simulations are used so that the true risk is known when comparing prediction performance between methods. We demonstrate that only predictive averaging guarantees the required coverage levels in prediction. Both pattern submodeling and indicator methods provide poorly calibrated predictions, and there is no obvious way to correct these deficiencies.
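How the IDX and PS approaches form a prediction for a new record with missing values can be sketched as follows. This is a generic illustration, not the talk's implementation: the coefficients are hypothetical (fitting is not shown), the constant fill value of 0 for IDX is an assumption, and `None` is used to mark a missing covariate. IDX uses one model for every record, with a 0/1 indicator per covariate; PS selects a separate model fitted for the record's missingness pattern, using only its observed covariates.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Record = Sequence[Optional[float]]  # None marks a missing covariate


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear_risk(coefs: Sequence[float], feats: Sequence[float]) -> float:
    # coefs[0] is the intercept; one slope per feature follows
    if len(coefs) != len(feats) + 1:
        raise ValueError("expected one coefficient per feature plus an intercept")
    return sigmoid(coefs[0] + sum(b * f for b, f in zip(coefs[1:], feats)))


def idx_predict(coefs: Sequence[float], x: Record, fill: float = 0.0) -> float:
    """Missing-indicator (IDX): replace missing values by a constant (assumed 0)
    and append a 0/1 indicator per covariate; one model serves all records."""
    values = [fill if v is None else v for v in x]
    indicators = [1.0 if v is None else 0.0 for v in x]
    return linear_risk(coefs, values + indicators)


def ps_predict(submodels: Dict[Tuple[bool, ...], Sequence[float]], x: Record) -> float:
    """Pattern submodel (PS): use the model fitted for x's missingness pattern,
    which takes only the observed covariates, in their original order."""
    pattern = tuple(v is not None for v in x)
    observed = [v for v in x if v is not None]
    return linear_risk(submodels[pattern], observed)  # KeyError if pattern unseen


# Hypothetical coefficients for two covariates (x1, x2).
idx_coefs = [-1.0, 0.5, 0.8, 0.3, -0.2]  # intercept, x1, x2, miss(x1), miss(x2)
submodels = {
    (True, True): [-1.0, 0.5, 0.8],  # both covariates observed
    (True, False): [-0.9, 0.6],      # only x1 observed
}
x_new = [2.0, None]
print(round(idx_predict(idx_coefs, x_new), 4))  # prints 0.4502
print(round(ps_predict(submodels, x_new), 4))   # prints 0.5744
```

In this sketch a new record whose missingness pattern has no fitted submodel raises `KeyError`, one practical constraint of PS when future records show patterns not seen in the training data.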