K.J. LEW¹, H.L. KOH¹, P.S.S. LEE¹, E.S. LEE¹
Missing data is a prevalent problem in healthcare research. When data is Missing Completely At Random (MCAR), complete case analysis (CCA) does not introduce selection bias. However, when data is Missing At Random (MAR), principled methods should be used. The aim of our study was to assess the implications of missing data handling methods for healthcare research by comparing principled methods of imputing missing data with CCA on an existing dataset.
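The difference between the two mechanisms can be illustrated with a small simulation. The following is a minimal sketch, not the analysis reported here: the covariate `x`, the outcome model, the 9% and 18% missingness rates, and the function name `simulate` are all assumptions chosen for illustration. Under MCAR every record has the same chance of being incomplete; under MAR that chance depends on an observed covariate that is also related to the outcome.

```python
import random

def simulate(n=20000, seed=1):
    """Compare the complete-case mean of an outcome under MCAR and MAR."""
    rng = random.Random(seed)
    full, mcar_kept, mar_kept = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)               # fully observed covariate (hypothetical)
        y = 50 + 5 * x + rng.gauss(0, 5)  # outcome, loosely a QoL-like score
        full.append(y)
        # MCAR: the same 9% chance of an incomplete record for everyone.
        if rng.random() >= 0.09:
            mcar_kept.append(y)
        # MAR: the chance depends only on the observed covariate x.
        p_missing = 0.18 if x > 0 else 0.0
        if rng.random() >= p_missing:
            mar_kept.append(y)

    def avg(values):
        return sum(values) / len(values)

    return avg(full), avg(mcar_kept), avg(mar_kept)

full_mean, mcar_mean, mar_mean = simulate()
print("full sample mean:       ", round(full_mean, 2))
print("complete cases under MCAR:", round(mcar_mean, 2))
print("complete cases under MAR: ", round(mar_mean, 2))
```

In this assumed setup, records with higher `x` (and therefore higher outcomes) are dropped more often under MAR, so the complete-case mean falls below the full-sample mean, whereas under MCAR it stays close to it.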
Our dataset came from a cross-sectional study of 932 patients that examined 10 socio-demographic and four multimorbidity independent variables associated with quality of life (QoL). Only 847 patients had complete data; the remaining 85 (9.1%) had missing values. To impute the missing values, we used two imputation methods: Multivariate Imputation by Chained Equations (MICE) and MissForest (MF). We fitted the same generalized linear model (GLM) to each imputed dataset and to the complete cases (MICE-GLM, MF-GLM and CCA-GLM) using R (version 3.6.1).
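The idea behind chained-equation imputation can be sketched in a few lines. This is a deliberately simplified illustration in Python, not the R code used in the study: it handles only two numeric variables, uses plain linear regression, and fills each gap with a single deterministic prediction. MICE additionally adds random variation to the imputed values and produces several completed datasets, and MissForest replaces the regressions with random forests. The function names `fit_line` and `chained_impute` and the toy data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def chained_impute(rows, n_iter=20):
    """Fill None entries in two-column numeric rows by alternating regressions."""
    missing = [[v is None for v in row] for row in rows]
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in (0, 1)]
    # Step 1: start from mean imputation.
    filled = [[col_means[j] if row[j] is None else row[j] for j in (0, 1)]
              for row in rows]
    # Step 2: cycle through the variables, re-predicting each one's gaps
    # from the current values of the other variable.
    for _ in range(n_iter):
        for target in (0, 1):
            other = 1 - target
            observed = [r for r, m in zip(filled, missing) if not m[target]]
            intercept, slope = fit_line([r[other] for r in observed],
                                        [r[target] for r in observed])
            for r, m in zip(filled, missing):
                if m[target]:
                    r[target] = intercept + slope * r[other]
    return filled

# Hypothetical toy data following b = 2a + 1, with one gap in each column.
rows = [(1, 3), (2, 5), (3, None), (4, 9), (None, 11), (6, 13)]
completed = chained_impute(rows)
print(completed[2], completed[4])
```

Because the toy data lie exactly on a line, the alternating regressions settle on values consistent with that line; with real data the imputed values carry uncertainty, which is why multiple imputation is normally preferred over a single deterministic fill.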
MICE-GLM and MF-GLM identified the same six variables as significantly associated with QoL. In comparison, CCA-GLM shared four of these significant variables but differed on two (sex and marital status). Sensitivity analysis confirmed that the QoL scores of the 85 patients removed for CCA were, on average, higher than those of the remaining 847 patients, so excluding them pulled the average QoL down and incurred Type I error.
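A sensitivity check of this kind can be sketched as a comparison of the outcome between records that CCA would keep and records it would drop. The sketch below is illustrative only: the field names, the six records, and the helper `split_by_completeness` are hypothetical, and it assumes the outcome itself is observed for every record.

```python
from statistics import mean

def split_by_completeness(records, outcome="qol"):
    """Return outcome values for complete and incomplete records separately."""
    complete, incomplete = [], []
    for rec in records:
        # A record is incomplete if any covariate (non-outcome field) is None.
        has_gap = any(v is None for k, v in rec.items() if k != outcome)
        (incomplete if has_gap else complete).append(rec[outcome])
    return complete, incomplete

# Hypothetical records; None marks a missing covariate value.
records = [
    {"qol": 0.60, "sex": "M", "marital": "married"},
    {"qol": 0.70, "sex": "F", "marital": "single"},
    {"qol": 0.65, "sex": "F", "marital": "married"},
    {"qol": 0.75, "sex": "M", "marital": "widowed"},
    {"qol": 0.90, "sex": "F", "marital": None},
    {"qol": 0.85, "sex": None, "marital": "married"},
]

kept, dropped = split_by_completeness(records)
overall = mean(rec["qol"] for rec in records)
print("mean QoL, all records:    ", round(overall, 3))
print("mean QoL, complete cases: ", round(mean(kept), 3))
print("mean QoL, dropped records:", round(mean(dropped), 3))
```

When the dropped records have a higher mean outcome than the retained ones, as in this toy data, the complete-case mean understates the mean of the full sample, which is a sign that the data are unlikely to be MCAR.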
Our missing data was not MCAR; therefore, CCA was not appropriate. Researchers should consider the missingness mechanism, a step that is often skipped, before deciding on the statistical analysis model, to avoid drawing erroneous conclusions.