A. LI¹, C. W. OEI², H. P. PHUA², W. X. LIAN², L. H. HTET², G. P. TAN², H. Y. XU², S. H. PHUA², J. ABISHEGANADEN², W. Y. LIM²
¹MOH Holdings Pte Ltd (MOHH), ²Tan Tock Seng Hospital
Modern Machine Learning (ML) methods have been employed to solve problems in the healthcare setting, including prediction of adverse outcomes among patients. We assessed the use of ML in predicting 1-year All-Cause Mortality (ACM) in patients with a clinical diagnosis of Chronic Obstructive Pulmonary Disease (COPD).
The cohort comprised 2160 patients with an inpatient admission and a discharge diagnosis of COPD between 2012 and 2018; patients who died during the admission were excluded. The outcome of interest was 1-year ACM, defined as death within 365 days of discharge from the index admission, and 335 (16%) patients in the cohort had this outcome. We selected 166 features from demographic, laboratory, spirometry, sputum culture and medication data as model inputs. We compared three ML models (eXtreme Gradient Boosting (XGBoost), Gradient Boosting (GBoost) and MultiLayer Perceptron (MLP) neural networks) against Logistic Regression with L1 regularisation and K-best feature selection (LogReg). Weighted modelling was used to account for the class imbalance in the outcome. Each model underwent hyperparameter tuning with repeated 5-fold cross-validation on the training data (80% of the cohort), and model performance was compared on held-out test data (20% of the cohort).
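The abstract does not describe its data-handling code. The following is a minimal standard-library sketch of how the 80/20 split, the repeated 5-fold folds on the training portion and the class weights could be derived for a cohort of 2160 patients with 335 events. It assumes stratified sampling and the common "balanced" weighting heuristic (n_samples / (n_classes × n_class_samples)); the function names and the choice of 3 repeats are hypothetical, since the abstract states neither the weighting scheme nor the number of repeats.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_class_samples).

    Assumption: the 'balanced' heuristic; the abstract does not name its scheme.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), preserving the outcome ratio in each part."""
    rng = random.Random(seed)
    train, test = [], []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


def repeated_stratified_kfold(train_idx, labels, k=5, repeats=3, seed=0):
    """Yield (fit_idx, val_idx) folds drawn from the training indices only.

    Assumption: repeats=3 is illustrative; the abstract does not state it.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for c in sorted({labels[i] for i in train_idx}):
            idx = [i for i in train_idx if labels[i] == c]
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[j % k].append(i)  # deal each class round-robin into folds
        for f in range(k):
            val = set(folds[f])
            fit = [i for i in train_idx if i not in val]
            yield fit, sorted(val)


# Cohort sizes from the abstract: 2160 patients, 335 with 1-year ACM.
labels = [1] * 335 + [0] * (2160 - 335)
weights = balanced_class_weights(labels)
train_idx, test_idx = stratified_split(labels)
folds = list(repeated_stratified_kfold(train_idx, labels))
```

With these counts each positive patient gets a weight of about 3.22 and each negative about 0.59, so both classes contribute the same total weight (1080). The test portion (432 patients, 67 with the outcome) never appears in any tuning fold. In practice, scikit-learn's `train_test_split(..., stratify=y)` and `RepeatedStratifiedKFold` implement the same splitting logic.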
XGBoost had the best area under the receiver operating characteristic curve (AUROC) of 0.803 (GBoost: 0.788; LogReg: 0.766; MLP: 0.652). Sputum culture with non-tuberculous mycobacteria (NTM) growth, length of stay and age were the top predictors of ACM in XGBoost. By contrast, oxygen requirement, age and cardiac comorbidities were the top predictors of ACM in LogReg.
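An AUROC can be read as the probability that a randomly chosen patient who died within a year receives a higher predicted risk than a randomly chosen patient who survived, with ties counted as half; an AUROC of 0.803 means roughly 80% of such pairs are ordered correctly. The following is a minimal standard-library sketch of that rank-based (Mann-Whitney U) computation. The function name `auroc` is hypothetical; in practice scikit-learn's `roc_auc_score` computes the same quantity.

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # pairs where a positive outranks a negative
    return u / (n_pos * n_neg)


# Toy example (not study data): 3 of 4 positive-negative pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```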
On held-out test data, XGBoost outperformed traditional statistical modelling with logistic regression in predicting 1-year ACM in patients with COPD.