University of Hertfordshire

Original language: English
Pages (from-to): 849-880
Journal: Decision Sciences
Publication status: Published - 28 Oct 2014


The number of emergency (or unplanned) readmissions in the United Kingdom National Health Service (NHS) has been rising for many years. This trend, which is possibly related to poor patient care, places financial pressure on hospitals and on national healthcare budgets. As a result, clinicians and key decision makers (e.g., managers and commissioners) are interested in predicting which patients are at high risk of readmission. Logistic regression is the most popular method of predicting patient-specific probabilities. However, studies using it have produced conflicting results with poor predictive accuracy. We compared the predictive accuracy of logistic regression with that of regression trees for predicting emergency readmissions within 45 days of discharge from hospital. We also examined the predictive ability of two other types of data-driven models: generalized additive models (GAMs) and multivariate adaptive regression splines (MARS). We used data on 963 patients readmitted to hospitals with chronic obstructive pulmonary disease and asthma. We used repeated split-sample validation: the data were divided into derivation and validation samples, predictive models were estimated using the derivation sample, and the predictive accuracy of the resulting model was assessed in the validation sample using a number of performance measures, such as the area under the receiver operating characteristic (ROC) curve. This process was repeated 1000 times: the initial data set was divided into derivation and validation samples 1000 times, and the predictive accuracy of each method was assessed each time. The mean ROC curve area for the regression tree models in the 1000 derivation samples was 0.928, while the mean ROC curve area of the logistic regression model was 0.924. Our study shows that logistic regression models and regression trees had performance comparable to that of more flexible, data-driven models such as GAMs and MARS.
Given that the models produced excellent predictive accuracy, they could serve as a valuable decision support tool for clinicians and other decision makers (health care managers, policy makers, etc.), supporting informed decision making in the management of diseases and ultimately contributing to improved measures for hospital performance management.
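
The repeated split-sample validation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the data are synthetic, the 50/50 split fraction and the 20 repeats are chosen only to keep the example fast (the study used 1000 repeats and does not state its split fraction here), and the helper names `auc`, `fit_logistic`, and `repeated_split_auc` are hypothetical. Only a plain gradient-descent logistic regression is shown; regression trees, GAMs, and MARS would be passed in through the same `fit` argument.

```python
import math
import random
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney identity)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the sample")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    """Logistic function, clamped to avoid overflow in math.exp."""
    z = max(-50.0, min(50.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; return a scoring function."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return lambda x: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def repeated_split_auc(X, y, fit, n_repeats=20, derivation_frac=0.5, seed=0):
    """Mean validation-sample AUC over repeated random derivation/validation splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    cut = int(len(idx) * derivation_frac)  # assumed split fraction
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        deriv, valid = idx[:cut], idx[cut:]
        # Estimate the model on the derivation sample only.
        model = fit([X[i] for i in deriv], [y[i] for i in deriv])
        # Assess predictive accuracy on the held-out validation sample.
        aucs.append(auc([y[i] for i in valid], [model(X[i]) for i in valid]))
    return mean(aucs)


# Synthetic stand-in data (NOT the study's patient records).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [1 if 2 * x[0] - x[1] + data_rng.gauss(0, 1) > 0 else 0 for x in X]

mean_auc = repeated_split_auc(X, y, fit_logistic)
print(f"mean validation AUC over 20 splits: {mean_auc:.3f}")
```

Because each repeat refits the model on a fresh derivation sample, the averaged AUC reflects performance on data the model has not seen, which is the point of the split-sample design.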


This is the peer reviewed version of the following article: Eren Demir, “Classification Trees, Logistic Regression, Generalized Additive Models, and Multivariate Adaptive Regression Splines” Decision Sciences, Vol 45(5): 849-880, October 2014, which has been published in final form at doi: 10.1111/deci.12094. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. © 2014 Decision Sciences Institute

ID: 2663376