{"id":1380,"date":"2022-04-04T16:45:00","date_gmt":"2022-04-04T16:45:00","guid":{"rendered":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/thomas-newman\/?p=1380"},"modified":"2022-04-18T15:21:34","modified_gmt":"2022-04-18T15:21:34","slug":"daily-attendances-in-accident-and-emergency-departments-forecasting-attendances","status":"publish","type":"post","link":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/thomas-newman\/2022\/04\/04\/daily-attendances-in-accident-and-emergency-departments-forecasting-attendances\/","title":{"rendered":"(Part 2) Daily attendances in accident and emergency departments \u2013 Forecasting attendances"},"content":{"rendered":"\t\t
In this second and final blog in the series dedicated to analysing daily attendances in accident and emergency departments,\u00a0we will\u00a0<\/span>attempt to forecast the daily number of attendances at A&E using three different forecasting methods.<\/span>\u00a0The first blog focused on exploring the data and identifying relationships between attendances\/admissions and\u00a0<\/span>patients\u2019 age\/referral source\/arrival date\/time. <\/span>\u00a0To read the first blog, click here<\/a>.\u00a0<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t In 2015, the emergency department (ED) of a neighbouring hospital closed and, as a result, the number of attendances at the A&E\u00a0suddenly<\/span>\u00a0increased. As seen in figure 9, the average numbers of attendances before and after 2015 differ. This could be accounted for using change point detection prior to implementing our forecasting algorithms. However, given the abrupt nature of the change, and to keep the forecasting problem simple, only attendances recorded from the start of 2016 to the end of 2019 were kept. (To learn more, read a blog on change point detection: <\/span>click here<\/a>.)<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t To improve understanding of the time series, I applied a TBATS decomposition to the data. Other more common decomposition methods such as X11 could not be used as they do not handle multiple seasonalities.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t In figure 10, the first plot corresponds to the actual daily attendances at the A&E. The second plot represents the level of the time series, which roughly corresponds to the trend. The third and fourth plots represent the weekly and yearly seasonality, respectively. 
From figure 11, we can see that the TBATS model has successfully captured the weekly seasonality. There are 8 points in that plot corresponding to 8 days, with the first point being a Monday. The higher average number of attendances on Mondays is well represented. The yearly seasonality is also well captured in figure 12, where the summer months have on average a higher number of attendances.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t Next, we can apply several checks to confirm that the residuals of the model resemble white noise. The residuals from the TBATS model appear to have mean zero and to be approximately normally distributed (Fig. 13). The ACF plot of the residuals does not show any significant autocorrelation, and the histogram of the residuals is centred around zero and looks approximately normal. Additionally, a Ljung-Box test yields a p-value of 0.2457, which is greater than 0.05, so we do not reject the null hypothesis that the residuals are independent. This suggests that the model has captured most of the non-random structure in the series.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t In this section, three different models which account for multiple seasonalities will be used to forecast the daily number of attendances at A&E.\u00a0The two seasonalities of the time series have lengths of 365.25 and 7 days. The time series shows a positive trend and an additive seasonal pattern. The data was split 80\/20 into a training dataset (1169 days) and a testing dataset (292 days).\u00a0<\/span>The first model\u00a0<\/span>we will look at\u00a0<\/span>is the TBATS (Trigonometric, Box-Cox transformation, ARMA errors, Trend, Seasonal components) model. The second model is an ARIMA model with Fourier series as external\u00a0<\/span>regressor. Finally, the third model is an Autoregressive Neural Network model. 
We will then look at combining the different models to try and improve our predictive accuracy.<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t Figure 14 shows the daily number of attendances at the A&E, from January 2016 to the end of 2019. The blue line corresponds to the actual data and was split 80\/20 between training and testing, respectively. The red line corresponds to the TBATS forecasts from From figure 15, the TBATS forecasts look to be reasonably close to the actual daily number of attendances. However, the TBATS model struggles to predict spikes in the data, especially negative ones.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t In order to select the optimal number of Fourier terms for each seasonality, two nested for loops were created to identify the number of terms which minimises the AIC. According to the algorithm, 3 and 10 terms are optimal for the weekly and yearly seasonalities, respectively. Figure 16 shows the actual daily number of attendances at the A&E (blue line) and the ones forecasted by the ARIMA model with Fourier series as external regressor (red line).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t As shown in figure 17, the forecasted figures look reasonably close to the actual data. However, similarly to the TBATS forecasts, the negative spikes in the data are not predicted very well.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t Looking at figure 18, we can see that the autoregressive neural network forecasts seem relatively close to the actual data. 
Furthermore, it appears to be able to predict some spikes in the data.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t The autoregressive neural network (ANN) does not look to have captured the seasonality of the time series as well as the other two models. However, it seems better at predicting spikes in the data. When forecasting the daily number of attendances at the A&E using the autoregressive neural network, I tested whether forecasts based on detrended and deseasonalised data would yield a more accurate prediction. This revealed that detrending and deseasonalising the data did increase the forecast\u2019s accuracy (results are shown in the Results and Limitations section below).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t It is sometimes possible to improve the forecasts\u2019 accuracy by combining predictions from different models. The forecasts from the three models tested above were combined using two different methods. The first method consisted of using an aggregation algorithm (here a Polynomial Potential aggregation), which determines the weights used when combining the daily number of attendances from the different forecasts. The weights were assigned so as to minimise the loss function (here the mean squared error). I used two different accuracy measures to compare the forecasts. First, the Mean Absolute Error (MAE) gives the average number of attendances by which the forecasts and the actual data differ across the 292 forecast days. Second, the Root Mean Squared Error (RMSE) is the square root of the mean of the squared residuals. These two measures were obtained by comparing the 292 daily attendance figures in the test data with the forecasts. 
Both MAE and RMSE are scale dependent, which means that they can only be used to compare forecasts made on the same dataset.<\/p> Table 2 shows that detrending and deseasonalising the data used by the ANN increased the accuracy of the forecast. The RMSE and the MAE both fell, from 22.88 to 19.52 and from 17.88 to 15.49, respectively. However, it is the TBATS model which produced the most accurate forecasts, with an RMSE of 19.01 and an MAE of 15.07.\u00a0\u00a0<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t2.5 Decomposing the Data<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t3 Forecasting daily attendances, results and limitations.<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
3.1 Forecasting using TBATS<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
March 15th, 2019 to December 31st, 2019. The TBATS forecast and testing datasets have equal length (292 daily figures).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
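The 80/20 split and the forecast window quoted above can be checked with a few lines of date arithmetic. This is only a sketch in Python, not the code used for the analysis: the function name `split_dates` is hypothetical, and it assumes the retained series runs from 1 January 2016 to 31 December 2019 with the training size rounded to the nearest day.

```python
from datetime import date, timedelta


def split_dates(start, end, train_fraction=0.8):
    """Return (n_train, n_test, first_test_day) for a chronological split.

    Hypothetical helper: assumes one observation per day between
    `start` and `end` inclusive, with the training size rounded.
    """
    n_days = (end - start).days + 1          # inclusive day count
    n_train = round(train_fraction * n_days)  # earliest days go to training
    n_test = n_days - n_train                 # remaining days are the test set
    first_test_day = start + timedelta(days=n_train)
    return n_train, n_test, first_test_day


n_train, n_test, first_test_day = split_dates(date(2016, 1, 1), date(2019, 12, 31))
print(n_train, n_test, first_test_day)  # 1169 292 2019-03-15
```

The split is chronological rather than random, so the test set is always the most recent block of days, which is what a forecast evaluation needs.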
\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t3.2 Forecasting using ARIMA model and Fourier series as external \nregressor<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
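To make the idea of Fourier series as external regressors more concrete, here is a minimal sketch in Python of how such regressors can be built for two seasonal periods. It is an illustration only, not the code used for this analysis: the names `fourier_terms` and `seasonal_regressors` are hypothetical, and the defaults of 3 weekly and 10 yearly terms are simply the values reported in this section. The AIC search over the number of terms and the ARIMA fit itself are not shown.

```python
from math import sin, cos, pi


def fourier_terms(t, period, n_terms):
    """Return [sin, cos] pairs for harmonics 1..n_terms at time index t."""
    row = []
    for k in range(1, n_terms + 1):
        angle = 2 * pi * k * t / period
        row.extend([sin(angle), cos(angle)])
    return row


def seasonal_regressors(n_days, weekly_terms=3, yearly_terms=10):
    """Build one row of regressors per day for periods of 7 and 365.25 days."""
    return [
        fourier_terms(t, 7, weekly_terms) + fourier_terms(t, 365.25, yearly_terms)
        for t in range(1, n_days + 1)
    ]


# One row per training day, 2 * (3 + 10) = 26 columns per row.
X = seasonal_regressors(1169)
print(len(X), len(X[0]))  # 1169 26
```

Each harmonic contributes a sine and a cosine column, so increasing the number of terms lets the regressors follow a more irregular seasonal shape at the cost of more parameters, which is why an information criterion such as the AIC is a natural way to choose it.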
\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t3.3 Forecasting with an Autoregressive Neural Network model<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t\t\t\t\t\t\t3.4 Combining forecasting models<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
This was done for all possible combinations of the three forecasts using the function mixture()<\/strong><\/a> from the package \u2018opera<\/strong><\/a>\u2019. The second method used to combine the forecasts was to take a simple average of the daily predictions. This was also done for all possible combinations of the three forecasts.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t3.5 Results and Limitations<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t
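As a rough illustration of how the simple-average combinations and the two accuracy measures fit together, here is a small Python sketch. It is not the code used to produce the results in this post: the function names are hypothetical, the four-day series are made-up numbers rather than the hospital data, and only the averaging method is shown (the weighted aggregation from the opera package is not reproduced).

```python
from itertools import combinations
from math import sqrt


def mae(actual, forecast):
    """Mean Absolute Error: average absolute difference per day."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error: square root of the mean squared residual."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def average_combinations(forecasts):
    """Average the daily predictions for every combination of 2+ models."""
    combined = {}
    names = sorted(forecasts)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            days = zip(*(forecasts[name] for name in combo))
            combined["+".join(combo)] = [sum(day) / size for day in days]
    return combined


# Made-up toy data for illustration only (not the A&E attendances).
actual = [310, 295, 330, 280]
forecasts = {
    "TBATS": [305, 300, 320, 290],
    "ARIMA": [315, 290, 325, 285],
    "NNAR": [300, 298, 340, 270],
}

combined = average_combinations(forecasts)
scores = {name: (mae(actual, f), rmse(actual, f)) for name, f in combined.items()}
for name, (m, r) in scores.items():
    print(f"{name}: MAE={m:.2f}, RMSE={r:.2f}")
```

Because the RMSE squares each residual before averaging, it penalises large misses (such as unpredicted spikes) more heavily than the MAE, so the two measures can rank models differently.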