CC BY

DOI: 10.14529/ctcr210303

DEVELOPMENT OF ALGORITHMS FOR CHOOSING THE BEST TIME SERIES MODELS AND NEURAL NETWORKS TO PREDICT COVID-19 CASES

M.S.A. Abotaleb, [email protected],

T.A. Makarovskikh, [email protected]

South Ural State University, Chelyabinsk, Russian Federation

Time series analysis became one of the most investigated fields of knowledge during the spread of COVID-19 around the world. The problem of modeling and forecasting COVID-19 infection cases, deaths, recoveries and other parameters is still urgent. Purpose of the study. Our article is devoted to the investigation of classical statistical and neural network models that can be used for forecasting COVID-19 cases. Materials and methods. We discuss the neural network model NNAR and compare it with linear and nonlinear models (BATS, TBATS, Holt's linear trend, ARIMA, and the classical epidemiological SIR model). We discuss the Epidemic.Network algorithm, implemented in the R programming language. This algorithm takes a time series as input and chooses the best model from among SIR, the statistical models and the neural network model. The model selection criterion is the MAPE error. We consider the application of our algorithm to the analysis of time series of the COVID-19 spread in the Chelyabinsk region and to predicting the possible peak of the third wave under three possible scenarios. We note that the algorithm can work with any time series, not only epidemiological ones. Results. The developed algorithm helped to identify the pattern of COVID-19 infection in the Chelyabinsk region using the models implemented as parts of the algorithm. The considered models make it possible to form short-term forecasts with sufficient accuracy. We show that increasing the number of neurons does not necessarily increase accuracy: in our case it did, but there are other cases where the error is reduced when the number of neurons is reduced, and this depends on the COVID-19 infection spreading pattern. Conclusion. To get a very accurate forecast, we recommend re-running the algorithm weekly. For medium-range forecasting, only the NNAR model can be used from among those considered, but it too gives good forecasts only with a horizon of 1-2 weeks.

Keywords: BATS, TBATS, ARIMA, Holt's Linear Trend Model, SIR Model, NNAR, COVID-19, Forecasting.

Introduction

COVID-19 is one of the most serious problems facing the entire world today. In this article, we consider methods for predicting the spread of COVID-19 (cases of infection, death, and recovery) in the Chelyabinsk region using time series analysis models and NNAR neural networks. On March 21, 2020, the virus began to spread in a pattern that resulted in millions of infections in less than a year. Most deaths from this virus occur among the elderly and people with chronic heart disease, which is the leading cause of death even in developed countries [1]. Many studies have recently been published on forecasting the number of COVID-19 cases, both worldwide and in individual states and regions. These studies mainly used the ARIMA model, Holt's linear trend model, and the SIR state transition model. There are also studies that compare the models; for example, [2] shows that Holt's linear model outperforms the ARIMA model for the states considered there. In our article, we investigate the performance of these models and analyze the errors of the forecasts obtained.

Given the similarity of the characteristics of the models in the United States and Italy, it was suggested in [3] that the corresponding forecasting tools can be applied to other countries fighting the COVID-19 pandemic, as well as to any pandemics that may arise in the future. However, a general principle for choosing models for predicting the spread of COVID-19 has not yet been formulated. Moreover, for different states and different conditions of the spread of the epidemic, it is advisable to build a forecast using different models. For example, in [4], it was shown that the LSTM model consistently had the lowest forecast errors for tracking the dynamics of infection cases in the four countries considered. There are also studies showing that the ARIMA model and cubic smoothing spline models had lower forecast errors and narrower forecast intervals than the Holt and TBATS models.

Fig. 1. Scheme of the Epidemic.Network algorithm, which selects the best model for predicting COVID-19 cases

The results obtained cannot be generalized to all countries affected by the COVID-19 pandemic due to different patterns of the spread of the virus. As for the SIR model, even at the beginning of the pandemic it was shown to be ineffective in predicting cases of coronavirus infection. For example, using this model, it was found that the peak of the second wave of infection cases in Pakistan should have occurred on August 25, 2020; in fact, the peak of infection in this country occurred in December 2020 [5]. The covid19.analytics package, developed in the R language, has the same drawbacks, as evidenced by the results of the SIR model and its predictions of the timing of the second (and subsequent) waves. Fig. 1 shows the scheme of the developed software module, which chooses the best model for the available initial data. For the experiment, we used the Yandex dataset [6] on infections, deaths, and hospital discharges from March 12, 2020, to April 9, 2021.

Let us consider the models used in this algorithm. All models are subdivided into three categories: (1) time series analysis models; (2) neural network models; (3) epidemiological models. One of the first papers [7] was devoted to the simulation of COVID-19 in the Isfahan province of Iran for the period from February 14 to April 11, 2020. The authors of that paper forecasted the remaining infectious cases under three scenarios that differed in the stringency of social distancing. Although it predicted infectious cases over short-term intervals, the constructed SIR model was unable to forecast the actual spread and pattern of the epidemic in the long term. Remarkably, most of the published SIR models developed to predict COVID-19 for other communities suffered from the same shortcoming. The SIR models are based on assumptions that seem not to hold in the case of the COVID-19 epidemic. Hence, more sophisticated modeling strategies and detailed knowledge of the biomedical and epidemiological aspects of the disease are needed to forecast the pandemic.

1. Time Series Analysis Models

1.1. BATS and TBATS Models

The TBATS model is a state-space exponential smoothing model with trigonometric seasonality, Box-Cox transformation, ARMA errors, trend, and seasonal components. It is used to analyze univariate time series and was developed by De Livera et al. [8]. The functioning of these models is shown in Fig. 2. The main difference between the TBATS model and the BATS model is the ability to forecast with variable seasonality.

Fig. 2. Scheme of the BATS and TBATS models

The main advantage of these models is the ability to use multiple seasonality. Nevertheless, in some cases, the use of these models is not advisable, since the results of the same order of accuracy can be obtained by other methods that are less demanding on computational resources.

1.2. Linear Holt model

Adaptive exponential smoothing models are a fairly popular tool for predicting the spread of coronavirus infection. These models have also served as a general tool for making time series projections of the development of the epidemic in different countries [2, 9, 10]. However, the main drawback of most of the studies presented is the lack of an explanation for the choice of the model specification and of the model hyperparameters [9]. We also note the article [2], which shows that the exponential smoothing model gives more accurate results than the ARIMA model for the time series under consideration. The Holt-Winters model does not explain the nature of the epidemic in any way and focuses exclusively on the data itself. Thus, this model picks up an insignificant seven-day cyclicity, associated primarily not with the true development of the infectious process, but with the work schedule of individual medical services (testing laboratories, as well as administrative services) [9].
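To make the smoothing idea concrete, the following is a minimal sketch of Holt's linear trend method in Python. It is not the authors' R implementation: the function name `holt_linear`, the initialization (level from the first observation, trend from the first difference), the smoothing constants and the toy series are all assumptions chosen for illustration.

```python
def holt_linear(series, alpha, beta, horizon):
    """Holt's linear trend method; returns `horizon` point forecasts.

    alpha smooths the level, beta smooths the trend (both in (0, 1]).
    Initialization is an illustrative choice: level = first value,
    trend = first difference.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step forecast extrapolates the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical cumulative counts growing by 10 per day.
cumulative = [100, 110, 120, 130, 140]
forecast = holt_linear(cumulative, alpha=0.8, beta=0.2, horizon=3)
print(forecast)
```

On a perfectly linear series the method simply continues the line, which is why it tends to suit cumulative counts over short horizons. In practice alpha and beta would be estimated from the data rather than fixed by hand.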

1.3. ARIMA model

The ARIMA model consists of three components [11]: (1) AR (autoregressive term), which refers to the past values used to predict the next value and is determined by the parameter p; (2) MA (moving average term), which refers to the number of past forecast errors used to predict future values and is determined by the parameter q obtained from the ACF (autocorrelation function); (3) I (integrating term): if the series is not stationary, its difference of order d, which is a stationary series, is taken. To check the stationarity of the series, the augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test are used. The same tests allow determining the parameter d of the model.

In [11], it is shown that the parameters of the model predicting the spread of COVID-19 differ between regions of the Russian Federation (and between states) and, in addition, change over time. That paper considers the possibility of automatic selection of the parameters of the ARIMA model for time series corresponding to the same process occurring under different conditions.
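The role of the integrating term can be illustrated with a small Python sketch of differencing of order d. The helper `difference` and the toy quadratic series are assumptions made for illustration. The ADF and KPSS tests themselves are not implemented here, so the sketch shows only why a series with a quadratic trend needs d = 2 before it becomes constant.

```python
def difference(series, d=1):
    """Apply first-order differencing d times and return the result."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series with consecutive differences.
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Hypothetical series with a quadratic trend: 1, 4, 9, 16, 25, 36.
quadratic = [t * t for t in range(1, 7)]
first = difference(quadratic, 1)   # still trending: 3, 5, 7, 9, 11
second = difference(quadratic, 2)  # constant: 2, 2, 2, 2
print(first, second)
```

Each differencing pass shortens the series by one observation, which is one reason to prefer the smallest d that the stationarity tests accept.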

2. Neural Network Autoregressive Model (NNAR)

One of the forecasting methods is the artificial neural network, which is based on simple mathematical models of the brain and makes it possible to capture complex nonlinear relationships between the response variable and its predictors.

In this work, we used lagged values of the time series as inputs to the neural network, as in a linear autoregressive model; we call this the NNAR model. To predict cases of COVID-19 in the Chelyabinsk region, the NNAR(6,5) model was used, that is, a neural network that takes the last six observations as inputs to predict the output $y_t$ and has five neurons in the hidden layer (Fig. 3).

Fig. 3. NNAR model for predicting COVID-19 cases in Chelyabinsk

There are two types of neural networks: simple networks and multilayer feed-forward networks. A simple neural network has no hidden layer and is equivalent to linear regression: the coefficients attached to the predictors are called weights, and the prediction is obtained as a linear combination of the inputs. The weights are chosen by a training algorithm that minimizes a cost function such as the MSE, so for this type of network linear regression is itself an efficient training method.

The second type is the multilayer feed-forward network. Here the outputs of the nodes in one layer are the inputs to the next layer. The inputs to each node are combined using a weighted linear combination, and the result is then modified by a nonlinear function before being output. For example, the inputs to hidden neuron j are combined linearly to give $z_j = b_j + \sum_{i} w_{i,j} x_i$. In the hidden layer, this is then modified using a nonlinear function such as the sigmoid, $s(z) = \frac{1}{1 + e^{-z}}$. The parameters $b_1, b_2, b_3, b_4, b_5$ and the weights $w_{i,j}$ are "learned" from the data. Weights are often restricted to keep them from getting too large. The parameter that restricts the weights is known as the "decay parameter" and is often set to 0.1. The weights are first given random values and are then updated using the observed data. Consequently, there is an element of randomness in the predictions made by the neural network, so the network is usually trained several times using different random starting points, and the results are averaged.
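The forward pass through a single hidden layer can be sketched in Python as follows. This is only an illustration of the linear combination and sigmoid described above: the function names, the linear output node and all numeric weights are assumptions, and no training is performed.

```python
import math


def sigmoid(z):
    """Logistic activation s(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a network with a single hidden layer.

    hidden_weights[j][i] is the weight from input i to hidden neuron j.
    The output node is assumed to be linear (no activation).
    """
    hidden = []
    for w_j, b_j in zip(hidden_weights, hidden_biases):
        # z_j = b_j + sum_i w_ij * x_i, then squashed by the sigmoid.
        z_j = b_j + sum(w * x for w, x in zip(w_j, inputs))
        hidden.append(sigmoid(z_j))
    # Linear combination of the hidden activations.
    return output_bias + sum(v * h for v, h in zip(output_weights, hidden))


# Toy network: two inputs, two hidden neurons, hand-picked parameters.
y_hat = forward(
    inputs=[0.0, 0.0],
    hidden_weights=[[0.3, -0.1], [0.7, 0.2]],
    hidden_biases=[0.0, 0.0],
    output_weights=[1.0, 3.0],
    output_bias=0.5,
)
print(y_hat)
```

With zero inputs and zero hidden biases every hidden neuron outputs 0.5, so the result depends only on the output weights and bias. Averaging predictions over several randomly initialized networks, as described above, would wrap this function in a loop over different weight sets.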

In a neural network autoregressive (NNAR) model, lagged values of the time series are used as inputs to the neural network, just as lagged values are used in a linear autoregressive model. In our implementation for predicting the third wave of COVID-19 in Chelyabinsk, we used the NNAR(6,5) model for the first scenario and the NNAR(6,10) model for the second scenario. Both are neural networks that use the last six observations $(y_{t-1}, \dots, y_{t-6})$ as inputs to predict the output $y_t$, with five and ten neurons in the hidden layer, respectively.
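How lagged values become network inputs can be shown with a short Python sketch that builds the training pairs for an NNAR(P, k) model with P = 6. The helper `make_lagged` and the toy daily counts are hypothetical; the sketch covers only the data preparation step, not the network fitting.

```python
def make_lagged(series, p):
    """Build (inputs, target) pairs where each input row holds the
    p most recent values preceding the target."""
    rows, targets = [], []
    for t in range(p, len(series)):
        # Most recent lag first: y[t-1], y[t-2], ..., y[t-p].
        rows.append([series[t - k] for k in range(1, p + 1)])
        targets.append(series[t])
    return rows, targets


# Hypothetical daily infection counts.
daily = [10, 12, 15, 19, 24, 30, 37, 45]
X, y = make_lagged(daily, p=6)
for row, target in zip(X, y):
    print(row, "->", target)
```

A series of length n yields n - P training rows, so a larger P (such as 19) leaves fewer examples for training on a short series.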

3. Epidemiological SIR Model

Epidemiological models such as SIR (susceptible, infected, recovered), and their many variants describe the density of infected people I using a typical equation [12]. At the beginning of the spread of infection, the number of infected, and recovered people is much less than the number of susceptible ones, so we can approximate S with a constant. Using this approximation, we obtain a linear differential equation with constant coefficients, according to the solution of which the growth in the number of infected persons at the beginning of the epidemic is exponential, and then slows down as the number of susceptible to infection decreases.
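The approximation described above can be written out explicitly. The source does not print the equations, so the block below assumes the standard form of the SIR system, with transmission rate $\beta$ and recovery rate $\gamma$ as assumed symbols and $S_0$ denoting the approximately constant number of susceptible individuals.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta S I, \qquad
\frac{dI}{dt} = \beta S I - \gamma I, \qquad
\frac{dR}{dt} = \gamma I, \\[4pt]
S \approx S_0 \;&\Longrightarrow\;
\frac{dI}{dt} \approx (\beta S_0 - \gamma)\, I
\;\Longrightarrow\;
I(t) \approx I(0)\, e^{(\beta S_0 - \gamma)\, t}.
\end{aligned}
```

Under these assumptions the early growth is exponential whenever $\beta S_0 > \gamma$, and it slows down once $S$ falls noticeably below $S_0$, which matches the qualitative behavior described in the text.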

However, the classical SIR model does not provide high-quality forecasts [5, 13, 14] due to differences in the algorithms for choosing its parameters. In [5], work with an extension for the R language called covid19.analytics is described in detail. In [15], this package is presented as a tool intended to provide a complete picture of the spread of COVID-19 anywhere in the world. The author of the package claims to do this by accessing and retrieving publicly available data published by two main sources. The package also provides basic analysis and visualization tools and functions for exploring these and other similarly structured datasets. The main disadvantage of this package at the moment is that it uses only the classical SIR model for forecasting, which gives a very large error. In reality, measures of varying effectiveness to contain the epidemic (quarantines, restrictions on activities and movement, the use of masks, etc.) are developed and practiced everywhere, which changes the trajectory of the epidemic and, as a result, makes the coefficients of such a model variable. In the article [9], the authors re-estimate the coefficients of the model based on newly obtained data, which is justified for obtaining short-term forecasts (up to 10 days) with high accuracy. The lack of accuracy of the model stems from one of its most important assumptions, namely that the population is divided into three homogeneous groups; the model is therefore not suitable for clearly heterogeneous societies. During the year of the pandemic, it became clear that these models give the best results for long-term forecasting (more than 7 days).

Software implementation of the considered algorithms

The considered models BATS, TBATS, the linear Holt model, ARIMA, SIR, and the neural network model NNAR were implemented using the R language. The results of computational experiments are given in [16], and the source code of the algorithm is in [17]. Here are some of the results obtained using the developed algorithm.

Table 1

MAPE (%) on the testing data for the BATS, TBATS, Holt linear trend, ARIMA, SIR and NNAR models for COVID-19 cases in the Chelyabinsk region (cumulative data)

| Category | Model | Infections, MAPE (%) | Deaths, MAPE (%) | Recoveries, MAPE (%) |
|---|---|---|---|---|
| Time series models | BATS | 0.037 | 0.256 | 1.361 |
| Time series models | TBATS | 0.041 | 0.238 | 0.514 |
| Time series models | Holt | 0.043 | 0.316 | 0.359 |
| Time series models | ARIMA | 0.041 | 0.281 | 0.689 |
| Epidemiological model | SIR | 4.98 | - | - |
| Neural network model | NNAR(1,5) | 0.458 | 4.577 | 5.367 |
| Selected best model | | BATS | TBATS | Holt |

Table 1 shows the forecast errors obtained with the considered models for the period from March 12, 2020, to April 9, 2021, in the Chelyabinsk region. The testing set is the last 7 days for cumulative daily infection cases, the last 31 days for cumulative daily recovery cases, and the last 27 days for cumulative daily death cases. It can be seen that the epidemiological and neural network models give an error that is 1-2 orders of magnitude higher than that of the time series analysis models.
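The selection rule behind the last row of Table 1 can be sketched in a few lines of Python. The MAPE values are copied from Table 1; the function names `mape` and `select_best` and the dictionary layout are assumptions made for illustration and are not taken from the authors' R code. The SIR model is listed only for infections because the table reports no value for the other two series.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# MAPE (%) per model, as reported in Table 1.
TABLE_1 = {
    "Infections": {"BATS": 0.037, "TBATS": 0.041, "Holt": 0.043,
                   "ARIMA": 0.041, "SIR": 4.98, "NNAR(1,5)": 0.458},
    "Deaths": {"BATS": 0.256, "TBATS": 0.238, "Holt": 0.316,
               "ARIMA": 0.281, "NNAR(1,5)": 4.577},
    "Recoveries": {"BATS": 1.361, "TBATS": 0.514, "Holt": 0.359,
                   "ARIMA": 0.689, "NNAR(1,5)": 5.367},
}


def select_best(errors):
    """Return the name of the model with the smallest MAPE."""
    return min(errors, key=errors.get)


best = {series: select_best(errors) for series, errors in TABLE_1.items()}
print(best)
```

In a full pipeline each MAPE would come from calling `mape` on the held-out last days of the series and the corresponding forecasts of each fitted model, and the selection would be repeated whenever new data arrive.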

Table 2

Forecasting peaks of infection waves in the Chelyabinsk region using the NNAR model

| Wave | Model | Peak (forecast) | Peak (fact) |
|---|---|---|---|
| 1 | NNAR(2,50) | June 23, 2020 | June 23, 2020 |
| 2 | NNAR(6,50) | December 19, 2020 | December 19, 2020 |
| 3 (the first scenario) | NNAR(6,5) | June 21, 2021 | Unknown |
| 3 (the third scenario) | NNAR(19,15) | July 18, 2021 | Unknown |

Fig. 4. Forecast of peaks of infection waves in the Chelyabinsk region

Despite the low (compared to time series models) forecasting accuracy using the NNAR model, this model can be successfully used to construct not only short-term, but also medium-term and long-term forecasts. Consider the results of using the NNAR model to predict infection peaks (Table 2, Fig. 4).

It can be seen from Table 2 that the peak of infections in the first two waves was accurately predicted by the corresponding models. Thus, there is reason to expect the peak of the next wave (provided that the existing restrictions and management decisions are preserved) at the predicted time, which makes it possible to proactively make some decisions and, possibly, influence the current situation.

Table 3

NNAR models built at different times for the scenarios forecasting the third wave in Chelyabinsk

| Scenario | NNAR(P, k) | Lagged inputs P for forecasting the output $y_t$ | Number of neurons in the hidden layer k |
|---|---|---|---|
| The first scenario | NNAR(6,5) | 6 | 5 |
| The second scenario | NNAR(6,10)* | 6 | 10 |
| The third scenario | NNAR(19,15) | 19 | 15 |

\* Daily infection cases are forecast to remain stable (no wave occurs).

Table 3 uses the notation NNAR(P, k) to indicate that there are P lagged inputs and k nodes in the hidden layer. We used the NNAR(6,5) model to obtain the forecast for the first scenario, the NNAR(6,10) model for the second scenario, and the NNAR(19,15) model for the third scenario. Hence, the first two models are neural networks that use the last six observations $(y_{t-1}, y_{t-2}, y_{t-3}, y_{t-4}, y_{t-5}, y_{t-6})$ as inputs for forecasting the output $y_t$, while the third model uses the last 19 observations. There are five neurons in the hidden layer in the first scenario, ten in the second, and 15 in the third. The scenarios thus differ mainly in the number of neurons, and we obtain a MAPE of 6.984% for the first scenario, 1.636% for the second scenario, and 0.955% for the third scenario. One more difference is that we test on the last 30 days for the first scenario, the last 60 days for the second scenario, and the last 7 days for the third scenario. The lower error for the third scenario depends on the number of neurons in the network and the size of the training data. Increasing the number of neurons does not necessarily increase accuracy: in our case it did, but there are other cases where the error is reduced when the number of neurons is reduced, and this depends on the COVID-19 infection spreading pattern. To get a very accurate forecast, we recommend re-running the algorithm weekly (Table 4).

Table 4

Forecasting the lowest and highest daily COVID-19 infection cases till the end of July 2021

| Scenario | NNAR(P, k) | Forecasted lowest daily infection cases | Forecasted highest daily infection cases |
|---|---|---|---|
| The first scenario | NNAR(6,5) | 122 | 313 |
| The second scenario | NNAR(6,10) | 69 | 85 |
| The third scenario | NNAR(19,15) | 94 | 208 |

Conclusion

The article discusses a new algorithm, implemented in R, that predicts COVID-19 cases and chooses the best forecasting model from among BATS, TBATS, Holt's linear model, ARIMA, SIR, and NNAR. The model selection criterion is the MAPE error. The developed algorithm helped to identify the pattern of COVID-19 infection. The considered models make it possible to form short-term forecasts with sufficient accuracy. For medium-range forecasting, only the NNAR model can be used from among those considered. The development of methods for highly accurate medium-term and long-term forecasting of COVID-19 cases remains an open task, as does taking the size of the vaccinated population into account when making forecasts.

The work was supported by Act 211 Government of the Russian Federation, contract No. 02.A03.21.0011. The work was supported by the Ministry of Science and Higher Education of the Russian Federation (government order FENU-2020-0022).

References

1. Barbarash O.L., Karetnikova V.N., Kashtalap V.V., Zvereva T.N., Kochergina A.M. [New coronavirus disease (COVID-19) and cardiovascular disease]. Complex Issues of Cardiovascular Diseases, 2020, vol. 9, no. 2, pp. 17-28. (in Russ.)

2. Abotaleb M.S.A. Predicting COVID-19 Cases Using Some Statistical Models: An Application to the Cases Reported in China Italy and USA. Academic Journal of Applied Mathematical Sciences, 2020, vol. 6, no. 4, pp. 32-40. DOI: 10.32861/ajams.64.32.40

3. Tian Y., Ishika L., Xi Zh. Forecasting COVID-19 cases using Machine Learning. medRxiv, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 2020. DOI: 10.1101/2020.07.02.20145474

4. Gecili E., Assem Z., Rhonda D.S. Forecasting COVID-19 confirmed cases, deaths and recoveries: Revisiting established time series modeling through novel applications for the USA and Italy. PloS one, 2021, vol. 16, no. 1, pp. 1-11. DOI: 10.1371/journal.pone.0244173

5. Hussain N., Baoming L. Using R-studio to examine the COVID-19 Patients in Pakistan Implementation of SIR Model on Cases. International Journal of Scientific Research in Multidisciplinary Studies, 2020, vol. 6, no. 8, pp. 54-59. DOI: 10.13140/RG.2.2.32580.04482

6. Koronavirus. Statistika [Coronavirus. Statistics]. Available at: https://yandex.ru/covid19/stat.

7. Moein S., Nickaeen N., Roointan A., Borhani N., Heidary Z., Javanmard S.H., Ghaisari J., Gheisari Y. Inefficiency of SIR models in forecasting COVID-19 epidemic: a case study of Isfahan. Scientific Reports, 2021, vol. 11 (1), pp. 1-9. DOI: 10.1038/s41598-021-84055-6

8. De Livera A.M., Hyndman R.J., Snyder R.D. Forecasting Time Series with Complex Seasonal Patterns Using Exponential Smoothing. Journal of the American statistical association, 2011, vol. 106, pp. 1513-1527. DOI: 10.1198/jasa.2011.tm09771

9. Lakman I.A., Agapitov A.A., Sadikova L.F., Chernenko O.V., Novikov S.V., Popov D.V., Pavlov V.N., Gareeva D.F., Idrisov B.T., Bilyalov A.R., Zagidullin N.S. [COVID-19 mathematical forecasting in the Russian Federation]. Hypertension, 2020, vol. 26, no. 3, pp. 288-294. (in Russ.)

10. Shokeralla A.A.A., Sameeh F.R.I., Musa A.G.M., Zahrani S. Prediction the daily number of confirmed cases of Covid-19 in Sudan with ARIMA and Holt-Winters exponential smoothing. International Journal of Development Research, 2020, vol. 10, no. 8, pp. 39408-39413. DOI: 10.37118/ijdr.19811.08.2020

11. Makarovskikh T.A., Abotaleb M.S.A. [Automatic Selection of ARIMA Model Parameters to Forecast COVID-19 Infection and Death Cases]. Bulletin of the South Ural State University. Ser. Computational Mathematics and Software Engineering, 2021, vol. 10, no. 2, pp. 20-37. (in Russ.) DOI: 10.14529/cmse210202

12. Banerjee M., Tokarev A., Volpert V. Immuno-epidemiological model of two-stage epidemic growth. Mathematical Modelling of Natural Phenomena, 2020, vol. 15 (27), pp. 1-11. DOI: 10.1051/mmnp/2020012

13. Sun D., Duan L., Xiong J., Wang D. Modelling and forecasting the spread tendency of the COVID-19 in China. Advances in Difference Equations, 2020, vol. 1: 489. DOI: 10.1186/s13662-020-02940-2

14. Barzon G., Rugel W., Manjunatha K.K.H., Orlandini E., Baiesi M. Modelling the deceleration of COVID-19 spreading. Journal of Physics A Mathematical and Theoretical, 2021, vol. 54 (4). DOI: 10.1088/1751-8121/abd59e

15. Ponce M. Covid19.analytics: An R Package to Obtain, Analyze and Visualize Data from the Corona Virus Disease Pandemic. The Journal of Open Source Software, 2021, vol. 6 (60): 2995. DOI: 10.21105/joss.02995

16. Abotaleb M, Makarovskikh T. Analysis of Neural Network and Statistical Models Used for Forecasting of Covid-19 Cases. Available at: https://rpubs.com/abotalebmostafa/752378.

17. Abotaleb M, Makarovskikh T. Epidemic.Network Used for Forecasting of Covid-19 Cases. Available at: https://github.com/abotalebmostafa11/Epidemic.Network.

Received 28 April 2021

UDC 517.7, 519.257 DOI: 10.14529/ctcr210303

Abotaleb Mostafa Salaheldin Abdelsalam, postgraduate student, Department of System Programming, South Ural State University, Chelyabinsk; [email protected].

Makarovskikh Tatiana Anatolievna, Dr. Sci. (Phys.-Math.), Associate Professor, Department of System Programming, South Ural State University, Chelyabinsk; Makarovskikh.T.A@susu.ru.

FOR CITATION

Abotaleb M.S.A., Makarovskikh T.A. Development of Algorithms for Choosing the Best Time Series Models and Neural Networks to Predict COVID-19 Cases. Bulletin of the South Ural State University. Ser. Computer Technologies, Automatic Control, Radio Electronics, 2021, vol. 21, no. 3, pp. 26-35. DOI: 10.14529/ctcr210303
