INVESTIGATION OF THE BAYESIAN AND NON-BAYESIAN TIME SERIES FORECASTING FRAMEWORKS IN APPLICATION TO OSS SYSTEMS OF THE LTE/LTE-A AND 5G MOBILE NETWORKS
DOI: 10.36724/2072-8735-2022-16-4-52-60
Vladimir A. Fadeev,
Kazan National Research Technical University named after A.N. Tupolev-KAI, Kazan, Russia, [email protected]
Shaikhrozy V. Zaidullin,
Kazan National Research Technical University named after A.N. Tupolev-KAI, Kazan, Russia, [email protected]
Adel F. Nadeev,
Kazan National Research Technical University named after A.N. Tupolev-KAI, Kazan, Russia, [email protected]
Manuscript received 21 March 2022; Accepted 14 April 2022
Keywords: key performance identifiers (KPI), RAB (Radio Access Bearer) protocol, E-RAB (E-UTRAN Radio Access Bearer), LTE (Long-Term Evolution) network, LTE-Advanced network, 5G, time series forecasting, Bayesian, SARIMAX, XGBoost
The applicability of several conventional time series prediction models to the temporal dynamics of the EPS Radio Bearer Setup Failure Rate is examined in this paper. Two main problems of proactive network management have been considered: the prediction of the regular part of the time series and the prediction of outliers. For the regular part prediction, Holt-Winters Exponential Smoothing, Extreme Gradient Boosting (XGBoost), Support Vector Regression (SVR), the Python Dynamic Linear Model (PyDLM) and the Seasonal AutoRegressive Integrated Moving Average (SARIMAX) have been used. The error performance has been analyzed using the Median Absolute Error, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error. A two-step approach has been proposed for outlier prediction. Deployment of such an approach is justified by the fact that the time moments of the anomalies are of primary interest for practical purposes. On the first step, the values of the time series are predicted using one of the above-mentioned models. On the second step, the resulting values from step one are classified using a discrete-state Hidden Markov Model. For error performance estimation, the True Positive, False Positive and False Negative rates have been calculated for the respective models. Finally, several proposals for the usage of the considered algorithms in proactive offline network management in an Operation Support System (OSS) or Network Data Analytics Function (NWDAF) have been made.
Для цитирования:
Фадеев В.А., Зайдуллин Ш.В., Надеев А.Ф. Исследование байесовских и небайесовских методов прогнозирования временных рядов в применении к системам OSS мобильных сетей LTE/LTE-A и 5G // T-Comm: Телекоммуникации и транспорт. 2022. Том 16. №4. С. 52-60.
For citation:
Fadeev V.A., Zaidullin S.V., Nadeev A.F. (2022). Investigation of the Bayesian and non-Bayesian time series forecasting frameworks in application to OSS systems of the LTE/LTE-A and 5G mobile networks. T-Comm, vol. 16, no. 4, pp. 52-60. (in Russian)
1. Introduction
The rapid growth of user-plane and control-plane traffic, together with constant technological improvement, simplifies the implementation of new functionality related to predictive data analysis and, at the same time, makes it more important. In the case of mobile networks, the main module to be considered is the Operation Support System (OSS).
An OSS for cellular communication systems up to LTE-A can be decomposed into the following interconnected entities [1-3]:
- an entity which performs collection of the numerical indicators representing various aspects of network deployment (performance, fault-tolerance, incident rate, etc.);
- entity which provides network management options (interaction with cellular network elements);
- database management system (DBMS);
- graphical user interface for network service engineers.
The system architecture described above is visualized in Figure 1.
Figure 1. System architecture of LTE/LTE-A OSS

The research is focused on the opportunities to extend the functionality of OSSs with a predictive function that can provide the ability to predict various key performance identifiers (KPI) of mobile networks.

2. Data models

2.1. Initial structure of the data

Often, cellular network operators collect network performance indicators in the form of time series, aggregated with a distinct periodicity on both the radio access network (RAN) and core network (CN) sides. KPI values can be represented in the form of a third-order array, with each dimension corresponding to KPI values, time instants, or data aggregation objects (Fig. 2). In order to obtain the overall network performance, KPIs can be combined over the aggregation objects, which reduces the data structure depicted in Figure 2 to a matrix form.

Figure 2. Example of multidimensional representation of the cellular network time data

2.2. Temporal decomposition of time series (structural model)

One of the most important aspects of time series analysis is to separate the deterministic and random temporal components. The time series from the previous subparagraph can contain harmonic components (seasonality), since some indicators correlate with human life cycles (working hours and days, weekends, holidays). Trend components may also be present because of an increase or decrease in the number of subscribers. Consequently, an ETS (error, trend, seasonality) decomposition model [4, 5] is suitable for the considered data. The decomposition can be either additive or multiplicative.

An additive model can be defined as follows:

y_t = S_t + T_t + R_t     (1)

where y_t is the value of y at time moment t, S_t is the seasonal component, T_t is the trend component, and R_t is a residual containing random fluctuations that cannot be described in terms of the previous two components (noise, spikes, etc.). A multiplicative decomposition, on the other hand, is defined as follows:
y_t = S_t × T_t × R_t     (2)

or, equivalently:

ln(y_t) = ln(S_t) + ln(T_t) + ln(R_t)     (3)
We also consider this type of decomposition as a theoretical basis for other linear decompositions belonging to the ETS family. The models (1) and (2) can be extended with heuristically derived components, for example, as in the Facebook Prophet model, which also accounts for a public holidays component [6, 7].
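As an illustration of the additive decomposition (1), a minimal sketch using the statsmodels seasonal_decompose routine is given below; the file name, the KPI column name and the daily period of 24 hourly samples are illustrative assumptions.

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hourly KPI measurements; the file name and the column name are illustrative.
kpi = pd.read_csv("erab_setup_failures.csv", index_col="timestamp", parse_dates=True)

# Additive decomposition mirroring formula (1): y_t = S_t + T_t + R_t,
# with an assumed daily period of 24 hourly samples.
decomposition = seasonal_decompose(kpi["E_RAB_SETUP_FR"], model="additive", period=24)
trend, seasonal, residual = decomposition.trend, decomposition.seasonal, decomposition.resid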
The KPI considered in this article is the number of unsuccessful E-RAB (E-UTRAN Radio Access Bearer) connections (E-RAB Setup Failures). The main reason for the appearance of the above-mentioned failures is the unsuccessful delivery of the "E-RAB SETUP REQUEST" message (Fig. 3) from the MME (Mobility Management Entity) in the network of the considered operator. These events happen due to a corrupted checksum of the particular message caused by problems in the communication channels between the base stations and the CN.
Figure 3. The E-RAB establishment procedure
The decomposition of the E-RAB Setup Failures KPI measurements by formula (1) is shown in figures 4 and 5.
Figure 4. Visualization of the trend and residual components of the E-RAB Setup Failures KPI measurements (additive ETS model).
Figure 5. Visualization of the seasonal component of the E-RAB Setup Failures KPI measurements (additive ETS model)

The decomposition of the considered KPI measurements by formula (3) is shown in figures 6 and 7.
Figure 6. Visualization of the trend and residual components of the E-RAB Setup Failures KPI measurements (multiplicative ETS model)
Figure 7. Visualization of the seasonal component of the E-RAB Setup Failures KPI measurements (multiplicative ETS model)
Based on the knowledge about the residual components (Fig. 4, 6), the additive model can be assumed to be the more appropriate one.
2.3. Statistical decomposition of time series
Another important aspect of time series analysis is to determine the statistical distribution of the considered data. Moreover, real data can contain more than one statistically independent component.
Since our KPIs depend on various factors (time of day, overall number of subscribers, possible incidents) which do not correlate with each other strongly, it is reasonable to assume that a KPI value can be decomposed into several statistically independent components:

f(x | Θ) = Σ_{i=1}^{K} α_i f_i(x | θ_i),   α_i ≥ 0,   Σ_{i=1}^{K} α_i = 1     (4)

where α_i are the weights of the probabilistic mixture, f_i(x | θ_i) is the probability density function of the i-th mixture component, θ_i is the set of its distribution parameters, and K is the total number of components. In this paper, we also assume that our data contain a regular (repetitive) statistical component and a component of outliers (Fig. 8).

Figure 8. Month-wise box plot of the E-RAB Setup Failures measurements (year 2019)

The large number of outliers is one of the main motivations for the outlier prediction task.
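To illustrate the mixture model (4), a minimal sketch using scikit-learn's GaussianMixture is given below; the two-component assumption (one regular component, one outlier component), the file name and the variable names are illustrative.

import numpy as np
from sklearn.mixture import GaussianMixture

# One-dimensional array of KPI measurements; the file name is illustrative.
kpi_values = np.loadtxt("erab_setup_fr_2019.txt").reshape(-1, 1)

# Two-component Gaussian mixture, cf. formula (4): a regular (repetitive)
# component and an outlier component; K = 2 is an assumption made here.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(kpi_values)

# The component with the larger mean is treated as the outlier component.
outlier_component = int(np.argmax(gmm.means_.ravel()))
outlier_mask = labels == outlier_component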
3. Mathematical background

3.1. Continuous state-space models

One of the most abstract approaches to describing time series is the continuous state-space model and its special case, the dynamic linear model (DLM) [8-13]. A DLM can also be defined as a particular case of an HMM [9]. Because of the predetermined linearity conditions, the data model of the time series is represented as follows:

y_t = x_t + v_t     (14)
where x_t is the component of the current state and v_t is the component of random fluctuations. Moreover, the state component can be represented as [11, 12]:

x_t = θ_t^T f_t     (15)

where f_t is a known regression vector and θ_t is a vector of states, defined as follows:

θ_t = G_t θ_{t-1} + w_t     (16)

where G_t is the state transition matrix and w_t ~ N(0, W_t) is a vector of random state fluctuations.

An important feature of DLMs is the possibility to superimpose multiple simple models into one complex model by concatenating the regression vectors into a matrix and collecting the transition matrices into block-diagonal form:

f_t = [f_1(t)^T, f_2(t)^T, ..., f_k(t)^T]^T     (17)

G_t = blockdiag(G_1(t), G_2(t), ..., G_k(t))     (18)
where k is the number of particular DLMs.
Conventionally, the following components can be represented by particular DLMs (a usage sketch is given at the end of this subsection):
- trend component;
- long-term and (or) short-term seasonality;
- dynamic component, which represents the influence of external factors;
- autoregressive component AR(p), which allows taking into account short-term dependencies within the time series.
Training of DLM models is composed of two stages:
1) on the first stage, also called the filtering stage, the state estimates are calculated by the Kalman filter using only the observations available up to the current time moment;
2) on the second stage, or smoothing stage, the previously estimated states are adjusted with respect to the later observations (e.g. via the Rauch-Tung-Striebel algorithm).
These models are based on Bayesian theory and can be considered as a reference for all time series models that have the Markovian property.
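A minimal sketch of such a superposition with the PyDLM library [38] is given below; the component names, discount factors, the daily period of 24 samples and the synthetic input series are assumptions for illustration, and the exact option and method names may differ between library versions.

import numpy as np
from pydlm import dlm, trend, seasonality

# Synthetic hourly KPI series with a daily cycle, used only for illustration.
t = np.arange(24 * 60)
kpi_values = 0.1 * np.sin(2 * np.pi * t / 24) + 0.01 * np.random.randn(t.size)

# Superposition of a linear trend and a daily seasonal block, cf. (17)-(18).
model = dlm(list(kpi_values))
model = model + trend(degree=1, discount=0.98, name="linear_trend")
model = model + seasonality(period=24, discount=0.99, name="daily_season")

model.fit()                            # forward Kalman filtering and backward smoothing
mean, variance = model.predictN(N=24)  # 24-step-ahead forecast with variances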
3.2. Holt-Winter's method
One of the classical linear ETS methods is exponential smoothing. The triple exponential smoothing, also known as the Holt-Winter's method, is considered in this research due to the requirement to analyze the seasonal component.
According to [4], an additive decomposition is used more widely in this case. The forecast of the time series value h samples ahead of time moment t can be defined as:

y_{t+h} = l_t + (φ + φ^2 + ... + φ^h) b_t + s_{t+h-m(k+1)}     (5)

where 0 < φ < 1 is the trend damping parameter (for non-damped trends this parameter is equal to 1), h is the integer number of samples ahead for which the forecast is made, m is the number of samples per period (seasonality), and k is the integer part of (h - 1)/m, which ensures that the seasonal index estimates used for the forecast come from the final period of the sample.
The level of the time series at time moment t is:

l_t = α(y_t - s_{t-m}) + (1 - α)(l_{t-1} + φ b_{t-1})     (6)

where 0 < α < 1 is the smoothing parameter. The slope of the trend is:

b_t = β*(l_t - l_{t-1}) + (1 - β*) φ b_{t-1}     (7)

where β* is the trend smoothing parameter. The seasonal component can be defined as:

s_t = γ(y_t - l_{t-1} - φ b_{t-1}) + (1 - γ) s_{t-m}     (8)

where 0 < γ < 1 - α is the smoothing parameter of the seasonal component.
The unknown coefficients α, β* and γ can be obtained at the model training stage using one of the maximum likelihood estimation (MLE) methods [4]. This model can also be considered as an example of a DLM [14] and therefore trained via Bayesian filtering and smoothing procedures.
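A minimal sketch of fitting model (5)-(8) with the statsmodels implementation [35] is given below; the synthetic series and the daily seasonal period of 24 samples are assumptions for illustration, and the parameter names follow recent statsmodels versions.

import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic hourly series with a daily cycle, used only for illustration.
idx = pd.date_range("2019-01-01", periods=24 * 60, freq="H")
y = pd.Series(0.2 + 0.1 * np.sin(2 * np.pi * np.arange(idx.size) / 24), index=idx)

# Additive damped trend and additive daily seasonality, cf. formulas (5)-(8).
hw = ExponentialSmoothing(
    y, trend="add", damped_trend=True, seasonal="add", seasonal_periods=24
).fit()  # the smoothing parameters are estimated automatically

forecast = hw.forecast(24)  # h = 24 samples ahead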
3.3. SARIMAX model
Another widely used linear method is the seasonal autoregressive integrated moving average (SARIMA) model [16-18], which can be described in the (p, d, q) × (P, D, Q, s) form, where p and P are the numbers of required backward samples and periods of the non-seasonal and seasonal components of the time series respectively, d and D are the orders of differencing necessary to reduce the observations and the seasonal component to a stationary form respectively, q and Q are the numbers of required past values of the approximation errors for the non-seasonal and seasonal components respectively, and s is the number of samples per period.
The observed value y_t and the approximation error ε_t are related to each other via the following equation:

Φ_P(B^s) φ_p(B) ∇_s^D ∇^d y_t = θ_q(B) Θ_Q(B^s) ε_t     (9)

where Φ_P(B^s) = 1 - Φ_1 B^s - ... - Φ_P B^{Ps} is the seasonal part of the autoregressive (AR) component of the order P, φ_p(B) = 1 - φ_1 B - ... - φ_p B^p is the non-seasonal part of the AR component of the order p, ∇_s^D = (1 - B^s)^D and ∇^d = (1 - B)^d are the nabla (differencing) operators for the seasonal and non-seasonal components of the orders D and d respectively, Θ_Q(B^s) = 1 - Θ_1 B^s - ... - Θ_Q B^{Qs} is the seasonal component of the moving average (MA) of the order Q, θ_q(B) = 1 - θ_1 B - ... - θ_q B^q is the non-seasonal component of the MA of the order q, and B is the lag operator.
The selection of the parameters p, d, q, P, D and Q can be done automatically via the algorithm described in [16]; on each iteration of this algorithm, the sets of unknown coefficients of the AR and MA polynomials are estimated using one of the MLE optimization algorithms [16].
Additionally, this method can be extended by exogenous variables (SARIMAX model) [17] and described as an example of DLM models [18].
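A minimal sketch of the automatic order selection with the pmdarima package [37] is given below; the synthetic series, the calendar features (hour, day of year), the daily seasonal period m = 24 and the argument name X (older package versions use exogenous) are assumptions for illustration.

import numpy as np
import pandas as pd
import pmdarima as pm

# Synthetic hourly series plus calendar features, used only for illustration.
idx = pd.date_range("2019-01-01", periods=24 * 30, freq="H")
y = pd.Series(0.2 + 0.1 * np.sin(2 * np.pi * np.arange(idx.size) / 24), index=idx)
X = pd.DataFrame({"hour": idx.hour, "dayofyear": idx.dayofyear}, index=idx)

# Stepwise Auto-ARIMA search over (p, d, q) x (P, D, Q, 24) with exogenous regressors.
model = pm.auto_arima(y, X=X, seasonal=True, m=24, stepwise=True, suppress_warnings=True)

future_idx = pd.date_range(idx[-1] + pd.Timedelta(hours=1), periods=24, freq="H")
X_future = pd.DataFrame({"hour": future_idx.hour, "dayofyear": future_idx.dayofyear}, index=future_idx)
forecast = model.predict(n_periods=24, X=X_future)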
3.4. Common machine learning forecasting models
As an alternative to the above-mentioned methodology with a predefined statistical model, a more heuristic approach such as a regression tree can be employed for anomaly prediction. For our research, we use the Extreme Gradient Boosting (XGBoost) algorithm [19], which is widely used for classification and regression tasks.
The algorithm is based on Classification and Regression Trees (CART) [20]. Based on this, we can specify the following advantages (a usage sketch is given after the lists below):
- very fast execution;
- no need for data normalization;
- can handle non-linear dependencies.
The disadvantages of CART are consequences of using a finite set of target values for the regression, namely:
- poor performance on time series with a trend;
- no extrapolation or interpolation capabilities;
- performance degradation on noisy data.
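A minimal sketch of using XGBoost as a time series regressor on calendar features (day of year and hour, matching the feature set used in Section 4) is given below; the synthetic data, the train/test split and the hyperparameter values are illustrative assumptions.

import numpy as np
import pandas as pd
import xgboost as xgb

# Synthetic hourly KPI series, used only for illustration.
idx = pd.date_range("2019-01-01", periods=24 * 60, freq="H")
y = pd.Series(0.2 + 0.1 * np.sin(2 * np.pi * np.arange(idx.size) / 24), index=idx)

def make_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    # Calendar features (day of year and hour) serve as the regressors.
    return pd.DataFrame({"dayofyear": index.dayofyear, "hour": index.hour}, index=index)

train, test = y.iloc[:-24], y.iloc[-24:]
reg = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.1)
reg.fit(make_features(train.index), train.values)
y_pred = reg.predict(make_features(test.index))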
Similarly, Support Vector Regression (SVR) can be applied; its main idea is to determine the regression's optimal error bound. The following advantages might be derived from the literature [21-24] (a usage sketch is given after the lists below):
- can handle non-linear dependencies via Kernel trick;
- relatively simple general idea;
- may be applied to multivariate time series.
However, the following drawbacks are determined for this method:
- high computational complexity;
- sensitivity to data normalization;
- lower performance on time series with high volatility.
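A corresponding sketch for SVR with the scikit-learn implementation [36] is shown below; it reuses the illustrative data and features from the XGBoost sketch above, and the kernel and hyperparameter values are assumptions.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Reuses make_features(), train and test from the XGBoost sketch above;
# scaling is included because SVR is sensitive to data normalization.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
svr.fit(make_features(train.index), train.values)
y_pred_svr = svr.predict(make_features(test.index))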
3.5. Anomaly prediction methods
A great challenge in cellular network performance analysis is the prediction of possible incidents based on KPIs. Various approaches to this problem may be found in the literature.
One of them is the identification of internal patterns and dependencies inside the data. Examples of such techniques are non-linear models based on regression trees such as XGBoost [25-28] and various artificial neural network (ANN) techniques such as LSTM networks [29], transformers [30, 31] and ANN autoencoders [32, 33].
In this paper we provide a two-step approach for anomaly prediction:
- on the first step, a conventional time series prediction model is evaluated to predict future values of the time series for a predefined horizon;
- on the second step, the predicted values are classified using a pre-trained classifier to determine the possible time moments of future anomalies (Fig. 9).
So, our main goal is not to predict the exact shape of the time series in the future, but to identify the future time moments with a higher risk of incidents.
Figure 9. The block scheme of the outlier investigation approach
The use of a Gaussian mixture HMM to implement the pre-trained classifier is motivated by the assumption that the statistical distribution of arbitrary data can be approximated by a Gaussian mixture. Therefore, the appropriate HMM to choose is the Gaussian HMM. Parameters of the Gaussian HMM can be estimated using the Baum-Welch expectation-maximization (EM) algorithm [34].
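A minimal sketch of the second (classification) step is given below. It uses the hmmlearn GaussianHMM as a stand-in for the HMM implementation; the paper's experiments use the pomegranate library [40], so the API shown here, the two-state setup and the synthetic placeholder data are assumptions for illustration.

import numpy as np
from hmmlearn.hmm import GaussianHMM

# y_history: historical KPI values used for training; y_forecast stands for the
# output of the first (forecasting) step. Both are synthetic placeholders here.
rng = np.random.default_rng(0)
y_history = np.concatenate([rng.normal(0.05, 0.02, 900), rng.normal(0.8, 0.1, 100)])
y_forecast = rng.normal(0.05, 0.02, 24)

hmm = GaussianHMM(n_components=2, covariance_type="full", n_iter=100)
hmm.fit(y_history.reshape(-1, 1))               # Baum-Welch (EM) training
states = hmm.predict(y_forecast.reshape(-1, 1))

# The state with the larger mean is interpreted as the anomalous one.
anomalous_state = int(np.argmax(hmm.means_.ravel()))
predicted_outlier_times = np.where(states == anomalous_state)[0]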
4. Simulations and results
The results of the predictions of the regular part based on the models described above are provided in Table 1. The following open-source Python 3 libraries are used in this research:
- statsmodels [35] (Holt-Winter's model);
- scikit-learn [36] (SVR);
- pmdarima [37] (SARIMAX);
- xgboost [25] (XGBoost);
- PyDLM [38] (continuous state-space model).
The following exogenous variables are used for XGBoost, PyDLM, SARIMAX and SVR: day of year and hour. The hyperparameters of the SARIMA model are obtained by the Auto-ARIMA algorithm [16]. Median Absolute Error, Mean Absolute Error, Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are used as performance metrics. Step-wise cross-validation with splitting into 12 parts is used. The data set includes values belonging to the year 2019.
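The error metrics and the step-wise cross-validation can be computed as sketched below; the forecasting routine fit_and_forecast() is a hypothetical placeholder for any of the models listed above.

import numpy as np
from sklearn.metrics import median_absolute_error, mean_absolute_error, mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

def evaluate(y, fit_and_forecast, n_splits=12):
    # Step-wise cross-validation; fit_and_forecast(train, horizon) returns a forecast.
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(y):
        y_pred = fit_and_forecast(y[train_idx], horizon=len(test_idx))
        mse = mean_squared_error(y[test_idx], y_pred)
        scores.append((median_absolute_error(y[test_idx], y_pred),
                       mean_absolute_error(y[test_idx], y_pred),
                       mse, np.sqrt(mse)))
    return np.mean(scores, axis=0)  # averaged MedAE, MAE, MSE and RMSE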
Table 1
The results of the predictions of the regular part
Model name Median Absolute Error Mean Absolute Error MSE RMSE
Holt-Winter's (additive trend, additive seasonality) 0.040 0.074 0.055 0.173
SVR 0. 0.087 0.057 0.179
XGboost 0.043 0.052 0.163
PyDLM 0.074 0.105 0.068 0.198
SARIMAX (2, 0, 1) x (1, 0, 2, 24) 0.064 0.098 0.060 0.189
The same features are used for the outlier prediction problem, and the dataset was extended with the years 2017 and 2018. The true positive rate [39] was chosen as a metric to evaluate the performance of the algorithms. The results are provided in Tables 2 and 3. Results for the SARIMAX model are not included because of the absence of reasonably detected outliers. The pomegranate Python 3 library [40] was used as the discrete-state HMM implementation.
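The table columns can be obtained from the sets of detected and predicted outlier time stamps as sketched below; the variable names and the example sets are illustrative.

# Illustrative sets of outlier time stamps (e.g. sample indices or timestamps).
detected_times = {3, 7, 15, 42}
predicted_times = {3, 15, 16}

true_alarms = predicted_times & detected_times     # correctly predicted outliers
false_alarms = predicted_times - detected_times    # predicted but not detected
missed_alarms = detected_times - predicted_times   # detected but not predicted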
Table 2
The results of predictions of day and hour of outliers' occurrence
Model name Number of detected outliers Number of predicted outliers Number of true alarms Number of false alarms Number of missed alarms
Train years: 2017, 2018; test year: 2019.
Holt-Winter's 551 524 523 1 23
SVR 551 2504 551 1953 0
XGboost 551 392 392 0 1
PyDLM 551 3174 325 2849 226
Train years: 2017, 2019; test year: 2018.
Holt-Winter's 1760 2157 1759 398 1
SVR 1760 4186 283 3903 1477
XGboost 1760 2454 1759 695 1
PyDLM 1760 539 38 501 1722
Train years: 2018, 2019; test year: 2017.
Holt-Winter's 1419 1015 1014 1 405
SVR 1419 4228 0 4228 1419
XGboost 1419 1342 1341 1 78
PyDLM 1419 855 2 853 1417
Table 3
The results of predictions of the day of outliers' occurrence
Model name Number of detected outliers Number of predicted outliers Number of true alarms Number of false alarms Number of missed alarms
Train years: 2017, 2018; test year: 2019.
Holt-Winter's 82 — — 0 4
SVR 82 237 82 155 0
XGboost 82 — — 0 20
PyDLM 82 316 81 235 1
Train years: 2017, 2019; test year: 2018.
Holt-Winter's 170 195 170 25 0
SVR 170 351 164 187 6
XGboost 170 220 170 50 0
PyDLM 170 90 44 46 126
Train years: 2018, 2019; test year: 2017.
Holt-Winter's 134 111 110 1 24
SVR 134 363 132 231 2
XGboost 134 130 129 1 5
PyDLM 134 152 57 95 77
The results, averaged over the cross-validation sets and rounded to integer values, are presented in Table 4.
Firstly, the results of the regular part predictions can be considered comparable, and the differences between the models used are almost negligible.
Secondly, from the point of view of maximizing the number of truly predicted alarms and minimizing the number of false alarms, the Holt-Winter's model and the XGBoost algorithm can be considered the most preferable candidates for the outlier prediction task.
Table 4
Summarized results of outliers prediction
Model name Number of detected outliers Number of predicted outliers Number of true alarms Number of false alarms Number of missed alarms
Day and hour of outliers' occurrence
Holt-Winter's 1243 1232 1099 133 145
SVR 1243 3639 278 3361 965
XGboost 1243 1396 1164 232 79
PyDLM 1243 1523 121 1401 1122
Day of outliers' occurrence
Holt-Winter's 127 128 119 9 9
SVR 127 317 126 191 3
XGboost 127 137 120 17 9
PyDLM 127 186 60 — 68
5. Implementation in communication systems

The predictive functionality discussed above might be implemented in current LTE/LTE-A networks in two ways:
- Embedding of new functions and entities into the existing OSS architecture. This approach is reasonable because of new trends in the development of DBMS engines (distributed storage, insertion of artificial intelligence (AI) [41, 42]). The visualization of this approach is shown in Figure 10.
- Creation of independent subsystems in addition to the OSS (Fig. 11). This approach is more preferable for OSSs already put into operation. Such an implementation is also preferable due to the active development of network function virtualization.

Figure 10. System architecture of OSS with embedded predictive analysis tools

Figure 11. System architecture of OSS with additional predictive analysis subsystem

The schemes considered above can also be implemented in non-standalone 5G networks (New Radio with an LTE core). In standalone 5G, such functionality is already a part of the system architecture recommended by 3GPP [43-46]. This function is called the Network Data Analytics Function (NWDAF) [47] (Fig. 12), and it is important because of the higher data rates and the low latency requirements of crucial 5G applications (industrial IoT, telehealth, smart homes and cities, etc.).

Figure 12. Interaction of NWDAF with other 5G network elements
NWDAF includes the following data analysis and prediction options [48, 49]:
- calculation and prediction of the overall network load performance and the load of a specific network slice;
- collection of analytical information and prediction of key performance indicators for a specific network function (NF);
- QoS calculation and prediction for an application or a group of user equipment (UE);
- prediction of expected UE behavior and mobility, UE anomaly detection;
- collection of network overloading information, both current and predicted, for a specific location;
- QoS stability enforcement, including reports.
It is also worth mentioning that the 3GPP recommendations imply a very flexible choice of data analysis tools, as well as an increasing use of third-party open-source software for these purposes.
Conclusions
From Table 1 we can conclude that XGboost and Holt-Winters outperform the other algorithms in terms of the selected metrics for the regular part prediction problem. XGboost also shows a smaller spread of error values compared to the rest of the considered algorithms.
The results for the outlier prediction problem (Table 4) are similar to those derived for the regular part prediction problem. XGboost and Holt-Winters demonstrate low false-alarm and missed-alarm rates compared to SVR and PyDLM for both the two-feature and the single-feature problems. It should also be mentioned that SVR gives a significant false-alarm rate, while PyDLM demonstrates a large missed-alarm rate. The last two observations might be caused by the smoothing nature of the respective algorithms.
The investigation of the applicability of Deep Learning models to the considered tasks can be the next step of this research.
References
1. ETSI TS 132 111-1 V12.1.0 (2015-01).
2. Passionateaboutoss.com. 2021. What are OSS and BSS? | Passionate About OSS and BSS. [online] Available at: <http://passionateaboutoss.com/background/what-are-oss-bss/> [Accessed 27 February 2021].
3. Support.huawei.com. 2021. Huawei iManager U2000 Support Guide, Manuals & PDF - Huawei. [online] Available at: <https://support.huawei.com/enterprise/en/u2000/imanager-u2000-pid-15315> [Accessed 27 February 2021].
4. Hyndman R.J., & Athanasopoulos G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. OTexts.com/fpp2. Accessed on 1.10.2020.
5. Harvey A. & Peters S. (1990), "Estimation procedures for structural time series models", Journal of Forecasting, no. 9, pp. 89-108.
6. Sean J. Taylor, Benjamin Letham (2018). Forecasting at scale. The American Statistician, no. 72(1), pp. 37-45.
7. GitHub. 2020. Facebook/Prophet. [online] Available at: <https://github.com/facebook/prophet/tree/master/python> [Accessed 18 October 2020].
8. West M., Harrison P.J. and Migon H.S. (1985). Dynamic generalized linear models and Bayesian forecasting. Journal of the American Statistical Association, no. 80(389), pp. 73-83.
9. West M. and Harrison, J. (1996). Bayesian forecasting. Encyclopedia of Statistical Sciences.
10. Geweke J. and Whiteman C. (2005). Bayesian forecasting. Forthcoming in Handbook of Economic Forecasting, Edited by Elliott, Granger and Timmermann, Elsevier.
11. Fei X., Lu C.C. and Liu K. (2011). A bayesian dynamic linear model approach for real-time short-term freeway travel time prediction. Transportation Research Part C: Emerging Technologies, no. 19(6), pp. 1306-1318. Vancouver.
12. West M. (2013). Bayesian dynamic modelling. Bayesian Inference and Markov Chain Monte Carlo: In Honour of Adrian FM Smith, pp. 145-166.
13. Ma T.Y. and Pigne Y. (2019). Bayesian dynamic linear model with adaptive parameter estimation for short-term travel speed prediction. Journal of Advanced Transportation.
14. Hyndman R.J., Koehler A.B., Snyder R.D. and Grose S. (2002). A state space framework for automatic forecasting using exponential smoothing methods. International Journal of forecasting, no. 18(3), pp. 439-454.
15. Box G. E. P., & Jenkins G. M. (1970). Time series analysis: Forecasting and control. San Francisco: Holden-Day.
16. Box G. E. P., Jenkins G. M., Reinsel G. C., & Ljung G. M. (2015). Time series analysis: Forecasting and control (5th ed). Hoboken, New Jersey: John Wiley & Sons.
17. Hyndman R.J. and Khandakar Y. (2007). Automatic time series for forecasting: the forecast package for R (No. 6/07). Clayton VIC, Australia: Monash University, Department of Econometrics and Business Statistics.
18. Jalles J.T. (2009). Structural time series models and the kalman filter: a concise review.
19. Koehler A.B., Snyder R.D., Ord J.K. and Beaumont A. (2012). A study of outliers in the exponential smoothing approach to forecasting. International Journal of Forecasting, no. 28(2), pp. 477-484.
20. Chen Tianqi & Guestrin Carlos (2016). XGBoost: A Scalable Tree Boosting System. 785-794. 10.1145/2939672.2939785.
21. Breiman L., Jerome H. Friedman, Richard A. Olshen and C. J. Stone (1983), "Classification and Regression Trees."
22. Drucker Harris & Burges Christopher & Kaufman Linda & Smola Alexander & Vapnik V. (1997), "Support vector regression machines." Adv Neural Inform Process Syst. No. 28, pp. 779-784.
23. Debasish Basak, Srimanta Pal, and Dipak Chandra Patranabis. Support Vector Regression. Neural Information Processing — Letters and Reviews Vol. 11, No. 10, October 2007.
24. Goli Shahrbanoo & Mahjub, Hossein & Faradmal, Javad & Soltanian, Ali-Reza. (2016). Performance Evaluation of Support Vector Regression Models for Survival Analysis: A Simulation Study. International Journal of Advanced Computer Science and Applications. No.7. 10.14569/IJACSA.2016.070650.
25. GitHub. 2022. Dmlc/Xgboost. [online] Available at: <https://github.com/dmlc/xgboost> [Accessed 9 March 2022].
26. Que Z. and Xu Z. (2019). A data-driven health prognostics approach for steam turbines based on xgboost and dtw. IEEE Access, no. 7, pp. 93131-93138.
27. Henriques J., Caldeira F., Cruz T. and Simoes P. (2020). Combining k-means and xgboost models for anomaly detection using log datasets. Electronics, no.9(7), p.1164.
28. Dhaliwal S.S., Nahid A.A. and Abbas R. (2018). Effective intrusion detection system using XGBoost. Information, no. 9(7), p.149.
29. Sak H., Senior A. and Beaufays F. (2014). Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition. arXiv preprint arXiv:1402.1128.
30. Khandelwal, U., He, H., Qi, P. and Jurafsky, D., 2018. Sharp nearby, fuzzy far away: How neural language models use context. arXiv preprint arXiv:1805.04623.
31. Li S., Jin X., Xuan Y., Zhou X., Chen W., Wang Y.X. and Yan X. (2019). Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. arXiv preprint arXiv:1907.00235.
32. Wang M., Abdelfattah S., Moustafa N. and Hu J. (2018). Deep Gaussian mixture-hidden Markov model for classification of EEG signals. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(4), pp.278-287.
33. Zong B., Song Q., Min M.R., Cheng W., Lumezanu C., Cho D. and Chen H. (2018), February. Deep autoencoding gaussian mixture model for unsu-pervised anomaly detection. In International Conference on Learning Representations.
34. Miin-Shen Yang, Chien-Yo Lai, Chih-Ying Lin. A robust EM clustering algorithm for Gaussian mixture models. Pattern Recognition. Vol. 45, Issue 11, November 2012, pp. 3950-3961.
35. Statsmodels.org. 2022. statsmodels.tsa.holtwinters.ExponentialSmoothing - statsmodels. [online] Available at: <https://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html> [Accessed 9 March 2022].
36. scikit-learn. 2022. 1.4. Support Vector Machines. [online] Available at: <https://scikit-learn.org/stable/modules/svm.html> [Accessed 9 March 2022].
37. Alkaline-ml.com. 2020. Pyramid.Arima.Auto_Arima - Pyramid 0.9.0 Documentation. [online] Available at: <https://alkaline-ml.com/pmdarima/0.9.0/modules/generated/pyramid.arima.auto_arima.html> [Accessed 7 October 2020].
38. GitHub. 2022. PyDLM. [online] Available at: <https://pydlm.github.io/> [Accessed 9 March 2022].
39. Angiulli F., Basta S. and Pizzuti C. (2005). Distance-based detection and prediction of outliers. IEEE transactions on knowledge and data engineering, no. 18(2), pp. 145-160.
40. GitHub (2022). Pomegranate. [online] Available at: <https://pomegranate.readthedocs.io/en/latest/HiddenMarkovModel.html> [Accessed 9 March 2022].
41. Huawei Enterprise (2021). GaussDB + AI-Native Distributed Database - Huawei Enterprise. [online] Available at: <https://e.huawei.com/ru/solutions/cloud-computing/big-data/gaussdb-distributed-database> [Accessed 27 February 2021].
42. Li G., Zhou X. and Li S. (2019). XuanYuan: An AI-Native Database. IEEE Data Eng. Bull., no. 42(2), pp. 70-81.
43. 3GPP, "Architecture enhancements for 5G System (5GS) to support network data analytics services," 3rd Generation Partnership Project (3GPP), Technical Specification (TS 23.288), Sept. 2019, version 16.1.0. [Online]. Available: https://portal.3gpp.org/desktopmodules/ Specifications/Specification De-tails.aspx?specificationId=3579
44. 3GPP, "System architecture for the 5G System (5GS)," 3rd Generation Partnership Project (3GPP), Technical Spec- ification (TS 23.501), Sept. 2019, version 16.2.0. [On-line]. Available: https://portal.3gpp.org/ desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3144
45. 3GPP, "5G System; Network data analytics services; Stage 3," 3rd Generation Partnership Project (3GPP), Technical Spec- ification (TS 29.520), September 2019, version 16.1.0. [On-line]. Available: https://portal.3gpp.org/ desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3355
46. 3GPP, "5G System; Unified data management services; Stage 3," 3rd Generation Partnership Project (3GPP), Technical Spec- ification (TS 29.503), Sept. 2019, version 16.1.0. [On-line]. Available: https://portal.3gpp.org/ desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3342
47. Sevgican S., Turan M., Gokarslan K., Yilmaz H.B. and Tugcu T. (2020). Intelligent network data analytics function in 5G cellular networks using machine learning. Journal of Communications and Networks, no. 22(3), pp. 269-280.
48. 3GPP TR 23.791,
49. Inform.tmforum.org (2021). NWDAF: Automating the 5G network with machine learning and data analytics | TM Forum Inform. [online] Available at: <https://inform.tmforum.org/insights/2020/06/nwdaf-automating-the-5g-network-with-machine-learning-and-data-analytics/> [Accessed 27 February 2021].
Information about authors:
Vladimir A. Fadeev, postgraduate student, Kazan National Research Technical University named after A.N. Tupolev-KAI, Kazan, Russia
Shaikhrozy V. Zaidullin, postgraduate student, Kazan National Research Technical University named after A.N. Tupolev-KAI, Kazan, Russia
Adel F. Nadeev, Doctor of Physical and Mathematical Sciences, Professor, Kazan National Research Technical University named after A.N. Tupolev-KAI, Kazan, Russia