Detection of anomalies in network traffic using the methods of fractal analysis in real time
In this paper we propose the method for traffic anomaly bursts online detection using fractal methods. Method stands on current measurement of fractal properties using sliding window and muhiresolution wavelet analysis. Traffic's fractal properties are measured by Hurst parameter. Multireslotion wavelet analysis is carried out with using of discrete wavelet transform of scaling processes. As a result, network traffic translates into time-frequency space where it could be variously analyzed. Therefore, wavelet transform could be shown as a sort of concurrent observation of time sequences of all durations on different scales. It has been shown, that in practical use of proposed procedure of estimation of Hurst parameter, lower boundary of scaling should be determined. To choose the interval of scaling it better to use algorithm of auto determination of scaling's lower boundary. By the value of lower boundary we can judge about the placement of boundary between short range and long range correlations in test data. Relations for measurement of fractal dimension for current placement of analysis window that were obtained in experiment, are generalization of known results for online case and allow us to calculate current estimate of Hurst parameter instead of it steady value. Because in the task of sliding window detection the amount of analyzed sequence limits the number of decomposition levels - octaves, the task of detection is considered for different types of maternal wavelets. We provide tips for maternal wavelet type choice and for size of analysis window. We have shown the efficiency of algorithms that are fair for the case with sufficiently common conditions in Gaussian approximation that allow us to detect anomaly bursts of traffic. We provide the extension of obtained with results of Hurst parameter monofractal wavelet measurement for multifractal case. We had performed comparative analysis of the accuracy of traffic anomaly bursts detection with the example of anomaly caused by SYN-flood attack that is called Neptune and using different kinds of wavelets.
Keywords hurst parameter, long range dependency, traffic packets, parameters measurement, stationary, telecommunication networks, time boundaries anaysis, wavelet decomposition, analysis sliding window.
Sheluhin O.I.,
Moscow Technical Univercify of Communication and Informatics, Moscow, Head of department of Information security and Automation, professor, PhD, [email protected]
Pankrushin A.V.,
Moscow Technical Univercify of Communication and Informatics, Moscow, PhD student of department of Information security and Automation, [email protected]
1. Introduction
The development of efficient methods for anomaly events detection in network processing, those are result of technical issues or impacts of third party, is an actual task. Related events may happen, for example, while attack or intrusions in network. The main condition that imposes for those methods is the possibility of anomalies detection of any types, including those pre-unknown, also distributed in time impacts.
Considering local behavior of analyzing singularities, their multiscale and varied forms, the most perspective methods for those tasks are methods of multiscale wavelet analysis. As it known, one of the advantages of wavelet analysis is its ability to accurately analyze changes in the properties of the signal in time-frequency domain.
Measurements obtained by using wavelet analysis have additional advantage in form of durability against some types of traffic's unsteadiness, for example deterministic trends.
Nowadays it has been strongly proven that in the wide set of generalized data types, including those that similar to the mostly important components of telecommunication traffic, there are fractal properties. The most known examples are shown in [24].
Determination of the parameters self-similarity in real time is needed to create the controls for managements and monitoring networks, which might work with same frequency scales as network traffic belongs to. Also it is needed for the ability to perform measuring on network devices itself without need to save large amounts of data in memory.
Unlike other articles about this topic the main aim of this paper is the research of detection ability of changes in fractal scale (power of self-similarity) which caused by anomaly changes in properties of network traffic in online (current) time scale. To address this task we're proposing to use widely spread and efficient method for determination the current fractal scale of network traffic, that based on mathematical methodic of multiresolution analysis [2], [3].
The approach of determining the parameters of self-similarity in real time, shown below, is needed to make the efficient tool for intrusion detection in computer networks and gives wide abilities for the analysis of signal properties.
1. Main declarations of muitircsolution analysis
As ii known, orthogonal mult ¡resolution analysis (MRA) refers to definition of L~(R) field of complex value functions Aff) with finite energy through hierarchically nested subspaces y^ c L2(R\TV, = Q,+l,+2 which are nonintersecting and limit
of union of which gives L~(R). This system of subspaces should satisfy to conditions:
- Enclosure: V c V
'rn 'm + ii
- Completeness and density ofpartition; Umef VL —■ L2 (R) i
- Orthogonally subspaces: f \'m = ft)};
- Preservation in subspace with shift of functions: *CO e Vm « xCt + 1)6 Vrn:
For any function x{t) e Vm scale transform of argument by 2 moves function into nearby subspace:
For spacc pr0 it is function o( t) G V0 which integer shifts by argument make ortho normal basis of space V01 <p0,k = <p(t-k),kel(.k = 0, ±1, ±2, ...)■ Function callcd scaling function and it implements normalization condition
I"
From those conditions it infers, that if subspace V» has orthonormal basis (pok- then all other subspaces also have orthonormal bases which are
made by scalc transform of basis <p6y.<pmM(i) = c&(p(aml - k),m,k E / If the signal jt(t) belongs to spacc V„„ then at the same time it belongs to space y„„i and also there is the signal .v(2/} in that space. Increasing the space's number wc can research smaller and smaller details and features of the signal with help of higher frequency components (there is analogy with microscope).
All of the noted conditions together make it possible to decompose any signal X(t) F 1? (ft) by subspaces Vmt i.e. by set of sequential mul-
tiscale and orthogonal to each other functions xm(t') G Vm- Union of
result functions gives initial signal X(l), or approximates signal with certain accuracy that depends on boundaries for number of values of scaling coefficient m (so as number of subspaces V„). Functions x^ft)
are orthogonal projections of the signal ,v{/) on subspaces Vm. That gives the possibility to analyze function or signal with different levels of resolution or scale. Variable m is called the scale coefficient or the level of analysis. If value of m is low, then function _vm(t) 'S harsh approximation of .\'(y) that has no details. With increase of value of m accuracy of approximation is increasing as well.
From combination of initial conditions of muitircsolution analysis it infers that translation of the signal from the space K„„, with bigger resolution into space V,„ represents normalized decimation of the signal -double filtration with appropriate double decrease in number of signal points. That is likely to low frequency filtration of signals (Jt) € Vm +. hy operator hk with slice frequency equals to half of
Nyquist frequency of signals fm+. (k) with reduction (because of filtering) of main frequency interval of decimated signal xm(k~) 'n tw°. The
procedure has very certain physical representation in signals specter division (and so signals themselves while reconstruction from specter) for low frequency Vm.t(w) and high frequency W„._1 parts.
To except the loss of high frequency information, that might he needed for reconstruction of a signal, wc need to "translate" and to save it in new subspaces Wm, which are orthogonal to subspaces V„ as following: ^m + i ~
Subspaces W,„ are called as details in that meaning, that its contain that additional information (that does not intersect with space V„),
which is needed for level increase of signal resolution from V„, to K„,i while reconstruction.
Therefore while we perform multiresolutional analysis of signal X(l), that notes the sequence of projections from signal X(l) to every approximation subspace Vj, we can infer:
approX/Ct) = (Projy.x) (r) .= ^T ax.(j, k)<pJik(t)
k
Because c then approx^ 's rougher approximation of .V,
than appruxj therefore the main idea of MRA is in consideration of
information losses containing in details while transformation from one approximation to another more rough approximation:
detailjlt) = approx>_ 1 (t) — approXj(t)
MRA shows, that details of the signal deluilj cim ^ obtained straight
from projection X(t) to subspace W, which is called as wavelet space. Furthermore, in theory MRA shows that there is function that is called as
maternal wavelet, and there is derivative from (pg, that is it example
i^j,kW = 2-jVV0(2~ fc).fc e z) makes Riesz secluellce for wi-
detailjQO m (ProjWjx)(_f) = ^dx(j,kypJk(jO.
k
As a result, MRA is in rewriting information in X(t) according to
obtained with different resolution details and approximation with low resolution
xit) = approXjtt) + ^ detailj(t) j=i
1
~ £ k)<plk (t) +V^ dx (ji k)i,jk (t). (1) k j=1 if Coefficients of approximation ax(j,k) and details dx(j,k) are determined by scalar multiplications with scaling function cpjk(t) ^ maternal wavelet yp k (t) respectively.
Therefore, regarding to wavelet analysis1 s definitions, time sequence X(r) can he observed as = Xj(t) + Y/ Dj (t)' where
X(t) = S^-q2 1 a <cV]k(0 is a tunct'on of initial approximation
which corresponds to scale j (j </maje), and a,ik = (X(t),(p} k) is a
scaling coefficient that is equal to scalar multiplication of initial sequence i'V) and scale function of roughest scale J that shifted on k scale points to right from beginning of coordinates;
£/(0 = ZIIq1'1 djJcipJk(t) ~ delail function of scale level /, where k — u)is a tielaiI coemcienl of scale./' that is equal to scalar
multiplication of initial sequence X(i) and wavelet of scale/, which is shifted on k scale points to right from the beginning of coordinates. Here r.0 = 2Jmax.(n0 < jV) an(l = [log. jV] is maximum number of decomposition scales; [log - iV] ~ integer part of number [log- N]-
The value of scale index / = 0 corresponds to the ease of maximal resolution - most accurate approximation that is equal to initial sequence X(i), that consists from n0 points. With increase in j (Q <j < )max] we 'lave
transformation to rougher resolution.
Features of sliding window method's for measurement of self-similarity
Let [Y(f(),j — 1,Af} will be discrete random process that is determined on interval i = I...N and let wavelet transform of a traffic will be performed inside sliding window with size N. Analysis window shifts
with step i < iV- As in result while moving from left to right analysis
window will do in steps Ai = — m ~ 1, M Then detail coefficients
a
dj% in fij-th step of a window could be found in the end of analyzed interval.
As it shown in [2], to measure the fractal dimension in m-th step of a w indow we need to evaluate the mean of squares of detail coefficients
Ai olle lralfic sequence with size N„, Here M[.] is an
operation of determination of second central moment. Of the size of sequence X(t) that is analyzed in window is equal to Na, then the possible number of wavelet coefficients in octave j is equal to rij = 2~!N0.
Regarding to stationarity and weak statistical dependency of wavelet coefficients, evaluation of aggregated mean value might be carried out
bv local (temporal) mean value nf
. . . .7
(2)
Coefficient I^C")
'(J11) j measures the value of energy in analyzed
signal within the window with number m respectively to time moment 21 k and frequency 2~^v0 where t>0 a random reference frequency that is given by is 'Jj0 function.
To calculate the current measurement of a Hurst parameter Hm
within /H-tli analysts window's position we need to carry out linear regression on a scale of/within interval [ft;_/',] respectively to the equation:
bg г(Я;,т) = loS г
= (2 Hm-i)j+t =
= 0 + where £ = cor.st-
The equation (3) depicts a possible method to measure Hurst parameter of the process as a linear dependence.
This means that if process j = 1,JV0 j has long time dependency
with Hurst parameter H,„, then the plot of log20*;.m) from j, that is called as logarithmic diagram (LD), has linear slope 2Mm —1 and scale index am = (2Hm — l)can be obtained by measuring of slope of the function's log; (JJ.j n) plot from j while every m-th analysis window's position.
The weighted estimate <im for a within interval [/(; /\] on «;-th window's position can be obtained with using the method [3,4, 11 ]:
ffm = Vwjy^m C4)
где
j j
WJ Сss^sD.y
<ш »
TJj/n?.
a} =
v, =
5, -jS±
СSS2-S\-)af
-Ï^J'Ag,
(10) (11)
where T(%) = JQ t* XS 1 dt,T - gamma function, f - it derivative and f(2,z) = 1/C2 + n)2 ~ generalized Riemann zeta function; ^(x) — f (x)/T(x) - 1'si function (also known as Digamma function); rij - number of detail coefficients on decomposition level j.
Therefore, defining qui utiles S,S],S2 we can obtain the weighted estimate of a for q, that isn't shifted within interval [juj^'.
Respectively, the current
(12)
value of Hurst parameter //
m-til window position defines by the relation:
a 1 +
Hm = ~ , m = TJÏÏ .
(13)
In practice, Lo use the proposed procedure for measuring Hurst parameter we need to define the scale lower boundary j\. To choose the interval ofscaling \j\; yj we can use automated algorithm of scale lower boundary detection [7]. By the value of scale lower boundary we can judge about placement of the boundary between short term and long term correlations in experimental data. Because in estimation of Hurst parameter we lake in account only long term correlations and the presence of strong short term correlation might give us only error, therefore as a start point for approximation oflogarithmic diagram we used j\ = 3, Using equations (4),..(12) we can write relation (13) for measurement of Hurst parameter within w-ih window position and octaves / and j2as follow:
1
"mOWî) = о
-+1
(14)
where
jOO
(J,k)\2)
In2 2
rij In3 2
(5)
(6) (?)
m
(9)
and weight coefficient
,J k
S, = (nln22)/2/+1 is opposite to theoretical asymptotic variance tjj
[3, 4J. The equation (14) is a generalized form of known results for the case of analysis sliding window and unlike [7] it allows us to calculate the current estimate and not the Hurst parameter steady value.
Because it's priory unknown, which and on what scales the property of sealed invarianee could exist, if ever exists, then in practical use we can make the solution with lower octave intervals for every alleged scaling mode. The decision about that accepts based on results of logarithmic diagram observation using confidence intervals. The linear regression should limit each from the confidence intervals in chosen interval [fi;hl
The logarithmic diagram, that based on discrete wavelet transformation, can be realized using fast pyramidal filtering algorithm with very low complexity that is approximately 0(/?), where n - long of time sequence. This algorithm also has the advantage in memory utilization with dividing the data into blocks that can be analyzed and reconstructed with least calculation complexity.
The observation of statistical properties of wavelet coefficients, that had been obtained while wavelet analysis of network traffic in real networks [2, 5, 6], shows that in common detail coefficients i/t(j, k) are random Gaussian values. In other words, they had been obtained from wavelet decomposition of a process, that is Gaussian itself. Moreover, if we promote the property of weak correlation of wavelet coefficients into the whole independence, then we would be able to calculate g(f) and aj
analytically from relations (8) and (9) respectively;
ll needs to be noted, that those analytical equations depend only on knowledge of iy and can be easily measured In practice. Simulations derived in [4] show, that for the Gaussian processes those analytical calculations are good approximation for real conditions.
In case of non-Gaussian process estimator realizing would be harder. However, in case of the processes with limited variance values fl/tlj) Vt dyij, A* i2 are asymptotically Gaussian and it can be shown
that correction values can be derived [3] to Gaussian case using following relations:
1 + C4(/)/2 2 2 1 + Q(/)/2
3(J)~ 71,1032 ' aJ " log^ 2 nj where C4 is normalized cumulant order of 4 for the wavelet coefficients of octave j:
Mdx(j.ky~3(Mdx(j.kyy *}) (MdxQ,kyy
Efficiency of estimation of //;
As it known [2, 3, 4, 7], with presence of long range dependency standard estimates of second order statistic like (1 хь ) '^at ''ог
example like variance of process .v. have very bad statistical properties, because analyze data has strong correlation.
В пространстве отображения вей влет коэффициентов, как показано [20], для LRD процесса с параметром И
Mdx(j,k)dx(J,kJ) = 0(\к - k'\2H-2~2Nl\k - к'\ - со.
It clearly shows that correlation structure of processed data, i.e. the data in form of wavelet coefficients, doesn't have long term dependency with condition N > H 1 as in opposition to long term dependency which contains in original data.
This reduction of correlation in analyze data set, that bad been obtained after wavelet analysis, allows us to use standard evaluations of variance as 1 /Tlj fc) |г With Gaussian and quasi-decorrclated
wavelet coefficients an equation for variance of estimate И can be shown as [7]
- 2 1 — 2;
Where / = j2 — Л is a number of octaves that are used in linear interpolation and n. — 2~JlN0 is a number of available coefficients in octave j\.
From the formula for variance estimation (9) in Gaussian and asymptotic approximation we can get the confidence interval
H — sgzp — H - H
where Zq presents 1 — /} quintlle of Gaussian distribution, i.e. P(z > Zp) = (3. All results shown below and obtained from simulation and from the real data analysis were calculated with p = 0.C25 (i.e. 95% confidence interval} and based on previously defined hypotheses.
The extension of detection algorithm
By extension of the considered before monofractal wavelet estimate we mean multifractal estimate. In addition to second moments (variance) for wavelet coefficients that estimate may takes into account moments with higher order "j
(14)
3 k = 1
which is likely to equation (13), that is used in monofractal case, but that took in account the order of moment q.
Experimental results
Fig. I presents the results of estimating the self-similarity in sliding window with different q. The plot of N = 10J long was considered as a implementation of traffic with anomaly that was caused by network attack SYN-flood (Neptune). The time placement of attack is shown on the top and marked with the window.
Analysis window was configured as follow: A = 10; N(> = 10\ For the wavelet system we used 1 laar wavelet. The interval \j\yj2\ was taken as [3,9].
_J—1
100 200 300 400 500 600 700 800 300 1000
0 too »0 300 J00 500 600 700 800 300 1000
Fiy. 1. The plot of realization of network traffic with anomaly and the estimates of Hurst parameter with different values of parameter q using Haar wavelets. Parameters of the window are A = 10; Afc = I03
The opportunities for anomaly detection in network traffic using fractal methods are well illustrated with estimations histogram of Hurst parameter for moments "before" and "on" the attack (fig. 2).
It can be seen that detection of anomaly bursts of traffic with using wavelets can be performed by estimation of burst in current value of Hurst parameter. Using Ney man-Pears on lemma the choice of the threshold can be done based on given value of probability of
false alarms;
r+CB
PFA = \ w(x)e(3C = const
threshold
The values of found characteristics of anomalies detection reliability when used as in formative parameter Hurst are shown in Table I.
Гкстогрэмиз распределения "на атаке"
This multifractal estimate measures the slope am performing linear regression for with given interval for j. Flurst parameter H can be
rj .m
calculatcd with equation
(15)
o
Нщд -0.5+-
3.J5 05 0.Б5 0.6 D.65 0.7 0.75 0.3 Q.85 0.9 0.И 1 1.И t.l 1 15 1.2 ^начення параиатра Карста
Fig. 2. Hurst parameter histogram on the basis of processing the entire implementation
Table I
The characteristics of reliability of detection of anomalies caused by SYN-flood (Neptune) network attack using Haar wavelets
q = 2 | if = 4 ] ¡, = 6 q = 2 \ q=4 j q=6
PFA 0.01 0..05
Utiwahold 0,86 0,6 0.56 0,76 0.6 0.56
Probability of right detection 0,93 0,98 0,96 0,99 0,99 0,98
1 DO 200 500 400 M W IN W ЗОН 1000
II» W ЛИ Ш Я Ш 700 900 W0 1000
Fig. 3. Tlie piot of traffic implementation with presence of anomaly impaci and
estimates of Hurst parameter with different values of parameter q using Daubechies wavelets (db6). Analysis window's parameters A = 10; iV0 = 2900
In a second case we will characterize analyzing data with condition of equal number of decomposition levels independently from the choice of wavelet system. As the result, we had to increase the length of analysis window when using Daubechies wavelets (db6) to 2900 coefficients, A = 10, then number of octaves were equal to 8. The values of obtained characteristics of anomaly detection reliability using Hurst parameter and Daubechies wavelets (db6) are shown in table 2.
Table 2
The characteristics of reliability of detection of anomalies caused by SYN-flood (Neptune) network attack using Daubechies {(/66) wavelets
я = 0.01 Ч=2 <Г 4 4 = 6 a - 0.05 4 = 2 </ = 4 q = 6
Threshold 0,85 0,44 0,38 0,62 0,42 0,37
Probability of right diitcelion 0,51 0,96 0,96 0,99 0,99 0,97
The choice of wavelet system
In [1...3] shown that for estimation of Hurst parameter H, the main property of wavelet is the number of tending to zero moments. It doesn't matter if wavelet is symmetrical or not, has orthogonal, half or biorthogonal basis - it has no significant theoretical or practical meaning. However, the estimation of Hurst parameter in detection tasks has its own features related to task condition. It depends on a length of analysis data. Because in detection task the length of analyzing traffic is limited by the size of analysis w indow, the number of decomposition levels is varying for different wavelet systems. We show this with the example of two wavelet systems - Haar and Daubechies. The reason why there were chosen Haar and Daubechies wavelet systems is its orthonormality and that they forms the basis where number of moments seeking to zero can be easily increased.
In a first case we will characterize analyzing data that obtained using analysis sliding window with length of 1000 coefficients independently from wavelet system. As the result, using Haar wavelets the number of decomposition levels was equal to 9, while using Daubechies wavelets the number of decomposition levels was only 5. Considering that the error of Hust parameter estimation depends on available number of decomposition levels it can be derived that in case of Haar wavelets it will be significantly lower than in case of Daubechies wavelets.
The analysis of these values shows that reliability of detection significantly increased compare to the case of using window that gives only 5 octaves. However there is unacceptable large probability of false alarms caused by mismatch between lengths of analysis window and network attack. Therefore, analysis that been carried out allows us to conclude that the choice of wavelet system has significant meaning in detection tasks.
Conclusion
To solve the problem of detecting changes in the fractal dimension due to abnormal changes in the properties of telecommunications traffic in real (current) time scale is proposed to use the method of assessment, which is based on the mathematical formalism of multiresolution analysis, To solve the problem of detection is proposed to use mono and multifractal estimates.
The proposed method is a generalization of commonly known results for the case of analysis sliding window and instead of known articles it allows calculation of current estimate and not the steady value of Hurst parameter. It is shown that in detection problems the choice of wavelet system has significant meaning and the length of analysis window is limited from the one side by condition for precision of Hurst parameter estimation and from the other side by length of the anomaly.
The proposed method of self-similarity parameters real time estimation is required for making of effective tools for intrusion detection in computer networks and presents wide opportunities to solve the problems of analysis of properties of telecommunication signals.
lieferences
1. Sheluhin 0.1., Sakalema D.Zh.. Filinova A.S., Intrusion detection in computer networks. Network anomaly. - M: Hotline - telecom, with 2013-220.
2. Sheluhin O.I. Mult [fractals. Information applications. M,: Hotline -Telecom, 2011. - 314 p.
3. Abry P.. Veitch D.. Wavelet analysis of long-range dependent traffic, IEEE Trans, on Info. Theory, vol.44, no. I, p. 2-15, 1998.
4. Abry P., Taqqu MS, Flandrin P.. Veitch D., Wavelets for the analysis, estimation, and synthesis of scaling data, in Park K„ Willinger W. (Eds.), Self-similar Network Traffic and Performance Evaluation, John Wiley & Sons, pp.39-88,2000,
5. Sheluhin O.I., Pankrushin A.V., Validation ofnetwork traffic anomaly detection methods discrete wavelet analysis // T-Comm, №10, 2013. - From 110- M 5.
6. Sheluhin O.!., Patikrusliin A.V., Measuring of Reliability of Network Anomalies Detection Using Methods of Discrete Wavelet Analysis// Science and Information (SA1). Conference 2013. London, (JK.p.393-397.
7. Veitch D., Abry P.. A wavelet based joint estimator of the parameters of longrange dependence. 1EEF, Transactions on Information Theory (special issue on Mullis-ea!e statistical signal analysis and its applications), vol. 45, no, 3, pp.878-897, 1999.
8. Veitch D„ Abry P. Fkmdrin P.. $hainais P.. Infinitely divisible cascade analysis of network traffic data, in Proceedings of ihe International Conference on Acoustics, Speech, and Signal Processing (Istanbul. Turkey), June 2000.
9. Sheluhin OA., Smolskiy S.M., Osin A.V., Self-similar processes in telecommunications. John Wiley & Sons, 2007. - 320 p.
10. S. Mallcit, Wavelets in signal processing. Lane, from English, - M.: World 2005.-671 p.
! I, Sheluhin OX, Antnnian A.A., Analysis of changes in the fractal properlies of telecommunication traffic caused bv abnormal invasions // T-Comm, №6, 2014.- C.61-64.