An adaptive system for the detection of cyber attacks has been developed, based on improved algorithms for partitioning the feature space into clusters. The recognition procedure was improved through simultaneous clustering and formation of verifying admissible deviations for the attributes of anomalies and cyber attacks. Using simulation models created in MATLAB and Simulink, the operability of the algorithms for recognizing cyber attacks in critically important information systems was verified
Keywords: cyber attack detection system, cyber security, clustering of attributes, verifying admissible deviations
UDC 004.056
DOI: 10.15587/1729-4061.2017.102225
DEVELOPMENT OF A SYSTEM FOR THE DETECTION OF CYBER ATTACKS BASED ON THE CLUSTERING AND FORMATION OF REFERENCE DEVIATIONS OF ATTRIBUTES
V. Lakhno
Doctor of Technical Sciences, Associate Professor*
V. Malyukov
Doctor of Physical and Mathematical Sciences**
V. Domrachev
PhD, Associate Professor
Department of Applied Information System
Taras Shevchenko National University of Kyiv
Volodymyrska str., 60, Kyiv, Ukraine, 01033
O. Stepanenko
Doctor of Economic Sciences, Associate Professor
Department of Economics Information Systems
Vadym Hetman Kyiv National Economic University
Peremohy ave., 54/1, Kyiv, Ukraine, 03057
O. Kramarov
Postgraduate student**
*Department of Managing Information Security***
**Department of Information Systems and Mathematical Sciences***
***European University
Akademika Vernads'koho blvd., 16 V, Kyiv, Ukraine, 03115
1. Introduction
Active expansion of computer technologies, in particular in critically important information systems (CIIS), is accompanied by the emergence of new threats to cyber security (CS). CS of CIIS can be enhanced, in particular, by using intelligent systems (and technologies) for the detection of cyber attacks (ISDA). Given the constant complication of cyber attack scenarios, ISDA must have the characteristics of adaptive systems, that is, the ability to purposefully modify the algorithm for detecting anomalies and cyber attacks by using methods of clustering the attributes of recognition objects (RO), as well as machine intelligent technologies of learning (MITL).
This makes it relevant to improve the existing algorithms and to develop new ones for the clustering of RO attributes, as well as the applied adaptive subsystems that form a part of ISDA.
2. Literature review and problem statement
Information that is accepted as the basis for building the clusters in adaptive systems of recognition (ASR) of cyber attacks was explored in many studies, for example, in the form of complex attributes of RO in CIIS [1, 2]. These studies were mainly of theoretical character. As indicators or metrics [3] for building the classifiers, the authors investigated: threshold values of parameters of the input and output traffic [4], unpredicted addresses of packets [5], attributes of requests to databases (DB) [6, 7], etc. These articles do not take into account the possibility of parallel formation of reference deviations for the features of anomalies and cyber attacks, which increases the time of RO analysis in ASR (or ISDA) [8]. For complex targeted attacks, information attributes may be quite fuzzy [9, 10], which hinders building effective recognition algorithms.
In papers [11, 12], it was assumed that to enhance effectiveness of recognition, it is expedient to split the set of values of each indicator into disjoint groups by certain rules. This task can be solved by using the methods and models for cluster analysis [13, 14]. However, these studies have not been brought to hardware or software implementation.
By using an information condition of functional effectiveness (ICFE) of ASR learning [15, 16], it is possible to implement adaptive algorithms for the clustering of RO attributes into ISDA.
As was shown in articles [17, 18], in case the RO attributes glossary is unchanged, it is possible to improve effectiveness of ASR learning. These studies do not take into account the possibility of increasing the degree of intersection of the RO classes.
Thus, given the potential of the ISDA application, it appears to be an important task to improve the algorithms for clustering and formation of reference deviations of the RO attributes for the timely detection of anomalies and cyber attacks in CIIS.
3. The aim and tasks of research
The aim of the present research is to develop an algorithm for the partition of the feature space (FS) into clusters in the process of recognition of cyber attacks in the systems of cyber protection.
To achieve the aim of the study, the following tasks are to be solved:
- to improve algorithms for the clustering of attributes of anomalies and cyber attacks and for the simultaneous formation of verifying admissible deviations in the intelligent systems of cyber attack detection;
- to conduct simulation in order to test and verify the adequacy of the proposed algorithms.
4. Algorithms for the clustering of attributes and the formation of verifying admissible deviations in the intelligent systems of cyber attack detection
Splitting FS and further clustering, for any RO class $CT_m^o$, in accordance with [19, 20], was carried out by transforming FS to a hyper-spherical form. Since the main stage of clustering when splitting FS into groups is an increase in the radius ($cr_m$) of the container (RC) at every step of ASR (or ISDA) learning, it is possible to use the following recurrent expression:
$$cr_m(ls) = \left[\,cr_m(ls-1) + \xi \mid cr_m(ls) \in IS_m^{cr}\,\right], \tag{1}$$
where $ls$ is the number of steps of increasing the RC of class $CT_m^o$; $\xi$ is the step of increasing the RC accepted for the chosen attributes; $IS_m^{cr}$ is the permissible range of RC values.
In the process of ASR learning, we make an assumption about fuzzy compactness of the implementation of binary learning matrices (BLM) [16, 21, 22], obtained at the stage of splitting FS into relevant RO classes. Fuzzy partition $RC^{|M|}$ includes the elements that can be attributed to fuzzy RO classes, for example, when it is difficult to distinguish a DoS attack from a DDoS attack [4, 16].
The rules of ASR learning, according to [1, 2, 14, 23, 24], are built based on the iteration procedure of searching for the maximum boundary magnitude of an information condition of functional effectiveness (ICFE):
$$is_k^* = \arg\max_{IS_1}\Big\{\ldots\Big\{\max_{IS_{k-1}}\Big\{\max_{IS_k \cap IS_{CE}} \frac{1}{M}\sum_{m=1}^{M} CE_m\Big\}\Big\}\ldots\Big\}, \tag{2}$$
where $CE_m$ is the ICFE of ASR learning to recognize RO that belong to class $CT_m^o$; $IS_k$ is the permissible range of values of the k-th informative attribute of RO; $IS_{CE}$ is the permissible range of ICFE in the course of ASR learning.
The following constraints are imposed on expression (2):
$$(\forall CT_m^o \in RC^{|M|})\,[\,CT_m^o \neq \varnothing\,]; \tag{3}$$
$$(\exists CT_a^o \in RC^{|M|})(\exists CT_b^o \in RC^{|M|})\,[\,CT_a^o \neq CT_b^o \rightarrow CT_a^o \cap CT_b^o \neq \varnothing\,]; \tag{4}$$
$$(\forall CT_a^o \in RC^{|M|})(\forall CT_b^o \in RC^{|M|})\,[\,CT_a^o \neq CT_b^o \rightarrow BCT_a^o \cap BCT_b^o = \varnothing\,], \tag{5}$$
where $BCT_a^o$, $BCT_b^o$ are the nuclei of RO classes $CT_a^o$ and $CT_b^o$, respectively;
$$\bigcup_{CT_m^o \in RC^{|M|}} CT_m^o \subseteq RS_B;\quad a \neq b;\quad a, b, m = \overline{1, M}. \tag{6}$$
Accepted assumptions: classes $CT_a^o$ and $CT_b^o$ are adjacent; the classes have a minimum distance between the centers of clusters $cr(ct_a \oplus ct_b)$ among all classes for RO; RO are described by binary learning matrices (BLM) [21-23]. We accepted that $ct_a$ and $ct_b$ are the reference vectors of RO classes, in particular, by the KDD Cup 1999 Data [2, 5, 7].
The ASR learning procedure is given in the form of a predicate expression:
$$(\forall CT_a^o \in RC^{|M|})(\forall CT_b^o \in RC^{|M|})\,\big[\,CT_a^o \neq CT_b^o \rightarrow \big(cr_a^* < cr(ct_a \oplus ct_b)\big) \,\&\, \big(cr_b^* < cr(ct_a \oplus ct_b)\big)\,\big], \tag{7}$$
where $cr_a^*$, $cr_b^*$ are the optimal radii of the containers of classes $CT_a^o$ and $CT_b^o$, respectively.
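The radius step of expression (1) and the admissibility check of predicate (7) can be sketched in a few lines. This is only an illustrative sketch: it assumes that the code distance between two binary reference vectors is the Hamming distance, and the function names (`code_distance`, `grow_radius`, `radii_admissible`) and the sample vectors are hypothetical, not taken from the paper.

```python
from typing import Sequence


def code_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Code distance between binary vectors (assumed here to be the Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x ^ y for x, y in zip(a, b))


def grow_radius(radius: int, step: int, max_radius: int) -> int:
    """One step in the spirit of expression (1): grow the radius only while it
    stays inside the permissible range [0, max_radius]."""
    candidate = radius + step
    return candidate if candidate <= max_radius else radius


def radii_admissible(r_a: int, r_b: int,
                     ct_a: Sequence[int], ct_b: Sequence[int]) -> bool:
    """Predicate (7): both radii must be strictly smaller than the
    inter-centre code distance of the two adjacent classes."""
    d = code_distance(ct_a, ct_b)
    return r_a < d and r_b < d


# Hypothetical reference vectors of two adjacent classes.
ct_a = [1, 1, 0, 0, 1, 0, 1, 0]
ct_b = [0, 1, 1, 0, 0, 0, 1, 1]
d_ab = code_distance(ct_a, ct_b)
```

With these sample vectors the inter-centre distance is 4, so radii 3 and 2 satisfy the predicate, while a radius of 4 for either class would let one container reach the centre of the other.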
To reduce the number of cycles during a learning procedure, the sets of input signals (factors) that influence ASR were determined. These sets correlate with the dimensionality of the vector of ASR testing parameters $is = \langle is_1, \ldots, is_k, \ldots, is_{RS} \rangle$ in the course of recognition of the templates of attacks.
ASR (or ISDA) learning is an iteration procedure of searching for the global maximum of ICFE [2, 5, 8, 20, 24]:
$$ca^* = \arg\max_{IS_{ca}}\Big\{\max_{IS_{CE} \cap IS_{cr}} \overline{CE}\Big\}, \tag{8}$$
where $IS_{ca}$ is the admissible range of magnitudes of reference deviation ($ca$) for RO class $\{CT_m^o\}$; $IS_{CE}$ is the operation range of determining ICFE indicator $\overline{CE}$; $IS_{cr}$ is the permissible range of RC magnitude $cr$.
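The nested search in expression (8) can be illustrated as an exhaustive scan over the admissible deviations and radii. The sketch below is not the authors' implementation: the criterion is passed in as a callable, `toy_criterion` is a hypothetical stand-in for ICFE, and the working range of the criterion is modelled simply as a lower bound `ce_min`.

```python
from typing import Callable, Iterable, Tuple


def search_optimal_deviation(
    ca_range: Iterable[int],
    cr_range: Iterable[int],
    criterion: Callable[[int, int], float],
    ce_min: float,
) -> Tuple[int, int, float]:
    """Scan all (ca, cr) pairs and return the pair with the largest criterion
    value among those that fall inside the working range (value >= ce_min)."""
    cr_values = list(cr_range)  # reused for every ca
    best = None
    for ca in ca_range:
        for cr in cr_values:
            ce = criterion(ca, cr)
            if ce < ce_min:  # outside the working range of the criterion
                continue
            if best is None or ce > best[2]:
                best = (ca, cr, ce)
    if best is None:
        raise ValueError("criterion never entered the working range")
    return best


def toy_criterion(ca: int, cr: int) -> float:
    """Hypothetical smooth surrogate with a single peak at ca=12, cr=4."""
    return 5.0 - 0.05 * (ca - 12) ** 2 - 0.5 * (cr - 4) ** 2


best_ca, best_cr, best_ce = search_optimal_deviation(
    range(1, 21), range(1, 8), toy_criterion, ce_min=1.0
)
```

An exhaustive scan is affordable here because both ranges are small integer grids; a real criterion would be evaluated from the learning matrices at each point.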
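The algorithm of RO classification is functional under the following restrictions:
$$(\forall CT_{m,\xi}^o \in RC^{|M|})\,[\,CT_{m,\xi}^o \neq \varnothing,\ m = \overline{1, M}\,], \tag{9}$$
$$(\forall CT_{m,\xi}^o \in RC^{|M|})(\forall CT_{c,\xi}^o \in RC^{|M|})\,[\,CT_{m,\xi}^o \neq CT_{c,\xi}^o \rightarrow BCT_{m,\xi}^o \cap BCT_{c,\xi}^o = \varnothing\,], \tag{10}$$
$$(\forall CT_{m,\xi}^o \in RC^{|M|})(\forall CT_{c,\xi}^o \in RC^{|M|})\,\big[\,CT_{m,\xi}^o \neq CT_{c,\xi}^o \rightarrow \big(cr_{m,\xi}^* < cr(ct_m \oplus ct_c)\big) \,\&\, \big(cr_{c,\xi}^* < cr(ct_m \oplus ct_c)\big)\,\big], \tag{11}$$
$$\bigcup_{CT_{m,\xi}^o \in RC^{|M|}} CT_{m,\xi}^o \subseteq RS, \tag{12}$$
where $BCT_{m,\xi}^o$, $BCT_{c,\xi}^o$ are the centers of the two nearest (adjacent) clusters $CT_{m,\xi}^o$ and $CT_{c,\xi}^o$, respectively; $\xi$ is the step of increasing the radius of cluster container (RCC); $cr_{m,\xi}^*$ and $cr_{c,\xi}^*$ are, respectively, the formed RCC of $CT_{m,\xi}^o$ and $CT_{c,\xi}^o$; $cr(ct_m \oplus ct_c)$ is the inter-center code distance of clusters $CT_{m,\xi}^o$ and $CT_{c,\xi}^o$.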
For better visualization, the stages of splitting FS of RO into clusters in ASR are represented in tabular form in Table 1.
As a criterion of the optimization of parameters, during ASR learning, we used statistical parameters (information measures) for the variants of solutions with two alternatives [18, 25, 26] for a modified entropic indicator, as well as the Kullback-Leibler divergence (for three hypotheses) [27].
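For reference, the Kullback-Leibler divergence mentioned above can be computed for two discrete distributions as follows. This is the textbook definition in bits; how the decision outcomes of ASR are mapped onto the two distributions is not specified in this section, so the sample inputs are purely illustrative.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(P || Q) in bits for discrete distributions.

    Terms with p_i == 0 contribute nothing; if q_i == 0 while p_i > 0,
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have equal length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


# Illustrative inputs: a certain outcome against a uniform two-way guess.
d_same = kl_divergence([0.5, 0.5], [0.5, 0.5])
d_certain = kl_divergence([1.0, 0.0], [0.5, 0.5])
```

The measure is zero when the two distributions coincide and grows as the decisions become more distinguishable, which is why it can serve as an informativeness criterion.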
Table 1
Stages of splitting FS into clusters
| Stage | Action | Description |
|---|---|---|
| 1 | The step counter (SC) of changing VAD $ca_i$ by features of RO is set to "0" | $l := 0$ |
| 2 | Calculation of the lower $A_{low,i}[l]$ and the upper $A_{up,i}[l]$ bounds of VAD of RO features for the entire FS | $A_{low,i}[l] = lm_i - ca\,\dfrac{ca_{low,i}}{100}$; $A_{up,i}[l] = lm_i + ca\,\dfrac{ca_{low,i}}{100}$, where $lm_i$ is the i-th attribute of the standard vector-realization of the non-classified multi-dimensional learning matrix (NMLM) $\lVert lm^{(j)} \rVert$ [16, 23]; $ca_{low}$ is the VAD for RO attributes, which are determined based on methods [2, 16, 21, 23] |
| 3 | Formation of BLM $\lVert ct^{(j)} \rVert$ | Rule: $ct_i^{(j)} = 1$ if $A_{low,i}[l] < lm_i^{(j)} < A_{up,i}[l]$; $ct_i^{(j)} = 0$ otherwise |
| 4 | Value of SC for increasing RC | $\xi := 0$ |
| 5 | Initialization of SC for increasing RC | $\xi := 1$ |
| 6 | Splitting NMLM $\{ct^{(j)}\}$ into two clusters $\{CT_m^o[\xi] \mid m = \overline{1,2}\}$ | |
| 6.1 | Initial standard vectors $\{ct_m\}$ of RO attributes for clusters $CT_m^o$ are calculated | Verification of conditions: 1) $cr(ct_1 \oplus ct^{0}) \to \min$, $cr(ct_2 \oplus ct^{1}) \to \min$; 2) $cr(ct_1 \oplus ct_2) \to \max$, where $ct^{0}$, $ct^{1}$ are the zero and unity vectors |
| 6.2 | Radius of the container of $CT_m^o$ is set to "0" | $cr_m[\xi] := 0$, $n_m := 0$, where $n_m$ is the number of realizations of RO that belong to $CT_m^o$ |
| 6.3 | RO implementations belonging to clusters $CT_m^o[\xi]$ are defined | Rules: $ct_j \in CT_1^o[\xi]$ if $cr(ct_j \oplus ct_1) \le cr_1[\xi]$ & $cr(ct_j \oplus ct_1) < cr(ct_j \oplus ct_2)$; $ct_j \in CT_2^o[\xi]$ if $cr(ct_j \oplus ct_2) \le cr_2[\xi]$ & $cr(ct_j \oplus ct_2) < cr(ct_j \oplus ct_1)$; where $ct_j$, $j = \overline{1,N}$, are the implementations of BLM $\lVert ct^{(j)} \rVert$ |
| 6.4 | Calculation of current ICFE [2, 5, 8, 24, 25] | $\overline{CE} = \dfrac{1}{M}\sum_{c=1}^{M} \max_{\{ls\}} CE_c$, where $CE_c$ is the value of ICFE of ASR learning for the realization of the class of anomalies or cyber attacks $CT_c^o$; $\{ls\}$ is the set of steps for ASR learning as a part of ISDA |
| 6.5 | Formation of set $\{ct_m\}$ of standard realizations for clusters $\{CT_m^o[\xi]\}$ | Rule for defining coordinates: $ct_{m,i} = 1$ if $\dfrac{1}{n_m}\sum_{j=1}^{n_m} ct_{m,i}^{(j)} > \dfrac{1}{2}$; $ct_{m,i} = 0$ otherwise |
| 6.6 | Conditions verification | if $N' = \sum_{m=1}^{M} n_m < N$ then → 6.7 & 6.3, else 6.9 |
| 6.7 | Conditions verification | if $cr_m[\xi] < cr(ct_1 \oplus ct_2)$ then → 6.8 & 6.3, else 6.9 |
| 6.8 | Increasing RC | $cr_m[\xi] := cr_m[\xi] + 1$ |
| 6.9 | Calculation of ICFE and optimal radii of clusters $\{CT_m^o[\xi]\}$ | Under conditions: $N' = \sum_{m=1}^{M} n_m < N$, where $N'$ is the number of RO implementations that belong to the containers at step $\xi$, and $cr_m[\xi] < cr(ct_1 \oplus ct_2)$ |
| 7 | Increasing SC | $\xi := \xi + 1$ |
| 8 | Splitting a binary space of features (BSF) into 3 clusters $\{CT_m^o[\xi] \mid m = \overline{1,3}\}$ | |
| 8.1 | Calculation of BLM for cluster $CT_3^o$, the standard vector-realization $ct_3$ of which satisfies the conditions | Verification of conditions: $cr(ct_1 \oplus ct_3) \to \min$ & $cr(ct_2 \oplus ct_3) \to \min$, where $ct_1$, $ct_2$ are the standard realizations of clusters $\{CT_m^o \mid m = \overline{1,2}\}$, restored at performing stage 6 |
| 8.2 | Value of radius of cluster $CT_3^o$ is set to "0" | $cr_3[\xi] := 0$ |
| 8.3 | Determining the cases of obtaining RO features implementations in cluster $CT_3^o$ | Rule: $ct_j \in CT_3^o$ if $cr(ct_j \oplus ct_3) \le cr_3[\xi]$ & $cr(ct_j \oplus ct_3) \le cr(ct_j \oplus ct_1)$ & $cr(ct_j \oplus ct_3) \le cr(ct_1 \oplus ct_2)$, where $ct_j$, $j = \overline{1,N}$, are the implementations of BLM $\lVert ct^{(j)} \rVert$ |
| 8.4 | Correction of containers for clusters $\{CT_m^o \mid m = \overline{1,2}\}$ is performed | Implementations $\{ss^{(j)} \mid j = \overline{1,n}\}$ that arrived to the container of category $CT_3^o$ are removed from container $\{CT_m^o\}$. Radius of container $\{CT_m^o\}$: $cr_m[\xi] := cr_m[\xi] - 1$ |
| 8.5 | Calculation of current ICFE | Expression of stage 6.4 |
| 8.6 | Formation of set $\{ct_m\}$ of standard implementations $\{CT_m^o[\xi]\}$ | Rule for defining coordinates: as in stage 6.5 |
| 8.7 | Condition verification | if $cr_3 < cr(ct_1 \oplus ct_3)$ & $cr_3 < cr(ct_2 \oplus ct_3)$ then → 8.8, else 8.9 |
| 8.8 | Increasing radius | $cr_3[\xi] := cr_3[\xi] + 1$ |
| 8.9 | Optimal radius of cluster container $CT_3^o$ is calculated | At conditions: $cr_3[\xi] < cr(ct_1 \oplus ct_3)$ & $cr_3[\xi] < cr(ct_2 \oplus ct_3)$ |
| 9 | Condition verification | if $ca[l] < 0.5 \cdot ca_{low}$ then → 2, else 10; $ca_{low}$ is the VAD for RO attributes, which are determined based on [5, 8, 25] |
| 10 | Condition verification | if $\overline{CE}[l] \in IS_{CE}$ then → 11, else 2 |
| 11 | Search for the global maximum (GMAX) value of CE in the operating range of RO attributes | $ca^* = \arg\max_{IS_{ca}}\{\max_{IS_{CE} \cap IS_{cr}} \overline{CE}\}$; $CE^*[l] := \operatorname{extrem}\, CE_m[l]$ |
| 12 | Based on methods [5, 8, 25] and others, the optimal parameters of fields $ca$ of RO attributes for the container are defined | $A_{low,i}^{op} = lm_i - ca^{op}\,\dfrac{ca_{low,i}}{100}$; $A_{up,i}^{op} = lm_i + ca^{op}\,\dfrac{ca_{low,i}}{100}$ |
| 13 | Procedure of splitting BSF of RO into 4 clusters $\{CT_m^o[\xi] \mid m = \overline{1,4}\}$ | |
| 13.1 | Binary matrix of cluster $\{CT_4^o\}$ is defined | Under conditions: $cr(ct_1 \oplus ct_4) \to \min$, $cr(ct_2 \oplus ct_4) \to \min$ & $cr(ct_3 \oplus ct_4) \to \min$, where $ct_1$, $ct_2$, $ct_3$ are the standard implementations of clusters $\{CT_m^o \mid m = \overline{1,3}\}$, restored when performing stage 8 |
| 13.2 | Value of radius of cluster $CT_4^o$ is set to "0" | $cr_4[\xi] := 0$ |
| 13.3 | Determining RO realizations that arrived to cluster $CT_4^o$ | Rule: $ct_j \in CT_4^o$ if $cr(ct_j \oplus ct_4) \le cr_4[\xi]$, where $ct_j$, $j = \overline{1,N_4}$, are the implementations of BLM $\lVert ct^{(j)} \rVert$ |
| 13.4 | Calculation of current ICFE | Expression of stage 6.4 |
| 13.5 | Formation of set $\{ct_m\}$ of standard implementations for clusters $\{CT_m^o[\xi]\}$ | Rule for defining coordinates: as in stage 6.5 |
| 13.6 | Conditions verification | if $cr_4[\xi] < cr(ct_1 \oplus ct_4)$, $cr_4[\xi] < cr(ct_2 \oplus ct_4)$, $cr_4[\xi] < cr(ct_3 \oplus ct_4)$ then → 8.3 & 8.8, else 8.9 |
| 13.7 | The next RO attribute in cluster $CT_4^o$ is added | $ct_4 := ct_4 + 1$ |
| 13.8 | Optimal radius of container $CT_4^o$ is determined | At conditions: $cr_4[\xi] < cr(ct_1 \oplus ct_4)$, $cr_4 < cr(ct_2 \oplus ct_4)$, $cr_4 < cr(ct_3 \oplus ct_4)$ |
| 14 | Adding results to a knowledge base (KB). End of algorithm operation | |
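Stages 2, 3 and 6.5 of Table 1 (tolerance bounds, binarization of the learning matrix, and formation of a standard vector) can be sketched as follows. The sketch makes two assumptions that should be checked against the original article: the deviation `ca` is treated as a percentage of the per-attribute value `ca_low[i]`, and the standard vector is formed by a majority rule with threshold 1/2. All names and sample numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def tolerance_bounds(reference: Sequence[float], ca: float,
                     ca_low: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Stage 2: lower and upper bounds per attribute around the reference vector."""
    lower = [lm - ca * d / 100 for lm, d in zip(reference, ca_low)]
    upper = [lm + ca * d / 100 for lm, d in zip(reference, ca_low)]
    return lower, upper


def binarize(matrix: Sequence[Sequence[float]],
             lower: Sequence[float], upper: Sequence[float]) -> List[List[int]]:
    """Stage 3: an attribute becomes 1 if it lies strictly inside its bounds, else 0."""
    return [[1 if lo < v < up else 0 for v, lo, up in zip(row, lower, upper)]
            for row in matrix]


def reference_vector(binary_rows: Sequence[Sequence[int]]) -> List[int]:
    """Stage 6.5: coordinate is 1 when more than half of the realizations have 1 there
    (the 1/2 threshold is an assumption)."""
    n = len(binary_rows)
    return [1 if sum(col) / n > 0.5 else 0 for col in zip(*binary_rows)]


# Hypothetical learning matrix: 3 realizations, 3 attributes.
reference = [10.0, 20.0, 30.0]
ca_low = [10.0, 10.0, 10.0]
lower, upper = tolerance_bounds(reference, ca=50, ca_low=ca_low)
blm = binarize([[11, 19, 40], [9, 26, 31], [14, 21, 33]], lower, upper)
standard = reference_vector(blm)
```

Widening `ca` makes more attributes fall inside their bounds, which is why the algorithm re-forms the binary matrix each time the deviation counter changes.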
We developed the algorithm that allows us to perform parallel formation of reference tolerances during an analysis of attributes of anomalies and cyber attacks, which are difficult to explain [1, 7, 16, 18]. This approach, when a parallel formation of VAD $\{ca_{K,i}\}$ is performed, makes it possible to change VAD for all attributes simultaneously at every step of learning. In the course of learning, the algorithm updates the optimal parameters of containers for the recognition classes $CT_m^o$. The stages of the algorithm of VAD formation are presented in tabular form in Table 2.
Table 2
Stages of algorithm of VAD formation for the attributes of recognition of cyber attacks, anomalies or threats
| Stage | Action | Clustering algorithm for a mathematical description of RO attributes |
|---|---|---|
| 1 | Value of the counter of steps of VAD change $ca_i$ for RO attribute is set to "0" | $l := 0$ |
| 2 | Calculation of $A_{low,i}[l]$ and $A_{up,i}[l]$ of VAD of RO attribute for the entire FS | $A_{low,i}[l] = lm_{1,i} - ca\,\dfrac{ca_{low,i}}{100}$; $A_{up,i}[l] = lm_{1,i} + ca\,\dfrac{ca_{low,i}}{100}$, where $lm_{1,i}$ is the i-th attribute of the vector-standard of implementation $lm_1$ for basic class $CT_1^o$. (It was accepted that $CT_1^o$ characterizes the most acceptable states of information security) |
| 3 | Formation of BLM $\lVert ct_{m,i}^{(j)} \rVert$ | Rule: $ct_{m,i}^{(j)} = 1$ if $A_{low,i}[l] < lm_{m,i}^{(j)} < A_{up,i}[l]$; $ct_{m,i}^{(j)} = 0$ otherwise |
| 4 | Formation of set $\{ct_m\}$ of vectors-standards of implementation of RO | $ct_{m,i} = 1$ if $\dfrac{1}{n}\sum_{j=1}^{n} ct_{m,i}^{(j)} > \dfrac{1}{2}$; $ct_{m,i} = 0$ otherwise, where $n$ is the number of implementations of RO (attributes) that belong to the cluster of the corresponding class $CT_m^o$ |
| 5 | Splitting $\{ct_m\}$ into pairs of the nearest adjacent vectors-standards | Methods and models [8, 10, 12, 14, 23, 25] are used |
| 6 | Restoration of container for $CT_m^o$ | |
| 6.1 | Value of the counter of recognition classes is set to "0" | $m := 0$ |
| 6.2 | Increasing the value of the counter | $m := m + 1$ |
| 6.3 | Value of the counter of steps of RC change is set to "0" | $cr := 0$ |
| 6.4 | Increasing the value of the counter | $cr := cr + 1$ |
| 6.5 | Calculation of current ICFE | Expression of stage 6.4, Table 1 |
| 6.6 | Condition verification | if $CE_m \in IS_{CE}$ then → 6.4, else 6.7 |
| 6.7 | Calculation of current ICFE | Expression of stage 6.4, Table 1 |
| 6.8 | Calculation of GMAX of ICFE | $CE_m^*[l] := \operatorname{extrem}\, CE_m[l, cr]$ |
| 6.9 | Calculation of optimal RC of RO class $CT_m^o$ | $cr_m^*[l] := \arg\operatorname{extrem}\, CE_m[l, cr]$ |
| 7 | Condition verification | if $m \le M$ then → 6.2, else 8 |
| 8 | Calculation of averaged ICFE value | $\overline{CE} = \dfrac{1}{M}\sum_{c=1}^{M} \max CE_c$ |
| 9 | Condition verification | if $ca[l] < ca_{low}/2$ then → 2, else 10 |
| 10 | Condition verification | if $\overline{CE} \in IS_{CE}$ then → 11, else 6.8 & 6.9 |
| 11 | Calculation of GMAX of ICFE in the admissible function determination range | $ca^* = \arg\max_{IS_{ca}}\{\max_{IS_{CE} \cap IS_{cr}} \overline{CE}\}$ |
| 12 | Adding results to a knowledge base (KB). End of algorithm operation | |
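The inner loop of stage 6 in Table 2 (stepping the container radius and keeping the value at which the criterion peaks) can be sketched for a single class as below. This is a simplified illustration: the radius is bounded by the code distance to the nearest neighbouring centre, in line with predicate (7), and `toy_criterion` is a hypothetical stand-in for ICFE, not the measure used by the authors.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Rows = Sequence[Sequence[int]]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def optimal_radius(
    center: Sequence[int],
    neighbour_center: Sequence[int],
    own: Rows,
    foreign: Rows,
    criterion: Callable[[Sequence[int], int, Rows, Rows], float],
) -> Tuple[Optional[int], float]:
    """Try radii 1 .. d-1 (d = inter-centre distance) and keep the best one."""
    limit = hamming(center, neighbour_center)
    best_radius, best_value = None, float("-inf")
    for radius in range(1, limit):
        value = criterion(center, radius, own, foreign)
        if value > best_value:
            best_radius, best_value = radius, value
    return best_radius, best_value


def toy_criterion(center: Sequence[int], radius: int, own: Rows, foreign: Rows) -> float:
    """Hypothetical surrogate: share of own realizations inside the container
    minus share of foreign realizations inside it."""
    inside_own = sum(hamming(center, r) <= radius for r in own) / len(own)
    inside_foreign = sum(hamming(center, r) <= radius for r in foreign) / len(foreign)
    return inside_own - inside_foreign


center = [0, 0, 0, 0, 0, 0]
neighbour = [1, 1, 1, 1, 1, 1]
own_rows: List[List[int]] = [[0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0]]
foreign_rows: List[List[int]] = [[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 0], [1, 1, 1, 0, 0, 0]]
best_r, best_v = optimal_radius(center, neighbour, own_rows, foreign_rows, toy_criterion)
```

In this toy data the container of radius 2 already holds every own realization and no foreign one, so larger radii only lower the surrogate criterion.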
Input data for ASR are an array of learning samples, obtained based on data from Tables 1, 2, as well as results of [10, 16]:
LM[kl] [implementation] [j], (13)
where kl is the number of learning matrix for RO class; implementation is the number of implementation in BLM [10, 16]; j is the number of recognition attribute for RO.
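The three-index layout of expression (13) can be illustrated with nested lists. The dimensions below (5 classes, 50 implementations, 12 attributes) are only an example picked from the ranges reported in the simulation section; the helper name `make_learning_array` is hypothetical, and a single attribute count for all classes is a simplification.

```python
from typing import List


def make_learning_array(num_classes: int, num_implementations: int,
                        num_attributes: int, fill: int = 0) -> List[List[List[int]]]:
    """Build LM[kl][implementation][j] as independent nested lists."""
    return [
        [[fill for _ in range(num_attributes)] for _ in range(num_implementations)]
        for _ in range(num_classes)
    ]


# kl: class index, implementation: row of the binary matrix, j: attribute index.
LM = make_learning_array(num_classes=5, num_implementations=50, num_attributes=12)
LM[0][3][7] = 1  # attribute 7 of implementation 3 in the matrix of class 0
```

Building each row with its own comprehension avoids the aliasing that `[[0] * n] * m` would introduce, so writing one cell does not change other implementations or classes.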
To assess ASR effectiveness and optimality of the defined VAD for RO classes of ISDA, the Pareto method was used [5, 8, 28]. The membership degree of the best (from the standpoint of ASR or an expert) variant of the Pareto-optimal solution in terms of strategies for providing cyber protection was determined by the formula:
j=i w
where $\otimes$ is the triangular norm (T-norm) [5, 28]; $W_i(x)$ is the final choice of the solution option of ASR (or an expert); $z_{ij}$ is the fuzzy assessment of usefulness of the i-th option of solving the problem of recognition of an anomaly or cyber attack, which is determined by the value of ICFE; $p_j$ is the assessment of CIIS states in the process of RO recognition; $\tilde{p}_{ss}$ are the assessments of ASR states in the process of anomaly or cyber attack recognition.
The membership degree of the best variant of the Pareto-optimal fuzzy solution for the formation of KB for ASR was defined using the modified Wald criterion and the Savage criterion [5].
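For orientation, the classical forms of the two decision criteria named above can be sketched as follows. The modification of the Wald criterion used by the authors is not described in this section, so the code shows only the textbook maximin and minimax-regret rules; the payoff matrix is hypothetical (rows are solution options, columns are system states).

```python
from typing import List, Sequence


def wald_choice(payoffs: Sequence[Sequence[float]]) -> int:
    """Classical Wald (maximin) rule: the option with the best worst-case payoff."""
    worst = [min(row) for row in payoffs]
    return max(range(len(payoffs)), key=worst.__getitem__)


def savage_choice(payoffs: Sequence[Sequence[float]]) -> int:
    """Savage (minimax regret) rule: the option with the smallest maximal regret,
    where regret is the gap to the best payoff attainable in each state."""
    column_best = [max(col) for col in zip(*payoffs)]
    max_regret = [max(b - v for b, v in zip(column_best, row)) for row in payoffs]
    return min(range(len(payoffs)), key=max_regret.__getitem__)


# Hypothetical usefulness assessments of three options in three states.
payoffs: List[List[float]] = [
    [3.0, 1.0, 4.0],
    [2.0, 2.0, 2.5],
    [5.0, 0.5, 3.0],
]
wald_index = wald_choice(payoffs)
savage_index = savage_choice(payoffs)
```

The two rules can disagree, as they do here: the maximin rule prefers the cautious second option, while the regret rule prefers the third one, whose losses relative to the best option in each state are smallest.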
5. Simulation of the clustering algorithm and the formation of VAD for the attributes of anomalies and cyber attacks
The algorithms were implemented in the MATLAB 7/2009 and Simulink environments in order to subsequently study the operation modes of ASR of anomalies and cyber attacks in CIIS (under conditions of countering the targeted cyber attacks [1, 7, 10, 16, 18]).
In accordance with recommendations of [8, 20, 21, 25], multidimensional binary learning matrices (MBLM) of RO classes had from 50 to 65 implementations. For the classes of network attacks [7, 8] (DoS/DDoS, Probe, R2L, U2R), the number of recognition attributes ranged from 12 to 41 [13, 15, 23]; for virus attacks, from 7 to 15 [5, 7]. Fig. 1, a-e shows the dependences of ICFE of learning of the simulation model (SM) of ASR [23] on the RC of RO, cr. In Fig. 1, a-e, the middle section (marked in blue) corresponds to the operation area of the selected recognition attributes that have the highest informativeness indicator (ICFE) [23].
After formation of MBLM for the normal behavior of a system, according to the proposed algorithm, binary trees of traffic are constructed for network attacks, as well as error-free decisive rules, by the appropriate learning matrix of attributes [16, 18, 23]. Next, MBLM are determined and registered for the system, which allows us to form controlling commands for responding to the deviations of parameters from the estimated values (Fig. 2, a, b).
Fig. 1. Dependences of ICFE of learning of simulation model ASR on RC of RO: a - ICFE for the DoS/DDoS attacks; b - ICFE for the Probe attacks; c - ICFE for the R2L attacks; d - ICFE for the U2R attacks; e - ICFE for virus attacks
Fig. 3 shows results, obtained in the course of simulation modeling and testing of algorithms of parallel clustering and formation of reference deviations for the recognition attributes, on the example of a DoS class of attacks. Results of the clustering of attack attributes in the process of testing the improved algorithm and the formation of VAD are shown in blue color. Similar results were also obtained for other classes of anomalies and cyber attacks.
An analysis of results of the simulation experiment (Fig. 3) on determining the dependence of ICFE of ASR learning allows us to draw the following conclusions:
- the averaged maximum value of ICFE of ASR learning is: for attacks of the DoS/DDoS class, CE = 3.19; for attacks of the Probe class, CE = 3.15; for attacks of the R2L class, CE = 2.84; for attacks of the U2R class, CE = 3.27; for virus attacks (VA), CE = 2.56;
- the averaged value of the optimal radius cr, in code units, for the RO classes given in Table 3 is, respectively: for the $hy_1$ class: DoS/DDoS - $cr_1^* = 4$; Probe - $cr_1^* = 3$; R2L - $cr_1^* = 4$; U2R - $cr_1^* = 4$; VA - $cr_1^* = 5$; for the $hy_2$ class: DoS/DDoS - $cr_2^* = 2$; Probe - $cr_2^* = 1$; R2L - $cr_2^* = 1$; U2R - $cr_2^* = 1$; VA - $cr_2^* = 2$; for the $hy_3$ class: DoS/DDoS - $cr_3^* = 3$; Probe - $cr_3^* = 3$; R2L - $cr_3^* = 2$; U2R - $cr_3^* = 2$; VA - $cr_3^* = 3$.
The values of optimal RC cr, taking into consideration additional hypotheses for the examined simulation models of ASR learning, are given in Table 3.
Fig. 2. Structural characteristics of anomalous and normal traffic: a — normal traffic for simulation model; b — traffic for the case of recognition of a network attack
Fig. 3. Results of the stages of parallel clustering and formation of VAD for the recognition of attributes (on the example of DoS attacks)
Table 3
Values of optimal RC cr for the examined simulation models of ASR learning
| No. | Accepted hypotheses for RO | DoS/DDoS | Probe | R2L | U2R | VA |
|---|---|---|---|---|---|---|
| | Basic hypotheses | | | | | |
| 1 | Basic working hypothesis $hy_1$: attribute (attributes) rq of RO and indicator IE (characterizes stability of CIIS functioning [18, 23]) are within the normal state of CIIS | $cr_1^{opt}$ = 4-5 | $cr_1^{opt}$ = 3-4 | $cr_1^{opt}$ = 4-5 | $cr_1^{opt}$ = 4-5 | $cr_1^{opt}$ = 5-6 |
| 2 | Hypothesis $hy_2$: attribute (attributes) allows drawing a conclusion that indicator IE is lower than the norm | $cr_2^{opt}$ = 2-3 | $cr_2^{opt}$ = 1-2 | $cr_2^{opt}$ = 1-2 | $cr_2^{opt}$ = 1-2 | $cr_2^{opt}$ = 2-3 |
| 3 | Hypothesis $hy_3$: allows drawing a conclusion that indicator IE is higher than the norm | $cr_3^{opt}$ = 3-4 | $cr_3^{opt}$ = 3-4 | $cr_3^{opt}$ = 2-3 | $cr_3^{opt}$ = 2-3 | $cr_3^{opt}$ = 3-4 |
| | Additional hypotheses for simulation model | | | | | |
| 4 | Additional hypothesis 1: node of CIIS demonstrates increased network activity | $cr^{opt}$ = 4 | $cr^{opt}$ = 4 | $cr^{opt}$ = 3 | $cr^{opt}$ = 3 | - |
| 5 | Additional hypothesis 2: node of CIIS demonstrates increased activity during external traffic | $cr^{opt}$ = 3 | $cr^{opt}$ = 3 | $cr^{opt}$ = 3 | $cr^{opt}$ = 2 | - |
Data analysis for the SM (Fig. 1-3) showed that the quasi-optimal value of the VAD parameter $ca_{n,i}$ is 8-16 % at the maximum value $CE_{max} = 6.16$.
Thus, the simulation experiment showed that the proposed algorithms for the clustering of RO attributes make it possible to obtain efficient learning matrices for ASR as a part of ISDA.
6. Discussion of results of testing the algorithms and prospects of further research
The scientific and practical results of the research, in the form of software applications, were implemented in ASR and adaptive expert systems (AES) of cyber protection deployed at the state enterprise "Design and engineering technological bureau of automation of control systems on railway transport of Ukraine" of the Ministry of Infrastructure of Ukraine, as well as in the information security services of computing centers at the industrial and transportation enterprises in the cities of Kyiv, Dnipro and Chernihiv.
The proposed algorithms differ from the existing ones by the possibility of simultaneous formation of reference tolerances in the course of analysis of complex attributes of anomalies and cyber attacks. This allows changing VAD for all attributes simultaneously during the procedure of training the existing and promising ISDA. The improved algorithms are also focused on the possibility of processing a large amount of specialized data during procedures of the recognition and analysis of various types of attributes of anomalies and targeted cyber attacks in CIIS.
The effectiveness of using the proposed algorithms depends on the number of informative attributes, which are used for the formation of BLM. In addition, efficiency of algorithms is determined by the input data for ASR or AES, formed at each step of clustering. When the number of attributes is insignificant, the effect of using the modified algorithm is negligible.
The results presented are a continuation of the research, results of which were described earlier in articles [10, 18, 23]. The prospects of further research include the enlargement of attributes knowledge base and the formation of BLM of ASR.
7. Conclusions
1. We proposed to refine the algorithm of splitting the feature space into clusters in the course of implementation of procedure for the recognition of anomalies and cyber attacks, which differs from the existing algorithms by the simultaneous formation of reference tolerances during analysis of complex RO attributes, and allows simultaneous changing of VAD for all attributes at every step of learning. The proposed refinements make it possible to prevent possible cases of the absorption of one RO class of basic attributes of anomalies and cyber attacks by another class. In this case, predicate expressions were obtained for ASR that is capable of self-learning.
2. We examined the devised algorithms on simulation models in MATLAB. The simulation showed that the proposed algorithms for the clustering of RO attributes make it possible to obtain effective learning matrices for ASR as a part of ISDA.
References
1. Khan, L. A new intrusion detection system using support vector machines and hierarchical clustering [Text] / L. Khan, M. Awad, B. Thuraisingham // The VLDB Journal. - 2006. - Vol. 16, Issue 4. - P. 507-521. doi: 10.1007/s00778-006-0002-5
2. Ranjan, R. A New Clustering Approach for Anomaly Intrusion Detection [Text] / R. Ranjan, G. Sahoo // International Journal of Data Mining & Knowledge Management Process. - 2014. - Vol. 4, Issue 2. - P. 29-38. doi: 10.5121/ijdkp.2014.4203
3. Feily, M. A Survey of Botnet and Botnet Detection [Text] / M. Feily, A. Shahrestani, S. Ramadass // 2009 Third International Conference on Emerging Security Information, Systems and Technologies. - 2009. doi: 10.1109/securware.2009.48
4. Mahmood, T. Security Analytics: Big Data Analytics for cybersecurity: A review of trends, techniques and tools [Text] / T. Mahmood, U. Afzal // 2013 2nd National Conference on Information Assurance (NCIA). - 2013. doi: 10.1109/ncia.2013.6725337
5. Dua, S. Data Mining and Machine Learning in Cybersecurity [Text] / S. Dua, X. Du. - UK, CRC press, 2016. - 256 p.
6. Zhang, S. An Empirical Study on Using the National Vulnerability Database to Predict Software Vulnerabilities [Text] / S. Zhang, D. Caragea, X. Ou // Lecture Notes in Computer Science. - 2011. - P. 217-231. doi: 10.1007/978-3-642-23088-2_15
7. Lee, K.-C. Sec-Buzzer: cyber security emerging topic mining with open threat intelligence retrieval and timeline event annotation [Text] / K.-C. Lee, C.-H. Hsieh, L.-J. Wei, C.-H. Mao, J.-H. Dai, Y.-T. Kuang // Soft Computing. - 2016. - Vol. 21, Issue 11. -P. 2883-2896. doi: 10.1007/s00500-016-2265-0
8. Buczak, A. L. A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection [Text] / A. L. Buczak, E. Guven // IEEE Communications Surveys & Tutorials. - 2016. - Vol. 18, Issue 2. - P. 1153-1176. doi: 10.1109/comst.2015.2494502
9. Petit, J. Potential Cyberattacks on Automated Vehicles [Text] / J. Petit, S. E. Shladover // IEEE Transactions on Intelligent Transportation Systems. - 2015. - Vol. 16, Issue 2. - P. 546-556. doi: 10.1109/tits.2014.2342271
10. Lakhno, V. A. Applying the functional effectiveness information index in cybersecurity adaptive expert system of information and communication transport systems [Text] / V. A. Lakhno, P. U. Kravchuk, V. L. Pleskach, O. P. Stepanenko, R. V. Tishchenko, V. A. Chernyshov // Journal of Theoretical and Applied Information Technology. - 2017. - Vol. 95, Issue 8. - P. 1705-1714.
11. Dovbysh, A. S. Information-extreme Algorithm for Recognizing Current Distribution Maps in Magnetocardiography [Text] / A. S. Dovbysh, S. S. Martynenko, A. S. Kovalenko, N. N. Budnyk // Journal of Automation and Information Sciences. - 2011. -Vol. 43, Issue 2. - P. 63-70. doi: 10.1615/jautomatinfscien.v43.i2.60
12. Ameer Ali, M. Review on Fuzzy Clustering Algorithms [Text] / M. Ameer Ali, G. C. Karmakar, L. S. Dooley // IETECH Journal of Advanced Computations. - 2008. - Vol. 2, Issue 3. - P. 169-181.
13. Guan, Y. Y-means: a clustering method for intrusion detection [Text] / Y. Guan, A. A. Ghorbani, N. Belacel // CCECE 2003 -Canadian Conference on Electrical and Computer Engineering. Toward a Caring and Humane Technology (Cat. No.03CH37436). -2003. doi: 10.1109/ccece.2003.1226084
14. Halkidi, M. On Clustering Validation Techniques [Text] / M. Halkidi, Y. Batistakis, M. Vazirgiannis // Journal of Intelligent Information Systems. - 2001. - Vol. 17, Issue 2/3. - P. 107-145. doi: 10.1023/a:1012801612483
15. Gamal, M. M. A Security Analysis Framework Powered by an Expert System [Text] / M. M. Gamal, B. Hasan, A. F. Hegazy // International Journal of Computer Science and Security (IJCSS). - 2011. - Vol. 4, Issue 6. - P. 505-527.
16. Lakhno, V. A model developed for teaching an adaptive system of recognizing cyberattacks among non-uniform queries in information systems [Text] / V. Lakhno, H. Mohylnyi, V. Donchenko, O. Smahina, M. Pyroh // Eastern-European Journal of Enterprise Technologies. - 2016. - Vol. 4, Issue 9 (82). - P. 27-36. doi: 10.15587/1729-4061.2016.73315
17. Riadi, I. Log Analysis Techniques using Clustering in Network Forensics [Text] / I. Riadi, J. E. Istiyanto, A. Ashari, N. Subanar // (IJCSIS) International Journal of Computer Science and Information Security. - 2012. - Vol. 10, Issue 7.
18. Lakhno, V. Development of adaptive expert system of information security using a procedure of clustering the attributes of anomalies and cyber attacks [Text] / V. Lakhno, Y. Tkach, T. Petrenko, S. Zaitsev, V. Bazylevych // Eastern-European Journal of Enterprise Technologies. - 2016. - Vol. 6, Issue 9 (84). - P. 32-44. doi: 10.15587/1729-4061.2016.85600
19. Kiss, I. A clustering-based approach to detect cyber attacks in process control systems [Text] / I. Kiss, B. Genge, P. Haller // 2015 IEEE 13th International Conference on Industrial Informatics (INDIN). - 2015. doi: 10.1109/indin.2015.7281725
20. Dovbysh, A. S. Informatsionno-ekstremalnyy algoritm optimizatsii parametrov giperellipsoidnykh konteynerov klassov raspoznavaniya [Text] / A. S. Dovbysh, N. N. Budnik, V. V. Moskalenko // Problemy upravleniya i informatiki. - 2012. - Issue 5. - P. 111-119.
21. Lee, S. M. Detection of DDoS attacks using optimized traffic matrix [Text] / S. M. Lee, D. S. Kim, J. H. Lee, J. S. Park // Computers & Mathematics with Applications. - 2012. - Vol. 63, Issue 2. - P. 501-510. doi: 10.1016/j.camwa.2011.08.020
22. Gao, P. Identification of Successive "Unobservable" Cyber Data Attacks in Power Systems Through Matrix Decomposition [Text] / P. Gao, M. Wang, J. H. Chow, S. G. Ghiocel, B. Fardanesh, G. Stefopoulos, M. P. Razanousky // IEEE Transactions on Signal Processing. - 2016. - Vol. 64, Issue 21. - P. 5557-5570. doi: 10.1109/tsp.2016.2597131
23. Lakhno, V. Design of adaptive system of detection of cyber-attacks, based on the model of logical procedures and the coverage matrices of features [Text] / V. Lakhno, S. Kazmirchuk, Y. Kovalenko, L. Myrutenko, T. Zhmurko // Eastern-European Journal of Enterprise Technologies. - 2016. - Vol. 3, Issue 9 (81). - P. 30-38. doi: 10.15587/1729-4061.2016.71769
24. Dovbysh, A. S. Optimization of the parameters of learning intellectual system of human signature verification [Text] / A. S. Dovbysh, D. V. Velikodnyi, J. V. Simonovski // Radioelectronic and computer systems. - 2015. - Issue 2. - P. 44-49.
25. Akhmetov, B. Designing a decision support system for the weakly formalized problems in the provision of cybersecurity [Text] / B. Akhmetov, V. Lakhno, Y. Boiko, A. Mishchenko // Eastern-European Journal of Enterprise Technologies. - 2017. - Vol. 1, Issue 2 (85). - P. 4-15. doi: 10.15587/1729-4061.2017.90506
26. Callegari, C. Improving PCA-based anomaly detection by using multiple time scale analysis and Kullback-Leibler divergence [Text] / C. Callegari, L. Gazzarrini, S. Giordano, M. Pagano, T. Pepe // International Journal of Communication Systems. - 2012. -Vol. 27, Issue 10. - P. 1731-1751. doi: 10.1002/dac.2432
27. Chinh, H. N. Fast Detection of Ddos Attacks Using Non-Adaptive Group Testing [Text] / H. N. Chinh, T. Hanh, N. D. Thuc // International Journal of Network Security & Its Applications. - 2013. - Vol. 5, Issue 5. - P. 63-71. doi: 10.5121/ijnsa.2013.5505