
UDC 007.52+004.896:656.052.48:519.876.5 DOI: 10.18698/0236-3933-2021-2-52-65

BINARY DECISION TREE CONSTRUCTION USING THE HYBRID SWARM INTELLIGENCE

B.K. Lebedev O.B. Lebedev A.A. Zhiglaty

[email protected]

[email protected]

[email protected]

Academy for Engineering and Technologies, Southern Federal University, Taganrog, Russian Federation

Abstract

Solving the problem of classification model construction is presented in the form of a sequence of considered attributes and values thereof included in the Mk route from the root to the dangling vertex. The developed interpretation of the decision tree is presented as a pair of chromosomes (Sk, Wk). The list of genes of the Sk chromosome corresponds to the list of all attributes included in the Mk route in the decision tree. The gene values of the Wk chromosome correspond to the attribute values included in the Mk route. Unification of data structures and the search space and modernization of the integrated algorithms were carried out for hybridization. The hybrid algorithm operators use integer parameters and synthesize new integer parameter values. A method was developed to account for the simultaneous attraction of the ai particle to the three xi (t), x*i (t), x* (t) attractors when dislocating from the xi (t) position to the xi (t + 1) position. A modified hybrid metaheuristic of the search algorithm is proposed for constructing a classification model using recombination of swarm and genetic search algorithms. The first approach uses the genetic algorithm initially and then the particle swarm algorithm. The second approach uses the high-level nesting hybridization method based on a combination of the genetic algorithm and the particle swarm algorithm. The proposed approach to constructing a modified paradigm uses chromosomes with integer parameter values in the indicated hybrid algorithm and operators which assist chromosomes to evolve according to the rules of particle swarm and genetic search.

Keywords

Classification, decision tree, particle swarm, genetic evolution, hybridization, integer parameter values, chromosome structures, directed mutation operator

Received 29.05.2020 Accepted 26.06.2020 © Author(s), 2021

This work was performed with financial support provided by the Russian Foundation for Basic Research (grant no. 20-07-00260 a)

Introduction. The most common methods for solving classification problems use the D = (X, U) decision tree as a classification model, where X = {xi | i = 1, 2, ..., n} is the set of vertices and U = {ui | i = 1, 2, ..., m} is the set of edges [1-3]. The X set includes the X1 set of internal vertices and the X2 set of end vertices. Internal vertices of the decision tree correspond to features characterizing the object. End vertices correspond to the values of categorical variables (specific class, grade, etc.) [4, 5]. All edges are oriented. In order to classify a new object, it is necessary to build in the decision tree an oriented route from the root to one of the end vertices. The order of vertices in the oriented route determines the order in which features are considered. Thus, the edge emerging from the xi vertex corresponds to a value of the xi feature [4].

The purpose of building a decision tree is to determine the value of the categorical dependent variable (class). If the target variable takes discrete values, then a classification problem is being solved.

Binary decision trees are the most common and simplest case [3, 4]. The decision tree efficiency significantly depends on correct selection of the branching criterion.

Most of the known algorithms (CART, C4.5, NewId, ITrule, CHAID, CN2, etc.) [1-4] are "greedy" algorithms of the sequential type. With this approach, the decision tree is built from top to bottom. At each step of the greedy algorithm, the set of objects is partitioned according to the feature that ensures the maximum difference between the resulting subsets. Sequential algorithms are characterized by lower labor intensity but provide the lowest solution quality.
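As an illustration of such a greedy step, the following minimal sketch selects a binary split by the Gini impurity (one common branching criterion, used in CART); the encoding of attribute values as 1/2 and the names gini and best_split are assumptions made for this example, not part of the cited algorithms.

```python
from collections import Counter

def gini(labels):
    # Gini impurity of a set of class labels
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(examples, labels, attributes):
    # Pick the binary attribute whose partition gives the largest impurity decrease
    best_attr, best_gain = None, -1.0
    parent = gini(labels)
    n = len(labels)
    for a in attributes:
        left = [y for x, y in zip(examples, labels) if x[a] == 1]   # first attribute value
        right = [y for x, y in zip(examples, labels) if x[a] == 2]  # second attribute value
        child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        gain = parent - child
        if gain > best_gain:
            best_attr, best_gain = a, gain
    return best_attr, best_gain
```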

An effective way to improve the quality of solutions is to use stochastic population algorithms [5], which, as a rule, are iterative and operate over the complete solution space. Swarm and genetic algorithms are widely employed. Studies of the efficiency of population algorithms demonstrated that their hybridization is a powerful means of increasing the efficiency of the new algorithm [6, 7]. Recombination of the population algorithms' metaheuristics provides uniform and reasonable scanning of the search space and high efficiency of the integrated algorithms [8].

The solution to the problem of constructing a classification model in this work is the sequence of considered attributes and values thereof included in the route from the root vertex to the dangling vertex. A modified hybrid metaheuristic of the search algorithm is proposed based on recombination of swarm and genetic search algorithms.

The first approach initially uses the genetic algorithm and then the particle swarm algorithm.

The second approach uses the high-level nesting hybridization method based on combining genetic and particle swarm algorithms [9-11]. Hybridization usually implies unification of data structures and the search space, as well as modernization of the integrated algorithms in connection with this unification.

This concerns primarily the types of parameter values. Most algorithms for solving combinatorial logical problems use integer parameter values [12, 13]. The operators of these algorithms take parameters with integer values and synthesize new integer parameter values. The classical paradigm of the particle swarm operates with real parameter values, while the particle swarm operators generate solutions with real values even on the basis of integer parameter values. In the proposed approach to constructing a modified paradigm, the indicated hybrid algorithm uses chromosomes with integer parameter values and operators that assist chromosomes to evolve according to the rules of particle swarm and genetic search.

Search for the particle swarm algorithm solutions. The swarm algorithm is based on the process of step-by-step displacement of particles to new positions in the search space [14, 15], determined as:

xi (t + 1) = xi (t) + vi (t + 1),

where vi (t + 1) is the vector (interval) of the particle displacement from the xi (t) position to the xi (t + 1) position. The vi (t + 1) vector reflects the particle attraction to three attractors: xi (t) is the current position of the ai particle; x*i (t) is the best position visited by the ai particle since the start of the first iteration; x* (t) is the best position in the particle swarm at the time moment t.
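For reference, a common real-valued form of this update rule can be sketched as follows; the inertia weight w and the acceleration coefficients c1, c2 are standard tunable parameters, and the values shown are illustrative only.

```python
import random

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5):
    # x, v, p_best, g_best are lists of real coordinates of equal length
    r1, r2 = random.random(), random.random()
    v_new = [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
             for vi, xi, pb, gb in zip(v, x, p_best, g_best)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new
```

In the modified paradigm described below, this real-valued rule is replaced by integer-valued directed mutation over chromosomes.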

Approaches to particle swarm modification and hybridization considered below do not depend on the type of neighborhood topology.

The vi (t + 1) vector is considered as a means of changing the decision and could take real values.

The target variable in the classification problem takes discrete values. Therefore, this work uses a solution search space with integer coordinate values. Particle descriptions and particle positions are presented in the form of a chromosome with integer gene values, which in the genetic algorithm serves as the solution code. Therefore, the distance between positions corresponds to the degree of proximity between decisions. In this case, the search space could be considered as an affine search space.

In our case, the chromosome that encodes a position is used as the position. The xi (t), x*i (t), x* (t) positions correspond to the chromosomes Hi (t) = {gil (t) | l = 1, 2, ..., nl}, H*i (t) = {g*il (t) | l = 1, 2, ..., nl}, H* (t) = {g*l (t) | l = 1, 2, ..., nl}.

Value of the pair of positions connection affinity with each other is determined by the distance between them. The smaller the distance between two positions, the more they are similar (close) to each other, and the greater is the affinity of connection between them.

At each step, the ai particle passes in the affine space to a new Hi position, such that the weight of the affine connection of the Hi position with the best position in the particle swarm increases. The distance between positions continuously decreases (the weight of affine connections between particles increases) in the process of particle swarm displacement.

The particle and the position to which it is displaced correspond to the same chromosome; therefore, two decoders D1 and D2 are used to obtain the particle and position phenotypes. As a result of applying the D1 decoder to the Hi (t) chromosome, the decision interpretation is formed. When the D2 decoder is used, a set of position coordinates is formed.

For each particle located in the xi (t) position at iteration t, the x*i (t) and x* (t) positions are determined and declared to be its attractors (centers of attraction).

Simultaneous attraction of the ai particle to the three xi (t), x*i (t), x* (t) attractors, when passing from the xi (t) position to the xi (t + 1) position, is accounted for as follows. Let us introduce the following notation: δ1 is the weight of the affine connection between xi (t) and x*i (t); δ2 is the weight of the affine connection between xi (t) and x* (t); δ3 is the weight of the affine connection between xi (t + 1) and xi (t); δ4 is the weight of the affine connection between xi (t + 1) and x*i (t); δ5 is the weight of the affine connection between xi (t + 1) and x* (t). Displacement of the ai particle from the xi (t) position to the xi (t + 1) position is carried out using the directed mutation operator subject to the condition δ1 + δ2 ≤ δ3 + δ4 + δ5. In other words, the total connection affinity of the ai particle with the three xi (t), x*i (t), x* (t) attractors does not decrease after displacement to a new position.
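A minimal sketch of this acceptance check, assuming a caller-supplied affinity function that returns the weight of the affine connection between two positions (its concrete definition for integer chromosomes is described in the following sections); the negative-Hamming affinity in the demo is purely illustrative.

```python
def move_is_acceptable(x_old, x_new, x_pbest, x_gbest, affinity):
    # delta1, delta2: affinity of the current position with the two best-position attractors
    d1 = affinity(x_old, x_pbest)
    d2 = affinity(x_old, x_gbest)
    # delta3..delta5: affinity of the candidate position with all three attractors
    d3 = affinity(x_new, x_old)
    d4 = affinity(x_new, x_pbest)
    d5 = affinity(x_new, x_gbest)
    # the total affinity with the attractors must not decrease
    return d1 + d2 <= d3 + d4 + d5

# illustrative affinity: negative Hamming distance between integer chromosomes
hamming_affinity = lambda a, b: -sum(x != y for x, y in zip(a, b))
print(move_is_acceptable([1, 3, 2], [1, 4, 2], [1, 4, 2], [1, 4, 3], hamming_affinity))
```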

The ai particle displacement means a transition from the Hi (t) chromosome to the Hi (t + 1) chromosome.

Purpose of the ai particle displacement is to maximize the total weight of affine connections between the Hi (t) position and the attractors.

Statement of problem in constructing a binary decision tree by methods of hybrid swarm intelligence. There is a set of objects O = {Oi | i = 1, 2, ..., no}, each of which is characterized by ni features A = {Ai | i = 1, 2, ..., ni}. A certain learning set of examples P = {Pi | i = 1, 2, ..., np} is provided for the objects with a description of the feature values and an indication of the object class. Each Ai feature has two distinct values Zi1, Zi2.

It is necessary to elaborate an algorithm for constructing a binary classification model in the form of a decision tree, which would make it possible to classify new data coming from the outside. The goal of constructing a decision tree is to determine the categorical dependent variable values.

As an assessment of the classification quality, the Fo = (no − n*)/no value is chosen, where no is the total number of objects and n* is the number of correctly classified objects.

At the model construction stage, an ordered sequence of attributes is formed that are part of the route on the decision tree from the root vertex to the dangling vertex. Route construction ends if the minimum Fo value (zero) is reached or the search depth C (the number of attributes in the sequence) reaches the limiting value Cmax. In this case, the dangling vertex is declared a leaf. In the first case, the C parameter is the route estimate; in the second case, the F parameter is the route estimate:

F = αFo + βC,

where α, β are the proportionality coefficients.

Optimization goal is to minimize the F criterion.
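A small worked sketch of this route estimate; α = 1 and β = 0.1 are illustrative coefficients chosen for the example, not values taken from the paper.

```python
def route_estimate(n_objects, n_correct, depth, alpha=1.0, beta=0.1):
    # F_o is the fraction of misclassified objects; C is the search depth (route length)
    f_o = (n_objects - n_correct) / n_objects
    return alpha * f_o + beta * depth

# e.g. 7 objects, 6 classified correctly, route of 4 attributes:
# F = 1.0 * (1/7) + 0.1 * 4 ≈ 0.543
print(route_estimate(7, 6, 4))
```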

Principles of binary decision tree coding. In this work, the solution to the problem of constructing a classification model lies in building a sequence of considered attributes and values thereof that are part of the Mk oriented route from the root vertex to the leaf. In general, the Mk route on a decision tree includes ni vertices and ni edges. The xi and xi+1 vertices of the Mk route correspond to the Ai, Ai+1 attributes. Each uij edge corresponds to a value of the Ai attribute, leaves xi and enters xi+1. The last edge in the route leaves the last vertex of the Sk list of vertices and enters the L vertex with the leaf mark, whose value corresponds to the number of the recognized class. The states of the edges included in the Mk route are set by the Wk vector.

The elaborated interpretation of the decision tree is represented as a pair of chromosomes (Hk, Wk) [8, 9]. The Hk = {hki | i = 1, 2, ..., ni} chromosome is an ordered set of hki genes with integer values corresponding to the Sk ordered list of all attributes included in the Mk route in the decision tree from the root vertex to the dangling vertex.

The Wk = {gki | i = 1, 2, ..., ni} chromosome is an ordered set of gki genes with integer values corresponding to the states (attribute values) entering the Mk route. Each gki gene corresponds to the wki edge index (Ai attribute value) connecting the vertices xi and xi+1 that correspond to the genes hki, hki+1. The last route vertex is connected by an edge with the additional L vertex. The L vertex is intended for storing the classification result.
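A sketch of how such a pair of chromosomes can be decoded into a route (attribute order taken from Hk, attribute values from Wk); the dictionary of attributes, its values and the function name decode_route are illustrative only and anticipate the rice-variety example below.

```python
def decode_route(h, w, attributes):
    # h: ordered attribute indices (genes of Hk); w: chosen value index per gene (genes of Wk)
    route = []
    for attr_index, value_index in zip(h, w):
        name, values = attributes[attr_index]
        route.append((name, values[value_index - 1]))  # value_index is 1 or 2
    return route

# Hypothetical attributes, each with two values:
attributes = {
    1: ("A1", ["<=10", "<=15"]),
    2: ("A2", ["<=0.2", "<=0.3"]),
    3: ("A3", ["<=2", "none"]),
    4: ("A4", ["none", "<=2"]),
}
# Route order x2, x4, x3, x1 with all first values selected:
print(decode_route([2, 4, 3, 1], [1, 1, 1, 1], attributes))
```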

Example. To construct a classifier of rice variety, the learning sample P = {pk | k = 1, 2, ..., np} is set and presented in the Table. Each Ai feature has two values Zi1, Zi2.

Learning sample

Number | A1, humidity, % (not more) | A2, black dockage, % (not more) | A3, yellowed cores, % (not more) | A4, unpeeled cores, % (not more) | Variety
P1 | Z11: ≤ 10 | Z21: ≤ 0.2 | Z31: ≤ 2 | Z41: none | 2
P2 | Z11: ≤ 10 | Z21: ≤ 0.2 | Z31: ≤ 2 | Z42: ≤ 2 | 1
P3 | Z11: ≤ 10 | Z21: ≤ 0.2 | Z32: none | Z42: ≤ 2 | 1
P4 | Z12: ≤ 15 | Z21: ≤ 0.2 | Z32: none | Z42: ≤ 2 | 1
P5 | Z12: ≤ 15 | Z22: ≤ 0.3 | Z32: none | Z42: ≤ 2 | 2
P6 | Z12: ≤ 15 | Z21: ≤ 0.2 | Z32: none | Z41: none | 1
P7 | Z11: ≤ 10 | Z22: ≤ 0.3 | Z31: ≤ 2 | Z41: none | 2

Let us consider the process of building a classification tree. Let the solution be set by the H1 = <x2, x4, x3, x1> chromosome, where xi corresponds to Ai, and the W1 vector defining the states of the edges connecting the vertices: w24 = 1, w43 = 1, w31 = 1, w1L = 1. The pair of chromosomes corresponds to the route M1 = x2, u24, x4, u43, x3, u31, x1, u1L, L. The route is supplemented with the L leaf. Here L is the class type determined after the tree is built (processing the M1 route).

The decision tree is formed sequentially. At each step t of constructing the tree along the M1 route, the next vertex xi and the outgoing edge uij are selected, for which the wij parameter value specifying the Zi1 or Zi2 value of the Ai attribute is determined in the W1 vector.

In the first step, the A2 attribute is selected in M1. The set of examples P (see the Table) is divided into two subsets P21 and P22: P21 ⊂ P contains n21 examples with the first value Z21 of the A2 attribute, and P22 ⊂ P contains n22 examples with the second value Z22 of the A2 attribute; P21 ∪ P22 = P. In our example: n = |P| = 7, n21 = |P21| = 5, n22 = |P22| = 2, P21 = (p1, p2, p3, p4, p6), P22 = (p5, p7). In P21, four examples correspond to the first variety and one to the second.

Next, in accordance with M1, the Z21 value is selected for A2, and for further branching P21 is partitioned according to the Z41 and Z42 values of the A4 attribute. The A4 value indicated by M1 is then selected, and so on.
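The partition step can be sketched as follows; the dictionary encoding of the Table (value index 1 or 2 per attribute) is a hypothetical representation chosen only for this illustration.

```python
def partition(examples, attr, value_index):
    # Split the learning sample into the subset matching the chosen value of attr
    # and the complementary subset (binary attributes: value_index is 1 or 2).
    selected = [p for p in examples if p[attr] == value_index]
    rest = [p for p in examples if p[attr] != value_index]
    return selected, rest

# Hypothetical encoding of the Table: each example maps attribute name -> value index.
P = [
    {"id": 1, "A1": 1, "A2": 1, "A3": 1, "A4": 1, "variety": 2},
    {"id": 2, "A1": 1, "A2": 1, "A3": 1, "A4": 2, "variety": 1},
    {"id": 3, "A1": 1, "A2": 1, "A3": 2, "A4": 2, "variety": 1},
    {"id": 4, "A1": 2, "A2": 1, "A3": 2, "A4": 2, "variety": 1},
    {"id": 5, "A1": 2, "A2": 2, "A3": 2, "A4": 2, "variety": 2},
    {"id": 6, "A1": 2, "A2": 1, "A3": 2, "A4": 1, "variety": 1},
    {"id": 7, "A1": 1, "A2": 2, "A3": 1, "A4": 1, "variety": 2},
]
P21, P22 = partition(P, "A2", 1)   # first value of A2
print(len(P21), len(P22))          # 5 2, matching the example in the text
```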

Fig. 1 shows a classifier that includes the M1 route with the given states of the W1 edges. The θ = (n1 : n2) parameter fixes the ratio of the number n1 of examples of the first variety to the number n2 of examples of the second variety.

Fig. 1. Rice variety classifier

In our example (see Fig. 1), possible routes on the graph are as follows:

(M2 = x2, u2L, L; w2L = 0; L = 2);

(M3 = x2, u24, x4, u41, x1, u1L, L; w24 = 1, w41 = 0, w1L = 0; L = 1);

(M4 = x2, u24, x4, u41, x1, u13, x3, u3L, L; w24 = 1, w41 = 0, w13 = 1, w3L = 1; L = 2);

(M5 = x2, u24, x4, u41, x1, u13, x3, u3L, L; w24 = 1, w41 = 0, w13 = 1, w3L = 0; L = 1).

The M2 route has the shortest length, while the M4 and M5 routes have the maximum length.

Fig. 2 shows the process of reconstructing a decision tree from the found solution interpretation set by the M1 route and the W1 vector.

Procedures for forming positions and displacing particles in the affine decision search space. The current decision population is modified by the genetic operators of the genetic algorithm at each iteration. When particles are displaced in the search space, the pair of chromosomes (Hk, Wk) is considered as a single object; however, the mechanisms of these operators are different, independent and correspond to the Hk and Wk chromosome structures.

In the general case, a single search space ξ could be considered, where the position of each ak particle is determined by a pair of chromosomes (Hk, Wk). This work applies an approach where the k-th population solution corresponds to a pair of particles ahk and awk that are synchronously displaced in the ξh and ξw search subspaces, respectively, ξh ∪ ξw = ξ.

The number of axes in the ξh search subspace of the ahk particle described by the Hk chromosome is equal to the number of genes in the Hk chromosome. The Hk (t) = < hki (t) | i = 1, 2, ..., ni > chromosome corresponds to the ordered list of attributes Mk (t) = < mki (t) | i = 1, 2, ..., ni >. Each locus i in the Hk (t) chromosome corresponds to an axis in the ξh search subspace. Each axis hosts ni reference points corresponding to the possible gene values. Note that each hki (t) value is plotted only on a single axis and only once. For example, the chromosome Hk (t) = < 1, 5, 6, 4, 2, 3 >, ni = 6, is located in a subspace with six coordinate axes.

At each step t, the ahk particle, exposed to attraction to the attractor, moves from Hk (t) to Hk (t + 1) with a new mutual arrangement of genes.

Fig. 2. Decision tree construction process

To assess the affine relationship (distance estimation) between two positions, an indicator is used, i.e., the δ1kz (t) difference degree. The δ1kz (t) difference degree between two chromosomes of the same length is the number of loci in which the hki (t) ∈ Hk (t) and hzi (t) ∈ Hz (t) genes do not coincide. Let δ1kz (t) be the initial difference degree between the positions Hk (t) and Hz (t). By modifying Hk (t), the ahk particle is displaced to the new Hk (t + 1) position with a lower value of the difference degree: δ1kz (t + 1) < δ1kz (t).

An increase in the affine relationship value between Hk (t) and Hz (t) is performed by implementing selective pairwise rearrangements of genes between loci in the Hk (t) position.

A set L(t) = < li (t) | i = 1, 2, ..., ni > of loci is generated in which the genes located in Hk (t) do not coincide with the genes located in the corresponding loci of the Hz (t) chromosome.

With the probability π1 = φ/ni, the li (t) ∈ L (t) locus is selected in the Hk (t) chromosome, and the hki (t) ∈ Hk (t) gene located in this li (t) locus is determined; here φ is a coefficient.

The loci of the set L(t) = < li (t) | i = 1, 2, ..., ni > in the Hz (t) chromosome are considered sequentially, starting from the first, and the lj (t) ∈ L (t) locus is identified in which the hzj (t) ∈ Hz (t) gene is located such that hzj (t) = hki (t).

The genes located in the li (t) and lj (t) loci of the Hk (t) chromosome are swapped. As a result, at least one more locus of the Hk (t + 1) chromosome holds the same gene value as the corresponding locus of Hz (t), so δ1kz (t + 1) < δ1kz (t). The number of such paired permutations is a control parameter and must not exceed ni.

An example of modifying the Hk (t) position, when performing displacement.

Let the Hk (t) and Hz (t) positions have the following form: Hk (t) = {1, 3, 2, 4, 5, 6}, Hz (t) = {1, 4, 2, 3, 5, 6}.

A set of loci L (t) = < 2, 4 > is formed in the Hk (t) chromosome, where the genes do not coincide with the genes located in the corresponding loci of the Hz (t) chromosome; δ1kz (t) = 2. After rearranging the genes in Hk (t) between the second and fourth loci, the following is obtained:

Hk (t + 1) = {1, 4, 2, 3, 5, 6} and δ1kz (t + 1) = 0.

The distance between Hk (t + 1) and Hz (t) decreased, and the connection affinity between them increased.
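A sketch of this displacement step for the Hk chromosome (0-based loci, with the locus selection probability φ/ni and the limit on the number of swaps treated as tunable parameters); the function names are illustrative, not the authors' implementation.

```python
import random

def difference_degree(h_k, h_z):
    # delta1: number of loci in which the genes of the two chromosomes do not coincide
    return sum(1 for a, b in zip(h_k, h_z) if a != b)

def directed_mutation_h(h_k, h_z, phi=1.0, max_swaps=2):
    # Move Hk(t) towards the attractor Hz(t) by pairwise gene swaps inside Hk:
    # for a selected mismatching locus i, find a mismatching locus j at which Hz
    # holds the same gene value as Hk holds at i, and swap the genes of Hk at i and j.
    h_new = list(h_k)
    n = len(h_new)
    mismatching = [i for i in range(n) if h_new[i] != h_z[i]]
    swaps = 0
    for i in mismatching:
        if swaps >= max_swaps or h_new[i] == h_z[i]:
            continue
        if random.random() > phi / n:      # locus taken with probability phi/n
            continue
        candidates = [j for j in mismatching
                      if h_z[j] == h_new[i] and h_new[j] != h_z[j]]
        if not candidates:
            continue
        j = candidates[0]
        h_new[i], h_new[j] = h_new[j], h_new[i]
        swaps += 1
    return h_new

h_k = [1, 3, 2, 4, 5, 6]
h_z = [1, 4, 2, 3, 5, 6]
print(difference_degree(h_k, h_z))              # 2, as in the example above
print(directed_mutation_h(h_k, h_z, phi=6.0))   # phi = n: every mismatching locus is tried
```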

The number of axes of the ξw search subspace of the awk particle described by the Wk = {gki | i = 1, 2, ..., ni} chromosome is the same as in the ξh search subspace. The gki (t) ∈ Wk gene value is an attribute value variant. If gki (t) = 1, then the first value of the corresponding attribute is selected; if gki (t) = 2, then the second value. Each gki (t) ∈ Wk corresponds to its own axis, whose scale includes two reference points, xi1 and xi2. If gki (t) = 1, then xi1 = 1; if gki (t) = 2, then xi2 = 2.

For example, a position in the search space has the following form: W = {2, 2, 1, 1, 2, 2}.

The δ2kz (t) difference degree between two chromosomes of the same length is the number of mismatched gene values in the same loci. Let δ2kz (t) be the calculated difference degree between the Wk (t) and Wz (t) positions. By modifying Wk (t), the awk particle is displaced to a new Wk (t + 1) position with a lower value of the difference degree: δ2kz (t + 1) < δ2kz (t).

Gene values in each locus i of the new Wk (t + 1) position are determined as follows:

if gki (t) = gzi (t), then gki (t + 1) = gki (t); if gki (t) ≠ gzi (t), then gki (t + 1) = gzi (t) with the probability π, where π = s·δ2kz (t) / ni.

Here s is a coefficient and ni is the number of genes in the chromosomes. The higher the δ2kz (t) difference degree between Wk (t) and Wz (t), the higher the probability that the gzi (t) value becomes the gki (t + 1) value.

Example. Let Wz (t) = < 1, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2 >;

Wk (t) = < 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2 >.

Here |Wz (t)| = |Wk (t)| = 11. The gene values do not coincide in loci 1, 2, 4, 7, 9, so δ2kz (t) = 5. Let genes 2, 4 and 7 mutate in Wk (t) with the probability π = s · 5/11. The modified Wk (t + 1) position has the following form: Wk (t + 1) = < 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2 >. The difference degree is δ2kz (t + 1) = 2. The connection affinity between Wk (t + 1) and Wz (t) increased by the ΔSw = k2 (δ2kz (t) − δ2kz (t + 1)) value.

The total value of the affine connections increased by the following value:

ΔS = ΔSh + ΔSw = k1 (δ1kz (t) − δ1kz (t + 1)) + k2 (δ2kz (t) − δ2kz (t + 1)).
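A sketch of the corresponding step for the Wk chromosome and of the resulting affinity gain; the coefficients s and k are illustrative, and the sign convention follows the formula above (affinity grows as the difference degree drops).

```python
import random

def directed_mutation_w(w_k, w_z, s=1.0):
    # Each mismatching gene of Wk(t) takes the attractor's value with
    # probability pi = s * delta2 / n, where delta2 is the difference degree.
    n = len(w_k)
    delta2 = sum(1 for a, b in zip(w_k, w_z) if a != b)
    pi = s * delta2 / n
    return [b if (a != b and random.random() < pi) else a
            for a, b in zip(w_k, w_z)]

def affinity_change(d_old, d_new, k=1.0):
    # Affinity grows when the difference degree drops
    return k * (d_old - d_new)

w_z = [1, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2]
w_k = [2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2]
w_new = directed_mutation_w(w_k, w_z, s=1.0)
d_old = sum(1 for a, b in zip(w_k, w_z) if a != b)     # 5, as in the example above
d_new = sum(1 for a, b in zip(w_new, w_z) if a != b)
print(d_old, d_new, affinity_change(d_old, d_new, k=1.0))
```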

Experimental research. The developed algorithm for constructing a classification model was implemented in the form of the GA-RCh DR program for constructing the decision tree.

GA-RCh DR program testing was carried out on test cases with the known Kopt optimum [13, 16, 17]. The quality level of the obtained solutions was assessed by the P = Kopt/K indicator, where K is the optimization criterion value used in the GA-RCh DR program. The number of iterations at which the algorithm reached the maximum quality level did not exceed 135 (Fig. 3).

Fig. 3. Dependence of the GA-RCh DR algorithm quality level (P) on the number of iterations

Comparison of the GA-RCh DR algorithm in terms of quality level with the genetic algorithm and the particle swarm algorithm demonstrated that, with comparable time expenditure, the P indicator of the GA-RCh DR algorithm was higher on average by 9-11 %. The average quality level achieved by the GA-RCh DR algorithm at 130 iterations differs from the maximum value by 0.15 %. The overall estimate of the time complexity lies in the O(n²)-O(n³) range, where n is the number of features.

Conclusion. Solving the problem of constructing a classification model is presented in the form of a sequence of considered attributes and values thereof included in the Mk route from the root vertex to the dangling vertex. Developed interpretation of the decision tree is presented as a pair of chromosomes (Sk, Wk). List of genes of the Sk chromosome corresponds to the list of all attributes included in the Mk route in the decision tree. Gene values of the Wk chromosome correspond to the attribute values included in the Mk route.

For hybridization, unification of the data structures and the search space and modernization of the integrated algorithms were performed. The hybrid algorithm operators use integer-valued parameters and synthesize new integer parameter values.

A modified hybrid metaheuristic of the search algorithm is proposed for constructing a classification model through recombination of swarm and genetic search algorithms. The first approach initially uses the genetic algorithm and then the particle swarm algorithm.

The second approach implies the high-level nesting hybridization method based on combining the genetic algorithm and the particle swarm algorithm [10, 11]. Position alteration of a particle represented as a genotype leads to both parametric and structural changes. Thus, the proposed modified system is capable of adapting based on parametric and structural changes. A method was elaborated to account for the ai particle simultaneous attraction to several attractors when displaced to a new position.

The proposed approach to constructing a modified paradigm uses chromosomes with integer values of parameters in the indicated hybrid algorithm and operators allowing chromosomes to evolve according to the rules of particle swarm and genetic search.

Translated by K. Zykova

REFERENCES

[1] Witten I.H., Frank E., Hall M.A. Data mining. San Francisco, Morgan Kaufmann, 2011.

[2] Zhuravlev Yu.I., Ryazanov V.V., Sen'ko O.V. Raspoznavanie. Matematicheskie metody. Programmnaya sistema. Prakticheskie primeneniya [Recognition. Mathematical methods. Software system. Practical applications]. Moscow, Fazis Publ., 2006.

[3] Berikov V.S., Lbov G.S. [Modern trends in cluster analysis]. Vserossiyskiy konkursnyy otbor obzorno-analiticheskikh statey po prioritetnomu napravleniyu "Informatsionno-telekommunikatsionnye sistemy" [All-Russian competitive selection of review and analytical articles in the priority area of "Information and Telecommunication Systems"]. Moscow, Informika Publ., 2008, art. 126 (in Russ.).

[4] Barsegyan A.A., Kupriyanov M.S., Stepanenko V.V., et al. Metody i modeli analiza dannykh: OLAP i Data Mining [Methods and models of data analysis: OLAP and Data Mining]. St. Petersburg, BKhV-Peterburg Publ., 2004.

[5] Karpenko A.P. Sovremennye algoritmy poiskovoy optimizatsii. Algoritmy, vdokh-novlennye prirodoy [Modern search optimization algorithms. Algorithms inspired by nature]. Moscow, Bauman MSTU Publ., 2014.

[6] Wang X. Hybrid nature-inspired computation method for optimization. Doc. Diss. Helsinki University of Technology, 2009.

[7] Lebedev B.K., Lebedev O.B., Lebedev V.B. Mechanisms of the roving algorithm for finding the solution of the problem of distribution of connections. Programmnye produkty, sistemy i algoritmy [Software Journal: Theory and Applications], 2017, no. 4 (in Russ.). Available at: http://swsys-web.ru/en/mechanisms-of-the-roving-algorithm-for-finding-the-solution-of-the-problem-of-distribution-of-connections.html

[8] Lebedev B.K., Lebedev O.B. Hybrid bioinspired algorithm for solving symbolic regression problem. Izvestiya YuFU. Tekhnicheskie nauki [Izvestiya SFedU. Engineering Sciences], 2015, no. 6 (167), pp. 28-41 (in Russ.).

[9] Kureychik V.M., Lebedev B.K., Lebedev O.B. Poiskovaya adaptatsiya: teoriya i praktika [Search adaptation: theory and practice]. Moscow, FIZMATLIT Publ., 2006.

[10] Lebedev B.K., Lebedev O.B., Lebedeva E.M. Distribution of resources based on hybrid models of swarm intelligence. Nauchno-tekhnicheskiy vestnik informatsionnykh tekhnologiy, mekhaniki i optiki [Scientific and Technical Journal of Information Technologies, Mechanics and Optics], 2017, vol. 17, no. 6, pp. 1063-1073 (in Russ.). DOI: https://doi.org/10.17586/2226-1494-2017-17-6-1063-1073

[11] Kureychik V.V., Kureychik Vl.Vl. The architecture of hybrid search for design. Izvestiya YuFU. Tekhnicheskie nauki [Izvestiya SFedU. Engineering Sciences], 2012, no. 7 (132), pp. 22-27 (in Russ.).

[12] Clerc M. Particle swarm optimization. London, ISTE, 2006.

[13] Lebedev B.K., Lebedev V.B. The evolutionary learning procedure for pattern recognition. Izvestiya TSREU [Izvestiya TRTU], 2004, no. 8 (43), pp. 83-84 (in Russ.).

[14] Lebedev B.K., Lebedev V.B., Lebedev O.B. The solution of the symbolic regression problem by genetic search methods. Izvestiya YuFU. Tekhnicheskie nauki [Izvestiya SFedU. Engineering Sciences], 2015, no. 2 (163), pp. 212-225 (in Russ.).

[15] Kennedy J., Eberhart R.C. Particle swarm optimization. Proc. ICNN, 1995, pp. 1942-1948. DOI: https://doi.org/10.1109/ICNN.1995.488968

[16] Lebedev B.K., Lebedev O.B., Lebedeva E.M. Partition a class method alternative collective adaptation. Izvestiya YuFU. Tekhnicheskie nauki [Izvestiya SFedU. Engineering Sciences], 2016, no. 7 (180), pp. 89-101 (in Russ.).

[17] Cong J., Romesis M., Xie M. Optimality, scalability and stability study of partitioning and placement algorithms. Proc. ISPD, 2003, pp. 88-94.

Lebedev B.K. — Dr. Sc. (Eng.), Professor, Department of Computer Aided Design Systems, Academy for Engineering and Technologies, Southern Federal University (Nekrasovsky pereulok 44, Taganrog, 347900 Russian Federation).

Lebedev O.B. — Cand. Sc. (Eng.), Assoc. Professor, Department of Computer Aided Design Systems, Academy for Engineering and Technologies, Southern Federal University (Nekrasovsky pereulok 44, Taganrog, 347900 Russian Federation).

Zhiglaty A.A. — Assistant of the Department of Mathematical Software and Computer Applications, Academy for Engineering and Technologies, Southern Federal University (Nekrasovsky pereulok 44, Taganrog, 347900 Russian Federation).

Please cite this article as:

Lebedev B.K., Lebedev O.B., Zhiglaty A.A. Binary decision tree construction using the hybrid swarm intelligence. Herald of the Bauman Moscow State Technical University, Series Instrument Engineering, 2021, no. 2 (135), pp. 52-65. DOI: https://doi.org/10.18698/0236-3933-2021-2-52-65
