OPTIMIZATION OF TESTS SYNTHESIS ON THE BASE OF DESCENT ALGORITHMS WITH THE USE OF GENETIC TRANSFORMATIONS
YANKOVSKAYA A.E., BLEIKHER A.M.
yank@tisi.tomsk.su, yank@tsuab.ru
bleikher@mail.ru
Introduction
Diagnostic test synthesis is one of the main directions of computer-aided design and redesign of logical control devices (LCD). Therefore, optimizing this synthesis becomes especially important as the dimension of the designed devices grows.
There is a sufficiently large number of approaches to optimizing test synthesis; however, the scope of this paper does not permit us to analyze them. One of the most effective approaches underlying optimization procedures for computer-aided design and redesign of LCD is based on descent algorithms [1], whose main idea is to pass from a solution obtained by a universal or an “inexpensive” method to a simpler solution satisfying the preassigned requirements. Descent algorithms based on logical test synthesis are an adequate tool of investigation in such areas as optimization of (i, j)-separating systems [2] and cascade codes [3], finite-state machine (FSM) synthesis [4], test pattern recognition (logical inference), data mining and knowledge analysis [5-12], including revealing regularities in them. Descent algorithms have found practical application in various problem and interdisciplinary areas, and a series of intelligent systems [5, 14-18] has been realized on their basis.
In this paper, a new approach to the realization of descent algorithms, based on optimizing test synthesis with genetic transformations, is suggested. Genetic algorithms (GA) have recently been widely used in discrete optimization problems of computer-aided design of LCD [19]. We propose, for the first time, the use of genetic transformations for optimizing test synthesis on the basis of descent algorithms. The reason is that the procedures used in descent algorithms are NP-complete; therefore, in real problems where the cardinality of the feature space exceeds 300, heuristic methods, to which GA belong, are required. GA reduce the number of computations. GA are preferred because they yield suitable results when the required computer resources are lacking: 1) processor time for computation; 2) computer memory for intermediate results.
Descent algorithms based on step-by-step algorithms [1, 13, 14, 20] are used for the optimization of test synthesis. We suggest a step-by-step algorithm that uses genetic transformations.
Problem formulation
We suppose that an initial solution, obtained by some “inexpensive” method, is represented by a ternary matrix Q of descriptions of the objects under consideration. Rows of matrix Q correspond to object descriptions (states, conditions, occurrences, situations), and columns correspond to features. Each element of matrix Q takes a value from the set {0, 1, -}, where «-» is a symbol of uncertainty treated as an unimportant value for the given object. In addition, the requirements of distinguishing are assigned in the form of an integer matrix of discriminations R. Rows of matrix R are associated with the rows of matrix Q of the same name, and columns are associated with classification features, which define various mechanisms of classification (mechanisms of partitioning objects into equivalence classes) [1, 5]. It is necessary to construct a matrix Q' that contains the minimal or close to minimal number of columns of matrix Q and satisfies the assigned distinguishing requirements.
With reference to FSM synthesis, the FSM being a mathematical model of an LCD, the matrix Q can represent a matrix of monotone, race-tolerant and fault-tolerant coding of the internal states of an asynchronous FSM, where its rows correspond to internal states and its columns to internal variables. In this case, the requirements of distinguishing, assigned in the form of a matrix of transitions R, are the conditions of monotone, race-tolerant and fault-tolerant coding that the matrix Q must satisfy for the given asynchronous FSM (the way the FSM is specified is inessential) with one- or two-level memory realization [21].
For test pattern recognition (logical inference), object descriptions correspond to rows of matrix Q and characteristic features correspond to its columns. The rows of matrix R are associated with the rows of matrix Q of the same name, and its columns are associated with classification features that define various mechanisms of classification (mechanisms of partitioning objects into equivalence classes). The element of R at the intersection of the i-th row and the j-th column defines the membership of the i-th object in one of the extracted classes under the j-th classification mechanism; the fact that an object belongs to a class is marked by the code number of this class. The number of extracted patterns is equal to the number of groups of rows of Q associated with identical rows of R. With a single classification mechanism, R degenerates into a column, which complies with the traditional representation of information in pattern recognition problems. Note that a row of Q is associated with a conjunction, and the element of the corresponding row of R defines the number of the class in which this conjunction takes the unit value under the corresponding classification mechanism.
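The grouping of objects into patterns can be sketched in Python as follows. This is a minimal illustration, assuming that objects whose rows of R coincide under every classification mechanism form one pattern; the function names `extract_patterns` and `pattern_labels` are hypothetical.

```python
def extract_patterns(R):
    """Group object indices (rows of Q) by identical rows of R."""
    patterns = {}
    for i, r_row in enumerate(R):
        # Objects with the same class under every mechanism share a pattern.
        patterns.setdefault(tuple(r_row), []).append(i)
    return patterns


def pattern_labels(R):
    """Map each object to a 0-based pattern number, in order of first appearance."""
    index = {}
    return [index.setdefault(tuple(r_row), len(index)) for r_row in R]


# Two classification mechanisms (two columns of R), four objects.
R = [[1, 1], [1, 2], [2, 1], [1, 1]]
print(extract_patterns(R))  # {(1, 1): [0, 3], (1, 2): [1], (2, 1): [2]}
print(pattern_labels(R))    # [0, 1, 2, 0]
```

With a single classification mechanism, R is a one-column matrix such as `[[1], [2], [1]]`, and the labels are then simply the class numbers renumbered from 0.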
Principal notions and definitions. Peculiarities of genetic transformations
To facilitate the further description, we give the principal notions, partially presented in [5-13, 19, 22, 23].
We denote the code of internal state a (the description of the a-th object) by z(a), a ternary vector, and by z(a,b) the ternary vector whose components take the values of the components of the same name of z(a) and z(b) where these coincide, and the value «-» otherwise.
R&I, 2003, Ns 3
Two ternary vectors are considered to be at distance h if the number of components in which their values differ from each other and from «-» is equal to h.
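A minimal Python sketch of the two operations on ternary vectors defined above, assuming that vectors are stored as strings over '0', '1' and '-'; the names `generalize`, `distance` and `h_distinguishable` are illustrative and are not taken from the source.

```python
# Ternary vectors are modelled as strings over '0', '1' and '-' ('-' = unimportant value).

def generalize(za: str, zb: str) -> str:
    """z(a,b): keep the components on which z(a) and z(b) coincide, '-' elsewhere."""
    if len(za) != len(zb):
        raise ValueError("vectors must have the same length")
    return "".join(x if x == y else "-" for x, y in zip(za, zb))


def distance(za: str, zb: str) -> int:
    """Count components whose values are both defined (not '-') and different."""
    if len(za) != len(zb):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(za, zb) if x != y and "-" not in (x, y))


def h_distinguishable(za: str, zb: str, h: int) -> bool:
    """At least h features take the value 1 in one description and 0 in the other."""
    return distance(za, zb) >= h


print(generalize("01-1", "0110"))  # 01--
print(distance("01-1", "0110"))    # 1
```

The third component does not count towards the distance because one of the two values is «-».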
We consider two objects to be h-distinguishable if at least h characteristic features take the value 1 (0) in the description of one of them and the inverse value 0 (1) in the description of the other. 1-distinguishable objects are called distinguishable.
Definition 1. A test is a subset of characteristic features, corresponding to columns of matrix Q, such that any two rows of the matrix restricted to these columns are different if they belong to different patterns.
The notion of test used in the descent algorithm for problems of computer-aided reliable design and redesign of LCD is connected with the conditions of coding the internal states of an asynchronous FSM [1].
Definition 2. An irredundant test (IT) is a test none of whose proper subsets is a test.
Definition 3. Regularities are feature subsets with certain easily interpreted properties which influence the discrimination of objects from different patterns and are stably observed for the objects of the training sample, while also manifesting themselves on other objects of the same kind. The weight coefficients of features, which characterize their individual contribution to the discrimination of objects, are also regarded as regularities.
The mentioned subsets include constant, steady (constant within a class), non-informative (distinguishing no pair of objects), alternative (in the sense of inclusion in a test), dependent (in the sense of inclusion of subsets of discriminated object pairs), unessential (included in no IT) and obligatory (included in all ITs) features, as well as all minimal discriminating subsets, which are in essence minimal unconditional ITs [5]. In this paper we use only obligatory, constant, non-informative and alternative features.
The notion of regularities is introduced similarly when the descent algorithm is applied to computer-aided design and redesign of LCD. In this case non-informative internal variables are the variables that do not ensure any criterion of coding the internal states of the asynchronous FSM [1].
To present the conditions of discrimination of training objects, we use a binary implication matrix U constructed on the basis of the matrices Q and R. Columns of U are associated with the characteristic features, and rows are associated with the results of comparing the descriptions of pairs of objects belonging to different patterns.
Since tests are considered one of the kinds of regularities, a test is constructed simultaneously with revealing some of the regularities by means of deep equivalent and optimizing transformations over the rows of the implication matrix U.
The implication matrix is called irredundant, and is denoted by U', if it has no covered rows.
A row of the implication matrix U represents the value of the vector-function of distinguishing, whose components are in one-to-one correspondence with the characteristic features (columns of matrix Q). The dimension of the vector-function of distinguishing (of noncrossing, in computer-aided design and redesign of LCD) is equal to M, where M is the number of characteristic features (internal variables).
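Definitions 1 and 2 can be checked directly on Q, as in the following sketch. It assumes that two ternary rows "differ" on a feature subset only where some feature of the subset is defined in both rows and takes inverse values (the distinguishability notion above), and that patterns are given as a list of labels; all names are hypothetical.

```python
from itertools import combinations


def distinguished(row_a: str, row_b: str, features) -> bool:
    """True if some selected feature is defined in both rows and takes inverse values."""
    return any(row_a[m] != row_b[m] and "-" not in (row_a[m], row_b[m]) for m in features)


def is_test(Q, labels, features) -> bool:
    """Definition 1: every pair of rows from different patterns differs on `features`."""
    return all(
        distinguished(Q[i], Q[j], features)
        for i, j in combinations(range(len(Q)), 2)
        if labels[i] != labels[j]
    )


def is_irredundant_test(Q, labels, features) -> bool:
    """Definition 2: no proper subset is a test.

    Adding features never destroys a test, so it is enough to check that
    removing any single feature breaks the test property.
    """
    features = set(features)
    return is_test(Q, labels, features) and all(
        not is_test(Q, labels, features - {f}) for f in features
    )


# Toy data: the pattern equals feature 2, and feature 2 = feature 0 XOR feature 1.
Q = ["000", "011", "101", "110"]
labels = [0, 1, 1, 0]
print(is_irredundant_test(Q, labels, {2}))        # True
print(is_irredundant_test(Q, labels, {0, 1}))     # True
print(is_irredundant_test(Q, labels, {0, 1, 2})) # False: {2} alone suffices
```

The example shows that ITs of the same matrix may differ in length, which is why the weight of a test, and not only its length, matters later on.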
The value of the vector-function of distinguishing u(a,b) is calculated by the following formula:
$$u(a,b) = z(a)\,\overline{z(b)} \vee \overline{z(a)}\,z(b) = z(a) \oplus z(b). \qquad (1)$$
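The construction of U and its reduction to U' might look as follows. This sketch assumes that, for ternary vectors, a component of u(a,b) equals 1 exactly when both components are defined and differ, and it reads "covered rows" as rows whose set of 1s contains the set of 1s of another row (such rows add nothing to a column covering); the function names are hypothetical.

```python
from itertools import combinations


def u_vector(za: str, zb: str):
    """Formula (1) read componentwise: 1 iff both values are defined and differ."""
    return tuple(1 if x != y and "-" not in (x, y) else 0 for x, y in zip(za, zb))


def implication_matrix(Q, labels):
    """One row u(a,b) per pair of objects belonging to different patterns."""
    return [
        u_vector(Q[a], Q[b])
        for a, b in combinations(range(len(Q)), 2)
        if labels[a] != labels[b]
    ]


def irredundant(U):
    """Drop duplicate rows and every row whose set of 1s contains another row's set.

    A column set that covers the smaller row also covers the larger one, so the
    larger row adds no constraint. An all-zero row (indistinguishable objects of
    different patterns) absorbs everything and signals that no test exists.
    """
    if not U:
        return []
    width = len(U[0])
    unit_sets = sorted(
        {frozenset(m for m, v in enumerate(row) if v) for row in U},
        key=lambda s: (len(s), sorted(s)),
    )
    kept = []
    for s in unit_sets:
        if not any(k <= s for k in kept):
            kept.append(s)
    return [tuple(1 if m in s else 0 for m in range(width)) for s in kept]


Q = ["000", "011", "101", "110"]
labels = [0, 1, 1, 0]
U = implication_matrix(Q, labels)
print(U)               # [(0, 1, 1), (1, 0, 1), (1, 0, 1), (0, 1, 1)]
print(irredundant(U))  # [(1, 0, 1), (0, 1, 1)]
```

A feature subset is then a test exactly when every row of U' has a 1 in at least one of its columns, which turns test synthesis into a column-covering problem.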
For descent algorithms in computer-aided design and redesign of LCD with one- or two-level memory, the values of the vector-functions of noncrossing described by models of asynchronous FSM are given in [1].
The following definitions relate to the optimization of test synthesis in test pattern recognition problems on the basis of descent algorithms with genetic transformations.
As to the optimization of test synthesis in computer-aided design and redesign of LCD on the basis of descent algorithms with genetic transformations, the algorithms are analogous once both the matrix of coding the internal states of the asynchronous FSM and the matrix of transitions have been transformed into an irredundant implication matrix. The construction of the irredundant implication matrix is described in [1].
To estimate the individual contribution of each characteristic feature to a test, we use the weight coefficients $w_m$ ($m = 1, \dots, M$), calculated by the following formula:
$$w_m = \frac{\sum_{r=1}^{K-1} \sum_{t=r+1}^{K} \sum_{i=1}^{N_r} \sum_{j=1}^{N_t} \delta_{ij}^{m}}{\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \sigma_i \sigma_j}, \qquad (2)$$
where $K$ is the number of extracted patterns; $N_c$ is the number of rows in the description of the $c$-th pattern ($c \in \{r, t, i, j\}$); $q_{im}$ is the element of matrix Q at the intersection of the $i$-th row and the $m$-th column; $d_i$ is the number of values «-» in $q_i$ (row $i$ of matrix Q); $p_i$ is the replication coefficient of the $i$-th row;
$$\delta_{ij}^{m} = \begin{cases} 0, & \text{if } q_{im} = q_{jm} = 0 \text{ or } q_{im} = q_{jm} = 1,\\ p_i p_j 2^{d_i + d_j}, & \text{if } q_{im} = 0,\ q_{jm} = 1 \text{ or } q_{im} = 1,\ q_{jm} = 0,\\ p_i p_j 2^{d_i + d_j - 1}, & \text{if } q_{im} = \text{«-»} \text{ and (or) } q_{jm} = \text{«-»}; \end{cases}$$
and $\sigma_j$ is the number of objects in the $j$-th pattern ($j = 1, \dots, K$), calculated by the formula:
$$\sigma_j = \sum_{l=1}^{N_j} p_l 2^{d_l}. \qquad (3)$$
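A direct transcription of formulas (2) and (3) is sketched below with exact rational arithmetic. The exponent in the third case of δ is illegible in the source; the sketch assumes p_i p_j 2^(d_i + d_j - 1), i.e. that half of the pairs of fully specified objects obtained by expanding the «-» values differ in feature m. Under that reading, w_m is the fraction of pairs of objects from different patterns that feature m distinguishes. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def weight_coefficients(Q, labels, p=None):
    """Weight coefficients w_m of formula (2), with sigma_j from formula (3).

    Q: rows over '0', '1', '-'; labels: pattern of each row;
    p: replication coefficients p_i (default 1 for every row).
    The '-' case uses the assumed exponent d_i + d_j - 1.
    """
    n, M = len(Q), len(Q[0])
    p = p if p is not None else [1] * n
    d = [row.count("-") for row in Q]          # d_i: number of '-' in row i

    sigma = {}                                 # formula (3): objects per pattern
    for l in range(n):
        sigma[labels[l]] = sigma.get(labels[l], 0) + p[l] * 2 ** d[l]
    if len(sigma) < 2:
        raise ValueError("at least two patterns are required")
    denominator = sum(a * b for a, b in combinations(sigma.values(), 2))

    weights = []
    for m in range(M):
        numerator = 0
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                continue                       # only pairs from different patterns
            qi, qj = Q[i][m], Q[j][m]
            full = p[i] * p[j] * 2 ** (d[i] + d[j])
            if "-" in (qi, qj):
                numerator += full // 2         # p_i p_j 2^(d_i + d_j - 1)
            elif qi != qj:
                numerator += full              # 0 against 1: all expanded pairs differ
        weights.append(Fraction(numerator, denominator))
    return weights


print(weight_coefficients(["000", "011", "101", "110"], [0, 1, 1, 0]))
# [Fraction(1, 2), Fraction(1, 2), Fraction(1, 1)]
```

In the example, feature 2 separates every pair of objects from different patterns and receives weight 1, while features 0 and 1 each separate half of those pairs.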
The weight (cost) Wj of the j-th test (j = 1, ..., l, where l is the number of tests already constructed) is equal to the sum of the weight coefficients of its features.
An individual (sample) of the population, represented by the submatrix of Q whose columns correspond to the features included in the IT, is considered able to compete if its chromosome (test) contains fewer unit genes (features) and the unit genes included in this chromosome (IT) have a larger sum of weight coefficients [7-9].
The value of competitive ability depends on the particularities of matrices Q and R. An individual able to compete is considered suitable to join the population [7-9].
The size (power) of the population is determined by the number of individuals included in it.
Let us introduce the following characteristics describing peculiarities of genetic transformations used in the paper:
1. The initial chromosome T0 consists of the genes that are neither constant, nor non-informative, nor alternative (except that, in each group of alternative genes, the one with the greatest weight is kept), corresponding to columns of the irredundant implication matrix U'. In other words, T0 is the set of all genes that take part in genetic transformations, corresponding to the columns of matrix U'' (a submatrix of U'). Clearly, the length N of the initial chromosome T0 equals the number of columns of U' corresponding only to the genes taking part in genetic transformations, i.e. N ≤ M, where N is the length of the chromosome and M is the number of all columns of matrix U'.
2. A gene gk (k ∈ {1, ..., N}) is the minimal informative unit taking part in genetic transformations. The gene gk is marked by a unit value of the k-th component of the initial chromosome T0; the k-th gene is characterized by a weight equal to the weight coefficient wk of the corresponding k-th column of matrix U''. Thus, $T_0 = \{g_k\}_{k=1}^{N}$.
3. The alphabet H = {0, 1} defines the correspondence of chromosome genes to the initial chromosome T0: value 0 means that the k-th gene is absent from the chromosome, and 1 means that it is present. For instance, if some chromosome Tj equals the initial chromosome T0, then the pattern (mask) Hj over the alphabet H describing Tj consists of units in all N positions, i.e. 11...1 (N units).
4. The length λ(Tj) of a chromosome Tj, whose mask is constructed over the alphabet H = {0, 1}, is the number of fixed unit positions of the chromosome Tj relative to the initial chromosome T0.
5. Step t (t = 0, 1, ...) corresponds to the discrete time scale and describes the evolution of the population in time.
6. P(t) is the population obtained at step t, and P(0) is the initial population.
7. N(t) is the length of population P(t), defined by the number of chromosomes included in the population.
8. T(t) = {T1(t), T2(t), ..., TN(t)(t)} is the set of chromosomes forming population P(t). Each chromosome Tj(t) (j ∈ {1, ..., N(t)}) corresponds to the j-th IT, and T(t) corresponds to the set of ITs of population P(t). In fact, the population P(t) obtained at step t includes a set of solutions (chromosomes) that are subjected to further optimization.
9. The fitness function f(Tj(t)) defines the contribution of chromosome Tj(t) to population P(t). It is calculated by the formula:
$$f(T_j(t)) = W_j. \qquad (4)$$
10. The selection operator s is a proportional selection with mapping p, which defines recombination. Note that in this paper recombination means the crossover operator only.
11. The mapping p is based on the probability of including an individual chromosome Tj(t) from P(t) in the crossover process. The probability of including chromosome Tj(t) in recombination is denoted by ps(Tj(t)) and is calculated by the formula:
$$p_s(T_j(t)) = \frac{f(T_j(t))}{\sum_{c=1}^{N(t)} f(T_c(t))}. \qquad (5)$$
12. The stop criterion is a set of parameters (the required number of ITs, the number of repeated ITs, the number of seconds assigned externally for constructing an elite population); the GA stops as soon as any one of them is reached.
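Items 3, 4, 9 and 11 can be illustrated with chromosomes stored as 0/1 masks over T0. The sketch below assumes 0-based gene indices and positive weights; `random.Random.choices` performs the fitness-proportional draw of formula (5). Function names and weight values are illustrative only.

```python
import random


def mask(genes, N):
    """Mask H_j over the alphabet {0, 1}: position k is 1 iff gene k (0-based) is present."""
    return [1 if k in genes else 0 for k in range(N)]


def length(h):
    """lambda(T_j): number of unit positions of the mask."""
    return sum(h)


def fitness(h, w):
    """Formula (4): f(T_j) = W_j, the sum of the weights of the genes present."""
    return sum(wk for bit, wk in zip(h, w) if bit)


def selection_probabilities(population, w):
    """Formula (5): p_s(T_j) = f(T_j) / sum over c of f(T_c)."""
    f = [fitness(h, w) for h in population]
    total = sum(f)
    return [fj / total for fj in f]


def roulette_pick(population, w, rng):
    """Proportional (roulette-wheel) choice of one chromosome."""
    return rng.choices(population, weights=[fitness(h, w) for h in population], k=1)[0]


w = [1, 3, 2]                                  # hypothetical weight coefficients
population = [mask({2}, 3), mask({0, 1}, 3)]
print(population)                              # [[0, 0, 1], [1, 1, 0]]
print(selection_probabilities(population, w))  # roughly one third and two thirds
print(roulette_pick(population, w, random.Random(0)))
```

The second chromosome is twice as heavy as the first, so it is drawn about twice as often.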
Optimization of tests synthesis on the base of step-by-step descent algorithms with the use of genetic transformations
The optimization of test synthesis based on step-by-step algorithms rests on Theorem 1, which estimates the minimal length of a test:
Theorem 1. $d \ge \lceil h \log_2 n \rceil$, where h = 2r + 1 for fault-tolerant and race-tolerant coding of the internal states of an asynchronous FSM and h = r + 1 for test pattern recognition; r is the number of failures of memory elements (of measurement or input errors in test pattern recognition); d is the length of the test; n is the number of rows (internal states) of matrix Q or, for test pattern recognition, the number of patterns; and $\lceil b \rceil$ denotes the least integer not less than b.
Corollary. Using Theorem 1 reduces the volume of search, when intermediate solutions are estimated by the length of the shortest h-multiple column covering, by a factor of w, where w > 1 + d/m and m is the number of columns of matrix Q.
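The bound of Theorem 1 is easy to evaluate. The sketch below assumes that ]b[ denotes the ceiling and that the inequality is non-strict; it computes ⌈h log2 n⌉ exactly in integer arithmetic as the bit length of n^h - 1, which avoids floating-point rounding. The function name and the `mode` parameter are hypothetical.

```python
def min_test_length(n: int, r: int = 0, mode: str = "recognition") -> int:
    """Lower bound of Theorem 1: d >= ceil(h * log2(n)).

    mode="recognition": h = r + 1  (n = number of patterns)
    mode="coding":      h = 2r + 1 (n = number of internal states)
    """
    if n < 1 or r < 0:
        raise ValueError("n must be >= 1 and r >= 0")
    if mode == "recognition":
        h = r + 1
    elif mode == "coding":
        h = 2 * r + 1
    else:
        raise ValueError("mode must be 'recognition' or 'coding'")
    # ceil(log2(x)) == (x - 1).bit_length() for every integer x >= 1, with x = n**h.
    return (n ** h - 1).bit_length()


print(min_test_length(5))                      # 3
print(min_test_length(8, r=1, mode="coding"))  # 9
```

Five patterns need at least three binary features, since two features give only four distinct combinations.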
For simplicity of further description we take h = 1.
The step-by-step algorithm with genetic transformations is an iterative procedure for constructing an irredundant column covering of matrix U'' and consists of the following stages:
1. Selection of the non-excluded set of genes B+ (B+ ⊆ B, where B is the set of genes from T0 corresponding to columns of matrix U'', and B+ is the kernel, which consists of all obligatory genes).
2. Selection of perspective genes (genes with the greatest weights) from chromosomes Tx(t) and Ty(t) (x ≠ y; Tx(t), Ty(t) ∈ T(t)) of the parent population P(t) obtained at step t. The total number of genes selected from both chromosomes must not exceed v, where
$$v = d - \lambda(B^{+}). \qquad (6)$$
3. The obtained chromosome Tj(t+1) is checked for being an irredundant test.
4. If chromosome Tj(t+1) does not satisfy the condition of an irredundant test, a new perspective gene from B− (B− = B \ B+ \ B', where B' is the set of genes selected at stage 2) is added to it, and the procedure returns to stage 3. Otherwise, the procedure ends.
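Stages 1-4 can be sketched for h = 1 as follows, with genes as 0-based column indices of U'' and a gene set counted as a test when it covers every row. Genes are taken heaviest first, following the "perspective genes" rule; the final pruning pass, which removes redundant non-kernel genes so that the result is irredundant, is an added assumption that the text does not spell out. Names are hypothetical.

```python
def covers(U, genes) -> bool:
    """A gene set is a test iff every row of the implication matrix has a 1 in it."""
    return all(any(row[k] for k in genes) for row in U)


def build_child(U, w, kernel, parent_x, parent_y, d):
    """Stages 1-4 of the step-by-step algorithm for h = 1 (illustrative sketch).

    U: rows of U'' as 0/1 tuples; w: gene weights; kernel: B+ (obligatory genes);
    parent_x, parent_y: parent gene sets; d: test length used in formula (6).
    """
    def by_weight(genes):
        # Perspective genes first: greatest weight, ties broken by index.
        return sorted(genes, key=lambda k: (-w[k], k))

    kernel = set(kernel)
    child = set(kernel)                                   # stage 1: start from B+
    v = max(d - len(kernel), 0)                           # formula (6)
    selected = set(by_weight((set(parent_x) | set(parent_y)) - kernel)[:v])
    child |= selected                                     # stage 2: B'
    rest = set(range(len(w))) - kernel - selected         # B- = B \ B+ \ B'
    for k in by_weight(rest):
        if covers(U, child):                              # stage 3: a test is reached
            break
        child.add(k)                                      # stage 4: add one more gene
    if not covers(U, child):
        raise ValueError("U has an all-zero row, so no test exists")
    # Added assumption: drop redundant non-kernel genes, lightest first,
    # so that the result passes the irredundancy check of stage 3.
    for k in sorted(child - kernel, key=lambda k: (w[k], k)):
        if covers(U, child - {k}):
            child.remove(k)
    return child


U = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 0, 1)]
w = [0.2, 0.9, 0.3, 0.8]
print(build_child(U, w, set(), {1}, {1}, 2))  # {1, 3}
```

Here both parents contribute only gene 1, which leaves the second row uncovered, so stage 4 adds gene 3, the heaviest remaining gene.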
Formally, we describe the GA as follows [23]:
$$GA = (P(t), n, s, p, f). \qquad (7)$$
The genetic algorithm is an evolution process reflecting the evolution of population P(t) in time with a discrete step t (t = 0, 1, ...). A necessary requirement for evolutionary progress is that each next (child) population P(t+1) is not worse than its parent population; in other words, the average fitness of the child chromosomes should not be less than the average fitness of the parent chromosomes. This requirement is justified mathematically by the theorem described in [25]. That theorem is based on Holland's fundamental schema (mask) theorem [22] and considers the limiting case of a population of unlimited size, in which proportional selection is applied continuously and cyclically over the search space. Drawing a parallel with the theorem of [25] and regarding ps as a function of T(t), we can speak of the convergence of ps(T(t)) to the optimal solution, namely obtaining the required number of ITs subject to the stop criterion, which justifies the use of GA for optimizing the synthesis of ITs.
Taking the above into account, we now describe the step-by-step algorithm in detail:
1. Construction of the initial population P(0) consisting of the set of chromosomes T(0) = {Tj(0)}, j = 1, ..., N(0). Each chromosome Tj(0) is formed by including genes from the initial chromosome T0, taking into account the gene weights wk (k ∈ {1, ..., N}) and ensuring that it is an irredundant test. All obligatory genes are included in every chromosome and are not subject to the crossover operation later.
2. Construction of the next intermediate population P(t+1) (a set of chromosomes T(t+1)) by applying the so-called “greedy” crossover operator (introduced by Grefenstette [24]) to chromosomes of the previous (parent) population P(t), as follows. The set of chromosomes of the parent population is divided into two parts (of equal length when the cardinality of the population is even): the first part includes the chromosomes with even numbers, and the second part those with odd numbers. To enter the intermediate population, one chromosome is selected at random from each part with probability determined by its fitness function (Tx(t) from the odd part and Ty(t) from the even part, where x ≠ y; Tx(t), Ty(t) ∈ T(t)). These chromosomes are then crossed.
3. Application of the step-by-step algorithm to form a new chromosome Tj(t+1) and to check whether it is an irredundant test. The step-by-step algorithm is performed in three stages:
3.1. B− = B \ B+.
3.2. Sequential addition of genes to chromosome Tj(t+1) from chromosomes Tx(t), Ty(t) and the set B− until the value defined by formula (6) is reached.
3.3. Verification that the obtained chromosome Tj(t+1) is an irredundant test. If it is not, return to step 3.2.
4. Selection of the best of the three chromosomes Tx(t), Ty(t) and Tj(t+1) on the basis of their fitness functions, and its inclusion in the set of chromosomes of the corresponding intermediate population.
5. Repetition of the construction of intermediate populations (t := t+1) until the stop criterion is reached; the final step t is fixed, and the population P(t) is called elite.
6. Construction of the resulting matrix Q' on the basis of the obtained population P(t), in order to carry out the voting procedure on the set of ITs for final decision-making.
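A compact, self-contained sketch of steps 1-6 follows. Several choices are assumptions made for illustration only: each initial chromosome is grown greedily from one random gene, the kernel is left implicit (obligatory genes appear in every test anyway), redundant genes are pruned lightest first, ties in step 4 go to the shorter test, and the stop criterion is either a fixed number of steps or a required number of distinct ITs. Weights are assumed positive, since the proportional draw needs a nonzero total fitness. Names are hypothetical.

```python
import random


def covers(U, genes):
    """A gene set is a test iff every row of U has a 1 in one of its columns."""
    return all(any(row[k] for k in genes) for row in U)


def complete(U, genes, w):
    """Steps 3.2-3.3 (sketch): add heaviest genes until a test, then drop redundant ones."""
    genes = set(genes)
    for k in sorted(set(range(len(w))) - genes, key=lambda k: (-w[k], k)):
        if covers(U, genes):
            break
        genes.add(k)
    if not covers(U, genes):
        raise ValueError("no test exists for this U")
    for k in sorted(genes, key=lambda k: (w[k], k)):     # assumed pruning rule
        if covers(U, genes - {k}):
            genes.remove(k)
    return frozenset(genes)


def fitness(test, w):
    return sum(w[k] for k in test)                        # formula (4)


def pick(part, w, rng):
    """Fitness-proportional choice within one part of the parent population, formula (5)."""
    return rng.choices(part, weights=[fitness(t, w) for t in part], k=1)[0]


def evolve(U, w, d, pop_size=6, max_steps=20, required=None, seed=0):
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    rng = random.Random(seed)
    # Step 1: initial population P(0), each chromosome grown from one random gene.
    population = [complete(U, {rng.randrange(len(w))}, w) for _ in range(pop_size)]
    for _ in range(max_steps):                           # step 5: t := t + 1
        odd, even = population[0::2], population[1::2]   # numbers 1, 3, ... and 2, 4, ...
        children = []
        for _ in range(pop_size):
            tx, ty = pick(odd, w, rng), pick(even, w, rng)     # step 2
            seeds = sorted(tx | ty, key=lambda k: (-w[k], k))[:d]
            child = complete(U, seeds, w)                       # step 3
            # Step 4: best of three by fitness; ties go to the shorter test.
            children.append(max((tx, ty, child), key=lambda c: (fitness(c, w), -len(c))))
        population = children
        if required is not None and len(set(population)) >= required:
            break                                        # one possible stop criterion
    return population                                    # elite population P(t)


U = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 0, 1)]
w = [0.2, 0.9, 0.3, 0.8]                                 # hypothetical weights
elite = evolve(U, w, d=2)
print(sorted(sorted(t) for t in set(elite)))
```

For this U the only ITs are {0, 2}, {0, 3} and {1, 3}, so every chromosome of the elite population is one of them; which ones survive depends on `seed` and on the weights.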
In the case h > 1, the matrix Q must ensure the corresponding conditions of noncrossing in computer-aided design and redesign of LCD (of distinguishing in pattern recognition); accordingly, the search for the shortest column covering is replaced by the search for the shortest h-multiple covering, with the test verified for irredundancy.
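For h > 1 the covering check changes from "at least one 1 per row" to "at least h ones per row". A minimal sketch, assuming the same 0/1 row representation of the implication matrix; the function names are hypothetical.

```python
def is_h_cover(U, genes, h: int) -> bool:
    """Every row of U has at least h ones in the chosen columns (h-multiple covering)."""
    return all(sum(row[k] for k in genes) >= h for row in U)


def is_irredundant_h_cover(U, genes, h: int) -> bool:
    """An h-multiple covering none of whose proper subsets is an h-multiple covering.

    Removing columns can only lower the row sums, so checking single removals suffices.
    """
    genes = set(genes)
    return is_h_cover(U, genes, h) and all(not is_h_cover(U, genes - {k}, h) for k in genes)


U = [(1, 1, 1, 0), (0, 1, 1, 1)]
print(is_irredundant_h_cover(U, {1, 2}, 2))     # True
print(is_irredundant_h_cover(U, {0, 1, 2}, 2))  # False: {1, 2} is enough
```

With h = 1 the same functions reduce to the ordinary column covering used for irredundant tests.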
Conclusion
The suggested step-by-step algorithm, based on the descent algorithm with genetic transformations, substantially reduces the search in the optimization of test synthesis for computer-aided design and redesign of LCD and for test pattern recognition in large feature spaces.
It is expedient to use the descent algorithm with genetic transformations for optimizing test synthesis in the construction of intelligent CAD [13] and of various recognizing systems. In future, by carrying out the voting procedure on a set of tests, the method will make it possible to increase the coefficient of steadiness to failures of memory elements in computer-aided design and redesign of LCD, and to measurement or input errors in pattern recognition.
The algorithm is planned to be included in the software tool IMSLOG [26]. Intelligent CAD and applied intelligent systems are constructed on the basis of IMSLOG using the methods of test pattern recognition.
In future work, we intend to evaluate the effectiveness of the suggested method.
The work was supported by the Russian Foundation for Basic Research, projects Nos. 01-01-00772, 01-01-01050 and 03-01-06116.
References:
1. Yankovskaya A.E. Algorithms of Descending for Discrete Field Synthesis Problems and Applications // Collection, Theory of Discrete Control Fields. Russia, Moscow, 1982. P. 206-214.
2. Pradhan D.K., Reddy S.M. Techniques to construct (2, 1)-separating systems from linear error-correcting codes // IEEE Trans. Comput., 1976. Vol. C-25, No. 9. P. 945-949.
3. Sagalovich Yu.L. Cascade codes of finite-state machine // Problems of Information Transmission, 1978. No. 2. P. 77-85.
4. Synthesis of asynchronous finite-state machine on computer / Ed. by Zakrevsky A.D. Minsk: Science and Engineering, 1975.
5. Yankovskaya A.E. Logical tests and cognitive graphic tools in intelligent system // Recent information technologies in investigations of discrete structures: Proceedings of the 3rd All-Russian conf. with International participation. Tomsk: SB RAS, 2000. P. 163-168.
6. Yankovskaya A.E. Distinguishing functions for knowledge base analysis with the use of matrix knowledge description // Artificial Intelligence-90: Proceedings of the II All-Union conf. Vol. 1. Minsk, 1990. P. 102-105.
7. Yankovskaya A.E. Design of Optimal Mixed Diagnostic Test With Reference to the Problems of Evolutionary Computation // Proceedings of the First International Conference on Evolutionary Computation and Its Applications (EVCA'96). Moscow, 1996. P. 292-297.
8. Yankovskaya A.E. Test Pattern Recognition with the Use of Genetic Algorithms // Pattern Recognition and Image Analysis. 1999. Vol. 9, No. 1. P. 121-123.
9. Yankovskaya A.E. The Test Pattern Recognition with Genetic Algorithms Use // Proceedings of the Pattern Recognition and Image Understanding. 5th Open German-Russian Workshop. Herrsching, Germany, 1999. P. 47-54.
10. Yankovskaya A.E., Bleikher A.M. Genetic algorithms for the synthesis optimization of a set of irredundant diagnostic tests in the intelligent system // Computer Science Journal of Moldova. 2001. Vol. 1, No. 3. P. 336-349.
11. Yankovskaya A.E., Bleikher A.M. Synthesis optimization of irredundant diagnostic tests on the base of genetic algorithms and knowledge regularities // Proc. «Integrated models and soft computations in artificial intelligence». Kolomna: Science, «Fizmatlit», 2001. P. 214-219.
12. Yankovskaya A.E., Bleikher A.M. Genetic Algorithms for the Synthesis Optimization of a Set of Irredundant Diagnostic Tests in Intelligent System // Artificial Intelligence. Ukraine, Donetsk, 2000. No. 2. P. 272-278.
13. Yankovskaya A.E. The main directions of construction of intelligent industrial CAD of logical control devices // CAD SBT-89: Proc. of the I International conf. Moscow, 1989. P. 106-115.
14. Yankovskaya A.E., Miretskiy A.O., Korotysheva N.V. Interactive system of logical design AVTOL-TSM on the basis of knowledge base // New information technology in systems engineering. Moscow: Radio i Svyaz, 1990. P. 102-107.
15. Yankovskaya A.Ye., Gedike A.I. Integrated Intelligent System EXAPRAS and its Application // Journal of Intelligent Control, Neurocomputing and Fuzzy Logic. USA: Nova Science Publishers, Inc., 1995. Vol. 1. P. 243-269.
16. Yankovskaya A.E. Test Pattern Recognition Medical Expert Systems with Cognitive Graphic Elements // Komp'yuternaya Khronika. 1994. No. 8/9. P. 61-83.
17. Gedike A.I., Yankovskaya A.E. Ecologic-medical intelligent system // Theory of system control. 1995. No. 5. P. 224-228.
18. Yankovskaya A.E., Kudyakov A.I. Logical-combinatorial recognizing system STROIKOMPOSIT for complex solutions of technological problems in building // USiM. 1996. No. 1/2. P. 28-38.
19. Kureichik V.M. Genetic Algorithms. Condition. Problems. Perspectives // Theory and control systems, 1999. No. 1. P. 144-160.
20. Yankovskaya A.E. Algorithms of Internal States Coding of Asynchronous Finite-State Machine // Collection, Digital Models and Integrated Structures. Taganrog, 1970. P. 371-380.
21. Yakubaitis E.A., Gobzemis A.Yu. Coding of states of asynchronous finite-state machine with two-level memory // AiVT, 1970. No. 6. P. 1-4.
22. Holland J.H. Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press, 1975.
23. Skurihin A.M. Genetic Algorithms // News of artificial intelligence, 1995. No. 4. P. 6-46.
24. Grefenstette J., Gopal G., Rosmaita B., Van Gucht D. Genetic algorithms for the traveling salesman problem // Proc. of Intern. Conf. on Genetic Algorithms and Their Applications / Ed. by Grefenstette J. P. 224-230.
25. Qi X., Palmieri F. Theoretical analysis of evolutionary algorithms with an infinite population size in continuous space. Parts I, II // IEEE Trans. on Neural Networks, 1994. Vol. 5, No. 1. P. 102-130.
26. Yankovskaya A.E., Gedike A.I., Ametov R.V., Bleikher A.M. IMSLOG-2002 Software Tool for Supporting Information Technologies of Test Pattern Recognition // Pattern Recognition and Image Analysis. 2003. Vol. 13, No. 2. P. 243-246.