Brain-Like Computer Structures
Vladimir Hahanov, Member, IEEE, Svetlana Chumachenko, Ngene Christopher Umerah, Tiecoura Yves
Computer Engineering Faculty, Kharkov National University of Radioelectronics, Kharkov, Ukraine
Abstract - A high-speed multiprocessor architecture is proposed for brain-like analysis of information represented in analytic, graph and table forms of associative relations, in order to search, recognize and make decisions in an n-dimensional discrete vector space. Vector-logical process models of practical applications are described, in which the quality of a solution is estimated by the proposed integral non-arithmetic metric of interaction between binary vectors.
I. Introduction
The goal is to remove arithmetic from the computer and to transform the freed resources into a brain-like infrastructure of associative logic that simulates the brain functionality which makes it possible to take the right decision at every moment. The brain and the computer share the same technological basis in the form of primitive logical operations: and, or, not, xor. With experience, both the brain and the computer build more complex functional space-time logic converters from these primitive operations. Specializing a computer to use only logical operations brings it closer to associative-logic human thinking and thus considerably (x100) improves the performance of solving nonarithmetic problems.
Removing arithmetic operations, leveraging the parallelism of vector logic algebra and using a multiprocessor architecture together provide an efficient infrastructure that combines mathematical and technological culture to solve applied problems.
Brain-likeness of a multiprocessor digital system-on-a-chip is the concept of building an architecture and models of computational processes that implement typical brain nonarithmetic associative logic functionalities on today's digital platforms, using vector logical operations and criteria for search, pattern recognition and decision-making problems. The market appeal of the logical associative multiprocessor (LAMP) is determined by thousands of old and new logical problems that are currently solved inefficiently by redundant universal computers with a high-performance arithmetic processor. Some problems relevant to the IT market are: 1. Analysis and synthesis of syntactic and semantic language structures (abstracting, error correction, analysis of text quality). 2. Video and audio pattern recognition by representing patterns as vector models of essential parameters in a discrete space. 3. Use of Infrastructure IP for complex technical products to ensure their manufacturability and lifetime reliability. 4. Knowledge testing and expert appraisal of objects or parties to determine their validity. 5. Identification of an object or process to make a decision under uncertainty. 6. Exact information retrieval on the Internet when the information is given by a vector of parameters. 7. Target designation for a fighter, or an aircraft autoland system, operating in microsecond time. 8. Air traffic control, or optimization of municipal traffic control infrastructure to avoid conflicts. Practically all these problems are solved in real time, and they are isomorphic in the logical structure of their process models, which are based on a set of interrelated associative tables. Solving them requires a fast, dedicated hardware platform (LAMP) focused on the concurrent execution of search, recognition and decision-making procedures, estimated by means of the integral nonarithmetic quality criterion.
Manuscript received October 19, 2009.
Vladimir Hahanov is with the Kharkov National University of Radioelectronics, Ukraine, 61166, Kharkov, Lenin Prosp., 14, room 321 (corresponding author; phone: (057)7021326; fax: (057)7021326; e-mail: hahanov@kture.kharkov.ua).
Chumachenko Svetlana is with the Kharkov National University of Radioelectronics, Ukraine, 61166, Kharkov, Lenin Prosp., 14, room 378 (phone: (057)7021326; fax: (057)7021326; e-mail: ri@kture.kharkov.ua).
Our goal in this article is to considerably (x100) increase the speed of search, recognition and decision-making procedures through a multiprocessor, concurrent implementation of associative logic vector operations for analyzing graph and tabular data structures in discrete Boolean space without arithmetic operations.
The tasks are: 1) developing a nonarithmetic metric for estimating associative logic solutions; 2) creating data structures and process models for solving the applied problems; 3) designing the architecture of the logical associative multiprocessor; 4) implementing LAMP.
The essence of the research is an infrastructure for expert servicing of requests in real time, which integrates a multiprocessor system-on-chip with associative-logical data structures to obtain a deterministic solution whose validity is estimated by the nonarithmetic integral criterion of interaction quality between a query and the given discrete space.
References: 1. Hardware platforms for associative logical information analysis [1-2]. 2. Associative logical data structures for solving information problems [3-4]. 3. Models and methods for discrete analysis and synthesis [5-6]. 4. Multiprocessors for solving information-logical problems [7-10]. 5. Brain-like and intelligent logical computing [11-12].
R&I, 2009, №4
II. Integral Metric For Solution Estimation
The infrastructure of the brain-like multiprocessor includes models, methods and associative logical data structures focused on hardware support of search, recognition and decision-making processes [11-12] on the basis of vector nonarithmetic operations.
The solution of a problem is evaluated by a vector-logical criterion of interaction quality between a query (a vector m) and a system of associative vectors (associators). Processing the query produces a positive response in the form of one or more associators, together with a numerical grade of membership (quality function) of the input vector m in the obtained solution, $\mu(m \in A)$. The input vector $m = (m_1, m_2, \ldots, m_j, \ldots, m_q)$, $m_j \in \{0, 1, x\}$, and the matrix $A_i$ of associators, with $A_{ijr} \in A_{ij} \in A_i \in A$ and $A_{ijr} \in \{0, 1, x\}$, have the same dimension q. Below, the membership grade of the m-vector in A is denoted by $\mu(m \in A)$.
There are five types of set-theoretic (logical) interaction $m \cap A$ of two vectors, defined in Fig. 1. They form all primitive reactions of generalized SRM systems (SRM: Search, pattern Recognition and decision Making) to the input request vector. In the technological field of Design & Test, this sequence of actions is isomorphic to the route: fault finding, fault locating, and decision-making for repair. All three stages of this technological route require a metric for estimating solutions in order to choose the optimal variant.
Fig. 1. The results of the intersection of two vectors
Definition. The integral set-theoretic metric for estimating query quality is a function of the interaction $m \cap A$ of multivalued vectors, determined as the average of three normalized parameters: the code distance $d(m,A)$, the membership function $\mu(m \in A)$ and the membership function $\mu(A \in m)$:

$$
\begin{aligned}
Q &= \frac{1}{3}\left[d(m,A) + \mu(m \in A) + \mu(A \in m)\right],\\
d(m,A) &= \frac{1}{n}\left[n - \mathrm{card}\{\, i : m_i \cap A_i = \varnothing \,\}\right], \quad i = 1,\ldots,n;\\
\mu(m \in A) &= 2^{\mathrm{card}(m \cap A) - \mathrm{card}(A)}, \quad \mathrm{card}(m \cap A) = \mathrm{card}\{\, i : m_i \cap A_i = x \,\}, \quad \mathrm{card}(A) = \mathrm{card}\{\, i : A_i = x \,\};\\
\mu(A \in m) &= 2^{\mathrm{card}(m \cap A) - \mathrm{card}(m)}, \quad \mathrm{card}(m) = \mathrm{card}\{\, i : m_i = x \,\}.
\end{aligned}
\tag{1}
$$
Explanations. Normalizing the parameters makes it possible to estimate the level of vector interaction in the interval [0, 1]. If each parameter reaches its limiting maximum value of 1, the vectors are equal. The minimal estimate Q = 0 is obtained if the vectors have an empty intersection in all n coordinates. If the intersection $m \cap A = m$ has half the power of the vector space A, the membership and quality functions are, respectively:

$$\mu(m \in A) = \frac{1}{2};\quad \mu(A \in m) = 1;\quad d(m,A) = 1;\quad Q(m,A) = \frac{1}{3}\left(\frac{1}{2} + 1 + 1\right) = \frac{5}{2 \times 3} = \frac{5}{6}.$$
The same value of Q is obtained if the intersection $m \cap A = A$ has half the power of the vector space m. If the power of the intersection $\mathrm{card}(m \cap A)$ is equal to half of the power of both vector spaces A and m, the membership and quality functions are:

$$\mu(m \in A) = \frac{1}{2};\quad \mu(A \in m) = \frac{1}{2};\quad d(m,A) = 1;\quad Q(m,A) = \frac{1}{3}\left(\frac{1}{2} + \frac{1}{2} + 1\right) = \frac{4}{2 \times 3} = \frac{4}{6} = \frac{2}{3}.$$
It should be noted that if the intersection of two vectors is the empty set, then 2 raised to the power of the "empty" symbol is taken to be zero:

$$2^{\mathrm{card}(m \cap A)} = 2^{\varnothing} = 0.$$

This means that the number of common points in the intersection of the two spaces is zero.
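The metric (1) and the two worked cases above can be checked with a short script. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: vectors are written as strings over {0, 1, x}, the coordinate-wise intersection follows the usual cube-calculus rules (x ∩ a = a, 0 ∩ 1 = ∅), and the helper names `intersect` and `integral_quality` are hypothetical. Exact fractions are used so that values such as 5/6 and 2/3 are reproduced without rounding.

```python
from fractions import Fraction

def intersect(a: str, b: str):
    """Hypothetical helper: intersection of two symbols of {0, 1, x}.
    x acts as "don't care"; 0 with 1 gives an empty intersection (None)."""
    if a == "x":
        return b
    if b == "x" or a == b:
        return a
    return None

def integral_quality(m: str, A: str) -> Fraction:
    """Hypothetical helper: Q = (1/3)[d(m,A) + mu(m in A) + mu(A in m)] per (1)."""
    if len(m) != len(A):
        raise ValueError("m and A must have the same dimension")
    n = len(m)
    cut = [intersect(mi, ai) for mi, ai in zip(m, A)]
    empty = sum(c is None for c in cut)
    d = Fraction(n - empty, n)              # share of non-empty coordinates
    if empty:
        # Convention from the text: 2^card(empty set) = 0, so both memberships vanish.
        mu_m_in_A = mu_A_in_m = Fraction(0)
    else:
        card_cut = cut.count("x")           # card(m ∩ A): number of x coordinates
        mu_m_in_A = Fraction(2) ** (card_cut - A.count("x"))
        mu_A_in_m = Fraction(2) ** (card_cut - m.count("x"))
    return (d + mu_m_in_A + mu_A_in_m) / 3

print(integral_quality("1x0", "xx0"))  # 5/6
print(integral_quality("x1x", "xx1"))  # 2/3
```

The first call matches the case $m \cap A = m$ with half the points of A; the second matches the case where the intersection holds half the points of both A and m.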
The aim of the new vector logical criterion of solution quality is to considerably improve the performance of calculating the quality Q of interaction between the components m and A when analyzing associative data structures using vector logical operations only. Without averaging the membership functions and the code distance, the arithmetic criterion (1) can be transformed to the form:

$$
\begin{aligned}
Q &= d[m,A_{i(j)}] + \mu[m \in A_{i(j)}] + \mu[A_{i(j)} \in m],\\
d(m,A_{i(j)}) &= \mathrm{card}\,[\,m \oplus A_{i(j)} = 1\,];\\
\mu(m \in A_{i(j)}) &= \mathrm{card}\,[\,A_{i(j)} = 1\,] - \mathrm{card}\,[\,m \wedge A_{i(j)} = 1\,];\\
\mu(A_{i(j)} \in m) &= \mathrm{card}\,[\,m = 1\,] - \mathrm{card}\,[\,m \wedge A_{i(j)} = 1\,],
\end{aligned}
\tag{2}
$$

where each card counts the coordinates equal to 1 over the $n(m)$ coordinates of the vectors.
The first component of the criterion, the code distance, gives the degree of mismatch between the n-dimensional vectors and is computed by the xor operation; the second and third components give the degree of non-membership of the conjunction result in the set of 1-coordinates of each of the two interacting vectors. The notions of membership and non-membership are complementary, but non-membership is easier to compute in hardware. Thus the ideal value of the criterion is zero, reached when the two vectors are equal, and the estimate of interaction quality between two binary vectors decreases as the normalized criterion grows from 0 to 1. To remove arithmetic operations completely when computing the quality criterion, expressions (2) are transformed to the form:

$$
\begin{aligned}
Q &= d(m,A) \vee \mu(m \in A) \vee \mu(A \in m),\\
d(m,A) &= m \oplus A;\\
\mu(m \in A) &= A \wedge \overline{m \wedge A};\\
\mu(A \in m) &= m \wedge \overline{m \wedge A}.
\end{aligned}
\tag{3}
$$

Here the criteria are not numbers but vectors that describe the interaction of the components m and A. More 0s in the three quality vectors means a better criterion, and 1s indicate a loss of interaction quality. To compare estimates, it is necessary to determine the number of 1s in each vector without performing addition. This can be done with the register [9-10] (Fig. 2), which shifts left and compacts all 1-coordinates of an n-bit binary vector in one clock cycle.

Fig. 2. Register for shifting and compacting 1's

After compaction, the position of the rightmost 1 in the compacted set of 1s gives the index of interaction quality for the vectors. For the binary vectors m = (110011001100) and A = (000011110101), the interaction quality determined by formulas (3) is shown below (zero coordinates are marked by dots):

m                              1 1 . . 1 1 . . 1 1 . .
A                              . . . . 1 1 1 1 . 1 . 1
m ∧ A                          . . . . 1 1 . . . 1 . .
¬(m ∧ A)                       1 1 1 1 . . 1 1 1 . 1 1
d(m,A) = m ⊕ A                 1 1 . . . . 1 1 1 . . 1
μ(A ∈ m) = m ∧ ¬(m ∧ A)        1 1 . . . . . . 1 . . .
μ(m ∈ A) = A ∧ ¬(m ∧ A)        . . . . . . 1 1 . . . 1
Q = d ∨ μ(m ∈ A) ∨ μ(A ∈ m)    1 1 . . . . 1 1 1 . . 1
Q(m,A) = (6/12), compacted     1 1 1 1 1 1 . . . . . .

Not only is the estimate of vector interaction Q(m,A) = (6/12) obtained but, most importantly, the 1-coordinates of the row Q = d(m,A) ∨ μ(m ∈ A) ∨ μ(A ∈ m) identify all essential variables for which the vector interaction is of low quality. To compare two solutions obtained by logical analysis, the compacted quality vectors Q are used, and the following vector procedure is performed:

$$
Q(m,A) =
\begin{cases}
Q_1(m,A), & \bigvee\big[(Q_1(m,A) \wedge Q_2(m,A)) \oplus Q_1(m,A)\big] = 0,\\
Q_2(m,A), & \bigvee\big[(Q_1(m,A) \wedge Q_2(m,A)) \oplus Q_1(m,A)\big] = 1.
\end{cases}
\tag{4}
$$

For binary vectors that are quality criteria (here, compacted vectors), the procedure for choosing the best one on the basis of expression (4) is as follows:

Q1(m,A) = (6/12)               1 1 1 1 1 1 . . . . . .
Q2(m,A) = (8/12)               1 1 1 1 1 1 1 1 . . . .
Q1 ∧ Q2                        1 1 1 1 1 1 . . . . . .
Q1 ⊕ (Q1 ∧ Q2)                 . . . . . . . . . . . .
Q(m,A) = Q1(m,A)               1 1 1 1 1 1 . . . . . .

The vector-bit OR operator of devectorization determines a binary decision bit by applying the logical OR operation to the n bits of the essential-variables vector of the quality criterion. The circuit design for the decision and the analytic process model include three operations, shown in Fig. 3: y = 0 selects Q1 and y = 1 selects Q2.

Fig. 3. Process-model of decision
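The vector criterion (3), the compaction register of Fig. 2 and the selection rule (4) can be sketched in a few lines of Python. This is a minimal software model under stated assumptions, not the LAMP hardware: binary vectors are stored as non-negative integers with the leftmost coordinate as the most significant bit, the function names are hypothetical, and `compact_left` counts 1s in software, whereas the register of Fig. 2 does the same job in one clock cycle without addition.

```python
def quality_vector(m: int, a: int, n: int) -> int:
    """Hypothetical helper: vector criterion (3), d | mu(m in A) | mu(A in m)."""
    mask = (1 << n) - 1
    both = m & a
    d = m ^ a                        # d(m,A) = m xor A
    mu_m_in_a = a & ~both            # A and not(m and A)
    mu_a_in_m = m & ~both            # m and not(m and A)
    return (d | mu_m_in_a | mu_a_in_m) & mask

def compact_left(v: int, n: int) -> int:
    """Software stand-in for the Fig. 2 register: pack all 1s to the left end."""
    ones = bin(v & ((1 << n) - 1)).count("1")
    return ((1 << ones) - 1) << (n - ones)

def quality_index(v: int, n: int) -> int:
    """Position (1-based, from the left) of the rightmost 1 after compaction."""
    c = compact_left(v, n)
    return n - (c & -c).bit_length() + 1 if c else 0

def choose_better(q1: int, q2: int) -> int:
    """Rule (4): y = OR over bits of (q1 & q2) ^ q1; y = 0 keeps q1, y = 1 takes q2."""
    y = ((q1 & q2) ^ q1) != 0
    return q2 if y else q1

m, a, n = 0b110011001100, 0b000011110101, 12
q1 = quality_vector(m, a, n)
print(format(q1, "012b"), quality_index(q1, n))  # 110000111001 6
```

Applied to compacted vectors, as in the second example, rule (4) amounts to keeping the vector with fewer 1s: `choose_better(compact_left(q1, 12), 0b111111110000)` returns `0b111111000000`, because $(Q_1 \wedge Q_2) \oplus Q_1$ is the zero vector.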
Vector logical criteria of interaction quality for associative sets make it possible to estimate the results of search, pattern recognition and decision-making with high-speed parallel logic operations, which is especially important for critical real-time systems.
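The quality criterion Q uniquely determines three forms of interaction between any two objects in the n-dimensional vector logical space: the distance and two membership functions. Since all three estimates included in the integral criterion are joined by the OR function, the vector interaction simplifies to:

$$
\begin{aligned}
Q &= d(m,A) \vee \mu(m \in A) \vee \mu(A \in m)\\
&= (m \oplus A) \vee (A \wedge \overline{m \wedge A}) \vee (m \wedge \overline{m \wedge A})\\
&= (m \oplus A) \vee [A \wedge (\bar m \vee \bar A)] \vee [m \wedge (\bar m \vee \bar A)]\\
&= (m \oplus A) \vee [A\bar m \vee A\bar A \vee m\bar m \vee m\bar A]\\
&= (\bar A m \vee \bar m A) \vee [A\bar m \vee A\bar A \vee m\bar m \vee m\bar A]\\
&= \bar A m \vee \bar m A \vee A\bar m \vee A\bar A \vee m\bar m \vee m\bar A\\
&= m \oplus A,
\end{aligned}
$$

since $A\bar A = m\bar m = 0$ and the repeated terms are absorbed.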
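Because this derivation is a Boolean identity that holds coordinate-wise, it can also be confirmed by brute force. The Python sketch below (function names hypothetical) encodes n-bit vectors as non-negative integers, enumerates every pair (m, A) and checks that the OR of the three quality vectors from (3) equals $m \oplus A$. It only covers small n; the algebraic derivation above covers the general case.

```python
from itertools import product

def integral_vector(m: int, a: int) -> int:
    """Hypothetical helper: d(m,A) | mu(m in A) | mu(A in m) on non-negative ints."""
    both = m & a
    return (m ^ a) | (a & ~both) | (m & ~both)

def identity_holds(n: int) -> bool:
    """True if the OR-combined criterion equals m xor A for every pair of n-bit vectors."""
    return all(integral_vector(m, a) == (m ^ a)
               for m, a in product(range(1 << n), repeat=2))

print(identity_holds(4))  # True
```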
The quality criterion $Q = m \oplus A$ conforms to the metric for estimating distance or interaction in vector logical space, and it has a trivial computational form for estimating many solutions related to the analysis and synthesis of information. In fact, a logical vector space should not use distance metrics and quality criteria that involve scalar arithmetic operations. The vector logic criterion (VLC) determines not only the distance between disjoint objects of a vector-logical space, but also their mutual membership, $d(a,b) \vee \mu(a,b) \vee \mu(b,a)$, if they overlap.
The vector discrete logical (Boolean) space defines the interaction of objects through three axioms (identity, symmetry and triangle), which form a non-arithmetic B-metric of vector dimension:

$$
B = \begin{cases}
d(a,b) = a \oplus b = (a_i \oplus b_i), \quad i = 1,\ldots,n;\\
d(a,b) = 0 \Leftrightarrow \forall i\,(d_i = 0) \Leftrightarrow a = b;\\
d(a,b) = d(b,a);\\
d(a,b) \oplus d(b,c) = d(a,c),\\
\oplus = \big[d(a,b) \wedge \overline{d(b,c)}\big] \vee \big[\overline{d(a,b)} \wedge d(b,c)\big].
\end{cases}
$$
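The three B-metric axioms can be checked exhaustively for small dimensions. In the Python sketch below (a minimal illustration; the function names are hypothetical), points are n-bit non-negative integers, the distance is the bitwise xor, and the triangle equality is evaluated with the operator written exactly as in the last line of B, $[x \wedge \bar y] \vee [\bar x \wedge y]$.

```python
from itertools import product

def d(a: int, b: int) -> int:
    """Vector distance of the B-metric: one xor bit per coordinate."""
    return a ^ b

def xor_by_definition(x: int, y: int) -> int:
    """Operator of the triangle axiom: (x and not y) or (not x and y)."""
    return (x & ~y) | (~x & y)

def b_metric_holds(n: int) -> bool:
    """Check identity, symmetry and the xor 'triangle' equality for all n-bit points."""
    points = range(1 << n)
    for a, b in product(points, repeat=2):
        if (d(a, b) == 0) != (a == b) or d(a, b) != d(b, a):
            return False
    return all(xor_by_definition(d(a, b), d(b, c)) == d(a, c)
               for a, b, c in product(points, repeat=3))

print(b_metric_holds(3))  # True
```

Unlike the numerical triangle inequality, the last check is an exact equality: the third side is fully determined by the other two.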
The vertices of a transitive triangle are the vectors that identify objects of the n-dimensional Boolean B-space. The sides of the triangle, d(a,b), d(b,c) and d(a,c), are the distances between the vertices. They are also n-dimensional vectors, each bit of which is defined in the same alphabet as the coordinates of the vertex vectors.
The vector transitive triangle is a direct analogue of the numerical distance of the metric M-space, which is determined by a system of axioms defining the interaction of one, two and three points in any space:

$$
M = \begin{cases}
d(a,b) = 0 \Leftrightarrow a = b;\\
d(a,b) = d(b,a);\\
d(a,b) + d(b,c) \geq d(a,c).
\end{cases}
$$
A specific feature of the metric triangle axiom is the numerical (scalar) comparison of the distances among three objects: the interval uncertainty of the statement "the sum of two triangle sides is greater than or equal to the third one" cannot be used to determine the exact length of the third side. This disadvantage can be eliminated only in a logical vector space, which is characterized by a deterministic representation of each state parameter of a process or phenomenon. There, the numerical uncertainty of the third triangle side takes the form of an exact binary vector, which characterizes the distance between two objects and is calculated from the distances given by the other two triangle sides:

$$d(a,b) \oplus d(b,c) = d(a,c).$$
Transposing the right-hand component to the left side makes it possible to compress any closed space to a zero vector:

$$d(a,b) \oplus d(b,c) = d(a,c) \;\Rightarrow\; d(a,b) \oplus d(b,c) \oplus d(a,c) = 0.$$
Convolution of a space to a zero vector is of interest for many practical problems, including: 1) diagnosis and error correction when transmitting information over communication channels; 2) fault detection in digital products based on two-valued and multivalued fault detection tables. The theoretical justification of space convolution is presented below. It includes a proof that the vector Boolean space metric is correct for determining the interaction between logical structures, including a point, a line and a plane.
Axiom 1. For Boolean variables the following logical expression is valid: $a = b \Leftrightarrow a \oplus b = 0$. Indeed, the modulo-2 sum is the nonequivalence function, which is true (equal to 1) when the variable values do not coincide, $a \neq b \Leftrightarrow a \oplus b = 1$, and equal to zero when the arguments coincide.
Definition 1. The distance between two points of an n-dimensional space is a vector computed as the XOR-sum of the same-name coordinates:

$$d(a,b) = (a_i \oplus b_i), \quad i = 1,\ldots,n.$$
Definition 2. The vector distance between two points (objects) of an n-dimensional logical space is zero (one) if all components of the vector are zero (one):

$$d(a,b) = 0\,(1) \Leftrightarrow \forall i\,\big[a_i \oplus b_i = 0\,(1)\big], \quad i = 1,\ldots,n.$$
Definition 3. A simple chain is a sequence of vector distances and points in a space that contains no equal components (distances or points).
Definition 4. A vector logical cycle D is a set of vector distances $d_i \in D$ between points of a space that form a closed simple chain whose first and last points coincide.
Theorem 1. The xor-sum of the vector distances forming a cycle between two points in an n-dimensional space is equal to zero: $d(a,b) \oplus d(b,a) = 0$.
Proof. By the axiom of symmetry (commutativity), the vector distances between any two points in an n-dimensional space are equal: $d(a,b) = d(b,a)$. According to Axiom 1, transposing the right side of this equality to the left side replaces the equality relation with the xor operation, whose result is zero, $d(a,b) \oplus d(b,a) = 0$, because the vector distances are equal.
Theorem 2. The xor-sum of the vector distances of two sides of a transitive triangle is equal to the third side.
Proof. In general, the metric defines the interaction of three points (a, b, c) in the space through three distances (the triangle sides) d(a,b), d(b,c), d(a,c). Consider two distances d(a,b) and d(b,c) and check whether they satisfy the equality $d(a,b) \oplus d(b,c) = 0$ of Theorem 1. Three variants of point interaction are possible:
1) $d(a,b) = d(b,c) = 0$: there is one point, denoted in this case by the identifiers a, b, c, and the distance between them is zero;
2) $d(a,b) = d(b,c) \Rightarrow d(a,b) \oplus d(b,c) = 0$: there are two points, $\{b,\ a = c\}$, which form two identical distances creating a cycle, in accordance with the principle of symmetry (commutativity);
3) $d(a,b) \neq d(b,c) \Rightarrow d(a,b) \oplus d(b,c) \neq 0 \Rightarrow d(a,b) \oplus d(b,c) = d(a,c)$: there are two unequal distances, which is possible only when three points a, b, c interact, and the third distance is determined by the vector operation $d(a,b) \oplus d(b,c) = d(a,c)$. In this case the vector on the right-hand side is nonzero, because $d(a,b) \neq d(b,c)$, and, since the three points are distinct, it differs from each term on the left-hand side. Thus the relationship of any three points in a vector logical space reduces to the formal interaction given by the equality $d(a,b) \oplus d(b,c) = d(a,c)$, which in degenerate cases also covers the interaction of two points and of a single point with itself.
Theorem 3. The xor-sum of the vector distances in a transitive triangle is equal to zero:

$$d(a,b) \oplus d(b,c) = d(a,c) \;\Rightarrow\; d(a,b) \oplus d(b,c) \oplus d(a,c) = 0.$$
The proof follows from applying Axiom 1 to the transitive closure expression obtained in Theorem 2.
Theorem 4. The xor-sum $\bigoplus_{i=1}^{n} d_i = 0$ of the vector distances $d_i \in D$ in a cycle D defined by a finite number of nodes (n) is equal to zero:
1) $d_1 = 0$;
2) $d_1 \oplus d_2 = 0$;
3) $d_1 \oplus d_2 \oplus d_3 = 0$;
4) $d_1 \oplus d_2 \oplus d_3 \oplus d_4 = 0$;
5) $d_1 \oplus d_2 \oplus \ldots \oplus d_i \oplus \ldots \oplus d_n = 0$.
The proof is based on applying Theorems 1-3 to the distances between points (nodes) that form closed cycles. The first case is the transitive closure distance of a point on itself; the second, the distance between two transitively closed points; the third, the distances among three points of a space; the fourth, among four points. The fifth case generalizes the zero xor-sum of distances in the transitive closure of any number of points in an n-dimensional vector space.
Consequence 1. The xor-sum of any binary codes of the same length is equal to zero if they form a cycle.
Consequence 2. The metric p of a vector logic space is defined by a single equality that forms a zero xor-sum of the distances between a nonzero, finite number of points closed in a cycle:

$$p = \bigoplus_{i=1}^{n} d_i = 0.$$
Definition 5. Cyber Space is a vector logic space specified by the p-metric, in which the xor-sum of the distances between a finite number of cycle points is equal to the zero vector.
The metric p of a vector logical space focuses not on the elements of a set but on their relationships, which reduces the axioms from three to one formula and extends it to arbitrarily complex structures of an n-dimensional space.
Example. Consider five points of a vector space: (000111, 111000, 101010, 010101, 110011). Closing these points in a cycle gives the following side distances of the pentagon: (111111, 010010, 111111, 100110, 110100). The coordinatewise xor-sum of all these vectors gives (000000). The practical significance of this fact is that any distance of a closed cycle can be determined if the other (n-1) sides of the figure are known. For a triangle, this means that the third side can be determined from the two known ones: in a closed triangular logical space, 66% of the data (two of the three distances) is enough to generate all distances.
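The pentagon example, and the recovery of one side from the other (n-1), can be reproduced with the short Python sketch below. It is a minimal illustration: the points are the six-bit codes from the example, encoded as integers, and the helper names `cycle_sides` and `missing_side` are hypothetical.

```python
from functools import reduce
from operator import xor

def cycle_sides(points: list[int]) -> list[int]:
    """Side distances of the closed cycle p0 -> p1 -> ... -> p(k-1) -> p0."""
    k = len(points)
    return [points[i] ^ points[(i + 1) % k] for i in range(k)]

def missing_side(known_sides: list[int]) -> int:
    """Recover the single unknown side of a closed cycle from the other k-1 sides,
    using the fact that the xor-sum of all sides is the zero vector."""
    return reduce(xor, known_sides, 0)

points = [int(s, 2) for s in ("000111", "111000", "101010", "010101", "110011")]
sides = cycle_sides(points)
print([format(s, "06b") for s in sides])
# ['111111', '010010', '111111', '100110', '110100']
print(format(reduce(xor, sides, 0), "06b"))     # 000000
print(format(missing_side(sides[:-1]), "06b"))  # 110100
```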
III. Process Model For Searching, Recognition And Decision Making
The quality metrics represented in (3) make it possible to evaluate the proximity between spatial objects as well as the interaction of vector spaces. As a practical example of the usefulness of the integral quality criterion, consider firing at a target, illustrated by the diagrams of vector interaction described above (see Fig. 1):
1) A shell hit right on a target;
2) The target is hit by a shell of unreasonably large caliber;
3) The shell caliber is not enough to hit a large target;
4) Inefficient and inaccurate shot by large-caliber shell;
5) The shell flew past the target.
A process model for the interaction P(m, A) corresponds to the integral quality criterion, which evaluates not only a hit or a miss but also the efficiency of using the shell caliber.
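The five outcomes listed above correspond to the five set-theoretic interaction types of Fig. 1. As a minimal sketch, assuming that m encodes the shell and A the target as ternary vectors over {0, 1, x}, and that coordinates intersect by the cube-calculus rules x ∩ a = a and 0 ∩ 1 = ∅, the hypothetical function `interaction_type` below returns the outcome number. The direction of containment in cases 2 and 3 follows from that shell/target assumption.

```python
def meet(a: str, b: str):
    """Hypothetical helper: intersection of two symbols of {0, 1, x}; None if empty."""
    if a == "x":
        return b
    if b == "x" or a == b:
        return a
    return None

def interaction_type(m: str, A: str) -> int:
    """Return 1..5 following the five outcomes above (m = shell, A = target)."""
    if len(m) != len(A):
        raise ValueError("m and A must have the same dimension")
    cut = [meet(x, y) for x, y in zip(m, A)]
    if any(c is None for c in cut):
        return 5                  # empty intersection: the shell missed
    cut = "".join(cut)
    if cut == m and cut == A:
        return 1                  # m == A: direct hit
    if cut == A:
        return 2                  # A strictly inside m: caliber too large
    if cut == m:
        return 3                  # m strictly inside A: caliber too small
    return 4                      # partial overlap: inaccurate, inefficient shot

print([interaction_type(m, "1x0") for m in ("1x0", "xx0", "110", "1xx", "0x0")])
```

The integral criterion (1) then grades outcomes 2 to 4 by how much of each cube the intersection covers, rather than only separating hits from misses.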
The analytical form of a generalized process model for choosing the best interaction between the input query m and the system of logic associative relations is presented in the following form:
$$
\begin{aligned}
P(m,A) &= \min_{i=1,\ldots,n} Q_i(m \,\Delta\, A_i) : \bigvee_{j=1,\, j\neq i}^{n} \big[(Q_i \wedge Q_j) \oplus Q_i\big] = 0;\\
Q(m,A) &= (Q_1, Q_2, \ldots, Q_i, \ldots, Q_n);\\
A &= (A_1, A_2, \ldots, A_i, \ldots, A_n);\\
\Delta &\in \{\mathrm{and}, \mathrm{or}, \mathrm{xor}, \mathrm{not}, \mathrm{slc}, \mathrm{nop}\};\\
A_i &= (A_{i1}, A_{i2}, \ldots, A_{ij}, \ldots, A_{is});\\
A_{ij} &= (A_{ij1}, A_{ij2}, \ldots, A_{ijr}, \ldots, A_{ijq});\quad m = (m_1, m_2, \ldots, m_r, \ldots, m_q);\\
Q_i &= d(m,A_i) \vee \mu(m \in A_i) \vee \mu(A_i \in m);\quad d(m,A_i) = m \oplus A_i;\\
\mu(m \in A_i) &= A_i \wedge \overline{m \wedge A_i};\quad \mu(A_i \in m) = m \wedge \overline{m \wedge A_i}.
\end{aligned}
\tag{5}
$$

Comment: 1) The functional P(m,A) specifies the analytical model of the computational process as a statement minimizing the integral quality criterion. 2) Data structures are presented as node-tables of the graph $A = (A_1, A_2, \ldots, A_i, \ldots, A_m)$, which interact logically with each other. 3) A graph node is described by the ordered set of vector-rows of an associative table $A_i = (A_{i1}, A_{i2}, \ldots, A_{ij}, \ldots, A_{is})$ of explicit solutions, where each row $A_{ij} = (A_{ij1}, A_{ij2}, \ldots, A_{ijr}, \ldots, A_{ijq})$ is a true proposition. Since a functional presented in tabular form has no fixed input and output variables, this structure differs from the sequential von Neumann machine defined by Mealy and Moore finite automata. The equivalence of all variables in the vector $A_{ij}$ makes the problem solution invariant with respect to direct and inverse implication in the space $A_i \in A$. The associative vector $A_{ij}$ is an explicit solution in which each variable is defined in a finite, multivalued, discrete alphabet, $A_{ijr} \in \{a_1, a_2, \ldots, a_i, \ldots, a_k\}$. The interaction P(m,A) between the input vector-query $m = (m_1, m_2, \ldots, m_r, \ldots, m_q)$ and the graph $A = (A_1, A_2, \ldots, A_i, \ldots, A_m)$ generates a set of solutions and makes it possible to choose the best ones by the minimum of the quality criterion:

$$P(m,A) = \min Q_i\big[m \wedge (A_1 \vee A_2 \vee \ldots \vee A_i \vee \ldots \vee A_m)\big].$$

The concrete interaction between the graph nodes generates the functionality $A = (A_1, A_2, \ldots, A_i, \ldots, A_m)$, which can be realized by the following structures: 1) A single associative table that contains all solutions of a logic problem explicitly. The advantage is the maximum speed of parallel associative search for a solution in the table; the disadvantage is the highest hardware complexity of memory allocation for a large-scale table. 2) A tree (graph) structure of binary relations between functional primitives, each of which generates a truth table for a small number of variables. The advantage is the smallest hardware complexity of problem solving; the disadvantage is the minimum speed of sequential associative search for a solution in the tree. 3) A compromise graph structure of relations between primitives that are logically understandable to the user, each of which generates a truth table for strongly logically connected variables. The advantage is the high speed of concurrent associative search for solutions over a minimal number of graph tables, together with relatively low hardware complexity; the disadvantage is a decrease in speed due to sequential logic processing of the graph structure for the explicit solutions found in the tables.
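The selection condition in (5) can be sketched for a single associative table of binary rows. This is a minimal illustration under stated assumptions: rows and the query are n-bit non-negative integers (no x values), the function names are hypothetical, and the `None` result for the case where no single $Q_i$ is bitwise contained in every other $Q_j$ is an assumption of this sketch, since (5) does not say what happens then.

```python
def quality(m: int, a: int) -> int:
    """Hypothetical helper: Q_i = d(m,A_i) | mu(m in A_i) | mu(A_i in m)."""
    both = m & a
    return (m ^ a) | (a & ~both) | (m & ~both)

def best_associator(m: int, table: list[int]):
    """Index i whose Q_i satisfies OR over bits of (Q_i & Q_j) ^ Q_i = 0 for all j != i,
    i.e. Q_i is contained in every other Q_j; None if no such row exists (assumption)."""
    qs = [quality(m, a) for a in table]
    for i, qi in enumerate(qs):
        if all(((qi & qj) ^ qi) == 0 for j, qj in enumerate(qs) if j != i):
            return i
    return None

print(best_associator(0b1100, [0b1000, 0b1100, 0b0011]))  # 1 (exact match, Q = 0)
```

Compacting each $Q_i$ first, as in the example for rule (4), would turn this test into choosing the row with the fewest 1s, at the cost of losing which variables cause the mismatch.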
Partitioning a single table (associative memory) into k parts reduces the hardware cost, expressed in components (LUTs, Look-Up Tables) of a programmable logic array [8,9]. Each memory cell is created by 4 LUTs. Taking into account that the associative matrix can be represented by a square of side n, the total memory hardware cost Z(n) for storing data and the time T(n) for analyzing the logic associative graph depend on the number k of table partitions, that is, the number of nodes:

$$
\begin{aligned}
Z(n) &= k \times \frac{1}{4} \times \left(\frac{n}{k}\right)^2 + h = \frac{n^2}{4 \times k} + h, \quad h = \{n, \mathrm{const}\};\\
T(n) &= k \times \frac{4}{t_{clk}} + \frac{4}{t_{clk}} = \frac{4}{t_{clk}}\,(k + 1), \quad t_{clk} = \mathrm{const}.
\end{aligned}
\tag{6}
$$
Here h is the cost of the general control circuit of the associative memory system. The consequence of reducing hardware is a lower speed of processing the memory structure, that is, a longer time for analyzing the system components. Processing a single associative memory takes a cycle of 4 clock pulses. In the worst case of serial connection of the memories, the number of cycles increases in proportion to the number of partitions k. The summand $4/t_{clk}$ determines the time needed to prepare data at the system input and to decode them at the output of the computer structure. The functional dependences of the hardware cost and of the time for analyzing a graph of associative memory on the number of nodes or partitions are presented in Fig. 4.
Fig. 4. The dependences of hardware and time on the number of nodes
The generalized efficiency function of the graph structure with respect to the number of nodes,

$$
f[Z(n),T(n)] = Z(n) + T(n) = \left(\frac{n^2}{4 \times k} + h\right) + \frac{4}{t_{clk}}\,(k + 1),
\tag{7}
$$

allows determining the optimal partitioning for a given total volume of associative memory [6]. In the case shown in Fig. 4, the best partitioning is the minimum of the additive function, determined by the value of k at which the derivative of the function vanishes:

$$n \times n = 600 \times 600, \quad h = 200, \quad t_{clk} = 4, \quad k = 4.$$
The proposed process model for analyzing the graph of associative tables, together with the introduced solution quality criteria, forms the basis for developing a dedicated multiprocessor architecture focused on concurrent vector logic operations.
IV. Architecture Of Logic Associative Multiprocessor
To analyze large volumes of logical data, there are several technologies focused on practical application: 1. Using a workstation for serial programming, where the cost and time of problem solving are very high. 2. Developing a dedicated concurrent processor based on a PLD. The high concurrency of information processing compensates for the relatively low clock rate compared with a CPU, and such a reprogrammable circuit design is the best solution in terms of performance. The disadvantages are the lack of flexibility of software methods for solving logic problems and the high cost of implementing the system-on-a-chip in a PLD for large production volumes. 3. The best solution is to leverage the advantages of CPU, PLD and ASIC concurrently [8,9]. This brings flexibility of programming, the possibility of correcting the source code, a minimal command set, simple circuit designs for the hardware multiprocessor implementation, and parallelization of logic procedures across the structure of bit processors. Implementing the multiprocessor in an ASIC gives the maximum clock rate, the minimum chip cost for large product volumes, and low power consumption. Combining the advantages of the above technologies determines the basic configuration of LAMP, which has a spherical multiprocessor structure (Fig. 5) consisting of 16 vector sequencers. Each sequencer, together with the boundary elements, is connected to eight neighboring ones. The processor PRUS [9], developed by Dr. Stanley Hyduke (CEO, Aldec, USA), is the prototype of LAMP.
Fig. 5. LAMP macroarchitecture and interface
Information is entered into the processor following the classical design flow, except that the "place and route" stage is replaced by the operation of distributing software modules and data among all logical bit processors running concurrently. The compiler places data among the processors, sets the time of searching for solutions at the output of each of them, and plans the transfer of results to other processors. LAMP is an effective processor network that processes data and exchanges information between network components when searching for a solution. Thanks to the simple circuit design of each processor, very large arrays with millions of bits of information can be processed effectively in hundreds of times less time than on a general-purpose processor. The basic cell (a vector processor of LAMP) can be synthesized with 200 gates, which makes it possible to implement a network of 4096 computers in an ASIC using advanced silicon technology. Since the memory costs for data storage are very small, LAMP may be applied to the design of control systems in areas of human activity such as industry, medicine, information protection, geology, weather forecasting, artificial intelligence and space science. LAMP is of particular interest for digital data processing, pattern recognition and cryptanalysis. Its main purpose is to obtain a quasi-optimal solution of the integrated problem of search and/or pattern recognition using infrastructure components focused on performing vector logical operations:

$$P(m,A) = \min_{i=1,\ldots,n} Q_i(m \,\Delta\, A_i), \quad m = \{m_a, m_b, m_c, m_d\}.$$
The system interface corresponding to this functional is presented in Fig. 5. All components $\{A, m_a, m_b, m_c, m_d\}$ can serve as inputs and outputs. The bidirectional interface specification follows from the invariance of the relation with respect to all variables, vectors, the A-matrix, components, and infrastructure inputs and/or outputs. Therefore, the structural model of LAMP can be used to solve any problem of direct and inverse implication in discrete logical space, which distinguishes it from the automaton-model concept of a computer with explicit inputs and outputs. The components or registers $m = (m_a, m_b, m_c, m_d)$ serve as buffer, input and output vectors for the solution, as well as for identifying the quality estimate of request execution. One variant of the LAMP multiprocessor architecture is the structure shown in Fig. 6. Its main component is a multiprocessor matrix $P = [P_{ij}]$, $\mathrm{card}(4 \times 4)$, containing 16 vector processors, each designed to perform 5 logic vector operations on the contents of a data memory described by a table of dimension $A = \mathrm{card}(m \times n)$.
Fig. 6. LAMP architecture and sequencer structure
R&I, 2009, №4
The interface is used for data exchange and for loading data and commands into the appropriate memories. The control unit initiates the execution of logical data-processing commands and synchronizes the operation of all components of the multiprocessor. The Infrastructure IP [1] services all modules, diagnoses faults and repairs the functionality of individual components and of the device as a whole. The elementary logic associative processor, or sequencer (see Fig. 6), is part of the multiprocessor and contains: a logical processor (LP); an associative (memory) A-matrix for concurrent execution of the basic operations; a block of m-vectors, designed for concurrent processing of the rows and columns of the A-matrix and for data exchange during computation; a direct access memory (CM) for storing the commands of the data-processing software; an automaton (CU) that controls the execution of logic operations; and an interface (I) connecting the sequencer to the other elements of the multiprocessor. The logical processor (LP) (Fig. 7) implements five operations (and, or, not, xor and slc, shift left with bit crowding), which form the basis for creating algorithms and procedures of information retrieval and solution evaluation. The LP module has an input multiplexer that selects one of five operands, which is passed to the selected logic vector operator. The result is then written, via a multiplexer (OR element), into one of four operand registers selected by the appropriate address.
Fig. 7. Structure of logic calculations
A distinctive feature of the logical processor implementation is the use of three binary operations (and, or, xor) and two unary ones (not, slc). The unary operations can be added to the processing cycle of register data by selecting one of three operations (not, slc, nop, i.e. no operation). To improve the efficiency of the logic unit, two elements performing the empty operation are included. If only a unary operation needs to be performed, nop must be selected at the level of binary commands, which effectively passes the data through a follower to the second level, that of unary operations. All LP operations are either register or register-matrix operations. The latter are designed for analyzing the vector rows of a table, using the input m-vector as a query for exact information retrieval. The following combinations of operators and operands are acceptable in the logic calculation unit:
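$$C = \begin{cases} \{m_a, m_b, m_c, m_d\}\,\Delta\,A_i;\\ \{m_a, m_b, m_c, m_d\}\,\Delta\,\{m_a, m_b, m_c, m_d\};\\ \{\mathrm{not}, \mathrm{nop}, \mathrm{slc}\}\,\{m_a, m_b, m_c, m_d, A_i\}; \end{cases} \qquad \Delta = \{\mathrm{and}, \mathrm{or}, \mathrm{xor}\}.$$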
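To make the operator/operand combinations above concrete, here is a minimal Python sketch of one LP cycle. It is an illustration under stated assumptions, not the authors' Verilog design: the register width `WIDTH`, the function `lp_step` and the operator tables are hypothetical, and `slc` is modelled as a plain one-bit shift left because the exact "bit crowding" semantics are not specified here. Registers are Python integers used as bit vectors; the first level applies a binary operator from Δ (or nop) and the second level a unary operator from {not, nop, slc}.

```python
# Minimal software model of one LP cycle on WIDTH-bit registers.
# WIDTH, lp_step and the slc semantics are illustrative assumptions.
WIDTH = 16
MASK = (1 << WIDTH) - 1

# First level: binary operators (nop passes the first operand through).
BINARY = {
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
    "nop": lambda x, y: x,
}

# Second level: unary operators applied to the first-level result.
UNARY = {
    "not": lambda x: ~x & MASK,
    # Assumption: slc modelled as a one-bit logical shift left within WIDTH bits.
    "slc": lambda x: (x << 1) & MASK,
    "nop": lambda x: x,
}

OPERANDS = ("ma", "mb", "mc", "md", "A")   # five selectable inputs (A = current row A_i)
DESTINATIONS = ("ma", "mb", "mc", "md")    # result goes to one of four registers


def lp_step(regs: dict, src1: str, src2: str, binop: str, unop: str, dst: str) -> int:
    """Select two operands, apply a binary then a unary operator, write to dst."""
    if src1 not in OPERANDS or src2 not in OPERANDS or dst not in DESTINATIONS:
        raise ValueError("invalid operand or destination")
    result = UNARY[unop](BINARY[binop](regs[src1], regs[src2]))
    regs[dst] = result
    return result
```

For example, `lp_step(regs, "A", "A", "nop", "not", "md")` performs only a unary operation on the current table row, which corresponds to selecting nop at the binary level as described above.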
Implementing all the vector logic operations of a single sequencer in Verilog and then mapping the design onto a PLD chip gives the following results:
Logic Block Utilization:
  Number of 4-input LUTs: 400 out of 9,312 (4%)
Logic Distribution:
  Number of occupied Slices: 200 out of 4,656 (4%)
  Number of Slices containing only related logic: 200 out of 200 (100%)
  Total Number of 4-input LUTs: 400 out of 9,312 (4%)
  Number of bonded IOBs: 88 out of 320 (29%)
Total equivalent gate count for design: 2400
The clock rate of register operations on a Xilinx Virtex-4 is 100 MHz, which provides performance an order of magnitude higher than that of similar procedures on a computer with a 1 GHz clock rate.
V. Infrastructure for Vector Logic Analysis
Infrastructure is a set of models, methods, data definition languages, and data analysis and synthesis tools for solving functional problems. A model (system model) is a set of interrelated components, defined in space and time, that describes a process or phenomenon with specified adequacy and is used to achieve a goal under given constraints and a metric for evaluating the quality of the solution. Here the constraints are the hardware costs and the time-to-market, which have to be minimized. The metric for evaluating a solution obtained with the model is defined by a binary logic vector in the discrete Boolean space. The conceptual computer model is represented by an aggregate of control and operational automata. The LAMP system functionality model uses the GALS (Globally Asynchronous, Locally Synchronous) technology [8] for creating hierarchical digital systems with local synchronization of individual modules and global asynchrony of the entire device.
To detail the structure of the vector processor and the sequencer, the analytical and structural process models are presented below. They reduce to analyzing the A-matrix by columns or by rows. The first model, shown in Fig. 8, is designed for determining the set of feasible solutions with respect to the input query mb.
$$m_{ai} = \vee\,[(m_b \wedge A_i) \oplus m_b]; \qquad A_i = (m_b \wedge A_i), \quad i = 1,\dots,n.$$
Fig. 8. Searching all feasible solutions
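The Fig. 8 equations can be sketched in Python with integers as bit vectors. This is a literal, hedged transcription and the function names are hypothetical: each row costs one vector operation plus an OR-devectorization, and, read as written, the first equation sets m_ai = 0 exactly when row A_i contains every 1-bit of the query m_b. The second function applies the optional masking A_i = (m_b ∧ A_i).

```python
def feasible_vector(table: list[int], m_b: int) -> list[int]:
    """m_ai = OR-devectorization of ((m_b AND A_i) XOR m_b), one row per cycle."""
    m_a = []
    for a_i in table:
        residue = (m_b & a_i) ^ m_b        # query bits of m_b that row A_i lacks
        m_a.append(1 if residue else 0)    # devectorization: OR over all bits
    return m_a


def mask_table(table: list[int], m_b: int) -> list[int]:
    """Optional table modification A_i = (m_b AND A_i), i = 1..n."""
    return [m_b & a_i for a_i in table]
```

For instance, with the rows 0b1110, 0b0011, 0b1010 and the query m_b = 0b1010, `feasible_vector` returns [0, 1, 0]: only the second row lacks a bit of the query.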
The second structure (Fig. 9) searches for the optimal solution within the set of solutions found by the first process model, by analyzing rows. In addition, the second model has a separate application of its own, focused on finding single-valued and multi-valued solutions, for example, when searching for faults in digital systems-on-chips.
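$$m_b = \Bigl(\bigwedge_{\forall i:\, m_{ai}=1} A_i\Bigr) \wedge \overline{\Bigl(\bigvee_{\forall i:\, m_{ai}=0} A_i\Bigr)};$$
$$m_m = \Bigl(\bigvee_{\forall i:\, m_{ai}=1} A_i\Bigr) \wedge \overline{\Bigl(\bigvee_{\forall i:\, m_{ai}=0} A_i\Bigr)}.$$
Fig. 9. Structure for searching the optimal solution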
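A minimal sketch of the Fig. 9 row analysis follows, assuming the usual fault-detection-table layout: row A_i is a test, bit j is set when that test detects fault j, and m_ai = 1 marks a test whose output response differed from the reference. The function name `diagnose` and the variables `acc1` (register A1) and `acc0` (register A0) are illustrative. The single mode follows the text (A1 = 1, A0 = 0 a priori); for the multiple mode, where AND is replaced by OR, we assume A1 starts at zero, since an all-ones start would make the OR accumulation trivial.

```python
def diagnose(table: list[int], m_a: list[int], width: int, multiple: bool = False) -> int:
    """Fault-table analysis: rows A_i are tests, bit j of A_i marks fault j."""
    mask = (1 << width) - 1
    # Register A1: a priori all ones for AND accumulation (single mode).
    # Assumption: for OR accumulation (multiple mode) A1 starts at zero.
    acc1 = 0 if multiple else mask
    acc0 = 0                             # register A0: a priori zero
    for a_i, bit in zip(table, m_a):
        if bit:                          # row with m_ai = 1
            acc1 = (acc1 | a_i) if multiple else (acc1 & a_i)
        else:                            # row with m_ai = 0
            acc0 |= a_i
    return acc1 & ~acc0 & mask           # conjunction of A1 with the inversion of A0
```

With the rows 0b0011, 0b0110, 0b1000 and the response m_a = [1, 1, 0], the single mode returns 0b0010 (only fault 1 explains both failing tests and is not detected by the passing test), while the multiple mode returns 0b0111.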
All operations in the two process models are vector operations. The process model for analyzing rows (see Fig. 8) generates the vector ma, which identifies feasible (mai = 1) or contradictory (mai = 0) solutions with respect to the input condition mb, in n cycles of processing all m-bit vectors of the table A = card(m × n). The quality (validity) of the decision is determined for each interaction between the input vector mb and a row Ai ∈ A by the disjunction (devectorization) block. If it is necessary to remove from the A-table all coordinates and vectors that are insignificant for the solution, marked by unit values of the vector ma, the matrix A can be modified by intersecting it with the input vector on the basis of the operation Ai = (mb ∧ Ai), i = 1, …, n. The row-analysis structure of Fig. 9 gives an interesting solution to diagnosis problems, which should be interpreted as follows. After the diagnostic experiment is performed, the binary output response vector ma is formed; it masks the A-table of faults to detect single or multiple faults. Vectors mb and mc are used to accumulate the results of the conjunction and disjunction operations. Then the contents of the second vector mc are logically subtracted from the first register mb, and the result is saved in register md. To implement the second equation, which generates a multiple solution, the AND element is replaced by the OR function. The circuit also has a variable for choosing the solution search mode: single or multiple. As its input condition the process model uses the vector ma, which controls the choice of the vector operation (AND or OR) for processing the unit rows Ai (mai = 1) ∈ A or the zero rows Ai (mai = 0) ∈ A of the A-table. After n cycles, the unit and zero solutions with respect to the coordinate values of the vector ma are accumulated in the registers A1 and A0, respectively. A priori, vectors of 1's and 0's are entered in these registers: A1 = 1, A0 = 0. After all n rows of the A-table have been processed in n cycles, the conjunction of the contents of register A1 with the inversion of register A0 is performed, which generates the result in the form of the vector mb, whose unit coordinates determine the solution. When a fault table of a digital device is analyzed, the unit coordinates of the vector mb correspond to the columns identified with the numbers of faults or of faulty blocks to be repaired. Within the bounds of the Infrastructure IP,
the problem of optimal repair can be solved by using a universal structure for vector logic analysis. All faults found in the cells of, for example, a memory must be covered by a minimum number of spare rows and/or columns. In this case the technological and mathematical culture of vector logic provides a simple and interesting circuit solution for obtaining a quasioptimal coverage (Fig. 10). Its advantages are: 1) the computational complexity of the procedure is Z = n vector operations, equal to the number of table rows; 2) the hardware costs are minimal: a table and two vectors, mb and ma, for storing intermediate coverages and accumulating the result in the form of unit coordinates corresponding to the table rows that make up a quasioptimal coverage; 3) there is no need for the classical splitting of the coverage problem into searching for a coverage core and its complement; 4) there is no need for complicated procedures for manipulating rows and columns. The disadvantage is that the coverage obtained is not always optimal, which is the price paid for the efficiency of the vector procedure shown in Fig. 10.
$$m_b = (m_b \vee A_i); \qquad m_{ai} = \vee\,[(m_b \vee A_i) \wedge \overline{m_b}], \quad i = 1,\dots,n.$$
Fig. 10. Process model for searching quasioptimal coverage

At the last stage, the devectorization operation transforms the vector result into a bit mai of the vector ma by the OR function:
$$m_{ai} = \vee\,[(m_b \vee A_i) \wedge \overline{m_b}].$$
In general, in the algebra of vector operations the devectorization operation is written in the notation <binary operation><vector>: $\vee A_i$, $\wedge m$, $\wedge(m \vee A_i)$. The inverse operation, vectorization, is the concatenation of Boolean variables: ma(a,b,c,d,e,f,g,h). In the coverage search process, the vectors mb = 0 and ma = 0 are nulled a priori. The quasioptimal coverage is accumulated in the vector ma over n cycles by serial shifting. The bits entered in the register ma are formed by the OR circuit, which realizes devectorization by checking the result $[(m_b \vee A_i) \wedge \overline{m_b}]$ for the presence of 1's. The next example shows a functionally complete diagnosis cycle, in which the diagnostic information is used to repair faulty memory cells after the quasioptimal coverage has been obtained [9]. The size of the memory module (13 × 15 cells) does not affect the computational complexity of obtaining a coverage of ten faulty cells by spare rows (2) and columns (5) (Fig. 11).
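The Fig. 10 procedure can be sketched as follows, reading the devectorized term as (m_b ∨ A_i) ∧ ¬m_b, i.e. the bits that row A_i adds to the coverage accumulated so far. The data are those of the Fig. 11 example (faulty cells and candidate spares, in the table order given in the text); `table_row` and `quasioptimal_cover` are hypothetical names. Each table row costs a constant number of vector operations, so the complexity is Z = n as stated above.

```python
# Fault coordinates (row, column) from the Fig. 11 example.
FAULTS = [(2, 2), (2, 5), (2, 8), (4, 3), (5, 5), (5, 8), (7, 2), (8, 5), (9, 3), (9, 7)]
# Candidate spares, in the row order of the coverage table.
SPARES = ["C2", "C3", "C5", "C7", "C8", "R2", "R4", "R5", "R7", "R8", "R9"]


def table_row(spare: str) -> int:
    """Bit j is set when the spare repairs fault j (column j of the A-table)."""
    kind, index = spare[0], int(spare[1:])
    row = 0
    for j, (r, c) in enumerate(FAULTS):
        if (kind == "R" and r == index) or (kind == "C" and c == index):
            row |= 1 << j
    return row


def quasioptimal_cover(table: list[int]) -> list[int]:
    """Sequential coverage: one pass over the rows, m_b and m_a nulled a priori."""
    m_b = 0          # accumulated coverage
    m_a = []         # one selection bit per table row
    for a_i in table:
        new_bits = (m_b | a_i) & ~m_b       # faults that row A_i covers for the first time
        m_a.append(1 if new_bits else 0)    # devectorization by OR
        m_b |= a_i                          # m_b = (m_b OR A_i)
    return m_a
```

Running it on the Fig. 11 table reproduces m_a = 11111000000, i.e. the coverage {C2, C3, C5, C7, C8}. Because rows are taken in table order, a different row order can yield a different and possibly larger coverage, which is the quasi-optimality noted above.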
Fig. 11. Memory module with spares and coverage table

To solve the optimization problem, a coverage table for the faulty cells is generated (see Fig. 11). Its rows are the spares that can cover the faults:
(C2, C3, C5, C7, C8, R2, R4, R5, R7, R8, R9).
Its columns are the faulty cells to be repaired:
(F2,2, F2,5, F2,8, F4,3, F5,5, F5,8, F7,2, F8,5, F9,3, F9,7).
Thus the columns match the coordinates of the faulty cells, and the rows identify the spare components (rows and columns) that can repair those coordinates. The process model (Fig. 10) yields the optimal solution in the form ma = 11111000000, which corresponds to the coverage R = {C2, C3, C5, C7, C8}. It is one of three possible minimum solutions for this fault detection table:
R = {C2, C3, C5, C7, C8} ∨ {C2, C3, C5, C8, R9} ∨ {C2, C5, C8, R4, R9}.
The technological model for embedded diagnosis and repair of memory is shown in Fig. 12.
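The claim that there are exactly three minimum solutions can be checked with a brute-force search. This is a swapped-in verification technique, not the vector procedure proposed in the paper: it enumerates spare subsets by increasing size, so its cost grows exponentially and it only suits tiny tables such as this one. The budget of 2 spare rows and 5 spare columns is taken from the text; the helper names `repairs` and `minimum_covers` are hypothetical.

```python
from itertools import combinations

# Data from the Fig. 11 example: faulty cells (row, column) and candidate spares.
FAULTS = [(2, 2), (2, 5), (2, 8), (4, 3), (5, 5), (5, 8), (7, 2), (8, 5), (9, 3), (9, 7)]
SPARES = ["C2", "C3", "C5", "C7", "C8", "R2", "R4", "R5", "R7", "R8", "R9"]


def repairs(spare: str, fault: tuple[int, int]) -> bool:
    """A spare row Rk repairs faults in row k; a spare column Ck repairs column k."""
    kind, index = spare[0], int(spare[1:])
    row, col = fault
    return (row if kind == "R" else col) == index


def minimum_covers(max_rows: int = 2, max_cols: int = 5):
    """Exhaustively return (size, covers) for the smallest covers within the spare budget."""
    for size in range(1, len(SPARES) + 1):
        covers = []
        for combo in combinations(SPARES, size):
            n_rows = sum(1 for s in combo if s.startswith("R"))
            if n_rows > max_rows or size - n_rows > max_cols:
                continue   # exceeds the number of available spare rows or columns
            if all(any(repairs(s, f) for s in combo) for f in FAULTS):
                covers.append(set(combo))
        if covers:
            return size, covers
    return None, []
```

For the Fig. 11 data the search returns size 5 and exactly the three coverages listed above.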
Fig. 12. Model for embedded testing and repair of memory

The model includes four components: 1) testing a module (UUT, Unit Under Test) against the reference model (MUT, Model Under Test) to generate the output response vector ma, whose dimension equals the number of test patterns; 2) fault diagnosis based on analysis of the fault detection table A; 3) optimization of the fault coverage by spares (rows and columns) based on analysis of the table A; 4) repair of the memory by readdressing, in the address decoder (AD), the faulty rows and columns given by the vector ma to spare components (SM, Spare Memory) [9].
The process model for embedded servicing operates in real time and makes it possible to maintain a digital system-on-a-chip without human intervention, which is an interesting solution for critical technologies related to the remote maintenance of a product. The proposed process model for analyzing associative tables, together with the introduced quality criteria for logical solutions, makes it possible to solve the problems of quasi-optimal coverage and of diagnosing faults in software and/or hardware modules. The model of vector calculations provides the basis for developing a dedicated multiprocessor architecture focused on search, pattern recognition and decision making using associative tables.
The performance evaluation (Fig. 13) of a design solution based on specialization Sp and standardization St requires three conflicting parameters: quality Y, time T and hardware cost H:
$$E = F(Y, T, H), \qquad Y = (1 - P)^{n(1-Q)}, \qquad T = \frac{1}{f} \times S \times d, \qquad H = 2(H_s \times n).$$
Fig. 13. Performance evaluation for process model
The parameter Y depends on the testability Q of the design, the probability P that faulty areas exist in the chip, and the number n of undetected faults.
The problem-solving time is determined as follows: the structural depth S of a circuit is multiplied by the average delay d of the primitives on the maximum logical path and divided by the clock frequency f of the device. The hardware costs (see Fig. 6) are a function of the complexity of the following components: Hs, the sequencer; n, the number of logic processors; Hu, the control unit; Hm, the command and data memory; Hd, the Infrastructure IP for diagnosis; and Hi, the communication interface. To simplify the efficiency formula, it can be assumed that the matrix of logic processors is equivalent in complexity to the rest of the LAMP:
$$(H_s \times n) = H_u + H_m + H_d + H_i.$$
The efficiency En of the proposed process model for diagnosing and repairing the memory block (see Fig. 12) based on LAMP, relative to the basic realization Eb on a universal computer, is evaluated as follows:
$$\Lambda = \frac{E_b}{E_n} = \frac{T_b}{T_n} + \frac{H_b}{H_n} = \frac{(S \times d)_b}{(S \times d)_n} + \frac{H_b}{2(H_s \times n)_n} = \frac{200}{20} + \frac{1\,000\,000\ \text{gates}}{2(800 \times 16)\ \text{gates}} \approx 10 + 40 = 50.$$
Here it is assumed that the clock frequencies of the basic and new products, their quality, and the delays of the primitives are equal, so they are eliminated from the formula when calculating the result. The structural depths of the two hardware variants are 200 and 20, and the numbers of equivalent gates are 1,000,000 and 25,600. The additive evaluation of the efficiency of using the infrastructure to solve the problems of memory diagnosis and repair gives a result of approximately 50. The multiplicative evaluation is almost an order of magnitude greater.
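The efficiency arithmetic above can be reproduced with a short script. It is a sketch under the same assumptions as the text (equal clock frequency, primitive delay and quality for both variants); `efficiency_gain` and `quality` are hypothetical names, and `quality` merely evaluates Y = (1 - P)^{n(1 - Q)} for illustration, since Y cancels out of the comparison.

```python
def quality(p: float, q: float, n: int) -> float:
    """Y = (1 - P)^(n(1 - Q)): P = fault probability, Q = testability, n = undetected faults."""
    return (1.0 - p) ** (n * (1.0 - q))


def efficiency_gain(depth_b: float, depth_n: float, gates_b: float,
                    h_s: float, n_proc: int) -> tuple[float, float]:
    """Additive and multiplicative estimates of E_b/E_n with equal f, d and Y."""
    time_ratio = depth_b / depth_n              # (S x d)_b / (S x d)_n with equal d
    hw_ratio = gates_b / (2 * h_s * n_proc)     # H_b / H_n, where H_n = 2(H_s x n)
    return time_ratio + hw_ratio, time_ratio * hw_ratio


additive, multiplicative = efficiency_gain(200, 20, 1_000_000, 800, 16)
```

With the paper's numbers the additive estimate is 10 + 39.0625 = 49.0625, which the text rounds to 50, and the multiplicative estimate is 10 × 39.0625 = 390.625, roughly eight times larger.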
VI. Conclusion
Existing software analogs do not provide purely vector-logical paths for search, pattern recognition and decision making in discrete information spaces [3,8]. Almost all of them use the universal instruction set of modern expensive CPUs with a math coprocessor. On the other hand, the dedicated hardware tools for logical analysis that can be considered prototypes [1,3] are typically focused on bitwise, i.e. non-vector, information processing.
To eliminate the disadvantages of the software analogs and hardware prototypes, a new approach is proposed for vector logic processing of associative data with complete exclusion of arithmetic operations, which affect performance and hardware complexity. It has been successfully implemented on modern microelectronic devices in the form of a multiprocessor digital system-on-a-chip.
The approach is implemented through the proposed infrastructure, which includes the following components: 1. Process models for analyzing associative tables based on vector logical operations for search, pattern recognition and decision making in the vector discrete Boolean space. The models are focused on high-performance concurrent vector logical analysis of information and on calculating solution quality criteria on the basis of the proposed beta-metric of cyberspace. 2. A multiprocessor architecture for concurrently solving associative logic problems using a minimal set of vector logical operations and total exclusion of arithmetic instructions. It provides high performance, minimal cost and low power consumption of LAMP, implemented in a programmable logic chip. 3. A novel vector logical process model for embedded diagnosis of digital systems-on-chips and for searching for a quasioptimal coverage, based on the logic associative multiprocessor, with parallel operations for computing processes and calculating quality criteria.
The veracity and practical significance of the obtained results are confirmed by the creation of a multiprocessor infrastructure for diagnosing and repairing memory components of digital systems-on-chips, by the theoretical proof of the metric for the vector logical space, and by the quality criteria for estimating solutions.
Further research is focused on developing a prototype of the logic associative multiprocessor in order to solve topical problems of search, pattern recognition and decision making using the proposed infrastructure of vector logical analysis.
References
[1] Y. Zorian, “Test Strategies for System-in-Package,” Plenary Paper of IEEE East-West Design & Test Symposium (EWDTS’08), 2008.
[2] L. Smith, “3D Packaging Applications, Requirements, Infrastructure and Technologies,” Fourth Annual International Wafer-Level Packaging Conference, September 2007.
[3] M.F. Bondarenko, “Brain-Like Computers,” Radioelectronics & Informatics, No. 2, pp. 89-105, 2004.
[4] M.F. Bondarenko, “Predicate Algebra,” Bionica intellecta, No. 1, pp. 15-26, 2004.
[5] A. Acritas, “Fundamentals of Computer Algebra with Applications,” Moscow: Mir, 1994, 544 p.
[6] A.V. Attetkov, “Optimization Methods,” Bauman Moscow State Technical University, 2003, 440 p.
[7] J. Bergeron, “Writing Testbenches Using SystemVerilog,” Springer Science and Business Media, Inc., 2006, 414 p.
[8] D. Densmore, “A Platform-Based Taxonomy for ESL Design,” Design & Test of Computers, pp. 359-373, 2006.
[9] V.I. Hahanov, “Digital System-on-Chip Design and Test,” Kharkov: Novoye Slovo, 2009, 484 p.
[10] V.I. Hahanov, “Digital System-on-Chip Design and Verification,” Kharkov: Novoye Slovo, 2010, 528 p.
[11] A.A. Cohen, “Addressing Architecture for Brain-like Massively Parallel Computers,” Euromicro Symposium on Digital System Design (DSD'04), pp. 594-597, 2004.
[12] O.P. Kuznetsov, “Simulating High-Speed Intelligent Processes of Ordinary Thinking,” Intelligent Systems, Vol. 2, Issue 1-4, 1997.