The equivalence problem for programs with mode switching is PSPACE-complete
Rimma Podlovchenko, Dmitry Rusakov and Vladimir Zakharov
Abstract. We study a formal model of imperative sequential programs. In this model programs are viewed as deterministic finite automata whose semantics is defined on Kripke structures. We focus on the equivalence problem for a specific class of programs, namely programs with mode switching, whose runs can be divided into two stages. In the first stage a program selects an appropriate mode of computation. Several modes may be tried (switched) in turn before the ultimate choice is made. Every time the next mode is put to a test, the program brings data to some predefined state. In the second stage of the run, once a definitive mode is fixed, the final result of computation is produced. The effect of mode switching may be used for automatic generation of opaque predicates, i.e. boolean expressions whose behavior is known a priori. Such predicates provide a very simple and effective means for virus obfuscation; therefore the development of efficient algorithms for the analysis of programs with mode switching is an urgent task in view of designing virus detection tools. We develop a new technique for simulating the behavior of such programs by means of finite automata and demonstrate that the equivalence problem for programs with mode switching is decidable within polynomial space. By revealing a close relationship between the equivalence problem for this class of programs and the intersection emptiness problem for deterministic finite state automata, we show that the former is PSPACE-complete.
1. Introduction
The intimate link between automata and formal models of programs used for the purposes of translation, verification, optimization, etc. is widely recognized. Since the early 1960s it has been found [4, 5, 9, 21, 29] that common finite state, multi-tape, multi-head and push-down automata provide a suitable framework for developing decision procedures or proving undecidability for many program analysis problems originating from software engineering (see [34] for a survey). In this paper we study one such problem, namely the equivalence problem, for a specific class of imperative programs with mode switching. The runs of such programs are divided into two stages. At the beginning of a run an appropriate mode of computation is selected. A program may try (switch) a number of modes in turn before making the final choice. In the second stage, once some mode of computation is chosen, a program starts an ordinary run and yields the final result.

program π1 read(y1, y2);
x=y1; z=y2;
u=x+z; stop.
program π2 read(y1, y2);
x=y1; z=y2;
if P(x,z) then {x=z; goto L1}
else {z=z+x; goto L2};
L1:
z=y1; x=y2;
if P(z,x) then {u=x+z; z=x; x=u-z}
else {u=x-z; z=u+x};
stop;
L2:
z=y2; x=y1;
if P(x,z) then u=2*x-z**2
else u=z+x;
stop.
Figure 1: Programs with mode switching
Usually mode switching is achieved by means of constant assignment statements. The two programs $\pi_1$ and $\pi_2$ in Fig. 1 illustrate the concept of mode switching. It is easy to see that both programs are equivalent, i.e. they compute the same function $u = y_1 + y_2$. Boxed statements in these programs play the role of mode switches: when executing these statements the programs bring data to the predefined states that are specific for each mode.

Our interest in the equivalence problem for programs with mode switching has both theoretical and practical motivations. In [23, 24] a theory of algebraic models of programs was introduced for the purpose of designing effective equivalence-checking techniques and complete systems of equivalent transformations of programs. In this theory programs are viewed as deterministic finite automata operating on semigroups or Kripke structures. A series of results [26, 32] shows that for many algebraic models of programs it is possible to design efficient (polynomial-time) equivalence-checking procedures. It was found (see [14, 15, 16, 18]) that the decidability and complexity of the equivalence problem for algebraic models of programs depend greatly on some group-theoretic properties of program semantics. That is why it is very important to know how much a particular property of the semigroups or Kripke structures used for the semantics in algebraic models of programs influences the decidability and complexity of the equivalence problem.
In the framework of the theory of algebraic models of programs the semantics of programs with mode switching can be specified in terms of semigroups with right zeros. In [13, 14] the equivalence problem for finite automata operating on free semigroups with right zeros is considered. A decidability result was obtained by establishing the regularity-preserving properties of the set-theoretic and closure operations on topological spaces of functions computed by such automata. In [19, 20] a “hard set” method was successfully applied to the equivalence problem for linear recursive programs with constants. It must be emphasized that in the model of recursive programs constants play the same role as mode switching statements in the propositional model of sequential programs we deal with in this paper. Both equivalence-checking techniques developed in [13, 14, 19] are highly sophisticated; their main deficiency is that they give no means to estimate the complexity of the problem. One of the aims of our paper is to estimate the complexity of the equivalence problem for programs with mode switching as precisely as possible.
Another topic which involves programs with mode switching is the detection of malicious patterns (viruses) in software. The classic virus-detection techniques look for the presence of a virus-specific sequence of instructions (virus signature) inside programs: if the signature is found, it is highly probable that the program is infected. A new generation of metamorphic viruses attempts to evade simple pattern-matching detection by using complex obfuscations: when replicating, these viruses change their signatures by applying semantics-preserving transformations (see [1]). The only way to disclose such viruses is to develop efficient equivalence-checking techniques that can cope with the common obfuscating transformations. Thus, in [2] an architecture for detecting malicious patterns in executables is presented that is resilient to some obfuscating transformations (dead-code insertion and code transposition). But properly resistant obfuscations [3] rely on the existence of opaque predicates whose behavior is known a priori to the obfuscator but is difficult for the deobfuscator to deduce. The simplest way to obtain opaque predicates is to make the same predicate computable on the same data but at different program points. This may be achieved by bringing data into some fixed state before computing such a predicate several times along a run. But this is just the effect of mode switching. When considering the program $\pi_2$ depicted in Fig. 1 one finds that both underlined conditions in the branching statements are opaque predicates obtained with the help of mode switching: one of them always evaluates to true whereas the other always evaluates to false. Thus, to detect a metamorphic virus (the program $\pi_2$) by its signature (the program $\pi_1$) one needs an effective equivalence-checking procedure which can cope with mode switching.
In our paper we reveal close relationships between the equivalence problem for programs with mode switching and the Intersection Emptiness Problem (IEP) for deterministic finite state automata (DFAs). The Intersection Emptiness Problem for DFAs is that of checking, given a collection of $k$ DFAs $F_1, \dots, F_k$ of size $|F_i| = n$, whether their intersection is empty: $\bigcap_{i=1}^{k} L(F_i) = \emptyset$, where $L(F)$ denotes the language accepted by the automaton $F$. If the parameter $k$ is a constant then the problem has a polynomial time algorithm, but the general problem, where this parameter can depend on the input size (say, $k = n$), is much harder: it is known to be PSPACE-complete [12].
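For a fixed $k$ the polynomial bound comes from the classical product construction: explore the tuples of states $(q_1, \dots, q_k)$ reachable from the tuple of start states and look for one in which every component is final. The Python sketch below does this by breadth-first search; the triple encoding `(start, delta, finals)` of a DFA is our own assumption. It may visit up to $n^k$ tuples, which is polynomial only when $k$ is fixed; a nondeterministic procedure that guesses a common word letter by letter needs to store only one tuple, which is where the PSPACE upper bound comes from.

```python
from collections import deque


def intersection_nonempty(dfas, alphabet):
    """Return True iff L(F_1) ∩ ... ∩ L(F_k) is non-empty.

    Each DFA is a triple (start, delta, finals), where delta maps
    (state, letter) to a state and finals is a set of states.
    The search visits tuples of states of the product automaton,
    so it may touch up to n**k tuples.
    """
    start = tuple(start_state for start_state, _, _ in dfas)
    seen = {start}
    queue = deque([start])
    while queue:
        states = queue.popleft()
        # A tuple in which every component is final witnesses a common word.
        if all(q in finals for q, (_, _, finals) in zip(states, dfas)):
            return True
        for x in alphabet:
            nxt = tuple(delta[(q, x)] for q, (_, delta, _) in zip(states, dfas))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For example, if $F_1$ accepts the binary words ending in 1 and $F_2$ accepts the words with an even number of 1s, the search reaches a tuple of final states after reading the common word 11.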
Some recent papers testify that the IEP has a substantial impact on many aspects of complexity theory and formal verification of complex systems. In [10] it was proved that integer factoring of an $n$-bit number is solvable in time $n^{O(1)} \cdot 2^{\varepsilon n}$ for any $\varepsilon > 0$, provided that one can decide the IEP for a family $F_1, \dots, F_k$ of DFAs $F_i$ of size $n$ in time $t(n, k) = n^{k/f(k) + d}$, where $f(\cdot)$ is an unbounded function and $d > 0$ is a constant. Moreover, by assuming that there is a non-uniform circuit of size $t(n, k)$ that solves the IEP, one could deduce NLOG $\neq$ NP. In [28] the IEP was used to demonstrate that many verification problems for supervisory controllers of discrete-event systems are PSPACE-complete. We think that the results of our paper also give a new insight into the IEP.
The paper is organized as follows. In Section 2 we formally define the syntax and the semantics of propositional sequential programs. In Section 3 the model $\mathcal{M}_0$ which captures the semantics of programs with mode switching is introduced. We also reduce the equivalence problem $\pi' \sim_{\mathcal{M}_0} \pi''$ for programs with mode switching to that of checking three characteristic properties of the runs of $\pi'$ and $\pi''$. In Section 4 we show that these properties can be verified by constructing a finite number of DFAs and checking their emptiness. The DFAs we use for this purpose are similar to the Vectorized Finite State Automata introduced in [11] for natural language processing. A vectorized DFA operates on a tape divided into $N$ tracks and its internal states are vectors $s = (v_1, \dots, v_N)$. Some tracks may be synchronized: it is required that the input letters on the synchronized tracks should be unified (in our case this means that the letters on these tracks should be the same). The synchronization of tracks varies along a run of a DFA. Vectorized DFAs have the same computational power as common DFAs and the emptiness problem for them is NLOG-complete. Since the state space of the DFAs we use is exponential in the size of the programs to be analyzed, we arrive at the conclusion that the equivalence problem $\pi' \sim_{\mathcal{M}_0} \pi''$ is in PSPACE. In Section 5 we reduce the IEP to the equivalence problem for programs with mode switching and thus establish the PSPACE-completeness of the latter. We conclude with a discussion of some new research problems raised by the results obtained.
2. Preliminaries
In this section we define the syntax and the semantics of propositional sequential programs.
Fix two finite alphabets $\mathcal{A} = \{a_1, \dots, a_r\}$ and $\mathcal{P} = \{p_1, \dots, p_k\}$. The elements of $\mathcal{A}$ are called basic statements; they stand for assignment statements in imperative programs. The elements of $\mathcal{P}$ are called basic predicates; they stand for elementary built-in relations on program data. Each basic predicate may be evaluated to 0 (false) or 1 (true). A tuple $(\delta_1, \dots, \delta_k)$ of truth-values of basic predicates is called a condition. The set of all conditions is denoted by $\mathcal{C}$; we write $\Delta_1, \Delta_2, \dots$ for generic elements of $\mathcal{C}$.
Definition 1. A deterministic propositional sequential program (PSP for short) is a finite transition system $\pi = (V, entry, exit, T, B)$, where
• $V$ is a non-empty set of program points;
• $entry$ is the initial point of the program;
• $exit$ is the terminal point of the program;
• $T : (V \setminus \{exit\}) \times \mathcal{C} \to V$ is a (total) transition function;
• $B : (V \setminus \{exit\}) \to \mathcal{A}$ is a (total) binding function.
A transition function represents the control flow of a program, whereas a binding function associates with each point some basic statement. By the size $|\pi|$ of a program $\pi$ we mean the cardinality of the set $V$. Any finite sequence of points $v_1, v_2, \dots, v_n$ such that for every $i$, $1 \le i < n$, $v_{i+1} = T(v_i, \Delta_i)$ holds for some condition $\Delta_i$, is called a control path (or a trace) in the PSP $\pi$ from $v_1$ to $v_n$. We say that $v_n$ is reachable from $v_1$ if there exists a trace from $v_1$ to $v_n$.
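Since a PSP is a finite labelled graph, reachability between its points is ordinary graph search in which every condition is tried at every point. The Python sketch below assumes a hypothetical encoding of our own: points are strings, conditions are tuples of 0/1 values, and $T$ and $B$ are dictionaries.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PSP:
    """pi = (V, entry, exit, T, B) with T and B stored as dictionaries."""
    points: frozenset
    entry: str
    exit: str
    T: dict  # (point, condition) -> point, defined on (V - {exit}) x C
    B: dict  # point -> basic statement, defined on V - {exit}


def reachable(pi, conditions, source):
    """All points v_n such that some trace leads from source to v_n."""
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        if v == pi.exit:
            continue  # T is undefined at the terminal point
        for delta in conditions:
            w = pi.T[(v, delta)]
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen
```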
The semantics of PSPs is defined with the help of Kripke structures used in the framework of dynamic logics.
Definition 2. A Kripke structure is a quadruple $M = (S, s_0, R, \xi)$, where
• $S$ is a non-empty set of data states;
• $s_0 \in S$ is a distinguished initial state;
• $R : \mathcal{A} \times S \to S$ is a (total) updating function;
• $\xi : S \to \mathcal{C}$ is a (total) evaluation function.
An updating function $R$ gives the interpretation of basic statements: a data state $R(a, s)$ is the result of applying a basic statement $a$ to a data state $s$. An evaluation function $\xi$ is used for the interpretation of basic predicates: $\xi(s)$ gives a tuple of truth-values of all basic predicates on a data state $s$. By a data path in $M$ we mean any sequence of states $s_1, s_2, \dots, s_k$ such that $s_{i+1} = R(a_i, s_i)$ for some basic statement $a_i$, $1 \le i < k$.
Let $\pi = (V, entry, exit, T, B)$ be a PSP and $M = (S, s_0, R, \xi)$ be a Kripke structure. A run of $\pi$ on $M$ is a sequence (finite or infinite) of pairs
$$r(\pi, M) = (v_1, s_1), (v_2, s_2), \dots, (v_i, s_i), (v_{i+1}, s_{i+1}), \dots \tag{1}$$
such that
1. $s_0$ is the initial state of $M$, and $v_1 = entry$;
2. $s_i = R(B(v_i), s_{i-1})$ and $v_{i+1} = T(v_i, \xi(s_i))$ hold for every $i \ge 1$;
3. the sequence $r(\pi, M)$ either is infinite (in this case we say that the run loops and yields no result), or ends with a pair $(v_n, s_n)$ such that $v_{n+1} = T(v_n, \xi(s_n)) = exit$ (in this case we say that the run terminates and gives the result $s_n$).
We write $\downarrow r(\pi, M)$ to indicate that the run terminates and denote by $[r(\pi, M)]$ its result, assuming that the result is undefined when $r(\pi, M)$ loops. It is worth noting that if $v_i = v_j$ and $s_i = s_j$ for some pair of distinct positions $i \neq j$ in (1) then $r(\pi, M)$ loops. If a point $v_i$ occurs in some pair of (1) then we say that $r(\pi, M)$ passes via $v_i$.
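The definition of a run can be executed directly once $R$ and $\xi$ are given as functions. The Python sketch below is an illustration under our own assumptions (the name `simulate_run` is hypothetical, data states must be hashable, conditions are tuples). It uses the observation above: a repeated pair $(v_i, s_i)$ proves that the run loops. When $S$ is infinite a looping run need not repeat a pair, so the sketch gives up after a step bound and reports `"unknown"`.

```python
def simulate_run(T, B, entry, exit_point, s0, R, xi, max_steps=10_000):
    """Follow r(pi, M) = (v_1, s_1), (v_2, s_2), ...

    Returns ("terminates", s_n), ("loops", None) when a pair (v_i, s_i)
    repeats, or ("unknown", None) when the step bound is exhausted.
    """
    seen = set()
    v, s = entry, s0
    for _ in range(max_steps):
        s = R(B[v], s)        # s_i = R(B(v_i), s_{i-1})
        if (v, s) in seen:    # same point and data state: the run loops
            return ("loops", None)
        seen.add((v, s))
        v = T[(v, xi(s))]     # v_{i+1} = T(v_i, xi(s_i))
        if v == exit_point:
            return ("terminates", s)
    return ("unknown", None)
```

For instance, with integer data states, an action `"inc"` interpreted as $s + 1$ and a single predicate "$s \ge 3$", a one-point PSP that stays put until the predicate holds terminates with result 3.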
In what follows, when referring to a model of programs $\mathcal{M}$ we mean the set of all PSPs over fixed alphabets $\mathcal{A}$, $\mathcal{P}$ whose semantics is specified by the set $\mathcal{M}$ of Kripke structures.
Definition 3. Given a model of programs $\mathcal{M}$, we say that PSPs $\pi_1$ and $\pi_2$ are equivalent ($\pi_1 \sim_{\mathcal{M}} \pi_2$ in symbols) iff $[r(\pi_1, M)] = [r(\pi_2, M)]$ for every $M \in \mathcal{M}$.
The equivalence problem for a model of programs $\mathcal{M}$ is to check, given a pair of PSPs $\pi_1$ and $\pi_2$, whether $\pi_1 \sim_{\mathcal{M}} \pi_2$ holds. The complexity of the equivalence problem “$\pi_1 \sim_{\mathcal{M}} \pi_2$?” depends on the set of structures $\mathcal{M}$ which specifies a model of programs. The two examples below illustrate this thesis.
Example 1. Given a set $\mathcal{A}$ of basic statements, consider the free semigroup $(\mathcal{A}^*, \circ)$ generated by $\mathcal{A}$. The elements of this semigroup may be thought of as finite sequences (words) of basic statements, whereas the binary operation $\circ$ may be interpreted as concatenation. The empty sequence $\lambda$ stands for the neutral element of the semigroup. Then the equivalence problem for the model of programs $\mathcal{M}' = \{(\mathcal{A}^*, \lambda, \circ, \xi) : \xi$ is an evaluation function on $\mathcal{A}^*\}$ is decidable in time $O(n \log n)$. In [9, 29] it was demonstrated that the equivalence problem for $\mathcal{M}'$ is reducible in linear time to the equivalence problem for deterministic finite automata; the complexity $O(n \log n)$ of the latter was established in [8].
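To illustrate the target problem of this reduction only, here is the union-find equivalence test of Hopcroft and Karp for complete DFAs in Python. It is a different algorithm from the $O(n \log n)$ method of [8] cited above, and the triple encoding `(start, delta, finals)` is our own assumption. Two start states are merged, and every merged pair is pushed so that its successors are merged in turn; a merged pair in which exactly one state is final refutes equivalence.

```python
def dfa_equivalent(d1, d2, alphabet):
    """Hopcroft-Karp check that two complete DFAs accept the same language.

    Each DFA is a triple (start, delta, finals). States of the two automata
    are tagged with 1 or 2 so that they cannot collide in the union-find.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        parent[rx] = ry
        return True

    (s1, t1, f1), (s2, t2, f2) = d1, d2
    union((1, s1), (2, s2))
    stack = [(s1, s2)]
    while stack:
        p, q = stack.pop()
        if (p in f1) != (q in f2):
            return False  # a word leading to (p, q) separates the languages
        for x in alphabet:
            p2, q2 = t1[(p, x)], t2[(q, x)]
            if union((1, p2), (2, q2)):
                stack.append((p2, q2))
    return True
```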
Example 2. Given a set $\mathcal{A} = \{a_1, \dots, a_n, a_1^{-1}, \dots, a_n^{-1}\}$ of basic statements, consider the Abelian group $(S, \circ)$ of rank $n$ generated by the elements of $\mathcal{A}$. The equivalence problem for the model $\mathcal{M}' = \{(S, e, \circ, \xi) : \xi$ is an evaluation function on $S\}$, where $e$ is the unit (neutral) element of $S$, was studied in [16]. It was shown that this problem is decidable within exponential space when $n = 1$, and it is undecidable when $n > 2$.
Other results on the equivalence problem for some models of programs may be found in [16, 17, 18, 25, 26, 27, 34, 35, 36].
3. Programs with mode switching
In this paper we focus on the equivalence problem for a specific class of programs whose runs can be divided into two stages. In the first stage a program selects an appropriate mode of computation. Several modes may be tried in turn before the ultimate choice is made. Each time the next mode is put to the test, the program brings the data back to the initial state. In the second stage, once a definitive mode is fixed, the final result of computation is generated. In real programs mode switching may be implemented by restart statements or constant assignment statements. In this section we formally introduce the model of such programs in the framework of the syntax and semantics of PSPs.
We will assume that the set of basic statements $\mathcal{A}$ is partitioned into two subsets $\mathcal{A}_{ord} = \{a_1, \dots, a_k\}$ (ordinary actions) and $\mathcal{A}_{mode} = \{b_1, \dots, b_N\}$ (mode switches). Those points $v$ in a PSP $\pi$ that are associated with mode switches (i.e. $B(v) \in \mathcal{A}_{mode}$) are called switching points. All other points are called ordinary points. We write $V_{mode}$ to denote the set of all switching points of a given program $\pi$. Without loss of generality, we will assume that $entry \in V_{mode}$.
Two principles are used as the basis for the semantics of PSPs with mode switching:
• each ordinary action a is interpreted according to a current mode of computation, and
• each mode switch $b$ abandons any previous intermediate result of computation and brings data into some distinguished state $s_b$.
Thus, the model of programs with mode switching is characterized by the set of all Kripke structures $\mathcal{M}_0 = \{M_\xi : M_\xi = (S, s_0, R, \xi)\}$ such that
1. $S = \{\lambda\} \cup \mathcal{A}_{mode}\mathcal{A}_{ord}^*$, i.e. $S$ includes the empty string $\lambda$ and all strings $b a_1 a_2 \cdots a_n$, where a mode switch $b \in \mathcal{A}_{mode}$ is followed by a string of ordinary actions $a_1 a_2 \cdots a_n \in \mathcal{A}_{ord}^*$;
2. $s_0 = \lambda$;
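3. an updating function $R$ is defined as follows: $R(b, s) = b$ (i.e. $s_b = b$) for every mode switch $b \in \mathcal{A}_{mode}$ and every $s \in S$, and $R(a, s) = sa$ for every ordinary action $a \in \mathcal{A}_{ord}$ and every $s \in S \setminus \{\lambda\}$ (the value $R(a, \lambda)$ never arises in a run, since entry is a switching point).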
Since the data space $S$ and the interpretation $R$ of basic statements are fixed, each structure $M_\xi \in \mathcal{M}_0$ is completely specified by its evaluation function $\xi$. PSPs with mode switching $\pi_1$ and $\pi_2$ are called equivalent ($\pi_1 \sim_{\mathcal{M}_0} \pi_2$ in symbols) iff $[r(\pi_1, M_\xi)] = [r(\pi_2, M_\xi)]$ holds for every structure $M_\xi \in \mathcal{M}_0$. When studying the decidability and complexity of the equivalence problem for $\mathcal{M}_0$ we will use the inverse variant of this definition: PSPs $\pi_1$ and $\pi_2$ are not equivalent iff there exists a structure $M_\xi \in \mathcal{M}_0$ such that either both runs $r(\pi_1, M_\xi)$ and $r(\pi_2, M_\xi)$ terminate but $[r(\pi_1, M_\xi)] \neq [r(\pi_2, M_\xi)]$, or one of the runs (say, $r(\pi_1, M_\xi)$) terminates, whereas the other (in our case, $r(\pi_2, M_\xi)$) loops.
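In $\mathcal{M}_0$ the updating function is fixed, so a structure $M_\xi$ is determined by $\xi$ alone. The Python sketch below (hypothetical name `run_on_m0`; data states encoded as tuples, with `()` for $\lambda$; $\xi$ is any callable) runs a PSP on $M_\xi$ and shows two PSPs separated by a suitable $\xi$. It assumes, as in the text, that entry is a switching point, and it returns `None` when a step bound is reached: a run looping on ordinary points never repeats a data state, because every ordinary action lengthens the string.

```python
def run_on_m0(T, B, modes, entry, exit_point, xi, max_steps=1000):
    """Run a PSP on the structure M_xi of the model M_0.

    () encodes lambda and (b, a1, ..., an) encodes the string b a1 ... an.
    Returns the result on termination, or None if max_steps is exhausted.
    """
    s, v = (), entry                     # s_0 = lambda
    for _ in range(max_steps):
        stmt = B[v]
        if stmt in modes:
            s = (stmt,)                  # R(b, s) = b: the mode switch resets data
        else:
            s = s + (stmt,)              # R(a, s) = sa: record the ordinary action
        v = T[(v, xi(s))]
        if v == exit_point:
            return s
    return None
```

In the test data, $\pi'$ always yields $ba$, whereas $\pi''$ performs a second action $a$ when $\xi(ba) = 0$; this $\xi$ therefore witnesses $\pi' \not\sim_{\mathcal{M}_0} \pi''$.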
Given a PSP $\pi$, we introduce the skeleton $G_\pi$ of $\pi$ as a finite directed graph intended for representing the reachability relation between the switching points in $\pi$. Formally, $G_\pi = (U, E)$, where $U = V_{mode} \cup \{exit\}$ is the set of vertices and $E = \{(v', v'') : v', v'' \in U$, and $v''$ is reachable from $v'$ in $\pi$ by a trace which does not pass via any switching point other than $v'$ and $v''\}$ is the set of arcs. A trajectory is any directed path in $G_\pi$ (finite or infinite) which begins in the entry point. A trajectory reflects a possible scenario of mode selection in the course of some run of $\pi$. We say that a run $r(\pi, M_\xi)$ of a PSP $\pi$ on a structure $M_\xi$ traverses the skeleton $G_\pi$ along a trajectory $v_1, v_2, \dots, v_n, \dots$ if $r(\pi, M_\xi)$ passes via the switching points $v_1, v_2, \dots, v_n, \dots$ in this order. The proposition below follows from the definition of $\mathcal{M}_0$ and states the principal property of the runs of PSPs on the structures under consideration.
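Proposition 1. If a run $r(\pi, M_\xi)$ of a PSP $\pi$ traverses the skeleton $G_\pi$ along a trajectory $v_1, v_2, \dots, v_i, \dots, v_j, \dots$ such that $v_i = v_j$ for some $i < j$, then $r(\pi, M_\xi)$ loops. (Indeed, each time the run passes via a switching point $v$ the data state is reset to $R(B(v), s) = B(v)$, so the same pair $(v_i, B(v_i))$ occurs twice in the run.)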
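The skeleton $G_\pi$ can be computed by one graph search per switching point that continues only through ordinary points and records an arc whenever it meets a switching point or exit. The Python sketch below is our own illustration; $T$ and $B$ are assumed to be dictionaries, `modes` stands for $\mathcal{A}_{mode}$, and the name `skeleton_arcs` is hypothetical.

```python
from collections import deque


def skeleton_arcs(T, B, modes, conditions, points, exit_point="exit"):
    """Arcs of the skeleton G_pi = (V_mode ∪ {exit}, E).

    (u, v) is an arc iff u is a switching point and v (a switching point
    or exit) is reachable from u by a trace whose inner points are ordinary.
    """
    def switching(p):
        return p != exit_point and B[p] in modes

    arcs = set()
    for u in points:
        if not switching(u):
            continue
        seen, queue = set(), deque([u])
        while queue:
            p = queue.popleft()
            for d in conditions:
                w = T[(p, d)]
                if w == exit_point or switching(w):
                    arcs.add((u, w))  # the trace must stop here
                elif w not in seen:
                    seen.add(w)       # ordinary point: keep searching
                    queue.append(w)
    return arcs
```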
A trajectory $v_1, v_2, \dots, v_n$ in a skeleton $G_\pi$ is called
• repetition-free if it does not pass twice via the same vertex;
• complete if it is repetition-free and ends in the node $v_n = exit$.
Thus, a terminating run of $\pi$ traverses the skeleton $G_\pi$ only along a complete trajectory, while a looping run of $\pi$ may traverse $G_\pi$ along either some repetition-free non-complete trajectory (in this case we say that the run loops on ordinary points of $\pi$), or some infinite trajectory (in this case we say that the run loops on switching points of $\pi$). By Proposition 1, for the latter case to happen a run of $\pi$ should traverse the skeleton along a trajectory $v_1, \dots, v_i, v_{i+1}, \dots$ such that $v_1, \dots, v_i$ is a repetition-free trajectory and $v_{i+1} \in \{v_1, v_2, \dots, v_i\}$.
The proposition below is merely a restatement of the equivalence problem $\pi' \sim_{\mathcal{M}_0} \pi''$ in terms of some properties of trajectories in the skeletons of the PSPs.
Proposition 2. PSPs $\pi'$ and $\pi''$ are not equivalent on $\mathcal{M}_0$ iff there exists a pair of repetition-free trajectories $w' = v'_1, v'_2, \dots, v'_n$ and $w'' = v''_1, v''_2, \dots, v''_m$ in the skeletons $G_{\pi'}$ and $G_{\pi''}$ such that for some structure $M_\xi$ the runs $r(\pi', M_\xi)$ and $r(\pi'', M_\xi)$ traverse the skeletons along the trajectories $w'$ and $w''$ respectively, and meet one of the following requirements:
R1: both trajectories are complete and $[r(\pi', M_\xi)] \neq [r(\pi'', M_\xi)]$;
R2: one of the trajectories (say, $w'$) is complete, whereas the other ($w''$) is traversed by a run which loops on ordinary points;
R3: one of the trajectories (say, $w'$) is complete, whereas the other ($w''$) is traversed by a run which loops on switching points.
This proposition provides a foundation for the following equivalence-checking strategy: given a pair of PSPs, guess a complete trajectory in the skeleton of one PSP and a repetition-free trajectory in the skeleton of the other; then check whether there exists some structure satisfying one of the requirements R1-R3 above. Since the number of repetition-free trajectories in the skeletons is finite, the equivalence problem for PSPs is thus reduced to the analysis of trajectories in their skeletons. Next we show that this analysis can be carried out with the help of DFAs.
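The finiteness argument can be made concrete: repetition-free trajectories are simple paths from entry in $G_\pi$ and can be enumerated by depth-first search. The Python sketch below (helper names are hypothetical; the skeleton is given as a set of arcs) lists them, keeps the complete ones, and forms the candidate pairs examined by the strategy, with either program playing the role of the one whose trajectory is complete. The number of such paths can be exponential in the size of the skeleton.

```python
def repetition_free_trajectories(arcs, entry):
    """All finite paths from entry in the skeleton that repeat no vertex."""
    succ = {}
    for u, v in sorted(arcs):
        succ.setdefault(u, []).append(v)
    found = []

    def extend(path):
        found.append(tuple(path))
        for v in succ.get(path[-1], []):
            if v not in path:
                path.append(v)
                extend(path)
                path.pop()

    extend([entry])
    return found


def complete_trajectories(arcs, entry, exit_point="exit"):
    """Repetition-free trajectories that end in exit."""
    return [t for t in repetition_free_trajectories(arcs, entry) if t[-1] == exit_point]


def candidate_pairs(arcs1, entry1, arcs2, entry2):
    """Pairs (complete trajectory, repetition-free trajectory) in both orders."""
    pairs = [(w1, w2) for w1 in complete_trajectories(arcs1, entry1)
             for w2 in repetition_free_trajectories(arcs2, entry2)]
    pairs += [(w2, w1) for w2 in complete_trajectories(arcs2, entry2)
              for w1 in repetition_free_trajectories(arcs1, entry1)]
    return pairs
```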
4. Using DFAs for the equivalence-checking of programs with mode switching
The problem we deal with in this section is as follows: given a pair of repetition-free trajectories $w'$ and $w''$ in the skeletons of PSPs $\pi'$ and $\pi''$, check whether there exists a structure $M_\xi$ which complies with at least one of the requirements R1-R3 in Proposition 2. We demonstrate that to retrieve an appropriate structure one can construct specific DFAs $D_1$, $D_2$ and $D_3$ and check their emptiness: the requirement Ri, $i = 1, 2, 3$, can be satisfied by some $M_\xi$ iff $L(D_i) \neq \emptyset$. First we discuss the key ideas of our construction of the DFAs $D_i$ and briefly describe how they operate. Then we present a detailed description of $D_1$ and explain what minor modifications should be made to convert $D_1$ into $D_2$ and $D_3$.
When guessing a structure which makes it possible to traverse the skeletons of $\pi'$ and $\pi''$ along the trajectories $w'$ and $w''$ one may rely on the basic property of the semantics of PSPs with mode switching: each mode switch $b$ abandons the achieved data state $s$ and brings data into the predefined state $s_b$. Thus, the traversal of each arc $(u, v)$ begins with some fixed data state which depends only on the mode switch assigned to the point $u$. This enables us to treat all arcs of the trajectories independently in an attempt to find for each arc $(u, v)$ a specification $Spec_{uv}$ which guarantees the reachability of the switching point $v$ from the switching point $u$ in the course of some run. The specification $Spec_{uv}$ imposes constraints on an evaluation function $\xi$ on some data path $b, ba_1, ba_1a_2, \dots$, where $b = B(u)$ and $a_1, a_2, \dots \in \mathcal{A}_{ord}$. We seek to define the specification so that for any structure $M_\xi$ which satisfies $Spec_{uv}$ the run $r(\pi', M_\xi)$ (or $r(\pi'', M_\xi)$, depending on which trajectory $(u, v)$ belongs to), when started from the point $u$, reaches the point $v$. As soon as the specifications $Spec_{uv}$ are developed for all arcs of the trajectories, we may compose the objective structure $M_\xi$.
When building up $Spec_{u_1v_1}$ and $Spec_{u_2v_2}$ for a pair of arcs $(u_1, v_1)$, $(u_2, v_2)$ one should make sure that the specifications are consistent, i.e. impose the same constraints on $\xi$ on the same data states. Two cases are possible depending on the mode switches assigned to $u_1$ and $u_2$.
1. The switching points $u_1$ and $u_2$ are associated with distinct mode switches, i.e. $B(u_1) = b_1 \neq b_2 = B(u_2)$. This implies that the data paths the specifications $Spec_{u_1v_1}$ and $Spec_{u_2v_2}$ refer to are disjoint. Therefore these specifications are always consistent and may be built up independently.
2. The switching points $u_1$ and $u_2$ are associated with the same mode switch, i.e. $B(u_1) = B(u_2) = b$. Then the specifications $Spec_{u_1v_1}$ and $Spec_{u_2v_2}$ may refer to the same data path. Therefore, special care must be taken to coordinate (synchronize) the development of such specifications. But as soon as the data paths these specifications deal with diverge, they will never refer to a common data state again and the synchronization may be stopped.
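One way to picture a specification is as a finite partial map from data states to conditions. Under that assumed encoding (data states as tuples beginning with a mode switch; `merge_specs` is a hypothetical name), consistency is simply agreement on shared data states, and specifications built for different mode switches never share a state, so case 1 above can never produce a conflict.

```python
def merge_specs(spec1, spec2):
    """Combine two partial specifications of xi.

    A specification maps data states (tuples such as ("b", "a1", "a2"),
    i.e. the string b a1 a2) to conditions. The result is their union, or
    None when they impose different conditions on a shared data state.
    """
    for state in spec1.keys() & spec2.keys():
        if spec1[state] != spec2[state]:
            return None
    return {**spec1, **spec2}
```

Two specifications that start from the same mode switch but whose data paths diverge (for example after the actions `"a"` and `"c"`) still merge, because beyond the divergence point they constrain different states.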
Thus, all specifications $Spec_{uv}$ can be built up in parallel provided that some appropriate synchronization is used to ensure their consistency. This parallel synchronized derivation of the specifications can be implemented by some DFA $D$. The internal states of $D$ keep track of only the following information:
• A tuple $[v_1, \dots, v_N]$ of points in $\pi'$ and $\pi''$. Every element is a point in $\pi'$ or $\pi''$ which is currently reached in an attempt to route a trace from the switching point $u$ to the switching point $v$ for some arc $(u, v)$ in the trajectories $w'$ and $w''$.
• A synchronization table $H$. It is used to ensure the consistency of those specifications that refer to the same data states. The synchronization table may be viewed as a finite set of pairs $(e_1, e_2)$ of arcs. Every such pair, when placed in the table $H$, indicates that the corresponding specifications $Spec_{e_1}$ and $Spec_{e_2}$ need coordination to maintain their consistency.
On each computation step $D$ reads as input (guesses) a tuple of conditions $[\Delta_1, \dots, \Delta_N]$. These conditions give rise to new constraints that should be added to the specifications: each $\Delta_i$ is viewed as a possible value of $\xi$ on the currently reached data state. The synchronization table $H$ is used to check the identity of those constraints that should be added to coordinated specifications. Then the DFA $D$ computes the updated tuple of points $[T(v_1, \Delta_1), \dots, T(v_N, \Delta_N)]$ and modifies the synchronization table $H$ by using the transition functions $T$ and the binding functions $B$ of both PSPs $\pi'$ and $\pi''$. This results in a transition of $D$ to the next internal state. To complete the computation step, $D$ checks whether the new internal state satisfies some objective condition: if this is the case then $D$ accepts. The acceptance implies the existence of a structure $M_\xi$ such that $w'$, $w''$ and $M_\xi$ comply with one of the requirements R1-R3 (depending on the objective condition to be checked).
Now we consider a DFA $D_1$ intended for checking the satisfiability of the requirement R1 and describe this DFA in more detail.
Let $\pi' = (V, entry', exit, T, B)$ and $\pi'' = (U, entry'', exit, T, B)$. To simplify the notation we will assume that $V \cap U = \{exit\}$, and that both PSPs have the same transition function $T$ and the same binding function $B$ defined on $V \cup U$.
Let $w' = v_1, v_2, \dots, v_n, exit$ and $w'' = v_{n+1}, v_{n+2}, \dots, v_{n+m}, exit$ be complete trajectories in the skeletons of $\pi'$ and $\pi''$ respectively.
Let $H(n, m)$ denote the set of all (unordered) pairs $(i, j)$, $1 \le i, j \le n+m$, $i \neq j$, and let $H_0(n, m) = \{(i, j) : (i, j) \in H(n, m), B(v_i) = B(v_j)\}$. The set $H_0(n, m)$ indicates all those pairs of switching points in $w'$ and $w''$ that are associated with the same mode switch.
Then $D_1 = (\Sigma, Q, q_0, accept, \delta)$, where
• $\Sigma = \mathcal{C}^{n+m}$ is the input alphabet;
• $Q = \{accept, reject\} \cup (V^n \times U^m \times 2^{H(n,m)})$ is the set of internal states of $D_1$;
• $q_0 = (v_1, \dots, v_n, v_{n+1}, \dots, v_{n+m}, H_0(n, m))$ is the starting state of $D_1$;
• $accept$ is the accepting state of $D_1$;
• $\delta : Q \times \Sigma \to Q$ is the (partial) state transition function.
Suppose $q = (u_1, \dots, u_{n+m}, H)$ is any internal state of $D_1$ and $z = (\Delta_1, \dots, \Delta_{n+m})$ is any tuple of conditions. The state transition function $\delta$ is defined according to the following rules (we provide each rule with a brief explanation of its intended meaning).
1. If $\Delta_i \neq \Delta_j$ for some pair $(i, j) \in H$ then $\delta(q, z) = reject$ (Comm: the tuple $z$ does not comply with the synchronization request: the constraints imposed on $\xi$ are inconsistent);
2. Otherwise, consider the tuple of points $(u'_1, \dots, u'_{n+m})$ such that
$$u'_i = \begin{cases} T(u_i, \Delta_i), & \text{if } u_i \text{ is an ordinary point, or } q = q_0,\\ u_i, & \text{if } u_i \text{ is a switching point or } exit, \text{ and } q \neq q_0, \end{cases} \qquad 1 \le i \le n+m.$$
Two cases are possible.
Case 1: all points $u'_i$, $1 \le i < n$, and $u'_j$, $n+1 \le j < n+m$, are switching points, and at least one of the points $u'_n$ and $u'_{n+m}$ is either a switching point or $exit$. Then
(a) if $u'_n$ or $u'_{n+m}$ is a switching point, or if there exists $i$, $1 \le i < n+m$, $i \neq n$, such that $u'_i$ is a switching point but $u'_i \neq v_{i+1}$, then $\delta(q, z) = reject$ (Comm: this means that the automaton $D_1$ “went off” the trajectory when “laying off” a trace either from $v_n$ and $v_{n+m}$ to $exit$, or from $v_i$ to $v_{i+1}$);
(b) if $u'_i = v_{i+1}$ holds for all $1 \le i < n+m$, $i \neq n$, and $u'_n = u'_{n+m} = exit$, and $(n, n+m) \in H$, then $\delta(q, z) = reject$ (Comm: this means that $D_1$ built a specification of a structure $M_\xi$ such that $[r(\pi', M_\xi)] = [r(\pi'', M_\xi)]$);
(c) otherwise $\delta(q, z) = accept$ (Comm: this means that $D_1$ built a specification of a structure $M_\xi$ such that both runs $r(\pi', M_\xi)$ and $r(\pi'', M_\xi)$ traverse the skeletons along the trajectories $w'$ and $w''$, but $[r(\pi', M_\xi)] \neq [r(\pi'', M_\xi)]$).
Case 2: at least one point $u'_i$, $1 \le i < n+m$, $i \neq n$, is an ordinary point, or both $u'_n$ and $u'_{n+m}$ are ordinary points. Then
$$\delta(q, z) = (u'_1, \dots, u'_{n+m}, H'),$$
where
$$H' = H \setminus \{(i, j) : u'_i \text{ and } u'_j \text{ are both ordinary points, and } B(u'_i) \neq B(u'_j)\}$$
is the new synchronization table. (Comm: this means that the traversal along some arcs in the trajectories is not completed yet; $B(u'_i) \neq B(u'_j)$ implies that the data paths in the specifications $Spec_{v_iv_{i+1}}$ and $Spec_{v_jv_{j+1}}$ diverge and the synchronization for these specifications is of no further consequence.) □
The DFA $D_1$ thus defined may be thought of as a one-way finite state machine operating on a tape divided into $n+m$ tracks. On the $i$-th track it simulates some fragment $r_i$ of the runs of the PSPs $\pi'$ and $\pi''$. Such a fragment $r_i = (v_j, s_j), \dots, (v_{f+1}, s_{f+1})$ begins with a pair whose point $v_j$ is either a switching point or entry, and ends with a pair whose point $v_{f+1}$ is either a switching point or exit. The synchronization tables of $D_1$ ensure that only those pairs of runs of $\pi'$ and $\pi''$ are simulated that could be performed on the same structure. The accepting conditions allow $D_1$ to complete the simulation of the runs successfully iff the last fragments of these runs yield different results. Thus we arrive at
Proposition 3. Let $w'$ and $w''$ be complete trajectories in the skeletons of $\pi'$ and $\pi''$ respectively, and let the DFA $D_1$ be as specified above. Then $L(D_1) \neq \emptyset$ iff there exists a structure $M_\xi$ such that the runs $r(\pi', M_\xi)$ and $r(\pi'', M_\xi)$ traverse the skeletons along the trajectories $w'$ and $w''$, and $[r(\pi', M_\xi)] \neq [r(\pi'', M_\xi)]$.
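To make the rules of $D_1$ and the emptiness test behind Proposition 3 concrete, here is a minimal Python sketch; it is our own illustration, not the authors' implementation. It explores the reachable states of $D_1$ explicitly, which can take exponential space (Theorem 1 instead relies on a nondeterministic polynomial-space search). Tracks are numbered from 0, conditions may be any hashable values, and, as our reading of the rules, exit is treated like a switching point in the "stay put" clause of rule 2 and in Case 1(a). The function name `d1_nonempty` and the dictionary encoding of $T$ and $B$ are hypothetical.

```python
from itertools import product

EXIT = "exit"
ACCEPT, REJECT = "accept", "reject"


def d1_nonempty(T, B, modes, conditions, w1, w2):
    """Return True iff some input drives D_1 into accept.

    w1 = [v_1, ..., v_n] and w2 = [v_{n+1}, ..., v_{n+m}] are the complete
    trajectories with the final exit omitted; T and B are shared by both PSPs.
    """
    n, m = len(w1), len(w2)
    starts = tuple(w1 + w2)
    targets = tuple(w1[1:] + [EXIT] + w2[1:] + [EXIT])  # v_{i+1}, or exit
    lasts = (n - 1, n + m - 1)                          # tracks n and n+m
    others = [i for i in range(n + m) if i not in lasts]

    def ordinary(p):
        return p != EXIT and B[p] not in modes

    def switching(p):
        return p != EXIT and B[p] in modes

    h0 = frozenset((i, j) for i in range(n + m) for j in range(i + 1, n + m)
                   if B[starts[i]] == B[starts[j]])
    q0 = (starts, h0)

    def delta(q, z):
        points, H = q
        if any(z[i] != z[j] for i, j in H):                    # rule 1
            return REJECT
        new = tuple(T[(u, d)] if q == q0 or ordinary(u) else u  # rule 2
                    for u, d in zip(points, z))
        if (all(not ordinary(new[i]) for i in others)
                and any(not ordinary(new[i]) for i in lasts)):  # Case 1
            if (any(switching(new[i]) for i in lasts)
                    or any(new[i] != targets[i] for i in others)):
                return REJECT                                   # (a)
            if new[lasts[0]] == EXIT and new[lasts[1]] == EXIT and lasts in H:
                return REJECT                                   # (b)
            return ACCEPT                                       # (c)
        keep = frozenset((i, j) for i, j in H                   # Case 2
                         if not (ordinary(new[i]) and ordinary(new[j])
                                 and B[new[i]] != B[new[j]]))
        return (new, keep)

    seen, stack = {q0}, [q0]
    while stack:
        q = stack.pop()
        for z in product(conditions, repeat=n + m):
            r = delta(q, z)
            if r == ACCEPT:
                return True
            if r != REJECT and r not in seen:
                seen.add(r)
                stack.append(r)
    return False
```

In the test data $\pi'$ always yields $ba$, while $\pi''$ yields $baa$ when $\xi(ba) = 0$, so $D_1$ accepts; redirecting $\pi''$ to exit on both conditions makes the two runs coincide on every structure, and $D_1$ becomes empty.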
It suffices to make only a few changes in the acceptance and rejection rules (Case 1) to transform the DFA $D_1$, intended for checking the requirement R1, into the DFAs $D_2$ and $D_3$ that are responsible for the requirements R2 and R3.
To construct $D_2$ one has to single out in $\pi''$ the strongly connected components that are free from switching points. Let
$$V_{inf} = \{v : \text{there is a cycle in } \pi'' \text{ which contains no switching points and passes via } v\}.$$
Then $D_2$ operates as follows: after constructing the tuple of points $(u'_1, \dots, u'_{n+m})$ (see item 2 in the description of $D_1$), it
• rejects (rule 2(b)) if $u'_i = v_{i+1}$ holds for all $1 \le i < n+m$, $i \neq n$, and $u'_n = exit$, and $u'_{n+m}$ is either a switching point or $exit$;
• accepts (rule 2(c)) if $u'_i = v_{i+1}$ holds for all $1 \le i < n+m$, $i \neq n$, and $u'_n = exit$, and $u'_{n+m} \in V_{inf}$.
It is worth noting that when at some computation step $D_2$ accepts after constructing a tuple $(v_2, \dots, v_n, exit, v_{n+2}, \dots, v_{n+m}, u)$ such that $u \in V_{inf}$, this should be interpreted as follows: $\pi''$ is presented with an unbounded possibility to continue an infinite (looping) run along some cycle on ordinary points.
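The set $V_{inf}$ is computable by plain graph search restricted to ordinary points: a point $v$ belongs to $V_{inf}$ iff $v$ can reach itself through ordinary points only. The quadratic Python sketch below is for illustration; a linear-time strongly connected components algorithm (for example Tarjan's) would serve equally well. The dictionary encoding of $T$ and $B$ and the name `v_inf` are our assumptions.

```python
def v_inf(T, B, modes, conditions, points, exit_point="exit"):
    """Points of a PSP lying on a cycle that passes only via ordinary points."""
    def ordinary(p):
        return p != exit_point and B[p] not in modes

    def ordinary_successors(p):
        return {T[(p, d)] for d in conditions if ordinary(T[(p, d)])}

    result = set()
    for v in points:
        if not ordinary(v):
            continue
        # Depth-first search from the successors of v inside the ordinary part.
        seen, stack = set(), list(ordinary_successors(v))
        while stack:
            p = stack.pop()
            if p == v:
                result.add(v)
                break
            if p not in seen:
                seen.add(p)
                stack.extend(ordinary_successors(p))
    return result
```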
Similarly to $D_2$, after constructing the tuple of points $(u'_1, \dots, u'_{n+m})$, $D_3$ rejects (rule 2(b)) if $u'_{n+m}$ is either $exit$ or a switching point other than $v_{n+1}, \dots, v_{n+m}$, and accepts (rule 2(c)) if $u'_n = exit$ and $u'_{n+m} \in \{v_{n+1}, \dots, v_{n+m}\}$.
Proposition 4. Let $w'$ be a complete trajectory and $w''$ be a repetition-free trajectory in the skeletons of $\pi'$ and $\pi''$ respectively. Let the DFAs $D_2$ and $D_3$ be as specified above. Then
1. $L(D_2) \neq \emptyset$ iff there exists a structure $M_\xi$ such that the runs $r(\pi', M_\xi)$ and $r(\pi'', M_\xi)$ traverse the skeletons along the trajectories $w'$ and $w''$, and $r(\pi', M_\xi)$ terminates, whereas $r(\pi'', M_\xi)$ loops on ordinary points of $\pi''$.
2. $L(D_3) \neq \emptyset$ iff there exists a structure $M_\xi$ such that the runs $r(\pi', M_\xi)$ and $r(\pi'', M_\xi)$ traverse the skeletons along the trajectories $w'$ and $w''$, and $r(\pi', M_\xi)$ terminates, whereas $r(\pi'', M_\xi)$ loops on switching points of $\pi''$.
Theorem 1. The equivalence problem $\pi' \sim_{M_0} \pi''$ for PSPs with mode switching is decidable in polynomial space.
Proof. By Savitch's theorem [30], any nondeterministic polynomial-space algorithm can be converted into a deterministic one. Therefore it suffices to design a nondeterministic polynomial-space procedure for checking non-equivalence of PSPs with mode switching. It works as follows. Given a pair of PSPs $\pi'$ and $\pi''$, it builds their skeletons $G_{\pi'}$, $G_{\pi''}$ and guesses a complete trajectory $w'$ in one of the skeletons and a repetition-free trajectory $w''$ in the other. Then it checks the emptiness of the DFAs $D_1$, $D_2$ and $D_3$ corresponding to $\pi'$, $\pi''$ and the selected trajectories. Propositions 2-4 guarantee that $\pi' \not\sim_{M_0} \pi''$ iff for some pair of trajectories one of the DFAs $D_i$, $i = 1, 2, 3$, is non-empty. As may be seen from the descriptions of the DFAs $D_i$, these automata have $O(2^{\mathrm{poly}(n)})$ states, where $n = |\pi'| + |\pi''|$. Hence each state can be written down with $\mathrm{poly}(n)$ bits, a path to an accepting state can be guessed one state at a time, and the non-emptiness of $D_i$ may therefore be certified within polynomial space. □
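The polynomial space bound rests on the fact that reachability in a graph with at most $2^{s}$ vertices can be decided deterministically in $O(s^2)$ space. The Python sketch below shows the divide-and-conquer reachability test behind Savitch's theorem, applied to non-emptiness of an automaton given implicitly: `succ(q)` returns the successors of state $q$ over all input letters and `states()` enumerates all states. Both are assumed interfaces; for the DFAs $D_i$ a state would be a bit string of length $\mathrm{poly}(n)$, so both the recursion depth $s$ and each stack frame stay polynomial.

```python
def reach(succ, states, src, dst, k):
    """True iff dst is reachable from src in at most 2**k steps.

    Each recursion level keeps only src, dst and the current midpoint, so the
    space used is O(k * s) for states written with s bits.
    """
    if k == 0:
        return src == dst or dst in succ(src)
    return any(
        reach(succ, states, src, mid, k - 1) and reach(succ, states, mid, dst, k - 1)
        for mid in states()
    )


def nonempty(start, finals, succ, states, s):
    """Language non-emptiness for an automaton with at most 2**s states."""
    return any(reach(succ, states, start, f, s) for f in finals)
```

The running time of this procedure is exponential in $s$; only its space consumption matters for Theorem 1.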
5. Complexity issues
The complexity of the decision procedure above cannot be substantially improved.
Theorem 2. The equivalence problem for PSPs with mode switching is PSPACE-complete.
Proof. We demonstrate how to build in linear time, given a family of $n$ DFAs $F_i$, $1 \le i \le n$, of size $|F_i| = n$, a PSP $\pi_0$ of size $|\pi_0| = n^2$ such that the intersection $\bigcap_{i=1}^{n} L(F_i) = \emptyset$ iff $\pi_0$ has no terminating runs, i.e. $\pi_0$ is equivalent to the empty PSP.
Without loss of generality, we may assume that each DFA $F_i$ operates on the binary input alphabet $\{0,1\}$, i.e. $F_i = (Q_i, \delta_i, q_0^i, Q_F^i)$, where $Q_i$ is a finite set of states, $\delta_i : Q_i \times \{0,1\} \to Q_i$ is a (total) transition function, $q_0^i \in Q_i$ is the starting state, and $Q_F^i \subseteq Q_i$ is the set of final states. Let $A_{\mathrm{mode}} = \{b\}$, $A_{\mathrm{ord}} = \{a\}$ and $P = \{p_1, p_2\}$. For every DFA $F_i$ we build the PSP $\pi_i$ with the set of points $Q_i \cup \{\mathit{dead}, \mathit{entry}_i, \mathit{exit}_i\}$, initial point $\mathit{entry}_i$ and terminal point $\mathit{exit}_i$, whose transition function $T_i$ is defined as follows for every $q \in Q_i$ and $x \in \{0,1\}$:
3. $T_i(\mathit{dead}, A) = \mathit{dead}$.
It is easy to see from the construction of $\pi_i$ that a binary string $w = x_1 x_2 \cdots x_k$, $k > 0$, is accepted by $F_i$ iff the run $r(\pi_i, M_{\xi_w})$ terminates on a structure $M_{\xi_w}$ such that $\xi_w(b a^k) = (x_k, 1)$ and $\xi_w(b a^i) = (x_i, 0)$ for all $1 \le i < k$.
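The structure $M_{\xi_w}$ stores the word $w$ in the valuations of the data states $b a^i$: the first component carries the letter $x_i$ and the second one marks the last letter. The Python sketch below only illustrates this encoding (data states $b a^i$ are keyed by $i$, a representation chosen here for convenience) and how a DFA $F_i$, given as a hypothetical triple `(delta, start, finals)`, could read $w$ back from it; it does not reproduce the transition rules of $\pi_i$.

```python
def xi_w(word):
    """Valuation of the data states b a^i (keyed by i) for a non-empty word."""
    k = len(word)
    return {i: (word[i - 1], 1 if i == k else 0) for i in range(1, k + 1)}


def read_word(xi):
    """Follow b a, b a^2, ... collecting p_1 until p_2 = 1."""
    bits, i = [], 1
    while True:
        x, last = xi[i]
        bits.append(x)
        if last == 1:
            return bits
        i += 1


def accepts(dfa, xi):
    """Run a DFA (delta, start, finals) on the word stored in xi."""
    delta, q, finals = dfa
    for x in read_word(xi):
        q = delta[(q, x)]
    return q in finals
```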
The PSP $\pi_0$ is obtained from the family of PSPs $\pi_1, \ldots, \pi_n$ by identifying each point $\mathit{exit}_i$, $1 \le i \le n-1$, with the point $\mathit{entry}_{i+1}$. It is easy to verify that a binary string $w$ is accepted by every DFA $F_i$ iff $r(\pi_i, M_{\xi_w})$ terminates for every PSP $\pi_i$ iff $r(\pi_0, M_{\xi_w})$ terminates.
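Gluing the programs $\pi_1, \ldots, \pi_n$ is a purely syntactic operation. The sketch below assumes a hypothetical representation of a PSP as a dictionary with an entry point, an exit point and a list of labelled edges, and that the point names of different $\pi_i$ are disjoint (for instance, each $\mathit{dead}$ point tagged with its index). Since each $\pi_i$ has $O(n)$ points, the result has $O(n^2)$ points and is built in time linear in the size of the input family.

```python
def chain(programs):
    """Glue PSPs pi_1..pi_n by identifying exit_i with entry_{i+1}.

    Each program is a dict with keys "entry", "exit" and "edges", where
    edges is a list of (source, label, target) triples (hypothetical encoding).
    """
    edges = []
    for idx, prog in enumerate(programs):
        rename = {}
        if idx + 1 < len(programs):
            # exit_i becomes entry_{i+1}; the last exit stays terminal
            rename[prog["exit"]] = programs[idx + 1]["entry"]
        for p, label, q in prog["edges"]:
            edges.append((rename.get(p, p), label, rename.get(q, q)))
    return {"entry": programs[0]["entry"], "exit": programs[-1]["exit"], "edges": edges}
```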
Finally, consider any PSP $\pi_{\mathrm{empty}}$ such that the terminal point $\mathit{exit}$ is unreachable from the initial point $\mathit{entry}$ (PSPs of this kind are called empty PSPs). Clearly, an empty PSP has no terminating runs, and therefore
$\bigcap_{i=1}^{n} L(F_i) = \emptyset$ iff $\pi_0 \sim_{M_0} \pi_{\mathrm{empty}}$. Thus the Intersection Emptiness Problem for DFAs, which is known to be PSPACE-complete, is reducible in linear time to the equivalence problem for PSPs with mode switching. □
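For contrast, the problem reduced here can be solved directly by exploring the product of the automata $F_1, \ldots, F_n$, a standard construction that may visit up to $\prod_i |Q_i|$ tuples of states, i.e. exponentially many in $n$. The Python sketch below assumes each DFA is given as a triple `(delta, start, finals)` with `delta` a total transition dictionary over the alphabet $\{0,1\}$; it performs a breadth-first search for a reachable tuple in which every component is final.

```python
from collections import deque


def intersection_empty(dfas, alphabet=(0, 1)):
    """True iff no word is accepted by every DFA (delta, start, finals)."""
    start = tuple(d[1] for d in dfas)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # A tuple of final states witnesses a word in the intersection.
        if all(q in d[2] for q, d in zip(state, dfas)):
            return False
        for x in alphabet:
            nxt = tuple(d[0][(q, x)] for q, d in zip(state, dfas))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```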
6. Conclusion and Future Work
We have demonstrated that the equivalence problem for programs with mode switching is PSPACE-complete. In fact, the semantics of these programs makes it inevitable to face the Intersection Emptiness Problem for DFAs. This yields a decision procedure which is confined to manipulations with finite automata, but at the expense of fairly high complexity. One could find some analogy between our construction of the DFAs $D_i$ and the "parallel stacking" technique introduced in [31]. The results obtained pose a number of open problems, some of which we discuss below.
One of the simplest classes of programs with an undecidable equivalence problem was studied in [22]. The programs from this class are composed of four basic statements $A = \{a_1, a_2, b_1, b_2\}$. In the framework of PSPs the semantics of such programs is specified by the set of structures $M_2 = \{M_\xi = (\mathbb{N}^2, (0,0), R', \xi)\}$, where $\mathbb{N} = \{0, 1, 2, \ldots\}$, $R'(a_1, (n,m)) = (n+1, m)$, $R'(a_2, (n,m)) = (n, m+1)$, $R'(b_1, (n,m)) = (0, m)$, and $R'(b_2, (n,m)) = (n, 0)$. Notice that the constant assignment statements $b_1, b_2$ are but a sort of (separate) mode switches. In [7, 22] it was proved that two-head finite state automata can be simulated by PSPs in the model $M_2$, which makes the equivalence problem for $M_2$ undecidable.
In essence, $M_2$ captures two main features of real program semantics: the effect of commutativity of some statements ($a_1$ and $a_2$) and the effect of constant assignments ($b_1$ and $b_2$). By separating these effects from each other we arrive at the model of programs with commutative statements $M_1$ and the model of programs with mode switching $M_0$. In [26, 32] it was shown that the equivalence problem for $M_1$ is decidable in time $O(n^2 \log n)$. In this paper we have proved that the same problem for $M_0$ is PSPACE-complete. To draw the border between decidable and undecidable cases more precisely, we ask: what is the complexity of the equivalence problem for the model $M_{01}$, which may be placed between $M_2$ and both $M_0$ and $M_1$? In contrast to $M_2$, which involves two separate mode switches $b_1$ and $b_2$, in $M_{01}$ we restrict ourselves to a single joint mode switch $b_0$ such that $R'(b_0, (n,m)) = (0,0)$. Clearly, $M_{01}$ is less expressive than $M_2$. The equivalence problem for $M_{01}$ was studied in [6], but its complexity is still unknown.
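The data-state semantics of $M_2$ and $M_{01}$ is easy to spell out. The Python sketch below encodes the functions $R'(s, (n,m))$ of both models as dictionaries of lambdas and applies a sequence of statements to the initial state $(0,0)$. It assumes that $M_{01}$ keeps the commutative statements $a_1, a_2$ of $M_2$ and replaces $b_1, b_2$ by $b_0$, and it ignores the valuation $\xi$ entirely.

```python
# Data-state transformers R'(s, (n, m)) of the model M_2.
R_M2 = {
    "a1": lambda n, m: (n + 1, m),
    "a2": lambda n, m: (n, m + 1),
    "b1": lambda n, m: (0, m),
    "b2": lambda n, m: (n, 0),
}

# Assumed variant M_01: the same a1, a2 and one joint mode switch b0.
R_M01 = {
    "a1": R_M2["a1"],
    "a2": R_M2["a2"],
    "b0": lambda n, m: (0, 0),
}


def data_state(R, statements, start=(0, 0)):
    """Apply a sequence of statement names to the initial data state."""
    n, m = start
    for s in statements:
        n, m = R[s](n, m)
    return (n, m)
```

For instance, `a1 a2` and `a2 a1` lead to the same data state in both models, which is the commutativity effect mentioned above, while `b0` resets both coordinates at once.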
The equivalence-checking algorithms developed in the framework of this model of programs may be used as the basis for deobfuscation tools aimed at detecting metamorphic viruses. This could stimulate the development of equivalence-checking procedures for $M_0$ that are more practical and efficient than the one from Theorem 1. For example, we wonder whether it is possible to check the equivalence $\pi' \sim_{M_0} \pi''$ within polynomial time provided that every mode switch occurs in $\pi'$ and $\pi''$ at most $k$ times, where $k$ is fixed.
References
[1] M. Christodorescu, S. Jha. Static analysis of executables to detect malicious patterns. In Proceedings of the 12th USENIX Security Symposium (Security’03), 2003, p. 169-186.
[2] M. Christodorescu, S. Jha, S.A. Seshia, D. Song, R. E. Bryant. Semantics-aware malware detection. In Proceedings of the 2005 IEEE Symposium on Security and Privacy (Oakland 2005), 2005, (to be published).
[3] C. Collberg, C. Thomborson, D. Low. Manufacturing cheap, resilient, and stealthy opaque constructs. Symposium on Principles of Programming Languages, 1998, p. 184-196.
[4] A.P. Ershov. Theory of program schemata. In Proc. of IFIP Congress 71, Ljubljana, 1971, p. 93-124.
[5] E.P. Friedman. Equivalence problem for deterministic context-free languages and monadic recursive schemes. J. Comput. and Syst. Sci., 14, No. 3, 1977, p. 344-359.
[6] A.B. Godlevsky. Some special cases of the termination and equivalence problems for automata. 1973, No. 4, p. 90-98 (in Russian).
[7] A.B. Godlevsky. On the one case of special problem of functional equivalence for discrete transducers. Cybernetics, 1974, No. 3, p. 32-35 (in Russian).
[8] J.E. Hopcroft, R.M. Karp. A linear algorithm for testing equivalence of finite automata. Technical Report TR 71-114, Cornell University, Computer Science Dep., 1971.
[9] Iu.I. Ianov. On the equivalence and transformation of program schemes. Communications of the ACM, 1, No. 10, 1958, p. 8-12.
[10] G. Karakostas, R.J. Lipton, A. Viglas. On the complexity of intersecting finite state automata and NL versus NP. Theoretical Computer Science, 302, 2003, p. 257-274.
[11] A. Kornai. Vectorized finite state automata. In: Proceedings of the W1 workshop of the 12th European Conference on Artificial Intelligence, Budapest, 1996, p. 36-41.
[12] D. Kozen. Lower bounds for natural proof systems. In 18th Annual Symposium on Foundations of Computer Science, IEEE, 1977, p. 254-266.
[13] A.A. Letichevsky. On the equivalence of automata with final states on the free monoids having right zero. Reports of the Soviet Academy of Science, 182, No. 5, 1968 (in Russian).
[14] A.A. Letichevsky. Functional equivalence of discrete transducers. Cybernetics, 1970, No. 2, p. 14-28.
[15] A.A. Letichevsky. Functional equivalence of finite transducers. III. Cybernetics, 1972, No. 1, p. 1-4 (in Russian).
[16] A.A. Letichevsky. On the equivalence of automata over semigroup. Theoretic Cybernetics, 6, 1970, p. 3-71 (in Russian).
[17] A.A. Letichevsky. Equivalence and optimization of programs. In Programming theory, Part 1, Novosibirsk, 1973, p. 166-180 (in Russian).
[18] A.A. Letichevsky, L.B. Smikun. On a class of groups with solvable problem of automata equivalence. Sov. Math. Dokl., 17, No. 2, 1976, p. 341-344.
[19] L.P. Lisovik. Methalinear schemes with constant assignment. Programming and Computer Software, No. 2, 1985, p. 29-38.
[20] L.P. Lisovik. Hard sets method and semilinear reservoir method with applications. Lecture Notes in Computer Science, 1099, 1996, p. 219-231.
[21] D.C. Luckham, D.M. Park, M.S. Paterson. On formalized computer programs. J. Comput. and Syst. Sci., 4, No. 3, 1970, p. 220-249.
[22] G.N. Petrosyan. On one basis of statements and predicates for which the emptiness problem is undecidable. Cybernetics, 1974, No. 5, p. 23-28 (in Russian).
[23] R.I. Podlovchenko. The hierarchy of program models. Programming and Computer Software, No. 2, 1981, p. 3-14.
[24] R.I. Podlovchenko. Semigroup program models. Programming and Computer Software, No. 4, 1981, p. 3-13.
[25] R.I. Podlovchenko. On the decidability of the equivalence problem on a class of program schemata having monotonic and partially commutative statements. Programming and Computer Software, No. 5, 1990, p. 3-12.
[26] R.I. Podlovchenko, V.A. Zakharov. On the polynomial-time algorithm deciding the commutative equivalence of program schemata. Reports of the Soviet Academy of Science, 362, No. 6, 1998 (in Russian).
[27] R.I. Podlovchenko. On program schemes with commuting and monotonic statements. Programming and Computer Software, No. 5, 2003, p. 46-54.
[28] K. Rohloff, S. Lafortune. On the computational complexity of the verification of modular discrete-event systems. In Proceedings of IEEE Conference on Decision and Control, Las Vegas, NV, Dec. 2002.
[29] J.D. Rutledge. On Ianov's program schemata. Journal of the ACM, 11, 1964, p. 1-9.
[30] W.J. Savitch. Relationships between nondeterministic and deterministic space complexities. Journal of Computer and System Sciences, 4, No. 2, 1970, p. 177-192.
[31] L.G. Valiant. The equivalence problem for deterministic finite-turn pushdown automata. Information and Control, 25, 1974, p. 123-133.
[32] V.A. Zakharov. An efficient and unified approach to the decidability of equivalence of propositional program schemes. Lecture Notes in Computer Science, 1443, 1998, p. 247-258.
[33] V.A. Zakharov. On the decidability of the equivalence problem for monadic recursive programs. Theoretical Informatics and Applications, 34, No. 2, 2000, p. 157-171.
[34] V.A. Zakharov. The equivalence problem for computational models: decidable and undecidable cases. Lecture Notes in Computer Science, 2055, 2001, p. 133-153.
[35] V.A. Zakharov, I.M. Zakharyaschev. An equivalence-checking algorithm for polysemantic models of sequential programs. Proceedings of the International Workshop on Program Understanding (14-16 July, Altai Mountains, Russia), 2003, p. 59-70.
[36] V.A. Zakharov, I.M. Zakharyaschev. On the equivalence checking problem for a model of programs related with multi-tape automata. Lecture Notes in Computer Science, 3317, 2005, p. 293-305.