Program obfuscation as obstruction of program static analysis

Ivanov K.S.; Zakharov V.A.

Program obfuscation as obstruction of program

static analysis

Kirill S. Ivanov, Vladimir A. Zakharov

Abstract. By obfuscation we mean any efficient semantic-preserving transformation of computer programs aimed at bringing a program into such a form, which impedes the understanding of its algorithm and data structures or prevents the extracting of some valuable information from the plaintext of a program. Some results obtained recently testify that it is very much uneasy to develop a provably secure program obfuscator unless a capability of an adversary is restricted weakly. But in many practical cases when program re-engineering activity is restricted only to the application of available static analysis algorithms an acceptable security of obfuscation can be achieved by guaranteeing the obfuscated programs to obstruct completely some definite static analysis technique. This approach to designing program obfuscators is commonly used now though no rigorous study of its efficiency and security is made so far. In this paper we give a preliminary framework for studying formally program obfuscators as tools for the obstruction of program static analysis. We present a criterion which provides a way of estimating the actual efficiency of static analysis obstruction and demonstrate with example of two typical static analysis algorithms (Andersen’s points-to analysis algorithm and Binkley-Horwitz-R.eps slicing algorithm) how to use our approach in practice.

1. Introduction

Protection of software against intelligent tampering and unauthorized purposeful modifications is one of the central issues in computer security. An advent of global computer networks has given rise to new computational environments which provide remote execution, distributed computing, code mobility etc. Although new models of computation bring great flexibility and new promises to the world of computing they are accompanied with a set of new problems that were not present when computation was carried out for the most part on local stand-alone machines.

One of the most difficult security problem is that of protecting a code from an untrustworthy environment which is known as malicious host problem. One faces the challenge of the malicious host problem every time when downloads or installs a code on a host computer which is shared by a second party. In this case a client program becomes a subject of various threats, such as

• software piracy, the illegal copying and resale of applications;

• malicious reverse engineering, the illegal extraction of valuable pieces of code for the purpose of unauthorized using of them in other applications;

• program tampering, the illegal extraction of valuable information from a code for the purpose of unauthorized using of the code.

Unlike many similar problems (say, malicious code problem) the malicious host problem has not been studied very much intensively and rigorously. Various methods and tools are widely used for the purpose of software protection, including sophisticated security policies, network filters, cryptosystems, tamper-resistant hardware, etc. [1, 12, 13, 19]. But no matter how powerful these techniques may be, they don’t cover the case when an adversary, having in mind to make an illegal modification of a program or to gain some valuable knowledge about algorithms or data structures, gets an access to the plaintext of a program. The current trends in software engineering and communication technology make it common to distribute software in such a form that hold most of information presented in the program source code. This increases drastically the risk of reverse engineering attacks aimed at extracting secret information from a program. An important example is Java bytecode. Java applications are distributed as Java class files, hardware-independent virtual machine codes that retains almost all information of the original Java source. The customary cryptographic tools can be very effective for protecting programs from illegal usage at the stage of their distribution. But when a program is decrypted, it becomes extremely vulnerable to software pirates seeking for private information (passwords, data-keys, etc.) or valuable pieces of code to incorporate them in their own applications.

In these cases the only way to prevent such malicious activity is to convert a program into some tamper-resistant form, which has the property that understanding and making purposeful modifications to it are rendered difficult while its original functionality and efficiency are preserved. Program transformations of this kind are called obfuscating transformations.

Obfuscation attempts to transform a program into an equivalent one that is harder for understanding and reverse-engineering. The concept of program obfuscation was introduced in [8]. In this paper an obfuscator is defined informally as any efficient probabilistic compiler O which transforms any source program 7r into an equivalent program O(tt) which is substantially less intelligible (readable) than 7r. In succeeding papers [9, 10] this concept has been specified in more details and the program obfuscation problem has been set up as follows.

Given a set of programs V, V C U, find a transformation O : V —> U such that for every program 7r from V an obfuscated program ti' = O(tt) should satisfy the following requirements.

semantic-preservation: tx' has the same observable behavior as 7r;

obscurity: understanding and reverse engineering tv' is strictly more time-consuming than understanding and reverse engineering 7r;

resilience: it is either difficult to construct an automatic tool to undo the obfuscation transformation, or executing such a tool is extremely time-consuming;

stealth: the statistical properties of w' are similar to those of 7r;

cost: execution time/space penalty incurred by the obfuscation of 7r are small.

The key issue in this definition is an obscurity of program obfuscation. Unlike other requirements this item does not admit a uniform indubitable formalization since it involves such loose concepts as “understanding”, “semantic interpretation”, etc. Nevertheless in [3, 14, 27] serious efforts were made to find some suitable formalism for capturing the obscurity requirement. As early as 1998 J.MacDonald [22] noticed that to protect a program “we are in search of security, however, and not obscurity, so a formal proof of security is a necessity”.

To formalize the obscurity requirement Barak, Goldreich, et al. introduced in [3] the concept of “perfect” obfuscation. Intuitively, a program obfuscator O is called perfect if it transforms any program 7r into a “virtual black box” O(tt) in the sense that anything one can efficiently compute given O(tt), one should be able to efficiently compute given just oracle access to 7r. The main result of [3] implies that perfect obfuscation is impossible. This is proved by constructing a family J7 of functions and a predicate P : !F —> {0,1} such that, given any program that computes a function / G the value P{f) can be efficiently computed, whereas no efficient algorithm, being given only oracle access to a randomly selected / G can compute P{f) much better than by random guessing.

An alternative approach to the development of the rigorous concept of obfuscation security was initiated in [27]. In this paper a formal information-theoretic definition of obfuscation security was introduced. Informally, O obfuscates securely a property P of programs from some ensemble if any adversary A, being granted anaccess to programs from 0{7i), can extract in polynomial time only a “negligibly small” amount of information on P. This definition was also provided with novel obfuscation technique which takes advantage of cryptographic primitives (one-way functions, hard-core predicates). To show that the new formalization of obfuscation security enables one to design provably secure program obfuscators one meaningful example was considered as a case study. In this example a class of programs that implement a simple fixed password identification scheme is taken for and the property of a program to execute actual password checking is considered as a secret property P . A program obfuscator

O is designed and, assuming that one-way permutations exist, it is proved that O securely obfuscates P. Moreover, close relationships between obfuscation problem and basic cryptographic primitives was established by demonstrating that a secure obfuscation of P implies the existence of one-way functions.

But the mainstream of research in program obfuscation has been focused on the development of obfuscation techniques which admit a straightforward application to the solution of malicious host problem without giving strong evidence of their security. A taxonomy of basic obfuscating transformations was presented in [8]. It includes

lexical transformations: scrambling of identifier, deleting comments, etc.;

control transformations: inserting dead pieces of code and faked branchings, unfolding of loops, splitting and merging control flow structures etc.;

data transformations: variable splitting, data encoding, converting static data into dynamic, etc.

The most important concept introduced in [8] is that of opaque predicate. A predicate R is called opaque if its outcome is known at obfuscation time but is difficult for the deobfuscator to deduce. Given such opaque predicates it is possible to construct obfuscating transformations that entangle control flow of a program so that no adversary could understand the intended meaning of the program components. It looks like the very essence of obfuscation is concentrated in this notion and the development of secure opaque predicates may be regarded as a prime task of program obfuscation.

The main difficulty in solving this task is in providing security of opaque predicates. But if an adversary is assumed to be restricted in her/his potency and only currently available automatic tools are allowable for deobfuscation then an obscurity of an opaque predicate R can be achieved by obstructing relevant program static analysis techniques. It is well known (see [17, 20, 26]) that precise static analysis of programs containing complex data structures and pointers is either undecidable or extremely time consuming. Therefore one could obstruct a static analysis toolset by using for obfuscation such an opaque predicate whose precise analysis by means of the selected toolset is a hard problem.

This approach was implemented in many papers. In [9, 11] it was shown how strong opaque predicates can be constructed based on intractability of alias analysis. The basic idea is to extend the program to be obfuscated with a code that builds a set of complex dynamic structures. The introduced code will occasionally update the structures (modifying pointers, splitting and merging fields, etc.) while maintaining certain invariants. These invariants are then used to manufacture opaque predicates as needed. In [6, 28] it is demonstrated that 3-SAT problem which is known to be NP-complete can be simulated by means

of general pointers. This makes it possible to design opaque predicates whose precise analysis is an NP-complete problem. A more sophisticated using of function pointers is used in [24] for the same purpose. In [7] it was shown that any PSPACE-complete problem can be used as the basis for manufacturing opaque predicate and we could restrict ourselves to scalar boolean variables only.

The principal drawback of all these techniques is that none of them has a formal basis for making claims about the difficulty of understanding an obfuscated program. Therefore it is hardly possible to estimate to what extent such methods serve the purpose — only the reference to the intractability of some static analysis problems or to the hardness of combinatoric problem embedded into obfuscated programs is not sufficient. Some highly speculative measures for potency and resilience of obfuscating transformations were introduced in [6, 8, 30] for the purpose of estimating the potency of obfuscating transformations. Usually, given a measure of program complexity E, a potency of obfuscating transformation O in relation to obfuscated program 7r is defined as

The problem of measuring program complexity is quite long-term. In the frame of software theory lot of different measures were created (see [6] for survey). But all these measures characterize only static issues of program (e.g. length of program, number of variables, cyclomatic number of control flow graph, etc) and none of them gives an acceptable fit to our purpose.

In this paper we introduce a framework for the rigorous study of program obfuscators as tools for the obstruction of program static analysis. Our main idea is as follows. A program static analysis tool A computes a mapping A : 1A —> W, where 1A is a set of programs under consideration and W is a set of possible outcomes of static analysis. For example, when static analysis is applied for detecting useless variables a powerset V^Var^), where Varv is the set of all variables of a program tt, stands for W. As a rule, a set W of possible outcomes constitutes a lattice which has the minimal element _L. This distinguished element may be interpreted as a lack of any significant information about the program property analyzed by means of A. In the case of useless variables detection _L = 0 which means that static analyzer A can not detect any useless variable.

Definition. Given a program obfuscator O and a program static analyzer A we say that O is impervious to A if A(0(n)) = _L holds for any program 7r.

Thus, a program obfuscator O impervious to static analyzer A makes this toolset unsuitable for the purpose of program decompilation. Imperviousness criterion has several advantages over other measures for the security/obscurity

of obfuscating transformations. Unlike complexity-theoretic [3, 14] and information-theoretic [27] criteria it may be used not only in theory but also in practice for demonstrating resilience of certain obfuscating transformations against possible reverse engineering attacks. Imperviousness criterion defined thus does not appeal to the worst-case complexity of static analysis (cf. [7, 8, 9, 10, 11, 24, 28, 30]) and, hence, it could be used to guarantee absolute resilience of obfuscating transformations against some specific decompilation tools. Finally, given a specific static analysis algorithm A and an obfuscation transformation O one could prove formally that O is impervious to A. The aim of this paper is to demonstrate that the latter is possible at least for some simple program analysis algorithms. We take two typical static analysis algorithms (Andersen’s points-to analysis algorithm Ai and Binkley-Horwitz-Reps slicing algorithm A2) as examples. For each of these algorithms Ai, i = 1,2, we develop a corresponding obfuscating transformation O.j and prove that O.j is impervious to A,.

The paper is organized as follows. In Section 2 we discuss briefly the problems of pointer analysis and slicing for computer programs. We introduce also the model of programs these algorithms deal with. In Section 3 we consider in more details a flow-insensitive points-to analysis algorithm Ai presented in [2], introduce our obfuscating transformation 01 aimed at obstructing A\, prove its imperviousness to A\, and estimate the cost of obfuscation. In Section 4 the same investigation is applied to a slicing algorithm A2 presented in [18]. Finally, in Section 5 we discuss further lines of research.

2. Program static analysis

Program static analysis is a branch in formal methods which incorporate a variety of techniques intended for deducing the safe information about program behavior immediately from the text of a program. Static analysis holds an intermediate position between complete formal verification of programs and program testing. Static analysis algorithms are used mostly for revealing some specific semantic properties of programs (like dead code detection, constant propagation, etc.) which may be helpful for program optimization, understanding, re-using, etc. The efficiency of such algorithms is characterized by complexity and accuracy. Therefore to obstruct static analysis one has to bring a program into such a form which is either intractable for the current static analysis algorithms or drastically degrade their accuracy. Since many static analysis algorithms operate in polynomial time, the only way to obstruct them is to reduce the precision of their outcome to the minimal possible level.

In this article we will consider two typical static analysis algorithms. One of these algorithms is intended for detecting aliasing. Two variables x and y are said to be aliasing if they could refer to the same memory location during a program execution. Alias analysis is necessary for revealing implicit updating

of variables. It is well known that aliasing problem is undecidable in general (see [26]) and remains hard even for simple programs (see [5, 17]). Nevertheless in [2, 23] (see also [4, 15] )it was shown that some reasonable approximation for pointer aliasing can be computed in polynomial time. Each of these algorithms associates a pointer p with a set of variables W£ and guarantees that no variables beyond Wp can be alias of p. The precision of an alias analysis algorithm is

#(wc)

characterized by the ratio #(»•“) ’ where Wp is the set of actual aliases of a pointer p. Clearly, the least possible precision is attained when W£ coincides with the set of all variables declared in the program. In this case alias analysis by means of fast algorithms from [2, 23] are completely obstructed. In the next section we will demonstrate how to transform a program into such a form which guarantees a complete obstruction of fast alias analysis. According to the definition above this program transformation is impervious to the fast alias analysis. For the sake of simplicity we restrict ourselves with flow-insensitive and context-insensitive variants of alias analysis algorithms.

Another type of static analysis algorithms considered in this paper are static slicing analysis algorithms. Program slicing is a viable method to restrict the focus of a task to specific sub-components of a program. A slice [29] is an executable subset of program statements that preserves the original behavior of the program with respect to a subset of variables of interest and at a given program point. The applications of slicing technique include debugging, testing, program comprehension, restructuring, downsizing, and parallelization. The most advanced algorithms (see, e.g. [18]) compute static slices as backward traversals of a Program Dependence Graph (PDG) [25], a program representation where nodes represent statements and predicates while edges carry information about control and data dependencies. A PDG based algorithm considers slicing criteria of the type (p, V), where p is a program point and V is a set of variables referenced at p. A slice with respect to such a slicing criterion consists of the set of nodes that directly or indirectly affect the computation of of the variables in V at node p.

Since slices are very much helpful for revealing hidden dependencies between data structures and program components, program obfuscation has to be resilient against slicing algorithms.The efficiency of a slicing algorithm C is char-

acterized by its time complexity and by the ratio s^, where SpV is the set of program points included into the slice computed by the algorithm with respect to a slice criterion (p, V), and <S'“y is the minimal set of program nodes that are necessary to be kept intact for computing correctly the values variables from V at point p. The worst possible solution to static slicing problem is given by the trivial slicing algorithm which outputs as S'® v the set S™v of all program points that precede p in a Control Flow Graph. Thus, according to the definition above program obfuscation is said to be impervious to a slicing

algorithm C if for every slicing criterion (p, V) an algorithm C after being applied to an obfuscated program 0{ tt) outputs the worst possible slice S™v as the result. In this case static slicing algorithm is completely obstructed. In the next section we will demonstrate how to transform a program into such a form which guarantees a complete obstruction of static slicing algorithm introduced in [18].

For the sake of clarity we restrict ourselves with considering programming language whose syntax includes only simple data types, assignment statements, if-then-else statements, while-do statements and function calls.

3. Obstructing alias (points - to) analysis

3.1. Points-to analysis algorithms

Given a program 7r a points-to analysis algorithm C computes a function fct7r '■ V ar —> 2V “r such that for every variable x the set fc,ir{x) includes all variables x may refer to along some run of 7r. Usually the function fc^ is presented as a directed graph whose nodes are associated with variables or sets of variables. An arc from a node p to a node q means that q is included in

Since the problem of detection given a program 7r and a pair of variables p, q whether 7i admits a computation such that p refers to q is undecidable, any function fc}7T '■ Vciv —2V “r computed by a points-to analysis algorithm only approximates true points-to relation between program variables.

We will consider three points-to analysis algorithms presented in [23], [16], and

[2]. The most fast algorithm is that by Steensgaard [23] (in what follows this algorithm will be identified as S-algorithm ), and the most precise is that by Andersen [2] (A-algorithm). The points-to analysis algorithm presented by Shapiro and Horwitz [16] (SH-algorithm) has a parameter k. This parameter allows one to tune speed and precision of the algorithm. If k = 1 then it gives the same result as that of Steensgaard’s algorithm, and if k = n (where n is the number of variables) then it’s result is the same as that of Andersen’s algorithm.

To explain in more details the difference between these algorithms let us consider the following program:

a = &b

a = &d

a = &c

c = &d

144

The results of each algorithm are as follows:

1. Steensgaard’s algorithm (Shapiro and Horwitz with k = 1)

2. Shapiro and Horwitz algorithm with fc = 2

3.2. Equivalent transformations

Any insertion of the assignment statement in some program point may change the value of at least one variable and affect program runs. To preserve program behavior we will use only those transformation that restore or keep the value

of the affected variable until it’s next use in predicate, calling of a function or in a right-hand side of some assignment statement. Examples of equivalent transformations are presented below.

1. Use of opaque predicates. A predicate P is called opaque predicate if it’s value is constant and it is known before the compilation but is hard for computing by means of static analysis algorithms. Instructions of the inserted block that affect the value of p can be hidden in a branch of conditional statement with a predicate P = 0. For example:

p = &b;

//beginning of a block

if(P) //P = 0

p = q; // hiding obfuscation instruction

a = *p + 3c;

//end of a block

2. The simplest way to preserve the values of selected variables is to use extra variable. In what follows it will be demonstrated that extra variable won’t affect anything in the obfuscation process in this case. Example:

p = &b;

/ / inserting a block v = p; //storing a value

p = q;

p = v; //restoring a value

a = *p + 3c;

3.3. Obfuscation algorithm Oi

Since Andersen’s algorithm [2] is the most precise points-to analysis algorithm, it is suffice to show how to obstruct this static analysis technique.

The main weakness of points-to analysis algorithms under consideration lies in their flow-insensitivity. If some fragment contains a pair of instructions

p = &a;

q = &a;

then Andersen’s algorithm characterizes variables p and q as an aliased pair of pointers in relation to value a independently of points of their occurrence in a program. One should also notice that instructions like p = q expand the set of possible values of p with the set of q (Xp = Xp U Xq, where X; is the set of possible values for pointers i).

The main idea of our obfuscation technique is based on inserting additional assignment and de-reference statements and using additional pointer variables aiming at making all pointers of the same reference level aliased (their set of possible values will be equal). Since now by the set of pointer p we will mean a set of possible values of a pointer p at the end of the points-to analysis. The obfuscation algorithm is as follows:

1. Pointer Analysis.

• collecting information about pointers, determine the maximum pointing level (mark it as max_level);

• collecting the set V of all variables (we will denote by Vj the subset of V containing pointers of level i).

• extracting instructions p = q; and coping them into the set W in the form of p —> q;

• extracting de-reference instructions p = Sza; and coping a into the set Sp.

The sets W and Sp will be used later to avoid superfluous insertions of statements.

2. Optimization of the Set W. Since the set W can contain cycles (e.g. it can contain both pairs p —> q and q —> p), it requires optimization. The optimization procedure is as follows:

• build a directed graph whose arcs are all pairs from W;

• delete nodes that have no in-going arcs;

• mark all nodes and arcs that form cycles in the graph;

• delete all marked nodes that has no marked in-going and out-going arcs. If there are no such nodes in a cycle then select randomly any node in the cycle and delete all other nodes;

• delete all nodes that has no in-going edges.

We deleted thus all pointers that are necessary for the obfuscation algorithm. The obfuscation is not necessary for pointers remaining in W since the instruction p = q; will ensure that the set of p is larger than the set of q. And the set of q will cover the entire level.

3. Preparing Additional Variables.

iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

At this step the set of additional pointer variables is defined. Let k > 1 be an input parameter indicating the number of additional pointer per reference level. So we have k*max_level additional pointers. Denote by Pi is the set of such variables on the i-th level, \Pj \ = fc, i = 1, max_level,

Po = 0-

4. Building Lists of Additional Assignments.

We describe this step for the i-th level, i = 1, max_level.

(a) Denote by (V-i U the set of variables associated with a

pointer p. This set should satisfy the following requirements:

• For any pair s,p G Pi

(V-i-l U Pi —i )s c (K-iUPj-i), (K:-lUPi-i)p C (K-iUPj-i).

• If s ^ p then (Vi-1 U Pi—i)§ n (Vi-1 U Pi—i )p = 0,

• |J {Vi-1 U Pi_i )p = (V-i U Pi_i).

PG-Pi

For each p G Pj and q G (V-i U form a de-reference assignment p = Szq; and add it to the list Listi. Thus, additional

pointers have the set of values from the lower level.

(b) For each v G V fl (V \ W) and for each p G Pi such that (V-i U Pi-1 )p ^ Sv form an assignment statement v = p; and add it to the list List2- Thus, every pointer at the level i will gain a set of possible values equal to the set of variables from the lower level. Now expand the set for additional variables. For each p G Pi and any randomly selected v G V form an assignment statement p = v; and add it to the list Listi.

5. Inserting additional statements.

Add statements from the lists A and I into the program code. Statements from A can be inserted evenly. Statements from I must be inserted in according to the rule specified in Section 3.2).

An example below demonstrates how this algorithm operates. Let us consider the following program tv.

int a, b, c *p, *q, **r; p = &b; q = &c; r = &p; a = **r + *q;

q = p;

p = &a;

By applying Andersen’s points-to analysis algorithm we obtain the reference graph for 7r:

After applying our obfuscation algorithm for k = 2 we obtain the following program 0{ 7r):

Step 1&2: Vo = {a, b, c}, V\ = {p, q}, V2 = {?’}, S'p = {fe, a}, S'q = {c}, Sr = {;•}

Step 3: Po = 0,-Pi = {Sll , S12}, Pi = {S21,S22}

Step 4: .4 = {“sn = &a;”,“sn = = fee; ”, “s,i = &p; ”, “s, 1 =

&g;’\ “S22 = &S12;”, “S22 = fesn; V‘sn =p\”,“s12 = q\”, “s22 = r; ”, “s2i =

;•;”},/= {“r = s2i; ”, “r = S22; ”, “p = S12; ”, > = sn; ”}

Step 5: Program C,('/r) (possible form):

int a,b, c,*p, *q, **r; int *sll, *sl2, **s21, **s22; sll = &a;

P = sll; s21 = &p; r = s21; s22 = r; p = &b; sll = &b; s22 = fesll; q = &c; s21 = r; sl2 = fee; if(0) p = sl2;

r = &p; s22 =&sl2; a = **r + r = s22; sll = p;

q = p;

s21 = &q; p = &a; sl2 = q;

Andersen’s algorithm graph for 0{ 7r):

3.4. Correctness, complexity and overheads of the obfuscation 0\

Transformations specified above preserve functions computed by programs, i.e. they are semantic-preserving (equivalent) transformation.

Claim 1. Let 7r be an arbitrary program and Oi(tt) be a result of transformation of 7r as described in Section 3.3. Then Oi(tt) computes the same function as 7r.

The obfuscating transformation Oi obstructs flow-insensitive points-to analysis algorithms by hiding differences between pointer variables and making thus each variable to be an alias of any other variable at the same level of the reference graph: for every obfuscated program Oi(tt) Andersen’s points-to analysis algorithm builds only the trivial reference graph as depicted in Section 3.3. Thus, we arrive at:

Claim 2. The obfuscating transformation O\ is impervious to flow-insensitive variant of points-to analysis algorithms (A-algorithm, S-algorithm and SH-algorithm).

Now we are to find upper bound of the obfuscating transformation complexity. Assume that a program 7r contains n statements and m variables. The first step (Pointer Analysis) requires O(n) operations. The next step (Optimization of the Set W) takes about 0(m2) operations on the reference graph. The maximum number of additional variables is k * max_level, where k is a parameter used in SH-algorithm. So on the step of Preparing Additional Variables the algorithm builds 0(max_level * k * m2) additional instructions. The insertion of new instructions into the code takes 0{m * n) operations. So the total complexity of Oi is 0{max_level * k * m2 + nm).

Claim 3. Time complexity of obfuscating transformation Oi is 0(|7r|3).

The overheads of the obfuscating transformation Oi can be characterized in terms of the number of additional instructions.

Claim 4. For any program 7r

|C)i(7r)l — M = 0(m * k * max_level)

If a program 7r is such that n m (which seems reasonable for most programs appeared in practice) then « 1 and the obfuscating transformation Q\

can be accepted as cheap according to the Taxonomy of Obfuscating Transformations [8].

The scripting languages appear to be one of the developing directions in the programming. The distinguished feature of scripts is that they are formally typeless. So our obfuscating transformation can be easily adopted to scripting languages. We have to replace all the occurrences of p with P, Vj with V and so on. Finally if (t>i,..., vn) is the set of variables and («i,..., um ) is the set of additional pointers, the analysis graph for the program will be as follows:

4. Obstructing program slicing

Any static slicing algorithm based on Program Dependence Graphs (PDG) begins to work with building PDG and the size of a slice obtained is determined by the size of PDG. PDG presents both control and data dependencies between variable. We say that a variable x at a program point p depends (semantically) on a variable y at a program point q if some spontaneous change of a value of y may affect either a value of x or an access to the point p via q by some execution

of a program. In general, dependency relation is intractable but admits safe approximation which can be computed efficiently relying only on the syntactic structure of a program: a program point p depends (syntactically) on a point q if

• some variable y is defined at q (e.g. y is in the left-hand side of an assignment statement), used at p (e.g. y occurs in the right-hand side of an assignment statement), and there is a syntactically admissible path from q to p in the control-flow graph of the program such that no other definitions of y occurs along this path;

• y is defined at q and used as an argument of a predicate at a branching point p and there is a syntactically admissible path from q to p such that no other definitions of y occurs along this path.

PDG represents all syntactic dependencies between program points.

4.1. Introducing extra dependencies

To obstruct static slicing algorithm we introduce some additional syntactic dependencies so that any point p in a program becomes dependent from all points q preceding p in the control flow graph of the program. In this case any slice will cover all such points giving thus no useful information about control and data dependencies in an obfuscated program. To this end we introduce a series of specific equivalent transformations.

Ti Given an assignment statement S' of the form y = f{x) and an arithmetic expression g{z) such that g{z) = c we may induce a dependency relation between y and an arbitrary variable z by replacing the above statement by y = f{x) + g{z) — c. We will denote this transformation by Tf(S').

Tg The same effect can be achieved by replacing any occurrence of a variable x in a right-hand side of an assignment by the expression .

We will denote this transformation by T£(S).

T3 Boolean identities also provide a vast possibility for inserting new dependencies:

1. P = (P A P) V P;

2. P= (PAP) V (-PAP).

This transformation will be denoted by T-^(P).

4.2. Obfuscation algorithm

The obfuscation algorithm O2 presented in this section when being applied to a statement S' transforms it given a list of variables L to an equivalent statement 02(S, List) and builds a new list of variables List(S, L) according to the following rules.

1. If a statement S' is a sequential composition of statements S'i and S'2 (i.e.

S' = Si; S2) then 02{S,L) = e>2(S'i,L); <D2(S2,L') and List.(S~L) = List(S2, L'), where L' = List (Si, L).

2. If S is an assignment statement of the form x := t(y\,...,yn) and L \ {2/1,... ,y„} = {si,...,sfc} then 02(S, L) is either TZl’-’Zk{S)1 or TZl’--’Zk (S)2, and List(S,L) = {x}.

3. If S is a conditional branching statement of the form if P then S'i else S'2 fi, where P is a predicate whose arguments are variables y\, . . . ,y„, then 02(S,L) is a statement of the form if T-^(P) then 02(Si,$) else O2(S2,0) fi, where R is an arbitrary predicate whose arguments are all variables from the list L \ {yi,..., y„}, and List(S, L) = List(S\, 0) U List(S2, 0).

4. If S' is an iteration of the form while P do S' od, where P is a predicate whose arguments are variables y\,..., yn, then 02(S, L) is a statement of the form while T-^(P) do 02(S 1,0) od, where R is an arbitrary predicate whose arguments are all variables from the list L\{y\,.. ., y„}, and List(S, L) = List(Si, 0).

Given a program 7r an obfuscation algorithm 02 begins to operate with an empty list of variables and results with a program 02(Ti:,ty).

4.3. Correctness, complexity and overheads of the obfuscation O2

Since all transformations presented in Section 4.1 are semantic-preserving, the transformation 02 also preserves the function computed by a source program.

Claim 5. Let 7r be an arbitrary program and 02(Ti:,ty) be a result of transformation of 7r as described in Section 4.2. Then Oi(tt) computes the same function as 7r.

It is easy to demonstrate that the extra dependencies introduced by 02 make each statement S' in a program 02(ir) dependent on all statements that precede S' in the control flow graph of 02(ir). In this case any static slicing technique based on PDG outputs the worst possible result — for every slicing criteria (p, x), where p is a program point and x is a variable defined at p the corre-

sponding static slice will include all statements that precede p in the Control Flow Graph of a program under consideration. Thus, we arrive at:

Claim 6. The obfuscating transformation 02 is impervious to any PDG-based static slicing technique.

Each transformation Ti,T2,T^ takes time O(m), where m is a number of new variables inserted into a statement or a predicate, and each basic statement or a predicate is modified along the application of O2 at most once. Thus, we arrive at:

Claim 3. Time complexity of obfuscating transformation 02 is 0( 17r|2).

The obfuscating transformation 02 does not introduce new statements but it could extend some statements by inserting extra variables. In the worst case all variables of a program could be inserted into a statement.

Claim 4. For any program 7r

|C02(7T)

O(m),

I'M

where m is the number of variables used in tv.

Since in practice in most cases 17r | m, the obfuscating transformation 02 can

be accepted as cheap according to the Taxonomy of Obfuscating Transformations [8].

5. Conclusion

In this paper we demonstrate that one could estimate formally a resilience of obfuscating transformations against possible re-engineering activity based on static analysis algorithms (points-to analysis and static slicing). We think that our approach can be extended to the cases when more involved program analysis algorithms are applied and this will be our line of further research.

References

[1] Amoroso E.G. Fundamentals of Computer Security Technology. Englewood Cliffs, NJ: Prentice Hall PTR, 1994.

[2] Andersen L.O.,Program Analysis and Specialization for the C Programming Language, DIKIJ, University of Copenhagen, May 1994.

[3] Barak B., Goldreich O., Impagliazzo R., Rudich S., Sahai A., Vedhan S., Yang Iv., On the (Im)possibility of obfuscating programs. CRYPTO’Ol — Advances in Cryptology, Lecture Notes in Computer Science, 2139, 2001, p. 1-18.

[4] Bruns G., Chandra S., Searching for points-to analysis, IEEE Transactions on software engineering, v. 29, 2003, N 10, p. 883-897.

[5] Chakaravarthy V.T., Horwitz S., On the non-approximatibility of points-to analysis, Acta Informatica, v. 38, 2002, p.587-598.

[6] Chernov A.V. On the analysis of obfuscating transformations, Proceedings of ISP RAN, v. 3, 2002, p. 7-39. (in Russian)

[7] Chow S., Gu Y., Johnson H., Zakharov V., An approach to the obfuscation of control flow of sequential computer programs. Information Security Conference, Lecture Notes in Computer Science, 2200, 2001, p. 144-156.

[8] Collberg C., Thomborson C., Low D., A taxonomy of obfuscating transformations, Tech. Report, N 148, Dept, of Computer Science, Univ. of Auckland, 1997.

[9] Collberg C., Thomborson C., Low D., Manufacturing cheap, resilient and stealthy opaque constructs. Symposium, on Principles of Programming Languages, 1998, p.184-196.

[10] Collberg C., Thomborson C., Low D. Breaking abstraction and unstructuring data structures. IEEE International Conference on Computer Languages, 1998, p.28-38.

[11] Collberg C., Thomborson C., Watermarking, tamper-proofing and obfuscation — tools for software protection. IEEE Transactions on Software Engineering, 28, N 2, 2002, p. 735-746.

iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

[12] Devanbu P.T., Stubblebine S. Software engineering for security: a

roadmap. Future of SE Track, 2000, 227-239.

[13] Gollmann D. Computer Security. New York: Willey, 1999.

[14] Hada S., Zero-knowledge and code obfuscation. ASIACRYPT’2000 — Advances in Cryptology, 2000.

[15] Heintze N., Tardieu O., Ultra-fast alias analysis using CLA, in Proc. of ACM SIGPLAN Conference on Programming Language Design and Implementation, 2001.

[16] Horwitz S., Shapiro M., Fast and Accurate Flow-Insensitive Points-To Analysis, , CS Dept., University of Wisconsin-Madison.

[17] Horwitz S., Precise flow-insensitive may-alias analysis is NP-hard. ACM Transactions on Programming Languages and Systems, 19, N 1, 1997, p. 1-6.

[18] Horwitz S., Reps T., Binkley D. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, v.12, No 1, p. 26-60.

[19] Howard M., LeBlank D., Writing Secure Code, Microsoft Press, 2003.

[20] Landi W. Undecidability of static analysis. ACM Letters on Programming Languages and Systems, 1, N 4, 1992, p. 323-337.

[21] Linn C., Debray S., Obfuscation of executable code to improve resistance to static disassembly, in Proceedings of the 10th ACM conference on Computer and communication security, 2003, p.290-299.

[22] MacDonald J., On Program Security and Obfuscation, Technical Report, University of California, 1998.

[23] Steensgaard B., Points-to analysis in almost linear time, March 1995.

[24] Ogisso T., Sakabe Y., Sochi M., Miyaji A. Software obfuscation on a theoretical basis and its implementation, IEEE Transactions on Fundamentals, E86-A(l), 2003.

[25] Ottenstain K.J., Ottenstain L.M., The program dependence graph in a software development environment, ACM SIGPLAN Notices, v. 19, 1984, N 5, p. 177-184.

[26] Ramalingam G., The undecidability of aliasing. ACM Transactions on Programming Languages and Systems, 16, N 5, 1994, p. 1467-1471.

[27] Varnovsky N.P., Zakharov V.A., On the possibility of provably secure obfuscating programs, in Proceedings of Andrei Ershov Fifth, International Conference ’Perspectives of System Informatics”, Novosibirsk, 2003, p.76-86.

[28] Wang C., Hill J., Knight J. Davidson J., Software tamper resistance: obstructing static analysis of programs, Tech. Report, N 12, Dep. of Comp. Sci., Univ. of Virginia, 2000.

[29] Weiser M., Program slicing, IEEE Transactions of Software Engineering, v. 10, 1984, N 4, p. 352-357.

[30] Wroblewski G., General method of program code obfuscation, in Proceedings of the International Conference on Software Engineering Research and Practice (SERP), 2002, p. 153-159.

Program obfuscation as obstruction of program static analysis Текст научной статьи по специальности «Компьютерные и информационные науки»

Аннотация научной статьи по компьютерным и информационным наукам, автор научной работы — Ivanov K. S., Zakharov V. A.

Похожие темы научных работ по компьютерным и информационным наукам , автор научной работы — Ivanov K. S., Zakharov V. A.

Текст научной работы на тему «Program obfuscation as obstruction of program static analysis»