An approach to quantitative analysis of resistance of equivalent transformations of algebraic circuits
A. V. Shokurov
Abstract. We construct a system of computations on encrypted data such that (i) the encrypting transformation is efficient, i.e. it can be performed in time polynomial in the size of circuit C; (ii) the size of the transformed scheme A' does not differ essentially from the size of the initial scheme A; (iii) the lower bound on the resistance of the circuit is exponential.
1. Introduction
Tamper-resistant software technologies are intended for protecting software programs from intelligent tampering attacks aimed at obtaining some extra knowledge about program structure and behavior (the key ideas of algorithms used in a program and some specific data: passwords, constants, parameters, etc.). To achieve these goals, specific semantics-preserving transformations are applied to a source program; they make deep and sophisticated changes in its control flow and data flow [9, 8, 5].
Tamper resistant software is one of the forms of software protection against reverse engineering. Data protection is an important part of tamper resistant software. This protection can be done in different ways and one of them which uses data encodings [7, 5] is considered in this paper.
Formally, the model with transformed data can be described as follows. Let Alice and Bob be two participants of a computation. Alice is the owner of some data but does not have enough computational resources. Bob possesses enough computational resources, but Alice does not fully trust Bob. Alice needs to transform the data, constants and computational circuit in such a way that the result she needs can be easily obtained from the result of the transformed computation over the transformed data. Then Alice sends all transformed data, constants and the computational circuit to Bob. These transformations need to be such that the second participant gets minimum information (in the ideal case nothing) from the received data and circuit.
Now we can state the following problem.
Problem. Let C' be some transformed computational circuit. Find the initial computational circuit C corresponding to C'. How many computational circuits C correspond to the transformed circuit C'? It is assumed that the parameters of the transformation are not given.
To explain what data encodings are, consider the problem of computing some arithmetical expression
y = F(x, c),
where x = (x_1, \ldots, x_n) are input data and c = (c_1, \ldots, c_m) are some internal parameters which we need to protect against an adversary. Let y = (y_1, \ldots, y_t) be the result of this computation.
Encoding in general is a parametric collection of functions that map each integer into some tuple of integers. Thus, any integer x will be converted to x' = (x'_1, x'_2, \ldots, x'_k). A simple example is the so-called linear encoding x' = a \cdot x + b. Integers a, b are parameters of this encoding [7]. A formal definition of data encoding will be given in Section 2.
The key idea of using such encodings is the following. Instead of computing y = F(x, c) we perform some distorted computation y' = F'(x', c'), which is obtained from the original computation F by some rules, and then apply decoding (the inverse function to encoding) to obtain the “real” results of the computation. One possible approach to obtaining F' is to construct, for every basic operation (+, \times, etc.) used in F, a corresponding sequence of operations over encoded data. This is possible if any arithmetic operation on data can be expressed in terms of encoded data. Such encodings will be called homomorphic (for a formal definition see Section 2). To clarify this notion consider the following example. Let integers x_1 and x_2 be represented using linear encoding as
x'_1 = a_1 x_1 + b_1,   x'_2 = a_2 x_2 + b_2   (1)
and
A_1 = a_1 / m,   A_2 = a_2 / m   (2)
where
m = GCD(a_1, a_2).   (3)
To calculate y = x_1 + x_2 in terms of linear encoding one can find its encoded value by the formula
y' = A_2 x'_1 + A_1 x'_2.   (4)
Then
y' = B_2 y + B_1   (5)
where B_1 = (b_1 a_2 + b_2 a_1)/m and B_2 = a_1 a_2 / m, and the value y can be decoded from y' as follows
y = (y' - B_1) / B_2.   (6)
Hence the result of addition is represented in terms of encoded data by formula (4). In this example the arithmetic expression is y = F(x_1, x_2) = x_1 + x_2 and the distorted expression y' = F'(x'_1, x'_2) is given by formula (4). Note that an adversary observes the values of parameters A_1 and A_2, which are related to the encoding parameters by formulas (2)-(3), i.e. an adversary knowing the values A_1 and A_2 may try to guess the values of the encoding parameters. The main question is the following: is such information enough to find the values of all encoding parameters? Moreover, it is clear that the more operations on encoded data an adversary observes, the more information he gets on the encoding parameters. The aim of a “good” encoding is to minimize such information.
Local analysis of encoded procedures may give an adversary some information about the parameters of encoding. An example of such information is given above, where addition of two linearly encoded integers is discussed. Therefore the problems of proper choice of encoding and of comparison of encodings are important. The solution of such problems should be based on a notion of measure of resistance of data encodings against local analysis (when each encoded procedure is treated separately). These problems are considered in this paper.
We propose a notion of quantitative measure of resistance of an encoded circuit, which we consider our main contribution. This measure gives the opportunity to compare different encoded circuits. Estimates for lower bounds of resistance are obtained for a relatively wide class of algebraic circuits. These estimates show high security of computations performed in such encodings. Typically such bounds are at least of order 2^{100}.
The structure of the paper is the following. In Section 2 formal definitions of encoding and homomorphic encoding are given. In Section 3 some examples of homomorphic encodings are considered. A measure of resistance of computation is introduced in Section 4. In Section 5 we present lower bounds of resistance of different types of encodings for some algebraic circuits. The proofs are given in the Appendix.
2. Notion of data encoding
Now we give a formal definition of encoding.
Definition 1. Encoding is a pair (\varphi, \psi) of transformations of integer data (x_1, \ldots, x_n) and constants (parameters of encoding) (c_1, \ldots, c_m) into integer data (x'_1, \ldots, x'_k) such that
\varphi(x_1, \ldots, x_n, c_1, \ldots, c_m) = (x'_1, \ldots, x'_k),
\psi(x'_1, \ldots, x'_k, c_1, \ldots, c_m) = (x_1, \ldots, x_n),   (7)
where decoding \psi is a left inverse of \varphi, i.e.
\psi(\varphi(x_1, \ldots, x_n, c_1, \ldots, c_m), c_1, \ldots, c_m) = (x_1, \ldots, x_n).
The data (x'_1, \ldots, x'_k) are called encrypted data (cryptogram), (x_1, \ldots, x_n) is a plaintext and (c_1, \ldots, c_m) is a key.
Now the problem is to find a proper circuit F' on encoded data for an arbitrary algebraic circuit F without multiplicity [1]. One possible approach is to construct, for every basic arithmetic operation (+, \times, etc.) used in F, a corresponding sequence of operations on encoded data. This is possible if any arithmetic operation over original data can be expressed in terms of encoded data. Such encodings will be called homomorphic.
Now we give a formal definition for such encodings.
Consider encoding (\varphi, \psi) of two integers x_1 and x_2, assuming n = 1 in formula (7), and
\varphi(x_1, c_{11}, \ldots, c_{m1}) = (x'_{11}, \ldots, x'_{k1}).
Let also
\varphi(x_2, c_{12}, \ldots, c_{m2}) = (x'_{12}, \ldots, x'_{k2}).
Integer numbers c_{11}, \ldots, c_{m1} and c_{12}, \ldots, c_{m2} are parameters of encoding of x_1 and x_2 respectively. Let x_3 = h(x_1, x_2) be some binary operation on integer inputs with integer outcome (say, addition, multiplication, etc.).
Definition 2. Encoding (\varphi, \psi) is called homomorphic with respect to h if there exist functions A, C
(\alpha_1, \ldots, \alpha_s) = A(c_{11}, \ldots, c_{m1}, c_{12}, \ldots, c_{m2}),
(c_{13}, \ldots, c_{m3}) = C(c_{11}, \ldots, c_{m1}, c_{12}, \ldots, c_{m2})   (8)
and an arithmetic procedure (a sequence of arithmetic operations)
(y_1, \ldots, y_k) = h'(x'_{11}, \ldots, x'_{k1}, x'_{12}, \ldots, x'_{k2}, \alpha_1, \ldots, \alpha_s)   (9)
with integer input data x'_{11}, \ldots, x'_{k1}, x'_{12}, \ldots, x'_{k2}, \alpha_1, \ldots, \alpha_s such that
h(x_1, x_2) = x_3 = \psi(y_1, \ldots, y_k, c_{13}, \ldots, c_{m3}).
It follows from the definition that function C computes the parameters of encoding of the outcome of the operation, whereas function A creates the parameters of the arithmetic procedure h'. It is clear from the definition that one can choose as function A the identity function
A(c_{11}, \ldots, c_{m1}, c_{12}, \ldots, c_{m2}) = (c_{11}, \ldots, c_{m1}, c_{12}, \ldots, c_{m2}),
but in this case the aim of encoding would not be achieved, because all the parameters of encoding would take part in the computation. The aim of function A is to reduce the information on encoding parameters that is exposed; in the ideal case function A minimizes this information. It is also clear from the definition that h' represents an algebraic circuit. Note that functions C and A may be arbitrary but for practical use need to be efficiently computable. To find the encoded value of the result of an operation it is not necessary to know all the parameters of the encodings, only their transform \alpha_1, \ldots, \alpha_s.
Linear encoding (1) gives an example of homomorphic encoding with respect to the operations of addition and multiplication of integer numbers. For addition it follows from formulas (2)-(5) because
x'_1 = \varphi(x_1, a_1, b_1) = a_1 x_1 + b_1,   x'_2 = \varphi(x_2, a_2, b_2) = a_2 x_2 + b_2,
(\alpha_1, \alpha_2) = A(a_1, b_1, a_2, b_2) = (A_1, A_2),
(a_3, b_3) = C(a_1, b_1, a_2, b_2) = (B_2, B_1),
where A_1, A_2, B_1 and B_2 are given by formulas (2)-(5) and the arithmetic procedure h' is given by formula (4).
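The addition example for linear encoding can be sketched in a few lines of Python. This is only an illustration of formulas (1)-(6); the function names and the numeric parameters are hypothetical and are not taken from the paper.

```python
from math import gcd

def encode(x, a, b):
    """Linear encoding x' = a*x + b, formula (1)."""
    return a * x + b

def addition_parameters(a1, b1, a2, b2):
    """Return (A1, A2), visible to the observer, and (B2, B1), the
    parameters of the encoding of the sum (functions A and C)."""
    m = gcd(a1, a2)                  # formula (3)
    A1, A2 = a1 // m, a2 // m        # formula (2)
    B1 = (b1 * a2 + b2 * a1) // m    # exact, since m divides a1 and a2
    B2 = a1 * a2 // m
    return (A1, A2), (B2, B1)

def encoded_add(x1_enc, x2_enc, A1, A2):
    """Formula (4): y' = A2*x1' + A1*x2'."""
    return A2 * x1_enc + A1 * x2_enc

def decode_sum(y_enc, B2, B1):
    """Formula (6): y = (y' - B1) / B2 (the division is exact)."""
    return (y_enc - B1) // B2

# Illustrative (hypothetical) parameters and data.
a1, b1, a2, b2 = 6, 5, 4, 7
x1, x2 = 10, 3
(A1, A2), (B2, B1) = addition_parameters(a1, b1, a2, b2)
y_enc = encoded_add(encode(x1, a1, b1), encode(x2, a2, b2), A1, A2)
recovered = decode_sum(y_enc, B2, B1)
```

Only A1 and A2 appear in the encoded computation; B1 and B2 stay with the owner of the data and are needed only for decoding.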
3. Examples of homomorphic encodings
In this section we present some examples of homomorphic encodings with respect to addition and multiplication.
• Linear encoding (see, Section 2).
• Residue encoding of integer x is a tuple
x'_i = x (mod p_i),
where p_i, i = 1, \ldots, m are coprime integers. The parameters of encoding for all data are the same and are equal to
(c_1, \ldots, c_m) = (p_1, \ldots, p_m).
The formulas for the decoding function \psi are given in [2].
Addition and multiplication of integers in this encoding are performed as follows. Let y'_i = y (mod p_i), i = 1, \ldots, m; then x'_i + y'_i (mod p_i) is the encoded sum x + y and x'_i \cdot y'_i (mod p_i) is the encoded product x \cdot y [1, 2]. The function A from formula (8) is a constant function
A(p_1, \ldots, p_m, p_1, \ldots, p_m) = 0
because the formulas for addition and multiplication do not depend on the parameters of encoding p_1, \ldots, p_m.
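A minimal sketch of residue encoding follows. The decoding step uses the standard Chinese remainder reconstruction, which is assumed here to be the decoding referred to in [2]; the moduli and data are hypothetical, and results are only recovered correctly while they stay below the product of the moduli.

```python
from math import prod

def residue_encode(x, moduli):
    """x'_i = x mod p_i."""
    return [x % p for p in moduli]

def residue_add(xs, ys, moduli):
    """Componentwise sum of two encoded integers."""
    return [(x + y) % p for x, y, p in zip(xs, ys, moduli)]

def residue_mul(xs, ys, moduli):
    """Componentwise product of two encoded integers."""
    return [(x * y) % p for x, y, p in zip(xs, ys, moduli)]

def residue_decode(residues, moduli):
    """Chinese remainder reconstruction for pairwise coprime moduli."""
    M = prod(moduli)
    total = 0
    for r, p in zip(residues, moduli):
        Mi = M // p
        total += r * Mi * pow(Mi, -1, p)   # modular inverse (Python 3.8+)
    return total % M

moduli = [7, 11, 13]          # pairwise coprime, product 1001
x, y = 25, 30
xe, ye = residue_encode(x, moduli), residue_encode(y, moduli)
sum_decoded = residue_decode(residue_add(xe, ye, moduli), moduli)
product_decoded = residue_decode(residue_mul(xe, ye, moduli), moduli)
```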
• Mixed encoding is a generalization of linear and residue encodings. Fix coprime integers p_1, \ldots, p_m. Integer x is represented in mixed encoding as
x'_1 = a_1 \cdot x_1 + b_1 mod p_1
\ldots   (10)
x'_m = a_m \cdot x_m + b_m mod p_m
where GCD(a_k, p_k) = 1 for all k = 1, \ldots, m. The parameters of encoding in this case are (a_1, b_1, \ldots, a_m, b_m). Now take two integers encoded by parameters (a_{11}, b_{11}, \ldots, a_{m1}, b_{m1}) and (a_{12}, b_{12}, \ldots, a_{m2}, b_{m2}) correspondingly. Then addition y = x_1 + x_2 in terms of encoded data can be made by the formula
y'_i = \gamma_i a_{i1}^{-1} x'_{i1} + \gamma_i a_{i2}^{-1} x'_{i2} mod p_i
for arbitrary \gamma_i such that GCD(\gamma_i, p_i) = 1. So the function A from formula (8) is
A(a_{11}, b_{11}, \ldots, a_{m1}, b_{m1}, a_{12}, b_{12}, \ldots, a_{m2}, b_{m2}) = (\alpha_{11}, \ldots, \alpha_{m1}, \alpha_{12}, \ldots, \alpha_{m2}),
where \alpha_{ij} = \gamma_i a_{ij}^{-1} mod p_i. Function C in this case is
C(a_{11}, b_{11}, \ldots, a_{m1}, b_{m1}, a_{12}, b_{12}, \ldots, a_{m2}, b_{m2}) = (\gamma_1, \alpha_{11} b_{11} + \alpha_{12} b_{12}, \ldots, \gamma_m, \alpha_{m1} b_{m1} + \alpha_{m2} b_{m2}).
The formula for performing multiplication in mixed encoding is given in Section 5; mixed encoding is also homomorphic with respect to multiplication.
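The addition rule for mixed encoding can be checked numerically. The sketch below assumes that the i-th component of an integer x is encoded as a_i x + b_i mod p_i (that is, the component x_i is taken to be x mod p_i) and that decoding combines the per-modulus residues by the Chinese remainder theorem; both are assumptions made for illustration, as are all names and numbers.

```python
from math import prod

def inv(a, p):
    """Modular inverse of a modulo p (requires GCD(a, p) = 1)."""
    return pow(a, -1, p)

def mixed_encode(x, a, b, moduli):
    return [(ai * x + bi) % p for ai, bi, p in zip(a, b, moduli)]

def addition_coefficients(a1, a2, gamma, moduli):
    """Output of function A: alpha_{ij} = gamma_i * a_{ij}^{-1} mod p_i."""
    alpha1 = [g * inv(u, p) % p for g, u, p in zip(gamma, a1, moduli)]
    alpha2 = [g * inv(u, p) % p for g, u, p in zip(gamma, a2, moduli)]
    return alpha1, alpha2

def mixed_add(x1e, x2e, alpha1, alpha2, moduli):
    """y'_i = alpha_{i1} x'_{i1} + alpha_{i2} x'_{i2} mod p_i."""
    return [(c1 * u + c2 * v) % p
            for c1, c2, u, v, p in zip(alpha1, alpha2, x1e, x2e, moduli)]

def result_parameters(gamma, alpha1, alpha2, b1, b2, moduli):
    """Output of function C: multipliers gamma_i and the new offsets."""
    offsets = [(c1 * u + c2 * v) % p
               for c1, c2, u, v, p in zip(alpha1, alpha2, b1, b2, moduli)]
    return list(gamma), offsets

def mixed_decode(ye, a, b, moduli):
    residues = [(v - bi) * inv(ai, p) % p
                for v, ai, bi, p in zip(ye, a, b, moduli)]
    M = prod(moduli)
    return sum(r * (M // p) * inv(M // p, p)
               for r, p in zip(residues, moduli)) % M

moduli = [7, 11, 13]
a1, b1 = [3, 4, 5], [1, 2, 3]      # parameters of the first integer
a2, b2 = [2, 3, 4], [6, 5, 4]      # parameters of the second integer
gamma = [5, 6, 7]                  # arbitrary, coprime to the moduli
x1, x2 = 100, 200
alpha1, alpha2 = addition_coefficients(a1, a2, gamma, moduli)
ye = mixed_add(mixed_encode(x1, a1, b1, moduli),
               mixed_encode(x2, a2, b2, moduli), alpha1, alpha2, moduli)
a3, b3 = result_parameters(gamma, alpha1, alpha2, b1, b2, moduli)
decoded = mixed_decode(ye, a3, b3, moduli)
```

The observer sees only the coefficients alpha; the parameters a3, b3 of the result never enter the encoded computation.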
• p-base representation gives a homomorphic encoding [11].
• Encoding based on the Discrete Fourier Transform [11, 2].
One can try to protect data in a program using well-known cryptographic encodings. Let us consider the following example: the RSA function x' = x^e mod m, where m = pq and p and q are prime numbers [4].
Given encoded data x' and y' it is easy to implement multiplication z = xy as follows: z' = x'y' mod m. So this encoding is homomorphic with respect to multiplication.
But it is difficult to implement the sum of RSA-encoded data in the same manner. This answers the question of why we do not use well-known cryptographic functions for encoding data.
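The multiplicative property of the RSA function is easy to observe on a toy instance. The primes and exponent below are hypothetical textbook-sized values chosen only for illustration; they offer no security.

```python
# Toy RSA parameters (far too small for real use).
p, q = 61, 53
m = p * q                           # modulus, 3233
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent

def rsa_encode(x):
    """x' = x^e mod m."""
    return pow(x, e, m)

def rsa_decode(c):
    """x = c^d mod m."""
    return pow(c, d, m)

x, y = 7, 9
# The product of the encoded values is the encoding of the product.
z_enc = rsa_encode(x) * rsa_encode(y) % m
product = rsa_decode(z_enc)
```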
4. Resistance of data encodings
First we introduce the notions of the “observable” and the “real” worlds. The “observable” world is the set of encrypted values which Alice sends to Bob. The “real” world is a set of non-encrypted values of inputs and constants. We illustrate this by the following example. Let x be encoded in linear encoding as x' = a \cdot x + b and assume that the adversary Bob observes only x'. The “observable” world is x' and a “real” world is a triple (a, x, b) for which the encoded x corresponds to the observable value x'.
It is obvious that the same “observable” world can correspond to several “real” worlds, and any of these “real” worlds can be the real world which we encode. In the example mentioned above the number of “real” worlds, which we denote by R_w, can be estimated as R_w > K^2, where K is the range of integers we use.
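The order of this estimate can be illustrated by brute force on a small range. The sketch below makes a simplifying assumption that is not stated in the paper: all arithmetic is taken modulo K, so that a, x and b each range over 0, ..., K-1. Under this assumption every pair (a, x) admits exactly one matching b, which gives K^2 real worlds for any observed x'.

```python
def count_real_worlds(x_enc, K):
    """Count triples (a, x, b) in range(K)^3 with a*x + b = x_enc (mod K)."""
    count = 0
    for a in range(K):
        for x in range(K):
            for b in range(K):
                if (a * x + b) % K == x_enc:
                    count += 1
    return count

K = 16
worlds = count_real_worlds(5, K)   # the observed value 5 is arbitrary
```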
Note that operations with encoded data can reduce the resistance because additional relations between parameters occur. This can be illustrated by the example of the sum of two integers in linear encoding given in Section 1.
Let integers x and y be represented as x' and y' in linear encoding by formula (1). The sum z = x + y is given in terms of encoded data by formula (4). The observable world is determined by the following parameters: x', y', A_1, A_2, and the number of “real” worlds is the number of solutions of the corresponding system of equations (1)-(4). Additional relations which reduce R_w are equations (2)-(3). A solution (i.e., one of the possible “real” worlds) is a set of values for x, y, a_1, a_2, b_1, b_2. Let us denote the range of possible values by K. We now estimate the number of “real” worlds in some cases.
Proposition 1. For fixed x', y', A_1, A_2 and a_1, a_2 the number of possible solutions is R_w > K^2.
To prove it, note that arbitrary values of x or y are solutions of our system, since for any x (y, respectively) one can choose b_1 (b_2, respectively) such that the value of x' (y', respectively) does not change.
Proposition 2. For fixed x', y', A_1, A_2 and x, y the number of possible solutions can be estimated as R_w > K / max(a_1, a_2).
Note that for any solution a_1, a_2 of our system and for any q the values a'_1 = q \cdot a_1 and a'_2 = q \cdot a_2 also give a solution, because A_1, A_2 are the same and there exist b_1 and b_2 such that x' and y' do not change.
Proposition 3. The number of real worlds is R_w > K^3 / A, where A = max(a_1, a_2).
This follows immediately from Proposition 1 and Proposition 2.
As we can see in this example the procedure of estimating Rw (the number of “real” worlds) can be rather difficult and greatly depends on the sequence of the operations with encoded data.
The number of “real” worlds which correspond to the same observable world can be used for estimating the resistance of the encoding. We introduce a measure of resistance of an encoding as a measure of uncertainty, namely the number of “real” worlds R_w which can correspond to the observable encoded world. An adversary observing only operations in the encoded world and inputs to the encoded world (i.e., all encoded input data) cannot distinguish between any of these “real” worlds. Thus the larger the number of corresponding “real” worlds, the greater the uncertainty and the resistance of the encoding.
Let y = F(x, c) be some algebraic circuit [1] and y' = F'(x', c') be its corresponding encoded circuit.
Definition 3. A measure of resistance of an observed encoded circuit y' = F'(x', c') is the number of different “real” worlds (c, x) which correspond to the same encoded world (x', c', F').
We will denote this resistance by R_w(x', c'). Now take the maximum of such resistances over all encoded data x'.
Definition 4. A measure of resistance of an encoded circuit y' = F'(x', c') is the maximum of all observed resistances. We denote the resistance by R_w(F') or simply by R_w.
It is important to note that this measure characterizes the resistance of an encoding to an arbitrary attack which uses only information from the encoded world. In this sense the measure characterizes absolute resistance.
Protection of data in the encoded world will be guaranteed by the lower bounds of resistance of encodings which one can obtain. Typically the bounds we present below are at least 2^{100} when the range of integers is 2^{64}.
5. Results
5.1. Mixed encoding
In this section some estimates of resistance of some circuits of computations are obtained. Two algebraic circuits with encoded data are considered: the first is a computation of a second degree form and the second is a homogeneous circuit without multiplicity. It is shown that in both cases we have the following estimate of resistance
R_w > (v(p_1) \cdot \ldots \cdot v(p_m))^n,
where v is the Euler function, p_1, \ldots, p_m are mixed encoding parameters, and n is the number of input variables.
Mixed encoding for a vector of data x = (x_1, \ldots, x_n)^t and coprime numbers (p_1, \ldots, p_m) is given by equations
x'_{11} = a_{11} x_1 + b_{11} mod p_1
\ldots
x'_{1m} = a_{1m} x_1 + b_{1m} mod p_m
\ldots   (11)
x'_{n1} = a_{n1} x_n + b_{n1} mod p_1
\ldots
x'_{nm} = a_{nm} x_n + b_{nm} mod p_m
Then the encoded data are given by the matrix
X' = \begin{pmatrix} x'_{11} & \cdots & x'_{1m} \\ \vdots & & \vdots \\ x'_{n1} & \cdots & x'_{nm} \end{pmatrix}.   (12)
The original data can be decoded from encoded data by the following procedure. Let
x_{11} = c_{11} x'_{11} + d_{11} mod p_1
\ldots
x_{1m} = c_{1m} x'_{1m} + d_{1m} mod p_m
\ldots   (13)
x_{n1} = c_{n1} x'_{n1} + d_{n1} mod p_1
\ldots
x_{nm} = c_{nm} x'_{nm} + d_{nm} mod p_m
Then
x_1 = \lambda_1 x_{11} + \cdots + \lambda_m x_{1m} mod p_1 \cdot \ldots \cdot p_m
\ldots   (14)
x_n = \lambda_1 x_{n1} + \cdots + \lambda_m x_{nm} mod p_1 \cdot \ldots \cdot p_m
for some integer numbers \lambda_1, \ldots, \lambda_m.
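How can operations over data be performed in terms of encoded data?
To add two integer numbers x_1 + x_2 in terms of encoded data one may use the following procedure
x_{11} + x_{21} = c_{11} x'_{11} + c_{21} x'_{21} + d_{11} + d_{21} mod p_1
\ldots   (15)
x_{1m} + x_{2m} = c_{1m} x'_{1m} + c_{2m} x'_{2m} + d_{1m} + d_{2m} mod p_m.
For calculation of a linear combination of two integer numbers \lambda_1 x_1 + \lambda_2 x_2 the following formula can be used
\lambda_1 x_{11} + \lambda_2 x_{21} = \lambda_1 c_{11} x'_{11} + \lambda_2 c_{21} x'_{21} + \lambda_1 d_{11} + \lambda_2 d_{21} mod p_1
\ldots
\lambda_1 x_{1m} + \lambda_2 x_{2m} = \lambda_1 c_{1m} x'_{1m} + \lambda_2 c_{2m} x'_{2m} + \lambda_1 d_{1m} + \lambda_2 d_{2m} mod p_m.
The products \lambda_k c_{kj} and linear combinations \lambda_1 d_{1j} + \lambda_2 d_{2j} can be calculated during compilation.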
Now consider multiplication. To calculate the product of two integers x_1 x_2 one may use the following formula
x_{1i} \cdot x_{2i} = (c_{1i} x'_{1i} + d_{1i}) \cdot (c_{2i} x'_{2i} + d_{2i}) mod p_i.   (16)
Hence
x_{1i} \cdot x_{2i} = (c_{1i} c_{2i} x'_{1i} x'_{2i} + d_{1i} c_{2i} x'_{2i} + d_{2i} c_{1i} x'_{1i}) + d_{1i} d_{2i} mod p_i.   (17)
In the observable world only the products c_{1i} c_{2i}, d_{1i} c_{2i}, d_{2i} c_{1i} and d_{1i} d_{2i} are given in the evaluation.
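Formula (17) can be exercised for a single modulus as follows. This is a sketch under the assumption that the decoding constants are c = a^{-1} mod p and d = -a^{-1} b mod p for an encoding x' = a x + b mod p; the function names and numbers are hypothetical.

```python
def decoding_constants(a, b, p):
    """Constants with x = c*x' + d mod p for the encoding x' = a*x + b mod p."""
    c = pow(a, -1, p)
    d = (-c * b) % p
    return c, d

def multiplication_coefficients(c1, d1, c2, d2, p):
    """The four products that appear in formula (17); only these are exposed."""
    return (c1 * c2 % p, d1 * c2 % p, d2 * c1 % p, d1 * d2 % p)

def encoded_product(x1e, x2e, coeffs, p):
    """Evaluate formula (17) from encoded inputs and precomputed products."""
    cc, d1c2, d2c1, dd = coeffs
    return (cc * x1e * x2e + d1c2 * x2e + d2c1 * x1e + dd) % p

p = 101
a1, b1, a2, b2 = 3, 10, 7, 20
x1, x2 = 12, 5
x1e, x2e = (a1 * x1 + b1) % p, (a2 * x2 + b2) % p
c1, d1 = decoding_constants(a1, b1, p)
c2, d2 = decoding_constants(a2, b2, p)
coeffs = multiplication_coefficients(c1, d1, c2, d2, p)
result = encoded_product(x1e, x2e, coeffs, p)
```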
Now we show how to estimate the resistance of encoded computations by considering (as an important example) an evaluation of the second degree form
f(x_1, \ldots, x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{ij} x_i x_j + \sum_{i=1}^{n} \beta_i x_i + \gamma   (18)
using encoded data x'. How can it be found in terms of encoded data? To do this let us calculate its three summands using only encoded data.
Claim 1. Let
\xi_{ijk} = \alpha_{ijk} c_{ik} c_{jk}   for i, j = 1, \ldots, n   (19)
\zeta_{ik} = \sum_{j=1}^{n} (\alpha_{ijk} + \alpha_{jik}) c_{ik} d_{jk} + \beta_{ik} c_{ik}   for k = 1, \ldots, m   (20)
\eta_k = \sum_{i,j=1}^{n} \alpha_{ijk} d_{ik} d_{jk} + \sum_{i=1}^{n} \beta_{ik} d_{ik} + \gamma_k   (21)
where coefficients c_{ik} are given by formulas (13). Then
f_k(x_1, \ldots, x_n) = \sum_{i,j=1}^{n} \xi_{ijk} x'_{ik} x'_{jk} + \sum_{i=1}^{n} \zeta_{ik} x'_{ik} + \eta_k.   (22)
The proof is given in the Appendix.
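Claim 1 can be checked numerically for one modulus p_k. The sketch below assumes that the decoding constants of formula (13) are c_i = a_i^{-1} and d_i = -a_i^{-1} b_i modulo p_k, and that the coefficients of the form are taken modulo p_k; the modulus index k is dropped from the variable names. All concrete numbers are hypothetical.

```python
p = 101                                    # plays the role of p_k
n = 3
alpha = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # coefficients of the quadratic part
beta = [2, 3, 5]
gamma = 11
a, b = [3, 5, 7], [10, 20, 30]             # encoding parameters
x = [4, 9, 6]                              # plaintext inputs

x_enc = [(a[i] * x[i] + b[i]) % p for i in range(n)]
c = [pow(a[i], -1, p) for i in range(n)]   # decoding constants, formula (13)
d = [(-c[i] * b[i]) % p for i in range(n)]

# Coefficients (19)-(21); only these and x_enc are visible to the observer.
xi = [[alpha[i][j] * c[i] * c[j] % p for j in range(n)] for i in range(n)]
zeta = [(sum((alpha[i][j] + alpha[j][i]) * c[i] * d[j] for j in range(n))
         + beta[i] * c[i]) % p for i in range(n)]
eta = (sum(alpha[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
       + sum(beta[i] * d[i] for i in range(n)) + gamma) % p

# Formula (22) evaluated on encoded data.
encoded_value = (sum(xi[i][j] * x_enc[i] * x_enc[j]
                     for i in range(n) for j in range(n))
                 + sum(zeta[i] * x_enc[i] for i in range(n)) + eta) % p

# Formula (18) evaluated on plaintext data, reduced modulo p.
plain_value = (sum(alpha[i][j] * x[i] * x[j]
                   for i in range(n) for j in range(n))
               + sum(beta[i] * x[i] for i in range(n)) + gamma) % p
```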
Claim 2. The resistance of formula (22) in mixed encoding is
R_w > (v(p_1) \cdot \ldots \cdot v(p_m))^n,
where v is the Euler function and n is the number of input variables.
Now consider algebraic circuits without multiplicity using operations of addition and multiplication of integers and define corresponding homogeneous algebraic circuits. We shall present such circuits as graphs of computation with operations of addition and multiplication as nodes and data and variables as leaves (see [3, 1]). Let input variables x_1, \ldots, x_n and the constant data c_1, \ldots, c_m of this circuit have some weights such that deg(x_i) = 1 and deg(c_i) \le 0. The degree of a product is equal to the sum of the degrees of its factors. As usual, the degree of a sum is not greater than the maximal degree of the summands.
Definition 5. A circuit is called homogeneous without multiplicity if:
(1) Addition is performed only for summands of the same nonpositive degree and the result has the same degree as its summands.
(2) Each coefficient c_i is used in the circuit only once.
(3) Multiple use of input variables is allowed.
Condition (1) means homogeneity of the circuit. Condition (2) means that each coefficient is used only once.
Example 1. Consider Horner’s scheme for computing the polynomial
P_l(x) = c_l x^l + \cdots + c_1 x + c_0 = (\ldots(c_l x + c_{l-1}) x + \cdots + c_1) x + c_0.
Let deg(c_k) = -k and deg(x) = 1. It is not difficult to see that the corresponding circuit is homogeneous without multiplicity.
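The homogeneity of Horner's scheme under these weights can be verified mechanically. The following sketch (the function name is hypothetical) evaluates the polynomial while tracking degrees, and raises an error if an addition would combine summands of different degree.

```python
def horner_with_degrees(coeffs, x):
    """Evaluate c_0 + c_1*x + ... + c_l*x^l by Horner's scheme.

    coeffs is [c_0, ..., c_l]; deg(c_k) = -k and deg(x) = 1.
    Returns the value and the degree of the final result.
    """
    l = len(coeffs) - 1
    acc, deg = coeffs[l], -l
    for k in range(l - 1, -1, -1):
        product, product_deg = acc * x, deg + 1   # degrees add under product
        if product_deg != -k:                     # condition (1) of Definition 5
            raise ValueError("summands of different degree")
        acc, deg = product + coeffs[k], -k        # each c_k is used exactly once
    return acc, deg

value, degree = horner_with_degrees([1, 2, 3], 2)   # 1 + 2*2 + 3*2^2
```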
Example 2. Sparse multivariate polynomial gives another example of homogeneous circuit without multiplicity.
Now encode a homogeneous circuit step by step using the encodings of operations in mixed encoding.
Claim 3. The resistance of any encoded homogeneous circuit without multiplicity in mixed encoding is
R_w > (v(p_1) \cdot \ldots \cdot v(p_m))^n,
where v is the Euler totient function, n is the number of variables, and m is the parameter of encoding, i.e. the number of moduli p_i.
Now we add to the algebraic graph of computations on integers nodes with one input and one output that correspond to exponents, i.e. if the input is k, then the output is k^m for some integer m. The definition of homogeneity is the same as for ordinary homogeneous algebraic circuits. Such circuits will be called general homogeneous circuits. As for homogeneous circuits without multiplicity, the following Claim holds for general homogeneous circuits.
Claim 4. The resistance of any encoded general homogeneous circuit without multiplicity in mixed encoding is
R_w > (v(p_1) \cdot \ldots \cdot v(p_m))^n,
where v is the Euler function, n is the number of variables, and m is the parameter of encoding, i.e. the number of moduli p_i.
Consider a circuit
C(c_1, \ldots, c_n, x_1, \ldots, x_m),
which depends on parameters c_1, \ldots, c_n and inputs x_1, \ldots, x_m. Some circuit
C'(c_1, \ldots, c_n, c_{n+1}, \ldots, c_{n+k}, x_1, \ldots, x_m)
will be called a generalization of C if for some values of parameters c_{n+1}, \ldots, c_{n+k} circuit C' computes the same result as circuit C.
Theorem 1. For any circuit
C(c_1, \ldots, c_n, x_1, \ldots, x_m)
there exists some homogeneous circuit without multiplicity
C'(c_1, \ldots, c_n, c_{n+1}, \ldots, c_{n+k}, x_1, \ldots, x_m)
which generalizes the first circuit; the number of extra multiplications it contains does not exceed twice the number of additions, and k is not greater than the number of extra multiplications.
Then from Theorem 1 and Claim 4 follows the next Theorem.
Theorem 2. For any algebraic circuit without multiplicity there exists an encoded homogeneous circuit without multiplicity in mixed encoding whose resistance is
R_w > (v(p_1) \cdot \ldots \cdot v(p_m))^n,
where v is the Euler function, n is the number of variables, and m is the parameter of encoding, i.e. the number of moduli p_i.
5.2. Multi-linear encoding
In this section we propose, for this type of encoding, a circuit for computing a second degree form with n variables and show that its resistance satisfies the inequality
R_w > v(K)^{mn+n},
where K is the range of integers used, m is the parameter of the encoding, and v is the Euler function. This means that all computations are modulo K. What is multi-linear encoding? Let x = (x_1, \ldots, x_n)^t be a vector of data and
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}   (23)
and
b = (b_1, \ldots, b_m)^t.   (24)
Then the encoded vector is
x' = (x'_1, \ldots, x'_m)^t = Ax + b.   (25)
For matrix A there must exist a left inverse
B = \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nm} \end{pmatrix}.
Then
x = BAx = B(x' - b) = Bx' - Bb = Bx' - b'.   (26)
The elements of matrix B may no longer be integers. Then there exists an integer m such that the matrix mB = B_0 is integer. Then the vector mb' = b_0 is also integer and the following equation holds
mx = (mB)x' - mb' = B_0 x' - b_0.   (27)
How can operations over data be performed in terms of encoded data?
To add two integer numbers x_1 + x_2 in terms of encoded data one may use the following procedure
x_1 + x_2 = (b_{11} x'_1 + \cdots + b_{1m} x'_m) + (b_{21} x'_1 + \cdots + b_{2m} x'_m) - b'_1 - b'_2
= (b_{11} + b_{21}) x'_1 + \cdots + (b_{1m} + b_{2m}) x'_m - (b'_1 + b'_2).   (28)
For calculation of a linear combination of two integer numbers \lambda_1 x_1 + \lambda_2 x_2 the following formula holds
\lambda_1 x_1 + \lambda_2 x_2 = \lambda_1 (b_{11} x'_1 + \cdots + b_{1m} x'_m) + \lambda_2 (b_{21} x'_1 + \cdots + b_{2m} x'_m) - \lambda_1 b'_1 - \lambda_2 b'_2
= (\lambda_1 b_{11} + \lambda_2 b_{21}) x'_1 + \cdots + (\lambda_1 b_{1m} + \lambda_2 b_{2m}) x'_m - (\lambda_1 b'_1 + \lambda_2 b'_2).
The linear combinations of coefficients \lambda_1 b_{1j} + \lambda_2 b_{2j} can be calculated during compilation.
Now consider multiplication. To calculate the product of two integers x_1 x_2 one may use the following formula
x_1 x_2 = (\sum_{i=1}^{m} b_{1i} x'_i - b'_1) \cdot (\sum_{j=1}^{m} b_{2j} x'_j - b'_2)
= \sum_{i,j=1}^{m} b_{1i} b_{2j} x'_i x'_j - \sum_{i=1}^{m} (b'_1 b_{2i} + b'_2 b_{1i}) x'_i + b'_1 b'_2.   (29)
The products b_{1i} b_{2j} and combinations b'_1 b_{2i} + b'_2 b_{1i} can be calculated during compilation.
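Formulas (28) and (29) can be tried on a small instance. In the sketch below the matrices A and B (with BA equal to the identity) and the vector b are hypothetical values chosen so that B is already an integer matrix; plain integer arithmetic is used instead of arithmetic modulo K.

```python
A = [[1, 0], [0, 1], [1, 1]]        # m = 3 rows, n = 2 columns
B = [[2, 1, -1], [-1, 0, 1]]        # left inverse of A: B*A is the identity
b = [5, 7, 11]

def encode(x):
    """x' = A x + b, formula (25)."""
    return [sum(A[k][i] * x[i] for i in range(len(x))) + b[k]
            for k in range(len(A))]

# b' = B b, computed once by the owner of the data.
b_prime = [sum(B[i][k] * b[k] for k in range(len(b))) for i in range(len(B))]

def encoded_sum(xe):
    """x_1 + x_2 from encoded data, formula (28)."""
    return (sum((B[0][k] + B[1][k]) * xe[k] for k in range(len(xe)))
            - (b_prime[0] + b_prime[1]))

def encoded_product(xe):
    """x_1 * x_2 from encoded data, formula (29)."""
    m = len(xe)
    quadratic = sum(B[0][i] * B[1][j] * xe[i] * xe[j]
                    for i in range(m) for j in range(m))
    linear = sum((b_prime[0] * B[1][i] + b_prime[1] * B[0][i]) * xe[i]
                 for i in range(m))
    return quadratic - linear + b_prime[0] * b_prime[1]

x = [3, 4]
xe = encode(x)
total, product = encoded_sum(xe), encoded_product(xe)
```

In a real use the coefficient combinations inside `encoded_sum` and `encoded_product` would be precomputed, so that B and b' themselves never appear in the encoded program.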
Now consider calculation of the second degree form
f(x_1, \ldots, x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{ij} x_i x_j + \sum_{i=1}^{n} \beta_i x_i + \gamma   (30)
using encoded data x'.
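Claim 5. Let
\xi_{kl} = \sum_{i,j=1}^{n} \alpha_{ij} b_{ik} b_{jl}   for k, l = 1, \ldots, m   (31)
\zeta_k = \sum_{i=1}^{n} \beta_i b_{ik} - \sum_{i,j=1}^{n} \alpha_{ij} (b'_i b_{jk} + b'_j b_{ik})   for k = 1, \ldots, m   (32)
\eta = \sum_{i,j=1}^{n} \alpha_{ij} b'_i b'_j - \sum_{i=1}^{n} \beta_i b'_i + \gamma.   (33)
Then the following formula for computing function f defined by equation (30) holds
f(x_1, \ldots, x_n) = \sum_{k,l=1}^{m} \xi_{kl} x'_k x'_l + \sum_{k=1}^{m} \zeta_k x'_k + \eta.   (34)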
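Claim 5 can be checked on a small example. The matrices, the vector b and the coefficients of the form below are hypothetical, B is chosen as an integer left inverse of A, and ordinary integer arithmetic is used in place of arithmetic modulo K.

```python
A = [[1, 0], [0, 1], [1, 1]]        # encoding matrix, m = 3, n = 2
B = [[2, 1, -1], [-1, 0, 1]]        # integer left inverse of A
b = [5, 7, 11]
n, m = 2, 3

alpha = [[1, 2], [3, 4]]            # coefficients of the form (30)
beta = [5, 6]
gamma = 7
x = [3, 4]

x_enc = [sum(A[k][i] * x[i] for i in range(n)) + b[k] for k in range(m)]
bp = [sum(B[i][k] * b[k] for k in range(m)) for i in range(n)]   # b' = B b

# Coefficients (31)-(33).
xi = [[sum(alpha[i][j] * B[i][k] * B[j][l]
           for i in range(n) for j in range(n))
       for l in range(m)] for k in range(m)]
zeta = [sum(beta[i] * B[i][k] for i in range(n))
        - sum(alpha[i][j] * (bp[i] * B[j][k] + bp[j] * B[i][k])
              for i in range(n) for j in range(n))
        for k in range(m)]
eta = (sum(alpha[i][j] * bp[i] * bp[j] for i in range(n) for j in range(n))
       - sum(beta[i] * bp[i] for i in range(n)) + gamma)

# Formula (34) on encoded data versus formula (30) on plaintext data.
encoded_value = (sum(xi[k][l] * x_enc[k] * x_enc[l]
                     for k in range(m) for l in range(m))
                 + sum(zeta[k] * x_enc[k] for k in range(m)) + eta)
plain_value = (sum(alpha[i][j] * x[i] * x[j]
                   for i in range(n) for j in range(n))
               + sum(beta[i] * x[i] for i in range(n)) + gamma)
```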
Claim 6. The resistance of expression (34) in multi-linear encoding satisfies the inequality
R_w > (v(K))^{mn+n},
where K is the range of integers used and m is the parameter of the encoding.
For K = 2^{64} (the usual range for representing integers) the value v(K) = 2^{63}, and so the lower bound for the suggested circuit computing a second degree form is greater than 2^{100}. This seems large enough both from the point of view of computational complexity (enumerating all possible solutions is infeasible) and from the probabilistic point of view (the probability of guessing the right parameters is less than 2^{-100}).
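The size of this bound is easy to reproduce. The sketch below uses a straightforward trial-division totient (adequate for the small or highly structured arguments used here) and reports the bound of Claim 6 as a number of bits; the helper names are hypothetical.

```python
def euler_phi(n):
    """Euler totient function by trial division."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:                 # one prime factor larger than sqrt remains
        result -= result // n
    return result

def multilinear_bound_bits(K, m, n):
    """log2 of v(K)^(mn+n); exact when v(K) is a power of two."""
    phi = euler_phi(K)
    return (phi.bit_length() - 1) * (m * n + n)

phi_K = euler_phi(2 ** 64)
bits = multilinear_bound_bits(2 ** 64, 1, 1)   # smallest case, m = n = 1
```

Even in the smallest case m = n = 1 the exponent mn + n equals 2, so the bound is 2^{126}, above the 2^{100} level quoted in the text.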
6. On “good” and “bad” parameters of encodings
The measure of resistance of encodings we introduced makes it possible to analyze encodings on a quantitative basis and to choose those parameters which provide greater resistance. Moreover, such a measure of resistance allows one to compare different algebraic circuits and to choose those that have greater resistance.
6.1. An example of “bad” parameters in linear encodings
Let p_1, p_2, p_3, p_4 be four different prime numbers. Let data x_1 = p_3 and x_2 = p_4 be linearly encoded by formulas
x'_1 = p_1 x_1 + b_1,   x'_2 = p_2 x_2 + b_2.
Let the adversary observe the computation of their product x_3 = x_1 x_2 in terms of encoded data made by the following formula
x'_3 = x'_1 x'_2 - b_1 x'_2 - b_2 x'_1 + b_1 b_2.
The result of the encoded computation is connected with the real product by the relation x'_3 = a_1 a_2 x_3. Therefore the adversary knows the values b_1, b_2, x'_1 and x'_2. So he can find that
x'_1 - b_1 = a_1 x_1
x'_2 - b_2 = a_2 x_2.
In our case the adversary finds that
a_1 x_1 = p_1 p_3
a_2 x_2 = p_2 p_4.
Hence he obtains that there are 4 cases for a_1 and x_1
• a_1 = 1, x_1 = p_1 p_3
• a_1 = p_1, x_1 = p_3
• a_1 = p_3, x_1 = p_1
• a_1 = p_1 p_3, x_1 = 1,
and 4 independent cases for a_2 and x_2
• a_2 = 1, x_2 = p_2 p_4
• a_2 = p_2, x_2 = p_4
• a_2 = p_4, x_2 = p_2
• a_2 = p_2 p_4, x_2 = 1.
So there are only 16 possible cases in this example.
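The adversary's enumeration in this example can be reproduced directly. The primes and offsets below are hypothetical small values; the point is only that, once b_1 and b_2 are visible, the candidates are the divisor pairs of x'_1 - b_1 and x'_2 - b_2.

```python
def factor_pairs(n):
    """All pairs (a, x) of positive integers with a * x == n."""
    return [(a, n // a) for a in range(1, n + 1) if n % a == 0]

p1, p2, p3, p4 = 3, 5, 7, 11       # four different primes
b1, b2 = 100, 200                  # offsets, visible in the encoded product

x1_enc = p1 * p3 + b1              # x'_1 = a_1 x_1 + b_1 with a_1 = p1, x_1 = p3
x2_enc = p2 * p4 + b2              # x'_2 = a_2 x_2 + b_2 with a_2 = p2, x_2 = p4

candidates_1 = factor_pairs(x1_enc - b1)   # possible (a_1, x_1)
candidates_2 = factor_pairs(x2_enc - b2)   # possible (a_2, x_2)
real_worlds = [(c1, c2) for c1 in candidates_1 for c2 in candidates_2]
```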
6.2. Resistance and comparison of algebraic circuits for encoded computations
Now consider multiplication of two integers and estimate its resistance. At first consider linear encoding. Let encodings for x_1 and x_2 be given by formulas
x'_1 = a_1 x_1 + b_1
x'_2 = a_2 x_2 + b_2
and m = GCD(a_1, a_2). Then the product y = x_1 x_2 can be encoded by the formula (see [10])
y' = x'_1 x'_2 - b_2 x'_1 - b_1 x'_2.
Then an adversary observes the parameters of encoding b_1 and b_2 and finds that
a_1 a_2 | (x'_1 x'_2 - b_2 x'_1 - b_1 x'_2 + b_1 b_2).
In the example of Subsection 6.1 we obtain only 16 possibilities for parameters a_1, a_2, x_1 and x_2.
So in the case of linear encoding the upper bound for resistance of multiplication in that example is equal to 16. To increase the resistance of multiplication it is necessary to impose restrictions on the encoding parameters.
Now consider a variant of mixed encoding. Let encodings for x_1 and x_2 be given by formulas
x'_{1i} = a_{1i} x_{1i} + b_{1i}
x'_{2i} = a_{2i} x_{2i} + b_{2i},
where
x_1 = x_{1i} mod p_i for 0 \le x_{1i} < p_i,   x_2 = x_{2i} mod p_i for 0 \le x_{2i} < p_i.
Then multiplication y = x_1 x_2 can be expressed in terms of encoded data by formulas (similar to linear encoding) (see [10])
y'_i = x'_{1i} x'_{2i} - b_{2i} x'_{1i} - b_{1i} x'_{2i}.   (35)
Then an adversary finds that
a_{1i} a_{2i} | (x'_{1i} x'_{2i} - b_{2i} x'_{1i} - b_{1i} x'_{2i} + b_{1i} b_{2i})   (36)
and we have the same problem as in the case of linear encoding. In this case the resistance is at most 16^m. In the case m = 5 it is not more than 2^{20}.
Now consider another variant of mixed encoding. In this case integers x_1 and x_2 are encoded by the formulas
a_{1i} x_{1i} + b_{1i} = x'_{1i} mod p_i,   a_{2i} x_{2i} + b_{2i} = x'_{2i} mod p_i,
where GCD(a_{ki}, p_i) = 1 and there are no other restrictions on x'_{1i} and x'_{2i} and coefficients a_{ki} and b_{ki}. Then the product y = x_1 x_2 can be expressed in terms of encoded data by the formula (see [11])
y'_i = \alpha_{i3} x'_{1i} x'_{2i} + \alpha_{i1} x'_{1i} + \alpha_{i2} x'_{2i},   (37)
where
\alpha_{i3} = \gamma_i a_{1i} a_{2i} mod p_i
\alpha_{i1} = \gamma_i a_{1i} a_{2i} b_{2i} mod p_i   (38)
\alpha_{i2} = \gamma_i a_{1i} a_{2i} b_{1i} mod p_i
for some (arbitrary) \gamma_i such that GCD(\gamma_i, p_i) = 1. Then the adversary observes parameters
\alpha_{11}, \alpha_{12}, \alpha_{13}, \ldots, \alpha_{m1}, \alpha_{m2}, \alpha_{m3}.
In this case the function A of formula (8) is given by
A(a_{11}, b_{11}, \ldots, a_{1m}, b_{1m}, a_{21}, b_{21}, \ldots, a_{2m}, b_{2m}) = (\alpha_{11}, \alpha_{12}, \alpha_{13}, \ldots, \alpha_{m1}, \alpha_{m2}, \alpha_{m3}).
It may be shown in this case that the resistance of such multiplication is at least v(p_1) \cdot \ldots \cdot v(p_m), where v is the Euler totient function.
7. Conclusions
The main contribution of this paper is the notion of measure of resistance of encoded computations that we introduced. It makes it possible to perform quantitative analysis of encoding schemes and to compare different data encodings. The results presented here show a sufficiently high level of protection of data during computations when the considered encoding schemes are used. Protection of data in the encoded world is guaranteed by the lower bounds of resistance we obtained. Typically the bounds are at least 2^{100} when operating with integers of range 2^{64}.
References
[1] A. V. Aho, J. E. Hopcroft, J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Company, 1976.
[2] D. E. Knuth, The Art of Computer Programming, vol. 2, Seminumerical Algorithms, 1997.
[3] G. Birkhoff, T. C. Bartee, Modern Applied Algebra, McGraw-Hill Book Company, 1975.
[4] T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms, The MIT Press, Cambridge, Massachusetts, London, England, 1997.
[5] S. T. Chow, H. J. Johnson, Yuan Gu, Tamper Resistant Software Encoding, 08-878835US, 1999.
[6] I. Niven, H. S. Zuckerman, An Introduction to the Theory of Numbers, Wiley, 1980.
[7] C. Collberg, C. Thomborson, D. Low, Manufacturing cheap, resilient and stealthy opaque constructs, Symp. on Principles of Prog. Lang., 1998, p. 184-196.
[8] M. Mambo, T. Murayama, E. Okamoto, A tentative approach to constructing tamper-resistant software, Workshop on New Security Paradigms, 1998, p. 23-33.
[9] C. Wang, J. Hill, J. Knight, J. Davidson, Software tamper resistance: obstructing static analysis of programs, Tech. Report, N 12, Dep. of Comp. Sci., Univ. of Virginia, 2000.
[10] A. V. Shokurov, On encodings of integers and the division problem, Technical Report, Institute for System Programming, Russian Acad. of Sci., May 2000.
[11] A. V. Shokurov, On measures of resistance of data encodings, Technical Report, Institute for System Programming, Russian Acad. of Sci., February 2001.
Appendix
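Proof of Claim 1. We have (all computations are made modulo p_k)
\sum_{i,j=1}^{n} \alpha_{ijk} x_i x_j = \sum_{i,j=1}^{n} \alpha_{ijk} (c_{ik} x'_{ik} + d_{ik}) (c_{jk} x'_{jk} + d_{jk})
= \sum_{i,j=1}^{n} \alpha_{ijk} (c_{ik} c_{jk} x'_{ik} x'_{jk} + c_{ik} d_{jk} x'_{ik} + c_{jk} d_{ik} x'_{jk} + d_{ik} d_{jk})
= \sum_{i,j=1}^{n} \alpha_{ijk} c_{ik} c_{jk} x'_{ik} x'_{jk} + \sum_{i=1}^{n} ( \sum_{j=1}^{n} (\alpha_{ijk} + \alpha_{jik}) c_{ik} d_{jk} ) x'_{ik} + \sum_{i,j=1}^{n} \alpha_{ijk} d_{ik} d_{jk}.
For the linear part of this expression we can write
\sum_{i=1}^{n} \beta_{ik} x_i = \sum_{i=1}^{n} \beta_{ik} (c_{ik} x'_{ik} + d_{ik}) = \sum_{i=1}^{n} \beta_{ik} c_{ik} x'_{ik} + \sum_{i=1}^{n} \beta_{ik} d_{ik}.
Now let
\xi_{ijk} = \alpha_{ijk} c_{ik} c_{jk}   for i, j = 1, \ldots, n   (40)
\zeta_{ik} = \sum_{j=1}^{n} (\alpha_{ijk} + \alpha_{jik}) c_{ik} d_{jk} + \beta_{ik} c_{ik}   for k = 1, \ldots, m   (41)
\eta_k = \sum_{i,j=1}^{n} \alpha_{ijk} d_{ik} d_{jk} + \sum_{i=1}^{n} \beta_{ik} d_{ik} + \gamma_k.   (42)
Then the formula (22) for computing function f defined by equation (18) holds. ■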
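Proof of Claim 2. The observed data are the coefficients \xi_{ijk}, \zeta_{ik}, \eta_k of formula (22) and the encoded data x'_{ik}. Take arbitrary numbers \delta_{ik} mod p_k that are coprime with p_k and change the coefficients of the form (18) by the relations \tilde{\alpha}_{ijk} = \delta_{ik} \delta_{jk} \alpha_{ijk} and \tilde{\beta}_{ik} = \delta_{ik} \beta_{ik}. Then change the parameters of encoding by the relations \tilde{c}_{ik} = \delta_{ik}^{-1} c_{ik} and \tilde{d}_{ik} = \delta_{ik}^{-1} d_{ik}, where \delta_{ik}^{-1} are the inverses of \delta_{ik} modulo p_k. Then from formulas (19)-(21) for \tilde{\alpha}_{ijk} and \tilde{\beta}_{ik}
\tilde{\xi}_{ijk} = \tilde{\alpha}_{ijk} \tilde{c}_{ik} \tilde{c}_{jk} = \delta_{ik} \delta_{jk} \alpha_{ijk} \delta_{ik}^{-1} c_{ik} \delta_{jk}^{-1} c_{jk} = \alpha_{ijk} c_{ik} c_{jk} = \xi_{ijk},
\tilde{\zeta}_{ik} = \sum_{j=1}^{n} (\tilde{\alpha}_{ijk} + \tilde{\alpha}_{jik}) \tilde{c}_{ik} \tilde{d}_{jk} + \tilde{\beta}_{ik} \tilde{c}_{ik}
= \sum_{j=1}^{n} \delta_{ik} \delta_{jk} (\alpha_{ijk} + \alpha_{jik}) \delta_{ik}^{-1} c_{ik} \delta_{jk}^{-1} d_{jk} + \delta_{ik} \beta_{ik} \delta_{ik}^{-1} c_{ik} = \zeta_{ik},
\tilde{\eta}_k = \sum_{i,j=1}^{n} \tilde{\alpha}_{ijk} \tilde{d}_{ik} \tilde{d}_{jk} + \sum_{i=1}^{n} \tilde{\beta}_{ik} \tilde{d}_{ik} + \gamma_k
= \sum_{i,j=1}^{n} \delta_{ik} \delta_{jk} \alpha_{ijk} \delta_{ik}^{-1} d_{ik} \delta_{jk}^{-1} d_{jk} + \sum_{i=1}^{n} \delta_{ik} \beta_{ik} \delta_{ik}^{-1} d_{ik} + \gamma_k = \eta_k,
so the coefficients of the encoded form do not change. Thus the encoded data and the circuit do not change. Since each \delta_{ik} can be chosen in v(p_k) ways independently for every i = 1, \ldots, n and k = 1, \ldots, m, the same observable world corresponds to at least (v(p_1) \cdot \ldots \cdot v(p_m))^n real worlds.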
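The invariance used in this proof can be confirmed numerically for a single modulus. In the sketch below all concrete values are hypothetical and the modulus index k is omitted from the variable names; the function recomputes the coefficients (19)-(21) before and after rescaling by the numbers delta.

```python
def observable_coefficients(alpha, beta, gamma, c, d, p):
    """Coefficients xi, zeta, eta of formulas (19)-(21) modulo p."""
    n = len(c)
    xi = [[alpha[i][j] * c[i] * c[j] % p for j in range(n)] for i in range(n)]
    zeta = [(sum((alpha[i][j] + alpha[j][i]) * c[i] * d[j] for j in range(n))
             + beta[i] * c[i]) % p for i in range(n)]
    eta = (sum(alpha[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
           + sum(beta[i] * d[i] for i in range(n)) + gamma) % p
    return xi, zeta, eta

p, n = 101, 3
alpha = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
beta = [2, 3, 5]
gamma = 11
c, d = [34, 81, 29], [64, 97, 39]   # some decoding constants modulo p
delta = [2, 3, 5]                   # arbitrary numbers coprime with p
delta_inv = [pow(t, -1, p) for t in delta]

# Rescaled "real world": a different form and different decoding constants.
alpha_t = [[delta[i] * delta[j] * alpha[i][j] % p for j in range(n)]
           for i in range(n)]
beta_t = [delta[i] * beta[i] % p for i in range(n)]
c_t = [delta_inv[i] * c[i] % p for i in range(n)]
d_t = [delta_inv[i] * d[i] % p for i in range(n)]

before = observable_coefficients(alpha, beta, gamma, c, d, p)
after = observable_coefficients(alpha_t, beta_t, gamma, c_t, d_t, p)
```

Both calls return the same observable coefficients, although the underlying form and decoding constants differ.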
The proof of Claims 3 and 4 is similar to the proof of Claim 2.
Proof of Claim 5. We have
\sum_{i,j=1}^{n} \alpha_{ij} x_i x_j = \sum_{i,j=1}^{n} \alpha_{ij} ( \sum_{k=1}^{m} b_{ik} x'_k - b'_i ) \cdot ( \sum_{l=1}^{m} b_{jl} x'_l - b'_j )
= \sum_{i,j=1}^{n} \alpha_{ij} ( \sum_{k,l=1}^{m} b_{ik} b_{jl} x'_k x'_l - b'_i \sum_{l=1}^{m} b_{jl} x'_l - b'_j \sum_{k=1}^{m} b_{ik} x'_k + b'_i b'_j )   (43)
= \sum_{k,l=1}^{m} ( \sum_{i,j=1}^{n} \alpha_{ij} b_{ik} b_{jl} ) x'_k x'_l - \sum_{k=1}^{m} ( \sum_{i,j=1}^{n} \alpha_{ij} (b'_i b_{jk} + b'_j b_{ik}) ) x'_k + \sum_{i,j=1}^{n} \alpha_{ij} b'_i b'_j.
For the linear part of this expression we can write
\sum_{i=1}^{n} \beta_i x_i = \sum_{i=1}^{n} \beta_i ( \sum_{k=1}^{m} b_{ik} x'_k - b'_i ) = \sum_{k=1}^{m} ( \sum_{i=1}^{n} \beta_i b_{ik} ) x'_k - \sum_{i=1}^{n} \beta_i b'_i.   (44)
Now denote by
\xi_{kl} = \sum_{i,j=1}^{n} \alpha_{ij} b_{ik} b_{jl}   for k, l = 1, \ldots, m   (45)
\zeta_k = \sum_{i=1}^{n} \beta_i b_{ik} - \sum_{i,j=1}^{n} \alpha_{ij} (b'_i b_{jk} + b'_j b_{ik})   for k = 1, \ldots, m   (46)
\eta = \sum_{i,j=1}^{n} \alpha_{ij} b'_i b'_j - \sum_{i=1}^{n} \beta_i b'_i + \gamma.   (47)
Then the formula (34) for computing function f defined by equation (30) holds. ■
The proof of Claim 6 is similar to the proof of Claim 2.