Checking parameterized Promela models of cache coherence protocols

Burenkov V.S.; Kamkin A.S.


V.S. Burenkov (1), A.S. Kamkin (2)

(1) JSC MCST, 24 Vavilov str., Moscow, 119334, Russian Federation

(2) Institute for System Programming of the Russian Academy of Sciences, 25 Alexander Solzhenitsyn str., Moscow, 109004, Russian Federation

Abstract. This paper introduces a method for scalable verification of cache coherence protocols described in the Promela language. Scalability means that the resources spent on verification (first of all, machine time and memory) do not depend on the number of processors in the system under verification. The method comprises three main steps. First, a Promela model written for a certain configuration of the system is generalized to a model parameterized by the number of processors. This step relies on some assumptions about the protocol as well as on simple induction rules. Second, the parameterized model is abstracted from the number of processors. This is done by syntactic transformations of the model's assignments, expressions, and communication actions. Finally, the abstract model is verified with the SPIN model checker in the usual way. The method description is accompanied by a proof of its correctness. It is shown that the suggested abstraction is conservative in the sense that every invariant (a property that is true in all reachable states) of the abstract model is an invariant of the original model; invariant properties are the properties of interest in the verification of cache coherence protocols. The method has been automated by a tool prototype that, given a Promela model, parses the code, builds the abstract syntax tree, transforms it according to the rules, and maps it back to Promela. The tool (and the method in general) has been successfully applied to the verification of the MOSI protocols implemented in the Elbrus computer systems.

Keywords: multicore microprocessors, shared memory multiprocessors, cache coherence protocols, model checking, SPIN, Promela.

DOI: 10.15514/ISPRAS-2016-28(4)-4

For citation: Burenkov V.S., Kamkin A.S. Checking Parameterized Promela Models of Cache Coherence Protocols. Trudy ISP RAN/Proc. ISP RAS, volume 28, issue 4, 2016, pp. 57-76. DOI: 10.15514/ISPRAS-2016-28(4)-4

1. Introduction

Shared memory multiprocessors (SMP) constitute one of the most common classes of high-performance computer systems. In particular, this class includes multicore microprocessors, which combine several processors (cores) on a single chip [1]. Nowadays, 8- and 16-core microprocessors are in mass production; hardware vendors have announced forthcoming 48-, 80-, and even 100-core designs. Multicore microprocessors and SMP systems are also designed by Russian companies such as MCST and INEUM, e.g., Elbrus-4C (4 cores, 2014) and Elbrus-8C (8 cores, 2015) [2].

The main problem arising in the development of SMP systems is ensuring memory coherence. As each processor contains a local cache, multiple copies of the same data may exist in the system: one copy is in the main memory, and several copies are in the processors' caches. Modification of a copy should cause either the invalidation of the other copies or their consistent modification. This is supported by so-called cache controllers, i.e. memory devices connected into a network and cooperating in accordance with a special protocol, a so-called cache coherence protocol (CCP) [3]. Development of cache coherence mechanisms includes two stages: first, the design of a CCP; second, its implementation in hardware. Both stages are error-prone; accordingly, both protocol verification methods and hardware verification methods are in use [4]. Protocol bugs are especially critical and should be revealed before the hardware is implemented. The widely recognized method for protocol verification is model checking [5]. It is fully automated but suffers from a principal drawback: it does not scale due to the state space explosion problem. Using traditional methods to verify a CCP of a system with four or more processors is impossible, or at least highly problematic [6].

To overcome this issue and develop scalable verification technologies, researchers use parameterized model checking [7]. The idea is to construct abstract models that are independent of the number of processors and can be verified with existing tools. Correctness of the abstract model guarantees correctness of the original one (checking, however, may produce spurious error reports, so-called false positives). The proposed approach is also of this type. In contrast to the existing ones, it supports the Promela language used in the SPIN model checker [8], including its message passing primitives. The method has been successfully used for verifying the CCPs implemented in the Elbrus computer systems [2].

The paper is structured as follows. In Section 2, we analyze the existing approaches to CCP verification. In Section 3, we propose a method for constructing an abstract model out of a Promela protocol model. In Section 4, we describe theoretical foundations of the suggested method. In Section 5, we provide a case study on using the method for verifying a MOSI protocol. In Section 6, we summarize our work and outline directions of further research.

2. Related work

As has been said, classical model checking is inapplicable to CCPs with an arbitrary number of processors. There exists an alternative approach, called deductive verification; however, it is hard to automate due to the need for so-called inductive invariants [9], and it does not provide any diagnostic information when there are errors.

Parameterized model checking seems to be a more promising approach. Two directions are worth mentioning.

First, verification of a parameterized model (in essence, a family of models) can be reduced to the verification of a single model of the family. The corresponding methods aim at finding a number N such that verification of the model with N components (processors, cache controllers, etc.) is sufficient for proving correctness in general. In [7], a method of this kind is presented, and it is reported that N = 7 is enough for the protocols examined. However, this value is too large for the method to be applicable to industrial SMP systems [6].

Second, a model (a parameterized model) can be abstracted so as to reduce the state space size (make it independent of the number of components). In [10], a method for abstracting a model from the exact number of replicated identical components (e.g., caches in which the cache line is in a given state) is introduced. The technique significantly reduces the state space size; however, its reliance on a modified version of the Murφ tool complicates its real-life application. A similar idea, called (0, 1, ∞) counter abstraction, is employed in [11]-[13]. Though the technique seems to be powerful, it often leads to overly detailed abstract models, which makes the approach inapplicable to complex protocols.

In [14], a general method for compositional verification is proposed. The idea is to replace a subset of identical components with an abstract one, called the environment. Such a replacement usually leads to false positives, and considerable effort is required to eliminate them. In [15]-[18], the approach has been adapted to CCPs. The suggested method is based on syntactic transformations of Murφ models and counterexample-guided abstraction refinement (CEGAR). Its main drawbacks are as follows:

• Murφ does not support message passing primitives, which complicates CCP description;

• restrictions on Murφ models of CCPs are not clearly defined;

• the tools are not publicly available.

3. Suggested method

The problem to be solved is as follows. Given a Promela model of a CCP for some configuration of an SMP system (i.e. a model with a fixed number n > 2 of processors), it is required to check the CCP correctness for an arbitrary configuration of the system (i.e. for any N > n).

Models considered in this paper satisfy the following conditions (obtained from verification practice and shown to be sufficient for specifying CCPs). The allowed statements are if, do, goto, = (assignment), ! (send), and ? (receive). Each guarded action is placed in an atomic block and is therefore executed without interruption; else alternatives are absent. The right-hand sides of assignments contain only primary expressions, i.e. variables and constants; left-hand sides are variables and array elements (an array index is a primary expression). Atomic logic formulae are of the form x == c or B(ch), where x is a variable (or an array element), c is a constant, ch is a channel, and B is a predicate: empty, full, etc.

3.1 Model parameterization

From the conceptual point of view, a CCP model consists of an unbounded number of replicated identical processes, so-called basic processes, and a fixed number of auxiliary processes. Without loss of generality we will assume that there is only one auxiliary process. All processes are enumerated from 0 to N, where N is a parameter: 0 is the identifier of the auxiliary process, while 1,...,N are the identifiers of the basic processes. All arrays used in the model (arrays of variables and arrays of channels) are of length N and indexed with the identifiers of the basic processes.

To generalize the original model to a parameterized one, the following induction rules are used:

• each condition containing an array is either a conjunction or a disjunction of similar conditions on all array elements:

◦ $\varphi\{i/1\} \wedge \dots \wedge \varphi\{i/n\}$ is interpreted as $\forall i \in \{1, \dots, N\}: \varphi$;

◦ $\varphi\{i/1\} \vee \dots \vee \varphi\{i/n\}$ is interpreted as $\exists i \in \{1, \dots, N\}: \varphi$;

• each sequence of statements $\alpha\{i/1\}; \dots; \alpha\{i/n\}$ is interpreted as a loop for (i: 1 .. N) {α}.

Here, $\varphi$ ($\alpha$) is a formula (statement) containing an index i as a free variable, and $\varphi\{i/t\}$ ($\alpha\{i/t\}$) denotes the result of substituting t for all occurrences of i in $\varphi$ ($\alpha$).
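The induction rules above can be recognized syntactically. The following Python sketch is a minimal illustration, assuming that formulae and statements are handled as plain strings, that the index is written as the standalone identifier i, and that the template φ (α) is already known; the helper names are hypothetical and are not taken from the authors' tool, which works on an abstract syntax tree.

```python
import re
from typing import List, Optional


def instantiate(template: str, t: int) -> str:
    """Return template{i/t}: substitute t for every standalone occurrence of i."""
    return re.sub(r"\bi\b", str(t), template)


def matches(parts: List[str], template: str) -> bool:
    """Check that parts == [template{i/1}, ..., template{i/n}]."""
    return parts == [instantiate(template, t) for t in range(1, len(parts) + 1)]


def generalize_condition(parts: List[str], op: str, template: str) -> Optional[str]:
    """Rewrite phi{i/1} op ... op phi{i/n} as a quantified formula over 1..N."""
    if not matches(parts, template):
        return None
    quantifier = {"&&": "forall", "||": "exists"}[op]
    return f"{quantifier} i in 1..N: {template}"


def generalize_sequence(stmts: List[str], template: str) -> Optional[str]:
    """Rewrite alpha{i/1}; ...; alpha{i/n} as a Promela for-loop over 1..N."""
    if not matches(stmts, template):
        return None
    return f"for (i : 1 .. N) {{ {template} }}"


# A condition written for n = 3 processors and its generalization.
print(generalize_condition(["cache[1] == I", "cache[2] == I", "cache[3] == I"],
                           "&&", "cache[i] == I"))
```

Requiring the instances to appear in the order 1, ..., n keeps the check purely syntactic; a real implementation would also have to infer the template itself.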

3.2 Assumptions

Let us consider a CCP where request processing is coordinated by the system commutator of the home processor (the processor that owns the requested data). Accordingly, the Promela model contains two process types: proc is a cache controller (a basic process), and home is the home processor's commutator (an auxiliary process). As usual, the CCP model deals with a single cache line. Broadly speaking, the CCP works as follows. Each proc instance may initiate an operation on the cache line by sending a primary request to the home process. Upon its reception and analysis, home sends snoop requests to all processes except the sender. After receiving a snoop request, a proc sends a response to the sender (data or an acknowledgement that it has completed an action on the cache line). Having collected all the responses, the sender informs home of the completion of the operation. As soon as the completion message is received, home can accept the next primary request (see Fig. 1).

Fig. 1. Generalized scheme of a CCP

It is worth emphasizing that at most one primary request is being processed at any moment of time. It is assumed that the values of global variables (e.g., the current sender identifier) are set by home upon reception of a primary request and do not change during its processing.

Each channel can be read by a single process; however, multiple processes are allowed to write into it. A channel is called simple if there is only one sender; otherwise, it is called multiplexed. Let $C_{S \to r}$ be the set of channels with the reader r and senders from the set S. Channels are divided into three groups (hereinafter, singletons are written without braces, e.g., $C_{0 \to j}$ stands for $C_{\{0\} \to j}$):

• $C_{p \to \ast} = \bigcup_{j=0}^{N} C_{\{1, \dots, N\} \to j}$ is the set of multiplexed channels of capacity N used by home and proc to receive messages from the basic processes (e.g., the channel over which home receives primary requests, and the channels over which processes receive responses);

• $C_{h \to p} = \bigcup_{j=1}^{N} C_{0 \to j}$ is the set of simple channels of positive capacity (which is defined by the CCP but is independent of N) used by the basic processes to receive messages from home (e.g., the channels over which home transmits snoop requests);

• $C_{p \to h} = \bigcup_{j=1}^{N} C_{j \to 0}$ is the set of simple channels of capacity 1 used by home to receive messages from the basic processes (e.g., the channels over which a sender informs home of operation completion).
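The three channel groups can be told apart by looking only at a channel's reader and its set of senders. The Python sketch below is an illustration under that assumption, with home having identifier 0 and the basic processes having identifiers 1..N; the function name and the string labels are hypothetical.

```python
from typing import FrozenSet, Optional

HOME = 0  # identifier of the auxiliary process (home)


def channel_group(reader: int, senders: FrozenSet[int], n: int) -> Optional[str]:
    """Classify a channel with the given reader and sender set for N = n."""
    basic = frozenset(range(1, n + 1))  # identifiers of the basic processes
    if senders == basic and 0 <= reader <= n:
        return "C_p->*"  # multiplexed, capacity N, read by home or by a proc
    if len(senders) == 1:
        (sender,) = senders
        if sender == HOME and reader in basic:
            return "C_h->p"  # simple, e.g. snoop requests from home to a proc
        if sender in basic and reader == HOME:
            return "C_p->h"  # simple, capacity 1, e.g. completion messages
    return None  # not covered by the assumptions on the model


print(channel_group(0, frozenset({1, 2, 3}), 3))  # channel for primary requests
```

A channel that falls into none of the groups (for instance, one written by home and by a proc) violates the assumptions, so the sketch returns None for it.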

Messages transmitted via channels are ordered pairs of the form (opc, i), where opc is an operation code, and i is the identifier of the message sender. A verified CCP property looks as follows:

$\mathbf{G}\,[\forall k, l \in \{1, \dots, N\}, k \neq l: \varphi\{i/k, j/l\}]$,

where G is an operator that requires its argument to be true in all reachable states of the model [5], and $\varphi$ is a formula with two free indices (i and j) that characterizes coherence of the corresponding caches. For MOSI protocols [3], $\varphi$ is as follows: $\neg(cache[i] = M \wedge cache[j] \neq I) \wedge \neg(cache[i] = O \wedge cache[j] = O)$, where cache is an array that stores the cache line states.
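For a single concrete state, the property under G can be evaluated directly. A minimal Python sketch, assuming the cache line states are encoded as the strings "M", "O", "S", and "I" and the cache array is represented as a dictionary from processor identifiers 1..N to states; checking G itself additionally requires evaluating this over every reachable state, which is the model checker's job.

```python
from itertools import permutations
from typing import Dict


def phi(cache: Dict[int, str], i: int, j: int) -> bool:
    """MOSI coherence condition for the ordered pair of caches (i, j)."""
    return (not (cache[i] == "M" and cache[j] != "I")
            and not (cache[i] == "O" and cache[j] == "O"))


def coherent(cache: Dict[int, str]) -> bool:
    """forall k, l in 1..N with k != l: phi{i/k, j/l}, in one state."""
    return all(phi(cache, k, l) for k, l in permutations(cache, 2))


print(coherent({1: "M", 2: "I", 3: "I"}))  # True
print(coherent({1: "O", 2: "S", 3: "S"}))  # True
print(coherent({1: "M", 2: "S", 3: "I"}))  # False: M coexists with S
```

Using permutations enumerates exactly the ordered pairs with k ≠ l, which matters because φ is not symmetric in i and j.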

3.3 Informal description

The core of the proposed method is a syntactic transformation of Promela code. The transformations change the process types and retain four of the N + 1 processes: a modified home process (homeabs), two modified proc processes (procabs), and an environment process representing the rest of the processes (procenv). Accordingly, the initialization process of the abstract model is as follows (ABS is a constant distinct from 0, 1, and 2):

    init {
      atomic {
        run homeabs(0);
        run procabs(1);
        run procabs(2);
        run procenv(ABS)
      }
    }

The length of all arrays is changed from N to 2 (recall that arrays are indexed with the identifiers of the proc processes). Each array access is supplied with the guard i ≤ 2, where i is the index of the element being accessed.

• On read (in a condition), the atomic formula containing the array access is replaced with undef (an undefined value) if the index is rejected by the guard:

$B(x[i], \dots) \mapsto (i \le 2 \to B(x[i], \dots) : \mathit{undef})$.

In Promela, this corresponds to the conditional expression (B -> t1 : t2), i.e. "if B then t1 else t2".

• On write (in an assignment), the assignment to the array is placed inside a selection statement:

x[i] = t ↦ if :: atomic { i <= 2 -> x[i] = t } :: else -> skip fi.

Assignments to global variables, as well as conditions on global variables, remain unchanged. Channels of the set $C_{h \to p}$ are represented as an array (let us denote it as ch). Like the other arrays, it is truncated to length 2. Each atomic formula over ch[i], where i > 2, is replaced with undef, while each operation on such a channel is removed. Channels of the sets $C_{p \to \ast}$ and $C_{p \to h}$ are represented by individual variables, not arrays.
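The read and write rewritings for array accesses can be sketched as string-level rewriting. This Python sketch assumes that the guard admits the two retained identifiers 1 and 2 (bound 2). It emits Promela text in which undef is still a placeholder, to be eliminated later by the three-valued-to-classical transformation. The helper names are hypothetical, and the authors' tool operates on an abstract syntax tree rather than on strings.

```python
BOUND = 2  # number of procabs processes kept in the abstract model


def abstract_read(atom: str, index: str, bound: int = BOUND) -> str:
    """B(x[i], ...) -> (i <= 2 -> B(x[i], ...) : undef)."""
    return f"({index} <= {bound} -> {atom} : undef)"


def abstract_write(array: str, index: str, rhs: str, bound: int = BOUND) -> str:
    """x[i] = t -> if :: atomic { i <= 2 -> x[i] = t } :: else -> skip fi."""
    return "\n".join([
        "if",
        f":: atomic {{ {index} <= {bound} -> {array}[{index}] = {rhs} }}",
        ":: else -> skip",  # writes to truncated elements are dropped
        "fi",
    ])


print(abstract_read("cache[j] == M", "j"))
print(abstract_write("cache", "j", "I"))
```

Keeping the guard inside the atomic block makes the first option non-executable for an out-of-range index, so control falls through to the else option.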

Send statements are either unchanged or removed. A statement ch!m in a process type P is removed only in the following cases:

• ch ∈ $C_{h \to e}$ and P = homeabs, where $C_{h \to e} = \bigcup_{j=3}^{N} C_{0 \to j}$; e.g., homeabs does not send snoop requests to procenv;

• ch ∈ $C_{p \to \ast}$ and P = procenv; e.g., procenv does not send primary requests or snoop responses.

Receive statements may be left unchanged, modified, or removed. A statement ch?m in a process type P is removed only in the following case:

• ch ∈ $C_{h \to e}$ and P = procenv; e.g., procenv does not receive snoop requests.

Modification of ch?m takes place solely in the following case:

• ch ∈ $C_{p \to \ast}$ and P ∈ {homeabs, procabs}.

The corresponding transformation replaces a guarded action of the form atomic { B -> ch?m } with the following selection statement:

    if
    :: atomic { B' -> ch?m }
    :: atomic { m.opc = opc_1; m.i = ABS }
    ...
    :: atomic { m.opc = opc_k; m.i = ABS }
    fi

where B' is the result of transforming B, and opc_1, ..., opc_k are all possible operation codes that may be sent along the channel ch.
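The modified receive can be generated mechanically from the transformed guard and the list of operation codes. A minimal Python sketch that emits the selection statement as Promela text; it keeps the text's shorthand ch?m for receiving a whole message, and the channel name resp and the operation codes DATA and ACK in the usage line are hypothetical.

```python
from typing import List


def abstract_receive(guard_abs: str, chan: str, msg: str,
                     opcodes: List[str], abs_id: str = "ABS") -> str:
    """Build the selection statement that replaces atomic { B -> ch?m }."""
    lines = ["if", f":: atomic {{ {guard_abs} -> {chan}?{msg} }}"]
    for opc in opcodes:
        # Alternative behavior: a message that procenv might have sent.
        lines.append(f":: atomic {{ {msg}.opc = {opc}; {msg}.i = {abs_id} }}")
    lines.append("fi")
    return "\n".join(lines)


print(abstract_receive("true", "resp", "m", ["DATA", "ACK"]))
```

One extra option is produced per operation code, so the nondeterministic choice covers every message the removed environment sends could have delivered.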


Fig. 2. Abstraction of a CCP model

Fig. 2 provides a simplified view of CCP model abstraction. All processes except home(0), proc(1), and proc(2) are merged into the environment process procenv(ABS). Solid arrows represent unmodified send/receive statements. Dashed arrows correspond to removed sends and modified receives. After the above transformations have been performed, all logical formulae containing undef (in essence, formulae of Kleene's strong three-valued logic) are transformed into classical logic formulae in which undef in the outer scope is interpreted as true. This is achieved by the obvious transformation F:

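• $F(\varphi) \mapsto G(\varphi, \mathit{true})$;

• $G(\mathit{undef}, \tau) \mapsto \tau$;

• $G(B, \tau) \mapsto B$, where B is an atom distinct from undef;

• $G(\neg\varphi, \tau) \mapsto \neg G(\varphi, \neg\tau)$;

• $G(\varphi \circ \psi, \tau) \mapsto G(\varphi, \tau) \circ G(\psi, \tau)$, where $\circ \in \{\wedge, \vee\}$.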
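The transformation F can be written as a short recursive function. In the Python sketch below, formulae are encoded as nested tuples, an assumption made for illustration only. It also includes a strong Kleene evaluator used to check, on a few small formulae, that F(φ) holds exactly when φ is not definitely false. This relies on another assumption, stated again in the code: each occurrence of undef is treated as an independent unknown.

```python
# Formulae are nested tuples: ("undef",), ("const", b), ("atom", name),
# ("not", f), ("and", f, g), ("or", f, g).
UNDEF = ("undef",)


def G(f, tau):
    """Replace each undef by tau, flipping tau under negation."""
    tag = f[0]
    if tag == "undef":
        return ("const", tau)               # G(undef, tau) -> tau
    if tag in ("atom", "const"):
        return f                            # G(B, tau) -> B
    if tag == "not":
        return ("not", G(f[1], not tau))    # G(not phi, tau) -> not G(phi, not tau)
    if tag in ("and", "or"):
        return (tag, G(f[1], tau), G(f[2], tau))
    raise ValueError(f"unknown node: {tag}")


def F(f):
    return G(f, True)


def ev(f, env):
    """Classical evaluation of an undef-free formula."""
    tag = f[0]
    if tag == "const":
        return f[1]
    if tag == "atom":
        return env[f[1]]
    if tag == "not":
        return not ev(f[1], env)
    a, b = ev(f[1], env), ev(f[2], env)
    return (a and b) if tag == "and" else (a or b)


def kleene(f, env):
    """Strong Kleene evaluation: True, False, or None for undefined."""
    tag = f[0]
    if tag == "undef":
        return None
    if tag in ("const", "atom"):
        return ev(f, env)
    if tag == "not":
        v = kleene(f[1], env)
        return None if v is None else not v
    a, b = kleene(f[1], env), kleene(f[2], env)
    dominant = tag == "or"   # True decides a disjunction, False a conjunction
    if a == dominant or b == dominant:
        return dominant
    return None if a is None or b is None else not dominant


x = ("atom", "x")
samples = [UNDEF, ("not", UNDEF), ("and", x, ("not", UNDEF)),
           ("or", ("not", x), ("and", UNDEF, x))]
for phi in samples:
    for xv in (False, True):
        env = {"x": xv}
        # Assumes independent undef occurrences: F(phi) holds iff phi is
        # not definitely false under strong Kleene logic.
        assert ev(F(phi), env) == (kleene(phi, env) is not False)
print(F(("and", x, ("not", UNDEF))))  # ('and', ('atom', 'x'), ('not', ('const', False)))
```

The polarity flag tau is what makes undef "true in the outer scope": under an odd number of negations it is replaced by false, so the replaced occurrence still pushes the whole formula towards true.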

When transforming the Promela model, the following optimizations are applied:

• constant propagation and folding;

• dead code elimination.

Here are some simple examples:

• $(i \le 2) \mapsto \mathit{true}$ in homeabs and procabs;

• $(\mathit{true} \wedge B) \mapsto B$ and $(\mathit{false} \wedge B) \mapsto \mathit{false}$;

• atomic { true -> α } ↦ α.
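These optimizations can be sketched on the same kind of tuple-encoded formulae. The Python sketch below assumes that the index i is bound to a known constant in a given process, for example 1 or 2 in procabs. It folds comparisons of the form (i <= 2), simplifies conjunctions and disjunctions with constant operands, and drops a guard that folds to true. A guarded action whose guard folds to false is treated as dead code. The node encoding and helper names are hypothetical.

```python
def fold(f, consts):
    """Constant propagation and folding on tuple-encoded formulae.

    Nodes: ("const", b), ("atom", name), ("le", lhs, rhs), ("not", f),
    ("and", f, g), ("or", f, g); lhs/rhs are ints or variable names.
    """
    tag = f[0]
    if tag == "le":
        lhs, rhs = (consts.get(x, x) if isinstance(x, str) else x for x in f[1:])
        if isinstance(lhs, int) and isinstance(rhs, int):
            return ("const", lhs <= rhs)    # e.g. (i <= 2) with i = 1
        return ("le", lhs, rhs)
    if tag == "not":
        a = fold(f[1], consts)
        return ("const", not a[1]) if a[0] == "const" else ("not", a)
    if tag in ("and", "or"):
        a, b = fold(f[1], consts), fold(f[2], consts)
        unit = tag == "and"                 # neutral element: true for and, false for or
        for x, y in ((a, b), (b, a)):
            if x[0] == "const":
                # (true && B) -> B, (false && B) -> false; dually for ||
                return y if x[1] == unit else ("const", not unit)
        return (tag, a, b)
    return f


def fold_guarded(guard, action, consts):
    """atomic { true -> a } -> a; a guard folded to false is dead code (None)."""
    g = fold(guard, consts)
    if g == ("const", True):
        return action
    if g == ("const", False):
        return None
    return (g, action)


print(fold(("and", ("le", "i", 2), ("atom", "B")), {"i": 1}))  # ('atom', 'B')
print(fold_guarded(("le", "i", 2), "x[i] = t", {"i": 2}))       # x[i] = t
```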

It should be noted that, in the general case, the abstraction procedure transforms the N + 1 processes into k + 2 processes, where $k \in \{2, \dots, N-1\}$: k instances of procabs, homeabs, and procenv.

4. Theoretical foundations

4.1 Basic definitions

Let Var be a set of variables and Chan be a set of channels. $Data = Var \cup Chan$ is referred to as the set of data. For each $c \in Chan$, a value $|c| > 0$, called capacity, is defined. A data state (or state for short) is a valuation of data, i.e. a mapping s that maps each variable v to a value $s(v) \in \mathbb{N}$ and each channel c to a sequence of messages $s(c) \in M^*$ such that $|s(c)| \le |c|$, where M is the set of messages. The set of all states is denoted by S. A designated state $s_0 \in S$ is called initial.

Let us assume that there is a language over the data that includes logic formulae and statements, such as x = t (assignment), c!m (send), and c?m (receive). A guard is a formula; an action is a sequence of statements; a guarded action is a pair $\gamma \to \alpha$, where $\gamma$ is a guard and $\alpha$ is an action. The guarded action whose guard is true and whose action is the empty sequence of statements is called empty and is denoted by $\varepsilon$. The set of all guarded actions is denoted by Act. A guarded action $\gamma \to \alpha$ is called executable in $s \in S$ iff (if and only if) $s \models \gamma$.

A process graph (or process for short) is a triple $\langle V, v_0, E \rangle$, where V is a set of vertices, $v_0 \in V$ is the initial vertex, and $E \subseteq V \times Act \times V$ is a set of edges.

Process structure is defined by the control statements: if (selection), do (repetition), and goto (jump). The correspondence between code and processes is straightforward and is not described here.

A system is a set of processes, i.e. $\{\langle V_i, v_{0_i}, E_i \rangle\}_{i=0}^{N}$. Hereinafter, $P_i$ is a shortcut for $\langle V_i, v_{0_i}, E_i \rangle$. A configuration of $\{P_i\}_{i=0}^{N}$ is a pair $(l, s)$, where $l: \{0, \dots, N\} \to \bigcup_{i=0}^{N} V_i$, such that $l(i) \in V_i$ for all $i \in \{0, \dots, N\}$, is the so-called control state, and $s \in S$. The configuration $(l_0, s_0)$, where $l_0(i) = v_{0_i}$ for all $i \in \{0, \dots, N\}$, is called initial.

The state space of a system $\{P_i\}_{i=0}^{N}$ is a triple $\langle C, c_0, T \rangle$, where C is the set of all configurations of the system, $c_0$ is the initial configuration, and $T \subseteq C \times (\{0, \dots, N\} \times \bigcup_{i=0}^{N} E_i) \times C$ is a transition relation such that $((l, s), (i, (v, \gamma \to \alpha, v')), (l', s')) \in T$ iff:

• $l(i) = v$;

• $(v, \gamma \to \alpha, v') \in E_i$;

• $s \models \gamma$;

• $l' = (l \setminus \{i \mapsto v\}) \cup \{i \mapsto v'\}$;

• $s' = [\alpha](s)$, where $[\alpha]: S \to S$ is the semantics of $\alpha$ (actions are assumed to be deterministic).

It is worth mentioning that the restrictions on the transition relation conform to the notion of asynchronous parallelism.

A configuration c is called reachable in a state space $\langle C, c_0, T \rangle$ iff there is a path in T from $c_0$ to c. A state s is called reachable iff a configuration (l, s), for some l, is reachable.
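The definitions of the state space and of reachability translate directly into a naive explicit-state exploration. In this Python sketch, guards and actions are modeled as Python functions, states as hashable tuples, and control states as tuples of vertex names; these representation choices are assumptions for illustration, and the breadth-first order is arbitrary.

```python
from collections import deque
from typing import Callable, Hashable, List, Set, Tuple

State = Hashable
Edge = Tuple[str, Callable[[State], bool], Callable[[State], State], str]
Process = Tuple[str, List[Edge]]          # (initial vertex v0, edges)


def reachable(system: List[Process], s0: State) -> Set[tuple]:
    """Explore the configurations reachable from c0 = (l0, s0).

    A configuration is (l, s) with l a tuple of vertices (the control state).
    Each step moves exactly one process i along an edge (v, guard, action, v')
    with l(i) = v and s |= guard (asynchronous interleaving).
    """
    c0 = (tuple(v0 for v0, _ in system), s0)
    seen = {c0}
    queue = deque([c0])
    while queue:
        l, s = queue.popleft()
        for i, (_, edges) in enumerate(system):
            for v, guard, action, v_next in edges:
                if l[i] == v and guard(s):
                    c = (l[:i] + (v_next,) + l[i + 1:], action(s))
                    if c not in seen:
                        seen.add(c)
                        queue.append(c)
    return seen


def always(_s: State) -> bool:
    return True


def increment(s: State) -> State:
    return (s[0] + 1,)                    # the state is a 1-tuple (x,)


# Two processes, each incrementing a shared variable x exactly once.
proc = ("a", [("a", always, increment, "b")])
configs = reachable([proc, proc], (0,))
print(len(configs), sorted({s for _, s in configs}))
```

The two interleavings of the increments lead to the same final configuration, so four configurations and three states are reachable.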

4.2 System abstraction

A process transformation (or transformation for short) is a function that maps one process to another.


Let $Data_S = (Var_S \cup Chan_S) \subseteq Data$ be a set of significant data. States s and s' are called equivalent (denoted $s \sim s'$) iff $s|_{Data_S} = s'|_{Data_S}$.

A guarded action $\gamma' \to \alpha'$ is referred to as an abstraction of a guarded action $\gamma \to \alpha$ in $s \in S$ iff:

• the truth of $\gamma'$ is determined only by the significant data: for all $s' \in S$ such that $s' \sim s$, $s' \models \gamma'$ iff $s \models \gamma'$;

• the effect of $\alpha'$ is determined only by the significant data: for all $s' \in S$ such that $s' \sim s$, there holds $[\alpha'](s') \sim [\alpha'](s)$;

• $\gamma'$ is weaker than $\gamma$: $s \models \gamma \Rightarrow \gamma'$;

• $\alpha'$ acts similarly to $\alpha$: $[\alpha'](s) \sim [\alpha](s)$.

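A set of guarded actions $\{\gamma'_i \to \alpha'_i\}_{i=1}^{m}$ is referred to as an abstraction of a guarded action $\gamma \to \alpha$ in $s \in S$ iff there exists $i \in \{1, \dots, m\}$ such that $\gamma'_i \to \alpha'_i$ is an abstraction of $\gamma \to \alpha$ in s.

A guarded action $\gamma' \to \alpha'$ (a set $\{\gamma'_i \to \alpha'_i\}_{i=1}^{m}$) is referred to as an abstraction of $\gamma \to \alpha$ iff $\gamma' \to \alpha'$ ($\{\gamma'_i \to \alpha'_i\}_{i=1}^{m}$) is an abstraction of $\gamma \to \alpha$ in all states. An abstraction function is a mapping $f: Act \to 2^{Act}$ such that, for all $\gamma \to \alpha \in Act$, $f(\gamma \to \alpha)$ is an abstraction of $\gamma \to \alpha$. The abstraction function $I(\gamma \to \alpha) = \{\gamma \to \alpha\}$ is called trivial.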
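For a finite set of states, the four conditions of the definition can be checked exhaustively. The Python sketch below assumes states are dictionaries from data names to integers, guards and actions are Python functions, and x is the only significant variable while y is an insignificant local variable. The example guarded actions are hypothetical. Such an exhaustive check is only feasible for toy examples.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

State = Dict[str, int]
GuardedAction = Tuple[Callable[[State], bool], Callable[[State], State]]


def equiv(s1: State, s2: State, sig: FrozenSet[str]) -> bool:
    """s1 ~ s2: the states agree on the significant data."""
    return all(s1[d] == s2[d] for d in sig)


def is_abstraction_in(s: State, concrete: GuardedAction, abstract: GuardedAction,
                      states: List[State], sig: FrozenSet[str]) -> bool:
    """Check the four conditions of the definition in the state s."""
    g, a = concrete
    g2, a2 = abstract
    twins = [t for t in states if equiv(t, s, sig)]
    return (all(g2(t) == g2(s) for t in twins)                 # guard uses Data_S only
            and all(equiv(a2(t), a2(s), sig) for t in twins)  # effect uses Data_S only
            and (not g(s) or g2(s))                           # gamma' weaker than gamma
            and equiv(a2(s), a(s), sig))                      # alpha' acts like alpha


def is_abstraction(concrete, abstract, states, sig) -> bool:
    return all(is_abstraction_in(s, concrete, abstract, states, sig) for s in states)


SIG = frozenset({"x"})  # x is significant, y is local and insignificant
STATES = [{"x": x, "y": y} for x in (0, 1) for y in (0, 1)]
concrete = (lambda s: s["x"] == 0 and s["y"] == 1, lambda s: {**s, "x": 1, "y": 0})
dropped_local = (lambda s: s["x"] == 0, lambda s: {**s, "x": 1})
reads_local = (lambda s: s["y"] == 1, lambda s: {**s, "x": 1})
print(is_abstraction(concrete, dropped_local, STATES, SIG))  # True
print(is_abstraction(concrete, reads_local, STATES, SIG))    # False
```

Dropping the conjunct over the local variable y yields a weaker guard that depends only on x, whereas a guard that still reads y violates the first condition.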

It should be emphasized that this view of abstraction is somewhat simplified. An abstraction function should take into account the context of a guarded action (the process edge, the process, and the model). Thus, it is assumed that each guarded action carries the context information.

Let $P = \langle V, v_0, E \rangle$ be a process, f be an abstraction function, V' be some set, and $R: V \to V'$ be a mapping. The abstraction of P induced by f and R is the process $f(P, R) = \langle V', R(v_0), E' \rangle$, where E' is defined as follows:

• if $(v, \gamma \to \alpha, u) \in E$ and $f(\gamma \to \alpha) = \{\gamma'_i \to \alpha'_i\}_{i=1}^{m}$, then $\{(R(v), \gamma'_i \to \alpha'_i, R(u))\}_{i=1}^{m} \subseteq E'$;

• no other edges belong to E'.

An abstraction f(P, R), where R is a bijection, is referred to as a bijective abstraction. Besides transformations of individual processes, we are interested in transformations that merge several processes into one. Let us consider a particular kind of such transformations, where the processes to be merged are identical. Given a system $\{P_i\}_{i=0}^{N}$, the following notation can be introduced ($i \in \{0, \dots, N\}$):

• $Use_i$ is the set of variables read by $P_i$;

• $Def_i$ is the set of variables assigned by $P_i$;

• $Var_i = Use_i \cup Def_i$ is the set of variables of $P_i$;

• $VarL_i$ is the set of local variables of $P_i$ (we do not define the set $VarL_i$, assuming that it is provided);

• $Var_G = Var \setminus (\bigcup_{i=0}^{N} VarL_i)$ is the set of global variables.

Similarly, the following sets of channels (including the sets of local channels and the set of global channels) can be defined: $In_i$, $Out_i$, $Chan_i$, $ChanL_i$, and $Chan_G$. In addition,

• $Data_i = Var_i \cup Chan_i$ is the set of data of $P_i$;

• $DataL_i = VarL_i \cup ChanL_i$ is the set of local data of $P_i$;

• $Data_G = Var_G \cup Chan_G$ is the set of global data.

Processes are called identical if they can be transformed into one another by renaming their local data. More formally, processes $P_i$ and $P_j$ are called identical if there are a bijection $R: V_i \to V_j$ and a bijection $r: DataL_i \to DataL_j$ such that $R(v_{0_i}) = v_{0_j}$ and $(v, \gamma \to \alpha, u) \in E_i$ iff $(R(v), r(\gamma \to \alpha), R(u)) \in E_j$, where $r(\gamma \to \alpha)$ is the result of renaming the local data in $\gamma \to \alpha$ in accordance with r.

Let $\{P_i\}_{i=k_1}^{k_2}$ be a system of identical processes, $Data_S \cap (\bigcup_{i=k_1}^{k_2} DataL_i) = \emptyset$ (the processes' local data are insignificant), g be an abstraction function, V' be some set, and $R: V_{k_1} \to V'$ be a mapping. The process $g(P_{k_1}, \dots, P_{k_2}; R) = g(P_{k_1}, R)$ is called a unifying abstraction of $\{P_i\}_{i=k_1}^{k_2}$ induced by g and R.

The definition needs to be clarified. Provided that the processes $\{P_i\}_{i=k_1}^{k_2}$ operate simultaneously, there are control states that cannot be represented by a single vertex of the abstraction $g(P_{k_1}, \dots, P_{k_2}; R)$. Thus, a unifying abstraction may turn out to be inadequate. Let us assume that each process can be either active or passive, and that no two processes may be active simultaneously. Besides, the passive mode is organized as the following loop:

• a request is received;

• the local data are updated;

• a response is sent;

• control is returned to the initial vertex.

Let V(E') be the set of all vertices of the edges from E'.

A process $P = \langle V, v_0, E_A \cup E_P \rangle$ is referred to as a bimodal process with the set of active edges $E_A$ and the set of passive edges $E_P$ iff $E_A \cap E_P = \emptyset$ and the graph $\langle V(E_P), E_P \rangle$ is strongly connected.

Given a bimodal process $P = \langle V, v_0, E_A \cup E_P \rangle$, the following notation can be introduced: $V_A = V(E_A)$ and $V_P = V(E_P)$ (generally speaking, $V_A \cap V_P \neq \emptyset$). The process $g(P, R) = \langle V', v'_0, E' \rangle$, where g is an abstraction function and $R: V \to V'$ is a mapping, is called a serializing abstraction of P iff R satisfies the following properties:

• $R(v) = v'_0$ for all $v \in V_P \setminus V_A$;

• R restricted to $V_A$ is a bijection $V_A \to V'$;

and E' is defined as follows:

• if $(v, \gamma \to \alpha, u) \in E_A$ and $g(\gamma \to \alpha) = \{\gamma'_i \to \alpha'_i\}_{i=1}^{m}$, then $\{(R(v), \gamma'_i \to \alpha'_i, R(u))\}_{i=1}^{m} \subseteq E'$;

• $(v'_0, \varepsilon, v'_0) \in E'$ (the so-called ε-self-loop);

• no other edges belong to E';

and, for every $(v, \gamma \to \alpha, u) \in E_P$, the empty guarded action $\varepsilon$ is an abstraction of $\gamma \to \alpha$, i.e. $\alpha$ depends on and affects solely insignificant data.

The essence of a serializing abstraction is removing all passive edges and replacing them with the ε-self-loop $(v'_0, \varepsilon, v'_0)$. When applied to identical bimodal processes, such an abstraction makes them unimodal and serializable (at most one process is operating, i.e. is in a non-initial state, at each moment of time) and allows constructing an adequate unifying abstraction.
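The serializing abstraction is a simple graph transformation. The Python sketch below represents a bimodal process by its initial vertex and its sets of active and passive edges, with edge labels as opaque strings. The mapping R is a dictionary, and the abstraction function g maps a label to a set of labels. It checks that R sends passive-only vertices to R(v0) and is injective on the active vertices, but surjectivity onto V' is not checked. The example process, with vertices idle, wait, p1, and p2, is hypothetical.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

EPS = "eps"  # the empty guarded action (guard true, empty action)
Edge = Tuple[Hashable, str, Hashable]


def vertices(edges: Set[Edge]) -> Set[Hashable]:
    return {x for v, _, u in edges for x in (v, u)}


def serialize(v0: Hashable, active: Set[Edge], passive: Set[Edge],
              R: Dict[Hashable, Hashable],
              g: Callable[[str], Set[str]]) -> Tuple[Hashable, Set[Edge]]:
    """Serializing abstraction g(P, R) of a bimodal process <V, v0, E_A u E_P>."""
    va, vp = vertices(active), vertices(passive)
    v0_abs = R[v0]
    if any(R[v] != v0_abs for v in vp - va):
        raise ValueError("R must map passive-only vertices to R(v0)")
    if len({R[v] for v in va}) != len(va):
        raise ValueError("R must be injective on active vertices")
    # Active edges are abstracted edge by edge; passive edges are dropped.
    edges = {(R[v], lab2, R[u]) for v, lab, u in active for lab2 in g(lab)}
    edges.add((v0_abs, EPS, v0_abs))  # the epsilon-self-loop
    return v0_abs, edges


# Hypothetical proc: an active request/acknowledgement cycle and a passive
# snoop-handling loop through the initial vertex "idle".
active = {("idle", "req!", "wait"), ("wait", "ack?", "idle")}
passive = {("idle", "snoop?", "p1"), ("p1", "update", "p2"), ("p2", "resp!", "idle")}
R = {"idle": 0, "wait": 1, "p1": 0, "p2": 0}
v0_abs, edges = serialize("idle", active, passive, R, lambda lab: {lab})
print(v0_abs, sorted(edges, key=str))
```

With the identity abstraction function, the result keeps the two active edges and replaces the whole passive loop with the single ε-self-loop at the initial vertex.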

Let $M = \{P_i\}_{i=0}^{N}$ be a system where all processes, except maybe $\{P_i\}_{i=0}^{k}$ for some $k \in \{0, \dots, N\}$, are identical and bimodal; $Data_S$ be significant data; $V'_i$, where $i \in \{0, \dots, k+1\}$, be some sets; $R_i: V_i \to V'_i$ be some mappings; $f_i$, where $i \in \{0, \dots, k\}$, and g be abstraction functions; moreover, $f_i(P_i, R_i)$ are bijective abstractions, while $g(P_{k+1}, \dots, P_N; R_{k+1})$ is a serializing abstraction. Then, the system

$M' = \{f_i(P_i, R_i)\}_{i=0}^{k} \cup \{g(P_{k+1}, \dots, P_N; R_{k+1})\}$

is called an abstraction of M. A process $f_i(P_i, R_i)$, where $i \in \{0, \dots, k\}$, is called an abstraction of the process $P_i$. The process $g(P_{k+1}, \dots, P_N; R_{k+1})$ is called an abstraction of the environment.

Statement. Let $M = \{P_i\}_{i=0}^{N}$ and $M' = \{P'_i\}_{i=0}^{k+1}$ be, respectively, a system and its abstraction. Given an arbitrary state s, if s is reachable in the state space of M, then there is a state s' reachable in the state space of M' such that $s' \sim s$.

Proof. Let ABS = k + 1. This notation is introduced to emphasize that the abstraction of the environment, the process $P'_{ABS} = P'_{k+1}$, generalizes not only the process $P_{k+1}$ but also the processes $P_{k+2}, \dots, P_N$.

A configuration (l', s') of M' is said to conform to a configuration (l, s) of M iff the following conditions are satisfied:

• $l'(i) = R_i(l(i))$ for all $i \in \{0, \dots, k\}$;

• if $l'(ABS) = R_{ABS}(v_{0_{ABS}})$, then $l(i) = v_{0_i}$ for all $i \in \{k+1, \dots, N\}$;

• if $l'(ABS) \neq R_{ABS}(v_{0_{ABS}})$, then there is exactly one index $i \in \{k+1, \dots, N\}$ such that $l'(ABS) = R_i(l(i))$;

• $s' \sim s$.

Let us consider a path in the state space of M starting with $(l_0, s_0)$:

$\pi = \langle ((l_j, s_j), (i_j, (v_j, \gamma_j \to \alpha_j, v_{j+1})), (l_{j+1}, s_{j+1})) \rangle_{j=0}^{m-1}$.

Here, $i_j \in \{0, \dots, N\}$ is a process index; $v_j = l_j(i_j) \in V_{i_j}$ and $v_{j+1} = l_{j+1}(i_j) \in V_{i_j}$ are the process's vertices connected by the edge labelled with $\gamma_j \to \alpha_j$; $s_j \models \gamma_j$ and $s_{j+1} = [\alpha_j](s_j)$ for all $j \in \{0, \dots, m-1\}$.

Our goal is to show that, in the state space of M', there is a path π' of the same length as π such that each configuration of π' conforms to the corresponding configuration of π:

$\pi' = \langle ((l'_j, s'_j), (i'_j, (v'_j, \gamma'_j \to \alpha'_j, v'_{j+1})), (l'_{j+1}, s'_{j+1})) \rangle_{j=0}^{m-1}$.

Obviously, the existence of such a path implies that there is a state $s'_m$ reachable in the state space of M' such that $s'_m \sim s_m$. Let us consider how to construct π'.

Induction basis. The initial configuration $(l'_0, s'_0)$ certainly conforms to $(l_0, s_0)$: $l'_0(i) = v'_{0_i} = R_i(v_{0_i}) = R_i(l_0(i))$ for all $i \in \{0, \dots, N\}$.

Inductive step. Given an arbitrary index $q \in \{0, \dots, m-1\}$, we will show that if the configuration $(l'_q, s'_q)$ conforms to $(l_q, s_q)$, then there are a process of M' (let us denote its index as $i'_q$) and an edge $(v'_q, \gamma'_q \to \alpha'_q, v'_{q+1})$ of that process such that $(l'_{q+1}, s'_{q+1}) = ((l'_q \setminus \{i'_q \mapsto v'_q\}) \cup \{i'_q \mapsto v'_{q+1}\}, [\alpha'_q](s'_q))$ (see the definition of the state space) conforms to $(l_{q+1}, s_{q+1})$. There are two cases:

• $i_q \in \{0, \dots, k\}$;

• $i_q \in \{k+1, \dots, N\}$.

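Case 1. If $i_q \in \{0, \dots, k\}$, let $i'_q = i_q$: the transition is executed by the process $P'_{i_q} = f_{i_q}(P_{i_q}, R_{i_q})$.

The edge $(v_q, \gamma_q \to \alpha_q, v_{q+1})$ of the process $P_{i_q}$ is abstracted to the set of edges $\{(R_{i_q}(v_q), \gamma'_i \to \alpha'_i, R_{i_q}(v_{q+1}))\}_{i=1}^{r}$, where $f_{i_q}(\gamma_q \to \alpha_q) = \{\gamma'_i \to \alpha'_i\}_{i=1}^{r}$.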

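Among them, an edge is selected whose label, $\gamma'_q \to \alpha'_q$, is an abstraction of $\gamma_q \to \alpha_q$ in $s_q$. Such an edge always exists (see the definition of process abstraction). We need to prove that the chosen edge belongs to the state space of M' and that the configuration $(l'_{q+1}, s'_{q+1})$ conforms to $(l_{q+1}, s_{q+1})$. It is sufficient to prove the following statements:

• $s'_q \models \gamma'_q$;

• $[\alpha'_q](s'_q) \sim [\alpha_q](s_q)$.

The first of them can be deduced from the facts that $s_q \models \gamma_q$ (otherwise, the state space of M would not include the transition under consideration), $\gamma'_q \to \alpha'_q$ is an abstraction of $\gamma_q \to \alpha_q$ in $s_q$, and $s'_q \sim s_q$ (the induction hypothesis). Indeed, $s_q \models \gamma_q$ and $s_q \models \gamma_q \Rightarrow \gamma'_q$ imply $s_q \models \gamma'_q$, which, together with $s'_q \sim s_q$ (the truth of $\gamma'_q$ is determined only by the significant data), implies $s'_q \models \gamma'_q$. The second statement follows from the same facts: $[\alpha'_q](s'_q) \sim [\alpha'_q](s_q)$ because the effect of $\alpha'_q$ is determined only by the significant data, and $[\alpha'_q](s_q) \sim [\alpha_q](s_q)$ because $\alpha'_q$ acts similarly to $\alpha_q$ in $s_q$; since ~ is an equivalence relation, $[\alpha'_q](s'_q) \sim [\alpha_q](s_q)$.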

Case 2. If $i_q \in \{k+1, \dots, N\}$, let $i'_q = ABS$: the transition is executed by the process $P'_{ABS} = g(P_{k+1}, \dots, P_N; R_{ABS})$. There are two subcases:

• the edge $(v_q, \gamma_q \to \alpha_q, v_{q+1})$ is active;

• the edge $(v_q, \gamma_q \to \alpha_q, v_{q+1})$ is passive.

Subcase 2.1. If the edge is active, then, by the definition of configuration conformance, $l'_q(ABS) = R_{i_q}(v_q)$. In $P'_{ABS}$, an edge between $R_{i_q}(v_q)$ and $R_{i_q}(v_{q+1})$ is selected whose label is an abstraction of $\gamma_q \to \alpha_q$ in $s_q$. Such an edge always exists (active edges are abstracted in the usual way). The rest of the proof is similar to that in Case 1.

Subcase 2.2. If the edge is passive, then $R_{i_q}(v_q) = R_{i_q}(v_{q+1}) = R_{i_q}(v_{0_{i_q}}) = v'_{0_{ABS}}$. In $P'_{ABS}$, the edge $(v'_{0_{ABS}}, \varepsilon, v'_{0_{ABS}})$ is selected. Conformance of the configurations follows from the fact that passive edges neither depend on nor affect significant data.

Conclusion. Given an arbitrary path π in the state space of M, there is a path π' in the state space of M' such that the ending state of π' is equivalent to the ending state of π.

Q.E.D.

Corollary. Let $M = \{P_i\}_{i=0}^{N}$ and $M' = \{P'_i\}_{i=0}^{k+1}$ be, respectively, a system and its abstraction. Given an arbitrary formula $\varphi$ over significant data, if $\varphi$ is true (false) in all states reachable in the state space of M', then $\varphi$ is true (false) in all states reachable in the state space of M.

4.3 Model transformation

This section defines the abstraction functions used for protocol model transformation. The description is not entirely formal: a rigorous definition requires, first, formalization of the Promela semantics and, second, the use of formalisms for describing code transformations. Nevertheless, we believe that the explanations below are sufficient for formalizing and automating the abstraction procedure.

Let $M = \{P_i\}_{i=0}^{N}$ and $M' = \{P'_i\}_{i=0}^{k+1}$ be, respectively, a system (referred to as the original model) and its abstraction (referred to as the abstract model). Recall that each message circulating in the model includes the sender's identifier. The state of a channel written by $\{P_i\}_{i=k+1}^{N}$, as well as the messages read from the channel, may contain identifiers from the set $\{k+1, \dots, N\}$. In the abstract model, there are no such identifiers: they are mapped to ABS (usually, ABS = k + 1). The definition of state equivalence should be modified so as not to distinguish between i and ABS if $i \in \{k+1, \dots, N\}$.

Another issue is as follows. The state of a channel's buffer does not matter until a message is read. The idea is to ignore some messages (in particular, messages written by $\{P_i\}_{i=k+1}^{N}$). In this case, a send statement can be replaced with ε. To preserve the abstraction properties, each read from the channel should be supplemented (as alternative behavior) with assignments to the message variable of all possible values that could have been sent via the channel by the removed statement. To be more precise, the definition of state equivalence should take into account the following considerations:

• given a channel $c \in C_{\{1,\ldots,N\}}^{0}$, an abstract state s'(c) is (quasi-)equivalent to a state s(c) (a channel state is a sequence of messages) iff s'(c) is obtained from s(c) by removing all messages with identifiers from $\{k+1, \ldots, N\}$;

• the channels from $C_h^e = \bigcup_{j=k+1}^{N} C_0^j$ are insignificant (every two states of such a channel are equivalent);

• an abstract state s' of the channels $C_e^h = \bigcup_{i=k+1}^{N} C_i^0$ (as a whole) is equivalent to a state s iff there exists $i \in \{k+1, \ldots, N\}$ such that for each $c \in C_i^0$, the state s'(c'), where c' is the channel that corresponds to c in $P_{ABS}$, is obtained from s(c) by replacing i with ABS, while the remaining channels are empty in both states.

The suggested approach imposes the following restrictions on the input model:

• $Data_S = Data \setminus \bigcup_{i=k+1}^{N} DataL_i$;

• for each $i \in \{1, \ldots, N\}$, there holds $Chan_i = ChanA_i \cup ChanP_i$, where $ChanA_i$ and $ChanP_i$ are the sets of channels used in the active and passive modes, respectively, and:

o $ChanA_i \cap ChanP_i = \emptyset$;

o $ChanA_i \subseteq ChanG$ ($ChanA_i = C_{\{1,\ldots,N\}}^{0} \cup C_i^0$);

o $ChanP_i \subseteq ChanL_i$ ($ChanP_i = C_0^i \cup \bigcup_{j=1}^{N} C_{\{1,\ldots,N\}}^{j}$);

• the only channel predicate in use is the emptiness check empty (behavior does not depend on the number of messages in the channels' buffers);

• there are no dependencies via variables between the processes $\{P_i\}_{i=1}^{N}$ (all dependencies are via messages);

• each guarded action is closed under data dependencies via variables;

• there are no data dependencies from the local data (control dependencies from the local data are allowed).
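The three channel-equivalence conditions listed before these restrictions can be sketched in Python as follows. The encoding is hypothetical: a channel state is a list of (sender, payload) messages, the original-model channels of the environment processes are keyed by (process identifier, channel name), the corresponding channels of $P_{ABS}$ are keyed by channel name alone, and ABS = k + 1.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str]  # hypothetical (sender id, payload) encoding
ChannelState = List[Message]


def shared_equivalent(s_abs: ChannelState, s: ChannelState, k: int) -> bool:
    """First condition: s'(c) is s(c) with all messages from k+1..N removed."""
    return s_abs == [m for m in s if m[0] <= k]


def home_to_env_equivalent(s_abs: ChannelState, s: ChannelState) -> bool:
    """Second condition: channels from home to the environment are insignificant."""
    return True


def env_to_home_equivalent(s_abs: Dict[str, ChannelState],
                           s: Dict[Tuple[int, str], ChannelState],
                           k: int, n: int) -> bool:
    """Third condition: for some i in k+1..N, the channels of P_i map onto those of
    P_ABS with i replaced by ABS (= k + 1), and all other environment channels in s are empty."""
    abs_id = k + 1
    names = set(s_abs) | {name for (_, name) in s}
    for i in range(k + 1, n + 1):
        renamed = all(
            s_abs.get(name, []) == [(abs_id if sender == i else sender, payload)
                                    for sender, payload in s.get((i, name), [])]
            for name in names)
        others_empty = all(not msgs for (j, _), msgs in s.items() if j != i)
        if renamed and others_empty:
            return True
    return False
```

The existential choice of i in the third condition is what lets a single abstract process stand for whichever environment process is currently communicating with home.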

The abstract model $M' = \{P'_i\}_{i=0}^{k+1} = \{f_i(P_i, R_i)\}_{i=0}^{k} \cup \{g(P_{k+1}, \ldots, P_N; R_{k+1})\}$ is constructed as follows (the description below can be viewed as a definition of the mappings $R_i$ and the abstraction functions $f_i$ and $g$). Initially, each process $P'_i$, where $i \in \{0, \ldots, k+1\}$, is isomorphic to $P_i$: $P'_i = I(P_i, R_{0i})$, where I is the trivial abstraction function and $R_{0i}\colon V_i \to V'_i$ is a bijection. Then, the following transformations are applied to $P_{ABS} = P'_{k+1}$ and the remaining processes:

• all passive edges of $P_{ABS}$ are removed and replaced with ε-self-loops;

• when a passive edge whose action contains a read from some channel c (a write to some channel c) is removed:

o in $\{P'_i\}_{i=0}^{k}$, for all $j \in \{k+1, \ldots, N\}$, all writes to $c_j$ (all reads from $c_j$), where $c_j$ is the channel of $P_j$ that corresponds to c (the processes are identical), are removed;

o when a read of a message m is removed:

■ in the guards dependent on m, the minimal subformulae dependent on m are replaced with undef;

• the active edges of $P_{ABS}$ are processed as follows:

o all assignments to local variables are removed;

o when an assignment to a local variable x is removed:

■ in the guards dependent on x, the minimal subformulae dependent on x are replaced with undef;


o each read from a global channel c is left unmodified:

■ in $\{P'_i\}_{i=0}^{k}$, writes to c are left unmodified;

o each write to a global channel c is removed:

■ in $\{P'_i\}_{i=0}^{k}$, each read c?m is supplemented with the alternatives $\{m = v_j\}_j$, where $\{v_j\}_j$ contains all possible values that $P_{ABS}$ can send via c.
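The last rule can be illustrated by a small Python helper that emits the Promela fragment replacing a read c?m. It is a sketch, not the prototype tool's code: it assumes the message is received into a list of scalar fields, and the names src, kind, ABS, GetS, and GetM in the usage example are hypothetical (a #define and mtype constants standing for the values $P_{ABS}$ could send).

```python
from typing import Sequence


def supplement_read(channel: str, fields: Sequence[str],
                    abs_messages: Sequence[Sequence[str]]) -> str:
    """Emit a Promela `if` that keeps the read `channel?fields` and adds one
    alternative per message P_ABS could have sent, assigning its values directly."""
    lines = ["if", f"  :: {channel}?{','.join(fields)}"]
    for values in abs_messages:
        if len(values) != len(fields):
            raise ValueError("each message must provide a value for every field")
        assigns = "; ".join(f"{f} = {v}" for f, v in zip(fields, values))
        lines.append(f"  :: {assigns}")
    lines.append("fi")
    return "\n".join(lines)


print(supplement_read("c", ["src", "kind"], [("ABS", "GetS"), ("ABS", "GetM")]))
# prints:
# if
#   :: c?src,kind
#   :: src = ABS; kind = GetS
#   :: src = ABS; kind = GetM
# fi
```

Because a Promela if selects nondeterministically among its executable options, the original read stays possible whenever c holds a message, while the always-executable assignments emulate a message from the removed sender.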

Statement. The processes $\{f_i(P_i, R_i)\}_{i=0}^{k}$ (constructed as described above) are bijective abstractions, while the process $g(P_{k+1}, \ldots, P_N; R_{k+1})$ is a serializing abstraction. Thus, M' is an abstraction of M.

As the description is informal, the statement is given without proof. Note that the method above has been implemented in a tool prototype: given a Promela model, the tool parses the code, builds the abstract syntax tree, transforms it according to the rules, and maps it back to Promela.
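The prototype's internals are not described here, so the following Python sketch only illustrates one rule, the replacement of minimal subformulae with undef, on a hypothetical guard AST made of nested tuples. Two assumptions are made explicit: the minimal subformula depending on a variable is taken to be the atomic comparison that mentions it, and undef is interpreted with Kleene three-valued logic, an edge being kept whenever its guard is not definitely false (an over-approximation).

```python
from typing import Dict, Optional, Tuple

# Hypothetical guard AST built from nested tuples:
#   ("==", var, const) | ("not", e) | ("and", e1, e2) | ("or", e1, e2) | ("undef",)
Expr = Tuple
UNDEF: Expr = ("undef",)


def replace_with_undef(expr: Expr, var: str) -> Expr:
    """Replace every minimal subformula depending on var (here: an atomic
    comparison that mentions var) with UNDEF, leaving the rest intact."""
    if expr[0] == "==":
        return UNDEF if expr[1] == var else expr
    if expr[0] == "undef":
        return expr
    return (expr[0],) + tuple(replace_with_undef(sub, var) for sub in expr[1:])


def evaluate(expr: Expr, env: Dict[str, object]) -> Optional[bool]:
    """Kleene three-valued evaluation; None stands for undef."""
    op = expr[0]
    if op == "undef":
        return None
    if op == "==":
        return env[expr[1]] == expr[2]
    if op == "not":
        value = evaluate(expr[1], env)
        return None if value is None else not value
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    if op == "and":
        if left is False or right is False:
            return False
        return None if left is None or right is None else True
    if op == "or":
        if left is True or right is True:
            return True
        return None if left is None or right is None else False
    raise ValueError(f"unknown operator {op!r}")


def may_fire(guard: Expr, env: Dict[str, object]) -> bool:
    """Assumed over-approximation: keep the edge unless the guard is definitely false."""
    return evaluate(guard, env) is not False


guard = ("and", ("==", "state", "I"), ("==", "x", 1))  # x: a removed local variable
abstract_guard = replace_with_undef(guard, "x")
print(abstract_guard)                            # ('and', ('==', 'state', 'I'), ('undef',))
print(may_fire(abstract_guard, {"state": "I"}))  # True
print(may_fire(abstract_guard, {"state": "M"}))  # False
```

The conjunct that no longer has a defined value cannot disable the edge, but the conjunct over retained data still can, which keeps the abstraction from losing behaviors.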

5. Case study

The tool and the underlying method were used to verify the MOSI-family CCPs implemented in the Elbrus computer systems. The developed Promela model supports memory accesses of the Write Back, Write Through, and Write Combined types. The experiments were performed on an Intel Core i7-4771 with a clock rate of 3.5 GHz. The verified properties are as follows:

• (1) $\mathbf{G}\,\neg(cache[1] = M \wedge cache[2] = M)$;

• (2) $\mathbf{G}\,\neg(cache[1] = O \wedge cache[2] = O)$;

• (3) $\mathbf{G}\,\neg(cache[1] = M \wedge cache[2] \in \{O, S\})$.
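As a hedged illustration of what these invariants state, the Python sketch below evaluates them on individual cache-state snapshots, encoded (hypothetically) as dictionaries from processor identifiers 1 and 2 to MOSI states, and reports the first violated property, much as a safety check would. In Spin itself, assuming cache is a global array and M is an mtype constant, property (1) could be written as `ltl p1 { [] !(cache[1] == M && cache[2] == M) }`.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Snapshot = Dict[int, str]  # hypothetical: processor id -> MOSI state ("M", "O", "S", "I")

PROPERTIES: Dict[str, Callable[[Snapshot], bool]] = {
    "(1)": lambda c: not (c[1] == "M" and c[2] == "M"),
    "(2)": lambda c: not (c[1] == "O" and c[2] == "O"),
    "(3)": lambda c: not (c[1] == "M" and c[2] in ("O", "S")),
}


def first_violation(states: Iterable[Snapshot]) -> Optional[Tuple[str, Snapshot]]:
    """Return (property label, snapshot) for the first violated invariant, or None."""
    for state in states:
        for label, holds in PROPERTIES.items():
            if not holds(state):
                return label, state
    return None


print(first_violation([{1: "M", 2: "I"}, {1: "O", 2: "S"}]))  # None
print(first_violation([{1: "S", 2: "S"}, {1: "M", 2: "S"}]))  # ('(3)', {1: 'M', 2: 'S'})
```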

Table 1 and Table 2 show the time and memory consumed in checking property (1) on the original model (N = 3) and on the abstract one, respectively. Note that for N = 3 the abstraction preserves the number of processes: home(0), proc(1), and proc(2) are replaced with their abstract counterparts, while proc(3) is replaced with procenv(ABS).

Table 1. Resources required for checking the original model

| Spin optimization | State space size | Memory consumption | Verification time |
|---|---|---|---|
| None | 5.1 × 10⁶ | 682 MB | 9 s |
| COLLAPSE | 5.1 × 10⁶ | 328 MB | 15 s |

Table 2. Resources required for checking the abstract model

| Spin optimization | State space size | Memory consumption | Verification time |
|---|---|---|---|
| None | 2.2 × 10⁶ | 256 MB | 3.7 s |
| COLLAPSE | 2.2 × 10⁶ | 108 MB | 6.2 s |

The tables show that even for N = 3 there is a gain in state space size and memory consumption. Meanwhile, correctness of the abstract model implies correctness of the original one for any N > 3 as well. Thus, the suggested approach reduces verification of the parameterized CCP model to exploring on the order of 10⁶ states, which requires about 100 MB of memory.
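The size of the gain at N = 3 can be quantified directly from the two tables. The short Python computation below divides each original-model figure by the corresponding abstract-model figure; the numbers are transcribed from Tables 1 and 2, with "none" standing for the row without Spin optimizations.

```python
# Figures transcribed from Tables 1 and 2: (states, memory in MB, time in s).
ORIGINAL = {"none": (5.1e6, 682, 9.0), "COLLAPSE": (5.1e6, 328, 15.0)}
ABSTRACT = {"none": (2.2e6, 256, 3.7), "COLLAPSE": (2.2e6, 108, 6.2)}


def reduction(original, abstract):
    """Ratio original/abstract for each resource, rounded to two decimals."""
    return tuple(round(o / a, 2) for o, a in zip(original, abstract))


for mode in ORIGINAL:
    print(mode, reduction(ORIGINAL[mode], ABSTRACT[mode]))
# none (2.32, 2.66, 2.43)
# COLLAPSE (2.32, 3.04, 2.42)
```

At N = 3, the abstract model thus needs roughly 2.3 times fewer states, 2.7 to 3 times less memory, and about 2.4 times less time; for larger N only the original model grows.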

6. Conclusion

SMP computer systems use complex caching mechanisms. To keep multiple copies of the same data up to date, CCPs are employed. Errors in CCPs and their implementations may cause data corruption and system hangs, which is why CCP verification methods are of great value. The main problem in CCP verification is state explosion. In this paper, we have proposed an approach that overcomes this issue and makes verification scalable. The described method transforms a CCP Promela model so that the result is independent of the number of processors and can be verified with the Spin model checker in the usual way. The approach was successfully applied to the MOSI-family CCPs implemented in the Elbrus computer systems.

In the future, we plan to extend the method with CEGAR (counterexample-guided abstraction refinement), to develop an open-source tool for syntactic transformations of Promela models (a prototype is already available), and to create a unified model-based technology for checking CCPs and verifying memory management units.

References

[1]. Patterson D.A., Hennessy J.L. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, 2013. 800 p.

[2]. Kim A.K., Perekatov V.I., Ermakov S.G. Microprocessors and computer systems of the Elbrus family. SPb.: Piter, 2013. 272 p. (in Russian).

[3]. Sorin D.J., Hill M.D., Wood D.A. A Primer on Memory Consistency and Cache Coherence. Morgan and Claypool, 2011. 195 p.

[4]. Kamkin A.S., Petrochenkov M.V. A system to support formal methods-based verification of coherence protocol implementations. Voprosy radioehlektroniki. Ser. EVT. [Issues of radio electronics], 2014, issue 3, pp. 27-38 (in Russian).

[5]. Clarke E.M., Grumberg O., Peled D.A. Model Checking. MIT Press, 1999. 314 p.

[6]. Burenkov V.S. An analysis of the Spin model checker applicability to cache coherence protocols verification. Voprosy radioehlektroniki. Ser. EVT [Issues of radio electronics], 2014, issue 3, pp. 126-134 (in Russian).

[7]. Emerson E.A., Kahlon V. Exact and Efficient Verification of Parameterized Cache Coherence Protocols. Correct Hardware Design and Verification Methods, IFIP WG 10.5 Advanced Research Working Conference, 2003, pp. 247-262.

[8]. Holzmann G.J. The Spin Model Checker: Primer and Reference Manual. Addison-Wesley Professional, 2003. 608 p.

[9]. Park S., Dill D.L. Verification of FLASH Cache Coherence Protocol by Aggregation of Distributed Transactions. Annual ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 288-296.

[10]. Ip C.N., Dill D.L. Verifying Systems with Replicated Components in Murphi. International Conference on Computer Aided Verification, 1996, pp. 147-158.

[11]. Pnueli A., Xu J., Zuck L. Liveness with (0, 1, ∞)-Counter Abstraction. International Conference on Computer Aided Verification, 2002, pp. 107-122.

[12]. Clarke E., Talupur M., Veith H. Environment Abstraction for Parameterized Verification. Verification, Model Checking, and Abstract Interpretation, 2006. LNCS, vol. 3855, pp. 126-141.

[13]. Clarke E., Talupur M., Veith H. Proving Ptolemy Right: The Environment Abstraction Framework for Model Checking Concurrent Systems. International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 2008, pp. 33-47.

[14]. McMillan K. Parameterized Verification of the FLASH Cache Coherence Protocol by Compositional Model Checking. Conference on Correct Hardware Design and Verification Methods, 2001, pp. 179-195.

[15]. Chou C.-T., Mannava P.K., Park S. A Simple Method for Parameterized Verification of Cache Coherence Protocols. Formal Methods in Computer-Aided Design, 2004. LNCS, vol. 3312, pp. 382-398.

[16]. Krstic S. Parameterized System Verification with Guard Strengthening and Parameter Abstraction. International Workshop on Automated Verification of Infinite-State Systems, 2005.

[17]. Talupur M., Tuttle M.R. Going with the Flow: , pp. 1-8.

[18]. O'Leary J., Talupur M., Tuttle M.R. Protocol Verification Using Flows: An Industrial Experience. Formal Methods in Computer-Aided Design, 2009, pp. 172-179.


Keywords: multicore microprocessors, shared-memory multiprocessors, cache coherence protocols, model checking, Spin, Promela.

DOI: 10.15514/ISPRAS-2016-28(4)-4

For citation: Burenkov V.S., Kamkin A.S. Checking parameterized Promela models of cache coherence protocols. Trudy ISP RAN/Proc. ISP RAS, vol. 28, issue 4, 2016, pp. 57-76 (in English). DOI: 10.15514/ISPRAS-2016-28(4)-4
