Научная статья на тему 'Model-Based verificaton with Error Localization and Error Correction for c designs'

Model-Based verificaton with Error Localization and Error Correction for c designs Текст научной статьи по специальности «Компьютерные и информационные науки»

CC BY
85
44
i Надоели баннеры? Вы всегда можете отключить рекламу.
i Надоели баннеры? Вы всегда можете отключить рекламу.
iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.
i Надоели баннеры? Вы всегда можете отключить рекламу.

Текст научной работы на тему «Model-Based verificaton with Error Localization and Error Correction for c designs»

4. Zhuravlev Y.I., Izbrannye nauchnye trudy [Favorites scientific works], Moscow, 1998, 420 p.

5. Zagoruyko N.G., Prikladnye metody analiza dannykh i znany [Applied methods of data analysis and knowledge], Novosibirsk, 1999, 270 p.

6. Mazurov V.D., Metod komitetov v zadachakh optimizatsii i klassifikatsii [Committee method in optimization and classification], Moscow, Nauka, 1990, 248 p.

7. Rastrigin L.A., Erenshteyn R.H., Metod kollektivnogo raspoznavaniya [Method of collective recognition], Moscow, 1981.

8. Rifkin R., Klautau A., Journ. of Machine Learning Research, 2004, pp. 101-141.

9. Freitas A.A., Data Mining and Knowledge Discovery with Evolutionary Algorithms, Berlin, Springer, 2002, 279 p.

10. Emelyanov V.V., Kureychik V.V., Kureychik V.M., Teoriya i praktika evolyutsionnogo modelirovaniya [Theory and practice of evolutionary modeling.], Moscow, 2003.

11. Rutkovskya D., Pilinsky M., Rutkovsky L., Nejronnye seti, geneticheskie algoritmy i nechetkie sistemy [Neural networks, genetic algorithms and fuzzy systems], Moscow, 2008, 452 p.

UDS 62

MODEL-BASED VERIFICATON WITH ERROR LOCALIZATION AND ERROR CORRECTION FOR C DESIGNS

(The work has been partly supported by European Commission FP7-ICT-2009-4-248613 DIAMOND project, by Research Centre CEBE funded by European Union through the European Structural Funds, by Estonian Science Foundation grants 9429 and 8478 and by Estonian Academy of Security Sciences)

Urmas Repinski, Assistant (Department of Computer Engineering Tallinn University of Technology 5, Ehitajate tee, Tallinn, 19086, Estonia, urrimus@h°tmail.com)

Abstract. Process of software design verification makes sure if design holds its specification, existence of the specification that is possible to animate allows to perform simulation-based verification. In order to locate and repair errors if verification fails, we have to have access to the structure of the debugged design. For this purpose the design should be parsed into suitable representation - into the model. Both algorithms: Error Localization process and Error Correction process of the design require model simulation. This article presents different simulation algorithms. Simulation of the model is usually applied directly, when the design model is simulated with the goal to obtain the outputs from the inputs, but this approach is not always suitable, because in this case the functionality of the programming language design should be almost completely re-implemented for simulation. An alternative approach described in this article - design model simulation using design programming language functionality. It is more reasonable and does not require re-implementation of the functionality already available in design programming language. This approach also makes possible implementation of the algorithm of dynamic slicing for Error Localization and Error Correction.

Keywords: simulation-based verification, error localization, error correction, specification, automatic error correction, debug, C design.

In simulation-based verification, a digital system (design) looks like a black box, as it uses inputs and compares outputs. It's entire design structure is irrelevant. If we want to debug this design, it should be transformed into a white box structure. In other words, it should be parsed into a representation suitable for processing. White box representation of the design is stored in corresponding data structure, in the model. A suitable model for Error Localization and Correction should allow tracing its execution and application of any changes within its structure. Localization algorithms use the trace of the executed model to select erroneous candidates from total components of model, correction require full access to any part of investigated design's model.

This paper presents implementation of the ModelBased Error Localization algorithm for C designs. The algorithm has been integrated to the open source FoREnSiC [1] (FOrmal Repair ENvironment for SImple C) tool. The FORENSIC model has a flowchart-like structure that can be described as a special case of flowgraph: the hammock graph [2].

The tool includes a Front-End parser, a number of Back-End Error Localization and Error Correction algorithms' implementations.

Similar to [3], different ranking algorithms are compared and their ranking accuracy for error localization is measured by experimental results on the Siemens benchmark set. However, a new contribution of the paper is the observation that a simple error ranking metric that takes into account only information from failed sequences has the least average deviation from exact localization.

Model-Based Error Localization algorithms store trace of model simulation and select most probable erroneous components of the model. Dynamic Slicing can increase Error Localization accuracy that will be shown using experimental results, obtained by FORENSIC Error Localization tool. Level of design details in the model should be high enough to be able to extract variables, variable names, operators and conditions for Dynamic Slicing.

The article will show what structure is possible for the model for Error Localization, that were imple-

mented in FORENSIC tool [1], how functionalities, provided by the model, allow to implement ModelBased Error Localization, Mutation-Based Error Correction and Dynamic Slicing for C designs, how parser for the model can be implemented using C/C++ GIMPLE plugin. FORENSIC tool [1] can be extended to C/C++/SystemC design's support in the future.

FORENSIC tool structure

FORENSIC tool [1] is divided into front-end and back-end components. Front-end parses design into the model and back-end consists of several components that are responsible for different functionality, such as design Error Localization and Error Correction.

Front-End Implementation. Before starting process of Verification with Error Localization with Dynamic Slicing, based on the model, it is necessary to parse processed design into the model representation.

GIMPLE plug-in for GCC compiler [4] allows to obtain three-address representation of the processed design with tuples of no more than 3 operands. This representation of the design is suitable for parsing, as it has whole information about processed design. At this stage there is no need to create additional parser or lexical translator, C compiler's plug-ins already generate suitable design form, and just necessary to store the parsed design tree into the model. For that every class of the model implementation should have add and remove methods.

The front-end parser of the FORENSIC tool [1] for C design was implemented at University of Bremen. The front-end of the tool can be extended to C+ + and to SystemC support for C++ and SystemC language's support.

Tool Implementation. Various back-ends and algorithms are implemented in FORENSIC tool [1] for Error Localization and Correction, Simulation-Based back-end, implemented in Tallinn University of Technology, is described in current article.

Simulation-Based Back-End's implementation consists of several components that are responsible for different functionality. Those are:

- Inputs-outputs, is data that is required and used for Simulation-Based Verification, Error Localization and Error Correction;

- Data Structures, places where data information is stored during processing. It includes model with the most significant data structure in tool;

- Functionality, is front-end, back-end, animation and simulation algorithms implementations, are actual implementations that were developed and documented and can be extended further.

Diagram that describes basic Verification with Error Localization and Error Correction algorithm's implementation in FORENSIC tool [1] is Figure 1a and more accurate verification algorithm's implementation is Figure 1b.

a)

Input/Output^ | Functionality ■ ¡Structure

Figure. 1. Basic (a) and more (b) accurate Verification, Error Localisation and Error Correction implementations

The difference between two implementations is following in

a) implementation simulation of processed design is used to generate outputs, in b) implementation there is no independent design's simulation with goal to get

outputs, outputs are generated by design's model simulation, process of Error Localization is started immediately after process of parsing of design into model, and during process of simulation of model outputs for Verification and information for Model-Based Error Localization are obtained simultaneously.

b) implementation is more reasonable. Both Error Localization and Correction require simulation of the model, and outputs obtained during simulation for Error Localization and Correction can be used for Verification. On the other hand, in a) implementation original design is used to get outputs not parsed representation that is more reliable. But if design is parsed correctly, then outputs of model simulation and design simulation should be the same. In actual implementations of tool back-ends can be used any of the proposed algorithms.

Further Model-Based Error Localization and Mutation-Based Error Correction functionalities of FORENSIC tool [1], simulation functionality and structure of specification for the Error Localization and Error Correction are described in details.

Model for Error Localization

Model Difinition. In order to perform design Verification and further Error Localization, a model, that is suitable representation for Error Localization of the processed design, should be developed. Design can be presented as single-entry single-exist structure [5], that is a special case of flow graph [2], known as hammock graph [2]. Hammock graph is defined as follows.

Definition 1. Hammock graph is a structure H=<N, E, n0, ne>, where N is a set of nodes, E is a set of edges in N*N, n0 is initial node and ne is end node. If (n, m) is in E then n is an immediate predecessor of m and m is an immediate successor of n. A path from node n0 to node nE is list of nodes p0, p1, ..., pk such thatp0=n0,p\=n\, ...,pK=ne, and for all i, 1<i<k-1, (pi, pi+1) is in E. There is a path from n0 to all other nodes in N. From all nodes of N, excluding ne, there is a path to ne.

In proposed model structure nodes in the hammock graph are statements or part of statements of the original design, that include assignments, conditions, arithmetical operations, etc. This model structure is suitable for Error Localization implementation, but for Dynamic Slicing and Mutation-Based Error Correction model representation should be more accurate. It is necessary to have possibility to obtain statement's variables in the node and to process defined and referenced variables in statement for Dynamic Slicing algorithm implementation, that will be shown further. For this purposes every node in hammock graph should be parsed further into flow graph representation, that is hierarchy of operators and functions with corresponding variables in the statement.

In the FORENSIC tool the C design is parsed into the flowgraph model using gcc front-end parser. Implemented model is designed with Error Localization and Correction in mind.

Definition 2. Flow graph is a structure H=<N, E, n0>, where N is a set of nodes, E is a set of edges in N*N, n0 is an initial node. If (n, m) is in E then n is an immediate predecessor of m and m is an immediate successor of n. There exists a path from n0 to all other nodes in N. Path is a list of nodes p0, p1, ..., pk such thatp0=n0,p =n1, ...,pk=nh and for all i, 1<i<k-1, (pi, pi+1) is in E.

Practically in FORENSIC tool corresponding structure is implemented using C++ classes, every hammock graph node and flow graph node has number of class variables and functions that allow to store design as "white box" representation [1]. Model were designed and implemented in Graz University of Technology.

Implementation of C model. The model should store all possible information about the design and allow to trace hammock graph nodes during simulation, to obtain defined and referenced variables during simulation, to correct and change any part of the processed design.

The model built for C designs should be able to store following C design components:

- Flow control primitives (for, if, while, switch, do.while);

- Arithmetical operations;

- Assignments;

- User-defined and compound types and variables;

- Pointer data types;

- Pre-defined C functions.

Proposed model is implemented using C++ classes, C++ programming languages, in FORENSIC tool [1].

For this purposes model consists of FunctionData class instances, that has number of parameters, that can be CompoundType or BasicType class instances. Compound Data Type has number of methods and class variables, that allow to define user-defined C data type. BasicType defines pointers and pre-defined C data types.

Every data type can be defined using pointer, address in memory where data type is located. For each type number of pointer indirections parameter exists in the model. If this value is zero, then we have non-pointer data type, if greater than zero then have pointer type, pointer to pointer type, etc.

Every function consists of the start and end node, between them is model's hammock graph (def. 1), where processed design statements are stored. Hammock graph nodes are implemented in FORENSIC tool [1] by FlowGraphNode class instances. Flow-GraphNode class can be extended by ConditionNode class, that has one predecessor and two successors - if condition at the ConditionNode is true or false, and

OperationNode, that is statement with no condition, one predecessor and one successor. FlowGraphNode class instances are presented by Flow Graph (def. 2), AstNode class instances, that store parsed representation of assignments and arithmetical operations in the node. C functions are presented using Operation-Nodes with special description (a function), with function's return value and parameter's representation using AstNode class instances.

Described model is sufficient enough to store any C design, all C flow control primitives can be stored using this structure.

Additional nodes with comments and any additional information for processing can also be added to the model.

The example of FORENSIC tool [1] model is Figure 2.

argc!=4

l; 51 c:7 bb:2

schedule2/versions/v2/scedule2, ^-

exit_here(-1);

l; 51 c:38 bb:3

schedule2/versions/v2/scedule2.c

h-

Op ! iUNEQUA~i

prio=3;

iFUN_CAL|

Number itype: int

raw value: -1 parsed value: -1

l; 52 c:14 bb:4

schedule2/versions/v2/scedule2.c

; id ; -----i

argc

!argc_2 I

prio.28=prio;

l; 52 c:30 bb:13

schedule2/versions/v2/scedule

h-

Op I ASSIGN "

sr:

Number

prio.28>0 Op

l; 52 c:5 bb:13 ASSIGN

schedule2/versions/v2/scedule2.c s N

I

-I

type: int

raw value: 3 i

-I

parsed value: 3

prio.24=prio; status=get command(&command

l; 54 c:37 bb:5 l; 52 c:30

schedule2/versions/v2/scedule2.c schedule2/versions/v2/sce

□ □

iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

OperationNodes ConditionNodes AstNodes Figure 2. Example of FORENSIC tool design's model

Model-Based Error Localization

In Graz University of Technology SMT-Based Error Localization and Correction algorithm were implemented, when simulation trace of the model with specific inputs is transformed into SMT-solver compatible expression and expression is solved setting repaired variable as unknown variable.

This algorithm, named Symbolic Execution, is repeated for every processed variable in the model. Algorithm allows to correct complicated errors when variables or functions are defined erroneously.

However in practice, simpler errors such as misuse of arithmetical operator and error in one digit of the number appear much more often. To correct these types of errors Model-Based Error Localization with Mutation-Based Error Correction Algorithm can be applied. It is simpler and faster than more sophisticated algorithms. But if Model-Based Error Localization algorithm does not correct errors and verification of design still fails, then SMT-Based Error Localization and Correction algorithm can be applied after Model-Based Error Localization [6].

In FORENSIC tool [1] it is possible to execute back-ends independently, one after another.

Specification for C design. In order to implement Model-Based Error Localization, it is necessary to verify design. For that we need to have specification in valid format, that is explicit and possible to animate, that will produce reference outputs from inputs which will be compared further with design's outputs.

Specifications can be formal or informal. In information systems development in business informal specifications through graphical modeling have been used at least since late 70s. Recently formal specification languages (such as Larch, VDM, Z, FOOPS and OBJ) have been developed [7].

Formal Specifications are divided into two categories: explicit and non-explicit. Specifications that are possible to animate, called explicit, are defined in such a way when specification's outputs are derived from specification's inputs directly [8]. Non-explicit specifications are impossible to animate.

Difficulty of defining C/C++ design's specification's format is that this should be specification suitable for any C/C++ design. It should be possible to define any C/C++ design's behaviour using that. C/C++ design has no functionality for temporal operators like SystemC; most of its functionality are arithmetical procedures, conditions and file processing.

Best solution to define specification of system with required properties - specification using C/C++ language, same language as design. Specifications in C/C+ + format are simple to create and to control. For verification purposes specification should be possible to animate and C/C++ code is executable, fast and reliable for this purposes.

Specifications using high-level programming languages like Java, C, C++ are widely used in software development industry [9].

If specification for C/C++ design is C/C++ specification language, then it is easy to produce outputs in the same format as in processed design. To do this Observation Points in the same locations for specification and for processed design can be added to both design and specification.

Y

In this case internal format of specification is not strictly defined and any C/C++ functionality can be used in specification. As C/C++ is high-level programming language, specification can be implemented using test cases for arbitrary functions, or they can model design more precisely, most important that outputs of specification should correspond to outputs of design.

In general, it is a good idea to use same design language and specification language.

Simulation for Model-Based Error Localization. When specification for the design is ready, and design is parsed into model, it is possible to verify the design. During Simulation-Based Verification reference outputs of the design are compared with outputs, and if design corresponds to its specification decision is made. For obtaining the outputs from design inputs it is enough to execute the design. However, for ModelBased Error Localization we need trance of simulation of the design, storing information about components of the design that were activated during simulation. It is necessary to be able simulate design's model, and process of design verification and Error Localization can be merged together using simulation of the model only for Verification and Error Localization. In the FORENSIC tool [1], simulation of the model is implemented using C functionality, executing instrumented and dumped model of the C design. The advantages of using C language's functionality for simulation of C design's model over direct simulation of the C design's model are as follows.

In case of direct simulation of the model, all logical operators should be implemented and all values of variables activated during simulation should be stored somewhere during processing. Variable's types can be different: pointers, arrays, compound data types and so on, and practically memory model of C programming language should be rewritten in order to store the values. Therefore using already existing C compiler's memory model is reasonable. Additionally, implementation of operators would be duplication of C functionality.

On the other hand, when using C functionality, the model should be dumped into a C executable file, instrumented, executed, and output responses should be written into output. It allows getting any data during simulation process, and we do not lose any functionality, but get benefits such as speed of execution and simplicity of approach. There are not any programming errors while complicated C functionality is re-implemented. Dynamic Slicing described in Section 2,4 is implemented in the FORENSIC tool [1] based on C functionality.

Same principles of design's model simulation can be applied if the design is written using on other programming language, C/C++/SystemC or JAVA as examples.

Model-Based Error Localization. During simulation of the model during Model-Based Error Localization number of nodes from model are activated.

Definition 3. Activated path Ps of hammock graph H=<N, E, n0, ne> is the path, that consists of nodes n, e N simulated during simulation with input s. Path Ps from node n0 to node ne is a list {n0, nb ..., nk-x}, such that for allj, 0<j<k-2, (n,, nJ+l) is in E.

Definition 4. Activated nodes NaeN during simulation with input s that is in S are nodes, that belong to the activated path Ps from node n0 to node ne.

Algorithm 1. Model-Based Error Localization. During Model-Based Error Localization the model is simulated with inputs S and for each input s e S output of simulation is compared with reference output of specification. If the comparison, i.e. Verification fails, then corresponding activated nodes Na have the failed counter increased, otherwise nodes have the passed counter increased. After simulation with all inputs S ranking algorithms are applied to counters, and finally nodes are sorted according to their rank. Nodes with high ranks are Candidates for Correction and stored in corresponding data structure.

Ranking for Error Localization. During ModelBased Error Localization rankings are applied to model's Activated Nodes. It allows extracting Candidates for Correction. Ranking equations and comparison of rankings using Siemens Designs [10] were studied in [3]. However, a new contribution of the paper is the observation that a simple error ranking metric that takes into account only information from failed sequences has the least average deviation from exact localization. The purpose of this work was not correcting errors as in [11], but comparing coefficients for ranking algorithms.

Coefficients described in this paper are taken from diagnosis/automated debugging tools Pinpoint [12], Tarantula [11], and AMPLE (Analyzing Method Patterns to Locate Errors) [7], and from molecular biology domain (Ochiai coefficient) [3].

A program spectrum [13] is collected in [3] of processed designs. Based on obtained spectra, that is activated nodes and passed and failed information in terminology introduced above, binary matrix M x N is built: M simulations of design of N blocks length [3].

Spectra can be presented as:

X11 X12 ... X-, X21 X2 2

M Spectra

4w X2N

lXM1

XM 2

MNi lcM

Every component of spectra xy has value 1 if it were activated and 0 if it were not activated during simulation. Values ey- have error detection values information, have value 1 if simulation were failed and 0 if corresponding simulation was passed. Using notation of [14], four counters can be introduced:

values

counter xii e

an 1 1

a10 1 0

a01 0 1

a00 0 0

Similarity coefficients were calculated using counters. Corresponding ranking algorithms were derived using terminology introduced above. Comparison of ranking definitions is in Table 1 below.

Table 1

Comparison of ranking algorithms for FORENSIC Simulation-Based Back-End with experiments in [3]

[9] similarity coefficients s

FORENSIC tool ranking

Tarantula tool ranking [6]

an failed (n)

aii + aoi__totalfailed

- o rankin) =-

dii + Œr

- +

aio + aoi

passed (n) failed (n) totalpassed totalfailed

Jaccard coefficient used in Pinpoint framework [9, 1]

____failed(n)

= rankin) = -

a1 + am + au

passed (n) + totalfailed.

Л

Ochiai coefficient, used in molecular biology [9]

an_ , n ) failed(n)

rank(n) =

i + &oi) • (^u + a10)

^totalfailed • (passed(ri) + failed(ri))

AMPLE tool ranking [9]

aio | I failed(n) passed(n) i

—:-\ rank(n) = \---1

o + aoo1_I totalfailed totalpassed I

simple ranking introduced in current article

failed (n)

rank(n) =-

__totalfailed_

Counters atj are defined in (1)

where rank of Activated Node n with simulation with all inputs S is calculated using passed(n) - times passed for node n, failed(n) - times failed for node n, totalpassed - total times passed for all inputs and totalfailed - total times failed for all inputs._

It is possible to improve Error Localization accuracy by applying additional dynamic slicing algorithm to the Candidates for Correction.

Dynamic Slicing for Model-Based Error Localization. Dynamic Slicing is a technique applied to Activated Nodes Na obtained during Model-Based Error Localisation simulation that allows reducing number of Candidates for Correction. The idea behind Dynamic Slicing is the following: some amount of candidates (nodes) is activated during Model-Based Error Localisation's simulation but do not have any influence on simulation output. Those are usually constant declarations, assignments inside code and so on. Dynamic Slicing allows discarding those statements.

Definition 5. Dynamic Slice Nd of a hammock graph H=<N, E, n0, ne> includes nodes that are in activated nodes Na that have influence on simulation output. Those nodes, that are not in slice, i.e. Na\Nd can be removed from Activated Path of simulation with input s because they have no effect on simulation output.

In order to find nodes of model's hammock graph that are not in slice we need to introduce the following definitions - defined DEF(n) and referenced REF(n) variables in node n (class FlowGraphNode in FORENSIC implementation [1]).

Definition 6. Let V be the set of variables, that exists in design's model represented by a hammock graph H=<N, E, n0, ne>. Then for every node neN, two sets can be defined, each of them is in V: REF(n)

- set of variables, whose values are used in n, and DEF(n) - set of variables, whose values are changed or defined in n [15].

Algorithm 2. Dynamic Slicing. This algorithm uses Activated Nodes Na obtained during simulation, global referenced variables gREF are calculated at the moment of processing of every component n that is in Na. Execution of algorithm goes from end activated node ne to initial activated node n0, backtracing is performed. At the beginning of processing gREF=0.

If n contains C function or condition (Condition-Node), then used variables REF(n) in n are added to the set of global referenced variables: gREF=gREFu UREF(n). Output Observation Points, that output data during simulation, can be last executed components during Model-Based Error Localization simulation and variables outputted by Output Observation Points are initially stored in the set of global referenced variables gREF. Observation Points are implemented using assert(...) or special FORENSIC_...() functions.

If node n contains an assignment, then referenced variables REF(n) in the assignment are added onto the set of global referenced variables if and only if defined variables DEF(n) are already in the set of global referenced variables gREF, otherwise component is not in the slice and has no influence on the simulation result. Defined variables DEF(n) are removed from the list of global referenced variables, as they are redefined: if DEF(n)egREF then gREF=(gREF\ DEF(n)) UREF(n), otherwise gREF=gREF\DEF(n) and n is not in the slice.

Components that are not in the slice can be removed from Activated Path without changing the simulation output.

Design

Activated Operators neN„

DEF(n) REF(n) gREF Line #

int a, b, c; a=0; b=a+1; c=a+b; if (c>0) { a=0; a=b+c; } else {

a=b-c; }

assert(a==1);

int a, b, c;

a=0;

b=a+1;

c=a+b;

if (c>0)

a=0;

a=b+c;

assert(a==1);

a

iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

a, b

b, c b, c

b, c

1

2

3

4

5

6

7

8

9

10 11

Figure 3. Example of Dynamic Slicing execution, assignment at Line 6 is not in slice

Depending of processed design's nature, Dynamic Slicing can have strong influence. Number of assignments can be made at the initializing part of design, those are usually constants, arrays, and so on. Not all of initialized values are used during simulation. Corresponding assignments will be discarded, as they will

a,-, + a

a

be not in slice. In actual implementation of Dynamic Slicing in FORENSIC tool [1] unique C variable addresses are used instead of variable names, that are shown at Figure 3, that were possible to implement only because of C functionality were used for simulation of model, variables are parsed in model and presented using Flow Graph.

Mutation-Based Error Correction

When Error Localization is complete and Candidates for Correction are generated it is possible to correct error in design. Error Classes define types of errors that could be found in processed designs, and valid Error Correction algorithm should be able to correct as much as possible. The model with implemented Mutation-Based Error Correction (FORENSIC model) defines possible corrections implementation, and therefore model choice have high influence on correction procedure and results.

Error Classes found in experimental Siemens Designs [10] that are used for experiments with FORENSIC tool [1] are described below, last column if Mutation-Based Error Correction algorithm of FORENSIC tool [1] corrects corresponding class:

Table 2

Error Classes in Siemens Benchmarks

Error class Operators/examples Supported?

AOR ADD (+), SUB (-), MULT (*), DIV (/), MOD (%) YES

ROR EQ (==), NEQ (!=), GT (>), LT(<), GE (>=), LE (<=) YES

LCR AND (&&), OR (||) YES

ASOR (+=), (-=), (=) YES

Integer mutation 500 ^ 550, 740 ^ 700, 1 ^ 2, a[0] ^ a[1], n= 0 ^ n = 1 Partially

Float mutation 0.0 ^ 0.1, 0.0 ^ 1.0 Partially

Incorrect array index a[i-1] ^ a[i], a[i+1] ^ a[i], a[3] ^ a[4], a[var] ^ a[0] Partially

+1 added/missing r = i ^ r = i+1, f(a, b, c+1) ^ f(a, b, c), if(i) ^ if(i+1) Partially

Constant mutation TRUE ^ FALSE NO

Float rounded 2,5066282 ^ 2,50663 EPS ^ EPS-0.000001 Partially

Missing code/Code Added /* && (Cur_Vertical_Sep > MAXALTDIFF); missing code */ NO

Function Substituted by Constant Inhibit Biased Climb() ^ Up Separation NO

Logic Change Brackets removed Partially

Some of the Error Classes are not supported by current version of Mutation-Based Error Correction algorithm. Some of them are because of limitation of Mutation-Based Error Correction algorithm, as it does not support corrections of complicated errors. Others are impossible to correct when some code is simply missing in processed design.

Mutation-Based Error Correction algorithm corrects errors using following mutations.

• Operators Mutation - operators are mutated with similar ones. Operators are divided into several groups where similar operators are grouped together.

• General Mutations of constants. Those include adding one to constant - C—C+1, subtracting one from the number - C—C-1, setting constant to zero -C—^0, and finally making constant negative C— -C.

• Mutations in constant number's digits. Number's digit is mutated, values 0, ..., 9 are applied to every digit, numbers can be any C floats or integers.

• Double-Precision numbers rounding mutations. Double-precision numbers (floats and doubles) are mutated with rounded ones, rounded up and down.

Groups of operators for mutations in operators are.

• Arithmetical Operators: plus (+), minus (-), multiplication (*), division (/), module (%).

• Assignment Operators: assign (=), plus assignment (+=), minus assignment (-=), multiplication assignment (*=), division assignment (/=), module assignment (%=).

• Comparison Operators: less (<), grater (>), equal (==), unequal (!=), less equal (<=), grater equal (>=).

• Logical Operators: logical and (&&), logical or

(II)

• Bit Shift Operators: bit shift left (<<), bit shift right (>>), bit and (&), bit or (|), bit xor (A).

• Bit Shift With Assignments Operators: bit shift left assignment (<<=), bit shift right assignment (>>=), bit and assignment (&=), bit or assignment (|=), bit or assignment (A=).

• Unary Arithmetical Operators: unary plus (+), unary minus (-).

• Increase Decrease Operators: pre increase (++x), post increase (x++), pre decrease (--x), post decrease (x--).

• Unary Logical Operators: logical not (!), bit complement (~).

If there is one fault assumption for Error Correction then mutations are applied one-by-one to Candidates for Correction, and mutated model is verified again using C compiler. However, if multiple fault assumption is used, then mutations should be applied to several candidates simultaneously to get valid correction. If verification passes, then valid correction is found, if differ, then process of Mutation-Based Error Correction is continued while all Candidates for Correction are processed (Figure 1b).

Algorithm 3. Mutation-Based Error Correction. Let C be Candidates for correction, C=<N, R> where n in N is node of model - hammock graph H=<N, E, n0, ne>, r in R - rank of node, calculated during Error Localization using ranking algorithm. Let O(n) be arithmetical and logical operator(s) in n in N, Ct(n) be constants in n in N, DP(n) - Double-Precision numbers in n in N. Initially C is sorted according to rank,

c _ ^max(fl) ... min (R)^ ^max(fl) ... min (R)^ and then

Vnt £ N £ Cs, t = 1 ... INI:

V0(nt ):

0(n ) is substituted with one from the same group, new model's Verification passes ? Error Is Corrected,

infinite loop during Verification ? break; 0(nt) is restored, Error Is Corrected ? End.

VCt(nt):

Ct(n;) 's every digit is substituted with values (0..9), new model's Verification passes ? Error Is Corrected, infinite loop during Verification ? break; Ct(n;) is restored, Error Is Corrected ? End.

VDP(n ): DP(nt) is rounded up and down, new model's Verification passes ? Error Is Corrected, infinite loop during Verification ? break; DP(nt) is restored, Error Is Corrected ? End. No Correction Found.

Mutation-Based Error Correction algorithm does not correct complicated errors when some code is missing or function is substituted by constant, but has its own advantages. Error Classes supported by Mutation-Based Error Correction of FORENSIC tool [1] are natural and not complicated, algorithm is fast, reliable and simple. According to experimental results, processing time with Mutation-Based Error Correction algorithm of Siemens Benchmark [10] design, that have 180-570 lines of C code takes from 1 to 6 minutes with 2,4 GHz processor PC.

Property of Model-Based Error Correction. In order to measure Error Localization accuracy, in [11] number of statements to reach the error is used. But every statement of processed design can have different complexity, as examples it can be assignment, assignment with arithmetical operators, function with different number of parameters, every parameter of function can be other function or arithmetic, it can be simple logical condition or very complicated logical condition. In order to make code more readable and beautiful, sometimes it is a "good style" to compose complicated statements that are easier to observe for code developer than group of small statements. In this case approach to count number of statements that are necessary to reach error is not accurate measure of Error Localization accuracy.

On the other hand, number of mutations, that is applied to parsed design's model, should be practically the same for same functionality. This is because in FORENSIC model hammock graph nodes have no more than 3 operands, and complicated statements are presented by several hammock graph nodes. Implemented mutations for Mutation-Based Error Correction define operators, numbers, constants and logic, that is processed during Correction, or define locations of errors, that can be found and corrected in the design. Not processed during Mutation-Based Error Correction errors are errors, that are out of scope of investigation in the design, and can be omitted during measurement of Error Localization accuracy without losing measurement accuracy.

It means that counting number of processed mutations during Mutation-Based Error Correction instead

of number of statements, required to reach the error, is more accurate meter of Error Localization accuracy.

Experimental Results

Experiments of Model-Based Error Localization with Mutation-Based Error Correction were performed using Error Localization with Dynamic Slicing and without Dynamic Slicing. Experimental designs are C Siemens Benchmarks [10].

Table 3

Siemens Benchmarks

Design Complexity, lines Number of Inputs Number of Erroneous Designs

teas 186 1605 41

schedule 419 2650 9

schedule2 312 2710 9

replace 558 5543 31

tot_info 413 1052 22

print_tokens 569 4130 7

print_tokens2 515 4115 10

Siemens Benchmarks [10] are suitable for FORENSIC tool [1] experiments, those are open-source C designs, created by Siemens Corporate Research and extended by Mary Jean Harrold and Gregg Rothermel from Georgia Institute of Technology, Atlanta, USA. Every design type has number of already generated inputs that test C designs with different coverage. Error classes that can be found in Siemens Benchmarks are described in Table I.

During experiments FORENSIC tool's [1] Simulation-Based back-end was used, simple ranking and one fault assumption for Model-Based Error Localization were applied, with and without Dynamic Slicing. In table below measure of Error Localization accuracy is mean number of mutations required to correct errors in Siemens Benchmarks [10] during Mutation-Based Error Correction.

Table 4

Error Localization with and Without Dynamic Slicing for Siemens Benchmarks

Design Percentage of corrected Designs Mean Number of Correction mutations required to correct error when Error Localization is Change, A - В

В • 100 %

with Dynamic Slicing (A) without Dynamic Slicing (B)

teas 65,9 131,19 303,41 56,76

schedule 33,3 325,67 403,67 19,32

schedule2 33,3 460,67 495 6,93

replace 37,5 855,17 874 2,15

tot_info 46,9 781,33 793,47 1,52

print_tokens 14,2 825 891 7,40

print_tokens2 60,0 1255,17 1291,33 2,80

Designs that include Error Classes which are not supported by FORENSIC tool [1] were not included into experimental results.

Design internal structure defines how much Dynamic slicing reduces number of correction mutation required for Correction. If design has number of initializations in the beginning of the code, then Dynamic Slicing discards initializations that are not required for particular simulation with input s.

Accourding to experiments, Dynamic Slicing has strong influence on tcas design from Siemens Benchmarks, on schedule, schedule2 and on printto-kens, for tcas design Dynamic Slicing algorithm will reduce number of required mutations for correction by 56,76 % in mean, for schedule - by 19,32 %, for schedule2 by 6,93 %, and for print_tokens by 7,40 %.

Conclusions

The article describes Error Localization algorithm that is implemented with Dynamic Slicing and simulation using C code simulation. The localization algorithm has been integrated into the FoREnSiC automated debugging system. Different ranking algorithms were compared and their ranking accuracy for Error Localization was measured by experiments on the Siemens benchmark set. A new contribution of the paper was the observation that a simple error ranking metric that takes into account only information from failed sequences has the least average deviation from exact localization.

iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

Experiments presented in this work show that the Simple ranking for Model-Based Error Localization is more stable than other rankings introduced in [3], as it has lowest standard deviation of results - 6.61 that is 2.5 times less than second minor standard deviation of results, obtained using Ochiai ranking for Error Localization. Simple ranking is best when one error is present in processed design. In this case every failed simulation of design will include erroneous component and it will have highest ranking. However, other rankings may become necessary if multiple errors are present simultaneously. Our future work is focused on studying error localization for multiple design errors.

References

1. FORENSIC tool, Available at: http://www.informa-tik.unibremen.de/agra/eng/forensic.php (accessed 23 January 2012).

2. Kasyanov V.N., Distinguishing Hammocks in a Directed Graph, Soviet Math. Doklady, 1975, Vol. 16, no. 5, pp. 448-450.

3. Abreu R., Zoeteweij P., and Van Gemund A.J.C., An Evaluation of Similarity Coefficients for Software Fault Localization, Dependable Comp., PRDC '06. 12th Pacific Rim Intern. Symposium, 2006, pp. 39-46.

4. GIMPLE plug-in for GCC compiler, Available at: http://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html, (accessed 23 January 2012).

5. Weiser M., Program Slicing, Soft. Engineering, IEEE Transactions, 1984, pp. 352-357.

6. Konighofer R. and Bloem R., Automated error localization and correction for imperative programs, Formal Methods in Computer-Aided Design (FMCAD), 2011, pp 91-100.

7. Jadish S. Gangolly, Lecture notes on Design & Analysis of Accounting Information Systems, University at Albany, NY, 2000.

8. Xiaoping J., An approach to animating Z specifications, Computer Software and Applications Conference, 1995. COMPSAC 95. Proc., 19th Annual Intern., 1995, pp. 108-113.

9. William K. Lam, Hardware Design Verification. Simulation and Formal Method-Based Approaches, 2005.

10. Bug Aristotle Analysis System — Siemens Programs, HR Variants, Available at: http://pleuma.cc.gatech.edu/aristotle/Tools/ subjects/, (accessed 23 January 2012).

11. Debroy V., and Wong W.E., Using Mutation to Automatically Suggest Fixes for Faulty Programs, Soft. Testing, Verification and Validation (ICST), 3rd International Conf., 2010, pp. 65-74.

12. Chen M.Y., Kiciman E., Fratkin E., Fox A., Brewer E., Pinpoint: Problem determination in large, dynamic internet services, Proc. Int. Conf. on Dependable Systems and Networks, 2002, pp. 595-604.

13. Reps T., Ball T., Das M. and Larus J., The use ofprogram profiling for software maintenance with applications to the year 2000 problem, Proc. 6th European Soft. Engineering Conf. (ESEC/FSE 97), Vol. 1301, LNCS, 1997, pp. 432-449.

14. Jain K., and Dubes R.C., Algorithms for clustering data, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1988, pp. 528-550.

15. J. de Kleer, and Williams B.C., Diagnosis with behavioral modes, Proc. IJCAI-89, 1989, pp. 1324-1330.

16. Reiter R., A theory of diagnosis from first principles, Artificial Intelligence, 1987, no. 32, pp. 57-95.

17. Quek C. and Leitch R.R., Using energy constraints to generate fault candidates in model based diagnosis, Intelligent Systems Engineering, 1st Intern. Conf., 1992, pp. 159-164.

18. Repinski U. and Raik J., Simulation-Based Verification with Model-Based Diagnosis of C Designs, East-West Design Test Symposium, 2012, pp. 42-45.

19. Kamkin A., Simulation-based hardware verification with time-abstract models, Design & Test Symposium (EWDTS), 2011 9th East-West, 2011, pp. 43-47.

Версия статьи на русском языке

УДК 62

ВЕРИФИКАЦИЯ НА ОСНОВЕ СИМУЛЯЦИИ С НАХОЖДЕНИЕМ И ИСПРАВЛЕНИЕМ ОШИБОК ДЛЯ С-ДИЗАЙНОВ

Урмас Репинский, ассистент (Таллиннский технологический университет, Эхитаяте теэ, 5, г. Таллинн, 19086, Эстония, [email protected])

i Надоели баннеры? Вы всегда можете отключить рекламу.