
Software of Computer, Telecommunications and Control Systems

DOI: 10.18721/JCSTCS.13204 УДК 004.052.44

AUTOMATIC GENERATION OF SOFTWARE BUG FIXES BASED ON ANALYSIS OF SOFTWARE REPOSITORIES

A. Belskii, V.M. Itsykson

Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russian Federation

This paper describes a method developed by the authors for the automated correction of software errors, based on the analysis of successful fixes available in open repositories of projects written in the ABAP programming language. The method generates candidate patches from predefined templates and ranks the results by the probability of successful application, which is estimated by a probabilistic model built with machine learning methods. The probabilistic model is trained on features extracted from successful and unsuccessful patches of ABAP programs in open repositories. The developed method was tested on synthetic examples and on real ABAP projects containing errors. In the experiments the method successfully generated a number of patches that proved to be workable. The accuracy and efficiency of the results are comparable to, or exceed, the results reported in similar works by other authors.

Keywords: Automated program repair, machine learning, Abstract Syntax Tree, logistic regression, gradient descent, ABAP.

Citation: Belskii A., Itsykson V.M. Automatic generation of software bug fixes based on analysis of software repositories. Computing, Telecommunications and Control, 2020, Vol. 13, No. 2, Pp. 35-48. DOI: 10.18721/JCSTCS.13204

This is an open access article under the CC BY-NC 4.0 license (https://creativecommons.org/licenses/by-nc/4.0/).


Introduction

In recent years the size of software has been growing constantly while development cycles have been shortening, which generally leads to a decrease in the quality of software products. This is unacceptable in areas such as embedded systems in medicine, energy, engineering, the financial sector, and others, where poor quality can lead to significant material losses or endanger human life and health.

To overcome these problems, developers use various methods to improve the quality of software, such as testing, verification, or static analysis. However, all common methods of improving software quality have certain limitations and cannot fully guarantee the quality of programs. For example, testing may detect errors in software, but it cannot guarantee their absence. In addition, there are entire classes of programs, such as parallel systems, whose behavior may be non-deterministic and for which testing is inefficient. Formal methods, such as deductive verification and static analysis, are still limited by the size of the analyzed programs and can only be applied to a narrow range of software projects.

Recently, software engineering research has actively explored the analysis and application of the experience accumulated by millions of programmers while writing hundreds of thousands of software projects. This experience is recorded in software repositories (version control systems, VCS) as the history of project changes and commit comments, as well as in issue tracking and bug tracking systems as the history of changes to tasks and defect reports. A large number of methods analyze this accumulated information and extract knowledge from it, which is then used to solve various software engineering problems. These methods have proven themselves well in different areas of software engineering. Recently, such approaches have also been applied to detecting and correcting software errors, using not only the artifacts of the analyzed project itself, but also the previously untapped potential of the information stored in hundreds of thousands of software repositories, which makes it possible to reuse and generalize the experience of millions of developers.

This paper describes the results of research in the field of automated correction of software code based on the analysis of successful fixes (patches) of many projects written in the ABAP programming language [1], which is widely used in SAP software products.

The article is organized as follows. The first section contains a description of the task and a brief overview of the subject area. The second section illustrates the scheme and gives a verbal description of the method developed by the authors. The third section is devoted to a detailed description of the developed method and includes the algorithms, the mathematical model, and the technologies and methods used. The fourth section presents the results of testing the developed method and their analysis. In the conclusion, the results are summarized and evaluated, and plans for further research are formulated.

Related work

Nowadays, there are various technologies that allow for the automatic generation of bug fixes in programs (patches). The following methods can be considered the most representative of these technologies.

The GenProg [2], relifix [3], Astor [4], and history-driven program repair [5] methods are based on genetic programming. This class of methods performs stochastic problem solving based on the ideas of evolutionary genetics: genotypes (the genetic material of individuals) stored in memory, differential reproduction of these genotypes, and variations created by processes similar to the biological processes of mutation and crossover [6].

The methods SemFix [7], JFIX [8], CRSearcher [9], Qlose [10], semantic program repair using a reference implementation [11], static automated program repair for heap properties [12], and automated program repair with canonical constraints [13] are based on the semantic approach. Their main idea is to derive a set of constraints for the erroneous expression by means of symbolic execution [14] and then to solve these constraints using SMT solvers [15].

The methods R2Fix [16], Prophet [17], ELIXIR [18], and data-guided repair of selection statements [19] are based on machine learning [20]. Their main idea is to build machine learning models [21] from the source code of programs with errors and their corrections, as well as from comments and other data found in source code repositories such as GitHub. The trained model is then used for classification tasks, for example, detecting errors in program source code or selecting suitable patches that are classified by the same parameters as the error.

The main disadvantage of the methods based on genetic programming is the random enumeration of all possible patch variants without analyzing either the context of the erroneous source code or similar patches. Methods based on the semantic approach analyze the context of the erroneous code in depth, but do not use the experience of similar patches to strengthen the automatic patch generation algorithm. Methods based on machine learning are the closest to the task considered by the authors, since they analyze both the context of the erroneous source code and similar patches.

Thus, the goal of this research is to develop a method that automatically generates bug fixes for software code based on the previously accumulated experience of creating patches. The method must rely on an algorithm based on machine learning that allows patches to be generated automatically for various types of program code errors without using specifications or other means of automated code generation.

Overview

The main idea of the proposed method is to automatically generate patches for errors in ABAP programs by producing candidate patches from predefined templates and ranking the results by the probability of successful application, which is estimated by a probabilistic model obtained with machine learning methods. In turn, the probabilistic model is trained on data from successful and unsuccessful patches of ABAP programs. The overall scheme of the method is presented in Fig. 1.

The method contains two main parts, "Machine learning model training" and "Patch generation and ranking", which together comprise the following seven functional blocks.

Block 1 "Forming an abstract syntax tree". An abstract syntax tree (AST) [22] is built from the source code with an error and from its patch. Two independent ASTs are formed by applying the recursive descent method [23] to the text of the ABAP program containing the error and to the text of the correction of this error (the patch). All further analysis of the ABAP source code is performed on the AST, which provides more accurate information about the types of program elements (variables, constants, operators, etc.) and their relationships.

Block 2 "Determination of the features of a successful patch". To train the probabilistic model, features of successful patches are determined by analyzing the AST of the erroneous source code and the AST of the patch obtained in block 1. For example, if a program was corrected by adding a check for an empty variable value before executing a division operator, this fact can serve as a feature of a successful patch and be used for training the probabilistic model.

Fig. 1. Scheme of the proposed method

Block 3 "Model training". In this block, the machine learning model is trained on the features of successful patches obtained in block 2. As a result, the trained model can be used to predict the probability of success of any ABAP patch.

Block 4 "Forming an abstract syntax tree from source code with an error". The AST is generated, as in block 1, from the source code of the ABAP program for which a patch needs to be generated automatically.

Block 5 "Generating candidate patches based on templates". Data on all variables and constants is extracted from the AST of the program with an error. Next, an array of possible conditions is generated from these variables and constants. Finally, candidate patches are generated from templates based on the obtained array of variables/constants and the array of possible conditions.

Block 6 "Defining features of generated patches". The features of the patches generated in block 5 are determined in the same way as the features of successful patches in block 2.

Block 7 "Ranking of the generated patches based on the features and the trained model". For each generated patch the probability of success is computed from the model trained in block 3 and the features obtained in block 6. The resulting list of generated patches is sorted in descending order of this probability. The generated patches with the highest probability of success are considered the target patches.

Our approach

Let us look in more detail at the stages of the method and at the nuances of implementing the methods and models shown in Fig. 1.

Generating AST from source code

To perform the analysis, the program must be translated into a formalized representation suitable for further processing; in this paper we use an abstract syntax tree. Since no official grammar for parser generators (for example, for ANTLR) exists for the ABAP language, the authors developed a lightweight parser based on the recursive descent method. The actual formation of the AST from the source code is performed in blocks 1 and 4 shown in Fig. 1. Its goal is to determine more accurately the types of objects and their relationships for further analysis of ABAP programs. The simplified algorithm for parsing ABAP programs is presented as pseudocode in Listing 1.

In line 1 of the algorithm, the input data is the array of strings str ∈ S, the source code lines of the ABAP program. In lines 2-8, the array of lexemes L is generated for each string str of the source code. In lines 9-13, the array of tokens t ∈ T is formed by determining the following data for each lexeme l from the array of lexemes L:

— the token type t_t (header, operator, brackets, number, variable, type), which is determined by assigning each token to a class of programming language objects;

— the error flag of the token b_t, which is determined by the following condition: if the source code line str contained an error, then for all tokens t related to the lexemes L of this line the flag is set to true.

In lines 14-20, the array of AST nodes P is formed from the token array T. Each node p ∈ P is a tuple

p = (p_p, l, t_n, b),

where p_p is a reference to the parent node p ∈ P; l is a lexeme; t_n is the node type, determined from the token type t_t; b is the error flag of the node, determined from the error flag of the token b_t.

The array of AST nodes P is formed using the recursive descent method, which consists of recursively traversing the entire array of tokens t ∈ T and building their relationships through the references p_p according to the grammatical rules of the ABAP programming language shown in Fig. 2.

Listing 1. AST generation algorithm
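Only fragments of the pseudocode of Listing 1 survive extraction. As an illustration of the approach described above, the following is a minimal Python sketch of the same idea: lexing the source lines into typed tokens and linking nodes to their parents by reference. All names here (Node, tokenize, build_ast) and the simplified token classes are assumptions of this sketch, not the authors' implementation, and the full recursive descent over the ABAP grammar of Fig. 2 is deliberately omitted.

```python
# Illustrative sketch only: a simplified lexer and parent-linked AST builder
# in the spirit of Listing 1; not the authors' actual parser.
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    parent: Optional["Node"]   # p_p: reference to the parent node
    lexeme: str                # l: the lexeme text
    node_type: str             # t_n: node type derived from the token type
    is_error: bool             # b: error flag propagated from the token
    children: List["Node"] = field(default_factory=list)

def tokenize(lines, error_lines=frozenset()):
    """Split each source line into (lexeme, token_type, error_flag) triples."""
    tokens = []
    for i, line in enumerate(lines):
        for lex in re.findall(r"[A-Za-z_][\w-]*|\d+|[().=<>+*/-]", line):
            if lex.isdigit():
                ttype = "number"
            elif lex in "()":
                ttype = "brackets"
            elif lex in ".=<>+*/-":
                ttype = "operator"
            else:
                ttype = "variable"
            tokens.append((lex, ttype, i in error_lines))
    return tokens

def build_ast(tokens):
    """Attach every token to a parent node; a period closes the current ABAP
    statement (full recursive descent over the grammar rules is omitted)."""
    root = Node(None, "<root>", "header", False)
    current = root
    for lex, ttype, err in tokens:
        node = Node(current, lex, ttype, err)
        current.children.append(node)
        if lex == ".":
            current = root                     # statement finished, back to root
        elif lex.upper() in ("IF", "WHILE", "DO", "FORM"):
            current = node                     # descend into the new construct
    return root
```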

Fig. 2. Grammatical rules of the programming language ABAP

Defining patch features

Patch features are defined in blocks 2 and 6 shown in Fig. 1. The authors formulated 15 patch features (Table 1) based on many years of experience with the ABAP programming language on real projects, including the creation of thousands of bug fixes, as well as on the analysis of patches to ABAP programs from open source repositories. These features are extracted from the source code of ABAP programs with an error and with a patch. Beforehand, the method determines the differences between P_bug, the AST of the source code with an error, and P_patch, the AST of the source code with the patch, in the form of the node index of the beginning of the difference idx_start(P_patch) and of the end of the difference idx_end(P_patch). In addition, a list of all patch variables v ∈ V(P_patch) is determined within idx_start(P_patch) and idx_end(P_patch).
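As an illustration of how the difference boundaries and the patch variables might be computed, below is a minimal sketch that assumes the ASTs are flattened into node lists in source order (with the lexeme and node_type fields of the earlier Node sketch); the function names diff_bounds and patch_variables are hypothetical and are not taken from the paper.

```python
# Minimal sketch (assumption): ASTs flattened to lists of nodes in source order;
# find the first and last positions where the patched tree differs from the
# buggy one, then collect the variables mentioned inside that region.
def diff_bounds(p_bug, p_patch):
    """Return (idx_start, idx_end) of the region where P_patch differs from P_bug."""
    start = 0
    while (start < len(p_bug) and start < len(p_patch)
           and p_bug[start].lexeme == p_patch[start].lexeme):
        start += 1
    end_bug, end_patch = len(p_bug) - 1, len(p_patch) - 1
    while (end_bug > start and end_patch > start
           and p_bug[end_bug].lexeme == p_patch[end_patch].lexeme):
        end_bug -= 1
        end_patch -= 1
    return start, end_patch

def patch_variables(p_patch, idx_start, idx_end):
    """Collect the variables v in V(P_patch) inside the differing region."""
    return {n.lexeme for n in p_patch[idx_start:idx_end + 1]
            if n.node_type == "variable"}
```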

Model training

Model training is performed in block 3 in Fig. 1. There is a number of models, each with its own advantages and disadvantages, for solving supervised classification problems in machine learning. The authors chose the logistic regression model [24] because, with a small number of features, it offers better performance at comparable accuracy than other machine learning methods such as neural networks or the support vector machine. Moreover, logistic regression is easier to implement and adapt [25], and it is widely used in similar works by other authors.

The following m × 15 feature matrix is used to train the model:

F = | F_{1,1}  F_{1,2}  ...  F_{1,15} |
    | F_{2,1}  F_{2,2}  ...  F_{2,15} |
    | ...      ...      ...  ...      |
    | F_{m,1}  F_{m,2}  ...  F_{m,15} |

where m is the number of training examples represented by the features of P_bug and P_patch (see the section Defining patch features).

Table 1

Patch features

Feature name | Algorithm for determining

F1 (type of error) | Determined manually; possible values: 1 (division by 0), 2 (use of an empty pointer), 3 (error in the conditional operator), 4 (error in the loop condition).

F2 (type of the patch modification) | 1. Adding a check: if among the nodes of P_patch within idx_start(P_patch) and idx_end(P_patch) there exists a node with l = if, then F2 = 1. 2. Change of an if condition: if among the nodes of P_patch within idx_start(P_patch) and idx_end(P_patch) there is no node with l = if, but among the tree nodes p ∈ P_patch associated with them there exists a node with l = if, then F2 = 2. 3. Change of a loop condition: if among the nodes of P_patch within idx_start(P_patch) and idx_end(P_patch) there is no node with l = loop, but among the tree nodes p ∈ P_patch associated with them there exists a node with l = loop, then F2 = 3. 4. Otherwise F2 = 4.

F3 (location of the patch modification) | The erroneous tree nodes are located by determining idx_start(P_bug) and idx_end(P_bug) of the nodes of P_bug for which b = true. The location of the patch modification is then determined by the following rule from idx_start(P_patch), idx_end(P_patch) and idx_start(P_bug), idx_end(P_bug): 1. If idx_start(P_bug) >= idx_start(P_patch) and idx_end(P_bug) >= idx_end(P_patch), then F3 = 0. 2. If idx_start(P_bug) < idx_start(P_patch) and idx_end(P_bug) >= idx_end(P_patch), then F3 = 1. 3. Otherwise F3 = 2.

F4 (if operator is present at the error location) | If among the tree nodes P_bug there is a node with b = true and l = if, then F4 = 1, else F4 = 0.

F5 (loop operator is present at the error location) | If among the tree nodes P_bug there is a node with b = true and l = loop, then F5 = 1, else F5 = 0.

F6 (/, *, +, - operators are present at the error location) | If among the tree nodes P_bug there is a node with b = true and l ∈ {/, *, +, -}, then F6 = 1, else F6 = 0.

F7 (call operator is present at the error location) | If among the tree nodes P_bug there is a node with b = true and l = =>, then F7 = 1, else F7 = 0.

F8 (variable is present at the if operator in the patch) | Determine the tree nodes P_patch within idx_start(P_patch) and idx_end(P_patch) for which l = if. If among the determined nodes there is a node with l = v, then F8 = 1, else F8 = 0.

F9 (variable is present at the loop operator in the patch) | Determine the tree nodes P_patch within idx_start(P_patch) and idx_end(P_patch) for which l = loop. If among the determined nodes there is a node with l = v, then F9 = 1, else F9 = 0.

F10 (variable is present at the /, *, +, - operators in the patch) | Determine the tree nodes P_patch within idx_start(P_patch) and idx_end(P_patch) for which l ∈ {/, *, +, -}. If among the determined nodes there is a node with l = v, then F10 = 1, else F10 = 0.

F11 (variable is present at the call operator in the patch) | Determine the tree nodes P_patch within idx_start(P_patch) and idx_end(P_patch) for which l = =>. If among the determined nodes there is a node with l = v, then F11 = 1, else F11 = 0.

F12 (variable is present at the if operator at the error location) | Determine the tree nodes P_bug for which b = true and l = if. If among the determined nodes there is a node with l = v, then F12 = 1, else F12 = 0.

F13 (variable is present at the loop operator at the error location) | Determine the tree nodes P_bug for which b = true and l = loop. If among the determined nodes there is a node with l = v, then F13 = 1, else F13 = 0.

F14 (variable is present at the /, *, +, - operators at the error location) | Determine the tree nodes P_bug for which b = true and l ∈ {/, *, +, -}. If among the determined nodes there is a node with l = v, then F14 = 1, else F14 = 0.

F15 (variable is present at the call operator at the error location) | Determine the tree nodes P_bug for which b = true and l = =>. If among the determined nodes there is a node with l = v, then F15 = 1, else F15 = 0.
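To make the feature definitions above concrete, here is a small sketch, under the same assumptions as the earlier sketches (flattened node lists with lexeme, node_type and is_error fields), of how a few of the features F4, F6 and F8 might be computed; the helper names are illustrative and are not the authors' code.

```python
# Illustrative sketch of computing a few of the features from Table 1.
# Assumes flattened node lists with .lexeme, .node_type, .is_error attributes
# (see the earlier Node sketch); not the authors' actual implementation.
ARITH_OPS = {"/", "*", "+", "-"}

def f4_if_at_error(p_bug):
    """F4: an IF operator is present among the erroneous nodes."""
    return int(any(n.is_error and n.lexeme.upper() == "IF" for n in p_bug))

def f6_arith_at_error(p_bug):
    """F6: a /, *, +, - operator is present among the erroneous nodes."""
    return int(any(n.is_error and n.lexeme in ARITH_OPS for n in p_bug))

def f8_var_at_patch_if(p_patch, idx_start, idx_end, variables):
    """F8: a variable occurs in an IF statement inside the patched region."""
    region = p_patch[idx_start:idx_end + 1]
    if not any(n.lexeme.upper() == "IF" for n in region):
        return 0
    return int(any(n.lexeme in variables for n in region
                   if n.node_type == "variable"))
```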

The training is performed for the logistic regression model:

prediction = 1 / (1 + e^(−θ × F)).

The main idea of training the logistic regression model is to determine the coefficients θ for the features F of successful patches (see the section Defining patch features), which can then be used to predict the success of any generated patch for an ABAP program from its features. The coefficients θ are determined using the gradient descent method [26], according to which the following updates are performed simultaneously:

θ_0 := θ_0 − α · (1/m) · Σ_{i=1..m} (prediction_i − y_i),

θ_1 := θ_1 − α · [ (1/m) · Σ_{i=1..m} (prediction_i − y_i) · F_{i,1} + (λ/m) · θ_1 ],

...

θ_15 := θ_15 − α · [ (1/m) · Σ_{i=1..m} (prediction_i − y_i) · F_{i,15} + (λ/m) · θ_15 ],


where y is the result of applying P_patch to P_bug (set manually: 0 for an unsuccessful patch, 1 for a successful patch); α is the learning rate (set manually; it regulates the accuracy and speed of determining θ); λ is the regularization coefficient (set manually; it reduces the likelihood of model overfitting).

While the coefficients θ are being computed, the cost function J is also calculated; it should decrease towards zero at each iteration and reflects the progress and correctness of the gradient descent method:

J = (1/m) · Σ_{i=1..m} [ −y_i · log(prediction_i) − (1 − y_i) · log(1 − prediction_i) ] + (λ/(2m)) · Σ_{j=1..15} θ_j².
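For readers who prefer code, the following is a compact sketch of the regularized logistic regression training described above, written with NumPy under the assumption that the m × 15 feature matrix F and the 0/1 label vector y are already assembled; it is an illustration of the formulas, not the authors' implementation.

```python
# Sketch of the training loop implied by the formulas above (assumption: the
# m x 15 feature matrix F and the 0/1 label vector y are already assembled).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(F, y, alpha=0.1, lam=1.0, iterations=5000):
    m, n = F.shape                       # m examples, n = 15 features
    X = np.hstack([np.ones((m, 1)), F])  # prepend a column of 1s for theta_0
    theta = np.zeros(n + 1)
    J = 0.0
    for _ in range(iterations):
        prediction = sigmoid(X @ theta)
        grad = (X.T @ (prediction - y)) / m
        grad[1:] += (lam / m) * theta[1:]    # theta_0 is not regularized
        theta -= alpha * grad
        # cost J, useful for monitoring convergence
        eps = 1e-12
        J = (-(y * np.log(prediction + eps)
               + (1 - y) * np.log(1 - prediction + eps)).mean()
             + lam / (2 * m) * np.sum(theta[1:] ** 2))
    return theta, J
```

A typical use would be theta, cost = train_logreg(F, y); the ranking step then scores a candidate with features f as sigmoid(theta[0] + theta[1:] @ f).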

1  Input: P_fix
2  V_fix = defVariable(P_fix)
3  for v_fix1 in V_fix do {
4      for v_fix2 in V_fix do {
5          CND_fix(cnd_fix) = v_fix1 > v_fix2
6          CND_fix(cnd_fix) = v_fix1 < v_fix2
7          CND_fix(cnd_fix) = v_fix1 = v_fix2
8          CND_fix(cnd_fix) = v_fix1 <> v_fix2
9          CND_fix(cnd_fix) = v_fix1 is initial
10         CND_fix(cnd_fix) = v_fix1 is not initial
11     }
12 }
13 for cnd_fix in CND_fix do {
14     GeneratePatchAddIf(cnd_fix)
15     GeneratePatchEditIf(cnd_fix)
16     GeneratePatchEditCycle(cnd_fix)
17 }

Listing 2. Algorithm for generating candidate patches based on templates

Generating candidate patches based on templates

The generation of candidate patches from templates is performed in block 5 of the method diagram in Fig. 1. Candidate patches are built according to predefined templates from the objects of the source code of the ABAP program with an error. The algorithm is presented in Listing 2.

Line 2 defines the array of variables V_fix from the tree nodes P_fix, which were obtained by forming the AST (see the section Generating AST from source code) from the text of the program for which the patch must be generated automatically. The array of variables V_fix is determined from the lexemes l of the tree nodes p_fix ∈ P_fix for which b = true and t_n = Variable. In lines 3-12, the array of conditions CND_fix is generated as the Cartesian product of the array of variables V_fix and the set of comparison operators (>, <, =, <>, is initial, is not initial). In line 14, candidate patches P_fixpatch are generated by adding a check (an if statement) with a condition cnd_fix from the array of conditions CND_fix before the error location. In line 15, candidate patches P_fixpatch are generated by replacing the condition of the existing check statement (if) at the error location with cnd_fix. In line 16, candidate patches P_fixpatch are generated by changing the condition of the loop operator at the error location to cnd_fix.

Further, the features of the generated candidate patches P_fixpatch are defined (see the section Defining patch features) and the probability of successful application is determined (see the section Ranking generated patches based on features and the trained logistic regression model).
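A small sketch of this template-based generation, continuing the same illustrative assumptions (source code as a list of lines, hypothetical helper names), might look as follows; it mirrors the three templates of Listing 2 but is not the authors' code.

```python
# Sketch of template-based candidate generation (cf. Listing 2), under the same
# illustrative assumptions as the earlier sketches; not the authors' code.
from itertools import product

COMPARISONS = [" > ", " < ", " = ", " <> "]
UNARY = [" IS INITIAL", " IS NOT INITIAL"]

def generate_conditions(variables):
    """Cartesian product of the variables with the comparison operators,
    plus the unary IS INITIAL / IS NOT INITIAL checks."""
    conds = [f"{a}{op}{b}"
             for a, b in product(variables, repeat=2)
             for op in COMPARISONS]
    conds += [f"{v}{u}" for v in variables for u in UNARY]
    return conds

def generate_candidates(source_lines, error_line, conditions):
    """Three templates: add an IF guard before the error location, replace an
    IF condition at the error location, replace a loop (WHILE) condition."""
    candidates = []
    for cnd in conditions:
        guarded = (source_lines[:error_line]
                   + [f"IF {cnd}.", source_lines[error_line], "ENDIF."]
                   + source_lines[error_line + 1:])
        candidates.append(guarded)                     # GeneratePatchAddIf
        stmt = source_lines[error_line].strip().upper()
        if stmt.startswith("IF "):
            edited = list(source_lines)
            edited[error_line] = f"IF {cnd}."
            candidates.append(edited)                  # GeneratePatchEditIf
        if stmt.startswith("WHILE "):
            edited = list(source_lines)
            edited[error_line] = f"WHILE {cnd}."
            candidates.append(edited)                  # GeneratePatchEditCycle
    return candidates
```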

Ranking generated patches based on features and the trained logistic regression model

The ranking of the generated patches based on their features and the trained logistic regression model is performed in block 7 in Fig. 1. The success probability prediction_fixpatch is determined for each generated candidate patch P_fixpatch (see the section Generating candidate patches based on templates) by applying the trained logistic regression model:

prediction_fixpatch = 1 / (1 + e^(−θ × F_fixpatch)),

where θ are the coefficients obtained in the process of training the logistic regression model (see the section Model training), and F_fixpatch are the features obtained for the candidate patch P_fixpatch (see the section Defining patch features).

Further, the P_fixpatch with the maximum value of prediction_fixpatch is selected, which means that the candidate patches with the highest probability of successful application, judged from the analysis of existing patches, are chosen.
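Putting the pieces together, the ranking step can be expressed in a few lines of the same illustrative Python, reusing the theta produced by the hypothetical train_logreg sketch above; rank_candidates and feature_fn are assumed names, not the authors' API.

```python
# Sketch of the ranking step: score every candidate with the trained model and
# sort by predicted success probability (illustrative only).
import numpy as np

def rank_candidates(candidates, feature_fn, theta):
    """candidates: list of candidate patches; feature_fn: maps a candidate to
    its 15 features; theta: coefficients from the training sketch above."""
    scored = []
    for cand in candidates:
        f = np.asarray(feature_fn(cand), dtype=float)    # 15 features F_fixpatch
        z = theta[0] + theta[1:] @ f
        scored.append((1.0 / (1.0 + np.exp(-z)), cand))  # prediction_fixpatch
    return sorted(scored, key=lambda t: t[0], reverse=True)
```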

Evaluation

The method was tested on 10 ABAP projects containing errors. Some of the examples were prepared by the authors to cover the required types of errors for evaluating the method, while the rest are real projects. The test results are shown in Table 2.

Table 2

Results of testing the method

Name of the source code example | Type of error | Lines of source code | Candidate patches generated | Execution time, sec | Patch successfully generated

ABAPException.abap¹ | Division by 0 | 34 | 300 | 66 | Yes
mycalculator.abap² | Division by 0 | 25 | 100 | 14 | Yes
SubRoutines.abap³ | Division by 0 | 59 | 1200 | 836 | Yes
AbapRep_usingclassHana.abap⁴ | Calling a function using an empty pointer | 27 | 800 | 371 | No
zma_dp_strategy.prog.abap⁵ | Calling a function using an empty pointer | 33 | 700 | 285 | Yes
zcl_pi_static.clas.abap⁶ | Calling a function using an empty pointer | 46 | 100 | 12 | Yes
TestCodeWithIfBug.abap⁷ | Error in the if operator | 17 | 200 | 37 | No
TestCodeWithIfBug2.abap⁸ | Error in the if operator | 11 | 50 | 9 | No
TestCodeWithCycleBug.abap⁹ | Error in the loop operator | 13 | 20 | 8 | Yes
TestCodeWithCycleBug2.abap¹⁰ | Error in the loop operator | 13 | 50 | 12 | No
Total | | 276 | 3520 | 1650 | 6/10

¹ https://github.com/naveenkumarbaskaran/SAP_ABAP19Jan/blob/efc47953337bb8fbaeee506ee9a3c701bfa4f498/ABAPException.abap

² https://github.com/naveenkumarbaskaran/SAP_ABAP19Jan/blob/master/mycalculator.abap

³ https://github.com/naveenkumarbaskaran/SAP_ABAP19Jan/blob/master/SubRoutines.abap

⁴ https://github.com/naveenkumarbaskaran/SAP_ABAP19Jan/blob/master/AbapRep_usingclassHana.abap

⁵ https://github.com/Huargh/OO-Design-Patterns-in-ABAP/blob/master/src/zma_dp_strategy.prog.abap

⁶ https://github.com/ivangurin/abapPI/blob/5f30db0cc7a408a759ad833fe14f6e803b1b46bf/src/zcl_pi_static.clas.abap

⁷ https://github.com/AlekseiBelskii/AlexB/blob/master/TestCodeWithIfBug.abap

⁸ https://github.com/AlekseiBelskii/AlexB/blob/master/TestCodeWithIfBug2.abap

⁹ https://github.com/AlekseiBelskii/AlexB/blob/master/TestCodeWithCycleBug.abap

¹⁰ https://github.com/AlekseiBelskii/AlexB/blob/master/TestCodeWithCycleBug2.abap

The first column shows the project name with an error and a reference to the GitHub source code repository. The second column shows the type of error for which patches were generated. The third column shows the number of lines of source code with an error. The fourth column contains the number of candidate error-correction patches generated for each project; the number of candidates was chosen to be sufficient to obtain the expected result. The fifth column shows the time it took to generate the candidate patches for each project with an error. The last column shows whether patches were successfully generated for each project. A patch is considered successfully generated if the desired patch is found among the generated candidates with the highest probability of success prediction_fixpatch.

The method was tested on a machine with the following characteristics: Intel Core i3-7100U 2.40 GHz, 4.00 GB RAM, Windows 10. As a result of the experiments, 6 patches were successfully found for 10 erroneous programs in 1650 seconds, which demonstrates the feasibility of using machine learning methods for automatic patch generation. At the same time, the obtained accuracy and speed indicate the need for additional tests, better training of the logistic regression model, a more powerful test machine, and other improvements to the method. These improvements are expected to be developed and implemented in future work.

Conclusion

During the research, a method was developed to automatically generate bug fixes for ABAP programs based on the analysis of existing patches; it generates candidate patches for ABAP programs and ranks the results using machine learning methods. The preliminary test results suggest that using machine learning methods to solve problems of automatic error correction in programs is a promising direction for software engineering. Directions for further development of the work:

♦ conducting deeper testing of the method on a wider set of real projects;

♦ extending the method to support new programming languages;

♦ extending the set of the extracted features and the list of error types to fix;

♦ using more complex machine learning models to improve the performance of the method.

REFERENCES

1. SAP SE. ABAP Keyword Documentation, 2019. Available: https://help.sap.com/doc/abapdocu_latest_index_htm/latest/en-US/index.htm

2. Le Goues C., et al. Genprog: A generic method for automatic software repair. IEEE Transactions on Software Engineering, 2012, Vol. 38, No. 1, P. 54.

3. Tan S.H., Roychoudhury A. relifix: Automated repair of software regressions. Proceedings of the 37th International Conference on Software Engineering. IEEE Press, 2015, Vol. 1, Pp. 471—482.

4. Martinez M., Monperrus M. Astor: A program repair library for java. Proceedings of the 25th International Symposium on Software Testing and Analysis. ACM, 2016, Pp. 441—444.

5. Le X.B.D., Lo D., Le Goues C. History driven automated program repair. 2016.

6. Forrest S. Genetic algorithms: Principles of natural selection applied to computation. Science, 1993, Vol. 261, Pp. 872-878.


7. Nguyen H.D.T., et al. Semfix: Program repair via semantic analysis. Proceedings of the 35th International Conference on Software Engineering (ICSE). IEEE, 2013, Pp. 772—781.

8. Le X.B.D., et al. JFIX: semantics-based repair of Java programs via symbolic PathFinder. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM, 2017, Pp. 376-379.

9. Wang Y., et al. CRSearcher: Searching Code Database for Repairing Bugs. Proceedings of the 9th Asia-Pacific Symposium on Internetware, ACM, 2017, P. 16.

10. D'Antoni L., Samanta R., Singh R. Qlose: Program repair with quantitative objectives. International Conference on Computer Aided Verification. Springer, Cham, 2016, Pp. 383—401.

11. Mechtaev S., et al. Semantic Program Repair Using a Reference Implementation. Proceedings of ICSE, 2018.

12. van Tonder R., Le Goues C. Static Automated Program Repair for Heap Properties, 2018.

13. Hill A., Păsăreanu C.S., Stolee K.T. Automated program repair with canonical constraints. Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings. ACM, 2018, Pp. 339-341.

14. Cadar C., et al. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. OSDI, 2008, Vol. 8, Pp. 209-224.

15. De Moura L., Bjørner N. Z3: An efficient SMT solver. International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, Berlin, Heidelberg, 2008, Pp. 337-340.

16. Liu C., et al. R2Fix: Automatically generating bug fixes from bug reports. Proceedings of the 6th International Conference on Software Testing, Verification and Validation, IEEE, 2013, Pp. 282-291.

17. Long F., Rinard M. Automatic patch generation by learning correct code. ACM SIGPLAN Notices, 2016, Vol. 51, No. 1, Pp. 298-312.

18. Saha R.K., et al. Elixir: Effective object-oriented program repair. 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2017, Pp. 648-659.

19. Gopinath D., et al. Data-guided repair of selection statements. Proceedings of the 36th International Conference on Software Engineering. ACM, 2014, Pp. 243-253.

20. Dietterich T.G. Machine learning. Encyclopedia of Computer Science. John Wiley and Sons Ltd., GBR, 2003, Pp. 1056-1059.

21. Witten I.H., et al. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016.

22. Cui B., et al. Code comparison system based on abstract syntax tree. Proceedings of the 3rd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT). IEEE, 2010, Pp. 668-673.

23. Davis M.S. An object oriented approach to constructing recursive descent parsers. SIGPLAN Notices, 2000, Vol. 35, No. 2, Pp. 29-35. DOI: 10.1145/345105.345113

24. Kleinbaum D.G., et al. Logistic regression. New York: Springer-Verlag, 2002.

25. Kalantar B., et al. Assessment of the effects of training data selection on the landslide susceptibility mapping: A comparison between support vector machine (SVM), logistic regression (LR) and artificial neural networks (ANN). Geomatics, Natural Hazards and Risk, 2018, Vol. 9, No. 1, Pp. 49-69.


26. Ruder S. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747. 2016.

Received 10.03.2020.


THE AUTHORS

Belskii Aleksei

E-mail: belskii.alexey@gmail.com

Itsykson Vladimir M.

E-mail: vlad@icc.spbstu.ru

© Peter the Great St. Petersburg Polytechnic University, 2020
