Development of methods, models, and means for a system of author attribution of a text

Scientific article in the specialty "Computer and information sciences"

CC BY
Keywords
MEAN FREQUENCIES OF GROUPS OF CONSONANT PHONEMES / STYLE / SUBSTYLE / AUTHOR DIFFERENTIATION OF TEXTS / SOFTWARE SYSTEM / METHOD / PHONEME / PHONOLOGICAL LEVEL

Abstract of a scientific article in computer and information sciences; authors: Khomytska I., Teslyuk V., Holovatyy A., Morushko O.



Development of methods, models, and means for the author attribution of a text

The accuracy of author attribution of a text is not high enough at the lexical and syntactic levels of a language, as these levels are not strictly organized systems. In this study, author attribution of a text is based on the differentiation of phonostatistical structures of styles. We have developed a system for differentiating phonostatistical structures of styles, which differs from existing ones in the chosen level of a language: the phonological level. At this level, results can be obtained with greater accuracy. In addition, the system is built on a modular principle, which makes it possible to modify the developed software rapidly. We have developed methods and models that are based on the theory of mathematical statistics and improve the accuracy of differentiation of phonostatistical structures of styles. A method was devised for a comprehensive analysis of phonostatistical structures of styles, as well as a multifactor method for determining the degrees of action of factors related to style, substyle, and the author's manner of presentation. We have constructed a statistical model of stylistic differentiation using the ranking method, and a statistical model for determining the general stylistic markedness of the examined text. A software system for the differentiation of texts was designed. The criterion for the differentiation of texts is the mean frequencies of groups of consonant phonemes. The system was implemented in the Java programming language, which ensures that the software is platform-independent. This study reports the results of applying the developed methods, models, and software tools. The research results confirm that author attribution of a text at the phonological level is more effective.
The developed methods, models, and means for the author attribution of a text could be used to determine the percentage of creative contribution of each co-author of a scientific paper.

Text of the scientific paper on the topic "Development of methods, models, and means for a system of author attribution of a text"



UDC 519.711; 681.5; 621.382

DOI: 10.15587/1729-4061.2018.1320521

DEVELOPMENT OF METHODS, MODELS, AND MEANS FOR THE AUTHOR ATTRIBUTION OF A TEXT

I. Khomytska
Assistant
Department of Applied Linguistics*

V. Teslyuk
Doctor of Technical Sciences, Professor
Department of Automated Control Systems*
E-mail: [email protected]

A. Holovatyy
PhD, Associate Professor
Department of Information Technologies
Ukrainian National Forestry University
Henerala Chuprynky str., 103, Lviv, Ukraine, 79057

O. Morushko
PhD, Associate Professor
Department of Social Communication and Information Activity*

*Lviv Polytechnic National University
Bandery str., 12, Lviv, Ukraine, 79013

1. Introduction

Modern information technologies (IT) are widely used in various fields of science and technology. One such area is applied linguistics [1, 2], where IT has been applied to author attribution using content analysis [3], to the attribution of texts in legal proceedings [4, 5], and to the linguistic analysis of commercial text content [6]. IT is employed in the semantic analysis of Ukrainian texts [7] and in scientific research related to distance learning programs [8]. A system of algorithmic algebra has been used in grammatical analysis, for the categorization of text documents, and in determining the author's style [9-11].

Under conditions of globalization, the task of identifying the authorship of texts is of particular importance. Our analysis of the subject area reveals that in most cases the differentiation of styles in the process of establishing the author of a text, as well as checking a text for plagiarism, involved methods, models, and software tools at the lexical level of a language. However, the phonological level differs from other levels of the language by a stricter structure and ordering of its elements, so it is easier to formalize and mathematize. Therefore, it is advisable to apply methods, models, and software tools at the phonological level of a language in order to identify the author of a text and to check the text for plagiarism. Accordingly, the development of methods, models, and tools that would enable IT-based differentiation of phonostatistic structures of functional styles in the English language is an important and relevant task.

2. Literature review and problem statement

The task of identifying the authorship of a text implies the differentiation of texts. Texts are differentiated at different levels of a language in order to identify their differences and similarities. Thus, the differentiation of texts at the lexical level was performed when modeling grammatical structures [12]. However, the lexical level is an open system: the number of elements is not constant. The system is updated with new words (neologisms), while rarely used words become archaic. An author's style reflects these changeable processes in the lexical system. Therefore, the identification of authorship at the lexical level is probabilistic in character. It is worth noting that grammatical structures are abstract, idealized models and do not fully reflect the speech process. This makes it difficult to define the differential attributes of the author's style. Modeling of semantic structures was used for text differentiation [13]. Semantic structures are abstract constructs whose implementation depends on the context. That is why a focus on semantics predetermines a probabilistic character of author attribution. Texts are differentiated at the lexical and semantic levels when splitting a sentence into key words [14]. Determining the dominant lexical units was used when distinguishing texts in the areas of culture and tourism [15, 16]. Determining the dominant key words does not cover the vocabulary characteristic of a particular author and is therefore not promising for identifying the author's style. It should be noted that the results of text differentiation at the lexical and semantic levels of a language are more probabilistic in character than those at the phonological level. In contrast to the phonological level, the number of elements at these levels is not constant, which compromises the accuracy of calculations.

In addition, no combination of the most effective quantitative methods has been determined for differentiating texts at each level of a language [17]. When the differential attributes of the author's style were established using statistical methods, the scheme style → substyle → author, which facilitates determining statistical parameters for the author's manner of presentation in texts on different subjects, was not applied [18]. Information technologies were not employed for author attribution at the phonological level, which does not provide the proper level of accuracy [19, 20]. Software systems do not implement a combination of statistical methods that would make author attribution efficient [21]. Our analysis of the scientific literature revealed that the task of improving the accuracy of text differentiation remains unsolved. To solve the problem, it is necessary to carry out author attribution at the phonological level, to apply the most efficient combination of statistical methods in order to obtain probable results, and to determine the degree of action of factors related to style, substyle, and the author's manner of presentation.

3. The aim and objectives of the study

The aim of the present study is to improve the accuracy of differentiation of phonostatistic structures of styles in the English language based on the developed methods, models, and software tools for author, substyle, and style attribution of texts.

To accomplish the aim, the following tasks have been set:

- to develop a mathematical basis for the system of differentiation of phonostatistic structures of functional styles in the English language using the theory of mathematical statistics, which would make it possible to improve the accuracy of output results;

- to construct models for the differentiation of phonostatistic structures of styles of the English language;

- to devise a structure of the system and the software that would be based on a modular principle, which would make it possible to rapidly modify the developed IT tools and to ensure that the software system is platform-independent.

4. Development of the system's mathematical basis

The core of any software system is a mathematical basis that includes the developed methods. The constructed mathematical basis for the differentiation of phonostatistic structures of styles in the English language includes the following.

1. A method of comprehensive analysis for the differentiation of phonostatistic structures of styles [22, 23] is based on the proposed combination of such statistical methods as: a method of hypotheses, a method of ranking, and a method for determining the distances between styles. The algorithm of the constructed method of hypotheses includes the following steps:

Step 1. Check whether the frequencies of consonant phonemes conform to the normal distribution law using the Pearson criterion and the simplified Romanovsky criterion.

Step 2. Differentiate the texts using Student's criterion.

Step 3. Determine the groups of consonant phonemes for which substantial differences were established in the pairwise comparison of texts.
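Step 2 of the method of hypotheses can be illustrated with a minimal Java sketch. The source does not state which form of Student's criterion is used, so the pooled-variance two-sample statistic is assumed here; the class and method names (`StudentTSketch`, `tStatistic`) are hypothetical and are not taken from the described system.

```java
/** Illustrative sketch only: two-sample Student's t statistic with pooled variance (an assumption). */
final class StudentTSketch {

    /** Arithmetic mean of the sample. */
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    /**
     * t statistic for two samples, e.g. the frequencies of one phoneme group
     * in the 1,000-phoneme parts of two texts.
     */
    static double tStatistic(double[] a, double[] b) {
        int n1 = a.length;
        int n2 = b.length;
        double pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooled * (1.0 / n1 + 1.0 / n2));
    }
}
```

The resulting value would then be compared with a tabulated critical value for the chosen significance level and n1 + n2 - 2 degrees of freedom; that lookup is left to the caller in this sketch.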

An algorithm of the ranking method includes the following steps:

Step 1. Determine the mean frequency of groups of consonant phonemes.

Step 2. Construct a descending series of mean frequencies for each group of phonemes.

Step 3. Determine significant differences between the pairwise compared texts based on the difference in ranking.
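The ranking steps can be sketched in Java as follows. This is only one plausible reading: the source does not specify how the series is pooled, so the sketch assumes that the compared texts are ranked within a single phoneme group by descending mean frequency and that the difference is the absolute difference of ranks. The names `RankingSketch`, `descendingRanks`, and `rankDifference` are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only: descending ranking of mean frequencies and rank differences. */
final class RankingSketch {

    /** Returns 1-based ranks; rank 1 goes to the highest mean frequency. */
    static int[] descendingRanks(double[] meanFrequencies) {
        Integer[] order = new Integer[meanFrequencies.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort the indices so that larger mean frequencies come first.
        Arrays.sort(order, (x, y) -> Double.compare(meanFrequencies[y], meanFrequencies[x]));
        int[] ranks = new int[order.length];
        for (int pos = 0; pos < order.length; pos++) ranks[order[pos]] = pos + 1;
        return ranks;
    }

    /** Absolute difference between the ranks of two compared texts. */
    static int rankDifference(int[] ranks, int textA, int textB) {
        return Math.abs(ranks[textA] - ranks[textB]);
    }
}
```

Which rank difference counts as large, medium, or minor is not stated here, so no thresholds are built into the sketch.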

An algorithm of the method for determining the distances between styles is implemented by the following steps:

Step 1. Differentiate the pairwise compared texts based on Student's criterion.

Step 2. Derive from the formula for Student's criterion a formula for determining the distance l between styles [24, 25].

Step 3. Determine a large, medium, and insignificant distance between styles.

The method considered makes it possible to differentiate with greater accuracy the styles, substyles, and texts by different authors.

2. A multi-factor method for determining the degrees of action of the factors related to style, substyle, and the author's manner of presentation is based on the developed scheme style → substyle → author and serves to identify the authorship of texts of the same style and the same substyle but by different authors. The algorithm of the method includes the following basic steps:

Step 1. Determine substantial differences in the pairwise comparison of texts based on Student's criterion: different styles, different substyles, different authors.

Step 2. Determine a significant, medium, or insignificant degree of action of the factors related to style, substyle, and the author's manner of presentation.

The method makes it possible to establish with a higher accuracy the affiliation of the text under study to a specific style, substyle, and to identify its author.

5. Development of models for the differentiation of phonostatistic structures of styles

Based on the developed methods, we have built statistical models for the style, substyle, and author's differentiation of texts by the ranking method. An algorithm of the specified models includes the following steps.

Step 1. Determine the mean frequency of groups of consonant phonemes for texts of different styles, different substyles, and by different authors; determine the highest and lowest values of the mean frequency; determine large, medium, and minor differences based on the proposed formula

$$r_{\bar{x}_1 - \bar{x}_2} = r_{\max \bar{x}_1} - r_{\min \bar{x}_2}.$$

The models developed make it possible to take into consideration, with a greater accuracy, the position of a phoneme in a word, to perform the style, substyle, and the author's attribution of texts based on the ranking difference.

We have developed a statistical model for determining a general stylistic markedness of the examined text. An algorithm for constructing the model includes the following steps:

Step 1. Determine essential differences, based on the Student's criterion, in the compared texts: different styles, different substyles, by different authors, in various subjects.

Step 2. Determine the mean of the three t-values obtained with Student's criterion:

$$\frac{t_{f_1} + t_{f_2} + t_{f_3}}{3}$$

Step 3. Determine a large, medium, and insignificant stylistic markedness of the examined text.
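Steps 2 and 3 can be sketched as below. The three inputs are taken to be the t-values from the style, substyle, and author comparisons. The source gives no numeric boundaries for large, medium, and insignificant markedness, so the thresholds are caller-supplied parameters and purely hypothetical, as are the names `MarkednessSketch`, `meanT`, and `classify`.

```java
/** Illustrative sketch only: mean of three t-values and a threshold-based label. */
final class MarkednessSketch {

    enum Level { LARGE, MEDIUM, INSIGNIFICANT }

    /** Arithmetic mean of the t-values for style, substyle, and author differentiation. */
    static double meanT(double tStyle, double tSubstyle, double tAuthor) {
        return (tStyle + tSubstyle + tAuthor) / 3.0;
    }

    /**
     * Labels the general stylistic markedness. Both thresholds are assumptions
     * supplied by the caller; they are not values from the paper.
     */
    static Level classify(double meanT, double mediumThreshold, double largeThreshold) {
        if (meanT >= largeThreshold) return Level.LARGE;
        if (meanT >= mediumThreshold) return Level.MEDIUM;
        return Level.INSIGNIFICANT;
    }
}
```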

The developed model is a combination of three models presented in papers [26, 27]. The model should be applied when texts belong to the same style and substyle but are by different authors and address different topics. The model makes it possible to identify the author of texts on various subjects with a higher accuracy. Therefore, the developed methods, models, and algorithms make it possible to improve the accuracy of differentiation of the phonostatistic structures of styles.

6. Development of the structure and software of the system

The methods and models developed have been implemented in the Java programming language, in the system of differentiation of phonostatistic structures of styles in the English language.

The structure of the developed software is shown in Fig. 1. It is based on a modular principle, which allows each module to be customized and maintained individually and ensures high reliability of the system [28]; the software is also easy to upgrade.

The algorithm of English-language style differentiation based on the mean frequencies of groups of consonant phonemes, which is implemented in the system, consists of the following basic steps:

1. Computer processing of the examined text:

1.1. Upload an English-language text to the software.

1.2. Convert the text into a transcription variant.

1.3. Separate from the transcription characters those that denote consonant phonemes.

1.4. Compile a sample of 51,000 consonant phonemes.

1.5. Split the sample into 51 parts of 1,000 phonemes each.

1.6. Calculate the number of consonant phonemes for any position of the phoneme in a text.

1.7. Calculate the mean frequency of each consonant phoneme per 1,000-phoneme part and over the whole sample of 51,000 phonemes.

1.8. Combine consonant phonemes into groups (summing the mean frequencies of phonemes).

The result is the determined values of the mean frequencies of groups of consonant phonemes.
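Steps 1.5 to 1.8 can be sketched in Java as follows. The sketch assumes the consonant phonemes have already been extracted as a list of symbols; the membership of each phoneme group is not listed in the source, so any set passed in (for example a "labial" set) is an illustrative assumption, and the names `PhonemeSamplingSketch`, `groupCountsPerPart`, and `meanFrequency` are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Illustrative sketch only: per-part counts and mean frequency of one phoneme group. */
final class PhonemeSamplingSketch {

    /**
     * Splits the phoneme sequence into consecutive parts of partSize symbols
     * and counts how many phonemes of each part belong to the given group.
     * An incomplete trailing part is ignored.
     */
    static int[] groupCountsPerPart(List<String> phonemes, Set<String> group, int partSize) {
        int parts = phonemes.size() / partSize;
        int[] counts = new int[parts];
        for (int p = 0; p < parts; p++) {
            for (String phoneme : phonemes.subList(p * partSize, (p + 1) * partSize)) {
                if (group.contains(phoneme)) counts[p]++;
            }
        }
        return counts;
    }

    /** Mean number of group phonemes per part. */
    static double meanFrequency(int[] countsPerPart) {
        double sum = 0.0;
        for (int c : countsPerPart) sum += c;
        return sum / countsPerPart.length;
    }
}
```

With the sample described above, `partSize` would be 1,000 and the result of `groupCountsPerPart` would hold 51 counts per group, which is the input a normality check operates on.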

2. Check whether the mean frequencies of groups of consonant phonemes match the normal distribution by using the Pearson criterion:

2.1. Determine a theoretical normal distribution.

2.2. Calculate the theoretical frequency (the mathematical expectation that the magnitude of X is in the i-th interval).

2.3. Check the conformity to the normal distribution for eight groups of phonemes (51 parts for each).
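The Pearson check can be sketched in Java as below. The sketch computes theoretical (expected) interval counts under a normal distribution and the chi-square statistic. Java's standard library has no error function, so a well-known polynomial approximation (Abramowitz and Stegun, formula 7.1.26) is used; the choice of intervals, the treatment of the outer intervals as open-ended, and the names `PearsonSketch`, `expectedCounts`, and `chiSquare` are assumptions, not details from the source.

```java
/** Illustrative sketch only: Pearson chi-square statistic against a normal distribution. */
final class PearsonSketch {

    /** erf approximation (Abramowitz and Stegun 7.1.26, absolute error about 1.5e-7). */
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** Cumulative distribution function of a normal law with the given mean and sd. */
    static double normalCdf(double x, double mean, double sd) {
        return 0.5 * (1.0 + erf((x - mean) / (sd * Math.sqrt(2.0))));
    }

    /**
     * Theoretical counts for k intervals given k + 1 ascending edges.
     * The first and last intervals are treated as open-ended so that
     * the expected counts sum to n.
     */
    static double[] expectedCounts(double[] edges, double mean, double sd, int n) {
        int k = edges.length - 1;
        double[] expected = new double[k];
        for (int i = 0; i < k; i++) {
            double lo = (i == 0) ? 0.0 : normalCdf(edges[i], mean, sd);
            double hi = (i == k - 1) ? 1.0 : normalCdf(edges[i + 1], mean, sd);
            expected[i] = n * (hi - lo);
        }
        return expected;
    }

    /** Sum over intervals of (observed - expected)^2 / expected. */
    static double chiSquare(int[] observed, double[] expected) {
        double chi2 = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double d = observed[i] - expected[i];
            chi2 += d * d / expected[i];
        }
        return chi2;
    }
}
```

For one phoneme group, `n` would be 51 (the number of parts), and the statistic would be compared with a tabulated chi-square critical value; that comparison is left to the caller.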

Provided that the mean frequencies of groups of consonant phonemes comply with the normal distribution, computerized style differentiation is then performed based on the mean frequencies of the groups using Student's criterion.

The system's operating algorithm supports simultaneous work with two text files (Fig. 2). It includes opening two files, converting them into transcription, sampling the consonant phonemes, splitting the sample into portions, calculating the number of phonemes in each portion and in the sample, merging phonemes into groups, and subsequent verification by the Pearson criterion. This is done so that, provided the mean frequencies of groups of consonant phonemes comply with the normal distribution, the texts can be compared for phonetic differences.

Fig. 1. Structure of software system for the differentiation of phonostatistic structures of functional styles of the English language

In the process of software development, we constructed the following basic classes: Main, Window, PanelFile, ExtFileFilter, PanelTranscription, DistributionOfPortion, DitributionOfGroup, CriterionPearson, CriterionStudent. The developed class structure enables choosing a text file, checking whether a given file has the .txt extension, and converting the text into a transcription variant. Input samples are checked by the system for conformity with the normal distribution law and are differentiated based on the mean frequencies of groups of consonant phonemes.
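The .txt extension check mentioned above might look like the following minimal Java sketch. The internals of the ExtFileFilter class are not given in the source, so `TxtFilterSketch` and `hasTxtExtension` are hypothetical names, and the case-insensitive comparison is an assumption.

```java
import java.io.File;
import java.util.Locale;

/** Illustrative sketch only: accepts files whose name ends with ".txt" (case-insensitive). */
final class TxtFilterSketch {

    static boolean hasTxtExtension(File file) {
        // Only the file name is inspected; the file does not need to exist.
        return file.getName().toLowerCase(Locale.ROOT).endsWith(".txt");
    }
}
```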

Using the Java programming language ensures that the developed software is platform-independent.

7. Discussion of results of testing a system for the differentiation of phonostatistic structures of styles

As the material for the study, we chose texts written in the literary, conversational, newspaper, and scientific styles. Fig. 2 shows an example of the interface for adding new words to the Word.txt and Transcription.txt files.

We tested the system on texts written by different authors in the scientific style. In the "Pearson Criterion" tab, we verified the conformity of the texts to the law of normal distribution. It was established that the groups of labial, front-alveolar, mid-alveolar, post-alveolar, nasal, sonorous, slit, and closed phonemes comply with the law of normal distribution. By differentiating the phonostatistic structures of scientific-style texts by different authors using Student's criterion, we established significant differences for the groups of labial, front-alveolar, post-alveolar, nasal, slit, and closed phonemes. Random differences were found for the groups of mid-alveolar and sonorous phonemes. Thus, we have established phonostatistic parameters for the differentiation of texts by different authors.

Based on the research results obtained for the scientific, fiction, conversational, and newspaper styles, we determined significant essential differences for the group of slit phonemes by the ranking method (the difference in rank indicators is 6). Fig. 3 shows the statistical model of style differentiation for the scientific and conversational styles based on the ranking method for the group of slit phonemes for the case of an undefined position of the phoneme in a word.


Fig. 2. Example of the interface for adding new words to the Word.txt and Transcription.txt files

Fig. 3. Statistical model of style differentiation based on the ranking method: 6 - significant essential difference (6 units); 2 - insignificant similarity (2 units)

To identify the authorship of texts on various subjects but of one style and substyle, it is appropriate to apply a statistical model that combines three component statistical models: determining a style affiliation; determining a substyle affiliation; identifying the author of texts on various topics. This is the statistical model for determining the general stylistic markedness of the examined text (Fig. 4).

Fig. 4. Model: a - style differentiation for the case when a phoneme is at the beginning of a word when comparing texts of poems by Moore (PM), conversational (CS), newspaper (NS), and scientific styles (SS); b - substyle differentiation for the case when a phoneme is at the end of a word when comparing texts of poems by Byron and Moore, fiction by Byron (FB) and drama by Shaw (DSh); c - author's differentiation for the case of the undefined position of the phoneme in a word when comparing texts of poetry by Byron and Moore; the belles-lettres style (BS)

The research results based on 5 out of 553 experiments (described earlier, in particular, in [22, 23, 26, 27]) showed that the developed methods, models, and tools make it possible to improve the efficiency of author attribution of a text. The phonological level selected for the study is more strictly organized than the other levels of a language. However, the phonological system is probabilistic in character, with the probability of error equal to 5 %. The developed software system could be applied for identifying the authorship of a text in fiction, as well as in the legal, official, business, and scientific areas. Further study will address the development of a software system for the author attribution of a text for each of the groups of consonant phonemes in order to determine the group of phonemes for which author attribution would be most effective.

8. Conclusions

1. The effectiveness of the developed methods was tested in 553 experiments, the results of five of which are covered in this paper. Experiments were conducted for eight groups of consonant phonemes (labial, front-alveolar, mid-alveolar, post-alveolar, nasal, sonorous, slit, and closed) in texts related to fiction, as well as the conversational, newspaper, and scientific styles, for the three cases of the position of a phoneme in a word. The results of the experiments, described previously and examined in the present work, show that the developed method of comprehensive analysis of the differentiation of phonostatistic structures of styles, as well as the multi-factor method for determining the degree of action of factors related to style, substyle, and the author's manner of presentation, make it possible to improve the efficiency of author attribution and thereby to check a text for plagiarism. The efficiency of the method of comprehensive analysis is ensured by the proposed combination of statistical methods (hypotheses, ranking, determining distances between styles), among which the ranking method was applied for the first time to the task of author attribution of a text. Data were obtained from three methods of mathematical statistics with a probability of error of 5 %. The scheme style → substyle → author, which underlies the multi-factor method for determining the degree of action of factors related to style, substyle, and the author's manner of presentation, proved efficient.

2. Based on the developed methods, we built a statistical model of style differentiation using the ranking method and a statistical model for determining the general stylistic markedness of the examined text. The models improve the accuracy of differentiation of phonostatistic structures of styles during author attribution and when verifying a text for plagiarism.

3. We have developed the structure and tools for a system of differentiation of phonostatistic structures of styles, which differs from existing ones in the chosen level of a language: the phonological level. At this level, results can be obtained with greater accuracy. The constructed system is based on a modular principle, which makes it possible to modify the developed software rapidly and to identify a group of consonant phonemes that could be employed to perform author attribution of a text more effectively. The system was implemented in the Java programming language, which ensures that it is platform-independent and can operate on different computer platforms.

The research results obtained could be used for identifying the authorship of the examined text, as well as for verifying a text for plagiarism. Further research seems promising in terms of defining phonostatistic parameters, specifically, the style differentiating power of groups of consonant phonemes whose mean frequencies are the criterion for the differentiation of an author's style.

References

1. Kornai A. Mathematical Linguistics. Springer, 2008. doi: 10.1007/978-1-84628-986-6

2. Gries Th. S. Statistics for Linguistics with R. Mouton Textbook, 2009. 335 p. doi: 10.1515/9783110216042

3. Martindale C., McKenzie D. On the utility of content analysis in author attribution: The Federalist // Computers and the Humanities. 1995. Vol. 29, Issue 4. P. 259-270. doi: 10.1007/bf01830395

4. Gibbons J. Forensic Linguistics. An Introduction to Language in the Justice System. Wiley-Blackwell, 2003. 346 p.

5. Olsson J. Forensic Linguistics. Second edition: An Introduction to Language, Crime and the Law. Bloomsbury Academic, 2008. 288 p.

6. Berko A. Yu., Vysotska V. A., Chyrun L. V. Linhvistychnyi analiz tekstovoho komertsiynoho kontentu // Informatsiyni systemy ta merezhi. Visnyk Natsionalnoho universytetu "Lvivska politekhnika". 2015. Issue 814. P. 203-227.

7. Bisikalo O. V., Vysotska V. A. Sentence syntactic analysis application to keywords identification Ukrainian texts // Radio Electronics, Computer Science, Control. 2016. Issue 3. P. 54-65. doi: 10.15588/1607-3274-2016-3-7

8. Shakhovska N., Vysotska V., Chyrun L. Intelligent Systems Design of Distance Learning Realization for Modern Youth Promotion and Involvement in Independent Scientific Researches // Advances in Intelligent Systems and Computing. 2016. P. 175-198. doi: 10.1007/978-3-319-45991-2_12

9. Content linguistic analysis methods for textual documents classification / Lytvyn V., Vysotska V., Veres O., Rishnyak I., Rish-nyak H. // 2016 XIth International Scientific and Technical Conference Computer Sciences and Information Technologies (CSIT).

2016. doi: 10.1109/stc-csit.2016.7589903

10. Lytvyn V. V., Bobyk I. O., Vysotska V. A. Application of algorithmic algebra system for grammatical analysis of symbolic computation expressions of propositional logic // Radio Electronics, Computer Science, Control. 2016. Issue 4. P. 77-89. doi: 10.15588/16073274-2016-4-10

11. Development of a method for the recognition of author's style in the Ukrainian language texts based on linguometry, stylemetry and glottochronology / Lytvyn V., Vysotska V., Pukach P., Bobyk I., Uhryn D. // Eastern-European Journal of Enterprise Technologies.

2017. Vol. 4, Issue 2 (88). P. 10-19. doi: 10.15587/1729-4061.2017.107512

12. Davydov M., Lozynska O. Linguistic models of assistive computer technologies for cognition and communication // 2016 XIth International Scientific and Technical Conference Computer Sciences and Information Technologies (CSIT). 2016. doi: 10.1109/ stc-csit.2016.7589898

13. Modelling of semantics of natural language sentences using generative grammars / Shestakevych T., Vysotska V., Chyrun L., Chyrun L. // Computer Science and Information Technologies: Proc. of the IX-th Int. Conf. CSIT'2014. Lviv: Lviv Polytechnic Publishing House, 2014. P. 19-22.

14. Application of sentence parsing for determining keywords in Ukrainian texts / Vasyl L., Victoria V., Dmytro D., Roman H., Zoriana R. // 2017 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT). 2017. doi: 10.1109/stc-csit.2017.8098797

15. Zhezhnych P., Markiv O. A linguistic method of web-site content comparison with tourism documentation objects // 2017 12 th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT). 2017. doi: 10.1109/ stc-csit.2017.8098800

16. Peculiarities of content forming and analysis in internet newspaper covering music news / Korobchinsky M., Chyrun L., Chyrun L., Vysotska V. // 2017 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT). 2017. doi: 10.1109/stc-csit.2017.8098735

17. Kapociute-Dzikiene J., Utka F., Sarkute L. Authorship Attribution and Author Profiling of Lithuanian Literary Texts // Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing. Hissac, Bulgaria, 2015. P. 96-105.

18. Stamatatos E. A survey of modern authorship attribution methods // Journal of the American Society for Information Science and Technology. 2009. Vol. 60, Issue 3. P. 538-556. doi: 10.1002/asi.21001

19. Automatically profiling the author of an anonymous text / Argamon S., Koppel M., Pennebaker J. W., Schler J. // Communications of the ACM. 2009. Vol. 52, Issue 2. P. 119. doi: 10.1145/1461928.1461959

20. Koppel M., Schler J., Argamon S. Computational methods in authorship attribution // Journal of the American Society for Information Science and Technology. 2009. Vol. 60, Issue 1. P. 9-26. doi: 10.1002/asi.20961

21. Juola P. Authorship Attribution // Foundations and Trends® in Information Retrieval. 2007. Vol. 1, Issue 3. P. 233-334. doi: 10.1561/1500000005

22. Khomytska I., Teslyuk V. The Method of Statistical Analysis of the Scientific, Colloquial, Belles-Lettres and Newspaper Styles on the Phonological Level // Advances in Intelligent Systems and Computing. 2016. P. 149-163. doi: 10.1007/978-3-319-45991-2_10

23. Khomytska I., Teslyuk V. Specifics of phonostatistical structure of the scientific style in English style system // 2016 XIth International Scientific and Technical Conference Computer Sciences and Information Technologies (CSIT). 2016. doi: 10.1109/ stc-csit.2016.7589887

24. Bektaev K. B. Matematicheskie metody v yazykoznanii. Ch. 2. Alma-Ata, 1974. 335 p.

25. Mitropol'skiy A. K. Tekhnika statisticheskih vichisleniy. Moscow: Nauka, 1971. 576 p.

26. Khomytska I., Teslyuk V. Modelling of phonostatistical structures of English backlingual phoneme group in style system // 2017 14th International Conference The Experience of Designing and Application of CAD Systems in Microelectronics (CADSM). 2017. doi: 10.1109/cadsm.2017.7916144

27. Khomytska I., Teslyuk V. Modelling of phonostatistical structures of the colloquial and newspaper styles in english sonorant phoneme group // 2017 12 th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT). 2017. doi: 10.1109/stc-csit.2017.8098738

28. Chabanyuk Y., Seniv M., Khimka U. Continuous Stochastic Optimization Procedure in Software Reliability // Proceedings of the XIIth International Conference The Experience of Designing and Application of CAD Systems in Microelectronics CADSM 2013. Polyana, 2013. P. 56-59.
