
Linguistics

DOI: 10.31862/2500-2953-2020-2-60-75

O. Antropova, E. Ogorodnikova

Ural Federal University named after the first President of Russia B.N. Yeltsin, Yekaterinburg, 620002, Russian Federation

Automated method

of hyper-hyponymic verbal pairs extraction from dictionary definitions based on syntax analysis and word embeddings

This study is dedicated to the development of methods of computer-aided hyper-hyponymic verbal pairs extraction. The method's foundation is that in a dictionary article the defined verb is commonly a hyponym and its definition is formulated through a hypernym in the infinitive. The first method extracted all infinitives having dependents. The second method also demanded an extracted infinitive to be the root of the syntax tree. The first method allowed the extraction of almost twice as many hypernyms; whereas the second method provided higher precision. The results were post-processed with word embeddings, which allowed an improved precision without a crucial drop in the number of extracted hypernyms. Key words: semantics, verbal hyponymy, syntax models, word embeddings, RusVectores

Acknowledgments. The reported study was funded by RFBR according to the research project № 18-302-00129

FOR CITATION: Antropova O., Ogorodnikova E. Automated method of hyper-hyponymic verbal pairs extraction from dictionary definitions based on syntax analysis and word embeddings. Rhema. 2020. No. 2. Pp. 60-75. DOI: 10.31862/25002953-2020-2-60-75

© Antropova O., Ogorodnikova E., 2020

The content is licensed under a Creative Commons Attribution 4.0 International License



1. Introduction

Studying and determining semantic relations has a special role in contemporary computational linguistics. The main application domain of semantic relations extraction is the construction of lexicographic resources (e.g. electronic dictionaries and thesauri). Besides, different semantic relations are used in natural language processing and language teaching resources.

The number of semantic relations is large and the number of semantic units in a language is tremendous. This means that it is almost impossible to establish all links manually; the task requires other solutions, including automated methods of semantic relations extraction. Thus, this research is devoted to the elaboration of such a method for the automated extraction of Russian hyper-hyponymic verbal pairs.

One of the problems is that there are no unified precise criteria of hyponymy (in particular troponymy); its definition is very subjective and depends on individual comprehension and linguistic experience.

Elena Kotsova describes the hypernym and hyponym as follows:

1. Hypernym

a. Is more frequent in a natural language;

b. Allows more active synonymic substitution of species words in speech;

c. Can be in a role of hypernym in different grades and levels of a genus-species hierarchy;

d. Is simpler in morphemic structure, has no nominal motivation;

e. Cannot be a word with an utterly broad meaning.

2. Hyponym

a. Has more specific meaning and can be divided into semantic features; usually it is a monosemantic word;

b. Has a seme, which is important for hyponymic links;

c. Is less frequent, especially for specific words with professional meaning;

d. More rarely can substitute a word with genus meaning, only in its syntagmatic field;

e. Has two-seme prototypic semantic structure (hyperseme + hyposeme);

f. Is in equivalent relations with other hyponyms of this hyponymic group;

g. Usually has more complex morphemic structure [Kotsova, 2010, p. 25-27].

The authors of Princeton WordNet formulate the idea of hyponymy among verbs as follows: "...the many different kinds of elaborations that distinguish a 'verb hyponym' from its superordinate have been merged into a manner relation that [Fellbaum, Miller, 1990] have dubbed troponymy (from the Greek tropos, manner or fashion). The troponymy relation between two verbs can be expressed by the formula To V1 is to V2 in some particular manner" [Fellbaum, 1993, p. 47].

For example, the verbs плестись, брести, тащиться 'to trail, to plod, to trudge' are synonyms as they express the same notion and contain the same quantity of information. They are also interchangeable in a context. And the verb идти 'to go' is their hypernym as it has a broader meaning. It also matches all hypernym criteria described above and fits in the formula: to trail/ plod/trudge is to go in some particular manner.

The relevance of our research is determined by the lack of studies dedicated to the automated extraction of semantic relations between verbs. As shown in the Related Work section, the majority of research is focused on nouns, while the methods applied to nouns in most cases do not work well for verbs. The elaborated method could facilitate many theoretical and applied tasks in linguistics, in particular, the automated filling of lexical resources.

2. Related Work

Numerous studies on semantic relations extraction have been published since the pioneering work of [Hearst, 1992]. The methods applied to the task vary greatly: lexico-syntactic patterns [Hearst, 1992, 1998], automatic translation from a different language [Pianta et al., 2002], extraction from knowledge databases [Zesch et al., 2008; Panchenko et al., 2012], conversion from a linguistic ontology [Loukachevitch et al., 2016], crowdsourcing [Braslavski et al., 2016], extraction grammars [Gonçalo et al., 2009, 2010], morpho-syntactic rules [Rubashkin et al., 2010] and different combinations of the aforementioned methods with machine-learning techniques such as clustering and word embeddings [Kiselev, 2016; Alekseevsky, 2018; Karyaeva et al., 2018].

Despite such an abundance of research, studies on verbs are hard to find, as most works focus on nouns. A notable exception is [Gonçalo et al., 2010], which extracted different types of relations for four open grammatical categories (nouns, verbs, adjectives, adverbs). They obtained 58,362 pairs of synonyms for nouns and 30,180 pairs for verbs; 122,478 noun hypernyms and no verb hypernyms at all. These results and our ongoing research suggest that verbal hyponymy extraction demands special treatment, as methods developed for nouns do not carry over well to verbs.

The authors of [Goncharova, Cardenas, 2013] designed a method of extraction of a hypo-hypernymic hierarchy of verbs from domain-specific corpora. This method is based on the cognitive theory of terminology [Benitez et al., 2005] and is developed further in [Cardenas, Ramisch, 2019]. First, the authors automatically extracted noun-verb-noun triples from specialized corpora of environmental science texts in English and Spanish. Second, they manually annotated each triple with the lexical domain of the verb and the semantic class and role of the noun. Finally, they manually inferred the hypo-hypernymic hierarchy of the extracted verbs according to their syntactic potential: the more types of semantic subclasses of nouns a verb accepts, the higher its position in the hierarchy. The method is very different from all the aforementioned ones because it focuses on domain-specific terminology and demands much more human effort. Therefore, despite being a useful tool for the creation of domain-specific ontologies, this method is hardly applicable to common language.

To the best of our knowledge, there is no other research on hypernym extraction for Russian verbs. Our research started from lexico-syntactic patterns [Hearst, 1992, 1998], i.e. specific linguistic expressions or constructions which usually include both a hyponym and a hypernym in one context: for example, <hyponym> and other <hypernym>; <hypernym> such as <hyponym>, and so on. First, there was an attempt to find some typical lexico-syntactic patterns in corpus data, but this turned out to be inefficient for verbs. Even though hypernyms and hyponyms can both be found in the nearest context, we failed to discover any regular patterns in corpus data.

We manually analysed more than 400 contexts for 100 hyper-hyponymic pairs and realised that these pairs fulfil the hyponymy function very rarely: no more than 5-6 examples among our set of contexts. For example, the sentence К тому же хотелось сучить ногами, вертеться, вообще - двигаться, хотя несколько минут назад он мечтал только об одном - лечь 'In addition, he wanted to curl his toes, to spin, generally - to move, although a few minutes ago he dreamed of only one thing - to lie down' includes the hyper-hyponymic pair вертеться/двигаться 'to spin / to move', but there is no regular pattern that can be applied to other texts to find other hyper-hyponymic pairs. It also turned out that hyper-hyponymic pairs more often play the role of contextual synonyms in texts [Ogorodnikova, 2017].

Second, we tried to process dictionary data and find lexico-syntactic patterns there, as dictionaries commonly contain both hyponym and hypernym in one entry. Frequent and universal lexico-syntactic patterns are easily detected for nouns: <hyponym> - род/вид/разновидность/... 'class, sort, kind' <hypernym>. We failed to detect any similarly universal lexico-syntactic patterns for verbs. Nonetheless, we noticed that in most cases a hypernym in a definition is accompanied by a repeating specifying word. We have called such words "lexical markers". Unfortunately,


the lexical markers drastically differ for different semantic groups of verbs. For example, such lexical markers as вверх/вниз 'up/down' are typical for verbs of movement and useless for verbs of speech. Those verbs are usually defined by such markers as громко / тихо, невнятно, отрывисто 'loudly / quietly, incomprehensibly, abruptly'. We have manually created a list of such markers for verbs of movement, automatically extracted hyper-hyponymic pairs from six dictionaries and manually evaluated them [Antropova, Ogorodnikova, 2019].

The method based on this idea showed a moderate precision of 0.61, but the coverage of the method depends on the list of markers, which has to be created separately for every semantic group. Manual creation of such lists is time-consuming, and the task of automated creation does not seem to be much easier than the task of hyponym extraction itself.

3. Data

The study is mainly based on the material of dictionary definitions for verbs which were taken from seven Russian dictionaries:

1. Babenko L.G.: The Dictionary of Synonyms of the Russian Language, 2011;

2. Babenko L.G.: The Explanatory Dictionary of Russian Verbs, 1999;

3. Efremova T.F.: The New Dictionary of Russian Language. Explanatory-derivational, 2000;

4. Evgenyeva A.P.: The Small Academic Dictionary: in 4 v. The 4th ed., 1999;

5. Kuznetsov S.A.: The Big Explanatory Dictionary of the Russian Language, 2000;

6. Ushakov D.N.: The Explanatory Dictionary of the Russian Language: in 4 v., 1935-1940;

7. Linguistic Ontology Thesaurus RuThes.

These dictionaries are available in electronic form, so they can be easily processed. Besides, they are well known in Russian linguistics and present the fullest vocabulary.

To check the effectiveness of the proposed methods we used one hundred Russian verbs. This number of test units allows the estimation of the methods and it is possible to analyze all achieved results manually. The verbs were extracted from The Explanatory Dictionary of Russian Verbs because it contains a detailed semantic classification of verbs, and it allows the consideration of the difference between semantic groups as it can influence the result of our analysis. The verbs from different groups were taken proportionally according to rates in the dictionary. We then tested our methods on these verbs' definitions taken from all seven dictionaries.

4. Methods

4.1. Syntactic analysis of definitions

The creation of the method is possible because of the traditional construction of definitions. According to [Komarova, 1990] and [Shelov, 2003], there are some typical classes of definitions. The main difference, which is significant for the purpose of this research, is that definitions can be extended or unextended. Extended definitions are usually based on hyponymy, meronymy, or contextual explanations (рвать - резким движением разделять на части 'to rip - to divide into parts with an abrupt movement'). Unextended definitions contain synonyms of an entry word or its derivatives referring to another entry (жульничать - плутовать, мошенничать 'to cheat - to palter, to swindle'; defining a perfective by referring to its imperfective pair).

The most common type of semantic relation in verbal extended definitions is hyponymy. This observation allows us to elaborate a method of automated verbal hyper-hyponymic pair extraction.

So, a hypernym is usually expressed as an infinitive with dependent words. However, this rule still results in the extraction of some noise, since an infinitive can also be used in various extending constructions. We suggested that some of this noise can be removed by adding a rule that a target infinitive should be the root of the syntax tree of the definition.

In order to implement these methods, we decided to use a UDPipe1 pre-trained model to obtain syntax trees for the definitions. UDPipe offers three models for the Russian language. We chose the Russian model trained on SynTagRus because it provides better quality according to its authors' estimations.2 On the basis of the model we created two methods of hypernym extraction from dictionary definitions:

1. "InfsWithDependants". It extracts all the infinitives having dependent words in a given definition.

2. "RootInfWithDependants". It extracts an infinitive having dependent words in a given definition only if it is the root of the syntax tree.

So, the definition стучать - ударять (ударить) в дверь, окно коротким, отрывистым звуком, выражая этим просьбу впустить кого-л., куда-л. 'to knock - to hit a door, a window with short, choppy sounds wishing to let somebody in' can be processed differently. The second method discovers only one infinitive, ударять 'to hit', which is the true hypernym for стучать 'to knock', while the first one extracts two verbs, ударять, впустить 'to hit, to let in', of which впустить 'to let in' is noise.
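The two extraction rules can be sketched as a filter over a CoNLL-U dependency parse such as the one produced by UDPipe. The parse below is hand-written and simplified for illustration (a real UDPipe analysis would contain all ten CoNLL-U columns and might assign different dependency labels):

```python
# Extract hypernym candidates from a (simplified) CoNLL-U dependency parse
# of a dictionary definition.
# InfsWithDependants: every infinitive that has at least one dependent.
# RootInfWithDependants: such an infinitive only if it is the tree root.

def extract_hypernyms(conllu, root_only=False):
    rows = [line.split("\t") for line in conllu.strip().splitlines()
            if line and not line.startswith("#")]
    governors = {int(r[6]) for r in rows}  # ids that govern at least one token
    result = []
    for r in rows:
        tok_id, lemma, upos, feats, head = int(r[0]), r[2], r[3], r[5], int(r[6])
        is_infinitive = upos == "VERB" and "VerbForm=Inf" in feats
        has_dependents = tok_id in governors
        if is_infinitive and has_dependents and (not root_only or head == 0):
            result.append(lemma)
    return result

# Hand-made toy parse of the definition of 'стучать' (to knock).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL
definition = """
1\tударять\tударять\tVERB\t_\tAspect=Imp|VerbForm=Inf\t0\troot
2\tв\tв\tADP\t_\t_\t3\tcase
3\tдверь\tдверь\tNOUN\t_\tCase=Acc\t1\tobl
4\tвпустить\tвпустить\tVERB\t_\tAspect=Perf|VerbForm=Inf\t1\tadvcl
5\tкого-л.\tкто-либо\tPRON\t_\t_\t4\tobj
"""

print(extract_hypernyms(definition))                  # the hypernym plus noise
print(extract_hypernyms(definition, root_only=True))  # the root infinitive only
```

On this toy parse the first call returns both ударять and the noise впустить, while the root-only variant keeps only ударять, mirroring the behaviour of the two methods on the example above.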

1 URL: http://ufal.mff.cuni.cz/udpipe

2 URL: http://ufal.mff.cuni.cz/udpipe/models

4.2. Post-processing with word embeddings

As shown in Table 1, "InfsWithDependants" proved to find almost twice as many correct hypernyms as "RootInfWithDependants", whereas the second method delivers considerably higher precision. Thus, we devised a way to improve the "InfsWithDependants" results by post-processing them with word embeddings.

A word embedding is a mathematical model of a language. It is based on the idea that similar words tend to appear in similar contexts. A word embedding is a trained neural network which transforms words into vectors (or points3) in some N-dimensional semantic space: if the words appear in similar contexts, the points are close to each other in the space. Figure 1 shows an example of such a representation. For the current research it is important that word embeddings also allow the calculation of a similarity measure of given words, namely the cosine similarity, which scales from 0 (least similar) to 1 (most similar). For example, according to the embedding from Figure 1, cosine similarity of verbs глядеть 'to gaze' and смотреть 'to look' is 0.836, глядеть and делать 'to do' - 0.310, глядеть and обладать 'to possess' - 0.144. Models differ from each other by the following parameters: corpora used for the model training; part of speech (POS) tags used to distinguish homonymic parts of speech (e.g., if "go" is a verb or a noun); the size of the sliding window - the number of neighbourhood words taken into account; and a number of other technical parameters such as the learning algorithm or dimensionality. See [Kutuzov, Kuzmenko, 2017] for details.
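Concretely, the cosine similarity of two embedding vectors is their dot product divided by the product of their norms. A minimal sketch with invented three-dimensional vectors (real RusVectores embeddings are 300-dimensional, and the similarity values quoted in the text come from an actual model):

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = dot(u, v) / (||u|| * ||v||), in [0, 1] for non-negative angles
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, made up for illustration: 'глядеть' and 'смотреть' point in
# similar directions, 'обладать' does not.
gljadet = [0.9, 0.1, 0.2]
smotret = [0.8, 0.2, 0.1]
obladat = [0.1, 0.9, -0.5]

print(round(cosine_similarity(gljadet, smotret), 3))  # close to 1
print(round(cosine_similarity(gljadet, obladat), 3))  # much lower
```

With a downloaded RusVectores model the same measure can be obtained through gensim, e.g. `KeyedVectors.similarity('глядеть_VERB', 'смотреть_VERB')`.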

We employed pre-trained embeddings from RusVectores project.4 The idea of the method is to drop the extracted candidate verb if its similarity with the defined verb is lower than a threshold.

We took the "InfsWithDependants" results for the hundred verbs as a starting point and randomly divided them into test (30) and development (70) sets. Then, for each available RusVectores model we did the following. First, we cleaned the development set of verbs absent from the model. Second, we performed 7-fold cross-validation on the development set in order to get a better estimation of the model and find the best threshold for it. After that, we compared all the models by their mean performance on cross-validation, chose the best one and applied it to the test set.
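The filtering and threshold-search steps can be sketched as follows. The similarity values and correctness labels below are invented for illustration, and the sketch sweeps thresholds over a single labelled set instead of performing the 7-fold cross-validation used in the study:

```python
# Each candidate: (similarity between defined verb and extracted infinitive,
#                  True if a human marked the infinitive as a correct hypernym).
# The numbers are invented for illustration.
candidates = [
    (0.72, True), (0.65, True), (0.58, True), (0.44, True),
    (0.61, False), (0.37, False), (0.29, False), (0.18, False),
]

def precision_recall(pairs, threshold):
    # Post-processing rule: drop every candidate whose similarity with
    # the defined verb is below the threshold.
    kept = [is_true for sim, is_true in pairs if sim >= threshold]
    true_positives = sum(kept)
    total_true = sum(is_true for _, is_true in pairs)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_true if total_true else 0.0
    return precision, recall

# Sweep a grid of thresholds and keep the one with the best precision
# (the criterion the study relied on; see Section 5).
best = max((t / 100 for t in range(0, 100, 5)),
           key=lambda t: precision_recall(candidates, t)[0])
p, r = precision_recall(candidates, best)
print(best, round(p, 3), round(r, 3))
```

On the toy data the sweep keeps only high-similarity candidates, trading some recall for precision, which is exactly the effect reported for the post-processed "InfsWithDependants" results.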

3 Actually, the neural network output is N numbers - a set of coordinates in N-dimensional semantic space. These numbers can be visually represented either as a vector, beginning at the origin of coordinates and ending at the given set of coordinates, or simply as a point with the given coordinates.

4 URL: http://rusvectores.org/ru/models


Fig. 1. A visualization of word embeddings

The embeddings are created by a RusVectores model trained on the Russian National Corpus and Wikipedia. Represented verbs: делать 'to do', работать 'to work', иметь 'to have', обладать 'to possess', мочь 'to be able to', уметь 'can', глядеть 'to gaze', видеть 'to see', смотреть 'to look', глазеть 'to stare', слышать 'to hear'


5. Results and Discussion

"InfsWithDependants" and "RootInfWithDependants" were applied to all the definitions of the hundred verbs. The derived hypernyms were then manually marked for correctness. The evaluation results are summarized in Table 1. Obviously, "RootInfWithDependants" cannot extract more true hypernyms than "InfsWithDependants", since it simply adds one more filtering condition. Calculating actual recall is not possible because only the extracted infinitives were marked. Nonetheless, we can get some notion of the recall drop judging by the drop of the true positive rate. "InfsWithDependants" allows the extraction of almost twice as many true hypernyms as "RootInfWithDependants", but it demonstrates a significantly lower precision.

Table 1
"InfsWithDependants" and "RootInfWithDependants" results for the hundred verbs

Method name            | True Positive Rate | Precision
InfsWithDependants     | 1.000              | 0.466
RootInfWithDependants  | 0.595              | 0.571

Let us consider some typical mistakes arising during the syntax analysis of the definitions. First, for the definition вертеться - совершать круговые движения; вращаться, крутиться 'to revolve - to carry out circular motions; to rotate, to spin' the model marks крутиться 'to spin' as dependent on вращаться 'to rotate', whereas they actually are homogeneous and both have no dependent words, which is typical for synonyms in definitions and facilitates distinguishing them from hypernyms. Second, a common problem may be illustrated by the definition доходить - понимая и осознавая что-либо, разбираться/разобраться в чем-либо (в каком-либо сложном вопросе, запутанном деле и т.п.) 'to see the light - to figure something out understanding or realizing it (about a challenging issue, complicated problem etc.)'. Here the model mistakenly marks the first verb of the verbal adverbial construction as the root, while the root should have been the homogeneous verbs разбираться/разобраться 'to figure out'. Such mistakes might be avoided by customising the syntax model for our task; this issue will be addressed in further research.

The following example illustrates the drawbacks of our methods: Заедать - подвергая что-л. (обычно какие-л. механизмы) отрицательному воздействию, зажимать/зажать, защемлять/защемить, зацеплять/зацепить какую-л. деталь, препятствуя движению, нормальному функционированию 'to jam - exposing negatively (usually some devices), to press, to squeeze, to hook a detail, so that movement or action is prevented'. Even if the syntax tree for this definition were perfect, our method does not allow the delineation of hypernyms from synonyms when the latter have dependent words. The application of our method is also limited to extracting one-verb hypernyms only. Finding the exact boundaries of a multi-word hypernym is much more difficult. A frequent case of a multi-word hypernym is the verb совершать 'to carry out', which can collocate with different specifying supplements. For instance, the definition of the verb вертеться 'to revolve' starts with the expression совершать движения 'to carry out motions' in many dictionaries.

As mentioned earlier, we decided to post-process the results of "InfsWithDependants" in order to improve its precision. Performance of every available RusVectores model in "InfsWithDependants" on cross-validation is shown in Table 2. It was possible to calculate recall for this case, because here we processed only the extracted hypernyms, which had been manually marked for correctness, so we knew exactly how many true hypernyms the set contained.

Table 2
Average quality measures on cross-validation for RusVectores models

N | Corpora                          | POS tags       | Window size | Precision | Recall | F-score
1 | Ruscorpora                       | Universal tags | 20          | 0.5171    | 0.6835 | 0.5813
2 | Russian Wikipedia and Ruscorpora | Universal tags | 2           | 0.4851    | 0.7980 | 0.5943
3 | Tayga                            | Universal tags | 2           | 0.5201    | 0.7295 | 0.6011
4 | Tayga                            | None           | 10          | 0.5005    | 0.5889 | 0.5338
5 | Russian news                     | Universal tags | 5           | 0.4796    | 0.8957 | 0.6221
6 | Araneum                          | None           | 5           | 0.5215    | 0.4360 | 0.4728

The models can be downloaded from http://rusvectores.org/models/. Model filenames:

1 - ruscorpora_upos_cbow_300_20_2019;

2 - ruwikiruscorpora_upos_skipgram_300_2_2019;

3 - tayga_upos_skipgram_300_2_2019;

4 - tayga_none_fasttextcbow_300_10_2019;

5 - news_upos_skipgram_300_5_2019;

6 - araneum_none_fasttextcbow_300_5_2018

When fitting the thresholds and choosing the best model we decided to rely on precision rather than F-score, because precision for this task does not grow monotonically with the threshold. A typical graph for precision, recall and F-score is given in Figure 2. It shows that precision grows only up to some threshold and then decreases. This happens because the word embeddings we used do not distinguish different meanings of words, combining all the meanings of a word into a single average vector. Therefore, if at least one word of a hyper-hyponymic pair is used not in its most frequent meaning, their similarity might be rather low.

Also, Figure 2 demonstrates that recall changes in a much wider span, thus having greater impact on F-score.

Fig. 2. A typical dependency of precision, recall and F-score on the threshold

We chose the third model (Tayga, Universal tags, Window Size = 2) from all the models presented in Table 2 because, even though the sixth model (Araneum, No tags, Window Size = 5) has slightly higher precision, the former has significantly higher recall. We applied the chosen model with the threshold found during cross-validation to the test set and compared it with the corresponding parts of the "InfsWithDependants" and "RootInfWithDependants" results (see Table 3). In that way we managed to obtain results with a higher true positive rate and precision than those of the "RootInfWithDependants" method.

Table 3
Final results for the test set

Method name                       | True Positive Rate | Precision
InfsWithDependants                | 1.000              | 0.401
Post-processed InfsWithDependants | 0.832              | 0.517
RootInfWithDependants             | 0.740              | 0.504

Conclusion

A preliminary linguistic reflection allowed us to conclude that for verb hyponymy extraction, it is worth using dictionary definitions. In this kind of linguistic source, verbal hyper- and hyponyms occur together more frequently than in others (e.g. corpus data).

Our previous study also allowed us to conclude that lexico-syntactic patterns, widely used for the extraction of hyper-hyponymic pairs of nouns, do not fit verbs: we were unable to find any verbal lexico-syntactic patterns either in corpora or in dictionary definitions. Therefore, some methods of extraction should be developed specifically for verbs.

The study shows that syntactic analysis of definitions is a good starting point for hyper-hyponymic verbal pair extraction. We developed two methods based on syntactic analysis of definitions and applied them to seven Russian dictionaries. The first method extracted all infinitives that have dependants. The second method additionally required an extracted infinitive to be the root of the syntax tree. The use of pre-trained word embeddings from the RusVectores project improved the precision of the first syntax-based method without a crucial drop in the number of extracted true hypernyms, which allowed it to outperform the second syntax-based method in both precision and the number of extracted true hypernyms.

Nonetheless, analysis of mistakes showed that the syntax model should be customised for our task to improve the results of the developed method. We will address these issues in future research.

References

Antropova, Ogorodnikova, 2019 - Антропова О.И., Огородникова Е.А. Возможности автоматизированного выделения гипо-гиперонимических пар из словарных определений глаголов // Вестник Южно-Уральского государственного университета. Серия: Лингвистика. 2019. Т. 16. № 2. С. 51-57. [Antropova O.I., Ogorodnikova E.A. Possibilities of computer-aided extraction of hyper-hyponymic pairs from dictionary definitions of verbs. Vestnik Yuzhno-Uralskogo gosudarstven-nogo universiteta. Lingvistika. 2019. Vol. 16. No. 2. Pp. 51-57. (In Russ.)]

Alekseevsky, 2018 - Алексеевский Д.А. Методы автоматического выделения тезаурусных отношений на основе словарных толкований: Дис. ... канд. филол. наук. М., 2018. [Alekseevsky D.A. Metody avtomaticheskogo vigeleniya tezaurusnih otnosheniy na osnove slovarnih tolkovaniy [Automatic methods of thesauri relations extraction on the basis of dictionary definitions]. PhD diss. Moscow, National Research University "Higher School of Economics", 2018.]

Benitez et al., 2005 - Benitez F., Exposito C.M., Exposito M.V., Linares C.M. Framing Terminology: A Process-Oriented Approach. Meta: Translators' Journal. 2005. Vol. 50. No. 4. URL: https://www.erudit.org/en/journals/meta/2005-v50-n4-meta1024/019916ar (accessed: 11.10.2019).

Braslavski et al., 2016 - Braslavski P., Ustalov D., Mukhin M., Kiselev Y. YARN: Spinning-in-Progress. Proceedings of the Eight Global Wordnet Conference. V.B. Mititelu, C. Foräscu, C. Fellbaum, P. Vossen (eds.). Bucharest, 2016. Pp. 58-65.

Cardenas, Ramisch, 2019 - Cardenas B.S., Ramisch C. Eliciting specialized frames from corpora using argument-structure extraction techniques. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication. 2019. 25 (1). Pp. 1-31.

Fellbaum, Miller, 1990 - Fellbaum C., Miller G.A. Folk psychology or semantic entailment? A reply to rips and conrad. Psychological Review. 1990. No. 97. Pp. 565-570.

Fellbaum, 1993 - Fellbaum C. English verbs as a Semantic Net. WordNet: Lexical Database - Revised August, 1993.

Goncharova, Cardenas, 2013 - Goncharova Yu., Cardenas B.S. Specialized Corpora Processing with Automatic Extraction Tools. Procedia - Social and Behavioral Sciences. A. Hamilton (ed.). Elsevier, 2013. Pp. 293-297.

Gonçalo et al., 2009 - Gonçalo Oliveira H., Santos D., Gomes P. Relations extracted from a Portuguese dictionary: Results and first evaluation. Local Proc. 14th Portuguese Conference on Artificial Intelligence (EPIA). L.S. Lopes, N. Lau, P. Mariano, L.M. Rocha (eds.). Springer, 2009. Pp. 541-552.

Gonçalo et al., 2010 - Gonçalo Oliveira H., Gomes P. Onto.PT: Automatic Construction of a Lexical Ontology for Portuguese. Proceedings of 5th European Starting AI Researcher Symposium (STAIRS 2010). T. Abudawood, P.A. Flach (eds.). Lisbon, 2010. Pp. 199-211.

Hearst, 1992 - Hearst M.A. Automatic acquisition of hyponyms from Large Text Corpora. Proceedings of the 14th Conference on Computational Linguistics. Nantes, 1992. Pp. 539-545.

Hearst, 1998 - Hearst M.A. Automated Discovery of WordNet Relations. WordNet: An Electronic Lexical Database. C. Fellbaum (ed.). Cambridge, 1998. Pp. 132-152.

Karyaeva et al., 2018 - Karyaeva M., Braslavski P., Kiselev Yu. Extraction of Hypernyms from Dictionaries with a Little Help from Word Embeddings. Analysis of Images, Social Networks and Texts - 7th International Conference, AIST. A. Panchenko, W.M. van der Aalst, M. Khachay et al. (eds.). Moscow, 2018.

Kiselev, 2016 - Киселев Ю.А. Разработка автоматизированных методов выявления семантических отношений для электронных тезаурусов: Дис. ... канд. техн. наук. Екатеринбург, 2016. [Kiselev Yu.A. Razrabotka avtomatizirovannykh metodov vyyavleniya semanticheskikh otnoshenii dlya elektronnykh tezaurusov [Elaboration of computer-aided methods of semantic relations extraction for electronic thesauri]. PhD diss. Ekaterinburg, Ural Federal University, 2016.]

Komarova, 1990 - Комарова З.И. Русская отраслевая терминология и тер-минография. Каменец-Подольский, 1990. [Komarova Z.I. Russkaya otraslevaya terminologiya i terminografiya [Russian branch terminology and terminography]. Kamenets-Podolskiy, 1990.]

Kotsova, 2010 - Котцова Е.Е. Гипонимия в лексической системе русского языка (на материале глагола): Дис. ... д-ра филол. наук. Архангельск, 2010. [Kotsova E.E. Giponimiya v leksicheskoi sisteme russkogo yazyka (na materiale glagola) [Hyponymy in the lexical system of the Russian language (on the material of verbs)]. Dr. Hab. diss. Arkhangelsk, Pomor State University, 2010.]

Kutuzov, Kuzmenko, 2017 - Kutuzov A., Kuzmenko E. WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models. Analysis of Images, Social Networks and Texts. AIST 2016. Ignatov D. et al. (eds.). Springer, Cham, 2017.

Loukachevitch et al., 2016 - Loukachevitch N.V., Lashevich G., Gerasimova A.A. et al. Creating Russian WordNet by Conversion. Proceedings of the Conference on Computational Linguistics and Intellectual Technologies Dialog-2016. V.P. Selegey et al. (eds.). Moscow, 2016. Pp. 405-415.

Ogorodnikova, 2017 - Огородникова Е.А. Использование лексико-синтаксических шаблонов для формализации родовидовых отношений в толковом словаре // Евразийский гуманитарный журнал. 2017. № 2. С. 20-24. [Ogorodnikova E.A. Applying of lexico-syntactic patterns to hyper-hyponymic relations formalization in an explanatory dictionary. Evraziyskiy gumanitarny zhurnal. 2017. No. 2. Pp. 20-24. (In Russ.)]

Panchenko et al., 2012 - Panchenko A., Adeykin S., Romanov P., Romanov A. Extraction of Semantic Relations between Concepts with KNN Algorithms on Wikipedia. Concept Discovery in Unstructured Data Workshop (CDUD) of International Conference On Formal Concept Analysis. D. Ignatov, S. Kuznetsov, J. Poelmans (eds.). Leuven, 2012. Pp. 78-88.

Pianta et al., 2002 - Pianta E., Bentivogli L., Girardi C. MultiWordNet: Developing an aligned multilingual database. Proceedings of the 1st International WordNet Conference. Ch. Fellbaum, P. Vossen (eds.). Mysore, 2002. Pp. 293-302.

Rubashkin et al., 2010 - Опыт автоматизированного пополнения онтологий с использованием машиночитаемых словарей / Рубашкин В.Ш., Бочаров В.В., Пивоварова Л.М., Чуприн Б.Ю. // Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной Международной конференции Диалог. Вып. 9 (16). Кибрик A.E. и др. (ред.). М., 2010. С. 413-418. [Rubashkin V.Sh., Bocharov V.V., Pivovarova L.M., Chuprin B.Yu. The approach to ontology learning from machine-readable dictionaries. Proceedings of the Conference on Computational Linguistics and Intellectual Technologies Dialog-2010. No. 9 (16). A.E. Kibrik et al. (eds.). Moscow, 2010. Pp. 413-418. (In Russ.)]

Shelov, 2003 - Шелов С.Д. Термин. Терминологичность. Терминологические определения. СПб., 2003. [Shelov S.D. Termin. Terminologichnost. Terminologicheskie opredeleniya [Term. Termhood. Terminological definitions]. St. Petersburg, 2003.]

Zesch et al., 2008 - Zesch T., Müller Ch., Gurevych I. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. Proceedings of the Sixth International Conference on Language Resources and Evaluation. Marrakech, 2008. Pp. 1646-1652.

Статья поступила в редакцию 26.10.2019 The article was received on 26.10.2019

Об авторах / About the authors

Антропова Оксана Игоревна - старший преподаватель кафедры технической физики Физико-технологического института, Уральский федеральный университет имени первого Президента России Б.Н. Ельцина, г. Екатеринбург

Oksana I. Antropova - Senior Lecturer at the Chair of Applied Physics of the Institute of Physics and Technology, Ural Federal University named after the first President of Russia B.N. Yeltsin, Yekaterinburg

E-mail: choksy@mail.ru

Огородникова Екатерина Алексеевна - ассистент кафедры лингвистики и профессиональной коммуникации на иностранных языках департамента лингвистики, Уральский федеральный университет имени первого Президента России Б.Н. Ельцина, г. Екатеринбург

Ekaterina A. Ogorodnikova - Teaching Assistant at the Chair of Linguistics and Professional Communication in Foreign Languages of the Department of Linguistics, Ural Federal University named after the first President of Russia B.N. Yeltsin, Yekaterinburg

E-mail: kruglikova.katya@yandex.ru

Все авторы прочитали и одобрили окончательный вариант рукописи All authors have read and approved the final manuscript
