International classification of proverbs in computational linguistics

Research article in the field of Linguistics and Literary Studies

CC BY
Keywords
paremiology / proverb / computational linguistics / linguistic database / language corpus / corpus linguistics

Abstract of the research article in linguistics and literary studies; author: Abdullaeva Nargiza Erkinovna

This paper is devoted to the role of computational linguistics in the development of international paremiology, namely the use of electronic tools and methods both to create modern international paremiographic databases and to study proverbs of various languages side by side. The discussion provides a basis for both comparative and contrastive investigation of the proverbs that exist in a number of the world's natural languages. Moreover, the paper outlines the fundamentals of organizing joint work between paremiologists and computer programmers, which is expected to be carried out in the near future.



Text of the research paper "International classification of proverbs in computational linguistics"

PHILOLOGICAL SCIENCES

INTERNATIONAL CLASSIFICATION OF PROVERBS IN COMPUTATIONAL LINGUISTICS Abdullaeva N.E. (Republic of Uzbekistan) Email: Abdullaeva556@scientifictext.ru

Abdullaeva Nargiza Erkinovna - Doctoral Student, Teacher, DEPARTMENT OF ENGLISH PHILOLOGY, NATIONAL UNIVERSITY OF UZBEKISTAN NAMED AFTER MIRZO ULUGHBEK, TASHKENT, REPUBLIC OF UZBEKISTAN



Science develops day by day, making human life easier and better in a wide range of situations. Since the 1950s, computer technologies have greatly accelerated this progress. Today computer technologies hold a valuable place in every field of science, because they save time and store volumes of information that no other tool can handle. Linguistics is among the fields that make especially extensive use of computer technologies.

In the second half of the twentieth century a new branch of linguistics, "computational linguistics", came into existence as a result of applying computational methods to linguistics.


As John Hutchins notes, "Computational linguistics originated with efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English" [1, p. 30]. Since then, various linguistic databases and dictionaries have been created with the help of machine translation and computer programs. In addition, electronic and internet tools, including electronic books, manuals and dictionaries as well as special programs, sites and applications, help to create and maintain linguistic databases and to carry out research around the world. Linguistic corpora hold an essential place among these tools.

"Computational linguistics is an interdisciplinary field concerned with the statistical or rule-based modelling of natural language from a computational perspective, as well as the study of appropriate computational approaches to linguistic questions" [2, p. 5]. The term "computational linguistics" was first used by David Hays, one of the founders of the Association for Computational Linguistics (founded in 1962) and the International Committee on Computational Linguistics (founded in 1965).

As noted above, a great many linguistic databases now exist on the internet and can be found within seconds. Alongside electronic versions of encyclopaedic, bilingual and multilingual dictionaries, internet users can also consult electronic proverb collections and dictionaries in the form of programs, applications or special sites.

Although proverbs do occur in the texts of language corpora and special databases, they are not represented there fully enough to be investigated as thoroughly as they exist in natural languages. Creating more satisfactory electronic databases for both paremiology and paremiography is considered one of the most complicated problems of modern computational linguistics. The issue matters not only for the paremiology of a single language but also for international paremiology. Because natural languages differ in their peculiarities, the proverbs of each language reflect the peculiarities of the language to which they belong. They reflect not only the linguistic features of the language but also the national culture and traditions of its speakers.

From the linguistic point of view, the typological approach to international paremiology rests on the genealogical classification of the world's natural languages. This classification is based on whether the chosen languages are genetically related or unrelated. The prominent linguists F. Bopp, A. Vostokov and A. Schleicher made major contributions to the genealogical classification of languages.

In the comparative approach, related languages and their peculiarities are compared, while in the contrastive approach unrelated languages and their peculiarities are contrasted with each other. Accordingly, the semantic, structural, stylistic, linguo-cultural, cognitive, socio-linguistic, linguo-pragmatic and other linguistic features of English and Russian proverbs are investigated together in comparative linguistics, while the peculiarities of English and Uzbek proverbs are analysed in contrastive linguistics [3; 4]. As a result, comparative linguistics usually reveals universal features of proverbs in related languages, whereas contrastive linguistics often encounters lacunae.

Paremiologists and folklorists such as W. Mieder [5], A. Taylor, G. Permyakov, Y. Rozhdestvensky, H. Hrisztova-Gotthardt, Outi Lauhakangas, K. Steyer [6] and N. Norrick [7] have made valuable contributions to the theoretical and practical sides of paremiology from the computational point of view. Kathrin Steyer, in particular, investigated proverbs from a corpus-linguistic point of view. She argues that the absence of a proverb from the corpus of a given language does not reliably show that speakers of that language do not use it. (Corpora of several languages have now been created and can easily be found and used on the internet, for instance the British National Corpus (BNC), the Corpus of Contemporary American English (COCA) and the German Reference Corpus (DeReKo).) She identifies two corpus-linguistic approaches to the study of current proverb use: 1) knowing a proverb (searching the corpus for a proverb that is already known from the mental lexicon of language speakers or from a paremiographic dictionary) and 2) detecting a proverb in a corpus (a corpus-driven approach in which the key words of proverbs serve as the basis for searching) [8, p. 206-207].
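The second, key-word-based approach can be illustrated with a minimal sketch. It is not Steyer's method or the query interface of any real corpus: the function names, the three-sentence sample corpus and the choice of key words are all hypothetical, and a real study would also need lemmatization and a corpus query tool.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_proverb_candidates(sentences, key_words):
    """Return the sentences that contain every key word of a proverb.

    This is a crude stand-in for a corpus-driven search: a sentence is a
    candidate if all key words occur in it, in any order and any position.
    """
    wanted = {word.lower() for word in key_words}
    return [s for s in sentences if wanted <= set(tokenize(s))]


# Hypothetical mini-corpus; a real corpus such as the BNC would be queried
# through its own tools instead.
corpus = [
    "He keeps saying that the early bird catches the worm.",
    "An early bird, as they say, gets the worm first.",
    "The bird sat on the fence all morning.",
]

hits = find_proverb_candidates(corpus, ["early", "bird", "worm"])
for sentence in hits:
    print(sentence)
```

Searching by key words rather than by the full canonical wording also retrieves modified variants of a proverb, such as the second sentence in the sample.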

The major online database of international proverb types and literature references is the international type system of proverbs developed by the Finnish scholar Matti Kuusi; its computerized version was created by his daughter Outi Lauhakangas [9]. In this database, proverbs from the main European (including Finnish and Russian, and even North and Latin American), Asian, Islamic, Pacific and African (mostly Sub-Saharan African) languages are classified thematically and structurally. Universal proverb types that exist in nearly all major languages are also distinguished. Proverbs from the various languages are divided on semantic and structural grounds into 13 main themes (indicated by capital letters from A to T), 52 main classes (indicated by numbers) and 325 subclasses (indicated by lowercase letters), and distribution abbreviations show whether a proverb belongs to a global type or to a particular family of types. Furthermore, each proverb has an individual number in the database that distinguishes it from the others. The 13 main themes of Matti Kuusi's international classification of proverbs are:

A. The practical knowledge of nature
B. Faith and basic attitudes
C. The basic observations and socio-logic
D. The world and human life
E. Sense of proportion
F. Concepts of morality
G. Social life
H. Social interaction
J. Communication
K. Social position
L. Agreements and norms
M. Coping and learning
T. Time and sense of time [9]

To illustrate the main classes and subclasses of the classification, we take type A as an example. A. The practical knowledge of nature:

• A1. Natural elements: A1a. Water and fire as natural elements; A1b. Earth and sea as natural elements; A1c. Types of soil and flora as natural elements; A1d. Cultivated plant; A1e. Cold - warm;

• A2. Animals, human being: animal: A2a. Position of man, domestic and wild animals; A2b. Animals as signs of weather and harvest;

• A3. Weather, calendar: A3a. Points of the compass, wind, rain, changing weather; A3b. Morning : evening, night : day, darkness : light; A3c. Spring, autumn, summer : winter, year and harvest; A3d. Months; A3e. Omens, sayings and advice about holidays, red-letter days; A3f. Personification of red-letter days [9].
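The three-level hierarchy of theme A can be represented as a nested data structure. The following is only an illustrative sketch: the labels are copied from the list above, but the dictionary layout, the names `TYPE_SYSTEM` and `describe_subclass`, and the assumption that a subclass code consists of one capital letter, a number and one lowercase letter are ours, not a description of how the actual database is implemented.

```python
import re

# Theme A of the classification as nested dictionaries (labels from the text).
TYPE_SYSTEM = {
    "A": {
        "label": "The practical knowledge of nature",
        "classes": {
            "A1": {
                "label": "Natural elements",
                "subclasses": {
                    "A1a": "Water and fire as natural elements",
                    "A1b": "Earth and sea as natural elements",
                    "A1c": "Types of soil and flora as natural elements",
                    "A1d": "Cultivated plant",
                    "A1e": "Cold - warm",
                },
            },
            "A2": {
                "label": "Animals, human being: animal",
                "subclasses": {
                    "A2a": "Position of man, domestic and wild animals",
                    "A2b": "Animals as signs of weather and harvest",
                },
            },
            "A3": {
                "label": "Weather, calendar",
                "subclasses": {
                    "A3a": "Points of the compass, wind, rain, changing weather",
                    "A3b": "Morning : evening, night : day, darkness : light",
                    "A3c": "Spring, autumn, summer : winter, year and harvest",
                    "A3d": "Months",
                    "A3e": "Omens, sayings and advice about holidays, red-letter days",
                    "A3f": "Personification of red-letter days",
                },
            },
        },
    },
}


def describe_subclass(code):
    """Return (theme label, class label, subclass label) for a code like 'A1b'."""
    match = re.fullmatch(r"([A-Z])(\d+)([a-z])", code)
    if match is None:
        raise ValueError(f"not a subclass code: {code!r}")
    letter, class_number, _ = match.groups()
    theme = TYPE_SYSTEM[letter]
    main_class = theme["classes"][letter + class_number]
    return theme["label"], main_class["label"], main_class["subclasses"][code]


print(describe_subclass("A1b"))
```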

The main classes and subclasses are defined according to the frequency of the topics of proverbs. When searching for a proverb, one can also find its literature references and information about the languages or language families in which it exists. For instance, a search for the word "sea" returns a list of 23 entries. The first proverb on this list is "If the sea is rich, the land is poor. (translation!)" [9]. As the note indicates, the proverb has been translated into English from another language. Its code is A1b 16, which means that it belongs to the main theme "A. The practical knowledge of nature", the main class "A1. Natural elements" and the subclass "A1b. Earth and sea as natural elements", and that its number within this subclass is 16. Clicking the proverb in the list opens the list of its literature references. The site gives eight references for this proverb, and they indicate that the proverb belongs to Finnish culture.
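The way such a code decomposes into theme, main class, subclass and proverb number can be sketched as a small parser. This is a hypothetical helper, not part of the database itself: the exact written form of the codes is an assumption, so the pattern accepts either a space or a comma between the subclass and the number.

```python
import re
from typing import NamedTuple


class ProverbCode(NamedTuple):
    theme: str        # e.g. "A"
    main_class: str   # e.g. "A1"
    subclass: str     # e.g. "A1b"
    number: int       # number of the proverb within the subclass


# Assumed format: capital letter, class number, lowercase letter, then the
# proverb number, separated by a space and/or a comma.
CODE_PATTERN = re.compile(r"([A-Z])(\d+)([a-z])[ ,]+(\d+)")


def parse_code(code):
    """Split a code such as 'A1b 16' into its four components."""
    match = CODE_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"unrecognized proverb code: {code!r}")
    letter, class_number, sub_letter, number = match.groups()
    main_class = letter + class_number
    return ProverbCode(letter, main_class, main_class + sub_letter, int(number))


print(parse_code("A1b 16"))
```

Keeping the three levels as separate fields makes it straightforward to group search results by theme or by class before comparing proverbs across languages.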

Despite the many advantages of Matti Kuusi's international type system of proverbs, creating international proverb databases that are both more comprehensive and clearer to users remains one of the main tasks of international paremiology. Improving the quality of paremiological databases that cover proverbs from many of the world's languages is likewise an essential linguistic issue. Doing so will make better research possible in both paremiology and paremiography and will remove several obstacles to international paremiological studies.

To conclude, because international paremiology covers such a wide area within computational linguistics, scholars from various nations should work on this issue together, with the assistance of computer programmers, in order to carry out this linguistic task thoroughly. When classifying proverbs of various languages, linguists should apply the comparative and contrastive approaches to the paremiological stock of genetically related and unrelated languages respectively. The results of such investigations, and the databases created on their basis, will then be clear and useful for the development not only of international paremiology but also of general and special linguistics.

References

1. Hutchins J. Retrospect and prospect in computer-based translation. // Proceedings of MT Summit VII, 1999. P. 30-44.

2. Uszkoreit H. What Is Computational Linguistics? Department of Computational Linguistics and Phonetics of Saarland University, 2000. P. 4-17.

3. Abdullaeva N.E. Semantic and linguocultural features of English and Uzbek proverbs with concept of friendship. // Problems of Modern Science and Technology, № 7 (89), 2017. P. 85-91.

4. Abdullaeva N.E. English Proverbs with Graduonyms. "European Research: Innovation in Science, Education and Technology" 33th international scientific and practical conference, London, № 10 (33), 2017. P. 51-53.

5. Mieder W. Proverbs as cultural units or items of folklore. In H. Burger et al. (eds), Phraseology: An international handbook of contemporary research. Berlin: de Gruyter, 2007. P. 394-414.

6. Gotthardt H.H., Varga M.A. (eds) Introduction to Paremiology: A Comprehensive Guide to Proverb Studies. Warsaw/Berlin: De Gruyter Open, 2014. 368 p.

7. Norrick N. Proverbs as set phrases. In H. Burger et al. (eds), Phraseology: An international handbook of contemporary research. Berlin: de Gruyter, 2007. P. 381-393.

8. Steyer K. Proverbs from a Corpus Linguistic Point of View. In: Gotthardt H.H., Varga M.A. (eds) Introduction to Paremiology: A Comprehensive Guide to Proverb Studies. Warsaw/Berlin: De Gruyter Open, 2014. P. 206-227.

9. Lauhakangas O. "The Matti Kuusi International Type System of Proverbs" (2001). [Electronic resource]. URL: http://lauhakan.home.cern.ch/lauhakan/cerp.html/ (date of access: 13.02.2019).
