Corpus studies of speech of individuals who committed suicides (research article, specialty: Linguistics and Literary Studies)

CC BY

DOI: 10.18454/RULB.7.16


Litvinova T.A.

Voronezh State Pedagogical University

CORPUS STUDIES OF SPEECH OF INDIVIDUALS WHO COMMITTED SUICIDES

Abstract

Speech is known to be a source of information about individuals, including their mental state, and is thus a valuable diagnostic tool. One of the crucial tasks facing modern psychodiagnostics is predicting suicidal tendencies. One promising approach is to analyze the speech of individuals who committed suicide. However, compiling such text corpora is a daunting scientific task. The article describes the text corpora (mostly English) employed in studies of the speech of individuals who committed suicide, introduces the first Russian corpus of this kind, RusSuiCorpus, and outlines the prospects for further studies.

Keywords: corpus linguistics, text corpora, suicide, suicidal behaviour, automatic text processing.

Author Email: [email protected]

Acknowledgements

The research was supported by a grant of the President of the Russian Federation for young Russian scientists (Candidates of Sciences), project № МК-4633.2016.6 "Predicting Suicidal Tendencies of Individuals Based on Their Speech Production".

Introduction. According to the World Health Organization, over 800,000 people die by suicide every year, i.e. there is a suicide every 40 seconds, and only 30% of these individuals had declared their intention to commit one [5]. There is therefore a pressing need to develop methods to predict suicidal tendencies and prevent possible suicide attempts. One of the most valuable diagnostic tools for monitoring an individual's mental condition, including suicidal tendencies, is the analysis of speech production, including its formal grammatical level, which cannot be consciously controlled.

1. Personality profiling based on texts. Personality profiling using texts has been performed for decades, and interest in the problem has recently grown worldwide due to the increasing volume of Internet communication and the need for methods that reconstruct an author's profile (gender, age, education level, native language, psychological traits, etc.) through quantitative analysis of anonymous and pseudonymous texts [2; 12].

Psychologists and linguists have traditionally taken the lead in personality profiling using texts. However, in the 1990s mathematicians and information technology specialists joined the effort, actively employing the methods of mathematical statistics, computational linguistics and natural language processing (NLP), in particular for quick processing of large volumes of textual data. Based on the identified correlations between the numerical values of quantifiable linguistic text parameters and authors' characteristics, mathematical models are designed for automatic personality profiling using texts. Formal grammatical text parameters that cannot be controlled by authors and thus cannot be consciously distorted (function words, POS bigrams and trigrams, etc.) are of particular importance [2]. Note that most research has dealt with English texts.
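As an illustration of how one such formal parameter can be quantified, the following minimal sketch computes the share of tokens taken by each function word in a text. The word list, the tokenizer and the function names are assumptions made for this example only; they are not taken from the studies cited, which rely on much fuller inventories and dedicated software.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of English function words;
# a real study would use a far more complete inventory.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "but", "not", "i", "we")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_profile(text):
    """Return, for each function word, its share of all tokens in the text."""
    tokens = tokenize(text)
    if not tokens:
        return {word: 0.0 for word in FUNCTION_WORDS}
    counts = Counter(tokens)
    return {word: counts[word] / len(tokens) for word in FUNCTION_WORDS}


profile = function_word_profile("I went to the shop and I did not buy the bread")
```

The resulting dictionary can serve as one feature vector per text; POS bigrams and trigrams would additionally require a part-of-speech tagger, which is outside the scope of this sketch.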

There have also been attempts to identify mental disorders (depression, schizophrenia, bipolar disorder, etc.) in authors of written texts. As the literature review suggests, content analysis of speech production does not provide complete information about the psychological state of the individual. For example, Baddeley [6], who analyzed emails, showed that depressed individuals used more words describing positive emotions than the control group, probably trying to mask their real feelings in this way. Therefore, in order to identify the psychological state of an author from texts, levels other than vocabulary, which is easy to imitate, also need to be analyzed.

In order to identify psychological characteristics, including suicidal tendencies, a comprehensive psycholinguistic analysis of individual speech production needs to be performed, drawing on neuropsychological data, the neuropsychology of individual differences and the neurobiology of suicidal behaviour, and relying on modern methods of automatic language processing and corpus technologies [4]. Diagnostic tools for predicting suicidal tendencies from the quantitative parameters of texts could be developed by studying written texts of different genres produced by suicidal individuals at different points of their lives. Such studies would identify linguistic cues of suicidal behaviour, i.e. changes in text parameters at different levels as suicidal tendencies progress, by comparison with the speech production of a control group of individuals who have a similar education level and other characteristics but did not commit suicide.

2. Text corpora in studies of speech of suicidal individuals. Researchers studying the features of texts by suicidal individuals have mainly analyzed suicide notes. Corpora of such notes have been compiled [18], and both their formal grammatical characteristics (average sentence length, proportions of different parts of speech, etc.) and their content (proportions of words describing positive and negative emotions, time, place, etc.) have been analyzed [13; 17; 19]. Mathematical models are also being designed to distinguish between genuine and fake suicide notes using quantitative text parameters [9].
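One of the formal characteristics mentioned above, average sentence length, can be sketched as follows. This is a rough illustration under a stated assumption: sentences are split naively on ".", "!" and "?", so abbreviations and ellipses are not handled, and the function name is chosen for this example only.

```python
import re


def average_sentence_length(text):
    """Mean number of word tokens per sentence, using a naive splitter."""
    # Assumption: a sentence ends at '.', '!' or '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(re.findall(r"\w+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)


value = average_sentence_length("The sky is grey. It rains. We stay home today!")
```

Because `\w` matches Unicode letters in Python 3, the same function can be applied to Russian texts, although a proper sentence segmenter would be preferable in actual research.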

Although suicide notes clearly need to be investigated, they are short and therefore do not reveal all the features of the speech production of suicidal individuals. Researchers have therefore become aware of the importance of analyzing texts of different genres written by individuals who committed suicide, both in comparison with texts by those who did not (controlling for genre, demographic characteristics, etc.) and over time, in order to identify changes in the idiostyle as the tragic ending approaches. There are few such studies because of the difficulties of compiling research text corpora.

Texts by famous individuals such as writers, poets and musicians are commonly used in research, and it is their literary texts rather than their natural written speech that are investigated.

The literature we have studied makes the point that the qualitative methods of analyzing suicide texts employed by psychologists and psychiatrists should be combined with software-based quantitative methods. For example, in [21] the LIWC software, which computes the proportions of different parts of speech, certain vocabulary groups, etc. in a text [15], was used to show that poetic texts written by suicidal individuals at different periods contain the pronoun "I" more frequently than texts by the control group. As time went by, suicidal individuals were seen to use fewer "we" pronouns and fewer interaction verbs (e.g., talk, share, listen) but, contrary to common belief, not more words describing negative emotions (there were no statistically significant differences between the suicidal individuals and the control group in this parameter). It is argued that the results are consistent with a theory of suicide genesis that connects suicidal behaviour with growing alienation from other people.
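The kind of dictionary-based counting described above can be illustrated with a minimal sketch. The category names and word sets below are hypothetical and far smaller than the LIWC dictionaries; the code does not reproduce LIWC itself and only shows the general principle of computing category proportions for texts from different periods.

```python
import re

# Hypothetical mini-dictionaries for illustration only; these are NOT the LIWC categories.
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "first_person_plural": {"we", "us", "our", "ours", "ourselves"},
    "interaction_verbs": {"talk", "share", "listen"},
}


def category_proportions(text):
    """Share of tokens in the text that belong to each dictionary category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    proportions = {}
    for name, words in CATEGORIES.items():
        hits = sum(1 for token in tokens if token in words)
        proportions[name] = hits / total if total else 0.0
    return proportions


# Invented example sentences standing in for an early and a late text by one author.
early = category_proportions("We talk and we share our bread")
late = category_proportions("I walk alone and I keep my silence")
```

Comparing `early` and `late` category by category gives the kind of over-time contrast discussed in the paragraph above, here on invented sentences rather than on real data.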

A study involving about 300 poetic texts by 18 American and Russian suicidal and non-suicidal poets became popular in the foreign media even though its methodology is controversial. In particular, it was not original Russian texts but their English translations that were analyzed. Besides, poetic texts are usually carefully edited, and this work sometimes takes years or even decades, which undermines their value as research material.

Original Russian poetic texts by 6 Russian poets (three suicidal and three non-suicidal) were examined in a separate study by Ch. Davidson [7]. The parameters (nothing is said about how the texts were annotated) were those that, according to S. W. Stirman and J. W. Pennebaker [21], differentiate texts by suicidal individuals from texts by the control group. The texts by suicidal individuals were found to contain fewer words describing human interactions. However, the proportion of "I" pronouns (and their oblique forms) in the texts by suicidal individuals was found to increase over time instead of remaining high, and the opposite applied to the control group. There were also other differences from the results obtained by S. W. Stirman and J. W. Pennebaker, which suggests that for this problem it is original rather than translated texts that should be analyzed. In addition, unlike S. W. Stirman and J. W. Pennebaker, the author analyzed the number of negations (not, no) and established that over time their proportion increases in the texts by suicidal individuals, as opposed to those of the control group. The author therefore assumes that results obtained from texts by authors of different nationalities should be compared.

In [11] song lyrics by individuals who committed suicide were analyzed using the LIWC and Coh-Metrix software and compared with those of the control group. Suicidal individuals were found to use more abstract words and fewer words overall, more verbs and fewer words relating to "Death". However, this research has certain limitations: song lyrics are often a product of collective authorship and are not always relevant for studies of individual idiostyles.

The paper by M. Mulholland and J. Quinn [14], also based on song lyrics, aimed to develop a mathematical model that classifies texts as belonging to suicidal individuals or the control group; many more text parameters were used (type-token ratio (TTR), proportions of some parts of speech, some semantic groups, n-grams). The texts were annotated using modern automatic language processing tools, and a classifier with an accuracy of 70.6% was built using machine learning. The results show the importance of approaching suicide risk evaluation through quantitative text analysis by means of NLP and mathematical statistics. As the authors rightly note, improving the accuracy of the model requires expanding both the text corpora and the set of parameters analyzed.
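Two of the parameters named above, the type-token ratio and word n-grams, are simple to compute, as the following minimal sketch suggests. It is not the feature extraction used in [14]; the tokenizer and function names are assumptions for this example, and the output would still have to be passed to a machine learning classifier.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ngram_counts(tokens, n=2):
    """Count contiguous word n-grams (bigrams by default)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def extract_features(text):
    """Collect the illustrative features for one text."""
    tokens = tokenize(text)
    return {"ttr": type_token_ratio(tokens), "bigrams": ngram_counts(tokens, 2)}


features = extract_features("the rain falls and the rain stops")
```

Note that TTR depends strongly on text length, so in practice texts of comparable size, or a length-normalized variant of the measure, would be needed for a fair comparison.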

Research corpora should be expanded with unedited (unlike literary) texts that are samples of the natural written speech of individuals who committed suicide.

Diaries, letters, Internet communication and interviews of individuals who ended up committing suicide have been analyzed recently (e.g., see [16]; for a review of similar research see [10]).

Note that the above studies did not aim to design methods for predicting suicidal tendencies based on qualitative analysis of speech production; they merely identified statistically significant differences between texts by suicidal and non-suicidal individuals. The major tool for analyzing texts was the LIWC software [15]. Only English texts were analyzed. In addition, such studies investigate individual texts and employ no corpus linguistics methods, which makes one question how universal their conclusions are.

3. Research methods. As the literature review suggests, most studies that identify typical features of the speech of suicidal individuals using statistical methods and automatic text processing tools have been conducted on English texts. Other languages, in particular Russian, clearly need to be explored as well. In order to investigate the features of texts by Russian suicidal individuals, the following should be done first:

• designing corpora of texts by individuals who committed suicides. There are presently no such corpora of Russian texts;

• developing the principles of selecting text parameters to study. Note that in doing so, researchers abroad prioritize parameters whose numerical values can be extracted automatically by available software and, in some cases, rely on existing psychological theories accounting for suicidal behaviour. The data on neurobiological mechanisms of suicidal behaviour obtained by scientists at home and abroad are neglected [1; 8];

• choosing available software or developing new software to analyze research text corpora;

• formulating the mathematical statement of the problem.

The corpus of Russian texts RusSuiCorpus written by individuals who committed suicide is being compiled. It currently contains texts by 45 individuals aged from 14 to 25; the total volume of the corpus is 200,000 words. All the texts were collected manually and are Internet texts by individuals who committed suicide (found on the social media platforms "Vkontakte" and "Zhivoj Zhurnal"; as most posts on "Vkontakte" contain a lot of non-original material, the corpus is mainly made up of texts from "Zhivoj Zhurnal", i.e. so-called "death diaries"). The fact that the suicides were actually committed was verified by analyzing friends' comments, media texts, etc.

All the texts are processed using the Russian version of LIWC as well as morphological and syntactic tagging tools [20].

Statistically significant differences between the texts by individuals who committed suicide and those of the control group are being investigated, and models that distinguish between texts by suicidal and non-suicidal individuals are being designed using machine learning methods. The control texts are selected from the text corpus RusPersonality [3], which includes metadata about the authors, so that texts by individuals of a matching age can be chosen. RusPersonality contains texts of natural written speech, which makes it suitable for comparison with blog texts.
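The source does not specify which statistical test is applied. As one possible illustration, the sketch below uses a two-sided permutation test for a difference in group means, which makes few distributional assumptions and suits small samples. The function name, its parameters and the numbers are assumptions: the values are made-up placeholders standing in for one text parameter measured per author, not results from RusSuiCorpus or RusPersonality.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance guards against floating-point rounding.
        if abs(mean_a - mean_b) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Made-up placeholder values of a single text parameter for two groups of authors.
group_one = [0.09, 0.11, 0.10, 0.12, 0.10]
group_two = [0.05, 0.06, 0.04, 0.07, 0.05]
p_value = permutation_test(group_one, group_two, n_permutations=2000)
```

When many text parameters are tested at once, a correction for multiple comparisons would also be needed before any difference is reported as significant.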

Conclusions. The methods of corpus linguistics are of great importance in investigating the features of the speech of individuals who committed suicide. Studies that identify the typical features of the speech of suicidal individuals would allow us to develop diagnostic tools for evaluating suicidal tendencies based on linguistic analysis of speech production.

RusSuiCorpus, which is currently being compiled, and studies of it by means of modern software, statistical and machine learning methods would enable us for the first time to obtain data on the features of the Russian written speech of individuals who committed suicide and to compare the results with those for other languages.

References

1. Yegorov А. Yu., Ivanov О. V. Features of Individual Profiles of Functional Asymmetry of Individuals Who Attempted a Suicide // Social and Clinic Psychiatry. 2007. № 2 (17). P. 20-24.

2. Litvinova T.A. Identifying the Characteristics (Profiling) of Authors of Written Texts // Philological Sciences. Theoretical and Practical Issues. 2012. № 2 (13). P. 90-94.

3. Litvinova T.A., Dibrova Ye.V., Litvinova O.A., Ryzhkova Ye.S. Corpus Studies of Written Speech in Forensics // Philological Sciences. Theoretical and Practical Issues. 2015. № 8. Part 1. P. 107-113.

4. Litvinova Т.А., Seredin P.V., Litvinova О.А., Zagorovskaya O.V., Serdyuk M.Ye. Identifying Autoaggressive Tendencies of Authors of Written Texts // Journal of Voronezh State University. Series: Linguistics and Multicultural Communication. 2015. № 3. P. 98-104.

5. Preventing Suicides: a Global Imperative: translated from English, 2014. URL: http://psychiatr.ru/download/1863?view=1&name=Suicide-report-a-global-imperative-Rus.pdf (reference date: 13.07.2016)

6. Baddeley J. L. Email communications among people with and without major depressive disorder. Unpublished doctoral dissertation. University of Texas at Austin, Austin, TX. 2011.

7. Davidson Ch. Comparative Psychological Analysis of Six Russian Poets // US-China Foreign Language. 2013. Vol. 11, no. 1. P. 40-45.

8. Joiner T. E., Brown J. S., Wingate L. R. Jr. The psychology and neurobiology of suicidal behavior // Annu Rev Psychol. 2005. Vol. 56. P. 287-314.

9. Jones N. J., Bennell C. The Development and Validation of Statistical Prediction Rules for Discriminating Between Genuine and Simulated Suicide Notes // Archives of Suicide Research. 2007. Vol. 11, Issue 2. P. 219-233.

10. Lester D. The «I» of the Storm: Understanding the Suicidal Mind. De Gruyter Open Ltd, 2014. 170 p.

11. Lightman E. J., McCarthy P. M., Dufty D. F., McNamara D. S. Using Computational Text Analysis Tools to Compare the Lyrics of Suicidal and Non-Suicidal Songwriters // D. S. McNamara & G. Trafton (Eds.). Proceedings of the 29th Annual Cognitive Science Society. Hillsdale, NJ: Erlbaum, 2007.

12. Litvinova T., Zagorovskaya O., Litvinova O., Seredin P. Profiling a Set of Personality Traits of a Text's Author: A Corpus-Based Approach // A. Ronzhin et al. (Eds.): SPECOM 2016, LNAI 9811, 2016, pp. 555-562.

13. Matykiewicz P., Duch W., Pestian J. Clustering semantic spaces of suicide notes and newsgroups articles // Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing (BioNLP '09) / Association for Computational Linguistics, Stroudsburg, PA, USA, 2009. P. 179-184.

14. Mulholland M., Quinn J. Suicidal Tendencies: The Automatic Classification of Suicidal and Non-Suicidal Lyricists Using NLP // International Joint Conference on Natural Language Processing. Nagoya, Japan, 14-18 October 2013. P. 680-684.

15. Pennebaker J. W., Chung C. K., Ireland M., Gonzales A., Booth R. J. The Development and Psychometric Properties of LIWC2007. The University of Texas at Austin and The University of Auckland, New Zealand, 2007. http://www.liwc.net/LIWC2007LanguageManual.pdf

16. Pennebaker J. W., Stone L. D. What Was She Trying to Say? A Linguistic Analysis of Katie's Diaries // D. Lester (Ed.). Katie's Diary: Unlocking the Mystery of a Suicide. New York: Brunner-Routledge, 2004. P. 55-80.

17. Pestian J. P., Matykiewicz P., Grupp-Phelan J. Using natural language processing to classify suicide notes // Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing (BioNLP '08) / Association for Computational Linguistics, Stroudsburg, PA, USA, 2008. P. 96-97.

18. Pestian J. P., Matykiewicz P., Linn-Gust M. What's In a Note: Construction of a Suicide Note Corpus // Biomedical Informatics Insights. 2012. Vol. 5. P. 1-6.

19. Pestian J., Nasrallah H., Matykiewicz P., Bennett A., Leenaars A. Suicide Note Classification Using Natural Language Processing: A Content Analysis // Biomedical Informatics Insights. 2010. Vol. 3. P. 19-28.

20. Rybka R., Sboev A., Moloshnikov I., Gudovskikh D. Morpho-syntactic parsing based on neural networks and corpus data // Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT), IEEE, St. Petersburg, 2015. P. 89-95.

21. Stirman S. W., Pennebaker J. W. Word Use in the Poetry of Suicidal and Non-Suicidal Poets // Psychosom Med. 2001. N 63(4). P. 517-522.

DOI: 10.18454/RULB.7.14


Kirzhaeva V.P.

Ogarev National Research Mordovia State University

PEDAGOGICAL DISCOURSE IN MAGAZINES OF RUSSIAN ÉMIGRÉ COMMUNITY IN 1920-30S (ON THE BASIS OF N. HANS ARTICLES)

Abstract

The article deals with the characteristic aspects of pedagogical discourse in the magazines of the Russian émigré community in the 1920s-30s, based on the articles of N. Hans published in "Contemporary Annals" and "Russian School Abroad". The author analyses specific features of this type of discourse, its aims and the nature of its terminology. The article also contains a brief comparative analysis of pedagogical discourse in the publications of S. Hessen, V. Zenkovskiy and other authors who contributed to "Contemporary Annals".

Keywords: pedagogical discourse, Russian émigré journals, N. Hans, S. Hessen, V. Zenkovskiy.

Author Email: [email protected]

Acknowledgements

The work was supported by the Russian Foundation for Humanities, project 16-06-00501a.

Introduction

The phenomenon of the "first wave" of Russian emigration has traditionally interested researchers across the modern Russian humanities, from specialists in philology and history to scholars of pedagogy, political science and sociology. Once they found themselves outside Russia, the emigrants created their own intellectual spaces, in which they identified and proposed solutions to a number of problems. The most important of these included preserving the national identity of Russian culture, language and literature, and developing specific methods and techniques of education and upbringing to counter the increasing loss of national identity among the younger generation. Active discussion of these issues at congresses of non-governmental organizations, at meetings of groups and associations, and on the pages of numerous émigré publications, including periodical magazines, became a characteristic feature of the social life of the emigration. These discussions resulted in the formation of special types of émigré discourse, whose content is defined by both the subject and the platform used for discussion.

At the beginning of the 1920s the cultural and pedagogical space of the Russian émigré community led to the formation of pedagogical discourse on the pages of émigré publications such as "Latest News", "Today", "Contemporary Annals", "Numbers", etc. and of educational magazines such as "Russian School Abroad", "Bulletin of Educational Bureau of Russian Primary and Secondary Schools Abroad", "Russian School", "Student Herald", "Herald of Russian Student Christian Movement" and others. They contained the works of such authoritative representatives of psycho-pedagogical thought as S. Hessen, V. Zenkovskiy, G. Troshin and many other outstanding philosophers, historians, linguists, literary scholars and public figures. This broad discussion of educational problems led to the adoption of specific measures for developing the different levels of education of the Russian emigration, the elaboration of programs and curricula, and the introduction of new disciplines (note the constant attention to such subjects as the Russian language, Literature, God's Law, History, Geography and Singing, which were aimed at the formation of national identity). This was possible only as a result of social consensus, which is why the pedagogical discourse of the Russian diaspora should be considered a much broader phenomenon than just a text immersed in the situation of educational communication [1] and should be defined as a scientific and pedagogical phenomenon.

The publications of Nikolai Hans are a good example of how magazines formed pedagogical discourse abroad and implemented its institutional characteristics. In this article we discuss the works of N. Hans not only because of his professional activity on the pages of literary and social periodicals, but also because he is a rare example of a successful émigré scholar well known at the international level [2].

In 1923 he released his publication - a review of the first issue of "Russian School Abroad" in the literary and social magazine "Contemporary Annals" which dealt with
