A depiction of oral speech (the potential of using automated research for the purposes of phonoscopic evaluation)

Matveeva Liubov Yurievna; Prokofyeva Larisa Petrovna

Communication Studies. 2016. No. 2 (8). pp. 17-27. UDC 81'23

L.Yu. Matveeva, L.P. Prokofyeva Saratov, Russia

A DEPICTION OF ORAL SPEECH (THE POTENTIAL OF USING AUTOMATED RESEARCH FOR THE PURPOSES OF PHONOSCOPIC EVALUATION)

The article presents an experimental study of the emotional component of oral speech. The study was carried out with instrumental methods: "Praat" (© P. Boersma & D. Weenink), a freeware program for the analysis and reconstruction of acoustic speech signals, and "Zvukotsvet" (© L.P. Prokofyeva), a program for phonosemantic analysis. Authentic sound material with specified lexical (invectives and swearwords) and intonational (different intonation patterns) parameters was analyzed by these programs and interpreted on the basis of modern phonosemantics. The results show that the emotional component of oral speech can in principle be revealed without any direct connection to lexical meaning. They can be used to build a theoretical computerized model of phonosemantic speech analysis and for pragmatic goals, including the forensic phonoscopic examination of speech.

The article was first published in Russian in the book: Russkaya ustnaya rech [Russian oral speech], Proceedings of All-Russian scientific conference with international participation "2nd Barannikova's readings. Oral speech: Russian dialect and colloquial communication culture" (Saratov, SSU, November 18-19, 2015), Saratov, Amirit Publ., 2016, Iss. 2, pp. 66-76.

Key words: phonoscopic examination, intonation pattern, phonosemantic analysis.

Contemporary research on oral language is extremely multifaceted and multifunctional. One important theoretical challenge is to identify how information is transmitted in the process of oral communication. Undoubtedly, in a broader sense, research on communication often concerns the lexical material itself. However, in the past few decades, new approaches centered on nonverbal means have arisen. This has allowed for further research into criteria that can be used to describe (assess) the emotional state of the speaker. The role of nonverbal cues and intonation is particularly salient when the meaning of a statement (which is composed of lexically combined phrases) is incomplete or does not correspond to the situation in which the speech occurs. Although research in the field of phonosemantics is still experimental, it is our firm conviction that phonosemantics is capable of revealing deeper, suggestive processes that are often hidden from the superficial view but can be detected with the aid of specialized automated methodologies. In principle, it is possible to create a computerized model of speech analysis based on experimental data on the color associations of the sounds of the Russian language [Prokofieva 2007].

As a rule, in the process of everyday communication, lexical material and nonverbal signals correspond with one another. However, if the situation imposes any limitations, then the most obvious factor - the lexical meaning - ceases to play the main role and gives way to other components of communication. In such circumstances, the researcher looks into individual components of intonation, such as the melody and intensity of the voice, pauses, speech tempo, and timbre, in order to determine their role in the process of direct communication. Is it possible to draw conclusions about the communicative intentions of a speaker simply by looking at the intonation patterns of his speech, separately from its lexical meaning? In order to try to answer this question, let us hypothesize that instrumental methods of processing oral language can aid in detecting the emotional composition of a statement.

When conducting instrumental research on oral language, the pragmatic focus of verbal statements within the boundaries of forensic examination must be taken into consideration. The prosodic study of the notional meaning of a given phrase is a complex task for phonoscopic, linguistic, and psychological forensic analyses. According to E.I. Galyashina, when conducting such research, "... it is necessary to use auditory and instrumental methods of acoustic-phonetic speech analysis, to use specialized knowledge from the field of psychology of verbal actions to reliably answer the questions posed by the experts on the content-meaning focus, the pragmatics, and the suggestiveness of a text" [Galyashina 2006: 10]. We cannot help but agree that the classical linguistic evaluation of a text in such a case cannot be exhaustive, since the text or the so-called "decryption" of oral statements does not convey all of its prosodic nuances [Galyashina 2006: 44].

In order to test our hypothesis that instrumental methods can allow for the detection of the same communicative intentions of the speaker in language segments of different lexicological content, we compared the intonation patterns of statements that explicitly contain insults with statements that are of a negative character but whose lexicon is not marked by expletives or unprintable phrases.

For our experiment, we chose fragments of recordings of the speech of two men. In the first case, the fragments came from a recording of a telephone conversation (the communicative situation was a discussion of mutual acquaintances). The speech of this participant, designated here as M1, consists of phrases from a spontaneous dialogue - free, not prepared; the style of communication is unofficial and emotional. This material is particular in that the situation apparently allows for the use of varied expressive means, and the speech often goes outside the confines of standard language.

The speech of the other participant, designated here as M2, is a monologue - somewhat free, partially prepared (since he knew the topic of the conversation in advance); the style of communication is official and emotional. The situation was a discussion of well-known persons and events. The phrases that we used came from fragments of a recording of a public presentation. This speech is particular in that the format allowed for the use of varied emotional-expressive devices, although they do not go outside the bounds of standard language.

In order to obtain the intonation patterns, we used the program Praat, which was created by Paul Boersma and David Weenink, members of the Department of Phonetic Sciences at the University of Amsterdam, and is intended for linguists who study oral speech. The program allows for the creation of a multi-level layout of oral speech, including the construction of waveforms, spectrograms, and intonograms. One of the goals of the experiment was to study the possible use of the program in conducting phonoscopic evaluation.

For comparison, we took two pairs of phrases:

1. "You're such a f—king sh—thead," which is part of the phrase "F—k, you're such a f—king sh—thead," taken from M1's dialogue. This correlates with the contextual meaning of the nomination "cowboy," which is part of the phrase "Bush is a c—ksucking cowboy," uttered by M2 in a public presentation.

2. "I'm so f—king sick of this sh—t," which is a fragment of the phrase "I have here, f—k, four text messages, I'm so f—king sick of this sh—t," taken from M1's dialogue. This can be correlated with the meaning of the phrase "I'm so fed up with," which is part of the phrase "I'm so fed up with these [racial slur]," uttered by M2 in a public presentation.

Here is a comparison of the intonation pattern of each pair of phrases:

Fig. 1. The intonation pattern of M1's phrase "You're such a f—king sh—thead"

Figures 1 and 2 show the intonation contours of the two phrases that go together in meaning: "...sh—thead" and "...cowboy." In the general context of the conversations, these phrases have a similar meaning - a contemptuous description of the person being talked about. At the same time, the dictionary describes the noun "sh—thead" as colloquial, while the noun "cowboy" does not carry any such label. The meaning of the noun "cowboy" here is derived from the context of a concrete phrase. It is important to note that these two words have a similar phonetic pattern.


Fig. 2. The intonation pattern of M2's phrase "... cowboy"

As can be seen, the intonation patterns of these short phrases (the intonation is indicated by the grey line) are relatively similar, although the charts do indicate some variation in the frequency range and in the fluidity of the melodies. In our opinion, the differences are in part due to the difference in the speakers' manners of speaking. M1 speaks more emotionally, and the melody of his speech is somewhat fluid (Fig. 1). M2 is also emotional, but his intonation is discontinuous, as can be seen in the chart (Fig. 2).

With the help of the program Praat, we were able to determine the average frequency of the speakers' voices when they were uttering the indicated phrases. The average frequency of M1's voice when pronouncing the phrase "you're such a f—king sh—thead" is approximately 129 Hz, while the average frequency of M2's voice when uttering the phrase "...cowboy" is approximately 141 Hz. The difference can probably be attributed to the individual acoustics of each speaker's voice.
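As a minimal sketch of the arithmetic behind such figures, the snippet below averages a pitch track over its voiced frames and expresses the gap between the two reported averages in semitones. It does not call Praat. The function names, the demonstration track, and the convention that 0 marks an unvoiced frame are assumptions made for illustration only.

```python
import math


def mean_f0(track_hz):
    """Mean fundamental frequency over voiced frames (0 marks an unvoiced frame)."""
    voiced = [f for f in track_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in the selection")
    return sum(voiced) / len(voiced)


def semitone_gap(f_a, f_b):
    """Distance from f_a to f_b in semitones (12 semitones = one octave)."""
    return 12 * math.log2(f_b / f_a)


# Hypothetical pitch track in Hz, not taken from the recordings.
demo_track = [0.0, 120.0, 130.0, 0.0, 137.0]
demo_mean = mean_f0(demo_track)

# Gap between the two averages reported in the text (129 Hz and 141 Hz).
gap = semitone_gap(129.0, 141.0)
print(f"demo mean: {demo_mean:.1f} Hz, gap: {gap:.2f} semitones")
```

On this scale the two reported averages lie roughly one and a half semitones apart, which is one way to see why the difference may be put down to the speakers' individual voices.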

Figures 3 and 4 show the intonation of two phrases that are similar in meaning: "I'm so f—king sick of this sh—t" and "I'm so sick of...". Within the general context of the conversations, the phrases have a similar meaning: "to push someone to the brink of annoyance; to really get tired of." At the same time, the dictionary describes the verb "to be sick of" as colloquial, while the verb "to be f—king sick of" belongs to the non-normative lexicon of the Russian language. On the intonogram, these phrases have a similar phonetic pattern.


Fig. 3. The intonation pattern of M1's phrase "I'm so f—king sick of this sh—t"


Fig. 4. The intonation pattern of M2's phrase "I'm so sick of..."

The phonetic patterns of these short phrases are fairly similar, although the charts do indicate some variation in the frequency range and in the fluidity of the melodies. As we noted in the description of the first pair of charts, such variation may be due to differences in the speech mannerisms of the speakers, which can easily be seen in Figures 3 and 4. The average frequency of the speakers' voices when uttering the phrases was around 181 Hz in the case of M1 (Fig. 3), while the average frequency of M2's voice was around 530 Hz (Fig. 4), which can also be attributed to the individual acoustics of the voices.

On the whole, Praat allows us to create fairly detailed charts of intonation and to see the maximum and minimum frequency using an instrumental analysis of the uttered phrases. However, we met with several challenges during our work:

- The program is not always capable of constructing a pattern in cases where the speech does not have enough fluidity. In particular, M2 speaks at a quick tempo and pronounces his words sharply and brokenly. The chart of his intonation, as we can see in Figures 2 and 4, is an intermittent split-level line, inasmuch as the program recognizes only the maximum frequency, while only rarely recognizing the minimum;

- In order to ascertain the average frequency of an intonation contour, Praat suggests manually delineating the section for which the average frequency is to be calculated. For this section, it calculates the maximum and minimum frequency values. The average frequency is otherwise determined automatically for the entire chart. These values are displayed on the right-hand side of the program window, separate from the chart itself. However, during the automatic calculation of the average frequency the program experiences difficulties, as can be seen in Figure 4: the minimum (75 Hz) and the maximum (500 Hz) frequencies can be seen on the right side of the chart, but the average value is missing. Let us recall that M2's speech is abrupt, with a varying tempo. However, the program does not seem to experience difficulties when we manually divide his speech into sections.
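The manual division into sections described above can be imitated on an exported pitch track. The sketch below is not Praat's algorithm: it splits a frame-by-frame track into runs of voiced frames and reports statistics for each run, so that abrupt, interrupted speech still yields an average per section. The function names and the demonstration values are hypothetical. Reusing the 75 Hz and 500 Hz values shown for Figure 4 as the accepted pitch range is an assumption.

```python
def voiced_segments(track_hz, floor=75.0, ceiling=500.0):
    """Split a pitch track into runs of consecutive voiced frames.

    A frame counts as voiced when its value lies within [floor, ceiling];
    anything else (for example 0 for silence) ends the current run.
    """
    segments, current = [], []
    for f in track_hz:
        if floor <= f <= ceiling:
            current.append(f)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def segment_stats(track_hz, floor=75.0, ceiling=500.0):
    """Frame count, minimum, maximum, and mean frequency for each voiced run."""
    return [
        {"frames": len(s), "min": min(s), "max": max(s), "mean": sum(s) / len(s)}
        for s in voiced_segments(track_hz, floor, ceiling)
    ]


# Hypothetical interrupted contour in Hz (0 = no pitch detected).
demo = [0, 210, 230, 0, 0, 480, 0, 90, 110, 100]
stats = segment_stats(demo)
for row in stats:
    print(row)
```

Reporting one row per voiced run keeps isolated high frames, such as the single 480 Hz frame here, visible as separate sections instead of letting them distort one overall average.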

In our opinion, such deficiencies can be avoided in the future by creating a program, or modifying the existing one, to process oral speech automatically. One necessary condition for the functionality and viability of such software is the ability to process speech signals with individual characteristics, since living speech is not homogeneous. The manner of speaking of a subject, with the varying acoustic parameters of his voice, is his defining characteristic. Such programs should be developed not only for idealized speech recordings, but also for everyday colloquial speech that has no stable tempo and varies in volume and emotional saturation.

The second stage of the experiment with the selected materials was an attempt to capture the differences in emotional impact between the fragments of oral language and their accompanying intonation patterns. "As an integral part of the human psyche, the emotional-evaluation reaction inherently has a clearly stated national character. Representatives of different nations do not always perceive, understand, or interpret the same facts from their surroundings in an identical way" [Zavorotishcheva 2010]. Let us hypothesize that the sound of spoken language creates a general emotional state, which is intensified in certain parts, which in turn affect the interpretation of the speech as a whole. Accordingly, the methods of phonosemantic analysis should be capable of revealing this mood, which is set at the level of audio-color associativity.

A.S. Stern developed a rigorous method for discovering the ties between the connotative meaning of a text and its total phonetic meaning. The method is based on assessing how far the frequency of sounds in a text deviates from their usual frequency in speech [Stern 1967]. In spontaneous speech, sounds occur at a more or less fixed rate, and a speaker of the language intuitively "expects" to meet each sound a certain number of times, without consciously noticing it so long as the share of that sound stays within the norm. An increase in the informativity of sounds, for example through a subliminal change in the quantity of a rarely met sound within a section of speech (even in the absence of such devices of semantization as assonance or alliteration), manifests itself in the conscious (or the subconscious) of the listener, thereby underlining the phonetic meaning of the entire text. In this sense, it is logical to propose that the audio-color associativity that exists for every speaker of a national language lies dormant while sounds occur at their normal frequency. A deliberate (or non-deliberate) change in the frequency of occurrence of a sound could therefore activate this associativity, moving it from the level of the unconscious to the subconscious, and thereby depict the emotional background that is also connected with the nonconscious associativity of the colors of language [Prokofyeva, Durnova 2011]. Here it is important to recall that the established meanings of colors in the collective subconscious of Russian linguistic culture [Vasilevich, Kuznetsova, Mishchenko 2005; Yanshin 2001], combined with data from research on Russian national audio-color associativity, enable the recipient to register an overall setting of "mine" and "foreign," "good" and "bad" - in essence, an emotional coloring.
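The idea of measuring how far a sound's share in a text departs from its usual share can be sketched as follows. This is an illustration under stated assumptions and not a reconstruction of Stern's procedure: the deviation is scaled by a binomial standard deviation, the threshold of 2 is arbitrary, and the baseline frequencies are toy values, not Russian norms. All names are hypothetical.

```python
import math
from collections import Counter


def informative_sounds(text, baseline, threshold=2.0):
    """Flag symbols whose share in `text` deviates strongly from `baseline`.

    `baseline` maps each symbol to its usual relative frequency, strictly
    between 0 and 1. The deviation is measured in units of the binomial
    standard deviation sqrt(p * (1 - p) / n), which is an assumption here.
    """
    symbols = [ch for ch in text.lower() if ch in baseline]
    n = len(symbols)
    if n == 0:
        return {}
    counts = Counter(symbols)
    flagged = {}
    for sym, p in baseline.items():
        observed = counts.get(sym, 0) / n
        sigma = math.sqrt(p * (1 - p) / n)
        z = (observed - p) / sigma
        if abs(z) > threshold:
            flagged[sym] = round(z, 2)
    return flagged


# Toy three-symbol "language" with made-up usual frequencies.
toy_baseline = {"a": 0.5, "b": 0.3, "c": 0.2}
flagged = informative_sounds("ccccccccab", toy_baseline)
print(flagged)
```

In the toy fragment the normally rare symbol "c" is strongly over-represented and "a" is under-represented, so both are flagged, while "b" stays within the expected range and is not.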

Earlier research on intentionally semantized language products (spells, glossolalia, mantras, prayers, commercial advertisements, children's lullabies and the like) has shown that suggestiveness is carried out on just such a phonosemantic level, all the more so because the lexical level of many of the analyzed texts is by default obscured and the frequency is, in general, asemantic [Prokofyeva, Klemyonova 2012].

The automated analysis of the four speech fragments of M1 and M2 with the help of the program "Sound-color" (reg. #) showed inherent differences on the basis of phonosemantics, which demonstrates the interpretative potential of this method. M1's fragment "you're such a f—king sh—thead, f—k" was characterized as 22.22% black (Fig. 5), which is an almost unique example in our work (the only other such result came from the analysis of K. Balmont's poem "The boat of longing," and only in certain segments that contained alliteration on the letter "ch").

The breakdown of the fragment shows that the program, strictly speaking, did not detect any devices of semantization, but that it designated five sound-letters (D, U, Ya, B, YI) out of 19 as highly informative. Such phonosemantic "fluidity" is not rare in the traditional texts of spells, mantras, and prayers. It is also often met in poetic language, but we were the first to note it in conversational materials. In all fairness, it must be said that the invective and expletive lexicon has not yet been an object of specialized phonosemantic research, but even at first glance its substantial suggestive potential cannot go unnoticed, as its researchers constantly mention [Karasik 1992; Zhelvis 2001]. We must point out that neither the stressed (blue) "and" nor the (red) "I" could neutralize the black color of the text. It can be suggested that the psychological impact of the phrase leaves a negative emotional "stain" in the consciousness of the listener, which spreads to the neighboring syntagmas and, possibly, to the oral work as a whole. Without specialized research, however, it is of course difficult to confirm or deny this proposition. It is possible that the use of the personal pronoun "you (informal)" in combination with the invective and stylistically marked words strengthens the negative emotion and moves the speech from the abstract to the personal (even beyond the bounds of the speaker's intentions!).

Fig. 5. A diagram of the main colors based on analysis of M1's phrase "You're such a f—king sh—thead, f—k"
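The text does not describe how "Sound-color" arrives at its percentages. Purely as an illustration of what a percentage breakdown of colors could look like, the sketch below counts the color-associated symbols in a fragment and reports each color's share. The mapping, the symbols, and the fragment are invented and are not the program's data. The function name is hypothetical.

```python
from collections import Counter


def color_shares(symbols, color_of):
    """Percentage share of each color among the symbols that have a color.

    `color_of` is a hypothetical symbol-to-color mapping. Symbols without
    an entry are ignored.
    """
    colored = [color_of[s] for s in symbols if s in color_of]
    if not colored:
        return {}
    total = len(colored)
    return {color: round(100 * n / total, 2) for color, n in Counter(colored).items()}


# Invented mapping and fragment, for illustration only.
toy_map = {"x": "black", "y": "blue", "z": "red", "w": "white"}
toy_fragment = list("xxyyyzzwww")
shares = color_shares(toy_fragment, toy_map)
print(shares)
```

A breakdown of this kind makes it easy to compare fragments by their dominant colors, which is the kind of comparison the diagrams in Figures 5 and 6 present.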

An analysis of the next fragment of speech, M2's "Bush is a c—ksucking cowboy," which clearly contains an obscenity, nevertheless reveals for a Russian speaker an average of white-blue-red colors (Fig. 6), which is perceived as more neutral than the first example. The program also assigns high informativeness to several sound-letters (K, U, Ya, B, IJ) in the absence of semantizing devices such as alliteration or assonance. The emotional background of the fragment of speech is moderate, since the associative reaction does not allow it to be identified as "other" or, accordingly, "hostile."

Fig. 6. A diagram of the main colors based on the analysis of M2's phrase "Bush is a c—ksucking cowboy"

The research was experimental, and therefore it is still early to draw any global conclusions. However, even just on the basis of comparing the "Sound-color" program's analysis of the two phrases that are similar in meaning but different in linguistic representation, one can draw a preliminary conclusion about the possibilities for determining the emotional background of a particular oral speech without regard to its lexicological meaning.

References

1. Galyashina, E.I. (2006), Lingvistika vs ekstremizma: V pomoshch' sud'yam, sledovatelyam, ekspertam [Linguistics vs. extremism: helping judges, investigators, and experts], ed. by M.B. Gorbanevskiy, Moscow, 96 p.

2. Karasik, V.I. (1992), Yazyk sotsial'nogo statusa [Language of social status], Moscow, 330 p.

3. Prokofyeva, L.P. (2007), Zvuko-tsvetovaya assotsiativnost': universal'noe, natsional'noe, individual'noe [Sound and color associativity: universal, national, individual], Saratov, 280 p.

4. Prokofyeva, L.P., Durnova, N.A. (2011), Metodika fonosemanticheskogo issledovaniya rechi [Methods of phonosemantic research of a language]. Mova i kul'tura [Language and culture], Iss. 14, Vol. 8 (154), pp. 55-61.

5. Prokofyeva, L.P., Klemyonova, E.N. (2012), Tsvet i zvuk v politicheskoi reklame [Color and sound in political advertising]. Galeevskie chteniya [Galeevsky's readings], materials of the international academic and practical conference ("Prometheus" 2012), April 6-8, 2012, Kazan, pp. 269-275.

6. Shtern, A.S. (1967), Obyektivnye kriterii vyyavleniya effekta "zvukovoi simvoliki" [Objective criteria of identification of effect of "sound symbolics"]. Materialy seminara po probleme motivirovannosti yazykovogo znaka [Seminar materials on a problem of motivation of a language sign], Leningrad, pp. 69-73.

7. Vasilevich, A.P., Kuznetsova, S.N., Mishchenko, S.S. (2005), Tsvet i nazvaniya tsveta v russkom yazyke [Color and the names of colors in Russian language], Moscow, 216 p.

8. Yan'shin, P.V. (2001), Psikhosemanticheskii analiz kategorizatsii tsveta v strukture soznaniya subyekta [Psychosemantic analysis of categorizing colors in the structure of the consciousness of the subject], Author's abstract, Moscow.

9. Zavorotishcheva, N.S. (2010), Invektivy v sovremennoi razgovornoi rechi (na materiale pireneiskogo natsional'nogo varianta ispanskogo yazyka i amerikanskogo natsional'nogo varianta angliiskogo yazyka) [Invectives in modern conversational speech (On the material of Iberian Spanish and American English)], Author's abstract, Moscow, 26 p.

10. Zhel'vis, V.I. (2001), Pole brani. Skvernoslovie kak sotsial'naya problema v yazykakh i kul'turakh mira [Battlefield ('The Field of Abuse'). Foul language as a social problem in world languages and cultures], Moscow, 352 p.


About the authors: Matveeva Liubov Yurievna, postgraduate student of the Russian as a Foreign Language Chair

Saratov State Medical University named after V.I. Razumovsky 112 Bolshaya Kazachya ul., Saratov, 410012, Russia

E-mail: lyu-matveeva91@ya.ru

Prokofyeva Larisa Petrovna, Prof., Head of the Russian as a Foreign Language Chair

Saratov State Medical University named after V.I. Razumovsky 112 Bolshaya Kazachya ul., Saratov, 410012, Russia

E-mail: prokofievalp@mail.ru

Received: 10.06.2016


For citation: Matveeva, L.Yu., Prokofyeva, L.P. (2016), A depiction of oral speech (the potential of using automated research for the purposes of phonoscopic evaluation). Communication Studies, No. 2 (8), pp. 17-27.
