CORPUS-STYLISTIC ANALYSIS OF A FAREWELL TO ARMS AND TESS OF THE D’URBERVILLES

Research article in Linguistics and Literary Studies

CC BY
Keywords
CORPUS / STYLISTIC / LITERARY / COMPUTATION / WRITING FEATURE

Abstract of the research article in linguistics and literary studies. Author: Xu Yang

Corpus tools are widely used in the analysis of novels to reveal an author’s unique writing style. Hemingway had an excellent command of language and often expressed complex content in simple words. This study presents a corpus-stylistic analysis of Ernest Hemingway’s A Farewell to Arms and Thomas Hardy’s Tess of the D’urbervilles. Unlike literary criticism by a stylistician or a man of letters, this analysis is based on the electronic texts of the two works and studies the frequencies of particular words and of the word combinations detected by corpus software. With Tess of the D’urbervilles as a reference text, and based on the data obtained through a series of computations and tests, Hemingway’s main writing features in A Farewell to Arms are analyzed, which contributes to a better understanding of the novels.


Xu Yang,

Associate Professor, Department of English, School of Foreign Languages, Shenyang Ligong University. E-mail: [email protected]


Introduction

A Farewell to Arms (Arms), Hemingway's mature work, brought him the public and critical acclaim he had been seeking. Arms is so well plotted and its characters are so well portrayed that readers seem able to feel the rhythm of the hero's life, and their attention is therefore naturally drawn to the author's extraordinary writing techniques. This study adopts Tess of the D'urbervilles (Tess) as a reference text and uses FoxPro, SPSS and other tools to build the corpora, with the aim of analyzing Hemingway's writing features in Arms.

Data collection

Now that computer technologies are updated rapidly, many classic novels are available in electronic form. The online editions of A Farewell to Arms and Tess of the D'urbervilles were downloaded from www.read.blabla.cn and www.classicreader.com. The quantitative data for this study were obtained by running a FoxPro program package. The first step was to break the original texts into individual words, which were stored in separate tables. Word length counts for both texts were then computed and stored in newly created tables. Because the words in the tables were word tokens, which cause problems in linguistic counting, two lemmatization processes were performed to obtain the list of word types by grouping the inflected forms of a word into one type. Word frequencies were then computed, and both the top-frequency words and the words occurring only once could be identified from the word-frequency tables.
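The author's FoxPro programs are not reproduced in the paper. As a rough illustration only, the tokenization and word-frequency steps described above could be sketched in Python as follows. The regular expression, the function names and the sample sentence are assumptions made for this sketch, and lemmatization is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Split raw text into lowercase word tokens.

    Assumption: a token is a run of letters that may contain internal
    apostrophes or hyphens (e.g. "didn't", "self-destruction").
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def frequency_table(tokens):
    """Count how often each distinct word form occurs."""
    return Counter(tokens)


def words_occurring_once(freq):
    """Return, in alphabetical order, the words with a frequency of 1."""
    return sorted(word for word, count in freq.items() if count == 1)


# Hypothetical sample sentence, not taken from either novel.
sample = "The rain fell. The rain did not stop, and we didn't go out."
tokens = tokenize(sample)
freq = frequency_table(tokens)
once = words_occurring_once(freq)
```

A real replication would add a lemmatization step between `tokenize` and `frequency_table`, so that inflected forms are counted under one type as the paper describes.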

In addition, the words occurring in both Arms and Tess and those exclusive to either text were separated and stored in different tables by another subprogram that compares the word types of the two texts. To obtain sentence length information, a subprogram was run to break the two texts into separate sentences, which were then imported into a new table, with the corresponding sentence length recorded in the adjacent column. The resulting data were then rearranged in Microsoft Excel as required for the data analysis in this research.
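The comparison of word types described above amounts to a few set operations. The following Python sketch shows one way to do it; it is not the author's FoxPro subprogram, and the function name and the toy word lists are hypothetical.

```python
def compare_vocabularies(types_a, types_b):
    """Split two sets of word types into shared and text-specific parts.

    Returns a tuple (shared, only_in_a, only_in_b).
    """
    shared = types_a & types_b      # types occurring in both texts
    only_in_a = types_a - types_b   # types exclusive to the first text
    only_in_b = types_b - types_a   # types exclusive to the second text
    return shared, only_in_a, only_in_b


# Toy word lists invented for illustration only.
arms_types = {"the", "rain", "priest", "ambulance"}
tess_types = {"the", "rain", "dairy", "vale", "milkmaid"}

shared, only_arms, only_tess = compare_vocabularies(arms_types, tess_types)
```

By construction, the shared and text-specific counts of each text add up to its total number of word types, which is the relation that holds between the rows of Table 1.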


Data analysis

I. Vocabulary richness

The most typical switch of varieties involves turning to the particular set of lexical items habitually used for handling the field in question. Texts are composed of words, but the vocabulary differs across registers and writing styles. One can often tell by intuition that one text is stylistically different from another from a glance at its vocabulary.

In this section, we attempt to reveal which of the two texts has the greater vocabulary size. By sample size we mean the number of word tokens in a sample or text. In contrast, vocabulary size refers to the number of different word types occurring in a sample of a given number of tokens (Frequency). Table 1 shows the sample size, vocabulary size and type/token ratio of Arms and Tess, together with the number of word types among the overlapping words and the text-specific words.

Table 1

| | A Farewell to Arms | Tess of the D'urbervilles |
|---|---|---|
| Number of word tokens | 88967 | 150834 |
| Number of word types | 6704 | 14820 |
| Type/token ratio | 0.0753 | 0.0983 |
| Number of overlapping types | 2008 | 2008 |
| Number of text-specific types | 4696 | 12812 |

Comparing the size of the two books, we find that Arms has 88967 word tokens, while Tess has 150834, that is, 61867 more. The number of text-specific word types in Arms (4696) is likewise smaller than that in Tess (12812). Arms is much shorter than Tess, so it is to be expected that it has the smaller vocabulary. The type/token ratio of Arms is 0.0753, which is smaller than that of Tess (0.0983). Nevertheless, the type/token ratio is heavily influenced by sample size, so this difference should not be given much weight either. These statistics are provided here for reference only.

The number of overlapping word types amounts to 2008, accounting for 29.95% of the total word types in Arms and 13.55% of those in Tess. Arms contains 6704 distinct word types, compared with 14820 in Tess.
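The derived figures in this section can be rechecked from the raw counts in Table 1. The short Python sketch below does this; the variable and function names are chosen here for illustration, and only the counts themselves come from the paper.

```python
# Raw counts taken from Table 1.
arms_tokens, arms_types = 88967, 6704
tess_tokens, tess_types = 150834, 14820
overlapping_types = 2008


def type_token_ratio(types, tokens):
    """Number of distinct word types divided by number of word tokens."""
    return types / tokens


def share_in_percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


arms_ttr = type_token_ratio(arms_types, arms_tokens)
tess_ttr = type_token_ratio(tess_types, tess_tokens)

token_difference = tess_tokens - arms_tokens          # 61867
arms_specific = arms_types - overlapping_types        # 4696
tess_specific = tess_types - overlapping_types        # 12812
arms_overlap_share = share_in_percent(overlapping_types, arms_types)  # 29.95
tess_overlap_share = share_in_percent(overlapping_types, tess_types)  # 13.55
```

The two ratios agree with the values reported in Table 1 to within rounding in the fourth decimal place, and the ratio for Arms is the smaller of the two.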

Examination of the data shows that overlapping words tend to have greater frequency than the text-specific vocabulary items in either Arms or Tess (Table 2).

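Table 2. The Overlapping Words in Two Corpora and Their Respective Frequencies (first 30 entries)

| Word | Arms freq | Tess freq | Word | Arms freq | Tess freq |
|---|---|---|---|---|---|
| a | 1792 | 3174 | administration | 1 | 1 |
| able | 14 | 24 | admiration | 1 | 8 |
| about | 195 | 202 | admired | 2 | 4 |
| absolutely | 3 | 10 | advance | 4 | 7 |
| abstract | 1 | 1 | advantage | 1 | 14 |
| accent | 2 | 8 | advice | 1 | 1 |
| accident | 3 | 14 | affaire | 1 | 1 |
| account | 1 | 44 | affection | 1 | 22 |
| ache | 1 | 9 | affectionate | 1 | 3 |
| acquaintance | 1 | 2 | afford | 1 | 5 |
| across | 83 | 41 | afraid | 46 | 14 |
| act | 3 | 19 | after | 111 | 189 |
| action | 3 | 5 | afternoon | 21 | 44 |
| actual | 1 | 4 | again | 101 | 137 |
| address | 6 | 12 | against | 44 | 106 |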

We also examined the overlap between the vocabulary of the two works and the CET4 and CET6 word lists. For Arms, the numbers of overlapping word types are 1890 and 2027, respectively, which account for 28.19% and 30.24% of its total word types. In contrast, the corresponding shares in Tess are 20.16% and 23.60%. Arms thus has a smaller vocabulary, but a larger proportion of it belongs to the CET4 and CET6 word lists. Therefore, compared with Hardy, Hemingway used more of the simple words included in CET4 and CET6.

II. Word frequency distribution

Linguists and researchers working in the field of quantitative stylistics and on quantitative aspects of lexical structure regard the statistical analysis of word frequency distribution as a reliable discriminating factor in characterizing the works of a particular author. Word frequency distributions are characterized by a relatively small number of common high-frequency words and a very large number of rare words (Table 3).

Table 3. The 50 most frequent words in Arms and Tess

| Rank | Arms | Freq | Tess | Freq | Rank | Arms | Freq | Tess | Freq |
|---|---|---|---|---|---|---|---|---|---|
| 1 | The | 6177 | The | 8664 | 26 | She | 446 | By | 768 |
| 2 | And | 3187 | Of | 4336 | 27 | Out | 444 | Which | 763 |
| 3 | I | 2816 | And | 4260 | 28 | Be | 428 | Tess | 708 |
| 4 | To | 1890 | To | 4030 | 29 | Up | 428 | From | 695 |
| 5 | A | 1792 | A | 3170 | 30 | All | 411 | Have | 695 |
| 6 | Was | 1448 | Her | 2774 | 31 | For | 392 | This | 633 |
| 7 | You | 1371 | In | 2437 | 32 | Is | 385 | Their | 605 |
| 8 | In | 1326 | She | 2174 | 33 | Went | 385 | Were | 605 |
| 9 | It | 1240 | Was | 2113 | 34 | My | 382 | Said | 582 |
| 10 | Said | 1223 | Had | 1767 | 35 | Very | 373 | So | 582 |
| 11 | Of | 1122 | That | 1756 | 36 | Go | 351 | Him | 578 |
| 12 | We | 841 | He | 1733 | 37 | Down | 342 | Is | 529 |
| 13 | He | 771 | As | 1577 | 38 | Don't | 312 | Been | 512 |
| 14 | On | 771 | It | 1554 | 39 | Would | 306 | Would | 499 |
| 15 | Were | 649 | I | 1420 | 40 | Her | 300 | If | 467 |
| 16 | With | 614 | You | 1330 | 41 | Are | 292 | One | 464 |
| 17 | Not | 600 | His | 1250 | 42 | One | 282 | Could | 463 |
| 18 | They | 595 | Not | 1185 | 43 | Do | 281 | All | 453 |
| 19 | That | 563 | For | 1039 | 44 | His | 278 | No | 452 |
| 20 | Had | 551 | At | 1021 | 45 | Them | 277 | There | 435 |
| 21 | Me | 545 | With | 1003 | 46 | When | 277 | When | 427 |
| 22 | At | 495 | On | 879 | 47 | Will | 276 | Me | 425 |
| 23 | There | 472 | They | 849 | 48 | Then | 273 | My | 410 |
| 24 | But | 152 | But | 793 | 49 | Catherine | 269 | Or | 407 |
| 25 | Have | 150 | Be | 781 | 50 | Could | 264 | An | 369 |

Word frequencies in Arms range from 1 to 6177 across 6704 word types. The most frequent word is the, and it is the only word with such a particularly high frequency. In contrast, word frequencies in Tess are relatively higher, ranging from 1 to 8664 across 14820 word types. In both cases, the most frequent word is the. As the whole word frequency spectrum of Arms and Tess cannot be presented here, only the 50 most frequent words in the two texts are analyzed.
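A ranked list of this kind, together with the frequency spectrum (how many words occur once, twice, and so on), can be derived from a token list in a few lines. The Python sketch below is an illustration under assumed names and a toy token list; it is not the procedure the author ran in FoxPro.

```python
from collections import Counter


def top_words(tokens, n):
    """Return the n most frequent words as (rank, word, frequency) rows."""
    freq = Counter(tokens)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(freq.most_common(n), start=1)
    ]


def frequency_spectrum(tokens):
    """Map each frequency value to the number of word types that have it."""
    freq = Counter(tokens)
    return Counter(freq.values())


# Toy token list invented for illustration only.
tokens = "the rain and the mud and the road".split()
ranking = top_words(tokens, 2)
spectrum = frequency_spectrum(tokens)
```

In the toy example one word occurs three times, one occurs twice and three occur once, which mirrors on a very small scale the pattern of few high-frequency words and many rare words.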

III. Word length

Word length has also been used as a stylistic indicator. The FoxPro program WORDLENGTH was used to calculate word lengths and their corresponding frequencies. After the raw data had been processed, the mean word length of each corpus was obtained; it is listed in Table 4.

Table 4. Mean Word Length (in letters) of Each Corpus

| Corpus | Arms | Tess |
|---|---|---|
| Mean word length of each corpus | 3.89 | 4.20 |

The mean word length of Arms is 3.89 letters, while that of Tess is 4.20. In contrast to Hardy, Hemingway used shorter and simpler words to express the theme of the work, which reflects the concise and simple writing style he is famous for.
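The WORDLENGTH program itself is not shown in the paper. A minimal Python sketch of the same kind of calculation is given below; it assumes that every character of a token counts toward its length, and the function names and sample tokens are hypothetical.

```python
from collections import Counter


def mean_word_length(tokens):
    """Average number of characters per word token."""
    if not tokens:
        raise ValueError("token list is empty")
    return sum(len(token) for token in tokens) / len(tokens)


def word_length_distribution(tokens):
    """Map each word length to the number of tokens of that length."""
    return Counter(len(token) for token in tokens)


# Sample tokens invented for illustration only.
tokens = ["we", "went", "down", "the", "road"]
mean_length = mean_word_length(tokens)
distribution = word_length_distribution(tokens)
```

Because the mean is taken over tokens rather than types, frequent short words such as the and and pull the average down strongly.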

The 20 longest words in Arms and Tess are presented below in descending order of word length (Table 5).

From the long words presented in Table 5, it is noticeable that most of them are formed through compounding, and that the exceptions are heavily affixed words. Based on these results, an analysis is made of the two modes of word formation, compounding and affixation, and the exceptions are discussed as well. Compounds usually comprise only two bases, but in a relatively minor class of items more bases may be involved, which is exactly what happens in the words identified in the two corpora.

Table 5. The longest words in Arms and Tess

| Words in Arms | Word length | Words in Tess | Word length |
|---|---|---|---|
| Chuh-chuhchuh-chuhthen | 22 | i've-got-a-gr't-family-v | 25 |
| Ninety-four-year-old | 20 | Knighted-forefathers-in-l | 25 |
| Convalescing-leave | 19 | Kingsbere-sub-greenhill | 24 |
| Lieutenant-colonel | 18 | That-it-may-please-thees | 24 |
| Self-consciousness | 18 | Kingsbere-sub-greenhill | 23 |
| Nurse's-eveningoff | 18 | Violety-bluey-blackish | 23 |
| Morning-sickness | 17 | Momentarily-imprisoned | 22 |
| Self-destruction | 17 | Stoke-d'urbervilles | 19 |
| Leather-cushioned | 17 | Undemonstrativeness | 19 |
| Stretcher-bearers | 17 | Strongly-disturbing | 19 |
| Sergeant-adjutant | 17 | Self-consciousness | 18 |
| Misunderstandings | 17 | Chamber-companions | 18 |
| Christiansexcept | 16 | Gentleman-farmer's | 18 |
| Court-martialled | 16 | Self-preservation | 17 |
| Doctorinterested | 16 | Landscape-painter | 17 |
| English-speaking | 16 | Controversialists | 17 |
| Tenentecolonello | 16 | Daughter-in-law's | 17 |
| Girlsaccompanied | 16 | Faith-professions | 17 |
| Skilful-looking | 15 | Self-chastisement | 17 |

In identifying compounds, some arbitrariness has been involved in the attribution of certain words. In the following categorization, some items, though they resemble words formed by prefixation, are nevertheless treated as combining forms in compounds, for example self in Self-destruction and self in Self-preservation. The other attributions strictly follow A Comprehensive Grammar of the English Language. Recursive compounding is understood as one compound becoming a constituent of a larger one.
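As a very rough first pass, the number of hyphen-separated parts in a word can be counted automatically. The Python sketch below is only a proxy and an assumption of this illustration: it does not detect solid compounds, it does not distinguish bases from affixes or combining forms, and the function names are hypothetical. The attribution of items still has to be made by hand, as described above.

```python
def hyphen_parts(word):
    """Split a word at hyphens and drop any empty pieces."""
    return [part for part in word.split("-") if part]


def count_hyphen_parts(word):
    """Number of hyphen-separated parts, a crude proxy for the number of bases."""
    return len(hyphen_parts(word))


# Three items taken from the list of longest words.
examples = ["Ninety-four-year-old", "Lieutenant-colonel", "Controversialists"]
part_counts = {word: count_hyphen_parts(word) for word in examples}
```

Under this proxy, Ninety-four-year-old has four parts and Lieutenant-colonel has two, while the affixed word Controversialists is a single unit.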

IV. Sentence Length

Sentence length helps us understand a writer's style. The longer the sentences in a novel, the more complicated the text; the shorter the sentences, the more easily readers understand it. Here we use FoxPro to measure the number and length of the sentences in the two novels (Table 6).

Table 6. The Average Sentence Length

| | Arms | Tess |
|---|---|---|
| Total sentences | 10366 | 6386 |
| Average sentence length (words) | 8.58 | 23.62 |

In Arms, 88967 word tokens form 10366 sentences, and the average sentence length is just 8.58 words, while in Tess, 150834 word tokens form only 6386 sentences, and the average sentence length reaches 23.62 words.
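The averages above follow directly from dividing the token counts by the sentence counts. The Python sketch below rechecks them and also shows one naive way to split a text into sentences. The splitting rule (a break after ., ! or ? followed by whitespace) and the function names are assumptions of this sketch; the paper does not describe the rule used by its FoxPro subprogram, and abbreviations or quoted dialogue would need more careful handling.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def average_sentence_length(text):
    """Mean number of whitespace-separated words per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text contains no sentences")
    return sum(len(sentence.split()) for sentence in sentences) / len(sentences)


# Hypothetical sample text, not taken from either novel.
sample = "It was raining. We went down the road! Did it stop?"
sentences = split_sentences(sample)
sample_average = average_sentence_length(sample)

# Recheck of the reported averages from the token and sentence counts.
arms_average = round(88967 / 10366, 2)   # 8.58
tess_average = round(150834 / 6386, 2)   # 23.62
```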

All this illustrates that Hemingway's writing style, like that of the Bible, seems easy to understand, but in fact a deeper symbolic meaning lies under the superficial simplicity. Although all the critics agree on the simplicity and apparent naturalness of his prose and on its directness, clarity and freshness, the superficial simplicity is deceptive in nature. This is so because he always manages to choose words that are concrete, specific, more commonly found, more often of Anglo-Saxon origin, casual and conversational, and he employs them in a syntax of short, simple sentences, which are orderly and patterned, conversational, and sometimes ungrammatical. Hemingway's strength lies in those short sentences and very specific details, which are powerfully loaded with the tension he sees in life. His abhorrence of the vague and general is extreme.

Conclusion

Access to computerized corpora of Arms and Tess made possible statistically based descriptions of the lexicon and sentences in terms of vocabulary richness, word frequency distribution, word length and sentence length. Based on the analysis above, we can now come to the following conclusions:

Examination of the empirical data shows that there is linguistic evidence that validates the suspicion about the different styles of Arms and Tess. With respect to vocabulary richness, Tess is shown to make more abundant use of the lexical resources of English. The 88967 tokens of Arms cover 6704 different word types, whereas Tess, with 150834 word tokens, contains 14820 distinct word types. The two texts share 2008 word types, which account for 29.95% of all word types in Arms and 13.55% in Tess. The data show that overlapping words tend to have greater frequency than the text-specific vocabulary items in either Arms or Tess. That is to say, in most cases a basic word item from the overlapping vocabulary will have greater frequency than one from the text-exclusive vocabulary. In addition, words of high frequency tend to have higher degrees of coreness than their lower-frequency partners within the same lexical set.

The mean word length in Arms is 3.89, while it is 4.20 in Tess. There is indeed a difference between the two corpora, and we may conclude that the two novels do differ from each other in word length. This is hard evidence that Hemingway's stylistic use of words in his mature period relies on short words to achieve conciseness and simplicity.

From the examination of long words, it is noticeable that a disproportionately large number of them are formed through compounding, and some even consist of five bases. The exceptions are, up to the word length level presented here, heavily affixed words.

With regard to sentence length in the two corpora, we can see that Hemingway wrote Arms with a mean sentence length of 8.58 words, whereas Hardy wrote Tess with a mean sentence length of 23.62 words. The word types shared with the CET4 and CET6 vocabularies account for 28.19% and 30.24% in Arms and for 20.16% and 23.60% in Tess. All this illustrates that Hemingway's writing style is deliberate and polished and is never as natural as it seems to be, and its simplicity can be disastrously deceptive, as it is highly suggestive and connotative and capable of offering layers of undercurrents of meaning. In this preliminary corpus-based study of two works, both the procedure and the results demonstrate that, despite their crudeness and weakness in many respects, statistical and computing techniques contribute to literary study and to the analysis of texts.


References

1. Hemingway, Ernest. A Farewell to Arms. www.read.blabla.cn

2. Hardy, Thomas. Tess of the D'urbervilles. www.classicreader.com

3. Quirk, R. et al. A Comprehensive Grammar of the English Language. Longman Group Ltd, 1985.

4. Bloom, Harold. Ernest Hemingway. Beijing: Foreign Language Teaching and Research Press, 1992.

5. Messent, Peter. Ernest Hemingway. London: The Macmillan Press Ltd, 1992.

6. Donaldson, Scott. New Essays on A Farewell to Arms. London: Cambridge University Press, 1990.

7. Honore, A. Some simple measures of richness of vocabulary. Association for Literary and Linguistic Computing Bulletin, 1979.

8. Francis, W. N. and Kucera, H. Frequency Analysis of English Usage: Lexicon and Grammar. Boston: Houghton Mifflin, 1982.
