CORPUS-BASED APPROACH TO LANGUAGE TEACHING
Ergasheva Mokhira Bakhridinovna
Master of Arts; English language teacher at Moscow State University (Lomonosov) in Tashkent
https://doi.org/10.5281/zenodo.10113169
Abstract. This report describes an action research study on the use of a corpus-based approach in an ESL classroom. The author compiled a corpus on the topic "Food and Nutrition" using the Sketch platform and used it to create several activities, which are provided in the Appendix. Twenty-four upper-intermediate students participated, and the majority of them found the approach engaging. The data were collected by interviewing the participants after the study. According to the participants, the corpus is best applied to learning collocations both in and outside the classroom.
Keywords: corpus, corpora, concordance.
Introduction. Since language learning has become one of the main trends, a lot of attention has been paid to language teaching around the world. The need to learn languages and the development of technology encourage teachers to find modern ways of dealing with language problems. As a computer-based tool, corpus linguistics is focused on offering "a ready resource of natural, or authentic, texts for language acquisition" (Reppen, 2010). Johansson (2009) makes a solid case for the usefulness of corpora in language courses by demonstrating how they might influence exam preparation, textbooks, activities, and syllabus design. The use of corpora in language instruction can be a useful technique for teaching vocabulary, grammar, and language use to EFL learners (Dazdarevic et al., 2015).
The report aims to describe ways of introducing new vocabulary on the topic "Food and Nutrition" using a corpus-based approach. It first introduces the corpus, its type, and its usage; second, it provides a quantitative analysis of the corpus; third, it notes limitations; next, it reports the activities designed on the basis of the corpus; and finally, it reflects on the experience of using those activities.
Methodology. The corpus was compiled in 2022, and it contains authentic materials including guidebooks on food and nutrition, several magazines, some professional articles, and an encyclopedia on food. The data come from a single period (2010-2022), which makes the corpus synchronic. It is a written monolingual corpus of about a million words, which is quite small for its type (O'Keeffe, McCarthy, and Carter, 2007). The corpus was created for use in an ESL classroom, and in that sense it can be called a learner corpus; to be more precise, however, learner corpora consist of digital text produced by foreign language learners (Pravec, 2002). It is a specialized corpus, as it is limited to one topic area: food and nutrition. This corpus can therefore also serve the needs of English for Specific Purposes (ESP) learners, for example, dieticians and nutritionists.
Corpora are most often associated with quantitative studies because of the ease with which frequency analyses can be generated (Timmis, 2015). The table below (Figure 1) shows the most frequent words in the corpus.
Word (… items | 122,800 total frequency)
Rank  Word          Frequency
   1  and           37,001
   2  a             17,151
   3  are           7,952
   4  as            6,319
   5  an            2,776
   6  al            2,387
   7  also          2,337
   8  at            2,222
   9  about         1,400
  10  all           1,363
  11  am            1,279
  12  adults        1,093
  13  acid          943
  14  associated    932
  15  a.            916
  16  acids         914
  17  after         874
  18  among         870
  19  american      859
  20  although      859
  21  age           854
  22  any           600
  23  animal        647
  24  amount        590
  25  association   573
  26  activity      556
  27  americans     553
  28  addition      535
  29  available     507
  30  added         495
  31  animals       469
  32  agriculture   424
  33  amounts       417
  34  adolescents   388
  35  according     375
  36  aged          362
  37  average       359
  38  adequate      349
  39  another       345
  40  against       328
  41  ages          320
  42  absorption    317
  43  alcohol       314
  44  african       313
  45  around        313
  46  assoc.        290
  47  additional    272
  48  analysis      263
  49  amino         235
  50  academy       225
Figure 1. Food and Nutrition word frequency list.
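A word frequency list of this kind can be reproduced outside the corpus platform. The sketch below is a minimal illustration, assuming plain-text input and a simple regular-expression tokenizer; the function name `word_frequency` and the sample sentence are invented for this example and are not part of the study corpus or of the platform used by the author.

```python
import re
from collections import Counter


def word_frequency(text, top_n=10):
    """Return the top_n most frequent word forms as (word, count) pairs."""
    # Lowercase the text and keep alphabetic tokens, allowing internal
    # apostrophes or hyphens (e.g. "low-fat"). This is a simplification;
    # corpus tools use more elaborate tokenizers.
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return Counter(tokens).most_common(top_n)


# Invented sample text, NOT taken from the study corpus.
sample = (
    "Calcium intake and vitamin D intake are associated with bone health. "
    "Adults and adolescents need an adequate intake of calcium."
)

for rank, (word, count) in enumerate(word_frequency(sample, top_n=5), start=1):
    print(f"{rank:>2}  {word:<12} {count}")
```

With a real corpus, `sample` would be replaced by the concatenated text of the source documents, and the resulting list could be handed to students in printed form.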
The frequency lists below are drawn from the corpus and consist of lemmas. A lemma is the base form of a word (Timmis, 2015): for example, the lemma "eat" covers the word forms eats, ate, and eating.
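To make the idea of a lemma concrete, the following sketch counts lemmas instead of word forms. It assumes a tiny hand-made mapping (`LEMMA_MAP`, a hypothetical name) in place of a real lemmatizer, so it only illustrates the principle and does not reproduce how the corpus platform lemmatizes text.

```python
from collections import Counter

# Hypothetical hand-made mapping from word forms to lemmas.
# A real corpus tool would use a full lemmatizer instead.
LEMMA_MAP = {
    "eats": "eat", "ate": "eat", "eating": "eat", "eaten": "eat",
    "is": "be", "are": "be", "was": "be", "were": "be",
}


def lemma_frequency(tokens):
    """Count tokens after mapping each word form to its lemma."""
    return Counter(LEMMA_MAP.get(t.lower(), t.lower()) for t in tokens)


# Invented example sentence, already split into tokens.
tokens = "She eats fruit , he ate cheese and they are eating vegetables".split()
counts = lemma_frequency(tokens)
print(counts["eat"])  # eats + ate + eating are counted together
print(counts["be"])
```

Grouping word forms in this way is why a lemma list is shorter than a word list built from the same texts.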
It can be seen that the most frequent lemmas (Figure 2) are articles and prepositions. It is also evident that the materials used to build the corpus are mostly about health and diet; the word "study" indicates that some academic articles reporting studies are included too.
Lemma (19,357 items | 939,263 total frequency)
Rank  Lemma         Frequency
   1  the           37,433
   2  and           37,001
   3  of            33,225
   4  be            30,483
   5  in            23,596
   6  a             19,659
   7  to            19,482
   8  that          8,805
   9  for           8,512
  10  food          7,759
  11  or            7,097
  12  with          6,722
  13  calcium       6,565
  14  as            6,319
  15  milk          6,299
  16  have          5,865
  17  on            4,925
  18  from          4,327
  19  intake        4,785
  20  dairy         4,269
  21  diet          4,204
  22  it            4,182
  23  by            3,944
  24  fat           3,842
  25  study         3,690
  26  j.            3,491
  27  not           3,482
  28  vitamin       3,352
  29  than          3,007
  30  this          2,986
  31  other         2,930
  32  health        2,891
  33  can           2,872
  34  more          2,867
  35  bone          2,817
  36  dietary       2,780
  37  blood         2,664
  38  increase      2,643
  39  use           2,643
  40  effect        2,577
  41  may           2,529
  42  high          2,514
  43  which         2,456
  44  risk          2,454
  45  et            2,410
  46  al.           2,397
  47  also          2,337
  48  they          2,318
  49  mg            2,262
  50  al            2,222
Figure 2. Food and Nutrition frequency list of lemmas.
Figure 3 shows that the word "dietary" is used more often than "healthy"; it also shows the difference in frequency between "significant" and "important".
Adjective (4,636 items | 102,681 total frequency)
Rank  Lemma         Frequency
   1  dietary       2,714
   2  other         2,581
   3  high          2,344
   4  low           1,836
   5  more          1,731
   6  fat           1,489
   7  such          1,346
   8  many          1,247
   9  good          1,071
  10  total         870
  11  healthy       848
  12  old           815
  13  nutrient      795
  14  small         784
  15  less          753
  16  whole         737
  17  fatty         718
  18  low-fat       565
  19  large         663
  20  most          646
  21  same          636
  22  great         628
  23  important     616
  24  human         562
  25  raw           550
  26  several       519
  27  available     507
  28  sweet         505
  29  different     485
  30  similar       477
  31  fresh         475
  32  young         474
  33  due           472
  34  white         467
  35  common        459
  36  few           456
  37  clinical      441
  38  american      427
  39  beneficial    422
  40  early         418
  41  nutritional   418
  42  significant   409
  43  first         395
  44  saturated     368
  45  daily         368
  46  major         363
  47  adequate      349
  48  colorectal    349
  49  red           348
  50  soft          338
Figure 3. Food and Nutrition frequency list of adjectives.
The list below (Figure 4) shows that the word "intake" is used about four times more often than "consumption". Students can make assumptions about the frequency of words or lemmas on a particular topic and then check their hypotheses against the frequency lists.
Noun (13,322 items | 376,051 total frequency)
Rank  Lemma         Frequency
   1  food          7,759
   2  calcium       6,565
   3  milk          6,193
   4  intake        4,785
   5  dairy         4,269
   6  diet          4,110
   7  study         3,617
   8  vitamin       3,352
   9  health        2,891
  10  bone          2,769
  11  blood         2,661
  12  effect        2,570
  13  risk          2,444
  14  fat           2,328
  15  mg            2,262
  16  nutrition     2,096
  17  cancer        2,078
  18  day           2,031
  19  body          1,964
  20  year          1,912
  21  protein       1,895
  22  product       1,854
  23  disease       1,796
  24  child         1,781
  25  woman         1,770
  26  acid          1,728
  27  fruit         1,587
  28  weight        1,571
  29  lactose       1,562
  30  chapter       1,505
  31  pressure      1,475
  32  nutr          1,436
  33  cheese        1,421
  34  level         1,391
  35  cholesterol   1,387
  36  page          1,365
  37  water         1,264
  38  am            1,248
  39  adult         1,242
  40  source        1,183
  41  consumption   1,169
  42  nutrient      1,160
  43  foods         1,134
  44  calorie       1,101
  45  group         1,098
  46  vegetable     1,073
  47  part          1,053
  48  cup           1,006
  49  amount        1,006
  50  factor        976
Figure 4. Food and Nutrition frequency list of nouns.
Verb (2,455 items | 122,444 total frequency)
Rank  Lemma         Frequency
   1  be            30,481
   2  have          5,885
   3  increase      2,155
   4  use           2,002
   5  do            1,865
   6  reduce        1,738
   7  consume       1,589
   8  include       1,466
   9  eat           1,464
  10  find          1,445
  11  make          1,389
  12  contain       1,150
  13  associate     936
  14  serve         860
  15  show          860
  16  provide       847
  17  compare       780
  18  grow          780
  19  cook          755
  20  add           753
  21  lead          660
  22  need          651
  23  indicate      616
  24  see           611
  25  call          584
  26  age           582
  27  help          580
  28  follow        551
  29  produce       543
  30  recommend     541
  31  prevent       533
  32  contribute    502
  33  improve       459
  34  decrease      454
  35  suggest       454
  36  demonstrate   429
  37  develop       427
  38  come          426
  39  randomize     426
  40  know          410
  41  take          404
  42  feed          399
  43  become        384
  44  give          381
  45  accord        379
  46  influence     375
  47  meet          370
  48  report        363
  49  vary          355
  50  consider      351
Figure 5. Food and Nutrition frequency list of verbs.
Using the list above (Figure 5), learners can see the difference in frequency between "eat" and "consume" or between "provide" and "give". They can be taught that in an academic context "consume" and "provide" are used more often than their synonyms.
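The comparisons between near-synonyms can be made explicit with a short calculation. The sketch below takes the lemma frequencies reported in Figures 4 and 5 and computes how many times more frequent the first item of each pair is; the dictionary `FREQ` and the helper `ratio` are names introduced only for this illustration.

```python
# Lemma frequencies copied from Figures 4 and 5 of this report.
FREQ = {
    "intake": 4785, "consumption": 1169,  # nouns (Figure 4)
    "consume": 1589, "eat": 1464,         # verbs (Figure 5)
    "provide": 847, "give": 381,          # verbs (Figure 5)
}


def ratio(a, b, freq=FREQ):
    """Return how many times more frequent lemma a is than lemma b."""
    return freq[a] / freq[b]


for a, b in [("intake", "consumption"), ("consume", "eat"), ("provide", "give")]:
    print(f"{a} vs {b}: {FREQ[a]} vs {FREQ[b]} (ratio {ratio(a, b):.2f})")
```

The output confirms the roughly fourfold difference between "intake" and "consumption", and it also shows that the gap between "consume" and "eat" is much narrower than the gap between "provide" and "give".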
doc#0  Promote health and reduce chronic diseases associated Decrease sodium intake (2,400 milligrams or with diet and weight less daily) Weight status and growl
doc#0  rough health plans Nutrition counseling for medical conditions Increase fruit intake (2+ servings daily) Increase vegetable intake (3+ servings daily) Include nut
doc#0  3dical conditions Increase fruit intake (2+ servings daily) Increase vegetable intake (3+ servings daily) Include nutrition counseling in physician office visits Foot
doc#0  on counseling in physician office visits Food security Increase grain product intake (6+ servings daily) Increase access to nutritionally adequate and safe foods
doc#0  ase access to nutritionally adequate and safe foods Decrease saturated fat intake (less than 10% of calories) for an active, healthy life Decrease total fat intak
doc#0  itake (less than 10% of calories) for an active, healthy life Decrease total fat intake (no more than 30% of calories) 'Nutrition and Overweight is one focus area
doc#0  slight degree, so have deaths from some cancers.</s><s>On average, the intake of total fat and saturated fat has decreased.</s><s>Food labeling provides r
doc#0  at the recommended 5 servings of fruits and vegetables.</s><s>Overall, fat intake is decreasing (from 40 percent of calories in the late 1970s to 33 percent in
Figure 6. Food and Nutrition concordance lines for the word "intake".
The concordance lines above (Figure 6) help learners see how the word "intake" combines with its collocates. O'Keeffe et al. (2007) describe a concordance as a way "to find every occurrence of a particular word or phrase" (cited in Yilmaz and Soruç, 2015). Furthermore, the most frequent collocates, which occur within two or three words of the search word, can be seen in the concordance lines.
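Concordance lines of this kind (keyword in context, or KWIC) can be sketched in a few lines of code. The example below is a simplified illustration and not the algorithm used by the corpus platform; the function `kwic`, its `width` parameter, and the sample text (modelled loosely on the lines in Figure 6) are assumptions made for this example.

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: left context, [keyword], right context."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


# Invented sample text modelled on the concordance lines in Figure 6.
sample = (
    "Increase fruit intake to two or more servings daily. "
    "Decrease total fat intake to no more than 30% of calories."
)

for line in kwic(sample, "intake"):
    print(line)
```

Because the keyword is aligned in a single column, learners can scan the words immediately to its left and right, which is where the collocates appear.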
According to Jones and Durrant (2010), three essential points should be taken into consideration when applying word frequency lists in a pedagogic context. For this corpus, they imply that:
1. The corpus "Food and Nutrition" is intended only for report writing and should not be applied to speaking activities. The word frequencies might be irrelevant for learners of British English or other specific groups, as most of the journals are from American publishers.
2. The frequency lists generated above are about lemmas; however, words can also be treated similarly.
3. While creating activities for students, polysemous words were not dealt with since the essential meanings of the lemmas were used.
Although the corpus has many uses, its limitations should also be noted. First, the corpus was built to help students write about their diet; therefore, it includes mainly scientific articles and magazines. It does not benefit those who wish to see the frequency of a specific term in informal situations. Second, it is suitable only for high-intermediate and advanced learners and cannot be applied to elementary or low-intermediate classes.
Discussion. Based on this corpus, seven activities (see the Appendix) were designed for upper-intermediate ESL learners aged 17-18. The intended learning outcome of the lesson is that students should be able to use the new vocabulary in practice. The main aim of the activities is to familiarize students with the collocations so that they can use them in future report-writing tasks. The writing task should include the collocations discovered in the corpus and can also be assigned as homework. The activities offer the learners an opportunity to exploit the corpus directly, which enables Data-Driven Learning (DDL). Within the CLT paradigm, vocabulary learning is shifting away from teaching individual words and toward exposing students to lexical items in authentic and relevant situations (Balunda, 2009). Compared with typical vocabulary teaching approaches such as consulting a dictionary, viewing concordance lines has been shown to result in small but consistent gains in students' vocabulary knowledge, improved recall, and the learning of transferable word knowledge (Balunda, 2009). The beauty of DDL lies in its autonomy, as learners can discover new topics for themselves (Boulton, 2009).
The activities designed for this corpus can be used either in the classroom or as homework. The data can be accessed directly in the corpus or printed and distributed during the lesson. All activities involve forming hypotheses and testing them against further data, which turns the learners into researchers (Johns, 1991; Aston, 2001). This promotes learner autonomy, and the classroom becomes less teacher-centred (Gilquin et al., 2010). The activities cover the main collocations that need to be learned on the topic "Food and Nutrition": word combinations with "protein", "carb", "fat", "consume", "nutrition", "food", "intake" and "eat".
The activity types are gap-filling, guessing, and sentence-making, and they mostly require quantitative analysis of the corpus. However, the last activity can serve as an example of qualitative analysis, in which larger corpora such as the BNC or COCA are used for comparison.
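When frequencies from a small specialized corpus are compared with those from a much larger corpus, raw counts are usually converted to a common scale such as occurrences per million words. The sketch below shows that conversion; it assumes that the total frequency of 939,263 reported with Figure 2 can be used as the size of this corpus, and the function name `per_million` is introduced only for this example.

```python
def per_million(count, corpus_size):
    """Convert a raw frequency to occurrences per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count / corpus_size * 1_000_000


# Assumption: the total frequency reported with Figure 2 is used as corpus size.
CORPUS_SIZE = 939_263

# Raw lemma frequencies taken from Figure 4.
for lemma, count in [("intake", 4785), ("consumption", 1169)]:
    print(f"{lemma}: {per_million(count, CORPUS_SIZE):.1f} per million")
```

The same function can be applied to counts and corpus sizes obtained from a reference corpus, after which the two normalized figures are directly comparable.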
Conclusion. Reflecting on the experience of using the activities, several things should be noted. The corpus was introduced in two upper-intermediate classes; both classes received the activities in printed form. Out of 24 students, almost 80% were curious about the approach, which was new to them. The rest of the learners described it as "time-consuming". More than half of the students compiled a new corpus from written essays on other topics to see the most frequent collocations. The majority of the learners liked the concordance lines, and they suggested compiling a corpus of the listening transcripts from their course book. One of them noted that it would be great to see the forms of collocations in context. Using the corpus encouraged the students to become curiosity-driven learners (Aston, 2001). By the end of the lesson, the students had written reports that included the collocations.
The need for new ways of learning and teaching languages is increasing; therefore, employing a corpus-based approach in a pedagogical context can contribute to students' success. This report has described the application of a corpus designed for B2-level learners on the topic "Food and Nutrition". Although using this corpus had several benefits, some limitations were observed too. The activities were designed to introduce vocabulary only; it would be better to integrate some grammar-focused activities as well. I believe that in the future, a corpus-based approach to teaching will become one of the most widely applied.
REFERENCES
1. Almutairi, N.D. (2016). The Effectiveness of Corpus-Based Approach to Language Description in Creating Corpus-Based Exercises to Teach Writing Personal Statements. English Language Teaching, 9(7), p.103. doi:10.5539/elt.v9n7p103.
2. Aston, G. (2001). Learning with corpora. [online] Available at: https://www.semanticscholar.org/paper/Learning-with-corpora-Aston/08cfc19d84291306d240962f6d6a9115b810c264 [Accessed 4 Dec. 2022].
3. Balunda, S. (2009). Teaching academic vocabulary with corpora: student perceptions of data-driven learning. [online] Available at: https://scholarworks.iupui.edu/bitstream/handle/1805/2049/Balunda%20MA%20Thesis%20Teaching%20Academic%20Vocabulary%20with%20Corpora.pdf [Accessed 4 Dec. 2022].
4. Boulton, A. (2009). Testing the limits of data-driven learning: language proficiency and training. ReCALL, 21(1), pp.37-54. doi:10.1017/s0958344009000068.
5. Cheng, W. (2010). What can a corpus tell us about language teaching. [online] Available at: https://www.semanticscholar.org/paper/What-can-a-corpus-tell-us-about-language-teaching-Cheng/f0cb0b67a672486df52babc4b0ca2fa152f4f6eb [Accessed 4 Dec. 2022].
6. Dazdarevic, S., Fijuljanin, F. and Rastic, A. (2015). Using Corpus in Enhancing Reporting Verb Patterns in Teaching/Learning Process. Epiphany, 8(2), p.131. doi:10.21533/epiphany.v8i2.166.
7. Johansson, S. (2009). Some thoughts on corpora and second-language acquisition. Studies in Corpus Linguistics, pp.33-44. doi:10.1075/scl.33.05joh.
8. Johns, T. (1991). Should you be persuaded. Two samples of data-driven learning materials. [online] Available at: https://www.semanticscholar.org/paper/Should-you-be-persuaded.-Two-samples-of-data-driven-Johns/4b146bc51031fff7c159096da40524a2edbc098c.
9. O'Keeffe, A., McCarthy, M. and Carter, R. (2007). From corpus to classroom: language use and language teaching. Cambridge; New York: Cambridge University Press.
10. Pravec, N. (2002). Survey of learner corpora. [online] Available at: http://korpus.uib.no/icame/ij26/pravec.pdf [Accessed 4 Dec. 2022].
11. Reppen, R. (2010). Using Corpora in the Language Learning Classroom. Cambridge: Cambridge University Press.
12. Staples, S. (2013). Review: Gilquin, De Cock and Granger (2010) Louvain International Database of Spoken English Interlanguage. Louvain-la-Neuve, Belgium: Presses Universitaires de Louvain. Corpora, 8(2), pp.261-264. doi:10.3366/cor.2013.0043.
13. Yilmaz, E. and Soruç, A. (2015). The use of Concordance for Teaching Vocabulary: A Data-driven Learning Approach. Procedia - Social and Behavioral Sciences, 191, pp.2626-2630. doi:10.1016/j.sbspro.2015.04.400.