Methodology of data extraction from a corpus for the conceptual analysis of metaphor in legal English

Kucheruk Liliya Vladimirovna

Section 13. Philology and linguistics

Kucheruk Liliya Vladimirovna, Oles Honchar Dnipropetrovsk National University, postgraduate student

E-mail: lkucheruk@mail.ru

Abstract: The present paper investigates the main approaches to the study of metaphor in legal language from the point of view of cognitive linguistics. It also surveys the most widespread methods of data extraction from a corpus.

Key words: corpus-based approach, conceptual mapping, source domain, target domain, corpora.

In the last 20 years, corpus-based approaches to language have achieved great significance in linguistics and are now regarded as an indispensable component of language study, irrespective of theoretical affiliation. Corpus-driven studies are required in research fields that consider different levels of linguistic structure and focus on language use.

However, studies in cognitive semantics, especially those concerned with conceptual mapping, are still in dire need of more corpus research, as many of them are still carried out on randomly selected data. Such an approach to the investigation of conceptual mappings, and of conceptual metaphors in particular, may eventually create problems, especially when the aim of the research is a systematic characterization of a particular conceptual mapping, or of its source or target domain. It is therefore necessary to ground the research in representative empirical data that is authentic and can be accessed via relevant corpora. Only empirical data will “enable the linguist to make statements which are objective and based on language as it really is”. Such statements are to be contrasted with more “subjective” statements “based upon the individual’s own internalized cognitive perception of the language” [2, 103].

The first problem to be solved when applying a corpus-based approach to the investigation of metaphors is that of extracting and identifying the data in the corpus. Extracting relevant material from a corpus is extremely difficult in the case of metaphors, as conceptual mappings are not tied to particular linguistic forms. “Computer programs can organize language data swiftly and accurately on orthographic principles, but identifying and describing features such as grammatical patterns, meaning, and pragmatic use can only be done by a human analyst” [1, 92]. A researcher who is investigating particular linguistic units or patterns therefore has to look through a considerable amount of linguistic material, searching for definite manifestations of the patterns in question. Relevant data can be extracted and identified in a corpus by means of the following procedures:

1. manual searching;

2. searching for the vocabulary of a source domain;

3. searching for the vocabulary of a target domain;

4. searching for sentences containing items from both the source and target domains;

5. extraction from a corpus annotated for semantic fields;

6. extraction from a corpus annotated for conceptual mappings;

7. searching for metaphors based on “markers of metaphors”.

Manual search consists of starting from a small corpus, or from a small part of an existing corpus, and searching it manually, marking all the metaphors one comes across. One then proceeds to a larger corpus and searches it for the marked metaphors. This is a rather efficient method: a small corpus, or part of a corpus, can be read entirely and thoroughly to identify all the metaphors in it, and searching for them in a large corpus yields more generalizable linguistic results. On the other hand, this method of retrieving relevant material greatly limits the potential size of the corpus.
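The second stage of the manual procedure, searching a larger corpus for expressions already marked by hand, can be sketched as follows. This is only a minimal illustration under stated assumptions: the seed expressions and the two miniature "documents" are invented for the example, the function name is hypothetical, and the match is a simple whole-phrase, case-insensitive search that does not cover inflected variants.

```python
import re
from collections import Counter


def count_seed_expressions(seed_expressions, corpus_texts):
    """Count how often each manually marked expression occurs in a larger corpus."""
    counts = Counter()
    for expression in seed_expressions:
        # Whole-phrase, case-insensitive match; inflected forms are not handled.
        pattern = re.compile(r"\b" + re.escape(expression) + r"\b", re.IGNORECASE)
        for text in corpus_texts:
            counts[expression] += len(pattern.findall(text))
    return counts


# Hypothetical seed list, as if marked by hand in a small sample.
seeds = ["burden of proof", "body of law"]

# Hypothetical larger corpus: two tiny invented documents.
large_corpus = [
    "The burden of proof rests with the claimant.",
    "This body of law developed slowly; the Burden of Proof shifted later.",
]

counts = count_seed_expressions(seeds, large_corpus)
for expression, frequency in counts.items():
    print(expression, frequency)
```

Every hit still has to be inspected by the analyst, since a string match alone cannot distinguish metaphorical from literal uses.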

As for extraction from a corpus annotated for semantic fields, this procedure involves searching for linguistic items of the source and target domains that have previously been tagged in the corpus. Using this method, a researcher can specify a particular source domain and analyze all the lexical items related to it, instead of manually compiling lists of lexical expressions, a somewhat tedious and often frustrating process that usually yields incomplete lists. The analysis of target domains, and the search for sentences containing items from both the potential source and target domains, can be carried out in the same way. The main drawback of this strategy is that annotated corpora are rarely available. Another disadvantage is that the researcher must rely almost exclusively on previously existing annotations. In addition, semantically annotated corpora might not include the semantic fields relevant to a particular piece of research.
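Retrieval from a semantically annotated corpus can be sketched as follows, assuming that every token already carries a semantic-field tag. The `Token` type, the tag names (`LAW`, `MEASUREMENT`, `OBJECT`, `NONE`) and the two sentences are hypothetical and do not reproduce the tagset of any existing corpus.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    field: str  # hypothetical semantic-field tag assigned by the annotators


def items_in_field(sentences, field):
    """Return (sentence index, word) pairs for every token tagged with `field`."""
    hits = []
    for index, sentence in enumerate(sentences):
        for token in sentence:
            if token.field == field:
                hits.append((index, token.word))
    return hits


# Two invented, pre-tagged sentences.
tagged_sentences = [
    [Token("the", "NONE"), Token("court", "LAW"), Token("weighed", "MEASUREMENT"),
     Token("the", "NONE"), Token("evidence", "LAW")],
    [Token("the", "NONE"), Token("clerk", "LAW"), Token("weighed", "MEASUREMENT"),
     Token("the", "NONE"), Token("parcel", "OBJECT")],
]

measurement_hits = items_in_field(tagged_sentences, "MEASUREMENT")
print(measurement_hits)
```

Both occurrences of "weighed" are retrieved, although only the first is a plausible metaphor candidate; the tag narrows the search but does not replace the analyst's judgment.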

Extraction from a corpus annotated for conceptual mappings presupposes a corpus in which such mappings have already been marked. Unfortunately, there is no such corpus, as its creation is fraught with difficulties and complications. The conceptual annotation of a corpus raises the following issues:

1. definition of a reliable procedure for discovering instances of the phenomenon in question;

2. definition of the attributes that are considered relevant for each instance and the set of values that each of these attributes can take as well as guidelines as to how these values are to be assigned;

3. definition of an annotation format [5, 10].

In the present circumstances, the first two requirements are hard to fulfill because there is no general approach to the identification of metaphors in a text. A researcher must therefore rely, to a great extent, on intuition, knowledge, and general experience. The identification of metaphors might be an easy task in some exceptionally clear cases. Even then, it would still be a challenge to provide complete and accurate lists of metaphoric expressions. Whether the identification is simple or complex, the annotator must justify on theoretical grounds the criteria applied during the annotation process. In general, the annotation is tied to a specific research project, which means that there is still a long way to go before multi-purpose annotated corpora are designed in the field of metaphor research. The problem of choosing the relevant attributes for a specific linguistic phenomenon also arises during the process of conceptual annotation. In the case of metaphor and metonymy, for example, there are some general attributes for annotation, such as the source and target domains. Others include metaphoricity, metonymicity, degree of conventionality, the reason for using the metaphor, etc.
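To make the second and third issues more concrete, one possible shape of an annotation record is sketched below. It is an illustration only: the class name, the attribute names, the permitted values and the example record are all assumptions made for this sketch and do not describe an established annotation scheme.

```python
from dataclasses import dataclass, asdict

# Hypothetical value sets; a real project would define them in its guidelines.
KINDS = ("metaphor", "metonymy")
CONVENTIONALITY_VALUES = ("conventional", "novel")


@dataclass
class MappingAnnotation:
    expression: str        # the annotated stretch of text
    source_domain: str
    target_domain: str
    kind: str              # one of KINDS
    conventionality: str   # one of CONVENTIONALITY_VALUES

    def __post_init__(self):
        # Reject values that the (hypothetical) guidelines do not allow.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
        if self.conventionality not in CONVENTIONALITY_VALUES:
            raise ValueError(f"unknown conventionality: {self.conventionality!r}")


# Invented example record.
record = MappingAnnotation(
    expression="weigh the evidence",
    source_domain="WEIGHING",
    target_domain="LEGAL REASONING",
    kind="metaphor",
    conventionality="conventional",
)
print(asdict(record))
```

Fixing the permitted values in advance is one way of making explicit how attribute values are to be assigned; it does not settle the harder question of how instances are identified in the first place.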

Searching for metaphors based on “markers of metaphors” is another method for extracting metaphoric material from a corpus; it was advocated by Andrew Goatly [4]. Some linguistic markers, he claimed, point to the existence of metaphors in discourse. Goatly [4] thus defined “markers of metaphors” as the words and phrases occurring in the environment of a metaphor’s vehicle term, or of a unit of discourse that unconventionally refers to, or colligates with, the topic of a metaphor on the basis of similarity, matching, or analogy. Explicit markers, intensifiers, hedges and down-toners, and symbolisms are determined on a functional basis; semantic metalanguage, mimetic terms, perceptual processes, misperception terms, and cognitive processes are connected with semantics; and modals, conditionals, and copular similes represent grammatical categories. Unfortunately, initial evaluations of the method by Wallington, Barnden, Ferguson, and Glasbey have established that Goatly’s list of metaphorical markers is not a sure indicator of the presence of metaphorical expressions in a text. Other strategies therefore have to be adopted, based on searching a corpus for items belonging to the source or target domains of conceptual metaphors.
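A marker-based search can be sketched as a simple phrase lookup. The category labels follow the text above, but the marker phrases themselves are illustrative placeholders chosen for this sketch; they are not a reproduction of Goatly's inventory, and the sentences are invented.

```python
import re

# Placeholder marker lists, keyed by category (hypothetical, for illustration only).
MARKERS = {
    "explicit marker": ["metaphorically", "figuratively"],
    "intensifier": ["literally"],
    "hedge": ["in a way"],
}


def find_marker_candidates(sentences, markers):
    """Return (category, marker, sentence) for each sentence containing a marker."""
    candidates = []
    for sentence in sentences:
        for category, phrases in markers.items():
            for phrase in phrases:
                pattern = r"\b" + re.escape(phrase) + r"\b"
                if re.search(pattern, sentence, re.IGNORECASE):
                    candidates.append((category, phrase, sentence))
    return candidates


# Invented sample sentences.
sample = [
    "Metaphorically speaking, the statute is a shield.",
    "The contract was signed on Monday.",
    "The court literally tied its own hands.",
]

candidates = find_marker_candidates(sample, MARKERS)
for category, phrase, sentence in candidates:
    print(category, "|", phrase, "|", sentence)
```

The output is a list of candidates only: a marker can occur without any metaphor nearby, and most metaphors occur without a marker, which is consistent with the evaluation results mentioned above.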

Searching for the vocabulary of a source domain. The first step in this type of investigation is to find linguistic items (metaphorical expressions), or whole sets of such items, that represent a particular conceptual metaphor. A list is made, and a search is then carried out with a concordance program to see whether the items are present in the corpus, as “the computer cannot work from a list of conceptual metaphors to identify their linguistic realizations” [1, 93]. The selection of these linguistic items can be based on hypothetical decisions, on already existing lists of such items, or on a preceding analysis of the keywords of texts connected with the target-domain topics. Once metaphors have been retrieved from a corpus, they can be further classified into subgroups and sub-types.
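The concordance step can be sketched as a small keyword-in-context search. This is a simplified stand-in for a concordance program, not a description of any particular tool: the function name, the search items and the sample text are assumptions, and each item is treated as a word beginning so that simple inflected forms are also retrieved.

```python
import re


def concordance(text, search_items, width=30):
    """Return (item, left context, match, right context) for each occurrence."""
    lines = []
    for item in search_items:
        # The item is matched as a word beginning, so "attack" also finds
        # "attacked" and "attacks" (and, less helpfully, "attacker").
        pattern = re.compile(r"\b" + re.escape(item) + r"\w*", re.IGNORECASE)
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            lines.append((item, left, match.group(0), right))
    return lines


# Hypothetical source-domain items and an invented sample text.
source_items = ["attack", "defend"]
sample_text = (
    "Counsel attacked the ruling. "
    "The defence attacks every weak point of the claim."
)

lines = concordance(sample_text, source_items)
for item, left, hit, right in lines:
    print(f"{left:>30} [{hit}] {right}")
```

Each concordance line is then read by the analyst, who decides whether the item is used metaphorically and which conceptual metaphor, if any, it realizes.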

Searching for the vocabulary of a target domain. Different scholars have suggested different ways of working with target domains to retrieve relevant data. In his method, which is based on searching for keywords in the target domain, Partington [3] suggests creating lists of terms characteristic of particular genres of discourse, analyzing them, and then running a concordance for items that appear in more than one key list, or that seem to belong to the same semantic set. According to Partington, such an analysis helps to reveal systematic metaphors for further examination and to distinguish the particular cases of their use. This method has its strong and weak points. Its main weakness is that such an analysis requires a very large number of homogeneous, monothematic texts connected with the target domain. Another weak point is that a word must be widely represented in the target domain in order to become a keyword, so this type of analysis will reveal only those source domains that are widely represented by the keywords in the target domain. The method will therefore not provide fully reliable results.
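The idea of comparing key lists across genres can be sketched as follows. Note the substitutions: a plain relative-frequency ratio with add-one smoothing stands in here for a proper keyness statistic, and the thresholds, function names and miniature token lists are assumptions made for this sketch rather than part of Partington's procedure.

```python
from collections import Counter


def keywords(target_tokens, reference_tokens, min_ratio=2.0, min_count=2):
    """Words markedly more frequent in the target texts than in the reference."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    keys = set()
    for word, count in target.items():
        if count < min_count:
            continue
        target_rate = count / len(target_tokens)
        # Add-one smoothing so that words absent from the reference
        # do not cause a division by zero.
        reference_rate = (reference[word] + 1) / (len(reference_tokens) + 1)
        if target_rate / reference_rate >= min_ratio:
            keys.add(word)
    return keys


def shared_keywords(key_lists):
    """Items that appear in more than one key list."""
    counts = Counter(word for keys in key_lists for word in keys)
    return {word for word, n in counts.items() if n > 1}


# Invented miniature "genres" and reference material.
reference = "the a of and to in the a of and to in".split()
judgments = "the court must weigh the burden and weigh the balance".split()
statutes = "the burden shall weigh on the party and weigh on the state".split()

key_lists = [keywords(judgments, reference), keywords(statutes, reference)]
shared = shared_keywords(key_lists)
print(shared)
```

The shared items are the ones for which a concordance would then be run. With realistic data the key lists would be far longer, and the weakness noted above applies: only words frequent enough to become keywords can surface at all.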

Searching for sentences containing items from both the source and target domains. Closely related to the previous two strategies (searching for vocabulary from the source domain or from the target domain) is the method of searching for sentences that contain items from both domains. This method requires exhaustive lists of source and target domain expressions. To eliminate errors caused by the literal use of linguistic items from either domain, a fair amount of manual processing and editing is required, which can be burdensome. It is also rarely possible to find complete lists of target and source domain expressions, so some units are likely to be missing and the results of the investigation cannot be foolproof. Furthermore, this method can only be used to reveal conceptual mappings that are already known, and is thus restricted to metaphorical expressions known beforehand. The main advantage of this method is that it can be used for the analysis of large numbers of texts, and that, as it is based on an annotated corpus, it can be processed automatically.
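The co-occurrence search can be sketched with two word lists and a naive sentence splitter. The word lists, the sample text and the function names are hypothetical, and the splitter simply breaks after a full stop, exclamation mark or question mark; a real study would use proper sentence segmentation and far longer lists.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_with_both(text, source_items, target_items):
    """Sentences containing at least one source item and one target item."""
    source = {word.lower() for word in source_items}
    target = {word.lower() for word in target_items}
    hits = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & source and words & target:
            hits.append(sentence)
    return hits


# Hypothetical word lists and an invented sample text.
source_vocabulary = ["weighed", "balance"]
target_vocabulary = ["court", "evidence"]
sample_text = (
    "The court weighed the evidence. "
    "The clerk weighed the parcel. "
    "The court adjourned."
)

hits = sentences_with_both(sample_text, source_vocabulary, target_vocabulary)
print(hits)
```

Only the first sentence is returned. The second contains a source item used literally and no target item, and the third contains a target item but no source item, which illustrates how the double condition filters out some, though not all, literal uses.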

All the strategies used to extract relevant material from a corpus when investigating conceptual mappings have their advantages and drawbacks. None can give complete and reliable results, since all depend on the quality of the software used and on the experience or intuition of the researcher. Thus, in order to obtain reliable results, it is necessary to use a combination of the methods mentioned above.

References

1. Deignan, Alice. Metaphor and Corpus Linguistics. - Amsterdam: John Benjamins, 2005. - 235 p.

2. McEnery, Tony. Corpus Linguistics: An Introduction. - Edinburgh: Edinburgh University Press, 2001. - 235 p.

3. Partington, Alan. In: Anatol Stefanowitsch and Stefan Th. Gries (eds). Corpus-based Approaches to Metaphor and Metonymy, 267-304. - Berlin: Walter de Gruyter, 2006.

4. Goatly, Andrew. The Language of Metaphors. - London: Routledge, 1997.

5. Stefanowitsch, Anatol. “Corpus-based Approaches to Metaphor and Metonymy”. In: Anatol Stefanowitsch and Stefan Th. Gries (eds). Corpus-based Approaches to Metaphor and Metonymy, 1-17. - Berlin: De Gruyter Mouton, 2006.
