
ON NATIVE SEMANTIC ROLES - COMPARATIVE STUDY BASED ON DATA FROM CHILD LANGUAGE ACQUISITION OF ENGLISH AND FRENCH

Dr. Velina Slavova, New Bulgarian University, Department of Computer Science, Bulgaria

E-mail: vslavova@nbu.bg

ARTICLE INFO

Original Research

Received: May 31, 2017. Revised: June 14, 2017. Accepted: June 27, 2017.

doi:10.5937/IJCRSEE1702001S

UDK 159.946.3.072-053.5 81'23-053.5

Keywords:

language acquisition, corpus analysis, mental representation, concept formation, language faculty.

ABSTRACT

This study explores statistically child language-acquisition using data extracted from large collections for acquisition in two languages - English and French. Comparison of the two collections reveals that the advancement in acquiring vocabulary displays very big differences when the children's speech is classified by the parts of speech deployed, as these are formally defined in the two languages, despite there being no reasons to suppose that the two language groups of children should show significant differences in cognitive development. The hypothesis put forward is that there exist general classes of meaning-representation and the challenge is to obtain evidence corroborating this. A specific set of classes is proposed, derived according to their different contributing roles in the mental representation of the world, considered from the perspective of an "Actor in the environment" cognitive model. The identified parts of speech from the two languages are sorted into the proposed classes. It is shown statistically that when children's speech is discriminated to these classes, the acquisition processes in the two languages are very alike. Examining the data, the use of these classes is evident from the onset of language production. Some particularities related to factors influencing the use of communicators, interjections and onomatopoeias in children's speech are discussed in addition to the study's overall findings.

1. INTRODUCTION

How the brain forms and organizes meaningful concepts and how the faculty of language permits structured expressing of their meaning are questions of fundamental theoretical importance to linguists and cognitive scientists alike. Thanks to the emergence of powerful brain-imaging technologies, neuroscientific research has made significant strides in revealing anatomically the brain's activity pertaining to the cytoarchitectural organization of concepts and their labelling with words. The neuronal networks found to be involved in this activity appear to implicate the entire brain. All of these studied phenomena can be assumed to be universal biological properties of human brains.

Corresponding Author: Dr. Velina Slavova, New Bulgarian University, Department of Computer Science, Bulgaria. E-mail: vslavova@nbu.bg

This work is licensed under a Creative Commons Attribution - NonCommercial - NoDerivs 4.0. The article is published with Open Access at www.ijcrsee.com

© 2017 IJCRSEE. All rights reserved.

Considering this wide-ranging body of research permitted Slavova and Soschen (2015a, b) to amalgamate its findings with a hierarchical information-treatment model and propose a general theory explaining how perceptual experiencing of environmental phenomena and interacting with them provides a working basis for forming concepts and subsequently associating them with particular words. The model presented in Slavova and Soschen describes the process by which humans in general progressively acquire the faculty of language during infancy. This model supposes that syntax is founded on concept semantics. It suggests that the internal creation of a semantic description of the world and the mental treatment of language syntax are products of one and the same principles of information processing. The underlying mechanisms were identified as based on multimodal perception, interoception, proprioception, the mirror neuron network and the default mode network, all of them ready to run in a synchronized manner at birth. Following this model, the process of establishing a semantic description of the world initially ensues automatically as the result of interacting with it, and in accord with some underlying principle of structuring meaning.

When children learn their first language, the meaning of the words used in spontaneous communication can suggest the structure of their mental world. At the same time, the language input that children are exposed to is of crucial importance. That is why investigating the structure of the primary semantic description of the world and its constituents necessitates analysing a wide diversity of languages. As a first effort toward such an investigation, the present paper's author undertook a large-scale study of corpora of recorded utterances collected from English and French infants over the formative period of language-acquisition. This paper presents the procedure of the study and the results of its analysis.

2. PROCEDURE AND DATA

Data from 42 corpora containing 1,515 free dialogues with child speech in English and in French, annotated with part of speech and grammar, were extracted from CHILDES (Child Language Data Exchange System) and used for the statistical analyses presented here (Appendix A).

Child speech dialogues (written, audio and video recordings) are stored with their transcripts and available on-line in the CHILDES data repository. They are collected (in separate corpora) and transcribed by researchers in language acquisition using the standard developed over the course of several decades especially for the Exchange system (see MacWhinney and Snow, 1985). The transcription is performed using CLAN (Computerized Language ANalysis), a computerized system designed specifically for the Exchange system's standardized format (Appendix B). Important for the study presented here is that the stored transcripts include, for each speech utterance, a separate line marked with "%mor", created by the transcribers using the computerized tools developed for supporting the annotation in a large number of target languages. This line contains the system's standardized symbols for the parts of speech (POS), based on Hausser's MORPH system (Hausser, 1989, see MacWhinney, 2012).

For the purposes of this study the transcripts, with the entire linguistic annotation, were stored locally. Additionally, a number of tools were developed for extracting the transcripts from CLAN format and organizing them in a relational database where each dialogue and each utterance is tagged with a unique identifier (Appendix B). A more detailed description of the procedure, the tools, the data treatment and the technical aspects of the data organization and representation is provided in Slavova (2016).
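The extraction step described above can be illustrated with a minimal sketch. It is not the author's tooling (that is described in Slavova, 2016); it only assumes the general CHAT convention that a speaker line starts with "*" and its morphological tier starts with "%mor:". The table layout, the function names `parse_chat` and `store_dialogue`, the identifier "demo-01" and the three-utterance sample are all hypothetical.

```python
import sqlite3

def parse_chat(text):
    """Pair each speaker line (*SPK:) with the %mor tier that follows it, if any.

    Simplification: continuation lines and all other dependent tiers are ignored.
    """
    utterances = []
    for line in text.splitlines():
        if line.startswith("*"):
            speaker, _, words = line[1:].partition(":")
            utterances.append({"speaker": speaker.strip(),
                               "text": words.strip(),
                               "mor": None})
        elif line.startswith("%mor:") and utterances:
            utterances[-1]["mor"] = line[len("%mor:"):].strip()
    return utterances

def store_dialogue(conn, dialogue_id, utterances):
    """Store utterances in a hypothetical table; each row gets a unique id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS utterance ("
        "id INTEGER PRIMARY KEY, dialogue_id TEXT, "
        "speaker TEXT, text TEXT, mor TEXT)")
    conn.executemany(
        "INSERT INTO utterance (dialogue_id, speaker, text, mor) "
        "VALUES (?, ?, ?, ?)",
        [(dialogue_id, u["speaker"], u["text"], u["mor"]) for u in utterances])
    conn.commit()

# Invented sample in the spirit of the transcripts, not taken from CHILDES.
sample = ("*CHI:\tMommy tree .\n"
          "%mor:\tn:prop|Mommy n|tree .\n"
          "*MOT:\twhat ?\n"
          "%mor:\tpro:wh|what ?\n"
          "*CHI:\tout .\n"
          "%mor:\tadv|out .\n")

conn = sqlite3.connect(":memory:")
store_dialogue(conn, "demo-01", parse_chat(sample))
child_rows = conn.execute(
    "SELECT id, mor FROM utterance WHERE speaker = 'CHI' ORDER BY id").fetchall()
```

With the data in this form, selecting the child's annotated utterances of a dialogue becomes a single query on `speaker` and `dialogue_id`.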

The English data collection used in this study contains 620 dialogues (with 62 girls, 66 boys, and 7 children with gender not specified in the source); the French collection contains 895 dialogues (with 157 girls and 141 boys). Some children were recorded during several successive months and some were not. The parameters of the dialogues are as follows: an English dialogue contains on average 520 utterances by the different participants, of which 202 are utterances of the child; a French dialogue contains on average 388 utterances by the participants, of which 171 are utterances of the child.

Next, a large number of queries were elaborated to select, regroup and calculate parameters of the child speech utterances. In the present study, the results for two large corpora collections (125,873 child speech-utterances for English language production and 153,824 for French language production) were treated statistically. On average 2,400 utterances per child-month were treated, taken from different corpora, belonging to different children aged between 6 months and 62 months and originating from dialogues taking place in different circumstances and selected by different researchers over the course of the last four decades.

Examples of the children's utterances extracted from the dialogues, with their POS annotation, are given in Appendix B. The language-related parameters presented further in the statistical analysis are obtained by parsing the %mor annotation extracted from the annotated dialogues in CHILDES.

Observation of children's utterances in the two languages confirmed several known facts regarding language acquisition. The first distinguishable word-forms are pronounced at around 10-13 months in single-word expressions, and with the development of the child's overall capacities, utterances become longer, expressing increasingly complex ideas. At approximately 26 months all the analysed utterances have a phonological content comprising at least one word-form identifiable as belonging to the given language (Fig. 1). According to the collected data, after the age of 62 months child speech starts to contain complex and subordinate sentences in a single communication utterance.

[Figure 1 plots not reproduced; horizontal axis: Age (months).]

Figure 1. Utterances with phonological content recognizable as word-forms in the given language. Ratio over all utterances. (girls - red, boys - blue)

The task in this study is to judge how the global content of the speech develops in terms of mental images that underlie the meaning-expression.

3. STATISTICAL ANALYSIS OF THE POS ACQUISITION

For studying the use of POS, the statistical analysis relies on the annotation performed by the authors of the respective data-corpora in the data collections (Appendix A). In the data-annotation scheme deployed in the CHILDES source, the word-forms produced by children are classified by POS as these are distinguished in the corresponding languages. This annotation uses 33 POS for the English collection and 30 for the French.

In order to obtain a measure for the contribution-weight of a given POS within a speech-utterance, the following formula for calculating the Ratio per Utterance (RU) of the given POS within a dialogue was applied:

$$RU(POS_{ij}) = \frac{N_{POS_{ij}}}{N_j} \qquad (1)$$

where:

$POS_i$ is one of the POS annotated in the corpora,

$j$ is the dialogue,

$N_j$ is the number of utterances with recognizable POS in the dialogue $j$,

$N_{POS_{ij}}$ is the number of occurrences of $POS_i$ in the dialogue $j$.

By applying formula (1), an RU was obtained for all the POS for each of the 620 dialogues in English and 865 dialogues in French. The RUs show the extent of use of the given POS for expressing the child's notions within an exemplary utterance "averaged" for the dialogue. They can be seen as weights of the use of a given POS for expressing the meaning communicated by the child within the dialogue. The RUs were used in the further analyses of the similarities.
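As an illustration of formula (1), the following sketch computes the RU of every POS for one dialogue from its morphological annotation lines. It is a simplified reading rather than the procedure actually used in the study: it assumes that each annotated word has the form `label|stem`, treats everything before the first "|" as the POS label, and ignores compounds and clitics. The helper names and the three demo lines are hypothetical.

```python
from collections import Counter

def pos_of(token):
    """Return the POS label of one annotation token, or None for punctuation."""
    if "|" not in token:
        return None
    return token.split("|", 1)[0]

def ratio_per_utterance(mor_lines):
    """RU per POS: occurrences of the POS divided by the number of
    utterances that contain at least one recognizable POS (N_j)."""
    counts = Counter()
    n_utterances = 0
    for line in mor_lines:
        labels = [p for p in (pos_of(t) for t in line.split()) if p]
        if labels:
            n_utterances += 1
            counts.update(labels)
    if n_utterances == 0:
        return {}
    return {pos: n / n_utterances for pos, n in counts.items()}

# Invented child utterances of one dialogue (annotation lines only).
demo = ["n:prop|Mommy n|tree .", "adv|out .", "n|car n|car n|car n|car ."]
ru = ratio_per_utterance(demo)
# Here ru["n"] is 5/3: five common nouns over three annotated utterances.
```

Under these assumptions the sum of all RUs of a dialogue equals the average number of annotated word-forms per utterance, which is why the sum of the POS RU can be read roughly as utterance length.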

The statistical result displayed in Fig. 2 is consistent with specialized studies in language acquisition. For example, it has been shown (Bassano, 2000) that in French acquisition, between the ages of 14 and 30 months, nouns clearly predominate over verbs, but that verbs are nevertheless produced in the early stages. The statistical result shown in the plot of POS-acquisition also does not contradict the time-scale of acquisition of different POS reported in the specialized studies for English.


Figure 2. Development with children's age of the RU of the POS in the English and French data

As shown in Fig. 2, the developmental paths for the use of different POS in the utterances produced by English-acquiring and French-acquiring children are quite different. This difference can be attributed to language-specific features, as the grammatical particularities of English and French are obviously different.

Table 1 gives the correlations of the use of identical POS in the two languages (between-languages correlation). This correlation is small, taking into account that the use of word-forms in the considered period is described by an increasing function.

The between-languages correlation presented in Table 1 is for the 25 identically labeled POS in the two language-corpora (in descending order of the correlation values), for the 50 months (from 11 to 61) which have data collected for both languages. The p-values which suggest a statistically unreliable result are given in italic. The between-languages correlations are obtained based on month-to-month correspondences. That is, for each language, the RUs of a given POS from all the dialogues of children of the same age (in months) are averaged within that month, and the resulting monthly averages are then compared between the languages.
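The month-to-month comparison can be sketched as follows. The sketch assumes a plain Pearson correlation over the months present in both languages; the paper does not state which correlation coefficient or significance test was used, so this is only one plausible reading, and p-values are not computed. The function names and the toy records are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def monthly_means(records):
    """records: iterable of (age_in_months, ru_value), one pair per dialogue."""
    by_month = defaultdict(list)
    for month, value in records:
        by_month[month].append(value)
    return {m: sum(v) / len(v) for m, v in by_month.items()}

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def between_language_correlation(records_a, records_b):
    """Average the RU of one POS per month in each language, then correlate
    the two monthly series over the months that have data in both."""
    a, b = monthly_means(records_a), monthly_means(records_b)
    months = sorted(set(a) & set(b))
    return pearson([a[m] for m in months], [b[m] for m in months])

# Invented per-dialogue RU values of a single POS.
english = [(12, 0.10), (12, 0.30), (13, 0.40), (14, 0.60)]
french = [(12, 0.10), (13, 0.20), (14, 0.30), (15, 0.90)]
r = between_language_correlation(english, french)
```

In the toy data month 15 is dropped because it has no English dialogue, which mirrors the restriction to months with data collected for both languages.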

The average between-languages correlation for the use of identical POS is only 0.46. At the same time, for the period investigated, the Sums of the POS RU (roughly - the length of the utterances) develop in a very similar way, correlated at 0.88, i.e. higher than the maximal POS-to-POS correlation.

Table 1. Between-languages correlations for the development over the time of language acquisition of POS RU

POS                       Correlation   P-value
Preposition RU            0.840         0
Determiner RU             0.830         0
Pronoun Subjective RU     0.825         0
Verb RU                   0.818         0
Conjunctions RU           0.812         0
Verb Auxiliary RU         0.795         0
Adverb RU                 0.721         0
Noun - common RU          0.697         0
Verb Modal RU             0.600         0
Relativizer RU            0.553         0
Pronoun Demonstr. RU      0.541         0
Determiner Numeric RU     0.517         0
Pronoun RU                0.483         0
Pronoun Objective RU      0.471         0
Vbs. Participles RU       0.430         0
Quantifiers RU            0.370         0
Onomatopoeia RU           0.352         0.01
Pronoun Interrogat. RU    0.349         0.01
Pronoun Indefinite RU     0.331         0.01
Pronoun Reflexive RU      0.327         0.02
Adjective RU              0.304         0.02
Pronoun Possessive RU     0.299         0.03
Interjection RU           -0.084        0.04
Noun - proper RU          -0.197        0.56
Communicators RU          -0.415        0.17
Average POS-to-POS        0.405

There are no reasons to suspect that the two languages are very different at the level of POS-structure as they are close representatives of one and the same family and are of the same morphological type. As seen in the correlation table, the POS which are used in the most comparable way are the prepositions and the determiners.

There are also no reasons to suspect that, in terms of conveyed meaning, 1-, 2- and 3-year-old English-acquiring and French-acquiring children have very different ideas to communicate.

Specialized studies have found interactions between semantic and grammatical development. For example, Bassano (2000) found that between 14 and 30 months, verb- and noun-grammaticalization in French is related to the production of concrete action verbs and concrete object nouns. Bassano proposes that "These findings, discussed in a cross-linguistic perspective, suggest that both conceptual and grammatical packaging are important and interacting factors in noun and verb development". This idea is strongly supported by the contemporary research presented in the book edited by Hirsh-Pasek and Golinkoff (2006), "Action meets word: How children learn verbs."

Table 2. Example - expression of meaning and desires in a dialogue of a 15-month-old child. (CHI - child, MOT - mother, FAT - father, SIS - sister)

Data extracted from the dialogue tazl5.bw

*CHI: Mommy Mommy Mommy.
*CHI: tee [: tree] tee [: tree].
*CHI: Mommy tee [: tree].
*CHI: Dada.
*CHI: Mommy.
*MOT: what?
*CHI: out.
*CHI: baby.
*CHI: hi.
*FAT: hi. Laura.
*CHI: hi.
*SIS: hi. Laura.
*CHI: Mommy!
*MOT: what Laura.
*CHI: ah Dada.
*MOT: what's the matter?
*CHI: dee@b dee@b.
*MOT: oh my.
*CHI: out Dee baba [= bottle].
*MOT: what?
*CHI: hi. Dada hi. Dada Mommy.
*CHI: car car car car.
*CHI: car.
*CHI: key key key.

The core meaning-related question to be clarified concerns the very different paths of acquisition of the POS-constituents in the two languages (Fig. 2). The children's utterances in the corpora are most often incomplete and grammatically incorrect sentences, but they express the intended meaning quite well (an example is provided in Table 2). In fact, the measure used, the POS RU, statistically reflects the profile of the words produced within a dialogue and does not reflect the sentence level.

The next section presents the approach proposed here in order to find a common semantic organization which can explain the language acquisition processes as similar. Such a structure is supposed to equilibrate the difference displayed in the acquisition process measured at the POS level.

4. COGNITIVE MODEL

It is largely agreed in cognitive science that learning to assign meaning to sensory stimuli lies at the foundation of human cognition (e.g., Glezer et al. 2015). Unaided by language, infants from birth are able to begin forming meaningful conceptual knowledge about entities they perceive on a basis of interacting purposively with them, and to apply this to their interactions in increasingly structured ways and diverse contexts. Several contemporary findings suggest, as proposed herein, that the brain can execute internal meaning-related processing independently of language - that is to say, on a basis of processing of intact perceptual representations (here termed information-units) prior to, or in default of, their corresponding conceptual representations' lexicalization. A study by Moran and Tommerdahl (2011) of an 8-year-old child raised in a social environment, but without contact with spoken or signed language, found that the child exhibited no clear evidence of cognitive deficits. The celebrated case of Helen Keller provides similar testimony. In addition, several studies (e.g., Frishberg, 1987, Torigoe and Takei, 2002) have investigated cases of home-signing, that is, the inventing of sign-languages by groups of two or more hearing-impaired individuals who have not been taught a conventional sign-language. The phenomenon of home-signing suggests that humans are predisposed to elaborate systems exhibiting a language-structure in order to share their internally created concepts, and use them for communicating. The main question addressed here concerns the existence of some primary structure behind these internally created concepts.

The present study's approach focuses upon the progressively elaborated content of children's spoken utterances. The creation of meaningful representation is portrayed in the proposed model as a process in which inborn information-treatment mechanisms organize information-flows obtained in interaction with the world so as to establish distinct units of meaning. In the model presented by Slavova and Soschen (2015a, b) this process was termed "meaning-encapsulation". It is supposed here that the meaning-encapsulation process is ready to run at birth. The early period of language-acquisition is assumed to be underpinned by the processes of meaning-encapsulation.

Two separate regimes within the language-learning process are considered in the basic scheme of the proposed model: the analytic regime, related to language comprehension, and the generative regime, related to speech production or sign-language production (Fig. 3).

Figure 3. General scheme underlying the proposed model

The analytical processing in language acquisition is concerned with the assigning of meaning to words (or expressions). The acquisition of language-labels and rules and their use is seen as mapping between a child's own meaning-related representations and the labels and rules used in the language-environment (Fig. 3). This process can be presented as blending of the internally "encapsulated" units with the language ingredients. Its outcome is the creation of a lexicalized concept - a capsule with a "name". These names are further involved as words in the generative processing.

It should be taken into account also that children often invent their own labels for the concepts they have formed. These labels (perhaps arbitrary, a la Saussure (1916), perhaps sound-symbolic or to some extent phonetically matching the words used in the language environment) are used by children in communicating, and with persistence (an example is given in Table 2).

Additionally, the term idea is used herein to signify a consciously represented thought, generated by neuronal operations on meaningful units. The process of idea-generation involves creating the assembly of distinguishable meaningful units that express the thought. Recent medical studies suggest that some brain-impairments can harm the regulating of the mechanism of idea-generation. For example, Robinson and colleagues (Robinson et al., 2015) suggest that "When a "brake" to stop message generation mechanism is damaged at the level of conceptual preparation, the speaker will have difficulty stopping new thoughts from being created, generated, and expressed as overt speech."

Language production commences after the analytic regime has created memory-paths necessary for the retrieval of the labels and rules used in the language. Recent research has shown that children know the meaning of many nouns at the age of 6 months (Bergelson et al., 2012), while the first pronounced words start at 10-13 months. In the speech-generative regime, the child aspires to communicate to others a certain idea, which presupposes its existence already in the child's mental space. One main question addressed here concerns the idea-ingredients, given that the meaningful units are compiled by the analytical processing-regime, the settings for which are assumed to be biologically determined.

When the idea is to be communicated, it has to be converted into expressible language-items, represented in memory. The question to be further addressed concerns what types of conceptualized items are added to children's expressions during acquisition, with regard to their role in the mental representation of the world. As children's expressions become richer and richer, the next question is how, and in what proportion, this growth is distributed over the types of meaningful units.

The analysis that follows relies on the use, during the initial stage of language acquisition, of different parts of speech (POS). POS are seen as offering the building blocks of the language's content, which (by the present hypothesis) has to be bonded to meaningful representations (Fig. 3).

The next section proposes a set of roles that can be distinguished in the mental representation of the world, suggested as basic classes of concepts.

5. MENTAL REPRESENTATION OF THE WORLD - THE MEANING CLASSES HYPOTHESIS

The overall approach in this study looks at meaning as constituting the bedrock of the faculty of language. This is opposite to some linguistic views, which suppose that meaning exists because of language. Obviously, if meaning is a consequence of language, it needs to be explained from where and how language has arisen in order to introduce the meaning that it carries; a scientifically plausible explanation has yet to be put forward, however.

The reasoning in this section relies on the widely accepted psychological model proposed by Lawrence Barsalou (Barsalou, 2003), according to which the conceptualizing and in general the mental representing of the world are the undertakings of an Actor, acting in the environment. The hypothesis developed here, termed the "Self-centered model of language faculty" (see Slavova and Soschen 2015a, b) and derived from Barsalou's model, accords to the concept of Self a central role in developing the language faculty. Its authors reasoned that meaningful units of information are created by a substrate of inborn mechanisms that have the task of ensuring the survival of the biological system (the Self) as an "Actor in the environment". The Actor's mental representing of the world can be thought of as a system of such meaningful units.

Categorization of meaningful information as distinctive classes is possible if the purpose (i.e., usage) of the information has been determined. The hypothesis put forward here is that the mechanisms that have arisen during evolutionary development have played the role of internal generators of information necessary for the survival of the biological species in the environment. Thus the mental mechanisms responsible for the generation of meaningful units are presumed to build units of importance for the species' survival that are assignable to such classes of information. Each class would have a specific role for representing the environment (the "world") as internal meaning.

To determine the classes, the first step was to deduce the general information-types that the newborn (an autonomous system) should have in order to act adequately with regard to his Self as Actor. Based on this, the environmentally presented "realities" that have to be organized as information units in order to further operate on them are supposed to be related to: 1. Physically negotiable objects in the environment, of significance for its functioning and existence (e.g. energy supply, obstacles, dangers etc.), 2. Their behavior (e.g. actions, states, intents etc.), namely, that which is comparable with the actions and states of the system and of importance for the system's behavior, 3. The manner in which the environment is "organized" and changes (e.g. the spatial and temporal particulars of the significant objects, relative to the system's own functioning in space and time), 4. The qualitative features of the environment (e.g. the same color or form in separate objects) that are of importance for the system's survival in the environment and 5. The quantitative parameters of the environment (e.g. evaluation of proportions between the objects or between groups of objects).

It should be noted that the reasoning followed takes into account social concepts, science-related concepts etc. as these are presumed of importance for the survival of humans as a species, or, at least, for the survival of humans as they have evolved up until now.

Together with the analysis of the speech data from the corpora, this led to the following classes (Table 3): Entities, Relationships, Circumstances, Quality and Attribution, Quantity and Precision, and Others.

Table 3. Proposed meaning-classes in the model "Actor in the environment". Spread of the POS with examples of annotation.

Meaning-Classes       Examples of annotation, English                   Examples of annotation, French

1. Entities
Self                  I, my Name, me, my, myself, mine, Baby            je, moi, mon Nom, bébé, mon, ma, mes, mien
Common Noun           njmaji, njwoit                                    nicherai, n|pied, n|crasse, n|lapin
Proper Noun           n|Dada, n:prop|Uncle, n:prop|Joe                  n:prop|Papa, n:prop|Raphaël
Pronoun Subj.         pro:sub|he, pro:sub|they                          pro:subj|il, pro:subj|on
Pron. Object.         pro:obj|me, pro:obj|them                          pro:obj|me, pro:obj|le
Pronouns              pro|it, pro|you                                   pro|moi, pro|toi, pro|eux
Pron. Reflect.        pro:refl|myself, pro:refl|yourself                pro:refl|se (+v|garer), pro:refl|se$v|appeler
Pron. Interrog.       pro:wh|what, pro:whfwhii                          pro:int|qui, pro:int|quoi

2. Relationships
Verb - Action         v|go, vfini, v|sit, v|finish                      v|marcher, v|connaître, v|dire
Verb - Modal          mod|can, mod|will, mod|do                         ion défaire, v:mdl|vouloir, v:mdl|aller
Verb - Auxiliary      aux|be, aux|have, aux|get                         v:aux|avoir, v:aux|être
Verb To Be            cop|be                                            v;ani|être
Participles           part|mix, part|go, part|use                       part|casser, part|boire, part|tomber

3. Circumstances
Adverbs               adv|out, adv|there, adv|almost, adv|down          adv:place|dehors, adv|d'abord, adv|tres, adv|aussi
Prepositions          prep|at, prep|on, prep|with                       prep|a, prep|avec, prep|dans, prep|moins
Pron. Demonstr.       pro:dem|that, pro:dem|there                       pro:dem|ce, pro:dem|ça
Conjunctions          conj|but, conj|when, conj|because                 conj|parce pro:rel|que, conj|si
Coordinators          coord|and                                         conj|et
Relativizers          rel|what, rel|where                               pro:rel|quoi, pro:rel|ou, pro:rel|que

4. Quality and Attribution
Adjectives            adj|brown, adj|big, adj|good                      adj|blaito, adj|petit, adj|beau
Pron. Possessive      pro:poss:det|my, pro:poss:det|his                 det:poss|mon, det:poss|sa

5. Quantity and Precision
Numerals              det:num|four, det:num|million                     det:num|un, det:num|trois
Quantifiers           qn|more, qn|many, qn|some                         qn|plus, qn|un_peu, qn|plusieurs
Pron. Indefinite      pro:indef|one, pro:indef|more                     det:gen|quelques, det:gen|chaque
Determiners           det|a, det|the, det|this                          det|le, det|un
Post                  post|both, post|all, post|else

Other
Onomatopoeia          on|beeu, on bav,t, on|ding                        on|pin, on|pon, on|ham, on|ham, on|kof
Interjection          int|da, int|ba, int|da, int|wow                   intKvah, intlbert, int|berk
Communicator          co|please, co|yes, co|no, co|thank_you            co|oui, co|non, co|merci, co|oh, co|héh, co|miam

6. STATISTICAL ANALYSIS OF THE MEANING CLASSES

POS appearing in the data-collections were sorted into the aforenamed meaning-classes, as shown in Table 3. The POS were assigned by judging which POS are used in the speech-samples to express each of the classes. The distribution cannot be perfect: a word-form can belong to more than one meaning-class depending on the context. As an example, tout in French is at times an adjective (translated as any, every, entire), an adverb (translated as all, very, in all, all up), a noun (all, whole) and a pronoun (all, any, anything). Here the study relies on the annotation-method applied by the linguists and on their correctness.

The use of the same POS-labels in the two languages can also be a source of errors, but when looking for universals one has to apply common sense in order to find correspondences. For example, the numerals in the French corpus are annotated as nouns, following the rules of French grammar adopted in CHILDES. This required retrieving the numbers in the French corpus and changing their annotation. Some particularities have not been homogenized. For example, in the French corpus mien in "le mien" (the mine, m.) is a noun and in "la mienne" (the mine, f.) is an adjective.

The child speech data were treated statistically with respect to these classes. The RU of each meaning-class was calculated by summing the RUs of its POS constituents.
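The summation described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the mapping `POS_TO_CLASS` is partial and inferred from the visible part of Table 3 and the examples in Appendix C, the function name `class_ru` is hypothetical, and the RU values are invented.

```python
from collections import defaultdict
import math

# Partial POS-to-class mapping, inferred from Table 3 and Appendix C.
# It is illustrative only; the complete assignment is the one in Table 3.
POS_TO_CLASS = {
    "n": "Entities", "n:prop": "Entities", "pro:obj": "Entities",
    "v": "Relationships", "part": "Relationships",
    "adv": "Circumstances", "prep": "Circumstances",
    "adj": "Quality", "pro:poss:det": "Quality",
    "det": "Quantity", "det:num": "Quantity", "qn": "Quantity",
    "pro:indef": "Quantity", "post": "Quantity",
    "on": "Others", "int": "Others", "co": "Others",
}


def class_ru(pos_ru):
    """Sum the RU of each POS into the RU of its meaning-class."""
    totals = defaultdict(float)
    for pos, ru in pos_ru.items():
        if pos not in POS_TO_CLASS:
            raise KeyError(f"POS label not in the illustrative mapping: {pos}")
        totals[POS_TO_CLASS[pos]] += ru
    return dict(totals)


# Hypothetical RU values for one age point (not taken from the study).
example_pos_ru = {"n": 0.30, "n:prop": 0.10, "v": 0.20, "co": 0.25, "on": 0.15}
example_class_ru = class_ru(example_pos_ru)

# By construction, the class totals add up to the POS total.
assert math.isclose(sum(example_class_ru.values()), sum(example_pos_ru.values()))
```

Because every POS contributes to exactly one class, the sum over classes necessarily equals the sum over POS, whatever mapping is used.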

As shown in Fig. 4, the paths of use of the meaning-classes during acquisition are very similar in the two languages. (The sum of the RU of the meaning-classes is strictly equal to the sum of the POS RU.) The average between-language correlation for corresponding meaning-classes is considerably higher than the average POS-to-POS correlation (Table 4).

A strange behavior is displayed by the class of "Others", in that its use in the two languages is negatively correlated. It is not clear from the data why the use of this class displays so different a statistical picture in the two languages. However, some reasoning is proposed in the next section.

Figure 4. Meaning classes' use with advancing age : Ent - Entities, Rel - Relationships, Crc - Circumstances, Qlt - Quality, Qnt - Quantity, Oth - Others, Sum of all classes (Sum of POS RU).

Table 4. Correlations of the use of classes between English and French

The conclusion at this point is that when children's expressions are classified into the proposed classes of meaning, the statistical pictures that describe the two acquisition processes are very similar, as seen from the plot in Fig. 4 and from the correlations.
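A between-language comparison of this kind can be sketched with a plain Pearson correlation computed per class over the age points. The text does not state which correlation coefficient was used, so Pearson is an assumption here; the function names and all numbers below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def between_language_correlations(english, french):
    """Correlate each meaning-class series across the two languages."""
    return {cls: pearson(english[cls], french[cls]) for cls in english}


# Hypothetical class RU per age point (invented numbers, not the study's data).
english = {"Entities": [0.2, 0.4, 0.7, 0.9], "Others": [0.6, 0.5, 0.5, 0.4]}
french = {"Entities": [0.1, 0.5, 0.6, 1.0], "Others": [0.7, 0.4, 0.3, 0.2]}

correlations = between_language_correlations(english, french)
average = sum(correlations.values()) / len(correlations)
```

The same function applied to per-POS series would give the POS-to-POS correlations with which the class-level average is compared.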

7. DATA OBSERVATION

The proposed meaning-classes are viewed here as language replications of the semantic roles that humans mentally construct from their interaction with the environment. If the classes have this function, they would be detectable in children's speech from the first stage of language production. Observation of the data-collections confirmed their use from its outset, i.e., at 10-14 months; examples of the first use of each class in the two languages are given in Appendix C.

A commonly accepted fact in the field of child language-acquisition is that the ability to learn arbitrary associations between words and objects develops until about 14 months of age (e.g. Werker et al., 1998). Brain studies (e.g. Friedrich and Friederici, 2005) also suggest that the processes underlying semantic integration are already developed at the age of 14 months. The analysis of the data shows that at the age of 14 months all the classes of the proposed set are used by the two language groups taken as a whole.

The investigation of the English data collection confirmed statistically that there are differences in the acquisition process reflecting children's individual abilities (Atanasov et al., 2016). In the data examined here, the age-group of 10-13 months comprises dialogues of 21 children, 14 acquiring English and 7 acquiring French. The dialogues of the 13-month-old English-acquiring children (9 different children) contain all the classes except Quantity and Precision, and those of the French-acquiring children (4 children) contain all the classes. The dialogues of the 14-month-old English-acquiring children (15 children) already contain all the classes. At 16 months the use of all classes is already intensive in both language samples (Fig. 4). At 16 months, 2 of the 4 recorded English-acquiring children and 2 of the 4 recorded French-acquiring children used all the classes within single dialogues. Two of the "classes-incomplete" dialogues belong to children who were also recorded at a younger age, which showed that these children had used the "missing" classes in their dialogues at 14 and 15 months of age.

The conclusion that can be derived is that the two language-collections' samples support the primary character of the proposed meaning classes.

There are questions concerning the behavior of the classes, however, which have to be clarified. As discussed in the previous section, the classes' acquisition displays a very similar, smoothly growing development, except for the class of Others. The class comprises Onomatopoeias, Interjections and Communicators, and its use in both languages is initially high and tends to decrease over time (Fig. 4). The development of its components over time is plotted in detail in Fig. 5.

Results obtained in specialized domains suggest some reasons for the behavior of the class of Others and for the statistical differences in its use. A body of research supports Imai and Kita's sound symbolism bootstrapping hypothesis (Imai and Kita, 2014), which states that sound iconicity facilitates language learning in general (e.g. Asano et al., 2015). For example, studies of adults' and children's language-learning have shown that non-Japanese speakers learn sound-symbolic Japanese adjectives (Lockwood et al., 2016) and verbs (Imai et al., 2008) more easily. The results obtained by Fenson and colleagues (Fenson et al., 1994) regarding English and Spanish children's language-acquisition showed that the earliest-acquired words were those judged as being most iconic, where Onomatopoeias and Interjections were rated as being highest in iconicity.

Figure 5. The class of Others. Correlations: Communicators: -0.415, Onomatopoeias: 0.352, Interjections: -0.084.

Laing (2017) used an approach based on a picture-mapping task and reported an advantage for onomatopoeia in mapping words to semantic items in a broader perceptual sense. The extended analysis of the results related to these phenomena, proposed by Laing, states that onomatopoeia probably constitutes the most obvious and common form of iconicity, but that ideophones (e.g., glisten, jingle) and mimetics (found in Japanese) also contribute to iconicity. The author states that the extent of this contribution varies across languages.

Concerning the statistical picture obtained in the present study, in the light of these cited results it is hypothesized here that Onomatopoeias, known to be dominant in infants' early lexicons, are initially used to name Entities, Relationships, etc., which explains the decrease in their use over time in both languages. It can be supposed that the use of onomatopoeias is language and culture dependent, which may explain the observed differences between the two languages.


As may be seen from the plot in Fig. 5, the negative between-language correlation observed for the Others class is due mainly to the dissimilar use of Communicators (yes, no, thank you, hi!, oui, merci, salut!, etc.) and Interjections (see Table 3). It should be noted that the p-value for the correlation of Communicators is 0.17, so formally the correlation is not statistically reliable. In fact, observation of the data shows that the intensity of use of Communicators varies widely across dialogues. Unsurprisingly, the use of Communicators is dialogue-dependent: it reflects the influence of the context occasioning the dialogue on its content. It is plausible, too, that children's individual habits have an impact on their use of Communicators. In any case, in the English dialogues the level of use of Communicators is approximately constant, whereas the French data contain dialogues in which the use of Communicators drops drastically over time.
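For orientation, a p-value of this kind is commonly obtained from the t-test for a Pearson correlation coefficient. The text does not say which test was applied, so the following is an assumption; $r$ is the correlation, $n$ the number of paired age points (not given in this section), and $\rho$ the population correlation.

```latex
t = r\,\sqrt{\frac{n-2}{1-r^{2}}}, \qquad
t \sim t_{n-2} \quad \text{under } H_{0}\colon \rho = 0 .
% For the Communicators, r = -0.415, hence 1 - r^2 \approx 0.828 and
% t \approx -0.456\,\sqrt{n-2}.
```

With a small number of age points, a correlation of this magnitude does not reach conventional significance levels, which is consistent with the reported p-value of 0.17.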

The plots in Fig. 6 show the distances found after multidimensional scaling for the set of POS RU in the two languages (for the entire period of language acquisition investigated). The plots suggest that use of POS advances en bloc (with the exception of Nouns and Verbs) in both languages. Only the use of Communicators displays a markedly distant point, suggesting their separate role in language expression.
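Multidimensional scaling takes a matrix of pairwise distances as input. The sketch below shows only that first step, computing Euclidean distances between the RU profiles of POS over the age points; the scaling itself is not reproduced. The function names and the profile values are hypothetical.

```python
import math


def distance_matrix(profiles):
    """Pairwise Euclidean distances between RU profiles, keyed by label pair."""
    labels = sorted(profiles)
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a in labels
        for b in labels
    }


def farthest_label(profiles):
    """Label whose mean distance to all the other labels is largest."""
    dist = distance_matrix(profiles)
    labels = sorted(profiles)
    return max(
        labels,
        key=lambda a: sum(dist[(a, b)] for b in labels if b != a) / (len(labels) - 1),
    )


# Hypothetical RU profiles over three age points (invented, not the study's data).
profiles = {
    "adj": [0.02, 0.05, 0.09],
    "adv": [0.03, 0.06, 0.10],
    "det": [0.01, 0.05, 0.11],
    "co": [0.30, 0.28, 0.29],
}
outlier = farthest_label(profiles)
```

In this invented example the profile labelled `co` lies far from the others, which is the kind of pattern a markedly distant point in a scaling plot reflects.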

[Figure 6: two panels, each titled "Derived Stimulus Configuration, Euclidean distance model"; the first panel is labelled English.]

Figure 6. Euclidean distance model after multidimensional scaling of the POS RU

In French, the use of the Others class decreases as language-acquisition progresses (Fig. 4). One may suppose that the decrease is offset by the use of alternative POS (as is found regarding Onomatopoeias), or perhaps by other means of communicating.

[Figure 7: two age groups are compared. Legend: Communicators - oui, ah oui, oh oui, ben oui, ouais, mh, mhm; Adverbs - d'accord, si, ben si, mais si.]

Figure 7. Example - use of some affirmative communicators and adverbs in French

Fig. 7, for example, depicts the analyzed samples regarding the use of affirmative communicators in French, together with other means often used by French speakers to express affirmation (where "si" is a quite specific way of expressing a double negation, i.e., expressing approval in contradiction to a negative statement just made by the other speaker in the exchange).

Speech communication in free dialogues is an act. The internal dispositions, intents, and emotional states of the speaker are not conveyed in a uniformly faithful manner by the pronounced words alone. Even allowing for the fact that the prosody of the utterance conveys a lot of information, accompanying signs and reactions such as gestures and gaze also form part of the communication. This behavioral aspect of language communication is culture dependent. As an example, communication within a group of speakers of Italian looks different from that within a group of speakers of Dutch. It seems reasonable to propose that the difference observed in the statistical result is due to the influence of the adults' language and communication practices in the two cultures.

Re-addressing the questions of concept formation and language production after undertaking her huge analysis, Laing (2017) discusses the role of iconicity in early language development as follows: "...These 2 words [dog and ball, found to be among the first 10 most frequently used by small children] are among the 3 least iconic of the 10 words overall, and dog is both the least iconic and least systematic of the 2 words. In these two cases, therefore, the motivation behind their early acquisition cannot be driven by iconicity. "

Perhaps the reason for children's frequent production of such words could reside in the inborn necessity to mentally represent the related semantic types.

8. DISCUSSION

The idea underlying this study is not novel in linguistics. The overall approach can be seen as a statistical investigation towards the "semantic bootstrapping hypothesis" proposed by Steven Pinker (Pinker, 1987). Pinker supposes the existence of a "semantic inductive basis" that helps children in the acquisition of language rules by means of "syntax-semantic pairing". The content of the inductive basis proposed by Pinker comprises categories such as "name of person or thing", "action or change of state", "attribute" and "spatial relation, path or direction". Further, Pinker's work shows how these categories can lead to the acquisition of syntactic rules.

The set of classes proposed here, inferred from the Actor in the environment model, came to be quite similar to the categories proposed by Pinker. The results of the present study support the hypothesis that syntactic rules are based on semantic determinants and suggest, too, that this basis is common to all humans.

This implies that during the course of evolution, over a long time-span, the development of languages has been dictated by the development of the mechanism for mental representation of the world.

Let us present language development (and acquisition) as depending on two complementary factors: the one being how necessary it is to encode some item of information in order to communicate it (in terms of its importance for the continued communal existence of humankind), and the other being how feasible it is to do so. One may suppose that, in evolutionary terms, the necessity to communicate in order to survive has influenced the development of the mental capacities required.

Communicating information by means of spoken language requires first conceptualizing it and then assigning it a phonetic content (label). This raises the question of which abilities are necessary for conceptualization and which for phonological encoding.

In the languages examined, Communicators and Interjections are mostly expressed with short syllables of simple phonetic content that are easily memorized and pronounced. These lexemes express internally generated, affective reactions to information that has been processed (immediately or in the past), and express products of limbic system processes. They can be seen as speech-expressions of internal information flows that encode intrinsic characteristics of the Actor. This explains their intensive use in the initial period of language production (see Fig. 5).

From the standpoint of conveyed meaning, Communicators are the most complex elements of speech, as they serve to communicate agreement or disagreement, intents, internal dispositions, etc., evaluating the overall conceptualized situation. Indeed, Communicators serve to summarize and convey, in a single word, both the Actor's overall conception of a situation and his or her immediate stance or inclinations in reaction to it. As stated in the Wikipedia article on "Yes and No", "They are sometimes classified as a part of speech in their own right: sentence words or word sentences."

Onomatopoeias, most likely, are used intensively at first because their sound-symbolic nature facilitates the word-to-concept mapping. From the standpoint of the mental representation of concepts, their use is equivalent to the use of Entities, Relationships, Circumstances, Quality and Quantity.

The problem concerns the mechanisms ensuring concept creation and their dependence on brain resources. Fennell and Werker (2003) argued that 14-month-old children's failure in associative word-learning situations is due to a processing overload (which, however, does not prevent them from discriminating the words' phonetic detail).

Children's speech draws on the concepts formed so far, so the volume of speech reflects the processing load that the mental system can sustain at a given age. The plot of the use of classes (Fig. 4) shows that the line of the Sum of classes develops in a very similar way for the two language groups of children (correlation 0.87). This suggests that some resource underlying the conceptualization abilities is used quite similarly by the two language groups.

The statistical picture displayed entails several questions. One is why the classes participate with different weights within the meaning construction. The reason for this should be related to the processing charge that they demand.

A second question concerns the proportions of the meaning-classes: in English the intensity of use develops in the order Entities, Relationships, Circumstances (Fig. 4), whereas in French the order is Entities, Circumstances, Relationships. These classes serve to mentally re-describe an Event. The mental image of an Event consists (in general) of Entities, Relationships and Circumstances. The proportion of their use to express an Event can be language dependent. The sums of the RU of these three classes develop in a very similar way (correlation up to 36 months: 0.905). This suggests that the mental process treats the two schemes with equal effort.

All these questions necessitate establishing a model depicting the complexity of the mental processing associated with different meaning-classes that can explain the reported statistical observation.

9. CONCLUSION

Theories and studies in the field of child language-acquisition have increasingly concentrated on the relation between language units and semantic representations. Despite the huge number of brain studies investigating reactions to semantic stimuli, semantic confusion, word-to-concept mapping and other aspects of semantics, there is still no explanation of where semantic representations come from and what primary role they have. In other words, why do they exist?

The present study proposes a model which, as a first step, portrays the general biological foundation for the existence of semantic representations as information substance. The model posits that children are born equipped for the role of Self-actor in the environment and supposes that inborn information-treatment mechanisms organize information into general meaning-related classes that have the role of ensuring the Actor's survival in the environment.

The classes-hypothesis is tested by analyzing data from child language-acquisition of two languages. When children's speech is considered in terms of the use of these classes, the similarity between the two language-acquisition processes is substantial. An essential statistical observation is that children use representatives of all these classes from the onset of language production, an indication that the proposed classes reflect inborn mechanisms for mental representation of the world.

The presented result and reasoning, as often happens in science, give rise to several new questions. The most important of them concerns the processing load, which, according to the data, differs across the classes.

Conflict of interests

The author declares no conflict of interest.

ACKNOWLEDGMENTS

The author is thankful to Brian MacWhinney and his collaborators for creating and maintaining online the CHILDES (Child Language Data Exchange System) corpora, and to the researchers who shared their valuable results there, without which this study would not have been possible; to Gary Mazzaferro, for the main ideas underlying this study and the suggested sources in the areas of cognition and information modelling; and to Richard Traub for his valuable advice on the subject of cognition and for patiently correcting and editing this paper's text.

REFERENCES

Asano, M., Imai, M., Kita, S., Kitajo, K., Okada, H., & Thierry, G. (2015). Sound symbolism scaffolds language development in preverbal infants. Cortex, 63, 196-205. https://doi.org/10.1016/j.cortex.2014.08.025

Atanasov, D., Slavova, V., & Andonov, F. (2016, July). A Statistical Study of First Language Acquisition: No Gender Differences in the Use of Parts of Speech. In Proc. of the 12th Annual International Conference on Computer Science and Education in Computer Science (CSECS 2016) (pp. 1-4). https://www.researchgate.net/profile/Velina_Slavova/publication/315729185_A_STATISTICAL_STUDY_OF_FIRST_LANGUAGE_ACQUISITION_NO_GENDER_DIFFERECES_IN_THE_USE_OF_PARTS_OF_SPEECH/links/58dfa453a6fdcc41bf920578/A-STATISTICAL-STUDY-OF-FIRST-LANGUAGE-ACQUISITION-NO-GENDER-DIFFERECES-IN-THE-USE-OF-PARTS-OF-SPEECH.pdf

Barsalou, L. (2003). Situated simulation in the human conceptual system. Language and cognitive processes, 18(5-6), 513-562. http://dx.doi.org/10.1080/01690960344000026

Bassano, D. (2000). Early development of nouns and verbs in French: Exploring the interface between lexicon and grammar. Journal of child language, 27(03), 521-559. https://www.cambridge.org/core/journals/journal-of-child-language/article/early-development-of-nouns-and-verbs-in-french-exploring-the-interface-between-lexicon-and-grammar/854B3A644E6542A31487A7B2343828ED

Bergelson, E., & Swingley, D. (2012). At 6-9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences, 109(9), 3253-3258. doi:10.1073/pnas.1113380109 http://www.pnas.org/content/109/9/3253.full

CHILDES - the child language component of the TalkBank system for sharing and studying conversational interactions. http://childes.talkbank.org/

De Saussure, F. (1916). Cours de linguistique générale, publié par Ch. Bally et A. Sechehaye avec la collaboration de A. Riedlinger. Paris: Payot. https://books.google.bg/books?hl=en&lr=&id=wmLQflL01Y4C&oi=fnd&pg=PA10&dq=De+Saussure,+F.+(1916).+Cours+de+linguistique+g%C3%A9n%C3%A9rale,+&ots=nqYpoyz0Rh&sig=V6Si7Ckl8YMEFfTHqBnHKTwThBU&redir_esc=y#v=onepage&q=De%20Saussure%2C%20F.%20(1916).%20Cours%20de%20linguistique%20g%C3%A9n%C3%A9rale%2C&f=false

Fennell, C. T., & Werker, J. F. (2003). Early word learners' ability to access phonetic detail in well-known words. Language and speech, 46(2-3), 245-264. doi:10.1177/00238309030460020901

Fenson, L., Dale, P. S., Reznick, J. S., Bates, E., Thal, D. J., Pethick, S. J., ... & Stiles, J. (1994). Variability in early communicative development. Monographs of the society for research in child development, i-185. doi:10.2307/1166093

Friedrich, M., & Friederici, A. D. (2005). Lexical priming and semantic integration reflected in the event-related potential of 14-month-olds. Neuroreport, 16(6), 653-656. http://journals.lww.com/neuroreport/Abstract/2005/04250/Lexical_priming_and_semantic_integration_reflected.28.aspx

Frishberg, N. (1987). Home sign. Gallaudet encyclopedia of deaf people and deafness, 3, 128-131. https://scholar.google.bg/citations?view_op=view_citation&hl=en&user=Sw1utaAAAAAJ&citation_for_view=Sw1utaAAAAAJ:Y0pCki6q_DkC

Glezer, L. S., Kim, J., Rule, J., Jiang, X., & Riesenhuber, M. (2015). Adding words to the brain's visual dictionary: novel word learning selectively sharpens orthographic representations in the VWFA. Journal of Neuroscience, 35(12), 4965-4972. https://doi.org/10.1523/JNEUROSCI.4031-14.2015

Hausser, R. (1989). Principles of computational morphology. Technical report, Laboratory for Computational Linguistics, Carnegie Mellon University, Pittsburgh, PA. https://pdfs.semanticscholar.org/7dbb/02fdb2f14cf3f701e76fbf3165661197ceba.pdf

Hirsh-Pasek, K., & Golinkoff, R. M. (Eds.). (2010). Action meets word: How children learn verbs. Oxford University Press. https://books.google.bg/books?hl=en&lr=&id=McEVDAAAQBAJ&oi=fnd&pg=PR9&dq=Hirsh-Pasek+and+Golinkoff+(2006),+%E2%80%9CAction+meets+word:+How+children+learn+verbs.%E2%80%9D&ots=KI19ZZAWI4&sig=NBgr3jPCWuIFoS2VoyTuWzvtSgU&redir_esc=y#v=onepage&q=Hirsh-Pasek%20and%20Golinkoff%20(2006)%2C%20%E2%80%9CAction%20meets%20word%3A%20How%20children%20learn%20verbs.%E2%80%9D&f=false

Imai, M., Kita, S., Nagumo, M., & Okada, H. (2008). Sound symbolism facilitates early verb learning. Cognition, 109(1), 54-65. https://doi.org/10.1016/j.cognition.2008.07.015

Imai, M., & Kita, S. (2014). The sound symbolism bootstrapping hypothesis for language acquisition and language evolution. Phil. Trans. R. Soc. B, 369(1651). doi:10.1098/rstb.2013.0298

Laing, C. E. (2017). A perceptual advantage for onomatopoeia in early word learning: Evidence from eye-tracking. Journal of Experimental Child Psychology, 161, 32-45. https://doi.org/10.1016/j.jecp.2017.03.017

Lockwood, G., Dingemanse, M., & Hagoort, P. (2016). Sound-symbolism boosts novel word learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(8), 1274. http://dx.doi.org/10.1037/xlm0000235

MacWhinney, B., & Snow, C. (1985). The child language data exchange system. Journal of child language, 12(02), 271-295. https://doi.org/10.1017/S0305000900006449

MacWhinney, B. (2012). The CHILDES Project: Tools for analyzing talk. Electronic edition. Part 1: The CHAT transcription format. August 6, 2012. https://pdfs.semanticscholar.org/7dbb/02fdb2f14cf3f701e76fbf3165661197ceba.pdf

Moran, P., & Tommerdahl, J. (2011). A case study of linguistic isolation and questions about subsequent language support and educational provision in the United Kingdom. In Patricia Sutcliff, William J. Sullivan & Arle Lommel (eds.), The Linguistic Association of Canada and the United States, LACUS Forum 36: Mechanisms of Linguistic Behavior, 229-240. Houston, TX: LACUS, 01.05.17: http://www.lacus.org/volumes/36/216_moran_p.pdf

Pinker, S. (1987). The bootstrapping problem in language acquisition. Mechanisms of language acquisition, 399-441. https://goo.gl/USrGZM

Robinson, G. A., Butterworth, B., & Cipolotti, L. (2015). "My Mind Is Doing It All": No "Brake" to Stop Speech Generation in Jargon Aphasia. Cognitive and Behavioral Neurology, 28(4), 229-241. doi:10.1097/WNN.0000000000000080

Slavova, V., & Soschen, A. (2015a). On mental representations: Language structure and meaning revised. International Journal Information theories & applications, 2(4), 316-325. http://www.foibg.com/ijita/vol22/ijita22-04-p02.pdf

Slavova, V., & Soschen, A. (2015b). Syntactic operations - modelling language faculty. International Journal Information theories & applications, 2(4), 326-337. http://www.foibg.com/ijita/vol22/ijita22-04-p03.pdf

Slavova, V. (2016, July). Data Collection for Studying Language Acquisition. In Proc. of the 12th Annual International Conference on Computer Science and Education in Computer Science (CSECS 2016) (pp. 1-4). https://www.researchgate.net/profile/Velina_Slavova/publication/316406783_DATA_COLLECTION_FOR_STUDYING_LANGUAGE_ACQUISITION/links/58fc6cb2aca2723d79d89506/DATA-COLLECTION-FOR-STUDYING-LANGUAGE-ACQUISITION.pdf

Torigoe, T., & Takei, W. (2002). A descriptive analysis of pointing and oral movements in a home sign system. Sign Language Studies, 2(3), 281-295. doi:10.1353/sls.2002.0013

Werker, J. F., Cohen, L. B., Lloyd, V. L., Casasola, M., & Stager, C. L. (1998). Acquisition of word-object associations by 14-month-old infants. Developmental psychology, 34(6), 1289. http://dx.doi.org/10.1037/0012-1649.34.6.1289

Appendix A

List of the corpora in CHILDES data repository used in this study.

Corpus in CHILDES: number of dialogues included in this study

English Belfast Corpus: 10
English Bernstein-Ratner Corpus: 5
English Bliss Corpus: 2
English Bloom73 Corpus: 6
English Braunwald Corpus: 111
English Brent Corpus: 24
English Brown Corpus: 33
English Clark Corpus: 1
English Cornell Corpus: 11
English Demetras1 Corpus: 25
English Feldman Corpus: 19
English Fletcher Corpus: 10
English Gleason Corpus: 45
English Hall Corpus: 2
English Higginson Corpus: 4
English HSLLD Corpus: 36
English MacWhinney Corpus: 22
English NewEngland Corpus: 19
English Peters Corpus: 8
English Post Corpus: 20
English Rollins Corpus: 24
English Sachs Corpus: 4
English Snow Corpus: 4
English Suppes Corpus: 1
English VanHouten Corpus: 15
English Warren Corpus: 4
English Weist Corpus: 10
English-USA Bates Corpus: 99
Eng-USA Soderstrom Corpus: 1
French Champaud Corpus: 32
French Geneva Corpus: 15
French Hammelrath Corpus: 224
French Hunkeler Corpus: 22
French Leveillé Corpus: 34
French Lyon Corpus: 207
French MTLN Corpus: 299
French Paris Corpus: 52
French York Corpus: 23
Phonbank English Providence Corpus: 1

Appendix B: Presentation of the Data. Example: English — the beginning of a dialogue.

a) As presented in CHILDES, the data source (window of the CLAN interface). Legible content of the screenshot:

@Begin
@Languages: eng
@Participants: CHI Ross Target_Child, MAR Mark Brother, MOT Mary Mother, FAT Brian Father
@Situation: Ross was being real nice to Brian and saving candy for him.
*FAT: you can eat it now .
%mor: pro|you mod|can v|eat pro|it adv|now .
%gra: 1|3|SUBJ 2|3|AUX 3|0|ROOT 4|3|OBJ 5|3|JCT 6|3|PUNCT
*CHI: no .
%mor: co|no .
%gra: 1|0|INCROOT 2|1|PUNCT
*FAT: okay .
%mor: co|okay .
*FAT: I'll eat it tomorrow .
%mor: pro:sub|I~mod|will v|eat pro|it adv:tem|tomorrow .
*CHI: no .
%com: he misunderstood who was the subject of the last sentence
*CHI: I'm saving it for you for tomorrow .
%mor: pro:sub|I~aux|be&1S part|save-PRESP pro|it prep|for pro|you prep|for adv:tem|tomorrow .

b) Tagged and additionally annotated records of the same dialogue, as stored locally in the constructed database. As far as the screenshot is legible, each header line, utterance and dependent tier (%mor, %gra, %com) is stored as one record with the dialogue number (49), the dialogue file (rrs49.mc), a record identifier (rrs49.mcw0002 onwards), the utterance number, the text and the speaker.

c) Child speech after extraction from the same dialogue, stored in the local database table "Children speech" (in 3rd normal form): one record per child utterance together with its %mor and %gra tiers, here "no ." and "I'm saving it for you for tomorrow ."
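The extraction of POS labels from the %mor tier shown in this appendix can be sketched as follows. This is a simplified illustration and not the tool used to build the database: the function name `parse_mor` is hypothetical, and only the markers visible in the examples are handled ("|" between POS and stem, "~" for clitics, "-" and "&" for suffixes, "=" before a gloss); prefixes and compounds are not treated.

```python
import re


def parse_mor(tier):
    """Split a %mor tier into (POS, stem) pairs, dropping affix markers."""
    pairs = []
    for token in tier.split():
        # "~" joins a clitic to its host, e.g. pro:sub|I~aux|be&1S
        for part in token.split("~"):
            if "|" not in part:
                continue  # punctuation such as "." or "?"
            pos, form = part.split("|", 1)
            # "-" introduces a suffix, "&" a fused one, "=" a gloss
            stem = re.split(r"[-&=]", form, maxsplit=1)[0]
            pairs.append((pos, stem))
    return pairs


# %mor tier of the child's last utterance in the dialogue above.
example = parse_mor(
    "pro:sub|I~aux|be&1S part|save-PRESP pro|it prep|for "
    "pro|you prep|for adv:tem|tomorrow ."
)
```

The POS labels obtained this way are the units that are counted and then sorted into the meaning-classes.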

Appendix C. First use of the proposed meaning classes

Examples for the period 09-19 months in English and 11-20 months in French. For each class, the first 10 examples found are listed; no use of a class was observed in the corpora before the earliest month shown for it.

Months | Dialogue | Speech (English) | POS

Entities

09 ale09.br Mama . n:prop|Mama .
09 mor09.br Dada . n:prop|Dada .
09 mor09.br n|egg .
10 mir10.br Ma &=noise . n:prop|Ma .
11 may11 Jig shoes . n|shoe-PL .
11 may11 Jig baby . n|baby .
11 may11 Jig (ba)nana . n|banana .
11 dil11.br no , Dillon &=noise . co|no cm|cm n:prop|Dillon . (Self)
12 rol12.ro ca(r) . n|car .
12 rol12.ro me me . pro:obj|me pro:obj|me .

Relationships

11 mog11.br go &=noise ! v|go !
12 rat12.ro <what's> [/] what's that ? pro:wh|what~cop|be&3S pro:dem|that ?
12 chg12.ro pulls on hat . v|pull-3S prep|on n|hat .
13 mrg13.br done . part|do&PASTP .
14 im14.ne draw . v|draw .
15 ali15.br &d® sit . v|sit .
16 wil16.pr <tired [?]> . part|tire-PASTP .
16 ali16.bl climb . v|climb .
16 ali16.bl gone . part|go&PASTP .
17 lah17.bw eating . part|eat-PRESP .

Circumstances

12 rat12.ro <what's> [/] what's that ? pro:wh|what~cop|be&3S pro:dem|that ?
13 bry13.ne yyy up . adv|up .
14 mir14.br out . adv|out .
14 nao14.sa dere [: there] . adv|there .
15 taz15.bw out Dee baba [= bottle] . prep|out n:prop|Dee n|baby .
15 taz15.bw Mommy out . n:prop|Mommy adv|out .
16 ali16.bl away . adv|away .
16 ali16.bl there Mama . adv|there n:prop|Mama .
17 lae17.bw outside [= actually says side] . adv|outside .
17 lae17.bw down . adv|down .

Quality and Attribution

10 | mir10.br | yummy [?] &=noise . | adj|yum&dn-Y .
13 | mrg13.br | hot . | adj|hot .
14 | nor14.ne | orange . | adj|orange .
14 | mir14.br | big . | adj|big .
15 | taz15.bw | my me Dada . | pro:poss:det|my pro:obj|me n:prop|Dada .
16 | stf16.pe | gross . | adj|gross .
16 | ali16.bl | dirty . | adj|dirt&dn-Y .


17 | lae17.bw | I want my bottle . | pro:sub|I v|want pro:poss:det|my n|bottle .
17 | lag17.bw | a@z:sc <my bike> [?] . | imk|a pro:poss:det|my n|bike .
18 | ger18.cl | he sleepy . | pro:sub|he adj|sleep&dn-Y .

Quantity and Precision

14 | nor14.ne | the woof . | det|the on|woof .
14 | nor14.ne | that duck . | det|that n|duck .
15 | ali15.br | a mommy . | det|a n|mommy .
16 | ali16.bl | more . | pro:indef|more .
17 | lah17.bw | xxx have that one . [+ PI] | v|have det|that pro:indef|one .
18 | lae18.bw | <eat all> [?] . | v|eat pro:indef|all .
18 | ger18.cl | two , I . | det:num|two cm|cm pro:sub|I .
19 | lah19.bw | xxx this one [?] [>] . [+ PI] | det|this pro:indef|one .
19 | laf19.bw | <six [?] egg [* 0s]> [<] . | det:num|six n|egg .

Months | Dialogue | Speech French | POS

Entities

11 | jli11.Pa | yyy maman . | n|maman&f .
11 | jli11.Pa | papa . | n|papa&m .
12 | mrc12.Ly | bébé ! | n|bébé&m !
12 | mra12.Ly | des <cubes [=! gémit]> . | prep|de&les n|cube&m-PL .
12 | tmc12.Ly | chat . | n|chat&m .
12 | mra12.Ly | <(..) oh> les pingouins . | co|oh det|les&pl n|pingouin&m-PL .
12 | mra12.Ly | un <vélo [=! gémit]> . | det|un&m&sg n|vélo&m .
13 | ans13.Ly | c'est bien Ana . | pro:dem|ce$v:aux|être&PRES&3s n|bien&m n:prop|Ana . (N.B. Self)
13 | tmd13.Ly | <gâteau [/]> gâteau . | n|gâteau&m .
13 | áml3X,y | ramasse un lego . | v|ramasser-PRES&SUB&13s det|un&m&sg n|lego&m=toy .

Relationships

11 | tmb11.Ly | +< cache . | part|cacher-PP&m .
12 | mra12.Ly | tu dis ! | pro:subj|tu v|dire-PASS&PRES&12s !

12 nad 12Xy +<«■[= ! tit : .

12 | mra12.Ly | +< veux dormir là . | v:mdl|vouloir&PRES&12s v|dormir-INF adv:place|là .
14 | mrb14.Ly | ça tourne . | adv|çà v|tourner-PRES&SUB&13s .
14 | ana14.Ly | eh regardes . | co|eh v|regarder-PRES&SUB&2s .
14 | mrb14.Ly | me <voir [?]> . | pro:obj|me v|voir&INF .
15 | tmb15.Ly | est là . | v:aux|être&PRES&3s adv:place|là .
16 | tmd16.Ly | <il est où> [?] doudou . | pro:subj|il v:aux|être&PRES&3s pro:rel|où n|doudou&m=blankie .
18 | tmd18.Ly | a pas . | v:aux|avoir&PRES&3s adv:neg|pas .

Circumstances

11 | jli11.Pa | ça ? | pro:dem|ça ?
12 | tmc12.Ly | encore ah . | adv|encore co|ah .
12 | mra12.Ly | +< <non ,> tout de suite . | co|non=no cm|cm adv|tout de suite .
13 | tmc13.Ly | et c(el)ui-là ? | conj|et pro:dem|celui-là ?
14 | mrb14.Ly | <ça [/]> ça tourne yyy . | pro:dem|ça v|tourner-PRES&SUB&13s .
15 | anz15.Ly | et là . | conj|et adv:place|là .
15 | ans15.Ly | et <voilà [?]> . | conj|et adv:place|voilà .
16 | tmc16.Ly | <à côté> [?] . | prep|à n|côté&m .
17 | mra17.Ly | ah <dedans [?]> . | co|ah adv:place|dedans .
17 | ana17.Ly | à moi . | prep|à pro|moi&sg .

Quality and Attribution

12 | nsh12.Ly | +< <ma [?]> maman . | det:poss|ma&f&sg n|maman&f .
12 | tmd12.Ly | grand . | adj|grand&m .
15 | atm15.Ly | bleu ! | adj|bleu&m !
16 | tma16.Ly | fermé [?] . | adj|fermé&m .
16 | tmc16.Ly | xxx rigolo . | adj|rigolo&m .
17 | tmb17.Ly | sa tête . | det:poss|sa&f&sg n|tête&f .
17 | ane17.Ly | pas gentille [?] . | adv:neg|pas adj|gentille&f .
18 | tma18.Ly | rouge [?] . | adj|rouge .
18 | tme18.Ly | est lourd [?] . | v:aux|être&PRES&3s adj|lourd&m .
19 | mrd19.Ly | yyy mon sac sac . | det:poss|mon&m&sg n|sac&m n|sac&m .

Quantity and Precision

12 | mra12.Ly | et les deux . | conj|et det|les&pl num|deux .
12 | mra12.Ly | parce que <il [/]> <il [/]> <il [/]> il a lit un petit peu . | conj|parce pro:rel|que pro:subj|il v:aux|avoir&PRES&3s n|lit&m det|un&m&sg adj|petit&m n|peu&m .
13 | mra13.Ly | <la la la [?]> . | det|la&f&sg det|la&f&sg det|la&f&sg .
14 | mrc14.Ly | le chat . | det|le&m&sg n|chat&m .
15 | jli15.Pa | elle va aller dans le yyy . | pro:subj|elle v:mdl|aller&PRES&3s v:mdl|aller&INF prep|dans det|le&m&sg .
16 | tmb16.Ly | une abeille . | det|une&f&sg n|abeille&f=bee .
17 | ana17.Ly | c'est le feutre . | pro:dem|ce$v:aux|être&PRES&3s det|le&m&sg n|feutre&m .
18 | tme18.Ly | <deux trois> [?] . | num|deux&m num|trois&m .
19 | jli19.Pa | c'est un pain ! | pro:dem|ce$v:aux|être&PRES&3s det|un&m&sg n|pain&m !
20 | ant20.Ly | et un carré pour Ana petit (N.B. Self) | conj|et det|un&m&sg adj|carré&m n|pour&m n:prop|Ana adj|petit&m .

N.B. The youngest children in the French group are 11 months old.
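The POS column in the examples above appears to follow the CHILDES MOR notation, in which each word is written as `category|stem`, optionally followed by `-SUFFIX` or `&FUSION` markers. A minimal Python sketch of how such a tier could be sorted into the proposed meaning classes is given below. The names `CLASS_BY_POS`, `split_token` and `classify` are hypothetical, and the mapping is an assumption: it lists only tags that occur under each heading in the English examples of this appendix and is not the study's full classification.

```python
import re

# Assumed, partial mapping: only tags seen under each heading in the
# English examples of Appendix C. Not the paper's complete table.
CLASS_BY_POS = {
    "n": "Entities",
    "n:prop": "Entities",
    "pro:obj": "Entities",
    "v": "Relationships",
    "part": "Relationships",
    "adv": "Circumstances",
    "prep": "Circumstances",
    "adj": "Quality and Attribution",
    "pro:poss:det": "Quality and Attribution",
    "det": "Quantity and Precision",
    "det:num": "Quantity and Precision",
    "pro:indef": "Quantity and Precision",
}


def split_token(token):
    """Split 'pos|stem-SUFFIX' into (pos, stem); None for punctuation."""
    if "|" not in token:
        return None
    pos, rest = token.split("|", 1)
    # Cut the stem at the first suffix, fusion, or gloss marker.
    # Note: a stem that itself contains a hyphen would be truncated here.
    stem = re.split(r"[&=-]", rest, maxsplit=1)[0]
    return pos, stem


def classify(mor_line):
    """Return (stem, meaning class) pairs for one POS-tier line."""
    pairs = []
    # Whitespace separates words; '~' and '$' join clitics to a host word.
    for token in re.split(r"[\s~$]+", mor_line.strip()):
        parsed = split_token(token)
        if parsed is None:
            continue
        pos, stem = parsed
        pairs.append((stem, CLASS_BY_POS.get(pos, "unclassified")))
    return pairs


if __name__ == "__main__":
    for stem, meaning_class in classify("v|pull-3S prep|on n|hat ."):
        print(stem, "->", meaning_class)
```

Tags that are absent from the assumed table, such as `co` or `cm`, are reported as "unclassified" rather than guessed.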
