Improved computational models of sound change shed light on the history of the Tukanoan languages

Chacon Thiago Costa; List Johann-Mattis

Thiago Costa Chacon†, Johann-Mattis List‡

† University of Brasilia; ‡ Centre des Recherches Linguistiques sur l'Asie Orientale (Paris); corresponding author: T. C. Chacon, [email protected]


There has been much debate regarding the internal history of the Tukanoan languages during the last four decades, with different classification proposals being based on lexical and phonological data. Here, we present a new classification of the Tukanoan language family based on an improved computational approach which infers phylogenetic trees from proposed sound change patterns. In contrast to traditional methods based on the manual identification of shared innovations by experts, our method identifies valid innovations within a parsimony framework. In contrast to existing computational models which are mostly based on binary character states for lexical data, we model sound change patterns as directed weighted transitions between multiple character states. We apply the new approach to a set of 21 extant Tukano languages. Our results confirm the east-west split of the Tukanoan languages which was proposed in the past and suggest a classification which groups Kubeo with Tanimuka on the one hand, and Koreguahe with Maihiki, on the other hand, thus reconciling previous classifications. We use this new classification to propose a consensus phylogeny of Tukanoan in which all automatically inferred shared innovations were manually checked and uncertainties are explicitly displayed.

Keywords: sound change, phylogenetic reconstruction, Tukanoan language family, computer-assisted language comparison.

1. The Tukanoan language family

1.1. Comparative studies on the Tukanoan languages.

The Tukanoan language family comprises 29 languages spoken in the Northwest Amazon. These languages are distributed over a complex linguistic area in which they have been evolving in prolonged contact with languages from a large variety of different language families, including Quechuan, Arawakan, Cariban, Huitotoan, Nadahup, and Boran, and a couple of language isolates. Figure 1 displays the geographic distribution of the Tukanoan languages.

Comparative studies of the Tukanoan languages began more than a century ago (Brinton 1891). Since Beuchat and Rivet (1911), most scholars have agreed that the Tukanoan languages form a separate linguistic group with no relation to any other South American language. Building on data of Beuchat and Rivet (1911) along with non-linguistic evidence like geography and ethnology, Mason (1950) proposed to group the Tukanoan languages into two main branches, a western branch (Western Tukanoan, WT) and an eastern branch (Eastern Tukanoan, ET). The same division into two main branches was supported by Chacon (2014), based on shared innovations identified for the development of Proto-Tukanoan consonants. Alternative approaches group the Tukanoan languages into a western branch, an eastern branch, and a central (or middle) branch (Waltz and Wheeler 1972, Malone 1987, Barnes 1999, Ramirez 1997). The central branch comprises Kubeo (Kub) and Tanimuka (Tan), two languages which were assigned to ET in the two-branch classification. The family tree in Figure 2 shows the classification as presented in Chacon (2014).

* As part of the GlottoBank Project, this work was supported by the Max Planck Institute for the Science of Human History and the Royal Society of New Zealand Marsden Fund grant 13-UOA-121. This paper was further supported by the DFG research fellowship grant 261553824 "Vertical and lateral aspects of Chinese dialect history" (JML) and by the Brazilian Scientific Research Council (CNPq) grant for the project entitled "Changes and continuities in the history of the Tukanoan Family" (TCC). We thank Natalia Chousou-Polydouri and David Morrison for helpful discussions on the topic of rooting trees and non-reversible models. We thank Simon Greenhill for many helpful comments on the technical aspects of the paper. We also thank an anonymous reviewer for critical remarks.

Journal of Language Relationship • Вопросы языкового родства • 13/3 (2015) • Pp. 177-203 • © The authors, 2015

[Figure 1: map. Legend: Tukanoan, Arawakan, Quechuan, Cariban, Huitotoan, Nadahup, Boran, Unclassified / Other]

Figure 1. The location and the geographic distribution of the Tukanoan languages and their neighboring language families (the map is originally based on Glottolog, Hammarström et al. 2015, http://glottolog.org, but the layout was modified). A legend for the three-letter abbreviations for Tukanoan languages used in the map can be found in Figure 3. Tukanoan languages which were not considered in our study are marked in white font on black background.

Apart from the proposals regarding the major branches of the Tukanoan language family, scholars have also tried to identify more detailed subgroupings for the languages of the major branches. In WT, it is especially the position of Maihiki (Mai) which causes disagreement among scholars. Wheeler (1992), Barnes (1999) and Chacon (2014) place Maihiki as an outgroup to Koreguahe, Siona and Sekoya (Figure 3 A), while Skilton (2013) has Koreguahe as an outgroup (Figure 3 B). As for ET, Waltz and Wheeler (1972) opt for three main branches: Northern ET (including Tukano, Tuyuka, and Piratapuyo), Central ET (including, among others, Bará, Desano, and Tatuyo), which they further subdivide into three branches, and Southern ET (including Makuna and Barasano). Their classification, which is given in Figure 3 C, is basically confirmed by Barnes (1999). Ramirez (1997) treats Tanimuka as an outgroup and splits the rest of ET into three main branches, as shown in Figure 3 D.

Classifications of the Tukanoan family have been based on lexicostatistics (Waltz and Wheeler 1972, Ramirez 1997) and phonological innovations (Malone 1987, Chacon 2014, Wheeler 1992, Skilton 2013). In the approaches based on phonological innovations, one can find important differences regarding the reconstructed proto-sounds which often directly affect the subgrouping. Chacon (2014 and 2015) reconstructs a series of creaky voiced stops and a class of palatalized coronal stops (or coronal affricates, as an alternative reconstruction which is used in this paper). Other studies propose voiced stops instead of creaky voiced stops (Waltz and Wheeler 1972, Malone 1987) and a single coronal fricative (Malone 1987). Lexical comparison has been limited to lexicostatistics, where evidence for distinguishing innovations from retentions, which is crucial for subgrouping, is lacking. Lexical comparisons are further complicated by a high degree of contact among some geographically proximate Tukanoan languages, which is reflected in a strong correlation between geographic proximity, intermarrying patterns, and lexical similarity (Malone 1987, Gomez-Imbert 1993, Ramirez 1997, Chacon 2013, Chacon 2014). Since classifications based on phonological innovations do not show this correlation to the same degree (Chacon 2014), they seem to be more reliable for modeling the history of the Tukanoan language family, at least until larger amounts of lexical data are available.

[Figure 2: tree diagram. Legend: Western Tukano, East-Eastern Tukano, South-Eastern Tukano, West-Eastern Tukano. Languages shown: Tan, Kub, Des, Sir, Yup, Bas, Mak, Wan, Pir, Kar, Pis, Tuy, Yur, Tuk, Tat, Bar, Sio, Sek, Kor, Mai, Kue]

Figure 2. Classification of Tukanoan languages. The figure shows the classification of 21 Tukanoan languages based on shared innovations in the consonantal development as identified by Chacon (2014).

1.2. Sound change.

Sound change is a central aspect of language change, and the identification of sound change patterns is a key objective of the comparative method (Fox 1995, Ross and Durie 1996). Although the first scientific investigations of sound change date back almost 200 years (Rask 1818, Grimm 1822), gaining a deeper understanding of its nature remains one of the major challenges of modern historical linguistics.

Abbr.  Name        ISO
Bar    Bará        bao
Bas    Barasano    bsn
Des    Desano      des
Kar    Karapana    cbc
Kor    Koreguahe   coe
Kub    Kubeo       cub
Kue    Kueretu     -
Mai    Maihiki     ore
Mak    Makuna      myy
Pir    Piratapuyo  pir
Pis    Pisamira    -
Sek    Sekoya      sey
Sio    Siona       snn
Sir    Siriano     sri
Tan    Tanimuka    tnc
Tat    Tatuyo      tav
Tuk    Tukano      tuo
Tuy    Tuyuka      tue
Wan    Wanano      gvc
Yup    Yupua       -
Yur    Yuruti      yur

[Figure 3: tree diagrams in four panels, A to D.]

Figure 3. Comparing alternative proposals for the subgrouping of the Tukanoan languages. A and B show sub-groupings of Western Tukanoan, and C and D show alternative classifications of Eastern Tukanoan (see text).

When trying to model sound change for the purpose of phylogenetic reconstruction, it is important to pay attention to the specific characteristics of sound change as a process. Weinreich et al. (1968) have raised a number of issues for the scientific investigation of language change in general which likewise apply to the investigation of sound change.¹ The constraint problem refers to the typologically possible changes and their preconditions. The preconditions refer to the embedding of sound change in the linguistic structure, like the phonetic context in which a change occurs, or, more generally, the system which constrains or favors a change. The transition problem refers to the question of how sound change emerges from common articulatory variation, making its way into the phonological system of a language. The actuation problem deals with the driving forces behind the implementation of a given sound change in a specific norm of a speech community.

List (List 2014: 27) makes a distinction between the mechanisms, the types, and the patterns of sound change. Mechanisms deal with the procedural aspects of the sound change phenomenon and can be compared with the transition problem raised by Weinreich et al. (1968). Types deal with the substantial aspects of sound change (what sound changes under which conditions into which sound?). Patterns deal with the systemic aspects of sound change (what are the effects of sound change on the system of a given language? do they lead to a loss of a phonemic distinction, or do they introduce new distinctions?).

As for the mechanisms of sound change, the Neogrammarians advocated that sound laws are recurrent (regular) and exceptionless (Osthoff and Brugmann 1878). Exceptionless means that all sounds in all words of a given language at a given time change in the same manner if they occur in the same conditioning context. This view, which attributed all exceptions to proposed sound laws to the mechanism of borrowing and analogy (1968), was challenged more seriously in the 1960s, when research in Chinese dialectology revealed that certain mechanisms of sound change do not affect all words of a language's lexicon at the same time, but instead spread from word to word (Wang 1969, Chen 1972). Later, Labov (1981) showed that lexical diffusion and the sound change mechanism proposed by the Neogrammarians are two basic mechanisms of sound change, both of which can be observed in empirical studies on sound change in progress (Bermudez-Otero 2007). More recently, Kiparsky (1995) and Hock (2009) proposed to reconcile Neogrammarian sound laws and lexical diffusion by identifying lexical diffusion with a specific mechanism of analogy.

¹ When discussing these issues we disregard the social aspect of change, specifically the embedding in the social structure and the evaluation of change.

The types of possible sound changes are often characterized by distinguishing types affecting the number of segments (fusion, apocope, epenthesis, etc.), types modifying the distribution of segments (metathesis), and types altering the phonetic features of segments (plosivization, spirantization, nasalization, etc.), but it is often difficult to draw a clear line between these subtypes, since they often interact with each other. Kiparsky (1988) focuses on the underlying processes to distinguish three more general types of sound change: weakening processes, like assimilation, lenition, fusion, loss, and merger; strengthening processes, like dissimilation, fortition, and chain shifts; and prosodic processes, like compensatory lengthening, gemination, and epenthesis.

The Neogrammarians characterized the sound change process as phonetically blind. Much of the recent literature supports this traditional view, with the addition that perception is now seen as another important trigger of sound change (Ohala 1989). Other authors (including Tynjanow and Jakobson 1991[1928], Jakobson 1962[1929], Martinet 1952, Weinreich et al. 1968, Kiparsky 1988, 1995, Labov 1994) have emphasized the role which the linguistic structure plays in driving and restricting the types of sound changes. Thus, structural relations between sounds and distinctive features, the overall structure of the sound system of a language at a given stage, the functional load of phonological contrasts (Martinet 1952, King 1967), and the organization underlying morpheme-structure, are the ultimate factors that drive, constrain and accommodate patterns of change. Crucial in this respect is the role of perception and language acquisition in children, which may introduce selection biases into the pool of phonetic variants (Kiparsky 1995).

A robust and realistic model of sound change should include as many of the characteristics discussed above as possible (and potentially many more aspects which we did not mention). As a starting point towards more realistic computerized models of sound change, we think that the following three aspects are indispensable:

1) Gradiency. It has long been recognized that synchronic variation feeds diachronic change (Ohala 1989). Variants of /t/ before /i/, like [tʰ] or [tɕ], for example, may yield a change from [t] to [tʃ]. Gradiency is an important aspect of Neogrammarian sound change, while it is not necessarily characteristic of lexical diffusion. Nevertheless, assuming that lexical diffusion also has its basis in synchronic variation, a model of sound change should assume gradiency and try to account for it. Thus, instead of assuming a direct change from [p] to [h], we would assume a chain of changes from [p] via [pf] and [f] to [h].

2) Directionality. Sound change processes can be directional in the sense that a given sound X may change into a sound Y while the opposite is highly unlikely (Haspelmath 2004). It is beyond question that not all sound change transitions are directional, and that we can always find exceptions to very strong tendencies, but the directional component is a crucial characteristic of sound change, and any attempt to model sound change for the purpose of phylogenetic reconstruction needs to take it into account. However, despite the fact that many scholars seem to agree that there are certain tendencies of sound change which can be observed across the languages of the world, the number of studies in which these tendencies were rigorously investigated is rather small (Brown et al. 2013, Blevins 2004, Kümmel 2008, Dolgopolsky 1964 = 1986).

3) Context. The fact that contextual factors, including the sounds that precede or follow, but also suprasegmental aspects like accent and tone, may trigger specific sound changes has long since been recognized (Verner 1877). Some scholars distinguish conditioned from unconditioned sound changes (Campbell 1999: 17-19), but if one accepts that context plays a role in sound change, unconditioned sound change just represents a specific form of context that applies to all instances of a given change.

1.3. Traditional approaches to subgrouping.

There is a certain disagreement among scholars regarding the nature, the purpose, and the scope of the traditional comparative method in historical linguistics (Meillet 1954[1925], Weiss 2014). Some scholars see its main purpose in the proof of relationship (Anttila 1972, Harrison 2003), some see it as a general tool to study language history without further restrictions (Klimov 1990, Matthews 1997), some identify it with external reconstruction (Fox 1995, Lehmann 1969), and some see it as a method for language classification (Fleischhauer 2009). We think that the comparative method is best described as an overarching framework to study language history (Ross and Durie 1996, Klimov 1990). Whether the question of subgrouping should be included in this overarching framework has been the center of some debates (Harrison 2003), and while some linguists explicitly include phylogenetic reconstruction as one part of the workflow (Ross and Durie 1996), other scholars reject it.

No matter whether one considers it as part of the comparative method or not, the traditional approaches to subgrouping in linguistics go back to the end of the 19th century (Brugmann 1967) and are conceptually close to the framework of cladistics in evolutionary biology, which was developed in the 1950s (Hennig 1950). The common idea of subgrouping in classical historical linguistics and subgrouping in cladistics is that only shared innovations can be used to identify a valid clade. While this view seems sound at first sight, it is essentially circular for a couple of reasons. First, the identification of innovations defining a given subgroup requires knowing the ancestral state before the innovation occurred. This makes subgrouping vulnerable, as it depends on the criteria used for linguistic reconstruction, which can vary considerably from one linguist to another even when dealing with the same sound correspondences. The most extreme risk is that true innovations are analyzed as retentions and vice versa. The only way to circumvent the problem of circularity is to know the direction underlying the processes under investigation. Directionality is, however, only a necessary condition for the identification of shared innovations, since it is likewise possible that directional processes occur independently or are propagated by horizontal transmission. Second, subgrouping may bear the risk of circularity if linguists employ subgrouping hypotheses when doing their reconstructions, since, in common practice, many linguists have a certain tree topology in mind when refining their reconstructions.² Third, many sound innovations do not occur only once in a well-defined subset of languages. In fact, the most common types of sound change have a greater probability to occur multiple times in the evolutionary history of a linguistic family (homoplasy), not to mention cases of phonological borrowing (lateral transfer). In many cases it is a priori impossible to say whether a sound change is a "true" shared innovation (a singular evolutionary event) or an instance of independent innovations (multiple evolutionary events). In order to separate the wheat from the chaff, linguists need to distinguish which sound innovations are more reliable for subgrouping than others. This introduces, however, a further problem, since sound changes which are more reliable for subgrouping need to be rare. Rare sound changes, however, are difficult to observe and study, and the risk that they are merely based on wrongly proposed cognate sets or incorrectly interpreted assessments of directionality is very high (Harrison 2003). So no matter what we do in traditional subgrouping, as long as we try to use shared innovations, we are confronted with problems of circularity, epistemology, and objectivity.

² For instance, if languages A, B and C have the correspondence A r : B r : C l, the reconstructed proto-sound might depend on which of the three languages are hypothetically more closely related to one another: if B and C are more closely related, then *r is likely the proto-sound. However, if it is A and B that are more closely related, then the reconstruction of *l has equal probability. In the latter case, an independent motivation can guide the linguist making the reconstruction. For example, the existence of an overlapping correspondence set A r : B r : C r, where *r would definitely be the reconstructed form, leaves *l for the correspondence A r : B r : C l.

But how can we increase the objectivity of the traditional subgrouping process, and how can we circumvent the obvious problem of circularity? In this paper, we propose a computer-assisted method that helps to deal with conflicting patterns by using an objective criterion to identify the most adequate subgrouping solutions. This method models sound change as weighted directed transitions between character states in order to search for the phylogenies which provide the best explanation for the observed data. We use a parsimony framework to implement our model, but, apart from data sparseness and the complexity of implementation, there is no reason why our approach should be restricted to parsimony. Like all methods based on parsimony, our method maximizes the overall uniqueness of sound transitions by searching for those phylogenies that minimize the overall amount of change. In contrast to simple parsimony frameworks, however, we take the timing of changes into account: by providing a preferred order of character state transitions, we favor those solutions which correspond most closely to plausible pathways of change. As a result, our method helps to identify those rare changes which are the best candidates for shared innovations in the classical Neogrammarian sense, while at the same time revealing those changes which occur frequently and independently.

2. Materials and methods


2.1. Materials.

The data for this paper is a revised version of Chacon (2014). It is based on 150 cognates with correspondences across the major subgroups in the family. Cognates are root morphemes following a (C)V(C)V template. Words are written phonemically according to the source of information for each language (for details see Chacon 2014: 280). From the cognate sets and reconstructed forms, 18 proto-consonants were extracted. Due to context-specific patterns of change, the 18 proto-consonants yielded a total of 34 different correspondence sets with a total of 42 different reflexes in the extant languages. The proto-consonants along with the reflexes were plotted in a matrix, with proto-consonants, conditioning contexts and reflexes of each language in separate columns. The proto-sounds and the conditioning contexts are given in Table 1. The matrix with the reflexes in the 21 daughter languages is available in the Supplementary Material.

Based on the factors discussed in the previous section, we modeled patterns of sound change as gradient transitions between phonetic states, constrained by explicit direction preferences sensitive to the surrounding phonetic context. To some extent, this model follows from the Neogrammarian idea of sound laws, but it can be applied to sound changes with any degree of regularity in the lexicon. According to Garret (2014), this means that our model accounts for the most common types of sound changes, but it cannot handle structural pressure or systemic effects on sound change, nor prosodic processes.

Table 1. Consonants and conditioning contexts in Proto-Tukanoan used for this study.

No.  Sound  Context    No.  Sound  Context    No.  Sound  Context
1    *h     -          13   *p     #_         24   *ts    #_
2    *j     #_         14   *p     V_V        25   *ts    V_V
3    *j     V_V        15   *p?    #_         26   *ts?   #_
4    *k     #_         16   *p?    #_V*p      27   *ts?   #_i
5    *k     V_V        17   *p?    V_V        28   *tt    V_V
6    *k     i,e_       18   *p?    ~V_V       29   *t?    #_
7    *k     ~V_V       19   *s     #_         30   *t?    V_V
8    *kk    -          20   *s     V_V        31   *t?    ~V_V
9    *k?    #_         21   *t     #_         32   *tj    #_
10   *k?    V_V        22   *t     V_V        33   *w     -
11   *m     -          23   *t     ~V_V       34   *?     -
12   *n     -

This specific model of sound change was applied to the sound matrix as follows: From the sound matrix, we constructed sound transition networks from a proto-sound to all reflexes within a sound correspondence set. The sound transition networks followed a model of phonetic transitions, which represent sound changes as internally organized in different stages within a pool of potential phonetic variations. The principles upon which the phonetic transitions were constructed are the following:

1) Finite space. Transitions start with a proto-sound and end in attested reflexes (e.g., *w > b). Reflexes which are judged to be ancestral to other reflexes on a phonetic basis were assigned intermediate positions in the transition paths.

2) Intermediate states. In order to capture directionality and gradiency as general properties of the sound change process, certain transitions from proto-sounds to reflexes were mediated by intermediate proto-sounds, i.e. sounds that are not attested in the daughter languages but can be inferred as a necessary state in order to guarantee a phonologically natural transition from proto-sounds to reflexes. Intermediate proto-sounds were formalized as a minimal change of one articulatory or acoustic feature from a source sound to a target sound in the transition networks (e.g. *w > b is actually inferred as *w > [intermediate sound] > b).

3) Competing pathways of change. The transition from a proto-sound to its reflexes sometimes allowed for multiple pathways of change. This is usually necessary when at least two reflexes represent different directions of change (e.g. *k > x in language A and *k > s in language B, where x and s cannot both be reflexes of a single direction of change). Competing pathways of change are also important when a single reflex may be the result of more than one possible phonetic transition (e.g. *k > h could be the result of the transition *k > c > tj > ts > s > h or *k > kx > x > h).

4) Context dependency. A proto-sound can have more than one set of corresponding reflexes due to conditioning phonetic contexts. Thus, different sound transitions were proposed for every relevant conditioning context, with the concrete transitions capturing the intrinsic phonological naturalness of sound changes in each specific context.
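
The notion of competing pathways can be illustrated with a minimal sketch. It assumes that a sound transition network is stored as a plain dictionary from a source sound to its possible successor sounds; the names `TRANSITIONS` and `pathways` are hypothetical and not part of the original study, and the network only encodes the two pathways for *k > h named in principle 3, using the sound symbols as they appear in the text.

```python
# Hypothetical encoding of the two competing pathways for *k > h
# mentioned in the text; symbols are copied as given there.
TRANSITIONS = {
    "k": ["c", "kx"],
    "c": ["tj"],
    "tj": ["ts"],
    "ts": ["s"],
    "s": ["h"],
    "kx": ["x"],
    "x": ["h"],
}


def pathways(network, source, target, prefix=None):
    """Yield every directed path from source to target (depth-first)."""
    path = (prefix or []) + [source]
    if source == target:
        yield path
        return
    for nxt in network.get(source, []):
        if nxt not in path:  # guard against cycles
            yield from pathways(network, nxt, target, path)


for p in pathways(TRANSITIONS, "k", "h"):
    # Print each pathway together with its number of single-step changes.
    print(" > ".join(p), f"({len(p) - 1} steps)")
```

Under these assumptions, the reflex h is reachable from *k in three steps via x and in five steps via s, which is why a single reflex may need to be compatible with more than one pathway.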

2.2. Methods.

2.2.1. Modeling sound change as weighted directed transitions of character states. Phonetic transitions are the central component in our model to infer the phylogenetic history of a linguistic family. Our phonetic transitions represent valid characters which can be used in phylogenetic approaches, be they based on parsimony or probabilistic frameworks. Our characters, however, differ substantially from the characters which are traditionally used in phylogenetic analyses in historical linguistics, such as cognate sets (Gray and Atkinson 2003, Pagel 2009), or typological features (Longobardi et al. 2013), since:

1) they comprise multiple states as opposed to binary, presence-absence states of cognate sets or grammatical features,

2) they are polarized (Bryant 2001), that is, they contain preferred directions of character state transitions, and

3) they contain latent states, that is, they contain character states which are not reflected in any of the extant languages.

Using pathways of sound transitions as character states has many practical advantages for an analysis. Due to the use of non-reversible (polarized) models of character state transitions, the approach does not need an outgroup (Bryant 2001). With time-reversible models which do not allow for preferred directions of character state transitions, only unrooted trees can be inferred, since the parsimony or probability scores are the same, no matter where one places the root (Durbin et al. 2002: 176). Non-reversible (directed) models of character state transitions, however, yield different scores depending on the position of the root and can thus provide us automatically with a rooted topology. Although not very common, the advantages of non-reversible models of character evolution are well-known in evolutionary biology and have been discussed and tested in a couple of recent publications (Williams et al. 2015, Huelsenbeck et al. 2002). With a few exceptions, like Baxter's analysis of phonological mergers in Northern Chinese dialects (Baxter et al. 2006), or Pellard's analysis of Japanese and Ryukyu languages (Pellard 2009: 249-294), the advantage of directed models for character evolution has been mostly ignored in phylogenetic approaches to historical linguistics. A further advantage of our model is that it reduces dependencies. Since the characters comprise multiple states, we run less risk of modeling dependent characters independently, as is the case for phylogenetic analyses based on lexical data in which the original meaning of the words is ignored (Pagel 2009). Furthermore, since we include latent character states, we account for the often-ignored fact (Bouchard-Côté et al. 2013, Hruschka et al. 2015) that there is no theoretical justification to assume that an ancestral language has only those sounds which are still reflected in descendant languages.
A famous example of latent states in linguistic reconstruction is the coefficients sonantiques, later named laryngeals (Zgusta 2006), which Ferdinand de Saussure (1857-1913) proposed in 1879. Based on the internal comparison of Greek and Sanskrit morphology, Saussure reconstructed two sounds for Indo-European which were not preserved in any of its descendant languages. Some 50 years later, after Hrozny (1915) deciphered Hittite, Kurylowicz (1927) could show that one of the sounds was still reflected in Hittite (compare the initial in Hittite hant-s 'front, face' with Latin ante, Meier-Brugger 2002: 243). Today, scholars agree that at least three laryngeals were present in the sound system of Proto-Indo-European (Clackson 2007: 33-40, Mallory and Adams 2006: 48-50).

[Figure 4: four panels, A to D, each showing a character state transition model for the languages A, B, C, and D together with the phylogenetic trees that are equally optimal under that model.]

Figure 4. The benefits of directed, multi-state characters including latent states in phylogenetic reconstruction. Assumed are four different models of transition, in which the first weighs all transitions equally (A), the second allows only specific (weighted) transitions (B), the third employs polarized weighted transitions (C), and the fourth employs polarized weighted transitions with latent character states (D). The consequences of each of the models are illustrated by showing potential phylogenetic trees which explain the evolution of characters under the model within a parsimony framework. As can be seen, the number of equally optimal phylogenetic trees decreases drastically from A, via B, C, to D.

From the perspective of parsimony, our model resembles classical Camin-Sokal parsimony (Camin and Sokal 1965) in that it also employs polarized character state transitions, but since we explicitly weight character state transitions and allow for multiple pathways for the same reflex, our model is probably better described as a polarized (directed) version of Sankoff parsimony (Sankoff 1975), including latent states. Our character model has many benefits for phylogenetic reconstruction. Not only does it spare us the rooting with outgroups, it also drastically reduces the number of optimal trees, as illustrated in Figure 4, where models of four different stages of complexity, ranging from undirected transitions (Fitch 1971), via weighted transitions (Sankoff 1975), and directed weighted transitions, up to directed weighted transitions with latent character states, are given along with the range of different solutions scoring equally well.
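
To illustrate why directed costs make the root position matter, the following is a minimal sketch of Sankoff-style parsimony with an asymmetric cost matrix. It is not the implementation used in this study: the three states, the cost values, the four languages, and the names `COST`, `LEAVES`, `sankoff`, and `score` are all assumptions chosen for illustration. Trees are written as nested tuples, and the two example trees share the same unrooted topology (A and B versus C and D) but differ in where the root is placed.

```python
INF = float("inf")
STATES = ["p", "f", "h"]

# Hypothetical directed costs: p > f > h is cheap, any reversal costs 50.
COST = {
    "p": {"p": 0, "f": 1, "h": 2},
    "f": {"p": 50, "f": 0, "h": 1},
    "h": {"p": 50, "f": 50, "h": 0},
}

# Hypothetical reflexes in four languages.
LEAVES = {"A": "p", "B": "f", "C": "h", "D": "h"}


def sankoff(tree):
    """Map each state to the minimal cost of the subtree rooted in it."""
    if isinstance(tree, str):
        # Leaf: only the observed state is possible.
        return {s: (0 if s == LEAVES[tree] else INF) for s in STATES}
    child_costs = [sankoff(child) for child in tree]
    return {
        s: sum(min(COST[s][t] + cc[t] for t in STATES) for cc in child_costs)
        for s in STATES
    }


def score(tree):
    """Parsimony score of a rooted tree: best cost over all root states."""
    return min(sankoff(tree).values())


rooted_on_a = ("A", ("B", ("C", "D")))  # root on the branch leading to A
rooted_on_c = ("C", ("D", ("A", "B")))  # same unrooted tree, root near C

print(score(rooted_on_a), score(rooted_on_c))  # 2 5
```

With these assumed costs, rooting the tree next to the language that preserves [p] yields a score of 2, while rooting it next to a language with [h] yields 5, so the directed model itself prefers one rooting. With a symmetric cost matrix, both rootings would receive the same score.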

[Figure 5 panels: A. Reflexes (including proto-form); B. Transitions (provided by expert); C. Conversion to directed network; D. Creation of transition matrix. The transition matrix of panel D is given below (rows: source sound; columns: target sound).]

      k    x    tj   J    s    h
k     0    1    1    2    3    2
x     50   0    50   50   50   1
tj    50   50   0    1    2    3
J     50   50   50   0    1    2
s     50   50   50   50   0    1
h     50   50   50   50   50   0

Figure 5. Construction of sound transition matrices from networks of sound transitions.

The core of the model is an individual transition matrix for each character which is constructed directly from the network of sound transitions. The creation of this matrix consists of four steps (see also Figure 5):

1) assemble the reflexes for a given proto-sound in a specified context,

2) provide direct transitions between the sounds, adding additional steps (if needed), until it is guaranteed that the proto-sound can be converted into all of the reflex sounds,

3) convert the sound transitions into a directed network, and

4) calculate transition penalties between all characters (the transition matrix) by calculating the shortest path for each character pair, using a high penalty if no shortest path can be found.

Note that our model does not categorically disallow any transition: if no other solution can be found, it will necessarily propose transitions which were flagged as unlikely to occur. Yet the high penalties for implausible transitions, such as an [h] becoming an [s], push the algorithm to avoid proposing these changes.
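The four steps can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analyses: the edge list is an assumption read off the matrix in Figure 5, every direct transition is given a weight of 1, and the names `EDGES`, `SOUNDS`, `HIGH_PENALTY` and `transition_matrix` are hypothetical.

```python
from collections import deque

# Assumption: direct transitions inferred from the matrix in Figure 5.
EDGES = [("k", "x"), ("k", "tj"), ("tj", "J"), ("J", "s"), ("s", "h"), ("x", "h")]
SOUNDS = ["k", "x", "tj", "J", "s", "h"]
HIGH_PENALTY = 50  # value used in Figure 5 when no directed path exists


def transition_matrix(sounds, edges, high_penalty=HIGH_PENALTY):
    """Return {source: {target: penalty}} from shortest directed paths."""
    successors = {sound: [] for sound in sounds}
    for source, target in edges:
        successors[source].append(target)
    matrix = {}
    for start in sounds:
        # Breadth-first search yields shortest paths for unit-weight edges.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in successors[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        # Unreachable targets receive the high penalty instead of a distance.
        matrix[start] = {t: dist.get(t, high_penalty) for t in sounds}
    return matrix


MATRIX = transition_matrix(SOUNDS, EDGES)
for source in SOUNDS:
    print(source, [MATRIX[source][target] for target in SOUNDS])
```

Under these assumptions the rows printed for the six sounds agree with the matrix in Figure 5; with non-uniform edge weights, the breadth-first search would have to be replaced by a weighted shortest-path algorithm.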

2.2.2. Tree search heuristics. Since it is not feasible to search through the whole tree space when dealing with more than 10 languages, we need to search heuristically. The strategy we use employs a "genetic framework" in which the best trees in the sample are always retained and new trees are created by slightly modifying the best trees in the sample. To avoid getting stuck in local optima, the search space of trees is constantly refreshed by adding trees drawn from a random sample. Users can decide when to stop their analyses. In practice, the algorithm seems to converge rather quickly, and we were often able to find the first near-optimal trees for samples of 21 languages after only 10 000 iterations, as shown in more detail in Figure 6. One should, however, be aware that our search only touches the tip of the iceberg. For 21 taxa, there are as many as 319 830 986 772 877 770 815 625 possible rooted trees (Felsenstein 1978: 31), of which we will necessarily sample only a very small fraction, no matter whether we test 500 000 or 10 000 000 trees.
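The size of the tree space can be checked with a short calculation. The sketch below assumes that the figure refers to rooted, bifurcating trees with labeled tips, whose number for n taxa is the double factorial (2n - 3)!!; the function name `rooted_binary_trees` is ours.

```python
def rooted_binary_trees(n_taxa):
    """Number of rooted, bifurcating, labeled trees: (2n - 3)!! for n >= 2."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


print(rooted_binary_trees(21))
```

For 21 taxa this reproduces the number quoted above, so even 10 000 000 sampled trees cover only a vanishingly small share of the space.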

[Figure 6 plot. X-axis: Number of Trials (0 to 6000); y-axis ticks from 200 to 800.]

Figure 6. Searching the tree space. The plot illustrates how the tree search heuristic explores the tree space for the first 6000 trees in a run. Note the nearly constant number of badly scoring trees, reflecting the constant number of random trees which are generated to make sure that the search does not get stuck in a local optimum. The diamond dots indicate trees which were among the best scoring ones after 100 000 iterations.
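The search strategy can be illustrated with a toy example. The following is a sketch under several assumptions and not the implementation used for the analyses: trees are nested tuples, the "slight modification" is a swap of two leaves, the score is simple Fitch parsimony, and the data as well as all function names and parameters (`keep`, `fresh`, `seed`) are hypothetical.

```python
import random


def random_tree(leaves, rng):
    """Build a random rooted binary tree (nested tuples) by joining subtrees."""
    nodes = list(leaves)
    while len(nodes) > 1:
        left = nodes.pop(rng.randrange(len(nodes)))
        right = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((left, right))
    return nodes[0]


def leaves_of(tree):
    """List the leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves_of(tree[0]) + leaves_of(tree[1])


def mutate(tree, rng):
    """Slightly modify a tree by swapping two leaves (an assumed move)."""
    a, b = rng.sample(leaves_of(tree), 2)
    swap = {a: b, b: a}

    def relabel(node):
        if isinstance(node, str):
            return swap.get(node, node)
        return (relabel(node[0]), relabel(node[1]))

    return relabel(tree)


def fitch_score(tree, characters):
    """Toy scoring function: summed Fitch parsimony over all characters."""
    def walk(node, states):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = walk(node[0], states)
        right_set, right_cost = walk(node[1], states)
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return sum(walk(tree, states)[1] for states in characters)


def search(leaves, score, iterations=200, keep=5, fresh=2, seed=0):
    """Retain the best trees, mutate them, and add fresh random trees."""
    rng = random.Random(seed)
    sample = [random_tree(leaves, rng) for _ in range(keep)]
    for _ in range(iterations):
        best = sorted(sample, key=score)[:keep]
        mutated = [mutate(tree, rng) for tree in best]
        randoms = [random_tree(leaves, rng) for _ in range(fresh)]
        sample = best + mutated + randoms
    return min(sample, key=score)


# Hypothetical data: six languages, two sound characters.
LEAVES = ["A", "B", "C", "D", "E", "F"]
CHARACTERS = [
    {"A": "s", "B": "s", "C": "h", "D": "h", "E": "k", "F": "k"},
    {"A": "p", "B": "p", "C": "p", "D": "b", "E": "b", "F": "b"},
]
best_tree = search(LEAVES, lambda tree: fitch_score(tree, CHARACTERS))
print(best_tree, fitch_score(best_tree, CHARACTERS))
```

Because the best trees are always carried over to the next round, the best score in the sample can never get worse, while the freshly drawn random trees play the role of the badly scoring trees visible in Figure 6.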

2.2.3. Implementation, analysis and evaluation. The code for the analyses was written in Python. It builds on LingPy (http://lingpy.org), a Python library for quantitative tasks in historical linguistics (List and Moran 2013). We analyzed three different models in order to check the effects which directionality and weights in sound transition models have on phylogenetic reconstruction:

1) FITCH: a simple parsimony model that penalizes every transition with 1,

2) SANKOFF: a weighted parsimony model that penalizes transitions by calculating the shortest path in the sound transition network, but with the sound transition network being treated as an undirected graph, and

3) DiWeST (directed weighted state transitions): the model described above, in which state transitions are penalized according to recognized sound transition tendencies.
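The difference between an unweighted and a directed weighted model can be made concrete with a small dynamic-programming sketch in the style of Sankoff parsimony. It is an illustration under stated assumptions rather than the code behind the analyses: the directed penalties are those of Figure 5, while the four-language tree, the reflexes and the function names are hypothetical. The undirected SANKOFF variant is left out for brevity.

```python
import math

SOUNDS = ["k", "x", "tj", "J", "s", "h"]
# Directed penalties from Figure 5 (rows: source sound, columns: target sound).
DIRECTED = dict(zip(SOUNDS, [
    [0, 1, 1, 2, 3, 2],
    [50, 0, 50, 50, 50, 1],
    [50, 50, 0, 1, 2, 3],
    [50, 50, 50, 0, 1, 2],
    [50, 50, 50, 50, 0, 1],
    [50, 50, 50, 50, 50, 0],
]))


def directed_cost(source, target):
    """Penalty of a directed weighted transition."""
    return DIRECTED[source][SOUNDS.index(target)]


def fitch_cost(source, target):
    """Every change costs 1, regardless of direction."""
    return 0 if source == target else 1


def sankoff(tree, leaf_states, cost):
    """Return (score, set of optimal root states) for one character."""
    def walk(node):
        if isinstance(node, str):
            # Leaf: only the attested sound is possible.
            return {s: (0 if s == leaf_states[node] else math.inf)
                    for s in SOUNDS}
        tables = [walk(child) for child in node]
        # Cheapest way to derive each child subtree from each parent state.
        return {
            parent: sum(min(cost(parent, s) + table[s] for s in SOUNDS)
                        for table in tables)
            for parent in SOUNDS
        }

    root = walk(tree)
    best = min(root.values())
    return best, {s for s in SOUNDS if root[s] == best}


TREE = (("A", "B"), ("C", "D"))  # hypothetical tree of four languages
REFLEXES = {"A": "h", "B": "s", "C": "k", "D": "x"}  # hypothetical reflexes
print(sankoff(TREE, REFLEXES, fitch_cost))
print(sankoff(TREE, REFLEXES, directed_cost))
```

In this toy case the unweighted model finds four equally good root states (k, x, s, h) at a score of 3, whereas the directed penalties single out k as the only optimal root state at a score of 5, which illustrates why scores are not comparable across models and why directionality reduces ambiguity.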

Each model was tested by running our tree-search heuristic on 500 000 trees and selecting those trees from the sample which had the lowest score. The source code underlying all analyses presented in this paper, along with the results in the form of text files, plots, and interactive HTML applications, can be downloaded from https://zenodo.org/record/45233.


3. Results

3.1. General results.

The general results of the analysis are shown in Table 2, where we list the parsimony score, the number of most parsimonious trees, the amount of homoplasy3 and the reconstruction success for each of the models. The parsimony score is the sum of all individual parsimony scores for the characters in our sample. Since our models differ considerably, the resulting scores are not comparable across models and are listed only for completeness. The number of most parsimonious trees, on the other hand, reflects the resolving power of a given parsimony model (Grand et al. 2013): if a model produces multitudes of optimal trees, its discriminative force is low, since the model allows for a high number of equivalent solutions.4 The degree of homoplasy reflects how many identical sound transitions occur across different branches of the tree. If our characters were chosen in such a way that homoplasy could be excluded a priori, the degree of homoplasy would give us an immediate hint regarding the quality of a given tree. Since, as elaborated above, it is by no means unlikely that certain sound transitions occur independently and repeatedly during language evolution, homoplasy is, like the parsimony score, not a measure of the quality of a given analysis. The reconstruction success was measured as the degree to which the characters reconstructed back to the root by a given tree and a given model were identical with the proto-forms proposed by Chacon (2014). The score was computed by re-applying the parsimony analysis with the respective model to a consensus tree of the trees found to be most parsimonious. The consensus tree was computed with the help of the Dendroscope software (http://dendroscope.org, Huson and Scornavacca 2012). Since both the FITCH and the SANKOFF model lack the directed character state transitions of the DiWeST model and therefore need to be externally rooted, we rooted them by treating WT (which was inferred as one group in both approaches) as an outgroup.

As can be seen from the results, the DiWeST model clearly outperforms the other two models regarding resolving power and reconstruction success. While FITCH and SANKOFF yield 716 and 1019 optimal trees, only 18 out of the 500 000 trees we searched are optimal with regard to DiWeST. It is not surprising that the three models differ regarding the degree of homoplasy: the core principle of parsimony analyses is to reduce homoplasy, but the differences in the models lead to differences in the assessment of homoplasy. Since only specific transitions are favored in the SANKOFF and the DiWeST models, cheap transitions are allowed to occur more often than transitions which are rare but costly. As a result, the general homoplasy scores in both models are higher than in the FITCH model. The reconstruction success of the DiWeST model underlines, however, that the general amount of homoplasy is not a good indicator of model realism. We know that certain sound transitions occur abundantly and frequently, and if we use sound transitions to reconstruct phylogenetic trees, we should try to account for this fact.

3 We compute homoplasy by simply counting how often a certain change from a character state X to a character state Y occurs on the tree. There are well-defined indices for homoplasy in the literature (see Nunn 2011: 31f). We had difficulties in applying them, however, since these indices are traditionally only defined for binary character transitions and non-directional models.

4 Large numbers of optimal trees reflect our uncertainty regarding the character evolution for a given model, and it is a well-known phenomenon that parsimony can yield islands of very similar trees (Maddison 1991). We find it striking, however, how drastically the directed model reduces the number of optimal trees, and we take this as evidence that it reduces our uncertainty.

Table 2. General results for the three different analyses.

Model     Parsimony score   Best trees   Homoplasy   Reconstruction success
FITCH     104               716          0.64        39%
SANKOFF   148               1019         0.79        37%
DiWeST    184               18           2.0         90%

Table 3. Comparing the frequencies of specific change patterns as proposed by the three models.

Source   Target   FITCH   SANKOFF   DiWeST
s        h        5       0         7
h        s        0       16        0
ts       s        0       0         8
s        ts       1       1         0
tj       s        0       0         8
s        tj       9       2         3
h        tj       0       7         0
tj       h        0       0         1

3.1.1. Sound changes with increased degrees of homoplasy. Our approach makes it easy to measure the degree of uniqueness of each sound transition or, in other words, the degree of independent parallel evolution. For this, we simply count how often a given change occurs across all branches of the family tree. In doing so, we can easily spot those changes which most frequently occur independently of each other and compare those findings with our general knowledge and intuition about tendencies and frequencies of sound transitions. Table 3 summarizes the most frequent sound transitions resulting from the three different models. That the DiWeST model shows exclusively credible transitions which confirm our intuition is not surprising, since it was built using explicit expert assumptions regarding directed sound transition tendencies. What is more surprising, however, is that the two undirected models, FITCH and SANKOFF, often seem to favor patterns that are the inverse of our expectations. This underlines that the data alone are not enough to infer realistic processes. On the contrary, it seems that the data can even be quite misleading when using weak models which do not allow for directionality.5
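Counting changes across branches is straightforward once ancestral states have been assigned to all nodes. The sketch below assumes that each branch of the tree is available as a pair of parent state and child state; the branch list is invented for illustration and `count_changes` is a hypothetical helper, not a function of our code base.

```python
from collections import Counter


def count_changes(branches):
    """Count how often each source > target change occurs across branches.

    `branches` is a list of (parent_state, child_state) pairs, one per branch;
    branches without a change are ignored.
    """
    return Counter((parent, child) for parent, child in branches
                   if parent != child)


# Hypothetical reconstructed states on the branches of a small tree.
BRANCHES = [("s", "h"), ("s", "s"), ("s", "h"),
            ("ts", "s"), ("s", "h"), ("ts", "ts")]
changes = count_changes(BRANCHES)
for (source, target), frequency in changes.most_common():
    print(f"{source} > {target}: {frequency}")
```

Applied to all characters of a consensus tree, a tally of this kind yields frequency lists like those in Table 3.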

5 One might argue that the comparison between the undirected and the directed models is unfair, since we put much more information into the directed models. Note, however, that we manually rooted the undirected FITCH and SANKOFF models prior to computing the most frequently occurring parallel transitions, thus bringing direction to the models via the tree. The fact that this does not seem to help much can be explained in two ways: one could assume that there is not enough data to infer the correct directions, or one could assume that reversible models are generally misleading. For the moment, we cannot decide which assumption is right. Additional tests are needed to determine whether reversibility is generally problematic for the models, or only in cases of sparse data, or even specifically for phonological data.

[Figure 7 tree. Legend: Western Tukano, East-Eastern Tukano, South-Eastern Tukano, West-Eastern Tukano. Leaf order: Des, Sir, Yup, Pir, Tuk, Wan, Yur, Tuy, Pis, Kar, Tat, Bar, Tan, Kub, Mak, Bas, Kue, Mai, Kor, Sek, Sio.]

Figure 7. The FITCH tree. The colored circles next to the language names indicate the classification by Chacon (2014).

3.1.2. Tree topology. The three consensus trees delivered by the models are quite different, with the FITCH and the SANKOFF trees presenting classifications that agree only in part with previously proposed classifications. Both FITCH and SANKOFF identify WT as a valid subgroup, which makes it easy for us to root them. They differ considerably, however, from previous classifications regarding the lower subgroups, especially the ones proposed for ET. Thus, the FITCH analysis (see Figure 7) does not recover any of the three generally recognized ET subgroups, and the SANKOFF analysis recovers only Western ET as a firm grouping, which it nests inside Eastern ET. The fact that SANKOFF has Tanimuka as a first outgroup of ET can be seen as evidence for a Central Tukano branch, as suggested by scholars before (Malone 1987, Barnes 1999), but given how strongly the SANKOFF analysis as a whole departs from previous proposals, this should be treated with caution.6

The DiWeST model resulted in the best tree of all analyses. First, the WT and ET split is firmly and independently identified, since, as we mentioned above, directed models do not require external rooting. Eastern-ET is well captured and has an internal classification that combines in an interesting way the classifications proposed by Waltz and Wheeler (1972), Barnes (1999) and Chacon (2014), having Tukano, Piratapuyo and Wanano as an outgroup to the rest of the ET-Eastern languages. Western ET closely matches Chacon's proposal, except for Kubeo. In fact, DiWeST classifies Tanimuka and Kubeo as ET languages (as does Chacon 2014), but quite remarkably identifies them as outgroups to the rest of the ET branch. This is another interesting combination of the classification trends in Waltz and Wheeler (1972) and Barnes (1999), who classified Tanimuka and Kubeo as Central Tukanoan, and Chacon (2014), who classified Tanimuka as an ET language that outgroups the rest of the branch and Kubeo as a Western-ET language.

6 As a first experiment, we computed consensus trees of all FITCH and SANKOFF trees without manual rooting. Both models failed to identify the ET-WT split. It is also interesting to note that our manual rooting of the consensus trees did not much improve the reconstruction scores, which were 35% and 33% for FITCH and SANKOFF, respectively. We actually expected a higher increase in reconstruction success with the manual rooting, but it seems that here again, the lack of directionality in both models is of great importance.

[Figure 8 tree. Legend: Western Tukano, East-Eastern Tukano, South-Eastern Tukano, West-Eastern Tukano.]

Figure 8. The SANKOFF tree. The colored circles next to the language names indicate the classification by Chacon (2014).

3.2. Specific results.

Above, we have illustrated that the DiWeST model outperforms the two other analyses in many respects, yielding a more integrated picture of previous analyses and of different kinds of data. For this reason, in the remainder of this section we carry out a more fine-grained analysis of the consequences of the model regarding reconstructions, sound change patterns and subgrouping criteria.

3.2.1. Reconstructions. The DiWeST model fully recovered the same proto-consonant as Chacon (2014) in 30 out of 34 contexts. This means that in all possible scenarios that plot character evolution along the consensus tree reflecting the DiWeST model, the character state proposed for the root was identical with the proto-consonant proposed by Chacon (2014). In four cases, the model differed from Chacon's reconstruction. The four proto-consonants are *p?, *t?, *tt and *kk, all occurring in intervocalic context (V_V). None of these sounds is found as a reflex in the daughter languages in this particular context. However, the design of the model does not a priori prohibit the reconstruction of latent character states, as reflected in the fact that the DiWeST model reconstructs *p? (#_V), *ts? (in all contexts), *t? (#_ and ~V_V) and *k? (V_V). The absence of attested reflexes is therefore not the reason for the divergence between the model's prediction and Chacon's reconstruction.

[Figure 9 tree. Legend: Western Tukano, East-Eastern Tukano, South-Eastern Tukano, West-Eastern Tukano. Leaf order: Tan, Kub, Pis, Tuy, Yur, Kar, Tat, Bar, Pir, Tuk, Wan, Yup, Sir, Des, Mak, Bas, Sek, Sio, Mai, Kor, Kue.]

Figure 9. The DiWeST tree. The colored circles next to the language names indicate the classification by Chacon (2014).

In fact, the proto-form *p? was also proposed by the DiWeST model, but only in two out of three possible scenarios, all of which are equally parsimonious with respect to the model and the tree. One alternative scenario with equal weight yields the reconstruction of *V?p (a laryngealized vowel followed by [p]). Similarly, the DiWeST model reconstructs *V?t instead of Chacon's *t?, but this time in all possible scenarios for character evolution. In Chacon (2014) the reflexes corresponding to *V?C (laryngealized vowel followed by a consonant) are treated as an innovation in Proto-ET, where *C? (a proto-laryngealized consonant) became prelaryngealized in intervocalic context (*C? > *V?C / V_V). Some ET languages further changed it to a voiced stop, losing the pre-laryngealization, whereas other languages changed it to a voiceless stop, also losing the pre-laryngealization. The languages that kept a V?C reflex represent a retention from Proto-ET. The difference between DiWeST and Chacon (2014) is an issue of phonological reconstruction and not of subgrouping, and it illustrates the lack of structural (systemic) considerations in our model of directed weighted state transitions: in DiWeST, every character is modeled in isolation from all other characters. Chacon's (2014) reconstruction, however, identifies a complementary distribution between *V?C in intervocalic context and *C? in word-initial context, and thus reconstructs *C? in both contexts, even though some amount of allophonic variation could be expected in the different environments.

Instead of the geminate sounds *tt and *kk, the DiWeST model reconstructed *t and *k. Chacon (2014) reconstructs geminate stops on the basis of complex sound correspondences between plain stops and geminate stops. Reflexes of *C in intervocalic contexts are lenited (/d/ or /g/) in Desano, Yupua, Siriano and Kubeo, whereas geminates have voiceless reflexes in the same languages. This is another example of the relevance of structural dependency, which we did not model in our automatic approach. According to the internal logic of sound transitions in the DiWeST model, reconstructing a simple consonant in intervocalic context minimizes the parsimony score. The resulting sound change patterns are difficult to explain from a structural perspective and actually contradict the Neogrammarian doctrine that identical contexts should yield identical reflexes; this problem can only be handled by taking the structural perspective into account.

3.2.2. Sound changes and subgrouping. Given the major and minor subgroups proposed by DiWeST and Chacon (2014), we compare the most relevant changes in these trees in terms of shared innovations. Thanks to our automatic approach, which illustrates all consequences for a given phylogeny, we can now define shared innovations in a very strict manner, as those changes that occurred just once in an ancestor language of a given subgroup of the family. From these we distinguish independent innovations as those changes that occurred more than once in different branches of the tree. As the analysis shows, there is a high degree of agreement between Chacon's analysis and the DiWeST analysis of sound changes regarding the major subgroups WT and ET. For the minor subgroups in each major branch, there is also wide agreement regarding the sound changes involved, but because of the differences in subgrouping, the two analyses differ in which changes they treat as shared innovations and which as independent innovations. Thanks to our automated approach, we can easily plot all individual scenarios for a given reference tree and an underlying model of sound change transitions within an interactive application.7 The following comparison of the family tree proposed by Chacon (2014) and the consensus tree for the most parsimonious trees of the DiWeST model can be followed directly in the interactive application.
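The strict definitions can be expressed as a simple classification rule. The sketch below is illustrative only: it assumes that every occurrence of a change is recorded together with the languages below the branch on which it occurs, it adds a third label for changes confined to a single language (our assumption, since such changes define no subgroup), and the changes and language names are placeholders.

```python
from collections import defaultdict


def classify_changes(events):
    """Classify each change by how often and where it occurs on the tree.

    `events` lists (change, languages_below_branch) pairs, one per occurrence.
    """
    occurrences = defaultdict(list)
    for change, languages in events:
        occurrences[change].append(languages)
    labels = {}
    for change, branches in occurrences.items():
        if len(branches) > 1:
            # The same change on several branches: parallel evolution.
            labels[change] = "independent innovation"
        elif len(branches[0]) > 1:
            # One occurrence in the ancestor of several languages.
            labels[change] = "shared innovation"
        else:
            labels[change] = "single-language change"
    return labels


# Placeholder changes and languages, for illustration only.
EVENTS = [
    ("a > b", ("L1", "L2", "L3")),
    ("c > d", ("L1", "L2")),
    ("c > d", ("L4",)),
    ("e > f", ("L5",)),
]
LABELS = classify_changes(EVENTS)
for change, label in LABELS.items():
    print(change, "->", label)
```

Because the labels depend on the branches of the reference tree, the same sound change can count as a shared innovation under one subgrouping and as an independent innovation under another.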

Western Tukanoan. The changes that took place from Proto-Tukanoan to Proto-WT according to both analyses are8: *p > h [13, 14], *p? > h / V_V [16, 17, 18], *t? > t / V_V [30], *ts > s [24], *k? > k / V_V [10]. The change *p > h also took place in the ET languages Barasano and Makuna, but they did not undergo the change *p? > h, which occurred in WT languages. Nor did any other ET language show a systematic change of *p? > h, *t? > t and *k? > k, which clearly amounts to a merger of all *C? with *C / V_V (laryngealized stops with plain stops between vowels). The change *ts > s also occurred in some ET languages, but clearly in more shallow subgroups in the branch, given the variety of reflexes in ET languages.

Eastern Tukanoan. The following are the shared innovations in the ET branch, proposed by both Chacon and DiWeST: *p? > b ~ V?b [15, 18], *t? > d ~ V?d [29, 31], *m, *n > b, d [11, 12], *ts? > dz [26, 27]. The changes *p? > b ~ V?b and *t? > d ~ V?d are another set of systemic changes, where *C? > Cvoiced (laryngealized consonants became voiced consonants). Another case of systemic change was the merger of the proto-nasal stops *m and *n with the Proto-ET voiced stops *b (<*p?) and *d (<*t?), which is not shared with WT languages. The change *t? > d ~ V?d was followed by d > r, which occurred in several subgroups of ET and has been interpreted as a set of independent innovations by both Chacon (2014) and DiWeST. The change *ts? > dz was subsequently followed by the changes dz > d (> r) / _ i and dz > j elsewhere9. The change *ts? > dz is also independently shared with the WT languages Maihiki, Koreguahe and Kueretu. Some WT languages also present the changes *t? > d, *t? > ?d and (?)d > r.

7 This application can be found online at http://digling.github.io/tukano-paper/. It can be downloaded from https://zenodo.org/record/45233/.

8 Numbers in brackets refer to the 34 characters in Table 1 and in the Supplementary Material.

We now turn to sound changes that are relevant for minor subgroups within the family. We start with the most relevant changes for ET subgroups, followed by WT subgroups.

Tanimuka-Kubeo (Eastern Tukanoan). DiWeST proposes Kubeo-Tanimuka as an outgroup to the rest of the ET languages. Several changes are listed as supporting this subgroup: *h > 0 [1], *s > h [19, 20], *ts > s > h [24, 25], *tj > s > h, p? > p / #_V*p [16], *k? > k [10]. This is actually a clear chain shift pattern, where *h > 0 occurred first, followed by the debuccalization of all proto-fricatives and affricates after merging with *s. In addition, Kubeo underwent the change *ts? > dz > h, whereas Tanimuka did not, showing that the debuccalization process became more general in Kubeo. These changes are highly homoplastic, however, being shared by several other languages. The change *h > 0 also occurred in all WT languages (with the exception of Kueretu) and the Eastern-ET languages Tuyuka, Yuruti, Pisamira, Karapana, Tatuyo and Bara. To a lesser extent the same is true for the debuccalization of fricatives and affricates, which also occurred in the same Eastern-ET languages. The change *k? > k is also highly homoplastic, occurring in WT and in the Eastern-ET languages Tuyuka, Yuruti, Pisamira, Karapana, Tatuyo and Bara. The only truly unique change is p? > p / #_V*p. Since it resembles a typologically uncommon process of consonant harmony, it can be seen as a reliable sound change for subgrouping.

There are only three changes analyzed by DiWeST as unique to the larger subgroup within the ET branch (that is, all ET languages except Kubeo and Tanimuka), namely: p? > b / #_*p [16], *ts? > dz > j, d [26, 27]. The change *ts? > dz > j, d is also shared by Tanimuka, and neither DiWeST nor Chacon (2014) classifies this resemblance as a shared innovation, despite its quite unique pattern. The change p? > b [16] in word-initial position cannot be analyzed as a true innovation, since it is simply the same reflex as in the more general change *p? > b [15].

Eastern-ET. The Eastern-ET subgroup is supported by three shared innovations: *j > tj / V_V [3], *ts > j [24], *tj > s [32]. Both Chacon and DiWeST agree that *j > tj / V_V is a truly unique change among Eastern-ET languages. The two analyses also agree that the change *ts > j is a shared innovation of all Eastern-ET languages, but an independent innovation among some Western-ET languages. This is likely due to areal contact between Eastern-ET (the original innovators) and Western-ET. The same situation is found regarding the change *tj > s, but, because it is also found among WT languages, Chacon (2014) analyzes it as an independent innovation in different branches of the Tukanoan family.

The analyses by Chacon (2014) and DiWeST differ slightly regarding the subclassification of the Eastern-ET languages. Three shared innovations are given in support of the Tukano-Wanano-Piratapuya subgroup proposed by DiWeST: *k? > V?k [10], *p? > V?p [17] and *ts > s [25]. The latter change is highly homoplastic, having occurred in all the other Eastern-ET languages (except for Pisamira), in the Western-ET languages Desano, Yupua and Siriano, and in all WT languages (except for Kueretu). The other changes, *k? > V?k [10] and *p? > V?p [17], were interpreted as occurring early in Proto-ET by Chacon (2014), which makes them no direct candidates for shared innovations10. However, there is ample support from sound changes for the subgroup composed of Tuyuka, Yuruti, Pisamira, Karapana, Tatuyo and Bara, namely: *h > 0 [1], *? > 0, *k? > V?k > k [10], *p? > V?p > p, b [17, 18], *t? > V?t > t, d [30, 31]. One can observe that the intermediate stages of the *C? > C changes are shared with Tukano, Wanano and Piratapuyo, but the final reflexes are not. In fact, the final reflexes correlate with the loss of *? in these languages, which seems to be a consequence of a more general loss of glottal segments, including *h. In most of these languages, there was a subsequent change in which *s became h. So despite the homoplastic distribution of *h > 0 and *? > 0, there is a strong systemic pattern regarding these changes, which favors a subgrouping proposal. The same is not true, however, regarding Tukano, Piratapuyo and Wanano.

9 Kubeo is an exception where dz > h, following a systemic change where all Proto-ET fricatives and affricates debuccalized, becoming [h].

Western-ET. A larger number of shared innovations clusters at the Western-ET subgroup: *k? > g [9, 10], *p? > b [17, 18], *t? > d [30, 31], *? > 0 [34]. The change *? > 0 is highly homoplastic, having occurred in Kubeo, Eastern-ET languages (Tuyuka, Yuruti, Pisamira, Karapana, Tatuyo and Bara) and the WT language Maihiki. The changes *p? > b / ~V_V [18] and *t? > d / ~V_V [31] were analyzed by DiWeST as three independent innovations: in Kubeo, in Western-ET and in a subgroup within Eastern-ET (Tuyuka, Yuruti, Pisamira, Karapana, Tatuyo, Bara). Chacon (2014) proposes the same analysis, except that the change in Kubeo and Western-ET is interpreted as a shared innovation. In fact, the intermediate stage of these changes was *V?C in Proto-ET, which is recovered by both models. A few ET languages retained this kind of reflex. The changes *p? > b / V_V [17] and *t? > d / ~V_V [30] are more restricted and occurred only in Western-ET languages, including Kubeo. Chacon (2014) analyzed these changes as shared innovations, but DiWeST characterizes them as independent innovations in Kubeo and Western-ET. A similar pattern also holds for *k?. The change *k? > g / #_ [9] is independently shared by a number of ET languages if we assume that the reflex 0 is the result of the transition *k? > g > 0. Since there are many synchronic alternations between [g] and 0 in these languages, this analysis seems correct. Thus, we may include within this change also Tanimuka, Eastern-ET languages (Tukano, Karapana, Tatuyo, Bara) and the WT language Maihiki. The change *k? > g / V_V [10] is more restricted, but includes Kubeo in addition to Western-ET. Thus, if we exclude Kubeo from the Western-ET branch, there is no single shared innovation that is unique to this subgroup. If we include Kubeo, though, the Western-ET branch becomes more consistently unique.

Two other changes not captured by DiWeST were suggested by Chacon (2014) in favor of the classification of Kubeo as a Western-ET language: *t > d / (~) V_V [22, 23]. When the vowels are nasalized, the change t > d is more general and includes Tanimuka, Kubeo, Yupua, Desano, Siriano (all ET) and the WT language Kueretu, where t > n / ~V_V.11 When the context has only oral vowels, the change is restricted to Kubeo, Desano, Yupua and Siriano. The change in the oral context was taken as further evidence for the placement of Kubeo among the Western-ET languages Desano, Yupua and Siriano by Chacon (2014). The change in the nasal environment was seen as an independent innovation in Tanimuka and Kueretu. The DiWeST model is slightly different in that it analyzes the change *t > d / ~V_V [23] as a shared innovation between Kubeo and Tanimuka, whereas for *t > d / V_V [22] it assumes an independent change in Kubeo.

10 Notice that the change *t? > V?t [30] was readily captured by DiWeST; in fact, the model actually reconstructs V?t for Proto-Tukanoan. Reflexes of *t? and *p? are quite parallel in ET, though not those of *k?, which has more diverse reflexes. The problem in capturing parallel changes in *t? and *p? may be due to the lack of a structural view of the evolution of sound systems in DiWeST.

11 [n] is an allophone of /d/ in several ET languages, as well as Sekoya.

WT subgroups. Regarding the WT branch, while there is more solid support for the Siona-Sekoya subgroup, there is very little support for a subgroup composed of Koreguahe, Maihiki and Kueretu. DiWeST differs from all previous classifications in proposing such a subgroup. The only unique sound change in these languages is *ts? > dz [26, 27]. Maihiki further changed dz > j / _i and dz > d elsewhere. Kueretu and Koreguahe changed dz > j. Note that the other WT languages, Siona and Sekoya, have [s?] as a result of this change. While the change *dz > j is unique among WT languages for Maihiki, Koreguahe and Kueretu, it is not a unique change regarding the remainder of the family, since many ET languages have undergone the same change. In fact, the reconstruction of this proto-consonant is itself problematic and, thus, not reliable for subgrouping. Maihiki also has two independent changes: *? > 0 and *w > b. There are, however, sound changes that favor a Koreguahe and Kueretu subgroup. These include: *k? > k (Maihiki has [g], Siona and Sekoya [k?]), *p? > p (Maihiki [?b], Siona and Sekoya [p?]), *t? > r (Maihiki [?d], Siona and Sekoya [d]). On the other hand, other changes show a split between Kueretu and Koreguahe, such as *ts > t [24, 25] and *tj > t [8] in Kueretu.

4. Discussion

As can be seen from the above discussion, Chacon's (2014) subgrouping proposal and the proposal inferred from the DiWeST model agree to a large extent, not only in the topology of the family tree but also regarding the interpretation of sound innovations concerning the ET versus WT branches and the Western-ET versus Eastern-ET subgroups. The main points of disagreement concern the WT subgroups, the Eastern-ET subgroups and the subgrouping of Kubeo with Tanimuka or with the Western-ET languages.

While there is some evidence for the Tanimuka-Kubeo subgroup, the evidence for a larger subgroup in the ET branch, composed of all ET languages except Tanimuka and Kubeo, is quite weak. Even the Tanimuka-Kubeo subgroup can be questioned. Chacon (2014) interpreted all sound changes regarding the Tanimuka-Kubeo subgroup as independent innovations, giving less emphasis to the uniqueness of p? > p / #_V*p and the outstanding debuccalization processes. What was more relevant for the classification of Kubeo was the set of shared innovations with Western-ET languages concerning the systematic changes of *? > 0 [34] and *C? > V?C > Cvoiced [10, 17, 18, 30, 31]. The DiWeST model also recovered these changes, but interpreted them as independent innovations in Kubeo.

Because of the competing evidence for the classification of Kubeo as part of a Tanimuka-Kubeo subgroup or the Western-ET subgroup and the lack of more solid evidence for a single large subgroup of all ET languages excluding Kubeo and Tanimuka, we suggest a revision of the ET subclassification in favor of a tree topology with four main branches as illustrated in Figure 10. Based on the discussion in the previous section, the Eastern-ET subgroup can also be revised in favor of a tree topology with three main branches, since there is no clear evidence for a Tukano-Wanano-Piratapuya subgroup, despite retentions and homoplastic changes. On the other hand, there is ample evidence for an Eastern-ET Inner subgroup, composed of Tuyuka, Yuruti, Pisamira, Karapana, Tatuyo, and Bara (see Figure 10). As for the WT subgroups, there is clear evidence for a Siona-Sekoya subgroup and, perhaps, a Koreguahe-Kueretu subgroup. Maihiki most likely forms a subgroup of its own. We found no evidence for a more detailed subclassification, so we also suggest the conservative proposal of a topology with three main branches in WT. Our consensus tree is more conservative than both DiWeST and Chacon (2014) and directly reflects our current uncertainty about certain parts of Tukanoan language history. Until we know more about the proper handling of competing evidence for subgrouping and the interpretation of homoplastic character state transitions (be they due to contact or parallel evolution), we do not feel comfortable proposing a more detailed subgrouping for the Tukanoan languages.12

[Figure 10: tree diagram. Legend: Western Tukano; East-Eastern Tukano; West-Eastern Tukano; Tan-Eastern Tukano; Kub-Eastern Tukano. Leaves: Tan, Kub, Bas, Mak, Yup, Des, Sir, Tuk, Wan, Pir, Tuy, Yur, Pis, Kar, Tat, Bar, Kor, Kue, Sio, Sek, Mai.]

Figure 10. A consensus tree between DiWeST and Chacon (2014). The tree is conservative in that we hesitated to interpret changes with a homoplastic distribution as shared innovations, and we did not take sides where there is competing substantial evidence for alternative subgroupings.

In fact, one aim of experimenting with the DiWeST model has been to demonstrate the amount of homoplasy in sound changes more objectively than has been done in previous analyses. Although interpreting shared innovations works quite well for major subgroups, with minor subgroups we see a wave-like pattern in which sound changes that bundle in one minor subgroup cross over into other minor subgroups. The result is a complex picture of bundles of sound changes that together make up a unique configuration of sound innovations for each subgroup, following the principle of overall uniqueness discussed in section 1.2. However, closer scrutiny of the data shows that there are only a few unique sound changes for every minor subgroup. In these cases, it seems inevitable to explore additional evidence for subgrouping, including a structural analysis of bundles of sound change transitions, or independent linguistic evidence such as lexicon and morphology. In this study, we restricted ourselves to phonology in order to keep the complexity at a level that would still allow us to manually inspect all consequences of a given model and proposal. Future approaches may advance on this.

12 The consensus tree reflects uncertainty by using multifurcations. The purpose of these multifurcations is not to represent concrete facts about a particular topology but to indicate that some facts are not conclusive and do not allow us to decide where to place a given language or group of languages.

Interestingly, the contrast between clearly defined higher-order subgroups and increasing degrees of fuzziness in lower-level subgroups reflects in part the history of the Tukanoan family. The major subgroups reflect a history of split and separation with no or only moderate degrees of language contact. The smaller subgroups reflect a history of split with continuous language contact, which is by no means surprising given the linguistic area of the Vaupes and the intermarrying practice of the region where the ET subgroups are located.

5. Conclusion and outlook

This paper has presented a new method for the analysis of sound changes and the inference of linguistic phylogenies. In contrast to fully automated approaches, this method relies much more on expert input, reflecting a computer-assisted framework for historical linguistics which is based on the interaction between humans and machines rather than the replacement of humans by machines. We think that the consensus results we presented in Section 4 are a clear improvement over previous classification attempts, not only because they reconcile alternative proposals made in the past, but also and especially because they transparently list all our evidence and expose it to falsification attempts by future analyses. Thus, we think that the method proposed in this paper is an important tool not only because it emphasizes the importance of using more realistic solutions for the modeling of sound changes and the inference of linguistic phylogenies, but mainly because it allows us to handle our data and our criteria for language classification with greater scrutiny. Furthermore, the results of the approach are also of practical use for those who work on phonological reconstruction and subgrouping. After linguists apply the comparative method and arrive at a first classification of their language family, they can use the approach presented here to test the consequences of their reconstruction and classification in a concrete model, thus acquiring a new viewpoint on their own findings.

We tested our approach only on Tukanoan language data. Critics may rightfully say that we would have done better to test it on better-studied language families, such as Romance. Unfortunately, we lack the time and expertise in this linguistic area to propose the relevant sound transitions and sound correspondences for a sufficiently large sample of languages. In the future, we hope to find collaborators working on different language families with whom we can further test the potential of the method. A further point of criticism may relate to the sound transitions which we fed to the algorithm. We are aware that the transitions we proposed may be subjective in certain respects and thus favor specific results. Unfortunately, there are no large collections of sound transitions which could be used independently for this purpose, and despite the large amount of empirical data underlying recent diachronic and synchronic theories (Blevins 2004, Mielke 2008), the data used in these theories are not sufficient to account for the specific situation of sound change in the Tukanoan languages. As long as we lack reliable databases on sound change frequencies across the language families of the world, our method will rely on the experts' intuition about sound change tendencies in the languages under investigation.

Our current model is still far from being linguistically realistic. We improve on previous gain-loss models by introducing directionality and transitions between multi-state characters, as well as latent character states, but we still lack the means to model the structural aspects of sound change. That structural aspects are important for modeling sound change realistically seems beyond question, and future research is needed to enhance our current approach in this respect. Another aspect our approach does not yet cover is the transmission and propagation of change. For the sake of simplicity, we assume tree-like patterns of language divergence. As our difficulty in finding enough support for lower-level subgroups in ET illustrates, however, we should not forget that the genetic signal may easily be blurred by contact and horizontal transmission.
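To make the notion of directed weighted transitions between multi-state characters more concrete, the following minimal sketch scores one character on a fixed tree with a Sankoff-style weighted parsimony pass. It is an illustration under stated assumptions, not the DiWeST implementation: the states `s0`, `s1`, `s2`, the cost values, the leaf names, and the functions `sankoff` and `tree_cost` are all hypothetical, and an infinite cost simply stands for a transition that the expert does not allow.

```python
import math

INF = math.inf
STATES = ["s0", "s1", "s2"]

# Hypothetical directed costs: COST[a][b] is the weight of a change from a to b.
# Changes are only allowed "forward" (s0 > s1 > s2); reversals cost infinity.
COST = {
    "s0": {"s0": 0, "s1": 1, "s2": 2},
    "s1": {"s0": INF, "s1": 0, "s2": 1},
    "s2": {"s0": INF, "s1": INF, "s2": 0},
}


def sankoff(node, observed):
    """Return {state: minimal cost of the subtree if `node` carries that state}."""
    if isinstance(node, str):  # leaf: only the attested state is free of cost
        return {s: (0 if s == observed[node] else INF) for s in STATES}
    child_tables = [sankoff(child, observed) for child in node]
    table = {}
    for s in STATES:
        # For each child, pick the cheapest child state reachable from s.
        table[s] = sum(
            min(COST[s][t] + child[t] for t in STATES) for child in child_tables
        )
    return table


def tree_cost(tree, observed, root_state):
    """Weight of the character on `tree` when the root is fixed to the proto-state."""
    return sankoff(tree, observed)[root_state]


# Toy data: languages A and B show the innovation s1, language C retains s0.
observed = {"A": "s1", "B": "s1", "C": "s0"}
grouped = (("A", "B"), "C")   # A and B form a subgroup: one shared change
split = (("A", "C"), "B")     # A and B are separated: two independent changes

print(tree_cost(grouped, observed, "s0"), tree_cost(split, observed, "s0"))
```

With the root fixed to the proto-state, the tree that groups the two innovating languages needs a single change, while the alternative tree needs two parallel changes, which is the sense in which a parsimony framework prefers shared over homoplastic innovations.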

Supplementary materials.

The supplementary material accompanying this paper contains the source code and the data needed to replicate these analyses, the results mentioned in the paper along with additional detailed results for all analyses which we carried out, and an interactive application which allows the user to investigate all inferred changes for a given phylogeny. The supplementary material can be downloaded at https://zenodo.org/record/45233. The interactive application can directly be browsed from https://digling.github.io/tukano-paper/.

References

Anttila 1972 — R. Anttila. An introduction to historical and comparative linguistics. Macmillan, New York.

Barnes 1999 — J. Barnes. Tucano. In: The Amazonian languages. Ed by. R. Dixon & A. Aikhenvald. Cambridge University Press, Cambridge, pp. 207-226.

Baxter et al. 2006 — G. Baxter, R. Blythe, W. Croft, A. McKane. Utterance selection model of language change. Physical Review E 73, pp. 046118-1 — 046118-20.

Bermudez-Otero 2007 — R. Bermudez-Otero. Diachronic phonology. The Cambridge handbook of phonology. Ed by. P. de Lacy. Cambridge University Press, New York, pp. 497-517.

Beuchat & Rivet 1911 — H. Beuchat, P. Rivet. La famille Betoya ou tucano. Mémoires de la Société de Linguistique de Paris 17, pp. 117-36.

Blevins 2004 — J. Blevins. Evolutionary phonology. The emergence of sound patterns. Cambridge University Press, Cambridge.

Bouchard-Côté et al. 2013 — A. Bouchard-Côté, D. Hall, T. Griffiths, D. Klein. Automated reconstruction of ancient languages using probabilistic models of sound change. Proc. Natl. Acad. Sci. U. S. A. 110.11, pp. 4224-4229.

Brinton 1891 — D. Brinton. The American race. A linguistic classification and ethnographic description of the native tribes of North and South America. N. D. C. Hodges, New York.

Brown et al. 2013 — C. Brown, E. Holman, S. Wichmann. Sound correspondences in the world's languages. Language 89.1, pp. 4-29.

Brugmann 1967 — K. Brugmann. Einleitung und Lautlehre: vergleichende Laut-, Stammbildungs- und Flexionslehre der indogermanischen Sprachen. Walter de Gruyter, Berlin, Leipzig.

Bryant 2001 — H. Bryant. Character polarity and the rooting of cladograms. The character concept in evolutionary biology. Ed by. G. Wagner. Academic Press, San Diego, London, pp. 319-342.

Camin & Sokal 1965 — J. Camin, R. Sokal. A method for deducing branching sequences in phylogeny. Evolution 19.3, pp. 311-327.

Campbell 1999 — L. Campbell. Historical linguistics. An introduction. Edinburgh Univ. Press, Edinburgh.

Chacon 2013 — T. Chacon. Kubeo: Linguistic and cultural interactions in the Upper Rio Negro. In: Upper Rio Negro: Cultural and Linguistic Interaction in Northwestern Amazonia. Ed by. P. Epps & K. Stenzel. Museu Nacional and Museu do Indio Funai, Rio de Janeiro, pp. 403-443.

Chacon 2014 — T. Chacon. A revised proposal of Proto-Tukanoan consonants and Tukanoan family classification. International Journal of American Linguistics 80.3, pp. 275-322.

Chacon 2015 — T. Chacon. The reconstruction of laryngealization in Proto-Tukanoan. Laryngeal Features in the Languages of the Americas. Ed by. M. Coler. Brill, Leiden, pp. 258-284.

Chen 1972 — M. Chen. The time dimension. Contribution toward a theory of sound change. Foundations of Language 8.4, pp. 457-498.

Clackson 2007 — J. Clackson. Indo-European linguistics. Cambridge University Press, Cambridge.

Dolgopolsky 1964 — A. Dolgopolsky. Gipoteza drevnejsego rodstva jazykovych semej Severnoj Evrazii s verojatnostnoj tocki zrenija. Voprosy Jazykoznanija 2, pp. 53-63.

Dolgopolsky 1986 — A. Dolgopolsky. A probabilistic hypothesis concerning the oldest relationships among the language families of northern Eurasia. In: V. V. Shevoroshkin & T. L. Markey (eds.). Typology, Relationship, and Time: A Collection of Papers on Language Change and Relationship by Soviet Linguists. Ann Arbor (MI): Karoma, pp. 27-50.

Durbin et al. 2002 — R. Durbin, S. Eddy, A. Krogh, G. Mitchinson. Biological sequence analysis. Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge.

Felsenstein 1978 — J. Felsenstein. Cases in which Parsimony or Compatibility Methods Will be Positively Misleading. Systematic Zoology 27.4, pp. 401-410.

Fitch 1971 — W. Fitch. Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology. Syst. Biol. 20.4, pp. 406-416.

Fleischhauer 2009 — J. Fleischhauer. A Phylogenetic Interpretation of the Comparative Method. J. Lang. Relationship 2, pp. 115-138.

Fox 1995 — A. Fox. Linguistic reconstruction. An introduction to theory and method. Oxford University Press, Oxford.

Garret 2014 — A. Garret. Sound change. The Routledge Handbook of Historical Linguistics. Ed by. C. Bowern & N. Evans. Routledge, pp. 227-248.

Gomez-Imbert 1993 — E. Gomez-Imbert. Problemas en torno a la comparación de las lenguas tucano-orientales. Estado actual de la clasificación de las lenguas indígenas de Colombia. Ed by. M. de Monte. Instituto Caro y Cuervo, Bogotá, pp. 235-267.

Grand et al. 2013 — A. Grand, A. Corvez, L. Duque Velez, M. Laurin. Phylogenetic inference using discrete characters: performance of ordered and unordered parsimony and of three-item statements. Biological Journal of the Linnean Society of London 110, pp. 914-930.

Gray & Atkinson 2003 — R. Gray, Q. Atkinson. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426.6965, pp. 435-439.

Grimm 1822 — J. Grimm. Deutsche Grammatik. Dieterichsche Buchhandlung, Göttingen.

Hammarström et al. 2015 — H. Hammarström, R. Forkel, M. Haspelmath, S. Bank. Glottolog. URL: http://glottolog.org.

Harrison 2003 — S. Harrison. On the limits of the comparative method. The handbook of historical linguistics. Ed by. B. Joseph & R. Janda. Blackwell, Malden, Oxford, Melbourne, Berlin, pp. 213-243.

Haspelmath 2004 — M. Haspelmath. On directionality in language change with particular reference to grammaticalization. In: Up and down the cline -- The nature of grammaticalization. Ed by. O. Fischer, M. Norde, & H. Perridon. John Benjamins Publishing Company, pp. 17-44.

Hennig 1950 — W. Hennig. Grundzüge einer Theorie der phylogenetischen Systematik. Deutscher Zentralverlag, Berlin.

Hock & Joseph 2009 — H. Hock, B. Joseph. Language history, language change and language relationship. An introduction to historical and comparative linguistics. Mouton de Gruyter, Berlin, New York.

Hrozny 1915 — B. Hrozny. Die Lösung des hethitischen Problems. Mitt. Dtsch. Orient-Ges. 56, pp. 17-50.

Hruschka et al. 2015 — D. Hruschka, S. Branford, E. Smith, J. Wilkins, A. Meade, M. Pagel, T. Bhattacharya. Detecting regular sound changes in linguistics as events of concerted evolution. Curr. Biol. 25.1, pp. 1-9.

Huelsenbeck et al. 2002 — J. Huelsenbeck, J. Bollback, A. Levine. Inferring the root of a phylogenetic tree. Systematic Biology 51.1, pp. 32-43.

Huson & Scornavacca 2012 — D. Huson, C. Scornavacca. Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks. Systematic Biology 61.6, pp. 1061-1067.

Jakobson 1962[1929] — R. Jakobson. Remarques sur l'évolution phonologique du russe comparée à celle des autres langues slaves. In: Phonological studies. Ed by. R. Jakobson. Mouton, The Hague, pp. 7-116.

King 1967 — R. King. Functional load and sound change. Language 43.4, pp. 831-852.

Kiparsky 1988 — P. Kiparsky. Phonological change. In: Linguistic theory. Ed by. F. Newmeyer. Cambridge University Press, Cambridge, New York, New Rochelle, Melbourne, Sydney, pp. 363-415.

Kiparsky 1995 — P. Kiparsky. The phonological basis of sound change. In: Handbook of phonological theory. Ed by J. Goldsmith. Blackwell, Oxford, pp. 640-670.

Klimov 1990 — G. Klimov. Osnovy lingvisticeskoj komparativistiki. Nauka, Moscow.

Kümmel 2008 — M. Kümmel. Konsonantenwandel. Reichert, Wiesbaden.

Kurylowicz 1927 — J. Kurylowicz. ə indo-européen et ḫ hittite. Symbolae grammaticae in honorem Ioannis Rozwadowski. Ed by. W. Taszycki & W. Doroszewski. Gebethner & Wolf, Cracow, pp. 95-104.

Labov 1981 — W. Labov. Resolving the Neogrammarian Controversy. Language 57.2, pp. 267-308.

Labov 1994 — W. Labov. Principles of linguistic change. Volume 1: Internal factors. Wiley-Blackwell, Malden, Oxford, West Sussex.

Lehmann 1969 — W. Lehmann. Einführung in die historische Linguistik. Carl Winter, Heidelberg.

List 2014 — J.-M. List. Sequence comparison in historical linguistics. Düsseldorf University Press, Düsseldorf.

List & Moran 2013 — J.-M. List, S. Moran. An open source toolkit for quantitative historical linguistics. Proceedings of the ACL 2013 System Demonstrations. Association for Computational Linguistics, Stroudsburg, pp. 13-18.

Longobardi et al. 2013 — G. Longobardi, C. Guardiano, G. Silvestri, A. Boattini, A. Ceolin. Toward a syntactic phylogeny of modern Indo-European languages. J. Hist. Linguist. 3.1, pp. 122-152.

Maddison 1991 — D. Maddison. The discovery and importance of multiple islands of most-parsimonious trees. Systematic Zoology 40.3, pp. 315-328.

Malone 1987 — T. Malone. Proto-Tucanoan and Tucanoan genetic relationship. Instituto Linguístico de Verano, Colombia.

Mallory & Adams 2006 — J. Mallory, D. Adams. The Oxford introduction to Proto-Indo-European and the Proto-Indo-European world. Oxford University Press, Oxford.

Martinet 1952 — A. Martinet. Function, structure, and sound change. Word 8.1, pp. 1-33.

Mason 1950 — J. Mason. The languages of South American Indians. Bureau of American Ethnology Bulletin 143, pp. 157-317.

Matthews 1997 — P. H. Matthews. Oxford concise dictionary of linguistics. Oxford University Press, Oxford.

Meillet 1954[1925] — A. Meillet. La méthode comparative en linguistique historique. Honoré Champion, Paris.

Mielke 2008 — J. Mielke. The emergence of distinctive features. Oxford University Press, Oxford.

Nunn 2011 — C. Nunn. The comparative approach in evolutionary anthropology and biology. University of Chicago Press, Chicago, London.

Ohala 1989 — J. Ohala. Sound change is drawn from a pool of synchronic variation. In: Language Change: Contributions to the study of its causes. Ed by. L. Breivik & E. Jahr. Mouton de Gruyter, Berlin, pp. 173-198.

Osthoff & Brugmann 1878 — H. Osthoff, K. Brugmann. Morphologische Untersuchungen auf dem Gebiete der indogermanischen Sprachen. Hirzel, Leipzig.

Pagel 2009 — M. Pagel. Human language as a culturally transmitted replicator. Nature Reviews. Genetics 10, pp. 405-415.

Pellard 2009 — T. Pellard. Ögami. Éléments de description d'un parler du Sud des Ryükyü. PhD thesis. École des Hautes Études en Sciences Sociales. Paris.

Ramirez 1997 — H. Ramirez. Gramática. Inspetoria Salesiana Missionaria da Amazonia, CEDEM, Manaus.

Rask 1818 — R. Rask. Undersögelse om det gamle Nordiske eller Islandske sprogs oprindelse. Gyldendalske Boghandlings Forlag, Copenhagen.

Ross & Durie 1996 — M. Ross, M. Durie. Introduction. In: The comparative method reviewed: Regularity and irregularity in language change. Ed by. M. Durie. Oxford University Press, New York, pp. 3-38.

Sankoff 1975 — D. Sankoff. Minimal mutation trees of sequences. SIAM Journal on Applied Mathematics 28.1, pp. 35-42.

Saussure 1879 — F. Saussure. Mémoire sur le système primitif des voyelles dans les langues indo- européennes. Teubner, Leipzig.

Skilton 2013 — A. Skilton. A new proposal of Western Tukanoan consonants and internal classification. Senior Thesis Essay. Yale University.

Tynjanow & Jakobson 1991[1928] — J. Tynjanow, R. Jakobson. Probleme der Literatur- und Sprachforschung. In: Alternative Traditionen. Ed by. R. Viehoff. Vieweg, Braunschweig, pp. 67-69.

Verner 1877 — K. Verner. Eine Ausnahme der ersten Lautverschiebung. Zeitschrift für vergleichende Sprachforschung auf dem Gebiete der Indogermanischen Sprachen 23.2, pp. 97-130.

Waltz & Wheeler 1972 — N. Waltz, A. Wheeler. Proto Tucanoan. In: Comparative studies in Amerindian languages. Ed by. E. Matteson, A. Wheeler, F. Jackson, N. Waltz, & D. Christian. Mouton, The Hague, Paris, pp. 119-149.

Wang 1969 — W.-Y. Wang. Competing changes as a cause of residue. Language 45.1, pp. 9-25.

Weinreich et al. 1968 — U. Weinreich, W. Labov, M. Herzog. Empirical foundations for a theory of language change. In: Directions for historical linguistics: A symposium. Ed by. W. Lehmann & Y. Malkiel. University of Texas Press, Austin, pp. 95-189.

Weiss 2014 — M. Weiss. The comparative method. In: The Routledge Handbook of Historical Linguistics. Ed by C. Bowern & N. Evans. Routledge, New York, pp. 127-145.

Wheeler 1992 — A. Wheeler. Comparaciones lingüisticas en el grupo Tucano Occidental. In: Estudios comparativos Proto Tucano. Ed by. S. Levinsohn. Alberto Lleras Camargo, Bogotá,

Williams et al. 2015 — T. Williams, S. Heaps, S. Cherlin, T. Nye, R. Boys, T. Embley. New substitution models for rooting phylogenetic trees. Philos. T. Roy. Soc. B 370.

Zgusta 2006 — L. Zgusta. The laryngeal and glottalic theories. In: History of the language sciences. Ed by. S. Auroux, E. Koerner, H.-J. Niederehe, & K. Versteegh. de Gruyter, Berlin, New York, pp. 2462-2479.

T. C. Chacon, J.-M. List. Refining a computational model of sound change helps to understand the history of the Tukanoan languages.

Over the last forty years, the internal history of the Tukanoan languages has been at the center of much debate; a wide variety of classifications based on lexical and phonological data have been proposed. This article presents a new classification of the Tukanoan language family, based on an improved computational approach that reconstructs a phylogenetic tree from proposed sound changes. Unlike the traditional method, which relies on the manual identification of shared innovations by linguists, the new method identifies innovations according to the principle of parsimony. Unlike models with binary characters, which are very popular in the analysis of lexical data, sound changes are described by the authors as directed weighted transitions between multiple character states. The authors apply their algorithm to a sample of 21 modern Tukanoan languages. The results confirm the binary split of the Tukanoan languages into a western and an eastern branch, which had been proposed earlier, and also point to a genetic closeness between Kubeo and Tanimuka on the one hand and Koreguahe and Maihiki on the other, thus reconciling previous classifications. The authors use the resulting classification to reconstruct a consensus phylogenetic tree in which all shared innovations have been checked manually and detailed explanations are offered for all uncertainties.

Keywords: sound change, phylogenetic reconstruction, Tukanoan languages, computer-assisted comparative linguistics.