In historical linguistics, the tree model (also Stammbaum, genetic, or cladistic model) is a model of the evolution of languages analogous to the concept of a family tree, particularly a phylogenetic tree in the biological evolution of species. As with species, each language is assumed to have evolved from a single parent or "mother" language, with languages that share a common ancestor belonging to the same language family.
Popularized by the German linguist August Schleicher in 1853, the tree model has been a common method of describing genetic relationships between languages since the first attempts to do so. It is central to the field of comparative linguistics, which involves using evidence from known languages and observed rules of language feature evolution to identify and describe the hypothetical proto-languages ancestral to each language family, such as Proto-Indo-European for the Indo-European languages. However, this is largely a theoretical, qualitative pursuit, and linguists have always emphasized the inherent limitations of the tree model due to the large role played by horizontal transmission in language evolution, ranging from loanwords to creole languages that have multiple mother languages. The wave model was developed in 1872 by Schleicher's student Johannes Schmidt as an alternative to the tree model that incorporates horizontal transmission.
The tree model also has the same limitations as biological taxonomy with respect to the species problem of quantizing a continuous phenomenon that includes exceptions like ring species in biology and dialect continua in language. The concept of a linkage was developed in response and refers to a group of languages that evolved from a dialect continuum rather than from linguistically isolated child languages of a single language.
In the 21st century, methods from computational phylogenetics have increasingly been applied to large linguistic datasets to automatically produce phylogenetic trees for language families, which has been met with skepticism by many historical linguists.
History of the model
The confusion of Babel
Historical linguistics was not possible in Europe from the dominance of Christianity in the late Roman Empire to the Age of Enlightenment due to literal adherence to Genesis 11:1-9, which offers an explanation of why languages differ: "And the whole earth was of one language, and of one speech." According to the Genesis narrative, the descendants of Noah gathered together in the land of Shinar and began constructing the Tower of Babel in an attempt to reach heaven. In response to their over-reaching, God decided to "confound their language, that they may not understand one another's speech" and "scattered them abroad from thence upon the face of the earth." In other words, if languages were given by God, then they did not evolve, and there is no point in comparing them.
The Christian philosopher, Saint Augustine of Hippo, supposed that each of the descendants of Noah founded a nation and that each nation was given its own language: Assyrian for Assur, Hebrew for Heber, and so on. In all he identified 72 nations, tribal founders and languages. The confusion and dispersion occurred in the time of Peleg, son of Heber, son of Shem, son of Noah.
St. Augustine then makes a hypothesis not unlike those of later historical linguists, that the family of Heber "preserved that language not unreasonably believed to have been the common language of the race ... thenceforth named Hebrew." Most of the 72 languages, however, date to many generations after Heber. St. Augustine solves this first problem by supposing that Heber, who lived 430 years, was still alive when God assigned the 72.
Ursprache, the language of paradise
St. Augustine's hypothesis stood without major question for over a thousand years. Then, in a series of tracts, published in 1684, expressing skepticism concerning various beliefs, especially Biblical, Sir Thomas Browne wrote:
"Though the earth were widely peopled before the flood ... yet whether, after a large dispersion, and the space of sixteen hundred years, men maintained so uniform a language in all parts, ... may very well be doubted."
By then, discovery of the New World and exploration of the Far East had brought knowledge of numbers of new languages far beyond the 72 calculated by St. Augustine. Citing the native American languages, Browne suggests the "confusion of tongues at first fell only upon those present in Sinaar at the work of Babel ...." For those "about the foot of the hills, whereabout the ark rested ... their primitive language might in time branch out into several parts of Europe and Asia ...." This is an inkling of a tree. In Browne's view, simplification from a larger aboriginal language than Hebrew could account for the differences in language. He suggests ancient Chinese, from which the others descended by "confusion, admixtion and corruption". Later he invokes "commixture and alteration."
Browne reports a number of reconstructive activities by the scholars of the times:
"The learned Casaubon conceiveth that a dialogue might be composed in Saxon, only of such words as are derivable from the Greek ... Verstegan made no doubt that he could contrive a letter that might be understood by the English, Dutch, and East Frislander ... And if, as the learned Buxhornius contendeth, the Scythian language as the mother tongue runs throughout the nations of Europe, and even as far as Persia, the community on many words, between so many nations, hath more reasonable traduction and were rather derivable from the common tongue diffused through them all, than from any particular nation, which hath also borrowed and holdeth but at second hand."
The confusion at the Tower of Babel was thus removed as an obstacle by setting it aside. Attempts to find similarities in all languages were resulting in the gradual uncovering of an ancient master language from which all the other languages derive. Browne undoubtedly did his writing and thinking well before 1684. In that same revolutionary century in Britain James Howell published Volume II of Epistolae Ho-Elianae, quasi-fictional letters to various important persons in the realm containing valid historical information. In Letter LVIII the metaphor of a tree of languages appears fully developed short of being a professional linguist's view:
"I will now hoist sail for the Netherlands, whose language is the same dialect with the English, and was so from the beginning, being both of them derived from the high Dutch [Howell is wrong here]: The Danish also is but a branch of the same tree ... Now the High Dutch or Teutonick Tongue, is one of the prime and most spacious Maternal Languages of Europe ... it was the language of the Goths and Vandals, and continueth yet of the greatest part of Poland and Hungary, who have a Dialect of hers for their vulgar tongue ... Some of her writers would make this world believe that she was the language spoken in paradise."
The first Indo-Europeanists
On February 2, 1786, Sir William Jones delivered his Third Anniversary Discourse to the Asiatic Society as its president on the topic of the Hindus. In it he applied the logic of the tree model to three languages, Greek, Latin and Sanskrit, but for the first time in history on purely linguistic grounds, noting "a stronger affinity, both in the roots of the verbs and in the forms of grammar, than could possibly have been produced by accident; ...." He went on to postulate that they sprang from "some common source, which, perhaps, no longer exists." To them he added Gothic, Celtic and Persian as "to the same family."
Jones did not name his "common source" nor develop the idea further, but it was taken up by the linguists of the times. In the (London) Quarterly Review of late 1813-1814, Thomas Young published a review of Johann Christoph Adelung's Mithridates, oder allgemeine Sprachenkunde ("Mithridates, or a General History of Languages"), Volume I of which had come out in 1806, and Volumes II and III, 1809-1812, continued by Johann Severin Vater. Adelung's work described some 500 "languages and dialects" and hypothesized a universal descent from the language of paradise, located in Kashmir central to the total range of the 500. Young begins by pointing out Adelung's indebtedness to Conrad Gesner's Mithridates, de Differentiis Linguarum of 1555 and other subsequent catalogues of languages and alphabets.
Young undertakes to present Adelung's classification. The monosyllabic type is most ancient and primitive, spoken in Asia, to the east of Eden, in the direction of Adam's exit from Eden. Then follows Jones' group, still without a name, but attributed to Jones: "Another ancient and extensive class of languages united by a greater number of resemblances than can well be altogether accidental." For this class he offers a name, "Indoeuropean," the first known linguistic use of the word, but not its first known use. The British East India Company was using "Indo-European commerce" to mean the trade of commodities between India and Europe. All the evidence Young cites for the ancestral group are the most similar words: mother, father, etc.
Charles Darwin drew on the same genealogical picture of languages in On the Origin of Species:
"It may be worth while to illustrate this view of classification, by taking the case of languages. If we possessed a perfect pedigree of mankind, a genealogical arrangement of the races of man would afford the best classification of the various languages now spoken throughout the world; and if all extinct languages, and all intermediate and slowly changing dialects, had to be included, such an arrangement would, I think, be the only possible one. Yet it might be that some very ancient language had altered little, and had given rise to few new languages, whilst others (owing to the spreading and subsequent isolation and states of civilisation of the several races, descended from a common race) had altered much, and had given rise to many new languages and dialects. The various degrees of difference in the languages from the same stock, would have to be expressed by groups subordinate to groups; but the proper or even only possible arrangement would still be genealogical; and this would be strictly natural, as it would connect together all languages, extinct and modern, by the closest affinities, and would give the filiation and origin of each tongue."
The phylogenetic tree
Joseph Greenberg began writing during a time when phylogenetic systematics lacked the tools available to it later: the computer (computational systematics) and DNA sequencing (molecular systematics). To discover a cladistic relationship, researchers relied on as large a number of morphological similarities among species as could be defined and tabulated. Statistically, the greater the number of similarities, the more likely species were to be in the same clade. This approach appealed to Greenberg, who was interested in discovering linguistic universals. Altering the tree model to make the family tree a phylogenetic tree, he said:
"Any language consists of thousands of forms with both sound and meaning ... any sound whatever can express any meaning whatever. Therefore, if two languages agree in a considerable number of such items ... we necessarily draw a conclusion of common historical origin. Such genetic classifications are not arbitrary ... the analogy here to biological classification is extremely close ... just as in biology we classify species in the same genus or high unit because the resemblances are such as to suggest a hypothesis of common descent, so with genetic hypotheses in language."
In this analogy, a language family is like a clade, the proto-language is like an ancestor taxon, the language tree is like a phylogenetic tree, and languages and dialects are like species and varieties. Greenberg formulated large tables of characteristics of hitherto neglected languages of Africa, the Americas, Indonesia and northern Eurasia and typed them according to their similarities. He called this approach "typological classification", arrived at by descriptive linguistics rather than by comparative linguistics.
Computational phylogenetics in historical linguistics
In the late 20th century, linguists began using software intended for biological classification to classify languages. Programs and methods became increasingly sophisticated. In the early 21st century, the Computational Phylogenetics in Historical Linguistics (CPHL) project, a consortium of historical linguists, received funding from the National Science Foundation to study phylogenies. 
Limitations of the model
The limitations of the tree model, in particular its inability to handle the non-discrete distribution of shared innovations in dialect continua, have been addressed through the development of non-cladistic (non-tree-based) methodologies. They include the wave model and, more recently, the concept of linkage.
An additional limitation of the tree model involves mixed and hybrid languages, as well as language mixing in general, since the tree model allows only for divergences. For example, according to Zuckermann (2009:63), "Israeli", his term for Modern Hebrew, which he regards as a Semito-European hybrid, "demonstrates that the reality of linguistic genesis is far more complex than a simple family tree system allows. 'Revived' languages are unlikely to have a single parent."
The phylogenetic software processes all the states of all the characters of all the languages by one of several mathematical methods to accomplish a pairwise comparison of each language with all the rest. It then constructs a cladogram based on degrees of similarity; for example, two hypothetical languages, a and b, which are closest only to each other, are assumed to have a common ancestor, a-b. The next closest language, c, is assumed to have a common ancestor with a-b, and so on. The result is a projected series of historical paths leading from the overall common ancestor (the root) to the languages (the leaves). Each path is unique. There are no links between paths. Every leaf and every node other than the root has one and only one immediate ancestor. All the states are accounted for by descent from other states. A cladogram that conforms to these requirements is a perfect phylogeny.
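The joining procedure described above can be illustrated with a minimal sketch. It is not the CPHL software: it assumes a simple average-linkage rule over the fraction of shared character states, and the four languages and their state vectors are invented for illustration.

```python
from itertools import combinations

# Hypothetical languages, each coded for six binary characters (invented data).
STATES = {
    "a": (1, 1, 1, 1, 0, 0),
    "b": (1, 1, 1, 1, 0, 1),
    "c": (1, 1, 0, 0, 0, 1),
    "d": (0, 0, 0, 0, 1, 1),
}


def similarity(u, v):
    """Fraction of characters on which two state vectors agree."""
    return sum(x == y for x, y in zip(u, v)) / len(u)


def build_cladogram(states):
    """Greedily join the two most similar clusters until one tree remains.

    Returns the tree as nested tuples, e.g. (("a", "b"), "c").
    """
    # Each cluster stores (subtree, state vectors of its member languages).
    clusters = {name: (name, [vec]) for name, vec in states.items()}

    def average_similarity(pair):
        members_a = clusters[pair[0]][1]
        members_b = clusters[pair[1]][1]
        total = sum(similarity(u, v) for u in members_a for v in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        first, second = max(combinations(sorted(clusters), 2),
                            key=average_similarity)
        tree_a, members_a = clusters.pop(first)
        tree_b, members_b = clusters.pop(second)
        # The merged cluster stands for the assumed common ancestor.
        clusters[f"{first}-{second}"] = ((tree_a, tree_b), members_a + members_b)

    tree, _ = next(iter(clusters.values()))
    return tree


print(build_cladogram(STATES))
```

With this invented data, a and b are joined first, c is attached to a-b next, and d joins last, giving the nesting ((a, b), c), d. Every leaf ends up with exactly one path to the root, which is the tree property the paragraph describes.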
To obtain a reasonably valid phylogeny, the researchers found they needed to enter as input all three types of characters: phonological, lexical and morphological, which were all required to present a picture that was sufficiently detailed for calculation of phylogeny. Only qualitative characters produced meaningful results. Repeated states were too ambiguous to be correctly interpreted by the software; therefore characters that were subject to back formation and parallel development, which reverted a character to a prior state or adopted a state that evolved in another character, respectively, were screened from the input dataset.
Perfect phylogenetic networks
Despite their care to code the best qualitative characters in sufficient numbers, the researchers could obtain no perfect phylogenies for some groups, such as Germanic and Albanian within Indo-European. They reasoned that a significant number of characters, which could not be explained by genetic descent from the group's calculated ancestor, were borrowed. Presumably, if the wave model, which explained borrowing, were a complete explanation of the group's characters, no phylogeny at all could be found for it. If both models were partially effective, then a tree would exist, but it would need to be supplemented by non-genetic explanations. The researchers therefore modified the software and method to include the possibility of borrowing.
A tree so modified was no longer a tree as such: there could be more than one path from root to leaf. The researchers called this arrangement a network. The states of a character still evolved along a unique path from root to leaf, but its origin could be either the root under consideration or a contact language. If all the states of the experiment could be accounted for by the network, it was termed a perfect phylogenetic network.
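The difference between a tree and a network can be shown with a small sketch that counts root-to-leaf paths. The node names are hypothetical, and modelling a contact edge as a single directed edge from donor to recipient is a simplifying assumption made here, not a description of the researchers' software.

```python
def count_paths(edges, root, target):
    """Count distinct directed paths from root to target in an acyclic graph."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    def walk(node):
        if node == target:
            return 1
        return sum(walk(child) for child in children.get(node, []))

    return walk(root)


# A small hypothetical tree: root -> x -> {a, b} and root -> y -> {c}.
TREE_EDGES = [("root", "x"), ("root", "y"), ("x", "a"), ("x", "b"), ("y", "c")]

# One hypothetical contact edge: the ancestor of c borrows from x.
CONTACT_EDGES = [("x", "y")]

print(count_paths(TREE_EDGES, "root", "c"))                  # tree only
print(count_paths(TREE_EDGES + CONTACT_EDGES, "root", "c"))  # tree plus contact
```

In the plain tree there is one path from the root to c. Adding the contact edge gives two, so the structure is a network rather than a tree, while a leaf untouched by contact, such as a, still has a single path.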
Compatibility and feasibility
The generation of networks required two phases. In the first phase, the researchers devised a number of phylogenies, called candidate trees, to be tested for compatibility. A character is compatible when its origin is explained by the phylogeny generated. 
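One common way to formalize compatibility, assumed here rather than taken from the source, is convexity: a character is compatible with a tree if, for each state, the smallest subtree connecting the languages that show that state shares no node with the subtree of any other state. The sketch below checks this on a hypothetical four-language tree; the tree, the characters and the function names are all illustrative.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def spanning_nodes(leaves, parent):
    """Nodes of the smallest subtree connecting the given leaves."""
    paths = [path_to_root(leaf, parent) for leaf in leaves]
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    # Walking up from a leaf, the first node shared by all paths
    # is the lowest common ancestor.
    lca = next(node for node in paths[0] if node in common)
    nodes = set()
    for path in paths:
        nodes.update(path[: path.index(lca) + 1])
    return nodes


def is_compatible(character, parent):
    """`character` maps leaf -> state; `parent` maps child -> parent."""
    by_state = {}
    for leaf, state in character.items():
        by_state.setdefault(state, []).append(leaf)
    used = set()
    for leaves in by_state.values():
        nodes = spanning_nodes(leaves, parent)
        if nodes & used:  # two states would need the same node
            return False
        used |= nodes
    return True


# Hypothetical candidate tree ((a, b), (c, d)) as a child -> parent map.
PARENT = {"a": "x", "b": "x", "c": "y", "d": "y", "x": "root", "y": "root"}

FITS = {"a": 1, "b": 1, "c": 2, "d": 2}      # follows the tree's grouping
CLASHES = {"a": 1, "c": 1, "b": 2, "d": 2}   # cuts across the grouping

print(is_compatible(FITS, PARENT), is_compatible(CLASHES, PARENT))
```

Counting the characters for which `is_compatible` returns False would give an incompatibility score for a candidate tree under this assumed definition.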
Most feasible network for Indo-European
The researchers began with five candidate trees for Indo-European, lettered A-E, one generated from the phylogenetic software, two modifications of it and two suggested by Craig Melchert, a historical linguist and Indo-Europeanist. The trees differed mainly in the placement of the most ambiguous group, the Germanic languages, and Albanian, which did not have enough distinctive characters to place it exactly. Tree A contained 14 incompatible characters; B, 19; C, 17; D, 21; E, 18. Trees A and C had the best compatibility scores. The incompatibilities were all lexical, and A's were a subset of C's.
Subsequent generation of networks found that all incompatibilities could be resolved with a minimum of three contact edges except for Tree E. As it did not have a high compatibility, it was excluded. Tree A had 16 possible networks, which a feasibility inspection reduced to three. Tree C had one network, but as it required an interface to Baltic and not Slavic, it was not feasible.
Tree A, the most compatible and feasible tree, hypothesizes seven groups separating from Proto-Indo-European between about 4000 BC and 2250 BC, as follows.
- The first to separate was Anatolian, about 4000 BC.
- Tocharian followed at about 3500 BC.
- Shortly thereafter, about 3250 BC, Proto-Italo-Celtic (western Indo-European) separated, becoming Proto-Italic and Proto-Celtic at about 2500 BC.
- At about 3000 BC, Proto-Albano-Germanic separated, becoming Albanian and Proto-Germanic at about 2000 BC.
- At about 3000 BC, Proto-Greco-Armenian (southern Indo-European) divided, becoming Proto-Greek and Proto-Armenian at about 1800 BC.
- Balto-Slavic appeared about 2500 BC, dividing into Proto-Baltic and Proto-Slavic at about 1000 BC.
- Finally, Proto-Indo-European became Proto-Indo-Iranian (eastern Indo-European) at about 2250 BC.
Trees B and E offer the alternative of Proto-Germano-Balto-Slavic (northern Indo-European), making Albanian an independent branch. The only date for which the authors vouch is the last, based on the continuity of the Yamna culture, the Andronovo culture and known Indo-Aryan speaking cultures. All others are described as "dead reckoning."
Given the phylogeny of best compatibility, A, three contact edges are required to complete the compatibility. This is the group of edges with the fewest borrowing events:
- First, an edge between Proto-Italic and Proto-Germanic, which must have begun after 2000 BC, according to the dating scheme given.
- A second contact edge was between Proto-Italic and Proto-Greco-Armenian, which must have begun after 2500 BC.
- The third contact edge is between Proto-Germanic and Proto-Baltic, which must have begun after 1000 BC.
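The lower bounds on the three contact edges follow from the Tree A dates if one assumes that contact cannot begin before both participants exist. The sketch below works through that arithmetic using the approximate dates from the Tree A list; the dictionary and function names are illustrative, and the "both must exist" rule is an interpretation that happens to reproduce the bounds quoted above.

```python
# Approximate dates (years BC) at which each group appears in Tree A,
# taken from the list of separations given earlier in this section.
EMERGED_BC = {
    "Proto-Italic": 2500,
    "Proto-Germanic": 2000,
    "Proto-Greco-Armenian": 3000,
    "Proto-Baltic": 1000,
}

CONTACT_EDGES = [
    ("Proto-Italic", "Proto-Germanic"),
    ("Proto-Italic", "Proto-Greco-Armenian"),
    ("Proto-Germanic", "Proto-Baltic"),
]


def earliest_contact_bc(first, second):
    """Earliest possible start of contact, in years BC.

    A larger BC figure is earlier in time, so the later of the two
    emergence dates is the smaller number.
    """
    return min(EMERGED_BC[first], EMERGED_BC[second])


for first, second in CONTACT_EDGES:
    print(f"{first} / {second}: after about {earliest_contact_bc(first, second)} BC")
```

The three results are 2000, 2500 and 1000 BC, matching the bounds stated for the contact edges.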
Tree A with the edges described above is described by the authors as "our best PPN." In all PPNs, it is clear that although the initial daughter languages became distinct in relative isolation, the later evolution of the groups can be explained only by evolution in proximity to other languages with which an exchange takes place by the wave model.
Notes
- List, Johann-Mattis; Nelson-Sathi, Shijulal; Geisler, Hans; Martin, William (2014). "Networks of lexical borrowing and lateral gene transfer in language and genome evolution". BioEssays. 36 (2): 141–150. doi:10.1002/bies.201300096. ISSN 0265-9247.
- Saint Augustine. "XVI: 9-11". City of God.
- Genesis 10:25
- 1 Chronicles 1:19.
- Saint Augustine (Bishop of Hippo.) (1871). The Works of Aurelius Augustine: A New Translation. T. & T. Clark.
- Browne 1684, pp. 223–241
- Browne 1684, p. 224
- Browne 1684, p. 225
- Browne 1684, p. 228
- Browne 1684, pp. 226–228
- Howell, James (1688). "Letter LVIII To the Right Honourable the Earl R.". Epistolae Ho-Elianae, Familiar Letters, Domestic and Forren, Divided into Four Books, Partly Historical, Political, Philosophical, Upon Emergent Occasions. Volume II (6th ed.). London: Thomas Guy. p. 356.
- Jones, William (1807), "Third Anniversary Discourse, on the Hindus", in Lord Teignmouth, The Works of Sir William Jones with the life of the Author, in Thirteen Volumes, Volume III, London: John Stockdale and John Walker, p. 34
- Young 1813, p. 251
- Young 1813, p. 255
- Grant, Robert (1813). A sketch of the history of the East-India company, from its first formation to the passing of the Regulating act of 1773; with a summary view of the changes that have taken place since that period in the internal administration of British India. London: Black, Parry, and Co. [etc.] pp. xxxiv–xxxv.
- Williams, D. M.; Ebach, Malte C.; Nelson, Gareth J. (2008). Foundations of systematics and biogeography. New York, NY: Springer. p. 45.
- Darwin, Charles (1860). On the origin of species by means of natural selection, or, The preservation of favoured races in the struggle for life. London: J. Murray. p. 422.
- Post, David G (2009). In search of Jefferson's moose: notes on the state of cyberspace. Oxford; New York: Oxford University Press. p. 125.
- Greenberg, Joseph H. (1990), "A Quantitative Approach to the Typological Morphology of Language", in Denning, Keith M.; Kemmer, Suzanne, On language: selected writings of Joseph H. Greenberg, Stanford: Stanford University Press, pp. 3–4
- Greenberg, Joseph Harold (1971). Language, culture, and communication. Stanford: Stanford University Press. p. 113.
- "CPHL: Computational Phylogenetics in Historical Linguistics". 2004–2012.
- See Bloomfield 1933, p. 311; Heggarty et al. (2010); François (2014).
- Zuckermann, Ghil'ad. 2009. "Hybridity versus Revivability: Multiple Causation, Forms and Patterns." Journal of Language Contact, Varia 2:40-67.
- Nakhleh 2005, p. 383.
- Nakhleh 2005, pp. 384–385.
- The technical details of the algorithms used are stated in Nakhleh 2005, Appendix A. The details of the dataset are stated in Appendix B.
- Nakhleh 2005, pp. 388–391.
- Nakhleh 2005, p. 387.
- Nakhleh 2005, p. 396.
- Nakhleh 2005, p. 400.
- Nakhleh 2005, p. 398.
- Nakhleh 2005, p. 401.
- Nakhleh 2005, p. 407.
Bibliography
- Bloomfield, Leonard (1984). Language. Chicago and London: University of Chicago Press.
- Browne, Thomas (1852), "Miscellany Tracts; Miscellanies; Tract 8, Of Languages, and Particularly of the Saxon Tongue", in Tenison, Thomas, The Works of Sir Thomas Browne, Bohn's Antiquarian Library, Volume III, Lincoln's Inn Fields: Cox (Brothers) and Wyman, pp. 223–241.
- François, Alexandre (2014), "Trees, Waves and Linkages: Models of Language Diversification" (PDF), in Bowern, Claire; Evans, Bethwyn, The Routledge Handbook of Historical Linguistics, London: Routledge, pp. 161–189, ISBN 978-0-41552-789-7.
- Heggarty, Paul; Maguire, Warren; McMahon, April (2010). "Splits or waves? Trees or webs? How divergence measures and network analysis can unravel language histories". Philosophical Transactions of the Royal Society B. 365: 3829–3843. doi:10.1098/rstb.2010.0099.
- Nakhleh, Luay; Ringe, Don; Warnow, Tandy (2005). "Perfect Phylogenetic Networks: A New Methodology for Reconstructing the Evolutionary History of Natural Languages" (PDF). Language. Linguistic Society of America. 81 (2): 382–420. doi:10.1353/lan.2005.0078.
- Young, Thomas (Oct. 1813, & Jan. 1814). "Adlung's General History of Languages". The Quarterly Review. London: John Murray. X (No. XIX Article XII): 250–292.
External links
- Labov, William (2010). "15. The Diffusion of Language from Place to Place". Principles of Linguistic Change. Volume 3: Cognition and Cultural Factors. UK: Wiley-Blackwell; scribd.com.
- Santorini, Beatrice; Kroch, Anthony (2007). "Node Relations". The syntax of natural language: An online introduction using the Trees program. University of Pennsylvania.