Mass comparison

Mass comparison is a method developed by Joseph Greenberg to determine the level of genetic relatedness between languages. It is now usually called multilateral comparison. The method is rejected by most linguists (Campbell 2001, p. 45), though not all.

Some of the top-level relationships Greenberg named had already been posited by others and are now generally accepted (e.g. Afro-Asiatic and Niger–Congo). Others are accepted by many but disputed by some prominent specialists (e.g. Nilo-Saharan). Still others are predominantly rejected but have some defenders (e.g. Khoisan and Eurasiatic), and some are almost universally rejected (e.g. Amerind).


Mass comparison involves setting up a table of basic vocabulary items and their forms in the languages to be compared. The table can also include common morphemes. The following table was used by Greenberg (1957, p. 41) to illustrate the technique. It shows the forms of six items of basic vocabulary in nine different languages, identified by letters.

Head kar kar se kal tu tu to fi pi
Eye min ku min miŋ min min idi iri
Nose tor tör ni tol was waš was ik am
One mit kan kan kaŋ ha kan kεn he čak
Two ni ta ne kil ne ni ne gum gun
Blood kur sem sem šam i sem sem fik pix

In the case of languages that are fairly closely related, the basic relationships can be determined without any experience. Knowing a bit about probable paths of sound change allows one to go farther faster. An experienced typologist (Greenberg was a pioneer in the field) can quickly judge several potential cognates in this table to be probable or improbable. For example, the path p > f is extremely frequent and the path f > p much less so, which enables one to hypothesize that fi : pi and fik : pix are indeed related and go back to protoforms *pi and *pik/x. Likewise, the knowledge that k > x is extremely frequent and x > k much less so enables one to choose *pik over *pix. Thus, while mass comparison does not attempt to produce reconstructions of protolanguages (according to Greenberg (2005:318), these belong to a later phase of study), phonological considerations come into play from the very beginning.
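The directional reasoning in the paragraph above can be sketched in Python. This is a minimal illustration, not Greenberg's procedure: the names COMMON_CHANGES, pick_protosound, and propose_protoform are hypothetical, each character is assumed to stand for one sound, and only the two directional preferences mentioned above (p > f and k > x) are encoded.

```python
# Sound changes the text describes as frequent, written as (older, newer).
# Only the two pairs named in the paragraph are listed (an assumption of this sketch).
COMMON_CHANGES = {("p", "f"), ("k", "x")}


def pick_protosound(a, b):
    """Return the likelier ancestral sound of a and b, or None if undecided."""
    if a == b:
        return a
    if (a, b) in COMMON_CHANGES:  # a > b is a frequent change, so a is older
        return a
    if (b, a) in COMMON_CHANGES:  # b > a is a frequent change, so b is older
        return b
    return None


def propose_protoform(form1, form2):
    """Propose a starred protoform for two forms, or None if no proposal is possible."""
    if len(form1) != len(form2):
        return None
    sounds = [pick_protosound(a, b) for a, b in zip(form1, form2)]
    if None in sounds:
        return None
    return "*" + "".join(sounds)


print(propose_protoform("fi", "pi"))
print(propose_protoform("fik", "pix"))
```

For the two pairs discussed above, the sketch proposes *pi and *pik; for pairs whose differences are not covered by the listed changes, it makes no proposal.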

The tables used in actual research involve much larger numbers of items and languages. The items included may be either lexical, such as 'hand', 'sky', and 'go', or morphological, such as PLURAL and MASCULINE (Ruhlen 1987, p. 120).
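As a rough illustration only, the inspection of the small table above can be mimicked in a few lines of Python. The language labels A to I, the crude resemblance test (equal length and at most one differing character), and the names resembles and score are all assumptions made for this sketch; Greenberg's own procedure relies on the linguist's judgment rather than on a fixed rule. The Eye row is left out because only eight forms are listed for it above.

```python
from itertools import combinations

LANGS = "ABCDEFGHI"  # assumed labels for the nine languages, in column order

# Rows copied from the table above; each row has one form per language.
TABLE = {
    "Head": "kar kar se kal tu tu to fi pi",
    "Nose": "tor tör ni tol was waš was ik am",
    "One": "mit kan kan kaŋ ha kan kεn he čak",
    "Two": "ni ta ne kil ne ni ne gum gun",
    "Blood": "kur sem sem šam i sem sem fik pix",
}
FORMS = {item: dict(zip(LANGS, row.split())) for item, row in TABLE.items()}


def resembles(x, y):
    """Crude stand-in for a judgment of resemblance between two forms."""
    return len(x) == len(y) and sum(a != b for a, b in zip(x, y)) <= 1


def score(lang1, lang2):
    """Count the items on which two languages have resembling forms."""
    return sum(resembles(forms[lang1], forms[lang2]) for forms in FORMS.values())


# List the language pairs that resemble each other on at least three items.
for lang1, lang2 in combinations(LANGS, 2):
    if score(lang1, lang2) >= 3:
        print(lang1, lang2, score(lang1, lang2))
```

Under this test, languages F and G agree on all five items while A and H agree on none. The test is too strict to catch fik : pix, which differ in two characters; this is the kind of case where knowledge of likely sound changes does the work.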

Detection of borrowings

Critics of mass comparison generally assume that it has no means of distinguishing borrowed forms from inherited ones, unlike comparative reconstruction, which is able to do so through regular sound correspondences. Greenberg (1957, p. 39) addressed this objection as early as the 1950s. According to him, the key points are as follows:

  • Basic vocabulary is much less readily borrowed than cultural vocabulary.
  • "[D]erivational, inflectional, and pronominal morphemes and morph alternations are the least subject of all to borrowing."
  • Any type of linguistic item may be borrowed "on occasion". However, "fundamental vocabulary is proof against mass borrowing."
  • Mass comparison does not possess means to distinguish borrowing in every instance: "in particular and infrequent instances the question of borrowing may be doubtful". However, it is always possible to detect whether borrowing is responsible for "a mass of resemblances" between languages: "Where a mass of resemblances is due to borrowing, they will tend to appear in cultural vocabulary and to cluster in certain semantic areas which reflect the cultural nature of the contact."
  • The technique of mass comparison, as opposed to bilateral comparison, provides a check on whether forms are borrowed or not (Greenberg 1957, p. 40):
Borrowing can never be an over-all explanation of a mass of recurrent basic resemblances in many languages occurring over a wide geographical area.... Since we find independent sets of resemblances between every pair of languages, among every group of three languages, and so on, each language would have to borrow from every other.
  • "[R]ecurrent sound correspondences" do not suffice to detect borrowing, since "where loans are numerous, they often show such correspondences" (Greenberg 1957, pp. 39–40).

Greenberg considered that the results achieved through this method approached certainty (Greenberg 1957, p. 39): "The presence of fundamental vocabulary resemblances and resemblances in items with grammatical function, particularly if recurrent through a number of languages, is a sure indication of genetic relationship."

The place of sound correspondences in the comparative method

It is often reported that Greenberg sought to replace the comparative method with a new method, mass comparison (or, among his less scrupulous critics, "mass lexical comparison"). He consistently rejected this characterization, stating for instance, "The methods outlined here do not conflict in any fashion with the traditional comparative method" (1957:44) and expressing wonderment at "the strange and widely disseminated notion that I seek to replace the comparative method with a new and strange invention of my own" (2002:2). According to Greenberg, mass comparison is the necessary "first step" in the comparative method (1957:44), and "once we have a well-established stock I go about comparing and reconstructing just like anyone else, as can be seen in my various contributions to historical linguistics" (1990, quoted in Ruhlen 1994:285). Reflecting the methodological empiricism also present in his typological work, he viewed facts as of greater weight than their interpretations, stating (1957:45):

[R]econstruction of an original sound system has the status of an explanatory theory to account for etymologies already strong on other grounds. Between the *vaida of Bopp and the *γwoidxe of Sturtevant lie more than a hundred years of the intensive development of Indo-European phonological reconstruction. What has remained constant has been the validity of the etymologic relationship among Sanskrit veda, Greek woida, Gothic wita, all meaning "I know", and many other unshakable etymologies both of root and of non-root morphemes recognized at the outset. And who will be bold enough to conjecture from what original the Indo-Europeanist one hundred years from now will derive these same forms?


The thesis of mass comparison, then, is that:

  • A group of languages is related when they show numerous resemblances in basic vocabulary, including pronouns, and morphemes, forming an interlocking pattern common to the group.
  • While mass comparison cannot identify every instance of borrowing, it can identify broad patterns of borrowing, which suffices in establishing genetic relationship.
  • The results achieved approach certainty.
  • It is unnecessary to establish sets of recurrent sound correspondences or reconstructed ancestral forms to identify genetic relationships. On the contrary, it is not possible to establish such correspondences or to reconstruct such forms until genetic relationships are identified.

Disputed legacy of the comparative method

The conflict over mass comparison can be seen as a dispute over the legacy of the comparative method, developed in the 19th century, primarily by Danish and German linguists, in the study of Indo-European languages.

Position of Greenberg's detractors

Since the development of comparative linguistics in the 19th century, a linguist who claims that two languages are related, whether or not there exists historical evidence, is expected to back up that claim by presenting general rules that describe the differences between their lexicons, morphologies, and grammars. The procedure is described in detail in the comparative method article.

For instance, one could demonstrate that Spanish is related to Italian by showing that many words of the former can be mapped to corresponding words of the latter by a relatively small set of replacement rules—such as the correspondence of initial es- and s-, final -os and -i, etc. Many similar correspondences exist between the grammars of the two languages. Since those systematic correspondences are extremely unlikely to be random coincidences, the most likely explanation by far is that the two languages have evolved from a single ancestral tongue (Latin, in this case).

All pre-historical language groupings that are widely accepted today—such as the Indo-European, Uralic, Algonquian, and Bantu families—have been established this way.

Response of Greenberg's defenders

The actual development of the comparative method was a more gradual process than Greenberg's detractors suppose. It has three decisive moments. The first was Rasmus Rask's observation in 1818 of a possible regular sound change in Germanic consonants. The second was Jacob Grimm's extension of this observation into a general principle (Grimm's law) in 1822. The third was Karl Verner's resolution of an irregularity in this sound change (Verner's law) in 1875. Only in 1861 did August Schleicher, for the first time, present systematic reconstructions of Indo-European proto-forms (Lehmann 1993:26). Schleicher, however, viewed these reconstructions as extremely tentative (1874:8). He never claimed that they proved the existence of the Indo-European family, which he accepted as a given from previous research—primarily that of Franz Bopp, his great predecessor in Indo-European studies.

Karl Brugmann, who succeeded Schleicher as the leading authority on Indo-European, and the other Neogrammarians of the late 19th century, distilled the work of these scholars into the famous (if often disputed) principle that "every sound change, insofar as it occurs automatically, takes place according to laws that admit of no exception" (Brugmann 1878).[1]

The Neogrammarians did not, however, regard regular sound correspondences or comparative reconstructions as relevant to the proof of genetic relationship between languages. In fact, they made almost no statements on how languages are to be classified (Greenberg 2005:158). The only Neogrammarian to deal with this question was Berthold Delbrück, Brugmann's collaborator on the Grundriß der vergleichenden Grammatik der indogermanischen Sprachen (Greenberg 2005:158-159, 288). According to Delbrück (1904:121-122, quoted in Greenberg 2005:159), Bopp had claimed to prove the existence of Indo-European in the following way:

The proof was produced by juxtaposing words and forms of similar meanings. When one considers that in these languages the formation of the inflectional forms of the verb, noun and pronoun agrees in essentials and likewise that an extraordinary number of inflected words agree in their lexical parts, the assumption of chance agreement must appear absurd.

Furthermore, Delbrück took the position later enunciated by Greenberg on the priority of etymologies to sound laws (1884:47, quoted in Greenberg 2005:288): "obvious etymologies are the material from which sound laws are drawn."

The opinion that sound correspondences or, in another version of the opinion, reconstruction of a proto-language are necessary to show relationship between languages thus dates from the 20th, not the 19th century, and was never a position of the Neogrammarians. Indo-European was recognized by scholars such as William Jones (1786) and Franz Bopp (1816) long before the development of the comparative method.

Furthermore, Indo-European was not the first language family to be recognized by students of language. Semitic had been recognized by European scholars in the 17th century, Finno-Ugric in the 18th. Dravidian was recognized in the mid-19th century by Robert Caldwell (1856), well before the publication of Schleicher's comparative reconstructions.

Finally, the supposition that all of the language families generally accepted by linguists today have been established by the comparative method is untrue. For example, although Eskimo–Aleut has long been accepted as a valid family, "Proto-Eskimo–Aleut has not yet been reconstructed" (Bomhard 2008:209). Other families were accepted for decades before comparative reconstructions of them were put forward, for example Afro-Asiatic and Sino-Tibetan. Many languages are generally accepted as belonging to a language family even though no comparative reconstruction exists, often because the languages are only attested in fragmentary form, such as the Anatolian language Lydian (Greenberg 2005:161). Conversely, detailed comparative reconstructions exist for some language families which nonetheless remain controversial, such as Altaic and Nostratic. A qualification is needed here, however: Nostratic is a proposed ancestor of several proto-languages, while Altaic is a "simple" proto-language whose daughter languages are widely accepted as typologically similar. Detractors of both proposals claim that the data assembled to demonstrate either family by the comparative method are scarce, erroneous, and insufficient. Establishing regular phonological correspondences requires that thousands of lexical lists be prepared and compared, and such lists are lacking for both proposed families. Other specific problems also affect the comparative lists of both proposals, such as the late attestation of the Altaic languages and, for Nostratic, the comparison of uncertain proto-forms such as those of Proto-Kartvelian.[2][3]

A continuation of earlier methods?

Greenberg claimed that he was at bottom merely continuing the simple but effective method of language classification that had resulted in the discovery of numerous language families prior to the elaboration of the comparative method (1955:1-2, 2005:75) and that had continued to do so thereafter, as in the classification of Hittite as Indo-European in 1917 (Greenberg 2005:160-161). This method consists essentially of two things: resemblances in basic vocabulary and resemblances in inflectional morphemes. If mass comparison differs from it in any obvious way, it would seem to be in the theorization of an approach that had previously been applied in a relatively ad hoc manner and in the following additions:

  • The explicit preference for basic vocabulary over cultural vocabulary.
  • The explicit emphasis on comparison of multiple languages rather than bilateral comparisons.
  • The very large number of languages simultaneously compared (up to several hundred).
  • The introduction of typologically based paths of sound change.

The positions of Greenberg and his critics therefore appear to present starkly contrasting alternatives:

  • According to Greenberg, the identification of sound correspondences and the reconstruction of protolanguages arise from genetic classification.
  • According to Greenberg's critics, genetic classification arises from the identification of sound correspondences or (others state) the reconstruction of protolanguages.

Time limits of the comparative method

Besides systematic changes, languages are also subject to random mutations (such as borrowings from other languages, irregular inflections, compounding, and abbreviation) that affect one word at a time, or small subsets of words. For example, Spanish perro (dog), which does not come from Latin, cannot be rule-mapped to its Italian equivalent cane (the Spanish word can is the Latin-derived equivalent but is much less used in everyday conversations, being reserved for more formal purposes). As those sporadic changes accumulate, they will increasingly obscure the systematic ones—just as enough dirt and scratches on a photograph will eventually make the face unrecognizable.

On this point, Greenberg and his critics agree, in contrast to the Moscow school, but they draw contrasting conclusions:

  • Greenberg's critics argue that the comparative method has an inherent limit of 6,000–10,000 years (depending on the author), and that beyond this too many irregularities of sound change have accumulated for the method to function. Since according to them the identification of regular sound correspondences is necessary to establish genetic relationship, they conclude that genetic relationships older than 10,000 years (or less) cannot be determined. In consequence, it is not possible to go much beyond those genetic classifications that have already been arrived at (e.g. Ringe 1992:1).
  • Greenberg argued that cognates often remain recognizable even when recurrent sound changes have been overlaid by idiosyncratic ones or interrupted by analogy, citing the cases of English brother (2002:4), which is easily recognizable as a cognate of German Bruder even though it violates Verner's law, and Latin quattuor (1957:45), easily recognizable as a reflex of Proto-Indo-European *kʷetwor even though the changes e > a and t > tt violate the usual sound changes from Proto-Indo-European to Latin. (In the case of brother, the sound changes are actually known, but intricate, and are only decipherable because the language is heavily documented from an early date. In the case of quattuor, the changes are genuinely irregular, and the form of the word can only be explained through means other than regular sound change, such as the operation of analogy.)
  • In contrast, the "Moscow school" of linguists, perhaps best known for its advocacy of the Nostratic hypothesis (though active in many other areas), has confidence in the traceability of regular sound changes at very great time depths, and believes that reconstructed proto-languages can be pyramided on top of each other so as to attain still earlier proto-languages, without violating the principles of the standard comparative method.

Toward a resolution of the conflict?

In spite of the apparently intractable nature of the conflict between Greenberg and his critics, a few linguists have begun to argue for its resolution. Edward Vajda, noted for his recent proposal of Dené–Yeniseian, attempts to stake out a position that is sympathetic both to Greenberg's approach and to that of his critics, such as Lyle Campbell and Johanna Nichols.[4] George Starostin, a member of the Moscow school, argues that Greenberg's work, while perhaps not going beyond inspection, presents interesting sets of forms that call for further scrutiny by comparative reconstruction, specifically with regard to the proposed Khoisan[5] and Amerind[6] families.

  1. ^ Lehmann, Winfred P. (2007-03-20). "A Reader in Nineteenth Century Historical Indo-European Linguistics: Preface to 'Morphological Investigations in the Sphere of the Indo-European Languages' I". Retrieved 2012-03-11. 
  2. ^ R.L. Trask, Historical Linguistics (1996), chapters 8 to 13, for an in-depth look at language comparison.
  3. ^ Claudia A. Ciancaglini, "How to prove genetic relationships among languages: the cases of Japanese and Korean", 2005, "La Sapienza" University, Rome.
  4. ^ [1] Archived May 18, 2008, at the Wayback Machine.
  5. ^
  6. ^


Works cited

  • Baxter, William H. and Alexis Manaster Ramer. 1999. "Beyond lumping and splitting: Probabilistic issues in historical linguistics."
  • Bomhard, Allan R. 2008. Reconstructing Proto-Nostratic: Comparative Phonology, Morphology, and Vocabulary, 2 volumes. Leiden: Brill.
  • Bopp, Franz. 1816. Über das Conjugationssystem der Sanskritsprache in Vergleichung mit jenem der griechischen, lateinischen, persischen und germanischen Sprache. Frankfurt-am-Main: Andreäischen Buchhandlung.
  • Brugmann, Karl. 1878. Preface to the first issue of Morphologische Untersuchungen auf dem Gebiete der indogermanischen Sprachen. Leipzig: S. Hirzel. (The preface is signed Hermann Osthoff and Karl Brugmann but was written by Brugmann alone.)
  • Brugmann, Karl and Berthold Delbrück. 1886-1893. Grundriß der vergleichenden Grammatik der indogermanischen Sprachen, 5 volumes (some multi-part, for a total of 8 volumes). Strassburg: Trübner.
  • Caldwell, Robert. 1856. A Comparative Grammar of the Dravidian or South-Indian Family of Languages. London: Harrison.
  • Campbell, Lyle (2001). "Beyond the Comparative Method". In Blake, Barry J.; Burridge, Kate; Taylor, Jo. Historical Linguistics 2001. 15th International Conference on Historical Linguistics, Melbourne, 13–17 August 2001. 
  • Campbell, Lyle (2004). Historical Linguistics: An Introduction (2d ed.). Cambridge, Massachusetts: MIT Press. 
  • Delbrück, Berthold. 1884. Einleitung in das Sprachstudium, 2d edition. Leipzig: Breitkopf und Härtel.
  • Delbrück, Berthold. 1904. Einleitung in das Studium der indogermanischen Sprachen, 4th and renamed edition of Einleitung in das Sprachstudium, 1880. Leipzig: Breitkopf und Härtel.
  • Georg, Stefan; Vovin, Alexander (2003). "From Mass Comparison to Mess Comparison: Greenberg's 'Eurasiatic' Theory". Diachronica (20:2): 331–362.
  • Greenberg, Joseph H. (1955). Studies in African Linguistic Classification. New Haven: Compass Publishing Company.  (Photo-offset reprint of eight articles published in the Southwestern Journal of Anthropology from 1949 to 1954, with minor corrections.)
  • Greenberg, Joseph H. (1957). Essays in Linguistics. Chicago: University of Chicago Press. 
  • Greenberg, Joseph H. 1960. "The general classification of Central and South American languages." In Selected Papers of the Fifth International Congress of Anthropological and Ethnological Sciences, 1956, edited by Anthony F.C. Wallace, 791-94. Philadelphia: University of Pennsylvania Press. (Reprinted in Greenberg 2005, 59-64.)
  • Greenberg, Joseph H. (1963). The Languages of Africa. Bloomington: Indiana University Press.  (Heavily revised version of Greenberg 1955.)(From the same publisher: second, revised edition, 1966; third edition, 1970. All three editions simultaneously published at The Hague by Mouton & Co.)
  • Greenberg, Joseph H. 1971. "The Indo-Pacific hypothesis." Current Trends in Linguistics, Volume 8: Linguistics in Oceania, edited by Thomas F. Sebeok, 807-871. The Hague: Mouton. (Reprinted in Greenberg 2005.)
  • Greenberg, Joseph H. (1987). Language in the Americas. Stanford: Stanford University Press. 
  • Greenberg, Joseph (1993). "Observations concerning Ringe's 'Calculating the factor of chance in language comparison'". Proceedings. 137.1. American Philosophical Society. pp. 79–90.
  • Greenberg, Joseph H. (2000). Indo-European and Its Closest Relatives: The Eurasiatic Language Family. Volume 1: Grammar. Stanford University Press. 
  • Greenberg, Joseph H. (2002). Indo-European and Its Closest Relatives: The Eurasiatic Language Family. Volume 2: Lexicon. Stanford University Press. 
  • Greenberg, Joseph H. (2005). Genetic Linguistics: Essays on Theory and Method, edited by William Croft. Oxford: Oxford University Press.
  • Kessler, Brett (2001). The Significance of Word Lists: Statistical Tests for Investigating Historical Connections Between Languages. Stanford, California: CSLI Publications. 
  • Laakso, Johanna. 2003. "Linguistic shadow-boxing." Review of The Uralic Language Family: Facts, Myths and Statistics by Angela Marcantonio.
  • Lehmann, Winfred P. 1993. Theoretical Bases of Indo-European Linguistics. London: Routledge.
  • Ringe, Donald. 1992. "On calculating the factor of chance in language comparison." American Philosophical Society, Transactions 82.1, 1-110.
  • Ringe, Donald. 1993. "A reply to Professor Greenberg." American Philosophical Society, Proceedings 137, 91-109.
  • Ringe, Donald A., Jr. 1995. "'Nostratic' and the factor of chance." Diachronica 12.1, 55-74.
  • Ringe, Donald A., Jr. 1996. "The mathematics of 'Amerind'." Diachronica 13, 135-54.
  • Ruhlen, Merritt (1987). A Guide to the World's Languages. Stanford: Stanford University Press. 
  • Ruhlen, Merritt. 1994. On the Origin of Languages: Studies in Linguistic Taxonomy. Stanford: Stanford University Press.
  • Schleicher, August. 1861-1862. Compendium der vergleichenden Grammatik der indogermanischen Sprachen. Kurzer Abriss der indogermanischen Ursprache, des Altindischen, Altiranischen, Altgriechischen, Altitalischen, Altkeltischen, Altslawischen, Litauischen und Altdeutschen, 2 volumes. Weimar: H. Boehlau.
  • Schleicher, August. 1874. A Compendium of the Comparative Grammar of the Indo-European, Sanskrit, Greek, and Latin Languages, translated from the third German edition by Herbert Bendall. London: Trübner and Co. (An abridgement of the German original.)

Further reading


  • Clifton, John. 2002. LINGUIST List 13.491: Review of Kessler 2001.
  • Hock, Hans Henrich and Brian D. Joseph. 1996. Language History, Language Change, and Language Relationship: An Introduction to Historical and Comparative Linguistics. Berlin: Mouton de Gruyter.
  • Kessler, Brett. 2003. Review of Time Depth in Historical Linguistics. Diachronica 20, 373-377.
  • Kessler, Brett and A. Lehtonen. 2006. "Multilateral comparison and significance testing of the Indo-Uralic question." In Phylogenetic Methods and the Prehistory of Languages, edited by Peter Foster and Colin Renfrew. McDonald Institute for Archaeological Research. (Also: Unofficial prepublication draft (2004).)
  • Matisoff, James. 1990. "On megalocomparison." Language 66, 109-20.
  • Poser, William J. and Lyle Campbell. 1992. "Indo-European Practice and Historical Methodology." Proceedings of the Eighteenth Annual Meeting of the Berkeley Linguistics Society, 214-236.


  • Greenberg, Joseph H. 1990. "The American Indian language controversy." Review of Archaeology 11, 5-14.
  • Newman, Paul. 1995. On Being Right: Greenberg's African Linguistic Classification and the Methodological Principles Which Underlie It. Bloomington: Institute for the Study of Nigerian Languages and Cultures, African Studies Program, Indiana University.
  • Ruhlen, Merritt. 1994. The Origin of Language: Tracing the Evolution of the Mother Tongue. New York: John Wiley and Sons.
