Why is no one talking about the sample size and sampling strategy of all the studies that have been done? Most of these studies are poorly designed, and few generalisations can be made from them.


This article probably could benefit from some reference to (or at least a link to) the Tocharian people [1]

It is borderline whether they can be included in the "South Asia" concept, but India seems to belong here, and the Tocharians probably had some contact there on their way to the east of China.

Even if they did or did not in the end contribute many genes they can stand as an example of the migrations from Europe to Asia that were taking place. And it's nice that we have physical relics of them, mummies and colour depictions.

Then there are the genetics of the Berber people and the Guanche people, who also seem to have had that funny looking yellow hair and seem to have been lodged outside their normal realm, but they are in Africa, hence by far outside the scope of this article.

Would take a shot at it myself if I were not so lazzzzy.

Removed the following:

It is certainly agreed upon by most anthropologists that most Indians, whether "Dravidian" or "Aryan", are still members of the Caucasoid race as per their skull-structure and genes.

Egon Freiherr von Eickstedt {{Von E. Eickstedt racial definitions}}

--- Skull-structure? And his theories are linked to some dodgy 'white race' page. It should be clear that the existence of a specific genetic marker, Haplogroup R1a1 (Y-DNA), does not necessarily relate to the appearance (skull-structure) or skin colour of a person, even though it is interesting to see how the gene distribution is consistent with the thus far linguistically theoretical Proto-Indo-Europeans. But note that the article's links are considered pseudo-science and that this is a highly controversial field. As of 1999, most physical anthropologists disagreed that there are any physically existing 'races' that differentiate the human race.

Max Nemo 13:49, 28 February 2007 (UTC)


Maybe this text (from Indo-Aryan migration article) can be merged here: There has been significant progress in genetic studies of the Indian caste populations in the last five years (as of 2006); this has implications for the Indo-Aryan migration/invasion theory. The studies could be broadly classified into pro-racial and pro-cultural.

The genetic studies are ongoing with conflicting results: those that support an infusion of genetic material {Bamshad et al.(2001), Spencer Wells, Journey of Man(2002), Basu et al. (2003), Cordaux et al.(2004)} (pro-racial) and those that don't {Kivisild et al.(2003), Sengupta et al.(2005), Sahoo et al.(2006)} (pro-cultural). A final picture will emerge after critical and comparative analyses of these studies.

The pro-racial construction studies maintain that there exist "Aryan" Y-lineages in the Indian (especially upper caste) population. In their studies, the age of these Y-lineages in India coincided with the putative Aryan invasion period.

The pro-cultural construction studies argue that the lineages identified with the "Aryan" are in fact more diverse in lower caste and tribal populations even though their frequency is lower. Their studies came to the conclusion that most Indian Y-chromosomes date back to the late Pleistocene.

Interestingly, the mtDNA haplogroup U2i, treated as western Eurasian in the pro-racial Bamshad et al. study, turned into Eastern Eurasian (mostly India-specific) in the pro-cultural Kivisild et al. study.

However, doubts still exist over autosomal admixture analysis. It is also suggested that Indian marital traditions may have an impact on the calculation of the age of Indian Y-haplogroups.

The genetic views on race differ in their classification of Dravidians. Most modern anthropologists, however, reject the genetic existence of race[1], like Richard Lewontin who states that "every human genome differs from every other", showing the impossibility of using genetics to define races. (Biology as Ideology, page 68).[2] According to population geneticist L.L. Cavalli-Sforza of Stanford, almost all Indians are genetically Caucasian,[3] but Lewontin rejects the label Caucasian.[4] Cavalli-Sforza found that Indians are about three times closer to West Europeans than to East Asians. Genetic anthropologist Stanley Marion Garn considers the entirety of the Indian Subcontinent to be a "race" genetically distinct from other populations.[5][6] Others, such as Lynn B. Jorde and Stephen P. Wooding, claim South Indians are genetic intermediaries between Europeans and East Asians.[7][8][9] Recent studies of the distribution of alleles on the Y chromosome[2][3], microsatellite DNA[4], and mitochondrial DNA[5] in India have cast overwhelmingly strong doubt upon any biological Dravidian "race" as distinct from non-Dravidians in the Indian subcontinent. This doubtfulness applies to both paternal and maternal descent; however, it does not preclude the possibility of distinctive South Indian ancestries associated with Dravidian languages.[10]

  1. ^ Bindon, Jim. University of Alabama. Department of Anthropology. August 23, 2006.
  2. ^ Lewontin, R.C. Biology as Ideology The Doctrine of DNA. Ontario: HarperPerennial, 1991.
  3. ^ Sailer, Steve. "Interesting India, Competitive China". xbiz. Retrieved 2006-09-12. 
  4. ^ Robert Jurmain, Lynn Kilgore, Wenda Trevathan, and Harry Nelson. Introduction to Physical Anthropology. 9th ed. (Canada: Thompson Learning, 2003)
  5. ^ Garn SM. Coon. On the Number of Races of Mankind. In Garn S, editor. Readings on race. Springfield C.C. Thomas.
  6. ^ Robert Jurmain, Lynn Kilgore, Wenda Trevathan, and Harry Nelson. Introduction to Physical Anthropology. 9th ed. (Canada: Thompson Learning, 2003)
  7. ^ Jorde, Lynn B.; Wooding, Stephen P. Nature Genetics. Department of Human Genetics. 2004.
  8. ^ Bamshad, M.J. et al. Human population genetic structure and inference of group membership. Am. J. Hum. Genet. 72, 578−589 (2003).
  9. ^ Rosenberg, N.A. et al. Genetic structure of human populations. Science 298, 2381−2385 (2002).
  10. ^ Sitalaximi, T "Microsatellite Diversity among Three Endogamous Tamil Populations Suggests Their Origin from a Separate Dravidian Genetic Pool" Human Biology - Volume 75, Number 5, October 2003, pp. 673-685

— Preceding unsigned comment added by Rayfield (talk • contribs) 20:49, 17 June 2006 (UTC)

Possible error

From the mtDNA section:

Virtually all modern Central Asian MtDNA M lineages seem to belong to the Eastern Eurasian (Mongolian) rather than the Indian subtypes of haplogroup M, which indicates that no large-scale migration from the present Turkic-speaking populations of Central Asia to India (and vice versa) could have occurred (Kivisild 2000).

This might be clarified. Migrations sometimes consisted mostly of males, especially migrations of conquest. Therefore the lack of Turkic mtDNA does not necessarily mean there wasn't any substantial migration of Turkic men into India. Hxseek (talk) 08:36, 20 January 2008 (UTC)

this is correct. we cannot draw our own conclusions. we need to phrase this along the lines of "Kivisild (2000) conclude ..." provided the conclusion is in the paper. Otherwise remove. dab (𒁳) 09:18, 20 January 2008 (UTC)

Bunch of stuff

Transferring incoherent material from Indo-Aryan migration#Physical anthropology. Some of it seems to be in the article already; whether any sense can be made of the rest remains to be seen.

However (Kivisild 2003a; Kivisild 2003b) have revealed that a high frequency of haplogroup 3 (R1a1) occurs in about half of the male population of Northwestern India and is also frequent in Western Bengal. These results, together with the fact that haplogroup 3 is much less frequent in Iran and Anatolia than it is in India, indicates that haplogroup 3 found among high caste Telugus did not necessarily originate from Eastern Europeans. The high diversity of haplogroup 3 and 9 in India suggests that these haplogroups may have originated in India (Kivisild 2003a). Studies of Indian scholars showed the R1a lineage forms around 35–45% among all the castes in North Indian population (Namita Mukherjee et al. 2001) and the high frequency of R1a1 present in the indigenous Chenchu and Badaga tribes of South India making the association with the Brahmin caste more vague. However, a model involving population flow from Southern Asia into Central Asia during Paleolithic interglacial periods with a subsequent R1a1-mediated Neolithic migration of Indo-European-speaking pastoralists back into Southern Asia would also be consistent with these data. A further study (Saha et al 2005) examined R1a1 in South Indian tribals and Dravidian population groups more closely, and questioned the concept of its Indo-Iranian origin. Most recently Sengupta et al. (2006) have confirmed R1a's diverse presence including even Indian tribal and lower castes (the so-called untouchables) and populations not part of the caste system. From the diversity and distinctiveness of microsatellite Y-STR variation they conclude that there must have been an independent R1a1 population in India dating back to a much earlier expansion than the Indo-Aryan migration. The pattern of clustering does not support the model that the primary source of the R1a1-M17 chromosomes in India was a single entry of Indo-European speaking pastoralists from Central Asia. 
However, the data are not necessarily inconsistent with more complicated demographic scenarios involving multiple entries in both Paleolithic and Neolithic periods and two-way population flows into and out of South Asia. In addition, there remains a difference in haplogroup prevalence between present day Indo-Aryan and Dravidian speakers[citation needed] and between upper and lower castes. The preponderant haplogroup amongst a proportion of Indo-Aryan populations and upper castes is R (both R1a and R2) and, amongst tribal groups and lower castes, to a higher extent Dravidian ones, it is haplogroup H (Y-DNA). The high prevalence of haplogroup R1a (around 50-60%) combined with the relatively low prevalence of haplogroup H in the northwestern portion of the subcontinent (northwestern India and present-day Pakistan) also suggests an affinity between this part of the subcontinent and the Central Asian steppes, perhaps brought about by longstanding two-way population flows. The absence of haplogroup R1b (Y-DNA) in Indo-Aryan and Dravidian populations which is found in all other Indo-European populations, in especially large proportions in western Europe, may suggest significant levels of native genetic base for the Indo-Aryan peoples compared to other Indo-European peoples. However, it must be noted that R1b is also not present in significant levels in Slavic and Central Asian populations.

Bafflegab at its finest. rudra (talk) 07:19, 22 January 2008 (UTC)

Haplogroup L

There is a whole paragraph that looks like self-research (and obviously lacks any sources):

Most of the pro-migration papers imply that R1a1 is the genetic marker that is representative of a migration, due to its high frequency in Eurasia. But an equally likely genetic marker is haplogroup L. This haplogroup is present in Greek, Turkish, Lebanese, Iranian, Central Asian, and Indian populations (and Europe, see Kivisild). This marker is found in locations where written sources record the presence of Indo-European languages and people: i.e. Greeks, Hittite, Mitanni, Iranians and Indians. Its peak frequency is found in Indo-Iranian populations. The 'Western Eurasian' components that are found in Indian mtDNA show a distribution closer to that found in the Southern Caucasus and Middle East than to that found in Eastern Europe. There is also the question of why one should assume only one Y haplogroup is representative of the Aryan gene pool. R1a1, R1b, J2, L and H - all of which are present in India and Central and West Asia - are all possibilities. However, haplogroup L has a very low level of diversity in the Punjab. This is suggestive of a recent migration or expansion event in the area, and is supported by the fact that the diversity of R1a1, J2 and haplogroup C is higher in the region. Haplogroup C is supposed to be the remnants of the "Out of Africa" migration of humans, but still retains a high level of diversity. Haplogroup L is also found in South India at relatively high frequencies and has been associated by some (along with J2) with the spread of farming and Dravidian languages. However haplogroup L1 is the dominant one in southern India, hence may represent an expansion event in the South (or elite dominance from the North).

Haplogroup L does exist in Western Asia (Iran, Turkey) [6][7], but it is quite rare. It doesn't seem to justify the hypothesis expressed above and in any case, it should not be self-research but the elaboration of an academic opinion.

H is not present (except maybe erratics and the well known case of the Roma people) outside of South Asia.

And I really don't understand why macrohaplogroup C (most common in NE Asia and Oceania) is even mentioned in that context, really.

Overall the paragraph looks a very good candidate for deletionist practices. --Sugaar (talk) 08:25, 10 March 2008 (UTC)

Rosenberg et al.

This article has the much-reproduced graph of populations vs. 7 genetic clusters (Image:Rosenberg2007.png) from the recent Rosenberg et al. paper but the article text has no discussion of the results of the paper.

It is based on 1200 polymorphisms across the whole genome, while all the genetic discussion currently in the article is based on very restricted, much smaller sets of DNA that are only inherited through matrilineage (mtDNA) or patrilineage (Y-DNA); or on single autosomal genes. Including it would modernize this article into the genomic era.

Here are some representative quotes:

  • "Populations from India, and groups from South Asia more generally, form a genetic cluster, so that individuals placed within this cluster are more genetically similar to each other than to individuals outside the cluster. However, the amount of genetic differentiation among Indian populations is relatively small. The authors conclude that genetic variation in India is distinctive with respect to the rest of the world, but that the level of genetic divergence is smaller in Indians than might be expected for such a geographically and linguistically diverse group."
  • "We found that allele frequencies in India showed detectably greater similarity to populations in Europe and the Middle East than to those in East Asia (Figure 4). This result is consistent with the fact that the cluster corresponding to India in Figure 2A subdivides a previously obtained cluster corresponding to Europe, the Middle East, and Central/South Asia [19]."
  • "The only population whose Fst values within India substantially overlapped those of either Europe/Middle East or East Asia was the Parsi population."
  • "Compared to groups that speak Indo-European languages, the groups in our study that speak Dravidian languages (Kannada, Malayalam, Tamil, and Telugu) did not show noticeably different patterns of pairwise Fst values, and in particular, they did not show a greater Fst from populations of Europe and the Middle East (Figure 5)."
  • "European allele frequencies are often reasonably predictive of frequencies in India, particularly for microsatellites (Figure 7A and 7C). The correlations are increased by using a linear combination of allele frequencies with ~2/3 contribution from Europe/Middle East and ~1/3 contribution from East Asia (Figure 8). At the same time, however, the separate cluster for India in population structure analysis indicates that allele frequencies in India are distinctive, so that predictions obtained based on European and East Asian groups cannot fully explain allele frequencies in Indian populations. This comment applies particularly for the indels (Figure 7B and 7D)"

--JWB (talk) 00:12, 8 August 2008 (UTC)
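
The linear-combination idea in the last quote can be illustrated with a minimal sketch. The ~2/3 and ~1/3 weights are taken from the quoted passage; the function name and the two input frequencies are hypothetical placeholders for illustration, not values from the paper, and the paper's actual fitting procedure is not reproduced here.

```python
from fractions import Fraction

# Weights as described in the quoted passage
# (~2/3 Europe/Middle East, ~1/3 East Asia).
W_EUROPE_ME = Fraction(2, 3)
W_EAST_ASIA = Fraction(1, 3)


def predict_frequency(freq_europe_me, freq_east_asia):
    """Predict an allele frequency as a weighted mix of two source frequencies."""
    return W_EUROPE_ME * freq_europe_me + W_EAST_ASIA * freq_east_asia


# Hypothetical frequencies for a single allele (illustration only).
example = predict_frequency(Fraction(3, 10), Fraction(6, 10))
print(float(example))
```

As the quote itself stresses, such a prediction cannot fully explain the observed frequencies in Indian populations; the sketch only shows what "linear combination" means here.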

Ah, I see, it's a quote from the Rosenberg paper. What a shame that people writing these sorts of papers themselves don't appear to understand exactly what they are measuring. No wonder others get confused. Alun (talk) 18:07, 12 January 2009 (UTC) — Preceding unsigned comment added by Dougweller (talkcontribs)


What's the notability of this subject? Most of the references seem to discuss India and not South Asia. The introduction is no introduction at all: it doesn't establish the notability of the subject, and there is no attempt to explain exactly what the article is about as a specific coherent subject. There are several articles called "Genetic history of ..." (e.g. Genetic history of Europe, Genetic history of the British Isles). I suggest that we rename this article Genetic history of India (or Genetic history of South Asia). I think it will hang together better then. I'll move to that name in a week or so unless there are any serious objections. Alun (talk) 08:02, 12 January 2009 (UTC)

Any sensible Y-chromosome studies?

My comment applies more to the research papers than this article. I know such comments are against Wikipedia policy, but I do pose one relevant request:

Please post link(s) to relevant research paper(s), if any.

Correlations between Indian caste and Y-haplogroup would be interesting to study. Too bad so many of the studies are flawed; in some cases the flaws appear to be deliberate and have political motivation. I'll mention the most obvious.

(1) Is it politically incorrect in India to distinguish Kshatriya and Vaisya castes? If not, why are these lumped together in so many DNA studies? And, if we must lump, why, in heaven's name, is Kshatriya lumped into "Middle Caste" in some studies and "Upper Caste" in others?

(2) Since some surveys combine Upper vs Middle studies with different definitions into single Upper vs Middle summaries, one might even guess that the Kshatriya confusion is deliberate! ... to enhance the appearance of "genetic homogeneity."

(3) As shown in my own crude summary (at R1a is common among both Brahmin and Sudra but not the "Middle Castes"; in my page I provide a reason for this. Yet some studies ignore this and use a flawed statistic to derive "genetic homogeneity" from the Sudra fact!

(4) Many papers describing Y-chromosome research by caste discard the haplogroup information altogether, leaving it unreported and giving only a closeness statistic! Need I explain why this is silly?

(5) mtDNA studies concluding genetic homogeneity of castes are too laughable for words. Castes are inherited from the father; mtDNA from mother.

What I would like to see is raw data summarizing three facts about test individuals: caste, region, haplogroup. Note that the region information has much importance since, as the caste system spread to new regions, different indigenous groups might be rewarded with higher status. (And please let's don't lump Kshatriya with either Brahmin or Vaisya depending on which lumping produces the politically desired statistic!)

For example, there is a very strong

    Kshatriya = R2 haplogroup

effect in Northeast India, but this would be invisible in a region-agnostic study like Sengupta's, even without Sengupta's unfortunate equation of Kshatriya and Vaisya.

Jamesdowallen (talk) 06:17, 17 June 2009 (UTC)
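
The kind of raw-data summary requested above (caste, region, haplogroup per individual) could be tabulated along these lines. This is only a sketch: the records are invented for illustration and are not real study data, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records (caste, region, haplogroup); invented for illustration only.
records = [
    ("Kshatriya", "Northeast", "R2"),
    ("Kshatriya", "Northeast", "R2"),
    ("Kshatriya", "Northwest", "R1a"),
    ("Kshatriya", "Northwest", "H"),
    ("Vaisya", "Northeast", "H"),
    ("Vaisya", "Northwest", "R1a"),
]


def haplogroup_counts(rows, keys):
    """Tally haplogroups per group; keys lists the field indexes to group by."""
    table = defaultdict(Counter)
    for row in rows:
        group = tuple(row[i] for i in keys)
        table[group][row[2]] += 1
    return table


# Grouping by caste and region keeps a regional pattern visible ...
by_caste_region = haplogroup_counts(records, keys=(0, 1))
# ... while lumping everything together flattens it.
lumped = haplogroup_counts(records, keys=())

print(dict(by_caste_region[("Kshatriya", "Northeast")]))
print(dict(lumped[()]))
```

With these made-up rows, the region-aware table shows one haplogroup dominating a single caste/region cell, while the lumped table shows an even spread, which is the effect a region-agnostic study would miss.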

Section on Autosomal markers

The section on Autosomal markers is just a collection of unrelated, non-continuous chunks of information. Please let me know if somebody can make sense of it and turn it into a palatable section. Otherwise it is up for deletion. nihar (talk) 05:34, 10 July 2009 (UTC)

Done the needfull nihar (talk) 11:27, 22 July 2009 (UTC)

Reich study

New study Indian ancestry revealed, likely to have important implications for genetic history of South Asia. Some of the material has social political implications related to the Indian caste system. Basically the study suggests an ancient mixing of two distinct populations, one population similar to central and western Eurasians, and another population related to the Andaman Islanders. Wapondaponda (talk) 18:06, 23 September 2009 (UTC)

"Sharma et al (2009)"

What article is this? If it's this, it's very hard to fathom how it even passed peer and editorial review. Much of it, of course, is very similar in style to so many other articles of its type that are flooding various "journals" these days, in that the authors make portentous claims without ever, even once, stating the actual null hypotheses tested. (Hey, anyone, including my sainted grandmother, can use fancy shmancy statistical software and charting packages. That is still no substitute for actually making sense or being relevant.) But this article hits a new low for sheer shoddiness. Look at Table 1. Half of the rows have percentages which amount to non-integral numbers of persons: e.g., of the 30 West Bengal Brahmins sampled, apparently 1.67 (5.56% or 1/18 of 30) were H1, 21.67 were R1a1 and 6.67 were R2. Never mind the typos (e.g. the first two columns for Gujarat Brahmins should be 3.13, not 3.33) And then, under "RESULTS AND DISCUSSION", without explanation, the sample size switches from 621 in the first section to 510 in the second (the apparent data for which is tucked away in the supplementary document, as "supplementary Table 1"), literally from one paragraph to the next. Why is this agglomeration of graduate student GIGO being cited at all? rudra (talk) 18:19, 16 January 2010 (UTC)
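
The arithmetic behind the non-integral counts can be checked mechanically. The sketch below (the function name is mine; the 5.56% figure and the sample size of 30 come from the comment above, and 72.22% is the figure quoted later in this thread) asks whether any whole number of persons out of n rounds to a reported percentage. It uses integer arithmetic on hundredths of a percent to avoid floating-point rounding problems.

```python
def matches_whole_count(pct_hundredths, n):
    """Return True if some whole count k out of n rounds to the given percentage.

    pct_hundredths is the percentage times 100 (5.56% -> 556).
    """
    for k in range(n + 1):
        # k out of n is 10000*k/n hundredths of a percent; accept it if it
        # lies within half a hundredth of the reported value.
        if 2 * abs(10000 * k - pct_hundredths * n) <= n:
            return True
    return False


print(matches_whole_count(556, 30))   # 5.56% of 30 persons
print(matches_whole_count(7222, 30))  # 72.22% of 30 persons
print(matches_whole_count(556, 18))   # 5.56% of 18 persons (1/18)
```

Under these assumptions the first two checks fail and the third succeeds, which is the point being made about the West Bengal row.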

Even better (or worse) they cite/quote Poliakov's Aryan Myth for allegedly philological and anthropological "models" and "evidence". Are these clowns for real? rudra (talk) 18:27, 16 January 2010 (UTC)
In a footnote: "Received 17 August 2008; revised 30 October 2008; accepted 6 November 2008; published online 9 January 2009". Revised!? Unbelievable. This journal is no longer being published through Springer. Does it have any credibility left? rudra (talk) 01:06, 17 January 2010 (UTC)
R1a(1) being indigenous to India is also attested by Underhill (2009). If you had cared to read the article in question you would know that the 621 referred to all subjects tested whereas the only time the number "510" appears is when the authors are discussing the Y-chromosomes of the Brahmins in question. Content on this page should not be discussed in isolation of content on the related article Haplogroup R1a (Y-DNA) (which presently favors a South Asian origin of R1a based on many more studies than this one). GSMR (talk) 03:15, 17 January 2010 (UTC)
Have you read the articles or are you parroting your favorite blog-warrior? A full-text search reveals that the number "510" occurs exactly once in the entirety of the paper, footnotes and all. It is not explained. According to "supplementary table 1" this 510 consists of, as far as can be made out, 256 "Brahmins" and 254 "Tribals". So it is not about the "Y-chromosomes of the Brahmins in question" as 510 also includes 254 Tribals. As for the 621 that conveniently and promptly dropped out of sight, they had 367 "Brahmins", 227 "Tribals" and 27 "Scheduled Castes". So, not only is the 510 not explained, the 256 and 254 parts of it are not explained either. Care to explain how the numbers match up? And while you're at it, try explaining how "72.22%" of 30 West Bengal Brahmins is a whole number. This paper is utter and total garbage from beginning to end. As for Underhill, don't claim. Produce a direct quote. (The paper isn't about what you would like it to be about) rudra (talk) 04:10, 17 January 2010 (UTC)
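
For anyone wanting to re-check the tallies in the comment above, a minimal sketch using only the subtotals quoted there (the variable names are mine):

```python
# Subtotals as quoted in the comment above.
sample_621 = {"Brahmins": 367, "Tribals": 227, "Scheduled Castes": 27}
sample_510 = {"Brahmins": 256, "Tribals": 254}

total_621 = sum(sample_621.values())
total_510 = sum(sample_510.values())

# Per-group change going from the 621 sample to the 510 sample.
differences = {
    group: sample_510.get(group, 0) - sample_621.get(group, 0)
    for group in sample_621
}

print(total_621, total_510)
print(differences)
```

The totals do add up to the stated sample sizes; what remains unexplained is how the per-group numbers of one sample relate to the other.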
As for your doubts about the Underhill article (and the 6 others which have been used by editors on Haplogroup R1a (Y-DNA) to establish South Asian origin of R1a as the most likely candidate...), why don't you ask there? It's kind of stupid if there are a bunch of articles on the same encyclopedia which contradict each other. Let's play along and assume that the Sharma paper should be disregarded. What about [9] and [10], should I replace the citation with these instead? Or is there a problem with these ones, too? Let's hear it, seeing as no one has complained about the Sahoo study in the past four years.

The Y-chromosomal data consistently suggest a largely South Asian origin for Indian caste communities and therefore argue against any major influx, from regions north and west of India, of people associated either with the development of agriculture or the spread of the Indo-Aryan language family.


The perennial concept of people, language, and agriculture arriving to India together through the northwest corridor does not hold up to close scrutiny. Recent claims for a linkage of haplogroups J2, L, R1a, and R2 with a contemporaneous origin for the majority of the Indian castes’ paternal lineages from outside the subcontinent are rejected ...

So! What's your complaint about this one? Oh, and here's the direct quote from the Underhill article as you requested:

Analysis of associated STR diversity profiles revealed that among the R1a1a*(xM458) chromosomes the highest diversity is observed among populations of the Indus Valley yielding coalescent times above 14 KYA (thousands of years ago), whereas the R1a1a* diversity declines toward Europe where its maximum diversity and coalescent times of 11.2 KYA are observed in Poland, Slovakia and Crete.

... it might bear some significance for assessing dispersal models that have been proposed to explain the spread of Indo-Aryan languages in South Asia as it would exclude any significant patrilineal gene flow from East Europe to Asia, at least since the mid-Holocene period.

As you can see, the most thorough study to date is consistent with Indian origin of R1a.
GSMR (talk) 04:30, 17 January 2010 (UTC)
I really didn't think your English was that poor. It's probably your POV-pushing getting in the way. (Hint: Underhill (2009) did not attest what you would like it to have had.) And since, in the course of 31 consecutive edits, you finally abandoned your effort to defend garbage, I've eliminated it again from the article. rudra (talk) 10:39, 17 January 2010 (UTC)

(unindent) Try the personal attacks and ad hominems as much as you'd like. I would warn you like I should, but I'm immune. But no, these studies are compatible with an Indian origin of R1a, for the last time, look at how these sources are used on Haplogroup R1a (Y-DNA). Yes, I retract the statement that the rows for that one study did not add up. How does that falsify the other sources? They are used on Haplogroup R1a (Y-DNA) as such (along with others; a grand total of 7 of them):

Three recent studies have argued that South Asia is a likely original point of dispersal,[40] while four other studies have concluded that the data is at least consistent with this scenario.[41] The most thorough study as of December 2009, including a collation of retested Y-DNA from previous studies, makes a South Asian R1a1a origin the strongest proposal amongst the various possibilities.[2]

You're the one trying to hint that there was patrilineal gene flow from Europe to Asia associated with Indo-European migrations (and the caste system), which each of these studies clearly falsifies. GSMR (talk) 14:28, 17 January 2010 (UTC)

Listen up and listen good. Do not threaten, like a punk. Either act, or shut up. rudra (talk) 21:23, 17 January 2010 (UTC)
You want to degenerate this discussion to a flame war, be my guest. In the mean time, WP:WQA#Personal attacks, incivility and accusations of POV-pushing by User:Rudrasharman. GSMR (talk) 21:26, 17 January 2010 (UTC)

(unindent) Quite the wikilawyer, aren't you? This section was supposed to be about the Sharma paper. There has been nothing so far to counter my contention that the paper is, to sum up, "garbage". Now, the paper had been cited to back up a contention about R1a1 in high-caste populations. My edit removed the citation because it was pathetically substandard; but preserved the contention it was being used to substantiate, because it's pretty clear that sound references can be found. Practically every paper on South Asian genetics these days says something on the subject, even if in passing. And now here you are, shadow boxing with demons of your own imagining and billowing smoke from fires of your own creation that have nothing to do with the topic (in a nutshell, that's why there were 31 edits in a row). It's okay, though. You're forgiven anyway, because your POV stance is too boring and old-hat to get worked up about. rudra (talk) 21:51, 17 January 2010 (UTC)

Look, that South Asia is presently believed to be the most likely (but not the only) possible candidate for the origin of haplogroup R1a is notable and relevant to this article about South Asian genetics, and the claim I inserted did not cite the Sharma 2009 article; I removed and replaced it with more valid citations which have been around for longer (except Underhill's which is very recent) and no one has found objectionable (yet). GSMR (talk) 22:32, 17 January 2010 (UTC)
It is hard to take you seriously when you barefacedly contradict the evidence of your own edits. Here is the edit where you reintroduced the Sharma paper (now sandwiched between Sengupta (2005) and Sahoo (2006)) in support of your copy-edit with new phrasing. When I removed it, you reverted my edit to bring it back again. Only later, after a detour to my talk page, did you finally take it out yourself. In this process, you also failed to note that I had not removed the new references you had brought in. In fact, it's pretty clear that you had convinced yourself, to the greater dudgeon of your righteous indignation, that I had taken them out. You are too overwhelmed by your POV-pushing to even read what you yourself write. rudra (talk) 21:12, 18 January 2010 (UTC)

Per WP:V, "The threshold for inclusion in Wikipedia is verifiability, not truth—that is, whether readers are able to check that material added to Wikipedia has already been published by a reliable source, not whether we think it is true. " Gerardw (talk) 21:42, 17 January 2010 (UTC)

Indeed. It takes only some arithmetic to verify that Sharma et al (2009) have served up gibberish. Whatever "truth" they might have had to offer was quite lost. (WP:RS is a necessary condition, not a sufficient one, for inclusion in WP. That, in fact, is how WP protects itself from the garbage that, believe it or not, does get published from time to time even in reputable sources.) rudra (talk) 03:11, 18 January 2010 (UTC)

RfC: R1a1

I have requested comment here in order to prevent an imminent edit-war regarding the South Asian origin of Haplogroup R1a (Y-DNA)... I believe that if many sources are used to state the same thing on the main article on the topic (see above), then it should be safe to restate it here. I reiterate:

Three recent studies have argued that South Asia is a likely original point of dispersal,[40] while four other studies have concluded that the data is at least consistent with this scenario.[41] The most thorough study as of December 2009, including a collation of retested Y-DNA from previous studies, makes a South Asian R1a1a origin the strongest proposal amongst the various possibilities.[2]

Now, if a South Asian origin of R1a is likely according to these sources then there is nothing wrong with restating that content here. Rudra's complaint is the Sharma (2009) article is inaccurate and badly written... but even if this source is excluded what is wrong with all the others?
PS: After carefully looking at Table 1, the non-integral percentages seem to be a simple miscount of N. Try using:
Punjab Brahmin = 56, Himachal Brahmin = 39, Uttar Pradesh (South) Kols = 27, Uttar Pradesh (South) Gonds = 37, Maharashtra Brahmins = 30, and West Bengal Brahmins = 36. (not that i endorse this source seeing as its obviously bad). GSMR (talk) 15:47, 17 January 2010 (UTC)

That would make the total sample size 637 instead of the correct total of 621 in Table 1. Please try again. rudra (talk) 22:28, 17 January 2010 (UTC)
". (not that i endorse this source seeing as its obviously bad). " <- Myself. GSMR (talk) 00:12, 18 January 2010 (UTC)
True. There were 8 (not 6) bad rows, apart from 6 obvious typos: JK.Guj/Q(xQ5), JK.Guj/R1a1, UP.Br/L, Punj.Br/C5, Guj.Br/C5 and Guj.Br/E. The factors of proportionality (i.e. the minimum number of which the sample size has to be a multiple) for 7 of these rows are, as you found, easy to determine: 28 for Punj.Br, 19 (not 39) for Him.Br, 27 for UP.Kol, 37 for UP.Gond, 16 for MP.Gond, 18 for WB.Br and 30 for MH.Br. You missed MP.Gond and, the weirdest by far, the MP.Saharia row, which is completely off kilter with a minimum size in the hundreds at least. This is all the more remarkable because this row is the stand-out "finding" of the paper (significantly high R1a* incidence in a tribal group): how could they screw up this row, of all rows? rudra (talk) 02:35, 18 January 2010 (UTC)

South Asian origin of R1a1 (M420) is not "clear", it is one of several hypotheses. Yes, the hypothesis is worth mentioning in this article, but within proper context and without disingenuous attempts to over-emphasize its strength by cherry-picking references.

Also, "origin" is not the same as the later migration movements the haplogroup can be used as a marker for. R1a1 originated some 20 kya (before the last glacial maximum), and by 11 kya (LGM) had spread throughout Eurasia. Its ultimate place of origin is irrelevant in any discussion of post-glacial population movements. --dab (𒁳) 20:56, 17 January 2010 (UTC)

This is reminiscent of the surreal episode with Cosmos416 over Rajaram's predictable misreading of Oppenheimer on M17. There really is nothing new in this POV-magnet. rudra (talk) 23:34, 17 January 2010 (UTC)
Again you are neglecting what I said (right below). I am not arguing about Indo-European migrations or the OIT. Don't put words in my mouth. R1a is not "the" Indo-European gene and I never claimed it was. Why are you trying to provoke me? I added content which mirrored that on the main article for the subject of this section and now you're inferring (wrongly, might I add) what I am trying to imply? Give it a rest, rudra. No one drew any conclusion about your pet Aryan Invasion Theory from the notable opinion that South Asia is the most likely candidate for the (pre-Glacial?) origin of R1a1. GSMR (talk) 00:16, 18 January 2010 (UTC)
Trust me, we've heard it all before, many times. Boringly too many times. The obsession with "proving" the South Asian origin of M17 (yclept R1a, then R1a1, and now R1a1a) is characteristic of a POV all too familiar on this and related pages -- as a matter of fact, this page exists only because the obsession was fouling a number of other pages on WP -- all because the POV imagines it slays their favorite bogeyman, the AIT. Outside this POV obsession, no one, not even my sainted grandmother, gives a flying f*ck where M17 originated, but Wikipedians are obliged to maintain accuracy in reporting the scientific literature. And that includes awareness of WP:OTHERCRAP. rudra (talk) 03:39, 18 January 2010 (UTC)
Yeah, you're right. I suppose that Underhill is a Hindutva freak too...
So according to you, my saying that M17 is most likely South Asian based on the most recent studies is tantamount to my saying the OIT is correct? I never made such a claim and that is not what I hoped to imply by carrying information from Haplogroup R1a (Y-DNA) to here. Don't accuse me of POV-pushing; all you are doing is baselessly putting words in my mouth and trying to infer what I am trying to imply. No, I have not tried to imply any of these things with my edits. Hey, I have not made any substantial edits to Haplogroup R1a (Y-DNA); are you going to tell me that the infobox giving "Asia, probably South Asia" as its most likely origin is there because POV-pushing 'indigenists' added it? GSMR (talk) 15:29, 18 January 2010 (UTC)
Who said anything about what this implies about post-glacial population movements on this article? As I recall it was rudra who tried to relate occurrence to caste rank, not I. GSMR (talk) 21:06, 17 January 2010 (UTC)
  • Comment. I think this recap is enough.
  1. My involvement started with this critique [11][12] posted to the Talk page, followed by this edit of the article. As pointed out elsewhere on this Talk page, I removed the reference, but retained, by suitably editing the material, the contention it was supposed to substantiate. (If this was a "POV edit", I would appreciate someone explaining exactly what POV was introduced and how -- IOW, in what way did my edit of the existing material misrepresent or distort the original contention that I claim it was my intention to retain.)
  2. Then, GSMR tried to defend the reference.[13][14][15][16][17][18]
  3. My response, in view of the editorial effort that went into this defence.
  4. At this point, GSMR apparently tried to advance the defence of the Sharma paper, among other somewhat more polemical concerns. After 31 consecutive edits ([19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49]) he finally gave up on defending the paper and settled for this on the Talk page.
  5. Having abandoned the defence on the Talk page, GSMR thereupon edited the article, reintroducing the Sharma paper (sandwiched between two new references), and, in my view, substantially modifying the contention in the text of the article. (The WP:OTHERCRAP form of the edit summary is noteworthy, and presumably the ultimate defence of the reintroduced Sharma paper.)
  6. GSMR then posted to my Talk page. Presumably, he had decided that attack was the best form of defence.
  7. I edited the article, removing the Sharma reference, and reworking the material into a statement of the facts, while retaining the other new references introduced by GSMR. (It's my hunch that GSMR failed to register this retention of the new references: e.g. if he looked at the diff, he probably neglected to scroll down and jumped to the conclusion that I had simply reverted his edit.) I also posted to the Talk page.
  8. After some more expostulations ([50][51][52][53]) on the Talk page, GSMR reverted my edit (that it was a true revert is shown by this diff), posted to my talk page [54][55], and then elected to remove the Sharma reference himself. In view of this sequence of actions and the final edit summary, it's pretty clear that GSMR thought he was restoring the other references that he thought I had edited away, presumably in service of some imagined POV.
  9. I reverted back to my version, which thus amounted to a copy-edit only, leaving the references intact (the byte count remained unchanged by coincidence). My annoyance with GSMR's antics shows in the edit summary, in my reply on my talk page, and in this response on this Talk page.
  10. GSMR has chosen to escalate this. Shrug.

I really have no interest in these old boring POV games. The underlying issues have been covered quite well here and here. Everything else, including the escalation, is the usual run of sound and fury in an effort to obfuscate (and, perhaps, to cover embarrassment.) rudra (talk) 08:34, 18 January 2010 (UTC)

The question boils down to, what do we do with the Sharma paper. If GSMR insists it must be mentioned, let us mention it. But it must be mentioned for what it is. GSMR uses it as a reference for the claim

"In South Asia high levels of R1a1a been observed in some populations, and it is a probable point of dispersal, a position defended by three studies"

This is wrong. First of all, how do we know there are exactly three studies "defending" this idea? Let it suffice to say that this has been suggested. Further, the Sharma paper does not make this point at all. Quite the opposite, it states, in the abstract,

Y-haplogroup R1a1* has a widespread distribution and high frequency across Eurasia, Central Asia and the Indian subcontinent, with scanty reports of its ancestral (R*, R1* and R1a*) and derived lineages (R1a1a, R1a1b and R1a1c).

It then goes on to postulate "autochthonous origin" for R1a1. Does GSMR understand the Sharma abstract? If so, why does he cite the Sharma paper in the context of R1a1a? If not, why does he edit-war over a topic of which he lacks a basic understanding?

The Sharma paper is just that, one primary research paper. We cannot base encyclopedic articles on heaps of research papers. But if we do cite such papers, we need to at least report their gist correctly.

I do not have an overview of current evidence on R1a1. The Sharma claim that there is "scanty evidence" of R1a1a in India, which together with the claim of the high frequency of R1a1* is crucial to the conclusion that R1a1 is "autochthonous", is contradicted by this image (Underhill et al. 2009). I don't know which view is correct. But the tell-tale "confusion" of R1a1 (pre-LGM) and R1a1a (post-LGM) unsurprisingly illustrates that this is about "indigenous Aryans" after all. I mean, who would edit-war over pre-glacial population distribution? But the subclades of R1a1a very much do have relevance to population movements in historical or close to historical periods.

My conclusion is pedestrian: GSMR is a pov-pusher with neither knowledge of nor interest in human genetics, who is taking the "wikiquette alert" avenue now because it has become clear that he has no case. This is entirely unremarkable, we see this sort of behaviour on a daily basis. --dab (𒁳) 09:57, 18 January 2010 (UTC)

The Sharma paper is gibberish, incoherent nonsense. WP is not under any obligation to mention every ostensibly "scientific" paper that happens to get published. As far as this article is concerned, we need to decide what haplogroups need to be covered and find references that appropriately summarize the state of knowledge. Random journal articles with poorly documented data, unexplained procedures, naive to simplistic to misinformed conjectures from disciplines outside the competence of geneticists and portentous rambling do not qualify. rudra (talk) 23:37, 18 January 2010 (UTC)

I agree the Sharma paper is flawed. I do not think it is "gibberish", it was ostensibly written by geneticists, and it does make a certain amount of sense even if the people pushing it cannot appreciate this. What I am saying is that in my experience it is easier to take the paper into account and discuss it competently rather than insisting that we "need not" cite it. We do not need to cite any research paper, and we still do cite lots of them, so it is difficult to argue why we absolutely shouldn't cite this one. The best solution would be to find a review which states it is flawed and then cite it alongside its review.

The Sharma paper is an attempt to link genetics to the caste system. I think this article should have a separate section on such attempts. The Sharma paper proposes to examine the origin of the caste system, but it does not deliver. In any case, afaics, all these attempts to unravel the origin of the caste system based on genetics so far just yield a zero result: the proposition that the caste system is endogamous over the centuries simply does not stand up. Genes do have a way of diffusing across class boundaries, and after a couple of generations, you have just a blurry gradient. Endogamy is a societal ideology saying how things "should" be. Genetics reveals that reality has never kept up with such rules very well. So, if Sharma et al. find that the Brahmins have "a tribal link" (i.e. show genetic relationship to members of scheduled tribes) this is extremely unremarkable if you assume by default that everybody will be related to everybody else in any society if you just wait for a couple of centuries. Effectively, the Sharma results used to say "autochthonous brahmins" could be interpreted with equal justification as "brahmin wives kept sleeping with handsome tribals". Which, if you think about it, is a mostly equivalent claim anyway, because the sons of these handsome tribal men were the brahmins of 20 years later and kept passing on their Y-chromosomes entirely legally.

After cutting the caste crap, we are left with the proposition that R1a1 originated in India. This would be an interesting result in its own right, but of course they had to sex it up with "autochthonous brahmins". Even if we agree that the caste part is nonsense added to increase publicity (and chances of funding), we can still consider the purely genetic claim on the origin of R1a1. --dab (𒁳) 10:44, 19 January 2010 (UTC)

I agree with dab I think. If people insist upon talking in terms of removing all reference to an article you just end up with a stalemate and/or edit warring. You need to try to find a balanced way to refer to the paper. It should not be so hard because this article does not need to go into every detail. BTW the Sharma article might be poorly worked, but it contains some important data. The poor working is worse than average for genetics articles, but there are some amazingly sloppy "discussion" sections in very respected articles in this field.--Andrew Lancaster (talk) 09:27, 20 January 2010 (UTC)

Dab, I don't think you appreciate the degree to which the Sharma paper is "flawed". It is not a matter of discounting some stray typos or whatnot. Their numbers are out and out nonsense, all the way down to the impressive 72.22% soundbite (which is 21.67 West Bengal Brahmins out of 30 -- my personal favorite statistic is that 21.43% of 49 Punjab Brahmins were J2). They have no data: the numbers they present are impossible. So, there is nothing of scientific value in their paper, and it is entirely possible that they had no clue of what they were doing, as well as being lucky in their choice of journal (considering the editorial expertise involved). Without any numbers, never mind credible ones, it's all bullshit. Jargon encrusted, jazzy sounding, peer-reviewed, reliably sourced bullshit, but bullshit nonetheless. There is no need to pay any attention to this paper when other papers exist to cover the needs of this article, and there is no need to explain why the Sharma paper is bullshit either. Are you really proposing to put an encyclopedic veneer on bullshit? I don't get it, sorry. rudra (talk) 11:05, 21 January 2010 (UTC)
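
For anyone who wants to redo the arithmetic on the two figures quoted above (72.22% of 30 and 21.43% of 49), here is a minimal sketch. It assumes the percentages were rounded to two decimals; the function names are made up for illustration and nothing here comes from the paper itself.

```python
def implied_count(pct, n):
    """Head count that a reported percentage implies for a sample of size n."""
    return pct * n / 100.0


def is_possible(pct, n, decimals=2):
    """True if some whole-number count out of n rounds to the reported percentage."""
    half_step = 0.5 * 10 ** (-decimals)   # rounding tolerance, e.g. 0.005 for 2 decimals
    k = round(implied_count(pct, n))      # nearest whole-number count
    return abs(100.0 * k / n - pct) <= half_step + 1e-9


def smallest_sample_size(pct, decimals=2, limit=1000):
    """Smallest n for which the reported percentage can come from a whole count."""
    for n in range(1, limit + 1):
        if is_possible(pct, n, decimals):
            return n
    return None  # nothing found up to `limit`


if __name__ == "__main__":
    for pct, n in [(72.22, 30), (21.43, 49)]:
        print(pct, n, round(implied_count(pct, n), 2), is_possible(pct, n),
              smallest_sample_size(pct))
```

Taken one cell at a time this gives 18 as the smallest sample size for 72.22%, which agrees with the factor given earlier for WB.Br. A single cell only gives a lower bound for its row; the row as a whole needs a sample size that works for every cell at once.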

Y-chromosomal problem should lend more caution and conservatism in interpretation

Sorry folks, I have been away from WP for some time. I want to make a point to dab: while your counsel here has generally been worthy, there are underlying issues that must be understood and that these molecular pundits within the primary literature are not discussing.

A recent study of the Y chromosome by Page indicates that the Y chromosome has a highly variable clock; the studies that use STRs or SNPs to date molecular evolution give times that vary within an order of magnitude.

As a prologue, Science (edited by Constance Holden, p. 397, 22 Jan 2010, vol. 327, "Surprise in Y") is reporting on work recently reported in the competing journal Nature. David Page of MIT sequenced the male-specific region of Y in chimpanzee and found a 30% difference between humans and chimpanzees, much larger than expected.

I have read this paper and they have found vast changes in the STR regions between chimps and humans; most of the 30% differences lie in these regions.

Nature 463, 536-539 (28 January 2010) | doi:10.1038/nature08700; Received 3 August 2009; Accepted 24 November 2009; Published online 13 January 2010. From the abstract: "The chimpanzee MSY contains twice as many massive palindromes as the human MSY, yet it has lost large fractions of the MSY protein-coding genes and gene families present in the last common ancestor. We suggest that the extraordinary divergence of the chimpanzee and human MSYs was driven by four synergistic factors: the prominent role of the MSY in sperm production, `genetic hitchhiking' effects in the absence of meiotic crossing over, frequent ectopic recombination within the MSY, and species differences in mating behaviour."

There are several other papers in the last 3 years that support this conclusion. One need only take note of the fact that the TMRCA was given as 25 ka in the 1990s, as 40 ka in the 2000s, and more recently as 75 and 110 ka. As the knowledge of the Y chromosome has grown, the estimates of the TMRCA are approximating those based on mtDNA, and population size estimates are increasing with constriction exit times earlier and earlier.

As these evolutionary forces (Sexual selection and reproductive) have evolved over time so has the rate of Y evolution, and in particular, the STR evolution. If one has the time I pointed out several critiques of Y-STR which include rate variance as a consequence of Number of STR, size of STR, and generation times in the population. I should point out that STR rate estimates make assumptions about populations (for example current political boundaries) and sampling bias that can markedly throw off rate estimates.

This is a major problem for wikipedia, because every sample statistic has a context and if the author does not spell out the context (such as confidence range or mitigating factors) we dumbly present results as facts. I should point out clearly that with the Y chromosome these temporal measures of genetic distances are not fact and all are subject to some change. In addition, there is sufficient information to suggest, based on the chimp-human comparison, that surprises in evolution do occur, and rates for some sites might change rapidly. PB666 yap 07:01, 13 March 2010 (UTC)

With regard to Sharma et al. 2009. There are two principal deficiencies in the paper, the style of the manuscript and its readability (or lack thereof). One could argue that the results were premature since they used an obsolete methodology. Clearly the R1* subcategory is not defined adequately and with M420 could have been R1a; there are other issues. Basically it is a molegen typing paper, as many papers are (I have written a few so . . . . I try to use the highest resolution markers), unlike Underhill, which sets out both to apply typing for previously typed markers and to apply several new markers, expanding the boundaries of Science in 2 dimensions. Rudra, whenever a new technology is introduced there is always a period of overlap between new and old technologies; one cannot be too critical of the science, other than noting these deficiencies and raising a call for a second improved publication. I see papers with HLA that are still doing low resolution typing 10 years after commercial high resolution typing kits have hit the market. In that respect, Rudra, you need to back off.

With regard to the frequencies, I was attacked and harassed for bringing up this issue on the R1a page by Andrew Lancaster. To a non-statistician, numbers appear to be concrete entities. Population frequencies are anything but concrete. The confidence range, in relative proportion, tends to decrease at a rate k/(n^0.5) [a very rough gauge], where k is a function of N, the number of observed events of a type, and n is the sample size. When relative frequencies, N/n, are low, such as N = 0 to 10, or the population sample (n) is small, the relative percent variation is great. The relative variation of a sample of N = 1 is >20 fold. IOW, I don't see a reason we need to present in WP frequency percentages beyond 10, 20, 30, . . . .%, unless the sample size (n) is such that it deserves such an estimate. It is certainly a violation of significant decimal place rules when one presents a 1 in 27 as 3.70% since the 96% CI exists from 0.18% to well over 4%. In some studies of the same population 0 events in 30 have been observed and 8 events in 50 have been observed in different samples for the same haplotype. If you can get the blessing of WP, I would strongly welcome rounding these sample frequencies to the first digit. PB666 yap 07:01, 13 March 2010 (UTC)

I'm sorry, but I know the difference between science and garbage. The Sharma 2009 paper is garbage. It has nothing to do with "technology" or "methodology", either old or new. It has to do with competence. Their reported numbers -- in other words, their data -- are impossible and nonsensical and therefore utterly devoid of any scientific value. To repeat, Sharma et al have no data. Therefore they have no "results". It's all garbage, period. GIGO is an elementary principle. Wikipedia is not the place to manufacture excuses for manifest incompetence, nor the place to find ways of serving up garbage as if it made some sort of scientific sense. rudra (talk) 12:40, 13 March 2010 (UTC)
Part of the paper is garbage; it is, however, no worse than a wide variety of papers cited in the R1a article. This is the basic problem. I made a great effort last year to extract these references from the article and promote more recent reports. Despite the well documented flaws in Sharma et al. 2009 it is still better than a large number of papers on the subject from earlier in the decade that have now reached mythic values.
Let me go through the issues one by one. The error in the column "Gujarat Brahmins" is an error of sloppiness. However its effect is not profound, or to put it otherwise, not as profound as the way the data is presented. The inverted binomial probability range at 95% confidence suggests the range about 2/64 occurrences is 0.4 to 10.8% (the error ranges should have been presented); the 0.2% difference between what they stated and what the values actually were is trivial, it is sloppy. The row in the table of "Maharashtra Brahmins" would be correct if n=33 and there is a column in the row that is zero; it is possible there was an entry there of 1 for R1a*. What may have happened here is that an Excel table was embedded into a Microsoft Word table; someone changed the Excel table, not realizing that they also changed the final version of the Word document. It is always better to cut and paste values only once the final status of data is complete, otherwise a misplaced click of the mouse will result in the loss of data.
The major problem IMHO, which is true for a great many (almost all) Y-DNA papers, is that they fail to state the confidence range. The only valid relative frequency is when the entire population is sampled, and it changes with migration, births and deaths. Interpreting [absolute] frequency requires the calculation of Single selection haplotype (or allele) probability. THIS IS NOT a relative frequency; it is not even a discrete point, it is a probability distribution best represented either by the 95% CI (roughly 2 sigma) range or the 1 sigma (68.3%) CI. When you see the frequency data presented as 95% CI or +/- ranges, you can deal with higher versus lower quality presentations.
PB666, your remark that "With regard to the frequencies, I was attacked and harassed for bringing up this issue on the R1a page by Andrew Lancaster" is sure surprising to me. Can you give a diff for that?--Andrew Lancaster (talk) 10:15, 14 March 2010 (UTC)

For example, within the column of "Gujarat Brahmins" a frequency of 0.00% = 0/64, SSHP = 0 to 5.6%; 1.56% = 1 in 64, SSHP at 95% = 0.039% to 8.4%; 3.13% = 2 in 64, SSHP at 95% = 0.38% to 10.8%; and so on . . . . The problem is that if you apply a criterion that a 0.2% misrepresentation of the data is a critical failure, you end up condemning 95% of the literature, since the presentation of relative frequency in percent for values that fall below 10 occurrences within a sample set is a gross misrepresentation of the valid SSHP range. If you want to go through the R1a page and remove all papers that do not meet that criterion there would not be anything left of the R1a page. It is sufficient for me to state that Y-DNA research is grossly behind other molecular genetic studies, such as HLA, where SSHP and SSAP have been given in many papers for over 2 decades. Because of the popular nature of Y-DNA research it is far behind the rest of the field in quality control and this is a major reason for the constant battling. I have been far more tolerant of these 'results' on the R1a page; despite this I have been accused of harassment, harshness and brutality. Many folks do not want to see the error in the methodological approaches and they want to block the fact that the controversy exists on wikipedia. This is the debate you have entered yourself in, and the way to avoid the information suppression that will go forth is to make compromises with others. Andrew was latrine diving with the quality of data he included, but if we use the strictest statistical standards, published R1a works would be limited to material from only a couple of papers. You cannot place the dividing line at a few errors.
I strongly recommend that you go through the entire paper and make a list of the inadequacies; when you are done, write a letter to the editor of the journal. If they don't give you the attention then send me the info, and I will edit for content and send it as a formal critique of the paper. PB666 yap 19:57, 13 March 2010 (UTC)
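
The ranges quoted above for 0, 1 and 2 occurrences in 64 can be approximated with an exact binomial (Clopper-Pearson) interval. The comment does not say which method was used, so that choice is an assumption here, as are the function names. This is a rough sketch using only the standard library, not a vetted statistics routine.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iters=100):
    """Bisection on [0, 1] for f(p) = target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid   # f is still too large, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, conf=0.95):
    """Two-sided exact interval for a proportion, given k occurrences in n."""
    alpha = 1 - conf
    # Lower bound: the p at which seeing k or more would have probability alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which seeing k or fewer would have probability alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    for k in (0, 1, 2):
        lo, hi = clopper_pearson(k, 64)
        print(f"{k}/64: {100 * lo:.3f}% to {100 * hi:.1f}%")
```

For 0 occurrences the upper bound has a closed form, 1 - 0.025^(1/64), which is about 5.6% and agrees with the figure quoted for 0/64.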

PB666, I think this discussion is not assisted by denying that the table has unusual problems. Rudra does deserve credit for pointing to that. The discussion should be about the practicalities of what to do in such a situation. I think removal of exact numbers in the main R1a article is acceptable, but I suggest that removing all reference to ever having seen any R1a1* is probably going too far. Can you compress your thoughts into a short comment focused upon practical suggestions?--Andrew Lancaster (talk) 10:15, 14 March 2010 (UTC)


The picture showing Haplogroup F moving into India is wrong. The article on Haplogroup F says it originates in India while the picture shows otherwise. —Preceding unsigned comment added by (talk) 02:17, 14 September 2010 (UTC)