Genealogical DNA test

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Severa (talk | contribs) at 10:32, 17 November 2007 (fix dab). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A genealogical DNA test examines the nucleotides at specific locations on a person's DNA for genetic genealogy purposes. The test results are meant to have no informative medical value and do not determine specific genetic diseases or disorders (see possible exceptions in Medical information below); they are intended only to give genealogical information. Genealogical DNA tests generally involve comparing the results of living individuals as opposed to obtaining samples from deceased people.

Procedure

The general procedure for taking a genealogical DNA test involves taking a painless cheek scraping (also known as a buccal swab) at home and mailing the sample to a genetic genealogy laboratory for testing. Some laboratories use mouthwash or chewing gum instead of cheek swabs. Some laboratories offer to store DNA samples for ease of future testing. All United States laboratories will destroy the DNA sample upon request by the customer, guaranteeing that a sample is not available for further analysis.

Types of tests

The most popular ancestry tests are Y chromosome (Y-DNA) testing and mitochondrial DNA (mtDNA) testing. Other tests attempt to determine a researcher's comprehensive genetic history and/or ethnic origins.

Y chromosome (Y-DNA) testing

A man's paternal ancestry can be traced using the DNA on his Y chromosome (Y-DNA) through Y-STR testing. This is useful because the Y chromosome, like many European surnames, passes from father to son, and can therefore be used to help study surnames. A woman who wishes to determine her paternal ancestry can ask her father, brother, paternal uncle, paternal grandfather, or a cousin who shares the same paternal lineage to take a test for her (i.e. any male family member who has the same surname as her father).

What gets tested

Y-DNA testing involves looking at segments of DNA on the Y chromosome (found only in males) where sequences of nucleotides repeat, known as short tandem repeats (STRs). The segments which are examined are referred to as genetic markers and occur in what is considered "junk" DNA.

STR markers

Each STR marker is designated by a DYS number (DNA Y-chromosome Segment number). A marker's value is the number of times the segment repeats; in the examples below in Haplotype, the values range from 8 to 31.
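As a rough illustration of what a marker value means, the following Python sketch counts the longest uninterrupted run of a repeat motif in a DNA string. The motif `GATA`, the flanking bases and the function name `count_tandem_repeats` are hypothetical choices made for this example; they are not taken from any laboratory's procedure.

```python
import re

def count_tandem_repeats(sequence, motif):
    """Return the length, in repeat units, of the longest unbroken run of `motif`."""
    # Find every stretch made only of back-to-back copies of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical fragment: flanking bases around 11 copies of the motif GATA.
fragment = "CCTG" + "GATA" * 11 + "TTCA"
print(count_tandem_repeats(fragment, "GATA"))  # 11
```

In this toy fragment the marker value would be reported as 11, the same kind of number that appears in a haplotype.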

SNP markers

SNPs are changes to a single nucleotide in a DNA sequence. The mutation rate for a SNP is extremely low, which makes SNPs ideal for marking the history of the human genetic tree. SNPs are named with a letter code and a number. The letter indicates the lab or research team that discovered the SNP. The number indicates the order in which it was discovered. For example, M173 is the 173rd SNP documented by the group that uses the letter M.

Understanding test results

Y-DNA tests generally examine 10 to 67 STR markers on the Y chromosome, although more than 100 markers are available. STR test results provide the personal haplotype. SNP results indicate the haplogroup.

Haplotype

A Y-DNA haplotype is the set of numbered results of a genealogical Y-DNA test. Each allele value has a distinctive frequency within a population. For example, at DYS455, the results will show 8, 9, 10, 11 or 12 repeats, with 11 being most common[1]. In tests with a high number of markers, the allele frequencies provide a signature for a surname lineage.

Kit    Surname          Haplo  393   390   19    391   385a  385b  426   388   439   389-1 392   389-2 458   459a  459b  455   454   447   437   448   449   464a  464b  464c  464d
11111  Rumpelstiltskin  Q      12    23    13    10    16    17    12    12    13    14    14    31    18    8     9     11    11    27    13    19    28    14    14    15    15

The test results are then compared to another project member's results to determine the time frame in which the two people shared a most recent common ancestor (MRCA). If the two tests match on 37 markers, there is a 50% probability that the MRCA was fewer than 5 generations ago and a 90% probability that the MRCA was fewer than 17 generations ago.

Kit    Surname          Haplo  393   390   19    391   385a  385b  426   388   439   389-1 392   389-2 458   459a  459b  455   454   447   437   448   449   464a  464b  464c  464d
11111  Rumpelstiltskin  Q      12    23    13    10    16    17    12    12    13    14    14    31    18    8     9     11    11    27    13    19    28    14    14    15    15
11178  Rumpelstiltskin  Q      12    23    13    10    16    17    12    12    13    14    14    31    18    8     9     11    11    27    13    19    28    14    14    15    15

It is important to check the number of markers that will be tested before choosing a test. For example, the Genographic Project only looks at 12 markers, while most laboratories and surname projects recommend testing at least 25. The more markers that are tested, the more discriminating and powerful the results will be. A 12-marker STR test is usually not discriminating enough to provide conclusive results for a common surname.
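The effect of testing more markers can be sketched in Python by listing the markers on which two haplotypes differ. Kit 11111 is taken from the example above; the second haplotype and the function name `mismatches` are hypothetical, invented only to show two results that agree on the first 12 markers but not on all 25. This simple mismatch count is an illustration, not the method any particular laboratory uses to estimate the MRCA.

```python
# Marker order as in the haplotype examples above.
MARKERS = ["393", "390", "19", "391", "385a", "385b", "426", "388", "439",
           "389-1", "392", "389-2", "458", "459a", "459b", "455", "454",
           "447", "437", "448", "449", "464a", "464b", "464c", "464d"]

# Kit 11111 from the example above.
KIT_11111 = [12, 23, 13, 10, 16, 17, 12, 12, 13, 14, 14, 31, 18, 8, 9,
             11, 11, 27, 13, 19, 28, 14, 14, 15, 15]

# Hypothetical result (not from the article): identical on the first
# 12 markers, but differing at DYS458 and DYS449.
HYPOTHETICAL = list(KIT_11111)
HYPOTHETICAL[MARKERS.index("458")] = 17
HYPOTHETICAL[MARKERS.index("449")] = 29

def mismatches(a, b, n_markers):
    """Return the names of the first `n_markers` markers whose values differ."""
    return [name for name, x, y in zip(MARKERS[:n_markers], a, b) if x != y]

print(mismatches(KIT_11111, HYPOTHETICAL, 12))  # []
print(mismatches(KIT_11111, HYPOTHETICAL, 25))  # ['458', '449']
```

On 12 markers the two results look like an exact match; the differences only become visible when all 25 markers are compared.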

STR results may also indicate a likely haplogroup, though this can only be confirmed by specifically testing for that haplogroup's single nucleotide polymorphisms (SNPs).

Haplogroup

Haplogroups are large groups of haplotypes that can be used to define genetic populations and are often geographically oriented.

Y-DNA haplogroups are determined by SNP tests. SNPs are locations on the DNA where one nucleotide has "mutated" or "switched" to a different nucleotide. The nucleotide switch must occur in at least 1% of the population to be considered a useful SNP. If it occurs in less than 1% of the population, it is considered a personal SNP.

Haplogroup prediction

A person's haplogroup can often be inferred from their haplotype, but can only be proven with a Y-chromosome SNP test (Y-SNP test). In addition, some companies offer sub-clade tests, such as for Haplogroup G. For example, Haplogroup G has a known modal haplotype:

DYS markers                     385a  385b  388   389i  389ii 390   391   392   393   394   426   437   439   447   448   449   454   455   458   459a  459b  464a  464b  464c  464d
Haplogroup G: Modal STR values  14    14    12    12    29    22    10    11    14    15    11    16    11    23    21    31    11    11    16    9     9     12    13    13    14

Few haplotypes will exactly match the modal values for Haplogroup G. One can consult an allele frequency table to determine the likelihood of remaining in Haplogroup G based on the variations observed. Additional predictions include:

  • If DYS426 is 12 and DYS392 is 11, one is probably a member of haplogroup R1a1.
  • If DYS426 is 12 and DYS392 is not 11, one is probably a member of haplogroup R1b.
  • If DYS426 is 11, one is probably a member of haplogroup G, I, or J.
  • If DYS426 is 11 and DYS388 is 12, one is probably a member of haplogroup N3 or E3b.
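The rules of thumb above can be written as a small Python function. This is only a sketch: the function name `candidate_haplogroups` is hypothetical, and because the last two rules overlap, the sketch assumes that each rule is applied independently and that all matching candidates are reported rather than a single answer.

```python
def candidate_haplogroups(dys426, dys392, dys388):
    """Apply each rule of thumb independently and collect every match."""
    candidates = []
    if dys426 == 12 and dys392 == 11:
        candidates.append("R1a1")
    if dys426 == 12 and dys392 != 11:
        candidates.append("R1b")
    if dys426 == 11:
        candidates.extend(["G", "I", "J"])
    if dys426 == 11 and dys388 == 12:
        candidates.extend(["N3", "E3b"])
    return candidates

# Modal values for Haplogroup G given above: DYS426 = 11, DYS392 = 11, DYS388 = 12.
print(candidate_haplogroups(dys426=11, dys392=11, dys388=12))
# ['G', 'I', 'J', 'N3', 'E3b']
```

The modal haplotype for Haplogroup G matches both of the last two rules, which shows why such rules only narrow the candidates and why an SNP test is needed for confirmation.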

A Bayes classifier to predict the haplogroup probabilities for an observed haplotype is available on the web: Whit Athey Haplogroup Predictor.

Mitochondrial DNA (mtDNA) testing

Map of human migration out of Africa, according to Mitochondrial DNA. The numbers represent thousands of years before present time. The blue line represents the area covered in ice or tundra during the last great ice age. The North Pole is at the center. Africa, harboring the start of the migration, is at the top left and South America is at the far right.

A person's maternal ancestry can be traced using his or her mitochondrial DNA (mtDNA). The DNA in human mitochondria is passed down from the mother unchanged. One exception, which was linked to infertility, has been documented. Additionally, some people cite paternal mtDNA transmission as invalidating mtDNA testing[2], but this is not considered problematic in scholarly population genetics studies or genetic genealogy.

What gets tested

By current convention, mtDNA is divided into three regions: the coding region and two Hyper Variable Regions (HVR1 and HVR2). All test results are compared to the mtDNA of a European in Haplogroup H2a2. This sample is known as the Cambridge Reference Sequence (CRS). The test returns a list of single nucleotide polymorphisms (SNPs). Any "mutations" or "transitions" that are found are simply differences from the CRS.

The test results are compared to another person's results to determine the time frame in which the two people shared a most recent common ancestor (MRCA). The two most common mtDNA tests are a sequence of HVR1 and a sequence of both HVR1 and HVR2. Some people are now choosing to have a full sequence performed. This is still somewhat controversial as it may reveal medical information.

Understanding test results

The most basic mtDNA tests sequence Hyper Variable Region 1 (HVR1). HVR1 nucleotides are numbered 16001-16569. Some test reports omit the 16 prefix from HVR1 results, i.e. they show 519C rather than 16519C.

Region                HVR1                           HVR2
Differences from CRS  111T,223T,259T,290T,319A,362C  Not Tested

More extensive tests will also sequence Hyper Variable Region 2 (HVR2). HVR2 nucleotides are numbered 073-577.

Region                HVR1                           HVR2
Differences from CRS  111T,223T,259T,290T,319A,362C  073G,146C,153G
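The shortened HVR1 notation can be expanded mechanically. The Python sketch below parses entries such as 111T into a full position and a base, assuming the numbering range quoted above (16001-16569) and handling only the simple position-plus-base form shown in the examples. The function name `parse_hvr1_difference` is hypothetical, and the sketch is not meant for HVR2 entries.

```python
import re

HVR1_START, HVR1_END = 16001, 16569  # numbering range quoted in the text above

def parse_hvr1_difference(difference):
    """Parse an HVR1 difference such as '111T' or '16111T' into (position, base)."""
    match = re.fullmatch(r"(\d+)([ACGT])", difference.strip())
    if match is None:
        raise ValueError(f"unrecognised difference: {difference!r}")
    position = int(match.group(1))
    if position < 16000:
        # Assume the report dropped the leading "16" (e.g. 519C for 16519C).
        position += 16000
    if not HVR1_START <= position <= HVR1_END:
        raise ValueError(f"position outside HVR1: {difference!r}")
    return position, match.group(2)

report = "111T,223T,259T,290T,319A,362C"
print([parse_hvr1_difference(item) for item in report.split(",")])
# [(16111, 'T'), (16223, 'T'), (16259, 'T'), (16290, 'T'), (16319, 'A'), (16362, 'C')]
```

Normalising both reports to full positions before comparing them avoids treating 519C and 16519C as two different results.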

Haplogroup

Most results include a prediction of mtDNA Haplogroup.

Phylogenetic tree of human mitochondrial DNA (mtDNA) haplogroups

  Mitochondrial Eve (L)    
L0 L1–6  
L1 L2   L3     L4 L5 L6
M N  
CZ D E G Q   O A S R   I W X Y
C Z B F R0   pre-JT   P   U
HV JT K
H V J T

For a person who belongs to a Haplogroup that is distantly related to the CRS, the prediction may be sufficient. Some companies test for specific mutations in the coding region. For large Haplogroups, such as mtDNA Haplogroup H, an extended test is offered to assign a sub-clade.

Ethnic tests

Autosomal tests that test the recombining chromosomes are available. These attempt to measure an individual's mixed ethnic heritage. The tests' validity and reliability have been called into question but they continue to be popular.

Biogeographical ancestry

Autosomal DNA testing purports to determine the "genetic percentage" of certain ethnicities in a person. These tests examine SNPs, which are locations on the DNA where one nucleotide has "mutated" or "switched" to a different nucleotide. They are designed to tell what percentage Native American, European, East Asian, and African a person is. These tests are controversial (their validity has not been independently confirmed), and the results are often disputed.

One company[3] describes these four ethnic groups as follows:

  • Native American: Populations that migrated from Asia to inhabit North, South and Central America.
  • European: European, Middle Eastern and South Asian populations from the Indian subcontinent, including India, Pakistan and Sri Lanka.
  • East Asian: Japanese, Chinese, Mongolian, Korean, Southeast Asian and Pacific Islander populations, including populations native to the Philippines.
  • African: Populations from Sub-Saharan Africa such as Nigeria and Congo region.

Based on customer feedback, the company in June 2007 introduced a new version of its EURO DNA test with a more limited range of countries that promises to have more meaningful clues to one's European ancestry. Both tests -- the four-part ethnicity estimate and EURO DNA test -- use a high number of so-called Ancestry Informative Markers whose genetic distance between populations reflects the populations' geographic distance from each other. The location and variation of these AIMs are proprietary to the company, which is publicly held, and have never been published.

In 2006, another company[4] developed an autosomal DNA ancestry-tracing product that combined the traditional CODIS markers used by law enforcement officers and the judicial system with OmniPop, a population database developed by San Diego detective Brian Burritt. Customers received matches to their profile's frequency of occurrence in world populations as well as a breakout for European ancestry based on the European Network of Forensic Science Institutes, or ENFSI [3]. As a public service, the company has supported the expansion of OmniPop, which currently encompasses over 300 populations, double that of its first release. The ENFSI calculator uses data from 24 European populations (5700 profiles). The two databases must be searched separately, however, because they are based on two different sets of markers. The company sells its product as the DNA Fingerprint Test. The 16 markers incorporated in its results are: D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, VWA, TPOX, D18S51, D5S818, and FGA.

The theory behind using a forensic profile for ancestry tracing is that the alleles' respective frequency of occurrence develops over generations with equal input from the two parents, since for each location one value is inherited from the mother and one from the father. The profile thus serves as a window into a person's total ancestral composition. The configuration of scores reflects inherited changes from all previous generations and all ancestral lines, and can predict an individual's unique probable ethnic matches based on the profile's commonness or rarity in different populations [5]. The only validation study so far is one by Donald N. Yates and Elizabeth C. Hirschman based on company files[6]. However, neither Yates nor Hirschman is a professional geneticist.

To give an idea of the inclusiveness of the latest version of OmniPop, the following are the last populations that have been added.

Greek, Sikkim (India), Bhutia (India), Italian, Argentinian (Misiones), Hungarian(E. Romani), Hungarian(Ashkenazim), Romanian (Szekler), Romanian (Csango), Tibet (Luoba).

As marker sets from more and more populations are included, it is expected that the accuracy of results should improve, leading to a more informative picture of one's ancestry.

Along the same lines, yet another company[7] identifies the indigenous and diaspora populations in which an individual's autosomal STR profile is most common. This test examines autosomal STRs, which are locations on a chromosome where a pattern of two or more nucleotides is repeated and the repetitions are directly adjacent to each other. The populations in which the individual's profile is most common are identified and assigned a likelihood score. The individual's profile is assigned a likelihood of membership in each of twenty-three world regions:

  • Alaskan: Inuit peoples of Alaska.
  • Athabaskan: Athabaskan speaking peoples of Western North America.
  • Northeast Amerindian: Native peoples of Northeastern North America.
  • Salishan: Salish speaking peoples of the American Pacific Northwest.
  • South Amerindian: Native peoples of South America.
  • Mestizo (“mixed”): Native Americans blended with Europeans and Africans.
  • Arabian: The Arabian Peninsula.
  • Asia Minor: The East Mediterranean and Anatolia to the Tarim Basin.
  • North African: North Africa.
  • North Indian: Northern India.
  • South Indian: Southern India.
  • Sub-Saharan African: Africa south of the Sahara Desert.
  • Eastern European: The Slavic speaking region of Eastern Europe.
  • Basque: The Basque speaking peoples of Western Europe.
  • Finno-Ugrian: The Uralic speaking region of Northeastern Europe.
  • Mediterranean: The Romance speaking region of Southern Europe.
  • Northwest European: The Celtic and Germanic speaking region of Northwestern Europe.
  • Australian: Aboriginal peoples of Australia.
  • Chinese: The Chinese region of East Asia.
  • Japanese: The Japanese Archipelago.
  • Polynesian: The Polynesian Islands.
  • Southeast Asian: Southeast Asia and the Malay Archipelago.
  • Tibetan: The Himalayas and Tibetan Plateau.

This STR analysis measures the frequency of a person's DNA profile within major world regions. Unlike SNP admixture tests, this analysis is based on objectively identified world regions and does not depend on any system of presumed biogeographic classifications. However, as most STR analysis examines markers chosen for their high intra-group variation, the utility of these particular STR markers to assess inter-group relationships may be greatly diminished.

Native American ancestry

Autosomal, Y-DNA, and mtDNA testing can also be conducted to determine Amerindian ancestry. A mitochondrial Haplogroup determination test based on mutations in Hypervariable Regions 1 and 2 may establish whether a person's direct female line belongs to one of the five recognized Native American Haplogroups (A, B, C, D or X), with the inference that he or she is, in whole or in part, Native American. Comparisons with tribe-specific haplotypes are at times possible. Trace Genetics has amassed a databank of mtDNA sequences from American Indian individuals. Comparisons to this or other databanks may suggest tribal affiliation, though no federally recognized tribe considers DNA admissible evidence for enrollment. Enrollment is based instead on the demonstrable appearance of the names of one's direct ancestors on tribe-specific Native American censuses prepared in the aftermath of treaty making and relegation to reservations in the 1800s.

Complicating factors are the Native American name controversy and recent evidence that indigenous North American mitochondrial Haplogroups may not be limited to the five named, though at present the vast majority of Native American individuals do belong to one of the five identified mtDNA Haplogroups. Many Americans are just discovering their Native roots, however, and the small chance of belonging to one of the acknowledged lineages, particularly in the case of male lines, which were almost entirely eradicated by the process of history, does not deter some from attempting to validate their heritage with the goal of gaining admittance into a tribe. These tests, moreover, are ideal for adoptees with Native American ancestry, of whom there are now many in U.S. and Canadian society because of past policies of assimilation.

African ancestry

Y-DNA and mtDNA testing can determine with which present-day African country a person shares his or her ancestry. The testing company African Ancestry[8] maintains an "African Lineage Database" of African lineages from 30 countries and over 160 ethnic groups. Mostly because Caucasian slave owners impregnated their female African slaves, approximately 30% of African American males have a European Y chromosome haplogroup[9]. As for mitochondrial haplotypes, African Ancestry lists approximately 300 tribal affiliations and seeks to assign, within a certain measure of likelihood, an African tribe to testees. This is how Oprah Winfrey discovered her ancestry: when she had her DNA tested, the results suggested that her most likely match was with the Kpelle people of Liberia. According to authorities such as Salas, nearly three-quarters of the ancestors of African Americans taken in slavery came from West Africa.

The African American tribal movement has burgeoned since DNA testing, with members of African American churches taking the test as groups. One reason is that owing to slavery, African Americans cannot easily trace their ancestry through surname research, census and property records and other traditional means.

Cohanim ancestry

The Cohanim (or Kohanim) are a patrilineal priestly line of descent in Judaism. According to the Bible, the ancestor of the Cohanim is Aaron, brother of Moses. Many believe that descent from Aaron is verifiable with a Y-DNA test: the first published study in genealogical Y-chromosome DNA testing found that a large proportion of the Cohens tested did indeed have distinctively similar DNA, more so than general Jewish or Middle Eastern populations. These Cohens tended to belong to Haplogroup J, with Y-STR values clustered unusually closely around a haplotype known as the Cohen Modal Haplotype (CMH). This could be consistent with a shared common ancestor, or with the hereditary priesthood having originally been founded from members of a single closely related clan.

However, the original studies tested only six Y-STR markers, which by modern standards is a very low-resolution test. Such a test does not have the resolution to prove relatedness, nor to estimate the time to a common ancestor very reliably. The Cohen Modal Haplotype, whilst notably frequent amongst Cohens, is also far from unusual in the general populations of haplogroups J1 and J2 with no particular link to Cohen ancestry. So whilst many Cohens have haplotypes close to the CMH, a far larger number of such haplotypes worldwide belong to people with no likely Cohen connection at all.

To some extent one could increase resolution by testing more than six Y-STR markers. For some this could help to establish relatedness to particular recent Cohen clusters, although for many it would likely still be hard to distinguish shared Cohen ancestry definitively from the more general population distribution. However, so far there is no openly published research to indicate what extended Y-STR haplotype distributions appear to be characteristic of Cohens. Although some high-resolution testing has been done, to date the results are held as closely guarded secrets. It is not even known whether the high-resolution testing that has been done tends to confirm or to call into question the basic hypothesis that a majority of Cohens share a recent common ancestry going back to a Y-chromosomal Aaron at an appropriate date.

European testing

For people with European maternal ancestry, mtDNA tests are offered to determine which of eight European maternal "clans" the direct-line maternal ancestor belonged to. This is simply an mtDNA haplotype test based on the research in the book The Seven Daughters of Eve.

SNP testing may enable mostly-European individuals to determine to which Sub-European population they belong:

  • Northern European subgroup (NOR) - mostly Northern and Southwestern European
  • Southeastern European (Mediterranean) subgroup (MED) - mostly Southeastern Europeans (Greeks or Turks)
  • Middle Eastern subgroup (MIDEAS) - mostly Middle Eastern
  • South Asian subgroup (SA) - mostly South Asian from the Indian sub-continent (i.e. Indian)

Hindu testing

The 49 established gotras are clans or families whose members trace their descent to a common ancestor, usually a sage of ancient times. The gotra proclaims a person's identity and a "gotraspeak" is required to be presented at Hindu ceremonies. People of the same gotra are not allowed to marry.

One company says it can use a 37-marker Y-DNA test to "verify genetic relatedness and historical gotra genealogies for Hindu and Buddhist engagements, marriages and business partnerships." Any Y-DNA test can be used to compare results with another person whose gotra is known.

Melungeon testing

Several efforts, including a number of ongoing studies, have examined the genetic makeup of families historically identified as Melungeon. Most results point to a mixture of European, African, and Native American lineages. Though some companies provide additional Melungeon research materials with Y-DNA and mtDNA tests, any test will allow comparisons with the results of current and past Melungeon DNA studies.

Benefits

Genealogical DNA tests have become popular due to the ease of testing at home and the various additions they make to genealogical research. Genealogical DNA tests allow for an individual to determine with 99.9% certainty that he or she is related to another person within a certain time frame, or with 100% certainty that he or she is not related. DNA tests are perceived as more scientific, conclusive and expeditious.

Drawbacks

The Y-DNA lineage from father to son can have complications, including mutation and false paternity (i.e. the father in one generation is not the father in birth records), a discovery that might upset some people. Maternal DNA is generally harder to correlate with surnames because the mother displaces the father's maternal DNA (from his mother). Maternal DNA continues only from mother to daughter. Also, a daughter cannot transmit her father's Y-DNA to her sons, and a son cannot transmit his mother's maternal DNA to his children. [4]

The most common complaint from DNA test customers is the failure of the company to make results understandable and meaningful to them. This was the primary reason cited for customer dissatisfaction in a June 2006 nationwide telephone survey conducted by Shapiro and Associates[citation needed]. According to an earlier survey, 1 in 6 Americans (16.6%) said they were aware of the ancestry-tracing capability of a home DNA test but when probed, most knew little about the details, reliability or differences between tests.

A further drawback, at least with autosomal tests, is their present state of imperfection and large margin of error (up to 15%, according to some genomics experts), with significant blind spots, such as confusion of Mongolian ancestry with Native American.

Medical information

Though genealogical DNA test results generally have no informative medical value and are not intended to determine genetic diseases or disorders, a correlation has been established between a lack of DYS464 markers and infertility, and between mtDNA haplogroup H and protection from sepsis. Certain haplogroups have been linked to longevity.

The testing of full mtDNA sequences is still somewhat controversial as it may reveal medical information. The field of linkage disequilibrium, unequal association of genetic disorders with a certain mitochondrial lineage, is in its infancy, but those mitochondrial mutations that have been linked are searchable in the genome database Mitomap[10]. The National Human Genome Research Institute operates the Genetic And Rare Disease Information Center[11] that can assist consumers in identifying an appropriate screening test and help locate a nearby medical center that offers such.

References

  1. ^ Ybase statistics
  2. ^ for example: Paradise lost: Mitochondrial eve refuted by M. Pickford, July 2006
  3. ^ AncestryByDNA
  4. ^ DNA Consulting [1]
  5. ^ Balding, D.J. et al., eds. (2001). Handbook of Statistical Genetics. New York: Wiley
  6. ^ [2]
  7. ^ DNA Tribes
  8. ^ African Ancestry
  9. ^ African Ancestry :: Patriclan: Trace Your Paternal Ancestry
  10. ^ Mitomap
  11. ^ Genetic And Rare Disease Information Center (GARD)

See also

Haplogroup prediction