Y Chromosome Haplotype Reference Database

Logo of the Y Chromosome Haplotype Reference Database (YHRD) version 4.0

The Y Chromosome Haplotype Reference Database (YHRD) is an open-access, annotated collection of population samples typed for Y-chromosomal sequence variants. It pursues two main objectives: (1) generating reliable frequency estimates for Y-STR and Y-SNP haplotypes for use in the quantitative assessment of matches in forensic and kinship cases, and (2) characterizing male lineages to draw conclusions about the origins and history of human populations. Since its creation in 1999, it has been curated by Lutz Roewer and Sascha Willuweit at the Institute of Legal Medicine and Forensic Sciences, Charité - Universitätsmedizin Berlin. The database is endorsed by the International Society for Forensic Genetics (ISFG). By June 2019, 285,406 9-STR-locus haplotypes, among them 225,098 17-STR-locus haplotypes, 62,737 23-STR-locus haplotypes, 56,114 27-STR-locus haplotypes and 24,328 Y-SNP profiles, sampled in 135 countries, had been directly submitted by forensic institutions and universities from 72 countries. In geographic terms, 47% of the YHRD samples stem from Asia, 23% from Europe, 14% from North America, 11% from Latin America, 3% from Africa, 1% from Oceania/Australia and 0.3% from the Arctic (release 61 of June 24, 2019). The 1,308 individual sampling projects are described in more than 570 peer-reviewed publications.[1]


The YHRD is built from direct submissions of population data by individual laboratories. Upon receipt of a submission, the YHRD staff examine the originality of the data, assign an accession number to the population sample and perform quality assurance checks. The submissions are then released to the public database, where the entries can be retrieved by searching for haplotypes, populations, contributors or accession numbers. All population data published in forensic journals such as FSI: Genetics or the International Journal of Legal Medicine must be validated by the YHRD custodians and are subsequently included in the YHRD.[2]

Database structure

The database supports the most frequently used haplotype formats (e.g. Minimal (minHt), PowerPlex Y12,[3] Yfiler,[4] PowerPlex Y23,[5] YfilerPlus and Maximal (maxHt)), for each of which a differently sized database exists.

Because strong correlations exist between geographic areas and Y-chromosomal variants, the YHRD population database was structured to display the geographic, linguistic and phylogenetic relationships of searched haplotype profiles. Currently the YHRD recognizes four separate "metapopulation" structures: national, continental, linguistic/ethnic and phylogenetic affiliation, each with several categories. In population genetics, the term metapopulation describes discrete, spatially distributed population groups which are interconnected by gene flow and migration.[6] By analogy, the term is used in forensic genetics to describe a set of geographically dispersed populations with shared ancestry and continuing gene flow. Thus, the population groups within a metapopulation are more similar to each other than to groups outside it.[7]


The concept of pooling data to build "national databases" has a straightforward explanation: law enforcement agencies and forensic services rely on their national population to build reference databases. In most instances offenders and victims stem from the national population, and their genetic profiles should thus be represented in the database. In countries like the USA, Brazil, the UK or China, which are characterized by strong population substructure, national reference databases are often built on the basis of a historical concept of ethnic affiliation: for example, the US population is sub-structured into Caucasian, African, Hispanic, Asian and Native American populations, and the UK differentiates between English, Afro-Caribbean, Indo-Pakistani and Chinese. Because of their importance in national legislation, national databases are therefore searchable in the YHRD. Each national metapopulation in the YHRD comprises all individuals sampled in a particular country, regardless of their ancestry.


Each continental metapopulation in the YHRD comprises all individuals sampled on a particular continent, regardless of their ancestry. The YHRD defines seven continental metapopulations following the United Nations classification of geographical regions: Africa, Arctic, Asia, Europe, Latin America, North America and Oceania/Australia.


The metapopulation structure built on the basis of "ethnicity/linguistic affiliation" takes the ancestry of sampled individuals into account to a larger extent. "Ancestry" is a term collating historical, cultural, geographical and linguistic categories. A metapopulation concept based on "ethnicity" is by no means ideal, fully rational or fully translatable; it simply reflects the fact that, on a global level, categories other than "nation" or "geography" describe the observed genetic clustering and inhomogeneity of Y chromosome patterns far better.

For a global reference database, the "major language group" criterion seems the most appropriate way to group data while taking ancestry into account and to produce subdatabases that reflect genetic similarity. The reasoning is twofold. First, language is an inherited cultural trait, and language phyla therefore often correlate with genetic traits, not least with Y chromosome polymorphisms. Second, because languages are well studied and, owing to the long tradition of language research, mostly understood by the public, linguistic terminology is in principle more understandable and more easily translated into practice than its genetic counterpart. Aside from purely linguistic categories (e.g. the Altaic language family, comprising people speaking Turkic and Mongolic languages), unifying geographic criteria are also used (e.g. Sub-Saharan Africa, comprising speakers of different African language groups who live south of the Sahara).

It is important to state that the current metapopulation structure is an a priori categorization which needs continuous evaluation and verification by statistical methods that quantify the genetic similarity or dissimilarity between the samples. While the current categorization of eight large metapopulations gains some support from a genetic distance analysis based on ~41,000 haplotypes,[7] a further subdivision of the "Eurasian - European metapopulation" was implemented solely on the basis of Y-STR haplotypes. The analysis of ~12,000 European haplotypes by AMOVA demonstrates that three larger pools of European haplotypes exist: the western, eastern and southeastern metapopulations.[8]

Currently the YHRD has seven non-overlapping metapopulations: African, Afro-Asiatic, Amerindian, Australian Aboriginal, East Asian, Eskimo-Aleut and Eurasian. Some of these metapopulations are further subdivided, e.g. Eurasian into six subcategories, of which the European subgroup splits further into three groups: Western, Eastern and Southeastern Europeans.
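The nested structure described above can be pictured as a small tree. The following is only an illustrative sketch: the names `METAPOPULATIONS` and `find_path` are hypothetical, and only the subcategories actually named in the text are filled in (the other five Eurasian subcategories are omitted, not invented).

```python
# Hypothetical in-memory model of the linguistic/ethnic metapopulation tree.
# Only groups named in the text are listed; the remaining Eurasian
# subcategories are deliberately left out.
METAPOPULATIONS = {
    "African": {},
    "Afro-Asiatic": {},
    "Amerindian": {},
    "Australian Aboriginal": {},
    "East Asian": {},
    "Eskimo-Aleut": {},
    "Eurasian": {
        "European": {
            "Western European": {},
            "Eastern European": {},
            "Southeastern European": {},
        },
    },
}


def find_path(name, tree=METAPOPULATIONS, prefix=()):
    """Return the path from a top-level metapopulation down to `name`,
    or None if the name does not occur in the tree."""
    for key, children in tree.items():
        path = prefix + (key,)
        if key == name:
            return list(path)
        found = find_path(name, children, path)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(find_path("Eastern European"))
```

Walking the returned path from the last element back to the first gives progressively broader reference groups for the same sample.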


The DNA profiles of Y chromosomes submitted to the YHRD are now continuously being extended with binary Y-SNP polymorphisms. The phylogeny of the Y chromosome defined by binary polymorphisms is well established and stable (Underhill et al. 2000; Hammer et al. 2001; Jobling and Tyler-Smith 2003; Karafet et al. 2008). All Y chromosomes sharing a mutation are related by descent, until a further mutation splits the branch. Haplotypes within a haplogroup can be highly similar or even "identical by descent" (IBD). Thus, the haplogroup can be used as a criterion to substructure the database according to the phylogenetic descent of samples. Even though the chronology of the SNP mutations is far less certain than the structure of the tree, many haplogroups can be equated with events in human prehistory. The worldwide distribution of human Y-chromosome diversity has revealed clearly geographically associated haplogroups (Underhill et al. 2000).

Database tools


Analysis of molecular variance (AMOVA) is a method for analyzing population variation using molecular data, e.g. Y-STR haplotypes.[9] AMOVA makes it possible to evaluate and quantify the extent of differentiation between two or more population samples. It is implemented as an online tool in the YHRD and provides a way of estimating ΦST and FST values. The online tool accepts Excel files and creates input files from them. Up to nine reference populations selected from the YHRD, as well as population sets, can be added to the AMOVA analysis. The online calculation returns a *.csv table with pairwise FST or ΦST (RST) values plus p-values as a test for significance (10,000 permutations). In addition, an MDS plot is generated to illustrate the genetic distances between the analyzed populations graphically. The program also lists the references for the selected population studies, which facilitates correct citation.
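To make the ΦST calculation concrete, here is a minimal sketch of the standard one-level AMOVA variance-component formulas together with a permutation test. It is not the YHRD implementation. It assumes haplotypes are tuples of repeat counts and uses the sum of squared repeat differences as the distance (an RST-like choice); the function names and the toy data are illustrative only.

```python
import random
from itertools import combinations


def squared_distance(h1, h2):
    """Sum of squared repeat-number differences across loci (RST-like)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def sum_sq_dev(haplotypes):
    """Sum of squared deviations: all pairwise squared distances divided by n."""
    n = len(haplotypes)
    return sum(squared_distance(a, b) for a, b in combinations(haplotypes, 2)) / n


def phi_st(populations):
    """Phi_ST from one-level AMOVA variance components."""
    sizes = [len(p) for p in populations]
    n_total, n_pops = sum(sizes), len(populations)
    pooled = [h for p in populations for h in p]
    ssd_within = sum(sum_sq_dev(p) for p in populations)
    ssd_among = sum_sq_dev(pooled) - ssd_within
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # Average effective sample size for unequal population sizes.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    total = var_among + msd_within
    return var_among / total if total else 0.0


def permutation_p_value(populations, n_perm=1000, seed=0):
    """Share of random reassignments of haplotypes to populations whose
    Phi_ST is at least as large as the observed value."""
    rng = random.Random(seed)
    observed = phi_st(populations)
    pooled = [h for p in populations for h in p]
    sizes = [len(p) for p in populations]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for s in sizes:
            shuffled.append(pooled[start:start + s])
            start += s
        if phi_st(shuffled) >= observed:
            hits += 1
    return hits / n_perm


if __name__ == "__main__":
    # Toy data: two populations, each fixed for a different two-locus haplotype.
    pop_a = [(14, 13)] * 3
    pop_b = [(15, 13)] * 3
    print(phi_st([pop_a, pop_b]), permutation_p_value([pop_a, pop_b]))
```

Replacing `squared_distance` with a count of mismatching loci would give an FST-like statistic instead. With small samples the estimate can be negative, which is usually read as no detectable differentiation.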


A tool for mixture analysis can be applied in forensic cases in which a mixed trace (two or more male contributors) has to be analyzed. The result is a likelihood ratio of donorship vs. non-donorship of the putative contributor to the trace.


A tool for kinship analysis can be applied in cases in which a relationship between upstream and downstream relatives (e.g. father-son or grandfather-grandson) has to be analyzed. The result is a likelihood ratio (or kinship index) of patrilineal relationship vs. patrilineal non-relationship of the analyzed persons.
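As a rough illustration of such a likelihood ratio, the sketch below covers only the simplest situation: both persons carry the same haplotype. It is a simplification under stated assumptions, not the method used by the YHRD tool. It assumes independent per-locus mutation rates and a known population frequency for the shared haplotype; the function name `patrilineal_lr` and the numbers are hypothetical.

```python
def patrilineal_lr(haplotype_frequency, mutation_rates, meioses=1):
    """Likelihood ratio for two men sharing an identical Y-STR haplotype.

    Related hypothesis: the haplotype was passed down through `meioses`
    generations without mutation at any locus.
    Unrelated hypothesis: the second man carries the haplotype by chance,
    with probability `haplotype_frequency`.
    """
    if not 0 < haplotype_frequency <= 1:
        raise ValueError("haplotype_frequency must be in (0, 1]")
    p_no_mutation = 1.0
    for mu in mutation_rates:
        p_no_mutation *= (1 - mu) ** meioses
    return p_no_mutation / haplotype_frequency


if __name__ == "__main__":
    # Assumed values: two loci with a mutation rate of 0.002 each, a
    # father-son pair (one meiosis) and a haplotype frequency of 1%.
    print(patrilineal_lr(0.01, [0.002, 0.002], meioses=1))
```

Haplotypes that differ at one or more loci would need an explicit mutation model in the numerator, which this sketch does not attempt.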

Match statistics

Searching the YHRD results in a match or a non-match between the searched haplotype and the reference samples in the database. The relative number of matches is described as the profile frequency. In forensic casework, the match probability, which is based on the profile frequency, is evaluated using different methods. Some of these are recommended by national guidelines, e.g. the augmented counting method with confidence intervals and/or theta subpopulation correction (SWGDAM Interpretation Guidelines for Y-Chromosome STR Typing by Forensic Laboratories in the USA, 2014) or the Discrete Laplace (DL) method (Andersen et al. 2013), as recommended in Germany (Willuweit et al. 2018). Both augmented counting and DL values are provided by the YHRD for different metapopulations.
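The counting-based estimates can be sketched in a few lines. This is an illustrative calculation, not the YHRD code. It assumes the augmented count is (matches + 1) / (database size + 1) and uses a one-sided exact binomial (Clopper-Pearson) upper bound found by bisection; the function names are hypothetical, and the theta correction and the Discrete Laplace method are not covered.

```python
import math


def counting_estimate(matches, db_size):
    """Plain profile frequency: matches divided by database size."""
    return matches / db_size


def augmented_counting_estimate(matches, db_size):
    """Augmented count: the searched profile is added to the database."""
    return (matches + 1) / (db_size + 1)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed in log space for large n."""
    total = 0.0
    for k in range(x + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return total


def upper_confidence_bound(matches, db_size, alpha=0.05):
    """One-sided exact upper bound on the profile frequency: the value p
    at which observing `matches` or fewer has probability alpha."""
    if matches >= db_size:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(matches, db_size, mid) > alpha:
            lo = mid  # bound lies above mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # A haplotype that is not observed in a database of 1,000 profiles.
    print(augmented_counting_estimate(0, 1000))
    print(upper_confidence_bound(0, 1000))
```

For zero matches the bound reduces to the closed form 1 - alpha^(1/N), which is a convenient check on the bisection.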


Date | Release | Haplotypes | Milestone
August 1, 1999 | 1 | 2,517 | YHRD 1.0
June 16, 2000 | 1a | 3,589 |
January 1, 2003 | 2 | 18,050 |
August 18, 2003 | 3 | 19,482 |
October 30, 2003 | 4 | 20,152 |
July 11, 2003 | 5 | 20,320 |
October 12, 2003 | 6 | 20,865 |
December 29, 2003 | 8, 9 | 21,446 |
February 24, 2004 | 10 | 21,546 |
February 26, 2004 | 11 | 22,872 |
April 13, 2004 | 12 | 24,524 | YHRD 2.0
May 24, 2004 | 13 | 25,066 |
July 1, 2004 | 14 | 26,325 |
September 18, 2004 | 15 | 28,649 |
December 17, 2004 | 16 | 32,196 |
May 31, 2005 | 17 | 34,558 |
October 14, 2005 | 18 | 38,761 |
January 31, 2006 | 19 | 41,965 |
August 1, 2006 | 20 | 46,831 |
December 28, 2006 | 21 | 51,253 |
April 13, 2007 | 22 | 52,655 |
August 10, 2007 | 23 | 54,833 |
July 23, 2008 | 24 | 59,004 | YHRD 3.0
October 1, 2008 | 25 | 65,165 |
January 29, 2009 | 26 | 68,108 |
February 13, 2009 | 27 | 72,082 |
March 23, 2009 | 28 | 72,055 |
June 12, 2009 | 29 | 74,742 |
August 21, 2009 | 30 | 79,147 |
November 16, 2009 | 31 | 81,099 |
December 18, 2009 | 32 | 84,047 |
March 3, 2010 | 33 | 86,568 |
July 16, 2010 | 34 | 89,237 |
December 30, 2010 | 35 | 91,601 |
May 15, 2011 | 36 | 93,290 |
June 21, 2011 | 37 | 97,575 |
December 30, 2011 | 38 | 99,881 |
February 17, 2012 | 39 | 101,055 |
August 29, 2012 | 40 | 104,174 |
October 1, 2012 | 41 | 105,498 |
January 11, 2013 | 42 | 108,949 |
January 18, 2013 | 43 | 112,005 |
July 12, 2013 | 44 | 114,256 |
October 31, 2013 | 45 | 124,343 |
December 20, 2013 | 46 | 126,931 |
August 15, 2014 | 47 | 132,553 | YHRD 4.0
November 10, 2014 | 48 | 136,184 |
February 17, 2015 | 49 | 143,044 |
July 18, 2015 | 50 | 154,329 |
January 6, 2016 | 51 | 160,693 |
October 27, 2016 | 52 | 178,171 |
March 1, 2017 | 53 | 183,655 |
June 6, 2017 | 54 | 188,209 |
October 20, 2017 | 55 | 197,102 |
April 9, 2018 | 56 | 207,467 |
June 15, 2018 | 57 | 216,562 |
September 9, 2018 | 58 | 255,811 |
November 1, 2018 | 59 | 265,324 |
January 14, 2019 | 60 | 269,383 |
June 24, 2019 | 61 | 285,406 |

References


  1. "YHRD Homepage". Retrieved January 15, 2019.
  2. "FSIGEN Publishing Guidelines" (PDF). Retrieved September 25, 2013.
  3. "Promega PowerPlex Y". Retrieved September 25, 2013.
  4. "Applied Biosystem Yfiler". Retrieved September 25, 2013.
  5. "Promega PowerPlex Y23". Retrieved September 25, 2013.
  6. Hanski, I. and Gilpin, M. (1997). Metapopulation Biology: Ecology, Genetics, and Evolution. Academic Press, San Diego.
  7. Willuweit, S., Roewer, L. and The International Forensic Y Chromosome User Group (2007). Y chromosome haplotype reference database (YHRD): Update. Forensic Sci Int Genet 1(2): 83-87.
  8. Roewer, L., Croucher, P. J. P., Willuweit, S., Lu, T. T., Kayser, M., Lessig, R., de Knijff, P., Jobling, M. A., Tyler-Smith, C. and Krawczak, M. (2005). Signature of recent historical events in the European Y-chromosomal STR haplotype distribution. Hum Genet 116(4): 279-291.
  9. Roewer, L., Kayser, M., Dieltjes, P., Nagy, M., Bakker, E., Krawczak, M. and de Knijff, P. (1996). Analysis of molecular variance (AMOVA) of Y-chromosome-specific microsatellites in two closely related human populations. Hum Mol Genet 5(7): 1029-1033.

External links