DNA database

A DNA database or DNA databank is a database of DNA data. A DNA database can be used in the analysis of genetic diseases, genetic fingerprinting for criminology, or genetic genealogy. DNA databases may be public or private. These databases do not normally hold the DNA itself except for a short time; instead, DNA fingerprints or DNA profiles are made from the DNA and held electronically in the database.


Forensic DNA database: A forensic DNA database is a centralised database for storing DNA profiles of individuals, which enables DNA samples collected from a crime scene to be searched and compared against the stored profiles. Its most important function is to produce matches between a suspected individual and crime-scene biomarkers, providing evidence to support criminal investigations and leads to identify potential suspects. The majority of national DNA databases are used for forensic purposes.[1]

Genetic genealogy database: A genetic genealogy database is a DNA database of genealogical DNA test results. GenBank is a public genetic genealogy database that stores genome sequences submitted by many genetic genealogists. To date, GenBank contains a large number of DNA sequences from more than 140,000 registered organizations, and it is updated every day to ensure a uniform and comprehensive collection of sequence information. The sequences are mainly obtained from individual laboratories or large-scale sequencing projects. The files stored in GenBank are divided into groups such as BCT (bacterial), VRL (viruses) and PRI (primates). GenBank can be accessed through NCBI’s retrieval system, and the BLAST tool can then be used to identify a certain sequence within GenBank or to find similarities between two sequences.[2]

Medical DNA database: A medical DNA database is a DNA database of medically relevant genetic variations. It collects individuals’ DNA, which can reflect their medical records and lifestyle details. By recording DNA profiles, scientists may discover interactions between the genetic environment and the occurrence of certain diseases (such as cardiovascular disease or cancer), and thus find new drugs or effective treatments to control these diseases. Such databases often operate in collaboration with the National Health Service.[3]

National DNA databases

A national DNA database is a DNA database maintained by a government for storing DNA profiles of its population. Such databases are generally used for forensic purposes, which include searching and matching the DNA profiles of potential criminal suspects.[4]

National DNA databases store DNA profiles. Each DNA profile is based on PCR and uses STR (short tandem repeat) analysis.

The majority of European countries have successfully launched national DNA databases. The DNA working group of the European Network of Forensic Science Institutes (ENFSI) has introduced a set of recommendations for DNA database management and guidelines for auditing DNA databases. The document released in 2014 contained 33 recommendations.[5]


The first national DNA database, the NDNAD (National DNA Database), was launched in April 1995 by the UK Home Office, and it is the largest and most inclusive national forensic DNA database in the world. By 2006, it included the DNA profiles of 2.7 million individuals (about 5.2% of the population), as well as other information from individuals and crime scenes.[6] The information is stored in the form of a digital code, which is based on the nomenclature of each STR.[7] Because of the large number of DNA profiles stored in the NDNAD, "cold hits" may occur during DNA matching: unexpected matches between an individual's DNA profile and an unsolved crime-scene DNA profile. A cold hit can introduce a new suspect into the investigation, thus helping to solve old cases.[8]
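The idea of storing a profile as a code per STR and searching it for hits can be sketched in Python. This is only an illustrative model, not the NDNAD's actual format or matching rules: the locus names are common forensic STR markers, but the allele values, the identifiers such as `person-001`, the `min_loci` threshold and all function names are assumptions made for the example.

```python
from typing import Dict, FrozenSet, List

# A profile maps an STR locus name to the unordered pair of repeat counts
# found there (one allele from each parent). All values below are made up.
Profile = Dict[str, FrozenSet[float]]


def make_profile(raw: Dict[str, tuple]) -> Profile:
    """Store each allele pair as a frozenset so that order does not matter."""
    return {locus: frozenset(alleles) for locus, alleles in raw.items()}


def is_match(scene: Profile, stored: Profile, min_loci: int = 3) -> bool:
    """Match if enough loci are shared and every shared locus agrees.

    The min_loci threshold is a hypothetical parameter for this sketch.
    """
    shared = scene.keys() & stored.keys()
    return len(shared) >= min_loci and all(scene[l] == stored[l] for l in shared)


def search(scene: Profile, database: Dict[str, Profile], min_loci: int = 3) -> List[str]:
    """Return the identifiers of all stored profiles matching the scene profile."""
    return sorted(pid for pid, prof in database.items()
                  if is_match(scene, prof, min_loci))


database = {
    "person-001": make_profile({"D3S1358": (15, 17), "vWA": (16, 18),
                                "FGA": (21, 24), "TH01": (6, 9.3)}),
    "person-002": make_profile({"D3S1358": (14, 15), "vWA": (17, 17),
                                "FGA": (20, 22), "TH01": (7, 9)}),
}

# A partial crime-scene profile typed at three loci only.
scene = make_profile({"D3S1358": (17, 15), "vWA": (18, 16), "FGA": (24, 21)})
hits = search(scene, database)
```

Here `hits` contains only `person-001`. If that person had not previously been connected to the case, the result would correspond to a cold hit in the sense described above.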


The US national DNA database is maintained at three levels: national, state and local. Each level implements its own DNA index system. NDIS (National DNA Index System) allows DNA profiles to be exchanged and compared between participating laboratories nationally. SDIS (State DNA Index System) allows DNA profiles to be exchanged and compared between the laboratories of various states. LDIS (Local DNA Index System) allows DNA profiles to be collected at local sites and uploaded to SDIS and NDIS. The CODIS (Combined DNA Index System) software integrates and connects the DNA index systems at all three levels. CODIS is installed at each participating laboratory site and uses a standalone network known as the CJIS WAN (Criminal Justice Information Services Wide Area Network)[9][10] to connect to other laboratories.


Australia's national DNA database is called the National Criminal Investigation DNA Database (NCIDD). By the start of 2013, it contained 718,462 DNA profiles.[11][12] The database uses 9 STR loci and a sex gene for analysis. The NCIDD removes the previous state boundaries and brings all the forensic data together, including individuals' DNA profiles, advanced biometrics and even cold cases. Use of the NCIDD helps to match unknown DNA profiles left at crime scenes throughout Australia, and thus to solve cases.[13]


Canada's national DNA database, called the National DNA Data Bank (NDDB), was launched in 1998 but first used in 2000.[14]

The NDDB maintains two indexes: the Convicted Offender Index (COI) and the National Crime Scene Index (CSI-nat). A third index, the Local Crime Scene Index (CSI-loc), is maintained by local laboratories rather than by the NDDB, because local DNA profiles do not meet the NDDB collection criteria.

Further, the National Crime Scene Index (CSI-nat) draws on three laboratories, operated by the Royal Canadian Mounted Police (RCMP), the Laboratoire de sciences judiciaires et de médecine légale (LSJML) and the Centre of Forensic Sciences (CFS).


The Israeli DNA database, also known as IPDIS (Israel Police DNA Index System),[15] was established in 2007 and holds more than 135,000 DNA samples to date. The collection includes DNA samples from suspected, accused and convicted offenders, or people charged under the law, which can be searched to solve unsolved crimes in the future. The database also includes an “elimination bank” of profiles from laboratory staff and other police personnel who may come into contact with forensic evidence in the course of their work. To handle the high-throughput processing and analysis of DNA samples from FTA cards, the Israel Police DNA database has established a semi-automated LIMS program, which enables a small number of police staff to process a large number of samples in a relatively short time and is also responsible for the future tracking of samples. These characteristics have made the Israeli DNA database successful and effective.

Interpol DNA Database

Interpol's DNA database is used in criminal investigations. Interpol maintains an automated DNA database called the DNA Gateway, which contains DNA profiles submitted by member countries, collected from crime scenes, missing persons and unidentified bodies.[16] When it was established in 2002 it held a single DNA profile, but by the end of 2013 it held more than 140,000 DNA profiles from 69 member countries. Unlike other DNA databases, the Interpol database is used only for information sharing and comparison: it does not link a DNA profile to any individual, and an individual's physical or psychological condition is not included in the database.[17]


DNA databases occupy more storage than other, non-DNA databases because of the enormous size of each DNA sequence.[18][19] DNA databases grow exponentially every year, which poses a major challenge for storage, data transfer, retrieval and search. To address these challenges, DNA databases are compressed to save storage space and bandwidth during data transfers, and are decompressed during search and retrieval. Various compression algorithms are used for this. The efficiency of a compression algorithm depends on how well and how fast it compresses and decompresses. How well it compresses is generally measured by the compression ratio: the higher the compression ratio, the more efficient the algorithm. The speed of compression and decompression is also considered in the evaluation.
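As a rough illustration of the compression ratio, the following Python sketch packs each base into two bits instead of one 8-bit character. The ratio is taken here as original size divided by compressed size, an assumption that is consistent with a higher ratio being better. The function names are illustrative and do not come from any particular tool, and the sketch handles only the four letters A, C, G and T.

```python
# Two-bit code for each base; this particular assignment is arbitrary.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # BASES[code] is the inverse of CODES


def pack(seq: str) -> bytes:
    """Pack four bases into each byte, first base in the highest bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpacking reads it correctly.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the sequence; the original length must be stored separately."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Original size divided by compressed size (higher is better)."""
    return original_size / compressed_size


seq = "ACGTACGTTTGA"                      # 12 bases, 12 bytes as ASCII text
packed = pack(seq)                        # 3 bytes
ratio = compression_ratio(len(seq.encode("ascii")), len(packed))
```

For this input `ratio` is 4.0, the fixed gain of two-bit packing over 8-bit text. The dedicated algorithms listed later in this section aim to do better by exploiting repetition within the sequence.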

DNA sequences contain repeated runs of the bases A, C, T and G, some of which occur in the form of palindromes. Compressing a sequence involves finding and encoding these repetitions, and decoding them again on decompression.
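In DNA compression, a "palindrome" usually means a segment that reappears as its reverse complement (read backwards, with A swapped with T and C swapped with G) rather than a string that reads the same in both directions. The Python sketch below, with illustrative function names and a naive search, shows how a compressor might look for an earlier copy of a short word either directly or as a reverse complement. It is a simplified assumption of the general idea, not the method of any specific algorithm.

```python
from typing import Optional, Tuple

# Translation table swapping each base with its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def find_earlier_copy(seq: str, start: int, length: int = 4) -> Optional[Tuple[str, int]]:
    """Look for seq[start:start+length] earlier in the sequence.

    Returns ("repeat", position) for a direct copy, ("palindrome", position)
    for a reverse-complement copy, or None if neither occurs before `start`.
    """
    word = seq[start:start + length]
    if len(word) < length:
        return None
    pos = seq.find(word, 0, start)          # direct repeat
    if pos != -1:
        return ("repeat", pos)
    pos = seq.find(reverse_complement(word), 0, start)
    if pos != -1:
        return ("palindrome", pos)
    return None


direct = find_earlier_copy("ACGTTTACGT", 6)      # "ACGT" occurs at position 0
reversed_copy = find_earlier_copy("AAACGGGTTT", 6)  # "GTTT" is the reverse complement of "AAAC"
```

A compressor that finds such a copy can store a short reference to the earlier position instead of the bases themselves.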

Encoding approaches used to encode and decode include:

1) Huffman Encoding

2) Adaptive Huffman Encoding

3) Arithmetic coding

4) Adaptive arithmetic coding

5) Context tree weighting method
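To make the first approach in the list concrete, here is a minimal Python sketch of static Huffman coding applied to a DNA string. It builds the code from base frequencies, so frequent bases get shorter bit strings. The function names are illustrative, and bits are kept as a text string of "0" and "1" for readability rather than being packed into bytes.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_codes(seq: str) -> Dict[str, str]:
    """Build a prefix code in which frequent symbols get shorter codes."""
    freq = Counter(seq)
    if not freq:
        return {}
    if len(freq) == 1:                       # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}); the
    # unique tie_breaker ensures the dictionaries are never compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(seq: str, codes: Dict[str, str]) -> str:
    return "".join(codes[base] for base in seq)


def decode(bits: str, codes: Dict[str, str]) -> str:
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:               # prefix-free, so first hit is right
            out.append(inverse[current])
            current = ""
    return "".join(out)


seq = "AAAAAAAACCCGGT"                       # skewed base frequencies
codes = huffman_codes(seq)
bits = encode(seq, codes)
```

For this skewed input the encoding takes 23 bits, compared with 28 bits for a fixed two-bit code. When the four bases are roughly equally frequent, Huffman coding gives little or no gain over two bits per base.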

A few compression algorithms that use one of the above encoding approaches to compress and decompress DNA databases are listed below:


2) RLZ

3) GenCompress

4) BioCompress

5) DNACompress
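As an example of how one of these algorithms works in principle, the sketch below gives a highly simplified version of the idea behind RLZ (relative Lempel-Ziv): a sequence is stored as a list of (position, length) pointers into a reference sequence. The naive quadratic search, the handling of unmatched bases as literal characters and the function names are all assumptions made for illustration; practical implementations use index structures such as suffix arrays and then encode the pointers compactly.

```python
from typing import List, Tuple, Union

Factor = Union[Tuple[int, int], str]  # (position, length) pointer or a literal base


def rlz_compress(reference: str, target: str) -> List[Factor]:
    """Greedily split `target` into longest matches against `reference`."""
    factors: List[Factor] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Naive search over every starting position in the reference.
        for pos in range(len(reference)):
            length = 0
            while (pos + length < len(reference)
                   and i + length < len(target)
                   and reference[pos + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len == 0:
            factors.append(target[i])        # base absent from the reference
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def rlz_decompress(reference: str, factors: List[Factor]) -> str:
    """Rebuild the target by copying the referenced substrings."""
    out = []
    for factor in factors:
        if isinstance(factor, str):
            out.append(factor)
        else:
            pos, length = factor
            out.append(reference[pos:pos + length])
    return "".join(out)


reference = "ACGTACGGTCA"
target = "ACGGTCTACGT"
factors = rlz_compress(reference, target)
restored = rlz_decompress(reference, factors)
```

In this example the 11-base target is represented by three pointers into the reference. The approach pays off when many similar sequences, such as genomes of the same species, are stored against one shared reference.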


Privacy issues

Critics of DNA databases warn that the various uses of the technology can pose a threat to individual civil liberties.[20][21] Personal information contained in genetic material, such as markers that identify various genetic diseases and behavioral traits, could be used for discriminatory profiling, and its collection may constitute an invasion of privacy.[22] DNA can also be used to establish paternity and whether or not a child is adopted. The privacy and security of DNA databases have attracted considerable attention. Some people fear that their personal DNA information could easily be leaked; others feel that having their DNA profile recorded in a database marks them as a "criminal", and that being falsely accused of a crime can lead to having a "criminal" record for the rest of their lives.

The UK introduced new laws in 2001 that allowed DNA profiles to be kept in a database even if the suspect was acquitted; from 2003, an arrested person’s DNA could be taken immediately after arrest, before they were charged. These measures provoked a strong public reaction, because thousands of innocent people’s DNA may be retained in a “criminal” DNA database, and including larger numbers of innocent people’s DNA profiles does not help to solve more crimes.[23]

In European countries that have established a DNA database, some measures are used to protect the privacy of individuals, specifically criteria for removing DNA profiles from the database. Among the 22 European countries analyzed, most record the DNA profiles of suspects or of those convicted of serious crimes. In some countries, such as the Netherlands, an individual's DNA profile can only be taken if it can help to solve the criminal case, which means that a convicted suspect's DNA profile will not be stored in the DNA database.[24] Some countries (such as Belgium and France) may remove a criminal’s profile after 30–40 years, because these “criminal investigation” records are no longer needed. Most countries delete a suspect’s profile after the suspect is acquitted. All the countries have complete legislation intended to largely avoid the privacy issues that may arise during the use of a DNA database.[25]

Privacy issues with DNA databases concern not only threats arising when DNA samples are collected and analyzed, but also the protection and storage of this important personal information. Because DNA profiles can be stored indefinitely in a DNA database, concerns have been raised that the DNA samples could be used for new and unidentified purposes.[26] As the number of users with access to DNA databases increases, people worry that their information may be leaked or shared inappropriately; for example, their DNA profile may be shared with others, such as law enforcement agencies or other countries, without individual consent.[27]


  1. Santos, F., Machado, H., & Silva, S. (2013). Forensic DNA databases in European countries: is size linked to performance?. Life Sciences Society and Policy, 9(1), 12.
  2. Benson, D. A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., & Sayers, E. W. (2012). GenBank. Nucleic acids research, gks1195.
  3. Hagmann, M. (2000). UK plans major medical DNA database. Science, 287(5456), 1184-1184.
  4. Butler, J. M. (2011). Advanced Topics in Forensic DNA Typing: Methodology. Academic Press.
  5. ENFSI DNA Working Group. (2010). DNA database management: Review and Recommendation. The Hague (The Netherlands): ENFSI.
  6. Linacre, A. (2003). The UK National DNA Database. The Lancet, 361(9372), 1841-1842.
  7. Gill, P. (2002). Role of short tandem repeat DNA in forensic casework in the UK-past, present, and future perspectives. Biotechniques, 32(2), 366-385.
  8. Wallace, H. (2006). The UK national DNA database. EMBO reports, 7(1S), S26-S30.
  9. CODIS Brochure
  10. Butler, J. M. (2011). Advanced Topics in Forensic DNA Typing: Methodology. Academic Press.
  11. CrimTrac Biometric Services
  12. Mobbs, Jonathan D. "Crimtrac-technology and detection." 4th National Outlook Symposium on Crime in Australia, New Crimes or New Responses. Canberra. 2001.
  13. National DNA database completed
  14. Milot, E., Lecomte, M. M., Germain, H., & Crispino, F. (2013). The national DNA data bank of Canada: a Quebecer perspective. Frontiers in genetics, 4.
  15. Zamir, A., Dell’Ariccia-Carmon, A., Zaken, N., & Oz, C. (2012). The Israel DNA database—The establishment of a rapid, semi-automated analysis system. Forensic Science International: Genetics, 6(2), 286-289.
  16. Interpol Forensics
  17. Forensics
  18. Ateet Mehta & Bankim Patel, et al., 2010, "DNA Compression using Hash Based Data Structure", International Journal of Information Technology and Knowledge Management July-December 2010, Volume 2, No. 2, pp. 383-386
  19. Kuruppu, S. S. (2012). Compression of Large DNA Databases (Doctoral dissertation, The University of Melbourne).
  20. Jeffries, Stuart (27 October 2006). "Suspect nation". The Guardian.
  21. Lemieux, Scott (March 23, 2012). "Are Police Building a Massive DNA Database?". AlterNet.
  22. "DNA database 'breach of rights'". BBC News. 4 December 2008.
  23. Wallace, H. M., Jackson, A. R., Gruber, J., & Thibedeau, A. D. (2014). Forensic DNA databases: Ethical and legal standards: A global review. Egyptian Journal of Forensic Sciences.
  24. Martin, P. D., Schmitter, H., & Schneider, P. M. (2001). A brief history of the formation of DNA databases in forensic science within Europe. Forensic science international, 119(2), 225-231.
  25. Santos, F., Machado, H., & Silva, S. (2013). Forensic DNA databases in European countries: is size linked to performance?. Life Sciences Society and Policy, 9(1), 12.
  26. Roman-Santos, C. (2010). Concerns Associated with Expanding DNA Databases. Hastings Sci. & Tech. LJ, 2, 267.
  27. DNA databank proposal raises privacy concerns