Mega2, the Manipulation Environment for Genetic Analysis

Revision as of 08:22, 21 February 2019

Mega2
Original author(s): Charles P. Kollar, Nandita Mukhopadhyay, Lee Almasy, Mark Schroeder, William P. Mulvihill (previous programmers)
Developer(s): Daniel E. Weeks, Robert V. Baron, Justin R. Stickel
Initial release: 16 January 2000
Stable release: 5.0.1 / 13 December 2018
Written in: C++
Operating system: Linux, Mac OS X, Microsoft Windows
Type: Applied statistical genetics, bioinformatics
License: GNU General Public License version 3
Website: watson.hgen.pitt.edu/register/

Mega2 (short for manipulation environment for genetic analysis) allows the applied statistical geneticist to convert data from several input formats to a large number of output formats suitable for analysis by commonly used software packages.[1][2][3][4] In a typical human genetics study, the analyst often needs several different software programs to analyze the data, and these programs usually require that the data be formatted to their precise input specifications. Converting the data into these many formats can be tedious, time-consuming, and error-prone. By providing validated conversion pipelines, Mega2 can accelerate the analyses while reducing errors.

Mega2 produces a common intermediate data representation using SQLite3, which enables the data to be accessed by other programs and languages. In particular, the Mega2R R package converts the SQLite3 data into R data frames. Several R functions illustrate how data can be extracted from the data frames for common R analyses, such as those performed with the SKAT and pedgene packages. The key is the ability to efficiently extract genotypes corresponding to chosen subsets of markers, which facilitates gene-based association testing by automating the loop over genes in the genome. Further functions convert the data to VCF format and to GenABEL format. For more information about the Mega2R package, see https://watson.hgen.pitt.edu/mega2/mega2r/.
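The idea of looping over genes and pulling out only the genotypes for each gene's markers can be sketched in a few lines. The example below is only an illustration, written in Python rather than R: the table names (`markers`, `genotypes`), the column names, and the sample data are hypothetical and do not reflect the actual layout of a Mega2 SQLite3 database or the Mega2R API.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create a tiny in-memory database with a made-up schema (not Mega2's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE markers (marker TEXT PRIMARY KEY, gene TEXT);
        CREATE TABLE genotypes (person TEXT, marker TEXT, genotype TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO markers VALUES (?, ?)",
        [("rs1", "GENE_A"), ("rs2", "GENE_A"), ("rs3", "GENE_B")],
    )
    conn.executemany(
        "INSERT INTO genotypes VALUES (?, ?, ?)",
        [
            ("p1", "rs1", "A/A"), ("p1", "rs2", "A/G"), ("p1", "rs3", "C/C"),
            ("p2", "rs1", "A/G"), ("p2", "rs2", "G/G"), ("p2", "rs3", "C/T"),
        ],
    )
    conn.commit()
    return conn


def genotypes_by_gene(conn):
    """Yield (gene, {person: {marker: genotype}}) for each gene in turn."""
    genes = [row[0] for row in conn.execute(
        "SELECT DISTINCT gene FROM markers ORDER BY gene")]
    for gene in genes:
        # Select only the genotypes whose marker belongs to the current gene.
        rows = conn.execute(
            """
            SELECT g.person, g.marker, g.genotype
            FROM genotypes AS g
            JOIN markers AS m ON m.marker = g.marker
            WHERE m.gene = ?
            ORDER BY g.person, g.marker
            """,
            (gene,),
        )
        table = defaultdict(dict)
        for person, marker, genotype in rows:
            table[person][marker] = genotype
        yield gene, dict(table)


conn = build_demo_db()
per_gene = dict(genotypes_by_gene(conn))
for gene, table in per_gene.items():
    # A gene-based association test would be applied to `table` here.
    print(gene, table)
```

Because each iteration fetches only one gene's markers, memory use stays proportional to the size of a single gene rather than the whole genome, which is the main point of the subset-extraction approach described above.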

Mega2 has been used to facilitate genetic analyses of a wide variety of human traits, including hereditary dystonia,[5] Ehlers-Danlos syndrome,[6] multiple sclerosis,[7] and gliomas.[8] A list of PubMed Central articles citing Mega2 can be seen here.

Mega2, which focuses on data reformatting, should not be confused with MEGA (Molecular Evolutionary Genetics Analysis), a program that focuses on molecular evolution and phylogenetics.

Input file formats

Mega2 accepts input data in a variety of widely used file formats. These contain, at a minimum, data about the phenotypes, the marker genotypes, any family structures, and map positions of the markers.

Input format | Description | Links
LINKAGE[9][10][11][12] | pre-Makeped or post-Makeped formats | Linkage User Guide (PDF), LINKAGE format
Mega2[1][2][3][4] | simplified/augmented LINKAGE format | Mega2 format
PLINK[13] | ped format or binary bed format | PLINK documentation
VCF or BCF[14] | Variant Call Format or Binary Variant Call Format | Variant Call Format (Wikipedia entry), BCF documentation
IMPUTE2[15][16] | IMPUTE2 GEN and BGEN formats | IMPUTE2 documentation, GEN format, BGEN format
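To make the "minimum content" of an input file concrete, the sketch below parses a simplified LINKAGE-style pedigree record. It assumes the commonly described pre-Makeped column order (pedigree ID, individual ID, father, mother, sex, one affection-status column, then two alleles per marker); the marker names and sample records are invented, and real files handled by Mega2 may contain additional trait columns and be accompanied by separate locus and map files.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Person:
    pedigree: str
    person: str
    father: str   # "0" marks a founder with no recorded father
    mother: str   # "0" marks a founder with no recorded mother
    sex: int
    affection: int
    genotypes: Dict[str, Tuple[str, str]]


def parse_pedigree_line(line: str, marker_names: List[str]) -> Person:
    """Parse one whitespace-separated record in the assumed column layout."""
    fields = line.split()
    expected = 6 + 2 * len(marker_names)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, found {len(fields)}")
    ped, pid, father, mother, sex, affection = fields[:6]
    alleles = fields[6:]
    # Each marker contributes two consecutive allele columns.
    genotypes = {
        name: (alleles[2 * i], alleles[2 * i + 1])
        for i, name in enumerate(marker_names)
    }
    return Person(ped, pid, father, mother, int(sex), int(affection), genotypes)


# Hypothetical marker names and a three-person family (two founders, one child).
MARKERS = ["M1", "M2"]
PEDIGREE_TEXT = """\
1 1 0 0 1 1 1 2 3 3
1 2 0 0 2 1 1 1 3 4
1 3 1 2 1 2 1 2 3 4
"""

people = [
    parse_pedigree_line(line, MARKERS)
    for line in PEDIGREE_TEXT.splitlines()
    if line.strip()
]
founders = [p.person for p in people if p.father == "0" and p.mother == "0"]
print(founders)
```

The field-count check is deliberate: a record with a missing allele column is rejected outright rather than silently shifting every later genotype, which is the kind of error that validated conversion is meant to prevent.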

Output file formats

Mega2 supports conversion to the following output formats.

Output format | Links
ASPEX format | ASPEX
Allegro format[17] |
Beagle format[18][19] | BEAGLE
CRANEFOOT format[20] | CRANEFOOT
Eigenstrat format[21][22] | EIGENSOFT
FBAT format[23] | FBAT
GeneHunter format[24] | GeneHunter
GeneHunter-Plus format[25] | GeneHunter-Plus
IQLS/Idcoefs format[26][27] | IQLS, Idcoefs
Linkage format[9][10][11][12] | Linkage User Guide (PDF), LINKAGE format
Loki format[28] | Loki
MaCH/minimac3 format[29][30] | MaCH, minimac3
MLBQTL format[31] | MLB-QTL
Mega2 annotated format[1][2][3][4] | Mega2 format
Mendel format[32] | Mendel
Merlin format[33] | Merlin
Merlin/SimWalk2-NPL format[33][34] | Merlin, SimWalk2
PANGAEA MORGAN format[35][36] | MORGAN
PAP format[37] | PAP
PLINK format[13] (bed, lgen, or ped formats) | PLINK
PREST format[38][39] | PREST
PSEQ format | PSEQ
Pre-makeped LINKAGE format[9][10][11][12] | Linkage User Guide (PDF), LINKAGE format
ROADTRIPS format[40] | ROADTRIPS
SAGE format | SAGE, openSAGE
SHAPEIT format[41][42][43][44][45] | SHAPEIT
SIMULATE format[46] | SIMULATE
SLINK format[47][48] | FASTSLINK
SOLAR format[49][50] | SOLAR
SPLINK format[51] | SPLINK
SUP format[48][52] | SUP
SimWalk2 format[34] | SimWalk2
Structure format[53][54][55] | Structure
VCF format[14] | Variant Call Format (Wikipedia entry)
Vintage Mendel format[32][56] | Vintage Mendel
Vitesse format[57] | Vitesse

Documentation

The Mega2 documentation is available in both HTML and PDF formats.

References

  1. ^ a b c Mukhopadhyay, N; Almasy L; Schroeder M; Mulvihill WP; Weeks DE (1999). "Mega2, a data-handling program for facilitating genetic linkage and association analyses". Am J Hum Genet. 65: A436.
  2. ^ a b c Mukhopadhyay, N; Almasy L; Schroeder M; Mulvihill WP; Weeks DE (2005). "Mega2: data-handling for facilitating genetic linkage and association analyses". Bioinformatics. 21 (10): 2556–2557. doi:10.1093/bioinformatics/bti364. PMID 15746282.
  3. ^ a b c Kollar, CP; Baron RV; Mukhopadhyay N; Weeks DE (October 2013). "Mega2: enhanced data-handling for facilitating genetic linkage and association analyses". Presented at the 63rd Annual Meeting of the American Society of Human Genetics, Boston: Abstract 1831.
  4. ^ a b c Baron RV, Kollar C, Mukhopadhyay N, Weeks DE (2014). "Mega2: validated data-reformatting for linkage and association analyses". Source Code Biol Med. 9 (1): 26. doi:10.1186/s13029-014-0026-y. PMC 4269913. PMID 25687422.
  5. ^ Hersheson J, Mencacci NE, Davis M, Macdonald N, Trabzuni D, Ryten M, Pittman A, Paudel R, Kara E, Fawcett K, Plagnol V, Bhatia KP, Medlar AJ, Stanescu HC, Hardy J, Kleta R, Wood NW, Houlden H (2013). "Mutations in the autoregulatory domain of beta-tubulin 4a cause hereditary dystonia". Ann Neurol. 73 (4): 546–553. doi:10.1002/ana.23832. PMC 3698699. PMID 23424103.
  6. ^ Baumann M, Giunta C, Krabichler B, Ruschendorf F, Zoppi N, Colombi M, Bittner RE, Quijano-Roy S, Muntoni F, Cirak S, Schreiber G, Zou Y, Hu Y, Romero NB, Carlier RY, Amberger A, Deutschmann A, Straub V, Rohrbach M, Steinmann B, Rostasy K, Karall D, Bonnemann CG, Zschocke J, Fauth C (2012). "Mutations in FKBP14 cause a variant of Ehlers-Danlos syndrome with progressive kyphoscoliosis, myopathy, and hearing loss". Am J Hum Genet. 90 (2): 201–216. doi:10.1016/j.ajhg.2011.12.004. PMC 3276673. PMID 22265013.
  7. ^ Dyment DA, Cader MZ, Chao MJ, Lincoln MR, Morrison KM, Disanto G, Morahan JM, De Luca GC, Sadovnick AD, Lepage P, Montpetit A, Ebers GC, Ramagopalan SV (2012). "Exome sequencing identifies a novel multiple sclerosis susceptibility variant in the TYK2 gene". Neurology. 79 (5): 406–411. doi:10.1212/wnl.0b013e3182616fc4. PMC 3405256. PMID 22744673.
  8. ^ Shete S, Lau CC, Houlston RS, Claus EB, Barnholtz-Sloan J, Lai R, Il'yasova D, Schildkraut J, Sadetzki S, Johansen C, Bernstein JL, Olson SH, Jenkins RB, Yang P, Vick NA, Wrensch M, Davis FG, McCarthy BJ, Leung EH, Davis C, Cheng R, Hosking FJ, Armstrong GN, Liu Y, Yu RK, Henriksson R, Gliogene C, Melin BS, Bondy ML (2011). "Genome-wide high-density SNP linkage search for glioma susceptibility loci: results from the Gliogene Consortium". Cancer Res. 71 (24): 7568–7575. doi:10.1158/0008-5472.can-11-0013. PMC 3242820. PMID 22037877.
  9. ^ a b c Lathrop GM, Lalouel JM (1984). "Easy calculations of lod scores and genetic risks on small computers". Am J Hum Genet. 36: 460–465.
  10. ^ a b c Lathrop GM, Lalouel JM, Julier C, Ott J (1985). "Multilocus linkage analysis in humans: detection of linkage and estimation of recombination". Am J Hum Genet. 37 (3): 482–498.
  11. ^ a b c Lathrop GM, Lalouel JM, White RL (1986). "Construction of human linkage maps: likelihood calculations for multilocus analysis". Genet Epidemiol. 3 (1): 39–52. doi:10.1002/gepi.1370030105. PMID 3957003.
  12. ^ a b c Lathrop GM, Lalouel JM (1988). "Efficient computations in multilocus linkage analysis". Am J Hum Genet. 42 (3): 498–505. PMC 1715153. PMID 3162348.
  13. ^ a b Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. (2011). "The variant call format and VCFtools". Bioinformatics. 27 (15): 2156–8. doi:10.1093/bioinformatics/btr330. PMC 3137218. PMID 21653522.
  14. ^ Howie BN, Donnelly P, Marchini J (2009). "A flexible and accurate genotype imputation method for the next generation of genome-wide association studies". PLoS Genet. 5 (6): e1000529. doi:10.1371/journal.pgen.1000529. PMC 2689936. PMID 19543373.{{cite journal}}: CS1 maint: unflagged free DOI (link)
  15. ^ Marchini J, Howie B (2010). "Genotype imputation for genome-wide association studies". Nat Rev Genet. 11 (7): 499–511. doi:10.1038/nrg2796. PMID 20517342.
  16. ^ Gudbjartsson DF, Jonasson K, Frigge ML, Kong A (2000). "Allegro, a new computer program for multipoint linkage analysis". Nat Genet. 25 (1): 12–13. doi:10.1038/75514. PMID 10802644.
  17. ^ Browning SR, Browning BL (2007). "Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering". Am J Hum Genet. 81 (5): 1084–1097. doi:10.1086/521987. PMC 2265661. PMID 17924348.
  18. ^ Browning BL, Browning SR (2009). "A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals". Am J Hum Genet. 84 (2): 210–223. doi:10.1016/j.ajhg.2009.01.005. PMC 2668004. PMID 19200528.
  19. ^ Makinen VP, Parkkonen M, Wessman M, Groop PH, Kanninen T, Kaski K (2005). "High-throughput pedigree drawing". Eur J Hum Genet. 13 (8): 987–989. doi:10.1038/sj.ejhg.5201430. PMID 15870825.
  20. ^ Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006). "Principal components analysis corrects for stratification in genome-wide association studies". Nat Genet. 38 (8): 904–909. doi:10.1038/ng1847. PMID 16862161.
  21. ^ Patterson N, Price AL, Reich D (2006). "Population structure and eigenanalysis". PLoS Genet. 2 (12): e190. doi:10.1371/journal.pgen.0020190. PMC 1713260. PMID 17194218.
  22. ^ Laird NM, Horvath S, Xu X (2000). "Implementing a unified approach to family-based tests of association". Genet Epidemiol. 19 (Suppl 1): S36–42. doi:10.1002/1098-2272(2000)19:1+<::aid-gepi6>3.3.co;2-d.
  23. ^ Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996). "Parametric and nonparametric linkage analysis: a unified multipoint approach". Am J Hum Genet. 58: 1347–1363.
  24. ^ Kong A, Cox NJ (1997). "Allele-sharing models: LOD scores and accurate linkage tests". Am J Hum Genet. 61 (5): 1179–1188. doi:10.1086/301592. PMC 1716027. PMID 9345087.
  25. ^ Wang Z, McPeek MS (2009). "An Incomplete-Data Quasi-likelihood Approach to Haplotype-Based Genetic Association Studies on Related Individuals". J Am Stat Assoc. 104 (487): 1251–1260. doi:10.1198/jasa.2009.tm08507. PMC 2860453. PMID 20428335.
  26. ^ Abney M (2009). "A graphical algorithm for fast computation of identity coefficients and generalized kinship coefficients". Bioinformatics. 25 (12): 1561–1563. doi:10.1093/bioinformatics/btp185. PMC 2687941. PMID 19359355.
  27. ^ Heath SC (1997). "Markov chain Monte Carlo segregation and linkage analysis for oligogenic models". Am J Hum Genet. 61 (3): 748–760. doi:10.1086/515506. PMID 9326339.
  28. ^ Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR (2012). "Fast and accurate genotype imputation in genome-wide association studies through pre-phasing". Nat Genet. 44 (8): 955–959. doi:10.1038/ng.2354. PMC 3696580. PMID 22820512.
  29. ^ Fuchsberger C, Abecasis GR, Hinds DA (2015). "minimac2: faster genotype imputation". Bioinformatics. 31 (5): 782–784. doi:10.1093/bioinformatics/btu704. PMC 4341061. PMID 25338720.
  30. ^ Alcais A, Philippi A, Abel L (1999). "Genetic model-free linkage analysis using the maximum-likelihood-binomial method for categorical traits". Genet Epidemiol. 17 (Suppl 1): S467–472. doi:10.1002/gepi.1370170775.
  31. ^ a b Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM (2013). "Mendel: the Swiss army knife of genetic analysis programs". Bioinformatics. 29 (12): 1568–1570. doi:10.1093/bioinformatics/btt187. PMC 3673222. PMID 23610370.
  32. ^ a b Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002). "Merlin--rapid analysis of dense genetic maps using sparse gene flow trees". Nat Genet. 30 (1): 97–101. doi:10.1038/ng786. PMID 11731797.
  33. ^ a b Sobel E, Lange K (1996). "Descent graphs in pedigree analysis: Applications to haplotyping, location scores, and marker-sharing statistics". Am J Hum Genet. 58 (6): 1323–1337.
  34. ^ Thompson EA (1994). "Monte Carlo likelihood in the genetic mapping of complex traits". Philos Trans R Soc Lond B Biol Sci. 344 (1310): 345–350, discussion 350–351. doi:10.1098/rstb.1994.0073. PMID 7800704.
  35. ^ Thompson EA (1994). "Monte Carlo likelihood in genetic mapping". Statistical Science. 9 (3): 355–366. doi:10.1214/ss/1177010381.
  36. ^ Hasstedt SJ (2005). "jPAP: Document-driven software for genetic analysis". Genet Epidemiol. 29: 255.
  37. ^ McPeek MS, Sun L (2000). "Statistical tests for detection of misspecified relationships by use of genome-screen data". Am J Hum Genet. 66 (3): 1076–1094. doi:10.1086/302800. PMC 1288143. PMID 10712219.
  38. ^ Sun L, Wilder K, McPeek MS (2002). "Enhanced pedigree error detection". Hum Hered. 54 (2): 99–110. doi:10.1159/000067666. PMID 12566741.
  39. ^ Thornton T, McPeek MS (2010). "ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure". Am J Hum Genet. 86 (2): 172–184. doi:10.1016/j.ajhg.2010.01.001. PMC 2820184. PMID 20137780.
  40. ^ Delaneau O, Marchini J, Zagury JF (2012). "A linear complexity phasing method for thousands of genomes". Nat Methods. 9 (2): 179–181. doi:10.1038/nmeth.1785. PMID 22138821.
  41. ^ Delaneau O, Zagury JF, Marchini J (2013). "Improved whole-chromosome phasing for disease and population genetic studies". Nat Methods. 10 (1): 5–6. doi:10.1038/nmeth.2307. PMID 23269371.
  42. ^ Delaneau O, Howie B, Cox AJ, Zagury JF, Marchini J (2013). "Haplotype estimation using sequencing reads". Am J Hum Genet. 93 (4): 687–696. doi:10.1016/j.ajhg.2013.09.002. PMC 3791270. PMID 24094745.
  43. ^ O'Connell J, Gurdasani D, Delaneau O, Pirastu N, Ulivi S, Cocca M, et al. (2014). "A general approach for haplotype phasing across the full spectrum of relatedness". PLoS Genet. 10 (4): e1004234. doi:10.1371/journal.pgen.1004234. PMC 3990520. PMID 24743097.
  44. ^ Delaneau O, Marchini J, The 1000 Genomes Project Consortium (2014). "Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel". Nat Commun. 5: 3934. doi:10.1038/ncomms4934. PMC 4338501. PMID 25653097.
  45. ^ Speer M, Terwilliger JD, Ott J (1992). "A chromosome-based method for rapid computer simulation". Am J Hum Genet. 51: A202.
  46. ^ Weeks DE, Ott J, Lathrop GM (1990). "SLINK: a general simulation program for linkage analysis". Am J Hum Genet. 47 (3): A204.
  47. ^ Blangero J, Almasy L (1997). "Multipoint oligogenic linkage analysis of quantitative traits". Genet Epidemiol. 14 (6): 959–964. doi:10.1002/(sici)1098-2272(1997)14:6<959::aid-gepi66>3.0.co;2-k.
  48. ^ Almasy L, Blangero J (1998). "Multipoint quantitative-trait linkage analysis in general pedigrees". Am J Hum Genet. 62 (5): 1198–1211. doi:10.1086/301844.
  49. ^ Holmans P (1993). "Asymptotic properties of affected-sib-pair linkage analysis". Am J Hum Genet. 52 (2): 362–374.
  50. ^ Lemire M (2006). "SUP: an extension to SLINK to allow a larger number of marker loci to be simulated in pedigrees conditional on trait values". BMC Genet. 7: 40. doi:10.1186/1471-2156-7-40. PMC 1524809. PMID 16803631.
  51. ^ Pritchard JK, Stephens M, Donnelly P (2000). "Inference of population structure using multilocus genotype data". Genetics. 155 (2): 945–959. PMC 1461096. PMID 10835412.
  52. ^ Falush D, Stephens M, Pritchard JK (2003). "Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies". Genetics. 164 (4): 1567–1587.
  53. ^ Falush D, Stephens M, Pritchard JK (2007). "Inference of population structure using multilocus genotype data: dominant markers and null alleles". Mol Ecol Notes. 7 (4): 574–578. doi:10.1111/j.1471-8286.2007.01758.x. PMC 1974779. PMID 18784791.
  54. ^ Lange K, Weeks D, Boehnke M (1988). "Programs for pedigree analysis: MENDEL, FISHER, and dGENE". Genet Epidemiol. 5 (6): 471–472. doi:10.1002/gepi.1370050611.
  55. ^ O'Connell JR, Weeks DE (1995). "The VITESSE algorithm for rapid exact multilocus linkage analysis via genotype set-recoding and fuzzy inheritance". Nat Genet. 11 (4): 402–408. doi:10.1038/ng1295-402. PMID 7493020.

External links