Multifactor dimensionality reduction: Difference between revisions

updating citations
updated citations
Line 1: Line 1:
'''Multifactor dimensionality reduction (MDR)''' is a [[machine learning]] approach<ref>{{Cite journal|last=McKinney|first=Brett A.|last2=Reif|first2=David M.|last3=Ritchie|first3=Marylyn D.|last4=Moore|first4=Jason H.|date=2006-01-01|title=Machine learning for detecting gene-gene interactions: a review|url=https://www.ncbi.nlm.nih.gov/pubmed/16722772|journal=Applied Bioinformatics|volume=5|issue=2|pages=77–88|issn=1175-5636|pmc=PMC3244050|pmid=16722772}}</ref> for detecting and characterizing combinations of [[Attribute (computing)|attribute]]s or [[independent variable]]s that interact to influence a dependent or class variable.<ref>{{Cite journal|last=Ritchie|first=Marylyn D.|last2=Hahn|first2=Lance W.|last3=Roodi|first3=Nady|last4=Bailey|first4=L. Renee|last5=Dupont|first5=William D.|last6=Parl|first6=Fritz F.|last7=Moore|first7=Jason H.|date=2001-07-01|title=Multifactor-Dimensionality Reduction Reveals High-Order Interactions among Estrogen-Metabolism Genes in Sporadic Breast Cancer|url=http://www.cell.com/ajhg/fulltext/S0002-9297(07)61453-0|journal=The American Journal of Human Genetics|language=English|volume=69|issue=1|pages=138–147|doi=10.1086/321276|issn=0002-9297|pmc=PMC1226028|pmid=11404819}}</ref><ref>{{Cite journal|last=Ritchie|first=Marylyn D.|last2=Hahn|first2=Lance W.|last3=Moore|first3=Jason H.|date=2003-02-01|title=Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity|url=http://onlinelibrary.wiley.com/doi/10.1002/gepi.10218/abstract|journal=Genetic Epidemiology|language=en|volume=24|issue=2|pages=150–157|doi=10.1002/gepi.10218|issn=1098-2272}}</ref><ref>{{Cite journal|last=Hahn|first=L. W.|last2=Ritchie|first2=M. D.|last3=Moore|first3=J. 
H.|date=2003-02-12|title=Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions|url=https://academic.oup.com/bioinformatics/article/19/3/376/258073/Multifactor-dimensionality-reduction-software-for|journal=Bioinformatics|language=en|volume=19|issue=3|pages=376–382|doi=10.1093/bioinformatics/btf869|issn=1367-4803}}</ref><ref>{{Cite journal|last=W.|first=Hahn, Lance|last2=H.|first2=Moore, Jason|date=2004-01-01|title=Ideal Discrimination of Discrete Clinical Endpoints Using Multilocus Genotypes|url=http://content.iospress.com/articles/in-silico-biology/isb00126|journal=In Silico Biology|language=en|volume=4|issue=2|issn=1386-6338}}</ref><ref>{{Cite journal|last=Moore|first=Jason H.|date=2004-11-01|title=Computational analysis of gene-gene interactions using multifactor dimensionality reduction|url=http://dx.doi.org/10.1586/14737159.4.6.795|journal=Expert Review of Molecular Diagnostics|volume=4|issue=6|pages=795–803|doi=10.1586/14737159.4.6.795|issn=1473-7159}}</ref><ref name=":1" /> MDR was designed specifically to identify nonadditive [[interaction]]s among [[discrete random variable|discrete]] variables that influence a [[binary numeral system|binary]] outcome and is considered a [[nonparametric]] and model-free alternative to traditional statistical methods such as [[logistic regression]].
'''Multifactor dimensionality reduction (MDR)''' is a [[machine learning]] approach<ref>{{Cite journal|last=McKinney|first=Brett A.|last2=Reif|first2=David M.|last3=Ritchie|first3=Marylyn D.|last4=Moore|first4=Jason H.|date=2006-01-01|title=Machine learning for detecting gene-gene interactions: a review|url=https://www.ncbi.nlm.nih.gov/pubmed/16722772|journal=Applied Bioinformatics|volume=5|issue=2|pages=77–88|issn=1175-5636|pmc=PMC3244050|pmid=16722772}}</ref> for detecting and characterizing combinations of [[Attribute (computing)|attribute]]s or [[independent variable]]s that interact to influence a dependent or class variable.<ref>{{Cite journal|last=Ritchie|first=Marylyn D.|last2=Hahn|first2=Lance W.|last3=Roodi|first3=Nady|last4=Bailey|first4=L. Renee|last5=Dupont|first5=William D.|last6=Parl|first6=Fritz F.|last7=Moore|first7=Jason H.|date=2001-07-01|title=Multifactor-Dimensionality Reduction Reveals High-Order Interactions among Estrogen-Metabolism Genes in Sporadic Breast Cancer|url=http://www.cell.com/ajhg/fulltext/S0002-9297(07)61453-0|journal=The American Journal of Human Genetics|language=English|volume=69|issue=1|pages=138–147|doi=10.1086/321276|issn=0002-9297|pmc=PMC1226028|pmid=11404819}}</ref><ref>{{Cite journal|last=Ritchie|first=Marylyn D.|last2=Hahn|first2=Lance W.|last3=Moore|first3=Jason H.|date=2003-02-01|title=Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity|url=http://onlinelibrary.wiley.com/doi/10.1002/gepi.10218/abstract|journal=Genetic Epidemiology|language=en|volume=24|issue=2|pages=150–157|doi=10.1002/gepi.10218|issn=1098-2272}}</ref><ref>{{Cite journal|last=Hahn|first=L. W.|last2=Ritchie|first2=M. D.|last3=Moore|first3=J. H.|date=2003-02-12|title=Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions|url=https://academic.oup.com/bioinformatics/article/19/3/376/258073/Multifactor-dimensionality-reduction-software-for|journal=Bioinformatics|language=en|volume=19|issue=3|pages=376–382|doi=10.1093/bioinformatics/btf869|issn=1367-4803}}</ref><ref>{{Cite journal|last=Hahn|first=Lance W.|last2=Moore|first2=Jason H.|date=2004-01-01|title=Ideal Discrimination of Discrete Clinical Endpoints Using Multilocus Genotypes|url=http://content.iospress.com/articles/in-silico-biology/isb00126|journal=In Silico Biology|language=en|volume=4|issue=2|issn=1386-6338}}</ref><ref>{{Cite journal|last=Moore|first=Jason H.|date=2004-11-01|title=Computational analysis of gene-gene interactions using multifactor dimensionality reduction|url=http://dx.doi.org/10.1586/14737159.4.6.795|journal=Expert Review of Molecular Diagnostics|volume=4|issue=6|pages=795–803|doi=10.1586/14737159.4.6.795|issn=1473-7159}}</ref><ref name=":1" /><ref>{{Cite journal|last=Moore|first=Jason H.|date=2010-01-01|title=Detecting, characterizing, and interpreting nonlinear gene-gene interactions using multifactor dimensionality reduction|url=https://www.ncbi.nlm.nih.gov/pubmed/21029850|journal=Advances in Genetics|volume=72|pages=101–116|doi=10.1016/B978-0-12-380862-2.00005-9|issn=0065-2660|pmid=21029850}}</ref> MDR was designed specifically to identify nonadditive [[interaction]]s among [[discrete random variable|discrete]] variables that influence a [[binary numeral system|binary]] outcome and is considered a [[nonparametric]] and model-free alternative to traditional statistical methods such as [[logistic regression]].


The basis of the MDR method is a constructive induction or [[feature engineering]] algorithm that converts two or more variables or attributes to a single attribute.<ref>{{Cite journal|last=Moore|first=Jason H.|last2=Gilbert|first2=Joshua C.|last3=Tsai|first3=Chia-Ti|last4=Chiang|first4=Fu-Tien|last5=Holden|first5=Todd|last6=Barney|first6=Nate|last7=White|first7=Bill C.|date=2006-07-21|title=A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility|url=http://www.sciencedirect.com/science/article/pii/S0022519305005217|journal=Journal of Theoretical Biology|volume=241|issue=2|pages=252–261|doi=10.1016/j.jtbi.2005.11.036}}</ref> This process of constructing a new attribute changes the representation space of the data.<ref>{{Cite web|url=http://www.sciencedirect.com/science/article/pii/0004370283900164|title=A theory and methodology of inductive learning - ScienceDirect|website=www.sciencedirect.com|language=en|access-date=2017-05-06}}</ref> The end goal is to create or discover a representation that facilitates the detection of [[nonlinear]] or nonadditive interactions among the attributes such that prediction of the class variable is improved over that of the original representation of the data.
The basis of the MDR method is a constructive induction or [[feature engineering]] algorithm that converts two or more variables or attributes to a single attribute.<ref name=":2">{{Cite journal|last=Moore|first=Jason H.|last2=Gilbert|first2=Joshua C.|last3=Tsai|first3=Chia-Ti|last4=Chiang|first4=Fu-Tien|last5=Holden|first5=Todd|last6=Barney|first6=Nate|last7=White|first7=Bill C.|date=2006-07-21|title=A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility|url=http://www.sciencedirect.com/science/article/pii/S0022519305005217|journal=Journal of Theoretical Biology|volume=241|issue=2|pages=252–261|doi=10.1016/j.jtbi.2005.11.036}}</ref> This process of constructing a new attribute changes the representation space of the data.<ref>{{Cite web|url=http://www.sciencedirect.com/science/article/pii/0004370283900164|title=A theory and methodology of inductive learning - ScienceDirect|website=www.sciencedirect.com|language=en|access-date=2017-05-06}}</ref> The end goal is to create or discover a representation that facilitates the detection of [[nonlinear]] or nonadditive interactions among the attributes such that prediction of the class variable is improved over that of the original representation of the data.
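
The constructive induction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the commonly described MDR rule in which each combination of attribute values (a "cell") is labeled high risk when its ratio of cases to controls exceeds a threshold. The function name <code>mdr_construct</code>, the <code>threshold</code> parameter, and the toy XOR data are hypothetical and are not taken from any MDR software package.

```python
from collections import Counter


def mdr_construct(rows, labels, threshold=1.0):
    """Collapse several discrete attributes into one binary attribute.

    rows      : list of tuples, one tuple of attribute values per sample
    labels    : list of 0/1 class values (1 = case, 0 = control)
    threshold : assumed case:control ratio above which a cell is "high risk"
    """
    cases, controls = Counter(), Counter()
    for cell, y in zip(rows, labels):
        (cases if y == 1 else controls)[cell] += 1

    # A cell is high risk when cases / controls > threshold; written as a
    # product so that cells with zero controls need no special handling.
    high_risk = {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] > threshold * controls[cell]
    }
    # The new single attribute: 1 for samples in high-risk cells, else 0.
    constructed = [1 if cell in high_risk else 0 for cell in rows]
    return constructed, high_risk


# Toy data: the class is the XOR of two binary attributes, so neither
# attribute is informative on its own.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [a ^ b for a, b in rows]
constructed, high_risk = mdr_construct(rows, labels)
```

In this toy case the two original attributes are individually uninformative, while the single constructed attribute separates the classes, which is the change of representation described above.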


== Illustrative example ==
== Illustrative example ==
Line 37: Line 37:


== Machine learning with MDR ==
== Machine learning with MDR ==
As illustrated above, the basic constructive induction algorithm in MDR is very simple. However, its implementation for mining patterns from real data can be computationally complex. As with any machine learning algorithm there is always concern about [[overfitting]]. That is, machine learning algorithms are good at finding patterns in completely random data. It is often difficult to determine whether a reported pattern is an important signal or just chance. One approach is to estimate the generalizability of a model to independent datasets using methods such as [[cross-validation (statistics)|cross-validation]].<ref name=":0">{{Cite journal|last=Coffey|first=Christopher S.|last2=Hebert|first2=Patricia R.|last3=Ritchie|first3=Marylyn D.|last4=Krumholz|first4=Harlan M.|last5=Gaziano|first5=J. Michael|last6=Ridker|first6=Paul M.|last7=Brown|first7=Nancy J.|last8=Vaughan|first8=Douglas E.|last9=Moore|first9=Jason H.|date=2004-01-01|title=An application of conditional logistic regression and multifactor dimensionality reduction for detecting gene-gene Interactions on risk of myocardial infarction: The importance of model validation|url=http://dx.doi.org/10.1186/1471-2105-5-49|journal=BMC Bioinformatics|volume=5|pages=49|doi=10.1186/1471-2105-5-49|issn=1471-2105|pmc=PMC419697|pmid=15119966}}</ref><ref>{{Cite journal|last=Motsinger|first=Alison A.|last2=Ritchie|first2=Marylyn D.|date=2006-09-01|title=The effect of reduction in cross-validation intervals on the performance of multifactor dimensionality reduction|url=http://onlinelibrary.wiley.com/doi/10.1002/gepi.20166/abstract|journal=Genetic Epidemiology|language=en|volume=30|issue=6|pages=546–555|doi=10.1002/gepi.20166|issn=1098-2272}}</ref> Models that describe random data typically don't generalize. Another approach is to generate many random permutations of the data to see what the data mining algorithm finds when given the chance to overfit. 
[[Resampling (statistics)#Permutation tests|Permutation testing]] makes it possible to generate an empirical [[p-value]] for the result.<ref>{{Cite journal|last=Pattin|first=Kristine A.|last2=White|first2=Bill C.|last3=Barney|first3=Nate|last4=Gui|first4=Jiang|last5=Nelson|first5=Heather H.|last6=Kelsey|first6=Karl T.|last7=Andrew|first7=Angeline S.|last8=Karagas|first8=Margaret R.|last9=Moore|first9=Jason H.|date=2009-01-01|title=A computationally efficient hypothesis testing method for epistasis analysis using multifactor dimensionality reduction|url=http://onlinelibrary.wiley.com/doi/10.1002/gepi.20360/abstract|journal=Genetic Epidemiology|language=en|volume=33|issue=1|pages=87–94|doi=10.1002/gepi.20360|issn=1098-2272|pmc=PMC2700860|pmid=18671250}}</ref><ref>{{Cite book|url=http://www.worldscientific.com/doi/abs/10.1142/9789814295291_0035|title=Biocomputing 2010|last=Greene|first=Casey S.|last2=Himmelstein|first2=Daniel S.|last3=Nelson|first3=Heather H.|last4=Kelsey|first4=Karl T.|last5=Williams|first5=Scott M.|last6=Andrew|first6=Angeline S.|last7=Karagas|first7=Margaret R.|last8=Moore|first8=Jason H.|date=2009-10-01|publisher=WORLD SCIENTIFIC|isbn=9789814299473|pages=327–336|doi=10.1142/9789814295291_0035}}</ref> Replication in independent data may also provide evidence for an MDR model but can be sensitive to difference in the data sets.<ref>{{Cite journal|last=Greene|first=Casey S.|last2=Penrod|first2=Nadia M.|last3=Williams|first3=Scott M.|last4=Moore|first4=Jason H.|date=2009-06-02|title=Failure to Replicate a Genetic Association May Provide Important Clues About Genetic Architecture|url=http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0005639|journal=PLOS ONE|volume=4|issue=6|pages=e5639|doi=10.1371/journal.pone.0005639|issn=1932-6203|pmc=PMC2685469|pmid=19503614}}</ref><ref>{{Cite journal|last=Piette|first=Elizabeth R.|last2=Moore|first2=Jason H.|date=2017-04-19|title=Improving the Reproducibility of Genetic Association Results Using 
Genotype Resampling Methods|url=https://link.springer.com/chapter/10.1007/978-3-319-55849-3_7|journal=Applications of Evolutionary Computation|language=en|publisher=Springer, Cham|pages=96–108|doi=10.1007/978-3-319-55849-3_7}}</ref> These approaches have all been shown to be useful for choosing and evaluating MDR models. Tips and approaches for using MDR to model gene-gene interactions have been reviewed.<ref name=":1">{{Cite book|url=http://dx.doi.org/10.1007/978-1-4939-2155-3_16|title=Epistasis|last=Moore|first=JasonH.|last2=Andrews|first2=PeterC.|date=2015-01-01|publisher=Springer New York|isbn=9781493921546|editor-last=Moore|editor-first=Jason H.|series=Methods in Molecular Biology|pages=301–314|language=English|doi=10.1007/978-1-4939-2155-3_16|editor-last2=Williams|editor-first2=Scott M.}}</ref>
As illustrated above, the basic constructive induction algorithm in MDR is very simple. However, its implementation for mining patterns from real data can be computationally complex. As with any machine learning algorithm there is always concern about [[overfitting]]. That is, machine learning algorithms are good at finding patterns in completely random data. It is often difficult to determine whether a reported pattern is an important signal or just chance. One approach is to estimate the generalizability of a model to independent datasets using methods such as [[cross-validation (statistics)|cross-validation]].<ref name=":0">{{Cite journal|last=Coffey|first=Christopher S.|last2=Hebert|first2=Patricia R.|last3=Ritchie|first3=Marylyn D.|last4=Krumholz|first4=Harlan M.|last5=Gaziano|first5=J. Michael|last6=Ridker|first6=Paul M.|last7=Brown|first7=Nancy J.|last8=Vaughan|first8=Douglas E.|last9=Moore|first9=Jason H.|date=2004-01-01|title=An application of conditional logistic regression and multifactor dimensionality reduction for detecting gene-gene Interactions on risk of myocardial infarction: The importance of model validation|url=http://dx.doi.org/10.1186/1471-2105-5-49|journal=BMC Bioinformatics|volume=5|pages=49|doi=10.1186/1471-2105-5-49|issn=1471-2105|pmc=PMC419697|pmid=15119966}}</ref><ref>{{Cite journal|last=Motsinger|first=Alison A.|last2=Ritchie|first2=Marylyn D.|date=2006-09-01|title=The effect of reduction in cross-validation intervals on the performance of multifactor dimensionality reduction|url=http://onlinelibrary.wiley.com/doi/10.1002/gepi.20166/abstract|journal=Genetic Epidemiology|language=en|volume=30|issue=6|pages=546–555|doi=10.1002/gepi.20166|issn=1098-2272}}</ref> Models that describe random data typically don't generalize. Another approach is to generate many random permutations of the data to see what the data mining algorithm finds when given the chance to overfit. 
[[Resampling (statistics)#Permutation tests|Permutation testing]] makes it possible to generate an empirical [[p-value]] for the result.<ref>{{Cite journal|last=Pattin|first=Kristine A.|last2=White|first2=Bill C.|last3=Barney|first3=Nate|last4=Gui|first4=Jiang|last5=Nelson|first5=Heather H.|last6=Kelsey|first6=Karl T.|last7=Andrew|first7=Angeline S.|last8=Karagas|first8=Margaret R.|last9=Moore|first9=Jason H.|date=2009-01-01|title=A computationally efficient hypothesis testing method for epistasis analysis using multifactor dimensionality reduction|url=http://onlinelibrary.wiley.com/doi/10.1002/gepi.20360/abstract|journal=Genetic Epidemiology|language=en|volume=33|issue=1|pages=87–94|doi=10.1002/gepi.20360|issn=1098-2272|pmc=PMC2700860|pmid=18671250}}</ref><ref>{{Cite book|url=http://www.worldscientific.com/doi/abs/10.1142/9789814295291_0035|title=Biocomputing 2010|last=Greene|first=Casey S.|last2=Himmelstein|first2=Daniel S.|last3=Nelson|first3=Heather H.|last4=Kelsey|first4=Karl T.|last5=Williams|first5=Scott M.|last6=Andrew|first6=Angeline S.|last7=Karagas|first7=Margaret R.|last8=Moore|first8=Jason H.|date=2009-10-01|publisher=World Scientific|isbn=9789814299473|pages=327–336|doi=10.1142/9789814295291_0035}}</ref> Replication in independent data may also provide evidence for an MDR model but can be sensitive to differences between the data sets.<ref>{{Cite journal|last=Greene|first=Casey S.|last2=Penrod|first2=Nadia M.|last3=Williams|first3=Scott M.|last4=Moore|first4=Jason H.|date=2009-06-02|title=Failure to Replicate a Genetic Association May Provide Important Clues About Genetic Architecture|url=http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0005639|journal=PLOS ONE|volume=4|issue=6|pages=e5639|doi=10.1371/journal.pone.0005639|issn=1932-6203|pmc=PMC2685469|pmid=19503614}}</ref><ref>{{Cite journal|last=Piette|first=Elizabeth R.|last2=Moore|first2=Jason H.|date=2017-04-19|title=Improving the Reproducibility of Genetic Association Results Using Genotype Resampling Methods|url=https://link.springer.com/chapter/10.1007/978-3-319-55849-3_7|journal=Applications of Evolutionary Computation|language=en|publisher=Springer, Cham|pages=96–108|doi=10.1007/978-3-319-55849-3_7}}</ref> These approaches have all been shown to be useful for choosing and evaluating MDR models. An important step in a machine learning exercise is interpretation. Several approaches have been used with MDR, including entropy analysis<ref name=":2" /><ref>{{Cite journal|last=Moore|first=Jason H.|last2=Hu|first2=Ting|date=2015-01-01|title=Epistasis analysis using information theory|url=https://www.ncbi.nlm.nih.gov/pubmed/25403536|journal=Methods in Molecular Biology (Clifton, N.J.)|volume=1253|pages=257–268|doi=10.1007/978-1-4939-2155-3_13|issn=1940-6029|pmid=25403536}}</ref> and pathway analysis.<ref>{{Cite journal|last=Kim|first=Nora Chung|last2=Andrews|first2=Peter C.|last3=Asselbergs|first3=Folkert W.|last4=Frost|first4=H. Robert|last5=Williams|first5=Scott M.|last6=Harris|first6=Brent T.|last7=Read|first7=Cynthia|last8=Askland|first8=Kathleen D.|last9=Moore|first9=Jason H.|date=2012-07-28|title=Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS|url=https://www.ncbi.nlm.nih.gov/pubmed/22839596|journal=BioData Mining|volume=5|issue=1|pages=9|doi=10.1186/1756-0381-5-9|issn=1756-0381|pmc=PMC3463436|pmid=22839596}}</ref><ref>{{Cite journal|last=Cheng|first=Samantha|last2=Andrew|first2=Angeline S.|last3=Andrews|first3=Peter C.|last4=Moore|first4=Jason H.|date=2016-01-01|title=Complex systems analysis of bladder cancer susceptibility reveals a role for decarboxylase activity in two genome-wide association studies|url=https://www.ncbi.nlm.nih.gov/pubmed/27999618|journal=BioData Mining|volume=9|pages=40|doi=10.1186/s13040-016-0119-z|pmc=PMC5154053|pmid=27999618}}</ref>
Tips and approaches for using MDR to model gene-gene interactions have been reviewed.<ref name=":1">{{Cite book|url=http://dx.doi.org/10.1007/978-1-4939-2155-3_16|title=Epistasis|last=Moore|first=Jason H.|last2=Andrews|first2=Peter C.|date=2015-01-01|publisher=Springer New York|isbn=9781493921546|editor-last=Moore|editor-first=Jason H.|series=Methods in Molecular Biology|pages=301–314|language=English|doi=10.1007/978-1-4939-2155-3_16|editor-last2=Williams|editor-first2=Scott M.}}</ref><ref>{{Cite journal|last=Gola|first=Damian|last2=Mahachie John|first2=Jestinah M.|last3=van Steen|first3=Kristel|last4=König|first4=Inke R.|date=2016-03-01|title=A roadmap to multifactor dimensionality reduction methods|url=https://www.ncbi.nlm.nih.gov/pubmed/26108231|journal=Briefings in Bioinformatics|volume=17|issue=2|pages=293–308|doi=10.1093/bib/bbv038|issn=1477-4054|pmc=PMC4793893|pmid=26108231}}</ref>
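
The permutation idea in this section can be illustrated with a short Python sketch. It is a simplified stand-in rather than the procedure of any cited paper: the score used here is the training accuracy of a per-cell majority rule (an assumption made to keep the example small; MDR analyses typically combine such scores with cross-validation), and the names <code>cell_majority_accuracy</code> and <code>permutation_p_value</code> are hypothetical.

```python
import random
from collections import Counter, defaultdict


def cell_majority_accuracy(cells, labels):
    """Accuracy of predicting, within each cell, that cell's majority class."""
    counts = defaultdict(Counter)
    for cell, y in zip(cells, labels):
        counts[cell][y] += 1
    correct = sum(max(c.values()) for c in counts.values())
    return correct / len(labels)


def permutation_p_value(cells, labels, score, n_permutations=200, seed=0):
    """Empirical p-value: how often shuffled labels score at least as well."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = score(cells, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)          # break any real attribute-class link
        if score(cells, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed data as one of the permutations,
    # so the estimate can never be exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy data: class is the XOR of two binary attributes (40 samples).
cells = [(0, 0), (0, 1), (1, 0), (1, 1)] * 10
labels = [a ^ b for a, b in cells]
observed, p_value = permutation_p_value(cells, labels, cell_majority_accuracy)
```

Because the same scoring procedure is rerun on every shuffled copy of the labels, the resulting null distribution reflects how well the method can fit pure noise, which is the point of comparing against permuted data.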


== Applications ==
== Extensions to MDR ==
MDR has mostly been applied to detecting gene-gene interactions or [[epistasis]] in genetic studies of common human diseases such as [[atrial fibrillation]]<ref>{{Cite journal|last=Tsai|first=Chia-Ti|last2=Lai|first2=Ling-Ping|last3=Lin|first3=Jiunn-Lee|last4=Chiang|first4=Fu-Tien|last5=Hwang|first5=Juey-Jen|last6=Ritchie|first6=Marylyn D.|last7=Moore|first7=Jason H.|last8=Hsu|first8=Kuan-Lih|last9=Tseng|first9=Chuen-Den|date=2004-04-06|title=Renin-Angiotensin System Gene Polymorphisms and Atrial Fibrillation|url=http://circ.ahajournals.org/content/109/13/1640|journal=Circulation|language=en|volume=109|issue=13|pages=1640–1646|doi=10.1161/01.CIR.0000124487.36586.26|issn=0009-7322|pmid=15023884}}</ref><ref>{{Cite journal|last=Asselbergs|first=Folkert W.|last2=Moore|first2=Jason H.|last3=van den Berg|first3=Maarten P.|last4=Rimm|first4=Eric B.|last5=de Boer|first5=Rudolf A.|last6=Dullaart|first6=Robin P.|last7=Navis|first7=Gerjan|last8=van Gilst|first8=Wiek H.|date=2006-01-01|title=A role for CETP TaqIB polymorphism in determining susceptibility to atrial fibrillation: a nested case control study|url=http://dx.doi.org/10.1186/1471-2350-7-39|journal=BMC Medical Genetics|volume=7|pages=39|doi=10.1186/1471-2350-7-39|issn=1471-2350|pmc=PMC1462991|pmid=16623947}}</ref>, [[autism]]<ref>{{Cite journal|last=Ma|first=D.Q.|last2=Whitehead|first2=P.L.|last3=Menold|first3=M.M.|last4=Martin|first4=E.R.|last5=Ashley-Koch|first5=A.E.|last6=Mei|first6=H.|last7=Ritchie|first7=M.D.|last8=DeLong|first8=G.R.|last9=Abramson|first9=R.K.|date=2005-09-01|title=Identification of Significant Association and Gene-Gene Interaction of GABA Receptor Subunit Genes in Autism|url=http://www.cell.com/ajhg/fulltext/S0002-9297(07)63019-5|journal=The American Journal of Human Genetics|language=English|volume=77|issue=3|pages=377–388|doi=10.1086/433195|issn=0002-9297|pmc=PMC1226204|pmid=16080114}}</ref>, [[bladder cancer]]<ref>{{Cite journal|last=Andrew|first=Angeline S.|last2=Nelson|first2=Heather 
H.|last3=Kelsey|first3=Karl T.|last4=Moore|first4=Jason H.|last5=Meng|first5=Alexis C.|last6=Casella|first6=Daniel P.|last7=Tosteson|first7=Tor D.|last8=Schned|first8=Alan R.|last9=Karagas|first9=Margaret R.|date=2006-05-01|title=Concordance of multiple analytical approaches demonstrates a complex relationship between DNA repair gene SNPs, smoking and bladder cancer susceptibility|url=https://academic.oup.com/carcin/article/27/5/1030/2476131/Concordance-of-multiple-analytical-approaches|journal=Carcinogenesis|volume=27|issue=5|pages=1030–1037|doi=10.1093/carcin/bgi284|issn=0143-3334}}</ref><ref>{{Cite journal|last=Andrew|first=Angeline S.|last2=Karagas|first2=Margaret R.|last3=Nelson|first3=Heather H.|last4=Guarrera|first4=Simonetta|last5=Polidoro|first5=Silvia|last6=Gamberini|first6=Sara|last7=Sacerdote|first7=Carlotta|last8=Moore|first8=Jason H.|last9=Kelsey|first9=Karl T.|date=2008-01-01|title=DNA Repair Polymorphisms Modify Bladder Cancer Risk: A Multi-factor Analytic Strategy|url=https://www.karger.com/Article/Abstract/108942|journal=Human Heredity|language=en|volume=65|issue=2|pages=105–118|doi=10.1159/000108942|issn=0001-5652|pmc=PMC2857629|pmid=17898541}}</ref>, [[breast cancer]]<ref>{{Cite journal|last=Cao|first=Jingjing|last2=Luo|first2=Chenglin|last3=Yan|first3=Rui|last4=Peng|first4=Rui|last5=Wang|first5=Kaijuan|last6=Wang|first6=Peng|last7=Ye|first7=Hua|last8=Song|first8=Chunhua|date=2016-12-01|title=rs15869 at miRNA binding site in BRCA2 is associated with breast cancer susceptibility|url=https://link.springer.com/article/10.1007/s12032-016-0849-2|journal=Medical Oncology|language=en|volume=33|issue=12|pages=135|doi=10.1007/s12032-016-0849-2|issn=1357-0560}}</ref>, [[cardiovascular disease]]<ref name=":0" />, [[hypertension]]<ref>{{Cite journal|last=Williams|first=Scott M.|last2=Ritchie|first2=Marylyn D.|last3=Phillips|first3=John A., III|last4=Dawson|first4=Elliot|last5=Prince|first5=Melissa|last6=Dzhura|first6=Elvira|last7=Willis|first7=Alecia|last8=Semenya|first8=Amma|last9=Summar|first9=Marshall|date=2004-01-01|title=Multilocus Analysis of Hypertension: A Hierarchical Approach|url=https://www.karger.com/Article/Abstract/77387|journal=Human Heredity|language=en|volume=57|issue=1|pages=28–38|doi=10.1159/000077387|issn=0001-5652}}</ref><ref>{{Cite journal|last=Sanada|first=Hironobu|last2=Yatabe|first2=Junichi|last3=Midorikawa|first3=Sanae|last4=Hashimoto|first4=Shigeatsu|last5=Watanabe|first5=Tsuyoshi|last6=Moore|first6=Jason H.|last7=Ritchie|first7=Marylyn D.|last8=Williams|first8=Scott M.|last9=Pezzullo|first9=John C.|date=2006-03-01|title=Single-Nucleotide Polymorphisms for Diagnosis of Salt-Sensitive Hypertension|url=http://clinchem.aaccjnls.org/content/52/3/352|journal=Clinical Chemistry|language=en|volume=52|issue=3|pages=352–360|doi=10.1373/clinchem.2005.059139|issn=0009-9147|pmid=16439609}}</ref>, [[pancreatic cancer]]<ref>{{Cite journal|last=Duell|first=Eric J.|last2=Bracci|first2=Paige M.|last3=Moore|first3=Jason H.|last4=Burk|first4=Robert D.|last5=Kelsey|first5=Karl T.|last6=Holly|first6=Elizabeth A.|date=2008-06-01|title=Detecting pathway-based gene-gene and gene-environment interactions in pancreatic cancer|url=https://www.ncbi.nlm.nih.gov/pubmed/18559563|journal=Cancer Epidemiology, Biomarkers & Prevention: A Publication of the American Association for Cancer Research, Cosponsored by the American Society of Preventive Oncology|volume=17|issue=6|pages=1470–1479|doi=10.1158/1055-9965.EPI-07-2797|issn=1055-9965|pmc=PMC4410856|pmid=18559563}}</ref>, and [[prostate cancer]]<ref>{{Cite journal|last=Xu|first=Jianfeng|last2=Lowey|first2=James|last3=Wiklund|first3=Fredrik|last4=Sun|first4=Jielin|last5=Lindmark|first5=Fredrik|last6=Hsu|first6=Fang-Chi|last7=Dimitrov|first7=Latchezar|last8=Chang|first8=Baoli|last9=Turner|first9=Aubrey R.|date=2005-11-01|title=The Interaction of 
Four Genes in the Inflammation Pathway Significantly Predicts Prostate Cancer Risk|url=http://cebp.aacrjournals.org/content/14/11/2563|journal=Cancer Epidemiology and Prevention Biomarkers|language=en|volume=14|issue=11|pages=2563–2568|doi=10.1158/1055-9965.EPI-05-0356|issn=1055-9965|pmid=16284379}}</ref>. It has also been applied to other biomedical problems such as the genetic analysis of [[pharmacology]] outcomes.<ref>{{Cite journal|last=Wilke|first=Russell A.|last2=Reif|first2=David M.|last3=Moore|first3=Jason H.|date=2005-11-01|title=Combinatorial Pharmacogenetics|url=https://www.nature.com/nrd/journal/v4/n11/full/nrd1874.html|journal=Nature Reviews Drug Discovery|language=en|volume=4|issue=11|pages=911–918|doi=10.1038/nrd1874|issn=1474-1776}}</ref> A central challenge is the scaling of MDR to [[big data]] such as that from [[Genome-wide association study|genome-wide association studies]] (GWAS)<ref>{{Cite journal|last=Moore|first=Jason H.|last2=Asselbergs|first2=Folkert W.|last3=Williams|first3=Scott M.|date=2010-02-15|title=Bioinformatics challenges for genome-wide association studies|url=https://www.ncbi.nlm.nih.gov/pubmed/20053841|journal=Bioinformatics (Oxford, England)|volume=26|issue=4|pages=445–455|doi=10.1093/bioinformatics/btp713|issn=1367-4811|pmc=PMC2820680|pmid=20053841}}</ref>. Several approaches have been used. One approach is to filter the features prior to MDR analysis<ref>{{Cite journal|last=Sun|first=Xiangqing|last2=Lu|first2=Qing|last3=Mukherjee|first3=Shubhabrata|last4=Mukheerjee|first4=Shubhabrata|last5=Crane|first5=Paul K.|last6=Elston|first6=Robert|last7=Ritchie|first7=Marylyn D.|date=2014-01-01|title=Analysis pipeline for the epistasis search - statistical versus biological filtering|url=https://www.ncbi.nlm.nih.gov/pubmed/24817878|journal=Frontiers in Genetics|volume=5|pages=106|doi=10.3389/fgene.2014.00106|pmc=PMC4012196|pmid=24817878}}</ref>. 
This can be done using biological knowledge through tools such as BioFilter<ref>{{Cite journal|last=Pendergrass|first=Sarah A.|last2=Frase|first2=Alex|last3=Wallace|first3=John|last4=Wolfe|first4=Daniel|last5=Katiyar|first5=Neerja|last6=Moore|first6=Carrie|last7=Ritchie|first7=Marylyn D.|date=2013-12-30|title=Genomic analyses with biofilter 2.0: knowledge driven filtering, annotation, and model development|url=https://www.ncbi.nlm.nih.gov/pubmed/24378202|journal=BioData Mining|volume=6|issue=1|pages=25|doi=10.1186/1756-0381-6-25|pmc=PMC3917600|pmid=24378202}}</ref>. It can also be done using computational tools such as ReliefF<ref>{{Cite journal|last=Moore|first=Jason H.|date=2015-01-01|title=Epistasis analysis using ReliefF|url=https://www.ncbi.nlm.nih.gov/pubmed/25403540|journal=Methods in Molecular Biology (Clifton, N.J.)|volume=1253|pages=315–325|doi=10.1007/978-1-4939-2155-3_17|issn=1940-6029|pmid=25403540}}</ref>. Another approach is to use [[stochastic search]] algorithms such as [[genetic programming]] to explore the search space of feature combinations<ref>{{Cite book|url=http://link.springer.com/chapter/10.1007/978-0-387-49650-4_2|title=Genetic Programming Theory and Practice IV|last=Moore|first=Jason H.|last2=White|first2=Bill C.|date=2007-01-01|publisher=Springer US|isbn=9780387333755|editor-last=Riolo|editor-first=Rick|series=Genetic and Evolutionary Computation|pages=11–28|language=en|doi=10.1007/978-0-387-49650-4_2|editor-last2=Soule|editor-first2=Terence|editor-last3=Worzel|editor-first3=Bill}}</ref>. 
Yet another approach is a brute-force search using [[High Performance Computing|high-performance computing]]<ref>{{Cite journal|last=Greene|first=Casey S.|last2=Sinnott-Armstrong|first2=Nicholas A.|last3=Himmelstein|first3=Daniel S.|last4=Park|first4=Paul J.|last5=Moore|first5=Jason H.|last6=Harris|first6=Brent T.|date=2010-03-01|title=Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS|url=https://www.ncbi.nlm.nih.gov/pubmed/20081222|journal=Bioinformatics (Oxford, England)|volume=26|issue=5|pages=694–695|doi=10.1093/bioinformatics/btq009|issn=1367-4811|pmc=PMC2828117|pmid=20081222}}</ref><ref>{{Cite journal|last=Bush|first=William S.|last2=Dudek|first2=Scott M.|last3=Ritchie|first3=Marylyn D.|date=2006-09-01|title=Parallel multifactor dimensionality reduction: a tool for the large-scale analysis of gene-gene interactions|url=https://www.ncbi.nlm.nih.gov/pubmed/16809395|journal=Bioinformatics (Oxford, England)|volume=22|issue=17|pages=2173–2174|doi=10.1093/bioinformatics/btl347|issn=1367-4811|pmc=PMC4939609|pmid=16809395}}</ref><ref>{{Cite journal|last=Sinnott-Armstrong|first=Nicholas A.|last2=Greene|first2=Casey S.|last3=Cancare|first3=Fabio|last4=Moore|first4=Jason H.|date=2009-07-24|title=Accelerating epistasis analysis in human genetics with consumer graphics hardware|url=https://www.ncbi.nlm.nih.gov/pubmed/19630950|journal=BMC research notes|volume=2|pages=149|doi=10.1186/1756-0500-2-149|issn=1756-0500|pmc=PMC2732631|pmid=19630950}}</ref>.
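
The brute-force strategy mentioned above amounts to scoring every combination of attributes. A minimal single-threaded Python sketch for pairs follows; it is illustrative only and does not reproduce any of the cited parallel or GPU implementations. The scoring rule (training accuracy of a per-cell majority vote), the function names, and the toy data are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def cell_majority_accuracy(cells, labels):
    """Accuracy of predicting, within each cell, that cell's majority class."""
    counts = defaultdict(Counter)
    for cell, y in zip(cells, labels):
        counts[cell][y] += 1
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def exhaustive_pair_search(data, labels):
    """Score every pair of attribute columns and return (best_score, (i, j))."""
    n_features = len(data[0])
    best = None
    for i, j in combinations(range(n_features), 2):
        cells = [(row[i], row[j]) for row in data]
        score = cell_majority_accuracy(cells, labels)
        if best is None or score > best[0]:
            best = (score, (i, j))
    return best


# Toy data: four binary attributes, class = XOR of attributes 1 and 3.
data = list(product([0, 1], repeat=4))
labels = [row[1] ^ row[3] for row in data]
best = exhaustive_pair_search(data, labels)
```

The loop visits n(n − 1)/2 pairs for n attributes, so the work grows quadratically for pairs and faster still for higher-order combinations, which is why filtering, stochastic search, or parallel hardware is used at genome-wide scale.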
Numerous extensions to MDR have been introduced. These include family-based methods<ref>{{Cite journal|last=Martin|first=E. R.|last2=Ritchie|first2=M. D.|last3=Hahn|first3=L.|last4=Kang|first4=S.|last5=Moore|first5=J. H.|date=2006-02-01|title=A novel method to identify gene-gene effects in nuclear families: the MDR-PDT|url=https://www.ncbi.nlm.nih.gov/pubmed/16374833|journal=Genetic Epidemiology|volume=30|issue=2|pages=111–123|doi=10.1002/gepi.20128|issn=0741-0395|pmid=16374833}}</ref><ref>{{Cite journal|last=Lou|first=Xiang-Yang|last2=Chen|first2=Guo-Bo|last3=Yan|first3=Lei|last4=Ma|first4=Jennie Z.|last5=Mangold|first5=Jamie E.|last6=Zhu|first6=Jun|last7=Elston|first7=Robert C.|last8=Li|first8=Ming D.|date=2008-10-01|title=A combinatorial approach to detecting gene-gene and gene-environment interactions in family studies|url=https://www.ncbi.nlm.nih.gov/pubmed/18834969|journal=American Journal of Human Genetics|volume=83|issue=4|pages=457–467|doi=10.1016/j.ajhg.2008.09.001|issn=1537-6605|pmc=PMC2561932|pmid=18834969}}</ref><ref>{{Cite journal|last=Cattaert|first=Tom|last2=Urrea|first2=Víctor|last3=Naj|first3=Adam C.|last4=De Lobel|first4=Lizzy|last5=De Wit|first5=Vanessa|last6=Fu|first6=Mao|last7=Mahachie John|first7=Jestinah M.|last8=Shen|first8=Haiqing|last9=Calle|first9=M. 
Luz|date=2010-04-22|title=FAM-MDR: a flexible family-based multifactor dimensionality reduction technique to detect epistasis using related individuals|url=https://www.ncbi.nlm.nih.gov/pubmed/20421984|journal=PloS One|volume=5|issue=4|pages=e10304|doi=10.1371/journal.pone.0010304|issn=1932-6203|pmc=PMC2858665|pmid=20421984}}</ref>, fuzzy methods<ref>{{Cite journal|last=Leem|first=Sangseob|last2=Park|first2=Taesung|date=2017-03-14|title=An empirical fuzzy multifactor dimensionality reduction method for detecting gene-gene interactions|url=https://www.ncbi.nlm.nih.gov/pubmed/28361694|journal=BMC genomics|volume=18|issue=Suppl 2|pages=115|doi=10.1186/s12864-017-3496-x|issn=1471-2164|pmc=PMC5374597|pmid=28361694}}</ref>, covariate adjustment<ref>{{Cite journal|last=Gui|first=Jiang|last2=Andrew|first2=Angeline S.|last3=Andrews|first3=Peter|last4=Nelson|first4=Heather M.|last5=Kelsey|first5=Karl T.|last6=Karagas|first6=Margaret R.|last7=Moore|first7=Jason H.|date=2010-01-01|title=A simple and computationally efficient sampling approach to covariate adjustment for multifactor dimensionality reduction analysis of epistasis|url=https://www.ncbi.nlm.nih.gov/pubmed/20924193|journal=Human Heredity|volume=70|issue=3|pages=219–225|doi=10.1159/000319175|issn=1423-0062|pmc=PMC2982850|pmid=20924193}}</ref>, survival methods<ref>{{Cite journal|last=Gui|first=Jiang|last2=Moore|first2=Jason H.|last3=Kelsey|first3=Karl T.|last4=Marsit|first4=Carmen J.|last5=Karagas|first5=Margaret R.|last6=Andrew|first6=Angeline S.|date=2011-01-01|title=A novel survival multifactor dimensionality reduction method for detecting gene-gene interactions with application to bladder cancer prognosis|url=https://www.ncbi.nlm.nih.gov/pubmed/20981448|journal=Human Genetics|volume=129|issue=1|pages=101–110|doi=10.1007/s00439-010-0905-5|issn=1432-1203|pmc=PMC3255326|pmid=20981448}}</ref><ref>{{Cite 
journal|last=Lee|first=Seungyeoun|last2=Son|first2=Donghee|last3=Yu|first3=Wenbao|last4=Park|first4=Taesung|date=2016-12-01|title=Gene-Gene Interaction Analysis for the Accelerated Failure Time Model Using a Unified Model-Based Multifactor Dimensionality Reduction Method|url=https://www.ncbi.nlm.nih.gov/pubmed/28154507|journal=Genomics & Informatics|volume=14|issue=4|pages=166–172|doi=10.5808/GI.2016.14.4.166|issn=1598-866X|pmc=PMC5287120|pmid=28154507}}</ref>, robust methods<ref>{{Cite journal|last=Gui|first=Jiang|last2=Andrew|first2=Angeline S.|last3=Andrews|first3=Peter|last4=Nelson|first4=Heather M.|last5=Kelsey|first5=Karl T.|last6=Karagas|first6=Margaret R.|last7=Moore|first7=Jason H.|date=2011-01-01|title=A robust multifactor dimensionality reduction method for detecting gene-gene interactions with application to the genetic analysis of bladder cancer susceptibility|url=https://www.ncbi.nlm.nih.gov/pubmed/21091664|journal=Annals of Human Genetics|volume=75|issue=1|pages=20–28|doi=10.1111/j.1469-1809.2010.00624.x|issn=1469-1809|pmc=PMC3057873|pmid=21091664}}</ref>, methods for quantitative traits<ref>{{Cite journal|last=Gui|first=Jiang|last2=Moore|first2=Jason H.|last3=Williams|first3=Scott M.|last4=Andrews|first4=Peter|last5=Hillege|first5=Hans L.|last6=van der Harst|first6=Pim|last7=Navis|first7=Gerjan|last8=Van Gilst|first8=Wiek H.|last9=Asselbergs|first9=Folkert W.|date=2013-01-01|title=A Simple and Computationally Efficient Approach to Multifactor Dimensionality Reduction Analysis of Gene-Gene Interactions for Quantitative Traits|url=https://www.ncbi.nlm.nih.gov/pubmed/23805232|journal=PloS One|volume=8|issue=6|pages=e66545|doi=10.1371/journal.pone.0066545|issn=1932-6203|pmc=PMC3689797|pmid=23805232}}</ref><ref>{{Cite journal|last=Lou|first=Xiang-Yang|last2=Chen|first2=Guo-Bo|last3=Yan|first3=Lei|last4=Ma|first4=Jennie Z.|last5=Zhu|first5=Jun|last6=Elston|first6=Robert C.|last7=Li|first7=Ming D.|date=2007-06-01|title=A generalized combinatorial approach 
for detecting gene-by-gene and gene-by-environment interactions with application to nicotine dependence|url=https://www.ncbi.nlm.nih.gov/pubmed/17503330|journal=American Journal of Human Genetics|volume=80|issue=6|pages=1125–1137|doi=10.1086/518312|issn=0002-9297|pmc=PMC1867100|pmid=17503330}}</ref>, and many others.

== Applications of MDR ==
MDR has mostly been applied to detecting gene-gene interactions or [[epistasis]] in genetic studies of common human diseases such as [[atrial fibrillation]]<ref>{{Cite journal|last=Tsai|first=Chia-Ti|last2=Lai|first2=Ling-Ping|last3=Lin|first3=Jiunn-Lee|last4=Chiang|first4=Fu-Tien|last5=Hwang|first5=Juey-Jen|last6=Ritchie|first6=Marylyn D.|last7=Moore|first7=Jason H.|last8=Hsu|first8=Kuan-Lih|last9=Tseng|first9=Chuen-Den|date=2004-04-06|title=Renin-Angiotensin System Gene Polymorphisms and Atrial Fibrillation|url=http://circ.ahajournals.org/content/109/13/1640|journal=Circulation|language=en|volume=109|issue=13|pages=1640–1646|doi=10.1161/01.CIR.0000124487.36586.26|issn=0009-7322|pmid=15023884}}</ref><ref>{{Cite journal|last=Asselbergs|first=Folkert W.|last2=Moore|first2=Jason H.|last3=van den Berg|first3=Maarten P.|last4=Rimm|first4=Eric B.|last5=de Boer|first5=Rudolf A.|last6=Dullaart|first6=Robin P.|last7=Navis|first7=Gerjan|last8=van Gilst|first8=Wiek H.|date=2006-01-01|title=A role for CETP TaqIB polymorphism in determining susceptibility to atrial fibrillation: a nested case control study|url=http://dx.doi.org/10.1186/1471-2350-7-39|journal=BMC Medical Genetics|volume=7|pages=39|doi=10.1186/1471-2350-7-39|issn=1471-2350|pmc=PMC1462991|pmid=16623947}}</ref>, [[autism]]<ref>{{Cite journal|last=Ma|first=D.Q.|last2=Whitehead|first2=P.L.|last3=Menold|first3=M.M.|last4=Martin|first4=E.R.|last5=Ashley-Koch|first5=A.E.|last6=Mei|first6=H.|last7=Ritchie|first7=M.D.|last8=DeLong|first8=G.R.|last9=Abramson|first9=R.K.|date=2005-09-01|title=Identification of Significant Association and Gene-Gene Interaction of GABA Receptor Subunit Genes in Autism|url=http://www.cell.com/ajhg/fulltext/S0002-9297(07)63019-5|journal=The American Journal of Human Genetics|language=English|volume=77|issue=3|pages=377–388|doi=10.1086/433195|issn=0002-9297|pmc=PMC1226204|pmid=16080114}}</ref>, [[bladder cancer]]<ref>{{Cite journal|last=Andrew|first=Angeline S.|last2=Nelson|first2=Heather 
H.|last3=Kelsey|first3=Karl T.|last4=Moore|first4=Jason H.|last5=Meng|first5=Alexis C.|last6=Casella|first6=Daniel P.|last7=Tosteson|first7=Tor D.|last8=Schned|first8=Alan R.|last9=Karagas|first9=Margaret R.|date=2006-05-01|title=Concordance of multiple analytical approaches demonstrates a complex relationship between DNA repair gene SNPs, smoking and bladder cancer susceptibility|url=https://academic.oup.com/carcin/article/27/5/1030/2476131/Concordance-of-multiple-analytical-approaches|journal=Carcinogenesis|volume=27|issue=5|pages=1030–1037|doi=10.1093/carcin/bgi284|issn=0143-3334}}</ref><ref>{{Cite journal|last=Andrew|first=Angeline S.|last2=Karagas|first2=Margaret R.|last3=Nelson|first3=Heather H.|last4=Guarrera|first4=Simonetta|last5=Polidoro|first5=Silvia|last6=Gamberini|first6=Sara|last7=Sacerdote|first7=Carlotta|last8=Moore|first8=Jason H.|last9=Kelsey|first9=Karl T.|date=2008-01-01|title=DNA Repair Polymorphisms Modify Bladder Cancer Risk: A Multi-factor Analytic Strategy|url=https://www.karger.com/Article/Abstract/108942|journal=Human Heredity|language=english|volume=65|issue=2|pages=105–118|doi=10.1159/000108942|issn=0001-5652|pmc=PMC2857629|pmid=17898541}}</ref><ref>{{Cite journal|last=Andrew|first=Angeline S.|last2=Hu|first2=Ting|last3=Gu|first3=Jian|last4=Gui|first4=Jiang|last5=Ye|first5=Yuanqing|last6=Marsit|first6=Carmen J.|last7=Kelsey|first7=Karl T.|last8=Schned|first8=Alan R.|last9=Tanyos|first9=Sam A.|date=2012-01-01|title=HSD3B and gene-gene interactions in a pathway-based analysis of genetic susceptibility to bladder cancer|url=https://www.ncbi.nlm.nih.gov/pubmed/23284679|journal=PloS One|volume=7|issue=12|pages=e51301|doi=10.1371/journal.pone.0051301|issn=1932-6203|pmc=PMC3526593|pmid=23284679}}</ref>, [[breast cancer]]<ref>{{Cite 
journal|last=Cao|first=Jingjing|last2=Luo|first2=Chenglin|last3=Yan|first3=Rui|last4=Peng|first4=Rui|last5=Wang|first5=Kaijuan|last6=Wang|first6=Peng|last7=Ye|first7=Hua|last8=Song|first8=Chunhua|date=2016-12-01|title=rs15869 at miRNA binding site in BRCA2 is associated with breast cancer susceptibility|url=https://link.springer.com/article/10.1007/s12032-016-0849-2|journal=Medical Oncology|language=en|volume=33|issue=12|pages=135|doi=10.1007/s12032-016-0849-2|issn=1357-0560}}</ref>, [[cardiovascular disease]]<ref name=":0" />, [[hypertension]]<ref>{{Cite journal|last=Williams|first=Scott M.|last2=Ritchie|first2=Marylyn D.|last3=III|first3=John A. Phillips|last4=Dawson|first4=Elliot|last5=Prince|first5=Melissa|last6=Dzhura|first6=Elvira|last7=Willis|first7=Alecia|last8=Semenya|first8=Amma|last9=Summar|first9=Marshall|date=2004-01-01|title=Multilocus Analysis of Hypertension: A Hierarchical Approach|url=https://www.karger.com/Article/Abstract/77387|journal=Human Heredity|language=english|volume=57|issue=1|pages=28–38|doi=10.1159/000077387|issn=0001-5652}}</ref><ref>{{Cite journal|last=Sanada|first=Hironobu|last2=Yatabe|first2=Junichi|last3=Midorikawa|first3=Sanae|last4=Hashimoto|first4=Shigeatsu|last5=Watanabe|first5=Tsuyoshi|last6=Moore|first6=Jason H.|last7=Ritchie|first7=Marylyn D.|last8=Williams|first8=Scott M.|last9=Pezzullo|first9=John C.|date=2006-03-01|title=Single-Nucleotide Polymorphisms for Diagnosis of Salt-Sensitive Hypertension|url=http://clinchem.aaccjnls.org/content/52/3/352|journal=Clinical Chemistry|language=en|volume=52|issue=3|pages=352–360|doi=10.1373/clinchem.2005.059139|issn=0009-9147|pmid=16439609}}</ref>, [[obesity]]<ref>{{Cite journal|last=De|first=Rishika|last2=Verma|first2=Shefali S.|last3=Holzinger|first3=Emily|last4=Hall|first4=Molly|last5=Burt|first5=Amber|last6=Carrell|first6=David S.|last7=Crosslin|first7=David R.|last8=Jarvik|first8=Gail P.|last9=Kuivaniemi|first9=Helena|date=2017-02-01|title=Identifying gene-gene interactions that 
are highly associated with four quantitative lipid traits across multiple cohorts|url=https://www.ncbi.nlm.nih.gov/pubmed/27848076|journal=Human Genetics|volume=136|issue=2|pages=165–178|doi=10.1007/s00439-016-1738-7|issn=1432-1203|pmid=27848076}}</ref><ref>{{Cite journal|last=De|first=Rishika|last2=Verma|first2=Shefali S.|last3=Drenos|first3=Fotios|last4=Holzinger|first4=Emily R.|last5=Holmes|first5=Michael V.|last6=Hall|first6=Molly A.|last7=Crosslin|first7=David R.|last8=Carrell|first8=David S.|last9=Hakonarson|first9=Hakon|date=2015-01-01|title=Identifying gene-gene interactions that are highly associated with Body Mass Index using Quantitative Multifactor Dimensionality Reduction (QMDR)|url=https://www.ncbi.nlm.nih.gov/pubmed/26674805|journal=BioData Mining|volume=8|pages=41|doi=10.1186/s13040-015-0074-0|pmc=PMC4678717|pmid=26674805}}</ref>, [[pancreatic cancer]]<ref>{{Cite journal|last=Duell|first=Eric J.|last2=Bracci|first2=Paige M.|last3=Moore|first3=Jason H.|last4=Burk|first4=Robert D.|last5=Kelsey|first5=Karl T.|last6=Holly|first6=Elizabeth A.|date=2008-06-01|title=Detecting pathway-based gene-gene and gene-environment interactions in pancreatic cancer|url=https://www.ncbi.nlm.nih.gov/pubmed/18559563|journal=Cancer Epidemiology, Biomarkers & Prevention: A Publication of the American Association for Cancer Research, Cosponsored by the American Society of Preventive Oncology|volume=17|issue=6|pages=1470–1479|doi=10.1158/1055-9965.EPI-07-2797|issn=1055-9965|pmc=PMC4410856|pmid=18559563}}</ref>, [[prostate cancer]]<ref>{{Cite journal|last=Xu|first=Jianfeng|last2=Lowey|first2=James|last3=Wiklund|first3=Fredrik|last4=Sun|first4=Jielin|last5=Lindmark|first5=Fredrik|last6=Hsu|first6=Fang-Chi|last7=Dimitrov|first7=Latchezar|last8=Chang|first8=Baoli|last9=Turner|first9=Aubrey R.|date=2005-11-01|title=The Interaction of Four Genes in the Inflammation Pathway Significantly Predicts Prostate Cancer 
Risk|url=http://cebp.aacrjournals.org/content/14/11/2563|journal=Cancer Epidemiology and Prevention Biomarkers|language=en|volume=14|issue=11|pages=2563–2568|doi=10.1158/1055-9965.EPI-05-0356|issn=1055-9965|pmid=16284379}}</ref><ref>{{Cite journal|last=Lavender|first=Nicole A.|last2=Rogers|first2=Erica N.|last3=Yeyeodu|first3=Susan|last4=Rudd|first4=James|last5=Hu|first5=Ting|last6=Zhang|first6=Jie|last7=Brock|first7=Guy N.|last8=Kimbro|first8=Kevin S.|last9=Moore|first9=Jason H.|date=2012-04-30|title=Interaction among apoptosis-associated sequence variants and joint effects on aggressive prostate cancer|url=https://www.ncbi.nlm.nih.gov/pubmed/22546513|journal=BMC medical genomics|volume=5|pages=11|doi=10.1186/1755-8794-5-11|issn=1755-8794|pmc=PMC3355002|pmid=22546513}}</ref><ref>{{Cite journal|last=Lavender|first=Nicole A.|last2=Benford|first2=Marnita L.|last3=VanCleave|first3=Tiva T.|last4=Brock|first4=Guy N.|last5=Kittles|first5=Rick A.|last6=Moore|first6=Jason H.|last7=Hein|first7=David W.|last8=Kidd|first8=La Creis R.|date=2009-11-16|title=Examination of polymorphic glutathione S-transferase (GST) genes, tobacco smoking and prostate cancer risk among men of African descent: a case-control study|url=https://www.ncbi.nlm.nih.gov/pubmed/19917083|journal=BMC cancer|volume=9|pages=397|doi=10.1186/1471-2407-9-397|issn=1471-2407|pmc=PMC2783040|pmid=19917083}}</ref> and [[tuberculosis]]<ref>{{Cite journal|last=Collins|first=Ryan L.|last2=Hu|first2=Ting|last3=Wejse|first3=Christian|last4=Sirugo|first4=Giorgio|last5=Williams|first5=Scott M.|last6=Moore|first6=Jason H.|date=2013-02-18|title=Multifactor dimensionality reduction reveals a three-locus epistatic interaction associated with susceptibility to pulmonary tuberculosis|url=https://www.ncbi.nlm.nih.gov/pubmed/23418869|journal=BioData Mining|volume=6|issue=1|pages=4|doi=10.1186/1756-0381-6-4|pmc=PMC3618340|pmid=23418869}}</ref>. 
It has also been applied to other biomedical problems such as the genetic analysis of [[pharmacology]] outcomes.<ref>{{Cite journal|last=Wilke|first=Russell A.|last2=Reif|first2=David M.|last3=Moore|first3=Jason H.|date=2005-11-01|title=Combinatorial Pharmacogenetics|url=https://www.nature.com/nrd/journal/v4/n11/full/nrd1874.html|journal=Nature Reviews Drug Discovery|language=en|volume=4|issue=11|pages=911–918|doi=10.1038/nrd1874|issn=1474-1776}}</ref> A central challenge is the scaling of MDR to [[big data]] such as that from [[Genome-wide association study|genome-wide association studies]] (GWAS)<ref>{{Cite journal|last=Moore|first=Jason H.|last2=Asselbergs|first2=Folkert W.|last3=Williams|first3=Scott M.|date=2010-02-15|title=Bioinformatics challenges for genome-wide association studies|url=https://www.ncbi.nlm.nih.gov/pubmed/20053841|journal=Bioinformatics (Oxford, England)|volume=26|issue=4|pages=445–455|doi=10.1093/bioinformatics/btp713|issn=1367-4811|pmc=PMC2820680|pmid=20053841}}</ref>. Several approaches have been used. One approach is to filter the features prior to MDR analysis<ref>{{Cite journal|last=Sun|first=Xiangqing|last2=Lu|first2=Qing|last3=Mukherjee|first3=Shubhabrata|last4=Mukheerjee|first4=Shubhabrata|last5=Crane|first5=Paul K.|last6=Elston|first6=Robert|last7=Ritchie|first7=Marylyn D.|date=2014-01-01|title=Analysis pipeline for the epistasis search - statistical versus biological filtering|url=https://www.ncbi.nlm.nih.gov/pubmed/24817878|journal=Frontiers in Genetics|volume=5|pages=106|doi=10.3389/fgene.2014.00106|pmc=PMC4012196|pmid=24817878}}</ref>. 
This can be done using biological knowledge through tools such as BioFilter<ref>{{Cite journal|last=Pendergrass|first=Sarah A.|last2=Frase|first2=Alex|last3=Wallace|first3=John|last4=Wolfe|first4=Daniel|last5=Katiyar|first5=Neerja|last6=Moore|first6=Carrie|last7=Ritchie|first7=Marylyn D.|date=2013-12-30|title=Genomic analyses with biofilter 2.0: knowledge driven filtering, annotation, and model development|url=https://www.ncbi.nlm.nih.gov/pubmed/24378202|journal=BioData Mining|volume=6|issue=1|pages=25|doi=10.1186/1756-0381-6-25|pmc=PMC3917600|pmid=24378202}}</ref>. It can also be done using computational tools such as ReliefF<ref>{{Cite journal|last=Moore|first=Jason H.|date=2015-01-01|title=Epistasis analysis using ReliefF|url=https://www.ncbi.nlm.nih.gov/pubmed/25403540|journal=Methods in Molecular Biology (Clifton, N.J.)|volume=1253|pages=315–325|doi=10.1007/978-1-4939-2155-3_17|issn=1940-6029|pmid=25403540}}</ref>. Another approach is to use [[stochastic search]] algorithms such as [[genetic programming]] to explore the search space of feature combinations<ref>{{Cite book|url=http://link.springer.com/chapter/10.1007/978-0-387-49650-4_2|title=Genetic Programming Theory and Practice IV|last=Moore|first=Jason H.|last2=White|first2=Bill C.|date=2007-01-01|publisher=Springer US|isbn=9780387333755|editor-last=Riolo|editor-first=Rick|series=Genetic and Evolutionary Computation|pages=11–28|language=en|doi=10.1007/978-0-387-49650-4_2|editor-last2=Soule|editor-first2=Terence|editor-last3=Worzel|editor-first3=Bill}}</ref>. 
Yet another approach is a brute-force search using [[High Performance Computing|high-performance computing]]<ref>{{Cite journal|last=Greene|first=Casey S.|last2=Sinnott-Armstrong|first2=Nicholas A.|last3=Himmelstein|first3=Daniel S.|last4=Park|first4=Paul J.|last5=Moore|first5=Jason H.|last6=Harris|first6=Brent T.|date=2010-03-01|title=Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS|url=https://www.ncbi.nlm.nih.gov/pubmed/20081222|journal=Bioinformatics (Oxford, England)|volume=26|issue=5|pages=694–695|doi=10.1093/bioinformatics/btq009|issn=1367-4811|pmc=PMC2828117|pmid=20081222}}</ref><ref>{{Cite journal|last=Bush|first=William S.|last2=Dudek|first2=Scott M.|last3=Ritchie|first3=Marylyn D.|date=2006-09-01|title=Parallel multifactor dimensionality reduction: a tool for the large-scale analysis of gene-gene interactions|url=https://www.ncbi.nlm.nih.gov/pubmed/16809395|journal=Bioinformatics (Oxford, England)|volume=22|issue=17|pages=2173–2174|doi=10.1093/bioinformatics/btl347|issn=1367-4811|pmc=PMC4939609|pmid=16809395}}</ref><ref>{{Cite journal|last=Sinnott-Armstrong|first=Nicholas A.|last2=Greene|first2=Casey S.|last3=Cancare|first3=Fabio|last4=Moore|first4=Jason H.|date=2009-07-24|title=Accelerating epistasis analysis in human genetics with consumer graphics hardware|url=https://www.ncbi.nlm.nih.gov/pubmed/19630950|journal=BMC research notes|volume=2|pages=149|doi=10.1186/1756-0500-2-149|issn=1756-0500|pmc=PMC2732631|pmid=19630950}}</ref>.


== Software ==
[http://www.epistasis.org www.epistasis.org] provides an [[open-source]] and freely-available MDR software package.


An [https://cran.r-project.org/web/packages/MDR/index.html R package] for MDR<ref>{{Cite journal|last=Winham|first=Stacey J.|last2=Motsinger-Reif|first2=Alison A.|date=2011-08-16|title=An R package implementation of multifactor dimensionality reduction|url=https://www.ncbi.nlm.nih.gov/pubmed/21846375|journal=BioData Mining|volume=4|issue=1|pages=24|doi=10.1186/1756-0381-4-24|issn=1756-0381|pmc=PMC3177775|pmid=21846375}}</ref>


A sklearn-compatible [https://github.com/EpistasisLab/scikit-mdr Python implementation]

An [https://cran.r-project.org/web/packages/mbmdr/index.html R package] for Model-Based MDR<ref>{{Cite journal|last=Calle|first=M. Luz|last2=Urrea|first2=Víctor|last3=Malats|first3=Núria|last4=Van Steen|first4=Kristel|date=2010-09-01|title=mbmdr: an R package for exploring gene-gene interactions associated with binary or quantitative traits|url=https://www.ncbi.nlm.nih.gov/pubmed/20595460|journal=Bioinformatics (Oxford, England)|volume=26|issue=17|pages=2198–2199|doi=10.1093/bioinformatics/btq352|issn=1367-4811|pmid=20595460}}</ref>


== See also ==

Revision as of 17:50, 14 May 2017

Multifactor dimensionality reduction (MDR) is a machine learning approach[1] for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable.[2][3][4][5][6][7][8] MDR was designed specifically to identify nonadditive interactions among discrete variables that influence a binary outcome and is considered a nonparametric and model-free alternative to traditional statistical methods such as logistic regression.

The basis of the MDR method is a constructive induction or feature engineering algorithm that converts two or more variables or attributes to a single attribute.[9] This process of constructing a new attribute changes the representation space of the data.[10] The end goal is to create or discover a representation that facilitates the detection of nonlinear or nonadditive interactions among the attributes such that prediction of the class variable is improved over that of the original representation of the data.

Illustrative example

Consider the following simple example using the exclusive OR (XOR) function. XOR is a logical operator that is commonly used in data mining and machine learning as an example of a function that is not linearly separable. The table below represents a simple dataset where the relationship between the attributes (X1 and X2) and the class variable (Y) is defined by the XOR function such that Y = X1 XOR X2.

Table 1

X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 0

A machine learning algorithm would need to discover or approximate the XOR function in order to accurately predict Y using information about X1 and X2. An alternative strategy would be to first change the representation of the data using constructive induction to facilitate predictive modeling. The MDR algorithm would change the representation of the data (X1 and X2) in the following manner. MDR starts by selecting two attributes. In this simple example, X1 and X2 are selected. Each combination of values for X1 and X2 is examined, and the number of times Y=1 and Y=0 occur is counted. In this simple example, Y=1 occurs zero times and Y=0 occurs once for the combination of X1=0 and X2=0. With MDR, the ratio of these counts is computed and compared to a fixed threshold. Here, the ratio of counts is 0/1, which is less than our fixed threshold of 1. Since 0/1 < 1, we encode a new attribute (Z) as a 0. When the ratio is greater than one, we encode Z as a 1. For example, for the combination of X1=0 and X2=1, Y=1 occurs once and Y=0 never occurs, so Z is encoded as a 1. This process is repeated for all unique combinations of values for X1 and X2. Table 2 illustrates our new transformation of the data.

Table 2

Z Y
0 0
1 1
1 1
0 0

The machine learning algorithm now has much less work to do to find a good predictive function. In fact, in this very simple example, the function Y = Z has a classification accuracy of 1. A nice feature of constructive induction methods such as MDR is the ability to use any data mining or machine learning method to analyze the new representation of the data. Decision trees, neural networks, or a naive Bayes classifier could be used in combination with measures of model quality such as balanced accuracy[11] and mutual information[12].
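The transformation from Table 1 to Table 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the reference MDR implementation: the function names `mdr_construct` and `balanced_accuracy` are hypothetical, the count comparison is written as a multiplication so that combinations with no Y=0 observations do not cause a division by zero, and combinations whose ratio exactly equals the threshold are assumed to be encoded as 0.

```python
from collections import Counter


def mdr_construct(rows, labels, threshold=1.0):
    """Map each observed combination of attribute values to Z=1 or Z=0.

    Assumption: a combination is encoded as 1 when the count of Y=1
    exceeds threshold * (count of Y=0); ties are encoded as 0.
    """
    pos, neg = Counter(), Counter()
    for row, y in zip(rows, labels):
        if y == 1:
            pos[tuple(row)] += 1
        else:
            neg[tuple(row)] += 1
    mapping = {}
    for cell in set(pos) | set(neg):
        # Equivalent to pos/neg > threshold, but safe when neg is zero.
        mapping[cell] = 1 if pos[cell] > threshold * neg[cell] else 0
    return mapping


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for a binary outcome."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


# Table 1: Y = X1 XOR X2
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]

mapping = mdr_construct(X, Y)
Z = [mapping[x] for x in X]  # the constructed attribute of Table 2
print(Z)                         # [0, 1, 1, 0]
print(balanced_accuracy(Y, Z))   # 1.0
```

Because the constructed attribute Z is an ordinary discrete variable, it could equally be passed to any other classifier in place of the simple rule Y = Z used here.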

Machine learning with MDR

As illustrated above, the basic constructive induction algorithm in MDR is very simple. However, its implementation for mining patterns from real data can be computationally complex. As with any machine learning algorithm, there is always concern about overfitting. That is, machine learning algorithms are good at finding patterns even in completely random data. It is often difficult to determine whether a reported pattern is an important signal or just chance. One approach is to estimate the generalizability of a model to independent datasets using methods such as cross-validation.[13][14] Models that describe random data typically do not generalize. Another approach is to generate many random permutations of the data to see what the data mining algorithm finds when given the chance to overfit. Permutation testing makes it possible to generate an empirical p-value for the result.[15][16] Replication in independent data may also provide evidence for an MDR model but can be sensitive to differences between the data sets.[17][18] These approaches have all been shown to be useful for choosing and evaluating MDR models. An important step in a machine learning exercise is interpretation. Several approaches have been used with MDR including entropy analysis[9][19] and pathway analysis[20][21]. Tips and approaches for using MDR to model gene-gene interactions have been reviewed.[7][22]
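The permutation approach described above can be sketched as follows. This is an illustrative sketch, not the procedure used by any particular MDR software: the helper names `mdr_accuracy` and `permutation_p_value` are hypothetical, the score is the training accuracy of the high-risk/low-risk rule with a threshold of 1 (published MDR analyses typically permute a cross-validated statistic instead), and the add-one correction in the p-value is an assumption made so that the estimate is never exactly zero.

```python
import random
from collections import Counter


def mdr_accuracy(rows, labels):
    """Training accuracy of the MDR rule (threshold 1) built from the data."""
    pos, neg = Counter(), Counter()
    for row, y in zip(rows, labels):
        (pos if y == 1 else neg)[row] += 1
    # Within each combination, the rule predicts the majority class,
    # so the number of correct calls is the larger of the two counts.
    correct = sum(max(pos[c], neg[c]) for c in set(pos) | set(neg))
    return correct / len(labels)


def permutation_p_value(rows, labels, n_permutations=1000, seed=0):
    """Empirical p-value: how often shuffled labels score at least as well."""
    rng = random.Random(seed)
    observed = mdr_accuracy(rows, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any attribute-outcome relationship
        if mdr_accuracy(rows, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction (assumption) keeps the estimate above zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Ten copies of the XOR dataset from Table 1
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 10
labels = [0, 1, 1, 0] * 10

observed, p_value = permutation_p_value(rows, labels)
print(observed, p_value)
```

A small p-value indicates that the observed accuracy is rarely matched when the labels carry no information, which is the sense in which permutation testing guards against overfitting.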

Extensions to MDR

Numerous extensions to MDR have been introduced. These include family-based methods[23][24][25], fuzzy methods[26], covariate adjustment[27], survival methods[28][29], robust methods[30], methods for quantitative traits[31][32], and many others.

Applications of MDR

MDR has mostly been applied to detecting gene-gene interactions or epistasis in genetic studies of common human diseases such as atrial fibrillation[33][34], autism[35], bladder cancer[36][37][38], breast cancer[39], cardiovascular disease[13], hypertension[40][41], obesity[42][43], pancreatic cancer[44], prostate cancer[45][46][47] and tuberculosis[48]. It has also been applied to other biomedical problems such as the genetic analysis of pharmacology outcomes.[49] A central challenge is the scaling of MDR to big data such as that from genome-wide association studies (GWAS)[50]. Several approaches have been used. One approach is to filter the features prior to MDR analysis[51]. This can be done using biological knowledge through tools such as BioFilter[52]. It can also be done using computational tools such as ReliefF[53]. Another approach is to use stochastic search algorithms such as genetic programming to explore the search space of feature combinations[54]. Yet another approach is a brute-force search using high-performance computing[55][56][57].
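The scaling problem comes from the number of attribute combinations that an exhaustive search must evaluate. The sketch below counts those combinations and runs a brute-force pairwise search on a toy dataset. It is illustrative only: `mdr_accuracy` and `exhaustive_search` are hypothetical helper names, the score is training accuracy with a threshold of 1, and the attribute X3 is an invented noise variable added for the example.

```python
from collections import Counter
from itertools import combinations
from math import comb


def mdr_accuracy(columns, labels):
    """Training accuracy of the MDR rule (threshold 1) for the given columns."""
    pos, neg = Counter(), Counter()
    for cell, y in zip(zip(*columns), labels):
        (pos if y == 1 else neg)[cell] += 1
    return sum(max(pos[c], neg[c]) for c in set(pos) | set(neg)) / len(labels)


def exhaustive_search(data, labels, order=2):
    """Score every combination of `order` attributes and return the best one."""
    def score(names):
        return mdr_accuracy([data[name] for name in names], labels)

    best = max(combinations(sorted(data), order), key=score)
    return best, score(best)


# The number of models grows combinatorially with the number of attributes.
for n_attributes in (100, 10_000, 1_000_000):
    print(n_attributes, comb(n_attributes, 2), comb(n_attributes, 3))

# Toy data: Y = X1 XOR X2, with X3 as an unrelated (hypothetical) attribute.
data = {
    "X1": [0, 0, 1, 1, 0, 0, 1, 1],
    "X2": [0, 1, 0, 1, 0, 1, 0, 1],
    "X3": [0, 0, 0, 0, 1, 1, 1, 1],
}
labels = [0, 1, 1, 0, 0, 1, 1, 0]

best, best_score = exhaustive_search(data, labels)
print(best, best_score)  # ('X1', 'X2') 1.0
```

With one million attributes there are roughly 5 × 10^11 pairs to score, which is why filtering, stochastic search, and parallel hardware are used in practice.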

Software

www.epistasis.org provides an open-source and freely-available MDR software package.

An R package for MDR[58]

A sklearn-compatible Python implementation

An R package for Model-Based MDR[59]

See also

References

  1. ^ McKinney, Brett A.; Reif, David M.; Ritchie, Marylyn D.; Moore, Jason H. (2006-01-01). "Machine learning for detecting gene-gene interactions: a review". Applied Bioinformatics. 5 (2): 77–88. ISSN 1175-5636. PMC 3244050. PMID 16722772.
  2. ^ Ritchie, Marylyn D.; Hahn, Lance W.; Roodi, Nady; Bailey, L. Renee; Dupont, William D.; Parl, Fritz F.; Moore, Jason H. (2001-07-01). "Multifactor-Dimensionality Reduction Reveals High-Order Interactions among Estrogen-Metabolism Genes in Sporadic Breast Cancer". The American Journal of Human Genetics. 69 (1): 138–147. doi:10.1086/321276. ISSN 0002-9297. PMC 1226028. PMID 11404819.
  3. ^ Ritchie, Marylyn D.; Hahn, Lance W.; Moore, Jason H. (2003-02-01). "Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity". Genetic Epidemiology. 24 (2): 150–157. doi:10.1002/gepi.10218. ISSN 1098-2272.
  4. ^ Hahn, L. W.; Ritchie, M. D.; Moore, J. H. (2003-02-12). "Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions". Bioinformatics. 19 (3): 376–382. doi:10.1093/bioinformatics/btf869. ISSN 1367-4803.
  5. ^ Hahn, Lance W.; Moore, Jason H. (2004-01-01). "Ideal Discrimination of Discrete Clinical Endpoints Using Multilocus Genotypes". In Silico Biology. 4 (2). ISSN 1386-6338.
  6. ^ Moore, Jason H. (2004-11-01). "Computational analysis of gene-gene interactions using multifactor dimensionality reduction". Expert Review of Molecular Diagnostics. 4 (6): 795–803. doi:10.1586/14737159.4.6.795. ISSN 1473-7159.
  7. ^ a b Moore, Jason H.; Andrews, Peter C. (2015-01-01). Moore, Jason H.; Williams, Scott M. (eds.). Epistasis. Methods in Molecular Biology. Springer New York. pp. 301–314. doi:10.1007/978-1-4939-2155-3_16. ISBN 9781493921546.
  8. ^ Moore, Jason H. (2010-01-01). "Detecting, characterizing, and interpreting nonlinear gene-gene interactions using multifactor dimensionality reduction". Advances in Genetics. 72: 101–116. doi:10.1016/B978-0-12-380862-2.00005-9. ISSN 0065-2660. PMID 21029850.
  9. ^ a b Moore, Jason H.; Gilbert, Joshua C.; Tsai, Chia-Ti; Chiang, Fu-Tien; Holden, Todd; Barney, Nate; White, Bill C. (2006-07-21). "A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility". Journal of Theoretical Biology. 241 (2): 252–261. doi:10.1016/j.jtbi.2005.11.036.
  10. ^ "A theory and methodology of inductive learning - ScienceDirect". www.sciencedirect.com. Retrieved 2017-05-06.
  11. ^ Velez, Digna R.; White, Bill C.; Motsinger, Alison A.; Bush, William S.; Ritchie, Marylyn D.; Williams, Scott M.; Moore, Jason H. (2007-05-01). "A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction". Genetic Epidemiology. 31 (4): 306–315. doi:10.1002/gepi.20211. ISSN 0741-0395. PMID 17323372.
  12. ^ Bush, William S.; Edwards, Todd L.; Dudek, Scott M.; McKinney, Brett A.; Ritchie, Marylyn D. (2008-01-01). "Alternative contingency table measures improve the power and detection of multifactor dimensionality reduction". BMC Bioinformatics. 9: 238. doi:10.1186/1471-2105-9-238. ISSN 1471-2105. PMC 2412877. PMID 18485205.
  13. ^ a b Coffey, Christopher S.; Hebert, Patricia R.; Ritchie, Marylyn D.; Krumholz, Harlan M.; Gaziano, J. Michael; Ridker, Paul M.; Brown, Nancy J.; Vaughan, Douglas E.; Moore, Jason H. (2004-01-01). "An application of conditional logistic regression and multifactor dimensionality reduction for detecting gene-gene Interactions on risk of myocardial infarction: The importance of model validation". BMC Bioinformatics. 5: 49. doi:10.1186/1471-2105-5-49. ISSN 1471-2105. PMC 419697. PMID 15119966.
  14. ^ Motsinger, Alison A.; Ritchie, Marylyn D. (2006-09-01). "The effect of reduction in cross-validation intervals on the performance of multifactor dimensionality reduction". Genetic Epidemiology. 30 (6): 546–555. doi:10.1002/gepi.20166. ISSN 1098-2272.
  15. ^ Pattin, Kristine A.; White, Bill C.; Barney, Nate; Gui, Jiang; Nelson, Heather H.; Kelsey, Karl T.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H. (2009-01-01). "A computationally efficient hypothesis testing method for epistasis analysis using multifactor dimensionality reduction". Genetic Epidemiology. 33 (1): 87–94. doi:10.1002/gepi.20360. ISSN 1098-2272. PMC 2700860. PMID 18671250.
  16. ^ Greene, Casey S.; Himmelstein, Daniel S.; Nelson, Heather H.; Kelsey, Karl T.; Williams, Scott M.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H. (2009-10-01). Biocomputing 2010. World Scientific. pp. 327–336. doi:10.1142/9789814295291_0035. ISBN 9789814299473.
  17. ^ Greene, Casey S.; Penrod, Nadia M.; Williams, Scott M.; Moore, Jason H. (2009-06-02). "Failure to Replicate a Genetic Association May Provide Important Clues About Genetic Architecture". PLOS ONE. 4 (6): e5639. doi:10.1371/journal.pone.0005639. ISSN 1932-6203. PMC 2685469. PMID 19503614.
  18. ^ Piette, Elizabeth R.; Moore, Jason H. (2017-04-19). "Improving the Reproducibility of Genetic Association Results Using Genotype Resampling Methods". Applications of Evolutionary Computation. Springer, Cham: 96–108. doi:10.1007/978-3-319-55849-3_7.
  19. ^ Moore, Jason H.; Hu, Ting (2015-01-01). "Epistasis analysis using information theory". Methods in Molecular Biology (Clifton, N.J.). 1253: 257–268. doi:10.1007/978-1-4939-2155-3_13. ISSN 1940-6029. PMID 25403536.
  20. ^ Kim, Nora Chung; Andrews, Peter C.; Asselbergs, Folkert W.; Frost, H. Robert; Williams, Scott M.; Harris, Brent T.; Read, Cynthia; Askland, Kathleen D.; Moore, Jason H. (2012-07-28). "Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS". BioData Mining. 5 (1): 9. doi:10.1186/1756-0381-5-9. ISSN 1756-0381. PMC 3463436. PMID 22839596.
  21. ^ Cheng, Samantha; Andrew, Angeline S.; Andrews, Peter C.; Moore, Jason H. (2016-01-01). "Complex systems analysis of bladder cancer susceptibility reveals a role for decarboxylase activity in two genome-wide association studies". BioData Mining. 9: 40. doi:10.1186/s13040-016-0119-z. PMC 5154053. PMID 27999618.
  22. ^ Gola, Damian; Mahachie John, Jestinah M.; van Steen, Kristel; König, Inke R. (2016-03-01). "A roadmap to multifactor dimensionality reduction methods". Briefings in Bioinformatics. 17 (2): 293–308. doi:10.1093/bib/bbv038. ISSN 1477-4054. PMC 4793893. PMID 26108231.
  23. ^ Martin, E. R.; Ritchie, M. D.; Hahn, L.; Kang, S.; Moore, J. H. (2006-02-01). "A novel method to identify gene-gene effects in nuclear families: the MDR-PDT". Genetic Epidemiology. 30 (2): 111–123. doi:10.1002/gepi.20128. ISSN 0741-0395. PMID 16374833.
  24. ^ Lou, Xiang-Yang; Chen, Guo-Bo; Yan, Lei; Ma, Jennie Z.; Mangold, Jamie E.; Zhu, Jun; Elston, Robert C.; Li, Ming D. (2008-10-01). "A combinatorial approach to detecting gene-gene and gene-environment interactions in family studies". American Journal of Human Genetics. 83 (4): 457–467. doi:10.1016/j.ajhg.2008.09.001. ISSN 1537-6605. PMC 2561932. PMID 18834969.
  25. ^ Cattaert, Tom; Urrea, Víctor; Naj, Adam C.; De Lobel, Lizzy; De Wit, Vanessa; Fu, Mao; Mahachie John, Jestinah M.; Shen, Haiqing; Calle, M. Luz (2010-04-22). "FAM-MDR: a flexible family-based multifactor dimensionality reduction technique to detect epistasis using related individuals". PLOS ONE. 5 (4): e10304. doi:10.1371/journal.pone.0010304. ISSN 1932-6203. PMC 2858665. PMID 20421984.
  26. ^ Leem, Sangseob; Park, Taesung (2017-03-14). "An empirical fuzzy multifactor dimensionality reduction method for detecting gene-gene interactions". BMC Genomics. 18 (Suppl 2): 115. doi:10.1186/s12864-017-3496-x. ISSN 1471-2164. PMC 5374597. PMID 28361694.
  27. ^ Gui, Jiang; Andrew, Angeline S.; Andrews, Peter; Nelson, Heather M.; Kelsey, Karl T.; Karagas, Margaret R.; Moore, Jason H. (2010-01-01). "A simple and computationally efficient sampling approach to covariate adjustment for multifactor dimensionality reduction analysis of epistasis". Human Heredity. 70 (3): 219–225. doi:10.1159/000319175. ISSN 1423-0062. PMC 2982850. PMID 20924193.
  28. ^ Gui, Jiang; Moore, Jason H.; Kelsey, Karl T.; Marsit, Carmen J.; Karagas, Margaret R.; Andrew, Angeline S. (2011-01-01). "A novel survival multifactor dimensionality reduction method for detecting gene-gene interactions with application to bladder cancer prognosis". Human Genetics. 129 (1): 101–110. doi:10.1007/s00439-010-0905-5. ISSN 1432-1203. PMC 3255326. PMID 20981448.
  29. ^ Lee, Seungyeoun; Son, Donghee; Yu, Wenbao; Park, Taesung (2016-12-01). "Gene-Gene Interaction Analysis for the Accelerated Failure Time Model Using a Unified Model-Based Multifactor Dimensionality Reduction Method". Genomics & Informatics. 14 (4): 166–172. doi:10.5808/GI.2016.14.4.166. ISSN 1598-866X. PMC 5287120. PMID 28154507.
  30. ^ Gui, Jiang; Andrew, Angeline S.; Andrews, Peter; Nelson, Heather M.; Kelsey, Karl T.; Karagas, Margaret R.; Moore, Jason H. (2011-01-01). "A robust multifactor dimensionality reduction method for detecting gene-gene interactions with application to the genetic analysis of bladder cancer susceptibility". Annals of Human Genetics. 75 (1): 20–28. doi:10.1111/j.1469-1809.2010.00624.x. ISSN 1469-1809. PMC 3057873. PMID 21091664.
  31. ^ Gui, Jiang; Moore, Jason H.; Williams, Scott M.; Andrews, Peter; Hillege, Hans L.; van der Harst, Pim; Navis, Gerjan; Van Gilst, Wiek H.; Asselbergs, Folkert W. (2013-01-01). "A Simple and Computationally Efficient Approach to Multifactor Dimensionality Reduction Analysis of Gene-Gene Interactions for Quantitative Traits". PLOS ONE. 8 (6): e66545. doi:10.1371/journal.pone.0066545. ISSN 1932-6203. PMC 3689797. PMID 23805232.
  32. ^ Lou, Xiang-Yang; Chen, Guo-Bo; Yan, Lei; Ma, Jennie Z.; Zhu, Jun; Elston, Robert C.; Li, Ming D. (2007-06-01). "A generalized combinatorial approach for detecting gene-by-gene and gene-by-environment interactions with application to nicotine dependence". American Journal of Human Genetics. 80 (6): 1125–1137. doi:10.1086/518312. ISSN 0002-9297. PMC 1867100. PMID 17503330.
  33. ^ Tsai, Chia-Ti; Lai, Ling-Ping; Lin, Jiunn-Lee; Chiang, Fu-Tien; Hwang, Juey-Jen; Ritchie, Marylyn D.; Moore, Jason H.; Hsu, Kuan-Lih; Tseng, Chuen-Den (2004-04-06). "Renin-Angiotensin System Gene Polymorphisms and Atrial Fibrillation". Circulation. 109 (13): 1640–1646. doi:10.1161/01.CIR.0000124487.36586.26. ISSN 0009-7322. PMID 15023884.
  34. ^ Asselbergs, Folkert W.; Moore, Jason H.; van den Berg, Maarten P.; Rimm, Eric B.; de Boer, Rudolf A.; Dullaart, Robin P.; Navis, Gerjan; van Gilst, Wiek H. (2006-01-01). "A role for CETP TaqIB polymorphism in determining susceptibility to atrial fibrillation: a nested case control study". BMC Medical Genetics. 7: 39. doi:10.1186/1471-2350-7-39. ISSN 1471-2350. PMC 1462991. PMID 16623947.
  35. ^ Ma, D.Q.; Whitehead, P.L.; Menold, M.M.; Martin, E.R.; Ashley-Koch, A.E.; Mei, H.; Ritchie, M.D.; DeLong, G.R.; Abramson, R.K. (2005-09-01). "Identification of Significant Association and Gene-Gene Interaction of GABA Receptor Subunit Genes in Autism". The American Journal of Human Genetics. 77 (3): 377–388. doi:10.1086/433195. ISSN 0002-9297. PMC 1226204. PMID 16080114.
  36. ^ Andrew, Angeline S.; Nelson, Heather H.; Kelsey, Karl T.; Moore, Jason H.; Meng, Alexis C.; Casella, Daniel P.; Tosteson, Tor D.; Schned, Alan R.; Karagas, Margaret R. (2006-05-01). "Concordance of multiple analytical approaches demonstrates a complex relationship between DNA repair gene SNPs, smoking and bladder cancer susceptibility". Carcinogenesis. 27 (5): 1030–1037. doi:10.1093/carcin/bgi284. ISSN 0143-3334.
  37. ^ Andrew, Angeline S.; Karagas, Margaret R.; Nelson, Heather H.; Guarrera, Simonetta; Polidoro, Silvia; Gamberini, Sara; Sacerdote, Carlotta; Moore, Jason H.; Kelsey, Karl T. (2008-01-01). "DNA Repair Polymorphisms Modify Bladder Cancer Risk: A Multi-factor Analytic Strategy". Human Heredity. 65 (2): 105–118. doi:10.1159/000108942. ISSN 0001-5652. PMC 2857629. PMID 17898541.
  38. ^ Andrew, Angeline S.; Hu, Ting; Gu, Jian; Gui, Jiang; Ye, Yuanqing; Marsit, Carmen J.; Kelsey, Karl T.; Schned, Alan R.; Tanyos, Sam A. (2012-01-01). "HSD3B and gene-gene interactions in a pathway-based analysis of genetic susceptibility to bladder cancer". PLOS ONE. 7 (12): e51301. doi:10.1371/journal.pone.0051301. ISSN 1932-6203. PMC 3526593. PMID 23284679.
  39. ^ Cao, Jingjing; Luo, Chenglin; Yan, Rui; Peng, Rui; Wang, Kaijuan; Wang, Peng; Ye, Hua; Song, Chunhua (2016-12-01). "rs15869 at miRNA binding site in BRCA2 is associated with breast cancer susceptibility". Medical Oncology. 33 (12): 135. doi:10.1007/s12032-016-0849-2. ISSN 1357-0560.
  40. ^ Williams, Scott M.; Ritchie, Marylyn D.; Phillips, John A. III; Dawson, Elliot; Prince, Melissa; Dzhura, Elvira; Willis, Alecia; Semenya, Amma; Summar, Marshall (2004-01-01). "Multilocus Analysis of Hypertension: A Hierarchical Approach". Human Heredity. 57 (1): 28–38. doi:10.1159/000077387. ISSN 0001-5652.
  41. ^ Sanada, Hironobu; Yatabe, Junichi; Midorikawa, Sanae; Hashimoto, Shigeatsu; Watanabe, Tsuyoshi; Moore, Jason H.; Ritchie, Marylyn D.; Williams, Scott M.; Pezzullo, John C. (2006-03-01). "Single-Nucleotide Polymorphisms for Diagnosis of Salt-Sensitive Hypertension". Clinical Chemistry. 52 (3): 352–360. doi:10.1373/clinchem.2005.059139. ISSN 0009-9147. PMID 16439609.
  42. ^ De, Rishika; Verma, Shefali S.; Holzinger, Emily; Hall, Molly; Burt, Amber; Carrell, David S.; Crosslin, David R.; Jarvik, Gail P.; Kuivaniemi, Helena (2017-02-01). "Identifying gene-gene interactions that are highly associated with four quantitative lipid traits across multiple cohorts". Human Genetics. 136 (2): 165–178. doi:10.1007/s00439-016-1738-7. ISSN 1432-1203. PMID 27848076.
  43. ^ De, Rishika; Verma, Shefali S.; Drenos, Fotios; Holzinger, Emily R.; Holmes, Michael V.; Hall, Molly A.; Crosslin, David R.; Carrell, David S.; Hakonarson, Hakon (2015-01-01). "Identifying gene-gene interactions that are highly associated with Body Mass Index using Quantitative Multifactor Dimensionality Reduction (QMDR)". BioData Mining. 8: 41. doi:10.1186/s13040-015-0074-0. PMC 4678717. PMID 26674805.
  44. ^ Duell, Eric J.; Bracci, Paige M.; Moore, Jason H.; Burk, Robert D.; Kelsey, Karl T.; Holly, Elizabeth A. (2008-06-01). "Detecting pathway-based gene-gene and gene-environment interactions in pancreatic cancer". Cancer Epidemiology, Biomarkers & Prevention: A Publication of the American Association for Cancer Research, Cosponsored by the American Society of Preventive Oncology. 17 (6): 1470–1479. doi:10.1158/1055-9965.EPI-07-2797. ISSN 1055-9965. PMC 4410856. PMID 18559563.
  45. ^ Xu, Jianfeng; Lowey, James; Wiklund, Fredrik; Sun, Jielin; Lindmark, Fredrik; Hsu, Fang-Chi; Dimitrov, Latchezar; Chang, Baoli; Turner, Aubrey R. (2005-11-01). "The Interaction of Four Genes in the Inflammation Pathway Significantly Predicts Prostate Cancer Risk". Cancer Epidemiology and Prevention Biomarkers. 14 (11): 2563–2568. doi:10.1158/1055-9965.EPI-05-0356. ISSN 1055-9965. PMID 16284379.
  46. ^ Lavender, Nicole A.; Rogers, Erica N.; Yeyeodu, Susan; Rudd, James; Hu, Ting; Zhang, Jie; Brock, Guy N.; Kimbro, Kevin S.; Moore, Jason H. (2012-04-30). "Interaction among apoptosis-associated sequence variants and joint effects on aggressive prostate cancer". BMC Medical Genomics. 5: 11. doi:10.1186/1755-8794-5-11. ISSN 1755-8794. PMC 3355002. PMID 22546513.
  47. ^ Lavender, Nicole A.; Benford, Marnita L.; VanCleave, Tiva T.; Brock, Guy N.; Kittles, Rick A.; Moore, Jason H.; Hein, David W.; Kidd, La Creis R. (2009-11-16). "Examination of polymorphic glutathione S-transferase (GST) genes, tobacco smoking and prostate cancer risk among men of African descent: a case-control study". BMC Cancer. 9: 397. doi:10.1186/1471-2407-9-397. ISSN 1471-2407. PMC 2783040. PMID 19917083.
  48. ^ Collins, Ryan L.; Hu, Ting; Wejse, Christian; Sirugo, Giorgio; Williams, Scott M.; Moore, Jason H. (2013-02-18). "Multifactor dimensionality reduction reveals a three-locus epistatic interaction associated with susceptibility to pulmonary tuberculosis". BioData Mining. 6 (1): 4. doi:10.1186/1756-0381-6-4. PMC 3618340. PMID 23418869.
  49. ^ Wilke, Russell A.; Reif, David M.; Moore, Jason H. (2005-11-01). "Combinatorial Pharmacogenetics". Nature Reviews Drug Discovery. 4 (11): 911–918. doi:10.1038/nrd1874. ISSN 1474-1776.
  50. ^ Moore, Jason H.; Asselbergs, Folkert W.; Williams, Scott M. (2010-02-15). "Bioinformatics challenges for genome-wide association studies". Bioinformatics (Oxford, England). 26 (4): 445–455. doi:10.1093/bioinformatics/btp713. ISSN 1367-4811. PMC 2820680. PMID 20053841.
  51. ^ Sun, Xiangqing; Lu, Qing; Mukherjee, Shubhabrata; Crane, Paul K.; Elston, Robert; Ritchie, Marylyn D. (2014-01-01). "Analysis pipeline for the epistasis search - statistical versus biological filtering". Frontiers in Genetics. 5: 106. doi:10.3389/fgene.2014.00106. PMC 4012196. PMID 24817878.
  52. ^ Pendergrass, Sarah A.; Frase, Alex; Wallace, John; Wolfe, Daniel; Katiyar, Neerja; Moore, Carrie; Ritchie, Marylyn D. (2013-12-30). "Genomic analyses with biofilter 2.0: knowledge driven filtering, annotation, and model development". BioData Mining. 6 (1): 25. doi:10.1186/1756-0381-6-25. PMC 3917600. PMID 24378202.
  53. ^ Moore, Jason H. (2015-01-01). "Epistasis analysis using ReliefF". Methods in Molecular Biology (Clifton, N.J.). 1253: 315–325. doi:10.1007/978-1-4939-2155-3_17. ISSN 1940-6029. PMID 25403540.
  54. ^ Moore, Jason H.; White, Bill C. (2007-01-01). Riolo, Rick; Soule, Terence; Worzel, Bill (eds.). Genetic Programming Theory and Practice IV. Genetic and Evolutionary Computation. Springer US. pp. 11–28. doi:10.1007/978-0-387-49650-4_2. ISBN 9780387333755.
  55. ^ Greene, Casey S.; Sinnott-Armstrong, Nicholas A.; Himmelstein, Daniel S.; Park, Paul J.; Moore, Jason H.; Harris, Brent T. (2010-03-01). "Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS". Bioinformatics (Oxford, England). 26 (5): 694–695. doi:10.1093/bioinformatics/btq009. ISSN 1367-4811. PMC 2828117. PMID 20081222.
  56. ^ Bush, William S.; Dudek, Scott M.; Ritchie, Marylyn D. (2006-09-01). "Parallel multifactor dimensionality reduction: a tool for the large-scale analysis of gene-gene interactions". Bioinformatics (Oxford, England). 22 (17): 2173–2174. doi:10.1093/bioinformatics/btl347. ISSN 1367-4811. PMC 4939609. PMID 16809395.
  57. ^ Sinnott-Armstrong, Nicholas A.; Greene, Casey S.; Cancare, Fabio; Moore, Jason H. (2009-07-24). "Accelerating epistasis analysis in human genetics with consumer graphics hardware". BMC Research Notes. 2: 149. doi:10.1186/1756-0500-2-149. ISSN 1756-0500. PMC 2732631. PMID 19630950.
  58. ^ Winham, Stacey J.; Motsinger-Reif, Alison A. (2011-08-16). "An R package implementation of multifactor dimensionality reduction". BioData Mining. 4 (1): 24. doi:10.1186/1756-0381-4-24. ISSN 1756-0381. PMC 3177775. PMID 21846375.
  59. ^ Calle, M. Luz; Urrea, Víctor; Malats, Núria; Van Steen, Kristel (2010-09-01). "mbmdr: an R package for exploring gene-gene interactions associated with binary or quantitative traits". Bioinformatics (Oxford, England). 26 (17): 2198–2199. doi:10.1093/bioinformatics/btq352. ISSN 1367-4811. PMID 20595460.