Talk:Genome-wide association study

From Wikipedia, the free encyclopedia
Genome-wide association study has been listed as one of the Natural sciences good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it.
WikiProject Genetics (Rated GA-class, Top-importance)
This article is within the scope of WikiProject Genetics, a collaborative effort to improve the coverage of Genetics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 GA  This article has been rated as GA-Class on the project's quality scale.
 Top  This article has been rated as Top-importance on the project's importance scale.
WikiProject Computational Biology (Rated GA-class, High-importance)
This article is within the scope of WikiProject Computational Biology, a collaborative effort to improve the coverage of Computational Biology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 GA  This article has been rated as GA-Class on the quality scale.
 High  This article has been rated as High-importance on the importance scale.

Difference between humans

Note: the familiar statement that humans differ from each other by only about 0.1 % seems inaccurate because the two haplotypes of a diploid human genome differ by about 0.5 %, and that was shown for an individual with European ancestry through both parents (PMID: 17803354). Differences might be greater with greater geographical variation in ancestry. Daniel haft (talk) 19:39, 22 April 2009 (UTC)

The statement seems correct for SNPs only, and it also holds for comparisons between a Han Chinese genome and the two European genomes (Nature 456, 60-65 (6 November 2008) | doi:10.1038/nature07484, see supplementary figure 2), while the 0.5% is a more complex measure (all heterozygous bases, i.e., SNP, multi-nucleotide polymorphisms [MNP], indels, [complex variants + putative alternate alleles + CNV]/genome size; [2,894,929 + 939,799 + 10,000,000]/2,809,547,336). (talk) 13:14, 20 July 2009 (UTC)

This might be the reason for the difference. --hroest 07:40, 18 May 2010 (UTC)
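As a quick check of the arithmetic quoted above, here is a minimal sketch in Python. The three counts and the genome size are copied from the comment; the variable names are made up for illustration, and what each term represents is left as the comment states it rather than verified here.

```python
# Figures copied from the comment above; the names are hypothetical labels,
# not terms taken from the cited papers.
first_term = 2_894_929
second_term = 939_799
third_term = 10_000_000
genome_size = 2_809_547_336

# Ratio using all three terms, as in the quoted expression.
total_fraction = (first_term + second_term + third_term) / genome_size

# Ratio using the first term alone, for comparison.
first_term_fraction = first_term / genome_size

print(f"all three terms: {total_fraction:.2%}")
print(f"first term only: {first_term_fraction:.2%}")
```

With these inputs the full expression comes out just under 0.5 %, and the first term alone comes out at about 0.1 %, which is consistent with the suggestion that the two figures measure different things.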

Hypothesis free

I'm concerned about the statement that GWAS are "hypothesis-free." It's quite the opposite: every SNP tested is a hypothesis, and any analysis must be subject to corrections for multiple-hypothesis testing. —Preceding unsigned comment added by Uri.laserson (talkcontribs) 05:01, 15 October 2009 (UTC)

They are hypothesis free compared to a single gene study --- there is no a priori hypothesis about which part of the genome might be involved, and all parts are tested. This does not mean that no hypotheses are tested, but rather that a flat prior is assumed. Maybe a reformulation would alleviate that...--hroest 07:40, 18 May 2010 (UTC)
I'm concerned by the statement as well, and I'll change it. I have heard "hypothesis-free" being used before, but it is not the right term and it is not commonly used. There is very much a hypothesis behind it, namely that testing all SNPs will reveal which are associated. Correct words would be "unbiased" vs "candidate-driven". --LasseFolkersen (talk) 20:01, 25 October 2011 (UTC)
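To illustrate the multiple-testing point raised in this thread, here is a minimal sketch of a Bonferroni correction, which is only one of several possible corrections. The count of one million tests, the SNP names, and the p-values are all assumptions made up for the example.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical p-values for four SNPs, out of an assumed one million tests.
p_values = {"snp_a": 3e-9, "snp_b": 4e-6, "snp_c": 0.02, "snp_d": 6e-8}

threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)

# Keep only the SNPs whose p-value is below the corrected threshold.
significant = sorted(name for name, p in p_values.items() if p < threshold)

print(f"threshold = {threshold:.1e}")
print("significant:", significant)
```

Under these assumptions the per-test threshold is 5×10^−8, so a p-value that would look convincing for a single test (such as 4×10^−6) no longer counts as significant once every SNP is treated as its own hypothesis.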

Claim that noncoding are almost certainly NOT causal

I reverted the recent edit that claims that noncoding variants are almost certainly NOT causal. That is not true. I agree that for most GWAS hits it is still not completely clear which particular SNP in a block is the actual causal variant, but for the ones where it is known (e.g. the SORT1 locus) it is noncoding SNPs that are ultimately causal. --LasseFolkersen (talk) 09:31, 22 November 2011 (UTC)

Under construction

I'm starting up a major revision of this article. So far I expanded the backgrounds and methods. Will get back to "genes identified" and "limitations". Help and suggestions welcome!--LasseFolkersen (talk) 14:40, 6 December 2011 (UTC)

I have tried to keep all the good bits that were already in the text (even though I rewrote some of them). One exception however: "...these [ARMD] genes can predict half the risk of ARMD between siblings, and it is among the most successful examples of GWAS.". I think this sentence is highly useful, but the wording of prediction of half the risk between siblings is a bit confusing in comparison to the more commonly used proportion of heritability explained (which I also use later on). So apologies for removing a good bit - I just think the article will be less confusing without it --LasseFolkersen (talk) 08:38, 7 December 2011 (UTC)

Ok, I'm pretty much done with the expansions I planned. Only need to clear up the citations. CitationBot has problems, but I'll try it later again. Then some grammar and prose check-ups, and then I'll have a couple of colleagues read it through. Other comments welcome! --LasseFolkersen (talk) 12:16, 7 December 2011 (UTC)

Peer review updates

I have now had time to go through the reviews found on the request for peer review page. The changes to the main text have already been committed, and here is the point-by-point response.

From User:Cryptic C62

1) Except for the introduction sentence, all "GWAS" have now been replaced with GWA study/ies. This choice was made for easier pluralization.

2) "From 2005" was changed to "published in 2005"

3) "Today" was replaced with the As of year template, as this is definitely a statement that will age quickly.

4) "Surprisingly" was deleted.

5) I have added the following summary of the limitations section to the lead section: "Several GWA studies have received criticism for omitting important quality control steps, rendering the findings invalid. Largely this type of criticism can and is overcome in more modern publications. However, the fundamental methodology still have opponents."

From User:The Rambling Man

1) ok, I have changed the caption to follow the correct way of captioning things here.

2) ok, I have removed the full stop after the caption

3) ok, the User:Cryptic C62 suggestions have been included

4) I think anybody who reads an article called genome-wide association study will think less of it, if it expands DNA to deoxyribonucleic acid. "DNA" is so routinely used that the whole spectrum from science journals to popular newspaper articles assumes knowledge of it and leaves it out of acronym lists.

5) The SNP acronym is already expanded to single-nucleotide polymorphisms at first occurrence. Could you perhaps be more specific about what you are aiming at?

6) ok - I use the Schena 1995 reference because that's the landmark microarray publication.

7) you might be right that "unity" is a more formal way. I didn't change it now though, because my aim was to lose as few readers to formality as possible. Feel free to change it yourself if you think it is important.

8) ok. I have moved it. It's a pity that the article doesn't have a lead figure though.

9) ok, I have now used the correct format for dashes.

That should resolve all the comments in this peer review. The only thing I miss in this article now is a good lead picture. Reviewer 2 asked for the previous lead, the Manhattan plot, to be moved to the methods section, and I can't think of a better picture to illustrate GWAS. Suggestions, anyone? --LasseFolkersen (talk) 15:28, 21 December 2011 (UTC)

Late peer-review related comments

Apologies for the lateness, but I was invited to contribute to the peer review through my talkpage.


  • I think the first sentence needs rewording; specifically "to see if any variant is associated with a trait" is unclear.
  • I also don't think genetic variant should link to SNP; it's not the only possible genetic variant (as the dab shows), and if we're talking about SNPs it should just say SNP; it seems SNP GWAS are the focus of the rest of the lead. Admittedly other genetic variants can be used for GWAS, so I suggest either a more general reference to genetic variation or removing the link... or we decide the entire article is about SNP GWAS, with a brief mention of the alternatives.
  • The lead is pretty hefty. I think it could be cut down, for example the details of the first GWAS (number of case/controls, outcome) aren't needed here. The lead should summarise the main article, so this detail should have been repeated in Results anyway (I suggest it's just moved there).
  • The second lead paragraph is also bloated, sentences like "The results are then read into computers, where they can be analyzed" could be removed.
  • Couple of word repeats in: "...genetic variants ... if any variant is..."; "Typically ... and typically...".

Body: (Prelim)

  • Has there been any debate on the use (which I note is consistent) of GWA studies? Personally, I would prefer GWAS, that's the term I'm most familiar with, but if this has been decided by prior consensus then ignore me by all means.
    • edit: I should have read the above comments more closely!
  • In 'Background', the sentence "In addition to the conceptual framework other enabling factors made GWA studies possible" could be reworded for clarity, or preferably merged into the following sentence as it is being used as a verbose conjunction.
  • I'm not convinced WikiGWA deserves a mention in 'Results', if it is to be kept it needs a third-party source.
  • In 'Clinical applications', the final paragraph is of a dubiously-notable specific example published in Bioessays, I suggest it is removed.
  • "SNPs associated with diseases are currently numbered in the thousands" requires a citation or should be removed, presumably some of these associations are exceptionally weak so if it remains it should be further qualified.
  • The limitation section has wording issues, for example: "Although these issues can be taken care of, it is not always done" sounds like synthesis and adds little. The following sentence also needs attention. They both could be removed and the citation kept with the previous statement.
  • There's also a quote without a citation directly following it (I've marked as such).

Overall, the citations are mostly present when needed (I have not investigated the references themselves yet), the blue-linking is about right and the language is not overly technical. My criticisms are that sometimes the language could be tighter and I feel this article is missing some content, I suggest there are additional examples which convey the power and findings of these studies (failing that a CC-licensed publisher could add some nice diagrams, e.g. PLoS Genet.). Jebus989 23:43, 4 January 2012 (UTC)

Point-by-point response for lead:

1) Could you come up with more suggestions on what is unclear? Is it the use of the word trait? Or association? Instead of trait I could use "property" or "characteristic" or something like that. Association is more difficult to get around, but not impossible if I just use more words. How about "(WGA study, or WGAS), is an examination of many common genetic variants in different individuals to see if they are more or less common depending on the characteristics of the individuals."

2) Changed genetic variants to point to genetic variants (thanks for pointing that out). About the focus of the article on SNPs and/or "other variant types" (CNVs, etc) - a CNV study is also a GWA study, but the vast majority of all GWA studies are on SNPs. So I kept the focus on that, while still mentioning the more general genetic variants. Hence "Typically SNPs are investigated".

3) Sure. I moved the sample size info to results.

4) Ok. I removed a lot of it. The bits about risk-SNPs being outside coding regions were not even in the main text so I removed it also. I could probably fill in a little about that in the main later. TODO

5) The variants-variants repeat is in the same sentence you comment on in 1). See there. The typical-typical... I would argue for keeping it this way. The reason is that it is not only SNPs and it is not only major diseases. Only the vast majority of studies are. But if we don't write it with a "typical"-disclaimer, all the people in the non-major-disease/other-phenotype/CNV groups will conclude that the text is wrong.

Point-by-point response for prelim body:

1) yes, 'GWAS' is a bit more widely used historically, but I feel usage is turning towards GWA study/ies, because it is so much easier for the writer to indicate pluralization. And so it also makes for a more readable Wikipedia article.

2) I think I understand what you mean - how about "In addition to the conceptual framework, the following technological factors enabled GWA studies: One was the advent...." then it is more connected, and by writing technological it's emphasized that it is requirements apart from just the idea to test for many associations. Or did I understand this comment correctly?

3) you are right - it is not very noteworthy yet. I think their manuscript for it must still be in review (I'm not associated with that group, by the way). In contrast, the most noteworthy attempt (the PNAS article reference) is so horribly lacking in updates that it seems wrong to let it stand as the sole source. What would you think about just removing the web-page reference but leaving the sentence "however, more updated resources might be found..."?

4) I'll extend this SORT1 example rather than leave it. The Bioessays article was just a review covering all the research on it, which is much more heavy-weight. I'll do better and expand that, because I think this is one of the most notable examples at least of cardiovascular GWA studies. Certainly more than the Peginterferon-link which was here previously, but which I've heard very little about outside of Wikipedia. TODO

5) Ok, I added the reference from the lead that was already used for the same statement. For now. You are undoubtedly right that several of these are of less than strong significance, but finding out exactly how many is probably original research.

6) I reworded it to "Ignoring these correctible issues has been cited as contributing to a general sense of problems with the GWA methodology", because I think these are important comments from some of the heavyweights in the field (Joe Pickrell etc. in the reference). Did this address your concern?

7) I don't know. This factory-science sentence was here before I started, so I left it so as not to upset previous authors. I'll just contract it into the Couzin-Frankel reference, which was definitely a verifiable and much-discussed opinion piece.

That's it for now. I know I left some to-do loose ends, but it's getting late so I'll get back to them later. --LasseFolkersen (talk) 16:40, 5 January 2012 (UTC)

To reply to 1), I meant it in the sense that it could make GWAS seem like a genome-wide functional annotation exercise, as if we're looking to assign any variant any function. I am unable to suggest a clearer sentence, however, and it's a minor criticism which is cleared up by the following sentence. I think everything else I agree with, good work! Jebus989 09:53, 9 January 2012 (UTC)
Ok, now I have also cleared up the few to-do's I had postponed and added some PLoS Genetics figures that look a bit more professional. I also gave a custom-made figure to explain the methods a try. I'm not sure if it adds anything though --LasseFolkersen (talk) 10:08, 16 January 2012 (UTC)

Undo of first GWAS study

After some research I reverted the changes on first GWA study to the 2005 CFH study by Klein et al. In addition I added "the first successful GWA study". As far as I could see, the results from earlier studies such as the one by Ozaki et al were never reproduced successfully.--LasseFolkersen (talk) 11:17, 7 August 2012 (UTC)

Thanks for explaining. Blue Rasberry (talk) 12:49, 7 August 2012 (UTC)

GA Review



This review is transcluded from Talk:Genome-wide association study/GA1. The edit link for this section can be used to add comments to the review.

Reviewer: Estevezj (talk · contribs) 02:48, 9 January 2013 (UTC)

I've started my first read-through and will update this review as I proceed. — James Estevez (talk) 02:48, 9 January 2013 (UTC)



  1. Well-written:
    Criteria Notes Result
    (a) (prose)
    1. Background... OK
      1. OK: "In addition to the conceptual framework...": Fine as is, but might be helpful to rephrase to clarify whether you mean other factors besides the conceptual framework or other factors when combined with a conceptual framework.
    2. Methods... OK
    3. Results... OK
      1. OK: "Wellcome Trust Case Control Consortium... was the to date[when?] largest GWA study": Please clarify: largest as of 2007, or to the present day?
    4. Clinical applications... OK
    5. Limitations... OK
      1. OK: "In addition to these preventable issues...": I attempted to rephrase this, but further clarification as to the nature of the criticism would be helpful. Is the criticism of GWAS of the fundamental approach of looking at SNPs, or its hypothesis-free (sort of) approach, or lack of power?
      2. OK: "It can be discussed if...": My personal preference would be to drop this sentence, but it's nevertheless acceptable under the GAC.
    Pass
    (b) (MoS) The article complies with the five WP:MOS sections required by the GAC. Pass
  2. Verifiable with no original research:
    Criteria Notes Result
    (a) (references) The References section contains a properly formatted list of references used in the article.[7] Pass
    (b) (citations to reliable sources)
    1. Background... OK
    2. Methods... OK
      1. Hold: [citation needed] in 1st paragraph.
    3. Results... OK
      1. OK: [citation needed] at beginning of third paragraph.
    4. Clinical applications... OK
    5. Limitations...
      1. OK: Reference 42 is acceptable under WP:BLOGS.[8]
      2. OK: [citation needed] in 2nd paragraph.
    6. See also... OK
    7. References... OK
    Pass
    (c) (original research) Besides the tagged exceptions, statements in the article are supported by the sources and contain no original research. Pass
  3. Broad in its coverage:
    Criteria Notes Result
    (a) (major aspects) The article offers a good overview of the main aspects of GWAS. Pass
    (b) (focused) The article is concise and adheres to summary style. Pass
  4. Neutral: it represents viewpoints fairly and without bias, giving due weight to each.
    Notes Result
    The article fairly represents different significant viewpoints on the topic. Pass
  5. Stable: it does not change significantly from day to day because of an ongoing edit war or content dispute.
    Notes Result
    Stable. Article had fewer than 20 edits in the last six months of 2012. Pass
  6. Illustrated, if possible, by images:
    Criteria Notes Result
    (a) (images are tagged and non-free images have fair use rationales) Images are available under free licenses and are appropriately cited where necessary. Pass
    (b) (appropriate use with suitable captions) The images are appropriate for the article and the captions are accurate and informative.[9] Pass


Result Notes
Pass Fine work on this article. There are minor ambiguities in the prose (please see 1a) and missing references (see 2b) that prevent an immediate pass. Thank you for your contributions both to this article in particular and to Wikipedia in general.


Again, fine work on the article. In the course of the review I've made some minor changes to the article to correct for usage, style, and to add a handful of references. I've tagged several statements that require in-line citations under the Good Article criteria.

Moving forward, as work on the article continues I would like to bring to your attention the article available at PLoS Computational Biology.[10] This article is available under the Wikipedia compatible CC-BY license, meaning that figures and text from it can be incorporated in this article. This resource may prove valuable should editors decide to work towards FA status.

Best regards, — James Estevez (talk) 02:26, 18 January 2013 (UTC)

Working my way slowly through it. Have a bit at work right now, so apologies for the small steps approach - I will be able to make it in the 14 days though. Today I figured out the "there have been two general trends" citation need. First I wanted to claim that this was not needed, since the previous paragraph cites a 2005, n=146 study for the main disease phenotype (ARM), and the following 4 citations give later studies that are larger or directed towards more narrowly defined phenotypes. But I think I found an ok article for it now (the Ioannidis et al 2009 Nature review one). However, the size statement in particular is so self-evident that it's kinda hard to find stated clearly in reviews. I hope "Consortia of investigators are also becoming increasingly popular" should cover it ok. The defined phenotypes part is easier, and it's summed up by the "Phenome mapping" section. Also updated the 200 wellcome trust with a slight re-write that should fix the 'when' mark.--LasseFolkersen (talk) 21:08, 20 January 2013 (UTC)
Same way I reviewed it. Anyhow, that's no problem, just let me know if you get slammed and need a bit more time. From what I understand, things that are self-evident to experts getting tagged with {{fact}} is a fairly common point of frustration for people (Wikipedia:Expert retention), but I think that'll suffice. Include it in your next GWAS article and problem solved.☺ (I also moved the WTCCC section around a bit.) — James Estevez (talk) 22:38, 20 January 2013 (UTC)
Haha, yeah - I'll try to get a short sentence into the next publication "by the way, GWA studies are larger today than in 2007"  :-) at least it's hard to argue against. But seriously - no worries. I do understand the need for good sources for this. Got one more citation needed now. That Bush et al that you suggested - its section 6.3 covers the basic P-value thresholds pretty well. Will look at the last citation needed and prose tomorrow--LasseFolkersen (talk) 19:58, 21 January 2013 (UTC)
Ok, I figured out the last citation missing and also re-wrote some of that paragraph to address your prose hold. I also removed those extra reviews as per your note 7 - there are plenty of good reviews in the text already. What about the "external links"? I removed some of them now, leaving only the ones I had heard about as main resources often used. For your point about ref-previously-known-as-42/the MacArthur blog, I'll make a note of looking for articles for it later. I'm not sure that one ever came up in official review afterwards, but I'll change it if I find something. Also I wanted to ask you about note 9 - do you suggest that I remove the horizontal dashed lines, or that I try to explain them in the caption? It can't be much more than "the top dashed line represents the chosen cutoff for significance at p < 5×10^−8", since that's as much as I could get from the article myself. I have no clue why they put in a line at P=10^-5 (and frankly I think it's quite arbitrary). --LasseFolkersen (talk) 21:04, 21 January 2013 (UTC)
External links are beyond the scope of the GAC. That said, the guideline (WP:EL) goes into detail, but I see no problem with it as it currently exists. I've reconsidered my note for the lead figure: I think that further detail would probably be better placed in the Manhattan plot (the same image is used to illustrate both articles) caption or in the description. If you choose to add it here I suggest dropping the final sentence of the caption ("This example is...") and then tacking on "the top dashed line represents the chosen cutoff for significance at p < 5.0×10^−8" or some concise variation. In any event, I've passed the article and everything should update once the bot comes through. Thanks again for your work on this. — James Estevez (talk) 22:10, 22 January 2013 (UTC)

Additional Notes

  1. ^ Compliance with other aspects of the Manual of Style, or the Manual of Style mainpage or subpages of the guides listed, is not required for good articles.
  2. ^ Either parenthetical references or footnotes can be used for in-line citations, but not both in the same article.
  3. ^ This requirement is significantly weaker than the "comprehensiveness" required of featured articles; it allows shorter articles, articles that do not cover every major fact or detail, and overviews of large topics.
  4. ^ Vandalism reversions, proposals to split or merge content, good faith improvements to the page (such as copy editing), and changes based on reviewers' suggestions do not apply. Nominations for articles that are unstable because of unconstructive editing should be placed on hold.
  5. ^ Other media, such as video and sound clips, are also covered by this criterion.
  6. ^ The presence of images is not, in itself, a requirement. However, if images (or other media) with acceptable copyright status are appropriate and readily available, then some such images should be provided.
  7. ^ NB: The references given in the "Reviews" subsection are acceptable (per WP:GENREF), but are likely to draw opposition during any future WP:FAC.
  8. ^ NB: I would expect this to be an issue at WP:FAC.
  9. ^ NB: I would suggest updating the caption of the lead article to note what the dashed lines represent. These are presumably significance levels, but the original figure caption at PLOS Genetics isn't clear. This issue doesn't merit a hold, but it is something editors may wish to address in the future.
  10. ^ Bush, W. S.; Moore, J. H. (2012). "Chapter 11: Genome-Wide Association Studies". In Lewitter, Fran; Kann, Maricel. PLoS Computational Biology 8 (12): e1002822. doi:10.1371/journal.pcbi.1002822. PMC 3531285. PMID 23300413.