Wikipedia:How to use primary sources (biological sciences)

The primary biological literature consists of all the publications in which individual researchers describe their own work. Among biologists, the ability to read, interpret, and critique primary literature is a highly regarded skill that graduate students practice in "journal clubs", both as coursework and within individual research labs. Understandably, Wikipedia policies express some hesitation about the ability of editors to make these judgments. Specifically, Wikipedia policy (WP:PRIMARY) states: "All interpretive claims, analyses, or synthetic claims about primary sources must be referenced to a secondary source, rather than original analysis of the primary-source material by Wikipedia editors." Even so, Wikipedia editors should find that primary biological sources have a great deal to offer.

Reasons to use the primary literature

  • Speed. With the exception of "News and Views" articles that sometimes accompany reports in top-level journals, secondary sources rarely describe a freshly published result in any detail.
  • Open access. Biologists have partially transitioned to open access publishing. An article that Wikipedia readers and editors can read for themselves is worth much more than one that is available only to subscribers. Unfortunately, the secondary literature lags behind in this transition. The NIH open access policy does not apply to publications other than peer-reviewed journal articles, and although it now requires review articles to be deposited in PubMed Central,[1] this was not formerly the case.[2]
  • Clarity. A secondary source may summarize many experiments that prove a general point. That is often exactly what an article needs, but occasionally it is better to document a particular experiment or study. For example, it is helpful to give readers a direct link to the original publication of a major advance, and sometimes also to provide some detail from this source describing how the study was done.

Recognizing primary literature

It is commonly said that biological journals are primary literature,[3] but the status of certain information by Wikipedia standards may vary.

Distinguished from secondary literature

Review articles form a portion of many journals, but are clearly secondary sources (see WP:evaluating sources). Prominent journals such as Nature and Science may accompany one or more primary publications of research results with a companion "news and views" article that discusses these results as secondary literature. The distinction between primary and secondary literature in biological journals is not large: almost any research article includes an Introduction that summarizes previous work somewhat after the fashion of a review article, and review articles are sometimes written by the same people who published much of the underlying research. A more important distinction for scientific purposes is that these journal articles, whether primary or secondary, are usually subject to peer review. Wikipedia guidelines generally encourage the use of sources far removed from the original research, but for scientific articles this is not always the best choice. Providing peer-reviewed sources to supplement or supplant tertiary textbook and encyclopedia references can help to improve the credibility of an article.

Distinguished from self-published literature

Some journals encourage researchers to accompany primary research articles with a large volume of "supplementary data". This data is not part of the main research paper, but is downloaded separately from the publisher's Web site. In some cases, especially among the most highly competitive journals, the supplementary data may originally have been submitted as part of the paper, but had to be cut to bring the article down to a certain size. In others, it may simply be extra information offered by the researchers, or a link to a research Web site that may or may not continue to change over time. It may therefore not be obvious whether the supplementary data has actually undergone peer review, though the general qualification of the authors is not at issue. Editors should consider the advice of WP:SPS: "if the information in question is really worth reporting, someone else is likely to have done so." In this case, if the supplementary data is as important as you believe, it probably should be described somewhere in the parent article.

Self-published research also includes public announcements at scientific meetings or on research Web sites. In this case it may be useful to cite a media report to make the importance of the announcement clear to other editors.

Finding the primary publication from a news article

A very common reason to look for primary literature for Wikipedia is that a breakthrough result has just been reported in the news, and you need more information. Unfortunately, newspapers sometimes print only the name of the university at which research was done, making it difficult to find the original reference. To find the primary article:

  • Track backward from the news article you encounter to earlier versions. Typically the first publication of a news article is the best, and subsequent versions are truncated or rewritten with less information. One way to do this is to use Yahoo News to find all news articles that use a name or a distinct quoted phrase. Sort these articles by date rather than relevance, and choose the last page of search results. The oldest articles in this list should at least mention the name of the newspaper that first broke the news story. Ideally, you will find the names of one or more of the authors of the scientific study or the journal in which it was published.
  • Search directly for a reference in PubMed. If you have an article that mentions the researchers by name, rather than solely by their university affiliation, it becomes practical to search NCBI. Select "PubMed" from the drop-down menu and enter the author's last name followed by the first initial only. Unfortunately, this limitation on names has a long history in biological publishing, making searches for certain common names such as Singh quite difficult. But the most common problem with these searches is the substantial lag between the time an article is published and the time it is indexed at NCBI, sometimes the better part of a month, which makes it unlikely that this approach will work in time for a Wikinews article.
  • Search directly at the publisher. If the article included the name of the journal, you can locate its web site and search for the article directly. Be sure to look for "advance online publications" which may not be included in the main journal index.
  • Search at scientific news sites such as Eurekalert and Nature News. Results that have come out within the past few days should be easy to find, but older results may not be accessible.
  • Search recent press releases at the university credited with the publication. These are usually fairly simple to access from the main university web page.[4]
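
The PubMed author search described in the list above can also be written as a query URL. The following is a minimal sketch, assuming NCBI's public E-utilities `esearch` endpoint; the helper names `pubmed_author_term` and `pubmed_search_url`, and the first name used in the usage note, are illustrative assumptions and not part of any official tool.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here as the programmatic
# counterpart of the PubMed search box).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_author_term(last_name: str, first_name: str) -> str:
    """Format an author as last name plus first initial, tagged as an author field."""
    last = last_name.strip()
    first = first_name.strip()
    if not last or not first:
        raise ValueError("both a last name and a first name are required")
    return f"{last} {first[0].upper()}[Author]"


def pubmed_search_url(last_name: str, first_name: str,
                      extra_terms: str = "", max_results: int = 20) -> str:
    """Build (but do not send) an esearch query URL for one author."""
    term = pubmed_author_term(last_name, first_name)
    if extra_terms:
        # Extra keywords narrow the search when the surname is very common.
        term = f"{term} AND {extra_terms}"
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

For example, `pubmed_search_url("Singh", "Rajesh", extra_terms="silk gland")` builds a query for "Singh R" restricted by two keywords, which is one way to cope with a common surname. Because of the indexing lag noted above, an empty result for a very recent paper does not mean the paper does not exist.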

Structure of the research article

Biological research articles generally follow a stereotyped format with abstract, introduction, materials and methods, results, discussion, and references sections, in that order. The sections for results and discussion may be combined.

  • The Abstract contains the vast majority of statements you should be using on Wikipedia. These few sentences are composed carefully by researchers who know that often they will be all that a colleague actually reads of their work. The abstract is nearly always available publicly at NCBI PubMed as described above, making it much more accessible to the Wikipedia audience than the remainder of the copyrighted journal article. In clinical literature, separate sections for the hypothesis and conclusions of the study are likewise a useful place to mine for important general statements and quotes.
  • The Introduction is a bit like a miniature review article. However, statements from the Introduction should be used cautiously. It is not uncommon for researchers to skew the emphasis of previous work somewhat in order to highlight the plausibility of their research and its worthiness for funding. It is also quite common for authors to mention ideas with which they do not agree in the introduction, and to break with them only later in the text. Even so, the introduction is a useful minireview, and can be an openly accessible secondary source or a way to direct readers to several additional sources about a relatively minor point. When used in this way, it is best to add an explanation to your reference, such as adding "Reviewed in" before your citation template.
  • The Materials and Methods are technical descriptions of what went into making the study. Sometimes these provide interesting detail for readers – how many kilograms of liver or how many thousands of dissected silk glands it took to isolate a protein.
  • The Results section offers an opportunity to spice up what can be appallingly vague scientific language, but needs to be used with great care. Abstracts sometimes omit specific numbers, giving only vague statements that some quantity has significantly changed. Wikipedia has different priorities from the original journal – we do not assume our audience is made up of biologists who can easily go back to the original data. When possible, we would like to make clear statements, even if we must approximate to a small degree to do so.

    As Wikipedia guidelines advise, it is usually a bad idea to try to make sense of these numbers in a way that the authors did not. For example, a paper may offer numbers demonstrating that drug A has an IC50 one-fifth that of drug B. But can you say that drug A is "five times as potent" as drug B? What if it has partial agonist activity, and so can never fully block a process that drug B blocks readily at high concentrations? True, in some circumstances the temptation may be irresistible, and Wikipedia guidelines are not absolute (see WP:IAR). You should always try to make the best article you can, but if you need to break with standard practice to do so, it is best to say so in your edit summary, on the article talk page, or both.
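
The IC50 caution above can be illustrated with a toy calculation. This is only a sketch under stated assumptions: a simple one-site saturation model in which each drug approaches its own maximal effect and reaches half of that maximum at its IC50. The two drugs, their parameter values, and the function name `fractional_inhibition` are hypothetical and are not taken from any paper.

```python
def fractional_inhibition(conc: float, ic50: float, max_effect: float) -> float:
    """Fraction of the process blocked at a given concentration.

    Assumed model: the effect saturates at max_effect and reaches half of
    that ceiling when conc equals ic50.
    """
    return max_effect * conc / (conc + ic50)


# Hypothetical values, arbitrary concentration units.
drug_a = {"ic50": 1.0, "max_effect": 0.6}  # lower IC50, but can never block fully
drug_b = {"ic50": 5.0, "max_effect": 1.0}  # IC50 five times higher, blocks fully

for conc in (1.0, 1000.0):
    a = fractional_inhibition(conc, **drug_a)
    b = fractional_inhibition(conc, **drug_b)
    print(f"conc={conc:g}: drug A blocks {a:.1%}, drug B blocks {b:.1%}")
```

Under these assumptions drug A blocks more of the process than drug B at a low concentration, but less at a high one, so a single phrase such as "five times as potent" would misdescribe the pair. That is the kind of inference best left to the authors or to a secondary source.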

Finding illustrations

Figures and tables are ordinarily of great importance to primary journal articles, but for Wikipedia their use is limited. Because figures in most journal articles are copyrighted, they often cannot be copied directly. Editors also generally should not try to interpret the figures beyond what is written about them in the text.

If a Wikipedia article would benefit from added figures, it may be best to begin by choosing a public access journal such as a PLOS or BMC journal, or a public database such as PubMed Central, and search for relevant articles within that source only. In this way it is possible to rapidly look through figures for desired illustrations.

Chemical compounds are very rarely depicted in biological literature. While it is possible to draw them yourself with programs like ChemDraw, nearly any chemical that you can order commercially or that is biochemically relevant should have an entry at PubChem. Images from this source can be copied (and cropped if necessary) to illustrate articles.

Journal prominence and quality

The vast majority of scientific journals maintain a basic standard of quality and peer review. Even the most obscure articles are often useful references for specialized study. Nonetheless, there is some variation in quality, and cases like the Sternberg peer review controversy and a similar case in Proteomics[5] illustrate that the process is not perfect.

Much more importantly, there is great variation in the overall significance of work. One study may be ground-breaking, while another serves only as a confirmation; or one study may use a wider range of techniques to prove a point conclusively, while another can only show a correlation. For this reason, there is a loose hierarchy of journals. These are often described in terms of impact factors, though as noted in that article, the concept is not without flaws. It probably isn't worth trying to find out how well-known a scientific journal is before citing a fact for Wikipedia, but if there is suspicion or dispute it may help to do a Web search using the name of the journal followed by the words "impact factor". For example, Nature Biotechnology impact factor brings up 22.8, while Biochemical and Biophysical Research Communications brings up 2.749, from their respective publishers' Web sites. The former journal is far more likely to break major discoveries, while the latter may provide confirmatory results. But the latter is more likely to permit authors to describe their techniques and results in full detail.

No journal, however prominent, is immune from fraud. Nor can anyone guarantee that a paper can be verified, and in practice logical flaws and irreproducible results are never ruled out entirely. Fortunately, Wikipedians don't need to consider all of these contingencies, but can confidently enter information that is verifiable, regardless of whether it is ultimately true. Someone with different sources will turn up to explain things sooner or later.


  1. ^ "IADR-AADR submitted response to the NIH". 
  2. ^ Peter Suber (2005). "NIH public access policy FAQ". 
  3. ^ "Harding University: Primary Biological Literature". 
  4. ^ For example, Yale's news releases can be accessed as a second-level menu item from the front page, followed by a choice of a few options.
  5. ^ James Randerson (2009-02-13). "How was this paper ever published?". The Guardian.