Talk:Genetic code

Good article Genetic code has been listed as one of the Natural sciences good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it.
Article milestones
Date Process Result
September 21, 2006 Good article nominee Listed
March 1, 2010 Good article reassessment Kept
Current status: Good article
WikiProject Molecular and Cell Biology (Rated GA-class, Top-importance)
WikiProject icon This article is within the scope of the WikiProject Molecular and Cell Biology. To participate, visit the WikiProject for more information.
 GA  This article has been rated as GA-Class on the project's quality scale.
 Top  This article has been rated as Top-importance on the project's importance scale.
WikiProject Genetics (Rated GA-class, Top-importance)
WikiProject icon This article is within the scope of WikiProject Genetics, a collaborative effort to improve the coverage of Genetics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 GA  This article has been rated as GA-Class on the project's quality scale.
 Top  This article has been rated as Top-importance on the project's importance scale.
WikiProject Biology (Rated GA-class, Top-importance)
WikiProject icon Genetic code is part of the WikiProject Biology, an effort to build a comprehensive and detailed guide to biology on Wikipedia.
Leave messages on the WikiProject talk page.
 GA  This article has been rated as GA-Class on the project's quality scale.
 Top  This article has been rated as Top-importance on the project's importance scale.

Which DNA strand

I couldn't find anywhere the answer to this: the DNA consists of 2 strands, which are complementary. Which strand is selected during gene expression? The choice cannot be random, since it would generate different proteins. For example, suppose one strand contains CTC, then the opposite strand will contain GAG. The first one will translate into mRNA GAG and encode for glutamic acid, the other one into CUC and generate Leucine. — Preceding unsigned comment added by (talk) 10:27, 22 September 2011 (UTC)

The gene promoter determines the position and orientation of RNA polymerase, and therefore which strand is transcribed. Adrian J. Hunter(talkcontribs) 10:43, 22 September 2011 (UTC)
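The CTC/GAG example in the question can be checked with a small sketch. It assumes the standard genetic code and a hypothetical helper `transcribe` that treats whichever strand the promoter selects as the template; the function names are illustrative, not taken from the article.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ('*' marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(template_5to3):
    """Return the mRNA (5'->3') made from a DNA template strand written 5'->3'."""
    # The mRNA is the reverse complement of the template, with U in place of T.
    reverse_complement = template_5to3.translate(DNA_COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


def translate(mrna):
    """Return one-letter amino acids for each complete codon, starting at position 0."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


# Each strand, if used as the template, yields a different mRNA and amino acid.
for template in ("CTC", "GAG"):
    mrna = transcribe(template)
    print(template, "->", mrna, "->", translate(mrna))
```

With CTC as the template the mRNA is GAG (glutamic acid, E); with GAG as the template it is CUC (leucine, L). This is why the orientation fixed by the promoter matters.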

2 Tables?

It seems rather redundant to have both - I understand the reasons for setting up the table both ways, but I don't think it adds much to the article to include the 2nd table. If there are no objections, I'll remove it. Hichris 18:49, 28 November 2006 (UTC)

And then there's the lack of pretty pictures, but I suppose that isn't really correctable :) Chris Cunningham 18:12, 2 October 2006 (UTC)

It's not really redundant. For me (and hopefully for others) this table is a valuable resource that may be used for designing mutagenesis primers when exchanging amino acids by PCR. May I ask you to put it back, please? This message is encrypted! You'll need a brain to decode it. 14:53, 12 January 2007 (UTC)

While maybe useful to some (I do mutagenesis and haven't found any need for both, but that's me), I don't feel it adds to the article. The information is already there. I'm sure you can find the same sort of table in a textbook or elsewhere online, so I'd vote no on putting it back. However, if there is a lot of support for putting it back, then you can do so.
I actually like the circular version of the code, which can be read in both directions (see If someone knows of a good image like that, I'd be all for replacing the current table. Hichris 16:39, 12 January 2007 (UTC)
I think the inverse table is relevant and I restored it. The Dutch version of this article (click on the interwiki link 'Nederlands') has a circular table. Maybe someone knows how to copy that table to this page? 09:49, 19 January 2007 (UTC)

Here's an alternative presentation, using the IUPAC abbreviations from DNA_sequence:

Alternate inverse table
R=Purine (G or A), Y = Pyrimidine (U or C), N = Any
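As a rough illustration of how the R/Y/N shorthand compresses an inverse table, the sketch below expands a degenerate codon pattern into concrete codons and looks them up in the standard code. The helper names `expand` and `decode` are hypothetical and only the three ambiguity letters listed above are supported.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ('*' marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Only the ambiguity letters used in the legend above, plus the four plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "Y": "CU", "N": "ACGU"}


def expand(pattern):
    """List every concrete codon matched by a degenerate pattern such as 'UUR'."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in pattern))]


def decode(pattern):
    """Return the set of amino acids (one-letter codes) a pattern can encode."""
    return {CODON_TABLE[codon] for codon in expand(pattern)}


print(expand("UUR"))   # the two purine-ending UU codons
print(decode("CUN"))   # all four CU codons encode leucine
print(decode("AGY"), decode("AGR"))  # serine vs. arginine within the AG box
```

So leucine, for example, can be written compactly as CUN plus UUR instead of six separate codons.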


I just started reading the article, so I am hitting points as I go. The genetic code is estimated to be about as old as the Earth itself. Eigen M, Lindemann BF, Tietze M, Winkler-Oswatitsch R, Dress A, von Haeseler A. How old is the genetic code? Statistical geometry of tRNA provides an answer. Science. 1989 May 12;244(4905):673-9. PMID: 2497522 [PubMed - indexed for MEDLINE]

The standard genetic code is universal, but there are some modifications in mitochondria, chloroplasts, some organisms like yeast, etc. Oops! It's already there. GetAgrippa 22:18, 17 January 2007 (UTC)

Now that I've read the article, kudos to the authors. Excellent article! GetAgrippa 05:46, 21 January 2007 (UTC)

"God did it"

"God did it" is being repeatedly added as an "alternative explanation" of how the genetic code came to be, and the argument given for adding it rests on alleged editor bias or suppression of alternative viewpoints. So let's figure it out:

  1. Is the page about a scientific topic, and are the other (non-theistic) explanations within the bounds of mainstream science and well-cited by WP standards? yes
  2. Is "God did it" within the bounds of mainstream science? no
  3. Is there any extraordinary evidence given to support this fairly extraordinary/non-mainstream-science claim? no
  4. Are there any citations being provided that support this alternative viewpoint at all? no
  5. Does "God did it" even qualify as a scientific theory? Does not appear so.

Conclusion: does not belong. DMacks 19:57, 30 January 2007 (UTC)

Thanks for the comments, DMacks. You are correct! I have removed it two or three times. I call it vandalism. Well, POV pushing for sure, since it is a belief and not a verifiable scientific fact. GetAgrippa 20:23, 30 January 2007 (UTC)

I entirely agree with DMacks. With all due respect, the Bible is not a scientific source. Still, there is an option for the supporters of the divine origins of everything. In cosmology there is something called the anthropic principle. As the dedicated article here on Wikipedia says, this is a collective term that attempts to explain the structure of the universe by way of coincidentally balanced features that are necessary and relevant to the existence on Earth of biochemistry, carbon-based life, and eventually human beings to observe such a universe. There is a lot of serious science standing behind it, and it is probably the only point at which religion and science get within shouting distance of each other. I can hardly be more politically balanced.
GGenov 20:34, 30 January 2007 (UTC)
I removed the "God did it" text with references. This is not science, as ruled in court cases (contrived dualism argument). This is not NPOV, but POV pushing. I believe there is some scientific credence to the notion of an extraterrestrial origin of life on Earth, but this is a horse of a different color. I have noted on this editor's Talk page that he has a problem with POV pushing and failing to take the rules seriously. GetAgrippa 21:36, 1 February 2007 (UTC)
Frank needs to make a case on the Talk page. His statement is not NPOV, because we would have to include other faiths; "God" covers only the Abrahamic faiths. Further, nowhere in the Judeo-Christian Bible does it say God (Yahweh, etc.) created the genetic code. The article is about the genetic code. GetAgrippa 22:22, 1 February 2007 (UTC)
No God mention in my last edit, and I retitled the section to say "Scientific theories of the origin of the genetic code". This should be a good compromise. As for vandalism accusations, that violates WP:AGF. NPOV does not equal SPOV, but if you want this to be a science only article then the section heading change is in order. A link to the controversy somewhere should be put in, as it isn't neutral to state only science explanations for the genetic code, but title the section as if it was all inclusive. --Frank Lofaro Jr. 23:50, 1 February 2007 (UTC)
This is a science article. NPOV is per subject not in general. GetAgrippa 00:30, 2 February 2007 (UTC)
The editors on this page might be interested in reading Wikipedia:Requests for arbitration/Pseudoscience. JChap2007 23:38, 8 February 2007 (UTC)


Codon redirects here, but this is not very useful if you more or less know what the genetic code is and are wondering what the heck a codon is. I mean, is it a real physical structure, or is it just a scientific convention? In the first paragraph you get the idea that it's a physical structure; later on you learn a sequence can be read in any of three ways. If you chop a strand and have no start/stop sequence, do its codons cease to exist? It could use its own article, even if it's a short one. Brallan 17:59, 27 March 2007 (UTC)
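The "three ways" of reading can be made concrete with a short sketch: the same stretch of RNA splits into different triplets depending on where reading starts. The sequence and the helper name `reading_frames` below are made up for illustration.

```python
def reading_frames(seq):
    """Split a sequence into complete triplets for each of the three start offsets."""
    return {
        offset: [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    }


# Hypothetical 9-nucleotide RNA fragment, used only for illustration.
fragment = "AUGGCCUAA"
for offset, codons in reading_frames(fragment).items():
    print(offset, codons)
```

Frame 0 gives AUG, GCC, UAA; frame 1 gives UGG, CCU; frame 2 gives GGC, CUA. The nucleotides are physical; which triplets count as codons depends on the frame the ribosome reads in.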

As a pre-med student, I have a whole bullet list of things to know about codons. The guy above me wrote this 13 months ago, and a lot of discoveries have happened, and lots of advancements have been made in genetics and RNA translation. I have not made a new page before. The role of codons, codon mutations, developing HIV treatments that block or interfere with specific codons, start codons, stop codons, anti-codons, directionality of codons comparing RNA and DNA, proteins which splice long nucleic acids at specific codon transitional sequences specific to prokaryotes. I could go into subsections, but trust me, I came to Wikipedia because of what I don't know, not what I know. Thus there is way more stuff on the topic, and the Google stuff takes me to high-school versions of what I need to know. If someone can create the article, I'll edit into it all I know. Sentriclecub (talk) 08:43, 27 April 2008 (UTC)
I'm not sure what the heck you're actually on about, but - there's virtually no way to talk about the genetic code and codons separately. The two are inextricably intertwined, which is why the articles were merged in the first place, lo these many years ago. I suggest you actually read the article; most of the shit you mention above is discussed there. Could the article be improved? Could it explain codons better in the intro? Sure. But separating the two out is dumb. Graft | talk 03:54, 30 April 2008 (UTC)
Thank you for your response. My desire to expand Wikipedia articles and make helpful contributions is met by discouragement at how you have made me feel. I read the wikiproject:mol bio and thought it would be a great place for me to contribute, and you have given me a bad first impression, as this is not the type of treatment I would expect from the talk page of a science article. I would never discourage anyone else, and I always treat people and their ideas with respect. But here, you have respected neither. Good day, and sorry if you think anything from my April 27th post was inappropriate for the talk page of a mol-bio article. Sentriclecub (talk) 19:15, 2 May 2008 (UTC)
Ergh - I apologize for the above language. It was early in the morning and I wasn't thinking clearly, and after it was pointed out I meant to come and correct it. I didn't mean to belittle your contributions, nor was my language meant to be harsh or dismissive (it was just unfortunately phrased). Please don't let this terrible introduction dissuade you from editing articles! Graft | talk 19:33, 2 May 2008 (UTC)

Internal section link

I thought it was strange that there was a hyperlink from the Introduction to a section later in the same article (the words "variant codes" linking to the section, "5 Variations to the standard genetic code"). Does anybody else agree? richard.decal (talk) 07:05, 22 June 2010 (UTC)

I don't personally see a problem with it, and it seems to be endorsed at WP:MOSLINK (see the last sentence of Wikipedia:Link#Link_specificity). Wikipedia doesn't have a specific article on variant codes, so the link in the lead helps readers looking for information on variant codes find that information quickly. Adrian J. Hunter(talkcontribs) 08:26, 22 June 2010 (UTC) (ps. New topics go down the bottom) (pps. Good copy edits!).

Merger proposal

I have written up my issues with Universal genetic code on its talk page. -Madeleine 21:02, 15 April 2007 (UTC)

Didn't even know the other article existed! I say merge 'em, but for the most part keep the bulk of this article and only significant additions from the other Hichris 14:39, 16 April 2007 (UTC)
The other article has a lot of dubious comments and questionable facts. For instance, it says (rather, said) the discovery of variant codes was surprising, but, in fact, it was actually considered slightly worrying that no variants had been found, as the process, though difficult, should've happened at least a few times. (Crick, F. H. C. and Orgel, L. E. (1973) "Directed panspermia." Icarus 19:341-346. p. 344: "It is a little surprising that organisms with somewhat different codes do not coexist." Several other examples are at [1].) Adam Cuerden talk 20:34, 17 April 2007 (UTC)
Support with apprehensions: I agree with the merger but with significant trepidations. PLEASE be sure to preserve the integrity of this article, and use whatever you can from the other article to make this one as great as it can be. This article is incredibly important, and a smooth, flowing article will be great for Wikipedia. It's a big merger! I support it, as long as it's done carefully. WiiAlbanyGirl 01:07, 25 April 2007 (UTC)
I support as well. The other article is making a mountain out of a mole-hill and talks at great length about very little. Frankly I think it's entirely superfluous, and the minimal discussion here suffices. It could be expanded slightly to cover most of the worthwhile content in the other page. Graft | talk 01:35, 25 April 2007 (UTC)
Actually after re-reading, it smells a lot like a POV fork, to me, and I'm wondering whether it might not be worthwhile to simply AfD it. Graft | talk 01:37, 25 April 2007 (UTC)

Scientific theories on the origins?

I do not agree with the word "scientific". As this is a science related article, any theory provided here should be scientific in the first place. CharonZ 22:22, 25 April 2007 (UTC)

I think you're right, so I went ahead and changed it. -- Madeleine 22:34, 25 April 2007 (UTC)
See above; it was titled so to keep "God did it" from being added. 20:05, 29 May 2007 (UTC)

Alternate representations of the genetic code

One of the "arguments" creationists often use is that the code is too complicated to have evolved on its own. I recently tried to see if the code could be condensed in order to simplify it. I simply placed the second base of the triplets in the middle of the code-sun, followed by the first, followed by the third. If you do that you manage to get all the codons for leucine, serine, arginine and stop together! Furthermore, you can twist the codons in such a way that amino acids seem to cluster into structural/functional groups (unpolar, polar, charged, intermediary, and special properties). If you are interested, please have a look at and leave comments. The Journal of Theoretical Biology seemed interested but they are known to take forever to process manuscripts. So, in the spirit of the open-source movement I went ahead and published a very rough first sketch on the net. Agabirhei 12:55, 18 July 2007 (UTC)

While this different way of representing the genetic code may help visualize groupings, as long as this is unpublished research this should not be mentioned in wikipedia under the "no original research" guideline: WP:NOR. Even if it is published, I think it should have more notability before getting put into this article. Apologies. Madeleine 00:18, 26 July 2007 (UTC)

Is there the possibility to delete this particular talk section (Alternate representations of the genetic code)? The Journal of Theoretical Biology accepted my article and I have calmed down considerably after my initial shock at the results of playing sudoku with the genetic code. Apologies again for not following proper procedures. I don't know if I'm entitled to delete the section but I give my full consent to anyone who wishes and is entitled to do so. Agabirhei (talk) 19:36, 4 July 2008 (UTC)

Specifically address genetic-code/genome confusion?

Popular accounts often misuse the phrase "genetic code" to mean "genome". See, for example, this Scientific American article: Genetic Code of Deadly Mosquito Cracked. Should the entry for "genetic code" or "genome" address this? —Tyrrell McAllister 09:57, 18 May 2007 (UTC)

That's an interesting question. There was an interesting article in The Scientist a few years ago that addressed those concerns. Right now, I personally am of the opinion that this doesn't have to be addressed immediately. Antorjal 17:20, 28 July 2007 (UTC)
Comment added I just found the link I was talking about in the previous post. The article can be accessed here: [2] Antorjal 01:06, 5 August 2007 (UTC)
The misuse continues: [3]. I think this should be addressed in the article. If someone can find the article in The Scientist mentioned above, it could be used as a reference. Here is a more recent article that seems to discuss it --GregRM (talk) 16:46, 28 August 2010 (UTC)
The more this misuse occurs, the more entrenched it becomes. Descriptivist linguists would say that because the term is used that way, it takes on that meaning. (talk) 20:02, 3 December 2010 (UTC)

Coming at this from a background of linguistics, I have a problem with using "code" to describe the subject matter of the article, as well as describing the codon as "transmitting information." The problem is that the codon describes the relatively circumscribed behavior of a limited series of chemical compounds, and the only "information" conveyed is the actual configuration of those chemical compounds. I recognize that a linguistic metaphor is useful, and possibly deemed essential by biologists in order to understand the workings of the codon and communicate it to others, but there should be a clear and unambiguous statement somewhere in the article that this is a heuristic device, and that the codon is not identical to a linguistic code. Digthepast (talk) 15:19, 26 September 2011 (UTC)

Our Code article begins with this definition: "A code is a rule for converting a piece of information (for example, a letter, word, phrase, or gesture) into another form or representation (one sign into another sign), not necessarily of the same type." The genetic code is such a rule for converting information (DNA/RNA sequences) into another representation (protein). I don't see a problem here - the genetic code is just a lot older than language (our representations using letters are younger still, but they represent a truly ancient genetic code). -- Scray (talk) 00:41, 27 September 2011 (UTC)

Codon usage stats in the genetic code image

What is the source of the codon usage statistics given in the picture? The values for Arginine in E. coli seem to contradict information given at (which is apparently from Escherichia coli and Salmonella, Vol. 2, Ch. 114:2047-2066, 1996, Neidhardt FC ed., ASM press, Washington, D.C). It says in the image that the agg and aga codons are not used at all in E. coli, which seems to be wrong. Also, there is a very large difference according to the source I cited between usage of cgg and cgt, which the picture doesn't reflect. I haven't checked anything besides those, though. —Preceding unsigned comment added by (talk) 12:50, 24 January 2008 (UTC)

Why are amino acid residue hydropathy and molar volume encoded in the genetic code prior to translation?

Genetic Code Structure.jpg

Doug Youvan (talk) 02:04, 25 April 2008 (UTC)

The genetic code is fault tolerant such that point mutations (single base changes) are less likely to cause destabilizing mutations in proteins. Thus, amino acids with similar physical properties are more likely to have similar triplets. From this article: "A practical consequence of redundancy is that some errors in the genetic code only cause a silent mutation or an error that would not affect the protein because the hydrophilicity or hydrophobicity is maintained by equivalent substitution of amino acids; for example, a codon of NUN (where N = any nucleotide) tends to code for hydrophobic amino acids." Madeleine 02:52, 25 April 2008 (UTC)
The genetic code is certainly capable of evolving, especially when you don't have a lot of cellular machinery committed to it. As is evident from the article, there are many examples of mitochondrial variations on the genetic code. Tweaking the code isn't impossible - the code, after all, is the product of a specific machinery - tRNA synthetases and tRNA itself, to be exact. It's possible that at a primitive enough point in the history of organismal complexity, this machinery evolved to some semblance of optimality. But this is just my speculation - I'll try and find a few references on the subject. Graft | talk 05:27, 25 April 2008 (UTC)
The source of the figure is here: and that on-line paper's references will show a few interesting things: 1) the structure of a membrane protein can be predicted from the nucleotide sequence (without translation), because the correlation between the structure of the genetic code and the hydropathy of the amino acid residues is significant, 2) Singular Value Decomposition (SVD) can be used to map amino acid residue hydropathy and molar volume (separately) back onto the triplet codon as a function of the position in the codon and the nucleotide used, and 3) the genetic code as it stands is special as compared to random codes for supporting in vitro directed evolution experiments wherein genetic algorithms (theory and practice) are used to guide the 'doping' of codons in synthetic DNA for combinatorial mutagenesis.
So, my question can be rephrased as follows: Why is the genetic code structured in a manner that predicts (another word might be better?) the two most important properties of the amino acid residues? I don't believe there is a known feedback mechanism to select for a particular code, nor is the subject discussed much. It would seem that a hypothetical evolutionary selection on the primordial code might have led to several different codes which we do not see. Any references to this that we can cite?
I should add that there is a related discussion here Talk:Moore-Penrose_pseudoinverse#PseudoInverse_of_Partitioned_Tuples where readers of this discussion should recognize "alphabet = 4" as the four nucleotides, and "word = 3" as the triplet (3) codon. Using that math, the conventional PseudoInverse (from SVD) is not needed for matrices structured such as the genetic code. It's unclear whether that is a special and/or trivial solution to P=NP. Proper referencing to encyclopedic quality work is needed in that case, too. Doug Youvan (talk) 06:17, 25 April 2008 (UTC)
May I suggest that the "yellow figure" be placed in the article with an explanation of the 20 letter single amino acid code and the use of "N" for any of the four nucleotides in the triplet? Later, some of the more complex discussion (as in above) can be referenced rather than trying to go into this depth. Another editor's help would be appreciated in order to keep this understandable. Doug Youvan (talk) 14:54, 25 April 2008 (UTC)
It's hard to imagine a feedback mechanism these days, but with a much higher error rate in the process of translation, redundancy would ensure greater fidelity of the protein. That seems mechanism enough. Graft | talk 15:46, 25 April 2008 (UTC)
So, without any mechanistic explanation, should we just insert the figure as an interesting phenomenon? It's basically the code plus two critical aa residue physicochemical properties in a Venn diagram. The explanation of molar volume is simply size, and the hydropathy scale is basically water solubility. Nothing more needs to be said. Doug Youvan (talk) 16:47, 25 April 2008 (UTC)
No. The diagram is difficult to understand and you're writing a lot of strange OR-ish stuff on this talk page. The observation of this redundancy is not new, try googling [optimization of the genetic code]. If someone else would like to expand the article's coverage of this then that might be nice (although I don't see that it's critical), but I would be uncomfortable with seeing any additions come from you since you seem to be pushing an original research viewpoint that I do not understand. Sorry to be so blunt. Madeleine 04:26, 26 April 2008 (UTC)

Is this more understandable? OR-ish does not apply, because this is data already on WP, and I have published in the field.

Elliptical Genetic Code.JPG

Doug youvan (talk) 16:43, 30 May 2008 (UTC)

I'll now add this as an external link to the article. The website target has nothing but this figure and the data (referenced) to create the figure. At some point, perhaps the figure can be integrated into the article and we can drop the external link. Doug youvan (talk) 21:47, 30 May 2008 (UTC)
Codon Bias.jpg
I suggest someone else write the legend for this very simplified, small figure and insert it into the appropriate position in the Article. Doug youvan (talk) 19:42, 31 May 2008 (UTC)
It is clear that Francis Crick could have produced this diagram decades ago. He is often referenced for looking into structuring within the genetic code, but then he apparently reversed his position. This was not understandable to me until, just today, I searched "Thomas Jukes" (and) "Francis Crick" on Google and found this statement from Crick:
"The lectures will be concerned with the impact of biological ideas, both present and future, on our concept of the world. They will not be militantly anti-Christian, but nevertheless will be directed against the sort of ideas at present held by many religious people." - Referenced as 14 December 1965 from - having many internal links that an historian should now study in reference to Intelligent Design. Doug youvan (talk) 00:29, 1 June 2008 (UTC)
In thinking that someone might have actively opposed Crick's statement (only 20 years after WWII), and given the politics of big science, it seemed like a Crick opponent would suffer damage. Given the glaring hole at the Karolinska for the structure of tRNA, I just went to Alex Rich's website at MIT and found this letter: Doug youvan (talk) 01:13, 1 June 2008 (UTC)

Sorry - This is out of sequence, but I now see Holley goes to Salk with Crick in 1968. Historical. Doug youvan (talk) 02:39, 3 June 2008 (UTC)

No one seems to be watching this page, so I made ~ 10 posts on the Talk pages of ~ the last 10 editors on this article. That is a request to review [[Image:Codon_Bias.jpg]] before it is inserted. Doug youvan (talk) 05:08, 1 June 2008 (UTC)
Wikipedia:ARCHIVE Please do not archive this section at this time: "There are two main methods for archiving a talk page, detailed below. Regardless of the method you choose, you should leave current, ongoing discussions on the existing talk page." Doug youvan (talk) 10:17, 1 June 2008 (UTC)
None of these diagrams above mean much to me on their own. Please draft a paragraph outlining what you wish to insert and citing the references that make the points that are also made in the text. Please note that references cannot be used to provide data that is then interpreted in a novel way on Wikipedia - that is original research. The ideas and arguments cannot be novel, you must simply report what others have said in publications on the topic beforehand. Tim Vickers (talk) 18:31, 1 June 2008 (UTC)
Tim - Please click on the black-and-white figure, above, and it will take you to Commons, where the key is given in the description. You will see links to two published papers. One is on-line. This figure is a re-draw and simplification of the published figure. In particular, I redrew it so that the black and white version shown here survives reduction in physical size. The "yellow" published figure which you see above on this discussion page is the same as the one referenced through Commons, and it does not survive physical reduction in size. I still think I should avoid writing the legend myself so that someone else can double-check for any possible typos. A typo in the genetic code will propagate rapidly from this article. The pattern is referenced in the issued USPTO database as a program from MIT (Cyberdope) 19 times [4] and in the published USPTO database 7 times [5] Thanks. Doug youvan (talk) 20:17, 1 June 2008 (UTC)
Sorry, I mean please provide a draft of the text that you wish to insert, I've started a section below. Tim Vickers (talk) 20:33, 1 June 2008 (UTC)

Draft section


Note: Exactly the same text as the current article with one sentence (bold) inserted with one or two references, and one small figure already on Commons:

The reference cited does not appear to discuss the evolution of the genetic code. Was this the link you intended to include as a reference for this paragraph? Tim Vickers (talk) 02:15, 2 June 2008 (UTC)
Tim - I was clarifying what was already stated in that section: "One can ask the question: is the genetic code completely random, just one set of codon-amino acid correspondences that happened to establish itself and be "frozen in" early in evolution, although functionally any of the many other possible transcription tables would have done just as well? Already a cursory look at the table shows patterns that suggest that this is not the case." My insert occurs as the next (new) sentence. In the thumbnail, I redraw the Fuellen-referenced figure in a simpler manner, and Fuellen states that he has "adapted" the Yang-referenced figure from the book. I've gone through the Fuellen version for the redraw, because the Fuellen reference is on-line. The Yang reference is not. Consequently, the thumbnail is simply a redraw of Yang, but it would be difficult for readers to track that to the Yang publication. Doug youvan (talk) 15:35, 2 June 2008 (UTC)
Here is the abstract from the Yang publication: A solution to the problem of relating the physico-chemical properties of the amino acids to their codon sequences has been achieved by treating the genetic code as a system of linear equations and applying the numerical method, Singular Value Decomposition (SVD). For example, hydropathy and molar volume, which are important determinants of protein structure and function, can be quantitatively related to the nucleotide sequence. The 20 hydropathy values of the amino acid residues were remapped to 12 nucleotide-determined values which, in turn, were used to predict structural aspects of the photosynthetic reaction center protein, without DNA -> protein translation. These algorithms establish a theoretical basis for manipulating the properties of ensembles of proteins at the DNA level, which is important for engineering and analyzing combinatorial cassette libraries, and for designing reduced information content (RIC) proteins. --Preceding unsigned comment added by Doug youvan (talkcontribs) 16:41, 2 June 2008 (UTC) oops Doug youvan (talk) 16:43, 2 June 2008 (UTC)
Yes, I see where the figure comes from, but what reference are you using that discusses the significance of these relationships to the evolution of the genetic code? Tim Vickers (talk) 18:34, 2 June 2008 (UTC)
Figure and text moved up and blended into section on degeneracy. The figure can be relabeled:

U1 = UNN

A2 = NAN

C2 = NCN

U2 = NUN

Solubility -> Hydropathy

Size -> Molar Volume Doug youvan (talk) 13:42, 3 June 2008 (UTC)

Your figures for "hydropathy" are for the individual amino acid and not for its properties in proteins. Proline, for example, has a generally very exceptional set of functions in proteins that are not accounted for by the simple qualifications "molar volume" and "hydropathy". While there is some insight in the notion that chemical properties are transduced from the genetic code, I think you're trying too hard to create correlations that are there but not as deep as you want them to be. Takometer (talk) 20:52, 7 June 2008 (UTC)

Theories on the origin of the genetic code[edit]

Despite the variations that exist, the genetic codes used by all known forms of life on Earth are very similar. Since there are many possible genetic codes that are thought to have similar utility to the one used by Earth life, the theory of evolution suggests that the genetic code was established very early in the history of life, and meta-analysis of transfer RNA suggests it was established soon after the formation of Earth.

One can ask the question: is the genetic code completely random, just one set of codon-amino acid correspondences that happened to establish itself and be "frozen in" early in evolution, although functionally any of the many other possible transcription tables would have done just as well? Already a cursory look at the table shows patterns that suggest that this is not the case. For example, C in 2nd position of the codon yields amino acid residues that are small in size and moderate in hydropathy; U in 2nd position encodes average size hydrophobic residues; A in 2nd position encodes average size hydrophilic residues; U in 1st position encodes residues that are not hydrophilic, see Image:Codon_Bias.jpg, adapted from] and (Yang et al. 1990. In Reaction Centers of Photosynthetic Bacteria. M.-E. Michel-Beyerle. (Ed.) (Springer-Verlag, Germany) 209-218).

There are three themes running through the many theories that seek to explain the evolution of the genetic code (and hence the origin of these patterns).[1] One is illustrated by recent aptamer experiments which show that some amino acids have a selective chemical affinity for the base triplets that code for them.[2] This suggests that the current, complex translation mechanism involving tRNA and associated enzymes may be a later development, and that originally, protein sequences were directly templated on base sequences. Another is that the standard genetic code that we see today grew from a simpler, earlier code through a process of "biosynthetic expansion". Here the idea is that primordial life 'discovered' new amino acids (e.g. as by-products of metabolism) and later back-incorporated some of these into the machinery of genetic coding. Although much circumstantial evidence has been found to suggest that fewer different amino acids were used in the past than today,[3] precise and detailed hypotheses about exactly which amino acids entered the code in exactly what order have proved far more controversial.[4][5] A third theory is that natural selection has led to codon assignments of the genetic code that minimize the effects of mutations.[6]


  1. ^ Knight, R.D.; Freeland S. J. and Landweber, L.F. (1999) The 3 Faces of the Genetic Code. Trends in the Biochemical Sciences 24(6), 241-247.
  2. ^ Knight, R.D. and Landweber, L.F. (1998). Rhyme or reason: RNA-arginine interactions and the genetic code. Chemistry & Biology 5(9), R215-R220. PDF version of manuscript
  3. ^ Brooks, Dawn J.; Fresco, Jacques R.; Lesk, Arthur M.; and Singh, Mona. (2002). Evolution of Amino Acid Frequencies in Proteins Over Deep Time: Inferred Order of Introduction of Amino Acids into the Genetic Code. Molecular Biology and Evolution 19, 1645-1655.
  4. ^ Amirnovin R. (1997) An analysis of the metabolic theory of the origin of the genetic code. Journal of Molecular Evolution 44(5), 473-6.
  5. ^ Ronneberg T.A.; Landweber L.F. and Freeland S.J. (2000) Testing a biosynthetic theory of the genetic code: Fact or artifact? Proceedings of the National Academy of Sciences, USA 97(25), 13690-13695.
  6. ^ Freeland S.J.; Wu T. and Keulmann N. (2003) The Case for an Error Minimizing Genetic Code. Orig Life Evol Biosph. 33(4-5), 457-77.

Freeland et al reference[edit]

The Freeland et al. reference in this article links to an abstract at PubMed. As usual, further reading of the actual paper is blocked by copyright. However, there appears to be an on-line copy: . In fact, this verbose pdf paper supports the opposite view to the one it is cited for in this genetic code article. The Freeland reference has excellent historical literature citations in mutational analyses, but nothing is cited in terms of an a priori mathematical analysis of the structure of the genetic code. Should we agree on how to fix this reference and the re-statement of its conclusion in this article? Doug youvan (talk) 16:09, 1 June 2008 (UTC)


Image:GeneticCode21-version-2.svg needs to be replaced at higher resolution with a more common file format. Any ideas that aren't a copyvio? Doug youvan (talk) 01:14, 3 June 2008 (UTC)

SVG's have arbitrarily high resolution and are actually a preferred format for graphs in Wikipedia. The template ShouldBeSVG is often added to images to request a conversion to SVG, I've never seen anyone request a conversion away from it. If you have a different problem with the image (eg. readability) that is a more reasonable criticism. Madeleine 02:15, 3 June 2008 (UTC)
Let me look into .pptx -> animated gifs on the Commons side as a possibility for creating one static frame for print and many animated frames behind it for additional information. I'll do that in a sandbox before suggesting such a mutation for an article as important as this one. My questions and discussion on the page are now concluded, so I think it is time to archive. Doug youvan (talk) 06:10, 5 June 2008 (UTC)

The forbidden combinations of genetic codes and amino acids[edit]

The following codon and the corresponding amino acid combinations do not occur in nature.

  1. TGGTGTATG corresponding to the amino acid combination WCM
  2. TGGATGTGT corresponding to the amino acid combination WMC
  3. TGTATGTGG corresponding to the amino acid combination CMW
  4. TGTTGGATG corresponding to the amino acid combination CWM
  5. ATGTGTTGG corresponding to the amino acid combination MCW
  6. ATGTGGTGT corresponding to the amino acid combination MWC
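
For anyone wanting to check the codon-to-residue pairing in the list above, here is a minimal sketch that translates the six 9-mers with the standard genetic code. The table is built from the usual compact 64-letter layout (bases ordered T, C, A, G); the names `CODON_TABLE`, `translate` and `SEQUENCES` are only illustrative and come from no particular tool.

```python
# Standard genetic code in the compact 64-letter layout (bases ordered T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string read in frame from its first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# The six sequences listed in this thread.
SEQUENCES = ["TGGTGTATG", "TGGATGTGT", "TGTATGTGG",
             "TGTTGGATG", "ATGTGTTGG", "ATGTGGTGT"]

for seq in SEQUENCES:
    print(seq, translate(seq))
```

The six outputs should be the six distinct orderings of W, C and M, since each sequence is just a different arrangement of the single codons TGG (W), TGT (C) and ATG (M).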

These combinations are found only in certain genetically modified clones and hypothetical proteins. To verify this (this won't take more than a minute):

  1. Run blastp at for wcmwmccmwcwmmcwmwc and check the output, check for the proteins, find whether they are hypothetical or biochemically characterized.
  2. Run blastn at for TGGTGTATGAAAAAAAAAAA, TGGATGTGTAAAAAAAAAAA, TGTATGTGGAAAAAAAAAAA, TGTTGGATGAAAAAAAAAAA, ATGTGTTGGAAAAAAAAAAA, ATGTGGTGTAAAAAAAAAAA, and check each output. One would find similar sequences only, no exact match (except some clones). —Preceding unsigned comment added by Jeyamalini (talkcontribs)
Are they forbidden (they would have some alternative effect or lead to an impossible/incompatible situation), or are they just base-sequences that nothing happens to use? Are they really not found in nature, or are they only not found in the subset of nature that has been gene-sequenced? "Hypothetical" and "biochemically characterized" excludes all the things that exist or have existed but that have not been studied yet. If there is really something special about these sequences that they can't occur or have been proven not to occur in general (not just blast datasets), then someone will have certainly published about it and we need a source supporting that strong claim. If it's just a novel thing something someone stumbled upon, it's WP:OR, and Wikipedia isn't for publishing of original results--publish it in a good journal and Wikipedia (a collection of non-experts) will trust the editorial judgement of experts (journal staff). DMacks (talk) 05:40, 22 September 2009 (UTC)
If your BLAST experiment is correct, these sequences do not occur in the non-redundant set of sequenced genomes. This is different from them being "forbidden". There are probably many combinations of nucleotides that either do occur but have not been sequenced yet, or do not occur in any living organism. This is to be expected. Tim Vickers (talk) 17:06, 22 September 2009 (UTC)
If every genome can be sequenced, then the genetic code itself may be disproved. As of now, there is no such combination of genetic codes which occurs in nature. If they are not found in nature, it is not wrong to assume that they are forbidden.
Actually it is completely wrong to assume they are forbidden! You are assuming there is a cause (rather than coincidence) and you are not even ruling out experimental design flaws (extrapolating from a known incomplete dataset without knowing the existing data is a representative sample). "There are no students named Jake in my class, therefore my university does not allow anyone named Jake to take any classes?" DMacks (talk) 02:50, 23 September 2009 (UTC)
Amino acids are not used with equal frequencies. The least frequently used one in humans is W, while the most frequent L is used eight times as often. In addition, particular amino acid orderings are favored or disfavored because they lead to structural instability or are targets for other enzymes which recognize certain amino acid sequences and modify the protein based on them.
Your analysis is flawed and demonstrates your lack of experience in the field. You can't use BLAST like this, it's a seed based method meant to extend to significantly long patterns with various point scores given for extensions. You never use it to look for matches so short.
Actually getting the data and doing a vaguely reasonable analysis is not difficult. Download the human genome protein sequences from then write a Perl program to iterate through every three letter combination of amino acids. Count how many proteins have a given combination. Here are the fifteen rarest, in order of increasing frequency: MWW (43), WCM (54), MWC (59), WWC (69), CMW (72), WMW (73), CWW (75), WHW (76), WWM (76), YWW (76), HWW (84), WWW (84), MMW (85), MWM (87), HMW (87). There is no sharp drop-off in frequency that would support your immodest proposal of some not-yet-discovered deep biological rule regarding the letters W, C, and M.
Is it an immodest proposal? Are they biochemically characterized proteins?
In fact, this is what we'd expect given which amino acids are the rarest: W, M, C, Y, and H (in order of increasing frequency). All these tripeptides have W in them, although WWW is probably less rare due to some instances of repetitive sequences (cf. trinucleotide repeat disorders).
Most days I wouldn't bother to do any analysis in response to someone being so rude. I guess I was bored. Your persistent editing of the main page was totally out of line. If you want to do bioinformatics, go to grad school and learn how to do it there. Then publish it in a journal. Not here. -- Madeleine 03:52, 23 September 2009 (UTC)
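
The counting procedure described in this thread (for each three-residue combination, count how many proteins contain it) can be sketched roughly as follows. This is not the program that produced the numbers quoted above; the function name and the toy sequences are made up purely to show the mechanics, and reading a real protein file is left out because the data source is not specified here.

```python
from collections import Counter


def count_proteins_per_tripeptide(proteins):
    """For each 3-residue window, count how many proteins contain it at least once."""
    counts = Counter()
    for seq in proteins:
        # Using a set means each protein contributes at most once per tripeptide.
        counts.update({seq[i:i + 3] for i in range(len(seq) - 2)})
    return counts


# Made-up toy sequences, not real proteins.
toy_proteins = ["MKWCMLL", "MLLKWCM", "MWWLLK"]
counts = count_proteins_per_tripeptide(toy_proteins)

# Rarest observed tripeptides first; ties are broken alphabetically.
rarest = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:5]
print(rarest)
```

Tripeptides that never occur simply do not appear in `counts` (looking them up returns 0), so a full ranking over all 8000 combinations would need to iterate over the 20-letter alphabet explicitly.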

Removed computer science detail from "Transfer of information via the genetic code"[edit]

I removed the following parenthetical statement from the "Transfer of information via the genetic code" section:

(Practically speaking, one would need at least 2 bits to represent a nucleotide, and 6 for a codon, in a typical computer.)

It's correct, but I think it's more information than the section needs, and it interrupts the flow of the article. No need to nitpick. James A. Stewart (talk) 10:34, 7 October 2009 (UTC)
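
For reference, the arithmetic behind the removed statement: four nucleotides need log2(4) = 2 bits, and 64 codons need log2(64) = 6 bits. A minimal sketch of such a packing is below; the particular bit assignment (A=00, C=01, G=10, U=11) is an arbitrary assumption, and any fixed one-to-one mapping would work equally well.

```python
# Hypothetical 2-bit assignment; any fixed one-to-one mapping would do.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "U": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_codon(codon):
    """Pack a three-base codon into a 6-bit integer in the range 0..63."""
    value = 0
    for base in codon:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack_codon(value):
    """Recover the three-base codon from its 6-bit integer."""
    return "".join(BITS_TO_BASE[(value >> shift) & 0b11] for shift in (4, 2, 0))


print(pack_codon("AUG"), unpack_codon(pack_codon("AUG")))
```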

Prokaryotic v eukaryotic[edit]

Codons appear to code for different amino acids in prokaryotic and eukaryotic cells—can someone add a table or at least a comprehensive list of differences? Bongomatic 06:21, 8 October 2009 (UTC)

GA Reassessment[edit]

This discussion is transcluded from Talk:Genetic code/GA1. The edit link for this section can be used to add comments to the reassessment.

Starting GA reassessment as part of the GA Sweeps process. Jezhotwells (talk) 21:15, 26 February 2010 (UTC)

Checking against GA criteria[edit]

GA review (see here for criteria)
  1. It is reasonably well written.
    a (prose): b (MoS):
    The tone of the article is not encyclopaedic, it reads more like a text book. Consider a thorough copy-edit throughout.  Done
  2. It is factually accurate and verifiable.
    a (references): b (citations to reliable sources): c (OR):
    Large parts of the article are unreferenced, I have placed citation needed tags. This leads me to de-list immediately
    References supplied check out. Assume good faith for references to which I do not have access.
    Links to journal sites which require subscriptions should contain "|format=Subscription required" in the template. Not done
  3. It is broad in its coverage.
    a (major aspects): b (focused):
  4. It follows the neutral point of view policy.
    Fair representation without bias:
  5. It is stable.
    No edit wars, etc.:
  6. It is illustrated by images, where possible and appropriate.
    a (images are tagged and non-free images have fair use rationales): b (appropriate use with suitable captions):
  7. Overall:
    The lack of referencing in large parts of the article is a serious concern and means that I will de-list now. I will place back on hold as User:Boghog2 has requested. Note also that the article needs to be rewritten in a more encyclopaedic tone, less like a text book. When sorted this can be brought back to WP:GAN. Major contributors and projects will be notified. Jezhotwells (talk) 21:40, 26 February 2010 (UTC) On hold for seven days. Jezhotwells (talk) 01:48, 27 February 2010 (UTC)
    OK, I am happy for this to keep GA status. I still think that you should put subscription required in the templates for online journals where free access is not given. However this is not a specific GA criterion. Keep GA status. Jezhotwells (talk) 13:44, 1 March 2010 (UTC)


Could you provide some specific examples of where you consider the article's tone to be unencyclopedic? At first glance I don't notice any clear infringements of WP:NOTTEXTBOOK, like leading questions or systemic problem solutions as examples. Emw (talk) 04:46, 1 March 2010 (UTC)

It looks like User:Boghog2 has tidied the text. Jezhotwells (talk) 12:31, 1 March 2010 (UTC)
My only edits were to add some citations. I agree with Emw. I don't see any major problems with the tone of this article. Boghog (talk) 17:59, 1 March 2010 (UTC)

MCB Collaboration of the month[edit]

Here we can coordinate the collaboration of the month of this article. We need to find open points and unsolved issues and areas where we believe we can improve the article.

Open points / questions:

  1. "Frameshift mutations altering the sequence reading frame, and nonsense mutations causing a stop codon are examples of point mutations." This is factually wrong: point mutations are substitutions and hardly ever alter the reading frame (except maybe if they change the start codon).

Greetings --hroest 11:23, 4 March 2010 (UTC)

This section is a very good idea. Thank you. I will be able to contribute more next week. I agree with you that this section on mutations needs to be clarified. I prefer the association of point mutations with substitutions, too. However, the term is not only used in this strict sense. Alternatively, the article Point mutation needs to be corrected as well. I will check this again. --Firefly's luciferase (talk) 06:23, 5 March 2010 (UTC)

Just to add to the work list to remember (also for myself):

2. Section Genetic code#Sequence reading frame: Add the term Open reading frame = ORF. I will check some references since this article in WP is not well referenced either. --Firefly's luciferase (talk) 03:14, 9 March 2010 (UTC)

3. Two details to be added: wobble base (for the base that is not necessary to determine a particular amino acid). And, more details on Selenocysteine and Pyrrolysine, I think. --Firefly's luciferase (talk) 03:19, 9 March 2010 (UTC)

I'm putting a lot of work into it and have resolved the frameshift mutation erroneously being used as a point mutation example, hope you like what you see! КĐ 05:41, 18 August 2010 (UTC)

Merge proposal[edit]

Resolved: Redirected here. Adrian J. Hunter(talkcontribs) 18:08, 9 June 2010 (UTC)

It's been proposed that Codon Dictionary be merged here; discussion is at Talk:Codon Dictionary. I note it here because there's only been one comment since the merge was proposed nearly a month ago. Adrian J. Hunter(talkcontribs) 16:25, 1 June 2010 (UTC)

Codon Table[edit]

Not sure when you guys changed the codon table, but it'd be more useful to leave the table as a DNA codon table rather than an mRNA codon table. Reason is that there are a lot more people working at a DNA rather than at an mRNA level. I am going to change the uracils back to thymine. —Preceding unsigned comment added by (talk) 22:59, 9 September 2010 (UTC)

Looking back through the article history, it appears to have been RNA for a very long time. I don't see an actual discussion or cite for this convention, but I also see this as the standard way the entire article is written, and same convention used in other articles (Proteinogenic amino acid#Gene expression and biochemistry for example). How do biochem and genetics textbooks do it? Regardless of whether the existing way is changed or not (I see your change was changed back by another editor), I think the table should have a clear note mentioning the T-vs-U switch for the "other" one so that one doesn't need to read through the rest of the article to find this simple fact. DMacks (talk) 05:39, 10 September 2010 (UTC)

Hmm... It appears you are right. For some reason, I was under the impression that DNA codons have always been used but I could've confused this page with some other sites. You are right again in that a lot of biochem and genetics textbooks use an RNA codon table instead of a DNA codon table because that reflects the biochemistry of the process.

I made the change from the standpoint that RNA codons should be of less interest than DNA codons in practice. Unless a genome biologist is working exclusively with mRNA, he'd likely be much more interested in DNA codons than RNA codons. That stems partly from the fact that most annotated genes are now stored as DNA sequences rather than RNA sequences.

If the consensus is that an RNA codon table is more appropriate, then I'd concede. However, I'd prefer to have a DNA codon table in this wiki so that people who are actually working with codons will be able to refer to those tables. Although of course... it's really just a matter of changing a U to a T, but that can be tedious depending on what one's using it for. —Preceding unsigned comment added by Bobthefish2 (talkcontribs) 17:34, 10 September 2010 (UTC)
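
For what it's worth, the U-to-T switch discussed here is a one-line transformation once the RNA table exists in machine-readable form. A minimal sketch, assuming the standard code written in the usual compact 64-letter layout (the variable names are illustrative only):

```python
# Standard genetic code, RNA alphabet, bases ordered U, C, A, G.
RNA_BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
rna_table = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(RNA_BASES)
    for j, b in enumerate(RNA_BASES)
    for k, c in enumerate(RNA_BASES)
}

# The DNA (coding-strand) table is the same mapping with every U written as T.
dna_table = {codon.replace("U", "T"): aa for codon, aa in rna_table.items()}

print(rna_table["AUG"], dna_table["ATG"])
```

Both tables carry identical information, which is why the choice between them is a matter of presentation rather than content.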

I propose breaking out the RNA original, the reverse-mapping one, and the DNA Codon Table one you just made into a new separate article (maybe Codon tables)? They're important references on their own, not just as one aspect of the prose content here. It's also a fairly lengthy technical chunk that one doesn't need if one wants to learn about the ideas of the article topic--the article is 53K, in part due to several mergers of associated topics. Seems useful to be able to pull them up directly and more accessibly without having to scan down the whole article here. DMacks (talk) 20:58, 10 September 2010 (UTC)
Agreed. One possibility is to make the aforementioned page, leave the reverse RNA codon table around (the forward table is way too bulky), and then add a link below the category heading to the new page. I'd also suggest doing something similar to the amino acid page. Having a separate page like Properties of Amino Acids would come in handy for people who want to quickly grab various tabularized biochemical or statistical information on the 20 standard amino acids. Bobthefish2 (talk) 22:17, 10 September 2010 (UTC)
I checked six textbooks I had handy, and all of them use Us in their codon table. I think Bobthefish2 makes a valid point about the superior practical utility of a T-based table. But I think Wikipedia's purpose is closer to that of a textbook than a technical reference document. If textbook writers agree that a U-based table is the better way of presenting information about the genetic code, then I think Wikipedia should do the same. The only alternative I'd support, and then only if we could create an uncluttered and elegant table that does this, is to use the approach taken by Britannica (link) and show both the DNA and the RNA codons on a single table. Though for what it's worth, the table we've got right now is far superior to what Britannica's got :-) Adrian J. Hunter(talkcontribs) 05:08, 11 September 2010 (UTC)

Textbooks I checked

  • Snustad, D. Peter; Simmons, Michael J. (2003). Principles of genetics. New York, NY: John Wiley Sons. p. 325. ISBN 978-0-471-44180-9. 
  • Campbell, Neil; Reece, Jane (2005). Biology. San Francisco: Pearson, Benjamin Cummings. p. 314. ISBN 978-0-8053-7171-0. 
  • Tobin, Allan J.; Morel, Richard E. (1997). Asking about cells. Fort Worth, Tx: Saunders College Publishing. p. 362. ISBN 978-0-03-098018-3. 
  • Ladiges, Pauline Y.; Evans, Barbara; Saint, Robert (2010). Biology: An Australian Focus. Australia: McGraw Hill. p. 239. ISBN 978-007027440-2. 
  • McMurry, John; McMurry, Susan (2000). Organic chemistry. Pacific Grove, CA: Brooks/Cole. p. 1174. ISBN 0-534-36274-5. 
  • Elliott, William Rowcliffe; Elliott, Daphne C. (2001). Biochemistry and molecular biology. Oxford: Oxford University Press. p. 382. ISBN 0-19-870045-8. 
Adrian J. Hunter(talkcontribs) 05:08, 11 September 2010 (UTC)
Yes, I was somewhat surprised when most of the textbooks use an RNA codon table. At the same time, we need to keep in mind that while an RNA codon table is the most appropriate in the context of molecular cell biology, it may not be the most practical form of a codon table to use. By the way, I like the Britannica version of the codon table. If we somehow include a similar table that is copy/paste-friendly then that solves everything. Bobthefish2 (talk) 20:53, 13 September 2010 (UTC)
Agreed that most sequence data appears as DNA, but who in their right mind would translate DNA sequences by consulting a table? The RNA version is used because the functional interactions for translation occur at the RNA level. (talk) 20:17, 3 December 2010 (UTC)

Codon table layout[edit]

Until several weeks ago the codon table in this article looked like this:

An IP changed it to the following:

This was reverted a few times, until the IP (editing from explained the change: "codons are not polar etc but amino acids are." In other words, the color scheme applies to the amino acids, not the codons, so the colored shading should not be applied to the codons themselves. I agree with this reasoning. The problems with this table are that (1) As noted above, it's very bulky, and (2) Consequently, it obscures the non-random nature of the genetic code – the tendency for single-base substitutions to result in codons with similar chemical properties. In the sandbox, created a new table which I think combines the best features of both other tables: it does not apply color shading to the codons, it's compact, and the non-random nature of the code is readily apparent. I've applied minor tweaks to this table, and the result is below:

I'm not sure why didn't incorporate this into the article, but if there's no objection I'd like to replace the current table with this one. Adrian J. Hunter(talkcontribs) 11:21, 11 September 2010 (UTC)

Support. Looks like an improvement on the current one (making trends more apparent and table less bulky) while still retaining the valid polarity concern of the IP. An alternative solution is to simply clearly state in the table legend or intro that the coloring refers to the amino acid polarity, not just "polarity". The IP-current and Adrian's improvements still make horizontal (second-base for constant first and third) trends harder to notice (though obviously no 2D table can allow all three base-positions' comparisons by adjacency). DMacks (talk) 15:55, 11 September 2010 (UTC)
I see what you mean about the second base trends, though those are weaker than for the other bases anyway (2nd base changes are never synonymous). For now I've changed the table to the third option as the first option uses horizontal lines inconsistently (eg look at the lines between leucines), but I wouldn't object to coloring of the codons if someone wants to implement it. Maybe one day someone will present the table as a 4x4x4 Borg cube that shows all trends equally :-) Adrian J. Hunter(talkcontribs) 11:59, 19 September 2010 (UTC)
Just so you have something to look forward to, that was added to the article a while ago and looks great! We also have developed time-travel. DMacks (talk) 12:25, 19 September 3010 (UTC)
Would be cool if there were a VR or at least user-rotatable-3D format supported. DMacks (talk) 12:25, 19 September 2010 (UTC)
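
On the remark above that second-base changes are never synonymous: this is easy to check mechanically against the standard code. The sketch below lists, for a given codon position, every pair of codons that differ only at that position and have the same meaning. The function name is made up for illustration. For amino acids the claim holds; the only silent second-position change in the standard code is between the two stop codons UAA and UGA.

```python
# Standard genetic code, RNA alphabet, bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_changes(position):
    """Codon pairs differing only at `position` (0, 1 or 2) that share a meaning."""
    pairs = []
    for codon, aa in CODE.items():
        for base in BASES:
            mutant = codon[:position] + base + codon[position + 1:]
            # `mutant > codon` skips the unchanged codon and lists each pair once.
            if mutant > codon and CODE[mutant] == aa:
                pairs.append((codon, mutant, aa))
    return pairs


for pos in range(3):
    print("position", pos + 1, "->", len(synonymous_changes(pos)), "synonymous pairs")
```

The counts make the asymmetry between positions explicit: a handful of pairs at the first base (the leucine and arginine blocks), one stop-to-stop pair at the second, and the bulk at the third.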

Quotes for some of the challenged statements[edit]

My recent edit here came up for discussion at Wikipedia talk:No original research: what you wrote 'the code is determined by' just doesn't make sense that I can see. And you have gone and stuck it in again. The code 'just is', it is interpreted to produce things, it is not determined by the proteins or enzymes it produces. Do you mean determines? by user:Dmcq

No, the code isn't "just is". The DNA sequence just is, but "the code" in this context isn't the DNA sequence. The genetic code is the correspondence by which three-letter codons in nucleic acid sequences yield amino acids in proteins.

When I say that it's determined by stuff coded for in the DNA, I mean that the aminoacyl-tRNA synthetases are ordinary proteins: they're produced from genes using the genetic code, same as any other protein. And that the tRNAs are, like any other RNA, produced by transcription from DNA (followed by processing, although I didn't mention that). It's hard to find a quote for the former. In most contexts, it would never occur to anyone to doubt that the synthetases are coded for in DNA just like any other protein. If you know enough to be talking about them in the first place, you don't need to be told that. The best I can find at the moment is a search listing lots of genes for aminoacyl tRNA synthetases: [6] Each entry tells what chromosome the gene is on, but it doesn't belabor the fact that it's DNA.

The fact that tRNAs are coded in DNA is much easier. Here's a quote for it: "Transfer RNA molecules, like mRNA and other types of RNA, are transcribed from DNA templates." p 325, Biology, Neil A. Campbell ISBN 0-8053-1880-1 And for the fact that tRNAs are processed rather than having the original transcript be the final form: "tRNAs are covalently modified before they exit the nucleus", p 338 Molecular Biology of the Cell, Bruce Alberts et al., ISBN 0-8153-3218-1.

But to avoid running afoul of SYNTH, I would have to find those two statements together in the same source, making their implications at least as explicit as I did. Hard plus easy isn't somewhere in between, in this type of case. Hard plus easy is well-nigh impossible, so the article never gets improved in ways that are hard plus easy. I hate SYNTH. Suppose I took this "original research" to a journal that publishes original research. Would they vet it for correctness and then publish my article? No, they would assign an intern (if they felt sorry for me enough to waste the time) to explain to me that my supposed discovery was already obvious to everyone back in 1959. Here in WP, though, it's unverifiable original research. Did I mention that I hate SYNTH? (If anyone feels the need to answer that question, please do so at Wikipedia talk:No original research#I hate SYNTH.)

The problem in the article that got me to make the edit was that it said mRNA is produced directly by transcription, i.e. that there are no introns. I wrote, "In eucaryotes, the result of transcription is not mRNA itself, but pre-mRNA." Here's the corresponding statement in Campbell: "In bacteria, mRNAs are ready for translation as soon as they peel away from their DNA templates. In contrast, the RNA products of transcription in eukaryotes are processed before they leave the nucleus as mRNA." Campbell p 325 --Dan Wylie-Sears 2 (talk) 05:21, 23 May 2011 (UTC)

The discussion here and at Wikipedia talk:No original research#I hate SYNTH is complicated by the fact there were two parts to your edit. Addressing both of them:
(1) The article previously stated "Each protein-coding gene is transcribed into a template molecule of the related polymer RNA, known as messenger RNA or mRNA." You noted that this implies there are no introns, and I agree that that sentence was potentially misleading. Your edit fixed that problem but also introduced some confusing wording. I've edited that paragraph myself to provide what I hope is clearer, simpler wording. I agree that whole section could use more work.
(2) You added a new paragraph that stated that the genetic code is determined by the tRNAs and the aminoacyl tRNA synthetases. You are 100% correct. I don't think this is obvious—it's actually counter-intuitive that the genetic code is determined by the genome via the genetic code—but the fact is self-evidently true to someone who understands these concepts and takes the time to think them through. But... I don't think that statement belongs in the "Transfer of information via the genetic code" section. It's far too confusing to someone who's still learning the basics of molecular biology, which presumably includes most of this article's readership. For the same reason, I doubt you'll find that statement in a general biology textbook like Campbell. The section where that statement belongs is the "Origin" section, and you may find a supporting citation in one of the papers already cited there. I do think the statement needs to be cited, because (a) it would show a reader that this initially surprising fact is not something we've made up ourselves, and (b) it would allow an interested reader to follow-up with a more detailed, professionally written source.
Adrian J. Hunter(talkcontribs) 12:13, 23 May 2011 (UTC)
I agree that the edit makes it clearer, and that "Transfer of information via the genetic code" isn't a satisfactory spot for the statement. I had been planning to make a new section for it, but then I saw that much of the content would duplicate what was already in the "Transfer ..." section. Counter-intuitive but self-evident is a good description. The most reasonable interpretation of current policy is that it needs to be cited. And even if not required, a citation would be good to have. But I think self-evident should be enough that a citation wouldn't be necessary, provided there are citations of everything needed for it to be self-evident. If we were polishing an already-stellar article, it might be a priority, but at the current level I think time is better spent improving the text of the article than digging for such a citation. --Dan Wylie-Sears 2 (talk) 17:52, 23 May 2011 (UTC)
I had a misunderstanding of what you were trying to say. However it seems that a mountain is being made of a molehill. The genetic code as defined is not a complete description of the whole process of generating proteins or other products, it is simply confined to the correspondence between codons and amino acids with a bit extra about things like stop and start codons. That eukaryotes chop bits out is interesting but not very relevant. The bit you were saying about "determines" can simply be summarized as follows: the DNA is part of a system with its environment in the cell. The genetic code is a description of part of how the environment interprets the DNA to form products, and in turn those products help form the environment. I'm pretty certain there is some elementary introduction around that would say something straightforward like that if you feel it needs to be said. It would theoretically be possible to have two completely different organisms using the same DNA starting with different environments but using different genetic codes but I see no point in underlining this. Dmcq (talk) 12:55, 23 May 2011 (UTC)
Mea culpa for the mountain. As I mentioned on the SYNTH talk page, I'm once-burned from a SYNTH dispute some years back. I don't remember any details, only that the word SYNTH elicits a visceral reaction (which I already toned down about 30 decibels before I clicked submit on any of this).
I like the idea of two completely different organisms with the same DNA sequence. It could be easier than it sounds, and might even be workable in a science-fiction story, for a sufficiently nerdy audience. --Dan Wylie-Sears 2 (talk) 18:01, 23 May 2011 (UTC)

Expanded Genetic code[edit]

I don't see why the expanded genetic code section here should cover more than a few generalities from the linked article expanded genetic code. And I definitely do not think the link to that article should be removed. That article should be expanded rather than putting stuff in here at any detailed level. I have therefore reverted the changes being put in here to deal with the subject in greater depth. Dmcq (talk) 17:54, 22 July 2011 (UTC)

The article is already quite lengthy and detailed, focusing on certain aspects of the genetic code as standardly taught and used by most readers and scientists (note, it passed GA standards). There are certainly extensions in numerous directions, but let's not make this one article an all-encompassing one. Rather, we could (and have already started to) off-load self-contained articles on specific subtopics and link to them, per WP:SUMMARYSTYLE. DMacks (talk) 03:59, 25 July 2011 (UTC)

Re. Reversals of edits on "Genetic Code" page: Nucleic Acid Analogs[edit]

(Copied from my user page where it shouldn't have been) Dmcq (talk) 10:56, 24 July 2011 (UTC)

The problem here is that not all the new synthetic bases are analogs. For example, several synthetic bases are not analogs of A, C, G, T and U. For another example, Watson-Crick complements require A/T or A/U and C/G pairings; however, another synthetic base is X/X. X is its own Watson-Crick complement. None of the standard DNA or RNA bases are self-complementary. Thus, X is not an analog of anything previously discussed. Thus, adding the material as you require to Nucleic Acid Analogs would be incompatible with the actual research. While this refers to the research of Eric T. Kool (one such article appearing as xDNA in Wikipedia), Kool and other researchers have several more non-analog bases. I think it would be wrong to put factually-incorrect information under Nucleic Acid Analogs.

Typical codons are based on the standard DNA bases A, C, G and T, but if synthetic bases are used (such as X, which is not a nucleic acid analog) then the corresponding synthetic codons are entirely different. For example, a codon such as iso-C/X/G could hardly be included under a discussion of Nucleic Acid Analogs since once again X is not a nucleic acid analog. Just to be clear here, an analog usually is understood to mean a derivative of a standard base. X and other synthetic bases are not derivatives of A, C, G or T. This becomes even clearer when dealing with synthetic codons based on 4-bases. Thus the research based on "the 65th codon" certainly cannot come under standard codons of analogs of nucleic acids.

Synthetic mRNA anti-codons must be treated similarly. Furthermore, synthetic tRNA also should not be treated under analogs of nucleic acids. Indeed, analogs of nucleic acids really should be treated entirely separately from codons, anti-codons, mRNA, tRNA and synthetic amino acids. The idea of putting all these things together under Nucleic Acids sounds very strange. No acceptable book in standard biochemistry has ever done this. Shadow600 (talk) 05:44, 24 July 2011 (UTC)

I think all this is irrelevant. The stuff has nothing to do with genetic codes that occur in nature and should be under another article which is for that sort of thing. This article has a short section redirecting to such articles. It should just give an overview of the important things in that area and not go into detail best dealt with elsewhere. Dmcq (talk) 10:56, 24 July 2011 (UTC)

DNA codon table, E. coli usage[edit]

Color key: yellow = nonpolar; g-Yellow = Trp; green-yellow = Tyr; green = polar; green-blue = His; blue = basic; red = acidic; (stop codon)
1st base | 2nd base T | 2nd base C | 2nd base A | 2nd base G
T | TTT 0.57 Phe / F | TCT 0.11 Ser / S | TAT 0.53 Tyr / Y | TGT 0.42 Cys / C
T | TTC 0.43 Phe / F | TCC 0.11 Ser / S | TAC 0.47 Tyr / Y | TGC 0.58 Cys / C
T | TTA 0.15 Leu / L | TCA 0.15 Ser / S | TAA 0.64 Ochre | TGA 0.36 Opal
T | TTG 0.12 Leu / L | TCG 0.16 Ser / S | TAG 0.00 Amber | TGG 1.00 Trp / W
C | CTT 0.12 Leu / L | CCT 0.17 Pro / P | CAT 0.55 His / H | CGT 0.36 Arg / R
C | CTC 0.10 Leu / L | CCC 0.13 Pro / P | CAC 0.45 His / H | CGC 0.44 Arg / R
C | CTA 0.05 Leu / L | CCA 0.14 Pro / P | CAA 0.30 Gln / Q | CGA 0.07 Arg / R
C | CTG 0.46 Leu / L | CCG 0.55 Pro / P | CAG 0.70 Gln / Q | CGG 0.07 Arg / R
A | ATT 0.58 Ile / I | ACT 0.16 Thr / T | AAT 0.47 Asn / N | AGT 0.14 Ser / S
A | ATC 0.35 Ile / I | ACC 0.47 Thr / T | AAC 0.53 Asn / N | AGC 0.33 Ser / S
A | ATA 0.07 Ile / I | ACA 0.13 Thr / T | AAA 0.73 Lys / K | AGA 0.02 Arg / R
A | ATG[A] 1 Met / M | ACG 0.24 Thr / T | AAG 0.27 Lys / K | AGG 0.03 Arg / R
G | GTT 0.25 Val / V | GCT 0.11 Ala / A | GAT 0.65 Asp / D | GGT 0.29 Gly / G
G | GTC 0.18 Val / V | GCC 0.31 Ala / A | GAC 0.35 Asp / D | GGC 0.46 Gly / G
G | GTA 0.17 Val / V | GCA 0.21 Ala / A | GAA 0.70 Glu / E | GGA 0.13 Gly / G
G | GTG 0.40 Val / V | GCG 0.38 Ala / A | GAG 0.30 Glu / E | GGG 0.12 Gly / G
[A] The codon ATG both codes for methionine and serves as an initiation site: the first ATG in the coding region is where translation into protein begins.[1]


  1. ^ Nakamoto T (2009). "Evolution and the universality of the mechanism of initiation of protein synthesis". Gene. 432 (1–2): 1–6. doi:10.1016/j.gene.2008.11.001. PMID 19056476.
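
As a rough sanity check on the table above (a sketch only, not part of the proposed article text): the values appear to be fractions within each amino acid's set of synonymous codons, since each set sums to about 1 after rounding. The names `ROWS`, `usage`, `family_sum` and `preferred` are illustrative, and the numbers are copied directly from the table, with `*` standing in for the three stop codons.

```python
from collections import defaultdict

# Codon, fraction, one-letter amino acid, copied from the table above.
# "*" marks the stop codons (Ochre, Amber, Opal).
ROWS = """
TTT 0.57 F TCT 0.11 S TAT 0.53 Y TGT 0.42 C
TTC 0.43 F TCC 0.11 S TAC 0.47 Y TGC 0.58 C
TTA 0.15 L TCA 0.15 S TAA 0.64 * TGA 0.36 *
TTG 0.12 L TCG 0.16 S TAG 0.00 * TGG 1.00 W
CTT 0.12 L CCT 0.17 P CAT 0.55 H CGT 0.36 R
CTC 0.10 L CCC 0.13 P CAC 0.45 H CGC 0.44 R
CTA 0.05 L CCA 0.14 P CAA 0.30 Q CGA 0.07 R
CTG 0.46 L CCG 0.55 P CAG 0.70 Q CGG 0.07 R
ATT 0.58 I ACT 0.16 T AAT 0.47 N AGT 0.14 S
ATC 0.35 I ACC 0.47 T AAC 0.53 N AGC 0.33 S
ATA 0.07 I ACA 0.13 T AAA 0.73 K AGA 0.02 R
ATG 1.00 M ACG 0.24 T AAG 0.27 K AGG 0.03 R
GTT 0.25 V GCT 0.11 A GAT 0.65 D GGT 0.29 G
GTC 0.18 V GCC 0.31 A GAC 0.35 D GGC 0.46 G
GTA 0.17 V GCA 0.21 A GAA 0.70 E GGA 0.13 G
GTG 0.40 V GCG 0.38 A GAG 0.30 E GGG 0.12 G
"""

# Parse the whitespace-separated triples into {codon: (fraction, amino acid)}.
tokens = ROWS.split()
usage = {
    tokens[i]: (float(tokens[i + 1]), tokens[i + 2])
    for i in range(0, len(tokens), 3)
}

family_sum = defaultdict(float)  # total fraction per amino acid
preferred = {}                   # most-used codon per amino acid
for codon, (fraction, aa) in usage.items():
    family_sum[aa] += fraction
    if aa not in preferred or fraction > usage[preferred[aa]][0]:
        preferred[aa] = codon

for aa in sorted(family_sum):
    print(aa, round(family_sum[aa], 2), preferred[aa])
```

If the values were per-thousand frequencies instead, the 64 entries together would sum to about 1000 rather than each synonymous set summing to about 1.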

Suggestion for Figure 4[edit]

20 small color-coded dots would be useful. A 4-color scheme could be used to coordinate these colored dots with the 4 major triplet groups. In this proposed drawing, one could see the extent of the spread of the amino acids through the H-M space. Unfortunately, the figure is too small to put the single-letter code next to the dots. Youvan's biography has the same figure and it expands to a more complex figure that is fully labelled. Perhaps that is an option, but the expanded figure needs to be redrawn for rescaling and clarity. The choice of the colors was poor and the printed version is confusing because 2 of the colors are similar. Frank Layden (talk) 17:48, 3 June 2013 (UTC)

Redundancy and Origin[edit]

One of my favorite fields of study is the Origin of the genetic code. At this time, I believe our ideas on Origin are highly speculative and something REALLY BIG is missing. Such discussion belongs in a separate article. There we might also pick up some work on NP-hard problems. It is hard to imagine how the program (making protein) and the computer language (the genetic code) can evolve simultaneously and sustain life - by analogy. On the other hand, Redundancy is tabulated fact. There is a "take home" message for a student studying Redundancy: wobble at the 3rd position and hydropathy at the second position. So, in terms of editing this article, we have just removed a factual section (Redundancy) and added a speculative section (Origin). I am sorry to say this article was better one year ago. The current article reflects people's research interests, including mine, while the older article gave us more history and fact.

Can we vote?

Redundancy - IN; Origin - OUT Frank Layden (talk) 16:17, 5 August 2013 (UTC)
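
To make the "take home" message about Redundancy concrete, here is a minimal sketch that assumes the standard genetic code (written with DNA letters). It counts how many two-base prefixes specify the same amino acid regardless of the third base, and lists which amino acids have T at the second position. The names `CODE`, `fourfold` and `second_T` are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Two-base prefixes for which the third ("wobble") base does not matter.
prefixes = ["".join(p) for p in product(BASES, repeat=2)]
fourfold = [p for p in prefixes if len({CODE[p + b] for b in BASES}) == 1]

# Amino acids encoded by codons with T in the second position.
second_T = {aa for codon, aa in CODE.items() if codon[1] == "T"}

# Number of codons per amino acid (the degeneracy of the code).
degeneracy = Counter(CODE.values())

print(len(fourfold), fourfold)
print(sorted(second_T))
print(degeneracy.most_common())
```

Under this table, half of the 16 prefixes are fully redundant at the third position, and every codon with T in the middle encodes a hydrophobic residue (Phe, Leu, Ile, Met or Val), which is the pattern a Redundancy section could tabulate.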

Figure 4 Error[edit]

There is an error in Figure 4: The top label should read "NGN". I am having it fixed by a graphic artist in the next few days. I am also having a vectorized version of the "expanded view" made for the hyperlink and for the subsection. Frank Layden (talk) 01:02, 16 October 2013 (UTC)


There has been a lot of back and forth on Duons. We should come to a consensus before it goes into the main article.

My take on the whole thing is summed up quite nicely in this article[1]: basically, the "duon" functionality is already well known and already has defined names such as regulatory DNA sequences, promoters, enhancers and termination sequences. These terms are already used in the article, and I don't think we should be using a term that basically boils down to a PR buzzword. It should not be in the introduction of the article, if at all. Ryftstarr (talk) 14:29, 16 December 2013 (UTC)


This whole Duon stuff is obviously PR buzz and trying to make the fact that epigenetic modifications occur within protein sequences a novel finding AND a distinct phrase is quite laughable. Also, the even larger claim that non-coding selection on protein regions being unknown is even more unbelievable. For transcription factor binding sites within proteins check back to at LEAST 2001, . This concept is not at all controversial for people in the specific field of epigenetics (modifications to DNA that do not change the underlying genetic code). Also ideas about optimal codons have been expressed since at least 1987 (

Neglecting this background makes the recent insertion both shortsighted and also suspect in seeming to increase the tout of its scientific claims.

REGARDLESS, none of this discussion belongs in an introduction to the genetic code. At best it should be a distant footnote at the end linked to the more extensive discussion on codon usage. Were the editors not aware of this 30+ year research topic (

To be more specific, the last paragraph of the introduction is distracting to a general introduction of the genetic code, which is specifically about the translation of mRNA into a protein sequence. Trying to shoe-horn into some talk about how organisms are more than protein (I happen to agree with things being more than proteins) is obviously out of place and seems like proselytizing. Again, if you really want it to be there, mention it in the context of the main body, not 1/3 of the introduction. (talk) 04:45, 17 December 2013 (UTC)Thomas

I'd support a single sentence noting that the base sequence has more functions than just encoding proteins, specifically noting non-coding regions (general idea) and regulatory domains in particular. It helps define the scope of the article to the coding regions rather than the whole sequence. It's a pretty important point that not all the DNA is coding, and from what I see it's a pretty common misconception even among science students. DMacks (talk) 05:01, 17 December 2013 (UTC)

I agree with both of those points DMacks. My main dispute was 1) overemphasis to a secondary point in the intro and 2) ignoring the rest of the extensive research on non-coding regions. I agree that the idea of DNA sequence=deterministic is misleading and worth mentioning, but as currently stated it seems to undercut the whole premise rather than the reality being more nuanced. A single sentence linking out to regulatory sequences and possibly codon usage seems like a good idea.

Though, I have to mention the overall misconception about the role most of these regulatory processes play. Most epigenetic modifications, be they transcription factors, DNA methylation, or histone modification, largely relate to changes in how MUCH of a given protein is produced, rather than WHAT protein is produced. There are some recent studies showing that histone/methylation can affect whether introns/exons are included/excluded thereby changing the protein sequence, but that does not (currently) seem to be a primary function. So again, the actual research is much less clear than is currently purported. As it stands, the last paragraph of the intro speaks more about the relevance of protein abundance evolution vs protein sequence evolution. I'm forgetting the more eloquent description of this debate, but it is a long-going discussion in the evolution literature. (talk) 05:15, 17 December 2013 (UTC) Thomas

  • I agree with the above, specifically that the term "duon" should not appear in this article and that a sentence on non-coding functionality is warranted (but the current paragraph should be removed) benmoore 09:40, 17 December 2013 (UTC)

Here is a rough draft of a replacement sentence: "While the genetic code determines the protein sequence for a given coding region, other genomic regions can influence when and where these proteins are produced" This sentence could be expanded to talk about further impact towards phenotype, but then in my opinion it starts to get bogged down in specifics that are tangential to the main article. (talk) 05:04, 18 December 2013 (UTC) Thomas

GeneticCode21-version-2.svg confusing[edit]

Genetic code graphic figure GeneticCode21-version-2.svg is confusing in this context. I may be confused myself (not a biologist), but this figure is from the catalog of a company that specializes in posttranslational modification, and makes heavy reference to various modifications, which as far as I can tell have no direct relation to the natural genetic code. My initial interpretation was "oh, so the redundant codons actually specify posttranslational modifications." I can understand the desire for a sexy graphic instead of boring tables, but IMO this page would be improved by simple deletion of that figure.

Robertmacl (talk) 12:45, 13 May 2014 (UTC)

What is the genetic code?[edit]

Descriptions of the genetic code have improved in the past ten years, but even a simple definition is still lacking. The old definition is no longer explicitly given - the genetic code is a transfer of linear information from DNA to protein - but it is still strongly implied in everything being said here. The net effect is that there is no working definition for molecular information, and molecular information is the purpose of genetic translations.

I think there is a simple, logical foundation for the genetic code. I think that the genetic code, if it is properly understood, is central to all processes in life. I think there is a phenomenal amount of molecular information stored in and translated by the genetic code, not just codons and amino acids.

I seem to be the only person on the planet that feels this way about it, and that's okay. But I'm a little bit surprised that after ten years these valid ideas are not even mentioned on a page like this. I think for the sake of debate, you should point them out if only to refute them, or tell people why they should reject them.

If anybody cares to understand this, they can start here:

Many sources use the term "genetic code" to mean "genome", and it sounds like you're arguing for a similarly inclusive or broad definition. I think this article should narrowly define "genetic code" to mean essentially what a codon table shows, as it currently does. Otherwise the scope of this article becomes ill-defined, and would overlap with related articles. Adrian J. Hunter (talk · contribs) 05:33, 22 July 2014 (UTC)

I absolutely do not mean "genome." If you want to define the genetic code to be "essentially what a codon table shows" then I think that should be included as the first line in the page. Then I think you should explain exactly what a codon table is and exactly what it shows, because other than being defined that way, that is not what the genetic code is.

The basic problem is that "everybody knows" that the genetic code is something that translates "molecular information." Unfortunately, this represents nothing but a tautology in that molecular information is defined as that thing translated by the genetic code, and the genetic code is defined as essentially what a codon table shows.

My basic point is that a codon table is a very small part of what the genetic code actually is, and I am limiting this here specifically to the molecular information translated from nucleotide sequences to protein sequences. The genetic code at that level is still so many things that I think it is incumbent on any explanation like this to clearly define what it is explaining. Short of that, it does more to confuse people than actually clear things up.

Central point hidden[edit]

This fact is covered by the article but is rather hidden away. It is not represented in the lead. There are two critical steps. The FIRST step is the coupling of the transfer RNA to the amino acid. This requires a specific enzyme, the aminoacyl transfer RNA synthetase, for each amino acid. One can say that the DNA code for the AATRS embodies half of the genetic code. But this is not pointed out by the text. It has to be reasoned from the text. --Ettrig (talk) 12:59, 25 November 2014 (UTC)

I would say that tRNA is the molecule that does the translation from codon to amino acid. It is like a dictionary or something. You are correct, it is only because of the tRNA that AAA means Lysine. The translation from AAA to lysine has nothing to do with ribosomes. It is the tRNA and only the tRNA that translates codon to amino acid. "All" that the ribosome does is to get the correct tRNA to match the mRNA and then join the amino acids into a polypeptide. Note that the tRNA also provides the energy for the ribosome to move the mRNA by 3 bases as the mRNA is read. This probably should be clarified -- Lehasa (talk) 13:51, 22 February 2015 (UTC)
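
The "dictionary" picture above can be sketched in a few lines, with the caveat that this is a deliberate simplification: a lookup table stands in for the pool of charged tRNAs, the standard genetic code is assumed, and wobble pairing and modified anticodon bases are ignored. The function names `anticodon` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code on the RNA alphabet; "*" marks a stop codon.
RNA_BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(RNA_BASES, repeat=3), AMINO)}

# Strict Watson-Crick pairing only (an assumption; real tRNAs also wobble).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Return the perfectly complementary anticodon, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(codon))


def translate(mrna):
    """Read codons from position 0 and look each one up until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TO_AA[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("AUGAAAUAA"))  # MK (Met-Lys, then stop)
print(anticodon("AAA"))        # UUU
```

In this picture the ribosome corresponds to the loop that steps along the message, while the meaning of each codon lives entirely in the lookup table, which is the role the comment above assigns to tRNA (and, per the previous comment, to the synthetases that charge it).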

Per mille[edit]

Trying to use this data, I found it confusing. It's clearly not percent, since it sums to more than 100. I looked at the column heading, but was not familiar with the percent-like symbol. I hovered over the symbol, and it said "per mille", so I thought it was per thousand, but was not sure in what language (I don't know Latin).

I went to the original reference cited in the section, which said "per thousand", which made sense. So I looked up per mille and found that I was not alone in not being familiar with this:

The term occurs so rarely in English that major dictionaries do not agree on the spelling or pronunciation even within a single dialect of English[10] and some major dictionaries such as Macmillan[11] and Longman[12] do not even contain an entry.

So I changed it, so now when you hover over the symbol it says "per thousand", which will be more helpful to the reader, I think. Other opinions are welcome. LouScheffer (talk) 12:29, 17 October 2016 (UTC)

The hover-text sounds like a good reason, even though it is only a redirect to our article that uses "per mille" as its actual name. DMacks (talk) 12:38, 17 October 2016 (UTC)

Amount of codons per gene[edit]

I'm wondering whether the amount of codons per gene varies, and if so, whether there is a minimum and maximum amount of codons per gene. Also, if there's a minimum/maximum amount of codons, is this amount a multiplication of 3 (i.e. 1³, 2³, 3³, ...). That way, we could also know the amount of possible genetic code variations per gene. KVDP (talk) 16:10, 22 June 2017 (UTC)
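
A rough sketch of the arithmetic behind this question, under simplifying assumptions: a coding region of n codons is 3n nucleotides long (a multiple of 3, not a cube), any of the 64 codons is allowed at any position, and the standard genetic code applies. Start and stop constraints are ignored, and the function names are illustrative only.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
DEGENERACY = Counter(CODE.values())  # codons available per amino acid


def nucleotides_in(n_codons):
    """Length in nucleotides of a run of n codons."""
    return 3 * n_codons


def possible_codon_strings(n_codons):
    """Number of distinct nucleotide sequences that are n codons long."""
    return 64 ** n_codons


def possible_peptides(n_residues):
    """Number of distinct sequences of the 20 standard amino acids."""
    return 20 ** n_residues


def synonymous_count(peptide):
    """Number of different codon strings that encode the same peptide."""
    return prod(DEGENERACY[aa] for aa in peptide)


print(nucleotides_in(100))        # 300
print(possible_codon_strings(2))  # 4096
print(possible_peptides(2))       # 400
print(synonymous_count("LSR"))    # 216
```

The counts grow exponentially with length, so the number of possible variants of even a short gene is far too large to enumerate.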


I was wondering whether there has been any research in Decipherment of the DNA. For instance, there are various types of mutations of the same gene in the human population, which express themselves as differences in real life between the humans.[1]

Logically, each of these mutations is a code for a different message that conveys details on how to do something in the human body. My guess is that each of the 64 codons is a base building block in that code (so comparable to a letter in our own alphabet). Each gene (or hence sequence of codons) will (I think) convey a message about what type of tissue needs to be built (i.e. fat, bone, flesh, ...), how long this strand of tissue needs to be, its shape, and to what tissue it should connect. The thickness of the tissue is probably not specified directly, but rather specified by a separate gene, perhaps via the "codon for specifying length". The latter I assume because a disease like Talk:Sclerosteosis also exists.

The reason why this is useful to know is that, at present, for treating genetic diseases, we can only use the genetic code of humans without that disease to overwrite the faulty gene in a person with the disease. However, as Stephen Friend from The Resilience Project found out, there are many versions of "good genetic code", and not all versions will work on that person. We don't know why this is, and so every gene therapy that would be undertaken becomes a puzzle, and each gene therapy may need to be repeated several times. If we understand what message is in the gene, we might avoid all this.

KVDP (talk) 09:13, 27 June 2017 (UTC)

External links modified[edit]

Hello fellow Wikipedians,

I have just modified one external link on Genetic code. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

Cheers.—InternetArchiveBot (Report bug) 18:11, 12 October 2017 (UTC)