Wikipedia talk:WikiProject Molecular Biology

Welcome to the WikiProject Molecular Biology talk page. Please post any comments, suggestions or questions. Also feel free to introduce yourself if you plan on becoming an active editor!

Please remain civil, be respectful, and assume good faith.
Put new text under old text.
Threads older than 90 days are automatically archived.

Infobox genome is linking to incorrect genome lists on NCBI

Hi all,

As per the Infobox genome talk page, it appears a recent NCBI change has broken the links produced by the Infobox genome template. My talk page comment goes into more detail, but in summary: NCBI is removing the Genome resource and replacing it with the Datasets resource; the Infobox genome template links to the old Genome resource when given a taxId argument; and the template now links to incorrect genome lists on some pages (e.g., the box on Chimpanzee now incorrectly links to a genome listing for Impatiophila pipa). Not all pages are affected (e.g., NCBI automatically redirects the link on Bonobo to the correct genome list), but many are. I imagine there's a risk that all the links could stop working if/when NCBI completely deprecates the Genome resource.

Note that the taxId field in the template actually expects a genome ID (and this is what articles have been using for this field). This is the crux of the problem, as these genome IDs appear to no longer be used. When infobox links still work, it's because NCBI silently redirects the link to one using the organism's taxonomy ID.

I can think of three ways to fix the problem; all would require updating each article using the Infobox with new IDs, so I wanted to talk about which solution is best before jumping in and making all those changes.

1. Change the infobox to link to the reference genome only; replace genome IDs on all articles with reference genome IDs. Right now, on pages where it's still working properly, the Infobox links to a list of all the genomes NCBI has available. E.g., for Bonobo, the NCBI genome page lists 8 genomes, with the reference genome at the top. We could instead update the taxId on every organism's page by replacing the genome ID with the reference genome ID (for Bonobo, changing to taxId=NHGRI_mPanPan1-v2.0_pri) and change the URL the template uses to https://www.ncbi.nlm.nih.gov/datasets/genome/{{{taxId}}} (resulting in this link for Bonobo).
2. Keep the infobox behavior of linking to genome lists; replace genome IDs on all articles with taxonomy IDs. If we think it's valuable to link users to a list of all the available genomes rather than just the reference, we could instead replace the genome IDs on all pages with the taxonomy IDs (for Bonobo, we'd remove taxId=10729 and replace it with taxId=9597). We'd change the URL the template uses to https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon={{{taxId}}} (resulting in this link for Bonobo).
3. Link to the NCBI Taxonomy page instead of directly linking to genome data; replace genome IDs on all articles with taxonomy IDs. We could instead link to the NCBI taxonomy pages, which give a short summary of the organism. This includes a link to the genome list and a link directly to the reference genome. This is technically not a link to a genome dataset anymore, but I believe this is the closest behavior to how our links worked before the NCBI change, and it does still provide easy access to the genome data (including a download button that directly links to the reference genome). We'd replace the genome ID on all pages with the taxonomy ID (like option #2) and change the URL in the template to https://www.ncbi.nlm.nih.gov/datasets/taxonomy/{{{taxId}}} (resulting in this link for Bonobo).
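
To make the three proposals easier to compare side by side, here is a minimal Python sketch of the URL each one would produce. It only mirrors the URL patterns quoted above; the function names are hypothetical, it is not template code, and it does not check that NCBI actually resolves these URLs. The Bonobo IDs are the ones given in the options.

```python
from urllib.parse import quote

# Base path taken from the URLs quoted in the three options above.
BASE = "https://www.ncbi.nlm.nih.gov/datasets"


def reference_genome_url(genome_id: str) -> str:
    """Option 1: link straight to a single reference genome (hypothetical helper)."""
    return f"{BASE}/genome/{quote(genome_id, safe='')}"


def genome_list_url(taxonomy_id: int) -> str:
    """Option 2: link to the list of all genomes for a taxon (hypothetical helper)."""
    return f"{BASE}/genome/?taxon={taxonomy_id}"


def taxonomy_url(taxonomy_id: int) -> str:
    """Option 3: link to the taxonomy summary page (hypothetical helper)."""
    return f"{BASE}/taxonomy/{taxonomy_id}"


if __name__ == "__main__":
    # Bonobo values as stated in the proposal.
    print(reference_genome_url("NHGRI_mPanPan1-v2.0_pri"))
    print(genome_list_url(9597))
    print(taxonomy_url(9597))
```

Whichever option is chosen, only one of these patterns would need to live in the template; the per-article work is swapping the old genome ID for the matching new ID.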

Any thoughts on which option is best, or other options I haven't thought of? Of note, there are ~62 articles using this Infobox that would need to be updated to the new IDs no matter what option is picked; not an insurmountable number but not so few that changing them all would be quick & trivial. — nmael ^talk 14:38, 8 August 2024 (UTC)

A typical genome infobox contains a link to NCBI and at least three bits of information: ploidy, genome size, and number of chromosomes. I think that ploidy and the number of chromosomes are bits of information that should be in the main body of the article. The size of the genome is not a fixed number; it changes from year to year as more of a genome is sequenced and annotated. I would argue that putting this number in an infobox gives it too much credence.

I would also argue that the best link to sequenced genomes is the Ensembl site and not NCBI. For example, here's the Ensembl link to the chimpanzee genome. Note that it gives the genome size as 3,231 Mb and not 3,323 Mb as in the genome infobox. It also says that there are 23,534 protein-coding genes and 9,710 non-coding genes and this data conflicts with what's written in the article on chimpanzee.

I suggest a fourth solution: delete all genome infoboxes. Alternatively, someone could replace all the NCBI links with Ensembl links, then review and update the genome sizes of all infoboxes every few months. Genome42 (talk) 16:38, 8 August 2024 (UTC)

While I think option 3 provides a more informative wrapping around links to genomes per taxon, option 1 would actually link to a genome, as you would expect for the field NCBI genome ID. The field label should probably be updated to NCBI reference genome ID or similar. I also see no reason to exclude linking to Ensembl or any other database, as long as the statistics in the box are those of the linked genome.

I think using an Infobox to pull out (or duplicate) common measures of genetic material from the body text is generally helpful and is especially useful for hosting external links to databases. I support keeping the infobox around.

Things like base counts would be accurate/stable as long as the genome is sequenced with complete coverage and high confidence. I'm more familiar with bacterial genomes where that kind of thing is more common, so I'm not sure how realistic that tends to be for eukaryotic reference genomes. Surely, cutting off those numbers at two significant figures would make the estimates close enough for general purposes and pretty stable over time. Same for coding vs. non-coding genes. ― Synpath 02:01, 13 August 2024 (UTC)
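
As a rough illustration of the two-significant-figure suggestion, the Python sketch below rounds the chimpanzee figures quoted earlier in this thread. The helper name round_sig is hypothetical, and the sketch only shows the arithmetic; it says nothing about which source figure is correct.

```python
import math


def round_sig(value, sig=2):
    """Round value to `sig` significant figures (hypothetical helper)."""
    if value == 0:
        return 0
    # Position of the leading digit, e.g. 3 for 3231.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig - 1 - exponent)


if __name__ == "__main__":
    # Genome sizes in Mb and gene counts as quoted in the comments above.
    for label, number in [
        ("Ensembl genome size (Mb)", 3231),
        ("Infobox genome size (Mb)", 3323),
        ("Protein-coding genes", 23534),
        ("Non-coding genes", 9710),
    ]:
        print(label, round_sig(number))
```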

Standardizing The Sections of Protein Articles

I noticed that the example protein articles referenced in the protein article style guide do not follow the recommended sections layout. Should there be a task to standardize these articles?

I'm working on the Tumor Necrosis Factor article and wondered if it should be structured consistently with the other protein articles.

Example articles: https://en.wikipedia.org/wiki/Protein_C https://en.wikipedia.org/wiki/Gonadotropin-releasing_hormone AdeptLearner123 (talk) 06:49, 18 August 2024 (UTC)

Style guide: https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Molecular_Biology/Style_guide_(gene_and_protein_articles)#Sections — Preceding unsigned comment added by AdeptLearner123 (talk • contribs) 06:52, 18 August 2024 (UTC)

Task force for Structural Biology

Structural biology lacks any kind of WikiProject, at least from what I can find. The core articles on structural biology are scattered and need a lot of coordinated work. Any thoughts on this? Niashervin (talk) 03:36, 2 September 2024 (UTC)

Pentose phosphate pathway, chemokines, myokines categories

Please leave your comments in Wikipedia:Categories_for_discussion/Log/2024_September_4#Category:Pentose_phosphate_pathway. Marcocapelle (talk) 05:18, 4 September 2024 (UTC)

Now at Wikipedia:Categories for discussion/Log/2024 September 12#Category:Pentose phosphate pathway. We would really appreciate your input; I do not think any CFD regulars are experts in molecular biology. Best, House Blaster (talk • he/they) 15:56, 12 September 2024 (UTC)

New article that might be relevant to this project

I have recently created a new article, rhizoplast, a cellular structure found mainly in protists and some fungi. I am not a member of this WikiProject, so I thought I would post a notice here so that the proper importance assessment can be made. — Snoteleks (talk) 16:35, 20 September 2024 (UTC)

Good article reassessment for Catalytic triad

Catalytic triad has been nominated for a good article reassessment. If you are interested in the discussion, please participate by adding your comments to the reassessment page. If concerns are not addressed during the review period, the good article status may be removed from the article. Z1720 (talk) 17:04, 6 November 2024 (UTC)