
User talk:Boghog

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by TaurenMoonlighting (talk | contribs) at 22:00, 28 August 2023. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

New Page Patrol newsletter October 2022

Hello Boghog,

Much has happened since the last newsletter over two months ago. The open letter finished with 444 signatures. The letter was sent to several dozen people at the WMF, and we have heard that it is being discussed but there has been no official reply. A related article appears in the current issue of The Signpost. If you haven't seen it, you should, including the readers' comment section.

Awards: Barnstars were given for the past several years (thanks to MPGuy2824), and we are now all caught up. The 2021 cup went to John B123 for leading with 26,525 article reviews during 2021. To encourage moderate activity, a new "Iron" level barnstar is awarded annually for reviewing 360 articles ("one-a-day"), and 100 reviews earns the "Standard" NPP barnstar. About 90 reviewers received barnstars for each of the years 2018 to 2021 (including the new awards that were given retroactively). All awards issued for every year are listed on the Awards page. Check out the new Hall of Fame also.

Software news: Novem Linguae and MPGuy2824 have connected with WMF developers who can review and approve patches, so they have been able to fix some bugs, and make other improvements to the Page Curation software. You can see everything that has been fixed recently here. The reviewer report has also been improved.

NPP backlog May – October 15, 2022

Suggestions:

  • There is much enthusiasm over the low backlog, but remember that the "quality and depth of patrolling are more important than speed".
  • Reminder: an article should not be tagged for any kind of deletion for a minimum of 15 minutes after creation and it is often appropriate to wait an hour or more. (from the NPP tutorial)
  • Reviewers should focus their effort where it can do the most good, reviewing articles. Other clean-up tasks that don't require advanced permissions can be left to other editors that routinely improve articles in these ways (creating Talk Pages, specifying projects and ratings, adding categories, etc.) Let's rely on others when it makes the most sense. On the other hand, if you enjoy doing these tasks while reviewing and it keeps you engaged with NPP (or are guiding a newcomer), then by all means continue.
  • This user script puts a link to the feed in your top toolbar.

Backlog:

Saving the best for last: From a July low of 8,500, the backlog climbed back to 11,000 in August, then reversed in September, dropping below 6,000, and continued falling with the October backlog drive to under 1,000, a level not seen in over four years. Keep in mind that there are 2,000 new articles every week, so the number of reviews is far higher than the backlog reduction. To keep the backlog under a thousand, we have to keep reviewing at about half the recent rate!

Reminders
  • Newsletter feedback - please take this short poll about the newsletter.
  • If you're interested in instant messaging and chat rooms, please join us on the New Page Patrol Discord, where you can ask for help and live chat with other patrollers.
  • Please add the project discussion page to your watchlist.
  • If you are no longer very active on Wikipedia or you no longer wish to be a reviewer, please ask any admin to remove you from the group. If you want the tools back again, just ask at PERM.
  • To opt out of future mailings, please remove yourself here.

New Pages Patrol newsletter January 2023

Hello Boghog,

New Page Review queue December 2022
Backlog

The October drive reduced the backlog from 9,700 to an amazing 0! Congratulations to WaddlesJP13 who led with 2084 points. See this page for further details. The queue is steadily rising again and is approaching 2,000. It would be great if <2,000 were the “new normal”. Please continue to help out even if it's only for a few or even one patrol a day.

2022 Awards

Onel5969 won the 2022 cup for 28,302 article reviews last year - that's an average of nearly 80/day. There was one Gold Award (5000+ reviews), 11 Silver (2000+), 28 Iron (360+) and 39 more for the 100+ barnstar. Rosguill led again for the 4th year by clearing 49,294 redirects. For the full details see the Awards page and the Hall of Fame. Congratulations everyone!

Minimum deletion time: The previous WP:NPP guideline was to wait 15 minutes before tagging for deletion (including draftification and WP:BLAR). Due to complaints, a consensus decided to raise the time to 1 hour. To illustrate this, very new pages in the feed are now highlighted in red. (As always, this is not applicable to attack pages, copyvios, vandalism, etc.)

New draftify script: In response to feedback from AFC, the Move to Draft script now provides a choice of set messages that also link the creator to a new, friendly explanation page. The script also warns reviewers if the creator is probably still developing the article. The former script is no longer maintained. Please update your common.js or vector.js file from User:Evad37/MoveToDraft.js to User:MPGuy2824/MoveToDraft.js

Redirects: Some of our redirect reviewers have reduced their activity and the backlog is up to 9,000+ (two months deep). If you are interested in this distinctly different task and need any help, see this guide, this checklist, and spend some time at WP:RFD.

Discussions with the WMF: The PageTriage open letter signed by 444 users is bearing fruit. The Growth Team has assigned some software engineers to work on PageTriage, the software that powers the NewPagesFeed and the Page Curation toolbar. WMF has submitted dozens of patches in the last few weeks to modernize PageTriage's code, which will make it easier to write patches in the future. This work is helpful but is not very visible to the end user. For patches visible to the end user, volunteers such as Novem Linguae and MPGuy2824 have been writing patches for bug reports and feature requests. The Growth Team also had a video conference with the NPP coordinators to discuss revamping the landing pages that new users see.

Reminders
  • Newsletter feedback - please take this short poll about the newsletter.
  • There is live chat with patrollers on the New Page Patrol Discord.
  • Please add the project discussion page to your watchlist.
  • If you no longer wish to be a reviewer, please ask any admin to remove you from the group. If you want the tools back again, just ask at PERM.
  • To opt out of future mailings, please remove yourself here.

New Pages Patrol newsletter June 2023

Hello Boghog,

New Page Review queue April to June 2023

Backlog

Redirect drive: In response to an unusually high redirect backlog, we held a redirect backlog drive in May. The drive completed with 23851 reviews done in total, bringing the redirect backlog to 0 (momentarily). Congratulations to Hey man im josh who led with a staggering 4316 points, followed by Meena and Greyzxq with 2868 and 2546 points respectively. See this page for more details. The redirect queue is rising again and is steadily approaching 4,000. Please continue to help out, even if it's only for a few or even one review a day.

Redirect autopatrol: All administrators without autopatrol have now been added to the redirect autopatrol list. If you see any users who consistently create significant amounts of good quality redirects, consider requesting redirect autopatrol for them here.

WMF work on PageTriage: The WMF Moderator Tools team, consisting of Sam, Jason and Susana, and also some patches from Jon, has been hard at work updating PageTriage. They are focusing their efforts on modernising the extension's code rather than on bug fixes or new features, though some user-facing work will be prioritised. This will help make sure that this extension is not deprecated, and is easier to work on in the future. In the next month or so, we will have an opt-in beta test where new page patrollers can help test the rewrite of Special:NewPagesFeed, to help find bugs. We will post more details at WT:NPPR when we are ready for beta testers.

Articles for Creation (AFC): All new page reviewers are now automatically approved for Articles for Creation draft reviewing (you do not need to apply at WT:AFCP like was required previously). To install the AFC helper script, visit Special:Preferences, visit the Gadgets tab, tick "Yet Another AFC Helper Script", then click "Save". To find drafts to review, visit Special:NewPagesFeed, and at the top left, tick "Articles for Creation". To review a draft, visit a submitted draft, click on the "More" menu, then click "Review (AFCH)". You can also comment on and submit drafts that are unsubmitted using the script.

You can review the AFC workflow at WP:AFCR. It is up to you if you also want to mark your AFC accepts as NPP reviewed (this is allowed but optional, depending on whether you would like a second set of eyes on your accept). Don't forget that draftspace is optional, so moves of drafts to mainspace (even if they are not ready) should not be reverted, except possibly if there is a conflict of interest.

Pro tip: Did you know that visual artists such as painters have their own SNG? The most common part of this "creative professionals" criteria that applies to artists is WP:ARTIST 4b (solo exhibition, not group exhibition, at a major museum) or 4d (being represented within the permanent collections of two museums).

Reminders

HGNC-UNIPROT ID pairs in the HGNC database and Wikidata

Hey Boghog,

I took a look at how I might create a combined dataset from the HGNC protein-coding gene file and Wikidata's gene and protein data items. In a nutshell, I need to run a SPARQL query on Wikidata (or alternatively, on UNIPROT) with the mkwikidata library to pull more information on the gene and/or protein than what is listed in the current data file my bot uses. From there, I can just use pandas to merge the two datasets on HGNC-UNIPROT ID pairs and write the merged dataset to a tab-delimited file with the same formatting as the current "protein-coding gene.txt" file my bot uses. Merging the two is not a problem, although it appears there will be a few cases where HGNC-UNIPROT ID pairs in one dataset aren't matched to the same ID pair in the other.

Based on an initial look, there are a lot more HGNC-UNIPROT pairs listed in Wikidata than in the HGNC "protein-coding gene.txt" file. That's not really a problem: I was planning on dropping any HGNC-UNIPROT ID pairs from Wikidata that aren't paired in the HGNC dataset, so the extra ID pairs present in Wikidata won't pose an issue.
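A minimal pandas sketch of that merge, using toy stand-in tables and hypothetical column names (`hgnc_id`, `uniprot_id`, etc. are my own labels, not the actual file headers); a left merge on the ID pair keeps every HGNC pair, silently drops the Wikidata-only pairs, and leaves NaN in the Wikidata columns for unmatched HGNC pairs:

```python
import pandas as pd

# Toy stand-ins for the HGNC "protein-coding gene" file and the Wikidata
# query results; HGNC/UniProt IDs taken from the pairs discussed below,
# labels are illustrative only.
hgnc = pd.DataFrame({
    "hgnc_id": ["HGNC:377", "HGNC:377", "HGNC:1437"],
    "uniprot_id": ["O43687", "Q9P0M2", "P01258"],
})
wikidata = pd.DataFrame({
    "hgnc_id": ["HGNC:377", "HGNC:1437", "HGNC:9999"],
    "uniprot_id": ["O43687", "P01258", "X99999"],
    "protein_label": ["label A", "label B", "Wikidata-only pair"],
})

# Left merge on the (HGNC ID, UniProt ID) pair: all HGNC pairs survive,
# Wikidata-only pairs (HGNC:9999 here) are dropped, and HGNC pairs with no
# Wikidata match get NaN in the Wikidata columns.
merged = hgnc.merge(wikidata, on=["hgnc_id", "uniprot_id"], how="left")
print(merged)
```

In practice the two frames would come from `pd.read_csv(..., sep="\t", dtype=str)` on the HGNC file and the query export, and the result would be written back out with `to_csv(..., sep="\t", index=False)` to match the bot's current tab-delimited format.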

It's sort of a problem if the HGNC-UNIPROT ID links in the HGNC file aren't matched in Wikidata, because entries with unpaired IDs will render in the wikitables as partially blank rows (i.e., the gene symbol, HGNC ID, and UNIPROT ID from the HGNC file will still appear in a table row for that HGNC-UNIPROT ID pair, but any columns of information I pull from Wikidata for that pair will contain missing data). There are only 40 genes with more than one Uniprot ID in the HGNC's "protein-coding gene.txt" file, and they contribute a total of only 82 unique HGNC-UNIPROT ID pairs (NB: these are all listed in the collapse tab below). So, while I don't think this is likely to be a prevalent issue, I did find one instance of Wikidata missing an ID pair that's present in the HGNC dataset. E.g., from special:permalink/1166507817, in row #14387 SCRIB has the following HGNC-UNIPROT ID pairs:

I'm not sure if the first HGNC-UNIPROT pair actually should be listed in Wikidata or not after looking at UNIPROT and HGNC separately. I was wondering if you knew why they aren't.
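Pairs like that SCRIB one can be flagged mechanically with pandas' merge `indicator` option; a sketch with toy tables (column names are hypothetical; the SCRIB IDs come from the list below):

```python
import pandas as pd

# HGNC:30377 (SCRIB) pairs from the HGNC file vs. a toy Wikidata table
# that is missing one of them.
hgnc_pairs = pd.DataFrame({
    "hgnc_id": ["HGNC:30377", "HGNC:30377", "HGNC:1437"],
    "uniprot_id": ["Q14160", "C0HLS1", "P01258"],
})
wikidata_pairs = pd.DataFrame({
    "hgnc_id": ["HGNC:30377", "HGNC:1437"],
    "uniprot_id": ["Q14160", "P01258"],
})

# indicator=True adds a _merge column recording which side each row came
# from; "left_only" rows are HGNC pairs with no matching Wikidata pair.
checked = hgnc_pairs.merge(
    wikidata_pairs, on=["hgnc_id", "uniprot_id"], how="left", indicator=True
)
missing = checked[checked["_merge"] == "left_only"]
print(missing[["hgnc_id", "uniprot_id"]])
```

Those "left_only" rows are exactly the entries that would show up as partially blank table rows.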


HGNC IDs that are linked to more than one UNIPROT ID in the "protein-coding gene.txt" file

For context, the current list of human protein-coding genes contains 19247 HGNC IDs and 19289 HGNC-UNIPROT ID pairs. I need to use the pair of identifiers to perform a 1:1 merge of the gene-protein entries in HGNC & Wikidata datasets or HGNC & Uniprot datasets simply because HGNC ID & UNIPROT ID have a one-to-many relationship on this tiny portion of the HGNC dataset.
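The one-to-many cases can be pulled out of the pair table with a quick groupby; a sketch on a toy table (column names are mine; the multi-pair IDs are taken from the list below, the single-pair row is made up for contrast):

```python
import pandas as pd

# Toy HGNC-UniProt pair table; in practice this would be read from the
# "protein-coding gene.txt" file.
pairs = pd.DataFrame({
    "hgnc_id": ["HGNC:4392", "HGNC:4392", "HGNC:4392", "HGNC:4392",
                "HGNC:1437", "HGNC:1437", "HGNC:0000"],
    "uniprot_id": ["O95467", "P63092", "P84996", "Q5JWF2",
                   "P01258", "P06881", "A00000"],
})

# Count distinct UniProt IDs per HGNC ID; anything above 1 is a
# one-to-many gene that forces the merge key to be the ID *pair*.
counts = pairs.groupby("hgnc_id")["uniprot_id"].nunique()
multi = counts[counts > 1]
print(multi)
```
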

HGNC:377 Uniprot links: O43687, Q9P0M2
HGNC:17868 Uniprot links: Q96PG8, Q9BXH1
HGNC:1437 Uniprot links: P01258, P06881
HGNC:1787 Uniprot links: P42771, Q8N726
HGNC:24346 Uniprot links: Q96RT6, Q9HC47
HGNC:2557 Uniprot links: P39880, Q13948
HGNC:2726 Uniprot links: P0DPQ6, P35638
HGNC:19681 Uniprot links: Q6B8I1, Q9UII6
HGNC:3413 Uniprot links: B3EWF7, O95278
HGNC:3438 Uniprot links: P0DP91, Q03468
HGNC:4392 Uniprot links: O95467, P63092, P84996, Q5JWF2
HGNC:23037 Uniprot links: A8MTL9, P0C7T4
HGNC:13664 Uniprot links: O94854, Q9UPN3
HGNC:25979 Uniprot links: L0R8F8, Q9NQG6
HGNC:7193 Uniprot links: O96007, O96033
HGNC:31104 Uniprot links: O95411, Q92614
HGNC:7629 Uniprot links: E9PAV3, Q13765
HGNC:8008 Uniprot links: P58400, Q9ULB1
HGNC:8009 Uniprot links: P58401, Q9P2S2
HGNC:8010 Uniprot links: Q9HDB5, Q9Y4C0
HGNC:20422 Uniprot links: P0DPB5, P0DPB6
HGNC:14862 Uniprot links: P0CAP2, Q6EEV4
HGNC:9403 Uniprot links: C0HM02, P24723
HGNC:9449 Uniprot links: F7VJQ1, P04156
HGNC:16519 Uniprot links: P0DI83, Q9BZG1
HGNC:24663 Uniprot links: B7ZAP0, Q5R372
HGNC:9896 Uniprot links: P0DW28, P98175
HGNC:2569 Uniprot links: A6ZKI3, O15255
HGNC:30377 Uniprot links: C0HLS1, Q14160
HGNC:10845 Uniprot links: P60896, Q6ZVN7
HGNC:15928 Uniprot links: O00241, Q5TFQ8
HGNC:20753 Uniprot links: L0R6Q1, Q96G79
HGNC:11875 Uniprot links: P42166, P42167
HGNC:24055 Uniprot links: Q8NFQ8, Q9H496
HGNC:11996 Uniprot links: Q5JU69, Q8N2E6
HGNC:12027 Uniprot links: P0DSE1, P0DTU3
HGNC:12155 Uniprot links: P0DSE2, P0DTU4
HGNC:1158 Uniprot links: B1AH88, P30536
HGNC:18194 Uniprot links: Q70YC4, Q70YC5
HGNC:25173 Uniprot links: C0HLU2, Q96CS4


Also, SPARQL is fairly new to me. I'm pretty sure, but not 100% certain, that this query yields all the relevant HGNC-UNIPROT ID pairs for human protein-coding genes on Wikidata. I need to read more about SPARQL queries and look at the descriptive statistics for this dataset in Python to be sure. I imagine a Wikidata project might be able to help me out too, though.

SELECT DISTINCT ?gene ?geneLabel ?HGNC_ID ?HGNCsymbol ?protein ?proteinLabel ?UNIPROT_ID ?wd_gene_item_article_link ?wd_protein_item_article_link
WHERE {
  ?gene wdt:P31 wd:Q7187 .
  ?gene wdt:P703 wd:Q15978631 .
  ?gene wdt:P279 wd:Q20747295 .
  ?gene wdt:P354 ?HGNC_ID .
  ?gene wdt:P353 ?HGNCsymbol .
  ?gene wdt:P688 ?protein .
  ?protein wdt:P352 ?UNIPROT_ID .
  OPTIONAL {
    ?article schema:about ?gene ;
             schema:name ?wd_gene_item_article_link ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }
  OPTIONAL {
    ?article schema:about ?protein ;
             schema:name ?wd_protein_item_article_link ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}

Click here to launch the Wikidata query
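For what it's worth, the same kind of query can also be run programmatically against the Wikidata Query Service endpoint from a script; a minimal stdlib-only sketch (the endpoint URL and JSON result shape follow standard WDQS conventions, and the trimmed query here keeps only a few of the variables from the full query above):

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Trimmed version of the query above: same property paths, fewer variables,
# with a LIMIT for testing.
QUERY = """
SELECT DISTINCT ?gene ?HGNC_ID ?UNIPROT_ID WHERE {
  ?gene wdt:P31 wd:Q7187 ;
        wdt:P703 wd:Q15978631 ;
        wdt:P354 ?HGNC_ID ;
        wdt:P688 ?protein .
  ?protein wdt:P352 ?UNIPROT_ID .
}
LIMIT 10
"""

def run_query(query: str):
    """POST a SPARQL query to the Wikidata Query Service; return bindings."""
    data = urllib.parse.urlencode({"query": query, "format": "json"}).encode()
    req = urllib.request.Request(
        WDQS_ENDPOINT,
        data=data,
        # WDQS asks clients to send a descriptive User-Agent.
        headers={"User-Agent": "hgnc-merge-example/0.1 (illustrative)"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["results"]["bindings"]

# Each binding maps variable names to {"type": ..., "value": ...} dicts:
# bindings = run_query(QUERY)  # live network call; uncomment to run
```

The returned bindings flatten naturally into a pandas DataFrame for the merge step.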

Seppi333 (Insert ) 18:54, 26 July 2023 (UTC)[reply]

Hi Seppi333. This is really cool! I am impressed that you were able to do this so quickly. I don't have any experience with SPARQL either, but it looks pretty powerful. I didn't realize that there is a one-to-many relationship between HGNC and UniProt IDs. I was under the impression that the various splice variants of a gene would be listed under a single UniProt entry, but apparently that is not always the case. Concerning unmatched HGNC/UniProt pairs, these look to be rare, so I would think that they are simply uncorrected errors. So it looks like it will be straightforward to add extra columns to your list, assuming that there is consensus to do so. We need to wait for the AfD to close and then decide what to do. Thanks for your help! Cheers. Boghog (talk) 19:14, 26 July 2023 (UTC)[reply]
No worries. Main reason I did this was out of curiosity; I just wanted to see how feasible it is to perform a dataset merger using matched gene-protein identifiers in two different databases. On the off-chance that a consensus to merge the lists or expand my list occurs, I can expand the SPARQL query to return other data items that are linked to each HGNC-UNIPROT ID pair, like "expressed in" for the linked gene and "molecular function", "cell component", "biological process" for the linked protein. Seppi333 (Insert ) 19:46, 26 July 2023 (UTC)[reply]
After looking more into this, it looks like my problem is a solution to another problem. Probably going to need to create a data pipeline from HGNC to Wikidata and get a Wikidata bot approved to add missing HGNC-UNIPROT ID links in Wikidata from the HGNC dataset prior to running a script that queries Wikidata and merges the returned dataset with the HGNC data.
Also, I used a slightly modified version of the query above to find proteins without UNIPROT IDs that are linked to HGNC IDs. After quickly skimming the 400 or so additional proteins the query returned, I noticed EAAT3 (Q11856447) was missing a UNIPROT ID, which I thought was odd considering that I'm familiar with the gene and protein. The UNIPROT ID is already linked to SLC1A1 (Q18031520), and I don't even need to look at a database to know these are duplicate Wikidata entries. I probably ought to generate a dataset of HGNC IDs that are linked to two or more proteins where at least one protein is missing a UNIPROT ID, to find duplication issues in Wikidata like this one. It would need to be manually checked, but it should be fairly manageable (fewer than 1,000 entries). For now, it's probably best to let this duplicate entry remain unmerged so that I know whatever SPARQL query I write to locate duplicate entries actually finds that true positive case.
#title: Sort on the UNIPROT_ID column to see the ~400 proteins linked to an HGNC ID that are missing a UNIPROT identifier.
SELECT ?gene ?geneLabel ?HGNC_ID ?HGNCsymbol ?protein ?proteinLabel ?UNIPROT_ID ?wd_gene_item_article_link ?wd_protein_item_article_link
WHERE {
  ?gene wdt:P31 wd:Q7187 .
  ?gene wdt:P703 wd:Q15978631 .
  ?gene wdt:P279 wd:Q20747295 .
  ?gene wdt:P354 ?HGNC_ID .
  ?gene wdt:P353 ?HGNCsymbol .
  ?gene wdt:P688 ?protein .
  OPTIONAL {
    ?protein wdt:P352 ?UNIPROT_ID .
  }
  OPTIONAL {
    ?article schema:about ?gene ;
             schema:name ?wd_gene_item_article_link ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }
  OPTIONAL {
    ?article schema:about ?protein ;
             schema:name ?wd_protein_item_article_link ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}

Click here to launch the Wikidata query
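That duplicate-entry check (HGNC IDs linked to two or more protein items where at least one lacks a UniProt ID) is a straightforward groupby filter on the query results; a sketch on a toy result table, with hypothetical column names and the SLC1A1/EAAT3 item IDs from above (the other IDs are made up for contrast):

```python
import pandas as pd

# Toy query results: one HGNC ID linked to two protein items (Q18031520 and
# the EAAT3 duplicate Q11856447, which has no UniProt ID), plus a normal row.
results = pd.DataFrame({
    "hgnc_id": ["HGNC:10939", "HGNC:10939", "HGNC:1437"],
    "protein": ["Q18031520", "Q11856447", "Q00000000"],
    "uniprot_id": ["P43005", None, "P01258"],
})

def suspect_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Rows for HGNC IDs linked to 2+ protein items where at least one
    of those items is missing a UniProt ID."""
    grouped = df.groupby("hgnc_id")
    mask = (grouped["protein"].transform("nunique") >= 2) & (
        grouped["uniprot_id"].transform(lambda s: s.isna().any())
    )
    return df[mask]

print(suspect_duplicates(results))
```

The known SLC1A1/EAAT3 case then serves as the true-positive check: if it ever stops appearing in the output, the query or filter is broken.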

First it was a mountain of DABlinks on Wikipedia. Now it's this. I always seem to find tons of stuff that needs to be fixed whenever I do any work involving my bot script... lol.
In any event, I'll bring this up at Wikidata:Wikidata:WikiProject Molecular biology once I do a little more work on this. Seppi333 (Insert ) 02:47, 28 July 2023 (UTC)[reply]
So… looks like the consensus was to draftify. I could reopen the original RFBA to generate consensus. Was wondering if you think others might support revising the lists. I don’t mind doing the programming work, but I don’t really want to unilaterally attempt to convince everyone for bot approval like last time. Seppi333 (Insert ) 17:32, 3 August 2023 (UTC)[reply]
@Seppi333: Hi Seppi. The creator of the list, Claes Lindhardt, asked How do I update my vote to Merge?. I interpret that comment as indicating there is support for expanding your list by merging. But what is meant by merge? We can merge by columns or by rows. I think merging by columns makes a lot more sense, since the required data can be downloaded from reliable web sources. For such a huge list, human-generated content is not practical; the target articles are a much more logical place for it. The question then becomes what columns to add. Boghog (talk) 18:02, 3 August 2023 (UTC)[reply]
PS: How did you pick up SPARQL so quickly? I have a real-life use case for writing UniProt SPARQL queries. Do you have any recommendations for learning SPARQL? Learning SPARQL looks promising. Boghog (talk) 18:02, 3 August 2023 (UTC)[reply]
Eh, I’m not really that inclined to make the lists editable, if that’s what you meant. I suppose I could try reopening the RFBA in about 2 weeks, when I’ll have more time to focus on this.
As for SPARQL, I just picked it up from looking at example syntax and through trial and error. It’s my first query language, but a lot of people tell me I’m a really fast learner, so IDK if that approach works the same for everyone. Seppi333 (Insert ) 20:46, 3 August 2023 (UTC)[reply]
@Seppi333: I’m not really that inclined to make the lists editable I completely agree with you, that would be an incredibly bad idea. There is no need to re-invent the wheel. Just download the data from HUGO and UniProt. Boghog (talk) 20:56, 3 August 2023 (UTC)[reply]

Since you do a lot of citation work...

These pieces might interest you. Headbomb {t · c · p · b} 17:50, 1 August 2023 (UTC)[reply]

Citation Help

Hi BogHog,

I'm having a bit of trouble citing one of my sources on my page of Obturator hernia, can you take a look at it? Thank you! Immanueltjahjadi (talk) 21:56, 1 August 2023 (UTC)[reply]

Hi Immanueltjahjadi. Thank you for your contributions to Obturator hernia. I am not certain which reference you are referring to. Did you mean Schizas 2021? Since that source is already cited and already has the ref name "Schizas_2021", you just need to insert the tag <ref name = "Schizas_2021" /> at the location where you want to cite Schizas 2021 again (see WP:REFNAME). Cheers. Boghog (talk) 04:25, 2 August 2023 (UTC)[reply]

ALS Good Article Nomination

Hey there wiki-buddy! I'm hoping I can attract some interested folks to consider reviewing the Wikipedia page about amyotrophic lateral sclerosis for Good Article status. As you may know, ALS is a rare and fatal neurodegenerative disease that quickly causes people to lose the ability to move, speak, and breathe. The Wikipedia page about ALS is read over 2,000 times each day in English alone, and often experiences spikes in traffic whenever a celebrity is diagnosed. There have recently been a number of genetic advances made in the space and some recent drug approvals, thanks in part to the momentum started by the ALS Ice Bucket Challenge. I've been grinding away at it since early this year but keen to see it improve further, hope you'll consider! PaulWicks (talk) 08:38, 6 August 2023 (UTC)[reply]

Good article reassessment for Osteopathic medicine in the United States

Osteopathic medicine in the United States has been nominated for a good article reassessment. If you are interested in the discussion, please participate by adding your comments to the reassessment page. If concerns are not addressed during the review period, the good article status may be removed from the article. ~~ AirshipJungleman29 (talk) 15:55, 8 August 2023 (UTC)[reply]

Category:Genes on human chromosome has been nominated for merging

Category:Genes on human chromosome has been nominated for merging. A discussion is taking place to decide whether this proposal complies with the categorization guidelines. If you would like to participate in the discussion, you are invited to add your comments at the category's entry on the categories for discussion page. Thank you. Gjs238 (talk) 22:44, 15 August 2023 (UTC)[reply]

Hi Boghog, the article Tandem repeat seems to be missing some substantive content. Shouldn't satDNA be at least briefly described there (e.g. in the Terminology section)? Cf. Satellite DNA. TaurenMoonlighting (talk) 22:00, 28 August 2023 (UTC)[reply]