
User talk:Taavi: Difference between revisions

From Wikipedia, the free encyclopedia
UTRS Account - Does not meet requirements
m Reverted edits by DeltaQuadBot (talk) to last version by Lowercase sigmabot III
Line 112: Line 112:
</div></div> <section end="technews-2020-W34"/> 20:40, 17 August 2020 (UTC)
<!-- Message sent by User:Johan (WMF)@metawiki using the list at https://meta.wikimedia.org/w/index.php?title=Global_message_delivery/Targets/Tech_ambassadors&oldid=20366028 -->
== Your UTRS Account ==
You have no wikis in which you meet the requirements for UTRS. Your account has been removed and you will be required to reregister once you meet the requirements. If you are blocked on any wiki that UTRS uses, please also resolve that before registering again. -- [[User:DeltaQuadBot|DQB]] ([[User talk:AmandaNP|owner]] / [[User talk:AmandaNP|report]]) 11:40, 20 August 2020 (UTC)

Revision as of 12:50, 20 August 2020


MajavahBot auto scanning talk pages

Hey there, Majavah! Been a while! :)

I was wondering something. Would it be possible for your bot to be set up so it automatically scans talk pages and if it detects one exists, it "sets itself up" for archiving that page with some pre-specified settings (period, etc.)? I got the idea after monitoring IA Bot's contributions in the Albanian Wikipedia. We've set it to post talk page messages after it does anything and, as you may already know, if you leave that option on, soon enough you end up with a large number of talk pages in need of archiving. I was going to ask if something could be done about it. Like using your bot to archive them in a different archive dedicated to IA Bot. But then I thought that even if the talk page wasn't created by the IA Bot, it would be nice if it could be set up automatically for archiving and I came here to ask you about it. I'm thinking mainly Mainspace but even some other tech-spaces could benefit from a function like that. If that's possible, if we could also have a dedicated IA Bot archive for mainspace that would be even better but let's ask for the basics first. :P - Klein Muçi (talk) 11:07, 28 July 2020 (UTC)[reply]

I see what you mean by them getting long (that list has the longest talk pages that are not subpages and do not contain a link to the archive config template). It shouldn't be that hard to write a separate script to automatically add the archive template to long pages if the community wants that.  Majavah talk · edits 11:51, 28 July 2020 (UTC)[reply]
Oh yeah, we do so much! :P I can create a VP post about it if you want proof but, as we've talked some time before, in general, as much automation as we can provide, the better it is for us since we're lacking human resources in the tech things. What specifications are you thinking of? I mean, how many posts to leave on the page after archiving, how often to archive, etc. Also, can anything be done about a "special" IA Bot archive? Or is that too complex without much benefit? Ideally it would be like this: MajavahBot detects a post by IA Bot in a talk page and after a certain period of time has passed, it archives it in a special archive dedicated to messages like these. The archive somehow shows in the template for archives alongside the normal archive. - Klein Muçi (talk) 12:02, 28 July 2020 (UTC)[reply]
I think that technically it would be easiest to make the bot subst a template that contains the archival config template with some default settings to pages which need that. That would allow you (or other editors) to edit the defaults as necessary by editing the template. It would need some tinkering but it might be doable.  Majavah talk · edits 13:54, 28 July 2020 (UTC)[reply]
Well, whenever you can help on that, tell me. Also feel free to tell me if I can somehow help by providing any needed information or anything similar. :) - Klein Muçi (talk) 14:07, 28 July 2020 (UTC)[reply]
Hi, I'll probably have time for this next week. Could you please create a page (a subpage of sq:User:MajavahBot, of course name it in your language) and add the archival config template and any other contents (the archive navigator for example) inside it, so its contents can just be added to the top of all talk pages automatically?  Majavah talk · edits 17:45, 31 July 2020 (UTC)[reply]

Hello! I'm not sure I fully understand your request (because it looked too easy :P ). I made this page. It has the most common settings we use when archiving pages. Of course the name of the archive will change according to article's talkpage the bot is archiving. What content can I potentially add more than that? Any suggestions? Also, what name do you suggest for the bot's subpage? Can you give me one example in English? I'm not really sure how you plan to use that so I'm not sure about the name either. For now it's just "arkivimi automatik" meaning "automatic archiving process". I'd like the name to have meaning according to the page usage. Did I answer your request? - Klein Muçi (talk) 09:54, 1 August 2020 (UTC)[reply]

Unrelated question: Archiving can be delayed for a particular thread by substituting the template {{DNAU}} into the thread. Use {{subst:DNAU}} to retain a thread indefinitely, or {{subst:DNAU|<integer>}} to retain a thread for <integer> days. See the template documentation for details about its use and function. Why would the template need to be substituted? Any particular reason? I want to add the text above to the bot's main page so users know how to keep the bot away from a particular thread, if they so wish. - Klein Muçi (talk) 10:12, 1 August 2020 (UTC)[reply]
The template needs to be substed because it just adds a timestamp in the future (within a comment), which stops the bot from archiving the thread because its newest timestamp is not far enough in the past. The bot does not actually try to look for any templates, it just looks for timestamps. I'll reply with more info about the configuration page later, can't do that currently.  Majavah talk · edits 19:36, 1 August 2020 (UTC)[reply]
The idea for the page is that I can just use a bot to add "{{subst:User:MajavahBot/Simple archive setup}}" to the top of all pages and it would just work, adding the archive config and navigators etc. It can use some template tricks (I can help with those) to add the page name to the archive config. I think that the current page name "automatic archiving process" is fine, as that's exactly what that template is supposed to do (automatically add everything needed to archive the page).  Majavah talk · edits 17:37, 3 August 2020 (UTC)[reply]
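A minimal Python sketch of the timestamp-only check described above. This is not MajavahBot's actual code: the function names, the 30-day default and the exact shape of the hidden comment are assumptions made for illustration. The point it shows is that a future timestamp left behind by a substituted template is enough to keep a thread out of the archive, with no template detection at all.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches signature timestamps such as "19:36, 1 August 2020 (UTC)".
TIMESTAMP_RE = re.compile(
    r"(\d{1,2}):(\d{2}), (\d{1,2}) ([A-Z][a-z]+) (\d{4}) \(UTC\)"
)

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def newest_timestamp(thread_text):
    """Return the latest timestamp found anywhere in the thread, or None."""
    stamps = []
    for hour, minute, day, month, year in TIMESTAMP_RE.findall(thread_text):
        if month in MONTHS:
            stamps.append(datetime(int(year), MONTHS[month], int(day),
                                   int(hour), int(minute),
                                   tzinfo=timezone.utc))
    return max(stamps) if stamps else None


def is_archivable(thread_text, now, max_age_days=30):
    """Hypothetical rule: archive only if the newest timestamp is old enough.

    Threads without any timestamp are never archived.
    """
    newest = newest_timestamp(thread_text)
    if newest is None:
        return False
    return now - newest > timedelta(days=max_age_days)
```

Under these assumptions, a thread signed on 1 August 2020 is archivable in mid-September, but the same thread followed by a hidden comment such as `<!-- 10:12, 1 August 2030 (UTC) -->` is not, because the comment's timestamp is read like any other.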
Oh, okay then. Tell me when we can go on with the next phase. :) - Klein Muçi (talk) 23:09, 3 August 2020 (UTC)[reply]
Hi, I think we can proceed. I have written a little script that can do this. Can you please translate an edit summary for these edits (something like Bot: Automatically set up archiving for this talk page)?  Majavah talk · edits 08:01, 5 August 2020 (UTC)[reply]
MajavahBot: Vendosja e faqes së diskutimit për arkivim automatik - It would be good if we could have the bot's name showing up instead of just "Bot". We do have that for other bots and it helps distinguish their edits from one another. If that can be done, I'm thinking of changing the 4 TranslateWiki strings related to the bot to also include that name. If that can't be done, you can use Roboti instead of MajavahBot, which is the Albanian word for "Bot". But I strongly insist on using the bot's name.
Question: Will this header, the automatic archiving configuration, be added to ALL mainspace talk pages as soon as they're created, or is there some filtering? How have you designed the script to work in this respect? - Klein Muçi (talk) 09:47, 5 August 2020 (UTC)[reply]
I was planning on adding the headers for talk pages that are over 5,000 bytes long (this is just over 5k bytes) as there is no need to archive every single talk page with a comment but it should catch everything that's at least a little longer. How does that sound?
For translations: the translations at translatewiki.net should not contain my bot's name. I could modify the script to replace the word "Bot: " in all edit summaries with "MajavahBot: " if you want.  Majavah talk · edits 12:18, 5 August 2020 (UTC)[reply]
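The size rule above could be sketched in Python roughly as follows. This is only an illustration, not the script that was actually run: `CONFIG_MARKER` is a hypothetical string standing in for whatever identifies an existing archive config, and the function names are made up. The subst call is the one quoted earlier in this thread, and the 5,000-byte threshold is the one proposed above.

```python
# Template call quoted earlier in this thread; substituting it adds the archive config.
SETUP_TEMPLATE = "{{subst:User:MajavahBot/Simple archive setup}}"

# Threshold proposed above: only pages over 5,000 bytes get tagged.
MIN_BYTES = 5000

# Hypothetical marker for an archive config that is already on the page.
CONFIG_MARKER = "{{User:MajavahBot/config"


def needs_archive_setup(wikitext, min_bytes=MIN_BYTES,
                        config_marker=CONFIG_MARKER):
    """Return True if the page is long enough and not configured yet."""
    if config_marker.lower() in wikitext.lower():
        return False
    # Size is measured in UTF-8 bytes, not characters, so letters such as
    # "ç" or "ë" count as two bytes each.
    return len(wikitext.encode("utf-8")) > min_bytes


def add_archive_setup(wikitext):
    """Put the setup template at the very top of the page text."""
    return SETUP_TEMPLATE + "\n" + wikitext
```

Measuring bytes rather than characters matters for Albanian text: a page of 2,501 "ç" characters is 5,002 bytes and would qualify, even though it has far fewer than 5,000 characters.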

Yes and yes. The number feels right and that replacement could really help. Can we run 1-2 manual runs after you're done so we can sort out any unseen bugs/minor problems? Even though to be honest, I hope everything is all right because even 1 manual run would mean changing many pages the way it is designed. :P - Klein Muçi (talk) 12:33, 5 August 2020 (UTC)[reply]

For technical reasons I'll be doing the tagging in batches, very likely at max 100 per day. There are currently 787 talk pages that meet the criteria so it will take just over a week to go thru the backlog. Is that an okay speed or do you want me to start slower?  Majavah talk · edits 19:19, 5 August 2020 (UTC)[reply]
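The "just over a week" estimate follows from rounding up 787 pages at 100 per day. A small Python sketch of that arithmetic and of the batching itself, with made-up function names and placeholder page titles; the backlog is assumed to stay fixed at the figure quoted above:

```python
import math


def days_to_clear(backlog, per_day):
    """Number of daily runs needed to get through a fixed backlog."""
    return math.ceil(backlog / per_day)


def batches(pages, size):
    """Yield consecutive slices of at most `size` pages."""
    for start in range(0, len(pages), size):
        yield pages[start:start + size]


# Figures quoted in the comment above: 787 pages, at most 100 per day.
backlog_days = days_to_clear(787, 100)
batch_sizes = [len(b) for b in batches(list(range(787)), 100)]
```

Here `backlog_days` is 8, and `batch_sizes` is seven batches of 100 followed by one of 87.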
Can we do like 20 the first 2 times and then we go on at full capacity, whatever that is? - Klein Muçi (talk) 19:24, 5 August 2020 (UTC)[reply]
Yes, sure. Are there any more questions or issues or can I do the first test run tomorrow morning?  Majavah talk · edits 19:47, 5 August 2020 (UTC)[reply]
No, everything's good. Proceed with the test whenever you're ready. :) - Klein Muçi (talk) 23:07, 5 August 2020 (UTC)[reply]
I did a test run of 20 pages (and I did one manually to test that the template works). List of edits the bot did is available here. If they look good, could you also add information about this auto-tagging to the bot's user page?  Majavah talk · edits 07:51, 6 August 2020 (UTC)[reply]
Everything seems fine except for 1 problem. There are cases like this (see at the bottom) where articles have templates in their talk pages that shouldn't be archived. Other examples: Here and here. For the time being, these talk page template messages aren't yet standardized for us and take different forms according to the one who makes them and their aim. We need to find a solution for those in general. After dealing with this, we may start the second 20 entries test, and after that I'll take care of changing the bot's user page. - Klein Muçi (talk) 10:07, 6 August 2020 (UTC)[reply]

Given that I saw this page, maybe we can have another part where we can enter our templates and the bot will know not to archive them? Or maybe you can think of a better option? Maybe just making sure that the archive config. part is below the template is a better solution? - Klein Muçi (talk) 12:24, 6 August 2020 (UTC)[reply]

We have article tags here on enwiki too. It would be technically challenging to not archive certain things due to weird edge cases (it's probably doable but I don't think it's a great idea if there are better alternatives). The bot won't archive anything that isn't in a section that has a timestamp. None of the pages you linked (or other pages I checked) have those templates in a timestamped section. Is that enough? One idea would also be to not tag any pages that do not have any sections.  Majavah talk · edits 12:43, 6 August 2020 (UTC)[reply]
Ooh, okay then! Can you do another batch of 20 entries as a final test today? Then I'll take care of the changes for the user page and we can proceed with full capacity. - Klein Muçi (talk) 14:16, 6 August 2020 (UTC)[reply]
Sure! That script is currently running and will be finished in a minute or two.  Majavah talk · edits 15:02, 6 August 2020 (UTC)[reply]
Just checked the new entries. They seem all fine so we can proceed with full capacity from now on. I also updated the bot's user page to include information about the automatic archiving process. I also included this page as a switch for that process. That's its function, no? Finally I protected that page so it can be changed only by admins/crats/stewards. I guess that concludes the work. Thank you a lot for helping our community for the third time now! :)) - Klein Muçi (talk) 23:38, 6 August 2020 (UTC)[reply]
Great, I've set up a scheduled job to run the automatic archiving setup every night, a couple of hours before the job that actually archives threads. I also changed the code to use "MajavahBot" instead of "Roboti" when saving archived pages.  Majavah talk · edits 16:59, 7 August 2020 (UTC)[reply]

Thank you! Final question: I believe the automatic archive setup process is only for mainspace talk pages, no? If that's so, do you think that's enough? Or should we also include other namespaces and even subpages? - Klein Muçi (talk) 17:47, 7 August 2020 (UTC)[reply]

Yes, it only applies to mainspace talk pages (NS 1). I think that's enough (that's the only NS that InternetArchiveBot posts in). If you need to have it on more namespaces just come back here and I can modify the script.  Majavah talk · edits 18:39, 7 August 2020 (UTC)[reply]
Okay then. Thank you! :)) - Klein Muçi (talk) 10:39, 8 August 2020 (UTC)[reply]

A barnstar for you!

The Technical Barnstar
For your help with the MajavahBot in SqWiki. We're really seeing the benefits now. - Klein Muçi (talk) 15:32, 10 August 2020 (UTC)[reply]
Thank you!  Majavah talk · edits 19:15, 10 August 2020 (UTC)[reply]

16:06, 10 August 2020 (UTC)

Help with autocategorization

Hey there, Majavah! I'm trying to deal with a problem in SqWiki. Maybe you can help me with an idea since you deal with the tech stuff. This is not related to archiving or MajavahBot though.

In SqWiki we have one more CS1 citation error than EnWiki. If you use a citation without setting the language, you get an error and the article gets automatically categorized here. We need it for statistical purposes. The problem is that, as you can see from that page, there are thousands of articles in need of a language parameter. Can you think of any way I can automate this job somehow? Maybe with AWB or any kind of bot? Of course, doing it manually is easy. You just open the article, check the link used in the citation, determine its language and put it in the citation. But doing that thousands and thousands of times... That's time-consuming, to say the least, and impossible, to be realistic. Can you think of any way I can somehow lower that number with any kind of tool? - Klein Muçi (talk) 17:26, 13 August 2020 (UTC)[reply]

@Klein Muçi: Some of them could probably be done with a bot and/or a gadget similar to ProveIt, however that's not an area I'm familiar with and I don't have much time to spend on it.  Majavah talk · edits 20:31, 15 August 2020 (UTC)[reply]
Don't worry. I just wanted a general idea, if you had one. Like, what would the bot look for (the pattern) so it could determine the language? Even for only some of them.
I wasn't aware of ProveIt's functionality. Maybe that can help a bit. Thank you! :)) - Klein Muçi (talk) 20:43, 15 August 2020 (UTC)[reply]
The HTML root tag (<html>) should have a lang attribute that could be used. For example this page has it set to en: <html class="client-nojs" lang="en" dir="ltr">.  Majavah talk · edits 20:46, 15 August 2020 (UTC)[reply]
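A minimal Python sketch of reading that attribute, using only the standard library's `html.parser`. The class and function names are made up for illustration, and fetching the cited page is left out. A real bot would also have to treat the result as a hint rather than a fact, since the attribute can be missing or set incorrectly on the cited site.

```python
from html.parser import HTMLParser


class _LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def detect_page_language(html_text):
    """Return the primary language subtag (e.g. "en"), or None if absent."""
    finder = _LangFinder()
    finder.feed(html_text)
    if not finder.lang:
        return None
    # "en-US" -> "en"; citation language parameters usually want the short code.
    return finder.lang.split("-")[0].lower()
```

For the root tag quoted above, `detect_page_language('<html class="client-nojs" lang="en" dir="ltr">')` gives `"en"`; a page whose root tag has no `lang` attribute gives `None` and would still need manual review.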
Oh... Interesting... I didn't know that... Okay then, thank you! :)) Maybe I'll find someone to help me with designing a bot regarding that attribute. - Klein Muçi (talk) 20:59, 15 August 2020 (UTC)[reply]

20:40, 17 August 2020 (UTC)