User:Novem Linguae/Essays/Toolforge bot tutorial: Difference between revisions

Revision as of 17:48, 18 January 2023

This is my Toolforge bot tutorial. I had some difficulty using MediaWiki and Wikitech tutorials to set up a bot on Toolforge. In my opinion, there is a big learning curve. These are my streamlined notes that will hopefully help the next person.

This tutorial is optimized for the operating system Windows and the programming language PHP. If you are using a different OS or language, you will need to change some of the steps.

Anywhere it says novem-bot, you should replace that with your Toolforge tool name. Anywhere it says novemlinguae, you should replace that with your wikitech username.

Bot or user script?

  • User scripts are usually easier to make than bots.
  • Use a user script when:
    • You want it to be triggered by a user rather than run every X minutes/hours/days.
    • You want to get up and running quickly
    • You know JavaScript
    • You want the edit to be associated with the editor triggering it, not a bot.
    • You can get the data you need with a couple of MediaWiki API queries and don't need a complex SQL database query.
    • You only need to make edits and/or display an interface to the user via a wiki.
    • You only need to make a couple edits at a time.
  • However, there are cases when a bot is the better tool.
  • Use a bot when:
    • You want to do a chore every X hours/days (cron), repetitively, forever, rather than triggered by a user.
    • You don't mind taking a while to get everything set up in Toolforge
    • You know a back end language such as PHP or Python
    • You want the edit to be associated with a bot, not with the editor triggering it.
    • You need to run complex SQL database queries and it would be impossible/inefficient to just use MediaWiki API queries
    • You want to make your own custom website that isn't nested inside a wiki. For example, XTools or some other web tool.
    • You plan on making dozens or more edits at a time.

Username, project name, programming language

My Wikitech username is novemlinguae, and my project's name is novem-bot. I will use those examples in this tutorial. Substitute them with your own name, as needed.

These notes will get a webserver with the programming language PHP running. If you program in a different language, you will need to adjust some steps.

Apply for a Toolforge account

  1. Create a Wikimedia developer account. This is different from your normal Wikipedia SUL account.
  2. Create an SSH key and add it to your Wikitech account.
  3. Submit a Toolforge project membership request and wait for its approval.
    • Your request will be reviewed, and you will receive confirmation within a week. You will be notified through your Wikitech user account.
  4. Once you are added as a Toolforge member, you must log out and then log in again at https://toolsadmin.wikimedia.org/
  5. Create a new tool

Generate an SSH key

  • SSH keys are a more secure way to log in than a password alone. You generate a key pair: a private key file that stays on your computer, and a public key that you add to your Wikitech account. When you connect, your computer proves that it holds the private key, and the private key itself is never sent to the server. If you set a passphrase on the key, that passphrase only unlocks the private key file on your own machine.
  • Your SSH key will be needed for FTP and for shell/PuTTY.
  • I store my private key file at F:\Dropbox\Code\NovemBot\Gerrit SSH key\wikitech.ppk

FTP client (file transfers)

  • FTP is file transfer protocol. This is one way to get files to and from the server. You usually want to install a program that has a drag-n-drop interface, and you drag files from one side (your machine) to the other side (Toolforge), and vice versa.
  • You must use the FTP program WinSCP, and you must configure it a certain way. Other FTP programs will not work, as they do not have the ability to be configured the quirky way Toolforge wants.
  • protocol: SFTP
  • host: login.toolforge.org
  • user: novemlinguae
  • advanced ->
    • environment -> directories ->
      • remote directory: /data/project/novem-bot
      • local directory: F:\Dropbox\Code\NovemBot\
    • environment -> SFTP -> SFTP server ->
      • this step very important to become the right user (novem-bot)
      • sudo -u tools.novem-bot /usr/lib/sftp-server
    • ssh -> authentication -> private key file ->
      • F:\Dropbox\Code\NovemBot\Gerrit SSH key\wikitech.ppk
  • Connect. Enter password when prompted.

Shell (command line, bash, ssh)

  • You need one of those hacker-looking black text windows so you can type commands to the server. This is called shell, bash, console, SSH, or command line.
  • You can't just use a local shell window. Since you are talking to a remote server, you need a special program.
  • Download and install PuTTY.
  • host name: novemlinguae@login.toolforge.org
  • port: 22
  • connection: ssh
  • Connection -> SSH -> Auth -> Authentication parameters -> Private key file for authentication -> F:\Dropbox\Code\NovemBot\Gerrit SSH key\wikitech.ppk
  • Connect. Enter password when prompted.
  • Once you're connected, you must type become novem-bot to change from your regular account to your project account. If you don't do this, you will have issues with your files belonging to the wrong owner, which will cause issues later.

Bash? Shell? I don't speak Linux

  • Me neither.
  • In Linux, every file has a certain permission level, and this permission level affects who else on the server can see it, execute it, etc. And also what scripts can and can't be executed from certain places.
  • Here are some useful shell commands.
    • become novem-bot - Change from your username to your project name. Important for making sure files and folders you touch have the right owners.
    • cd .. - Navigate up one level.
    • cd folderName - Navigate down one level.
    • cd /data/project/novem-bot - Navigate to this absolute location. (Linux paths use forward slashes and have no drive letters.)
    • chmod 644 fileName - Change file permissions
    • ls -l or dir - Display directory contents.
    • mkdir folderName - create a directory
    • pwd - Print working directory. Shows you where you are.
    • rm -rf folderName - Delete file/folder and all its contents.
    • take fileName - take a file that is assigned to a different owner, and make you the owner, if able
  • Here's some stuff that is installed and can be accessed in shell
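The file and folder commands listed above can be tried safely in a throwaway directory before you use them on your tool's real files. This is only a sketch: the scratch directory comes from mktemp, and folderName and fileName are the same placeholder names used in the list. The Toolforge-specific commands (become, take) are left out because they only exist on the Toolforge servers.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir folderName             # create a directory
cd folderName                # navigate down one level
here=$(pwd)                  # print working directory: where are we?
printf 'hello\n' > fileName  # create a small file to experiment on
chmod 644 fileName           # owner read/write, everyone else read-only
mode=$(ls -l fileName | cut -c1-10)
echo "$here"
echo "$mode"                 # -rw-r--r--

cd ..                        # navigate up one level
rm -rf folderName            # delete the folder and everything in it
```

The ten-character string that ls -l prints is the same permission that the numeric mode describes: 6 is rw-, 4 is r--.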

Write and test your bot

  • Do as much of this offline (localhost development environment) as you can.
    • That way you don't have to re-upload your changed files via FTP as you tweak and test code, saving time.
    • For web development, you can use a program like XAMPP to execute PHP files locally, e.g. https://localhost/
    • The Wikipedia API doesn't care what calls it or from where, making localhost development easy.
    • The SQL replica database does care. That needs to be called from Toolforge servers only. Localhost doesn't work.

Protect your passwords

How to let your script modify files on the server

  • Let's say you want a specific file (your bot file, for example), when executed, to write a .txt file on the server with some data.
  • You must change the permissions of the file that is making the changes, from 644 to 744. This adds the "execute" permission to it.
  • This idea may be important later for getting your bot to run. If you decide to use an .sh file to execute your main bot file, then the .sh file will need 744 permission.
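As a rough illustration of the 644 versus 744 difference described above, the sketch below creates a small script in a temporary directory, shows its permission string before and after chmod 744, and then runs it directly. The file names task-demo.sh and output.txt are made up for this example, and it assumes the temporary directory allows executing files.

```shell
#!/bin/sh
scratch=$(mktemp -d)
script="$scratch/task-demo.sh"   # hypothetical file name

# A tiny script that writes a .txt file next to itself.
cat > "$script" <<'EOF'
#!/bin/sh
echo "bot ran" > "$(dirname "$0")/output.txt"
EOF

chmod 644 "$script"
before=$(ls -l "$script" | cut -c1-10)   # -rw-r--r-- (no execute bit)

chmod 744 "$script"
after=$(ls -l "$script" | cut -c1-10)    # -rwxr--r-- (owner may execute)

"$script"                                # launching it directly needs the execute bit
result=$(cat "$scratch/output.txt")
echo "$before -> $after: $result"
```

With mode 644, trying to launch the script directly would normally fail with a "Permission denied" error, which is the symptom to look for if a wrapper script refuses to start.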

Pick a framework

  • Whatever language you're writing your bot in, you'll probably want to pick a framework (external library) specifically designed for logging into Wikipedia and using its API.
  • For PHP, I use the ancient framework botclasses.php. It's not modern, but it fits nicely in one file.
  • One file means I can simply copy/paste the code into my repo, then do <?php include('botclasses.php');, and now I can log in to Wikipedia and execute API commands with a lot less code. Example botclasses.php code:
<?php

include('botclasses.php');

// Log in
$wp = new wikipedia();
$wp->http->useragent = '[[en:User:NovemBot]] task A, owner [[en:User:Novem Linguae]], framework [[en:User:RMCD_bot/botclasses.php]]';
$wp->login('usernameGoesHere', 'passwordGoesHere');

// Get page wikicode
$wikicode = $wp->getpage('User:NovemBot/userlist.js');

// Build the new page contents (modify the wikicode here as needed)
$page_contents = $wikicode;

// Edit page wikicode
$wp->edit(
	'User:NovemBot/userlist.js',
	$page_contents,
	'Update list of users who have permissions (NovemBot Task A)'
);

Bot passwords

  • Best practice is not to use your bot's actual username and password when logging in.
  • Instead, create a bot password at Special:BotPasswords just for that bot or that task, and use that.
  • Benefits
    • If your credentials leak (e.g. you commit them to git accidentally, or you forget to set your password files to 0600 on Toolforge so that other users can't read them), your main account is not compromised.
    • You can give each bot password limited permissions. For example, if your bot is a template editor, you can have one bot task that can edit template-protected pages, and another bot task that can't. This helps limit damage if a password is compromised.
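One possible way to keep a bot password out of your code and out of other users' sight is to store it in a file that only the owner can read. The sketch below is an assumption-laden example, not a Toolforge requirement: the file name bot-credentials.txt, the login NovemBot@task-a, and the password text are all hypothetical. Special:BotPasswords gives you a login of the form AccountName@botPasswordName together with a generated password.

```shell
#!/bin/sh
scratch=$(mktemp -d)                      # stand-in for your tool's home directory
credfile="$scratch/bot-credentials.txt"   # hypothetical file name

# Create the file with owner-only permissions BEFORE writing the secret.
umask 077
: > "$credfile"
chmod 600 "$credfile"

# Hypothetical values: line 1 is the bot password login, line 2 the password.
printf '%s\n%s\n' 'NovemBot@task-a' 'generatedPasswordGoesHere' > "$credfile"

# Read the two lines back, as a wrapper script might do before starting the bot.
{ read -r bot_user; read -r bot_pass; } < "$credfile"
perms=$(ls -l "$credfile" | cut -c1-10)
echo "$perms $bot_user"
```

Creating the empty file and restricting it first means the secret is never on disk with looser permissions, even briefly.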

Web access

  • By default, only the command line will work.
  • If you want web access, you will need to specifically turn it on. This can be useful for testing your bot, and for creating bots that are summoned via a web form.
  • Make a folder in your project called public_html. Anything inside this folder will be available from the web.
  • In bash:
    • webservice start
    • As of December 2021, this creates a container with PHP 7.3 and lighttpd
  • Domain: https://novem-bot.toolforge.org/
  • If needed, don't forget to turn it off. Although I just leave mine running.
    • webservice stop
  • Other webservice commands
    • webservice status
    • webservice --backend=kubernetes TYPE_OF_YOUR_TOOL start
  • TYPE_OF_YOUR_TOOLs that can be plugged in above
    • golang111 - Go
    • jdk11 - Java
    • node10 - Node.js
    • php7.4
    • python
    • python2
    • python3.5
    • python3.7
    • python3.9
    • uwsgi-python
    • uwsgi-plain
    • Mono/.NET is unsupported on Kubernetes, but supported on Grid
  • --backends
    • kubernetes
    • gridengine
  • If you have anything sensitive on the server that you don't want random people to be able to run, make sure to password protect it or similar.
    • <?php if ( ( $_GET['password'] ?? '' ) !== 'myPassword' ) die(); /* rest of code goes here */
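If you protect a page with a password in the URL as above, a long random value is harder to guess than a word like the myPassword placeholder. A minimal sketch for generating one in the shell, assuming /dev/urandom, head, od, and tr are available (they are on typical Linux systems):

```shell
#!/bin/sh
# Read 32 random bytes and print them as 64 lowercase hex characters.
token=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "$token"
```

The printed value would replace the placeholder in the PHP check, and it needs the same protection as any other password.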

Running at regular intervals (cronjob, kubernetes, grid)

  • A cron job is setting up a server to execute a file at a regular interval.
  • This is exactly what we need for most kinds of bots. Most bots will run a job, finish the job, exit, then need something to start them up again at the appropriate time.
  • There are two ways to do cron jobs on Toolforge:
    • Grid - older, easier, will eventually be deprecated
    • Kubernetes - newer, harder, the one I happened to learn so it will be the one I talk about here
  • create a task-a.sh file (.sh files just run shell commands, .sh stands for shell) with contents:
    • php /data/project/novem-bot/public_html/novembot-task-a.php 'CLI arguments (such as password) go here, if needed by your program'
  • upload the file to your root directory using FTP (/data/project/novem-bot)
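  • set the file's permissions to 0700. The 7 gives the owner read, write, and execute, which Kubernetes needs in order to run the script. The 00 removes all access for everyone else, so other users can't read the file if it contains a password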
  • create and upload cronjobs.yaml (see below for example file)
    • this is where you pick the type of docker container.
      • PHP, no web server: docker-registry.tools.wmflabs.org/toolforge-php73-sssd-base:latest
      • PHP, web server: docker-registry.tools.wmflabs.org/toolforge-php73-sssd-web:latest
    • tricky to get the schedule field right. use this tool: https://crontab.guru/
  • get kubernetes running in bash
    • kubectl apply -f /data/project/novem-bot/cronjobs.yaml --validate=true
  • check status website to confirm cronjob is running correctly

Example cronjobs.yaml

apiVersion: batch/v1
kind: CronJob
metadata:
  name: task-a
  labels:
    name: novem-bot.task-a
    # The toolforge=tool label will cause $HOME and other paths to be mounted from Toolforge
    toolforge: tool
spec:
  schedule: "01 13 * * *"  # daily at 13:01 UTC. server time is in UTC. https://crontab.guru/
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            toolforge: tool
        spec:
          containers:
          - name: bot
            workingDir: /data/project/novem-bot
            image: docker-registry.tools.wmflabs.org/toolforge-php73-sssd-base:latest
            args:
            - /bin/sh
            - -c
            - /data/project/novem-bot/task-a.sh
            env:
            - name: HOME
              value: /data/project/novem-bot
          restartPolicy: Never
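The schedule value in the example above is a standard five-field cron expression. As a reading aid, this small sketch splits the expression into its fields and labels them; the variable names are only for illustration.

```shell
#!/bin/sh
schedule="01 13 * * *"   # the value from the example cronjobs.yaml

set -f                   # disable filename expansion so "*" stays a literal star
set -- $schedule         # split on whitespace into $1 .. $5
set +f

fields="minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
echo "$fields"           # minute=01 hour=13 day-of-month=* month=* day-of-week=*
```

Read left to right, the example means "at minute 01 of hour 13, on every day of the month, in every month, on every day of the week", which is daily at 13:01 UTC.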

More Kubernetes commands

  • Add a cron job
    • kubectl apply --validate=true -f $HOME/cronjobs.yaml
  • Delete a cron job
    • kubectl delete cronjob task-a
  • List all cron jobs
    • kubectl get cronjobs
  • List the pods that your cron jobs have created
    • kubectl get pods
  • Get logs for a cron job
    • kubectl logs [pod name] - there will be multiple pods, insert the pod name with the newest age

Getting help

  • WP:DISCORD's #technical channel is a great resource.
  • Folks that speak Linux and Toolforge and have helped me out in the past include AntiCompositeNumber, Taavi, Chlod, and SD0001.
  • WP:IRC #wikimedia-cloud is the official support channel for Toolforge.
  • WP:VPT can probably help if you prefer onwiki help.