User:CBM/TemplateList

From Wikipedia, the free encyclopedia

How to make a list of the pages that used a given template at the time of a database dump.

  1. Download templatelinks.sql.gz and page.sql.gz
  2. zcat templatelinks.sql.gz | perl stage1.pl TemplateName TMPFILE
  3. zcat page.sql.gz | perl stage2.pl TMPFILE RESULTS
  4. rm TMPFILE

Results are in RESULTS, one page per line in the form namespace:title, where namespace is the numeric namespace id.
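Stage 1 compares TemplateName with the title stored in the dump using exact string equality, and MediaWiki stores titles without the namespace prefix and with underscores in place of spaces. The sketch below shows one way to convert a name as typed on the wiki into that form. The helper name `dump_title` is hypothetical and not part of the original scripts, and `ucfirst` is assumed to be enough for ASCII names only.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a template name as typed on the wiki into
# the form stored in the dump's title column.
sub dump_title {
    my ($name) = @_;
    $name =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
    $name =~ s/^Template://i;    # drop the namespace prefix, if present
    $name =~ s/^\s+//;           # trim whitespace left after the prefix
    $name =~ s/[ _]+/_/g;        # runs of spaces or underscores become one underscore
    return ucfirst $name;        # first letter is capitalized on Wikipedia
}

print dump_title('Template:infobox person'), "\n";    # prints Infobox_person
```

The result can then be passed as TemplateName in step 2.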

Source

stage1.pl
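#!/usr/bin/perl
# Stage 1: read an uncompressed templatelinks dump on STDIN and write the
# id of every page that links to the given template to the output file.
use strict;
use warnings;

die "usage: perl stage1.pl TemplateName TMPFILE\n" unless @ARGV == 2;
my ( $template, $outfile ) = @ARGV;    # compared exactly with the dump's title column
open my $out, '>', $outfile or die "Cannot write $outfile: $!";

my ( $line, $id, $x ) = ( 0, 0, 0 );
while (<STDIN>) {
    $line++;
    print STDERR ".";                  # one dot per input line as a progress marker
    if ( 0 == $line % 50 ) { print STDERR " pageid: $id hits: $x\n"; }

    # Each row is matched as (page id, namespace, 'title').
    while ( $_ =~ /\((\d+),(\d+),'(.+?)'\)/g ) {
        $id = $1;
        if ( ( $2 == 10 ) && ( $3 eq $template ) ) {    # namespace 10 is Template
            print $out "$id\n";
            $x++;
        }
    }
}
close $out or die "Cannot close $outfile: $!";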
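To see what the stage 1 pattern picks out, it can help to run it on a single line. The sketch below applies the same regular expression to a made-up INSERT line. The sample ids and titles are invented for illustration, the sub name `template_users` is hypothetical, and the three-column row layout is only what the script's pattern assumes. Newer dumps may lay out the table differently, so it is worth checking the CREATE TABLE statement at the top of the dump first.

```perl
use strict;
use warnings;

# Hypothetical sample of one INSERT line; the page ids and titles are made up.
my $sample = "INSERT INTO `templatelinks` VALUES "
           . "(12,10,'Infobox_person'),(12,10,'Cite_web'),"
           . "(34,10,'Infobox_person'),(56,4,'Infobox_person');";

# Return the ids of pages that link to $template in namespace 10 (Template).
sub template_users {
    my ( $line, $template ) = @_;
    my @ids;
    while ( $line =~ /\((\d+),(\d+),'(.+?)'\)/g ) {
        push @ids, $1 if $2 == 10 && $3 eq $template;
    }
    return @ids;
}

my @ids = template_users( $sample, 'Infobox_person' );
print join( ",", @ids ), "\n";    # prints 12,34
```

Page 56 is skipped because its link is in namespace 4 rather than 10, and the Cite_web row is skipped because the title does not match.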
stage2.pl
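#!/usr/bin/perl
# Stage 2: read the page ids written by stage 1, then read an uncompressed
# page dump on STDIN and write "namespace:title" for each matching page.
use strict;
use warnings;

die "usage: perl stage2.pl TMPFILE RESULTS\n" unless @ARGV == 2;
my ( $infile, $outfile ) = @ARGV;

# Load the page ids found by stage 1.
my %seen;
open my $in, '<', $infile or die "Cannot read $infile: $!";
while (<$in>) {
    chomp;
    $seen{$_} = 1;
}
close $in;

open my $out, '>', $outfile or die "Cannot write $outfile: $!";
my ( $line, $id, $x ) = ( 0, 0, 0 );
while (<STDIN>) {
    $line++;
    print STDERR ".";    # one dot per input line as a progress marker
    if ( 0 == $line % 50 ) { print STDERR " pageid: $id hits: $x\n"; }

    # Each row is matched as (page id, namespace, 'title', remaining columns).
    while ( $_ =~ /\((\d+),(\d+),'(.*?[^\\])',[^)]+?\)/g ) {
        $id = $1;
        my $ns   = $2;
        my $page = $3;
        if ( defined $seen{$id} ) {
            $page =~ s/\\'/'/g;    # undo SQL escaping of apostrophes
            print $out "$ns:$page\n";
            $x++;
        }
    }
}
close $out or die "Cannot close $outfile: $!";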
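The stage 2 pattern is a little harder to read because it has to cope with apostrophes that the dump escapes as \'. The sketch below runs the same regular expression on a made-up line from page.sql. The sample rows, the shortened list of trailing columns, and the sub name `titles_for` are all hypothetical and only serve to show the matching and unescaping.

```perl
use strict;
use warnings;

# Hypothetical sample of one INSERT line from page.sql; ids, titles and
# trailing columns are made up. q{} keeps the backslash in \' literal.
my $sample = q{INSERT INTO `page` VALUES (12,0,'Alice\'s_page','',0,0),(34,1,'Bob','',0,0),(56,10,'Infobox_person','',0,1);};

# Page ids as stage 1 would have written them to TMPFILE.
my %seen = map { $_ => 1 } ( 12, 56 );

# Return "namespace:title" for every row whose page id is in %$seen.
sub titles_for {
    my ( $line, $seen ) = @_;
    my @out;
    while ( $line =~ /\((\d+),(\d+),'(.*?[^\\])',[^)]+?\)/g ) {
        my ( $id, $ns, $page ) = ( $1, $2, $3 );
        next unless $seen->{$id};
        $page =~ s/\\'/'/g;    # undo SQL escaping of apostrophes
        push @out, "$ns:$page";
    }
    return @out;
}

print "$_\n" for titles_for( $sample, \%seen );
# prints:
#   0:Alice's_page
#   10:Infobox_person
```

The `[^\\]` before the closing quote is what stops the title from ending at an escaped apostrophe, so `Alice\'s_page` is captured whole and then unescaped.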