Wikipedia talk:WikiProject Perl


Stats

Items categorised as outlines get a total of about 500,000 hits per month.

James Bond: Most likely someone added either a lot of portal links or talk page banners on that day.

Rich Farmbrough, 11:47, 7 October 2011 (UTC).

use LWP::Simple;
 
$month = "09";
$year = "2011";
$lang="en";
 
while (<>){
    s/ /_/g;	
    print "$_";
    $page=get ("http://stats.grok.se/$lang/$year$month/$_" );
    $page =~ /has been viewed (\d+) times in/;
    $total+=$1;
    print " $total\n ";    
}
 
print "\nTotal: $total";

You need a list of pages as well.

You also need Perl; I recommend Strawberry for Windows.

Rich Farmbrough, 17:48, 7 October 2011 (UTC).
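A slightly more defensive take on the matching step in the script above, as a sketch only. The sub name extract_views and the sample text are made up for illustration; the phrase being matched is the one the script already relies on. The point is that when a match fails, $1 still holds the value from the previous successful match, so the script would add the previous page's count a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the view count out of the HTML text returned by get().
# Returns undef when the phrase is not found, so a failed match cannot
# silently reuse the $1 left over from the previous page.
sub extract_views {
    my ($html) = @_;
    return undef unless defined $html;          # get() returns undef on failure
    return $html =~ /has been viewed (\d+) times in/ ? $1 : undef;
}

# Made-up sample text standing in for a fetched page.
my $sample = "<p>Outline_of_geography has been viewed 1234 times in 201109.</p>";

my $total = 0;
for my $html ($sample, "<p>no statistics here</p>", undef) {
    my $views = extract_views($html);
    $total += $views if defined $views;         # only count pages that matched
}
print "Total: $total\n";
```

In the real loop it would probably also help to chomp each input line before building the URL, since the line read by <> still carries its trailing newline.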

Fishing...

I love that little statistic you provided. It's a fish. I need more.

500,000 per month, that's 6 million per year! My guess was 5.

'Give someone a fish and you feed him for a day; teach the person to fish and you feed him for a lifetime.' (See Distributism).

Please teach me how to fish. How did you generate that stat?

I look forward to your reply. The Transhumanist 17:28, 7 October 2011 (UTC)

Text editor?

What (free) text editor do you recommend for editing perl scripts? The Transhumanist 21:47, 10 December 2011 (UTC)

I mainly use VIM, as recommended by Anomie (also vi, Notepad and the command line editor). I also have Perl IDE, but I haven't done much with it. The main problem with VIM is that it doesn't cope with Unicode. Rich Farmbrough, 23:25, 10 December 2011 (UTC).
I'll try 'em. Thank you. The Transhumanist 00:47, 11 December 2011 (UTC)

Perl text editor

Do you know of any (copyleft) text editors and/or word processors written in perl? I'd like to familiarize myself with how they work. The Transhumanist 21:49, 10 December 2011 (UTC)

No idea on this one. Rich Farmbrough, 23:25, 10 December 2011 (UTC).

These aren't Perl-specific, but try taking a look at Notepad++ here and Scintilla here. They may lead you to some helpful information. You can also check out SourceForge for some good stuff written in Perl. All three of these are free and open-source software related. --Kumioko (talk) 00:07, 11 December 2011 (UTC)

Thank you. I'll take a look. The Transhumanist 00:41, 11 December 2011 (UTC)

Re: a little challenge

Previously, you wrote:

In fact a little challenge:
  1. get the stats for the previous year for one page
  2. output the data in a format suitable for a wiki-page - using a by-month table and a year total on the right.
  3. do the same for a list of pages
  4. We could build this into a little bot.

I saw how to do #1 and #3 in your initial ("Stats") script. How do you do #2? The Transhumanist 22:01, 11 December 2011 (UTC)

OK so by "Previous year" I meant Dec 2010, Jan 2011, Feb 2011....
To output the data in wiki format you just need to use the print command. Perl is generally very forgiving about print:
print "{|\n!December\n|-\n...";

(Note: both types of quotes work, but they are subtly different. Double quotes interpolate variables and escapes such as \n; single quotes print them literally.)

print "\|$number";
You might need to use a for loop.
Rich Farmbrough, 22:11, 11 December 2011 (UTC).
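For step 2 of the challenge, a rough sketch of what the print statements and the for loop might add up to. The sub name wiki_table, the column headings and the figures are all made up; only the wikitable markup itself ({|, !, |-, ||, |}) is standard. It builds one row per page, one column per month, and puts the year total on the right.

```perl
use strict;
use warnings;

# Hypothetical helper: one table row per page, one column per month,
# with the year total in the right-hand column.
sub wiki_table {
    my ($months, $rows) = @_;     # array ref of month names, hash ref: page => array ref of counts
    my $out = "{| class=\"wikitable\"\n";
    $out .= "!Page!!" . join("!!", @$months) . "!!Total\n";
    for my $page (sort keys %$rows) {
        my $total = 0;
        $total += $_ for @{ $rows->{$page} };
        $out .= "|-\n|$page||" . join("||", @{ $rows->{$page} }) . "||$total\n";
    }
    return $out . "|}\n";
}

# Made-up figures, three months only to keep the example short.
my $table = wiki_table(
    [ "Dec 2010", "Jan 2011", "Feb 2011" ],
    { "Outline of geography" => [ 100, 200, 300 ] },
);
print $table;
```

Building the table as a string first (rather than printing piece by piece) means the same text can later be printed to the screen or to a file.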

But how do you put the data in a file ("print" just displays it on the screen, right?), and then how do you place it in a page on Wikipedia? The Transhumanist 03:21, 16 December 2011 (UTC)

smile. One thing at a time. If you open a file for output then you can print to it.
open MYPAGE, ">mypage,txt";
print MYPAGE "Some words and a newline.\n";
close MYPAGE;
Rich Farmbrough, 18:06, 17 December 2011 (UTC).
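The same idea written with the three-argument form of open and a lexical filehandle, as a sketch. The scratch directory from File::Temp is only there so the example does not touch real files; the file name is the one from this thread, written with a dot. Reading the line back is just a check that the write worked.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write to a scratch directory so the example does not touch real files.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/mypage.txt";

# ">" opens for writing (and truncates); "or die" reports why open failed.
open(my $out, ">", $file) or die "Can't write $file: $!";
print $out "Some words and a newline.\n";
close($out) or die "Can't close $file: $!";

# "<" opens for reading; read the line back to check it arrived.
open(my $in, "<", $file) or die "Can't read $file: $!";
my $line = <$in>;
close($in);
print $line;
```

Using ">>" instead of ">" would append to the file rather than overwrite it.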

Nice. By the way, was that supposed to be "mypage.txt" (mypage dot txt)?

Thank you for the tip. I'm now reading the Input and Output chapter of the Llama book.

And I found the documentation on get () (which you used in the initial script).

Okay, here's my next question...

Now that you have content in a file, how do you place that content on a Wikipedia page? The Transhumanist 23:53, 17 December 2011 (UTC)


Re:Stats

Okay, I've been studying Perl, and today I finally took a crack at the script you sent me:

use LWP::Simple;
 
$month = "09";
$year = "2011";
$lang="en";
 
while (<>){
    s/ /_/g;
    print "$_";
    $page=get ("http://stats.grok.se/$lang/$year$month/$_" );
    $page =~ /has been viewed (\d+) times in/;
    $total+=$1;
    print " $total\n ";
}
 
print "\nTotal: $total";

It's a command with the syntax perl script list

You use LWP, because that's the module where "get ()" is.

The "$" lines set literal variables to the values provided.

while is a looping command, and in this case works on the default variable $_. The default here appears to be each successive entry in the list specified.

The angle brackets <> turn the script into a command that is executable from the command prompt in the same way that a Unix command is.

In the loop, you substitute all spaces for underscores, to make the entries work in URLs.

Then you print the current entry to the screen, but print; would have done the same thing.

You follow that with pulling in the output from toolserver. For example http://stats.grok.se/en/201109/Outline_of_geography. In the same operation, you assign the output to the variable $page.

Then you employ the bind operator to specify a pattern (regular expression) match from toolserver's output (taking the match from the content of the $page variable), for the purpose of using the automatic match variable $1. The \d matches digits and the + means one or more of them in a row.

Then you assign the matched string to $total using a cumulative numeric assignment operator. Because it's a numeric operator, Perl automatically strips out the non-numerical stuff from the string (well, not quite, the stuff on the left of the numbers is set to zero, while the stuff on the right is dropped).

Basically, you've scraped the monthly page views from toolserver's output.

Then you print that value to the screen and advance to a new line.

And the loop repeats on the next item in the list.

When the loop is done, you repeat the final total at the end.

I'm ready for my next one. Please send me another simple but useful Wikipedia-related script. The Transhumanist 01:48, 9 December 2011 (UTC)

P.S.: Thank you for the Strawberry recommendation. It works fine.

P.P.S.: is there a collection of perl scripts on Wikipedia somewhere?

Good work. The angle brackets actually take next line of input. If you ran this without the list file, the script would take input from the command line, one item at a time. The input from the angle brackets is automatically assigned to $_. (As you can see, perl does a lot of stuff automatically for us.) I'll ferret around for something tomorrow, and see what I can find.
I'm not sure if there's much simple perl floating around, perhaps we should start a library. But there are quite a few bots, Anomie's code is rather beautiful, if a little obscure. Rich Farmbrough, 02:32, 9 December 2011 (UTC).
In fact a little challenge:
  1. get the stats for the previous year for one page
  2. output the data in a format suitable for a wiki-page - using a by-month table and a year total on the right.
  3. do the same for a list of pages
We could build this into a little bot.
Rich Farmbrough, 02:36, 9 December 2011 (UTC).
Yes, I'm interested.
In a similar vein, a script or bot that I have great need for is one that builds a chart (similar to this) of subjects, with columns showing comparatively the monthly traffic for outline, portal, and category corresponding to each subject listed. It could take input from a list similar to the script you sent me.
Is that something you'd be interested in helping to create? The Transhumanist 03:50, 9 December 2011 (UTC)
Of course, that is where we started, wasn't it? Rich Farmbrough, 11:12, 9 December 2011 (UTC).
What's the plan? To pass code back and forth, or wiki-develop it on a project page? The Transhumanist 00:45, 11 December 2011 (UTC)

Outline of Perl

Here's a new outline.

You could help us Perl newbies by adding anything you think would be helpful. The Transhumanist 19:21, 7 January 2012 (UTC)

perl table construction script

I don't have a clue where to start.

I'd like the table to list subjects down the left, with columns for traffic on the right. One traffic column for the corresponding outline, category, and portal for comparison purposes.

And totals at the bottom of each column.

If you whip something up, I'm sure I could help refine it.

I look forward to any perl code you can throw at me. The Transhumanist 02:29, 3 January 2012 (UTC)

P.S.: Happy New Year!

OK so here's the (untested) basics in pseudo-perl. (There are two approaches: storing everything and then making the table, or making the table line by line. Both have advantages; the latter is simpler.)
Let us suppose we have a config file with the subject, outline, cats and portals listed thus:

Stamford,Outline of Stamford, Category:Stamford, Wikipedia:WikiProject Stamford

(We could just have the word "Stamford" - if we could be sure that all three entities follow the naming convention.)
print_headers...
 
while (<>){
    chomp;
    if (/^([^,]*),([^,]*),([^,]*),([^,]*)$/){ # Note: this could also be done with the split function, in a different way
       $name=$1;
       $outline=$2;
       $cat=$3;
       $project=$4;
   }
   else{
       print "$_ does not match pattern; skipping.\n";
       next;   # skip the rest of the loop body for this line
   }
   $outline_count=count($outline);
   $cat_count=count($cat);
   $project_count=count($project);
   print "\|$name\|\|$outline_count\|\|$cat_count\|\|$project_count\n\|-\n"; # make a line of the table....
   # keep track of the totals....
   $outline_total+=$outline_count;
   ...
}
 
print_footers....
 
sub count{
    # in some circumstances there would be error checking code here - what if the page doesn't exist,or the server is down?
    $url=shift;
    get the page...
    $count= find the number..
    return $count
}
Rich Farmbrough, 11:01, 3 January 2012 (UTC).
I'll see if I can figure out how it works. Thank you! The Transhumanist 21:24, 4 January 2012 (UTC)
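A runnable sketch of the line-by-line approach, using split as hinted in the comment above. Everything named here (build_table, the column headings, the stand-in counter) is made up for illustration. The counting routine is passed in as a code reference, so the table logic can be tried without a network connection; in a real script it would fetch and parse the statistics page.

```perl
use strict;
use warnings;

# Hypothetical helper: turn config lines into wikitable rows plus a totals row.
# $count is a code ref that returns the traffic for a page title.
sub build_table {
    my ($count, @lines) = @_;
    my @total = (0, 0, 0);
    my $out = "{| class=\"wikitable\"\n!Subject!!Outline!!Category!!Project\n";
    for my $line (@lines) {
        chomp $line;
        my @field = split /\s*,\s*/, $line;      # also trims the spaces after commas
        if (@field != 4) {
            warn "$line does not match pattern; skipping.\n";
            next;
        }
        my ($name, @pages) = @field;
        my @counts = map { $count->($_) } @pages;
        $total[$_] += $counts[$_] for 0 .. 2;    # keep track of the column totals
        $out .= "|-\n|$name||" . join("||", @counts) . "\n";
    }
    return $out . "|-\n!Total!!" . join("!!", @total) . "\n|}\n";
}

# Stand-in counter: the length of the title, purely so the output is predictable.
my $table = build_table(
    sub { length $_[0] },
    "Stamford,Outline of Stamford, Category:Stamford, Wikipedia:WikiProject Stamford\n",
    "a malformed line\n",
);
print $table;
```

The malformed line is reported on standard error and left out of the table, which keeps the wiki output clean.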

A couple questions...

In the annotation Perl script you wrote...

What does $page do?

I tried opening a file into $page, and it didn't work:

open $page, "Outline.txt" or die $!;
while ($page =~ /\n\*\s*\[\[([^\])]*\]\]\s*\*/s ){
   $bulleted = $1;
   $entry =   get ($bulleted);
   $entry =~ s/.*?'''.*?'''//;
   $entry =~ s/([^\.]*.[^\.]*.).*/$1/;
   $page =~ s/(\n\*\s*\[\[$bulleted\]\]\s*)\*/$1 $entry/;
}


I used the following script to test the behavior of $page:

open $page, "Outline.txt" or die $!;
while ($page){
   chomp;
   print "$_\n"
}

It just produced blank space.

I tried the above script without the "open" line, providing "Outline.txt" as a command line argument, and it still didn't work.

I use regex all the time, but file handling in perl has me stumped.

The Transhumanist 01:25, 19 March 2012 (UTC)

Yes, that's not how I would open a file - if I wrote that it was very strange.
open FILE, "Outline.txt" or die $!;
will open the file.
Then you need to read from it. Something like:
@array=<FILE>;
$page=join "", @array;   # each line already ends with its own newline
close FILE;

perlmonks

Then you are good to go. Rich Farmbrough, 02:10, 19 March 2012 (UTC).
Or just
while (<FILE>) {$page.=$_}
TMTOWTDI. Rich Farmbrough, 02:17, 19 March 2012 (UTC).


Is it normal for beginner Perl students' heads to spin? (Mine is spinning).  :)

You provided the following script fragment in a previous thread:

while ($page =~ /\n\*\s*\[\[([^\])]*\]\]\s*\*/s ){
   $bulleted = $1;
   $entry =   get ($bulleted);
   $entry =~ s/.*?'''.*?'''//;
   $entry =~ s/([^\.]*.[^\.]*.).*/$1/;
   $page =~ s/(\n\*\s*\[\[$bulleted\]\]\s*)\*/$1 $entry/;
}

There is definitely something missing, because the script does not work when run, even when I replace the guts with

while ($page){
   chomp;
   print "$_\n";
}

I don't understand "$page". It's not defined in the script, and I don't know how to define it from the examples you just provided.

It doesn't appear that this fragment can be dropped into the new script you provided above.

What is missing? The Transhumanist 03:34, 19 March 2012 (UTC)

Yes $page is the contents of the page. How you get the contents of the page is another matter. Remember data types in perl are somewhat flexible - $page does not need to be defined unless you
 use strict;
which you probably should. So in the example I gave (gluing it together)
open FILE, "Outline.txt" or die $!;
while (<FILE>) {$page.=$_}
close FILE;
 
while ($page =~ /\n\*\s*\[\[([^\])]*\]\]\s*\*/s ){
   $bulleted = $1;
   $entry =   get ($bulleted);
   $entry =~ s/.*?'''.*?'''//;
   $entry =~ s/([^\.]*.[^\.]*.).*/$1/;
   $page =~ s/(\n\*\s*\[\[$bulleted\]\]\s*)\*/$1 $entry/;
}
 
# Now do something with the text we have created.
open FILE, ">Annotated.txt" or die $!;   # ">" opens the file for writing
print FILE $page;
close FILE;

Here the text was loaded from a file. There would need to be a subroutine to get the Wikipage $bulleted.

Rich Farmbrough, 03:34, 19 March 2012 (UTC).


I swapped out the guts to test the file handling portion...

open FILE, "Outline.txt" or die $!;
while (<FILE>) {$page.=$_}
close FILE;
 
while ($page){
   chomp;
   print "$_\n";
}

...and it didn't work.

What did I do wrong? The Transhumanist 03:54, 19 March 2012 (UTC)

P.S.: is there supposed to be a "." after "$page"?   -TT

Yes ".=" appends so it's the same as "$page = $page . $_;"
The critical difference is that the while loop in my code has a match in it. While that match is true it loops.
Assuming there is some text in your "Outline.txt" file, your code will stay in the while loop forever, printing whatever is in the $_ variable. If $page evaluates to false (which probably means an empty file), then it will just finish.
open FILE, "Outline.txt" or die $!;
while (<FILE>) {$page.=$_}
close FILE;
 
print $page;

would be all that was needed. Rich Farmbrough, 04:04, 19 March 2012 (UTC).


Pagestats doesn't appear to work anymore

It looks like they changed the output at http://stats.grok.se...

I tried running this script again (last time was in September), to get a new total for outline traffic, and it doesn't seem to work right. Just returns zeros now.

use LWP::Simple;
 
$month = "02";
$year = "2012";
$lang="en";
 
while (<>){
    s/ /_/g;
    print "$_";
    $page=get ("http://stats.grok.se/$lang/$year$month/$_" );
    $page =~ /has been viewed (\d+) times in/;
    $total+=$1;
    print " $total\n ";
}
 
print "\nTotal: $total";


On the command line I specified a file that is a list with bare unbracketed article names, one article name per line.

perl Pagestats Outlinelist.txt

Does the script work for you?

I look forward to your reply. The Transhumanist 00:53, 23 March 2012 (UTC)


I found the problem. Solved by removing " times in" from the script.

The new total is 586,206. The Transhumanist 01:33, 23 March 2012 (UTC)

Excellent! Rich Farmbrough, 01:34, 23 March 2012 (UTC).

Viewing outlines with or without annotations

The next improvement I'd like to tackle is to provide some way to toggle an outline's annotations off/on (all at the same time) while viewing the outline in Wikipedia!!!

For example...

The user is browsing Wikipedia and has just arrived at an outline page. It's fully annotated, but he wants to look at the page uncluttered by the annotations.

How could we make it so that all he has to do is press a hot key to make (all of) the annotations disappear?

And then reappear by pressing a different hot key.

What are the possible approaches to implementing this?

Sincerely, The Transhumanist 23:54, 15 March 2012 (UTC)

Hmm well, the two that spring to mind are using the collapse functionality - navboxes can be set to collapse if there is more than one of them, so presumably this can be brought under control using the same technology (I assume CSS) - or JavaScript. The JavaScript code would need to be installed as default, whereas CSS can be somewhat standalone, I think, although the preference is for having it all centrally stored. Rich Farmbrough, 02:30, 16 March 2012 (UTC).
It sounds like a javascript might be the best approach. Though it would be nice to have the functionality built-in on the browser level (via add-on). Do you know any add-on programmers? The Transhumanist 23:14, 16 March 2012 (UTC)
None that I know of. And I have to disagree: add-ins are great, but not for something you want to be standard WP functionality. However the two tasks become very similar if you use Scriptish. Rich Farmbrough, 12:18, 23 March 2012 (UTC).

BTW, I'm stuck on the thread preceding this one (extract/insert annotations). I posted a bunch of new questions up there for you (I mention them here just in case you missed them). The Transhumanist 23:14, 16 March 2012 (UTC)


Data extraction & insertion

Let's say I have the wikicode file "Outline of Stamford" saved on my computer, and I want a program that goes through the outline, finds the first bulleted entry lacking an annotation, pulls the article from Wikipedia for the subject in the entry, extracts the first two sentences of the lead paragraph, then inserts those two sentences as the annotation for that entry, then repeats for the next missing entry, until the all the entries have annotations.

This would be very helpful, as it would save tons of manual cutting and pasting.

How would you go about doing that with perl?

The Transhumanist 22:05, 4 January 2012 (UTC)


I'm not sure what "un-annotated" means but at a guess you could use something like:
while ($page =~ /\n\*\s*\[\[([^\])]*\]\]\s*\*/s ){
   $bulleted = $1;
   $entry =   get ($bulleted);
   $entry =~ s/.*?'''.*?'''//;
   $entry =~ s/([^\.]*.[^\.]*.).*/$1/;
   $page =~ s/(\n\*\s*\[\[$bulleted\]\]\s*)\*/$1 $entry/;
}

Here the handwaving is in the assumption that the Wikipedia articles are well-formed, and not exceptional. Rich Farmbrough, 22:27, 4 January 2012 (UTC).

You would need to get the source of the article. You need a module for that, which comes with examples. MediaWiki::API I think is the name. Rich Farmbrough, 23:33, 4 January 2012 (UTC).

Entries in outlines look like this:

  • Architecture – art and science of designing buildings.
  • Crafts – activities and hobbies that are related to making things with one's hands and skill.
  • Drawing – visual art that makes use of any number of drawing instruments to mark a two-dimensional medium. As a verb, it is the act of making marks on a surface so as to create an image, form or shape. As a noun, it is the image produced, or the visual art form itself.
  • Film – also called a movie or motion picture, is a series of still or moving images. It is produced by recording photographic images with cameras, or by creating images using animation techniques or visual effects. The process of filmmaking has developed into an art form and industry.
  • Painting – the practice of applying paint, pigment, color or other medium[1] to a surface (support base) with a brush or other objects. The term describes both the act and the result of the action.
  • Photography
  • Sculpture

Concerning list entries, an annotation is a dashed comment.

The entries "Photography" and "Sculpture" above lack annotations. Would the program you wrote above home in on those and add an annotation for each?   The Transhumanist 03:41, 5 January 2012 (UTC)

It would pick up the first, fail on the second for two reasons: it would count the endash as an annotation, and there's no following list item. Rich Farmbrough, 11:13, 5 January 2012 (UTC).
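One possible way around both of those problems, as a sketch. The sample text is made up and assumes the entries look like "* [[Title]]" in the wikitext, with any annotation following on the same line. Anchoring the pattern to whole lines means an entry needs no following list item, and requiring nothing but spaces after the closing brackets means a dash is not mistaken for part of the entry.

```perl
use strict;
use warnings;

# Made-up sample in wikitext form; real outlines may differ.
my $page = <<'END';
* [[Architecture]] – art and science of designing buildings.
* [[Photography]]
* [[Sculpture]]
END

# /m lets ^ and $ match at every line, so the last entry needs no following
# list item; allowing only spaces or tabs after ]] excludes annotated entries.
my @missing = $page =~ /^\*\s*\[\[([^\]|]+)\]\][ \t]*$/mg;

print "$_\n" for @missing;
```

Piped links such as [[Title|label]] are deliberately not matched here; whether they should be is a separate decision.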

I'm stuck!

(I had to return the programming books to the library).

I don't know what to do to be able to use the while loop you provided above on an outline.

That is, how do you make it read the outline file into the $page variable?

Also, what did you mean by "handwaving"?

Once the annotations are inserted, how do I save the outline back to disk?

When this script becomes fully operational, I expect it will do more than 50% of the work on outlines. Because inserting annotations by hand is tedious as hell, and all of the outlines have entries that need annotations. We're talking tens of thousands of annotation insertions. I can't stress how helpful this tool will be.

How fast do you think it could insert 100 annotations?

I look forward to your reply. The Transhumanist 23:49, 16 March 2012 (UTC)

  • The while loop will run while something is in the $page that consists of a newline followed by a bulleted link with nothing after it on the line.
  • So this is done .. wait didn't we do this? Depending on where the file is, by reading it from disk as we discussed, or by loading it from Wikipedia.
  • "Handwaving" means the bit of the argument that is glossed over. Often it is a good idea to simply not worry about some problems until they can be actually met with (like developing the internal combustion engine, without worrying too much about people getting lost in strange towns), but sometimes this can be disastrous (like setting out across the desert without planning your water consumption).
  • Once the text is completed you can save with
open FILE, ">:", "somefilename.txt" or die;
print FILE $page;
close FILE;

Rich Farmbrough, 00:39, 27 March 2012 (UTC).


Think of it as a change in routine, and a change of pace...

The block may be a blessing in disguise.

This may give editors who have had a hard time keeping your attention the opportunity to converse with you on a more meaningful level (i.e., not rushed).

Why would we want to?

Because you are an expert on many aspects of Wikipedia.

This vacation gives you valuable time to share your expertise and experience with other Wikipedians.

Personally, I have many questions for you... – The Transhumanist 03:55, 1 April 2012 (UTC)

Hey

I just took a look at your user page, to see if it provides any info on the types of questions you would be able to answer, and I noticed you're from London. Half my family tree lives around there.

I haven't been to London since 1997. Almost got killed jaywalking 3 times, due to looking the wrong way before crossing. I guess it's not "jaywalking" over there, because it's legal — for you it's just crossing the street. I think it's cool that you have the right to cross the street. Here we are subject to getting ticketed by the police if we cross anywhere other than at an intersection.

By the way, that you drive on the other side of the street over there makes it easy to spot foreigners. I noticed many of them looking the wrong way.

I also learned that clotted cream tea is not tea with clotted cream in it. :) The Transhumanist 04:46, 1 April 2012 (UTC)

Mm, worth going to Devon and Cornwall just for the cream teas. They are good elsewhere but that is the home of the cream tea. Rich Farmbrough, 16:15, 1 April 2012 (UTC).
Incidentally the Magna Carta had a provision for access to the highway I believe. Other rights, of course, have been eroded massively over the last 20-30 years. Notably extra-territoriality, retroactive legislation, double jeopardy, right to silence, the rights of the second chamber and just about anything that might be construed as "fundamental" has been thrown to the wolves of political opportunism. The few that have been saved have been as much as a result of the political opportunism of the opposition of the day as principled resistance by backbenchers. Of course historically it was ever thus, but the extremism of recent events, considering that we are not in the straitened circumstances of previous eras, is telling. Rich Farmbrough, 16:49, 1 April 2012 (UTC).

Wow, that's a lot of edits

Still number 1, I see. And closing in on the 1,000,000 edit mark. The Transhumanist 05:25, 1 April 2012 (UTC)

Yes, but blocked... one could hypothesise a link, between people who do very similar edits to me, and call for me to be blocked using Freudian analysis, but that would be unkind (if funny). Rich Farmbrough, 16:04, 1 April 2012 (UTC).

What is the most advanced operation you've used AWB for?

And how did you do it? The Transhumanist 05:38, 1 April 2012 (UTC)

Hm, well I did use it for checksum calculations and ISBN hyphenation. Basically I wrote a perl program to write a program to write the rulebase. The hyphenation was just a large number of rules, but the calculations involved implementing a partial arithmetic parser in regular expressions, including full addition and multiplication tables modulo 11 (and possibly 10 as well). Rich Farmbrough, 16:03, 1 April 2012 (UTC).
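For comparison, the modulo 11 calculation written directly in Perl rather than as regular-expression rules. This is a sketch of the standard ISBN-10 check character only, not a reconstruction of the AWB rulebase; the sub name isbn10_check is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the ISBN-10 check character from the first nine digits.
sub isbn10_check {
    my ($nine) = @_;
    my @d = split //, $nine;
    my $sum = 0;
    $sum += $d[$_] * (10 - $_) for 0 .. 8;     # weights 10, 9, ... 2
    my $check = (11 - $sum % 11) % 11;
    return $check == 10 ? "X" : $check;        # a value of ten is written as X
}

print isbn10_check("030640615"), "\n";
```

The sub expects exactly nine digits with hyphens already removed; it does no validation of its input.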

What do your bots consist of?

I.e., what are they made of (what languages, programs, etc.)? The Transhumanist 05:01, 1 April 2012 (UTC)

I use perl and AWB. For example the main bot runs on perl (because I was being blocked for using AWB), but if I have a one-off job it is often quicker to use AWB. Even there though I use perl to write some of the rules. Rich Farmbrough, 15:58, 1 April 2012 (UTC).

Can AWB be used to remove redlinks from a list?

(For example, see: Outline of Mozambique)

How?   The Transhumanist 05:44, 1 April 2012 (UTC)

Kinda.

Use the list maker to make a list of "Links on page (redlinks only)".

Save the list to a text file.

Replace the carriage returns in the text file with "|". Copy the content.

Create a normal rule that replaces \[\[(<paste the contents here>)\]\] with $1

Run it against the page in question.

Rich Farmbrough, 16:13, 1 April 2012 (UTC).
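The same recipe can be sketched in Perl, which may be handy when the list is long. The sub name unlink_titles and the sample text are made up. The one extra step compared with the hand-built rule is quotemeta, which escapes characters such as parentheses and dots in titles so they are not treated as regular-expression syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: unlink the given titles in a piece of wikitext.
sub unlink_titles {
    my ($text, @titles) = @_;
    return $text unless @titles;                       # an empty alternation would match [[]] only
    my $alt = join "|", map { quotemeta } @titles;     # escape ( ) . and friends in titles
    $text =~ s/\[\[($alt)\]\]/$1/g;
    return $text;
}

# Made-up example; the red-link list would come from the file saved from AWB.
my $text = "* [[Maputo]]\n* [[Foo (bar)]]\n";
print unlink_titles($text, "Foo (bar)");
```

Piped links ([[Title|label]]) are left alone by this pattern, as they are by the AWB rule above.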

Do you have a perl script...

...that opens a file, does something to it, and then saves it under a new filename?

I need to see how that is done. The Transhumanist 05:51, 1 April 2012 (UTC)

Er... so I think we covered this? Something like.
open FILE, "<:encoding(UTF-8)", "oldfile" or die $!;
while (<FILE>){ $text .= $_}
close FILE;

# do some stuff
$text =~ s/e/z/; #  replace the first e with z (add /g for every e) to even up letter usage across the universe a little

open FILE, ">:encoding(UTF-8)", "newfile" or die $!;
print FILE $text;
close FILE;

Rich Farmbrough, 16:08, 1 April 2012 (UTC).


I hate semicolons!

It took me over an hour to realize my script didn't work because a semicolon was missing from the end of a line. The Transhumanist 19:53, 2 April 2012 (UTC)

If I had a shilling for every time I'd done that I'd be Rich! Rich Farmbrough, 20:19, 2 April 2012 (UTC).
And that, you are. – The Transhumanist 20:07, 3 April 2012 (UTC)

How do I pull a file into a scalar?

I opened a file, and tried to define a variable to be the contents of the filehandle, like this:

open(LIST,      "list.txt") || die("can't open list.txt: $!");
 
$list = LIST;            #pull LIST into a scalar variable (doesn't work)
 
print "$list"            # to see if it worked, display contents of $list, which should be the file list.txt

But it just prints out the filehandle!

What am I doing wrong? The Transhumanist 19:53, 2 April 2012 (UTC)

<LIST>
this will pull the next record in scalar context. If you set the record separator appropriately you should get the whole file. (The special variables $/ and $\ are the input and output record separators.)

It is not clear from what you've said how to use the record separators. What should the line look like? Like this...?

$list = $/<LIST>;$\            #pull LIST into a scalar variable using record separators (doesn't work)

I'm trying to be able to use the following line of code to search a file for a string. If it's in there, I want the program to run a subroutine. If it's not in there I want the program to run a different subroutine.

$list =~ m/stringcheckingfor/   #look for string in contents of list.txt

I'm kinda stuck. The Transhumanist 21:27, 2 April 2012 (UTC)


OK, the angle brackets work, but it only prints out one line from the file. If what you meant was to put record separators in the file, then how do you search files without preprocessing every single file with the insertion of record separators? What if I want to search a file that's not a list and still be able to use the file for something else? The Transhumanist 22:22, 2 April 2012 (UTC)

When perl reads the file, it uses the newline (\n, with the OS-specific line ending translated for you) as the record separator - this is stored in the variable $/ . If you set $/ to the end of file marker (or some string you will not encounter) I would expect it would read the whole file. Other methods are using binary mode, or reading a line at a time:

while (<FILE>) {$text .= $_}

Rich Farmbrough, 22:54, 2 April 2012 (UTC).


I found something called "local" that seems to do the trick:

local $/;                # I don't know what this does, but it works.
 
open(LIST,      "list.txt") || die("can't open list.txt: $!");
$list = <LIST>;            #pull LIST into a scalar variable (works now that $/ is undefined)
close(LIST);
print "$list"            # to see if it worked, display contents of $list, which should be the file list.txt

Though I'm not exactly sure why this works. The Transhumanist 23:18, 2 April 2012 (UTC)

It works because it makes the value of "$/" undefined -- a value that can never occur in a file. The "local" keyword does this as a side effect of its main purpose (controlling variable scope); you can get the same effect by assigning "undef" to the global copy of "$/", as in "$/ = undef;". --Carnildo (talk) 01:44, 3 April 2012 (UTC)
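To show the scoping side of "local", here is a sketch that wraps the whole thing in a sub. The name slurp and the scratch file are made up; the search string is the placeholder used earlier in this thread. Because "local $/" sits inside the sub, the separator is back to its normal value as soon as the sub returns, so later line-by-line reads are unaffected.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole contents of a file as one string.
sub slurp {
    my ($name) = @_;
    open(my $fh, "<", $name) or die "can't open $name: $!";
    local $/;                 # undefined only until this sub returns
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Scratch file with made-up contents.
my ($tmp, $name) = tempfile(UNLINK => 1);
print $tmp "first line\nstringcheckingfor\n";
close($tmp);

my $list = slurp($name);
print "found\n" if $list =~ m/stringcheckingfor/;
```

The if at the end is where the two subroutines from the question would be called, one for a match and one for no match.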

Checking each item from one list against another list

I have two lists. list1.txt and list2.txt.

local $/;                                # I don't know what local does
open(LIST1,      'list1.txt') || die("Can't read file 'list1.txt': [$!]\n");
open(LIST2,      "list2.txt") || die("Can't read file 'list2.txt': [$!]\n");
 
$list2 = <LIST2>;          #pull LIST2 into a scalar variable
 
while (<LIST1>){       # start while loop on the first list (angle brackets take next line of input)
 
# Search $list2 using the current line of input from LIST1 as the search string (I don't know how to do this yet without making the script fail to compile).  I plan to write two subroutines, one for true and one for false.
}
 
close(LIST1);
close(LIST2);


I can't believe I'm still in the file IO. I haven't even gotten to the guts of the program yet. Frustrating!   The Transhumanist 00:28, 3 April 2012 (UTC)

ok, local is creating a scoped version of $/ that is undefined. I haven't tried this but I suppose it works, and rather nicely in a way, since if you were using this in a block the default value of $/ would come back when you leave the block.

Now the problem you have is that you will slurp file 1 the same way you slurped file 2. So you need something like

local $/;                                # I don't know what local does
open(LIST1,      'list1.txt') || die("Can't read file 'list1.txt': [$!]\n");
 
open(LIST2,      "list2.txt") || die("Can't read file 'list2.txt': [$!]\n");
$list2 = <LIST2>;          #pull LIST2 into a scalar variable
close(LIST2);#  close LIST 2 as early as we can
$/="\n"; # revert 
 
 
while (<LIST1>){       # start while loop on the first list (angle brackets take next line of input)
   chomp; # Maybe?
   if ($list2 =~ /$_/){
       tru_sub();
   }
   else {
       false_sub();
   }
# Search $list2 using the current line of input from LIST1 as the search string (I don't know how to do this yet without making the script fail to compile).  I plan to write two subroutines, one for true and one for false.
}
 
close(LIST1);

ATB. Rich Farmbrough, 00:55, 3 April 2012 (UTC).


With bare bones subroutines...

local $/;                                # I don't know what local does
open(LIST1,      'list1.txt') || die("Can't read file 'list1.txt': [$!]\n");
 
open(LIST2,      "list2.txt") || die("Can't read file 'list2.txt': [$!]\n");
$list2 = <LIST2>;          #pull LIST2 into a scalar variable
close(LIST2);#  close LIST 2 as early as we can
$/="\n"; # revert
 
while (<LIST1>){       # start while loop on the first list (angle brackets take next line of input)
   chomp; # Maybe? (seems to work OK)
   if ($list2 =~ /$_/){
       tru_sub();
   }
   else {
       false_sub();
   }
}
 
close(LIST1);
print "\n\n";
print "$list2";    # display contents of list2.txt (as a test)
 
sub tru_sub {
     print "$_";       # display it on the screen so you can see that it is working
     print "\n\n";
}
 
sub false_sub {
     print "This subroutine doesn't do anything yet (other than print this message)\n";
}


The print functions show that the program actually works.

Now I have the places to put the guts. Thank you!

By the way, what is this part of the program called, an IO skeleton?   The Transhumanist 06:01, 3 April 2012 (UTC)

Installing Wikipedia locally

Is Wikipedia downloadable?

Do you have it installed on your computer? The Transhumanist 01:02, 3 April 2012 (UTC)

You can download the content (see database dumps on my user page) and the software (www.mediawiki.org). I have both, but not the content loaded into the software. Rich Farmbrough, 01:21, 3 April 2012 (UTC).
Cool. What does loading the content into the software entail?
It's pretty easy. The process is described at http://www.mediawiki.org/wiki/Manual:ImportDump.php.
I'm thinking that testing programs on a local copy of Wikipedia could be useful. Having access off-line would also be nice (my Internet access is sporadic).
I'm curious as to what use the database dump is without having it loaded into MediaWiki. What do you use it for? The Transhumanist 06:14, 3 April 2012 (UTC)


If you just want offline access, check out Kiwix -- MarkAHershberger(talk) 13:35, 10 April 2012 (UTC)
It's XML so you can write perl to scan it very easily. Also AWB has facilities to scan it. It's useful for identifying problem articles, making reports, doing statistics and extracting data. Rich Farmbrough, 12:50, 3 April 2012 (UTC).
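A rough sketch of what scanning the dump with Perl could look like, assuming each page title sits on one line as <title>...</title> (worth checking against your copy of the dump). The helper name titles_matching and the dump file name in the usage comment are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the page titles that match $pattern,
# reading the dump line by line so the whole file is never held in memory.
sub titles_matching {
    my ($fh, $pattern) = @_;
    my @titles;
    while (my $line = <$fh>) {
        if ($line =~ m{<title>(.*?)</title>}) {
            my $title = $1;    # save the capture before the next match
            push @titles, $title if $title =~ $pattern;
        }
    }
    return @titles;
}

# Usage (the file name is an assumption; use whatever your dump is called):
# open(my $dump, '<', 'pages-articles.xml') or die("Can't read dump: [$!]\n");
# print "$_\n" for titles_matching($dump, qr/Perl/);
# close($dump);
```

This treats the dump as plain text, which is fine for quick reports. Titles containing characters such as & appear in escaped form (&amp;), so a proper XML parser is the safer route for anything more careful.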

Subroutine article needs a Perl section

The article subroutine has C examples, but no Perl examples. The Transhumanist 01:35, 10 April 2012 (UTC)


I can't get WP using Perl's get command...

I'm having trouble grabbing Wikipedia pages with Perl. I wrote the following script to test the get command, which works just fine for pulling the HTML source from other web pages, including Google, but it doesn't seem to want to work on Wikipedia. It also works on toolserver.

use LWP::Simple;
 
$page=get ("http://en.wikipedia.org/wiki/Template:Journalism");
 
print "\n\n";
print " $page\n ";

Curl works for getting Wikipedia pages, but the above script does not. What's going on here? The Transhumanist 06:12, 10 April 2012 (UTC)

I came here from a similar post at the technical village pump. I know absolutely nothing about Perl, but could it be because Wget is disabled in Wikipedia's robots.txt file? Graham87 06:57, 10 April 2012 (UTC)
I know almost nothing about Perl, but perhaps this is similar to the problem in Python explained at Wikipedia:Reference desk/Archives/Computing/2011 December 29#spider. It boils down to Wikipedia not accepting a request with an unusual User-Agent header.-gadfium 08:51, 10 April 2012 (UTC)
I think meta:User-Agent_policy will help. --Tokikake (talk) 09:52, 10 April 2012 (UTC)
Thank you. That document mentions LWP, the Perl module that get is in, and that the functions in LWP lack user agent headers. Now I need to find out how to add one of those. The Transhumanist 11:33, 10 April 2012 (UTC)
That's easy -- use LWP::UserAgent (http://search.cpan.org/~gaas/libwww-perl-6.04/lib/LWP/UserAgent.pm). However, you'll note that the meta:user-agent policy page also says "Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious." But setting the user agent string is easy. — Preceding unsigned comment added by 66.212.79.111 (talk) 22:34, 10 April 2012 (UTC)
Yep, LWP is rejected; you have to set something explicitly. My string was a made-up combination of many browser names. But I've been told it should include my email address. No thank you. Rich Farmbrough, 18:17, 26 May 2012 (UTC).
Actually it is worse than this: if you are using a default agent string from LWP, the software will not only reject the request but also claim there are server difficulties (i.e. lie to you).

User agents that send a User-Agent header that is blacklisted (for example, any User-Agent string that begins with "lwp", whether it is informative or not) may encounter a less helpful error message (lie) like this:

Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes.


Rich Farmbrough, 18:39, 26 May 2012 (UTC).
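For anyone landing here with the same problem, a minimal sketch of setting the header with LWP::UserAgent. The helper names make_agent and fetch_page and the agent string in the usage comment are placeholders, not anything prescribed by the policy page. The string should describe your own script and should not imitate a browser.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent that sends an explicit
# User-Agent header instead of the default libwww-perl one.
sub make_agent {
    my ($agent_string) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->agent($agent_string);
    $ua->timeout(30);    # seconds; an arbitrary choice
    return $ua;
}

# Hypothetical helper: fetch a page and fail loudly, so a rejected
# request is not mistaken for an empty page.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    die "Request failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Usage (placeholder agent string; the URL is the one from this thread):
# my $ua   = make_agent('MyTestScript/0.1');
# my $page = fetch_page($ua, 'http://en.wikipedia.org/wiki/Template:Journalism');
# print $page;
```

Unlike get from LWP::Simple, which only returns undef on failure, the response object carries a status line, which makes it easier to see why a request failed.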

What's missing in Perl coverage?

What is missing from Wikipedia's coverage of the Perl programming language?

(To see what we do have so far, most of the articles on Perl have been gathered to the Outline of Perl).   The Transhumanist 23:23, 16 April 2012 (UTC)

Make perl-related pages show up in searches for "perl programming"

I tried this in spring of this year, and after intervention of people i don't even know how to classify, gave up in disgust at both their behavior on a human level and the lack of venues to inquire properly about this matter.

However recently this issue came up again, so i'm willing to take a tentative stab at this.

As such, the needed action is as the subject states: On as many pages as possible an occurrence of the simple word "Perl" must become "Perl programming" or "Perl programming language". One such change per page is enough, since the goal is to increase discoverability via the search facility.

The shortest reason is:

Wikipedia is currently a factor that is killing Perl.

Slightly longer:

TIOBE is a company that curates a programming language contest. Their algorithm is built to find instances of "[lang] programming", meaning languages with non-unique names have an advantage in their algorithm, since they are commonly referred to as "[lang] programming language", while languages with unique names fare worse since their discussion remains hidden to the algorithm. It is common knowledge among experienced developers that their algorithm is highly flawed and worthless. This however is irrelevant to newcomers and managers who do not possess the technical knowledge or background knowledge and make decisions influenced by TIOBE and news articles referencing them.

Of the score assigned to languages Wikipedia makes up 13% and is the third-largest factor.

Now, with the why having been stated, i am willing to do the actual changes, given guidance on how to proceed exactly.

In return i'd be happy to improve articles where updates and review are requested, as well as improve code samples to not use Perl style from 1990 anymore.

Mithaldu (talk) 14:55, 8 August 2013 (UTC)

  • The question is "Do we go over all Perl-related pages and replace Perl with the Perl programming language?", as I see it. There are a number of problems that such a change creates, as I observed. First, when talking about Perl, it's usually called by its one-word name. Also, a link to "programming" or "programming language" typically appears on the page separately, and linking it at its first occurrence instead of force-putting the phrase to the first Perl occurrence seems logical. And such a change involves breaking some idioms such as "Perl best practices", "Perl 5", "Perl 6", "written in Perl", "Perl documentation", as opposed to "the Perl 5 programming language", "the Perl 6 programming language", "written in the Perl programming language", "the Perl programming language documentation". I would encourage that you skim a Perl book, blog post, mailing list archive, or similar, to examine word usage. Gryllida 08:29, 11 August 2013 (UTC)
I do see a number of questions here:
  • Is this a reason for action that Wikipedia is willing to accept as a valid one?
I'm still unclear on whether or not the general stance of WP on this is "That's your problem and not something to start making changes over." or "That sounds reasonable, let's work out how we do this best." Can you opine on that? Mithaldu (talk) 09:17, 11 August 2013 (UTC)
Result of IRC discussion between Gryllida and me: Gryllida does not currently have knowledge on this point. Input from others highly welcomed. Mithaldu (talk) 11:46, 11 August 2013 (UTC)
  • What are the possible ways we can do this?
The end goal here for me is not necessarily to change the article texts, but: To make perl-related pages show up in searches for "perl programming". Editing the article text simply is the only solution i am currently aware of. We did try other things before, like editing Template:Perl to include the term, yet Mastering Perl still does not show up in the search. Maybe moving that Template to Template:Perl Programming Language would do something to help. Maybe adding a category tag with the appropriate string would work? Are there maybe ways to add tags in some manner that show up to the search engine, but are not rendered to someone simply perusing the page? Mithaldu (talk) 09:17, 11 August 2013 (UTC)
Result of IRC discussion between Gryllida and me: Important note of detail: The search engine in question here is the Wikipedia search engine, not any external ones. As such a single change is all that is needed per page. Right now it looks like the search engine indexes only page sources, without resolving template contents. It is possible that the search engine indexes the names of included templates, as such i will be conducting an experiment on the Bonsai (software) page. Meanwhile Gryllida will look into the functionality of the search engine and maybe see if it's possible to get it to index template contents on pages as well. Mithaldu (talk) 11:58, 11 August 2013 (UTC)
  • If there is no other way of doing this, what would be the best way to make changes?
As a non-native speaker who takes pride in the level of English learnt i'm both quite keen on not disrupting the prose of articles changed, but also slightly disadvantaged since disruptive changes will be less obvious to me. I'd not engage in brute force changes. (And quite frankly would like to state that i did not do so previously. While some objections to my changes were valid, a lot of them were done simply to revert a batch operation and quite a few of the justifications brought by Incnis do not make logical sense and seem like rationalization after the fact.) Would a useful process for this be for me to notify you when i've done a batch of changes, so you can review them by looking at my change history and note objections here for discussion? Mithaldu (talk) 09:17, 11 August 2013 (UTC)
I think you misunderstood. The word is used without the "programming" bit in professional literature and having an encyclopedia sacrifice professional wording for the sake of search visibility is bad. This is what programming infoboxes are for, so the words "Perl" and "programming" are on the same page. Gryllida 10:57, 11 August 2013 (UTC)
Can we defer discussing that until the other two questions are addressed? :) Mithaldu (talk) 11:09, 11 August 2013 (UTC)
I believe there is an important point we both agree on, namely that this 'engine' (the special:search one) doesn't care how many times a phrase is mentioned on a page. This may simplify things a bit and leave purely technical questions, with the change not necessarily affecting content. Gryllida 11:53, 11 August 2013 (UTC)
Agreed and written about in more detail above.
Result of IRC discussion between Gryllida and me: Two points have to be made here:
For one, the current use of Perl without the postfix programming language, is not a conscious decision, but entirely a linguistic coincidence resulting from the uniqueness of the word Perl.
For the other, this activity here is in line with the Perl community itself. Multiple well-known people have agreed to aid in the larger effort, and multiple websites have already made relevant changes in headers or footers. As such a difference from other literature is not a very important factor and likely to change.
References: http://perlmonks.org/ http://blogs.perl.org/ http://www.modernperlbooks.com/mt/index.html http://www.perl.org/ http://blogs.perl.org/users/mithaldu/2013/08/do-your-piece-to-fix-tiobe-or-stop-talking-about-it.html
Mithaldu (talk) 12:03, 11 August 2013 (UTC)
  • Please consider contacting TIOBE and bringing the point to them, namely the point of programming languages names being used in their short form inside of the communities; this happens to other languages too, such as Python or Ruby. The algorithm looks incomplete and I would encourage fixing it. Gryllida 08:34, 11 August 2013 (UTC)
I have considered that, but feel there is little point in doing so, for these reasons: The failure modes of their algorithm are highly obvious and other languages (Haskell) have encountered this problem before, a long time ago. Yet TIOBE continues to go on with their algorithm mostly unchanged. Further, they've written uncharitably about Perl before, but also in a manner which shows both ignorance and prejudice. Lastly, they don't seem to be much interested in any scientific method, which is evidenced by the fact that they delete previous releases of their TIOBE index along with the prose by simply overwriting them when a new release is made. Thus i have invited other members of the Perl community to try and contact them, and personally focus on things that actually have a chance of effecting change. Mithaldu (talk) 08:47, 11 August 2013 (UTC)

Break

I'll try to sum up the current concerns the way I see them.

  1. Unlike the original proposal, content does not have to be rephrased. This is only a question of improving results for "Perl programming language" at Special:Search (for a TIOBE contest thing).
  2. Does the Special:Search engine only look at markup or does it expand templates?
  3. If so, would renaming-or-making-a-redirect with {{Perl}} to {{Perl programming language}}, and adding the latter to relevant pages, resolve the issue? Or can the engine be reprogrammed to look at content with expanded templates?
  4. If the former, is it accepted by Wikipedia policies?

Cheers, Gryllida 12:20, 11 August 2013 (UTC)

Good summary, i agree with it as stated. Mithaldu (talk) 12:23, 11 August 2013 (UTC)
I just concluded the two week experiment of editing the pages for Higher Order Perl and Bonsai. Neither of them showed up in search results after renaming the template tag. So other paths will need to be taken. Mithaldu (talk) 11:38, 27 August 2013 (UTC)

