I have created this page to examine vandalism on Wikipedia. My first project is analyzing the ratio of IP-based vandalism to vandalism by signed-in users.
A simple test to determine the percentages of IP versus logged-in vandalism can be done with Google.
My method in this test was very straightforward and rests on one basic assumption: that use of the test4 template is a fair indicator of vandalism. A Google search for the first several words of the template, limited to Wikipedia, returns 95,700 results.
To identify only anonymous users, I added a second phrase: a portion of the message that appears at the bottom of all anon user talk pages. This search returned 83,500 results, indicating that the vast majority of test4 templates (87.2%) are used on anon user talk pages.
In many discussions, users cite a figure of about 90% for anon vandalism. My result of 87.2% is in line with this figure. However, there are several objections to the methods employed in my survey:
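The arithmetic behind this figure can be checked with a minimal sketch. The two counts are the Google result totals quoted above; the helper name `share` and the variable names are illustrative only, and the hit counts are search-engine estimates rather than exact tallies.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


test4_total = 95_700  # results for the test4 wording, limited to Wikipedia
test4_anon = 83_500   # same search plus the anon talk page footer phrase

anon_share = share(test4_anon, test4_total)
print(f"anon share of test4 warnings: {anon_share:.2f}%")
```

The computed share is about 87.25%, which the text quotes as 87.2%.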
- In many cases, logged-in users might receive more personalized vandalism responses.
- Not all vandals are warned using test4.
- Many sockpuppet vandals do not receive the full template cycle before being blocked.
- Test4 is only used for hardened vandals and therefore does not demonstrate the full extent of vandalism.
In light of these weaknesses, I estimate that my survey is accurate to within 5 percentage points, which puts the proportion of IP vandalism between 82.2% and 92.2%.
Percentage of IPs who are vandals
This figure is of course much harder to calculate, but a Google search for the same phrase used earlier to identify anon talk pages returns a total of 297,000 results. So an incredible 28.1% of anons with a talk page (83,500 of 297,000) have received the test4 template at some time! That suggests that a far greater percentage have probably engaged in at least some form of vandalism. Of course, the principal problem with this experiment is that, in my experience, anons are often contacted only to be warned (i.e., they don't have the same degree of talk interaction as most signed-in users). All in all, I find that 28.1% is little more than an interesting figure.
For comparison, Wikipedia has 1,542,477 registered users. Assuming that 80% of these (yes, I just kind of made that figure up) have talk pages, there would be about 1,234,000 registered user talk pages, 12,200 of which (95,700 minus 83,500) have received test4 warnings. Under the 80% assumption, then, slightly less than 1% of users with talk pages have been warned with test4.
Again, I do not feel that the methodology of this portion is as sound, but using the above figures, an anon is about 28 times more likely than a signed-in user to be a vandal.
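Because the 80% figure is a guess, it may help to see how much the final ratio depends on it. The sketch below is only an illustration using the counts quoted above; the function name `anon_to_registered_ratio` and its parameter are hypothetical, and the talk-page fraction is an assumption, not a measured value.

```python
def anon_to_registered_ratio(talk_page_fraction: float) -> float:
    """Ratio of the test4 rate on anon talk pages to the rate on registered ones."""
    if not 0 < talk_page_fraction <= 1:
        raise ValueError("talk_page_fraction must be in (0, 1]")

    registered_users = 1_542_477
    anon_talk_pages = 297_000
    test4_total = 95_700
    test4_anon = 83_500
    test4_registered = test4_total - test4_anon  # 12,200 warnings on registered users

    anon_rate = test4_anon / anon_talk_pages
    # Assumed number of registered talk pages: users times the guessed fraction.
    registered_rate = test4_registered / (registered_users * talk_page_fraction)
    return anon_rate / registered_rate


for fraction in (0.5, 0.8, 1.0):
    ratio = anon_to_registered_ratio(fraction)
    print(f"{fraction:.0%} with talk pages -> ratio {ratio:.1f}")
```

With the 80% guess the ratio comes out a little over 28; at 50% it drops to roughly 18 and at 100% it rises to roughly 36, so the headline number moves in direct proportion to the assumed fraction.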
A second test, using a section of the blatantvandal template, returns 52,600 total uses, with 39,200 on IP talk pages: a much lower rate of 74.5%. However, I feel that this test is less clear than the test4 experiment because:
- Signed-in users are more likely to be sockpuppets; potential sockpuppets are often treated with harsher templates such as bv.
- The sample size is much smaller.
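The two template tests can be put side by side with a short sketch. It uses only the counts quoted above; the dictionary layout and variable names are illustrative, and the comparison says nothing about which of the listed objections explains the gap.

```python
# (total uses, uses on anon talk pages), as quoted in the text
tests = {
    "test4": (95_700, 83_500),
    "blatantvandal": (52_600, 39_200),
}

# Percentage of each template's uses that fall on anon talk pages
anon_share = {name: 100.0 * anon / total for name, (total, anon) in tests.items()}

gap_points = anon_share["test4"] - anon_share["blatantvandal"]
sample_ratio = tests["blatantvandal"][0] / tests["test4"][0]

for name, pct in anon_share.items():
    print(f"{name}: {pct:.2f}% on anon talk pages")
print(f"gap: {gap_points:.1f} percentage points")
print(f"blatantvandal sample is {sample_ratio:.0%} of the test4 sample")
```

The gap between the two tests is about 12.7 percentage points, and the blatantvandal sample is a little over half the size of the test4 sample.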