
User:Sailsbystars/Proxy Checking

From Wikipedia, the free encyclopedia

Figuring out the proxy mechanism

  • Look up the WHOIS info
Is the range someone's home or work computer, or is it a webserver sitting in a rack? The easiest way to figure this out is to look up the WHOIS info on the IP. This information is linked from the bottom of each Special:Contributions page for an IP and in WP:OP headers (or use the whois command in Linux). The WHOIS record tells you who owns the IP. If unsure, do a Google search on the company listed in the WHOIS record. If the company's web page talks about DSL, cable, business internet, etc., it's an ISP. If it talks about hosting, rackspace, dedicated servers, or VPS (virtual private servers), it's a webhost. Sometimes it will be a university or corporate network instead, but this is relatively unusual.
  • Look up the DNS info
Skip this step if it's an ISP (although this can be an alternate method of determining hosting vs. ISP). The info is looked up the same way as above (the Linux command is 'host'). On a webhost it may be useless or it may be your "I win button." Frequently it will return nothing. What you ideally want from this is the name of the website hosted on the server, and sometimes this will come back. This will then take you to the proxy. However, while turning a host name into an IP generally works, turning an IP into a host name often doesn't, since an IP can have multiple host names pointing to it and a reverse lookup returns only the name, if any, that the IP's owner has configured.
  • Google the IP
If something is a proxy, a simple Google search will typically turn it up. Even the number of GHITS is very useful. If there are more than 1000 hits, there's a pretty good chance the IP is a proxy (although a small number of hits does not rule out a proxy). It's best to find the exact mechanism, of course. So here are the most useful Google results:
A list of proxies with your target IP in it
'Nuff said. This sort of result is actually pretty common. Hash.es in the proxy check header is one such site.
A list of domain names
If your target IP has a name that's something along the lines of "hideip" or "usaproxy", that is probably the proxy.
Duck evidence
If the IP is advertising a magic blue pill across wide swaths of the internet, there's a pretty good chance it's a proxy.
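
The WHOIS and DNS heuristics above can be sketched in a few lines of Python. This is only an illustration, not a tool used on-wiki: the keyword lists are taken from the terms mentioned above, and the function names (`classify_owner`, `reverse_dns`, `name_suggests_proxy`) are made up for this example. Treat the output as a hint about where to look next, never as proof.

```python
import socket
from typing import Optional

# Illustrative keyword lists based on the terms mentioned in the text above.
ISP_TERMS = ("dsl", "cable", "business internet")
WEBHOST_TERMS = ("hosting", "rackspace", "dedicated server", "vps",
                 "virtual private server")
# Host-name fragments like the "hideip" / "usaproxy" examples above.
PROXY_NAME_HINTS = ("hideip", "proxy")


def classify_owner(description: str) -> str:
    """Guess 'isp', 'webhost' or 'unknown' from text describing the IP's owner."""
    text = description.lower()
    isp_hits = sum(term in text for term in ISP_TERMS)
    host_hits = sum(term in text for term in WEBHOST_TERMS)
    if host_hits > isp_hits:
        return "webhost"
    if isp_hits > host_hits:
        return "isp"
    return "unknown"


def reverse_dns(ip: str) -> Optional[str]:
    """Return the reverse-DNS name for an IP, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return None  # frequently there is simply no name to return
    return hostname


def name_suggests_proxy(hostname: str) -> bool:
    """True if the host name contains one of the proxy-like fragments."""
    name = hostname.lower()
    return any(hint in name for hint in PROXY_NAME_HINTS)
```

For example, `classify_owner("Cheap VPS and dedicated servers")` returns `"webhost"`, while a description of a university network falls through to `"unknown"` and needs a human look.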

Verify the proxy

There are two main types of proxy:

HTTP/Transparent
This is a type of proxy which will show up as something like 192.168.0.1:8080. This type of proxy is accessed by changing your browser's connection or proxy settings. (Word to the wise: use a private browser session to check this; proxies are nasty stuff and will happily sniff any passwords and login cookies that go through them.) Then try to access a site which tells you your IP. Good choices: Wikipedia, whatismyip. If the IP now matches the suspected proxy, congratulations, you've just confirmed it!
Tunneling
Sometimes the IP that edited Wikipedia and the IP you use in the browser settings are different. E.g. you connect to 192.168.0.1:8080, but your IP comes back as 192.168.0.2. The latter is what needs to be blocked from editing Wikipedia.
Misconfigured websites/routers
Port 80 is the port your browser goes to by default every time you type in a URL. If the IP seems to have a router config page or seems to be a legit website when you type it into your address bar, check to see if you can use it as a transparent proxy on port 80. If it works, it means either it has been hacked or the sysadmin running it made a pretty big goof. Regardless, it should get blocked.
Web
This is a type of proxy that you will get to by typing in a URL. The key to finding this type of proxy is finding the correct URL to access it. The web page displayed will have a text box where you can enter a URL, and then will display the destination website in a subsection of the proxy website. Again, try one of the sites above and, for the sake of all that is holy and just, make sure that you're practicing safe browsing, and especially make sure you aren't logged in to Wikipedia with an admin account!
Shared webhosts
Sometimes one server will host many websites. If it's clearly a shared webhost, it may make sense to block even if you can't find the exact proxy mechanism.
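
The HTTP/transparent and tunneling checks described above can also be done from a script instead of a browser. The following is a rough sketch under stated assumptions: `ECHO_URL` is assumed to be a service that returns the caller's IP as plain text (api.ipify.org is used here as an example), and `exit_ip_via_proxy` and `interpret_check` are names invented for this illustration. A freshly built opener carries no cookies or logins, which is the scripted equivalent of using a private browser session.

```python
import ipaddress
import urllib.request
from typing import Optional, Tuple

# Assumption: this URL answers with the caller's IP address as plain text.
ECHO_URL = "http://api.ipify.org"


def exit_ip_via_proxy(proxy_host: str, proxy_port: int,
                      echo_url: str = ECHO_URL, timeout: float = 10.0) -> str:
    """Fetch the echo page through the suspected proxy and return the body."""
    proxy = f"http://{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)  # no cookies, no saved logins
    with opener.open(echo_url, timeout=timeout) as response:
        return response.read().decode("ascii", errors="replace").strip()


def interpret_check(entry_ip: str, observed: str) -> Tuple[str, Optional[str]]:
    """Compare the IP we connected to with the IP the echo site reported.

    Returns (verdict, ip_to_block). 'confirmed' means the entry IP is itself
    the exit; 'tunneling' means traffic left from a different IP, which is
    the one that would show up in Wikipedia's logs.
    """
    try:
        exit_ip = str(ipaddress.ip_address(observed.strip()))
    except ValueError:
        # The reply was not an IP at all (e.g. a router login page).
        return ("not confirmed", None)
    if exit_ip == str(ipaddress.ip_address(entry_ip)):
        return ("confirmed", exit_ip)
    return ("tunneling", exit_ip)
```

A connection error or timeout from `exit_ip_via_proxy` simply means the check did not confirm anything on that port; it does not show that the IP is clean.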

Notes on checking

  • A word on port scans
If the on-wiki behaviour looks especially suspicious, but Google-fu has been unsuccessful at finding the mechanism, one can run a port scan using nmap. Port scanning may or may not be considered legal and so should generally be avoided. Some common proxy ports are 80, 8080, 8888, 8000, and 3128, but having those ports open does not inherently show a proxy, nor does having all of them closed rule one out. Suspect ports can be tested for verification as an HTTP/transparent proxy. A less intrusive alternative to port scanning is to simply check whether the IP works as a proxy on the common ports.
  • A note on ordering
DNS, WHOIS, and Google checks can be applied in any order. In general, if it's a block request and the IP is shown to belong to an ISP, then you can almost entirely rule out a proxy right away, whereas if either DNS or WHOIS returns a webhost, you should be deeply suspicious. Of course, you could just Google the IP, and if it's a proxy you'll get the info on how to use it in the first page of hits! The exact order is a matter of personal preference.
  • Google searching? That sounds too easy...
How do you think someone finds a proxy in the first place? Experience teaches one what to search for and what type of results to concentrate on.
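
The "less intrusive alternative" to a port scan mentioned above, trying only the common proxy ports, could look roughly like this. The port list comes from the text; `working_proxy_ports` and its `check` parameter are hypothetical names. `check` stands for whatever single-port verification you already trust, for example a function that fetches an IP-echo page through `ip:port` and compares the result.

```python
import http.client
from typing import Callable, Iterable, List

# The common proxy ports listed in the text above.
COMMON_PROXY_PORTS = (80, 8080, 8888, 8000, 3128)


def working_proxy_ports(ip: str,
                        check: Callable[[str, int], bool],
                        ports: Iterable[int] = COMMON_PROXY_PORTS) -> List[int]:
    """Return the ports on which `check(ip, port)` confirmed a working proxy.

    Connection failures and malformed replies are treated as "not confirmed
    on this port", not as evidence that the IP is clean.
    """
    confirmed = []
    for port in ports:
        try:
            if check(ip, port):
                confirmed.append(port)
        except (OSError, http.client.HTTPException):
            continue  # refused, timed out, or not speaking HTTP
    return confirmed
```

An empty result only means nothing was found on these five ports; as noted above, having all of them closed does not rule a proxy out.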

Applying the Bits

Block Requests
Once the proxy is confirmed, it should be blocked with a note about how to access it. If it belongs to an ISP, then it's likely a virus on a home computer. The block should be shortish (no longer than two months) due to the dynamic nature of the IPs. If it belongs to a webhosting company, consider applying a long, hard rangeblock (up to five years) to the entire network after checking for range contribs. The range can be found via a WHOIS check. Factors in favor of a long block: low number of contributions, other proxies on the range. It's worth doing a quick proxy check on the most frequent Wikipedia contributors on the hosting range, as those will frequently turn out to be other proxies. Factors in favor of a shorter block: constructive contributions. In general, with webhosts, there will be a very low signal-to-noise ratio in the contributions, so long blocks are the norm.
Unblock requests
It's much easier to confirm a proxy than to disconfirm one. If a long-term contributor is caught in a proxy block on an ISP, the best immediate option is often issuing an IP block exemption. One should check Google for HTTP/transparent proxy listings on that IP and especially check the timestamps on them. If the stamps are recent, the user may have a virus on their computer which is running the open proxy and should be strongly advised to run anti-virus software. If the stamps are older, the proxy has likely jumped IPs and the IP can be safely unblocked.
If the user is editing from a webhost, it is usually best to simply advise them firmly but politely to disable their proxy, particularly if it's a "new" user.
A word on ProcseeBot (talk · contribs)
ProcseeBot is one of the few bots with an administrative bit. Its sole purpose is to block proxies. It trawls the internet for lists of proxies, checks to see if they can edit Wikipedia, and then blocks them for a few months if they can. ProcseeBot is infallible in the sense that when it blocks, the IP was definitively a proxy on the listed configuration. However, it is not uncommon for the offending internet hardware to be fixed or move IPs before the block expires. If the listed proxy is no longer there when checked by a human, no recent proxy lists contain the IP, and the IP isn't part of a webhost, it is usually safe to unblock.
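
For the webhost rangeblock described under Block Requests, WHOIS records usually give the network as a first and last address (in a field such as NetRange or inetnum), while a rangeblock is entered in CIDR notation. The conversion can be sketched with Python's standard `ipaddress` module. The function name `whois_range_to_cidrs` is made up for this example, and the addresses are documentation placeholders, not real hosts.

```python
import ipaddress
from typing import List


def whois_range_to_cidrs(range_text: str) -> List[str]:
    """Convert a 'first - last' WHOIS address range into CIDR blocks."""
    first_text, last_text = (part.strip() for part in range_text.split("-"))
    first = ipaddress.ip_address(first_text)
    last = ipaddress.ip_address(last_text)
    # A range that is not aligned to a single network yields several blocks.
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


print(whois_range_to_cidrs("198.51.100.0 - 198.51.100.255"))  # ['198.51.100.0/24']
```

Check the contributions from every resulting block before applying a long block, since a range that splits into several CIDR blocks needs one rangeblock per block.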