Squid (software)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 59.90.224.47 (talk) at 11:09, 19 December 2012 (→‎Reverse proxy). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Squid
Initial release: July 1996
Stable release: 3.2.4 [1] / December 2, 2012
Written in: C/C++ (Squid 3)
Operating system: Cross-platform
Type: web cache, proxy server
License: GNU General Public License
Website: http://www.squid-cache.org

Squid is a proxy server and web cache daemon. It has a wide variety of uses, from speeding up a web server by caching repeated requests; to caching web, DNS and other computer network lookups for a group of people sharing network resources; to aiding security by filtering traffic. Although primarily used for HTTP and FTP, Squid includes limited support for several other protocols including TLS, SSL, Internet Gopher and HTTPS.[2]

Squid was originally designed to run on Unix-like systems. The Windows port was maintained up to version 2.7, but ports of more recent versions are not being developed.[3] Released under the GNU General Public License, Squid is free software.

It is used by the Wikimedia Foundation on Wikipedia.[4]

History

Squid was originally developed by Duane Wessels as the Harvest object cache, part of the Harvest project at the University of Colorado at Boulder.[5][6] Further work on the program was completed at the University of California, San Diego and funded via two grants from the National Science Foundation.[7] Duane Wessels forked the "last pre-commercial version of Harvest" and renamed it to Squid to avoid confusion with the commercial fork called Cached 2.0, which became NetCache.[8][9] Squid version 1.0.0 was released in July 1996.[8]

Squid is now developed almost exclusively through volunteer efforts.

Web proxy caching is a way to store requested Internet objects (e.g. data like web pages) available via the HTTP, FTP, and Gopher protocols on a system closer to the requesting site. Web browsers can then use the local Squid cache as a proxy HTTP server, reducing access time as well as bandwidth consumption. This is often useful for Internet service providers to increase speed to their customers, and LANs that share an Internet connection. Because it is also a proxy (i.e. it behaves like a client on behalf of the real client), it can provide some anonymity and security. However, it also can introduce significant privacy concerns as it can log a lot of data including URLs requested, the exact date and time, the name and version of the requester's web browser and operating system, and the referrer.

A client program (e.g. a browser) either has to specify explicitly the proxy server it wants to use (typical for ISP customers), or it can use a proxy without any extra configuration. The latter is called “transparent caching”: all outgoing HTTP requests are intercepted by Squid, and cacheable responses are stored. Transparent caching is typically a corporate set-up (all clients are on the same LAN) and often introduces the privacy concerns mentioned above.
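As a rough illustration of the explicit case, the following Python sketch builds an HTTP client that routes its requests through a proxy. The host name proxy.example.com is a placeholder, not taken from the article; port 3128 is Squid's default listening port, but a given installation may use another.

```python
import urllib.request

# Hypothetical proxy address (an assumption for this example).
# 3128 is Squid's default port; a real deployment may differ.
PROXY_URL = "http://proxy.example.com:3128"


def build_proxy_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that sends plain HTTP requests via the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxy_opener()
# opener.open("http://www.example.com/") would now be sent to the proxy,
# which fetches the page (or answers from its cache) on the client's behalf.
```

With transparent caching no such client-side step exists, which is why users may be unaware that a proxy is involved.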

Squid has some features that can help anonymize connections, such as disabling or changing specific header fields in a client's HTTP requests. Whether these are set, and what they are set to do, is up to the person who controls the computer running Squid. People requesting pages through a network which transparently uses Squid may not know whether this information is being logged.[10] Within UK organisations at least, users should be informed if computers or internet connections are being monitored.[11]
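The effect of disabling or changing header fields can be modelled in a few lines of Python. This is only a simplified sketch of the idea, not Squid's code or its configuration syntax; the function name, the header values, and the replacement string are all made up for the example.

```python
from typing import Dict, Iterable, Mapping


def anonymize_headers(
    headers: Mapping[str, str],
    deny: Iterable[str],
    replace: Mapping[str, str],
) -> Dict[str, str]:
    """Return a copy of `headers` with some fields dropped or overwritten.

    Header names are compared case-insensitively, as in HTTP.
    """
    deny_lc = {name.lower() for name in deny}
    replace_lc = {name.lower(): value for name, value in replace.items()}
    result: Dict[str, str] = {}
    for name, value in headers.items():
        key = name.lower()
        if key in replace_lc:
            result[name] = replace_lc[key]  # overwrite with a fixed value
        elif key in deny_lc:
            continue                        # drop the field entirely
        else:
            result[name] = value            # pass through unchanged
    return result


# Illustrative request headers (invented values).
request = {
    "Host": "www.example.com",
    "User-Agent": "ExampleBrowser/1.0 (ExampleOS)",
    "Referer": "http://intranet.example.com/private",
}
cleaned = anonymize_headers(request, deny=["Referer"], replace={"User-Agent": "generic"})
```

Here the forwarded request keeps Host, carries a generic User-Agent, and no longer reveals the referring page.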

Reverse proxy

The above setup, caching the contents of an unlimited number of web servers for a limited number of clients, is the classical one. Another setup is “reverse proxy” or “webserver acceleration” (using http_port 80 accel vhost). In this mode, the cache serves an unlimited number of clients for a limited number of web servers, or just one.

As an example, if slow.example.com is a “real” web server, and www.example.com is the Squid cache server that “accelerates” it, the first time any page is requested from www.example.com, the cache server would get the actual page from slow.example.com, but later requests would get the stored copy directly from the accelerator (for a configurable period, after which the stored copy would be discarded). The end result, without any action by the clients, is less traffic to the source server, meaning less CPU and memory usage, and less need for bandwidth. This does, however, mean that the source server cannot accurately report on its traffic numbers without additional configuration, as all requests would seem to have come from the reverse proxy. A way to adapt the reporting on the source server is to use the X-Forwarded-For HTTP header reported by the reverse proxy, to get the real client's IP address.
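The behaviour described above can be sketched in Python as a small cache with an expiry time placed in front of an origin fetch function. This is a toy model under stated assumptions, not how Squid is implemented: the class and function names, the 60-second lifetime, and the IP addresses are invented for illustration, and real caches also honour HTTP cache-control headers.

```python
import time
from typing import Callable, Dict, Mapping, Tuple

FetchFn = Callable[[str, Mapping[str, str]], bytes]


class AcceleratorCache:
    """Toy reverse-proxy cache: stores each page for `ttl_seconds`."""

    def __init__(self, fetch_from_origin: FetchFn, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, path: str, client_ip: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh copy: the origin is not contacted
        # Miss or expired: ask the origin, passing on the client's address.
        body = self._fetch(path, {"X-Forwarded-For": client_ip})
        self._store[path] = (now, body)
        return body


def real_client_ip(headers: Mapping[str, str], peer_ip: str) -> str:
    """Origin-side helper: prefer the first X-Forwarded-For entry."""
    first = headers.get("X-Forwarded-For", "").split(",")[0].strip()
    return first or peer_ip


# Demonstration with a fake origin and a controllable clock.
origin_log = []


def slow_origin(path: str, headers: Mapping[str, str]) -> bytes:
    origin_log.append(real_client_ip(headers, peer_ip="10.0.0.1"))
    return b"page for " + path.encode()


now = [0.0]
cache = AcceleratorCache(slow_origin, ttl_seconds=60, clock=lambda: now[0])
cache.get("/index.html", "203.0.113.7")    # first request: fetched from origin
cache.get("/index.html", "198.51.100.9")   # served from the stored copy
now[0] = 61.0
cache.get("/index.html", "198.51.100.9")   # stored copy expired: fetched again
```

Only two of the three requests reach the origin, and the origin's log shows the original client addresses rather than the proxy's. An origin should trust X-Forwarded-For only when the request arrives from a proxy it controls, since clients can set the header themselves.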

It is possible for a single Squid server to serve both as a normal and a reverse proxy simultaneously. For example, a business might host its own website on a web server, with a Squid server acting as a reverse proxy between clients (customers accessing the website from outside the business) and the web server. The same Squid server could act as a classical web cache, caching HTTP requests from clients within the business (i.e. employees accessing the internet from their workstations), so accelerating web access and reducing bandwidth demands.

Media-range limitations

HTTP lets a client request only part of a resource by specifying a byte range (a partial request). This feature is used extensively by video streaming websites such as YouTube, so that if a user clicks to the middle of the video progress bar, the server can begin sending data from the middle of the file, rather than sending the entire file from the beginning and making the user wait for the preceding data to load.

Partial downloads are also extensively used by Microsoft Windows Update so that extremely large update packages can download in the background and pause halfway through the download, if the user turns off their computer or disconnects from the Internet.

The Metalink download format enables clients to do segmented downloads by issuing partial requests and spreading these over a number of mirrors.

Squid can relay partial requests to the origin web server. For a partial request to be served quickly from cache, Squid requires a full copy of the same object to already exist in its storage.
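The requirement for a full copy can be illustrated with a short Python sketch that answers a single-range request from a cached object. It is a simplified model, not Squid's logic: the function names and the dictionary-based cache are assumptions, and only the single-range forms bytes=a-b, bytes=a- and bytes=-n are handled.

```python
import re
from typing import Mapping, Optional, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_single_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return an inclusive (start, end) pair, or None if not satisfiable."""
    match = _RANGE_RE.match(header.strip())
    if match is None or size == 0:
        return None
    first, last = match.group(1), match.group(2)
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-n": the final n bytes of the object.
        count = int(last)
        if count == 0:
            return None
        return (max(size - count, 0), size - 1)
    start = int(first)
    if start >= size:
        return None
    end = size - 1 if last == "" else min(int(last), size - 1)
    if end < start:
        return None
    return (start, end)


def serve_range(cache: Mapping[str, bytes], url: str, header: str) -> Optional[bytes]:
    """Serve the requested bytes from a full cached copy.

    Returns None when the cache cannot answer (no full copy stored, or an
    unsupported/unsatisfiable range); a proxy would then relay the request.
    """
    body = cache.get(url)
    if body is None:
        return None
    byte_range = parse_single_range(header, len(body))
    if byte_range is None:
        return None
    start, end = byte_range
    return body[start:end + 1]


# Illustrative cache holding one complete ten-byte object.
VIDEO_URL = "http://www.example.com/video"
full_copies = {VIDEO_URL: b"0123456789"}
tail = serve_range(full_copies, VIDEO_URL, "bytes=5-")
```

Because the slice is taken from the complete object, any range can be answered immediately; without the full copy the function has nothing to slice and the request must go upstream.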

If a user watching a video stream through the proxy browses to a different page before the video has completely downloaded, Squid cannot keep the partial download for reuse and simply discards the data. Special configuration is required to force such downloads to continue and be cached.[12]

Supported platforms

Squid can run on the following operating systems:

Performance

The Squid web site claims that, when deployed in front of a server application, Squid can improve performance by up to four times. Squid is especially effective under high (and often unexpected) traffic to one or a few particular pages, because in that case nearly 100% of requests can be served from the cache.

References

  1. "Squid Versions". Retrieved 2 December 2012.
  2. "Squid FAQ: About Squid". Retrieved 13 February 2007.
  3. "Squid 3 for Windows". The development of Squid 3 for Windows (3.0 and 3.1 branches) is stopped since the bazaar migration of the Squid 3 VCS.
  4. "Squids". Retrieved 27 December 2012.
  5. Squid intro, on the Squid website.
  6. Harvest cache now available as an "httpd accelerator", by Mike Schwartz on the http-wg mailing list, Tue, 4 April 1995, as forwarded by Brian Behlendorf to the Apache HTTP Server developers' mailing list.
  7. "Squid Sponsors". Archived from the original on 14 October 2007. Retrieved 13 February 2007. The NSF was the primary funding source for Squid development from 1996-2000. Two grants (#NCR-9616602, #NCR-9521745) received through the Advanced Networking Infrastructure and Research (ANIR) Division were administered by the University of California San Diego.
  8. Duane Wessels, Squid and ICP: Past, Present, and Future, Proceedings of the Australian Unix Users Group. September 1997, Brisbane, Australia.
  9. netcache.com, Wayback Machine.
  10. See the documentation for header_access and header_replace for further details.
  11. See, for example, Computer Monitoring In The Workplace and Your Privacy.
  12. "Squid Configuration Reference". Retrieved 26 November 2012.
  13. OS/2 Ports by Paul Smedley, OS/2 Ports.
