HTTrack: Difference between revisions

From Wikipedia, the free encyclopedia
Xroche (talk | contribs)
No edit summary
→‎Bugs: uncited anyway and seem to have been resolved (see talk)
== Bugs ==
*The "continue interrupted download" option is unreliable: selecting it may cause HTTrack to erase all data downloaded so far and restart from the very beginning. In practice this limits HTTrack to small websites that can be downloaded in a single session.{{Fact|date=June 2008}} {{Dubious}}
*HTTrack does not interpret the [[HTTP]] header "[[MIME#Content-Type|Content-Type]]" correctly. In many cases it instead uses the [[Uniform Resource Locator|URL]]'s extension to determine the [[file format]], and therefore misnames files. For example, suppose HTTrack downloads a cgi-script "[...]get.htm?1234" that returns a [[JPEG]] image and signals this with "Content-Type: image/jpeg". HTTrack will save the image in a file with the [[Filename extension|extension]] ".htm", so the [[operating system]] treats it as a text/html file. Opening the file from the hard drive then launches the [[web browser]], which displays garbled characters instead of the image.{{Fact|date=June 2008}} {{Dubious}}
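The correct behavior the second bullet describes can be sketched as follows. This is not HTTrack's code; it is a minimal illustration, using Python's standard <code>mimetypes</code> module, of deriving a file extension from the Content-Type header and falling back to the URL's extension only when the header is missing or unrecognized:

```python
import mimetypes

def extension_for(content_type: str, url_path: str) -> str:
    """Pick a file extension from the HTTP Content-Type header,
    falling back to the URL's own extension only if the header
    is absent or unrecognized."""
    # Strip header parameters, e.g. "image/jpeg; charset=binary"
    mime = content_type.split(";")[0].strip().lower() if content_type else ""
    ext = mimetypes.guess_extension(mime) if mime else None
    if ext:
        return ext
    # Fallback: whatever follows the last dot in the URL path
    dot = url_path.rfind(".")
    return url_path[dot:] if dot != -1 else ""

# The bug described above names the file ".htm"; honoring the
# header yields a JPEG extension instead.
print(extension_for("image/jpeg", "/cgi-bin/get.htm"))
```

Here the cgi-script path is taken from the example above; the helper name is invented for illustration.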


== See also ==

Revision as of 02:00, 25 August 2008

HTTrack is a free and open source website copier and offline browser by Xavier Roche, licensed under the GNU General Public License. It allows one to download World Wide Web sites from the Internet to a local computer. By default, HTTrack arranges the downloaded site by the original site's relative link-structure. The downloaded (or "mirrored") website can be browsed by opening a page of the site in a browser.

HTTrack can also update an existing mirrored site and resume interrupted downloads. It is fully configurable via options and filters (include/exclude), and has an integrated help system. There are a basic command-line version and two GUI versions (WinHTTrack and WebHTTrack); the command-line version can be used in scripts and cron jobs.
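As a sketch of the scripted usage mentioned above, a mirror can be refreshed from a shell or a crontab entry. The URL and paths are invented for illustration; <code>-O</code> sets the output directory and <code>--update</code> refreshes an existing mirror without confirmation (check <code>httrack --help</code> on your system, as option availability may vary by version):

```shell
# Create or refresh a local mirror (hypothetical URL and path)
httrack "http://example.com/" -O /home/user/mirrors/example --update

# Example crontab entry: refresh the mirror nightly at 03:00
# 0 3 * * * httrack "http://example.com/" -O /home/user/mirrors/example --update
```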

HTTrack uses a web crawler to download a website. By default the crawler honors the robots exclusion protocol, so some parts of a website may not be downloaded unless this behavior is disabled in the program's options. HTTrack can follow links that are generated with basic JavaScript and inside applets or Flash, but not complex links (generated using functions or expressions) or server-side image maps.
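The robots-exclusion check a crawler performs before fetching each URL can be illustrated with Python's standard <code>urllib.robotparser</code>. This is not HTTrack's implementation, and the site rules below are invented for the example:

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt rules for an example site: everything under
# /private/ is off-limits to all crawlers.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler fetches a URL only when can_fetch() allows it.
print(parser.can_fetch("HTTrack", "http://example.com/index.html"))
print(parser.can_fetch("HTTrack", "http://example.com/private/page.html"))
```

A mirroring tool that ignores such rules would download the excluded paths too, which is why disabling this check is left as an explicit option.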



See also

External links