Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages. For our advice about complying with these licenses, see Wikipedia:Copyrights.
Where do I get... 
English-language Wikipedia 
- Dumps from any Wikimedia Foundation project: http://dumps.wikimedia.org/
- English Wikipedia dumps in SQL and XML: http://dumps.wikimedia.org/enwiki/
- pages-articles.xml.bz2 – Current revisions only, no talk or user pages. (This is probably the one you want. The size of the 4 April 2013 dump is approximately 9.06 GB compressed, 42 GB uncompressed).
- As these files are quite large, please consider using a BitTorrent download to reduce the load on our servers. See meta:data dump torrents for more information.
- Unofficial torrent (2012) with the Wikipedia:Project namespace removed, packaged with WikiTaxi for easy browsing (info hash: dc966fb7760c40621e0b203ba075d233d68375a8).
- Unofficial torrent (2013) with the Wikipedia:Project namespace removed (info hash: 343EB051D4F9339D3F8FD34B470D9AEACBFB0E58).
- pages-meta-current.xml.bz2 – Current revisions only, all pages (including talk)
- abstract.xml.gz – page abstracts
- all_titles_in_ns0.gz – Article titles only (with redirects)
- SQL files for the pages and links are also available
- All revisions, all pages: These files expand to multiple terabytes of text. Please only download these if you know you can cope with this quantity of data. Go to Latest Dumps and look out for all the files that have 'pages-meta-history' in their name.
- To download a subset of the database in XML format, such as a specific category or a list of articles see: Special:Export, usage of which is described at Help:Export.
- Wiki front-end software: MediaWiki.
- Database back-end software: MySQL.
- Image dumps: See below.
Other languages 
In the http://dumps.wikimedia.org/ directory you will find the latest SQL and XML dumps for all projects, not just English. For example (other languages exist; substitute the appropriate two-letter language code and project name):
- Arabic Wikipedia dumps: http://dumps.wikimedia.org/arwiki/
- Dutch Wikipedia dumps: http://dumps.wikimedia.org/nlwiki/
- English Wikipedia dumps: http://dumps.wikimedia.org/enwiki/
- French Wikipedia dumps: http://dumps.wikimedia.org/frwiki/
- German Wikipedia dumps: http://dumps.wikimedia.org/dewiki/
- Italian Wikipedia dumps: http://dumps.wikimedia.org/itwiki/
- Polish Wikipedia dumps: http://dumps.wikimedia.org/plwiki/
- Portuguese Wikipedia dumps: http://dumps.wikimedia.org/ptwiki/
- Russian Wikipedia dumps: http://dumps.wikimedia.org/ruwiki/
- Spanish Wikipedia dumps: http://dumps.wikimedia.org/eswiki/
- Ukrainian Wikipedia dumps: http://dumps.wikimedia.org/ukwiki/
Some other directories (e.g. simple, nostalgia) exist, with the same structure.
Where are images and uploaded files 
Images and other uploaded media are available from mirrors in addition to being served directly from Wikimedia servers. Bulk download is currently (as of September 2012) available from mirrors but not offered directly from Wikimedia servers. See the list of current mirrors.
Unlike most article text, images are not necessarily licensed under the GFDL & CC-BY-SA-3.0. They may be under one of many free licenses, in the public domain, believed to be fair use, or even copyright infringements (which should be deleted). In particular, use of fair use images outside the context of Wikipedia or similar works may be illegal. Images under most licenses require a credit, and possibly other attached copyright information. This information is included in image description pages, which are part of the text dumps available from dumps.wikimedia.org. In short, download these images at your own risk (Legal).
Dealing with compressed files 
Dump files are heavily compressed, so they will take up much more drive space once uncompressed. The following programs can be used to uncompress bzip2 (.bz2) and .7z files.
Windows does not ship with a bzip2 decompressor program. The following can be used to decompress bzip2 files.
- bzip2 (command-line) (from here) is available for free under a BSD license.
- 7-Zip is available for free under a LGPL license.
OS X ships with the command-line bzip2 tool.
GNU/Linux ships with the command-line bzip2 tool.
Some BSD systems ship with the command-line bzip2 tool as part of the operating system. Others, such as OpenBSD, provide it as a package which must first be installed.
- Some older versions of bzip2 may not be able to handle files larger than 2GB, so make sure you have the latest version if you experience any problems.
- Some older archives are compressed with gzip, which is compatible with PKZIP (the most common Windows format).
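For scripted use, a dump can also be read directly in its compressed form, without decompressing it to disk first. A rough sketch using Python's standard-library bz2 module (the filename is a placeholder):

```python
import bz2

def head_of_dump(path, n_lines=5):
    """Return the first n_lines of a bzip2-compressed text file,
    decompressing on the fly rather than expanding the whole file."""
    lines = []
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for _ in range(n_lines):
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip("\n"))
    return lines
```

This is handy for peeking at the XML header of a multi-gigabyte dump before committing the drive space to a full decompression.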
Dealing with large files 
As files grow in size, so does the likelihood that they will exceed some limit of a computing device. Each operating system, file system, storage device, and application has its own maximum file size; the lowest of these limits determines the effective file size limit for a given setup.
The older the software in a computing device, the more likely it will have a 2GB file limit somewhere in the system. This is due to older software using 32-bit integers for file indexing, which limits file sizes to 2^31 bytes (2GB) (for signed integers), or 2^32 (4GB) (for unsigned integers). Older C programming libraries have this 2 or 4GB limitation, but the newer file libraries have been converted to 64-bit integers thus supporting file sizes up to 2^63 or 2^64 bytes (8 or 16 EB).
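The figures above are simple powers of two, which can be checked directly:

```python
# Maximum file sizes implied by the integer width used for file offsets.
GIB = 2**30

signed_32 = 2**31    # signed 32-bit offset:  2 GiB
unsigned_32 = 2**32  # unsigned 32-bit offset: 4 GiB
signed_64 = 2**63    # signed 64-bit offset:  8 EiB

print(signed_32 // GIB, "GiB")    # 2 GiB
print(unsigned_32 // GIB, "GiB")  # 4 GiB
```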
Prior to starting a download of a large file, it is recommended that a person checks the storage device to ensure its file system can support files of such a large size, and check the amount of free space to ensure that it can hold the downloaded file.
File system limits 
There are two limits for a file system: the maximum file system size and the maximum file size. Since the maximum file size is usually the smaller of the two, it is the one that matters in practice. Many users assume they can create files as large as their storage device, but that assumption is often wrong: for example, a 16GB storage device formatted as FAT32 limits any single file to 4GB. The most common file systems are listed below; see Comparison of file systems for additional detail.
- FAT16 supports files up to 2GB (4GB on some implementations). FAT16 is the factory format of smaller USB drives and all SD cards that are 2GB or smaller.
- FAT32 supports files up to 4GB. FAT32 is the factory format of larger USB drives and all SDHC cards that are 4GB or larger.
- exFAT supports files up to 127PB. exFAT is the factory format of all SDXC cards, but is incompatible with most flavors of UNIX due to licensing problems.
- NTFS supports files up to 16TB. NTFS is the default file system for Windows computers, including Windows 2000, Windows XP, and all their successors to date.
- HFS+ supports files up to 8EB on Mac OS X 10.2+ and iOS. HFS+ is the default file system for Mac computers.
- ext2 and ext3 supports files up to 16GB, but up to 2TB with larger block sizes. See http://www.suse.com/~aj/linux_lfs.html for more information.
- ext4 supports files up to 16TB (using 4KB block size). (limitation removed in e2fsprogs-1.42 (2012))
- XFS supports files up to 8EB.
- ReiserFS supports files up to 1EB (8TB on 32-bit systems).
- JFS supports files up to 4PB.
- Btrfs supports files up to 16EB.
- NILFS supports files up to 8EB.
- YAFFS2 supports files up to 2GB.
Operating system limits 
Each operating system has its own internal limits on file size and drive size, independent of the file system or physical media. If the operating system's limits are lower than those of the file system or physical media, the operating system's limits are the effective ones.
- For Windows 95/98/ME, there is a 4GB limit for all file sizes.
- For 32-bit Kernel 2.4.x systems, there is a 2TB limit for all file systems.
- For 64-bit Kernel 2.4.x systems, there is an 8EB limit for all file systems.
- For 32-bit Kernel 2.6.x systems without option CONFIG_LBD, there is a 2TB limit for all file systems.
- For 32-bit Kernel 2.6.x systems with option CONFIG_LBD and all 64-bit Kernel 2.6.x systems, there is an 8ZB limit for all file systems.
Google Android is based on Linux, which determines its base limits.
- Internal storage: Android 2.3 and later uses the ext4 file system.
- External storage slots: all Android devices should support the FAT16, FAT32, and ext2 file systems; Android 2.3 and later also supports ext4.
Apple iOS devices use HFS+ for internal storage and have no external storage slots.
Detect corrupted files 
It is recommended that you check the MD5 sums (provided in a file in the download directory) to make sure your download was complete and accurate. You can check this by running the "md5sum" command on the files you downloaded. Given how large the files are, this may take some time to calculate. Due to the technical details of how files are stored, file sizes may be reported differently on different filesystems, and so are not necessarily reliable. Also, you may have experienced corruption during the download, though this is unlikely.
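The same check can be scripted with Python's standard-library hashlib, reading in chunks so even a multi-gigabyte dump fits in constant memory (the path is a placeholder):

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a (possibly very large) file,
    reading 1 MiB at a time so memory use stays constant."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Compare the result against the corresponding entry in the md5sums file published alongside the dump.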
Reformatting external USB drives 
If you plan to download Wikipedia dump files to one computer and use an external USB flash drive or hard drive to copy them to other computers, you will run into the 4GB FAT32 file size limit. To work around this, reformat the >4GB USB drive to a file system that supports larger file sizes. If you are working exclusively with Windows XP/Vista/7 computers, reformat the USB drive as NTFS. To share files with Linux systems, a Windows ext2 driver is also available.
Linux and Unix 
If you seem to be hitting the 2GB limit, try using wget version 1.10 or greater, cURL version 7.11.1-1 or greater, or a recent version of lynx (using -dump). Also, you can resume downloads (for example wget -c).
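The resume behaviour of wget -c can also be approximated in a script by sending an HTTP Range header. A minimal sketch, assuming the server honours Range requests (a server that ignores them simply restarts from byte zero):

```python
import os
import urllib.request

def resume_offset(dest):
    """Byte offset to resume from: the size of the partial file, or 0."""
    return os.path.getsize(dest) if os.path.exists(dest) else 0

def resume_download(url, dest):
    """Append the remainder of url to dest, starting where a previous
    partial download left off. No retry logic; illustration only."""
    start = resume_offset(dest)
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-"})
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        while True:
            chunk = resp.read(1 << 20)
            if not chunk:
                break
            out.write(chunk)
```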
Why not just retrieve data from wikipedia.org at runtime? 
Suppose you are building a piece of software that at certain points displays information that came from Wikipedia. If you want your program to display the information in a different way than can be seen in the live version, you'll probably need the wikicode that is used to enter it, instead of the finished HTML.
Also if you want to get all of the data, you'll probably want to transfer it in the most efficient way that's possible. The wikipedia.org servers need to do quite a bit of work to convert the wikicode into HTML. That's time consuming both for you and for the wikipedia.org servers, so simply spidering all pages is not the way to go.
To access any article in XML, one at a time, access Special:Export/Title of the article.
Read more about this at Special:Export.
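For example, the Special:Export URL for a given article can be constructed like this (a small illustrative helper; Wikipedia titles use underscores in place of spaces):

```python
from urllib.parse import quote

def export_url(title, lang="en"):
    """Build the Special:Export URL that returns one article as XML."""
    safe_title = quote(title.replace(" ", "_"))
    return f"https://{lang}.wikipedia.org/wiki/Special:Export/{safe_title}"
```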
Please be aware that live mirrors of Wikipedia that are dynamically loaded from the Wikimedia servers are prohibited. Please see Wikipedia:Mirrors and forks.
Please do not use a web crawler 
Please do not use a web crawler to download large numbers of articles. Aggressive crawling of the server can cause a dramatic slow-down of Wikipedia.
Sample blocked crawler email 
- IP address nnn.nnn.nnn.nnn was retrieving up to 50 pages per second from wikipedia.org addresses. Robots.txt has a rate limit of one per second set using the Crawl-delay setting. Please respect that setting. If you must exceed it a little, do so only during the least busy times shown in our site load graphs at http://stats.wikimedia.org/EN/ChartsWikipediaZZ.htm. It's worth noting that to crawl the whole site at one hit per second will take several weeks. The originating IP is now blocked or will be shortly. Please contact us if you want it unblocked. Please don't try to circumvent it - we'll just block your whole IP range.
- If you want information on how to get our content more efficiently, we offer a variety of methods, including weekly database dumps which you can load into MySQL and crawl locally at any rate you find convenient. Tools are also available which will do that for you as often as you like once you have the infrastructure in place. More details are available at http://en.wikipedia.org/wiki/Wikipedia:Database_download.
- Instead of an email reply you may prefer to visit #mediawiki at irc.freenode.net to discuss your options with our team.
Note that the robots.txt currently has a commented out Crawl-delay:
## *at least* 1 second please. preferably more :D
## we're disabling this experimentally 11-09-2006
#Crawl-delay: 1
Please be sure to use an intelligent non-zero delay regardless.
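A minimal sketch of such a delay, wrapping an arbitrary fetch function (the fetch callable stands in for whatever download routine you actually use):

```python
import time

def polite_fetch(urls, fetch, delay=1.0):
    """Call fetch(url) for each URL, sleeping so that at least `delay`
    seconds elapse between the start of one request and the next."""
    results = []
    last = None
    for url in urls:
        if last is not None:
            wait = delay - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        results.append(fetch(url))
    return results
```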
Doing SQL queries on the current database dump 
You can do SQL queries on the current database dump (as a replacement for the disabled Special:Asksql page).
Database schema 
SQL schema 
See also: mw:Manual:Database layout
The sql file used to initialize a MediaWiki database can be found here.
XML schema 
The XML schema for each dump is defined at the top of the file.
Help parsing dumps for use in scripts 
- Wikipedia:Computer help desk/ParseMediaWikiDump describes the Perl Parse::MediaWikiDump library, which can parse XML dumps.
- Wikipedia preprocessor (wikiprep.pl) is a Perl script that preprocesses raw XML dumps and builds link tables, category hierarchies, collects anchor text for each article etc.
- Wikipedia SQL dump parser is a .NET library to read MySQL dumps without the need for a MySQL database.
- Dictionary Builder is a Java program that can parse XML dumps and extract entries to files.
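As an illustration of the streaming approach such tools take, page titles can be pulled from a dump with Python's standard-library ElementTree without ever loading the whole file (a sketch; the tag matching allows for the XML namespace the dump declares):

```python
import bz2
import xml.etree.ElementTree as ET

def iter_titles(path):
    """Yield page titles from a MediaWiki XML dump (.xml or .xml.bz2),
    streaming so the multi-gigabyte file is never held in memory."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as f:
        for event, elem in ET.iterparse(f, events=("end",)):
            # Tags are namespace-qualified; match on the local name.
            if elem.tag.rsplit("}", 1)[-1] == "title":
                yield elem.text
            elem.clear()  # free parsed elements as we go
```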
Help importing dumps into MySQL 
Static HTML tree dumps for mirroring or CD distribution 
MediaWiki 1.5 includes routines to dump a wiki to HTML, rendering the HTML with the same parser used on a live wiki. As the following page states, putting one of these dumps on the web unmodified will constitute a trademark violation. They are intended for private viewing in an intranet or desktop installation.
- The static version of Wikipedia created by Wikimedia: http://static.wikipedia.org/ (as of February 11, 2013 this appears to be offline, with no content).
- Wiki2static (site down as of October 2005[update]) was an experimental program set up by User:Alfio to generate HTML dumps, inclusive of images, a search function and an alphabetical index. At the linked site, experimental dumps and the script itself can be downloaded. As an example it was used to generate copies of English Wikipedia (24 April 2004), Simple Wikipedia (1 May 2004, old database format), English Wikipedia (24 July 2004), Simple Wikipedia (24 July 2004), and French Wikipedia (27 July 2004, new format). BozMo uses a version to generate periodic static copies at fixed reference.
- If you want to draft a traditional website in Mediawiki and dump it to HTML format, you might want to try mw2html by User:Connelly.
- If you'd like to help develop dump-to-static HTML tools, please drop us a note on the developers' mailing list.
- mw:Alternative parsers lists some other options, many no longer working, for getting static HTML dumps
- Wikipedia:TomeRaider database
- http://sdict.com hosts a January 2007 snapshot in the open source Sdictionary .dct format
- http://ahuv.net/wikipedia hosts an October 2010 processed snapshot in the freeware MDict .mdx format
Kiwix is an offline reader for web content which runs on Windows, Mac OS X, Android and GNU/Linux. It is especially intended to make Wikipedia available offline, which it does by reading the project's content from a ZIM file, a highly compressed open format with additional metadata.
- Pure ZIM reader
- Case- and diacritics-insensitive full-text search engine
- Bookmarks & Notes
- kiwix-serve: ZIM HTTP server
- PDF/HTML export
- Multilingual (User interface localised in more than 110 languages)
- Search suggestions
- Zim index capacity
- Support for MacOSX / Linux / Windows
- DVD/USB launcher for Windows (autorun)
- Integrated content manager/downloader
- See also
- (English) Kiwix + 40.000 articles big WP1 selection (torrent) (~4GB)
- (English) Kiwix + Wikipedia for schools selection (torrent) (~3GB)
- (English) (French) (Spanish) Official Web site
- RSS/Atom Planet
- (English) Follow our last improvements...
- Please be aware: in Kiwix version "0.9-beta5-win", changing the selected folder in "Preferences" deletes the contents of that folder if it already contains files or data. Do not select a personal folder; create a new, empty folder instead.
Provides Wikipedia pages with images.
Aard Dictionary 
Offline Wikipedia reader. No images. Cross-platform for Windows, Mac, Linux, Android, Maemo. Runs on rooted Nook and Sony PRS-T1 eBook readers.
The wiki-as-ebook store provides ebooks created from a large set of Wikipedia articles with grayscale images for e-book-readers (2013).
The wikiviewer plugin for Rockbox permits viewing converted Wikipedia dumps on many Rockbox devices. It needs a custom build and conversion of the wiki dumps using the instructions available at http://www.rockbox.org/tracker/4755. The conversion recompresses the file and splits it into 1GB files plus an index file, which must all be in the same folder on the device or microSD card.
Dynamic HTML generation from a local XML database dump 
Instead of converting a database dump file to many pieces of static HTML, one can also use a dynamic HTML generator. Browsing a wiki page is just like browsing a Wiki site, but the content is fetched and converted from a local dump file upon request from the browser.
XOWA is an open-source desktop application that can read and edit Wikipedia offline. It is currently in the alpha stage of development, but is functional. It is available for download here.
- Displays all articles from a Wikimedia data dump
- Works with English Wikipedia, Wiktionary, Wikisource, Wikiquote, Wikivoyage, as well as the non-English language counterparts (for example, French Wikipedia). Also works with Wikimedia Commons and Wikispecies
- Renders articles with full HTML formatting
- Downloads images and other files on demand
- Navigates between offline wikis (click on "Look up this word in Wiktionary" and it will open your offline version of Wiktionary)
- Edits articles
- Installs to a flash memory card for portability to other machines
- Is customizable and extendable at many levels: from keyboard shortcuts to HTML layouts to internal options
Offline wikipedia reader 
(for Mac OS X, GNU/Linux, FreeBSD/OpenBSD/NetBSD, and other Unices)
The offline-wikipedia project provides a very effective way to get an offline version of Wikipedia. It uses entirely free software. Packages are available for Ubuntu, and soon for other Linux distributions.
Main features 
- Very fast searching
- Keyword (actually, title words) based searching
- Search produces multiple possible articles: you can choose amongst them
- LaTeX based rendering for mathematical formulae
- Minimal space requirements: the original .bz2 file plus the index
- Very fast installation (a matter of hours) compared to loading the dump into MySQL
WikiFilter is a program which allows you to browse over 100 dump files without visiting a Wiki site.
WikiFilter system requirements 
- A recent Windows version (WinXP is fine; Win98 and WinME won't work because they don't have NTFS support)
- A fair bit of hard drive space (about 12 to 15 gigabytes to install; about 10 gigabytes afterwards)
How to set up WikiFilter 
- Start downloading a Wikipedia database dump file such as an English Wikipedia dump. It is best to use a download manager such as GetRight so you can resume downloading the file even if your computer crashes or is shut down during the download.
- Download XAMPPLITE from  (you must get the 1.5.0 version for it to work). Make sure to pick the file whose filename ends with .exe
- Install/extract it to C:\XAMPPLITE.
- Download WikiFilter 2.3 from this site: https://sourceforge.net/projects/wikifilter. You will have a choice of files to download, so make sure that you pick the 2.3 version. Extract it to C:\WIKIFILTER.
- Copy the WikiFilter.so into your C:\XAMPPLITE\apache\modules folder.
- Edit your C:\xampplite\apache\conf\httpd.conf file, and add the following line:
- LoadModule WikiFilter_module "C:/XAMPPLITE/apache/modules/WikiFilter.so"
- When your Wikipedia file has finished downloading, uncompress it into your C:\WIKIFILTER folder. (I used WinRAR http://www.rarlab.com/ demo version - BitZipper http://www.bitzipper.com/winrar.html works well too.)
- Run WikiFilter (WikiIndex.exe), and go to your C:\WIKIFILTER folder, and drag and drop the XML file into the window, click Load, then Start.
- After it finishes, exit the window, and go to your C:\XAMPPLITE folder. Run the setup_xampp.bat file to configure xampp.
- When you finish with that, run the Xampp-Control.exe file, and start apache.
- Browse to http://localhost/wiki and see if it works
- If it doesn't work, see the forums.
WikiTaxi is an offline-reader for wikis in MediaWiki format. It enables users to search and browse popular wikis like Wikipedia, Wikiquote, or WikiNews, without being connected to the Internet. WikiTaxi works well with different languages like English, German, Turkish, and others but has a problem with right-to-left language scripts.
WikiTaxi system requirements 
- Any Windows version from Windows 95 onward. Large-file support (greater than 4 GB) is needed for the huge wikis (English only at the time of this writing).
- It also works on Linux with Wine.
- 16 MB RAM minimum for the WikiTaxi reader, 128 MB recommended for the importer (more for speed).
- Storage space for the WikiTaxi database. This requires about 11.7 GiB for the English Wikipedia (as of 5 April 2011), 2 GB for German, less for other Wikis. These figures are likely to grow in the future.
WikiTaxi usage 
- Download WikiTaxi and extract to an empty folder. No installation is otherwise required.
- Download the XML database dump (*.xml.bz2) of your favorite wiki.
- Run WikiTaxi_Importer.exe to import the database dump into a WikiTaxi database. The importer uncompresses the dump as it imports, so save your drive space and do not uncompress it beforehand.
- When the import is finished, start up WikiTaxi.exe and open the generated database file. You can start searching, browsing, and reading immediately.
- After a successful import, the XML dump file is no longer needed and can be deleted to reclaim disk space.
- To update an offline Wiki for WikiTaxi, download and import a more recent database dump.
For WikiTaxi reading, only two files are required: WikiTaxi.exe and the .taxi database. Copy them to any storage device (memory stick or memory card) or burn them to a CD or DVD and take your Wikipedia with you wherever you go!
BzReader and MzReader (for Windows) 
BzReader is an offline Wikipedia reader with fast search capabilities. It renders the Wiki text into HTML and doesn't need to decompress the database. Requires Microsoft .NET framework 2.0.
MzReader by Mun206 works with (though is not affiliated with) BzReader, and allows further rendering of wikicode into better HTML, including an interpretation of the monobook skin. It aims to make pages more readable. Requires Microsoft Visual Basic 6.0 Runtime, which is not supplied with the download. Also requires Inet Control and Internet Controls (Internet Explorer 6 ActiveX), which are packaged with the download.
An offline Wikipedia database in EPWING dictionary format, which is common in Japan, can be read (including images, with some rendering limitations for tables) on any system with an EPWING reader. Many free and commercial readers exist for Windows/Windows Mobile, Mac OS X/iOS (Mac, iPhone, iPad), Android, Unix/Linux/BSD, DOS, and Java-based browser applications.
See also 
- m:Help:Downloading pages
- Meta:Data dumps#Other tools, for related tools, e.g. extractors and "dump readers"
- Wikipedia:Size of Wikipedia
- meta:Mirroring Wikimedia project XML dumps
- Wikimedia Downloads.
- Domas visits logs (read this!). Also, old data in the Internet Archive.
- Wikimedia mailing lists archives.
- User:Emijrp/Wikipedia Archive. An effort to find all the Wiki[mp]edia available data, and to encourage people to download it and save it around the globe.
- Script to download all Wikipedia 7z dumps.