Wikipedia talk:Database download

Please note that questions about the database download are more likely to be answered on the xmldatadumps-l or wikitech-l mailing lists than on this talk page.

Anywhere to get a hard drive with the most recent dump on it?

Is there anywhere to get the most recent dump, or some subset of it (say, the enwiki talk and WP namespaces with histories intact), without downloading it? I live in a rural area with a lousy connection, so even downloading the current version (which wouldn't actually be helpful to me anyway) is not practical. Hard drives with the database preloaded seem like something someone may have thought to offer. Has anyone come across this? In other words, I want a hard drive with the data on it shipped to me. Thoughts? Thanks. -- (talk) 22:30, 23 August 2014 (UTC)

I suggest that those who edit the BitTorrent-related specifics should know the BitTorrent protocol well. BitTorrenting is not simply a matter of making 'burnbit' torrents, as many here may have thought. It's more than that.

Recently my edit (which had stood for more than a year) was cut down to a few words, with no explanation of the benefits!

My edit: Additionally, there are numerous benefits using BitTorrent over a HTTP/FTP download. In an HTTP/FTP download, there is no way to compute the checksum of a file during downloading. Because of that limitation, data corruption is harder to prevent when downloading a large file. With BitTorrent downloads, the checksum of each block/chunk is computed using a SHA-1 hash during the download. If the computed hash doesn't match with the stored hash in the *.torrent file, that particular block/chunk is avoided. As a result, a BitTorrent download is generally more secure against data corruption than a regular HTTP/FTP download.[1]

New edit: ** Use BitTorrent to download the data dump, as torrenting has many benefits.

Even on the new Meta page, I see no reference to the 'benefits' he/she is referring to.

The new edit was made by this IP:

Could I know why this edit was made, and why he/she didn't provide the correct reference when editing? Why did he/she alone decide to delete that whole reference?

I wonder what the reason could be! (talk) 09:45, 6 October 2014 (UTC)


  1. ^ Anthony Bellissimo; Brian N. Levine; Prashant Shenoy. "Exploring the Use of BitTorrent as the Basis for a Large Trace Repository" (PDF). University of Massachusetts (USA). Archived from the original (PDF) on 2013-12-20. Retrieved 2013-12-20. 
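
The per-piece check described in the quoted edit above can be illustrated with a minimal sketch. This is not a BitTorrent client: it only shows the idea of hashing fixed-size pieces with SHA-1 and comparing them to stored hashes. The function names, the 4096-byte piece length, and the sample data are assumptions made for illustration.

```python
import hashlib


def split_pieces(data: bytes, piece_length: int) -> list[bytes]:
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of every piece, similar in spirit to what a .torrent stores."""
    return [hashlib.sha1(piece).digest() for piece in split_pieces(data, piece_length)]


def find_corrupt_pieces(received: bytes, expected: list[bytes], piece_length: int) -> list[int]:
    """Return the indexes of pieces whose hash does not match the stored hash."""
    bad = []
    for index, piece in enumerate(split_pieces(received, piece_length)):
        if hashlib.sha1(piece).digest() != expected[index]:
            bad.append(index)
    return bad


# Hypothetical sample data standing in for a dump file.
PIECE_LENGTH = 4096
original = bytes(range(256)) * 64          # 16384 bytes, i.e. four pieces
expected = piece_hashes(original, PIECE_LENGTH)

# Simulate one flipped byte during transfer; offset 5000 lies in piece 1.
corrupted = bytearray(original)
corrupted[5000] ^= 0xFF
bad = find_corrupt_pieces(bytes(corrupted), expected, PIECE_LENGTH)
print(bad)  # [1]
```

Only the mismatching piece would need to be fetched again, rather than the whole file.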

Privacy: Timestamps in database downloads and history

I haven't looked into the database dumps, but even if you only get a snapshot and no edit history, you can track users (and their timestamps) by downloading it frequently.

You can also track users by going into people's "View history" or a page's history. Should the timestamps there and in the dumps be accurate down to the second?

It seems to me that on talk pages the timestamps in signatures would be preserved, but apart from that, couldn't all other available information on users be shown with, say, day granularity for timestamps, or with no timestamps at all? Is the exact timestamp important to anyone? I feel I should be able to see all of mine (when logged in). But then I could infer a timeframe for other users' edits that fall between mine. Still, I would have to edit a lot, and it would only work for the pages and users between my edits.

I'm not sure how timestamps appear across time zones. Because of screen scraping, maybe coarser granularity would be pointless, as it could be evaded?

I see that when I edit pages, the changes are synced to Google very quickly. Does anyone know how? Ongoing database dumps? comp.arch (talk) 12:31, 26 October 2014 (UTC)
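
The two ideas in the comment above (day-granularity timestamps, and inferring a timeframe from one's own edits) could be sketched as follows. This is only an illustration: the timestamp format, the function names, and the sample history are assumptions, not a description of how the dumps or MediaWiki actually behave.

```python
from datetime import datetime, timezone


def coarsen_to_day(timestamp: str) -> str:
    """Reduce a UTC timestamp such as '2014-10-26T12:31:07Z' to its date."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).date().isoformat()


def bracket_by_own_edits(history, me):
    """Bound other users' edits by my own exact timestamps.

    history is a list of (user, timestamp) pairs in page order, oldest first;
    the timestamp is None for edits whose exact time is hidden from me.
    Returns {index: (lower_bound, upper_bound)} for the other users' edits.
    """
    windows = {}
    last_own = None
    pending = []                      # other users' edits still lacking an upper bound
    for index, (user, timestamp) in enumerate(history):
        if user == me:
            for i in pending:
                windows[i] = (windows[i][0], timestamp)
            pending = []
            last_own = timestamp
        else:
            windows[index] = (last_own, None)
            pending.append(index)
    return windows


print(coarsen_to_day("2014-10-26T12:31:07Z"))  # 2014-10-26

# Hypothetical page history: my two edits bracket someone else's edit.
history = [
    ("me", "2014-10-26T09:00:00Z"),
    ("other", None),
    ("me", "2014-10-26T09:05:00Z"),
]
print(bracket_by_own_edits(history, "me"))
```

In this made-up history, the other user's edit is pinned to a five-minute window even though its own timestamp is hidden, which is the inference the comment worries about.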

We are missing an offline wiki reader program (that preferably can read compressed wiki dump files)

It would be awesome to download Wikipedia and have a program read the data archive, like a browser does online, but for offline viewing. All I would need is a wiki dump file and the program (as a basic 'reader' program).

I don't need the 'talk', 'discussion', or 'modification' parts, as long as internal hyperlinks are resolved to the right page, and as long as it's possible to extract just the needed page 'on the fly' rather than having to extract the whole wiki dump file onto my hard drive.

There are a lot of compressed text readers, but a compressed HTML browser would be fine too. The pages should be stripped of redundant data, like headers and other online stuff that only makes sense when you are online.

I basically want a book reader, where the main page is like a search engine, to load the sub pages (of whatever topic I want to read of Wikipedia). — Preceding unsigned comment added by (talk) 08:25, 6 December 2014 (UTC)

Yes, that would be awesome. I added a list of all the offline Wikipedia readers I know about to Wikipedia:Database download#Offline Wikipedia readers. I hope at least one of them works for you, User:
Is there a mainspace article that has a section discussing such readers, other than list of Wikipedia mobile applications? --DavidCary (talk) 16:38, 26 March 2015 (UTC)
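
The "extract just the needed page on the fly" idea from the request above can be sketched with the Python standard library: the compressed dump is decompressed and parsed as a stream, so nothing is unpacked to disk. This is a rough sketch, not one of the readers listed on the project page. The function names, the tiny in-memory sample dump, and its namespace URL are assumptions; a real dump would be opened by file name instead.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_page_text(dump, wanted_title: str):
    """Scan a bz2-compressed XML dump and return the text of one page.

    dump may be a file name or a binary file object. The archive is read
    as a stream and never extracted to disk.
    """
    with bz2.open(dump, "rb") as stream:
        title = None
        for _event, elem in ET.iterparse(stream, events=("end",)):
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text" and title == wanted_title:
                return elem.text
            elif name == "page":
                # Discard the finished page's contents so memory use stays modest.
                elem.clear()
    return None


# Hypothetical two-page dump, compressed in memory for the demonstration.
sample_xml = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    "<page><title>Apple</title><revision><text>A fruit.</text></revision></page>"
    "<page><title>Banana</title><revision><text>Another fruit.</text></revision></page>"
    "</mediawiki>"
)
sample_dump = bz2.compress(sample_xml.encode("utf-8"))

print(find_page_text(io.BytesIO(sample_dump), "Banana"))  # Another fruit.
```

A linear scan like this would be slow on a multi-gigabyte dump, so a practical reader would presumably need some kind of title index on top of it.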

Is there a way to get diff updates periodically?

>> pages-articles.xml.bz2 – Current revisions only, no talk or user pages; this is probably what you want, and is approximately 11 GB compressed (expands to over 49 GB when uncompressed).

If I want to cache the whole of Wikipedia locally, I can download the whole 11 GB. But what if I want to update it periodically, once a day or once a week? Do I have to download the whole 11 GB again, or is there a better way?

Balkierode (talk) 22:33, 14 July 2015 (UTC)