Wikipedia talk:Database download
- Please note that questions about the database download are more likely to be answered on the xmldatadumps-l or wikitech-l mailing lists than on this talk page.
- 1 New version of bzip2 executable?
- 2 How to download all users talk pages?
- 3 How to create a Torrent?
- 4 Torrent download.
- 5 Anywhere to get a hard drive with the most recent dump on it?
- 6 I suggest that those editing the BitTorrent-related specifics should know the BitTorrent protocol thoroughly. BitTorrenting is not simply a matter of making 'burnbit' torrents, as many here may think. It's more than that.
- 7 Privacy: Timestamps in database downloads and history
- 8 We are missing an offline wiki reader program (that preferably can read compressed wiki dump files)
New version of bzip2 executable?
Wikipedia:Database download#Dealing with compressed files contains a link to ftp://sources.redhat.com/pub/bzip2/v102/bzip2-102-x86-win32.exe, which is nice. Is there a way to link to an executable of the current version, 1.0.6? Thanks! GoingBatty (talk) 00:54, 1 February 2014 (UTC)
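For what it's worth, a separate executable may not be needed at all: Python's standard-library bz2 module can stream-decompress a dump. A minimal sketch, assuming Python 3.3 or later (needed for multi-stream input); the function name decompress_bz2 is hypothetical:

```python
import bz2
import shutil


def decompress_bz2(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> None:
    """Stream-decompress a .bz2 file to dst_path without loading it all into memory.

    bz2.open reads files made of several concatenated bz2 streams transparently
    (Python 3.3+), so multistream dumps decompress to one continuous output.
    """
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Copy in fixed-size chunks so memory use stays flat regardless of dump size.
        shutil.copyfileobj(src, dst, chunk_size)
```

For example, decompress_bz2("enwiki-latest-pages-articles.xml.bz2", "enwiki-latest-pages-articles.xml") would write the uncompressed XML next to the archive (hypothetical file names; substitute the dump you actually downloaded).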
How to download all users talk pages?
How to create a Torrent?
I notice that the March and April dumps are missing from meta:data dump torrents. Is there an idiot's guide to getting a new dump from dumps.wikimedia.org onto the torrents page? -- John of Reading (talk) 07:37, 7 April 2014 (UTC)
- (More) Actually the burnbit.com home page makes it idiot-proof. Even I have managed to do it. -- John of Reading (talk) 18:52, 15 June 2014 (UTC)
- Several years of torrent links are listed at meta:Data dump torrents#enwiki with the latest at the top - the most recent was added only yesterday. If a link to "the latest torrent" is displayed at Wikipedia:Database download as well, they are sure to get out of sync. I think it is safer to have no torrent links here, and send people over to the page at meta. -- John of Reading (talk) 15:14, 14 August 2014 (UTC)
Anywhere to get a hard drive with the most recent dump on it?
Is there anywhere to get the most recent dump, or some subset of it (say, enwiki talk and WP namespaces with histories intact), without downloading it? I live in a rural area with a connection so poor that even downloading the current version (which wouldn't actually be helpful to me anyway) is not practical. Hard drives preloaded with the database seem like something someone might have thought to offer. Has anyone come across this? In other words, I want a hard drive with the data on it shipped to me. Thoughts? Thanks. --18.104.22.168 (talk) 22:30, 23 August 2014 (UTC)
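I suggest that those editing the BitTorrent-related specifics should know the BitTorrent protocol thoroughly. BitTorrenting is not simply a matter of making 'burnbit' torrents, as many here may think. It's more than that.
Recently my edit (which had stood for more than a year) was cut down to a few words, without any explanation of the benefits!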
My edit: Additionally, there are numerous benefits to using BitTorrent rather than an HTTP/FTP download. In an HTTP/FTP download, there is no way to verify the integrity of a file until the whole download has finished. Because of that limitation, data corruption is harder to catch early when downloading a large file. With BitTorrent downloads, the SHA-1 hash of each block/chunk is computed during the download. If the computed hash doesn't match the hash stored in the *.torrent file, that particular block/chunk is discarded and downloaded again. As a result, a BitTorrent download is generally more robust against data corruption than a regular HTTP/FTP download.
New edit: ** Use BitTorrent to download the data dump, as torrenting has many benefits.
Even on the new Meta page, I see no reference to the 'benefits' he/she is referring to.
The new edit was made by this IP: 22.214.171.124
Could I know why this edit was made, and why he/she didn't provide the correct reference when editing? Why did he/she alone decide to delete that whole passage?
I'm puzzled as to what the reason could be!
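The per-piece check described in the quoted edit can be sketched in Python. In the BitTorrent v1 metainfo format, the info dictionary of a .torrent file stores a "piece length" and a "pieces" string made of concatenated 20-byte SHA-1 digests, one per piece (what the edit calls a block/chunk). This sketch skips parsing the bencoded .torrent file itself, and the helper names are hypothetical; a real client verifies each piece as it arrives and re-requests failures, rather than checking a complete buffer:

```python
import hashlib
from typing import List

SHA1_LEN = 20  # a SHA-1 digest is 20 bytes


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Build the concatenated SHA-1 'pieces' value for data (the last piece may be shorter)."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def split_piece_hashes(pieces: bytes) -> List[bytes]:
    """Split a concatenated 'pieces' string into individual 20-byte digests."""
    if len(pieces) % SHA1_LEN != 0:
        raise ValueError("pieces length must be a multiple of 20")
    return [pieces[i:i + SHA1_LEN] for i in range(0, len(pieces), SHA1_LEN)]


def bad_pieces(data: bytes, piece_length: int, expected: List[bytes]) -> List[int]:
    """Return the indices of pieces whose SHA-1 does not match the expected digest.

    These are the pieces a client would throw away and fetch again.
    """
    bad = []
    for index, digest in enumerate(expected):
        piece = data[index * piece_length:(index + 1) * piece_length]
        if hashlib.sha1(piece).digest() != digest:
            bad.append(index)
    return bad
```

Because each piece is checked independently, a single flipped byte costs one piece's worth of re-download, whereas a whole-file checksum over HTTP/FTP can only say that something, somewhere, is wrong.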
Privacy: Timestamps in database downloads and history
I haven't looked into the database dumps, but even if you only get a snapshot and not the edit history, you can track users (and their timestamps) by downloading the dumps frequently.
By going through people's edit histories or pages' "View history", you can track users. Should the timestamps there, and in the dumps, be accurate down to the second?
It seems to me that on talk pages the timestamps in signatures would be preserved, but apart from that, couldn't all available information on users be shown with, say, day granularity of timestamps, or not at all? Is the timestamp important to anyone? I feel like I should see all of mine (when logged in). But then I could infer a timeframe for others' edits between mine... Still, I would have to edit a lot, and it would only work for those pages and users between my edits.
I'm not sure how timestamps appear across time zones. Because of screen scraping, maybe reducing the granularity would be pointless, as it could be evaded?
We are missing an offline wiki reader program (that preferably can read compressed wiki dump files)
It would be awesome to download Wikipedia and have a program read the data archive, like a browser does online, but for offline viewing; that way, all I would need is a wiki dump file and the program (as a basic 'reader' program).
I don't need the 'talk', 'discussion' or 'modification' parts, as long as internal hyperlinks lead to the right page, and ideally it should be possible to extract the needed page 'on the fly' rather than having to extract the whole wiki dump file onto my hard disk.
There are a lot of compressed text readers; however, a compressed HTML browser would be fine. It should also strip the pages of redundant data, such as headers and other elements that only make sense online.
I basically want a book reader, where the main page works like a search engine for loading the subpages (on whatever Wikipedia topic I want to read). — Preceding unsigned comment added by 126.96.36.199 (talk) 08:25, 6 December 2014 (UTC)
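Part of this may already be possible with the multistream variant of the pages-articles dump, which is built from many concatenated bz2 streams and comes with a separate index file whose lines have the form offset:page_id:title, so a single page can be decompressed on the fly. A rough Python sketch of that lookup, assuming that index format and assuming the stream at each offset contains only <page> elements; the function names are hypothetical, and the sketch returns raw wikitext, not rendered HTML:

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def find_offset(index_lines: Iterable[str], title: str) -> Optional[int]:
    """Return the byte offset of the bz2 stream holding `title`, or None.

    Assumes lines like 'offset:page_id:title'; titles can contain colons,
    so only the first two colons are used as separators.
    """
    for line in index_lines:
        if not line.strip():
            continue  # skip blank lines
        offset, _page_id, page_title = line.rstrip("\n").split(":", 2)
        if page_title == title:
            return int(offset)
    return None


def read_stream(path: str, offset: int, chunk_size: int = 1 << 16) -> bytes:
    """Decompress only the single bz2 stream that starts at `offset`."""
    decomp = bz2.BZ2Decompressor()
    out = []
    with open(path, "rb") as f:
        f.seek(offset)
        # BZ2Decompressor stops at the end of one stream (eof becomes True);
        # bytes belonging to the next stream are left in decomp.unused_data.
        while not decomp.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # truncated file: return whatever was decoded
            out.append(decomp.decompress(chunk))
    return b"".join(out)


def extract_wikitext(stream_xml: bytes, title: str) -> Optional[str]:
    """Find `title` among the <page> elements of one stream and return its wikitext."""
    # A stream holds several <page> elements with no single root, so wrap them.
    root = ET.fromstring(b"<pages>" + stream_xml + b"</pages>")
    for page in root.iter("page"):
        if page.findtext("title") == title:
            return page.findtext("revision/text")
    return None
```

In practice the index lines would come from something like bz2.open(index_path, "rt", encoding="utf-8"), and a reader would build the title-to-offset mapping once instead of rescanning the index for every lookup.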