Talk:Gzip: Difference between revisions


Revision as of 07:29, 12 June 2016

This is the talk page for discussing improvements to the Gzip article.
This is not a forum for general discussion of the article's subject.


Computing: Software Start‑class

	This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start	This article has been rated as Start-class on Wikipedia's content assessment scale.
???	This article has not yet received a rating on the project's importance scale.
	This article is supported by WikiProject Software.


Old Discussion

"Although its file format also allows for multiple such streams to be concatenated together (these are simply decompressed concatenated as if they were one), gzip is normally used to compress just single files."

I don't understand what this sentence is trying to say. --Gbleem 19:08, 15 October 2006 (UTC)[reply]

Here's §2.2 of RFC 1952:

A gzip file consists of a series of "members" (compressed data sets). The format of each member is specified in the following section. The members simply appear one after another in the file, with no additional information before, between, or after them.

That is, a member is a compressed file and the gzip format allows one .gz file to contain multiple compressed files. In practice, however, it is always better to use either a ZIP archive or a compressed tar archive to handle multiple files, so people rarely put more than one member in a compressed file. In fact, the gzip program provides no easy way to do that. Furthermore, when used to decompress a .gz file containing multiple members, the program does something quite unhelpful (creates a single file containing the concatenated contents of the original members) instead of (for instance) recreating the original members as individual files.

I hope this helps. Cheers, CWC(talk) 05:22, 16 October 2006 (UTC)[reply]
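
The concatenation behaviour described in this reply can be checked with a short Python sketch. It assumes only the standard-library gzip module; the helper name join_members and the payload strings are made up for illustration.

```python
import gzip


def join_members(*payloads: bytes) -> bytes:
    """Compress each payload as its own gzip member and concatenate the members."""
    return b"".join(gzip.compress(p) for p in payloads)


# Two independent members placed back to back, with nothing before,
# between, or after them (RFC 1952, section 2.2).
multi_member = join_members(b"first file\n", b"second file\n")

# Decompression returns the payloads joined together; the member
# boundary is not recoverable from the result.
restored = gzip.decompress(multi_member)
```

Here restored holds both payloads as one byte string, which matches the "single file containing the concatenated contents" behaviour described above.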

AdvanceCOMP/7-zip

AdvanceCOMP and 7-zip use a DEFLATE implementation which produces gzip-compatible files with better compression ratios than gzip itself, at the cost of more processor time.

I removed this because 7-zip does not produce gzip-compatible files. Even using the deflate method, 7-zip files cannot be decompressed with gzip. If you contend otherwise, please post the required commands to compress a file with 7-zip and then decompress it with gzip. I used: '7za a test.7z test.bin' then 'mv test.7z test.gz' then 'gunzip test.gz' and gunzip reported: "gunzip: test.gz: not in gzip format". '7za l -slt test.7z' reports "Method = Deflate". I don't know anything about AdvanceCOMP. If you can verify that an AdvanceCOMP-compressed file can be decompressed using gzip, feel free to re-add the part about that program, but please clarify whether *all* or just *some* AdvanceCOMP archives are gzip-compatible, if so. Ramorum (talk) 07:37, 17 February 2008 (UTC)[reply]

Hello Ramorum, did you try using the -t switch? This should select the container format; eg. -t7z, -tzip, -tgz (the last should produce gzip encapsulated DEFLATE) using the internal DEFLATE implementation. —Sladen (talk) 15:39, 17 February 2008 (UTC)[reply]

Good source

There's an excellent concise discussion of the specific techniques used in gzip in the book Managing Gigabytes, section 2.6, pp.78–79. It even includes some important implementation details not in the RFC. I'll add some info from there once I've dealt with the more fundamental Ziv-Lempel articles. Dcoetzee 19:37, 9 May 2008 (UTC)[reply]

--rsyncable option

perhaps some discussion of the rsyncable option, which occasionally resets the compressor so that an early change can still leave later sections identical after compression (for smaller deltas), might be worthwhile. I don't know enough to do it myself. —Preceding unsigned comment added by 216.106.175.189 (talk) 19:04, 3 June 2008 (UTC)[reply]
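
The idea in this comment can be sketched in Python. This is not gzip's actual --rsyncable code, only an illustration of the principle under stated assumptions: chunk ends are picked from a rolling sum of recent input bytes, and the compressor is reset there with zlib's Z_FULL_FLUSH. WINDOW, MODULUS, and both function names are hypothetical.

```python
import zlib

WINDOW = 64     # rolling-sum window in bytes (hypothetical value)
MODULUS = 512   # a chunk ends where rolling sum % MODULUS == 0 (hypothetical value)


def chunk_ends(data: bytes):
    """Yield offsets where a chunk ends, decided only by the last WINDOW bytes."""
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # keep the sum over exactly WINDOW bytes
            if rolling % MODULUS == 0:
                yield i + 1


def compress_resettable(data: bytes) -> bytes:
    """Raw DEFLATE with a compressor reset at every content-defined chunk end."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # wbits=-15: no header or checksum
    out = []
    start = 0
    for end in chunk_ends(data):
        out.append(comp.compress(data[start:end]))
        # A full flush byte-aligns the output and discards the match history,
        # so what follows no longer depends on earlier input.
        out.append(comp.flush(zlib.Z_FULL_FLUSH))
        start = end
    out.append(comp.compress(data[start:]))
    out.append(comp.flush())
    return b"".join(out)
```

Because the chunk ends depend only on nearby content, a change early in the input should disturb only the compressed output up to the next reset point, leaving the rest byte-identical for a delta tool such as rsync. The cost is a somewhat worse compression ratio, since every reset throws away the history.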

Examples

The examples section needs to be rewritten and incorporated into the rest of the article; right now it is a (woefully) incomplete how-to, which has no place on Wikipedia. --mcpusc (talk) 20:04, 22 February 2010 (UTC)[reply]

I've rewritten the examples into what I consider useful; what was there before was a random collection. Let's keep it to a simple compress/decompress and tar example; anything more strays too far into a usage guide, which doesn't belong on Wikipedia. --mcpusc (talk) 20:15, 22 February 2010 (UTC)[reply]

I couldn't find anything in the charter prohibiting my mass gzip example (find . -name "*.txt" -mtime 1 -type f -print0 | xargs -0 gzip), which is fundamental, illustrative, and very useful. JackPotte (talk) 22:10, 25 February 2010 (UTC)[reply]

Think about the target audience for this page. Does your mass gzip explain something about gzip? I don't think it does. I think your example would be great for 'find'!

Usefulness (from a system administrator's perspective) doesn't warrant inclusion here on Wikipedia. This is what WP:NOT is saying: we're here to describe what gzip does, not to describe the cool things one can do with it or to provide a collection of nifty scripts. --mcpusc (talk) 23:58, 25 February 2010 (UTC)[reply]

Apart from the Wikimedia scripts indeed. JackPotte (talk) 18:46, 27 February 2010 (UTC)[reply]

Many links are confusing

Why are there so many links to RFC 1950, RFC 1951 and so on? Wouldn't one of each be enough? They are somewhat confusing, especially in the "other uses" section. Gerdschoenle (talk) 10:30, 2 August 2010 (UTC)[reply]

These links are generated by MediaWiki. If you dislike some particular link, be bold and rewrite its wikitext as “RFC 1950” and so on. Incnis Mrsi (talk) 18:17, 2 August 2010 (UTC)[reply]

DEFLATE

What is the relationship between gzip and DEFLATE? Did gzip create the DEFLATE algorithm? Does gzip use the DEFLATE algorithm? Int21h (talk) 21:06, 7 January 2011 (UTC)[reply]

At the risk of doing somebody else's homework: the DEFLATE bitstream encoding was created by Phil Katz for PKZIP (the DEFLATE article states this, right at the top). Other encoders/decoders (zlib and gzip) were built to the same specification, as the bitstream was standardised in an RFC. The compression bitstream used by PKZIP and gzip is the same, just with different headers and trailers around it, and this is what makes a zip file a zip file and a gzip file a gzip file. It is possible to transcode between the two without recompressing by just replacing the .zip/.gz headers while keeping the core content stream the same.

It's a bit like the situation with an MPEG video stream, which is standardised and for which there are many different compressors (television cameras, Xvid, Divx, ...) and many different decompressors (TVs, set top boxes, VLC, ...). At the same time the actual stream can be stored inside a .mov, .avi, .ts, ... all of which are the same content but with slightly different headers. Is that useful, does anything else warrant further expansion/explanation? —Sladen (talk) 23:07, 7 January 2011 (UTC)[reply]
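
The "same bitstream, different headers" point can be illustrated with a short Python sketch using the standard-library zlib, gzip, and struct modules. ZIP's local headers and central directory are more involved, so the sketch swaps in the zlib wrapper (RFC 1950) as the second container next to gzip (RFC 1952). The function names and the payload are hypothetical, and the headers are the smallest valid ones rather than what any particular tool writes.

```python
import gzip
import struct
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress to a bare DEFLATE stream (RFC 1951) with no container around it."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # negative wbits: no header, no trailer
    return comp.compress(data) + comp.flush()


def wrap_as_gzip(stream: bytes, data: bytes) -> bytes:
    """Add a minimal RFC 1952 header and trailer around an existing DEFLATE stream."""
    # magic bytes, CM=8 (deflate), no flags, no mtime, no extra flags, OS=255 (unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0, 0, 0, 255)
    # CRC-32 and length (mod 2**32) of the uncompressed data, little-endian
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + stream + trailer


def wrap_as_zlib(stream: bytes, data: bytes) -> bytes:
    """Add an RFC 1950 header and Adler-32 trailer around the same DEFLATE stream."""
    header = b"\x78\x9c"  # CM=8, 32 KiB window, no preset dictionary
    trailer = struct.pack(">I", zlib.adler32(data))  # big-endian checksum
    return header + stream + trailer


payload = b"the same compressed bits, two different wrappers\n" * 20
stream = raw_deflate(payload)
as_gzip = wrap_as_gzip(stream, payload)
as_zlib = wrap_as_zlib(stream, payload)
```

Both as_gzip and as_zlib carry the identical stream bytes in the middle; gzip.decompress accepts the first and zlib.decompress the second, without the data ever being recompressed.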

Well, I missed the obvious "gzip is based on the DEFLATE algorithm" statement in the article, which is what I was looking for. Int21h (talk) 02:01, 12 January 2011 (UTC)[reply]

Confusing structure

While researching HTTP compression, I came across this and the other articles (zlib, DEFLATE, ...). They offer redundant information, and everything is centered on their close historical relationship and these three RFCs (1950, 1951, 1952). I would propose a restructuring with at least the following entities (as pages or headings):

deflate:
- as a compression algorithm and stream format (RFC 1951)
- the historical development of deflate, pkzip, zip, gzip, zlib
- a reference to the role of deflate in the data formats zip, gzip, zlib, png, ...
- a reference to the role of deflate in HTTP Compression
gzip:
- as an application
- as a compressed container file/stream format (RFC 1952) that uses deflate by default
- a reference to the role of gzip in HTTP Compression
- a reference to the historical development (described in deflate)
zlib:
- as a library
- as a compression file/stream format (RFC 1950) that uses deflate by default
- a reference to the role of zlib in HTTP Compression
- a reference to the historical development (described in deflate)
HTTP Compression

--Voeren (talk) 08:58, 24 March 2011 (UTC)[reply]

Citation needed?

AdvanceCOMP and 7-Zip can produce gzip-compatible files, using an internal DEFLATE implementation with better compression ratios than gzip itself—at the cost of more processor time compared to the reference implementation ^{[citation needed]}.

A citation is not needed, because everyone can see that

7z -si -so -tgzip -mx=9 < file > file.gz

produces smaller files than

gzip --best < file > file.gz

46.112.87.125 (talk) 23:42, 11 April 2011 (UTC)[reply]

Confusing and unhelpful article

This article is so full of technical speak and details that it is incomprehensible to the average reader. This article is likely to be understood only by those who already know all the information it contains.

The "Other uses" section is confusing - I missed the part where the primary uses were discussed.

I came here trying to understand gzip's use in HTTP compression, seeking to learn which servers and browsers support it, and how it is enabled. After re-reading the article several times, I finally found a brief couple of highly-technical sentences where I almost found the information I was looking for.

This article needs to be re-written to be understandable by a wider audience. Perhaps all that is needed is to state the (seemingly) obvious.

-- Gilly3 (talk) 20:37, 29 December 2011 (UTC)[reply]

About the .tbz2 extension

When I first came across tbz in this article, I thought it was a typo and replaced it with tbz2. That change was reverted. I should have paid attention to the fact that some people seem to use tbz for bzip2-compressed tar archives (even though it looks more like tar + bzip than tar + bzip2, but that is beside the point). Sorry for that. However, I believe tbz2 must be mentioned: ".tbz2" generates nearly 2 million hits on Google (the first ones, at least, being about tar and bzip2) and is mentioned in the bzip2 manual page.

-- FC 21 January 2012 — Preceding unsigned comment added by 82.66.64.12 (talk) 22:30, 21 January 2012 (UTC)[reply]

Incomplete File Format

I came here to see what the file format for a typical (minimalistic) gzip file was. All I got was that there's a 10-byte header + optional extra headers + 8-byte footer, without any explanation of what these contain. Very disappointing! Jahibadkaret (talk) 15:26, 17 March 2012 (UTC)[reply]

I doubt it would add much to the article; anyone who needs this much detail is probably better served by the format specification (page 5), which is clearly linked in the reference section. — DataWraith (talk) 21:23, 17 March 2012 (UTC)[reply]
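
For readers who want a quick look at those fields without opening the specification, here is a minimal Python sketch based on the layout in RFC 1952. It assumes a single-member file; describe_member is a hypothetical helper name, and the optional fields announced by the flag byte are not decoded.

```python
import gzip
import struct
import zlib


def describe_member(blob: bytes) -> dict:
    """Decode the fixed 10-byte header and 8-byte trailer of a single-member gzip file."""
    # ID1, ID2, CM, FLG (1 byte each), MTIME (4 bytes), XFL, OS (1 byte each), little-endian
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", blob[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    # CRC-32 and size (mod 2**32) of the uncompressed data
    crc32, isize = struct.unpack("<II", blob[-8:])
    return {
        "compression_method": cm,  # 8 means DEFLATE
        "flags": flg,              # bits announcing optional fields (extra, name, comment, header CRC)
        "mtime": mtime,            # Unix timestamp, 0 if not recorded
        "extra_flags": xfl,
        "os": os_code,
        "crc32": crc32,
        "isize": isize,
    }


info = describe_member(gzip.compress(b"hello gzip"))
```

The compressed DEFLATE data, plus any optional header fields, sits between the two fixed parts that this helper reads.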

Most used software in the world?

I've heard it said that zlib is the most popular software in the world (Richard Hipp suggests it's more popular than SQLite at the 29:15 mark of his interview on episode 201 of The Changelog, https://changelog.com/201/). Does anyone have anything that could back this up?

80.194.75.37 (talk) 07:29, 12 June 2016 (UTC)[reply]
