Jump to content

Talk:Delta encoding

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 76.119.30.87 (talk) at 00:21, 11 February 2015 (→‎"skip deltas" variant in Subversion: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

WikiProject iconComputing Start‑class
WikiProject iconThis article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
StartThis article has been rated as Start-class on Wikipedia's content assessment scale.
???This article has not yet received a rating on the project's importance scale.

C code is wrong

I think the C code is not correct

You are correct.
If you run the example in the text trough this algorightm (2, 4, 6, 9, 7), the encoded becomes:
2, 2, 4, 5, 2
The decode then becomes:
2, 4, 8, 13, 15
The decoder is correct (checked with encoded data 2, 2, 2, 3, -2):
2, 4, 6, 9, 7 (which is the right awnser).
I will correct the encoder after I add this comment. ShadowLord 09:14, 7 Apr 2005 (UTC)
The C code is utter rubbish. Delta encoding requires solving the longest common substring problem. Many approximate solutions exist, but exact solution requires O(n**2) operations. — Preceding unsigned comment added by 107.16.107.127 (talk) 22:57, 24 April 2013 (UTC)[reply]

Sample Java code

The following Java code performs a simple form of delta encoding and decoding:

public class Delta {
	public static int[] deltaEncode(int[] buffer) {
		int original, delta = 0;
		for (int i = 0; i < buffer.length; ++i) {
			original = buffer[i];
			buffer[i] -= delta;
			delta = original;
		}
		return buffer;
	}

	public static int[] deltaDecode(int[] buffer) {
		int delta = 0;
		for (int i = 0; i < buffer.length; ++i) {
			buffer[i] += delta;
			delta = buffer[i];
		}
		return buffer;
	}

	public static void main(String[] args) {
		int[] buffer = { 2, 4, 6, 9, 7 };
		buffer = deltaEncode(buffer);
		for (int c : buffer) {
			System.out.print(c + " ");
		}
		System.out.println();

		buffer = deltaDecode(buffer);
		for (int c : buffer) {
			System.out.print(c + " ");
		}
	}
}

Generalisation of delta encoding

Could this article be expanded to a more general view of delta encoding as encoding only the differences between things? A mention of diff and similar might be quite good. porges 09:38, Apr 7, 2005 (UTC)

I've changed the text to link to delta compression article, which have more direct relation to revision control in general and diff in particular. 217.26.163.26 07:23, 4 May 2005 (UTC)[reply]

Error control

Error control has different meaning in this article - it's used to find differences from other version of the same file without transfering it, not to detect corrupted data. This is the way how rsync/zsync use it. Please, don't change this meaning! 217.26.163.26 07:06, 4 May 2005 (UTC)[reply]


symmetric delta

The article currently states

A symmetric delta can be expressed as , where and represent two successive versions.

OK, that is a pretty formula, but what does it mean? (What does the "\" mean in this context?)

I'm guessing it means something like

Given two versions and , we can compute "the delta" between them.
A directed delta tells us, once we know , how to generate -- but if we only know and the directed delta, it's generally impossible to recover .
A symmetric delta can go both ways -- if we know , it tells us how to generate . But if we only know , that same symmetric delta tells us how to generate .

Is that all it means ? Is it OK if I replace the mysterious formula with this English-language description? --DavidCary 03:19, 6 October 2005 (UTC)[reply]


If I read the formula correctly, it means the same as , which I can read easier in English: the symmetric delta of two versions is the elements of each version without the elements common to both.

I'm guessing that directed delta could be defined by , and would read: the elements of second version without the elements of the first.

I don't feel that the formula adds much to the article. --Eruionnyron 19:10, 9 October 2005 (UTC)[reply]

Potential sources for new article

This article is crappy. I'll rewrite it, but having a new list of sources would be quite helpful. If anyone wants to chip in I'd be very pleased.

What??

"Perhaps the simplest example is storing values of bytes as differences (deltas) between sequential values, rather than the values themselves. So, instead of 2, 4, 6, 9, 7, we would store 2, 2, 2, 3, -2."

This wouldn't save any space at all. Is this actually used to save computer data file revisions? I highly doubt it. We already have an article for delta modulation, as used in audio (and similar to what is used in video). This article is supposed to be about saving revisions of large chunks of data, right? Involving the unix diff utility? Not about the difference between subsequent samples within a chunk of data. — Omegatron 20:55, 18 May 2007 (UTC)[reply]
So what if it doesn't save space, it is still delta encoding. Delta encoding is used for more than file revisions. Delta modulation isn't exactly the same thing. Nikola (talk) 06:41, 19 October 2008 (UTC)[reply]
Actually such delta encoding MAY save space if there is other algorithm runs AFTER delta algorithm. For example, LZ or RLE algorithms can't do anything with sequence like 2,4,6,9,7. But 2,2,2 surely can be compressed further to something like instruction to repeat '2' digit for 3 times and this could be more compact than storing 2, 2, 2 directly. This is a well-known method of improving compression ratio of archivers on certain kinds of data (like audio and other certain sorts of data as well) and this IS kind of delta encoding. Not to mention ADPCM and similar algos. —Preceding unsigned comment added by 91.78.236.168 (talk) 15:12, 5 December 2008 (UTC)[reply]
Yes, the point of delta coding is not compression but to enable other effective compression techniques. In particular, universal codes are most efficient on small values. Dcoetzee 15:41, 5 December 2008 (UTC)[reply]

Varint encoding benefits a LOT from delta encoding, as it can use from 1 to 4 bytes to represent each integer, and a delta-encoded stream will increase the number of small values @nobre November 7, 2011 — Preceding unsigned comment added by 200.251.63.3 (talk) 15:55, 7 November 2011 (UTC)[reply]

I agree that "subtracting" each sample from the consecutive next sample is a common and useful technique when compressing a *single* audio file, a technique that should be described somewhere in the Wikipedia.
However, I agree with Omegatron that such differential pulse-code modulation is conceptually different from delta copying, which always involves 2 different files (or at least 2 different versions of a file).
How can we avoid confusing our readers? --DavidCary (talk) 19:15, 2 February 2015 (UTC)[reply]

"skip deltas" variant in Subversion

apparently Apache Subversion uses a variant of delta encoding with "revision N is stored as a delta against N-f(N), where f(N) is the greatest power of two that divides N". Some discussion here http://svn.apache.org/repos/asf/subversion/trunk/notes/skip-deltas . I guess this should go into Delta encoding#Variants section. 76.119.30.87 (talk) 00:21, 11 February 2015 (UTC)[reply]