Talk:Opus (audio format)

From Wikipedia, the free encyclopedia
This article is of interest to the following WikiProjects:
WikiProject Telecommunications (Rated Start-class, Mid-importance)
This article is within the scope of WikiProject Telecommunications, a collaborative effort to improve the coverage of Telecommunications on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.

WikiProject Free Software / Software / Computing (Rated Start-class, Mid-importance)
This article is within the scope of WikiProject Free Software, a collaborative effort to improve the coverage of free software on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.
This article is supported by WikiProject Software (marked as Mid-importance).
This article is supported by WikiProject Computing (marked as Mid-importance).

Codec delay values

I've been trying to do the math to understand the codec's delay, in particular separating the delay due to frame size from the algorithmic look-ahead delay. However, I can't get the numbers to add up.

  • The Opus RFC says
    • "algorithmic delays ranging from 5 ms to 65.2 ms"
    • "[The LP layer, based on SILK,] requires an additional 5 ms look-ahead for noise shaping estimation. A small additional delay (up to 1.5 ms) may be required for sampling rate conversion."
    • "A "Hybrid" mode allows the use of both layers simultaneously with a frame size of 10 or 20 ms"
    • "[The MDCT layer, based on CELT, supports] frame sizes from 2.5 ms to 20 ms, and requires an additional 2.5 ms look-ahead due to the overlapping MDCT windows."
    • "To compensate for the different look-ahead required by each layer, the CELT encoder input is delayed by an additional 2.7 ms. ... However, the base 2.5 ms look-ahead in the CELT layer cannot be reduced in the encoder because it is needed for the MDCT overlap"
    • "a CELT-only mode for very low delay speech"
  • Xiph's paper "High-Quality, Low-Delay Music Coding in the Opus Codec"
    • "Opus scales to delays as low as 5 ms"
    • "CELT's look-ahead is 2.5 ms, while SILK's look-ahead is 5 ms, plus 1.5 ms for the resampling (including both encoder and decoder resampling). For this reason, the CELT path in the encoder adds a 4 ms delay. However, an application can restrict the encoder to CELT and omit that delay. This reduces the total look-ahead to 2.5 ms."
  • Nokia's paper "Voice Quality Characterization of IETF Opus Codec"
    • "The listening tests in this were performed with a constant window length of 20 ms. With that frame length there is also 5 ms look ahead."

Like I said, I can't get these numbers to add up. Here's my math, and the problems with it:

  1. In CELT-only mode, 2.5 ms of algorithmic delay + minimum frame size of 2.5 ms = 5 ms total delay (the minimum the codec can have)
  2. In SILK-only mode, 5 ms of algorithmic delay + minimum frame size of 10 ms + no resampling delay (best case, in which no resampling is needed) = 15 ms total delay
  3. In Hybrid mode, ??? algorithmic delay + frame size + ??? resampling delay = ??? total delay
    • Algorithmic delay seems like 5 ms due to SILK
    • Resampling delay seems like 1.5 ms, again due to SILK
    • This should give a sum of 6.5 ms, which agrees with Xiph's paper and most of the RFC
      • But it doesn't agree with the maximum delay mentioned in the RFC (65.2 ms)... why isn't this 66.5 ms?
      • CELT delays its output, but by how much? The RFC says 2.7 ms (which would sum to 5.2 ms), while Xiph's paper says 4 ms (which sums to 6.5 ms).

Can anyone shed some light on the disagreements between these figures? --Bigpeteb (talk) 17:13, 20 May 2014 (UTC)
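The arithmetic in the numbered list above can be written out as a small Python sketch. It follows the poster's own working definition (look-ahead plus frame size, plus resampling where applicable) and uses only the figures quoted from the RFC and the Xiph paper; the helper name `total_delay_ms` and its parameters are hypothetical and not part of any Opus API.

```python
# Sketch of the per-mode delay arithmetic above (all values in milliseconds).
# Assumed working definition: total delay = frame size + look-ahead + resampling.

def total_delay_ms(frame_ms: float, lookahead_ms: float, resampling_ms: float = 0.0) -> float:
    """Hypothetical helper: sum the delay components of one encode/decode path."""
    return frame_ms + lookahead_ms + resampling_ms

# 1. CELT-only: 2.5 ms MDCT-overlap look-ahead, 2.5 ms minimum frame.
celt_only = total_delay_ms(frame_ms=2.5, lookahead_ms=2.5)

# 2. SILK-only: 5 ms noise-shaping look-ahead, 10 ms minimum frame, no resampling.
silk_only = total_delay_ms(frame_ms=10.0, lookahead_ms=5.0)

# 3. Hybrid look-ahead only (frame size left out), computed per layer:
silk_path = 5.0 + 1.5          # SILK look-ahead + resampling
celt_path_rfc = 2.5 + 2.7      # CELT overlap + extra input delay quoted from the RFC
celt_path_paper = 2.5 + 4.0    # CELT overlap + extra input delay quoted from Xiph's paper

print(f"CELT-only: {celt_only} ms, SILK-only: {silk_only} ms")
print(f"Hybrid SILK path: {silk_path} ms")
print(f"Hybrid CELT path: {celt_path_rfc:.1f} ms (RFC), {celt_path_paper} ms (paper)")
```

With the paper's 4 ms figure the two hybrid paths agree at 6.5 ms; with the RFC's 2.7 ms they differ by 1.3 ms, which is the mismatch the question is about.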

I was quite careful to define what was meant by 'algorithmic delay' when I reworded the article and improved its citations in about December 2012 (from memory); I read and interpreted carefully the reference that best defined it, and cited it. The current wording in the section Quality comparison and low latency performance is much like what I used:
Total algorithmic delay for an audio format is the sum of delays that must be incurred in the encoder and the decoder of a live audio stream regardless of processing speed and transmission speed, such as buffering audio samples into blocks or frames, allowing for window overlap and possibly allowing for noise-shaping look-ahead in a decoder and any other forms of look-ahead, or for an MP3 encoder, the use of bit reservoir.
Your definition seems to exclude frame size, which mine includes. However, a working definition used in another document might exclude some items that the definition above includes. The term "algorithmic delay" is neither widely used nor consistently defined in the literature, as it is frequently more important to report the actual latency of a specific physical implementation (where processor speed and hardware latency also contribute to that figure). Sometimes "latency" is lazily used instead, which is OK qualitatively (high or low) but not quantitatively. Some items of optional delay, such as the length of the FIR filter used in resampling or anti-aliasing, might also get excluded. I don't have time to look into the specifics now, but I hope this helps you narrow in on a definition that makes sense. If in doubt, remove algorithmic delay and replace it with all its component delays so that you can avoid double-counting. The block diagrams of the codecs from xiph.org and their presentations, such as Jean-Marc Valin's talk, are quite helpful in showing where delay arises. Hope this helps a little. For a better discussion, likely to be viewed by some of the Opus developers, try joining the forums at Hydrogenaud.io [1] and asking on their Opus sub-forum. Dynamicimanyd (talk) 12:41, 3 July 2014 (UTC)
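A minimal Python sketch of the "list every component delay" suggestion, assuming hybrid mode with a 20 ms frame (chosen only as an example) and the SILK and resampling figures quoted from the RFC at the top of this section. The class and function names are hypothetical. The point it illustrates: once each term is itemized, a definition that includes frame size and one that excludes it can both be computed from the same list without counting anything twice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DelayComponent:
    name: str
    ms: float
    is_frame_buffering: bool  # True only for the frame-size term

# Hybrid mode, 20 ms frame (example). The CELT path (2.5 ms overlap + extra input delay)
# runs in parallel with SILK and is delayed to line up with it, so it is deliberately NOT
# listed as a separate term: adding it as well would double-count the look-ahead.
HYBRID_20MS = [
    DelayComponent("frame buffering", 20.0, True),
    DelayComponent("SILK noise-shaping look-ahead", 5.0, False),
    DelayComponent("resampling", 1.5, False),
]

def total(components, include_frame: bool) -> float:
    """Sum the itemized components, optionally leaving out frame buffering."""
    return sum(c.ms for c in components if include_frame or not c.is_frame_buffering)

with_frame = total(HYBRID_20MS, include_frame=True)       # definition that counts frame size
lookahead_only = total(HYBRID_20MS, include_frame=False)  # definition that excludes it

for c in HYBRID_20MS:
    print(f"{c.name}: {c.ms} ms")
print(f"with frame: {with_frame} ms, look-ahead only: {lookahead_only} ms")
```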

The numbers in the RFC are wrong :) The 2.7ms figure may apply to some old version, but it is currently 4ms in the reference implementation. This plus the 2.5ms inherent CELT delay matches the SILK 5ms delay plus the 1.5ms required for resampling, which is always performed in hybrid mode. Other implementations could differ. There is a specific low-delay mode that operates without the extra 4ms, but obviously only with heavy restrictions in place.

The 65.2ms is wrong. That would presumably be 60+2.5+2.7, which we already deduced was wrong. 66.5ms would be correct with resampling (60+5+1.5), but resampling doesn't always have to be done. Lithopsian (talk) 20:39, 15 August 2014 (UTC)
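The maximum-delay arithmetic in the comment above, as a short Python sketch. It assumes a 60 ms frame (the largest Opus frame duration) and uses only the per-layer figures discussed on this page; variable names are illustrative.

```python
MAX_FRAME_MS = 60.0  # largest Opus frame duration

# How the RFC's maximum of 65.2 ms appears to have been built:
rfc_max = MAX_FRAME_MS + 2.5 + 2.7   # frame + CELT overlap + 2.7 ms extra CELT delay (RFC figure)

# The same maximum via the SILK path (look-ahead + resampling):
silk_max = MAX_FRAME_MS + 5.0 + 1.5

# ...and via the CELT path with the 4 ms extra delay of the reference implementation:
celt_max = MAX_FRAME_MS + 2.5 + 4.0

print(f"RFC-style: {rfc_max:.1f} ms, SILK path: {silk_max} ms, CELT path: {celt_max} ms")
```

Both corrected paths give the same 66.5 ms, which is what one would expect if the CELT input delay exists to keep the two layers in sync.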

I'm the author of Opus. I confirm that the 65.2 ms and the 2.7 ms figures in the RFC are wrong. SILK has 5 ms delay + 1.5 ms for resampling. On the CELT side, there's a 2.5 ms overlap delay, and we add 4 ms to keep it in sync with SILK (unless the restricted-lowdelay application is selected). -- Preceding unsigned comment added by Jmvalin (talk • contribs) 17:29, 17 August 2014 (UTC)
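A hedged Python sketch of the rule described in the comment above: the encoder look-ahead is the 2.5 ms CELT overlap, plus 4 ms of extra CELT input delay unless the restricted low-delay application is selected. The `Application` enum is an illustrative stand-in for the encoder's application setting, not a real binding, and the 48 kHz sample rate used for the sample-count conversion is an assumption.

```python
from enum import Enum

class Application(Enum):
    # Illustrative stand-ins for the encoder "application" setting (hypothetical names).
    VOIP = "voip"
    AUDIO = "audio"
    RESTRICTED_LOWDELAY = "restricted-lowdelay"

CELT_OVERLAP_MS = 2.5     # MDCT overlap; cannot be removed
CELT_SYNC_DELAY_MS = 4.0  # extra CELT input delay that lines CELT up with SILK (5 + 1.5 ms)

def encoder_lookahead_ms(app: Application) -> float:
    """2.5 ms overlap, plus 4 ms sync delay unless restricted low-delay (CELT-only) is chosen."""
    if app is Application.RESTRICTED_LOWDELAY:
        return CELT_OVERLAP_MS  # no SILK path to wait for
    return CELT_OVERLAP_MS + CELT_SYNC_DELAY_MS

def ms_to_samples(ms: float, sample_rate_hz: int = 48_000) -> int:
    """Convert a duration in ms to a whole number of samples at the given rate."""
    return round(ms * sample_rate_hz / 1000)

for app in Application:
    la = encoder_lookahead_ms(app)
    print(f"{app.value}: {la} ms = {ms_to_samples(la)} samples at 48 kHz")
```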

Thank you Lithopsian and Jmvalin for your comments here and Bigpeteb for your edit to the live article. As for a citation, the top paragraph in page 2, column 2 of the AES paper by Valin et al.[2] gives the correct figures of SILK 5.0ms + 1.5ms = 6.5ms = CELT 2.5ms + 4.0ms. It also mentions the restricted low-delay alternative with 2.5 ms delay and CELT only. I will insert that reference and remove the {citation needed} tag. Dynamicimanyd (talk) 16:47, 19 August 2014 (UTC)
My new revision includes that reference and a modicum of explanatory rewording, and replaces the old, wrong, "opusdelay" named reference with the more recent and accurate "ValinAES135". Hopefully, this concludes the matter. Thanks to all for the initial spot and the fruitful discussion. Dynamicimanyd (talk) 18:18, 19 August 2014 (UTC)

Spectrogram

I'm not going to edit this article myself since I'm one of the main authors of Opus, but I really think the spectrogram should go -- or at the very least be done properly. These spectrograms (the ones for Vorbis/MP3/AAC too) aren't showing anything interesting. To be more precise, they're showing two things:

  • the low-pass thresholds for the particular encoder they've been generated with
  • that when you have a signal close to the max PCM value, you get clipping all over the place when decoding (all of these vertical stripes going all the way to Nyquist are clipping)

In no way do any of these spectrograms represent anything meaningful about the formats. Jmvalin (talk) 19:45, 29 August 2014 (UTC)
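As a rough, self-contained illustration of the clipping mechanism described above (not of the spectrograms themselves), this Python sketch converts decoded floating-point samples to 16-bit PCM and counts how many had to be hard-clipped. The 5 kHz "error" component is a synthetic stand-in for coding error, chosen only so that a near-full-scale 1 kHz tone overshoots ±1.0; the helper name is hypothetical and not part of any decoder API.

```python
import math

def to_int16_with_clip_count(samples):
    """Hypothetical helper: map floats in [-1, 1] to int16, hard-clipping anything outside.

    Returns the PCM values and the number of samples that had to be clipped.
    """
    pcm, clipped = [], 0
    for x in samples:
        v = round(x * 32767)
        if v > 32767 or v < -32768:
            clipped += 1
            v = max(-32768, min(32767, v))
        pcm.append(v)
    return pcm, clipped

sr = 48_000
n_samples = sr // 100  # 10 ms
# Near-full-scale 1 kHz tone, as in an input "close to the max PCM value".
tone = [0.99 * math.sin(2 * math.pi * 1000 * n / sr) for n in range(n_samples)]
# Synthetic stand-in for the small error a lossy codec adds on decode.
error = [0.02 * math.sin(2 * math.pi * 5000 * n / sr) for n in range(n_samples)]
decoded = [t + e for t, e in zip(tone, error)]

pcm, clipped = to_int16_with_clip_count(decoded)
print(f"{clipped} of {n_samples} samples clipped")
```

Hard clipping abruptly flattens the waveform peaks, which spreads energy across the spectrum; that is consistent with the stripes reaching Nyquist described above.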

I was pondering that same spectrogram when editing the article recently. It's the one diagram that shows little useful information. I imagine something like the bitrate sweep [3] audio-visual example would be more illustrative, and the whole site is Creative Commons licensed, allowing remixing. While we can't use JavaScript/Dynamic HTML on Wikipedia, we could for the time being screen-capture it and encode it to WebM or Theora for use here. I believe those formats in Wikimedia only support Vorbis audio, but we could transcode the decoded sweep.wav to high-quality Vorbis, such as -q 7 (~224 kbps) or above, to give a close approximation of sub-65 kbps Opus 1.0 with negligible transcoding artifacts, good enough for an illustrative example. Dynamicimanyd (talk) 16:38, 3 September 2014 (UTC)