Talk:Additive synthesis/Archive 2

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Criticism of the introductory paragraph

At the time of writing, the introductory paragraph is:

Additive synthesis is a sound synthesis technique, based on the mathematics of Fourier series. It creates timbre by adding overtones together.

This has a few problems:

- The first phrase "Additive synthesis is a sound synthesis technique" implies that we are discussing a single technique. This is arguable, since it is possibly a family of techniques, especially if you are going to include digital, analog, pipe organ, harmonic, inharmonic, fixed-waveform and time-varying synthesis. Note that the Roads quote I included elsewhere on this page defines it as a family of techniques. Nonetheless, if it is assumed to be an overarching category for all these techniques, then the second phrase of the first sentence is problematic:

- The second phrase "based on the mathematics of Fourier series" refers strictly to *harmonic* additive synthesis (as per the Theory section). Therefore it would be better to say "inspired by the mathematics of Fourier series" (influenced perhaps?) or some similar construction. I think it used to have something softer like this.

- In a similar vein "by adding overtones together" is also a reference to harmonic additive synthesis since overtones (at least as defined on the linked page) are usually understood to be harmonics, not just sinusoidal partials. The second paragraph is more accurate and complete and addresses these issues.

Overall I think this would be better:

Additive synthesis is a family of sound synthesis techniques where timbre is created by adding multiple components together. In one common form, components are sine waves arranged in a harmonic series.

Ross bencina (talk) 20:18, 12 January 2012 (UTC)

So change it, Ross. If I did, I might get slapped down. I do not think (except in the context of group additive synthesis) that there should be anything other than sinusoids being added in the lede description. How about this:

Additive synthesis is a sound synthesis technique in which timbre is created by adding several sinusoidal components (partials) together. In the case of tones with harmonic overtones, these sinusoidal components all have frequencies that are at integer multiples of a common fundamental frequency and a nearly periodic function is synthesized which can be expressed as a Fourier series.

It's a little wordy, and maybe no association need be made with Fourier, but it seems to me to be appropriate to make that association. 70.109.183.99 (talk) 21:06, 12 January 2012 (UTC)

See Wikipedia:Manual of Style/Lead section#Introductory text. The suggested text doesn't really meet the criteria. E.g. where it says "The first paragraph should define the topic with a neutral point of view, but without being overly specific.". The suggested text really is over-specific. Charles Matthews (talk) 21:34, 12 January 2012 (UTC)

The problem is, Charles, that additive synthesis is about adding sinusoids together. As best as I can tell, it was not meant to be about adding square waves or triangle waveforms or Walsh functions or some other set of basis functions together. If the lede does not make that specific point (we're adding sine waves together), I believe it will be inaccurate. But the sinusoids do not have to be harmonically related, that is also important in the fundamental description of what additive synthesis is. 70.109.183.99 (talk) 22:02, 12 January 2012 (UTC)

If you can reference a definition of "additive synthesis" which really does say that (i.e. linear combinations of sinusoidal waves that may have incommensurable frequencies), then fine. Wikipedia does ultimately work on definitions that are "textbook", and if so those will be verifiable. The lede needs to cut through to essentials, so definitions are key, certainly. If the definition is right, then saying "is a technique", "is a family of techniques", "is one of a number of techniques", "is any technique" that does X, is the sort of nuance which is not really fundamental to explaining the main deal. It can go lower down the article. Charles Matthews (talk) 22:33, 12 January 2012 (UTC)

Well, a reference might be from Julius Smith:

Additive synthesis is evidently the first technique widely used for analysis and synthesis of audio in computer music. It was inspired directly by Fourier's theorem (which followed Daniel Bernoulli's insights) which states that any sound s(t) can be expressed mathematically as a sum of sinusoids. The term "additive synthesis" refers to sound being formed by adding together many sinusoidal components modulated by relatively slowly varying amplitude and frequency envelopes.

So, Charles, what might you want to do with that? I think "inspired by" is better than "based on", but I also think that "Fourier series" is better than "Fourier's theorem". There is an inverse FFT method of doing additive synthesis (and it's messy), but what it is, essentially, is adding up a finite number of discrete sinusoidal components of specified amplitude and frequency (and an inverse Fourier transform is normally more than that). The parameters going in are a finite set of frequency/amplitude pairs, and sound comes out. 71.169.180.195 (talk) 05:35, 13 January 2012 (UTC)
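As a rough illustration of "a finite set of frequency/amplitude pairs goes in and sound comes out", here is a minimal Python sketch. It assumes fixed amplitudes and frequencies, zero phase at t = 0, and a hypothetical list-of-pairs input format; practical additive synthesizers use time-varying envelopes instead.

```python
import math


def additive_static(partials, fs, num_samples):
    """Sum a finite set of sinusoids with fixed frequency and amplitude.

    partials: list of (frequency_hz, amplitude) pairs (hypothetical format).
    fs: sample rate in Hz.
    Returns a list of num_samples output samples.
    """
    out = []
    for n in range(num_samples):
        t = n / fs
        # Each partial is a cosine; the frequencies need not be harmonically related.
        out.append(sum(a * math.cos(2 * math.pi * f * t) for f, a in partials))
    return out


# Usage: three inharmonic partials, 0.1 s at 8 kHz.
fs = 8000
partials = [(220.0, 1.0), (563.0, 0.5), (1021.5, 0.25)]
y = additive_static(partials, fs, 800)
print(len(y), y[0])  # every cosine starts at phase 0, so y[0] = 1.0 + 0.5 + 0.25
```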

What I'd want to do with that is to add a third para in the lede explaining what is going on in terms of modulation. This would be a good style for us: so-called "concentric": first para very crisp, second para (which I have just tweaked) more technical. Then the third para amplifying what is said above it in terms that an expert could accept, and alluding to typical examples. That would be good, I think. Charles Matthews (talk) 08:22, 13 January 2012 (UTC)

I think it would be useful to acknowledge that there appear to be two separate concepts that are commonly referred to as "Additive Synthesis." (1) the one concerned only with sinusoids, supported by 70.109.183.99 and Julius Smith. It is also a view I have supported elsewhere on this talk page. (2) there is another usage related to a general synthesis "paradigm" of mixing simple components to create complex timbres -- this does not strictly require sinusoids, just spectral fusion or perceptual coherence of the result. In this sense it is additive (mixing) in contrast to subtractive (filtering). This latter definition is hinted at by the following textbook definitions:

Computer sound synthesis for music general falls into one or more of four basic categories: (1) *Additive synthesis models*, in which elemental sound components (such as sine waves) are added together in time-varying ways. Elements of Computer Music, F. Richard Moore, Prentice Hall, 1990. p 16.

The basic notion of additive sound synthesis is that complex musical sounds can be created by adding together multiple sound components, each of which is relatively simple. Such components are typically not perceived individually, but each component contributes materially to the overall quality of the resulting sound.

The greatest advantage of the additive method of synthesis is that it rests on the very well developed theoretical framework of Fourier analysis. For this reason, additive synthesis is sometimes called *Fourier synthesis*, though that description is more restrictive than the more general notion of additive synthesis. Ibid. pp. 207-208

*Additive synthesis* is a class of sound synthesis techniques based on the summation of elementary waveforms to create a more complex waveform. The Computer Music Tutorial, Curtis Roads, MIT Press 1995. p 134. [While most of Roads' chapter refers to sums of sinusoids it begins with the pipe organ and ends with Walsh functions.]

In my view this page could safely take the "sum of sinusoids" definition as the principal one. This agrees with previous comments by 71.169.180.195. For completeness, a "Broader interpretations of Additive Synthesis" section could be added, quoting Roads above and listing related methods that don't involve sinusoids (such as Walsh functions, pipe organs, tone wheels etc). Does this sound OK Charles? Ross bencina (talk) 10:30, 13 January 2012 (UTC)

In light of the above I have edited the lead to focus on sine waves. I have added a "Broader definitions of additive synthesis" section. This text could perhaps appear instead in the lead. Thoughts? Ross bencina (talk) 16:18, 13 January 2012 (UTC)

I'm okay with all of this, Ross. I have trouble, historically and conceptually, equating this Walsh function synthesis or Wavetable synthesis to Additive synthesis. Heck, if you stretch this broadened definition enough, you can say that Sampling synthesis is a form of Additive synthesis, since it adds up a bunch of Kronecker deltas:

y[n]=\sum _{i=-\infty }^{+\infty }y[i]\,\delta _{ni}

It's a kinda dumb example, I know. 71.169.180.195 (talk) 19:32, 13 January 2012 (UTC)

Fully agree. Both the Roads and Moore quotes above reflect attempts to use Additive Synthesis as a domain-level taxonomical division. In (Roads) it is used as a chapter title to conveniently group somewhat related techniques. In (Moore) it is listed as one of four basic categories. In both cases "Component summation methods" would have been better. Nonetheless, I think it is an established alternative usage and it should be addressed by the article to avoid confusion.

Question: should we move the content of "Broader definitions of additive synthesis" to the lead?

Ross bencina (talk) —Preceding undated comment added 04:49, 14 January 2012 (UTC).

I dunno. If it were up to me, I would start out with the principal definition of Additive synthesis being the explicit addition of a finite set of sinusoids of arbitrary and specified amplitude and frequency. This would be the case whether it's real-time or not. If it's not real-time, it creates a "sample" that can be played back later. Then I might differentiate between harmonic and not-necessarily-harmonic additive synthesis. In the harmonic case, I would connect the concept to Fourier series. Then I would say there are some other less-direct ways of doing this, such as the inverse FFT and wavetable synthesis, the latter of which would be only for the harmonic case (unless you detune the harmonics by rapidly changing the phase, which is an arcane detail that shouldn't go into the article).
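For the harmonic case, the inverse FFT route can be checked against the direct sum of cosines. The sketch below builds one cycle of a waveform both ways; it assumes a single fixed spectrum, fewer than N/2 harmonics (so nothing aliases), and uses a naive inverse DFT in place of an FFT for clarity. The function names are hypothetical.

```python
import cmath
import math


def cycle_direct(amps, phases, N):
    """One cycle (N points) of a harmonic waveform, summing cosines directly.

    Harmonic number k = list index + 1; phases are in radians.
    """
    return [sum(r * math.cos(2 * math.pi * (k + 1) * m / N + p)
                for k, (r, p) in enumerate(zip(amps, phases)))
            for m in range(N)]


def cycle_inverse_dft(amps, phases, N):
    """The same cycle via an inverse DFT of a conjugate-symmetric spectrum.

    Assumes len(amps) < N / 2 so that no harmonic folds over. A naive O(N^2)
    inverse DFT is used for clarity; an inverse FFT computes the same values.
    """
    X = [0j] * N
    for k, (r, p) in enumerate(zip(amps, phases), start=1):
        X[k] = (N / 2) * r * cmath.exp(1j * p)  # positive-frequency bin
        X[N - k] = X[k].conjugate()             # mirrored bin keeps the output real
    return [(sum(X[j] * cmath.exp(2j * math.pi * j * m / N) for j in range(N)) / N).real
            for m in range(N)]


amps = [1.0, 0.5, 0.33, 0.25]
phases = [0.0, 0.3, -1.2, 2.0]
direct = cycle_direct(amps, phases, 64)
via_idft = cycle_inverse_dft(amps, phases, 64)
```

Looping such a table gives the wavetable method; note that this only covers one fixed spectrum, so time-varying envelopes need more than a single table.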

Then maybe I would extend the concept out to other basis functions like Walsh, but emphasize that virtually all of the time when "additive synthesis" is referred to, it is meant to be about adding sinusoids. Then maybe go into all the sinusoidal modeling and analysis/resynthesis STFT stuff. That would cover nearly all of the academic stuff, and what would be left would be the historical stuff about what they built long ago and the commercial products that do it or have done it. I really don't have all the ideas down for how to lay out the article.

Thanks for coming over and helping with the article. (And thanks for getting the reference on Alice.) 71.169.180.195 (talk) 05:20, 14 January 2012 (UTC)

On that basis I think we should leave things as they are at the moment (i.e. a separate section late in the document about more inclusive definitions.) Calling a Walsh function additive synthesis is categorically different from the other "advanced" methods such as SMS -- as such it is really just a style question of whether to include "Definition 2" in the lede. Perhaps with a single sentence "Additive synthesis has also been used as a general term to refer to the class of synthesis techniques that sum multiple elementary components." The whole lede section still needs a re-write. I may attempt it later. --Ross bencina (talk) 06:24, 14 January 2012 (UTC)

Inline TeX vs HTML and what truly tidies up.

Hi Chris Johnson, I've been editing Wikipedia since maybe 2004 or 2005 (you'll have to take my word for it, because I won't show you the editing history) and I really wish there was just one way to do math on the pages, but there has always been this discussion/debate about what looks better. In addition, there is a new math markup construct, {{math|''f''(''x'') {{=}} ''x''<sup>2</sup>}}, which renders as f(x) = x². I am not sure what this latest construct gets us, but it's there.

I would normally want to use precisely the same construct for inline math as I would for an equation that stands alone on a line. That would normally mean LaTeX, like $f(x)=x^{2}\ $, but for some reason that is discouraged here. In addition, if a formula simple enough to be rendered as HTML contains no "\ " (backslash-space), the inline LaTeX is displayed differently than one might expect; sometimes the appearance is even identical to what it would look like if it were inline HTML. Here it is without the backslash: $f(x)=x^{2}$, just like f(x) = x².

After much back-and-forth with other technical and math editors (User:Michael Hardy comes to mind), I finally came to the conclusion that the best and most accepted formatting decision is to use <math> ... </math> for separate equations that live on their own lines (with a backslash-space in there to make sure it renders like LaTeX) and, inline with the text body, to do everything I can to mark up with HTML constructs. That looks the most consistent and readable and is essentially what all these other math editors told me to do. Do you think the format changes you made really make the math look better? I really do not.

Also, if we do not connect this additive synthesis theory to the concept of Fourier series, we can cut out all that crap and just say: This is what Additive synthesis with sinusoids is:

y[n]=\sum _{k=1}^{K}r_{k}[n]\cos \left({\frac {2\pi }{f_{\mathrm {s} }}}\sum _{i=-\infty }^{n}f_{k}[i]\right)

or

y[n]=\sum _{k=1}^{K}r_{k}[n]\cos \left(\theta _{k}[n]\right)

where

\theta _{k}[n]=\theta _{k}[n-1]+{\frac {2\pi }{f_{\mathrm {s} }}}f_{k}[n]

That is all that it is. And maybe the Theory part should be stripped to that. But if you wanna relate this to Fourier series, I do not see a way that is as complete and concise as what we had to start with. So, what do you guys think? Including Clusternote. You get to contribute to the article too, as long as you do not try to take it over again and replace correct mathematics that you don't understand with incorrect mathematics that you think you understand. And

y[n]=\sum _{k=1}^{K}r_{k}[n]\cos \left({\frac {2\pi }{f_{\mathrm {s} }}}f_{k}[n]n\right)

is incorrect mathematics since it doesn't work unless f_k[n] is constant w.r.t. n.
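The point about the phase term can be checked numerically. The Python sketch below implements the recursive phase-accumulating form (taking θ_k[0] = 0 as the initial condition, which is an assumption) alongside the f_k[n]·n form, and compares them for constant frequencies and for a glide. All names are hypothetical.

```python
import math


def additive_phase_accumulator(r, f, fs):
    """y[n] = sum_k r_k[n] cos(theta_k[n]), theta_k[n] = theta_k[n-1] + 2*pi*f_k[n]/fs.

    r, f: one amplitude envelope and one frequency envelope (Hz) per partial,
    all the same length. The initial condition theta_k[0] = 0 is an assumption.
    """
    num_samples = len(r[0])
    y = [0.0] * num_samples
    for r_k, f_k in zip(r, f):
        theta = 0.0
        for n in range(num_samples):
            if n > 0:
                theta += 2 * math.pi * f_k[n] / fs  # integrate the instantaneous frequency
            y[n] += r_k[n] * math.cos(theta)
    return y


def additive_naive(r, f, fs):
    """The cos(2*pi*f_k[n]*n/fs) form, which is only right when f_k is constant."""
    num_samples = len(r[0])
    return [sum(r_k[n] * math.cos(2 * math.pi * f_k[n] * n / fs) for r_k, f_k in zip(r, f))
            for n in range(num_samples)]


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


fs, N = 8000, 400
const_r, const_f = [[1.0] * N, [0.5] * N], [[300.0] * N, [750.0] * N]
glide_r, glide_f = [[1.0] * N], [[300.0 + 2.0 * n for n in range(N)]]  # +2 Hz per sample

const_err = max_abs_diff(additive_phase_accumulator(const_r, const_f, fs),
                         additive_naive(const_r, const_f, fs))
glide_err = max_abs_diff(additive_phase_accumulator(glide_r, glide_f, fs),
                         additive_naive(glide_r, glide_f, fs))
```

With these inputs, const_err stays at rounding-error level while glide_err is of order 1: for the glide, the phase of the f_k[n]·n form advances in proportion to f_k[n] + n(f_k[n] - f_k[n-1]), so it sweeps about twice as fast as intended.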

Also, for historical reasons, we should revisit the real-time vs. non-real-time thing again. That Bell Labs synth was the first real-time (in the full meaning of the term) additive synthesizer, if I recall correctly. Before that, we would write computer programs that would run at maybe 5 or 10 times slower than real time (so it would take 5 or 10 minutes to create 1 minute of music) and would compute a soundfile that would be written to hard disk and reproduced at a later time. That is the essential difference between real-time and not. A real-time synth is one where you hit the key and the note is synthesized on the spot. That also means if you slide a fader that is supposed to affect the note, you will hear the effect immediately. This is about being live and is related to time-varying but is not the same thing. But there is an historical issue about what is real-time additive synthesis and what is not. And that should be in the article. 71.169.180.195 (talk) 18:59, 13 January 2012 (UTC)

Added real-time to Alles Machine entry. Verified and cited. Is that sufficient? Ross bencina (talk) 04:21, 14 January 2012 (UTC)

I'm happy to change the maths formatting to whatever is the accepted standard. The inline maths before I edited it was just italic sans-serif text produced with the standard double apostrophes (''x+y'' = x+y). I changed that to using the math tag, which for the simple inline equations on the page displays (on my computer) as an italic serif HTML font rather than as LaTeX (<math>x+y</math> = x+y). I'm quite happy to change this to the new HTML math markup style ({{math|''x'' + ''y''}} = x + y) if that's the accepted way of doing things -- or were you suggesting going back to the sans-serif math without the new math markup tag (x+y)? (To my eyes, the italic serif font produced by inline <math> or the new HTML markup looks nicer than the italic sans-serif font that was there before my edits, but it's going to be browser/OS dependent so I can well believe it looks worse elsewhere.)

On the subject of what is required in the theory section, we could indeed strip it down to the three equations that you have above, perhaps then adding that if everything is time-invariant, this is equivalent to the $\sum (\sin \ldots + \cos \ldots)$ form, which is a Fourier series. However, if additive synthesis is presented in the introduction in the context of Fourier/harmonics/etc., then we should probably start with the equations in a Fourier-like form and show that they can be extended to inharmonic/time-varying frequency, as the page is currently (though I'm happy to reduce the amount of detail considerably compared to what is there presently). Chrisjohnson (talk) 15:45, 14 January 2012 (UTC)
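The time-invariant equivalence to a Fourier series rests on the identity cos(x + φ) = cos φ cos x - sin φ sin x, which gives a_k = r_k cos φ_k and b_k = -r_k sin φ_k when the series is written as a sum of a_k cos + b_k sin terms. That sign convention for b_k is an assumption; other texts differ. A small Python check, with hypothetical names:

```python
import math


def amp_phase_to_fourier(r, phi):
    """Map r_k cos(k w t + phi_k) to a_k cos(k w t) + b_k sin(k w t).

    From cos(x + phi) = cos(phi) cos(x) - sin(phi) sin(x):
    a_k = r_k cos(phi_k), b_k = -r_k sin(phi_k).
    """
    a = [rk * math.cos(pk) for rk, pk in zip(r, phi)]
    b = [-rk * math.sin(pk) for rk, pk in zip(r, phi)]
    return a, b


def synth_amp_phase(r, phi, f0, t):
    """Harmonic additive synthesis with constant amplitudes and phase offsets."""
    return sum(rk * math.cos(2 * math.pi * (k + 1) * f0 * t + pk)
               for k, (rk, pk) in enumerate(zip(r, phi)))


def synth_fourier(a, b, f0, t):
    """The same signal written as a (finite) Fourier series."""
    w = 2 * math.pi * f0 * t
    return sum(ak * math.cos((k + 1) * w) + bk * math.sin((k + 1) * w)
               for k, (ak, bk) in enumerate(zip(a, b)))


r, phi, f0 = [1.0, 0.5, 0.25], [0.0, 1.0, -2.0], 110.0
a, b = amp_phase_to_fourier(r, phi)
times = [n / 44100 for n in range(200)]
```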

I'd be happy to use {{math}} in inline equations, I think it flows better with the text than <math>. Makes math look less scary, so to say. But everyone appears to use <math> currently so... Perhaps some gnome will do the conversion some day. Thanks already. Olli Niemitalo (talk) 01:04, 17 January 2012 (UTC)

How about having the following kinds of equations for the continuous-time output signal y(t), each in its simplest form, in a Theoretical background section (ordered from the simplest and easiest to understand to the most complex)?

1. Continuous-time, harmonic partial frequencies, constant partial phase offsets (phase at time zero), constant partial amplitudes.

2. Continuous-time, harmonic partial frequencies, constant partial phase offsets, time-dependent partial amplitudes.

3. Continuous-time, non-harmonic partial frequencies, constant partial phase offsets, time-dependent partial amplitudes.
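One way those three cases might be written, as a sketch only: it reuses the r_k notation found elsewhere on this page and assumes f_0 for the fundamental frequency, constant (not necessarily harmonic) partial frequencies f_k, and constant phase offsets φ_k.

Case 1 (harmonic frequencies, constant amplitudes):

y(t)=\sum _{k=1}^{K}r_{k}\cos \left(2\pi kf_{0}t+\phi _{k}\right)

Case 2 (harmonic frequencies, time-dependent amplitudes):

y(t)=\sum _{k=1}^{K}r_{k}(t)\cos \left(2\pi kf_{0}t+\phi _{k}\right)

Case 3 (non-harmonic frequencies, time-dependent amplitudes):

y(t)=\sum _{k=1}^{K}r_{k}(t)\cos \left(2\pi f_{k}t+\phi _{k}\right)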

Then the body of the current theory section could be moved to a separate Discrete-time equations section. I've begun to think that maybe we should not cripple (by removing equations) the rather solid block of discrete equations too much. It's a nice reference. Olli Niemitalo (talk) 18:07, 14 January 2012 (UTC)

Okay, can I ask that someone who is more than an IP (IPs cannot upload images to the en Wikipedia) create a simple drawing similar to Figure 22.1 at the bottom of this reference? Please leave out the noise source and, for consistency, use r₁(t) instead of A₁(t), etc. And please label the output of the big summer y(t).

We could use that simple diagram to start with. Then we can start with the simplest, continuous-time, and general (harmonic or inharmonic) additive synthesis equations. Then we can work this the other way: first turn them into discrete-time counterparts, then, for the harmonic case, make these more specific toward harmonic synthesis, and then relate it to Fourier series. Even though the electrical engineers among us might like to see these equations in the more compact form using the complex exponential, I really think that we should keep this real, so we would have cosine and sine terms when we compare to Fourier series. One advantage of doing it this way is that we can get rid of the step of absorbing the change of phase into the instantaneous frequency term, a step that has been hanging up some unidentified persons.

So would someone want to create a PNG or SVG graphic like the one in the JOS reference? Clusternote, would you like to do that and contribute usefully to the article? That is something I cannot do. 71.169.180.195 (talk) 21:52, 14 January 2012 (UTC)

When a drawing is needed, it should be drawn by the person who thinks it is absolutely needed, to express his/her own original intention. Most users usually do so. If someone can't upload his/her drawing to Wikipedia for some reason, he/she can still transfer the drawing via another upload service with the help of other users, as we helped someone do a few days ago. --Clusternote (talk) 06:55, 16 January 2012 (UTC)

Review of assertions in Wavetable Synthesis section

I just deleted the following unattributed comment from Wavetable Synthesis section:

wavetable synthesis requires just as much computation as additive but transfers much of the computation to a pre-synthesis step.

I would like to point out that this claim is incorrect. Wavetable synthesis only requires pre-computation of a relatively small set of single-cycle waveforms ("breakpoint waveforms" if you like). These can then be looped and slowly crossfaded to ramp between sets of partial amplitudes. A cross fade from one wavetable to another is the same as all of the partials fading from one set of amplitudes to another. The Beauchamp reference elsewhere provides concrete examples I think.

On the other hand, it could be claimed that wavetable synthesis is *possibly* less space-efficient than oscillator bank additive synthesis. However, the article makes no claims about space efficiency. --Ross bencina (talk) 10:53, 14 January 2012 (UTC)
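The claim that a crossfade between wavetables is the same as every partial fading between two sets of amplitudes follows from linearity, and can be checked with a small Python sketch. It assumes integer-length tables read one point per sample (so the fundamental is f_s/N), no interpolation, and a linear crossfade; all names are hypothetical.

```python
import math


def harmonic_table(amps, N):
    """One cycle (N points) of a waveform whose harmonic k = index + 1 has amplitude amps[index]."""
    return [sum(a * math.cos(2 * math.pi * (k + 1) * m / N) for k, a in enumerate(amps))
            for m in range(N)]


def wavetable_crossfade(amps_a, amps_b, N, cycles):
    """Loop two precomputed tables and crossfade linearly from the first to the second.

    Assumes the table is read one point per sample, so the fundamental is fs / N.
    """
    table_a, table_b = harmonic_table(amps_a, N), harmonic_table(amps_b, N)
    total = N * cycles
    out = []
    for n in range(total):
        w = n / (total - 1)  # crossfade weight, 0 at the start and 1 at the end
        m = n % N            # position within the looped table
        out.append((1 - w) * table_a[m] + w * table_b[m])
    return out


def additive_amplitude_ramp(amps_a, amps_b, N, cycles):
    """Oscillator-bank version: every partial amplitude ramps linearly from amps_a to amps_b."""
    total = N * cycles
    out = []
    for n in range(total):
        w = n / (total - 1)
        out.append(sum(((1 - w) * a + w * b) * math.cos(2 * math.pi * (k + 1) * n / N)
                       for k, (a, b) in enumerate(zip(amps_a, amps_b))))
    return out


amps_a = [1.0, 0.0, 0.3, 0.0]
amps_b = [0.2, 0.8, 0.0, 0.4]
wt = wavetable_crossfade(amps_a, amps_b, 64, 8)
osc = additive_amplitude_ramp(amps_a, amps_b, 64, 8)
```

Only two tables are computed in advance here, whatever the length of the crossfade, which is the point made above about wavetable synthesis not needing as much computation as the oscillator bank.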

Oh, you're probably right on all counts, Ross. Given the same frequency-domain additive synthesis parameters (the r_k(t) and φ_k(t) envelope functions, where φ_k(t) is sorta related to the frequency f_k(t) of the harmonic partial; in wavetable synthesis it is possible to detune it a little from the perfectly harmonic k f₀), when t is fixed to some value, both (harmonic) additive synthesis and wavetable synthesis do a form of inverse Fourier transform to create a time-domain waveform. It's just that wavetable does that in advance of the MIDI Note-On event and straight additive does it immediately after the MIDI Note-On event. They both have to do that computation, but for wavetable, the time-domain data is loaded up and ready to rock-n-roll when the key is pressed.

If the breakpoints are all at the same times for all r_k(t) and φ_k(t) envelopes, and if your wavetable isn't "oversampled" (that is, the number of wavetable points is about equal to 2K, where 1 ≤ k ≤ K), then the amount of space required by the two methods is about equal. But if the breakpoints in the r_k(t) and φ_k(t) envelopes do not fall on the same times (for different k), then wavetable synthesis is less space-efficient because, for wavetable, once one r_k(t) requires a breakpoint, then all of the r_k(t) functions get a breakpoint at that same time. So, because of this added restriction, the breakpoints aren't as optimally determined. But those are the only two reasons I can think of for why the wavetable data would take significantly more space than the straight additive data; one is if the wavetable is oversampled and the number of wavetable points greatly exceeds 2K, and the other is the less than optimal breakpoint locations because the breakpoints for all harmonic envelopes must occur coincidently. 71.169.180.195 (talk) 19:01, 14 January 2012 (UTC)

Audio samples

I note that this page has a {{Audio requested}} requested recordings tag. I'm happy to synthesise some (and provide the C++ source for them); I was thinking of making one sound with harmonic partials of time-varying amplitude (showing how the timbre varies for a fixed fundamental frequency), and one more abstract sound with many evolving inharmonic time-dependent-frequency partials. Any thoughts? Chrisjohnson (talk) 16:01, 14 January 2012 (UTC)

Would be neat to have also graphical presentations of those sound samples. Spectrograms maybe, or envelopes of the partials drawn in different colors in an amplitude vs. time plot. Something illustrative, not too pedantic. Olli Niemitalo (talk) 16:47, 14 January 2012 (UTC)

Something like this? (the inharmonic one might be a little more musically interesting...) Chrisjohnson (talk) 23:51, 14 January 2012 (UTC)

Harmonic additive synthesis example

Example of harmonic additive synthesis. The fundamental frequency is 440Hz.


Inharmonic additive synthesis example

Example of inharmonic additive synthesis.


That looks great! It's fine as it is if we drop these in different places along the story, but if we want a gallery (with graphs on top and audio below each one) then the figure would need to be compacted horizontally so that a few will fit on the same row. Dunno which it is yet. Olli Niemitalo (talk) 10:59, 15 January 2012 (UTC)

A comment on the second sample. Could you please make another version that has more subtle frequency variations (so as not to step into the realm of FM synthesis) that do not make the "harmonics" overlap, perhaps with randomized initial phases for the vibrato LFO of each harmonic so that we don't get any coincidental transients and chirps? Thanks for your high-quality work on these! Olli Niemitalo (talk) 11:50, 19 January 2012 (UTC)
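One way to meet the request for subtle, non-overlapping vibrato is sketched below in Python, under stated assumptions: harmonic base frequencies k·f_0, a common sinusoidal vibrato rate, a random initial LFO phase per partial, and a relative vibrato depth d. Adjacent partials k and k+1 can never cross if k(1 + d) < (k + 1)(1 - d), that is d < 1/(2k + 1), so the tightest limit comes from the highest pair, d < 1/(2K - 1). The function names are hypothetical.

```python
import math
import random


def max_vibrato_depth(K):
    """Largest relative depth d keeping partials k*f0*(1 + d*sin(...)), k = 1..K, apart.

    Partials k and k+1 cannot cross if k*(1 + d) < (k + 1)*(1 - d), i.e. d < 1/(2k + 1);
    the tightest pair is k = K - 1, giving d < 1/(2K - 1).
    """
    return 1.0 / (2 * K - 1)


def vibrato_frequency_envelopes(f0, K, depth, lfo_hz, fs, num_samples, seed=0):
    """Frequency envelope f_k[n] (Hz) for each partial, each with a random initial LFO phase."""
    rng = random.Random(seed)
    lfo_phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(K)]
    return [[(k + 1) * f0 * (1 + depth * math.sin(2 * math.pi * lfo_hz * n / fs + lfo_phases[k]))
             for n in range(num_samples)]
            for k in range(K)]


K = 8
depth = 0.9 * max_vibrato_depth(K)  # stay safely below the crossing limit
env = vibrato_frequency_envelopes(f0=220.0, K=K, depth=depth, lfo_hz=5.5, fs=8000, num_samples=4000)
```

These envelopes could then drive a phase-accumulating oscillator bank; the partials stay apart, and the random LFO phases make it unlikely that the vibratos line up.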

Inharmonic example added above, and the images now reduced in width. I can't figure out how to get the images on top of the audio samples while retaining the 'play' button (though there is a way of doing it if the audio is linked just from a piece of text, rather than from a button). Chrisjohnson (talk) 01:22, 16 January 2012 (UTC)

I understand that these examples are intended to be illustrative of concepts in the theory section -- for this they are excellent. But would it also be possible to have at least one example that sounds like a musical instrument? (a piano tone, a vocal sound, or perhaps a gong or bell?) It doesn't have to be a resynthesis (probably it shouldn't be) but it should at least give the reader the idea that additive synthesis is useful for synthesising *common* *complex* musical timbres. Such an example could be used in the introduction before the theory section, and titled something like "Additive synthesis of a Piano-like tone". Just an idea. Ross bencina (talk) 05:03, 16 January 2012 (UTC)

Good idea - I'll put together such an example. I looked for a less abstract example in (popular) recorded music that could be used - but couldn't find anything that was both free and definitely produced by additive synthesis. If anyone can think of such an example, it would be nice to use that: it would illustrate the significance/history of additive synthesis in a way which samples that are created just for the wikipedia page cannot. Chrisjohnson (talk) 11:31, 16 January 2012 (UTC)

A bell-like example is now at the top of the article (it was the easiest sound to do of the three you suggested without resorting to analysis/resynthesis - and is quite a representative additive sound I think.) (talk) 00:34, 18 January 2012 (UTC)
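For readers wondering how a bell-like tone can be built without analysis/resynthesis, here is a minimal Python sketch: a handful of inharmonic partials, each with its own exponential decay, with higher partials dying away faster. The frequency ratios, amplitudes and decay times are made-up illustrative values, not the ones used for the sample in the article. The samples could be written to a file with Python's standard wave module.

```python
import math

# Illustrative (hypothetical) partial data: frequency ratio to the lowest partial,
# initial amplitude, and decay time constant in seconds.
BELL_PARTIALS = [
    (1.00, 1.00, 1.50),
    (2.32, 0.60, 0.90),
    (4.25, 0.40, 0.50),
    (6.63, 0.25, 0.30),
    (9.38, 0.15, 0.20),
]


def bell_tone(base_hz, fs, seconds, partials=BELL_PARTIALS):
    """Sum exponentially decaying sinusoids with inharmonic frequency ratios."""
    num_samples = int(fs * seconds)
    out = []
    for n in range(num_samples):
        t = n / fs
        # sin() starts each partial at zero, avoiding a click at the onset.
        out.append(sum(amp * math.exp(-t / tau) * math.sin(2 * math.pi * ratio * base_hz * t)
                       for ratio, amp, tau in partials))
    return out


y = bell_tone(base_hz=200.0, fs=8000, seconds=1.0)
```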

Perhaps if the summation of sinusoids diagram that the IP user suggested can be made small enough while remaining legible, this diagram could sit up at the top right of the page too, in the manner of the info boxes on many pages (Cello, for example). In the long term, it would be nice to have such boxes with a schematic image and a sound example for all the synthesis methods in the Sound_synthesis_types template, but that may take a while... Chrisjohnson (talk) 00:34, 18 January 2012 (UTC)

I've put together a schematic image of additive synthesis (right), which (unlike File:SSadditiveblock.png that was imported from commons) is designed to be legible at small sizes (200px wide). I've put it at the start of the description section for now, but as I mentioned in my comment above, I think it might be best placed at the top of the page in a 'sound synthesis methods' infobox with the sound sample, once I/we can find/make images and example sounds for other types of sound synthesis. With its use in an infobox in mind, I've simplified it to omit any phase control, and used $a_{k}$ for amplitude, since without room to label, $a_{k}$ was more obviously amplitude than $r_{k}$. I'd be interested to know if people favour keeping the diagram where it is and linking it closely to the maths, rather than moving it to an infobox in due course -- or indeed if people think a diagram like this is useful at all. Chrisjohnson (talk) 10:38, 19 January 2012 (UTC)

I think the diagram is good as it is. To change a to r might be more misleading because most readers are sure to look at the image first rather than to read the math. The current position looks good (I did not try any alternatives). Olli Niemitalo (talk) 11:50, 19 January 2012 (UTC)

Should we change all the r's to a's in the math? Problem is that a_k and b_k are pretty well established in the textbooks for the cos() and sin() terms in Fourier series, and r_k and φ_k exist pretty widely as the length and angle of a complex number (for which a_k and b_k are the real and imag parts). I would be in favor of consistency throughout the article and as much consistency as possible with the most immediate lit. I have tried to keep any reference to complex variables out of this (hence the use of i for a dummy index). Maybe we should leave the Fourier series reference out of it. But the diagram looks real good. Thanx. 70.109.178.133 (talk) 17:27, 19 January 2012 (UTC)

I don't mind if it's r's. By the way, it would be easier to talk if everyone would intend similarly. Olli Niemitalo (talk) 18:54, 19 January 2012 (UTC)

Well, you don't know what I intend: "Workers of the world Untie!" :-)

Anyway, if Chris doesn't wanna change "a" to "r", I am considering changing a_k and b_k to p_k and q_k, then changing r_k to a_k. Who likes that idea? Who doesn't? Who wants the drawing convention to be inconsistent with the math? 70.109.178.133 (talk) 19:45, 19 January 2012 (UTC)

I'm happy to change the symbols in the diagram to "r" and will do so when I get home. It can always be reverted later if need be. If the diagram is staying where it is by the maths section, I think it's important to keep the notation consistent, and there's space to mention that r is amplitude in the caption. I prefer a_k and b_k to p_k and q_k — but I did wonder about A_k and B_k for the Fourier coefficients (I think I've seen that notation somewhere before). Chrisjohnson (talk) 20:15, 19 January 2012 (UTC)

I think it would be best if both the visually interesting synthesis samples by Chris and other, more natural samples were shown in the article, if possible.

Also, I remember that one of the earliest emulations of a gong sound was done on the Trautonium, in 1942 at the latest. It may be one of the earliest examples of inharmonic additive synthesis. --Clusternote (talk) 05:32, 16 January 2012 (UTC) P.S. It should also be added to the timeline section (possibly the later "subharmonic synthesis" used on the Mixture Trautonium (1952) and the Subharchord (1960s) may be more appropriate). --Clusternote (talk) 06:07, 16 January 2012 (UTC)

Argument disturbance by IP user

Will you listen to my criticism? 71.169.180.195 (talk) 06:26, 16 January 2012 (UTC)

Or, if you do not want to hear any criticism from me, would you allow me to spell out to you exactly what this instantaneous phase and instantaneous frequency is about? 71.169.180.195 (talk) 06:26, 16 January 2012 (UTC)

If it becomes apparent to other editors (like Chris or Olli or Ross) that the effort and time required to disassemble your footnote exceeds what they would be able to provide, and since the time they have to work on the article likely is already limited, if it becomes just too difficult and inefficacious to get you to understand, can you accept that the footnote is not ready for the main namespace? I wasn't the only editor to remove your contribs to the article and this is the first time I did remove it since the previous "edit war", and since there has been robust participation in the article by some very knowledgeable and productive editors. 71.169.180.195 (talk) 06:26, 16 January 2012 (UTC)

IP user 71.169.180.195, you haven't explained anything about the continuous form, other than making personal attacks in your past posts. If you can't explain it in continuous form, you shouldn't argue about it. --Clusternote (talk) 07:08, 16 January 2012 (UTC)

Clusternote, do you think that the other editors agree with your evaluation of "problematic"? I think you deem yourself "not problematic". Do you think the other editors agree with your evaluation of "not problematic"? 71.169.180.195 (talk) 06:26, 16 January 2012 (UTC)

71.169.x, please keep your edit summaries objective in tone. User:Clusternote, I don't think that the idea of the footnote was really very helpful. Charles Matthews (talk) 09:44, 16 January 2012 (UTC)

Charles, I'm glad for your advice.

Then, how should we explain, simply, that section's overly detailed, implementation-specific description to general readers? Essentially, that section merely describes the details of the equation below (the tail of the supplemental note above).

General expression of inharmonic additive synthesis

The second equation shown at the top of the section can be rewritten using the above result as follows:

y[n]=\sum _{k=1}^{K}r_{k}[n]\cos \left({\frac {2\pi }{f_{s}}}\sum _{i=1}^{n}f_{k}[i]+\phi _{k}[n]\right)

=\sum _{k=1}^{K}r_{k}[n]\cos(\theta _{k}[n])

The last form matches the "general expression" shown at the end of the section.
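As a rough illustration of the expression above, here is a minimal Python sketch that evaluates y[n] directly: each partial's phase is the running sum of its frequency samples scaled by 2π/f_s, plus φ_k[n]. The function name `additive_synth` and the list-of-lists layout are assumptions made for illustration, not taken from the article or any library; it is written for clarity, not speed.

```python
import math

def additive_synth(r, f, phi, fs):
    """Evaluate y[n] = sum_k r_k[n] * cos(2*pi/fs * sum_{i=1..n} f_k[i] + phi_k[n]).

    r, f, phi: lists of K per-partial lists, each N samples long
    (amplitude, frequency in Hz, phase offset in radians).
    fs: sampling rate in Hz. f_k[0] is not used, matching the i = 1..n limit.
    """
    num_samples = len(r[0])
    y = [0.0] * num_samples
    for r_k, f_k, phi_k in zip(r, f, phi):
        freq_sum = 0.0  # running sum f_k[1] + ... + f_k[n]
        for n in range(num_samples):
            if n >= 1:
                freq_sum += f_k[n]
            y[n] += r_k[n] * math.cos(2.0 * math.pi / fs * freq_sum + phi_k[n])
    return y
```

For example, a single unit-amplitude partial at f_s/4 (f = 2 Hz with f_s = 8 Hz) yields cos(πn/2), i.e. approximately 1, 0, -1, 0, 1.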

However, even after recent improvements, that section still seems complicated, and its intention is hard to grasp without a supplemental note. --Clusternote (talk) 10:02, 16 January 2012 (UTC)

Review of the article

I have had a look down the article and it looks fairly bad.

The first chunk of mathematics is presumably OK, but a bit pompous given knowledge of what a Fourier series is. The second chunk may not be needed, or not there anyway.

I'd like to re-order the whole thing, starting with the theory of the "harmonic signal", then something like the bit about "Additive resynthesis"; and then the examples that are appropriate for the harmonic signal. After that the quasi-periodic signals; and speech synthesis last of all.

I don't know how easy it would be to reference everything in the current article properly. I do know that addressing writing issues is much easier once there is a clear logical flow. Charles Matthews (talk) 21:29, 9 January 2012 (UTC)

Thank you to the experts trying to clean up and verify the mathematics here. I have a question... since many readers may not be inclined or able to work through the math necessary to verify equations... isn't sourcing those equations from an existing reliable source the best idea? --Ds13 (talk) 05:34, 10 January 2012 (UTC)

Charles, Ds13, and others, thanks for your sincere handling of this issue. The badmouthing IP user has finally corrected the original description (probably his own) in the Theory section (as it was at that time). The latest results of his corrections can be seen at the top of the Definitions section and in the Discrete-time equations section. One of the most serious errors, which made the original description almost incomprehensible for over four years, was the confusion between the "phase offset" $\phi_k$ and the instantaneous phase $\theta_k$. A similarly confusing expression is also seen in another source [5], as another user briefly mentioned. Oh my God!

Here is a more accurate explanation of these additive synthesis equations, based on the discussion on this talk page. I wrote it for general readers who have studied undergraduate math, to help them understand these equations rationally. (However, neither the original source nor the rationale of these equations (why is instantaneous phase required?) has been clarified yet...)

I appreciate everyone's efforts to improve the article! --Clusternote (talk) 01:59, 23 January 2012 (UTC) [Added details]--Clusternote (talk) 06:37, 23 January 2012 (UTC)

Ds, User:Clusternote is not an expert. He doesn't know shit and he has been crapping up the article immensely.

I have added the {{talk page}} template to this page. As you can see, it says "Avoid personal attacks". I must ask you to do exactly that. Charles Matthews (talk) 08:29, 10 January 2012 (UTC)

The language in the discussion of the mathematical basis of additive synthesis may sound "pompous". I didn't write it, nor did I write the original equations, but I might have touched them up in the past; I don't remember. I think the language was really more condescending than pompous. I believe it may have been so because it may have originally been written by computer musicians, who may have marveled more at the math of Fourier series than mathematicians, scientists, and engineers do. This is also why (in my estimation) there was a lofty reference to "Fourier's theorem" or such, which I simply changed to Fourier series in the lede yesterday.

Now, given that the consumers of this article will be both hard-core techies who might know a lot about Fourier analysis and little about music, and computer musicians whose strengths are the reverse, I would say that the article would do well to connect the Fourier series concept clearly, first to harmonic tones (these would be quasi-periodic, not in the strict sense that mathematicians mean but closer to almost periodic functions; the term has been used this way in the audio signal processing and computer music literature for years), and then to generalize to non-harmonic additive synthesis. To do that rigorously, we would first start out with perfectly periodic real signals (at least while the synthesizer key is depressed) that would have cosine and sine terms in continuous time with frequencies going up to infinity. Then we would generalize a little and make the tone quasi-periodic by allowing the a_k and b_k coefficients to be slowly changing functions of time (like envelopes in a synthesizer). Then we would have to limit the top frequency (and the upper limit of the Fourier series summation) so that the signal would be bandlimited. Then we could sample it by substituting t = nT = n(1/F_s). Those four beginning steps were obviously left out of the article, even from its very beginning. Charles, do you see a good reason to include those steps? I didn't, otherwise I would have added them some time ago. But if you do and want to really make this look like a tutorial, I'm okay with it. My major concern was to go from a concept that looks similar to Fourier series to the equations that define exactly how we program the sample-processing code that performs additive synthesis, and the following does that. What words and citations one dresses that up with is less of a concern for me, as long as it is both accurate and readable.

So, I'm plopping this down as a starting point and ask that you, Charles, quickly delete or correct the incorrect math that Clusternote has ignorantly replaced it with. It may have started out as WP:good faith but, because he refuses to recognize and admit that he just doesn't get the math, insisting that he canonize his ignorance in the article (and calling any alternative "false" or "incorrect" or "vandalism") is no longer good faith. I will not deal with him anymore because he is not a "straight shooter". He is fundamentally dishonest, and the only thing to do now is simply call him on it (and then ignore him).

BTW, I'm confident of your mathematical chops, Charles. If you have questions about the practice of additive synthesis, or about what we coders do with it and how, feel free to ask. 71.169.179.65 (talk) 06:20, 10 January 2012 (UTC)
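The four steps described in the comment above (a Fourier-series-like sum of cosines and sines, slowly varying a_k and b_k, truncation below the Nyquist frequency, and sampling at t = nT) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions: the envelopes are passed as ordinary Python functions of time, harmonics at or above f_s/2 are simply dropped, and `harmonic_tone` is a hypothetical name.

```python
import math

def harmonic_tone(a, b, f0, fs, num_samples):
    """Sampled quasi-periodic tone
        y[n] = sum_k a_k(nT) cos(2 pi k f0 nT) + b_k(nT) sin(2 pi k f0 nT),
    where a[k-1], b[k-1] are envelope functions of time (seconds) for harmonic k.
    Harmonics at or above fs/2 are dropped so the sampled tone stays bandlimited."""
    T = 1.0 / fs
    y = []
    for n in range(num_samples):
        t = n * T  # substitute t = nT
        s = 0.0
        for k in range(1, len(a) + 1):
            if k * f0 >= fs / 2.0:
                break  # this and all higher harmonics would alias
            w = 2.0 * math.pi * k * f0 * t
            s += a[k - 1](t) * math.cos(w) + b[k - 1](t) * math.sin(w)
        y.append(s)
    return y

# Example: 5 cosine harmonics of 220 Hz with a decaying envelope, fs = 8 kHz.
decay = [lambda t: math.exp(-3.0 * t)] * 5
silent = [lambda t: 0.0] * 5
tone = harmonic_tone(decay, silent, 220.0, 8000.0, 800)
```

Dropping harmonics at the sum's upper limit, rather than filtering afterwards, mirrors the step of limiting the Fourier series summation before sampling.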

I forgot to mention a common notational convention in digital signal processing that you might not be aware of, Charles. For discrete-time signals, rather than put the discrete-time index, n, in a subscript (we might have a vector of signals as we do here and each element might be subscripted already), we show n as an argument, just like the continuous-time t, but we put the discrete-time argument into square brackets to indicate that there should only be integer values going in there. So it's x(t) and x[n] where t = nT. — Preceding unsigned comment added by 71.169.179.65 (talk) 06:39, 10 January 2012 (UTC)

So, what would you add or subtract from the following mathematical development?:

I would certainly rewrite it to be somewhat more accessible to the "general reader" (point 1); the usual way is to use a phrases like "in other words, a periodic signal is written as a superposition sum of sine waves ...". Charles Matthews (talk) 08:29, 10 January 2012 (UTC)

So would you do it? Presently the section Non-harmonic signals as it stands is fully incorrect, for the reasons that I and two other editors have pointed out. If time is a problem, can you at least delete that section so that Wikipedia does not look so stupid in the meantime? 71.169.179.65 (talk) 18:17, 10 January 2012 (UTC)

Supplemental note for section "Inharmonic partials" using continuous form

As already discussed above, the "Additive synthesis#Inharmonic partials" section, which describes overly specific details of a discrete implementation, needs a supplemental explanation in continuous form for readers familiar with that form. Such an explanation is also needed to further verify the complicated discrete-form description in that section. I've added the supplemental note below to the Additive synthesis#Footnotes section.

The previous note (revision 472047820) has been moved to the archive. A more recent note can be found below.

I welcome your criticism of the supplemental explanation above. Sincerely, --Clusternote (talk) 02:22, 16 January 2012 (UTC) ERRATA--Clusternote (talk) 03:21, 17 January 2012 (UTC)

New "revert-war" caused by IP user

However, an ill-mannered IP user seems to have started a bad-faith revert war on the footnote without prior discussion, and has even started personal attacks on the talk page and in the edit summary field, in his poor English. He probably still does not understand that the description above is just an outline, in continuous form, corresponding to the discrete form in that section.

Using our intelligence, how should we handle this problematic IP user? On this specific issue, I would like advice from a third party who is not involved (i.e. someone other than me and the IP user). Best regards, --Clusternote (talk) 02:08, 16 January 2012 (UTC)

In my opinion, IP user provided sufficient reason for his deletion in the edit summary. Clusternote: the guidelines ask you to refrain from personal attacks (criticising "poor English" in this case). So far everyone has refrained from criticising your poor English and I think you should give other people this same respect. Ross bencina (talk) 05:11, 16 January 2012 (UTC)

Why do you think he provided sufficient reason? We need supplemental explanations of the corresponding section for "general readers" who study physics and mathematics in continuous form. If you find any defects in the supplemental notes, they should be improved instead of reverted. I welcome your honest criticism of the notes above.

By the way, my ironic expression "poorly English" refers to the "inappropriate words" shown in his edit summary (he has also repeatedly made personal attacks in the past). As you point out, I also recognize that my own English is rather poor :). Sincerely, --Clusternote (talk) 05:55, 16 January 2012 (UTC)

P.S. I also welcome criticism from other users who are not involved in this issue (i.e. other than the IP user), as I wrote at the top. --Clusternote (talk) 07:01, 16 January 2012 (UTC)

Supplemental note on inharmonic discrete equations (rev.3.3)

[rev.3]: With the great help of the participants in the related discussion, I've revised the note above as follows. --Clusternote (talk) 16:42, 19 January 2012 (UTC) [added summary]--Clusternote (talk) 22:29, 9 March 2012 (UTC)

[rev.3.2]: Added supplements on instantaneous terms, a mention of Rodet & Depalle 1992, and an introduction to multiple band-limited terms (I'll add more later). --Clusternote (talk) 23:26, 26 January 2012 (UTC)

[rev.3.3]: Cleanup; moved the complicated parts into footnotes, etc.

Supplemental note using continuous form (for the section "Inharmonic form" of Discrete-time equations)

The sub-section "Inharmonic form" was described with discrete equations using the following notions:

"instantaneous phase" $\theta$ instead of "phase offset" $\phi$
(the reverse use of these symbols may also be familiar);
"instantaneous frequency" ${\frac {1}{n}}\sum _{i=0}^{n}f_{k}[i]$ in discrete form (i.e. also divided by the discrete time $n$),^{[note 1]}
instead of "time-varying frequency" $f_{k}[i]$.

For readers not familiar with these notions, descriptions using them may not be easy to understand at a glance. The following is an attempt to provide supplemental explanations for such readers.

A wave in continuous form (corresponding to a partial of the 2nd equation)

A wave is expressed with angular frequency $\omega_k$ and phase $\phi_k(t)$ as follows:

x_{k}(t)=r_{k}(t)\cdot \cos(\omega _{k}t+\phi _{k}(t))\qquad (\omega _{k}>0,\ {\mbox{real signal}})

(1)

Let's follow the style of signal analysis!

Following the style of signal processing, the above real-valued signal can be extended into a complex form,^{[note 2]}^{[note 3]}^{[note 4]} called ...

Analytic representation (analytic signal)

{\begin{aligned}x_{a}(t)&=r_{k}(t)\cdot e^{j(\omega _{k}t+\phi _{k}(t))}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ {\mbox{(complex signal)}}\end{aligned}}

Introduction of instantaneous terms

In signal processing, the notions of time-varying phase and time-varying frequency are replaced by their "instantaneous" versions, called "instantaneous phase" and "instantaneous frequency", respectively. I still don't know the reason,^{[note 5]} even after a long discussion of this topic. (Note: it may have been introduced with the formalization of frequency modulation.^{[note 6]})

Instantaneous phase $\theta_k$ is given by the argument function.^{[note 7]} In the above case, it is:

\theta _{k}(t)={\mbox{arg}}(x_{a}(t))=\omega _{k}t+\phi _{k}(t)\qquad ({\mbox{where}}\ \theta _{k}(0)=\phi _{k}(0)\ {\mbox{for}}\ t=0)

(4)
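For a partial whose r_k, ω_k and φ_k are constant, the analytic representation can be written down directly, and its argument recovers θ_k(t) up to a multiple of 2π. The sketch below (hypothetical helper names, Python standard library only, example values chosen arbitrarily) checks that numerically. It uses `math.atan2`, which looks at the signs of both the real and imaginary parts; a bare arctan of their ratio cannot distinguish z from -z.

```python
import cmath
import math

def analytic_sample(r, omega, phi, t):
    """Analytic representation x_a(t) = r * exp(j*(omega*t + phi)) of one partial."""
    return r * cmath.exp(1j * (omega * t + phi))

def arg(z):
    """Argument of z in [-pi, pi]; atan2 uses the signs of both parts."""
    return math.atan2(z.imag, z.real)

# Example partial: r = 0.5, f = 440 Hz, phi = 0.3 rad, evaluated at t = 1 ms
r, omega, phi = 0.5, 2.0 * math.pi * 440.0, 0.3
t = 1.0e-3
xa = analytic_sample(r, omega, phi, t)
theta = omega * t + phi  # unwrapped theta_k(t) of equation (4)
# Re(x_a) is the real partial, |x_a| is r, and arg(x_a) equals theta modulo 2*pi.
```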

Instantaneous frequency $f_k(t)$ is defined by differentiating the above instantaneous phase. If the phase $\phi_k(t)$ is time-invariant, its derivative vanishes, and the whole expression is:

f_{k}(t)=\omega _{k}(t)/2\pi ={\frac {d}{dt}}\theta _{k}(t)/2\pi

(5)
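Equation (5) can be checked numerically: sample a known phase θ(t), wrap it as an oscillator would, unwrap it, and divide the phase increments by 2π·Δt. This is a sketch under assumptions (forward differences, true phase increments smaller than π per step, hypothetical function names); each estimate naturally belongs to the midpoint between two sample times.

```python
import math

def unwrap(phases):
    """Undo 2*pi jumps in a list of wrapped phases (radians), assuming the
    true phase changes by less than pi between neighbouring samples."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2.0 * math.pi * round(d / (2.0 * math.pi))
        out.append(out[-1] + d)
    return out

def instantaneous_frequency(theta, dt):
    """Forward-difference estimate of f(t) = (d theta/dt) / (2*pi), in Hz.
    Estimate i belongs to the midpoint between samples i and i+1."""
    return [(b - a) / (2.0 * math.pi * dt) for a, b in zip(theta, theta[1:])]

# Example: linear chirp theta(t) = 2*pi*(100*t + 500*t**2), so f(t) = 100 + 1000*t Hz.
dt = 1.0e-4
ts = [n * dt for n in range(100)]
wrapped = [math.atan2(math.sin(p), math.cos(p))
           for p in (2.0 * math.pi * (100.0 * t + 500.0 * t * t) for t in ts)]
f_est = instantaneous_frequency(unwrap(wrapped), dt)
```

For a quadratic phase the forward difference is exact at the midpoint, so `f_est[n]` matches 100 + 1000·(t_n + Δt/2) up to rounding.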

Redefinition of instantaneous phase: from the above, $\theta_k(t)$ can be expressed using the angular frequency $\omega_k(t)$ as:

\theta _{k}(t)=\int _{-\infty }^{t}\omega _{k}(\tau )d\tau =\int _{0}^{t}\omega _{k}(\tau )d\tau +\theta _{k}(0)\qquad (\omega _{k}>0)

(6)

On discretization, the above integral can be rewritten in discrete form using the substitutions $t\rightarrow n/f_s=nT$ and $\omega_k(t)\rightarrow \omega_k[n]=2\pi f_k[n]$ (so that $d\tau \rightarrow T$), as follows:

{\begin{aligned}\theta _{k}[n]&=T\sum _{i=-\infty }^{n}\omega _{k}[i]=T\sum _{i=0}^{n}\omega _{k}[i]+\theta _{k}[0]\\&=2\pi T\sum _{i=0}^{n}f_{k}[i]+\theta _{k}[0]\qquad ({\mbox{where}}\ \theta _{k}[0]=\phi _{k}[0]\ {\mbox{for}}\ n=0)\end{aligned}}

(7)

Taking the first difference of $\theta_k[n]$ (the discrete counterpart of differentiation), the above can be written recursively as:

\theta _{k}[n]=\theta _{k}[n-1]+2\pi Tf_{k}[n]

(8)
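A small Python sketch comparing the recursion (8) with a running-sum closed form. One assumption to flag: for the two to agree at n = 0, the sum has to start at i = 1 (as in the equation under "General expression of inharmonic additive synthesis" earlier on this page), so the closed form below is written that way rather than with the i = 0 lower limit of equation (7). The function names are hypothetical.

```python
import math

def phase_recursive(f, theta0, fs):
    """Equation (8): theta[n] = theta[n-1] + 2*pi*T*f[n], starting from theta[0] = theta0."""
    T = 1.0 / fs
    theta = [theta0]
    for n in range(1, len(f)):
        theta.append(theta[-1] + 2.0 * math.pi * T * f[n])
    return theta

def phase_closed_form(f, theta0, fs, n):
    """Running-sum form: theta[n] = 2*pi*T * (f[1] + ... + f[n]) + theta[0]."""
    T = 1.0 / fs
    return 2.0 * math.pi * T * sum(f[1:n + 1]) + theta0
```

The recursive form is what an oscillator bank typically runs per sample (one add per partial), while the closed form is convenient for checking.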

Comparison with expressions shown on other sources

Almost the same expressions are seen in Smith III 2011 [6]^{[cite 1]} in continuous form, and in Smith III & Serra 2005 [7].^{[cite 2]}^{[note 8]}
[ADDED] Similarly, the same equation forms are also seen in Rodet & Depalle 1992, on the 2nd page. In their expression, the above instantaneous phase $\theta$ is replaced by a phase offset $\Phi$, and the above initial (or static) phase offset $\phi_k[0]$ does not appear.^{[note 9]} Thus, as far as the above equations are concerned, that paper at least seems somewhat hard to call a reliable source.

Introduction of band-limited time-varying terms

According to Papoulis 1977, p. 184,

"We shall say that a function $f(t)$ is bandlimited if its Fourier transform is zero outside a finite interval ($F(\omega)=0$ for $|\omega|>\sigma$) and its energy $E$ is finite."

(Supplement: in the above quotation, $\sigma$ means the bandwidth, and it often seems to be given by the fundamental frequency $\omega_0=2\pi f_0$, as shown below.^{[cite 3]})

For the time-varying terms other than frequency (i.e. phase and amplitude), if these are band-limited below the fundamental frequency $f_0$,^{[cite 3]}^{[cite 4]} the whole discussion above is almost applicable, with a few modifications. (The implementation details of band-limited terms are not covered in this note.)

For time-varying phase $\phi_k(t)$ in equation 4

If $\phi_k[n]$ is band-limited as above,^{[cite 3]} the whole discussion above is almost applicable with a few modifications: the subsequent equations (eq. 5, 6, 7, 8) must take the phase $\phi_k(t)$ into account, as follows:

{\begin{aligned}f_{k}(t)&=\omega _{k}(t)/2\pi ={\frac {d}{dt}}\left(\theta _{k}(t)-\phi _{k}(t)\right)/2\pi \end{aligned}}

(5-2)

\theta _{k}(t)=\int _{-\infty }^{t}\omega _{k}(\tau )d\tau +\phi _{k}(t)=\int _{0}^{t}\omega _{k}(\tau )d\tau +\theta _{k}(0)+\phi _{k}(t)\qquad (\omega _{k}>0)

(6-2)

{\begin{aligned}\theta _{k}[n]&=T\sum _{i=-\infty }^{n}\omega _{k}[i]+\phi _{k}[n]=2\pi T\sum _{i=-\infty }^{n}f_{k}[i]+\phi _{k}[n]=2\pi T\sum _{i=0}^{n}f_{k}[i]+\theta _{k}[0]+\phi _{k}[n]\end{aligned}}

(7-2)

[ADDED]

{\begin{aligned}\theta _{k}[n]&=\theta _{k}[n-1]+2\pi Tf_{k}[n]+\phi _{k}[n]-\phi _{k}[n-1]\end{aligned}}

(8-2)

Equation 7-2 requires $\phi_k[0]=0$, because $\theta_k[0]=\theta_k[0]+\phi_k[0]$ for $n=0$.
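The same check for the variant with a time-varying phase offset: applying recursion (8-2) from θ_k[0] telescopes the φ terms, so the result should equal θ_k[0] + 2πT(f_k[1] + ... + f_k[n]) + φ_k[n] when φ_k[0] = 0. A minimal sketch, with a hypothetical function name and a raised error standing in for the φ_k[0] = 0 requirement:

```python
import math

def phase_with_offset(f, phi, theta0, fs):
    """Equation (8-2): theta[n] = theta[n-1] + 2*pi*T*f[n] + phi[n] - phi[n-1]."""
    if phi[0] != 0.0:
        raise ValueError("equation 7-2 requires phi[0] == 0")
    T = 1.0 / fs
    theta = [theta0]
    for n in range(1, len(f)):
        theta.append(theta[-1] + 2.0 * math.pi * T * f[n] + phi[n] - phi[n - 1])
    return theta

# Closed form after telescoping: theta[0] + 2*pi*T*(f[1] + ... + f[n]) + phi[n]
```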

(Note: I expect more accurate and precise discussions can be found somewhere around the DFT, the STFT, or their application, the phase vocoder.)

For time-varying amplitude $r_k[n]$

If $r_k[n]$ is band-limited as above,^{[cite 4]} the whole discussion above applies directly, without modification.

Multiple use of time-varying terms [ADDED]

Using several band-limited time-varying terms together does not always give a band-limited result, even if each term is individually band-limited in the single time-varying model. From the estimated bandwidths for each modulation type (AM: $2f_m$, FM: $2(\Delta f+f_m)$, PM: $2(\Delta\theta+1)f_m$, where $f_m$ is the frequency of a simplified modulating signal $x_m(t)=A_m\cos(2\pi f_m t+\phi_m)$), some sum rule between bandwidths (or at least a rule of thumb) is naturally expected. In other words, when each time-invariant term is replaced by its time-varying version, the total bandwidth will mostly widen rather than stay unchanged or narrow. I'll add details later.
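The rule-of-thumb bandwidths quoted above (the FM and PM estimates are usually known as Carson's rule) are plain arithmetic; the tiny sketch below just encodes them with hypothetical function names. They are estimates of occupied bandwidth, not exact spectra. For a sinusoidal modulator, describing the same modulation as PM with peak phase deviation Δθ = Δf/f_m gives the same figure as FM.

```python
def bandwidth_am(fm):
    """AM with modulating frequency fm: occupied bandwidth about 2*fm."""
    return 2.0 * fm

def bandwidth_fm(delta_f, fm):
    """FM, Carson's rule: about 2*(delta_f + fm), delta_f = peak frequency deviation."""
    return 2.0 * (delta_f + fm)

def bandwidth_pm(delta_theta, fm):
    """PM, Carson's rule: about 2*(delta_theta + 1)*fm, delta_theta = peak phase deviation (rad)."""
    return 2.0 * (delta_theta + 1.0) * fm

# Example: 5 Hz vibrato with 10 Hz peak deviation on one partial.
print(bandwidth_fm(10.0, 5.0))        # 30.0
print(bandwidth_pm(10.0 / 5.0, 5.0))  # 30.0, the same modulation described as PM
```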
General expression of inharmonic additive synthesis

[EXTENDED]
Using equation 7-2 above, inharmonic additive synthesis is expressed as:

{\begin{aligned}y[n]&=\sum _{k=1}^{K}r_{k}[n]\cos \left(\theta _{k}[n]\right)\\&=\sum _{k=1}^{K}r_{k}[n]\cos \left(2\pi T\sum _{i=-\infty }^{n}f_{k}[i]+\phi _{k}[n]\right)\\&=\sum _{k=1}^{K}r_{k}[n]\cos \left(2\pi T\sum _{i=0}^{n}f_{k}[i]+\theta _{k}[0]+\phi _{k}[n]\right)\\\end{aligned}}

(10)

Notes

^ As explained in a later section, instantaneous frequency can be defined in continuous form as the derivative of the instantaneous phase $\theta_k(t)$: ${\frac {d}{dt}}\theta _{k}(t)/2\pi$.

^ [MOVED TO NOTE] Following the style of signal processing, the above real-valued signal can be extended into a complex form called the "analytic representation":

{\begin{aligned}x_{a}(t)&=x_{k}(t)+j\cdot {\tilde {x}}_{k}(t)\\\end{aligned}}

where $j$ denotes the imaginary unit. Its real part is the same as $x_k(t)$ above, and the additional imaginary part $\tilde{x}_k(t)$, given by the Hilbert transform of $x_k(t)$, is known to be expressed as follows (provided $r_k(t)$ and $\phi_k(t)$ vary slowly enough, i.e. are sufficiently band-limited):

{\tilde {x}}_{k}(t)=H(x_{k})(t)=r_{k}(t)\sin(\omega _{k}t+\phi _{k}(t))

(2)

Analytic representation (analytic signal) is expressed as:

{\begin{aligned}x_{a}(t)&=r_{k}(t)\left[\cos(\omega _{k}\ t+\phi _{k}(t))+j\cdot \sin(\omega _{k}\ t+\phi _{k}(t))\right]\\&=r_{k}(t)\cdot e^{j(\omega _{k}t+\phi _{k}(t))}\ \ \ \ {\mbox{(complex signal)}}\end{aligned}}

(3)

^ In this note, the Hilbert transform of $x_k(t)$ is denoted by $\tilde{x}_k(t)$ instead of $\hat{x}_k(t)$.
Reason: the latter notation is also often used to denote the analytic representation itself, $x_a(t)=x_k(t)+jH(x_k)(t)$, or even the Fourier transform $\mathcal{F}(x_k)(\omega)$ of $x_k(t)$. To avoid confusion, the former notation is more appropriate here.
^ The Hilbert transforms of the $\cos$ and $\sin$ functions are known to be the $\pi/2$ phase-delayed signals (e.g. the Hilbert transform of $\cos t$ is $\sin t$). Although the details are omitted here, the transform is obtained by convolving $x_k(t)$ with the function $h(t)=1/(\pi t)$, in the form of a principal value integral, as follows:

H(x_{k})(t)={\frac {1}{\pi }}\ {\mbox{p.v.}}\int _{-\infty }^{\infty }{\frac {x_{k}(\tau )}{t-\tau }}d\tau \quad {\mbox{or}}\quad H(x_{k})(t)=-{\frac {1}{\pi }}\lim _{\epsilon \downarrow 0}\int _{\epsilon }^{\infty }{\frac {x_{k}(t+\tau )-x_{k}(t-\tau )}{\tau }}d\tau

where p.v. denotes the Cauchy principal value.
^ [MOVED TO NOTE] The following is merely my guess at the reason for using instantaneous phase with time-varying frequency.

Possibly, these notions were originally required to precisely estimate the amount of change during one discrete time step ($\Delta t=T=1/f_s$) when acquiring, processing, or re-sampling an external analog signal. Once these notions became the basis of related fields (signal analysis and digital signal processing), they came to be used everywhere, even where they were not essentially required. These are just my guesses. Possibly the truth can't be explained in plain language other than mathematics :)
^ [ADDED] As for the need for instantaneous terms, I found again that general frequency modulation is expressed using instantaneous phase (in the form of a finite integral of the instantaneous frequency $f(t)=\omega(t)/2\pi$) as:

{\begin{aligned}y(t)&=A_{c}\cos \left(\int _{0}^{t}\omega (\tau )d\tau \right)\\&=A_{c}\cos \left(\omega _{c}t+\omega _{\Delta }\int _{0}^{t}x_{m}(\tau )d\tau \right)\end{aligned}}

where $x_m(t)$ is the modulating signal, and

[REVISED] $\omega_\Delta/2\pi=f_\Delta$ is the frequency deviation (the maximum shift from $f_c$ for a normalized $x_m(t)$, i.e. $|x_m(t)|\leq 1$).

Phase modulation is also sometimes explained as a kind of differential version of frequency modulation (from the viewpoint of physical dimensions). I probably studied these for my ham radio license in childhood; however, at that time these equations were merely a pile of definitions without meaning to me. Now I should find more rational explanations for them.
^ [ADDED] Argument function

{\mbox{arg}}(x_{a}(t))=\operatorname {atan2} \left(\Im (x_{a}(t)),\ \Re (x_{a}(t))\right)\quad {\mbox{for}}\ x_{a}(t)\neq 0
^ [ADDED] According to Smith III 2011 [1] and Smith III & Serra 2005 [2]:

\theta _{k}(t)=\int _{0}^{t}\omega _{k}(\tau )d\tau +\phi _{k}(0)\ \ {\xrightarrow {\mbox{discretize}}}\ \ \theta _{k}[n]=2\pi T\sum _{i=0}^{n}f_{k}[i]+\phi _{k}[0]

{\hat {\Theta }}_{k}(n)\triangleq {\hat {\Theta }}_{k}(n-1)+2\pi T{\hat {F}}_{k}(n)

(Note: in the above expressions, some variable names have been changed: $i\rightarrow k$, and the integration variable $t\rightarrow \tau$.)

where [3] [4]

$\tilde{x}'_m(e^{j\omega_k})$: the STFT of $x_m(n)\triangleq x(n-mR)$ at the $k$th bin and $m$th frame,
$R$ is the hop size of the STFT,

an over-tilde denotes applying the spectral analysis window $w(n)$, and

a prime ( $'$ ) denotes zero-padding on both sides of the FFT frame.

${\begin{aligned}\Theta _{k}(m)&\triangleq \angle {\tilde {x}}'_{m}(e^{j\cdot \omega _{k}})\ \ \ \ [{\mbox{radians}}]\\F_{k}(m)&\triangleq {\frac {\Theta _{k}(m)-\Theta _{k}(m-1)}{2\pi RT}}\ \ \ \ [{\mbox{Hz}}]\\T&=1/f_{s}\ \ \ \ [{\mbox{seconds}}]\\\end{aligned}}$
^ [ADDED] In the "Theory and the oscillator method" section of Rodet & Depalle 1992, the following equations are used:

${\begin{aligned}s[n]&=\sum _{k=1}^{K}c_{k}[n]\\c_{k}[n]&=a_{k}[n]\cdot \cos(\Phi _{k}[n])\\\Phi _{k}[n]&=\Phi _{k}[n-1]+{\frac {2\pi }{Sr}}f_{k}[n]\end{aligned}}$

(Note: in the above expressions, some variable names have been changed: $j\rightarrow k$, $J\rightarrow K$.)

where, for the $k$th partial at sample time $n$,

$f_k[n]$ is the frequency,

$a_k[n]$ is the amplitude, and

$\Phi_k[n]$ is the phase (Note: probably a typo for the instantaneous phase).
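The frequency-modulation expression in note 6 can be reproduced with the same phase accumulator used for additive synthesis. The sketch below assumes a normalized sinusoidal modulator x_m(t) = cos(2π f_m t), for which the integral has the closed form (f_Δ/f_m)·sin(2π f_m t), and compares the two. The accumulator is a right-rectangle approximation of the integral, so a phase error of order 2π f_Δ T is expected; all names are illustrative.

```python
import math

def fm_phase(fc, f_delta, fm, fs, num_samples):
    """Accumulate theta[n] = theta[n-1] + 2*pi*T*f[n] from theta[0] = 0, with
    instantaneous frequency f[n] = fc + f_delta * x_m[n], x_m[n] = cos(2*pi*fm*n*T)."""
    T = 1.0 / fs
    theta = [0.0]
    for n in range(1, num_samples):
        f_inst = fc + f_delta * math.cos(2.0 * math.pi * fm * n * T)
        theta.append(theta[-1] + 2.0 * math.pi * T * f_inst)
    return theta

def fm_phase_closed_form(fc, f_delta, fm, t):
    """omega_c*t + omega_delta * integral_0^t cos(omega_m tau) dtau
    = 2*pi*fc*t + (f_delta/fm) * sin(2*pi*fm*t)."""
    return 2.0 * math.pi * fc * t + (f_delta / fm) * math.sin(2.0 * math.pi * fm * t)

def fm_signal(amp, theta):
    """y[n] = A_c * cos(theta[n])."""
    return [amp * math.cos(p) for p in theta]

# 440 Hz carrier, 5 Hz modulator, 50 Hz deviation, 0.1 s at 48 kHz.
theta = fm_phase(440.0, 50.0, 5.0, 48000.0, 4800)
y = fm_signal(1.0, theta)
```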

References

^ Smith III 2011, "Additive Synthesis (Early Sinusoidal Modeling)"
^ Smith III & Serra 2005, Additive Synthesis
^ ^a ^b ^c Kwakernaak & Sivan 1991, pp. 613–614
^ ^a ^b Papoulis 1977, p. 121

Smith III, Julius O. (2011), Spectral Audio Signal Processing, CCRMA, Department of Music, Stanford University, ISBN 978-0-9745607-3-1
Smith III, Julius O.; Serra, Xavier (2005), "PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation", Proceedings of the International Computer Music Conference (ICMC-87, Tokyo), Computer Music Association, 1987, CCRMA, Department of Music, Stanford University (online reprint)
Kwakernaak, Huibert; Sivan, Raphael (1991), Modern Signals and Systems: solutions manual with software, USA: Prentice Hall, ISBN 9780138092603
Papoulis, Athanasios (1977), Signal Analysis, USA: McGraw-Hill, ISBN 9780070484603
Rodet, X.; Depalle, P. (IRCAM) (1992), "Spectral Envelopes and Inverse FFT Synthesizer", AES 1992, San Francisco
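The phase-difference frequency estimate F_k(m) quoted in note 8 can be sketched as below. Several simplifications are assumptions of this sketch, not details of Smith III's code: a complex exponential test signal (so there is no negative-frequency leakage), a Hann window, no zero-padding, and frames that advance forward in time, x(n + mR) (with the opposite sign convention the estimate changes sign). Because the phase difference is only known modulo 2π, the estimate is unambiguous only while |f|·R/f_s < 1/2. Function names are hypothetical.

```python
import cmath
import math

def hann(N):
    """Periodic Hann window of length N."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]

def bin_phase(x, start, k, window):
    """Phase (radians) of DFT bin k of the windowed frame x[start:start+N]."""
    N = len(window)
    s = sum(window[n] * x[start + n] * cmath.exp(-2j * math.pi * k * n / N)
            for n in range(N))
    return cmath.phase(s)

def phase_diff_frequency(x, k, N, R, fs, m):
    """F_k(m) = (Theta_k(m) - Theta_k(m-1)) / (2*pi*R*T) in Hz, frames starting at m*R."""
    T = 1.0 / fs
    w = hann(N)
    d = bin_phase(x, m * R, k, w) - bin_phase(x, (m - 1) * R, k, w)
    d = math.atan2(math.sin(d), math.cos(d))  # wrap the difference to [-pi, pi]
    return d / (2.0 * math.pi * R * T)

# Complex test tone at 440 Hz, fs = 8 kHz; N = 64, hop R = 4, bin k = 4.
fs = 8000.0
x = [cmath.exp(2j * math.pi * 440.0 * n / fs) for n in range(400)]
estimate = phase_diff_frequency(x, k=4, N=64, R=4, fs=fs, m=10)
```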

I welcome gentle criticism :) --Clusternote (talk) 14:02, 19 January 2012 (UTC) [rev.3.2]Clusternote (talk) 23:26, 26 January 2012 (UTC) [added summary on rev.3.3]--Clusternote (talk) 22:29, 9 March 2012 (UTC)

Instantaneous phase and instantaneous frequency

About instantaneous phase and instantaneous frequency, in the context of additive synthesis equations, we cannot equate our definitions of time-dependent phase and time-dependent frequency to those definitions because those are non-local and we have local definitions. Instantaneous phase of a real-valued signal is obtained by introducing an imaginary component that is the Hilbert transform of the real signal, and by taking the argument of the resulting complex number. Hilbert transform is obtained by convolution with the Hilbert kernel, which has no compact support, hence non-locality. In other words, instantaneous phase of a non-static sinusoid, at current time, will depend on the unknown future, indefinitely, and that's not what's happening in our synthesis equations. So it might be better to stick to the different names or to describe the difference, or both. Olli Niemitalo (talk) 11:00, 16 January 2012 (UTC) Edit: This applies to the discrete equations. To the continuous equations, I'm not sure! Olli Niemitalo (talk) 14:16, 16 January 2012 (UTC)
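The non-locality point above can be demonstrated with a discrete analytic signal computed through the DFT (positive-frequency bins doubled, negative ones zeroed). This circular, finite-length version of the Hilbert transform is an assumption of the sketch; the infinite continuous-time kernel behaves analogously. Two signals that agree up to sample 39 still get different imaginary parts at sample 35, because the later samples differ.

```python
import cmath
import math

def analytic_signal(x):
    """Discrete analytic signal via the DFT: keep DC (and Nyquist for even N),
    double positive-frequency bins, zero negative ones, then inverse DFT.
    Direct O(N^2) transforms keep the sketch dependency-free."""
    N = len(x)
    X = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]
    gain = [0.0] * N
    gain[0] = 1.0
    for k in range(1, (N + 1) // 2):
        gain[k] = 2.0
    if N % 2 == 0:
        gain[N // 2] = 1.0
    return [sum(gain[k] * X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# A steady sinusoid (4 cycles in 64 samples) versus the same sinusoid silenced from n = 40 on.
N = 64
x1 = [math.cos(math.pi * n / 8.0) for n in range(N)]
x2 = [v if n < 40 else 0.0 for n, v in enumerate(x1)]
a1, a2 = analytic_signal(x1), analytic_signal(x2)
# At n = 35 the two inputs are identical, yet a1[35].imag differs from a2[35].imag.
```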

I think I have roughly grasped the intention of your clear proposal. Moreover, the direction of recent improvements to that section seems almost correct, in my eyes. What I can't grasp yet is why we should still use this specific implementation, which seems somewhat hard to explain rationally, as the example on Wikipedia. Is it perhaps based on de facto standard code in that field? (If so, we may be able to find several sources other than Smith III ...) --Clusternote (talk) 12:46, 16 January 2012 (UTC)

P.S. I understand that you are talking about signal processing using complex form. It is not my specialty; however, your kind comment above is very helpful in improving my understanding of these equations. I'm sorry to trouble you. Sincerely, --Clusternote (talk) 17:44, 16 January 2012 (UTC)

P.S.2 I'd be happy if I could clearly identify the definitions of time-dependent phase and time-dependent frequency, and also the local definitions used in the equations described in that section. If possible ... --Clusternote (talk) 18:24, 16 January 2012 (UTC)

Wikipedia is used as a source of information by all kinds of people. While the general reader (say, a musician) might not be interested, the discrete equations may be useful for those who wish to create their own additive synthesizers (say, a computer programmer), or interesting to someone who wishes to know more about how their inner workings can be described mathematically (say, a "nerdy" person). Clusternote, perhaps it would be more helpful, as compared to you writing a footnote, if you would point out here on the talk page what exactly it is that you find disagreeable about the discrete equations. We can then respond by 1) clarifying the article and 2) correcting any true defects. Olli Niemitalo (talk) 14:16, 16 January 2012 (UTC)

P.S. Anyway, I know that you recognized several issues in the earlier version of the theory section and corrected them, and so did the IP user. I appreciate your intellectual honesty.

By the way, are most other users here perhaps merely enthusiasts, students, or general users? I can hardly believe that professionals in this field can't prove their own mathematical equations. It's a miracle! --Clusternote (talk) 22:14, 22 January 2012 (UTC)

I've split my previous post into sub-section "#Yet not clarified points" for ease of editing. --Clusternote (talk) 06:31, 21 January 2012 (UTC) [Added Link]--Clusternote (talk) 09:08, 23 January 2012 (UTC)

- - - - - - - - - - - - - - - - - - - - -

Actually, Olli, I'm not sure I agree with you. First, I do not think that there is much difference in outcome between the continuous-time case and the discrete-time case. Second, as long as the envelopes r_k(t) are bandlimited sufficiently, the Hilbert transform of

y(t)=\sum _{k=1}^{K}r_{k}(t)\cos(2\pi f_{k}t+\phi _{k}),

is

{\hat {y}}(t)=\sum _{k=1}^{K}r_{k}(t)\sin(2\pi f_{k}t+\phi _{k})

and the analytic signal is

y(t)+i{\hat {y}}(t)=\sum _{k=1}^{K}r_{k}(t)e^{i(2\pi f_{k}t+\phi _{k})}.

At this point there is agreement with how we understand instantaneous frequency from the POV of the analytic signal. The instantaneous frequency of a continuous-time sinusoid is simply and always the derivative of the argument of the sine or cosine function w.r.t. time, and it needs no definition or relationship with the analytic signal when only real sinusoids are involved. 71.169.180.195 (talk) 02:01, 17 January 2012 (UTC)
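The claim above that the Hilbert transform of r_k(t)cos(...) is r_k(t)sin(...) when the envelope is sufficiently band-limited (an instance of Bedrosian's theorem) can be checked with a DFT-based analytic signal. In the sketch below (assumptions: a single partial, a circular DFT, N = 128, hypothetical helper names), an envelope with one cycle per frame leaves the relation exact up to rounding, while an envelope varying faster than the carrier itself (bin 20 against a carrier in bin 16) pushes a component into negative frequencies and breaks it.

```python
import cmath
import math

def analytic_signal(x):
    """DFT-based analytic signal: double positive bins, zero negative ones (O(N^2))."""
    N = len(x)
    X = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]
    gain = [0.0] * N
    gain[0] = 1.0
    for k in range(1, (N + 1) // 2):
        gain[k] = 2.0
    if N % 2 == 0:
        gain[N // 2] = 1.0
    return [sum(gain[k] * X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 128
carrier = [2.0 * math.pi * 16 * n / N for n in range(N)]                   # partial in bin 16
slow = [1.0 + 0.5 * math.cos(2.0 * math.pi * 1 * n / N) for n in range(N)]   # envelope in bin 1
fast = [1.0 + 0.5 * math.cos(2.0 * math.pi * 20 * n / N) for n in range(N)]  # envelope in bin 20

def envelope_error(r):
    """Max |analytic(r*cos) - r*exp(j*carrier)| over the frame."""
    x = [r[n] * math.cos(carrier[n]) for n in range(N)]
    a = analytic_signal(x)
    return max(abs(a[n] - r[n] * cmath.exp(1j * carrier[n])) for n in range(N))
```

With these settings `envelope_error(slow)` is at rounding level, while `envelope_error(fast)` is about 0.5, coming from the sideband at bin 16 - 20 = -4.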

And, as a result, #Supplemental note for section "Inharmonic partials" using continuous form is almost correct. --Clusternote (talk) 02:59, 17 January 2012 (UTC)

I know you wish for that to be true, Cluster. But it isn't. For example, the expression

{\frac {d}{dt}}\phi _{k}[n]/2\pi

has no meaning. (The reason is that you're mixing discrete-time notions with continuous-time notions ad hoc in the same equation. The only way to relate continuous-time notions to discrete-time, is via the sampling theorem.) Your supplemental note is dead-in-the-water, right from the beginning. 71.169.180.195 (talk) 04:02, 17 January 2012 (UTC)

Are you sure? The following two expressions are probably correct.

f_{k}(t)=\omega _{k}(t)/2\pi ={\frac {d}{dt}}\phi _{k}(t)/2\pi

(as written in the article Instantaneous frequency)

and

\phi (t)=\int _{0}^{t}\omega d\tau +\theta

(in continuous form)

The latter expression is also mentioned on

best regards, --Clusternote (talk) 04:18, 17 January 2012 (UTC)

Those two equations are correct.*** And they are equations that live solely in the continuous-time domain. So, in your note, can you tell me what domain φ_k[n] is in? And what meaning there is to (d/dt)φ_k[n]? (d/dt)φ_k(t) does have meaning, but (d/dt)φ_k[n] does not, and if you want to convert φ_k(t) to φ_k[n] or back again, you need to consider the sampling theorem. BTW, the \textstyle TeX command does not appear to do anything. I don't know why you like putting it in. Use HTML or LaTeX, whichever you like, but try to keep the equations clean and consistent in style and use. It makes it easier for others to see what you're saying and, if they respond, to copy and edit your equations in response. Glad you're playing nice now. 71.169.180.195 (talk) 04:35, 17 January 2012 (UTC)

*** Actually, that article (which I have never contributed to) reverses the common convention for θ being the overall angle going into a cos() or sin() function and φ being the phase offset from a frequency reference term. It really should be (removing your superfluous formatting):

f_{k}(t)=\omega _{k}(t)/2\pi ={\frac {{\frac {d}{dt}}\theta _{k}(t)}{2\pi }}

and

\theta (t)=\int _{0}^{t}\omega (\tau )d\tau +\theta (0)

71.169.180.195 (talk) 04:42, 17 January 2012 (UTC)

I'm sorry for troubling you with my mistype and \textstyle. I have since corrected it to

{\frac {d}{dt}}\phi _{k}(t)

, as you pointed out. As for \textstyle, I used it because I wrote these as a "footnote". Of course, I should not use it in other situations, including talk pages. Anyway, thanks. --Clusternote (talk) 04:51, 17 January 2012 (UTC)

(split topic on "band-limiting requirements" into a subsection for ease of discussion --Clusternote (talk) 09:08, 23 January 2012 (UTC))

Band-limiting requirements

71.169.180.195, well, I think I'm still going to feel uneasy about it unless we mention the band-limiting requirement for

r_{k}[n]\,

and also for

f_{k}[n]\,

(thus allowing the two definitions of instantaneous phase and instantaneous frequency to almost meet, and also pretty much hiding the fact that

f_{k}[n]\,

is "centered" in between the times of samples

n-1\,

and

n\,

). — Preceding unsigned comment added by Olli Niemitalo (talk • contribs) 03:14, 17 January 2012 (UTC)
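Olli's remark that f_k[n] is "centered" between the times of samples n-1 and n can be checked numerically. The Python sketch below assumes a linear chirp (an illustrative choice, not taken from the discussion). For such a chirp, the backward phase difference (θ[n] - θ[n-1])/(2πT) equals the true instantaneous frequency at t = (n - 1/2)T exactly, rather than at t = nT. The function name and parameter values are hypothetical.

```python
import math

def backward_difference_frequency(theta, fs):
    """Frequency estimates in Hz from sampled phase theta[n] (radians):
    (theta[n] - theta[n-1]) / (2*pi*T) for n = 1 .. len(theta)-1, with T = 1/fs."""
    T = 1.0 / fs
    return [(theta[n] - theta[n - 1]) / (2 * math.pi * T) for n in range(1, len(theta))]

# Illustrative linear chirp: f(t) = f0 + c*t, so theta(t) = 2*pi*(f0*t + c*t**2/2).
fs = 1000.0   # sample rate in Hz (assumed)
f0 = 100.0    # frequency at t = 0, in Hz (assumed)
c = 50.0      # sweep rate in Hz per second (assumed)
T = 1.0 / fs

theta = [2 * math.pi * (f0 * n * T + 0.5 * c * (n * T) ** 2) for n in range(200)]
estimates = backward_difference_frequency(theta, fs)

for n in (1, 100, 199):
    f_at_sample = f0 + c * n * T             # true frequency at t = nT
    f_at_midpoint = f0 + c * (n - 0.5) * T   # true frequency at t = (n - 1/2)T
    print(n, estimates[n - 1], f_at_sample, f_at_midpoint)
```

The estimates track the midpoint column and lag the sample-time column by c·T/2. This half-sample offset is what band-limiting and slow variation make negligible in practice.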

Well, Olli, they don't do that so much in our communication theory and signal processing textbooks. It's because no envelope that is time-limited can also be bandlimited. All we need to worry about is that these envelopes are sufficiently bandlimited so that they don't spill into the negative frequencies. That Hilbert transform and analytic signal relationship is actually an approximation, but a very good one. It essentially means that we can think of those envelopes as constant. If r_k(t) does not vary too fast, its variation does not affect the instantaneous frequency. Now when you apply the same reasoning to a_k(t) and b_k(t), then we start to get into a little problem because if they do not vary in tandem, then the effective φ_k(t) varies and we *know* variation of phase causes detuning of the instantaneous frequency. This is essentially why, if starting from the Fourier series perspective (with the a_k(t) and b_k(t) envelopes), you have to have some way of absorbing the change of φ_k(t) into the instantaneous frequency. But if we don't start with that and don't even fiddle with

y[n]=\sum _{k=1}^{K}\left[a_{k}[n]\cos \left({\frac {2\pi kf_{0}}{f_{\mathrm {s} }}}n\right)-b_{k}[n]\sin \left({\frac {2\pi kf_{0}}{f_{\mathrm {s} }}}n\right)\right]

and start directly out of

y[n]=\sum _{k=1}^{K}r_{k}[n]\cos \left({\frac {2\pi }{f_{\mathrm {s} }}}\sum _{i=1}^{n}f_{k}[i]+\phi _{k}\right)\,

we might be able to totally ditch the "absorb the change of phase into the instantaneous frequency" thing. And, I imagine if we do, Clusternote will want to take credit for it (and maybe some credit for motivating the issue is due him). Note that it is φ_k not φ_k[n]. I am contemplating how to do that succinctly and considering pedagogy. This is why I started that Fourier series section. I think, if we do this right, that section is the only place we should see a_k(t) and b_k(t), both sines and cosines at the same frequencies. After disposing of that, then the remainder should always be about a modulating amplitude r_k(t) and either a modulating phase φ_k(t) with harmonic frequencies k f₀ or an inharmonic and possibly time-variant frequency f_k(t) with a constant phase offset φ_k. 71.169.180.195 (talk) 04:02, 17 January 2012 (UTC)
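The phase-accumulator form y[n] = Σ r_k[n] cos((2π/f_s) Σ_{i=1}^{n} f_k[i] + φ_k) can be sketched in a few lines. Each partial has a constant phase offset φ_k, as suggested above. The function name `additive_synth`, the list-of-lists layout for r and f, and starting n at 0 (so the inner sum is empty for the first sample) are assumptions made for illustration.

```python
import math

def additive_synth(r, f, phi, fs):
    """y[n] = sum_k r[k][n] * cos(2*pi/fs * sum_{i=1}^{n} f[k][i] + phi[k]).

    r[k][n]: amplitude of partial k at sample n
    f[k][n]: instantaneous frequency of partial k at sample n, in Hz
    phi[k]:  constant phase offset of partial k, in radians
    Samples are indexed from n = 0, where the inner sum is empty.
    """
    num_samples = len(r[0])
    y = [0.0] * num_samples
    for k in range(len(r)):
        acc = 0.0  # (2*pi/fs) * sum_{i=1}^{n} f[k][i], reduced with fmod to limit round-off growth
        for n in range(num_samples):
            if n > 0:
                acc = math.fmod(acc + 2 * math.pi * f[k][n] / fs, 2 * math.pi)
            y[n] += r[k][n] * math.cos(acc + phi[k])
    return y

# Two partials: a steady 440 Hz tone and an inharmonic partial gliding upward from 660 Hz.
fs = 8000.0
N = 64
r = [[1.0] * N, [0.5] * N]
f = [[440.0] * N, [660.0 + 2.0 * n for n in range(N)]]
phi = [0.0, math.pi / 4]
y = additive_synth(r, f, phi, fs)
```

With a constant f_k the accumulator reduces to r_k cos(2π f_k n/f_s + φ_k). The sines and cosines at the same frequency (a_k and b_k) never appear here, which is the simplification being proposed.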

71.169.180.195, I found a couple of books that discuss the analytic representation of amplitude-modulated sinusoids. Quoting: "If the input to a real system is a modulated signal f(t) with band-limited envelope y(t) as in [reference to Y(ω) = 0 for |ω|>ω₀, where Y(ω) is the Fourier transform of y(t)], f(t) = y(t) cos ω₀ t then z_f(t) = y(t)e^jω₀t [z_f(t) being the analytic representation of f(t)]" (Athanasios Papoulis, Signal Analysis, 1977, USA, p 121). The other book discusses phase-modulated sinusoids: "let x be the signal x(t) = cos [2πf₀t + ϕ(t)], t ∈ ℝ, with ϕ a real signal such that the complex signal g given by g(t) = e^jϕ(t), t ∈ ℝ, is low-pass with bandwidth B such that B ≤ f₀. [...] Because by assumption the bandwidth B of g is less than f₀, it follows that [...] x_p(t) = e^{j[2πf₀t+ϕ(t)]} [x_p(t) being the analytic representation of x(t)]" (Huibert Kwakernaak & Raphael Sivan, Modern Signals and Systems, 1991, USA, p 613-614) I'm finding in Kwakernaak & Sivan also the differential-of-sinusoid-phase definition of instantaneous frequency, so I'm going to resolve this by taking the issue to Instantaneous phase, the Wikipedia article that should contain also that definition of instantaneous frequency. I agree that we cannot make a strict assertion about bandlimitedness, but I would still say that a notion that the envelopes vary slowly with respect to the unmodulated frequency is still in order. Olli Niemitalo (talk) 16:04, 17 January 2012 (UTC)
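The Papoulis statement can be checked numerically. For f(t) = y(t) cos ω₀t with an envelope y band-limited below ω₀, the analytic signal should be y(t)e^{jω₀t}. The Python sketch below builds the analytic signal by zeroing the negative-frequency DFT bins, the same construction scipy.signal.hilbert uses, written here with numpy.fft only. The one-second periodic window, the 2 Hz envelope and the 100 Hz carrier are assumptions chosen so that every component falls exactly on a DFT bin.

```python
import numpy as np

def analytic_signal(x):
    """Analytic representation x + j*H{x} via the DFT: keep DC and Nyquist bins,
    double positive-frequency bins, zero negative-frequency bins.
    Treats x as one period of a periodic signal; assumes len(x) is even."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[n // 2] = 1.0
    h[1:n // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

fs = 1024                                      # one-second window, so 1 Hz per DFT bin
t = np.arange(fs) / fs
f0 = 100.0                                     # carrier frequency in Hz
y = 1.0 + 0.5 * np.cos(2 * np.pi * 2.0 * t)    # envelope band-limited to 2 Hz, well below f0
x = y * np.cos(2 * np.pi * f0 * t)             # f(t) = y(t) cos(w0 t)

z = analytic_signal(x)
error = np.max(np.abs(z - y * np.exp(1j * 2 * np.pi * f0 * t)))
print(error)
```

The signal components sit at 98, 100 and 102 Hz, all exactly on bins, so the printed error is at floating-point round-off level. With an envelope whose spectrum reaches the negative frequencies, or with a non-periodic window, the identity holds only approximately.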

Thank you for this valuable information! (I had almost started searching for references on these myself.) I have also reflected them in my memo. Sincerely, --Clusternote (talk) 14:08, 19 January 2012 (UTC)

Hi Olli, dunno if the IP is the same today as it was yesterday. Wow! How did you find all of those characters to cut and paste into the discussion? Now those definitions of instantaneous frequency regarding the analytic signal do not dispute the simpler and older definition of simply the derivative w.r.t. time of whatever the argument of the sin() or cos() function. Probably a better article there would start with the simpler definition with no reference to any complex valued functions (like the analytic signal) and then expand it to the analytic signal. Now, in our case it is conceivable to use the analytic signal definition and speak of an overall instantaneous frequency of this collection of sinusoids:

y(t)=\sum _{k=1}^{K}r_{k}(t)\cos(2\pi f_{k}t+\phi _{k})\,

but it would make no sense. The only sense that can be made for this concept of instantaneous frequency is to apply it to a single sinusoid:

y(t)=r(t)\cos(\omega _{0}t+\phi (t))\,

.

In that case, both definitions are precisely identical. In both cases the instantaneous (angular) frequency is ω₀+(d/dt)ϕ(t) (again, assuming r(t) varies slowly enough). 71.169.180.195 (talk) 17:34, 17 January 2012 (UTC)

71.169.180.195, from here: ⊙. I agree with all you're saying (with an additional assumption for your last point that e^iϕ(t) also varies slowly enough). Olli Niemitalo (talk) 18:41, 17 January 2012 (UTC)

70.109.178.133, about requiring

f_{k}(t)

to be nonnegative: It is to equate it with the definitions of instantaneous frequency. I understand that if

f_{k}(t)

"bounces off" 0Hz, the sinusoid will be the same even if you swap the sign after that point. But going to negative frequencies,

f_{k}(t)

would no longer match the definition of the instantaneous frequency of a real signal; hence it is restricted to non-negative values. This is a nice read about the development of definitions of instantaneous phase and amplitude: (Naoki Saito & Jimena Royo Letelier, Presentation: Amplitude and Phase Factorization of Signals via Blaschke Product and Its Applications, March 9, 2009, jsiam09.pdf). — Preceding unsigned comment added by Olli Niemitalo (talk • contribs) 09:12, 25 January 2012 (UTC)

Yet not clarified points

Why was this specific implementation selected as the example in the article?
It seems to share several points with Smith III's articles (the first one was originally added by me); however, it still relies almost entirely on a single source. To meet notability requirements, we should add more reliable sources.
Added one citation, for the angle increment equation with constant frequency. Olli Niemitalo (talk) 03:31, 17 January 2012 (UTC)
Thanks. I've checked it against eq. 8 in my memo, and found that this relation applies only to a static phase $\phi _{k}$, as your improvement already reflects. (In the time-varying phase offset model, the additional phase-offset difference $(\phi _{k}[n]-\phi _{k}[n-1])$ is required, as shown in #math 8-2.) --Clusternote (talk) 09:51, 20 January 2012 (UTC)
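Clusternote's point can be checked with a short Python sketch: the static-phase increment holds only for a constant φ_k, and a time-varying offset needs the extra term (φ_k[n] - φ_k[n-1]). The sinusoidal phase drift, the parameter values and the function names are illustrative assumptions. Dividing the extra term by 2πT turns it into a frequency deviation. This is one way to read the earlier phrase about absorbing φ_k[n] into the instantaneous frequency.

```python
import math

def phase_direct(fk, phi, fs):
    """theta[n] = 2*pi*fk*n*T + phi[n]: fixed frequency fk (Hz), time-varying phase offset phi[n]."""
    T = 1.0 / fs
    return [2 * math.pi * fk * n * T + phi[n] for n in range(len(phi))]

def phase_recursive(fk, phi, fs, include_offset_difference=True):
    """Same phase built incrementally from theta[0] = phi[0]:
    theta[n] = theta[n-1] + 2*pi*T*fk (+ phi[n] - phi[n-1] if include_offset_difference)."""
    T = 1.0 / fs
    theta = [phi[0]]
    for n in range(1, len(phi)):
        step = 2 * math.pi * T * fk
        if include_offset_difference:
            step += phi[n] - phi[n - 1]
        theta.append(theta[-1] + step)
    return theta

fs, fk = 8000.0, 220.0
phi = [0.3 * math.sin(2 * math.pi * 3.0 * n / fs) for n in range(500)]   # slowly drifting offset

exact = phase_direct(fk, phi, fs)
with_diff = phase_recursive(fk, phi, fs)
static_only = phase_recursive(fk, phi, fs, include_offset_difference=False)
print(max(abs(a - b) for a, b in zip(exact, with_diff)))    # round-off only
print(max(abs(a - b) for a, b in zip(exact, static_only)))  # the drift of phi is lost
```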

I agree that we should have citations. Added another citation, to the Rodet & Depalle FFT^-1 paper (see section "Theory and the oscillator method" in the linked pdf). This could serve as a source for subsequent equations too. Note that their equations are incomplete in that they omit the initial phase/initial conditions. Ross bencina (talk) 15:23, 17 January 2012 (UTC)
What theory underlies this specific implementation?
Until early this month, I assumed that this specific implementation (and the more generic definition I had in mind) was probably related to the Analysis phase and based on the theory of signal analysis and digital signal processing. However, we now seem to have left Analysis almost entirely out of scope for this article.
I'm not sure how general your question is. My biased opinion is: The signal processing math comes from communications theory, specifically digital signal processing (two standard texts are: "Digital signal processing", Alan V. Oppenheim, Ronald W. Schafer (1975), "Theory and application of digital signal processing" Lawrence R. Rabiner, Bernard Gold (1975).) The practice is based in digital audio engineering and computer music (publications traditionally with the Audio Engineering Society and the International Computer Music Conference, respectively). Ross bencina (talk) 15:23, 17 January 2012 (UTC)
Thanks, but it still seems slightly unfocused. My intention here is to clarify the theoretical background of digital additive synthesis, and the question should probably be broken down into more specific topics:

2-1. How were the equations for additive synthesis derived, especially for the time-varying frequency model?

2-2. Why is the notion of instantaneous phase always used in the time-varying frequency model? (Note: the instantaneous phase is defined from the instantaneous frequency as $2\pi \int _{-\infty }^{t}f_{k}(u)du$ or $2\pi \sum _{i=1}^{n}(Tf_{k}[i])$.) I've already checked the equations precisely in my memo; however, no inevitable reason for it was found in the equations derived from the definition (as predicted by Gödel's theorem). (I also wrote a more naive guess under "Introduction of instantaneous terms" in my memo.)
If the definition of additive synthesis is almost independent of Analysis/Resynthesis, it might be defined arbitrarily using the Fourier series or transform. Then, if several audible issues were found during experiments, more generic theories (such as signal analysis, digital signal processing or possibly speech synthesis) were probably consulted to resolve them. Or it might purely be an implementation requirement for efficiency that uses its differential form (as Olli mentioned above). If so, these points should be explained in the definition of time-varying frequency additive synthesis. --Clusternote (talk)

Possibly it may be a "bandlimited time-varying frequency form", as someone mentioned somewhere a few days ago. (However, I have not yet verified it; it should be verified!) --Clusternote (talk) 06:36, 21 January 2012 (UTC)
Where can we find the theory behind the section? And where are its reliable sources?
Normally, a mere formula modification that lacks clear explanations of its purposes, merits, assumptions and proofs is not called a theory. It is simply called "a formula modification".
What would resolve this for you? Ross bencina (talk) 15:23, 17 January 2012 (UTC)
Essentially, plain but precise explanations are always needed on Wikipedia for non-specialist users. My attempt to resolve this (and most other items here) is the supplemental note; however, it does not seem to rely much on any "theories" other than the Fourier transform and the band-limiting mentioned by Olli. (In the early stage, I expected a deeper relation to the DFT and the phase vocoder; now I feel the need for a more generic article on this synthesizer family and the boundary regions around it. That is my intention.)

Anyway, item 2-2 above seems to need a more explicit explanation, in my eyes. (Item 3-1 below seems to be already resolved in the latest revision of the article. If it is still important for implementation, it should be explained from the viewpoint of efficiency or ease of implementation, IMO.)

3-1. [already resolved?] The previously existing description " $\phi _{k}[n]$ can be absorbed into the instantaneous frequency": is it a theory, an empirical DSP implementation technique, or something else without sources? According to my verification in my memo, it does not seem to be required, at least "theoretically". --Clusternote (talk) 09:51, 20 January 2012 (UTC)

The lead of the section could be worded differently, for all I care. The section, in my view is a consistent mathematical description of interrelations underlying most typical implementations of additive synthesis. Olli Niemitalo (talk) 20:20, 17 January 2012 (UTC)
I'm grateful to every author who improves it, really! The overall direction seems truly right. If more citations were added to it, it would probably be very useful for later verification by other users. --Clusternote (talk) 09:51, 20 January 2012 (UTC)
Also, we need short general explanations outlining what is described in the discrete section, to help generic users (i.e. other than implementers) understand these discrete equations. This could probably be done with a few equations in continuous form.
⇒ #Supplemental note on inharmonic discrete equations (rev.3)

Note: in this year's earliest revision, this specific implementation was called "Theory", and it was described using instantaneous phase and instantaneous frequency. As a result, it had been incorrectly described for 4 years, until a few weeks ago. I'm glad for the drastic improvements during the last few weeks!

best regards, --Clusternote (talk) 14:46, 16 January 2012 (UTC)

There was nothing incorrect with the equations in the last 4 years, except for the ones you recently put in. 71.169.180.195 (talk) 02:01, 17 January 2012 (UTC)

As written at the top, it still has several points that are not yet clarified. --Clusternote (talk) 04:08, 17 January 2012 (UTC)

[1] As explained in a later section, instantaneous frequency can be defined as the derivative of the instantaneous phase $\theta _{k}(t)$ as ${\frac {d}{dt}}\theta _{k}(t)/2\pi$ in continuous form.

[2] [MOVED TO NOTE] Following the style of signal processing, the above real-valued signal can be extended into a complex form, called the "analytic representation":

$x_{a}(t)=x_{k}(t)+j\cdot {\tilde {x}}_{k}(t)$

where $j$ denotes the imaginary unit. Its real part is the same as the above $x_{k}(t)$. The additional imaginary part ${\tilde {x}}_{k}(t)$ is given by the Hilbert transform of $x_{k}(t)$. Provided that $r_{k}(t)$ and $\phi _{k}(t)$ vary slowly compared with $\omega _{k}$, it is known to be expressed as:

${\tilde {x}}_{k}(t)=H(x_{k})(t)=r_{k}(t)\sin(\omega _{k}t+\phi _{k}(t))$ (2)

The analytic representation (analytic signal) is then expressed as:

${\begin{aligned}x_{a}(t)&=r_{k}(t)\left[\cos(\omega _{k}\ t+\phi _{k}(t))+j\cdot \sin(\omega _{k}\ t+\phi _{k}(t))\right]\\&=r_{k}(t)\cdot e^{j(\omega _{k}t+\phi _{k}(t))}\ \ \ \ {\mbox{(complex signal)}}\end{aligned}}$ (3)

[3] In this note, the Hilbert transform of $x_{k}(t)$ is denoted by ${\tilde {x}}_{k}(t)$ instead of ${\hat {x}}_{k}(t)$.
Reason: the latter notation is also often used to denote the analytic representation itself, $x_{a}(t)=x_{k}(t)+j\cdot H(x_{k})(t)$, or even the Fourier transform ${\mathcal {F}}(x_{k})$ of $x_{k}(t)$. To avoid confusion, the former notation is more appropriate here.

[4] The Hilbert transforms of the $\cos$ and $\sin$ functions are known to be the corresponding $\pi /2$ phase-delayed signals ($H(\cos)=\sin$, $H(\sin)=-\cos$). Although the details are omitted here, this follows from the convolution of $x_{k}(t)$ with the function $h(t)=1/(\pi t)$ in the form of a principal value integral, as follows:
$H(x_{k})(t)={\frac {1}{\pi }}\ p.v.\int _{-\infty }^{\infty }{\frac {x_{k}(\tau )}{t-\tau }}d\tau \ \ \ \ {\mbox{or}}\ \ \ \ H(x_{k})(t)=-{\frac {1}{\pi }}\lim _{\epsilon \downarrow 0}\int _{\epsilon }^{\infty }{\frac {x_{k}(t+\tau )-x_{k}(t-\tau )}{\tau }}d\tau$

where $p.v.$ denotes the Cauchy principal value.
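The π/2 phase delay can also be seen numerically. The sketch below does not evaluate the principal value integral. As an assumption, it uses the frequency-domain form of the Hilbert transform (multiplication by -j·sgn(ω)), applied circularly to one block of a periodic signal. This is exact only when the tone completes a whole number of cycles in the block. The function name is hypothetical.

```python
import numpy as np

def hilbert_transform(x):
    """Discrete (circular) Hilbert transform: multiply positive-frequency DFT bins by -j,
    negative-frequency bins by +j, and zero the DC and Nyquist bins. Assumes even len(x)."""
    n = len(x)
    H = np.zeros(n, dtype=complex)
    H[1:n // 2] = -1j
    H[n // 2 + 1:] = 1j
    return np.real(np.fft.ifft(np.fft.fft(x) * H))

N = 256
n = np.arange(N)
w = 2 * np.pi * 5 / N   # 5 cycles per block, so the tone sits exactly on a DFT bin

cos_err = np.max(np.abs(hilbert_transform(np.cos(w * n)) - np.sin(w * n)))   # H{cos} = sin
sin_err = np.max(np.abs(hilbert_transform(np.sin(w * n)) + np.cos(w * n)))   # H{sin} = -cos
print(cos_err, sin_err)
```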

[5] [MOVED TO NOTE] The following is merely my guess at the reason for using instantaneous phase with time-varying frequency.

Possibly, these notions were originally required to precisely estimate the amount of time variation within one sampling interval ( $\Delta {}t=T=1/f_{s}$ ) when acquiring, processing, or re-sampling external analog signals. Once these notions became the basis of related fields (signal analysis and digital signal processing), they came to be used everywhere, even where they were not essentially required. These are just my guesses. Possibly the truth can't be explained in plain language other than mathematics :)

[instantaneous-6] [ADDED] As for the requirement of instantaneous terms, I found again that general frequency modulation is expressed using the instantaneous phase (in the form of a finite integral of the instantaneous frequency $f(t)=\omega (t)/2\pi$) as:

${\begin{aligned}y(t)&=A_{c}\cos \left(\int _{0}^{t}\omega (\tau )d\tau \right)\\&=A_{c}\cos \left(\omega _{c}t+\omega _{\Delta }\int _{0}^{t}x_{m}(\tau )d\tau \right)\end{aligned}}$

where $x_{m}(t)$ is the modulating signal and

[REVISED] $\omega _{\Delta }/2\pi =f_{\Delta }$ is the frequency deviation (the maximum shift from $f_{c}$ for a normalized $x_{m}(t)$, $|x_{m}(t)|\leq 1$).

Phase modulation is also sometimes explained as a kind of differential version of frequency modulation (from the viewpoint of physical dimensions). I probably studied these for my ham radio license in childhood; however, at that time these equations were merely a pile of definitions without meaning to me. Now I should find more rational explanations for them.
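The frequency-modulation formula above can be sketched in discrete time by accumulating the instantaneous frequency f_c + f_Δ x_m[n] sample by sample. This is the same phase-accumulation idea used for time-varying partials. The Python sketch assumes a cosine modulator so the result can be compared with the closed-form integral. The rectangle-rule accumulation is an illustrative choice whose phase error is bounded by roughly 2π f_Δ T. Parameter values and the function name are hypothetical.

```python
import math

def fm_by_accumulation(fc, fdev, xm, fs, amp=1.0):
    """y[n] = amp * cos(theta[n]) where theta starts at 0 and advances by
    2*pi*T*(fc + fdev*xm[n]) per sample: a rectangle-rule version of
    theta(t) = integral_0^t omega(tau) dtau with omega = 2*pi*(fc + fdev*xm)."""
    T = 1.0 / fs
    theta = 0.0
    y = []
    for sample in xm:
        y.append(amp * math.cos(theta))
        theta += 2 * math.pi * T * (fc + fdev * sample)
    return y

fs = 48000.0
fc, fdev, fm = 1000.0, 20.0, 5.0   # carrier, deviation and modulator frequency in Hz (assumed)
N = 2000
xm = [math.cos(2 * math.pi * fm * n / fs) for n in range(N)]   # normalized modulator, |xm| <= 1
y = fm_by_accumulation(fc, fdev, xm, fs)

# For a cosine modulator the integral has a closed form:
# theta(t) = 2*pi*fc*t + (fdev/fm) * sin(2*pi*fm*t)
y_closed = [math.cos(2 * math.pi * fc * n / fs + (fdev / fm) * math.sin(2 * math.pi * fm * n / fs))
            for n in range(N)]
print(max(abs(a - b) for a, b in zip(y, y_closed)))
```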

[7] 
[ADDED] Argument function

${\mbox{arg}}(x_{a}(t))=\arctan \left({\frac {\Im (x_{a}(t))}{\Re (x_{a}(t))}}\right)\mod \pi \ \ \ \ {\mbox{for}}\ x_{a}(t)\neq 0$
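The arctan form above recovers the argument only up to a multiple of π, because Im/Re has the same value for x_a(t) and -x_a(t). The short Python illustration below uses hypothetical function names. math.atan2 and cmath.phase are standard-library functions that return the full principal argument.

```python
import cmath
import math

def arg_mod_pi(z):
    """arctan(Im/Re): determines the angle of z only modulo pi (undefined when Re(z) == 0)."""
    return math.atan(z.imag / z.real)

def arg_full(z):
    """Principal argument in [-pi, pi], using the signs of both Re and Im (atan2)."""
    return math.atan2(z.imag, z.real)

for angle in (0.5, 2.5, -2.5):
    z = 3.0 * cmath.exp(1j * angle)   # a sample of r * e^{j*theta} with r = 3
    print(angle, arg_mod_pi(z), arg_full(z), cmath.phase(z))
```

For angles outside (-π/2, π/2) the arctan form is off by exactly π, while atan2 and cmath.phase agree with the true angle.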

[smiss11-10] [ADDED] According to Smith III 2011 [1] and Smith III & Serra 2005 [2]

$\theta _{k}(t)=\int _{0}^{t}\omega _{k}(\tau )d\tau +\phi _{k}(0)\ \ {\xrightarrow[{}]{\mbox{discretize}}}\ \ \theta _{k}[n]=2\pi T\sum _{i=0}^{n}f_{k}[i]+\phi _{k}[0]$

${\hat {\Theta }}_{k}(n)\triangleq {\hat {\Theta }}_{k}(n-1)+2\pi T{\hat {F}}_{k}(n)$

(Note: in the above expressions, some variable names are replaced as follows: $i\rightarrow k$, integration variable $t\rightarrow \tau$.)

where [3] [4]

${\tilde {x}}'_{m}(e^{j\omega _{k}})$: STFT of $x_{m}(n)\triangleq x(n-mR)$ at the $k$th bin and $m$th frame,
$R$ is the hop size of the STFT,

"${\tilde {}}$" (over tilde) denotes applying the spectral analysis window $w(n)$, and

"$'$" (prime) denotes zero-padding on both sides of the FFT frame.

${\begin{aligned}\Theta _{k}(m)&\triangleq \angle {\tilde {x}}'_{m}(e^{j\cdot \omega _{k}})\ \ \ \ [{\mbox{radians}}]\\F_{k}(m)&\triangleq {\frac {\Theta _{k}(m)-\Theta _{k}(m-1)}{2\pi RT}}\ \ \ \ [{\mbox{Hz}}]\\T&=1/f_{s}\ \ \ \ [{\mbox{seconds}}]\\\end{aligned}}$
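The frame-to-frame definition of F_k(m) can be sketched as follows. This is a minimal pure-Python illustration, not Smith's implementation. It assumes that frame m starts at sample mR (the sign convention of x_m(n) above is not reproduced), and it uses a Hann analysis window without zero-padding. As an added assumption, the phase difference is unwrapped around the expected advance ω_k R; the formula as written leaves this step implicit. Function names and parameter values are hypothetical.

```python
import cmath
import math

def frame_phase(x, start, N, omega_k):
    """Theta: phase (radians) of the Hann-windowed DFT of x[start:start+N] at omega_k (rad/sample)."""
    acc = 0j
    for n in range(N):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / N)   # periodic Hann analysis window
        acc += w * x[start + n] * cmath.exp(-1j * omega_k * n)
    return cmath.phase(acc)

def princarg(phase):
    """Wrap a phase to [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi

def bin_frequency(x, k, N, R, fs, m):
    """F_k(m) = (Theta_k(m) - Theta_k(m-1)) / (2*pi*R*T), with frame m starting at sample m*R.
    The 2*pi ambiguity of the phase difference is resolved (an assumption) by unwrapping
    around the advance expected for bin k, omega_k * R."""
    T = 1.0 / fs
    omega_k = 2 * math.pi * k / N
    delta = frame_phase(x, m * R, N, omega_k) - frame_phase(x, (m - 1) * R, N, omega_k)
    expected = omega_k * R
    unwrapped = expected + princarg(delta - expected)
    return unwrapped / (2 * math.pi * R * T)

fs, N, R = 8000.0, 1024, 256     # sample rate (Hz), frame length, hop size (assumed)
f_true = 1003.7                  # between bins; bin spacing is fs/N = 7.8125 Hz
x = [math.cos(2 * math.pi * f_true * n / fs) for n in range(N + 4 * R)]
k = round(f_true * N / fs)       # nearest bin, whose centre is k*fs/N Hz
print(k, k * fs / N, bin_frequency(x, k, N, R, fs, m=1))
```

The bin centre is 1000 Hz, but the phase advance between frames recovers the off-bin frequency. This works as long as the deviation from the bin centre keeps the unwrapped difference within ±π.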

[rodet92-11] [ADDED] In Rodet & Depalle 1992, section "Theory and the oscillator method", the following equations are used:

${\begin{aligned}s[n]&=\sum _{k=1}^{K}c_{k}[n]\\c_{k}[n]&=a_{k}[n]\cdot \cos(\Phi _{k}[n])\\\Phi _{k}[n]&=\Phi _{k}[n-1]+{\frac {2\pi }{Sr}}f_{k}[n]\end{aligned}}$

(Note: in the above expressions, some variable names are replaced as follows: $j\rightarrow k$, $J\rightarrow K$.)

where, for the $k$th partial at sample time $n$,

$f_{k}[n]$ is the frequency,

$a_{k}[n]$ is the amplitude,

$\Phi _{k}[n]$ is the phase (Note: this is presumably the instantaneous phase).

[8] Smith III 2011, "Additive Synthesis (Early Sinusoidal Modeling)"

[smith05-9] Smith III & Serra 2005, Additive Synthesis

[kwakemaak91-12] Kwakernaak & Sivan 1991, pp. 613–614

[Papoulis77-13] Papoulis 1977, p. 121
