# Talk:Least mean squares filter

This article could use some work, but since I don't understand LMS, I can't do it myself.

## Least Squares

Minimizing the sum of squared distances is the same as minimizing the average squared distance (provided the number of points does not change). I do not understand why this is a separate article. I therefore move that it be merged into Least squares. - JustinWick 18:05, 5 December 2005 (UTC)

• LMS is an adaptive filter algorithm, so it belongs firmly in the area of digital signal processing, whereas Least squares describes a mathematical technique. There should be a link from Least squares to LMS and vice versa. Faust o 17:07, 22 January 2006 (UTC)
• The adaptive filter is only one use of the term least mean squares; it is also frequently used in statistical minimization problems. I propose that this page be moved to Least mean squares (filter). --Salix alba (talk) 00:11, 4 February 2006 (UTC)
• Done Faust o 15:36, 7 February 2006 (UTC)
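As a quick numeric illustration of the sum-vs-mean equivalence claimed above (my own sketch, not from the discussion; the data values are made up):

```python
import numpy as np

# Fit a constant c to some points. The c that minimizes the SUM of
# squared distances and the c that minimizes the MEAN squared
# distance coincide, because the mean is just the sum divided by a
# fixed count n -- a positive constant that does not move the argmin.
data = np.array([1.0, 2.0, 4.0, 7.0])
cs = np.linspace(0.0, 10.0, 10001)        # candidate values of c
resid = data[None, :] - cs[:, None]
sum_sq = (resid ** 2).sum(axis=1)
mean_sq = (resid ** 2).mean(axis=1)

c_sum = cs[sum_sq.argmin()]
c_mean = cs[mean_sq.argmin()]             # both land on the sample mean
```

Since the mean squared distance is the sum divided by the constant 4, both objectives attain their minimum at the same point.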

## Stochastic gradient descent

Is LMS a Stochastic gradient descent algorithm? --Memming 17:10, 12 November 2006 (UTC)

• Yes, added the link there Jmvalin 06:13, 2 February 2007 (UTC)
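For concreteness, here is a minimal sketch (my own illustration, not from the article; the filter length and step size are arbitrary choices) of the LMS update viewed as a stochastic-gradient step on the instantaneous squared error:

```python
import numpy as np

def lms_step(w, x, d, mu):
    """One LMS iteration: a stochastic-gradient step that uses the
    instantaneous squared error e(n)^2 in place of the mean squared
    error. The gradient of e^2 w.r.t. w is -2*e*x; the factor of 2
    is conventionally absorbed into the step size mu."""
    e = d - w @ x                 # a priori error e(n) = d(n) - w^T x(n)
    return w + mu * e * x, e

# Identify a known 3-tap filter from noise-free data.
rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.2])    # "true" (unknown) filter
w = np.zeros(3)                   # adaptive estimate
for _ in range(2000):
    x = rng.standard_normal(3)    # current tap-input vector
    d = h @ x                     # desired signal
    w, e = lms_step(w, x, d, mu=0.05)
```

In the noise-free case the estimate converges to the true filter; with observation noise it instead fluctuates around the Wiener solution with a variance that grows with mu.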

## NLMS and more

Just added NLMS and a sort of overview figure. I changed the notation for the filter; if anyone objects and wants something else, I can always update the figure. The main reason for having h and \hat{h} is that I want to add some derivations for the optimal rate, so I need to distinguish between the "real" filter and the adapted one. Jmvalin 06:07, 2 February 2007 (UTC)

Just did another batch of improvements. I'd be interested in feedback on that -- and on the rest of the article. Jmvalin 13:45, 7 February 2007 (UTC)
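To make the NLMS addition concrete, here is a hedged sketch (my own, not from the article) of the normalized update: the step is divided by the instantaneous input power, so convergence is insensitive to the input's amplitude. The regularizer eps is a common implementation detail, not part of the textbook formula:

```python
import numpy as np

def nlms_step(w, x, d, mu=0.5, eps=1e-8):
    """One NLMS iteration: like LMS, but the step is normalized by
    the input power x^T x, so the effective adaptation rate does not
    depend on the input signal's amplitude. eps guards against
    division by zero for silent inputs."""
    e = d - w @ x
    return w + (mu / (eps + x @ x)) * e * x, e

# Same identification task as plain LMS, but with a loudly scaled
# input; NLMS converges regardless of the factor of 100.
rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.2])
w = np.zeros(3)
for _ in range(2000):
    x = 100.0 * rng.standard_normal(3)   # strongly scaled input
    d = h @ x
    w, e = nlms_step(w, x, d)
```

Plain LMS with a fixed mu would diverge for this input scaling; the normalization is exactly what removes that sensitivity.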

## Undefined symbol.

In several places in the article the symbol ${\displaystyle \mathbf {h} ^{H}(n)\,}$ is used, but it is never defined, and I cannot tell from context what it is supposed to mean. Can someone who knows please update the article so this symbol is well defined? 207.190.198.130 00:55, 7 November 2007 (UTC)

It's the Hermitian transpose. I've now added a mention of this to the article. Oli Filth(talk) 09:23, 7 November 2007 (UTC)
The definition occurred long after the term is first used (in several places). Moved the mention of the Hermitian transpose to the "definition of symbols" section. Laugh Tough (talk) 17:35, 27 February 2012 (UTC)

## Sources

"This article or section needs sources or references that appear in reliable, third-party publications."

WTF? Have you looked at the references? Those are the standard references on the subject, especially the Haykin book. Jmvalin (talk) 23:37, 11 May 2008 (UTC)

I agree. I removed the tag. --Zvika (talk) 13:12, 22 June 2008 (UTC)

## Misleading figure and explanations

In an article like this one, I expect to grasp as much as possible about the subject simply by looking at the figure. Unfortunately, on inspecting the figure one is left with the impression that the impulse response of the filter is computed sample by sample -- that is, that for the n-th input and output samples it is possible to estimate only the n-th sample of the impulse response, which is not the case. I think a more consistent notation would make n a sub/superscript, because n is not the input sample number but the iteration number.

Another thing: I can't find anywhere in the text any information about the length of the impulse response. I can see that x(n) is in fact a column vector consisting of the previous p samples of the input signal, and from this I can assume the same length for the transfer function, but I think this should be made clearer in the text. 89.136.41.31 (talk) 14:27, 19 February 2009 (UTC) Apass

In the section Idea, in the discussion of the steepest descent method, shouldn't it be "to take the partial derivatives with respect to the CONJUGATES of the individual entries of the filter vector"?

Indeed we want to compute the gradient of the modulus square here, and the gradient of a real function ${\displaystyle f(z)}$ of the complex variable ${\displaystyle z}$ is given by

${\displaystyle \nabla f(z)=2{\frac {df(z)}{dz^{*}}}.}$

I think this should be corrected; otherwise it can be misleading, and the equation providing ${\displaystyle \nabla C(n)}$ may seem general whereas it is in fact a very special case (because here ${\displaystyle e(n)}$ is expressed in terms of the conjugate of the filter coefficients...)
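As a sanity check (my own worked example, using the Wirtinger convention above): for the simplest real-valued function of a complex variable, ${\displaystyle f(z)=|z|^{2}=zz^{*}}$, treating ${\displaystyle z}$ and ${\displaystyle z^{*}}$ as independent variables gives

${\displaystyle {\frac {df}{dz^{*}}}=z,\qquad \nabla f(z)=2{\frac {df}{dz^{*}}}=2z,}$

which is indeed the direction of steepest ascent of ${\displaystyle |z|^{2}}$; differentiating with respect to ${\displaystyle z}$ instead would yield the conjugate ${\displaystyle 2z^{*}}$.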

Ivilo (talk) 13:22, 15 October 2009 (UTC)

## Step Size Factor

Would anyone address the step size (μ) limitations?
One must take care to define the step size clearly.
I wish I could do it myself, but I struggle to reconcile the contradictory treatments of it in different sources. --Royi A (talk) 15:13, 30 December 2009 (UTC)
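For reference, the bound usually quoted (e.g., in Haykin's Adaptive Filter Theory) for convergence of the coefficients in the mean is

${\displaystyle 0<\mu <{\frac {2}{\lambda _{\max }}},}$

where ${\displaystyle \lambda _{\max }}$ is the largest eigenvalue of the input autocorrelation matrix ${\displaystyle \mathbf {R} =E\left\{\mathbf {x} (n)\mathbf {x} ^{H}(n)\right\}}$. Since ${\displaystyle \lambda _{\max }\leq \operatorname {tr} (\mathbf {R} )}$ and the trace is easily estimated from the input power, the more conservative bound ${\displaystyle 0<\mu <2/\operatorname {tr} (\mathbf {R} )}$ is often used in practice; convergence in the mean square requires a smaller ${\displaystyle \mu }$ still. Some of the apparent contradictions between sources come from which of these criteria (mean vs. mean-square stability) is being stated.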

I added a section about mean stability and convergence. I omitted the proof, as it would clutter the page significantly, and anyone can look it up in the references. There should also be a section about mean-square performance (specifically, one should be informed that the steady-state mean square error increases monotonically with increasing ${\displaystyle \mu }$); maybe I'll add something later. BTW, this is my first edit on Wikipedia -- does anyone know why the math looks different in different equations, although the tags are identical? Can I do something about that? It looks weird that a Greek letter is italic in one place and plain in another.

128.97.90.235 (talk) 05:54, 18 May 2010 (UTC)

## Relationship to the least squares filter section needs work

I've been trying to make sense of the equation:

${\displaystyle {\boldsymbol {\hat {\beta }}}=(\mathbf {X} ^{\mathbf {T} }\mathbf {X} )^{-1}\mathbf {X} ^{\mathbf {T} }{\boldsymbol {y}}.}$

I cannot relate it to anything in the article or figures. ${\displaystyle {\boldsymbol {\hat {\beta }}}}$ is defined nowhere, and X and y are only casually referred to as the "input matrix" and "output vector". Does "output vector" mean the entire history of the output? It says X is a matrix, but the input looks to me like a vector. From what is written, I gather that the "least squares filter" is something different from the Wiener filter. I have three books on adaptive filtering, including Widrow's, and none of them mentions a "least squares filter". There is no "least squares filter" article. The LMS algorithm converges under certain conditions to the Wiener filter solution. I propose to delete this equation and all mention of the "least squares filter"; discussion of the LMS algorithm vs. the Wiener filter covers the issue. Constant314 (talk) 04:38, 20 November 2016 (UTC)
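One way to read the equation, under the assumption (mine, not stated in the article) that row ${\displaystyle n}$ of ${\displaystyle \mathbf {X} }$ is the tap-input vector ${\displaystyle \mathbf {x} ^{T}(n)}$ and ${\displaystyle {\boldsymbol {y}}}$ stacks the corresponding desired outputs, is as the batch least-squares (normal-equations) solution:

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([0.5, -0.3, 0.2])     # hypothetical "true" filter
N, p = 500, 3

# Assumed reading of the notation: row n of X is the tap vector
# x(n)^T, and y collects the corresponding desired outputs d(n).
X = rng.standard_normal((N, p))
y = X @ h + 0.01 * rng.standard_normal(N)   # small observation noise

# beta_hat = (X^T X)^{-1} X^T y, solved without forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

Under stationarity, (1/N) XᵀX and (1/N) Xᵀy are sample estimates of the autocorrelation matrix R and the cross-correlation vector p, so this batch estimate approaches the Wiener solution R⁻¹p as N grows -- which is the sense in which a batch least-squares fit and the Wiener filter coincide.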