Talk:Polynomial regression

Statistics High‑importance

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as High-importance on the importance scale.

Mathematics Mid‑priority

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Mid-priority on the project's priority scale.

Problems

This article is quite badly written as it stands, although not as bad as it was. I'm not sure if it's worth saving or not. Michael Hardy (talk) 05:22, 7 April 2009 (UTC)

I think it's worth having a page on this topic. I've attempted a major rewrite. I plan to add a figure or two in the near future. Comments are welcome. Skbkekas (talk) 15:38, 8 April 2009 (UTC)

This looks considerably better. There may be scope to add something from design of experiments, where I think there are results on where best to place the x values to get the best estimates of the regression function, given a known region of interest and a known polynomial order. It may be more important to extend the discussion to explicitly include polynomials with multivariate x's. The book by Draper & Smith may suggest things that ought to be mentioned here. Melcombe (talk) 09:03, 9 April 2009 (UTC)

Fine albeit unconstructive comments!

Polynomial regression has its place in history even if more efficient methods exist. Therefore an article on polynomial regression should not be overshadowed by other topics which should merely be linked to and exist separately in their own right.

I am sorry to say that the article in its current state does not appear to explain what polynomial regression is, or why it is useful (follow up the Excel commentary). The nomenclature corrections are appreciated, but I cannot fathom why the derivation has been removed. It would have been of interest to any budding mathematician or software programmer. A more useful revision would have been an explanation of why setting the partial derivatives to zero yields a minimum (as opposed to a maximum or an inflexion), or the addition of some credible references (I wrote the original article from memory of my university days). Finally, I would like to defend the use of practical analogies and simple English. I think mathematical articles nowadays are way too abstract and risk alienating future generations. 23:42, Gouranga Gupta 15 April 2009 (UTC) —Preceding unsigned comment added by 89.240.182.222 (talk)
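A sketch of the derivation being asked about, in the matrix notation used elsewhere on this page. Here $\mathbf{X}$ is assumed to be the $n \times (m+1)$ design matrix whose $i$-th row is $(1, x_i, x_i^2, \ldots, x_i^m)$, $Y$ is the vector of observed responses, and $\vec a$ is the coefficient vector. Setting the gradient of the sum of squared errors to zero gives the normal equations, and the Hessian shows why the stationary point is a minimum rather than a maximum or an inflexion:

```latex
\begin{aligned}
S(\vec a) &= (Y - \mathbf{X}\vec a)^{T}(Y - \mathbf{X}\vec a) \\
\nabla S(\vec a) &= -2\,\mathbf{X}^{T}(Y - \mathbf{X}\vec a) = \vec 0
  \;\Longrightarrow\; \mathbf{X}^{T}\mathbf{X}\,\widehat{\vec a} = \mathbf{X}^{T}Y \\
\nabla^{2} S(\vec a) &= 2\,\mathbf{X}^{T}\mathbf{X},
  \qquad \vec v^{T}\,\mathbf{X}^{T}\mathbf{X}\,\vec v = \lVert \mathbf{X}\vec v \rVert^{2} \ge 0
  \ \text{for all } \vec v
\end{aligned}
```

Because the Hessian is positive semidefinite, $S$ is a convex quadratic, so any solution of the normal equations is a global minimum. When $\mathbf{X}$ has full column rank (at least $m+1$ distinct $x$ values), $\mathbf{X}^{T}\mathbf{X}$ is invertible and the minimiser is unique: $\widehat{\vec a} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}Y$.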

Something more to consider:

According to the Oxford Dictionary of Mathematics (Clapham & Nicholson), Multiple Regression is regression based on multiple independent variables, e.g. f(x, y) = 2 * x + 5 * y, as opposed to regression on a single independent variable, e.g. f(x) = 2 * x + 5. Both of these examples are LINEAR. The aforementioned reference would refer to the former equation as an example of "Multiple Linear Regression".

I think Michael Hardy's early argument that polynomial regression is a form of linear regression is questionable. My academic background is Chemical Engineering and not Mathematics (so I’m open to correction), but I believe that it is by clever substitution that one can mould linear regression into a logarithmic type regression etc. This should not distract from the fact that the word linear means that the highest "Degree" of the independent variable(s) is one. Thus, I concede that my original use of the word "Order" is incorrect. "Order" according to Clapham & Nicholson is generally applied to differential equations, matrices or roots. "A polynomial of degree n" or "an nth degree polynomial" would be better terminology. Gouranga Gupta 23:14 22 April 2009 (UTC). —Preceding unsigned comment added by 84.13.104.33 (talk)

I am absolutely certain that the meaning of the term "linear" in "linear regression" or "linear model" refers to linearity in the unknown parameters (the regression coefficients), not to linearity in the independent variable or variables. This is stated unequivocally in the first chapter of several standard regression texts on my desk (Stapleton, Monahan, Abraham). This is simply a definition and you may view it as being arbitrary, but it is unquestionably the definition that is in universal use. The rationale for the definition is that while it may be extremely important for the interpretation of a fitted linear model whether the independent variables are transformed (e.g. by taking powers of them), it has nothing to do with how a linear model is fit, or how inferences are performed. On the other hand, a model that is non-linear in the parameters requires a completely different set of statistical techniques for fitting and inference. Skbkekas (talk) 03:32, 23 April 2009 (UTC)

Conceded! (I've also just noticed that Excel uses the phrase polynomial order (and not degree)). I think confusion by non-specialists is inevitable, although the "Linear regression" article deals with this. What does one type if searching specifically for practical information on the common types of least squares regression curve fitting, e.g. straight line, a general polynomial, exponential, logarithmic or perhaps even Fourier? Originally I was looking specifically for the polynomial derivation and maybe some pseudo code that would yield the regression coefficients. It didn't seem to exist, so I sought to add it. Are such specific articles beyond the intended scope of Wikipedia? If not then perhaps "Straight line regression", "Polynomial regression", "Multiple regression" etc should exist as separate articles that can focus on their pragmatic implementation and the "Linear regression" article can continue to disambiguate semantics. Gouranga Gupta 16:49 26 April 2009 (UTC). —Preceding unsigned comment added by 78.151.65.30 (talk)
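Since pseudo code for the regression coefficients was asked for above, here is a minimal Python sketch, assuming the plain normal-equations approach: build $\mathbf{X}^{T}\mathbf{X}$ and $\mathbf{X}^{T}Y$ from power sums of the data, then solve by Gaussian elimination. The function name `polyfit_normal_equations` and the singularity tolerance are hypothetical choices for illustration; this is not a description of what Excel or any particular library does internally.

```python
def polyfit_normal_equations(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ..., a_degree] of a polynomial fit.

    Row i of the (implicit) design matrix X is (1, x_i, x_i**2, ..., x_i**degree).
    Builds the normal equations (X^T X) a = X^T y and solves them by Gaussian
    elimination with partial pivoting.
    """
    m = degree + 1
    # (X^T X)[j][k] = sum_i x_i**(j + k);  (X^T y)[j] = sum_i x_i**j * y_i
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(m)]

    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:  # arbitrary tolerance for this sketch
            raise ValueError("singular system: need at least degree + 1 distinct x values")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]

    # Back substitution on the upper-triangular system.
    a = [0.0] * m
    for row in range(m - 1, -1, -1):
        s = sum(A[row][k] * a[k] for k in range(row + 1, m))
        a[row] = (b[row] - s) / A[row][row]
    return a


# Noise-free data from y = 1 + 2x + 3x^2, so the fit should recover [1, 2, 3]
# up to floating-point rounding.
print(polyfit_normal_equations([0, 1, 2, 3], [1, 6, 17, 34], 2))
```

Forming $\mathbf{X}^{T}\mathbf{X}$ squares the condition number of the problem, so for higher degrees or widely spread x values a QR-based least-squares solver or orthogonal polynomials is usually the safer choice; the normal equations are used here only because they mirror the textbook derivation.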

>I think Michael Hardy's early argument that polynomial regression is a form of linear regression is questionable.

No, it isn't, although this point often confuses novices. A mathematical formula is linear or nonlinear in the unknowns. A regression equation Y = beta0 + beta1 * X1 + beta2 * X2 +... has the parameters (the betas) as its unknowns. The form of the relationship between the Y and the Xs is irrelevant. Blaise (talk) 12:54, 31 March 2013 (UTC)
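To make the distinction concrete: a quadratic model is curved in $x$ but linear in its parameters, because each partial derivative with respect to a parameter contains no parameters. An exponential model, used here only as an illustrative counterexample, is nonlinear in $\beta_1$:

```latex
\begin{aligned}
f(x;\boldsymbol\beta) &= \beta_0 + \beta_1 x + \beta_2 x^{2},
  &\qquad \frac{\partial f}{\partial \beta_j} &= x^{j} \quad (j = 0, 1, 2) \\
g(x;\boldsymbol\beta) &= \beta_0\, e^{\beta_1 x},
  &\qquad \frac{\partial g}{\partial \beta_1} &= \beta_0\, x\, e^{\beta_1 x}
\end{aligned}
```

The first model is fitted in closed form by ordinary least squares with $X_1 = x$ and $X_2 = x^2$. The second generally needs iterative nonlinear least squares, unless it is transformed (e.g. by taking logarithms), which also changes the assumed error structure.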

Explanations

Further to my comments above, I’ve just submitted what I consider to be a relatively simple explanation of why the surface area of a sphere is what it is in the Sphere topic. It may yet be invalidated, but I think it would serve the community well if the common types of regression we take for granted in software could be explained simply and usefully, for posterity if nothing else. Gouranga Gupta (talk) 15:51, 27 April 2009 (UTC)

Solving the polynomial regression

This section needs more information because right now I haven't the faintest clue how it works. The article says to "set epsilon to 0 and solve the system of linear equations", but it's extremely easy to come up with a system of linear equations with no solution, e.g. let m = 2, n = 3, (x1, y1) = (1, 1), (x2, y2) = (2, 1), (x3, y3) = (2, 2) and there is obviously no straight line connecting these three points. Nevertheless, calculating

${\widehat {\vec {a}}}=(\mathbf {X} ^{T}\mathbf {X} )^{-1}\;\mathbf {X} ^{T}Y$

yields a0 = 0.5, a1 = 0.5, which, while it does not solve the simultaneous equations, describes a straight line passing through (x1, y1) and exactly between (x2, y2) and (x3, y3), i.e. the correct linear model. What is this, magic? 195.212.29.92 (talk) 13:05, 5 January 2011 (UTC)

Agreed, I have made some edits to that section that hopefully make it more clear. Skbkekas (talk) 22:53, 6 January 2011 (UTC)
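The three-point example in the question can be checked directly, and the check shows why it is not magic. The system $\mathbf{X}\vec a = Y$ has no exact solution, so least squares instead finds the $\vec a$ whose residual vector is orthogonal to every column of $\mathbf{X}$. That is exactly what the normal equations $\mathbf{X}^{T}(Y - \mathbf{X}\vec a) = \vec 0$ say. A minimal Python sketch, using an explicit 2 × 2 inverse that is only sensible for a tiny example like this:

```python
# Data from the question: three points, two of which share x = 2.
xs = [1.0, 2.0, 2.0]
ys = [1.0, 1.0, 2.0]

# Design matrix for a straight line: each row is (1, x_i).
X = [[1.0, x] for x in xs]

# Normal equations (X^T X) a = X^T y, written out for the 2x2 case.
xtx = [[sum(r[j] * r[k] for r in X) for k in range(2)] for j in range(2)]
xty = [sum(r[j] * y for r, y in zip(X, ys)) for j in range(2)]

# Solve with the explicit 2x2 inverse.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
a0 = (xtx[1][1] * xty[0] - xtx[0][1] * xty[1]) / det
a1 = (xtx[0][0] * xty[1] - xtx[1][0] * xty[0]) / det

# The residuals are not all zero, because X a = y has no exact solution...
residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
# ...but they are orthogonal to both columns of X.
orthogonality = [sum(r[j] * e for r, e in zip(X, residuals)) for j in range(2)]

print(a0, a1)         # 0.5 0.5
print(residuals)      # [0.0, -0.5, 0.5]
print(orthogonality)  # [0.0, 0.0]
```

The fitted line passes through (1, 1) and splits the difference at x = 2, as observed in the question. Any other line would leave a residual vector with a component along a column of $\mathbf{X}$ and a larger sum of squares.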

Finding slope of polynomial regression

As of now, the entry states that "...when the temperature is increased from x to x + 1 units, the expected yield changes by a₁ + a₂ + 2a₂x."

Shouldn't it be a₁ + 2a₂x? The derivative of the quadratic a₀ + a₁x + a₂x² is a₁ + 2a₂x. Or am I missing something here? — Preceding unsigned comment added by Joshtk76 (talk • contribs) 23:47, 25 March 2012 (UTC)

(Mathematically) agree - I just corrected the article. Billy Pilgrim (talk) 12:01, 29 January 2013 (UTC)
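For the quadratic model, the two expressions in this thread answer slightly different questions, and a few lines of arithmetic can compare them. a₁ + a₂ + 2a₂x is the exact change in the fitted value when x goes from x to x + 1, since (x + 1)² − x² = 2x + 1. a₁ + 2a₂x is the derivative, i.e. the instantaneous rate of change at x. The helper names and the integer coefficients below are hypothetical and chosen only so the arithmetic is exact.

```python
def fitted(a0, a1, a2, x):
    """Quadratic regression function a0 + a1*x + a2*x**2."""
    return a0 + a1 * x + a2 * x ** 2


def unit_step_change(a0, a1, a2, x):
    """Exact change in the fitted value when x increases from x to x + 1."""
    return fitted(a0, a1, a2, x + 1) - fitted(a0, a1, a2, x)


def slope(a1, a2, x):
    """Derivative of a0 + a1*x + a2*x**2 with respect to x."""
    return a1 + 2 * a2 * x


# Arbitrary illustrative coefficients.
a0, a1, a2 = 1, 3, 2
for x in range(5):
    change = unit_step_change(a0, a1, a2, x)
    assert change == a1 + a2 + 2 * a2 * x   # change over one unit step
    assert change - slope(a1, a2, x) == a2  # differs from the slope by a2
```

The two quantities coincide only when a₂ = 0, i.e. when the fitted curve is a straight line.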

Interactions?

Aren't interactions a form of multivariate polynomial regression? 86.127.138.234 (talk) 06:02, 22 February 2015 (UTC)
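One way to see the connection: a full second-degree polynomial in two predictors x₁ and x₂ contains the cross-product term x₁x₂, which is what regression texts usually call an interaction term, and the model stays linear in its coefficients. The helper `quadratic_features` below is a hypothetical sketch. It lists the monomial columns of such a design matrix for a single observation.

```python
from itertools import combinations_with_replacement
from math import prod


def quadratic_features(x):
    """All monomials of total degree <= 2 in the inputs x (hypothetical helper).

    For two inputs (x1, x2) this returns 1, x1, x2, x1*x1, x1*x2, x2*x2;
    the cross term x1*x2 is the interaction term.
    """
    features = [1.0]  # intercept column
    for degree in (1, 2):
        # Each index tuple picks which inputs to multiply, e.g. (0, 1) -> x1*x2.
        for idx in combinations_with_replacement(range(len(x)), degree):
            features.append(prod(x[i] for i in idx))
    return features


print(quadratic_features([2.0, 3.0]))  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```

Stacking one such row per observation gives a design matrix that can be passed to any ordinary least-squares routine. An "interaction-only" model simply drops the pure squares x₁² and x₂².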

Merge (Polynomial least-squares regression)

I intend to merge Polynomial regression and Polynomial least squares, creating Polynomial least-squares regression. fgnievinski (talk) 00:28, 16 July 2018 (UTC)

~~The merge sounds fine to me, but I'm not sure that the title change is necessary: Polynomial regression is the primary topic, with the least-squares being (by far) the most important subset.~~ Klbrain (talk) 06:15, 10 August 2019 (UTC)

Oppose, changing my position, noticing the broader definition of Polynomial least squares. Given the complexity of the topic and the current definitions, I think that the best solution is to keep the pages separate. Klbrain (talk) 07:43, 22 September 2019 (UTC)

Closed, given the absence of support for a merge. Klbrain (talk) 07:39, 2 October 2019 (UTC)

Sorry, but absence of support is not presence of opposition, which was only your own, after you changed your mind, although no reason was given other than the perceived subject complexity.

Well, the subject is only seemingly complex because the article is poorly written. There is nothing of substance in it that is not covered here, except perhaps the weighting scheme, which could be covered more succinctly here with a link to weighted least squares.

I've also checked the first two references cited in that article and the expression "polynomial least squares" does not show up, only "polynomial regression". That plus the multiple issues identified in the lead make it clear that article should simply redirect here. fgnievinski (talk) 00:14, 27 August 2020 (UTC)
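The weighting scheme mentioned above fits into the same least-squares framework. As a sketch, assume known positive weights $w_i$ (how they are chosen is not specified here) and let $\mathbf{X}$ be the polynomial design matrix with rows $(1, x_i, \ldots, x_i^m)$. Weighted least squares minimises a weighted sum of squared residuals, and the normal equations gain a diagonal weight matrix $\mathbf{W}$:

```latex
\begin{aligned}
S_w(\vec a) &= \sum_{i=1}^{n} w_i \left( y_i - a_0 - a_1 x_i - \cdots - a_m x_i^{m} \right)^{2}
  = (Y - \mathbf{X}\vec a)^{T}\mathbf{W}(Y - \mathbf{X}\vec a), \\
\mathbf{W} &= \operatorname{diag}(w_1, \ldots, w_n), \\
\widehat{\vec a} &= (\mathbf{X}^{T}\mathbf{W}\mathbf{X})^{-1}\,\mathbf{X}^{T}\mathbf{W}\,Y .
\end{aligned}
```

With all $w_i = 1$ this reduces to the ordinary formula $(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}Y$. The inverse exists under the same condition as before: at least $m+1$ distinct $x$ values with nonzero weight.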

@Fgnievinski: no objection from me if you can rework the material into something more coherent, if there is anything salvageable in Polynomial least squares, or simply redirect (as you suggest). Klbrain (talk) 09:06, 27 August 2020 (UTC)

@Klbrain: thanks for your understanding; I've inserted a sentence about the weighting scheme, then redirected that article here. I also noticed that the main reference ("Ordinary Least Squares Revolutionized", cited 20 times in the Wikipedia article) is not a peer-reviewed publication: it was self-published on the SSRN preprint server in 2015 and has not attracted any citations outside of Wikipedia since then. So maybe the author, or someone who likes their work very much, just wanted to publicize it on Wikipedia. fgnievinski (talk) 01:58, 28 August 2020 (UTC)