# Talk:Sigmoid function

WikiProject Mathematics (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: Start Class, Low Importance. Field: Analysis

## Untitled

Call me crazy, but since when is the Hyperbolic_cosine considered "S shaped"? If this is a typo, I'm not sure what other function it was supposed to be. --65.147.0.105 15:25, 28 May 2004 (UTC)

• Right you are, I don't know what I was thinking. Fixed. (The error function is a proper sigmoid, right?)
Jorge Stolfi 10:24, 31 May 2004 (UTC)

Isn't it better to redirect this page to the logistic function page? Or restore this page to its former glory? The current page is kinda pathetic.

## local extrema?

Do you really mean "local minimum" and "local maximum"? The example function given clearly doesn't have any local minima or maxima (but it does have a global minimum of 0 and a global maximum of 1) -- Somebody

Perhaps what is meant is that the second derivative (curvature) has a local minimum and maximum? BTW, I do not agree that the function has global extrema, because as I learned them and as the article on them states, they are points in the domain of the function and are always also local extrema. This function has none. The image of the function has a supremum of 1 and an infimum of 0, though (i.e. the asymptotes of this function are y=0 and y=1). 82.103.198.180 10:03, 23 July 2006 (UTC)

Maybe it's for complex argument values? One is led to think of reals only because the plot is 2d, but maybe the text doesn't assume that. Coffee2theorems 18:22, 23 July 2006 (UTC)

## Examples

I'd like to add a gallery of sigmoid-like curves to this article. The hemoglobin example is a nice one. Any others? --HappyCamper 17:22, 30 March 2007 (UTC)

The sigmoidal shape of hemoglobin's oxygen-dissociation curve results from cooperative binding of oxygen to hemoglobin.

## Sign

For the double sigmoid function, do you mean sin?

${\displaystyle y={\mbox{tanh}}(x-d)\,{\Bigg (}1-\exp {\bigg (}-{\bigg (}{\frac {x-d}{s}}{\bigg )}^{2}{\bigg )}{\Bigg )},}$

or this one:

${\displaystyle y={\mbox{tanh}}(x-d)\,{\Bigg (}1-\exp {\bigg (}-{\bigg (}{\frac {x-d}{s}}{\bigg )}^{4}{\bigg )}{\Bigg )},}$

## Image

I see that someone changed the image size recently in order to avoid resolution problems. Maybe the image should be replaced after all with the almost identical vector image? --Hagman-de 15:53, 16 June 2007 (UTC)

## a slightly more useful definition?

I've used the sigmoid function on and off, for a long time (about 8 years), and what I use is of course similar to what is presented here, but I would suggest adding two elements into the definition -- a "gain" or "sharpness" factor "k" or "g" -- and a "threshold" or "slider" term that allows the function to be "slid" back and forth across the X-axis:

Y(t) = 1/(1 + e^(k*(X - thr)))

• The "gain" at X = "thr" is the derivative, of course; its magnitude is 1/4 the value of k (as I remember).
• The curve can be "flipped around" by changing the sign of k; thus the sigmoid can be made to act like a Boolean NOT if "thr" is 0.5 and k is positive.
You see the failure of "the law of excluded middle" (LoEM): no matter how huge the k, the value of the function at X = "thr" is 0.5. This violates the LoEM.
• You can build e.g. an OR gate by adding X1 and X2, subtracting "thr" = 0.5 and then squashing the sum with the sigmoid:
OR(X1, X2) = 1/(1 + e^(-12*(X1 + X2 - 0.5)))
• Given that you can build an OR and a NOT you now can approximate any Boolean function.
• Similarly, in a plane, the value of Z(t) will be 0.5 all along a line (it looks like a folded plane)
From Y = mX + b,
Y/b - (m/b)*X = 1
Z(t) = sig(Y/b - (m/b)*X - thr), which is 0.5 all along that line when thr = 1
• Two of the above Z(t), with reversed signs and slightly different thresholds, added together make a ridge along a line, like a mountain range on a map (or a canyon). However, if you put three of these "folded sheets" (for a total of just 3 sigmoids) on the X-Y plane, get the signs of their k's right, add them together and pass the sum through a "second-layer" sigmoid, you have a "triangle" plateau that can be shrunk with higher values of the k's to make a single Matterhorn stick up anywhere on the plane (or make a sink-hole).
• Given that you can make Matterhorns to your heart's content anywhere on the plane, you can add them together and approximate any curve by "bleeding" one into another. This summation proves that sigmoids can be used to approximate any arbitrary curve, much like a 2-D Fourier transform.

Some of this stuff can be found in a book titled:

Tom M. Mitchell, Machine Learning, WCB-McGraw-Hill, 1997, ISBN 0-07-042807-7

In particular see "Chapter 4: Artificial Neural Networks" where the Boolean abilities of "perceptrons" are defined as well. I happened onto the tricky business of adding three folded planes together to make a "triangle" (and passing them through a second-layer sigmoid) because a neural net showed me this (!). I've not seen it documented anywhere, but I did see the results of it in a journal once. I'm sure someone who knows the literature better could cite the source. Proofs similar to the above are mentioned in Mitchell. This stuff is easy to do in Excel. Wvbailey 18:39, 17 June 2007 (UTC)
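The gain/threshold form and the two gates described in the comment above can be sketched in Python. This is only an illustration under stated assumptions: the names `sig`, `not_gate` and `or_gate` are made up here, the gain of 12 for NOT is assumed (the comment only gives -12 for OR), and the sign convention is the one written in the comment, Y = 1/(1 + e^(k*(X - thr))), so a positive k gives a falling curve.

```python
import math

def sig(x, k, thr):
    """Sigmoid with gain k and threshold thr: 1/(1 + e^(k*(x - thr))).

    With this sign convention a positive k gives a falling curve and a
    negative k a rising one. Very large k*(x - thr) would overflow
    math.exp, so this sketch is meant for moderate gains only.
    """
    return 1.0 / (1.0 + math.exp(k * (x - thr)))

def not_gate(x, k=12.0):
    # Hypothetical gain of 12: inputs near 0 map close to 1, inputs near 1 close to 0.
    return sig(x, k, 0.5)

def or_gate(x1, x2, k=-12.0):
    # Add the inputs, subtract the threshold 0.5, squash with a rising sigmoid.
    return sig(x1 + x2, k, 0.5)

def slope_at_threshold(k, thr=0.5, h=1e-6):
    # Central-difference estimate of the derivative at x = thr.
    return (sig(thr + h, k, thr) - sig(thr - h, k, thr)) / (2.0 * h)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print("OR", a, b, round(or_gate(a, b), 4))
    for a in (0, 1):
        print("NOT", a, round(not_gate(a), 4))
    print("slope at thr for k=12:", slope_at_threshold(12.0))
```

The slope estimate comes out close to -k/4, which matches the "1/4 the value of k" remark up to the sign fixed by this convention, and the value at the threshold stays at 0.5 whatever the gain.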

## Another sigmoid?

I wonder if it would be useful to list the following function among the sigmoids:

${\displaystyle f(x)=3x^{2}-2x^{3},\quad x\in [0,1]}$

I have seen it used as a "hack" when a fast S-shaped function was needed, avoiding the (computer) evaluation of exp(x). Its derivative is zero at 0 and 1, and it is symmetrical with respect to the midpoint (meaning, ${\displaystyle f(1-x)=1-f(x)}$). For many purposes it works fine, as long as you don't run outside the range [0,1]. —Preceding unsigned comment added by Pasmao (talk • contribs) 12:44, 27 October 2007 (UTC)
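A minimal Python sketch of this polynomial, for anyone who wants to check the stated properties. The function names are made up for illustration (the curve is often called "smoothstep" in computer graphics), and the clamped variant is an addition that the comment above does not propose; it is just one way to handle inputs outside [0,1].

```python
def smoothstep(x):
    """Cubic S-curve f(x) = 3x^2 - 2x^3, intended for x in [0, 1]."""
    return x * x * (3.0 - 2.0 * x)

def smoothstep_derivative(x):
    """f'(x) = 6x - 6x^2 = 6x(1 - x), which is zero at x = 0 and x = 1."""
    return 6.0 * x * (1.0 - x)

def smoothstep_clamped(x):
    # Assumption for illustration: clamp the input so values outside
    # [0, 1] saturate at 0 or 1 instead of running off along the cubic.
    x = min(1.0, max(0.0, x))
    return smoothstep(x)

if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        # The two printed values agree, illustrating f(1 - x) = 1 - f(x).
        print(x, smoothstep(1.0 - x), 1.0 - smoothstep(x))
    # Outside the interval the unclamped cubic is no longer S-shaped.
    print(smoothstep(2.0), smoothstep_clamped(2.0))
```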

It would be interesting to add something like this. I fiddled with this notion with respect to what would be required for mother nature to build a squasher for making neuralogical ANDs and ORs, and was able to get to some pretty nice approximations -- as long as you stay within the interval. Somewhere I actually worked out the math for this ... a problem arises because, to be useful, the AND etc. needs some "gain" in the middle (i.e. a slope > 1), but the more gain you put in, the more difficult the design becomes. For an OR you need a range of -0.25 to +2.25 (i.e. if inputs are "a" and "b" that vary from 0 to 1, add them and squash their sum back to approximately 0 or 1). The first hack starts out with the odd function y = 1*(x-0.5) + 0.5 (just a straight line shifted to the right, yielding (0,0), (1,1)). This clearly won't work. The trick then is to feed back a certain amount of x^2 to give you some "gain", etc., etc. As I remember this works best if it goes through two iterations. I'm working from memory here... bill Wvbailey (talk) 17:18, 13 January 2008 (UTC)

This is nice! Actually you can generalize it

${\displaystyle f(x)=ax^{b}-(a-1)x^{\frac {ab}{a-1}},\quad x\in [0,1]}$

This ensures that f(0)=0, f(1)=1 and that f'(0)=f'(1)=0. By playing around with a and b you can get different shapes to suit you. Juancentro (talk) 23:36, 18 April 2013 (UTC)
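A short check of the endpoint claims for the generalized form, assuming a > 1 and b > 1 (these ranges are an assumption made here; the comment above does not state them). The endpoint values are

${\displaystyle f(0)=0,\qquad f(1)=a-(a-1)=1,}$

and differentiating term by term gives

${\displaystyle f'(x)=ab\,x^{b-1}-(a-1){\frac {ab}{a-1}}\,x^{{\frac {ab}{a-1}}-1}=ab\left(x^{b-1}-x^{{\frac {ab}{a-1}}-1}\right).}$

At x = 1 the two powers cancel, so f'(1) = 0. At x = 0 both exponents are positive under the assumption above (since ab/(a-1) > b > 1), so f'(0) = 0 as well. Setting a = 3, b = 2 gives ab/(a-1) = 3 and recovers

${\displaystyle f(x)=3x^{2}-2x^{3}.}$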

## Derivative Clarification

I'm pretty sure that not all sigmoid functions have the derivative:

${\displaystyle {\frac {dP}{dt}}=P(1-P).}$

Perhaps a minor clarification would be in order. —Preceding unsigned comment added by 128.111.110.55 (talk) 02:12, 11 December 2007 (UTC)

This formula is only for ${\displaystyle 1/(1-exp(-x))}$; tanh, for example, has a derivative of 1-tanh^2. This is also confusing, as f(...) can be mistaken for applying function f to (...), whereas in this case it means the result of multiplying function f by 1-f. dP/df = (P)*(1-P) would be clearer.
Jfmiller28 (talk) 23:09, 2 January 2008 (UTC)

${\displaystyle 1/(1-exp(-x))}$ is not even the special case of the logistic function mentioned in the text. How could the reader know which function the formula applies to? This part of the text is very confusing. 130.234.198.85 (talk) 14:36, 7 January 2008 (UTC)
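For what it is worth, the point that the identity is specific to one function can be checked numerically. The sketch below assumes the standard logistic function 1/(1 + e^(-x)) and compares a finite-difference derivative against P(1 - P) for the logistic and against 1 - tanh^2 for tanh. The helper names are made up for illustration.

```python
import math

def logistic(x):
    """Standard logistic function 1/(1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.5):
        p = logistic(x)
        t = math.tanh(x)
        # Logistic: derivative matches P(1 - P).
        print("logistic", x, numeric_derivative(logistic, x), p * (1.0 - p))
        # tanh: derivative matches 1 - tanh^2, not tanh(1 - tanh).
        print("tanh    ", x, numeric_derivative(math.tanh, x), 1.0 - t * t, t * (1.0 - t))
```

So dP/dt = P(1 - P) holds for the logistic curve, while another sigmoid such as tanh obeys a different relation.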

## Are some of the sections talking about the logistic?

• My text by Mitchell, which I listed on the article page (the only reference, BTW), equates the two:
"σ(y) = 1/(1+e^(-y))
"σ is often called the sigmoid function or, alternately, the logistic function. Note that its output ranges between 0 and 1 .... Because it maps a very large input domain to a small range of outputs, it is often referred to as the squashing function of the unit [cf Figure 4.6 The sigmoid threshold unit; in this drawing, σ(y) = 1/(1+e^(-net)), where net = Σ(w_i*x_i) summed over i starting from 0, w_i is the ith weight for the ith input x_i, and x_0 is a constant -- x_0 is important(!)]. The sigmoid function has the useful property that its derivative is easily expressed in terms of its output..." (Mitchell 1997:96-97)
My guess is writers who distinguish between the two are (needlessly) splitting the hare (hair) and using two different names for the same function depending on where it is used. "Logistic" would seem to come from "logic" i.e. having 1 and 0 outcomes only; "Sigmoid" because of its shape as in "sigmoidoscopy". Anyway, as this is wikipedia and we need sources to back up our claims, mine says they are the same thing. Bill Wvbailey (talk) 15:07, 30 May 2008 (UTC)

## External link to Logistic Function implementation in Excel should be maintained

A sigmoid curve is produced by a mathematical function having an "S" shape —Preceding unsigned comment added by 220.225.131.157 (talk) 04:16, 12 April 2010 (UTC)

## Sigmoid and sigmoidal

The following is called a "sigmoidal function" in another article:

${\displaystyle \sigma (x)=2/(1+e^{-ax})-1}$

Is it OK to create a "sigmoidal function" redirect to this article? Or are they different things? walk victor falk talk 02:16, 15 February 2011 (UTC)

Sigmoidal and sigmoid seem to me to be the same thing; maybe there is some very slight difference, but it's not pointed out by the article. --Kri (talk) 00:38, 16 February 2011 (UTC)
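One way to see how the quoted "sigmoidal function" relates to the curves on this page: algebraically, 2/(1 + e^(-ax)) - 1 is a logistic curve stretched to the range (-1, 1), and it equals tanh(ax/2). The Python sketch below checks this numerically. The function names and the sample values of a are chosen here only for illustration.

```python
import math

def sigmoidal(x, a=1.0):
    """The function quoted above: 2/(1 + e^(-a*x)) - 1, with range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-a * x)) - 1.0

def logistic(x):
    """Standard logistic function 1/(1 + e^(-x)), with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

if __name__ == "__main__":
    for a in (0.5, 1.0, 3.0):
        for x in (-2.0, -0.3, 0.0, 1.0, 4.0):
            # All three expressions describe the same curve.
            print(a, x, sigmoidal(x, a), 2.0 * logistic(a * x) - 1.0, math.tanh(a * x / 2.0))
```

On that reading the two names describe the same family of S-shaped curves, differing only in scaling and offset.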

## Sign of first derivative

Currently the section "Definition" says

A sigmoid function is a bounded differentiable real function that is defined for all real input values and has a positive derivative at each point.

but then in the very next sentence the section "Properties" says

In general, a sigmoid function is real-valued and differentiable, having either a non-negative or non-positive first derivative which is bell shaped.

So the article is inconsistent as to whether it must be upward sloping or whether it can alternatively be downward sloping, and (if downward sloping is precluded) as to whether it must have a positive or just a non-negative derivative. Duoduoduo (talk) 14:15, 2 November 2013 (UTC)

## asymmetric sigmoid function

This page is missing a separation of symmetrical and asymmetrical sigmoid functions, e.g. the Gompertz function is an asymmetric sigmoid http://en.wikipedia.org/wiki/Gompertz_function but there are many others.

Olbran (talk) 10:50, 17 March 2015 (UTC)
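To make the symmetric/asymmetric distinction concrete, here is a small Python sketch that tests point symmetry about the inflection point. It assumes the Gompertz form a*exp(-b*exp(-c*x)) with a = b = c = 1, whose inflection point is at x = 0 with value 1/e, and compares it with the standard logistic, whose inflection point is at x = 0 with value 1/2. The helper `symmetry_gap` is a name invented for this illustration.

```python
import math

def logistic(x):
    """Standard logistic function 1/(1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def gompertz(x, a=1.0, b=1.0, c=1.0):
    """Gompertz function a*exp(-b*exp(-c*x)); parameter values are assumptions."""
    return a * math.exp(-b * math.exp(-c * x))

def symmetry_gap(f, centre, d):
    """Zero when f is point-symmetric about (centre, f(centre)) at offset d,
    i.e. when f(centre + d) + f(centre - d) == 2*f(centre)."""
    return f(centre + d) + f(centre - d) - 2.0 * f(centre)

if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0):
        print(d, symmetry_gap(logistic, 0.0, d), symmetry_gap(gompertz, 0.0, d))
```

The gap is zero (up to rounding) for the logistic and clearly non-zero for the Gompertz curve, which approaches its upper asymptote more slowly than its lower one.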

## Definition and properties not consistent

The Properties section should be consistent with the Definition section. The Definition says that the derivative must be positive. The Properties section says that it must be either non-positive or non-negative. It's unnecessary anyway to restate properties that are explicit in the definition (other properties that can be derived from the definition should be mentioned), but it's unhelpful at least to be inconsistent. — Preceding unsigned comment added by 209.93.31.116 (talk) 17:20, 29 April 2017 (UTC)