# Sinusoidal model

In statistics, signal processing, and time series analysis, a sinusoidal model to approximate a sequence Yi is:

${\displaystyle Y_{i}=C+\alpha \sin(\omega T_{i}+\phi )+E_{i}}$

where C is constant defining a mean level, α is an amplitude for the sine wave, ω is the frequency, Ti is a time variable, φ is the phase, and Ei is the error sequence in approximating the sequence Yi by the model. This sinusoidal model can be fit using nonlinear least squares; to obtain a good fit, nonlinear least squares routines may require good starting values for the constant, the amplitude, and the frequency.

Fitting a model with a single sinusoid is a special case of least-squares spectral analysis.

## Good starting value for the mean

A good starting value for C can be obtained by calculating the mean of the data. If the data show a trend, i.e., the assumption of constant location is violated, one can replace C with a linear or quadratic least squares fit. That is, the model becomes

${\displaystyle Y_{i}=(B_{0}+B_{1}T_{i})+\alpha \sin(2\pi \omega T_{i}+\phi )+E_{i}}$

or

${\displaystyle Y_{i}=(B_{0}+B_{1}T_{i}+B_{2}T_{i}^{2})+\alpha \sin(2\pi \omega T_{i}+\phi )+E_{i}}$

## Good starting value for frequency

The starting value for the frequency can be obtained from the dominant frequency in a periodogram. A complex demodulation phase plot can be used to refine this initial estimate for the frequency.[citation needed]

## Good starting values for amplitude

The root mean square of the detrended data can be scaled by the square root of two to obtain an estimate of the sinusoid amplitude. A complex demodulation amplitude plot can be used to find a good starting value for the amplitude. In addition, this plot can indicate whether or not the amplitude is constant over the entire range of the data or if it varies. If the plot is essentially flat, i.e., zero slope, then it is reasonable to assume a constant amplitude in the non-linear model. However, if the slope varies over the range of the plot, one may need to adjust the model to be:

${\displaystyle Y_{i}=C+(B_{0}+B_{1}T_{i})\sin(2\pi \omega T_{i}+\phi )+E_{i}}$

That is, one may replace α with a function of time. A linear fit is specified in the model above, but this can be replaced with a more elaborate function if needed.

## Model validation

As with any statistical model, the fit should be subjected to graphical and quantitative techniques of model validation. For example, a run sequence plot to check for significant shifts in location, scale, start-up effects and outliers. A lag plot can be used to verify the residuals are independent. The outliers also appear in the lag plot, and a histogram and normal probability plot to check for skewness or other non-normality in the residuals.

## Extensions

A different method consists in transforming the non-linear regression to a linear regression thanks to a convenient integral equation. Then, there is no need for initial guess and no need for iterative process : the fitting is directly obtained.[1]