# Sample entropy

Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used to assess the complexity of physiological time-series signals and to diagnose diseased states.[1] SampEn has two advantages over ApEn: data length independence and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors includes a comparison with itself. This guarantees that the probabilities ${\displaystyle C_{i}'^{m}(r)}$ are never zero, so it is always possible to take their logarithm. But because these self-matches lower ApEn values, signals are interpreted as more regular than they actually are. Self-matches are not included in SampEn. However, since SampEn makes direct use of the correlation integrals, it is not a real measure of information but an approximation. The foundations of SampEn, its differences from ApEn, and a step-by-step tutorial for its application are given by Delgado-Bonal and Marshak.[2]

There is a multiscale version of SampEn as well, suggested by Costa and others.[3]

## Definition

Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity.[1] But it does not include self-similar patterns as ApEn does. For a given embedding dimension ${\displaystyle m}$, tolerance ${\displaystyle r}$ and number of data points ${\displaystyle N}$, SampEn is the negative natural logarithm of the probability that if two sets of simultaneous data points of length ${\displaystyle m}$ have distance ${\displaystyle <r}$, then two sets of simultaneous data points of length ${\displaystyle m+1}$ also have distance ${\displaystyle <r}$. It is denoted by ${\displaystyle SampEn(m,r,N)}$ (or by ${\displaystyle SampEn(m,r,\tau ,N)}$ when the sampling time ${\displaystyle \tau }$ is included).

Now assume we have a time-series data set of length ${\displaystyle N}$, ${\displaystyle \{x_{1},x_{2},x_{3},...,x_{N}\}}$, sampled with a constant time interval ${\displaystyle \tau }$. We define a template vector of length ${\displaystyle m}$ as ${\displaystyle X_{m}(i)=\{x_{i},x_{i+1},x_{i+2},...,x_{i+m-1}\}}$, and take the distance function ${\displaystyle d[X_{m}(i),X_{m}(j)]}$ (${\displaystyle i\neq j}$) to be the Chebyshev distance (though it could be any distance function, including the Euclidean distance). We define the sample entropy to be

${\displaystyle SampEn=-\log {A \over B}}$

Where

${\displaystyle A}$ = number of template vector pairs of length ${\displaystyle m+1}$ having ${\displaystyle d[X_{m+1}(i),X_{m+1}(j)]<r}$

${\displaystyle B}$ = number of template vector pairs of length ${\displaystyle m}$ having ${\displaystyle d[X_{m}(i),X_{m}(j)]<r}$

It is clear from the definition that ${\displaystyle A}$ is always smaller than or equal to ${\displaystyle B}$. Therefore, ${\displaystyle SampEn(m,r,\tau )}$ is always either zero or positive. A smaller value of ${\displaystyle SampEn}$ indicates more self-similarity in the data set, or less noise.
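As a concrete illustration of the Chebyshev distance used above, the following sketch computes it for two length-3 template vectors drawn from a hypothetical series (the values are illustrative, not from any real signal):

```python
import numpy as np

# Hypothetical series and two template vectors of length m = 3
x = np.array([1.0, 4.0, 2.0, 3.0, 5.0, 1.5])
m = 3
X_i = x[0:m]      # X_m(1) = [1.0, 4.0, 2.0]
X_j = x[2:2 + m]  # X_m(3) = [2.0, 3.0, 5.0]

# Chebyshev distance: the largest component-wise absolute difference
d = np.max(np.abs(X_i - X_j))
print(d)  # 3.0, from |2.0 - 5.0|
```

The pair would count toward ${\displaystyle B}$ only if this distance were below the chosen tolerance ${\displaystyle r}$.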

Generally, the value of ${\displaystyle m}$ is taken to be ${\displaystyle 2}$ and the value of ${\displaystyle r}$ to be ${\displaystyle 0.2\times std}$, where std stands for the standard deviation, which should be taken over a very large dataset. For instance, an r value of 6 ms is appropriate for sample entropy calculations of heart rate intervals, since this corresponds to ${\displaystyle 0.2\times std}$ for a very large population.
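This parameter heuristic can be sketched as follows; the RR-interval series here is synthetic (a hypothetical stand-in for real heart-rate data), chosen so that its standard deviation is about 30 ms:

```python
import numpy as np

# Hypothetical RR-interval series in ms; in practice the standard deviation
# should be estimated from a very large dataset, per the text
rng = np.random.default_rng(0)
rr_intervals = 800.0 + 30.0 * rng.standard_normal(10_000)

m = 2                           # common choice of embedding dimension
r = 0.2 * np.std(rr_intervals)  # tolerance, roughly 6 ms for this series
print(m, r)
```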

## Multiscale SampEn

The definition mentioned above is a special case of multiscale SampEn with ${\displaystyle \delta =1}$, where ${\displaystyle \delta }$ is called the skipping parameter. In multiscale SampEn, template vectors are defined with a certain interval between their elements, specified by the value of ${\displaystyle \delta }$. The modified template vector is defined as ${\displaystyle X_{m,\delta }(i)=\{x_{i},x_{i+\delta },x_{i+2\times \delta },...,x_{i+(m-1)\times \delta }\}}$, and SampEn can be written as ${\displaystyle SampEn\left(m,r,\delta \right)=-\log {A_{\delta } \over B_{\delta }}}$, where ${\displaystyle A_{\delta }}$ and ${\displaystyle B_{\delta }}$ are calculated as before.
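The skipped-template construction can be sketched as follows (the function name and example series are illustrative, not from the cited sources):

```python
import numpy as np

def skip_templates(x, m, delta):
    """Build templates X_{m,delta}(i) = {x_i, x_{i+delta}, ..., x_{i+(m-1)*delta}},
    a sketch of the multiscale construction described above."""
    x = np.asarray(x)
    n = len(x) - (m - 1) * delta  # number of templates that fit in the series
    return np.array([x[i : i + (m - 1) * delta + 1 : delta] for i in range(n)])

# With delta = 1 this reduces to the ordinary template vectors
templates = skip_templates([0, 1, 2, 3, 4, 5, 6], m=3, delta=2)
print(templates)  # first row is [0, 2, 4]
```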

## Implementation

Sample entropy can be implemented in many programming languages. Below is a vectorized example written in Python. Note that both counts compare the same first ${\displaystyle N-m}$ templates, so that the template sets at lengths ${\displaystyle m}$ and ${\displaystyle m+1}$ match the definition above.

```python
import numpy as np

def sampen(L, m, r):
    """Sample entropy of the sequence L with embedding dimension m and
    tolerance r, using the Chebyshev distance between templates."""
    N = len(L)

    # Save the first N - m template vectors of length m
    xmi = np.array([L[i : i + m] for i in range(N - m)])

    # Count matches of length m, excluding the self-match of each template
    B = np.sum([np.sum(np.abs(xmii - xmi).max(axis=1) <= r) - 1 for xmii in xmi])

    # Same count for templates of length m + 1
    m += 1
    xm = np.array([L[i : i + m] for i in range(N - m + 1)])
    A = np.sum([np.sum(np.abs(xmii - xm).max(axis=1) <= r) - 1 for xmii in xm])

    # SampEn = -log(A / B)
    return -np.log(A / B)
```
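As a sanity check on any vectorized implementation, a direct double-loop version (a sketch, not part of the cited sources; the test signals are synthetic) follows the definition literally and should give a smaller SampEn for a regular signal than for white noise:

```python
import numpy as np

def sampen_bruteforce(x, m, r):
    """Direct double-loop SampEn: count ordered pairs i != j over the first
    N - m templates whose Chebyshev distance is within r, at lengths m and m + 1."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m  # number of templates compared at both lengths

    def count(mm):
        c = 0
        for i in range(n):
            for j in range(n):
                if i != j and np.max(np.abs(x[i:i + mm] - x[j:j + mm])) <= r:
                    c += 1
        return c

    return -np.log(count(m + 1) / count(m))

# A noisy sine wave is more self-similar than white noise,
# so its sample entropy should be lower
rng = np.random.default_rng(1)
t = np.arange(200)
regular = np.sin(0.2 * t) + 0.05 * rng.standard_normal(t.size)
noise = rng.standard_normal(t.size)

se_regular = sampen_bruteforce(regular, 2, 0.2 * np.std(regular))
se_noise = sampen_bruteforce(noise, 2, 0.2 * np.std(noise))
print(se_regular, se_noise)
```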



## References

1. Richman, JS; Moorman, JR (2000). "Physiological time-series analysis using approximate entropy and sample entropy". American Journal of Physiology. Heart and Circulatory Physiology. 278 (6): H2039–49. doi:10.1152/ajpheart.2000.278.6.H2039. PMID 10843903.
2. Delgado-Bonal, Alfonso; Marshak, Alexander (June 2019). "Approximate Entropy and Sample Entropy: A Comprehensive Tutorial". Entropy. 21 (6): 541. doi:10.3390/e21060541.
3. Costa, Madalena; Goldberger, Ary; Peng, C.-K. (2005). "Multiscale entropy analysis of biological signals". Physical Review E. 71 (2): 021906. doi:10.1103/PhysRevE.71.021906. PMID 15783351.