# Sample entropy

Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used for assessing the complexity of physiological time-series signals, for example in diagnosing diseased states. SampEn has two advantages over ApEn: data-length independence and a relatively trouble-free implementation. There is also a small computational difference: in ApEn, the comparison between the template vector (see below) and the rest of the vectors includes a comparison with itself. This guarantees that the probabilities $C_{i}'^{m}(r)$ are never zero, so it is always possible to take a logarithm of them. However, because these self-matches lower the ApEn value, signals are interpreted as more regular than they actually are. Self-matches are not included in SampEn.

There is a multiscale version of SampEn as well, suggested by Costa and others.

## Definition

Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity, but it does not count self-similar patterns as ApEn does. For a given embedding dimension $m$, tolerance $r$ and number of data points $N$, SampEn is the negative logarithm of the probability that if two sets of simultaneous data points of length $m$ have distance $<r$, then two sets of simultaneous data points of length $m+1$ also have distance $<r$. It is denoted by $SampEn(m,r,N)$ (or by $SampEn(m,r,\tau,N)$ when the sampling time $\tau$ is included).

Now assume we have a time-series data set of length $N$, $\{x_{1},x_{2},x_{3},...,x_{N}\}$, sampled with a constant time interval $\tau$. We define a template vector of length $m$ as $X_{m}(i)=\{x_{i},x_{i+1},x_{i+2},...,x_{i+m-1}\}$ and take the distance function $d[X_{m}(i),X_{m}(j)]$ ($i\neq j$) to be the Chebyshev distance (though it could be any distance function, including the Euclidean distance). We define the sample entropy to be
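As a minimal illustration of these definitions, the snippet below builds the template vectors and evaluates the Chebyshev distance for a made-up five-point series (the names `chebyshev` and `templates` are illustrative, not from the original text):

```python
def chebyshev(x_i, x_j):
    # Chebyshev distance: the largest absolute difference between paired elements
    return max(abs(a - b) for a, b in zip(x_i, x_j))

x = [1.0, 1.5, 0.5, 2.0, 1.0]   # made-up series, N = 5
m = 2
# All template vectors X_m(i) of length m
templates = [x[i:i + m] for i in range(len(x) - m + 1)]
# templates: [[1.0, 1.5], [1.5, 0.5], [0.5, 2.0], [2.0, 1.0]]

d = chebyshev(templates[0], templates[1])   # max(|1.0 - 1.5|, |1.5 - 0.5|) = 1.0
```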

$SampEn=-\log {A \over B}$

where

- $A$ = number of template vector pairs having $d[X_{m+1}(i),X_{m+1}(j)]<r$
- $B$ = number of template vector pairs having $d[X_{m}(i),X_{m}(j)]<r$

It is clear from the definition that $A$ is always smaller than or equal to $B$, so $SampEn(m,r,\tau)$ is always either zero or a positive value. A smaller value of $SampEn$ indicates more self-similarity in the data set, or less noise.

Generally, we take the value of $m$ to be $2$ and the value of $r$ to be $0.2\times \mathrm{std}$, where std denotes the standard deviation, which should be taken over a very large data set. For instance, an $r$ value of 6 ms is appropriate for sample entropy calculations of heart-rate intervals, since this corresponds to $0.2\times \mathrm{std}$ for a very large population.
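This parameter choice can be sketched in a few lines; the `signal` here is synthetic stand-in data, not a real physiological recording:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=10_000)   # hypothetical stand-in for a long recording

m = 2                        # embedding dimension, commonly 2
r = 0.2 * np.std(signal)     # tolerance: 20% of the standard deviation
```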

## Multiscale SampEn

The definition mentioned above is a special case of multiscale SampEn with $\delta =1$, where $\delta$ is called the skipping parameter. In multiscale SampEn, template vectors are defined with a certain interval between their elements, specified by the value of $\delta$. The modified template vector is defined as $X_{m,\delta}(i)=\{x_{i},x_{i+\delta},x_{i+2\delta},...,x_{i+(m-1)\delta}\}$, and SampEn can be written as

$SampEn\left(m,r,\delta \right)=-\log {A_{\delta } \over B_{\delta }}$

where $A_{\delta }$ and $B_{\delta }$ are calculated as before.
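Building the skipped template vectors $X_{m,\delta}(i)$ can be sketched as below; `templates_with_skip` is a hypothetical helper name, and the series is made up:

```python
def templates_with_skip(x, m, delta):
    # Build X_{m,delta}(i) = (x_i, x_{i+delta}, ..., x_{i+(m-1)*delta});
    # the range of start indices keeps every element inside the series.
    span = (m - 1) * delta
    return [[x[i + k * delta] for k in range(m)] for i in range(len(x) - span)]

x = list(range(8))   # made-up series 0..7
print(templates_with_skip(x, m=3, delta=2))
# the first template is [0, 2, 4]
```

With `delta=1` this reduces to the ordinary template vectors of the single-scale definition.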

## Implementation

Sample entropy can be implemented easily in many different programming languages; a Python example is given below.

### Python

```python
import numpy as np

def SampEn(U, m, r):
    """Compute the sample entropy of a time series U with embedding
    dimension m and tolerance r."""

    def _maxdist(x_i, x_j):
        # Chebyshev distance between two template vectors
        return max(abs(ua - va) for ua, va in zip(x_i, x_j))

    def _phi(m):
        # Count ordered pairs (i, j), i != j, of length-m templates
        # whose Chebyshev distance is within the tolerance r.
        x = [U[i:i + m] for i in range(N - m + 1)]
        C = 0
        for i in range(len(x)):
            for j in range(len(x)):
                if i == j:
                    continue  # self-matches are excluded in SampEn
                C += _maxdist(x[i], x[j]) <= r
        return C

    N = len(U)

    return -np.log(_phi(m + 1) / _phi(m))
```
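For longer series, the same quantity can be computed with NumPy broadcasting instead of nested Python loops. The `sampen` function below is a hypothetical vectorized sketch, not part of the original example; it counts all within-tolerance pairs at once and subtracts the diagonal of self-matches:

```python
import numpy as np

def sampen(x, m=2, r=None):
    # Vectorized SampEn sketch; r defaults to 0.2 * std of the series.
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)

    def count_matches(mm):
        # Stack all length-mm templates as rows of a matrix.
        t = np.array([x[i:i + mm] for i in range(len(x) - mm + 1)])
        # Pairwise Chebyshev distances between all templates.
        d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=-1)
        # Count ordered pairs i != j within tolerance r;
        # the diagonal holds the self-matches, which are subtracted.
        return np.sum(d <= r) - len(t)

    return -np.log(count_matches(m + 1) / count_matches(m))
```

The trade-off is memory: the distance matrix is quadratic in the number of templates, so for very long recordings the loop version (or a chunked variant) may be preferable.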