1. Introduction
The complexity of time series can be studied through several measures, e.g., approximate entropy [1] and sample entropy [2]. However, some of these measures do not take into account the multiple time scales present in physical systems. In the 2000s, Costa et al. proposed the multiscale entropy (MSE) approach to represent the complexity of a signal [3]. MSE relies on the computation of the sample entropy over a range of scales: coarse-grained time series, which represent the system dynamics on different scales, are analyzed with the sample entropy algorithm. Since its introduction, MSE has become a prevailing method to quantify the complexity of signals. It has been used successfully in different research fields, including biomedical time series [4–6], electroseismic time series [7], vibration of rotary machines [8], and financial time series [9]. However, the sample entropy algorithm may give an imprecise estimate of entropy, or even undefined entropy values, for short time series (see, e.g., [10]). Moreover, it has been reported that the coarse-graining procedure (which eliminates the fast temporal scales) uses a filter whose frequency response cannot prevent aliasing and which is therefore suboptimal, especially in the presence of fast oscillations [11]. Finally, some authors mentioned that the entropy-based complexity is partially linked to the reduction of variance caused by the elimination of the fast temporal scales [11]. This is why other algorithms based on MSE have been developed to improve on the original MSE algorithm. We herein present the state of the art of the algorithms that have been conceived to overcome some limitations of the original MSE.
2. Original Multiscale Entropy Algorithm
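The MSE algorithm is composed of two steps [3,4]: (I) a coarse-graining procedure, in which the samples of the time series x = {x1, …, xi, …, xN} are averaged inside consecutive, non-overlapping windows of length τ (the scale factor), giving the coarse-grained time series
$$y_j^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} x_i, \qquad 1 \le j \le N/\tau; \tag{1}$$
(II) the computation of the sample entropy of each coarse-grained time series.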
Sample entropy is a conditional probability measure: it quantifies the likelihood that a sequence of m consecutive data points that matches another sequence of the same length (within a tolerance r) will still match the other sequence when the length of both is increased by one sample (sequences of length m + 1); m therefore defines the length of the patterns that are compared to each other [2]. In this definition, the distance between two vectors is computed as the maximum absolute difference of their corresponding scalar components [2]. More precisely, sample entropy is determined as
$$\mathrm{SampEn}(m, r) = \lim_{N \to \infty}\left[-\ln\frac{A^m(r)}{B^m(r)}\right] \tag{2}$$
where $A^m(r)$ is the probability that two sequences will match for m + 1 points and $B^m(r)$ is the probability that two sequences will match for m points (with a tolerance of r), where self-matches are excluded [2].
Equation (2) is estimated by the statistic [2]
$$\mathrm{SampEn}(m, r, N) = -\ln\frac{A^m(r)}{B^m(r)}. \tag{3}$$
By setting $B = \{[(N - m - 1)(N - m)]/2\}\,B^m(r)$ and $A = \{[(N - m - 1)(N - m)]/2\}\,A^m(r)$ [2], we have $A/B = A^m(r)/B^m(r)$ and the sample entropy can be expressed as [2]
$$\mathrm{SampEn}(m, r, N) = -\ln\frac{A}{B} \tag{4}$$
where A is the total number of forward matches of length m + 1 whereas B is the total number of template matches of length m [2].
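The two steps of the algorithm and the count-based form of sample entropy can be sketched as follows. This is a minimal, unoptimized illustration rather than the reference implementation of [2] or [3]: the function names are hypothetical, a match is taken as a Chebyshev distance less than or equal to r (some implementations use a strict inequality), and undefined entropy is returned as `None`.

```python
import math
import random


def coarse_grain(x, tau):
    """Average x inside consecutive, non-overlapping windows of length tau."""
    n = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n)]


def count_matches(x, length, r, n_templates):
    """Count template pairs (i < j) whose Chebyshev distance is <= r."""
    count = 0
    for i in range(n_templates - 1):
        for j in range(i + 1, n_templates):
            if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                count += 1
    return count


def sample_entropy(x, m, r):
    """Return -ln(A/B), or None when A or B is zero (undefined entropy)."""
    n_templates = len(x) - m  # same template count for lengths m and m + 1
    if n_templates < 2:
        return None
    b = count_matches(x, m, r, n_templates)      # matches of length m
    a = count_matches(x, m + 1, r, n_templates)  # matches of length m + 1
    if a == 0 or b == 0:
        return None
    return -math.log(a / b)


def multiscale_entropy(x, m, r, max_tau):
    """Sample entropy of the coarse-grained series for tau = 1..max_tau."""
    return [sample_entropy(coarse_grain(x, tau), m, r)
            for tau in range(1, max_tau + 1)]


# Usage: short Gaussian white noise, r kept constant across scales.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
print(multiscale_entropy(noise, m=2, r=0.15, max_tau=3))
```

Self-matches are excluded by only comparing pairs with i < j, and the tolerance r is passed in once so that it stays constant for all scale factors, as in the original algorithm.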
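Regular and periodic signals have a theoretical sample entropy of 0 whereas uncorrelated random signals have a maximum entropy (the value depends on the signal length) [2]. The sample entropy for coarse-grained white noise time series can analytically be written as [4]
$$S_E = -\ln\int_{-\infty}^{+\infty}\frac{1}{2}\sqrt{\frac{\tau}{2\pi}}\left[\mathrm{erf}\!\left(\frac{x+r}{\sqrt{2/\tau}}\right)-\mathrm{erf}\!\left(\frac{x-r}{\sqrt{2/\tau}}\right)\right]\exp\!\left(-\frac{x^{2}\tau}{2}\right)dx \tag{5}$$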
where τ and erf(·) refer to the scale factor and to the error function, respectively. Moreover, Costa et al. have shown that the sample entropy of a coarse-grained 1/f noise is equal to 1.8 for all scales, for the following parameter values: m = 2, r = 0.15 × the standard deviation, and N = 30,000 samples [4]. At scale one, white noise time series have a higher entropy than 1/f time series. Moreover, the entropy of the coarse-grained white noise time series decreases monotonically and becomes smaller than that of coarse-grained 1/f time series for scale factors τ > 4 [3]. These results underline that 1/f noise contains complex structures across multiple time scales whereas white noise does not.
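The analytic white-noise curve can be evaluated numerically to see the monotonic decrease with τ. The sketch below is an illustration under stated assumptions, not code from [4]: it assumes unit-variance Gaussian white noise with r expressed in units of the standard deviation, writes the integrand as a Gaussian density of variance 1/τ multiplied by the probability that a second sample falls within r, and uses a plain trapezoidal rule; the function name and integration settings are hypothetical.

```python
import math


def white_noise_se(tau, r, half_width=10.0, steps=20000):
    """Analytic sample entropy of coarse-grained unit-variance white noise."""
    sigma = 1.0 / math.sqrt(tau)       # std of the coarse-grained series
    scale = math.sqrt(2.0 / tau)       # sigma * sqrt(2), used inside erf
    lo, hi = -half_width * sigma, half_width * sigma
    h = (hi - lo) / steps

    def integrand(x):
        window = math.erf((x + r) / scale) - math.erf((x - r) / scale)
        density = math.sqrt(tau / (2.0 * math.pi)) * math.exp(-x * x * tau / 2.0)
        return 0.5 * window * density

    # Trapezoidal rule over [lo, hi]; the tails beyond 10 sigma are negligible.
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return -math.log(total * h)


for tau in (1, 2, 4, 8, 16):
    print(tau, round(white_noise_se(tau, r=0.15), 3))
```

Because the difference of two independent samples of variance 1/τ is Gaussian with variance 2/τ, the same quantity also has the closed form −ln erf(r√τ/2), which gives a convenient check on the numerical integration.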
However, it has been reported that the computation of MSE values as described in the above-mentioned algorithm presents several drawbacks (see, e.g., [11,12]):
The coarse-graining procedure can be seen as a two-step procedure [12]: (I) averaging the data inside a window of length τ in order to reduce the high-frequency components; (II) downsampling the averaged data by a factor τ. The coarse-graining procedure therefore reduces the length of the time series: at a scale factor τ, the coarse-grained time series has a length equal to that of the original time series divided by τ. Therefore, the larger the scale factor, the shorter the coarse-grained time series. It has been reported that, to obtain a reasonable entropy value, the time series length should be in the range of $10^m$ to $20^m$ [2]. For shorter time series, the variance of the entropy estimator grows very fast as the number of data points is reduced, and a large variance of the estimated entropy reduces its reliability. Consequently, the statistical reliability of sample entropy for the coarse-grained time series decreases as the scale factor τ increases. For short time series or at large time scales, the sample entropy algorithm therefore gives an imprecise estimate of entropy (unreliable results with large variance) or even undefined entropy values (when no template vectors match one another). Thus, it has been shown that, for synthetic signals whose theoretical MSE values are known, the estimated MSE values (numerical solutions) may differ significantly from the analytic solutions (see, e.g., [10]). This is particularly problematic for practical applications in which long recordings are difficult to obtain (in the biomedical field, for example).
It has been reported that Equation (1) is similar to applying a finite-impulse response (FIR) filter [11]
$$h(k) = \begin{cases} 1/\tau, & 0 \le k \le \tau - 1 \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
to the original time series x and then downsampling the filtered signal by a factor τ. This FIR filter is a low-pass filter. The authors of [11] underline that the frequency response of this low-pass filter is poor, among other reasons because it shows side lobes in the stopband. For this reason, the low-pass filter cannot prevent aliasing when the downsampling procedure is applied [11]. As a result, the filter does not eliminate the fast temporal scales above the filter’s cutoff frequency, and the downsampling procedure that follows produces aliasing, which generates spurious oscillations in the frequency band between 0 and the filter’s cutoff frequency [11]. The authors of [11] therefore conclude that the evaluation of the complexity of the downsampled time series is biased by these artifacts.
As mentioned previously, in the MSE algorithm two patterns are considered similar if they are closer than a parameter r. The value of r is usually chosen as a percentage of the standard deviation of the signal under study. In the original algorithm proposed by Costa et al., the value of r is constant for all scale factors. However, some authors considered this point a drawback [11,13]. Indeed, as mentioned above, Equation (1) can be seen as low-pass filtering followed by downsampling. As a result, when the scale factor increases, the standard deviation of the resulting filtered time series may become lower and lower, so that the patterns become closer and closer. If the parameter r is constant while the scale factor τ increases, more and more patterns will be considered indistinguishable. This leads to a decrease of the entropy when the scale factor τ increases. Valencia et al. concluded that MSE measures not only the variations of entropy as a function of the scale factor τ but also the variations in the power of the signal [11,14]. On this question, Costa et al. argue that the changes of the variance due to the coarse-graining procedure should be accounted for by the entropy measure [4].
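Two of the points above can be checked on a small example: coarse-graining gives the same samples as a τ-tap moving-average FIR filter followed by downsampling, and for white noise the standard deviation of the coarse-grained series shrinks roughly as 1/√τ while r stays fixed. The sketch below is illustrative only; the function names are hypothetical and the alignment of the downsampling (keeping the outputs of complete windows starting at sample 0) is an assumption.

```python
import random
import statistics


def coarse_grain(x, tau):
    """Average x inside consecutive, non-overlapping windows of length tau."""
    n = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n)]


def fir_then_downsample(x, tau):
    """Moving-average FIR filter (taps 1/tau), then keep every tau-th output."""
    # filtered[k] is the average of x[k], ..., x[k + tau - 1]
    filtered = [sum(x[k:k + tau]) / tau for k in range(len(x) - tau + 1)]
    return filtered[::tau]


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(4000)]

for tau in (1, 4, 16):
    y = coarse_grain(x, tau)
    same = all(abs(u - v) < 1e-12 for u, v in zip(y, fir_then_downsample(x, tau)))
    # With r fixed at 0.15 * std(x), the ratio r / std(y) grows with tau.
    print(tau, same, round(statistics.pstdev(y) / statistics.pstdev(x), 3))
```

For correlated signals such as 1/f noise the standard deviation decreases more slowly, which is one reason the two kinds of noise behave differently under a constant r.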
From the three points mentioned above and others (see below), several authors proposed modified MSE algorithms in order to overcome the possible drawbacks of the original MSE. These algorithms are described below.
MSE is a method that can be used with different types of entropic measures. Thus, some works did not use sample entropy but permutation entropy (see, e.g., [8,15]), cross-approximate entropy (see, e.g., [16,17]), compression entropy (see, e.g., [18]), etc. These works will not be described herein: we will focus only on MSE algorithms with a sample entropy-based approach. However, some of the algorithms described below could be adapted to use entropic measures other than sample entropy. Moreover, in what follows, the computation of the sample entropy for a time series x with the parameters m and r will be noted as SE(x, m, r).
3. Refined Multiscale Entropy
In 2009, Valencia et al. proposed the refined MSE (RMSE) [11]. In this algorithm, the authors proposed a different way to remove the fast temporal scales and used a coarse-graining that prevents the reduced variance from influencing the complexity evaluation [11]. In order to improve the elimination of the fast temporal scales, the FIR filter is replaced by a low-pass Butterworth filter. The squared magnitude of the filter frequency response is chosen as [11]
$$\left|H\!\left(e^{j2\pi f}\right)\right|^{2} = \frac{1}{1 + \left(f/f_c\right)^{2n}} \tag{7}$$
where n is the filter order and $f_c$ is the cutoff frequency [11]. This filter has the advantage of a flat magnitude response in the passband. Moreover, it has no side lobes in the stopband and its roll-off is fast. It thus reduces aliasing when the filtered time series is downsampled [11].
Regarding the coarse-graining procedure, Valencia et al. proposed to reduce the dependence of the estimated entropy rate on the decrease of variance by updating the value of r as a percentage of the standard deviation of the filtered signal [11].
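The difference between the two low-pass filters can be seen by comparing their squared magnitude responses in the stopband. The sketch below is an illustration, not the RMSE implementation of [11]: the scale factor τ = 5, the filter order n = 6, and the cutoff f_c = 0.5/τ (in cycles per sample) are assumed values chosen for the example, and all function names are hypothetical. The last helper shows the second refinement, a tolerance recomputed from the filtered series at each scale.

```python
import math
import statistics


def moving_average_gain_sq(f, tau):
    """Squared magnitude of a tau-point moving average at frequency f."""
    if f == 0:
        return 1.0
    return (math.sin(math.pi * f * tau) / (tau * math.sin(math.pi * f))) ** 2


def butterworth_gain_sq(f, fc, n):
    """Butterworth squared magnitude: 1 / (1 + (f / fc)^(2n))."""
    return 1.0 / (1.0 + (f / fc) ** (2 * n))


def refined_tolerance(filtered, fraction=0.15):
    """Tolerance r as a fraction of the std of the filtered series."""
    return fraction * statistics.pstdev(filtered)


tau, n = 5, 6            # assumed example values
fc = 0.5 / tau           # assumed cutoff, in cycles per sample
for f in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f, moving_average_gain_sq(f, tau), butterworth_gain_sq(f, fc, n))
```

At f = 1.5/τ, near the peak of the first side lobe of the moving average, the moving-average gain is still a few percent whereas the Butterworth gain is several orders of magnitude smaller, which is the behavior the text attributes to the absence of side lobes.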
In order to analyze the performance of RMSE, Valencia et al. processed (among others) a fully unpredictable process (Gaussian white noise) and a signal with long-range correlation (1/f noise). They reported that RMSE is flat with scale factor τ for the Gaussian white noise and presents a slow but progressive increase with scale factor τ in the case of 1/f noise. By contrast, the original MSE presents a monotonic decrease with scale factor τ both for the Gaussian white noise and the 1/f noise [11]. Since aliasing is more significant at short time scales, when the fast oscillations are dominant, the largest differences between RMSE and the original MSE algorithm are found in the presence of high-frequency oscillations [11]. The RMSE algorithm has also been applied to experimental data (see, e.g., [11,14]).
4. Composite Multiscale Entropy
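In 2013, Wu et al. proposed the composite multiscale entropy (CMSE) in order to reduce the variance of estimated entropy values at large scales [10]. For this purpose, and for a discrete time series x = {x1, …, xi, …, xN}, the kth coarse-grained time series for a scale factor τ, $y_k^{(\tau)} = \{y_{k,1}^{(\tau)}, y_{k,2}^{(\tau)}, \ldots\}$, is defined as [10]
$$y_{k,j}^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau+k}^{j\tau+k-1} x_i \tag{8}$$
where 1 ≤ j ≤ N/τ and 1 ≤ k ≤ τ. Then, at a scale factor τ, the sample entropy values of all coarse-grained time series are computed and the CMSE is defined as the mean of the τ entropy values [10]
$$\mathrm{CMSE}(x, \tau, m, r) = \frac{1}{\tau}\sum_{k=1}^{\tau} SE\!\left(y_k^{(\tau)}, m, r\right) \tag{9}$$
where $SE(y_k^{(\tau)}, m, r)$ corresponds to the sample entropy for the time series $y_k^{(\tau)}$. CMSE can also be written as [19]
$$\mathrm{CMSE}(x, \tau, m, r) = \frac{1}{\tau}\sum_{k=1}^{\tau}\left(-\ln\frac{n_{k,\tau}^{m+1}}{n_{k,\tau}^{m}}\right) \tag{10}$$
where $n_{k,\tau}^{m}$ represents the total number of m-dimensional matched vector pairs and is computed from the kth coarse-grained time series at a scale factor τ.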
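The composite procedure can be sketched as follows: one coarse-grained series per starting offset k, a sample entropy for each, and then the mean. This is a minimal illustration and not the code of [10]; the function names are hypothetical, only complete windows are kept for each offset (an assumption about how the end of the series is handled), and `None` is returned as soon as one of the τ entropies is undefined.

```python
import math


def composite_coarse_grain(x, tau, k):
    """k-th coarse-grained series (k = 1..tau): windows start at offset k - 1."""
    start = k - 1
    n = (len(x) - start) // tau  # keep complete windows only (assumption)
    return [sum(x[start + j * tau:start + (j + 1) * tau]) / tau
            for j in range(n)]


def match_counts(y, m, r):
    """Return (n_m, n_m1): matched template pairs of length m and m + 1."""
    n_templates = len(y) - m
    n_m = n_m1 = 0
    for i in range(n_templates - 1):
        for j in range(i + 1, n_templates):
            if all(abs(y[i + q] - y[j + q]) <= r for q in range(m)):
                n_m += 1
                if abs(y[i + m] - y[j + m]) <= r:
                    n_m1 += 1
    return n_m, n_m1


def cmse(x, tau, m, r):
    """Mean of the tau sample entropies; None if any of them is undefined."""
    values = []
    for k in range(1, tau + 1):
        n_m, n_m1 = match_counts(composite_coarse_grain(x, tau, k), m, r)
        if n_m == 0 or n_m1 == 0:
            return None
        values.append(-math.log(n_m1 / n_m))
    return sum(values) / tau


print(composite_coarse_grain([1, 2, 3, 4, 5, 6], 2, 1))
print(composite_coarse_grain([1, 2, 3, 4, 5, 6], 2, 2))
```

Averaging τ estimates obtained from shifted versions of the same data is what reduces the variance of the estimate at large scales, at the price of τ sample entropy computations per scale.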
In order to compare the original MSE and CMSE, Wu et al. applied the two algorithms to white and 1/f noises [10], using data of different lengths. For the white noise, they reported that the estimated MSE values (numerical solutions) are significantly different from the analytic solution for short signals. For the 1/f noise, they reported differences between the numerical solution and the analytic solution for all scales [10]. The authors also mentioned that the variance of the entropy estimator is improved by CMSE. However, the overestimation due to the short length of the time series still exists with CMSE on 1/f noises [10]. The authors report that CMSE performs better on short time series than the original MSE algorithm (the original MSE and CMSE give entropy values that are almost the same, but CMSE estimates them more accurately: the standard deviations of the sample entropy values are smaller with CMSE). They also mention that, for white noise and 1/f noise, CMSE gives a more reliable estimate of entropy than the original MSE algorithm, as shown by the simulation results. CMSE has also been applied to real data (see, e.g., [9,10]).
5. Refined Composite Multiscale Entropy
In 2014, Wu et al. proposed the refined CMSE (RCMSE) to improve CMSE, because CMSE does not resolve the problem of undefined entropy: CMSE estimates entropy more accurately but increases the probability of inducing undefined entropy [19]. From Equation (10), one can observe that in CMSE the logarithms of $n_{k,\tau}^{m+1}/n_{k,\tau}^{m}$ for all τ coarse-grained time series are first computed, and only then are the results averaged to give the entropy value. As a result, CMSE leads to an undefined value when either $n_{k,\tau}^{m+1}$ or $n_{k,\tau}^{m}$ is zero for any k. Consequently, when short time series are analyzed, CMSE leads to many more undefined values than does the original MSE algorithm. To address this drawback, the RCMSE has been proposed, based on the following three steps [19]: (I) the same coarse-graining procedure as in CMSE is used (see Equation (8)); (II) for each scale factor τ, and for all τ coarse-grained time series, the numbers of matched vector pairs $n_{k,\tau}^{m+1}$ and $n_{k,\tau}^{m}$ are computed; (III) RCMSE is then defined as [19]
$$\mathrm{RCMSE}(x, \tau, m, r) = -\ln\frac{\sum_{k=1}^{\tau} n_{k,\tau}^{m+1}}{\sum_{k=1}^{\tau} n_{k,\tau}^{m}}. \tag{11}$$
From Equation (11), one can note that RCMSE leads to undefined entropy values only when all $n_{k,\tau}^{m+1}$ or all $n_{k,\tau}^{m}$ are zero. The RCMSE algorithm therefore reduces the probability of having undefined entropy values, compared to CMSE [19].
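The difference between averaging the logarithms and taking the logarithm of the summed counts can be shown directly on match counts. In the sketch below, each pair holds the number of matched vector pairs of length m and of length m + 1 for one coarse-grained series; the function names and the example counts are hypothetical and chosen only to illustrate the undefined case.

```python
import math


def cmse_from_counts(counts):
    """Average of -ln(n_m1 / n_m) over offsets; None if any term is undefined."""
    values = []
    for n_m, n_m1 in counts:
        if n_m == 0 or n_m1 == 0:
            return None
        values.append(-math.log(n_m1 / n_m))
    return sum(values) / len(values)


def rcmse_from_counts(counts):
    """-ln(sum of n_m1 / sum of n_m); None only if one of the sums is zero."""
    total_m = sum(n_m for n_m, _ in counts)
    total_m1 = sum(n_m1 for _, n_m1 in counts)
    if total_m == 0 or total_m1 == 0:
        return None
    return -math.log(total_m1 / total_m)


# Hypothetical counts (n_m, n_m1) for tau = 2; the first offset has no
# match of length m + 1, which makes its individual sample entropy undefined.
counts = [(4, 0), (5, 2)]
print(cmse_from_counts(counts))    # undefined for CMSE
print(rcmse_from_counts(counts))   # -ln(2 / 9) for RCMSE
```

Pooling the counts before taking the logarithm also requires only one logarithm per scale, although the matching itself is still done on all τ coarse-grained series.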
Wu et al. also processed white and 1/f noises with the RCMSE algorithm [19]. They concluded that, regarding validity, the RCMSE algorithm is superior to both the MSE and CMSE algorithms (the probability of obtaining undefined entropy was zero). Moreover, RCMSE leads to more consistent entropy values than CMSE or MSE (the standard deviations of the entropy values obtained with RCMSE were lower than those obtained with CMSE or MSE). Finally, Wu et al. reported that RCMSE is better than the CMSE and MSE algorithms regarding independence of the data length [19]. Nevertheless, the computational cost of RCMSE is higher than that of MSE (but lower than that of CMSE) [19]. RCMSE has recently been applied (in conjunction with empirical mode decomposition) to real data [20].
6. Modified Multiscale Entropy for Short-Term Time Series
In 2013, Wu et al. proposed the modified MSE (MMSE) in order to overcome the imprecise estimation of entropy and the undefined entropy values obtained with short-term time series (short-term series generated by the coarse-graining procedure) [12]. In the MMSE algorithm, the coarse-graining procedure is replaced by a moving-average procedure [12]
$$z_j^{(\tau)} = \frac{1}{\tau}\sum_{i=j}^{j+\tau-1} x_i, \qquad 1 \le j \le N - \tau + 1 \tag{12}$$
where x = {x1, …, xi, …, xN} is the discrete time series under study. Moreover, the regularities of the moving-averaged time series at a scale factor τ are quantified by computing the sample entropy value with a time delay δ = τ [12]
$$\mathrm{MMSE}(x, \tau, m, r) = SE\!\left(z^{(\tau)}, m, \delta = \tau, r\right). \tag{13}$$
Therefore, the number of template vectors used in the MMSE algorithm is larger than the one used in the MSE algorithm. This can avoid obtaining an undefined entropy value [12]. However, the computational cost of MMSE is larger than that of MSE [12].
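The moving-average procedure and the time-delayed sample entropy can be sketched as follows. This is an illustration under assumptions and not the code of [12]: the function names are hypothetical, templates are built from samples spaced by the delay, all pairs i < j are compared, and a match is a Chebyshev distance less than or equal to r.

```python
import math


def moving_average(x, tau):
    """Moving average of x over windows of length tau (N - tau + 1 samples)."""
    return [sum(x[j:j + tau]) / tau for j in range(len(x) - tau + 1)]


def delayed_sample_entropy(z, m, delay, r):
    """Sample entropy with templates [z[i], z[i + delay], ...] of length m."""
    n_templates = len(z) - m * delay
    a = b = 0
    for i in range(n_templates - 1):
        for j in range(i + 1, n_templates):
            if all(abs(z[i + q * delay] - z[j + q * delay]) <= r
                   for q in range(m)):
                b += 1                                   # match of length m
                if abs(z[i + m * delay] - z[j + m * delay]) <= r:
                    a += 1                               # still matches at m + 1
    if a == 0 or b == 0:
        return None
    return -math.log(a / b)


def mmse(x, tau, m, r):
    """Modified MSE at one scale: moving average, then delayed sample entropy."""
    return delayed_sample_entropy(moving_average(x, tau), m, tau, r)


# A series of 20 samples at tau = 5: 16 moving-averaged samples, compared
# with only 4 samples after non-overlapping coarse-graining.
print(len(moving_average(list(range(20)), 5)))
```

With a delay equal to τ, the samples inside one template come from non-overlapping averaging windows, as in the coarse-grained series, while the templates themselves can start at every sample.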
Applying the MMSE algorithm to white and 1/f noises, Wu et al. reported that MMSE can detect the behaviors of white and 1/f noises more effectively than MSE and provides a more accurate estimation (lower standard deviation values obtained with MMSE than with MSE) [12]. The authors also reported that MMSE can avoid obtaining undefined entropy and is able to provide a more precise estimation of entropy than MSE when analyzing a short-term time series [12]. MMSE has also been applied to experimental data [12].
7. Short Time Multiscale Entropy
In 2014, Chang et al. reported another algorithm, the short time MSE (sMSE), in order to be able to process data recorded during a short time [21]. Thus, for a discrete signal x = {x1, …, xi, …, xN}, the sMSE algorithm has the following steps [21]: (I) construction of the coarse-grained time series $y^{(p)}(\tau)$ with 0 ≤ p ≤ τ − 1 as
$$y_j^{(p)}(\tau) = \frac{1}{\tau}\sum_{i=(j-1)\tau+1+p}^{j\tau+p} x_i; \tag{14}$$
(II) the τ time series $y^{(p)}(\tau)$ are subjected to sample entropy computation and the results are averaged, giving the sMSE at scale factor τ
$$\mathrm{sMSE}(x, \tau, m, r) = \frac{1}{\tau}\sum_{p=0}^{\tau-1} SE\!\left(y^{(p)}(\tau), m, r\right) \tag{15}$$
where $SE(y^{(p)}(\tau), m, r)$ corresponds to the sample entropy for the time series $y^{(p)}(\tau)$.
The authors applied the sMSE algorithm to white and 1/f noises [21]. They noticed that, for white noise, MSE exhibits fluctuations whereas sMSE shows a relatively steady decrease [21]. Moreover, sMSE has also been applied to experimental data [21,22]. Note that sMSE and CMSE follow the same steps.
10. Adaptive Multiscale Entropy
More recently, Hu et al. reported that the coarse-graining procedure used in the MSE algorithm essentially represents a linear smoothing and decimation of the original time series [25]. Therefore, only coarse scales (corresponding to low-frequency components) are captured. They also reported that, because of the linear operations, this way of extracting the different scales is not well suited to nonlinear/nonstationary signals [25]. In order to overcome these drawbacks, Hu et al. proposed the adaptive multiscale entropy (AME). AME aims at estimating the entropy of the data over multiple adaptive scales that are intrinsically determined by multivariate empirical mode decomposition (EMD). The AME algorithm relies on two steps [25]: (I) a multivariate EMD to decompose the time series into aligned intrinsic mode functions (IMFs) at different scales; (II) a sample entropy computation over the selected scales. In order to select the scales, two algorithms have been proposed: the fine-to-coarse AME and the coarse-to-fine AME [25]. From simulations, Hu et al. reported that AME is able to adaptively extract the scales inherent in a nonstationary signal and that AME considers both the coarse scales and the fine scales in the data [25]. AME was also used for biomedical data [25].
11. Multiscale Fuzzy Sample Entropy
Xie et al. proposed a modification of the sample entropy algorithm [26]. In order to measure the degree of match between two vectors, they suggested using the nonlinear sigmoid function instead of the Heaviside function (the Heaviside function presents a discontinuous, hard boundary, whereas the sigmoid function is smooth and continuous). Moreover, in the modified sample entropy algorithm, the vector sequences are generalized by removing the local baseline. Xie et al. reported that the modified sample entropy leads to better performance in terms of relative consistency, freedom of parameter selection, robustness to noise, and independence of the data length when characterizing time series with different regularities [26]. This new algorithm has been used in different kinds of studies (see, e.g., [27]). Moreover, Kong and Xie applied this modified sample entropy algorithm to classify ventricular tachycardia and fibrillation and reported good results [28]. In 2010, Xie et al. proposed to use the modified sample entropy (using a Gaussian function instead of the nonlinear sigmoid function) in order to determine whether a time series arises from a stochastic or a deterministic process [29]. They reported that the modified sample entropy is robust for detecting determinism in short and noisy signals [29].
In 2009, Chen et al. also proposed the use of the concept of fuzzy sets [30]. They developed the FuzzyEn measure, which relies on fuzzy membership functions instead of the Heaviside function, so that the boundary between matching and non-matching vectors becomes fuzzy [30]. Chen et al. reported that FuzzyEn corresponds to a more accurate entropy definition than sample entropy: it has stronger relative consistency and less dependence on data length [30]. Zhang et al. used the same approach in a multiscale framework for fault feature extraction: they used a Gaussian-type fuzzy membership function in the similarity definition of vectors [31]. Xiong et al. extended the work of Chen et al. [30] and proposed a new fuzzy membership function leading to a new fuzzy sample entropy and its multiscale version [32]. They reported that, when used to extract discriminating features of bearing vibration signals, their multiscale fuzzy sample entropy outperforms other multiscale entropy methods, especially when the signals contain heavy noise [32].
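The idea of replacing the hard Heaviside decision by a smooth membership function can be sketched as follows. This is an illustration only: the exponential membership exp(−dⁿ/r) with n = 2 is an assumed, commonly used form and is not claimed to be the exact function of [26], [30], [31] or [32]; the local baseline is removed by subtracting the mean of each template, and all function names are hypothetical.

```python
import math


def heaviside_similarity(d, r):
    """Hard boundary: two templates either match (1) or do not (0)."""
    return 1.0 if d <= r else 0.0


def fuzzy_similarity(d, r, n=2):
    """Smooth membership (assumed exponential form): exp(-(d ** n) / r)."""
    return math.exp(-(d ** n) / r)


def mean_similarity(x, length, r, n_templates, similarity):
    """Average similarity over all ordered template pairs with i != j."""
    # Remove the local baseline (template mean) from every template.
    templates = []
    for i in range(n_templates):
        t = x[i:i + length]
        base = sum(t) / length
        templates.append([v - base for v in t])
    total = 0.0
    for i in range(n_templates):
        for j in range(n_templates):
            if i != j:
                d = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                total += similarity(d, r)
    return total / (n_templates * (n_templates - 1))


def fuzzy_entropy(x, m, r, n=2):
    """ln(phi_m) - ln(phi_{m+1}) with the fuzzy membership function."""
    n_templates = len(x) - m

    def sim(d, rr):
        return fuzzy_similarity(d, rr, n)

    phi_m = mean_similarity(x, m, r, n_templates, sim)
    phi_m1 = mean_similarity(x, m + 1, r, n_templates, sim)
    return math.log(phi_m) - math.log(phi_m1)


print(heaviside_similarity(0.21, 0.2), round(fuzzy_similarity(0.21, 0.2), 3))
```

Because every pair of templates contributes a strictly positive similarity, both averages stay positive for ordinary data, which is why this family of measures avoids the undefined values that the hard threshold can produce on short series.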
12. Multivariate Multiscale Entropy
In order to process multivariate data, multivariate MSE has been proposed [33,34]. For this purpose, a multivariate version of sample entropy has been introduced to account for both within-channel and cross-channel dependencies in multichannel data. Multivariate MSE relies on the same steps as MSE [33,34]: (I) a coarse-graining procedure; (II) a sample entropy computation for each coarse-grained time series. However, because multivariate MSE processes multivariate data, these two steps are adapted to multivariate signals. Thus, for the coarse-graining procedure, temporal scales are defined by averaging a p-variate time series $\{x_{l,i}\}_{i=1}^{N}$ (l = 1, …, p is the channel index and N is the number of samples in every channel) over non-overlapping time segments of increasing length. For a scale factor τ, a coarse-grained multivariate time series is computed as
$$y_{l,j}^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} x_{l,i}$$
where 1 ≤ j ≤ N/τ, and the channel index l goes from 1 to p. For the entropy computation, the multivariate sample entropy is used for each coarse-grained multivariate time series. The multivariate sample entropy algorithm is an extension of the (univariate) sample entropy [2]. For a tolerance level r, multivariate sample entropy is calculated as the negative of the natural logarithm of the conditional probability that two composite delay vectors close to each other in an m-dimensional space will also be close to each other when the dimensionality is increased by one. The detailed multivariate sample entropy algorithm can be found in [33,34]. In the multivariate MSE algorithm, multivariate sample entropy is evaluated over multiple time scales.
14. Computational Efficiency
The computational time of the original MSE algorithm prevents online applications or even the processing of long data sets (the time complexity of the algorithm is $O(N^2)$). This is due to the time needed to compute the sample entropy value. In order to overcome this drawback, Sugisaki et al. developed, in 2007, a recursive sample entropy algorithm for the case of overlapping rolling windows, in order to be able to use the complexity measure (sample entropy algorithm) online [36]. Their algorithm uses a variable forgetting factor mechanism. However, the algorithm proposed by Sugisaki et al. has two drawbacks [37]: the computed sample entropy is an approximation, and the computational efficiency decays as the overlap length decreases (there is no improvement when the overlap length is zero).
Samani and Madeleine proposed another algorithm to compute sample entropy for online applications [38]. Their algorithm uses a limited number of permuted samples to estimate sample entropy.
Later, Pan et al. went further [37]. They showed that the computation of the probability function in entropy can be transformed into an orthogonal range search problem, a problem from the field of computational geometry [37]. The algorithm that they proposed for the computation of MSE is based on a k-d tree algorithm. The authors reported that their algorithm reduces the computational time and mentioned that this new algorithm can be used for online applications [37].
In 2011, Jiang et al. also proposed an algorithm to compute sample entropy with less computational time. As for Pan et al., their algorithm relies on a k-d tree data structure to accelerate counting the number of matched pairs of the pattern templates [39]. The authors reported that the time complexity of their algorithm is $O(N^{2-1/(m+1)})$ and its space complexity is $O(N \log_2 N)$, where N is the time series length and m is the length of the pattern templates.
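The range-search view of template matching can be illustrated with a much simpler structure than a k-d tree. The sketch below is not the algorithm of [37] or [39]: it sorts the templates by their first coordinate and uses a binary search to restrict each comparison to candidates whose first coordinate lies within r, which is a one-dimensional prefilter of the orthogonal range query. The function names are hypothetical, and the result is compared with a brute-force count.

```python
import bisect
import random


def count_matches_brute(x, m, r):
    """Reference O(N^2) count of template pairs within Chebyshev distance r."""
    n_templates = len(x) - m
    count = 0
    for i in range(n_templates - 1):
        for j in range(i + 1, n_templates):
            if all(abs(x[i + q] - x[j + q]) <= r for q in range(m)):
                count += 1
    return count


def count_matches_sorted(x, m, r):
    """Same count, but candidates are prefiltered on the first coordinate."""
    n_templates = len(x) - m
    order = sorted(range(n_templates), key=lambda i: x[i])
    firsts = [x[i] for i in order]
    count = 0
    for pos, i in enumerate(order):
        # Templates later in sorted order whose first value is <= x[i] + r.
        hi = bisect.bisect_right(firsts, firsts[pos] + r)
        for q in range(pos + 1, hi):
            j = order[q]
            if all(abs(x[i + s] - x[j + s]) <= r for s in range(m)):
                count += 1
    return count


rng = random.Random(2)
x = [float(rng.randint(0, 50)) for _ in range(400)]
print(count_matches_brute(x, 2, 2.0), count_matches_sorted(x, 2, 2.0))
```

The prefilter only prunes along one coordinate, so its worst case remains quadratic; a k-d tree extends the same pruning to all coordinates of the template, which is where the reported gain in time complexity comes from.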
15. Other Improvements
Improvements other than those presented above have been proposed for the original MSE algorithm. Thus, in 2011, Chou proposed the use of MSE on residuals (at various resolution levels) given by the à trous redundant wavelet transform [40]. On this basis, the same work proposed to determine the number of resolution levels in the wavelet transformation according to the Mann-Kendall rank correlation test [40].
Regarding the selection of the value of the parameter r, and as mentioned in Section 2, Nikulin and Brismar reported that the coarse-graining procedure is similar to smoothing and decimation of the original sequences [41]. Because the value of r is constant across scales in the original MSE algorithm, they concluded that the changes in MSE on each scale depend on both the regularity and the variation of the coarse-grained time series [41]. On this point, Costa et al. argued that the degree of irregularity of a complex signal is a property measured by entropy that cannot be entirely captured by the standard deviation or correlation measures, individually or in combination [42]: they argue that, following the normalization, the modifications in the variance generated by the coarse-graining procedure are due to the temporal structure of the original signal and should therefore be accounted for by the entropy measure [4].
Regarding the application of MSE to noisy data, it has been reported that MSE is not reliable when applied to heartbeat signals containing outliers [4,43]. Therefore, in order to quantify the complexity of noisy data (data with outliers), Lo et al. proposed to analyze the irregularity of the sign time series of the coarse-grained data and, for this purpose, proposed the multiscale symbolic entropy (MSSE) analysis [44]. MSSE uses the sign time series of each coarse-grained series at a given time scale (1 when the time series increases and 0 otherwise). The sign time series is divided into bit sequences of length m. Based on the same concept as the approximate/sample entropy, these bit sequences are divided into different categories according to their temporal patterns (see details in [44]). Then, based on the distribution of the sequences in the different categories, the Shannon entropy (which gives information richness) and the mean rank (which gives the degree of uncertainty of the fluctuations) are computed for the sign time series. Lo et al. used MSSE on human heartbeat recordings. Although the outliers due to ectopic beats were not removed, the authors were able to stress a modification in the cardiac control of patients with congestive heart failure, and even more [44]. As a result, they mentioned that MSSE enables the application of the complexity theory in clinical practice [44].
We can also report that Govindan et al. proposed a modification in the definition of sample entropy for a better characterization of the complexity: in the original sample entropy algorithm, the time series under study is divided into blocks of size m with a time gap of one (sample unit) between the successive components of the block [45]. Govindan et al. incorporated a time delay between the successive components of the block [45]. The delay is larger than one in the presence of dynamical correlations. Govindan et al. reported that this modification leads to a better characterization of the complexity than the original sample entropy algorithm [45].