1. Introduction
Many real-world signals, including physiological signals, are irregular in some respects: they are neither purely periodic nor expressible by an analytic formula. These inherent irregularities reflect the uncertainty in the evolution of the underlying process from which the signals are observed. That uncertainty enables information transfer but also limits the predictability of the signals. The unpredictability of a signal in the time domain forces researchers to work in the frequency domain. The Fourier transform (FT) bridges signals in the original space (time domain) with their representations in the dual space (frequency domain) by decomposing a signal satisfying some weak constraints into infinitely many periodic components, and it has numerous applications in signal processing.
For most real-world applications, finite samples drawn (usually evenly spaced) from a continuous random process cannot give us full information about the process's evolution but only a discrete depiction. The FT was adapted into the discrete Fourier transform (DFT) for such scenarios [1]. Moreover, a line spectrum, wherein the total energy of the signal is concentrated on only a few frequency components, is rarely encountered among physiological signals due to the inherent irregularities therein.
To characterize the irregularity of digital signals in the frequency domain, spectral entropy was introduced by analogy with the Shannon entropy of information theory [2]. The power estimations on the frequency grid are first divided by the total power, yielding a list of proxies in the form of probabilities summing to 1. The Shannon entropy formula, the negative sum of probability-weighted log probabilities, then maps those proxies into a quantity representing the irregularity of the energy distribution over the frequency domain. Under this perspective, a flat spectrum has maximal spectral entropy, and the spectrum of a single-frequency signal has minimal spectral entropy, which is zero. Spectral entropy has been applied in diverse areas, including endpoint detection in speech segmentation [3] and spectrum sensing in cognitive radio [4]. Moreover, it underlies a famous inductive bias, maximum entropy [5], which is widely adopted for spectrum estimation of physiological signals such as the electroencephalogram (EEG).
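As a concrete illustration, spectral entropy can be computed in a few lines. The sketch below (Python with NumPy, our choice for illustration only) estimates power from the non-redundant half of the DFT and applies the Shannon formula to the normalized estimations:

```python
import numpy as np

def spectral_entropy(x):
    """Spectral entropy of a real, evenly sampled signal (minimal sketch).

    Power is estimated from the non-redundant half of the DFT, divided by
    the total power to obtain probability-like proxies, and fed to the
    Shannon entropy formula.
    """
    P = np.abs(np.fft.rfft(x)) ** 2   # power estimations on the grid
    p = P / P.sum()                   # proxies summing to 1
    p = p[p > 0]                      # skip empty bins: 0*log(0) := 0
    return float(-(p * np.log(p)).sum())

t = np.arange(1024)
tone = np.sin(2 * np.pi * 64 * t / 1024)            # single frequency
noise = np.random.default_rng(0).standard_normal(1024)
# The pure tone concentrates power on one grid point (entropy near 0),
# while broadband noise spreads it over the whole grid.
```

A flat (noise-like) spectrum approaches the maximum entropy, the logarithm of the number of grid points, matching the extremes described above.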
Although spectral entropy is well defined and can be computed efficiently via the Fast Fourier Transform (FFT), it is difficult to relate it to other interpretable properties of the original signal, even setting aside the strong endorsement it inherits from its counterpart, information entropy, the foundational concept of information theory that quantifies uncertainty. Furthermore, spectral entropy clearly ignores order information: the power estimations are arranged on a frequency grid with an intrinsic order structure, yet any permutation of these values on the grid yields the same spectral entropy, even though the corresponding time-domain signals can look very different.
The motivation to incorporate the order information carried by the power spectrum is guided by the following belief. The normal operation of any system (biological, electromechanical, etc.) is impossible without the proper processing of information through some physical or chemical process, be it signaling between different modules within the system or communication between the system as a whole and the external environment. Information transfer in those scenarios is accomplished with the help of carrier signals of particular forms with nontrivial structures in their spectra. Moreover, only limited frequency precision in the control and recognition of those signals is practical for real systems. Therefore, it is unreasonable for well-designed artificial systems, or natural systems shaped by long-term evolution, to arrange internal signals responsible for different functions close to each other in the frequency domain within a given time window; otherwise, the efficient transfer of information could be degraded. Frequency-division multiplexing [4] in modern communication systems can be considered a living example of this belief.
Therefore, if we use the power estimations on the frequency grid as proxies for the intensities of the activities corresponding to those frequencies, it seems reasonable to infer that energy distributed on neighboring rather than remote frequency grid points is more likely caused by the very same function. The alpha-band activity (8–13 Hz) in human EEG, which can be interrupted by visual perception tasks, is one example. To sum up, we want to develop a metric characterizing the aforementioned structural irregularity of a power spectrum, that is, how close to each other the frequency components of different intensities lie, instead of what spectral entropy captures, namely how the intensities of frequency components are distributed regardless of their locations in the frequency domain. Such a metric should assign a larger value to a signal whose frequency components of similar intensity are distributed far apart from, rather than close to, each other. In addition, the similarity of intensities can be reflected (partially and heuristically) by the relative order of the power estimates on the discrete frequency grid. That is why the order information in the spectrum can shed new light on the structural aspects of a signal, and it is how the order information is incorporated into our analysis.
In this paper, we explore the effectiveness of the order information carried by the power spectra of signals. Given the motivation illustrated above, in Section 2 we provide details about our method. In Section 3 we present several use cases to justify the effectiveness of our preliminary approach and, more importantly, its promising potential to open a new research niche in the field of physiological signal processing. Finally, a discussion of the limitations of our work and future directions follows in Section 4.
2. Materials and Methods
Given an equally spaced, real-valued digital signal $x = (x_0, x_1, \ldots, x_{2N-1})$, we assume for simplicity that the length of $x$ is an even number $2N$. Then the DFT is applied to $x$, and a complex-valued vector $X$ of dimension $2N$ is obtained as follows:

$$X_k = \sum_{n=0}^{2N-1} x_n \, e^{-\mathrm{i} 2\pi k n / (2N)}, \qquad k = 0, 1, \ldots, 2N-1.$$
Due to the conjugate symmetry of $X$, we take the squared modulus of one half of $X$ and get $P$, with $P_k = |X_k|^2$ for $k = 1, \ldots, N$. Thanks to the Parseval identity, the 1-norm of $P$ equals the energy of $x$ up to a factor of 1/2. Although $P$ has the dimension of energy instead of power, the constant factor carrying a dimension of time does not change the relative ordinal relations between its components. So we simply use $P$ as the estimation of power on the normalized frequency range $(0, 1/2]$, whereby the $k$-th component of $P$ is the estimation of the signal's power at grid point $k/(2N)$.
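The computation just described can be sketched as follows (Python/NumPy; the exact index convention for "one half" is our illustrative assumption, since conventions vary). The Parseval check confirms that the squared-modulus vector tracks the signal energy up to a constant factor:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
x = rng.standard_normal(2 * N)        # real signal of even length 2N

X = np.fft.fft(x)                     # complex DFT vector of dimension 2N
P = np.abs(X[1:N + 1]) ** 2           # squared modulus of one half of X

# Parseval identity (NumPy's unnormalized DFT convention):
# sum_k |X_k|^2 == 2N * sum_n x_n^2, so the 1-norm of P tracks the
# signal energy up to a constant factor.
energy_freq = (np.abs(X) ** 2).sum()
energy_time = 2 * N * (x ** 2).sum()
```

The conjugate symmetry $X_{2N-k} = \overline{X_k}$ of a real signal's DFT is what makes the discarded half redundant.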
Now, let us assume further that every two components of $P$ differ from each other, so we can rank these components without any ambiguity in ascending or descending order. The grid points have an intrinsic order structure from the low-frequency range to the high-frequency range, so we get an eigen-triple $T$ for these grid points:

$$T = \begin{pmatrix} 1 & 2 & \cdots & N \\ P_1 & P_2 & \cdots & P_N \\ R_1 & R_2 & \cdots & R_N \end{pmatrix}$$

The first row indicates the grid points by their location on the frequency range. The second row contains the corresponding power estimations. The third row contains the relative order of each power estimation among all estimations, denoted by $R_k$. Since no duplicated values in $P$ are assumed, $R$ traverses the number set $\{1, 2, \ldots, N\}$.
The first two rows of $T$ are just a representation of the traditional power spectrum. The novelty lies in taking the order information, carried in the third row, into consideration. It should be noted that the first and third rows together define a permutation over the set $\{1, 2, \ldots, N\}$, with its complete detail determined implicitly by $P$. Recall that spectral entropy is defined in a permutation-invariant way; such invariance must be broken in order to disentangle the order information. This permutation therefore provides the long-missing ladder for understanding the structural irregularities of signals from a new perspective.
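The ranking step can be sketched as follows (Python/NumPy; the array values are hypothetical). The largest power estimation receives rank 1, and stacking grid index, power, and rank row by row gives the eigen-triple:

```python
import numpy as np

# Hypothetical power estimations on the frequency grid (all distinct).
P = np.array([0.05, 0.40, 0.15, 0.30, 0.10])
N = len(P)

# Rank in descending order: the largest component gets rank 1.
R = np.empty(N, dtype=int)
R[np.argsort(-P)] = np.arange(1, N + 1)

# Eigen-triple: grid point, power estimation, and rank, row by row.
T = np.vstack([np.arange(1, N + 1), P, R])
```

Since the components of `P` are distinct, `R` is a permutation of `{1, ..., N}` with no ties.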
The sketch of our method is illustrated in Figure 1. From the measurements in the time domain (Figure 1a), the power estimations on the normalized frequency grid, with resolution determined by half the length of the original signal, are obtained (Figure 1b). By ranking these estimations in descending order, we arrange the grid locations against the ranks. As shown in Figure 1c, the first stem indicates the location on the frequency grid of the largest power component, and so on; here in (c) we define $L(i)$ as the location on the frequency grid of the component of rank $i$. From (b) to (c) we are actually performing a nonlinear stretching, while the order information of the spectrum is preserved and calibrated. Then a distance matrix $D$ (Figure 1d) is induced for every point pair, with entries

$$D_{ij} = |L(i) - L(j)|.$$

So $D$ is real-symmetric with trace identically equal to 0. The structural aspects of $D$ are reflected in its eigenvalues (Figure 1e): due to the sophisticated relationships between its entries, it is unwise to reshape such a high-dimensional object with far fewer degrees of freedom into a long vector for pattern recognition. In addition to the eigenvalues, a descriptor named the Circular Difference Descriptor ($\mathrm{CDD}$), accounting for the total variation of the locations on the frequency grid of frequency components having adjacent intensities, is defined as follows, to a large extent in a heuristic manner:

$$\mathrm{CDD} = |L(1) - L(N)| + \sum_{i=1}^{N-1} |L(i+1) - L(i)|$$

The first term makes the difference veritably circular and endows translational, instead of permutational, invariance.
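Under our reading of the definitions above, the distance matrix and the circular descriptor can be sketched as follows (Python/NumPy; the arrangement `L` is a hypothetical example and the name `cdd` is our shorthand):

```python
import numpy as np

# L[i-1] = location on the frequency grid of the component of rank i
# (a hypothetical arrangement for illustration).
L = np.array([3, 7, 1, 6, 2, 5, 4])

# Distance matrix: D[i, j] = |L[i] - L[j]|, real-symmetric, zero trace.
D = np.abs(L[:, None] - L[None, :])
eigvals = np.linalg.eigvalsh(D)       # structural summary of D

def cdd(L):
    """Circular Difference Descriptor (our reading): total variation of
    the grid locations of adjacently ranked components, closed by a
    wrap-around term. The wrap-around makes the value invariant under
    cyclic shifts of the rank index, though not under permutations."""
    L = np.asarray(L)
    return int(np.abs(np.roll(L, -1) - L).sum())
```

For the perfectly ordered arrangement `L = (1, 2, ..., N)` the descriptor attains its minimum, `2 * (N - 1)`.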
Another heuristic descriptor, defined slightly differently from $\mathrm{CDD}$, is named the Correspondence Difference Descriptor ($\mathrm{CoDD}$). It equals the 1-norm of the difference between $L$ and the identity arrangement, aiming to characterize the difference between $L$ and the perfectly ordered case where $L(i) = i$:

$$\mathrm{CoDD} = \sum_{i=1}^{N} |L(i) - i|$$
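Under the same reading, the second descriptor compares the arrangement with the perfectly ordered case (Python/NumPy; the name `codd` is our shorthand):

```python
import numpy as np

def codd(L):
    """Correspondence Difference Descriptor (our reading): the 1-norm of
    the difference between the arrangement L and the perfectly ordered
    case L[i] = i."""
    L = np.asarray(L)
    return int(np.abs(L - np.arange(1, len(L) + 1)).sum())

codd([1, 2, 3, 4, 5])   # -> 0, the perfectly ordered arrangement
```

The fully reversed arrangement, by contrast, yields the maximal value for a given `N`.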
Results from a Monte Carlo simulation (shown in Figure 2) imply that the empirical distributions of $\mathrm{CDD}$ and $\mathrm{CoDD}$ over all permutations could well be Gaussian. Although the theoretical distributions of $\mathrm{CDD}$ and $\mathrm{CoDD}$ must have bounded supports for finite $N$, they fit very well a bell-shaped curve, which in theory has unbounded support.
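The Monte Carlo experiment can be reproduced in miniature (Python/NumPy; the grid size and trial count are arbitrary choices of ours): sample random permutations, compute the descriptor, and inspect the empirical distribution:

```python
import numpy as np

rng = np.random.default_rng(42)
N, trials = 32, 2000

def cdd(L):
    # circular total variation of the arrangement (see the definition above)
    return int(np.abs(np.roll(L, -1) - L).sum())

# Empirical distribution of the descriptor over random permutations of
# {1, ..., N}: bounded support for finite N, yet the histogram of
# `samples` is already close to bell-shaped, as Figure 2 suggests.
samples = np.array([cdd(rng.permutation(N) + 1) for _ in range(trials)])
mean, std = samples.mean(), samples.std()
```

For even `N`, the support is bounded between `2 * (N - 1)` (the identity-like arrangements) and `N**2 // 2` (the maximally alternating ones).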
Since the permutational invariance of spectral entropy is broken herein, $\mathrm{CDD}$ and $\mathrm{CoDD}$ actually encode the signal in different ways, but both with guaranteed information gain with respect to spectral entropy. To be specific, given the rank vector $R$ without $P$, the corresponding $\mathrm{CDD}$ and $\mathrm{CoDD}$ are fixed, but the values of $P$ can form widely different spectra. We take a nearly flat spectrum $P^{(1)}$ and a nearly line spectrum $P^{(2)}$ as examples: in $P^{(1)}$ all components are close to $1/N$, perturbed slightly so that their ranks realize $R$, whereas in $P^{(2)}$ the component of rank 1 carries almost all of the energy. The corresponding spectral entropy values vary from infinitesimal (in $P^{(2)}$) to the maximum possible (in $P^{(1)}$). On the contrary, given $P$, any permutation of it yields exactly the same spectral entropy, as mentioned before, but the corresponding $\mathrm{CDD}$ and $\mathrm{CoDD}$ will traverse all possible values.
The relationship between spectral entropy and the proposed descriptors is illustrated in Figure 3. The set A denotes the full space of signals' spectra, whereby for each $P$ no duplicate values exist among its components $P_k$, an assumption made by us for simplicity and with only minimal loss of generality. Signals in the set B have the same spectral entropy, denoted by $s$. The following conditions need to be satisfied for signals in C:

$$H(P) = s \quad \text{and} \quad R(P) = r$$

The spectral entropy operator is denoted by $H$, and $R(P)$ denotes the rank vector of $P$. For example, if we have $P = (0.1, 0.4, 0.2, 0.3)$, then, ranking in descending order, we have $R(P) = (4, 1, 3, 2)$.
Since all members of C have the same rank vector, we can obtain many different counterparts of C, each sharing this spectral entropy value with it, by applying a fixed permutation to the arrangement of $P$. Since there are $N! - 1$ permutations other than the identity, the following relationship is obtained:

$$C \subset B \subset A, \qquad B = \bigsqcup_{\sigma} C_{\sigma},$$

where the disjoint union runs over the $N!$ possible rank vectors $\sigma$. We thus obtain a coverage of B by disjoint subsets: members of the same subset share a specific spectral entropy value and the same rank vector, and cannot be transformed into one another simply by rearranging their components. Given only the value of the spectral entropy ($s$) without the rank vector, we can localize a signal in A to B; given $r$ as well, the localization becomes more accurate, to one of the many Cs within B. From this perspective, we can distinguish signals that have completely different order structures but the same spectral entropy.
If no a priori knowledge about the signals' spectra is available, then an equiprobable distribution over the rank vectors is substantially and implicitly presumed. Under such circumstances, the Kullback–Leibler divergence (KLD), a widely used measure of the difference between two probability distributions, is adopted to illustrate the advantage of using the proposed descriptors. The KLD between the proposed descriptors and spectral entropy, viewed as different coding schemes with probability distributions $p$ and $q$, is always nonnegative [6], whichever direction is taken (the KLD is not symmetric). The KLD between two distributions $p$ and $q$ is defined as follows:

$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_{i} p_i \log \frac{p_i}{q_i}$$

Such a property is welcome, since it guarantees a nonnegative information gain when using both spectral entropy and the proposed descriptors instead of only one of them. In other words, the representation becomes more informative when our proposed descriptors are combined with spectral entropy.
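For completeness, a minimal sketch of the divergence computation (Python/NumPy; the distributions `p` and `q` are hypothetical):

```python
import numpy as np

def kld(p, q):
    """Kullback-Leibler divergence D(p || q): nonnegative and asymmetric."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # the term 0 * log(0/q) is taken as 0
    return float((p[mask] * np.log(p[mask] / q[mask])).sum())

p = np.array([0.7, 0.2, 0.1])        # hypothetical coding distribution
q = np.array([1 / 3, 1 / 3, 1 / 3])  # uniform reference distribution
```

The divergence vanishes only when the two distributions coincide, which is what guarantees the nonnegative information gain mentioned above.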
As for the distance matrix, whose entries represent the distance or similarity between point $i$ and point $j$, distance measures other than the absolute difference can be applied to the arrangement $L$ to form different distance matrices. Given any distance measure, a topology is induced on this finite set of grid points, based on the coarse-grained, discrete-valued rankings among them, and certainly more order information remains unrevealed. For example, $\mathrm{CDD}$ is just the circular difference along the first sub-diagonal of the distance matrix, and thus captures only a portion of the full information.
To sum up, by ranking the power estimations of a signal on a discrete frequency grid, an interesting picture of the order structure carried by the signal's spectrum is obtained.
4. Discussion
The order structure of a signal's spectrum is revealed by simply ranking its power estimations. Several use cases justify that taking that order structure into consideration can contribute valuable information to the processing of physiological signals. Possible applications include serving as candidate features for pattern recognition among signals, change-point detection in process tracking for anomaly detection, and more.
The permutation of length N defined by the rankings of power estimations on the frequency grid has a huge capacity ($N!$). Although in practice these $N!$ different ordinal patterns are not necessarily equiprobable, the information gain proved under such an assumption is still hoped to be found in practice. An established metric, permutation entropy, is based on ranking consecutive measurements in the time domain and collecting statistics over a sufficient number of segments [11]. The length of such segments must be small, otherwise the density estimation becomes impractical for time series of reasonable length. Our method delves into the order structure of the signal's representation in the dual space (frequency domain) instead of the original space (time domain). Every point in the dual space is bridged to all points in the time domain through the FT, so no one-to-one correspondence exists between the original measurements and the mapped points in the proposed method. This is another important distinction.
The proposed descriptors in their original forms could be vulnerable to noise, but they can be modified using techniques including, but not limited to, those used here. In practice, we observed a high correlation between $\mathrm{CDD}$ and $\mathrm{CoDD}$, and either one may outperform the other at times. In addition, the pairwise distances in the distance matrix in Figure 1d can be induced in ways other than the one used here. In any case, more fruitful and distinguishable features can be extracted along this line from such a representation with large capacity.
As for future research, we have several proposals.
The first is to establish relationships between the order information given by a recorded digital signal of length 2N and that of its sub-signals, obtained by (nonuniformly) down-sampling these 2N points. Uniform down-sampling is equivalent to folding the power spectrum. The situation becomes more sophisticated in nonuniform cases (including, but not limited to, the evolving/truncating case where the length of the signal is ever increasing), but usually a flatter spectrum with lower frequency resolution is produced. The original signal together with its sub-signals could provide an informative and hierarchical object of study.
The second is to develop distance measures other than the absolute difference of ranks used here. By incorporating both the discrete-valued ranks and the continuous-valued power estimations, parameters more robust to broadband noise can be anticipated. Furthermore, could 'ranking' the power spectrum of a continuous function (signal) be possible in some sense?
The third is about the topology induced from the distance matrix. The distance matrix in Figure 1d, or the distances defined by the possible modified measures mentioned above, wherein block structures frequently occur, provides full neighborhood information for the N points on the frequency grid. Given such information, could we find relations between the eigenvalues of the distance matrix (with the possible modifications mentioned in the second point) and some properties of interest of the original signal? Beyond that, we can also calculate the persistent homology, a dominant methodology usually treated as a synonym of topological data analysis (TDA), of these N points by computing a series of simplicial complexes with their topological invariants [12], and thereby obtain a topological description of the signal's power spectrum. This means that the order information in the spectrum enables a nontrivial embedding method for data points with temporal structure. Such an embedding is different from the famous delay embedding [13], which is an operation performed in the signal's original space rather than its dual space. Delay embedding can be vulnerable to short and noisy processes, where a messy point cloud provides nothing except 'topological noise'. By ranking the power spectrum, however, the data points are arranged in an organized way, and the application of TDA can be free from such pitfalls.
In conclusion, the order structures of physiological signals' power spectra are almost neglected by existing methods, but they are not meaningless. On the contrary, such structures can provide a unique perspective for understanding the intrinsic properties of physiological processes.