Article

Compressive Sampling with Multiple Bit Spread Spectrum-Based Data Hiding

by Gelar Budiman 1,2,*,†,‡, Andriyan Bayu Suksmono 1,† and Donny Danudirdjo 1,†

1 School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung 40132, Indonesia
2 School of Electrical Engineering, Telkom University, Bandung 40257, Indonesia
* Author to whom correspondence should be addressed.
† Current address: Jl. Ganesha No.10, Bandung 40132, Indonesia.
‡ Current address: Jl. Telekomunikasi Terusan Buahbatu Bandung 40257, Indonesia.
Appl. Sci. 2020, 10(12), 4338; https://doi.org/10.3390/app10124338
Submission received: 8 May 2020 / Revised: 8 June 2020 / Accepted: 9 June 2020 / Published: 24 June 2020
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

We propose a novel data hiding method for an audio host based on compressive sampling. An over-complete dictionary represents a group of watermarks, and each row of the dictionary is a Hadamard sequence representing multiple watermark bits. The singular values of each host audio segment, arranged in a diagonal matrix, are multiplied by the over-complete dictionary, producing a smaller matrix; at the same time, this embeds the watermark into the compressed audio. At the detector, we detect the watermark and reconstruct the audio. The proposed method therefore not only hides information but also compresses the host audio. Applications include broadcast monitoring and biomedical signal recording, where the signal content is marked and secured by the hidden watermark while the signal is compressed for memory efficiency. We evaluate the performance in terms of payload, compression ratio, audio quality, and watermark quality. The proposed method hides data imperceptibly at 729–5292 bps with a compression ratio of 1.47–4.84, and the watermark is detected perfectly.

1. Introduction

At present, the exchange of data and information over the Internet is increasing dramatically. As more people access the Internet and more content becomes accessible, the volume of data accessed in a given time grows exponentially. Along with this growth come more data-related crimes, including data falsification, data theft, unilateral claims of data ownership, data leaks, data deception, and many others. These crimes increase the losses suffered by data owners and, in turn, by the state; losses suffered by the state harm its people, so crime on the Internet benefits only certain parties at a large cost to the wider community. Technology that secures data, including marking ownership rights and hiding important data sent over the Internet, is therefore essential to prevent such losses.
The more data content is accessed, the more memory capacity is needed. Moreover, if the network infrastructure does not grow, the available network capacity shrinks as data traffic increases, while the power consumed by the network infrastructure rises. These conditions raise the problem of accessing data efficiently so that infrastructure and energy requirements are kept to a minimum. One technique that addresses these problems is Compressive Sampling, or Compressed Sensing (CS). CS acquires only part of the data or signal from the sensor and transmits these samples, and the receiver reconstructs the data as if it were the original.
In this paper, we propose a technique that samples audio signals and hides data in them at the same time, so that the sampled signal is smaller and also carries hidden data in its encoded form. With this technique, a recording stored in the cloud after sampling is smaller and is marked with hidden data at the same time. Broadcast monitoring is one example: signals are monitored in real time and the results are stored in the cloud. Such monitoring is more efficient with partial signal sampling, because the sampled signal is smaller than the original. At the same time, marking or indexing is applied by hiding data in the signal over any given duration, either to secure the authenticity of the monitored signal or to index it by hiding its index in the encoded signal. Another example is biomedical signal recording, in which the signals are sampled by several sensors and an ownership mark or index is simultaneously embedded into the encoded signal. The recorded biomedical signal is thus smaller than the original without a reduction in quality, and it carries a mark that secures it.
CS for audio combined with data hiding is a rarely studied topic. The combination makes it possible to compress the audio and hide a watermark at the same time. Hua [1] and Xin [2] previously proposed CS applications in audio combined with data hiding. In [2], Xin proposed a semi-fragile zero-watermarking method that decomposes the host audio into the wavelet domain and applies CS to the wavelet coefficients, without describing the audio reconstruction needed to assess audio quality after embedding. The watermark is inserted into the measurement vector using the positive and negative signs of its elements, which makes it resistant to damage to signal samples. However, [2] does not address the role of CS in reducing the signal size; Xin used CS only as a semi-fragile data insertion technique.
Griffin [3] proposed a CS method to compress sinusoidal signals. Griffin investigated whether CS can compress sinusoidal audio at low bit rates, since such audio models are highly sparse in the frequency domain. He applied CS to single-channel and multichannel audio signals with purely sinusoidal characteristics. Griffin stated that the aim was not to develop an audio compression technique and compare it with existing ones, but to determine how far CS can reduce audio file size, with wireless sensor networks as the target application. The smallest compression ratio Griffin obtained was 5.4%. He first applied spectral whitening to the audio and then applied CS to the whitened spectrum, obtaining a very small compression ratio with good reconstruction quality.
Fakhr [4] proposed an embedding method using CS that first sparsifies the host audio and watermark signals using the Walsh–Hadamard Transform (WHT), the Discrete Cosine Transform (DCT), and the Karhunen–Loeve Transform (KLT). The watermark and the host audio are extracted by $\ell_1$-minimization reconstruction. Fakhr claimed that the technique withstands MP3 attacks down to 64 kbps with an 11 bps watermark payload, and reaches a highest payload of 172 bps under additive noise attacks. However, Fakhr used CS not for compression but as an embedding technique; MP3 compression served only as an attack that reduces the size of the watermarked audio. Watermarking in the compressed sensing domain was also proposed by Jeng-Shyang [5,6], who used DWT-DCT to sparsify the host before CS acquisition and watermark embedding. In another scenario, Patsakis [7] used CS as a denoising filter to detect data embedded by the Least Significant Bit (LSB) and DCT methods. However, the methods in [5,6,7] use images as the host.
In [1], Hua proposed a data hiding technique combined synthetically with CS. Suppose we define an over-complete dictionary $A \in \mathbb{R}^{p \times r}$, an uncompressed vector $z \in \mathbb{R}^{r \times 1}$, a watermark bit $b \in \{-1, +1\}$, a watermark code sequence $w \in \mathbb{R}^{r \times 1}$, a compressed vector $y \in \mathbb{R}^{p \times 1}$, and the watermark gain $\alpha$; then:
$$y = A(z + \alpha b w), \tag{1}$$
Hua added the watermark bit $b$, multiplied by $\alpha w$, to $z$ before compression. In this paper, we instead embed the watermark bits into the over-complete matrix $A$ and then multiply $A$ by the diagonal matrix of the host audio's singular values.
The proposed data hiding technique uses multiple orthogonal codes based on Spread Spectrum (SS), as previously introduced by Xin [8] for time-domain embedding and extended by Xiang to DCT-domain embedding [9,10]. We use Hadamard codes as the sequences for multiple watermark bits because of their superior code performance [11]. The matrix $A$ consists of $p$ Hadamard sequences representing $p$ groups of multiple bits.
One signal sparsification technique is shrinkage of the Singular Value Decomposition (SVD) output, which truncates $U$, $S$, and $V$ at a specific rank, as also described in [12,13,14,15]. Shrinkage yields a more compressed CS output but inevitably decreases the quality of the reconstructed signal. In this paper, we decompose a host signal using SVD and truncate its outputs $U$, $S$, and $V$ at a specific rank. We transform the truncated singular value matrix $S_r$ to the compressed domain $Y$ via an over-complete dictionary $A$ containing SS-based hidden data. Thus, the matrices transmitted to the detector are $U_r$, $Y$, and $V_r$. At the receiver, we first detect the dictionary $A$ containing the hidden data and then extract the hidden data from the detected dictionary. Besides recovering the hidden data, we can also reconstruct the signal in the original domain. Note that the receiver needs only the compressed-domain signals $U_r$, $Y$, and $V_r$; neither the dictionary nor the original data is needed for data detection and signal reconstruction.
The rest of this paper is organized as follows. Section 2 describes the sparsity of singular values and the CS technique for audio compression. Section 3 explains the mathematical model and derivation of the audio watermarking, including embedding, extraction, audio reconstruction, and the effect of a noisy environment. Section 4 discusses the simulation results, and Section 5 concludes the paper.

2. Sparse Singular Value and CS Technique

The host signal, a vector $x = [x_1, x_2, \ldots, x_L] \in \mathbb{R}^{1 \times L}$, is converted into a two-dimensional matrix $X \in \mathbb{R}^{M \times M}$, where $L = M^2$. The conversion fills $X$ column by column:
$$X = \begin{bmatrix} x_1 & x_{M+1} & \cdots & x_{M(M-1)+1} \\ x_2 & x_{M+2} & \cdots & x_{M(M-1)+2} \\ \vdots & \vdots & \ddots & \vdots \\ x_M & x_{2M} & \cdots & x_{M^2} \end{bmatrix}. \tag{2}$$
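The column-wise conversion in (2) is the same column-major ordering used by MATLAB's `reshape`. A minimal pure-Python sketch, assuming the segment length is already a perfect square; the helper names `to_matrix` and `to_vector` are ours, not the authors':

```python
import math

def to_matrix(x):
    """Fill an M x M matrix column by column from a length-M^2 list, as in (2)."""
    M = math.isqrt(len(x))
    if M * M != len(x):
        raise ValueError("host length L must be a perfect square M^2")
    # Row i, column j (0-based) holds x[j*M + i], i.e. x_{jM+i+1} in 1-based terms.
    return [[x[j * M + i] for j in range(M)] for i in range(M)]

def to_vector(X):
    """Read an M x M matrix back column by column (inverse of to_matrix)."""
    M = len(X)
    return [X[i][j] for j in range(M) for i in range(M)]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # x_1 ... x_9, so M = 3
X = to_matrix(x)                  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

The same `to_vector` step converts the reconstructed matrix $X_r$ back into the one-dimensional signal $\hat{x}$.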
The SVD of $X$ yields orthogonal matrices $U \in \mathbb{R}^{M \times M}$ and $V \in \mathbb{R}^{M \times M}$ and a diagonal matrix $S \in \mathbb{R}^{M \times M}$ such that:
$$X = U S V^T, \tag{3}$$
where $S$ is a sparse diagonal matrix whose $M$ diagonal elements are the singular values. For compression, $U$, $S$, and $V$ are truncated to $U_r = U[1, \ldots, M; 1, \ldots, r] \in \mathbb{R}^{M \times r}$, $S_r = S[1, \ldots, r; 1, \ldots, r] \in \mathbb{R}^{r \times r}$, and $V_r = V[1, \ldots, M; 1, \ldots, r] \in \mathbb{R}^{M \times r}$, with $r < M$. Then, we apply CS acquisition to $S_r$:
$$Y = A S_r, \tag{4}$$
where $A \in \mathbb{R}^{p \times r}$ is an over-complete dictionary containing the SS-based encoded watermark and $Y \in \mathbb{R}^{p \times r}$ is the output of CS acquisition, which is smaller than $S$. The truncated matrix $S_r$ has the form:
$$S_r = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix}, \tag{5}$$
where $\sigma_1, \sigma_2, \ldots, \sigma_r$ are the singular values. The matrix $A$ is described in Section 3.1. Finally, three matrices are transmitted: $U_r$, $V_r$, and $Y$. From this, we calculate the Compression Ratio (CR) as the ratio between the original signal length and the transmitted signal length:
$$\mathrm{CR} = \frac{L_X}{L_T} = \frac{M^2}{2Mr + pr}, \tag{6}$$
where $L_X = M^2$ is the number of elements of $X$ and $L_T = 2Mr + pr$ is the total number of transmitted elements in $U_r$, $Y$, and $V_r$.
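Equation (6) only counts matrix elements, so it can be evaluated directly. A short sketch; the parameter check $0 < p \le r < M$ follows the constraints stated in the text, and the values $M = 66$, $r = p = 8$ are hypothetical:

```python
def compression_ratio(M, r, p):
    """CR from (6): M^2 host elements over the 2Mr + pr transmitted elements
    of U_r (M x r), V_r (M x r) and Y (p x r)."""
    if not (0 < p <= r < M):
        raise ValueError("expected 0 < p <= r < M")
    return (M * M) / (2 * M * r + p * r)

# Hypothetical example: M = 66 (L = 4356 samples per segment), r = p = 8.
print(compression_ratio(66, 8, 8))   # 4356 / 1120
```

Because $U_r$ and $V_r$ are sent in full, the $2Mr$ term dominates the transmitted size for small $p$.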
The reconstructed audio matrix, with the same size as $X$, is calculated as:
$$X_r = U_r \hat{S}_r V_r^T = \begin{bmatrix} \hat{x}_1 & \hat{x}_{M+1} & \cdots & \hat{x}_{M(M-1)+1} \\ \hat{x}_2 & \hat{x}_{M+2} & \cdots & \hat{x}_{M(M-1)+2} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{x}_M & \hat{x}_{2M} & \cdots & \hat{x}_{M^2} \end{bmatrix}, \tag{7}$$
where $X_r \in \mathbb{R}^{M \times M}$, but its element values differ slightly from those of $X$. The value of $r$ controls both the signal quality and the compression ratio: a lower $r$ gives a higher compression ratio but a worse signal quality. Finally, we obtain $\hat{x} = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{M^2}]$, the reconstructed (decompressed) signal, by converting the two-dimensional matrix $X_r$ back into a one-dimensional vector; the signal quality is then evaluated by comparing $x$ and $\hat{x}$.
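The role of $r$ in (7) can be checked numerically without an SVD routine. The sketch below builds $X = U S V^T$ from hypothetical orthonormal factors (the $4 \times 4$ Sylvester Hadamard matrix scaled by $1/2$) and hypothetical singular values, truncates to rank $r$, and measures the Frobenius error. For orthonormal factors this error equals the square root of the sum of the discarded squared singular values (the Eckart–Young result), so a smaller $r$ discards more signal energy:

```python
import math

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rank_r(U, sigma, V, r):
    """X_r = U_r S_r V_r^T from (7), keeping the first r columns of U and V."""
    U_r = [row[:r] for row in U]
    V_r = [row[:r] for row in V]
    S_r = [[sigma[i] if i == j else 0.0 for j in range(r)] for i in range(r)]
    return matmul(matmul(U_r, S_r), transpose(V_r))

def frobenius(A, B):
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))

# Hypothetical orthonormal factors: the 4 x 4 Sylvester Hadamard matrix scaled by 1/2.
H4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
U = V = [[v / 2 for v in row] for row in H4]
sigma = [4.0, 2.0, 1.0, 0.5]               # hypothetical singular values, decreasing
X = rank_r(U, sigma, V, len(sigma))        # full rank: X = U S V^T

for r in range(1, len(sigma) + 1):
    err = frobenius(X, rank_r(U, sigma, V, r))
    print(r, round(err, 6))                # sqrt(sum of discarded sigma_k^2)
```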

3. Data Hiding Model

3.1. An Overcomplete Dictionary with SS-Based Content

In the proposed method, we first convert the host audio to the frequency domain using the DCT before embedding and compression. At the receiver, the reconstructed (decompressed) audio is converted back to the time domain with the inverse DCT (IDCT). The DCT and IDCT used in this method are [16]:
$$X(k) = l(k) \sum_{n=0}^{N_p - 1} x(n) \cos\left(\frac{\pi (2n+1) k}{2 N_p}\right), \quad k = 0, 1, \ldots, N_p - 1, \tag{8}$$
$$x(n) = \sum_{k=0}^{N_p - 1} l(k) X(k) \cos\left(\frac{\pi (2n+1) k}{2 N_p}\right), \quad n = 0, 1, \ldots, N_p - 1, \tag{9}$$
where $X(k)$ is the audio signal in the DCT domain, $x(n)$ is the audio signal in the time domain, and $N_p$ is the number of DCT points. The weight $l(k)$ is defined as:
$$l(k) = \begin{cases} \sqrt{1/N_p}, & k = 0, \\ \sqrt{2/N_p}, & 1 \le k \le N_p - 1. \end{cases} \tag{10}$$
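For checking the transform pair, a direct $O(N_p^2)$ evaluation of the orthonormal DCT-II and its inverse is enough; it uses the weights $\sqrt{1/N_p}$ and $\sqrt{2/N_p}$ of (10). This is a sketch for small inputs, not how a real implementation would process long audio segments (MATLAB's `dct`/`idct` compute the same unitary transform with fast algorithms). The helper `_l` and the test vector are ours:

```python
import math

def _l(k, N):
    """Weight l(k) from (10)."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct(x):
    """Orthonormal DCT-II, direct O(N^2) evaluation of (8)."""
    N = len(x)
    return [
        _l(k, N) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        for k in range(N)
    ]

def idct(X):
    """Inverse transform (9); idct(dct(x)) returns x up to rounding."""
    N = len(X)
    return [
        sum(_l(k, N) * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for k in range(N))
        for n in range(N)
    ]

x = [0.5, -1.0, 2.0, 0.25, 3.0]
x_back = idct(dct(x))
```

Because the transform is orthonormal, it also preserves energy ($\sum x^2 = \sum X^2$), so the subsequent SVD sees the same signal energy as in the time domain.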
After transforming the signal to the frequency domain with the DCT, we apply SVD as shown in Figure 1. The compression and embedding procedure is detailed in Table 1.
In this paper, the orthogonal codes mapped to multiple-bit watermarks are Hadamard sequences taken from the Hadamard matrix. Denote the Hadamard matrix $H_r \in \{-1, +1\}^{r \times r}$ generated by [17,18] as:
$$H_r = \begin{bmatrix} H_{r/2} & H_{r/2} \\ H_{r/2} & -H_{r/2} \end{bmatrix}, \tag{11}$$
where $H_1 = [1]$. Let $H_r(j)$ be the vector formed by the $j$th row of $H_r$; then the orthogonal Hadamard sequence $p_j$, $j = 1, 2, \ldots, r$, is obtained from:
$$p_j = H_r(j). \tag{12}$$
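The recursion (11) is the Sylvester construction, which only produces sizes $r$ that are powers of two. A minimal sketch with 1-based row access as in (12); the function names are ours:

```python
def hadamard(r):
    """Sylvester construction (11): H_1 = [1], H_r = [[H, H], [H, -H]] with H = H_{r/2}."""
    if r < 1 or r & (r - 1):
        raise ValueError("r must be a power of two")
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def sequence(H, j):
    """p_j = H_r(j) from (12), with the 1-based row index j used in the text."""
    return H[j - 1]

H8 = hadamard(8)
print(sequence(H8, 3))   # [1, 1, -1, -1, 1, 1, -1, -1]
```

Every pair of distinct rows has a zero inner product, which is the orthogonality the detector relies on later.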
Let $A_0 \in \{-1, +1\}^{p \times r}$ be an SS-based content matrix, where $p < r$, and let $p_{t_i} \in \mathbb{R}^{1 \times r}$ be the Hadamard sequence associated with the watermark bits in the $i$th row of $A_0$. Let $\{t_1, t_2, \ldots, t_p\}$ be the set of Hadamard sequence indices, where $t_i$ is the index used in row $i$ of $A_0$. Thus, $A_0$ contains the sequences $p_{t_i}$ as:
$$A_0 = \begin{bmatrix} p_{t_1} \\ p_{t_2} \\ \vdots \\ p_{t_p} \end{bmatrix}, \tag{13}$$
where each $p_{t_i}$ occupies a different row. Since $A_0$ has $p$ rows, it contains $p$ Hadamard sequences. The over-complete dictionary $A \in \mathbb{R}^{p \times r}$ is then:
$$A = \frac{1}{\sqrt{p}} A_0, \tag{14}$$
whose columns have unit norm: $\|a_m\|_2^2 = 1$ for $m = 1, 2, \ldots, r$.
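Building $A$ from a list of Hadamard indices follows (13) and (14) directly. In this sketch the indices `[3, 5, 1]` are hypothetical, and `build_dictionary` is our name; the column-norm check confirms that the $1/\sqrt{p}$ factor gives unit-norm atoms, since each column has $p$ entries of magnitude $1/\sqrt{p}$:

```python
import math

def hadamard(r):
    """Sylvester construction (11)."""
    if r < 1 or r & (r - 1):
        raise ValueError("r must be a power of two")
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def build_dictionary(indices, r):
    """A_0 stacks the Hadamard rows p_{t_1}, ..., p_{t_p} (13); A = A_0 / sqrt(p) (14).
    `indices` holds the 1-based Hadamard indices t_i, one per row."""
    H = hadamard(r)
    scale = 1.0 / math.sqrt(len(indices))
    return [[scale * v for v in H[t - 1]] for t in indices]

A = build_dictionary([3, 5, 1], r=8)                               # p = 3 rows, r = 8 columns
column_norms = [sum(row[m] ** 2 for row in A) for m in range(8)]   # each 1.0 up to rounding
```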
Each Hadamard sequence represents multiple watermark bits. If a Hadamard sequence carries $N_s$ watermark bits, there are $N_p = 2^{N_s}$ possible Hadamard sequences. Since the Hadamard matrix in (11) is square, the length of a Hadamard sequence, which is also the row length of $A$, is $r$; thus $r = N_p$. Denote by $w_{t_i}$ the watermark vector in the $i$th segment of the watermark, with Hadamard index $t_i$:
$$w_{t_i} = \left[w_{t_i}(1)\ \ w_{t_i}(2)\ \cdots\ w_{t_i}(N_s)\right], \tag{15}$$
where $w_{t_i}(l) \in \{-1, +1\}$ and $l = 1, 2, \ldots, N_s$. In multi-bit SS, the watermark vector $w_{t_i}$ is mapped to the Hadamard sequence $p_{t_i}$. For example, with three watermark bits per Hadamard sequence ($N_s = 3$), there are $N_p = 2^{N_s} = 8$ possible sequences; all watermark possibilities and their mapping to Hadamard sequences are listed in Table 2. For two watermark segments $w_{t_1} = [-1, +1, -1]$ and $w_{t_2} = [+1, -1, -1]$, Table 2 gives $t_1 = 3$ and $t_2 = 5$; thus, $p_{t_1} = p_3 = \{+1, +1, -1, -1, +1, +1, -1, -1\}$ and $p_{t_2} = p_5 = \{+1, +1, +1, +1, -1, -1, -1, -1\}$.
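Table 2 is not reproduced here, so the mapping below is an assumption: reading $-1$ as 0 and $+1$ as 1, taking the bits most significant first, and adding one to get a 1-based index. It reproduces both worked examples in the text ($[-1, +1, -1] \to 3$ and $[+1, -1, -1] \to 5$), but the authors' table may order the codes differently:

```python
def bits_to_index(w):
    """Map N_s bipolar bits (MSB first) to a 1-based Hadamard index t.
    Assumed mapping: -1 -> 0, +1 -> 1, then t = binary value + 1."""
    value = 0
    for bit in w:
        if bit not in (-1, 1):
            raise ValueError("watermark bits must be -1 or +1")
        value = 2 * value + (1 if bit == 1 else 0)
    return value + 1

def index_to_bits(t, n_s):
    """Inverse mapping, as the detector would use it after estimating t."""
    value = t - 1
    return [1 if (value >> (n_s - 1 - l)) & 1 else -1 for l in range(n_s)]

print(bits_to_index([-1, +1, -1]), bits_to_index([+1, -1, -1]))   # 3 5
```

Any one-to-one mapping works, provided the embedder and detector share it.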
The over-complete matrix $A_0$ carries $pN_s$ watermark bits for a host segment of length $M^2$; thus, the watermark payload $C$ in bps is:
$$C = \frac{p N_s F_s}{M^2}, \tag{16}$$
where $F_s$ is the host signal sampling rate in samples/s. Since $N_s = \log_2 N_p = \log_2 r$, (16) becomes:
$$C = \frac{p \log_2(r)\, F_s}{M^2}. \tag{17}$$
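Equation (17) can be evaluated as follows. This sketch assumes $r$ is a power of two, so $N_s = \log_2 r$ is an exact integer obtained with `int.bit_length`; the example parameters ($p = r = 16$, $M = 64$) are hypothetical, with $F_s = 44{,}100$ samples/s as in Section 4:

```python
def payload_bps(p, r, M, fs):
    """Watermark payload C from (16)-(17): p sequences of N_s = log2(r) bits each,
    carried by M*M host samples sampled at fs samples/s."""
    if r < 1 or r & (r - 1):
        raise ValueError("r must be a power of two")
    n_s = r.bit_length() - 1          # exact log2(r) for a power of two
    return p * n_s * fs / (M * M)

print(payload_bps(p=16, r=16, M=64, fs=44100))   # 689.0625
```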
Once $A$ is generated from the associated watermark bits, it is embedded into $S_r$ by the matrix multiplication in (4). The result $Y$ carries the watermark bits and, when $p < r$, is also smaller than $S_r$, which is itself a diagonal matrix reduced from the original $S$. From (4), (5), and (13), and omitting the positive factor $1/\sqrt{p}$ of (14), which does not affect the detection below, $Y = A S_r$ can be expanded as:
$$\begin{bmatrix} y_{t_1} \\ y_{t_2} \\ \vdots \\ y_{t_p} \end{bmatrix} = \begin{bmatrix} p_{t_1} \\ p_{t_2} \\ \vdots \\ p_{t_p} \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix}, \tag{18}$$
where $y_{t_i} \in \mathbb{R}^{1 \times r}$ is row $i$ of $Y$, corresponding to $p_{t_i}$, and $\sigma_1, \sigma_2, \ldots, \sigma_r$ are the singular values in $S_r$. Each row $p_{t_i}$ of $A$ is a $1 \times r$ vector, and $S_r$ is an $r \times r$ diagonal matrix. Thus, (18) simplifies to:
$$\begin{bmatrix} y_{t_1} \\ y_{t_2} \\ \vdots \\ y_{t_p} \end{bmatrix} = \begin{bmatrix} p_{t_1} S_r \\ p_{t_2} S_r \\ \vdots \\ p_{t_p} S_r \end{bmatrix}, \tag{19}$$
which gives the simple vector expression:
$$y_{t_i} = p_{t_i} S_r. \tag{20}$$
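Because $S_r$ is diagonal, the embedding (4) never needs a full matrix product: each row of $Y$ is the corresponding row of $A$ scaled element-wise by the singular values, as (20) states. A sketch with hypothetical indices and singular values; here the $1/\sqrt{p}$ factor of (14) is kept inside $A$:

```python
import math

def hadamard(r):
    """Sylvester construction (11)."""
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def embed(A, sigma):
    """Y = A S_r from (4) with S_r = diag(sigma). Since S_r is diagonal,
    row i of Y is row i of A scaled element-wise by the singular values (20)."""
    r = len(sigma)
    if any(len(row) != r for row in A):
        raise ValueError("A must have r = len(sigma) columns")
    return [[a * s for a, s in zip(row, sigma)] for row in A]

r = 8
t = [3, 5, 1, 8]                                    # hypothetical Hadamard indices, p = 4
H = hadamard(r)
A = [[v / math.sqrt(len(t)) for v in H[j - 1]] for j in t]
sigma = [9.0, 5.0, 3.0, 2.0, 1.5, 1.0, 0.6, 0.3]    # hypothetical singular values
Y = embed(A, sigma)                                 # p x r = 4 x 8
```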

3.2. Data and Dictionary Detection

Once the compressed and watermarked signal $Y$ (with rows $y_i$) is transmitted, the receiver obtains $\tilde{Y}$ (with rows $\tilde{y}_i$), together with $U_r$ and $V_r$ as described in Section 2. The receiver can either decompress the signal or extract the watermark. In both cases, the dictionary $A$ must be recovered from $\tilde{y}_i$ first, because the compression and data hiding process is blind: decompression requires $A$, i.e., the sequences $p_{t_i}$ detected with (22), to reconstruct $\hat{S}_r$ from $\tilde{y}_i$. Once $A$ is obtained, we can extract the data, or we can reconstruct $\hat{S}_r$ from $\tilde{y}_i$ with the detected $A$ using (37), (39), (40), and (41). SVD reconstruction from $\hat{S}_r$, $U_r$, and $V_r$ then yields the square matrix $X_r$ using (7). Finally, we obtain the reconstructed signal $\hat{x}$ by converting the two-dimensional matrix $X_r$ into a vector. The detection and reconstruction procedure is shown in Figure 2 and Table 3.
To detect $p_{t_i}$, we correlate $\tilde{y}_{t_i}$ with $p_j^T$:
$$K_{ij} = \tilde{y}_{t_i} p_j^T, \tag{21}$$
where $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, N_p$. In (21), the correlation $K_{ij}$ is highest at $j = t_i$. Thus, the index of the Hadamard sequence embedded in $\tilde{y}_{t_i}$ is detected as:
$$t_i = \arg\max_{j \in \{1, 2, \ldots, N_p\}} \tilde{y}_{t_i} p_j^T. \tag{22}$$
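The detector (21) and (22) is a bank of correlators followed by an argmax. A sketch with hypothetical indices and singular values; the positive factor $1/\sqrt{p}$ in $A$ scales all correlations equally, so it does not change the argmax:

```python
import math

def hadamard(r):
    """Sylvester construction (11)."""
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def detect_indices(Y_rx, H):
    """For each received row, correlate with every p_j (21) and keep the
    1-based index of the largest correlation (22)."""
    detected = []
    for y in Y_rx:
        K = [sum(a * b for a, b in zip(y, p)) for p in H]
        detected.append(max(range(len(H)), key=K.__getitem__) + 1)
    return detected

r = 8
H = hadamard(r)
t = [3, 5, 1, 8]                                    # hypothetical embedded indices
sigma = [9.0, 5.0, 3.0, 2.0, 1.5, 1.0, 0.6, 0.3]    # hypothetical singular values
Y = [[v / math.sqrt(len(t)) * s for v, s in zip(H[j - 1], sigma)] for j in t]
print(detect_indices(Y, H))                         # [3, 5, 1, 8] in this noiseless case
```

The detected indices are then decoded to watermark bits through the same index-to-bits mapping used at the embedder.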
Once $t_i$ is detected, we decode the Hadamard code into the associated watermark bits through the one-to-one mapping between the index, the Hadamard code, and the watermark bits. To show that detection works, assume there is no attack, so $\tilde{y}_{t_i} = y_{t_i}$. Then, (21) becomes:
$$K_{ij} = y_{t_i} p_j^T. \tag{23}$$
Substituting (20) into (23) gives:
$$K_{ij} = p_{t_i} S_r p_j^T. \tag{24}$$
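If $t_i = j$, then $p_{t_i} = p_j$, and (24) is the autocorrelation:
$$K_a = p_j S_r p_j^T. \tag{25}$$
Let the elements of $p_j$ be:
$$p_j = \left[p_{j1}\ \ p_{j2}\ \cdots\ p_{jr}\right]; \tag{26}$$
then (25) becomes:
$$K_a = \begin{bmatrix} p_{j1} & p_{j2} & \cdots & p_{jr} \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix} \begin{bmatrix} p_{j1} \\ p_{j2} \\ \vdots \\ p_{jr} \end{bmatrix}. \tag{27}$$
Carrying out the matrix multiplication, (27) becomes:
$$K_a = p_{j1}^2 \sigma_1 + p_{j2}^2 \sigma_2 + \cdots + p_{jr}^2 \sigma_r = \sum_{i=1}^{r} p_{ji}^2 \sigma_i. \tag{28}$$
Since $\sigma_i > 0$ and $p_{ji}^2 = 1$ for all $j$ and $i$, (28) becomes:
$$K_a = \sum_{i=1}^{r} p_{ji}^2 \sigma_i = \sum_{i=1}^{r} \sigma_i > 0. \tag{29}$$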
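If $p_{t_i} = p_k$ with $p_k \neq p_j$, then (24) is the cross-correlation:
$$K_c = p_k S_r p_j^T = \begin{bmatrix} p_{k1} & p_{k2} & \cdots & p_{kr} \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix} \begin{bmatrix} p_{j1} \\ p_{j2} \\ \vdots \\ p_{jr} \end{bmatrix} \tag{30}$$
$$= p_{k1} p_{j1} \sigma_1 + p_{k2} p_{j2} \sigma_2 + \cdots + p_{kr} p_{jr} \sigma_r = \sum_{i=1}^{r} p_{ki} p_{ji} \sigma_i. \tag{31}$$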
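The autocorrelation (28) and the cross-correlation (31) can be compared numerically for every pair of distinct sequences. With hypothetical singular values, the sketch checks that the two sequences differ in exactly $r/2$ positions and that the gap $K_a - K_c$ equals twice the sum of the singular values at those positions, which is always positive:

```python
def hadamard(r):
    """Sylvester construction (11)."""
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def correlations(H, sigma, j, k):
    """K_a from (28) and K_c from (31) for 0-based rows j (embedded) and k."""
    pj, pk = H[j], H[k]
    K_a = sum(a * a * s for a, s in zip(pj, sigma))
    K_c = sum(b * a * s for a, b, s in zip(pj, pk, sigma))
    return K_a, K_c

H = hadamard(8)
sigma = [9.0, 5.0, 3.0, 2.0, 1.5, 1.0, 0.6, 0.3]    # hypothetical singular values
for j in range(8):
    for k in range(8):
        if j != k:
            K_a, K_c = correlations(H, sigma, j, k)
            differ = [s for a, b, s in zip(H[j], H[k], sigma) if a != b]
            # K_a - K_c = 2 * (sum of singular values where p_j and p_k differ)
            assert len(differ) == 4 and abs((K_a - K_c) - 2 * sum(differ)) < 1e-12
```

Note that the gap depends only on the singular values at the differing positions, so steeply decaying singular values give a smaller relative margin.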
Since $p_k$ and $p_j$ are distinct rows of a Hadamard matrix, they are mutually orthogonal: they agree in $r/2$ positions and differ in the other $r/2$. Subtracting (31) from (29) gives $K_a - K_c = 2 \sum_{i \in \mathcal{D}} \sigma_i > 0$, where $\mathcal{D}$ is the set of positions in which $p_k$ and $p_j$ differ. Hence:
$$K_a > K_c, \tag{32}$$
which means that the autocorrelation of the embedded Hadamard sequence is always higher than its cross-correlation with any other Hadamard sequence, even when weighted by the singular values. This confirms that the Hadamard sequence is detected correctly by (22): $t_i$ is detected for every $i$, giving $\{t_1, t_2, \ldots, t_p\}$, from which we obtain the associated watermark bits $\{\hat{w}_{t_1}, \hat{w}_{t_2}, \ldots, \hat{w}_{t_p}\}$ and the Hadamard sequences $\{\hat{p}_{t_1}, \hat{p}_{t_2}, \ldots, \hat{p}_{t_p}\}$, which form $\hat{A}$ using (13) and (14) as:
$$\hat{A} = \frac{1}{\sqrt{p}} \begin{bmatrix} \hat{p}_{t_1} \\ \hat{p}_{t_2} \\ \vdots \\ \hat{p}_{t_p} \end{bmatrix}, \tag{33}$$
where $p$ is the number of rows of $\hat{A}$. This procedure shows that no dictionary needs to be sent to detect the hidden data or to reconstruct the signal, since the associated watermark bits $\hat{w}_{t_i}$ are detected directly. We then calculate the Bit Error Rate (BER) as a robustness parameter:
$$\mathrm{BER} = \frac{\sum_{i=1}^{L_w} \left|\hat{w}_i - w_i\right|}{L_w}, \tag{34}$$
where $w_i$ is the original watermark bit, $\hat{w}_i$ is the detected watermark bit, and $L_w$ is the total number of watermark bits.
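A small BER helper. Formula (34) uses $|\hat{w}_i - w_i|$, which counts errors directly for bits written as 0/1; for bipolar bits in $\{-1, +1\}$ that difference is 2 per error, so this sketch counts mismatches instead, which gives the intended fraction of erroneous bits for either representation:

```python
def bit_error_rate(w, w_hat):
    """BER from (34), computed as the fraction of mismatched bits.
    Counting mismatches avoids the factor of 2 that |w_hat - w| gives for +/-1 bits."""
    if len(w) != len(w_hat) or not w:
        raise ValueError("bit sequences must be non-empty and of equal length")
    return sum(1 for a, b in zip(w, w_hat) if a != b) / len(w)

print(bit_error_rate([-1, 1, -1, 1, 1, -1], [-1, 1, 1, 1, 1, -1]))   # one error in six bits
```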

3.3. Security Model

The Hadamard matrix is easily generated as described in (11), so anyone could use it to reconstruct the dictionary, detect the hidden data, and reconstruct the audio. This would leave the watermark bits hidden in the host audio insecure; accordingly, we apply a procedure to secure the Hadamard matrix, as also discussed in [19,20,21], in which rows and columns of the Hadamard matrix are multiplied by $-1$ in a random manner. Denote by $l_i \in \{1, \ldots, r\}$ an integer random permutation value, where $i = 1, 2, \ldots, N_l$ and $N_l$ is the number of generated values. Denote by $H_s$ the secured Hadamard matrix, by $H_s(j)$ the $j$th row of $H_s$, and by $H_s^T(j)$ the $j$th column of $H_s$. After the initialization $H_s = H_r$, the security model of the Hadamard matrix is defined as:
$$H_s(l_i) = -H_r(l_i), \qquad H_s^T(l_i) = -H_r^T(l_i). \tag{35}$$
This procedure is repeated $N_l$ times, from $l_1$ to $l_{N_l}$. With the secured Hadamard matrix, (12) is replaced by:
$$p_j = H_s(j). \tag{36}$$
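The key-based sign flipping can be sketched as follows. This is our reading, not the authors' code: each key value $l_i$ negates row $l_i$ and column $l_i$ of the current matrix, so repeated flips compose and the result remains a Hadamard matrix with mutually orthogonal rows. The seed, the key length of 6, and the helper names are hypothetical; Python's `random.Random` is used only for reproducibility and is not a cryptographically secure generator:

```python
import random

def hadamard(r):
    """Sylvester construction (11)."""
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def secure_hadamard(r, key):
    """Negate row l and column l of the current matrix for every key value l (1-based)."""
    H = [row[:] for row in hadamard(r)]
    for l in key:
        i = l - 1
        H[i] = [-v for v in H[i]]       # negate row l
        for row in H:                   # negate column l
            row[i] = -row[i]
    return H

key = random.Random(2020).sample(range(1, 17), 6)   # hypothetical key l_1 ... l_6
H_s = secure_hadamard(16, key)                      # rows give p_j = H_s(j) as in (36)
```

The detector only needs `key`: calling `secure_hadamard(16, key)` again regenerates the same $H_s$.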
Note that $H_s$ is needed not only in the embedding process but also in the detection/extraction process. However, $H_s$ need not be passed to the detector directly: only the integer random permutation values $l_i$ are passed to the detector as the security key, and the detector regenerates $H_s$ from them using (35). According to [19,20], the modified Hadamard matrix combinations obtained with (35) have $(r!\,2^r)^2$ possibilities. For example, for $r = 16$, there are $1.88 \times 10^{36}$ possible modified Hadamard matrices. If one run of detection and reconstruction with one Hadamard matrix takes one second, trying all possible Hadamard matrices takes $1.88 \times 10^{36}$ seconds, or $5.962 \times 10^{28}$ years. This shows that an exhaustive search over the key space is infeasible, so the proposed security model meets the security requirement for the embedding and compression process.
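The brute-force estimate can be reproduced with exact integer arithmetic. The count $(r!\,2^r)^2$ is taken as quoted from [19,20]; the sketch assumes one second per trial matrix and a 365-day year, which yields the figures given in the text:

```python
import math

def key_space(r):
    """Count of modified Hadamard matrices, (r! * 2^r)^2, as quoted from [19,20]."""
    return (math.factorial(r) * 2 ** r) ** 2

SECONDS_PER_YEAR = 365 * 24 * 3600        # assumed 365-day year
n = key_space(16)
years = n / SECONDS_PER_YEAR              # assumed one second per trial matrix
print(f"{n:.3e} matrices, {years:.3e} years")
```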

3.4. Signal Reconstruction

Once $\hat{A}$ is obtained, $\hat{S}_r$ is reconstructed with Orthogonal Matching Pursuit (OMP) [22,23]. The reconstruction is carried out column by column on $Y$, with $\hat{A}$ as the dictionary. Let $y_m$ be the $m$th column of $Y$; in the general case, the position of the strongest atom is found as:
$$q_m = \arg\max_{i \in \{1, 2, \ldots, r\}} \left| \left[\hat{A}^T y_m\right]_i \right|. \tag{37}$$
In the specific case here, where the solution is a diagonal (singular value) matrix, the position of the strongest atom is known in advance, so (37) simplifies to:
$$q_m = m. \tag{38}$$
Denote by $a_q$ the $q$th column of $\hat{A}$; the atom selected for column $m$ is then:
$$\Gamma = a_{q_m}. \tag{39}$$
The non-zero element of $\hat{S}_r$ in column $m$ is reconstructed by least squares:
$$s_{q_m} = \left(\Gamma^T \Gamma\right)^{-1} \Gamma^T y_m. \tag{40}$$
This procedure, comprising (37), (39), and (40), is repeated $r$ times with increasing $m$, yielding:
$$\hat{S}_r = \begin{bmatrix} s_{q_1} & 0 & \cdots & 0 \\ 0 & s_{q_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & s_{q_r} \end{bmatrix}. \tag{41}$$
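With the known support $q_m = m$, each OMP step reduces to a single least-squares coefficient: since column $m$ of $Y$ equals $\sigma_m a_m$, (40) returns $\sigma_m$ exactly in the noiseless case. A sketch with hypothetical indices and singular values; the general argmax search of (37) is omitted because the support is known:

```python
import math

def hadamard(r):
    """Sylvester construction (11)."""
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def reconstruct_sigma(Y, A_hat):
    """Diagonal of S_r_hat from (38)-(41): the atom for column m is column m of
    A_hat (39), and (40) reduces to a_m^T y_m / (a_m^T a_m)."""
    p, r = len(A_hat), len(A_hat[0])
    s_hat = []
    for m in range(r):
        gamma = [A_hat[i][m] for i in range(p)]     # atom a_{q_m} with q_m = m
        y_m = [Y[i][m] for i in range(p)]           # m-th column of Y
        s_hat.append(sum(g * y for g, y in zip(gamma, y_m)) / sum(g * g for g in gamma))
    return s_hat

r, t = 8, [3, 5, 1, 8]                              # hypothetical detected indices
H = hadamard(r)
A_hat = [[v / math.sqrt(len(t)) for v in H[j - 1]] for j in t]
sigma = [9.0, 5.0, 3.0, 2.0, 1.5, 1.0, 0.6, 0.3]    # hypothetical singular values
Y = [[a * s for a, s in zip(row, sigma)] for row in A_hat]   # Y = A S_r
print([round(v, 9) for v in reconstruct_sigma(Y, A_hat)])
```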
Next, the signal is formed by SVD reconstruction as described in (7), after which the signal quality can be computed.

3.5. Noisy Environment

Note that the compressed and watermarked audio in this paper is coded audio: a listener cannot hear it without decoding it first. Signal processing attacks against coded audio are therefore not the same as attacks against a real audio signal, which are standardized in the Stirmark benchmark [24]. The Stirmark benchmark is not appropriate for evaluating the robustness of the proposed method, except for the additive noise attack, which can generally be used to evaluate robustness in watermarking with compression. In practice, additive noise at the receiver arises from the thermal condition of the hardware. In this subsection, we show mathematically that the proposed method is robust to additive noise. If the compressed and watermarked signal $y_i$ is corrupted by additive noise $n_i$, then (23) becomes:
$$K_{ij} = (y_i + n_i) p_j^T = (p_{t_i} S_r + n_i) p_j^T = p_{t_i} S_r p_j^T + n_i p_j^T. \tag{42}$$
Assuming $p_{t_i} = p_j$, (42) becomes:
$$K_{ij} = p_j S_r p_j^T + n_i p_j^T. \tag{43}$$
Because the zero-mean noise $n_i$ is independent of $p_j$, the term $n_i p_j^T$ has zero mean; when the noise power is small relative to the signal power, $p_j S_r p_j^T \gg n_i p_j^T$, and (43) becomes:
$$K_{ij} \approx p_j S_r p_j^T = K_a. \tag{44}$$
Thus, data inserted with the proposed method can be detected even under additive noise. Performance under additive noise depends on the power ratio between the compressed signal and the additive noise, expressed as the Signal-to-Noise power Ratio (SNR):
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=1}^{p} \|y_i\|^2}{\sum_{i=1}^{p} \|n_i\|^2}, \tag{45}$$
where $i$ is the row index of $y$ and $n$, $y_i$ is row $i$ of the CS-compressed signal, $n_i$ is the noise in row $i$, and $p$ is the number of rows of $Y$.
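The noise analysis can be exercised with a short simulation: compute the SNR as in (45), scale Gaussian noise to a target SNR, and run the correlation detector (21) and (22) on the noisy rows. All values are hypothetical: $p = r = 8$, singular values between 1.125 and 2, and 30 dB. For these values the noise is provably too weak to overturn the correlation gap $K_a - K_c$ for any noise draw, so detection succeeds regardless of the seed:

```python
import math
import random

def hadamard(r):
    """Sylvester construction (11)."""
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def snr_db(Y, N):
    """SNR from (45): total power of the compressed rows over total noise power, in dB."""
    signal = sum(v * v for row in Y for v in row)
    noise = sum(v * v for row in N for v in row)
    return 10 * math.log10(signal / noise)

def noise_at_snr(Y, target_db, rng):
    """Gaussian noise, rescaled so that snr_db(Y, noise) equals target_db."""
    N = [[rng.gauss(0.0, 1.0) for _ in row] for row in Y]
    signal = sum(v * v for row in Y for v in row)
    noise = sum(v * v for row in N for v in row)
    scale = math.sqrt(signal / (10 ** (target_db / 10)) / noise)
    return [[scale * v for v in row] for row in N]

def detect_indices(Y_rx, H):
    """Correlation detector (21)-(22); returns 1-based indices."""
    out = []
    for y in Y_rx:
        K = [sum(a * b for a, b in zip(y, p)) for p in H]
        out.append(max(range(len(H)), key=K.__getitem__) + 1)
    return out

r = 8
H = hadamard(r)
t = [3, 5, 1, 8, 2, 7, 4, 6]                        # hypothetical indices, p = r = 8
sigma = [2.0 - k / r for k in range(r)]             # hypothetical singular values
Y = [[v / math.sqrt(len(t)) * s for v, s in zip(H[j - 1], sigma)] for j in t]
N = noise_at_snr(Y, 30.0, random.Random(1))
Y_rx = [[y + n for y, n in zip(ry, rn)] for ry, rn in zip(Y, N)]
print(round(snr_db(Y, N), 6), detect_indices(Y_rx, H))
```

Lowering the target SNR in this sketch eventually produces detection errors, which is the behavior the SNR sweep in the experiments measures.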

3.6. Feasible Parameters

The proposed method involves three signal processing tasks. The first is to encode the watermark into the secure Hadamard code, the second is to make the host audio sparse, and the third is to hide the coded watermark in the sparse signal by CS acquisition. Hence, there are two objects for performance analysis: the detected watermark and the audio reconstructed from the detected sparse signal. From the embedded watermark relative to the host audio length, we calculate the watermark payload as in (17). From the host audio length relative to the coded and compressed audio, we calculate the CR of the sparsification and CS as in (6).
Mathematically, the trade-off between the watermark payload and the CR follows directly from (17) and (6). The same three parameters, $M$, $r$, and $p$, affect both: $M$ is the square root of the host audio segment length, i.e., the number of rows/columns of the diagonal matrix $S$; $r$ is the number of rows/columns of the truncated diagonal matrix $S_r$; and $p$ is the number of rows of the CS output $Y$. These parameters occupy different positions in (17) and (6). In (17), $p$ and $r$ are in the numerator, so decreasing them lowers the payload. In (6), $p$ and $r$ are in the denominator, so decreasing them raises the CR. $M^2$ also occupies opposite positions: the denominator of (17) and the numerator of (6). This is a trade-off between payload and CR, so we look for moderate values of $p$ and $r$ that give both a high payload and a high CR.
The three parameters satisfy $p \le r < M$. In (6), this relation makes $pr \ll 2Mr$ in the denominator when $M$ is large; thus:
$$\mathrm{CR} \approx \frac{M^2}{2Mr} = \frac{M}{2r}. \tag{46}$$
Compression requires $\mathrm{CR} > 1$; thus, $M/(2r) > 1$, or $r < M/2$. This means that the truncation for compression must keep fewer than half of the diagonal elements of $S \in \mathbb{R}^{M \times M}$, so that $S_r$ is at most $\frac{M}{2} \times \frac{M}{2}$. Consequently, the three parameters satisfy:
$$p \le r < \frac{M}{2}. \tag{47}$$
We can therefore explore the three parameters within this relation. Next, we look for $p$ and $r$ values that maximize the payload (17). Since $p$ and $r$ are in the numerator of (17), $r$ should be set close to its maximum, $M/2$, and $p$ approximately to $r$. However, setting $r$ near $M/2$ yields the minimum CR, so $r$ must be chosen carefully because it controls the trade-off between $C$ and CR. Likewise, $p$ should be set to its maximum value, $r$, to maximize the payload. If $p = r$, CS acquisition (4) produces an output of the same size as its input. This is still acceptable as long as the CR from (6) exceeds one, and CS acquisition still contributes to the watermarking process.
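The trade-off can be tabulated directly from (17) and (6). A minimal sketch assuming $F_s = 44.1$ kHz (the rate used in Section 4), $p = r$, and power-of-two $r$ so that a Sylvester Hadamard matrix of that size exists; $M = 66$ is one value from the swept set, and the 20 bps threshold is the minimum payload used for Figure 3:

```python
def payload_bps(p, r, M, fs=44100):
    """C from (17) with N_s = log2(r); r must be a power of two (Sylvester Hadamard)."""
    if r < 1 or r & (r - 1):
        raise ValueError("r must be a power of two")
    return p * (r.bit_length() - 1) * fs / (M * M)

def compression_ratio(M, r, p):
    """CR from (6)."""
    return (M * M) / (2 * M * r + p * r)

M = 66                                   # one of the M values swept in Figure 3
for r in (4, 8, 16, 32):
    p = r                                # p = r, as selected in the text
    C, cr = payload_bps(p, r, M), compression_ratio(M, r, p)
    print(f"r={r:2d}  C={C:7.1f} bps  CR={cr:5.2f}  feasible={C > 20 and cr > 1}")
```

With $p = r$, the exact condition $\mathrm{CR} > 1$ is $M^2 > 2Mr + r^2$, i.e., $r < (\sqrt{2} - 1)M \approx 0.41M$, which is slightly tighter than the approximate bound $r < M/2$ from (46); for $M = 66$ this excludes $r = 32$.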
Figure 3a displays the payload versus CR for $M \in \{34, 66, 98, \ldots, 482\}$ and $r \in \{0.01M, 0.02M, \ldots, 0.5M\}$. All combinations of $r$ and $M$ satisfying (47) are plotted as magenta dots in Figure 3b. In Figure 3a, blue dots show the payload from (17) versus the CR from (6) for $p = r$, and magenta plus signs show the same mapping for $p = 1$. The red vertical dotted line marks the minimum CR of one, and the green horizontal dashed line marks the minimum payload of 20 bps [25]. The feasible region is thus to the right of the red line and above the green line. Many blue dots reach a much higher payload and CR than the magenta plus signs, so $p = r$ offers many more high-payload, high-CR options than $p = 1$. The blue dots with payload > 20 bps and CR > 1 in Figure 3a come from the $r$ and $M$ values inside the blue circle in Figure 3b; therefore, in the next section, we set $p = r$ and select the $r$ and $M$ combinations from the blue circle in Figure 3b.

4. Experimental Result

We assess several evaluations in this section by simulations. The evaluation aspects of the proposed method included audio quality, security, watermark quality, watermark payload, and compression ratio level aspect. The simulations ran on ASUS notebooks using MATLAB with the following specifications, Advanced Micro Devices (AMD) Fx with 12 compute cores, 16 GB Random Access Memory (RAM), and Windows 10 operating system. There were 50 mono audio host files as the clips tested with different genres of music, sampling rate 44.1 kHz and 16 bit audio quantization. All clips were in the original wave files and licensed as free audio files for research [26]. The simulation output in this section showed the average of the simulation result. The evaluated performance parameters were the audio quality, the watermark robustness, the watermark payload, and CR. The Objective Difference Grade (ODG) represents the audio quality using Perceptual Evaluation of Audio Quality (PEAQ) [27]. Parameter C represents the watermark payload in bps as described in (16). Parameter BER represents the watermark robustness in (34). CR represents the Compression Ratio, as explained in (6).
We measured the audio quality between the original host audio and the reconstructed audio. The reconstructed audio quality was affected by two factors: the truncation of the diagonal matrix and the CS acquisition. Truncation degraded the audio more than the CS acquisition did, because it discards part of the audio signal information. ODG ranges from −4 to 0, where −4 denotes the worst quality (very annoying distortion), −3 annoying distortion, −2 slightly annoying distortion, −1 perceptible but not annoying distortion, and 0 the best quality (imperceptible distortion) [27].
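The ODG scale above can be turned into a small lookup, for example to label simulation results. This Python sketch is an illustration only: rounding a continuous ODG to the nearest integer grade is our assumption, and PEAQ itself, which produces the ODG, is not implemented here.

```python
# Descriptions of the integer ODG grades, as listed in the text [27].
ODG_GRADES = {
    0: "imperceptible",
    -1: "perceptible but not annoying",
    -2: "slightly annoying",
    -3: "annoying",
    -4: "very annoying",
}


def describe_odg(odg: float) -> str:
    """Map an ODG in [-4, 0] to the description of the nearest integer grade (assumed rounding)."""
    if not -4.0 <= odg <= 0.0:
        raise ValueError("ODG must lie in [-4, 0]")
    return ODG_GRADES[round(odg)]


def is_good_quality(odg: float) -> bool:
    """Quality criterion used in Section 4.1: ODG >= -1."""
    return odg >= -1.0


print(describe_odg(-0.94), is_good_quality(-0.94))  # perceptible but not annoying True
```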

4.1. Audio Quality Performance in Relation to r, M, Payload, and Compression Ratio

Based on Section 3.6, we selected the values of M and r that yield CR > 1 and payload > 20 bps with $p = r$, shown inside the blue circle in Figure 3b. Using the selected values from $M \in \{34, 66, 98, \ldots, 482\}$ and $r \in \{0.01M, 0.02M, \ldots, 0.5M\}$, we ran the simulation on five host clips. Each simulation consisted of the embedding process, the watermark detection process, and the audio reconstruction process; it then computed the BER between the detected and original watermarks and the ODG of the reconstructed audio. The results are shown in Figure 4a,b. Over all combinations of M and r and all five clips, the watermark was detected perfectly, i.e., BER = 0 on average. Figure 4a shows the trade-off between CR and payload, which follows a negative exponential relation. Red stars mark (CR, payload) pairs with ODG ≥ −1, and blue dots mark pairs with ODG < −1. Figure 4b plots the same blue dots and red stars as ODG versus M. The longer the audio segment processed for embedding and compression, the worse the reconstructed audio quality. For these five clips, good reconstructed audio quality (ODG ≥ −1) was obtained when M < 128 samples, for certain values of r.
Thus, M did not need to be as large as 482 samples; values up to 128 samples were enough to achieve ODG ≥ −1, as Figure 4b shows. Furthermore, large values of M considerably increase the processing time of embedding, detection, and reconstruction. We therefore repeated the simulation of Figure 4a,b with a finer grid of M and r, i.e., $M \in \{5, 6, \ldots, 128\}$ and $r \in \{1, 2, \ldots, 64\}$, roughly corresponding to $r \in \{0.0156M, 0.0234M, \ldots, 0.5M\}$, on 50 clips. We averaged the audio quality results over the 50 clips, and all watermarks were detected perfectly. The results are shown in Figure 4c,d. Figure 4d shows many more choices of M between 5 and 128 that achieve ODG ≥ −1. The simulation in Figure 4c also achieved a high CR (up to 7.03) and a high payload (up to 8268.75 bps). To identify which M and r produced these results, we also tabulated them: Tables 4–6 list the 10 highest ODG values, payloads, and CRs, respectively, together with the corresponding M and r. Overall, these results show that the audio quality, payload, and CR can be controlled by adjusting M and r.
To see how truncation affects the performance parameters, we ran the simulation on 50 clips with $M = 32$ and $r \in \{1, 2, \ldots, 16\}$, i.e., $r \in \{0.03M, 0.06M, \ldots, 0.5M\}$. Figure 5a shows the result. In this case, too, the watermark was detected perfectly (BER = 0 on average). The three averaged performance parameters, ODG, CR, and payload, are plotted on the y-axis of one figure against the normalized rank $r/M \in \{0.03, 0.06, \ldots, 0.5\}$ on the x-axis. The black line with right-triangle markers shows the average ODG, ranging from −1.16 to −0.16. The blue line with square markers shows the payload of the embedded watermark, ranging from 172.26 to 44,100 bps. The red line with circle markers shows the CR of the encoded audio, ranging from 0.20 to 7.53, and the red dash-dotted horizontal line marks the minimum CR of 1. Increasing the normalized rank $r/M$ raised the ODG and the watermark payload but lowered the CR of the encoded audio. Where the CR falls below the minimum CR, the CS process does not compress the audio signal overall; instead, it increases the length of the encoded signal. In this case, we can keep the CR above one by selecting $r/M \le 0.2$. We can also bound $r/M$ from below so that ODG > −1, i.e., $r/M \ge 0.1$. The selected normalized rank range $[0.1, 0.2]$ thus yields a watermark payload in the range $[729, 5292]$ bps, a compression ratio in the range $[1.47, 4.84]$, and ODG in the range $[-0.94, -0.74]$. Restricting $r/M$ in this way maintains good reconstructed audio quality with a high payload and CR > 1.
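The admissible ranks for a given M follow directly from the normalized-rank interval. A minimal Python sketch, with `ranks_in_interval` as a hypothetical helper; the small tolerance guards against floating-point error at the interval endpoints.

```python
import math
from typing import List


def ranks_in_interval(M: int, low: float, high: float, tol: float = 1e-9) -> List[int]:
    """Return the integer ranks r with low <= r/M <= high (and r >= 1)."""
    r_min = max(1, math.ceil(low * M - tol))
    r_max = math.floor(high * M + tol)
    return list(range(r_min, r_max + 1))


M = 32
ranks = ranks_in_interval(M, 0.1, 0.2)
print(ranks, [r / M for r in ranks])  # [4, 5, 6] [0.125, 0.15625, 0.1875]
```

With $M = 32$, the interval $[0.1, 0.2]$ therefore corresponds to $r \in \{4, 5, 6\}$.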

4.2. Complexity and Computational Time

The main components of the proposed data hiding and compression method are the DCT, the multi-bit SS mapping, the SVD, and the CS acquisition in the embedding stage, and the multi-bit SS de-mapping, the CS reconstruction, the SVD reconstruction, and the IDCT in the detection and decoding stage. Each component has a different complexity. Computing $U \in \mathbb{R}^{M \times M}$, $S \in \mathbb{R}^{M \times M}$, and $V \in \mathbb{R}^{M \times M}$ from $X \in \mathbb{R}^{M \times M}$ by SVD has a complexity of $O(M^3)$ [28]. Recovering $X$ from $U$, $S$, and $V$ as in (3) has a complexity of $O(M^{2.37})$ [29]. The DCT and IDCT in (8) and (9) have a complexity of $O(N_p^2)$, where $N_p$ is the number of DCT points; here, $N_p = M$. The CS acquisition in (4), which also performs the multi-bit SS embedding, has a complexity of $O(pr^2)$. The multi-bit SS detection in (22) has a complexity of $O(r^3)$. Finally, audio reconstruction by the OMP approach in (40) has a complexity of $O(p^2 r)$. Because $p \le r < M$, the highest computational cost is that of the SVD, $O(M^3)$, which therefore dominates the overall complexity. This supports choosing a small M. Nevertheless, we still needed to measure the computational time by simulation to find an M value that avoids a very long processing time.
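To make the comparison concrete, the following Python sketch evaluates each complexity term stated above for given M, r, and p. The values are order-of-growth indicators only: constant factors are ignored, and the DCT term uses $N_p = M$ as stated in the text. `dominant_cost` is a hypothetical helper.

```python
from typing import Dict, Tuple


def dominant_cost(M: int, r: int, p: int) -> Tuple[str, Dict[str, float]]:
    """Evaluate the asymptotic cost terms (constants ignored) and return the largest one."""
    costs = {
        "SVD, O(M^3)": float(M ** 3),
        "SVD reconstruction, O(M^2.37)": M ** 2.37,
        "DCT/IDCT, O(Np^2) with Np = M": float(M ** 2),
        "CS acquisition, O(p r^2)": float(p * r ** 2),
        "multi-bit SS detection, O(r^3)": float(r ** 3),
        "OMP reconstruction, O(p^2 r)": float(p ** 2 * r),
    }
    return max(costs, key=costs.get), costs


name, costs = dominant_cost(M=32, r=6, p=6)
print(name)  # SVD, O(M^3)
```

Whenever $p \le r < M$, every other term is smaller than $M^3$, so the SVD term is always the largest.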
We measured the computational time of the embedding and detection stages by simulation, as a practical indicator of their complexity. We varied M over the powers of two from 16 to 1024, with $r = 0.125M$, $r = 0.25M$, and $r = 0.5M$, used 10 clips, and averaged the processing time. Figure 5b shows the result. The processing time increased steeply as M rose, whereas $r/M$ had no significant effect on the computational time. A small M is therefore recommended for a low computational time; moreover, as shown in Section 4.1, a small M also yields better reconstructed audio quality.
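A rough way to reproduce the trend for the dominant step is to time the SVD alone for the same values of M. This Python/NumPy sketch times only `numpy.linalg.svd` on random M × M matrices rather than the full MATLAB pipeline, so absolute times will differ from the measurements reported above; `time_svd` is a hypothetical helper.

```python
import time

import numpy as np


def time_svd(sizes, repeats=3, seed=0):
    """Return the best-of-`repeats` wall-clock time (s) of a full SVD of a random M x M matrix, per M."""
    rng = np.random.default_rng(seed)
    timings = {}
    for M in sizes:
        X = rng.standard_normal((M, M))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            np.linalg.svd(X)
            best = min(best, time.perf_counter() - start)
        timings[M] = best
    return timings


if __name__ == "__main__":
    sizes = [2 ** k for k in range(4, 11)]  # M = 16, 32, ..., 1024
    for M, t in time_svd(sizes).items():
        print(f"M = {M:4d}: {t * 1e3:9.3f} ms")
```

Taking the best of several repeats reduces the influence of background load on the measured times.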

4.3. Security Analysis

As described in Section 3.3, two parameters affect the security of the model: $N_l$, the number of generated random integer permutation values, and $r$, the number of rows and columns of the truncated diagonal matrix $S_r$. The original Hadamard matrix is denoted by $H_r$, and the secured Hadamard matrix by $H_s$. We ran simulations varying $r$ and $N_l$ to determine how much they affect the security performance. In practice, an attacker could try to break the security model by using the original Hadamard matrix to detect the watermark and reconstruct the audio, because the Hadamard matrix is simple to generate. We therefore encoded with the secured Hadamard matrix and decoded with the original Hadamard matrix to assess the strength of the security model. If the security model works well, the detected watermark should be corrupted, ideally with a BER near 0.5.
In the simulation, we set $p = r = 20$ and $M = 128$ samples and varied $N_l$ from zero to $r$; $N_l = 0$ means that $H_s = H_r$. We used five clips, ran 100 iterations for each clip, and calculated the average BER after watermark detection. The result is shown in Figure 6a. The worst detected watermark was obtained when $N_l$ was half of $r$, and the watermark was detected perfectly when $N_l = 0$ and when $N_l = r = 20$. We can restrict the value of $N_l$ by setting an acceptable minimum BER. We chose BER = 0.4 as a safe minimum, because a detected watermark image can still be interpreted visually when BER < 0.3 [30]. Therefore, we chose $N_l > 6$, or generally $N_l > 0.3r$, as the lower bound and $N_l < 14$, or generally $N_l < 0.7r$, as the upper bound, to keep the detected watermark uninterpretable when an attacker detects it with the original Hadamard matrix.
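The bounds on $N_l$ can be expressed as a small helper. This Python sketch only enumerates the integers allowed by the strict bounds $0.3r < N_l < 0.7r$ stated above; it does not implement the secured Hadamard matrix of Section 3.3. Exact fractions avoid floating-point error at the bounds, and `admissible_nl` is a hypothetical name.

```python
from fractions import Fraction
from typing import List


def admissible_nl(r: int, low: Fraction = Fraction(3, 10),
                  high: Fraction = Fraction(7, 10)) -> List[int]:
    """Return the integers N_l in [0, r] satisfying low*r < N_l < high*r."""
    return [n for n in range(r + 1) if low * r < n < high * r]


print(admissible_nl(20))  # [7, 8, 9, 10, 11, 12, 13]
```

For $r = 20$ this reproduces the bounds $N_l > 6$ and $N_l < 14$ used in the simulation.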
Figure 6b shows BER as a function of $r$ for different values of $N_l/r$. This simulation used 50 clips with 10 iterations per clip and $r \in [6, 30]$. The worst watermark was detected when $N_l/r = 0.5$. The detected watermark quality improved as $N_l/r$ decreased and as $r$ increased. With $N_l/r = 0.3$, most BER values were above 0.4. This result confirms the restriction of $N_l$ to the range $[0.3r, 0.7r]$.

4.4. Noisy Environment

As shown mathematically in Section 3.5, the proposed method is robust to additive noise. Nevertheless, simulations are needed to quantify this robustness when additive noise corrupts the encoded audio. We analyzed two performance parameters affected by additive noise: the quality of the detected watermark, represented by BER, and the quality of the reconstructed audio, represented by ODG. We used 50 clips with 50 iterations per clip, $M = 23$, $r = 6$, and $p = r$. The input parameter of the simulation was the SNR of the additive noise, as defined in (45), ranging from 0 to 40 dB. The averaged ODG and BER are shown in Figure 7. Decreasing the noise power, i.e., increasing the SNR, improved both the reconstructed audio quality (higher ODG) and the detected watermark quality (lower BER).
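Additive noise at a prescribed SNR is typically generated by scaling white Gaussian noise to the required power. The Python/NumPy sketch below assumes that (45) defines SNR as $10\log_{10}(P_\text{signal}/P_\text{noise})$, with powers measured as mean squares over the whole signal; `add_awgn` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) equals snr_db (on average)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    noise = rng.standard_normal(signal.shape) * np.sqrt(noise_power)
    return signal + noise


# Check the realized SNR on a 1 s, 440 Hz test tone sampled at 44.1 kHz.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
y = add_awgn(x, 10.0, rng)
realized = 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))
print(round(realized, 1))
```

The realized SNR fluctuates slightly around the target because the noise power is itself a random estimate.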
To show how BER values translate into watermark legibility, we embedded a 20 × 35 watermark image containing the letters “ITB”. The detected watermarks for various BER values are shown in Table 7. We used one selected clip as the audio host with parameters $M = 256$, $r = 100$, and $p = r$, and applied additive noise with SNR from 0 to 55 dB. The watermark in the bottom row of Table 7 is identical to the original, because its BER was zero. The detected watermark was interpretable as “ITB” when the SNR was 25 dB or higher, i.e., when its BER was below 10%. Thus, we took 10% as the maximum acceptable BER for the detected watermark. In Figure 7, a BER below 10% was achieved at an SNR of 10 dB and above, which means that the detected watermark was already interpretable when the noise power was one tenth of the signal power. At this SNR, the ODG was also already above −1, so the reconstructed audio was robust to additive noise as well. These results confirm that the proposed method is robust to additive noise.
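The threshold reading of Table 7 and the dB-to-power conversion can be checked with a few lines of Python. The (SNR, BER) pairs are transcribed from Table 7; the helper names are hypothetical, and the search assumes that BER does not increase with SNR, as is the case in Table 7.

```python
from typing import List, Optional, Tuple

# (SNR in dB, BER in %) pairs transcribed from Table 7.
TABLE_7: List[Tuple[float, float]] = [
    (0, 24.1), (5, 18.7), (10, 17.4), (15, 14.3), (20, 11.7), (25, 8.8),
    (30, 6.1), (35, 5.7), (40, 5.6), (45, 3.6), (50, 0.4), (55, 0.0),
]


def min_snr_below(table, max_ber_percent: float = 10.0) -> Optional[float]:
    """Return the lowest SNR whose BER is below the threshold (assumes BER falls as SNR rises)."""
    for snr, ber in sorted(table):
        if ber < max_ber_percent:
            return snr
    return None


def noise_to_signal_power(snr_db: float) -> float:
    """P_noise / P_signal for a given SNR in dB."""
    return 10.0 ** (-snr_db / 10.0)


print(min_snr_below(TABLE_7))                # 25
print(noise_to_signal_power(10.0))           # 0.1
print(round(noise_to_signal_power(3.0), 3))  # 0.501
```

At 10 dB the noise power is one tenth of the signal power; a noise power of half the signal power corresponds to about 3 dB.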

4.5. Method Comparison to References

As described in Section 1, several previous works are related to the proposed method, which offers more benefits than these works. It can be used for both audio watermarking and audio steganography with compression, because the payload, the audio quality, and the compression ratio can be traded off through controllable parameters. In addition, the encoded audio produced by the proposed method cannot be attacked by general signal processing attacks, such as those in the StirMark benchmark, except for additive noise, as described in Section 3.5. Table 8 compares the comprehensiveness of the proposed method with that of previous works that also used CS for embedding or compression and used audio as the object to embed in or compress. Among the works in Table 8, reference [1] reported only one performance parameter, robustness, although its method had the same purpose as ours. Reference [2] proposed a hiding method only, and reference [3] proposed an audio compression scheme only. A detailed performance comparison was therefore possible only with [4], because it reports the most comprehensive set of performance parameters, and these parameters are the closest to ours.
Table 9 compares the performance of the proposed method with that of [4]. In [4], Fakhr reported the performance of four techniques; the watermarked audio was largely imperceptible, with SNR = 28 dB. Our method is also largely imperceptible, with ODG in the range [−0.94, −0.74]. Although [4] was more robust to additive noise at SNR = 20 dB (maximum BER = 3%, versus a maximum BER of 13% for our method), our method achieved a far higher payload than [4]. Note that the experiments in [4] used only one clip to obtain the BER, SNR, and payload, whereas our results are averaged over simulations on 50 clips. Our method also achieves a compression ratio in the range [1.47, 4.84], whereas [4] did not report a compression ratio.

5. Conclusions

In this paper, we proposed a novel audio watermarking method based on the CS technique that embeds a watermark into the host audio and simultaneously compresses the watermarked audio, so that the watermarked audio has a smaller size. We also addressed the security of the proposed method by using a secured Hadamard matrix. Mathematical derivation showed that the proposed method works in both noiseless and noisy environments. We reported the performance parameters payload, CR, ODG, and BER. The experimental results showed that the proposed method is highly imperceptible, with a payload in the range of 729–5292 bps and a compression ratio of 1.47–4.84. Because payload and CR trade off against each other, the operating point can be chosen to suit the requirements of a given application.

Author Contributions

Conceptualization, G.B. and A.B.S.; methodology, G.B., A.B.S., and D.D.; software, G.B.; validation, A.B.S. and D.D.; formal analysis, A.B.S. and D.D.; investigation, G.B.; resources, G.B.; data curation, G.B.; writing, original draft preparation, G.B.; writing, review and editing, G.B. and A.B.S.; visualization, G.B.; supervision, A.B.S. and D.D.; project administration, A.B.S. and D.D.; funding acquisition, A.B.S. All authors read and agreed to the published version of the manuscript.

Funding

This research was funded by Institut Teknologi Bandung and the Ministry of Research, Technology and Higher Education of Indonesia in 2020.

Acknowledgments

This research is supported by Institut Teknologi Bandung and Telkom University.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CS: Compressed Sensing/Compressive Sampling
WHT: Walsh Hadamard Transform
DCT: Discrete Cosine Transform
LSB: Least Significant Bit
KLT: Karhunen-Loeve Transform
MP3: Moving Picture Experts Group Audio Layer 3
SVD: Singular Value Decomposition
CR: Compression Ratio
IDCT: Inverse DCT
SS: Spread Spectrum
BER: Bit Error Rate
OMP: Orthogonal Matching Pursuit
SNR: Signal-to-Noise power Ratio
ODG: Objective Difference Grade
PEAQ: Perceptual Evaluation of Audio Quality
AMD: Advanced Micro Devices
RAM: Random Access Memory

References

1. Hua, G.; Xiang, Y.; Bi, G. When Compressive Sensing Meets Data Hiding. IEEE Signal Process. Lett. 2016, 23, 473–477.
2. Zhaofeng, M.; Xinxin, N.; Yixian, Y. Compressive Sensing-Based Audio Semi-fragile Zero-Watermarking Algorithm. Chin. J. Electron. 2015, 24, 492–497.
3. Griffin, A.; Hirvonen, T.; Tzagkarakis, C.; Mouchtaris, A. Single-Channel and Multi-Channel Sinusoidal Audio Coding Using Compressed Sensing. IEEE Trans. Audio Speech Lang. Process. 2011, 19, 1382–1395.
4. Fakhr, M.W. Robust Watermarking Using Compressed Sensing Framework With Application To MP3 Audio. Int. J. Multimed. Its Appl. (IJMA) 2012, 4, 27–43.
5. Pan, J.S.; Duan, J.J.; Li, W. A dual watermarking scheme by using compressive sensing and subsampling. In Intelligent Data Analysis and Applications; Springer: London, UK, 2015; pp. 381–389.
6. Pan, J.S.; Li, W.; Yang, C.S.; Yan, L.J. Image steganography based on subsampling and compressive sensing. Multimed. Tools Appl. 2015, 74, 9191–9205.
7. Patsakis, C.; Aroukatos, N. LSB and DCT steganographic detection using compressive sensing. J. Inf. Hiding Multimed. Signal Process. 2014, 5, 20–32.
8. Pawlak, M. M-Ary Phase Modulation for Digital Watermarking. Int. J. Appl. Math. Comput. Sci. 2008, 18, 93–104.
9. Xiang, Y.; Natgunanathan, I.; Rong, Y.; Guo, S. Spread Spectrum Based High Embedding Capacity Watermarking Method for Audio Signals. IEEE/ACM Trans. Audio Speech Lang. Process. 2015, 9290, 2228–2237.
10. Xiang, Y.; Natgunanathan, I.; Peng, D.; Hua, G.; Liu, B. Spread Spectrum Audio Watermarking Using Multiple Orthogonal PN Sequences and Variable Embedding Strengths and Polarities. IEEE/ACM Trans. Audio Speech Lang. Process. 2018, 26, 529–539.
11. Sabrina, M.B.; Khaled, R.; Salim, A.; Djamel, C. Spreading codes performances comparison in terms of Bit-Error-Rate in additive white Gaussian noise channel. In Proceedings of the 2015 4th International Conference on Electrical Engineering (ICEE), Boumerdes, Algeria, 13–15 December 2015; pp. 1–6.
12. Yadav, S.K.; Sinha, R.; Bora, P.K. An Efficient SVD Shrinkage for Rank Estimation. IEEE Signal Process. Lett. 2015, 22, 2406–2410.
13. Rusu, C.; Dumitrescu, B. Stagewise K-SVD to Design Efficient Dictionaries for Sparse Representations. IEEE Signal Process. Lett. 2013, 19, 631–634.
14. Zhang, S.; Zhu, Y.; Dong, G.; Kuang, G. Truncated SVD-Based Compressive Sensing for Imaging With Uniform/Nonuniform Linear Array. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1853–1857.
15. Irawati, I.D.; Suksmono, A.B.; Edward, I.J.M. End to End Internet Traffic Measurement Model Based on Compressive Sampling. In Transactions on Engineering Technologies; Springer: Singapore, 2020; pp. 93–104.
16. Wang, X.Y.; Zhao, H. A novel synchronization invariant audio watermarking scheme based on DWT and DCT. IEEE Trans. Signal Process. 2006, 54, 4835–4840.
17. Guo, J.; Xie, R.; Jin, G. An Efficient Method for NMR Data Compression Based on Fast Singular Value Decomposition. IEEE Geosci. Remote. Sens. Lett. 2019, 16, 301–305.
18. Strang, G. Linear Algebra and Its Applications; Thomson Learning: Boston, MA, USA, 2006; pp. 1–497.
19. Van De Ville, D.; Philips, W.; Van de Walle, R.; Lemahieu, I. Image scrambling without bandwidth expansion. IEEE Trans. Circuits Syst. Video Technol. 2004, 14, 892–897.
20. Pal, S.K. Fast, Reliable Secure Digital Communication Using Hadamard Matrices. In Proceedings of the 2007 International Conference on Computing: Theory and Applications (ICCTA’07), Kolkata, India, 5–7 March 2007; pp. 526–532.
21. Bouguezel, S.; Ahmad, M.O.; Swamy, M.N.S. Binary Discrete Cosine and Hartley Transforms. IEEE Trans. Circuits Syst. Regul. Pap. 2013, 60, 989–1002.
22. Schnass, K. Average performance of Orthogonal Matching Pursuit (OMP) for sparse approximation. IEEE Signal Process. Lett. 2018, 25, 1865–1869.
23. Chen, Y.; Li, G.; Gu, Y. Active Orthogonal Matching Pursuit for Sparse Subspace Clustering. IEEE Signal Process. Lett. 2018, 25, 164–168.
24. Lang, A.; Dittmann, J.; Spring, R.; Vielhauer, C. Audio Watermark Attacks: From Single to Profile Attacks. In MM&Sec ’05 Proceedings of the 7th workshop on Multimedia and security; ACM: New York, NY, USA, 2005; pp. 39–50.
25. Katzenbeisser, S.; Petitcolas, F.A.P. Information Hiding Techniques for Steganography and Digital Watermarking; Artech House, Inc.: Boston, MA, USA, 2000; p. 237.
26. Turetsky, R.; Ellis, D. Ground-Truth Transcriptions of Real Music from Force-Aligned MIDI Syntheses. In Proceedings of the 4th International Conference on Music Information Retrieval, Baltimore, MD, USA, 26–30 October 2003.
27. ITU-R, R. Recommendation ITU-R BS.1387-1 Method for Objective Measurements of Perceived Audio Quality; Technical Report; International Telecommunications Union: Geneva, Switzerland, 1998.
28. Cline, A.K.; Dhillon, I.S. Computation of the Singular Value Decomposition; CRC Press: Boca Raton, FL, USA, 2006.
29. Gall, F.L. Powers of Tensors and Fast Matrix Multiplication. arXiv 2014, arXiv:1401.7714.
30. Lei, B.; Soon, I.Y.; Tan, E.l. Robust SVD-Based Audio Watermarking Scheme With Differential Evolution Optimization. IEEE Trans. Audio Speech Lang. Process. 2013, 21, 2368–2378.
Figure 1. Watermark embedding and audio encoding.
Figure 2. Watermark detection and audio decoding.
Figure 3. Finding feasible M and r to obtain payload > 20 bps and CR > 1.
Figure 4. ODG in relation to M, payload, and CR.
Figure 5. Three performance parameters affected by $r/M$, and computational time.
Figure 6. BER in relation to $N_l$ and $r$ when different Hadamard matrices are used for encoding and decoding.
Figure 7. Additive noise effect and the detected watermark at certain SNR values.
Table 1. Embedding process.
Step 1: Read a host signal $x(n)$ and transform it into the frequency domain using an $L$-point DCT, obtaining $X(k)$.
Step 2: Reshape $X(k)$ of length $L$ into a 2D square matrix $X$ of size $M \times M$.
Step 3: Decompose $X$ into $U$, $S$, and $V$ using SVD.
Step 4: Truncate $U$, $S$, and $V$ to rank $r$, obtaining $U_r$, $S_r$, and $V_r$.
Step 5: Generate the matrix $A$ containing $p$ Hadamard sequences by mapping each group of watermark bits to an associated random Hadamard sequence using (13).
Step 6: Apply CS acquisition to $A$ and $S_r$ by (4), producing $Y$.
Step 7: Transmit the compressed signal with hidden data, represented by $U_r$, $Y$, and $V_r$.
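The embedding steps in Table 1 can be sketched for a single frame in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the CS acquisition (4) is assumed to be $Y = A S_r$; rows of $A$ are taken from the natural-order (Sylvester) Hadamard matrix with the bit mapping of Table 2 instead of the random Hadamard sequences of Step 5; $r$ must be a power of two and each row of $A$ carries $\log_2 r$ bits; and the orthonormal DCT-II from `scipy.fft.dct` stands in for the $L$-point DCT. `embed_frame` and `bits_to_row` are hypothetical names.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import hadamard


def bits_to_row(bits) -> int:
    """Map a group of ±1 bits (MSB first) to a 0-based Hadamard row index, as in Table 2."""
    idx = 0
    for b in bits:
        idx = 2 * idx + (1 if b > 0 else 0)
    return idx


def embed_frame(x, M: int, r: int, watermark_bits):
    """Steps 1-7 of Table 1 for one frame of M*M samples; returns (U_r, Y, V_r^T)."""
    x = np.asarray(x, dtype=float)
    if x.size != M * M:
        raise ValueError("a frame must contain M*M samples")
    ns = r.bit_length() - 1                   # log2(r); hadamard() rejects non-powers of 2
    H = hadamard(r)                           # natural-order Sylvester Hadamard matrix
    X = dct(x, norm="ortho").reshape(M, M)    # Steps 1-2: DCT, then reshape to M x M
    U, s, Vt = np.linalg.svd(X)               # Step 3: SVD
    U_r, S_r, Vt_r = U[:, :r], np.diag(s[:r]), Vt[:r, :]  # Step 4: rank-r truncation
    groups = np.asarray(watermark_bits).reshape(-1, ns)   # p groups of ns bits
    A = np.vstack([H[bits_to_row(g)] for g in groups])    # Step 5: p x r matrix A
    Y = A @ S_r                               # Step 6: assumed form of (4)
    return U_r, Y, Vt_r                       # Step 7: data to transmit
```

For example, with `M = 16`, `r = 8`, and six watermark bits, `embed_frame` returns $U_r$ (16 × 8), $Y$ (2 × 8), and $V_r^T$ (8 × 16). Because the singular values are positive, each row of $Y$ has the same signs as its Hadamard sequence.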
Table 2. Example of watermark bits and Hadamard sequences for $N_s = 3$, $N_p = 8$, and $r = 8$.

| Index ($t_i$) | Watermark Bits ($w_i$) | Hadamard Sequence ($p_{t_i}$) |
|---|---|---|
| 1 | {−1,−1,−1} | {+1,+1,+1,+1,+1,+1,+1,+1} |
| 2 | {−1,−1,+1} | {+1,−1,+1,−1,+1,−1,+1,−1} |
| 3 | {−1,+1,−1} | {+1,+1,−1,−1,+1,+1,−1,−1} |
| 4 | {−1,+1,+1} | {+1,−1,−1,+1,+1,−1,−1,+1} |
| 5 | {+1,−1,−1} | {+1,+1,+1,+1,−1,−1,−1,−1} |
| 6 | {+1,−1,+1} | {+1,−1,+1,−1,−1,+1,−1,+1} |
| 7 | {+1,+1,−1} | {+1,+1,−1,−1,−1,−1,+1,+1} |
| 8 | {+1,+1,+1} | {+1,−1,−1,+1,−1,+1,+1,−1} |
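The mapping in Table 2 corresponds to reading each bit group as a binary number (−1 → 0, +1 → 1, most significant bit first) and selecting that row of the natural-order Sylvester Hadamard matrix. The Python sketch below reproduces the table with `scipy.linalg.hadamard`; it covers only this example mapping, not the randomized (secured) sequences, and `bits_to_sequence` is a hypothetical name.

```python
from itertools import product

from scipy.linalg import hadamard


def bits_to_sequence(bits, H):
    """Return (t_i, p_{t_i}) for a group of ±1 bits, with t_i 1-based as in Table 2."""
    idx = 0
    for b in bits:
        idx = 2 * idx + (1 if b > 0 else 0)
    return idx + 1, H[idx]


H8 = hadamard(8)
for bits in product([-1, 1], repeat=3):  # {−1,−1,−1}, {−1,−1,+1}, ..., {+1,+1,+1}
    t, seq = bits_to_sequence(bits, H8)
    print(t, bits, seq.tolist())
```

The loop prints the eight rows of Table 2 in order.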
Table 3. Detection and reconstruction process.
Step 1: Detect $t_i$ from $Y$ using (22) to extract the hidden data.
Step 2: Associate each detected $t_i$ with $p_{t_i}$, and form $\hat{A}$ using (13).
Step 3: Reconstruct from $Y$ using $\hat{A}$ by (37), (39), (40), and (41) to obtain $S_r$.
Step 4: Combine $U_r$, $S_r$, and $V_r$ by SVD reconstruction (7) to obtain the decompressed signal as the 2D matrix $X_r$.
Step 5: Reshape the 2D matrix $X_r$ into a 1D vector, obtaining $X(k)$.
Step 6: Transform $X(k)$ to the time domain using an $L$-point IDCT, obtaining the reconstructed signal $x(n)$.
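The steps in Table 3 can be sketched in Python for an embedding of the form $Y = A S_r$ with natural-order Hadamard rows. Two substitutions are made and should be read as assumptions: the detector (22) is replaced by a maximum-correlation detector, which recovers each row index exactly in the noiseless case because the singular values are positive; and the OMP-based recovery of $S_r$ in (37)–(41) is replaced by a per-column least-squares estimate, which is exact when $\hat{A} = A$. `detect_and_reconstruct` and `index_to_bits` are hypothetical names.

```python
import numpy as np
from scipy.fft import idct
from scipy.linalg import hadamard


def index_to_bits(idx: int, ns: int):
    """Inverse of the Table 2 mapping: 0-based row index -> ns bits in {-1, +1}, MSB first."""
    return [1 if (idx >> (ns - 1 - k)) & 1 else -1 for k in range(ns)]


def detect_and_reconstruct(U_r, Y, Vt_r):
    """Return (detected bits, reconstructed time-domain frame) from U_r, Y = A S_r, and V_r^T."""
    p, r = Y.shape
    ns = r.bit_length() - 1
    H = hadamard(r)
    # Step 1 (assumed detector): pick the Hadamard row with the largest correlation.
    indices = np.argmax(Y @ H.T, axis=1)
    bits = np.array([b for t in indices for b in index_to_bits(int(t), ns)])
    # Step 2: rebuild the p x r matrix A_hat from the detected indices.
    A_hat = H[indices]
    # Step 3 (least-squares stand-in for OMP): column k of Y equals A_hat[:, k] * s_k.
    s_hat = np.sum(A_hat * Y, axis=0) / p
    # Step 4: rank-r SVD reconstruction of the M x M matrix.
    X_r = U_r @ np.diag(s_hat) @ Vt_r
    # Steps 5-6: flatten and apply the inverse orthonormal DCT.
    return bits, idct(X_r.reshape(-1), norm="ortho")
```

In this idealized, noiseless sketch, the embedded bits are recovered exactly and the reconstructed frame equals the rank-$r$ approximation of the original, so the rank truncation is the only source of distortion.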
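Table 4. The 10 highest ODG values.

| r | M | ODG | C (bps) | CR |
|---|---|---|---|---|
| 2 | 8 | −0.02 | 2756.25 | 1.60 |
| 2 | 7 | −0.02 | 2756.25 | 1.60 |
| 2 | 5 | −0.03 | 4833.33 | 1.11 |
| 2 | 10 | −0.03 | 1764 | 2.08 |
| 2 | 9 | −0.03 | 1764 | 2.08 |
| 2 | 6 | −0.03 | 4900 | 1.12 |
| 3 | 12 | −0.04 | 5512.50 | 1.45 |
| 3 | 11 | −0.04 | 5512.50 | 1.45 |
| 3 | 10 | −0.04 | 5512.50 | 1.45 |
| 2 | 12 | −0.05 | 1225 | 2.57 |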
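Table 5. The 10 highest payloads.

| r | M | ODG | C (bps) | CR |
|---|---|---|---|---|
| 5 | 20 | −0.29 | 8268.75 | 1.23 |
| 5 | 19 | −0.29 | 8268.75 | 1.23 |
| 5 | 18 | −0.29 | 8268.75 | 1.23 |
| 5 | 17 | −0.29 | 8268.75 | 1.23 |
| 5 | 16 | −0.29 | 8268.75 | 1.23 |
| 6 | 24 | −0.32 | 8268.75 | 1.14 |
| 6 | 23 | −0.32 | 8268.75 | 1.14 |
| 6 | 22 | −0.32 | 8268.75 | 1.14 |
| 6 | 21 | −0.32 | 8268.75 | 1.14 |
| 6 | 20 | −0.32 | 8268.75 | 1.14 |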
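Table 6. The 10 highest compression ratios.

| r | M | ODG | C (bps) | CR |
|---|---|---|---|---|
| 2 | 30 | −0.99 | 196 | 7.03 |
| 2 | 29 | −0.99 | 196 | 7.03 |
| 2 | 27 | −0.99 | 225 | 6.53 |
| 2 | 28 | −0.99 | 225 | 6.53 |
| 2 | 25 | −0.94 | 260.95 | 6.04 |
| 2 | 26 | −0.94 | 260.95 | 6.04 |
| 2 | 23 | −0.82 | 306.25 | 5.54 |
| 2 | 24 | −0.82 | 306.25 | 5.54 |
| 2 | 22 | −0.68 | 364.47 | 5.04 |
| 2 | 21 | −0.68 | 364.46 | 5.04 |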
Table 7. The detected watermark at certain SNR values.

| SNR (dB) | BER (%) | Detected Watermark |
|---|---|---|
| 0 | 24.1 | [image] |
| 5 | 18.7 | [image] |
| 10 | 17.4 | [image] |
| 15 | 14.3 | [image] |
| 20 | 11.7 | [image] |
| 25 | 8.8 | [image] |
| 30 | 6.1 | [image] |
| 35 | 5.7 | [image] |
| 40 | 5.6 | [image] |
| 45 | 3.6 | [image] |
| 50 | 0.4 | [image] |
| 55 | 0 | [image] |
Table 8. Comprehensiveness comparison.
Ref.Hiding MethodAudio ReconstructionAudio QualityRobustnessPayloadCompression Ratio
[1]Watermark Projection××××
[2]Semi-Fragile Zero Watermarking××××
[3]-××
[4]Basis Pursuit Denoising×
ProposedMulti-Bit Spread Spectrum
Table 9. Performance comparison.

| Ref. | Clips | Audio Quality | BER | C (bps) | CR |
|---|---|---|---|---|---|
| [4] | 1 | SNR = 28 dB | 0–3% | 11–344 | not reported |
| Proposed | 50 | ODG = [−0.94, −0.74] | 0–13% | 729–5292 | 1.47–4.84 |


MDPI and ACS Style

Budiman, G.; Suksmono, A.B.; Danudirdjo, D. Compressive Sampling with Multiple Bit Spread Spectrum-Based Data Hiding. Appl. Sci. 2020, 10, 4338. https://doi.org/10.3390/app10124338
