Article

Autocorrelation Modulation-Based Audio Blind Watermarking Robust Against High Efficiency Advanced Audio Coding

1 Department of Copyright Protection, Sangmyung University, Seoul 03016, Korea
2 Department of Electronics Engineering, Sangmyung University, Seoul 03016, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(14), 2780; https://doi.org/10.3390/app9142780
Submission received: 6 June 2019 / Revised: 5 July 2019 / Accepted: 8 July 2019 / Published: 10 July 2019

Abstract

High Efficiency Advanced Audio Coding (HE-AAC) is a lossy compression method for digital audio data that delivers high-quality audio at a very low bit rate. In this paper, an audio blind watermarking algorithm based on autocorrelation modulation is introduced to maximize robustness against low-bit-rate HE-AAC compression. The watermark is embedded by modulating the normalized correlation between the original signal and its delayed version. The signal-to-noise ratio between the signal before and after HE-AAC compression determines the embedding strength of the watermark, and a feedback process guarantees that this strength is reached. The effectiveness of the proposed method is evaluated using the Perceptual Evaluation of Audio Quality algorithm and the bit error rate of recovered watermarks under HE-AAC compression on mono, stereo and 5.1 channel audio. Experimental results show that the proposed method provides good performance in terms of imperceptibility, robustness and data payload compared with some recent state-of-the-art watermarking methods under an MPEG-2 Audio Layer III (MP3) compression attack.

1. Introduction

Over the last couple of years, High Efficiency Advanced Audio Coding (HE-AAC) [1,2,3] has become one of the most important enabling technologies for state-of-the-art multimedia systems. It delivers compact disc (CD) quality stereo at 48 kbps and 5.1 channel surround sound at 128 kbps. This level of efficiency makes it well suited to current internet content delivery and has enabled novel applications in mobile digital broadcasting and mobile markets. It also benefits video services such as digital television: by coupling HE-AAC with MPEG-4 video, more bits can be allocated to the video signal without reducing the quality of the audio signal.
Though HE-AAC is popular in the field of multimedia systems, as far as we know, no scientific paper on audio watermarking has developed a watermarking algorithm for HE-AAC or evaluated performance under it. Therefore, the objective of this study was to design an audio watermarking algorithm that maximizes robustness against low-bit-rate HE-AAC compression. This paper is an expansion of our previously published conference paper [4]. We describe the algorithm in more detail, with additional experiments and visualization. The parameters of the proposed algorithm are optimized through thorough experiments, and its performance on multi-channel audio is also reported. Moreover, synchronization design is discussed.
The rapid growth of computer multimedia technology as well as the broad utilization of the internet have promoted the distribution as well as transmission of digital multimedia content. Digital watermarking [5] refers to the procedure of embedding data into digital multimedia content such as audio, video, and image. It was initially utilized for security-related purposes like copyright protection and source tracking, but nowadays it is dedicated to various non-security-oriented applications [6], such as broadcast monitoring and automatic content recognition.
An audio watermarking method must comply with the following requirements [7]. (1) Imperceptibility: The watermarked audio signal should be perceptually similar to the original audio signal. (2) Security: Watermarks can only be detected by authorized personnel. (3) Payload: This is the number of bits that can be embedded in the audio per unit of time. The watermarking data payload needs to be greater than 20 bps (bits per second). (4) Robustness: The watermark should be resistant to common signal processing as well as malicious attacks. However, no algorithm can meet all of the demands mentioned above. The goal of a watermarking algorithm is to achieve an appropriate trade-off between these requirements. Security-oriented applications may require high robustness and security, because they are more likely to be subjected to malicious attacks. By contrast, non-security-oriented applications may require a high payload and a certain degree of robustness against one or two specific attacks that are known in advance.
The rest of the paper is organized as follows. Section 2 gives a brief introduction to HE-AAC and audio watermarking algorithms. Section 3 describes the proposed audio watermarking algorithm in detail. Experimental results are discussed in Section 4, which evaluates the algorithm in terms of data payload, robustness under compression attack, and imperceptibility. Finally, Section 5 summarizes the paper and discusses future directions.

2. Related Works

2.1. HE-AAC

HE-AAC is an extension of low complexity AAC (AAC-LC) that is optimized for low-bit-rate applications such as streaming audio. It has been standardized as a profile of the MPEG-4 audio standard. Two versions of HE-AAC are available: HE-AAC v1 and HE-AAC v2. HE-AAC v1 combines two principal technologies, spectral band replication (SBR) and AAC-LC. In contrast to version 1, version 2 additionally uses a technique named parametric stereo (PS) that compresses stereo signals more effectively. Figure 1 shows the HE-AAC family of techniques.
AAC-LC is a commonly used audio compression codec. It discards parts of the audio signal by utilizing a psychoacoustic model and keeps only the perceptually relevant information. Though it has good audio quality at a bit rate of 128 kbps for mono audio, the audio quality begins to decline significantly below this bit rate. In order to achieve good audio quality at a low compression bit rate, two complementary technologies, SBR and PS, are exploited.
SBR can improve compression efficiency in the frequency domain. At low frequencies, there is a lot of important audio information. SBR utilizes the strong relevance between high frequency and low frequency audio signals to reconstruct the high frequency signals through approximation as well as the transposition of low frequency signals rather than transfer the data of high frequency audio.
The compression efficiency of stereo signal is improved by PS. PS technology mixes stereo signals downward into mono-channel signals, extracts parametric stereo data, and describes the differences as well as similarities of two channels. When decoding, the original stereo signal is reconstructed using parametric stereo data and a mono-channel signal.

2.2. Audio Watermarking Algorithms

Over the past few years, most proposed audio watermarking algorithms [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24] can be grouped into two categories: Spread spectrum (SS) modulation [8,9,10] and quantization index modulation (QIM) [11,12,13,14]. All these methods utilize either a time domain [15,16] or a transform domain such as fast Fourier transform [17], discrete cosine transform [18], or discrete wavelet transform [19]. From the watermark extraction mode, digital watermarking algorithms can be divided into three schemes: Blind, semi-blind and non-blind. Among them, blind methods, which extract watermarks without knowing the host audio signal, are highly desirable, because semi-blind and non-blind methods are not applicable to most practical applications.
The SS-based watermarking scheme embeds a watermark bit into a host audio segment by using pseudorandom number sequences which are shaped to fit under the masking threshold of the audio signal. At the extractor, the watermarks are extracted by a sliding correlator that correlates the received signal to the predefined spread spectrum template. One of the main drawbacks of the existing SS-based audio watermarking methods is their low embedding capacity. Hence, their main task is to increase embedding capacity while maintaining high imperceptibility and robustness.
The QIM approach uses a set of quantizers to quantize features of the host signal in order to embed watermarking data, and each quantizer is associated with different information. In most cases, to obtain high robustness, quantization is applied to coefficients in the transform domain rather than to signal samples. Though the QIM approach usually has the merits of low complexity and high embedding capacity, it has low robustness.
Another audio watermarking algorithm is autocorrelation modulation [20,21,22]. Its basic idea is that inserting a delayed or advanced version of the host signal itself can modify autocorrelation. In other words, it uses a delayed and modulated version of the host signal as a watermark signal (the difference between the host and watermarked signal), which is then inserted back to the host signal to make the final watermarked signal. A common watermarking approach is to introduce the watermark signal as noise, but a drawback to these approaches is that lossy audio compression algorithms tend to remove most imperceptible artifacts, including typical low dB noise. Autocorrelation modulation introduces changes to the host signal that are characteristic of environmental conditions rather than random noise. Therefore, using a host signal as a watermark signal can be robust against lossy audio compression. In this paper, we present an autocorrelation modulation-based audio blind watermarking algorithm which maximizes robustness against low-bit rate HE-AAC compression. In the next section, the detail of the proposed audio watermarking method is described.

3. Proposed Audio Watermarking Method

This section first provides the following definitions:
  • The host audio signal $x(t)$ is bandpass filtered to output the filtered audio signal $s(t)$, which is further modulated to become a watermark signal. The bandpass filtering serves two purposes. One is to use only part of the host signal, which reduces the disturbance to the audio signal. The other is to avoid embedding watermarks at high frequencies, as HE-AAC entirely removes the high-frequency range of the signal during encoding. The lower and upper cutoff frequencies are denoted as $F_{lc}$ and $F_{uc}$, respectively.
  • Then, the filtered audio signal is divided into successive frames, each of which has length $T_f$ and contains two non-overlapping sections of samples. These two sections have equal length, and we call them the front subframe and the back subframe in this paper.
  • One piece of watermark information is represented as one binary bit of value ‘0’ or ‘1,’ which is embedded in one frame.
  • The normalized correlation of the original signal and its delayed version (NCOD) was selected as the characteristic of each subframe. The embedding and extracting of watermark bits are decided by the difference of NCOD relations between the front subframe and back subframe. The NCOD item is calculated as follows:
    $$C_i^1 = \sum_{t=0}^{T} \left[ s(T_f \cdot i + t) \cdot s(T_f \cdot i + t + \tau) \right] \tag{1}$$
    $$N_i^1 = \sum_{t=0}^{T_f/2} s(T_f \cdot i + t)^2 \tag{2}$$
    $$NC_i^1 = C_i^1 / N_i^1 \tag{3}$$
    $$C_i^2 = \sum_{t=T_f/2}^{T_f/2+T} \left[ s(T_f \cdot i + t) \cdot s(T_f \cdot i + t + \tau) \right] \tag{4}$$
    $$N_i^2 = \sum_{t=T_f/2}^{T_f} s(T_f \cdot i + t)^2 \tag{5}$$
    $$NC_i^2 = C_i^2 / N_i^2 \tag{6}$$
    where $i$ ($i = 0, 1, 2, \ldots$) represents the frame index. The integration time $T$ (in samples) should satisfy $T \le T_f - \tau$ to avoid intersymbol interference. For a frame, $NC_i^1$ and $NC_i^2$ are the NCOD values of the front subframe and back subframe, respectively. Since the correlation value is normalized, $NC \in [-1, 1]$.
  • The difference between $NC_i^1$ and $NC_i^2$ is computed to obtain:
    $$D_i = NC_i^1 - NC_i^2 \tag{7}$$
If the watermark bit is ‘1,’ $D_i$ should be larger than 0 ($D_i > 0$). If the watermark bit is ‘0,’ $D_i$ should be smaller than 0 ($D_i < 0$). Basically, raising $|D_i|$ enhances robustness. On the basis of the value of $D_i$, one bit of watermark information is embedded into and extracted from each frame.
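The NCOD definitions in Equations (1)–(7) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `ncod_pair` and `ncod_difference` are hypothetical, the sums use half-open ranges (the last index of each sum is excluded), and the integration time is assumed to satisfy `integ <= frame_len // 2 - tau` so that each correlation stays inside its frame.

```python
from typing import Sequence, Tuple


def ncod_pair(s: Sequence[float], i: int, frame_len: int, tau: int,
              integ: int) -> Tuple[float, float]:
    """Return (NC_i^1, NC_i^2) of frame i, in the spirit of Equations (1)-(6)."""
    half = frame_len // 2
    base = frame_len * i

    def ncod(start: int, stop: int) -> float:
        # Correlation of the subframe with its tau-delayed copy.
        c = sum(s[base + t] * s[base + t + tau]
                for t in range(start, start + integ))
        # Subframe energy used for normalization.
        n = sum(s[base + t] ** 2 for t in range(start, stop))
        # Assumption: a silent subframe is given an NCOD of 0.
        return c / n if n > 0 else 0.0

    return ncod(0, half), ncod(half, frame_len)


def ncod_difference(s: Sequence[float], i: int, frame_len: int, tau: int,
                    integ: int) -> float:
    """Return D_i = NC_i^1 - NC_i^2 as in Equation (7)."""
    nc1, nc2 = ncod_pair(s, i, frame_len, tau, integ)
    return nc1 - nc2
```

A frame whose front subframe alternates in sign and whose back subframe is constant gives a strongly negative $D_i$ for a delay of one sample, which would be read as bit ‘0.’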

3.1. Watermark Embedding Scheme

Figure 2 presents the process of the watermark embedding. Firstly, bandpass filtering is applied to the host signal $x(t)$ to get the filtered signal $s(t)$. Then, the filtered signal is separated into successive frames, each of which is divided into a front subframe and a back subframe. We compute a gain based on the scale factor $\xi_i$, the original $D_i$, and the watermark bit $w_i$, and then multiply the front and back subframes by this gain with mutually opposite signs to generate the watermark signal. In the end, the watermark signal is added back into the host signal with a specified delay $\tau$ to modify the NCOD values.
When modulated subframes are added back to the host signal, there are sudden changes at the boundaries, which cause audible noise. To avoid this, the amplitude at the boundaries of the modulated subframes is attenuated. In our implementation, the window function is calculated by the following equation:
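$$f_w(t) = \begin{cases} (t - 1) \cdot \Delta k, & 1 \le t < l_a \\ 1, & l_a \le t \le T - l_a \\ (T - t) \cdot \Delta k, & T - l_a < t \le T \end{cases} \qquad l_a \cdot |\Delta k| = 1, \quad 1 < l_a < T/2 \tag{8}$$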
where $\Delta k$ is the attenuation interval and $l_a$ is the attenuation length at the boundary. We set $|\Delta k| = 1 / l_a$. Figure 3 shows the window.
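As a rough illustration of the window described above, the following sketch builds a trapezoidal window with linear ramps of `l_a` samples at both ends and a flat top in between. The function name `attenuation_window` and the parameter name `length` (the number of samples the window covers) are hypothetical; the sample index is assumed to run from 1 to `length`.

```python
from typing import List


def attenuation_window(length: int, l_a: int) -> List[float]:
    """Trapezoidal window f_w(t) for t = 1..length with ramps of l_a samples."""
    if not 1 < l_a < length / 2:
        raise ValueError("l_a must satisfy 1 < l_a < length / 2")
    delta_k = 1.0 / l_a  # attenuation interval, |delta_k| = 1 / l_a
    window = []
    for t in range(1, length + 1):
        if t < l_a:
            window.append((t - 1) * delta_k)       # rising edge
        elif t <= length - l_a:
            window.append(1.0)                      # flat top
        else:
            window.append((length - t) * delta_k)   # falling edge
    return window
```

With `l_a = 10`, the first and last samples are scaled to 0 and the ramp advances by 0.1 per sample, so a modulated subframe fades in and out instead of starting abruptly.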
The watermark embedding procedure of each frame is illustrated by the pseudocode in Algorithm 1. Here, $b_i$ is the bipolar form of $w_i$, i.e., $w_i \in \{0, 1\} \rightarrow b_i \in \{-1, +1\}$. $TH_i$ is the threshold that determines the gap in NCOD between the front subframe and back subframe.
Algorithm 1. Watermark embedding process
  b_i = w_i · 2 − 1
  D_i = NC_i^1 − NC_i^2
  TH_i = ξ_i · b_i
  if (w_i == 1 and D_i > TH_i) or (w_i == 0 and D_i < TH_i)
    return   // return without performing any action
  else
    g_i = TH_i − D_i
  end
  front subframe = front subframe · g_i · f_w
  back subframe = back subframe · (−g_i) · f_w
If $D_i$ already meets the watermark embedding condition, no further modulation of the frame is required. Otherwise, a gain value $g_i$ is calculated as $TH_i$ minus $D_i$, and the front subframe is multiplied by this gain. In contrast, the back subframe is multiplied by $-g_i$. These subframes are further multiplied by the window function $f_w$.
In other words, the natural NCOD is calculated first, and then the necessary modifications are determined. For better visualization, Figure 4 illustrates the watermark embedding procedure for two frames.
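The decision logic of Algorithm 1 can be written out as a small sketch. The helper names `embedding_gain` and `modulate_subframes` are hypothetical, and the sketch only produces the modulated subframes (the watermark signal); adding them back to the host signal with delay $\tau$ is left out.

```python
from typing import List, Optional, Sequence, Tuple


def embedding_gain(w_i: int, d_i: float, xi_i: float) -> Optional[float]:
    """Return the gain g_i of Algorithm 1, or None if no change is needed."""
    b_i = 2 * w_i - 1   # bipolar bit: 0 -> -1, 1 -> +1
    th_i = xi_i * b_i   # signed threshold TH_i
    if (w_i == 1 and d_i > th_i) or (w_i == 0 and d_i < th_i):
        return None     # the natural NCOD gap already encodes the bit
    return th_i - d_i


def modulate_subframes(front: Sequence[float], back: Sequence[float],
                       g_i: float, window: Sequence[float]
                       ) -> Tuple[List[float], List[float]]:
    """Scale the front subframe by +g_i and the back subframe by -g_i."""
    wm_front = [x * g_i * f for x, f in zip(front, window)]
    wm_back = [x * (-g_i) * f for x, f in zip(back, window)]
    return wm_front, wm_back
```

For example, with $\xi_i = 0.2$ and bit ‘1,’ a frame with $D_i = 0.5$ is left untouched, while a frame with $D_i = 0.1$ receives a gain of about 0.1.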
The scale factor $\xi_i$ is utilized to balance the requirements of robustness and transparency. The main purpose of the proposed audio watermarking algorithm is to be robust against an HE-AAC compression attack, so we designed the algorithm to adaptively adjust $\xi_i$ by using the signal-to-noise ratio (SNR) between the host signal and the HE-AAC compressed signal. For convenience, we call this SNR the CSNR in this paper. Figure 5 presents the calculation process of $\xi_i$.
We can calculate the CSNR of each frame by utilizing the following formula:
$$SNR_i = 20 \cdot \log_{10} \frac{\sum_{t = T_f \cdot i}^{T_f \cdot (i+1)} |x(t)|}{\sum_{t = T_f \cdot i}^{T_f \cdot (i+1)} |x(t) - y(t)|} \tag{9}$$
where $x(t)$ is the host signal and $y(t)$ is its compressed version. A lower CSNR value indicates that the frame was compressed more heavily, which means that more of the signal regarded as redundant was removed. Thus, a frame with a lower CSNR value gets a larger $\xi_i$ value to guarantee robustness, and vice versa.
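A per-frame CSNR in the sense of the formula above could be computed as in the following sketch. The function name `frame_csnr` is hypothetical, the host and decoded signals are assumed to be time-aligned sequences of equal length, and the sums use a half-open range over the frame. Returning infinity for an unchanged frame is a choice made for this sketch, not something stated in the text.

```python
import math
from typing import Sequence


def frame_csnr(x: Sequence[float], y: Sequence[float], i: int,
               frame_len: int) -> float:
    """CSNR (in dB) of frame i between host x and its compressed version y."""
    start, stop = frame_len * i, frame_len * (i + 1)
    signal = sum(abs(x[t]) for t in range(start, stop))
    noise = sum(abs(x[t] - y[t]) for t in range(start, stop))
    if noise == 0:
        return math.inf    # assumption: compression left the frame unchanged
    if signal == 0:
        return -math.inf   # assumption: silent host frame with nonzero error
    return 20.0 * math.log10(signal / noise)
```

A frame in which every sample loses 10% of its amplitude has a signal-to-error ratio of 10 and therefore a CSNR of about 20 dB.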

3.2. Watermark Extraction Scheme

Figure 6 shows the proposed watermark extraction procedure, where the $|\cdot|$ symbol denotes the absolute value. The procedure needs to know the embedding parameters in advance: the passband frequency $[F_{lc}, F_{uc}]$, the frame length $T_f$, and the delay $\tau$. The watermark extraction procedure can be regarded as a part of the embedding procedure. Assuming that the start point of watermark embedding has been found, the watermarked signal $\bar{x}(t)$ is first filtered to produce the filtered watermarked signal $\bar{s}(t)$. After that, Equations (1)–(7) are used to calculate $\bar{D}_i$ for each frame. If $\bar{D}_i > 0$, the detected bit is ‘1.’ If $\bar{D}_i < 0$, the detected bit is ‘0.’ Note that the proposed method does not need the original signal in the watermark extraction process, which makes it a blind audio watermarking method.
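The extraction rule can be sketched as a loop over frames. The sketch below assumes the input has already been bandpass filtered and that the first sample is the start of a watermarked frame; the function name `extract_bits` is hypothetical, and mapping $\bar{D}_i = 0$ to bit ‘0’ is an arbitrary choice, since the text does not define that case.

```python
from typing import List, Sequence


def extract_bits(s_bar: Sequence[float], frame_len: int, tau: int,
                 integ: int) -> List[int]:
    """Detect one bit per complete frame from the sign of D-bar."""
    half = frame_len // 2
    bits = []
    for i in range(len(s_bar) // frame_len):
        base = i * frame_len
        nc = []
        for start in (base, base + half):  # front subframe, back subframe
            c = sum(s_bar[start + t] * s_bar[start + t + tau]
                    for t in range(integ))
            n = sum(v * v for v in s_bar[start:start + half])
            nc.append(c / n if n > 0 else 0.0)
        bits.append(1 if nc[0] - nc[1] > 0 else 0)
    return bits
```

No access to the host signal is required here, which is what makes the scheme blind.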

3.3. Feedback Process

A feedback process was added to the watermark embedding procedure (Figure 2) to ensure that the watermark is embedded with enough strength. As illustrated in Figure 7, after embedding the watermark, the generated watermarked signal $\bar{x}(t)$ is immediately input into the watermark extractor to calculate the $\bar{D}_i$ values. By comparing the extracted $\bar{D}_i$ with $TH_i$, we can determine whether the desired watermarking strength has been achieved. If the strength is not enough, the gain $g$ is increased by a small quantity $\alpha$ and fed back into the watermark embedder to generate a stronger watermark signal. The feedback process works in an iterative manner.
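The iterative feedback can be sketched as a loop around an embed-then-extract step. Everything below is an assumption-laden illustration: `measure_d` is a hypothetical callable that embeds a frame with a given gain and returns the re-extracted $\bar{D}_i$, "increasing" the gain is interpreted as growing its magnitude in the direction of $TH_i$, and the loop stops after `max_iter` adjustments.

```python
from typing import Callable, Tuple


def feedback_gain(g_i: float, th_i: float,
                  measure_d: Callable[[float], float],
                  alpha: float = 0.005,
                  max_iter: int = 5) -> Tuple[float, int]:
    """Strengthen g_i in steps of alpha until the extracted D reaches TH_i."""
    sign = 1.0 if th_i >= 0 else -1.0
    iterations = 0
    while iterations < max_iter:
        d_bar = measure_d(g_i)          # embed with g_i, then re-extract D
        if sign * d_bar >= abs(th_i):   # desired strength reached
            break
        g_i += sign * alpha             # slightly stronger watermark
        iterations += 1
    return g_i, iterations
```

In a real embedder, `measure_d` would wrap the modulation, the delayed addition to the host signal, and the NCOD computation of the extractor.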

4. Experimental Results

In this section, we present experimental results that demonstrate the performance of the proposed audio watermarking algorithm. First, we describe parameter optimization experiments, followed by a comparison of the results with some state-of-the-art audio watermarking algorithms. We also show experimental results on 5.1 channel and stereo audio. Finally, synchronization is discussed.
Like other watermarking algorithms, we used bit error rate (BER) as a metric to evaluate the robustness of the watermarking algorithm. It is defined as:
$$BER = \frac{\text{Number of incorrect bits}}{\text{Number of total bits}} \times 100\% \tag{10}$$
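As a small worked example of this metric, the sketch below compares an embedded and an extracted bit sequence of equal length; the function name `bit_error_rate` is hypothetical.

```python
from typing import Sequence


def bit_error_rate(sent: Sequence[int], received: Sequence[int]) -> float:
    """Percentage of positions where the extracted bit differs."""
    if not sent or len(sent) != len(received):
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(a != b for a, b in zip(sent, received))
    return 100.0 * errors / len(sent)
```

One wrong bit out of four gives a BER of 25%.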
The objective difference grade (ODG) was utilized to measure imperceptibility, as it is one of the output values acquired according to the Perceptual Evaluation of Audio Quality (PEAQ) measurement technique prescribed by the ITU-R BS.1387 standard [25]. The ODG values were between 0.0 and −4.0 (imperceptible to very annoying), and they are shown in Table 1.
In the rest of this section, unless otherwise noted, experiments were performed with the following default parameters: frame length $T_f = 500$ samples (96 bps), delay $\tau = 45$ samples, integration time $T = 500/2 - 45 = 205$ samples, $l_a = 10$ samples, $\Delta k = 0.1$, $\alpha = 0.005$, and passband frequency $[F_{lc}, F_{uc}] = [2500, 5500]$ Hz. CSNR was calculated using 24 kbps HE-AAC v1 compression. We terminated the feedback process of a frame if it had been conducted more than five times; this resulted in an average of 2.13 feedback iterations per frame.
The ODG was measured with the basic PEAQ model, using the implementation from the Telecommunications & Signal Processing Laboratory of McGill University [26]. The “fdkaac.exe” software [27], which was developed by Fraunhofer, was used to conduct HE-AAC compression. The “ffmpeg.exe” software [28] was used to conduct MP3 compression and to decode the compressed audio.

4.1. Experiments on Mono Audio

We used seven audio clips belonging to seven different genres as host audio signals to illustrate the performance of our watermarking algorithm on mono-channel audio. They were “Drama,” “Debate,” “Sports Commentary,” “Classical Music,” “Jazz Music,” “Pop Music,” and “Rock Music.” All audio files were in the WAVE format, mono, sampled at 48 kHz, quantized with 16 bits, and 30 s long.
The scale factor $\xi_i$ was adaptively adjusted by CSNR, as we mentioned in Section 3.1. Figure 8 shows the CSNR under 24 kbps HE-AAC v1 compression for two test audio signals, as well as the summed CSNR histogram of all seven test audio signals. From the histogram, we can see that most CSNR values were distributed from 0 to 15. Based on this distribution, we found that calculating $\xi_i$ by the following Equation (11) balanced the trade-off between robustness and imperceptibility well in our implementation.
$$\xi_i = \begin{cases} 0.2, & 14 \le CSNR_i \\ -0.05 \cdot CSNR_i + 0.9, & 0 \le CSNR_i < 14 \\ 0.9, & CSNR_i < 0 \end{cases} \tag{11}$$
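Equation (11) is a clamped linear map from CSNR to the scale factor, which can be sketched as follows. The function name `scale_factor` is hypothetical, and the slope of the middle branch is taken as negative so that the three branches join continuously at CSNR values of 0 and 14.

```python
def scale_factor(csnr: float) -> float:
    """Map a frame's CSNR (dB) to the scale factor xi_i of Equation (11)."""
    if csnr >= 14:
        return 0.2                   # lightly compressed frame: weak embedding
    if csnr >= 0:
        return -0.05 * csnr + 0.9    # linear transition between 0.9 and 0.2
    return 0.9                       # heavily compressed frame: strong embedding
```

A frame with a CSNR of 7 dB, in the middle of the transition, would get a scale factor of about 0.55.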
The data payload refers to the number of bits that can be embedded into the audio signal within a unit of time and is measured in bps (bits per second). The data payload of our algorithm is determined by the frame length $T_f$, since one watermark bit is embedded in each frame. For example, if $T_f = 500$ and the sampling rate of the host audio is 48,000 Hz, we can embed 48,000/500 = 96 bits per second in the audio. As the frame length affects the imperceptibility and robustness of our algorithm, Table 2 and Figure 9 show how the change of data payload influenced the ODG and BER, where the BER was calculated after HE-AAC v1 24 kbps compression. From the table and graph, we can see that the BER was under 2% when the data payload was lower than 96 bps, but it rose rapidly from 1.77% to 8.77% when the data payload increased from 96 to 192 bps ($T_f = 250$). With the increase of data payload, the ODG increased slightly and remained larger than −1, which indicates that the original and watermarked audio signals were perceptually similar and the difference was not annoying.
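The payload arithmetic above is a single division, shown here as a tiny sketch with a hypothetical helper name.

```python
def data_payload_bps(sample_rate_hz: int, frame_len: int) -> float:
    """Bits per second when one watermark bit is embedded per frame."""
    return sample_rate_hz / frame_len
```

At 48,000 Hz, a frame length of 500 samples gives 96 bps and a frame length of 250 samples gives 192 bps.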
Besides the frame length, the other important parameter of the proposed algorithm is the passband frequency. If the selected passband frequency is too low or too high, it is likely to yield poor BER results, as the audio compression will remove most of the signal in that frequency range based on the psychoacoustic model. Moreover, HE-AAC applies SBR and directly cuts off high frequencies, which results in worse BER results. Table 3 and Figure 10 show how the selected passband frequency influenced the ODG and BER, where the BER was calculated after HE-AAC v1 24 kbps compression. As shown in the table and graph, when using a frequency higher than 3.5 kHz, the BER was over 5% and rose rapidly. With a passband of 0.5~3.5 kHz, the ODG was lower than −1, whereas it was larger than −1 for all other frequency ranges, and the BER was also worse than with 1.5~4.5 kHz and 2.5~5.5 kHz. Therefore, we can conclude that the 1.5~5.5 kHz frequency range is a good choice for embedding watermarks with the proposed algorithm. Note that the human hearing system is very sensitive to this frequency range according to the absolute threshold of hearing. Almost all audio watermarking methods were designed to avoid this range to meet the imperceptibility requirement, whereas our method achieved high imperceptibility in the 1.5~5.5 kHz frequency band while maintaining a low BER.
Delay is also a significant factor that influences the performance of the proposed algorithm. If the delay is too large, the number of samples accumulated for calculating the autocorrelation decreases, which causes an increase in the BER. The results in Table 4 and Figure 11 show how the delay affected the ODG and BER. We can see that a delay of around 45 samples gave a good result, where the ODG was over −1 and the BER was under 3%. Though the ODG improved when the delay was over 100 samples, the BER rose above 3%.
Figure 12 and Table 5 show how the attenuation length at the boundary of the window influenced the BER and ODG. It can be observed that the ODG steadily rose as the attenuation length increased and the BER decreased. An attenuation length of 10–25 is preferable, as the BER increased quickly over 25, and the ODG was relatively low at 5.
Table 6 presents our comparison results. The comparison was based on the reported results of recently published papers [8,9,12,13,23,24] and covers the data payload, ODG, and BER under MP3 compression (32 kbps, 64 kbps, 96 kbps, 128 kbps). We also list the BER under various HE-AAC v1 compression bit rates (16 kbps, 24 kbps, 32 kbps, 64 kbps) with various data payloads. As we can see from the table, the proposed algorithm is competitive with other methods at similar data payloads under MP3 compression, indicating that our algorithm is also robust against MP3 compression. Comparing the BER of our algorithm at the same bit rates (32 kbps, 64 kbps) under MP3 and HE-AAC v1 compression, we found that the BER under HE-AAC v1 was nearly half of that under MP3. From the BER results at various HE-AAC v1 compression bit rates, we can see that our algorithm was robust against HE-AAC v1 compression, with a low BER, where the bit rate was higher than 24 kbps. Unfortunately, we could not compare results under HE-AAC v1, as no other paper has evaluated robustness under HE-AAC v1 compression.

4.2. Experiments on 5.1 Channel and Stereo Audio

Multichannel audio systems have become more and more popular in home entertainment environments. As HE-AAC is one of the most efficient audio codecs for multichannel audio, here we present experimental results under HE-AAC v1 and v2 compression by applying our watermarking algorithm on 5.1 channel audio and stereo audio, respectively. The CSNR was calculated using HE-AAC v1 128 kbps compression for 5.1 channel audio and HE-AAC v2 48 kbps compression for stereo audio.
We used the twelve 5.1 channel audio types, which are listed in Table 7. All 5.1 channel audio files were in WAVE format, sampled at 48 kHz, quantized with 16 bits, and 10~20 s long. Figure 13 shows the time domain waveforms of the “Bach organ” 5.1 channel audio.
The channels, from top to bottom, are front left (FL), front right (FR), center (C), low-frequency effects (LFE), surround left (SL), and surround right (SR). We independently embedded a watermark on each channel except LFE, as that channel showed nearly no signal in our test audio. Table 8 provides the BER of each channel of 5.1 channel audio under HE-AAC v1 128 kbps compression, as well as the corresponding ODG of each audio. The ODG was first calculated for each channel except LFE and then averaged to report one value per audio in the table. From the table, we can observe that the BER differed greatly between the channels of an audio. For example, the BER of the center channel of the No.1 audio was only 0.18%, but the BER of its front right channel was 6.21%, more than 5 percentage points higher. Regarding the ODG, all test audio yielded ODG values larger than −1, except for three test audios that were slightly lower than −1. The average BER of the twelve test audios was 2.53%, and the average ODG was −0.84.
To create the stereo test audio, we down-mixed the 5.1 channel audio by the following formula: FL = FL + 0.707 × FC + 0.5 × SL; FR = FR + 0.707 × FC + 0.5 × SR. Figure 14 shows the time domain waveforms of the “Bach organ” stereo audio, which was made from the corresponding 5.1 channel audio. As in the experiment on 5.1 channel audio, we independently embedded a watermark on each channel. Table 9 provides the BER of each channel of stereo audio under HE-AAC v2 48 kbps compression, as well as the corresponding ODG of each audio type. From the table, we can see that the average BER of the twelve test audios was 1.28%, and the average ODG was −0.78.
Furthermore, we present the BER of the multichannel audio under HE-AAC compression at various bit rates in Table 10. It can be seen that our watermarking algorithm was robust against HE-AAC v1 and HE-AAC v2 compression when the bit rate was higher than 128 and 32 kbps, respectively, for 5.1 channel and stereo audio, as the BER was under 3% at a data payload of 96 bps, and the ODG was larger than −1. However, our algorithm showed poor robustness for extremely low bit rate compression, such as 5.1 channel audio with HE-AAC v1 at 64 kbps and stereo audio with HE-AAC v2 at 16 kbps, where the BER was over 20%.
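The down-mix formula can be sketched per sample as follows. The function name `downmix_to_stereo` is hypothetical, the channels are assumed to be equal-length sequences of samples, and no clipping protection or renormalization is applied because the text does not mention any.

```python
from typing import List, Sequence, Tuple


def downmix_to_stereo(fl: Sequence[float], fr: Sequence[float],
                      fc: Sequence[float], sl: Sequence[float],
                      sr: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Down-mix 5.1 channels (LFE ignored) to a left/right stereo pair."""
    left = [a + 0.707 * c + 0.5 * s for a, c, s in zip(fl, fc, sl)]
    right = [b + 0.707 * c + 0.5 * s for b, c, s in zip(fr, fc, sr)]
    return left, right
```

The center channel contributes equally to both outputs, while each surround channel only feeds the output on its own side.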

4.3. Synchronization Design

In order to apply the audio watermarking technique in real situations, integrity information should be embedded repeatedly in the audio signal within each block unit. Each block must be further segmented to the synchronization codes part and the watermark information part. Synchronization is an effective way to accurately identify the watermark location.
In our algorithm, we can use a certain number of consecutive frames as the synchronization code part by embedding a fixed watermark sequence. In the extraction process, by checking whether the extracted consecutive watermark bits are the same as the synchronization code, or whether a certain proportion of them is correct, we can determine the start of a block.
As we used the correlation between the signal and a delayed version of itself to extract the watermark, finding the exact start sample of each watermarked frame was not necessary because the correlation values $\bar{D}$ around the start point were similar. Figure 15 shows the $\bar{D}$ values calculated for each of the 5000 samples of “Drama” audio after HE-AAC v1 24 kbps compression, where the frame length was 500 samples and the watermark bits [1, 0, 1, 0, 1, 0, 1, 0, 1, 0] were embedded. As we mentioned in Section 3.2, the $\bar{D}$ value was calculated to determine the extracted watermark bit. From the figure, we can see that the $\bar{D}$ values changed slowly and were not likely to change abruptly within several samples. Hence, we could efficiently find the start of a block by skipping a number of samples each time when checking for the synchronization code, instead of brute-force searching through each sample. The number of samples to skip depends on the frame length, and a larger skip could be chosen with a longer frame length.
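The skipping search described above could look like the following sketch. It is only an illustration under stated assumptions: `extract_at` is a hypothetical callable that runs the extractor from a given sample offset and returns the requested number of bits, `step` is the number of samples skipped between attempts, and `min_match` is the required proportion of correct synchronization bits.

```python
from typing import Callable, List, Optional, Sequence


def find_block_start(extract_at: Callable[[int, int], List[int]],
                     sync_code: Sequence[int], search_len: int,
                     step: int, min_match: float = 1.0) -> Optional[int]:
    """Try offsets 0, step, 2*step, ... and return the first offset whose
    extracted bits agree with sync_code in at least min_match of positions."""
    n = len(sync_code)
    for offset in range(0, search_len, step):
        bits = extract_at(offset, n)
        if len(bits) != n:
            continue  # not enough audio left at this offset
        matches = sum(a == b for a, b in zip(bits, sync_code))
        if matches / n >= min_match:
            return offset
    return None
```

Because the returned offset may be several samples away from the true frame boundary, a finer search around it could follow if an exact start were needed.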
To ensure extracting complete information, according to the BER and the length of watermark information, an appropriate error correcting code should be applied.

5. Conclusions

In this paper, an audio blind watermarking algorithm based on autocorrelation modulation was proposed. The difference in NCOD relations between the front subframe and back subframe was used to embed and extract the watermark bits of each frame. The SNR before and after HE-AAC compression was used to adaptively adjust the scale factor, which was further used to balance the trade-off between robustness and imperceptibility. In addition, a feedback process was added to the watermark embedding procedure to ensure that the watermark was embedded with enough strength.
With optimized parameters, the experimental results show that our algorithm is robust against low bit rate (24 kbps for mono, 32 kbps for stereo and 128 kbps for 5.1 channel) HE-AAC compression, with a BER under 3%, while ensuring a high level of imperceptibility (the average ODG was over −1) and data payload (96 bps). Synchronization for our algorithm was also discussed.
In the future, we will study a more efficient and effective watermarking method for multichannel audio to achieve a higher data payload. We will further explore a suitable error correction code and synchronization to the method to develop a real-time application.

Author Contributions

Both authors contributed to the research work. Both authors designed the new method and planned the experiments. J.K. led and reviewed the research work. Y.H. performed the experiments and wrote the paper.

Funding

This research was supported by a 2018 Research Grant from Sangmyung University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Herre, J.; Dietz, M. MPEG-4 high-efficiency AAC coding [Standards in a Nutshell]. IEEE Signal Process. Mag. 2008, 25, 137–142.
  2. Wolters, M.; Kjorling, K.; Homm, D.; Purnhagen, H. A Closer Look into MPEG-4 High Efficiency AAC; Audio Engineering Society: New York, NY, USA, 2003.
  3. Meltzer, S.; Moser, G. MPEG-4 HE-AAC v2–Audio Coding for Today’s Digital Media World; EBU Technical Review; EBU: Geneva, Switzerland, 2006; pp. 37–48.
  4. Shin, D.; Hong, Y.; Kim, J.; Choi, J. Audio Blind Watermarking Robust Against HE-AAC. In Proceedings of the 8th International Conference on Signal Processing Systems, Auckland, New Zealand, 21–24 November 2016; ACM: New York, NY, USA, 2016; pp. 114–118.
  5. Cox, I.; Miller, M.; Bloom, J.; Fridrich, J.; Kalker, T. Digital Watermarking and Steganography, 2nd ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2008; ISBN 978-0-08-055580-5.
  6. Barni, M. What is the future for watermarking? (Part II). IEEE Signal Process. Mag. 2003, 20, 53–59.
  7. Katzenbeisser, S. Information Hiding Techniques for Steganography and Digital Watermarking, 1st ed.; Katzenbeisser, S., Petitcolas, F.A., Eds.; Artech House, Inc.: Norwood, MA, USA, 2000; ISBN 978-1-58053-035-4.
  8. Cox, I.J.; Kilian, J.; Leighton, F.T.; Shamoon, T. Secure spread spectrum watermarking for multimedia. IEEE Trans. Image Process. 1997, 6, 1673–1687.
  9. Zhang, P.; Xu, S.Z.; Yang, H.Z. Robust audio watermarking based on extended improved spread spectrum with perceptual masking. Int. J. Fuzzy Syst. 2012, 14, 289–295.
  10. Xiang, Y.; Natgunanathan, I.; Rong, Y.; Guo, S. Spread Spectrum-Based High Embedding Capacity Watermarking Method for Audio Signals. IEEE/ACM Trans. Audio Speech Lang. Process. 2015, 23, 2228–2237.
  11. Yang, S.; Tan, W.; Chen, Y.; Ma, W. Quantization-Based Digital Audio Watermarking in Discrete Fourier Transform Domain. J. Multimed. 2010, 5.
  12. Elshazly, A.R.; Fouad, M.M.; Nasr, M.E. Secure and robust high quality DWT domain audio watermarking algorithm with binary image. In Proceedings of the 2012 Seventh International Conference on Computer Engineering Systems (ICCES), Cairo, Egypt, 27–29 November 2012; pp. 207–212.
  13. Dhar, P.; Shimamura, T. Blind Audio Watermarking in Transform Domain Based on Singular Value Decomposition and Exponential-Log Operations. Radioengineering 2017, 26, 552–561.
  14. Li, J.; Wang, H.X.; Wu, T.; Sun, X.; Qian, Q. Norm ratio-based audio watermarking scheme in DWT domain. Multimed. Tools Appl. 2018, 77, 14481–14497.
  15. Bassia, P.; Pitas, I.; Nikolaidis, N. Robust audio watermarking in the time domain. IEEE Trans. Multimed. 2001, 3, 232–241.
  16. Wen-Nung, L.; Li-Chun, C. Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification. IEEE Trans. Multimed. 2006, 8, 46–59.
  17. Megías, D.; Serra-Ruiz, J.; Fallahpour, M. Efficient Self-synchronised Blind Audio Watermarking System Based on Time Domain and FFT Amplitude Modification. Signal Process. 2010, 90, 3078–3092.
  18. Zeng, G.R.; Qiu, Z.D. Audio watermarking in DCT: Embedding strategy and algorithm. In Proceedings of the 2008 9th International Conference on Signal Processing, Beijing, China, 26–29 October 2008; pp. 2193–2196.
  19. Chen, S.T.; Huang, H.N.; Chen, C.J.; Wu, G.D. Energy-proportion based scheme for audio watermarking. IET Signal Process. 2010, 4, 576–587.
  20. Petrovic, R. Audio signal watermarking based on replica modulation. In Proceedings of the 5th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Service, TELSIKS 2001–Proceedings of Papers (Cat. No. 01EX517), Nis, Yugoslavia, 19–21 September 2001; Volume 1, pp. 227–234.
  21. Petrovic, R.; Winograd, J.M.; Jemili, K.; Metois, E. Data hiding within audio signals. In Proceedings of the 4th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services. TELSIKS’99 (Cat. No. 99EX365), Nis, Yugoslavia, 15 October 1999; Volume 1, pp. 88–95.
  22. Muhaimin, H.; Danudirdjo, D.; Suksmono, A.B.; Shin, D. An efficient audio watermark by autocorrelation methods. In Proceedings of the 2015 International Conference on Electrical Engineering and Informatics (ICEEI), Denpasar, Indonesia, 10–11 August 2015; pp. 606–611.
  23. Lei, B.Y.; Soon, I.Y.; Li, Z. Blind and robust audio watermarking scheme based on SVD–DCT. Signal Process. 2011, 91, 1973–1984.
  24. Khalil, M.; Adib, A. Audio watermarking with high embedding capacity based on multiple access techniques. Digit. Signal Process. 2014, 34, 116–125.
  25. ITU-R BS.1387-1. Method for Objective Measurements of Perceived Audio Quality; ITU: Geneva, Switzerland, 2001.
  26. Kabal, P. An Examination and Interpretation of ITU-R BS.1387: Perceptual Evaluation of Audio Quality, TSP Lab. Technical Report; Department of Electrical and Computer Engineering, McGill University: Montreal, QC, Canada, 2002.
  27. Fdkaac.exe. Available online: https://www.dbpoweramp.com/codec-central-m4a.htm (accessed on 19 May 2019).
  28. Ffmpeg.exe. Available online: https://ffmpeg.zeranoe.com/builds (accessed on 19 May 2019).
Figure 1. HE-AAC techniques
Figure 2. Watermark embedding procedure.
Figure 3. Window function.
Figure 4. Watermark embedding diagram.
Figure 5. Scale factor ξ_i calculation.
Figure 6. Watermark extraction procedure.
Figure 7. Feedback process.
Figure 8. Calculated CSNR of test audio. (a) CSNR of “Sports Commentary” audio. (b) CSNR of “Classical Music” audio. (c) The sum of CSNR histograms of seven test audio signals.
Figure 9. ODG and BER values under various data payloads.
Figure 10. ODG and BER values under various passband frequencies.
Figure 11. ODG and BER values under various delays.
Figure 12. ODG and BER values under various attenuation lengths of the window.
Figure 13. Time domain waveforms of the “Bach organ” 5.1 channel audio.
Figure 14. Time domain waveforms of the “Bach organ” stereo audio.
Figure 15. Correlation D̄ values calculated for each of 5000 samples of “Drama” audio after HE-AAC v1 24 kbps compression, where the watermark bits [1, 0, 1, 0, 1, 0, 1, 0, 1, 0] are embedded.
Table 1. Objective difference grades (ODG) for audio quality evaluations.

| Impairment Description | ODG | Quality |
|---|---|---|
| Imperceptible | 0.0 | Excellent |
| Perceptible, but not annoying | −1.0 | Good |
| Slightly annoying | −2.0 | Fair |
| Annoying | −3.0 | Poor |
| Very annoying | −4.0 | Bad |
Table 2. ODG and bit error rate (BER) values under various data payloads.

| Data Payload (bps) | ODG | BER (%) |
|---|---|---|
| 24 | −0.82 | 0.47 |
| 48 | −0.66 | 0.61 |
| 72 | −0.66 | 0.83 |
| 96 | −0.66 | 1.77 |
| 144 | −0.62 | 4.55 |
| 192 | −0.57 | 8.77 |
Table 3. ODG and BER values under various passband frequencies.

| Passband Frequency (kHz) | ODG | BER (%) |
|---|---|---|
| 0.5~3.5 | −1.17 | 2.96 |
| 1.5~4.5 | −0.86 | 1.22 |
| 2.5~5.5 | −0.65 | 1.77 |
| 3.5~6.5 | −0.77 | 5.34 |
| 4.5~7.5 | −0.76 | 15.06 |
| 5.5~8.5 | −0.68 | 45.76 |
Table 4. ODG and BER values under various delays.

| Delay | ODG | BER (%) |
|---|---|---|
| 15 | −0.77 | 2.65 |
| 45 | −0.66 | 1.77 |
| 75 | −0.56 | 2.1 |
| 100 | −0.44 | 3.29 |
| 150 | −0.23 | 7.23 |
| 200 | −0.03 | 21.12 |
Table 5. ODG and BER values under various attenuation lengths of the window.

| Attenuation Length of the Window | ODG | BER (%) |
|---|---|---|
| 5 | −0.75 | 1.69 |
| 15 | −0.61 | 1.83 |
| 25 | −0.54 | 2.26 |
| 35 | −0.48 | 2.89 |
Table 6. Performance comparison results with state-of-the-art algorithms under MP3 compression, sorted by data payload. (“---” means not reported).

| Reference | Data Payload (bps) | ODG | MP3 BER (%), 32 kbps | MP3 BER (%), 64 kbps | MP3 BER (%), 96 kbps | MP3 BER (%), 128 kbps | HE-AAC v1 BER (%), 16 kbps | HE-AAC v1 BER (%), 24 kbps | HE-AAC v1 BER (%), 32 kbps | HE-AAC v1 BER (%), 64 kbps |
|---|---|---|---|---|---|---|---|---|---|---|
| [23] | 43 | −0.57 | 3.00 | 0.00 | 0.00 | 0.00 | --- | --- | --- | --- |
| [8] | 43 | −0.55 | --- | 0.22 | 0.02 | --- | --- | --- | --- | --- |
| [9] | 84 | −0.70 | --- | --- | 0.33 | 0.00 | --- | --- | --- | --- |
| [12] | 172 | −1.05 | 6.13 | --- | --- | --- | --- | --- | --- | --- |
| [13] | 172 | −0.51 | --- | 2.86 | --- | 0.02 | --- | --- | --- | --- |
| [24] | 4000 | −0.91 | --- | 19 | 15 | --- | --- | --- | --- | --- |
| Ours | 48 | −0.66 | 0.46 | 0.21 | 0.21 | 0.17 | 9.15 | 0.61 | 0.24 | 0.20 |
| Ours | 96 | −0.65 | 1.29 | 0.31 | 0.15 | 0.18 | 18.02 | 1.77 | 0.50 | 0.16 |
| Ours | 192 | −0.58 | 6.50 | 2.60 | 1.43 | 1.23 | 30 | 8.77 | 3.56 | 1.50 |
Table 7. 5.1 channel host audio used in the experiment.

| No. | Audio Name | Description |
|---|---|---|
| 1 | Bach organ | Church organ; lots of stops out |
| 2 | Brass | Orchestral; lots of brass instruments |
| 3 | Harpsichord | Solo harpsichord; isolated notes |
| 4 | Mouth harp | Mouth organ, bass guitar, percussion |
| 5 | Sax Piano | Saxophone and piano |
| 6 | Trumpet | Orchestral piece |
| 7 | Applause | Applause with distinct clapping sounds |
| 8 | Chants | Small choir; large church; Gregorian chant |
| 9 | Classical | Orchestral piece; open sound |
| 10 | Radio drama | Clarinet, orchestra, male speaker, tenor, ambience |
| 11 | Sedambonjou | Atmospheric performance of Latin-American music |
| 12 | Moonriver | Mouth organ and string orchestra |
Table 8. BER (%) and ODG of each channel of 5.1 channel audio under HE-AAC v1 128 kbps compression.

| No. | FL | FR | C | LFE | SL | SR | Avg BER | ODG |
|---|---|---|---|---|---|---|---|---|
| 1 | 2.93 | 6.21 | 0.18 | --- | 3.87 | 5.21 | 3.68 | −0.70 |
| 2 | 3.95 | 2.44 | 0.29 | --- | 5.03 | 1.94 | 2.73 | −0.84 |
| 3 | 0.88 | 1.24 | 0.12 | --- | 1.18 | 0.65 | 0.81 | −1.06 |
| 4 | 1.87 | 1.74 | 0.37 | --- | 1.55 | 1.99 | 1.51 | −0.73 |
| 5 | 3.83 | 3.32 | 0.22 | --- | 0.30 | 0.30 | 1.59 | −0.54 |
| 6 | 3.81 | 1.57 | 0.00 | --- | 3.47 | 2.35 | 2.24 | −0.86 |
| 7 | 0.06 | 0.24 | 0.77 | --- | 1.42 | 4.56 | 1.41 | −0.48 |
| 8 | 0.59 | 1.00 | 2.17 | --- | 0.06 | 1.00 | 0.96 | −0.86 |
| 9 | 4.90 | 3.74 | 0.27 | --- | 2.76 | 3.48 | 3.03 | −1.14 |
| 10 | 7.84 | 8.67 | 0.46 | --- | 8.39 | 9.03 | 6.88 | −0.93 |
| 11 | 2.78 | 2.83 | 0.64 | --- | 2.95 | 2.78 | 2.39 | −1.04 |
| 12 | 3.47 | 4.02 | 0.64 | --- | 3.29 | 4.48 | 3.18 | −0.95 |
| Avg | --- | --- | --- | --- | --- | --- | 2.53 | −0.84 |
Table 9. BER (%) and ODG of each channel of stereo audio under HE-AAC v2 48 kbps compression.

| No. | Left Channel | Right Channel | Avg BER | ODG |
|---|---|---|---|---|
| 1 | 0.70 | 1.70 | 1.20 | −0.42 |
| 2 | 0.57 | 0.93 | 0.75 | −0.78 |
| 3 | 0.35 | 0.88 | 0.62 | −0.98 |
| 4 | 0.56 | 0.12 | 0.34 | −0.62 |
| 5 | 1.62 | 0.81 | 1.22 | −0.73 |
| 6 | 1.01 | 0.78 | 0.90 | −0.83 |
| 7 | 0.06 | 0.06 | 0.06 | −0.45 |
| 8 | 0.18 | 0.64 | 0.33 | −0.83 |
| 9 | 1.78 | 0.45 | 1.12 | −1.07 |
| 10 | 3.19 | 3.01 | 3.10 | −0.87 |
| 11 | 4.80 | 4.40 | 4.60 | −0.87 |
| 12 | 1.10 | 1.19 | 1.15 | −0.89 |
| Avg | --- | --- | 1.28 | −0.78 |
Table 10. Averaged BER (%) of test multichannel audio under various bit rate HE-AAC compressions.

| 5.1 Channel, 64 kbps | 5.1 Channel, 96 kbps | 5.1 Channel, 128 kbps | 5.1 Channel, 160 kbps | Stereo, 16 kbps | Stereo, 24 kbps | Stereo, 32 kbps | Stereo, 48 kbps |
|---|---|---|---|---|---|---|---|
| 24.04 | 6.67 | 2.53 | 0.54 | 27.9 | 7.87 | 2.30 | 1.29 |

Share and Cite

MDPI and ACS Style

Hong, Y.; Kim, J. Autocorrelation Modulation-Based Audio Blind Watermarking Robust Against High Efficiency Advanced Audio Coding. Appl. Sci. 2019, 9, 2780. https://doi.org/10.3390/app9142780
