### *3.2. Combining Pre-Trained Binary Classifiers to Discriminate WOW and UNIWARD*

We first trained two CNN-based binary classifiers, one for WOW and one for UNIWARD, and simply combined them in parallel, taking the output of the classifier with the higher probability as the final result (Figure 9). This was based on the assumption that, when the two classifiers produce different classification results, the result of the classifier with the greater probability would be correct. Table 2 presents the classification results for the cover, WOW stego, and UNIWARD stego images (the details of the experimental conditions are given in Section 4). The classification rates for the WOW and UNIWARD stego images decreased significantly because of the similarity between WOW and UNIWARD. In other words, simply combining two binary classifiers is not useful for discriminating WOW and UNIWARD.
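
The decision rule of this parallel combination can be sketched as follows (a minimal sketch under one plausible reading of the rule; the function and variable names are ours, and the tie-break toward WOW is arbitrary):

```python
def combine_parallel(p_wow, p_uni):
    """Ternary decision from two binary classifiers (hypothetical helper).

    p_wow, p_uni: softmax outputs [p(cover), p(stego)] of the binary
    classifiers trained for WOW and for UNIWARD, respectively.
    """
    # If both classifiers vote 'cover', declare the image a cover.
    if p_wow[0] > p_wow[1] and p_uni[0] > p_uni[1]:
        return "cover"
    # Otherwise, trust the classifier with the higher stego probability.
    return "wow" if p_wow[1] >= p_uni[1] else "uniward"
```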

**Figure 9.** Combining two binary classifiers in parallel for the ternary classification.

**Table 2.** Ternary classification rates obtained by simply combining two binary classifiers separately trained for WOW and UNIWARD (*bpp* = 0.4).


We also conducted an experiment on ternary classification through transfer learning. After each binary classifier was trained for WOW or UNIWARD, its network parameters were fixed. The fully connected layer was then removed from each binary classifier, and a common fully connected layer was added and trained for ternary classification (Figure 10). Table 3 shows the classification results for the cover, WOW stego, and UNIWARD stego images (the details of the experimental conditions are given in Section 4). The classification rates for the WOW and UNIWARD stego images were very low, even lower than those obtained by simply combining the two binary classifiers in parallel. This result indicates that the network parameters of the common fully connected layer could not be trained correctly because of the similarity between WOW and UNIWARD.
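
In TensorFlow, the transfer-learning setup of Figure 10 could be sketched roughly as follows (the input size and layer choices are our assumptions; `wow_cnn` and `uni_cnn` stand for the pre-trained binary classifiers with their fully connected layers removed, assumed to output flat feature vectors):

```python
import tensorflow as tf

def build_transfer_model(wow_cnn, uni_cnn):
    # wow_cnn and uni_cnn: pre-trained binary classifiers with their
    # fully connected layers removed, used as fixed feature extractors.
    wow_cnn.trainable = False
    uni_cnn.trainable = False
    inp = tf.keras.Input(shape=(256, 256, 1))
    # Concatenate the features of both networks; only the new common
    # fully connected layer is trained, for the three classes.
    features = tf.keras.layers.Concatenate()([wow_cnn(inp), uni_cnn(inp)])
    out = tf.keras.layers.Dense(3, activation="softmax")(features)
    return tf.keras.Model(inp, out)
```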

**Figure 10.** Ternary classification through transfer learning.

**Table 3.** Ternary classification rates obtained by transfer learning (*bpp* = 0.4).


These experiments indicate that a new single network, which learns the cover, WOW stego, and UNIWARD stego images simultaneously from the beginning, should be designed to correctly classify the similar steganographic methods WOW and UNIWARD.

### *3.3. Designing a CNN for Ternary Classification*

The CNN used in [17] is the most basic CNN for image steganalysis, and most conventional CNNs are modifications of it. Therefore, it was used as the base CNN herein. First, the base CNN was tested for ternary classification without modification: the cover, WOW stego, and UNIWARD stego images were learned simultaneously in a single network (Figure 4). Table 4 presents the classification rates (the details of the experimental conditions are given in Section 4), which are better than those obtained by combining the pre-trained binary classifiers. However, whereas the cover images were classified relatively well, at approximately 84%, the WOW and UNIWARD stego images were rarely classified correctly. In conclusion, to make a classifier originally developed for binary classification between cover and stego images applicable to ternary classification, the network structure should be extended, and the preprocessing filters for extracting the steganalytic features should be designed more carefully.

**Table 4.** Ternary classification rates when simultaneously learning the cover, WOW stego, and UNIWARD stego images using the conventional classifier [17] (*bpp* = 0.4).


We then extended the base CNN by adding more convolutional layers (each comprising convolution, normalization, activation, and pooling operations), because ternary classification requires more classification power than binary classification. Figure 11 shows the structure of the networks extended with additional convolutional layers, and Table 5 displays their classification rates (the details of the experimental conditions are given in Section 4). Adding one convolutional layer improved the classification rates by 2–4%; however, with two or more additional convolutional layers, the classification rates decreased again, indicating that the network needs to be deeper for ternary classification but that its depth must be properly adjusted.
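
One such additional convolutional layer might look as follows in Keras (a sketch only; the kernel size, pooling parameters, and activation are our assumptions, not the authors' exact configuration):

```python
import tensorflow as tf

def extra_conv_layer(x, filters):
    # One additional convolutional layer: convolution, normalization,
    # activation, and pooling, as described above.
    x = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    return tf.keras.layers.AveragePooling2D(5, strides=2, padding="same")(x)
```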

**Table 5.** Ternary classification rates of deeper networks in Figure 11 (*bpp* = 0.4).


**Figure 11.** Extending the conventional network [17] with additional convolutional layers: (**a**) with one additional convolutional layer; (**b**) with two additional convolutional layers; (**c**) with three additional convolutional layers.

We also attempted to use a deep residual network (Figure 12a) and a convolution-stacked network (Figure 12b), in which the convolutional blocks were stacked as done in [29], because such residual and convolution-stacked networks have demonstrated significantly improved performance in image recognition. However, as shown in Table 6, the classification rates were not good, indicating that these networks are suitable neither for image steganalysis nor for ternary classification.
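
For reference, a standard identity-shortcut residual block of the kind stacked in Figure 12a can be sketched as follows (an assumed configuration, not the exact block used here):

```python
import tensorflow as tf

def residual_block(x, filters):
    # Identity-shortcut residual block; the input is assumed to
    # already have 'filters' channels so the shapes match at the Add.
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    y = tf.keras.layers.BatchNormalization()(y)
    # Add the shortcut connection, then apply the final activation.
    y = tf.keras.layers.Add()([y, shortcut])
    return tf.keras.layers.ReLU()(y)
```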

**Figure 12.** Deep residual network and convolution-stacked network for ternary classification.

**Table 6.** Ternary classification rates of the residual and convolution-stacked networks of Figure 12 (*bpp* = 0.4).


As explained in Section 2.3, CNN-based classifiers for image steganalysis have preprocessing filters that facilitate the extraction of steganalytic features from images, and many conventional methods have used various preprocessing filters to improve performance. For ternary classification, we decided to use the SRM filters most widely used in conventional methods and conducted an experiment to determine their performance. The base CNN was used with three different preprocessing filter sets: the 30 SRM filters (Figure 3), three groups of 10 SRM filters, and 10 selected SRM filters (Figure 13). The second filter set was obtained by dividing the 30 SRM filters into three groups of 10 (using different numbers of groups performed worse [28]); the filters of each group were applied to the input image, and 10 filtered results were generated by computing the element-wise sum of the filtered results within each group [28]. The third filter set is a new one proposed herein, in which the more effective filters are selected from the 30 SRM filters: each of the 30 SRM filters was applied to an arbitrary cover image and its stego image, the differences between the filtered cover and stego images were computed (Figure 14), and the 10 filters with the largest differences were selected, assuming that these filters would extract steganalytic features from the images well. For all of the filter sets, eight feature maps were generated in the first convolutional layer and doubled in the subsequent convolutional layers.

Tables 4 and 7 (the details of the experimental conditions are given in Section 4) show that, contrary to expectation, the classification rates of the base CNN did not increase with the number of filters. The results of the three groups of 10 SRM filters were better than those of the others, indicating that simply increasing the number of filters does not guarantee performance improvement and that finding the appropriate filters for a given CNN is necessary.

**Figure 13.** The ten selected SRM filters. Among the 30 SRM filters of Figure 3, these detect tiny variations in images best.

**Figure 14.** Selection of the more effective SRM filters: after each SRM filter is applied, the difference between the filtered cover and stego images varies considerably with the filter (e.g., 1.229 vs. 7.234).
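
This selection procedure can be summarized in code as follows (a sketch; using the mean absolute difference as the difference measure is our assumption):

```python
import numpy as np
from scipy.signal import convolve2d

def select_srm_filters(srm_filters, cover, stego, k=10):
    """Return indices of the k SRM filters whose responses to the
    cover and stego images differ most (hypothetical helper)."""
    diffs = []
    for f in srm_filters:
        filtered_cover = convolve2d(cover, f, mode="same")
        filtered_stego = convolve2d(stego, f, mode="same")
        # Mean absolute difference between the filtered images.
        diffs.append(np.mean(np.abs(filtered_cover - filtered_stego)))
    # Keep the k filters with the largest differences.
    return np.argsort(diffs)[::-1][:k]
```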

**Table 7.** Ternary classification rates of the base CNN with the different preprocessing filter sets (*bpp* = 0.4).

Together with increasing the number of filters, we also attempted to increase the number of feature maps in the first convolutional layer from 8 to 60. Table 8 shows that, except for the 10 selected SRM filters, the classification rates of the base CNN became significantly lower when the number of feature maps was increased. Unlike most conventional CNNs, which achieve performance improvement by using more filters or feature maps, the base CNN performed better with a small number of filters, possibly because it failed to learn the large amount of information extracted by many filters or feature maps. From these results, we conclude that the base CNN should be made deeper so that more filters or feature maps can be exploited.

**Table 8.** Ternary classification rates of the base CNN with different preprocessing filters when the number of feature maps in the first convolutional layer is increased to 60 (*bpp* = 0.4).


### *3.4. Proposed Classifier for Ternary Classification*

We proposed a CNN-based classifier for ternary classification. The base CNN [17] was extended with one additional convolutional layer. The number of feature maps was increased to 60 in the first convolutional layer and doubled in each subsequent convolutional layer; thus, 1920 feature maps were fed into the fully connected layer. The 10 selected SRM filters were used as the preprocessing filters.
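
Under these design choices, the proposed network can be sketched as follows (assuming the base CNN's five convolutional layers plus one additional layer, which matches the stated doubling from 60 to 1920; kernel sizes, pooling, normalization, and activations are our assumptions, and the initialization of the preprocessing layer with the 10 selected SRM kernels is omitted):

```python
import tensorflow as tf

def build_proposed_cnn():
    inp = tf.keras.Input(shape=(256, 256, 1))
    # Fixed preprocessing layer; its weights would be initialized with
    # the 10 selected 5x5 SRM kernels (initialization not shown).
    x = tf.keras.layers.Conv2D(10, 5, padding="same", trainable=False)(inp)
    # Six convolutional layers: 60 feature maps in the first layer,
    # doubled in each subsequent layer (60 -> 120 -> ... -> 1920).
    for filters in (60, 120, 240, 480, 960, 1920):
        x = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.ReLU()(x)
        x = tf.keras.layers.AveragePooling2D(5, strides=2, padding="same")(x)
    # The 1920 feature maps are pooled and fed into the fully
    # connected layer with a three-way softmax output.
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    out = tf.keras.layers.Dense(3, activation="softmax")(x)
    return tf.keras.Model(inp, out)
```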

### **4. Experimental Results and Discussion**

All the experiments presented in the previous sections and in this section were conducted under the following conditions. The 10,000 grayscale images of 512 × 512 pixels in BOSSBase 1.01 [30] were quartered, and the resulting 40,000 images of 256 × 256 pixels were divided into a training set of 30,000 images and a testing set of 10,000 images. The stego images for both sets were generated using WOW and UNIWARD with a random payload of *bpp* = 0.4. (In most steganalytic studies, 0.1, 0.2, and 0.4 *bpp* have been used for testing steganalytic methods. However, with adaptive steganographic methods, 0.1 and 0.2 *bpp* are too small to identify the stego images, even in binary classification [31]. The average PSNRs of the WOW and UNIWARD stego images at 0.4 *bpp* are 58.76 and 59.36 dB, respectively; thus, the image quality of the stego images at 0.4 *bpp* is still very high.) As a result, 90,000 training images of 256 × 256 pixels (30,000 each for the cover, WOW stego, and UNIWARD stego images) and 30,000 testing images (10,000 each for the cover, WOW stego, and UNIWARD stego images) were used. For training, a momentum optimizer [32] with a momentum value of 0.9 was used. The learning rate started at 0.001 and decayed to 90% of its value every 5000 iterations. The minibatch size was 64 (32 pairs of cover and stego images). The other hyperparameters were set as in the conventional method [17]. All CNNs were implemented using the TensorFlow library [33].
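
In TensorFlow, this training schedule corresponds to the following setup (a sketch of the stated hyperparameters; the authors' exact decay implementation may differ):

```python
import tensorflow as tf

# Learning rate starts at 0.001 and is multiplied by 0.9 (i.e., decays
# to 90% of its value) every 5000 iterations.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,
    decay_steps=5000,
    decay_rate=0.9,
    staircase=True)

# Momentum optimizer with a momentum value of 0.9.
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

BATCH_SIZE = 64  # 32 pairs of cover and corresponding stego images
```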

The proposed classifier was evaluated with different preprocessing filter sets. As a new preprocessing filter set, 16 Gabor filters were used together with the 10 selected SRM filters, as in [19]. Table 9 lists the resulting classification rates for the cover, WOW stego, and UNIWARD stego images with the different preprocessing filters.
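
A 2-D Gabor kernel of the kind used in such a filter bank can be generated as follows (a sketch; the parameter values, and producing the 16 filters by varying the orientation, are our assumptions rather than the configuration of [19]):

```python
import numpy as np

def gabor_kernel(size=5, theta=0.0, sigma=2.0, lam=4.0, gamma=0.5, psi=0.0):
    """Real 2-D Gabor kernel; varying theta over 16 orientations would
    yield a 16-filter bank (all parameter values are assumptions)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates by the orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope modulated by a cosine carrier.
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * xr / lam + psi)
```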

**Table 9.** Ternary classification rates of the network of Figure 11a with different preprocessing filters (*bpp* = 0.4).


Unlike with the base CNN, using more filters and feature maps increased the classification rates; however, using too many filters, or filters of different types, was not beneficial. The results of the 10 selected SRM filters (i.e., the proposed set) were the best. The experimental results demonstrate that the cover, WOW stego, and UNIWARD stego images can be classified with an accuracy of approximately 72% by the single CNN-based ternary classifier proposed herein.

We also attempted to change the tanh functions of the first two convolutional layers to TLU functions, as in [20], and the ReLU functions of the subsequent convolutional layers to leaky ReLU functions; however, the classification rates were not good (Table 10).
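
The TLU of [20] simply truncates its input to a fixed range; a minimal sketch (the threshold value is an assumption):

```python
import tensorflow as tf

def tlu(x, threshold=3.0):
    # Truncated linear unit: identity on [-T, T], clipped outside.
    # The threshold value T = 3.0 is an assumption.
    return tf.clip_by_value(x, -threshold, threshold)
```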

**Table 10.** Ternary classification rates when changing the activation functions of the proposed CNN (*bpp* = 0.4).


### **5. Conclusions and Future Works**

This study proposed a CNN-based ternary classifier to identify cover, WOW stego, and UNIWARD stego images. The existing binary classifiers were designed to learn and detect a specific steganographic method; hence, they are not suitable for discriminating between different steganographic methods. Adaptive steganographic methods such as WOW and UNIWARD embed a small amount of secret message in a similar manner; therefore, discriminating their stego images by using the existing binary classifiers, or by combining them, is very difficult. The proposed ternary classifier, however, could effectively learn the difference between the two steganographic methods and discriminate them. Classifying different steganographic methods with the proposed ternary classifier is the first step toward restoring the embedded message, rather than simply determining whether or not a message has been embedded.

It was experimentally confirmed that, in designing a CNN-based ternary classifier for image steganalysis, simply expanding the width or depth of the CNN does not guarantee performance improvement. In other words, the CNN width and depth must be optimized experimentally. This study demonstrated the results of such an experimental optimization.

The proposed method had an accuracy of approximately 72%, which is not very high. Therefore, ways to improve the accuracy by further highlighting the differences between WOW and UNIWARD must be explored in the future. Ways to design a CNN-based classifier suitable for classifying a larger number (≥3) of steganographic methods, including those with other embedding domains (e.g., DCT and wavelet domains), must also be explored.

**Author Contributions:** Conceptualization, S.K. and H.P.; Funding acquisition, H.P. and J.-I.P.; Methodology, S.K. and H.P.; Software, S.K.; Supervision, H.P. and J.-I.P.; Validation, S.K. and H.P.; Writing—original draft, S.K.; Writing—review and editing, H.P. and J.-I.P.

**Funding:** This work was supported by the research fund of the Signal Intelligence Research Center supervised by the Defense Acquisition Program Administration and Agency for the Defense Development of Korea.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

