**4. Results**

Tables 1 and 2 show the results of steganography detection obtained by shallow and deep methods, respectively. We display the accuracy values achieved for various steganographic algorithms and various hidden data densities, accompanied by average accuracies for each classifier/parameter combination.


**Table 1.** Accuracy of image steganography detection (in percentages) for various classifiers and ensemble configurations. The best values in each column are shown in bold.

**Table 2.** Accuracy of image steganography detection (in percentages) for various architectures of neural networks and optimizers. The best values in each column are shown in bold.


On average, using ML to combine the ensemble votes yielded higher detection accuracy for the systems based on DCTR or GFR features, despite marginally worse performance in certain cases (such as GFR features extracted from nsF5-modified files at 0.1 bpnzac). The PHARM-based classifiers sometimes yielded worse results than the default majority-based scheme, or failed to converge altogether. No combination of feature type (DCTR, GFR, PHARM) and method of fusing base-learner votes into the final decision outperformed the others in all testing scenarios. The configuration that, on average, achieved the best results for the steganographic algorithms tested was the linear regression classifier fed with DCTR features. Although using linear discriminant analysis (LDA) to fuse the votes of a system operating on DCTR features achieved an equal average accuracy, linear regression is considered in further sections because of its slightly better performance with GFR and PHARM features.
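To make the difference between the two vote-combination schemes concrete, the following minimal sketch contrasts majority voting with a learned linear fusion of base-learner votes. It is an illustration under stated assumptions, not the implementation evaluated here: the three base learners, the toy `VOTES` and `LABELS`, the function names, the 0.5 decision threshold, and the use of plain gradient descent for the least-squares fit are all hypothetical, and the toy scores are computed on the training data only.

```python
# Toy comparison of majority voting and a learned linear fusion of votes.
# All data and names below are hypothetical and serve illustration only.

def majority_vote(votes):
    """Predict 1 (stego) when more than half of the base learners vote 1."""
    return [1 if 2 * sum(row) > len(row) else 0 for row in votes]


def fit_linear_fusion(votes, labels, lr=0.1, epochs=2000):
    """Fit per-learner weights and a bias by batch gradient descent
    on the mean squared error between the weighted vote sum and the label."""
    n, k = len(votes), len(votes[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, y in zip(votes, labels):
            err = bias + sum(w * v for w, v in zip(weights, row)) - y
            grad_b += err
            for j, v in enumerate(row):
                grad_w[j] += err * v
        bias -= lr * 2 * grad_b / n
        weights = [w - lr * 2 * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def linear_fusion_predict(votes, weights, bias, threshold=0.5):
    """Threshold the fitted linear combination of votes (assumed cut-off 0.5)."""
    return [
        1 if bias + sum(w * v for w, v in zip(weights, row)) >= threshold else 0
        for row in votes
    ]


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy votes: learner 0 is always right, learners 1 and 2 are unreliable.
VOTES = [
    [1, 1, 1], [1, 0, 0], [0, 0, 0], [0, 1, 1],
    [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0],
]
LABELS = [1, 1, 0, 0, 1, 0, 1, 0]

majority_acc = accuracy(majority_vote(VOTES), LABELS)
weights, bias = fit_linear_fusion(VOTES, LABELS)
fused_acc = accuracy(linear_fusion_predict(VOTES, weights, bias), LABELS)
print(f"majority: {majority_acc:.2f}, linear fusion: {fused_acc:.2f}")
```

In this toy setting the learned fusion can place most of its weight on the one reliable learner, whereas majority voting is outvoted whenever the two unreliable learners agree with each other. A real comparison would fit the fusion weights on training data and report accuracy on held-out images.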

As for the deep learning algorithms (Table 2), the lowest accuracy was obtained for the sets based on J-Uniward. Higher accuracy was obtained for the sets based on UERD, and the highest for nsF5. Among the tested configurations, those using the SGD optimizer gave the worst results, while those using Adam performed better at higher learning rates. For the Adam optimizer, the three-layer and two-layer configurations gave rather similar results.

Looking at the various feature spaces, it can be seen that the least accurate results were always obtained for PHARM. On the other hand, the results obtained for the DCTR and GFR parameters for all combinations were much better and rather similar, which means that most probably they can be used interchangeably in JPEG steganalytic tools.

These observations are further confirmed in Figure 4. The PHARM parameters always yielded the worst results. The GFR features usually gave slightly better results for the higher embedding rate (0.4 bpnzac), while for the lower embedding rate (0.1 bpnzac) it was the DCTR feature space that turned out to be slightly better for most of the tested classifiers, both shallow and deep learning-based.

**Figure 4.** Accuracy achieved for various feature vectors against classifiers or network architectures.

After conducting the research, we selected the best configurations for specific types of sets, separately for shallow and deep learning methods, and calculated the remaining metrics. Their outcomes are visualized in Figure 5, while the details are shown in Tables 3 and 4. Based on Figure 6, one can notice that the differences between the main evaluation metrics for the best shallow and deep methods at a density of 0.4 bpnzac are only minor. A somewhat larger difference can be observed for all the tested steganographic algorithms applied at the lower embedding rate of 0.1 bpnzac. Here, the ensemble (shallow) classifier usually turned out to be slightly better.
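For reference, the evaluation metrics discussed in this section (accuracy, F1-score, and AUC) can be computed from ground-truth labels and classifier scores as sketched below. This is a minimal, dependency-free illustration using standard definitions. The toy labels and scores, the function names, and the 0.5 decision threshold are assumptions made for the example and do not come from the experiments.

```python
# Standard binary-classification metrics on toy data (1 = stego, 0 = cover).
# Labels, scores, and the 0.5 threshold are hypothetical.

def confusion_counts(labels, preds):
    """Return (tp, fp, fn, tn) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_score(labels, preds):
    tp, fp, fn, tn = confusion_counts(labels, preds)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(labels, preds):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(labels, preds)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy_score(labels, preds)
f1 = f1_score(labels, preds)
auc = roc_auc(labels, scores)
print(f"accuracy={acc:.3f}, F1={f1:.3f}, AUC={auc:.3f}")
```

Accuracy and F1-score depend on the chosen decision threshold, while AUC summarizes the ranking quality of the scores over all thresholds, which is why it can differ from the other two for the same classifier. An AUC of 0.5 corresponds to chance-level detection.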


**Table 3.** Results of image steganography detection (in percentages) for the best shallow method (linear regression).

**Table 4.** Results of image steganography detection (in percentages) for the best deep learning method (250 × BN × 120 × BN × 50 with Adam at a learning rate of 1 × 10<sup>−4</sup>, based on DCTR parameters).


**Figure 5.** Visualization of evaluation results for the best shallow and deep steganalytic algorithms.

These observations are confirmed by the scores shown in Tables 3 and 4. The largest difference occurs for the J-Uniward 0.1 bpnzac set, at about 4% relative, while for the other sets the ensemble classifier usually has a relative advantage of about 1–2%, so these differences are only minor.
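Because the differences above are quoted in relative terms, it may help to spell out how a relative difference relates to the absolute gap in percentage points. The short sketch below assumes that the relative advantage is taken with respect to the lower-scoring (deep learning) method. Both that choice of baseline and the two accuracy values are hypothetical and are not read from Tables 3 or 4.

```python
# Relative vs. absolute difference between two accuracy values (in percent).
# The input values are hypothetical placeholders, not results from the tables.

def absolute_difference(a, b):
    """Gap in percentage points between scores a and b."""
    return a - b


def relative_difference(a, b):
    """Gap between a and b expressed as a percentage of the baseline b."""
    return 100.0 * (a - b) / b


ensemble_acc = 52.0  # hypothetical accuracy of the shallow ensemble
network_acc = 50.0   # hypothetical accuracy of the neural network (baseline)

abs_gap = absolute_difference(ensemble_acc, network_acc)
rel_gap = relative_difference(ensemble_acc, network_acc)
print(f"{abs_gap:.1f} percentage points, {rel_gap:.1f}% relative")
```

With these placeholder values, a gap of 2 percentage points corresponds to a 4% relative advantage, which shows that near chance level a relative difference is roughly twice the gap in percentage points.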

Overall, the metrics for detecting data hidden using nsF5 at the 0.4 bpnzac embedding rate are close to 100%, regardless of the method. In contrast, the metrics for detecting data hidden with J-Uniward at 0.1 bpnzac are very poor: for the ensemble classifier with linear regression, all metrics are around 54%, while for the best neural network most of the results are at chance level. In general, all the tested JPEG-based steganographic methods working at the embedding rate of 0.4 bpnzac can be detected with accuracy, F1-score, and AUC above 85%. Detecting hidden content embedded at the low rate of 0.1 bpnzac is problematic for both shallow and deep methods. In the best case, the detection accuracy reached 85% for the easiest algorithm, nsF5, while it was significantly lower for UERD and J-Uniward.

**Figure 6.** Comparison of ROC curves for the best neural network and the best ensemble classifier for data hidden (**a**) with density 0.4 bpnzac and (**b**) with density 0.1 bpnzac.
