#### **1. Introduction**

Steganography is the practice of hiding secret information within non-secret material. In other words, a secret message is hidden in data that is sent or delivered publicly, concealing the very existence of the secret communication. Steganographic methods pose a significant threat to users: they can be used to spread malicious software, or they can be used by the malware itself (so-called stegomalware [1]), for example, for command-and-control (C&C) communications or to leak sensitive data.

A substantial share of steganographic methods use multimedia data, including images, as a carrier. These methods are often referred to as digital media steganography and image steganography, respectively. An example is the method used by the Vawtrak/Neverquest malware [2], which hid URL addresses within favicon images. Another example is the Invoke-PSImage [3] tool, whose developers hid PowerShell scripts in image pixels using the commonly used least-significant bit (LSB) approach. Yet another variant is hiding information in the structure of GIF files [4], which is quite innovative due to the binary complexity of the GIF structure.
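The least-significant bit (LSB) approach mentioned above can be illustrated with a minimal sketch. It is a generic textbook variant that treats the carrier as a flat sequence of 8-bit sample values; it is not the exact scheme of Invoke-PSImage or of any other tool cited here, and the function names and the sample carrier are hypothetical.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Hide the message bits in the least-significant bit of each carrier byte."""
    # Unpack the message into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("carrier too small for message")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the sample, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of hidden data from the lowest bits of the carrier."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (pixels[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


# Hypothetical carrier: 64 sample values standing in for image pixels.
cover = bytes(range(64, 128))
stego = embed_lsb(cover, b"hi")
recovered = extract_lsb(stego, 2)
# Each sample changes by at most one intensity level.
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

Because every sample changes by at most one intensity level, the modification is visually imperceptible, which is why detection has to rely on statistical or learned features rather than on visual inspection.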

A growing number of malware infections take advantage of some kind of hidden transmission, including transmission based on image steganography. Since malware infections pose a significant threat to the security of users worldwide, finding efficient, reliable, and fast methods of detecting hidden content becomes very important. Therefore, numerous initiatives and projects have recently been launched to increase malware and stegomalware resilience; one of them is the Secure Intelligent Methods for Advanced RecoGnition of malware and stegomalware (SIMARGL) project [5], carried out within the EU Horizon 2020 framework.

The experiments presented in this article are part of this initiative. The aim of our research was to find the most effective automatic methods for detecting digital steganography in JPEG images. JPEG-compressed images are usually stored in files with the extensions .jpeg, .jpg, .jpe, .jif, .jfif, and .jfi. JPEG compression is commonly used for image storage and transfer; according to [6], 74.3% of web pages contain JPEG images. Therefore, these images can also be easily used for malicious purposes. In this study, we researched various machine learning (ML) methods of creating predictive models able to discover steganographically hidden content that can be potentially used by malware. Such a detection method can be integrated with antimalware software or any other system performing file scanning for security purposes (e.g., a messaging system).

**Citation:** Płachta, M.; Krzemień, M.; Szczypiorski, K.; Janicki, A. Detection of Image Steganography Using Deep Learning and Ensemble Classifiers. *Electronics* **2022**, *11*, 1565. https://doi.org/10.3390/electronics11101565

Academic Editor: Stefanos Kollias

Received: 14 April 2022 Accepted: 11 May 2022 Published: 13 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The great advantage of our research is that we experimented both with shallow ML algorithms and with deep learning methods. As for shallow algorithms, we focused on ensemble classifiers, which have been recently shown to yield good results in detection tasks. When dealing with deep learning methods, we concentrated on a lightweight approach, which did not involve computationally intensive convolutional layers in the neural network architecture. However, for the sake of simplicity, we did not research the impact of hidden content on detection accuracy; all experiments were conducted with random hidden messages.

Our article is structured as follows: first, in Section 2, we briefly review the state of the art in the area of hiding data in digital images and its detection. Next, in Section 3, we describe the experimental environment, including the test scenarios and the evaluation metrics used. The results are described in Section 4. The article concludes with a discussion of the results in Section 5 and a summary in Section 6.

#### **2. Related Work**

This article focuses on JPEG images as carriers of steganographically embedded data. The popularity of this file format has resulted in a number of data-hiding methods being proposed, as well as various detection methods. In this section, we briefly review the basics of JPEG-based image steganography, including the most commonly used algorithms. Next, we proceed to the detection methods.

#### *2.1. JPEG-Based Image Steganography*

While multiple steganographic algorithms operate in the spatial domain, there are some that introduce changes at the level of the discrete cosine transform (DCT) coefficients stored in JPEG files. Moreover, certain algorithms are designed to minimize the probability of detection through the use of content-adaptiveness: they embed data predominantly in less predictable regions, where changes are more difficult to identify. Such modifications are the most challenging to detect; this is why we selected them for our study. Following other studies, e.g., [7], we chose nsF5 [8], JPEG universal wavelet relative distortion (J-Uniward) [9], and uniform embedding revisited distortion (UERD) [10]. They are briefly characterized in the following subsections.
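To make the notion of embedding at the level of DCT coefficients concrete, the sketch below shows a deliberately naive JSteg-style scheme: message bits overwrite the least-significant bit of quantized coefficients, skipping the values 0 and 1. This is only an illustration of DCT-domain embedding. It is not nsF5, J-Uniward, or UERD, which use more sophisticated and, in the latter two cases, content-adaptive embedding rules. The function names and coefficient values are hypothetical.

```python
from typing import List


def embed_dct_lsb(coeffs: List[int], bits: List[int]) -> List[int]:
    """Overwrite the LSB of each quantized coefficient that is not 0 or 1."""
    out = list(coeffs)
    used = 0
    for i, c in enumerate(out):
        if used == len(bits):
            break
        if c in (0, 1):
            # Skipped so that the extractor can find the same positions:
            # an embedded coefficient never becomes 0 or 1.
            continue
        out[i] = (c & ~1) | bits[used]
        used += 1
    if used < len(bits):
        raise ValueError("not enough usable coefficients for the message")
    return out


def extract_dct_lsb(coeffs: List[int], n_bits: int) -> List[int]:
    """Read the LSBs of the usable coefficients, in the same order."""
    return [c & 1 for c in coeffs if c not in (0, 1)][:n_bits]


# Hypothetical quantized DCT coefficients from one JPEG block.
cover = [12, -3, 0, 2, 1, 0, -1, 5, 0, 0, 4, 0]
message_bits = [1, 0, 1, 1]
stego = embed_dct_lsb(cover, message_bits)
recovered = extract_dct_lsb(stego, len(message_bits))
```

A scheme like this changes coefficients in a fixed order regardless of image content, which leaves statistical traces that are comparatively easy to spot. Content-adaptive algorithms instead decide where to embed based on how detectable each change is expected to be.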
