#### **1. Introduction**

In recent years, computer technology has grown rapidly and become central to our lives. Technological advances such as Cloud computing, the Internet of Things, and social media platforms have brought efficiency, effectiveness, and convenience to both individual and organisational users. However, there is a downside. More and more tools are available on the market, and the tools created by advanced technology are becoming increasingly difficult to control, so enterprises must increase investment and introduce new defence solutions to deal with them. These developments have introduced new types of risks and threats. Owing to their increasing reliance on devices, users are exposed to various cyber security risks [1]. In particular, individuals and organisations that value information secrecy and privacy are greatly concerned about how to secure their data. Information hiding has become a pivotal characteristic of digital society. Against this backdrop, several methods, such as steganography and cryptography with complex algorithms, have been developed to secure information privacy [2]. Cryptography conceals the content of messages through encryption or scrambling, but it cannot hide their existence [3]. In contrast, steganography hides the very existence of secret information while it is communicated inside cover media files [2–5]. If successful, it raises no suspicion in an external observer that such secret information is present. This is the main reason why steganography has recently received much attention in security research. In addition, steganography has found other uses, such as copyright protection and the prevention of e-document forgery [6], secret image transfers over Clouds [7], and protection of private health records [8], amongst others.

The problem of detecting hidden content was first clearly formulated by Simmons [9], who modelled it as two prisoners attempting to covertly exchange secret messages about a plan to escape from prison, whilst the warden inspects every message communicated. If the warden suspects that a message contains hidden content, the warden destroys the message and sends the two prisoners into solitary confinement. This is known as the prisoners' problem. Steganography also has many real-life applications in politics, diplomacy, and the military [3].

Hiding information with a steganographic procedure requires an embedding algorithm, which takes as input a cover media file and a secret message and embeds the message in the cover, producing a stego-file. On the receiving end, one needs a detection algorithm that identifies a stego-file by confirming the existence of a secret message, and an extraction algorithm that recovers the secret message from the stego-file. The practice of detecting and extracting steganographic content in a stego-file is called steganalysis. However, like any other numerical analysis, steganalysis can produce false results. These fall into two kinds: a false positive, where an image is clean but the analysis tool flags it as carrying a secret message, and a false negative, where an image carries a secret message but the analysis tool flags it as clean.
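The two kinds of false results can be made concrete with a small sketch. The Python below is illustrative only: `ConfusionCounts`, `tally`, `false_negative_rate`, and `false_positive_rate` are hypothetical helper names that do not come from any steganalysis tool, and the sketch assumes that each image has a known ground-truth label and a single clean/stego verdict from the tool.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_positive: int = 0   # stego image flagged as stego
    false_positive: int = 0  # clean image flagged as stego
    true_negative: int = 0   # clean image flagged as clean
    false_negative: int = 0  # stego image flagged as clean


def tally(labels, verdicts):
    """Count outcomes; True means 'carries a secret message'."""
    counts = ConfusionCounts()
    for has_message, flagged in zip(labels, verdicts):
        if has_message and flagged:
            counts.true_positive += 1
        elif has_message:
            counts.false_negative += 1
        elif flagged:
            counts.false_positive += 1
        else:
            counts.true_negative += 1
    return counts


def false_negative_rate(counts):
    """Share of stego images that the tool reported as clean."""
    stego_total = counts.true_positive + counts.false_negative
    return counts.false_negative / stego_total if stego_total else 0.0


def false_positive_rate(counts):
    """Share of clean images that the tool reported as stego."""
    clean_total = counts.true_negative + counts.false_positive
    return counts.false_positive / clean_total if clean_total else 0.0
```

Under these assumptions, a dataset made up only of stego images is enough to estimate the false negative rate, while a dataset of only clean images is enough to estimate the false positive rate.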

Previously, in [10], we presented a study of the false positive rate of the well-known image analysis tool Stegdetect [11]. In this paper, we continue the study of Stegdetect's false rates by investigating the rate of false negatives that the tool exhibits. Understanding the false negative rate is equally important, as it shows how often the tool fails to detect hidden content, which has implications for security and privacy.

The rest of the paper is organised as follows. In Section 2, we discuss related work in the current literature. In Section 3, we give an overview of the methodology we used to conduct our research and describe the datasets used for the analysis. In Section 4, we present the results of the analysis. Finally, in Section 5, we discuss some of the limitations of our experiments, and in Section 6, we conclude the paper and give directions for future work.

#### **2. Related Work**

In terms of information hiding, steganography and watermarking are closely related [12]. Although they share some technical traits, the largest difference lies in their purpose. The former aims at secret communication, while the latter verifies the identity and authenticity of the owner. Refs. [12,13] argue that imperceptibility, robustness, and payload capacity are the parameters of steganography. By comparison, watermarking is concerned above all with robustness, so that watermarks cannot be removed or replaced. These parameters can be used to distinguish steganography from watermarking and cryptography, as well as to compare various types of steganography techniques.

Two groups of people work with steganographic techniques. A steganographer uses analysis tools to check whether a steganographic process has been successful, that is, whether the message is undetectable or unreadable [14]. On the opposite side, a steganalyst attempts to detect and read stego-messages. Either way, steganalysis involves two stages: (1) identifying the existence of steganographic messages and (2) reading the embedded message [15].

Various digital steganography methods have been developed in recent years. What they have in common is that all of them are based on the fundamental concept of embedding secret messages in a cover medium to create an output, the stego-file. There is a wide range of steganography techniques, depending on the type of cover medium (e.g., text, image, video, and audio).

Whether steganography is used by terrorists or criminals has been an ongoing debate [16]. A previous study scanned a couple of million images and identified 20,000 suspicious images using Stegdetect [11]. Although no hidden messages were identified in that research, we cannot categorically conclude that steganography has not been misused by malicious actors. Before drawing such a conclusion, the reliability of the available tools should be examined. However, there has been little research on this.

Detection of steganographic messages does not necessarily have to reveal the hidden content; merely detecting their presence can carry significant implications, because it can draw unwanted attention from opposing parties. As such, the precision of the detection algorithm is one of its important attributes. This has crucial implications for digital forensic analysts. Ref. [17] defined digital forensics as the approved methods used to preserve, collect, validate, identify, analyse, and interpret evidence obtained for a digital investigation. In the digital communication era, criminal investigations of any sort are bound to involve digital devices. To establish facts in court, digital data stored on devices such as computers and smartphones have to be investigated by a forensic analyst.

As malicious actors are equipped with state-of-the-art technologies, forensic analysts have tried to keep pace with them. According to [18], analysts use different methods when investigating digital crime, and these methods must be applied in a forensically sound manner throughout the investigation. Ref. [19] noted that an investigation is successful and acceptable if the evidence obtained from the original source is not altered in any way. Moreover, to increase criminal arrests and convictions, forensic analysts need to consider how to reduce the false negative ratio of a tool. A high false negative ratio means that there is a high probability that a stego-file goes undetected, so that criminals are not identified. In this respect, this study aims to investigate the false negative rates of a steganalysis tool, Stegdetect, in order to examine whether it is a reliable tool for digital forensic analysts. This paper complements an earlier study on the false positive rates of Stegdetect carried out in [10].

Many tools are currently available for exposing hidden content in images. StegExpose [20,21] is an open-source Java-based steganalysis tool. It was designed primarily as a bulk analyser for lossless images, and it works by classifying images as clean or stego according to whether a pre-defined analysis threshold has been exceeded. In its standard operation, StegExpose is aimed at detecting Least Significant Bit (LSB) embedding performed with different methods (nonlinear adaptive encoding, equidistribution, and pseudorandom distribution) by many tools, including SilentEye [22], OpenPuff [23], and OpenStego [24]. StegSecret [25], like StegExpose, is an open-source Java-based tool. It features a very simple graphical user interface and is aimed at high-performance bulk analysis; however, this comes at the cost of customisation, as the tool is less configurable than StegExpose.

More recently, other approaches have been combined with steganalysis techniques to enhance them. For example, the authors in [26] used convolutional neural networks to identify noise in different regions of an image and the relationship between noise in different sub-regions, and then classified images according to whether their features indicate the presence of hidden content. In [27], the authors used ensemble learning methods to construct an effective steganalysis of images, which searches for colour correlativity between pixels and colour channels. Convolutional neural networks were used again in [28], this time as a method to embed hidden content in cover images. Most of these machine learning approaches suffer from a lack of transparency as to how generally the method can effectively embed or analyse hidden content.

#### **3. Methodology**

For this study, we selected the automated steganalysis tool Stegdetect [11], developed by Niels Provos. The purpose of the tool is to identify steganographic content by analysing JPEG images. It is able to detect several steganographic methods (F5 (header analysis), JPHide, Invisible Secrets, OutGuess, and Camouflage) [29]. When analysing JPEG images, it expresses the level of detection confidence by appending stars (\*, \*\*, \*\*\*) to whichever steganographic method is detected. One star means that the level of confidence in the detection of the specific steganographic method is low, two stars mean that the level of confidence is quite good, and three stars indicate a high level of confidence. In this paper, we used Stegdetect Windows version 0.4, which has an easy-to-use graphical interface. The tool's detection rate depends on its sensitivity value, which ranges between 0.1 and 10.0; we considered sensitivities of 0.1, 0.3, 0.5, 0.7, and 10.0. Ref. [10] indicated that the sensitivity values affect the tool's false-negative ratio.
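The star notation described above can be turned into a confidence label with a few lines of code. The sketch below is a hedged illustration, not part of Stegdetect: it assumes that each result is available as a text line of the form `<file> : <method>(<stars>)` or `<file> : negative`, which is modelled on the style of Stegdetect's command-line report and may differ from what the Windows graphical interface displays. The names `parse_line` and `is_missed` are hypothetical.

```python
import re

# Assumed line shape: "<file> : <method>(<stars>)" or "<file> : negative".
DETECTION = re.compile(r"([A-Za-z0-9_]+)\((\*{1,3})\)")

# Star count mapped to the confidence wording used in the text.
CONFIDENCE = {1: "low", 2: "good", 3: "high"}


def parse_line(line):
    """Return (filename, [(method, confidence), ...]) for one report line."""
    filename, _, verdict = line.rpartition(" : ")
    detections = [
        (method, CONFIDENCE[len(stars)])
        for method, stars in DETECTION.findall(verdict)
    ]
    return filename.strip(), detections


def is_missed(line, method="jphide"):
    """True if a known stego image was not reported for the given method."""
    _, detections = parse_line(line)
    return all(name != method for name, _ in detections)
```

For a set of images known to contain embedded data, counting the lines for which `is_missed` returns `True` and dividing by the number of images gives the false negative rate at the sensitivity used for that run.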

To achieve the purpose of the paper, we looked for a popular steganographic method that embeds data in JPEG images and is detectable by Stegdetect. JPHide [30], developed by A. Latham in 1999, is available in both Windows and Linux versions. In this paper, we chose the Windows version 0.5, which has a user-friendly interface. JPHide uses the least significant bits of the discrete cosine transform coefficients to hide data in any image in JPEG format. According to [30], at an insertion rate of 5%, the embedded data will be very difficult to identify in the absence of the original image. Detection of the JPHide method is independent of the size of the message embedded into the image. The process we used to generate stego images is shown below.
