1. Introduction
With the rapid development of digital technology, information security has become particularly important in many critical scenarios, such as military communications, commercial confidentiality protection, and personal privacy protection. Computer security [1] protects access to data and operating systems through a variety of mechanisms; however, traditional techniques can no longer resist some current attacks. Therefore, more and more researchers are exploring new defense techniques such as information hiding [2].
Information hiding techniques, a product of the ancient art of steganography combined with modern cryptography, hinge on a core principle: exploiting the redundancy of multimedia information and the human eye's visual masking of specific information. This allows information to be concealed within a medium, providing robust technical support for encrypted communication and digital content protection. Image steganography [3,4,5], a significant branch of information hiding in the image domain, seeks to transmit secret information discreetly by leveraging the characteristics of human visual perception. Research in image steganography can be approached from various angles: one line of work focuses on embedding secret information into cover images efficiently and securely, while another emphasizes maintaining concealment against sophisticated detection methods, such as machine learning models.
Traditional image steganography algorithms have laid the groundwork for research in this area, integrating sensitive information into cover images through meticulously designed keys and specific embedding techniques. They can generally be classified into two main categories: transform domain algorithms and spatial domain algorithms. Transform domain algorithms operate in the discrete wavelet transform (DWT) domain [6,7,8], the discrete Fourier transform (DFT) domain [9,10,11], and the discrete cosine transform (DCT) domain [12,13,14,15], achieving efficient information embedding through the unique properties of each transform. Spatial domain algorithms include least significant bit (LSB) substitution [16,17], pixel value differencing (PVD) [18], and wavelet obtained weights (WOW) [19], which directly manipulate pixel values to embed information covertly.
However, machine learning techniques, with their powerful learning and generalization capabilities, can now discover traces left by traditional steganography algorithms that are imperceptible to the human visual system, threatening the security of secret information. To overcome this limitation, researchers have actively explored new image steganography methods, such as the adaptive steganography algorithms S-UNIWARD [20] and J-UNIWARD [21] proposed by J. Fridrich et al. Although these two algorithms account for counter-detection factors, their steganography is still challenged by machine learning detectors. In addition, most existing image steganography techniques hide textual information or grayscale images as secret payloads, while relatively little research has addressed color images.
In this context, the proposed VRIS (Visually Robust Image Steganography) model aims to embed color images as secret information into cover images. By introducing random noise, optimizing embedding strategies, and utilizing adversarial training techniques, it seeks to deceive both the human visual system and machine learning detectors. This research not only provides a new direction and ideas for the development of color image steganography but also offers a technological breakthrough in the field of information security. Moreover, it has broad application prospects in military, commercial, and legal domains, safeguarding information security and ensuring the safety and concealment of critical information during transmission.
The main work of this article is as follows:
(1) Proposed VRIS image steganography model: This study proposes a novel image steganography model named VRIS, which aims to deceive both the human visual system and the machine learning model to ensure the security and concealment of secret information during transmission.
(2) Feature processing of secret images: The VRIS model extracts and processes features of the secret image and fuses them at the feature level with the cover image, generating visually indistinguishable encrypted images that successfully deceive the human eye. Meanwhile, adding random Gaussian noise and training adversarially against a discriminator ensures that the encrypted image can also deceive machine learning models.
(3) Extraction of Secret Information, Quality Assessment and Testing of Secretly Reconstructed Image: The VRIS model ensures that legitimate users are able to extract and reconstruct the secret image without any loss, and at the same time prevents unauthorized users from accessing the secret information. In addition, the quality of the encrypted image is evaluated by PSNR and SSIM, and its effectiveness in spoofing machine learning models is verified by classification tests.
2. Related Work
Research on steganography has made significant progress in recent years [22]: the introduction of adversarial examples and the application of Generative Adversarial Networks (GANs), Graph Neural Networks (GNNs), and feature-space hiding strategies have collectively advanced the field. The work of Goodfellow, Szegedy, and others [23,24] revealed vulnerabilities in machine learning models and inspired novel ideas for the advancement of steganography. The DeepFool method, proposed by Balaaditya et al. [25], significantly advanced adversarial sample generation techniques: it can efficiently generate adversarial samples capable of deceiving deep neural networks (DNNs), thereby providing new adversarial strategies for steganography. Moreover, the C&W++ method, proposed by Du et al. [26], not only enhances the generation efficiency of adversarial samples but also assesses the robustness of DNNs, offering a more comprehensive tool for adversarial sample generation and evaluation in the context of steganography.
Building on the concept of GANs, Zhang et al. [27] proposed a GAN-based steganographic image generation method that uses the generative adversarial mechanism to train a generator network producing realistic encrypted images visually indistinguishable from the originals, thereby improving the covertness of steganography. To meet the needs of different application scenarios, Li et al. [28] subsequently proposed a multi-scale GAN method capable of generating encrypted images at different scales; this allows the encrypted image to adapt to different resolutions and sizes while maintaining high covertness, greatly enriching the application scenarios of steganography.
However, despite the significant progress of multi-scale GAN methods in encrypted image generation, further improving the covertness of encrypted images and their resistance to steganalyzers remains an urgent problem. Tang and colleagues proposed ASDL-GAN [29], based on Generative Adversarial Networks (GANs), which enables adversarial training between a generator and a discriminator through unsupervised learning, generating realistic steganographic images. Compared to traditional prior-knowledge-based methods, this approach significantly enhances the imperceptibility of secret images while maintaining the statistical properties of the images; however, its steganographic effect may still be exposed to more advanced steganalyzers. To further improve the concealment of encrypted images and their resistance to steganalyzers, Volkhonskiy and others introduced SGAN [30], which embeds secret information into adversarial examples using a steganalysis network, enhancing the realism of cover images while bolstering the resistance of embedded images against steganalysis detection.
Nevertheless, the semantic shortcomings of SGAN may make embedded images noticeable. To address this issue, Balijia and collaborators designed a new model based on graph neural networks (GNNs) [31], associating the color channels of secret and cover images with a multi-level embedding strategy. This approach achieves high concealment and relatively low distortion, resulting in visually more natural embedded images. On this basis, Zhang and colleagues introduced ISGAN [32], which addresses the color-channel strategy by embedding secret information within the Y channel of cover images, thus avoiding color distortion. Yet there remains room for improvement in image quality and resistance to steganalysis detection.
Targeting the deception and security of learning-based detection, Din R et al. [33] proposed a steganography method based on feature embedding: the secret information is embedded into the features of the image, and the model's dependence on those features is exploited to improve concealment. Meanwhile, Wang, Chen, and others [34,35] proposed a deep learning-based method for semantic information that can detect semantic information embedded in an image, providing a new means for the security assessment of steganography and new ideas for its subsequent development.
Finally, the work of Li et al. [36] focuses on understanding and exploiting vulnerabilities in machine learning models to design covert methods that can evade or attack detection mechanisms. Research in this area not only enhances the effectiveness of steganographic techniques but also holds significant implications for personal privacy and copyright protection. In summary, although these algorithms have achieved remarkable results, they still have certain limitations. The VRIS model proposed in this paper aims to overcome them: by introducing random noise and optimizing the embedding strategy, it hides secret information more efficiently and reduces the risk of detection by both the human visual system and machine learning detectors; by using adversarial training, it enhances resistance to machine learning detection and improves the security and reliability of steganography. Compared with existing methods, the VRIS model offers higher concealment and stronger resistance while maintaining image quality, providing new ideas and methods for the development of steganography (as shown in Table 1).
3. VRIS Image Steganography Model
To achieve dual deception of machine learning models and the human visual system, this study designs the VRIS image steganography model (as shown in Figure 1), aiming for efficient completion of image steganography tasks. VRIS combines the data reconstruction capability of autoencoders with the generative-discriminative framework of Generative Adversarial Networks (GANs), falling under the category of unsupervised learning.
3.1. VRIS Model Architecture and Design
The VRIS model consists of three key components:
Visual-Masker Module: This module extracts features of the secret image using multi-scale convolutional kernels and leverages multi-layer convolution and batch normalization to enhance feature integration [37]. Ultimately, these features are mapped onto the cover image, producing a visually indistinguishable first-level encrypted image.
Hidden-Insight Module: This module extracts features from the second-level encrypted image, processes them through activation functions and batch normalization, and subsequently retrieves the information embedded within the hidden image. These refined features are then transformed into the secretly reconstructed image.
AI-Evasion Shield Module: Serving as a discriminator, this module distinguishes between noisy second-level encrypted images and cover images. Through adversarial training, it refines the steganography strategy to bolster the model’s resilience against machine learning-based deceptions.
To elevate the complexity and confidentiality of the first-level encrypted image, random noise is introduced prior to decoding. This gives rise to the second-level encrypted image, characterized by randomness, controllability, and diversity. Randomness ensures that each noise instance is unique, while controllability facilitates experimental adjustments. By varying the noise intensity, a range of experimental conditions can be explored. The incorporation of random noise serves to obscure image details, enhance the stealth of the hidden image, and fortify the model against potential attacks.
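As a minimal sketch of how this noise-injection step could be implemented in PyTorch (the paper does not publish code, so the clamping to the tanh output range and the default noise factor of 0.05, matching a factor reported in Section 4.3, are our assumptions):

import torch

def add_gaussian_noise(encrypted: torch.Tensor, noise_factor: float = 0.05) -> torch.Tensor:
    """Form the second-level encrypted image by adding zero-mean Gaussian noise
    whose intensity is set by the controllable noise factor."""
    noise = torch.randn_like(encrypted) * noise_factor  # fresh noise on every call (randomness)
    noisy = encrypted + noise
    return noisy.clamp(-1.0, 1.0)  # keep values in the tanh output range (assumption)

Varying noise_factor across runs yields the controllability and diversity described above.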
3.2. Image Preprocessing
Objective: Through a series of preprocessing steps, increase the diversity of the training dataset, enhance the model’s generalization ability, and optimize the model’s learning of image features.
Input: Original image dataset $D = \{I_1, I_2, \ldots, I_N\}$, where N is the total number of images.
Output: Preprocessed image set $D'$.
Steps:
1. Random Crop (Training Set):
For each image in the training set, use the RandomCrop function to randomly select a fixed-size area for cropping, generating a cropped image. Repeating this for every image yields the cropped training set.
2. Center Crop (Testing Set):
For each image in the testing set, use the CenterCrop function to crop a fixed-size area around the center point of the image. Repeating this for every image yields the cropped testing set.
3. Convert to Tensor:
For each image in the cropped training and testing sets, use the ToTensor function to convert it into tensor format, producing the tensorized training and testing sets.
4. Image Normalization:
For each tensor in the tensorized training and testing sets, apply the transforms.Normalize() function. This normalization eliminates scale differences between features by adjusting pixel value ranges, improving the model's learning efficiency on image features.
5. Dataset Random Subsampling:
If the dataset is too large, a random subset can be drawn with a sampling function to obtain a subsampled training set, which reduces the consumption of computing resources and time (as shown in Figure 2 and Figure 3) and improves the efficiency of the training process. A minimal sketch of this preprocessing pipeline is shown below.
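The steps above map naturally onto torchvision's transform pipeline. The following is a minimal sketch under stated assumptions: the 64 × 64 crop size follows Section 4.1, while the normalization statistics and the sampling helper are illustrative.

import random
from torchvision import transforms
from torch.utils.data import Subset

CROP = 64  # images are 64 x 64 in this study (Section 4.1)

train_tf = transforms.Compose([
    transforms.RandomCrop(CROP),                             # step 1: random crop (training set)
    transforms.ToTensor(),                                   # step 3: convert to tensor
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # step 4: normalize (assumed stats)
])

test_tf = transforms.Compose([
    transforms.CenterCrop(CROP),                             # step 2: center crop (testing set)
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

def random_subsample(dataset, k):
    """Step 5: randomly keep k samples to cut compute and training time."""
    return Subset(dataset, random.sample(range(len(dataset)), k))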
3.3. Basic Results of the VRIS Basic Model
3.3.1. Visual-Masker
In designing the Visual-Masker module (as shown in Figure 4 and Algorithm 1), we focused on achieving efficient steganography of secret images within cover images while maintaining the visual naturalness of the cover image. To achieve this, we constructed the module based on convolutional neural networks, dividing it into two parts: feature extraction and information hiding.
Algorithm 1. Algorithm of the Visual-Masker.
1: FUNCTION Initialize_Visual-Masker(input_S, input_C):
2:   # Step 1: Process the secret image through the initial convolutional layers
3:   x1 = leaky_relu(bn1(conv1(input_S)))   # process input_S with conv1, bn1
4:   x2 = leaky_relu(bn2(conv2(input_S)))   # process input_S with conv2, bn2
5:   x2 = PAD(x2, (0, 1, 0, 1))             # pad x2 with zeros at the bottom right
6:   x3 = leaky_relu(bn3(conv3(input_S)))   # process input_S with conv3, bn3
7:   x4 = CONCATENATE(x1, x2, x3)           # concatenate x1, x2, x3 along the channel dimension
8:   # Step 2: Further process feature map x4 through deeper convolutional layers
9:   x1 = leaky_relu(bn4(conv4(x4)))        # process x4 with conv4, bn4
10:  x2 = leaky_relu(bn5(conv5(x4)))        # process x4 with conv5, bn5
11:  x2 = PAD(x2, (0, 1, 0, 1))             # pad x2 with zeros at the bottom right
12:  x3 = leaky_relu(bn6(conv6(x4)))        # process x4 with conv6, bn6
13:  x4_prime = CONCATENATE(x1, x2, x3)     # obtain processed feature map x4'
14:  # Step 3: Concatenate the cover image input_C with feature map x4'
15:  x_combined = CONCATENATE(input_C, x4_prime)  # concatenate along the channel dimension
16:  # Step 4: Further process x_combined through the hidden network
17:  FOR i FROM 1 TO N:                     # N is the number of repetitions
18:    x1_hidden = leaky_relu(bn7(conv7(x_combined)))  # process with conv7, bn7
19:    x2_hidden = leaky_relu(bn8(conv8(x_combined)))  # process with conv8, bn8
20:    x2_hidden = PAD(x2_hidden, (0, 1, 0, 1))        # pad x2_hidden
21:    x3_hidden = leaky_relu(bn9(conv9(x_combined)))  # process with conv9, bn9
22:    x_combined = CONCATENATE(x1_hidden, x2_hidden, x3_hidden)  # concatenate hidden feature maps
23:  x_hidden = x_combined                  # final hidden feature map containing secret image information
24:  # Step 5: Output the final image
25:  output_image = tanh(conv16(x_hidden))  # process with conv16 and apply tanh activation
26:  RETURN output_image                    # first-level encrypted image
First, the feature extraction module preprocesses the secret image through three parallel convolutional layers (conv1, conv2, conv3). Each layer employs a different kernel size (3 × 3, 4 × 4, and 5 × 5). This multi-scale approach is rooted in the varying abilities of kernel sizes to capture image details: the 3 × 3 kernel excels at extracting local information, while the 4 × 4 and 5 × 5 kernels are better at capturing global features. The resulting feature maps are concatenated along the channel dimension to form a richer feature map, which is then concatenated with the cover image in the same dimension. This fuses the processed secret-image features with the cover image; a minimal sketch of such a multi-scale branch is given below.
Figure 4. Visual-Masker module structure.
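A minimal PyTorch sketch of one such multi-scale branch follows; the padding of the even-kernel branch mirrors Algorithm 1, while the channel width and exact layer configuration are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBlock(nn.Module):
    """Three parallel conv branches (3x3, 4x4, 5x5) concatenated along channels."""
    def __init__(self, in_ch=3, out_ch=50):  # out_ch is an illustrative assumption
        super().__init__()
        self.conv3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(in_ch, out_ch, kernel_size=4, padding=1)
        self.conv5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        self.bn3 = nn.BatchNorm2d(out_ch)
        self.bn4 = nn.BatchNorm2d(out_ch)
        self.bn5 = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        x1 = F.leaky_relu(self.bn3(self.conv3(x)))  # local detail (3x3)
        x2 = F.leaky_relu(self.bn4(self.conv4(x)))  # even kernel loses one row/column...
        x2 = F.pad(x2, (0, 1, 0, 1))                # ...so pad it back, as in Algorithm 1
        x3 = F.leaky_relu(self.bn5(self.conv5(x)))  # wider context (5x5)
        return torch.cat([x1, x2, x3], dim=1)       # richer multi-scale feature map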
The information-hiding module is responsible for deeply merging the extracted features of the secret image with the cover image. We utilize a series of convolutional layers and batch normalization layers (from conv7 to conv21) to further process the fused feature map obtained from the feature extraction module. Throughout this process, we not only adjust the dimensions of the feature maps but also enhance feature extraction and incorporate information from the secret image. This ensures that the embedded secret information remains concealed while preserving the cover image’s visual naturalness and authenticity. Additionally, multiple concatenation operations occur to combine outputs from different convolutional layers along the channel dimension, allowing us to form a more complex feature representation.
Finally, the Conv22 layer converts the final feature map into the output image. This convolutional operation reduces the feature map’s channels to three, corresponding to the RGB color channels, and applies the tanh activation function to normalize the output values within the range of −1 to 1, satisfying image data representation requirements.
The design of Visual-Masker follows an end-to-end principle, enabling it to learn the mapping relationship between the input image and its corresponding first-level encrypted output image. Each convolution layer is followed by a batch normalization layer to accelerate the training process and reduce internal covariate shifts. Additionally, at several critical points in the network, we utilize the LeakyReLU activation function, which introduces non-linear features and maintains effective gradient propagation throughout the network. This approach aids the model in learning complex mapping relationships.
With the above design, the Visual-Masker module is able to learn the deep features of the input image and also enables efficient steganography of secret images while maintaining the visual naturalness of the cover images. This design allows secret images to be hidden in the cover images without attracting the attention of the human eye, providing an efficient and stealthy solution to the image steganography task.
3.3.2. AI-Evasion Shield
In the architecture of VRIS, the AI-Evasion Shield module (Table 2) is the key component for improving the steganographic performance of encrypted images. The discriminator at the core of this module adopts a multi-layer cascaded convolutional network structure, with each layer incorporating a LeakyReLU activation function (negative slope set to 0.2). This choice avoids the vanishing-gradient problem, ensures that gradient information propagates effectively even in the deeper layers of the network, accelerates learning, and enhances nonlinear feature extraction and the model's ability to characterize complex image data.
To further enhance the stability and generalization ability of the model, a batch normalization layer is embedded after some of the convolutional layers. This strategy effectively reduces internal covariate shift, accelerates training, and facilitates the extraction of more stable feature representations from the input image; by gradually reducing the spatial dimensionality of the features, it lets the discriminator focus on the most discriminative abstract information in the image.
Compared with the traditional ReLU activation function, the LeakyReLU activation function allows small negative gradients to pass through, thus alleviating the problem of neuron ‘death’, while the batch normalization layer stabilizes the training process and speeds up convergence by normalizing the inputs between layers.
During the training process, the discriminator of AI-Evasion Shield receives two sets of inputs: one is the original cover image, and the other is the second-level encrypted image processed by the VRIS steganography algorithm with the addition of random Gaussian noise. Through a series of convolution, activation and batch normalization operations, the discriminator maps these input images to a scalar value between 0 and 1, which is converted by a sigmoid function and used as the discriminator’s confidence score for the authenticity of the input images. This score not only reflects the discriminator’s judgement on the authenticity of the image, but also provides an important reference for subsequent image processing and steganography techniques.
Through this design, the AI-Evasion Shield discriminator not only learns to differentiate between authentic cover images and second-level encrypted images, but also significantly enhances the difficulty for machine learning models to detect hidden information. This adversarial training approach improves the discriminator’s performance while simultaneously increasing the concealment of hidden information within the encrypted image.
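A minimal sketch of a discriminator of this shape (cascaded strided convolutions, LeakyReLU with slope 0.2, batch normalization after some layers, and a sigmoid confidence output) could look as follows; the channel widths and layer count are illustrative assumptions, not the exact configuration of Table 2.

import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1),     # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1),   # 32x32 -> 16x16
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1),  # 16x16 -> 8x8
    nn.BatchNorm2d(256),
    nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, 8),                         # 8x8 -> 1x1: one scalar per image
    nn.Flatten(),
    nn.Sigmoid(),                                 # confidence score in [0, 1]
)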
3.3.3. Hidden-Insight
The Hidden-Insight module (as shown in Figure 5 and Algorithm 2) is designed to extract the concealed secret image from the second-level encrypted image, serving the needs of legitimate users. Its design leverages the convolutional neural network architectures found in deep learning, exploiting their superior capabilities in feature extraction and representation learning.
Algorithm 2. Algorithm of the Hidden-Insight.
1: FUNCTION Initialize_Hidden_Insight(input_S):
2:   # Step 1: Perform preliminary feature extraction and transformation on the carrier image
3:   x1 = relu(conv1(input_S))            # extract first feature map
4:   x2 = relu(conv2(input_S))            # extract second feature map
5:   x2 = PAD(x2, (0, 1, 0, 1))           # pad x2 to ensure consistent sizes
6:   x3 = relu(conv3(input_S))            # extract third feature map
7:   x_new1 = CONCATENATE(x1, x2, x3)     # concatenate feature maps along the channel dimension
8:   # Step 2: Further transform the concatenated feature map x_new1
9:   x1' = relu(conv4(x_new1))            # extract new features
10:  x2' = relu(conv5(x_new1))            # extract new features
11:  x2' = PAD(x2', (0, 1, 0, 1))         # pad x2'
12:  x3' = relu(conv6(x_new1))            # extract new features
13:  x_new2 = CONCATENATE(x1', x2', x3')  # concatenate feature maps
14:  x_atlas = CONCATENATE(input_S, x_new2)  # concatenate carrier image with x_new2
15:  # Step 3: Repeat the process from Step 2 to deepen the network
16:  FOR i FROM 1 TO 2:                   # repeat twice
17:    x1 = relu(conv7(x_new2))           # extract additional features
18:    x2 = relu(conv8(x_new2))           # extract additional features
19:    x2 = PAD(x2, (0, 1, 0, 1))         # pad x2
20:    x3 = relu(conv9(x_new2))           # extract additional features
21:    x4_deep = CONCATENATE(x1, x2, x3)  # update x4_deep
22:  # Step 4: Transform the final feature map
23:  output_image = tanh(conv16(x4_deep)) # process with conv16 and apply tanh activation
24:  RETURN output_image                  # secretly reconstructed image
In the architecture of the Hidden-Insight module, the convolutional layers significantly reduce the number of parameters through parameter sharing, lowering the risk of overfitting while capturing the local correlations within images, which is crucial for extracting key features of secret images. To cope with the multi-scale characteristics that secret-image information may present, the module uses convolutional kernels of different sizes and configurations in its first few layers to capture features at different scales; this also accelerates training and enhances the model's generalization to different input data.
As the second-level encrypted image processes through the layers, Hidden-Insight gradually refines the core features of the secret image through multiple convolutions, activations (such as ReLU), and feature concatenations. Particularly in the final layers, carefully designed combinations of convolutional layers further enrich the representation and accuracy of the features.
Lastly, to ensure the visual naturalness and authenticity of the output image, the Hidden-Insight module employs a tanh activation function in the final layer. The tanh function normalizes the output values within a suitable range for image representation, yielding a high-quality secretly reconstructed image. This design guarantees that the extracted secret image visually aligns with the original image, satisfying the needs of legitimate users.
3.4. Hybrid Loss Function
3.4.1. Mean Square Error Loss
Mean Squared Error (MSE) is a commonly used regression loss function that measures the difference between predicted and true values. In the image steganography task, our goal is that the encrypted image formed after embedding a secret image into a cover image is as visually similar to the cover image as possible, so as to avoid arousing suspicion. To quantify this difference, we use the mean squared error as the loss function to compute the reconstruction loss between the secret and cover images:

$$\mathrm{MSE} = \frac{1}{T}\sum_{i=1}^{T}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ represents the true value of the ith sample, $\hat{y}_i$ represents the predicted value of the ith sample, and $T$ is the number of samples. Because the MSE is the mean of the squared differences between predicted and true values, it is especially sensitive to large errors.
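In PyTorch this is simply nn.MSELoss; the tensors below are dummies standing in for the real cover and encrypted images:

import torch
import torch.nn as nn

mse = nn.MSELoss()                           # mean of squared element-wise differences
cover_image = torch.rand(4, 3, 64, 64)       # dummy batch of cover images
encrypted_image = torch.rand(4, 3, 64, 64)   # dummy batch of encrypted images
reconstruction_loss = mse(encrypted_image, cover_image)  # penalizes large deviations heavily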
3.4.2. Adam Optimizer
In image steganography tasks, the model usually needs to handle multiple aspects of the image, and the gradients of different parameters may vary greatly during training. Adam is a gradient-descent-based optimization algorithm that combines the ideas of Momentum and RMSprop and has an adaptive learning rate. Its ability to adapt the learning rate keeps the model efficient and stable in the presence of these differences. Moreover, the Adam optimizer considers not only the first-order moment estimate of the gradient (i.e., the mean of the gradient) but also the second-order moment estimate, so it can dynamically adjust the learning rate of each parameter and converge quickly toward the optimal solution during training.
3.4.3. Adam Optimizer Core Formula
Momentum estimation:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$

where $m_t$ represents the first-order estimate of the gradient (momentum), $g_t$ is the current gradient, and $\beta_1$ represents the exponential decay rate of the momentum term.

Uncentred variance estimation:

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

where $v_t$ represents the second-order estimate of the gradient (uncentred variance) and $\beta_2$ represents the exponential decay rate of the variance term.

Parameter update:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$$

where $\theta_t$ represents the parameters, $\eta$ is the learning rate, $\hat{m}_t = m_t/(1-\beta_1^t)$ and $\hat{v}_t = v_t/(1-\beta_2^t)$ are the bias-corrected versions of the first-order and second-order moment estimates of the gradient, respectively, and $\epsilon$ is a small constant used to prevent division by zero.
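This update rule is what torch.optim.Adam implements. A minimal usage sketch matching the settings reported in Section 4.1 (learning rate 0.001 with a ReduceLROnPlateau scheduler) follows; the stand-in model, toy objective, and scheduler hyperparameters are illustrative assumptions.

import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(8, 8)  # stand-in for the VRIS generator/discriminator
optimizer = Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-8)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)

for epoch in range(100):
    x = torch.randn(16, 8)
    loss = ((model(x) - x) ** 2).mean()  # toy objective in place of the real hybrid loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss)  # lowers the learning rate when the monitored loss stalls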
3.4.4. A Variant of the Binary Cross-Entropy Loss
The binary cross-entropy loss function updates the model's parameters through optimization algorithms such as gradient descent, making the model's predictions closer to the true labels. By minimizing the binary cross-entropy loss, we can train more accurate and efficient image steganography models. Its basic form can be expressed as follows:

$$L_{\mathrm{BCE}} = -\left[y \log \hat{y} + (1 - y)\log(1 - \hat{y})\right]$$

where $y$ represents the true label of the sample and $\hat{y}$ represents the predicted probability assigned to the sample by the discriminator.
The discriminator in GAN is essentially a binary classifier that can distinguish whether the input sample is a real sample or a generated sample. Therefore, during training for image steganography tasks, we use a variant of the binary cross-entropy loss function that is often used to measure the accuracy of the model in predicting hidden information.
The loss function of the discriminator is divided into two parts: the first represents the loss of the discriminator's predictions on real samples, and the second the loss of its predictions on generated samples. In general we would like $D(x)$ to be close to 1 and $D(G(z))$ to be close to 0. The loss function of the discriminator is as follows:

$$L_D = -\mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] - \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

where $D$ represents the discriminator, $G$ is the generator, $p_{\mathrm{data}}(x)$ is the real data distribution, and $p_z(z)$ represents the noise distribution.
The generator's goal is to trick the discriminator into producing false predictions; in general, the generator wants to maximize the discriminator's prediction probability for the generated samples. Therefore, in actual training, we maximize $D(G(z))$. The generator's loss function is shown below:

$$L_G = -\mathbb{E}_{z \sim p_z(z)}\left[\log D(G(z))\right]$$
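A minimal sketch of how these two losses could drive one adversarial training step, written with the binary cross-entropy form above (the function names and the detach-based separation of the two updates are our assumptions):

import torch
import torch.nn as nn

bce = nn.BCELoss()

def discriminator_loss(D, cover, stego_noisy):
    """Real covers are labeled 1, noisy second-level stego images 0."""
    real_pred = D(cover)
    fake_pred = D(stego_noisy.detach())  # do not backpropagate into the generator
    return bce(real_pred, torch.ones_like(real_pred)) + \
           bce(fake_pred, torch.zeros_like(fake_pred))

def generator_loss(D, stego_noisy):
    """Maximize D(G(z)) by labeling generated images as real."""
    fake_pred = D(stego_noisy)
    return bce(fake_pred, torch.ones_like(fake_pred))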
4. VRIS Multidimensional Assessment
In this section, the details of the experiment, the selection of the dataset, and the evaluation metrics will be presented.
4.1. Experimental Setup and Dataset
In this study, the Adam optimizer was used to optimize the parameters of the generator and discriminator, the learning rate was set to 0.001, and the ReduceLROnPlateau scheduler was used to adjust the learning rate dynamically. The model was trained with a batch size of 46 for 1000 epochs, with each epoch randomly selecting 3200 images for training. Tiny ImageNet-200, Mini-ImageNet, 256_ObjectCategories, and LFW were selected as experimental datasets. Tiny ImageNet-200 is a small dataset consisting of 200 image categories, each containing 500 training images, 50 validation images, and 50 test images; it is commonly used for training and evaluating image classification models. Mini-ImageNet is also a dataset for small-sample image classification, consisting of a subset of ImageNet categories with 600 images each. Compared to Tiny ImageNet-200, the images of Mini-ImageNet are larger and clearer; the dataset is commonly used in few-shot learning, meta-learning, transfer learning, and adversarial training. The 256_ObjectCategories dataset contains 256 different object categories, collected from web image searches and manually filtered, and is mainly used for tasks such as image classification, object recognition, and image segmentation. The LFW dataset is a widely used face recognition dataset containing real-world face images from the Internet with large variations in pose, expression, and lighting conditions. In this study, all images from these datasets were resized to 64 × 64 pixels.
4.2. Visual Quality Assessment
4.2.1. Evaluation Indicator Selection
1. PSNR
PSNR (Peak Signal-to-Noise Ratio) is an indicator used to evaluate image quality; in image steganography it measures the quality degradation of encrypted images relative to the originals. It is derived by calculating the Mean Squared Error (MSE) between the original and encrypted images and normalizing it to the maximum pixel value of the image. A higher PSNR value indicates a smaller quality difference between the encrypted and original images, meaning that the embedding process has a minimal impact on image quality and the secret information is effectively hidden. PSNR focuses on pixel-level error, so in image steganography the PSNR value directly reflects how well image quality is maintained after embedding the secret image.
2. SSIM
SSIM is an image quality metric that simulates the human visual system, evaluating image similarity in terms of brightness, contrast, and structure. In image steganography, SSIM quantifies the similarity between the encrypted and original images and evaluates the influence of the embedded information on image quality. A higher SSIM value indicates that the image maintains better visual quality after embedding while still effectively hiding the secret information. Compared with PSNR, SSIM pays more attention to image structure and provides an assessment more in line with human perception.
3. Noise factor
In this study, the noise factor is an important parameter controlling the intensity of the random Gaussian noise; adjusting its value affects the performance of the image steganography model and produces encrypted images of different quality. A minimal sketch of the PSNR and SSIM computations is given below.
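The sketch below shows how PSNR can be computed from the MSE and how SSIM can be obtained from scikit-image (structural_similarity with the channel_axis argument requires scikit-image ≥ 0.19); the dummy images are illustrative.

import torch
from skimage.metrics import structural_similarity

def psnr(original, encrypted, max_val=1.0):
    """PSNR derived from the MSE between two images with pixels in [0, max_val]."""
    mse = torch.mean((original - encrypted) ** 2)
    return float(10 * torch.log10(max_val ** 2 / mse))

cover = torch.rand(3, 64, 64)
stego = (cover + 0.01 * torch.randn_like(cover)).clamp(0, 1)  # dummy encrypted image
print(psnr(cover, stego))
print(structural_similarity(cover.permute(1, 2, 0).numpy(),
                            stego.permute(1, 2, 0).numpy(),
                            channel_axis=-1, data_range=1.0))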
4.2.2. Results of the Experiment
Robustness is a crucial metric for evaluating the resistance of image steganography models against various attacks and noises. In assessing the robustness of the VRIS model, we selected the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) as the two primary performance evaluation criteria.
In this study, we evaluate the performance of the VRIS model by comparing PSNR and SSIM across datasets under different noise factors (as shown in Table 3). On Tiny ImageNet-200, PSNR fluctuates slightly as the noise factor increases while SSIM remains relatively stable, indicating that the VRIS model is robust in maintaining image structure even as random noise affects pixel-level fidelity. On Mini-ImageNet and LFW, both PSNR and SSIM remain relatively stable as the noise factor increases, showing that the VRIS model can resist the interference of random Gaussian noise while maintaining the high quality of the encrypted image. On 256_ObjectCategories, PSNR fluctuates more under different noise factors while SSIM remains relatively stable, implying that on this dataset the VRIS model is more sensitive to changes in luminance and contrast, while still showing strong stability in preserving image structure.
We compared the PSNR and SSIM of encrypted images generated by the VRIS model with those of encrypted images generated by state-of-the-art steganography models (shown in Table 4). The VRIS model performs better on the ImageNet and LFW datasets, with PSNR and SSIM values higher than or equal to those of ISGAN, SGAN, and the method in the literature [38]. This indicates that VRIS matches or outperforms the compared methods in terms of both image quality and similarity.
To further validate the quality of images generated by VRIS, we compared the per-channel histograms of the cover image and the first-level encrypted image. The results are illustrated in Figure 6. The x-axis of each histogram represents pixel values, corresponding to the brightness or color values of each pixel in the image; the y-axis indicates frequency, i.e., the number of pixels with the same value. The histograms thus provide a clear view of the brightness and color distribution before and after steganography. The minor differences in channel histograms before and after steganography indicate that the proposed steganographic model possesses good embedding capacity and quality: it effectively maintains the visual quality of the images after embedding the secret image into the cover image, avoiding noticeable changes or distortions.
4.3. Security Analysis
4.3.1. Evaluation Indicator Selection
The misclassification rate is a key metric for evaluating the performance of a classification model: it is the ratio of the number of misclassified samples to the total number of samples. In the image steganography task, the misclassification rate measures how effectively encrypted images deceive the target machine learning model; specifically, it is the proportion of images containing steganographic information that successfully mislead a machine learning classifier into producing a wrong classification. A high misclassification rate indicates that the steganographic embedding and encoding are highly effective at evading machine learning detection. A minimal sketch of this computation follows.
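As a minimal sketch (the classifier stands for any pretrained detector, such as torchvision's InceptionV3 or ResNet50; names are illustrative):

import torch

@torch.no_grad()
def misclassification_rate(classifier, stego_images, true_labels):
    """Fraction of stego images that the classifier labels incorrectly."""
    classifier.eval()                               # inference mode
    preds = classifier(stego_images).argmax(dim=1)  # predicted class per image
    return (preds != true_labels).float().mean().item()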
4.3.2. Experimental Results
Security is also a vital metric in evaluating image steganography models. In this study, we enhanced the security of the image steganography algorithm by adding random noise to the first-level encrypted image, thereby increasing the difficulty for unauthorized users to extract the secret information. The noise prevents unauthorized users from directly extracting the hidden secret information from the encrypted image without additional processing, making the secret information harder to analyze and extract. Without sufficient information and keys, unauthorized users find it difficult to restore the original secret information, thereby protecting its confidentiality. The specific experimental results are presented in Table 5.
In summary, by adding noise to the first-level encrypted image, we can confuse unauthorized users' analysis of the image content, enhancing the security of the image steganography algorithm. Moreover, the misclassification rate is a crucial evaluation metric for image steganography models. We selected InceptionV3 and ResNet50 to measure the misclassification rates of second-level encrypted images generated by our steganography model under varying noise intensities. According to the experimental results (Table 6), for InceptionV3 the misclassification rate shows an overall increasing trend as the noise factor rises, reaching 87.18% on the LFW dataset. For ResNet50, the misclassification rate also fluctuates under different noise factors, peaking at a notable 99.24% when the noise factor is 0.05. This indicates that our proposed image steganography model can effectively deceive machine learning models into erroneous classifications while concealing secret images, demonstrating significant security and robustness. This is crucial for protecting steganographic information and privacy, especially in the transmission and storage of sensitive data.
In addition, to further verify that the VRIS model also has significant advantages in effectiveness and robustness, we conducted a comparative analysis of multiple steganographic algorithms using the well-established steganalysis algorithms SRM and XuNet. The experimental results show that the VRIS model achieves a higher misclassification rate than the other algorithms under both steganalysis algorithms, indicating that VRIS has a stronger ability to evade steganalysis detection (as shown in Table 7).
4.4. Analysis of Concealment
To better demonstrate the pixel differences between the encrypted image and the cover image, we enhanced the residual images of the encrypted image and the cover image by factors of 5, 10, 15, and 20 (Figure 7). As the residual images are enhanced by larger factors, the noise points gradually increase and become more scattered, making it more difficult to identify valid information. This helps protect the privacy of the carrier image and the secret image and prevents unauthorized access to sensitive information. Furthermore, the scattered noise points interfere with the vision of unauthorized users, making it difficult for them to distinguish the information in the image, thereby increasing the difficulty of analysis and ensuring the concealment of the secret image. A minimal sketch of the residual-enhancement computation follows.
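A minimal sketch of this residual enhancement, assuming pixel values in [0, 1]:

import torch

def enhanced_residual(cover, stego, factor):
    """Amplify the absolute cover/stego difference so it becomes visible for inspection."""
    residual = (stego - cover).abs()
    return (residual * factor).clamp(0.0, 1.0)

# The magnifications used in Figure 7:
# views = [enhanced_residual(cover, stego, k) for k in (5, 10, 15, 20)]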
We compared the residual results of the method proposed in the literature [38], the VRIS model, and the ISGAN model (shown in Figure 8). Generally speaking, the color depth of a residual image reflects the degree of pixel difference between the encrypted and cover images: the darker the color, the smaller the pixel difference, the less the image distortion, and the better the visual quality of the encrypted image. At a residual multiplier of 1, the infrared imaging features of the face can already be vaguely identified in the method from the literature [38], while the residual images generated by both the VRIS model and the ISGAN model show better concealment. As the residual magnification increases, the differences become more obvious: at magnifications of 5 and 10, the cover images can be almost completely recognized in the residual images of the ISGAN model and the literature [38]. Although some detailed differences can also be observed in the residual images of the VRIS model, the degree of difference is significantly lower, and the cover images cannot be recognized. This suggests that, compared with the literature [38] and the ISGAN model, the VRIS model maintains the visual quality of the cover images while embedding the secret images, resulting in a higher degree of concealment.
6. Summary and Future Prospects
This paper introduces VRIS, an image steganography model capable of simultaneously deceiving both the human visual system and machines. By replacing the generator role in Generative Adversarial Networks (GANs) with an autoencoder, VRIS achieves the steganography of secret images. Random Gaussian noise is added to the encrypted image to ensure that unauthorized users cannot access the specific information of the secret image, thereby preserving its security and concealment. Experimental validation of the misclassification rates of the encrypted images generated by VRIS on InceptionV3 and ResNet50 demonstrates its strong security and robustness, particularly against various attacks, including those based on deep learning models.
However, as the noise factor increases, slight color differences emerge between the reconstructed secret image and the original, potentially compromising the integrity and accuracy of the secret information, particularly in applications where image quality is stringently required. Therefore, future research should focus on reducing the noise introduced during embedding through noise suppression techniques or enhancing the quality and accuracy of the secretly reconstructed image by optimizing the embedding algorithm. Furthermore, considering the various interference factors that may exist in practical application scenarios, deep learning methods can be explored to improve the robustness of the embedding algorithm, enabling effective embedding and extraction of secret images in diverse environments.
Given that the current VRIS model primarily focuses on image steganography, future endeavors can explore extending the model to multimodal data such as video and audio to cater to diverse application needs. This necessitates addressing the unique challenges of multimodal data, including synchronization and continuity while maintaining high concealment.