Article

A Fast Selection Based on Similar Cross-Entropy for Steganalytic Feature

1 SanQuan College, Xinxiang Medical University, Xinxiang 453003, China
2 College of Software, Henan Normal University, Xinxiang 453007, China
3 College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
* Author to whom correspondence should be addressed.
Symmetry 2021, 13(9), 1564; https://doi.org/10.3390/sym13091564
Submission received: 20 July 2021 / Revised: 20 August 2021 / Accepted: 20 August 2021 / Published: 25 August 2021
(This article belongs to the Section Computer)

Abstract

The mutual confrontation between image steganography and steganalysis causes both to iterate continuously; as a result, the dimensionality of steganalytic features keeps increasing, leading to an ever-growing spatio-temporal overhead. To this end, this paper proposes a fast steganalytic feature selection method based on a similar cross-entropy. Firstly, the properties of cross-entropy are investigated and, through the discussion of different models, an intra-class similarity criterion and an inter-class similarity criterion based on cross-entropy are presented for the first time. Then, following the design principles of Fisher's criterion, a feature contribution criterion is further proposed. Secondly, the variation of the univariate cross-entropy function is analyzed in principle, which determines the normalization range and simplifies the subsequent analysis. Then, within the normalized range, the variation of the binary cross-entropy function is investigated and the settings of the important parameters are determined. Thirdly, the concept of similar cross-entropy is presented by analyzing how the value of the feature contribution measure changes under different circumstances, and on this basis the feature contribution criterion is updated to decrease the computational complexity. Remarkably, the contribution criterion devised in this paper has a symmetrical structure, which equitably measures the contribution of features in different situations. Fourthly, the feature components with the highest contributions are selected as the final features according to the results of the feature metric. Finally, feature selection is carried out on eight low- and high-dimensional steganalytic features over the BOSSbase 1.01 image database, the standard and recognized database in steganalysis. Extensive experiments and comparisons with several classic and state-of-the-art methods show that the proposed method attains competitive or even better performance in detection accuracy, computational cost, storage cost and versatility.

1. Introduction

Image steganography [1,2,3,4] refers to the use of an algorithm to embed secret information in an image for covert communication. In this regard, the original image is called the cover image and the image embedded with secret information is called the stego image. On the one hand, steganography ensures the privacy of subscribers and special confidential units; on the other hand, it can be used by unscrupulous elements to compromise public security [5,6,7]. This has led to the advent of steganalysis [8,9,10,11], which uses feature extraction algorithms to analyze image characteristics and then uses classifiers to distinguish cover images from stego images, thereby safeguarding national and public security. Common steganalytic features include the 548-D CC-PEV feature (the PEV feature enhanced by Cartesian calibration) proposed by Kodovský et al. [11], the 8000-D DCTR feature (Discrete Cosine Transform Residual) proposed by Holub et al. [12], the 17,000-D GFR feature (JPEG rich model utilizing Gabor filters) proposed by Song et al. [9], the 22,510-D CC-JRM feature (Cartesian-calibrated JPEG-domain rich model) proposed by Kodovský et al. [13], and the 34,671-D SRM feature (full spatial-domain rich model) proposed by Fridrich et al. [14].
Nevertheless, with the rapid development of adaptive steganography [15,16,17], steganalysis needs to extract features from different scales and orientations [18,19,20] to improve the detection accuracy of stego images, which leads to ever-increasing feature dimensions [13,21,22] and hence huge computational and storage overheads.
To efficiently distinguish stego images from cover images, researchers have devised feature selection methods [23,24,25]. Depending on their scope of application, existing feature selection methods can be divided into two categories: specific feature selection methods and general feature selection methods.
A specific feature selection method [25,26] works only on one or a few steganalytic features and thus generalizes weakly. For example, Yang et al. [25] proposed a feature subspace selection method based on Fisher's criterion (SSFC), which first calculates the Fisher value and probability value of individual feature components, then calculates the weight of each feature component, and finally selects the feature components whose probability values are proportional to their weights as the final features; this improves the detection accuracy of GFR features to a certain extent. Yu et al. [26] proposed a multi-scale feature selection method for the steganalytic feature GFR (SRGS), which first uses the SNR criterion to identify and remove useless features, then adapts the Relief algorithm to measure the importance of the remaining features, and finally takes the important features as the final selected features. Experiments verified that the algorithm reduces a certain number of dimensions while improving detection accuracy.
A general feature selection method [24,27,28,29,30,31,32] achieves excellent results for most existing steganalytic features. For example, Qin et al. [27] devised a principal component analysis-based feature selection method (PCA-based), which first calculates the mean value of each feature component, then computes the covariance matrix and its eigenvalues, arranges the feature components in descending order of eigenvalue, and finally selects feature components of a specified dimension as the final features. Wang et al. [28] devised a comprehensive criterion-based feature selection method (CGSM) guided by a disparity function and Pearson coefficients, which first selects the feature components with large disparity and then removes the interference of redundant features, reducing the feature dimensionality while slightly improving the detection accuracy for stego images. Ma et al. [24] proposed a feature selection method based on decision rough set α-positive region approximation, which first applies rough set theory to steganalytic feature selection, then uses the attribute separability measure (ASM) criterion to measure the separability of feature components and extends it to feature vectors, and finally selects the dominant features based on the classifier; experiments demonstrated that the method significantly reduces the dimensionality of some features.
Even though the above feature selection methods have attained certain results, problems remain: feature dimensionality is still high, selection time is still long, and detection accuracy is still low, which limits their application in practice [33,34,35].
To solve the above problems, this paper devises a fast selection method for steganalytic features based on similar cross-entropy (FSCE). Specifically, the properties of cross-entropy are first investigated and the intra-class similarity criterion and inter-class similarity criterion are proposed, followed by a feature contribution criterion modeled on Fisher's criterion. Secondly, the variation of the univariate and binary cross-entropy functions is analyzed separately to settle the setting of the important parameters. Finally, the feature components with high contributions are selected as the final features based on the results of the feature metric.
Remarkably, feature selection in image steganalysis differs from regular feature selection in two respects: on the one hand, we use two symmetrical sets of images, i.e., cover and stego images, during training and testing; on the other hand, during feature selection we analyze two symmetrical sets of features, i.e., cover and stego features, for the convenience of calculation.
To verify the effectiveness and efficiency of FSCE, a large number of experiments are carried out on the BOSSbase 1.01 image database [36] (which contains 10,000 grayscale images of size 512 × 512), the only standard and recognized image database in the steganalysis field. These include: firstly, a comparison among the features selected under different thresholds, to determine the final selection threshold used in this paper; secondly, a comparison with the original steganalytic features as well as randomly selected features of the same dimension; finally, a comparison with several classical and state-of-the-art fast feature selection methods. The effectiveness, efficiency and generality of FSCE are verified by this large number of experiments.
The rest of this paper is organized as follows. Section 2 introduces the related work. Section 3 investigates and analyzes the variation of univariate and binary cross-entropy, devises the concept of similar cross-entropy for the first time, and then introduces the FSCE method. A series of comparative experiments is conducted in Section 4 to verify the effectiveness, efficiency and generality of FSCE. Finally, a summary of the whole paper is presented.

2. Materials and Methods

To devise an appropriate and rational feature selection method, this section presents the background knowledge needed for the methods in this paper. In particular, Section 2.1 introduces the Fisher criterion, the most popular criterion today, and Section 2.2 presents the concept of information entropy, from which the principle of cross-entropy and its properties are introduced.

2.1. Fisher Criterion

The Fisher criterion was first introduced to the field of image steganalysis by Yang et al. [25]; it uses the idea of "intra-class aggregation and inter-class dispersion" to measure the separability of feature components for the selection of important features. The feature selection process requires measuring the performance of each feature component, which is related to the rate of change of feature values and the statistical dispersion within the different feature classes. The Fisher criterion, the classical method for measuring feature discrimination in pattern recognition, considers not only the deviation between different classes of images but also the dispersion of features within each class. As a result, it is widely used in feature selection [25,29]. The formula is as follows:
$$\mathrm{Fisher}(f_i)=\frac{(\mu_{f_i}^{c}-\mu_{f_i}^{s})^{2}}{(\sigma_{f_i}^{c})^{2}+(\sigma_{f_i}^{s})^{2}} \quad (1)$$
where $f_i$ represents the $i$th feature component, $\mathrm{Fisher}(f_i)$ represents the Fisher value of $f_i$, $\mu_{f_i}^{c}$ and $\mu_{f_i}^{s}$ represent the mean value of $f_i$ in the cover class and the stego class, respectively, and $\sigma_{f_i}^{c}$ and $\sigma_{f_i}^{s}$ represent the corresponding standard deviations. $(\mu_{f_i}^{c}-\mu_{f_i}^{s})^{2}$ represents the inter-class distance of $f_i$ and $(\sigma_{f_i}^{c})^{2}+(\sigma_{f_i}^{s})^{2}$ represents the intra-class distance of $f_i$. Remarkably, the larger $\mathrm{Fisher}(f_i)$, the greater the contribution of $f_i$ to distinguishing stego images from cover images, i.e., the more likely it is to be important.
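For concreteness, the following is a minimal sketch of Equation (1) in Python/numpy (the paper's experiments use Matlab; the array layout here, one row per image and one column per feature component, is an assumption made for illustration):

```python
import numpy as np

def fisher_score(fc: np.ndarray, fs: np.ndarray) -> np.ndarray:
    """Per-component Fisher value of Equation (1).

    fc, fs: (M, N) arrays holding the cover and stego features of
    M images and N feature components (hypothetical layout).
    Returns an (N,) array; larger values suggest more useful components.
    """
    inter = (fc.mean(axis=0) - fs.mean(axis=0)) ** 2   # inter-class distance
    intra = fc.std(axis=0) ** 2 + fs.std(axis=0) ** 2  # intra-class distance
    return inter / (intra + 1e-12)                     # guard against zero variance

# Example: rank 5 components of 100 random cover/stego samples.
rng = np.random.default_rng(0)
fc, fs = rng.random((100, 5)), rng.random((100, 5)) + 0.1
print(np.argsort(fisher_score(fc, fs))[::-1])  # components, most separable first
```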

2.2. Cross-Entropy

In 1948, Shannon introduced the concept of information entropy, which solved the problem of quantifying information. The basic principle is that the greater the effect a feature has on the clustering of data, the more information it carries, i.e., the greater its entropy value. The formula is as follows:
$$H_I(X)=-\sum_{i=1}^{n} p_i \log_2(p_i) \quad (2)$$
where $X=[X_1,X_2,\ldots,X_{n-1},X_n]$, $X_i$ represents the $i$th feature, $H_I(X)$ represents the information entropy of $X$, and $p_i$ represents the probability value of $X_i$. Notably, the larger $H_I(X)$, the more information $X$ carries, i.e., the more likely the features in $X$ are to be useful.
Even though information entropy can measure the information carried by a single feature set, it cannot capture the difference information between two feature sets. For this reason, cross-entropy [37,38] was created; it is formulated as follows:
$$H(P,Q)=-\sum_{i=1}^{n} p_i \ln(q_i) \quad (3)$$
where $H(P,Q)$ represents the cross-entropy of $P$ to $Q$, $p_i$ represents the $i$th value of $P$ and $q_i$ represents the $i$th value of $Q$. Remarkably, the smaller $H(P,Q)$, the less information there is about the difference between $P$ and $Q$, i.e., the more similar $P$ and $Q$ are.
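As an illustration of Equations (2) and (3), the hedged Python sketch below computes both quantities for small probability vectors; the example data are hypothetical:

```python
import numpy as np

def information_entropy(p: np.ndarray) -> float:
    """Shannon entropy of Equation (2): H_I = -sum p*log2(p)."""
    p = p[p > 0]                    # 0*log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

def cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """Cross-entropy of Equation (3): H(P,Q) = -sum p*ln(q)."""
    return float(-np.sum(p * np.log(q)))

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.4, 0.4, 0.2])
print(information_entropy(p))       # 1.5 bits
print(cross_entropy(p, p))          # minimum over q: reached when Q = P
print(cross_entropy(p, q))          # larger, since Q differs from P
```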

3. FSCE

Feature selection, also known as feature dimensionality reduction, aims to reduce the number of features while maintaining or even improving the detection accuracy of stego images and speeding up feature selection. To measure the contribution of a single feature component, this paper takes the cross-entropy principle as a guide to construct a similar cross-entropy-based feature contribution criterion. Specifically, Section 3.1 devises the feature contribution criterion, which provides a sound basis for selecting feature components with high contributions. Section 3.2 discusses the setting of some important parameters. Section 3.3 presents the overall process of the algorithm with a performance analysis, and Section 3.4 illustrates the advantages of FSCE.

3.1. Contribution Probing

Drawing on the construction of Fisher's criterion, in this section we attempt to construct intra-class and inter-class similarity criteria for the feature components utilizing the cross-entropy principle. Specifically, Section 3.1.1 introduces some important notation used in this paper, Section 3.1.2 proposes the construction of the intra-class similarity criterion, Section 3.1.3 introduces the construction of the inter-class similarity criterion, and Section 3.1.4 introduces the feature contribution criterion.

3.1.1. Symbol Description

$F=[F^c,F^s]^T=[f_1,f_2,\ldots,f_i,\ldots,f_{N-1},f_N]$ represents all the steganalytic features considered here. $F^c=[f_1^c,f_2^c,\ldots,f_i^c,\ldots,f_{N-1}^c,f_N^c]$ and $F^s=[f_1^s,f_2^s,\ldots,f_i^s,\ldots,f_{N-1}^s,f_N^s]$ represent all cover and stego features, respectively. $N=|F^c|=|F^s|$ represents the feature dimension, i.e., the number of feature components. $f_i=[f_i^c,f_i^s]^T$ represents the set of all eigenvalues of the $i$th feature component. $f_i^c=[f_{i,1}^c,f_{i,2}^c,\ldots,f_{i,j}^c,\ldots,f_{i,M-1}^c,f_{i,M}^c]$ represents the cover feature of $f_i$ and $f_i^s=[f_{i,1}^s,f_{i,2}^s,\ldots,f_{i,j}^s,\ldots,f_{i,M-1}^s,f_{i,M}^s]$ represents the stego feature of $f_i$. $M=|f_i^c|=|f_i^s|$ represents the total number of cover/stego images.

3.1.2. Construction of Intra-Class Similarity Criterion

Based on the cross-entropy principle in Section 2.2, we propose utilizing this principle to assess the intra-class similarity of a single feature component. A single feature component contains two classes, cover features and stego features, whose eigenvalues are not identical. To measure the similarity within the two categories of a single feature component more appropriately, this paper defines the two categories in cross-entropy following the model of Equation (4) and uses Equation (5) to measure the intra-class similarity $D_{In}(f_i)$ of a single feature component.
Remarkably, we discuss two cases in Equation (4): when discussing the intra-class similarity of cover features, both $P$ and $Q$ in Equation (3) are $f_i^c$ (the upper part of Equation (4)), and the cross-entropy $H(P,Q)=H(f_i^c,f_i^c)$ is the intra-class similarity of the cover features. Correspondingly, when discussing the intra-class similarity of stego features, both $P$ and $Q$ in Equation (3) are $f_i^s$ (the lower part of Equation (4)), and the cross-entropy $H(P,Q)=H(f_i^s,f_i^s)$ is the intra-class similarity of the stego features.
$$\begin{cases} f_i^c \to P \ \mathrm{and} \ f_i^c \to Q, & \mathrm{when\ comparing}\ f_i^c \ \mathrm{with}\ f_i^c \\ f_i^s \to P \ \mathrm{and} \ f_i^s \to Q, & \mathrm{when\ comparing}\ f_i^s \ \mathrm{with}\ f_i^s \end{cases} \quad (4)$$
$$D_{In}(f_i)=H(f_i^c,f_i^c)+H(f_i^s,f_i^s) \quad (5)$$
where $f_i$ represents the $i$th feature component, $D_{In}(f_i)$ represents the intra-class similarity of $f_i$, $H(f_i^c,f_i^c)$ represents the cross-entropy within the cover features and $H(f_i^s,f_i^s)$ represents the cross-entropy within the stego features. Remarkably, the smaller $D_{In}(f_i)$, the better the intra-class aggregation of $f_i$, i.e., the greater the contribution of $f_i$.

3.1.3. Construction of Inter-Class Similarity Criterion

Similarly, for the construction of the inter-class similarity criterion, this paper measures the similarity between the cover feature values and the stego feature values of a feature component. However, cross-entropy apparently does not satisfy the commutative law, i.e., $H(P,Q)\neq H(Q,P)$. This paper therefore treats the cover class and the stego class of a single feature component as $P$ and $Q$ in turn, as illustrated in Equation (6), and the inter-class similarity of a single feature component is measured using Equation (7).
Remarkably, we discuss two cases in Equation (6): when the cross-entropy of cover to stego is measured, $P$ and $Q$ in Equation (3) are $f_i^c$ and $f_i^s$, respectively (the upper part of Equation (6)). Correspondingly, when the cross-entropy of stego to cover is measured, $P$ and $Q$ in Equation (3) are $f_i^s$ and $f_i^c$, respectively (the lower part of Equation (6)):
$$\begin{cases} f_i^c \to P \ \mathrm{and} \ f_i^s \to Q, & \mathrm{when\ comparing}\ f_i^c \ \mathrm{with}\ f_i^s \\ f_i^s \to P \ \mathrm{and} \ f_i^c \to Q, & \mathrm{when\ comparing}\ f_i^s \ \mathrm{with}\ f_i^c \end{cases} \quad (6)$$
$$D_{It}(f_i)=H(f_i^c,f_i^s)+H(f_i^s,f_i^c) \quad (7)$$
where $f_i$ represents the $i$th feature component, $D_{It}(f_i)$ represents the inter-class similarity of $f_i$, $H(f_i^c,f_i^s)$ represents the cross-entropy of the cover features to the stego features and $H(f_i^s,f_i^c)$ represents the cross-entropy of the stego features to the cover features. Remarkably, the larger $D_{It}(f_i)$, the better the inter-class dispersion of $f_i$, i.e., the greater the contribution of $f_i$.
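The two criteria of Equations (5) and (7) follow directly from Equation (3); the snippet below is an illustrative Python sketch, assuming the values of one feature component have already been scaled into $(0,1/e]$ (see Section 3.2) so that the logarithm is defined:

```python
import numpy as np

def h(p: np.ndarray, q: np.ndarray) -> float:
    """Cross-entropy of Equation (3) applied to two feature-value
    sequences; values must lie in (0, 1/e] to keep the log defined."""
    return float(-np.sum(p * np.log(q)))

def d_in(fic: np.ndarray, fis: np.ndarray) -> float:
    """Intra-class similarity, Equation (5): smaller = tighter classes."""
    return h(fic, fic) + h(fis, fis)

def d_it(fic: np.ndarray, fis: np.ndarray) -> float:
    """Inter-class similarity, Equation (7): larger = better dispersion."""
    return h(fic, fis) + h(fis, fic)

# Toy component: cover and stego values already scaled into (0, 1/e].
fic = np.array([0.10, 0.12, 0.11])
fis = np.array([0.30, 0.28, 0.31])
print(d_in(fic, fis), d_it(fic, fis))
```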

3.1.4. Feature Contribution Metric

Based on the above intra-class and inter-class similarity criteria, and in order to observe the contribution of individual features more easily, we combine the two criteria into a feature contribution criterion shaped akin to Fisher's criterion [29], whose formula is shown in Equation (8).
$$FCS(f_i)=\frac{D'_{It}(f_i)}{D'_{In}(f_i)}=\frac{H'(f_i^c,f_i^s)+H'(f_i^s,f_i^c)}{H'(f_i^c,f_i^c)+H'(f_i^s,f_i^s)} \quad (8)$$
$$D'_{It}(f_i)=H'(f_i^c,f_i^s)+H'(f_i^s,f_i^c) \quad (9)$$
$$H'(f_i^c,f_i^s)=-\sum_{j=1}^{M/2}\mu_1 f_{i,j}^c \ln(\mu_2 f_{i,j}^s),\quad H'(f_i^s,f_i^c)=-\sum_{j=1}^{M/2}\mu_1 f_{i,j}^s \ln(\mu_2 f_{i,j}^c) \quad (10)$$
$$D'_{In}(f_i)=H'(f_i^c,f_i^c)+H'(f_i^s,f_i^s) \quad (11)$$
$$H'(f_i^c,f_i^c)=-\sum_{j=1}^{M/2}\mu_3 f_{i,j}^c \ln(\mu_4 f_{i,j}^c),\quad H'(f_i^s,f_i^s)=-\sum_{j=1}^{M/2}\mu_3 f_{i,j}^s \ln(\mu_4 f_{i,j}^s) \quad (12)$$
where $FCS(f_i)$ represents the overall contribution of $f_i$; $D'_{It}(f_i)$ is a deformation of $D_{It}(f_i)$ standing for the inter-class similarity of $f_i$, calculated from Equations (9) and (10); and $D'_{In}(f_i)$ is a deformation of $D_{In}(f_i)$ standing for the intra-class similarity of $f_i$, calculated from Equations (11) and (12). Since we are only exploring the variation of $D'_{It}(f_i)/D'_{In}(f_i)$, we need to control the trend of the cross-entropy, for which we introduce four parameters: $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$. For simplicity, we let $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ take only the values 1 and −1 (in fact, they could take other values, but the simpler −1 and 1 are chosen because they are only used to change the trend) to control the reasonableness of $FCS(f_i)$. The specific deformations and the resulting changes are discussed and analyzed in Section 3.2. Notably, the larger $FCS(f_i)$, the more useful $f_i$ is for detecting stego images, i.e., the more it should be retained.

3.2. Parameter Setting

For Equation (8), in order to satisfy the idea of "intra-class aggregation and inter-class dispersion", we must set the parameters $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ appropriately, so that the inter-class similarity does not conflict with the intra-class similarity and the contribution of each feature component to classification can be determined more reliably.
To this end, we first investigated the variation of the simpler monadic function $H(X,X)=-x_i\ln(x_i)$ on $[0,1]$, which takes the shape of $H(f_i^c,f_i^c)$ in the intra-class similarity. It is easy to see that $H(X,X)$ shows an increasing and then decreasing trend within $[0,1]$, with the extreme point at $(1/e,1/e)$. Based on this, and for simplicity of calculation, we normalize the steganalytic feature values via Equation (13) to restrict them to $[0,1/e]$, where $H(X,X)$ is monotonically increasing. From this it follows that with $\mu_3=1$ and $\mu_4=1$, $H'(f_i^c,f_i^c)=H(f_i^c,f_i^c)$, $H'(f_i^s,f_i^s)=H(f_i^s,f_i^s)$ and $D'_{In}(f_i)=D_{In}(f_i)$. Since $H(f_i^c,f_i^c)$ is monotonically increasing within $[0,1/e]$, $H'(f_i^c,f_i^c)$ is monotonically increasing, i.e., $D'_{In}(f_i)$ is monotonically increasing. Then we just need to ensure that $D'_{It}(f_i)$ is monotonically decreasing in $[0,1/e]$ to satisfy the "intra-class aggregation, inter-class dispersion" principle.
$$f_{i,j}^{c/s}=\begin{cases}\dfrac{f_{i,j}^{c/s}-\min(F^c,F^s)}{e\times(\max(F^c,F^s)-\min(F^c,F^s))}, & \max(F^c,F^s)\neq\min(F^c,F^s)\\[2mm] \dfrac{f_{i,j}^{c/s}-\min(F^c,F^s)}{e}, & \max(F^c,F^s)=\min(F^c,F^s)\end{cases} \quad (13)$$
where $f_i$ represents the $i$th feature component and $f_{i,j}$ denotes the $j$th value taken by $f_i$.
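A possible numpy rendering of Equation (13) is sketched below; the (M, N) matrix layout and function name are assumptions for illustration, with a single global minimum and maximum taken over both classes as in the equation:

```python
import numpy as np

def normalize_features(fc: np.ndarray, fs: np.ndarray):
    """Map all cover/stego feature values into [0, 1/e], Equation (13).

    fc, fs: (M, N) cover and stego feature matrices (hypothetical layout).
    """
    lo = min(fc.min(), fs.min())
    hi = max(fc.max(), fs.max())
    if hi != lo:
        scale = np.e * (hi - lo)
    else:
        scale = np.e        # degenerate case of Equation (13)
    return (fc - lo) / scale, (fs - lo) / scale

fc = np.array([[0.0, 2.0], [4.0, 6.0]])
fs = np.array([[1.0, 3.0], [5.0, 7.0]])
nc, ns = normalize_features(fc, fs)
print(nc.min(), ns.max())  # 0.0 and 1/e
```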
Then, on the basis of the above normalization, we investigated the variation of the binary function $H'(X,Y)=-\mu_1 x\ln(\mu_2 y)$ on $[0,1/e]$. Since $\mu_1$ and $\mu_2$ each take two values, there are four cases of $H'(X,Y)$; but since $H'(X,Y)$ is meaningless when $\mu_2=-1$ (the logarithm of a negative number is undefined), only two cases exist. Subsequently, we set the step size for both $X$ and $Y$ to 0.01 and plotted Figure 1 using meshgrid(), subplot() and surf() in Matlab 2016.
From Figure 1, when $\mu_1=1$ and $\mu_2=1$, the larger $X$ is, the larger $H'(X,Y)$ is for fixed $Y$, and the larger $Y$ is, the smaller $H'(X,Y)$ is for fixed $X$; yet when $X$ and $Y$ increase simultaneously, $H'(X,Y)$ is not monotonically decreasing, i.e., it does not satisfy our objective. When $\mu_1=-1$ and $\mu_2=1$, the larger $X$ is, the smaller $H'(X,Y)$ is for fixed $Y$, and the larger $Y$ is, the larger $H'(X,Y)$ is for fixed $X$; again, when $X$ and $Y$ increase simultaneously, $H'(X,Y)$ is not monotonically decreasing, so this case also fails to satisfy our objective.
To this end, after reviewing the literature, we found that one can replace $\ln(x)$ with $e^{-x}$ [39] (from Figure 2, $\ln(x)$ and $e^{-x}$ vary in opposite trends within $[0,1/e]$), which leads to the concept of similar cross-entropy, as shown in Equation (14). Correspondingly, Equations (8) and (10) can be transformed into Equations (15) and (16), respectively.
$$SH(P,Q)=-\sum_{i=1}^{n} p_i e^{-q_i} \quad (14)$$
$$\overline{FCS(f_i)}=\frac{SH(f_i^c,f_i^s)+SH(f_i^s,f_i^c)}{SH(f_i^c,f_i^c)+SH(f_i^s,f_i^s)}=\frac{SH'(f_i^c,f_i^s)+SH'(f_i^s,f_i^c)}{SH'(f_i^c,f_i^c)+SH'(f_i^s,f_i^s)} \quad (15)$$
$$SH'(f_i^c,f_i^s)=-\sum_{j=1}^{M/2}\mu_1 f_{i,j}^c e^{-\mu_2 f_{i,j}^s},\quad SH'(f_i^s,f_i^c)=-\sum_{j=1}^{M/2}\mu_1 f_{i,j}^s e^{-\mu_2 f_{i,j}^c} \quad (16)$$
where $SH(P,Q)$ represents the similar cross-entropy of $P$ to $Q$. Since $e^{-x}$ and $\ln(x)$ vary in opposite trends within $[0,1/e]$ (as shown in Figure 2), the smaller $\overline{FCS(f_i)}$, the better the usefulness. Nevertheless, at this point we still need to determine the values of $\mu_1$ and $\mu_2$, so we investigated the variation of $SH'(X,Y)=-\mu_1 x e^{-\mu_2 y}$ within $[0,1/e]$. Subsequently, we set the step size for both $X$ and $Y$ to 0.01 and plotted Figure 3 and Figure 4 using subplot() and surf() in Matlab 2016B.
From Figure 3 and Figure 4, when $\mu_1=1$ and $\mu_2=-1$ and $X$ and $Y$ increase simultaneously, $SH'(X,Y)$ is monotonically decreasing. This thoroughly meets the fundamental requirements of our design guidelines.
Overall, after investigating and analyzing the values of $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$, we finally determined that $\mu_1=1$, $\mu_2=-1$, $\mu_3=1$ and $\mu_4=1$. Therefore, Equation (15) can be transformed into Equation (17).
$$\overline{FCS(f_i)}=\frac{SH'(f_i^c,f_i^s)+SH'(f_i^s,f_i^c)}{SH'(f_i^c,f_i^c)+SH'(f_i^s,f_i^s)}=\frac{\displaystyle\sum_{j=1}^{M/2} f_{i,j}^c e^{f_{i,j}^s}+\sum_{j=1}^{M/2} f_{i,j}^s e^{f_{i,j}^c}}{\displaystyle\sum_{j=1}^{M/2} f_{i,j}^c e^{-f_{i,j}^c}+\sum_{j=1}^{M/2} f_{i,j}^s e^{-f_{i,j}^s}} \quad (17)$$
where $M$ represents the total number of cover/stego images and $M/2$ represents the number of cover/stego images used for testing. Notably, the smaller $\overline{FCS(f_i)}$, the greater the contribution of $f_i$, i.e., the more important it is to be selected.
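The following Python sketch evaluates Equation (17) for one feature component; the exponent signs follow the parameter values reconstructed in this section and are an editorial assumption, not the authors' code, and the inputs are assumed to be already normalized into $[0,1/e]$:

```python
import numpy as np

def fcs_bar(fic: np.ndarray, fis: np.ndarray) -> float:
    """Similar cross-entropy contribution of Equation (17) for one
    feature component; fic/fis hold its normalized cover/stego values.
    Smaller values indicate a larger contribution."""
    num = np.sum(fic * np.exp(fis)) + np.sum(fis * np.exp(fic))
    den = np.sum(fic * np.exp(-fic)) + np.sum(fis * np.exp(-fis))
    return float(num / den)

rng = np.random.default_rng(3)
fic = rng.uniform(0, 1 / np.e, 100)  # normalized cover values of one component
fis = rng.uniform(0, 1 / np.e, 100)  # normalized stego values
print(fcs_bar(fic, fis))
```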

3.3. Overall Process and Performance Analysis

The FSCE algorithm principally consists of the following steps. Firstly, normalize the eigenvalues to restrict their range to $[0,1/e]$. Secondly, calculate the similar cross-entropy of cover to stego and of stego to cover using Equation (16) with the determined $\mu_1$ and $\mu_2$, and from them the inter-class similarity $D'_{It}(f_i)$; then calculate the similar cross-entropy of cover to cover and of stego to stego using Equation (16) with the determined $\mu_3$ and $\mu_4$, and from them the intra-class similarity $D'_{In}(f_i)$. Thirdly, calculate $\overline{FCS(f_i)}$ using Equation (17). Finally, rank the feature components in ascending order of $\overline{FCS(f_i)}$ and select the components with the smallest $\overline{FCS(f_i)}$ values as the final features.
To illustrate in more detail the working principle of the FSCE method, we give a specific algorithm based on the major steps outlined above, which is shown in Algorithm 1.
Algorithm 1: FSCE.
Input: original steganalytic features $F=[f_1,f_2,\ldots,f_i,\ldots,f_{N-1},f_N]$ and the selected dimension $D_s$
Output: final selected features $F'=[f'_1,f'_2,\ldots,f'_i,\ldots,f'_{n-1},f'_n]$ and the corresponding column numbers $columns$
(1) For $i=1:N$ do
(2)   For $j=1:M$ do
(3)     Normalize the eigenvalues to within $[0,1/e]$ using Equation (13).
(4)   End For
(5) End For
(6) Let $\overline{FCS}=\mathrm{zeros}(1,N)$.
(7) For $i=1:N$ do
(8)   Calculate $SH'(f_i^c,f_i^s)$ and $SH'(f_i^s,f_i^c)$ using Equation (16) with $\mu_1$ and $\mu_2$.
(9)   Calculate $D'_{It}(f_i)$ using Equation (9).
(10)  Calculate $SH'(f_i^c,f_i^c)$ and $SH'(f_i^s,f_i^s)$ using Equation (16) with $\mu_3$ and $\mu_4$.
(11)  Calculate $D'_{In}(f_i)$ using Equation (11).
(12)  Calculate $\overline{FCS(f_i)}$ using Equation (17).
(13) End For
(14) Arrange the feature components in ascending order of the $\overline{FCS(f_i)}$ values from Step (12) to acquire $\overline{F}=[\overline{f}_1,\overline{f}_2,\ldots,\overline{f}_i,\ldots,\overline{f}_{N-1},\overline{f}_N]$.
(15) Set the dimension to be selected, $D_s$.
(16) Select the top $D_s$ feature components as the final features based on the ranking result of Step (14).
(17) Return $F'=[f'_1,f'_2,\ldots,f'_i,\ldots,f'_{n-1},f'_n]$ and $columns$.
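To make the control flow of Algorithm 1 concrete, here is a compact, vectorized Python sketch of the whole procedure (the Matlab original is not shown in the paper, so the names and the (M, N) array layout are assumptions; the score follows Equation (17) as reconstructed above):

```python
import numpy as np

def fsce_select(fc: np.ndarray, fs: np.ndarray, ds: int):
    """Sketch of Algorithm 1: rank components by Equation (17) and keep
    the ds components with the smallest similar-cross-entropy score.

    fc, fs: (M, N) cover/stego feature matrices (hypothetical layout).
    Returns (selected cover features, selected stego features, columns).
    """
    # Steps (1)-(5): normalize all eigenvalues into [0, 1/e], Equation (13)
    lo, hi = min(fc.min(), fs.min()), max(fc.max(), fs.max())
    scale = np.e * (hi - lo) if hi != lo else np.e
    fc, fs = (fc - lo) / scale, (fs - lo) / scale

    # Steps (6)-(13): per-component score of Equation (17)
    num = (fc * np.exp(fs)).sum(axis=0) + (fs * np.exp(fc)).sum(axis=0)
    den = (fc * np.exp(-fc)).sum(axis=0) + (fs * np.exp(-fs)).sum(axis=0)
    scores = num / den

    # Steps (14)-(17): ascending sort, keep the top ds columns
    columns = np.argsort(scores)[:ds]
    return fc[:, columns], fs[:, columns], columns

# Example: keep 58 of 100 random components (D_s = 0.58N).
rng = np.random.default_rng(1)
fc, fs = rng.random((50, 100)), rng.random((50, 100))
_, _, cols = fsce_select(fc, fs, ds=58)
print(cols[:10])
```

Note that the score loop of Steps (7)–(13) collapses into a few array operations, which is why the whole selection runs in a single pass over the $M \times N$ feature matrices.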
Analyzing Algorithm 1, the complexity of the FSCE algorithm depends mainly on the calculation of $\overline{FCS(f_i)}$, whose time complexity is $O(NM)$, where $N$ is the total feature dimension and $M$ is the total number of cover/stego images. In contrast, the time complexity of the more popular classifier-based feature selection methods is $O(mLMN^2)+O(mLN^3)$ [24,25], where $L$ is the number of classifiers, $M$ is the number of image samples in the training set, $N$ is the feature dimension in the test set and $m$ is the number of cycles.
The comparison shows that the $O(NM)$ time complexity of the FSCE method is significantly lower than the $O(mLMN^2)+O(mLN^3)$ time complexity of the ensemble classifier-based feature selection methods.

3.4. The Merits of FSCE

The merits of the FSCE method can be summarized as follows.
Firstly, an innovative improvement in the normalization range simplifies the algorithm design process. Compared to the traditional method of normalizing features to $[0,1]$, normalizing to $[0,1/e]$ makes $H(X,X)=-x_i\ln(x_i)$ monotonically increasing over the whole interval, simplifying the subsequent analysis and making it easier to determine $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$.
Secondly, cross-entropy applicable to image steganalysis is investigated for the first time, and a feature contribution criterion similar to Fisher's criterion is constructed. Based on the ability of cross-entropy to capture the difference information of two classes, we classify the models of the different cases and propose the intra-class and inter-class similarity criteria, from which we further derive the feature contribution criterion with reference to the design principle of Fisher's criterion.
Thirdly, the concept of similar cross-entropy is theoretically proposed and proved, based on which the complexity of the calculation is considerably reduced. In determining the values of the parameters, we analyzed the variation of the feature contribution measure under different situations and found that it did not meet the original intention of our design. For this reason, after searching for a large amount of literature, we proposed the concept of similar cross-entropy and updated the feature contribution measure criterion based on it. The analysis revealed that the new feature contribution measure criterion not only meets the requirements of the design but also decreases the computational complexity by changing the logarithmic operation into an exponential operation.
Fourthly, FSCE is more general. The experiments in Section 4 show that the FSCE method is effective in selecting many steganalytic features, attaining the goal of reducing the feature dimensions by about 40% while maintaining or even improving the detection accuracy of stego images.
Finally, the FSCE method has a low time complexity. The performance analysis in Section 3.3 shows that the $O(NM)$ time complexity of FSCE is significantly lower than the $O(mLMN^2)+O(mLN^3)$ time complexity of the ensemble classifier-based feature selection methods. FSCE is therefore more efficient, which enables it to be used in time-critical applications.

4. Experiment

To verify the performance of the FSCE method proposed in this paper, in this section, we conduct experiments on the selection of different image steganalytic features. Specifically, Section 4.1 describes the experimental setup; Section 4.2 compares the features selected with different thresholds to determine the correctness of the final selection threshold in this paper; Section 4.3 compares the original features as well as randomly selected features to verify the effectiveness of FSCE; And finally, Section 4.4 compares the features with several classical and state-of-the-art feature selection methods to verify the efficiency and generality of FSCE.
Remarkably, all experiments in this paper were performed in Matlab 2016B. All algorithms were executed on a PC with 4 Intel(R) Core(TM) i7-8700 @ 3.20 GHz CPUs and 8 GB of memory.

4.1. Experiment Setup

The images used in this paper are taken from the only recognized image library in steganography and steganalysis, BOSSbase 1.01 (http://dde.binghamton.edu/download/ImageDB/BOSSbase_1.01.zip, accessed on 7 November 2014), which contains 10,000 grey-scale images. To acquire the image steganalytic features, we performed the following operations on the downloaded images.
(1)
Set a specified quality factor QF and transform the PGM images in BOSSbase 1.01 into JPEG images with that QF.
(2)
Set the embedding rate Payload, and then use the steganography algorithm to embed secret information into the JPEG images to acquire the stego images under the current Payload.
(3)
Based on the set QF and Payload, use the steganalysis algorithm to extract the corresponding steganalytic features for the cover/stego images.
(4)
By repeating (1)–(3) for each combination of steganography algorithm, steganalysis algorithm, QF and Payload (specific settings are shown in Table 1), we eventually construct a steganography detection image library containing 80,000 cover images and 400,000 stego images, and acquire a library containing 8 different steganalytic features.
Meanwhile, we trained and tested the sample data with the Fisher linear discriminant (FLD) ensemble classifier [40], selecting 5000 cover/stego image pairs as the training set and the remaining 5000 pairs as the testing set, and then calculated the detection accuracy using Equation (18).
$$\overline{P_A}=1-\overline{P_E} \quad (18)$$
$$\overline{P_E}=\min_{P_{FA}}\frac{P_{FA}+P_{MD}(P_{FA})}{2} \quad (19)$$
where $\overline{P_A}$ denotes the average detection accuracy, $\overline{P_E}$ denotes the average detection error rate, $P_{FA}$ denotes the false alarm rate and $P_{MD}$ denotes the missed detection rate. To ensure that the experimental results are fair and reliable, we took the average detection accuracy of ten-fold cross-validation as the final result of each feature selection method.
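A hedged Python sketch of Equations (18) and (19) is given below: given per-image classifier scores, it sweeps decision thresholds to approximate the minimum of $(P_{FA}+P_{MD})/2$; the score convention (higher = more likely stego) is an assumption for illustration:

```python
import numpy as np

def p_e(cover_scores: np.ndarray, stego_scores: np.ndarray) -> float:
    """Equation (19): minimum of (P_FA + P_MD)/2 over decision thresholds.
    Scores are assumed to be higher for images judged stego."""
    thresholds = np.unique(np.concatenate([cover_scores, stego_scores]))
    best = 0.5
    for t in thresholds:
        p_fa = np.mean(cover_scores >= t)  # false-alarm rate
        p_md = np.mean(stego_scores < t)   # missed-detection rate
        best = min(best, (p_fa + p_md) / 2)
    return best

# Equation (18) for a single split; averaging over the ten folds
# would give the reported accuracy.
rng = np.random.default_rng(4)
cover = rng.normal(0.0, 1.0, 1000)  # toy classifier scores
stego = rng.normal(1.0, 1.0, 1000)
print(1 - p_e(cover, stego))
```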
The experiments in this paper consist of three main parts.
(1)
Comparison experiments with features selected under different thresholds
(2)
Comparison experiments with original features and randomly selected features
(3)
Comparison experiments with several classical and state-of-the-art feature selection methods

4.2. Comparison Experiments with Features Selected under Different Thresholds

In order to determine the value of $D_s$ in the FSCE method, in this subsection we conducted a mass of experiments on the image steganalytic features extracted in Section 4.1 and then determined a relatively appropriate $D_s$ based on the experimental results under different $D_s$. As for the range of $D_s$ and the iteration step, after analyzing a large amount of literature we found that most existing feature selection methods reduce the dimensionality to 70% of the original, while a few can reduce the dimensionality of some features to about 50%. For example, the SRGS algorithm reduces the GFR feature to roughly 50%; the CGSM algorithm reduces the GFR feature to 65% and the CC-PEV feature to 67%; and steganalysis-α reduces the J+SRM (the union of SRMQ1 (SRM with fixed quantization q = 1c) and CC-JRM) dimension to about 70%. To make the FSCE method more generalizable and effective for most steganalytic features, the range of $D_s$ is specified between $0.5N$ and $0.7N$ in this paper, with an iteration step of $0.02N$. In fact, based on this strategy (setting the same selected dimension for different Payloads of the same feature), the dominant features can be selected more efficiently, which also helps in proposing new feature extraction methods. The results of feature selection under different $D_s$ are shown in Table 2.
As can be seen from Table 2, the proposed FSCE achieves excellent selection results for different steganalytic features. For example, for the F1 feature, when Payload = 0.1, the detection accuracy at $D_s=0.64N$ is the highest, improved by 0.29% compared to the original feature. For the F2 feature, when Payload = 0.2, the detection accuracy at $D_s=0.56N$ is the highest, improved by 1.20% compared to the original feature. For the F4 feature, when Payload = 0.1, the detection accuracy at $D_s=0.54N$ is the highest, improved by 0.93% compared to the original feature. For the F5 feature, when Payload = 0.1, the detection accuracy at $D_s=0.68N$ is the highest, improved by 0.24% compared to the original feature. For the F6 feature, when Payload = 0.5, the detection accuracy at $D_s=0.62N$ is the highest, improved by 0.23% compared to the original feature.
In summary, combining the selected feature dimensions and the detection accuracy of the stego images, we found that FSCE performs best across many features when $D_s=0.58N$. For example, for the F2 feature, when Payload = 0.1, the detection accuracy of FSCE is improved by 0.86% compared to the original feature, and when Payload = 0.2 it is improved by 1.20%. For the F4 feature, when Payload = 0.1, the detection accuracy of FSCE is improved by 0.92% compared to the original feature, and when Payload = 0.3 it is improved by 0.80%. For the F5 feature, when Payload = 0.2, the detection accuracy of FSCE is improved by 0.18% compared to the original feature. For the F6 feature, when Payload = 0.5, the detection accuracy of FSCE is improved by 0.22% compared to the original feature. For the F7 feature, when Payload = 0.3, the detection accuracy of FSCE is improved by 0.24% compared to the original feature.

4.3. Comparison Experiments with Original Features and Randomly Selected Features

To verify the effectiveness of the FSCE, in this subsection we compare the features selected by the FSCE with the original features and the randomly selected features. Notably, the dimensionality of the “randomly selected features” is the same as the dimensionality of the features selected by the FSCE, so as to demonstrate the effectiveness of the FSCE. The results of the experiments are shown in Table 3.
As can be seen from the table, the FSCE-feature has the best performance compared to the Random-feature and Origin-feature. Specifically, FSCE reduces the dimensionality by 42% while maintaining or even improving the detection accuracy. For example, for the F1 feature, when Payload = 0.5, FSCE improved the detection accuracy by 0.94% compared to the Random-feature. For the F2 feature, when Payload = 0.2, FSCE improved the detection accuracy by 3.85% compared to the Random-feature and by 1.20% compared to the Origin-feature. For the F3 feature, when Payload = 0.1, FSCE improved the detection accuracy by 5.91% compared to the Random-feature. For the F4 feature, when Payload = 0.1, FSCE improved the detection accuracy by 4.65% compared to the Random-feature and by 0.92% compared to the Origin-feature. For the F5 feature, when Payload = 0.4, FSCE improved the detection accuracy by 0.65% compared to the Random-feature and by 0.16% compared to the Origin-feature. For the F6 feature, when Payload = 0.5, FSCE improved the detection accuracy by 0.82% compared to the Random-feature and by 0.22% compared to the Origin-feature. For the F7 feature, when Payload = 0.3, FSCE improved the detection accuracy by 0.48% compared to the Random-feature and by 0.24% compared to the Origin-feature. For the F8 feature, when Payload = 0.4, FSCE improved the detection accuracy by 0.40% compared to the Random-feature.
For other Payloads, FSCE also achieved excellent results compared to Random-feature and Origin-feature, thus verifying the effectiveness of FSCE.

4.4. Comparison Experiments with Several Classical and State-of-the-Art Feature Selection Methods

To validate the efficiency and generality of FSCE, we conducted comparison experiments with the PCA-based method [27], the SRGS method [26] and the CGSM method [28], where the PCA-based method is a classical method, SRGS is a recent specific feature selection method and CGSM is a recent general feature selection method. The results of the comparison between the algorithm proposed in this paper and the above three methods are shown in Table 4.
From Table 4, it is clear that the performance of the proposed FSCE method is superior to that of the PCA-based, SRGS and CGSM methods. For example, for the F1 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 13.00% compared to the PCA-feature and by 1.32% compared to the SRGS-feature, while FSCE further reduces the dimensionality by 2363-D (about 29.54%) relative to the SRGS-feature; in addition, the detection accuracy is further improved by 0.22% compared to the CGSM-feature. For the F2 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 29.19% compared to the PCA-feature, by 4.41% compared to the SRGS-feature and by 1.09% compared to the CGSM-feature, while the dimensionality is further reduced by 1109-D (about 13.86%). For the F3 feature, when Payload = 0.2, the detection accuracy of FSCE is further improved by 48.77% compared to the PCA-feature and by 0.28% compared to the SRGS-feature, and the dimensionality is further reduced by 1623-D (about 20.29%) compared to the CGSM-feature. For the F4 feature, when Payload = 0.3, the detection accuracy of FSCE is further improved by 49.29% compared to the PCA-feature; the dimensionality is further reduced by 894-D (about 11.18%) compared to the SRGS-feature while maintaining comparable detection accuracy; and compared to the CGSM-feature the accuracy is further improved by 0.10% while the dimensionality is further reduced by 2194-D (about 27.43%).
From Table 5, for the F5 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 12.87% compared to the PCA-feature and by 0.19% compared to the SRGS-feature, and the dimensionality is further reduced by 3681-D (about 21.65%) compared to the CGSM-feature while maintaining comparable detection accuracy. For the F6 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 17.54% compared to the PCA-feature and by 0.25% compared to the SRGS-feature, while the dimensionality is further reduced by 6183-D (about 36.37%). For the F7 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 7.18% compared to the PCA-feature and by 0.37% compared to the SRGS-feature, while the dimensionality is further reduced by 6482-D (about 38.13%). For the F8 feature, when Payload = 0.1, the detection accuracy of FSCE is further improved by 9.76% compared to the PCA-feature and by 0.59% compared to the SRGS-feature, while the dimensionality is further reduced by 2112-D (about 6.10%); compared to the CGSM-feature, the accuracy is further improved by 0.18% while the dimensionality is further reduced by 8046-D (about 23.21%). In other situations, FSCE achieves equally excellent results.
Table 6 illustrates the feature selection time of the FSCE method in comparison with the PCA-based, SRGS and CGSM methods. From the table, it can be seen that the feature selection time of the FSCE method proposed in this paper is significantly shorter than that of the PCA-based, SRGS and CGSM methods. Specifically, for F1 features, when Payload = 0.5, FSCE takes only 2.93 s to select the features, a further reduction of 125.58 s compared to the PCA-based method, 765.21 s compared to the SRGS method and 5.45 s compared to the CGSM method. For F2 features, when Payload = 0.1, FSCE takes only 2.88 s, a further reduction of 124.22 s compared to the PCA-based method, 810.13 s compared to the SRGS method and 5.37 s compared to the CGSM method. For F3 features, when Payload = 0.2, FSCE takes only 2.84 s, a further reduction of 125.55 s compared to the PCA-based method, 525.49 s compared to the SRGS method and 5.81 s compared to the CGSM method. For F4 features, when Payload = 0.3, FSCE takes only 2.80 s, a further reduction of 124.24 s compared to the PCA-based method, 681.51 s compared to the SRGS method and 5.54 s compared to the CGSM method. For F5 features, when Payload = 0.5, FSCE takes only 6.22 s, a further reduction of 264.54 s compared to the PCA-based method, 1698.08 s compared to the SRGS method and 22.75 s compared to the CGSM method. For F6 features, when Payload = 0.4, FSCE takes only 6.38 s, a further reduction of 264.59 s compared to the PCA-based method, 1863.42 s compared to the SRGS method and 29.47 s compared to the CGSM method. For F7 features, when Payload = 0.5, FSCE takes only 8.30 s, a further reduction of 301.99 s compared to the PCA-based method, 2486.4 s compared to the SRGS method and 95.42 s compared to the CGSM method. For F8 features, when Payload = 0.2, FSCE takes only 17.44 s, a further reduction of 391.16 s compared to the PCA-based method, 2568.86 s compared to the SRGS method and 17.60 s compared to the CGSM method. In other situations, FSCE achieves equally excellent results.

5. Conclusions

To decrease the feature dimensions and the spatio-temporal overhead, this paper presents a fast selection method for image steganalytic features based on similar cross-entropy (FSCE). Specifically, the innovative improvement of the normalization range first simplifies the analysis of the changing trend of the binary cross-entropy and lays the foundation for the overall algorithm design. Secondly, cross-entropy applicable to image steganalysis is investigated, and a feature contribution criterion shaped akin to Fisher's criterion is constructed. Thirdly, analysis of the constructed criterion reveals certain shortcomings; for this reason, after reviewing a large amount of literature, the concept of similar cross-entropy is presented for the first time, the feature contribution criterion is updated after analyzing its trend, and its reliability is verified theoretically, so that the contribution of each feature component is better measured. Finally, the feature components with the highest contributions are selected as the final features. Based on the above, FSCE has exceptional usability, which facilitates its use in real-world applications with strict memory footprint constraints and high efficiency requirements.
The effectiveness and efficiency of FSCE have been demonstrated through extensive experiments on the only standard and widely utilized BOSSbase 1.01 image library. For example, for the F2 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 4.41% compared to the SRGS-feature. For the F3 feature, when Payload = 0.1, the detection accuracy of FSCE is improved by 5.91% compared to the Random-feature. For the F4 feature, when Payload = 0.3, the detection accuracy of FSCE is further improved by 49.29% compared to the PCA-feature. For the F8 feature, when Payload = 0.5, the detection accuracy of FSCE is further improved by 4.33% compared to the CGSM-feature.
For the future, we will continue to devote ourselves to steganography and steganalysis, focusing on two aspects: On the one hand, analyzing the properties of each deleted feature component lays the foundation for new, more secure steganography techniques. On the other hand, investigating the characteristics of each retained feature component lays the foundation for new effective, efficient and low-dimensional steganalysis techniques.

Author Contributions

Conceptualization, R.J. and X.Y.; methodology, X.Y. and Y.M.; software, X.Y.; validation, X.Y.; formal analysis, Y.M.; writing—original draft preparation, X.Y.; writing—review and editing, X.Y. and S.Y.; visualization, X.Y. and L.X.; project administration, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62002103, in part by the Key Scientific and Technological Project of Henan Province under Grant 202102210165, in part by the Promotion Special (Soft Science) Project of Henan Province under Grant 202400410088 and in part by the Key Scientific Research (Soft Science) Project of Higher Education Institutions of Henan Province under Grant 19A880030.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

The images used in this paper are taken from the Bossbase 1.01 image library: http://dde.binghamton.edu/download/ImageDB/BOSSbase_1.01.zip (accessed on 7 November 2014).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhou, W.; Zhang, W.; Yu, N. A New Rule for Cost Reassignment in Adaptive Steganography. IEEE Trans. Inf. Forensics Secur. 2017, 12, 2654–2667.
  2. Denemark, T.; Fridrich, J. Steganography with Multiple JPEG Images of the Same Scene. IEEE Trans. Inf. Forensics Secur. 2017, 12, 2308–2319.
  3. Filler, T.; Judas, J.; Fridrich, J. Minimizing Additive Distortion in Steganography using Syndrome-Trellis Codes. IEEE Trans. Inf. Forensics Secur. 2011, 6, 920–935.
  4. Fridrich, J.; Pevný, T.; Kodovský, J. Statistically Undetectable JPEG Steganography: Dead Ends, Challenges, and Opportunities. In Proceedings of the 9th Workshop on Multimedia & Security, Dallas, TX, USA, 20–21 September 2007; pp. 3–14.
  5. Filler, T.; Fridrich, J. Gibbs Construction in Steganography. IEEE Trans. Inf. Forensics Secur. 2010, 5, 705–720.
  6. Holub, V.; Fridrich, J. Designing Steganographic Distortion using Directional Filters. In Proceedings of the 2012 IEEE International Workshop on Information Forensics and Security, Costa Adeje-Tenerife, Spain, 2–5 December 2012; pp. 234–239.
  7. Holub, V.; Fridrich, J.; Denemark, T. Universal Distortion Function for Steganography in an Arbitrary Domain. EURASIP J. Inf. Secur. 2014, 1, 1–13.
  8. Pevný, T.; Bas, P.; Fridrich, J. Steganalysis by Subtractive Pixel Adjacency Matrix. IEEE Trans. Inf. Forensics Secur. 2010, 5, 215–224.
  9. Song, X.; Liu, F.; Zhang, Z.; Yang, C.; Luo, X.; Chen, L. 2D Gabor Filters-Based Steganalysis of Content-Adaptive JPEG Steganography. Multimed. Tools Appl. 2016, 76, 26391–26419.
  10. Pevný, T.; Fridrich, J. Merging Markov and DCT Features for Multi-Class JPEG Steganalysis. In Proceedings of the Security, Steganography, and Watermarking of Multimedia Contents IX, San Jose, CA, USA, 28 January 2007; pp. 650501–650503.
  11. Kodovský, J.; Fridrich, J. Calibration Revisited. In Proceedings of the 11th ACM Workshop on Multimedia and Security, New York, NY, USA, 7–8 September 2009; pp. 63–74.
  12. Holub, V.; Fridrich, J. Low-Complexity Features for JPEG Steganalysis using Undecimated DCT. IEEE Trans. Inf. Forensics Secur. 2015, 10, 219–228.
  13. Kodovský, J.; Fridrich, J. Steganalysis of JPEG Images using Rich Models. In Proceedings of the Media Watermarking, Security, and Forensics, Burlingame, CA, USA, 23–25 January 2012; pp. 83030A-1–83030A-13.
  14. Fridrich, J.; Kodovský, J. Rich Models for Steganalysis of Digital Images. IEEE Trans. Inf. Forensics Secur. 2012, 7, 868–882.
  15. Ghasemzadeh, H. Calibrated Steganalysis of Mp3stego in Multi-Encoder Scenario. Inf. Sci. 2019, 480, 438–453.
  16. Zhang, W.; Zhang, Z.; Zhang, L.; Li, H.; Yu, N. Decomposing Joint Distortion for Adaptive Steganography. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2274–2280.
  17. Denemark, T.; Boroumand, M.; Fridrich, J. Steganalytic Features for Content-Adaptive JPEG Steganography. IEEE Trans. Inf. Forensics Secur. 2016, 11, 1736–1746.
  18. Zhang, R.; Zhu, F.; Liu, J.; Liu, G. Depth-Wise Separable Convolutions and Multi-Level Pooling for an Efficient Spatial CNN-Based Steganalysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 1138–1150.
  19. Ye, J.; Ni, J.; Yi, Y. Deep Learning Hierarchical Representations for Image Steganalysis. IEEE Trans. Inf. Forensics Secur. 2017, 12, 2545–2557.
  20. Sedighi, V.; Fridrich, J. Histogram Layer, Moving Convolutional Neural Networks Towards Feature-Based Steganalysis. Electron. Imaging 2017, 7, 50–55.
  21. Holub, V.; Fridrich, J.; Denemark, T. Random Projections of Residuals as an Alternative to Co-Occurrences in Steganalysis. In Proceedings of the Media Watermarking, Security, and Forensics 2013, Burlingame, CA, USA, 22 March 2013; p. 86650L.
  22. Holub, V.; Fridrich, J. Random Projections of Residuals for Digital Image Steganalysis. IEEE Trans. Inf. Forensics Secur. 2013, 8, 1996–2006.
  23. Boroumand, M.; Fridrich, J. Applications of Explicit Non-Linear Feature Maps in Steganalysis. IEEE Trans. Inf. Forensics Secur. 2018, 13, 823–833.
  24. Ma, Y.; Luo, X.; Li, X.; Bao, Z.; Zhang, Y. Selection of Rich Model Steganalysis Features Based on Decision Rough Set α-Positive Region Reduction. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 336–350.
  25. Yang, C.; Zhang, Y.; Wang, P.; Luo, X.; Liu, F.; Lu, J. Steganalysis Feature Subspace Selection Based on Fisher Criterion. In Proceedings of the 2017 IEEE International Conference on Data Science and Advanced Analytics, Tokyo, Japan, 19–21 October 2017; pp. 514–521.
  26. Yu, X.; Ma, Y.; Jin, R.; Xu, L.; Duan, X. A Multi-Scale Feature Selection Method for Steganalytic Feature GFR. IEEE Access 2020, 8, 55063–55075.
  27. Qin, J.; Sun, X.; Xiang, X.; Niu, C. Principal Feature Selection and Fusion Method for Image Steganalysis. J. Electron. Imaging 2009, 18, 033009.
  28. Wang, Y.; Ma, Y.; Jin, R.; Liu, P.; Ruan, N. Comprehensive Criteria-Based Generalized Steganalysis Feature Selection Method. IEEE Access 2020, 8, 154418–154435.
  29. Lu, J.; Liu, F.; Luo, X. Selection of Image Features for Steganalysis Based on the Fisher Criterion. Digit. Investig. 2014, 11, 57–66.
  30. Ma, Y.; Xu, J.; Zhang, Y.; Yang, C.; Luo, X. W2ID Criterion-Based Rich Model Steganalysis Features Selection. Chin. J. Comput. 2021, 44, 724–740.
  31. Chen, Y.; Chen, Y.; Yin, A. Feature Selection for Blind Image Steganalysis using Neighborhood Rough Sets. J. Intell. Fuzzy Syst. 2019, 37, 3709–3720.
  32. Davidson, J.; Jalan, J. Feature Selection for Steganalysis using the Mahalanobis Distance. In Proceedings of the Media Forensics and Security II, San Jose, CA, USA, 18–20 January 2010; p. 754104.
  33. Boroumand, M.; Chen, M.; Fridrich, J. Deep Residual Network for Steganalysis of Digital Images. IEEE Trans. Inf. Forensics Secur. 2019, 14, 1181–1193.
  34. Denemark, T.; Holub, V.; Cogranne, R.; Fridrich, J. Selection-Channel-Aware Rich Model for Steganalysis of Digital Images. In Proceedings of the IEEE International Workshop on Information Forensics and Security 2014, Atlanta, GA, USA, 3–5 December 2014; pp. 48–53.
  35. Yuan, H.; Li, J.; Lai, L.; Tang, Y. Low-Rank Matrix Regression for Image Feature Extraction and Feature Selection. Inf. Sci. 2020, 522, 214–226.
  36. Bas, P.; Filler, T.; Pevný, T. "Break Our Steganographic System": The Ins and Outs of Organizing BOSS. In Proceedings of the International Workshop on Information Hiding, Berlin, Germany, 18–20 May 2011; pp. 59–70.
  37. Shore, J.; Johnson, R. Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37.
  38. de Boer, P.-T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A Tutorial on the Cross-Entropy Method. Ann. Oper. Res. 2005, 134, 19–67.
  39. Wang, D.; Li, Q.; Zhang, J. A New Method to Analyze Evidence Conflict. Control Theory Appl. 2011, 28, 839–844.
  40. Kodovský, J.; Fridrich, J.; Holub, V. Ensemble Classifiers for Steganalysis of Digital Media. IEEE Trans. Inf. Forensics Secur. 2012, 7, 432–444.
Figure 1. Variation of binary cross-entropy when μ1 and μ2 take different values. (a–c) represent the variation of H(X, Y) at different viewpoints when μ1 = 1 and μ2 = 1, respectively. (d–f) represent the variation of H(X, Y) at different viewpoints when μ1 = 1 and μ2 = 1, respectively.
Figure 2. Variation of cross-entropy and similar cross-entropy within [0, 1/e]. (a,b) represent the variation of ln(x) and e^x, respectively; (c,d) represent the variation of cross-entropy and similar cross-entropy, respectively.
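For readers who want to reproduce curves in the style of Figure 2, a minimal sketch is given below. The ln(x) and e^x panels follow the caption directly; the cross-entropy panel uses the term −x ln x, and the similar cross-entropy panel uses x e^{−x}, which is our assumption standing in for the exact definition given in the main text.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch of Figure 2's four panels on (0, 1/e]. The term x*exp(-x) for the
# "similar cross-entropy" is an assumption; the paper's exact definition
# appears in the main text.
x = np.linspace(1e-6, 1 / np.e, 500)

panels = [
    (np.log(x), "ln(x)"),                                # panel (a)
    (np.exp(x), "e^x"),                                  # panel (b)
    (-x * np.log(x), "cross-entropy term"),              # panel (c)
    (x * np.exp(-x), "similar cross-entropy (assumed)"), # panel (d)
]
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, (y, title) in zip(axes.ravel(), panels):
    ax.plot(x, y)
    ax.set_title(title)
fig.tight_layout()
plt.show()
```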
Figure 3. Variation of binary similar cross-entropy when μ1 and μ2 take different values. (a–c) represent the variation of SH(X, Y) at different viewpoints when μ1 = 1 and μ2 = 1, respectively. (d–f) represent the variation of SH(X, Y) at different viewpoints when μ1 = 1 and μ2 = 1, respectively.
Figure 4. Variation of binary similar cross-entropy when μ1 and μ2 take different values. (a–c) represent the variation of SH(X, Y) at different viewpoints when μ1 = 1 and μ2 = 1, respectively. (d–f) represent the variation of SH(X, Y) at different viewpoints when μ1 = 1 and μ2 = 1, respectively.
Table 1. Specific experimental setup.
Source: BOSSbase 1.01 | Number of cover images: 10,000
Size: 512 × 512 | Number of stego images: 10,000 × 5
Colour: gray-scale | QF: 95, 75
Format: JPEG | Payload: 0.1, 0.2, 0.3, 0.4, 0.5
Training images: 5000 pairs | Steganography algorithms: nsF5 [4], SI-UNIWARD [7], S-UNIWARD [7]
Testing images: 5000 pairs | Steganalysis algorithms: GFR [9], DCTR [12], CC-JRM [13], SRM [14]
Total images: 10,000 × (5 + 1) × 8 = 480,000
QF represents the quality factor. Payload represents the embedding rate of the steganography algorithm.
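The "Total" row can be sanity-checked with one line of arithmetic; the sketch below assumes the factor of 8 counts the eight experimental configurations F1–F8 defined in Table 2 (that reading is ours, not stated in the table itself).

```python
# Sanity check of Table 1's "Total" row, assuming the factor of 8 counts the
# eight experimental configurations (F1-F8) listed in Table 2.
covers, payloads, configurations = 10_000, 5, 8
total = covers * (payloads + 1) * configurations  # each cover plus 5 stego versions
assert total == 480_000
print(total)
```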
Table 2. Experimental results of FSCE at different values of Ds.
F | P | Origin | 0.5N | 0.52N | 0.54N | 0.56N | 0.58N | 0.6N | 0.62N | 0.64N | 0.66N | 0.68N | 0.7N
F1 | D | 8000 | 4000 | 4160 | 4320 | 4480 | 4640 | 4800 | 4960 | 5120 | 5280 | 5440 | 5600
F1 | 0.1 PA | 0.5239 | 0.5221 | 0.5213 | 0.5226 | 0.5258 | 0.5260 | 0.5253 | 0.5253 | 0.5268 | 0.5243 | 0.5259 | 0.5264
F1 | 0.2 PA | 0.5256 | 0.5226 | 0.5240 | 0.5242 | 0.5268 | 0.5286 | 0.5271 | 0.5286 | 0.5285 | 0.5287 | 0.5287 | 0.5277
F1 | 0.3 PA | 0.5407 | 0.5357 | 0.5372 | 0.5369 | 0.5385 | 0.5412 | 0.5414 | 0.5394 | 0.5400 | 0.5397 | 0.5393 | 0.5397
F1 | 0.4 PA | 0.5745 | 0.5666 | 0.5669 | 0.5678 | 0.5684 | 0.5727 | 0.5723 | 0.5719 | 0.5752 | 0.5730 | 0.5736 | 0.5732
F1 | 0.5 PA | 0.6300 | 0.6203 | 0.6237 | 0.6256 | 0.6280 | 0.6302 | 0.6298 | 0.6295 | 0.6297 | 0.6283 | 0.6299 | 0.6293
F2 | D | 8000 | 4000 | 4160 | 4320 | 4480 | 4640 | 4800 | 4960 | 5120 | 5280 | 5440 | 5600
F2 | 0.1 PA | 0.7209 | 0.7269 | 0.7284 | 0.7267 | 0.7294 | 0.7295 | 0.7284 | 0.7289 | 0.7286 | 0.7286 | 0.7286 | 0.7260
F2 | 0.2 PA | 0.7246 | 0.7343 | 0.7336 | 0.7317 | 0.7366 | 0.7366 | 0.7342 | 0.7328 | 0.7346 | 0.7316 | 0.7327 | 0.7326
F2 | 0.3 PA | 0.7338 | 0.7432 | 0.7426 | 0.7432 | 0.7442 | 0.7447 | 0.7437 | 0.7417 | 0.7421 | 0.7419 | 0.7406 | 0.7418
F2 | 0.4 PA | 0.7557 | 0.7616 | 0.7609 | 0.7622 | 0.7626 | 0.7631 | 0.7634 | 0.7622 | 0.7626 | 0.7622 | 0.7605 | 0.7614
F2 | 0.5 PA | 0.7864 | 0.7874 | 0.7901 | 0.7893 | 0.7926 | 0.7924 | 0.7908 | 0.7903 | 0.7911 | 0.7908 | 0.7902 | 0.7907
F3 | D | 8000 | 4000 | 4160 | 4320 | 4480 | 4640 | 4800 | 4960 | 5120 | 5280 | 5440 | 5600
F3 | 0.1 PA | 0.8496 | 0.8278 | 0.8311 | 0.8342 | 0.8421 | 0.8438 | 0.8489 | 0.8478 | 0.8473 | 0.8476 | 0.8469 | 0.8486
F3 | 0.2 PA | 0.9913 | 0.9789 | 0.9833 | 0.9856 | 0.9876 | 0.9892 | 0.9913 | 0.9907 | 0.9902 | 0.9907 | 0.9909 | 0.9908
F3 | 0.3 PA | 0.9993 | 0.9980 | 0.9987 | 0.9990 | 0.9994 | 0.9993 | 0.9993 | 0.9994 | 0.9993 | 0.9994 | 0.9993 | 0.9993
F3 | 0.4 PA | 0.9998 | — | — | — | — | — | — | — | — | — | — | —
F3 | 0.5 PA | 0.9999 | — | — | — | — | — | — | — | — | — | — | —
F4 | D | 8000 | 4000 | 4160 | 4320 | 4480 | 4640 | 4800 | 4960 | 5120 | 5280 | 5440 | 5600
F4 | 0.1 PA | 0.7884 | 0.7972 | 0.7964 | 0.7977 | 0.7965 | 0.7976 | 0.7954 | 0.7964 | 0.7963 | 0.7965 | 0.7964 | 0.7951
F4 | 0.2 PA | 0.9607 | 0.9648 | 0.9649 | 0.9652 | 0.9656 | 0.9655 | 0.9652 | 0.9652 | 0.9653 | 0.9654 | 0.9644 | 0.9646
F4 | 0.3 PA | 0.9939 | 0.9945 | 0.9946 | 0.9948 | 0.9947 | 0.9947 | 0.9948 | 0.9948 | 0.9947 | 0.9945 | 0.9945 | 0.9945
F4 | 0.4 PA | 0.9998 | 0.9981 | 0.9982 | 0.9983 | 0.9983 | 0.9983 | 0.9982 | 0.9983 | 0.9983 | 0.9982 | 0.9983 | 0.9983
F4 | 0.5 PA | 0.9991 | 0.9991 | 0.9991 | 0.9991 | 0.9992 | 0.9992 | 0.9992 | 0.9991 | 0.9991 | 0.9992 | 0.9992 | 0.9992
F5 | D | 17,000 | 8500 | 8840 | 9180 | 9520 | 9860 | 10,200 | 10,540 | 10,880 | 11,220 | 11,560 | 11,900
F5 | 0.1 PA | 0.5168 | 0.5155 | 0.5170 | 0.5165 | 0.5173 | 0.5174 | 0.5170 | 0.5180 | 0.5166 | 0.5187 | 0.5192 | 0.5151
F5 | 0.2 PA | 0.5205 | 0.5199 | 0.5208 | 0.5218 | 0.5215 | 0.5223 | 0.5222 | 0.5209 | 0.5227 | 0.5229 | 0.5222 | 0.5201
F5 | 0.3 PA | 0.5388 | 0.5365 | 0.5370 | 0.5367 | 0.5380 | 0.5389 | 0.5381 | 0.5383 | 0.5384 | 0.5385 | 0.5375 | 0.5388
F5 | 0.4 PA | 0.5738 | 0.5732 | 0.5738 | 0.5738 | 0.5741 | 0.5754 | 0.5758 | 0.5744 | 0.5747 | 0.5745 | 0.5748 | 0.5761
F5 | 0.5 PA | 0.6292 | 0.6268 | 0.6275 | 0.6290 | 0.6275 | 0.6294 | 0.6287 | 0.6275 | 0.6280 | 0.6290 | 0.6288 | 0.6283
F6 | D | 17,000 | 8500 | 8840 | 9180 | 9520 | 9860 | 10,200 | 10,540 | 10,880 | 11,220 | 11,560 | 11,900
F6 | 0.1 PA | 0.5035 | 0.5051 | 0.5034 | 0.5045 | 0.5054 | 0.5056 | 0.5042 | 0.5040 | 0.5036 | 0.5052 | 0.5044 | 0.5033
F6 | 0.2 PA | 0.5245 | 0.5242 | 0.5237 | 0.5259 | 0.5252 | 0.5257 | 0.5251 | 0.5248 | 0.5241 | 0.5231 | 0.5254 | 0.5244
F6 | 0.3 PA | 0.5603 | 0.5610 | 0.5612 | 0.5593 | 0.5602 | 0.5614 | 0.5616 | 0.5616 | 0.5611 | 0.5604 | 0.5612 | 0.5602
F6 | 0.4 PA | 0.6125 | 0.6120 | 0.6130 | 0.6126 | 0.6108 | 0.6139 | 0.6134 | 0.6145 | 0.6116 | 0.6134 | 0.6114 | 0.6110
F6 | 0.5 PA | 0.6752 | 0.6735 | 0.6739 | 0.6739 | 0.6753 | 0.6774 | 0.6758 | 0.6775 | 0.6758 | 0.6754 | 0.6742 | 0.6752
F7 | D | 22,510 | 11,255 | 11,705 | 12,155 | 12,605 | 13,055 | 13,506 | 13,956 | 14,406 | 14,856 | 15,306 | 15,757
F7 | 0.1 PA | 0.5290 | 0.5288 | 0.5295 | 0.5305 | 0.5310 | 0.5305 | 0.5298 | 0.5302 | 0.5307 | 0.5303 | 0.5304 | 0.5303
F7 | 0.2 PA | 0.5345 | 0.5350 | 0.5346 | 0.5344 | 0.5346 | 0.5357 | 0.5358 | 0.5358 | 0.5364 | 0.5360 | 0.5356 | 0.5360
F7 | 0.3 PA | 0.5380 | 0.5398 | 0.5406 | 0.5400 | 0.5397 | 0.5404 | 0.5390 | 0.5404 | 0.5407 | 0.5398 | 0.5401 | 0.5394
F7 | 0.4 PA | 0.5475 | 0.5467 | 0.5476 | 0.5463 | 0.5485 | 0.5484 | 0.5473 | 0.5483 | 0.5485 | 0.5472 | 0.5475 | 0.5466
F7 | 0.5 PA | 0.5705 | 0.5688 | 0.5693 | 0.5707 | 0.5715 | 0.5724 | 0.5690 | 0.5710 | 0.5721 | 0.5703 | 0.5711 | 0.5715
F8 | D | 34,671 | 17,335 | 18,028 | 18,722 | 19,415 | 20,109 | 20,802 | 21,496 | 22,189 | 22,882 | 23,576 | 24,269
F8 | 0.1 PA | 0.5988 | 0.5957 | 0.5957 | 0.5972 | 0.5974 | 0.5982 | 0.5934 | 0.5967 | 0.5979 | 0.5986 | 0.5980 | 0.5991
F8 | 0.2 PA | 0.6795 | 0.6799 | 0.6788 | 0.6802 | 0.6800 | 0.6802 | 0.6788 | 0.6797 | 0.6801 | 0.6792 | 0.6804 | 0.6807
F8 | 0.3 PA | 0.7450 | 0.7447 | 0.7440 | 0.7444 | 0.7439 | 0.7451 | 0.7457 | 0.7438 | 0.7444 | 0.7448 | 0.7458 | 0.7457
F8 | 0.4 PA | 0.7938 | 0.7935 | 0.7935 | 0.7937 | 0.7944 | 0.7947 | 0.7944 | 0.7947 | 0.7943 | 0.7935 | 0.7942 | 0.7943
F8 | 0.5 PA | 0.8375 | 0.8365 | 0.8369 | 0.8373 | 0.8386 | 0.8387 | 0.8375 | 0.8379 | 0.8381 | 0.8378 | 0.8386 | 0.8374
F1–F8 represent the SI-UNIWARD-DCTR-95, SI-UNIWARD-DCTR-75, nsF5-DCTR-95, nsF5-DCTR-75, SI-UNIWARD-GFR-95, SI-UNIWARD-GFR-75, SI-UNIWARD-CC-JRM-95 and S-UNIWARD-SRM-75 features, respectively. N represents the total dimensionality of the original feature. P represents the payload. D represents the selected feature dimension. In the published table, a darker background colour marks a higher detection accuracy. “—” means that the detection accuracy of the original feature is already very high and no further feature selection is needed for it.
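The D rows above follow a fixed schedule: candidate dimensions run from 0.5N to 0.7N in steps of 0.02N, rounded down. A minimal sketch reproducing these column values is given below (the helper name candidate_dims is ours):

```python
# Reproduce the candidate dimensions D in Table 2: floor(r * N) for retention
# ratios r = 0.50, 0.52, ..., 0.70. Integer arithmetic sidesteps float error
# (e.g., int(0.58 * 8000) can evaluate to 4639 under IEEE-754 arithmetic).
def candidate_dims(n_total):
    return [(r * n_total) // 100 for r in range(50, 71, 2)]

for n in (8000, 17000, 22510, 34671):  # DCTR, GFR, CC-JRM, SRM
    print(n, candidate_dims(n))
# 8000  -> [4000, 4160, 4320, ..., 5600], matching the F1-F4 rows
# 34671 -> [17335, 18028, ..., 24269],   matching the F8 row
```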
Table 3. Comparison results of FSCE-feature with Origin-feature and Random-feature.
Payload | Method | D | F1 PA | F2 PA | F3 PA | F4 PA | D | F5 PA | F6 PA | D | F7 PA | D | F8 PA
0.1 | Origin | 8000 | 0.5239 | 0.7209 | 0.8496 | 0.7884 | 17,000 | 0.5168 | 0.5035 | 22,510 | 0.5290 | 34,671 | 0.5988
0.1 | Random | 4640 | 0.5232 | 0.6920 | 0.7847 | 0.7511 | 9860 | 0.5151 | 0.5021 | 13,055 | 0.5270 | 20,109 | 0.5969
0.1 | FSCE | 4640 | 0.5260 | 0.7295 | 0.8438 | 0.7976 | 9860 | 0.5174 | 0.5056 | 13,055 | 0.5305 | 20,109 | 0.5989
0.2 | Origin | 8000 | 0.5256 | 0.7246 | 0.9913 | 0.9607 | 17,000 | 0.5205 | 0.5245 | 22,510 | 0.5345 | 34,671 | 0.6795
0.2 | Random | 4640 | 0.5240 | 0.6981 | 0.9681 | 0.9395 | 9860 | 0.5180 | 0.5218 | 13,055 | 0.5311 | 20,109 | 0.6764
0.2 | FSCE | 4640 | 0.5286 | 0.7366 | 0.9892 | 0.9655 | 9860 | 0.5223 | 0.5257 | 13,055 | 0.5357 | 20,109 | 0.6802
0.3 | Origin | 8000 | 0.5407 | 0.7338 | 0.9993 | 0.9939 | 17,000 | 0.5388 | 0.5603 | 22,510 | 0.5380 | 34,671 | 0.7450
0.3 | Random | 4640 | 0.5367 | 0.7074 | 0.9979 | 0.9907 | 9860 | 0.5346 | 0.5565 | 13,055 | 0.5356 | 20,109 | 0.7413
0.3 | FSCE | 4640 | 0.5412 | 0.7447 | 0.9993 | 0.9947 | 9860 | 0.5389 | 0.5614 | 13,055 | 0.5404 | 20,109 | 0.7451
0.4 | Origin | 8000 | 0.5745 | 0.7557 | 0.9998 | 0.9998 | 17,000 | 0.5738 | 0.6125 | 22,510 | 0.5475 | 34,671 | 0.7938
0.4 | Random | 4640 | 0.5674 | 0.7298 | — | 0.9974 | 9860 | 0.5689 | 0.6067 | 13,055 | 0.5442 | 20,109 | 0.7907
0.4 | FSCE | 4640 | 0.5744 | 0.7631 | — | 0.9993 | 9860 | 0.5754 | 0.6139 | 13,055 | 0.5484 | 20,109 | 0.7947
0.5 | Origin | 8000 | 0.6300 | 0.7864 | 0.9999 | 0.9991 | 17,000 | 0.6292 | 0.6752 | 22,510 | 0.5705 | 34,671 | 0.8375
0.5 | Random | 4640 | 0.6208 | 0.7626 | — | 0.9989 | 9860 | 0.6182 | 0.6692 | 13,055 | 0.5665 | 20,109 | 0.8354
0.5 | FSCE | 4640 | 0.6302 | 0.7924 | — | 0.9992 | 9860 | 0.6294 | 0.6774 | 13,055 | 0.5724 | 20,109 | 0.8387
Bold (in the published table) indicates the highest detection accuracy in the current situation. F1–F4 share a single D column because they have the same original feature dimension; F5 and F6 likewise. The abbreviations in this table, including “—”, are consistent with those in Table 2.
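For reference, the Random-feature baseline above can be sketched in a few lines; we assume it draws D feature components uniformly at random without replacement (the helper random_select is ours, not the paper's code):

```python
import numpy as np

# A minimal sketch of the Random-feature baseline in Table 3, assuming D
# components are drawn uniformly at random without replacement.
rng = np.random.default_rng(0)

def random_select(features, d):
    """features: (n_images, N) array; keep d randomly chosen columns."""
    idx = np.sort(rng.choice(features.shape[1], size=d, replace=False))
    return features[:, idx]

# Example: reduce an 8000-D DCTR-style feature matrix to D = 4640 (0.58N).
X = rng.standard_normal((16, 8000))  # placeholder feature matrix
print(random_select(X, 4640).shape)  # (16, 4640)
```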
Table 4. Comparison results for FSCE, PCA-based, SRGS and CGSM (features F1–F4).
Payload | Method | F1 D | F1 PA | F2 D | F2 PA | F3 D | F3 PA | F4 D | F4 PA
0.1 | Origin | 8000 | 0.5239 | 8000 | 0.7209 | 8000 | 0.8496 | 8000 | 0.7884
0.1 | PCA | 4640 | 0.5001 | 4640 | 0.5012 | 4640 | 0.5009 | 4640 | 0.5006
0.1 | SRGS | 2150 | 0.5193 | 7542 | 0.7213 | 4534 | 0.8335 | 5499 | 0.7868
0.1 | CGSM | 6999 | 0.5234 | 6850 | 0.7153 | 6810 | 0.8252 | 6999 | 0.7866
0.1 | FSCE | 4640 | 0.5260 | 4640 | 0.7295 | 4640 | 0.8438 | 4640 | 0.7976
0.2 | Origin | 8000 | 0.5256 | 8000 | 0.7246 | 8000 | 0.9913 | 8000 | 0.9607
0.2 | PCA | 4640 | 0.5000 | 4640 | 0.5008 | 4640 | 0.5015 | 4640 | 0.5013
0.2 | SRGS | 4339 | 0.5293 | 7252 | 0.7271 | 4389 | 0.9864 | 5524 | 0.9617
0.2 | CGSM | 6278 | 0.5239 | 6591 | 0.7184 | 6263 | 0.9895 | 6962 | 0.9605
0.2 | FSCE | 4640 | 0.5286 | 4640 | 0.7366 | 4640 | 0.9892 | 4640 | 0.9655
0.3 | Origin | 8000 | 0.5407 | 8000 | 0.7338 | 8000 | 0.9993 | 8000 | 0.9939
0.3 | PCA | 4640 | 0.5002 | 4640 | 0.5003 | 4640 | 0.5023 | 4640 | 0.5018
0.3 | SRGS | 5913 | 0.5398 | 5056 | 0.7366 | 4420 | 0.9993 | 5534 | 0.9945
0.3 | CGSM | 5671 | 0.5389 | 6356 | 0.7283 | 5068 | 0.9991 | 6834 | 0.9937
0.3 | FSCE | 4640 | 0.5412 | 4640 | 0.7447 | 4640 | 0.9993 | 4640 | 0.9947
0.4 | Origin | 8000 | 0.5745 | 8000 | 0.7557 | 8000 | 0.9998 | 8000 | 0.9998
0.4 | PCA | 4640 | 0.5005 | 4640 | 0.4997 | — | — | 4640 | 0.5023
0.4 | SRGS | 7051 | 0.5681 | 2355 | 0.7302 | — | — | 5814 | 0.9982
0.4 | CGSM | 4956 | 0.5720 | 6080 | 0.7513 | — | — | 6568 | 0.9979
0.4 | FSCE | 4640 | 0.5744 | 4640 | 0.7631 | — | — | 4640 | 0.9993
0.5 | Origin | 8000 | 0.6300 | 8000 | 0.7864 | 8000 | 0.9999 | 8000 | 0.9991
0.5 | PCA | 4640 | 0.5002 | 4640 | 0.5005 | — | — | 4640 | 0.5045
0.5 | SRGS | 7003 | 0.6170 | 2323 | 0.7483 | — | — | 5473 | 0.9991
0.5 | CGSM | 4570 | 0.6280 | 5749 | 0.7815 | — | — | 6105 | 0.9990
0.5 | FSCE | 4640 | 0.6302 | 4640 | 0.7924 | — | — | 4640 | 0.9992
Bold (in the published table) indicates the highest detection accuracy in the current situation. The abbreviations in this table, including “—”, are consistent with those in Table 2.
Table 5. Comparison results for FSCE, PCA-based, SRGS and CGSM (features F5–F8).
Payload | Method | F5 D | F5 PA | F6 D | F6 PA | F7 D | F7 PA | F8 D | F8 PA
0.1 | Origin | 17,000 | 0.5168 | 17,000 | 0.5035 | 22,510 | 0.5290 | 34,671 | 0.5988
0.1 | PCA | 9860 | 0.5000 | 9860 | 0.5002 | 13,055 | 0.5003 | 20,109 | 0.5013
0.1 | SRGS | 4508 | 0.5144 | 8437 | 0.5032 | 8713 | 0.5353 | 22,221 | 0.5930
0.1 | CGSM | 15,794 | 0.5147 | 14,548 | 0.5032 | 15,244 | 0.5324 | 28,155 | 0.5971
0.1 | FSCE | 9860 | 0.5174 | 9860 | 0.5056 | 13,055 | 0.5305 | 20,109 | 0.5989
0.2 | Origin | 17,000 | 0.5205 | 17,000 | 0.5245 | 22,510 | 0.5345 | 34,671 | 0.6795
0.2 | PCA | 9860 | 0.5001 | 9860 | 0.5002 | 13,055 | 0.4998 | 20,109 | 0.5021
0.2 | SRGS | 7054 | 0.5210 | 14,581 | 0.5243 | 12,423 | 0.5394 | 21,577 | 0.6685
0.2 | CGSM | 15,363 | 0.5222 | 12,813 | 0.5243 | 13,143 | 0.5373 | 20,731 | 0.6740
0.2 | FSCE | 9860 | 0.5223 | 9860 | 0.5257 | 13,055 | 0.5380 | 20,109 | 0.6802
0.3 | Origin | 17,000 | 0.5388 | 17,000 | 0.5603 | 22,510 | 0.5356 | 34,671 | 0.7450
0.3 | PCA | 9860 | 0.5002 | 9860 | 0.5006 | 13,055 | 0.5000 | 20,109 | 0.5038
0.3 | SRGS | 8208 | 0.5363 | 15,520 | 0.5607 | 14,350 | 0.5419 | 21,316 | 0.7318
0.3 | CGSM | 14,813 | 0.5375 | 11,310 | 0.5605 | 11,330 | 0.5413 | 14,166 | 0.7274
0.3 | FSCE | 9860 | 0.5389 | 9860 | 0.5614 | 13,055 | 0.5404 | 20,109 | 0.7451
0.4 | Origin | 17,000 | 0.5738 | 17,000 | 0.6125 | 22,510 | 0.5475 | 34,671 | 0.7938
0.4 | PCA | 9860 | 0.5005 | 9860 | 0.5008 | 13,055 | 0.5005 | 20,109 | 0.5041
0.4 | SRGS | 8578 | 0.5742 | 14,029 | 0.6109 | 17,813 | 0.5503 | 21,425 | 0.7825
0.4 | CGSM | 14,121 | 0.5743 | 10,255 | 0.6088 | 9946 | 0.5490 | 9506 | 0.7608
0.4 | FSCE | 9860 | 0.5754 | 9860 | 0.6139 | 13,055 | 0.5484 | 20,109 | 0.7947
0.5 | Origin | 17,000 | 0.6292 | 17,000 | 0.6752 | 22,510 | 0.5705 | 34,671 | 0.8375
0.5 | PCA | 9860 | 0.5007 | 9860 | 0.5020 | 13,055 | 0.5006 | 20,109 | 0.5068
0.5 | SRGS | 8580 | 0.6275 | 16,043 | 0.6749 | 19,537 | 0.5687 | 21,750 | 0.8239
0.5 | CGSM | 13,541 | 0.6294 | 10,233 | 0.6766 | 8773 | 0.5706 | 7239 | 0.7954
0.5 | FSCE | 9860 | 0.6294 | 9860 | 0.6774 | 13,055 | 0.5724 | 20,109 | 0.8387
Bold (in the published table) indicates the highest detection accuracy in the current situation. The abbreviations in this table are consistent with those in Table 2.
Table 6. Comparison of feature selection time for FSCE, PCA-based, SRGS and CGSM. All entries are feature selection times in seconds.
Payload | Method | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8
0.1 | PCA | 127.49 | 127.1 | 127.87 | 127.19 | 272.17 | 274.07 | 314.23 | 408.6
0.1 | SRGS | 271.17 | 813.01 | 542.54 | 663.17 | 1117.07 | 1002.79 | 1152.21 | 2586.29
0.1 | CGSM | 8.65 | 8.27 | 9.12 | 10.40 | 28.85 | 37.96 | 77.47 | 35.04
0.1 | FSCE | 2.95 | 2.88 | 2.81 | 2.95 | 6.22 | 5.89 | 8.12 | 17.44
0.2 | PCA | 127.51 | 126.33 | 128.39 | 127.63 | 272.66 | 275.65 | 310.81 | 414.14
0.2 | SRGS | 529.64 | 721.53 | 528.33 | 651.4 | 1453.06 | 1537.18 | 1507.82 | 2537.18
0.2 | CGSM | 8.27 | 8.19 | 8.65 | 10.33 | 22.05 | 34.45 | 79.38 | 37.91
0.2 | FSCE | 2.96 | 2.89 | 2.84 | 2.90 | 6.20 | 6.38 | 7.93 | 17.83
0.3 | PCA | 127.74 | 126.54 | 127.60 | 127.04 | 272.81 | 271.10 | 315.25 | 415.16
0.3 | SRGS | 697.29 | 600.23 | 519.90 | 684.31 | 1537.49 | 1790.01 | 1726.11 | 2529.80
0.3 | CGSM | 9.83 | 8.31 | 8.63 | 8.34 | 28.84 | 35.15 | 79.94 | 37.09
0.3 | FSCE | 2.92 | 2.81 | 2.84 | 2.80 | 6.28 | 6.36 | 7.93 | 23.75
0.4 | PCA | 128.59 | 127.06 | 127.71 | 126.75 | 265.68 | 270.97 | 311.63 | 407.24
0.4 | SRGS | 723.55 | 283.77 | 525.92 | 639.65 | 1668.66 | 1869.75 | 2226.19 | 2525.94
0.4 | CGSM | 8.35 | 8.27 | 8.18 | 8.78 | 22.07 | 35.81 | 97.26 | 98.65
0.4 | FSCE | 2.95 | 2.81 | 2.87 | 2.98 | 6.20 | 6.38 | 7.88 | 15.17
0.5 | PCA | 128.51 | 127.74 | 127.27 | 126.91 | 270.76 | 272.81 | 310.29 | 415.56
0.5 | SRGS | 768.14 | 297.59 | 531.77 | 656.5 | 1704.28 | 1824.51 | 2494.69 | 2570.46
0.5 | CGSM | 8.38 | 9.50 | 8.08 | 12.52 | 28.97 | 35.49 | 103.72 | 117.89
0.5 | FSCE | 2.93 | 2.82 | 2.83 | 2.87 | 6.22 | 6.37 | 8.30 | 15.07
Bold (in the published table) indicates the shortest feature selection time in the current situation. The abbreviations in this table are consistent with those in Table 2.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
