Article

Analyzing Benford’s Law’s Powerful Applications in Image Forensics

1
Automation Department, Faculty of Automation and Computer Science, Technical University of Cluj-Napoca, 400027 Cluj-Napoca, Romania
2
Individual Sports Department, Faculty of Physical Education and Sport, Babeș-Bolyai University, 400084 Cluj-Napoca, Romania
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(23), 11482; https://doi.org/10.3390/app112311482
Submission received: 20 October 2021 / Revised: 25 November 2021 / Accepted: 28 November 2021 / Published: 3 December 2021
(This article belongs to the Special Issue Computer Vision in Mechatronics Technology)

Abstract

The Newcomb–Benford law states that in a set of natural numbers, the leading digit has a probability distribution that decays logarithmically. One of its major applications is the JPEG compression of images, a field of great interest for domains such as image forensics. In this article, we study JPEG compression from the point of view of Benford’s law. The article focuses on ways to detect fraudulent images and JPEG quality factors. Moreover, using the image’s luminance channel and JPEG coefficients, we describe a technique for determining the quality factor with which a JPEG image is compressed. The algorithm’s results are described in considerably more depth in the article’s final sections. Furthermore, the proposed idea is applicable to any procedure that involves the analysis of digital images and in which it is strongly suggested that the image authenticity be verified prior to beginning the analyzing process.

1. Introduction

The empirical gem of statistical folklore [1], a phenomenon well known to some yet little known to most, is the Newcomb–Benford law (NBL). It was first discovered in 1881, long before pocket calculators were invented, by S. Newcomb while he was looking through the pages of a book of logarithm tables. Noticing that the earlier pages were much more worn than the others, he stated that the first significant figure is more often 1 than any other digit, with the frequency diminishing up to 9 [2].
A further mathematical and experimental analysis was made by F. Benford, who named it The Law of Anomalous Numbers. He stated that the ten digits do not occur with equal frequency, but according to the following logarithmic relation [3]:
$$F_a = \log_{10}\left(\frac{a+1}{a}\right), \quad \text{for all } a = 1, 2, \ldots, 9 \tag{1}$$
In this formula, $a$ is the first significant digit and $F_a$ is the frequency of the digit. A visual representation of this law is presented in Figure 1.
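As a quick numerical illustration of the relation above, the following minimal sketch tabulates the expected first-digit frequencies. The function name `benford_probabilities` is chosen here for illustration and does not come from the article.

```python
import math


def benford_probabilities():
    """Expected frequency of each leading digit a = 1..9 under the NBL."""
    return {a: math.log10((a + 1) / a) for a in range(1, 10)}


probabilities = benford_probabilities()
for digit, frequency in probabilities.items():
    print(f"digit {digit}: {frequency:.4f}")
```

The nine frequencies sum to 1, because the product of the ratios (a + 1)/a for a = 1, ..., 9 telescopes to 10.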
Over the past few years, this law has been gaining traction due to its wide range of applications. For instance, it was used to evaluate financial disclosures in the financial statements of public manufacturing companies in Nigeria and Ghana [4]. Identifying the numerical anomalies of Turkish elections in 2017–2018 was also possible due to Benford’s law [5]. Another political context in which it was used was analyzing the national elections in Spain in 2015–2016 [6]. Detecting fraud in customs declaration [7] and accounting [8] are two other fields where the NBL was applied.
Therefore, the NBL has been proposed to determine fraud in a wide range of domains, varying from elections to financial accounting and international trade. However, when given a closer look, the idea of fraud can be naturally extended to images. Thus, a significant area of study, upon which we have chosen to focus our research, is JPEG compression.
In light of JPEG being one of the most popular file formats, the detection of compressed digital images is of great importance for image forensics and crime detection. The reason is that a JPEG image is usually recompressed when it is opened in photo-editing software and re-saved. Considering this, a compressed image is potentially compromised, altered, or edited, and must not be accepted as evidence, for instance.
The Newcomb–Benford law is used in the field of JPEG images, in relationship to discrete cosine transform coefficients, to identify the number of compression stages that have been applied [9] and in image steganalysis to detect hidden messages [10] and to detect double-compressed images [11]. Moreover, this law was used to analyze JPEG coefficients, resulting in a generalized form of the law, which can be applied for each quality factor [12].
Therefore, determining the authenticity of digital images is of great importance, even in the field of radiography. For example, regarding radiographs, the X-ray software allows image enhancement [13]. Moreover, they are exported to common file formats, so they can be easily altered. Therefore, when determining if ultrasound imaging is better than radiographs in differentiating endodontic lesions, for instance [14], the authenticity of the radiographs should be verified beforehand. The same procedure is also valuable in the case of X-ray technology, studied from the point of view of dual-energy imaging, which enhances lesion recognition [15].
Furthermore, digital images are also used for neural networks, where large datasets are needed for training. Since both natural and synthetic data can be used, methods to distinguish between the two categories are needed [16]. Such methods to detect fraudulent data can also be applied when using convolutional neural networks, for example, in combination with orbital-field matrix and Magpie descriptors to predict the formation energy of 4030 crystal material [17], or combined with inception blocks and CapsuleNet to study the surface accessibility of transmembrane protein residues [18].
Crystal recognition is another field where it is crucial to determine the reliability of digital images before attempting to classify them using the Mask R-CNN model [19].
One can also verify whether grayscale images (images that are used to measure crystal size distributions) have been altered or not [20]. Another concept used in many branches is represented by cross-sectional images. When studying how the transformation of the image of a grained structure in a cross-sectional plane reflects structure deformation [21], it is also important to prove the authenticity of the images.
Quantitative image analysis could also benefit from methods that can prove the security of digital images, for example, when studying the evolution of mosaicity during seeded Bridgman processing of technical Ni-based single crystal superalloys [22], or when using X-ray diffraction imaging to study crack geometry and associated strain field around Berkovich and Vickers indents on silicon [23].
In the following sections, the way we have chosen to approach JPEG compression in relationship with the NBL is discussed in detail.

2. Materials and Methods

In general, when the NBL is used to detect various types of data tampering, the following assumption is made and then tested: the real data will follow the distribution given by this law, while the altered data will not. Our research too has considered this concept as the starting point.
However, in the case of JPEG compression, the following problem arises immediately: what data need to be considered and compared to the Benford distribution? What numbers need to be extracted and analyzed? So far in the literature, both the DCT and JPEG coefficients have been compared to Benford’s law [9,12]. However, the algorithm that we have implemented uses the JPEG coefficients, obtained from the luminance channel. The next parts describe the logic behind our decision as well as the outcomes.
Since at first glance this law appears to be counterintuitive, and since it is not the purpose of this article to examine the deep mathematical theories behind this subject, we considered it to be worth mentioning that there are several opinions regarding when the law applies. For instance, F. Benford himself wrote that “the logarithmic law applies particularly to those outlaw numbers that are without known relationship rather than to those that individually follow an orderly course” [3]. A popular claim states that to follow the NBL, a distribution must extend over several orders of magnitude [24,25,26]. However, this was pointed out as a widely spread misconception, by T. P. Hill [27].
Going back to the subject of JPEG compression, the numbers that provide valuable information related to our research are the discrete cosine transform coefficients after quantization. The reason for choosing them is not based on a mathematical proof, but rather on previous work, such as [9,28], and experimental tests [29]. For comparison, the discrete cosine transform coefficients before quantization will also be analyzed. The steps used to obtain these coefficients are thoroughly discussed in the following section.
Regarding the study sample on which we have conducted our research, it contains images from the Uncompressed Color Image Database (UCID).
In our research, the color spaces which were used are RGB and YCBCR. Consequently, the experiments were conducted on the channels R (red), G (green), B (blue), Y (luma component), CB (blue-difference chroma component) and CR (red-difference chroma component). Therefore, the first step when considering an input image is to decide which channel will be used further.
In what follows, the algorithm of obtaining the discrete cosine transform coefficients is discussed. As a starting point, we consider the main steps of JPEG compression represented in Figure 2.
The terminology of “JPEG coefficients” [28] will be used from now on to describe the discrete cosine transform (DCT) coefficients after quantization. As it was previously stated, in our research, both DCT coefficients and JPEG coefficients are analyzed. However, before studying them using the NBL, we provide our readers with an exemplified explanation of the DCT and the quantization.
To begin with, one channel is selected, for instance, the luminance. Then, the image is divided into blocks of 8 by 8 pixels. Such an example is represented in Figure 3.
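The block division described above can be sketched as follows. This is a minimal illustration, not the authors' code: the channel is assumed to be a plain list of rows, the name `split_into_blocks` is hypothetical, and incomplete border blocks are simply discarded here (a full JPEG encoder pads them instead).

```python
def split_into_blocks(channel, size=8):
    """Split a 2-D channel (list of rows) into size x size blocks, row by row.

    Assumption: rows and columns that do not fill a complete block are dropped.
    """
    rows = len(channel) - len(channel) % size
    cols = len(channel[0]) - len(channel[0]) % size
    return [
        [row[c:c + size] for row in channel[r:r + size]]
        for r in range(0, rows, size)
        for c in range(0, cols, size)
    ]


# Synthetic 16 x 24 channel used only to show the call.
example_channel = [[(r * 24 + c) % 256 for c in range(24)] for r in range(16)]
example_blocks = split_into_blocks(example_channel)
print(len(example_blocks), "blocks of", len(example_blocks[0]), "by", len(example_blocks[0][0]))
```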
If representing graphically the luminance values, a histogram is obtained, shown in Figure 4.
Since the range of values is 0–255, by subtracting 128 from each value, a range centered around 0 is obtained, shown in Figure 5.
Then, for each block, the two-dimensional discrete cosine transform is applied. This transformation is a sum of cosines, a sum in which the terms’ coefficients are in decreasing order. Moreover, it is performed in a zigzag manner, as shown in Figure 6.
Therefore, the top left value in our 8 by 8 block, which is the first coefficient of the DCT, has the biggest value [28]. Then, the DCT coefficients decrease. Consequently, the information is concentrated in the area near the top left corner, as suggested by the highlighted area in Figure 7.
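To make the level shift and the transform concrete, here is a minimal pure-Python sketch of the two-dimensional type-II DCT on one 8 by 8 block, using the scaling commonly used in JPEG. The function names and the flat test block are illustrative assumptions; a practical implementation would rely on an optimized library routine.

```python
import math


def level_shift(block):
    """Subtract 128 so that values in 0..255 become centered around 0."""
    return [[value - 128 for value in row] for row in block]


def dct2_8x8(block):
    """Two-dimensional type-II DCT of an 8 x 8 block (JPEG scaling)."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    result = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            result[u][v] = 0.25 * scale(u) * scale(v) * total
    return result


# Hypothetical flat block: every pixel has luminance 130.
flat_block = [[130] * 8 for _ in range(8)]
flat_coefficients = dct2_8x8(level_shift(flat_block))
print(round(flat_coefficients[0][0], 6))
```

For a constant block, all the energy ends up in the top left coefficient and the remaining 63 coefficients vanish, which is the extreme case of the concentration near the top left corner described above.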
The next step is the quantization, which consists of dividing each block, element by element, by a quantization matrix and rounding the result. This is the step where information is lost, and the amount lost depends on the quality factor, QF, chosen by the user. Naturally, the higher the QF is, the less information is lost. For example, when choosing a QF of 50, the result is the one shown in Figure 8.
For each QF in a range of 1 to 100, there is a different quantization matrix. The main one, the one for QF = 50 , which was also used in our example, is the one in the following relation.
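$$Q_{50} = \begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}$$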
The other quantization matrices are computed using the following formula [28]:
$$Q_n = \frac{S \cdot Q_{50} + 50}{100}, \quad \text{where } S = \begin{cases} 1, & \text{for } n = 100 \\ 200 - 2n, & \text{for } n \in (50, 99] \\ \dfrac{5000}{n}, & \text{for } n \in (1, 50) \end{cases}$$
To sum up, we have now detailed the way in which, from an input image, for a single channel, the DCT coefficients and the JPEG coefficients are calculated.
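The scaling of the quantization matrix and the quantization itself can be sketched as below. Several details are assumptions made for this illustration and are not stated in the text: the result of the scaling formula is floored and clamped to a minimum of 1, n = 50 is treated with S = 200 − 2n (which gives S = 100 and returns the base matrix unchanged), n = 1 is treated with S = 5000/n, and Python's built-in `round` (round half to even) stands in for the rounding step.

```python
import math

# Base luminance quantization matrix for QF = 50, as given in the text.
Q50 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_factor(n):
    """Scaling factor S for quality factor n (boundary handling is assumed)."""
    if not 1 <= n <= 100:
        raise ValueError("quality factor must be between 1 and 100")
    if n == 100:
        return 1.0
    if n >= 50:
        return 200 - 2 * n
    return 5000 / n


def quantization_matrix(n):
    """Quantization matrix for quality factor n, floored and clamped to 1."""
    s = scale_factor(n)
    return [[max(1, math.floor((s * q + 50) / 100)) for q in row] for row in Q50]


def quantize(dct_block, q_matrix):
    """Element-wise division by the quantization matrix, then rounding."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(dct_block, q_matrix)]


print(quantization_matrix(90)[0])
```

With these assumptions, `quantization_matrix(50)` reproduces the base matrix, and higher quality factors yield smaller divisors, so fewer coefficients are rounded to zero.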
We have established that the JPEG coefficients depend on the QF that was chosen. Since we want to compare them with the NBL, a single form of the law, that is, Equation (1), cannot fit every QF. Therefore, the generalized Benford’s law (GBL), proposed by Fu et al. (2007) [12], is used further:
$$F_a = N \log_{10}\left(1 + \frac{1}{s + a^{q}}\right), \quad \text{for all } a = 1, 2, \ldots, 9$$
As it can be observed, for the special case when $N = 1$, $s = 0$ and $q = 1$, the formula is indeed the NBL. However, for each QF, the model parameters are different, as shown in Table 1.
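The generalized law can be written as a small function. This is a sketch only: the parameter names `n_factor`, `s` and `q` mirror the symbols in the formula, and no per-QF parameter values are hard-coded here, since they must be taken from Table 1.

```python
import math


def generalized_benford(n_factor, s, q):
    """Expected frequencies of leading digits 1..9 under the GBL."""
    return [n_factor * math.log10(1 + 1 / (s + a ** q)) for a in range(1, 10)]


# Special case N = 1, s = 0, q = 1: the GBL reduces to the NBL.
nbl_from_gbl = generalized_benford(1.0, 0.0, 1.0)
print([round(f, 4) for f in nbl_from_gbl])
```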
Considering the above-mentioned assumption that the real data will follow the distribution given by this generalized law, while the compromised data will not, our approach is presented in the following section.
Two main cases are considered, the “single-compressed” image and the “double-compressed” image. The first case is illustrated in Figure 9.
In this situation, the input image is uncompressed. When obtaining the DCT and the JPEG coefficients, we perform an incomplete JPEG compression. It is incomplete because the encoding step is not made, and the result is not an image because these coefficients are an intermediary step. Consequently, we consider the use of quotes to be appropriate.
Therefore, we expect both DCT and JPEG coefficients to obey the law for the luminance channel since it was previously stated that the chrominance channels do not provide a significant amount of information because they are “typically down-sampled by a factor of two or four, and quantized using larger quantization steps” [11].
The second case is the one of “double-compressed” images, observed in Figure 10.
For this situation, the image is initially compressed and decompressed with QF 1 = 50 , for example. Then, using the above-mentioned steps, the DCT and JPEG coefficients are obtained. Again, this second compression is not a complete one, its only purpose being to extract the coefficients. In this last case, we expect the JPEG coefficients to “violate the proposed logarithmic law unless the re-compression Q-factor is equal to the original Q-factor” [29]. However, we have observed a different rule, which is presented in the next section.

3. Results

Before explaining the obtained results in depth, two specifications have to be made. Python was used for both the algorithm and the graphical representations. To compare the frequencies of the leading digit of the coefficients with the NBL, the p-value of the chi-square test was used. Moreover, we decided not to use any means of machine learning at this stage to emphasize the principles that we experimentally observed and described.
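One plausible way to carry out this comparison is sketched below; it is not the authors' implementation. The assumptions are: zero coefficients are skipped, the leading digit is read from the scientific notation of the absolute value, the expected frequencies are rescaled so that they sum to the sample size, and the test uses 9 − 1 = 8 degrees of freedom, for which the chi-square survival function has a closed form. All function names are hypothetical.

```python
import math
from collections import Counter


def first_digit(value):
    """Leading significant digit of a nonzero number."""
    return int(f"{abs(value):e}"[0])


def first_digit_counts(coefficients):
    """Number of occurrences of each leading digit 1..9, ignoring zeros."""
    counts = Counter(first_digit(c) for c in coefficients if c != 0)
    return [counts.get(d, 0) for d in range(1, 10)]


def chi_square_p_value(observed, expected_frequencies):
    """Chi-square statistic and p-value for nine digit counts (8 d.o.f.)."""
    total = sum(observed)
    norm = sum(expected_frequencies)
    expected = [total * f / norm for f in expected_frequencies]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    half = statistic / 2
    # Survival function of the chi-square distribution with 8 degrees of freedom.
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))
    return statistic, p_value


benford = [math.log10(1 + 1 / a) for a in range(1, 10)]
print(first_digit_counts([0, 1, -25, 300, 0.07]))
```

A sample whose digit counts are close to the expected ones gives a p-value near 1, while a uniform digit distribution gives a p-value near 0.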

3.1. The DCT Coefficients

Firstly, the DCT coefficients are discussed. A random input image is considered and the DCT coefficients are calculated and compared to the NBL. Since the quantization is not applied yet and there is no QF involved, the original law is used. The comparison is shown in Figure 11 for a random image from the UCID database. The channel that was used is the luminance.
The results are now presented in the same manner for a “double-compressed” image (an image which is originally compressed once and from which the coefficients are extracted by performing an incomplete second compression). In Figure 12, the input image is firstly compressed with QF = 50 . Then, the DCT coefficients are calculated and compared with the NBL, using the luminance channel.
For a better understanding, a batch of 10 images from the UCID database is considered. The DCT coefficients are again compared to the NBL. This time, the average p-value is calculated. Both cases are considered: when the images are initially uncompressed and when they are compressed with different quality factors.
The experiment is conducted on all channels. Even if, by definition, the first step of JPEG compression consists of converting the image from RGB to YCBCR, we also select, one by one, the R, G, and B channel and perform the same algorithm steps to extract the coefficients. The results are shown in Table 2.
As one can observe, the DCT coefficients obey the law, even if an image was compressed before. Therefore, using this method, they do not provide valuable information in detecting fraudulent data. However, we consider that it is especially important to examine them carefully since the cases when the NBL applies are determined experimentally rather than mathematically.

3.2. The JPEG Coefficients—CB and CR Channels

Secondly, the JPEG coefficients are discussed.
To begin with, we analyze the channels CB and CR. For ten random images, the JPEG coefficients are calculated, and using the chi square test, the p-value is obtained, related to the GBL. The results are represented in Figure 13.
It can be observed that at this stage, the JPEG coefficients obtained from the CB channel do not appear to follow any specific pattern using our analysis. Since similar results were obtained for the CR channel, we have not studied these two channels further.

3.3. The JPEG Coefficients—Y, R, G and B Channels

In what follows, we analyze the JPEG coefficients for the channels Y, R, G and B, in relationship with the GBL. Firstly, we compare these coefficients with the generalized Benford’s law, using a large number of random, uncompressed images from the UCID database. The results are shown in Table 3, along with the p-values obtained by performing the chi square test for the coefficients and the GBL.
The results show that indeed these coefficients follow the Benford distribution. An important aspect is also the sample size of the JPEG coefficients. In our case, the images from the UCID database have a size of 384 × 512 pixels (3072 blocks of 8 by 8 pixels), resulting in 3072 JPEG coefficients, and therefore, 3072 most significant digits to be analyzed.
Next, regarding these channels, we make the following statement, verified for the images in the UCID: for an originally compressed image with QF 1 , using the above-mentioned method of obtaining the JPEG coefficients with a QF 2 , with QF 1   QF 2     10 , the p-value obtained by performing the chi square test for the coefficients and the GBL has the following property:
$$p < n \ \text{ for } QF_2 > QF_1, \qquad p > n \ \text{ for } QF_2 \le QF_1, \qquad n = 0.6$$
Thus, if an imaginary threshold line is set at n = 0.6 , the first QF 2 under it is the second compression quality factor. Different values for this threshold are analyzed in what follows. Before continuing the reasoning, an example is given.
In our research, the following quality factors were used: $QF \in \{90, 80, 70, 60, 50\}$. Therefore, considering the above-mentioned property, two cases must be discussed first. When the input image is initially uncompressed or compressed with $QF_1 = 90$, there is no $QF_2$ such that $QF_2 > QF_1$. Consequently, all p-values satisfy $p > n$, $n = 0.6$, as exemplified in Table 3.
As it can be observed from Table 4, this method cannot distinguish between uncompressed images and images that are originally compressed with QF 1 = 90 . Therefore, our algorithm is used to determine QF 1 , provided that the original image is already compressed.
For a different quality factor, such as $QF_1 = 60$, the property can be easily observed in an example, in Table 5. Thus, by establishing the threshold at p-value $n = 0.6$, suggested by the green line, the first $QF_2$ under it is $QF_2 = 60$, which is indeed the second compression quality factor.
In fact, QF 2 determined using this method has the property QF 2 = QF 1 .

4. Discussion

Before moving further, it is important to restate that the developed algorithm finds the quality factor with which the image was compressed only once. The second quality factor is only applied by the algorithm in an incomplete second compression. Its only purpose is to obtain the JPEG coefficients (which is an intermediary step of JPEG compression).
In light of the abovementioned observations, we make the following complete statement. Given an input image, compressed with a quality factor $QF_1 \in \{90, 80, 70, 60, 50\}$, to determine it, the following algorithm can be applied:
  • Calculate the JPEG coefficients for each $QF_2 \in \{90, 80, 70, 60, 50\}$, denoted by $JPEG_i$, $i \in \{90, 80, 70, 60, 50\}$;
  • For all coefficients, in relationship with the GBL, perform the chi square test and determine the p-values, denoted by $p_i$, $i \in \{90, 80, 70, 60, 50\}$;
  • If all $p_i$ satisfy the relationship $p_i > 0.6$, the initial quality factor is $QF_1 = 90$;
  • Else, the largest $QF_2$ for which $p_i > 0.6$ is the initial quality factor $QF_1 = QF_2$.
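The decision rule in the last two steps can be sketched as follows, assuming the p-values for each candidate quality factor have already been computed. The function name is hypothetical, and the p-values in the example are invented purely to show the call; they are not results from the article. Returning `None` when no p-value exceeds the threshold is also an assumption, since the text does not cover that case.

```python
def estimate_quality_factor(p_values, threshold=0.6):
    """Return the largest QF2 whose p-value exceeds the threshold.

    p_values maps each candidate QF2 to its chi-square p-value.
    If every p-value passes, the largest candidate (90 here) is returned,
    which matches the third step of the algorithm.
    """
    passing = [qf for qf, p in p_values.items() if p > threshold]
    return max(passing) if passing else None


# Made-up p-values, for illustration only.
illustrative_p_values = {90: 0.05, 80: 0.10, 70: 0.20, 60: 0.85, 50: 0.90}
print(estimate_quality_factor(illustrative_p_values))
```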
In what follows, this property is exemplified on 100 images from the UCID, for each channel Y, R, G, and B, for $QF_1 \in \{90, 80, 70, 60, 50\}$. To represent the outcome graphically, the average p-values are calculated. Figure 14, Figure 15, Figure 16 and Figure 17 show the results. The threshold p-value is represented as a dashed red line for each case.
As it can be observed, the threshold p-value $n = 0.6$ represented by the red dashed line seems to be an appropriate choice for the algorithm. However, in Table 6, the results are shown for different threshold values. Random images are taken from the UCID database. They are compressed with different quality factors $QF_1 \in \{90, 80, 70, 60, 50\}$. Then, the abovementioned algorithm computes the quality factor $QF_1$, using the JPEG coefficients obtained from the luminance channel. Besides the overall algorithm accuracy, we have also included the accuracies with which each original quality factor was determined.
Therefore, the threshold p-value is chosen to be $n = 0.7$ since it leads to the best results for our algorithm.
Moreover, the accuracy of the algorithm decreases when detecting smaller quality factors. We believe that one of the reasons behind this is the parameters of the GBL. Tuning these parameters for our algorithm is a future goal.
However, the original algorithm only makes use of one channel, the luminance. Next, we use all four channels to predict the quality factor QF 1 . In this manner, if three of the four channels predict for example QF 1 = 60 and one of the channels predicts QF 1 = 50 , the result is QF 1 = 60 .
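The combination of the four channel predictions can be sketched as a simple majority vote. The text does not say how ties are resolved, so falling back to the luminance prediction is an assumption of this sketch, as are the function and parameter names.

```python
from collections import Counter


def vote_quality_factor(predictions, tie_breaker="Y"):
    """Majority vote over per-channel predictions of QF1.

    predictions maps a channel name ("Y", "R", "G", "B") to its predicted QF1.
    Assumption: on a tie, the prediction of the tie_breaker channel is used.
    """
    counts = Counter(predictions.values())
    best = max(counts.values())
    winners = [qf for qf, count in counts.items() if count == best]
    if len(winners) == 1:
        return winners[0]
    return predictions[tie_breaker]


print(vote_quality_factor({"Y": 60, "R": 60, "G": 60, "B": 50}))
```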
As it can be observed from Table 7, the accuracy of the algorithm cannot be improved in this manner. This is due to the fact that the luminance channel leads to the best results, while the other channels have a negative impact on the predicted quality factor. Therefore, we have reached the conclusion that the algorithm, which uses the parameters of the generalized Benford’s law as they were presented in the previous sections, has the best performance when applied to the luminance channel while using a threshold p-value $n = 0.7$.

5. Conclusions

Summarizing, the above-described algorithm uses the JPEG coefficients from the luminance channel of already compressed JPEG images. It, therefore, determines the quality factor with which the analyzed images were compressed. Without using any means of machine learning, this algorithm reached an accuracy of 89% for 500 random images. Moreover, the algorithm had an accuracy of 100% when detecting the images compressed with quality factors 80 and 90.
As a final thought, we refer to the idea that “the numbers but play the poor part of lifeless symbols for living things” (Benford, 1938). In light of our research, we strongly believe that, when analyzed properly and meticulously, the numbers come alive, providing essential information. Such a case is the one of JPEG coefficients, which embed the traces of an image’s compression history.
Finally, it may be concluded that the methods of detecting the quality factor, such as the one presented above, along with the described properties of JPEG coefficients, are fundamental for developing deep learning algorithms used in image forensics. The results are presented in more detail in the Discussion section of this article.
Moreover, we strongly believe that not only is the detection of a compromised image of great importance, but also the detection of the quality factor with which it was compressed.

Author Contributions

Conceptualization, D.C. and D.G.; methodology, D.G. and A.I.; software, D.C.; validation, L.M., H.V. and A.P.; formal analysis, O.S.; investigation, D.C.; resources, D.G.; data curation, D.C.; writing—original draft preparation, D.C.; writing—review and editing, D.G.; visualization, D.G.; supervision, L.M.; project administration, D.G. and D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Berger, A.; Hill, T.P. Introduction. In An Introduction To Benford’s Law; Berger, A., Hill, T.P., Eds.; Princeton University Press: Princeton, NJ, USA, 2015; p. 1.
  2. Newcomb, S. Note on the Frequency of Use of the Different Digits in Natural Numbers. Am. J. Math. 1881, 4, 39.
  3. Benford, F. The law of anomalous numbers. Proc. Am. Philos. Soc. 1938, 78, 551–572.
  4. Nwoye, U.J.; Adeniyi, S.I.; Abiahu, M.F.C. Achieving transparent IFRS financial reporting in Nigeria and Ghana: The B & B model effect. J. Tax. Econ. Dev. 2021, 19, 34–64.
  5. Klimek, P.; Jiménez, R.; Hidalgo, M.; Hinteregger, A.; Thurner, S. Forensic analysis of Turkish elections in 2017–2018. PLoS ONE 2018, 13, e0204975.
  6. Fernández-Gracia, J.; Lacasa, L. Bipartisanship Breakdown, Functional Networks, and Forensic Analysis in Spanish 2015 and 2016 National Elections. Complexity 2018, 2018, 9684749.
  7. Cerioli, A.; Barabesi, L.; Cerasa, A.; Menegatti, M.; Perrotta, D. Newcomb–Benford law and the detection of frauds in international trade. Proc. Natl. Acad. Sci. USA 2019, 116, 106–115.
  8. Asllani, A.; Naco, M. Using Benford’s Law for Fraud Detection in Accounting Practices. J. Soc. Sci. Stud. 2014, 2, 129.
  9. Milani, S.; Tagliasacchi, M.; Tubaro, S. Discriminating multiple JPEG compressions using first digit features. APSIPA Trans. Signal Inf. Process. 2014, 3.
  10. Pérez-González, F.; Heileman, G.L.; Abdallah, C.T. Benford’s law in image processing. In Proceedings of the 2007 IEEE International Conference on Image Processing, San Antonio, TX, USA, 16 September–19 October 2007; Volume 1, p. I-405.
  11. Mahdian, B.; Saic, S. Detecting double compressed JPEG images. In Proceedings of the 3rd International Conference on Imaging for Crime Detection and Prevention, London, UK, 3 December 2009.
  12. Fu, D.; Shi, Y.Q.; Su, W. A generalized Benford’s law for JPEG coefficients and its applications in image forensics. In Security, Steganography, and Watermarking of Multimedia Contents IX (Vol. 6505, p. 65051L), Proceedings of the Electronic Imaging 2007, San Jose, CA, USA, 29 January–1 February 2007; SPIE: Bellingham, WA, USA, 2007.
  13. Calberson, F.L.; Hommez, G.M.; De Moor, R.J. Fraudulent Use of Digital Radiography: Methods To Detect and Protect Digital Radiographs. J. Endod. 2008, 34, 530–536.
  14. Patil, S.; Alkahtani, A.; Bhandi, S.; Mashyakhy, M.; Alvarez, M.; Alroomy, R.; Hendi, A.; Varadarajan, S.; Reda, R.; Raj, A.; et al. Ultrasound Imaging versus Radiographs in Differentiating Periapical Lesions: A Systematic Review. Diagnostics 2021, 11, 1208.
  15. Paternò, G.; Cardarelli, P.; Gambaccini, M.; Taibi, A. Dual-Energy X-ray Medical Imaging with Inverse Compton Sources: A Simulation Study. Crystals 2020, 10, 834.
  16. Trampert, P.; Rubinstein, D.; Boughorbel, F.; Schlinkmann, C.; Luschkova, M.; Slusallek, P.; Dahmen, T.; Sandfeld, S. Deep Neural Networks for Analysis of Microscopy Images—Synthetic Data Generation and Adaptive Sampling. Crystals 2021, 11, 258.
  17. Cao, Z.; Dan, Y.; Xiong, Z.; Niu, C.; Li, X.; Qian, S.; Hu, J. Convolutional Neural Networks for Crystal Material Property Prediction Using Hybrid Orbital-Field Matrix and Magpie Descriptors. Crystals 2019, 9, 191.
  18. Lu, C.; Liu, Z.; Kan, B.; Gong, Y.; Ma, Z.; Wang, H. TMP-SSurface: A Deep Learning-Based Predictor for Surface Accessibility of Transmembrane Protein Residues. Crystals 2019, 9, 640.
  19. Qin, J.; Zhang, Y.; Zhou, H.; Yu, F.; Sun, B.; Wang, Q. Protein Crystal Instance Segmentation Based on Mask R-CNN. Crystals 2021, 11, 157.
  20. Wirz, D.; Hofmann, M.; Lorenz, H.; Bart, H.-J.; Seidel-Morgenstern, A.; Temmel, E. A Novel Shadowgraphic Inline Measurement Technique for Image-Based Crystal Size Distribution Analysis. Crystals 2020, 10, 740.
  21. Minárik, S.; Martinkovič, M. On the Applicability of Stereological Methods for the Modelling of a Local Plastic Deformation in Grained Structure: Mathematical Principles. Crystals 2020, 10, 697.
  22. Hallensleben, P.; Scholz, F.; Thome, P.; Schaar, H.; Steinbach, I.; Eggeler, G.; Frenzel, J. On Crystal Mosaicity in Single Crystal Ni-Based Superalloys. Crystals 2019, 9, 149.
  23. Tanner, B.K.; Allen, D.; Wittge, J.; Danilewsky, A.N.; Garagorri, J.; Gorostegui-Colinas, E.; Elizalde, M.R.; McNally, P.J. Quantitative Imaging of the Stress/Strain Fields and Generation of Macroscopic Cracks from Indents in Silicon. Crystals 2017, 7, 347.
  24. Election Integrity Partnership. Available online: https://www.eipartnership.net/rapid-response/what-the-election-results-dont-tell-us (accessed on 15 July 2021).
  25. Reuters. Available online: https://www.reuters.com/article/uk-factcheck-benford/fact-check-deviation-from-benfords-law-does-not-prove-election-fraud-idUSKBN27Q3AI (accessed on 15 July 2021).
  26. Wolfram|Alpha Blog. Available online: https://blog.wolframalpha.com/2010/12/13/the-curious-case-of-benfords-law/ (accessed on 15 July 2021).
  27. Hill, T.P. A Widespread Error in the Use of Benford’s Law to Detect Election and Other Fraud. arXiv 2020, arXiv:2011.13015.
  28. Praveenkumar, S.; Karuppanagounder, S.; Magesh, S.; Thiruvenkadam, K. The effect of quantizing the Discrete Cosine Transform coefficients at different quality factors for image compression [Paper presentation]. In Proceedings of the International Conference on Mathematical Modelling and Scientific Computation, Gandhigram, India, 16–18 March 2012.
  29. Cerqueti, R.; Lupi, C. Some New Tests of Conformity with Benford’s Law. Stats 2021, 4, 745–761.
Figure 1. The Newcomb–Benford Law.
Figure 1. The Newcomb–Benford Law.
Applsci 11 11482 g001
Figure 2. Obtaining a JPEG compressed image.
Figure 2. Obtaining a JPEG compressed image.
Applsci 11 11482 g002
Figure 3. Luminance values for an 8 by 8 block of pixels.
Figure 3. Luminance values for an 8 by 8 block of pixels.
Applsci 11 11482 g003
Figure 4. The luminance histogram for a block of 8 by 8 pixels.
Figure 5. Pixel values in the range [−128, 127].
Figure 6. The zigzag rule of the discrete cosine transform.
Figure 7. The DCT coefficients.
Figure 8. The JPEG coefficients.
Figure 9. The “single-compressed” image.
Figure 10. The “double-compressed” image.
Figure 11. The DCT coefficients compared to the NBL for an uncompressed image.
Figure 12. The DCT coefficients compared to the NBL for an image originally compressed with QF = 50.
Figure 13. The JPEG coefficients compared to the GBL, using the CB channel.
Figure 14. The JPEG coefficients compared to the GBL, using the Y channel.
Figure 15. The JPEG coefficients compared to the GBL, using the R channel.
Figure 16. The JPEG coefficients compared to the GBL, using the G channel.
Figure 17. The JPEG coefficients compared to the GBL, using the B channel.
Table 1. Proposed normalization factor and model parameters.

| Quality Factor | Normalization Factor N | Model Parameter q | Model Parameter s |
|---|---|---|---|
| 100 | 1.456 | 1.47 | 0.0372 |
| 90 | 1.255 | 1.563 | −0.3784 |
| 80 | 1.324 | 1.653 | −0.3739 |
| 70 | 1.412 | 1.732 | −0.337 |
| 60 | 1.501 | 1.813 | −0.3025 |
| 50 | 1.579 | 1.882 | −0.2725 |

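The parameters in Table 1 can be turned into first-digit probabilities with a short sketch. It assumes the generalized Benford's law (GBL) has the form commonly used in JPEG forensics, p(d) = N · log10(1 + 1 / (s + d^q)); that formula is not restated in this part of the article, so treat it as an assumption. The names `GBL_PARAMS`, `gbl_probabilities` and `benford_probabilities` are illustrative only.

```python
import math

# Table 1 values: quality factor -> (N, q, s)
GBL_PARAMS = {
    100: (1.456, 1.470, 0.0372),
    90: (1.255, 1.563, -0.3784),
    80: (1.324, 1.653, -0.3739),
    70: (1.412, 1.732, -0.3370),
    60: (1.501, 1.813, -0.3025),
    50: (1.579, 1.882, -0.2725),
}


def gbl_probabilities(quality_factor):
    """Modeled probability of each first digit 1..9 under the assumed GBL form."""
    n, q, s = GBL_PARAMS[quality_factor]
    return [n * math.log10(1 + 1 / (s + d ** q)) for d in range(1, 10)]


def benford_probabilities():
    """Classical Newcomb-Benford probabilities for first digits 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


if __name__ == "__main__":
    for qf in sorted(GBL_PARAMS, reverse=True):
        probs = gbl_probabilities(qf)
        print(qf, [round(p, 4) for p in probs])
```

Under this form the modeled distribution falls off more steeply than the classical law as the quality factor decreases, which is what the growing exponent q in Table 1 suggests.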
Table 2. Comparison between DCT coefficients and the NBL.

| Quality Factor for the Initial Compression | p-Value, Y Channel | p-Value, CB Channel | p-Value, CR Channel | p-Value, R Channel | p-Value, G Channel | p-Value, B Channel |
|---|---|---|---|---|---|---|
| Initially uncompressed | 0.99989 | 0.99993 | 0.99993 | 0.99996 | 0.99994 | 0.99994 |
| 90 | 1 | 0.99934 | 0.99934 | 0.99998 | 0.99994 | 0.99984 |
| 80 | 0.99384 | 0.99978 | 0.99978 | 0.9974 | 0.98726 | 0.99952 |
| 70 | 0.99063 | 0.99991 | 0.99991 | 0.99807 | 0.98733 | 0.99904 |
| 60 | 0.99054 | 0.99992 | 0.99992 | 0.99886 | 0.98903 | 0.99908 |
| 50 | 0.98903 | 0.99997 | 0.99997 | 0.99872 | 0.98879 | 0.99937 |

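Table 3. Comparison between JPEG coefficients and the GBL for uncompressed images.

| Original Image | Channel Used to Obtain the JPEG Coefficients | Quality Factor Used to Obtain the JPEG Coefficients | Average p-Value for 500 Images |
|---|---|---|---|
| Initially uncompressed | Y | 90 | 0.9963 |
| Initially uncompressed | Y | 80 | 0.9925 |
| Initially uncompressed | Y | 70 | 0.9264 |
| Initially uncompressed | Y | 60 | 0.8938 |
| Initially uncompressed | Y | 50 | 0.8882 |
| Initially uncompressed | R | 90 | 0.9900 |
| Initially uncompressed | R | 80 | 0.9853 |
| Initially uncompressed | R | 70 | 0.9406 |
| Initially uncompressed | R | 60 | 0.8996 |
| Initially uncompressed | R | 50 | 0.8932 |
| Initially uncompressed | G | 90 | 0.9954 |
| Initially uncompressed | G | 80 | 0.9925 |
| Initially uncompressed | G | 70 | 0.9184 |
| Initially uncompressed | G | 60 | 0.8828 |
| Initially uncompressed | G | 50 | 0.8766 |
| Initially uncompressed | B | 90 | 0.9809 |
| Initially uncompressed | B | 80 | 0.9601 |
| Initially uncompressed | B | 70 | 0.9032 |
| Initially uncompressed | B | 60 | 0.8547 |
| Initially uncompressed | B | 50 | 0.8408 |
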
Table 4. Comparison between JPEG coefficients and the GBL, using Y channel.

| Quality Factor for the Initial Compression QF1 | Quality Factor for the Second Compression QF2 | Average p-Value for 100 Images |
|---|---|---|
| Initially uncompressed | 90 | 0.999242 |
| Initially uncompressed | 80 | 0.997895 |
| Initially uncompressed | 70 | 0.975055 |
| Initially uncompressed | 60 | 0.951813 |
| Initially uncompressed | 50 | 0.946787 |
| 90 | 90 | 0.999219 |
| 90 | 80 | 0.990972 |
| 90 | 70 | 0.978843 |
| 90 | 60 | 0.970747 |
| 90 | 50 | 0.947362 |

Table 5. Comparison between JPEG coefficients and the GBL, using Y channel.

| Quality Factor for the Initial Compression QF1 | Quality Factor for the Second Compression QF2 | Average p-Value for 100 Images |
|---|---|---|
| 60 | 90 | 0 |
| 60 | 80 | 0 |
| 60 | 70 | 0.052688 |
| 60 | 60 | 0.955229 |
| 60 | 50 | 0.917083 |

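The pattern in Table 5, where the p-value collapses for every QF2 above the original QF1 and recovers at QF2 = QF1, suggests a simple decision rule, sketched below. This is a hypothetical illustration and not necessarily the authors' algorithm: it assumes the estimate is the largest candidate QF2 whose p-value reaches a chosen threshold. The function name `estimate_quality_factor` and the default threshold are illustrative.

```python
def estimate_quality_factor(p_values_by_qf2, threshold=0.7):
    """Return the largest QF2 whose p-value reaches the threshold, or None.

    p_values_by_qf2 maps each candidate quality factor QF2 to the p-value
    obtained when the JPEG coefficients are compared with the GBL.
    This decision rule is an assumption made for illustration.
    """
    passing = [qf for qf, p in p_values_by_qf2.items() if p >= threshold]
    return max(passing) if passing else None


# Average p-values from Table 5 (images originally compressed with QF1 = 60).
table_5 = {90: 0.0, 80: 0.0, 70: 0.052688, 60: 0.955229, 50: 0.917083}

if __name__ == "__main__":
    print(estimate_quality_factor(table_5))  # 60
```

Applied to the rows of Table 4, where every p-value is above 0.9, this rule would return 90 for both the uncompressed and the QF1 = 90 images, so on its own it cannot separate those two cases.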
Table 6. The algorithm results for different threshold p-values, using Y channel.

| Threshold p-Value | Number of Images | Channel from Which the JPEG Coefficients Are Extracted | Accuracy, QF1 = 90 | Accuracy, QF1 = 80 | Accuracy, QF1 = 70 | Accuracy, QF1 = 60 | Accuracy, QF1 = 50 | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|
| n = 0.5 | 500 | Y | 1 | 1 | 0.9 | 0.89 | 0.52 | 0.862 |
| n = 0.6 | 500 | Y | 1 | 1 | 0.9 | 0.87 | 0.61 | 0.876 |
| n = 0.7 | 500 | Y | 1 | 1 | 0.91 | 0.85 | 0.7 | 0.892 |
| n = 0.8 | 500 | Y | 1 | 1 | 0.88 | 0.83 | 0.72 | 0.886 |

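The overall accuracy in Table 6 is consistent with an unweighted mean of the five per-QF1 accuracies. The short check below recomputes it for each threshold; the equal-weight assumption is inferred from the numbers and is not stated in the table, and the names `TABLE_6` and `overall_accuracy` are illustrative.

```python
from statistics import mean

# Per-QF1 detection accuracies from Table 6 (QF1 = 90, 80, 70, 60, 50),
# keyed by the threshold p-value n.
TABLE_6 = {
    0.5: [1, 1, 0.90, 0.89, 0.52],
    0.6: [1, 1, 0.90, 0.87, 0.61],
    0.7: [1, 1, 0.91, 0.85, 0.70],
    0.8: [1, 1, 0.88, 0.83, 0.72],
}


def overall_accuracy(per_qf_accuracies):
    """Unweighted mean of the per-QF1 accuracies, rounded to three decimals."""
    return round(mean(per_qf_accuracies), 3)


if __name__ == "__main__":
    for threshold, accuracies in TABLE_6.items():
        print(f"n = {threshold}: {overall_accuracy(accuracies)}")
```

With these inputs the best overall accuracy is obtained at n = 0.7, matching the threshold used for the multi-channel results in Table 7.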
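Table 7. The algorithm results using Y, R, G and B channels.

| Threshold p-Value | Number of Images | Channel from Which the JPEG Coefficients Are Extracted | Accuracy, QF1 = 90 | Accuracy, QF1 = 80 | Accuracy, QF1 = 70 | Accuracy, QF1 = 60 | Accuracy, QF1 = 50 | Overall Accuracy for One Channel | Overall Accuracy (All Channels) |
|---|---|---|---|---|---|---|---|---|---|
| n = 0.7 | 500 | Y | 1 | 1 | 0.91 | 0.85 | 0.7 | 0.892 | 0.882 |
| n = 0.7 | 500 | R | 1 | 1 | 0.92 | 0.82 | 0.52 | 0.852 | 0.882 |
| n = 0.7 | 500 | G | 1 | 1 | 0.88 | 0.83 | 0.66 | 0.874 | 0.882 |
| n = 0.7 | 500 | B | 1 | 1 | 0.84 | 0.74 | 0.44 | 0.804 | 0.882 |
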
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Crișan, D.; Irimia, A.; Gota, D.; Miclea, L.; Puscasiu, A.; Stan, O.; Valean, H. Analyzing Benford’s Law’s Powerful Applications in Image Forensics. Appl. Sci. 2021, 11, 11482. https://doi.org/10.3390/app112311482