1. Introduction
Effective pre-audit analysis of accounting data is crucial for detecting possible earnings management and tax planning, thereby enhancing the integrity of financial reporting. In the UK, the tax gap, i.e., the difference between the amount of tax that should theoretically be paid to HMRC and what is actually paid, was estimated at GBP 39.8 billion for the 2022 to 2023 tax year, amounting to 4.8% of total theoretical tax liabilities (https://www.gov.uk/government/statistics/measuring-tax-gaps/1-tax-gaps-summary; accessed on 29 January 2025).
During our investigation of tax optimization by listed FTSE companies, we had to use data on their pre-tax income (PI) and on their total assets (TA) in a given year over a long time interval. In fact, the ratio PI/TA, weighted by a factor that is irrelevant at this stage, is a correction term to the deferred tax expense. That expense is needed to calculate the (ultimately relevant) tax avoidance under study, which is estimated from the measure of total book–tax differences [1].
In so doing, we came across a large dataset for PI and TA, i.e., 6811 and 6768 data points, respectively. One pertinent question concerns the reliability of such pre-taxation data, whose analysis plays a crucial role in ensuring the fairness of public revenue collection. Econometrics, which combines statistical analysis and economic considerations, should increase the confidence of the population, and even more so that of tax officers. If the data are found to be markedly irregular, the findings of this study can subsequently serve as a disciplinary tool for reducing the risk of tax avoidance, and even tax evasion. They can also help against profit-shifting manipulations, which (among other things) are major concerns of the UK government.
Benford’s Laws (BLs), written mathematically in Section 3, state that the leading digits of (naturally occurring) numbers follow specific distributions [2, 3]. A deviation from these empirical distributions is often considered a warning that calls for a more detailed examination. Therefore, it seemed to us of pertinent interest to check whether such PI and TA data fulfil the expectations of BLs. This is the main aim of this report: the distributions of the first, second, and first–second digits in such data are tested against the empirical BLs, called BL1, BL2, and BL12, respectively.
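For concreteness, the three laws tested here can be sketched as probability mass functions over digit bins. The following minimal Python sketch assumes the standard textbook forms of BL1, BL2, and BL12; the paper's own formulation is the one given in Section 3. The function names `bl1`, `bl2`, and `bl12` are hypothetical.

```python
import math


def bl1():
    """BL1: probability that the first significant digit equals d, d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def bl2():
    """BL2: probability that the second significant digit equals d, d = 0..9.

    Obtained by summing the first-two-digit law over all first digits k = 1..9.
    """
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


def bl12():
    """BL12: probability that the first two significant digits equal dd, dd = 10..99."""
    return {dd: math.log10(1 + 1 / dd) for dd in range(10, 100)}


if __name__ == "__main__":
    print(f"P(d1 = 1)    = {bl1()[1]:.4f}")
    print(f"P(d2 = 0)    = {bl2()[0]:.4f}")
    print(f"P(d1d2 = 10) = {bl12()[10]:.4f}")
```

Each dictionary sums to one. BL2 is much flatter than BL1: its probabilities range only from about 0.120 for the digit 0 to about 0.085 for the digit 9.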
In other words, under the null hypothesis (agreement between the empirical data and the expected laws), we aim to assess the discrepancy between secondary (financial) data and their possible theoretical Benford distributions. We do so through two statistical tests based on rather different concepts, the χ² test and the Mean Absolute Deviation (MAD), both outlined in Section 4.
BLs are of interest for identifying irregularities in financial reports [4, 5, 6, 7, 8, 9, 10], but the literature is too vast to be summarized here. Let us mention that BLs have been considered in many fields besides finance: academia, elections [11], engineering, medicine, psychology, physics, religion, scientometrics, sports, and likely many others. Some of these are discussed in [1], with which the following section sometimes overlaps.
Moreover, because of the type and size of the PI and TA data, it is possible to distinguish between negative and positive PIs and to study them separately within the BL framework. Notice that this sign distinction has rarely been considered. Indeed, apart from a few authors to our knowledge, most have searched for BL obedience in the absolute values of the data [10].
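The sign split and the digit extraction described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the procedure of Section 3. Digits are read from the absolute value and truncated rather than rounded. Zero values are discarded because they have no significant digit. The helper names `leading_digits` and `split_by_sign` are hypothetical.

```python
import math


def leading_digits(x, n=1):
    """Return the first n significant digits of a nonzero, finite number as an int.

    The sign is ignored, so -4396 and 4396 both give 43 for n = 2.
    Digits are truncated, not rounded (4396 -> 43, not 44).
    """
    if x == 0 or not math.isfinite(x):
        raise ValueError("no significant digits for zero or non-finite values")
    if not 1 <= n <= 16:
        raise ValueError("n must be between 1 and 16")
    # Scientific notation with 16 significant digits, e.g. 4396 -> '4.396000000000000e+03';
    # dropping the exponent and the decimal point leaves '4396000000000000'.
    digits = f"{abs(x):.15e}".split("e")[0].replace(".", "")
    return int(digits[:n])


def split_by_sign(values):
    """Split values into a 'gain' (positive) and a 'loss' (negative) domain; zeros are dropped."""
    gains = [v for v in values if v > 0]
    losses = [v for v in values if v < 0]
    return gains, losses


if __name__ == "__main__":
    pi = [1520.0, -87.3, 0.0, 43960.0, -2.1]  # hypothetical PI figures
    gains, losses = split_by_sign(pi)
    print([leading_digits(v) for v in gains])                  # first digits, gain domain
    print([leading_digits(v) for v in losses])                 # first digits, loss domain
    print([leading_digits(v, 2) % 10 for v in pi if v != 0])   # second digits
```

Taking `leading_digits(v, 2)` directly gives the first-two-digit value (10 to 99) used for BL12.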
A short and focused literature review is given in Section 2. The literature is vast, and we do not wish to add yet another (necessarily incomplete) literature review. We therefore restrict that section to the essential papers pertinent to our aim:

(i) the pioneering papers;
(ii) the next most often quoted ones, which are likely of general interest;
(iii) the most recent ones, essentially to pinpoint the “state of the art”.

We focus on papers at the intersection of three sets: (i) higher-order BLs, in particular BL2 and BL12, (ii) financial data relevant to pre-taxation, and (iii) statistical tests, in particular papers considering the MAD.
A warning: irregularities may not always be revealed through the use of Benford’s Law [12, 13, 14, 15]. Moreover, financial data do not necessarily conform strictly to BLs, since conformity may depend on the data ranges [16, 17].
Notice that, within a statistical framework, it is sometimes debated whether BL compliance extends to derived, correlated, or combined quantities. Therefore, we also calculated PI/TA when possible, i.e., when both PI and TA data were reported for a company in a given year.
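Here is a minimal sketch of how the derived PI/TA dataset can be assembled. It assumes, hypothetically, that each series is stored as a mapping from a (company, year) pair to the reported figure. The function name `pi_ta_ratios` is hypothetical, and skipping non-positive TA values is our own assumption. The ratio exists only for company-years present in both series, so its sample is at most as large as the smaller of the two.

```python
def pi_ta_ratios(pi, ta):
    """Return {(company, year): PI/TA} for every key reported in both mappings.

    Company-years missing either figure are skipped, as are non-positive TA
    values (an assumption made here to avoid undefined or meaningless ratios).
    """
    ratios = {}
    for key in sorted(pi.keys() & ta.keys()):
        if ta[key] > 0:
            ratios[key] = pi[key] / ta[key]
    return ratios


if __name__ == "__main__":
    # Hypothetical figures for two companies, A and B.
    pi = {("A", 2009): 120.0, ("A", 2010): -15.0, ("B", 2009): 40.0}
    ta = {("A", 2009): 2400.0, ("B", 2009): 800.0, ("B", 2010): 950.0}
    print(pi_ta_ratios(pi, ta))
```

Because TA is positive, each ratio keeps the sign of PI. The ratios can therefore themselves be split into “gain” and “loss” domains.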
To conclude these remarks, a fundamental limitation of the studies reported so far should be pointed out: their common focus on BL1 tests. The exceptions are a few authors who have approached income items and BL2 from a behavioural perspective [18, 19, 20, 21]. Consequently, our research aims to advance the literature through an in-depth statistical analysis of (accounting and tax) variables in financial statements, exploring their potential conformity to BL1, BL2, and BL12.
We fully present the data acquisition and study methodology in Section 3. We run two null hypothesis significance testing methods to investigate conformity to BLs. More precisely, as measures of the discrepancy between the empirical and the theoretical Benford distributions, we use the χ² test and the MAD test, as described in detail in Section 4. We report our findings in tables and figures. Interestingly, the MAD test rejects the conformity of all the reported financial data with Benford’s Laws, i.e., it rejects the relevant null hypothesis in every case.
We conclude in Section 5 with remarks both on the statistical tests and their disagreement and on the implications for practical accounting research. Indeed, we find that the χ² and MAD tests do not lead to fully concordant conclusions. The MAD test rejects the null hypothesis throughout, whereas the χ² test does not reject it in several cases. This finding calls for further studies on the validity ranges of statistical test applications. Notice that we also study a ratio, PI/TA, built from variables that may or may not be BL-compliant (as found a posteriori, with differing outcomes). This adds to the literature on whether indirectly measured variables should be BL-compliant.
Furthermore, from a purely accounting point of view, we conclude that the findings not only cast some doubt on the reported financial data but also suggest that more thorough empirical investigations may be needed into closely related financial data reported by listed companies.
2. Literature Review
Benford’s Laws were long like a “sleeping beauty”, lying in the dirty pages of logarithmic tables [22]. They have since been revived in the strategic management literature [23] and in accounting [10, 24], both of which are fields of interest here. The range of applications is vast, essentially wherever large natural datasets are available. For conciseness, we focus on papers at the intersection of three sets: (i) higher-order BLs, (ii) tax-related financial data, and (iii) specific statistical tests, in particular papers considering the MAD. We also comment on pertinent papers at the intersection of pairs of these sets.
In brief, one should start by recognizing the pioneering work of Nigrini [4, 5, 10, 25]. He introduced and developed statistical research based on BLs to estimate the compliance of taxpayers and of companies engaging in tax planning strategies. In [10], Nigrini reviews the literature on audit sampling and discusses future perspectives.
One of the most relevant papers at the above-mentioned triple-set intersection is that of Alali and Romero from 2013 [26]. They found significant differences in various accounting figures between the US financial statements of companies audited by Big Four firms and those audited by non-Big Four firms. Thereafter, Druica et al. studied Romanian banks using BL2 [12]. Prachyl and Fischer evaluated the conformity of municipality financial data with BL2 [27]. Cheuk et al. assessed the “financial reporting quality of Company Limited by Guarantee charities in Malaysia” [28].
More oriented toward comparing tests, Kössler et al. looked at share prices [29], essentially through BL2. Along the same lines, da Silva Azevedo [30] as well as Cerqueti and Lupi [31] considered BL12.
In a more comprehensive way, Sadaf [32], Patel et al. [33], and Sylwestrzak [34] studied both BL2 and BL12 with respect to possible data manipulation by managers.
Concerning the pertinent literature at the intersection of pairs of sets, let us recognize the keystone, pioneering observation of Carslaw in 1988 [35]. Working on BL2, he observed that New Zealand companies’ income statements showed more 0s and fewer 9s in the second digit position than expected, implying deliberate upward rounding. Similar findings were obtained by Niskanen and Keloharju in 2000 on Finnish public companies [36]. Using both BL2 and BL12, Ausloos et al. also found that “Benford’s laws tests on S&P500 daily closing values and the corresponding daily log-returns both pointed to huge non-conformity” [16]. Similar studies jointly using two BLs, by Das et al. [37] and by Jordan and Clark [38], can also be mentioned.
For completeness, let us also mention that authors have used various financial results and tests to detect potential data manipulation. Of interest for our report is, e.g., Van Caneghem [39]. He used BL2 tests on a sample of 1256 UK companies that reported pre-tax income for the accounting year 1998 and found results similar to those of Carslaw [35]. Other recent works containing BL2 studies are [40, 41, 42, 43, 44, 45]. Last but not least, Günnel and Tödter considered BL12 [46], as did Le and Mantelaers, who even discussed the state of the art up to BL123 [47]. Sardar and Sharma studied BL3 and BL4 with respect to the financial reports of several listed companies of the Adani Group [48].
Of course, many other works discuss statistical tests and financial reports, sometimes using χ² or MAD tests, and many other statistical tests are available and have been considered. However, such papers restrict themselves to BL1. For the sake of brevity, we therefore limit this review to the above works, which frame our field of interest.
4. Findings
We report the results in the following Tables:
Table 3 contains the (rounded) first digits of the downloaded PI and of its positive (“gain”) and negative (“loss”) subsets, with their pertinent frequencies, to be compared to BL1. From the results of the two tests, we notice that the PI variable shows non-conformity (in both the “gain” and “loss” domains) according to the MAD test. The χ² analysis shows mixed results, pointing to some violation for the PI variable and its “gain” domain. Notably, relative to the BL1 distribution, the last three digits (7, 8, 9) show an “under-occurrence” as the first digit, potentially pointing to corporate “rounding up” practices.
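Statements such as the “under-occurrence” of 7, 8, and 9 rest on the sign of the difference between observed and expected proportions. The following minimal Python sketch uses a hypothetical helper name and a toy digit sample rather than the data of Table 3.

```python
import math
from collections import Counter


def first_digit_deviations(first_digits):
    """Observed minus expected (BL1) proportion for each first digit d = 1..9.

    Negative values indicate under-occurrence, positive values over-occurrence.
    The sign alone says nothing about statistical significance.
    """
    n = len(first_digits)
    counts = Counter(first_digits)
    return {d: counts[d] / n - math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    # Toy sample (not the FTSE data) in which 7, 8 and 9 never lead.
    digits = [1, 1, 1, 1, 2, 2, 3, 4, 5, 6]
    for d, dev in first_digit_deviations(digits).items():
        label = "under" if dev < 0 else "over"
        print(d, f"{dev:+.3f}", label)
```

Whether such deviations matter must then be judged with formal tests such as the χ² and MAD tests discussed in this section.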
Table 4 contains the (rounded) first-digit occurrences of the downloaded TA and of the derived PI/TA values, with their pertinent frequencies, to be compared to the empirical BL1. For the total assets and the PI/TA ratio, we again observe complete non-conformity based on the MAD test. The χ² analysis also indicates non-conformity of the ratio variable. We can observe for the
variable an “over-occurrence” of the last digit
, which also leads to some interesting behavior for the occurrence of
and
as in the first digit of the
ratio. The findings can be of interest for suggesting a deeper investigation of different ratios as part of financial statement analyses.
Table 5 reports the (rounded) second-digit occurrences of PI, again distinguishing between the “gain” and “loss” subsets, with the pertinent frequencies for comparison to BL2. The results of the BL2 analysis are more mixed. The MAD test shows complete non-conformity for both the “loss” and “gain” domains of pre-tax income, whereas the χ² analysis indicates conformity of all second-digit distributions.
Table 6 summarizes the (rounded) second-digit occurrences of TA and PI/TA, with their pertinent frequencies, to be compared to BL2. For the TA variable and the derived ratio, we observe the same picture as in the BL1 analysis. All variables show non-conformity under the MAD test, and the PI/TA ratio also shows non-conformity under the χ² test. For the ratio variable, we can observe an “over-occurrence” of
and
as well as
as the second digit. These observations point to possible irregularities in
reporting and in the valuation of companies’ assets.
Table 7 reports the (rounded) first-and-second-digit occurrences of the downloaded PI, again distinguishing between the “gain” and “loss” subsets, with the pertinent frequencies to be compared to BL12.
Table 8 summarizes the (rounded) first-and-second-digit occurrences of the TA and PI/TA values, with their pertinent frequencies to be compared to BL12.
Based on the results in Table 7 and Table 8, the assessment of BL12 conformity is relatively complex, especially for derived variables such as the PI/TA ratio. The results in Table 7 and Table 8 are again characterized by non-conformity under the MAD test, whereas the χ² statistics point to the conformity of the PI values in the “gain” domain. This may, in turn, contribute to the non-conformity of the PI/TA ratio.
The BL1 and BL2 analyses are also illustrated in Figure 2 and Figure 3, respectively. BL12 is illustrated in Figure 4 without the
data, since the figure would otherwise be unreadable. When the data reported in Table 7 and Table 8 are supplemented with those in Figure 4, some salient irregularities appear in BL12. This is particularly the case for the highest first-two-digit combinations (such as
). The worst is for the
data, which deviate markedly (and unexpectedly) from the empirical BL12.
The results of the statistical tests assessing the discrepancy between the observed frequency distributions and the theoretical BL distributions are reported in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8. Recall that the χ² test and the MAD test rely on two different concepts of distance:
The χ² test is defined as usual through the observed (O) and expected (E) numbers of observations, distributed over D + 1 bins, where D is the number of degrees of freedom:

$$\chi^2 = \sum_{i=1}^{D+1} \frac{(O_i - E_i)^2}{E_i}.$$

It is classically used, but it is known to tend toward rejecting the Benford compliance of observations even when the deviations from the theoretical BL expectations (E) are negligible, mainly in large samples.
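The χ² statistic above, and its dependence on sample size, can be illustrated with the following minimal Python sketch (the function name `chi_square` is hypothetical). Since E_i = N p_i, the statistic equals N times a weighted sum of squared differences between observed and expected proportions. Multiplying every count by 10 while keeping the same proportions therefore multiplies χ² by 10, which is the large-sample effect mentioned above. The statistic is then compared with the critical value for D = K − 1 degrees of freedom, obtainable for instance from `scipy.stats.chi2.ppf(0.95, D)`.

```python
import math


def chi_square(observed_counts, expected_probs):
    """Pearson statistic: sum over bins of (O_i - E_i)^2 / E_i, with E_i = N * p_i."""
    if len(observed_counts) != len(expected_probs):
        raise ValueError("need one observed count per bin")
    n = sum(observed_counts)
    return sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed_counts, expected_probs)
    )


if __name__ == "__main__":
    bl1 = [math.log10(1 + 1 / d) for d in range(1, 10)]  # 9 bins, so D = 8
    counts = [310, 170, 125, 97, 79, 67, 55, 50, 47]      # hypothetical, N = 1000
    small = chi_square(counts, bl1)
    large = chi_square([10 * c for c in counts], bl1)     # same proportions, N = 10000
    print(f"N = 1000: {small:.2f}   N = 10000: {large:.2f}")
```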
The MAD test is defined through the observed ($\hat{p}_i$) and expected ($p_i$) frequencies over the pertinent number K of bins:

$$\mathrm{MAD} = \frac{1}{K} \sum_{i=1}^{K} \left| \hat{p}_i - p_i \right|.$$

It is considered to be the most reliable test for checking the validity of the BL [31], although not always [12]. In brief, for BL1, the cut-offs are as follows:

- MAD below 0.006: close conformity;
- MAD between 0.006 and 0.012: only acceptable conformity;
- MAD between 0.012 and 0.015: marginally acceptable conformity;
- otherwise: nonconformity.

For BL2, close conformity occurs for MAD ≤ 0.008, and for BL12, for MAD ≤ 0.0012 [4].
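The following minimal Python sketch implements the MAD and the cut-offs quoted above (the helper names `mad` and `conformity` are hypothetical). Only the close-conformity cut-off is quoted here for BL2 and BL12, so the sketch reports those laws simply as close or not close. How boundary values are assigned is our own choice. Unlike χ², the MAD is computed on proportions only, so it does not grow with the sample size N.

```python
def mad(observed_props, expected_probs):
    """Mean absolute deviation (1/K) * sum_i |observed_i - expected_i| over K bins."""
    if len(observed_props) != len(expected_probs):
        raise ValueError("need one observed proportion per bin")
    k = len(expected_probs)
    return sum(abs(o - e) for o, e in zip(observed_props, expected_probs)) / k


def conformity(mad_value, law):
    """Label a MAD value with the cut-offs quoted in the text (boundaries assigned here)."""
    if law == "BL1":
        if mad_value < 0.006:
            return "close conformity"
        if mad_value <= 0.012:
            return "acceptable conformity"
        if mad_value <= 0.015:
            return "marginally acceptable conformity"
        return "nonconformity"
    close_cutoff = {"BL2": 0.008, "BL12": 0.0012}[law]  # KeyError for unknown laws
    return "close conformity" if mad_value <= close_cutoff else "not close conformity"


if __name__ == "__main__":
    print(conformity(0.010, "BL1"))    # acceptable conformity
    print(conformity(0.0015, "BL12"))  # not close conformity
```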
Of course, many other statistical tests have been used to assess the conformity of (e.g., financial) data to BLs. A short list, in no hierarchical order, includes the following: the Kolmogorov–Smirnov test, the Chebyshev distance, the Kullback–Leibler divergence, the Freedman–Watson (U²) test, the Joenssen JP-square test, z-statistics, the Financial Statement Divergence Score, the Kuiper test, and the Binomial Probability test. Even the Euclidean distance, regression approaches, and other “smooth tests” have been used.
Recent discussions and works of interest, with pertinent recommendations, are found in Cerqueti and Maggi [14], Lesperance et al. [52], Ducharme et al. [53], Henselmann et al. [54], Cerqueti and Lupi [31, 55], and Barabesi et al. [56].
5. Conclusions
In brief, recall that we aim to assess the discrepancy between secondary (financial) data and their possible theoretical Benford distributions. We do so through two statistical tests based on different concepts, the χ² test and the Mean Absolute Deviation (MAD), both outlined in Section 4. We have used a rather large set of financial data, i.e., the pre-tax income (PI) and the total assets (TA) of 567 companies listed on the FTSE All-Share index. These data were gathered from the Refinitiv EIKON database and cover the 14 years from 2009 to 2022, as described in Section 3. Since the validity and applicability of Benford’s Laws (BLs) are still debated, in particular for indirect measures, we also derived the PI/TA dataset. Furthermore, owing to the data size (slightly fewer than 7000 data points), we examined the cases of positive and negative PI separately.
Indeed, Benford’s Law describes the frequency distribution of digits in many real-world datasets. It has been extensively applied in accounting to detect anomalies, such as rounding up or other irregularities, by comparing the expected distribution of digits with the one observed in the dataset. While much research focuses on the first digit, some studies (including the present one) have also examined the occurrence of second digits and of first-two-digit combinations to assess the integrity of income and total assets data. The present study has examined such deviations in the first two digits of the PI variable, in both the profit and loss domains, identifying significant deviations from the theoretical distribution.
The null hypothesis is that the observed PI, TA, and PI/TA digit proportions do not differ statistically from those expected from Benford’s Laws (BL2 and BL12). It can thus be checked with the χ² and MAD statistical tests.
From the tables summarizing the MAD values, all data appear to be non-conforming to BLs, so the null hypothesis is rejected.
In contrast, the conclusions from the χ² tests are more ambiguous. Indeed, χ² values below the critical value for the pertinent number of degrees of freedom occur in several cases:
firstly, for ,
secondly, for , , , and ,
and finally, for and .
A few cases are close to the critical value like
and ,
,
, and .
It is remarkable that no PI/TA ratio distribution (for BL1, BL2, or BL12) satisfies the 5% χ²-test criterion. Although this remains unproven, one may conjecture that this finding is due to the peculiar
distribution, which is illustrated in
Figure 1.
In fact, this observation points to a still-open question on the applicability, and hence the validity, of Benford’s Laws for derived measures when the initial, raw measures are found to obey the empirical BLs. Our χ² test values seem to indicate that such a universal extension of validity is doubtful. As a suggestion for further work, the
test could be seriously considered [
52,
53].
In conclusion, one might again be surprised that two different statistical tests do not lead to similar conclusions regarding the conformity of large, a priori unmanipulated, datasets. On the other hand, one may further question the lack of conformity according to the MAD test. Indeed, it suggests that the data might have been manipulated. A likely aim would be to balance a lower reported income, and thus a lower tax amount, against a favourable presentation of profits to shareholders. In this respect, our findings add to the observations of Alali and Romero [26] on the one hand and to those of Henselmann et al. [54] on the other. Although we have used quite different datasets and different tests, we confirm that much remains to be understood and explained.
To sum up, this note enriches the application of Benford’s Law as a robust tool for detecting anomalies and potential fraud in accounting data, utilizing an extensive dataset derived from FTSE-listed companies. We also stress the need for more elaborate statistical analyses before drawing conclusions.
Thus, future research directions may be highlighted here. In particular, readers interested in financial data manipulation may study other financial data through BLs. Other statistical tests also deserve study, for which we recommend the recent works of Lesperance et al. [52], Ducharme et al. [53], Henselmann et al. [54], Cerqueti and Lupi [31, 55], and Barabesi et al. [56].