Article

Unsupervised Forgery Detection of Documents: A Network-Inspired Approach

by Mohammed Abdulbasit Ali Al-Ameri 1,*, Basim Mahmood 2,3, Bünyamin Ciylan 4 and Alaa Amged 2

1 Graduate School of Natural and Applied Sciences, Gazi University, 06570 Ankara, Turkey
2 Computer Science Department, University of Mosul, Mosul 41002, Iraq
3 Iraq and Bio Complex Laboratory, Exeter EX4 4QJ, UK
4 Computer Engineering Department, Faculty of Technology, Gazi University, 06570 Ankara, Turkey
* Author to whom correspondence should be addressed.
Electronics 2023, 12(7), 1682; https://doi.org/10.3390/electronics12071682
Submission received: 20 March 2023 / Accepted: 31 March 2023 / Published: 3 April 2023
(This article belongs to the Section Computer Science & Engineering)

Abstract

Forgery detection of documents is an active research area in digital forensics. One of the most common issues investigators face is the choice of approach in terms of accuracy, complexity, cost, and ease of use. The literature includes many approaches based on either image processing techniques or spectral analysis; however, most of the available approaches have issues related to complexity and accuracy. This article suggests an unsupervised forgery detection framework that uses the correlations among the spectra of document materials to generate a weighted network for the tested documents. The network is then clustered using several unsupervised clustering algorithms, and the detection decision is based on the number of network clusters. Based on the obtained results, our approach provides high accuracy using the Louvain clustering algorithm, while the use of an updated version of DBSCAN was more successful when testing many documents at the same time. Additionally, the suggested framework is simple to implement and does not require professional knowledge to use.

1. Introduction

Today’s devices and software applications, such as mobile devices, laptops, printers, scanners, and editing programs, have introduced a variety of issues into our lives. This is because these applications and devices can be used to manipulate documents. Even though such digital tools have many benefits, some people choose to unlawfully exploit them to falsify or create fake official documents, especially formal documents that ought to be taken seriously, such as driver’s licenses, bank checks, medications, cash, or even testimony in court. Original documents may be illegally subjected to such manipulation in order to obtain personal benefits [1]. These illegal practices have significantly increased in recent years [2]. To accept or depend on formal documents, organizations must be able to verify their legitimacy. To accomplish this, forgery detection requires a system that is effective, accurate, simple to use, automatable, affordable, and non-destructive [3]. Nevertheless, such a technique is not yet available [4].
Practically, the digital forensics literature includes a variety of approaches for solving this kind of issue [5]. The field underwent a tremendous evolution to become a crucial component of many investigations carried out by law enforcement agencies, the military, and other governmental entities. Some approaches detect forged documents using image processing techniques [6]. Other approaches use the spectroscopy of the tested documents. The former is considered complex and consumes time and effort [5,7,8]; this is because the majority of image processing methods involve several processes, such as image acquisition, image restoration, categorization, feature extraction, and feature evaluation [9,10]. The latter, however, is based on spectral analysis of the document’s constituent parts, such as the printing ink, writing ink, and printing paper [11,12]. It is considered a more accurate way of forgery detection because it analyzes the features of the materials used to create the document [13,14]. However, the spectroscopy approach is not easy to implement. It depends on building mathematical or statistical models for analyzing the spectra of the tested documents. The spectra represent the interactions between electromagnetic radiation and matter, which makes them more accurate [15]. These interactions are a function of the radiation’s wavelength. This information can be extracted using a variety of technologies. One of the most common technologies used for this specific purpose is laser-induced breakdown spectroscopy (LIBS) [16].
This study aims to propose a novel and simple approach for detecting forged documents based on the spectroscopy of documents’ matters using concepts inspired by complex networks. Hence, the main contributions of this study can be summarized as follows:
  • Integrate the fields of forensics sciences and complex networks in one useful area;
  • Develop a novel, efficient, and simple approach for forgery detection of documents that formalizes the spectrums of documents’ matters as a computerized network model;
  • Provide a forgery detection method that does not need experts to perform forgery detection tests.
This paper is structured as follows: Section 2 lists the related works, and Section 3 sets the context for the article by providing a theoretical background on complex networks. Section 4 illustrates the research method. Section 5 presents the experimental results and discussion. Section 6 concludes this work with recommendations and future work.

2. Related Works

According to the digital forensics literature, many solutions have been proposed for detecting forged documents. For instance, Cicconi et al. [17] distinguished different ink types in signed documents by modeling LIBS spectra in the form of a statistical model. The research examined pen inks for a single paper type and many paper types, and it established the order in which stacked inks were deposited. They also looked at the toners and signatures on a questioned document (DQ). The researchers were then able to differentiate the eight black inks on a single type of printing paper after finding up to seven different metals in the inks tested. When the inks were tested on 10 separate sheets, the categorization’s validity was compromised for several distinct reasons. One of the reasons was that the same chemicals were present in both the ink and the paper that was ablated simultaneously with the ink. The different ways inks were absorbed into paper were another distinction. The testing at three crossing locations using a pair of black or blue inks was successful five out of six times.
The authors in [18] developed a technique for examining laser printer and photocopier toners using chemometrics and spectral characteristics. The goal of the study was to develop a non-destructive forensic document discrimination methodology that could be used in forensic science labs to analyze suspect documents on a regular basis. Using the multivariate analysis technique PCA (Principal Component Analysis), a set of orthogonal PCs—each representing a collection of data features that collectively describe data variance—was constructed. Based on a comparison of the qualitative and quantitative analyses, the findings demonstrated the efficacy of the suggested strategy. The methodology was effective and dependable in the discrimination process without having to destroy the samples.
Hui et al. [16] examined printing ink by analyzing the components of print-ink samples. To evaluate the discrimination power of the LIBS method for the samples, they used the Principal Component Analysis (PCA) technique in the analysis. Samples of black printing ink from various kinds of printers (laser and inkjet printers, and photocopiers) were used in the analysis. Additionally, they used one control material (white A4 paper). The authors gathered multiple LIBS spectra from each sample. In the samples, each frequency was represented by a spot (box). The stages in their suggested strategy were as follows: the spectra between replicates should first be normalized, then the spectrum should be split into smaller relevant regions to achieve spectra overlay, and lastly, the NIST atomic spectra database should be used to identify the elements of interest. According to the authors, the findings indicated that combining the LIBS approach with the PCA technique provided discriminative evidence regarding elemental differences among all of the sample inks.
The same method was applied in [19] to separate black inks made by various producers and to analyze deteriorated papers using infrared and Raman spectroscopy. The authors investigated the kinds and dates of the documents. In the experiment, three distinct kinds of paper were used, and various spots were then chosen for each type of paper. Their method looked into the viability of identifying sample ages. Their suggested technique involved using ATR FT-IR spectra as input data for two-dimensional correlation analysis using Noda’s method to produce correlation maps. The findings demonstrated that the pattern of the two-dimensional maps provided insight into the samples’ mechanism of degradation, which is of interest to many forensic experts.
Another interesting work was performed in [20]; it detected the age and type of degraded papers using correlation analysis of the materials’ spectra. In addition to using the spectroscopy of the degraded papers, the authors also used infrared technology.
Buzzini et al. [21] investigated the micro-Raman spectroscopy parameters for inkjet printer ink discrimination. Microscopical methods combined with Raman spectroscopy can be used to find microscopic colored dots produced by inkjet printing on papers. The main goal of this study was to determine whether the Raman data obtained from the three microscopic colored dots—Cyan, Magenta, and Yellow—constitute, when coupled, a chemical signature of sufficient discriminating quality to provide reliable investigative leads in a timely and non-destructive manner. The authors noted that the study’s findings demonstrated spectral differences wherein Raman spectra from various samples totally differed in terms of the location, quantity, and relative intensity of Raman bands.
As seen, most of the methods in the literature used complex statistical/mathematical models, which incur high computational costs. Additionally, most of the approaches were supervised. Hence, an unsupervised approach is suggested for detecting forged documents. The proposed approach uses the spectra (acquired using LIBS) of the tested documents’ materials. The suggested method is straightforward and does not require complicated calculations, which is of interest to investigators.

3. Complex Networks

In network theory, complex networks can be defined as graphs (or networks) that differ significantly from regular graphs or uniformly random networks. Complex networks exhibit many structural features that do not appear in simple or purely regular networks. The study of complex networks started at the beginning of the 2000s, stemming from graph theory and inspired by the empirical analysis of real networks such as biological networks, brain networks, social networks, and technological networks [22,23].
In the last decade, complex network theory and its applications have seen enormous progress and have been a focus of interest. The main reason for this is the large number of tools developed for analyzing the various topologies of real-world systems, together with their flexibility and generality in representing any natural structure [24,25].
A complex network can be thought of as a large collection of vertices connected by edges. A vertex can represent any object, such as a computer, a person, an organization, or a biological cell, while an edge represents a relationship between two vertices, such as two organizations exchanging commodities or two people knowing one another (friends) [25].
The standard tools to study complex networks require some basic knowledge of data structures, differential equations, probability, and graph theory [26].

Community Detection Algorithms

In the context of complex networks, a community, module, or cluster is informally defined as a group of nodes that are more densely interconnected among each other than with the rest of the network [27,28]. Community detection in networks is an essential part of modern network analysis [29]. Several types of algorithms detect network communities; they vary depending on the process leading to an estimation of the community structure and on the nature of the estimated communities [29,30].

4. Research Method

4.1. Dataset and LIBS Settings

Two sets of samples were used in this study. The original documents from the first set were printed on inkjet, laser, and photocopy machines. This set’s documents were all printed on the same kind of paper. Three 5 cm × 5 cm square boxes of black ink were created on white A4 office paper for each printer under consideration (COPY & LASER, 80 gsm). The brands, models, and types of printers that were used in this work are described in Table 1.
The second set, which represented the questioned documents, was composed of prints made on various kinds of paper using the identical printers listed in Table 1, but for documents such as official letters and degree certificates. Digital forensics practitioners frequently employ the LIBS technique to extract the spectra of the inspected documents. The settings of the LIBS should be chosen carefully, since different settings may generate different values for the matter used in the tested document. Therefore, the LIBS settings used in this work were as follows: A “Q-switch Nd: YAG laser” that “emits 1064 nm with 10 ns pulse duration” created the plasma. Laser pulse energies of 25 mJ were used for the measurements, and a converging lens with a focal length of 100 mm was employed to direct the “laser beam” onto the sample’s surface. The sample was 10 cm away from the focusing lens when it was used. With the “beam axis positioned at a distance of 5 cm” from the sample, the “optical fiber” was adjusted at an “angle of 45°”. A collimator lens collected the light released by the laser-induced plasma and focused it onto the optical fiber aperture, which had a diameter of 200 µm/0.22 NA. The “Visual Spectra 2.1” program was used by the LIBS to record the spectra. The spectra were “recorded between 200 and 900 nm wavelengths range with resolution 0.8 nm”. These parameters were established after conducting numerous experiments. For the sake of accuracy, each tested document (original and questioned) was scanned 5 times using LIBS, and the spectra of each scan were stored in an independent Excel file. Each scan had two columns: one for wavelengths (200 to 900 nm) and the other for intensity (2048 points).
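To make the data format concrete, the following Python sketch (not the authors’ code) shows how such scan files could be loaded, assuming each Excel file holds a “wavelength” and an “intensity” column; the file names and column labels are hypothetical placeholders.

```python
# Minimal sketch: load the five LIBS scans of one document from Excel files.
# Assumes each file has a "wavelength" column (200-900 nm) and an "intensity"
# column (2048 points); file names and column labels are placeholders.
import numpy as np
import pandas as pd

def load_scans(paths):
    """Return an (n_scans x 2048) array of intensity spectra."""
    spectra = []
    for path in paths:
        df = pd.read_excel(path)                     # one scan per file
        spectra.append(df["intensity"].to_numpy())
    return np.vstack(spectra)

original = load_scans([f"original_scan_{i}.xlsx" for i in range(1, 6)])
questioned = load_scans([f"questioned_scan_{i}.xlsx" for i in range(1, 6)])
```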

4.2. The Proposed Detection Approach

Using the Excel files mentioned above, a “Correlation Matrix” was constructed. The correlations were calculated among all the scans of the tested document and across original and questioned documents. This is important since it revealed how the documents are correlated. The following formula was used to compute the correlation matrix [20]:
C_{x,y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2} \sum_{i=1}^{n} y_i^{2}}}
where x and y refer to the LIBS spectra of the test samples, and x_i and y_i refer to the ith spectral components (the number of spectral points from LIBS). The index i ranges from 1 to n, with n = 2048. Each component of the matrix reflects the correlation coefficient between two spectra; in this case, all correlation values were higher than 0. The correlation matrix was obtained using Matlab. The intensity column of each independent spectrum, obtained from the spectroscopic analysis of the sample, was used as the representation of that sample’s single spectrum when forming the correlation matrix. The resulting matrix was symmetric, i.e., the values above and below the diagonal were identical. The generated correlation matrix had dimensions of 10 × 10, owing to the 5 spectra acquired from each of the questioned and original test documents. An “Adjacency Matrix” was then created from the correlation matrix using graph theory (G). Here, spectra corresponded to vertices (V), and two vertices were connected by an edge (E) based on the correlation value between them, which was also taken as the weight (w). The created graph was an undirected weighted graph generated from the under-diagonal values of the correlation matrix; in other words, a “Weighted Adjacency Matrix” was created within the framework of graph theory. The adjacency matrix was converted into two spreadsheet files, the nodes file and the edges file, which were imported into visualization software for visualization and to perform the forgery detection.
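As an illustration of this step, the following Python sketch computes the correlation matrix defined above for the ten spectra (five per document) and builds the corresponding undirected weighted graph. It is a minimal reconstruction under the stated definitions, not the Matlab code used by the authors, and the variable names are assumptions.

```python
# Minimal sketch: correlation matrix (uncentered, as in the formula above)
# and weighted graph construction. "original" and "questioned" are the
# (5 x 2048) intensity arrays from the previous snippet.
import numpy as np
import networkx as nx

spectra = np.vstack([original, questioned])          # shape (10, 2048)

# C[x, y] = sum(x_i * y_i) / sqrt(sum(x_i^2) * sum(y_i^2))
norms = np.linalg.norm(spectra, axis=1)
corr = (spectra @ spectra.T) / np.outer(norms, norms)

# One node per scan; edge weight = correlation (under-diagonal values only)
G = nx.Graph()
n = corr.shape[0]
for i in range(n):
    for j in range(i + 1, n):
        G.add_edge(i, j, weight=float(corr[i, j]))
```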
The forgery detection was performed using unsupervised clustering algorithms. The number of clusters extracted from the graph (network) served as the primary indicator in the detection procedure. The clustering algorithms used in this work were the Louvain algorithm [31] and its updated version, the Leiden algorithm [32]. The Louvain algorithm was suggested by Blondel et al. [31]; it uses two stages—“local movement of nodes” and “aggregation of the network”—to identify high-modularity partitions in a weighted network (assigning nodes to communities). The Louvain optimization, which is greedy, runs in time O(v log v), where v is the total number of network nodes. The primary component of this technique for finding communities is the optimization of the modularity (Q), which lies in the range (−1, 1). According to [31], Q can be formulated as follows:
Q = \frac{1}{2m} \sum_{ij} \left[ W_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
where W_{ij} is the weight of the edge between nodes i and j; k_i and k_j are the sums of the weights of the edges attached to nodes i and j, respectively; m is the sum of all edge weights in the network; and c_i and c_j denote the communities to which nodes i and j belong. The function δ is calculated as follows:
\delta(c_i, c_j) = \begin{cases} 1, & c_i = c_j \\ 0, & \text{otherwise} \end{cases}
The Leiden algorithm, on the other hand, is a modified version of the Louvain algorithm [32] that adds a stage between the first and second, namely “refinement of the partition”. Practically, the experiments showed that similar results were obtained using both algorithms, owing to the small size of the graph. According to Traag et al. [34], the Louvain algorithm struggles with the modularity resolution limit; the impact of this issue is that small communities may be merged into one community. For this specific reason, it was decided to integrate the characteristics of the Louvain and Leiden algorithms. In other words, the quality function was replaced with the “Constant Potts Model (CPM)” instead of modularity. The concept of CPM (H) was described in detail in [34]. It can be defined as follows and addresses the modularity resolution limit:
H = \sum_{c} \left[ e_c - \gamma \binom{n_c}{2} \right],
where c denotes a community consisting of n_c nodes, e_c is the number of edges inside community c, and γ is the resolution parameter, which satisfies:
(density among communities) < γ < (density within communities)
According to the above update, the network will be well-clustered and the communities will be formed faster. As mentioned, the questioned and original documents are converted into one network. As a result, if the algorithm finds only one cluster, the questioned document is original; otherwise, it is forged. Furthermore, the authors of [35] proposed an algorithm inspired by the DBSCAN algorithm for forgery detection of documents. Their approach was also tested in this work, since it uses the same concept of detecting whether a document is forged.
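A minimal sketch of this detection step is shown below, using the Louvain implementation available in NetworkX as a stand-in for the algorithm described above and the weighted graph G built in the previous snippet; the resolution value mirrors Table 2, but the authors’ exact implementation and settings may differ.

```python
# Minimal sketch of the cluster-count decision rule. One cluster means the
# questioned document matches the original; more than one flags a forgery.
# Resolution follows Table 2; the seed is arbitrary.
import networkx as nx

communities = nx.community.louvain_communities(
    G, weight="weight", resolution=0.95, seed=42
)

if len(communities) == 1:
    print("Questioned document appears original (single cluster).")
else:
    print(f"Forgery suspected: {len(communities)} clusters detected.")
```

A Leiden-style partition under the CPM quality function could be obtained analogously, for instance with the leidenalg package and its CPMVertexPartition, choosing the resolution parameter within the bound given above.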

5. Results and Discussions

Several documents were used to assess the proposed approach in terms of detecting forged documents.
Test 1 (Laser vs. Inkjet samples): The original document (OD), which was created using a “Laser printer Canon MF231”, and the questioned document (QD), which was created using an “Inkjet Printer HP 577”, were both taken into consideration in this test. The same paper type was used in both documents (“A4 COPY&LASER”). Figure 1 depicts the visualization of Test 1. Clearly, two clusters were extracted using the updated Louvain algorithm, which indicates that the document under inspection was a forgery. Table 2 presents the settings of the algorithm.
Test 2 (Different Paper Types): This test aims to examine the proposed approach in distinguishing different paper types. Therefore, two pairs of documents were tested. Pair_1 includes two documents, of which the first was printed on the “PAPEROne” paper type and the second on the “A4 COPY&LASER” paper type. Pair_2 also contains two documents; the first was produced on “Ballet Universal” and the second on “A4 COPY&LASER”. Figure 2A,B demonstrate the visualization of Pair_1 and Pair_2, respectively. In both pairs, the proposed approach was able to distinguish the different paper types, as shown by the visualization and the clusters generated using the updated clustering algorithm. This result reflects the robustness of the approach in obtaining a 100% detection rate. This test is practical when a questioned document was printed on a different brand or kind of paper using the same printer as the original documents.
Test 3: Unauthorized manipulation can be performed on original documents; that is, the document is original but one or more of its parts are questioned. This case motivated us to perform this test and evaluate the proposed approach. Test 3 includes a single original document that has unauthorized manipulation in one of its parts. The document was printed using a “Canon264” printer, the manipulation was performed using an “HP577” printer, and the paper brand was “A4 COPY&LASER”. To this end, the document was scanned using LIBS and the network was generated with 5 nodes reflecting the number of scans. Figure 3 demonstrates the generated network of the document. Two clusters can be seen in the network; the first has four red nodes and the second has one green node. The former cluster reflects the parts of the original document, while the latter shows the manipulated part, which is a forgery (or “counterfeit”). This test demonstrates the suggested method’s capacity for identifying fabricated text within an original document.
Test 4 (Testing Ink Type): This test is important since it examines the ability of the proposed approach to distinguish a variety of printing ink brands. The test included many different documents that were printed with different inks. The visualization in Figure 4 demonstrates two clusters: the first one (red nodes) includes documents printed using the same brand of ink, and the second one includes the documents that were printed with another brand of ink. As can be seen, the proposed approach was successful in the detection. Although the suggested method is straightforward and effective, it has several limitations that should be considered in future work:
  • Five LIBS tests (scans) are the minimum recommended number. Five scans were used in this work as the experimentally determined threshold; the accuracy of the suggested approach is expected to decline as the number of scans falls below this threshold.
  • The settings of LIBS should be as described in the previous section. According to the pre-experiments that were conducted, the results of the suggested approach may vary if these settings are changed.
We also tested our generated dataset using other approaches, such as the one proposed in [35], which was inspired by the DBSCAN algorithm. The previous tests shown in Figure 1, Figure 2 and Figure 3 were re-performed using [35]. Figure 5, Figure 6, Figure 7 and Figure 8 show the results of testing our dataset under this approach. It is an updated version of the DBSCAN algorithm in which, instead of using distances between nodes directly, the weights between nodes are converted to distances. The main purpose of this test was to show that our dataset can be considered a standard that can be used with a variety of unsupervised machine-learning approaches.
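For illustration, the following rough sketch clusters the same correlation-based network in the spirit of [35], converting edge weights to distances and applying DBSCAN; the conversion distance = 1 − correlation and the eps/min_samples values are assumptions made here, not the parameters of [35].

```python
# Illustrative sketch only: weight-to-distance conversion followed by DBSCAN.
# "corr" is the 10 x 10 correlation matrix from the earlier snippet.
import numpy as np
from sklearn.cluster import DBSCAN

dist = 1.0 - corr                         # higher correlation -> smaller distance
np.fill_diagonal(dist, 0.0)
dist = np.clip(dist, 0.0, None)           # guard against tiny negative values

labels = DBSCAN(eps=0.05, min_samples=2, metric="precomputed").fit_predict(dist)
n_clusters = len(set(labels) - {-1})      # -1 marks noise points
print(f"{n_clusters} cluster(s) found")
```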
Test 5 (Testing Louvain vs. Updated-DBSCAN): This test examines the Louvain algorithm [33] and the Updated-DBSCAN algorithm [35] when the spectra of many documents are available. Using the whole dataset, which includes the spectra of the different printers described in Table 1, we first tested the Louvain algorithm and its ability to distinguish between different printers. Figure 9 demonstrates the result; it can be seen that Louvain was not always successful in distinguishing all the printers, and there was some overlap between the grouped spectra. The figure shows that some spectra were aligned together, such as Ricoh 2051 (laser), Epson 3070 (inkjet), Canon 2900 (laser), and Canon 6000 (laser), while the others were not. Out of the four successfully distinguished printers, three were laser and one was inkjet, which suggests that Louvain deals better with laser printers. Yet, in the other, unsuccessful cases, roughly one out of five spectra of each printer was not distinguished, which is promising and needs further investigation aimed at achieving higher accuracy.
On the other hand, Figure 10 depicts the resulting visualization of the Updated-DBSCAN using the same dataset of Test 5. The figure shows that all the printers were successfully distinguished, and the visualization shows 11 clusters corresponding to the different printers. This means, according to Table 3, that the Updated-DBSCAN was more successful when handling a larger dataset of spectra.

6. Conclusions

This study proposed a straightforward unsupervised method for identifying fake documents. The proposed approach used concepts inspired by network science and graph theory to model the spectra of documents as a network. The spectra of document materials were obtained using the LIBS technique. The work performed four main tests for examining the proposed approach. An enhanced version of the Louvain algorithm was employed as the unsupervised method in the detection procedure, and the effectiveness of the methodology was assessed based on the number of clusters attained. The findings demonstrated the effectiveness of the suggested strategy in identifying fake documents in the four tests. Additionally, the suggested method is easy to implement, consumes little computational power, and makes the detection visible, which is of importance to digital forensics investigators. Moreover, the dataset generated in this work was tested using another approach from the literature. The results showed that our dataset can be considered a standard to be used with a variety of unsupervised machine-learning approaches. The results also showed that the Updated-DBSCAN algorithm was more suitable for our dataset, since it showed higher accuracy than other algorithms such as Louvain. Therefore, in future work, it is intended to employ various unsupervised machine-learning techniques and add more experiments aimed at testing our dataset under different algorithms and showing that it can be considered a standard in the forgery detection field.

Author Contributions

Validation, B.M.; supervision, B.C.; project administration, M.A.A.A.-A.; literature search and writing, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Alameri, M.A.A.; Ciylan, B.; Mahmood, B. Computational Methods for Forgery Detection in Printed Official Documents. In Proceedings of the 2022 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS), Virtual, 22–23 June 2022; pp. 307–331. [Google Scholar]
  2. Montasari, R.; Hill, R.; Parkinson, S.; Peltola, P.; Hosseinian-Far, A.; Daneshkhah, A. Digital forensics: Challenges and opportunities for future studies. Int. J. Organ. Collect. Intell. 2020, 10, 37–53. [Google Scholar] [CrossRef] [Green Version]
  3. Dyer, A.G.; Found, B.; Rogers, D. An insight into forensic document examiner expertise for discriminating between forged and disguised signatures. J. Forensic Sci. 2008, 53, 1154–1159. [Google Scholar] [CrossRef] [PubMed]
  4. Parkinson, A.; Colella, M.; Evans, T. The development and evaluation of radiological decontamination procedures for documents, document inks, and latent fingermarks on porous surfaces. J. Forensic Sci. 2010, 55, 728–734. [Google Scholar] [CrossRef] [PubMed]
  5. Ragai, J. Scientist And The Forger, The: Insights Into The Scientific Detection Of Forgery In Paintings; World Scientific: Singapore, 2015. [Google Scholar]
  6. Warif, N.B.A.; Wahab, A.W.A.; Idris, M.Y.I.; Ramli, R.; Salleh, R.; Shamshirband, S.; Choo, K.-K.R. Copy-move forgery detection: Survey, challenges, and future directions. J. Netw. Comput. Appl. 2016, 75, 259–278. [Google Scholar]
  7. Valderrama, L.; Março, P.H.; Valderrama, P. Model precision in partial least squares with discriminant analysis: A case study in document forgery through crossing lines. J. Chemom. 2020, 34, e3265. [Google Scholar] [CrossRef]
  8. Niu, P.; Wang, C.; Chen, W.; Yang, H.; Wang, X. Fast and effective Keypoint-based image copy-move forgery detection using complex-valued moment invariants. J. Vis. Commun. Image Represent. 2021, 77, 103068. [Google Scholar] [CrossRef]
  9. Muthukrishnan, R.; Radha, M. Edge detection techniques for image segmentation. Int. J. Comput. Sci. Inf. Technol. 2011, 3, 259. [Google Scholar] [CrossRef]
  10. Gorai, A.; Pal, R.; Gupta, P. Document fraud detection by ink analysis using texture features and histogram matching. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016. [Google Scholar]
  11. Markiewicz-Keszycka, M.; Cama-Moncunill, X.; Casado-Gavalda, M.P.; Dixit, Y.; Cama-Moncunill, R.; Cullen, P.J.; Sullivan, C. Laser-induced breakdown spectroscopy (LIBS) for food analysis: A review. Trends Food Sci. Technol. 2017, 65, 80–93. [Google Scholar]
  12. Elsherbiny, N.; Nassef, O.A. Wavelength dependence of laser-induced breakdown spectroscopy (LIBS) on questioned document investigation. Sci. Justice 2015, 55, 254–263. [Google Scholar] [CrossRef]
  13. Raman Spectroscopy of Two-Dimensional Materials; Tan, P.H. (Ed.) Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  14. Laserna, J.; Vadillo, J.M.; Purohit, P. Laser-Induced Breakdown Spectroscopy (LIBS): Fast, Effective, and Agile Leading Edge Analytical Technology. Appl. Spectrosc. 2018, 72 (Suppl. 1), 35–50. [Google Scholar] [CrossRef]
  15. Noll, R.; Fricke-Begemann, C.; Connemann, S.; Meinhardt, C.; Sturm, V. LIBS analyses for industrial applications–an overview of developments from 2014 to 2018. J. Anal. At. Spectrom. 2018, 33, 945–956. [Google Scholar]
  16. Hui, Y.W.; Mahat, N.A.; Ismail, D.; Ibrahim RK, R. Laser-induced breakdown spectroscopy (LIBS) for printing ink analysis coupled with principle component analysis (PCA). AIP Conf. Proc. 2019, 2155, 020010. [Google Scholar]
  17. Ameh, P.O.; Ozovehe, M.S. Forensic examination of inks extracted from printed documents using Fourier transform infrared spectroscopy. Edelweiss Appl. Sci. Technol. 2018, 2, 10–17. [Google Scholar] [CrossRef]
  18. Verma, N.; Kumar, R.; Sharma, V. Analysis of laser printer and photocopier toners by spectral properties and chemometrics. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2018, 196, 40–48. [Google Scholar] [CrossRef]
  19. Zięba-Palus, J.; Wesełucha-Birczyńska, A.; Trzcińska, B.; Kowalski, R.; Moskal, P. Analysis of degraded papers by infrared and Raman spectroscopy for forensic purposes. J. Mol. Struct. 2017, 1140, 154–162. [Google Scholar]
  20. Steiger, J.H. Tests for comparing elements of a correlation matrix. Psychol. Bull. 1980, 87, 245. [Google Scholar] [CrossRef]
  21. Buzzini, P.; Polston, C.; Schackmuth, M. On the criteria for the discrimination of inkjet printer inks using micro-Raman spectroscopy. J. Raman Spectrosc. 2018, 49, 1791–1801. [Google Scholar] [CrossRef]
  22. Shergin, V.; Udovenko, S.; Chala, L. Assortativity Properties of Barabási-Albert Networks. In Data-Centric Business and Applications: ICT Systems-Theory, Radio-Electronics, Information Technologies and Cybersecurity; Springer: Berlin/Heidelberg, Germany, 2021; Volume 5, pp. 55–66. [Google Scholar]
  23. Wiedmer, R.; Griffis, S.E. Structural characteristics of complex supply chain networks. J. Bus. Logist. 2021, 42, 264–290. [Google Scholar] [CrossRef]
  24. Mahmood, B.; Younis, Z.; Hadeed, W. Analyzing Iraqi Social Settings After ISIS: Individual Interactions in Social Networks. Am. Behav. Sci. 2018, 62, 300–319. [Google Scholar]
  25. Bonifazi, G.; Cauteruccio, F.; Corradini, E.; Marchetti, M.; Ursino, D.; Virgili, L. Applying Social Network Analysis to Model and Handle a Cross-Blockchain Ecosystem. Electronics 2023, 12, 1086. [Google Scholar] [CrossRef]
  26. Hu, Z.; Shao, F.; Sun, R. A New Perspective on Traffic Flow Prediction: A Graph Spatial-Temporal Network with Complex Network Information. Electronics 2022, 11, 2432. [Google Scholar] [CrossRef]
  27. Barrat, A.; Barthelemy, M.; Vespignani, A. Dynamical Processes on Complex Networks; Cambridge University Press: Cambridge, UK, 2008. [Google Scholar]
  28. Wei, B.; Deng, Y. A cluster-growing dimension of complex networks: From the view of node closeness centrality. Phys. A Stat. Mech. Its Appl. 2019, 522, 80–87. [Google Scholar] [CrossRef]
  29. Mahmood, B. Prioritizing CWE/SANS and OWASP Vulnerabilities: A Network-Based Model. Int. J. Comput. Digit. Syst. 2021, 10, 361–372. [Google Scholar] [CrossRef] [PubMed]
  30. Mahmood, B. Indicators on the Feasibility of Curfew on Pandemics Outbreaks in Metropolitan/Micropolitan Cities. In Proceedings of the 2021 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT), Purwokerto, Indonesia, 17–18 July 2021; pp. 179–183. [Google Scholar]
  31. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008. [Google Scholar] [CrossRef] [Green Version]
  32. Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 5233. [Google Scholar] [CrossRef] [Green Version]
  33. Amjed, A.; Mahmood, B.; AlMukhtar, K.A.K. Network Science as a Forgery Detection Tool in Digital Forensics. In Proceedings of the 2021 IEEE International Conference on Communication, Networks, and Satellite (COMNETSAT), Purwokerto, Indonesia, 17–18 July 2021; pp. 200–205. [Google Scholar] [CrossRef]
  34. Traag, V.A.; Van Dooren, P.; Nesterov, Y. Narrow scope for resolution-limit-free community detection. Phys. Rev. E 2011, 84, 016114. [Google Scholar] [CrossRef] [Green Version]
  35. Al-Ameri, M.A.A.; Ciylan, B.; Mahmood, B. Spectral Data Analysis for Forgery Detection in Official Documents: A Network-Based Approach. Electronics 2022, 11, 4036. [Google Scholar] [CrossRef]
Figure 1. Test 1 visualization.
Figure 2. Test 2 visualization of Pair_1 document (A) and Pair_2 documents (B).
Figure 3. Test 3 visualization of a single document with questionable parts.
Figure 4. Test 4, visualization of different ink brands.
Figure 5. Test 1, distinguishing documents of two different printers; Laser and Inkjet using the algorithm proposed in [35].
Figure 6. Test 2, distinguishing between two different paper types; Copy Laser and Paper One using the algorithm proposed in [35].
Figure 7. Test 2, another test performed to distinguish between two different paper types; Copy Laser and Ballet using the algorithm proposed in [35].
Figure 8. Test 3, which shows if a document is partially forged in one of its parts using the algorithm proposed in [35]. The four blue nodes reflect the original document, while the red node represents the forged part.
Figure 9. Test 5, testing many printers using the Louvain algorithm.
Figure 10. Test 5, testing many printers using the Updated-DBSCAN algorithm.
Table 1. Printers Descriptions.

#  | Printer Type   | Brand | Model               | Cartridge
1  | Laser Printer  | Canon | i-SENSYS MF231      | AR CRG 737
2  | Laser Printer  | Canon | i-SENSYS MF4010     | AR FX 10
3  | Laser Printer  | Canon | i-SENSYS LBP6000    | AR CRG725
4  | Laser Printer  | Canon | Image CLASS MF264DW | CRG 51
5  | Laser Printer  | Canon | i-SENSYS LBP2900    | AR FX 10
6  | Inkjet Printer | Epson | EcoTank ITS L3070   | #
7  | Inkjet Printer | Canon | Pixma TS6020        | #
8  | Inkjet Printer | HP    | PageWidePro 577dw   | #
9  | Laser Printer  | Ricoh | Aficio MP 4001      | Toner Black MP c4500
10 | Laser Printer  | Ricoh | Aficio MP C2051     | Toner Black MP c2051c
Table 2. Settings of the Louvain clustering algorithm.

Resolution | Iterations | Restarts | Weight-Based | Randomization
0.95       | 10         | 1        | ON           | ON
Table 3. Accuracy based on the number of clusters.

Test # | # of Nodes | Successful Cases Using the Updated Louvain Algorithm [33] | Successful Cases Using the Updated-DBSCAN Algorithm [35]
Test 1 | 10 | 100%   | 100%
Test 2 | 20 | 100%   | 100%
Test 3 | 5  | 100%   | 100%
Test 4 | 30 | 33.33% | 100%
Test 5 | 55 | 87%    | 100%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
