Advancing Diagnostic Tools in Forensic Science: The Role of Artificial Intelligence in Gunshot Wound Investigation—A Systematic Review

Sessa, Francesco; Chisari, Mario; Esposito, Massimiliano; Guardo, Elisa; Mauro, Lucio Di; Salerno, Monica; Pomara, Cristoforo

doi:10.3390/forensicsci5030030

by Francesco Sessa ¹,*, Mario Chisari ², Massimiliano Esposito ³, Elisa Guardo ¹, Lucio Di Mauro ¹, Monica Salerno ¹ and Cristoforo Pomara ¹

¹ Department of Medical, Surgical and Advanced Technologies “G.F. Ingrassia”, University of Catania, 95121 Catania, Italy

² “Rodolico-San Marco” Hospital, Santa Sofia Street, 87, 95121 Catania, Italy

³ Faculty of Medicine and Surgery, “Kore” University of Enna, 94100 Enna, Italy

* Author to whom correspondence should be addressed.

Forensic Sci. 2025, 5(3), 30; https://doi.org/10.3390/forensicsci5030030

Submission received: 23 April 2025 / Revised: 28 May 2025 / Accepted: 16 July 2025 / Published: 20 July 2025

Abstract

Background/Objectives: Artificial intelligence (AI) is beginning to be applied in wound ballistics, showing preliminary potential to improve the accuracy and objectivity of forensic analyses. This review explores the current state of AI applications in forensic firearm wound analysis, emphasizing its potential to address challenges such as subjective interpretations and data heterogeneity. Methods: A systematic review adhering to PRISMA guidelines was conducted using the Scopus and Web of Science databases. Keywords focused on AI and gunshot wound (GSW) classification identified 502 studies, which were narrowed down to 4 relevant articles after rigorous screening based on inclusion and exclusion criteria. Results: These studies examined the role of deep learning (DL) models in classifying GSWs by type, shooting distance, and entry or exit characteristics. The key findings demonstrated that DL models like TinyResNet, ResNet152, and ConvNext Tiny achieved accuracy ranging from 87.99% to 98%. Models were effective in tasks such as classifying GSWs and estimating shooting distances. However, most studies were exploratory in nature, with small sample sizes and, in some cases, reliance on animal models, which limits generalizability to real-world forensic scenarios. Conclusions: Comparisons with other forensic AI applications revealed that large, diverse datasets significantly enhance model performance. Transparent and interpretable AI systems, supported by explainability techniques such as SHAP and LIME, are essential for judicial acceptance and ethical compliance. Despite the encouraging results, the field remains in an early stage of development. Limitations highlight the need for standardized protocols, cross-institutional collaboration, and the integration of multimodal data for robust forensic AI systems. Future research should focus on overcoming current data and validation constraints, ensuring the ethical use of human forensic data, and developing AI tools that are scientifically sound and legally defensible.

Keywords: forensic science; artificial intelligence; firearm wounds; gunshot wounds; machine learning

1. Introduction

Forensic science plays a crucial role in the administration of justice by providing reliable and efficient methods to support crime detection, investigation, and prevention [1]. Its applications span numerous domains, including toxicology, chemistry, DNA analysis, digital forensics, anthropology, and more: each branch contributes to solving complex criminal cases [2].

In recent decades, technological advancements, particularly in artificial intelligence (AI), have opened new avenues for addressing challenges in forensic pathology [3]. AI, encompassing techniques such as machine learning (ML) and deep learning (DL), has shown remarkable potential in automating and enhancing complex analytical processes across various fields [4,5]. In medicine, for instance, AI has been employed for disease diagnosis, medical imaging analysis, and predictive analytics, demonstrating its adaptability and transformative power [6,7,8].

Forensic practitioners are increasingly encountering AI technologies (particularly convolutional neural networks, CNNs) across various domains of their work, from digital forensics to forensic pathology [3]. Many professionals recognize the potential of AI to enhance efficiency, accuracy, and objectivity in several forensic tasks such as pattern recognition, evidence analysis, and case triage. For instance, AI is being used to automate the analysis of digital evidence, to reduce genetic misinterpretation events, and to assist in estimating post-mortem intervals using microbial data [9,10].

Gunshot wounds (GSWs) pose significant challenges in forensic pathology due to their morphological variability, which often leads to inconsistent and subjective interpretations [11]. A key forensic task involves distinguishing between single and multiple projectile ammunition: while single projectiles such as bullets tend to produce localized injuries, multiple projectile ammunition like buckshot or pellets results in dispersive wound patterns that are more difficult to analyze systematically. Another critical aspect is the differentiation between entry and exit wounds. Entry wounds are typically smaller and characterized by abrasion collars or the presence of gunpowder residue, whereas exit wounds are usually larger and irregular and lack soot or stippling [12,13]. However, these distinguishing features can vary depending on factors such as anatomical location, projectile type, firing distance, and the angle of impact, introducing further complexity to the analysis. For instance, close-range shots may show soot deposits and thermal injury, while distant shots often lack these features. Additionally, ricocheting bullets or oblique trajectories may produce atypical wound patterns that defy conventional classification [14]. Forensic interpretation requires the accurate assessment of wound characteristics—including size, shape, location, and trajectory—as well as the estimation of firing distance and the determination of the manner of death [11,15]. Yet, due to the heterogeneous presentation of GSWs, current methods remain highly dependent on practitioner expertise and subjective judgment, which can lead to variability in conclusions across cases [16]. This variability underscores a critical problem in forensic practice: the lack of standardized, objective tools for the classification of GSWs.

This review aims to provide a comprehensive overview of the current state of AI applications in the forensic classification of GSWs. Specifically, it addresses the following objectives:

- To examine how AI methodologies have been applied to the forensic analysis and classification of GSWs.
- To evaluate the potential of AI tools in improving the objectivity and consistency of data interpretation at crime scenes.
- To identify current limitations, challenges, and gaps in the integration of AI within this forensic domain.
- To propose future research directions that could enhance the utility and acceptance of AI technologies in forensic science.
- To highlight the potential of AI to foster more accurate, efficient, and standardized forensic practices.

2. Methods

2.1. Study Design

This systematic review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to ensure a comprehensive and transparent assessment of the literature [17]. As no dedicated registration platform exists for forensic systematic reviews at this time, the protocol was not formally registered.

2.2. Data Sources and Search Strategy

The Scopus and Web of Science (WOS) databases were utilized to identify relevant studies published between 1 January 2000 and 31 March 2025. The search terms were selected based on preliminary exploratory searches, expert consensus, and a review of the key terms used in previous relevant studies. The terms were chosen to ensure the inclusion of studies focusing on the application of AI and ML in forensic investigations of firearm-related injuries. However, there is a risk that some relevant studies were not identified due to variations in terminology or indexing limitations in the databases. To mitigate this, an additional manual screening of references from selected articles was conducted.

A systematic search was conducted using the following keyword combinations:

(firearm) AND (artificial intelligence)—71 articles matched these keywords;
(gunshot) AND (artificial intelligence)—46 articles matched these keywords;
(firearm wounds) AND (artificial intelligence)—2 articles matched these keywords;
(gunshot wounds) AND (artificial intelligence)—13 articles matched these keywords;
(firearm injuries) AND (artificial intelligence)—7 articles matched these keywords;
(gunshot injuries) AND (artificial intelligence)—21 articles matched these keywords;
(firearm) AND (machine learning)—132 articles matched these keywords;
(gunshot) AND (machine learning)—103 articles matched these keywords;
(firearm wounds) AND (machine learning)—16 articles matched these keywords;
(gunshot wounds) AND (machine learning)—30 articles matched these keywords;
(firearm injuries) AND (machine learning)—21 articles matched these keywords;
(gunshot injuries) AND (machine learning)—37 articles matched these keywords.

The numbers following each combination indicate the initial number of studies identified using the respective search terms.
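The twelve keyword combinations listed above are the pairings of six topic terms with two method terms. A minimal sketch of how such a query list could be generated is given below; the function name `build_queries` is hypothetical, and the exact query syntax or field codes used in Scopus and WOS are not stated in this review, so the strings simply mirror the form shown in the list.

```python
from itertools import product

# Terms copied from the keyword combinations listed in Section 2.2.
TOPIC_TERMS = [
    "firearm",
    "gunshot",
    "firearm wounds",
    "gunshot wounds",
    "firearm injuries",
    "gunshot injuries",
]
METHOD_TERMS = ["artificial intelligence", "machine learning"]


def build_queries(topics, methods):
    """Return one '(topic) AND (method)' string per pairing.

    Methods vary slowest, so all AI queries come before all ML queries,
    matching the order used in the list above.
    """
    return [f"({topic}) AND ({method})" for method, topic in product(methods, topics)]


queries = build_queries(TOPIC_TERMS, METHOD_TERMS)
for query in queries:
    print(query)
```

Generating the queries from two term lists makes it easy to confirm that every pairing was searched and to extend the strategy with further synonyms.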

2.3. Inclusion and Exclusion Criteria

Inclusion criteria for this review were as follows:

✓ Original articles;
✓ Articles written in English.

Exclusion criteria included the following:

✓ Conference papers (98);
✓ Reviews (12);
✓ Conference reviews (12);
✓ Editorials (6);
✓ Notes (4);
✓ Book chapters (3);
✓ Data papers (3);
✓ Letters (2);
✓ Short surveys (2);
✓ Erratum (1);
✓ Retracted articles (1).

Moreover, 4 articles were excluded because they were not published in English. Finally, articles not aligned with the specific objective of the AI-based classification of GSWs were excluded.

2.4. Quality Assessment and Data Extraction

The initial assessment of all articles was conducted by F.S., who evaluated the titles, abstracts, and full texts. M.S. independently reanalyzed the selected articles to ensure consistency. In cases of conflicting opinions, M.C. was consulted for final evaluation and decision-making.

Cohen’s Kappa statistic [18] was used to measure the level of agreement among reviewers, yielding a Kappa value of 0.90, which indicates strong agreement.
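For readers unfamiliar with the statistic, Cohen's Kappa compares the observed agreement between two raters with the agreement expected by chance. The sketch below shows the standard calculation; the include/exclude decisions in the example are hypothetical and are not the reviewers' actual screening data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical decisions for ten articles (1 = include, 0 = exclude).
reviewer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reviewer_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this illustration the raters agree on nine of ten items, yet Kappa is lower than 0.9 because part of that agreement would be expected by chance.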

Data extraction was standardized using a predefined table capturing key study characteristics, including study design, AI model type, data input, output variable (e.g., GSW classification), evaluation metrics, and the main findings.

2.5. Risk of Bias Assessment

To evaluate the methodological quality and potential sources of bias in the included studies, a risk of bias assessment was conducted using the Critical Appraisal Skills Programme (CASP) Checklist for Descriptive/Cross-Sectional Studies (2024). Each study was independently assessed across all 11 CASP domains (URL: https://casp-uk.net/news/, accessed on 28 May 2025). For each criterion, the studies were rated as per CASP guidance. A narrative synthesis of the appraisal outcomes was also provided for each study, summarizing strengths (e.g., validated AI tools, clear outcome definitions) and limitations (e.g., small sample sizes, lack of external validation, limited generalizability). This approach allowed for a more nuanced understanding of study quality beyond binary scoring.

All assessments were carried out independently by two reviewers (F.S. and M.S.), with discrepancies resolved through discussion and arbitration by a third reviewer (M.C.). This process ensured consistency and reduced subjective bias.

2.6. Characteristics of Eligible Studies

A total of 502 articles were identified in the initial search. Of these, 220 articles were removed as duplicates. The remaining 282 articles were analyzed using the Scopus database filtering function, and 148 were automatically excluded based on the predefined exclusion criteria (see Section 2.3).

The remaining 134 articles were manually screened by analyzing their titles, abstracts, and keywords. Of these, 96 articles were excluded because their themes did not align with the objectives of this review. Specifically, these excluded studies focused on topics such as firearm mortality and clinical records, AI systems for firearm identification based on sound, AI systems for bullet identification, and AI systems for gunshot residue identification.

The final 38 articles underwent full-text screening. However, 34 of these were excluded as they explored different goals, such as firearm injuries and suicide risks, AI systems in ballistics, and AI for gunshot detection recognition. Ultimately, only 4 articles met the inclusion criteria and were included in this systematic review, as they aligned with this review’s objectives (Figure 1). While this limited number reflects the nascent stage of this specific forensic application, it also underscores the urgency of further targeted research in this domain.
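The screening counts reported in this section can be checked with simple arithmetic. The sketch below reproduces the flow from the figures given in Sections 2.3 and 2.6; it assumes that the 148 automatically excluded records correspond to the document-type exclusions plus the 4 non-English articles, which is an interpretation of the text rather than something the review states explicitly. The function name `screening_flow` is hypothetical.

```python
# Document-type exclusions as listed in Section 2.3.
EXCLUDED_BY_TYPE = {
    "Conference papers": 98,
    "Reviews": 12,
    "Conference reviews": 12,
    "Editorials": 6,
    "Notes": 4,
    "Book chapters": 3,
    "Data papers": 3,
    "Letters": 2,
    "Short surveys": 2,
    "Erratum": 1,
    "Retracted articles": 1,
}
NON_ENGLISH = 4


def screening_flow(identified, steps):
    """Apply successive exclusions and return (stage label, records remaining)."""
    remaining = identified
    flow = [("identified", remaining)]
    for label, removed in steps:
        remaining -= removed
        if remaining < 0:
            raise ValueError(f"More records removed than available at stage: {label}")
        flow.append((label, remaining))
    return flow


# Assumption: automatic filtering = type-based exclusions + non-English articles.
automatic = sum(EXCLUDED_BY_TYPE.values()) + NON_ENGLISH

flow = screening_flow(
    502,
    [
        ("after duplicate removal", 220),
        ("after automatic filtering", automatic),
        ("after title/abstract screening", 96),
        ("after full-text screening", 34),
    ],
)
for stage, count in flow:
    print(f"{stage}: {count}")
```

Under this assumption the stage totals match those reported in the text: 282, 134, 38, and finally 4 included studies.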

3. Results

The selection process for this systematic review followed a rigorous methodology, beginning with the identification of 502 articles in the initial search. Nevertheless, only four articles met the inclusion criteria, aligning with this review’s objectives of exploring the application of AI for classifying GSWs or related forensic contexts. These studies are summarized in Table 1.

Oura et al. [19] presented notable findings on the application of DL for forensic shotgun pattern interpretation. Utilizing a dataset of 106 images—54 from a 10 m distance and 52 from 17.5 m—the authors trained and evaluated several neural network architectures using the AIDeveloper platform. A TinyResNet-based model achieved the highest performance, with a testing accuracy of 94%, correctly classifying all 10 m patterns and misclassifying only one 17.5 m pattern. The experimental setup was deliberately simplified, employing a single shotgun model (Benelli Montefeltro Synthetic 12/76), a consistent ammunition type (Winchester Super Speed 12/70 with 3.1 mm lead pellets), and a fixed cylinder choke to minimize confounding variables. However, the study relied on a small dataset and used only one weapon and ammunition type, which limits the model’s generalizability. Moreover, the use of piglet carcasses rather than human tissue introduces biological differences that may affect the model’s applicability to real forensic cases. The authors advocate for future research involving larger, more heterogeneous datasets to develop robust, generalizable models for forensic shotgun pattern analysis.

The same research group [20] advanced their work by applying DL algorithms to the interpretation of GSWs, achieving an impressive accuracy of 98% in classifying shooting distance categories. Their dataset consisted of 204 high-resolution images derived from 19 piglet carcasses, including 60 negative controls, 50 contact shots (0 cm), 49 close-range shots (20 cm), and 45 distant shots (100 cm), all inflicted using a .22 Long Rifle Ruger Standard Model pistol. The images were captured under standardized lighting conditions and processed into 32×32 grayscale inputs for model training. Among the tested neural network architectures, a multilayer perceptron (MLP241624) achieved the highest performance, correctly classifying all test images except one distant shot, which was misclassified as a negative control. Performance metrics included an F1-score (the harmonic mean of precision and recall, balancing both concerns) of 1.00 for contact and close-range shots and 0.94 for distant shots, with an area under the ROC curve (AUC) of 0.99 or higher for all classes. This pioneering study established a foundation for using DL in GSW analysis, demonstrating the potential of neural networks to generalize the visual features of wounds. Nonetheless, the study’s reliance on a small, homogeneous dataset and the exclusive use of piglet carcasses, whose tissue properties differ from human anatomy, significantly limit external validity and raise concerns about overfitting. The absence of external validation or comparison with expert forensic pathologists further underscores the need for caution in interpreting these results. The authors emphasized the need for future studies involving larger, more diverse datasets, including human cadavers and varied weapon types, to develop robust, generalizable DL tools for forensic pathology.
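To illustrate what reducing a photograph to a 32×32 grayscale input involves, a minimal pure-Python sketch follows. The original study's preprocessing pipeline is not described in this review, so the luminance weights (the common ITU-R BT.601 coefficients) and the block-averaging resize are assumptions chosen for illustration, and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of luminance values (0-255).

    Assumption: ITU-R BT.601 weights; the study's actual conversion is unknown.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]


def downsample(gray, size=32):
    """Block-average a grayscale image down to size x size pixels.

    For simplicity the height and width must be exact multiples of `size`.
    """
    height, width = len(gray), len(gray[0])
    if height % size or width % size:
        raise ValueError("Image dimensions must be multiples of the target size")
    block_h, block_w = height // size, width // size
    result = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [
                gray[y][x]
                for y in range(i * block_h, (i + 1) * block_h)
                for x in range(j * block_w, (j + 1) * block_w)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def preprocess(rgb_image, size=32):
    """Return a flat list of size*size values scaled to [0, 1], e.g. as MLP input."""
    small = downsample(to_grayscale(rgb_image), size)
    return [value / 255.0 for row in small for value in row]


# Synthetic 64x64 white image standing in for a wound photograph.
white_image = [[(255, 255, 255)] * 64 for _ in range(64)]
vector = preprocess(white_image)
print(len(vector))
```

A 32×32 input yields only 1024 values per image, which keeps a multilayer perceptron small but also discards fine detail such as stippling texture.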

Queiroz Nogueira Lira et al. [21] made significant advancements in classifying GSWs using DL techniques. Their study utilized a large and diverse dataset of 2551 wound images collected from real crime scenes in Brazil, comprising 1883 entry wounds and 668 exit wounds. Most cases involved male victims (95.09%) with a mean age of 27 years, and over half of the wounds were located on the trunk (51.35%). The authors evaluated 59 state-of-the-art CNN architectures for two classification tasks: wound type (entry vs. exit) and Medico-Legal Shooting Distance (MLSD), which categorizes wounds as contact, close-range, or distant. Among the tested models, ResNet152 consistently outperformed the others for wound type classification, achieving the highest accuracy (86.90%; the proportion of correct predictions out of all predictions), precision (82.7%; the proportion of correctly predicted positive cases among all predicted positives), recall (86.8%; the proportion of actual positives correctly identified), F1-score (82.1%), and AUC (82.09%). For MLSD classification, ResNet152 again led with an accuracy of 92.48% and an AUC of 94.36%, although ResNet50 and SqueezeNet10 showed slightly better performance on specific metrics such as macro precision and weighted specificity, respectively. The study employed extensive data augmentation and resampling techniques (e.g., SMOTE + ENN) to address dataset imbalance, particularly the underrepresentation of contact and close-range wounds. Despite these efforts, the authors acknowledged that the dataset was derived from a single institution and that image quality varied due to environmental conditions, which may affect model generalizability. They emphasized the need for future research involving multi-institutional datasets and external validation to enhance model robustness. Nonetheless, the study demonstrated the practical feasibility of DL in forensic pathology and advocated for AI as a complementary tool to support, rather than replace, expert human judgment in forensic investigations.
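The metric definitions given above can be made concrete with a short sketch that derives accuracy, precision, recall, and F1-score from the four cells of a binary confusion matrix. The counts in the example are hypothetical and are not taken from any of the reviewed studies; the function name is likewise illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp/fp = true/false positives, fn/tn = false/true negatives.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("Confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: "entry wound" treated as the positive class.
example = classification_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced datasets, such as one with far more entry than exit wounds, accuracy alone can be misleading, which is why precision, recall, and F1-score are usually reported alongside it.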

Cheng et al. [22] explored the use of DL, particularly CNNs, for classifying GSWs from digital images. Their study utilized a dataset of 2418 color images sourced from a forensic archive, comprising 2028 entrance and 1314 exit wounds, with an additional holdout test set of 415 entrance and 293 exit wound images. The images were preprocessed through cropping, resizing, and augmentation to enhance model robustness. Among several architectures tested—including ResNet50, ConvNext Large, EfficientNetV2S, and MobileNetV3Large—the ConvNext Tiny model achieved the best performance, with a final test accuracy of 87.99%, precision of 83.99%, recall of 87.71%, F1-score of 85.81%, and an AUC of 0.946. These results were comparable to the performance of experienced forensic pathologists when limited to image-based assessment without contextual cues. However, the model struggled with a subset of 100 challenging images, where pathologists significantly outperformed AI (56% vs. 15% accuracy, p < 0.0001), particularly in identifying entrance wounds. This highlighted the model’s limitations in handling atypical presentations and the need for contextual or tactile information in complex cases. The authors emphasized the importance of dataset diversity and quality, noting that image inconsistencies (e.g., lighting, exposure, anatomical variation) and class imbalance may have affected performance. They also stressed the need for improved explainability in DL models to support clinical and forensic adoption. Despite these challenges, the study demonstrated the feasibility and promise of DL for GSW classification while underscoring the necessity of further research to enhance model reliability, interpretability, and generalizability.
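Because AUC is reported by several of the included studies, a brief sketch of one way to compute it may be useful. The function below uses the rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The scores are hypothetical, and the reviewed studies may have computed AUC with different tooling.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of model scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    example is scored higher; ties count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("Both classes need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores for entrance (positive) and exit (negative) wounds.
entrance_scores = [0.9, 0.8, 0.4]
exit_scores = [0.7, 0.3, 0.2]
auc = roc_auc(entrance_scores, exit_scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, independent of any single decision threshold.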

To assess the risk of bias, the CASP Checklist for Descriptive/Cross-Sectional Studies was applied. Table 2 summarizes the risk of bias for each included study.

4. Discussion

The results of this systematic review underscore the growing potential of AI, particularly DL, in forensic science. Wound ballistics, a subfield of terminal ballistics, plays a critical role in forensic investigations by analyzing factors such as impact angle, range of fire, and projectile trajectory. These analyses help determine the cause, nature, and manner of shooting incidents and support the reconstruction of events [14]. However, forensic pathologists often face challenges due to missing contextual information, such as firearm type or shooting distance, which can lead to subjective interpretations. In this context, AI-based systems offer promising solutions [23].

Several studies have demonstrated the utility of DL in forensic image classification. For instance, CNN-based models have been applied to tasks such as blood spatter analysis and active shooter detection, achieving high accuracy [24,25,26]. Oura et al. [19,20] reported high classification accuracies of 94% and 98% in shotgun pattern and GSW distance classification, respectively. However, these results were based on small datasets derived from piglet carcasses, which do not fully replicate the complexity of human tissue. This limitation is critical, as animal models may not capture the anatomical variability, wound morphology, or tissue response observed in human forensic cases. Moreover, small sample sizes increase the risk of overfitting, reducing the generalizability of AI models.

In contrast, Queiroz Nogueira Lira et al. [21] and Cheng et al. [22] used larger datasets of human GSWs, collected from real crime scenes or forensic archives. Their models, particularly ResNet152 and ConvNext Tiny, achieved strong performance in wound classification tasks. However, even these studies face challenges related to dataset imbalance, image quality variability, and limited institutional diversity, which may affect model robustness across broader forensic contexts.

While DL models have shown the potential to match or exceed human performance in specific tasks, forensic science imposes unique demands. Unlike other domains, forensic evidence must be explainable, reproducible, and legally defensible [27]. This raises critical ethical concerns, particularly regarding the use of human forensic data. The issues of informed consent, data privacy, and the potential for algorithmic bias must be addressed. Transparent data governance frameworks and ethical oversight are essential to ensure responsible AI development in forensic contexts [28,29].

AI’s success in healthcare is often cited as a parallel, given its integration into diagnostic and decision support systems. However, this comparison should be viewed cautiously. While both fields benefit from AI’s ability to process complex data, forensic science involves unique evidentiary standards and legal scrutiny. Therefore, the focus should remain on forensic-specific challenges, such as the admissibility of AI-generated findings in court and the need for interpretability [30,31,32,33,34].

To advance the field, future research should prioritize the development of large, diverse, and ethically sourced datasets. Global collaborations among forensic institutions can facilitate this goal. Incorporating high-resolution 3D imaging, multispectral data, and standardized calibration techniques [35] can further enhance dataset quality. Multi-institutional validation studies, as emphasized in digital forensics [36], are also essential to ensure model reliability across diverse populations and forensic environments.

Moreover, the integration of explainable AI techniques, such as SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-Agnostic Explanations), is critical for forensic applications [37,38,39]. These tools can help forensic experts understand and defend AI-generated findings under legal scrutiny. Combining AI with multimodal forensic data, such as ballistic trajectories and gunshot residue analysis, may also improve consistency and evidentiary strength [40,41].
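To give a sense of what a Shapley-based explanation computes, the sketch below calculates exact Shapley values for a toy scoring function by averaging each feature's marginal contribution over all feature orderings. This is the underlying concept that SHAP approximates efficiently, not the SHAP library itself; the model, the feature names, and the all-zero baseline are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction (feasible only for few features).

    Features absent from a coalition are replaced by their baseline value.
    """
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]  # add feature i to the coalition
            updated = predict(current)
            contributions[i] += updated - previous
            previous = updated
    return [c / len(orderings) for c in contributions]


# Hypothetical features: [soot_present, abrasion_collar, irregular_margin].
FEATURES = ["soot_present", "abrasion_collar", "irregular_margin"]


def toy_model(f):
    """Toy score with one additive term and one interaction term."""
    return 0.5 * f[0] + 0.3 * f[1] * f[2]


sample = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, reference)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that makes such explanations auditable. Exact enumeration grows factorially with the number of features, so image models rely on approximations.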

Finally, cross-disciplinary collaboration among forensic scientists, AI researchers, ethicists, and legal professionals is vital. Establishing clear legal frameworks for the admissibility of AI-generated evidence and addressing ethical concerns—such as data ownership, consent, and bias—will be crucial for the responsible adoption of AI in forensic science [42,43].

By addressing these challenges, AI can provide standardized methodologies for interpreting GSWs, reduce inter-observer variability, and enhance the reliability of forensic analyses. Rather than replacing forensic pathologists, AI should serve as a decision support tool, helping experts generate consistent, evidence-based interpretations of gunshot injuries and reconstruct shooting dynamics with greater precision.

This systematic review highlights the growing role of AI, particularly DL, in forensic science applications related to GSW classification. A key strength of this review is its synthesis of emerging methodologies that demonstrate AI’s ability to enhance forensic investigations by reducing subjectivity and variability. Additionally, by comparing AI applications across multiple studies, we identified key trends and challenges in this field, offering insights for future research directions.

However, this review has several limitations. First, the number of studies included is relatively small, reflecting the nascent stage of AI integration in forensic wound analysis. This limited sample size restricts the generalizability of our findings and prevents definitive conclusions from being made regarding AI’s effectiveness compared to traditional forensic methods. Second, the reviewed studies exhibit methodological heterogeneity, including variations in dataset size. Third, many critical forensic variables—such as clothing effects, firearm type variations, and decomposition-related alterations—were not systematically analyzed in the included studies. These gaps underscore the need for further research before AI models can be reliably implemented in forensic casework.

5. Conclusions

The findings of this systematic review emphasize the transformative potential of AI, particularly DL, in forensic science. AI models demonstrated promising accuracy in tasks such as gunshot wound classification and shooting distance estimation, though these results should be interpreted with caution due to the small sample sizes and preliminary nature of many included studies. While some models achieved performance comparable to human experts in specific contexts, these findings are not yet sufficient to warrant widespread deployment in forensic practice without further validation.

The adoption of AI in forensic applications comes with significant challenges, including dataset limitations, the need for external validation, and the imperative for model transparency. While larger and more diverse datasets would improve AI model training and generalizability, forensic science faces significant barriers in compiling such datasets. Legal restrictions, ethical concerns, and the lack of standardized data collection protocols hinder the creation of large-scale forensic AI training sets. Additionally, without international standardization, training AI models across multiple datasets risks introducing inconsistencies and biases.

To address these limitations, researchers should prioritize the development of standardized imaging and annotation protocols and seek ethical approval frameworks that facilitate data sharing while protecting privacy. Practitioners are encouraged to engage in collaborative, multi-institutional studies that include diverse populations and real-world forensic conditions. These efforts will help ensure data consistency and scientific validity. Collaborative initiatives among forensic institutions, academia, and industry will be essential in developing ethically sourced and methodologically robust forensic datasets, ultimately improving AI’s reliability in forensic applications.

The use of black-box models in forensic settings is particularly problematic, given the need for evidence to withstand scrutiny in legal proceedings. Transparent, interpretable AI models are essential so that forensic analyses remain defensible in court, safeguarding the integrity of the judicial process. To achieve this, future research should prioritize explainable AI methodologies alongside standardized protocols and comprehensive datasets.

Moreover, ethical considerations—such as data privacy, potential biases, and the fair and accountable application of AI technologies—must be central to these efforts. Researchers should implement privacy-preserving techniques (e.g., data anonymization, federated learning) and conduct bias audits to ensure equitable model performance across demographic groups. Close collaboration between forensic experts, AI researchers, and legal professionals will be crucial in developing trustworthy AI tools that align with forensic best practices and legal standards.

In summary, while AI holds significant promise for enhancing forensic science, its current application remains in an early, exploratory phase. By addressing the outlined challenges and building on the advancements highlighted in this review, AI can evolve into a valuable decision support tool—fostering more objective and reliable outcomes that uphold the principles of justice.

Author Contributions

Conceptualization, F.S. and C.P.; methodology, F.S., E.G., M.E., M.C., L.D.M., M.S., and C.P.; validation, F.S., E.G., M.S., and C.P.; formal analysis, F.S. and E.G.; investigation, F.S. and E.G.; writing—original draft preparation, F.S., E.G., M.E., M.C., L.D.M., M.S., and C.P.; writing—review and editing, F.S., E.G., M.E., M.C., L.D.M., M.S., and C.P.; visualization, F.S. and E.G.; supervision, M.S. and C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the research of PIAno di inCEntivi per la Ricerca di Ateneo 2024/2026—Linea di intervento I “Progetti di ricerca collaborativa”—SIAM project—University of Catania, Italy.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were obtained.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ML	Machine Learning
DL	Deep Learning
CNNs	Convolutional Neural Networks
GSW	Gunshot Wound
PRISMA	Preferred Reporting Items for Systematic Reviews and Meta-Analyses
WOS	Web of Science
CASP	Critical Appraisal Skills Programme
MLSD	Medico-Legal Shooting Distance
SHAP	Shapley Additive Explanations
LIME	Local Interpretable Model-Agnostic Explanations

References

1. Delémont, O.; Lock, E.; Ribaux, O. Forensic Science and Criminal Investigation. In Encyclopedia of Criminology and Criminal Justice; Springer: New York, NY, USA, 2014.
2. Roux, C.; Ribaux, O.; Crispino, F. Forensic Science 2020–the End of the Crossroads? Aust. J. Forensic Sci. 2018, 50, 607–618.
3. Galante, N.; Cotroneo, R.; Furci, D.; Lodetti, G.; Casali, M.B. Applications of Artificial Intelligence in Forensic Sciences: Current Potential Benefits, Limitations and Perspectives. Int. J. Leg. Med. 2022, 137, 445–458.
4. Elahi, M.; Afolaranmi, S.O.; Martinez Lastra, J.L.; Perez Garcia, J.A. A Comprehensive Literature Review of the Applications of AI Techniques through the Lifecycle of Industrial Equipment. Discov. Artif. Intell. 2023, 3, 43.
5. Datta, S.D.; Islam, M.; Rahman Sobuz, M.H.; Ahmed, S.; Kar, M. Artificial Intelligence and Machine Learning Applications in the Project Lifecycle of the Construction Industry: A Comprehensive Review. Heliyon 2024, 10, e26888.
6. Krishnan, G.; Singh, S.; Pathania, M.; Gosavi, S.; Abhishek, S.; Parchani, A.; Dhar, M. Artificial Intelligence in Clinical Medicine: Catalyzing a Sustainable Global Healthcare Paradigm. Front. Artif. Intell. 2023, 6, 1227091.
7. Olawade, D.B.; David-Olawade, A.C.; Wada, O.Z.; Asaolu, A.J.; Adereni, T.; Ling, J. Artificial Intelligence in Healthcare Delivery: Prospects and Pitfalls. J. Med. Surg. Public Health 2024, 3, 100108.
8. Sablone, S.; Bellino, M.; Cardinale, A.N.; Esposito, M.; Sessa, F.; Salerno, M. Artificial Intelligence in Healthcare: An Italian Perspective on Ethical and Medico-Legal Implications. Front. Med. 2024, 11, 1343456.
9. Sessa, F.; Esposito, M.; Cocimano, G.; Sablone, S.; Karaboue, M.A.A.; Chisari, M.; Albano, D.G.; Salerno, M. Artificial Intelligence and Forensic Genetics: Current Applications and Future Perspectives. Appl. Sci. 2024, 14, 2113.
10. Wang, Z.; Zhang, F.; Wang, L.; Yuan, H.; Guan, D.; Zhao, R. Advances in Artificial Intelligence-Based Microbiome for PMI Estimation. Front. Microbiol. 2022, 13, 1034051.
11. Baum, G.R.; Baum, J.T.; Hayward, D.; Mackay, B.J. Gunshot Wounds: Ballistics, Pathology, and Treatment Recommendations, with a Focus on Retained Bullets. Orthop. Res. Rev. 2022, 14, 293.
12. Pomara, C.; Fineschi, V. Forensic and Clinical Forensic Autopsy: An Atlas and Handbook, 2nd ed.; Pomara, C., Fineschi, V., Eds.; CRC Press: Boca Raton, FL, USA, 2020; ISBN 9780367330712.
13. Molina, D.K.; DiMaio, V.; Cave, R. Gunshot Wounds: A Review of Firearm Type, Range, and Location as Pertaining to Manner of Death. Am. J. Forensic Med. Pathol. 2013, 34, 366–371.
14. Kaur, G.; Mukherjee, D.; Moza, B. A Comprehensive Review of Wound Ballistics: Mechanisms, Effects, and Advancements. Int. J. Med. Toxicol. Leg. Med. 2023, 26, 189–196.
15. Gugala, Z.; Lindsey, R.W. Classification of Gunshot Injuries in Civilians. Clin. Orthop. Relat. Res. 2003, 408, 65–91.
16. Molina, D.K.; Rulon, J.J.; Wallace, E.I. The Atypical Entrance Wound: Differential Diagnosis and Discussion of an Unusual Cause. Am. J. Forensic Med. Pathol. 2012, 33, 250–252.
17. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71.
18. Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46.
19. Oura, P.; Junno, A.; Junno, J.A. Deep Learning in Forensic Shotgun Pattern Interpretation—A Proof-of-Concept Study. Leg. Med. 2021, 53, 101960.
20. Oura, P.; Junno, A.; Junno, J.A. Deep Learning in Forensic Gunshot Wound Interpretation—A Proof-of-Concept Study. Int. J. Leg. Med. 2021, 135, 2101–2106.
21. Queiroz Nogueira Lira, R.; Geovana Motta de Sousa, L.; Memoria Pinho, M.L.; Pinto da Silva Andrade de Lima, R.C.; Garcia Freitas, P.; Scholles Soares Dias, B.; Breda de Souza, A.C.; Ferreira Leite, A. Deep Learning-Based Human Gunshot Wounds Classification. Int. J. Leg. Med. 2024, 139, 651–666.
22. Cheng, J.; Schmidt, C.; Wilson, A.; Wang, Z.; Hao, W.; Pantanowitz, J.; Morris, C.; Tashjian, R.; Pantanowitz, L. Artificial Intelligence for Human Gunshot Wound Classification. J. Pathol. Inform. 2023, 15, 100361.
23. Tynan, P. The Integration and Implications of Artificial Intelligence in Forensic Science. Forensic Sci. Med. Pathol. 2024, 20, 1103–1105.
24. Kaya, V.; Tuncer, S.; Baran, A. Detection and Classification of Different Weapon Types Using Deep Learning. Appl. Sci. 2021, 11, 7535.
25. Gelana, F.; Yadav, A. Firearm Detection from Surveillance Cameras Using Image Processing and Machine Learning Techniques. Adv. Intell. Syst. Comput. 2019, 851, 25–34.
26. Bergman, T.; Klöden, M.; Dreßler, J.; Labudde, D. Automatic Classification of Bloodstains with Deep Learning Methods. KI-Kunstl. Intell. 2022, 36, 135–141.
27. Tripathi, S.; Gabriel, K.; Dheer, S.; Parajuli, A.; Augustin, A.I.; Elahi, A.; Awan, O.; Dako, F. Understanding Biases and Disparities in Radiology AI Datasets: A Review. J. Am. Coll. Radiol. 2023, 20, 836–841.
28. Dunsin, D.; Ghanem, M.C.; Ouazzane, K.; Vassilev, V. A Comprehensive Analysis of the Role of Artificial Intelligence and Machine Learning in Modern Digital Forensics and Incident Response. Forensic Sci. Int. Digit. Investig. 2024, 48.
29. Barash, M.; McNevin, D.; Fedorenko, V.; Giverts, P. Machine Learning Applications in Forensic DNA Profiling: A Critical Review. Forensic Sci. Int. Genet. 2024, 69, 102994.
30. Thompson, L.E. Facial Recognition Software for Lead Generation and Lineup Construction. In The Impact of Technology on the Criminal Justice System; Routledge: Oxford, UK, 2024; ISBN 9781003848233.
31. Kleinstreuer, N.; Hartung, T. Artificial Intelligence (AI)—It’s the End of the Tox as We Know It (and I Feel Fine)*. Arch. Toxicol. 2024, 98, 735–754.
32. Wankhade, T.D.; Ingale, S.W.; Mohite, P.M.; Bankar, N.J. Artificial Intelligence in Forensic Medicine and Toxicology: The Future of Forensic Medicine. Cureus 2022, 14, e28376.
33. Alowais, S.A.; Alghamdi, S.S.; Alsuhebany, N.; Alqahtani, T.; Alshaya, A.I.; Almohareb, S.N.; Aldairem, A.; Alrashed, M.; Bin Saleh, K.; Badreldin, H.A.; et al. Revolutionizing Healthcare: The Role of Artificial Intelligence in Clinical Practice. BMC Med. Educ. 2023, 23, 689.
34. Chan, K.Y.; Yuen, T.H.; Co, M. Using ChatGPT for Medical Education: The Technical Perspective. BMC Med. Educ. 2025, 25, 201.
35. Chairat, S.; Chaichulee, S.; Dissaneewate, T.; Wangkulangkul, P.; Kongpanichakul, L. AI-Assisted Assessment of Wound Tissue with Automatic Color and Measurement Calibration on Images Taken with a Smartphone. Healthcare 2023, 11, 271.
36. Brunty, J. Validation of Forensic Tools and Methods: A Primer for the Digital Forensics Examiner. WIREs Forensic Sci. 2023, 5, e1474.
37. Guo, Y.; Slay, J.; Beckett, J. Validation and Verification of Computer Forensic Software Tools-Searching Function. Digit. Investig. 2009, 6, S12–S22.
38. Yang, W.; Wei, Y.; Wei, H.; Chen, Y.; Huang, G.; Li, X.; Li, R.; Yao, N.; Wang, X.; Gu, X.; et al. Survey on Explainable AI: From Approaches, Limitations and Applications Aspects. Hum. Centric Intell. Syst. 2023, 3, 161–188.
39. Olatoye, F.O.; Awonuga, K.F.; Mhlongo, N.Z.; Ibeh, C.V.; Elufioye, O.A.; Ndubuisi, N.L. AI and Ethics in Business: A Comprehensive Review of Responsible AI Practices and Corporate Responsibility. Int. J. Sci. Res. Arch. 2024, 11, 1433–1443.
40. Kara, I.; Tahillioglu, E. Digital Image Analysis of Gunshot Residue Dimensional Dispersion by Computer Vision Method. Microsc. Res. Tech. 2022, 85, 971–979.
41. Cheong, B.C. Transparency and Accountability in AI Systems: Safeguarding Wellbeing in the Age of Algorithmic Decision-Making. Front. Hum. Dyn. 2024, 6, 1421273.
42. Sessa, F.; Chisari, M.; Esposito, M.; Karaboue, M.A.A.; Salerno, M.; Cocimano, G. Ethical, Legal and Social Implications (ELSI) Regarding Forensic Genetic Investigations (FGIs). J. Acad. Ethics 2024, 1–21.
43. Stoykova, R. The Right to a Fair Trial as a Conceptual Framework for Digital Evidence Rules in Criminal Investigations. Comput. Law Secur. Rev. 2023, 49, 105801.

Figure 1. PRISMA flow diagram illustrating the study selection process: a total of 502 records were identified through database searches, with 4 studies ultimately included after screening and eligibility assessment.

Table 1. Summary of main findings from included studies.

Study (First Author, Country of Affiliation, Year)	Study Objective	Dataset	Deep Learning Model	Key Findings
Oura et al., Finland, 2021 [19]	Classification of shotgun pattern images based on shooting distance	106 images of shotgun patterns (54 from 10 m, 52 from 17.5 m)	TinyResNet-based algorithm	Achieved 94% accuracy. Highlighted the need for larger datasets and diverse firearms for broader applicability.
Oura et al., Finland, 2021 [20]	Prediction of shooting distance classes from GSWs	204 images from piglet carcasses (negative control, contact, close-range, and distant shots)	Multilayer perceptron (MLP_24_16_24)	Achieved 98% accuracy. Noted limitations due to small sample size and the use of piglet carcasses.
Queiroz Nogueira Lira et al., Brazil, 2024 [21]	Classification of GSWs (entry vs. exit) and shooting distance categories	2551 wound images (entry and exit wounds) from Brazilian forensic cases	ResNet152	Achieved 86.9% (wound classification) and 92.48% (distance classification). Acknowledged dataset imbalance and variability in image quality.
Cheng et al., USA, 2024 [22]	Classification of GSWs (entry vs. exit wounds)	2418 digital images (1314 exit and 1104 entrance wounds)	ConvNext Tiny (Fastai library)	Achieved 87.99% accuracy. Emphasized the need for enhanced data diversity and explainability for forensic applications.
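
Several studies in Table 1 flag small sample sizes as a limitation. As a rough illustration of why this matters, the sketch below computes a Wilson score interval around a reported accuracy for two evaluation-set sizes. It is not a method used by any of the included studies. The function name `wilson_interval` and the set sizes (20 and 500 images) are hypothetical values chosen for illustration; the actual test-set sizes of the included studies are not reported in this table.

```python
import math


def wilson_interval(accuracy, n, z=1.96):
    """Return the Wilson score interval (lower, upper) for a proportion.

    accuracy: observed proportion of correct classifications, in [0, 1]
    n:        number of evaluated images (hypothetical in this sketch)
    z:        normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")

    denom = 1 + z**2 / n
    centre = (accuracy + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(accuracy * (1 - accuracy) / n + z**2 / (4 * n**2)) / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Same nominal accuracy, two assumed evaluation-set sizes.
    for n in (20, 500):
        low, high = wilson_interval(0.94, n)
        print(f"n={n}: accuracy 0.94, interval [{low:.3f}, {high:.3f}]")
```

Under these assumptions, the interval for the smaller set is markedly wider than for the larger one, which is one way to read the included studies' calls for larger and more diverse datasets.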

Table 2. Risk of bias for each included study across key domains: selection bias, measurement bias, reporting bias, applicability concerns, and overall risk of bias. Selection bias considers factors such as dataset representativeness; measurement bias reflects limitations in data collection methods; reporting bias assesses the completeness and transparency of the study findings; and applicability concerns address the generalizability of results to forensic casework. The overall risk of bias is determined by synthesizing these factors.

Study	Selection Bias	Measurement Bias	Reporting Bias	Applicability Concerns	Overall Risk of Bias
Oura et al. [19]	Low	Moderate (small dataset)	Low	Moderate (single firearm type)	Moderate
Oura et al. [20]	Low	Moderate (animal model)	Low	High (limited external validity)	Moderate
Queiroz Nogueira Lira et al. [21]	Moderate (single-institution data)	Low	Low	Moderate (dataset imbalance)	Moderate
Cheng et al. [22]	Low	Low	Low	Moderate (lack of diverse cases)	Low–Moderate

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
