Article

Innovative Telecom Fraud Detection: A New Dataset and an Advanced Model with RoBERTa and Dual Loss Functions

1 School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
2 School of Business, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(24), 11628; https://doi.org/10.3390/app142411628
Submission received: 12 November 2024 / Revised: 29 November 2024 / Accepted: 11 December 2024 / Published: 12 December 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Telecom fraud has become one of the most pressing problems in crime prevention. With advances in artificial intelligence, telecom fraud texts have become increasingly covert and deceptive. Existing prevention methods, such as mobile number tracking, detection, and traditional machine-learning-based text recognition, struggle to identify telecom fraud in real time. In addition, the scarcity of Chinese telecom fraud text data has limited research in this area. In this paper, we propose RoBERTa-MHARC, a telecom fraud text detection model that combines RoBERTa with a multi-head attention mechanism and residual connections. First, we select data categories from the CCL2023 telecom fraud dataset as basic samples and merge them with telecom fraud text data that we collected, creating a five-category dataset that covers impersonation of customer service, impersonation of leaders or acquaintances, loans, public security fraud, and normal text. During training, the model integrates a multi-head attention mechanism and improves its training efficiency through residual connections. Finally, the model improves its multi-class classification accuracy by adding an inconsistency loss function to the cross-entropy loss. The experimental results show that our model performs well on multiple benchmark datasets, achieving an F1 score of 97.65 on the FBS dataset, 98.10 on our own dataset, and 93.69 on the news dataset.
Keywords: telecom fraud dataset; dual loss functions; natural language processing; text classification
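
The abstract describes a classifier that passes encoder outputs through multi-head attention with a residual connection before predicting one of five categories. The PyTorch sketch below is one plausible reading of that description, not the authors' implementation. The class name `AttentionResidualHead`, the hidden size of 768, the 8 attention heads, the layer normalization, and the first-token pooling are all assumptions made for illustration, and a random tensor stands in for RoBERTa's hidden states. Only the cross-entropy term is computed, because the abstract does not define the inconsistency loss.

```python
import torch
import torch.nn as nn


class AttentionResidualHead(nn.Module):
    """Hypothetical head: self-attention + residual connection + linear classifier."""

    def __init__(self, hidden_size=768, num_heads=8, num_classes=5):
        super().__init__()
        # hidden_size and num_heads are assumed values, not taken from the paper.
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_size)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states, padding_mask=None):
        # hidden_states: (batch, seq_len, hidden_size), e.g. encoder outputs.
        # padding_mask: (batch, seq_len) bool tensor, True where a position is padding.
        attended, _ = self.attn(
            hidden_states, hidden_states, hidden_states,
            key_padding_mask=padding_mask,
        )
        # Residual connection: add the attention output back onto its input.
        fused = self.norm(hidden_states + attended)
        # Assumed pooling: use the first token's vector as the sequence representation.
        return self.classifier(fused[:, 0])


torch.manual_seed(0)
head = AttentionResidualHead()
fake_encoder_output = torch.randn(2, 16, 768)  # stand-in for RoBERTa hidden states
labels = torch.tensor([0, 4])                  # two of the five category indices
logits = head(fake_encoder_output)
# Only the cross-entropy term is shown; the inconsistency loss is not specified here.
loss = nn.functional.cross_entropy(logits, labels)
print(logits.shape, loss.item())
```

In a full pipeline, `fake_encoder_output` would be replaced by the last hidden states of a pretrained RoBERTa encoder, and the second loss term would be added to `loss` before backpropagation.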

Share and Cite

MDPI and ACS Style

Li, J.; Zhang, C.; Jiang, L. Innovative Telecom Fraud Detection: A New Dataset and an Advanced Model with RoBERTa and Dual Loss Functions. Appl. Sci. 2024, 14, 11628. https://doi.org/10.3390/app142411628

