A Novel Approach to SPAM Detection in Social Networks-Light-ANFIS: Integrating Gradient-Based One-Sided Sampling and Random Forest-Based Feature Clustering Techniques with Adaptive Neuro-Fuzzy Inference Systems
Abstract
1. Introduction
1. The RFBFC technique effectively reduces the number of features and, consequently, the number of inputs fed to ANFIS. This significantly decreases the number of fuzzy sets, rules, and associated parameters, enabling a simpler ANFIS configuration.
2. Combining ANFIS with the RFBFC and GOSS methods substantially reduces both the feature columns and the data samples in the training dataset while maintaining performance. Together with the structural streamlining described in the first item, this dimensionality reduction shortens the training time of the proposed Light-ANFIS framework.
3. The study demonstrates that feature reduction via RFBFC and sample reduction via GOSS-based selective data sampling not only maintain performance but, on some datasets, improve it. Integrating these techniques with ANFIS can therefore enhance its overall performance.
4. Integrating the RFBFC and GOSS techniques into other fuzzy inference-based systems can likewise yield significant gains in structural simplicity, performance, and speed.
2. Related Studies in the Literature
3. Materials and Methods
3.1. About Datasets and Data Collection
3.1.1. Dataset 1
3.1.2. Dataset 2
3.1.3. Dataset 3
Algorithm 1 Real-Time Twitter Data Collection via API
Input query // Keyword to track
  data_dir // Directory to store collected data
  API_keys // Twitter API credentials
Output stream_<query>.json // JSON file containing the collected tweets
Begin
1. def get_parser(): // Parse command-line arguments to obtain query and data_dir; return parser
2. Authenticate with Twitter using API_keys
3. Format query into a valid filename // For each character in query: keep letters, digits, ‘-’, ‘_’, and ‘.’; replace any other character with ‘_’
4. class MyListener(StreamListener): // Define the MyListener class to handle the stream
5.   on_data(data):
6.     Append the incoming tweet data to stream_<query>.json // Tweets are stored in JSON format under data_dir
7.     Print data to the console
8.     If an error occurs, wait 5 seconds and continue
9.   on_error(status):
10.    Print the error status to the console
11. Initialize the Twitter stream with MyListener(query, data_dir)
12. Start the stream: filter tweets by query and let MyListener handle the incoming data
13. end
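The query-sanitization step (step 3) is the only nontrivial logic outside the Twitter client itself. A minimal Python sketch of it follows; the function names are illustrative, and the authentication and streaming parts, which require a Twitter API client, are omitted:

```python
def format_query_filename(query: str) -> str:
    """Sanitize the tracked keyword for use in the output filename
    (Algorithm 1, step 3): keep letters, digits, '-', '_', '.';
    replace every other character with '_'."""
    safe = "".join(c if (c.isalnum() or c in "-_.") else "_" for c in query)
    return f"stream_{safe}.json"


def append_tweet(path: str, data: str) -> None:
    """Append one raw tweet (a JSON line) to the stream file,
    as the listener's on_data handler does (step 6)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(data.rstrip("\n") + "\n")


print(format_query_filename("spam detect!"))  # stream_spam_detect_.json
```

Keeping one JSON object per line (rather than one large array) lets the collector append safely even if the stream is interrupted mid-run.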
3.2. Light-ANFIS Architecture for Spam Detection in Social Networks
3.2.1. Data Processing Phase
Feature Extraction
Data Encoding and Conversion
Algorithm 2 Coding and Numerization of Features
Input spamDataset // Original dataset
Outputs numrFeatures // Numeric features
  boolnFeatures // Boolean features
  classLabels // Class labels
Begin
1. [rows, cols] = size(spamDataset) // Process all columns, including the class-label column
2. for col = 1 to cols do
3.   colNumrFeatures = [], colBoolnFeatures = []
4.   for row = 1 to rows do
5.     value = spamDataset[row, col]
6.     if value in [“YES”, “NO”, “TRUE”, “FALSE”] then
7.       // Map the boolean value to 0/1 and replicate it 5 times
8.       numericBooln = 1 if value in [“YES”, “TRUE”], else 0
9.       boolnArray = [numericBooln, numericBooln, numericBooln, numericBooln, numericBooln]
10.      colBoolnFeatures.Add(boolnArray’)
11.    else if contains(value, “-”) then
12.      // Generate 5 random integers within the range given for the column
13.      [min, max] = parse_range(value)
14.      randomNums = []
15.      for i = 1 to 5 do
16.        randomNums.Add(RANDOM_INT(min, max))
17.      end for
18.      colNumrFeatures.Add(randomNums’)
19.    end if
20.  end for
21.  numrFeatures.Add(colNumrFeatures)
22.  boolnFeatures.Add(colBoolnFeatures)
23. end for
24. classLabels = boolnFeatures[cols] // Extract the class labels from the processed boolean features
25. boolnFeatures.Remove_At(cols) // Remove the class-label column from the boolean features
26. Return numrFeatures, boolnFeatures, classLabels
27. end
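The per-value encoding rule of Algorithm 2 can be sketched in Python as follows. The helper is hypothetical; real range strings such as ‘1,000,000–1,999,999’ would additionally need thousands separators and en-dashes normalized before parsing:

```python
import random


def encode_value(value: str, rng: random.Random, copies: int = 5):
    """Encode one raw feature value as in Algorithm 2:
    boolean strings map to `copies` replicas of 0/1; range strings
    like '100-199' map to `copies` random integers drawn from that
    range. Returns (kind, values)."""
    if value in ("YES", "TRUE"):
        return "bool", [1] * copies
    if value in ("NO", "FALSE"):
        return "bool", [0] * copies
    if "-" in value:
        lo, hi = (int(p) for p in value.split("-"))
        return "numeric", [rng.randint(lo, hi) for _ in range(copies)]
    raise ValueError(f"unrecognized feature value: {value!r}")


rng = random.Random(42)
print(encode_value("TRUE", rng))     # ('bool', [1, 1, 1, 1, 1])
print(encode_value("100-199", rng))  # ('numeric', [five ints in 100..199])
```

Seeding the random generator, as above, makes the range-expansion step reproducible across runs.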
Algorithm 3 Binary to Decimal Conversion of Features |
Input boolnFeatures, classLabels // Boolean feature set, Class labels |
Output DecFeature // Decimal feature
Begin |
// Rank features using Random Forest importance |
1. rfmodel = train_random_forest(boolnFeatures, classLabels) |
2. Feature_importances = get_feature_importance(rfmodel) |
3. [~, rankedIndices] = sort(Feature_importances, ‘descend’) |
4. rankedFeatures = boolnFeatures(:, rankedIndices) |
// Convert each row from binary to decimal |
5. decValues = zeros(size(boolnFeatures, 1), 1) |
6. for i = 1:size(boolnFeatures, 1) |
7. binaryrow = rankedFeatures(i, :) |
8. powers = 2.^(length(binaryrow)-1:-1:0) |
9. decValues(i) = sum(binaryrow .* powers) |
10. end for |
11. DecFeature = decValues |
12. end |
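Assuming the Random Forest importance ranking of steps 1–4 has already been computed (e.g., with scikit-learn's feature_importances_), the binary-to-decimal step of Algorithm 3 can be sketched as:

```python
def binary_rows_to_decimal(rows, ranked_indices):
    """Algorithm 3's conversion step: reorder each boolean row by the
    importance ranking (most important column first, so it becomes the
    most significant bit), then read the row as one decimal number."""
    decimals = []
    for row in rows:
        ranked = [row[i] for i in ranked_indices]
        value = 0
        for bit in ranked:  # MSB first
            value = (value << 1) | bit
        decimals.append(value)
    return decimals


# Example: 3 boolean features; a hypothetical ranking says column 2 is
# most important, then column 0, then column 1.
print(binary_rows_to_decimal([[1, 0, 1], [0, 1, 1]], [2, 0, 1]))  # [6, 5]
```

Placing the most important feature in the most significant bit means that rows differing in high-importance features map to widely separated decimal values, which is the point of ranking before conversion.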
Feature Importance Analysis
3.2.2. Data Assessment Phase
Random Forest-Based Feature Clustering (RFBFC) Technique
Gradient-Based One-Sided Sampling (GOSS) Technique
Algorithm 4 GOSS Technique |
Input(s) I: training data, d: iterations |
a: sampling ratio of large gradient data, b: sampling ratio of small gradient data loss: loss function, L: weak learner |
Steps |
1. models ← {}, fact ← (1 − a) / b |
2. topN ← a × len(I), randN ← b × len(I) |
3. for i = 1 to d do |
4. preds ← models.predict(I) |
5. g ← loss(I, preds), w ← {1,1, …} |
6. sorted ← GetSortedIndices(abs(g)) |
7. topSet ← sorted[1: topN] |
8. randSet ← RandomPick(sorted[topN: len(I)], randN) |
9. usedSet ← topSet + randSet |
10. w[randSet] = fact × w[randSet] |
11. newModel ← L(I[usedSet], − g[usedSet], w[usedSet]) |
12. models.append(newModel) |
13. end for |
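A minimal Python sketch of one GOSS sampling round (steps 6–10) follows; the variable names are illustrative, and the weak-learner fit of steps 11–12 is omitted:

```python
import random


def goss_sample(gradients, a, b, rng):
    """One GOSS sampling round: keep the top a*N instances by
    |gradient|, randomly pick b*N of the remainder, and amplify the
    small-gradient picks' weights by (1 - a) / b so the sampled data
    keeps an approximately unbiased gradient estimate."""
    n = len(gradients)
    top_n, rand_n = int(a * n), int(b * n)
    fact = (1.0 - a) / b
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_set = order[:top_n]
    rand_set = rng.sample(order[top_n:], rand_n)
    weights = {i: 1.0 for i in top_set}
    weights.update({i: fact for i in rand_set})
    return top_set + rand_set, weights


grads = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.03, -0.3, 0.1, 0.05]
used, w = goss_sample(grads, a=0.2, b=0.3, rng=random.Random(0))
print(used)  # 2 large-gradient indices followed by 3 random small-gradient ones
```

With a = 0.2 and b = 0.3, only half the instances are used per iteration, while the weight factor (1 − 0.2)/0.3 ≈ 2.67 compensates for the under-sampled small-gradient region.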
Adaptive Neuro-Fuzzy Inference System (ANFIS) Architecture
3.3. Performance Assessment
4. Experimental Results and Discussion
4.1. Ablation Studies
4.2. Experimental Studies
4.3. Literature Comparisons
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Acc | Accuracy |
ACTM | Account Creation Time-Based Method |
ADM | Anomaly Detection Method |
AI | Artificial Intelligence |
ANFIS | Adaptive Neuro-Fuzzy Inference System |
AS | Automated Systems |
AUC | Area Under ROC Curve |
BAA | Behavioral Analysis Approaches |
BERT | Bidirectional Encoder Representations from Transformers |
BoW | Bag of Words |
CCA | Comparison and Contrastive Approaches |
CF | Content-Based Filtering |
CMBF | Combined Feature |
CNN | Convolutional Neural Networks |
COLAB | Google Colaboratory |
D1 | Dataset 1 |
D2 | Dataset 2 |
D3 | Dataset 3 |
DFS | Deep Feature Set |
DIDM | Deceptive Information Detection Method |
DL | Deep Learning |
EFB | Exclusive Feature Bundling |
ELM | Ensemble Learning Method |
FFCM | Following and Follower Comparison Method |
FIS | Fuzzy Inference Systems |
FN | False Negative |
FP | False Positive |
FS-HC | Fuzzy Similarity-Based Hierarchical Clustering |
GAT | Geolocation Analysis Technique |
GD | Gradient Descent |
GOSS | Gradient-Based One-Sided Sampling |
GPT-3 | Generative Pre-trained Transformer 3 |
GRU | Gated Recurrent Units |
HSD | Honeypot-Based Spam Detection |
IMDB | Internet Movie Database |
IT2M-FIS | Interval Type-2 Mamdani Fuzzy Inference System |
IT2S-FIS | Interval Type-2 Sugeno Fuzzy Inference System |
LAA | Link Analysis Approach |
LightGBM | Light Gradient Boosting Machine |
LSE | Least Squares Estimation |
LSTM | Long Short-Term Memory |
ML | Machine Learning |
MLM | Machine Learning Methods |
NFS | Naive Feature Set |
NLP | Natural Language Processing |
OOV | Out-Of-Vocabulary |
Prec | Precision |
Rec | Recall |
RF | Random Forest |
RFBFC | Random Forest-Based Feature Clustering |
RFS | Rich Feature Set |
RMSE | Root Mean Square Error |
RNN | Recurrent Neural Networks |
ROC | Receiver Operating Characteristics |
SDT | Spammer Detection Tools |
Spec | Specificity |
SPO | Subject-Predicate-Object |
T1M-FIS | Type-1 Mamdani Fuzzy Inference System |
T1S-FIS | Type-1 Sugeno Fuzzy Inference System |
TF-IDF | Term Frequency-Inverse Document Frequency |
TN | True Negative |
TP | True Positive |
T-TAM | Trend-Topics Analysis Method |
UB | Using Blacklist |
References
- Patmanthara, S.; Febiharsa, D.; Dwiyanto, F.A. Social Media as a Learning Media: A Comparative Analysis of Youtube, WhatsApp, Facebook and Instagram Utilization. In Proceedings of the 2019 International Conference on Electrical, Electronics and Information Engineering (ICEEIE), Denpasar, Bali, Indonesia, 3–4 October 2019; Volume 6, pp. 183–186. [Google Scholar]
- Masciantonio, A.; Bourguignon, D.; Bouchat, P.; Balty, M.; Rimé, B. Don’t Put All Social Network Sites in One Basket: Facebook, Instagram, Twitter, TikTok, and Their Relations with Well-Being during the COVID-19 Pandemic. PLoS ONE 2021, 16, e0248384. [Google Scholar] [CrossRef]
- Authenticity | X Help. Available online: https://help.x.com/en/rules-and-policies/authenticity (accessed on 15 June 2025).
- Krithiga, R.; Ilavarasan, E. A Comprehensive Survey of Spam Profile Detection Methods in Online Social Networks. J. Phys. Conf. Ser. 2019, 1362, 012111. [Google Scholar] [CrossRef]
- Mian, S.M.; Khan, M.S.; Shawez, M.; Kaur, A. Artificial Intelligence (AI), Machine Learning (ML) & Deep Learning (DL): A Comprehensive Overview on Techniques, Applications and Research Directions. In Proceedings of the 2024 2nd International Conference on Sustainable Computing and Smart Systems (ICSCSS), Coimbatore, India, 10–12 July 2024; pp. 1404–1409. [Google Scholar]
- Siva Krishna, D.; Srinivas, G. StopSpamX: A Multi Modal Fusion Approach for Spam Detection in Social Networking. MethodsX 2025, 14, 103227. [Google Scholar] [CrossRef]
- Nasser, M.; Saeed, F.; Da’u, A.; Alblwi, A.; Al-Sarem, M. Topic-Aware Neural Attention Network for Malicious Social Media Spam Detection. Alex. Eng. J. 2025, 111, 540–554. [Google Scholar] [CrossRef]
- Pal, A.A.; Mondal, S.; Kumar, C.A.; Kumar, C.J. A Transformer-Based Approach for Fake News and Spam Detection in Social Media Using RoBERTa. In Proceedings of the 2025 International Conference on Multi-Agent Systems for Collaborative Intelligence (ICMSCI), Erode, India, 20–22 January 2025; pp. 1256–1263. [Google Scholar]
- Çıtlak, O.; Dörterler, M.; Doğru, İ.A. A Survey on Detecting Spam Accounts on Twitter Network. Soc. Netw. Anal. Min. 2019, 9, 1–13. [Google Scholar] [CrossRef]
- Choi, J.; Jeon, B.; Jeon, C. Scalable Learning Framework for Detecting New Types of Twitter Spam with Misuse and Anomaly Detection. Sensors 2024, 24, 2263. [Google Scholar] [CrossRef] [PubMed]
- Rovito, L.; Bonin, L.; Manzoni, L.; De Lorenzo, A. An Evolutionary Computation Approach for Twitter Bot Detection. Appl. Sci. 2022, 12, 5915. [Google Scholar] [CrossRef]
- Liu, S.; Wang, Y.; Zhang, J.; Chen, C.; Xiang, Y. Addressing the Class Imbalance Problem in Twitter Spam Detection Using Ensemble Learning. Comput. Secur. 2017, 69, 35–49. [Google Scholar] [CrossRef]
- Fazil, M.; Abulaish, M. A Hybrid Approach for Detecting Automated Spammers in Twitter. IEEE Trans. Inform. Forensic Secur. 2018, 13, 2707–2719. [Google Scholar] [CrossRef]
- Sánchez-Corcuera, R.; Zubiaga, A.; Almeida, A. Early Detection and Prevention of Malicious User Behavior on Twitter Using Deep Learning Techniques. IEEE Trans. Comput. Soc. Syst. 2024, 11, 6649–6661. [Google Scholar] [CrossRef]
- Patel, P.; Bhushanwar, K.; Patel, H. Social Media Analysis for Criminal Behavior Detection: Methods, Application and Challenge. In Proceedings of the 2025 4th International Conference on Sentiment Analysis and Deep Learning (ICSADL), Bhimdatta, Nepal, 18–20 February 2025; pp. 70–75. [Google Scholar]
- Hussain, N.; Turab Mirza, H.; Rasool, G.; Hussain, I.; Kaleem, M. Spam Review Detection Techniques: A Systematic Literature Review. Appl. Sci. 2019, 9, 987. [Google Scholar] [CrossRef]
- Li, C.; Liu, S. A Comparative Study of the Class Imbalance Problem in Twitter Spam Detection. Concurr. Comput. Pract. Exp. 2018, 30, e4281. [Google Scholar] [CrossRef]
- Santos, I.; Miñambres-Marcos, I.; Laorden, C.; Galán-García, P.; Santamaría-Ibirika, A.; Bringas, P.G. Twitter Content-Based Spam Filtering. In Proceedings of the International Joint Conference SOCO’13-CISIS’13-ICEUTE’13, Salamanca, Spain, 11–13 September 2013; Herrero, Á., Baruque, B., Klett, F., Abraham, A., Snášel, V., de Carvalho, A.C.P.L.F., Bringas, P.G., Zelinka, I., Quintián, H., Corchado, E., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 449–458. [Google Scholar]
- Maurya, S.K.; Singh, D.; Maurya, A.K. Deceptive Opinion Spam Detection Using Feature Reduction Techniques. Int. J. Syst. Assur. Eng. Manag. 2024, 15, 1210–1230. [Google Scholar] [CrossRef]
- Dheyaa Radhi, A.; Obeid, H.N.; Al-Attar, B.; Fuqdan, A.-I.; Hakim, B.A.; Ali Hussein Al Naffakh, H. Unmasking Deceptive Profiles: A Deep Dive into Fake Account Detection on Instagram and Twitter. BIO Web Conf. 2024, 97, 00127. [Google Scholar] [CrossRef]
- Dhar, S.; Bose, I. An Ensemble Deep Learning Model for Fast Classification of Twitter Spam. Inf. Manag. 2024, 61, 104052. [Google Scholar] [CrossRef]
- El Mendili, F.; Fattah, M.; Berros, N.; Filaly, Y.; El Bouzekri El Idrissi, Y. Enhancing Detection of Malicious Profiles and Spam Tweets with an Automated Honeypot Framework Powered by Deep Learning. Int. J. Inf. Secur. 2024, 23, 1359–1388. [Google Scholar] [CrossRef]
- Kumar, M.R.; Bharathi, P.S.; Sajiv, G. Accuracy Enhancement in Detection of Malicious Social Bots Using Reinforcement Learning Technique with URL Features in Twitter Network Through Convolutional Neural Network over K-Nearest Neighbors. In Proceedings of the 2024 Ninth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), Chennai, India, 4–5 April 2024; pp. 1–4. [Google Scholar]
- Dhabliya, D.; Karthikeyan, C.; Sood, G.; Faiz, A.; Shah, M.D. Robust Twitter Spam Detection Through Ensemble Learning and Optimal Feature Selection. In Proceedings of the 2024 IEEE 4th International Conference on ICT in Business Industry & Government (ICTBIG), Indore, India, 13–14 December 2024; pp. 1–6. [Google Scholar]
- Jeong, S.; Noh, G.; Oh, H.; Kim, C. Follow Spam Detection Based on Cascaded Social Information. Inf. Sci. 2016, 369, 481–499. [Google Scholar] [CrossRef]
- Guo, D.; Chen, C. Detecting Non-Personal and Spam Users on Geo-Tagged Twitter Network. Trans. GIS 2014, 18, 370–384. [Google Scholar] [CrossRef]
- Ong, Y.C.; Paladini, S.; Alifan, B.; Sambas, A.; Alwi, S.S.E.; Sedek, N.S.M. SohoNet: A Novel Social Honeynet Framework for Detecting Social Bots in Online Social Networks. J. Adv. Res. Des. 2024, 1, 234248. [Google Scholar] [CrossRef]
- Krishna, C.R.; Loretta, G.I. An Efficient Malicious Social Bots with URL Features Detection Using Densenet Compared over ANN with Improved Accuracy. AIP Conf. Proc. 2025, 3270, 020151. [Google Scholar] [CrossRef]
- Divani, N.; Vinitha, A. Machine Learning-Based Detection of Malicious URLs in Twitter. In Proceedings of the 2025 International Conference on Machine Learning and Autonomous Systems (ICMLAS), Prawet, Thailand, 11–13 March 2025; pp. 61–67. [Google Scholar]
- Güngör, K.N.; Ayhan Erdem, O.; Doğru, İ.A. Tweet and Account Based Spam Detection on Twitter. In Proceedings of the International Conference on Artificial Intelligence and Applied Mathematics in Engineering, Warsaw, Poland, 31 October–2 November 2025; Springer: Berlin/Heidelberg, Germany, 2020; pp. 898–905. [Google Scholar]
- Asthana, Y.; Chhabra, R.; Srivastava, S. Machine Learning Techniques for Twitter Spam Detection: Comparative Insights and Real-Time Application. In Proceedings of the 2024 14th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 18–19 January 2024; pp. 780–786. [Google Scholar]
- Asha, S.; Madhan, M.; PB, H.K.; Hariharan, S.; Dharaneesh, B. Twitter (X) Spam Detection Using Natural Language Processing by Encoder Decoder Model. In Proceedings of the 2024 1st International Conference on Sustainable Computing and Integrated Communication in Changing Landscape of AI (ICSCAI), Greater Noida, India, 4–6 July 2024; pp. 1–5. [Google Scholar]
- Asmitha, M.; Kavitha, C.R. Exploration of Automatic Spam/Ham Message Classifier Using NLP. In Proceedings of the 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), Pune, India, 5–7 April 2024; pp. 1–7. [Google Scholar]
- Gupta, S.; Khattar, A.; Gogia, A.; Kumaraguru, P.; Chakraborty, T. Collective Classification of Spam Campaigners on Twitter: A Hierarchical Meta-Path Based Approach. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 529–538. [Google Scholar]
- Hwang, E.H.; Lee, S. A Nudge to Credible Information as a Countermeasure to Misinformation: Evidence from Twitter. Inf. Syst. Res. 2025, 36, 621–636. [Google Scholar] [CrossRef]
- Nikhil Sai, G.V.; Tubagus, R.A.; Rohith, V.; Donavalli, H. Unlocking Deeper Data Insights on Social Media: Removing Hashtag and Tweets Spam for Improved Content Analysis. In Proceedings of the 2024 5th International Conference for Emerging Technology (INCET), Belgaum, India, 24–26 May 2024; pp. 1–6. [Google Scholar]
- Gerber, A. A Content Analysis: Analyzing Topics of Conversation under the #sustainability Hashtag on Twitter. Environ. Data Sci. 2024, 3, e5. [Google Scholar] [CrossRef]
- Inuwa-Dutse, I.; Liptrott, M.; Korkontzelos, I. Detection of Spam-Posting Accounts on Twitter. Neurocomputing 2018, 315, 496–511. [Google Scholar] [CrossRef]
- Swe, M.M.; Nyein Myo, N. Fake Accounts Detection on Twitter Using Blacklist. In Proceedings of the 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), Singapore, 6–8 June 2018; pp. 562–566. [Google Scholar]
- Atacak, İ.; Çıtlak, O.; Doğru, İ.A. Application of Interval Type-2 Fuzzy Logic and Type-1 Fuzzy Logic-Based Approaches to Social Networks for Spam Detection with Combined Feature Capabilities. PeerJ Comput. Sci. 2023, 9, e1316. [Google Scholar] [CrossRef]
- Kabakus, A.T.; Kara, R. A Survey of Spam Detection Methods on Twitter. Int. J. Adv. Comput. Sci. Appl. 2017, 8, 29–38. [Google Scholar] [CrossRef]
- Şencan, Ö.A.; Atacak, İ.; Doğru, İ.A. Sosyal Ağlarda Topluluk ve Konu Tespiti: Bir Sistematik Literatür Taraması [Community and Topic Detection in Social Networks: A Systematic Literature Review]. Int. J. Inform. Technol. 2022, 15. [Google Scholar]
- Zhang, L.; Liu, W.; Wang, J. Design of Spam Detection and Classification System Based on Artificial Intelligence. In Proceedings of the 2025 5th International Symposium on Computer Technology and Information Science (ISCTIS), Xi’an, China, 16–18 May 2025; pp. 150–153. [Google Scholar]
- Soto-Diaz, R.; Vásquez-Carbonell, M.; Escorcia-Gutierrez, J. A Review of Artificial Intelligence Techniques for Optimizing Friction Stir Welding Processes and Predicting Mechanical Properties. Eng. Sci. Technol. Int. J. 2025, 62, 101949. [Google Scholar] [CrossRef]
- Shifath, S.M.S.-U.-R.; Khan, M.F.; Islam, M.S. A Transformer Based Approach for Fighting COVID-19 Fake News 2021. arXiv 2021, arXiv:2101.12027. [Google Scholar] [CrossRef]
- Alshattnawi, S.; Shatnawi, A.; AlSobeh, A.M.R.; Magableh, A.A. Beyond Word-Based Model Embeddings: Contextualized Representations for Enhanced Social Media Spam Detection. Appl. Sci. 2024, 14, 2254. [Google Scholar] [CrossRef]
- Alom, Z.; Carminati, B.; Ferrari, E. A Deep Learning Model for Twitter Spam Detection. Online Soc. Netw. Media 2020, 18, 100079. [Google Scholar] [CrossRef]
- Agarwal, R.; Dhoot, A.; Kant, S.; Singh Bisht, V.; Malik, H.; Ansari, M.F.; Afthanorhan, A.; Hossaini, M.A. A Novel Approach for Spam Detection Using Natural Language Processing with AMALS Models. IEEE Access 2024, 12, 124298–124313. [Google Scholar] [CrossRef]
- Altwaijry, N.; Al-Turaiki, I.; Alotaibi, R.; Alakeel, F. Advancing Phishing Email Detection: A Comparative Study of Deep Learning Models. Sensors 2024, 24, 2077. [Google Scholar] [CrossRef]
- Wang, X.; Wang, K.; Chen, K.; Wang, Z.; Zheng, K. Unsupervised Twitter Social Bot Detection Using Deep Contrastive Graph Clustering. Knowl.-Based Syst. 2024, 293, 111690. [Google Scholar] [CrossRef]
- Li, T.; Yu, J.; Zhang, H. Web of Things Based Social Media Fake News Classification with Feature Extraction Using Pre-Trained Convoluted Recurrent Network with Deep Fuzzy Learning. Theor. Comput. Sci. 2022, 931, 65–77. [Google Scholar] [CrossRef]
- Wani, M.A.; ElAffendi, M.; Shakil, K.A. AI-Generated Spam Review Detection Framework with Deep Learning Algorithms and Natural Language Processing. Computers 2024, 13, 264. [Google Scholar] [CrossRef]
- Nair, V.; Pareek, J.; Bhatt, S. A Knowledge-Based Deep Learning Approach for Automatic Fake News Detection Using BERT on Twitter. Procedia Comput. Sci. 2024, 235, 1870–1882. [Google Scholar] [CrossRef]
- Jain, D.K.; Kumar, A.; Sharma, V. Tweet Recommender Model Using Adaptive Neuro-Fuzzy Inference System. Future Gener. Comput. Syst. 2020, 112, 996–1009. [Google Scholar] [CrossRef]
- Suganthi, R.; Prabha, K. Fuzzy Similarity Based Hierarchical Clustering for Communities in Twitter Social Networks. Meas. Sens. 2024, 32, 101033. [Google Scholar] [CrossRef]
- Rajesh, K.P.; Nallasivam, M.P.; PS, S.P.; Kumar, H.; Dharun, V.S. Detection of Fake Hotel Reviews Using ANFIS and Natural Language Processing Techniques. In Proceedings of the 2024 7th International Conference on Circuit Power and Computing Technologies (ICCPCT), Kollam, India, 8–9 August 2024; Volume 1, pp. 265–269. [Google Scholar]
- Gracia Betty, J.; Harivarthini, R.; Deepthi, O.; Pari, R.; Maharajan, P. YouTube Video Spam Comment Detection Using Light Gradient Boosting Machine. In Proceedings of the 2023 5th International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 3–5 August 2023; pp. 1650–1656. [Google Scholar]
- Gong, D.; Liu, Y. A Machine Learning Approach for Botnet Detection Using LightGBM. In Proceedings of the 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 20–22 May 2022; pp. 829–833. [Google Scholar]
- Aditya, B.L.; Mohanty, S.N. Heterogenous Social Media Analysis For Efficient Deep Learning Fake-Profile Identification. IEEE Access 2023. [Google Scholar] [CrossRef]
- Purba, K.R.; Asirvatham, D.; Murugesan, R.K. Classification of Instagram Fake Users Using Supervised Machine Learning Algorithms. Int. J. Electr. Comput. Eng. 2020, 10, 2763. [Google Scholar] [CrossRef]
- Çıtlak, O.; Doğru, İ.A.; Dörterler, M. Data Set Attributes Drawn in JSON Format on Twitter. ResearchGate 2018. Available online: https://www.researchgate.net/publication/328655475_Data_set_attributes_drawn_in_JSON_format_on_Twitter (accessed on 13 June 2025).
- Patro, S.G.K.; Sahu, K.K. Normalization: A Preprocessing Stage. arXiv 2015, arXiv:1503.06462. [Google Scholar] [CrossRef]
- Feature Selection Using Random Forest Classifier. Available online: https://www.geeksforgeeks.org/machine-learning/feature-selection-using-random-forest-classifier/ (accessed on 18 July 2025).
- Zhao, Q.; Li, L.; Zhang, L.; Zhao, M. Recognition of Corrosion State of Water Pipe Inner Wall Based on SMA-SVM under RF Feature Selection. Coatings 2023, 13, 26. [Google Scholar] [CrossRef]
- Feature Selection Using Random Forest GeeksforGeeks. Available online: https://www.geeksforgeeks.org/machine-learning/feature-selection-using-random-forest/ (accessed on 27 June 2025).
- Gruber, P.; Agner, R.; Deniz, S. Detection of Cavitating States (Swirls) in a Francis Test Pump-Turbine Using Ultrasonic and Transient Pressure Measurements. In Proceedings of the 2018 12th International Group for Hydraulic Efficiency Measurements (IGHEM), Beijing, China, 10–13 September 2018; pp. 64–78. [Google Scholar]
- Saha, S. Acoustic Assessment of Sleep Apnea and Pharyngeal Airway. Ph.D. Thesis, University of Toronto, Ontario, Canada, 2021. [Google Scholar]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157. [Google Scholar]
- Feature Selection Techniques in Machine Learning GeeksforGeeks. Available online: https://www.geeksforgeeks.org/machine-learning/feature-selection-techniques-in-machine-learning/ (accessed on 26 August 2025).
- Alpaydin, E. Introduction to Machine Learning; MIT Press: Cambridge, MA, USA, 2020. [Google Scholar]
- Shahani, N.M.; Zheng, X.; Guo, X.; Wei, X. Machine Learning-Based Intelligent Prediction of Elastic Modulus of Rocks at Thar Coalfield. Sustainability 2022, 14, 3689. [Google Scholar] [CrossRef]
- Wang, R.; Liu, Y.; Ye, X.; Tang, Q.; Gou, J.; Huang, M.; Wen, Y. Power System Transient Stability Assessment Based on Bayesian Optimized LightGBM. In Proceedings of the 2019 IEEE 3rd Conference on Energy Internet and Energy System Integration (EI2), Changsha, China, 8–10 November 2019; pp. 263–268. [Google Scholar]
- Al-Hmouz, A.; Shen, J.; Al-Hmouz, R.; Yan, J. Modeling and Simulation of an Adaptive Neuro-Fuzzy Inference System (ANFIS) for Mobile Learning. IEEE Trans. Learn. Technol. 2012, 5, 226–237. [Google Scholar] [CrossRef]
- Adeyemo, Z.K.; Olawuyi, T.O.; Oseni, O.F.; Ojo, S.I. Development of a Path-Loss Prediction Model Using Adaptive Neuro-Fuzzy Inference System. Int. J. Wirel. Microw. Technol. 2019, 9, 40–53. [Google Scholar]
- Jang, J.-S.R. ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685. [Google Scholar] [CrossRef]
- Dağkurs, B.; Atacak, İ. Deep Learning-Based Novel Ensemble Method with Best Score Transferred-Adaptive Neuro Fuzzy Inference System for Energy Consumption Prediction. PeerJ Comput. Sci. 2025, 11, e2680. [Google Scholar] [CrossRef]
- Turk, F. RNGU-NET: A Novel Efficient Approach in Segmenting Tuberculosis Using Chest X-Ray Images. PeerJ Comput. Sci. 2024, 10, e1780. [Google Scholar] [CrossRef]
- Gharaibeh, M.; Almahmoud, M.; Ali, M.Z.; Al-Badarneh, A.; El-Heis, M.; Abualigah, L.; Altalhi, M.; Alaiad, A.; Gandomi, A.H. Early Diagnosis of Alzheimer’s Disease Using Cerebral Catheter Angiogram Neuroimaging: A Novel Model Based on Deep Learning Approaches. Big Data Cogn. Comput. 2022, 6, 2. [Google Scholar] [CrossRef]
- Liu, X.; Lu, H.; Nayak, A. A Spam Transformer Model for SMS Spam Detection. IEEE Access 2021, 9, 80253–80263. [Google Scholar] [CrossRef]
- Valero-Carreras, D.; Alcaraz, J.; Landete, M. Comparing Two SVM Models through Different Metrics Based on the Confusion Matrix. Comput. Oper. Res. 2023, 152, 106131. [Google Scholar] [CrossRef]
- Understanding the Confusion Matrix in Machine Learning. Available online: https://www.geeksforgeeks.org/confusion-matrix-machine-learning/ (accessed on 21 June 2025).
- Atacak, İ. An Ensemble Approach Based on Fuzzy Logic Using Machine Learning Classifiers for Android Malware Detection. Appl. Sci. 2023, 13, 1484. [Google Scholar] [CrossRef]
- Jayaswal, V. Performance Metrics: Confusion Matrix, Precision, Recall, and F1 Score. Towards Data Sci. 2020. [Google Scholar]
- Laila, K.; Jayashree, P.; Vinuvarsidh, V. A Unified Neuro-Fuzzy Framework to Assess the User Credibility on Twitter. IETE J. Res. 2024, 70, 1407–1424. [Google Scholar] [CrossRef]
- Ouni, S.; Fkih, F.; Omri, M.N. BERT- and CNN-Based TOBEAT Approach for Unwelcome Tweets Detection. Soc. Netw. Anal. Min. 2022, 12, 144. [Google Scholar] [CrossRef]
- Airlangga, G.; Bata, J.; Adi Nugroho, O.I.; Lim, B.H.P. Hybrid CNN-LSTM Model with Custom Activation and Loss Functions for Predicting Fan Actuator States in Smart Greenhouses. AgriEngineering 2025, 7, 118. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhang, P.; Zhang, W.; Wang, M. CNN-LSTM-Attention with PSO Optimization for Temperature and Fault Prediction in Meat Grinder Motors. Discov. Appl. Sci. 2025, 7, 438. [Google Scholar] [CrossRef]
- BERT Inference on G4 Instances Using Apache MXNet and GluonNLP: 1 Million Requests for 20 Cents | Artificial Intelligence. Available online: https://aws.amazon.com/blogs/machine-learning/bert-inference-on-g4-instances-using-apache-mxnet-and-gluonnlp-1-million-requests-for-20-cents/ (accessed on 31 August 2025).
- Performance Regression Found in TensorRT 8.6.1 When Running BERT on GPU T4 Deep Learning (Training & Inference)/TensorRT. Available online: https://forums.developer.nvidia.com/t/performance-regression-found-in-tensorrt-8-6-1-when-running-bert-on-gpu-t4/261651 (accessed on 31 August 2025).
- What Differences in Inference Speed and Memory Usage Might You Observe Between Different Sentence Transformer Architectures (for Example, BERT-Base vs DistilBERT vs RoBERTa-Based Models)? Available online: https://milvus.io/ai-quick-reference/what-differences-in-inference-speed-and-memory-usage-might-you-observe-between-different-sentence-transformer-architectures-for-example-bertbase-vs-distilbert-vs-robertabased-models (accessed on 31 August 2025).
No | Common Spam Detection Methods | Advantages | Disadvantages | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Quick Spam Detection | No Complex Algorithms Needed | Can Work Dynamically | Can Include Many Methods | Flexible and Adaptive Models | Fast and Effective Blocking | Real-Time Detection | High Rate of False Positives | High Processing Power Required | Spammers Can Easily Bypass This Method | Need to Learn | Up-to-Date and Maintenance Cost | Highly Complex | The Overfitting Issue | |
1 | Account Creation Time-based Method (ACTM) | ✓ | ✓ | ✓ | ✓ | ||||||||||
2 | Anomaly Detection Method (ADM) | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||
3 | Automated Systems (AS) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||||
4 | Behavioral Analysis Approaches (BAA) | ✓ | |||||||||||||
5 | Comparison and Contrastive Approaches (CCA) | ✓ | ✓ | ✓ | |||||||||||
6 | Content-based Filtering (CF) | ✓ | ✓ | ||||||||||||
7 | Deceptive Information Detection Method (DIDM) | ✓ | ✓ | ✓ | |||||||||||
8 | Deep Learning Methods (DLM) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
9 | Ensemble Learning Method (ELM) | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||||
10 | Following and Follower Comparison Method (FFCM) | ✓ | ✓ | ✓ | |||||||||||
11 | Geolocation Analysis Technique (GAT) | ✓ | ✓ | ✓ | |||||||||||
12 | Honeypot-based Spam Detection (HSD) | ✓ | ✓ | ✓ | ✓ | ||||||||||
13 | Link Analysis Approach (LAA) | ✓ | ✓ | ✓ | |||||||||||
14 | Machine Learning Methods (MLM) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
15 | Natural Language Processing Methods (NLPM) | ✓ | ✓ | ✓ | ✓ | ||||||||||
16 | Spammer Detection Tools (SDT) | ✓ | ✓ | ✓ | |||||||||||
17 | Trend-Topics Analysis Method (T-TAM) | ✓ | ✓ | ||||||||||||
18 | User Reports Methods (URM) | ✓ | ✓ | ✓ | |||||||||||
19 | Using Blacklist (UB) | ✓ | ✓ | ✓ | ✓ |
No | Feature | Used | No | Feature | Used |
---|---|---|---|---|---|
1 | id | | 19 | profile_banner_url (PBU) | ✓
2 | name | | 20 | profile_use_background_image (PUBG) | ✓
3 | screen_name (SRN) | | 21 | profile_background_image_url_https (PBIA) | |
4 | fav_number (FVN) | ✓ | 22 | profile_text_color (PTC) | ✓
5 | statuses_count (STC) | ✓ | 23 | profile_image_url_https (PIH) | |
6 | followers_count (FOC) | ✓ | 24 | profile_sidebar_border_color (PSBC) | ✓
7 | friends_count (FRC) | ✓ | 25 | profile_background_tile (PBT) | ✓
8 | favourites_count (FAC) | ✓ | 26 | profile_sidebar_fill_color (PSFC) | |
9 | listed_count (LSC) | ✓ | 27 | profile_background_image_url (PBIU) | |
10 | created_at (CRT) | | 28 | profile_background_color (PBGC) | |
11 | url | ✓ | 29 | profile_link_color (PRLC) | |
12 | lang | | 30 | utc_offset (UOF) | |
13 | time_zone (TMZ) | | 31 | Protected (PRTC) | |
14 | Location (LOC) | ✓ | 32 | Verified (VRF) | |
15 | default_profile (DFP) | ✓ | 33 | Description (DSC) | ✓
16 | default_profile_image (DPI) | ✓ | 34 | Updated (UPD) | |
17 | geo_enabled (GOE) | ✓ | 35 | dataset | ✓
18 | profile_image_url (PRIU) | | | | |
No | Feature in Model | Type | Evaluation Range | Explanation |
---|---|---|---|---|
1 | USTC:User_Statuses_Count | Tweets | ‘20–99’, ‘100–199’, …, ‘1,000,000–1,999,999’ | The number of Tweets posted by the user (including retweets). |
2 | USCA:Sensitive_Content _Alert | Array of Object | TRUE/FALSE | USCA refers to sensitive objects within a tweet’s text or a user object’s text fields. |
3 | UFVC:User_Favourites_Count | Int | ‘0–9’, ‘10–19’, ‘20–29’, …, ‘100,000–1,999,999’ | UFVC is the number of Tweets a user account has liked over its lifetime. |
4 | ULSC: User_Listed_Count | Int | ‘0–9’, ‘10–19’, ‘20–29’, …, ‘900–999’ | ULSC indicates the count of public lists a user belongs to. |
5 | SITW: Source_in_Twitter | String | YES/NO | SITW identifies the utility used to post the Tweet, given as an HTML-formatted string. Tweets originating from the Twitter website carry a web source value. |
6 | UFRC:User_Friends_Counts | Int | ‘0–9’, ‘10–19’, ‘20–29’, …, ‘1000–99,999’ | The number of users the account follows. Under specific circumstances, this field may temporarily appear as zero. |
7 | UFLC:User_Followers_Count | Int | ‘0–9’, ‘10–19’, ‘20–29’, …, ‘100,000–1,999,999’ | The number of followers the account currently has. |
8 | ULOC:User_Location | String | YES/NO | ULOC displays a user-defined location for an account profile. The search service may sometimes interpret these fields ambiguously. |
9 | UGEO:User_Geo_Enabled | Boolean | TRUE/FALSE | When set to True, UGEO means that the user has allowed location tagging for tweets. |
10 | UDPI:User_Default_Profile _Image | Boolean | TRUE/FALSE | When UDPI is True, it implies that the user has not uploaded a custom profile picture, and the system is using a default image instead. |
11 | RTWT:ReTweet | Boolean | TRUE/FALSE | RTWT shows whether the authenticating user has retweeted the Tweet. |
12 | UCRA:User_Created_at | String | ‘2006–2009’, ‘2010–2013’, …, ‘2022–2025’ | UCRA represents the account creation time in UTC. |
13 | UCOO:User_Coordinates | coordinates | YES/NO | This field indicates the geographic location reported for the Tweet by the user or client application. The coordinates are formatted as geoJSON (longitude first, then latitude). This field can be null. |
14 | UDPR:User_Default_Profile | Boolean | TRUE/FALSE | When UDPR is set to True, it means the user’s profile theme or background remains unchanged. |
15 | UFAC:User_Favorite_Count | Int | ‘0–9’, ‘10–19’, … ‘100,000–1,999,999’ | This feature displays the approximate number of times users have expressed their approval of a tweet by selecting the “like” option on the Twitter interface. |
16 | UFAV:User_Favorited | Boolean | TRUE/FALSE | This field indicates whether the tweet in question was liked by the authenticating user. |
17 | URSN:User_in_Reply _to_ScreenName | String | YES/NO | If the Tweet is a reply, the screen name of the original tweet’s author will be displayed here. |
18 | UPSE:User_Possibly_Sensitive | Boolean | TRUE/FALSE | The presence of this field is contingent upon the inclusion of a link within the tweet. Its purpose is not directly about the tweet’s content, but rather to indicate that the URL in the tweet may contain sensitive content or media. |
19 | UPRO:User_Protected | Boolean | TRUE/FALSE | If this is true, it signifies that the user has elected to safeguard their tweets. |
20 | URTC:User_Retweet_Count | Int | ‘0–9’, ‘10–19’, … ‘100,000–1,999,999’ | This field indicates the number of times the Tweet has been retweeted. |
21 | UURL:User_Url | String | YES/NO | A Uniform Resource Locator (URL) provided by the user in association with their profile. |
22 | UVFD:User_Verified | Boolean | TRUE/FALSE | When the value of UVFD is true, it implies that the user has an authenticated account. |
23 | CLASS:Account Suspender | Boolean | TRUE/FALSE | CLASS represents the class label and indicates whether an account is spam. |
Metrics | Formula | Definition |
---|---|---|
Accuracy (Acc) | $Acc = \dfrac{TP + TN}{TP + TN + FP + FN}$ | This metric calculates the proportion of correctly classified samples among all evaluated samples. |
Recall (Rec) | $Rec = \dfrac{TP}{TP + FN}$ | Recall indicates how accurately the model identifies positive samples. |
Precision (Prec) | $Prec = \dfrac{TP}{TP + FP}$ | Precision is the ratio of correctly predicted positive samples to all samples the model identified as positive. |
F1-score (F1-scr) | $F1 = \dfrac{2 \cdot Prec \cdot Rec}{Prec + Rec}$ | The F1-score, or F-measure, is the harmonic mean of precision and recall. It is a valuable indicator when precision and recall are equally important (hereafter referred to as F-score). |
Area Under ROC Curve (AUC) | $AUC = \int_{0}^{1} TPR \, d(FPR)$ | AUC measures model performance as the area under the Receiver Operating Characteristic (ROC) curve. TPR is the proportion of correctly classified positive samples, whereas FPR is the proportion of negative samples incorrectly classified as positive. |
RMSE | $RMSE = \sqrt{\tfrac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$ | RMSE is the square root of the mean squared difference between predicted and actual values, where $\hat{y}_i$ is the i-th predicted value, $y_i$ is the i-th actual value, and $n$ is the total number of samples. |
Training Time T(t) | _ | This represents the time allocated for model training. In practice, it serves as a performance metric that determines the processing and memory complexities of the model. |
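The metrics defined above can be computed directly from binary predictions. The following is a minimal sketch (the function name `binary_metrics` is illustrative, not part of the paper's implementation):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute Acc, Prec, Rec, F1, and RMSE from binary labels (1 = spam)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    rmse = float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
    return {"Acc": acc, "Prec": prec, "Rec": rec, "F1": f1, "RMSE": rmse}
```

AUC is omitted from the sketch because it requires continuous scores rather than hard labels; in practice a library routine over the ROC curve would be used.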
Parameter Name | Parameter Value |
---|---|
EpochNumber | 200–4000 |
InitialStepSize | 0.005 |
StepSizeDecreaseRate | 0.9 |
StepSizeIncreaseRate | 1.1 |
DisplayANFISInformation | 0 |
DisplayErrorValues | 0 |
OptimizationMethod | 1 |
MF Type | Number of Sets | Accuracy | Precision | Recall | F1 Score | AUC | Training Time (s) | Number of Parameters |
---|---|---|---|---|---|---|---|---|
Gbellmf | 3 | 0.97115 | 0.96368 | 0.98909 | 0.97622 | 0.96673 | 113.86 | 39
Gbellmf | 5 | 0.98585 | 0.98818 | 0.98818 | 0.98818 | 0.98527 | 364.14 | 95
Gbellmf | 7 | 0.99238 | 0.98743 | 1.00000 | 0.99368 | 0.99050 | 834.76 | 175
Trimf | 3 | 0.94829 | 0.95974 | 0.95364 | 0.95668 | 0.94697 | 51.63 | 39
Trimf | 5 | 0.96407 | 0.96161 | 0.97909 | 0.97027 | 0.96037 | 162.04 | 95
Trimf | 7 | 0.97768 | 0.97319 | 0.99000 | 0.98152 | 0.97465 | 764.53 | 175
Gaussmf | 3 | 0.97496 | 0.97563 | 0.98273 | 0.97917 | 0.97305 | 51.95 | 39
Gaussmf | 5 | 0.99020 | 0.98827 | 0.99545 | 0.99185 | 0.98891 | 166.97 | 95
Gaussmf | 7 | 0.99183 | 0.98742 | 0.99909 | 0.99322 | 0.99005 | 781.79 | 175
Gauss2mf | 3 | 0.96788 | 0.96850 | 0.97818 | 0.97332 | 0.96535 | 65.57 | 39
Gauss2mf | 5 | 0.98040 | 0.98188 | 0.98545 | 0.98367 | 0.97916 | 185.22 | 95
Gauss2mf | 7 | 0.98911 | 0.98736 | 0.99455 | 0.99094 | 0.98777 | 840.91 | 175
Trapmf | 3 | 0.93794 | 0.95313 | 0.94273 | 0.94790 | 0.93676 | 58.18 | 39
Trapmf | 5 | 0.96952 | 0.95951 | 0.99091 | 0.97496 | 0.96425 | 172.51 | 95
Trapmf | 7 | 0.98258 | 0.98195 | 0.98909 | 0.98551 | 0.98098 | 781.22 | 175
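The parameter counts in the table (39, 95, and 175 for 3, 5, and 7 fuzzy sets) are consistent with a grid-partitioned, first-order Sugeno ANFIS that has two inputs and two premise parameters per membership function (as for a Gaussian MF). The following sketch reproduces that count under those assumptions; note that `gbellmf` ordinarily carries three premise parameters per MF, so applying two uniformly across all MF types is an assumption made to match the reported numbers:

```python
def anfis_param_count(n_inputs, n_sets, params_per_mf=2):
    """Count trainable parameters of a grid-partitioned first-order Sugeno ANFIS.

    Premise part: one set of MF parameters per input per fuzzy set.
    Consequent part: one rule per combination of fuzzy sets across inputs,
    each with (n_inputs + 1) linear coefficients.
    """
    premise = params_per_mf * n_inputs * n_sets
    consequent = (n_inputs + 1) * n_sets ** n_inputs
    return premise + consequent
```

With two inputs, this yields 39, 95, and 175 parameters for 3, 5, and 7 sets respectively, matching the table; it also makes explicit why rule count, and hence parameter count, grows polynomially in the number of fuzzy sets.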
GOSS Parameters | Performance Metrics | |||
---|---|---|---|---|
a | b | Sum of Gradient Rates | Accuracy | Training Time (s)
0.10 | 0.60 | 0.70 | 0.98476 | 204.15
0.30 | 0.40 | 0.70 | 0.99020 | 133.86
0.10 | 0.50 | 0.60 | 0.98748 | 102.24
0.40 | 0.20 | 0.60 | 0.98204 | 101.02
0.30 | 0.20 | 0.50 | 0.98204 | 83.76
0.40 | 0.10 | 0.50 | 0.98040 | 83.45
0.20 | 0.20 | 0.40 | 0.97714 | 65.97
0.30 | 0.10 | 0.40 | 0.98040 | 66.32
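In the table above, a is the fraction of samples retained for their large gradient magnitudes and b is the fraction of the remaining small-gradient samples drawn at random; in the standard GOSS procedure, the sampled small-gradient instances are reweighted by (1 − a)/b to limit distribution shift. A minimal sketch of that selection step (function name and interface are illustrative):

```python
import numpy as np

def goss_sample(gradients, a, b, rng=None):
    """Gradient-based One-Sided Sampling (GOSS) selection sketch.

    Keeps the top a-fraction of samples by absolute gradient, randomly
    samples a b-fraction of the rest, and amplifies the weights of the
    sampled small-gradient instances by (1 - a) / b.
    Returns selected indices and their instance weights.
    """
    rng = np.random.default_rng(rng)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))   # large gradients first
    top_k = int(a * n)
    rand_k = int(b * n)
    top_idx = order[:top_k]                  # always kept
    rand_idx = rng.choice(order[top_k:], size=rand_k, replace=False)
    idx = np.concatenate([top_idx, rand_idx])
    weights = np.ones(len(idx))
    weights[top_k:] *= (1 - a) / b           # compensate for undersampling
    return idx, weights
```

For example, with a = 0.30 and b = 0.10 (the last row of the table), only 40% of the training samples reach the ANFIS training loop, which is consistent with the shorter training times reported for smaller sums of gradient rates.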
GOSS Parameters | Performance Metrics | ||||||
---|---|---|---|---|---|---|---|
Dataset | a | b | Sum of Gradient Rates | Accuracy | Recall | AUC | Training Time (s)
Dataset 1 | 0.30 | 0.40 | 0.70 | 0.99020 | 0.99546 | 0.98891 | 133.86
Dataset 1 | 0.10 | 0.50 | 0.60 | 0.98748 | 0.99091 | 0.98664 | 102.24
Dataset 1 | 0.30 | 0.20 | 0.50 | 0.98204 | 0.98636 | 0.98097 | 83.76
Dataset 1 | 0.30 | 0.10 | 0.40 | 0.98040 | 0.98364 | 0.97961 | 66.32
Dataset 2 | 0.30 | 0.40 | 0.70 | 0.97633 | 0.98804 | 0.97528 | 82.35
Dataset 2 | 0.10 | 0.50 | 0.60 | 0.98225 | 0.99043 | 0.98234 | 46.02
Dataset 2 | 0.30 | 0.20 | 0.50 | 0.97751 | 0.98804 | 0.97763 | 37.82
Dataset 2 | 0.30 | 0.10 | 0.40 | 0.96568 | 0.97847 | 0.96582 | 42.09
Dataset 3 | 0.30 | 0.40 | 0.70 | 0.98534 | 0.99089 | 0.98371 | 2152.10
Dataset 3 | 0.10 | 0.50 | 0.60 | 0.98552 | 0.98720 | 0.98499 | 1866.42
Dataset 3 | 0.30 | 0.20 | 0.50 | 0.98642 | 0.99168 | 0.98489 | 1494.26
Dataset 3 | 0.30 | 0.10 | 0.40 | 0.98419 | 0.99449 | 0.98181 | 1210.77
Datasets | Method | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|---|
Dataset 1 | RFBFC-ANFIS | 0.99020 | 0.98827 | 0.99545 | 0.99185 | 0.98891
Dataset 1 | Light-ANFIS | 0.98748 | 0.98821 | 0.99091 | 0.98956 | 0.98664
Dataset 2 | RFBFC-ANFIS | 0.97988 | 0.96956 | 0.99043 | 0.97988 | 0.97999
Dataset 2 | Light-ANFIS | 0.98225 | 0.97412 | 0.99043 | 0.98221 | 0.98233
Dataset 3 | RFBFC-ANFIS | 0.98178 | 0.98571 | 0.98454 | 0.98513 | 0.98097
Dataset 3 | Light-ANFIS | 0.98552 | 0.98915 | 0.98720 | 0.98818 | 0.98503
No | Author(s) | Method(s) | Dataset | Acc (%) | Prec (%) | Rec (%) | F-Score (%) |
---|---|---|---|---|---|---|---|
1 | Dhar and Bose [21] | Ensemble Models | Twitter Dataset 2 | 70.3 | 66.7 | 72.0 | 69.2 |
2 | Atacak et al. [40] | Interval Type-2 Mamdani Fuzzy Inference System | Twitter Dataset 1 | 95.5 | 95.7 | 96.7 | 96.2 |
3 | Li et al. [51] | Novel FL-driven web categorization system | Fake News Dataset | 95.46 | 96.73 | 94.87 | 95.45 |
4 | Suganthi and Prabha [55] | Fuzzy Similarity-based Hierarchical Clustering | Kaggle Social Media Dataset | 92 | - | - | - |
5 | Krishna and Srinivas [6] | Multi-modal fusion approach | Twitter Dataset | 98.48 | 98.80 | 98.20 | 98.40 |
6 | Laila et al. [85] | Unified Neuro-Fuzzy Inference System | Twitter and Amazon benchmark datasets | 97.01 | 95.33 | 92.67 | 94.71 |
7 | Ouni et al. [86] | BERT- and CNN-based TOBEAT approach | Twitter Dataset | 94.97 | 94.05 | 95.88 | 94.95 |
8 | Proposed Model | Light-ANFIS | Twitter Dataset 1 | 98.748 | 98.821 | 99.091 | 98.956
8 | Proposed Model | Light-ANFIS | Twitter Dataset 2 | 98.225 | 97.412 | 99.043 | 98.221
8 | Proposed Model | Light-ANFIS | Twitter Dataset 3 | 98.552 | 98.915 | 98.720 | 98.818
Model | Number of Parameters | Model Size (MB) | Inference Latency (ms) |
---|---|---|---|
Light-ANFIS (Proposed Model) | 95 | 0.163 | 0.0031–0.0047 |
CNN-LSTM | ~1–10 M | 5–50 | 0.5–10 |
BERT-Base | ~110 M | ~1200 | T4 GPU: 1–5 |
Çıtlak, O.; Atacak, İ.; Doğru, İ.A. A Novel Approach to SPAM Detection in Social Networks-Light-ANFIS: Integrating Gradient-Based One-Sided Sampling and Random Forest-Based Feature Clustering Techniques with Adaptive Neuro-Fuzzy Inference Systems. Appl. Sci. 2025, 15, 10049. https://doi.org/10.3390/app151810049