Hybrid Features by Combining Visual and Text Information to Improve Spam Filtering Performance
Abstract
1. Introduction
- A method is proposed that employs three sub-models (topic, text, and image) to extract features from spam images, and a classifier model that outputs the result from the combined features.
- The proposed method showed significant performance enhancement compared to existing techniques.
- An ablation study was performed to analyze the impacts of each sub-model.
- Optimal combinations of sub-models are presented for detecting spam images that apply OCR evasion techniques.
2. Related Work
3. Overall Approach
3.1. Sub-Model Based on Topic (Topic Sub-Model)
3.2. Sub-Model Based on Word Embedding (Text Sub-Model)
3.3. Sub-Model Based on Convolution (Image Sub-Model)
3.4. Classifier Model
4. Evaluation
4.1. Dataset and Preprocessing Methods
4.2. Experimental Settings
- RQ1. Can the proposed method perform classification in an environment in which the existing techniques for filtering spam images are advantageous?
- RQ2. Can the proposed method perform classification in an environment in which the existing techniques for filtering spam images are disadvantageous?
- RQ3. Can the proposed model perform classification on a new spam image dataset that was not used in RQ1 and RQ2, as a validation of the model?
4.3. Experimental Results
4.3.1. Answer for RQ1
4.3.2. Answer for RQ2
4.3.3. Answer for RQ3
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Strielkowski, W.; Firsova, I. Effective management of energy consumption during the COVID-19 pandemic: The role of ICT solutions. Energies 2021, 14, 893. [Google Scholar] [CrossRef]
- Gong, D.; Liu, S. Who benefits from online financing? A sharing economy E-tailing platform perspective. Int. J. Prod. Econ. 2020, 222, 107490. [Google Scholar] [CrossRef]
- Cannon, P.; Lumsden, L. An innovative and authentic way of learning how to consult remotely in response to the COVID-19 pandemic. Educ. Primary Care 2022, 33, 53–58. [Google Scholar] [CrossRef] [PubMed]
- Tanwar, S.; Parekh, K. Blockchain-based electronic healthcare record system for healthcare 4.0 applications. J. Inf. Secur. Appl. 2020, 50, 102407. [Google Scholar] [CrossRef]
- Alhaboobi, Z.A.; Yousif, S.T. Intelligent classroom a conceptual model for the effective use of internet of things technique. In Proceedings of the 2019 2nd Scientific Conference of Computer Sciences (SCCS), Baghdad, Iraq, 27–28 March 2019; pp. 116–120. [Google Scholar]
- Meqdad, M.N.; Majdi, H.S. Enabling Techniques for 10 Gbps Long-Haul Transmission in Non-Coherent OCDMA Systems. In Proceedings of the 2018 9th International Symposium on Telecommunications (IST), Tehran, Iran, 17–19 December 2018; pp. 457–459. [Google Scholar]
- Kara, I.; Aydos, M. Cyber fraud: Detection and analysis of the crypto-ransomware. In Proceedings of the 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 28–31 October 2020; pp. 764–769. [Google Scholar]
- Hayes, D.R.; Cappa, F. A framework for more effective dark web marketplace investigations. Information 2018, 9, 186. [Google Scholar] [CrossRef] [Green Version]
- Datta, P.; Panda, S.N. A technical review report on cyber crimes in India. In Proceedings of the 2020 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 12–14 March 2020; pp. 269–275. [Google Scholar]
- Lee, M.; Park, E. Real-time Korean voice phishing detection based on machine learning approaches. J. Ambient. Intell. Humaniz. Comput. 2021, 1–12. [Google Scholar] [CrossRef]
- Loukas, G.; Patrikakis, C.Z. Digital deception: Cyber fraud and online misinformation. IT Prof. 2020, 22, 19–20. [Google Scholar] [CrossRef]
- Shambhavee, H.M. Cyber-Stalking: Threat to People or Bane to Technology. Int. J. Trend Sci. Res. Dev. 2019, 3, 350–355. [Google Scholar] [CrossRef] [Green Version]
- Yu, S. Sex in Spam: A Content Analysis. Int. J. Crim. Justice Sci. 2014, 9, 35. [Google Scholar]
- Ukai, Y.; Takemura, T. Spam mails impede economic growth. Rev. Socionetwork Strateg. 2007, 1, 14–22. [Google Scholar] [CrossRef]
- Available online: https://www.weforum.org/agenda/2018/05/its-40-years-since-the-first-spam-email-was-sent-here-are-6-things-you-didnt/ (accessed on 16 May 2022).
- Fonseca, O.; Fazzion, E. Measuring, characterizing and avoiding spam traffic costs. IEEE Internet Comput. 2016, 20, 16–24. [Google Scholar] [CrossRef]
- Biggio, B.; Fumera, G.; Pillai, I.; Roli, F. Image Spam Filtering by Content Obscuring Detection. In Proceedings of the Fourth Conference on Email and Antispam (CEAS), Mountain View, CA, USA, 2–3 August 2007; pp. 1–5. [Google Scholar]
- Bouma-Sims, E.; Reaves, B. A First Look at Scams on YouTube. arXiv 2021, arXiv:2104.06515. [Google Scholar]
- Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
- Karim, A.; Azam, S. Efficient clustering of emails into spam and ham: The foundational study of a comprehensive unsupervised framework. IEEE Access 2020, 8, 154759–154788. [Google Scholar] [CrossRef]
- Memon, J.; Sami, M. Handwritten optical character recognition (OCR): A comprehensive systematic literature review (SLR). IEEE Access 2020, 8, 142642–142668. [Google Scholar] [CrossRef]
- Abrigo, A.B.C.; Estuar, M.R.J.E. A comparative analysis of N-Gram deep neural network approach to classifying human perception on Dengvaxia. In Proceedings of the 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT), Kahului, HI, USA, 14–17 March 2019; pp. 46–51. [Google Scholar]
- Anwar, W.; Bajwa, I.S. An empirical study on forensic analysis of Urdu text using LDA-based authorship attribution. IEEE Access 2018, 7, 3224–3234. [Google Scholar] [CrossRef]
- Huang, Y.; Wang, R. Sentiment Classification of Crowdsourcing Participants’ Reviews Text Based on LDA Topic Model. IEEE Access 2021, 9, 108131–108143. [Google Scholar] [CrossRef]
- Lee, D.G.; Seo, Y.S. Improving bug report triage performance using artificial intelligence based document generation model. Hum.-Cent. Comput. Inf. Sci. 2020, 10, 26. [Google Scholar] [CrossRef]
- Mikolov, T.; Chen, K. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Şahin, D.Ö.; Demirci, S. Spam Filtering with KNN: Investigation of the Effect of k Value on Classification Performance. In Proceedings of the 2020 28th Signal Processing and Communications Applications Conference (SIU), Gaziantep, Turkey, 5–7 October 2020; pp. 1–4. [Google Scholar]
- Zamil, Y.K.; Ali, S.A. Spam image email filtering using K-NN and SVM. Int. J. Electr. Comput. Eng. 2019, 9, 2088–8708. [Google Scholar] [CrossRef]
- Murugavel, U.; Santhi, R. Detection of spam and threads identification in E-mail spam corpus using content based text analytics method. Mater. Today Proc. 2020, 33, 3319–3323. [Google Scholar] [CrossRef]
- Alom, Z.; Carminati, B. A deep learning model for Twitter spam detection. Online Soc. Netw. Media 2020, 18, 100079. [Google Scholar] [CrossRef]
- Hussain, N.; Turab Mirza, H. Spam review detection techniques: A systematic literature review. Appl. Sci. 2019, 9, 987. [Google Scholar] [CrossRef] [Green Version]
- Wang, D.; Irani, D. A study on evolution of email spam over fifteen years. In Proceedings of the 9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, Austin, TX, USA, 20–23 October 2013; p. 9. [Google Scholar]
- Annadatha, A.; Stamp, M. Image spam analysis and detection. J. Comput. Virol. Hacking Tech. 2018, 14, 39–52. [Google Scholar] [CrossRef]
- Barbar, A.; Ismail, A. Image Spam Detection Using FENOMAA Technique. In Proceedings of the International Conference on Artificial Intelligence and Applied Mathematics in Engineering (ICAIAME 2019), Antalya, Turkey, 20–22 April 2019. [Google Scholar]
- Sharmin, T.; Di Troia, F. Convolutional Neural Networks for Image Spam Detection. Inf. Secur. J. A Glob. Perspect. 2020, 29, 103–117. [Google Scholar] [CrossRef]
- Fatichah, C.; Lazuardi, W.F. Image Spam Detection on Instagram Using Convolution Neural Network. In Intelligent and Interactive Computing; Piuri, V., Balas, V., Borah, S., Syed Ahmad, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2019; Volume 67, pp. 295–303. [Google Scholar]
- Srinivasan, S.; Ravi, V. Deep Convolutional Neural Network based Image Spam Classification. In Proceedings of the 2020 6th Conference on Data Science and Machine Learning Applications (CDMA), Riyadh, Saudi Arabia, 4–5 March 2020; pp. 112–117. [Google Scholar]
- Dredze, M.; Gevaryahu, R.; Elias-Bachrach, A. Learning fast classifiers for image spam. In Proceedings of the Fourth Conference on Email and Anti-Spam (CEAS), Mountain View, CA, USA, 2–3 August 2007; pp. 487–493. [Google Scholar]
- Gao, Y.; Yang, M. Image spam hunter. In Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, NV, USA, 31 March–4 April 2008; pp. 1765–1768. [Google Scholar]
- Zaidi, S.S.A.; Ansari, M.S. A survey of modern deep learning based object detection models. Digit. Signal Process. 2022, 126, 103514. [Google Scholar] [CrossRef]
- Lee, D.G.; Jang, Y. Intelligent Image Synthesis for Accurate Retinal Diagnosis. Electronics 2020, 9, 767. [Google Scholar] [CrossRef]
- Huh, J.H.; Seo, Y.S. Understanding Edge Computing: Engineering Evolution with Artificial Intelligence. IEEE Access 2019, 7, 164229–164245. [Google Scholar] [CrossRef]
- Kim, S.K.; Huh, J.H. Artificial Neural Network Blockchain Techniques for Healthcare System: Focusing on the Personal Health Records. Electronics 2020, 9, 763. [Google Scholar] [CrossRef]
- Gade, K.; Geyik, S.C. Explainable AI in industry. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 4–8 August 2019; pp. 3203–3204. [Google Scholar]
- Samek, W.; Wiegand, T. Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv 2017, arXiv:1708.08296. [Google Scholar]
- Arrieta, A.B.; Díaz-Rodríguez, N. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 85–115. [Google Scholar]
- Shi, C.T. Signal pattern recognition based on fractal features and machine learning. Appl. Sci. 2018, 8, 1327. [Google Scholar] [CrossRef] [Green Version]
- Wang, Z.; Wang, J. An intelligent diagnosis scheme based on generative adversarial learning deep neural networks and its application to planetary gearbox fault pattern recognition. Neurocomputing 2018, 310, 213–222. [Google Scholar] [CrossRef]
- Tesseract OCR. Available online: https://github.com/tesseract-ocr (accessed on 16 May 2022).
- Alsaffar, D.; Alfahhad, A. Machine and deep learning algorithms for Twitter spam detection. In International Conference on Advanced Intelligent Systems and Informatics; Hassanien, A., Shaalan, K., Tolba, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2019; Volume 1058, pp. 483–491. [Google Scholar]
- Bolboacă, S.D.; Jäntschi, L.; Sestraş, A.F.; Sestraş, R.E.; Pamfil, D.C. Pearson-Fisher Chi-Square Statistic Revisited. Information 2011, 2, 528. [Google Scholar] [CrossRef] [Green Version]
Research | Approaches | Target Service | Target Contents | Model | Dataset | |||
---|---|---|---|---|---|---|---|---|
SMS | SNS | Text | Image | |||||
Şahin et al. [27] | Using text mining technologies | O | O | O | TF-IDF + kNN | Enron, Ling-Spam, SMS-Spam-Collection | ||
Zamil et al. [28] | Combining kNN and SVM | O | O | kNN + SVM | Dredze [38] | |||
Murugavel et al. [29] | Detecting keywords and threads in email spam corpus | O | O | Multi-split spam corpus algorithm | Email Dataset | |||
Alom et al. [30] | Classifying spam text and spam accounts using Twitter account metadata | O | O | Deep learning model | Twitter Social Honeypot dataset, Twitter 1KS − 10KN dataset | |||
Barbar et al. [34] | Proposing complete solution model with authentication of domain and enhanced OCR | O | O | FENOMAA | - | |||
Sharmin et al. [35] | Classifying spam images with CNN model | O | O | O | CNN | ISH [39], Advanced Dredze, Challenge dataset | ||
Fatichah et al. [36] | Classifying spam images with 3-layer and 5-layer CNN models, AlexNet, and VGG16 | O | O | CNN | 8000 images captured from Instagram | |||
Srinivasan et al. [37] | Training VGG19 and Xception models using transfer learning | O | O | O | CNN | ISH, Improved dataset, Dredze | ||
The proposed method | Classifying spam images using image and text features | O | O | O | O | O | CNN, Word-embedding, LDA, word2vec | ISH, Dredze |
- | SPAM Image | HAM Image |
---|---|---|
Source dataset | 930 | 810 |
After pre-processing | 921 | 798 |
Number of differences | 9 | 12 |
- | SPAM Image | HAM Image |
---|---|---|
Source dataset | 3299 | 2021 |
After pre-processing | 3265 | 1892 |
Number of differences | 34 | 129 |
Sub-Models | Baseline-1 [50] | Baseline-2 [37] | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 (Proposed) |
---|---|---|---|---|---|---|---|
Text (word-embedding) | O | | | O | O | | O |
Image (Convolution) | | O | | O | | O | O |
Topic (LDA/word2vec) | | | O | | O | O | O |
Model | Pre-Process | Input Type | Process for Extracting Features from Images | Output (Size) |
---|---|---|---|---|
Topic | OCR, Tokenize | Words ¹ | Latent Dirichlet Allocation (LDA), Pretrained word2vec | Vector (1500) |
Text | OCR, Tokenize | Words ¹ | Word Embedding | Vector (64) |
Image | Resize, Image Vectorization | 3 Channels ², 256 × 256 Image | Convolutional Neural Network (CNN), Flattening | Vector (21,632) |
Classifier | Vector Concatenation | Combined Vector ³ | Fully Connected Layer, Sigmoid | True/False |
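The classifier row can be illustrated with a minimal sketch: the three sub-model outputs are concatenated (1500 + 64 + 21,632 = 23,196 values) and passed through a fully connected unit with a sigmoid. The feature values and weights below are random placeholders, and the function names `combine_features` and `classify` are hypothetical; the actual layer sizes of the trained classifier are not given in this table.

```python
import numpy as np

TOPIC_DIM, TEXT_DIM, IMAGE_DIM = 1500, 64, 21_632  # output sizes from the table


def combine_features(topic_vec, text_vec, image_vec):
    """Concatenate the three sub-model outputs into one combined vector."""
    return np.concatenate([topic_vec, text_vec, image_vec])


def classify(combined, weights, bias):
    """Single fully connected unit followed by a sigmoid (illustrative only)."""
    logit = float(combined @ weights + bias)
    prob = 1.0 / (1.0 + np.exp(-logit))
    return prob, prob >= 0.5


rng = np.random.default_rng(0)
# Placeholder features; in the paper these come from the trained sub-models.
topic_vec = rng.random(TOPIC_DIM)
text_vec = rng.random(TEXT_DIM)
image_vec = rng.random(IMAGE_DIM)

combined = combine_features(topic_vec, text_vec, image_vec)
# Placeholder (untrained) weights, scaled so the logit stays small.
weights = rng.normal(scale=0.01, size=combined.shape[0])
prob, is_spam = classify(combined, weights, bias=0.0)
print(combined.shape, round(prob, 3), is_spam)
```

Dropping one of the three inputs to `combine_features` corresponds to the two-sub-model variants (Models 2 to 4) in the ablation study.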
Technique | Option | Value |
---|---|---|
Gaussian Noise | Variance | 0.01~0.03 |
Salt and Pepper Noise | Amount | 5% |
Salt and Pepper Noise | Salt vs. Pepper | 50% vs. 50% |
Gaussian Blurring | Sigma | 1 |
Median Blurring | Filter size | 3 × 3 |
Rotation | Clockwise | 90 degrees |
Flipping | Vertical (X-axis) | None |
CAPTCHA * | Random | 30% |
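Some of these perturbations can be sketched in a few lines of NumPy. This is an assumption-laden illustration, not the authors' preprocessing code: images are assumed to be float arrays scaled to [0, 1], the Gaussian variance is set to 0.02 (inside the 0.01~0.03 range), "Vertical (X-axis)" flipping is read as a top-to-bottom flip, and all function names are hypothetical. Blurring and CAPTCHA overlays are omitted because they need an image-processing library.

```python
import numpy as np


def gaussian_noise(img, variance, rng):
    """Add zero-mean Gaussian noise to an image scaled to [0, 1]."""
    noisy = img + rng.normal(0.0, np.sqrt(variance), img.shape)
    return np.clip(noisy, 0.0, 1.0)


def salt_and_pepper(img, amount, rng, salt_ratio=0.5):
    """Set a fraction `amount` of pixels to white (salt) or black (pepper)."""
    out = img.copy()
    corrupt = rng.random(img.shape[:2]) < amount
    salt = rng.random(img.shape[:2]) < salt_ratio
    out[corrupt & salt] = 1.0
    out[corrupt & ~salt] = 0.0
    return out


def rotate_90_clockwise(img):
    """Rotate by 90 degrees clockwise (negative k is clockwise in np.rot90)."""
    return np.rot90(img, k=-1, axes=(0, 1))


def flip_vertical(img):
    """Flip about the horizontal (X) axis, so the top row becomes the bottom."""
    return np.flipud(img)


rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))  # placeholder image, values in [0, 1)
noisy = gaussian_noise(image, variance=0.02, rng=rng)
speckled = salt_and_pepper(image, amount=0.05, rng=rng)
rotated = rotate_90_clockwise(image)
flipped = flip_vertical(image)
```

Each function returns a new array, so the same source image can be passed through every technique to build one perturbed copy of the dataset per row of the table.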
Hyperparameter | Option |
---|---|
Cross Validation | Shuffled 10 Folds (Training:9, Validation:1) |
Start Learning Rate | 0.001 |
Learning Rate Scheduler | Reduce learning rate on plateau (Monitor: validation loss, Patience: 3, Factor: 0.9 *) |
Optimizer | Stochastic gradient descent |
Early Stopping | Monitor: validation loss, Patience: 15 |
Save Option | Monitor: validation loss, Lowest~5th Model |
Drop Out | 20% |
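How the scheduler and early-stopping rows interact can be sketched in plain Python. The function below is a simplified, hypothetical re-implementation of the two rules (it is not tied to any particular framework, and the source does not say which one was used): the learning rate is multiplied by 0.9 after 3 epochs without improvement in validation loss, and training stops after 15 such epochs.

```python
def run_schedule(val_losses, lr=0.001, lr_patience=3, factor=0.9, stop_patience=15):
    """Replay validation losses and return (final_lr, epochs_run).

    The learning rate is multiplied by `factor` after `lr_patience`
    consecutive epochs without improvement; training stops after
    `stop_patience` consecutive epochs without improvement.
    """
    best = float("inf")
    since_lr_change = 0  # non-improving epochs since the last LR cut
    since_best = 0       # non-improving epochs since the best loss
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_lr_change = 0
            since_best = 0
        else:
            since_lr_change += 1
            since_best += 1
        if since_lr_change >= lr_patience:
            lr *= factor
            since_lr_change = 0
        if since_best >= stop_patience:
            return lr, epoch
    return lr, len(val_losses)


# Made-up loss curve: three improving epochs, then a long plateau.
losses = [1.0, 0.8, 0.7] + [0.7] * 20
final_lr, epochs_run = run_schedule(losses)
print(final_lr, epochs_run)
```

With these settings the learning rate is cut several times before early stopping triggers, which is the intended ordering: the optimizer gets a few chances at a smaller step size before the run is abandoned.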
Predicted \ Label | SPAM | HAM |
---|---|---|
SPAM | 777 (45.20%) | 11 (0.64%) |
HAM | 21 (1.22%) | 910 (52.94%) |
Side | Name | Value |
---|---|---|
- | Accuracy | 0.9814 |
Spam Side | Precision | 0.974 |
Spam Side | Recall | 0.986 |
Spam Side | F1-Score | 0.980 |
Ham Side | Precision | 0.988 |
Ham Side | Recall | 0.977 |
Ham Side | F1-Score | 0.983 |
- | Macro-F1-Score | 0.9813 |
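The accuracy and macro-F1 values can be recomputed from the four confusion-matrix counts (777, 11, 21, 910). The sketch below assumes rows are predictions and columns are labels, as the matrix is printed; swapping that orientation exchanges precision and recall for each class but leaves F1, macro-F1, and accuracy unchanged. The helper name `f1_for` is hypothetical.

```python
# Confusion matrix counts keyed by (predicted, label).
counts = {
    ("SPAM", "SPAM"): 777, ("SPAM", "HAM"): 11,
    ("HAM", "SPAM"): 21, ("HAM", "HAM"): 910,
}
total = sum(counts.values())
accuracy = (counts[("SPAM", "SPAM")] + counts[("HAM", "HAM")]) / total


def f1_for(cls):
    """F1-score of one class from the confusion-matrix counts."""
    other = "HAM" if cls == "SPAM" else "SPAM"
    tp = counts[(cls, cls)]
    predicted = tp + counts[(cls, other)]  # row sum
    actual = tp + counts[(other, cls)]     # column sum
    precision, recall = tp / predicted, tp / actual
    return 2 * precision * recall / (precision + recall)


macro_f1 = (f1_for("SPAM") + f1_for("HAM")) / 2
print(f"accuracy={accuracy:.4f}  macro-F1={macro_f1:.4f}")
```

Rounded to four decimals, both results agree with the reported accuracy of 0.9814 and macro-F1-score of 0.9813.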
Models | Spam Precision | Spam Recall | Spam F1-Score | Ham Precision | Ham Recall | Ham F1-Score | Accuracy | Macro-F1-Score |
---|---|---|---|---|---|---|---|---|
Baseline-1 (only Text) [50] | 0.9724 | 0.9823 | 0.9773 | 0.9848 | 0.9763 | 0.9805 | 0.9791 | 0.9789 |
Baseline-2 (only Image) [37] | 0.9561 | 0.9658 | 0.9610 | 0.9707 | 0.9623 | 0.9665 | 0.9639 | 0.9637 |
Model 1 (only Topic) | 0.9712 | 0.9949 | 0.9829 | 0.9957 | 0.9755 | 0.9855 | 0.9843 | 0.9842 |
Model 2 (Text + Image) | 0.9586 | 0.9696 | 0.9641 | 0.9739 | 0.9645 | 0.9692 | 0.9668 | 0.9666 |
Model 3 (Text + Topic) | 0.9674 | 0.9936 | 0.9803 | 0.9946 | 0.9724 | 0.9834 | 0.9820 | 0.9818 |
Model 4 (Image + Topic) | 0.9825 | 0.9949 | 0.9887 | 0.9957 | 0.9850 | 0.9903 | 0.9895 | 0.9895 |
Model 5 (All, proposed) | 0.9737 | 0.9860 | 0.9798 | 0.9881 | 0.9774 | 0.9827 | 0.9814 | 0.9813 |
Technique | Text Recognition Rate |
---|---|
Gaussian Noise | 67.28% |
Salt and Pepper Noise | 22.86% |
Gaussian Blurring | 47.06% |
Median Blurring | 24.12% |
Rotation | 69.43% |
Flipping | 3.24% |
CAPTCHA | 48.06% |
Dataset | Baseline-1 (Only Text) | Baseline-2 (Only Image) | Model 1 (Only Topic) | Model 2 (Text + Image) | Model 3 (Text + Topic) | Model 4 (Image + Topic) | Model 5 (All) |
---|---|---|---|---|---|---|---|
Original | 0.9789 | 0.9637 | 0.9842 | 0.9666 | 0.9818 | 0.9895 | 0.9813 | |
Gaussian Noise | 0.9575 | 0.9673 | 0.9557 | 0.9643 | 0.9574 | 0.9673 | 0.9737 | |
Salt and Pepper | 0.9203 | 0.9614 | 0.9296 | 0.9626 | 0.9307 | 0.9592 | 0.9696 | |
Gaussian Blur | 0.8999 | 0.9568 | 0.9255 | 0.9414 | 0.9261 | 0.9662 | 0.9696 | |
Median Blur | 0.7698 | 0.9643 | 0.8684 | 0.9655 | 0.8643 | 0.9644 | 0.9661 | |
Rotation | 0.9498 | 0.9179 | 0.9441 | 0.9336 | 0.9493 | 0.9279 | 0.9365 | |
Flipping | 0.6404 | 0.9562 | 0.6768 | 0.9597 | 0.6693 | 0.9250 | 0.9435 | |
CAPTCHA | 0.9377 | 0.9579 | 0.9603 | 0.9591 | 0.9603 | 0.9714 | 0.9661 | |
Average (excluding Original) | 0.8679 | 0.9535 | 0.8943 | 0.9552 | 0.8939 | 0.9545 | 0.9607 |
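The last row of this table appears to be the mean over the seven OCR-evasion rows, with the Original row left out. This is an inference from the arithmetic rather than something the table states. A short check for the Model 5 column, using the values printed above:

```python
# Macro-F1 scores of Model 5 (All), copied from the table above.
model5_macro_f1 = {
    "Original": 0.9813,
    "Gaussian Noise": 0.9737,
    "Salt and Pepper": 0.9696,
    "Gaussian Blur": 0.9696,
    "Median Blur": 0.9661,
    "Rotation": 0.9365,
    "Flipping": 0.9435,
    "CAPTCHA": 0.9661,
}

# Average over the evasion techniques only (Original excluded).
evasion_only = [v for k, v in model5_macro_f1.items() if k != "Original"]
average = sum(evasion_only) / len(evasion_only)
print(round(average, 4))
```

The result rounds to the 0.9607 shown in the table, whereas including the Original row would give a higher value.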
Dataset | Model 1 (Only Topic) | Model 2 (Text + Image) | Model 3 (Text + Topic) | Model 4 (Image + Topic) | Model 5 (All) |
---|---|---|---|---|---|
Original | 0.13088 | 0.17434 | 0.19919 | 0.05941 | 0.94254 | |
Gaussian Noise | 0.95252 | 0.00001 | 0.93610 | 0.45632 | 0.00329 | |
Salt and Pepper | 0.73807 | 0.00000 | 0.60672 | 0.00000 | 0.00000 | |
Gaussian Blur | 0.05261 | 0.00000 | 0.05122 | 0.00000 | 0.00000 | |
Median Blur | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | |
Rotation | 0.74227 | 0.21855 | 0.90336 | 0.00000 | 0.02575 | |
Flipping | 0.12073 | 0.00000 | 0.28285 | 0.00000 | 0.00000 | |
CAPTCHA | 0.00807 | 0.00005 | 0.01374 | 0.00004 | 0.00023 |
Dataset | Model 1 (Only Topic) | Model 2 (Text + Image) | Model 3 (Text + Topic) | Model 4 (Image + Topic) | Model 5 (All) |
---|---|---|---|---|---|
Original | 0.00018 | 0.97038 | 0.00079 | 0.00001 | 0.01472 | |
Gaussian Noise | 0.00096 | 0.77645 | 0.00195 | 0.00230 | 0.07059 | |
Salt and Pepper | 0.00000 | 0.95324 | 0.00000 | 0.00069 | 0.11195 | |
Gaussian Blur | 0.00000 | 0.92326 | 0.00000 | 0.01034 | 0.13302 | |
Median Blur | 0.00000 | 0.99828 | 0.00000 | 0.00056 | 0.03140 | |
Rotation | 0.02538 | 0.35590 | 0.00306 | 0.00019 | 0.04086 | |
Flipping | 0.00000 | 0.95990 | 0.00000 | 0.00000 | 0.00001 | |
CAPTCHA | 0.24253 | 0.99867 | 0.11910 | 0.01258 | 0.28896 |
Dataset | Baseline-1 (Only Text) | Baseline-2 (Only Image) | Model 1 (Only Topic) | Model 2 (Text + Image) | Model 3 (Text + Topic) | Model 4 (Image + Topic) | Model 5 (All) |
---|---|---|---|---|---|---|---|
Original | 0.9342 | 0.8163 | 0.8823 | 0.8154 | 0.8817 | 0.8945 | 0.8689 | |
Gaussian Noise | 0.9042 | 0.8138 | 0.8704 | 0.8223 | 0.8721 | 0.8819 | 0.8722 | |
Salt and Pepper | 0.8721 | 0.8152 | 0.8302 | 0.8179 | 0.8314 | 0.8684 | 0.8583 | |
Gaussian Blur | 0.7724 | 0.7814 | 0.8212 | 0.7888 | 0.8164 | 0.8759 | 0.8586 | |
Median Blur | 0.6581 | 0.8170 | 0.8040 | 0.8171 | 0.8005 | 0.8760 | 0.8623 | |
Rotation | 0.9224 | 0.7727 | 0.8813 | 0.7768 | 0.8798 | 0.8406 | 0.8077 | |
Flipping | 0.4389 | 0.8172 | 0.5642 | 0.8194 | 0.5619 | 0.8467 | 0.8464 | |
CAPTCHA | 0.8964 | 0.7765 | 0.8737 | 0.7750 | 0.8766 | 0.8484 | 0.8264 | |
Average (including Original) | 0.7998 | 0.8013 | 0.8159 | 0.8041 | 0.8151 | 0.8666 | 0.8501 |
Dataset | Model 1 (Only Topic) | Model 2 (Text + Image) | Model 3 (Text + Topic) | Model 4 (Image + Topic) | Model 5 (All) |
---|---|---|---|---|---|
Original | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | |
Gaussian Noise | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | |
Salt and Pepper | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | |
Gaussian Blur | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | |
Median Blur | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | |
Rotation | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | |
Flipping | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | |
CAPTCHA | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
Dataset | Model 1 (Only Topic) | Model 2 (Text + Image) | Model 3 (Text + Topic) | Model 4 (Image + Topic) | Model 5 (All) |
---|---|---|---|---|---|
Original | 0.00000 | 0.16052 | 0.00000 | 0.00000 | 0.00000 | |
Gaussian Noise | 0.00000 | 0.10389 | 0.00000 | 0.00000 | 0.00000 | |
Salt and Pepper | 0.00000 | 0.09382 | 0.00000 | 0.00000 | 0.00000 | |
Gaussian Blur | 0.00000 | 0.33331 | 0.00000 | 0.00000 | 0.00000 | |
Median Blur | 0.00000 | 0.06803 | 0.00000 | 0.00000 | 0.00000 | |
Rotation | 0.00000 | 0.35616 | 0.00000 | 0.00000 | 0.00000 | |
Flipping | 0.00000 | 0.00591 | 0.00000 | 0.00000 | 0.00000 | |
CAPTCHA | 0.00000 | 0.61324 | 0.00000 | 0.00000 | 0.00000 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nam, S.-G.; Jang, Y.; Lee, D.-G.; Seo, Y.-S. Hybrid Features by Combining Visual and Text Information to Improve Spam Filtering Performance. Electronics 2022, 11, 2053. https://doi.org/10.3390/electronics11132053