An Improved Software Source Code Vulnerability Detection Method: Combination of Multi-Feature Screening and Integrated Sampling Model
Abstract
1. Introduction
1. This article proposes a feature selection method that combines machine learning and deep learning. The method extracts source code features from the AST, selects high-importance features through analysis of variance (ANOVA), and performs deep learning with a bidirectional long short-term memory (Bi-LSTM) model, effectively reducing model training time.
2. We propose a data balancing scheme based on integrated oversampling (IOS), which generates synthetic samples of the vulnerability class and combines them with outlier detection, improving the quality of the synthesized samples and mitigating class imbalance in the dataset.
3. Extensive experiments show that the method performs well in software vulnerability detection on real-world projects, with significant improvements in accuracy and efficiency over existing methods.
2. Related Work
2.1. Graph-Based Vulnerability Detection Method
2.2. Sequence-Based Vulnerability Detection Method
2.3. Applications and Challenges of Deep Learning in Software Vulnerability Detection
3. Research Methods
3.1. Data Preprocessing and Feature Selection
3.1.1. Data Processing
Algorithm 1. Preprocess Source Code into Standardized Numeric Vectors
Input: Source code
Output: Standardized numeric vectors
Step 1: Source code cleanup. Remove comments, unnecessary header files, etc., to obtain the cleaned source code.
Step 2: Generate AST. Use CodeSensor to parse the cleaned source code into an AST.
Step 3: Generate text vectors. Initialize empty text vectors and perform a depth-first traversal (DFT) of the AST to generate text vectors.
Step 4: Train the Word2Vec model. Initialize and train the Word2Vec model on the text vectors.
Step 5: Generate numeric vectors. Initialize empty numeric vectors; for each element in the text vectors: if the element exists in the Word2Vec model, append its vector to the numeric vectors; otherwise, append the zero vector.
Step 6: Standardize the numeric vectors. If the length of the numeric vectors exceeds LEN, truncate to LEN; otherwise, zero-pad to LEN.
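Steps 5 and 6 of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy embedding table `w2v` stands in for a trained Word2Vec model, and the dimensions are chosen small for readability (the paper's model uses LEN = 30 and 100-dimensional embeddings).

```python
import numpy as np

def vectorize(tokens, w2v, emb_dim, max_len):
    """Map AST tokens to vectors (Step 5), then truncate or zero-pad to max_len (Step 6)."""
    vecs = [w2v[t] if t in w2v else np.zeros(emb_dim) for t in tokens]
    vecs = vecs[:max_len]                                # truncate if too long
    vecs += [np.zeros(emb_dim)] * (max_len - len(vecs))  # zero-pad if too short
    return np.stack(vecs)

# Toy lookup table standing in for a trained Word2Vec model (hypothetical values).
w2v = {"func": np.array([1.0, 0.0]), "decl": np.array([0.0, 1.0])}
m = vectorize(["func", "decl", "unknown"], w2v, emb_dim=2, max_len=4)
```

Here `m` has shape (4, 2): the out-of-vocabulary token "unknown" maps to the zero vector, and a fourth zero row is added as padding.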
3.1.2. Feature Selection
Algorithm 2. Feature Selection
Input: Standardized numeric vectors
Output: Selected feature subset
Begin
Step 1: Obtain feature subsets
  Initialize models: Random Forest (RF), Decision Tree (DT), LightGBM (LGBM);
  Train each model on the standardized numeric vectors;
  If model = RF: calculate feature importance based on Gini impurity;
  If model = DT: calculate feature importance based on information gain;
  If model = LGBM: calculate feature importance based on Gini index reduction;
  Store the feature importance and evaluation results for each model;
  Obtain the feature subsets;
Step 2: Generate the selected feature dataset
  For each feature subset do:
    Perform one-way ANOVA; calculate the F-statistic and p-value;
    If p-value < 0.05, reject the null hypothesis (significant difference between groups) and select the features that are significant in ANOVA and ranked in the top 30 by feature importance;
    Else retain the null hypothesis (no significant difference between groups);
  End for
  Merge the feature importance scores and corresponding features from RF, DT, and LGBM;
  Sort the features by importance score;
  Obtain the selected feature dataset;
End
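The ANOVA step of Algorithm 2 can be sketched with NumPy alone. This is a simplified illustration under stated assumptions: it computes only the one-way F-statistic per feature for a binary label (the full method also uses the model-based importances from RF/DT/LGBM and converts F into a p-value via the F distribution, e.g. `scipy.stats.f.sf`, which is omitted here).

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F-statistic for each feature column of X, grouped by label y."""
    groups = [X[y == c] for c in np.unique(y)]
    n, k = len(X), len(groups)
    grand = X.mean(axis=0)
    # Between-group and within-group sums of squares, per feature.
    ss_between = sum(len(g) * (g.mean(axis=0) - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean(axis=0)) ** 2).sum(axis=0) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 3))
X[:, 0] += 2 * y               # feature 0 separates the classes; 1 and 2 are noise
F = anova_f(X, y)
```

A feature whose group means differ (feature 0 above) yields a large F, while pure-noise features yield F near 1; ranking by F and keeping the top-scoring features mirrors the selection step.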
3.2. Balanced Dataset Based on Integrated Sampling
Algorithm 3. Integrated Oversampling and Outlier Detection
Input: Feature representation dataset (with class imbalance)
Output: Balanced dataset (a balanced and outlier-free synthetic dataset)
Begin
Step 1: Integrated oversampling
  Confirm the class distribution in the dataset; analyze the proportions of vulnerable-code and normal-code samples;
  For each oversampling method in [ADASYN, SMOTE, SVMSMOTE, Borderline-SMOTE] do:
    Apply the oversampling method to generate synthetic vulnerable-class samples;
    Store the generated synthetic samples;
  End for
  Merge all synthetic samples generated by the different methods into a synthetic vulnerable-code dataset;
  Obtain the balanced dataset after oversampling;
Step 2: Outlier detection
  Train a One-Class SVM on the dataset to obtain the decision boundary f;
  For each data point x in the dataset do:
    If f(x) ≥ 0, mark x as a normal point; else mark x as an outlier;
  End for
  Remove all data points marked as outliers;
  For each data point in the dataset do:
    Calculate the local reachability density and compute the LOF value;
    If LOF > threshold, mark the point as an outlier; else mark it as normal;
  End for
  Remove all data points marked as outliers;
  Obtain the balanced dataset (a balanced and outlier-free synthetic dataset);
End
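The core interpolation idea behind the oversamplers in Step 1 can be sketched in a few lines of NumPy. This is a minimal SMOTE-style sketch, not the paper's IOS scheme: the actual method combines ADASYN, SMOTE, SVMSMOTE, and Borderline-SMOTE (e.g. via imbalanced-learn) and then filters with One-Class SVM and LOF, which are omitted here.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples: each lies on the segment between
    a minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # skip the point itself
        j = rng.choice(nbrs)
        lam = rng.random()                     # interpolation coefficient in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# Four minority samples at the corners of the unit square (toy data).
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synth = smote_like(X_min, n_new=10)
```

Because each synthetic point is a convex combination of two minority samples, all generated points stay inside the minority region; the subsequent One-Class SVM and LOF passes then discard any synthetic points that fall in low-density areas.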
3.3. Software Vulnerability Detection Based on Bi-LSTM
4. Experimental Results and Analysis
4.1. Dataset
4.2. Experimental Environment
4.3. Performance Evaluation Metrics
4.4. Results and Analysis
4.4.1. Comparison and Analysis of Class Balancing Methods
4.4.2. Comparison and Analysis of Feature Selection Methods
4.4.3. Ablation Experiments
4.4.4. Comparison of Classification Performance of Different DL Models
5. Summary
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
MFISM | multi-feature screening and integrated sampling model |
AST | abstract syntax tree |
ANOVA | analysis of variance |
DL | deep learning |
ML | machine learning |
CNNs | convolutional neural networks |
RNN | recurrent neural network |
BRNN | bidirectional recurrent neural network |
LSTM | long short-term memory network |
Bi-LSTM | bidirectional long short-term memory network |
NVD | National Vulnerability Database |
SARD | Software Assurance Reference Dataset |
CVE | Common Vulnerabilities and Exposures dataset |
CFG | control flow graph |
PDG | program dependency graph |
IOS | integrated oversampling |
LOF | local outlier factor |
ODWO | Original Data Without Oversampling |
ADASYN | Adaptive Synthetic Sampling |
RUS | random undersampling |
SMOTE | Synthetic Minority Oversampling Technique |
MCC | Matthews correlation coefficient |
AUC-PR | Area Under the Precision–Recall Curve |
IR | imbalance ratio |
BGNN4VD | Bidirectional Graph Neural Network for Vulnerability Detection |
References
- Senanayake, J.; Kalutarage, H.; Al-Kadri, M.O.; Petrovski, A.; Piras, L. Android source code vulnerability detection: A systematic literature review. ACM Comput. Surv. 2023, 55, 1–37.
- Su, X.H.; Zheng, W.N.; Jiang, Y.; Wei, H.W.; Wan, J.Y.; Wei, Z.Y. Research and Progress on Source Code Vulnerability Detection Based on Learning. Chin. J. Comput. 2024, 47, 337–374.
- Coulter, R.; Han, Q.-L.; Pan, L.; Zhang, J.; Xiang, Y. Data-driven cyber security in perspective—Intelligent traffic analysis. IEEE Trans. Cybern. 2019, 50, 3081–3093.
- Lipp, S.; Banescu, S.; Pretschner, A. An empirical study on the effectiveness of static C code analyzers for vulnerability detection. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual, 18–22 July 2022; pp. 544–555.
- Liu, R.; Wang, Y.; Xu, H.; Sun, J.; Zhang, F.; Li, P.; Guo, Z. Vul-LMGNNs: Fusing language models and online-distilled graph neural networks for code vulnerability detection. Inf. Fusion 2025, 115, 102748.
- Cao, S.; Sun, X.; Bo, L.; Wei, Y.; Li, B. Bgnn4vd: Constructing bidirectional graph neural-network for vulnerability detection. Inf. Softw. Technol. 2021, 136, 106576.
- Cao, S.; Sun, X.; Bo, L.; Wu, R.; Li, B.; Tao, C. MVD: Memory-related vulnerability detection based on flow-sensitive graph neural networks. In Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA, 25–27 May 2022; pp. 1456–1468.
- Zhao, L.; Chen, S.; Xu, Z.; Liu, C.; Zhang, L.; Wu, J.; Sun, J.; Liu, Y. Software composition analysis for vulnerability detection: An empirical study on Java projects. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA, 3–9 December 2023; pp. 960–972.
- Pan, S.; Bao, L.; Xia, X.; Lo, D.; Li, S. Fine-grained commit-level vulnerability type prediction by CWE tree structure. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023; pp. 957–969.
- Wu, B.; Zou, F. Code vulnerability detection based on deep sequence and graph models: A survey. Secur. Commun. Netw. 2022, 2022, 1176898.
- Liang, H.; Yang, Y.; Sun, L.; Jiang, L. Jsac: A novel framework to detect malicious javascript via cnns over ast and cfg. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
- Wang, H.; Ye, G.; Tang, Z.; Tan, S.H.; Huang, S.; Fang, D.; Feng, Y.; Bian, L.; Wang, Z. Combining graph-based learning with automated data collection for code vulnerability detection. IEEE Trans. Inf. Forensics Secur. 2020, 16, 1943–1958.
- Liu, S.; Lin, G.; Han, Q.-L.; Wen, S.; Zhang, J.; Xiang, Y. DeepBalance: Deep-Learning and Fuzzy Oversampling for Vulnerability Detection. IEEE Trans. Fuzzy Syst. 2019, 28, 1329–1343.
- Lu, G.; Ju, X.; Chen, X.; Pei, W.; Cai, Z. GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning. J. Syst. Softw. 2024, 212, 112031.
- Wang, L.; Han, M.; Li, X.; Zhang, N.; Cheng, H. Review of classification methods on unbalanced data sets. IEEE Access 2021, 9, 64606–64628.
- Li, Y.; Wang, S.; Nguyen, T.N. Vulnerability detection with fine-grained interpretations. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, 23–28 August 2021; pp. 292–303.
- Pathak, A.; Barman, U.; Kumar, T.S. Machine learning approach to detect android malware using feature-selection based on feature importance score. J. Eng. Res. 2024, in press.
- Zheng, W.; Gao, J.; Wu, X.; Xun, Y.; Liu, G.; Chen, X. An empirical study of high-impact factors for machine learning-based vulnerability detection. In Proceedings of the 2020 IEEE 2nd International Workshop on Intelligent Bug Fixing (IBF), London, ON, Canada, 18 February 2020; pp. 26–34.
- Hin, D.; Kan, A.; Chen, H.; Babar, M.A. LineVD: Statement-level vulnerability detection using graph neural networks. In Proceedings of the 19th International Conference on Mining Software Repositories, Pittsburgh, PA, USA, 23–24 May 2022; pp. 596–607.
- Nguyen, V.A.; Nguyen, D.Q.; Nguyen, V.; Le, T.; Tran, Q.H.; Phung, D. ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection. arXiv 2021, arXiv:2110.07317.
- Liu, C.; Li, B.; Zhao, J.; Zhen, Z.; Feng, W.; Liu, X. TI-MVD: A temporal interaction-enhanced model for malware variants detection. Knowl. Based Syst. 2023, 278, 110850.
- Zhang, C.; Xin, Y. VulGAI: Vulnerability Detection Based on Graphs And Images. Comput. Secur. 2023, 135, 103501.
- He, H.; Wang, S.; Wang, Y.; Liu, K.; Yu, L. VulTR: Software vulnerability detection model based on multi-layer key feature enhancement. Comput. Secur. 2025, 148, 104139.
- Peng, B.; Zhang, J.; Liu, Z.; Su, P. CEVulDet: A Code Edge Representation Learnable Vulnerability Detector. In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 18–23 June 2023; pp. 1–8.
- Li, Z.; Zou, D.; Xu, S.; Ou, X.; Jin, H.; Wang, S.; Deng, Z.; Zhong, Y. Vuldeepecker: A deep learning-based system for vulnerability detection. arXiv 2018, arXiv:1801.01681.
- Li, Z.; Zou, D.; Xu, S.; Jin, H.; Zhu, Y.; Chen, Z. SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities. IEEE Trans. Dependable Secur. Comput. 2022, 19, 2244–2258.
- Hanif, H.; Maffeis, S. VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection. arXiv 2022, arXiv:2205.12424.
- Feng, Z.; Guo, D.; Tang, D.; Duan, N.; Feng, X.; Gong, M.; Shou, L.; Qin, B.; Liu, T.; Jiang, D.; et al. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. arXiv 2020, arXiv:2002.08155.
- Li, W.; Dou, S.; Wu, Y.; Li, C.; Liu, Y. COCL: An Intelligent Framework for Enhancing Deep Learning-Based Vulnerability Detection. IEEE Trans. Ind. Inform. 2023, 20, 4953–4961.
- Ban, X.; Liu, S.; Chen, C.; Chua, C. A performance evaluation of deep-learnt features for software vulnerability detection. Concurr. Comput. Pract. Exp. 2019, 31, e5103.
- Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 27.
- Yamaguchi, F.; Lottmann, M.; Rieck, K. Generalized vulnerability extrapolation using abstract syntax trees. In Proceedings of the 28th Annual Computer Security Applications Conference, Orlando, FL, USA, 3–7 December 2012; pp. 359–368.
- Moonen, L. Generating robust parsers using island grammars. In Proceedings of the Eighth Working Conference on Reverse Engineering, Stuttgart, Germany, 2–5 October 2001; pp. 13–22.
- Church, K.W. Word2Vec. Nat. Lang. Eng. 2017, 23, 155–162.
- Han, D.; Li, H.; Fu, X.; Zhou, S. Traffic Feature Selection and Distributed Denial of Service Attack Detection in Software-Defined Networks Based on Machine Learning. Sensors 2024, 24, 4344.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- De Ville, B. Decision trees. Wiley Interdiscip. Rev. Comput. Stat. 2013, 5, 448–455.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. Lightgbm: A highly efficient gradient boosting decision tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Ståhle, L.; Wold, S. Analysis of variance (ANOVA). Chemom. Intell. Lab. Syst. 1989, 6, 259–272.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Tang, Y.; Zhang, Y.-Q.; Chawla, N.V.; Krasser, S. SVMs modeling for highly imbalanced classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2008, 39, 281–288.
- Han, H.; Wang, W.-Y.; Mao, B.-H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005; pp. 878–887.
- Liu, Y.; Liu, Y.; Bruce, X.; Zhong, S.; Hu, Z. Noise-robust oversampling for imbalanced data classification. Pattern Recognit. 2023, 133, 109008.
- Schölkopf, B.; Williamson, R.C.; Smola, A.; Shawe-Taylor, J.; Platt, J. Support Vector Method for Novelty Detection. In Proceedings of the Advances in Neural Information Processing Systems 12, NIPS Conference, Denver, CO, USA, 29 November–4 December 1999; MIT Press: Cambridge, MA, USA, 1999.
- Breunig, M.M.; Kriegel, H.-P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 16–18 May 2000; pp. 93–104.
- Lin, G.; Zhang, J.; Luo, W.; Pan, L.; Xiang, Y. POSTER: Vulnerability discovery with function representation learning from unlabeled projects. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 2539–2541.
- Lin, G.; Zhang, J.; Luo, W.; Pan, L.; Xiang, Y.; De Vel, O.; Montague, P. Cross-project transfer representation learning for vulnerable function discovery. IEEE Trans. Ind. Inform. 2018, 14, 3289–3297.
- Ghaffarian, S.M.; Shahriari, H.R. Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey. ACM Comput. Surv. (CSUR) 2017, 50, 1–36.
- Dam, H.K.; Tran, T.; Pham, T.; Ng, S.W.; Grundy, J.; Ghose, A. Automatic feature learning for vulnerability prediction. arXiv 2017, arXiv:1708.02368.
- Kim, Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Moschitti, A., Pang, B., Daelemans, W., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2014; pp. 1746–1751.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692.
- Zhou, Y.; Liu, S.; Siow, J.; Du, X.; Liu, Y. Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. arXiv 2019, arXiv:1909.03496.
- Lu, S.; Guo, D.; Ren, S.; Huang, J.; Svyatkovskiy, A.; Blanco, A.; Clement, C.; Drain, D.; Jiang, D.; Tang, D.; et al. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. arXiv 2021, arXiv:2102.04664.
- Guo, D.; Ren, S.; Lu, S.; Feng, Z.; Tang, D.; Liu, S.; Zhou, L.; Duan, N.; Svyatkovskiy, A.; Fu, S.; et al. GraphCodeBERT: Pre-training Code Representations with Data Flow. arXiv 2020, arXiv:2009.08366.
Layer | Activation Function | Output Shape | Params |
---|---|---|---|
Embedding | None | (batch_size, 30, 100) | 1,025,000 |
Bi-LSTM | tanh | (batch_size, 30, 128) | 84,480 |
GlobalMaxPooling | None | (batch_size, 128) | 0 |
Dense1 | tanh | (batch_size, 128) | 16,512 |
Dense2 | None | (batch_size, 64) | 8,256 |
Dense3 | sigmoid | (batch_size, 1) | 65 |
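The parameter counts in the layer table above can be reproduced from the shapes alone. The vocabulary size of 10,250 is inferred here from 1,025,000 / 100 and is an assumption, as are the 64 LSTM units per direction (inferred from the 128-wide bidirectional output).

```python
# Assumed: 10,250-token vocabulary, 100-dim embeddings, 64 LSTM units per direction.
vocab, emb, units = 10_250, 100, 64

embedding = vocab * emb                          # one vector per vocabulary entry
# Bi-LSTM: two directions, four gates, each gate has (input + recurrent + bias) weights.
bilstm = 2 * 4 * ((emb + units + 1) * units)
dense = lambda n_in, n_out: n_in * n_out + n_out  # weights + biases
d1 = dense(2 * units, 128)                        # pooled 128-dim input -> Dense1
d2 = dense(128, 64)                               # Dense2
d3 = dense(64, 1)                                 # Dense3 (sigmoid output)
```

These expressions evaluate to 1,025,000, 84,480, 16,512, 8,256, and 65, matching the Params column (GlobalMaxPooling has no trainable parameters).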
Dataset | Func_Total | Non_Func | Vul_Func |
---|---|---|---|
LibTIFF | 5757 | 5565 | 192 |
LibPNG | 621 | 577 | 44 |
FFmpeg | 825 | 731 | 94 |
Final_data | 7202 | 6873 | 329 |
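The severity of the class imbalance in the table above can be quantified with the imbalance ratio (IR), taken here as non-vulnerable over vulnerable functions; this definition is an assumption, matching the IR abbreviation listed later.

```python
# (non-vulnerable, vulnerable) function counts from the dataset table.
counts = {"LibTIFF": (5565, 192), "LibPNG": (577, 44),
          "FFmpeg": (731, 94), "Final_data": (6873, 329)}
ir = {name: round(non / vul, 1) for name, (non, vul) in counts.items()}
```

This gives IRs of roughly 29.0 (LibTIFF), 13.1 (LibPNG), 7.8 (FFmpeg), and 20.9 for the merged dataset, which is why the oversampling stage of Section 3.2 is needed.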
Feature_Method | Precision | Recall | F1_Score | Accuracy |
---|---|---|---|---|
DL_DT | 98.12% | 97.92% | 98.01% | 98.06% |
DL_RF | 97.53% | 97.65% | 97.42% | 97.47% |
DL_LightGBM | 98.26% | 98.18% | 98.24% | 98.25% |
MFISM | 99.29% | 99.28% | 99.30% | 99.30% |
Model | Precision | Recall | F1_Score | Accuracy | Training Time/Epochs |
---|---|---|---|---|---|
Model1 | 82.98% | 76.12% | 79.44% | 95.13% | 38 m/22 epochs |
Model2 | 85.70% | 87.92% | 86.77% | 97.45% | 92 m/21 epochs |
Model3 | 88.43% | 71.88% | 79.29% | 96.71% | 2 m/19 epochs |
Model4 | 87.41% | 85.74% | 86.55% | 97.55% | 6 m/22 epochs |
Model | Accuracy |
---|---|
Bi-LSTM | 59.37% |
TextCNN | 60.69% |
RoBERTa | 61.05% |
CodeBERT | 62.08% |
Devign (Idx + CB) | 60.43% |
ReGVD (GCN + UniT + G-CB) | 63.69% |
MFISM | 66.41% |
Model | Training Duration |
---|---|
DeepBalance | 6 h 10 min |
MFISM | 2 h 18 min |
He, X.; Asiya; Han, D.; Zhou, S.; Fu, X.; Li, H. An Improved Software Source Code Vulnerability Detection Method: Combination of Multi-Feature Screening and Integrated Sampling Model. Sensors 2025, 25, 1816. https://doi.org/10.3390/s25061816