An Architecture as an Alternative to Gradient Boosted Decision Trees for Multiple Machine Learning Tasks
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. GBDT Algorithm
3.2. Proposed Architecture
3.2.1. Input Layer
3.2.2. Decision Forest Layer
3.2.3. Fully Connected Layers
3.2.4. Loss Layer
4. Optimization
Algorithm 1 The back-propagation procedure for the proposed architecture
Input: the training set, the maximal iteration number k, the number n of hidden nodes in the decision forest layer, and the shrinkage s.
Output: the trained model.
1. Initialize the tree ensembles and randomly initialize the fully connected weights from a Gaussian distribution.
2. Repeat:
3. Perform forward propagation.
4. Update the fully connected weights by Equation (20).
5. For i = 1, …, n, add a new tree to the ensemble of the i-th hidden node in the decision forest layer by Equation (22).
6. Increase the iteration counter.
7. Until the number of iterations reaches k.
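To make the loop concrete, the following is a minimal sketch of Algorithm 1 for a regression setting with a single fully connected layer and a squared-error loss. The gradient-descent weight update stands in for Equation (20) and the per-node tree fitting for Equation (22); the scikit-learn trees, variable names, and hyperparameter values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of Algorithm 1 (assumptions: one FC layer, 0.5*MSE loss,
# sklearn regression trees as the boosted learners in the forest layer).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train(X, y, n_hidden=8, max_iter=50, shrinkage=0.1, lr=0.01, max_depth=3):
    N, _ = X.shape
    rng = np.random.default_rng(0)
    # Step 1: randomly initialize the fully connected weights (Gaussian).
    W = rng.normal(scale=0.1, size=n_hidden)
    b = 0.0
    # One boosted ensemble of regression trees per hidden node.
    forests = [[] for _ in range(n_hidden)]

    def hidden_outputs(X):
        # Each hidden node outputs the shrunken sum of its trees' predictions.
        H = np.zeros((X.shape[0], n_hidden))
        for i, forest in enumerate(forests):
            for tree in forest:
                H[:, i] += shrinkage * tree.predict(X)
        return H

    for t in range(max_iter):
        # Step 3: forward propagation.
        H = hidden_outputs(X)
        pred = H @ W + b
        err = pred - y                     # dL/dpred for the 0.5*MSE loss
        # Step 4: update the fully connected weights (stand-in for Eq. (20)).
        W -= lr * (H.T @ err) / N
        b -= lr * err.mean()
        # Step 5: for each hidden node, fit a new tree to the negative gradient
        # of the loss w.r.t. that node's output (stand-in for Eq. (22)).
        for i in range(n_hidden):
            g = err * W[i]                 # dL/dH[:, i]
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, -g)
            forests[i].append(tree)
    return forests, W, b
```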
5. Experiments
- GBDT [18], which consists of an ensemble of regression trees. When building the trees, it automatically selects the features with the largest information gain and combines the selected features to fit the training targets.
- MLP [24], which is composed of a number of interconnected processing elements and processes information through their dynamic state response to external inputs. We report experimental results for two variants, MLP_1 with one hidden layer and MLP_2 with two hidden layers (a minimal configuration sketch for the GBDT and MLP baselines is given after this list).
- NNRF [30], a decision-tree-like multi-layer perceptron network that retains the properties of a decision tree and activates only one path for each input. (The authors of NNRF reported experimental results only for the classification task, so we do not use NNRF as a baseline for the regression task.)
- RankNet [34], a pairwise neural network method that learns ranking functions through a probabilistic cost. Two sentences from the same document form a pair, and the pair's label is determined by the scores of the sentences.
- LambdaMART [21], which builds on RankNet and directly optimizes NDCG. LambdaMART defines a weight that represents the change in NDCG when a pair of documents is swapped, and this weight is used to update the model in the next iteration.
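The sketch below shows one way the GBDT and MLP baselines could be instantiated, together with an NDCG@k helper for the ranking experiments. The scikit-learn estimators and all hyperparameter values are placeholder assumptions, not the settings reported in Section 5.3.

```python
# Hedged configuration sketch for the GBDT / MLP baselines and the NDCG@k metric.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Placeholder hyperparameters; the paper's settings are given in Section 5.3.
gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
mlp_1 = MLPClassifier(hidden_layer_sizes=(128,))      # one hidden layer
mlp_2 = MLPClassifier(hidden_layer_sizes=(128, 64))   # two hidden layers

def ndcg_at_k(relevance, k):
    """NDCG@k for one query; `relevance` is ordered by predicted score."""
    rel = np.asarray(relevance, dtype=float)[:k]
    gains = (2.0 ** rel - 1.0) / np.log2(np.arange(2, rel.size + 2))
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    ideal_gains = (2.0 ** ideal - 1.0) / np.log2(np.arange(2, ideal.size + 2))
    return gains.sum() / ideal_gains.sum() if ideal_gains.sum() > 0 else 0.0
```

Each classifier is trained with `fit(X_train, y_train)` and evaluated on the held-out split; `ndcg_at_k` is applied per query and averaged to obtain NDCG@1, NDCG@3, and mean NDCG.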
5.1. Datasets
5.2. Evaluation Metrics
5.3. Experiment Settings
5.4. Results
5.5. Parameter Analysis
6. Complexity
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Khan, H.; Wang, X.; Liu, H. Handling missing data through deep convolutional neural network. Inf. Sci. 2022, 595, 278–293. [Google Scholar] [CrossRef]
- Zhou, S.; Deng, X.; Li, C.; Liu, Y.; Jiang, H. Recognition-Oriented Image Compressive Sensing With Deep Learning. IEEE Trans. Multimed. 2023, 25, 2022–2032. [Google Scholar] [CrossRef]
- Li, S.; Dai, W.; Zheng, Z.; Li, C.; Zou, J.; Xiong, H. Reversible Autoencoder: A CNN-Based Nonlinear Lifting Scheme for Image Reconstruction. IEEE Trans. Signal Process. 2021, 69, 3117–3131. [Google Scholar] [CrossRef]
- Rasheed, M.T.; Guo, G.; Shi, D.; Khan, H.; Cheng, X. An empirical study on retinex methods for low-light image enhancement. Remote Sens. 2022, 14, 4608. [Google Scholar] [CrossRef]
- Rasheed, M.T.; Shi, D.; Khan, H. A comprehensive experiment-based review of low-light image enhancement methods and benchmarking low-light image quality assessment. Signal Process. 2023, 204, 108821. [Google Scholar] [CrossRef]
- Soleymanpour, M.; Johnson, M.T.; Soleymanpour, R.; Berry, J. Synthesizing Dysarthric Speech Using Multi-Speaker Tts For Dysarthric Speech Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, 23–27 May 2022; pp. 7382–7386. [Google Scholar]
- Lu, H.; Li, N.; Song, T.; Wang, L.; Dang, J.; Wang, X.; Zhang, S. Speech and Noise Dual-Stream Spectrogram Refine Network With Speech Distortion Loss For Robust Speech Recognition. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Liu, J.; Fang, Y.; Yu, Z.; Wu, T. Design and Construction of a Knowledge Database for Learning Japanese Grammar Using Natural Language Processing and Machine Learning Techniques. In Proceedings of the 2022 4th International Conference on Natural Language Processing (ICNLP), Xi’an, China, 25–27 March 2022; pp. 371–375. [Google Scholar]
- Collobert, R.; Weston, J. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 160–167. [Google Scholar]
- Collobert, R.; Weston, J.; Bottou, L. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537. [Google Scholar]
- Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar]
- Wang, X.; Gao, L.; Song, J. Beyond Frame-level CNN: Saliency-Aware 3-D CNN With LSTM for Video Action Recognition. IEEE Signal Process. Lett. 2017, 24, 510–514. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
- Wang, F.; Tax, D.M. Survey on the attention based RNN model and its applications in computer vision. arXiv 2016, arXiv:1601.06823. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Díaz-Uriarte, R.; De Andres, S.A. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006, 7, 3. [Google Scholar] [CrossRef] [PubMed]
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Mohan, A.; Chen, Z.; Weinberger, K. Web-search ranking with initialized gradient boosted regression trees. Proc. Learn. Rank. Chall. 2011, 14, 77–89. [Google Scholar]
- Rao, H.; Shi, X.; Rodrigue, A.; Feng, J.; Xia, Y.; Elhoseny, M.; Yuan, X.; Gu, L. Feature selection based on artificial bee colony and gradient boosting decision tree. Appl. Soft Comput. 2019, 74, 634–642. [Google Scholar] [CrossRef]
- Burges, C.J.C. From ranknet to lambdarank to lambdamart: An overview. Learning 2010, 11, 81. [Google Scholar]
- Freund, Y.; Schapire, R. A short introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 31 July–6 August 1999. [Google Scholar]
- Ke, G.; Xu, Z.; Zhang, J. DeepGBM: A deep learning framework distilled by GBDT for online prediction tasks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 384–394. [Google Scholar]
- Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 86. [Google Scholar] [CrossRef] [PubMed]
- Covington, P.; Adams, J.; Sargin, E. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 191–198. [Google Scholar]
- Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157. [Google Scholar]
- Jiang, J.; Cui, B.; Zhang, C.; Fu, F. Dimboost: Boosting gradient boosting decision tree to higher dimensions. In Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 10–15 June 2018; pp. 1363–1376. [Google Scholar]
- Biau, G.; Scornet, E.; Welbl, J. Neural random forests. arXiv 2016, arXiv:1604.07143v1. [Google Scholar] [CrossRef]
- Wang, S.H.; Aggarwal, C.C.; Liu, H. Using a Random Forest to Inspire a Neural Network and Improving on It. In Proceedings of the 2017 SIAM International Conference on Data Mining, Houston, TX, USA, 27–29 April 2017; pp. 1–9. [Google Scholar]
- Sethi, I.K. Entropy nets: From decision trees to neural networks. Proc. IEEE 1990, 78, 1605–1613. [Google Scholar] [CrossRef]
- Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152. [Google Scholar]
- De Boer, P.T.; Kroese, D.P.; Mannor, S. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67. [Google Scholar] [CrossRef]
- Burges, C.; Shaked, T.; Renshaw, E. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 7–11 August 2005; pp. 89–96. [Google Scholar]
- Järvelin, K.; Kekäläinen, J. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, 24–28 July 2000; pp. 41–48. [Google Scholar]
- Baeza-Yates, R.; Ribeiro-Neto, B. Modern Information Retrieval; ACM Press: New York, NY, USA, 1999. [Google Scholar]
- Ganjisaffar, Y.; Caruana, R.; Lopes, C.V. Bagging gradient-boosted trees for high precision, low variance ranking models. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 24–28 July 2011; pp. 85–94. [Google Scholar]
Mcg | Gvh | Lip | Chg | Aac | Alm1 | Alm2 |
---|---|---|---|---|---|---|
0.29 | 0.30 | 0.48 | 0.50 | 0.45 | 0.03 | 0.17 |
0.22 | 0.36 | 0.48 | 0.50 | 0.35 | 0.39 | 0.47 |
0.23 | 0.58 | 0.48 | 0.50 | 0.37 | 0.53 | 0.59 |
0.47 | 0.47 | 0.48 | 0.50 | 0.22 | 0.16 | 0.26 |
0.54 | 0.47 | 0.48 | 0.50 | 0.28 | 0.33 | 0.42 |
Dataset | Instances | Features | Train/Test Split | Task |
---|---|---|---|---|
Protein | 17,766 | 357 | 12,436/5330 | Classification |
Seismic | 581,012 | 54 | 406,712/174,300 | Classification |
Isolet | 7797 | 617 | 5458/2339 | Classification |
Gesture | 9873 | 32 | 6911/2962 | Classification |
Slices | 53,500 | 386 | 48,150/5350 | Regression |
YearPredictMSD | 515,345 | 90 | 463,811/51,534 | Regression |
Dataset | Number of Documents | Number of Queries | Features |
---|---|---|---|
MQ2007 | 69,623 | 1700 | 46 |
MQ2008 | 15,211 | 800 | 46 |
Methods | Protein | Seismic | Isolet | Gesture |
---|---|---|---|---|
GBDT | 0.702 (0.00078) | 0.713 (0.00172) | 0.813 (0.00067) | 0.488 (0.00096) |
MLP_1 | 0.803 (0.00175) | 0.737 (0.00084) | 0.596 (0.00093) | 0.456 (0.00087) |
MLP_2 | 0.811 (0.00162) | 0.742 (0.00178) | 0.616 (0.00128) | 0.477 (0.00114) |
NNRF | 0.693 (0.00074) | 0.711 (0.00076) | 0.857 (0.00145) | 0.462 (0.00098) |
Ours | 0.829 (0.00093) | 0.768 (0.00069) | 0.905 (0.00081) | 0.611 (0.00078) |
Methods | Slices Dataset | YearPredictMSD Dataset |
---|---|---|
GBDT | 5.18 (0.00193) | 9.38 (0.00487) |
MLP_1 | 5.78 (0.00253) | 9.33 (0.00284) |
MLP_2 | 5.63 (0.00272) | 9.31 (0.00354) |
Ours | 2.56 (0.00383) | 9.22 (0.00283) |
Methods | NDCG@1 | NDCG@3 | Mean NDCG | MAP |
---|---|---|---|---|
RankNet | 0.3418 (0.00074) | 0.3519 (0.00074) | 0.4518 (0.00074) | 0.4224 (0.00074) |
LambdaMART | 0.4137 (0.00074) | 0.4157 (0.00074) | 0.5035 (0.00074) | 0.4684 (0.00074) |
Ours | 0.4157 (0.00074) | 0.4178 (0.00074) | 0.5061 (0.00074) | 0.4712 (0.00074) |
Methods | NDCG@1 | NDCG@3 | Mean NDCG | MAP |
---|---|---|---|---|
RankNet | 0.3400 (0.00104) | 0.4000 (0.00058) | 0.4599 (0.00062) | 0.4515 (0.000118) |
LambdaMART | 0.3753 (0.00068) | 0.4312 (0.00084) | 0.4879 (0.00058) | 0.4765 (0.00099) |
Ours | 0.3766 (0.00056) | 0.4304 (0.00074) | 0.4882 (0.00083) | 0.4764 (0.00089) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).