*4.10. Statistical Clinical Model Generation Based on Feature Selection*

The goal of feature selection was to find, among 412 proteins, the subset that best classified the two disease progression groups. The procedure had two steps. In the first step, 50,000 decision trees, each containing eight variables, were randomly generated, and an AUC value was computed for each tree. Based on these AUC values, the optimal number of proteins was determined by out-of-bag error estimation to be 11. In the second step, 100 iterations of three-fold cross-validation were run on the 11 selected variables, and the probability that each variable was included in the model and the importance of each variable were calculated. We selected the five proteins with an importance greater than 0.3. Prior to model building, the data were preprocessed by centering and scaling. For the clinical models, the SVM model with a linear kernel was generated by three-fold cross-validation repeated 10 times (parameter C = 0.1052), and the RF model was built by three-fold cross-validation repeated 100 times with 1000 trees, mtry = 5, and nodesize = 5.
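
The resampling scheme described above (centering and scaling, then three-fold cross-validation repeated several times, scored by AUC) can be illustrated with a minimal sketch. It is not the authors' code. The function names (`auc`, `standardize`, `repeated_stratified_folds`, `mean_cv_auc`) are hypothetical, the folds are assumed to be stratified by class, and the centering and scaling statistics are assumed to come from the training fold only; the text specifies neither. To stay self-contained, the sketch scores a single standardized protein value in place of a fitted SVM or RF model.

```python
import random
from statistics import mean, pstdev


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def standardize(train, test):
    """Center and scale one feature using training-fold statistics only.

    Assumption: statistics come from the training fold, not the full data.
    """
    mu = mean(train)
    sd = pstdev(train) or 1.0  # guard against a constant feature
    return [(v - mu) / sd for v in train], [(v - mu) / sd for v in test]


def repeated_stratified_folds(labels, k=3, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for k-fold CV repeated `repeats` times.

    Assumption: folds are stratified, i.e. each class is dealt out
    round-robin so that every fold keeps roughly the class ratio.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[j % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


def mean_cv_auc(values, labels, k=3, repeats=10, seed=0):
    """Mean held-out AUC of one standardized feature over all folds.

    A fitted model's decision score would replace `test_scores` in practice.
    """
    aucs = []
    for train, test in repeated_stratified_folds(labels, k, repeats, seed):
        _, test_scores = standardize([values[i] for i in train],
                                     [values[i] for i in test])
        aucs.append(auc(test_scores, [labels[i] for i in test]))
    return mean(aucs)


if __name__ == "__main__":
    # Toy data (made up for illustration): 6 samples per group.
    toy_values = [5.1, 6.3, 4.8, 7.0, 5.9, 6.6, 4.9, 3.2, 5.5, 2.8, 4.1, 3.7]
    toy_labels = [1] * 6 + [0] * 6
    print(mean_cv_auc(toy_values, toy_labels, k=3, repeats=10))
```

Setting `repeats=10` mirrors the scheme reported for the SVM model and `repeats=100` the scheme reported for the RF model. The model-specific settings (C, number of trees, mtry, nodesize) are not represented here, because the sketch does not fit either model.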
