#### *2.3. Manual Approach*

Three specialist and three resident otolaryngologists evaluated the images and classified the patients into benign and malignant groups. The residents had less than two years of experience with CE-NBI images, whereas the specialists had worked with such images for more than five years. The otolaryngologists were blinded to the histologic diagnosis. Following the ELS guideline, they independently and visually evaluated the CE-NBI images of *Subset I* based on the appearance of PVC, as explained in [12].

#### *2.4. Automatic Approach*

The automatic approach used the algorithm presented in [17,18]. The algorithm consists of a pre-processing step involving vessel enhancement and segmentation [20], followed by a feature extraction step that extracts 24 geometrical features based on the consistency of gradient direction and the curvature level. A supervised classification step then uses these features with four classifiers to classify the CE-NBI images into benign and malignant groups.
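The 24 features themselves are defined in [17,18] and are not reproduced here. Purely as an illustration of what a "consistency of gradient direction" measure can look like, the sketch below computes the structure-tensor coherence, a standard per-pixel measure of how well local gradients agree in orientation. This is a hypothetical stand-in, not one of the features of [17,18]; the function name `orientation_coherence` and the window scale `rho` are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel


def orientation_coherence(image: np.ndarray, rho: float = 2.0) -> np.ndarray:
    """Per-pixel structure-tensor coherence in [0, 1] (hypothetical illustration).

    1 means all gradients in the Gaussian window share one orientation
    (e.g. a straight vessel segment); values near 0 mean no dominant orientation.
    """
    img = image.astype(float)
    gy = sobel(img, axis=0)  # derivative along rows
    gx = sobel(img, axis=1)  # derivative along columns
    # Structure tensor entries, averaged over a Gaussian window of scale rho
    jxx = gaussian_filter(gx * gx, rho)
    jyy = gaussian_filter(gy * gy, rho)
    jxy = gaussian_filter(gx * gy, rho)
    # (lambda1 - lambda2) and (lambda1 + lambda2) of the 2x2 tensor
    diff = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2)
    total = jxx + jyy
    coherence = np.zeros_like(total)
    mask = total > 1e-12  # leave flat regions (no gradient energy) at 0
    coherence[mask] = diff[mask] / total[mask]
    return coherence
```

Summary statistics of such a map (mean, percentiles) over the segmented vessel pixels would be one way to turn it into scalar features; whether and how [17,18] do this is not stated in this section.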

In this study, we made two main changes to the algorithm proposed in [17,18]. First, the Jerman filter [21] replaced the Frangi filter in the vessel enhancement step, to overcome the limitations of established enhancement functions, which are not well adapted to natural variations of the vascular morphology. Second, the hyperparameters of the four classifiers, namely Support Vector Machine (SVM) with Polykernel and with Radial Basis Function (RBF) kernel [22], k-Nearest Neighbor (kNN) [23], and Random Forest Classifier (RFC) [24], were re-tuned to obtain the best classification results on the current dataset.

To cover vascular structures of all expected sizes, the scale parameter *σ* of the Jerman filter was varied from 0.5 mm to 2.5 mm in steps of 0.5 mm. The parameter *τ*, which controls the uniformity of the response, was empirically set to 1.
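The multiscale enhancement can be sketched as follows. This is a minimal, from-scratch sketch of a 2D Jerman-style response written for illustration, not the implementation used in this study or the reference code of [21]: for each scale it computes the scale-normalized Hessian with `scipy.ndimage.gaussian_filter`, keeps the eigenvalue with the largest magnitude, regularizes it with *τ*, and finally takes the maximum response over scales. The function name and the `dark_vessels` flag are hypothetical, and `sigmas` is in pixels, so converting the 0.5 to 2.5 mm range requires the image's pixel spacing, which is not given in this section.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def jerman_vesselness_2d(image, sigmas, tau=1.0, dark_vessels=True):
    """Multiscale Jerman-style vesselness in [0, 1] (minimal sketch).

    sigmas: Gaussian scales in pixels; tau: eigenvalue regularization (0 < tau <= 1).
    """
    img = image.astype(float)
    if not dark_vessels:
        img = -img  # turn bright vessels into dark ones so the cross-vessel curvature is positive
    best = np.zeros_like(img)
    for s in sigmas:
        # Scale-normalized Hessian entries (axis 0 = rows, axis 1 = columns)
        hrr = gaussian_filter(img, s, order=(2, 0)) * s**2
        hcc = gaussian_filter(img, s, order=(0, 2)) * s**2
        hrc = gaussian_filter(img, s, order=(1, 1)) * s**2
        # Eigenvalues of the symmetric 2x2 Hessian
        root = np.sqrt((hrr - hcc) ** 2 + 4.0 * hrc**2)
        l_a = 0.5 * (hrr + hcc + root)
        l_b = 0.5 * (hrr + hcc - root)
        # Largest-magnitude eigenvalue = curvature across the vessel
        lam = np.where(np.abs(l_a) >= np.abs(l_b), l_a, l_b)
        # Regularized eigenvalue: weak positive responses are lifted to tau * max
        lam_max = lam.max()
        lam_rho = lam.copy()
        lam_rho[(lam > 0) & (lam <= tau * lam_max)] = tau * lam_max
        lam_rho[lam <= 0] = 0.0
        with np.errstate(divide="ignore", invalid="ignore"):
            resp = lam**2 * (lam_rho - lam) * 27.0 / (lam + lam_rho) ** 3
        resp[(lam >= lam_rho / 2) & (lam_rho > 0)] = 1.0  # saturate strong responses
        resp[(lam <= 0) | (lam_rho <= 0)] = 0.0  # no vessel-like curvature
        resp[~np.isfinite(resp)] = 0.0
        best = np.maximum(best, resp)  # keep the strongest response over scales
    peak = best.max()
    return best / peak if peak > 0 else best
```

For example, with a hypothetical resolution of `px_per_mm` pixels per millimetre, the scales above would be `[s * px_per_mm for s in (0.5, 1.0, 1.5, 2.0, 2.5)]` with `tau=1.0`.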

The hyperparameters of all classifiers were tuned using a grid search combined with 10-fold cross-validation.

The performance of SVM is mainly affected by the regularization parameter (*C*) and the kernel parameter (*γ*). For both Polykernel and RBF, *C* controls the trade-off between achieving a low error on the training data and keeping the decision boundary simple (a wide margin). In SVM with RBF, *γ* determines how quickly the influence of each support vector on the class boundary decays with distance from it. Both *C* and *γ* were searched from 0.001 to 1000 in tenfold increments. SVM with RBF achieved the best overall performance with *C* = 1 and *γ* = 0.01, and SVM with Polykernel with *C* = 1.
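A grid search of this kind can be sketched with scikit-learn's `GridSearchCV` and a stratified 10-fold split. This is a sketch under assumptions, not the study's code: the selection criterion is not stated, so scikit-learn's default accuracy scoring is used; `SVC(kernel="poly")` with its default degree of 3 stands in for the Polykernel, whose exponent is not given; and `tune_svms` and `GRID_VALUES` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Tenfold grid from 0.001 to 1000: 0.001, 0.01, 0.1, 1, 10, 100, 1000
GRID_VALUES = np.logspace(-3, 3, 7)


def tune_svms(X, y, seed=0):
    """Grid-search C (and gamma for RBF) with 10-fold cross-validation."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    searches = {
        "svm_rbf": GridSearchCV(
            SVC(kernel="rbf"), {"C": GRID_VALUES, "gamma": GRID_VALUES}, cv=cv
        ),
        # Assumption: polynomial kernel with sklearn's default degree (3)
        "svm_poly": GridSearchCV(SVC(kernel="poly"), {"C": GRID_VALUES}, cv=cv),
    }
    # fit() returns the search object; best_params_ holds the winning grid point
    return {name: search.fit(X, y).best_params_ for name, search in searches.items()}
```

Called on the training feature matrix and labels, it returns a dictionary such as `{"svm_rbf": {"C": ..., "gamma": ...}, "svm_poly": {"C": ...}}`.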

For kNN, the Euclidean distance was used to measure the distance between samples. To select the optimum *k*, values from 1 to 20 in steps of one were tested, and kNN achieved the best performance at *k* = 5.
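The selection of *k* can be written as an explicit loop over candidate values scored by 10-fold cross-validation. In this sketch, mean accuracy is assumed as the selection score, and `select_k` is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def select_k(X, y, k_values=range(1, 21), seed=0):
    """Return the k with the highest mean 10-fold CV accuracy, plus all scores."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = {}
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
        scores[k] = cross_val_score(knn, X, y, cv=cv).mean()
    # max() returns the first maximum, so ties go to the smallest k
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Keeping all scores, rather than only the winner, makes it easy to check how flat the accuracy curve is around the selected *k*.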

For RFC, the maximum depth of the trees and the number of estimators (trees) were tuned. The depth was varied from 1 to 20 in steps of one, and the number of estimators from 10 to 100 in steps of five. The classifier performed best with a depth of 8 and 55 trees.
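The RFC search space above corresponds to 20 × 19 = 380 grid points, each evaluated with 10-fold cross-validation. A sketch of this setup in scikit-learn, assuming `max_depth` and `n_estimators` of `RandomForestClassifier` map onto "depth of trees" and "number of estimators"; `RFC_GRID` and `make_rfc_search` are hypothetical names.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

RFC_GRID = {
    "max_depth": list(range(1, 21)),          # 1 to 20, step 1
    "n_estimators": list(range(10, 101, 5)),  # 10 to 100, step 5
}


def make_rfc_search(seed=0):
    """Build (but do not fit) the 10-fold grid search over RFC_GRID."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return GridSearchCV(
        RandomForestClassifier(random_state=seed),
        RFC_GRID,
        cv=cv,
        n_jobs=-1,  # use all CPU cores; 380 x 10 fits
    )
```

Usage: `make_rfc_search().fit(X_train, y_train).best_params_`, where `X_train` and `y_train` are the training features and labels.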

In all classification scenarios, Subset II was used as the training set and Subset I as the testing set. CE-NBI images were labeled 0 (benign) or 1 (malignant). Each classifier was trained using the image labels and the feature vectors computed from the CE-NBI images of the training set. For testing, the feature vectors computed from the CE-NBI images of the testing set were fed into the trained model of each classifier, and the predicted labels were collected.
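The train-on-Subset-II, test-on-Subset-I protocol with the tuned hyperparameters reported above can be sketched as follows. The scikit-learn estimators are stand-ins chosen for illustration; in particular, the Polykernel degree is not stated, so scikit-learn's default (3) is assumed. `build_classifiers` and `train_and_predict` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_classifiers(seed=0):
    """Classifiers with the tuned hyperparameters reported above (sklearn stand-ins)."""
    return {
        "svm_rbf": SVC(kernel="rbf", C=1.0, gamma=0.01),
        "svm_poly": SVC(kernel="poly", C=1.0),  # degree assumed: sklearn default 3
        "knn": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
        "rfc": RandomForestClassifier(n_estimators=55, max_depth=8, random_state=seed),
    }


def train_and_predict(X_train, y_train, X_test):
    """Fit each classifier on the training subset; return predicted labels (0/1) for the test subset."""
    predictions = {}
    for name, clf in build_classifiers().items():
        clf.fit(X_train, y_train)
        predictions[name] = clf.predict(np.asarray(X_test))
    return predictions
```

Here `X_train`/`y_train` would hold the feature vectors and labels of Subset II, and `X_test` the feature vectors of Subset I.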

#### **3. Evaluation Strategy**

#### *3.1. Classification Performances of Manual and Automatic Approaches*

The overall performance of the manual and automatic classifications was evaluated using two measures: sensitivity and specificity.

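In the manual classification, the otolaryngologists assessed the CE-NBI images in *Subset I* and classified each patient's image set as benign or malignant. Following [12], PVC-positive patients with a malignant histological diagnosis were considered true positive (TP) cases; accordingly, PVC-negative patients with a malignant diagnosis were false negatives (FN), PVC-negative patients with a benign diagnosis were true negatives (TN), and PVC-positive patients with a benign diagnosis were false positives (FP). With these definitions, a confusion matrix was created for each otolaryngologist, and the sensitivity and specificity, averaged over all otolaryngologists (specialists and residents), were calculated as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$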


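A minimal sketch of this per-rater computation and averaging, assuming each otolaryngologist's decisions are stored as one boolean row (True = PVC-positive, i.e. called malignant) and the histology as a boolean vector (True = malignant). The function names are hypothetical.

```python
import numpy as np


def sensitivity_specificity(pvc_positive, malignant):
    """Sensitivity and specificity of one rater over all patients."""
    pred = np.asarray(pvc_positive, dtype=bool)
    truth = np.asarray(malignant, dtype=bool)
    tp = np.sum(pred & truth)    # PVC-positive, malignant histology
    fn = np.sum(~pred & truth)   # PVC-negative, malignant histology
    tn = np.sum(~pred & ~truth)  # PVC-negative, benign histology
    fp = np.sum(pred & ~truth)   # PVC-positive, benign histology
    return tp / (tp + fn), tn / (tn + fp)


def average_over_raters(ratings, malignant):
    """Mean sensitivity and specificity over raters (one row of decisions per rater)."""
    pairs = np.array([sensitivity_specificity(r, malignant) for r in ratings])
    return pairs[:, 0].mean(), pairs[:, 1].mean()
```

For example, with histology `[True, True, False, False]` and two raters `[True, False, False, False]` (sensitivity 0.5, specificity 1.0) and `[True, True, True, False]` (sensitivity 1.0, specificity 0.5), the averages are 0.75 and 0.75. Passing only the specialists' rows, or only the residents', gives group-wise averages in the same way.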
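In the automatic classification, each classifier labeled every CE-NBI image of *Subset I* as benign or malignant. A confusion matrix was computed for each classifier from the predicted and actual image labels, where TP and FN are the malignant images predicted as malignant and as benign, respectively, and TN and FP are the benign images predicted as benign and as malignant, respectively. Sensitivity and specificity were then calculated as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$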


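For the image-level labels (0 = benign, 1 = malignant), the same two measures can be read directly off scikit-learn's `confusion_matrix`; fixing `labels=[0, 1]` guarantees the 2 × 2 layout even if one class is absent from the predictions. `classifier_sens_spec` is a hypothetical helper name.

```python
from sklearn.metrics import confusion_matrix


def classifier_sens_spec(y_true, y_pred):
    """Sensitivity and specificity from image labels (0 = benign, 1 = malignant)."""
    # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)
```

For instance, `classifier_sens_spec([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])` gives a sensitivity of 2/3 (two of three malignant images found) and a specificity of 0.5.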
Sensitivity thus measures how well the classifiers or otolaryngologists correctly identify malignant images or patients, and specificity how well they correctly identify benign ones.
