### **2. Related Work**

The significance of medical imaging for disease diagnosis is well established, particularly for the early detection and treatment of chest pathologies. During the last decades, advances in digital technology and chest radiography, as well as the rapid development of digital image retrieval, have driven progress in new technologies for the diagnosis of lung abnormalities. More specifically, research has focused on the development of Computer-Aided Diagnostic (CAD) models for abnormality detection, in order to assist medical staff. Along this line, a variety of methodologies based on machine learning techniques have been proposed, aiming at classifying and/or detecting abnormalities in patients' medical images. A number of studies have been carried out in recent years; some of their most useful outcomes are briefly presented below.

Jaeger et al. [19] proposed a CAD system for tuberculosis screening in conventional posteroanterior chest radiographs. Their model first applies a graph-cut segmentation method to extract the lung region from the CXRs; a set of texture and shape features is then computed over the lung region in order to classify the patient as normal or abnormal. Extensive numerical experiments on two real-world datasets illustrated the efficiency of the proposed CAD system for tuberculosis screening, which achieved performance exceeding that of human readings.

Melendez et al. [20] proposed a novel CAD system for detecting tuberculosis on chest X-rays based on multiple-instance learning. Their system is built on the idea of utilizing probability estimates, instead of the sign of a decision function, to guide the multiple-instance learning process. Furthermore, an advantage of their method is that it does not require a label for every feature sample during training, but only a global class label characterizing a group of samples.

Alam et al. [21] developed an efficient lung cancer detection and prediction model built around a multi-class support vector machine classifier. Image enhancement and image segmentation are performed independently at every stage of the classification process: image scaling, color space transformation and contrast enhancement are utilized for enhancement, while thresholding and marker-controlled watershed are utilized for segmentation. Subsequently, the support vector machine classifies a set of textural features extracted from the separated regions of interest. Based on their numerical experiments, the authors concluded that the proposed algorithm can efficiently detect a cancer-affected cell and identify its corresponding stage (initial, middle, or final); in case no cancer-affected cell is found in the input image, the model instead estimates the probability of lung cancer.

In a more recent work, Madani [22] focused on the detection of abnormalities in chest X-ray images when only a fairly small dataset of annotated images is available. The proposed method addresses both the scarcity of labeled data and data-domain overfitting by utilizing Generative Adversarial Networks (GANs) in a semi-supervised learning (SSL) architecture. In general, a GAN comprises two networks: a generator, which seeks to create images as realistic as possible, and a discriminator, which seeks to distinguish real data from generated data. These networks are then engaged in a minimax game that seeks the Nash equilibrium between them. Based on the experiments, the author concluded that the annotation effort required to achieve performance similar to that of supervised training techniques is reduced considerably.

In [2], Livieris et al. evaluated the classification efficacy of an ensemble SSL algorithm, called CST-Voting, for CXR classification of tuberculosis. The algorithm combines the individual predictions of three efficient self-labeled algorithms, i.e., Co-training, Self-training and Tri-training, using a simple majority voting methodology. The authors presented some interesting results illustrating the efficiency of the proposed algorithm against several classical algorithms. Additionally, their experiments led them to the conclusion that reliable and robust prediction models can be developed utilizing a few labeled and many unlabeled data. In [16], the authors extended the previous work and proposed the DTCo algorithm for the classification of X-rays. This ensemble algorithm exploits the predictions of Democratic-Co learning, Tri-training and Co-Bagging utilizing a maximum-probability voting scheme. Along this line, Livieris et al. [17] proposed the EnSL algorithm, which constitutes a generalized scheme of the previous works; more specifically, EnSL is a majority voting scheme over *N* self-labeled algorithms. Their preliminary numerical experiments demonstrated that robust classification models can be developed by adapting ensemble methodologies to the SSL framework.

Guan and Huang [23] considered the problem of multi-label thorax disease classification on chest X-ray images by proposing a Category-wise Residual Attention Learning (CRAL) framework. CRAL predicts the presence of multiple pathologies in a class-specific attentive view, aiming to suppress the influence of irrelevant classes by assigning small weights to the corresponding feature representations while, at the same time, strengthening the relevant features by assigning larger weights. More specifically, the framework consists of two modules: a feature embedding module, which learns high-level features using a neural network classifier, and an attention learning module, which focuses on exploring the assignment scheme of the different categories. Based on their numerical experiments, the authors stated that their methodology constitutes a new state of the art.

### **3. A New Weighted Voting Ensemble Self-Labeled Algorithm**

In this section, we present a detailed description of the proposed self-labeled algorithm, which is based on an ensemble philosophy, entitled Weighted voting Ensemble Self-Labeled (WvEnSL) algorithm.

Generally, the generation of an ensemble of classifiers mainly involves two steps: *Selection* and *Combination*. The selection of the component classifiers is essential for the efficiency of the ensemble, and the key to its efficacy lies in their diversity and accuracy, while the combination of the individual classifiers' predictions can take place through several techniques with different philosophies [24,25].

By taking these into consideration, the proposed algorithm is based on the idea of selecting a set *C* = (*C*1, *C*2, ... , *CN*) of *N* self-labeled classifiers, obtained by applying different algorithms (with heterogeneous model representations) to a single dataset, and combining their individual predictions through a new weighted voting methodology. It is worth noticing that weighted voting is a commonly used strategy for combining predictions in pairwise classification, in which the classifiers are not treated equally: each classifier is evaluated on an evaluation set *D* and associated with a coefficient (weight), usually proportional to its classification accuracy.

Let us consider a dataset *D* with *M* classes, which is utilized for the evaluation of each component classifier. More specifically, the performance of each classifier *Ci*, with *i* = 1, 2, ... , *N*, is evaluated on *D* and an *N* × *M* matrix *W* is defined as follows

$$W = \begin{bmatrix} w\_{1,1} & w\_{1,2} & \dots & w\_{1,M} \\ w\_{2,1} & w\_{2,2} & \dots & w\_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ w\_{N,1} & w\_{N,2} & \dots & w\_{N,M} \end{bmatrix}$$

where each element *wi*,*j* is defined by

$$w\_{i,j} = \frac{2p\_j^{(\mathcal{C}\_i)}}{|D\_j| + p\_j^{(\mathcal{C}\_i)} + q\_j^{(\mathcal{C}\_i)}},\tag{1}$$

where $D\_j$ is the set of instances of the dataset belonging to class *j*, $p\_j^{(\mathcal{C}\_i)}$ is the number of correct predictions of classifier *Ci* on $D\_j$ and $q\_j^{(\mathcal{C}\_i)}$ is the number of incorrect predictions of *Ci* that an instance belongs to class *j*. Clearly, each weight $w\_{i,j}$ is the *F*1-score of classifier *Ci* for class *j* [26]. The rationale behind (1) is to measure the efficiency of each classifier relative to each class *j* of the evaluation set *D*.
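The weight matrix of Equation (1) can be computed directly from the classifiers' predictions on the evaluation set. The following sketch (not the authors' code; `weight_matrix` is a hypothetical helper name) illustrates the per-class bookkeeping, where $|D\_j|$ counts the instances of class *j*, $p\_j^{(\mathcal{C}\_i)}$ the true positives and $q\_j^{(\mathcal{C}\_i)}$ the false positives:

```python
import numpy as np

def weight_matrix(predictions, y_true, n_classes):
    """Build the N x M matrix of Eq. (1): each entry is the per-class
    F1-score w[i, j] = 2*p_j / (|D_j| + p_j + q_j) of classifier i."""
    y_true = np.asarray(y_true)
    W = np.zeros((len(predictions), n_classes))
    for i, y_pred in enumerate(predictions):
        y_pred = np.asarray(y_pred)
        for j in range(n_classes):
            d_j = np.sum(y_true == j)                    # |D_j|
            p_j = np.sum((y_pred == j) & (y_true == j))  # correct predictions for j
            q_j = np.sum((y_pred == j) & (y_true != j))  # wrong votes for class j
            denom = d_j + p_j + q_j
            W[i, j] = 2.0 * p_j / denom if denom else 0.0
    return W
```

A classifier that is perfect on class *j* receives weight one for that class, while a classifier that never predicts *j* correctly receives weight zero, so its votes for *j* are effectively discarded.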

Subsequently, the class $\hat{y}$ of each unknown instance $\mathbf{x}$ in the test set is computed by

$$\hat{y} = \underset{j}{\text{arg}\,\text{max}} \sum\_{i=1}^{N} w\_{i,j} \,\chi\_{A}(\mathcal{C}\_{i}(\mathbf{x}) = j),\tag{2}$$

where arg max returns the index *j* corresponding to the largest value of the weighted sum, *A* = {1, 2, ... , *M*} is the set of unique class labels and $\chi\_A$ is the characteristic function, which takes the value one if the prediction of classifier *Ci* on instance $\mathbf{x}$ equals *j* ∈ *A*, and the value zero otherwise. At this point, it is worth mentioning that in our implementation we selected to evaluate the performance of each classifier of the ensemble on the initial labeled training set *L*.
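The weighted-voting rule above can be sketched in a few lines (a minimal illustration; `weighted_vote` is a hypothetical helper name, and `votes[i]` holds the class index predicted by classifier *Ci* for the instance):

```python
import numpy as np

def weighted_vote(W, votes):
    """Weighted-voting prediction: classifier i's vote for class j adds
    its per-class weight W[i, j] to that class's score; the arg max wins."""
    scores = np.zeros(W.shape[1])
    for i, j in enumerate(votes):
        scores[j] += W[i, j]   # chi_A(C_i(x) = j) selects exactly column j
    return int(np.argmax(scores))
```

Note that, unlike simple majority voting, a classifier's vote counts more for classes on which it has proven accurate on the evaluation set.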

A high-level description of the proposed framework is presented in Algorithm 1, which consists of three phases: *Training*, *Evaluation* and *Weighted-Voting Prediction*. In the Training phase, the self-labeled algorithms that constitute the ensemble are trained utilizing the same labeled dataset *L* and unlabeled dataset *U* (Steps 1–3). Subsequently, in the Evaluation phase, the trained classifiers are evaluated on the training set *L* in order to calculate the weight matrix *W* (Steps 4–9). Finally, in the Weighted-Voting Prediction phase, the final hypothesis on each unlabeled example *x* of the test set combines the individual predictions of the self-labeled algorithms utilizing the proposed weighted voting methodology (Steps 10–15). An overview of the proposed WvEnSL is depicted in Figure 1.



**Algorithm 1:** WvEnSL

**Output:** The labels of instances in the testing set.

/\* Phase I: Training \*/
**Step 1**: **for** *i* = 1 **to** *N* **do**
**Step 2**: Train *Ci* using the labeled *L* and the unlabeled dataset *U*.
**Step 3**: **end for**

/\* Phase II: Evaluation \*/
**Step 4**: **for** *i* = 1 **to** *N* **do**
**Step 5**: Apply *Ci* on the evaluation set *D*.
**Step 6**: **for** *j* = 1 **to** *M* **do**
**Step 7**: Calculate the weight $w\_{i,j} = \dfrac{2p\_j^{(\mathcal{C}\_i)}}{|D\_j| + p\_j^{(\mathcal{C}\_i)} + q\_j^{(\mathcal{C}\_i)}}$.
**Step 8**: **end for**
**Step 9**: **end for**

/\* Phase III: Weighted-Voting Prediction \*/
**Step 10**: **for each** *x* ∈ *T* **do**
**Step 11**: **for** *i* = 1 **to** *N* **do**
**Step 12**: Apply classifier *Ci* on *x*.
**Step 13**: **end for**
**Step 14**: Predict the label $\hat{y}$ of *x* using $\hat{y} = \underset{j}{\text{arg}\,\text{max}} \sum\_{i=1}^{N} w\_{i,j}\,\chi\_{A}(\mathcal{C}\_i(x) = j)$.
**Step 15**: **end for**

**Figure 1.** WvEnSL framework.
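To make the three phases concrete, the following end-to-end sketch runs them on synthetic data. It is an illustration under assumptions, not the authors' implementation: scikit-learn's `SelfTrainingClassifier` (wrapping three heterogeneous base learners) stands in for the self-labeled components, `f1_score(..., average=None)` computes the per-class weights of Equation (1), and the data, split sizes and labeling rate are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the label -1 marks unlabeled instances (U).
X, y = make_classification(n_samples=400, random_state=0)
rng = np.random.default_rng(0)
y_semi = np.where(rng.random(len(y)) < 0.2, y, -1)   # ~20% labeled (L)
X_train, y_train = X[:300], y_semi[:300]
X_test, y_test = X[300:], y[300:]

# Phase I (Training): N heterogeneous self-labeled classifiers on L and U.
ensemble = [SelfTrainingClassifier(base)
            for base in (DecisionTreeClassifier(random_state=0),
                         GaussianNB(),
                         KNeighborsClassifier())]
for clf in ensemble:
    clf.fit(X_train, y_train)

# Phase II (Evaluation): the weight matrix W holds each classifier's
# per-class F1-scores on the labeled training set L, as in Eq. (1).
mask = y_train != -1
X_L, y_L = X_train[mask], y_train[mask]
W = np.array([f1_score(y_L, clf.predict(X_L), average=None)
              for clf in ensemble])                  # shape: N x M

# Phase III (Weighted-Voting Prediction): a vote for class j counts W[i, j].
votes = np.array([clf.predict(X_test) for clf in ensemble])  # N x |T|
scores = np.zeros((len(X_test), W.shape[1]))
for i in range(len(ensemble)):
    scores[np.arange(len(X_test)), votes[i]] += W[i, votes[i]]
y_hat = scores.argmax(axis=1)
```

Any self-labeled schemes (e.g., Co-training or Tri-training implementations) could replace the `SelfTrainingClassifier` instances without changing the evaluation and voting phases.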
