**1. Introduction**

COVID-19, declared a global pandemic by the World Health Organization (WHO) in March 2020, mainly affects the respiratory tissues [1]. Chest X-ray imaging plays an important role in supporting the screening and early detection of the disease. In this context, radiologists are asked to prioritize portable chest X-ray devices, which reduce the risk of cross-contamination [2]. However, these devices provide lower image quality and a lower level of detail than fixed machinery [3]. In this critical scenario, computer-aided diagnosis (CAD) systems can be very useful for clinical practice. In recent years, in biomedical imaging, these diagnostic systems have usually been developed using computer vision and machine learning techniques, with deep learning strategies gaining particular importance. However, in supervised learning, deep learning models require a large amount of labeled data to be trained.

In medical imaging, data scarcity must be taken into account because, on many occasions, it critically limits the amount of labeled data. One way to overcome data scarcity is to generate synthetic images with generative network architectures, such as the many variants of Generative Adversarial Networks (GANs) [4]. One example of this kind of GAN model is the CycleGAN, a model that translates images from one scenario (image domain) to another.

**Citation:** Morís, D.I.; de Moura, J.; Novo, J.; Ortega, M. Portable Chest X-ray Synthetic Image Generation for the COVID-19 Screening. *Eng. Proc.* **2021**, *7*, 6. https://doi.org/10.3390/engproc2021007006


Academic Editors: Marco A. González, Javier Pereira and Manuel G. Penedo

Published: 28 September 2021


In this work, given the low availability of samples showing COVID-19 affectation, we present novel approaches to artificially increase the size of a portable chest X-ray image dataset for COVID-19 diagnosis. We combine three different and complementary CycleGAN architectures to perform oversampling with an unsupervised strategy that does not require paired data.
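For reference, the property that lets a CycleGAN train without paired data is usually expressed through a cycle-consistency term. The text does not spell out the objective used in this work, so the following is only the standard formulation from the original CycleGAN proposal, with $G: X \to Y$ and $F: Y \to X$ denoting the two generators (these symbols are assumed here and do not appear in the text):

$$
\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
$$

Under this reading, $X$ and $Y$ would be the two image classes of a given translation task, for instance normal and COVID-19 images. The term penalizes a translated image that cannot be mapped back to its source, which is what removes the need for paired samples.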

#### **2. Methodology**

The presented methodology is divided into two parts. The first part performs the synthetic image generation. The second part uses the novel set of generated images to increase the size of the original dataset, and the result is validated in a COVID-19 screening scenario.

#### *2.1. Approaches for Data Augmentation*

To increase the size of the original chest X-ray dataset, we considered three complementary scenarios, which correspond to all the pairwise combinations of the classes of the dataset. In the first scenario, normal vs. pathological, normal samples are translated to their pathological representation and vice versa. In the second scenario, normal vs. COVID-19, normal samples are converted to their hypothetical representation showing COVID-19 affectation and vice versa. In the third scenario, pathological vs. COVID-19, the same task is performed to convert pathological samples to COVID-19 and vice versa. It is important to remark that all the images from the original dataset are used to train the CycleGAN model [5].
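The three scenarios and their two translation directions each can be enumerated with a minimal sketch such as the one below. It only illustrates the combinatorics; the function and variable names are hypothetical and no CycleGAN training is performed.

```python
from itertools import combinations

# Class names follow the text; the identifiers are illustrative assumptions.
CLASSES = ["normal", "pathological", "COVID-19"]


def build_scenarios(classes):
    """Return one (class_a, class_b) pair per scenario (all pairwise combinations)."""
    return list(combinations(classes, 2))


def translation_directions(scenarios):
    """Each scenario yields two mappings: a -> b and b -> a."""
    directions = []
    for a, b in scenarios:
        directions.append((a, b))
        directions.append((b, a))
    return directions


scenarios = build_scenarios(CLASSES)
directions = translation_directions(scenarios)

for a, b in scenarios:
    print(f"scenario: {a} vs. {b}")
for src, dst in directions:
    print(f"translate {src} -> {dst}")
```

With three classes this gives three scenarios and six translation directions, so every class acts as both a source and a target of synthetic images.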

#### *2.2. Approaches for Screening Tasks*

In this second stage, we assess both the degree of separability among the generated images and the suitability of the novel set of synthetic images when the oversampled dataset is used for screening. We used a Dense Convolutional Network (DenseNet) [6] model, pretrained on the ImageNet dataset, with the same training details as stated in [7,8], given their suitability for this particular problem.

#### **3. Results and Conclusions**

The chest X-ray image dataset was provided by the Radiology Service of the Complexo Hospitalario Universitario de A Coruña (CHUAC) and comprises 600 patients divided into three classes [9]: 200 normal cases (i.e., from patients without evidence of pulmonary pathologies), 200 pathological cases (i.e., from patients with pulmonary pathologies other than COVID-19) and 200 genuine COVID-19 cases.

To demonstrate the separability and the suitability of the generated synthetic images, we conducted four experiments. The first three assess the separability among the generated images, and the fourth assesses the suitability of the novel set of generated images by evaluating the screening with the oversampled dataset. The first three experiments show proper separability among the generated images for the three scenarios. In the fourth experiment, the model obtained a global accuracy of 0.9250 on the test set. Additionally, Figure 1 shows the performance of the model on the test set for all four experiments, with remarkable correct classification ratios in every case.

**Figure 1.** Confusion matrices for the four conducted experiments. (**a**) 1st experiment (Healthy vs. Pathological); (**b**) 2nd experiment (Healthy vs. COVID-19); (**c**) 3rd experiment (Pathological vs. COVID-19); (**d**) 4th experiment (Healthy and Pathological vs. COVID-19).
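As a reminder of how a global accuracy and per-class correct classification ratios are derived from a confusion matrix like those in Figure 1, a minimal sketch follows. The counts are hypothetical placeholders chosen for illustration and are not the values of the reported experiments; the function names are likewise assumptions.

```python
def global_accuracy(confusion):
    """Fraction of correctly classified samples in a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def per_class_ratio(confusion):
    """Correct classification ratio for each true class (row-wise)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(confusion)]


# Hypothetical counts for illustration only; NOT the values shown in Figure 1.
example = [
    [45, 5],   # true class 0 (e.g., non-COVID-19): 45 correct, 5 misclassified
    [10, 40],  # true class 1 (e.g., COVID-19): 10 misclassified, 40 correct
]

print(f"accuracy: {global_accuracy(example):.4f}")
print("per-class ratios:", per_class_ratio(example))
```

The global accuracy is the sum of the diagonal divided by the total number of test samples, while each per-class ratio normalizes a diagonal entry by its row total.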

**Author Contributions:** Conceptualization, D.I.M., J.d.M., J.N. and M.O.; methodology, D.I.M., J.d.M., J.N. and M.O.; software, D.I.M. and J.d.M.; validation, J.d.M., J.N. and M.O.; investigation, J.N. and M.O.; data curation, J.N. and M.O.; writing—original draft preparation, D.I.M.; writing—review and editing, D.I.M., J.d.M., J.N. and M.O.; visualization, D.I.M.; supervision, J.d.M., J.N. and M.O.; project administration, J.N. and M.O.; funding acquisition, M.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Instituto de Salud Carlos III, Government of Spain, DTS18/00136 research project; Ministerio de Ciencia e Innovación y Universidades, Government of Spain, RTI2018-095894-B-I00 research project; Ministerio de Ciencia e Innovación, Government of Spain through the research project with reference PID2019-108435RB-I00; Consellería de Cultura, Educación e Universidade, Xunta de Galicia through the predoctoral and postdoctoral grant contracts ref. ED481A 2021/196 and ED481B 2021/059, respectively; and Grupos de Referencia Competitiva, grant ref. ED431C 2020/24; Axencia Galega de Innovación (GAIN), Xunta de Galicia, grant ref. IN845D 2020/38; CITIC, Centro de Investigación de Galicia ref. ED431G 2019/01, receives financial support from Consellería de Educación, Universidade e Formación Profesional, Xunta de Galicia, through the ERDF (80%) and Secretaría Xeral de Universidades (20%).

**Institutional Review Board Statement:** The study was approved by the Ethics Review Board and Data Management Technical Commission of Galician Health Ministry for High Impact studies with protocol code 2020-007.

**Conflicts of Interest:** The authors declare no conflict of interest.

