1. Introduction
Medical ultrasonography has become the preferred imaging technique for many diseases owing to its simplicity, speed, and safety [1,2,3,4,5]. Two-dimensional gray-scale ultrasound and color Doppler ultrasound have been widely used in the diagnosis of ovarian tumors; with them, physicians can make a first assessment of whether a tumor is benign or malignant. With the continuous development of deep learning [6,7], AI, as a driving force of intelligent healthcare, has achieved remarkable results in tasks such as medical image classification and segmentation [8,9,10,11]. The accuracy of a model also depends on the quality of the dataset [12,13]. However, there has been relatively little research on using AI for lesion recognition and segmentation in ovarian tumor diseases, and the effectiveness of AI in processing ovarian-tumor images depends on a large-scale dataset. Zhao et al. [14] proposed an ovarian-tumor ultrasound image dataset for lesion classification and segmentation. The dataset consists of 1469 2D ovarian ultrasound images divided into eight categories according to tumor type. The vast majority of the images contain annotation symbols, which are overwhelmingly distributed inside the lesion.
Nevertheless, a hidden but crucial problem has been identified in practice: most 2D ovarian-tumor ultrasound images contain extra symbols. In clinical practice, when ovarian ultrasound images are acquired, the physician marks the location, size, and border of the tumor in the image and notes where the lesion is located (left or right ovary). Owing to equipment factors and clinical workflows, these artificial marks that aid image interpretation (symbols such as fingers, crosses, dashes, and letters) cannot be separated from the original image. This phenomenon is also widespread in other medical fields [15,16,17,18]. The ideal situation would be to train and test deep learning models on clean images without any symbols in the lesion areas.
We observe that these symbols are concentrated in the ovarian tumor lesions, which adversely affects model training to a certain extent: the network focuses more on the symbols in the lesions, which in turn reduces the recognition accuracy for ovarian tumors in clean images and the segmentation accuracy of the lesions. The different types of images used in this paper are shown in Figure 1. The original images with symbols were used as the training set, and two test sets, one of clean images and one of original images with symbols, were used to investigate the impact of the symbols on the segmentation accuracy of the model.
Figure 2 and Figure A1 show the results of our experiments. Fewer training epochs were required to segment accurate lesion regions in images with symbols, and the segmented regions roughly followed the yellow line. The clean images, on the other hand, required more epochs and reached lower segmentation accuracy. These results show that the symbols provide the model with additional information that improves segmentation accuracy, which is unrealistic in clinical practice. Little research addresses this issue, and it is clearly inappropriate to train a segmentation model directly on marked ovarian-tumor ultrasound images. It is therefore critical to inpaint the corrupted areas of the images, so that healthcare professionals can use clean images for the artificial-intelligence-aided diagnosis of ovarian tumors.
Image inpainting for medical images is currently booming and has great potential for development. Existing methods are primarily divided into traditional methods and deep-learning-based methods. Traditional methods are patch-based or diffusion-based; their core idea is to exploit the redundancy of the image itself to fill the missing areas with low-level texture features. Four methods have historically been used for inpainting: interpolation [20], non-local means [21], diffusion techniques [22], and texture synthesis [23]. However, traditional methods often cannot learn the deep semantic features of medical images and therefore cannot achieve excellent results.
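As an illustration of the diffusion family of traditional methods, the following toy numpy sketch (written for exposition, not reproducing any of the cited algorithms) fills the masked region by repeatedly replacing each missing pixel with the average of its four neighbours, so that intensities diffuse inward from the known boundary:

```python
import numpy as np

def diffusion_inpaint(image, mask, n_iter=200):
    """Fill masked pixels by iteratively averaging their 4-neighbours.

    image  : 2-D float array.
    mask   : boolean array, True where pixels are missing.
    A toy version of diffusion-based (harmonic) inpainting.
    """
    out = image.copy()
    out[mask] = out[~mask].mean()          # rough initial guess for the hole
    for _ in range(n_iter):
        # average of the four neighbours (edges replicated via np.pad)
        p = np.pad(out, 1, mode="edge")
        avg = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        out[mask] = avg[mask]              # update only the missing region
    return out

# smooth horizontal gradient with an 8x8 hole in the middle
img = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
mask = np.zeros_like(img, dtype=bool)
mask[12:20, 12:20] = True
filled = diffusion_inpaint(img, mask)
```

On this smooth example the diffusion fill recovers the gradient almost exactly; the limitation noted above is precisely that such low-level propagation cannot reproduce the semantic texture of real lesions.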
Deep-learning-based methods use convolutional neural networks to extract and learn high-level semantic features of the image that guide the model in filling the missing parts. Inspired by EdgeConnect [24], Wang et al. [25] transferred the edge-information-based method to medical images. Their work combines an attention mechanism with a pyramid-structured generator to inpaint thyroid ultrasound images, automatically detecting and reconstructing the cross symbols in them. However, this method has some limitations: the cross symbols in the thyroid ultrasound images are small and few, and performance degrades on ultrasound images containing many large symbols; the detected cross symbols are labeled with rectangular boxes, so the approach does not apply to symbols with irregular shapes; and since the real background is covered by the symbols, the restored areas have no ground truth, making the question of how to guide the training and evaluation of a generative adversarial network in this setting an essential one. Wei et al. [26] proposed MagGAN for face-attribute editing. MagGAN introduces a novel mask-guided adjustment strategy that encourages the regions affected by each target attribute to be localized in the generator, using the corresponding facial attributes (eyes, nose, mouth, etc.). The method targets face-attribute editing, which requires segmentation of the facial attributes and thus differs from our task; however, its motivation of guiding the model to produce more realistic results is similar to ours.
In addition, various attention mechanisms have been proposed and are widely used in image processing, and they have gradually been applied to the image inpainting task. Zeng et al. [27] extended contextual attention with a pyramidal structure. Yi et al. [28] proposed contextual residual aggregation for high-resolution images, using a spatial attention mechanism. To obtain results with clear structure and texture, the Shift-Net model proposed by Yan et al. [29] introduced a shift-connection layer in the upsampling process, through which features from the background region are shifted to fill in the holes.
Motivated by the above issues, in this paper a one-stage generative model based on GANs is proposed, which replaces regular convolutions with fast Fourier convolutions to give the model an image-wide receptive field and incorporates a channel attention mechanism to reduce the model's focus on the symbols, so that the holes are filled using effective features. To the best of our knowledge, we are the first to perform image inpainting on 2D ovarian-tumor ultrasound images with large and irregular masks, and our approach achieves more convincing results than others.
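To make the two ingredients concrete, the following is a minimal numpy sketch, not our actual network: `spectral_mix` mimics the core step of a fast Fourier convolution (an element-wise product in the frequency domain, which gives every output pixel a receptive field covering the whole image), and `channel_attention` is a squeeze-and-excitation-style channel gate. The weights `w1`, `w2`, and `freq_weight` stand in for parameters that would be learned in practice.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel gating on a (C, H, W) feature map.

    w1 (C//r, C) and w2 (C, C//r) are stand-ins for learned weights.
    """
    squeeze = feat.mean(axis=(1, 2))                # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gate in (0, 1) per channel
    return feat * gate[:, None, None]               # reweight the channels

def spectral_mix(feat, freq_weight):
    """Toy fast-Fourier-convolution step: multiply the spectrum element-wise,
    so each output pixel depends on every input pixel."""
    spec = np.fft.rfft2(feat, axes=(-2, -1))
    return np.fft.irfft2(spec * freq_weight, s=feat.shape[-2:], axes=(-2, -1))

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))             # (C, H, W) feature map
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
gated = channel_attention(feat, w1, w2)
mixed = spectral_mix(feat, np.ones((8, 16, 9)))     # all-ones filter = identity
```

In the real model these operations sit inside convolutional residual blocks; the sketch only shows why a spectral multiply yields a global receptive field and why the channel gate can down-weight symbol-dominated channels.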
Our contributions are as follows:
We annotated the irregular symbols in 1469 2D ovarian-tumor ultrasound images and obtained the corresponding binary masks, establishing a 2D ovarian-tumor ultrasound image inpainting dataset.
We introduced fast Fourier convolution to enlarge the model's global receptive field and a channel attention mechanism to strengthen its attention to significant features; the model fills the holes using global features and significant channel features.
Compared with existing models, ours achieved better results both subjectively and objectively, while performing image inpainting without clean images for the first time.
We used the restored images for segmentation training, which significantly improved the accuracy of classification and segmentation on clean images.
The rest of the paper is organized as follows: Section 2 describes our dataset and model in detail. The associated experiments and results are presented in Section 3. The conclusions are drawn in Section 4.
4. Conclusions
In this paper, we proposed a 2D ovarian-tumor ultrasound image inpainting dataset to investigate the effect of the symbols prevalent in such images on ovarian-lesion segmentation. Based on this dataset, we proposed a 2D ovarian-tumor ultrasound image inpainting model built on fast Fourier convolution and a channel attention mechanism. The labeled images are used as a priori information to guide the model to focus on features in the non-symbolic regions of the images, and fast Fourier convolution extends the receptive field of the model, making the texture and structure of the inpainted images more realistic and their boundaries smoother. Our model outperformed existing methods in both qualitative and quantitative comparisons, achieving the best scores on all three metrics (LPIPS, FID, and SSIM), which demonstrates its effectiveness. We used the inpainted images for training and validation with the U-Net and PSPNet models, which appreciably improved the accuracy of lesion segmentation on clean images. This further demonstrates the significance of our study for the computer-aided diagnosis of ovarian tumors.
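For reference, the SSIM metric reported above can be sketched in a few lines. This is the single-window form of the standard formula (common implementations slide a Gaussian window over the image and average the local scores), not the evaluation code used in this paper:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """SSIM computed over the whole image in a single window.

    x, y : 2-D float arrays with values in [0, data_range].
    Returns 1.0 when the images are identical.
    """
    c1 = (0.01 * data_range) ** 2          # standard stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

rng = np.random.default_rng(1)
img = rng.random((64, 64))
noisy = np.clip(img + 0.2 * rng.standard_normal(img.shape), 0.0, 1.0)
```

A perfect reconstruction scores 1.0, and any distortion lowers the score, which is why a higher SSIM indicates an inpainted image closer to the original.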
Our study did not use the ground-truth lesion segmentations available in the dataset, which could further improve the similarity of lesion boundaries in the inpainted images. In future work, we will explore how to incorporate the edge information of the lesion into the model to make the boundaries more similar to those in the original image, and we will extend our model to other types of medical images, such as CT and MRI.