1. Introduction
Ovarian reserve, defined as the total number of ovarian follicles, indicates the quality and quantity of the primordial follicular pool in the ovaries [1]. In patients with infertility, predictors of functional ovarian reserve have been shown to correlate with ovarian response and pregnancy outcomes [2]. Antral follicle count (AFC) and size, obtained from transvaginal ultrasound (TVUS) images, are non-invasive imaging biomarkers used to assess and quantify ovarian reserve [2,3].
The primary aim of ovarian reserve monitoring is to measure the number and size of ovarian follicles, in particular the antral follicles, which are on average 2–10 mm in diameter [4]. Follicle size is measured as the average of each follicle's two largest orthogonal diameters [5]. Manual estimation of follicle size and count has clear limitations: the process is time-consuming, inconsistent [4], and highly variable because most follicles are not spherical [3]. An accurate, automated method to segment ovaries and follicles and to count the follicles could streamline the clinical workflow and reduce subjectivity.
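To make the measurement rule above concrete, the following sketch estimates a follicle's size from a binary segmentation mask by averaging its extents along the two principal (orthogonal) axes of the pixel cloud. The function name, the PCA-based choice of axes, and the pixel spacing are our own illustrative assumptions, not the method proposed in this paper.

```python
import numpy as np

def follicle_size(mask: np.ndarray, spacing_mm: float = 1.0) -> float:
    """Approximate the clinical follicle size: the mean of the two largest
    orthogonal diameters, estimated from a binary segmentation mask."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([ys, xs]).astype(float)
    pts -= pts.mean(axis=0)
    # Principal axes of the pixel cloud give two orthogonal diameter directions.
    _, vecs = np.linalg.eigh(np.cov(pts.T))
    proj = pts @ vecs                           # coordinates along the two axes
    extents = proj.max(axis=0) - proj.min(axis=0)
    return extents.mean() * spacing_mm

# Synthetic elliptical "follicle": semi-axes of 10 px and 5 px.
yy, xx = np.mgrid[0:64, 0:64]
mask = ((yy - 32) / 10.0) ** 2 + ((xx - 32) / 5.0) ** 2 <= 1.0
size = follicle_size(mask, spacing_mm=0.2)      # assumed 0.2 mm per pixel
```

For this ellipse the two orthogonal extents are roughly 20 px and 10 px, so the estimated size is about 3 mm at the assumed spacing; for real, irregular follicles the principal-axis extents are only an approximation of the sonographer's caliper placement.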
Developing an automated solution for ovary and follicle segmentation involves numerous challenges.
Figure 1 shows three examples of ultrasound (US) images of the ovary and follicles. Quantifying ultrasound images ensures reproducibility and reliability [6,7], but ultrasound imaging artifacts impede the performance of deep learning-based segmentation methods: blurred, ambiguous tissue boundaries and acoustic shadowing make delineation difficult [8]. Many image processing and computer vision methods have been proposed to overcome these challenges, based on geometric features [9], watershed [10], and active contours [11]. Traditional ovary and follicle monitoring methods have mostly been explored on large, distinctly visible follicles [3,12], whereas boundary ambiguity is common in ovarian and follicular images. These traditional methods have further limitations: watershed and thresholding approaches produce discontinuities under the intensity variations of ovarian ultrasound images, and their slow speed hinders adoption in clinical practice.
Convolutional neural networks (CNNs) have shown substantial performance and accuracy advancements over conventional methods [13]. Following the success of CNNs, several popular segmentation methods have been developed, such as FCN [14], U-Net [15], SegNet [16], Attention-UNet [17], DeepLabv3+ [18], ERFNet [19], and BiSeNetV2 [20], that segment objects or anatomies. These methods have achieved state-of-the-art results on various semantic segmentation tasks.
Recently, many deep learning-based methods have been developed for analyzing medical images [21,22]. Specifically, U-Net-based models have achieved great success in medical image segmentation [23,24,25]. Meng et al. [26] proposed a deep learning-based contour regression model for biomedical image segmentation. The authors aggregated multi-level and multi-stage networks to regress the contour coordinates in an end-to-end manner rather than making pixel-wise dense predictions, and applied the method to segment the fetal head in ultrasound images and the optic disc and optic cup in color fundus images. Valanarasu et al. [27] presented a network architecture called KiU-Net, which projects data onto higher dimensions and captures finer details than a standard U-Net. The method addressed performance failures when segmenting smaller anatomical structures with blurred, noisy boundaries, and was applied to brain anatomy segmentation from 2D ultrasound (US). Singh et al. [28] proposed an automated solution to segment breast lesions from US images using generative adversarial networks (GANs). Their method efficiently extracts spatial features such as texture, edge, shape, intensity, and global information, and uses an attention mechanism that highlights the most relevant features while ignoring the background. However, the GAN-based method is limited by its computational complexity and fails to delineate lesions whose shape is incomplete. Further, Yang et al. [29] combined a multi-directional recurrent neural network (RNN) with a customized CNN to extract spatial intensity co-occurrences and eliminate boundary ambiguities, applying semantic segmentation to prenatal ultrasound volumes to support fetal health monitoring.
Various deep learning-based methods have been used to detect and segment the ovary and antral follicles [8,30]. Li et al. [8] proposed an ovary and follicle segmentation model called CR-UNet, which incorporates spatially recurrent neural networks into a standard U-Net. This network has limitations in correctly detecting and delineating follicles that adjoin each other [30]. Gupta et al. [31] developed a deep learning-based framework for ovarian volume computation that utilizes 3D US volumes and the axial orientation. The authors evaluated their method on 20 3D ovarian US volumes; it enhanced the quality of the 3D rendering of the ovary and addressed the issue of merged follicles in segmentation. Yang et al. [32] introduced ovary and follicle segmentation using the contrastive rendering (C-Rend) framework, employing a semi-supervised learning approach in which C-Rend leverages unlabeled 3D ultrasound for better performance. However, this study has limitations during inference because the default hyperparameter values may not be the most suitable setting for every 3D US volume.
The clinical need to monitor smaller follicles, such as antral follicles (2–8 mm in average diameter) [4], automatically and precisely cannot be met with current AI segmentation technologies, owing to limitations such as deep learning model overfitting and imaging artifacts. Therefore, the main aim of this paper is to develop an automated method for efficient ovary and follicle segmentation in ovarian TVUS images to facilitate measuring follicle size.
Figure 2 shows a schematic view of our proposed framework. The framework comprises three stages: ovary segmentation, follicle segmentation, and follicle counting. We designed a new segmentation method that replaces the standard 2D convolution layer with a harmonic convolution. In contrast to standard convolution, harmonic convolution [33] combines learned kernels with predefined filters for feature learning; this weighted combination reduces overfitting and computational complexity. The proposed HaTU-Net method effectively extracts features that allow precise segmentation of the ovary and follicles from US images. Moreover, we developed a new attention block that improves segmentation performance by encouraging feature discriminability between pixels and suppressing US imaging artifacts. In summary, our major contributions are four-fold:
We propose a segmentation network called HaTU-Net to segment ovaries and follicles from TVUS images.
We propose using harmonic convolution [33] to replace the standard convolutional filter. The input is first decomposed using the discrete cosine transform (DCT), and the transformed signals are then combined using learned weights.
We develop a harmonic attention (HA) block to improve feature discriminability between target and background pixels in the segmentation stage. The HA block encourages informative features while avoiding artifacts, and its support within HaTU-Net leads to improved segmentation results.
Our experimental results confirm that HaTU-Net achieves significant improvement over various state-of-the-art segmentation methods (U-Net [15], Attention U-Net [17], R2U-Net [34], U-Net++ [35], and DeepLabv3+ [18]).
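For intuition about the harmonic convolution used above, the following minimal NumPy sketch correlates an input with a fixed bank of 3×3 DCT filters and mixes the responses with per-filter weights. The weights here are random placeholders for learned parameters, and the function names, shapes, and single-channel setting are our own illustrative assumptions; the actual HaTU-Net layer operates per channel inside the network.

```python
import numpy as np

def dct_basis(k: int = 3) -> np.ndarray:
    """k*k fixed 2-D DCT-II filters of size k x k (the predefined filter bank)."""
    n = np.arange(k)
    # 1-D DCT-II basis vectors: c[u, x] = cos(pi/k * (x + 0.5) * u)
    c = np.cos(np.pi / k * (n[None, :] + 0.5) * n[:, None])
    # Outer products of the 1-D bases give the k*k separable 2-D filters.
    return np.stack([np.outer(c[u], c[v]) for u in range(k) for v in range(k)])

def harmonic_conv(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One harmonic convolution: correlate x with every fixed DCT atom, then
    combine the responses with weights (standing in for learned parameters)."""
    basis = dct_basis(3)                        # (9, 3, 3) fixed filters
    h, w = x.shape
    pad = np.pad(x, 1, mode="edge")
    # im2col: stack every 3x3 neighborhood, then project onto the DCT atoms.
    patches = np.stack([pad[i:i + h, j:j + w]
                        for i in range(3) for j in range(3)], axis=-1)
    responses = patches @ basis.reshape(9, 9).T  # (h, w, 9) filter responses
    return responses @ weights                   # learned 1x1 combination

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = harmonic_conv(x, rng.standard_normal(9))     # random placeholder weights
```

Because the spatial filters are fixed, only the mixing weights are learned, which is the source of the reduced parameter count and overfitting mentioned above; the first atom is the DC (all-ones) filter, so its response is simply the local neighborhood sum.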
The remainder of this paper is organized as follows:
Section 2 describes the dataset and methodology.
Section 3 presents our experimental results and discusses the limitations of the work.
Section 4 concludes our study and suggests future lines of research.