Leverage Effective Deep Learning Searching Method for Forensic Age Estimation

Zhang, Zhi-Yong; Yan, Chun-Xia; Min, Qiao-Mei; Zhang, Yu-Xiang; Jing, Wen-Fan; Hou, Wen-Xuan; Pan, Ke-Yang

doi:10.3390/bioengineering11070674

Open AccessArticle

Leverage Effective Deep Learning Searching Method for Forensic Age Estimation

by

Zhi-Yong Zhang

^1,2,3,

Chun-Xia Yan

^2,*,

Qiao-Mei Min

³,

Yu-Xiang Zhang

²

,

Wen-Fan Jing

⁴,

Wen-Xuan Hou

⁵ and

Ke-Yang Pan

⁵

¹

Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University, Xi’an 710004, China

²

College of Forensic Medicine, Xi’an Jiaotong University Health Science Center, Xi’an 710061, China

³

Department of Orthodontics, The Affiliated Stomatological Hospital of Xi’an Jiaotong University Health Science Center, Xi’an 710004, China

⁴

Department of Radiology, The Affiliated Stomatological Hospital of Xi’an Jiaotong University Health Science Center, Xi’an 710004, China

⁵

School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China

^*

Author to whom correspondence should be addressed.

Bioengineering 2024, 11(7), 674; https://doi.org/10.3390/bioengineering11070674

Submission received: 6 May 2024 / Revised: 10 June 2024 / Accepted: 14 June 2024 / Published: 2 July 2024

(This article belongs to the Special Issue New Sight of Intelligent Algorithm Model and Medical Device in Bioengineering: Updates and Direction)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Dental age estimation is extensively employed in forensic medicine practice. However, the accuracy of conventional methods fails to satisfy the need for precision, particularly when estimating the age of adults. Herein, we propose an approach for age estimation utilizing orthopantomograms (OPGs). We propose a new dental dataset comprising OPGs of 27,957 individuals (16,383 females and 11,574 males), covering an age range from newborn to 93 years. The age annotations were meticulously verified using ID card details. Considering the distinct nature of dental data, we analyzed various neural network components to accurately estimate age, such as optimal network depth, convolution kernel size, multi-branch architecture, and early layer feature reuse. Building upon the exploration of distinctive characteristics, we further employed the widely recognized method to identify models for dental age prediction. Consequently, we discovered two sets of models: one exhibiting superior performance, and the other being lightweight. The proposed approaches, namely AGENet and AGE-SPOS, demonstrated remarkable superiority and effectiveness in our experimental results. The proposed models, AGENet and AGE-SPOS, showed exceptional effectiveness in our experiments. AGENet outperformed other CNN models significantly by achieving outstanding results. Compared to Inception-v4, with the mean absolute error (MAE) of 1.70 and 20.46 B FLOPs, our AGENet reduced the FLOPs by 2.7×. The lightweight model, AGE-SPOS, achieved an MAE of 1.80 years with only 0.95 B FLOPs, surpassing MobileNetV2 by 0.18 years while utilizing fewer computational operations. In summary, we employed an effective DNN searching method for forensic age estimation, and our methodology and findings hold significant implications for age estimation with oral imaging.

Keywords:

age estimation; deep neural network (DNN); orthopantomograms (OPGs); Neural Architecture Search (NAS)

Graphical Abstract

1. Introduction

Identification of victims and suspects is crucial in a variety of scenarios, such as mass disasters, as well as criminal and civil cases [1,2,3]. The teeth play a key role in this process as they are incredibly durable and able to withstand a wide range of extreme conditions [4,5]. This resilience makes them invaluable for accurately identifying individuals in challenging circumstances. Each tooth has unique characteristics such as shape, coloration, wear pattern, pathology, and position. These characteristics differ from person to person and are important for identification. The arrangement of teeth in the oral cavity is also unique to each individual, making them important for identification purposes. Dental characteristics are personalized traits that can help in identifying individuals [6].

Age serves as a vital aspect of an individual’s identity, acting as a crucial verification in forensic procedures. The accurate estimation of age for person holds paramount importance in forensic investigations. Age estimation relies on the physiological changes that occur during the aging process within the human body. Teeth and bones are widely utilized anatomical structures for predicting age accurately. Skeletal maturation and degeneration give rise to morphological alterations that can be used to estimate age reliably. Evaluating the size, shape, and degree epiphyseal ossification in bones allows for age determination. Various skeletal elements, for example, the clavicle [7], ilium, pubis [8], knee [9,10], and foot [11,12], are extensively utilized in the study of bone age estimation, with a special focus on the hand-wrist region [13,14,15]. Conversely, dental age estimation is viable method employed in forensic practice.

The estimation of dental age relies heavily on aging changes and can be classified into three main groups: histological, formative and degenerative changes. Formative changes involve the finalization of formation, the emergence of the crown, and the growth of the roots. Degenerative changes consist of root resorption, abrasion, secondary dentin deposition, root transparency, and deposition of cementum. These formative characteristics, along with degenerative and histological markers, undergo alterations throughout the processes of human growth and development. Consequently, these characteristics can be utilized for dental age estimation. Previous studies have employed to predict age; but these require tooth sections and are not applicable in vivo. In cooperation to such approaches, radiographic methods offer simplicity, reproducibility and non-invasiveness, making them indispensable tools for accurate age estimation [16].

Dental age is commonly determined by forensic professionals through the analysis of developmental stages [17,18] or by assessing the level of apical closure [19]. Age estimation can be achieved using digital panoramic radiographs, intraoral periapical radiographs, and computer tomography images. However, it is important to note that radiological analysis is inherently subjective, and many of the features employed are primarily applicable to pediatric and adolescent populations.

In recent years, DNN has been widely used in objective age assessment. This technological innovation has significantly improved the processing capacity and accuracy of current image applications. The technology architecture based on DNN has greatly advanced the field of image analysis through its strong characteristic learning and representation capabilities. The application of the DNN architecture to estimate the age according to OPGs in this particular area makes end-to-end age estimation possible and is expected to maximize the potential of the existing DNN architecture. Especially in the research of dental age estimation, deep learning techniques have become the preferred analytical method due to their accuracy and speed advantages. Compared to traditional methods, deep learning not only enhances the precision of dental age estimation but also significantly reduces the human resources required for forensic investigations. Currently, researchers widely utilize neural networks for age estimation analysis. Some studies [20] have shown that deep learning methods outperform traditional methods in handling complex image recognition tasks, particularly when dealing with different types of X-ray images such as OPG, MRI, and CBCT, where deep learning demonstrates stronger adaptability. Moreover, Barkmn [21] explored the impact of different super-parameters, model complexity, and dataset size on model performance. They used the EfficientNet-B4, DenseNet-201, and MobileNet V3 models to cross-authenticate OPG datasets and change batch size and sample volume. The results showed that EfficientNet-B4 performed best at the batch size of 160, while MobileNetV3 peaked at the batch size of 160. In addition, as the sample volume decreased, the model performance remained relatively excellent, highlighting the importance of super-parameter optimization. The above studies provide a point to consider elements for model selection and optimization, but the model design of neural network often relies on man-made experience and expertise and requires considerable time and effort to test and optimize. There is also a lack of an effective DNN search method for forensic age estimation for a neural network architecture that optimizes specific tasks.

In this research, a dataset of dental radiographic images from a specialized stomatological hospital was compiled, featuring labeled OPGs of 27,957 subjects. The age range of the subjects in the dataset spans from 1 to 93 years old, and the median age is 27. The accuracy of the age annotations was verified by cross-referencing with their corresponding ID card. It is worth mentioning that the dental dataset displays unique features.

Moreover, our study focuses on leveraging DNN architectures for end-to-end age estimation using OPGs. To maximize the potential of existing DNN architectures, we thoroughly analyzed various characteristics of these models for accurate age estimation in dental applications. Specifically, we investigated the effectiveness of elements, DNN depth, multi-branch architectures, and convolutional kernel size.

We utilized a Neural Architecture Search (NAS) approach to investigate appropriate deep neural network (DNN) structures for predicting age. Inspired by the characteristics of existing DNN models, we aimed to discover efficient DNN structure that offer optimal accuracy and performance. Our search space design primarily encompasses four aspects: (1) multi-branch architectures; (2) asymmetric convolution kernels; (3) network depth variations and (4) early layer feature reuse. Ultimately, our NAS-derived DNN architectures exhibited superior performance compared with the baseline models.

2. Methods

2.1. Exploration of Neural Network Elements for OPG

An X-ray image exhibits distinct characteristics compared to traditional RGB images. To date, there is a lack of specific studies on the design of DNN models for X-ray images. Although all images share similar texture, they do not possess irregular patterns. To investigate the efficacy of different DNN architecture for age estimation, we initially tried various existing models: Inception-v [22], incorporating a split-transform-merge concept, which utilizes shortcut structures and reuses previous feature layers; VGG16 [23], with a vertical connection structure; MobileNetV2 [24], a lightweight network with a structure; ShuffleNetV2 [25], employing shuffle operations; and EfficientNet [26], discovered through NAS. These models comprehensively encompass the existing rules of neural network design. Additionally, we conducted a comparative analysis on the X-ray image dataset, evaluating ResNet [27], ResNeXt [28], and DenseNet [29] with different depths of network architecture to investigate the influence of DNN depth on forecast accuracy.

The estimation of age for all the previously mentioned DNN models was calculated using mathematical expectations. Inspired by DEX [30], we firstly established the age probability distribution for every image. Then, we used mathematical expectations to generate the final age estimation, as shown in Equation (1):

y = \sum_{k} p_{k} \cdot l_{k}

(1)

where the p_k is the estimation probability that the input image belongs to label l_k_, the value of label is the real age of all the dental image. The k is the index of age for all the dental image, it ranges from 0~75.

According to research on characteristic DNN, four rules have been identified for designing effective age estimation network models for X-ray images. Firstly, it is important to incorporate multi-branch architectures, such as the split-transform-merge structure, to efficiently extract features from X-ray images. Additionally, the utilization of asymmetric convolution kernels, such as 7 × 1, 1 × 7, 3 × 1 and 1 × 3 convolutions, plays a crucial role in successful feature extraction. Moreover, determining the optimal depth for DNN architectures is critical in achieving a balance between comprehensive feature extraction and preventing overfitting. Lastly, it was determined that reusing early layer features in the DNN architecture may actually hinder feature extraction. These guidelines can help improve the effectiveness of age estimation models for X-ray images, ultimately leading to more accurate and reliable results in this field.

Rule 1 delves into the complex structure of multiple branches: architectures with multiple branches, including inception design and group convolution, have proven to be effective in extracting features, as seen in the achievements of models like ResNeXt and Inceptionv4. Our research underscores the importance of multi-branch architecture for OPG datasets. X-ray images use each individual tooth as a basic component, making them suitable for feature extraction using multi-branch structures. The use of various convolutional or pooling operations in a single layer enables efficient feature extraction and minimizes redundancy in the convolutional layer.

Rule 2 delves into the consideration of which convolution kernels are most suitable for inclusion in the DNN for analyzing X-ray images. Due to the unique attributes of X-ray images, the distribution of features can differ greatly in terms of length and width locations. Thus, the utilization of asymmetric convolutions, such as the 1 × 7 convolution, proves to be effective in accurately extracting dental features.

Rule 3 delves into the connection between network depth and the resolution of OPG. This exploration is particularly important when dealing with OPGs, as they are typically less complex than natural images. It is vital to find the right balance in terms of network depth to avoid the risk of overfitting the model. To achieve this, we employed a DNN search methodology to determine the appropriate model depth while keeping the width of the DNN architecture fixed for OPGs.

Rule 4 explores the reuse of features in the initial layers of DNN architectures. Analysis of the image patterns in the OPGs reveals a notable similarity, suggesting that the low-level features in these images also exhibit resemblances. Therefore, repurposing early layer features could potentially introduce more noise.

According to the four rules, we propose a novel DNN model that is more effective for the OPGs dataset. Subsequently, in the next section, the NAS technique is utilized to automatically generate efficient DNN architectures for OPGs.

2.2. Model Search Based on Proposed Rules for OPGs

A framework is introduced in this study to effectively search a deep neural DNN architecture for object proposal generation, as illustrated in Figure 1. The framework consists of four main steps: (1) designing the search space, (2) exploring novel architectures, (3) pre-training the model on the ImageNet dataset, and (4) fine-tuning the pre-trained OPG datasets. The four proposed design rules were utilized to effectively design a search space for the purpose of obtaining an architecture that is capable of achieving optimal performance. Subsequently, the network underwent pre-training on the ImageNet dataset. Finally, the DNN architecture was retrained using the comprehensive training dataset of OPGs, leveraging the weights and parameters obtained from pre-training. Following these sequential steps, we successfully derived the final DNN architecture specifically tailored for dental X-ray datasets, enabling accurate age estimation on inference images as depicted in Figure 2.

The exploration of model architecture is crucial in obtaining an appropriate deep neural network (DNN) model for aging estimation on OPGs. Our choice of utilizing PC-DARTS [31] for our search framework was based on its high efficiency in searching and memory utilization. This framework is capable of exploring a multi-branch structure, aligning with the proposed rule 1. PC-DARTS builds upon the renowned differentiable NAS pipeline DARTS [32]. Stacked cells make up the search network, with each cell being a directed acyclic graph (DAG) in which nodes represent feature maps and edges represent mixed operations. With

x_{i}

representing the

i t h

node, and E

(i, j)

representing the edge between

x_{i}

and

x_{i}

(

i

<

j

), the feature transformation in E

(i, j)

can be obtained, represented by

f_{i, j}

, as shown in Equation (2):

f_{i, j} (x_{i}) = \sum_{o^{'} \in O} \frac{\exp (α_{o^{'}}^{(i, j)})}{\sum_{o^{'} \in O} \exp (α_{o^{'}}^{(i, j)})} o (x_{i})

(2)

In the operation space

O

, every element represents a candidate function O(·), and the architecture parameters

α_{o}^{(i, j)}

must be optimized. During the search process, the network weights w and architecture weights

α

are adjusted alternately through gradient descent. Once fully optimized, the operation associated with the architecture weight is selected directly.

One drawback of DARTS is its inefficiency in memory utilization, due to the inability of the DARTS framework to facilitate direct search within our OPGs dataset. In order to tackle this problem, PC-DARTS utilizes partial connections between channels, mixing operations for some portions of the edge while keeping others unchanged as identity mappings. Additionally, a channel-sampling mask

S_{i, j}

is defined to selectively choose or mask specific channels.

f_{i, j}^{P C} (x_{i}; S_{i, j}) = \sum_{o \in O} \frac{\exp (α_{o}^{(i, j)})}{\sum_{o^{'} \in O} \exp (α_{o^{'}}^{(i, j)})} o (S_{i, j} * x_{i}) + (1 - S_{i, j}) * x_{i}

(3)

Partial channel connection in the searching pipeline can lead to instability. In order to address this issue, edge normalization with additional architecture weights, denoted as

β

, is employed to compute

x_{i}

:

x_{j} = \sum_{i < j} \frac{\exp (β^{(i, j)})}{\sum_{i^{'} < j} \exp (β^{(i^{'}, j)})} * f_{i, j}^{P C} (x_{i}; S_{i, j})

(4)

The search is conducted directly on the OPGs dataset rather than using a proxy dataset like CIFAR-10 or ImageNet, taking into consideration the variations in feature distribution among different datasets. Furthermore, we expanded our search scope to include asymmetric convolution and group convolution in accordance with Rule 2. The search scope

O

encompassed a total of 11 operators:

3 × 3 standard conv

3 × 3 separable conv

5 × 5 separable conv

3 × 3 group conv (groups = 9)

5 × 5 group conv (groups = 9)

1 × 3 then 3 × 1 group conv (groups = 9)

1 × 7 then 7 × 1 group conv (groups = 9)

3 × 3 max pooling

3 × 3 average pooling

identity

zero (no connection)

Each group convolution is followed by a pointwise convolution, similar to the concept of separable convolutions. The technique introduced in PC-DARTS involves utilizing every separable convolution twice within the network structure. The network consists of 8 cells, with 5 regular cells and 3 reduction cells. Specifically, the 3 reduction cells were positioned at the 1st, 3rd, and 6th out of the total 8 positions. The specifics of the explored network will be discussed in the following section.

The cellular topology in the PC-DARTS framework exhibits complexity, rendering it unsuitable for resource-constrained scenarios such as deployment on hardware platforms. Therefore, in this article, we additionally explore lightweight models based on the SPOS [33] to strike a balance between accuracy and inference speed.

The initial search space in the SPOS framework is constructed based on ShuffleNetv2; all candidate blocks in this original search space incorporate a feature reuse structure that conflicts with Rule 4. To address this issue and align with the requirements of Nature journal [34], we have redesigned space using MobileNetV2, another lightweight network. The original MobileNetV2 block lacks the inclusion of asymmetric convolutions; hence, we introduced a parallel asymmetric convolution block as a potential alternative. The configuration of this parallel asymmetrical convolutional unit is depicted on the right-hand side in Figure 3. It presents two key benefits:

(1): The depth of the parallel asymmetric convolution block is equivalent to that of the original MobileNetV2 block, ensuring seamless adherence to Rule 3.
(2): Let $C_{i n}$ and $C_{o u t}$ denote the number of input channels and output channels, respectively, in a MobileNetV2 block. Here, $E X P 2$ represents the expand ratios and $k$ represents the kernel size. The resulting output feature map has dimensions $h \times w$ (for convenience; we assume that the number of output channels of all layers in the map are h × w, which does not affect our analysis and conclusion);then, the FLOPs of the block $F_{m}$ is:

$F_{m} = ({E X P 2 \times (C}_{i n}^{2} + C_{i n} C_{o u t} + C_{i n} \times k^{2}) \times h \times w$

(5)

For our proposed parallel asymmetric convolution block, the FLOPs of the block

{F_{m}}^{'}

is:

{F_{m}}^{'} = ({E X P 2 \times (C}_{i n}^{2} + C_{i n} C_{o u t} + C_{i n} \times 7) \times h \times w

\frac{{F_{m}}^{'}}{F_{m}} = \frac{({E X P 2 \times (C}_{i n}^{2} + C_{i n} C_{o u t} + C_{i n} \times 7)) \times h \times w}{({E X P 2 \times (C}_{i n}^{2} + C_{i n} C_{o u t} + C_{i n} \times k^{2})) \times h \times w} = \frac{C_{i n} + C_{o u t} + 7}{C_{i n} + C_{o u t} + k^{2}}

(6)

In this paper, we set

k = 3

or

k = 5

, so

\frac{{F_{m}}^{'}}{F_{m}} < 1

, and the parallel asymmetric convolution block had less FLOPs than the MobuleNet2 block.

The search space under the SPOS framework is illustrated in Figure 4, where each layer consisted of three candidate blocks. In accordance with Rule 1, every candidate block incorporated at least one depth-wise convolution, resulting in a multi-branch structure. As for Rule 2, the parallel asymmetric convolution block employed both 1 × 7 and 7 × 1 convolutions, which were characterized by their asymmetrical nature. For Rule 3, we set the number of stacking blocks to be 16, with each block having a depth of 3. Considering the presence of an initial layer (the first convolutional layer) and a final classification layer, the network’s depth precisely amounted to 50 layers. Regarding Rule 4, all candidate blocks were reused structures. In general, the search space within the SPOS framework adhered to all four rules. The searched DNN model are shown in Section 4.

3. Experiments

3.1. Dataset of X-ray

We have already collected a large amount of OPGs, encompassing labeled images, comprising 16,385 females and 11,573 males. The age reached up to 93 years old with a median age of 27 years (Figure 5). Cross-referencing with the information on the ID cards guaranteed the accuracy of the age labels. The majority of images had a resolution of 1536 × 3292 pixels. The age distribution of our age dataset is depicted in Figure 5.

3.2. Experimental Settings

In order to obtain high-resolution features for age estimation, all input images were resized to 384 × 384. Image augmentation techniques such as random horizontal flipping, scaling, and translation were applied during the training stage. In the testing phase, both the original test image and its mirrored counterpart were fed into the network, and the average error of the two was calculated as the final error measurement.

During the searching stage of the PC-DARTS framework, we followed the search settings and hyper-parameters based on the original PC-DARTS model. However, we made two specific adjustments to our approach. Firstly, we partitioned our training dataset into two subsets, with 60% of images used for training and the remaining 40% reserved for architecture hyper-parameters. Secondly, we set the batch size at 16 and the initial learning rate at 0.1. In contrast, for the SPOS framework, we modified the training process by training the super-net for 20,000 mini batches, utilizing the momentum SGD optimizer with a momentum of 0.9 and an initial learning rate of 0.1. Following the inceptionv3 model, we adjusted the batch sizes to 4 × 10⁵ and 64. Additionally, we implemented the label smoothing strategy to optimize the performance of our algorithm. In our study, we applied Label Smoothing Normalization (LSR), which is commonly used in text classification tasks. Like L2 and dropout, it is a regularization method that adds noise to the label to constrain the model. The purpose is to prevent the model from predicting labels too confidently during training, prevent overfitting, and improve the model’s generalization ability.

The networks underwent initial pre-training on the ImageNet dataset and optimization through Adam with a batch size of 16. A momentum value of 0.9 was employed, while the learning rate was initialized to 0.0001, and training continued for 100 epochs. The weight decay was set to 1 × 10⁴ and 4 × 10⁵ in the PC-DARTS framework and the SPOS framework, respectively. All the DNN networks were implemented using PyTorch on a GTX 1080Ti GPU.

3.3. Experimental Results for DNN Architecture Searching

In all experiments, we selected mean absolute error (MAE) as the primary evaluation metric. MAE was computed by averaging the absolute differences between predicted ages and actual ages.

M A E = \sum_{k = 1}^{N} | \hat{l_{k}} - l_{k} | / N

(7)

where l_k represents the true age of the test image k,

\hat{l_{k}}

stands for the predicted age, and N denotes the total count of test images.

Figure 6a shows the typical cell and the reduction cell acquired through PC-DARTS, as illustrated in Figure 6b, respectively. During the evaluation stage, a total of eight cells existed, comprising five normal cells and three reduction cells, with each type of cell sharing an identical architecture. We initialized the channel count to 36 for initial model AGENet-Small, and subsequently adjusted it to 54 for our second model AGENet-Large.

The architecture of the searched AGE-SPOS is depicted in Figure 7, where out of the total 16 blocks, half of them (i.e., 8 blocks) were parallel asymmetric convolution blocks, aligning with Rule 1 stated in Section 2.1 This finding further confirms the crucial role played by asymmetric convolution in feature extraction on OPGs datasets.

AGE-SPOS demonstrated superior performance models and exhibited exceptional lightweight characteristics. Specifically, AGE-SPOS (1.0×), out all models with FLOPs, did not exceed 12.59 B (13.25 times the FLOPs of AGENet (1.0×)). Compared to Inception-v4, although the MAE increased by 0.10 years, the FLOPs were reduced by a factor of 21.54. Similar results can be observed when comparing AGE-SPOS with ResNeXt, where the MAE increased by 0.06 years, but the FLOPs decreased by a factor of 13.25. AGENet-Small achieved a reduction in MAE of 0.14 year compared to the lightweight and auto-searched EfficientNet-B0, while our FLOPs were reduced by 22.1%. More detailed results can be found in Table 1 and Figure 8.

3.4. Analysis for Our Rules

Based on our comprehensive analysis of the models we have researched and the results obtained, we conducted an in-depth examination of our four rules.

For Rule 1, compared with ResNeXt and Inception-v4, our searched model predominantly employed solutions, further validating this rule.

For Rule 2, in addition to our AGENet and AGE-SPOS, Inception-v4 displayed superior performance and distinguished itself as the sole network incorporating asymmetric convolutions. This assertion is further supported by the configuration of our search cells.

For Rule 3, it is notable that the MAE reached its minimum at a network depth of 50. Subsequently, as the network depth surpassed this threshold, there was a discernible escalation in the MAE. Moreover, when compared to ResNeXt-50, ResNeXt-101 displayed an increase in the MAE of 0.08; similarly, DenseNet-201 showed an increase in the MAE of 0.1 compared to DenseNet-121. The specific network depth was contingent upon the utilized model. Based on our empirical investigations, we recommend a depth ranging from 50 to 90 layers.

For Rule 4, the performance of ShuffleNetV2 was significantly inferior to that of other lightweight models, including ShuffleNetV2, MobileNetV2, EfficientNet and our searched AGE-SPOS. This can be attributed to the reuse of features in ShuffleNetV2. The results obtained from DenseNet further validate this conclusion. Despite its significantly higher FLOPs and parameters compared to lightweight models, DenseNet121 achieved an MAE of only 2.72. It is crucial to note that age estimation using OPGs depends on unique dental characteristics.

In this paper, the Squeeze-Excitation (SE) module was incorporated into AGE-SPOS to investigate its impact on the accuracy of AGE-SPOS. The corresponding results are presented in Table 2. The addition of the SE module to AGE-SPOS (1.5×) demonstrated a further reduction in the network’s MAE by 0.02 years. Incorporating the SE module into AGE-SPOS (1.0×) resulted in an equivalent MAE compared to its pre-SE module counterpart, whereas incorporating the module into AGE-SPOS (0.5×) yielded a significantly higher MAE of 2.61 years, surpassing the original MAE prior to incorporating the SE module. This study aimed to analyze the role of the SE module in weighting different channels within a network. When the number of channels in the network was large (i.e., wide), there tended to be increased redundancy, resulting in certain channels playing a relatively minor role in feature extraction. Currently, employing the SE module to assign weights enhanced outcomes. Nevertheless, in scenarios where the network possessed a limited number of channels (i.e., narrow network), there existed reduced redundancy within the network. At this time, the utilization of SE module for channel weighting had limited impact, leading to decreased network accuracy due to increased training difficulty. Despite its low FLOPs count and minimal computational load on the network, the SE module could not operate in parallel with the convolutional layer during actual reasoning process. After analyzing the output feature map of the preceding layer, the SE module could be inferred based on the aforementioned output feature map. Considering that the primary objective of this research was to identify a network model that effectively incorporates discrimination accuracy and inference efficiency, it is not advisable to incorporate the SE module into the AGE-SPOS explored in this study.

4. Discussion

Dental radiographs serve as a fundamental tool in oral clinical practice, playing a pivotal role in the diagnosis of dental diseases. This journal work is an extension version from our conference work, which was published in International Joint Conference on Neural Networks and the conference paper named Exploring Effective DNN Models for Forensic Age Estimation based on Panoramic Radiograph Images [35]. The journal article adds new content for the age estimation compared with conference paper.

For the task of age identification in dental X-ray images, this study builds upon research and employs a research strategy of exploring search. Firstly, it critically analyzes the structure of a network based on four existing hypotheses to assess their validity. Subsequently, these four hypotheses were utilized as prior knowledge to design the search space and explore a lightweight network structure.

The search space was enhanced by incorporating a module based on parallel asymmetric convolution, which addressed the computational requirements of lightweight networks. When considering the three key indicators of discriminant accuracy, parameter quantity, and FLOPs, the network structure proposed in this study demonstrated significantly superior performance compared to existing structures (excluding those previously searched by our research group). Moreover, it exhibited substantially higher discriminant accuracy than other lightweight network structures of the same type. The significance of the lightweight network lies primarily in the field of engineering, where it demonstrated superior performance while maintaining comparable accuracy, albeit its application is relatively limited.

Compared to the previously searched HSCNet by the research group (as presented in conference work), although the discrimination accuracy of the network proposed in this paper was slightly lower, it exhibited a significantly reduced number of FLOPs and parameters, approximately one-fourth that of HSCNet. Moreover, its architectural design is more conducive to efficient GPU reasoning and hardware platform utilization, rendering it particularly suitable for scenarios demanding high-speed discrimination capabilities.

The AGE-SPOS presented in this paper incorporates both the discrimination accuracy and inference speed of the network. When the MAE reached 1.80 years, its FLOPs was below 1B (with an input feature map size of 384 × 384). In terms of MAE/FLOPs/parameter count, AGE-SPOS significantly outperformed the existing network architectures and surpassed other lightweight networks of the same type in these three metrics. Compared to the HSCNet proposed by the research group, AGE-SPOS exhibited superior advantages in terms of both speed and network size. The present research further investigates the impact of the SE module on the accuracy and concludes that the SE module is not conducive to achieving accurate results.

To further explore the characteristics of the dental X-ray dataset, we developed the Cumulative Score (

C S

) as a method of measurement.

C S

is defined as:

C S (j) = \frac{N_{e} \leq j}{N} \times 100 %

(8)

where

N_{e} \leq j

is the number of test images on which the age estimation makes an absolute error no higher than

j

year. When the error tolerance was set to a low value, significant disparities emerged among different models. Our model excelled compared to others, especially when the age error tolerance was set below 2 years. For instance, when the error tolerance was at 2 years, our model achieved a score of 0.80. In contrast, when the error tolerance was decreased to 1 year, our model still outperformed others with a score exceeding 70%, while other models struggled to reach this threshold. Additionally, it is worth noting that as the age error tolerance increased, the scores of all models started to converge, highlighting that the model’s superiority is most apparent in lower error tolerance ranges.

The performance of each age group was analyzed, as shown in Figure 9, which shows that the MAE in the 0–30 age range performed well in predicting within 2 years. The performance of each age group was analyzed, as depicted in Figure 9, revealing a favorable estimation effect within the age range of 0–30 years (MAE falls within 2 years). Notably, the most accurate predictions were achieved for individuals aged between 0 and 10 years old, with an MAE of approximately 1 year. The deep learning approach for age estimation has demonstrated superior performance compared to traditional methods during adolescence, and it also exhibited remarkable accuracy in estimating ages during adulthood. Although there were slight deviations in the results regarding the estimation of dental knowledge with respect to aging, such as inconsistent mean absolute error (MAE) values across different age ranges, it is important to note that these MAE values were derived solely from our limited dataset. In order to ensure accuracy, future research should encompass a broader range of datasets and conduct additional experiments. It is crucial to acknowledge that the current findings heavily rely on the available datasets.

Furthermore, we generated a heat-map visualization of the OPGs, as depicted in Figure 2. Upon careful examination of the heat maps, it became evident that the network effectively discerned distinct regions within the teeth. Notably, during adulthood age prediction, crucial factors predominantly resided in specific dental components such as the root area. The molar, upper, and lower jaw regions emerged as areas for identification, a fact that has eluded effective exploration through conventional methodologies.

5. Conclusions

In the article, we employed NAS to acquire a model that achieves the current optimal mean absolute error of 1.64. In comparison with Inception-v4, which has a computational complexity of 20.46 billion floating-point operations per second (FLOPs), our AGENet-Large module demonstrates a remarkable reduction in FLOPs by a factor of 2.7. Furthermore, we have successfully developed a series of lightweight models that can achieve an impressive MAE of 1.80 years while utilizing less than 1 billion FLOPs.

Author Contributions

Conceptualization, Z.-Y.Z.; methodology, Z.-Y.Z.; software, W.-X.H.; validation, Y.-X.Z.; formal analysis, Z.-Y.Z.; investigation, Z.-Y.Z.; resources, C.-X.Y.; data curation, Q.-M.M. and W.-F.J.; writing—original draft preparation, Z.-Y.Z.; writing—review and editing, Z.-Y.Z.; visualization, K.-Y.P.; supervision, C.-X.Y.; project administration, Z.-Y.Z.; funding acquisition, Z.-Y.Z. and C.-X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Research Foundation of Shaanxi Province (2020JM-413) for Z.-Y. Zhang, and by Opening Research Fund from Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University (grant number: 2021LHM-KFKT006).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the National Drug Clinical Trial Institution of Xi’an Jiaotong University Stomatological Hospital (protocol code: KY-QT-20240023, approval date: 2 July 2024).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Hinchliffe, J. Forensic Odontology, Part 2. Major Disasters. Br. Dent. J. 2011, 210, 269–274. [Google Scholar] [CrossRef]
Schuller-Götzburg, P.; Suchanek, J. Forensic Odontologists Successfully Identify Tsunami Victims in Phuket, Thailand. Forensic Sci. Int. 2007, 171, 204–207. [Google Scholar] [CrossRef]
Wedel, V.; Found, G.; Nusse, G. A 37-Year-Old Cold Case Identification Using Novel and Collaborative Methods. J. Forensic Identif. 2015, 65, 733–749. [Google Scholar]
Sweet, D. Sweet Forensic Dental Identification. Forensic Sci. Int. 2010, 201, 3–4. [Google Scholar] [CrossRef]
Saxena, S.; Sharma, P.; Gupta, N. Experimental Studies of Forensic Odontology to Aid in the Identification Process. J. Forensic Dent. Sci. 2010, 2, 69–76. [Google Scholar] [CrossRef]
Krishan, K.; Kanchan, T.; Garg, A.K. Dental Evidence in Forensic Identification—An Overview, Methodology and Present Status. Open Dent. J. 2015, 9, 250–256. [Google Scholar] [CrossRef]
Gonsior, M.; Ramsthaler, F.; Gehl, A.; Verhoff, M.A. Morphology as a Cause for Different Classification of the Ossification Stage of the Medial Clavicular Epiphysis by Ultrasound, Computed Tomography, and Macroscopy. Int. J. Legal. Med. 2013, 127, 1013–1021. [Google Scholar] [CrossRef]
Savall, F.; Hérin, F.; Peyron, P.A.; Rougé, D.; Baccino, E.; Saint-Martin, P.; Telmon, N. Age Estimation at Death Using Pubic Bone Analysis of a Virtual Reference Sample. Int. J. Legal. Med. 2018, 132, 609–615. [Google Scholar] [CrossRef]
Dedouit, F.; Auriol, J.; Rousseau, H.; Rougé, D.; Crubézy, E.; Telmon, N. Age Assessment by Magnetic Resonance Imaging of the Knee: A Preliminary Study. Forensic Sci. Int. 2012, 217, 232.e1–232.e7. [Google Scholar] [CrossRef]
Vieth, V.; Schulz, R.; Heindel, W.; Pfeiffer, H.; Buerke, B.; Schmeling, A.; Ottow, C. Forensic Age Assessment by 3.0T MRI of the Knee: Proposal of a New MRI Classification of Ossification Stages. Eur. Radiol. 2018, 28, 3255–3262. [Google Scholar] [CrossRef]
Hackman, L.; Davies, C.M.; Black, S. Age Estimation Using Foot Radiographs from a Modern Scottish Population. J. Forensic Sci. 2013, 58 (Suppl. 1), S146–S150. [Google Scholar] [CrossRef]
Whitaker, J.M.; Rousseau, L.; Williams, T.; Rowan, R.A.; Hartwig, W.C. Scoring System for Estimating Age in the Foot Skeleton. Am. J. Phys. Anthropol. 2002, 118, 385–392. [Google Scholar] [CrossRef]
Cameriere, R.; De Luca, S.; Biagi, R.; Cingolani, M.; Farronato, G.; Ferrante, L. Accuracy of Three Age Estimation Methods in Children by Measurements of Developing Teeth and Carpals and Epiphyses of the Ulna and Radius. J. Forensic Sci. 2012, 57, 1263–1270. [Google Scholar] [CrossRef]
Murray, R.O. Assessment of Skeletal Maturity and Prediction of Adult Height (TW2 Method). Proc. R Soc. Med. 1976, 69, 542. [Google Scholar]
Bayer, L.M. Radiographic atlas of skeletal development of the hand and wrist. Calif. Med. 1959, 91, 53. [Google Scholar]
Manigandan, T.; Sumathy, C.; Elumalai, M.; Sathasivasubramanian, S.; Kannan, A. Forensic Radiology in Dentistry. J. Pharm. Bioallied Sci. 2015, 7, S260–S264. [Google Scholar] [CrossRef]
Mincer, H.H.; Harris, E.F.; Berryman, H.E. The A.B.F.O. Study of Third Molar Development and Its Use as an Estimator of Chronological Age. J. Forensic Sci. 1993, 38, 379–390. [Google Scholar] [CrossRef]
Moorrees, C.F.; Fanning, E.A.; Hunt, E.E., Jr. Age variation of formation stages for ten permanent teeth. J. Dent. Res. 1963, 42, 1490–1502. [Google Scholar] [CrossRef]
Manjrekar, S.; Deshpande, S.; Katge, F.; Jain, R.; Ghorpade, T. Age Estimation in Children by the Measurement of Open Apices in Teeth: A Study in the Western Indian Population. Int. J. Dent. 2022, 2022, 9513501. [Google Scholar] [CrossRef]
Mohamed, E.G.; Redondo, R.P.D.; Koura, A.; EL-Mofty, M.S.; Kayed, M. Dental Age Estimation Using Deep Learning: A Comparative Survey. Computation 2023, 11, 18. [Google Scholar] [CrossRef]
Büyükçakır, B.; Bertels, J.; Claes, P.; Vandermeulen, D.; de Tobel, J.; Thevissen, P.W. OPG-Based Dental Age Estimation Using a Data-Technical Exploration of Deep Learning Techniques. J. Forensic Sci. 2024, 69, 919–931. [Google Scholar] [CrossRef] [PubMed]
Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv 2016, arXiv:1602.07261. [Google Scholar] [CrossRef]
Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11218, pp. 122–138. ISBN 978-3-030-01263-2. [Google Scholar]
Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2020, arXiv:1905.11946. [Google Scholar]
Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
Geng, X.; Zhou, Z.-H.; Zhang, Y.; Li, G.; Dai, H. Learning from Facial Aging Patterns for Automatic Age Estimation. In Proceedings of the 14th ACM International Conference on Multimedia, Santa Barbara, CA, USA, 23–27 October 2006; Association for Computing Machinery: New York, NY, USA, 2006; pp. 307–316. [Google Scholar]
Xu, Y.; Xie, L.; Zhang, X.; Chen, X.; Qi, G.-J.; Tian, Q.; Xiong, H. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search. arXiv 2020, arXiv:1907.05737. [Google Scholar]
Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable Architecture Search. arXiv 2019, arXiv:1806.09055. [Google Scholar]
Guo, Z.; Zhang, X.; Mu, H.; Heng, W.; Liu, Z.; Wei, Y.; Sun, J. Single Path One-Shot Neural Architecture Search with Uniform Sampling. arXiv 2020, arXiv:1904.00420. [Google Scholar]
Hannun, A.Y.; Rajpurkar, P.; Haghpanahi, M.; Tison, G.H.; Bourn, C.; Turakhia, M.P.; Ng, A.Y. Cardiologist-Level Arrhythmia Detection and Classification in Ambulatory Electrocardiograms Using a Deep Neural Network. Nat. Med. 2019, 25, 65–69. [Google Scholar] [CrossRef]
Hou, W.; Liu, L.; Gao, J.; Zhu, A.; Pan, K.; Sun, H.; Zheng, N. Exploring Effective DNN Models for Forensic Age Estimation Based on Panoramic Radiograph Images. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8. [Google Scholar]

Figure 1. A searching framework for DNN architecture with X-ray image images. The framework consists of four main steps: (1) designing the search space, (2) exploring novel architectures, (3) pre-training the model on the ImageNet dataset, and (4) fine-tuning the pre-trained OPG datasets. The four proposed design rules were utilized to effectively design a search space for the purpose of obtaining an architecture that is capable of achieving optimal performance. Subsequently, the network underwent pre-training on the ImageNet dataset. Finally, the DNN architecture was retrained using the comprehensive training dataset of OPGs, leveraging the weights and parameters obtained from pre-training.

Figure 2. Age estimation based on the new dental X-ray images. By inputting the dental X-ray dataset into the DNN architecture, we were able to make accurate age estimations; the sum in the rightmost block is the average age for all images.

Figure 3. The configuration of this parallel asymmetrical convolutional unit. (a) The step is l for the search space; (b) the step is 2 for the search space.

Figure 4. The search space under the SPOS framework. (a) Choice blocks with stride = l; (b) choice blocks with stride = 2.

Figure 5. Age distribution of our age dataset. (a) Distribution of age; (b) training dataset for DNN (left) and test dataset for DNN (right).

Figure 6. The typical cell and the reduction cell in the network. (a) The normal cell searched based on the input image; (b) the reduction cell searched based on the input images.

Figure 7. The architecture of AGE-SPOS with our searching method.

Figure 8. The accuracy and FLOPs for different DNN networks.

Figure 9. The MAE of different ages.

Table 1. The accuracy of AGE-SPOS with our searching method.

Model	FLOPs/s × 10⁻⁹	Params/s × 10⁻⁶	MAE/Year
Human experts	-	-	>2
ResNeXt50	12.59	23.14	1.74
Inception-v4	20.46	41.27	1.70
MobileNetV2 (1.0×)	0.96	2.33	1.98
EfficientNet-B0	1.22	4.11	1.94
AGENet-Small	3.54	7.53	1.70
AGENet-Large	7.49	16.89	1.64
HSCNet	3.99	7.78	1.64
AGE-SPOS (0.5×)	0.29	0.72	2.40
AGE-SPOS (1.0×)	0.95	2.17	1.80
AGE-SPOS (1.5×)	1.97	4.44	1.77

Table 2. The effectiveness of the SE module for different AGE-SPOS configurations.

DNN Architecture	MAE/Year
DNN Architecture	Without SE Module	With SE Module
AGE-SPOS (0.5×)	2.40	2.61 (+0.21)
AGE-SPOS (1.0×)	1.80	1.80 (−0.00)
AGE-SPOS (1.5×)	1.77	1.75 (−0.02)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, Z.-Y.; Yan, C.-X.; Min, Q.-M.; Zhang, Y.-X.; Jing, W.-F.; Hou, W.-X.; Pan, K.-Y. Leverage Effective Deep Learning Searching Method for Forensic Age Estimation. Bioengineering 2024, 11, 674. https://doi.org/10.3390/bioengineering11070674

AMA Style

Zhang Z-Y, Yan C-X, Min Q-M, Zhang Y-X, Jing W-F, Hou W-X, Pan K-Y. Leverage Effective Deep Learning Searching Method for Forensic Age Estimation. Bioengineering. 2024; 11(7):674. https://doi.org/10.3390/bioengineering11070674

Chicago/Turabian Style

Zhang, Zhi-Yong, Chun-Xia Yan, Qiao-Mei Min, Yu-Xiang Zhang, Wen-Fan Jing, Wen-Xuan Hou, and Ke-Yang Pan. 2024. "Leverage Effective Deep Learning Searching Method for Forensic Age Estimation" Bioengineering 11, no. 7: 674. https://doi.org/10.3390/bioengineering11070674

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Leverage Effective Deep Learning Searching Method for Forensic Age Estimation

Abstract

1. Introduction

2. Methods

2.1. Exploration of Neural Network Elements for OPG

2.2. Model Search Based on Proposed Rules for OPGs

3. Experiments

3.1. Dataset of X-ray

3.2. Experimental Settings

3.3. Experimental Results for DNN Architecture Searching

3.4. Analysis for Our Rules

4. Discussion

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI