Special DXA protocols permit the quantification of bone mineralization, fat mass, and fat distribution. Improvements in image quality and shorter scanning times now allow a full-body scan to be performed and its components to be quantified. Contemporary full-body DXA images, which contain information about the quantity and spatial distribution of body components, open the field for research on new uses of this old method.
This work is part of a larger project that addresses the following research question: how can deep learning methods improve diagnostic knowledge extraction from whole-body, low-resolution, dual-energy X-ray absorptiometry data, potentially allowing a lower radiation dose for patients? The related diagnostic knowledge can be represented in many forms. For example, it can be organized as a set of parameters that include bone density (e.g., bone mineral density (BMD), bone mineral content (BMC)), fat density, muscle density, osteoporosis scores, bone age, etc. The scanners used for testing provide a digital version of the image, but it is only used to check the correct positioning of the patient and the correct placement of the regions of interest (ROI). The bone density results are computed by specialized software from the X-ray attenuation data recorded by the detector. For BMD measurement, it is important to eliminate the soft-tissue contribution so that the measurement is based on bone alone. This is achieved by scanning at two different X-ray photon energies and then mathematically combining the recorded signals, exploiting the fact that bone and soft tissue attenuate the two photon energies differently. Furthermore, the radiation dose for DXA is considerably lower than that for conventional radiography. Much more information, including dependencies not used in determining BMD, can be extracted from such images. Many researchers therefore focus on extracting information from DXA and X-ray images.
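The two-energy principle described above amounts to solving a small linear system per pixel: the logarithmic attenuation measured at each photon energy is a weighted sum of the bone and soft-tissue areal densities, with weights given by the material attenuation coefficients. A minimal sketch of this decomposition follows; the coefficient values are invented for illustration only and are not taken from any scanner calibration.

```python
import numpy as np

# Illustrative mass-attenuation coefficients (cm^2/g) for bone and soft
# tissue at the low- and high-energy beams. These numbers are made up
# for demonstration purposes, not real scanner calibration values.
MU = np.array([[0.60, 0.25],   # low energy:  [bone, soft tissue]
               [0.30, 0.18]])  # high energy: [bone, soft tissue]

def decompose_pixel(log_atten_low, log_atten_high):
    """Solve the 2x2 dual-energy system for areal densities (g/cm^2).

    log_atten_* = ln(I0 / I) measured at each photon energy.
    Returns (bone_density, soft_tissue_density).
    """
    b = np.array([log_atten_low, log_atten_high])
    bone, soft = np.linalg.solve(MU, b)
    return bone, soft

# Example: a pixel whose attenuation was produced by 1.2 g/cm^2 of bone
# and 20.0 g/cm^2 of soft tissue is recovered by the decomposition.
low = 0.60 * 1.2 + 0.25 * 20.0
high = 0.30 * 1.2 + 0.18 * 20.0
bone, soft = decompose_pixel(low, high)
print(round(bone, 3), round(soft, 3))  # → 1.2 20.0
```

In a real scanner, such a decomposition (with calibrated coefficients and beam-hardening corrections) is applied across the image, after which the BMD is computed over the ROIs.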
In [
2], the authors presented a study comparing the use of DXA images and X-ray images to assess bone age in children, using the Bland and Altman statistical methods. High agreement was obtained between the bone age assessments performed with DXA and with X-ray. Other studies also applied statistical methods, this time involving only BMD measurements. Castillo et al. in [
3] investigated whether there were correlations between age, gender, and BMD, and whether such correlations could be used in forensic anthropology. The research was carried out on 70 subjects and showed bone mineral density to be useful for estimating gender and age in forensic anthropology. Neural networks can also be used to analyze X-ray images. Navega et al. in [
4] used neural networks to create a method for estimating age at death. The data sample consisted of 100 femora of female individuals. The authors used a modified general regression neural network as the model for age prediction. Using a nonprobabilistic artificial neural network, the mean absolute difference between the real and estimated age ranged from 9.19 to 12.03 years, depending on the variables used in the modelling. Classic X-ray images can also be used. Lee et al. in [
5] proposed using hand X-ray images to predict bone age. The images were preprocessed (region of interest (ROI) segmentation, standardization, and preprocessing of the input radiographs) and analyzed by the GoogLeNet neural network. The solution was validated using 8325 data samples. The accuracy achieved by the classification model was 98% within 2 years and 90% within 1 year for the groups of women and men. A similar solution was presented by Iglovikov et al. in [
6]. Bone age prediction was performed using a database provided by the Radiological Society of North America (RSNA). First, the images were processed: hand masks were extracted and all other objects removed (positive mining with a U-Net architecture), key points were detected, and the hands were placed in the images in a single orientation and size (using a network inspired by the VGG model). Bone age assessment was performed using two types of convolutional neural network (CNN), regression and classification, as well as by dividing the image into three areas (the whole hand, the carpal bones, and the metacarpals with proximal phalanges). The smallest mean absolute error (MAE) was obtained by combining the ensemble of classification and regression models with the ensemble of regional models; this solution had the best accuracy, with an MAE of 6.10 months. Human age can also be estimated from chest images using CNNs. In [
7], Karargyris et al. created a solution based on the DenseNet-169 model pretrained on ImageNet. They used the Chest X-ray dataset from the National Institutes of Health, which includes over 100,000 chest images from more than 30,000 different patients. For a margin of ±4 years they obtained a sensitivity of 0.6745, and for ±9 years the value was 0.9441. Similarly, Xue et al. [
8] used X-rays to classify gender. The dataset contained 2066 chest X-ray images (1097 of women and 969 of men). The images were preprocessed, and features were then extracted from them using the following neural networks (for comparison): AlexNet, VggNet, GoogLeNet, and ResNet. Classification was carried out using a support vector machine (SVM) and a random forest. The best result was obtained for the VGGNet-16 + SVM classifier, which reached an accuracy of 86.6% with 5-fold cross-validation. VGGNet-16 is a member of the VGG family of models (from the Visual Geometry Group, Department of Engineering Science, University of Oxford) with 16 weight layers (13 convolutional layers and 3 dense layers). Spine X-ray images can also be used to classify gender. Xue et al. in [
9] developed a sequential CNN model trained from scratch on spine images. An accuracy of 88.9% was obtained for the original spine images. Analyzing the data with a DenseNet-121 model pretrained on the ImageNet dataset gave an accuracy of 99% for cervical images and 98% for lumbar images. In [
10], Marouf et al. proposed a hybrid methodology for gender classification and bone age estimation, using the trained VGG-16 model and the RSNA dataset. They achieved an accuracy of 99% for gender classification; for age estimation, they achieved a mean absolute difference (MAD) of 0.50 years and a root mean square (RMS) error of 0.67 years. In [
11], Mumtaz et al. proposed a method based on CNNs and left-hand radiographs of children to determine gender. Using class activation mapping (CAM), they showed that the lower part of the palm around the wrist was the most important region for determining the gender of a child. The model accuracy was 98%, despite the still-developing skeletons of the children. Age can also be determined from whole-body CT scans. In [
12], Nguyen et al. presented a solution based on deep hierarchical features. They used data from an anonymous hospital containing 813 whole-body CT bone images. For automatic bone age assessment (BAA), they tested fine-tuned VGGNet, ResNet, and GoogLeNet models, as well as a modified model based on VGGNet. The best result was achieved by the modified VGGNet with hierarchical features, which had an MAE of 4.856 months. Castillo et al. proposed a model for predicting bone age based on VGG16 with an attention-mapping-focused architecture. Using the RSNA dataset [
13], they achieved an MAE of 11.45 months for both genders. Liu et al. evaluated a proposed two-stage bone age assessment network with ranking learning on the same dataset, achieving an average MAE of 6.05 months [
14]. Salim and Hamza introduced a two-stage approach for bone age assessment using segmentation and ridge regression [
15]. The MAE of their solution was 6.38 months.
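The studies above report accuracy in two recurring forms: the fraction of predictions falling within a ±k margin of the true value (e.g., the ±4- and ±9-year margins, or the 1- and 2-year bone age bands) and the mean absolute error (MAE). A minimal sketch of both metrics follows, using hypothetical ages rather than data from any of the cited studies.

```python
def within_margin(true_vals, pred_vals, k):
    """Fraction of predictions within +/- k units of the true value."""
    hits = sum(abs(t - p) <= k for t, p in zip(true_vals, pred_vals))
    return hits / len(true_vals)

def mean_absolute_error(true_vals, pred_vals):
    """Average absolute deviation between predictions and true values."""
    return sum(abs(t - p) for t, p in zip(true_vals, pred_vals)) / len(true_vals)

# Hypothetical bone ages in months (invented for illustration).
true_ages = [120, 96, 150, 84, 132]
pred_ages = [126, 90, 149, 95, 130]

print(mean_absolute_error(true_ages, pred_ages))  # → 5.2
print(within_margin(true_ages, pred_ages, 6))     # → 0.8
```

Note that the two metrics emphasize different behaviors: the margin-based score is insensitive to how far outliers miss, while the MAE penalizes every month of error, which is why both are commonly reported together.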