1. Introduction
Abnormal plantar temperature in diabetic patients may be an early sign of the appearance of foot disorders [1]. These complications, which include peripheral arterial disease, neuropathy, and infection, among others, are associated with substantial costs and loss of quality of life [1,2]. Early-stage detection of diabetic foot disorders, together with personalized care and treatment, can avoid or delay the appearance of further complications. Diabetic patients with peripheral neuropathy or ulcers reportedly present increased skin hardness compared to normal foot tissue [3], and a strong correlation has been observed between this hardness and the severity of the neuropathy [4]. Soft tissue firmness is often measured by palpation, a subjective method dependent on the proficiency of the examiner. Along these lines, an experimental robotic palpation technique has recently been introduced to measure the elastic moduli of diabetic patients [5]. However, although the assessment is fast, it has only been tested on a phantom and a couple of healthy subjects. Screening methods based on foot temperature were identified long ago as leading technologies in the field [1]. Preventive care based on regular monitoring of plantar temperature limits the incidence of disabling conditions such as foot ulcers and the lower-limb amputations required in acute cases [1,2,6,7]. The most common and clinically effective monitoring protocol for diabetic foot ulcers consists of comparing the temperature of six contralaterally matched plantar locations daily [7]. This can be a time-consuming procedure for self-monitoring, and lack of adherence to the monitoring protocols is often observed [1]. Therefore, a fast and precise protocol for measuring plantar temperature is vital for screening.
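The contralateral comparison described above can be sketched in a few lines. The site names and the 2.2 °C alert threshold below are illustrative assumptions for the sketch, not values taken from the cited protocol.

```python
# Sketch of a six-point contralateral temperature comparison.
# Site names and the 2.2 degC threshold are illustrative assumptions.

SITES = ["hallux", "met1", "met3", "met5", "midfoot", "heel"]
ALERT_THRESHOLD_C = 2.2  # assumed contralateral asymmetry threshold

def contralateral_alerts(left, right, threshold=ALERT_THRESHOLD_C):
    """Return sites whose left/right temperature difference exceeds the threshold."""
    return [s for s in SITES if abs(left[s] - right[s]) > threshold]

left = {"hallux": 30.1, "met1": 30.5, "met3": 30.2,
        "met5": 29.8, "midfoot": 29.5, "heel": 29.0}
right = {"hallux": 30.0, "met1": 33.1, "met3": 30.4,
         "met5": 29.9, "midfoot": 29.6, "heel": 29.2}

print(contralateral_alerts(left, right))  # only "met1" differs by more than 2.2 degC
```

In practice, the per-site temperatures would come from the segmented thermogram rather than being entered by hand, which is precisely why a fast, automatic segmentation step matters.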
Thermography is a marker-free, non-invasive, safe, accessible, contactless, and easily repeatable technique that has been used in military, space, and civilian applications, from astrophysics to medicine [8,9]. In the medical field, it has been employed for the diagnosis and detection of soft tissue pathologies based on temperature measurement. For instance, thermography has been successfully utilized for diabetic foot disorders [2,6,10] and, among other applications, for intraoperative functional imaging with high spatial resolution, including the localization of superficial tumours such as brain tumors [11] and the estimation of their size from the temperature distribution [12], the quantitative estimation of cortical perfusion [13], and the visualization of neural activity [14,15], as well as in sports settings to assess the efficacy of treatment in myofascial pain syndrome and for software validation before and after physical activity [16,17]. Infrared cameras have therefore become a supplementary diagnostic tool for medical personnel, since the temperature of the epidermis can be measured in a non-invasive manner. However, several technical issues should be addressed before such a tool can be integrated into standard diabetic care protocols.
First, high-end infrared cameras, such as those used in astrophysics [8,9], are considerably expensive. Low-cost devices based on microbolometers can provide similar features for the required medical application under a controlled ambient environment [18]. Second, a fully automatic, unsupervised segmentation of the foot soles, requiring no end-user interaction, is critical, since manual segmentation is observer-dependent as well as extremely time-consuming. Furthermore, a segmentation based solely on infrared (IR) images constitutes a great challenge, as thermographic images provide functional data and exhibit little structural information. IR images consist of a single channel of temperature data, mainly relative temperature values, and are normally noisy. They present unclear boundaries, and certain regions, for instance cold toes or heels, cannot be found. These areas can go undetected because no gradient information is observed [19]. Some regions in the background, such as other thermal sources within the body, could be considered part of the soles because they exhibit similar statistical characteristics. Moreover, the foot sole is not completely flat, and the shape of the arch is subject-dependent. Consequently, these factors prevent the establishment of a well-defined standard, reference, or ground truth.
Multimodal imaging facilitates the segmentation process [20], since both structural and functional information of the tissues can be acquired [21]. Visible-light images (RGB: Red, Green, and Blue color space) provide structural information, primarily the clear delineation of the feet required to establish the ground truth. Multimodal fusion, combining RGB with IR images, addresses the issue by gathering detailed morphological information. This multimodal image fusion has been previously investigated for several medical applications [22], including the intended application [2] but also in the context of brain surgery [21]. However, low-cost infrared sensors are not usually equipped with visible-light cameras that provide RGB images spatially registered onto the thermal ones, so an additional camera is required. This alternative entails a new challenge, because the acquired RGB and IR images will not be spatially registered. Furthermore, once the images are properly registered so that each pixel carries four-channel information (RGB-IR), the segmentation problem still remains.
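One common way to bring RGB pixel coordinates into the IR image space is to fit a 2D affine transform from a few manually matched control points, as sketched below. The control points and the transform here are synthetic; the registration method actually used in this work may differ.

```python
import numpy as np

# Least-squares 2D affine registration from matched control points
# (a minimal sketch; control points below are synthetic).

def fit_affine(src, dst):
    """Fit a (3,2) affine matrix M such that [x, y, 1] @ M ~ [x', y']."""
    n = src.shape[0]
    A = np.hstack([src, np.ones((n, 1))])        # homogeneous source coordinates
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)  # least-squares affine parameters
    return M

def apply_affine(M, pts):
    """Map (N,2) points through the fitted affine transform."""
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ M

# Synthetic ground truth: scale 0.5 plus a translation, mimicking the larger
# field of view of the RGB camera relative to the IR one.
src = np.array([[0, 0], [640, 0], [0, 480], [640, 480]], float)
true_M = np.array([[0.5, 0.0], [0.0, 0.5], [20.0, 10.0]])
dst = np.hstack([src, np.ones((4, 1))]) @ true_M

M = fit_affine(src, dst)
print(apply_affine(M, np.array([[320.0, 240.0]])))  # centre of the RGB frame in IR coords
```

An affine model handles scale, translation, rotation, and shear between roughly parallel image planes; lens distortion or strongly non-coaxial setups would require a richer model (e.g. a homography or a distortion-corrected calibration).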
Previous attempts to segment IR images are mainly based on applying a threshold to discriminate the background and rely on background homogenization to aid the process [2]. For instance, thresholding [23] and active contours [6,19] have been employed, among other techniques. A homogeneous background without thermal sources, except for the sole of each foot, may help the segmentation process, but the extended examination time it requires is a serious drawback for patient comfort and the tight schedules of clinical practitioners. Non-constrained acquisition protocols have also been employed to attempt a segmentation based on IR images [19], including active contour methods and Deep Learning approaches [24]. The latter provided a more powerful and robust performance, as active contour methods are sensitive to the initialization parameters and may fail to converge. The combination of the multimodal approach and a constrained acquisition protocol has also been explored, with segmentation achieved via clustering using the RGB images as input [2].
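To illustrate the threshold-based family of approaches, the sketch below applies Otsu's method (maximizing between-class variance) to a synthetic thermal frame with a warm region on a cooler background. Real plantar thermograms are far noisier and less bimodal, which is why the cited works additionally homogenize the background.

```python
import numpy as np

# Otsu thresholding on a synthetic "thermal" frame: a warm square
# (mimicking a foot sole) on a cooler, room-temperature background.

def otsu_threshold(img, bins=64):
    """Return the threshold maximizing between-class variance of the histogram."""
    hist, edges = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                  # cumulative background weight
    w1 = 1.0 - w0                      # foreground weight
    m = np.cumsum(p * centers)         # cumulative mean
    mt = m[-1]                         # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        var_between = (mt * w0 - m) ** 2 / (w0 * w1)
    return centers[np.nanargmax(var_between)]

rng = np.random.default_rng(0)
frame = rng.normal(22.0, 0.3, size=(64, 64))                 # ~22 degC background
frame[16:48, 16:48] = rng.normal(31.0, 0.5, size=(32, 32))   # warmer plantar region

t = otsu_threshold(frame)
mask = frame > t
```

With clearly separated temperature modes this works well, but a second warm object in the scene (a hand, a radiator) would be segmented too, motivating the multimodal approaches discussed above.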
Our research aim is to develop an automated workflow, based on affordable thermography, to aid in the detection and monitoring of diabetic foot disorders for future clinical trials. This workflow, providing proof-of-concept technology and prototypes for such use, consists of several steps: the acquisition and registration of the imaging modalities, the extraction of the areas of interest by segmenting the soles of the feet, and finally the analysis of temperature patterns that may indicate areas of risk. In the present work, the focus has been placed on the segmentation procedure and, since the detection of anomalies has not yet been addressed, only healthy subjects have been considered for the newly created database. Recently, continuing with the non-constrained acquisition protocol, we explored the feasibility of a unified RGB-based Deep Learning approach with point cloud processing, derived from the spatial information provided by the depth images (D), to improve the robustness of the semantic segmentation [20]. This workflow benefits from the transfer-learning technique, in which layers are initialised from networks trained on other RGB image databases, mainly ImageNet. A robust model can thus be achieved despite the size of our database, which was small by deep learning standards. That approach was implemented without a database in which the RGB-D and IR images shared the same coordinate system. The present work therefore intends to quantify its performance when employed for the segmentation of the corresponding IR images. In addition, other segmentation approaches were implemented for comparison in terms of feasibility and performance.
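The idea of refining an RGB-based mask with depth information can be illustrated with a deliberately simple depth gate: keep only masked pixels near the dominant surface depth, so background leaking through the RGB mask is discarded. This is an illustrative stand-in for the point cloud processing of [20], not a reproduction of it.

```python
import numpy as np

# Depth-gated mask refinement (illustrative stand-in for point cloud
# processing): discard masked pixels far from the dominant (foot) depth.

def refine_mask_with_depth(mask, depth, tol=0.05):
    """Keep only masked pixels within tol metres of the median masked depth."""
    z0 = np.median(depth[mask])          # dominant depth, robust to leakage
    return mask & (np.abs(depth - z0) < tol)

depth = np.full((32, 32), 1.00)          # foot sole roughly 1 m from the camera
depth[:, 24:] = 1.60                     # background wall further away
mask = np.zeros((32, 32), bool)
mask[8:24, 4:28] = True                  # RGB mask leaking onto the background

refined = refine_mask_with_depth(mask, depth)
print(mask.sum(), refined.sum())         # the leaked columns are removed
```

The full point-cloud pipeline in [20] is considerably more involved (3D neighbourhoods rather than a single depth gate), but the principle of exploiting spatial information to reject background is the same.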
4. Discussion
Several segmentation approaches have been assessed for a medical application focused on the detection and monitoring of diabetic foot disorders: U-Net, Skin, their refined versions incorporating the spatial information (UPD and SPD, respectively), and SegNet. All the approaches described execute quickly and work in real time, so segmentation of IR images can be achieved efficiently with satisfactory performance.
Medical image segmentation is a recognised difficult problem in which high accuracy and precision are desirable properties [26]. Manual IR segmentation is challenging, since tissue limits are difficult to define; as a result, the segmentation is strongly dependent on the observer as well as prone to errors. This is the main reason an RGB segmentation was preferred and, consequently, a proper registration between modalities constitutes the main challenge requiring further study. The low-cost cameras employed for image acquisition were in a non-coaxial arrangement, and the RGB images covered a larger FOV than the IR ones. Thus, the IR images were taken as the reference, and the RGB ones were transformed to match the reference image space using an ad-hoc method. By doing so, a spatially registered dataset of multichannel images was created, in which each pixel was characterized by five-channel information (RGB-D-IR).
The establishment of the ground truth was based on the manual segmentation carried out on the RGB images by two researchers, enabling the quantification of the inter-observer variability. However, as mentioned above, the inherent problem associated with a ground truth based on the RGB images is that the quantification of the accuracy provided by each segmentation method is biased by the degree of overlap between the IR and the RGB images. Furthermore, the use of affordable devices has an impact on the quality of the acquired IR images. For instance, the included optical system may present optical distortions or misalignment. During the establishment of the ground truth, it became apparent that the employed IR camera presents some of these effects, particularly on the right side of the images. Significant intergroup differences and longitudinal changes in the masks produced by each researcher were detected solely for the left foot in the overlap measures, specificity, and sensitivity. Thus, a more accentuated lack of correspondence between imaging modalities is observed for the left foot.
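The overlap measures used throughout this evaluation (DICE, sensitivity, specificity) reduce to counting agreements between two binary masks, as sketched below on tiny synthetic masks standing in for the two observers' segmentations.

```python
import numpy as np

# Overlap measures between two binary masks, as used for inter-observer
# variability and for scoring predictions against the ground truth.

def overlap_measures(pred, ref):
    """DICE, sensitivity and specificity of mask `pred` against reference `ref`."""
    tp = np.sum(pred & ref)      # pixels both masks label foreground
    fp = np.sum(pred & ~ref)     # foreground only in pred
    fn = np.sum(~pred & ref)     # foreground only in ref
    tn = np.sum(~pred & ~ref)    # background in both
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

observer1 = np.zeros((8, 8), bool); observer1[2:6, 2:6] = True  # 16-pixel square
observer2 = np.zeros((8, 8), bool); observer2[2:6, 3:7] = True  # shifted one pixel

print(overlap_measures(observer2, observer1))
```

Note that with a large background, specificity stays high even for a poor mask, which is why the DICE coefficient is the more informative headline figure in the comparisons below.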
The quantitative evaluation revealed that the approaches based on the RGB images, namely U-Net and Skin, displayed the best performance. In particular, optimizing these approaches with the spatial information provided by the depth images significantly improved the overlap measures of the final predictions. A previous attempt to automatically segment the soles of the feet, based on thermal images and employing Deep Learning approaches, reported a DICE coefficient of 74.35% ± 9.48% for the implemented U-Net architecture [24]. In the work presented here, the same network architecture provided superior results according to the overlap measures. As previously reported, the UPD approach improves the performance of the segmentation compared to the U-Net, even for a small training dataset [20]. A similar trend held for the SPD approach, which yielded substantially superior results when optimized with the spatial information provided by the depth images.
Overall, the SPD approach showed a performance comparable, and even superior, to the UPD segmentation method of reference. This approach is truly unsupervised, and the practicality of its implementation makes it the simplest and fastest segmentation approach and, therefore, a preferred choice for clinical practitioners. However, despite this numerical advantage, simplicity, and good visual reproduction, the skin recognition method is reportedly affected by the lighting conditions in the acquisition room as well as by individual characteristics such as skin tone, age, sex, and the body parts under study [30,44]. Other affecting factors are background colors, shadows, and motion blur. The testing dataset employed in this work is extremely homogeneous regarding the skin tone of the subjects. In addition, all images were acquired in the same room, so neither the lighting conditions nor the background colors were varied. Thus, the robustness of this method was not truly tested, and this segmentation approach must be considered with caution, at least until a larger, more diverse, and inclusive dataset is available.
In the same study mentioned earlier [24], a SegNet architecture provided a DICE coefficient of 97.26% ± 0.69%. For comparison purposes, the same architecture was replicated in this work and resulted in a slightly worse performance, 93.15% ± 3.08%, according to the same overlap measure. Our results confirmed that the best qualitative and quantitative results were obtained for a weight value equal to one according to Equation (1); that is, the network is trained more efficiently using a loss function based on the BCE instead of the DICE. The disagreement in the overlap measures may be due to the fact that the number of images available for our training dataset was significantly lower (200 vs. 1134). Furthermore, this segmentation approach required pre-processing the training dataset so that a single foot with a unique orientation was employed. Since a non-constrained protocol was preferred for the image acquisition, to avoid a partial cut-off, the cropping procedure was not centered within the image in the majority of cases, which could explain the lower overlap measures detected. In fact, it must be noted that the final prediction for the non-flipped foot was more accurate than for the flipped one, as can be noticed by simple visual inspection of the presented images. Nevertheless, this effect may also be caused by the smaller training dataset, since it was not reported in the mentioned study.
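A weighted BCE/DICE loss of the kind discussed above can be sketched as follows. Equation (1) is not reproduced here, so the exact form below, a convex combination with weight `w` where `w = 1` reduces to pure BCE, is an assumption about its structure, shown in plain numpy for clarity rather than in a deep learning framework.

```python
import numpy as np

# Illustrative weighted BCE / soft-DICE loss. The convex-combination form
# with weight w is an assumed stand-in for Equation (1); w = 1 gives pure BCE.

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def soft_dice_loss(p, y, eps=1e-7):
    """1 - soft DICE coefficient (differentiable surrogate of the overlap)."""
    return 1 - (2 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)

def combined_loss(p, y, w=1.0):
    """Convex combination: w * BCE + (1 - w) * soft-DICE loss."""
    return w * bce(p, y) + (1 - w) * soft_dice_loss(p, y)

y = np.array([1.0, 1.0, 0.0, 0.0])   # ground-truth labels for four pixels
p = np.array([0.9, 0.8, 0.2, 0.1])   # predicted foreground probabilities

print(combined_loss(p, y, w=1.0))    # pure BCE, the setting preferred in the text
print(combined_loss(p, y, w=0.0))    # pure soft-DICE loss
```

BCE penalizes every pixel independently and gives smooth gradients early in training, whereas the DICE term directly targets the overlap measure; that trade-off is one plausible reason the pure-BCE setting trains more efficiently here.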
There are several limitations in the present study. First, an ad-hoc method was employed for registration, and this approach is not robust enough: a slight displacement between the cameras during transportation may cause a misalignment and, in some cases, there is a clear mismatch between the IR and RGB-D images. For this reason, further research is currently in progress to assess the most accurate registration method. Second, no partial foot amputations or deformations have been considered. However, these are common and characteristic features of the intended application that may introduce morphological and functional differences in the subsequent analysis of the temperature patterns. In any case, since the segmentation process must discriminate the feet from the background regardless of whether the subjects are healthy or diabetic, the impact of these partial amputations or deformations on the segmentation process must be assessed in the future. In addition, further research is planned to provide a foot model that can serve as a standard for cases in which certain parts of the foot are missing. Image acquisition is currently in progress with the aim of enlarging the multimodal image database with healthy subjects as well as pathological subjects affected by diabetic disorders. The images from healthy subjects will be employed to establish normalcy regarding temperature patterns, while the pathological ones will serve to establish a relationship between temperature patterns and the status of the underlying diabetic condition. A larger database will improve the UPD model, whose promising performance makes it a candidate to replace the standard time-consuming manual feet segmentation. Finally, the approaches presented will facilitate the development of a robust demonstrator for future clinical trials monitoring diabetic foot patients, allowing the focus to be placed on the diagnosis task.