*Article* **A Computational Study on Temperature Variations in MRgFUS Treatments Using PRF Thermometry Techniques and Optical Probes**

**Carmelo Militello <sup>1,</sup>\*, Leonardo Rundo <sup>2,3</sup>, Fabrizio Vicari <sup>4</sup>, Luca Agnello <sup>5</sup>, Giovanni Borasi <sup>4</sup>, Salvatore Vitabile <sup>5</sup> and Giorgio Russo <sup>1</sup>**

	- <sup>3</sup> Cancer Research UK Cambridge Centre, Cambridge CB2 0RE, UK
	- <sup>4</sup> LAboratorio di Tecnologie Oncologiche (LATO), Cefalù, 90015 Palermo, Italy; fabrizio.vicari.plus@gmail.com (F.V.); giovanni.borasi@gmail.com (G.B.)
	- <sup>5</sup> Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, 90127 Palermo, Italy; luca.agnello@gmail.com (L.A.); salvatore.vitabile@unipa.it (S.V.)
	- **\*** Correspondence: carmelo.militello@ibfm.cnr.it

**Abstract:** Structural and metabolic imaging are fundamental for diagnosis, treatment and follow-up in oncology. Beyond the well-established diagnostic imaging applications, ultrasound is currently emerging in clinical practice as a noninvasive therapeutic technology. Indeed, sound waves can be used to increase the temperature inside target solid tumors, leading to apoptosis or necrosis of the neoplastic tissues. Magnetic resonance-guided focused ultrasound surgery (MRgFUS) represents a valid application of this ultrasound property, mainly used in oncology and neurology. In this paper, patient safety during MRgFUS treatments was investigated through a series of experiments on a tissue-mimicking phantom and on ex vivo skin samples, with the goal of promptly identifying unwanted temperature rises. The acquired MR images, used to evaluate the temperature in the treated areas, were analyzed to compare classical proton resonance frequency (PRF) shift techniques and referenceless thermometry methods in accurately assessing the temperature variations. We exploited radial basis function (RBF) neural networks for referenceless thermometry and compared the results against measurements from a set of interferometric optical fibers that quantified the temperature variations directly in the sonication areas. The temperature increases during the treatment were not accurately detected by the MRI-based referenceless thermometry methods, and more sensitive measurement systems, such as optical fibers, would be required. In-depth studies on these aspects are needed to monitor temperature and improve safety during MRgFUS treatments.

**Keywords:** MRgFUS; proton resonance frequency shift; temperature variations; referenceless thermometry; RBF neural networks; interferometric optical fibers

## **1. Introduction**

Image-guided thermal ablations are increasingly employed in minimally invasive treatments of patients with cancer [1–4]. In the last decades, a large number of high-intensity focused ultrasound (HIFU) [5,6] devices have been used in oncology to cover a wide range of cancer types, such as prostate [7], bone metastases [8], liver [9], breast [10], thyroid [11], uterine fibroids [12,13], liver and pancreas [14], and brain [15]; as well as psychiatric disorders [16] and essential tremor [17,18].

**Citation:** Militello, C.; Rundo, L.; Vicari, F.; Agnello, L.; Borasi, G.; Vitabile, S.; Russo, G. A Computational Study on Temperature Variations in MRgFUS Treatments Using PRF Thermometry Techniques and Optical Probes. *J. Imaging* **2021**, *7*, 63. https://doi.org/10.3390/jimaging7040063

Academic Editor: Reyer Zwiggelaar

Received: 28 January 2021; Accepted: 23 March 2021; Published: 25 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Considering the imaging modalities that currently guide HIFU treatments, two methodologies are available: (i) ultrasound-guided therapeutic focused ultrasound (USgFUS) [19,20], which uses the shift in echo timing related to the temperature variation of the treated tissues [21]; and (ii) magnetic resonance-guided focused ultrasound surgery (MRgFUS) [22], which leverages the intrinsic dependence of temperature on some fundamental parameters, such as the apparent diffusion coefficient (ADC) of water molecules, the spin–lattice relaxation time (T1), and the water proton resonance frequency (PRF) [23].

In order to evaluate the incidence and severity of adverse reactions to the USgFUS ablation of uterine fibroids, Chen et al. [24] performed a multicenter, large-scale retrospective study involving 9988 patients with uterine fibroids or adenomyosis. Even though all the required procedures were applied, including skin preparation, 26 of the patients had blisters or tangerine pericarp-like burns on their abdominal skin, and two of them required surgical removal of the necrotic tissue. In a preliminary report on bone metastasis pain-palliation therapy with MRgFUS [25], an unusual second-degree skin burn occurred on the body side opposite to the transducer position. The authors argued that this accident occurred due to a series of energetically intense sonications that may not have been totally included inside the patient's body, causing a far-field energy accumulation at the air–skin interface [26,27]. In the case of MRgFUS capsulotomy, safety and clinical efficacy need to be carefully assessed by considering issues related to skull heating [16].

With particular interest in MRgFUS, automated techniques for uterine fibroid MR image segmentation have been recently devised to improve treatment planning [28] and evaluation [29,30], thus increasing the result repeatability and reliability [31]. Importantly, the attention of manufacturers to MRgFUS treatment safety has increased in recent years; therefore, multicenter studies have been performed to propose effective solutions. For instance, a modified clinical MRgFUS fibroid therapy system, called Sonalleve (Philips Healthcare, Vantaa, Finland), was integrated with a 1.5 T magnetic resonance imaging (MRI) scanner (Achieva, Philips Healthcare, Best, The Netherlands). This system directly relied upon a skin-cooling device for the treatment of symptomatic uterine fibroids [32]. In the experiments conducted, involving eight patients, no adverse effects were reported when this cooling device was integrated with the patient table to keep the transducer–patient interface at a fixed temperature of 20 °C.

The aim of this work is to explore the sensitivity of MRI guidance to monitor the temperature increase for patient safety [26,27]. In particular, we simulated the temperature variations in a fibroid treatment on a tissue-mimicking phantom, acquiring temperature measurements using thermal imaging provided by the operating console of the MRgFUS ExAblate 2100 (Insightec Ltd., Carmel, Israel), as well as interferometric optical probes. The temperature maps were obtained using classic PRF and referenceless thermometry methods and compared against the measurements.

## **2. Materials and Methods**

In our experiments, an Insightec ExAblate 2100 HIFU transducer integrated with a Signa HDxt MRI scanner (General Electric Medical Systems, Milwaukee, WI, USA) was used. The same clinical device is employed at the Foundation Institute "G. Giglio", Cefalù (PA), Italy, for uterine fibroid treatment and bone metastasis pain-palliative therapy. This system exploits MRI to acquire temperature maps of treated tissues by quantifying the phase variation resulting from the temperature-dependent changes in the resonance frequency. The phase differences are proportional to temperature-dependent PRF shifts, thus enabling the assessment of temperature rises [33]. Temperature maps derived from MRI can be obtained using gradient recalled echo (GRE) imaging sequences. The console operator monitors the temperature rise taking into consideration: (i) the thermal map of a chosen slice (Figure 1a); and (ii) the temperature plots concerning the selected point (by means of a crosshair cursor) and a small neighboring region (Figure 1b). These methods were successfully used to model the thermal dose delivery [34] strictly related to tissue thermo-ablation [35,36]. Any unwanted temperature increase outside the "target" is due to an energy accumulation, caused by acoustic impedance discontinuities in the ultrasound wave-propagation path [37–39].
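The thermal dose mentioned above is commonly expressed as cumulative equivalent minutes at 43 °C (the Sapareto–Dewey CEM43 model). The sketch below is illustrative only and does not reproduce the ExAblate console's actual implementation:

```python
import numpy as np

def cem43(temps, dt_minutes):
    """Cumulative equivalent minutes at 43 degC (Sapareto-Dewey thermal dose).

    temps: sequence of temperature samples [degC], one per time step.
    dt_minutes: duration of each time step [min].
    R = 0.5 above 43 degC, 0.25 below (standard empirical constants).
    """
    temps = np.asarray(temps, dtype=float)
    R = np.where(temps >= 43.0, 0.5, 0.25)
    return float(np.sum(dt_minutes * R ** (43.0 - temps)))
```

For example, one minute at a constant 44 °C yields a dose of 2 equivalent minutes at 43 °C, while one minute at 42 °C yields only 0.25.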

## *2.1. MRgFUS Treatments*

The experimental measurements were carried out using the ExAblate uterine-fibroids protocol, considering a real fibroid treatment as reference.

Prior to MRgFUS treatments for uterine fibroid ablation, the patient was sedated to minimize her movements; nevertheless, she could constantly provide feedback on the perception of pain and heat during the treatment. The MR images were acquired to localize the fibroid position and to plan the treatment with the most suitable ultrasound beam path, sonication size, and number of sonications. The treatment was planned by software that analyzed the region of treatment (ROT)—i.e., the region to be covered by the ultrasound beams—and the limited energy density regions (LEDRs)—i.e., the regions containing the organs at risk (OARs). Treatment planning aims to deliver the sonications over the entire ROT while making sure that the ultrasound beam does not cross the LEDRs.

To verify the focus-position accuracy, a preliminary sonication at sublethal energy was delivered. Some MR images were acquired to detect the temperature distribution in the neighborhood of the focal point. Using an iterative procedure, the operator can modify the wave characteristics to improve the targeting accuracy and the temperature increase. The treatment was then performed by delivering sonications with lethal energy. Each sonication typically lasted 20–40 s, with a cooling time of 80–90 s between two successive sonications.

At the end of the treatment, the patient, while remaining in the position she had during treatment, underwent a diagnostic MR examination with a gadolinium-based contrast medium, aimed at evaluating the nonperfused volume (NPV), i.e., the uterine fibroid area covered by the sonications. Moreover, the skin was examined to evaluate any side effects due to the temperature increase during the treatment.

In this work, to quantify temperature increases in the interface area suffering from acoustic impedance discontinuity in the ultrasound wave-propagation path, we used a configuration composed of: (i) a standard tissue-mimicking phantom used for the daily quality assurance (DQA) routine, as previously proposed by Zucconi et al. [40]; (ii) an ex vivo porcine skin sample placed under the phantom to simulate the patient's skin; and (iii) a gel pad (between the porcine skin and the ExAblate bed). A set of interferometric probes, placed between the skin and the gel pad, was also used to monitor the skin temperature (Figure 2). We assumed that the porcine skin would respond to the temperature increases like human skin.

**Figure 2.** The realized configuration: the daily quality assurance (DQA) phantom over a skin portion. Although barely noticeable, the gel pad was placed under the skin to ensure acoustic coupling between the ExAblate bed and skin. In the left area of the image, the two interferometric probes are visible.

A ROT of 78.7 cm³ was defined inside the phantom and automatically covered by the system with 56 sonications. Neglecting absorption and attenuation in the propagation path [41], an average energy of 2353 ± 611 J can be attributed to the sonications emitted by the 208 elements of the phased-array HIFU transducer [42], for an average duration of 20.0 ± 2.9 s (with an elongated beam geometry). The cooling time was set to 85 s and the ultrasound frequency to 1.1 MHz.

The software distributed the sonications over the ROT, forming S-shaped paths, in order to prevent local overheating.

## *2.2. Optical Thermometry*

For continuous temperature monitoring during the MRgFUS sonications, MR-compatible instrumentation was required. The AccuSens interferometric signal conditioner (Opsens Inc., Québec, QC, Canada) equipped with an OTP-M birefringent crystal sensor was chosen. The main characteristics are reported in Table 1.


**Table 1.** Characteristics of the AccuSens interferometric signal conditioner.

The bottom surface of the phantom was divided into two portions: a circular crown, which was never crossed by the ultrasound, and an inner area covered by the HIFU. One of the OTP-M probes was inserted into the middle of the circular region, and the tip of another one on the boundary between these two regions (Figure 3). Using this configuration, a mask for the relative positioning of sensors and phantom on the gel pad was designed. Then, this mask was reproduced on a plastic drape included in the "patient accessory set" necessary for the treatment, since this material did not introduce any acoustical impedance discontinuity.

**Figure 3.** Probe positions relative to the ultrasound field. (**a**) 3D model with the two probes positioned; (**b**) schematics of the positioning/coupling apparatus.

## *2.3. Signal-to-Noise-Ratio Estimation*

In order to evaluate whether there was an adequate signal within the interface region to quantify temperatures, the signal-to-noise ratio (SNR) was calculated according to Gorny et al. [43]. The investigated areas were the phantom, the skin interface, and the gel pad.

Some sample MR images were evaluated; in particular, the images of the phantom relative to sonications #4 and #5 were examined. Each acquired region was characterized by an overall thickness of 16 mm, and was acquired at different locations with respect to the phantom size (circular base with a diameter of 105 mm, as shown in Figure 3).

As shown in Figure 4, the acquired region of sonication #4 (red area) ranged from +35 mm to +51 mm, while the region of sonication #5 (orange area) ranged from −14.4 mm to +1.6 mm.

**Figure 4.** The MRI acquisition locations of sonications #4 and #5.

The images related to each region were acquired in subsequent temporal frames of 3 s, which allowed us to reconstruct the temporal trend of the temperature rise for each acquired area. The SNR value was calculated according to Equation (1):

$$\text{SNR} = \frac{0.655 \cdot \mu\left(\text{Signal}_{\text{object}}\right)}{\sigma\left(\text{Signal}_{\text{background}}\right)},\tag{1}$$

where the ratio between the mean signal value (*μ*) of the object (i.e., phantom, skin, or gel pad) region of interest (ROI) and the standard deviation (*σ*) of an area containing only background noise (e.g., air) was considered. The 0.655 factor is due to the Rician distribution of the background noise in a magnitude image, which tends to a Rayleigh distribution as the SNR tends to zero [44].
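Equation (1) can be sketched in a few lines of code; the masks below are hypothetical ROIs for illustration, not the ones delineated in the study:

```python
import numpy as np

def estimate_snr(image, object_mask, background_mask):
    """SNR of a magnitude MR image following Eq. (1).

    The 0.655 factor corrects for the Rician/Rayleigh statistics
    of the background noise in a magnitude image.
    """
    signal = image[object_mask].mean()       # mu of the object ROI
    noise = image[background_mask].std()     # sigma of the background ROI
    return 0.655 * signal / noise
```

The same function applies unchanged to the phantom, skin-interface, and gel-pad ROIs by swapping `object_mask`.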

The three ROIs investigated for the SNR estimation are represented in Figure 5. The signal intensity of the phantom, the skin interface, and gel pad areas were compared to a region where the signal was ideally zero (i.e., the background ROI).

## *2.4. Referenceless Thermometry*

Classical PRF shift thermometry—in which one or more baseline images are acquired before the thermal therapy and then are subtracted pixel-by-pixel from the images acquired during heating—is affected by artifacts, which could lead to unrealistic temperature increases [13,45,46]. These temperature-independent artifacts are mainly due to movements of the anatomical region undergoing MRgFUS treatments, or to magnetic field inhomogeneities. With the goal of reducing these issues, referenceless thermometry could be used, thus allowing us to estimate the heating caused by an MRgFUS treatment without using a baseline image as temperature reference.
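The classical pixel-by-pixel PRF conversion described above can be sketched as follows; the acquisition parameters (field strength, echo time) are illustrative assumptions, not the exact protocol values used in the study:

```python
import numpy as np

# Assumed acquisition parameters (illustrative only)
GAMMA = 42.576e6   # proton gyromagnetic ratio [Hz/T]
ALPHA = -0.01e-6   # PRF thermal coefficient [1/degC], approx. -0.01 ppm/degC
B0 = 1.5           # main magnetic field strength [T]
TE = 0.015         # echo time [s]

def prf_delta_t(phase_heated, phase_baseline):
    """Classical PRF shift thermometry: pixel-wise phase subtraction
    converted into a temperature-change map [degC]."""
    # Wrap the phase difference into (-pi, pi] before scaling
    delta_phi = np.angle(np.exp(1j * (phase_heated - phase_baseline)))
    return delta_phi / (2 * np.pi * GAMMA * ALPHA * B0 * TE)
```

Note that any baseline phase error (motion, field drift) propagates directly into the temperature map, which is precisely the limitation that referenceless thermometry tries to overcome.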

With the goal of accurately estimating the temperature variations, referenceless thermometry methods were developed; in particular, we devised an interpolation method based on artificial neural networks (ANNs) to reconstruct the original baseline phase image and reliably evaluate temperature variations in the sonication area [47,48]. In fact, assuming that the phase image surrounding the treated region has a smooth trend (even under the heated area), referenceless (or self-referenced) thermometry techniques estimate the temperature variations by fitting a set of smooth, low-order polynomial functions to the surrounding phase, or to a complex magnitude image with the same phase, using a weighted least-squares fit [49]. The extrapolation of the polynomial inside the heated region is used as the background phase estimate, which is subtracted from the actual phase to evaluate the phase difference before and after the heating caused by ultrasound sonications and, subsequently, to quantify the temperature increase.

In the referenceless phase estimation, an ROI has to be delineated around the area to be heated. First of all, two regions (namely, outer and inner) must be selected in the phase image to perform the interpolation. Figure 6 shows the phase map and the outer baseline region around the sonicated area (after the removal of the inner ROI containing the heated region). It is essential to choose the outer ROI outside the heated region because the temperature changes within the ROI affect the reconstruction of the background phase.

The most straightforward computational approach to solve this problem is to fit the data with a polynomial function [50]. However, an invertible system that uniquely defines the interpolant is not guaranteed for all positions of the interpolation points, and often it could show spurious bumps. The background phase in the frame ROI is reconstructed by means of an ANN exploiting radial basis functions (RBFs) as kernel [51,52].

In particular, a 3-layer feed-forward ANN was designed (with 1 input layer, 1 output layer and 1 hidden layer) in which each hidden node implemented an RBF. ANNs are well-suited for interpolation purposes, especially if there are large areas of missing data, and the RBF approximation method allows several advantages with respect to polynomial interpolants: (i) the network training finds the optimal weights from the input to the hidden layer, and then the weights from the hidden to the output layer are calculated; and (ii) the geometry of the input points is not restricted to a regular grid.
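A minimal sketch of this referenceless scheme is given below. It uses SciPy's `RBFInterpolator` as a stand-in for the custom RBF network described here: the background phase is fitted on the outer ROI and extrapolated into the masked heated region.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def referenceless_delta_phase(phase, inner_mask, outer_mask):
    """Estimate the background phase inside the heated (inner) region
    from the surrounding (outer) baseline ring, then return the residual
    phase attributable to heating. Thin-plate-spline RBF with an
    appended linear polynomial term, as in Eq. (2)."""
    # Fit the RBF on the outer-ROI pixel coordinates and phase values
    yy, xx = np.nonzero(outer_mask)
    nodes = np.column_stack([yy, xx]).astype(float)
    rbf = RBFInterpolator(nodes, phase[outer_mask],
                          kernel='thin_plate_spline', degree=1)

    # Extrapolate the background phase into the heated region
    yi, xi = np.nonzero(inner_mask)
    background = rbf(np.column_stack([yi, xi]).astype(float))

    delta = np.zeros_like(phase)
    delta[inner_mask] = phase[inner_mask] - background
    return delta
```

On a smooth (e.g., linear) background phase with no heating, the residual inside the inner ROI is essentially zero, as expected for a correct background reconstruction.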

## Radial Basis Function Theory

Let $f : \mathbb{R}^d \to \mathbb{R}$ be a real-valued function of $d$ variables that has to be approximated by $s : \mathbb{R}^d \to \mathbb{R}$, given the values $\{f(X_i) : i = 1, 2, \dots, n\}$, where $\{X_i : i = 1, 2, \dots, n\}$ is a set of $n$ distinct points in $\mathbb{R}^d$ called the interpolation nodes. We will consider an approximation of the form:

$$s(X) = p_m(X) + \sum_{i=1}^{n} \lambda_i \phi(\|X - X_i\|_2), \ X \in \mathbb{R}^d, \ \lambda_i \in \mathbb{R},\tag{2}$$

where $p_m$ is a low-degree polynomial that can also be omitted, $\|\cdot\|_2$ denotes the Euclidean norm, and $\phi$ is a fixed function from $\mathbb{R}$ to $\mathbb{R}$. Thus, the radial basis function $s(\cdot)$ is a linear combination of translates of the single radially symmetric function $\phi(\|\cdot\|_2)$, plus a low-degree polynomial. We will denote with $\pi_m^d$ the space of all polynomials of degree at most $m$ in $d$ variables. The coefficients $\lambda_i$, which represent the weights of the approximation $s$, are determined by requiring that $s$ satisfies the interpolation conditions expressed in the following Equation (3):

$$s(X_j) = f(X_j), \ j = 1, 2, \dots, n,\tag{3}$$

together with the side conditions:

$$\sum_{i=1}^{n} \lambda_i q(X_i) = 0, \ \forall q \in \pi_m^d.\tag{4}$$

**Figure 6.** (**a**) 3D plot of a phase map with sonicated area; (**b**) 3D plot of the outer region of the phase map in (**a**) after removing the sonicated area.


Some typical conditions on the nodes under which the interpolation conditions (3) and (4) uniquely specify the radial basis function (2) are given in Table 2. In this context, "not coplanar" means that the nodes do not all lie in a single hyperplane or, equivalently, that no linear polynomial in $d$ variables vanishes at all the nodes. The surveys presented in [53,54] are excellent references for these and other properties of radial basis functions.
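Equations (2)–(4) can be solved directly as a single linear system. The sketch below uses the 2D thin-plate-spline kernel $\phi(r) = r^2 \log r$ with a linear polynomial part ($m = 1$, $d = 2$), under the "not collinear" node condition:

```python
import numpy as np

def fit_tps_rbf(nodes, values):
    """Solve Eqs. (2)-(4) for a 2D thin-plate-spline RBF interpolant
    with a linear polynomial part. nodes: (n, 2) array, not collinear."""
    n = len(nodes)
    # Kernel matrix: phi(r) = r^2 log r, with phi(0) = 0
    r = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=-1)
    with np.errstate(divide='ignore', invalid='ignore'):
        A = np.where(r > 0, r**2 * np.log(r), 0.0)
    P = np.hstack([np.ones((n, 1)), nodes])   # basis [1, y, x] of pi_1^2
    # Block system: interpolation conditions (3) plus side conditions (4)
    M = np.block([[A, P], [P.T, np.zeros((3, 3))]])
    rhs = np.concatenate([values, np.zeros(3)])
    coeffs = np.linalg.solve(M, rhs)
    lam, poly = coeffs[:n], coeffs[n:]

    def s(X):
        """Evaluate the fitted interpolant s at query points X (k, 2)."""
        rX = np.linalg.norm(X[:, None, :] - nodes[None, :, :], axis=-1)
        with np.errstate(divide='ignore', invalid='ignore'):
            phi = np.where(rX > 0, rX**2 * np.log(rX), 0.0)
        return phi @ lam + poly[0] + X @ poly[1:]
    return s
```

By construction, $s$ reproduces the node values exactly, which is the property exploited when extrapolating the background phase into the heated region.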


**Table 2.** Conditions imposed on nodes for various radial basis interpolants.

## **3. Results**

The selected ROIs were propagated through all the temporal sequences and all the depths, so the SNR value was calculated on every acquired 3D volume. As depicted in Figure 7, the MR images of sonications #4 and #5 showed impulsive noise in the area surrounding the phantom, especially at the skin interface and in the gel pad.

The signal acquired using the thermometric MRI protocol can be acceptable for aqueous tissues (such as the regions treated with MRgFUS), but unsatisfactory for fatty tissues. In fact, as widely stated in [55], the tissue-type temperature independence of the PRF shift is almost true for aqueous tissues, while the dependence in adipose tissues is affected by susceptibility effects. Consequently, the temperature sensitivity of fat is extremely low [56], indicating that MRI-based thermometry inside fatty tissues (such as the skin interface taken into account here) is difficult.

**Figure 7.** Sonications #4 (first row) and #5 (second row) morphological and thermal map examples: (**a**) and (**d**) temperature reconstruction; (**b**) and (**e**) morphological image; (**c**) and (**f**) temperature image overlapped on the morphological image. It is possible to estimate the noise in the gel pad and in the skin interface by observing the low SNR in those areas.

These insights were also confirmed by our experimental findings, which showed that the SNRs in the area near the gel pad and the porcine skin were relatively low compared to the SNR inside the phantom. Figure 8 shows that the signal was globally low in all three acquired MRI volumes. The phantom area showed a higher signal compared to the skin layer and the gel pad, where the signal appeared very poor.

**Figure 8.** SNR values calculated for the phantom, skin, and gel ROIs. The phantom signal had the highest SNR values, while the gel area and the skin-interface area had the lowest SNR values.

The treatment was performed in about 2 h. The interferometric probes under the porcine skin, positioned according to the scheme in Figure 3, recorded a large amount of temperature data. Figure 9 shows the maximum temperature rise recorded by the probes in all the sonications. This is a clear confirmation that the probes were actually placed as planned: the first probe, in the middle of the phantom, received more heat than the second one, which was in a more peripheral position with respect to the ROT.

**Figure 9.** Maximum rise of temperature in each sonication for probe 1 (blue) and probe 2 (red).

In some sonications, the measured temperature rise was barely perceptible (ΔT < 1 °C) due to the probe's relative position along the ultrasound field; this was the case in the fourth sonication (Figure 10a). In other cases, like the fifth sonication, the temperature rose by about 16 °C (Figure 10b).

**Figure 10.** Temperature measured by the optical probe 1 (blue) and probe 2 (red) during sonications #4 (**a**) and #5 (**b**).

Our analysis, coupled with the PRF-based temperature quantification provided by the ExAblate control console, employed referenceless thermometry on 2D phase map data by means of ANNs using different RBF interpolation kernels (i.e., linear, thin-plate spline, and multiquadratic) [47]. In these cases, too, it was not possible to detect meaningful temperature increases.

RBF and polynomial interpolations were applied on the data set; the former showed a "bump-like" tendency and the latter overestimated the temperature, because the analyzed area was characterized by a low signal intensity where the noise was a significant component (Figure 11).

**Figure 11.** The interpolated temperature errors compared to PRF-based temperature measurements (which does not show any significant temperature rise). The polynomial (green) line overestimated the data, while the linear RBF (blue) and multiquadratic RBF (cyan) lines had a "bump-like" trend caused by the presence of noisy data.

To assess the differences measured by the two probes, a two-sided Wilcoxon signed rank test on paired data [57] was performed with the null hypothesis that the samples came from continuous distributions with equal medians. In all the tests, a significance level of 0.05 was considered. More details are provided in what follows: (i) the difference between the distributions of the temperature increases measured by the probes (Figure 9) was statistically significant considering all the sonications (*p* = 1.719 × 10<sup>−10</sup>); (ii) the differences between the distributions of the temperature measured over time by the probes (Figure 10) were statistically significant for both sonications #4 (*p* = 2.095 × 10<sup>−24</sup>) and #5 (*p* = 6.601 × 10<sup>−44</sup>); and (iii) the polynomial interpolation (Figure 11) significantly overestimated the data (*p* = 0.031), while the linear RBF and multiquadratic RBF interpolations were not statistically different from the PRF shift data (*p* = 0.687 in both cases).
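This paired comparison can be reproduced with `scipy.stats.wilcoxon`; the probe readings below are synthetic stand-ins for the measured data, for illustration only:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired per-sonication temperature rises [degC] for 56 sonications
rng = np.random.default_rng(42)
probe1 = rng.normal(8.0, 2.0, size=56)            # probe in the middle of the ROT
probe2 = probe1 - rng.normal(3.0, 0.5, size=56)   # systematically cooler, peripheral probe

# Two-sided test: null hypothesis of equal medians for the paired samples
stat, p = wilcoxon(probe1, probe2, alternative='two-sided')
significant = p < 0.05
```

With a systematic offset between the two probes, as in the synthetic data above, the test rejects the null hypothesis at the 0.05 level.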

## **4. Discussion**

Starting from the current issues concerning patient safety related to undesired temperature variations that can cause skin burns, an MRgFUS fibroid treatment was simulated using ex vivo porcine skin and a DQA tissue-mimicking phantom. The treatment consisted of 56 ultrasound sonications, with a maximum temperature increment of ΔT = 17.78 °C reached in the 43rd sonication, as shown in Figure 9. Even though the temperature increase was obtained intentionally, through poor acoustic coupling and the interference of the probes, the results showed how difficult it is for a clinical operator to detect a possible (and naturally unwanted) temperature increase by relying only on the operating console that displays MR thermographic images. According to the study of Moritz and Henriques [58], the time–temperature relationship for this sonication is not intense enough to cause a skin burn, but the authors showed that five repetitions could lead to complete and irreversible epidermal necrosis. The same results can be obtained using more recent model-based classification approaches [59]. PRF-based temperature monitoring is not useful with this kind of tissue, which was also confirmed by using referenceless thermometry with polynomial and RBF interpolation models. This can be attributed to the small thickness of the skin in the axial and sagittal planes compared to: (i) the spatial resolution of the acquired MR images; (ii) the difficulty of capturing the skin on a coronal slice in the low-quality MR images acquired during the treatment (quality is traded off to guarantee the acquisition speed needed for real-time temperature monitoring); and (iii) the fact that the thermometry system, developed for clinical applications, is not optimized for such a purpose.
Moreover, the bump-like tendency of the RBF interpolation errors (see Figure 11) could be due to a low SNR in the analyzed area, where the noise represented a significant component while the signal was practically negligible, as shown in Figure 8.

While attempts have been made to reduce temperature increases on patients' skin through the quantification of the near-field (between the ultrasound transducer and target) heating [60], a real-time temperature monitoring could give a better control during the treatment. It might be necessary to develop novel image-processing algorithms and methods to enhance phase-map acquisition in PRF-based thermometry techniques, as well as MRI sequences with a higher pixel resolution, to improve the temperature monitoring and limit any unwanted hot spots.

## **5. Conclusions and Future Work**

In this work, the potential side effects on patient safety due to the temperature increases that rarely affect MRgFUS treatments were assessed. Along with the classical PRF shift thermometry, a novel approach exploiting a referenceless technique based on RBF interpolation was used to evaluate the skin temperature during sonications. Moreover, in this study, we also used two interferometric probes to measure the reached temperatures. In a simulation of a real uterine fibroid treatment, only the probes were able to detect temperature increases, while no important temperature changes were revealed by the interpolation methods used. The achieved results showed that these methods, based on PRF shift thermometry, could be unsuitable for detecting temperature increases on the skin.

One of the issues to consider in our analysis is the low SNR value in the investigated region. New hardware and software solutions need to be studied to increase the temperature-detection sensitivity by raising the SNR, in order to also enhance MRgFUS treatment safety and effectiveness.

In the future, more time points should be considered when measuring temperatures and temperature increases. Multiple repetitions of the experiments will increase the statistical robustness of the experimental findings.

Moreover, the planned experiments could be designed to reliably simulate a configuration for clinical environments. To address the issues related to the acoustic interference generated by the optical fibers across the ultrasound propagation path, other techniques able to accurately measure the skin temperature in real time and with a good time resolution could be employed. For instance, thermoscanners have a high temperature accuracy (±0.3 °C), a very high recognition speed (<300 ms), and a temperature range (25–45 °C) that are sufficient to evaluate skin temperature increases in real time. Some systems could also be optically coupled to monitor the skin's irradiated area in all tests. After extensive ex vivo tests, the developed systems could be employed during clinical treatments.

**Author Contributions:** Conceptualization, C.M., L.R., F.V., L.A., G.B. and G.R.; methodology, C.M., F.V., L.A. and G.R.; software, C.M., F.V. and L.A.; validation, C.M., L.R., F.V. and G.R.; formal analysis, C.M., L.R., F.V., L.A. and G.R.; investigation, C.M., L.R., F.V., L.A., G.B. and G.R.; resources, C.M. and G.R.; data curation, C.M., L.R., F.V. and L.A.; writing—original draft preparation, C.M., L.R. and F.V.; writing—review and editing, G.B., S.V. and G.R.; visualization, C.M., L.R., F.V. and L.A.; supervision, G.B., S.V. and G.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the "Sviluppo di una piattaforma tecnologica per il trattamento non invasivo di patologie oncologiche e infettive basate sull'uso di ultrasuoni focalizzati" MIUR project (PON01\_01059), approved by MIUR D.D. n. 655/Ric. This work was also supported by the "Smart Health 2.0" MIUR project (PON 04a2\_C), approved by MIUR D.D. n. 626/Ric and n. 703/Ric.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing is not applicable to this article.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


**Mario Manzo 1,\* and Simone Pellino <sup>2</sup>**


Received: 9 October 2020; Accepted: 23 November 2020; Published: 26 November 2020

**Abstract:** Malignant melanoma is the deadliest form of skin cancer and, in recent years, its worldwide incidence rate has been growing rapidly. The most effective route to targeted treatment is early diagnosis. Deep learning algorithms, specifically convolutional neural networks, are a methodology for image analysis and representation: they optimize the feature design task, which is essential for the automatic processing of different types of images, including medical ones. In this paper, we adopt pretrained deep convolutional neural network architectures for image representation in order to predict whether a skin lesion is a melanoma. Firstly, we apply a transfer learning approach to extract image features. Secondly, we use the transferred features within an ensemble classification context. Specifically, the framework trains individual classifiers on balanced subspaces and combines their predictions through statistical measures. An experimental phase on skin lesion image datasets shows the effectiveness of the proposed approach with respect to state-of-the-art competitors.

**Keywords:** melanoma detection; deep learning; transfer learning; ensemble classification

## **1. Introduction**

Among the malignant cancers, melanoma is the deadliest form of skin cancer, and its incidence rate is growing rapidly around the world. Early diagnosis is particularly important, since melanoma can be cured with a simple excision. In the majority of cases, however, the similarity between the various skin lesions (melanoma and not-melanoma) [1] makes visual analysis unsuitable and can lead to a wrong diagnosis. In this regard, image processing and artificial intelligence tools can provide fundamental aid to an automatic classification step [2]. A further improvement in diagnosis is provided by the dermoscopy technique [3], which captures illuminated and magnified images of the skin lesion non-invasively in order to highlight areas containing spots. Furthermore, the visual rendering of the deeper skin layers can be improved if the skin surface reflection is removed. Nevertheless, classifying melanoma dermoscopy images is a difficult task for several reasons. First, the degree of similarity between melanoma and not-melanoma lesions. Second, segmentation, and therefore the identification of the affected area, is very complicated because of variations in texture, size, color, shape and location. Last but not least, additional skin conditions such as hair, veins or variations due to image capture. To this end, many solutions have been proposed to improve the task. For example, low-level hand-crafted features [4] are adopted to discriminate not-melanoma and melanoma lesions. In some cases, these types of features are unable to discriminate clearly, leading to results that are sometimes not very relevant [5].

By contrast, segmentation is adopted to isolate the foreground elements from the background [6]; however, segmentation built on low-level features with low representational power provides unsatisfactory results [7]. In recent years, deep learning has become an effective solution for extracting significant features from large amounts of data. In particular, the diffusion of deep neural networks applied to image classification is connected to factors such as the availability of open-source software, the constant growth of hardware power and the availability of large datasets [8]. Deep learning has proven effective for the management, analysis, representation and classification of medical images [9]. Specifically, for the treatment of melanoma, deep neural networks have been adopted in both the segmentation and classification phases [10]. However, the high variability of melanoma types and the imbalance of the data have a decisive impact on performance [11], hindering the generalization of the model and leading to over-fitting [12]. To overcome these issues, in this paper we introduce a novel framework based on deep transfer learning and ensemble classification for melanoma detection. It works through three integrated stages: a first, which performs image preprocessing operations; a second, which extracts features using deep transfer learning; and a third, an ensemble learning layer, in which different classification algorithms and the extracted features are combined with the aim of making the best decision (melanoma/not-melanoma). Our approach provides the following main contributions:


The paper is structured as follows. Section 2 provides an overview of the state of the art in melanoma classification approaches. Section 3 describes the proposed framework in detail. Section 4 presents a wide experimental phase, while Section 5 concludes the paper.

## **2. Related Work**

In this section, we briefly analyze the most important approaches in the skin lesion recognition literature. This field includes numerous works that address the issue from different angles. Some works offer an important contribution to image representation by implementing segmentation algorithms or new descriptors; others implement complex learning and classification mechanisms.

In Reference [13], a novel boundary descriptor based on the color variation of skin lesion images, acquired with standard cameras, is introduced. Furthermore, in order to reach higher performance, a set of textural and morphological features is added. A multilayer perceptron neural network is adopted as the classifier.

In Reference [14], the authors propose a framework that implements illumination correction and feature extraction on skin lesion images acquired using normal consumer-grade cameras. By applying a multi-stage illumination correction algorithm and defining a set of high-level intuitive features (HLIFs) that quantify the degree of asymmetry and border irregularity of a lesion, the proposed model can be used to produce accurate skin lesion diagnoses.

In Reference [15], to properly evaluate the content of concave contours, the authors introduce a novel border descriptor named boundary intersection-based signature (BIBS). A shape signature is a one-dimensional representation of the shape border and cannot provide a proper description of concave borders that have more than one intersection point. For this reason, BIBS analyzes the boundary content of shapes, especially shapes with concave contours. A support vector machine (SVM) is adopted for the classification process.

Another descriptor for the characterization of skin lesions is high-level intuitive features (HLIFs) [16]. HLIFs are designed to model human-observable characteristics and capture specific traits that are significant for the given application: color asymmetry, by analyzing and clustering pixel colors; structural asymmetry, by applying Fourier descriptors of the shape; border irregularity, using morphological opening and closing; and color characteristics, by transforming the image to a perceptually uniform color space, building color-spatial representations that model the color information of a patch of pixels, clustering the patch representations into *k* color clusters, and quantifying the variance between the original lesion and the *k* representative colors.

A texture analysis method based on Local Binary Patterns (LBP) and Block Difference of Inverse Probabilities is proposed in Reference [17]. A comparison is provided with classification results obtained by taking raw pixel intensity values as input. The classification stage is achieved by an automated model combining convolutional neural networks (CNNs) and an SVM.

In Reference [18], the authors propose a system that automatically extracts the lesion regions from non-dermoscopic digital images and then computes color and texture descriptors. The extracted features are used for the automatic prediction step. The classification is managed using a majority vote over all predictions.

In Reference [19], non-dermoscopic clinical images are adopted to assist dermatologists in the early diagnosis of melanoma skin cancer. Images are preprocessed to reduce artifacts such as noise. Subsequently, they are analyzed through a pretrained CNN, a member of the deep learning family of models, trained on a large number of samples to distinguish between melanoma and benign cases.

In Reference [20], the Predict-Evaluate-Correct K-fold (PECK) algorithm is presented. The algorithm merges deep CNNs with SVM and random forest classifiers to achieve an introspective learning method. In addition, the authors provide a novel segmentation algorithm, named Synthesis and Convergence of Intermediate Decaying Omnigradients (SCIDOG), to accurately detect lesion contours in non-dermoscopic images, even in the presence of significant noise, hair and fuzzy lesion boundaries.

In Reference [21], the authors propose a novel solution to improve melanoma classification by defining a new feature that exploits the border-line characteristics of the lesion segmentation mask, combining gradients with LBP. These border-line features are used together with the conventional ones and lead to higher accuracy in the classification stage.

In Reference [22], an objective feature extraction function for CNNs is proposed. The goal is to learn variation separability, as opposed to the categorical cross-entropy, which maximizes agreement with the target labels. The deep representative features increase the variance between images, making the representation more discriminative. In addition, the idea is to build a CNN and perform principal component analysis (PCA) during the training phase.

In Reference [23], a deep learning computer-aided diagnosis system for the automatic segmentation and classification of melanoma lesions is proposed. The system extracts CNN features as well as statistical and contrast-location features from the results of raw image segmentation. The combined features are used to obtain the final classification of the melanoma as malignant or benign.

In Reference [24], the authors propose an efficient algorithm for prescreening pigmented skin lesions for malignancy using general-purpose digital cameras. The proposed method enhances borders and extracts a broad set of dermatologically important features. These discriminative features allow lesions to be classified into two groups: melanoma and benign.

In Reference [25], a skin lesion detection system optimized to run entirely on a resource-constrained smartphone is described. The system combines a lightweight method for skin detection with a hierarchical segmentation approach that includes two fast segmentation algorithms, and it proposes novel features to characterize a skin lesion. Furthermore, the system implements an improved feature selection algorithm to determine the small set of discriminative features adopted by the final lightweight system.

Multiple-instance learning (MIL)-based approaches have attracted great interest in recent years. MIL is a type of supervised learning that receives sets of instances, named bags, which are labeled as a whole. In Reference [26], the authors present an MIL approach applied to melanoma detection. The goal is to discriminate between positive and negative sets of items; the main rule is that a bag is positive if at least one of its instances is positive, and negative if all its instances are negative. Differently, in Reference [27], MIL approaches are described with the purpose of discriminating melanoma from dysplastic nevi. Specifically, the authors introduce an MIL approach that adopts spherical separation surfaces. Finally, in Reference [28], a preliminary comparison between two different approaches, SVM and MIL, is proposed, focusing on the key role played by feature selection (color and texture). In particular, the authors were inspired by the good results obtained by applying MIL techniques to the classification of medical dermoscopic images.

## **3. Materials and Methods**

In this section, we describe the proposed framework, which combines two well-known methodologies: deep neural networks and ensemble learning. The main idea is to combine feature extraction and classification algorithms. The result is a set of competing models providing a range of confidence-weighted decisions useful for making choices during classification. The framework is composed of three levels: a first, which performs preprocessing operations such as image resizing and data balancing; a second, of transfer learning, which extracts features using deep neural networks; and a third, of ensemble learning, in which different classification algorithms (SVM [29], Logistic Label Propagation (LLP) [30], KNN [31]) and the extracted features are combined with the aim of making the best decision. The adopted classifiers are trained and tested through a bootstrapping policy. Finally, the framework iterates a predetermined number of times in a supervised learning context. Figure 1 shows a graphic overview of the proposed framework.

**Figure 1.** Overview of the proposed framework.

## *3.1. Data Balancing*

Melanoma lesion analysis and classification rely on accurate segmentation to isolate the areas of the image containing information of interest. However, the wide variety of skin lesions and the unpredictable obstructions on the skin make traditional segmentation an ineffective tool, especially for non-dermoscopic images. Furthermore, the class imbalance present in many datasets makes classification difficult to address, especially when the samples of the minority class are severely underrepresented. In the case under consideration, a balancing phase was performed to compensate for the strong imbalance between the two classes. The goal was to isolate segments of the image that could contain melanoma. In particular, the minority class is resampled by adding images altered through the K-Means color segmentation algorithm [32]. Applying segmentation algorithms for image augmentation [33], and consequently to balance the classes, represented a good compromise for this stage of the pipeline.
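The balancing step can be sketched as color quantization followed by oversampling. The paper's MATLAB implementation is not shown, so the following Python sketch, with a small hand-rolled K-Means, is only illustrative of the idea of generating altered minority-class copies:

```python
import numpy as np

def kmeans_color_quantize(image, n_colors=4, n_iter=10, seed=0):
    """Return a copy of `image` with every pixel replaced by the centroid
    of its K-Means color cluster (a simple color segmentation)."""
    rng = np.random.default_rng(seed)
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    # initialize centroids from randomly chosen pixels
    centroids = pixels[rng.choice(len(pixels), size=n_colors, replace=False)]
    for _ in range(n_iter):
        # assign each pixel to its nearest centroid
        dist = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # move each centroid to the mean of its assigned pixels
        for k in range(n_colors):
            mask = labels == k
            if mask.any():
                centroids[k] = pixels[mask].mean(axis=0)
    return centroids[labels].reshape(h, w, c).astype(image.dtype)
```

Altered copies produced this way can be appended to the minority class until both classes have the same cardinality.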

## *3.2. Image Resize*

The images to be processed are resized to the dimensions required by the input layer of each deep neural network (details can be found in column 5 of Table 1). Many of the networks require this step, which does not alter the image information content. This normalization step is essential because images of different or large dimensions cannot be processed in the feature extraction stage.
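As a minimal illustration of this normalization step (the exact resize method used by the authors is not specified, so this nearest-neighbour variant is only an assumption), resizing to a network's expected input size might look like:

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array to the input size a
    network expects, e.g. 227x227 for AlexNet or 224x224 for the ResNets."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return image[rows[:, None], cols[None, :]]
```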


**Table 1.** Description of the adopted pretrained networks.

## *3.3. Transfer Learning and Features Extraction*

The transfer learning approach was chosen for feature extraction. Commonly, a pretrained network is adopted as a starting point to learn a new task: it is the easiest and fastest way to exploit the representational power of pretrained deep networks, and it is usually much faster and easier to tune a network with transfer learning than to train a new network from scratch with randomly initialized weights. We selected deep learning architectures for image classification based on their structure and performance. The goal was to extract features from images through neural networks by redesigning their final layer according to the needs of the addressed task (two output classes: melanoma and not-melanoma). Feature extraction is performed through a chosen layer (different for each network and specified in Table 1), placed in the final part of the structure. The image is encoded as a vector of real numbers produced by consecutive convolution steps, from the input layer to the layer chosen for the representation. A description of the adopted networks is reported below.
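The layer-tap idea (run the image forward only up to a chosen layer and use that activation vector as the descriptor) can be sketched with a toy stand-in network. The real features come from pretrained MATLAB models, so `TinyNet`, all sizes, and every layer name except fc7 are purely illustrative:

```python
import numpy as np

class TinyNet:
    """Toy stand-in for a pretrained network: an ordered dict of named
    layers. Features are read out at a chosen intermediate layer, the way
    fc7 (AlexNet) or a global pooling layer (ResNet) is used as an image
    descriptor."""
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        w1 = rng.standard_normal((64, 32))   # frozen "pretrained" weights
        w2 = rng.standard_normal((32, 16))
        w3 = rng.standard_normal((16, 2))
        self.layers = {
            "conv": lambda x: np.maximum(x @ w1, 0),
            "fc7":  lambda x: np.maximum(x @ w2, 0),
            "out":  lambda x: x @ w3,
        }
    def extract(self, x, layer_name):
        for name, layer in self.layers.items():
            x = layer(x)
            if name == layer_name:
                return x  # stop at the chosen representation layer
        raise KeyError(layer_name)
```

Calling `extract(image_vector, "fc7")` yields the feature vector; the "out" layer is never reached, mirroring how the classification head is bypassed during feature extraction.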

AlexNet [8] consists of 5 convolutional layers and 3 fully connected layers. It includes the non-saturating ReLU activation function, which behaves better than tanh and sigmoid during the training phase. For feature extraction, we chose the fully connected layer 7 (fc7), composed of 4096 neurons.

GoogLeNet [34] is 22 layers deep. The network is inspired by LeNet [35] but implements a novel element dubbed the inception module, based on several very small convolutions that drastically reduce the number of parameters: the architecture goes from 60 million parameters (AlexNet) to 4 million. Furthermore, it includes batch normalization, image distortions and the Root Mean Square Propagation algorithm. For feature extraction, we chose the global average pooling layer (pool5-7x7\_s1), composed of 1024 neurons.

ResNet18 and ResNet50 [36] are inspired by the pyramidal cells of the cerebral cortex. They use particular skip connections, or shortcuts, to jump over some layers. They are 18 and 50 layers deep, respectively, and this skip connection technique paved the way for residual networks. For feature extraction, we chose the two global average pooling layers (pool5 and avg-pool), composed of 512 and 2048 neurons, respectively.

## *3.4. Network Design*

The adopted networks were adapted to the melanoma classification problem. Originally, they were trained on the ImageNet dataset [37], composed of a million images classified into 1000 classes; the result is a rich feature representation for a wide range of images. The network processes an image and provides a label along with probabilities for each of the classes. Commonly, the first layer of the network is the image input layer, which requires input images with 3 color channels. Just after, convolutional layers extract the image features that the last learnable layer and the final classification layer use to classify the input image. To make the pretrained network suitable for classifying new images, these two last layers are replaced with new layers. In many cases, the last layer including learnable weights is a fully connected layer; this is replaced with a new fully connected layer whose number of outputs equals the number of classes in the new data. Moreover, to speed up learning in the new layer with respect to the transferred layers, it is recommended to increase its learning rate factors. Optionally, the weights of earlier layers can be frozen by setting their learning rates to zero. This prevents those weights from being updated during training and lowers the execution time, since the gradients of the frozen layers need not be computed; it is also very useful for avoiding overfitting on small datasets.
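The per-layer learning-rate policy described above (factor 0 to freeze transferred layers, a larger factor to speed up the new head) can be sketched as a single SGD update; the layer names and factor values here are illustrative, not the authors' settings:

```python
import numpy as np

def sgd_step(params, grads, lr, lr_factors):
    """One SGD update with per-layer learning-rate factors: factor 0
    freezes a transferred layer, a factor > 1 speeds up the newly added
    classification layer."""
    for name in params:
        factor = lr_factors.get(name, 1.0)
        if factor == 0.0:
            continue                      # frozen: gradient not applied
        params[name] -= lr * factor * grads[name]
    return params

# transferred layer frozen, new 2-class head learning 10x faster
params = {"conv1": np.ones(3), "fc_new": np.ones(2)}
grads  = {"conv1": np.ones(3), "fc_new": np.ones(2)}
sgd_step(params, grads, lr=0.01, lr_factors={"conv1": 0.0, "fc_new": 10.0})
```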

## *3.5. Ensemble Learning*

The contribution of different transfer learning features and classifiers can be mixed in an ensemble context. Considering the set of images, with cardinality *k*, belonging to *x* classes, to be classified

$$\text{Imgs} = \{i\_1, i\_2, \dots, i\_k\} \tag{1}$$

each element of the set will be treated with the procedure below. Let us consider the set *C* composed of *n* classifiers

$$\mathbb{C} = \{\beta\_1, \beta\_2, \dots, \beta\_n\} \tag{2}$$

and the set *F*, composed of *m* vectors of transfer learning features

$$F = \{\Theta\_1, \Theta\_2, \dots, \Theta\_m\}\tag{3}$$

the goal is to combine each element of the set *C* with the elements of the set *F*. The set of combinations can be defined as *CF*

$$CF = \begin{bmatrix} \beta\_1\Theta\_1 & \dots & \beta\_1\Theta\_m \\ \vdots & \ddots & \vdots \\ \beta\_n\Theta\_1 & \dots & \beta\_n\Theta\_m \end{bmatrix} \tag{4}$$

each combination provides a decision *d* ∈ {−1, 1}, where 1 stands for melanoma and −1 for not-melanoma, related to an image of the set *Imgs*. The set of decisions *D* can be defined as follows

$$D = \begin{bmatrix} d\_{\beta\_1\Theta\_1} & \dots & d\_{\beta\_1\Theta\_m} \\ \vdots & \ddots & \vdots \\ d\_{\beta\_n\Theta\_1} & \dots & d\_{\beta\_n\Theta\_m} \end{bmatrix} \tag{5}$$

Each value *d*<sub>β<sub>i</sub>Θ<sub>j</sub></sub> represents a decision based on the combination of the sets *C* and *F*. In addition, the set of scores *S* can be defined as follows

$$S = \begin{bmatrix} P(i|x)\_{d\_{\beta\_1\Theta\_1}} & \dots & P(i|x)\_{d\_{\beta\_1\Theta\_m}} \\ \vdots & \ddots & \vdots \\ P(i|x)\_{d\_{\beta\_n\Theta\_1}} & \dots & P(i|x)\_{d\_{\beta\_n\Theta\_m}} \end{bmatrix} \tag{6}$$

a score value *s* ∈ [0, 1] is associated with each decision *d* and represents the posterior probability *P*(*i*|*x*) that an image *i* belongs to class *x*. At this point, let us introduce the concept of the mode, defined as the value that occurs most frequently in a given set

$$\text{mode} = l + \left(\frac{f\_1 - f\_0}{2f\_1 - f\_0 - f\_2}\right) \times h \tag{7}$$

where *l* is the lower limit of the modal class, *h* is the size of the class interval, *f*<sup>1</sup> is the frequency of the modal class, *f*<sup>0</sup> is the frequency of the class preceding the modal class and *f*<sup>2</sup> is the frequency of the class following it. The columns of the matrix *D* are analyzed with the mode in order to obtain the most frequent decisions. This step is carried out to identify the best response of the different classifiers, contained in the set *C*, that adopt the same type of features. Moreover, the *mode* provides two indications: the most frequent value and its occurrences (indices). For each modal value, the corresponding scores of the matrix *S* are extracted. In this regard, a new vector is generated

$$DS = \{ ds\_{P(i|x)\_{d\_{\beta\_{1,\dots,n}\Theta\_1}}}, \dots, ds\_{P(i|x)\_{d\_{\beta\_{1,\dots,n}\Theta\_m}}} \}, \tag{8}$$

where each element *ds* contains the average of the scores associated with the most frequent decision, extracted through the *mode*, in the corresponding column of the matrix *D*. In addition, the modal value of each column of the matrix *D* is stored in the vector *DM*

$$DM = \{dm\_{d\_{\beta\_{1,\ldots,n}\Theta\_1}}, \ldots, dm\_{d\_{\beta\_{1,\ldots,n}\Theta\_m}}\},\tag{9}$$

the final decision consists in selecting the element of the vector *DM* at the same position as the maximum score value of the vector *DS*. This last step identifies the best prediction based on the different features adopted, essentially the features best suited to the classification of the image.
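Under the assumption that ties go to the positive class (the paper does not specify tie handling), the decision scheme of Equations (4)-(9) can be condensed into a short sketch: per-column mode of *D*, mean score of the modal voters (*DS*), and selection of the modal decision with the highest averaged score:

```python
import numpy as np

def ensemble_decide(D, S):
    """Final label from an (n classifiers x m feature sets) decision
    matrix D with entries in {-1, +1} and a matching score matrix S of
    posterior probabilities. Per column: take the modal decision (DM)
    and average the scores of the classifiers that voted for it (DS);
    return the modal decision of the column with the highest average."""
    DM, DS = [], []
    for j in range(D.shape[1]):
        col = D[:, j]
        modal = 1 if (col == 1).sum() >= (col == -1).sum() else -1  # tie -> +1
        DM.append(modal)
        DS.append(S[col == modal, j].mean())
    return DM[int(np.argmax(DS))]

# 3 classifiers, 2 feature sets: column 0 votes +1 with mean score 0.85,
# column 1 votes -1 with mean score 0.75, so the final decision is +1.
D = np.array([[ 1,  1],
              [ 1, -1],
              [-1, -1]])
S = np.array([[0.9, 0.6],
              [0.8, 0.7],
              [0.4, 0.8]])
```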

## *3.6. Train and Test Strategy: Bootstrapping*

Bootstrapping is a statistical technique that consists of creating samples of size *B*, named bootstrap samples, drawn randomly with replacement from a dataset of size *N*. This strategy has important statistical properties. The subsets can be considered as directly extracted from the original distribution, independently of each other, containing representative samples that are almost independent and identically distributed (i.i.d.). Two considerations must be made in order to validate these hypotheses. First, the size *N* of the original dataset should be large enough to capture the underlying distribution, so that sampling the original data is a good approximation of sampling the real distribution (representativeness). Second, the size *N* of the dataset should be much larger than the size *B* of the bootstrap samples, so that the samples are not too correlated (independence). Commonly, requiring the samples to be truly independent would demand far more data than is actually available; this strategy can instead be adopted to generate several bootstrap samples that can be considered nearly representative and almost independent (almost i.i.d. samples). In the proposed framework, bootstrapping is applied to the set *F* (Equation (3)) in order to perform the training and testing stages of the classifiers. This strategy proved suitable for the problem at hand, creating a competitive environment capable of providing the best performance.
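A minimal sketch of the sampling step (the sizes used here are illustrative, not the paper's):

```python
import numpy as np

def bootstrap_samples(n_data, n_samples, sample_size, seed=0):
    """Draw `n_samples` index arrays of length `sample_size`, with
    replacement, from a dataset of `n_data` items; keeping sample_size
    well below n_data keeps the samples only weakly correlated."""
    rng = np.random.default_rng(seed)
    return [rng.integers(0, n_data, size=sample_size) for _ in range(n_samples)]
```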

## **4. Experimental Results**

This section describes the experiments performed on public datasets. In order to produce comparable performance, we adopted the settings used in well-known melanoma classification methods, in which the main critical issue concerns feature extraction for image representation.

## *4.1. Datasets*

The first adopted dataset is MED-NODE (http://www.cs.rug.nl/~imaging/databases/melanoma\_naevi/), created by the Department of Dermatology of the University Medical Center Groningen (UMCG). The dataset was initially used to train the MED-NODE computer-assisted melanoma detection system [18]. It is composed of 170 non-dermoscopic images, of which 70 are melanomas and 100 are nevi. The image dimensions vary greatly, ranging from 201 × 257 to 3177 × 1333 pixels.

The second adopted dataset, hereafter Skin-lesion, is described in Reference [16]. It is composed of 206 skin lesion images obtained using standard consumer-grade cameras in varying and unconstrained environmental conditions. These images were extracted from the online public databases Dermatology Information System (http://www.dermis.net) and DermQuest (http://www.dermquest.com). Of these images, 119 are melanomas and 87 are not-melanoma. Each image contains a single lesion of interest.

## *4.2. Settings*

The framework consists of different modules written in the Matlab language. We applied available pretrained networks included in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [38]. Among all the computational stages, the feature extraction process described in Section 3.3 was certainly the most expensive: the networks contain fully connected layers that make the structure extremely dense and complex, which increases the computational load. AlexNet, GoogLeNet and ResNet50 are adopted to extract features on the MED-NODE dataset, while ResNet50 and ResNet18 are adopted for the Skin-lesion dataset. The choice is not random but was made based on two criteria: primarily, a study of the network specifications and characteristics most suitable, according to the literature, for the problem faced and, secondly, the performance obtained; other combinations did not provide the expected results. Table 1 shows some important details related to the layers chosen for feature extraction. Networks were trained by setting the mini-batch size to 5, the maximum number of epochs to 10 and the initial learning rate to 3 × 10<sup>−4</sup>, with stochastic gradient descent with momentum (SGDM) as the optimizer. For both experimental procedures, in order to train the classifiers, 80% and 20% of the images are included in the train and test sets, respectively, for 10 iterations. Table 2 enumerates the classification algorithms included in the framework and their settings (some algorithms appear multiple times with different configurations). For completeness and clarity, and in order to identify the best solution, the results of all the adopted combinations, even those that did not provide the best performance, are reported in Tables 4 and 5.
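The repeated 80/20 split protocol can be sketched as follows (the authors' exact randomization is not specified, so this is only a plausible reconstruction):

```python
import numpy as np

def repeated_splits(n_items, n_iter=10, train_frac=0.8, seed=0):
    """Ten random 80/20 train/test index splits over n_items images."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_iter):
        perm = rng.permutation(n_items)
        cut = int(train_frac * n_items)
        splits.append((perm[:cut], perm[cut:]))
    return splits
```

For MED-NODE (170 images) each iteration yields 136 training and 34 test images.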



#### *4.3. Discussion*

Table 3 shows the metrics adopted for the performance evaluation, in order to provide a uniform comparison with algorithms working on the same task.


**Table 3.** Evaluation metrics adopted during the relevance feedback stage.

Looking carefully at the table, it is important to focus on the meaning of the individual measures with reference to melanoma detection. The True Positive rate, also known as Sensitivity or Recall, concerns the portion of positive melanoma images that are correctly identified. This provides important information because it highlights the ability to identify images containing skin lesions and contributes to the robustness of the result. The same holds for the True Negative rate, also known as Specificity, which instead measures the portion of negatives, not containing skin lesions, that have been correctly identified. The Positive and Negative Predictive values, the former also known as Precision, are probabilistic measures that indicate whether an image with a positive or negative melanoma test does or does not actually contain a skin lesion. In essence, Recall expresses the ability to find all relevant instances in the dataset, while Precision expresses the proportion of the instances claimed relevant by the framework that actually are relevant. Accuracy, a well-known performance measure, is the proportion of true results among the total number of cases examined; in our case, it provides an overall, certainly rougher, analysis of the ability of a classifier to distinguish a skin lesion from an image without lesions. The *F*1-Score combines the Precision and Recall of the model as their harmonic mean in order to find an optimal blend; the harmonic mean is chosen instead of a simple mean because it penalizes extreme values. Finally, Matthews' correlation coefficient is another well-known overall quality measure. It takes into account True/False Positive/Negative values and is generally regarded as a balanced measure that can be adopted even if the classes are of very different sizes.
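With their standard definitions, the metrics above can be computed directly from confusion-matrix counts; this sketch uses the conventional formulas, which may differ in notation from Table 3:

```python
import math

def metrics(tp, fp, tn, fn):
    """Standard detection metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)                      # true positive rate / recall
    spec = tn / (tn + fp)                      # true negative rate
    prec = tp / (tp + fp)                      # positive predictive value
    npv  = tn / (tn + fn)                      # negative predictive value
    acc  = (tp + tn) / (tp + fp + tn + fn)
    f1   = 2 * prec * sens / (prec + sens)     # harmonic mean of prec, recall
    mcc  = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return dict(sensitivity=sens, specificity=spec, precision=prec,
                npv=npv, accuracy=acc, f1=f1, mcc=mcc)
```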

Tables 4 and 5 describe the comparison with existing skin cancer classification methods (we refer to the results that appear in the corresponding papers). The provided performance can be considered satisfactory compared to the competitors. In terms of accuracy, although it provides a rough measurement, we obtained the best result for MED-NODE and the second best for Skin-lesion (surpassed only by BIBS). The positive and negative predictive values give good indications of the classification ability. The true positive rate, a measure that provides greater confidence about the addressed problem, is very high for both datasets. Likewise, the true negative rate, which indicates a high degree of sensitivity to the absence of tumors within the image, is the best value for both datasets. Regarding the remaining measures, *F<sup>p</sup>* <sup>1</sup>, *F<sup>n</sup>* <sup>1</sup> and Matthews' correlation coefficient, considerable values were obtained but, unfortunately, they are not available for all competitors. We can attribute the satisfactory performance to two main aspects. First, the deep learning features, which, even if abstract, are able to represent the images best; furthermore, the framework provides multiple representation models, which constitute a different starting point than a standard approach in which a single representation is provided, and this is relevant for improving performance. A non-negligible point is that the normalization of the image size, required by the first layer of the neural network before the feature extraction phase, does not degrade performance, whereas in other cases normalization causes a loss of quality of the image content and a consequent degradation of details. On the other hand, the weak point is the computational load, even if the pretrained networks include layers with already tuned weights; the time required for training is long, but shorter than for a network trained from scratch. Second, the classification scheme, which provides multiple choices in decision making: at each iteration, the framework chooses which classifier is best suited to recognizing melanoma in the images of the proposed set. Certainly, this approach is more computationally expensive, but it produces better results than a single classifier.



**Table 5.** Experimental results on the Skin-lesion dataset.

## **5. Conclusions and Future Works**

The discrimination of melanoma from nevi has proven to be a very interesting challenge in recent years. The complexity of the task is linked to several factors, such as the large number of melanoma types or the difficulties of the digital acquisition phase (noise, lighting, angle, distance and much more). Machine learning classifiers suffer greatly from these factors, which inevitably reflect on the quality of the results. In support, convolutional neural networks provide a great contribution to both the classification and feature extraction phases. In this context, we have proposed a framework that combines standard classifiers with features extracted by convolutional neural networks through a transfer learning approach. The results support the theoretical thesis: a multiple representation of the image, compared to a single one, is a strong discrimination factor even if the adopted features are completely abstract. The extensive experimental phase has shown that the proposed approach is competitive with, and in some cases surpasses, state-of-the-art methods. The main weak point concerns the computational complexity of the feature extraction phase, which, as is known, takes a long time, especially as the amount of data to be processed grows. Future work will concern the study and analysis of convolutional neural networks still unexplored for this type of problem, the application of the proposed framework to additional datasets (such as PH<sup>2</sup> [44]) and to tasks other than melanoma detection. Finally, different dataset balancing approaches are also of interest, such as the one proposed in [45], where all the melanoma images are duplicated by adding zero-mean Gaussian noise with variance equal to 0.0001.
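The balancing scheme of [45] mentioned above can be sketched as follows; this is our own illustrative implementation (function and parameter names are assumptions), duplicating each minority-class image once with zero-mean Gaussian noise of variance 0.0001, for pixel values assumed to lie in [0, 1]:

```python
import numpy as np

def balance_with_noise(images, labels, minority_label=1, var=1e-4, seed=0):
    """Append a noisy copy of every minority-class (melanoma) image.

    The noise is zero-mean Gaussian with the given variance
    (std = sqrt(var) = 0.01 for var = 1e-4), and results are clipped
    back to the valid [0, 1] pixel range.
    """
    rng = np.random.default_rng(seed)
    images = np.asarray(images, dtype=np.float64)
    labels = np.asarray(labels)
    minority = images[labels == minority_label]
    noisy = np.clip(minority + rng.normal(0.0, np.sqrt(var), minority.shape),
                    0.0, 1.0)
    aug_images = np.concatenate([images, noisy])
    aug_labels = np.concatenate([labels,
                                 np.full(len(noisy), minority_label)])
    return aug_images, aug_labels
```

Since the added noise is an order of magnitude below typical pixel quantization steps, the duplicates remain visually faithful while doubling the minority-class count.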

**Author Contributions:** Both authors conceived the study and contributed to the writing of the manuscript and approved the final version. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no funding.

**Acknowledgments:** Our thanks go to Alfredo Petrosino, who guided us during our first steps in Computer Science, through a whirlwind of goals, ideas and, above all, love and passion for this work. We will be forever grateful to a great master.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**





**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Journal of Imaging* Editorial Office E-mail: jimaging@mdpi.com www.mdpi.com/journal/jimaging


ISBN 978-3-0365-2555-6