**1. Introduction**

Obesity is a public health crisis which afflicts one third of the U.S. population [1] and is associated with a significant risk of metabolic diseases. Metabolic diseases come with abnormal chemical reactions in the body; diabetes mellitus (DM) is one of the most common metabolic diseases. The National Health and Nutrition Examination Survey (NHANES) reveals that diabetes is becoming increasingly prevalent and the corresponding weight classes are continually increasing, with nearly half of adult diabetics considered to be obese [2]. Prediction of obese patients at risk for developing metabolic disease is of central importance. Prior research demonstrates that adipose tissue phenotypic features, also known as tissue observable characteristics, such as adipocyte size, correlate with human metabolic

disease [3–5]. That suggests the adipose size may serve as predictors of metabolic disease risk and a diagnostic and therapeutic target. Histology is the current gold standard for assessments of adipose size [3–5]. While accurate, histology requires adipose tissue biopsy, which involves surgery and microscope observation, which is invasive and labor- and cost-intensive.

Emerging biomedical photoacoustic (PA) imaging technology is based upon the detection of laser-induced thermoacoustic signals from a biological sample which can then be used to produce images reflecting the optical absorption contrast in the sample [6–8]. Due to PA signal's broad-band feature, it can reflect sample microstructures of different scopes. Recently, various methods for analyzing the PA signals in the frequency domain have been developed by several research groups [9–12], aiming to achieve quantitative evaluation of tissue microstructures. One of the methods termed as photoacoustic spectral analysis (PASA) [13–16], borrowed the basic concept from ultrasound spectral analysis [17–19]. In PASA, by quantifying the key parameters, including slope, intercept, and mid-band fit, of the linear fit to the truncated signal power spectrum within the predetermined frequency range, histological microfeatures of the optically absorbing materials in target tissues were characterized. Later, by using ultrabroad-band ultrasonic detectors, the scope of PASA was further extended from the tissue level to the cellular level [16]. Using the three parameters of the linear fit (i.e., slope, intercept, and mid-band fit) to describe the main features of the power spectrum provides a practical way of addressing the stochastic nature of the tissue microstructure, and can lead to measurements that are not only quantitative but also robust. However, despite these advantages, PASA, relying on a simple linear fit to the power spectrum within the entire predetermined frequency range, utilizes only limited information of the signal power spectrum.

Spectral analysis based on deep learning provides a potential solution to this problem. A network is established in deep learning which describes the relationships between input and output by parameters and functions. The process of training uses the data sets whose input and output are known to modify all the parameters in the network to achieve an optimal model under control of a learning rule [20]. Upon achieving optimal parameters, the network can be used to predict the corresponding output of data sets whose input is known. Varying layer numbers (numbers of processing steps) and layer sizes (numbers of parameters involved in each processing step) can provide different performance of data representation which results in varying accuracy of prediction [21]. Due to its excellent performance characteristics, such as high accuracy, deep learning methods in medical diagnosis and evaluation have been rapidly developed [22–24].

We studied the feasibility of measuring adipocyte size in human adipose tissue by analyzing PA spectrum with the help of deep learning in this research. A deep neural network was used to fit the relationship between the PA spectrum and an average adipocyte. We hypothesized that PA measurement powered by deep learning can provide accurate assessment of adipocyte size as determined by the gold standard histology. To examine the validity of this hypothesis, experiments on *ex vivo* adipose tissue specimens from diabetic and nondiabetic obese human subjects were conducted and the network was trained in different spectral sections to obtain the results.

In this paper, we first present the principle of using deep network to evaluate the average adipocyte size. And then we show the procedure of *ex vivo* experiment and details of building our data set. Next, results of using networks with different layer numbers are compared and experiments of analyzing different spectral bands and the entire spectrum are presented. Finally, we make some comparisons between our deep learning method and traditional PASA method.

#### **2. Deep Neural Network with Fully Connected Layers**

The traditional PASA method only depends on a simple linear fit with the power spectrum. Therefore, it cannot accurately represent the relationship between PA spectrum and average adipocyte size which may be complex and difficult to be fit with an intelligible mathematical expression. Compared to PASA, deep learning methods can provide a much more complicated nonlinear model to fit the relationship and achieve higher accuracy for our study. More specifically, with each A-line PA signal from a tissue, the power spectrum was computed and normalized, and then analyzed using deep learning method to fully utilize the spectral information. With the method of deep learning, the relationship between the power spectrum and adipocyte size was learned by a deep network. The power spectrum and the corresponding adipocyte size worked as the input and output of the deep network, respectively, when training and testing.

For an N-layer network, the element number of input and output of the network are I0 and ON+1. In our study, each element of the input was a power spectrum value of the PA signal while the output, with only one element, was the average adipocyte size. With the power spectrum of a PA signal fed into the first layer, all output elements of each layer functioned as the input of the next layer and the output of the last layer was the predicted average adipocyte size.

In our study, N-1 hidden layers and one output layer (the last layer) were used to fit the nonlinear relationship between power spectrum and average adipocyte size. Each layer is a fully connected layer, which means all of the input elements are connected to each output element, as shown in Figure 1. All hidden layers and output layer did a linear transformation at first and then used different functions to map linear transformation result to output. The *n*-th layer can be defined as

$$y\_j = f\left(\sum\_{i=0}^{l\_n} w\_{i,j} x\_i + b\_j\right), j = 1, 2, \dots, O\_{n\prime} \tag{1}$$

where *x* is the input of the layer (*In* elements) and *y* is the output ( *On* elements). The transfer function *f*(*x*), as shown in Figure 1b, in hidden layers was selected as tan-sigmoid function in our network, which is defined as

$$f(\mathbf{x}) = \frac{2}{1 + e^{-2\mathbf{x}}} - 1, \mathbf{x} \in ( -\infty, +\infty), \tag{2}$$

The tan-sigmoid function is continuous, smooth, and monotone, and increases quickly in the vicinity of *x* = 0 with a codomain of ( −1, 1). Here, we chose a tan-sigmoid function as the transfer function because its features match with the human adipocyte size distribution, as their values should be continuous and bounded. For output layer, transfer function *f*(*x*) was selected as a linear function so that the output can match the adipocyte size appropriately.

**Figure 1.** One layer of deep network. (**a**) Fully connected layer. All of the input elements are connected to each output element. (**b**) Obtaining *j*-th element of output from input elements.

Through setting layer number N and the input and output element number for each layer, we establish a network and use all parameters ( *w* and *b* in Equation (1)) to fit the relationship between input and output. When training, data set was divided into training set and test set. The training set was randomly picked out from the whole data set while the remains were defined as test set. All parameters ( *w* and *b* in Equation (1)) in the network were optimized gradually through iteration to minimize the difference between the predicted average adipocyte size and the ground truth value from histology for all samples in the training set. When testing, the power spectrum of test set was fed into the network with the parameters obtained through training to predict the average adipocyte

sizes. When calculating the error, we first calculated the relative error *δx* of every sample from test set, which can be defined as

$$
\delta \mathbf{x} = \frac{\Delta \mathbf{x}}{\mathbf{x}} = \frac{|\mathbf{x}\_0 - \mathbf{x}|}{\mathbf{x}},
\tag{3}
$$

where Δ*x* is absolute error. It is equal to the absolute difference between the output of the deep learning network *x*0 and the ground truth value *x* from histologic analysis. Next, the average of absolute errors and the average of relative errors of all samples from the test set were calculated to ge<sup>t</sup> the MAE (mean absolute error) and MRE (mean relative error).

#### **3. Results and Discussions**

## *3.1. Data Set Building*

An experiment on *ex vivo* human adipose tissue specimens was conducted to test our method. To build our data set for deep learning method, we studied the histological photographs of visceral adipose tissue samples collected from 28 human subjects (12 diabetic subjects and 16 nondiabetic subjects) and calculated the accurate average size for each sample. Human subjects undergoing bariatric surgery were enrolled with Institutional Review Board approval from the University of Michigan and Ann Arbor Veterans Administration Hospital. Visceral adipose tissue (VAT) from the greater omentum was collected at the beginning of operation and processed immediately. By cutting collected tissues into small specimens we obtained 110 specimens of adipose. Then each specimen was divided into two parts which were used to for microscope analysis and to acquire PA signal, respectively. Adipose tissue explants were fixed in 10% formalin, embedded in paraffin, and sectioned onto charged glass slides. Fixed hematoxylin/eosin-stained slides were imaged on an microscope and analyzed with ImageJ software by a blinded observer, as previously described [1]. An example of histological photograph is shown in Figure 2c. For each histological photograph, pixel areas of all individual cells were averaged to evaluate the adipocyte size and the size distribution of the sample in Figure 2c and its Gamma fitting, which proved to be the best fit of adipocyte size distribution in previous research [17], are shown in Figure 2d.

**Figure 2.** Experiment on *ex vivo* human adipose tissue specimens. (**a**) Schematic diagram of experimental setup for PA signal acquisition. (**b**) Photograph of human adipose tissue specimens. (**c**) Representative microphotograph of a human adipose tissue sample demonstrating adipocyte microarchitecture. (**d**) Adipocyte size distribution corresponding to (**c**).

PA signal acquisition was conducted by using the setup shown in Figure 2a. Laser light was from a tunable Optical Parametric Oscillator pumped by the second harmonic output of an Nd:YAG pulsed laser. The laser illuminated the samples at the wavelength of 1210 nm where lipid has strong optical absorption [25]. During signal acquisition, the laser illuminated the whole adipose tissue specimen, achieving an averaged light fluence of 5.6 mJ/cm<sup>2</sup> on the sample surface. Each specimen was cut into an ellipsoid-like shape; sizes shown in Table 1. A needle hydrophone was set against the samples to acquire PA signals. Through a preamplifier and a low-noise amplifier, the PA signal received by the

hydrophone was recorded with a digital oscilloscope. When recording PA signal, averaging was done by oscilloscope for 50 times for one PA signal. We listed the experiment parameters in Table 1.



Considering the limitation of our sample size, we also performed data augmentation [26] to enlarge data set size. Before each training, we first randomly picked out 90% of the 110 samples (i.e., 99 samples) as a training set and the left (i.e., 11 samples) as a test set. Next, as shown in Figure 3a, the signal of each sample was divided into three parts that shared the same length and cover effective signal so that 297 samples and 33 samples were obtained for training set and test set respectively [27]. We also added white Gaussian noise to signals of training set and another two signals were obtained for each signal so that totally 891 samples were used for training. We assumed that the average adipocyte size for each signal was equal to the average adipocyte size of the signal which generated it. In our study, we performed this data augmentation each time before training.

**Figure 3.** Data augmentation. (**a**) Raw signal was divided into three parts based on time. (**b**) Signal from (**a**) with noise and its power spectrum in log scale before and after normalization.

The power spectrum of each signal obtained through data augmentation was calculated. To reduce the effects of laser energy fluctuation on the signal, normalization was performed in the frequency domain by making magnitudes of all frequency components divided by their sum, which can be defined as

$$P\_{norm}(k) = \frac{P(k)}{\sum\_{i=1}^{N-1} P(i)}, k = 0, 1, \dots, N - 1,\tag{4}$$

where *P*(*k*) is the power spectrum of the PA signal and *Pnorm*(*k*) is the power spectrum after normalization. One example of power spectrum in log scale are shown in Figure 3b.

In this study on human adipose tissue specimens, we focused on the spectral band of 0.5 to 24.5 MHz. The minimum value of 0.5 MHz was chosen to remove the low frequency noise below 0.5 MHz which was mainly caused by the laser illumination on the surface of the hydrophone. The maximum value of 24.5 MHz was chosen because the power spectrum before normalization reached the noise level ( −40 dB) at this frequency. We aimed to test the validity and potential of deep learning method for adipocyte size evaluation based on the dataset.

#### *3.2. Most Sensitive Spectral Band*

We first tried to find out the most sensitive spectral band which can lead to accurate evaluation of average adipocyte size. Based on features of our data set, we predicted the spectral band of 12.5 to 16.5 MHz as the most sensitive. PA signals are wide-band signals and can reflect histological microfeatures of adipocyte tissues. The average adipocyte size of our data set was mainly within the range of 3600 to 6400 μm2. As depicted in Figure 2c, an adipocyte is shaped like a polygon. If we consider the adipocyte as a square, the relationship between its diagonal length *l* and area (i.e., size) *S* can be expressed as the equation *S* = *l* 2/2. Thus, the average diagonal length of our data set mainly ranged from 84.85 μm to 113.14 μm. The corresponding time *t* it takes for sound to pass through adipocyte along diagonal can be computed, using the equation *t* = *l*/*<sup>c</sup>*0, where *c*0 is the speed of sound considered as 1450 m/s (a typical value in human adipose tissue [28]). Here, we used diagonal to estimate *t* because it is the longest path for sound to pass through, which varies when size changes. Therefore, the most sensitive frequency of our data set is considered to be

$$f = \frac{1}{t} = \frac{c\_0}{d} \tag{5}$$

which ranged from 12.82 MHz to 17.09 MHz. Therefore, we first tested our deep learning method on the spectral band of 12.5 to 16.5 MHz.

We picked out the normalized magnitude of spectral band of 12.5 to 16.5 MHz as the input of the deep network; the corresponding average adipocyte size acquired from standard histology of the tissue worked as the output. The input had 80 elements, each correspondent to the magnitude over a 0.05-MHz step in the 4-MHz spectral band. To initialize and train the deep network, a MATLAB (2017, MathWorks, Natick, MA, USA) toolbox called Neural Network Toolbox was employed. When training, the Levenberg–Marquardt method [29] was selected as the optimization algorithm. As described in Section 2, the deep network had several layers and each layer was a full connected layer shown in Figure 1. The output of each layer works as the input of the next layer. Different numbers of layers can make up different networks which may show various performance. We tried on networks with different layer numbers to find the optimal layer number. For each layer number, the element number of the output of the first hidden layer was set to about half of the input element number to reduce the number of parameters in the network and for rest layers, the element number of the output decreased gradually. The network was retrained for a total of 10 times for each layer number. For each time, data augment, described in Section 3.2, was performed and the minimum, maximum, and average values of the 10 MREs of the test set were computed, the result of which is shown in Figure 4.

The depth (number of layers) of the deep network largely influences the performance of the network, as illustrated in Figure 4. Initially when the layer number increases, the network becomes deeper and a more complicated model is established to process the sample data, as a result of which, the MRE of the test set becomes smaller and the deep network is more stable as demonstrated by the smaller deviation range. However, additional layers mean more parameters of the deep network and the longer training time. In addition, problems such as vanishing gradient problem of tan-sigmoid function, overfitting, and the stronger influence of the noise on the network may appear when the network becomes too deep [30]. Therefore, when the layer number is larger than seven, further increasing the layer number leads to larger MRE and larger deviation range reflecting unstable performance. As a result, a 7-layer network turned out to be the best choice.

**Figure 4.** Training and testing using the spectral band of approximately 12.5 to 16.5 MHz with different layer numbers. The minimum, maximum, and average of MREs of test set for 10 retrainings are drawn for each layer number.

With the optimal network of seven layers which is shown in Figure 5, we also performed deep learning method on other bands of the spectral range with the same data set to test whether spectral band of 12.5 to 16.5 MHz really leads to the most accurate evaluation of average adipocyte size among the different spectral bands. The spectral range of 0.5 to 24.5 MHz was divided into twenty-one spectral bands. Each spectral band had a length of 4 MHz as well as a 3 MHz (75%) overlap with the adjacent spectral band. For each of the twenty-one spectral bands, the normalized magnitude of the 4-MHz spectral band worked as the input of the deep network, while the corresponding average adipocyte size worked as the output. Using data augmentation illustrated in Section 3.2, the deep network was trained for each band. Then, the relationship found out from the training was tested using the test set. For each of the 4-MHz spectral bands, the network was retrained for 10 times, and the average error of each time was computed. The average MRE from the 10 retrainings over different spectral bands are listed in Table 2.

**Figure 5.** The structure of deep network. The numbers of the elements of the input and the output are marked on the left of each layer in the form of "*Im*/*Om*" for the *m*-th layer. All the layers are connected in the same as shown in Figure 1. The details of layer 1 and layer 2 are presented in the right as an example.


**Table 2.** Errors from the test set over different spectral bands.

By studying each 4-MHz spectral band in the entire spectral band using the deep network, the spectral band of 12.5 to 16.5 MHz turned out to be mostly sensitive to the adipocyte size. As is shown in Figure 6, in the range of 0.5 to 16.5 MHz, spectral bands covering higher frequencies lead to smaller errors; while in the range of 16.5 to 24.5 MHz, the error becomes larger when the frequency is higher. In addition to changes in MREs, the range of errors also changes with the spectral bands as well. In the spectral band of 0.5 to 4.5 MHz, the error varies in a large range. The range of errors decreases when the spectral band moves from 0.5 to 4.5 MHz to 12.5 to 16.5 MHz. However, when the spectral bands changes from 12.5 to 16.5 MHz to 20.5 to 24.5 MHz, the range of errors also increases. As a result, the spectral range of 12.5 to 16.5 MHz leads to the minimum error, and also a much smaller range of errors. The variation of error in different spectral bands results from the adipocyte size distribution. As illustrated in Figure 2d, we used Gamma distribution to fit the adipocyte size distribution. Number of adipocytes with sizes close to average size is larger than that of adipocytes whose size is away from the average. Therefore, the spectral band which is closer to the most sensitive spectral band corresponds to adipocytes accounting for a larger proportion and makes the performance of deep learning network more accurate and stable.

**Figure 6.** Plot showing mean relative error (MRE) of test set and the corresponding spectral bands. For each spectral band, the minimum, maximum and average of MREs of test set for 10 retrainings are drawn at the center frequency.

#### *3.3. Training on the Entire Spectral Band*

In addition to the most sensitive spectral band, we also tested our network on the entire spectral band with the same data set as we used on the most sensitive spectral band and we assumed that using all frequency components could contribute to a smaller error since PA signal is broad-band signal. The lower spectral range of 0.5 to 10.5 MHz contains information of outlines of adipocytes in tissue samples. It shows the size of adipocyte group which consists of several adjacent adipocytes. Sizes of these adipocyte groups are relative to numbers and sizes of each individual adipocyte. Indicated by Equation (5), a lower frequency is correspondent to a larger size of adipocyte group which consists of more or larger adjacent adipocytes. Thus, the spectral bands in this range contribute to evaluation of average adipocyte size but with a larger MRE (more than 14%) as listed in Table 2. Higher frequency in spectral range of 10.5 to 24.5 MHz is correspondent to different adipocyte sizes estimated by Equation (5). A higher frequency is relative to a smaller adipocyte size. Therefore, different frequency components indicate adipocyte size distribution and combining spectral information of all frequency components may lead to a smaller error.

To examine our hypothesis, our network was trained on the entire spectral band to obtain the error of combining all spectral components. Based on the normalized magnitude over a 0.05-MHz step in the range of 0.5 to 24.5 MHz, calculated when testing the network on different spectral bands, we calculated the sum of every six consecutive magnitude values and obtained the normalized magnitude over a 0.3-MHz step so that 80 input elements covered the entire spectral range of 0.5 to 24.5 MHz. Then, with normalized magnitude of the entire spectral band of 0.5 to 24.5 MHz with a 0.3-MHz step as the input and the correspondent average adipocyte size as the output, a network with the structure shown in Figure 5 was retrained 10 times. We performed the data augmentation for each retraining. The error distribution of all tested samples (330 samples) from the 10 retrainings is shown in Figure 7. As listed in Table 3, an average MAE of 394.19 μm<sup>2</sup> and MRE of 9.30% was achieved, which is smaller than errors of using different spectral bands listed in Table 2. The result indicated that combining all frequency components could lead to more accurate evaluations.

**Figure 7.** Error distribution of all tested samples of 10 retrainings using the most sensitive spectral band (12.5–16.5 MHz) and the entire spectral band (0.5–24.5 MHz). (**a**) Absolute error distribution for all tested samples. Absolute errors are mainly within the range of 0 to 500 μm<sup>2</sup> when training on the entire spectral band. The distribution of the relative errors is shown in (**b**).

**Table 3.** Mean absolute errors (MAEs) and MREs of the test set in 10 retrainings using the most sensitive spectral band and entire band.


#### *3.4. Training on the Entire Spectral Band*

We also compared the performance of the deep learning method with the conventional PASA method relying on linear fitting over the entire spectral band. This conventional method was used in a previous study to process the same data set from the human adipose tissue specimens [17]. With the power spectrum obtained from each tissue specimen over a spectral range of 0.5 to 24.5 MHz, a linear fit was produced which led to a quantified spectral parameter of *slope*. We assume that the relationship between the *slope* and the average adipocyte size follows a polynomial regression model of *y* = *ax*<sup>−</sup><sup>3</sup> + *bx*−<sup>2</sup> + *cx*<sup>−</sup><sup>1</sup> + *d*, where *y* is *slope* and *x* is the predicted average adipocyte size. To examine this conventional model, we performed data augmen<sup>t</sup> described in Section 3.2 so that 891 samples were used to calculate the parameter slope and obtain the best fitting using the polynomial regression model. The parameters *a*, *b*, and *c* from the best fitting of the polynomial regression model was tested on the rest 33 samples (Section 3.2). By comparing the predicted adipocyte size with the result from the gold standard histology, the error for each sample in the test set was computed. With the results from all the samples in the test set, the MAE and MRE were quantified for the conventional method of PASA, and compared to the outcomes from deep learning. The errors of using the PASA method and deep learning method combining all frequency components were listed in Table 4. As we can see, deep learning method provided better accuracy compared to conventional PASA method.

**Table 4.** Comparison between deep learning method and photoacoustic spectral analysis (PASA) method.

