**2. Methods**

Given that the detected PA signals have low SNR that deteriorate the quality of PA images, de-noising operation is quite common as a pre-processing step. Theoretically, averaging signals of multiple frames can eliminate random background noise and raise the SNR. However, non-negligible time shifts among signals of different frames always exist for a variety of reasons, including the slight fluctuation of the water around the imaging target, the small movement from a living biological imaging target, and system defect (possibly random delay generated by inner circuit). To solve this issue, we adopted an effective averaging method to adjust time shifts based on cross-correlation method [14,15].

First, we calculated the time shifts (Δ *m*1, Δ *m*2, ···) between the reference frame and other frames by cross-correlation. For discrete PA signals *p*1(*n*) and *p*2(*n*), an estimate of the cross-correlation function is defined as [16]:

$$\hat{\mathcal{R}}\_{p\_1 p\_2}(m) = \begin{cases} \frac{1}{L - |m|} \sum\_{n=0}^{L-m-1} p\_1(n+m) p\_2^\*(n), & m \ge 0\\ \hat{\mathcal{R}}\_{p\_1 p\_2}(-m), & m < 0 \end{cases} \tag{1}$$

where *p*∗ 2 (*n*) is the complex conjugate of *p*2(*n*), *L* is the maximum length of *p*1(*n*) and *p*2(*n*), the normalization coefficient 1/(*L* − |*m*|) can avoid the zero bias inherent in the standard cross-correlation function. According to this equation, the value of time shift Δ *m* should equal to the value of *m* when the cross-correlation function *R*ˆ *p*1*p*2(*m*) is of maximum value.

Assuming *p*1(*n*) is the uniform reference, *p*2(*n*), the signals of all the rest frames need to be shifted to the length of their corresponding |Δ *m*| forwards or backwards based on the sign of Δ *m*. After that, shifted signals of multiple frames that all align is acquired, so we can obtain the final de-noised signal *p*<sup>ˆ</sup>(*n*) by simply averaging them all.

Mathematically, the convolution result of a single wave and positive pulses with different delay is the superposition of many same-shaped waves with different delay, and the specific size of each wave depends on the size of the corresponding positive pulse. Hence, we assumed that PA signal is the convolution result of a series of specific monopolar pulses and a single N-shape wave. With that assumption, the monopolar signal, which contains the amplitude information and time information, can theoretically be acquired through deconvolution. As we mentioned in the former section, the N-shape wave basis has defects that will reduce the quality of PA images, while the

de-convolved monopolar signals possess equally important information for imaging as raw PA signals and perform better than raw N-shape PA signals when reconstructed into images. We intended to acquire de-convolved signals to replace raw PA signals.

Generally, the discrete detected PA signal s(*n*) can be represented in convolution form as:

$$\mathbf{s}(n) = o(n) \* h(n) + n(n),\tag{2}$$

where ∗ denotes convolution, *h*(*n*) is the kernel of this signal convolution system, s(*n*) is the detected raw signal, *o*(*n*) refers to the signal which we aim to acquire and *n*(*n*) represents the additive noise. In our method, we took *h*(*n*) as a normalized single N-shape pulse which is acquired by detecting an approximately ideal optical absorbing particle, specifically, the smallest source which the PA data acquisition system can measure. For example, assuming that velocity is 1540 m/s, an L7-4 ultrasound probe whose center frequency is 5.208 MHz can detect the minimum source is approximately a diameter of 0.296 mm, so in this case, the kernel *h*(*n*) ought to be acquired by detecting the source whose size is closest to and bigger than 0.296 mm.

The deconvolution algorithm we used was the Wiener deconvolution [17]. A typical simulated signal was used for demonstration. As shown in Figure 1b, the signal was converted from a bipolar signal to monopolar signal after deconvolution. However, directly using de-convolved signal to reconstruct images creates defects. As shown in Figure 1c, when reconstructing images by conventional delay-and-sum algorithm, assuming the velocity is uniform, the value corresponding with the position of the imaging target was also added to the other positions of the same distance to the transducer element as imaging target. Normally, artifacts can be restrained as the signal has both positive part and negative part to cancel each other out. Since the de-convolved signal is monopolar, unexpected artifacts will appear around the imaging object because amplitude cannot be cancelled out when summing up as they are all positive. To alleviate this phenomenon and further optimize the images, EMD was adopted subsequent to signal deconvolution.

**Figure 1.** (**a**) Diagram of the normalized deconvolution kernel *h*(*n*) in simulation; (**b**) comparison of simulated signal before and after deconvolution; (**c**) illustration of the widespread artifacts forming process in delay-and-sum algorithm.

The principle of EMD [18], is to decompose a signal into several intrinsic mode functions (IMFs) without setting any basis function in advance, and each of them reserves the local features of the original signal with different time scales. Given that different IMFs possess different features, we were able to rebuild new signals to meet our requirements by depressing some of the IMFs and enhancing other IMFs. Specifically, we applied weight coefficients to each IMF and refactor new signal automatically under the constraints which is set in advance. In our proposed method, we apply the de-convolved signal to EMD and refactor it with two constraints, positive polarity and spectrum consistence. Given that EMD can only process one-dimensional signals at a time, a one-dimensional signal *<sup>o</sup>*1(*n*) (detected by a single transducer element of the probe) of de-convolved signal o(*n*) should be considered as an example, the procedure [19] is described in detail as follows:

Step 1: Use EMD method to decompose signal *<sup>o</sup>*1(*n*) into *k* IMFs (*k* is a changeable number) and the residue:

$$o\_1(n) = \sum\_{i=1}^{k} IMF\_i + res(n)\_\prime \tag{3}$$

Step 2: Set weight coefficient value of each IMF to refactor a set of new signals:

$$mew\_{o1(n)} = [a\_1 \ a\_2 \ \cdots \ a\_k] \times \left[ \begin{array}{c} IMF\_1 \\ \hline \ IMF\_2 \\ \vdots \\ \hline \ IMF\_k \end{array} \right] \tag{4}$$

where *a* = [*a*1 *a*2 ··· *ak*] refers to the weight coefficients of IMFs. As the number of IMFs is *k*, the number of weight coefficient is also *k* and each weight coefficient corresponds with an IMF. The value of each weight coefficient ranges from [0, 1] with step 0.1, so that the number of iteration is 11*k*. The coefficient of residue remains 1.

Step 3: Calculate two constraints in each iteration:

The first constraint is positive polarity. As the original de-convolved signal is monopolar, positive polarity is set to avoid distortion between the refactor signal and the original signal. Positive polarity is defined as:

$$p = \frac{\min(new\\_o\_1(n))}{\max(new\\_o\_1(n))} > \text{threshold},\tag{5}$$

where the threshold here should equal the minimum amplitude of *<sup>o</sup>*1(*n*) divided by maximum amplitude of *<sup>o</sup>*1(*n*). Within each iteration, the second constraint can be calculated if the refactored signal satisfies the positive polarity. Otherwise, this iteration should be skipped and to begin the next iteration.

The second constraint is spectrum consistence. As the high frequency components usually retain detailed information, we applied the spectrum consistence to refactoring a new signal, with more detail and less aliasing. Spectrum consistence is defined as:

$$\mathcal{L} = \sum\_{\omega=f\_1}^{f\_2} f(\omega) \left/ \sum\_{\omega=f\_0}^{f\_1} f(\omega), \ \omega < \frac{f\_s}{2} \right. \tag{6}$$

where *f*(*ω*) is the FFT result of refactored signal, *fs* refers to the sampling frequency. As band-limited ultrasound transducers can only receive the signal with a limited frequency band, *f*0 refers to the low cut-off frequency, *f*2 refers to the high cut-off frequency and *f*1 is the middle frequency point of the frequency band. We assume that the high frequency component and the low frequency component is divided by *f*1. Set the initial *c* = 0, and if the *c* of this iteration is bigger than the former one, the weight coefficient of this iteration should be adopted and the former *c* should be substituted by the new *c*.

Step 4: repeat the iterations in Step 2 and Step 3 until it traverses all the weight coefficients, and we can then assume that the *new*\_*o*1(*n*), which satisfies the positive polarity and obtains the biggest spectrum consistence *c* is the best refactored signal.
