#### **1. Introduction**

Motion during signal acquisition is known to result in image blurring and can further hinder proper registration of images acquired with different modalities [1–4]. Respiratory motion compensation in tomographic imaging is often based on a gated acquisition assisted by physiological triggers, e.g., an electrocardiogram (ECG) signal. In prospective gating, the data is acquired during a limited time window when minimal or no motion occurs. Alternatively, retrospective gating correlates the acquired images with physiological triggers during post-processing [5]. More advanced retrospective approaches are based on self-gated methods, where the physiological trigger is extracted from the image data itself [6–8]. An alternative solution consists of tracking the motion of specific points and subsequently correcting for it with rigid-body transformations [9]. In some parts of the body, such as the thoracic region, non-rigid motion additionally occurs. Thus, more sophisticated models are generally required to estimate and correct for the effects of respiratory motion [10].

High-frame-rate imaging modalities can avoid motion artefacts provided that only sub-pixel displacements occur during the effective image integration time. Particularly, optoacoustic tomography (OAT) can render 2D and 3D images via excitation of an entire volume with a single laser pulse [11]. This corresponds to an effective integration time on the order of the pulse duration, typically a few nanoseconds. In this way, tissue motion can be "frozen" much more efficiently than with most other imaging modalities. OAT has found applicability in biological studies demanding high-frame-rate imaging, such as the characterization of cardiac dynamics [12], mapping of neuronal activity [13], monitoring of hemodynamic patterns in tumors [14] or visualization of freely-behaving animals [15]. Moreover, real-time imaging has been paramount in the successful translation of OAT to render motion-free images acquired in a handheld mode [16]. While motion correction in OAT may not be relevant for images rendered with a single laser pulse, acquisition of multiple frames is still required in many applications, e.g., for rendering volumetric data from multiple cross-sections or for extending the effective field of view (FOV) of a given imaging system [17]. Multiple frames are also required for multi-spectral optoacoustic tomography (MSOT) applications, where mapping of intrinsic tissue chromophores or extrinsically administered agents is achieved via spectral or temporal unmixing [18–21]. Cardiac and breathing motion can readily be captured by OAT systems running at frame rates of tens of hertz [22,23], and several approaches have been suggested to mitigate motion artefacts in applications involving multi-frame data analysis. For instance, respiratory motion gating was suggested by simultaneously capturing the animal's respiratory waveforms [24]. Motion correction was alternatively performed with 3D rigid-body transformations [25] and with free-form deformation models [26]. 
Models of body motion have also been suggested for other types of scanning-based systems [27,28]. Additionally, motion suppression could be achieved by reducing the delay between consecutive pulsed light excitations [29,30], which requires dedicated laser systems.

In this work, we demonstrate that motion rejection in OAT can effectively be performed on-the-fly, before image reconstruction. The suggested approach consists of clustering a sequence of OAT frames based on the raw time-resolved signals, without involving computationally and memory-intensive post-processing. This represents an important advantage over other known approaches operating in the image domain [31].

#### **2. Materials and Methods**

#### *2.1. Pre-Reconstruction Motion Rejection Approach*

The algorithm suggested in this work aims at motion rejection in OAT systems based on a multi-frame acquisition of time-resolved pressure signals with transducer arrays. Figure 1a schematically depicts two examples of transducer array configurations for 2D and 3D imaging, which are described in more detail in the following sections. The acquired signals are generally arranged into so-called sinograms, where every sinogram represents a single frame (Figure 1b). At a fixed transducer position, k frames (sinograms) are acquired. These frames consist of matrices with rows representing the m time samples of each signal and columns corresponding to the n transducer elements (channels) of the array. Step 1 of the algorithm consists of rearranging the k frames of the sequence into columns of a 2D matrix with m × n rows and k columns, which represents the entire sequence of frames at a fixed transducer position (Figure 1c). In the experiments performed, the number of frames acquired at each array position was chosen to adequately capture a complete breathing cycle. Step 2 of the algorithm consists of calculating the matrix of correlation coefficients between all pairs of frames (MATLAB (MathWorks Inc., Natick, MA, USA) function 'corrcoef'). An example of the calculated correlation coefficients is displayed in Figure 1d. At a fixed transducer position, time decorrelation is expected to be chiefly caused by respiratory motion. Clustering of the frames into two groups is subsequently done in Step 3 by applying the k-means method to the correlation coefficient matrix (Figure 1e, MATLAB function 'kmeans'). In Step 4, the k frames are then divided into two sets based on predetermined knowledge of the characteristic physiology of the animal under the specific anesthesia. As a rule, motion frames are fewer than static frames. Notably, when scanning at multiple transducer positions, Steps 1–4 are repeated for each transducer position. 
As an example, Figure 1f displays a comparison of 3D views of an image reconstructed from a single position of the spherical array, as obtained from the averaged selected frames and from the averaged rejected frames.
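As an illustration, the four steps above could be sketched as follows in Python/NumPy. The original implementation uses MATLAB's 'corrcoef' and 'kmeans'; the minimal two-cluster k-means below, seeded with the two least-correlated frames as initial centers, is an illustrative stand-in rather than the exact implementation:

```python
import numpy as np

def reject_motion_frames(frames, n_iter=50):
    """Cluster a sequence of sinogram frames into static and motion groups.

    frames : array of shape (k, m, n) -- k sinograms of m time samples
             by n transducer channels, acquired at one array position.
    Returns the indices of the frames selected as static (larger cluster).
    """
    k = frames.shape[0]
    # Step 1: rearrange each frame into one column of an (m*n, k) matrix.
    X = frames.reshape(k, -1).T
    # Step 2: correlation coefficients between all pairs of frames (k x k).
    C = np.corrcoef(X, rowvar=False)
    # Step 3: two-cluster k-means on the rows of the correlation matrix,
    # initialized with the two least-correlated frames.
    i, j = np.unravel_index(np.argmin(C), C.shape)
    centers = np.stack([C[i], C[j]])
    for _ in range(n_iter):
        d = np.linalg.norm(C[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = C[labels == c].mean(axis=0)
    # Step 4: keep the larger cluster -- motion frames are typically fewer
    # than static frames within a breathing cycle.
    static = int(np.bincount(labels, minlength=2).argmax())
    return np.flatnonzero(labels == static)
```

When scanning at multiple transducer positions, this routine would simply be applied once per position, mirroring the repetition of Steps 1–4 described above.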

**Figure 1.** A schematic diagram of the steps involved in the motion rejection algorithm. (**A**) Two- and three-dimensional scanning systems, (top) spiral volumetric optoacoustic tomography (SVOT) based on a spherical array of transducers and (bottom) cross-sectional optoacoustic tomography based on a full-ring array of cylindrically focused transducers. (**B**) Sequence of frames (sinograms) acquired at a single position of the scanner. (**C**) Rearrangement of the data corresponding to the entire sequence into a single matrix. (**D**) Correlation coefficients of the autocorrelation matrix of the columns in (**C**). (**E**) K-means clustering of the correlation coefficient matrix into two groups, namely, selected (static) frames and rejected (motion) frames. (**F**) Volumetric image of a blood vessel reconstructed with data from the selected versus the rejected frames.

#### *2.2. Spiral Volumetric Optoacoustic Tomography*

The spiral volumetric optoacoustic tomography (SVOT) scanner is schematically depicted in Figure 1a (top). A detailed description of the system is available elsewhere [32]. Briefly, a spherical ultrasound array of piezocomposite elements (Imasonics SaS, Voray, France) is mounted on motorized rotation and translation stages and scanned around the animal along a spiral trajectory. The array consists of 256 elements with a central frequency of 4 MHz and a −6 dB bandwidth of ~100%, arranged on a hemispherical surface with an angular coverage of 90°. The excitation light beam is guided via a fiber bundle (CeramOptec GmbH, Bonn, Germany) through a cylindrical aperture in the center of the sphere. SVOT enables imaging of the entire mouse with a nearly isotropic 3D spatial resolution in the 200 μm range [31]. In the experiments, light excitation was provided by a short-pulsed laser (<10 ns pulse duration, 25 mJ per-pulse energy and up to 100 Hz pulse repetition frequency) based on an optical parametric oscillator (OPO) crystal (Innolas GmbH, Krailling, Germany). The pulse repetition frequency of the laser was set to 25 Hz and the wavelength was maintained at 800 nm, corresponding to the isosbestic point of hemoglobin. The array was scanned over 17 angular positions separated by 15° (total angular coverage of 240° in the azimuthal direction) and 30 vertical positions separated by 2 mm (total scanning length of 58 mm; a full-body scan requires approximately 10 min). Fifty frames were captured for each position of the array, for which all signals were simultaneously digitized at 40 megasamples per second with a custom-made data acquisition system (DAQ, Falkenstein Mikrosysteme GmbH, Taufkirchen, Germany) triggered by the Q-switch output of the laser. The acquired data was then transmitted to a PC via Ethernet.
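As a quick sanity check (not part of the original description), the 40 MS/s digitization rate can be converted into an axial sampling distance, assuming a speed of sound of ~1500 m/s, a typical value for water and soft tissue:

```python
# Assumed speed of sound (~water/soft tissue); not specified for this step in the text.
speed_of_sound_m_per_s = 1500.0
sampling_rate_hz = 40e6  # 40 megasamples per second, as in the DAQ description

# Axial distance travelled by the acoustic wavefront between consecutive samples.
axial_step_um = speed_of_sound_m_per_s / sampling_rate_hz * 1e6
print(f"{axial_step_um:.1f} um per sample")
```

This yields roughly 37.5 μm per sample, comfortably below the ~200 μm spatial resolution of the system, so the temporal sampling does not limit the reconstruction.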

#### *2.3. Cross-Sectional Optoacoustic Tomography with a Ring Array*

The system layout is depicted in Figure 1a (bottom), while its detailed description is available in [33]. Briefly, the ultrasound array (Imasonics SaS, Voray, France) consists of an 80 mm diameter ring with 512 individual ultrasound detection elements having a 5 MHz central frequency and a −6 dB bandwidth of ~80%. Each element is cylindrically focused at a distance of 38 mm to selectively capture signals from the imaged cross-section. In the experiments, light excitation was provided by a short-pulsed (<10 ns pulse duration at a wavelength of 1064 nm, with ~100 mJ per-pulse energy and a 15 Hz pulse repetition frequency) Nd:YAG laser (Spectra Physics, Santa Clara, CA, USA). The laser beam was guided with a fiber bundle (CeramOptec GmbH, Bonn, Germany) having 12 output arms placed around the circumference of the ring transducer with an angular separation of 60° between the arms. As in the SVOT system, signals detected by all array elements were simultaneously digitized at 40 megasamples per second using a custom-made DAQ (Falkenstein Mikrosysteme GmbH, Taufkirchen, Germany) triggered by the Q-switch output of the laser. The data was transmitted to a PC via Ethernet. In total, 100 frames were recorded with the array positioned at two distinct regions of the mouse.
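For orientation (an estimate based on the stated acquisition parameters, not given explicitly in the text), the duration covered by such a 100-frame sequence follows directly from the pulse repetition frequency:

```python
n_frames = 100   # frames recorded per array position
prf_hz = 15.0    # pulse repetition frequency of the Nd:YAG laser

sequence_duration_s = n_frames / prf_hz
print(f"{sequence_duration_s:.1f} s")  # prints "6.7 s"
```

At typical respiratory rates of anesthetized mice (roughly 1–2 Hz, an assumed figure), such a sequence spans several breathing cycles, which is the prerequisite for the frame clustering described in Section 2.1.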

#### *2.4. Image Reconstruction and Processing*

In both scanning systems, the acquired signals were band-pass filtered (cut-off frequencies of 0.25–6 MHz for SVOT and 0.5–8 MHz for cross-sectional OAT) and deconvolved with the impulse response of the array elements before reconstruction. For SVOT, tomographic reconstructions of single volumes (15 × 15 × 15 mm³) for each scanning position of the spherical array were done with a 3D back-projection algorithm [34,35]. The volumetric images reconstructed at every transducer position were stitched together to render images with a larger (whole-body) field of view. For cross-sectional OAT, the same back-projection algorithm was modified to account for the heterogeneous distribution of the speed of sound in the mouse versus the coupling medium (water) [36]. For this, an initial image was first reconstructed by considering a uniform speed of sound corresponding to that of water (determined from the measured water temperature). The animal's surface was then manually segmented, and the reconstruction was fine-tuned by assigning a different speed of sound to the segmented tissue volume in order to optimize image quality. The processing was executed with self-developed MATLAB code. The universal image quality index (QI) was calculated for the resulting images. QI is an objective image quality measure that combines three factors: loss of correlation, luminance distortion and contrast distortion; a detailed description and an efficient MATLAB implementation were reported in [37].
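For reference, the global form of the universal quality index combines the three factors above as Q = 4σ_xy μ_x μ_y / [(σ_x² + σ_y²)(μ_x² + μ_y²)]. A minimal Python sketch of this global form is given below; the implementation in [37] evaluates the index over a sliding window, which is omitted here for brevity:

```python
import numpy as np

def universal_quality_index(x, y):
    """Global universal image quality index for two images of equal size.

    Combines loss of correlation, luminance distortion and contrast
    distortion into a single value in [-1, 1]; Q = 1 for identical images.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    # Numerator/denominator of the combined three-factor index.
    return 4 * cov * mx * my / ((vx + vy) * (mx**2 + my**2))
```

Identical reference and test images give Q = 1, and the index decreases as correlation, mean luminance or contrast diverge between the two images.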
