**1. Introduction**

In recent years, there has been increasing interest in the development of imaging techniques that are capable of reconstructing the full-wave properties (amplitude and phase) of arbitrary electromagnetic field distributions [1–3]. While standard optical technologies, such as cameras and photodiodes, are usually sensitive to the field intensity, a large part of the sample information is encoded in the optical phase of the scattered field [4]. Interestingly, the direct detection of the field evolution is achievable at terahertz (THz) frequencies thanks to the availability of the time-domain spectroscopy (TDS) technique. TDS detection provides a time-resolved measurement of the electric field (e.g., via electro-optical sampling [5]), allowing researchers to retrieve the complex-valued dielectric function of a sample. Such a capability, coupled with the existence of specific and distinctive spectral fingerprints in the terahertz frequency range, is a critical enabling tool for advanced applications, such as explosive detection, biological imaging, artwork conservation and medical diagnosis [6–10]. However, despite the vast body of potential applications, the development of TDS devices that are capable of high-resolution imaging is still regarded as an open challenge. A typical TDS implementation relies on complex and expensive optical components that cannot be easily integrated into high-density sensor arrays [11].

To date, THz imaging mostly relies on thermal cameras, essentially the equivalent of optical cameras, which employ arrays of micro-bolometers to measure the time-averaged intensity of the THz signal. As such, they cannot be employed for time-resolved THz detection and they are insensitive to the optical phase and temporal delay of the transmitted THz field. In an attempt to develop arrays of TDS detectors, researchers have proposed two-dimensional full-wave imaging devices composed of arrays of photoconductive antennas or Shack–Hartmann sensors [12,13]. However, these devices require complex and expensive technological platforms and their practicality is still a matter of debate. Furthermore, they fundamentally sample the image information at an array of discrete, well-separated points. Hence, obtaining a high resolution can still require mechanical action on the sample.

A promising alternative to TDS imaging arrays is single-pixel imaging, or ghost imaging (GI). In these approaches, the sensor array is replaced by a single bucket detector, which collects the field scattered by the sample in response to a specific sequence of incident patterns. By correlating each acquired signal with its corresponding incident field distribution, it is possible to reconstruct the sample image [14–17]. However, despite its simplicity, the implementation of GI at terahertz frequencies is affected by the limited availability of wavefront-shaping devices (e.g., spatial light modulators) that are capable of impressing arbitrary patterns on an incident THz pulse. Following the initial experimental demonstrations with metallic masks and metamaterial devices [18,19], several research groups have proposed indirect patterning techniques for the generation of high-resolution THz patterns. One of the most successful approaches relies on the generation of transient photocarrier masks on semiconductor substrates [20–23]. In these experiments, a standard optical spatial light modulator (SLM) impresses a spatial pattern on an ultrafast optical beam. Upon impinging on a semiconductor substrate, the latter generates a distribution of carriers matching the desired pattern profile, which acts as a transient metallic mask and can be used to pattern an external THz beam. While this technique has been successfully employed to achieve THz imaging with a deeply subwavelength resolution, it is also affected by a few limitations. In particular, recent works have shown that the maximum resolution achievable with these techniques is strongly dependent on the semiconductor substrate thickness: Stantchev and coworkers [20,21], for example, demonstrated that deeply subwavelength resolutions are achievable only when considering patterning substrates with a thickness below 10 μm.

In a series of recent works, we have proposed a new imaging technique, time-resolved nonlinear ghost imaging (TIMING), which overcomes several of these limitations [24–26]. TIMING relies on the integration of nonlinear THz pattern generation with TDS single-pixel field detection. In this work, we discuss the main features of our approach and present our latest results on the theoretical framework underlying our image reconstruction process. Via analysis of the compression properties of the incident pattern distribution, we show how a TIMING implementation based on an optimised Walsh–Hadamard encoding scheme can significantly reduce the number of incident patterns required to obtain a high-fidelity image of the sample. Finally, we discuss how the development of ultra-thin THz emitters can provide a significant improvement to the imaging performance of TIMING.

#### **2. Time-Resolved Nonlinear Ghost Imaging: A Conceptual Overview**

A conceptual schematic of our imaging setup is shown in Figure 1a. A spatial pattern is impressed on the optical beam through a binary spatial light modulator, e.g., a digital micromirror device (DMD), obtaining the optical intensity distribution *I*<sup>opt</sup><sub>*n*</sub>(*x*, *y*, ω). The THz patterns *E*<sup>0</sup><sub>*n*</sub>(*x*, *y*, *t*) are generated through the nonlinear conversion of *I*<sup>opt</sup><sub>*n*</sub>(*x*, *y*, ω) in a quadratic nonlinear crystal (ZnTe) of thickness *z*<sub>0</sub>. The THz pattern propagates across the crystal and interacts with the object, yielding a transmitted field, which is collected by a TDS detection setup. Differently from the standard formulations in optics, which rely on the optical intensity, our object reconstruction scheme relies on the time-resolved detection of the electric field scattered by the object. More specifically, the electric field distribution is defined immediately before and after the object as *E*<sup>−</sup>(*x*, *y*, *t*) = *E*(*x*, *y*, *z*<sub>0</sub> − ε, *t*) and *E*<sup>+</sup>(*x*, *y*, *t*) = *E*(*x*, *y*, *z*<sub>0</sub> + ε, *t*), respectively, where *z*<sub>0</sub> is the object plane and ε > 0 is an arbitrarily small distance (Figure 1a, inset). Without loss of generality, the transmission properties of the object are represented by a transmission function *T*, which is defined on both the spatial and temporal coordinates to account for the spectral response of the sample. To simplify our analysis, in the following, we considered two-dimensional objects, i.e., we restricted ourselves to transmission functions of the form *T*(*x*, *y*, *t*). Under this assumption, the transmitted field is straightforwardly defined as:

$$E^{+}(x, y, t) = \int \mathrm{d}t'\, T(x, y, t - t')\, E^{-}(x, y, t'). \tag{1}$$

The objective of a single-pixel imaging methodology is to reconstruct the transmitted field distribution *E*<sup>+</sup>(*x*, *y*, *t*) through a sequence of measurements to retrieve the transmission function of the object. In our approach, this corresponds to measuring the TDS trace of the spatially averaged field transmitted by the object in response to a sequence of predefined patterns (a procedure known as computational ghost imaging) [27]. The *n*th pattern is denoted by *E*<sup>−</sup><sub>*n*</sub>(*x*, *y*, *t*) = *P*<sub>*n*</sub>(*x*, *y*) *f*(*t*), where *P*<sub>*n*</sub>(*x*, *y*) is the deterministic spatial distribution of the pattern and *f*(*t*) is the temporal profile of the THz pulse. The reconstruction process is defined as follows:

$$T(x, y, t) = \langle C_n(t)\, P_n(x, y) \rangle_n - \langle C_n(t) \rangle_n \langle P_n(x, y) \rangle_n, \tag{2}$$

where ⟨· · ·⟩<sub>*n*</sub> represents an average over the pattern distribution and the expansion coefficients *C*<sub>*n*</sub>(*t*) are defined as follows:

$$C_n(t) = \int \mathrm{d}x\, \mathrm{d}y\; E_n^{+}(x, y, t) = \int \mathrm{d}x\, \mathrm{d}y\, \mathrm{d}t'\; T(x, y, t - t')\, E_n^{-}(x, y, t'). \tag{3}$$

A numerical implementation of the image reconstruction process is shown in Figure 1b,c, where we employed TIMING to reconstruct the transmitted field from a semi-transparent sample (a leaf). In Figure 1b, we report the spatial average of the reconstructed field, exhibiting the characteristic temporal profile of the incident THz pulse. Since our image reconstruction operates simultaneously in time and space, it retrieves not only the spatial distribution of the object but also its temporal/spectral features. The result of a TIMING scan is thus a spatiotemporal image of the transmitted field, as shown in Figure 1c.
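The correlation reconstruction of Equations (2) and (3) can be sketched numerically as follows. This is an illustrative Python/NumPy example with arbitrary parameters, not the implementation used in this work; for simplicity, it assumes a time-local binary object and random binary illumination patterns:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                          # image side in pixels (illustrative)
T = np.zeros((N, N))
T[2:6, 2:6] = 1.0              # toy object: a square aperture

# Random binary illumination patterns P_n(x, y), as produced by a DMD
M = 20000                      # number of patterns (oversampled for clarity)
P = rng.integers(0, 2, size=(M, N, N)).astype(float)

# Bucket signals: spatially averaged transmission for each pattern, Eq. (3)
C = np.einsum('nxy,xy->n', P, T)

# Correlation reconstruction, Eq. (2): <C_n P_n>_n - <C_n>_n <P_n>_n
T_rec = np.einsum('n,nxy->xy', C, P) / M - C.mean() * P.mean(axis=0)
```

The reconstruction is proportional to the object transmission up to an affine scaling; with the deterministic Walsh–Hadamard patterns discussed in Section 3, the same correlation becomes an exact orthogonal expansion.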

An interesting question is whether the distance between the distribution of THz sources and the sample has any effect on the image reconstruction capability of our setup. This point is pivotal when time-resolved imaging is desired, as propagation always induces space–time coupling. This condition represents a typical challenge in mask-based ghost imaging when time-domain detection is sought. The propagation within the patterning crystal is known to lead to significant reconstruction issues when considering deeply subwavelength patterns [20–22]. These issues are related to the intrinsic space–time coupling that takes place within the crystal [28]. In essence, once the patterns are impressed on the THz wave at the surface of the crystal (at *z* = 0), they undergo diffraction. As a result, the electric field distribution *E*<sup>−</sup><sub>*n*</sub>(*x*, *y*, *t*) probing the sample is not the initial distribution *E*<sup>0</sup><sub>*n*</sub>(*x*, *y*, *t*), but rather a space–time propagated version of it. The latter is mathematically expressed as:

$$E_n^{-}(x, y, t) = E_n(x, y, z_0 - \varepsilon, t) = \int \mathrm{d}x'\, \mathrm{d}y'\, \mathrm{d}t'\; G(x - x', y - y', z_0 - \varepsilon, t - t')\, E_n^{0}(x', y', t'), \tag{4}$$

where *G*(*x*, *y*, *z*<sub>0</sub> − ε, *t*) is the dyadic Green's function propagating the field from *z* = 0 to *z* = *z*<sub>0</sub> − ε. Since space–time coupling is essentially a linear process, it can be inverted by applying a Wiener filter to the reconstructed image to mitigate the effects of diffraction. In the angular spectrum coordinates (*k*<sub>*x*</sub>, *k*<sub>*y*</sub>, *z*, ω), the Wiener filter is defined as:

$$W(k_x, k_y, z, \omega) = \frac{G^{\dagger}(k_x, k_y, z, \omega)}{\left|G(k_x, k_y, z, \omega)\right|^2 + \alpha\, \mathrm{NSR}(k_x, k_y, \omega)}, \tag{5}$$

where NSR(*k*<sub>*x*</sub>, *k*<sub>*y*</sub>, ω) is the spectral noise-to-signal distribution, α is a noise-filtering fitting parameter and † stands for Hermitian conjugation [24]. As expressed by Equation (5), the Wiener filter is the equivalent of an inverse Green's function operator that is modified to take into account the presence of noise in the experimental measurements. The effect of the NSR term in the denominator, which is controlled by the parameter α, is to suppress the regions of the spectrum that are dominated by noise and would otherwise render the inversion an intrinsically ill-posed problem [29].
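A monochromatic, scalar sketch of this backpropagation step is given below. This is illustrative Python/NumPy code with arbitrary parameters: it replaces the full dyadic Green's function with the scalar free-space angular-spectrum transfer function and assumes a flat noise-to-signal term, so it should be read as a conceptual example of Equation (5) rather than the actual reconstruction pipeline:

```python
import numpy as np

# Illustrative parameters (not taken from the experiments)
N, dx = 128, 50e-6             # grid size and pixel pitch [m]
lam = 300e-6                   # wavelength of a 1 THz wave [m]
z = 200e-6                     # propagation distance [m]
k0 = 2 * np.pi / lam

kx = 2 * np.pi * np.fft.fftfreq(N, d=dx)
KX, KY = np.meshgrid(kx, kx, indexing='ij')
kz = np.sqrt((k0**2 - KX**2 - KY**2).astype(complex))  # imaginary for evanescent waves

G = np.exp(1j * kz * z)        # angular-spectrum transfer function

# Eq. (5) with a flat alpha * NSR term
alpha_nsr = 1e-3
W = np.conj(G) / (np.abs(G)**2 + alpha_nsr)

# Forward propagation (blur) of a toy pattern, then Wiener backpropagation
E0 = np.zeros((N, N)); E0[48:80, 48:80] = 1.0
E_blur = np.fft.ifft2(np.fft.fft2(E0) * G)
E_rec = np.fft.ifft2(np.fft.fft2(E_blur) * W)
```

Note that the combined filter *W·G* = |*G*|²/(|*G*|² + αNSR) never exceeds unity, which is exactly the regularising behaviour described above: propagating components are restored while deeply evanescent (noise-dominated) components are suppressed instead of being amplified.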

**Figure 1.** Conceptual description of time-resolved nonlinear ghost imaging (TIMING). (**a**) Schematic of the experimental setup. (**b**,**c**) Simulation of the TIMING reconstruction of a semi-transparent sample, including the average field transmission (panel b) and the full spatiotemporal image of the sample (panel c). The simulated object size was 10.24 cm × 10.24 cm, sampled with a spatial resolution of 512 × 512 pixels (Δ*x* = 200 μm) and a temporal resolution of Δ*t* = 19.5 fs. The nonlinear crystal thickness was *z*<sup>0</sup> = 10 μm. n.u.: normalised units, TDS: Time-domain spectroscopy.

From a physical point of view, Equations (4) and (5) can be read as follows: when performing a time-domain reconstruction of the image, the spatial distribution of *E*<sup>+</sup><sub>*n*</sub>(*x*, *y*, *t*) is acquired at a given time. However, this is not the scattered field from the object in response to the incident pattern *E*<sup>0</sup><sub>*n*</sub> at that time; there is no time at which the initial pattern *E*<sup>0</sup><sub>*n*</sub>(*x*, *y*, *t*) is univocally represented in the sampling field *E*<sup>−</sup><sub>*n*</sub>. The reason is simply that the method slices a fixed-time contribution of a piece of information that is warped in space–time. This warping is introduced by the distance between the sources and the object plane; therefore, it is different for any plane of the object being imaged.

Said differently, using fixed-time images to reconstruct planar features produces a fundamentally incorrect picture of the evolving scattered field, with different degrees of "distortion" introduced by the amount of propagation. It is worth noting that, although related, this is not the same concept as the resolution degradation of incoherent near-field systems. In fact, Equation (4) shows that any space–time information retained by the field can be accessed only by accounting for near-field propagation. TIMING faithfully reconstructs the image of the field scattered by the object by applying the backpropagation kernel from Equation (5). Another interesting aspect is whether the thickness of the nonlinear crystal imposes an overall separation between the terahertz sources and the object, thereby limiting the achievable resolution. The difference here is that the generation process is inherently nonlinear: although the generated terahertz signal diffracts linearly, for any desired resolution, there is always a section of the generating crystal that is sufficiently close to the object to illuminate it within the required near-field condition. We have recently demonstrated, both theoretically and experimentally, that the diffraction limit does not directly apply to nonlinear GI through the generation crystal thickness, since the nonlinear conversion from optical to THz patterns is a process distributed across the crystal [25]. We argue that this general approach is particularly useful when considering samples stored in cuvettes or sample holders.

#### **3. Compressed and Adaptive Sensing Applications**

In this section, we discuss the image reconstruction performance of TIMING as a result of our particular choice of input pattern distribution. To reconstruct the sample, TIMING relies on the Walsh–Hadamard (WH) image decomposition, which constitutes the binary counterpart of standard Fourier-based image analysis [30]. In our approach, the choice of the incident pattern distribution was driven by three considerations: (i) the compatibility with the available wavefront-shaping technology impressing patterns on the optical beam, (ii) the average signal-to-noise ratio (SNR) of the signal associated with each incident pattern and (iii) the energy compaction (compressibility) properties of the image expansion basis. The WH patterns can be implemented straightforwardly through a digital micromirror device (DMD) and they are known to maximise the SNR of the acquired signals in experiments [31,32]. This is a significant advantage when compared to standard TDS imagers, which rely on a raster-scan reconstruction approach, where either the source or receiver (or both) are sequentially moved across the sample, leading to a combination of single-pixel detection and illumination [10]. While this approach is intuitive and straightforward to implement, a single-pixel illumination usually implies a degradation of the SNR of the expansion coefficients for a fixed intensity per pixel. Furthermore, raster-scan imaging is a local reconstruction algorithm that is not suitable for compressed sensing; in mathematical terms, the raster scan corresponds to expanding the sample image in the canonical Cartesian basis *E*<sub>*n*,*m*</sub>(*x*, *y*) = δ(*x* − *x*<sub>*n*</sub>)δ(*y* − *y*<sub>*m*</sub>). Put simply, to reconstruct the entire image with this approach, each pixel composing it needs to be scanned.

In contrast, the WH encoding scheme is a popular example of an energy-compacting (compressive) decomposition, as in the case of Fourier-based or wavelet-based image analysis [33,34]. In these approaches, the image is represented in an orthogonal basis of extended spatial functions. For example, in the case of Fourier image analysis, the sampling patterns are the basis functions of the two-dimensional Fourier transform [29,35]. The choice of an expansion basis composed of extended patterns has two main advantages. First, extended patterns are generally characterised by transmitted fields with higher SNRs because distributed sources generally carry more power. In fact, for a given power limit per pixel, the Walsh–Hadamard decomposition allows for a total energy per pattern that is approximately *N*/2 times higher than that of single-pixel illumination. Second, and more importantly, there is no one-to-one correspondence between individual image pixels and distinct measurements (as in the case of the raster scan). In fact, the incident patterns not only probe different parts of the sample in parallel but can also provide useful insights into its spatial structure, even before completing the entire set of illuminating patterns.

In practical terms, a WH pattern of size *N* × *N* is obtained by considering the tensor product between the columns (or, equivalently, rows) of the corresponding *N* × *N* Walsh–Hadamard matrix (see Figure 2a). The columns (or rows) are mutually orthogonal and form a complete tensor basis for any two-dimensional matrix. Interestingly, the columns of the Hadamard matrix can be re-arranged in different configurations, leading to matrices with different orderings [36–38]. In Figure 2, we compare two configurations: the Walsh (or "sequency") order and the Hadamard (or "natural") order. The Walsh ordering is particularly useful in image reconstruction as it mirrors the standard ordering of the discrete Fourier basis, i.e., the columns are sorted in terms of increasing spatial frequency. This means that by using the Walsh matrix, it is possible to acquire complete lower-resolution images before completing the illumination set, which can be useful for applying decisional approaches and reducing the set dimension [39,40].
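The two orderings can be constructed as follows. This is an illustrative sketch (the helper names `hadamard_natural` and `walsh_sequency` are our own, not from the original work), building the natural-order matrix via the Sylvester construction and re-sorting its rows by their number of sign changes to obtain the sequency order:

```python
import numpy as np

def hadamard_natural(n):
    """Sylvester construction: natural ('Hadamard') order; n must be a power of two."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def walsh_sequency(n):
    """Re-sort the rows by their number of sign changes ('sequency' order)."""
    H = hadamard_natural(n)
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

# A 2D pattern is the tensor (outer) product of two rows of the matrix
W = walsh_sequency(8)
P_23 = np.outer(W[2], W[3])    # pattern indexed by rows 2 and 3
```

In sequency order, row *k* exhibits exactly *k* sign changes, mirroring the increasing-frequency ordering of the discrete Fourier basis; the ±1 entries are mapped to the on/off states of the DMD mirrors in practice.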

**Figure 2.** Walsh–Hadamard image reconstruction. (**a**) Generation of incident patterns from the Walsh–Hadamard matrix. Each pattern is defined as the tensor product between two columns of the generating matrix. The patterns can be generated from different configurations of a Hadamard matrix: we show the Walsh, or "sequency", order (top, used in TIMING) and the standard Hadamard, or "natural", order (bottom). (**b**,**c**) Reconstructed Walsh spectrum of the peak-field object transmission. Interestingly, only a fraction of the patterns (8.1%) were associated with a spectral amplitude exceeding the −60 dB threshold (with 0 dB being the energy correlation of the fittest pattern—panel c). Nevertheless, these patterns were sufficient to provide a high-fidelity reconstruction of the image (insets). (**d**,**e**) Pearson correlation coefficients between reconstructed and original images as a function of the number of patterns employed in the reconstruction. The results refer to the entire scan (panel d) and the initial 10% of patterns (panel e).

To illustrate how the image information is distributed across the basis of incident patterns, it is useful to analyse the peak-field Walsh spectrum of the reconstructed image, which is shown in Figure 2b. The WH spectrum is obtained by plotting the *C*<sub>*n*</sub>(*t* = *t*<sub>peak</sub>) coefficients as a function of their generating pattern indices. As can be evinced from Figure 2b, the WH decomposition re-organises the image information into a hierarchical structure, which mirrors the spectral content of the image. Interestingly, this property is at the core of the compression properties of the WH encoding scheme, as it can be exploited to significantly reduce the number of measurements required to reconstruct the image. We illustrate this result in Figure 2c, where we identify the coefficients with an amplitude exceeding a −60 dB threshold with a red marker. As shown in Figure 2c, these significant coefficients were mostly localised in correspondence with the smaller spatial frequencies of the image, and for this image, they represented 8.1% of the total number of patterns. Remarkably, this limited number of patterns was sufficient to accurately reconstruct the image (as shown in Figure 2c, inset).
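The thresholding procedure can be sketched as follows. This is an illustrative Python/NumPy example on a synthetic smooth object (the 8.1% figure quoted above refers to the actual sample, not to this toy case): the image is expanded in the WH basis, coefficients below −60 dB of the strongest one are discarded, and the image is re-synthesised from the surviving patterns alone:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

N = 32
H = hadamard(N).astype(float)

# Toy image: a smooth blob, so the energy compacts into few coefficients
x = np.linspace(-1, 1, N)
X, Y = np.meshgrid(x, x)
img = np.exp(-(X**2 + Y**2) / 0.2)

# Full WH spectrum: since H @ H.T = N * I, the inverse transform is H.T (.) H / N^2
C = H @ img @ H.T / N**2

# Keep only coefficients whose amplitude exceeds a -60 dB threshold
mask = 20 * np.log10(np.abs(C) / np.abs(C).max() + 1e-300) > -60.0
img_rec = H.T @ (C * mask) @ H

kept = mask.mean()             # fraction of patterns actually required
```

For smooth, energy-compacted images, `kept` is well below unity while the re-synthesised image remains essentially indistinguishable from the original, which is the mechanism exploited by the compressed acquisition described above.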

For a given Walsh–Hadamard matrix, it is also critical to consider the specific order employed when selecting the sequence of columns forming the distribution of incident patterns. In our approach, we implemented an optimised ordering of the WH patterns (denoted as "smart-Walsh"), which sorts the incident patterns in terms of increasing spatial frequency (see Supplementary Video 1). In Figure 2d,e, we illustrate the fidelity of the TIMING reconstruction across the ensemble of incident patterns for different sorting schemes. The fidelity between reconstructed and original images is estimated through the Pearson correlation coefficient, which measures the spatial correlation between the two datasets and is defined as:

$$\rho(A, B) = \frac{\sum_{mn} (A_{mn} - \overline{A})(B_{mn} - \overline{B})}{\sqrt{\sum_{mn} (A_{mn} - \overline{A})^2 \cdot \sum_{mn} (B_{mn} - \overline{B})^2}}, \tag{6}$$

where *A* and *B* are the reconstructed and original images, respectively, and the overbars denote their spatial averages. In our analysis, we compared the performance of our "smart-Walsh" sorting (blue line) with the natural Hadamard sorting (yellow line) and the recently proposed "Russian-doll" sorting (orange line) [38]. As shown in Figure 2d, both the smart-Walsh and the Russian-doll sortings were capable of high-fidelity reconstructions of the sample image, even when using just a fraction of the patterns, especially when compared to the standard Hadamard case. Further insights into the image reconstruction performance can be obtained by analysing the image reconstruction across the first 10% of patterns (Figure 2e). Remarkably, both our approach and the Russian-doll sorting outperformed the standard Hadamard sorting, yielding a high-fidelity image (spatial correlation exceeding 90%) while considering only 0.1% of the total number of patterns. Interestingly, while the performance of our "smart-Walsh" approach matched the Russian-doll sorting as soon as each Hadamard order was completed (dashed grey lines), we observed that it outperformed it across incomplete scans.
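Equation (6) translates directly into code (an illustrative sketch; it is numerically equivalent to `numpy.corrcoef` applied to the flattened images):

```python
import numpy as np

def pearson(A, B):
    """Spatial (Pearson) correlation between two images, as in Eq. (6)."""
    A = np.asarray(A, dtype=float) - np.mean(A)
    B = np.asarray(B, dtype=float) - np.mean(B)
    return (A * B).sum() / np.sqrt((A**2).sum() * (B**2).sum())
```

A value of 1 indicates a perfect (affine) match between the reconstruction and the original, which makes this metric insensitive to the global scaling of the reconstructed transmission.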
