1. Introduction
Photon counting and single-photon detection have been available at least since the 1930s, first with the photomultiplier tube (PMT) and later with the microchannel plate (MCP). These devices offer relatively high sensitivity, known in this context as photon detection efficiency (PDE), and low dark counts, quantified in terms of dark count rate (DCR). However, they are generally bulky and require high operating voltages, typically hundreds to thousands of volts. In the 1940s, researchers started working with solid-state diodes operating in avalanche mode, known as avalanche photodiodes (APDs). These devices were refined through the 1950s and 1960s and were then implemented in planar processes. They required p-n junctions with guard rings and enhancement regions to prevent premature breakdown at the edge of the junction. As semiconductor processes improved and more process options became available, Cova and others began experimenting in the 1970s and 1980s with Geiger-mode APDs, or single-photon avalanche diodes (SPADs), recognizing their potential for capturing fast processes [1]. In the early 2000s, SPADs were implemented first in high-voltage processes [2] and later in standard CMOS image sensor processes [3,4,5,6].
With the availability of SPADs in deep-submicron CMOS processes, it became conceivable to implement useful functionality in situ, possibly in the pixel, so as to count photons and time stamp them upon detection. The major consequence of this trend was massively parallel timestamping with picosecond resolution (i.e., a picosecond least significant bit, LSB), with a potential explosion of the data generated on chip [7]. Timestamping impinging photons individually has several uses. For instance, time-correlated single-photon counting (TCSPC) [8] makes it possible to accurately characterize the photo response of fluorophores excited by fast light pulses. Fluorophores exhibit a time-dependent behavior, known as lifetime, that is specific to the fluorophore and/or its environment [9]. Fluorescence lifetime imaging microscopy (FLIM) [10] is a technique used to characterize the lifetime of fluorophores using multiple excitations and histogramming. FLIM is typically performed in a confocal microscope with a single pixel, but it can also be performed in widefield microscopy with a large number of pixels, each capable of performing TCSPC independently, thus speeding up lifetime acquisition by several orders of magnitude [11,12].
The first pixel with an embedded time-to-digital converter (TDC) for in situ TCSPC was introduced in the MEGAFRAME project [13,14]. The drawback of this approach was the reduced fill factor, which in turn required microlenses to recover, at least in part, the lost sensitivity. As an alternative, researchers proposed simpler pixels with a single digital counter [15] or with dual digital counters [16]. Analog counters were also proposed to achieve high resolution at a low cost in fill factor [17,18,19]. Due to the lack of a TDC, however, these methods require a precise gate and significant algorithmic complexity [20,21].
In this paper, we describe a photon-counting imager with a programmable global shutter, featuring a skew below 150 ps and a minimum gate width of 4 ns, for time-resolved imaging applications. The image sensor comprises an array of 512 × 128 SPAD pixels that are read out in rolling mode, while the shutter itself is global. Since each pixel embeds a one-bit counter, a frame is read out as a binary matrix, which can be converted externally into a multi-bit matrix by adding up subsequent frames, as first shown in [22]. The chip was demonstrated for fast fluorescence imaging and could be used for FLIM in [23]. Throughout the paper, significant attention is given to the circuit details that led to the exceptional skew and to the tradeoffs made during the design to achieve the target readout speed. A complete dynamic and static characterization of the chip is also provided, together with images exemplifying the suitability of the approach.
The paper is organized as follows. Section 2 describes the architecture of the sensor and its components. Section 3 analyzes the implications of using binary pixels for image quality. Sections 4 and 5 report the optical and electrical characterization of the sensor in the context of the target applications. Section 6 concludes the paper.
2. Sensor Architecture
A single-photon avalanche diode (SPAD) is a p-n junction biased above breakdown so as to operate in Geiger mode. In this design, the SPADs are similar to those in [24], comprising a circular p+ active region over an n-well, while premature edge breakdown is prevented by means of p-well guard rings. Figure 1 shows the cross-section of a planar SPAD implementation compatible with CMOS processes. The SPADs are passively quenched, while an active recharge technique is provided [1,2].
The pixel can perform photon counting by means of a one-bit counter, implemented as a static latch. The pixel achieves gated operation by way of three transistors acting as switches. The pixel counter content is transferred to the exterior of the sensor via a fast digital readout channel capable of transferring a complete frame in 6.4 µs. The sensor has a global shutter that gates all the pixels simultaneously for a time as short as 3.8 ns.
The conceptual diagram of the pixel is shown in Figure 2a. The MOS switch “SPADOFF” is activated to bring the SPAD below breakdown, thereby quenching any ongoing avalanche and preventing any further avalanche in the same frame (see Figure 2b). The second MOS switch, “RECHARGE”, is designed to bring the bias voltage close to ground, thereby rapidly recharging the SPAD to its idle bias. This action reactivates the SPAD; to avoid direct conduction from VDD to ground, it should never be performed simultaneously with “SPADOFF”. The last MOS switch, “GATE”, prevents the one-bit counter from being accidentally set during the gating operations.
The actual implementation of the pixel is shown in Figure 3. The recharge transistor is controlled by a global “RECHARGE” signal. The switches are implemented as NMOS transistors, while the latch is implemented with four NMOS transistors connected as back-to-back NMOS inverters, eliminating the need for PMOS transistors. The pull-up transistors are critical to control the power consumption of the latch during idle phases and its settling time during set/reset phases; they can be controlled through an external voltage, “TOPGATE”. The column pull-up transistor is biased so as to minimize the power required to bring the column to ‘L’ while ensuring a readout cycle of 6.4 µs/128 = 50 ns.
The pixel content is stored in a latch at the bottom of the column (not shown in the figure) that stores its value for 50 ns while the other three columns are multiplexed out to the external PAD. A 4:1 multiplexer serializes the output of the latches of four columns to the PADs; it is operated at four times that speed, i.e., 4/50 ns = 80 MHz, which is the maximal operating speed of the PADs in this technology and represents a good speed-power tradeoff.
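The readout figures quoted above are linked by simple arithmetic: the frame readout time divided by the number of sequentially read rows gives the 50 ns row cycle, the 4:1 multiplexer must emit four bits per row cycle, and the frame time bounds the frame rate. The following minimal sketch reproduces these numbers, assuming readout is the only limit on frame time; the helper names are hypothetical.

```python
# Readout-timing sketch (assumes readout is the only limit on frame time).
FRAME_READOUT_S = 6.4e-6  # time to read one complete binary frame (from the text)
ROWS = 128                # rows read sequentially in rolling mode
MUX_RATIO = 4             # 4:1 column multiplexer feeding each output pad


def row_time_s(frame_s: float = FRAME_READOUT_S, rows: int = ROWS) -> float:
    """Time during which the column latches hold one row's values."""
    return frame_s / rows


def pad_rate_hz(row_s: float, mux: int = MUX_RATIO) -> float:
    """Pad bit rate: `mux` latched bits must leave the pad within one row time."""
    return mux / row_s


def max_frame_rate_fps(frame_s: float = FRAME_READOUT_S) -> float:
    """Frame rate if frames are read back to back with no other overhead."""
    return 1.0 / frame_s


row = row_time_s()
print(f"row time   = {row * 1e9:.1f} ns")                       # 50.0 ns
print(f"pad rate   = {pad_rate_hz(row) / 1e6:.1f} MHz")         # 80.0 MHz
print(f"frame rate = {max_frame_rate_fps() / 1e3:.2f} kfps")    # 156.25 kfps
```

The same 6.4 µs figure therefore sets both the 80 MHz pad rate and the maximum frame rate of about 156 kfps.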
The block diagram of the sensor, known as SwissSPAD, is shown in Figure 4a, and the timing diagram of the pixel in Figure 4b. The chip features a balanced network that distributes low-skew versions of the signals “SPADOFF”, “RECHARGE”, and “GATE” (Figure 4c). Due to their nanosecond length, “RECHARGE” and “GATE” are distributed as three precisely timed rising-edge signals that are recomposed in situ by means of a pulse generator (PG), shown in Figure 4d.
The timing diagram shows a typical readout cycle, in which a memory reset is performed at the beginning of the cycle and is followed by a series of gating operations until the next readout. The gates are spaced by an arbitrary time period (25 ns in this example) and are generally synchronized with a light source. This maximizes the effective spatio-temporal fill factor when a fast but dim response is expected from a pulsed light source, as, for example, in FLIM. Fewer gates, or even a single gate, are possible; however, they result in a lower effective spatio-temporal fill factor, $FF$, computed as

$$FF = FF_G \cdot DC = FF_G \cdot \frac{N \, t_{GATE}}{T_{FRAME}} = FF_G \cdot t_{GATE} \cdot f_{GATE} \qquad (1)$$

where $FF_G$ is the geometric fill factor, $DC$ the duty cycle of gating, $N$ the number of gates in a frame, $t_{GATE}$ the duration of the gate, $T_{FRAME}$ the frame period, and $f_{GATE}$ the gate frequency, so that $N = f_{GATE} \, T_{FRAME}$. Note that, even though the temporal $FF$ is reduced, light usually impinges within the gate when the gate is synchronized with a laser, while the DCR contribution is reduced. The duty cycle is generally selected to be a fraction of the lifetime of a fluorophore. The delay between gate and excitation light is varied in steps as small as 20 ps, up to a maximum of $T_{FRAME}/N = 1/f_{GATE}$, so as to scan the entire laser period. $N$ is chosen to minimize the pile-up effect. Since the counter records at most one event, no accumulation is possible within a frame; accumulation is only possible digitally, over multiple frames. This enables the construction of gray levels in images at the expense of a reduced frame rate [22].
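The fill-factor relation above, $FF = FF_G \cdot N \, t_{GATE}/T_{FRAME} = FF_G \cdot t_{GATE} \cdot f_{GATE}$, can be checked numerically. The sketch below uses the 6.4 µs frame period and the 25 ns gate spacing from the text; the geometric fill factor of 5% and the 4 ns gate width are hypothetical example values, and the function names are illustrative only.

```python
# Effective spatio-temporal fill factor of a gated pixel (sketch).
# FF = FF_G * DC, with DC = N * t_gate / T_frame = t_gate * f_gate when N = f_gate * T_frame.


def effective_ff(ff_geometric: float, n_gates: int, t_gate_s: float, t_frame_s: float) -> float:
    duty_cycle = n_gates * t_gate_s / t_frame_s
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("the gates do not fit in one frame")
    return ff_geometric * duty_cycle


def gate_delay_steps(step_s: float, laser_period_s: float) -> int:
    """Number of gate delays needed to scan one laser period with a given step."""
    return round(laser_period_s / step_s)


T_FRAME = 6.4e-6       # frame period from the text
GATE_PERIOD = 25e-9    # gate spacing from the timing example (1 / f_gate)
FF_G = 0.05            # hypothetical geometric fill factor
T_GATE = 4e-9          # hypothetical gate width

n_gates = round(T_FRAME / GATE_PERIOD)            # gates that fit in one frame
ff = effective_ff(FF_G, n_gates, T_GATE, T_FRAME)
ff_alt = FF_G * T_GATE * (1.0 / GATE_PERIOD)      # same value via t_gate * f_gate
print(n_gates, ff, ff_alt)
print(gate_delay_steps(20e-12, 10e-9), "delays of 20 ps cover a 10 ns laser period")
```

Both forms of the expression give the same value, because the gate count per frame is simply the gate frequency multiplied by the frame period.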
3. Binary Pixels
SwissSPAD is an all-digital, clock-driven sensor whose pixels can detect only one photon per frame; we call these pixels “binary pixels”. Photons impinge on a binary pixel with an expected arrival rate $\chi$ (photons per second) and are distributed in time following a Poisson distribution, i.e., the probability of $k$ counts per second is $P(k) = \chi^k e^{-\chi}/k!$. Thus, the probability of detecting one or more photons per second is $P(k \geq 1) = 1 - P(0) = 1 - e^{-\chi}$. For a non-unity photon detection probability (PDP) and a non-unity $FF$, the probability of photon detection per frame is $p = 1 - e^{-\chi \cdot PDP \cdot FF \cdot T_{FRAME}}$. The expected photon count per second measured in the pixel thus becomes [25]

$$E(C_M) = \frac{p}{T_{FRAME}} = \frac{1 - e^{-E(C_D)\,T_{FRAME}}}{T_{FRAME}}, \qquad E(C_D) = \chi \cdot PDP \cdot FF \qquad (2)$$

where $C_M$ is the measured SPAD count rate and $E(C_D)$ the expected detected SPAD count rate. Thus, even if one photon per frame is expected to impinge on the pixel, the pixel will detect it only a fraction of the time, i.e., it will register, on average, a fraction of a count per frame. Dark noise in a SPAD is dominated by three sources: thermal generation, tunneling (both of which can be trap-assisted), and afterpulsing. Assuming a long dead time, afterpulsing can be ignored; thus, with the exception of hot pixels, most pixels exhibit noise approaching Poisson statistics. The rate of occurrence of this noise is quantified by the dark count rate (DCR). Thanks to its Poissonian nature, DCR is added to the equation as follows

$$E(C_M) = \frac{1 - e^{-\left(E(C_D) + DCR\right) T_{FRAME}}}{T_{FRAME}} \qquad (3)$$

From this equation, one can derive the correction for the expected detected SPAD count rate $E(C_D)$ by simply solving the equation with respect to $E(C_D)$, as follows

$$C_D = -\frac{\ln\left(1 - C_M \, T_{FRAME}\right)}{T_{FRAME}} - DCR \qquad (4)$$

Note that $E(C_M)$ and $E(C_D)$ were replaced by $C_M$ and $C_D$, respectively, since the correction is assumed to be applied to a single sample generated by the detector rather than to the expected value over a very large number of measurements. As can be seen from the equation, this correction is only needed for high values of $C_M$, above 15 kcps.
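The forward model of a binary pixel, $C_M = \left(1 - e^{-(C_D + DCR)\,T_{FRAME}}\right)/T_{FRAME}$, and its inverse, $C_D = -\ln(1 - C_M T_{FRAME})/T_{FRAME} - DCR$, can be sketched as a pair of functions. This is a minimal illustration assuming that each frame exposes the pixel for the full 6.4 µs frame period; the function names and the example rates are hypothetical.

```python
import math

T_FRAME = 6.4e-6  # assumed exposure window per frame: the full 6.4 us frame period


def measured_rate(detected_cps: float, dcr_cps: float, t_frame_s: float = T_FRAME) -> float:
    """Forward model: expected count rate of a binary pixel (at most one count per frame)."""
    p_frame = 1.0 - math.exp(-(detected_cps + dcr_cps) * t_frame_s)
    return p_frame / t_frame_s


def corrected_rate(measured_cps: float, dcr_cps: float, t_frame_s: float = T_FRAME) -> float:
    """Inverse model: C_D = -ln(1 - C_M * T_frame) / T_frame - DCR."""
    frac = measured_cps * t_frame_s  # fraction of frames that recorded a count
    if not 0.0 <= frac < 1.0:
        raise ValueError("pixel saturated: every frame recorded a count")
    return -math.log1p(-frac) / t_frame_s - dcr_cps


cm = measured_rate(50_000, 100)
print(f"measured {cm:.0f} cps -> corrected {corrected_rate(cm, 100):.0f} cps")
print(f"correction factor at 15 kcps: {corrected_rate(15_000, 0) / 15_000:.3f}")
```

Under these assumptions, the correction at a measured 15 kcps amounts to roughly 5%, and it grows rapidly as $C_M$ approaches $1/T_{FRAME}$. `math.log1p` is used to keep the inverse accurate at low count rates.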
However, in this condition, the asymptotic behavior can be used to extend the dynamic range of the pixel, as has been known in the silicon photomultiplier community for several years [26] and in the radiation community since the 1970s [27]. This can be done both in time and in space, whenever multiple pixels are combined to form a larger one [28,29].
Figure 5 shows the theoretical and measured response of a binary pixel in clock-driven and event-driven modes, compared with the linear response of a non-binary pixel. In clock-driven mode, SPAD recharge or memory reset is applied periodically, asynchronously with respect to SPAD activity, while in event-driven mode recharge occurs $T_{dead}$ after a SPAD avalanche, i.e., synchronously with SPAD activity. Although high-frequency clock-driven resets are not used in single-SPAD devices because of possible afterpulsing, arrays with long $T_{dead}$ do not show increased afterpulsing. Recent work indicates a trend towards higher pixel resolution and advanced processing [30,31].
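The two operating modes can be compared with simple analytical models. The clock-driven curve follows the binary-pixel expression $(1 - e^{-rT})/T$. For the event-driven curve, the sketch assumes the classic non-paralyzable dead-time model $r/(1 + r\,T_{dead})$, which is a textbook assumption rather than the model used for Figure 5. The common period $T$ = 6.4 µs and the function names are likewise illustrative assumptions.

```python
import math

T = 6.4e-6  # assumed period: frame period (clock-driven) or dead time (event-driven)


def clock_driven(rate_cps: float, t_frame_s: float = T) -> float:
    """Reset every t_frame_s regardless of SPAD activity: at most one count per frame."""
    return (1.0 - math.exp(-rate_cps * t_frame_s)) / t_frame_s


def event_driven(rate_cps: float, t_dead_s: float = T) -> float:
    """Recharge t_dead_s after each avalanche: non-paralyzable dead-time model."""
    return rate_cps / (1.0 + rate_cps * t_dead_s)


for rate in (1e3, 1e4, 1e5, 1e6, 1e7):
    print(f"{rate:9.0e} cps  linear {rate:10.0f}  "
          f"clock {clock_driven(rate):8.0f}  event {event_driven(rate):8.0f}")
```

Both curves are nearly linear at low rates and saturate at $1/T$. With equal periods, the clock-driven model never falls below the event-driven one, since $e^{x} \geq 1 + x$.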
4. Sensor Fabrication
The sensor microphotograph is shown in Figure 6; the inset shows a detail of the pixels. An array of microlenses (CSEM, Basel, Switzerland) matching the pixel pitch was deposited on the chip to improve light collection through light concentration [32,33]. The microlens array, shown in an artist’s rendering in Figure 7a [34], was measured and simulated as a function of the f-number of the main objective lens, yielding the plot of Figure 7b.
Thus, the effective fill factor achieved with an f/10 lens was 60%, with high reliability and reproducibility over the entire array. The pixel PDP and DCR at room temperature are plotted in Figure 8.
The sensor was also characterized in terms of afterpulsing, using the inter-arrival time method introduced in [35]. The pixels were exposed to a uniform wide-spectrum light source, and the inter-arrival times of the response were stored in the FPGA over an integration time of 80 s. A histogram was then constructed, confirming the exponential behavior of the response due to the Poissonian nature of the impinging photons. The afterpulsing probability $APP(t)$ is approximated as

$$APP(t) \approx \frac{\sum_{\tau = T_{dead}}^{t} \left[ H(\tau) - F(\tau) \right]}{\sum_{\tau \geq T_{dead}} H(\tau)} \qquad (5)$$

In the equation, $H(\tau)$ is the measured histogram, $F(\tau)$ the exponential fit, and $T_{dead}$ the dead time of the pixel, in this case 6.4 µs. The crosstalk probability, or simply crosstalk, $XT$, is computed in a similar way, with the inter-arrival time measured between two adjacent pixels, as

$$XT \approx \frac{\sum_{\tau} \left[ H_{ij}(\tau) - F_{ij}(\tau) \right]}{\sum_{\tau} H_{ij}(\tau)} \qquad (6)$$

where $H_{ij}(\tau)$ is the inter-arrival time histogram measured between pixels $i$ and $j$, and $F_{ij}(\tau)$ its exponential fit. Afterpulsing and crosstalk are reported in Figure 9a,b, respectively.
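The inter-arrival method can be sketched as follows: fit an exponential to the tail of the histogram, where afterpulsing is assumed to have died out, then accumulate the excess counts above the fit from the dead time onwards and normalize by the total counts. The tail cut-off, the synthetic histogram, and the function name are hypothetical. The sketch illustrates the general idea and is not necessarily the exact estimator of [35].

```python
import numpy as np


def afterpulsing_probability(tau_s, hist, t_dead_s, tail_start_s):
    """Cumulative APP(t) from an inter-arrival-time histogram (sketch).

    Excess counts above an exponential fit F(tau) are summed from t_dead_s
    onwards and normalized by the total counts. The fit uses only bins with
    tau >= tail_start_s, where afterpulsing is assumed to be negligible.
    """
    tau_s = np.asarray(tau_s, dtype=float)
    hist = np.asarray(hist, dtype=float)
    tail = (tau_s >= tail_start_s) & (hist > 0)
    slope, intercept = np.polyfit(tau_s[tail], np.log(hist[tail]), 1)
    fit = np.exp(intercept + slope * tau_s)        # exponential model F(tau)
    valid = tau_s >= t_dead_s
    excess = np.where(valid, hist - fit, 0.0)      # H(tau) - F(tau) after the dead time
    return np.cumsum(excess) / hist[valid].sum()


# Synthetic check: exponential inter-arrival histogram plus a few afterpulse counts.
T_DEAD = 6.4e-6
tau = np.arange(1, 2001) * T_DEAD                  # bins spaced by one frame period
hist = 1e4 * np.exp(-tau / 1e-3)                   # hypothetical Poisson part
afterpulses = np.zeros_like(hist)
afterpulses[:5] = [500, 300, 150, 80, 40]          # hypothetical afterpulse excess
hist = hist + afterpulses
app = afterpulsing_probability(tau, hist, T_DEAD, tail_start_s=50 * T_DEAD)
print(f"APP = {app[-1]:.2e} (injected {afterpulses.sum() / hist.sum():.2e})")
```

The same routine could, under the same assumptions, be applied to a cross-pixel histogram $H_{ij}$ with `t_dead_s=0` to estimate crosstalk.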
5. Results
The sensor was used to image a large number of biological samples using fluorescence intensity imaging and fluorescence lifetime imaging microscopy. Fluorescence intensity imaging was performed with a setup based on a dual-port Leica SR GSD super-resolution microscope (Leica Microsystems, Wetzlar, Germany), in which SwissSPAD and an Andor iXon3 897 BV EMCCD were placed on the two ports of the microscope under the same illumination conditions for comparison purposes (Figure 10). As an illustration, several biological samples are shown hereafter. First, let us consider BPAE cells labeled with MitoTracker Red CMXRos, Alexa Fluor 488, and DAPI dyes. SwissSPAD was operated at an excess bias $V_e$ = 4.5 V. The raw EMCCD intensity image was converted to a photon count image using $\mathrm{counts}_D = (d - b) \times g_{amp} / g_{EM}$, where $d$ is the digital intensity value, $b$ the bias offset, $g_{amp}$ the preamp gain value, and $g_{EM}$ the EM gain. Due to the difference in pixel size, 2 × 2 SPAD pixels and 3 × 3 EMCCD pixels were binned to obtain counts for the same area. MATLAB software was used to find the overlapping area of the two images and to compare the intensities.
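The two preprocessing steps described above, converting EMCCD values to photon counts and summing pixels in blocks so that both cameras cover the same area, can be sketched in a few lines of NumPy. This is an illustrative Python equivalent, not the MATLAB code used in the study. The bias, gain values, and function names are hypothetical, and the image-registration step is not shown.

```python
import numpy as np


def emccd_to_counts(raw, bias, g_amp, g_em):
    """Photon-count image from raw EMCCD values: counts = (d - b) * g_amp / g_EM."""
    return (np.asarray(raw, dtype=float) - bias) * g_amp / g_em


def bin_sum(img, k):
    """Sum non-overlapping k x k blocks; rows/columns that do not fill a block are dropped."""
    img = np.asarray(img, dtype=float)
    h = (img.shape[0] // k) * k
    w = (img.shape[1] // k) * k
    return img[:h, :w].reshape(h // k, k, w // k, k).sum(axis=(1, 3))


# Hypothetical example values; real bias and gains come from the camera settings.
raw = np.full((6, 6), 150.0)
emccd = bin_sum(emccd_to_counts(raw, bias=100.0, g_amp=4.0, g_em=200.0), 3)  # 3 x 3 bins
spad = bin_sum(np.ones((4, 4)), 2)                                            # 2 x 2 bins
print(emccd)
print(spad)
```

The reshape-and-sum approach avoids explicit loops. It assumes the block size divides the cropped image evenly, which is why any leftover rows and columns are dropped.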
Figure 11 shows the images obtained with the EMCCD (a) and SwissSPAD (b); the exposure times were 10 ms and 73.4 ms, respectively, to match the number of collected photons. The scale shows the number of collected photons per exposure.
Figure 12 shows a widefield image of a cellular cluster magnified 10× using the same microscope setup.
Fluorescence lifetime images could be obtained by sliding the gate start time from 0 to 10 ns in steps of 20 ps and integrating 255 frames per step; the gate timing performance is therefore essential for high-quality FLIM images. The timing performance of the gate is summarized in Figure 13: the response of the sensor for a random pixel with minimum gate width is shown in Figure 13a, and the uniformity of the gate position and length is shown in Figure 13b.
Figure 13 shows the steep edges of the counting response of the sensor when gating is used; these should not be mistaken for the shape of the “Gate” signal itself. The photosensitive window is defined by the falling edge of “Recharge” and the falling edge of “Gate” (if “Gate” occurs after “Recharge”). The large vertical dimension of 128 pixels results in 3 mm long metal wires, which introduce an undesired RC component that limits the minimum gate width. Although the metal wires of “Gate” and “Recharge” are identical, “Recharge” was designed larger to enable a shorter signal. This introduces a mismatch between the RC constants of “Gate” and “Recharge” and, consequently, nonuniformity in the photosensitive window widths. Smaller technology nodes will decrease transistor gate capacitances, and wider metal lines along the column can reduce the resistance while keeping the parasitic capacitance, which is dominated by the lateral component, constant. The use of repeaters is also an option. Both metal widening and repeaters, however, can reduce the fill factor. A smaller pixel pitch will also reduce the RC component of the line. The right edge in Figure 13a corresponds to the falling edge of the “Recharge” signal and represents the critical edge for FLIM. Theoretically, a gated FLIM measurement can be modeled as the convolution of an exponential decay with a rectangular gate signal. The fall time of this edge should be small compared with the lifetime of the exponential decay to ensure high-precision measurements; it plays a role similar to that of the instrument response function (IRF) in TCSPC.
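The convolution picture can be made concrete. For an ideal rectangular gate of width $W$ opening at offset $s \geq 0$ after excitation, the counts are $\int_s^{s+W} e^{-t/\tau}\,dt = \tau e^{-s/\tau}(1 - e^{-W/\tau})$. The logarithm of the counts therefore falls with slope $-1/\tau$ as the gate slides. The sketch below evaluates this integral numerically and recovers $\tau$ from the slope. The 1 ns lifetime and 4 ns gate width are hypothetical, and finite edge fall times and laser pulse width are ignored, although these are precisely what degrade real measurements.

```python
import numpy as np


def gated_counts(offsets_s, tau_s, width_s, dt_s=1e-12, span_s=20e-9):
    """Counts vs. gate offset: exponential decay integrated over an ideal rectangular gate.

    This samples the convolution of the decay with a rectangular window.
    Edge fall times (the IRF-like term) and laser pulse width are ignored.
    """
    t = np.arange(0.0, span_s, dt_s)
    decay = np.exp(-t / tau_s)
    return np.array([decay[(t >= s) & (t < s + width_s)].sum() * dt_s for s in offsets_s])


tau_true = 1e-9                              # hypothetical lifetime
offsets = np.arange(0, 501) * 20e-12         # 0 to 10 ns in 20 ps steps
counts = gated_counts(offsets, tau_true, width_s=4e-9)
slope, _ = np.polyfit(offsets, np.log(counts), 1)
tau_est = -1.0 / slope                       # log-counts fall with slope -1/tau
print(f"estimated lifetime: {tau_est * 1e12:.1f} ps")
```

A slow falling edge would smear the response, so the recovered slope would no longer equal $-1/\tau$ near the edge. This is why the edge fall time must be small compared with the lifetime.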
Figure 14 reports fluorescence intensity images of samples stained with Safranin and Fast Green, which have peak excitation wavelengths of 530 nm and 620 nm, respectively. Filtering was used in two subsequent exposures of the same sample, and software-based recomposition was then applied. The pictures in Figures 11, 12, and 14 were corrected for DCR and possible count compression using Equation (4).
Thanks to the frame readout period of 6.4 µs, a maximum frame rate of 156 kfps can be achieved; a Virtex™ IV or Spartan™ 6 FPGA (Xilinx, San Jose, CA, USA) is used to acquire and store the one-bit frames. Figure 15 shows the physical appearance of the system, in which a daughterboard hosting SwissSPAD is electrically connected to a motherboard hosting two Xilinx Virtex™ IV FPGAs for acquisition and formatting of the data, which are then sent to a Mac/PC through a USB-2 link.
To construct gray-level images, the one-bit frames may be accumulated in the FPGA and transferred to the Mac/PC through a USB-2 or USB-3 link; each doubling of the number of accumulated frames increases the pixel intensity resolution by one bit [22]. The tradeoff between the pixel effective number of bits (ENOB) and the effective frames per second (EFPS) is shown in Figure 16. While the bandwidth of a DDR memory is high enough to keep up with the data rate generated by the FPGA at any ENOB, the USB-2 and USB-3 links to the PC do not allow continuous recording at all ENOBs, making an intermediate memory such as DDR2 necessary. A USB-2 link can transfer eight-bit frames continuously.
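The ENOB/EFPS tradeoff follows from the accumulation rule stated above. Accumulating $n$ binary frames yields $\log_2 n$ bits per pixel, with a raw binary frame counted as 1 bit following the convention used in the text, at $156{,}250/n$ effective frames per second. The sketch below also computes the resulting output bit rate for the 512 × 128 array, assuming every accumulated frame is sent at ENOB bits per pixel with no packing overhead. The function names are hypothetical.

```python
import math

FRAME_RATE_FPS = 156_250   # 1 / 6.4 us: rate of one-bit frames
PIXELS = 512 * 128


def enob(n_frames: int) -> float:
    """Pixel ENOB = log2 of the frames accumulated; a raw binary frame counts as 1 bit."""
    return max(1.0, math.log2(n_frames))


def efps(n_frames: int) -> float:
    """Effective frame rate after accumulating n_frames binary frames."""
    return FRAME_RATE_FPS / n_frames


def link_bitrate(n_frames: int) -> float:
    """Bits per second if every accumulated frame is sent at ENOB bits per pixel."""
    return efps(n_frames) * PIXELS * enob(n_frames)


for n in (1, 4, 16, 256, 65_536):
    print(f"{n:6d}x  ENOB {enob(n):4.0f}  EFPS {efps(n):10.1f}  "
          f"{link_bitrate(n) / 1e6:9.1f} Mb/s")
```

Raw binary frames correspond to about 10.24 Gb/s, well above the 5 Gb/s nominal signaling rate of USB 3.0. Eight-bit frames need 320 Mb/s, which is below the 480 Mb/s nominal signaling rate of USB 2.0. Sustained throughput on real links is lower than these nominal figures.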
The sequence in Figure 17 shows images of an analog oscilloscope obtained without accumulation (156 kfps, 1 bit) and with several levels of accumulation, from 4× to 65,536×, resulting in a pixel ENOB of 2 and 16 bits, respectively, during the cumulative frame. Unlike in conventional cameras, the pixel ENOB is derived from the simple expression

$$ENOB = \log_2 N_{pixel} \qquad (7)$$

where $N_{pixel}$ is the maximum possible number of accumulated counts in the pixel during a cumulative frame. The SNR, per pixel and over the entire sensor, is approximated by

$$SNR \approx \frac{N_{counts}}{\sqrt{N_{counts}}} = \sqrt{N_{counts}} \qquad (8)$$

where $N_{counts}$ is the number of counts accumulated during the cumulative frame. This approximation assumes no readout noise and low count rates with a linear response, hence Poisson-limited noise at any frame rate and no saturation. The images in Figure 17 show Poisson-limited noise at five different exposure times.
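The low-count-rate assumption behind the $\sqrt{N}$ approximation can be illustrated by simulation. Summing $n$ independent binary frames, each firing with probability $p$, gives a binomial count with SNR $\sqrt{np/(1-p)}$. This is close to $\sqrt{N}$ for small $p$ and departs from it as the pixel approaches saturation. The detection probabilities, frame count, and seed below are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def accumulated_counts(p_detect: float, n_frames: int, n_pixels: int) -> np.ndarray:
    """Counts after summing n_frames independent binary frames.

    A sum of independent Bernoulli(p) frames is Binomial(n_frames, p), so it is
    drawn directly instead of materializing every binary frame.
    """
    return rng.binomial(n_frames, p_detect, size=n_pixels)


for p in (0.01, 0.5):
    counts = accumulated_counts(p, n_frames=4096, n_pixels=20_000)
    snr = counts.mean() / counts.std()
    print(f"p={p:<4}  SNR={snr:6.1f}  sqrt(N)={np.sqrt(counts.mean()):6.1f}")
```

At $p = 0.01$, the simulated SNR matches $\sqrt{N}$ to within about 1%. At $p = 0.5$, the binomial statistics of binary accumulation give a noticeably higher SNR than the Poisson estimate, which is consistent with restricting the approximation to low count rates.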
Figure 18 shows a sequence of a fast event; the frame rate was fixed at 1200 frames per second, of which one frame every 100 ms is depicted. The global shutter was used, achieving deep-subnanosecond uncertainty in gate width and position.
The ability of SwissSPAD to acquire lifetime images was demonstrated in a lab setup using point detection. Indocyanine green (ICG) in milk at a concentration of 40 µM was excited by a 790 nm laser with a 55 ps pulse width and a 100 MHz repetition rate, synchronized with the SwissSPAD gating. The fluorescence intensity from the excited spot was measured for 512 gate windows, each offset by a fraction of the repetition period (25 ps). The measured response is a response similar to the IRF shown in Figure 13, convolved with an exponential decay; the lifetime is therefore extracted by fitting it against a set of models constructed from the IRF used in these measurements. Figure 19 shows the per-pixel extracted lifetime and normalized intensity over the excited spot. The extracted lifetimes, with μ = 636 ps and σ = 56 ps, overestimate the 580 ps reference lifetime given in the literature [36]. Homulle et al. showed in [37] how the accuracy of lifetime extraction from gated measurements can be improved through refined modeling and simulation.
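Fitting against a set of models constructed from the IRF can be sketched as a brute-force template search. For each candidate lifetime, the IRF is convolved with an exponential decay, the best amplitude is solved by least squares, and the candidate with the lowest residual is kept. The Gaussian IRF, the lifetime grid, and the function names are hypothetical. The 25 ps step and 512 windows mirror the text, but the sketch ignores the residual decay from the previous laser pulse and is not the fitting procedure actually used for Figure 19.

```python
import numpy as np


def model_response(irf, tau_s, dt_s):
    """IRF convolved with a unit exponential decay, truncated to the IRF length and peak-normalized."""
    t = np.arange(len(irf)) * dt_s
    model = np.convolve(irf, np.exp(-t / tau_s))[: len(irf)]
    return model / model.max()


def fit_lifetime(measured, irf, dt_s, taus_s):
    """Grid search over candidate lifetimes; the amplitude is solved by least squares."""
    measured = np.asarray(measured, dtype=float)
    best_tau, best_err = None, np.inf
    for tau in taus_s:
        m = model_response(irf, tau, dt_s)
        amp = (m @ measured) / (m @ m)              # optimal scale for this template
        err = np.sum((measured - amp * m) ** 2)
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau


DT = 25e-12                                         # gate offset step from the text
t = np.arange(512) * DT                             # 512 gate windows
irf = np.exp(-0.5 * ((t - 1e-9) / 100e-12) ** 2)    # hypothetical Gaussian IRF
measured = 1000 * model_response(irf, 580e-12, DT)  # noise-free synthetic data
taus = np.arange(300e-12, 900e-12, 2e-12)
print(f"fitted lifetime: {fit_lifetime(measured, irf, DT, taus) * 1e12:.0f} ps")
```

On noise-free synthetic data, the search returns the grid point nearest the true lifetime. With real data, shot noise and any mismatch between the assumed and actual IRF bias the result, which is the kind of error that refined modeling aims to reduce.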
The sensor specifications are listed in Table 1. All measurements were performed at room temperature unless otherwise indicated.
The sensor is currently used by a number of researchers in different institutions under use warranty based on GNU policies.