Temporal Super-Resolution Using a Multi-Channel Illumination Source

Cohen, Khen; Mendlovic, David; Raviv, Dan

doi:10.3390/s24030857

Open AccessArticle

Temporal Super-Resolution Using a Multi-Channel Illumination Source

by

Khen Cohen

^*,

David Mendlovic

and

Dan Raviv

^*

The Faculty of Engineering, Department of Physical Electronics, Tel Aviv University, Tel Aviv 69978, Israel

^*

Authors to whom correspondence should be addressed.

Sensors 2024, 24(3), 857; https://doi.org/10.3390/s24030857

Submission received: 7 December 2023 / Revised: 24 January 2024 / Accepted: 25 January 2024 / Published: 28 January 2024

(This article belongs to the Section Sensing and Imaging)

Download

Browse Figures

Versions Notes

Abstract

:

While sensing in high temporal resolution is necessary for a wide range of applications, it is still limited nowadays due to the camera sampling rate. In this work, we try to increase the temporal resolution beyond the Nyquist frequency, which is limited by the sensor’s sampling rate. This work establishes a novel approach to temporal super-resolution that uses the object-reflecting properties from an active illumination source to go beyond this limit. Following theoretical derivation and the development of signal-processing-based algorithms, we demonstrate how to increase the detected temporal spectral range by a factor of six and possibly even more. Our method is supported by simulations and experiments, and we demonstrate (via application) how we use our method to dramatically improve the accuracy of object motion estimation. We share our simulation code on GitHub.

Keywords:

computational photography; super-resolution; temporal super-resolution; active illumination

1. Introduction

Resolution in a digital signal refers to its frequency content. High-resolution (HR) signals are band-limited to a more extensive frequency range than low-resolution (LR) signals. While sampling a signal, the captured signal’s resolution is limited by two factors: the physical device limitation (e.g., the device’s response function to different frequencies) and the sampling rate. For example, digital image resolution is limited by the imaging device’s optics (the diffraction limit) and the sensor’s pixel density (the sampling rate).

Super-resolution is a broad research area that uses sophisticated ways to overcome these limits. The ability to exceed the resolution limits of the system always has something to do with some prior knowledge about the scene or about the system [1,2]. In the field of imaging, image super-resolution (SR) techniques can be divided into two main approaches: optical-based and algorithm-based.

Optical-based SR utilizes the optical property of light to transcend over the diffraction limit. This approach can be further divided into mainly three areas: the first is multi-plexing spatial-frequency bands [1], which uses the fact that low-frequency moire fringes are formed when the scene is multi-plexed with a periodic pattern (structured illumination)—e.g., the work by Abraham et al. in speckle structured illumination [3]. The second involves acquiring multiple parameters about the scene and merging them, for example, detecting scene polarization [4]. The third method is the probing near-field electromagnetic disturbance method. It is a modern approach that uses an unconventional imaginary optical system and tries to detect tiny disturbances in electromagnetic waves. For example, using evanescent waves [5]. Each of these super-resolution methods sacrifices another domain [1,2,6]. For example, on account of the time [7], wavelength [8,9], or field of view [10].

Algorithm-based SR is a method that focuses on the sensor pixels’ density limit. It includes mainly algorithmic solutions, such as frame deblurring and localization estimators [11]. Nowadays, deep learning methods have presented excellent performance in SR tasks [12,13,14], including medical imaging [15], satellite imaging [16], and face SR [17].

The field of temporal super-resolution (TSR) deals with a similar challenge but in the temporal domain. In general, TSR can be divided in a similar manner: optical-based and algorithm-based. Optical-based TSR includes several methods. One is s combination of cameras: this method exploits the fact that different cameras with some temporal overlap can provide complementary information to increase the temporal resolution. The temporal coding method uses a preknown temporal pattern as a coding technique for the detected signal. Optical coding extracts temporal illumination patterns [18] or temporarily coded apertures [19], and sensor coding uses a temporary change in the sensor’s reading manner [20] or a flattened shutter [21] to deblur the images. Software Interpolation uses algorithms (nowadays, these are mainly deep learning-based) to generate a temporal interpolation of the signal. Some of the methods are optical flow-based [22], whereas others are phase-based [23] or kernels-based methods [24,25].

Algorithm-based TSR (Software-only) approaches a straightforward solution in terms of system complexity, and these methods demonstrate good performance [26]. However, their ability to interpolate in time is limited since the deep learning models heavily rely on past examples and training. In contrast, TSR supported by hardware (optics or sensor) has the potential to raise the temporal sampling frequency with a much higher rate and reliability. However, the price is the complexity of the system.

While spatial super-resolution has been widely researched for decades, temporal super-resolution (TSR) has not been extensively researched to the same extent. As a consequence, there is still plenty of room for improvement in TSR methods, especially ones that provide a high up-sampling factor, high reliability, and low system complexity.

In this work, we present a novel approach to TSR by using the object’s optical reflection properties, such as its surface polarity reflection or spectral reflection. Our proposed system consists of a standard camera with a high-frequency illumination source. In comparison to other presented methods, our approach constitutes a good compromise between performance (large temporal spectrum reconstruction) and simplicity (a system that is not too complicated or expensive). We model the camera image sensor operation method, formulate our problem as an optimization problem, and provide a comprehensive solution for a particular case of colored-based illumination sources. Our analysis and results are supported by theoretical derivations, simulations, and experimental results. Apart from other works in this field, our method shows high reliability in terms of spectral reconstruction with no significant hardware complexity penalty, and it reconstructs the spectral content of the signal very well. Moreover, our method can be used in real time due to its simple solution form.

The main contributions of this work are as follows:

The demonstration of a novel approach for optical coding to achieve high temporal frequencies with a fixed sensor sampling rate working in real time.
The development of a substantial theoretical background to increase temporal resolution from subsamples.
Providing an anti-aliasing algorithm to improve system performance over a wide range of frequencies.

2. Theoretical Background

2.1. Temporal Model for an Image Sensor

We denote a general signal as

I (x, y, t)

, captured by the image sensor. We formulate the image sensor operation as temporal distortion, which is assumed to be linear time-invariant (LTI), followed by sampling. The distortion is represented by the transfer function,

h (t)

, and the sampling in time at a frequency of 1 over the exposure time is

f_{s} = \frac{1}{T}

.

The sampled signal, therefore, is given by

u [n] \equiv u (t = n T) = [\sum_{n = - \infty}^{\infty} δ (t - T n)] [f (t) * h (t)] = \sum_{n = - \infty}^{\infty} δ (t - T n) {[f (t) * h (t)]}_{t = n T}

(1)

In order to fully reconstruct the

u (t)

signal, two conditions have to be fulfilled: first, the distortion of the signal can not be too severe, and the sampling rate must be at least twice as high as the maximum spectral content of

u (t)

, according to the Shannon-Nyquist theorem of sampling [27]. It is clear that effectively increasing the sampling rate can improve the signal reconstruction in the temporal domain. However, since it means that the sampling rate becomes much higher, the integration time decreases, which, in turn, leads to the signal-to-noise (SNR) degradation of the reconstruction signal.

2.2. Multi-Channel Approach and Assumptions

We define a set of channels as a set of independent optical properties of light. For example, X-polarization and Y-polarization are two channels, or several different wavelengths are different channels. We assume linear optics, meaning that the reflected light from an object does not transform between channels. We define the channel m as follows:

C^{m} = \int_{0}^{T} \int_{- \infty}^{\infty} c^{m} (λ, t) Q^{m} (λ) R (λ, t) d λ d t

(2)

While

c (λ, t)

is the illumination mask generated by a light source (changes in time),

R (λ, t)

is the reflective properties of the object, (change in time) and

Q (λ)

is the image sensor filter for a specific spectral range;

λ

is the wavelength, and T is the integration time of the sensor (exposure time).

We assume that for a given m, there is a spectral match between the flicker light source and the sensor filter. Practically, it means that the sensor is significantly affected by the light source (e.g., a red bulb will be captured intensively in the camera’s red channel). In addition, we assume that

c^{m} (λ, t)

is a product of a temporal-dependent function and a spectral-dependent function:

c^{m} (λ, t) Q^{m} (λ) = c^{m} (t) {\tilde{Q}}^{m} (λ)

(3)

Moreover, we assume that there is a high similarity between the different channels. So, for each time t, the relation between the light collected in each channel is equal up to a constant scale,

γ

:

\int_{- \infty}^{\infty} {\tilde{Q}}^{m} (λ) R (λ, t) d λ \approx γ^{m, k} \int_{- \infty}^{\infty} {\tilde{Q}}^{k} (λ) R (λ, t) d λ

(4)

Now, we focus on the case where the flicker changes in time in a discrete manner between two modes: off and on. Therefore, we get

C^{m} = \sum_{n = 1}^{N} c_{n}^{m} \int_{- \infty}^{\infty} {\tilde{Q}}^{m} (λ) R (λ, t) d λ \approx \sum_{n = 1}^{N} c_{n}^{m} i_{n}

(5)

where N is the up-sampling factor and

i_{n}

represents the average value of the image at a subtime step, n.

2.3. Definitions

In our analysis, we denote N as the up-sample factor of the sampling rate, meaning that we increase the maximum detected spectrum from

\frac{1}{2 T}

to

\frac{N}{2 T}

, and M is the number of independent channels that we used. We assume that in any sub-interval of time,

\frac{T}{N}

is approximately constant, so for each exposure time, we can define the intensity vector of size N,

\vec{I}

. We further define the vector

\vec{C}

of size M to represent the value captured in each of the channels for a single exposure time. We define M vectors (m is between 1 to M) for the vectors

{\vec{c}}^{m}

of size N to represent each of the channel’s code patterns. In our analysis, we focus on the cases where the vectors

{\vec{c}}^{m}

have binary values, 0 or 1, when the flicker of the channel, m, is on or off, respectively.

3. Method

From Equation (5), one can notice that extracting the values of

i_{n}

is equivalent to an up-sample in factor N in the temporal domain. The problem is that for M channels, this equation can be solved uniquely only for an up-sample factor of

N = M

. In practice, we have a low number of channels, and we want to get a high rate of temporal super-resolution. For that, we need to use some prior knowledge about the scene dynamics. We choose to assume scene smoothness in the temporal, so we formulate the problem as the following cost function:

L = \sum_{n = 1}^{N - 1} {(i_{n} - i_{n + 1})}^{2} + \sum_{m = 1}^{M} λ^{m} (C^{m} - \sum_{n = 1}^{N - 1} i_{n} c_{n}^{m})

(6)

where

λ^{m}

represents some regularization factors.

3.1. Spatial Regularization

The absence of any spatial correlation between adjacent pixels might yield some artifacts in the image. To avoid this, we define (for each pixel) a domain P, which includes the pixel with its four closest neighbors (see Figure A1 in Appendix A.3), and we modify the cost function as follows:

\begin{matrix} L = \sum_{n = 1}^{N - 1} \sum_{x, y \in P} w_{x, y}^{t} {(i_{x, y, n} - i_{x, y, n + 1})}^{2} + w_{x, y}^{s} {(i_{x, y, n} - i_{x, y + 1, n})}^{2} + \\ w_{x, y}^{s} {(i_{x, y, n} - i_{x + 1, y, n})}^{2} + \sum_{m = 1}^{M} \sum_{x, y \in P} λ_{m, x, y} (C_{m, x, y} - \sum_{n = 1}^{N} i_{x, y, n} c_{x, y, m, n}) \end{matrix}

The vectors change to become column stack vectors of the different pixels, and the matrices are expanded to block matrices, as explained in Appendix A.3. w factors are weight factors that determine the ratio between spatial and temporal regularization.

3.2. Solution with Lagrange Multipliers

Finding the solution of Equation (6) means that from the infinite number of solutions to Equation (5), we would like to choose the one with the smoothest solution to be the estimator for the actual signal. The solution is given by the following equation (for the complete derivation, please see Appendices Appendix A.1 and Appendix A.2):

\vec{I} = M^{- 1} S {(S^{⊺} M^{- 1} S)}^{- 1} \cdot \vec{C}

(7)

3.3. Colored Light Source

We focus on a particular case of flicker for the colors red, blue, and green. This case is the most common and can be used on any colored camera. We denote the flicker vectors as

{\vec{c}}^{1} = \vec{r}

,

{\vec{c}}^{2} = \vec{g}

, and

{\vec{c}}^{3} = \vec{b}

, and the channel vector is

{\vec{C}}^{⊺} = (R, G, B)

, while R, G, and B are the digital color values, as captured in the image for each color.

The spectral matching assumption (as presented) is fulfilled because the light source spectrum (LED spectrum) is captured well by the sensor’s color channels. This leads to the following interpretation (in digital values):

u (t) \approx γ^{r} u_{r} (t) \approx γ^{g} u_{g} (t) \approx γ^{b} u_{b} (t)

(8)

While

u (t)

is the true signal and

u_{r} (t)

,

u_{g} (t)

, and

u_{b} (t)

are the signals as captured in red, green, and blue, respectively. The

γ

factors are related to the color of the object and can be derived from a single image of the scene (without flicker).

A binary flicker pattern of red, green, and blue for each camera’s exposure time illuminates the scene. The total accumulated result is used to extract the value of the actual signal (see Figure 1).

3.4. The Scanning Mode and Anti-Aliasing Algorithm

Since N represents the up-sample factor, the smaller the N, the more accurate the result should be (for

N = 3

, the result is even unique), but no information about higher frequencies is collected. On the contrary, the high N factor can detect high-frequency content but is less reliable. Therefore, we propose a technique that applies several N factors, each at a separate temporal window. At the same time, we define a temporal window as a period in which the method works at a constant factor, N.

The construction of the signal is carried out in the spectral domain. However, collecting all the contributions from the different temporal windows is not straightforward. There could be many approaches to combination strategy. We chose the following: each spectral interval of the united signal is given by averaging over all the temporal windows’ contributions, with a minimum N factor that detects this spectral interval. For example, given a camera with an FPS of

f_{s}

, if we apply a scanning method with the sequence

N = 3, 4, 5, 6

, the low spectral domain (up to

3 \frac{f_{s}}{2}

) is equal to the spectral content of the first temporal window, the mid-spectral domain (from

3 \frac{f_{s}}{2}

to

2 f_{s}

) is equal to the spectral content of the second temporal window, and the high spectral content (from

2 f_{s}

to

5 \frac{f_{s}}{2}

) is equal to the average between the third and fourth temporal windows.

One assumption that underlies this method’s basis is that the spectral content of the scene does not change much between temporal windows (invariant signal for a short time). According to that, choosing the shortest possible temporal windows is preferred, yet if the temporal windows are too short, this may not provide enough accurate results for the spectrum.

Because every temporal window contributes to another spectral component, an assembly between the windows can be used. However, anti-aliasing techniques should be used to avoid artifacts. Therefore, we can use mutual information from the different spectral domains to attenuate and even eliminate aliasing Algorithm 1.

Algorithm 1: Anti-aliasing algorithm

While BPF is an ideal band-pass filter, Rotate is a function that rotates the signal’s spectrum relative to a specific frequency. The algorithm uses the fact that every temporal window with a specific N is aliased mainly by the spectral components from the components of the

N + 1

temporal window. In this way, we use the spectrum that the temporal window had recovered with the up-sampling factor of

N + 1

and subtract its aliasing contribution from the spectral range recovered by the temporal window using the up-sampling factor N.

3.5. Performance Analysis and Signal-to-Noise Ratio (SNR)

As presented in previous sections, typically, increasing the FPS causes a decrease in exposure time. The SNR of the signal increases linearly with exposure time [28]. Hence, reducing the exposure time should decrease the SNR. However, the SNR grows like a square root in the illumination intensity (or the number of photons) [28]. Since our method uses an active illumination source, it compensates for the SNR decrease and improves image quality. We analyze the signal and the noise separately; see the Appendix for the extensive derivation Appendix A.6. The final result is as follows:

S N R \propto α^{3 / 2}

(9)

where we define

α

as the ratio between the intensity of the active illumination source and the intensity of the background source.

4. Numerical Simulations and Analysis

In order to demonstrate our method, we built a computational simulator. The simulator simulates an ideal matt and white object, with no environmental illumination, that performs any dynamics such that a particular pixel in the image can be described as a continuous trajectory of intensity versus time. Apart from the scene, the simulator simulates the camera sampling method via integration and sampling in FPS and effective flickers in RGB colors. Everything is assumed to be ideal such that in the presence of a red flicker (for example), there is no green and blue intensity value captured at all. Furthermore, we set the exposure time to be equal to one over the camera’s frames-per-second, neglecting the sensor reading time delay (which is a good approximation for common cases in reality). For each of the following results, unless otherwise mentioned, we simulated camera FPS at 10 Hz and we limited our analysis to the cases where

N = 3

,

N = 4

,

N = 5

, and

N = 6

, but this can be examined for higher up-sampling factors as well.

4.1. Flicker Pattern Analysis

The freedom to choose the flicker pattern raises the question of which pattern should be chosen to maximize the signal reconstruction performance. In other works, how to find the optimal coding [29] instance by assuming some noise model has been shown; however, here, we are interested in our method’s performance for different spectrum domains without any explicit assumptions made on the sample noise model. We later show how this spectral approach can be leveraged into a very high upsampling coefficient by using the anti-aliasing scheme (by decomposing the spectral components and using the optimal flickering pattern for each spectral domain).

For example, for a specific channel (B, G, or R), one can choose whether to perform a flicker at one specific time step and then get as much information as possible about this specific time step (at that channel) or apply the flicker to some time steps. Then, the camera collects the accumulative values of this channel, which has uncertainty about any specific time step. However, it gives information from a more extensive temporal range from the signal. Two approaches have been examined here: the reconstruction error for randomly changing patterns over time and a comparison between some arbitrary flicker patterns. For both analyses, we simulated 10,000 random sinus functions, with a temporal frequency of

[5

Hz,

30

Hz] (uniformly distributed) and a total duration of 5 s each.

Randomly Flicker: This analysis was carried out via the random sampling of the flickering pattern. Practically, we sample full-rank matrices, S, for each frame and calculate the method L2 error, and the results are shown in Figure 2. From our analysis, the random sampling has different errors for each spectral content. Therefore, we suggest using the random sampling technique when there is no prior knowledge about the scene frequency content.

Our second test was carried out to examine the reconstruction error for different fixed flicker patterns. Ideally, it is best to search among all the possible existing matrices, but this number is enormous, and we decided to focus on several specific flicker patterns for

N = 4

,

N = 5

, and

N = 6

(see Appendix A.5 to see the different choices).

The results are represented in Figure 3. For each N factor, the “jump” in error at a certain frequency (20 Hz, 25 Hz, and 30 Hz, respectively) is due to the Nyquist theorem of sampling. For each N, there is no one specific graph that can be considered the best one among all candidates. Nevertheless, it is quite clear that if we focus on a specific spectral range, we can divide the spectrum into adjacent regions, where each N gets its lowest error. For example,

N = 3 : [5

Hz,

15

Hz],

N = 4 : [15

Hz,

20

Hz],

N = 5 : [20

Hz,

25

Hz], and

N = 6 : [25

Hz,

30

Hz], and due to this, we choose the best flicker pattern as follows: pattern 1 (for N = 4), pattern 3 (for N = 5), and pattern 4 (for N = 6). These results support our scanning method attitude for merging different temporal windows to construct the entire spectral domain.

An additional comparison is presented in Table 1.

4.2. Simulations Results

Here, we compare various N factors with the signal reconstruction L2 error. The generated signal was, as in the previous section, obtained via the random sampling of 10,000 sinus functions at temporal frequencies of

[5

Hz,

30

Hz] (uniformly distributed). The flicker patterns we chose to use, based on previous analysis, are

\begin{matrix} N = 3 : & \vec{b} = (1, 0, 0), \vec{g} = (0, 1, 0), \vec{r} = (0, 0, 1) \\ N = 4 : & \vec{b} = (1, 0, 0, 1), \vec{g} = (1, 0, 1, 0), \vec{r} = (0, 1, 0, 1) \\ N = 5 : & \vec{b} = (0, 1, 0, 0, 0), \vec{g} = (1, 0, 1, 0, 1), \vec{r} = (0, 0, 0, 1, 0) \\ N = 6 : & \vec{b} = (1, 0, 1, 0, 1, 0), \vec{g} = (0, 1, 0, 1, 0, 1), \vec{r} = (1, 1, 1, 1, 1, 1) \end{matrix}

The results can be seen in Figure 4, where one can figure out several conclusions. First, the blue line (the linear curve) is the maximum error among all values, and it is given by the camera’s original signal with no up-sampling factor. Second, every up-sample factor extends the frequency detected range up to a different cut-off frequency because of the Nyquist sampling theorem. Third, the reconstruction quality for a different N is dependent on the frequency, whereas each up-sampling factor reaches better results in different frequency regions. This insight might help a lot when there is some prior knowledge about the scene spectrum. Moreover, these findings also support the scanning method technique we presented.

The simulation results are shown in Figure 5, where we simulated the following temporal signals:

\begin{matrix} N = 3 : & x (t) = sin (2 π t) + 0.3 sin (4 π t) + 0.8 sin (10 π t) + 0.25 sin (18 π t) + 0.75 sin (22 π t) + 0.5 sin (36 π t) \\ N = 4 : & x (t) = sin (2 π t) + sin (12 π t) + sin (22 π t) \\ N = 5 : & x (t) = sin (6 π t) + sin (22 π t) + sin (46 π t) \\ N = 6 : & x (t) = sin (14 π t) + sin (56 π t) \end{matrix}

and the following different flicker patterns:

\begin{matrix} N = 3 : & \vec{b} = (1, 0, 0), \vec{g} = (0, 1, 0), \vec{r} = (0, 0, 1) \\ N = 4 : & \vec{b} = (0, 1, 0, 0), \vec{g} = (1, 0, 0, 1), \vec{r} = (0, 0, 1, 0) \\ N = 5 : & \vec{b} = (0, 1, 0, 0, 0), \vec{g} = (1, 0, 1, 0, 1), \vec{r} = (0, 0, 0, 1, 0) \\ N = 6 : & \vec{b} = (1, 0, 0, 0, 0, 1), \vec{g} = (0, 1, 1, 0, 0, 0), \vec{r} = (0, 0, 0, 1, 1, 0) \end{matrix}

One can notice that our method significantly improves the ability to detect and reconstruct the signal spectral content even though one can still recognize the aliased signal parts.

In order to demonstrate this technique’s performance and the anti-aliasing algorithm results, we simulated a signal 10 s long, and we defined 2, 3, and 4 temporal windows, each at a size of 5 s, 3.33 s, and 2.5 s, respectively. Every temporal window had its own N factor.

The simulated signal is

x (t) = S W_{12} (t) + S W_{19} (t) + S W_{23} (t) + S W_{27} (t)

(10)

While SW represents a square wave with a 50% duty cycle, we took the N factors to be 3, 4, 5, and 6 (each corresponds to a temporal window). The result is shown in Figure 6. To avoid white noise, we filtered out the lowest 5–

10 %

of the spectrum (filter uniform in the spectrum).

We can see that we obtained a good result when using this technique and even an additional improvement when using the anti-aliasing algorithm.

5. Experimental Results

5.1. The Setup

Our experimental setup can be seen in Figure 7. Our setup consists of a Raspberry PI unit with an RGB camera, a model Pi Camera V2 (which we set to a frame per second of 10 Hz, 20 Hz, or 80 Hz), four light bulbs (one red, two green, and one blue) and a power bank (22.5 Watt). We placed different objects in front of the camera at a typical distance of about 40 cm to 2 m. The main object we examined was the rotating fan since it allowed us to analyze different temporal frequencies (see Figure 8), but we also tried different objects (see Figure 9). The frequency of the rotating fan was measured in parallel by a recording camera (PointGrey) in high-FPS mode (with a typical frame rate of up to 500 Hz, limited to the region of interest); this measurement allowed us to compare our results to the ground truth. For the SNR measurement, we used white paper instead of the object. For every N (upsampling factor), we used the same coded pattern that was found to be the best among the candidates presented in the simulations (Figure 3). We set the rotational fan frequency to approximately

\pm 21.5

Hz. In addition, we normalized the DC value for the different signals to focus only on temporal variations.

5.2. Signal Reconstruction Results

The experimental results are shown in Figure 10.

These results indicate that our method successfully detects high frequencies. Nevertheless, it can be seen from the graphs that sometimes there are some errors and artifacts in the result.

5.3. Imaging Reconstruction Results

Apart from the ability to capture high frequencies, Figure 8 and Figure 9 show the imaging results as examples. For comparison, we used the SuperSlowmo algorithm [30] to raise the frame-per-second rate of the scene. Moreover, we analyzed the imaging results for different temporal and spatial weights in Figure A4, Appendix A.7.

5.4. SNR and Performance Results

In order to evaluate the SNR for different

α

factors, we used a clean, white piece of paper located

\pm 40

cm in front of the camera and the flicker. We used different environmental illumination by using a white-light projector and measured the illumination values using a Lux meter. The results are shown in Figure 11.

As one can notice, the SNR improved since the flicker increases the light in the scene.

An additional experiment was used to measure the performance of the method reconstruction vs. the

α

factor. Here, we focus on

N = 3

, and the results are shown in Figure 11. In fact, there is a decrease in the performance of the method when the

α

factor decreases, which means increasing the illumination of the environment relative to the flicker illumination source.

5.5. Motion Estimation Improvement

One fundamental task in computer vision is to estimate motion or optical flow. Given the image’s spatial and temporal derivatives, one can calculate the velocity of a pixel in the XY plane. However, estimating the temporal derivative relies heavily on the camera frame-per-second rate. Here, we introduce an application for our method. Since high temporal frequencies cannot be detected in a low frame-per-second camera, applying our method and effectively raising the camera frame-per-second can improve the temporal aspect. We measured the rotating fan’s blade velocity (at the XY plane) at each pixel and compared it to the ground truth, which was detected using a high frame-per-second camera. The result is shown in Figure 12.

5.6. Motion Estimation Analysis

A comparison between motion estimation performance with and without our method is presented in Figure 12.

5.7. Discussion

The results demonstrate how our method can significantly increase the temporal upper limit of the camera. We have reached the following conclusions. First, the results demonstrate how our method can significantly raise the temporal upper limit of the camera. We found that applying different flickering patterns can deal with a significant change in the signal reconstruction error, and, as expected, the greater the N factor, the higher this error becomes. Additionally, it has been shown that each flicker pattern can provide better accuracy at a particular frequency when taking into account other frequencies. Following these findings, we introduced the scanning method, which has displayed good results, including aliasing attenuation. Experimentally and supported by theoretical derivation, our experiment shows how the SNR of the scene increases with our method. We demonstrated how our system performance improves when the

α

factor improves, as per Figure 11. The errors are still low, even at a low

α

(Figure 11). Furthermore, we show how the motion estimation error decreases dramatically when using our method (Figure 12). From the experimental and simulation results, it is clear that our method successfully detects high temporal frequencies.

However, our method still suffers from several issues and limitations, which influence the reconstruction error, and we divide them into three aspects:

(1) Illumination errors: Some of the assumptions about the light sources do not hold all the time, for example, the flickering illumination intensity differences that require tuning the coefficients

γ_{r}

,

γ_{g}

, and

γ_{b}

.

(2) Temporal mismatch: The better the synchronization between the camera and light source, the lower the signal reconstruction error will be, and the less artifacts will be seen.

(3) Reconstruction error: Our analysis has shown an inherent error factor in our method, especially for a high N. This error component might lead to the generation of new frequencies and signal distortions (as can be seen for

N = 6

in Figure 10).

6. Conclusions

In this work, we introduced a new method for temporal super-resolution based on multi-channel flickering light sources. We presented a method to solve the problem based on Lagrange multipliers. Our method showed very good results in our tests for several combinations of flickering patterns and up-sampling factors, N. We further demonstrated the performance of our scanning method and anti-aliasing technique. In our experiment, the results were good as well, and our method was able to extract very high frequencies (by a factor of about six) from the original camera Nyquist cut-off frequency. Moreover, we demonstrated (experimentally) how a motion estimation task is significantly improved thanks to our method. While achieving temporal super-resolution is always accompanied by a trade-off between the accuracy results and system complexity, here, we demonstrated a method that constitutes a proper balance between the two. As discussed in the previous section, despite the attractiveness of our method, it still has limitations, for example, the performance decrease for strong background illumination or the technical challenge of synchronizing the camera with the active light source. The up-sampling factor can go up to six (with moderate errors) and beyond without any significant overhead for the system hardware complexity. For future study, we suggest three directions: the first one is to examine different channel types, e.g., using different polarization. The second one is improving the reconstruction algorithm by taking into account different spatial and temporal correlations, and the third one is to examine the method performance for different noise models. To encourage future research, we share our code on GitHub.

7. Patents

A US patent has been submitted for this method.

Author Contributions

Conceptualization, K.C.; Methodology, K.C.; Software, K.C.; Validation, K.C.; Formal analysis, K.C.; Investigation, K.C.; Writing—original draft, K.C.; Visualization, K.C.; Supervision, D.M. and D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Weinstein Fund for Signal Processing and the Electro-Optics fund from Tel Aviv University.

Data Availability Statement

The data presented in this study are available on GitHub or may be obtained from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

TSR	Temporal super resolution
FPS	Frames per second
HR	High resolution
LR	Low resolution
SNR	Signal-to-noise ratio

Appendix A

Appendix A.1. Definitions

We define a vector in size M as representing the Lagrange multiplier of each of the channels

\vec{λ}

. Finally, we define the matrices

S

and

M

, which will be used in our derivations:

S_{(N, M)} = (\begin{matrix} {\vec{c}}^{1} & \dots & {\vec{c}}^{M} \end{matrix}), {\vec{C}}_{(M, 1)} = (\begin{matrix} C^{1} \\ ⋮ \\ C^{M} \end{matrix}) {\vec{λ}}_{(M, 1)} = (\begin{matrix} λ^{1} \\ ⋮ \\ λ^{M} \end{matrix})

(A1)

M_{(N, N)} = (\begin{matrix} 4 & - 2 & 0 & \dots & 0 \\ - 2 & 4 & - 2 & \dots & 0 \\ 0 & - 2 & 4 & \dots & 0 \\ ⋮ \\ 0 & \dots & - 2 & 4 & - 2 \\ 0 & \dots & 0 & - 2 & 4 \end{matrix}), {\vec{I}}_{(N, 1)} = (\begin{matrix} i_{1} \\ ⋮ \\ i_{N} \end{matrix})

(A2)

Appendix A.2. Solving the Optimization Problem for Lagrange Multipliers

In the general case, we can say that

L = \sum_{n = 1}^{N} {(i_{n} - i_{n + 1})}^{2} + \sum_{m = 1}^{M} λ_{m} (C_{m} - \sum_{n = 1}^{N} i_{n} c_{m, n})

(A3)

and for our case, we say that

B, G, R

are the input digital values for the colors blue, green, and red, respectively. We shall define the following cost function:

\begin{matrix} L = \sum_{n = 1}^{N} {(i_{n} - i_{n + 1})}^{2} + λ_{b} (B - \sum_{n = 1}^{N} i_{n} b_{n}) + λ_{g} (G - \sum_{n = 1}^{N} i_{n} g_{n}) + λ_{r} (R - \sum_{n = 1}^{N} i_{n} r_{n}) \end{matrix}

While

λ_{b}, λ_{g}, λ_{r}

are Lagrange Multipliers. The derivative of the cost function with respect to the different parameters is

\frac{\partial L}{\partial λ_{b}} = 0 \Rightarrow B = \sum_{n = 1}^{N} i_{n} b_{n}

(A4)

\frac{\partial L}{\partial λ_{g}} = 0 \Rightarrow G = \sum_{n = 1}^{N} i_{n} g_{n}

(A5)

\frac{\partial L}{\partial λ_{r}} = 0 \Rightarrow R = \sum_{n = 1}^{N} i_{n} r_{n}

(A6)

\forall 0 < k < N

:

\begin{matrix} \frac{\partial L}{\partial i_{k}} = 2 (i_{k} - i_{k + 1}) - 2 (i_{k - 1} - i_{k}) - λ_{b} b_{k} - λ_{g} g_{k} - λ_{r} r_{k} = 0 \\ \Rightarrow 4 i_{k} - 2 i_{k + 1} - 2 i_{k - 1} = λ_{b} b_{k} + λ_{g} g_{k} + λ_{r} r_{k} \end{matrix}

We would like to write these system of equations as vectors. Hence, we define the following:

{\vec{C}}_{(3, 1)} = (\begin{matrix} B \\ G \\ R \end{matrix}) {\vec{λ}}_{(3, 1)} = (\begin{matrix} λ_{b} \\ λ_{g} \\ λ_{r} \end{matrix}) {\vec{I}}_{(N, 1)} = (\begin{matrix} i_{1} \\ i_{2} \\ ⋮ \\ i_{N} \end{matrix})

(A7)

S_{(N, 3)} = (\begin{matrix} \vec{b} & \vec{g} & \vec{r} \end{matrix}) M_{(N, N)} = (\begin{matrix} 4 & - 2 & 0 & \dots & 0 \\ - 2 & 4 & - 2 & \dots & 0 \\ 0 & - 2 & 4 & \dots & 0 \\ ⋮ \\ 0 & \dots & - 2 & 4 & - 2 \\ 0 & \dots & 0 & - 2 & 4 \end{matrix})

(A8)

B = {\vec{b}}^{⊺} \cdot \vec{I} G = {\vec{g}}^{⊺} \cdot \vec{I} R = {\vec{r}}^{⊺} \cdot \vec{I} M \cdot \vec{I} = S \cdot \vec{λ}

(A9)

Notice that

M

is a full-rank matrix (adding every row half from the row above leads to an upper triangular matrix), so we can say

\begin{matrix} \vec{I} = M^{- 1} S \cdot \vec{λ} \Rightarrow & B = {\vec{b}}^{⊺} \cdot M^{- 1} S \cdot \vec{λ} \\ G = {\vec{g}}^{⊺} \cdot M^{- 1} S \cdot \vec{λ} \\ R = {\vec{r}}^{⊺} \cdot M^{- 1} S \cdot \vec{λ} \end{matrix}

and we can write this as

\vec{C} = S^{⊺} M^{- 1} S \cdot \vec{λ}

(A10)

By requiring that S needs to be a full-rank matrix and inverting the relation to find Lagrange multipliers, we find

\vec{λ} = {(S^{⊺} M^{- 1} S)}^{- 1} \cdot \vec{C}

(A11)

and the final result is

\vec{I} = M^{- 1} S {(S^{⊺} M^{- 1} S)}^{- 1} \cdot \vec{C}

(A12)

This compact result provides us a way (given flicker vectors of

\vec{b}, \vec{g}, \vec{r}

) to extract the signal values into each frame

\vec{I}

using the total input values of blue, green, and red intensity. Note that we required the S matrix, which represents the flicker pattern, to be a full-rank matrix. This requirement limits the number of possible matrices, assuming

S_{i j}

can only be 0 or 1. In general, it is not obvious what S matrix should be under these constraints because it very much depends on the sampled function behavior manner. We will present an analysis regarding choosing the flicker pattern later.

Appendix A.3. Expanding to Spatial Regularization

To enhance the spatial correlation between adjacent pixels, the cost function can be modified to add some spatial regularization. We assume that only the first-level neighbors are relevant, and we define the domain P to be a five-pixel domain (see Figure A1).

\begin{matrix} L = \sum_{n = 1}^{N - 1} \sum_{x, y \in P} w_{x, y}^{t} {(i_{x, y, n} - i_{x, y, n + 1})}^{2} + w_{x, y}^{s} {(i_{x, y, n} - i_{x, y + 1, n})}^{2} + \\ w_{x, y}^{s} {(i_{x, y, n} - i_{x + 1, y, n})}^{2} + \sum_{m = 1}^{M} \sum_{x, y \in P a t c h} λ_{m, x, y} (C_{m, x, y} - \sum_{n = 1}^{N} i_{x, y, n} c_{x, y, m, n}) \end{matrix}

where

w_{x, y}^{t}, w_{x, y}^{s}

are the weight factors for the temporal condition and the spatial condition, respectively. For simplicity, we assume a constant weight for

w_{x, y}^{s} = w^{s}, w_{x, y}^{t} = w^{t}

.

This system has the same solution form when mapping the different vectors to make column stack vectors (for each of the pixels); for example,

{\vec{C}}_{(3 * 5, 1)} = (\begin{matrix} B_{1} \\ G_{1} \\ R_{1} \\ ⋮ \\ B_{5} \\ G_{5} \\ R_{5} \end{matrix}) {\vec{λ}}_{(3 * 5, 1)} = (\begin{matrix} λ_{b, 1} \\ λ_{g, 1} \\ λ_{r, 1} \\ ⋮ \\ λ_{b, 5} \\ λ_{g, 5} \\ λ_{r, 5} \end{matrix})

(A13)

The S matrix (

5 N

× 15) is changed into a block diagonal matrix, while each block corresponds to different pixels in the domain P. M matrix (

5 N

× 15) has changed to be

M_{i, j} = \{\begin{matrix} 2 w^{s} + 2 w^{t} & if i = j \\ - 2 w^{t} & if | i - j | = 1 \\ - 2 w^{s} & if | i - j | m o d 5 = 0 \\ 0 & else \end{matrix}

(A14)

Figure A1. Pixel neighbor order, as was used in our method. The P domain includes pixels 2, 3, 4, and 5.

Appendix A.4. Signals Similarity Metrics

To evaluate the method’s accuracy, we shall compare our result with the actual signal. Since we deal with the function for finite energy (in

L^{2}

), the inner product is defined as

< f_{1} (t), f_{2} (t) > \equiv \int_{- \infty}^{\infty} f_{1} (t) f_{2} (t) d t

(A15)

We introduce two error metrics that we found to be the most reasonable to examine:

Euclidean distance (L2):

L_{2} (f_{1} (t), f_{2} (t)) \equiv \sqrt{| | f_{1} (t) - f_{2} (t) {| |}_{2}^{2}}

(A16)

Cosine similarity:

E r r o r (f_{1} (t), f_{2} (t)) \equiv {cos}^{- 1} (\frac{< f_{1} (t), f_{2} (t) >}{\sqrt{| | f_{1} {(t) | |}^{2} | | f_{2} (t) {| |}^{2}}})

(A17)

The first metric was chosen because it is a prevalent and intuitive way to compare signals. The second metric was chosen to avoid bias due to the signals’ absolute magnitude and to focus only on their shape relations; it was used in the experimental part.

Appendix A.5. Flicker Order Choices

N = 4:

\begin{matrix} 0 : & \vec{b} = (1, 0, 0, 1), \vec{g} = (0, 1, 0, 0), \vec{r} = (0, 0, 1, 0) \\ 1 : & \vec{b} = (1, 0, 0, 1), \vec{g} = (1, 0, 1, 0), \vec{r} = (0, 1, 0, 1) \\ 2 : & \vec{b} = (1, 0, 0, 0), \vec{g} = (0, 1, 1, 0), \vec{r} = (0, 0, 0, 1) \\ 3 : & \vec{b} = (1, 1, 0, 0), \vec{g} = (0, 1, 1, 0), \vec{r} = (0, 0, 1, 1) \\ 4 : & \vec{b} = (0, 0, 1, 0), \vec{g} = (0, 1, 0, 0), \vec{r} = (1, 1, 1, 1) \end{matrix}

N = 5:

\begin{matrix} 0 : & \vec{b} = (1, 0, 0, 0, 1), \vec{g} = (0, 1, 1, 0, 0), \vec{r} = (0, 0, 1, 1, 0) \\ 1 : & \vec{b} = (1, 0, 0, 1, 0), \vec{g} = (1, 0, 1, 0, 1), \vec{r} = (0, 1, 0, 0, 1) \\ 2 : & \vec{b} = (1, 0, 0, 1, 0), \vec{g} = (0, 0, 1, 0, 0), \vec{r} = (0, 1, 0, 0, 1) \\ 3 : & \vec{b} = (0, 1, 0, 0, 0), \vec{g} = (1, 0, 1, 0, 1), \vec{r} = (0, 0, 0, 1, 0) \\ 4 : & \vec{b} = (1, 1, 0, 0, 0), \vec{g} = (0, 1, 1, 1, 0), \vec{r} = (0, 0, 0, 1, 1) \end{matrix}

N = 6:

\begin{matrix} 0 : & \vec{b} = (1, 0, 0, 0, 1, 0), \vec{g} = (0, 1, 0, 0, 0, 1), \vec{r} = (0, 0, 1, 1, 0, 0) \\ 1 : & \vec{b} = (1, 1, 0, 0, 0, 0), \vec{g} = (0, 0, 1, 1, 0, 0), \vec{r} = (0, 0, 0, 0, 1, 1) \\ 2 : & \vec{b} = (1, 0, 0, 1, 0, 0), \vec{g} = (0, 1, 1, 1, 1, 0), \vec{r} = (0, 0, 1, 0, 0, 1) \\ 3 : & \vec{b} = (1, 0, 1, 0, 1, 0), \vec{g} = (0, 1, 0, 1, 0, 1), \vec{r} = (1, 1, 1, 1, 1, 1) \\ 4 : & \vec{b} = (0, 1, 0, 0, 0, 0), \vec{g} = (1, 0, 1, 1, 0, 1), \vec{r} = (0, 0, 0, 0, 1, 0) \end{matrix}

Figure A2. Simulation result comparison for a square wave with a 25 Hz base frequency. Comparison for different N factors: 3, 4, 5, and 6 in the order of top-left, top-right, bottom-left, bottom-right, respectively.

Figure A3. Simulation result comparison for a square wave with 5 Hz and 10 Hz base frequencies. Comparison for different N factors: 3, 4, 5, and 6 in the order of top-left, top-right, bottom-left, and bottom-right, respectively.

Appendix A.6. SNR Analysis

The Signal: We split the exposure time to N disjoint time steps according to the derivation. At the same time, for each of these steps, we assumed that the captured light was generated by the light source and then reflected from the object (up to a color reflective factor), in addition to the environment’s existing background illumination. In that case, we can not assume that the captured signal was taken only at the time steps when the flicker was applied. Therefore, a more precise derivation should be presented (we denote b = blue, g = green, and r = red, and t is the exposure time of the camera):

\begin{matrix} S_{T S R} = & \sum_{n = 1}^{N} \frac{t}{N} [γ_{b} (i_{n}^{b e n v} + i_{n}^{b f l i c k e r}) + γ_{g} (i_{n}^{g e n v} + i_{n}^{g f l i c k e r}) + γ_{r} (i_{n}^{r e n v} + i_{n}^{r f l i c k e r})] \end{matrix}

This equation says that, for each frame, the signal comprises a sum of the intensity from the environment and the flicker over N time steps. We assume that the flickering intensity is equal for different colors, and we denote it as

i^{f l i c k e r}

. We assume that the environment light is a constant white light with approximately equal intensity for the entire spectral range, denoting it as

i^{e n v}

. In addition, we denote the ratio

α \equiv \frac{i^{f l i c k e r}}{i^{e n v}}

so we can say that

\begin{matrix} S_{T S R} \approx (α^{- 1} + 1) i^{f l i c k e r} \sum_{n = 1}^{N} \frac{t}{N} (γ_{b} b_{n} + γ_{g} g_{n} + γ_{r} r_{n}) \geq (\frac{1}{α} + 1) i^{f l i c k e r} t min {γ_{b}, γ_{g}, γ_{r}} \end{matrix}

We used the fact that we do not allow a time step without a flicker at all. The signal ratio between the signal with and without the flicker (

S_{T}

represents the SNR without the flicker):

\begin{matrix} \frac{S_{T S R}}{S_{T}} = \frac{(α^{- 1} + 1) i^{f l i c k e r} \sum_{n = 1}^{N} \frac{t}{N} (γ_{b} b_{n} + γ_{g} g_{n} + γ_{r} r_{n})}{t i^{e n v}} \geq (1 + α) min {γ_{b}, γ_{g}, γ_{r}} \end{matrix}

The Noise: Say that we deal with a temporal signal and separate its exposure time into several temporal steps. Then, we adopt the noise model, as presented in [28]. Note that the sensor still works on the same original exposure time, but we assume that the dominant noise factor for the source illumination is the shot noise:

N o i s e_{T S R} = \sqrt{S_{T S R} + D t + N_{r}^{2}} \approx \sqrt{S_{T S R}}

(A18)

while t is the camera’s total exposure time, D represents the dark noise coefficient factor, and

N_{r}

the read noise. Then, we can say that the noise ratio between the signals is

\frac{N o i s e_{T S R}}{N o i s e_{T}} = \sqrt{\frac{S i g n a l_{T S R}}{S i g n a l_{T} t + D t + N_{r}^{2}}} \leq \sqrt{\frac{S i g n a l_{T S R}}{S_{T}}}

(A19)

The SNR ratio: According to the definition:

\begin{matrix} \frac{S N R_{T S R}}{S N R_{T}} = & \frac{\frac{S_{T S R}}{N o i s e_{T S R}}}{\frac{S_{T}}{N o i s e_{T}}} \geq {(\frac{S_{T S R}}{S_{T}})}^{3 / 2} \geq {[(1 + α) min {γ_{b}, γ_{g}, γ_{r}}]}^{3 / 2} \end{matrix}

This means that the SNR improves significantly for increasing the

α

factor. For an approximately white object and when

α > > 1

(strong flicker), we find that the improvement in the SNR is

\propto α^{3 / 2}

.

Dealing with environmental illumination: When the environmental illumination is not negligible (

α

∼1), the previous assumption about the detected light does not hold anymore. Then, we can estimate the error in our method, saying that each channel has an additional detected light from the environment:

\vec{C} \Rightarrow \vec{C} + Δ \vec{C} = \vec{C} + \frac{1}{α} \vec{C}

(A20)

which leads to a change in

Δ I

:

\vec{I} \Rightarrow \vec{I} + Δ \vec{I}

(A21)

Δ \vec{I} = \frac{1}{α} M^{- 1} S {(S^{⊺} M^{- 1} S)}^{- 1} \cdot \vec{C} = α^{- 1} \vec{I}

(A22)

Therefore, the error grows as

α^{- 1}

. It is worth mentioning that the contribution of the illumination of the environment can be estimated before and then subtracted from the channel vectors.

Appendix A.7. Regularization Analysis

A comparison for different frame reconstructions given different regularization (spatial vs. temporal) is presented in Figure A4:

Figure A4. Rotating rope (counter clockwise) from left to right. Analysis for different temporal and spatial weights (N = 6).

References

Gustafsson, M.G.L. Surpassing the lateral resolution limit by a factor of two using structured illumination microscopy. J. Microsc. 2000, 198, 82–87. [Google Scholar] [CrossRef] [PubMed]
Mendlovic, D.; Lohmann, A.W. Space bandwidth product adaptation and its application to superresolution, fundamentals. J. Opt. Soc. Am. A 1997, 14, 558–562. [Google Scholar] [CrossRef]
Abraham, E.; Zhou, J.; Liu, Z. Speckle structured illumination endoscopy with enhanced resolution at wide field of view and depth of field. Opto. Electron. Adv. 2023, 6, 220163-1–220163-8. [Google Scholar] [CrossRef]
Brown, A.J. Equivalence relations and symmetries for laboratory, LIDAR, and planetary Müeller matrix scattering geometries. J. Opt. Soc. Am. A 2014, 31, 2789–2794. [Google Scholar] [CrossRef]
Betzig, E.; Trautman, J.K. Near-Field Optics: Microscopy, Spectroscopy, and Surface Modification Beyond the Diffraction Limit. Science 1992, 257, 189–195. [Google Scholar] [CrossRef] [PubMed]
di Francia, G.T. Degrees of Freedom of an Image. J. Opt. Soc. Am. 1969, 59, 799–804. [Google Scholar] [CrossRef] [PubMed]
Lukosz, W. Optical Systems with Resolving Powers Exceeding the Classical Limit∗. J. Opt. Soc. Am. 1966, 56, 1463–1471. [Google Scholar] [CrossRef]
García, J.; Micó, V.; Cojoc, D.; Zalevsky, Z. Full field of view super-resolution imaging based on two static gratings and white light illumination. Appl. Opt. 2008, 47, 3080–3087. [Google Scholar] [CrossRef]
Weiner, A.M.; Heritage, J.P.; Kirschner, E.M. High-resolution femtosecond pulse shaping. J. Opt. Soc. Am. B 1988, 5, 1563–1572. [Google Scholar] [CrossRef]
Sabo, E.; Zalevsky, Z.; Mendlovic, D.; Konforti, N.; Kiryuschev, I. Superresolution optical system with two fixed generalized Damman gratings. Appl. Opt. 2000, 39, 5318–5325. [Google Scholar] [CrossRef]
Zhao, N.; Wei, Q.; Basarab, A.; Dobigeon, N.; Kouame, D.; Tourneret, J.Y. Fast Single Image Super-Resolution. arXiv 2016, arXiv:1510.00143. [Google Scholar]
Hu, M.; Jiang, K.; Wang, Z.; Bai, X.; Hu, R. CycMuNet+: Cycle-Projected Mutual Learning for Spatial-Temporal Video Super-Resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 13376–13392. [Google Scholar] [CrossRef]
Wang, Z.; Chen, J.; Hoi, S.C.H. Deep Learning for Image Super-resolution: A Survey. arXiv 2020, arXiv:1902.06068. [Google Scholar] [CrossRef]
Lepcha, D.C.; Goyal, B.; Dogra, A.; Goyal, V. Image super-resolution: A comprehensive review, recent trends, challenges and applications. Inf. Fusion 2023, 91, 230–260. [Google Scholar] [CrossRef]
Qiu, D.; Cheng, Y.; Wang, X. Medical image super-resolution reconstruction algorithms based on deep learning: A survey. Comput. Methods Programs Biomed. 2023, 238, 107590. [Google Scholar] [CrossRef] [PubMed]
Xiao, Y.; Yuan, Q.; He, J.; Zhang, Q.; Sun, J.; Su, X.; Wu, J.; Zhang, L. Space-time super-resolution for satellite video: A joint framework based on multi-scale spatial-temporal transformer. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102731. [Google Scholar] [CrossRef]
Jiang, J.; Wang, C.; Liu, X.; Ma, J. Deep Learning-based Face Super-Resolution: A Survey. arXiv 2021, arXiv:2101.03749. [Google Scholar] [CrossRef]
Chen, G.; Asraf, S.; Zalevsky, Z. Superresolved space-dependent sensing of temporal signals by space multiplexing. Appl. Opt. 2020, 59, 4234–4239. [Google Scholar] [CrossRef] [PubMed]
Llull, P.; Liao, X.; Yuan, X.; Yang, J.; Kittle, D.; Carin, L.; Sapiro, G.; Brady, D.J. Coded aperture compressive temporal imaging. Opt. Express 2013, 21, 10526–10545. [Google Scholar] [CrossRef] [PubMed]
Yoshida, M.; Sonoda, T.; Nagahara, H.; Endo, K.; Sugiyama, Y.; Taniguchi, R. High-Speed Imaging Using CMOS Image Sensor With Quasi Pixel-Wise Exposure. IEEE Trans. Comput. Imaging 2020, 6, 463–476. [Google Scholar] [CrossRef]
Raskar, R.; Agrawal, A.; Tumblin, J. Coded Exposure Photography: Motion Deblurring Using Fluttered Shutter. ACM Trans. Graph. 2006, 25, 795–804. [Google Scholar] [CrossRef]
Liu, Z.; Yeh, R.A.; Tang, X.; Liu, Y.; Agarwala, A. Video Frame Synthesis using Deep Voxel Flow. arXiv 2017, arXiv:1702.02463. [Google Scholar]
Meyer, S.; Wang, O.; Zimmer, H.; Grosse, M.; Sorkine-Hornung, A. Phase-Based Frame Interpolation for Video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1410–1418. [Google Scholar] [CrossRef]
Niklaus, S.; Liu, F. Context-aware Synthesis for Video Frame Interpolation. arXiv 2018, arXiv:1803.10967. [Google Scholar]
Pollak Zuckerman, L.; Naor, E.; Pisha, G.; Bagon, S.; Irani, M. Across Scales and Across Dimensions: Temporal Super-Resolution using Deep Internal Learning. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
Son, S.; Lee, J.; Nah, S.; Timofte, R.; Lee, K.M. AIM 2020 Challenge on Video Temporal Super-Resolution. arXiv 2020, arXiv:2009.12987. [Google Scholar]
Shannon, C. Communication in the Presence of Noise. Proc. IRE 1949, 37, 10–21. [Google Scholar] [CrossRef]
SNR Model of an Image. Available online: https://camera.hamamatsu.com/jp/en/learn/technical_information/thechnical_guide/calculating_snr.html (accessed on 30 September 2020).
Agrawal, A.; Gupta, M.; Veeraraghavan, A.; Narasimhan, S.G. Optimal coded sampling for temporal super-resolution. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 599–606. [Google Scholar] [CrossRef]
Jiang, H.; Sun, D.; Jampani, V.; Yang, M.H.; Learned-Miller, E.; Kautz, J. Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation. arXiv 2018, arXiv:1712.00080. [Google Scholar]
Barron, J.; Fleet, D.; Beauchemin, S. Performance Of Optical Flow Techniques. Int. J. Comput. Vis. 1994, 12, 43–77. [Google Scholar] [CrossRef]

Figure 1. Schematic diagram of our method. An object moves and changes its intensity value under different illumination conditions: Top—arbitrary environment illumination. In this case, there is no obvious way to reconstruct the object intensity value in time since the sensor integrates all the light from the scene. Middle—under colored flicker illumination with our prior knowledge about the flicker pattern, we can recover the object intensity value with high-quality certainty. Among all the possible temporal profiles, we choose the most reasonable one in the sense of minimum energy. Bottom—the high-resolution reconstructed signal.

Figure 2. Signal reconstruction error vs. the frequency when using a random S full-rank matrix for each frame. The result (in green) is represented and compared to the non-up-sampled signal (blue) and the particular case of the reconstructed signal of

N = 3

, and S is the identity matrix (orange).

Figure 2. Signal reconstruction error vs. the frequency when using a random S full-rank matrix for each frame. The result (in green) is represented and compared to the non-up-sampled signal (blue) and the particular case of the reconstructed signal of

N = 3

, and S is the identity matrix (orange).

Figure 3. Signal reconstruction error comparison between several candidates for the flicker pattern for different up-sample factors, N (4—top left, 5—top right, and 6—bottom). The Y-axis represents the error, and the X-axis represents the frequency.

Figure 4. Signal reconstruction error between the actual harmonic signal vs. different frequencies. The Y-axis represents the error rate, and X-axis represents the frequency. A comparison between various up-sampled factors. A frequency of 5 Hz is the maximum the camera can detect due to the Nyquist theorem, an up-sample factor of N = 3, 4, 5, and 6 extends the frequency range to 15 Hz, 20 Hz, 25 Hz, and 30 Hz, respectively.

Figure 5. Simulation results for different signals; N = 3, 4, 5, and 6; camera FPS = 10. Blue is the original signal; Orange is the camera reconstruction (no TSR); Green is our TSR algorithm.

Figure 6. Our anti-aliasing algorithm in the scanning mode; combination of N = 3, N = 4, N = 5, and N = 6. Left: before the algorithm; right: after the algorithm. All aliasing was eliminated up to a frequency of 20 Hz.

Figure 7. The experimental setup. Left: A rotating fan in front of our camera setup. Right: Our camera setup, with synchronized LEDs (one red, one blue, and two green) and a Raspberry Pi.

Figure 8. Different imaging examples for N = 3 and N = 4. Top: Our technique; Middle: SuperSlowmo [30]; Bottom: the Flatter Shutter technique [21]. Here, we used

W_{t} = 3

and

W_{s} = 1

.

Figure 8. Different imaging examples for N = 3 and N = 4. Top: Our technique; Middle: SuperSlowmo [30]; Bottom: the Flatter Shutter technique [21]. Here, we used

W_{t} = 3

and

W_{s} = 1

.

Figure 9. Basic examples of our up-sampling method for different scenes (each row). Here, we used

N = 3

, while the first column (from left to right) shows the recorded frame, and the three other columns show the temporal sequence.

Figure 9. Basic examples of our up-sampling method for different scenes (each row). Here, we used

N = 3

, while the first column (from left to right) shows the recorded frame, and the three other columns show the temporal sequence.

Figure 10. Experimental results for N = 3, 4, 5, and 6; camera FPS = 10. Blue is the original signal; orange is the camera reconstruction (no TSR); red is our TSR algorithm. Our method successfully detects spectral components up to a frequency of 30 Hz.

Figure 11. Left–Middle: SNR measurement with (left) and without (middle) flicker (vs.

α

factor). Please notice that

α

is on a logarithmic scale. Right: Experimental measurements of cosine similarity between the actual signal and the reconstructed one for different

α

values; note that

α

is in a logarithmic scale (

N = 3

).

Figure 11. Left–Middle: SNR measurement with (left) and without (middle) flicker (vs.

α

factor). Please notice that

α

is on a logarithmic scale. Right: Experimental measurements of cosine similarity between the actual signal and the reconstructed one for different

α

values; note that

α

is in a logarithmic scale (

N = 3

).

Figure 12. The rotating fan experiment; the original video vs. the up-sampled version video (

N = 3

) error comparison regarding motion estimation with time [31]. The lighting condition is poor, and the detection task is difficult, but there is still a significant improvement in the ability to detect the proper motion.

Figure 12. The rotating fan experiment; the original video vs. the up-sampled version video (

N = 3

) error comparison regarding motion estimation with time [31]. The lighting condition is poor, and the detection task is difficult, but there is still a significant improvement in the ability to detect the proper motion.

Table 1. Reconstruction error for each N factor for different frequency ranges.

N	Frequencies [Hz]	L2 Error [× $10^{- 3}$ ]	Normalized Error
3	5–15	$\pm 3.1$	1
4	15–20	$\pm 10$	$3.22$
5	23–25	$\pm 14$	$4.5$
6	28–30	$\pm 18$	$5.8$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cohen, K.; Mendlovic, D.; Raviv, D. Temporal Super-Resolution Using a Multi-Channel Illumination Source. Sensors 2024, 24, 857. https://doi.org/10.3390/s24030857

AMA Style

Cohen K, Mendlovic D, Raviv D. Temporal Super-Resolution Using a Multi-Channel Illumination Source. Sensors. 2024; 24(3):857. https://doi.org/10.3390/s24030857

Chicago/Turabian Style

Cohen, Khen, David Mendlovic, and Dan Raviv. 2024. "Temporal Super-Resolution Using a Multi-Channel Illumination Source" Sensors 24, no. 3: 857. https://doi.org/10.3390/s24030857

APA Style

Cohen, K., Mendlovic, D., & Raviv, D. (2024). Temporal Super-Resolution Using a Multi-Channel Illumination Source. Sensors, 24(3), 857. https://doi.org/10.3390/s24030857

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Temporal Super-Resolution Using a Multi-Channel Illumination Source

Abstract

1. Introduction

2. Theoretical Background

2.1. Temporal Model for an Image Sensor

2.2. Multi-Channel Approach and Assumptions

2.3. Definitions

3. Method

3.1. Spatial Regularization

3.2. Solution with Lagrange Multipliers

3.3. Colored Light Source

3.4. The Scanning Mode and Anti-Aliasing Algorithm

3.5. Performance Analysis and Signal-to-Noise Ratio (SNR)

4. Numerical Simulations and Analysis

4.1. Flicker Pattern Analysis

4.2. Simulations Results

5. Experimental Results

5.1. The Setup

5.2. Signal Reconstruction Results

5.3. Imaging Reconstruction Results

5.4. SNR and Performance Results

5.5. Motion Estimation Improvement

5.6. Motion Estimation Analysis

5.7. Discussion

6. Conclusions

7. Patents

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

Appendix A

Appendix A.1. Definitions

Appendix A.2. Solving the Optimization Problem for Lagrange Multipliers

Appendix A.3. Expanding to Spatial Regularization

Appendix A.4. Signals Similarity Metrics

Appendix A.5. Flicker Order Choices

Appendix A.6. SNR Analysis

Appendix A.7. Regularization Analysis

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI