Article

Space-Time Image Velocimetry Based on Improved MobileNetV2

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(2), 399; https://doi.org/10.3390/electronics12020399
Submission received: 23 December 2022 / Revised: 10 January 2023 / Accepted: 10 January 2023 / Published: 12 January 2023

Abstract

Space-time image velocimetry (STIV) has achieved good performance in river surface-flow velocity measurement, but its application in the field is affected by bad weather and poor lighting conditions, which cause large measurement errors. To improve the measurement accuracy and robustness of STIV, we combined it with deep learning. To keep the neural network lightweight, we adopted MobileNetV2 and improved its classification accuracy; we name the resulting method MobileNet-STIV. We also constructed a sample-enhanced mixed dataset for the first time, with 180 classes of images and 100 images per class, to train our model, which achieved good performance. Taking the current meter measurements as the reference, in the comparative experiment the absolute error of the mean velocity was 0.02 m/s, the absolute error of the flow discharge was 1.71 m³/s, the relative error of the mean velocity was 1.27%, and the relative error of the flow discharge was 1.15%. In the generalization performance experiment, the absolute error of the mean velocity was 0.03 m/s, the absolute error of the flow discharge was 0.27 m³/s, the relative error of the mean velocity was 6.38%, and the relative error of the flow discharge was 5.92%. The results of both experiments demonstrate that our method is more accurate than conventional STIV and large-scale particle image velocimetry (LSPIV).

1. Introduction

China has many rivers, and some areas are prone to flood disasters; the timely and accurate acquisition of hydrological information such as flow velocity and discharge is therefore essential for disaster prevention and control [1]. However, the harsh field measurement environment easily damages expensive professional instruments and threatens the safety of measurement personnel, making it difficult to apply traditional contact methods such as the current meter method [2]. The development of non-contact flow measurement methods is therefore necessary [3,4].
In recent years, image-based flow measurement methods have developed rapidly and have gained widespread attention and application because of their simplicity, efficiency, and safety [5,6]. These methods include particle image velocimetry (PIV) [7], large-scale particle image velocimetry (LSPIV) [8], the optical flow method, and space-time image velocimetry (STIV) [9]. In STIV, velocimetry lines are first set in the image along the main flow direction of the river; a space-time image (STI) is then generated along each velocimetry line, and under normal conditions the STI shows oblique stripes whose dominant direction is called the main orientation of texture (MOT). The key step of STIV is estimating the MOT. Existing MOT detection methods can be divided into two categories according to the processing domain of the image. Spatial-domain methods include the gradient tensor method (GTM) [9] and the two-dimensional autocorrelation function method (QESTA) [10]; the frequency-domain method converts the space-time image to the frequency domain by the fast Fourier transform (FFT), detects the main direction of the spectrum there, and then calculates the river surface-flow velocity [11]. GTM divides a space-time image into several small windows, calculates the texture direction of each window separately, and then combines them into the MOT of the whole space-time image according to the weight of each window; however, this method has poor resistance to interference. QESTA calculates the two-dimensional autocorrelation function of the image intensity of the space-time image; the gradient of the highly correlated region corresponds to the effective texture direction, from which the MOT is calculated. This method is less accurate for space-time images containing interference stripes in non-vertical directions. The FFT-based frequency-domain method converts MOT detection from the spatial domain to the frequency domain and, combined with frequency-domain filtering, is the best-performing of the methods above, but it still struggles to handle all scenes and relies on manual tuning of the filter.
In complex scenes, traditional MOT detection algorithms are prone to large errors, and it is difficult to set filter parameters adaptively. Deep learning has developed rapidly in recent years, so to improve the measurement accuracy and robustness of STIV, this paper treats MOT detection as an image classification problem and combines STIV with deep learning to measure the flow. In the pursuit of classification accuracy, many models have become increasingly complex, but such models are difficult to deploy on mobile or embedded devices with limited memory. After comparing the classification accuracies of different network models on the synthetic dataset, we decided to build on the lightweight network MobileNetV2 [12] to design a model suitable for mobile or embedded devices. We improved the structure of MobileNetV2 by adding a dilated convolution and a global attention mechanism.

2. Overview of STIV

2.1. Generation of STI

The STIV method consists of three main steps: generating space-time images, detecting the MOT of the space-time images, and calculating the flow velocity vector. The generation of a space-time image is shown in Figure 1. First, an M-frame image sequence is acquired at a suitable time interval Δt; then a set of velocimetry lines, each one pixel wide and L pixels long, is set in the image along the direction of the water flow. Finally, an L × M space-time image is synthesized for each velocimetry line in an x–t Cartesian coordinate system. The space-time image appears as bright and dark stripes with a definite direction, and the angle δ between the main direction of the texture and the vertical (time) axis is defined as the MOT; the value of δ is determined by the flow direction of the river and the magnitude of the flow velocity. Let a tracer in the world coordinate system move a distance D along a velocimetry line in time T, corresponding to d pixels over τ frames in the image coordinate system; the corresponding flow velocity V (m·s⁻¹) can then be expressed as:
$$V = \frac{D}{T} = \frac{d \cdot \Delta s}{\tau \cdot \Delta t} = \tan\delta \cdot \frac{\Delta s}{\Delta t} = v \cdot \Delta s \qquad (1)$$
where v (pixel·s⁻¹) is the optical-flow motion vector, whose sign reflects the direction of motion; V and v differ only by the object scale factor Δs (m·pixel⁻¹) along the velocimetry line, which can be obtained by flow-field calibration and perspective projection transformation [13].
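As a minimal illustration of Equation (1), the sketch below converts a detected MOT angle into a surface velocity; the function name and the example values (scale factor, frame rate) are hypothetical.

```python
import math

def surface_velocity(mot_deg: float, scale_m_per_px: float, dt_s: float) -> float:
    """Equation (1): V = tan(delta) * (delta_s / delta_t), with delta the MOT in degrees."""
    return math.tan(math.radians(mot_deg)) * scale_m_per_px / dt_s

# Hypothetical example: MOT of 47 degrees, 0.02 m/pixel scale factor, 25 fps video (dt = 0.04 s)
print(surface_velocity(47.0, 0.02, 1.0 / 25.0))  # approximately 0.54 m/s
```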

2.2. Traditional MOT Detection Method

The gradient tensor method is the traditional STIV technique for detecting the MOT; the whole process is shown in Figure 2. Under good shooting conditions this method calculates the MOT fairly accurately, but in practice the video is affected by various environmental factors, and the quality of the video itself (e.g., frame rate and resolution) also affects the accuracy of the results. The method first divides a space-time image into multiple small windows and then calculates the MOT of each window with the following formulas:
$$\tan 2\delta = \frac{2G_{xt}}{G_{tt} - G_{xx}} \qquad (2)$$
$$G_{xx} = \int_{w} \frac{\partial I(x,t)}{\partial x}\,\frac{\partial I(x,t)}{\partial x}\,dx\,dt \qquad (3)$$
$$G_{xt} = \int_{w} \frac{\partial I(x,t)}{\partial x}\,\frac{\partial I(x,t)}{\partial t}\,dx\,dt \qquad (4)$$
$$G_{tt} = \int_{w} \frac{\partial I(x,t)}{\partial t}\,\frac{\partial I(x,t)}{\partial t}\,dx\,dt \qquad (5)$$
where I(x,t) denotes the image intensity of the space-time image and w denotes the integration region.
Next, the consistency C of each window is calculated; it indicates the clarity of the space-time image texture, with a larger C corresponding to a clearer texture. It is calculated as follows:
$$C = \frac{\sqrt{(G_{xx} - G_{tt})^2 + 4G_{xt}^2}}{G_{xx} + G_{tt}} \qquad (6)$$
Finally, to minimize the influence of noise in the space-time image on the result, the MOTs of all windows are averaged, weighted by their consistency, to obtain the final MOT:
$$\bar{\delta} = \frac{\sum_{i} \delta_i \cdot C(\delta_i)}{\sum_{i} C(\delta_i)} \qquad (7)$$
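A compact sketch of the gradient tensor computation described by Equations (2)–(7) is given below, assuming the STI is stored as a 2D array with time along the rows and space along the columns; the window size and function name are illustrative.

```python
import numpy as np

def gtm_mot(sti: np.ndarray, win: int = 32) -> float:
    """Estimate the MOT of a space-time image as a consistency-weighted average
    of per-window texture angles (gradient tensor method sketch)."""
    sti = sti.astype(float)
    It = np.gradient(sti, axis=0)   # derivative along time (rows)
    Ix = np.gradient(sti, axis=1)   # derivative along space (columns)
    angles, weights = [], []
    for r in range(0, sti.shape[0] - win + 1, win):
        for c in range(0, sti.shape[1] - win + 1, win):
            gx = Ix[r:r + win, c:c + win]
            gt = It[r:r + win, c:c + win]
            Gxx, Gtt, Gxt = (gx * gx).sum(), (gt * gt).sum(), (gx * gt).sum()
            delta = 0.5 * np.arctan2(2.0 * Gxt, Gtt - Gxx)                        # Eq. (2)
            C = np.sqrt((Gxx - Gtt) ** 2 + 4.0 * Gxt ** 2) / (Gxx + Gtt + 1e-12)  # Eq. (6)
            angles.append(delta)
            weights.append(C)
    return float(np.degrees(np.average(angles, weights=weights)))                 # Eq. (7)
```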

2.3. MOT Detection Method Applying Deep Learning

A convolutional neural network (CNN) learns high-dimensional, abstract features from a labeled dataset and generalizes to unseen data of the same type. In this paper, MOT detection is treated as an image classification task in deep learning; the whole process is shown in Figure 3. First, a space-time image is transformed by the FFT and resized, and then fed into the trained model, which returns the class with the highest confidence among all classes, i.e., the MOT.
The classification described above relies mainly on the softmax activation function of the model, which assigns a probability to each output category, indicating the probability that the input image belongs to that category, and outputs the category with the highest probability as the final prediction. Its formula is as follows:
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{n} e^{z_c}} \qquad (8)$$
where z_i denotes the output value of the i-th node and n denotes the number of output nodes, i.e., the number of classification categories. The softmax function converts the output values of the categories into a probability distribution over the range [0, 1].
As can be seen in Equation (8), softmax uses an exponential function, which is increasing with a gradually increasing slope. This property lets softmax widen the gap between output values that already differ substantially, making the classification more decisive. However, it also creates a new problem: when z_i is very large, the computed exponential becomes very large and may overflow numerically. The standard remedy is to subtract the largest output value from every output value before exponentiation:
$$D = \max(z_i) \qquad (9)$$
$$\mathrm{softmax}(z_i) = \frac{e^{z_i - D}}{\sum_{c=1}^{n} e^{z_c - D}} \qquad (10)$$
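A small NumPy sketch of Equations (9) and (10) follows; the example logits are made up purely to show that the shift prevents overflow.

```python
import numpy as np

def stable_softmax(z: np.ndarray) -> np.ndarray:
    """Softmax with the max-shift of Equations (9)-(10): same output, no overflow."""
    shifted = z - np.max(z)           # D = max(z_i)
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([900.0, 905.0, 910.0])    # naive np.exp() here would overflow
probs = stable_softmax(logits)
predicted_class = int(np.argmax(probs))     # index of the predicted MOT class
```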

2.4. Surface Flow Velocity Measurement and Discharge Calculation

Let the surface-flow velocity at measurement point i be v_i, the area of the partial cross-section between measurement points i − 1 and i be s_i, and the mean surface-flow velocity between measurement points i − 1 and i be v̄_i, calculated as follows:
$$\bar{v}_i = \frac{v_{i-1} + v_i}{2} \qquad (11)$$
$$s_i = \frac{d_{i-1} + d_i}{2}\, l_i \qquad (12)$$
where d_i is the water depth at velocity measurement point i, and l_i is the distance between the two points.
The vertical mean flow velocity at a point is the product of the surface-flow coefficient and the mean surface-flow velocity, v_i = k·v̄_i. Depending on the flow conditions, the surface-flow coefficient k = 0.78∼0.88 according to the river-flow test specification. The discharge through the partial cross-section s_i is then q_i = v_i·s_i, and the total discharge Q of the river is:
$$Q = \sum_{i=1}^{n} q_i = \sum_{i=1}^{n} v_i s_i \qquad (13)$$
The mean river velocity v̄ over the whole measurement area is calculated from the total discharge Q and the total cross-sectional area S:
$$\bar{v} = \frac{Q}{S} \qquad (14)$$
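The discharge computation of Equations (11)–(14) can be sketched as below; the list layout (one spacing value per pair of adjacent verticals) and the function name are assumptions.

```python
def discharge(depths, surface_velocities, spacings, k=0.88):
    """Mid-section discharge sketch: depths d_i (m) and surface velocities v_i (m/s)
    at n verticals, spacings l_i (m) between adjacent verticals (n - 1 values),
    and surface-flow coefficient k. Returns total Q (m^3/s) and mean velocity (m/s)."""
    Q, S = 0.0, 0.0
    for i in range(1, len(depths)):
        v_bar = 0.5 * (surface_velocities[i - 1] + surface_velocities[i])  # Eq. (11)
        s_i = 0.5 * (depths[i - 1] + depths[i]) * spacings[i - 1]          # Eq. (12)
        Q += k * v_bar * s_i                                               # q_i = v_i * s_i
        S += s_i
    return Q, Q / S                                                        # Eqs. (13), (14)
```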

3. Construction of the Model

3.1. MobileNetV2g

MobileNetV2 is a lightweight convolutional neural network that builds on MobileNetV1 [14] with new linear bottleneck and inverted residual structures [13], which enhance its feature extraction capability. The network structure of MobileNetV2 is shown in Table 1:
In Table 1, t denotes the expansion factor of the 1 × 1 convolution in the inverted residual structure, c is the number of output channels, n denotes how many times the operator is repeated, and s denotes the stride of the convolution.
Based on MobileNetV2, we improved the network to suit the characteristics of frequency-domain images; the improved structure, named MobileNetV2g, is shown in Figure 4. First, a layer of dilated convolution [15] was added to the shallow part of the network backbone to enlarge the receptive field and widen the feature extraction range. Second, a global attention mechanism (GAM) [16] was added after the final 1 × 1 convolution layer to extract more useful information.
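A possible PyTorch sketch of this composition is shown below, built on torchvision's MobileNetV2. The exact insertion points, channel counts, and the GAM block (sketched in Section 3.2 below) are assumptions based on the description above, not the authors' released code.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class MobileNetV2g(nn.Module):
    """Sketch: MobileNetV2 backbone with a dilated 3x3 conv inserted after the stem
    and a GAM block appended after the final 1x1 convolution (1280 channels)."""
    def __init__(self, num_classes: int = 180):
        super().__init__()
        base = mobilenet_v2(num_classes=num_classes)
        dilated = nn.Sequential(                     # 3x3 conv, dilation 2, stride 1
            nn.Conv2d(32, 32, 3, stride=1, padding=2, dilation=2, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU6(inplace=True),
        )
        features = list(base.features)
        features.insert(1, dilated)                  # shallow part of the backbone
        features.append(GAM(1280))                   # GAM as sketched in Section 3.2
        self.features = nn.Sequential(*features)
        self.classifier = base.classifier

    def forward(self, x):
        x = self.features(x)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.classifier(x)
```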

3.2. Global Attention Mechanism

GAM adopts the sequential channel–spatial attention arrangement of the convolutional block attention module (CBAM) [17] and redesigns the submodules so that information loss is reduced and global cross-dimensional interaction features are amplified, improving the performance of deep neural networks. The whole process is shown in Figure 5 and expressed by Equations (15) and (16):
$$F_2 = M_C(F_1) \otimes F_1 \qquad (15)$$
$$F_3 = M_S(F_2) \otimes F_2 \qquad (16)$$
where M_C and M_S are the channel attention map and the spatial attention map, respectively, and ⊗ denotes element-wise multiplication.
The channel attention submodule uses a 3D permutation to retain three-dimensional information; a two-layer multi-layer perceptron (MLP) then amplifies the cross-dimensional channel–space correlation, after which the tensor is permuted back and passed through a sigmoid activation. The channel attention submodule is shown in Figure 6.
The spatial attention submodule uses two convolutional layers with 7 × 7 kernels to fuse spatial information. We set the reduction ratio r of the channel attention submodule to 4. Since max pooling discards information, the pooling operation is removed here to further preserve the feature maps, although this increases the number of parameters of the spatial attention module. The spatial attention submodule is shown in Figure 7.
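The two submodules could be implemented roughly as follows; this is a sketch following the description above (reduction ratio 4, two 7 × 7 convolutions, no pooling), and the layer details are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Global attention mechanism sketch: channel attention (permute + 2-layer MLP)
    followed by spatial attention (two 7x7 convolutions), with reduction ratio r."""
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
        )
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // r, 7, padding=3),
            nn.BatchNorm2d(channels // r),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # channel attention: permute to (B, H*W, C), apply the MLP, permute back
        perm = x.permute(0, 2, 3, 1).reshape(b, h * w, c)
        mc = self.channel_mlp(perm).reshape(b, h, w, c).permute(0, 3, 1, 2)
        x = x * torch.sigmoid(mc)                    # Equation (15)
        return x * torch.sigmoid(self.spatial(x))    # Equation (16)
```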

3.3. Dilated Convolution

Two frequency-domain images whose main spectral directions differ by 1° have only tiny feature differences. To extract as many image features as possible, without introducing too many parameters, and so distinguish such images, we enhanced the feature extraction ability of the network by adding a layer of dilated convolution. The kernel of the dilated convolution was set to 3 × 3, the dilation rate to 2, and the stride to 1.
Dilated convolution enlarges the receptive field by inserting holes into an ordinary convolution, expanding the effective kernel without increasing the number of parameters, so that the kernel extracts more feature information. It introduces a hyperparameter called the dilation rate, which defines the spacing between the sampling points of the kernel. As shown in Figure 8, the base kernel size in all three diagrams is 3 × 3; the dilation rates are 1, 2, and 4 from left to right; and the corresponding receptive fields are 3 × 3, 5 × 5, and 9 × 9, respectively.
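The sketch below shows such a layer in PyTorch; the channel count is illustrative. With kernel size k and dilation d, the effective kernel size is k + (k − 1)(d − 1), so a 3 × 3 kernel with dilation 2 covers a 5 × 5 area while keeping 3 × 3 parameters.

```python
import torch
import torch.nn as nn

# 3x3 convolution with dilation rate 2 and stride 1, as used in MobileNetV2g;
# padding=2 keeps the spatial resolution unchanged.
dilated_conv = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3,
                         stride=1, padding=2, dilation=2, bias=False)

x = torch.randn(1, 32, 112, 112)
print(dilated_conv(x).shape)                               # torch.Size([1, 32, 112, 112])
print(sum(p.numel() for p in dilated_conv.parameters()))   # 9216 = 32 * 32 * 3 * 3 weights
```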

3.4. Multi-Scale Retinex

When producing the dataset, the STIs generated from video often contain considerable noise, and some regions are blurred, which degrades the quality of the dataset. In this study, Multi-Scale Retinex (MSR) [18] was used to enhance the STIs. The effect of MSR on space-time images is shown in Figure 9a,b: the texture features become clearer after enhancement. The method is particularly suitable for space-time images with dark backgrounds. The corresponding frequency-domain STIs before and after MSR enhancement are shown in Figure 9c,d.
Retinex theory models how the human visual system perceives a scene, and MSR is an image enhancement method based on this theory. MSR extends Single-Scale Retinex (SSR) [19]; it offers an acceptable trade-off between good local dynamic range and good color reproduction and generally enhances images better than SSR. The equation for MSR is as follows:
$$R_{\mathrm{MSR}_i} = \sum_{n=1}^{N} \omega_n R_{n_i} = \sum_{n=1}^{N} \omega_n \left[ \log I_i(x,y) - \log\bigl(F_n(x,y) * I_i(x,y)\bigr) \right] \qquad (17)$$
where N is the number of scales (MSR degenerates to SSR when N = 1) and ω_n is the weight of each scale. F_n(x,y) = C_n exp[−(x² + y²)/(2σ_n²)], where C_n and σ_n are the normalization factor and the standard deviation of the Gaussian surround at each scale, and I_i is the input image on the i-th color channel.
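A grayscale MSR sketch using OpenCV is shown below; the equal weights and the three Gaussian scales are assumptions (the paper does not list its σ values).

```python
import cv2
import numpy as np

def msr(image: np.ndarray, sigmas=(15, 80, 250)) -> np.ndarray:
    """Multi-Scale Retinex for a single-channel STI: log(I) - log(Gaussian * I),
    averaged over the scales with equal weights, then stretched back to 8 bits."""
    img = image.astype(np.float64) + 1.0                  # avoid log(0)
    retinex = np.zeros_like(img)
    for sigma in sigmas:
        surround = cv2.GaussianBlur(img, (0, 0), sigma)   # F_n * I
        retinex += (np.log(img) - np.log(surround)) / len(sigmas)
    retinex = (retinex - retinex.min()) / (retinex.max() - retinex.min() + 1e-12)
    return (retinex * 255).astype(np.uint8)
```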

4. Analysis of Experimental Results

4.1. Artificially Synthesized Datasets

The dataset plays a key role in training the model, and since space-time images with accurately known MOT are scarce, we synthesized space-time images with strong MOT consistency and smooth texture using Perlin noise [20], as shown in Figure 10a. The orientation is set manually, so the angle is accurate and known. The space-time images were converted to the frequency domain by the FFT, as shown in Figure 10b, and the synthetic dataset was built from frequency-domain images covering 0° to 179° in 1° steps, i.e., 180 classes: 100 images per class in the training set (18,000 images in total) and 20 images per class in the test set (3600 images in total).
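The conversion of an STI to the frequency-domain image fed to the classifier can be sketched as follows; the log-magnitude scaling and the 224 × 224 input size are assumptions.

```python
import cv2
import numpy as np

def sti_to_spectrum(sti: np.ndarray, size: int = 224) -> np.ndarray:
    """Return the centered log-magnitude FFT spectrum of an STI, resized for the CNN."""
    spectrum = np.fft.fftshift(np.fft.fft2(sti.astype(np.float64)))
    magnitude = np.log1p(np.abs(spectrum))                 # compress the dynamic range
    magnitude = magnitude / (magnitude.max() + 1e-12) * 255.0
    return cv2.resize(magnitude.astype(np.uint8), (size, size))
```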

4.2. Mixed Datasets

The Baoji hydrological station is located in Guizhou province, China, and the video flow measurement system was set up on the river bank; the camera's view is shown in Figure 11, with the flow direction of the river marked in the figure. The surface ripples of the water provide natural tracing conditions for image-based flow measurement. The river is a natural channel with irregular bank lines, and turbulent flow occurs near the banks.
Each video saved by the flow measurement system lasts 20 s at 25 frames per second, with a frame width of 1920 pixels and a frame height of 1080 pixels. The velocimetry line length was set to 500 pixels, and the synthesized space-time images are 500 × 500 pixels, as shown in Figure 12a; the corresponding Fourier transform is shown in Figure 12b. Based on observation, the space-time images generated from the actual river fall into five types, shown in Figure 13: normal, obstacle, flare, blur, and turbulence.
The MOT of a video-generated space-time image is not necessarily an integer angle, so it was rounded to the nearest integer angle; this error is within the allowable range. From each of the five types of space-time image above, 12 images (60 in total) were selected, and the MOT of each was determined as follows: the space-time image was first converted to the frequency domain; the main direction of the spectrum was then marked manually, as shown in Figure 14; and the slope was calculated from the pixel coordinates (x₁, y₁) and (x₂, y₂) of the two endpoints of the marked segment. Since the main direction of the spectrum and the MOT are perpendicular to each other, the MOT was then obtained.
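A minimal sketch of this labeling step is given below, assuming standard image pixel coordinates and an integer-rounded class label.

```python
import math

def mot_from_marked_line(x1: float, y1: float, x2: float, y2: float) -> int:
    """Compute the MOT (degrees, 0-179) from two marked endpoints of the main
    spectral direction, using the fact that the MOT is perpendicular to it."""
    spectrum_angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
    return round((spectrum_angle + 90.0) % 180.0) % 180
```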
When building the real dataset, it is difficult to collect enough space-time images for every MOT, so the dataset was expanded by sample enhancement. We rotated the space-time images in 1° steps, so each space-time image yields 180 images with different MOTs; however, images with MOTs of 0°, 90°, and 179° rarely occur in practice, so these three classes were excluded. Finally, the space-time images were converted to the frequency domain to build the dataset, in which the training set contains 8850 images and the test set 1770.
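The rotation-based augmentation could look like the sketch below; the label bookkeeping (shifting the original MOT by the rotation angle) and the directory layout are assumptions.

```python
from pathlib import Path
from PIL import Image

def augment_by_rotation(sti_path: str, base_mot: int, out_dir: str) -> None:
    """Rotate one STI in 1-degree steps; each copy is filed under the class whose
    MOT equals the original MOT shifted by the rotation angle (modulo 180),
    skipping the excluded 0, 90, and 179 degree classes."""
    image = Image.open(sti_path)
    for step in range(180):
        label = (base_mot + step) % 180
        if label in (0, 90, 179):
            continue
        rotated = image.rotate(step, resample=Image.BILINEAR)
        class_dir = Path(out_dir) / f"{label:03d}"
        class_dir.mkdir(parents=True, exist_ok=True)
        rotated.save(class_dir / f"{Path(sti_path).stem}_rot{step:03d}.png")
```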
Because the frame rate and resolution of the video are low, the generated space-time images carry more noise and their MOTs are not exact, so we mixed the synthetic dataset with the video-generated dataset. Except for the three classes of 0°, 90°, and 179°, half of each class in the synthetic dataset was replaced with video-generated images, i.e., 50 images per class in the training set and 10 per class in the test set, forming a mixed dataset of 18,000 training images and 3600 test images.

4.3. Experimental Platform and Experimental Parameters

The hardware of the experimental platform was an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10 GHz with 32 GB of RAM and an NVIDIA GeForce GTX 1080Ti GPU with 11 GB of video memory; the software environment was Windows 10, Python 3.9, and PyTorch 1.8.1. The hyperparameters were set as follows: initial learning rate 0.001, batch size 16, 300 training epochs, and the Adam optimizer. The loss function was cross-entropy.
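A training-loop sketch reflecting these settings is given below; the dataset directory layout, the input transform, and the MobileNetV2g class (sketched in Section 3.1) are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),   # spectra are single-channel images
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("mixed_dataset/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True, num_workers=4)

model = MobileNetV2g(num_classes=180).to(device)   # class sketched in Section 3.1
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(300):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```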

4.4. Experiments with Artificially Synthesized Datasets

The MOT of the synthetic dataset is accurate and contains little noise. In this section, we report comparative experiments in which VGG16 [21], ResNet50 [22], MobileNetV2 [12], and MobileNetV3 [23] were trained on the synthetic dataset, saving the optimal parameters obtained during training. The test set was then evaluated with these optimal parameters to obtain the accuracy. MobileNetV2 achieved the highest accuracy, indicating that it discriminates MOT more accurately than the other networks, so it was used as the basis for improvement. The improved network was included in the same comparative experiments. The results are shown in Table 2.
The evaluation metrics for image classification in this paper are top-1 and top-5 accuracy. Top-1 takes the single result with the highest confidence as the prediction; the prediction is correct if it matches the label. Top-5 takes the five results with the highest confidence; the prediction is correct if any of them matches the label. Here, top-1 accuracy is regarded as the accuracy of MOT detection at 0° error, and top-5 accuracy as the accuracy of MOT detection within the allowed angular error.

4.5. Mixed Datasets Experiment

The mixed dataset contains images from a variety of complex scenes and is closer to the real measurement situation than the synthetic dataset. To verify whether the proposed network model can be applied effectively to more complex scenes, comparative experiments were conducted on the mixed dataset; the results are shown in Table 3.

4.6. Measurement Comparison Experiment

To verify the effectiveness of our method in actual measurement, a comparative experiment was designed and carried out. The experimental site was the Baoji hydrological station; the weather was cloudy and rainy during the test, and the camera view is shown in Figure 15. The ground calibration points are A, B, C, and D; EF is the section line, with E the starting point and F the end point, and the section data are shown in Table 4. Based on years of actual measurement, the bank coefficient is 0.70 and the water's surface-flow coefficient is 0.88.
The LS25-3A current meter was used in this experiment, and the mean vertical flow velocity was measured at points on the section line with starting distances of 7, 12, 17, 22, 27, 32, 37, and 42 m. The results are shown in Table 5: the mean flow velocity was 1.57 m/s and the discharge 149 m³/s, calculated according to the river-flow test specification. The STIs generated from the video and their frequency-domain images are shown in Figure 16. Eight STIs were generated from eight velocity lines, whose locations are shown in Figure 15; the widths of the STIs differ because the lengths of the velocity lines differ. The frequency-domain images were input to the model for MOT detection, and the outputs are shown in Table 6, where they are compared with the MOT detection results of conventional STIV.
Our method is compared with the current meter method, LSPIV, and STIV; the mean vertical flow velocity, mean flow velocity, and discharge measured by each method are shown in Table 7, and a comparison of the mean vertical velocity is shown in Figure 17. In hydrographic measurements, the results obtained with the current meter are taken as the true values. The absolute and relative errors of the three image-based methods are shown in Table 8 and Table 9, respectively. The data show that the mean vertical velocity measured by our method at every starting distance is closer to the true value than that measured by STIV; compared with LSPIV, only at the starting distances of 12, 17, and 42 m is the error of our method larger. For the relative error of the mean flow velocity, our method improved by 5.74% over LSPIV and by 7.65% over STIV; for the relative error of the discharge, it improved by 5.83% over LSPIV and by 7.64% over STIV. Overall, the values obtained by our method are very close to the true values measured by the current meter.

4.7. Generalization Performance Verification Experiment

Our model was trained on the mixed dataset, and the river measured at the Baoji station is a natural channel. To verify the measurement performance of the model on an artificial channel, a generalization experiment is reported in this section. The experimental site was the Dali hydrological station; the camera view and ground calibration points are shown in Figure 18. The weather was clear during the test, and the banks in the flow measurement area are very regular; a thin line spanning diagonally from the top left to the bottom right of the view crosses the river channel and interferes with the measurement. DA is the section line, with D the starting point and A the end point, and the section data are shown in Table 10. The bank coefficient of this station is 0.80, and the water's surface-flow coefficient is 0.82.
The LJ-20 current meter was used in this experiment, and the mean vertical velocity was measured at starting distances of 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11 m; the results are shown in Table 11. The mean velocity and discharge were then calculated according to the river-flow test specification. The STIs generated from the video and their frequency-domain images are shown in Figure 19. Ten STIs were generated, and the locations of the velocity lines are shown in Figure 18. The frequency-domain images were input to the model for MOT detection; the outputs are shown in Table 12 and compared with the MOT detection results of conventional STIV.
Our method is compared with the current meter method, LSPIV, and STIV; the mean vertical flow velocity, mean flow velocity, and discharge measured by each method are shown in Table 13, and a comparison of the mean vertical velocity is shown in Figure 20. Taking the current meter results as the reference, the absolute and relative errors of the three methods are shown in Table 14 and Table 15, respectively. Compared with STIV, the error of the vertical mean velocity measured by our method is larger at the starting distances of 2, 3, 5, and 11 m. The STI generated at the 2 m starting distance contains a large amount of noise and its texture features are not obvious, so the MOT detected by our method is an obtuse angle and the calculated vertical mean velocity is negative. Compared with LSPIV, the vertical mean velocities measured by our method at the starting distances of 3, 5, 6, 7, and 8 m are closer to the current meter values. For the relative error of the mean flow velocity, our method improved by 10.64% over LSPIV and by 6.42% over STIV; for the relative error of the discharge, it improved by 11.38% over LSPIV and by 1.78% over STIV. Taken together, the proposed method shows good generalization performance.

5. Conclusions

Given that the traditional STIV technique is susceptible to noise interference, suffers large MOT detection errors in complex scenes, and requires a complicated tuning process, we proposed a flow measurement algorithm built on an improved lightweight network, which treats MOT detection as an image classification task and needs no complicated tuning. Its top-1 accuracy is 49.89%, 3.36% higher than before the improvement, and its top-5 accuracy is 90.47%, 2.30% higher than before the improvement. Applying this method to river flow measurement, with the current meter results as the standard, the relative error of the mean flow velocity measured at the Baoji station was 1.27% and the relative error of the discharge was 1.15%; at the Dali station, the relative error of the mean flow velocity was 6.38% and the relative error of the discharge was 5.92%. The experimental results show that the model has good generalization performance. This paper also proposed a mixed-dataset training strategy for the first time, using 180 classes of frequency-domain images with only 100 images per class and 300 training epochs to obtain a model with the above performance. Combining STIV with deep learning gives an end-to-end implementation that is simpler and more robust than the traditional methods, and it has broad research prospects as a new measurement technology.
Although MobileNet-STIV outperforms the traditional methods in complex scenes, there is still room to improve its measurement accuracy and robustness, and future work will consider the following aspects: (1) pre-processing the images and enhancing the STIs or frequency-domain images with generative adversarial networks to improve MOT detection accuracy; (2) acquiring more space-time images in more scenes and expanding the datasets to make them more representative of different scenes; (3) since the analysis of the experimental data shows that our method is prone to large errors when detecting flow near the river banks, improving the method for the characteristics of bank flow.

Author Contributions

Conceptualization, G.Z. and J.J.; methodology, Q.H.; formal analysis, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by “Yunnan Xingdian Talents Support Plan” project of Yunnan and Key Projects of Yunnan Basic Research Plan (202101AS070016).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wan, D.; Wang, K. Key Technology of Flood Prevention, Control and Emergency Management for Small and Medium-Sized Rivers. J. Hohai Univ. (Nat. Sci.) 2021, 49, 204–212.
  2. He, B.; Li, Q. Exploration on present situation and developing tendency of mountain flood disaster prevention technology. China Water Resour. 2014, 18, 11–13.
  3. Tsubaki, R.; Fujita, I.; Tsutsumi, S. Measurement of the flood discharge of a small-sized river using an existing digital video recording system. J. Hydro-Environ. Res. 2011, 5, 313–321.
  4. Xu, L.; Zhang, Z.; Yan, X.; Wang, H.; Wang, X. Development status of non-contact open channel water flow monitoring technology. Water Resour. Inform. 2013, 3, 37–44.
  5. Yang, D.; Shao, G.; Hu, W.; Liu, G.; Liang, J.; Wang, H.; Xu, C. Review of image-based river surface velocimetry research. J. Zhejiang Univ. Sci. 2021, 55, 1752–1763.
  6. Zhang, Z.; Xu, F.; Wang, X.; Xu, L. Research progress on river surface imaging velocimetry. Chin. J. Sci. Instrum. 2015, 36, 1441–1450.
  7. Hu, W.; Ma, Z.; Tian, M.; Zhao, X.; Hu, X. Multispectral-Based Particle Image Velocimetry. Spectrosc. Spectr. Anal. 2018, 38, 2038–2043.
  8. Fujita, I.; Muste, M.; Kruger, A. Large-scale particle image velocimetry for flow analysis in hydraulic engineering applications. J. Hydraul. Res. 1998, 36, 397–414.
  9. Fujita, I.; Notoya, Y.; Tani, K.; Tateguchi, S. Efficient and accurate estimation of water surface velocity in STIV. Environ. Fluid Mech. 2019, 19, 1363–1378.
  10. Fujita, I.; Watanabe, H.; Tsubaki, R. Development of a non-intrusive and efficient flow monitoring technique: The space-time image velocimetry (STIV). Int. J. River Basin Manag. 2007, 5, 105–114.
  11. Zhen, Z.; Huabao, L.; Yang, Z.; Jian, H. Design and evaluation of an FFT-based space-time image velocimetry (STIV) for time-averaged velocity measurement. In Proceedings of the 2019 14th IEEE International Conference on Electronic Measurement & Instruments, Changsha, China, 1–3 November 2019; pp. 503–514.
  12. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
  13. Zhang, Z.; Lv, L.; Shi, A.; Liu, H.; Wang, H. River surface flow field calibration method based on object-image scaling. J. Instrum. 2017, 38, 2273–2281.
  14. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Wey, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
  15. Wang, Z.; Ji, S. Smoothed dilated convolutions for improved dense prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 2486–2495.
  16. Liu, Y.; Shao, Z.; Hoffmann, N. Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions. arXiv 2021, arXiv:2112.05561.
  17. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
  18. Petro, A.B.; Sbert, C.; Morel, J.-M. Multiscale Retinex. Image Process. Line 2014, 71–88.
  19. Choi, D.H.; Jang, I.H.; Kim, M.H.; Kim, N.C. Color image enhancement using single-scale retinex based on an improved image formation model. In Proceedings of the 2008 16th European Signal Processing Conference, Lausanne, Switzerland, 25–29 August 2008; pp. 1–5.
  20. Perlin, K. Improving noise. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, San Antonio, TX, USA, 23–26 July 2002; pp. 681–682.
  21. Rahman, M.; Laskar, M.; Asif, S.; Imam, O.T.; Reza, A.W.; Arefin, M.S. Flower Recognition Using VGG16. In Proceedings of the Third International Conference on Image Processing and Capsule Networks, Lecce, Italy, 23–27 May 2022; pp. 748–760.
  22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  23. Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, 27 October–2 November 2019; pp. 1314–1324.
Figure 1. Generation of a space-time image.
Figure 2. Process of MOT solving.
Figure 3. Process of MOT detection by deep learning.
Figure 4. MobileNetV2g model structure.
Figure 5. The overview of GAM.
Figure 6. Channel attention submodule.
Figure 7. Spatial attention submodule.
Figure 8. Schematic diagram of dilated convolution expansion.
Figure 9. MSR image enhancement before and after comparison: (a) before STI enhancement, (b) after STI enhancement, (c) the frequency-domain image before enhancement, (d) the frequency-domain image after enhancement.
Figure 10. Example of synthetic STI: (a) a synthetic STI, (b) the FFT image of (a).
Figure 11. View of camera.
Figure 12. Example of real STI: (a) an STI generated from video, (b) the FFT image of (a).
Figure 13. STI in different scenes: (a) obstacle, (b) blur, (c) flare, (d) turbulence, (e) normal.
Figure 14. Main direction of spectrum by marking.
Figure 15. Layout diagram of the ground mark and search lines in the comparison test.
Figure 16. STIs and their FFT images obtained at the Baoji hydrographic station.
Figure 17. Comparison of mean vertical flow velocity.
Figure 18. Layout diagram of the ground mark and search lines.
Figure 19. STIs and their FFT images obtained at the Dali hydrographic station.
Figure 20. Comparison of mean vertical flow velocity.
Table 1. MobileNetV2 model structure.
Input | Operator | t | c | n | s
224² × 3 | Conv2d | - | 32 | 1 | 2
112² × 32 | Bottleneck | 1 | 16 | 1 | 1
112² × 16 | Bottleneck | 6 | 24 | 2 | 2
56² × 24 | Bottleneck | 6 | 32 | 3 | 2
28² × 32 | Bottleneck | 6 | 64 | 4 | 2
14² × 64 | Bottleneck | 6 | 96 | 3 | 1
14² × 96 | Bottleneck | 6 | 160 | 3 | 2
7² × 160 | Bottleneck | 6 | 320 | 1 | 1
7² × 320 | Conv2d 1 × 1 | - | 1280 | 1 | 1
7² × 1280 | Avgpool 7 × 7 | - | - | 1 | -
1² × 1280 | Conv2d 1 × 1 | - | - | - | -
Table 2. Accuracies of different models on the synthetic dataset.
Model | Top-1 Accuracy | Top-5 Accuracy
VGG16 [21] | 0.56% | 2.78%
ResNet50 [22] | 52.31% | 99.50%
MobileNetV2 [12] | 52.72% | 99.53%
MobileNetV3 [23] | 52.31% | 99.50%
MobileNetV2g | 54.58% | 99.53%
Table 3. Accuracies of different models on the mixed dataset.
Model | Top-1 Accuracy | Top-5 Accuracy
VGG16 [21] | 0.56% | 2.78%
ResNet50 [22] | 47.39% | 89.64%
MobileNetV2 [12] | 46.53% | 88.17%
MobileNetV3 [23] | 47.75% | 90.31%
MobileNetV2g | 49.89% | 90.47%
Table 4. Sectional data of the comparison experiment.
Starting Distance/m | Depth/m
1.8 | 1.37
7.0 | 1.90
12.0 | 1.75
17.0 | 1.72
22.0 | 1.84
27.0 | 1.88
32.0 | 2.12
37.0 | 2.33
42.0 | 2.47
48.4 | 1.35
Table 5. Current meter measurement results of the comparison experiment.
Starting Distance/m | Mean Vertical Velocity/(m/s) | Partial Mean Velocity/(m/s) | Partial Area/m² | Partial Discharge/(m³/s)
0 | 0 | - | - | -
0∼7 | - | 0.93 | 10.10 | 9.39
7 | 1.33 | - | - | -
7∼12 | - | 1.60 | 8.97 | 14.40
12 | 1.88 | - | - | -
12∼17 | - | 1.94 | 8.59 | 16.70
17 | 2.01 | - | - | -
17∼22 | - | 2.14 | 8.86 | 19.00
22 | 2.27 | - | - | -
22∼27 | - | 2.03 | 9.18 | 18.60
27 | 1.79 | - | - | -
27∼32 | - | 1.72 | 10.00 | 17.20
32 | 1.64 | - | - | -
32∼37 | - | 1.68 | 11.10 | 18.60
37 | 1.73 | - | - | -
37∼42 | - | 1.58 | 12.30 | 19.40
42 | 1.43 | - | - | -
42∼51.4 | - | 1.00 | 15.80 | 15.80
51.4 | 0 | - | - | -
Table 6. Test results of MOT.
Number | Starting Distance/m | STIV/° | MobileNetV2g/°
1 | 7 | 16.16 | 47.00
2 | 12 | 54.48 | 71.00
3 | 17 | 64.93 | 73.00
4 | 22 | 72.01 | 74.00
5 | 27 | 76.38 | 68.00
6 | 32 | 74.68 | 67.00
7 | 37 | 73.45 | 67.00
8 | 42 | 69.39 | 57.00
Table 7. Comparison of measurement results by different methods.
(Columns No.1–No.8 give the mean vertical velocity in m/s.)
Methods | No.1 | No.2 | No.3 | No.4 | No.5 | No.6 | No.7 | No.8 | v̄/(m/s) | Q/(m³/s)
LS25-3A | 1.33 | 1.88 | 2.01 | 2.27 | 1.79 | 1.64 | 1.73 | 1.43 | 1.57 | 149.00
LSPIV | 1.00 | 1.57 | 1.99 | 2.48 | 2.10 | 1.76 | 1.60 | 1.52 | 1.46 | 138.60
STIV | 0.31 | 1.11 | 1.43 | 2.07 | 2.51 | 2.42 | 2.43 | 2.30 | 1.71 | 162.10
Our Method | 1.14 | 2.31 | 2.19 | 2.34 | 1.50 | 1.56 | 1.70 | 1.33 | 1.55 | 147.29
Table 8. Absolute error comparison.
(Columns No.1–No.8 give the absolute error of the mean vertical velocity in m/s.)
Methods | No.1 | No.2 | No.3 | No.4 | No.5 | No.6 | No.7 | No.8 | v̄/(m/s) | Q/(m³/s)
LSPIV | 0.33 | 0.31 | 0.02 | 0.21 | 0.31 | 0.12 | 0.13 | 0.09 | 0.11 | 10.40
STIV | 1.02 | 0.77 | 0.58 | 0.20 | 0.72 | 0.78 | 0.70 | 0.87 | 0.14 | 13.10
Our Method | 0.19 | 0.43 | 0.18 | 0.07 | 0.29 | 0.08 | 0.03 | 0.10 | 0.02 | 1.71
Table 9. Relative error comparison.
(All values are relative errors in %; columns No.1–No.8 refer to the mean vertical velocity.)
Methods | No.1 | No.2 | No.3 | No.4 | No.5 | No.6 | No.7 | No.8 | v̄ | Q
LSPIV | 24.81 | 16.49 | 0.10 | 9.25 | 17.32 | 7.32 | 7.51 | 6.29 | 7.01 | 6.98
STIV | 76.69 | 40.96 | 28.86 | 8.81 | 40.22 | 47.56 | 40.46 | 60.84 | 8.92 | 8.79
Our Method | 14.29 | 22.87 | 8.96 | 3.08 | 16.20 | 4.88 | 1.73 | 6.99 | 1.27 | 1.15
Table 10. Sectional data of the Dali station.
Starting Distance/m | Depth/m
0.1 | 0.59
0.5 | 0.70
1.0 | 0.81
1.5 | 0.82
2.0 | 0.76
2.5 | 0.86
3.0 | 0.93
3.5 | 0.90
4.0 | 0.96
4.5 | 1.00
5.0 | 0.99
5.5 | 0.94
6.0 | 1.04
6.5 | 0.95
7.0 | 0.86
7.5 | 0.78
8.0 | 0.77
8.5 | 0.70
9.0 | 0.92
9.5 | 0.85
10.0 | 0.81
10.5 | 0.74
11.0 | 0.55
11.5 | 0.42
11.9 | 0.39
Table 11. Current meter measurement results.
Starting Distance/m | Mean Vertical Velocity/(m/s) | Partial Mean Velocity/(m/s) | Partial Area/m² | Partial Discharge/(m³/s)
0 | 0 | - | - | -
0∼2 | - | 0.29 | 1.45 | 0.42
2 | 0.36 | - | - | -
2∼3 | - | 0.40 | 0.85 | 0.34
3 | 0.43 | - | - | -
3∼4 | - | 0.39 | 0.92 | 0.36
4 | 0.35 | - | - | -
4∼5 | - | 0.40 | 0.99 | 0.40
5 | 0.45 | - | - | -
5∼6 | - | 0.60 | 0.98 | 0.59
6 | 0.76 | - | - | -
6∼7 | - | 0.72 | 0.95 | 0.68
7 | 0.68 | - | - | -
7∼8 | - | 0.64 | 0.80 | 0.51
8 | 0.61 | - | - | -
8∼9 | - | 0.54 | 0.77 | 0.42
9 | 0.48 | - | - | -
9∼10 | - | 0.43 | 0.86 | 0.37
10 | 0.38 | - | - | -
10∼11 | - | 0.44 | 0.71 | 0.31
11 | 0.49 | - | - | -
11∼11.9 | - | 0.39 | 0.42 | 0.16
11.9 | 0 | - | - | -
Table 12. Test results of MOT.
Number | Starting Distance/m | STIV/° | MobileNetV2g/°
1 | 2 | 34.20 | 150.00
2 | 3 | 58.25 | 62.00
3 | 4 | 61.87 | 59.00
4 | 5 | 62.78 | 59.00
5 | 6 | 64.88 | 60.00
6 | 7 | 20.76 | 61.00
7 | 8 | 45.14 | 54.00
8 | 9 | 5.46 | 56.00
9 | 10 | 11.36 | 48.00
10 | 11 | 36.58 | 32.00
Table 13. Comparison of measurement results by different methods.
(Columns No.1–No.10 give the mean vertical velocity in m/s.)
Methods | No.1 | No.2 | No.3 | No.4 | No.5 | No.6 | No.7 | No.8 | No.9 | No.10 | v̄/(m/s) | Q/(m³/s)
LJ-20 | 0.36 | 0.43 | 0.35 | 0.45 | 0.76 | 0.68 | 0.61 | 0.48 | 0.38 | 0.49 | 0.47 | 4.56
LSPIV | 0.23 | 0.01 | 0.45 | 0.58 | 0.38 | 0.41 | 0.57 | 0.62 | 0.37 | 0.62 | 0.39 | 3.77
STIV | 0.38 | 0.46 | 0.03 | 0.45 | 0.13 | 0.79 | 0.72 | 0.72 | 0.61 | 0.40 | 0.43 | 4.21
Our Method | −0.21 | 0.68 | 0.59 | 0.57 | 0.63 | 0.71 | 0.60 | 0.67 | 0.53 | 0.31 | 0.44 | 4.29
Table 14. Absolute error comparison.
(Columns No.1–No.10 give the absolute error of the mean vertical velocity in m/s.)
Methods | No.1 | No.2 | No.3 | No.4 | No.5 | No.6 | No.7 | No.8 | No.9 | No.10 | v̄/(m/s) | Q/(m³/s)
LSPIV | 0.13 | 0.42 | 0.10 | 0.13 | 0.38 | 0.27 | 0.04 | 0.14 | 0.01 | 0.13 | 0.08 | 0.79
STIV | 0.02 | 0.03 | 0.32 | 0.00 | 0.63 | 0.11 | 0.11 | 0.24 | 0.23 | 0.09 | 0.04 | 0.35
Our Method | 0.57 | 0.25 | 0.24 | 0.12 | 0.13 | 0.03 | 0.01 | 0.19 | 0.15 | 0.18 | 0.03 | 0.27
Table 15. Relative error comparison.
(All values are relative errors in %; columns No.1–No.10 refer to the mean vertical velocity.)
Methods | No.1 | No.2 | No.3 | No.4 | No.5 | No.6 | No.7 | No.8 | No.9 | No.10 | v̄ | Q
LSPIV | 34.93 | 97.70 | 28.67 | 29.21 | 50.21 | 40.15 | 7.01 | 28.28 | 3.61 | 26.86 | 17.02 | 17.30
STIV | 4.60 | 6.70 | 91.00 | 0.80 | 82.80 | 16.80 | 18.00 | 49.60 | 61.50 | 17.70 | 12.80 | 7.70
Our Method | 158.33 | 58.14 | 68.57 | 26.67 | 17.11 | 4.41 | 1.64 | 39.58 | 39.47 | 36.73 | 6.38 | 5.92
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

