Article

Automatic ROI Setting Method Based on LSC for a Traffic Congestion Area

1 School of Vehicle and Energy, Yanshan University, Qinhuangdao 066004, China
2 Hebei Key Laboratory of Special Delivery Equipment, Yanshan University, Qinhuangdao 066004, China
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(23), 16126; https://doi.org/10.3390/su142316126
Submission received: 23 October 2022 / Revised: 29 November 2022 / Accepted: 30 November 2022 / Published: 2 December 2022

Abstract

Congested regions in videos place higher demands on target detection algorithms, and focusing detection on congested regions offers a direction for improving detection accuracy. To make target detection algorithms pay more attention to congested areas, an automatic selection method for traffic congestion areas based on surveillance video is proposed. First, the image is segmented into superpixels, and a superpixel boundary map is extracted. Then, mean filtering is applied to the superpixel boundary map, and a fixed threshold is used to screen pixels with high texture complexity. Finally, a maximin method is used to extract the traffic congestion area. Surveillance data from nighttime and rainy days were collected to expand the UA-DETRAC dataset, and experiments were carried out on the extended dataset. The results show that the proposed method can automatically set the congestion area under various weather conditions, such as full daylight, nighttime and rain.

1. Introduction

Intelligent transportation systems play an important role in public transportation management, traffic safety and connected autonomous driving [1,2]. Intelligent transportation systems usually deploy a large number of cameras on the roadside, analyze the video data collected by these cameras and extract target categories, target numbers, traffic events and other content from the video, which are of great significance for the promotion of the application of intelligent transportation systems [3]. At this stage, target detection methods based on deep learning have shown good detection accuracy on datasets, but they still face challenges brought by target congestion, small-sized targets and illumination changes in actual deployment [4,5,6]. In the actual collected surveillance video, congestion is a common condition, and the congestion area is represented as a complex texture area on the image [7]. Congested areas put forward higher requirements for the generalization of target detection algorithms. Extracting the congested area and using a detection algorithm to focus on the area is of great significance for the improvement of the generalization of detection algorithms [8].
In order to improve the performance of algorithms in practical use, Region of Interest (ROI) selection is used in multiple tasks. The authors of [9] introduced an ROI selection module into a deep learning network for target recognition and proposed an ROI selection method based on sliding windows and grayscale sum maximization. The ROI image is input into the deep learning network for target recognition, which reduces the interference of the background region in target recognition and improves the confidence of the recognition results. However, the window height is fixed when selecting the ROI, and the texture features in the window are not considered. The authors of [10] proposed a traffic sign detection method based on ROI selection. In ROI selection, a sliding-window method is used to search for the largest possible target area in the binarized Hue Saturation Value (HSV) image, but the texture information of the image is not fully utilized. The authors of [11] proposed an ROI selection method based on standard deviation maps (STDMs) and used an STDM to build a feature map known as a candidate centers map (CCM) based on the clutter rate (CR). Their method uses graph-based pruning to retain useful ROI candidate points and then geometric clustering to merge them. However, when extracting the ROI, only candidate points are obtained, and areas with complex textures are not further extracted. The authors of [12] used a region proposal network (RPN) to generate ROI frames. However, the size of the ROI is relatively fixed, and the ROI area only contains the foreground target. The authors of [13] proposed a salient object detection method based on texture screening and hypergraph analysis. Texture screening is used in ROI selection: feature points are extracted with the Canny operator and voted on using the texture side length and the distance from each pixel to the texture boundary.
Finally, the Delaunay triangulation method is used to compute the convex hull of the retained feature points and generate a candidate ROI area. However, the conditions for determining the ROI area are overly strict, and there is a risk of losing important areas.
The authors of [14] proposed an improved method for salient feature point detection and ROI determination. The feature map is filtered through a mask generated from local features. Then, the feature map is optimized using center-surround parameters, and pixels with significant features are marked as ROIs. However, no clear ROI boundary is generated. The authors of [15] proposed a traffic sign detection and recognition method based on multiple thresholds and subspaces. A multi-threshold method is used in ROI selection, with different thresholds applied in the day and at night to binarize the features mapped in the red-angle space. Then, pixels with significant features are marked as ROIs. The proposed method performs well when a strong light source illuminates the target directly but does not consider the influence of local texture features, and the extraction effect is poor when different objects have similar degrees of reflection. The authors of [16] proposed an improved deep learning method for retrieving river outlets from unmanned aircraft system (UAS) images. An RPN is used in the ROI selection process, and the number of ROIs is limited in order to balance training speed and model accuracy. The authors of [17] proposed a tracking method based on feature selection. Before ROI selection, mutual information is introduced to analyze the correlation between feature maps, and key feature maps are retained, thereby improving the real-time performance of the algorithm.
In the above references, the ROI is mainly set based on the area of a single target. When the features of the target cannot be correctly extracted, the resulting ROI is unreliable, and the area where congestion occurs is not extracted.
Aiming to solve the above problems, this paper proposes a method for the automatic selection of congested areas in surveillance videos, which selects congested areas automatically from a single frame of the collected video. First, linear spectral clustering (LSC) [18] is used to perform superpixel segmentation on a single frame of the video stream and to extract the superpixel boundary map. Then, mean filtering is used to extract the texture intensity of the superpixel boundary map, and pixels with complex textures are screened based on a threshold. Finally, a maximin method is used to determine the complex texture area, which is marked as a congested area. The public dataset UA-DETRAC [19] is augmented, and the proposed method is validated on it.

2. Proposed Method

2.1. Overview of the Proposed Framework

As shown in Figure 1, the proposed method mainly includes three steps:
(1)
Superpixel segmentation (corresponding to the content of Section 2.2), mainly using LSC to achieve superpixel segmentation of the original image and extract the superpixel image;
(2)
Complex texture point screening (corresponding to Section 2.3), through mean filtering and threshold filtering, to extract pixels with high texture complexity from superpixel images;
(3)
Congestion area selection (corresponding to Section 2.4), using a simple maximin method to extract complex texture areas from pixels with high texture complexity and mark them as traffic jam areas.

2.2. LSC-Based Superpixel Segmentation of the Image

The texture of an image is a statistical feature represented by pixel values in a certain area. Texture features within the same target are generally uniform, while texture features between different targets generally differ significantly, resulting in clear texture boundaries between different targets. When there is congestion, shadows, leaf occlusion, etc., in the surveillance video, the number of targets increases or the texture differences within the same target increase, which ultimately increases the complexity of the image texture. In order to quantify the complexity of the texture in the image, a simple clustering method is used to cluster the image pixels. Pixels with high similarity are extracted as the same cluster, the boundaries between different clusters are extracted and the extracted boundaries are used for texture complexity analysis.
Superpixel segmentation uses the local texture features of the image to extract pixels with high similarity through similarity analysis and marks these pixels as the same superpixel. Each pixel within a superpixel is an element of the same cluster, and the boundary of a superpixel is the boundary between different clusters. The image is segmented into superpixels so that superpixel boundaries can be extracted for the analysis of texture complexity.
Widely used methods for superpixel segmentation include simple linear iterative clustering (SLIC) [20] and LSC [18]. The SLIC method only needs two parameters: the number of superpixels and the compactness factor. During initialization, the superpixels are initialized to a fixed size. Then, the features of the superpixels are extracted, and the current superpixel area is corrected using the distance between features. This method is simple and efficient, but its segmentation quality degrades when the scene is complex. The LSC algorithm optimizes performance in complex scenes and uses the k-means algorithm in a high-dimensional space during iteration. Therefore, the LSC algorithm can extract boundaries well in complex scenes. The purpose of this paper is to extract complex texture regions by using the complexity of boundaries, so accurate boundary extraction provides strong support for this task. Therefore, the LSC algorithm is used as the superpixel segmentation method.
The processing flow of the LSC algorithm is shown in Figure 2. In order to improve real-time performance as much as possible, the same feature parameters as those in LSC [18] are used. The feature parameters are the three values (l, α, β) in the CIELab color space, where l represents the brightness of a pixel, α represents the red–green axis (values greater than 0 indicate stronger redness, and values less than 0 indicate stronger greenness) and β represents the yellow–blue axis (values greater than 0 indicate stronger yellowness, and values less than 0 indicate stronger blueness). The conversion of image pixels from the RGB color space to the CIELab color space follows [21,22,23]: the RGB space is first converted to the XYZ space (the CIE 1931 color space [24]) and then to the CIELab space. The conversion from the RGB space to the XYZ space is shown in Equation (1):
$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \quad (1)$$
Conversion from the XYZ space to the CIELab color space is shown in Equations (2)–(4):
$$l = 116\, f(Y/Y_W) - 16, \quad (2)$$
$$\alpha = 500\, [\, f(X/X_W) - f(Y/Y_W)\, ], \quad (3)$$
$$\beta = 200\, [\, f(Y/Y_W) - f(Z/Z_W)\, ], \quad (4)$$
where X, Y and Z are the sample observation values, and X_W, Y_W and Z_W are the tristimulus values of the light source. f(·) is given in Equation (5):
$$f(t) = \begin{cases} t^{1/3} & \text{if } t > (6/29)^3 \\ \frac{1}{3}\left(\frac{29}{6}\right)^2 t + \frac{4}{29} & \text{otherwise} \end{cases} \quad (5)$$
The Euclidean distance formula that we used is shown in Equation (6).
$$D_{euclid}(i, j) = \sqrt{(p_{j1} - c_{i1})^2 + (p_{j2} - c_{i2})^2}, \quad (6)$$
In the equation, p_j = (p_{j1}, p_{j2}) is the j-th neighbor point of the i-th center point c_i = (c_{i1}, c_{i2}). Points are pixels in the image.
Through the conversion in Equations (1)–(5), the RGB of the pixel is converted to the Lab value in the CIELab coordinate system, and then the converted data are used as the input of Figure 2 for subsequent processing.
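As a concrete illustration, Equations (1)–(5) can be implemented in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes RGB values already normalized to [0, 1], omits any sRGB gamma correction and takes the white point (X_W, Y_W, Z_W) to be the tristimulus values of RGB = (1, 1, 1) under the matrix in Equation (1).

```python
import numpy as np

# Conversion matrix from Equation (1)
M = np.array([[0.412453, 0.357580, 0.180423],
              [0.212671, 0.715160, 0.072169],
              [0.019334, 0.119193, 0.950227]])

# White point: tristimulus values of RGB = (1, 1, 1), i.e. the row sums of M
WHITE = M.sum(axis=1)

def f(t):
    """Piecewise function of Equation (5)."""
    t = np.asarray(t, dtype=float)
    delta = 6.0 / 29.0
    # note: 1/(3*delta**2) == (1/3)*(29/6)**2
    return np.where(t > delta**3, np.cbrt(t), t / (3 * delta**2) + 4.0 / 29.0)

def rgb_to_lab(rgb):
    """rgb: linear RGB floats in [0, 1]; returns (l, alpha, beta) per Eqs. (1)-(4)."""
    xyz = M @ np.asarray(rgb, dtype=float)    # Equation (1)
    fx, fy, fz = f(xyz / WHITE)
    l = 116 * fy - 16                         # Equation (2)
    alpha = 500 * (fx - fy)                   # Equation (3)
    beta = 200 * (fy - fz)                    # Equation (4)
    return l, alpha, beta
```

As a sanity check, white input RGB = (1, 1, 1) maps to l = 100 with α = β = 0, and black maps to l = 0.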

2.3. Filter-Based Selection of Complex Texture Points

Traffic congestion often causes the texture of the image to become more complex. The extraction of points with significant features caused by an increase in image texture complexity is of great significance for the extraction of congested areas.
When there is traffic congestion in the surveillance video, the number of traffic participants is large, and there are generally clear boundaries between different traffic participants (between objects, such as car and car or car and pedestrian). In addition, there are also boundaries caused by shadows and occlusions within the same target. These factors increase the number of boundaries in the congested area, and the boundaries are clear. This brings a local increase in the complexity of image texture features, and points with significant texture features correspond to the target boundaries of the congested area. Extracting pixels with significant texture features, i.e., complex texture points, provides strong support for the selection of congested areas.
The main purpose of complex texture point screening is to extract pixels with significant texture features from the image. Considering that the research object is the traffic congestion area, the main reason for the increase in texture complexity caused by congestion is the increase in the number of target boundaries. Therefore, the extracted superpixel image is processed, the boundary of the superpixel is extracted, the boundary is analyzed and complex texture points are screened. The processing flow of complex texture point screening is shown in Figure 3.
The extraction of superpixel boundaries is achieved by the bit-wise inversion of pixels, as shown in Equation (7).
$$Mask_l(u, v) = \begin{cases} 0 & \text{if } (I_{uv} \text{ and } Mask_{uv}) = 1 \\ 255 & \text{otherwise} \end{cases} \quad (7)$$
In the equation, I_{uv} is the pixel in the original image at coordinate (u, v), Mask_{uv} is the corresponding pixel in the superpixel image and Mask_l is the superpixel boundary map.
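Since LSC implementations typically output a superpixel label image, an equivalent way to obtain the binary boundary map of Equation (7) is to mark every pixel whose label differs from that of a neighbor. The following NumPy sketch is an illustration of that idea, not the paper's exact bitwise-inversion implementation:

```python
import numpy as np

def superpixel_boundary_map(labels):
    """Binary superpixel boundary map: 255 on boundary pixels, 0 elsewhere.
    A pixel is marked as a boundary pixel when its superpixel label differs
    from the label of its right or lower neighbour."""
    labels = np.asarray(labels)
    right = np.zeros(labels.shape, dtype=bool)
    down = np.zeros(labels.shape, dtype=bool)
    right[:, :-1] = labels[:, :-1] != labels[:, 1:]   # horizontal label changes
    down[:-1, :] = labels[:-1, :] != labels[1:, :]    # vertical label changes
    return np.where(right | down, 255, 0).astype(np.uint8)
```

The resulting map contains only boundary information, which is exactly what the texture-density analysis of Section 2.3 consumes.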
In terms of feature extraction, most mainstream methods at the present stage adopt deep learning schemes [25,26,27,28,29], and models obtained through supervised training can extract the main features required by the task from the data. The method proposed in this paper is mainly used in the pre-processing stage. In order to reduce the computational cost as much as possible, the binary map obtained after pixel clustering is used as the input of ROI selection, and the pixel texture density is extracted from it. The extracted boundary map only contains the boundary information of the image, and complex texture points appear in it as pixels surrounded by numerous, complex boundaries. The more boundary pixels surround a pixel, the more likely that pixel is to be a complex texture point. Considering that the current pixel is only affected by its neighboring pixels and that the pixel value can only be 0 or 255, mean filtering can be performed on the superpixel boundary map, and the resulting pixel value reflects the likelihood that the current pixel is a complex texture point. By thresholding the mean-filtered image, pixels exceeding the threshold are determined to be complex pixels (pixels with high texture complexity), and those below the threshold are pixels with low texture complexity. The mean filtering operation on the binary boundary map is shown in Equation (8).
$$Mask_M(u, v) = \frac{1}{m^2} \sum_{(s, t) \in S_{u,v}} Mask_l(s, t), \quad (8)$$
where m is the filter template size, and S_{u,v} is the set of pixels contained in the neighborhood template centered at (u, v).
The boundary map after mean filtering is blurred in the boundary regions, and the pixel values of the filtered map contain information on pixel texture complexity. Pixels with and without complex textures can be screened by setting a threshold and traversing the entire boundary map, as shown in Equation (9).
$$Mask_C(u, v) = \begin{cases} 0 & \text{if } Mask_M(u, v) < Threshold \\ 1 & \text{otherwise} \end{cases} \quad (9)$$
where Threshold is the threshold for screening whether a pixel belongs to the complex texture area; it is determined by enumeration. The values are shown in Table 1.
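Equations (8) and (9) can be sketched in NumPy as follows. The box filter here is implemented with an integral image for efficiency; the replicate padding at the image border is an implementation choice not specified in the paper:

```python
import numpy as np

def mean_filter(boundary_map, m):
    """m x m mean filter (Eq. (8)) via an integral image; m must be odd.
    Border handling (replicate padding here) is an implementation choice."""
    pad = m // 2
    p = np.pad(boundary_map.astype(float), pad, mode='edge')
    ii = np.cumsum(np.cumsum(p, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))              # zero row/column so ii[a, b] = sum p[:a, :b]
    h, w = boundary_map.shape
    box = (ii[m:m + h, m:m + w] - ii[:h, m:m + w]
           - ii[m:m + h, :w] + ii[:h, :w])          # sum of each m x m window
    return box / (m * m)

def screen_complex_pixels(mask_m, threshold):
    """Threshold screening (Eq. (9)): 1 marks high-texture-complexity pixels."""
    return (mask_m >= threshold).astype(np.uint8)
```

For a binary boundary map with values in {0, 255}, each filtered pixel equals 255 times the fraction of boundary pixels in its m × m neighborhood, which is precisely the texture-density measure the threshold acts on.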

2.4. Maximin-Based Selection of Complex Texture Area

The purpose of congested area selection is to extract congested areas from the extracted pixels with high texture complexity (the complex texture points). The congested area is the ROI determined by these pixels, and its extraction is of great significance to the accuracy of target detection algorithms. In order to preserve real-time performance, this paper adopts a simple maximin strategy to select congested areas. Congested area selection finds the four boundaries of the ROI from the high-texture-complexity pixels: the left and right boundaries are the minimum and maximum horizontal (u) coordinates of these pixels, and the lower and upper boundaries are the minimum and maximum vertical (v) coordinates. The region enclosed by the extracted boundaries is the congested area, as shown in Equation (10).
$$ROI = \{ (u, v) \mid u_{min} < u < u_{max},\; v_{min} < v < v_{max} \}, \quad (10)$$
where u_min, u_max, v_min and v_max are determined by the upper-left and lower-right extremes of all pixels whose value is 1 in Mask_C.
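The maximin selection of Equation (10) reduces to taking the bounding box of all marked pixels. A minimal NumPy sketch, interpreting the extremes inclusively:

```python
import numpy as np

def select_roi(mask_c):
    """Maximin ROI selection (Eq. (10)): bounding box of all pixels marked 1
    in mask_c. Returns (u_min, v_min, u_max, v_max), where u is the column
    (horizontal) coordinate and v the row (vertical) coordinate, or None
    when no complex texture point was found."""
    vs, us = np.nonzero(mask_c)       # rows are v, columns are u
    if us.size == 0:
        return None
    return us.min(), vs.min(), us.max(), vs.max()
```

The returned box can then be handed to a downstream detector as its focus area, e.g. by cropping frame[v_min:v_max + 1, u_min:u_max + 1].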

3. Experiment Results and Discussion

3.1. Data Description

In order to verify the effectiveness of the proposed algorithm, the public surveillance video dataset UA-DETRAC [19] was expanded, and performance was tested on the expanded dataset. The UA-DETRAC dataset contains continuous video data collected from real scenes, covering typical working conditions such as weather changes, occlusions and continuous scene changes, and can provide a data source for evaluating algorithm performance. In addition, in order to verify the performance of the algorithm on rainy days and at night, image data of different scenes on rainy days and at night were collected for the performance tests in this paper.

3.2. Implementation Details

The algorithm in this paper was implemented on a PC with the following configuration: an Intel Core i7 processor, 16 GB of memory and a Windows 10 (64-bit) operating system. The development environment was PyCharm, the programming language was Python 3 and the data used were the expanded UA-DETRAC dataset. The parameters of the algorithm are shown in Table 1.

3.3. Experiment Results

Taking image00001.jpg of MVI20011, image00013.jpg of MVI40192, image00004.jpg of MVI63521 and image00112.jpg of MVI63563 in the training data in the UA-DETRAC dataset as examples, the calculation results of each step of the method proposed in this paper were visualized.
Specifically, as shown in Figure 4, the rows from top to bottom are the original image, the superpixel image, the superpixel boundary map, the mean-filtered superpixel boundary map, the high-texture-complexity pixels (red), the high-texture-complexity area (blue box) and the congested area (blue box). The pixel value intensity after mean filtering is shown in Figure 5, where greater height in the plot corresponds to greater pixel value intensity. Below Figure 5 is a contour plot of the pixel intensities.

3.4. Analysis and Discussion

The results for image00001.jpg of MVI20011, image00013.jpg of MVI40192, image00004.jpg of MVI63521 and image00112.jpg of MVI63563 in the training data of the UA-DETRAC dataset are shown in Figure 4. From the penultimate and last rows in Figure 4, the proposed method can extract complex texture points and complex texture areas from the image, and the complex texture areas are also the areas where congestion and other working conditions occur. When there is no congestion in the image, the area where the target is occluded can also be extracted (as shown in the results in column 4, where the target is occluded by vegetation). It can be seen from the 2nd and 3rd rows in Figure 4 that the LSC algorithm with a single iteration can effectively extract the superpixel areas of the image, and the superpixel boundaries in complex texture areas are denser than those in simple texture areas. This suggests that the density of the superpixel boundaries enables discrimination between low-texture-complexity and high-texture-complexity pixels. A large filter template is used for mean filtering, and the texture complexity of the current pixel is determined from the mean value of its neighborhood, which effectively suppresses isolated noise points and avoids their interference. A fixed threshold is used to binarize the filtered boundary map and extract high-texture-complexity pixels (shown as red pixels in row 5 of Figure 4). The maximum and minimum coordinates of the region where the high-texture-complexity pixels are located are found using the maximin strategy, and the ROI is set for the image (as shown by the blue boxes in the last two rows of Figure 4). It can be seen that the extracted ROI can represent the congested area (or the occluded area) in the image to a certain extent.
These areas interfere with the target detection algorithm, have a direct impact on its accuracy and need to be detected accurately. The algorithm in this paper can effectively extract these regions, which provides an optimization idea for improving the performance of target detection algorithms under congestion, occlusion and other conditions. Figure 5 is a scatter plot of pixel intensity after mean filtering for the fourth row of Figure 4. It further reveals the difference between high-texture-complexity and low-texture-complexity pixels in the image, and it shows that the extraction of high-texture-complexity pixels can be achieved by threshold segmentation.
The influence of three parameters (LSC iteration number, filter template size, complexity threshold) involved in the proposed algorithm on the algorithm’s performance was tested and analyzed. We took image00001.jpg of MVI20011 in the training data of the UA-DETRAC dataset as the sample data for analysis. When analyzing the influence of one parameter, the values of other parameters in Table 1 remain unchanged.
First, the number of LSC iterations was enumerated, and the real-time performance of the automatic ROI extraction algorithm (as shown in Figure 6) and the proportion of the extracted ROI range (as shown in Figure 7) were counted. Moreover, the effect of the algorithm with 1, 5, 10, 15 and 20 LSC iterations was visualized (as shown in Figure 8), and the corresponding pixel density distribution after mean filtering is shown in Figure 9. The number of LSC iterations ranges from 1 to 21. It can be seen from Figure 6 that, as the number of LSC iterations increases, the time consumption of the proposed algorithm increases approximately linearly. This is because the LSC termination condition is mainly determined by the number of iterations, and each iteration takes roughly the same time, so increasing the number of iterations increases the total time approximately linearly. In addition, it can be seen from Figure 7 that the proportion of the extracted ROI is lowest when the number of LSC iterations is 1; when the number of iterations is greater than 1, it lies between 97.5% and 99.5%. This indicates that, when the number of LSC iterations is greater than 1, the extracted ROI is approximately equal to the whole original image, so the ROI selection cannot provide an effective focus area for subsequent target detection and other algorithms. When the number of LSC iterations is set to 1, a better ROI screening effect is achieved. In addition, it can be seen from Figure 8 that the change in the number of LSC iterations has no significant impact on the ROI extraction effect of the proposed algorithm. The pixel density distribution used to filter high-density pixels in Figure 9 also shows that the proposed ROI extraction method is not sensitive to the number of LSC iterations.
Secondly, the template size of the mean filter was enumerated, and the real-time performance of the automatic ROI extraction algorithm (as shown in Figure 10) and the proportion of the extracted ROI range (as shown in Figure 11) were counted. Moreover, the effect of the algorithm with filter template sizes of 1 × 1, 11 × 11, 21 × 21 and 31 × 31 is visualized in Figure 12, and the corresponding pixel density distribution after mean filtering is shown in Figure 13. The filter template size must be an odd number, here ranging from 1 to 31. It can be seen from Figure 10 that the algorithm's time consumption decreases noticeably as the filter template size increases; when the template size exceeds 21, the time consumption remains stable. In addition, it can be seen from Figure 11 that the extracted ROI proportion shows the same trend, and the rate of decrease is relatively low once the template size exceeds 21. The filter template size determines the size of the input pixel block of the mean filtering module; after filtering, the mean value of the boundary pixels in the window is obtained. If a small template is used, each filtered pixel is only affected by the current pixel. As shown in Figure 12a, superpixel segmentation extracts a large number of detailed superpixels and superpixel boundaries, and after filtering with a small template, all superpixel boundaries are considered to be pixels of the complex region. This phenomenon is largely due to the influence of light spots and shadows on pixel features in small local areas, which causes over-segmentation in the superpixel segmentation algorithm and eventually leads to a large number of pixels being marked as complex texture points, which differs considerably from the real situation.
In addition, when the filter template is larger, as shown in Figure 12b–d, the filtering module averages the boundary pixels over a larger area, which effectively suppresses the influence of over-segmentation on the proposed algorithm. Moreover, the number of complex texture pixels is reduced, which lowers the number of pixels that subsequently need to be processed; thus, the real-time performance of the algorithm improves, and its time consumption decreases. The pixel density distribution after mean filtering, shown in Figure 13, also shows that, when the filter template is small, it is difficult to screen the complex area according to the density distribution, and when the template size increases, the density distributions of the complex and simple pixel areas become clearly separable.
Finally, the complexity threshold was enumerated, and the real-time performance of the automatic ROI extraction algorithm (as shown in Figure 14) and the proportion of the extracted ROI range (as shown in Figure 15) were statistically analyzed. The effect of the algorithm with complexity thresholds of 1, 31, 61, 91 and 121 is visualized in Figure 16, and the corresponding pixel density distribution after mean filtering is shown in Figure 17. The complexity threshold must be an integer, here ranging from 1 to 121. As can be seen from Figure 14, the time consumption of the algorithm decreases as the complexity threshold increases, because fewer complex pixels are retained, which reduces the extraction time and thus the time consumption of the entire algorithm. It can be seen from Figure 15 that, when the complexity threshold is low, it is difficult to extract an effective ROI; when the threshold exceeds 85, the ROI can be extracted effectively. Because a threshold comparison strategy is adopted to screen complex pixels, when the threshold is set too small, all points are marked as complex pixels, as shown in Figure 16a,b. When the threshold is set reasonably, the complex texture pixels can be effectively screened, and effective ROI areas are extracted. The corresponding mean-filtered pixel density distribution is shown in Figure 17, where the peaks represent the locations of complex texture pixels. Setting the screening threshold reasonably can accurately mark the locations of complex texture pixels, realize effective screening of complex pixels and then extract a reasonable ROI.
In order to verify the effectiveness of the proposed method under two common weather conditions, i.e., rainy days and at night, the collected images were processed according to the parameters in Table 1. Part of the results are shown in Figure 18, Figure 19 and Figure 20:
As can be seen from Figure 18, the method proposed in this paper can mark areas with complex textures in the scene both at night and on rainy days. As can be seen from Figure 19a–e, light spots at night and the background brightness affect the algorithm; light spots in particular lead to an increase in texture complexity. However, light spots are mainly generated by vehicles, street lights and other traffic elements, so areas with increased texture complexity are still areas that need to be focused on, and the impact on the final ROI extraction effect is small. In the rainy scene, although the complex texture region is extracted, the ROI still occupies a large proportion of the original picture. From the analysis of the process in Figure 19f–j, the main reason for this phenomenon is that the region below the surveillance camera is white, which differs considerably from the ground in the rainy scene. This difference increases the boundary and texture complexity, which ultimately leads to more areas being labeled as ROI areas. In summary, the method proposed in this paper can annotate complex texture areas and mark them as ROI areas in good weather, at night and on rainy days. However, the level of detail of the annotation, the settings of the parameters that affect the annotation effect and the labeling method of the ROI need further study.

4. Conclusions

In order to make target detection algorithms pay more attention to key areas, such as congested regions, under various weather conditions, this paper proposes a method that automatically marks congested areas in surveillance videos. A performance test was carried out on an extended UA-DETRAC dataset, and the time consumed by the algorithm and the proportion of the image occupied by the extracted congested areas were measured. The results show that:
(1)
The texture complexity of a congested area is relatively high, so congested areas in the image can be screened by texture density and marked as ROI areas.
(2)
When using the proposed method for automatic ROI selection, a large mean-filtering template should be set, the number of superpixel-segmentation iterations should be minimized and a reasonable texture complexity threshold should be chosen.
(3)
The proposed method can extract the ROI at night and on rainy days. However, when high-intensity light spots appear in the scene, the extracted ROI is not detailed enough.
In future work, we will study the application of the ROI selection algorithm in target detection, its real-time performance and the strategy for setting the threshold value.

Author Contributions

Conceptualization, Y.H. and L.J.; methodology, Y.H. and L.J.; software, Y.H.; validation, Y.H., H.W. and Z.H.; formal analysis, Z.H.; investigation, Y.H.; data curation, Y.H.; writing—original draft preparation, Y.H.; writing—review and editing, Y.H. and X.S.; visualization, H.W. and Z.H.; supervision, L.J.; project administration, G.W.; funding acquisition, L.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2021YFB3202200, and the S&T Program of Hebei, grant numbers 21340801D and 20310801D.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to policy and privacy.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kaffash, S.; Nguyen, A.T.; Zhu, J. Big data algorithms and applications in intelligent transportation system: A review and bibliometric analysis. Int. J. Prod. Econ. 2021, 231, 107868.
2. Chen, C.; Liu, B.; Wan, S.; Qiao, P.; Pei, Q. An edge traffic flow detection scheme based on deep learning in an intelligent transportation system. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1840–1852.
3. Wan, S.; Xu, X.; Wang, T.; Gu, Z. An intelligent video analysis method for abnormal event detection in intelligent transportation systems. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4487–4495.
4. Sam, D.B.; Peri, S.V.; Sundararaman, M.N.; Kamath, A.; Babu, R.V. Locate, size, and count: Accurately resolving people in dense crowds via detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2739–2751.
5. Wang, K.; Liu, M.; Ye, Z. An advanced YOLOv3 method for small-scale road object detection. Appl. Soft Comput. 2021, 112, 107846.
6. St-Charles, P.L.; Bilodeau, G.A.; Bergevin, R. SuBSENSE: A universal change detection method with local adaptive sensitivity. IEEE Trans. Image Process. 2014, 24, 359–373.
7. Zhou, W.; Wu, J.; Lei, J.; Hwang, J.-N.; Yu, L. Salient object detection in stereoscopic 3D images using a deep convolutional residual autoencoder. IEEE Trans. Multimed. 2020, 23, 3388–3399.
8. Zhang, H.; Qu, S.; Li, H.; Lua, J.; Xu, W. A moving shadow elimination method based on fusion of multi-feature. IEEE Access 2020, 8, 63971–63982.
9. Han, S.; Yang, F.; Yang, G.; Gao, B.; Zhang, N.; Wang, D. Electrical equipment identification in infrared images based on ROI-selected CNN method. Electr. Power Syst. Res. 2020, 188, 106534.
10. Suheryadi, A.; Sumarudin, A.; Puspaningrum, A.; Prastyo, E. Traffic sign detection and recognition by improving Region of Interest (ROI) division to support driver assistance system. IOP Conf. Ser. Mater. Sci. Eng. 2020, 850, 012042.
11. Greenberg, S.; Rotman, S.R.; Guterman, H.; Zilberman, S.; Gens, A. Region-of-interest-based algorithm for automatic target detection in infrared images. Opt. Eng. 2005, 44, 077002.
12. Liu, L.; Wang, R.; Xie, C.; Li, R.; Wang, F.; Zhou, M.; Teng, Y. Learning region-guided scale-aware feature selection for object detection. Neural Comput. Appl. 2021, 33, 6389–6403.
13. Liang, Z.; Chi, Z.; Fu, H.; Feng, D. Salient object detection using content-sensitive hypergraph representation and partitioning. Pattern Recognit. 2012, 45, 3886–3901.
14. Pandya, S.; Lu, T.; Chao, T.H. Optimizing feature selection strategy for adaptive object identification in noisy environment. In Proceedings of the Intelligent Robots and Computer Vision XXX: Algorithms and Techniques, International Society for Optics and Photonics, Washington, VA, USA, 4 February 2013; Volume 8662, p. 866209.
15. Gudigar, A.; Chokkadi, S.; Raghavendra, U.; Acharya, U.R. Multiple thresholding and subspace based approach for detection and recognition of traffic sign. Multimed. Tools Appl. 2017, 76, 6973–6991.
16. Huang, Y.; Wu, C.; Yang, H.; Zhu, H.; Chen, M.; Yang, J. An Improved Deep Learning Approach for Retrieving Outfalls into Rivers from UAS Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4703814.
17. Cui, Z.; Lu, N. Feature selection accelerated convolutional neural networks for visual tracking. Appl. Intell. 2021, 51, 8230–8244.
18. Li, Z.; Chen, J. Superpixel segmentation using linear spectral clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 6–8 June 2015; pp. 1356–1363.
19. Wen, L.; Du, D.; Cai, Z.; Lei, Z.; Chang, M.C.; Qi, H.; Lim, J.; Yang, M.H.; Lyu, S. UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. Comput. Vis. Image Underst. 2020, 193, 102907.
20. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282.
21. McLaren, K. Publications Sponsored by the Colour Measurement Committee—V the Adams—Nickerson Colour-difference Formula. J. Soc. Dyers Colour. 1970, 86, 354–356.
22. Newhall, S.M.; Nickerson, D.; Judd, D.B. Final report of the OSA subcommittee on the spacing of the Munsell colors. J. Opt. Soc. Am. 1943, 33, 385–418.
23. Adams, E.Q. XZ Planes in the 1931 ICI System of Colorimetry. J. Opt. Soc. Am. 1942, 32, 168–173.
24. Trezona, P.W. Derivation of the 1964 CIE 10° XYZ colour-matching functions and their applicability in photometry. Color Res. Appl. 2001, 26, 67–75.
25. Gu, K.; Zhang, Y.; Qiao, J. Ensemble meta-learning for few-shot soot density recognition. IEEE Trans. Ind. Inform. 2020, 17, 2261–2270.
26. Gu, K.; Liu, H.; Xia, Z.; Qiao, J.; Lin, W.; Thalmann, D. PM2.5 Monitoring: Use Information Abundance Measurement and Wide and Deep Learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4278–4290.
27. Gu, K.; Xia, Z.; Qiao, J.; Lin, W. Deep dual-channel neural network for image-based smoke detection. IEEE Trans. Multimed. 2019, 22, 311–323.
28. Gu, K.; Xia, Z.; Qiao, J. Stacked Selective Ensemble for PM2.5 Forecast. IEEE Trans. Instrum. Meas. 2019, 69, 660–671.
29. Gu, K.; Zhang, Y.; Qiao, J. Vision-based monitoring of flare soot. IEEE Trans. Instrum. Meas. 2020, 69, 7136–7145.
Figure 1. Overall framework of the algorithm.
Figure 2. Flowchart of LSC algorithm.
Figure 3. Flowchart of complex texture point screening.
Figure 4. Visualization of algorithm processing. The blue boxes are the extracted ROI area.
Figure 5. Pixel intensity after mean filtering. (a) is the result of image00001.jpg of MVI20011. (b) is the result of image00013.jpg of MVI40192. (c) is the result of image00004.jpg of MVI63521. (d) is the result of image00112.jpg of MVI63563.
Figure 6. The time consumption curve of the proposed method. The abscissa represents the number of LSC iterations.
Figure 7. Pixel intensity after mean filtering and the proportion curve of the ROI area of the proposed method. The abscissa represents the number of LSC iterations.
Figure 8. Effect diagram of different number of LSC iterations on the results of the proposed method, where the number of iterations in (a) is 1, in (b) is 5, in (c) is 10, in (d) is 15 and in (e) is 20. The blue boxes are the extracted ROI area.
Figure 9. Pixel density after mean filtering of the proposed method corresponding to different LSC iteration times, where the number of iterations in (a) is 1, in (b) is 5, in (c) is 10, in (d) is 15 and in (e) is 20.
Figure 10. The time consumption curve of the proposed method. The abscissa represents the template size of mean filtering.
Figure 11. The proportion curve of the ROI area of the proposed method. The abscissa represents the template size of mean filtering.
Figure 12. Effect of different mean filtering template sizes on the results of the proposed method. The size of the filter template in (a) is 1 × 1, the size of the filter template in (b) is 11 × 11, the size of the filter template in (c) is 21 × 21 and the size of the filter template in (d) is 31 × 31. The blue boxes are the extracted ROI area.
Figure 13. Pixel density after mean filtering of the proposed method corresponding to different mean filtering template sizes. The size of the filter template in (a) is 1 × 1, the size of the filter template in (b) is 11 × 11, the size of the filter template in (c) is 21 × 21 and the size of the filter template in (d) is 31 × 31.
Figure 14. The time consumption curve of the proposed method. The abscissa represents the texture complexity threshold.
Figure 15. The proportion curve of the ROI area of the proposed method. The abscissa represents the texture complexity threshold.
Figure 16. Effect of different texture complexity thresholds on the results of the proposed method, where the texture complexity threshold in (a) is 1, in (b) is 31, in (c) is 61, in (d) is 91 and in (e) is 121. The blue boxes are the extracted ROI area.
Figure 17. Pixel density after mean filtering of the proposed method corresponding to different complexity thresholds, where the texture complexity threshold in (a) is 1, in (b) is 31, in (c) is 61, in (d) is 91 and in (e) is 121.
Figure 18. The ROI area extracted by the proposed method, where (a) is the original picture of night, (b) is the original picture of a rainy day, (c) is the annotation effect of night and (d) is the annotation effect of a rainy day. The blue boxes are the extracted ROI area.
Figure 19. ROI extraction process at night and on rainy days, where (a,f) are the superpixel image, (b,g) are the boundary maps of superpixel pictures, (c,h) are the mean filtered boundary map results, (d,i) are the filter results by the threshold, where the red area is the area with complex texture. (e,j) show the ROI area extracted by the complex texture.
Figure 20. Pixel density of image at night and on rainy days, where (a) shows the pixel density of image at night, (b) shows the pixel density of image on rainy days.
Table 1. Specific parameters of the algorithm.

Parameters                      Values
number of LSC iterations        1
mean filter template size       21 × 21
texture complexity threshold    90
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

He, Y.; Jin, L.; Wang, H.; Huo, Z.; Wang, G.; Sun, X. Automatic ROI Setting Method Based on LSC for a Traffic Congestion Area. Sustainability 2022, 14, 16126. https://doi.org/10.3390/su142316126