Automatic and Fast Recognition of On-Road High-Emitting Vehicles Using an Optical Remote Sensing System

Xie, Hao; Zhang, Yujun; He, Ying; You, Kun; Fan, Boqiang; Yu, Dongqi; Li, Mengqi

doi:10.3390/s19163540

by Hao Xie ^1,2, Yujun Zhang ^1,*, Ying He ^1, Kun You ^1, Boqiang Fan ^1,2, Dongqi Yu ^1,2 and Mengqi Li ^1,2

^1 Key Laboratory of Environmental Optics & Technology, Anhui Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Hefei 230031, China

^2 University of Science and Technology of China, Hefei 230026, China

^* Author to whom correspondence should be addressed.

Sensors 2019, 19(16), 3540; https://doi.org/10.3390/s19163540

Submission received: 25 June 2019 / Revised: 10 August 2019 / Accepted: 11 August 2019 / Published: 13 August 2019

(This article belongs to the Special Issue Computational Intelligence in Remote Sensing)


Abstract:

Optical remote sensing systems (RSSs) for monitoring vehicle emissions can be installed on any road and provide non-contact on-road measurements, which allow law enforcement departments to monitor the emissions of a large number of on-road vehicles. Although many studies in different research fields have been performed using RSSs, there has been little research on the automatic recognition of on-road high-emitting vehicles. In general, high-emitting vehicles and low-emitting vehicles are classified by fixed emission concentration cut-points that lack a strict scientific basis, and the actual cut-points are sensitive to environmental factors such as wind speed and direction, outdoor temperature, relative humidity, and atmospheric pressure. In addition, single instantaneous monitoring results from RSSs are easily affected by systematic and random errors, leading to unreliable results. This paper proposes a method to solve the above problems. The automatic and fast-recognition method for on-road high-emitting vehicles (AFR-OHV) is the first application of machine learning combined with big data analysis to the remote sensing monitoring of on-road high-emitting vehicles. The method constructs and adaptively updates a clustering database using real-time collections of emission datasets from an RSS. Then, new vehicles that pass through the RSS are recognized rapidly by the nearest-neighbor classifier, which is guided by the real-time updated clustering database. Experimental results based on real data, including the Davies-Bouldin Index (DBI) and Dunn Validity Index (DVI), show that AFR-OHV provides faster convergence and better clustering performance. Furthermore, it is not easily disturbed by outliers. Our classifier obtains high scores for Precision (PRE), Recall (REC), the Receiver Operating Characteristic (ROC), and the Area Under the Curve (AUC). The rates of different classifications of excessive emissions and the self-adaptive cut-points are calculated automatically in order to provide references for law enforcement departments to establish evaluation criteria for on-road high-emitting vehicles detected by the RSS.

Keywords:

optical remote sensing system; emission data analysis; self-adaptive clustering database; automatic high-emitting recognition

1. Introduction

Vehicle emissions are a major factor in urban air pollution, and car ownership increases every year [1]. Thus, it is essential to use the available measures to monitor and control vehicle emissions. Generally, these measures consist of chassis and engine dynamometer tests, road-tunnel measurements, portable emission measurement systems (PEMS), plume chasing measurements, and optical remote sensing systems (RSSs). Chassis and engine dynamometer testing cannot reflect the real emission levels in on-road driving conditions [2], and road-tunnel methods are subject to geographical and environmental conditions [3]. PEMS and plume chasing measurements can precisely determine vehicle emissions, but PEMS take considerable time to install, uninstall, and transfer between vehicles, and plume chasing measurements are constrained by the speed limits and minimum following distances required for safety; these approaches are therefore not suitable for monitoring a large number of vehicles. Further, their high price must be taken into consideration [4,5]. RSSs adopt non-dispersive infrared technology to detect CO, CO₂, and HC, and they use middle-infrared laser spectrum technology to detect NO; thus, RSSs can be used to perform non-contact on-road measurements [6]. An RSS can be installed on any road, rendering it a feasible, real-time measurement system for law enforcement departments to detect on-road high-emitting vehicles where the other methods are not viable.

Many researchers have conducted studies with RSSs. Stedman and Bishop, who invented the RSS and developed it over a series of studies, were its pioneers [7]. Kang et al. proposed a two-step location strategy, using both depth-first searching and a greedy strategy, to find the minimum set of roads with traffic emission monitors, based on a digraph modeled from the traffic network [8]. Huang et al. studied the mechanism and applications of RSSs and presented a case study from Hong Kong. Their studies showed that the accuracy and number of vehicles affected by remote sensing screening programs were highly dependent on the cut-points, and that using fixed conservative cut-points in absolute concentrations (% or ppm) may be inappropriate [9]. Bernard et al. carried out extensive research on RSSs in Europe, and they used a laboratory limit to distinguish high-emitting vehicles [10,11]. Zhang et al. used a long short-term memory (LSTM) network to forecast vehicle emissions using multi-day observations by an RSS [12]. Even though many studies have been performed in different research fields using RSSs [13], little research has been carried out on automatically detecting on-road high-emitting vehicles using this technology.

Usually, high-emitting vehicles and low-emitting vehicles are classified by fixed cut-off concentrations of CO, HC, and NO. However, the set values for these cut-points lack a scientific basis [14]. RSS measurements are highly sensitive to multiple environmental factors, such as geographical conditions, meteorological conditions, air quality, wind, humidity, and temperature, so the cut-off points between high-emitting and low-emitting vehicles vary among different sites, times, and RSS equipment. To solve this problem, we propose a novel adaptive method in this paper to establish cut-points and recognize high-emitting vehicles quickly and automatically. The method combines data analysis with clustering and classification methods from machine learning, and it attempts to apply these methods to the remote sensing monitoring of on-road high-emitting vehicles.

Firstly, 192,097 vehicle emission datasets, comprising CO, HC, and NO concentrations, were collected by RSSs over 8 days. Secondly, we used three-dimensional and histogram statistics to analyze the emission relationships. Thirdly, an adaptive clustering algorithm was developed to rapidly label the most recent 10,000 emission datasets and divide them into different high-emitting or low-emitting zones. Finally, new vehicles passing through the RSS were automatically and quickly classified into the corresponding zone, using the cluster database and a nearest-neighbor classifier.

The core idea of our proposed algorithm is adaptive clustering. In general, there are five types of clustering methods in unsupervised learning: hierarchical-based clustering, density-based clustering, grid-based clustering, model-based clustering, and partition-based clustering. Hierarchical-based clustering generally includes Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) [15], Clustering Using REpresentatives (CURE) [16], RObust Clustering using linKs (ROCK) [17], and Chameleon [18]. The datasets are aggregated (bottom-up) or divided (top-down) into a series of nested subsets to form a tree structure. The hierarchical method has two major drawbacks: the first is its high time complexity, and the second is that, once a mistake is made in one step, all subsequent steps will fail because of the inner greedy algorithm. Density-based clustering, which includes Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [19], Ordering Points To Identify the Clustering Structure (OPTICS) [20], Distribution-Based Clustering of Large Spatial Databases (DBCLASD) [21], and DENsity-based CLUstEring (DENCLUE) [22], can divide datasets into clusters of arbitrary shape by their regions of density, connectivity, and boundary, but it is extremely sensitive to its two initial parameters. Grid-based clustering divides the data space into grids and computes the density of each grid in order to identify high-density grids, and then adjacent high-density grids are merged into a cluster. WaveCluster [23] and STatistical INformation Grid (STING) [24] are typical examples of this clustering method. Model-based clustering optimizes the fit between the given data and an assumed model, which is based on statistics or a neural network. The Gaussian Mixture Model (GMM) [25] and Self-Organizing Maps (SOM) [26] are representative of these two types of models. Partition-based clustering iteratively relocates data points with a heuristic algorithm until optimization is achieved. There are many partitioning algorithms, such as K-Means, K-Means++ [27], kernel K-Means [28], K-Medoids [29], K-Modes [30], and Fuzzy C-means (FCM) [31]. K-Means++ and K-Medoids are used to reduce the sensitivity to the initial K values and to outliers. K-Modes and kernel K-Means can handle categorical or non-convex data, which traditional K-Means cannot. FCM is a soft-threshold clustering method, in contrast to the hard threshold of K-Means.

The RSS in this study is fast and operates in real time, and it produces a large number of measured concentrations. Given the above advantages and disadvantages of the methods, our proposed approach applies a partition-based method. The most typical partition-based method, K-means, is efficient for large datasets and has low time and space demands. However, K-means is sensitive to outliers and to the selection of the initial K values. The adaptive method proposed in this paper, called AFR-OHV, addresses these two problems.

The remainder of this paper is organized as follows. In Section 2, the emission datasets of the RSS are analyzed. Our proposed method is introduced in detail in Section 3. The experimental results and discussion are provided in Section 4. The paper is concluded in the last section.

2. Preliminaries

2.1. Emission Data Collection

The emission data were collected by an optical remote sensing system (RSS) for 8 days from December 2018 to January 2019 at three sites: Xueyuan Road, Shijiazhuang City, Hebei Province, China; Yangqiao Road, Hefei City, Anhui Province, China; and Xincun Road, Zibo City, Shandong Province, China. The RSS, shown in Figure 1, consists of vertical remote sensing hosts, a velocity-measuring part, a vehicle license plate recognition part, an environmental monitoring part, an industrial personal computer (IPC), an LED display, and retroreflective sheeting. The advantage of a vertical RSS, compared with a road-side RSS, is that the monitoring of vehicles in a single lane is not disturbed by other vehicles simultaneously passing through other lanes, which can block the measurement light path when using road-side RSSs. In the vertical remote sensing host, non-dispersive infrared technology is used to detect the concentrations of CO, CO₂, and HC, and middle-infrared laser spectrum technology is used to detect the concentration of NO. When a vehicle passes through the vertical remote sensing host, the concentration of each emission gas in the exhaust plume is measured by the attenuation of light intensity, as defined by the Beer-Lambert Law [32],

$$I_{(λ)} = I_{0(λ)} \exp(-δ c L) \tag{1}$$

where $I_{0(λ)}$ and $I_{(λ)}$ are the initial and received light intensities, $δ$ is the molecular absorption coefficient, $c$ is the concentration of a particular gas, $L$ is the absorption beam path, and $λ$ is the wavelength.

In the velocity-measuring part, radar and laser detection technologies measure the vehicle speed and acceleration, respectively. A camera, a video capture card, and automatic license plate recognition software are integrated into the vehicle license plate recognition part. The temperature, relative humidity, wind speed, wind direction, atmospheric pressure, and gradient are obtained by the environmental monitoring part. The data collected by all sensors are uploaded to the industrial personal computer (IPC) for processing, so that on-road high-emitting vehicles can be recognized automatically. In addition, the license plate number, vehicle speed, and emission detection results are shown on the LED display in real time.

2.2. Collected Data Analysis

The 192,097 emission datasets, which were collected by the IPC in the RSS, include the percentage concentrations of CO, CO₂, HC, and NO, as well as the vehicle speed, acceleration, and the gradient. Since the detection of on-road high-emitting vehicles is related to the percentage concentrations of CO, HC, and NO and to the vehicle specific power (VSP), three-dimensional and histogram statistics were adopted to analyze the relationships between these four parameters, as shown in Figure 2 and Figure 3. VSP is calculated by the IPC using the following equation [33],

$$VSP = v \times [a \times (1.1 + 9.8 \times θ) + 0.132] + 0.000302 \times v^{3} \tag{2}$$

where $v$ is the vehicle speed, $a$ is the vehicle acceleration, and $θ$ is the gradient.
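As a quick numerical illustration, the sketch below evaluates Equation (2) exactly as it is printed above. The function name and the units (speed in m/s, acceleration in m/s², gradient as a dimensionless slope) are assumptions made for this example, since the text does not state them.

```python
def vehicle_specific_power(v, a, theta):
    """Evaluate Equation (2) as printed in the text.

    Assumed units (not stated in the text): v in m/s, a in m/s^2,
    theta as a dimensionless road gradient. The result is then in kW/t.
    """
    return v * (a * (1.1 + 9.8 * theta) + 0.132) + 0.000302 * v ** 3


if __name__ == "__main__":
    # A vehicle at 10 m/s accelerating at 1 m/s^2 on a flat road.
    print(round(vehicle_specific_power(10.0, 1.0, 0.0), 3))  # 12.622
```

On a flat road the gradient term vanishes, so the value is driven by the acceleration term and the cubic speed term only.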

Analysis of the data distribution in Figure 2 and Figure 3 reveals several emission relationships:

Few points fall into the zone in which the concentrations of all three emission gases are very high, as shown in Figure 2.
According to the U.S. Environmental Protection Agency (EPA), remote sensing data are valid for VSP values in the range of 0–20 kW/t [34]; outside this range, the concentrations of CO and HC are likely to have abnormally high values. Figure 3a shows that the VSP values in our remote sensing datasets are mostly within the valid range; the data outside this range were deemed invalid and eliminated.
The probability density function that fits the emission datasets is represented by the solid red line in Figure 3b–d. This fit indicates that the NO and HC emission data do not follow a normal distribution, while the CO emission data approximately fit an exponential distribution.
Most of the emission datasets are located in a concentration zone bounded by the two red dashed lines in Figure 3b–d. Beyond both ends of this concentration zone, the number of vehicles drops sharply.

The purpose of our data analysis is to identify the relationships in the emission data collected by the RSS, so that we can improve the method, and quickly and adaptively recognize high-emitting vehicles.

2.3. Data Quality Consideration

To ensure that real-time detection of high-emitting vehicles is performed correctly, data quality is assessed with reference to the EPA [34], the Hong Kong Transient Emission Test (HKTET) [35], and local standards in Anhui Province, China. The criteria are as follows:

Monitoring interval: The interval between successive vehicles passing the RSS must be at least 1 s; the monitoring results of two vehicles that pass the RSS less than 1 s apart are regarded as invalid.
Environmental conditions: The wind speed at the monitoring site shall not exceed 5 m/s; the ambient temperature at the monitoring site shall be in the range of 0–45 °C; and the relative humidity at the monitoring site shall be less than 80%.
Vehicle condition: The VSP, speed, and acceleration of the monitored vehicle must be in the ranges of 0–20 kW/t, 0–90 km/h, and −5 to 3 km/h/s, respectively.
CO₂ concentration: The CO₂ concentration of the monitored vehicle should be within 12–16%.

If any of the above conditions are not met, the corresponding monitoring data in our RSS is considered invalid.
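The four criteria above can be sketched as a single validity check. This is only an illustration: the `Record` type and its field names are hypothetical, and the speed and acceleration units (km/h and km/h/s) are assumed here rather than taken from the RSS software.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One RSS measurement; all field names are hypothetical."""
    interval_s: float    # time since the previous vehicle, in s
    wind_speed: float    # m/s
    temperature: float   # degrees Celsius
    humidity: float      # relative humidity, in %
    vsp: float           # kW/t
    speed: float         # km/h (assumed unit)
    acceleration: float  # km/h/s (assumed unit)
    co2: float           # CO2 concentration, in %


def is_valid(r):
    """Return True only if every data quality criterion is met."""
    return (
        r.interval_s >= 1.0
        and r.wind_speed <= 5.0
        and 0.0 <= r.temperature <= 45.0
        and r.humidity < 80.0
        and 0.0 <= r.vsp <= 20.0
        and 0.0 <= r.speed <= 90.0
        and -5.0 <= r.acceleration <= 3.0
        and 12.0 <= r.co2 <= 16.0
    )
```

A record that fails any one condition is rejected as a whole, which mirrors the rule that the corresponding monitoring data are then considered invalid.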

3. Methods

This paper proposes an automatic and fast recognition method that detects on-road high-emitting vehicles by using the above emission relationships. The proposed method is described in Figure 4. The training dataset $X \subset R^{n \times d}$ is loaded from the sampling dataset $D \subset R^{m \times d}$ and updated for every n new data. The automatic boundary detection (ABD) and initial K-center determination (IKD) methods are then used to determine the initial positions of the K-points. After that, the training dataset is normalized so that the different emission gases carry the same weight, and it is clustered by K-medoids. Then, the different clusters are labeled and defined. In addition, the dataset labeled “1” is extracted to update the cut-points between the high-emitting and low-emitting zones of the different emission gases. The above processes construct the cluster database in our method, and the outputs, $X_{train}$ and $L_{train}$, are inputs to the nearest-neighbor classifier, which completes the automatic and fast recognition of the testing dataset. The specific sub-algorithms are described in the following subsections.

3.1. Automatic Boundary Detection

Firstly, automatic boundary detection (ABD) is proposed in this paper in order to improve the adaptability of the high-emitting recognition algorithm. ABD is detailed in Algorithm 1. It loads the most recent $n$ datasets into the database of the IPC. The choice of the $n$ value and the tests to optimize the clustering speed are discussed in the experimental section.

Because the concentrations of NO, HC, and CO emissions are the focus of this paper, the characteristic dimension of the datasets is 3. Our method is also suitable for datasets with higher feature dimensions owing to the advantages of partition-based clustering. In Algorithm 1, $\mathrm{ceil}(x)$, $\max(X)$, and $\mathrm{histogram}(X, δ)$ call library functions that round up the value $x$, take the maximum of the array $X$, and calculate the histogram of the array $X$ divided into $δ$ equal intervals, respectively.

Algorithm 1. ABD Algorithm

Input:
$D = \{x_{1}, x_{2}, \dots, x_{m}\}$: m 3-dimensional emission datasets;
$x_{i1}, x_{i2}, x_{i3}$: the concentrations of NO, HC, CO;
$n$: the number of datasets that can be loaded in the main memory

Output:
$\{δ_{1}, δ_{2}, δ_{3}\}$: the max concentration of NO, HC, CO;
$\{X_{max1}, X_{max2}, X_{max3}\}$: the upper boundary values of NO, HC, CO;
$\{X_{min1}, X_{min2}, X_{min3}\}$: the lower boundary values of NO, HC, CO;

1: load $X = \{x_{m-n+1}, x_{m-n+2}, \dots, x_{m}\}$ from $D = \{x_{1}, x_{2}, \dots, x_{m}\}$
2: for $j = 1$ to 3 do
3: for $i = 1$ to $n$ do
4: $δ_{j} = \mathrm{ceil}(\max(X_{ij}))$
5: $Y_{j} = \mathrm{histogram}(X_{ij}, δ_{j})$
6: end for
7: for $i = 1$ to $δ_{j} - 1$ do
8: $Z_{ij} = Y_{ij} - Y_{(i+1)j}$
9: end for
10: if $j \leq 2$ then
11: $X_{max(j)} = \underset{1 \leq i \leq δ_{j} - 1}{\arg\max}\, Z_{ij}(X_{ij})$
12: $X_{min(j)} = \underset{1 \leq i \leq δ_{j} - 1}{\arg\min}\, Z_{ij}(X_{ij})$
13: else
14: $X_{max(j)} = \underset{100 \leq i \leq δ_{j} - 1}{\arg\max}\, Z_{ij}(X_{ij})$
15: $X_{min(j)} = 0$
16: end if
17: end for

Figure 3b–d show an example of the results computed by Algorithm 1, with n representing the maximum number of samples. The automatic detection boundaries are indicated by the red dotted lines.
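The boundary search of Algorithm 1 can be illustrated for a single gas with the following sketch. It is not the authors' implementation: it assumes non-negative concentrations with a maximum above 1, unit-width histogram bins covering the range from 0 to $δ_{j}$, and hypothetical parameter names (`first_bin`, `find_lower`) that stand in for the special handling of CO in steps 13 to 15.

```python
import math


def detect_boundaries(values, first_bin=1, find_lower=True):
    """Return (delta, upper, lower) for one emission gas.

    delta: rounded-up maximum concentration
    upper: 1-based bin index i with the largest drop Y_i - Y_(i+1)
    lower: 1-based bin index i with the largest rise, or 0 if skipped
    """
    delta = math.ceil(max(values))
    # Histogram with delta unit-width bins; bin i covers [i - 1, i).
    counts = [0] * delta
    for v in values:
        counts[min(int(v), delta - 1)] += 1
    # drops[i - 1] holds Z_i = Y_i - Y_(i+1) for the 1-based index i.
    drops = [counts[i] - counts[i + 1] for i in range(delta - 1)]
    candidates = range(first_bin, delta)
    upper = max(candidates, key=lambda i: drops[i - 1])
    lower = min(candidates, key=lambda i: drops[i - 1]) if find_lower else 0
    return delta, upper, lower
```

With unit-width bins, the returned indices are also the concentration values at the right edge of the selected bins, which is how the sketch interprets the boundary values $X_{max(j)}$ and $X_{min(j)}$.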

3.2. Initial K-Center Determination

After the maximum and boundary concentrations of each emission gas are established, the proposed method applies the initial K-center determination (IKD) algorithm, which is detailed in Algorithm 2.

The IKD algorithm first calculates the center values of the high- and low-emission zones of each gas, and then it forms matrix A, which contains all the center values. At the end of IKD, the function $\mathrm{bitget}(i, 1:3)$ is adopted to return the three lowest binary digits of $i$, ordered from low to high, to automatically generate the initial $k$ center points. Since the ABD and IKD methods form one continuous calculation process, we combined them into a single process termed automatic detection of initial k-center (ADIK).

Algorithm 2. IKD Algorithm

Input:
$δ_{1}, δ_{2}, δ_{3}$: the max concentration of NO, HC, CO;
$X_{max1}, X_{max2}, X_{max3}$: the upper-boundary values of NO, HC, CO;
$X_{min1}, X_{min2}, X_{min3}$: the lower-boundary values of NO, HC, CO.

Output:
$K = \{u_{1}, u_{2}, \dots, u_{k}\}$: $k$ 3-dimensional initial points

1: for $j = 1$ to 3 do
2: $X_{low(j)} = \frac{X_{max(j)} + X_{min(j)}}{2}$
3: $X_{high(j)} = \frac{X_{max(j)} + δ_{j}}{2}$
4: end for
5: define matrix $A = \begin{bmatrix} X_{low1} & X_{low2} & X_{low3} \\ X_{high1} & X_{high2} & X_{high3} \end{bmatrix}$
6: for $i = 1$ to $k$ do
7: $ε = \mathrm{bitget}(i - 1, 1:3) + 1$
8: $u_{i} = (A(ε(1), 1), A(ε(2), 2), A(ε(3), 3))$
9: end for
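A minimal sketch of Algorithm 2 follows. The bit shift plays the role of `bitget`: bit $j$ of $i - 1$ decides whether the center takes the low or the high zone value for gas $j$. The function name and the default `k=8` (all $2^{3}$ low/high combinations) are assumptions for illustration.

```python
def initial_centers(deltas, uppers, lowers, k=8):
    """Generate up to 8 initial centers from low/high zone midpoints.

    deltas, uppers, lowers: per-gas maximum, upper boundary and lower
    boundary, each given in the order (NO, HC, CO).
    """
    low = [(u + l) / 2 for u, l in zip(uppers, lowers)]
    high = [(u + d) / 2 for u, d in zip(uppers, deltas)]
    centers = []
    for i in range(k):
        # Bits of i from lowest to highest, one bit per gas.
        bits = [(i >> j) & 1 for j in range(3)]
        centers.append(tuple(high[j] if bits[j] else low[j] for j in range(3)))
    return centers
```

The first center combines the three low-zone values and the eighth combines the three high-zone values; the centers in between mix low and high zones gas by gas.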

3.3. Normalization K-Medoids

By running the ADIK algorithms, we acquire the initial positions of the $k$ center points. To maintain the same weighting of the NO, HC, and CO emission data, a normalization method is adopted as follows,

$$X_{norm(ij)} = \frac{δ_{norm}}{δ_{j}} X_{ij} \tag{3}$$

$$K_{norm(oj)} = \frac{δ_{norm}}{δ_{j}} K_{oj} \tag{4}$$

where $i = 1, 2, \dots, n$, $j = 1, 2, 3$, and $o = 1, 2, \dots, k$.
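Equations (3) and (4) apply the same per-gas scaling to the data points and to the initial centers, so one helper covers both. In the sketch below, the function name and the default value of `delta_norm` are assumptions; the text does not state which common scale $δ_{norm}$ is used.

```python
def normalize(points, deltas, delta_norm=1.0):
    """Scale each gas column j by delta_norm / delta_j.

    After scaling, every gas has the same maximum range delta_norm,
    so no single gas dominates the Euclidean distances.
    """
    return [
        tuple(delta_norm / d * x for x, d in zip(point, deltas))
        for point in points
    ]
```

Calling the helper once on the dataset and once on the centers from ADIK keeps both in the same normalized space before clustering.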

Then K-medoids is used to cluster the emission datasets, as described in this subsection. The difference between K-means and K-medoids is the way in which the central point $u_{k}$ is selected,

$$u_{k\text{-}means} = \frac{1}{N_{k}} \sum_{x_{i} \in D_{k}} x_{i} \tag{5}$$

$$u_{k\text{-}medoids} = \underset{x_{i} \in D_{k}}{\arg\min} \sum_{x_{j} \in D_{k}} \| x_{j} - x_{i} \|_{2} \tag{6}$$

where $D_{k}$ is the dataset of class $k$ and $N_{k}$ is the number of samples in $D_{k}$. Compared with K-means, the advantage of using K-medoids to select the central point is that it can effectively eliminate the influence of outliers on the clustering results, at the cost of a longer total running time. The detailed calculation process of K-medoids is shown in Algorithm 3.

The function $\mathrm{repmat}(A, n, m)$ returns an array containing $n \times m$ copies of A in the row and column dimensions. The running time of Algorithm 3 largely depends on the size of the clustering datasets and the initial positions of the $k$ center points, as shown in the experimental section.

Algorithm 3. K-Medoids algorithm

Input:
$X_{norm} = \{x_{1}, x_{2}, \dots, x_{n}\}$: n 3-dimensional normalized emission datasets extracted from the database $D$;
$K_{norm} = \{u_{1}, u_{2}, \dots, u_{k}\}$: $k$ normalized initial points;
$ε$: convergence threshold

Output:
$K^{'} = \{u_{1}^{'}, u_{2}^{'}, \dots, u_{k}^{'}\}$: $k$ 3-dimensional final K points;
$B = \{b_{1}, b_{2}, \dots, b_{n}\}$: $b_{i}$ indicates the class to which $x_{i}$ belongs;
$Iter$: iterations of the algorithm

1: for $Iter = 1$ to 100 do
2: for $i = 1$ to $n$ do
3: $dist = \| \mathrm{repmat}(X_{norm}(:, i), 1, k) - K_{norm} \|_{2}$
4: $[\sim, index] = \min(dist)$
5: $B(i) = index$
6: end for
7: for $i = 1$ to $k$ do
8: $X = X_{norm}(:, \mathrm{find}(B == i))$
9: $N = \mathrm{size}(X, 2)$
10: for $j = 1$ to $N$ do
11: $totaldist(j) = \mathrm{sum}(\| X - X(:, j) * \mathrm{ones}(1, N) \|_{2})$
12: end for
13: $[\sim, minindex] = \min(totaldist)$
14: $K^{'}(:, i) = X(:, minindex)$
15: end for
16: if $\| K^{'} - K_{norm} \| \leq ε$
17: break
18: end if
19: $K_{norm} = K^{'}$
20: end for
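The loop of Algorithm 3 can be sketched in plain Python as follows. This is an illustrative version under stated assumptions rather than the authors' code: points are tuples, an empty cluster keeps its previous center, and the convergence test sums the distances each center moved, which is one possible reading of step 16.

```python
import math


def k_medoids(points, centers, max_iter=100, tol=1e-9):
    """Cluster points around medoids, starting from the given centers.

    Returns (final_centers, labels, iterations); labels are 0-based
    indices into final_centers.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: the medoid minimizes the summed distance to its cluster.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                new_centers.append(centers[c])  # assumption: keep old center
                continue
            medoid = min(
                members,
                key=lambda cand: sum(math.dist(cand, q) for q in members),
            )
            new_centers.append(tuple(medoid))
        shift = sum(math.dist(a, b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels, iteration
```

Because each medoid is an actual data point, a single extreme measurement cannot drag a center away from its cluster the way it can with a mean; the price is the quadratic distance computation inside each cluster.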

3.4. Label and Definition

After clustering is finished, the different clusters of emission datasets can be labeled by the formula,

$$Label = B \times (1:k)^{T} \tag{7}$$

where $B = \{b_{1k}, b_{2k}, \dots, b_{nk} \mid b_{nk} \in \{0, 1\}\}$ is the cluster assignment described in the above subsection, written as 0/1 membership indicators.

The unlabeled samples in the training datasets are transformed into labeled samples by this method. The labels and definitions of the results are shown in Table 1.
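Equation (7) can be read as a product of a 0/1 membership matrix with the column vector $(1, 2, \dots, k)^{T}$. The short sketch below shows that reading on a toy example; the function name and the one-row-per-sample layout of `B` are assumptions.

```python
def labels_from_membership(B):
    """Multiply each 0/1 membership row by (1, 2, ..., k)."""
    k = len(B[0])
    return [sum(b * c for b, c in zip(row, range(1, k + 1))) for row in B]


if __name__ == "__main__":
    # Three samples that belong to clusters 1, 3 and 2, respectively.
    print(labels_from_membership([[1, 0, 0], [0, 0, 1], [0, 1, 0]]))  # [1, 3, 2]
```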

3.5. Nearest Neighbor Classifier

Once the clustered datasets have been established and labeled, the K-NN algorithm, which is shown in Algorithm 4, is applied to rapidly detect high-emitting vehicles.

Algorithm 4. K-NN algorithm

Input:
$X_{train} = X_{norm} = \{x_{1}, x_{2}, \dots, x_{n}\}$: n 3-dimensional training emission datasets;
$X_{test} = \{t_{1}, t_{2}, \dots, t_{p}\}$: p 3-dimensional testing emission datasets;
$L_{train} = \{l_{1}, l_{2}, \dots, l_{n}\}$: the labels of the training emission datasets;
$k$: the number of nearest neighbors used by K-NN

Output:
$L_{test} = \{c_{1}, c_{2}, \dots, c_{p}\}$: the labels of the testing emission datasets

1: for $i = 1$ to $p$ do
2: $diff = \mathrm{repmat}(X_{test}(i), [n, 1]) - X_{train}$
3: $dist = \sqrt{\sum_{j=1}^{3} diff(j)^{2}}$
4: $[X_{sort}, IX] = \mathrm{sort}(dist)$
5: $totallab = L_{train}(IX(1:k))$
6: $L_{test}(i) = \mathrm{mode}(totallab)$
7: end for

K-NN calculates the Euclidean distance between the testing sample and all training samples, and then the $k$ training samples closest to the test sample are selected. The value that appears most frequently among the labels of these $k$ training samples is regarded as the label of the testing sample.
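The voting procedure described above can be sketched as follows. The function name is hypothetical, and the tie-breaking rule (the smallest label wins when several labels are equally frequent) is an assumption chosen to mirror the behavior of a typical `mode` function.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, test_points, k=1):
    """Label each test point by majority vote of its k nearest neighbors."""
    predicted = []
    for t in test_points:
        # Indices of training samples, sorted by Euclidean distance to t.
        order = sorted(
            range(len(train_points)),
            key=lambda i: math.dist(t, train_points[i]),
        )
        votes = Counter(train_labels[i] for i in order[:k])
        top = max(votes.values())
        # Assumption: ties are resolved in favor of the smallest label.
        predicted.append(min(lab for lab, n in votes.items() if n == top))
    return predicted
```

Each prediction costs one distance computation per training sample, so keeping the training set at a fixed size bounds the per-vehicle recognition time.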

3.6. Update Cut-Points of Excessive Emissions

As the dataset labeled “1” is defined as the “No Excessive Emissions” zone, it can be extracted to update the cut-points that separate the high-emitting and low-emitting zones. In the approach proposed in this paper, the maximum concentrations of the different emission gases in the dataset labeled “1” are regarded as the cut-points, and they are updated for every n new input datasets.
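A minimal sketch of this update is given below: it takes the per-gas maximum over the samples whose label marks the no-excess zone. The function name and the choice to raise an error when that zone is empty are assumptions for illustration.

```python
def update_cut_points(points, labels, clean_label=1):
    """Return the per-gas maximum over samples in the no-excess cluster."""
    clean = [p for p, lab in zip(points, labels) if lab == clean_label]
    if not clean:
        raise ValueError("no samples carry the no-excess label")
    # zip(*clean) regroups the samples into one column per gas.
    return tuple(max(column) for column in zip(*clean))
```

Re-running the helper after each clustering update yields cut-points that follow the site and the measuring conditions instead of staying fixed.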

4. Experiments and Discussion

In order to verify the advantages of the proposed method, we performed several experiments, which are described in this section. All experiments were conducted on a Windows 10 64-bit operating system with an Intel i5-7300U 2.71 GHz CPU and 8 GB RAM.

4.1. Experiment to Compare Clustering Methods

The performance of our proposed method was tested in the first experiment, which entailed qualitative and quantitative analyses comparing it with K-means, K-medoids, and ADIK+K-means. All clustering processes were performed 30 times, and the average results are reported in Table 2. The clustering run with the smallest total squared distance was used as the sample for the qualitative analysis, which is shown in Figure 5 (the emission data were normalized).

By comparing the clustering results in Figure 5a–d, we found that our method effectively solved the problem of selecting the initial center of clustering, and the 60,000 datasets were divided into our defined emission zones. The outliers that influence K-means were eliminated by K-medoids, as shown in Figure 5c,d, and the proposed method obtained the best clustering results of the four tested methods.

Then the effectiveness of the clustering algorithms was tested using three quantitative indicators: the running time of the algorithm (TIME), the Davies-Bouldin Index (DBI), and the Dunn Validity Index (DVI) [36,37],

$$DBI = \frac{1}{k} \sum_{i=1}^{k} \underset{j \neq i}{\max} \left( \frac{avg(C_{i}) + avg(C_{j})}{d_{cen}(C_{i}, C_{j})} \right) \tag{8}$$

$$DVI = \underset{1 \leq i \leq k}{\min} \left\{ \underset{j \neq i}{\min} \left( \frac{d_{min}(C_{i}, C_{j})}{\underset{1 \leq l \leq k}{\max}\, diam(C_{l})} \right) \right\} \tag{9}$$

in which:

$$avg(C) = \frac{2}{|C|(|C| - 1)} \sum_{1 \leq i < j \leq |C|} dist(x_{i}, x_{j}) \tag{10}$$

$$d_{cen}(C_{i}, C_{j}) = dist(u_{i}, u_{j}) \tag{11}$$

$$d_{min}(C_{i}, C_{j}) = \min_{x_{i} \in C_{i}, x_{j} \in C_{j}} dist(x_{i}, x_{j}) \tag{12}$$

$$diam(C) = \max_{1 \leq i < j \leq |C|} dist(x_{i}, x_{j}) \tag{13}$$

where $avg(C)$ is the mean distance between samples in cluster $C$; $d_{cen}(C_{i}, C_{j})$ is the distance between the center points of clusters $C_{i}$ and $C_{j}$; $u = \frac{1}{|C|} \sum_{1 \leq i \leq |C|} x_{i}$ is the center point of $C$; $d_{min}(C_{i}, C_{j})$ is the distance between the nearest samples of clusters $C_{i}$ and $C_{j}$; and $diam(C)$ is the longest distance between samples in cluster $C$.
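The two indices can be computed directly from Equations (8) to (13), as in the sketch below. The helper names are hypothetical, clusters are passed as lists of coordinate tuples, and a cluster with a single sample is given a mean distance and diameter of zero, which is an assumption the equations leave open.

```python
import math
from itertools import combinations


def centroid(cluster):
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def avg_dist(cluster):
    """Mean pairwise distance, Equation (10); 0.0 for a single sample."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0


def diameter(cluster):
    """Largest pairwise distance, Equation (13)."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def min_dist(ci, cj):
    """Distance between the closest samples of two clusters, Equation (12)."""
    return min(math.dist(p, q) for p in ci for q in cj)


def davies_bouldin(clusters):
    cents = [centroid(c) for c in clusters]
    avgs = [avg_dist(c) for c in clusters]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (avgs[i] + avgs[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i
        )
    return total / k


def dunn(clusters):
    widest = max(diameter(c) for c in clusters)
    k = len(clusters)
    return min(
        min_dist(clusters[i], clusters[j]) / widest
        for i in range(k) for j in range(k) if j != i
    )
```

For two tight clusters that lie far apart, the DBI is small and the DVI is large, which is the pattern that indicates good clustering.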

The smaller the TIME value, the higher the efficiency of the algorithm; the smaller the DBI and the larger the DVI, the better the clustering performance. As shown in Table 2, the ADIK method was adopted to rapidly determine the initial K-center, which effectively shortened the convergence time of the clustering method, reduced the DBI, and increased the DVI. The K-medoids approach eliminated the influence of outliers, and its DBI and DVI were better than those of the K-means method.

For the next step, the size of the clustering dataset and the clustering time were considered together. We chose n = 10,000 as the size of the newest input training dataset. This dataset size not only ensured that the data characteristics were retained, but also allowed real-time updates of the RSS data. The average running time was less than 5 s, which satisfied the requirements for adaptability and real-time performance.

4.2. Performance Evaluation of the Nearest-Neighbor Classifier

After the clustering emission database was established, the performance of our classifier was tested. The qualitative and quantitative analytical methods from the first experiment (Section 4.1) were adopted for this experiment as well.

The most recent 10,000 emission datasets collected by the RSS were used as the training sets, and the training labels were the emission recognition results of our clustering database. The testing sets were accumulated by monitoring the emission dataset of each new vehicle that passed through the RSS, and the recognition results of 10,000 testing sets were compared with the validation sets obtained by the clustering algorithm, as shown in Figure 6.

The qualitative comparison in Figure 6 shows that our classifier obtained good recognition results. Then, Precision (PRE) and Recall (REC) were used to test the performance of our classifier (Table 3). The formulas for these two indicators are,

$$PRE = \frac{TP}{TP + FP} \tag{14}$$

$$REC = \frac{TP}{TP + FN} \tag{15}$$

where TP, FP, and FN denote true positive, false positive, and false negative, respectively.
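For a multi-class classifier, Equations (14) and (15) are usually evaluated per category by treating that category as positive and all others as negative. The sketch below follows this one-vs-rest reading, which is an assumption about how the per-category scores are obtained; the function name is hypothetical.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Return (PRE, REC) for one category treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return pre, rec
```

Returning 0.0 when a denominator is zero is a convention chosen for this example, used when a category never occurs or is never predicted.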

Because the numbers of samples in the different categories were unbalanced, the true positive rate (TPR) and false positive rate (FPR) were critical performance indicators. Therefore, the receiver operating characteristic (ROC) curve [38], which is based on these two indicators, was adopted, as shown in Figure 7.

Then, the area under the curve (AUC) [39] was calculated to test the final performance of the classifier, and the results are shown in Table 3. By calculating the various performance indexes for the four sample datasets collected at different times and places, we found that our classifier achieved good results. Here, we paid more attention to the evaluation indicators for category $k_{1}$, because $k_{1}$ represents vehicles that do not exceed the standard, while all other categories represent vehicles that exceed the standard. The results of this quantitative experiment show that our classifier could accurately recognize the non-exceeding category $k_{1}$ and the exceeding categories $k_{2}$–$k_{5}$, and it achieved an adequate recognition rate for the emission-exceeding categories $k_{6}$ and $k_{7}$. The reason for this difference in classification performance might be the small sample sizes for $k_{6}$ and $k_{7}$. Additionally, the results of tens of thousands of experiments show that the average recognition time of our classifier was less than 0.1 s per detected vehicle, which meets the requirements for fast and automatic recognition.

When a new vehicle passes through the RSS, the classifier in the system automatically assigns the detection result of the new vehicle to a category, according to the trained model, and the LED display rapidly shows the detection result. At the same time, the system adds count information to the database of monitoring results, indexed by the license plate number of the newly recognized car. For example, if a new car is assigned to category $k_{4}$, then its counts of excessive NO and HC emissions each increase by one in the database of detection results. If the total counts of this car exceed the limit, the system blacklists the license plate number of this car and uploads its information to inform law enforcement authorities.
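
The counting and blacklisting logic described above might look like the following sketch. The category-to-pollutant mapping follows Table 1; the limit value, the plate string, and the class and method names are hypothetical, since the text does not specify them.

```python
from collections import defaultdict

# Pollutants flagged as excessive by each category, following Table 1
CATEGORY_POLLUTANTS = {
    "k1": (),
    "k2": ("NO",),
    "k3": ("HC",),
    "k4": ("NO", "HC"),
    "k5": ("CO",),
    "k6": ("NO", "CO"),
    "k7": ("HC", "CO"),
    "k8": ("NO", "HC", "CO"),
}

BLACKLIST_LIMIT = 3  # hypothetical value; the text does not state the limit


class ExceedanceLog:
    """Per-plate counts of excessive emissions, with a simple blacklist rule."""

    def __init__(self, limit=BLACKLIST_LIMIT):
        self.limit = limit
        self.counts = defaultdict(lambda: defaultdict(int))
        self.blacklist = set()

    def record(self, plate, category):
        """Add one detection; return True if the plate is blacklisted."""
        for pollutant in CATEGORY_POLLUTANTS[category]:
            self.counts[plate][pollutant] += 1
        # Blacklist once the total count over all pollutants exceeds the limit
        if sum(self.counts[plate].values()) > self.limit:
            self.blacklist.add(plate)
        return plate in self.blacklist


log = ExceedanceLog()
print(log.record("PLATE-001", "k4"))  # False: total count is 2
print(log.record("PLATE-001", "k4"))  # True: total count is 4, above the limit of 3
```

Because the decision depends on accumulated counts rather than on one reading, a single noisy measurement does not blacklist a vehicle in this sketch.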

The advantage of this processing method is that it mitigates some of the factors that might affect a single instantaneous monitoring result. Such factors include noise from the optical equipment and the external environment, and sudden acceleration or deceleration of a vehicle.

4.3. The Experiment for Detecting the Rate of Vehicles Exceeding the Standard

In the experiment reported in this sub-section, the automatic and fast recognition method for on-road high-emitting vehicles was used to determine the rate at which vehicles exceeded the standard. Six experimental datasets were collected by the RSSs at different times in two geographical locations, Shijiazhuang and Hefei, and each dataset contained 10,000 telemetric data points. The results of the experiment are shown in Table 4: on average, 27.69% of vehicles exceeded the standard and 72.31% did not, and the average rates of excessive NO, HC, and CO emissions were 10.53%, 12.98%, and 7.03%, respectively. These per-pollutant rates count every category in which the pollutant is excessive, including the combined categories.
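
The per-pollutant rates quoted above can be reproduced from the "Avg" column of Table 4 by summing every category that involves the pollutant. The short check below uses only those table values; the variable and function names are illustrative.

```python
# Average rates (%) from the "Avg" column of Table 4, keyed by excessive pollutants
avg_rates = {
    ("NO",): 7.79,
    ("HC",): 10.25,
    ("CO",): 6.83,
    ("NO", "HC"): 2.63,
    ("NO", "CO"): 0.10,
    ("HC", "CO"): 0.09,
    ("NO", "HC", "CO"): 0.01,
}


def pollutant_total(pollutant):
    """Sum the rates of all categories in which the pollutant is excessive."""
    total = sum(rate for group, rate in avg_rates.items() if pollutant in group)
    return round(total, 2)


print(pollutant_total("NO"))  # 10.53
print(pollutant_total("HC"))  # 12.98
print(pollutant_total("CO"))  # 7.03
```

Summing all seven category averages gives 27.70%, which agrees with the 27.69% overall rate in Table 4 to within rounding.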

4.4. The Experiment for Self-Adaptive Cut-Points

Experimental datasets were collected for three days at the three geographical locations described in Section 2.1. As the cut-points in the system were updated every 10,000 new datasets, we took the average of the cut-points over each day. The experimental results for self-adaptive cut-points are shown in Table 5. The cut-points in the table change with time and location, which indicates that our proposed method has good adaptability.
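
The daily averaging step can be illustrated as follows. This is a sketch under the assumption that each update yields one cut-point per pollutant; the numeric values and the function name are invented for the example and are not measurements from the experiment.

```python
from statistics import mean


def daily_average(updates):
    """Average each pollutant's cut-point over all updates recorded in one day."""
    return {pollutant: mean(u[pollutant] for u in updates)
            for pollutant in updates[0]}


# Hypothetical cut-points from two updates on the same day
# (CO in %, HC and NO in ppm)
updates = [
    {"CO": 1.20, "HC": 238.0, "NO": 202.0},
    {"CO": 1.22, "HC": 242.0, "NO": 204.0},
]
print(daily_average(updates))  # CO about 1.21, HC 240.0, NO 203.0
```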

The results in Table 5 show that the cut-points in our system change little with time but change more noticeably with geographical location. As this experiment was done only to verify the adaptability of our proposed method, the relationships between cut-points and time, location, outside environment, and different equipment need to be evaluated with more experimental datasets, which will be addressed in future research work.

5. Conclusions

This paper proposes a method for the automatic and fast recognition of on-road high-emitting vehicles, called AFR-OHV. The first step in the AFR-OHV method is to adaptively determine the initial clustering centers according to the distribution characteristics of the most recently input RSS datasets, which counteracts the effects of environmental change to some extent. The second step is normalized K-medoids clustering of the RSS datasets. After that, the RSS datasets are labeled and divided into different defined emission zones to construct a clustering database, and the cut-points are updated automatically. The last step is to recognize high-emitting vehicles that pass through the RSS using a nearest-neighbor classifier and to update the clustering database.

As reported in the experimental section, the performance of the method was verified using real data collected by the RSS from December 2018 to January 2019 on Xueyuan Road, Shijiazhuang City, Hebei Province, China, and Yangqiao Road, Hefei City, Anhui Province, China. Different clustering methods were selected for comparison, and the experimental results show that the running time, DBI, and DVI resulting from our method were superior to those obtained using three other methods, namely, ADIK + K-means, K-medoids, and K-means. Our classifier also achieved good performance indexes, i.e., PRE, REC, and AUC. In the last step, the rates of exceeded standards were calculated using multiple emission datasets collected by the RSS in two different geographical locations. The calculated rates provide reference values for law enforcement departments to establish evaluation criteria for on-road high-emitting vehicles detected by remote sensing systems.

The limitation of this work is that, when optical remote sensing systems developed by different research institutions or companies are used to detect on-road high-emitting vehicles, the distribution of the emission datasets might be significantly different. In our future work, we will research transfer learning and meta learning with the aim of improving our learning method. The objective is to improve the model so that, after training with a dataset from one set of optical remote sensing systems, it can be effectively applied to other optical remote sensing systems. In addition, we will research multi-RSS networking on adjacent streets to further reduce the monitoring error and improve the recognition accuracy.

Author Contributions

Conceptualization, H.X.; methodology, H.X.; software, H.X.; validation, Y.H. and K.Y.; writing—original draft preparation, H.X.; writing—review and editing, Y.Z., B.F., D.Y. and M.L.; supervision, Y.Z. and K.Y.; project administration, Y.H.; funding acquisition, Y.Z.

Funding

This research was funded in part by the National Key Research and Development Program of China, grant number 2016YFC0201000, in part by the Strategic Priority Research Program of the Chinese Academy of Sciences, grant number XDA23010204, and in part by the Instrument and Equipment Function Development Technology Innovation of the Chinese Academy of Sciences, grant number Y83H3y1251.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ministry of Ecology and Environment of the People’s Republic of China. China Vehicle Environmental Management Annual Report: 2018; Ministry of Ecology and Environment of the People’s Republic of China: Beijing, China, 2019.
2. Jaworski, A.; Kuszewski, H.; Ustrzycki, A.; Balawender, K.; Lejda, K.; Woś, P. Analysis of the repeatability of the exhaust pollutants emission research results for cold and hot starts under controlled driving cycle conditions. Environ. Sci. Pollut. Res. 2018, 25, 17862–17877.
3. Geller, M.D.; Sardar, S.B.; Phuleria, H.; Fine, P.M.; Sioutas, C. Measurements of Particle Number and Mass Concentrations and Size Distributions in a Tunnel Environment. Environ. Sci. Technol. 2005, 39, 8653–8663.
4. O’Driscoll, R.; ApSimon, H.M.; Oxley, T.; Molden, N.; Stettler, M.E.J.; Thiyagarajah, A. A Portable Emissions Measurement System (PEMS) study of NO_x and primary NO₂ emissions from Euro 6 diesel passenger cars and comparison with COPERT emission factors. Atmos. Environ. 2016, 145, 81–91.
5. Lau, C.F.; Rakowska, A.; Townsend, T.; Brimblecombe, P.; Chan, T.L.; Yam, Y.S.; Mocnik, G.; Ning, Z. Evaluation of diesel fleet emissions and control policies from plume chasing measurements of on-road vehicles. Atmos. Environ. 2015, 122, 171–182.
6. Bishop, G.A.; Peddle, A.M.; Stedman, D.H.; Zhan, T. On-Road Emission Measurements of Reactive Nitrogen Compounds from Three California Cities. Environ. Sci. Technol. 2010, 44, 3616–3620.
7. Bishop, G.A.; Stedman, D.H. Measuring the Emissions of Passing Cars. Acc. Chem. Res. 1996, 29, 489–495.
8. Yu, K.; Li, Z.R.; Zhao, Y.B.; Qin, J.H.; Song, W.G. A novel location strategy for minimizing monitors in vehicle emission remote sensing system. IEEE Trans. Syst. Man Cyber. Syst. 2017, 48, 500–510.
9. Huang, Y.H.; Organ, B.; Zhou, J.L.; Surawski, N.C.; Hong, G.; Chan, E.F.C.; Yam, Y.S. Remote Sensing of on-road vehicle emissions: Mechanism, applications and a case study from Hong Kong. Atmos. Environ. 2018, 182, 58–74.
10. Dallmann, T.; Bernard, Y.; Tietge, U.; Muncrief, R. Remote Sensing of Motor Vehicle Emissions in London; ICCT: Washington, DC, USA, 2018.
11. Tietge, U.; Bernard, Y.; German, J.; Muncrief, R. A Comparison of Light-Duty Vehicle NO_x Emissions Measured by Remote Sensing in Zurich and Europe; ICCT Consulting Report; National Academy of Sciences: Washington, DC, USA, 2019.
12. Zhang, Q.; Li, F.; Long, F.; Ling, Q. Vehicle Emission Forecasting Based on Wavelet Transform and Long Short-Term Memory Network. IEEE Access 2018, 6, 56984–56994.
13. Shan, X.; Hao, P.; Chen, X.; Boriboonsomsin, K.; Wu, G.; Barth, M.J. Vehicle Energy/Emissions Estimation Based on Vehicle Trajectory Reconstruction Using Sparse Mobile Sensor Data. IEEE Trans. Intell. Transp. Syst. 2019, 20, 716–726.
14. Ropkins, K.; DeFries, T.H.; Pope, F.; Green, D.C.; Kemper, J.; Kishan, S.; Fuller, G.W.; Li, H.; Sidebottom, J.; Crilley, L.R.; et al. Evaluation of EDAR vehicle emissions remote sensing technology. Sci. Total Environ. 2017, 609, 1464–1474.
15. Nirmala, G.; Thyagharajan, K.K. A Modern Approach for Image Forgery Detection using BRICH Clustering based on Normalized Mean and Standard Deviation. In Proceedings of the 2019 International Conference on Communication and Signal Processing (ICCSP), Chennai, India, 4–6 April 2019; pp. 441–444.
16. Guha, S.; Rastogi, R.; Shim, K. Cure: An efficient clustering algorithm for large databases. Inf. Syst. 2001, 26, 35–58.
17. Guha, S.; Rastogi, R.; Shim, K. Rock: A robust clustering algorithm for categorical attributes. Inf. Syst. 2000, 25, 345–366.
18. Karypis, G.; Han, E.H.; Kumar, V. Chameleon: Hierarchical Clustering Using Dynamic Modeling. Computer 2002, 32, 68–75.
19. Birant, D.; Kut, A. ST-DBSCAN: An algorithm for clustering spatial–temporal data. Data Knowl. Eng. 2007, 60, 208–221.
20. Ankerst, M.; Breunig, M.M.; Kriegel, H.P.; Sander, J. OPTICS: Ordering points to identify the clustering structure. ACM Sigmod Rec. 1999, 28, 49–60.
21. Xu, X.; Ester, M.; Kriegel, H.P.; Sander, J. A distribution-based clustering algorithm for mining in large spatial databases. In Proceedings of the 14th International Conference on Data Engineering, Orlando, FL, USA, 23–27 February 1998; pp. 324–331.
22. Yu, X.G.; Jian, Y. A new clustering algorithm based on KNN and DENCLUE. In Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Guangzhou, China, 18–21 August 2005.
23. Adelfio, G.; Chiodi, M.; D’Alessandro, A.; Luzio, D.; D’Anna, G.; Mngano, G. Simultaneous seismic wave clustering and registration. Comput. Geosci. 2012, 44, 60–69.
24. Sun, Q.X.; Yuan, J.; Zhang, X.B.; Sun, F.C. RGB-D SLAM in Indoor Environments with STING-Based Plane Feature Extraction. IEEE ASME Trans. Mechatron. 2018, 23, 1071–1082.
25. Wang, G.; Sim, K.C. An investigation of tied-mixture GMM based triphone state clustering. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012.
26. Inokuchi, R.; Miyamoto, S. LVQ clustering and SOM using a kernel function. In Proceedings of the 2004 IEEE International Conference on Fuzzy Systems, Budapest, Hungary, 25–29 July 2004.
27. Arthur, D.; Vassilvitskii, S. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007.
28. Dhillon, I.S.; Guan, Y.Q.; Kulis, B. Kernel k-means: Spectral clustering and normalized cuts. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004.
29. Park, H.S.; Jun, C.H. A simple and fast algorithm for K-medoids clustering. Expert Syst. Appl. 2009, 36, 3336–3341.
30. Ng, M.K.; Li, M.J.J.; Huang, J.Z.X.; He, Z.Y. On the Impact of Dissimilarity Measure in k-Modes Clustering Algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 503–507.
31. Shao, H.; Zhang, P.; Chen, X.; Li, F.; Du, G. A Hybrid and Parameter-Free Clustering Algorithm for Large Data Sets. IEEE Access 2019, 7, 24806–24818.
32. Meng, Y.X.; Liu, T.G.; Liu, K.; Jiang, J.F.; Wang, R.R.; Wang, T.; Hu, H.F. A Modified Empirical Mode Decomposition Algorithm in TDLAS for Gas Detection. IEEE Photonics J. 2014, 6, 1–7.
33. Yao, R.G.; Sun, L.; Long, M. VSP-based emission factor calibration and signal timing optimization for arterial streets. IET Intell. Transp. Syst. 2019, 13, 228–241.
34. Wenzel, T. Use of remote sensing measurements to evaluate vehicle emission monitoring programs: Results from Phoenix, Arizona. Environ. Sci. Policy 2003, 6, 153–166.
35. Huang, Y.H.; Organ, B.; Zhou, J.L.; Surawski, N.C.; Yam, Y.S.; Chan, E.F.C. Characterisation of diesel vehicle emissions and determination of remote sensing cutpoints for diesel high-emitters. Environ. Pollut. 2019, 252, 31–38.
36. Nikolaou, T.G.; Kolokotsa, D.S.; Stavrakakis, G.S.; Skias, I.D. On the Application of Clustering Techniques for Office Buildings’ Energy and Thermal Comfort Classification. IEEE Trans Smart Grid 2012, 3, 2196–2210.
37. Rathore, P.; Ghafoori, Z.; Bezdek, J.C.; Palaniswami, M.; Leckie, C. Approximating Dunn’s Cluster Validity Indices for Partitions of Big Data. IEEE Trans. Cybern. 2019, 49, 1629–1641.
38. Feng, C.; Wang, W.; Tian, Y.; Que, X.; Gong, X. Estimate Air Quality Based on Mobile Crowd Sensing and Big Data. In Proceedings of the 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM), Macau, China, 12–15 June 2017.
39. Huo, J.; Gao, Y.; Shi, Y.H.; Yin, H.J. Cross-Modal Metric Learning for AUC Optimization. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 4844–4856.

Figure 1. The optical remote sensing system for detecting on-road high-emitting vehicles.

Figure 2. The concentration distribution of three main types of emissions collected in 192,097 datasets: (a) 3D front view; (b) 3D side view.

Figure 3. The histogram of vehicle specific power (VSP) and three main types of emissions collected in 192,097 datasets: (a) VSP; (b) NO; (c) HC; (d) CO.

Figure 4. The architecture of the method for the automatic and fast recognition of on-road high-emitting vehicles.

Figure 5. The results of experiments to compare clustering methods: (a) K-means; (b) K-medoids; (c) ADIK+K-means; (d) proposed method.

Figure 6. The comparison results of clustering experiments: (a) testing sets; (b) validation sets.

Figure 7. The receiver operator characteristic (ROC) of different testing datasets: (a) dataset of Day 1; (b) dataset of Day 2; (c) dataset of Day 3; (d) dataset of Day 4.

Table 1. The labels and definitions of different k categories.

$k_{i}$	NO High	NO Low	HC High	HC Low	CO High	CO Low	Definition
$k_{1}$	0	0	0	0	0	0	No Excessive Emissions
$k_{2}$	1	0	0	0	0	0	Excessive NO
$k_{3}$	0	0	1	0	0	0	Excessive HC
$k_{4}$	1	0	1	0	0	0	Excessive NO and HC
$k_{5}$	0	0	0	0	1	0	Excessive CO
$k_{6}$	1	0	0	0	1	0	Excessive NO and CO
$k_{7}$	0	0	1	0	1	0	Excessive HC and CO
$k_{8}$	1	0	1	0	1	0	Excessive NO, HC, and CO
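
The labels in Table 1 can be read as a three-bit code over the "High" flags for NO, HC, and CO. The sketch below shows one way to map the flags to a category index; the function name and the bit-weight formulation are illustrative and are not taken from the paper.

```python
def category_label(no_high, hc_high, co_high):
    """Map the three 'High' flags of Table 1 to a category label k1..k8."""
    # NO contributes 1, HC contributes 2, CO contributes 4; labels start at k1
    index = 1 + int(no_high) + 2 * int(hc_high) + 4 * int(co_high)
    return f"k{index}"


print(category_label(0, 0, 0))  # k1: no excessive emissions
print(category_label(1, 1, 0))  # k4: excessive NO and HC
print(category_label(0, 1, 1))  # k7: excessive HC and CO
```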

Table 2. The performance test of different clustering algorithms.

Emission Dataset	Proposed Algorithm			ADIK + K-Means			K-Medoids			K-Means
Magnitude	Time (s)	DBI	DVI	Time (s)	DBI	DVI	Time (s)	DBI	DVI	Time (s)	DBI	DVI
5000	2.62 ± 0.44	19.68 ± 3.14	0.0102 ± 0.0013	2.16 ± 0.51	27.05 ± 2.32	0.0054 ± 0.0009	2.97 ± 1.54	17.82 ± 9.84	0.0039 ± 0.0014	2.73 ± 1.54	21.84 ± 13.37	0.0028 ± 0.0009
8000	4.47 ± 0.40	29.51 ± 4.04	0.0027 ± 0.0005	2.50 ± 0.33	42.83 ± 3.47	0.0028 ± 0.0008	3.79 ± 1.78	37.68 ± 14.55	0.0028 ± 0.0004	3.09 ± 1.31	52.57 ± 17.38	0.0019 ± 0.0003
10,000	4.68 ± 1.39	32.96 ± 4.24	0.0045 ± 0.0011	2.66 ± 0.31	44.51 ± 4.26	0.0028 ± 0.0010	5.39 ± 1.93	44.38 ± 16.79	0.0031 ± 0.0006	4.07 ± 1.71	55.84 ± 20.77	0.0017 ± 0.0005
20,000	15.83 ± 2.45	34.30 ± 3.25	0.0025 ± 0.0006	3.30 ± 0.74	48.18 ± 4.96	0.0030 ± 0.0009	17.54 ± 3.18	52.94 ± 23.18	0.0039 ± 0.0007	5.67 ± 1.21	61.29 ± 21.23	0.0028 ± 0.0008
30,000	45.36 ± 7.32	36.95 ± 3.71	0.0028 ± 0.0009	3.93 ± 0.95	53.49 ± 5.84	0.0018 ± 0.0005	51.28 ± 6.49	60.17 ± 20.62	0.0027 ± 0.0005	6.29 ± 1.57	67.40 ± 19.83	0.0018 ± 0.0004
40,000	91.60 ± 12.03	41.85 ± 4.73	0.0029 ± 0.0007	5.49 ± 1.47	56.81 ± 5.07	0.0023 ± 0.0005	107.45 ± 10.39	64.73 ± 24.25	0.0022 ± 0.0004	6.98 ± 1.81	70.72 ± 18.46	0.0016 ± 0.0003
50,000	110.36 ± 19.88	49.61 ± 4.52	0.0038 ± 0.0010	7.41 ± 1.83	60.03 ± 5.92	0.0027 ± 0.0007	125.81 ± 14.84	68.35 ± 23.49	0.0028 ± 0.0006	8.47 ± 2.03	76.93 ± 20.32	0.0019 ± 0.0003

Table 3. The performance test results of our classifier.

Testing Dataset	Dataset of Day 1			Dataset of Day 2			Dataset of Day 3			Dataset of Day 4
Categories	PRE	REC	AUC	PRE	REC	AUC	PRE	REC	AUC	PRE	REC	AUC
$k_{1}$	0.9980	0.9898	0.9929	0.9820	0.9979	0.9740	0.9994	0.9857	0.9919	0.9983	0.9816	0.9888
$k_{2}$	0.9802	0.9682	0.9840	0.9688	0.9848	0.9914	0.9327	0.9945	0.9950	0.8982	0.9862	0.9861
$k_{3}$	0.9688	0.9963	0.9360	0.9914	0.9851	0.9911	0.9430	0.9991	0.9852	0.9667	0.9937	0.9959
$k_{4}$	0.9368	0.8750	0.9962	0.9395	0.9983	0.9994	0.9861	0.7634	0.9917	0.9707	0.7133	0.8740
$k_{5}$	0.8965	0.9982	0.9958	0.9884	0.9440	0.9916	0.9088	0.9966	0.9964	0.8504	1.0000	0.9942
$k_{6}$	1.0000	0.6667	0.9916	1.0000	0.7476	0.9457	1.0000	0.1667	0.9935	1.0000	0.6666	0.8868
$k_{7}$	1.0000	0.5556	0.8837	0.9800	0.7147	0.9983	1.0000	0.6000	0.8536	1.0000	0.4000	0.9980

Table 4. The results of the experiment for detecting the rate of exceeded emissions.

Categories	Loc. I 1	Loc. I 2	Loc. I 3	Loc. II 1	Loc. II 2	Loc. II 3	Avg
Excessive NO	7.92%	8.17%	7.25%	7.25%	7.87%	8.26%	7.79%
Excessive HC	10.37%	10.90%	9.39%	11.09%	8.94%	10.83%	10.25%
Excessive CO	7.70%	5.64%	8.07%	5.80%	7.50%	6.29%	6.83%
Excessive NO and HC	2.64%	2.88%	2.33%	2.79%	2.65%	2.47%	2.63%
Excessive NO and CO	0.15%	0.06%	0.14%	0.06%	0.11%	0.07%	0.10%
Excessive HC and CO	0.11%	0.08%	0.11%	0.05%	0.07%	0.12%	0.09%
Excessive NO, HC, and CO	0.00%	0.01%	0.00%	0.00%	0.02%	0.00%	0.01%
Excessive	28.89%	27.74%	27.29%	27.04%	27.16%	28.04%	27.69%
No Excessive	71.11%	72.26%	72.71%	72.96%	72.84%	71.96%	72.31%

Table 5. The experimental results for self-adaptive cut-points.

	Dataset of Day 1			Dataset of Day 2			Dataset of Day 3
Locations	CO	HC	NO	CO	HC	NO	CO	HC	NO
Shijiazhuang, Hebei	1.2047%	240 ppm	203 ppm	1.2549%	246 ppm	205 ppm	1.2273%	242 ppm	202 ppm
Hefei, Anhui	1.5472%	258 ppm	222 ppm	1.5194%	253 ppm	215 ppm	1.5249%	255 ppm	220 ppm
Zibo, Shandong	1.1122%	211 ppm	193 ppm	1.2371%	216 ppm	190 ppm	1.1844%	214 ppm	193 ppm

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

