#### 2.2.4. Information Processing

The proposed architecture, and particularly the smart management layer discussed previously, reduces processing costs and can deliver results either in real time or through scheduled offline execution. The platform as a whole can therefore be shared between different sets of sensors, avoiding the unnecessary complexity of local management at the level of each sensor or group of sensors.

From the point of view of information processing, it is possible to minimize the information traffic between the image analysis system (in the cloud) and the sensors. By way of example, Figure 7 shows how a sensor can make fewer requests by encoding multiple captures into a single image.

**Figure 7.** Detection of vehicles in a 3 × 3 image matrix, which summarizes 9 s of traffic analysis from a given sensor in a single upload to the cloud. The system detects both the number plate and certain characteristics of the vehicle, assigning a confidence value to each detection. (To protect personal data, the first three digits of the license plate have been blurred.)
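The idea of packing several captures into one image can be illustrated with a minimal Python sketch. It is not the implementation used in the platform: the function names `tile_frames` and `locate_in_mosaic`, the representation of a frame as a list of pixel rows, and the row-by-row ordering of the frames are all assumptions made for illustration.

```python
def tile_frames(frames, rows=3, cols=3):
    """Combine rows*cols equally sized frames into a single mosaic image.

    Each frame is a list of pixel rows (hypothetical representation);
    frames are placed left to right, top to bottom.
    """
    if len(frames) != rows * cols:
        raise ValueError("expected exactly rows * cols frames")
    height = len(frames[0])
    width = len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(line) != width for line in frame):
            raise ValueError("all frames must share the same size")
    mosaic = []
    for r in range(rows):
        band = frames[r * cols:(r + 1) * cols]  # frames in this mosaic row
        for y in range(height):
            line = []
            for frame in band:
                line.extend(frame[y])           # concatenate pixel rows side by side
            mosaic.append(line)
    return mosaic


def locate_in_mosaic(x, y, width, height, cols=3):
    """Map a mosaic pixel back to (frame index, local x, local y)."""
    frame_index = (y // height) * cols + (x // width)
    return frame_index, x % width, y % height
```

With nine frames per mosaic, a detection returned by the cloud service at mosaic coordinates (x, y) can be attributed to the capture it came from with `locate_in_mosaic`, so one request carries the information of nine.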

#### *2.3. Methodology to Locate ANPR Sensors in a Traffic Network*

Having described the sensors to be located and their operating system, the next step is to determine their best locations on the network. To do this, given (1) a reference demand and traffic flow conditions; (2) a traffic network, defined by a graph (*N*, *A*), where *N* is the set of nodes and *A* is the set of links; and (3) the budget of the project (i.e., a number of available sensors), the aim is to find the locations that yield the best possible traffic flow estimation. Depending on the number of sensors to be located, total or partial observability of the network can be achieved, according to the flow conditions and the number of routes modelled on it. The suitable locations for these sensors are determined with two algorithms that integrate the previous three elements. These two algorithms are described in this section.

#### 2.3.1. Algorithm 1: Traffic Network Modelling

The method used to build an appropriate network model, given a graph (*N*, *A*), for traffic analysis using plate-recognition-based data is the one proposed in [13]. We assume that every node of the network can be the origin and the destination of trips, and therefore the classic zone-based O–D matrix has to be transformed into a node-based O–D matrix used as reference. This matrix is assigned to the network using a route enumeration model. Then, a route simplification algorithm is applied, which transfers to adjacent nodes the generated or attracted (reference) demand of those nodes that generate or attract fewer trips than a given threshold. Figure 8 shows the operation of this first algorithm, which covers the modelling of the network and whose steps are described below.


**Figure 8.** Flowchart of the algorithm defined for the traffic network modelling.

STEP 1: Obtain the node-based O–D matrix: Given an O–D matrix by traffic zones in the network, and some data on the trip attraction and generation capacities of the links that form it (see [13] for more details), it is possible to obtain an extended O–D matrix by nodes, defined as follows:

$$T\_{ij} = \hat{T}\_{Z\_i Z\_j} \, PA\_i \, PG\_j \tag{1}$$

where *Tij* is the number of trips from node *i* to node *j*; *T̂ZiZj* is the number of trips from the zone of node *i* to the zone of node *j*; *PAi* is the proportion of attracted trips at node *i*; and *PGj* is the proportion of generated trips at node *j*, which depends on its capacity to attract or to generate trips.
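As an illustration of Equation (1), the following minimal Python sketch expands a zone-based O–D matrix into a node-based one. The function name `node_od_matrix`, the dictionary-based data layout, and the toy numbers are assumptions made for this example; the proportions are simply taken as given inputs.

```python
def node_od_matrix(zone_od, zone_of, pa, pg):
    """Apply Equation (1): T[i][j] = T_zone[Z_i][Z_j] * PA[i] * PG[j].

    zone_od : {zone: {zone: trips}}  zone-based O-D matrix
    zone_of : {node: zone}           zone to which each node belongs
    pa, pg  : {node: proportion}     proportions PA_i and PG_j as in Equation (1)
    """
    nodes = sorted(zone_of)
    return {
        i: {j: zone_od[zone_of[i]][zone_of[j]] * pa[i] * pg[j] for j in nodes}
        for i in nodes
    }


# Toy data (hypothetical): two zones, three nodes.
zone_od = {"Z1": {"Z1": 0, "Z2": 100}, "Z2": {"Z1": 50, "Z2": 0}}
zone_of = {1: "Z1", 2: "Z1", 3: "Z2"}
pa = {1: 0.25, 2: 0.75, 3: 1.0}
pg = {1: 0.5, 2: 0.5, 3: 1.0}
T = node_od_matrix(zone_od, zone_of, pa, pg)
```

In this toy case the 100 trips from zone Z1 to zone Z2 are split between nodes 1 and 2 according to their proportions, so the node-level trips still add up to the zone-level total.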


#### 2.3.2. Algorithm 2: The ANPR Sensor Location Model

After defining the traffic network, i.e., the set of routes and their associated reference flows, both are introduced in the location model. From these, and taking into account the particularities of the sensor to be used, the model provides a set of links, *SL*, on which to locate a given number of sensors so that the data collected give the best possible estimation of the remaining flows of the network. This can be a difficult combinatorial problem to solve, especially when sensors must be located in large networks with a great number of existing routes (this justifies the use of the set of routes *Q* instead of the set *R*). Next, we propose an iterative problem-solving process to find the best possible solution given a series of restrictions. Figure 9 shows the operation of this second algorithm, whose steps are described below.


**Figure 9.** Flowchart of the algorithm defined for the ANPR sensor location model.

STEP 1: Solve the optimization problem: The following optimization problem has to be solved:

$$\max\_{z\_a, y\_q} M = \sum\_{q \in Q} f\_q^0 y\_q \tag{2}$$

subject to

$$\sum\_{a \in A} P\_a z\_a \le B \tag{3}$$

$$\sum\_{a \in A} \delta\_a^q z\_a \ge y\_q \quad \forall q \in Q \tag{4}$$

$$\sum\_{a \in A \,\mid\, \delta\_a^q + \delta\_a^{q\_1} = 1} z\_a \ge y\_q \quad \forall (q, q\_1) \in Q^2 \,\mid\, q > q\_1; \ \sum\_{a \in A} \delta\_a^q \, \delta\_a^{q\_1} > 0 \tag{5}$$

$$z\_a = 0 \quad \forall a \in \text{NSL} \tag{6}$$

$$\sum\_{a \in A} S\_a^{iter} z\_a \le \sum\_{a \in A} S\_a^{iter} \quad \forall iter \in I \tag{7}$$

The objective function (2) maximizes the distinguished route flow in terms of the reference flows *f*<sup>0</sup><sub>*q*</sub>, where *yq* is a binary variable equal to 1 if route *q* can be distinguished from the others and 0 otherwise. Constraint (3) enforces the budget, where *za* is a binary variable equal to 1 if link *a* is scanned and 0 otherwise: the total cost of the scanned links, with cost *Pa* for link *a*, cannot exceed the available budget *B*. Constraint (4) ensures that any distinguished route contains at least one scanned link; route membership is given by the parameter δ*qa*, which is the element of the link–route incidence matrix. Constraint (5) complements the previous one by imposing the exclusivity of routes: a route *q* must be distinguished from the other routes in at least one scanned link *a*. If δ*qa* + δ*q*1*a* = 1, link *a* belongs either to route *q* or to route *q*1, but not to both. When *yq* = 1, the constraint requires at least one scanned link with this property; when *yq* = 0, the constraint always holds. Constraint (6) is optional and prevents a link from being scanned if it belongs to the set *NSL* of links not suitable for scanning: it forces the binary variable *za* to 0, so that no sensor can be located on that link. The reason for defining this constraint is discussed in more detail in the next section.

Finally, since this model is part of an iterative process (see [13] for more details), an additional constraint (7) is proposed, which yields a different set *SL* in each iteration through the definition of *S*<sup>*iter*</sup><sub>*a*</sub>, a matrix that grows with the number of iterations *I* and in which each row stores the set *SL* obtained in one of the iterations carried out so far. An element of *S*<sup>*iter*</sup><sub>*a*</sub> equal to 1 means that link *a* was proposed to be scanned in the solution of iteration *iter*, and 0 otherwise. Each iteration keeps the previous solutions and prevents the process from repeating any of them. That is, each iteration carried out by the algorithm is forced to search for a different solution *SLiter* with the same objective function (2).
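To make the role of constraints (3)–(7) more tangible, the following Python sketch solves a toy instance by exhaustive enumeration. It is only an illustration under stated assumptions, not the solution method used for the real network, where a mixed-integer solver would be needed: the names `distinguished_routes`, `best_location` and `excluded` are hypothetical, routes are represented as sets of link labels, the exclusivity check is applied to every pair of routes sharing a link (slightly stricter than the *q* > *q*1 indexing of (5)), and previously found solutions are simply skipped to mimic the purpose of constraint (7).

```python
from itertools import combinations


def distinguished_routes(routes, scanned):
    """Routes satisfying constraints (4) and (5) for a set of scanned links."""
    scanned = set(scanned)
    result = set()
    for q, links_q in routes.items():
        if not links_q & scanned:                   # (4): at least one scanned link
            continue
        separable = True
        for q1, links_q1 in routes.items():
            if q1 == q or not links_q & links_q1:   # only routes sharing a link
                continue
            # (5): a scanned link must belong to exactly one of the two routes
            if not (links_q ^ links_q1) & scanned:
                separable = False
                break
        if separable:
            result.add(q)
    return result


def best_location(routes, flows, cost, budget, not_suitable=(), excluded=()):
    """Exhaustive search for the scanned link set maximizing objective (2)."""
    candidates = sorted(set().union(*routes.values()) - set(not_suitable))  # (6)
    best_links, best_value = frozenset(), 0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if sum(cost[a] for a in subset) > budget:   # (3): budget
                continue
            if frozenset(subset) in excluded:           # purpose of (7): no repeats
                continue
            value = sum(flows[q] for q in distinguished_routes(routes, subset))
            if value > best_value:
                best_links, best_value = frozenset(subset), value
    return best_links, best_value


# Toy instance (hypothetical): three routes over four links, unit costs, B = 2.
routes = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"d"}}
flows = {1: 10, 2: 20, 3: 5}
cost = {"a": 1, "b": 1, "c": 1, "d": 1}
sl_all, m_all = best_location(routes, flows, cost, budget=2)
sl_nsl, m_nsl = best_location(routes, flows, cost, budget=2, not_suitable={"a"})
```

In this toy instance, forbidding link `a` moves the sensors to links `b` and `c` without reducing the distinguished flow, which is the kind of effect constraint (6) is meant to explore.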


$$\min\_{f\_c, v\_a} Z = \sum\_{c \in C} U\_c^{-1} \left( \frac{f\_c - f\_c^0}{f\_c^0} \right)^2 + \sum\_{a \in SL} Y\_a^{-1} \left( \frac{v\_a - \overline{v}\_a}{\overline{v}\_a} \right)^2 \tag{8}$$

subject to

$$\overline{w}\_s = \sum\_{c \in C} \beta\_s^c f\_c; \quad \forall s \in S \tag{9}$$

$$v\_a = \sum\_{c \in C} \delta\_a^c f\_c; \quad \forall a \in A \tag{10}$$

$$f\_c \ge 0; \quad \forall c \in C \tag{11}$$

$$v\_a \ge 0; \quad \forall a \in A \tag{12}$$

where *U*<sub>*c*</sub><sup>−1</sup> and *Y*<sub>*a*</sub><sup>−1</sup> are the inverses of the variance–covariance matrices corresponding to the flow of route *c* and the observed flow in link *a*; *ws* is the observed flow in each set *OSCSL*; *fc* is the estimated flow of each route in set *C*; and β<sup>*c*</sup><sub>*s*</sub> and δ<sup>*c*</sup><sub>*a*</sub> are the elements of the incidence matrices relating the observed link sets *s* and the links *a*, respectively, to the routes.
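The structure of objective (8) and constraint (10) can be illustrated with a short Python sketch that evaluates *Z* for a candidate set of route flows. It is a simplified illustration, not the estimation procedure itself: it assumes diagonal variance–covariance matrices (so the inverse weights reduce to 1/variance), it omits constraint (9), and the names `link_flows` and `estimation_objective` are hypothetical.

```python
def link_flows(route_flows, route_links, links):
    """Constraint (10): v_a is the sum of the flows of the routes using link a."""
    return {
        a: sum(f for c, f in route_flows.items() if a in route_links[c])
        for a in links
    }


def estimation_objective(route_flows, ref_flows, route_links, observed, var_f, var_v):
    """Evaluate Z in (8), assuming diagonal variance matrices (illustrative only).

    route_flows : {route: estimated flow f_c}
    ref_flows   : {route: reference flow f_c^0}
    observed    : {scanned link: observed flow}
    var_f, var_v: {route: variance}, {link: variance}
    """
    v = link_flows(route_flows, route_links, observed)
    z_routes = sum(
        ((route_flows[c] - ref_flows[c]) / ref_flows[c]) ** 2 / var_f[c]
        for c in route_flows
    )
    z_links = sum(
        ((v[a] - observed[a]) / observed[a]) ** 2 / var_v[a]
        for a in observed
    )
    return z_routes + z_links


# Toy data (hypothetical): two routes sharing the scanned link "a".
route_links = {"c1": {"a", "b"}, "c2": {"a"}}
ref_flows = {"c1": 100.0, "c2": 50.0}
observed = {"a": 150.0}
var_f = {"c1": 1.0, "c2": 1.0}
var_v = {"a": 1.0}
z_ref = estimation_objective(ref_flows, ref_flows, route_links, observed, var_f, var_v)
z_alt = estimation_objective({"c1": 110.0, "c2": 50.0}, ref_flows, route_links,
                             observed, var_f, var_v)
```

A full solution of (8)–(12) would minimize this function over non-negative route flows; the sketch only shows how the two terms penalize deviations from the reference route flows and from the observed link flows.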

STEP 4: Check the quality of the solution. Once the flow estimation problem has been solved, the quality of the solution in absolute terms can be quantified as follows:

$$RMARE = \frac{1}{n} \sum\_{a \in A} \frac{\left| v\_a - v\_a^{real} \right|}{v\_a^{real}} \tag{13}$$

where *RMARE* is the root mean absolute value relative error; *n* is the number of links in the network; and *va* and *v*<sup>*real*</sup><sub>*a*</sub> are the estimated flow and the (assumed) real flow for link *a*. The error is calculated over the link flows since their number remains constant regardless of the network simplification and the *SL* set used. Each value of *RMARE* indicates the quality of the estimation obtained with the set *SL* for the traffic network. As said above, due to the complexity of the problem, it does not have a unique solution, so we propose to evaluate a great number of solutions in an iterative process. This iterative process, shown in Figure 9, is repeated from Step 1 for a number of iterations equal to the maximum considered, *itermax*. The *RMARE* value of the solution found in the first iteration is initially taken as the best; if a later iteration finds a solution with a lower *RMARE* value, that solution becomes the best. All the solutions found and tested in each iteration are stored in the *S*<sup>*iter*</sup><sub>*a*</sub> matrix, which grows in size as the process advances. Finally, the best solution or set *SLbest*, for the established conditions, is the one with the lowest *RMARE* value.
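Equation (13) and the selection of the best set can be sketched in a few lines of Python. The function names `rmare` and `best_solution`, the dictionary layout, and the toy flows are assumptions for illustration; the sketch also assumes that every "real" link flow is strictly positive, since the relative error is otherwise undefined.

```python
def rmare(estimated, real):
    """Equation (13): mean over all links of |v_a - v_a_real| / v_a_real."""
    if any(flow <= 0 for flow in real.values()):
        raise ValueError("real link flows must be positive")  # assumption of this sketch
    return sum(abs(estimated[a] - real[a]) / real[a] for a in real) / len(real)


def best_solution(candidates, real):
    """Return the label of the candidate SL set with the lowest RMARE.

    candidates : {label of SL set: {link: estimated flow}}
    """
    return min(candidates, key=lambda label: rmare(candidates[label], real))


# Toy data (hypothetical): two candidate solutions on a two-link network.
real = {"a": 100.0, "b": 200.0}
candidates = {
    "SL1": {"a": 110.0, "b": 180.0},
    "SL2": {"a": 100.0, "b": 190.0},
}
best = best_solution(candidates, real)
```

Here `SL2` is selected because its estimated flows are closer, in relative terms, to the assumed real flows.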

#### **3. The Application of the Proposed System in a Pilot Project**

In this section, the proposed low-cost system for traffic network analysis is applied in a pilot project on a real network to demonstrate its viability and to test the influence that some inputs of Algorithm 1 (i.e., the network modelling) have on the results of Algorithm 2 (i.e., the expected traffic flow estimation quality).

#### *3.1. Description of the Project and Particularities about the Position of Sensors in the Streets*

The network chosen to develop the pilot project was the traffic network of the University Campus of Ciudad Real (Spain), delimited in Figure 10, consisting of 75 nodes and 175 links. To consider the influence that the other districts have on this network, links connected to the contour were also modelled. With this, the set of links and its capacity and cost characteristics are established (these characteristics are available on request from the corresponding author). The O–D matrix *T̂ZiZj* was first defined, where each zone *Z* contains a certain set of nodes (see Table 2), resulting in a total of 15 zones as shown in Figure 10, while the reference O–D matrix by zones is shown in Table 3.

**Figure 10.** Traffic network to be modelled for analysis: Plan view of the urban area of Ciudad Real and the delimited Campus area to be modelled; Ciudad Real Campus network and its division in 15 traffic zones.


**Table 2.** List of nodes on the traffic network and their associated zones.

**Table 3.** O–D trip matrix per defined zones.


From a technical point of view, installing each traffic sensor in the streets of a city can be a complex task, depending on the configuration and characteristics of the network and the elements that form it. In the case of links or linear elements, the configuration of their infrastructure and the flow conditions, such as the distribution and intensity of flow along the day, may condition the choice of one location or another, or even determine whether or not a link is a candidate for a sensor. This section deals with the specific problems of installing the sensor in some links.

After a first test of the sensor in the streets of the project, we found a set of links whose physical characteristics may make the sensor difficult to install. As shown in violet in Figure 11a, this set is formed by the links that make up the external corridor connecting the ends of the network, which is one of the main arterials of the city. In this type of link (see images 1 and 2 in Figure 11b), vehicles can reach higher speeds and flow densities, which can make it difficult to capture the data: since these links have two lanes per direction, the license plate may not be read correctly due to occlusion by other vehicles. Installing sensors in links whose characteristics make this task difficult may involve higher installation and/or operating costs, and increases the possibility that the collected data contain errors that disturb the results of the analysis and the estimation of the remaining flows. The problems that wrong license plate readings may cause in the flow estimation results have been studied in detail in [20]. These links are nevertheless very important, since they carry the greatest vehicle flows as one of the main arterial corridors of the city. Therefore, the effects of not locating a sensor on them must be investigated.

**Figure 11.** (**a**) Representation of the corridor, in violet, where the greatest flows and difficulties in installing a sensor take place; (**b**) Images of several links on the network.

With respect to the rest of the links (see images 3 and 4 in Figure 11b), the results obtained from the sensor were very satisfactory, so all of them are suitable and hence eligible. Their flow conditions make the correct identification of vehicles feasible regardless of speed and flow density.

To sum up, the sensor location model described in Algorithm 2 has to consider the possibility of avoiding some links whose characteristics make the installation of a sensor difficult, even though they concentrate the greatest flow. This may have an impact on the results, since the model seeks the best estimate of flows in the network with the best combination of scanned links. This observation is considered in the sensor location model through constraint (6), which sets the binary variable of the arcs belonging to the *NSL* set to zero, so that they are not eligible for a sensor. Excluding these links could lead to better or worse estimation results, so an analysis is necessary to show whether, by avoiding them, the expected results of the traffic estimation remain similar. The next section deals with this analysis in more depth.

Finally, within the links that are suitable to be scanned, it is important to assess the different possible locations in order to obtain a correct reading of the license plates (see Figure 12). Here it is necessary to consider, among other factors, the orientation of the sensor with respect to the flow (i.e., recording from the front or the rear of the vehicle), the presence of fixed elements or obstacles that make it difficult to identify the vehicle, and the lighting.

**Figure 12.** Examples of installed sensors at strategic points: (**a**) Exit from an intersection and entrance to a link with two lanes; (**b**) Link with a single lane with unidirectional flow; (**c**) Entrance to a link with a single lane.

#### *3.2. Analysis of the Results*

To obtain the traffic network model, we have applied the proposed Algorithm 1, where, in addition to the input data described above, the important values of *k* and *Fthres* need to be defined. In order to check the effects of the network simplification (Steps 4 to 6) on the estimation results (obtained with Algorithm 2), it was decided to vary the value of the threshold flow *Fthres*, establishing values equal to 10, 15, 20, 25, and 30 trips per hour. Regarding the *k* parameter used in the enumeration model of Step 2, there are usually certain discrepancies between transportation analysts and engineers about its best value. High values are rare in the literature due to the high computational cost that they would entail, and also because the existence of more than 3–4 routes per O–D pair is very unlikely [39]. For this pilot project, reasonable values of *k* equal to 2 and 3 were selected, whose effects on the results are analyzed in the next sections. In Step 1, the 15 × 15 matrix by zones *T̂* shown in Table 3 was transformed into a matrix discretized by nodes *T* with size 75 × 75, resulting in a total of 608 O–D pairs. To obtain the set of existing routes assumed to be "real", the node-based O–D matrix *T* was perturbed by a uniform random number (0.9–1.1), and the assignment was carried out with *k* = 4 to obtain the respective "real" link flows.
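The generation of the "real" demand can be sketched as follows. This is a minimal illustration that assumes the perturbation is multiplicative and drawn independently for each O–D pair, which the text does not state explicitly; the function name `perturb_od` and the `seed` argument are hypothetical.

```python
import random


def perturb_od(od, low=0.9, high=1.1, seed=None):
    """Multiply each O-D entry by an independent uniform factor in [low, high].

    od : {origin: {destination: trips}}
    The multiplicative, per-pair perturbation is an assumption of this sketch.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    return {
        i: {j: trips * rng.uniform(low, high) for j, trips in row.items()}
        for i, row in od.items()
    }


# Toy node-based matrix (hypothetical values).
od = {1: {2: 40.0, 3: 10.0}, 2: {1: 25.0, 3: 5.0}}
od_real = perturb_od(od, seed=1)
```

Each "real" entry then lies within ±10% of its reference value, so the reference matrix used by the model is close to, but not identical with, the demand that generates the assumed real flows.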

Regarding Algorithm 2, some of its inputs come from the outputs of Algorithm 1 (the traffic network and the routes). Due to the budget restrictions of the project (related to the *B* parameter), a total of 30 sensors was set to be used in the network. Therefore, for the different models studied, a fixed value of *B* equal to 30 has been considered. Note that, in relation to the number of links in the network, this quantity may be insufficient to obtain total observability, but it is interesting to see how good an estimation can be obtained.

To sum up, in this section, three analyses of results are carried out:


#### 3.2.1. Effect of Varying the *k* Parameter

Varying the *k* parameter means that a larger or smaller number of routes is considered in the modelled traffic network, which forms part of the information with which the model must work. This parameter has been handled in this project through a route enumeration algorithm, selecting values of 2 and 3 for the example presented. For this first analysis, a scenario with little network simplification has been considered, with *Fthres* equal to 10, i.e., all the nodes that attract or generate fewer than 10 trips lose their condition of origin and/or destination.
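The role of *k* can be illustrated with a generic route enumeration sketch. The enumeration model actually used in Step 2 is not detailed in this section, so the following Python code is only an assumption-laden stand-in: a best-first search that returns up to *k* loop-free routes in order of increasing cost, with a hypothetical function name `k_shortest_routes` and a toy graph. Its running time grows quickly with network size, so it is meant for small examples only.

```python
import heapq


def k_shortest_routes(graph, origin, destination, k):
    """Return up to k loop-free routes from origin to destination by increasing cost.

    graph : {node: {neighbour: non-negative link cost}}
    Partial paths are expanded in order of accumulated cost, so complete
    routes are found from cheapest to most expensive.
    """
    queue = [(0.0, [origin])]
    found = []
    while queue and len(found) < k:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == destination:
            found.append((cost, path))
            continue
        for nxt, link_cost in graph.get(node, {}).items():
            if nxt not in path:  # keep routes loop-free
                heapq.heappush(queue, (cost + link_cost, path + [nxt]))
    return found


# Toy network (hypothetical): three possible routes from A to D.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
    "D": {},
}
routes_k2 = k_shortest_routes(graph, "A", "D", k=2)
routes_k3 = k_shortest_routes(graph, "A", "D", k=3)
```

With *k* = 2 only the two cheapest routes enter the model, while *k* = 3 adds the third one, which mirrors how a larger *k* enlarges the route set on which the location and estimation models operate.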

An important aspect studied in this first analysis is the consideration of a set of links *NSL* ⊂ *A* where the cost and difficulties of installing a scanning sensor are greater than in other links of the network. For the sake of brevity, we have decided to undertake a joint study of the influence of the *k* parameter and the effects of including some conflicting links in set *NSL*. In the first scenario (Model A), all the links have the same opportunity of receiving a sensor, which means that all the links have the same cost *Pa* equal to 1. In the second scenario (Model B), a certain set of links (those corresponding to the corridor shown in Figure 11a) is included in set *NSL*, so a sensor cannot be installed on them.

Figure 13 shows the effects of these considerations on the results of the model. There are four lines, clearly grouped in pairs, one pair with *k* equal to 2 and the other with *k* equal to 3. Each jump in the graph means that the location model has found a better set of scanned links *SL* that improves the solution in terms of error, and the horizontal sections mean that the model has not been able to find a better solution.

It is observed that, with a higher value of *k*, the results of the model are better in terms of the error in the estimation of traffic flows. We clearly see how *k* = 3 gives considerably better results than *k* = 2, due to the higher number of routes per O–D pair. In particular, with *k* = 2 we are operating with a set *R* of 2074 routes, as opposed to 2943 routes with *k* = 3. To define the set of "real" routes and their associated flows, a value of *k* = 4 has been considered, resulting in a set of 4274 routes in total.

**Figure 13.** Evolution of *RMARE* value for the different cases varying the *k* parameter and the threshold flow.

The most interesting result arises when Model A and Model B are compared. Despite including a certain number of links in set *NSL*, both models reach almost the same *RMARE* value. We therefore see that, in this particular case, allowing or not allowing sensors on certain links does not produce a relevant difference in the estimation error. In view of this, the following analysis only considers Model B, to avoid installation problems.

Table 4 shows the best *SL* sets obtained for each case after completing the iterative process. The links that are common to both sets are marked in bold: a certain number of them remain fixed, while the others change due to the modification of the location model through constraint (6). This is clearly seen in Figure 14, where the optimal locations of the sensors are outlined on 30 of the links that make up the network. A set of sensors, marked in blue, is located on links of *NSL* when Model A is used. For Model B, those sensors are moved to other links, now marked in orange. This change in location leads to an improvement in the estimation results, indicating that there would be no problem in locating sensors on links which, despite having a lower flow, offer a greater probability of obtaining data with fewer errors.

**Table 4.** Best sets of links (*SL*) obtained from the variability analysis of the *k* parameter.


**Figure 14.** Upper figure: Optimal sensor locations considering *k* = 3, *Fthres* = 10, and all links as candidates to be scanned; Lower figure: Optimal sensor locations considering *k* = 3, *Fthres* = 10, and a certain set of links (*NSL*) as non-candidates to be scanned.

#### 3.2.2. Effect of Network Simplification

When the value of *Fthres* is small, the proposed methodology simplifies the network less, and it is therefore expected to lead to a lower error in the estimation of traffic flows. As *Fthres* increases, there is a greater degree of simplification, and therefore a greater error in the estimation. Figure 15 shows this effect for all the cases modelled with *k* = 3. It is observed that lower *Fthres* values, and therefore less simplification, tend to give smaller error values. In any case, for each *Fthres* value, the curves reach a certain convergence after multiple iterations of the proposed location model. For example, the best solutions for *Fthres* equal to 10 and 15 differ only minimally in error, as do those for *Fthres* equal to 20, 25, and 30.
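The effect of the threshold can be made concrete with a small Python sketch that identifies which nodes would lose their condition of origin or destination for a given *Fthres*. It only covers this identification step: the transfer of their demand to adjacent nodes is not reproduced, and the function name `nodes_below_threshold`, the data layout, and the toy matrix are assumptions for illustration.

```python
def nodes_below_threshold(od, f_thres):
    """Nodes whose generated or attracted trips fall below the threshold flow.

    od : {origin: {destination: trips}}  node-based O-D matrix
    Returns (nodes losing origin status, nodes losing destination status).
    Nodes with no trips at all are ignored, since they have no status to lose.
    """
    generated = {i: sum(row.values()) for i, row in od.items()}
    attracted = {}
    for row in od.values():
        for j, trips in row.items():
            attracted[j] = attracted.get(j, 0.0) + trips
    lose_origin = {i for i, g in generated.items() if 0 < g < f_thres}
    lose_destination = {j for j, a in attracted.items() if 0 < a < f_thres}
    return lose_origin, lose_destination


# Toy node-based matrix (hypothetical values, trips per hour).
od = {
    1: {2: 8, 3: 1},
    2: {1: 30, 3: 4},
    3: {1: 2, 2: 3},
}
origins_10, destinations_10 = nodes_below_threshold(od, f_thres=10)
origins_5, destinations_5 = nodes_below_threshold(od, f_thres=5)
```

In this toy case, raising the threshold from 5 to 10 trips per hour removes more nodes as origins or destinations, which is why larger *Fthres* values produce a more simplified network.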

**Figure 15.** Evolution of *RMARE* value for the different cases varying the threshold flow.

The effects of varying the threshold flow are shown in Table 5. Its first column identifies each evaluated case; the second gives the number of routes in set *R*, which is the same for all cases since the same value *k* = 3 was used; the third gives the number of routes in set *Q* once set *R* has been simplified with the corresponding value of *Fthres*; the fourth gives the number of additional routes included when locating the sensors in the best set obtained for each case; and the last gives the total number of routes in *C* used in the estimation model. The table shows that, with less simplification, the set of routes in *C* is larger, and therefore the estimations are better. As the simplification increases, more routes are removed, so locating the sensors recovers a greater number of routes, but the resulting sets *C* remain of a similar order of magnitude.

**Table 5.** Number of routes that appear according to the considered *Fthres* value.


Finally, Table 6 indicates the *SL* sets obtained for the cases where *Fthres* is equal to 10 and 15, since they offer the best results and show little difference in the best *RMARE* estimation error. The two sets differ in only 8 of the 30 links considered, the rest being common to both. For both sets, constraint (6) is considered in the location model, and therefore no links belonging to *NSL* appear. Furthermore, this shows that, depending on how the network is modelled, one set or another may be obtained, with small differences that may nevertheless influence the observability and estimation results of the network.

