4.2. Performance Evaluation Results
In this subsection, we present the simulation results from several perspectives, obtained under the simulation settings described above.
First, we derive the local model performance for each edge along the river while increasing the size of D. As D becomes larger, the accuracy of the local model trained on a single edge increases. As shown in Figure 6, the local model's accuracy rises sharply while the size of D is between 20 and 50. Once D exceeds 50, the accuracy improves only slightly and converges from D = 100 onward. Because the number and sampling frequency of the smart sensors generating water quality observation data in the Keum river are fixed, the total amount of data in the entire system is also fixed. Therefore, it is necessary to set an appropriate value of D in advance. Based on the result in Figure 6, the simulation targets a setting in which the local model of each edge guarantees the required accuracy.
We evaluate the scheduling performance based on the value of I, which indicates how many edges have enough data to learn their local models in the system. As the condition in the proposed scheduling algorithm's equation specifies, I becomes 1 for an edge only when that edge receives at least D data, and the size of D guarantees the corresponding local model accuracy. Therefore, the higher the value of I, the more edges obtain sufficient training data through the matching method used in the experiment, and each such edge can train a green tide prediction model that participates in global model generation. We run the simulation for 50 iterations and compare the performance of the proposed optimally fair scheduling algorithm against a baseline matching (i.e., a randomly matched method) that ignores whether D is filled.
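The participation condition above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the sample counts are hypothetical, and D is the training initiation threshold from the text.

```python
# Sketch of the participation indicator: an edge contributes I = 1 only when
# it has accumulated at least D samples, and the scheduler's objective is to
# maximize the number of such edges.

def participation_indicators(edge_data_counts, D):
    """Return the indicator I for each edge: 1 if the edge holds >= D samples, else 0."""
    return [1 if n >= D else 0 for n in edge_data_counts]

counts = [12, 55, 20, 103, 49]              # samples held by 5 edges (illustrative)
indicators = participation_indicators(counts, D=50)
print(indicators, sum(indicators))          # sum = number of edges able to train
```

The sum of the indicators is the quantity plotted per iteration in Figure 7.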
Figure 7 shows the performance results of this evaluation. The results for different sizes of D are distinguished by marker type and line color, and the black dashed line with a plus-shaped marker represents the baseline. The baseline mechanism lets each edge use all of its received data, even if the data are too few for suitable local learning, so its value is always the largest and equals the number of edges in the system. On the other hand, the proposed matching algorithm, which requires every edge to satisfy the size of D, shows performance differences depending on the value of D. In Figure 7, when the training initiation threshold D is set to 20, the proposed scheduling algorithm always matches the baseline performance. The larger the value of D, the more data each edge must receive from the sensors to meet the condition, while the amount of data the sensors produce in each iteration is constant. For these reasons, the number of edges that satisfy D in each period decreases compared to the results with a small D. Even in this situation, however, the proposed algorithm matches the sensors and the edges in an optimally fair direction and produces as many federated learning participants (i.e., edges) as possible.
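The contrast between the two matching policies can be illustrated with a small sketch. This is a greedy approximation of the "fill D first" idea under simplifying assumptions (each sensor delivers one batch per iteration; function names are hypothetical), not the authors' exact optimally fair formulation.

```python
import random

def random_matching(sensor_batches, num_edges, rng):
    """Baseline: each sensor's batch goes to a uniformly random edge."""
    loads = [0] * num_edges
    for batch in sensor_batches:
        loads[rng.randrange(num_edges)] += batch
    return loads

def fair_matching(sensor_batches, num_edges, D):
    """Greedy sketch: route each batch to the edge closest to (but still
    below) the threshold D, so as many edges as possible cross D."""
    loads = [0] * num_edges
    for batch in sorted(sensor_batches, reverse=True):
        below = [i for i in range(num_edges) if loads[i] < D]
        if below:
            target = max(below, key=lambda i: loads[i])
        else:  # every edge already satisfies D; balance the remainder
            target = min(range(num_edges), key=lambda i: loads[i])
        loads[target] += batch
    return loads

batches = [10] * 20                          # 20 sensor batches of 10 samples
fair = fair_matching(batches, num_edges=5, D=50)
rand = random_matching(batches, num_edges=5, rng=random.Random(0))
print(fair, sum(1 for n in fair if n >= 50))  # 4 edges reach D = 50
```

With a fixed data budget, the greedy policy concentrates batches until an edge crosses D before serving the next edge, which is why fewer but fully qualified participants emerge as D grows.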
We also measure the number of edges that satisfy the training threshold D, together with the actual data size of each edge, over the 50 iterations. Table 3 shows the accumulated number of edges that satisfy the condition for each value of D, tested with 5 edges located at the Backjae barrage and the Gongju barrage. Random matching, shown in gray, lets all edges start local learning with whatever data they receive, regardless of size; it also yields the widest variety of data distributions. If D = 20, all 5 edges meet the threshold in every iteration. If D = 50, most edges satisfy the D value, consistent with Figure 7, and in the end approximately 220 accumulated edges hold more than D data. If D = 100, as seen in Figure 7, only two or three edges meet D on average, and fewer than half of the accumulated edges hold 100 or more data.
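The accumulated count reported in Table 3 can be computed as follows; the function name and the iteration history are illustrative, assuming the table counts (edge, iteration) pairs that meet the threshold.

```python
def accumulated_satisfying_edges(per_iteration_loads, D):
    """Count, over all iterations, how many (edge, iteration) pairs hold
    at least D samples -- the quantity tabulated per D in Table 3."""
    return sum(1 for loads in per_iteration_loads for n in loads if n >= D)

# 5 edges over 3 illustrative iterations (the paper uses 50 iterations)
history = [
    [60, 55, 20, 70, 10],
    [50, 50, 50, 40, 30],
    [100, 20, 60, 55, 50],
]
print(accumulated_satisfying_edges(history, D=50))
```

With 5 edges and 50 iterations, the maximum possible accumulated count is 250, which matches the totals discussed around Table 3.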
Figure 8 shows the amount of data the edges hold, plotted against the cumulative number of edges that meet the D condition. For D = 20, edges obtain between 30 and 55 pieces of data and always exceed the threshold value of 20. For D = 50, most of the cumulative edges receive at least as much data as the value of D, and in some cases more than D. If D = 100, every edge in Figure 8 that satisfies the condition holds at least 100 pieces of data, which applies to approximately 110 cumulative edges, as seen in Table 3. The remaining roughly 140 of the 250 total edges always receive fewer than 100 pieces of data and fail to perform the learning.
Model accuracy is also tested using the real data received by each edge under the scheduling results for each D. We derive the CDF of the performance of the local model generated by each edge. In Figure 9, the results for different sizes of D are distinguished by line color and marker shape; each marker represents the cumulative distribution of model accuracy at the corresponding x-axis value. For D = 100, both the Backje barrage and the Gongju barrage show a concentration of accuracy between 90 and 100, with a relatively even distribution compared to the other cases. For D = 20, there are models with an accuracy below 40, which matches the data distribution between 30 and 50 in Figure 8. For D = 50, the models generally achieve an accuracy above 80 and, in particular, the number of edges increases rapidly in the upper performance range.
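The CDF in Figure 9 is a standard empirical distribution over the per-edge accuracies; a minimal sketch (with illustrative accuracy values, not the paper's data) is:

```python
def empirical_cdf(accuracies):
    """Return (x, F(x)) pairs for the empirical CDF of local-model accuracy:
    F(x) is the fraction of models whose accuracy is <= x."""
    xs = sorted(accuracies)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

print(empirical_cdf([92, 85, 97, 78, 90]))
```

Plotting these pairs per value of D, with one line color and marker shape per D as in Figure 9, reproduces the comparison described above.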
Consequently, the results of the proposed scheduling algorithm show that the case of D = 50 strikes the appropriate balance between the number of edges able to perform local learning and their accuracy. In Table 4, when D has a value of 50, the largest number of edges over all iterations satisfies the target performance. When D has a small value such as 20, the proposed scheduling shows a ratio similar to that of random scheduling in Table 4; although almost all edges are allowed to start training, the target performance of the edges is not always guaranteed. The experiments also show that if D is too large, high performance models can always be obtained; however, in this case, each edge must receive more than the threshold value D, so only a few edges are able to perform learning.