Adam Optimizer

Next, we employed Adam as the optimizer. Adam derives from the phrase "adaptive moment estimation". It is an optimization algorithm that can be used to update network weights iteratively based on training data, in place of the traditional stochastic gradient descent procedure. Adam offers several advantages for non-convex optimization problems: its implementation is simple; it is computationally efficient; its memory requirements are modest; its updates are invariant to diagonal rescaling of the gradients; it is well suited to problems involving large amounts of data and/or many parameters; it is a good choice for non-stationary objectives; it handles exceedingly noisy or sparse gradients well; and its hyper-parameters have intuitive interpretations and usually require little tuning. Adam is an extension of stochastic gradient descent that combines the benefits of two earlier extensions, the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). Adam is a popular deep learning method because it produces good results swiftly. The results were plotted and reviewed during the evaluation and visualization phase. The model configuration of the experiments is shown in Table 10.
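To make the moment-estimation idea concrete, the following is a minimal, self-contained sketch of a single-parameter Adam update, written by us for illustration (it is not the framework implementation used in the experiments). The default hyper-parameters follow the original Adam paper: learning rate 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8.

```python
def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter theta at time step t."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Toy example: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

Because the bias-corrected ratio m_hat / sqrt(v_hat) is close to 1 when the gradient sign is stable, the effective step size stays near the learning rate, which is what makes the per-parameter step magnitudes easy to reason about.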



### *4.6. Voice Module*

A multiplicity of application programming interfaces (APIs) are now accessible for a variety of activities that formerly required significant programming effort on the part of developers. When working with audio file data, the job becomes more challenging. As a result, we relied on Google's speech-to-text engine [147], which can transcribe audio while maintaining its context and language. The API supports up to 120 languages. Other functions include voice command and control, call center audio transcription, real-time streaming, pre-recorded audio processing, and others. Conversely, the Google text-to-speech tool can translate written text into grammatically and contextually appropriate speech using a range of natural voices. The Google text-to-speech API enables developers to interact with customers through speech user interfaces in devices and applications and to customize the communication according to voice and language preferences.

The Voice Module, for example, allows the user to generate the dataset and transition between the different development and operation phases using voice commands. To begin producing a dataset, the user says the command "Train". The system then asks the user "What is the obstacle class?" in order to classify the incoming data. The user specifies the obstacle, such as "Wall". The system then asks the user to "Specify the dataset file name". Finally, the user speaks the file name.
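The "Train" dialogue above can be sketched as a simple prompt-and-reply flow. The function and prompt strings below are our own illustrative stand-ins (with replies passed in as text, standing in for transcribed speech); they are not taken from the LidSonic source code.

```python
def train_dialogue(answers):
    """Walk the dataset-creation dialogue, reading user replies from `answers`."""
    replies = iter(answers)
    transcript = ["System: What is the obstacle class?"]
    obstacle_class = next(replies)             # e.g. "Wall"
    transcript.append(f"User: {obstacle_class}")
    transcript.append("System: Specify the dataset file name")
    file_name = next(replies)                  # spoken file name, transcribed
    transcript.append(f"User: {file_name}")
    return {"class": obstacle_class, "file": file_name, "transcript": transcript}

# Hypothetical session: the user trains a "Wall" dataset.
session = train_dialogue(["Wall", "wall_ds1"])
```

In the real system, each `next(replies)` would be replaced by a call that records audio and passes it to the speech-to-text engine.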

### **5. Performance Evaluation**

We now analyze the performance of the LidSonic V2.0 system: Section 5.1 discusses the performance using the machine learning models and Section 5.2 discusses the system performance using the deep learning models.

### *5.1. Machine-Learning-Based Performance*

Several metrics defined in the Weka software can be computed for a model and are useful for measuring its performance. Accuracy is defined as the percentage of correctly classified instances. Precision is defined as the percentage of predicted positives that were correctly classified. Table 11 displays the accuracy and precision of the six machine learning models, adopting three classifiers to construct models from the two datasets, DS1 and DS2.
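The two definitions above translate directly into code. The sketch below is our own minimal implementation with made-up labels (Weka additionally reports weighted averages across classes, which this sketch does not):

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive):
    """Fraction of instances predicted as `positive` that truly are."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)

# Toy labels: 2 of 4 predictions correct; 2 of 3 "Wall" predictions correct.
y_true = ["Wall", "Floor", "Wall", "Wall"]
y_pred = ["Wall", "Wall", "Wall", "Floor"]
acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred, "Wall")
```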


**Table 11.** Evaluation of the Machine Learning Models.

The results are depicted in Figure 16. They indicate that using DS2 with the Random Committee and IBk classifiers increases accuracy to 95% and 95.15%, respectively, with an equal precision of 95.2% for both. The KStar classifier, on the other hand, achieves higher accuracy (95.44%) and precision (95.6%) when using DS1.

Figure 17 plots the model training times required for the top three classifiers. Building the classification models took longer with DS1 than with DS2. Using DS1, RC required the longest time for building (299 ms), followed by KStar (42 ms) and IBk (1 ms). For DS2, RC required the longest time for building (128 ms), followed by KStar (11 ms) and then IBk (1 ms). While the RC classifier requires the longest time to build its model, it requires the shortest time to predict an object (see Figure 18).

**Figure 17.** Time Required to Build a Model for each Classifier.

**Figure 18.** Classifiers' Inference Elapsed Time in Milliseconds for (**a**) DS1 Test Samples and (**b**) DS2 Test Samples.

Figure 18 plots the model inference times for the top three classifiers. For both the DS1 and DS2 test samples, we ran 10 test predictions and timed how long each classifier took to produce an outcome. The time required for KStar to predict an object varied between 22 and 55 milliseconds when using the DS1 test samples and between 9 and 13 milliseconds when using the DS2 test samples. The IBk classifier, on the other hand, was faster than KStar, with an average of 4–5 milliseconds for the DS1 test samples and 0–1 milliseconds for the DS2 test samples. Random Committee required a substantial amount of time to build its training model, but it predicted the test samples in less than 1 millisecond for both the DS1 and DS2 test samples.
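Per-prediction inference time can be measured with a simple wall-clock harness such as the one below. This is our own illustrative timing sketch, not the Weka instrumentation used in the paper, and `dummy_predict` is a stand-in for a trained classifier.

```python
import time

def time_predictions(predict, samples):
    """Return the elapsed time in milliseconds for each individual prediction."""
    times_ms = []
    for sample in samples:
        start = time.perf_counter()
        predict(sample)                      # classifier call being timed
        times_ms.append((time.perf_counter() - start) * 1000)
    return times_ms

# Stand-in classifier: any callable mapping a sample to a class label works.
dummy_predict = lambda sample: "Wall"
elapsed = time_predictions(dummy_predict, range(10))
```

Timing each of the 10 predictions separately, rather than averaging one batch, is what exposes the per-sample variation (e.g. KStar's 22–55 ms range on DS1) reported above.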

It is worth mentioning that the IBk and KStar algorithms trained and generated models faster than the Random Committee algorithm (see Figure 17). Random Committee required 299 milliseconds to create its trained model using DS1, whereas for DS2 it required 128 milliseconds. KStar also required a comparatively long time to construct its training model, taking 42 ms for DS1 and 11 ms for DS2. On the other hand, for both DS1 and DS2, the IBk classifier built the trained model in 1 ms. As a consequence, we suggest that the IBk classifier is preferable over the Random Committee and KStar classifiers for the purposes of mobile adaptation, embedded microprocessors, and/or large datasets. Although there is a modest accuracy trade-off, the IBk classifier delivers a significant decrease in mobile computing and battery usage.

When designing a system architecture that takes advantage of fog or cloud resources in the training phase, we suggest the KStar classifier, especially in the case of a larger training dataset that will be trained in the fog or cloud layers, because it can exploit their computational processing power and return results to the edge level. In the case of the RC classifier, it can be trained in the higher layers (cloud or fog) and the constructed model can then be transferred to the edge, because its prediction time is the shortest among the three classifiers.

Figure 19 depicts the confusion matrix with the best training model score, which was obtained using the KStar classification method with the DS1 dataset. Figure 20 shows the confusion matrix of the Random Committee classifier trained on DS2. Figure 21 shows the confusion matrix for the IBk classifier. Because these were the top three highest-performing classifiers identified in our previous study, we chose to exhibit the confusion matrices of these three classifiers. The abbreviations used in the figures are listed in Table 12.

**Table 12.** Class Abbreviations.


**Figure 19.** KStar Confusion Matrix using DS1.

**Figure 20.** Random Committee Confusion Matrix Using DS2.

**Figure 21.** IBk Confusion Matrix Using DS2.

Figure 19 plots the confusion matrix of the KStar classifier. The highest number of true positives was obtained for the High Obstacle class (144), followed by Floor (109), Ascending Stairs (108), Wall (87), Descending Stairs (77), Ascending Step (55), Descending Step (49), and Deep Hole (20). The total numbers of wrong predictions for the classes are Floor (2), Ascending Stairs (0), Descending Stairs (3), Wall (0), Deep Hole (0), High Obstacle (5), Ascending Step (7), and Descending Step (5). Obviously, the number of wrong predictions should be considered relative to the total number of instances. It is possible that the higher number of wrong predictions for some classes is due to the low number of instances of the data objects for those classes. Note also that Floor was misclassified 2 times as Descending Step. Descending Stairs were misclassified 3 times as Floor and 9 times as Descending Step. The High Obstacle class was misclassified 2 times as Ascending Stairs and 3 times as Ascending Step. Ascending Step was misclassified 2 times as Floor, 2 times as Ascending Stairs, 1 time as Descending Stairs, and 2 times as High Obstacle. Descending Step was misclassified 3 times as Floor and 2 times as Ascending Step. On the other hand, the Ascending Stairs, Wall, and Deep Hole classes had no misclassified results.
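The per-class counts discussed above can be read off the predictions programmatically. The sketch below is our own illustration using a small made-up subset of labels, not the actual DS1 results:

```python
from collections import Counter

def confusion_counts(y_true, y_pred, label):
    """Return (true positives, per-class wrong-prediction counts) for one class."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    # For instances whose true class is `label`, count where they were sent instead.
    wrong = Counter(p for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, wrong

# Toy data: one Floor instance is mistaken for a Descending Step.
y_true = ["Floor", "Floor", "Floor", "Wall"]
y_pred = ["Floor", "Floor", "Descending Step", "Wall"]
tp, wrong = confusion_counts(y_true, y_pred, "Floor")
```

This is exactly the reading used in the text: `tp` is a diagonal entry of the confusion matrix, and `wrong` collects the off-diagonal entries of that class's row.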

Figure 20 plots the confusion matrix for the Random Committee (RC) classifier. The highest number of true positives was obtained for the High Obstacle (146) and Ascending Stairs (108) classes, followed by Floor (104), Wall (86), Descending Stairs (83), Ascending Step (58), Descending Step (45), and Deep Hole (17). The total number of wrong predictions for the classes are Floor (7), Ascending Stairs (0), Descending Stairs (6), Wall (1), Deep Hole (3), High Obstacle (3), Ascending Step (4), and Descending Step (9). Note that Floor was misclassified 2 times as Descending Stairs, 3 times as Ascending Step, and 2 times as Descending Step. Descending Stairs were misclassified 2 times as Floor and 4 times as Descending Step. The Wall class was misclassified 1 time as High Obstacle. Deep Hole was misclassified 2 times as Descending Stairs and 1 time as Descending Step. The High Obstacle class was misclassified 2 times as Wall and 1 time as Ascending Step. Ascending Step was misclassified 2 times as Floor, 1 time as Ascending Stairs, and 1 time as Wall. Descending Step was misclassified 6 times as Floor, 2 times as Descending Stairs, and 1 time as Ascending Step. Ascending Stairs had no misclassifications.

Figure 21 plots the confusion matrix for the IBk classifier. The highest number of true positives was obtained for the High Obstacle (141) and Ascending Stairs (106) classes, followed by Floor (102), Wall (86), Descending Stairs (80), Ascending Step (56), Descending Step (47), and Deep Hole (19). The total numbers of misclassifications for the classes were Floor (9), Ascending Stairs (2), Descending Stairs (9), Wall (1), Deep Hole (1), High Obstacle (8), Ascending Step (6), and Descending Step (7). The greater numbers of incorrect predictions for some classes are likely due to the low number of instances of the data objects for those classes, as we saw with the KStar classifier. Note that Floor was misclassified 4 times as Ascending Step and 5 times as Descending Step. Ascending Stairs were misclassified 2 times as High Obstacle. The Descending Stairs class was misclassified 2 times as Floor and 7 times as Descending Step. The Wall class was misclassified 1 time as High Obstacle and zero times as any other class. Deep Hole was misclassified 1 time as Descending Step and zero times as any other class. The High Obstacle class was misclassified 1 time as Ascending Stairs, 1 time as Wall, and 6 times as Ascending Step. Ascending Step was misclassified 3 times as Floor, 2 times as Ascending Stairs, and 1 time as Descending Stairs. Descending Step was misclassified 5 times as Floor, 1 time as High Obstacle, and 1 time as Ascending Step. Wall and Deep Hole had the least number of misclassifications.

### *5.2. Deep-Learning-Based Performance*

Observing the performance of neural networks and deep learning models over time during training helps researchers understand their behavior. Keras is a Python framework that encapsulates the more technical TensorFlow backend and provides a clear interface for building deep learning models. We used Keras in Python to evaluate and visualize the performance of the deep learning models over time during training, measuring their accuracy and loss. Note that, here, the deep learning models were trained and executed on a laptop device. Future work will attempt to implement deep learning models on mobile phones and other edge devices using TFLite, as in our other strands of research [148]. Table 13 summarizes the findings.
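Keras records per-epoch accuracy and loss in the history object returned by `fit`, which is what the curves in Figures 22 and 23 are drawn from. The sketch below mimics that bookkeeping with a tiny hand-rolled logistic-regression training loop on made-up one-dimensional data; it is illustrative only, and is not the Keras/TensorFlow model architecture used in the paper.

```python
import math

def train(xs, ys, epochs=50, lr=0.5):
    """Full-batch gradient descent, logging loss and accuracy each epoch."""
    w, b = 0.0, 0.0
    history = {"loss": [], "accuracy": []}
    for _ in range(epochs):
        gw = gb = loss = correct = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))        # sigmoid prediction
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            gw += (p - y) * x                           # gradient accumulation
            gb += (p - y)
            correct += (p > 0.5) == (y == 1)
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
        history["loss"].append(loss / len(xs))          # mean cross-entropy
        history["accuracy"].append(correct / len(xs))
    return history

# Linearly separable toy data: loss should fall and accuracy should rise.
hist = train([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
```

Plotting `history["loss"]` and `history["accuracy"]` against the epoch index yields exactly the kind of training curves shown for TModel1 and TModel2.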

**Table 13.** TensorFlow Model Evaluation.


The TModel1 and TModel2 accuracy and loss results were plotted for each epoch and are presented in Figures 22 and 23, respectively. Using different datasets, we can see significant differences in performance. TModel2, trained on DS2, achieves a training accuracy of 98.01% with a training loss of 0.0883; on the test dataset, its accuracy is 96.49% with a loss of 0.3672. TModel1 achieves a training accuracy of 88.05% with a loss of 0.3374; its test accuracy is 76.32% with a loss of 0.7190.

**Figure 22.** Deep Learning Evaluation Using DS1: (**a**) TModel1 Accuracy and (**b**) TModel1 Loss.

**Figure 23.** Deep Learning Evaluation Using DS2: (**a**) TModel2 Accuracy and (**b**) TModel2 Loss.

Figure 24 plots the confusion matrices of the TModel1 and TModel2 deep learning models used on a test dataset that the trained models were not exposed to. For TModel1, the highest number of true positives was obtained for Floor (20) and Ascending Stairs (18), followed by Wall and High Obstacle (15 each), Descending Stairs (9), Descending Step (5), Ascending Step (4), and Deep Hole (1). The total numbers of wrong predictions for the classes are Floor (13), Ascending Stairs (2), Descending Stairs (3), Deep Hole (1), High Obstacle (2), Ascending Step (3), and Descending Step (4). Wall was classified correctly every time. Floor was misclassified 1 time as Descending Stairs, 7 times as Ascending Step, and 4 times as Descending Step. Ascending Stairs was misclassified 2 times as High Obstacle. Descending Stairs was misclassified 1 time as High Obstacle and 2 times as Ascending Step. Deep Hole had one misclassification as Descending Step. High Obstacle was misclassified 1 time as Floor and 1 time as Ascending Stairs. Ascending Step was misclassified 1 time as Floor, 1 time as Ascending Stairs, and 1 time as High Obstacle. Descending Step was misclassified 2 times as Floor, 1 time as Descending Stairs, and 1 time as Ascending Step.

**Figure 24.** Deep Learning Confusion Matrix on the Test Dataset: (**a**) TModel1 and (**b**) TModel2.

For TModel2, the highest number of true positives was obtained for the High Obstacle (25) and Ascending Stairs (20) classes, followed by Descending Stairs (17), Wall (12), Descending Step (9), Ascending Step (7), and Deep Hole (4). The total numbers of wrong predictions for the classes are Floor (1), Ascending Step (2), and Descending Step (1). Note that Floor was misclassified 1 time as Ascending Stairs. Ascending Step was misclassified 1 time as Floor and 1 time as Descending Stairs. Descending Step was misclassified 1 time as Floor. Ascending Stairs, Descending Stairs, Wall, Deep Hole, and High Obstacle had no misclassifications.
