1. Introduction
Modern mechatronic systems require flexible tools for design, control and data analysis. Significant challenges often arise from data uncertainty, the need for robustness against disturbances and the need for model-free approaches when processing real data or when a mathematical description of the problem is unavailable. Neural networks seem to be an appropriate and useful solution to a number of these issues. Today, due to the availability of efficient programmable devices and algorithms presented in the literature, interest in the implementation of neural processing in electric vehicles is also increasing [1,2,3]. This paper describes the application of modules based on different types of neural networks in a real model of a moving platform.
Presently, autonomous vehicles are becoming increasingly popular. Using such vehicles in industry and testing the systems in real-life applications is becoming more commonplace. Unmanned Aerial Vehicles (UAVs) have been developed for monitoring hazardous zones such as minefields or remote and inaccessible locations. Autonomous vehicles are also being tested for use in public areas; tests of unmanned shuttle buses and surveillance vehicles have been conducted. In those cases, the zone was strictly defined, and the vehicles had to react to changing conditions and obstacles. Autonomous platforms for delivery and warehouse movement are also being tested and developed [4,5,6,7,8].
The data processing in autonomous vehicles should be modeled on human brain functionality because it is impossible to predict all situations and issues while designing a control system. Control structures are needed that adapt to the driving environment and give the vehicles the capacity to infer commands from sensor data and gathered knowledge without driver (or external user) assistance. The environment, laws and driving habits differ all around the world, and it is impossible to create a universal database of driving rules (Figure 1). Autonomous self-learning cars are also expected to increase traffic safety, improve vehicle flow and reduce energy consumption [9,10,11]. According to SAE J3016, classes of vehicle automation are precisely defined. In Classes 4 and 5 of SAE J3016, the system is responsible for driving, reacting to obstacles and handling emergency situations [12]. The difference between them is the operational domain. While Class 5 vehicles are supposed to react in every environment, Class 4 vehicles are only autonomous in defined driving conditions [13]. While the higher classes indicate novel autonomous systems which are still under development, Classes 0–3 are already implemented in vehicles available on the market. Even though Class 0 represents vehicles without autonomous skills, sensor systems may be implemented to gather environmental data and display appropriate information for drivers. The higher classes introduce the ability to interpret sensor data in particular driving tasks, such as driving in parking lots or lane assistance. However, the driver remains the main supervisor; the systems are only assistants [14].
An increase in the popularity of neural networks in a variety of applications can be observed in the literature. The intended application often determines the size and structure of the neural network. Neural networks may be used as controllers, estimators and classifiers, which makes them very popular tools in the design of control structures [15,16,17,18,19,20,21]. With the development of hardware and high demand, new neural structures need to be developed. Deep convolutional networks are the most powerful structures and have outperformed fully connected multilayer perceptron neural networks in data analysis and image classification [22,23]. A comparison of efficiency shows that the input data for deep networks can be unprocessed; however, the hardware demands are incomparably higher. While multilayer perceptron neural networks are usually implemented on DSP processors and FPGA matrices [24,25], deep neural structures require special processing units or multi-core processors. The implementation of neural classifiers on powerful GPUs is also becoming more popular [26]. Nevertheless, this research attempts to implement different neural structures with low-cost hardware. The design and the applied simplifications needed to run a neural network on an 8-bit microcontroller are described in detail. The last part of the paper describes a simple deep network which was trimmed to work with an inexpensive specialized coprocessor in a RISC-V processor.
The goal of this work was to design and create a model of an autonomous platform capable of driving in a defined environment using low-cost hardware. Because of the restricted driving conditions (of Class 4), some assumptions could be made [27]. The first design problem was related to the potential of implementing Artificial Intelligence (AI) algorithms on Arduino development boards. The control structure utilizing neural computation was supposed to collect sensor data and commands to develop a self-learning path planner. The neural controller should cooperate with the neural image classifier. The control commands should be accessible in the microcontroller system. The control structure should be able to use neural computing to process the gathered sensor data and to operate the hardware, for example the DC motors. This paper provides not only a general review but also detailed descriptions of the design, the hardware used and the test results.
A microcontroller system consisting of three development boards was designed (Figure 2). The main controller was implemented on an Arduino MEGA 2560. The main advantage of this board (in this project) is the variety of accessible input/output ports and multiple communication modules (e.g., 4 UART serial ports). The main device was responsible for data collection, distance estimation and command processing. The motor controllers were driven by an Arduino UNO reacting to commands received via UART. The drive encoders and current sensors were wired directly, enabling low-level control. The vision system was developed with the most powerful development board, the Sipeed Maix Bit. It can control the camera and perform neural object classification. An in-vehicle network was created by connecting the mentioned boards using UART communication. To reduce the complexity of data transfer, simple commands were defined; a sketch of such a protocol is shown below.
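The exact command set is not reproduced here; as an illustration only, the following sketch shows how such a compact UART protocol could be framed and sent. The opcodes (b'F', b'S'), the value scaling and the port name are hypothetical placeholders, not the commands used on the platform.

```python
# Minimal sketch of a single-byte command protocol between the boards.
import serial  # pyserial

def send_command(port: serial.Serial, opcode: bytes, value: int) -> None:
    """Frame a command as [opcode][value] and push it over UART."""
    port.write(opcode + bytes([value & 0xFF]))

if __name__ == "__main__":
    uart = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=0.1)
    send_command(uart, b'F', 150)  # hypothetical: drive forward, scaled speed
    send_command(uart, b'S', 0)    # hypothetical: stop
```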
This paper consists of seven sections. After the introduction, which presents the main issues related to models of electric vehicles and indicates the main concept of the work (details of the autonomous model based on neural data processing), the neural network applied for the analysis of data from distance sensors is described. Next, the feed-forward neural network applied as a reference command manager is proposed. The following section deals with the adaptive speed controller implemented for the DC motors. Additionally, a deep learning technique is used for image classification. The points mentioned above complete the details of the neural modules used in the real model of the vehicle. Thus, a subsequent part of the manuscript focuses on the construction of the real platform. Then, tests and short concluding remarks are presented.
2. Neural Distance Estimator
The platform was equipped with two optical distance sensors mounted on the front. Their task was to measure the distance to the nearest obstacle. The first sensor was an analog optical sensor, a Sharp GP2Y0A41SK0F (whose measurement range is 0.04–0.30 m). The second one was a digital sensor, a VL53L0X (with a proper working distance of up to 2.50 m). Because of the optical technology, it was assumed that the real results may change with lighting conditions and the surface type. Tests were conducted to compare the results obtained with the sensors mentioned above against the actual distance to an obstacle. All sensors were placed on a stationary support, enabling the researchers to take all measurements without any accidental displacement. The results obtained during the test are shown in Figure 3. It should be noted that the achieved values do not define the properties of the tested sensors but provide an assessment of the implementation possibility in the described robot (in specific working conditions).
It should be noted that while the VL53L0X sends an explicit result in millimeters via the I2C protocol, the output of the Sharp sensor is a voltage level corresponding to the distance. After calculation of the voltage V_ADC (using the ADC, Analog to Digital Converter), the distance has to be calculated using Equation (1) [28,29,30,31]. The formula seems unsuitable for simple 8-bit controllers because of the fractions, exponentiation and multiplications (according to observations, it takes more than 1 ms to compute):

l_n = a · V_ADC^b  (1)

where l_n is the distance to the obstacle, V_ADC is the voltage measured and calculated by the ADC module, and a, b are the coefficients of the approximated sensor characteristic.
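To illustrate the cost of the direct calculation, the sketch below evaluates a power-law form of Equation (1). The coefficients a and b are illustrative placeholders for a fitted characteristic, not values taken from the paper.

```python
import math

def sharp_distance(v_adc: float, a: float = 12.08, b: float = -1.058) -> float:
    """Direct evaluation of a power-law sensor characteristic (Equation (1)).
    The floating-point division and pow() call are what make this expensive
    (more than 1 ms) on an 8-bit AVR core without an FPU."""
    return a * math.pow(v_adc, b)

print(sharp_distance(1.25))  # estimated distance for a 1.25 V reading
```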
The input vector of the designed neural structure consists of an ADC value and a VL53L0X sensor response. For training purposes, the real reference value of the distance was the target. A neural network structure was created and trained in Matlab software [32]. Because the neural network was designed to be implemented in a low-cost microcontroller system (i.e., Arduino), two hidden layers (with two nodes each) were implemented (Figure 4). For weight optimization, the Levenberg–Marquardt algorithm was used [33]. To implement the trained neural structure, it was necessary to deactivate the preprocessing of input data in Matlab. By default, normalization of the input vector is computed as follows:

x_norm = (y_max − y_min) · (x − x_min) / (x_max − x_min) + y_min  (2)

where y_max and y_min are the maximum and minimum values of the target matrix, respectively, and x_max, x_min are the maximum and minimum values of the input vector, respectively.
Unfortunately, without additional configuration, the normalization process is computed for every input row independently [34]. However, when the preprocessing was disabled, it was possible to easily implement the structure as a matrix calculation on the direct results obtained from the sensors. A sketch of this row-wise normalization is given below.
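A minimal sketch of the default row-wise normalization, assuming the standard [-1, 1] output range of Matlab's mapminmax:

```python
import numpy as np

def mapminmax_rows(x: np.ndarray, y_min: float = -1.0, y_max: float = 1.0) -> np.ndarray:
    """Row-wise min-max normalization, mirroring the default Matlab
    preprocessing described above (each input row is scaled independently)."""
    x_min = x.min(axis=1, keepdims=True)
    x_max = x.max(axis=1, keepdims=True)
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

# Example: two sensor rows (ADC counts and VL53L0X millimeters) are scaled
# to [-1, 1] independently of each other.
samples = np.array([[512.0, 300.0, 128.0],
                    [1200.0, 800.0, 250.0]])
print(mapminmax_rows(samples))
```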
Because of the hardware limits already mentioned, a linear activation function was chosen. Weights and bias values were extracted in matrix form and saved in the Arduino MEGA 2560. The MatrixMath library was utilized to generate results by conveniently calling direct functions instead of writing complex for loops. The neural distance estimator was tested, and the results are shown in Figure 5. Not only was an improvement in accuracy observed, but time efficiency was also obtained. The reason is the simplification of the calculation; only summation and multiplication are needed, as shown in Equations (3)–(5) below:

Y_1 = IW · X + b_1  (3)
Y_2 = LW · Y_1 + b_2  (4)
Y_3 = OW · Y_2  (5)

where Y_1, Y_2 and Y_3 are the outputs of the first, the second and the output layer, respectively; X is the input vector, IW is the matrix of input weights, LW is the matrix of the hidden layer weights, OW is the matrix of the output layer weights, and b_1, b_2 are the vectors of the biases of the input and the hidden layer.
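For illustration, a sketch of the forward pass of Equations (3)–(5) is given below; the weight matrices are random placeholders, since the values trained in Matlab are not reproduced in the paper.

```python
import numpy as np

# Illustrative weights only; random placeholders stand in for the trained
# IW, LW, OW, b1 and b2 matrices extracted from Matlab.
rng = np.random.default_rng(0)
IW, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)   # input layer (2 nodes)
LW, b2 = rng.normal(size=(2, 2)), rng.normal(size=2)   # hidden layer (2 nodes)
OW = rng.normal(size=(1, 2))                           # output layer

def estimate_distance(adc_value: float, vl53_mm: float) -> float:
    """Feed-forward pass of Equations (3)-(5) with linear activations:
    only matrix-vector products and additions, cheap enough for an MCU."""
    x = np.array([adc_value, vl53_mm])
    y1 = IW @ x + b1          # Equation (3)
    y2 = LW @ y1 + b2         # Equation (4)
    return float(OW @ y2)     # Equation (5)

print(estimate_distance(512.0, 300.0))
```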
Table 1 presents a comparison of calculation times. Application of the neural model leads to calculations that are about twice as fast. The precision of the final values is also much higher. In Table 2, the calculated values of the relative errors are given [35]. The average relative error of the neural distance estimator is only 1.23%, while the Sharp sensor has a relative error of 11.28% and the VL53L0X sensor of 17.7%. The value of the relative error was thus reduced to about one tenth of the initial values.
3. Self-Learning Neural Path Planner Applied to the Platform
The platform was equipped with an SD card module, which allows the platform to collect sensor data and control commands. This made it possible to save samples to be used later as training data for the neural network applied to this task. As an example, a short test ride in a closed area with one obstacle was conducted (Figure 6). Based on the gathered data and expected commands, the model was trained [36]. The input vector consisted of the current sensor data together with previous samples of the commands and measured distances (for better representation of the dynamic changes of the position of the vehicle):

X(k) = [l_f(k), l_r(k), l_l(k), v_r(k−1), v_l(k−1), l_f(k−1), l_r(k−1), l_l(k−1)]  (6)

where v_r is the speed of the right wheels, v_l is the speed of the left wheels, l_f is the distance to an obstacle in front of the model, l_r is the distance to the wall on the right side, l_l is the distance to the wall on the left side, and k is the number of the sample.
The output of the structure was a two-row vector of forcing speeds for the left and right wheels. At this point in the work, implementation of a MultiLayer Perceptron (MLP) with linear activation functions was proposed. As in the previous section, the adaptation of internal coefficients was performed using the Levenberg–Marquardt algorithm. In Figure 7, both the target control signals and the neural planner response are presented. It is worth indicating that the transients of the reference speeds of the wheels are similar for both types of control loop. The response of the neural path planner is smoother, as its values are not quantized. Such a 'soft' reference signal is easier to reproduce and may lead to smaller overshoots in the speed controller. As can be seen in Figure 7, an insignificant difference between the desired speeds of the left and the right wheels was observed while the neural controller was applied. During the performed test, it did not affect the desired path. The most visible disadvantage of the neural structure was the observed delay: the reaction was delayed by one sample. Nevertheless, it is still possible to apply the designed neural planner in vehicle control structures where such a delay may be neglected. Moreover, after additional pre-processing of the measured training data (a shift of the output samples relative to the input values), the mentioned delay can be eliminated, as sketched below.
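A sketch of this pre-processing step under the stated assumption of a one-sample delay; the array names and shapes are illustrative.

```python
import numpy as np

def build_training_pairs(sensors: np.ndarray, commands: np.ndarray, shift: int = 1):
    """Pair each input sample with the command recorded `shift` samples later,
    which removes the one-sample reaction delay observed in the planner.
    `sensors` and `commands` are (N, ...) arrays logged on the SD card."""
    x = sensors[:-shift]          # inputs up to sample N - shift
    t = commands[shift:]          # targets advanced by `shift` samples
    return x, t

# Example with dummy logged data (100 samples, 5 sensor channels, 2 commands).
sensors = np.zeros((100, 5))
commands = np.zeros((100, 2))
x, t = build_training_pairs(sensors, commands)
print(x.shape, t.shape)  # (99, 5) (99, 2)
```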
4. Neural Speed Controller
The platform was propelled by four DC motors grouped into two pairs (for the left and right side). To perform turns, separate control of the left and right wheels was necessary [37,38]. To ensure steering of the vehicle, two separate speed controllers had to be implemented. Each pair of DC motors was powered by a driver (based on an H-bridge) and equipped with an encoder. The code used for the calculation of the speed controller was implemented on the Arduino Uno.
When analyzing the published literature describing the electric drives of vehicle powertrains, correct functioning in dynamic states and robustness against disturbances seem to be the most important requirements. Thus, a significant problem concerns the mechanical construction of the drive. Because electric drives are always merged with mechanical structures such as gearboxes, couplings or shafts, additional disturbances and oscillations of the state variables are observed. Those phenomena are highly disruptive, not only for safety reasons but also because of the increased possibility of damage; precision of control is also difficult to ensure.
For the reasons given above, an adaptive neural speed controller was implemented for the vehicle. The most important advantage of such a structure is the possibility of autonomous adaptation to changing working conditions [39,40]. First, tests were conducted in Matlab/Simulink based on a model obtained with the System Identification Toolbox. A structure of adaptive control with a reference model was proposed (Figure 8). The aim of the reference model is to calculate the error based on the modified reference signal and apply this information as feedback for the adaptation algorithm. Such a model provides the possibility to correct the reference signal if the forced dynamics exceed the motor capabilities. For simulation purposes, the following transfer function was used:

G(s) = ω_0^2 / (s^2 + 2 · ξ · ω_0 · s + ω_0^2)  (7)

where ξ is the damping coefficient and ω_0 is the resonant frequency. The parameters of the reference model can shape the dynamics of the input signal. The conversion of the external waveform was applied to match the performance of the controlled plant. If the difference between the dynamics of the output state variable (speed) and of the reference signal is significant, the adjustment mechanism cannot find steady values of the controller coefficients.
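The sketch below discretizes a second-order reference model of the form of Equation (7) for simulation; the damping coefficient, resonant frequency and sampling period are illustrative values, not the ones used in the reported tests.

```python
import numpy as np
from scipy.signal import cont2discrete, lfilter

# Illustrative parameters of the second-order reference model, Equation (7).
zeta, w0, Ts = 0.9, 8.0, 0.01
num = [w0 ** 2]
den = [1.0, 2.0 * zeta * w0, w0 ** 2]
numd, dend, _ = cont2discrete((num, den), Ts)

def reference_model(r: np.ndarray) -> np.ndarray:
    """Shape the raw reference r with the discretized model so that the
    demanded dynamics stay within the capabilities of the motor."""
    return lfilter(numd.flatten(), dend, r)

r = np.concatenate([np.zeros(10), 150.0 * np.ones(200)])  # step to 150 rpm
print(reference_model(r)[:5])
```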
A neural-like structure of the speed controller with six tunable weights was tested (Figure 9). The output value of the controller can be obtained using the expression below (generalized for a variable number of trainable weights):

y_i = f( Σ_{j=1…J} w_ij · x_j ),  i = 1, …, I  (8)

where x_j are the input values, w_ij are the adaptable weights, and f() is the activation function.
The given description is valid for a model where additive biases are omitted, the input vector consists of J elements, and I is the number of neurons in the output layer. In practice, for adaptive controllers (with online training), omitting the bias values is common. If the assumption mentioned above is not made, the general description of the neural controller can be given as below:

y = f_2( W_2 · f_1( W_1 · x + b_1 ) + b_2 )  (9)

where f_2 is the output activation function, b_2 is the vector of biases of the output layer, f_1 is the activation function of the input layer, b_1 is the vector of bias values of the input layer, and x is the vector of the controller input values.
As can be seen in the scheme of the controller (Figure 8), the input vector of the structure consists of the error value (calculated as the difference between the reference value and the measured speed); the integral of the successive values is also introduced:

e(k) = ω_ref(k) − ω(k)  (10)
x(k) = [e(k), K · Σ_{n=1…k} e(n)]  (11)

where ω_ref is the reference speed, ω is the measured speed, and K is the input gain coefficient.
The general idea of the controller is to adapt the values of the network weights (in the presented controller, the w_1–w_6 coefficients) during operation of the motor. This online computation process is performed in parallel with the main neural network. The main goal is to minimize the value of the cost function defined as follows:

E = (1/2) · Σ_{o=1…O} (d_o − y_o)^2  (12)

where y_o is the value of the o-th output neuron, d_o is the demanded value of the o-th output, and O is the number of considered outputs.
The basic idea of the adaptation is described by the following:

w_ij(k+1) = w_ij(k) + Δw_ij(k)  (13)

where w_ij(k) is the ij-th controller coefficient value in the k-th step of the simulation, and Δw_ij(k) is the ij-th correction value calculated according to the equation presented below (it corresponds to a gradient-based optimization method):

Δw_ij = γ · δ_i · x_j,  δ_i = f′(y_i) · ( Σ_o δ_o · w_oi + (d_o − y_o) )  (14)

where f is the activation function, γ is the adaptation coefficient, y_i is the output of the i-th node, x and y are the input and output values, and δ is the error between the demanded and real output.
According to the literature, the given adaptation algorithm can be simplified. The first element of Equation (14) equals zero for the output neurons because they have no following connections, while the second part can be omitted for internal neurons, as they are not directly affected by the output y_o. Additional details of the speed control algorithm are presented in [41]. After these theoretical considerations, some similarities to adaptive pinning control can be noticed (the update of parameters, convergence analysis, etc.) [42]. A compact sketch of the adaptation loop is given below.
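To make the adaptation mechanism concrete, the sketch below runs a gradient update in the spirit of Equations (10)–(14) against a crude first-order stand-in for the identified motor model; the plant, the adaptation coefficient γ and the initial weights are all assumptions, not the values from the paper.

```python
import numpy as np

gamma, K, Ts = 1e-6, 5.0, 0.01                          # illustrative values
w = np.random.default_rng(1).normal(scale=0.1, size=2)  # [w_error, w_integral]
integ, speed = 0.0, 0.0

def plant(speed: float, u: float) -> float:
    """First-order surrogate of the motor: d(speed)/dt = (-speed + u) / tau."""
    tau = 0.2
    return speed + Ts * (-speed + u) / tau

for k in range(2000):
    ref = 150.0 if (k * Ts) % 5.0 < 2.5 else -150.0  # square reference, 0.2 Hz
    e = ref - speed
    integ += K * e * Ts
    x = np.array([e, integ])        # controller inputs, Eqs. (10)-(11)
    u = float(w @ x)                # linear neural controller output, Eq. (8)
    speed = plant(speed, u)
    # With a linear activation, f'(.) = 1 and the correction reduces to
    # gamma * delta * x, applied as in Eq. (13): w(k+1) = w(k) + dw(k).
    w += gamma * (ref - speed) * x
```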
The simulation test of the structure (Figure 10) was conducted with a square signal (amplitude of 150 rpm and frequency of 0.2 Hz) applied as the reference trajectory (Figure 10a). Such a trajectory represents electric drive reversals and also ensures better visibility of the adaptation process. At the beginning, oscillations and overshoots are visible; however, after every repetition the inaccuracies are reduced. The efficient recalculation of the weights can be observed in Figure 10b. Random initial values of the internal parameters of the controller were assumed. The weight values are continuously modified in the recalculation process. The most significant changes occur at the reference speed changes. However, after a certain period of time, the system stabilizes. The presented results were obtained for the control structure realized with the assumed adaptation coefficient γ and the integral part coefficient K = 5.
The simulations confirm the efficiency of the neural adaptive speed controller. The most important advantage of the structure is that no initial tuning of the controller is necessary. The adjustment is an automated process performed online during operation of the motor (Figure 10b). The inaccuracies in the measured speed transient are mostly visible only at the beginning of motor operation. It can be assumed that the adaptation works correctly, and all unwelcome disturbances are quickly damped.
5. Neural Road Sign Classifier
The vehicle was also equipped with an RGB camera (on the front of the platform). The images were processed by a Sipeed Maix Bit microcontroller board. It is a specialized microcontroller with a dual-core RISC-V microprocessor. The dedicated neural processing unit (KPU) enables the board to work as a real-time image analyzer. It is capable of operating with 3 × 3 kernel matrices and any activation function. This enables the system to work as a real-time vision system. In UAVs, vision systems may detect obstacles, trace lines and read QR codes or barcodes. In the case of autonomous vehicles, a vision system should also be able to extract traffic information from the existing infrastructure (for example, traffic lights or road signs) [43,44,45,46]. In the designed structure, the vision system was implemented for road sign classification. After analysis of the object, a suitable command is propagated via the UART module to the main controller to interrupt the current ride and perform a pre-programmed behavior (Figure 11). In Table 3, the list of signs, corresponding commands and designed platform behaviors is given.
For the neural image classifier, a deep neural network was used (Figure 12). The idea of such a structure is to use a raw RGB image, represented by three matrices, as the input [47,48]. Every matrix, corresponding to one color (red, green or blue), is involved in a convolution. The input matrices are convolved with kernel matrices to create new matrices with a different number of cells. In the convolutional layer, the kernel biases are trainable parameters. The kernel size for the convolution with the input data can be adapted to the designed functionality. Then, the matrices are downsampled with a moving pooling window to create feature matrices (maps). As shown in Figure 12, the max() function was applied in the tests. Another popular pooling function is average(), which returns the mean value of the cells in a window; it acts as a blurring data processing agent. In the application shown in Figure 12, the feature maps are the matrices filled with the filtered maximum values of the convolutional layers. The convolution and pooling process may be repeated to create small matrices which are the input to a fully-connected network. These inputs are called "deep matrices", in which only features (e.g., edges) are retained. The fully-connected layer consists of an MLP neural network; it can classify or detect objects on the basis of the input features. It is worth indicating the difference between classification and detection. The classifier only returns the membership of the object in a class and the probability of such a prediction, while the detector response consists of a class, a probability, and the position and size of the bounding box imposed over the object (Figure 13) [49,50].
For the model of the vehicle, the classifier was developed using the aXeleRate framework based on the Keras library. The framework and the library utilize TensorFlow neural models, which can be adjusted and trained using a previously prepared dataset [51,52]. For training purposes, a set of road sign images had to be prepared. According to the literature, diversity of the training data is necessary to obtain good efficiency [53,54]. The photos were taken with different cameras, in different lighting conditions and at different angles. Moreover, the datasets were extended by creating copies of the images with adjusted saturation and brightness, or with blur or noise added. Not only were additional samples created, but better diversification was also obtained. Images with cropped, rotated, distorted or partially covered signs were also included. Every image was then cropped to a square of 150 × 150 pixels. The dataset was saved in directories named after the expected classes. When an object detector was trained, the images were also equipped with labels indicating the position of the object; this created additional xml files with the position and size of the bounding box. After collecting and managing the training data, a MobileNet algorithm was chosen for training the TensorFlow classifier structure [55,56]. The applied structure uses 150 × 150 × 3 input matrices, followed by convolutional layers resulting in 16, 32, 64, 128, 256 and 256 matrices successively. The fully-connected network had 4 layers with 256 inputs and 200, 150 and 100 neurons, respectively (Figure 14). A rough sketch of such a topology is given below.
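For orientation, the sketch below re-creates a topology of this general shape in Keras; the kernel sizes, the pooling steps and the number of output classes are assumptions for illustration, not values reported in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 6  # hypothetical number of road-sign classes

model = keras.Sequential([keras.Input(shape=(150, 150, 3))])
for filters in (16, 32, 64, 128, 256, 256):
    # 3x3 convolutions and 2x2 max-pooling are assumptions for this sketch.
    model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
    model.add(layers.MaxPooling2D())
model.add(layers.Flatten())
for units in (200, 150, 100):
    model.add(layers.Dense(units, activation="relu"))
model.add(layers.Dense(NUM_CLASSES, activation="softmax"))
model.summary()
```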
In the MobileNet structure, the convolutional layers not only convolve the matrices but also pass the data through the ReLU (Rectified Linear Unit) activation function:

f(x) = max(0, x)  (15)

where x is the input value of the ReLU function.
The training was conducted on the Google Colab system with a Tesla P100-PCIE-16GB GPU, which enables researchers to use GPU acceleration and significantly reduces computing time [57]. The aXeleRate structure was trained for 5000 epochs; additionally, the early stopping option was enabled and the embedded dataset extension by noise addition (with a dropout value of 0.5) was activated. After the training, the TensorFlow model had to be compiled into an optimized form, the TensorFlow Lite model. The main difference between the mentioned models concerns weight quantization (the Lite version uses 8-bit precision) [58]. It enables the models to work in real time on microcontrollers that support neural computing. The MobileNet structure was chosen because of its good classification efficiency [59,60]. The network structure was originally developed by Google in 2017, and it was designed especially for mobile applications [61]. It is a universal structure that can be trained and employed for detection, classification and segmentation tasks. The main advantage of using such a structure is the limited number of trainable parameters. The deep-learning structure consists of both a convolutional and a fully-connected network. The reduced number of parameters was obtained by considering the convolution kernel height and width independently; the kernel can be obtained from the matrix multiplication of two vectors. The feature map is obtained by a two-step convolution process (depthwise separable convolution). At the beginning, a standard 3 × 3 kernel is used to obtain reduced data with the same depth. Then, a 1 × 1 kernel is applied to take the depth into account. Such an approach enables faster data processing in mobile devices (because of smaller kernels) and noticeably reduces the number of parameters.
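A minimal sketch of the depthwise separable convolution described above: a 3 × 3 depthwise step that filters each channel independently, followed by a 1 × 1 pointwise step that mixes the channels. The filter counts are arbitrary illustrative choices.

```python
from tensorflow import keras
from tensorflow.keras import layers

def depthwise_separable_block(x, pointwise_filters: int):
    x = layers.DepthwiseConv2D(kernel_size=3, padding="same")(x)
    x = layers.ReLU()(x)                            # Equation (15)
    x = layers.Conv2D(pointwise_filters, kernel_size=1)(x)
    return layers.ReLU()(x)

inputs = keras.Input(shape=(150, 150, 3))
outputs = depthwise_separable_block(inputs, pointwise_filters=32)
model = keras.Model(inputs, outputs)
# Parameter count: 3*3*3 + 3 (depthwise) + 3*1*1*32 + 32 (pointwise) = 158,
# versus 3*3*3*32 + 32 = 896 for a standard 3x3 convolution.
model.summary()
```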
Other models were also tested on limited datasets to compare efficiency, training time, number of trainable parameters and file size (Table 4). The classification efficiency was calculated according to the following formula:

P = (n_OK / n_dt) · 100%  (16)

where P is the classification efficiency [%], n_OK is the number of correctly classified images, and n_dt is the number of images in the dataset.
The results gathered in Table 4 indicate significant differences between the tested deep-learning structures. It is worth noting that the quickest training took only 1 min (resulting in an efficiency of 40%), while the longest required nearly half an hour (and did not increase precision, with an overall efficiency of only 26.7%). The MobileNet structures are characterized by the best efficiency (higher than 73%) and a reasonable training time (less than 5 min). Because implementation in a low-cost microcontroller is the goal, the size of the model is also important. The smaller the number of parameters, the smaller the size of the output file. As can be seen, the model described by the smallest file (with only about 315,000 parameters) works much better than the oversized structures.
A novel approach to distance estimation was also proposed and tested. The idea is based on image analysis. An additional one-class detector was trained to analyze the "STOP" sign. Under the assumption that all such objects have an identical size, it is possible to estimate the distance to the sign using the vision system. The information can be obtained by comparing the width of the bounding box and the class name with an index in a lookup table (Figure 15a). The relations between the distance and the size of the bounding box can be established for known conditions of operation and saved. Using experimental results obtained with a 640 × 480 camera, a lookup table was proposed (Table 5). The experiment was performed with the detector implemented in the microcontroller (Figure 15b), and the bounding box size was its direct result. Such an approach may lead to a reduction in the number of distance sensors or may support fault-tolerant control (without additional sensors).
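A sketch of the lookup-table mapping follows; the box widths and distances below are illustrative placeholders, not the calibration values from Table 5.

```python
import numpy as np

# Calibration pairs for a 640x480 frame: a wider box means a closer sign.
BOX_WIDTH_PX = np.array([160, 120, 90, 60, 40])   # illustrative values
DISTANCE_M = np.array([0.5, 0.8, 1.2, 1.8, 2.5])  # illustrative values

def distance_from_box(width_px: float) -> float:
    """Linearly interpolate the distance for a measured bounding-box width."""
    # np.interp needs increasing x values, so the arrays are reversed.
    return float(np.interp(width_px, BOX_WIDTH_PX[::-1], DISTANCE_M[::-1]))

print(distance_from_box(100.0))  # about 1.07 m for a 100 px wide box
```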
6. Real Platform
The described microprocessor system was mounted on a double-deck metal chassis (Figure 16), which makes it possible to arrange each element of the system tidily. On the bottom deck, the DC motors with servos, the encoders and the Arduino Uno were placed. On the second level, the main controller (Arduino MEGA 2560) was mounted longitudinally. The supports for the two side sensors were designed and created with Fused Deposition Modeling (FDM) technology. The optical distance sensors were mounted one above the other at the front of the platform. The vision system, with the Maix Bit development board and the camera, was rigidly fixed to the PCB, which was mounted with additional elements (also 3D-printed) on top of the front sensors. The Arduino boards and DC motors were powered with a stabilized voltage (6 V), ensured by a DC/DC converter supplied from two 18650 Li-ion batteries. The sensors were powered directly from the development boards. The platform was also equipped with a USB-powered battery charger. The Maix Bit board and the camera were powered from an external power bank using a USB-C cable. The modular structure makes it possible to expand the system with additional elements (e.g., accelerometers or GPS receivers) in further development.
Finally, a functional test was performed to verify the assumptions and the simulation results and to confirm the feasibility of autonomous driving of the platform. The test drive was performed outside to reduce noisy data from the low-cost sensors caused by unexpected reflections. First, the autonomous control system was tested (Figure 17). The platform was placed in front of a wall and two boxes forming an alley. It is worth noting that one box was white and the other black; this was intended as an additional test of the neural distance estimator (utilizing the optical sensors). As can be seen in the presented illustration, the system works correctly: the platform avoided crashing into the obstacles and moved according to the reference (expected) path trajectory. During the initial test, the data from the sensors and the control commands were saved on the SD card. Next, the control system was switched to reading mode. The platform was placed in an open area, and the previously recorded trajectory was accurately reproduced.
The last test was also executed in an open area. The control system was switched to autonomous driving mode. The goal of this trial was to confirm the effectiveness of the neural road sign classifier and the accuracy of the feedback loop of that system. The "STOP" sign was placed in front of the platform, a couple of meters from the starting point. After switching on, the model accelerated, stopped before the object and then set off again. The evaluation was passed; the construction of the platform and the applied neural control systems work correctly.
Further research concerning the design of the platform will be conducted. The power for the microcontroller system is planned to be delivered from an alternative renewable energy source, such as photovoltaic panels. Such a design is especially beneficial for small delivery robots operating in open areas. However, safety has to be ensured first, so the energy consumption of every element will be verified. As the vision system has an independent power source and is implemented on the most powerful microcontroller, its preliminary energy usage was measured with a simple USB power meter.