Article

A Low-Cost Approach to Maze Solving with Image-Based Mapping

by Mihai-Sebastian Mănase 1,† and Eva-H. Dulf 1,2,3,*,†
1 Department of Automation, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania
2 Physiological Controls Research Center, University Research and Innovation Center, Obuda University, 1034 Budapest, Hungary
3 Biomatics and Applied Artificial Intelligence Institute, John von Neumann Faculty of Informatics, Obuda University, 1034 Budapest, Hungary
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Technologies 2025, 13(7), 298; https://doi.org/10.3390/technologies13070298
Submission received: 28 May 2025 / Revised: 22 June 2025 / Accepted: 8 July 2025 / Published: 11 July 2025

Abstract

This paper proposes a method for solving mazes, with a special focus on navigation using image processing. The objective of this study is to demonstrate that a robot can successfully navigate a maze using only two-wheel encoders, enabled by appropriate control strategies. This method significantly simplifies the structure of mobile robots, which typically suffer from increased energy consumption due to the need to carry onboard sensors and power supplies. Through experimental analysis, it was observed that although the encoder-only solution requires more advanced control knowledge, it can be more efficient than the alternative approach that combines encoders with a gyroscope. To develop an efficient maze-solving system, control theory techniques were integrated with image processing and neural networks to analyze images in which various obstacles were transformed into maze walls. This approach led to the training of a neural network designed to detect key points within the maze.


1. Introduction

This work aims to emphasize the navigation aspect within a maze, considering factors such as structural design, energy consumption, and component cost, to create a more accessible and practical robotic platform. As highlighted in [1], image processing is a subfield of computer vision, alongside machine learning and object recognition. It plays an important role in fields such as robotics, healthcare, and communication by enabling the detection and extraction of key features from images and video sequences. These features can then be used for further analysis and interpretation, supporting intelligent decision-making and enhancing system performance in a wide range of practical applications.
With the advancement of technology, the field of image processing has become increasingly accessible. Using a camera, it is now possible to guide a robot from an initial point to a target destination while avoiding obstacles interpreted as a maze. In addition, artificial-intelligence-based detection algorithms have become more accessible and easier to implement, helping robots identify objects, people, or other relevant features within a scene. This marks a transition from relying solely on basic physical measurements obtained from traditional sensors to analyzing images that contain a richer set of environmental information, which in turn enables the development of small autonomous robots capable of assisting patients in hospitals, acting as medical assistants that can interact with people and transport medication or medical equipment [2].
Control theory plays a crucial role in enabling precise computations within robotic systems through the use of controllers. This facilitates a transition from basic threshold-based sensor evaluation to the control of important physical variables. A feedback loop is incorporated into the system to adjust its behavior dynamically, ensuring that specific performance requirements are met. In this structure, the control signal applied to the actuator is computed in real time, based on the initially defined conditions and any external disturbances that may occur. The only parameter that must be defined is the reference value, representing the desired measurement achieved at the output. This methodology greatly improves robotic navigation, enabling highly accurate speed and position control even when using sensors with limited precision [3].
Based on the findings of [4], the cascade control strategy offers several notable advantages. One of the primary benefits lies in the simplicity of the individual controllers designed for each feedback loop, which makes implementation and tuning more manageable. Furthermore, this architecture enhances the system’s ability to reject external disturbances and effectively compensates for non-linearities in the plant dynamics. An additional strength of the cascade structure is its conceptual clarity: it facilitates a more intuitive understanding of the control process by highlighting the direct relationships between the physical variables involved.
According to [5], maze-solving strategies can be divided into two main categories: algorithms that rely on prior knowledge of the environment, such as Flood Fill or Dead-End algorithms, and algorithms that do not require any prior information, such as Wall Follower or Random Mouse, which perform exploration in real time. Maze solving can be implemented either using sensors to detect walls or by employing a camera for image-based processing. Moreover, recent work has highlighted the potential of machine-learning-based techniques, particularly semantic segmentation and keypoint detection using neural networks, in enhancing obstacle identification and environmental understanding (see, for example, [6]—an idea that is used later in this paper). However, these approaches often require significant computational resources and training data, which limits their applicability in low-cost or resource-constrained systems. In this paper, we address these challenges by proposing a hybrid approach that uses neural networks for lightweight, image-based detection of structural maze elements (e.g., start/finish lines and obstacles) and relies on a classical, computationally efficient algorithm for path planning: Breadth-First Search (BFS). This design choice aims to balance system simplicity with functional performance, particularly in constrained embedded environments.
In contrast to traditional maze-solving approaches, which rely on real-time sensor data to incrementally construct a map of the environment, the use of image processing offers a global view of the maze prior to execution. This reduces both computational complexity during navigation and the need for onboard mapping algorithms. Moreover, by externalizing environment sensing to an overhead camera or a pre-acquired image, the robot design can be significantly simplified. As highlighted in [7], mobile robots often face trade-offs between onboard sensing, weight, and energy consumption. Our method addresses this by minimizing the hardware requirements—no additional range sensors or mapping modules are needed—resulting in a lower-power, lightweight system. However, we acknowledge that this approach assumes the availability of a clear overhead view, which may not be feasible in all real-world scenarios.
In [8], the authors propose a method for scaling maze dimensions from a virtual image to a physical environment using a constant scaling factor, under the assumption of uniform robot speed driven by a fixed voltage. However, this assumption oversimplifies real-world conditions, where speed is affected by surface friction, motor characteristics, and slope variations. Their approach lacks a dynamic adaptation mechanism to handle these variations. In contrast, our system incorporates feedback-based velocity control, enabling more accurate path execution under variable terrain conditions.
In comparison to the work presented in [9], where the maze follows a standardized structure with a predefined number of cells and fixed dimensions, the maze used in this study does not assume any fixed layout. Instead, a grid is dynamically overlaid on the input image to adapt to arbitrary maze geometries. Obstacle detection is performed through depth analysis using neural networks, which allows for more accurate identification of maze boundaries and obstacles in varying environments. This approach enhances the system’s flexibility and applicability in real-world scenarios.
The study in [10] focuses primarily on the image processing aspect, employing a homography transform to rectify the image. However, it provides limited details about the hardware implementation, the types of sensors used, or the physical navigation of the robot platform. In contrast, our work integrates both image processing techniques and hardware-level considerations, including the relevant control strategies, to enable practical maze navigation on a mobile robot.
The present paper addresses the problem of solving mazes in environments that are not necessarily regular or composed of a fixed number of cells. The emphasis was placed on minimizing the usage of hardware resources. To enhance the robot’s navigation capabilities within the maze, a cascade control structure was implemented, combining both velocity and position control. The proposed method demonstrated performance comparable to that of the set-up that employed a gyroscope for navigation. To validate the approach, experiments were conducted in two configurations: one based solely on encoder feedback and another integrating a gyroscope to improve position control.
This study is organized into four main sections. It begins with an introduction that outlines the motivation and context of the image-based maze-solving approach, followed by a brief literature review relevant to this case study. Section 2 describes the design and implementation of the proposed solution. Section 3 presents the experimental results, compares the two navigation strategies, and offers a brief discussion of the results obtained. Finally, the paper concludes by highlighting that both approaches produce comparable results, despite differences in complexity and hardware requirements.

2. Materials and Methods

As illustrated in Figure 1, the system involves uploading an image of a maze into a Python 3.13.2 script, where offline processing is performed to identify the objects and determine a path from a specified starting point to a selected endpoint. This computed path is then transmitted to the robot via Bluetooth communication. To enable the robot to navigate through the maze, control theory principles were applied to design both position and velocity controllers, which resulted in significantly improved accuracy during movement execution. The following subsections will offer a detailed explanation of each component illustrated in this diagram.
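As a rough illustration of the final step of this pipeline, the sketch below transmits a computed command list over the Bluetooth serial link using pyserial. The port name, baud rate, and single-character command encoding are placeholders chosen for the example, not the exact protocol used on the robot.

```python
# Minimal sketch of the command transmission step (assumed protocol): each
# command is encoded as a single character and written to the HC-05 serial link.
import serial
import time

COMMAND_CODES = {"front": "F", "left": "L", "right": "R"}  # hypothetical encoding

def send_commands(commands, port="/dev/rfcomm0", baudrate=9600):
    """Send the movement commands to the robot over the Bluetooth serial link."""
    with serial.Serial(port, baudrate, timeout=1) as link:
        time.sleep(2)                  # give the Bluetooth connection time to settle
        for cmd in commands:
            link.write(COMMAND_CODES[cmd].encode())
            link.flush()
            time.sleep(0.1)            # small gap between consecutive commands

if __name__ == "__main__":
    send_commands(["front", "front", "front", "right", "front"])
```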

2.1. Image Processing

In Figure 2, the initial block diagram was refined to provide a more detailed and structured representation of the image processing component. This extended view outlines the sequence of operations required to extract relevant information from the maze image. The process includes the identification of key visual features, such as the maze boundaries and the navigable path between two designated points. These features are isolated through a series of pre-processing and analysis steps designed to simplify the image and emphasise structural elements. Once the relevant data has been extracted, it is processed and formatted to be transmitted to the robot’s control system, facilitating accurate and autonomous navigation within the maze.
As illustrated in Figure 3, the initial image underwent a series of preprocessing operations, meticulously designed to enhance the extraction of features. Initially, the image underwent a conversion to grayscale, a process that served to reduce the dimensionality of the data and facilitate the subsequent processing stages. Subsequently, a blur filter was applied in order to suppress high-frequency noise and eliminate irrelevant background details that could have a detrimental effect on the detection accuracy.
The goal was to detect all obstacles in the image. This objective was accomplished by conducting a depth-based analysis utilising the MiDaS_small neural network, as outlined in [6]. This model estimates the depth information for each pixel in the image, thus facilitating the identification of foreground objects based on their relative distance from the camera. The output of this network is illustrated in Figure 4, where the main objects are clearly highlighted.
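The following sketch shows the standard way of obtaining such a depth map with the MiDaS_small model through torch.hub; the image path and the mean-based threshold used to derive a rough foreground mask are illustrative assumptions, not values from the paper.

```python
# Sketch of monocular depth estimation with MiDaS_small (standard torch.hub usage).
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("maze.jpg"), cv2.COLOR_BGR2RGB)  # placeholder image
with torch.no_grad():
    prediction = midas(transform(img))              # inverse relative depth, (1, H', W')
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().numpy()

# Closer objects receive larger values; a simple threshold gives a rough
# foreground mask that the edge and contour steps can refine.
foreground_mask = (depth > depth.mean()).astype("uint8") * 255
```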
Following the acquisition of the depth map and the identification of the objects, a Sobel filter—described in detail in [11]—was applied to detect prominent horizontal and vertical edges within the scene. As demonstrated in Figure 3b, these edges help emphasize the objects’ structural composition. The previously identified edges served as input to the cv2.findContours() function, which was used to extract the contours of the objects.
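A minimal OpenCV sketch of these preprocessing and edge-extraction steps is given below; the blur kernel size and the binarization threshold are illustrative values, not the exact parameters used in the experiments.

```python
import cv2

img = cv2.imread("maze.jpg")                              # placeholder image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)              # reduce dimensionality
blurred = cv2.GaussianBlur(gray, (5, 5), 0)               # suppress high-frequency noise

# Horizontal and vertical gradients (Sobel), combined into one edge-magnitude image
sobel_x = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
edges = cv2.convertScaleAbs(cv2.magnitude(sobel_x, sobel_y))

# Binarize the edge map and extract the object contours
_, binary = cv2.threshold(edges, 50, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
```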
Following the completion of the initial processing steps, the focus transitioned to the identification of the start and finish lines within the maze environment. The start line was implemented as a green strip physically attached to the robot, serving as a visual cue to facilitate its identification within the scene. Conversely, the finish line was designed as an orange strip placed over a black background in order to enhance its visual prominence and ensure reliable recognition. In view of the distinct and consistent visual properties of both the start and finish lines—specifically, their unique colour and placement—it was deemed appropriate to employ a deep-learning-based approach for robust detection. In order to achieve this objective, a pretrained object detection model, YOLOv8, was selected on the basis of its proven efficiency and accuracy in real-time applications. The training process involved the manual annotation of 500 images containing start and finish lines. These images were captured under a variety of conditions, including different viewing angles, varying lighting environments, and diverse room settings. These annotations enabled the model to acquire generalized representations of the lines. The dataset was partitioned into 80% training data and 20% validation data to ensure both learning and generalization were adequately addressed.
In order to optimize the network’s capacity for object detection, a training procedure was implemented. This consisted of 50 epochs, with each epoch processing batches of 16 images. This configuration was found to provide an optimal balance between learning efficiency and computational resource constraints. Following the conclusion of the training process, the network demonstrated a strong capability to accurately identify both the start and finish lines across a range of scenarios. The efficacy of the detection process is illustrated in Figure 3c, where the correctly recognized elements are clearly highlighted.
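A sketch of this training and inference workflow with the ultralytics package is shown below; the dataset description file lines.yaml and the yolov8n.pt starting weights are assumptions for illustration, since the text only specifies the epoch count, batch size, and the 80/20 split.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                        # pretrained weights as starting point
model.train(data="lines.yaml", epochs=50, batch=16, imgsz=640)

# After training, locate the start and finish lines in a new maze image
results = model("maze.jpg")
for box in results[0].boxes:
    print(results[0].names[int(box.cls)], box.xyxy.tolist())
```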
Subsequently, a uniform grid was overlaid onto the processed image as illustrated in Figure 3d to discretize the maze into smaller, manageable sections. Each cell of the grid was then analyzed individually to determine the presence of objects. This objective was realized through the creation of masks derived from the identified contours of objects. The initial and final areas were excluded by detecting them using the trained neural network. In this manner, each obstruction was meticulously examined to ascertain its alignment with either a start or finish line. As a result, a binary maze matrix was generated, where cells corresponding to obstacles were assigned a value of 1, while all other cells (indicating free space) were assigned a value of 0. This matrix served as a simplified digital representation of the maze layout, providing a foundation for subsequent path-planning and navigation algorithms.
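The discretization step can be summarized by the following sketch, in which obstacle_mask is assumed to be a binary image built from the detected contours (with the start and finish regions already removed) and the occupancy threshold is an illustrative parameter.

```python
import numpy as np

def build_maze_matrix(obstacle_mask, cell_size, occupancy_threshold=0.2):
    """Return a 0/1 matrix in which 1 marks grid cells that overlap an obstacle."""
    h, w = obstacle_mask.shape
    rows, cols = h // cell_size, w // cell_size
    maze = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            cell = obstacle_mask[r * cell_size:(r + 1) * cell_size,
                                 c * cell_size:(c + 1) * cell_size]
            # A cell counts as blocked when enough of it is covered by an obstacle
            if np.count_nonzero(cell) / cell.size > occupancy_threshold:
                maze[r, c] = 1
    return maze
```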
For the proposed example, the matrix associated with the region of interest is approximately represented by
A = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1
\end{bmatrix}
The selection of the grid size presented a considerable challenge during the development of the system. Initially, an approach inspired by [12] was attempted, which involved using the dimensions of the robot as the fundamental reference for determining the grid scale.
However, in this particular case, the physical length of the green start line—measured to be 8.5 cm—was used as the primary reference for scaling. This line was physically attached to the front of the robot. Although it is narrower than the robot’s actual width, it remains fixed and reliably detectable in the captured image. Consequently, it provides a consistent and convenient reference for converting measurements from pixel units in the image to real-world distances. By calculating the pixel length of the line in the image, a scale factor can be derived and applied to estimate the robot’s dimensions and other relevant distances in the environment. In the event of substantial modification to the robot’s design or platform, this scaling can be effortlessly recalibrated by employing a direct proportional relationship between the revised robot dimensions and the reference line. This approach ensures flexibility and adaptability across different robot platforms while maintaining accuracy in mapping from image space to physical space.
However, there are situations in which the initial grid application may be imperfect, primarily due to the tilt or perspective distortion present in the captured image. Such misalignment can lead to difficulties or failures in accurately identifying a valid exit path from the maze. To address the perspective distortions that may affect the accuracy of maze segmentation, the grid cell width is iteratively reduced in small steps, empirically set to 5 pixels. This value was selected based on experimental testing of multiple decrement sizes and was found to provide a good trade-off between computational efficiency and grid alignment precision. The adjustment continues until a feasible solution is obtained, ensuring that the grid fits accurately over the maze structure even in the presence of slight image tilting or scale inconsistencies. This adaptive resizing ensures that the grid better conforms to the maze structure as represented in the image, improving the accuracy of path detection. Once an appropriate grid size has been established, the resulting adjusted grid serves as the foundation for programming the robot’s navigation strategy through the maze.
It is worth mentioning that if the grid size is changed, then the distance traveled by the robot must be adjusted accordingly. This can be done using a simple rule:
\text{new\_distance} = 8.5 \cdot \frac{\text{new\_grid}}{\text{initial\_grid}}
where 8.5 is the width in centimeters of the start line, initial_grid is the width (in pixels) of the line as identified in the original image, and new_grid is the new width after resizing.
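The adaptive grid refinement and the rescaling rule above can be combined as in the sketch below; build_maze_matrix is the discretization routine sketched earlier, find_path is the BFS routine sketched further below, and the minimum cell size is an illustrative stopping bound.

```python
START_LINE_CM = 8.5        # physical width of the green start line

def solve_with_adaptive_grid(obstacle_mask, start_px, goal_px, start_line_px,
                             decrement_px=5, min_cell_px=10):
    """Shrink the grid cell size in 5 px steps until BFS finds a feasible path."""
    cell_px = start_line_px                        # initial cell = start-line width in pixels
    while cell_px >= min_cell_px:
        maze = build_maze_matrix(obstacle_mask, cell_px)
        start = (start_px[1] // cell_px, start_px[0] // cell_px)   # (row, col) of start
        goal = (goal_px[1] // cell_px, goal_px[0] // cell_px)      # (row, col) of finish
        path = find_path(maze, start, goal)        # BFS routine, sketched below
        if path is not None:
            # new_distance = 8.5 * new_grid / initial_grid (rescaled forward step, in cm)
            step_cm = START_LINE_CM * cell_px / start_line_px
            return path, cell_px, step_cm
        cell_px -= decrement_px                    # shrink the grid and try again
    return None, None, None
```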
After all the essential parameters, such as the positions of the obstacles, the initial point, and the terminal point, had been extracted, the shortest possible path through the maze was determined using the Breadth-First Search (BFS) algorithm, as recommended in [13]. This algorithm systematically explores all possible routes in a level-wise manner, ensuring that the first path found to the goal is indeed the shortest. Employing BFS in this context provides a reliable and efficient method for pathfinding within grid-based maze representations.
The command structure utilized for robot navigation is based on a predefined format comprising three actions: front, left, and right. For the front command, the robot is instructed to move forward by a fixed distance corresponding to the width of a maze cell. For the right and left commands, the robot will rotate by 90 degrees and −90 degrees, respectively. This functionality will be made possible through the controllers designed in the following sections. Finally, a partial sequence of commands transmitted to the robot for the example scenario is as follows:
commands = [front, front, front, right, front, front, front, front, left, front, right, front]
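A compact version of this pathfinding and command-generation step is sketched below; the initial heading (facing "up" in image coordinates) is an assumption for the example, and since a 180° reversal cannot occur on a shortest grid path, only single left/right turns need to be handled.

```python
from collections import deque

def find_path(maze, start, goal):
    """Breadth-First Search over free cells (value 0); returns the cell sequence."""
    rows, cols = len(maze), len(maze[0])
    queue = deque([start])
    parent = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:                              # first arrival is the shortest path
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] == 0 \
                    and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                                       # no feasible path at this grid size

def path_to_commands(path, heading=(-1, 0)):
    """Convert consecutive cells into front/left/right commands (initial heading assumed 'up')."""
    commands = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        move = (r1 - r0, c1 - c0)
        if move != heading:
            left = (-heading[1], heading[0])          # 90-degree counter-clockwise rotation
            commands.append("left" if move == left else "right")
            heading = move
        commands.append("front")
    return commands
```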

2.2. Design of the Robot

The robot is a simplified version of a Turtle robot described by [14] and is built around an Arduino Uno board equipped with an L298P motor driver shield, as shown in Figure 5. This mobile robot originally had two motors, which were replaced with new motors that include integrated encoders for improved position and speed feedback. One of the motors features a maximum speed of 155 RPM with a 100:1 gear ratio, while the other motor has a maximum speed of 105 RPM with a 150:1 gear ratio. An MPU6050 gyroscope sensor was integrated into the robot’s hardware configuration to facilitate the execution of one of the planned experiments. Furthermore, the original Bluetooth module was substituted with an HC-05 module, which facilitated stable wireless communication and allowed for a seamless connection to a laptop.
Even if identical motors had been employed, implementing a velocity control mechanism would still have been essential to prevent the application of a constant PWM signal to the motors. This is because a fixed PWM input does not account for varying external conditions such as changes in the surface texture over which the robot moves or fluctuations in the load and weight distribution on the robot. Consequently, the control signal must be continuously adjusted in real time to maintain the desired motor speed and ensure stable, accurate movement. Furthermore, precise velocity control plays a critical role in enabling the estimation of the distance travelled by the robot within the maze, which is vital for effective navigation and path planning. Without such control, the robot’s movements would be unpredictable and prone to errors, significantly impairing its ability to complete the maze solving.
Figure 6 presents the detailed block diagram illustrating the robot’s behavior and the sensors employed in two different scenarios. The first scenario involves navigation using only the two motor encoders, relying on wheel rotation data to estimate position and movement. The second scenario incorporates an additional gyroscope sensor to enhance positional accuracy through orientation correction.
If the velocity of the robot is accurately known, its position can be determined through the process of integration over time. This capability allowed the robot to execute precise maneuvers such as left and right turns, which require rotating the robot by 90 degrees. Alternatively, the same rotational movement can be achieved by utilizing a gyroscope sensor to monitor angular velocity specifically around the Z-axis. The comparison between position estimation through velocity integration and gyroscope-based orientation monitoring forms the basis of the analysis conducted in this work, highlighting the strengths and limitations of each approach in the context of robotic navigation.
To achieve effective control of both velocity and position using only encoder feedback, a cascade control strategy was adopted. In this hierarchical configuration, the inner loop is dedicated to closed-loop velocity control, which is prioritized due to its faster response characteristics compared to position control. By rapidly controlling the motor speed, the system ensures smooth and stable velocity adjustments that serve as the foundation for precise movement. The outer control loop manages the position by continuously adjusting the velocity reference based on positional error. This division of control responsibilities enables the system to benefit from the fast response of velocity control while simultaneously ensuring precise position tracking, thereby enhancing the overall accuracy and reliability of the robot’s navigation.

2.3. Velocity Control

For the velocity control stage, a PWM signal was applied to the motor as the input in order to observe the time evolution of the process output, namely, the motor speed, which was measured by the encoder in revolutions per minute (RPM). Figure 7 shows the PWM input signal applied to the right motor, while the adjacent graph illustrates the corresponding system response to this step input. To validate the accuracy of the model, the final graph shows the identified transfer function plotted together with the actual output data, demonstrating a close match between the predicted and measured responses.
A similar methodology was followed for the left motor, with its parameters adopted from reference [15]. This article also provides a detailed overview of the data acquisition process from the motor encoder. The formula used to calculate the speed (RPM) for the right motor is presented below:
\mathrm{RPM} = \frac{60 \cdot \mathrm{counter}}{150 \cdot 3 \cdot 0.02} \approx 6.67 \cdot \mathrm{counter}
where counter denotes the number of encoder pulses recorded during a predefined sampling interval (0.02 s). This computation enabled the transformation of discrete pulse counts into continuous speed measurements, which were subsequently used to generate the motor’s output velocity graph. In addition, the system identification procedure is described, where the motor’s dynamic behavior was approximated by a first-order transfer function model. This simplified representation captures the essential characteristics of the motor response and serves as the foundation for designing effective control strategies.
Therefore, the transfer functions specified in equations
H_{f1}(s) = \frac{0.54}{0.06 \cdot s + 1}
and
H_{f2}(s) = \frac{0.9}{0.06 \cdot s + 1}
were derived. As a notation convention, any odd-numbered index refers to the right motor, while any even-numbered index refers to the left motor.
Following the system identification phase, the next step focused on the design of the controllers. The objective was to shape the closed-loop system response to emulate a desired first-order behavior characterized by the transfer function \frac{1}{T \cdot s + 1}, where the time constant T was set to 0.05 s. This design choice aimed to achieve a faster transient response and enhance the overall responsiveness of the velocity control loop.
To meet this performance criterion, two Proportional-Integral (PI) controllers were synthesized—one for each motor. The resulting control laws are expressed in equations
H_{c1}(s) = 2.22 \cdot \left( 1 + \frac{16.68}{s} \right)
and
H_{c2}(s) = 1.33 \cdot \left( 1 + \frac{16.7}{s} \right),
corresponding to the right and left motors, respectively. These controllers ensure that the closed-loop system maintains both stability and tracking accuracy, even in the presence of variations in load or surface conditions.
Furthermore, to implement the controllers on an Arduino platform, both controllers were discretized using the Zero Order Hold (ZOH) method with sample time T_e = 0.02 s, which allowed the continuous-time controllers to be converted into two recurrence equations, as shown below:
c_1[k] = c_1[k-1] + 2.2222 \cdot \epsilon_1[k] - 1.4814 \cdot \epsilon_1[k-1]
c_2[k] = c_2[k-1] + 1.3333 \cdot \epsilon_2[k] - 0.8888 \cdot \epsilon_2[k-1]
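These recurrences translate directly into a few lines of code. The sketch below is written in Python for readability, although on the robot the same update runs inside the Arduino loop every 0.02 s; the saturation to the PWM range mirrors the limits discussed later for the Simulink model, and the sensor and actuator calls in the usage comment are placeholders.

```python
class DiscretePI:
    """u[k] = u[k-1] + b0*e[k] + b1*e[k-1], clamped to the PWM range."""
    def __init__(self, b0, b1, u_min=0, u_max=255):
        self.b0, self.b1 = b0, b1
        self.u_min, self.u_max = u_min, u_max
        self.u_prev, self.e_prev = 0.0, 0.0

    def update(self, error):
        u = self.u_prev + self.b0 * error + self.b1 * self.e_prev
        u = max(self.u_min, min(self.u_max, u))      # actuator saturation
        self.u_prev, self.e_prev = u, error
        return u

right_pi = DiscretePI(b0=2.2222, b1=-1.4814)         # right motor, c_1[k]
left_pi = DiscretePI(b0=1.3333, b1=-0.8888)          # left motor, c_2[k]

# Every T_e = 0.02 s (read_rpm_right() and set_pwm_right() are placeholders):
#   pwm_right = right_pi.update(reference_rpm - read_rpm_right())
#   set_pwm_right(pwm_right)
```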

2.4. Position Control

For the data acquisition phase, a fixed speed was applied to the motors, and the rate of angular increase over time was measured using the formula:
\theta = \frac{\mathrm{counter} \times 360^{\circ}}{\mathrm{pulsesPerRevolution} \times 3}
where pulsesPerRevolution is specified in the datasheet, being 1056.7 for the right motor and 695.1 for the left motor.
It is important to note that the variable counter retains the same fundamental meaning in both the velocity and position control contexts, as it represents the number of encoder pulses. However, the key distinction lies in the way this value is managed over time. In the case of velocity control, the counter is reset at the end of each sampling period to measure the number of pulses detected within that specific interval, thereby enabling the calculation of rotational speed. By contrast, for position control, the counter is allowed to accumulate continuously over time, effectively providing an incremental measure of the total angular displacement of the motor shaft. This accumulated count serves as the basis for estimating the robot’s position throughout its navigation. In brief, the angular displacement θ is determined by counting the number of pulses generated by the encoder and normalizing this value by the total number of pulses corresponding to one full motor shaft revolution. The division by 3 accounts for the gear ratio.
In Figure 8, the first graph shows the velocity of the right motor, which has an average value of 80 RPM, followed by the position graph, which continuously increases over time, exhibiting integrator behavior. To identify the transfer function for the position, the slope of the straight line was calculated, resulting in
T_i = \frac{x_2 - x_1}{y_2 - y_1}, \qquad H_f(s) = \frac{1}{T_i \cdot s} = \frac{1}{0.014 \cdot s}.
This relation yields the same result for the left motor as well.
Figure 8. Process identification—position (right motor).
Furthermore, to integrate the velocity and position control, a cascade control approach was implemented. This strategy involves adjusting the speed applied to the motor to reach a rotation of 90 degrees within a predefined time interval. Since H f ( s ) represents an integrator, it inherently eliminates the steady-state error for the position.
Therefore, due to the simplicity of the system dynamics and the relatively low complexity of the required position control, a basic proportional controller was considered sufficient. The selected controller had the form H_c(s) = 0.07, and its effectiveness was verified through simulation in the Simulink environment presented in Figure 9. The simulation results demonstrated that this control strategy led to a steady-state position error of zero for both motor subsystems, confirming the sufficiency of the chosen gain.
Following this analysis, the controller transfer function was discretized to facilitate implementation on the microcontroller. This discretization process yielded the recurrence equation presented in
c_3[k] = (0.007 \cdot \epsilon_3[k]) \cdot \frac{1000}{6},
which defines the control signal at each sampling instant based on the computed position error. The factor 1000/6 serves solely as a conversion coefficient between the physical quantities associated with the outer control loop and those of the inner control loop.
In Figure 9, the Simulink diagram used to simulate the behavior of the system is illustrated. In the inner loop, a saturation block was used to constrain the control signal within the range [0, 255], corresponding to the PWM values that can be used with an Arduino UNO. Furthermore, conversion gains were applied to convert the 90° reference of the outer loop into the RPM-based reference required by the inner loop.
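Structurally, one sampling period of the cascade controller can be sketched as follows, reusing the DiscretePI class from the velocity-control sketch; read_angle() and read_rpm() are placeholders for the encoder-derived measurements, and the outer-loop law follows the discretized expression given above.

```python
KP_OUTER = 0.007           # discretized outer-loop proportional gain, as above
DEG_TO_RPM = 1000 / 6      # conversion between outer- and inner-loop quantities

def cascade_step(theta_ref, read_angle, read_rpm, inner_pi):
    """One sampling period (T_e = 0.02 s) of the cascade controller; returns a PWM value."""
    position_error = theta_ref - read_angle()                 # outer loop, degrees
    rpm_reference = KP_OUTER * position_error * DEG_TO_RPM    # c_3[k], velocity reference
    velocity_error = rpm_reference - read_rpm()               # inner loop, RPM
    return inner_pi.update(velocity_error)                    # saturated PWM in [0, 255]

# Example call for a 90-degree turn, repeated every period until the position error is small:
#   pwm = cascade_step(90.0, read_angle_right, read_rpm_right, right_pi)
```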

3. Results and Discussion

In this section, the performance of the designed controllers is first evaluated based on the results obtained during experimental validation. Following this, the two navigation approaches are presented in detail: one relying solely on encoder feedback and the other incorporating both encoder and gyroscope data. A comparative analysis is then conducted between the two methods, highlighting their respective strengths, limitations, and suitability for the intended application.

3.1. Velocity and Position Control

For the control stage, the final simulation results are illustrated in Figure 10, which was obtained using the Simulink environment. In Figure 10a, a reference of 100° was applied, representing the desired angular displacement of the robot. As can be observed, the system reaches the steady-state in approximately 8 s, with zero final position error. This confirms that the implemented control strategy ensures accurate tracking of the reference input.
As shown in Figure 10b, the control signal generated for both motors maintains a simple shape, with PWM values remaining below the maximum value of 255. This indicates that actuator saturation does not occur for reference angles below 100°. If a larger rotation is required, the same rotation command can be executed multiple times. For instance, to achieve a 180° rotation, the 90° rotation command can be applied two consecutive times. This approach only results in a doubled response time (which is still minimal) while preserving the system’s stability and control accuracy.
Robustness analysis was carried out for the inner control loop by varying the parameters of the fixed transfer function used for velocity control by ±30%. The results confirmed that the system’s performance remained stable and did not exhibit significant degradation, indicating good robustness with respect to parameter variations. These performance values are summarized in Table 1 and illustrated in Figure 11a, confirming the reliability of the velocity controller. Subsequently, the system’s ability to reject output disturbances in the inner loop was evaluated. As shown in Figure 11b, two step disturbances were applied: a negative step of amplitude 10 at t = 1  s and a positive step of amplitude + 10 at t = 2  s. In both cases, the controller promptly and effectively rejected the perturbations, further demonstrating the robustness of the inner loop design.
For the outer loop, robustness was evaluated by varying the time constant T_i of the fixed part by ±30%. The results showed that the system behavior remained consistent, with no significant variations, confirming the robustness of the outer loop controller to parameter changes. These findings are summarized in Table 2 and illustrated in Figure 11c. Regarding the disturbance response of the outer loop, a step disturbance with an amplitude of +5 was applied at the system output. As depicted in Figure 11d, the system did not reject the disturbance but instead accumulated it over time. This behavior is expected given the integral nature of the outer loop, which inherently integrates persistent disturbances rather than rejecting them immediately. It should be noted that applying a disturbance of the same amplitude but opposite sign would counterbalance this effect, resulting in disturbance elimination.

3.2. Maze Solving

To evaluate the performance of different control strategies for maze navigation, two experiments were conducted. The same mazes were solved using both approaches. The goal was to compare the two approaches in terms of execution time, system complexity, cost, and power consumption. These performances are illustrated in Table 3, Table 4 and Table 5.
As outlined in Table 3, the encoder-only approach exhibits distinct advantages, particularly with regard to reduced cost and energy consumption, attributable to the simplification of the robot’s physical structure. The objective of the conducted experiments is to demonstrate that a robot with a highly simplified architecture can still perform complex tasks such as maze solving. With regard to path deviation, both approaches demonstrated an accumulation of error over time. In order to address this issue, it is recommended that the robot should periodically reset its control parameters and validate sensor accuracy. The achievement of this objective may be realized through the implementation of continuous trajectory monitoring, utilizing a vision system. This would also facilitate real-time tracking of the robot’s state. Unlike the simpler approach of merely reading values from a sensor, the encoder-only configuration requires a deeper understanding of control theory. However, with a slightly increased computational effort, it can achieve performance levels comparable to the gyroscope-based configuration in maze-solving tasks (see Table 4) while benefiting from a simpler hardware design. Given the robot’s limited motor capabilities, it was not possible to achieve a significantly better response time than with the gyroscope-based approach in this experiment. Attempting to reduce the response time further would have required a more aggressive control signal, pushing the motor into saturation.
Figure 12 and Table 5 present an analysis of positioning errors in the two maze scenarios, using both the encoder-based and gyroscope-based methods. The error values are estimates and cannot be considered exact measurements as they are calculated as the Euclidean distance between the center of the start line and the center of the finish line. Discrepancies may arise between the actual distances and those computed from the images due to the camera’s angle of inclination. Additionally, as the start line is not located precisely at the edge of the robot, the values may appear slightly larger than the actual distance between the robot and the line. An accumulation of errors can be observed in both cases, especially in the second test where the path is longer. Nevertheless, the robot manages to reach the target relatively close in both situations. These experiments highlight that the two navigation strategies have comparable path deviations, with similar error magnitudes.
Figure 13 and Figure 14 illustrate the two mazes, over which a grid was superimposed to generate a matrix representation. From this matrix, the shortest path was extracted between the green line and the orange line. This path is highlighted in yellow. The corresponding movement commands were then transmitted to the robot via Bluetooth.
For the front command, the robot moves forward a fixed distance of 8.5 cm. During this forward motion, a velocity controller is employed to maintain a constant speed, thereby ensuring both consistency in movement and repeatability across different commands.
For rotational movements, namely, the left and right commands, two distinct control strategies were implemented to achieve 90°. The first strategy relies solely on encoder feedback and uses a position controller that integrates motor speed over time to compute angular displacement. The second strategy introduces additional feedback from a gyroscope (MPU6050) to monitor the robot’s rotation around the Z-axis. To reduce the influence of gyroscope noise during rotation, the system does not rely on matching an exact angular value. Instead, a threshold-based approach is applied: the robot first records its initial orientation, then starts rotating. The rotation is stopped when the difference between the current gyroscope reading and the initial value exceeds ±80 degrees. This method ensures a practical and noise-tolerant turning mechanism, reducing the risk of false detections or premature stops due to sensor fluctuations.
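The threshold-based turning logic can be summarized by the sketch below (written in Python for consistency with the other examples, although in practice it runs on the Arduino); read_yaw(), start_turn(), and stop_motors() are placeholders for the MPU6050 reading and the motor commands.

```python
def rotate_with_gyro(direction, read_yaw, start_turn, stop_motors, threshold_deg=80):
    """Rotate until the change in yaw exceeds the +/-80 degree threshold."""
    initial_yaw = read_yaw()                 # record orientation before turning
    start_turn(direction)                    # spin the wheels in opposite directions
    while abs(read_yaw() - initial_yaw) < threshold_deg:
        pass                                 # keep turning; threshold tolerates sensor noise
    stop_motors()
```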
A YOLOv8 network was trained to detect the start and finish lines, and its performance is illustrated in Figure 15. The model’s performance was evaluated based on the progression of training losses and detection metrics. Training losses decreased gradually and remained stable throughout the epochs, indicating effective learning without overfitting. In terms of evaluation metrics, the model achieved high precision, recall, and mean average precision (mAP) scores on the validation set, demonstrating its ability to detect and correctly classify target objects. Overall, the model shows good generalization and is well suited to the task.
Additionally, Figure 16 illustrates one of the validation batches, highlighting that the two lines were successfully detected despite being captured under varying lighting conditions and differences in structural appearance. This demonstrates the model’s robustness and ability to accurately identify the lines even when environmental factors and visual characteristics change.
A key advantage of the proposed vision-based approach is that the entire maze structure is captured in a single frame at the start of the process. Consequently, challenges such as dead ends do not impact the navigation strategy as the system possesses comprehensive knowledge of the environment and can compute the optimal path from the outset. In configurations with multiple possible paths, the Breadth-First Search (BFS) algorithm is used to reliably identify the shortest route between the start and finish points. However, the current implementation does not account for dynamic elements, such as moving obstacles. We recognize this as a limitation and, as future work, propose integrating real-time visual feedback from the camera to enable dynamic path correction and obstacle avoidance during navigation.
As demonstrated, navigation using only encoders, although it requires more complex analysis and advanced control strategies, can achieve results that are comparable to those obtained with a gyroscope-based approach. In some situations, it may even provide better performance.
This article places a stronger emphasis on the navigation aspect of the robot, in contrast to [12], which focuses more on solving mazes of any shape or structure. Moreover, the proposed approach can also support diagonal movement by rotating 45°, even though the robot is not equipped with omnidirectional wheels.
The robot’s hardware architecture was deliberately designed to be as minimal as possible in order to demonstrate that a complex task such as maze solving can still be successfully addressed without relying on expensive components or intricate sensor fusion. Unlike the system described in [16], which employs a variety of sensors—such as ultrasonic, inertial, and color sensors—and multiple algorithmic strategies to navigate a maze, the proposed solution utilizes only a single sensor type alongside a Bluetooth communication module. This streamlined design not only reduces cost and energy consumption but also simplifies implementation, making it especially well suited for educational use cases or scenarios where hardware resources are constrained. Despite this simplicity, the robot achieves competitive performance. For example, during Test 1, the robot reached the target using 13 movement commands (the number of yellow cells in Figure 13b), each corresponding to a step of 8.5 cm. This resulted in a total path length of 110.5 cm completed in 13.7 s (Table 4). Extrapolating from this result, the robot would be capable of traversing a 1000 cm path in approximately 2 min. This speed indicates a higher operational efficiency than that achieved by the more sensor-rich platform described in [16], which takes more time to cover this distance. These results support the idea that effective and responsive navigation in structured environments can be achieved even with minimal hardware, provided that control strategies and perception methods are well optimized.
As highlighted in [17], the gyroscope component can introduce errors, particularly due to its sensitivity during forward motion. This issue is also noted in [18], where an attempt is made to calibrate the sensor while the robot remains stationary. However, this method does not always guarantee accurate or long-term calibration. In the present paper, the issue was addressed through the use of control algorithms. These controllers provide precise movement responses. Furthermore, the control approach offers an additional advantage: it allows for a predictable execution time for each movement command. This time-based control method provides more consistency and reliability, especially when executing sequential navigation commands where timing and orientation are critical. Specifically for the gyroscope, the system waits until the robot completes a rotation by a specified angle before proceeding.
However, the system has some limitations. Both the encoder-only approach and the approach using a gyroscope exhibit error accumulation over time. In addition, the rotational phase poses a critical challenge for the encoder-only approach as any unexpected interactions during this stage, such as minor collisions or surface irregularities, can cause significant drift. As the encoders lack direct feedback on the robot’s actual orientation, these deviations remain uncorrected and ultimately affect the robot’s ability to follow precise trajectories. This issue could be addressed through a hybrid strategy combining offline processing and real-time analysis in future work. This would enable the robot to navigate from one point to another within a building without electrical power—provided it is equipped with additional ultrasonic or infrared sensors to avoid collisions with surrounding objects, particularly in cases where unforeseen obstacles are present and not previously mapped. The real-time component would support continuous trajectory monitoring, enabling the robot to correct errors introduced by the encoder or gyroscope.
In scenarios where the maze cannot be fully captured within a single camera’s field of view, a distributed vision system is a promising solution: multiple cameras are deployed at different vantage points, each capturing a portion of the environment, and the system collectively reconstructs the complete maze layout. The cameras can then operate cooperatively, sharing their local observations to build a global map, which can be used for path planning and navigation. This approach enables robots to handle larger and more complex mazes that exceed the spatial limitations of a single camera setup. While it introduces new challenges such as synchronization, calibration, and data fusion across cameras, it also opens valuable avenues for future development and real-world applicability.

4. Conclusions

Thus, this project proposes a low-cost and flexible solution for autonomous maze navigation in which the pathfinding process is carried out through image processing techniques. Unlike traditional approaches that rely on predefined maze dimensions, this method does not require the maze to have a fixed number of cells. Instead, a grid is dynamically overlaid on the processed image, and each cell is analyzed to determine whether it contains obstacles or open space. The result is a binary matrix representation of the maze, which can then be used for path planning. To compute the optimal route from the start line to the finish line, the Breadth-First Search (BFS) algorithm is employed. This graph-based search strategy guarantees the shortest path, making it suitable for environments where efficiency and minimal travel distance are required.
Additionally, this paper presents a navigation component and an image processing pipeline based on neural networks. Specifically, a custom model was trained using YOLOv8 to facilitate the detection of the start and finish lines, while depth analysis using MiDaS_small proved robust to variations in shape and lighting conditions for obstacle detection. Integrating the control system with vision-based processing yielded a cost-effective solution capable of navigating mazes in various configurations.
Moreover, by integrating additional infrared or ultrasonic sensors and enhancing the obstacle detection capabilities, the proposed low-cost robot can be seen as a scalable prototype for several real-world applications. In healthcare environments, it could assist with medication delivery or basic cleaning tasks if equipped with suction components. By integrating a microphone and a speaker, the system could also interact with patients, asking daily health-related questions and helping assess their condition through simple dialogue, similar to the NAO robot described in [19]. In industrial settings, a scaled-up version could serve as a transport robot in structured factory zones, where paths are predefined and new obstacles are rare. While the current system demonstrates effective performance in controlled scenarios, future work will focus on improving robustness and adaptability for deployment in dynamic and larger-scale environments.

Author Contributions

Conceptualization, M.-S.M.; methodology, M.-S.M.; software, M.-S.M.; validation, M.-S.M. and E.-H.D.; formal analysis, M.-S.M.; investigation, M.-S.M.; resources, M.-S.M.; data curation, M.-S.M.; writing—original draft preparation, M.-S.M.; writing—review and editing, M.-S.M.; visualization, M.-S.M.; supervision, E.-H.D.; project administration, M.-S.M. and E.-H.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The relevant data are presented in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wiley, V.; Lucas, T. Computer vision and image processing: A paper review. Int. J. Artif. Intell. Res. 2018, 2, 29–36.
2. Ktistakis, I.P.; Goodman, G.; Britzolaki, A. Applications of AI in healthcare and assistive technologies. In Advances in Assistive Technologies: Selected Papers in Honour of Professor Nikolaos G. Bourbakis–Vol. 3; Springer: Berlin/Heidelberg, Germany, 2021; pp. 11–31.
3. Niku, S.B. Introduction to Robotics: Analysis, Control, Applications; John Wiley & Sons: New York, NY, USA, 2020.
4. Soares, D.M.; Calil, H.A.; Stephan, R.M. Cascade control vs full-state feedback. In Proceedings of the 2019 IEEE 15th Brazilian Power Electronics Conference and 5th IEEE Southern Power Electronics Conference (COBEP/SPEC), Santos, Brazil, 1–4 December 2019; pp. 1–6.
5. Alamri, S.; Alshehri, S.; Alshehri, W.; Alamri, H.; Alaklabi, A.; Alhmiedat, T. Autonomous maze solving robotics: Algorithms and systems. Int. J. Mech. Eng. Robot. Res. 2021, 10, 668–675.
6. Ranftl, R.; Lasinger, K.; Hafner, D.; Schindler, K.; Koltun, V. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1623–1637.
7. Alatise, M.B.; Hancke, G.P. A review on challenges of autonomous mobile robot and sensor fusion methods. IEEE Access 2020, 8, 39830–39846.
8. Ambeskar, A.; Turkar, V.; Bondre, A.; Gosavi, H. Path finding robot using image processing. In Proceedings of the 2016 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 26–27 August 2016; Volume 3, pp. 1–6.
9. Pame, Y.G.; Kottawar, V.G.; Mahajan, Y.V. A Novel Approach to Maze Solving Algorithm. In Proceedings of the 2023 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 1–3 March 2023; pp. 1–6.
10. Ambeskar, A.; Bondre, A.; Turkar, V.; Gosavi, H. Intuitive solution for Robot Maze Problem using Image Processing. In Proceedings of the 2019 IEEE Bombay Section Signature Conference (IBSSC), Mumbai, India, 26–28 July 2019; pp. 1–6.
11. Gao, W.; Zhang, X.; Yang, L.; Liu, H. An improved Sobel edge detection. In Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology, Wuxi, China, 4–6 June 2010; Volume 5, pp. 67–71.
12. Avila-Sánchez, L.A.; Sánchez-López, C.; Ochoa-Montiel, R.; Montalvo-Galicia, F.; Sánchez-Gaspariano, L.A.; Hernández-Mejía, C.; González-Hernández, H.G. Maze Solving Mobile Robot Based on Image Processing and Graph Theory. Technologies 2023, 11, 171.
13. Aqel, M.O.; Issa, A.; Khdair, M.; ElHabbash, M.; AbuBaker, M.; Massoud, M. Intelligent maze solving robot based on image processing and graph theory algorithms. In Proceedings of the 2017 International Conference on Promising Electronic Technologies (ICPET), Deir El-Balah, Palestine, 16–17 October 2017; pp. 48–53.
14. Keyestudio. Keyestudio Smart Little Turtle Robot Car V3.0 for Arduino STEM Programmable Robot Kit. 2025. Available online: https://www.keyestudio.com (accessed on 27 May 2025).
15. Mănase, M.S. Design of a Low-Cost Distributed Cooperative Monitoring System. In Proceedings of the ACSC Conference, Cluj-Napoca, Romania, 26 May 2023.
16. Alamri, S.; Alamri, H.; Alshehri, W.; Alshehri, S.; Alaklabi, A.; Alhmiedat, T. An autonomous maze-solving robotic system based on an enhanced wall-follower approach. Machines 2023, 11, 249.
17. Bienias, Ł.; Szczepański, K.; Duch, P. Maze exploration algorithm for small mobile platforms. Image Process. Commun. 2016, 21, 15.
18. Ecsedi, E.; Silaghi, H.; Mihok, E.; Spoială, V. The development of an autonomous maze robot. In Proceedings of the 2019 15th International Conference on Engineering of Modern Electric Systems (EMES), Oradea, Romania, 13–14 June 2019; pp. 169–172.
19. Zguda, P.; Radosz-Knawa, Z.; Kukier, T.; Radosz, M.; Kamińska, A.; Indurkhya, B. How Do Older Adults Perceive Technology and Robots? A Participatory Study in a Care Center in Poland. Electronics 2025, 14, 1106.
Figure 1. System block diagram.
Figure 2. System block diagram—image processing.
Figure 3. The steps used in image processing: (a) Original image. (b) Sobel edge detection. (c) YOLOv8 detection. Color-based detection. (d) Final path extracted from processed image.
Figure 4. Depth detection.
Figure 5. Turtle robot.
Figure 6. System block diagram—robot design.
Figure 7. Process identification—velocity (right motor).
Figure 9. Cascade control scheme implemented in Simulink.
Figure 10. System behavior: (a) The output of the outer loop. (b) The command applied to the motors.
Figure 11. System robustness and disturbance rejection: (a) Robustness—inner loop. (b) Disturbance—inner loop output ±10. (c) Robustness—outer loop. (d) Disturbance—outer loop output +10.
Figure 12. Error estimation: (a) Test1—encoder. (b) Test1—gyroscope. (c) Test2—encoder. (d) Test2—gyroscope.
Figure 13. Test 1 results: (a) Original maze image with YOLO detection. (b) Final path extracted from the processed image.
Figure 14. Test 2 results: (a) Original maze image with YOLO detection. (b) Final path extracted from the processed image.
Figure 15. Performance of the YOLOv8 Model.
Figure 16. Validation samples—YOLO detection network.
Table 1. Step response performance under parameter variations for left motor.

Test Case         Settling Time (s)    Overshoot (%)
Initial System    0.12                 1.2
T +30%            0.29                 5.16
K +30%            0.09                 1.97
K −30%            0.19                 0.45
T&K +30%          0.25                 6.28
Table 2. Step response performance under parameter variations of T_i for position control.

Test Case         Settling Time (s)    Overshoot (%)
Initial System    7.98                 0
T_i +30%          10.01                0
T_i −30%          5.55                 0
Table 3. Comparison between encoder-based and gyroscope-based navigation.

Metric               Encoders Only    Encoders + Gyroscope
Complexity           Worse            Better
Path Deviation       Similar          Similar
Component Cost       Better           Worse
Power Consumption    Better           Worse
Table 4. Execution time comparison between navigation methods.

Test #    Encoders Only (ms)    Encoders + Gyroscope (ms)
Test 1    13,778                13,780
Test 2    32,143                32,233
Table 5. Positioning error for navigation methods.

Test               Output (px)     Target (px)     Error (px)    Error (cm)
Test1_Encoder      (522, 821)      (386, 796)      138.28        6.06
Test1_Gyroscope    (365, 346)      (345, 251)      97.08         7.78
Test2_Encoder      (1123, 1727)    (1077, 1532)    200.35        9.36
Test2_Gyroscope    (1228, 1681)    (1061, 1481)    260.56        13.34
