Multi-Neural Network Localisation System with Regression and Classification on Football Autonomous Robots

Lopes, Carolina Coelho; Ribeiro, António; Ribeiro, Tiago; Lopes, Gil; Ribeiro, A. Fernando

doi:10.3390/ai6020027

Open AccessArticle

Multi-Neural Network Localisation System with Regression and Classification on Football Autonomous Robots

by

Carolina Coelho Lopes

¹

,

António Ribeiro

¹

,

Tiago Ribeiro

²

,

Gil Lopes

³

and

A. Fernando Ribeiro

^2,*

¹

Industrial Electronics Department, University of Minho, 4800-058 Guimarães, Portugal

²

Industrial Electronics Department, ALGORITMI Centre, 4800-058 Guimarães, Portugal

³

LIACC & ISEP, Polytechnic Institute of Porto, 4249-015 Porto, Portugal

^*

Author to whom correspondence should be addressed.

AI 2025, 6(2), 27; https://doi.org/10.3390/ai6020027

Submission received: 27 November 2024 / Revised: 22 January 2025 / Accepted: 3 February 2025 / Published: 5 February 2025

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

Download

Browse Figures

Versions Notes

Abstract

:

In environments like the RoboCup Middle Size League (MSL), precise and rapid localisation of robots is crucial for effective autonomous interaction. This study addresses the limitations of conventional localisation approaches—often based on single-camera systems or sensors such as LiDAR (Light Detection and Ranging) and infrared—by developing a robust Artificial Intelligence (AI)-based multi-camera system solution. This method uses multiple neural networks, breaking down the problem while taking advantage of both classification and regression methods. The solution includes a classification neural network to detect field markers, such as line intersections, and two regression neural networks: one for calculating the position of the markers, and another for determining the robot’s position in real-time. It takes advantage of both approaches while maintaining the desired performance, accuracy, and robustness, simplifying the training process and adapting it to different scenarios. Designed specifically to meet MSL robotics’s high-speed demands and precision requirements, the system employs data augmentation techniques to ensure resilience against lighting, angles, and position variations. The results show that this optimised approach improves spatial awareness and accuracy, promising robot football advancements. Beyond MSL applications, this method has the potential for broader real-world uses that require dependable, real-time localisation in dynamic settings.

Keywords:

neural networks; classification; regression; multi-camera localisation; artificial intelligence; marker-based positioning; autonomous robot; robotic vision systems; RoboCup; self-localisation

1. Introduction

Started in 1997, RoboCup is the largest worldwide robotics challenge [1], and it was created to promote the development of robotics research and artificial intelligence. The main challenge consists of making a team of robots to play football against a team of humans in the year 2050 [2]. The challenge is not the football itself, but all the research and development around it in areas such as Artificial Intelligence, Computer Vision, Control, Localisation, Multi-Agent Cooperation [3], Mechanics, and Communication [4], which aims to drive advancements in other fields. Over the years, other challenges have come under consideration with the creation of other leagues like Rescue, @Home [5], Industry, and Junior. Within the RoboCup football challenge, several different leagues exist. The Middle Size League (MSL) features robots weighing up to 40 kg and standing 80 cm tall, with no external sensors permitted [6]. The Small Size League (SSL) features robots 200 mm in diameter, and its vision system is centralised with a single camera above the field that serves the entire team. The Standard Platform League (SPL) features robots from the same manufacturer (NAO robots), and the challenge is based on software. The Humanoid Leagues (Kids, Teens, and Adults) feature two-legged robots playing in teams of two robots each. One of the research topics in all these football leagues is robot localisation, which is a difficult task due to many factors, such as the robot’s speed, wheels/legs being slippery, collisions, the uncertainty of the data acquired, etc. On top of that, in the MSL, no external sensors are allowed, which makes it even more difficult to calculate the robot’s position. Due to the high speeds at which these robots move (some play at up to 8 m/s) [7], the time to calculate the robot’s position in the field is critical, and the code must be thoroughly optimised. The LAR@MSL team [8] from the Laboratory of Automation and Robotics at the University of Minho (Portugal) developed a new approach that uses a vision system to grab 360-degree images and uses a neural network to achieve localisation based on the distance and angle of known markers in the field. This solution was used in the 2024 RoboCup World competition event and proved to be very successful and affordable (both in terms of price and computational cost).

2. State of the Art: Intelligent Football Robots and Localisation

The field of intelligent football robots is expanding rapidly, with applications ranging from football gameplay and facility patrolling to assistance for people with disabilities [5]. Central to their functionality is the challenge of localisation—determining the robot’s precise position and orientation within its environment. Conventional localisation methods use single-camera setups or combinations of sensors, such as LiDAR (Light Detection and Ranging) or IMUs (Inertial Measurement Units). While LiDAR is robust to shadow effects, other approaches can face limitations in dynamic environments, particularly due to varying lighting conditions and shadow interference.

Neural networks have emerged as a promising alternative. They learn to map visual features to a robot’s position and orientation, offering greater robustness than traditional methods. This paper reviews advancements in localisation, taking advantage of both classification and regression neural networks. It primarily focuses on the RoboCup initiative and current localisation systems.

2.1. Localisation Systems for MSL Football Robots

As the competition has intensified, robust vision systems have become essential for enabling MSL robots to navigate the field and interact strategically. Key localisation methods [9,10] developed for these robots include:

Triangulation Approach: Utilises coloured goals and posts as landmarks [11];
Geometry localisation: Employs Hough transforms to detect field lines and markers [12,13];
Monte Carlo localisation (MCL): Applies Bayesian filtering with particles to estimate position [14,15,16];
Matching Optimisation localisation: Enhances accuracy by aligning visual features to field markers [17];
Template Matching: Transforms the image to simulate a top view from the robot itself and finds the spot on the field with the highest matching percentage [18].

With the shift from coloured to white goal nets, MCL, matching optimisation, and template matching remain the most widely used methods due to their robustness.

2.2. Monocular Vision

Monocular vision systems are widely used in humanoid robotics [19,20,21]; however, they inherently limit the field of view, thereby restricting the amount of data that can be captured. To achieve a 360° field of view in single-camera setups, catadioptric vision systems are often utilised. These systems combine a convex mirror with an upward-facing camera, enabling omnidirectional image acquisition.

This is critical for self-localisation, object detection and tracking, and movement predictions. Teams often supplement this with a forward-facing camera for improved precision in specific tasks. The Tech United Eindhoven team uses robot localisation based on the goals and triangulates the position based on the detected goalposts [22]. The NuBot team uses YOLO v2, with CUDA-enabled parallel processing for real-time accuracy [23]. Both teams face hardware constraints but continue innovating with GPU enhancements.

2.3. Multi-Camera Systems and Advanced Visualisation

Multi-camera systems, such as the four-camera setup employed by the Falcons team [24], offer increased resolution and a complete panoramic view, enhancing spatial awareness. Although challenging in terms of synchronisation and processing, these systems provide a critical competitive edge.

2.4. Odometry

Odometry, often using encoders or compasses, allows robots to approximate their field position based on movement data [25]. While magnetic interference previously deterred compass-based methods, modern solutions integrate gyroscopes and Kalman filters for greater stability. The former LAR team used vision-based methods, specifically histograms for orientation [18], where rotation attempts are made until maximum histogram values are reached to determine orientation.

2.5. Computer Vision

Computer vision techniques allow robots to detect objects and navigate fields using real-time visual cues and libraries such as OpenCV. This capability is essential for tracking objects and avoiding false positives, although processing demands can strain system resources.

2.6. Self-Localisation with Deep Learning

Deep learning has revolutionised robot self-localisation, particularly through CNNs trained on extensive datasets to generalise across new environments [26]. While these models excel in complex environments, they require significant computational resources and labelled data. Deep learning offers robust, data-driven solutions to dynamic localisation [27], paving the way for more adaptable robots in the MSL.

This comprehensive review underscores the continued evolution of localisation methods from traditional vision-based approaches to innovative neural network applications. It highlights the strategic challenges and technological demands of autonomous football robots in the MSL context.

2.7. Summary of Localisation Methods

A concise comparison of localisation methods used in robotic systems is presented in this section. This summary emphasises the trade-offs inherent in each approach, providing insights into their applicability across different scenarios. Table 1 highlights the strengths and weaknesses of commonly employed techniques, from traditional methods to advanced neural network-based solutions.

3. Robot Hardware

The robots’ hardware and mechanical description are presented in this section to help better understand the development environment for the localisation system. These robots acquired all real-world results presented in this paper. Every robot is divided into different modules, each responsible for a specific task. The modules have the necessary sensors and actuators, but some sensors are also necessary for multiple robot parts. Figure 1 shows a robot schematic with all its modules, its interaction with the base station, and the protocols used between each model.

The main sensor in these robots is the OmniDirectional Vision System. This system allows the robots to perceive the surrounding environment and gather information that will be used in the presented self-localisation solution.

Vision System

This system is positioned at the highest point of the robot which is allowed by the rulebook to take as much advantage of its height as possible and to make it easier to see objects on the floor. The head is equipped with three cameras, each out of phase by

120^{\circ}

, tilted downwards at an angle of

30^{\circ}

, and positioned around the central axis of the robot. Figure 2 shows both the robot and its head.

Each camera works with a lens with a maximum field of view (FOV) of 180°; however, to achieve higher frame rates with a lower resolution (

640 \times 480

pixels), the image is cropped by the sensor, limiting the FOV to 125°. With the three cameras, this is still enough to cover the 360° around the robot, so there are no blind spots. Each camera (Kayeton Technology Co., Ltd., Shenzhen, China, https://www.kayetoncctv.com/Product/Info/360?cid=925&page=1, accessed on 5 May 2024) uses an OV2710 sensor. Table 2 summarises the configurations that the sensor operates under.

The head was designed and 3D printed to withstand impacts from other robots and ball kicks while ensuring the camera’s calibrated position remains precise. The head comprises three parts: a high-precision component that holds the camera at the desired angles, a lower part that puts the cameras at the desired height, and a top part that covers the cameras and holds the Inertial Measurement Unit (IMU) and Liquid Crystal Display (LCD).

The three cameras, in addition to their high framerate and high FOV, provide auto-brightness and auto-exposure to further improve the collected image quality. In addition to the three-camera system, a digital compass (Adafruit Industries: New York, NY, USA, https://www.adafruit.com/product/4754, accessed on 5 May 2024)—in this case, the Adafruit BNO085—is also used. It returns information from the gyroscope, accelerometer, and magnetometer, but the only data used are the filtered yaw values to obtain the robot’s orientation on the field.

4. Developed Solution

The proposed solution utilises neural networks to interpret visual data (images) and determine the precise position of robots on the playing field. This type of system is particularly suited to dynamic environments, such as robot football matches, where it is essential for robots to accurately locate themselves to avoid collisions and coordinate with their teammates. Figure 3 represents the general architecture of the developed solution.

4.1. General Architecture

The system comprises four main components:

Image Acquisition via Cameras: The process begins with image capture by cameras mounted on the robot. These three cameras, positioned at various angles, provide a comprehensive view of the surrounding environment, capturing field markings and any other relevant visual elements for navigation;
Marker Detection Neural Network: The captured images are processed by a classification neural network dedicated to visual marker detection. These markers are the regulatory field markings defined by FIFA (Fédération Internationale de Football Association) and the Middle Size League (MSL) rules [6];
Coordinate System Translation Neural Network: The data from marker detection are processed by a neural network that translates the coordinate system. This involves converting (u, v) coordinates (derived from pixel-based images) into the field’s coordinate system (Angle, Distance), where the robot operates. The result is a transformation of image coordinates into three-dimensional (although the z coordinate “height” is not considered during the game) physical space in real-world units, such as metres;
Integration with Orientation Sensors: To further enhance localisation and navigation accuracy, data from a compass and other orientation sensors are integrated into the system. This additional information allows the robot to adjust its orientation relative to visual markers, ensuring precise measurements and detection;
Localisation Neural Network: The previously processed data are fed into a regression neural network that accurately estimates the robot’s exact position on the field, resulting in (x, y) coordinates, also expressed in metres. This localisation process uses previously identified visual references to position the robot in the environment accurately and continuously.

4.2. Markers

The proposed solution relies on detecting and recognising visual markers corresponding to standard football field markings, as defined by FIFA and MSL regulations. Unlike systems that use artificial or added markers on the field, this approach is based entirely on the regulatory markings already present on the playing field, making the method exceptionally natural and practical.

These visual markers include fundamental elements such as the four corner marks, the two penalty spots, the centre circle, and boundary lines. In addition, geometric shapes naturally appearing on a football field’s lines are also considered: “L” shapes (where two straight lines meet at a right angle, excluding the field corners), “T” shapes (intersections between area lines or centre field lines), and “+” shapes, such as those found at the field’s centre. Figure 4 illustrates the considered visual markers.

The field corners are represented in Figure 4 by dark blue arcs located at each of the field’s four extremities. The penalty spots are the red circles positioned in the respective penalty zones, while the centre of the field is represented by a purple circle at the centre of the pitch.

Furthermore, there are additional markers that assist in robot localisation. The “T”—shaped markers (in cyan) are distributed along the side and centre lines at strategic positions. The “L”—shaped markers (in yellow) are placed near the goals and sidelines, providing more detailed spatial reference points.

Finally, the goalposts are represented by green markers at each end of the field, which is crucial for detecting the robot’s position relative to the goals.

Table 3 summarises the full list of markers, their quantity, and the assigned colours for graphical representation in the subsequent image.

This robot self-localisation system does not require any modifications to the playing field by using these preexisting, regulated markings. The cameras mounted on the robots capture these markings, which serve as reference points for both human players and robots. The classification neural network detects these markings and uses them as reference points to pass down to the regression neural network and estimate the robot’s precise position in real coordinates within the field’s three-dimensional space.

Furthermore, this localisation method mirrors how humans orient themselves on a football field, as human players rely on field lines and markings for positioning. The proposed system emulates this process, using the same visual references that any human player would do during a match, making the robot localisation process more intuitive and effective.

4.2.1. Inital Phase: Simulation with Webots

In the initial stage, all development was conducted in a simulated environment using an open-source robot simulator with high performance, Webots R2023a [28]. A scenario was created that replicated real game conditions, including the configuration of the robots, their cameras, and the most similar intrinsic and extrinsic parameters for image capture so that the simulated data would closely match reality. Within this virtual environment, the robot was programmed to position itself around the field, capturing images through its cameras. Figure 5 shows the similarities between a robot’s simulated view and a real-world view on a similar field position.

These simulated images formed the initial basis of the dataset. To maximise the efficiency of this process, an algorithm was developed to identify and record the bounding boxes of the different markers detected in the images, visible in Figure 6. This algorithm accounted for the distance and resolution limitations of the cameras, automatically generating a labelled dataset in a format compatible with the YOLO (You Only Look Once) framework, which is widely used for object detection [29].

The primary motivation for this initial phase in a simulated environment was to leverage YOLO’s auto-annotation tool, which would later be applied to images captured in a real environment. The idea was that, through simulation, a robust detection model could be constructed and then directly transferred to real-world images with minimal human intervention.

4.2.2. Real-World Stage: Manual Annotation and Validation

However, upon migrating this initial model to the real environment, several inconsistencies arose in YOLO’s auto-labelling. The classification network generated from the simulated images failed to accurately detect markers in real images, necessitating significant manual correction of the annotations.

Therefore, to refine the dataset, it was necessary to capture real images and extract frame-by-frame from videos of robots operating on the field. The robots were placed in various positions around the playing area, ensuring that all markers (corner lines, penalty marks, centre circle, etc.) were repeatedly detected from different angles and distances, improving the dataset’s variability.

In addition, Roboflow’s Data Augmentation tool was used to overcome the limitation of the number of real images captured and to enhance model robustness in recognising markers under different conditions. This functionality enabled the dataset to be expanded by generating new variations from the original real images. Table 4 lists some permutations used to create new images.

These operations increased the variability of the dataset, resulting in an approximately threefold increase in the original number of images, corresponding to the platform’s free maximum limit. This dataset expansion was essential to ensure that the trained network could handle different situations and visual conditions, promoting a more generalised model adapted to the variations of a real game environment.

Despite the challenges encountered with the auto-labelling tool, the manual annotation process resulted in a more robust and accurate dataset. The meticulous review of each image ensured that the neural network had a sufficiently rich database to automatically detect the visual markers in the real environment with a high success rate.

In summary, the final dataset was constructed using a combination of simulated and real images, with an iterative process of annotation and manual verification. This ensured that the marker detection system was optimised for real-world conditions, even with the limitations imposed by the initial simulation. Table 5 shows the number of images for each marker. Figure 7 shows data distribution and characteristics used during the YOLO model training.

4.2.3. Training of the Marker Detection Neural Network

Among the numerous network architectures and versions of YOLO available for object detection, the smaller model (YOLOv10s) was chosen due to its balance between simplicity and efficiency. This model provided the desired performance level while ensuring the appropriate accuracy for the training requirements. This study required the detection and differentiation of seven unique markers, and YOLOv10s proved suitably equipped to handle this complexity.

A range of hyperparameter configurations was experimented with to achieve optimal results. It was determined that training with 1500 epochs was appropriate to maximise the learning potential, alongside a patience parameter of 100 to prevent premature termination of training in cases where improvements stalled. Additionally, an automated batch-sizing approach was employed to enhance computational efficiency and ensure stability throughout the training process. The model’s input image size was stretched to 640 × 640 pixels, which was chosen to balance the computational load and the fidelity required for accurate detection.

Figure 8 displays both the Precision-Recall Curve and the Normalised Confusion Matrix of the used network.

4.2.4. Limitations

The self-localisation system based on visual marker detection has certain limitations inherent to the conditions of the environment in which it operates. Firstly, a significant restriction concerns the maximum distance at which markers can be detected. Due to the resolution of the cameras installed on the robots (

640 \times 480

pixels), this maximum distance was experimentally determined to be 6 m. This value was obtained after testing in both simulated and real environments, confirming that this is the optimal limit for ensuring accurate detection based on the physical characteristics of the official MSL field (22 m × 14 m).

However, this 6-metre range is less than half the total field dimension, meaning that in certain situations, especially when the robot is positioned at the far ends of the field, marker detection is compromised. Figure 9 illustrates the worst-case scenario, considering the division of one-half of the field into three sections, which occurs in the most distant zone (the outer two-thirds of the playing area), where detection is most challenging. This imposes a significant challenge for the system, as the robot may lose the necessary visual reference for precise localisation.

Another limiting factor relates to the dynamics of the game. During a match, it is common for many of the visual markers to be obscured by other robots or moving objects, which impairs the robot’s localisation capability. This issue directly affects the robustness and reliability of the solution in real-game situations.

4.2.5. Advantages

This approach has a clear advantage over traditional localisation methods, which often rely on detecting colour transitions (e.g., green for the field and white for the lines) and are limited by the need for a field with specific visual characteristics.

Another significant advantage of this approach is its resilience to lighting variations. Traditional methods relying on colour are particularly vulnerable to variations in ambient lighting, which can occur even in indoor environments throughout the day. However, the cameras used in this system have an auto-calibration function, and the neural networks involved can adapt well to these variations, ensuring robust detection of markers regardless of the lighting conditions.

Additionally, data augmentation enhances the network’s ability to generalise and accurately detect markers under various lighting and environmental settings.

4.3. Camera-World Coordinates Transformation

Since the game’s coordinate axes’ origin is located at the centre of the field, and given that the cameras grab the field image, it was necessary to establish a relationship between the screen coordinates (u, v) and the ground coordinates (x, y). The utilised cameras are low-cost models and, therefore, do not possess known lens parameters, which prevents the use of conventional mathematical models for coordinate correction and transformation. Given this limitation, a meticulous calibration process was employed, where real-world distances and angles (subsequently converted into field coordinates, x and y) were related to the screen coordinates (u, v).

4.3.1. Calibration Procedure

The calibration was performed using real positions with known coordinates relative to the robot. A checkerboard-like ground surface was used, with

195 mm

squares. The robot was positioned on this grid, with its centre at the vertex of one square, which was considered the origin of the axes (relative to the robot), and the image captured from the camera aligned horizontally with the pattern. This alignment was crucial to ensure that distortions in the image could be minimised during the pre-processing stage, as shown in Figure 10.

During the calibration, distances and angles between the robot and the vertices of the grid were captured within an area of up to 15 m in front of the robot and up to 3 m on each side. The screen coordinates (u, v) for each vertex were automatically recorded and then manually adjusted to correct for minor misalignments. Figure 11 shows all the grid vertices that were taken into account, marked in blue.

After recording these values, mathematical linear interpolation was considered to convert the coordinates. However, this approach was rejected because it created straight segments between known points, which could introduce errors throughout the image. This would compromise the precision necessary for the robot’s localisation.

4.3.2. Use of Neural Networks for Conversion

To overcome the limitations of traditional interpolation methods, a neural network was chosen, capable of capturing the non-linear relationship between screen coordinates and field coordinates with greater precision and without error accumulation. In this network, the input consists of the image coordinates (u, v), while the output consists of an angle and a distance, calculated relative to the robot’s centre, based on the real-world coordinates

(x, y)

. This approach eliminates the need for explicit formulas for coordinate transformation, providing a system adaptable to lens or positioning variations.

The model was developed in Python 3.10.12 using the Keras 3.1.1 library and trained on a MacBook Air with an M2 processor. To simplify the network and the calculation process, the x and y variables were treated separately, reducing the complexity of the architecture. The initial dataset contained 545 points, each consisting of u and v coordinates, real distance, and real angle. The network was trained for 2000 epochs, with a batch size of 256, using the Adam optimiser and the mean squared error loss function. When evaluating the network results for all pixels, two graphs were generated for both distance and angle. The angle graph proves that the angle fluctuates between positive and negative values based on the reference side. This variation occurs because the centre of the image represents the zero angle, indicating the direction the robot is facing. The distance graph indicates that the distance increases as it goes further up in the image, showing that lower u values reflect a higher distance. In contrast, higher values represent a lower distance value. Both graphs exhibit non-linearity due to the lens curvature, and the first 100 rows of pixels are excluded because the information is irrelevant. It is also notable in the angle graphic that the fisheye effect is more visible in pixels with higher u values near the robot, but further away, the angle becomes more linear. Figure 12 shows a graphic representation of the obtained results.

4.3.3. Integration into a Lookup Table for Fast Localisation

Since some robots operate at high speeds (up to 8 m/s), the localisation system must provide real-time responses to ensure accurate navigation. To optimise the response time, and because the possible outputs given by the network are limited by the image size, a lookup table was created with pre-calculated screen coordinates, allowing the system to directly access the real coordinates from memory. This way, the coordinate transformation is performed with no processing cost during robot operation as there are a finite number of possible solutions for the localisation problem. This ensures the system avoids redundant calculations by reusing pre-computed data, enhancing its computational efficiency and response speed. This method requires

4.91 MB

of raw data and uses

14.74 MB

of memory in Python, with

7.37 MB

allocated for angles and another

7.37 MB

for distances, all loaded at the start of the code. These memory requirements correspond to the current camera configuration, which operates at a resolution of

640 \times 480

pixels. Scaling to higher resolutions would not present significant challenges; for instance, for a full HD image (

1920 \times 1080

pixels), the lookup tables for both distances and angles would occupy

33.17 MB

of raw data and

99.53 MB

in Python 3.10.12. The discrepancy between raw data and Python memory usage arises from the way Python handles data types.

4.4. Localisation Neural Network

The localisation neural network plays a critical role in the robot’s positioning system, determining its precise coordinates on the playing field. Unlike other networks used in the system, this regression network does not work directly with raw visual information. Instead, it receives processed data as input, i.e., angles and distances from previously detected markers, already converted into real-world coordinates by the first regression neural network. These data also include orientation information provided by the robot’s compass, enabling the network to infer the position in space accurately and robustly.

4.4.1. Neural Network Architecture

The network architecture is a feedforward neural network composed solely of dense (fully connected) layers. The network has 31 inputs, corresponding to the full number of markers, whose angle and distance information are concatenated to generate an input vector. There is no use of CNNs or any other type of convolutional layer, as the focus is on processing already structured numerical data rather than raw images. Figure 13 illustrates the used regression neural network architecture.

The simplicity of the feedforward architecture proved effective for this type of problem, as the data had already undergone a process of extraction and conversion. The design of this network prioritised processing speed to synchronise with the robot communication system and to ensure the ability to generalise across different game scenarios.

4.4.2. Neural Network Training

Before being introduced to the neural network, the marker data underwent a preparation and cleaning process. To simulate real-world uncertainties, random noise was added to both the angle and distance of the markers during training. This noise was introduced to make the system more resilient and ensure the network can handle variations in sensor readings during the game. Specifically, the following adjustments were made:

Angle: A noise of up to ±10 degrees was added to the angle value, simulating slight variations in angular readings;
Distance: The distance to the markers was also adjusted with a noise of up to ±20 cm, reflecting variations in distance measurement due to different environmental conditions.

Furthermore, a maximum visibility threshold for the markers was defined. For instance, the goalposts, which are critical markers, have a visibility limit of 10 m, while other markers have a limit of 6 m. Should the distance to a marker exceed this limit or be too close (less than 0.35 m), it is considered invalid and, therefore, ignored (i.e., the distance value is adjusted to −1). This value was derived from the fact that the robot cannot successfully detect markers too close (around 35 cm) to its position.

After validating and preparing the required data, the angle and distance values were normalised to a range between 0 and 1. The angle was circularly converted to this range, while the distance was normalised such that a value close to 1 represented a nearby marker, and 0 represented a marker at the edge of its visibility. Additional steps included randomly invalidating one marker per data instance and setting it to −1 to enhance resilience to missing data. Lastly, all markers were re-ordered based on their normalised distance, prioritising closer markers, as these are more reliable and, therefore, expected to have a greater impact on the network’s learning. It was observed that without this reordering, the network became unreliable whenever a robot passed over a marker, even if it was distant. In the neural network training process, 40,000 data points were used, distributed across the entire football field, which measures 24 m along the X-axis and 16 m along the Y-axis. When considered in metres, this results in an average density of approximately 104 points per square metre, providing the network with a substantial number of samples to learn the various positions on the field. There was approximately one point for every 96 cm² of the field. This distribution ensured an average spacing of approximately 9.79 cm (

\sqrt{96}

) between adjacent points. This spacing is smaller than the smallest measured element on the field, which is the width of the lines at 12.5 cm. Furthermore, this spacing represents half of the noise induced in the distance measurements during network training, where a noise of ±20 cm was applied. In this way, the spacing ensured sufficiently dense sampling to capture significant variations in the robot’s position while maintaining an acceptable margin of error during the training process. Figure 14 shows the distribution of coordinates.

4.4.3. Evaluation of Results for the X-Axis Network

The regression network developed for the X-axis was trained over 500 epochs, with an initial learning rate of 0.0002. This learning rate was adjusted throughout the training whenever the “

v a l_l o s s

” metric stopped improving. The function used was “ReduceLROnPlateau” with an adjustment factor of 0.95 and a patience of 15. This means that whenever the selected metric did not improve over 15 epochs, the learning rate was adjusted to

95 %

of its current value.

After training this neural network, an analysis was conducted to evaluate its accuracy. The performance was measured in centimetres of error in the predictions of the robot’s position along the X-axis.

Table 6 illustrates the number of occurrences within different error intervals, ranging from 0 cm to 100 cm, along with the corresponding accuracy percentages.

Figure 15 demonstrates the obtained results.

Analysing both Table 6 and Figure 15, the performance of the network can be described as follows:

Initial Accuracy: Out of the total number of 40,000 coordinates, 4289 occurrences had no error, corresponding to $10.72 %$ of the total. Some 16,862 coordinates had an error of up to 10 cm, representing an accuracy of $52.88 %$ , and so on. This increase shows that the network is capable of locating the robot within a very small margin of error in most situations;
Progressive Improvement: As the error range increased to 20 cm, the network maintained a high performance, with $80.78 %$ accuracy. In the 30 cm and 40 cm ranges, accuracy continued to improve, reaching $92.56 %$ and $96.72 %$ , respectively. These values indicate that, for errors less than 50 cm, the network can locate the robot correctly in the vast majority of cases;
Stability Range: The network achieved nearly total accuracy for errors above 50 cm, with $99.57 %$ accuracy observed from 90 cm errors onward. This means that, in all predictions, the robot was correctly located within a margin of error of up to 90 cm.

4.4.4. Evaluation of Results for the Y-Axis Network

The regression network developed for the Y axis was trained over 300 epochs, with an initial learning rate of 0.0002. This learning rate was adjusted throughout the training whenever the “

v a l_l o s s

” metric stopped improving. The function used was “ReduceLROnPlateau” with an adjustment factor of 0.95 and a patience of 15. This means that whenever the selected metric did not improve after 15 epochs, the learning rate was adjusted to

95 %

of its current value. After training this neural network, an extensive testing phase was carried out to assess the system’s accuracy. The performance was measured in centimetres of error in predicting the robot’s position on the Y-axis.

Table 7 illustrates the number of occurrences within different error ranges, from 0 cm to 100 cm, along with their respective accuracy percentages.

Figure 16 demonstrates the obtained results.

Analysing both Table 7 and Figure 16, the performance of the network can be described as follows:

Initial Accuracy: Out of the total number of 40,000 coordinates, 6216 occurrences had no error, corresponding to $15.54 %$ of the total. For errors up to 10 cm, the number of occurrences increased significantly to 21,551, representing an accuracy of $69.42 %$ . This increase shows that the network is capable of locating the robot with a very small margin of error in most situations;
Progressive Improvement: As the error range increased to 20 cm, the network maintained high performance, with $91.55 %$ accuracy. In the 30 cm and 40 cm intervals, the accuracy increased, reaching $97.11 %$ and $98.64 %$ , respectively. These values indicate that, for errors less than 50 cm, the network can correctly locate the robot in the vast majority of cases;
Stability Range: The network achieved virtually perfect accuracy for errors greater than 50 cm, with $99.57 %$ accuracy observed for errors from 70 cm onwards. This means that, in all predictions, the robot was correctly located within a margin of error of up to 70 cm.

4.5. Conclusion of Results

The results demonstrate that the developed neural network localisation is highly accurate for each axis, with most errors falling below

50 cm

,

98 %

in the X axis, and

99 %

in the Y axis. For tasks involving navigation in open fields, where error margins of this magnitude are acceptable, the system proved extremely effective. When analysing 40,000 cases, the X axis had an average error of

3.82 cm

, and the Y axis had an average error of

1.61 cm

, proving the high precision and resolution of the network developed. The ability to predict the robot’s position with an accuracy of up to

10 cm

in more than half of the occurrences, and more than

80 %

with an error inferior to

20 cm

, smaller than the ball size, is an indicator of the robustness of the solution.

In instances where the number of detected markers is insufficient or the localisation error exceeds acceptable thresholds, sensor fusion is employed. This process integrates data from the robot’s locomotion encoders, which provide information on the distance travelled, with the heading direction obtained from the compass. By combining these inputs, the system achieves a more reliable estimate of the robot’s position. This sensor fusion technique ensures stability in localisation, effectively preventing abrupt positional jumps or significant errors between consecutive estimations.

The system achieved the desired performance due to its parallel computing capabilities. The localisation system is designed to be faster than the communication system, which operates at

25 Hz

. The computation for both the classification and regression networks takes an average of

38 ms

, resulting in a frequency of

26.3 Hz

. This makes it only slightly faster than the communication system, but there is still potential for further optimisation.

The final solution embodies a robust integration of visual detection and motion data, further refined by the Kalman filter’s capacity to smooth positional estimates, enhancing both accuracy and reliability. Thus, this tuned hybrid approach provides an excellent foundation for ensuring the safe and accurate navigation of the robots during football matches, contributing to their strategic positioning on the field.

5. Discussion

The proposed system demonstrates significant advancements in multi-camera localisation for football robots, but several aspects warrant further exploration and refinement. A detailed analysis of the results highlights key challenges and potential improvements that could enhance the system’s robustness and applicability.

One of the notable observations is the influence of marker detection accuracy on the overall localisation performance. The number of detected markers directly correlates with the precision of the estimated position. In scenarios where fewer markers are detected, localisation errors tend to increase, which could compromise the robot’s decision-making during gameplay. This limitation suggests that enhancing the marker detection rate, potentially through collecting and annotating additional training images, could improve the network’s performance. However, the gains from such an approach remain uncertain.

The overlapping regions captured by multiple cameras introduce another layer of complexity. In these regions, markers can be detected on two cameras, which could lead to redundancy or inconsistencies in data. To address this, the system incorporates a mechanism to identify and handle coincident detections, ensuring that each marker contributes only once to the localisation process.

A significant design decision in the system involves sorting markers within distinct groups based on their proximity to the robot. By prioritising closer markers, the network is guided to rely more heavily on data points that are likely to be more accurate and reliable. Empirical observations show that this approach enhances the network’s learning process, leading to more robust predictions, especially in scenarios with varying marker densities and distributions.

Another aspect deserving attention is the role of data augmentation and simulation tools in accelerating the development process. The use of Webots for simulation proved invaluable in creating realistic training environments, allowing for the generation of diverse datasets that closely mimic real-world conditions. Similarly, the integration of RoboFlow for dataset preparation and augmentation enabled the efficient handling of large volumes of data.

Despite these advancements, certain limitations persist, particularly in scenarios where markers are sparsely distributed, heavily occluded, or with compromised visibility. A comprehensive evaluation revealed that the system achieves localisation errors smaller than the robot radius (approximately 25 cm) in more than

90 %

of all cases. However, errors exceeding 90 cm were observed in rare and extreme conditions, such as when only a limited number of markers were visible or when other robots obstructed multiple markers. These outliers, accounting for less than

0.43 %

of occurrences, highlight the system’s robustness under typical operational scenarios while acknowledging its limitations in highly challenging situations. To provide context, a standard robot used in this setting typically has a diameter of 50 cm, making the observed error negligible in most gameplay scenarios. This demonstrates the practicality and reliability of the proposed approach for its intended use case.

To further enhance robustness, sensor fusion techniques were employed, integrating data from the robot’s locomotion encoders and compass. This fusion enabled the system to maintain stability by using encoders to measure travelled distance and a compass to provide heading direction, effectively mitigating positional jumps and large errors. Including a Kalman filter refined these estimates, ensuring smooth and consistent localisation. While additional methods, such as the Hungarian algorithm, could further optimise marker-matching, the current fusion-based approach effectively balances accuracy and real-time performance, demonstrating the system’s practicality and reliability in dynamic gameplay scenarios.

In a broader context, the findings underscore the potential of combining classification and regression neural networks to transform traditional approaches to localisation. Nevertheless, future research could further refine this approach, exploring its applicability to other domains and optimising its performance in increasingly complex scenarios.

6. Conclusions

This study introduced an innovative approach to a multi-camera localisation system for robotic football, distinguished by combining a classification neural network with regression neural networks to address the localisation challenge. The classification network detects markers, and its output is transformed into angle and distance values, which are then used by two regression networks to compute the robot’s precise location (x and y coordinates) on the field.

This innovative architecture bridges the gap between traditional vision-based localisation methods and modern artificial intelligence approaches. Unlike conventional solutions, which often rely on predefined geometric transformations or single-camera systems, the proposed method leverages neural networks to handle complex spatial relationships and dynamic environments with remarkable adaptability. By breaking the problem into smaller and simpler neural networks, this multi-network approach makes the system easier to replicate, debug, innovate, and adapt to other scenarios or vision systems, further enhancing its versatility and applicability.

The system demonstrated robustness, scalability, and efficiency, with potential applications beyond robotic football. Real-time performance was achieved through computational optimisations, such as pre-computed transformations, ensuring suitability for dynamic and high-speed environments.

The solution’s success depended on a diverse and high-quality marker dataset, which enabled effective generalisation. Nevertheless, marker visibility remains a critical factor, highlighting the importance of environments that enhance detection reliability.

Finally, the system was validated in the Middle Size League of robotic football, demonstrating its capability to address real-world localisation challenges. The error margins observed were within acceptable limits for this case study, proving the system’s reliability and accuracy.

This work received a first-place award in the 2024 RoboCup MSL Scientific Challenge.

Overall, this work represents a significant step forward in the field of robotic localisation, offering a solution that combines innovation, efficiency, and adaptability.

Author Contributions

Conceptualization, C.C.L., A.R., T.R., G.L. and A.F.R.; Data curation, C.C.L. and G.L.; Formal analysis, C.C.L. and G.L.; Funding acquisition, A.F.R.; Investigation, C.C.L., A.F.R., T.R. and G.L.; Methodology, C.C.L., A.R., T.R., G.L. and A.F.R.; Project administration, G.L. and A.R.; Resources, A.R.; Software, C.C.L., A.R. and A.F.R.; Supervision, G.L. and A.F.R.; Validation, C.C.L., A.F.R., T.R. and G.L.; Visualization, C.C.L. and A.F.R.; Writing—original draft, C.C.L.; Writing—review & editing, A.R., T.R., G.L. and A.F.R. All authors have read and agreed to the published version of the manuscript.

Funding

The third author received funding through a doctoral scholarship from the Portuguese Foundation for Science and Technology (Fundação para a Ciência e a Tecnologia) [grant number SFRH/BD/06944/2020], with funds from the Portuguese Ministry of Science, Technology and Higher Education and the European Social Fund through the Programa Operacional do Capital Humano (POCH). This work has been supported by FCT—Fundação para a Ciência e a Tecnologia within the R& D Units Project Scope: UIDB/00319/2020.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Kitano, H. RoboCup-97: Robot Soccer World Cup I; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1998; Volume 1395. [Google Scholar]
Stone, P. Will Robots Triumph over World Cup Winners by 2050? Available online: https://spectrum.ieee.org/robocup-robot-soccer (accessed on 12 October 2023).
Ribeiro, A.F.A.; Lopes, A.C.C.; Ribeiro, T.A.; Pereira, N.S.S.M.; Lopes, G.T.; Ribeiro, A.F.M. Probability-Based Strategy for a Football Multi-Agent Autonomous Robot System. Robotics 2024, 13, 5. [Google Scholar] [CrossRef]
Fernando Alcântara Ribeiro, A. Probability-Based Strategy for Football Cooperative Multi-Agent Autonomous Robots. 2024. Available online: https://repositorium.sdum.uminho.pt/handle/1822/93742 (accessed on 1 February 2024).
Ribeiro, T.; Gonçalves, F.; Garcia, I.S.; Lopes, G.; Ribeiro, A.F. CHARMIE: A Collaborative Healthcare and Home Service and Assistant Robot for Elderly Care. Appl. Sci. 2021, 11, 7248. [Google Scholar] [CrossRef]
MSL Technical Committee 1997–2024. Middle Size Robot League Rules and Regulations for 2024, Version-25.1. 2024. Available online: https://msl.robocup.org/wp-content/uploads/2024/05/Rulebook_MSL2024_v25.1.pdf (accessed on 5 May 2024).
Deogan, A.S.; Kempers, S.T.; Hameeteman, D.M.J.; Beumer, R.M.; van der Stoel, J.P.; Olthuis, J.J.; Aangenent, W.H.T.M.; van Brakel, P.E.J.; Briegel, M.; Bruijnen, D.J.H.; et al. Tech United Eindhoven Team Description Paper 2023. Available online: https://msl.robocup.org/wp-content/uploads/2023/03/TDP_TechUnited2023.pdf (accessed on 28 March 2024).
Ribeiro, A.; Lopes, C.; Ribeiro, C.; Costa, J.; Martins, J.; Oliveira, P.; Silva, R.; Lima, R.; Martins, R.; Pereira, R.; et al. LAR@ MSL Description Paper 2024. Available online: https://lar.dei.uminho.pt/images/downloads/LAR@MSL_TDP_2024.pdf (accessed on 1 March 2024).
Li, X.; Lu, H.; Xiong, D.; Zhang, H.; Zheng, Z. A Survey on Visual Perception for RoboCup MSL Soccer Robots. Int. J. Adv. Robot. Syst. 2013, 10, 110. [Google Scholar] [CrossRef]
Lu, H.; Li, X.; Zhang, H.; Hu, M.; Zheng, Z. Robust and real-time self-localization based on omnidirectional vision for soccer robots. Adv. Robot. 2013, 27, 799–811. [Google Scholar] [CrossRef]
Font, J.M.; Batlle, J.A. Mobile robot localization: Revisiting the triangulation methods. IFAC Proc. Vol. 2006, 39, 340–345. [Google Scholar] [CrossRef]
Ballard, D. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognit. 1981, 13, 111–122. [Google Scholar] [CrossRef]
Marques, C.F.; Lima, P.U. A Localization Method for a Soccer Robot Using a Vision-Based Omni-Directional Sensor. In RoboCup 2000: Robot Soccer World Cup IV; Stone, P., Balch, T., Kraetzschmar, G., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2001; Volume 2019, pp. 96–107. [Google Scholar] [CrossRef]
Dellaert, F.; Fox, D.; Burgard, W.; Thrun, S. Monte Carlo localization for mobile robots. In Proceedings of the 1999 IEEE International Conference on Robotics and Automation (Cat. No.99CH36288C), Detroit, MI, USA, 10–15 May 1999; Volume 2, pp. 1322–1328. [Google Scholar] [CrossRef]
Hong, W.; Zhou, C.; Tian, Y. Robust Monte Carlo Localization for humanoid soccer robot. In Proceedings of the 2009 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Singapore, 14–17 July 2009; pp. 934–939. [Google Scholar] [CrossRef]
Luo, R.; Min, H. A New Omni-vision Based Self-localization Method for Soccer Robot. In Proceedings of the 2009 WRI World Congress on Software Engineering, Xiamen, China, 19–21 May 2009; Volume 1, pp. 126–130. [Google Scholar]
Lauer, M.; Lange, S.; Riedmiller, M. Calculating the Perfect Match: An Efficient and Accurate Approach for Robot Self-localization. In RoboCup 2005: Robot Soccer World Cup IX; Bredenfeld, A., Jacoff, A., Noda, I., Takahashi, Y., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4020, pp. 142–153. [Google Scholar] [CrossRef]
Carlos Oliveira Martins, J. Omnidirectional Vision: Self-Localization and Object Detection with YOLO. 2024. Available online: https://repositorium.sdum.uminho.pt/handle/1822/93744 (accessed on 30 September 2024).
Guohua, L.; Xiandong, X.; Xiang, Y.; Yadong, W.; Tianwei, Q. An Indoor Localization Method for Humanoid Robot Based on Artificial Landmark. In Proceedings of the 2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC), Qinhuangdao, China, 18–20 September 2015; pp. 1854–1857. [Google Scholar] [CrossRef]
Sharma, A.; Wadhwa, I.; Kala, R. Monocular camera based object recognition and 3D-localization for robotic grasping. In Proceedings of the 2015 International Conference on Signal Processing, Computing and Control (ISPCC), Waknaghat, Solan, India, 24–26 September 2015; pp. 225–229. [Google Scholar] [CrossRef]
Royer, E.; Lhuillier, M.; Dhome, M.; Lavest, J.M. Monocular Vision for Mobile Robot Localization and Autonomous Navigation. Int. J. Comput. Vis. 2007, 74, 237–260. [Google Scholar] [CrossRef]
Van Lith, P.; van de Molengraft, M.; Dubbelman, G.; Plantinga, M. A Minimalistic Approach to Identify and Localize Robots in RoboCup MSL Soccer Competitions in Real-Time. Available online: https://www.techunited.nl/uploads/Minimalist%20MSL%20Robot%20Location%205.0.pdf (accessed on 3 December 2023).
Luo, S.; Lu, H.; Xiao, J.; Yu, Q.; Zheng, Z. Robot detection and localization based on deep learning. In Proceedings of the 2017 Chinese Automation Congress (CAC), Jinan, China, 20–22 October 2017; pp. 7091–7095. [Google Scholar] [CrossRef]
Schreuder, E.; Feitsma, J.; Kouters, E.; Vos, J. Falcons Team Description Paper 2019. Available online: https://www.falcons-robocup.nl/images/Qualification-2019/falcons_team_description_2019.pdf (accessed on 29 November 2023).
Pizarro, D.; Mazo, M.; Santiso, E.; Marron, M.; Jimenez, D.; Cobreces, S.; Losada, C. Localization of Mobile Robots Using Odometry and an External Vision Sensor. Sensors 2010, 10, 3655–3680. [Google Scholar] [CrossRef] [PubMed]
Kendall, A.; Grimes, M.; Cipolla, R. Posenet: A convolutional network for real-time 6-dof camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2938–2946. [Google Scholar]
Ghintab, S.S.; Hassan, M.Y. CNN-based visual localization for autonomous vehicles under different weather conditions. Eng. Technol. J. 2023, 41, 375–386. [Google Scholar] [CrossRef]
Ayala, A.; Cruz, F.; Campos, D.; Rubio, R.; Fernandes, B.; Dazeley, R. A comparison of humanoid robot simulators: A quantitative approach. In Proceedings of the 2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Valparaiso, Chile, 26–30 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar]
Vijayakumar, A.; Vairavasundaram, S. YOLO-based Object Detection Models: A Review and its Applications. Multimed. Tools Appl. 2024, 83, 83535–83574. [Google Scholar] [CrossRef]

Figure 1. Robot schematic.

Figure 2. (a) LAR@MSL team robot; (b) Robot’s head.

Figure 3. General architecture of the developed solution.

Figure 4. Considered visual markers that include fundamental elements and geometric shapes that compose a football field.

Figure 5. Comparison between the robot’s simulated view and an actual real-world view. (a) Concatenated image of the three cameras view from the simulated robot; (b) Concatenated image of the three cameras view from the real robot.

Figure 6. Visual representation of the camera-to-world coordinate transformation process: the left panel illustrates how screen coordinates (u, v) are mapped from the camera’s perspective, capturing markers on the field. The coloured lines indicate the relationship between the field markers and their corresponding positions in the camera image, with each line representing the calculated distance (in metres) and angle (in degrees) relative to the robot’s position.

Figure 7. Data distribution and characteristics used during the YOLO model training. (a) The graph displays the number of instances per class; (b) The graph illustrates a clustering of objects near the centre of the images; (c) The graph reveals that most objects have small dimensions (narrow widths and heights).

Figure 8. YOLO model results. (a) Precision-Recall curve of the used network to gather real-world results. The precision-recall curve shows a mean average precision (mAP) of

0.896

at an Intersection over Union (IoU) threshold of

0.5

, indicating the model’s high detection accuracy across all marker classes.; (b) Normalised Confusion Matrix of the used network. This matrix proves that no two classes are mixed when detected, with most markers showing true positive rates above

85 %

. This is an excellent result considering that all markers are white on a green background, so it would be easy to mix them.

Figure 8. YOLO model results. (a) Precision-Recall curve of the used network to gather real-world results. The precision-recall curve shows a mean average precision (mAP) of

0.896

at an Intersection over Union (IoU) threshold of

0.5

, indicating the model’s high detection accuracy across all marker classes.; (b) Normalised Confusion Matrix of the used network. This matrix proves that no two classes are mixed when detected, with most markers showing true positive rates above

85 %

. This is an excellent result considering that all markers are white on a green background, so it would be easy to mix them.

Figure 9. Example of a worst-case scenario. (a) Illustration of a worst-case scenario considering a detection radius of 6 m; (b) View of a real robot during in-game detection. The goalposts are detected because this marker type is an exception and can be identified at a distance of 10 m.

Figure 10. Robot positioned on a grid floor, captured using the robot’s camera with a resolution of

640 \times 480

pixels.

Figure 10. Robot positioned on a grid floor, captured using the robot’s camera with a resolution of

640 \times 480

pixels.

Figure 11. Acquisition of known coordinates, with each grid vertex marked by blue crosses. The image used was captured by the robot’s camera, with a resolution of

640 \times 480

pixels.

Figure 11. Acquisition of known coordinates, with each grid vertex marked by blue crosses. The image used was captured by the robot’s camera, with a resolution of

640 \times 480

pixels.

Figure 12. Graphical results of both distance and angle. (a) 3D graph illustrating the relationship between distance values (Z-axis) and the u and v coordinates (X and Y axes) of the image (

640 \times 480

pixels). The plot illustrates how distances vary according to object positions within the robot’s reference frame; (b) 3D graph illustrating the variation of angle values (Z-axis) as a function of the u and v coordinates (X and Y axes) of the image (

640 \times 480

pixels). The plot illustrates how angular values vary based on the positions of objects within the robot’s reference frame.

Figure 12. Graphical results of both distance and angle. (a) 3D graph illustrating the relationship between distance values (Z-axis) and the u and v coordinates (X and Y axes) of the image (

640 \times 480

pixels). The plot illustrates how distances vary according to object positions within the robot’s reference frame; (b) 3D graph illustrating the variation of angle values (Z-axis) as a function of the u and v coordinates (X and Y axes) of the image (

640 \times 480

pixels). The plot illustrates how angular values vary based on the positions of objects within the robot’s reference frame.

Figure 13. Regression neural network architecture.

Figure 14. Coordinates distribution of X and Y axis on 40,000 points.

Figure 15. X-Axis neural network training results.

Figure 16. Y-Axis neural network training results.

Table 1. Comparison of localisation methods for robotic systems.

Method	Description	Advantages	Disadvantages
Monocular Vision	Uses a single camera to detect objects and estimate relative positions on the field.	Lower cost and complexity compared to multi-sensor systems.	Limited depth perception; sensitive to lighting variations.
Multi-Camera Systems	Utilise multiple cameras to achieve a 360-degree view, integrating inputs for greater spatial awareness.	Broader field of view; reduced occlusions; greater contextual data for localisation.	Increased processing requirements; potential redundancy or complexity.
Odometry	Relies on wheel encoder readings to estimate position over time.	Simple to implement; does not require external infrastructure.	Accumulation of errors over time due to wheel slippage or sensor inaccuracies.
Neural Network-Based	Use machine learning models to process visual data, often combining classification and regression for accurate localisation.	Highly adaptable; capable of handling complex, dynamic environments; scalable for various tasks.	Dependent on the quality of training data; requires significant computational resources for training.

Table 2. Camera sensor (OV2710) possible configurations.

Resolution (Pixels)	Field of View	Frame Rate
$1920 \times 1080$	$180^{\circ}$	30
$1280 \times 720$	$180^{\circ}$	60
$640 \times 480$	$125^{\circ}$	120
$320 \times 240$	$125^{\circ}$	120

Table 3. Marker list.

Marker Type	Quantity	Colour in Image
Penalty	2	Red (1)
GameCorner	4	Dark Blue (2)
Centre	1	Purple (3)
“T” Shape	10	Cyan (7)
“L” Shape	8	Yellow (5)
“+” Shape	2	Pink (4)
GoalPost	4	Green (6)
Total	31

Table 4. Used permutations of Roboflow’s Data Augmentation tool.

Augmentation	Parameter Costumisation
Flip	Vertical, Horizontal
$90^{\circ}$ rotation	Clockwise, Counter-Clockwise
Random Rotation	$0 %$ Minimum Zoom, $20 %$ Maximum Zoom
Random Crop	Between $- 15^{\circ}$ and $15^{\circ}$
Saturation	Between $- 30 %$ and $30 %$
Brightness	Between $- 25 %$ and $25 %$
Exposure	Between $- 15 %$ and $15 %$
Blur	Up to $0.8$ pixels
Random Noise	Up to $0.89 %$ of pixels

Table 5. Number of images for each marker.

Marker Type	Number of Images
Penalty	2176
GameCorner	2112
Centre	864
“+” Shape	2536
“L” Shape	6265
GoalPost	4056
“T” Shape	6058

Table 6. Number of occurrences within different error intervals for the X-axis neural network.

Error Margin (cm)	Number of Occurrences	Accuracy (%)
0	4289	$10.72 %$
10	16,862	$52.88 %$
20	11,161	$80.78 %$
30	4713	$92.56 %$
40	1663	$96.72 %$
50	619	$98.27 %$
60	263	$98.93 %$
70	150	$99.30 %$
80	72	$99.48 %$
90	34	$99.57 %$
100	20	$99.62 %$

Table 7. Number of occurrences within different error intervals for the Y-axis neural network.

Error Margin (cm)	Number of Occurrences	Accuracy (%)
0	6216	$15.54 %$
10	21,551	$69.42 %$
20	8854	$91.55 %$
30	2221	$97.11 %$
40	614	$98.64 %$
50	232	$99.22 %$
60	107	$99.49 %$
70	32	$99.57 %$
80	9	$99.59 %$
90	11	$99.62 %$
100	4	$99.63 %$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lopes, C.C.; Ribeiro, A.; Ribeiro, T.; Lopes, G.; Ribeiro, A.F. Multi-Neural Network Localisation System with Regression and Classification on Football Autonomous Robots. AI 2025, 6, 27. https://doi.org/10.3390/ai6020027

AMA Style

Lopes CC, Ribeiro A, Ribeiro T, Lopes G, Ribeiro AF. Multi-Neural Network Localisation System with Regression and Classification on Football Autonomous Robots. AI. 2025; 6(2):27. https://doi.org/10.3390/ai6020027

Chicago/Turabian Style

Lopes, Carolina Coelho, António Ribeiro, Tiago Ribeiro, Gil Lopes, and A. Fernando Ribeiro. 2025. "Multi-Neural Network Localisation System with Regression and Classification on Football Autonomous Robots" AI 6, no. 2: 27. https://doi.org/10.3390/ai6020027

APA Style

Lopes, C. C., Ribeiro, A., Ribeiro, T., Lopes, G., & Ribeiro, A. F. (2025). Multi-Neural Network Localisation System with Regression and Classification on Football Autonomous Robots. AI, 6(2), 27. https://doi.org/10.3390/ai6020027

Article Menu

Multi-Neural Network Localisation System with Regression and Classification on Football Autonomous Robots

Abstract

1. Introduction

2. State of the Art: Intelligent Football Robots and Localisation

2.1. Localisation Systems for MSL Football Robots

2.2. Monocular Vision

2.3. Multi-Camera Systems and Advanced Visualisation

2.4. Odometry

2.5. Computer Vision

2.6. Self-Localisation with Deep Learning

2.7. Summary of Localisation Methods

3. Robot Hardware

Vision System

4. Developed Solution

4.1. General Architecture

4.2. Markers

4.2.1. Inital Phase: Simulation with Webots

4.2.2. Real-World Stage: Manual Annotation and Validation

4.2.3. Training of the Marker Detection Neural Network

4.2.4. Limitations

4.2.5. Advantages

4.3. Camera-World Coordinates Transformation

4.3.1. Calibration Procedure

4.3.2. Use of Neural Networks for Conversion

4.3.3. Integration into a Lookup Table for Fast Localisation

4.4. Localisation Neural Network

4.4.1. Neural Network Architecture

4.4.2. Neural Network Training

4.4.3. Evaluation of Results for the X-Axis Network

4.4.4. Evaluation of Results for the Y-Axis Network

4.5. Conclusion of Results

5. Discussion

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI