Article

Enhancing 3D Rock Localization in Mining Environments Using Bird’s-Eye View Images from the Time-of-Flight Blaze 101 Camera

Electrical Engineering Department, Faculty of Engineering, University of Santiago of Chile (USACh), Av. Víctor Jara 3519, Estación Central, Santiago 9170124, Chile
* Author to whom correspondence should be addressed.
Technologies 2024, 12(9), 162; https://doi.org/10.3390/technologies12090162
Submission received: 22 July 2024 / Revised: 6 September 2024 / Accepted: 10 September 2024 / Published: 12 September 2024

Abstract: The mining industry faces significant challenges in production costs, environmental protection, and worker safety, necessitating the development of autonomous systems. This study presents the design and implementation of a robust rock centroid localization system for mining robotic applications, particularly rock-breaking hammers. The system comprises three phases: assembly, data acquisition, and data processing. Environmental sensing was accomplished using a Basler Blaze 101 three-dimensional (3D) Time-of-Flight (ToF) camera. The data processing phase incorporated advanced algorithms, including Bird’s-Eye View (BEV) image conversion and You Only Look Once (YOLO) v8x-Seg instance segmentation. The system’s performance was evaluated using a comprehensive dataset of 627 point clouds, including samples from real mining environments. The system achieved efficient processing times of approximately 5 s. Segmentation accuracy was evaluated using the Intersection over Union (IoU), reaching 95.10%. Localization precision was measured by the Euclidean distance in the XY plane ($ED_{XY}$), achieving 0.0128 m. The normalized error ($e_{norm}$) on the X and Y axes did not exceed 2.3%. Additionally, the system demonstrated high reliability with $R^2$ values close to 1 for the X and Y axes, and maintained performance under various lighting conditions and in the presence of suspended particles. The Mean Absolute Error (MAE) in the Z axis was 0.0333 m, addressing challenges in depth estimation. A sensitivity analysis was conducted to assess the model’s robustness, revealing consistent performance across brightness and contrast variations, with an IoU ranging from 92.88% to 96.10%, while showing greater sensitivity to rotations.

1. Introduction

The mining industry has witnessed an increased adoption of automation systems across various stages of mineral processing, driven by the need to enhance safety and reduce operational costs. A critical task in this process is secondary reduction, which involves the use of heavy-duty manipulators equipped with hydraulic hammers to reduce the size of oversized rocks. Figure 1 illustrates rock-breaker hammers, which are utilized to break down large rocks that are unable to pass through grizzly systems or primary crushers. This process, when performed efficiently, ensures a continuous and uninterrupted flow in the processing line, thereby minimizing downtime and enhancing both operational efficiency and productivity [1].
Research in this field has explored various approaches, including the operation of robotic systems through teleoperation. For instance, ref. [2] describes the design of a haptic teleoperation system for underground mines. The effective automation of rock reduction requires the implementation of intelligent robotic systems with advanced visual perception capabilities. Initial efforts to automate or modernize rock-breaking systems can be traced back to 1998, when image processing techniques were employed to detect rocks on grizzly systems [3]. While effective at the time, these early processes had several limitations, including a high sensitivity to environmental conditions, computational intensity, a lack of generalizability, and inefficiency compared to modern deep learning methods.
A comparative analysis of machine learning and deep learning algorithms for rock detection in complex mining environments revealed that the You Only Look Once (YOLO) v4 algorithm offers the highest accuracy, while the Single-Shot Detector (SSD) provides the fastest processing speed [4]. Further advancements in this field include the development of a deep reinforcement learning scheme for rock breaking using an impact hammer, as presented in [5]. This approach formulates the problem as a Partially Observable Markov Decision Process and employs Deep Double Deep-Q Networks (DDDQN) for its solution.
Real-time three-dimensional (3D) rock detection has emerged as an effective approach for information processing in mining environments. Numerous studies have explored and leveraged these technologies in various mining contexts. For instance, ref. [6] evaluated the performance and robustness of clustering methods for object recognition during the secondary breaking phase in mining, utilizing Low-Cost Time-of-Flight (ToF) cameras. This research proposed an algorithmic method to efficiently utilize existing clustering and segmentation techniques in the detection loop, determining optimal contact points and approach angles for hydraulic hammers. An autonomous rock-breaking system featuring a Visual Perception System (VPS) capable of real-time detection of multiple irregularly shaped rocks was presented in [1]. Employing a stereo camera and an industrial manipulator, the system achieved an average success rate of 34% and a breaking rate of 3.3 attempts per minute in a real experimental environment. Furthermore, ref. [7] introduced a system for automatically determining target poses for rock breaking in underground mines, utilizing sensor data comprising point clouds and images to segment rocks and generate and evaluate candidate target poses.
Advancements in multimodal fusion techniques were demonstrated in [8], which presented a system for object identification in point clouds with varying density and coverage. By integrating Light Detection and Ranging (LiDAR) sensors and a ToF camera, the system implemented preprocessing, registration, and data fusion techniques to create a coherent and detailed representation of objects in a controlled environment, thereby optimizing rock-crushing operations in the mining industry.
A visual perception system for rock-breaking robots utilizing sensor fusion, specifically combining cameras and LiDAR, was proposed in [9]. The system employed the PP-YOLO algorithm for 2D detection and 3D reconstruction from point cloud data, achieving a detection speed of 13.8 ms, a mean Average Precision (mAP) of 91.2%, and a segmentation accuracy of 75.46% for rock-breaking surfaces.
While these studies focus directly on rock detection and localization in mining environments, it is also valuable to consider recent applications of deep learning in related fields, which could offer insights applicable to mining automation. Recent advancements include the automatic classification of road tunnel defects using Ground Penetrating Radar (GPR) images, as presented in [10]. This study evaluated four models, with the Vision Transformer (ViT) demonstrating superior performance, achieving an average accuracy of 98.1%. Additionally, ref. [11] employed the YOLO algorithm for the automatic detection of steel ribs in GPR images of tunnels. The study’s evaluation of performance on original and augmented datasets yielded miss rates of 7.18% and 0.38%, respectively. When combined with data augmentation, this technique shows considerable potential for enhancing automation in tunnel maintenance and inspection processes.
Although these investigations have laid important foundations in the detection and localization of rocks in mining environments, it is crucial to recognize that point cloud processing and conversion to Bird’s Eye View (BEV) are techniques that have seen significant advances in other fields. These innovations offer promising opportunities to improve the accuracy and efficiency in rock centroid localization. Recent studies have demonstrated the effectiveness of BEV representations in various tasks. In [12], a BEV-based loop closure detection method for LiDAR point clouds is presented. The method proves to be robust to rotations and computationally efficient, achieving a mAP of up to 71.81% on the Waymo dataset.
In [13], a multi-view fusion approach is proposed that combines range (RV) and BEV representations to improve semantic segmentation of point clouds. It also uses a geometric fusion module to align and combine features from both views. It achieves an mIoU of 76.1% on the nuScenes dataset. In [14], a geometric flow network for semantic segmentation of point clouds is proposed, using BEV and RV projections. It achieves an mIoU of 65.4% on SemanticKITTI. These advances underscore the potential of BEV representations to efficiently compress 3D data and leverage well-established 2D convolutional network architectures.
In addition to these data processing techniques, recent advances in neural network architectures, particularly those based on modular or block structures, present new possibilities for addressing the specific challenges of rock localization in mining. For example, in [15], an efficient Multi-scale Attention Module (EMA) is proposed that divides channel dimensions into multiple sub-features, retaining information per channel and decreasing computational overhead. Their method achieved a mAP of 57.8% in object detection on COCO. In [16], a Block-Combined Neural Network (BCNN) is introduced for predicting sediment transport rates, dividing tasks into modular sub-networks, achieving a correct classification rate of 89.77%. Finally, in [17], the researchers propose a block-based convolutional neural network for image forgery detection, incorporating attention mechanisms, with an accuracy of up to 97.97% on the CASIA v2.0 database.
The aforementioned technologies for 3D detection enable precise and efficient material identification, thereby facilitating the automation of critical tasks in mining operations. However, the processing of such data presents significant challenges due to the diverse shapes, sizes, and textures of rocks. To fully harness the advantages of 3D technology and optimize operational accuracy and efficiency in mining, advanced data analysis methods are essential.
In response to these challenges, this research focuses on the development of a rock centroid localization system characterized by both accuracy and speed, making it suitable for application in mining robotic systems such as rock-breaking hammers. The primary contributions of this study are as follows:
  • The development and validation of an optimized algorithmic pipeline: This study presents a novel approach combining point cloud preprocessing, BEV conversion, and segmentation using YOLO v8x-Seg, complemented by a postprocessing method employing two variants for rock centroid determination. The developed pipeline demonstrates high-precision centroid localization, achieving a Euclidean distance in the XY plane ($ED_{XY}$) of up to 0.0128 m and a normalized error ($e_{norm}$) in the X and Y axes not exceeding 2.3%. These results indicate the successful mitigation of the specific challenges associated with rock localization in mining environments;
  • Enhanced adaptability and robustness in varied mining conditions: The developed system exhibits consistent performance across diverse lighting conditions and in the presence of suspended particles, a crucial factor for its practical application in dynamic mining environments. This adaptability was achieved through the optimization of system parameters and the incorporation of real mine data in the training set;
  • Comprehensive experimental validation in real and simulated scenarios: The system underwent rigorous testing using a stationary rock breaker and other industrial equipment in both controlled environments and actual mining conditions. The validation process incorporated tests with 100 point clouds obtained directly from the “La Patagua” mine under a range of operational conditions. This extensive testing protocol ensures the transferability of laboratory results to real-world applications in the mining industry.
The structure of this paper is as follows: Section 2 provides an overview of the fundamental theoretical aspects underlying point cloud processing, BEV representation, and the YOLO v8 detection algorithm. Section 3 details the design of the rock centroid localization system. Section 4 presents and analyzes the results of the study. Finally, Section 5 concludes the paper and outlines future research directions.

2. Methodology

This section addresses the essential theoretical elements of 3D sensing, point cloud processing, BEV imaging, and the YOLO v8 algorithm.

2.1. 3D Sensing Modalities

LiDAR, 3D ToF cameras, and stereo cameras are the most prevalent sensors employed for 3D perception. The complexities of 3D environments necessitate sensors capable of addressing several key challenges: high spatial resolution, sufficient information availability, and accurate detection of fine details in scenarios with significant object overlap [1]. Table 1 provides a comprehensive analysis of the typical technologies utilized in 3D sensing.

2.2. Point Clouds Processing

The concept of a point cloud model is fundamental to 3D spatial representation. As described in [19], this model consists of a set of points $P_i$, $i = 1, \ldots, n$, in 3D Cartesian space. Each point $P_i$ is characterized by at least three coordinates $(x_i, y_i, z_i)^T \in \mathbb{R}^3$ representing its position, and may additionally possess attributes such as color, classification or segmentation identifier (ID), and spectral band information. Point clouds can be categorized as either organized or unorganized, depending on their storage structure.
Point clouds generated by the aforementioned sensors are typically unorganized and represented in an M × 3 matrix form, where M denotes the number of points and each column corresponds to the X, Y, and Z axes, respectively. The inherent lack of organization in these point clouds presents challenges for various processing tasks. Commonly used methods to address these challenges include downsampling, registration, denoising, clustering, and Random Sample Consensus (RANSAC).

2.2.1. Registration

Point cloud registration plays a crucial role in reconstructing the complete shape of 3D objects, as 3D information is typically acquired from multiple viewpoints and requires alignment to consolidate these point clouds. The Iterative Closest Point (ICP) algorithm, a classic registration method described in [20], necessitates an appropriate initial position and high overlap rate to yield optimal results [21]. To enhance the alignment quality, several studies [22,23,24] have employed a two-step approach: first performing approximate registration, followed by fine registration. Approximate registration commonly utilizes shared points, lines, or surfaces within point clouds [25,26,27]. Among these methods, point-based approximate registration is widely adopted due to its speed and simplicity.
The analysis of point approximation requires consideration of the Homogeneous Transformation Matrix ($HTM$), which facilitates rotation and translation between coordinate systems. The $HTM$ is defined by Equation (1):

$$HTM = \begin{bmatrix} R & T \\ \mathbf{0}_{1 \times 3} & 1 \end{bmatrix},$$

where $T = [T_x, T_y, T_z]^T$ represents the 3 × 1 translation vector in the X, Y, Z axes, and $R$ denotes the 3 × 3 rotation matrix, encompassing the rotation angles roll (rotation about the X axis), pitch (rotation about the Y axis), and yaw (rotation about the Z axis).
The transformation of a point $P$ to a new point $P'$ can be achieved using the $HTM$ and Equation (2):

$$P' = R \times P + T$$
To achieve precise alignment, the ICP algorithm is employed. This algorithm initiates with two point clouds and an initial estimate for rigid body transformation. It then iteratively refines the transformation by generating corresponding point pairs in the point clouds and minimizing the error metric [28]. In this study, the initial estimate was derived from the approximate registration process.
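For illustration, this fine-registration step can be sketched with Open3D, the point cloud library used in this study (Section 3.3). The function name, and the 0.05 m correspondence distance, are illustrative assumptions rather than the study's exact settings:

import open3d as o3d

def fine_registration(source, target, rough_htm, max_corr_dist=0.05):
    # Refine a rough alignment (a 4x4 HTM obtained from matched points)
    # with point-to-point ICP; returns the refined 4x4 transformation.
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, rough_htm,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation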

2.2.2. Random Sample Consensus

RANSAC, introduced by Fischler and Bolles in [29], represents a significant advancement in robust parametric estimation algorithms. This method is particularly effective in handling datasets containing a substantial proportion of outliers, a common challenge in many data processing applications. RANSAC’s innovative approach allows it to identify and utilize inliers (data points that conform to a particular model) while effectively ignoring outliers, thus producing reliable model estimates even in the presence of noisy data.
The algorithm’s strength lies in its iterative nature, which involves randomly sampling the dataset, estimating model parameters, and evaluating the model’s fit to the entire dataset. This process is repeated multiple times, with each iteration potentially yielding an improved model. The repetition continues until either a satisfactory model is found or a predetermined maximum number of iterations is reached, ensuring a high probability of identifying an optimal, outlier-free model.
The RANSAC algorithm (Algorithm 1) can be summarized in the following steps:
Algorithm 1 RANSAC
1: Randomly select the minimum number of points required to determine the model parameters
2: Solve for the parameters of the model
3: Determine how many points from the set of all points fit within a predefined tolerance ϵ
4: if the fraction of the number of inliers over the total number of points in the set exceeds a predefined threshold τ then
5:     Re-estimate the model parameters using all the identified inliers and terminate
6: else
7:     Repeat steps 1 through 4 (maximum of N times)
8: end if
The use of predefined tolerances ($\epsilon$) and thresholds ($\tau$) allows for flexible adaptation to different datasets and application requirements, contributing to RANSAC’s versatility and effectiveness in robust model estimation.
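As a concrete instance of Algorithm 1, the following NumPy sketch fits a plane to a point cloud, the use case relevant to floor removal in Section 3.7.1. The function name and the default values for ϵ, τ, and N are illustrative:

import numpy as np

def ransac_plane(points, eps=0.03, tau=0.5, n_iters=100, seed=0):
    # points: (M, 3) array; returns a boolean inlier mask for the best plane.
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Step 1: minimal sample (three points define a plane)
        sample = points[rng.choice(len(points), 3, replace=False)]
        # Step 2: solve for the plane parameters (normal n, offset d)
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(n) < 1e-9:
            continue  # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        d = -n @ sample[0]
        # Step 3: count points within the tolerance eps
        inliers = np.abs(points @ n + d) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
            # Step 4: stop once the inlier fraction exceeds the threshold tau
            if inliers.mean() > tau:
                break
    return best_inliers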

2.2.3. Statistical Outlier Removal

In data analysis, an outlier represents an observation that deviates significantly from the overall pattern of the dataset. In point clouds, outliers manifest as points abnormally distant from the majority, often due to measurement errors, resulting in unwanted data points. To address this, various filtering algorithms have been developed, with Statistical Outlier Removal (SOR) being a widely used technique. SOR is a distance-based filtering approach designed to identify and eliminate points considered anomalous due to their substantial deviation from the general data distribution in their immediate vicinity.
SOR is an adaptive iteration of the Radius Outlier Filter (ROL) [30]. The concept of a distance-based outlier ($DB$), as defined in [31], underpins these methods. In this approach, an object $O$ in a dataset $T$ is classified as a $DB(p, D)$ outlier if at least a fraction $p$ of the objects in $T$ lies at a distance greater than $D$ from $O$. The method is further enhanced by considering the distance to the k-th nearest neighbor, with outliers being those points that have the largest distances to their k-th nearest neighbor.
A prominent method for distance-based outlier detection is the Mahalanobis distance ($MD$), which accounts for correlations between variables and measures the similarity between an unknown sample and a known sample set. The Mahalanobis distance is given by Equation (3):

$$MD(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)},$$

where $x$ represents the data vector, $\mu$ denotes the mean vector of the data, $\Sigma$ is the covariance matrix of the data, and $(x - \mu)^T$ is the transpose of the vector $(x - \mu)$. When $\Sigma$ is the identity matrix, the $MD$ reduces to the Euclidean distance; when $\Sigma$ is diagonal, it becomes the normalized Euclidean distance, expressed by Equation (4):

$$d(x, y) = \sqrt{\sum_{i=1}^{P} \frac{(x_i - y_i)^2}{\sigma_i^2}},$$

where $\sigma_i$ represents the standard deviation of the i-th component of the data vector.
Distance-based methods like M D and the normalized Euclidean distance are effective for detecting outliers in high-dimensional datasets, making SOR particularly useful for processing point clouds.
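A minimal NumPy sketch of Equation (3) follows; the function name is illustrative:

import numpy as np

def mahalanobis(x, data):
    # Mahalanobis distance of sample x to the distribution of data (Equation (3)).
    mu = data.mean(axis=0)
    sigma_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = x - mu
    return np.sqrt(diff @ sigma_inv @ diff)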

2.2.4. Clustering

Clustering techniques are widely used to identify groups or clusters in multivariate data. Two notable cluster-based algorithms are Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [32] and K-means [33].
DBSCAN clusters data based on density, identifying clusters as densely populated regions separated by lower-density areas. DBSCAN defines three types of points:
  • Core points: points with a minimum number of neighbors (min_pts) within a specified radius ($\epsilon$);
  • Border points: points within $\epsilon$ distance of a core point, but which are not core points themselves;
  • Noise points: points that are neither core nor border points.
The $\epsilon$-neighborhood of a point $p$ is defined by Equation (5):

$$N_\epsilon(p) = \{ q \in D \mid \mathrm{dist}(p, q) \leq \epsilon \},$$

where $D$ represents the dataset, and $\mathrm{dist}$ denotes a distance metric (typically Euclidean).
In contrast, K-means employs a partitioning approach based on centroids. This algorithm segments the data into K clusters, aiming to minimize the sum of squared distances between points and their respective cluster centroids. Each cluster is represented by its centroid, calculated as the mean of all points within the cluster. The objective function that K-means seeks to minimize is expressed in Equation (6):

$$J = \sum_{i=1}^{K} \sum_{x \in C_i} \| x - \mu_i \|^2,$$

where $C_i$ denotes the set of points in cluster $i$, and $\mu_i$ represents the centroid of cluster $i$. The operational steps for both K-means (Algorithm 2) and DBSCAN (Algorithm 3) algorithms are formally presented below:
Algorithm 2 K-means clustering
Require: Dataset D, number of clusters K
Ensure: K clusters
1: Initialization: select K initial centroids randomly or using some heuristic method
2: repeat
3:     Assignment: assign each point in the dataset to the nearest centroid based on the Euclidean distance (or another distance metric)
4:     Update: recalculate the centroids by computing the mean of all points assigned to each cluster
5: until centroids do not change significantly or the maximum number of iterations is reached
Algorithm 3 DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
Require: Dataset D, radius ϵ, minimum points MinPts
Ensure: Clusters and noise points
 1: for each unvisited point p in D do
 2:     Identify the ϵ-neighborhood of p (all points within a radius ϵ of p)
 3:     if p has at least MinPts points in its ϵ-neighborhood then
 4:         Classify p as a core point
 5:         Form a new cluster C with p
 6:         for each point q in p’s ϵ-neighborhood do
 7:             if q is unvisited or noise then
 8:                 Add q to cluster C
 9:                 if q is a core point then
10:                     Expand cluster C by recursively including density-reachable points
11:                 end if
12:             end if
13:         end for
14:     else if p is within ϵ distance of a core point then
15:         Classify p as a border point
16:     else
17:         Classify p as a noise point
18:     end if
19: end for
A significant challenge in cluster analysis lies in the practical difficulty of determining the correct number of clusters a priori. Despite this, the majority of clustering algorithms are designed to operate based on a predetermined number of clusters [34].
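To make the two algorithms concrete, the sketch below applies both to an XYZ point array. scikit-learn is assumed as the implementation (its K-means parameters mirror those reported in Section 3.7.5); the data and parameter values are illustrative:

import numpy as np
from sklearn.cluster import DBSCAN, KMeans

points = np.random.rand(500, 3)  # placeholder (M, 3) point cloud

# K-means: the number of clusters must be supplied a priori
kmeans = KMeans(n_clusters=3, n_init=10, max_iter=300).fit(points)
centroids = kmeans.cluster_centers_  # one centroid per cluster

# DBSCAN: the cluster count emerges from the density parameters instead
labels = DBSCAN(eps=0.05, min_samples=10).fit_predict(points)  # -1 marks noise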

2.3. Bird’s-Eye View Representations

As previously noted, point clouds represent sparse and irregular 3D data structures that necessitate specialized processing models. The development of efficient models capable of effectively handling point clouds remains a significant challenge for the scientific community. Detection methods in 3D space can be categorized into point-based, grid-based, point-voxel-based, and range-based approaches [35].
BEV represents a prominent technique within grid-based methods, wherein point clouds are transformed into discrete grids to generate sparse pseudo-images [36]. Among various BEV mapping techniques, density-based mapping [37] stands out for its ability to summarize the vertical shape of surrounding structures while leveraging point cloud density. The BEV conversion process involves several key steps:
  • The point cloud is initially constrained to the ranges $[y_{min}, y_{max}]$ and $[x_{min}, x_{max}]$ along the Y and X axes, respectively;
  • A grid size is established, and the point cloud is discretized accordingly;
  • Features are extracted for each non-empty grid cell in the resulting BEV map.
A detailed formulation of the BEV representation follows:
Given a point cloud $P$, points are initially distributed uniformly using a voxel grid of size $g$ m. The ground space is discretized into grids with a resolution of $g$ m. The point cloud density is quantified by the number of points within each grid cell. Considering a cubic window $[-C, C]$ (in meters) centered at the coordinate origin, the BEV image $B(u, v)$ is represented as a matrix of size $\frac{2C}{g} \times \frac{2C}{g}$. The BEV image $B(u, v)$ is defined by Equation (7):

$$B(u, v) = \frac{\min(N_g, N_m)}{N_m},$$

where $N_g$ denotes the number of points in the grid cell at position $(u, v)$, and $N_m$ represents the normalization factor.
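A compact NumPy sketch of Equation (7) follows; the window C, resolution g, and normalization factor N_m shown here are illustrative defaults (Section 3.7.3 details the values used in this study):

import numpy as np

def bev_density(points, C=5.5, g=0.002, N_m=16):
    # points: (M, 3) array; returns the density image B(u, v) of Equation (7).
    size = int(2 * C / g)
    u = ((points[:, 0] + C) / g).astype(int)
    v = ((points[:, 1] + C) / g).astype(int)
    keep = (u >= 0) & (u < size) & (v >= 0) & (v < size)
    counts = np.zeros((size, size))
    np.add.at(counts, (u[keep], v[keep]), 1)  # N_g: points per grid cell
    return np.minimum(counts, N_m) / N_m      # B(u, v) in [0, 1]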

2.4. You Only Look Once v8 Algorithm

You Only Look Once is a deep learning algorithm initially developed for object detection. Since its inception, 10 versions have been released, each expanding the functionality to encompass not only object detection but also tracking, instance segmentation, classification, and pose estimation. Figure 2 illustrates the YOLO v8 architecture developed by Ultralytics [38], which demonstrates significant advancements over its predecessors in terms of accuracy, speed, and efficiency.
The key improvements in the YOLO v8 architecture include:
  • C2f Module: this module combines the C3 module from YOLOv5 with the ELAN structure from YOLOv7, enhancing gradient information flow through the integration of two parallel gradient paths;
  • Spatial Pyramid Pooling Fast (SPPF): the SPPF module implements spatial pyramid pooling both serially and in parallel, thereby expanding the coverage area of feature maps and incorporating multi-scale information;
  • Proto Module: in instance segmentation tasks, YOLO v8-Seg incorporates an additional fully connected convolutional network module, termed Proto, which generates masks to facilitate segmentation;
  • Anchor-Free Design: unlike its predecessors, YOLO v8 adopts an anchor-free approach, reducing the number of hyperparameters and enhancing segmentation performance through improved model scalability.
The YOLO v8-Seg family comprises five models: YOLO v8n-Seg, YOLO v8s-Seg, YOLO v8m-Seg, YOLO v8l-Seg, and YOLO v8x-Seg. Among these, YOLO v8x-Seg demonstrates the best performance. Table 2 presents the key parameters of the YOLO v8x-Seg model.

3. System Design

This section provides an in-depth exploration of the design of the rock centroid localization system.

3.1. System Architecture

The rock centroid localization system was implemented in three distinct phases: assembly, data acquisition, and data processing. Figure 3 depicts the system’s architecture.
Sensor placement plays a critical role in achieving accurate rock centroid localization. Figure 4 demonstrates the effects of various sensor positions on the resulting point clouds. To capture the maximum amount of information from the target object, sensors must be strategically positioned, as shown in Figure 4b. In this study, the sensors were positioned approximately facing the objects of interest. Additionally, to optimize data collection, the objects were positioned near the center of the point cloud.

3.2. Hardware Architecture

This study utilized a Basler Blaze-101 3D ToF camera [40] (Basler AG, Ahrensburg, Germany), which provides 3D images with millimeter precision. Operating at a wavelength of 940 nm, this camera is suitable for both outdoor and indoor applications. It features a frame rate of 30 frames per second (fps), a GigE network interface, and a resolution of 640 × 480 pixels. The camera’s field of view is 67° × 51°, with a working range of 0.3 to 10 m. It maintains an accuracy of ±0.005 m within the 0.5 to 5.5 m range and remains effective under sunlight of 12.8 W/m².
Network connectivity was ensured by a CISCO SG110D-08 switch (Cisco Systems, Inc., San Jose, CA, USA), equipped with eight RJ-45 ports supporting 10BASE-T/100BASE-TX/1000BASE-T. Data processing was carried out on a Hewlett-Packard (HP) Victus laptop featuring an Intel Core i7-11800H CPU @ 2.30 GHz (64-bit), 16 GB of RAM, and an NVIDIA GeForce RTX 3060 graphics card.
To validate the experiment and facilitate future research, a RHINO model XDi3000 stationary rock breaker from the Canadian company ROCK-TECH (Lively, ON, Canada) was employed. This device represents a four Degree-of-Freedom (DoF) anthropomorphic robot with an end-effector diameter of 0.107 m. Table 3 shows the parameters of the stationary rock breaker system.
The limitations of LiDAR in this work primarily relate to spatial resolution and the amount of information available for precise rock detection. Although LiDAR is a popular technology in many 3D perception applications, for our specific case of rock detection in mining environments, it presents several disadvantages:
  • Resolution and point density: The LiDAR sensors considered, such as the SICK MRS 6000 and MRS 1000, and the Ouster OS0, provide a relatively low number of points (between 4000 and tens of thousands). This point density is insufficient to capture the necessary details of rocks, especially in scenarios with stacked or overlapping rocks;
  • Data structuring: point clouds are unstructured, making them less suitable for direct processing with generic Convolutional Neural Networks (CNNs), which are the state-of-the-art in object detection;
  • Detail in small objects: the relative scarcity of LiDAR points makes it difficult to evaluate detailed scenes with piles of small, irregular, overlapping rocks;
  • Quality of additional data: the additional data provided by LiDAR sensors, such as intensity images and depth maps, are of inferior quality compared to those obtained from the Blaze 101 3D ToF camera.
In contrast, the Blaze 101 3D ToF camera we selected offers several advantages:
  • A higher point density, allowing for a more detailed representation of rocks;
  • It provides images of adequate resolution with rich texture information, useful for distinguishing objects from the background;
  • It generates higher quality depth maps and other additional data, which could be valuable for future data fusion implementations;
  • It offers a better relationship between spatial resolution and the amount of available information, crucial for accurate rock detection in our specific context.
These characteristics make the 3D ToF camera more suitable for our specific application of rock detection in mining environments.
To address potential sensor data interruptions, the system incorporates several robust features. The Basler Blaze-101 3D ToF camera was selected partly for its reliability in industrial environments, reducing the likelihood of data interruptions. The system’s real-time processing approach, where each frame is processed independently, mitigates the impact of momentary interruptions. Furthermore, the system’s flexibility to operate with either one or two cameras provides redundancy, allowing continuous operation even if one camera experiences data loss. These design choices collectively enhance the system’s resilience to potential data interruptions, ensuring consistent performance in challenging mining environments.

3.3. Software Architecture

Data analysis and processing were conducted using Python 3.9.18, leveraging several specialized libraries. For training the deep learning networks, PyTorch 2.2.1 with CUDA 11.8 and Ultralytics YOLO v8 (version 8.2.28) were employed. Point cloud processing was handled by Open3D 0.18.0, while Harvester 1.4.3 facilitated connection to and data acquisition from the Blaze 101 sensors. Additionally, CloudCompare 2.13.0 [41] was utilized for initial point cloud processing, database analysis, and ground truth verification. Database labeling was accomplished using the Roboflow application [42].

3.4. Mineralogical and Morphological Characteristics Present in Chilean Mining Deposits

The “La Patagua” mine, illustrated in Figure 5a, is a strata-bound copper and silver deposit situated within volcano-sedimentary sequences. The deposit comprises two mines characterized by heterogeneous materials in terms of mineral composition. The key characteristics include:
  • Lithology: the predominant rock type is a volcanoclastic breccia tuff of volcanic origin;
  • Mineralization: Sulfides, including pyrite, chalcopyrite, bornite, and chalcocite, are disseminated throughout clasts and matrix, and in some veinlets. The presence of slight magnetism suggests the occurrence of magnetite and pyrrhotite;
  • Structure: rock fragments exhibit fracture systems, some of which are subparallel to stratification planes, while others are filled with calcite, as shown in Figure 5b;
  • Physical properties: the matrix demonstrates high hardness (R4), corresponding to a compressive strength between 50 and 100 MPa.
Figure 5. Mineralogical and morphological characteristics. (a) “La Patagua” mine. (b) Rock fragment displaying fractures and calcite veinlets.
The developed system, based on point clouds obtained from ToF cameras, primarily focuses on the precise localization of rocks rather than identifying their internal composition or lithology. However, it is acknowledged that depth information alone is insufficient for determining the specific lithological characteristics of the mineral under analysis. This limitation presents an opportunity for future research, integrating complementary technologies to enable a more comprehensive characterization of rock material in mining environments.

3.5. Dataset

The dataset for the rock centroid localization system was meticulously designed to capture the complexity and variability of real mining environments. It consists of 627 point clouds, which include samples of eight rocks from the “La Patagua” mine. Of these, 100 were collected directly in the mining environment: 50 under high illumination conditions and 50 under low illumination conditions with suspended particles, simulating typical adverse mining conditions. Additionally, 315 point clouds featured overlapping rocks, while 312 did not. The remaining samples were obtained in a controlled environment, incorporating variations in lighting, rock overlapping, and sensor positioning to enhance dataset diversity. The following considerations guided dataset creation:
  • Data were collected over several days under varying conditions, including low lighting, high lighting, and low levels of suspended particles;
  • Two object positioning variants were created: one without overlap between objects, and another with partial object overlap;
  • Variations in sensor mounting, including translations and rotations of the target objects, were implemented to increase database variability.
Figure 6 illustrates the created database and the different conditions analyzed.
The dataset was randomly divided: 80% (502 point clouds, 259 with overlap and 243 without) for training, 10% (63 point clouds, 29 with overlap and 34 without) for validation, and 10% (62 point clouds, 27 with overlap and 35 without) for testing. Ground truth was established by measuring distances along the X, Y, Z axes between a reference point (0, 0, 0) and each object centroid, verified using the CloudCompare software. All point clouds were subsequently converted to BEV, resulting in 2575 × 2575 pixel images.
The Roboflow application was employed for labeling, which offers tools for computer vision model development, including data acquisition, annotation, processing, and augmentation. Figure 7 displays the label distribution in the created database. Objects were positioned approximately in the center of the point cloud to ensure full visibility in the BEV image.
The current study employed a static database, meticulously constructed with representative samples from the intended implementation site. This database’s efficacy is evidenced by the results presented in Section 4.1, Section 4.2 and Section 4.3. To enhance model generalization, data augmentation techniques were applied, including adjustments in brightness (±15%), exposure (±10%), and blur (up to four pixels), as illustrated in Figure 8, resulting in an expanded training set of 1506 images. It should be noted that these techniques expanded the diversity of the static dataset rather than creating a dynamic database. Even so, the system demonstrated adaptability to various conditions, as shown in Figure 6: the dataset’s diversity, encompassing various lighting conditions, rock overlaps, and sensor positions, effectively simulates the variability encountered in dynamic mining environments.
While standardized benchmarks are valuable in AI research, our approach prioritizes practical applicability in the specific context of the FONDEF IDeA I + D ID21I10087 project, which aims to provide autonomy to a rock-breaking robotic system. Our database composition, combining images from both the actual mine and a controlled environment, allows for a dataset that is representative of operational conditions while enabling controlled parameter variation to enhance model robustness.
The validity of our approach is demonstrated through the results obtained when applying our algorithms, as detailed in Section 4.1, Section 4.2 and Section 4.3. Furthermore, the transferability of computer vision models to new environments has been demonstrated in previous studies. In our earlier work [4], we showed that models like YOLO, trained on databases created in different locations, successfully detected rocks in our samples. This suggests that our current model could also perform well if applied to similar, though not identical, data.

3.6. Performance Metrics

To validate the system’s performance, several metrics were employed, categorized into two groups: those evaluating the segmentation [43,44] performed by the YOLO v8x-Seg algorithm, and those assessing the localization [22,45] of rock centroids. Segmentation quality was analyzed using the Intersection over Union (IoU), also known as the Jaccard similarity coefficient. The IoU quantifies the rate of correctly classified pixels relative to the total pixels of the class. This metric serves as a statistical precision measure that penalizes false positives ($FP$). The IoU score is defined by Equation (8):

$$IoU = \frac{TP}{TP + FP + FN},$$

where $TP$, $FP$, and $FN$ represent true positives, false positives, and false negatives, respectively.
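For reference, Equation (8) can be computed directly on binary rock masks, as in this minimal NumPy sketch (the study's actual evaluation pipeline, described next, used MATLAB):

import numpy as np

def iou(pred_mask, gt_mask):
    # pred_mask, gt_mask: boolean masks (True = rock, False = background).
    tp = np.logical_and(pred_mask, gt_mask).sum()    # true positives
    fp = np.logical_and(pred_mask, ~gt_mask).sum()   # false positives
    fn = np.logical_and(~pred_mask, gt_mask).sum()   # false negatives
    return tp / (tp + fp + fn)                       # Equation (8)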
To ensure a comprehensive and robust evaluation of our segmentation model, we employed MATLAB’s “evaluateSemanticSegmentation” function. This function automatically calculates the confusion matrix and derives various metrics from it, including $TP$, $TN$, $FP$, and $FN$. Our evaluation process involved creating two separate datastores: one for the prediction images generated by the YOLO v8x-Seg algorithm, and another for the ground truth images obtained through manual labeling using the Roboflow app. Both sets of images were represented as binary masks, with white pixels denoting rocks and black pixels representing the background. This binary representation allowed for a clear, pixel-wise comparison between predictions and ground truth. The “evaluateSemanticSegmentation” function provided a range of metrics, including the confusion matrix, Normalized Confusion Matrix, Class Metrics (such as accuracy, IoU, and MeanBFScore), and Global Metrics (including GlobalAccuracy, MeanAccuracy, MeanIoU, WeightedIoU, and MeanBFScore). This comprehensive set of metrics allowed us to assess the model’s performance from multiple perspectives, providing a thorough understanding of its strengths and potential areas for improvement.
The accuracy of rock centroid localization was evaluated using the Mean Absolute Error ($MAE$), normalized error ($e_{norm}$), Euclidean distance in the XY plane ($ED_{XY}$), and the coefficient of determination ($R^2$). The $MAE$, $e_{norm}$, and $R^2$ metrics were calculated individually for each of the X, Y, Z axes. $MAE$, $e_{norm}$, $ED_{XY}$, and $R^2$ are defined by Equations (9)–(12), respectively:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |e_i|,$$

$$e_{norm} = \frac{MAE}{s_{axis}},$$

$$ED_{XY} = \sqrt{(x_g - x_p)^2 + (y_g - y_p)^2},$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{g,i} - y_{p,i})^2}{\sum_{i=1}^{n} (y_{g,i} - \bar{y}_g)^2},$$

where $|e_i|$ denotes the absolute error in a given axis, $s_{axis}$ represents the size of the rock in a given axis, $[x_g, y_g]$ and $[x_p, y_p]$ indicate the ground truth and predicted positions in the XY plane, respectively, $y_{g,i}$ are the observed values, $y_{p,i}$ are the predicted values, and $\bar{y}_g$ is the mean of the observed values.
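The localization metrics of Equations (9)–(12) reduce to a few NumPy lines; the function name and array layout are illustrative:

import numpy as np

def localization_metrics(gt, pred, rock_size):
    # gt, pred: (n, 3) arrays of ground-truth and predicted centroids (X, Y, Z);
    # rock_size: per-axis rock dimensions used to normalize the error.
    mae = np.abs(gt - pred).mean(axis=0)              # Equation (9), per axis
    e_norm = mae / rock_size                          # Equation (10)
    ed_xy = np.sqrt(((gt[:, :2] - pred[:, :2]) ** 2).sum(axis=1))  # Equation (11)
    ss_res = ((gt - pred) ** 2).sum(axis=0)
    ss_tot = ((gt - gt.mean(axis=0)) ** 2).sum(axis=0)
    r2 = 1 - ss_res / ss_tot                          # Equation (12), per axis
    return mae, e_norm, ed_xy, r2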

3.7. Description of the Centroid Location Algorithm

Figure 9 shows the rock centroid localization algorithm. The following subsections explore this functionality in depth.

3.7.1. Preprocessing

The preprocessing of the point clouds is illustrated in Figure 10. As previously discussed, we utilized the Harvester library for point cloud acquisition. The initial sensor connection time was approximately 1 s.
As discussed in the methodology, registration is defined as the alignment of point clouds. The CloudCompare software was employed to obtain an H T M for each sensor. This software offers two methods for point cloud alignment: rough and fine. Typically, an easily identifiable object is used for proper rough alignment by selecting common points. In this study, a rough alignment was first performed by selecting matching points in both point clouds, followed by a fine alignment using the ICP algorithm. The ultimate goal was to obtain a point cloud with fused information from two Blaze 101 sensors. A cube with stars on its faces was utilized as the object to obtain the necessary matrices for registration. Each point cloud from the Blaze 101 sensor contains approximately 300,000 points, resulting in a fused cloud of approximately 600,000 points. Figure 11 illustrates the procedure followed.
In the zero adjustment process, a point of interest is defined as the new origin (0, 0, 0) and the previously obtained point cloud is transformed using an $HTM$. The point clouds from the Blaze 101 sensor are in the millimeter range; therefore, the point cloud was normalized to meters. Finally, the major plane (floor) was segmented using the RANSAC algorithm. The essential parameters for this algorithm are as follows: distance_threshold (which defines the maximum distance a point can have from a plane to be considered an inlier), ransac_n (which defines the number of points randomly sampled to estimate a plane), and num_iterations (which defines how often a random plane is sampled and verified). The most critical parameter that can significantly affect major plane detection is distance_threshold; therefore, the minimum average distance between the points of the final point cloud was calculated, obtaining a value of 0.01 m. After conducting several tests, it was found that a value of 0.03 m yielded the best results. The goal of this process is to remove the major plane and reduce the dimensions of the point cloud, which is crucial for decreasing the computational cost in subsequent steps. Figure 12 illustrates the RANSAC procedure.
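The floor-removal step can be sketched with Open3D's RANSAC plane segmentation; distance_threshold follows the 0.03 m value selected above, while the input path, ransac_n, and num_iterations are illustrative assumptions:

import open3d as o3d

pcd = o3d.io.read_point_cloud("fused_cloud.ply")  # hypothetical fused cloud
plane_model, inliers = pcd.segment_plane(distance_threshold=0.03,
                                         ransac_n=3,
                                         num_iterations=1000)
rocks = pcd.select_by_index(inliers, invert=True)  # discard the major plane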
Our system incorporates several strategies to address the challenge of partial occlusions, which are common in mining environments. Firstly, we utilize multiple Basler Blaze-101 3D ToF sensors positioned at different angles. This multi-sensor configuration allows us to capture information from various perspectives, significantly mitigating partial occlusion problems. Secondly, our point cloud processing method enables us to work with complete three-dimensional information, which is particularly useful for inferring the complete shape of partially occluded objects. We can use depth information to distinguish between overlapping objects, enhancing our ability to accurately localize rock centroids even in complex scenes. Furthermore, the use of the YOLO v8x-Seg algorithm for instance segmentation allows us to detect and segment objects even when they are partially occluded. This algorithm has been trained to recognize partial features of objects and can infer the complete shape based on visible parts. By combining these approaches, our system demonstrates robust performance in handling partial occlusions, a critical capability for effective rock centroid localization in real-world mining scenarios.

3.7.2. Statistical Outlier Removal

Outlier points can appear in sensors such as 3D ToF cameras and LiDAR due to their internal functioning, potentially affecting algorithm performance. Therefore, the influence of SOR on the rock centroid localization algorithm was analyzed. In SOR, the average distances of each point to its nearest neighbors are calculated and used to identify outliers based on a standard deviation threshold. The key parameters include the following:
  • nb_neighbors specifies the number of neighbors considered when calculating the average distance for a given point;
  • std_ratio sets the threshold level based on the standard deviation of the average distances; a lower value results in more aggressive filtering.
Selecting these parameters is challenging, as they can affect point clouds with different configurations in various ways. In this study, after conducting several tests, configurations with an std_ratio of 1 and nb_neighbors of 16 were selected. It is important to note that this selection may not be optimal, and further research in this area is warranted.
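In Open3D, the SOR variant examined here amounts to a single call with the parameters selected above ("cloud.ply" is a placeholder path):

import open3d as o3d

pcd = o3d.io.read_point_cloud("cloud.ply")
filtered, kept_indices = pcd.remove_statistical_outlier(nb_neighbors=16,
                                                        std_ratio=1.0)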
The SOR method was implemented as an experimental variant to examine its efficacy in point cloud outlier removal. The system demonstrated robust performance both with and without SOR, indicating resilience to different data processing approaches. In this study, which focuses on spatial localization of rock centroids in static images, temporal heterogeneity was not considered a critical factor. The system was designed to process individual images or sequences without reliance on strict temporal coherence.

3.7.3. Bird’s-Eye View and Mapping

Given that image-based deep learning techniques are more established than point-cloud-based techniques, the decision was made to convert the point cloud into a BEV pseudo-image. This conversion requires defining several parameters: side_range (left and right limits), fwd_range (back and front limits), res (desired resolution in meters, where each output pixel represents a square region of res × res ), and height (min_height, max_height). Based on the setup and characteristics of the Blaze 101 sensors, the following parameters were defined for this research: side_range (−5.5, 0.02), fwd_range (−0.02, 5.5), res (0.002), and height (−0.1, 2). Mapping was performed to preserve the positions of the X, Y, Z axes in the image form.
The algorithm for converting the point cloud to BEV and performing the mapping is presented below (Algorithm 4). Figure 13 illustrates the point clouds converted to BEV images.
Algorithm 4 Point cloud to BEV conversion and mapping
Require: Point cloud P, side_range, fwd_range, res, height_range
Ensure: BEV image I, X_map, Y_map, Z_map

import numpy as np

# Split the point cloud into per-axis coordinate arrays
x, y, z = P[:, 0], P[:, 1], P[:, 2]

# Keep only the points inside the forward and side limits
f_filt = np.logical_and(x > fwd_range[0], x < fwd_range[1])
s_filt = np.logical_and(y > side_range[0], y < side_range[1])
indices = np.argwhere(np.logical_and(f_filt, s_filt)).flatten()
x, y, z = x[indices], y[indices], z[indices]

# Convert metric coordinates to pixel coordinates at the chosen resolution
x_img = ((-y - side_range[0]) / res).astype(np.int32)
y_img = ((-x - fwd_range[0]) / res).astype(np.int32)

# Clip heights to the configured range, shift to zero, and scale to the
# 8-bit range so they survive the uint8 image assignment below
pixel_values = np.clip(z, a_min=height_range[0], a_max=height_range[1])
pixel_values = pixel_values - height_range[0]
pixel_values = (255 * pixel_values / (height_range[1] - height_range[0])).astype(np.uint8)

# Write the height values into the first channel of the BEV image
I = np.zeros((int((side_range[1] - side_range[0]) / res),
              int((fwd_range[1] - fwd_range[0]) / res), 3), dtype=np.uint8)
I[y_img, x_img, 0] = pixel_values

# Store the metric coordinates per pixel for inverse mapping; float dtype
# (unlike the uint8 image) so the coordinates are not truncated
X_map = np.zeros(I.shape[:2], dtype=x.dtype)
Y_map = np.zeros(I.shape[:2], dtype=y.dtype)
Z_map = np.zeros(I.shape[:2], dtype=z.dtype)
X_map[y_img, x_img] = x
Y_map[y_img, x_img] = y
Z_map[y_img, x_img] = z

return I, X_map, Y_map, Z_map
The input that would cause maximum activation in our system is a BEV image that clearly represents rocks distinguishable from the background. Ideal characteristics include rocks represented as high-density point regions, clearly contrasted with a low-density background, and well-defined, separated shapes, preferably circular or elliptical in top view. A uniform point density distribution within rock regions is also crucial. Several constraints must be imposed on this input to ensure optimal performance. Rock shapes must be consistent with typical geometries viewed from above, based on geological data collected from mining environments. The point density distribution should reflect realistic rock surface reflection characteristics, as observed in our dataset, with particular attention to the realistic density transition at rock edges, mimicking natural rock-background interfaces. Furthermore, the BEV image scale and perspective must align with our Basler Blaze-101 3D ToF sensor configuration to maintain consistency with our data acquisition setup. These characteristics and constraints ensure that our system responds optimally to inputs that closely resemble real-world mining scenarios while maintaining the high level of detail necessary for accurate rock centroid localization. Our model’s design and training process, detailed in previous sections, ensure its functionality with complex, real-world data, often more varied than these ideal conditions. This approach balances optimal activation with practical applicability in dynamic mining environments.

3.7.4. Rock Segmentation

After obtaining the BEV pseudo-image, the next step in the localization system is rock segmentation. The speed of this process is critical due to the requirements of mining robotic systems. Therefore, Ultralytics YOLO v8 was selected for its speed, accuracy, and ease of use. Additionally, it can perform various tasks such as object detection, tracking, instance segmentation, and pose estimation.
The rock segmentation process employs a two-stage approach to feature selection and learning. Initially, the conversion of 3D point clouds to BEV images serves as a form of feature engineering, preserving crucial spatial information while reducing computational complexity compared to direct 3D point cloud processing. This transformation allows for the leveraging of powerful CNN architectures such as YOLO v8x-Seg. Subsequently, YOLO v8x-Seg performs automated feature learning on these BEV images, extracting hierarchical features relevant to rock detection and localization. This approach enables joint optimization of the BEV representation and the network-learned features, resulting in an efficient end-to-end system. The combination of engineered features (BEV representation) and learned features (via YOLO v8x-Seg) constitutes a robust method for selecting and learning relevant features for the specific task of rock detection in mining environments.
The YOLO v8x-Seg segmentation model was used as the basis for training. Table 4 lists the hyperparameters used.
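A minimal Ultralytics training and inference sketch is shown below; the dataset configuration file and the epoch and batch values are illustrative assumptions (the actual hyperparameters appear in Table 4):

from ultralytics import YOLO

model = YOLO("yolov8x-seg.pt")      # pre-trained YOLO v8x-Seg weights
model.train(data="bev_rocks.yaml",  # hypothetical dataset configuration
            imgsz=2575, epochs=100, batch=4)

# Inference on a BEV image; save_txt exports predictions as .txt label files
results = model.predict("bev_sample.png", save_txt=True)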
The results of training the YOLO v8x-Seg model on the BEV image dataset are illustrated in Figure 14. The training results demonstrate excellent performance. The obtained labels were saved with a .txt extension for further analysis and postprocessing.
The hyperparameters for the YOLO v8x-Seg model were selected based on recommendations from the literature and preliminary experiments. While an exhaustive hyperparameter optimization was not conducted due to computational resource limitations, our results demonstrate that these parameters perform well for our specific application. The high IoU values and accuracy in centroid localization achieved with these settings validate their effectiveness. It is worth noting that the current configuration has proven robust across various testing conditions, indicating a good level of generalization. However, we acknowledge that further optimization could potentially enhance the model’s performance. Future work will include a more comprehensive study of hyperparameter optimization, potentially employing techniques such as a grid search or Bayesian optimization to further refine our model’s performance and adaptability to different mining scenarios.
The system’s ability to handle various input resolutions is facilitated by the inherent flexibility of the YOLO v8 model and our preprocessing pipeline. A key feature of the Ultralytics implementation is that image size is a configurable parameter during both training and inference. While the YOLO v8x-Seg model was pre-trained on 640 × 640 × 3 pixel RGB images, this parameter allows for training and inference on images of different dimensions. Images with sizes different from the selected parameter are automatically resized, enabling the processing of inputs with various initial dimensions.
In this study, BEV images with a resolution of 2575 × 2575 pixels in grayscale were utilized, a deliberate choice to preserve critical details such as rock edges during the BEV conversion process. This high-resolution approach is fundamental for accurate centroid detection and localization. The preprocessing pipeline maintains data integrity by preserving spatial information during the conversion from high-resolution input to the model’s required resolution.
Although the main experiments were conducted with this specific high resolution, the flexibility provided by the configurable image size parameter ensures that the system can adapt to different resolutions during both training and inference. Additional tests with varying input resolutions confirmed the system’s adaptability, demonstrating consistent performance across different resolutions. The system’s only constraint is that input images must have a sufficient resolution to capture relevant rock characteristics, providing flexibility while ensuring accuracy.
This approach effectively handles different input resolutions without compromising integrity or precision, ensuring adaptability to various input scenarios while maintaining accurate centroid localization. The ability to adjust the image size parameter in both training and inference stages allows for fine-tuning the model’s performance for specific application requirements or hardware constraints.
Regarding the storage and retrieval of the optimal weight database, these are managed internally within the YOLO v8x-Seg architecture and can be accessed and updated through the training and model loading functions provided by the Ultralytics library.

3.7.5. Postprocessing

The final stage involved comparing two variants for processing the predictions obtained from the YOLO v8x-Seg algorithm. Figure 15a depicts variant 1, where predictions were analyzed collectively, while Figure 15b illustrates variant 2, which considered segmentations individually.
The YOLO v8x-Seg algorithm provides a segmented image and a .txt file containing predictions. This .txt file was utilized to generate mask images. For variant 1, a single mask image incorporating all predictions was created, whereas variant 2 generated individual mask images for each prediction. Random colors were assigned to each prediction in both cases. This color assignment is crucial for the subsequent stages of variant 1 but does not impact variant 2. Inverse mapping was then performed to obtain the point clouds. Variant 1 resulted in a single point cloud containing all rock predictions, while variant 2 produced separate point clouds for each prediction.
For variant 1, given the prior knowledge of the number of predictions, the K-means clustering algorithm was applied. This algorithm is efficient and particularly suitable when the number of clusters is known a priori. The parameters number_cluster, n_init, and max_iter were employed in K-means. The value of number_cluster was obtained directly from the YOLO v8x-Seg algorithm predictions, while the other parameters were determined empirically. Finally, for variant 1, the centroids of each resulting K-means cluster were calculated, whereas for variant 2, the centroids of each individual point cloud were computed.
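The two variants can be sketched as follows, with scikit-learn assumed for K-means; parameter values other than the cluster count are illustrative:

import numpy as np
from sklearn.cluster import KMeans

def centroids_variant1(points, n_predictions):
    # Variant 1: cluster the single fused point cloud with K-means, using the
    # number of YOLO v8x-Seg predictions as the cluster count.
    km = KMeans(n_clusters=n_predictions, n_init=10, max_iter=300).fit(points)
    return km.cluster_centers_

def centroids_variant2(point_clouds):
    # Variant 2: one point cloud per prediction; each centroid is the mean point.
    return [pc.mean(axis=0) for pc in point_clouds]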

4. Results and Discussion

This section analyzes and discusses the results of rock centroid localization. The analysis begins with the segmentation results using the YOLO v8x-Seg algorithm, followed by the presentation of rock localization results, a graphical analysis of segmentation and centroid localization, and concludes with a discussion of relevant research aspects. As outlined in the localization system design, several experiments were conducted. Evaluations were performed both with and without overlap. Two experiments in the segmentation section analyzed the influence of Statistical Outlier Removal (SOR) filtering. The variants analyzed in this study are as follows:
  • N-S-N-O. Without SOR and without overlap;
  • S-N-O. With SOR and without overlap;
  • N-S-O. Without SOR and with overlap;
  • S-O. With SOR and with overlap.
Additionally, in the localization phase, two variants were analyzed to obtain the rock centroid:
  • N-S-N-O-V1. Without SOR, without overlap, using variant 1;
  • N-S-N-O-V2. Without SOR, without overlap, using variant 2;
  • S-N-O-V1. With SOR, without overlap, using variant 1;
  • S-N-O-V2. With SOR, without overlap, using variant 2;
  • N-S-O-V1. Without SOR, with overlap, using variant 1;
  • N-S-O-V2. Without SOR, with overlap, using variant 2;
  • S-O-V1. With SOR, with overlap, using variant 1;
  • S-O-V2. With SOR, with overlap, using variant 2.

4.1. Results of BEV Image Segmentation

The correct and rapid segmentation of objects in images represents an area of continuous advancement and development. In this context, the YOLO v8x-Seg algorithm is of great importance due to its efficiency, speed, and ease of use. Table 5 presents the segmentation results achieved with the YOLO v8x-Seg algorithm under various conditions: with and without overlap, and with or without SOR.
As is evident from the table, rock segmentation proved adequate in both overlapping and non-overlapping environments, achieving IoU values above 93%. This high performance is crucial for the subsequent phases of the localization system.
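For reference, the per-image IoU reported here follows the standard binary-mask definition; a minimal sketch:

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two binary masks: |intersection| / |union|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0
```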
The analysis of the IoU metric per image across the investigated scenarios is illustrated in Figure 16.
It is worth noting that in the non-overlapping scenario, all rocks were successfully detected and segmented in both analyzed variants. In the overlapping scenario, the SOR variant produced seven spurious detections not present in the ground truth, while the No SOR variant produced 12.

4.2. Sensitivity Analysis and Model Robustness

The sensitivity analysis conducted in our study, with results presented in Table 6, demonstrates the robustness of the YOLO v8x-Seg model against various perturbations in input images. The results indicate that the model maintains consistent performance when faced with changes in brightness and contrast, with minimal variations in Mean IoU. For instance, in the S-N-O variant, the Mean IoU ranges between 92.88% and 94.48% for these perturbations, while for N-S-O, it varies between 95.33% and 96.10%.
Notably, the model exhibits greater sensitivity to rotations, with a significant decrease in Mean IoU for ±5° rotations, dropping to 74.46% and 74.33% for S-N-O, and to 72.26% for both rotations in N-S-O. This information is crucial for understanding the model’s strengths and limitations under different operational conditions, enabling specific adjustments to enhance its performance in real mining scenarios.
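The perturbations evaluated in Table 6 can be reproduced with standard image operations; the following OpenCV sketch is illustrative and not the authors' exact code:

```python
import cv2

def perturb(img, mode):
    """Generate the perturbations of Table 6 (illustrative implementation)."""
    if mode == "blur":
        return cv2.blur(img, (3, 3))
    if mode == "brightness+10":
        return cv2.convertScaleAbs(img, alpha=1.0, beta=0.1 * 255)
    if mode == "brightness-10":
        return cv2.convertScaleAbs(img, alpha=1.0, beta=-0.1 * 255)
    if mode == "contrast+10":
        return cv2.convertScaleAbs(img, alpha=1.1, beta=0)
    if mode == "contrast-10":
        return cv2.convertScaleAbs(img, alpha=0.9, beta=0)
    if mode in ("rotate+5", "rotate-5"):
        h, w = img.shape[:2]
        angle = 5 if mode == "rotate+5" else -5
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        return cv2.warpAffine(img, M, (w, h))
    return img
```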

4.3. Results of Rock Centroid Localization in Point Clouds

Precision in rock centroid localization is crucial for increasing rock-breaking efficiency. Table 7 presents the results of the metrics used to evaluate the quality of rock centroid localization. The data reveal favorable outcomes across all variants and scenarios analyzed. The most significant results include:
  • Euclidean distance error (ED_XY): The maximum ED_XY of 0.0196 m was observed in the N-S-O-V1 experiment (without SOR, with overlapping, variant 1). This indicates high precision in centroid localization in the XY plane, considering that the typical diameter of rock-breaker end effectors ranges between 0.07 m and 0.11 m;
  • Normalized error (e_norm): The e_norm on the X and Y axes did not exceed 3.8% in any case, which is an excellent result. The highest e_norm was observed on the Z axis, reaching a maximum of 13.6196% in the S-O-V2 experiment (with SOR, overlapping, variant 2);
  • Coefficient of determination (R²): R² values close to 1 were achieved on the X and Y axes in all experiments, indicating a high correlation between predicted and actual values. For the Z axis, R² values were lower, ranging between 0.334 and 0.843, consistent with the higher error observed on this axis due to the use of BEV mapping;
  • Mean Absolute Error (MAE): Consistently low MAE values were obtained for the X and Y axes, with a maximum of 0.0149 m. The MAE on the Z axis was slightly higher, with a maximum of 0.0333 m.
Table 7. Results of the metrics used to evaluate the location of rocks in the point cloud dataset.

| Metric | N_S_N_O_V1 | N_S_N_O_V2 | S_N_O_V1 | S_N_O_V2 | N_S_O_V1 | N_S_O_V2 | S_O_V1 | S_O_V2 |
|---|---|---|---|---|---|---|---|---|
| MAE_X (m) | 0.0092 | 0.0085 | 0.0088 | 0.0089 | 0.0105 | 0.0106 | 0.0095 | 0.0100 |
| MAE_Y (m) | 0.0095 | 0.0084 | 0.0098 | 0.0089 | 0.0135 | 0.0149 | 0.0120 | 0.0114 |
| MAE_Z (m) | 0.0297 | 0.0307 | 0.0301 | 0.0310 | 0.0320 | 0.0322 | 0.0331 | 0.0333 |
| e_norm_X (%) | 2.4000 | 2.2263 | 2.3288 | 2.3183 | 2.7712 | 2.8300 | 2.5112 | 2.6873 |
| e_norm_Y (%) | 2.5240 | 2.3151 | 2.6052 | 2.4181 | 3.7826 | 3.3895 | 3.2508 | 3.0954 |
| e_norm_Z (%) | 12.5305 | 12.8894 | 12.6450 | 12.9817 | 13.1099 | 13.2652 | 13.4980 | 13.6196 |
| R²_X | 0.999 | 1.000 | 0.999 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 |
| R²_Y | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| R²_Z | 0.394 | 0.360 | 0.376 | 0.334 | 0.841 | 0.843 | 0.827 | 0.822 |
| ED_XY (m) | 0.0144 | 0.0128 | 0.0149 | 0.0134 | 0.0196 | 0.0184 | 0.0173 | 0.0171 |
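For clarity, the metrics of Table 7 can be computed as sketched below. The normalization used for e_norm (the per-axis span of the ground-truth coordinates) is an assumption made for illustration, as is the function name:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

def localization_metrics(pred, gt):
    """Metrics of Table 7 for N x 3 arrays of predicted and ground-truth
    centroids. The e_norm normalization here is illustrative."""
    out = {}
    for i, axis in enumerate("XYZ"):
        mae = mean_absolute_error(gt[:, i], pred[:, i])
        out[f"MAE_{axis}"] = mae
        out[f"e_norm_{axis}"] = 100.0 * mae / (gt[:, i].max() - gt[:, i].min())
        out[f"R2_{axis}"] = r2_score(gt[:, i], pred[:, i])
    # Mean Euclidean distance restricted to the XY plane.
    out["ED_XY"] = float(np.mean(np.linalg.norm(pred[:, :2] - gt[:, :2], axis=1)))
    return out
```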
A visual representation of the R² results and maximum error for the cases without and with overlapping is provided in Figure 17 and Figure 18, respectively.
Figure 17 illustrates the R² results and maximum error for the cases without overlapping. The R² values for the X and Y axes are consistently high (close to 1), while they are lower for the Z axis. Maximum errors are generally higher on the Z axis, consistent with the inherent challenges of using BEV mapping for depth estimation.
Figure 18 reveals a slight performance degradation compared to the cases without overlapping, especially on the Z axis. However, the results remain robust, with R² values for X and Y close to 1 and relatively low maximum errors.
Figure 19 provides a detailed per-image analysis of the MAE, e_norm, and ED_XY metrics in the investigated scenarios. The most significant errors consistently occur on the Z axis, attributable to BEV mapping. However, given how the analyzed robotic system operates, these errors do not significantly impair overall performance. Notably, even the extreme cases of e_norm did not exceed 22%, as shown in Figure 19c,d, demonstrating the system's robustness in challenging situations.
Overall, these results demonstrate the high precision and reliability of the developed rock centroid localization system, both in scenarios with and without overlapping and with or without the use of SOR. The consistency of good results across different configurations underscores the system’s robustness and adaptability to diverse operating conditions in mining environments.

4.4. Graphical Analysis of the Segmentation in the BEV Image and of the Rock Location in the Point Cloud

The graphical analysis of rock segmentation and localization results provides a valuable qualitative evaluation of the developed system. Figure 20 presents several examples of point clouds, with the first two exhibiting top-to-bottom overlap, while the last one does not.
In all the illustrated examples, the system successfully detected and localized the rocks in the scene. This performance is particularly noteworthy given the challenges posed by overlapping, suspended particles, and varying lighting conditions, which typically complicate localization tasks. It was observed that localization errors tended to increase in cases of overlap.

4.5. Relevant Aspects

The rock centroid localization system achieved excellent results, as discussed in the previous sections. This section analyzes some relevant aspects found in the research.
Mining robotic systems with rock-breaking hammers, as mentioned, have four DoF: the first three position the end effector in 3D Cartesian space, and the fourth sets its attack angle. Analyses of their operation in various Chilean mining operations suggest that the end effector should remain vertical with respect to the major plane, i.e., the floor, as this requires less effort when breaking rocks; in other words, the last DoF of the rock-breaking hammer should be held at 90° to the floor. Under this constraint, the joint angles needed to perform the rock-breaking task autonomously can be obtained from the robot's inverse kinematics together with the proposed system. The error obtained on the Z axis, although close to 22% in extreme cases, has little influence because the value commanded for the hammer's position on this axis is always approximately twice the centroid value, as in [1]. Likewise, the e_norm on the X and Y axes is not substantial considering that the end-effector diameters of the analyzed robotic system and similar ones range between approximately 0.07 m and 0.11 m.
The implemented system required approximately 5 s per localization cycle, making it suitable for mining operations. The initial connection to the Blaze 101 sensors through the Harvester library took 1 s; streaming was then maintained, reducing data acquisition to 0.03 s per sensor. The BEV image conversion required approximately 1 s, mainly because of the resolution used to create it. Preprocessing produced a better scene representation and reduced the point cloud size, which is vital for lowering subsequent computational costs; depending on whether SOR was applied, it took 1 to 1.5 s. The YOLO v8x-Seg inference took 1 to 1.5 s, despite operating on a 2575 × 2575 pixel input image.
Correct sensor positioning is a vital aspect, as it greatly influences the accuracy of rock centroid localization. Poor sensor placement can result in an object’s shape not being captured, thereby increasing the occurrence of unwanted points or areas lacking object information.
The same point cloud registration methodology as in [8] was used, which allowed for a better scene representation. However, further improvements or the incorporation of other methods are needed, as small misalignments between objects were observed in the final point cloud, affecting proper segmentation and subsequent rock localization.
Using ROL, SOR, and other filters to remove unwanted points is a common technique in point cloud processing; however, this procedure requires deeper analysis because its parameters can influence the final rock centroid localization in different ways. In some cases, applying SOR removed not only unwanted points but also points belonging to the object to be detected, reducing the IoU, as observed in the non-overlapping scenarios.
The analyses carried out at La Patagua Mining Company under the FONDEF IDeA I + D ID21I10087 project showed that rock-breaking hammer operators first check whether the rocks are stacked or overlapping; if so, they use the hammer to unstack them before starting to break them. This indicates that the proposed rock centroid localization system would perform well for rock-breaking hammers in mining operations. The dataset was deliberately designed to challenge the developed algorithms: although the overlapping scenario does occur in mining operations, unstacking the rocks is always the first step.
Specifically, it was identified that in environments with high concentrations of suspended particles, the quality of point clouds significantly deteriorates due to the internal functioning of ToF technology-based sensors, such as LiDAR and 3D ToF cameras. This degradation can lead to system malfunction under these particular conditions. To address this challenge, future work will focus on implementing a combination of 3D ToF cameras and thermal cameras. This sensor fusion could mitigate the drawbacks of ToF technologies in environments with high particle concentrations, thereby improving the system’s robustness and reliability across the full spectrum of mining conditions.
In designing the system, its applicability to other types of 3D sensors was considered, keeping in mind some key considerations, such as the following:
  • Method Generality. The system is based on 3D point cloud processing, making it adaptable to various 3D sensor technologies. The main algorithms (preprocessing, BEV conversion, and YOLO v8x-Seg) were designed to work with point cloud data regardless of the specific sensor;
  • Requirements and adaptability. (a) Point density. For adequate characterization of objects such as rocks in BEV images, point clouds must have a sufficient number of points. For example, LiDAR sensors such as the MRS1000 and MRS6000 were initially considered, but their 4000 and 20,000 points proved insufficient for this use case. (b) Parameter adjustment. Some system parameters are closely tied to the characteristics of the point clouds produced by the sensor and directly influence algorithms such as RANSAC and SOR (a minimal sketch of this one-time setup follows this list). These adjustments are made only once during initial setup and do not affect the subsequent operation of the system. (c) YOLO retraining. Depending on the characteristics of the BEV images generated by a different sensor, it might be necessary to retrain or fine-tune the YOLO v8x-Seg model to ensure optimal performance with the new sensor's data;
  • Ensuring universality. (a) The method was validated with data from controlled environments and real mining conditions. (b) The dataset includes variations in lighting, rock sizes, and environmental factors. (c) Deep learning techniques allow adaptation to new types of data through retraining.
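As referenced in the adaptability point above, the following is a minimal Open3D sketch of the one-time SOR and RANSAC parameter setup; the numeric values and file name are illustrative and would be re-tuned for each new sensor:

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene.pcd")   # illustrative file name

# Statistical Outlier Removal: both parameters depend on the sensor's point
# density, so they are tuned once during the initial setup of a new sensor.
filtered, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# RANSAC plane fit to separate the dominant plane (the floor) from the rocks.
plane_model, inliers = filtered.segment_plane(distance_threshold=0.01,
                                              ransac_n=3,
                                              num_iterations=1000)
rocks = filtered.select_by_index(inliers, invert=True)
```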

5. Conclusions and Future Work

This study presents a robust and efficient rock centroid localization system for mining robotic applications, particularly rock-breaking hammers, demonstrating significant advancements in addressing the challenges of rock localization in dynamic mining environments. The system achieved exceptional localization accuracy, with a Euclidean distance in the XY plane (ED_XY) of up to 0.0128 m and a normalized error (e_norm) on the X and Y axes not exceeding 2.3%, surpassing the precision requirements for typical rock-breaking end effectors. Notably, the system exhibited consistent performance under diverse lighting conditions and in the presence of suspended particles, crucial factors in real-world mining operations.
Rigorous testing, including a sensitivity analysis, validated the system’s efficacy and transferability to real-world scenarios. The YOLO v8x-Seg model demonstrated robust performance against various image perturbations, maintaining high Mean IoU scores (92.88% to 96.10%) for changes in brightness and contrast. However, a notable sensitivity to rotations was observed, with Mean IoU dropping to around 74% for ±5° rotations, highlighting areas for future improvement. The innovative combination of point cloud preprocessing, BEV conversion, and segmentation using YOLO v8x-Seg proved highly effective for precise rock centroid localization, addressing specific challenges in mining environments. With an average processing time of approximately 5 s, the system demonstrates its suitability for real-time applications in mining operations.
While limitations such as sensitivity to high concentrations of suspended particles and interference from intense light were identified, the overall performance suggests that this system could significantly enhance the efficiency and safety of rock-breaking operations in mining. This successful implementation, backed by a comprehensive sensitivity analysis and robustness testing, represents a crucial step towards fully autonomous mining operations. It has the potential to increase productivity, reduce operational costs, and improve worker safety in hazardous mining environments, while also providing a solid foundation for future research in 3D perception and object localization in complex, unstructured environments beyond the mining industry.
Future work will focus on several key areas to further enhance the system’s performance and versatility:
  • While the current preprocessing pipeline effectively handles various input resolutions, more advanced techniques will be developed to optimize this process. This includes refining the adaptive preprocessing module to more efficiently normalize input images across an even wider range of resolutions and sensor types, ensuring consistent performance across different sensor inputs;
  • Cutting-edge super-resolution techniques will be explored to potentially improve the quality of lower-resolution inputs, expanding the system’s applicability to scenarios where high-resolution sensors are not available or practical. This research will focus on adapting concepts from the Swift Parameter-free Attention Network (SPAN), as presented in the NTIRE 2024 Efficient Super-Resolution Challenge [46], to the YOLO architecture. The incorporation of Swift Parameter-free Attention Blocks (SPAB) and parameter-free attention generation techniques aims to enhance spatial dependency capture and improve object detection efficiency across various scales. Additionally, the use of strategic residual connections will be investigated to optimize information flow through the network. This direction of research seeks to push the boundaries of what is possible with lower-quality input data, potentially broadening the system’s utility in challenging mining environments while maintaining the computational efficiency crucial for real-time applications;
  • Advanced image fusion methods for combining data from Basler Blaze-101 3D ToF and thermal cameras will be investigated to enhance system robustness in diverse mining environments. A fusion approach based on ResNet and zero-phase component analysis (ZCA), as proposed in [47], will be adapted to the existing YOLO architecture. Modifications to the YOLO backbone will be implemented to process multimodal inputs, with ZCA being applied to project features into a sparse subspace. Within the YOLO framework, a fusion strategy utilizing local average l1-norm and soft-max operations will be developed to effectively merge depth and thermal information. These enhancements are expected to improve rock detection and localization accuracy, particularly in environments with suspended particles, while preserving YOLO’s real-time performance capabilities. The proposed improvements aim to increase system versatility and applicability across a wider range of mining operations;
  • The robustness of the model will be enhanced through the implementation of advanced sensitivity analysis techniques, drawing inspiration from uncertainty quantification (UQ) methods for deep neural networks, as demonstrated in [48]. Their automated randomly deactivating process (ARDCW) will be adapted to the YOLO architecture employed in this study. This process involves the selective deactivation of network components to assess their influence on centroid localization accuracy. Three-dimensional visualizations of uncertainty intervals will be developed to facilitate spatial sensitivity analysis. The results will be compared with traditional sensitivity methods to provide a comprehensive evaluation of the model’s sensitivity. These findings will be utilized to optimize the YOLO architecture specifically for rock centroid localization, with potential incorporation of adaptive structures suited to mining environments;
  • To validate these improvements, extensive testing will be conducted with various camera configurations and resolutions, ensuring that the system not only maintains but potentially exceeds its current high performance across different hardware setups.

Author Contributions

Conceptualization, J.K. and C.U.; methodology, R.R.-G., J.K. and C.U.; software, R.R.-G.; validation, R.R.-G.; formal analysis, R.R.-G., J.K. and C.U.; investigation, R.R.-G., J.K. and C.U.; writing—review and editing, R.R.-G., J.K., C.U. and Y.G.-G.; supervision, J.K. and C.U.; funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Agencia Nacional de Investigación y Desarrollo (ANID), Chile, through IDeA I + D ID21I10087 project, ANID-Subdirección de Capital Humano/Doctorado Nacional/2022-21220266, and by Vicerrectoría de Investigación, Innovación y Creación of the University of Santiago of Chile, Chile.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lampinen, S.; Niu, L.; Hulttinen, L.; Niemi, J.; Mattila, J. Autonomous robotic rock breaking using a real-time 3D visual perception system. J. Field Robot. 2021, 38, 980–1006. [Google Scholar] [CrossRef]
  2. Correa, M.; Cárdenas, D.; Carvajal, D.; Ruiz-del-Solar, J. Haptic teleoperation of impact hammers in underground mining. Appl. Sci. 2022, 12, 1428. [Google Scholar] [CrossRef]
  3. Takahashi, H.; Sano, K. Automatic detection and breaking system for boulders by use of ccd camera and laser pointer. Fragblast 1998, 2, 397–414. [Google Scholar] [CrossRef]
  4. Rodriguez-Guillen, R.; Kern, J.; Urrea, C. Fast Rock Detection in Visually Contaminated Mining Environments using Machine Learning and Deep Learning Techniques. Appl. Sci. 2024, 14, 731. [Google Scholar] [CrossRef]
  5. Samtani, P.; Leiva, F.; Ruiz-del-Solar, J. Learning to Break Rocks with Deep Reinforcement Learning. IEEE Robot. Autom. Lett. 2023, 8, 1077–1084. [Google Scholar] [CrossRef]
  6. Niu, L.; Aref, M.M.; Mattila, J. Clustering analysis for secondary breaking using a low-cost time-of-flight camera. In Proceedings of the 2018 Ninth International Conference on Intelligent Control and Information Processing (ICICIP), Wanzhou, China, 9–11 November 2018; pp. 318–324. [Google Scholar]
  7. Cárdenas, D.; Parra-Tsunekawa, I.; Leiva, F.; Ruiz-del Solar, J. Automatic determination of rock-breaking target poses for impact hammers. Energies 2022, 15, 6380. [Google Scholar] [CrossRef]
  8. Bernal, D.F.Q.; Kern, J.; Urrea, C. A Multimodal Fusion System for Object Identification in Point Clouds with Density and Coverage Differences. Processes 2024, 12, 248. [Google Scholar] [CrossRef]
  9. Li, J.; Liu, Y.; Wang, S.; Wang, L.; Sun, Y.; Li, X. Visual perception system design for rock breaking robot based on multi-sensor fusion. Multimed. Tools Appl. 2024, 83, 24795–24814. [Google Scholar] [CrossRef]
  10. Rosso, M.M.; Marasco, G.; Aiello, S.; Aloisio, A.; Chiaia, B.; Marano, G.C. Convolutional networks and transformers for intelligent road tunnel investigations. Comput. Struct. 2023, 275, 106918. [Google Scholar] [CrossRef]
  11. Bae, B.; Ahn, J.; Jung, H.; Yoo, C.K. Detection of steel ribs in tunnel GPR images based on YOLO algorithm. J. Korean Geotech. Soc. 2023, 39, 31–37. [Google Scholar]
  12. Cao, D.; Yue, H.; Liu, Z.; Wu, X.; Chen, W. BEVLCD: Real-time and rotation-invariant loop closure detection based on BEV of point cloud. IEEE Trans. Instrum. Meas. 2023, 72, 5026213. [Google Scholar] [CrossRef]
  13. Xu, W.; Li, X.; Ni, P.; Guang, X.; Luo, H.; Zhao, X. Multi-View Fusion Driven 3D Point Cloud Semantic Segmentation Based on Hierarchical Transformer. IEEE Sens. J. 2023, 23, 31461–31470. [Google Scholar] [CrossRef]
  14. Qiu, H.; Yu, B.; Tao, D. GFNet: Geometric flow network for 3D point cloud semantic segmentation. Trans. Mach. Learn. Res. 2022, 9. Available online: https://openreview.net/forum?id=LSAAlS7Yts (accessed on 22 July 2024).
  15. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
  16. Hosseini, S.A.; Shahri, A.A.; Asheghi, R. Prediction of bedload transport rate using a block combined network structure. Hydrol. Sci. J. 2022, 67, 117–128. [Google Scholar] [CrossRef]
  17. Zhou, J.; Ni, J.; Rao, Y. Block-Based Convolutional Neural Network for Image Forgery Detection. In Digital Forensics and Watermarking IWDW2017; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2017; pp. 65–76. [Google Scholar]
  18. Kamran-Pishhesari, A.; Moniri-Morad, A.; Sattarvand, J. Applications of 3D Reconstruction in Virtual Reality-Based Teleoperation: A Review in the Mining Industry. Technologies 2024, 12, 40. [Google Scholar] [CrossRef]
  19. Ghamisi, P.; Rasti, B.; Yokoya, N.; Wang, Q.; Hoefle, B.; Bruzzone, L.; Bovolo, F.; Chi, M.; Anders, K.; Gloaguen, R.; et al. Multisource and Multitemporal Data Fusion in Remote Sensing a Comprehensive Review of the State of the Art. IEEE Trans. Geosci. Remote Sens. 2019, 7, 6–39. [Google Scholar] [CrossRef]
  20. Besl, P.J.; Mckay, N.D. A Method for Registration of 3-D Shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256. [Google Scholar] [CrossRef]
  21. Xu, G.; Pang, Y.; Bai, Z.; Wang, Y.; Lu, Z. A Fast Point Clouds Registration Algorithm for Laser Scanners. Appl. Sci. 2021, 11, 3426. [Google Scholar] [CrossRef]
  22. Yue, X.; Liu, Z.; Zhu, J.; Gao, X.; Yang, B.; Tian, Y. Coarse-fine point cloud registration based on local point-pair features and the iterative closest point algorithm. Appl. Intell. 2022, 52, 12569–12583. [Google Scholar] [CrossRef]
  23. Yu, H.; Li, F.; Saleh, M.; Busam, B.; Ilic, S. Cofinet: Reliable coarse-to-fine correspondences for robust pointcloud registration. In Proceedings of the 35th Conference on Neural Information Processing Systems, Online, 6–14 December 2021; Volume 34, pp. 23872–23884. [Google Scholar]
  24. Bueno, M.; Martínez-Sanchez, J.; Gonzalez-Jorge, H.; Lorenzo, H. Detection of geometric keypoints and its application to point cloud coarse registration. ISPRS-Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 187–194. [Google Scholar] [CrossRef]
  25. Cheng, L.; Tong, L.; Wu, Y.; Chen, Y.; Li, M. Shiftable Leading Point Method for High Accuracy Registration of Airborne and Terrestrial LiDAR Data. Remote Sens. 2015, 7, 1915–1936. [Google Scholar] [CrossRef]
  26. Yang, B.; Zang, Y.; Dong, Z.; Huang, R. An automated method to register airborne and terrestrial laser scanning point clouds. ISPRS J. Photogramm. Remote Sens. 2015, 109, 62–76. [Google Scholar] [CrossRef]
  27. Gruen, A.; Akca, D. Least squares 3D surface and curve matching. ISPRS J. Photogramm. Remote Sens. 2005, 59, 151–174. [Google Scholar] [CrossRef]
  28. Rusinkiewicz, S.; Levoy, M. Efficient Variants of the ICP Algorithm. In Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling, Quebec, QC, Canada, 28 May–1 June 2001; pp. 145–152. [Google Scholar]
  29. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM. 1981, 24, 381–395. [Google Scholar] [CrossRef]
  30. Szutor, P.; Zichar, M. Fast Radius Outlier Filter Variant for Large Point Clouds. Data 2023, 8, 149. [Google Scholar] [CrossRef]
  31. Arámburo, J.; Ramírez, A.T. Advances in Robotics, Automation and Control, 1st ed.; Intechopen: Rijeka, Croatia, 2008; pp. 265–282. [Google Scholar]
  32. Ester, M.; Krigel, H.P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231. [Google Scholar]
  33. Dubes, R.C.; Jain, A.K. Algorithms for Clustering Data, 1st ed.; Prentice Hall: Saddle River, NJ, USA, 1988. [Google Scholar]
  34. Kodinariya, T.M.; Makwana, P.R. Review on determining number of Cluster in K-Means Clustering. Int. J. Adv. Res. Comput. Sci. Manag. Stud. 2013, 1, 6. [Google Scholar]
  35. Mao, J.; Shi, S.; Wang, X.; Li, H. 3D object detection for autonomous driving: A comprehensive survey. Int. J. Comput. Vis. 2023, 131, 1909–1963. [Google Scholar] [CrossRef]
  36. Wang, B.; Zhu, M.; Lu, Y.; Wang, J.; Gao, W.; Wei, H. Real-time 3D object detection from point cloud through foreground segmentation. IEEE Access 2021, 9, 84886–84898. [Google Scholar] [CrossRef]
  37. Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-View 3D Object Detection Network for Autonomous Driving. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915. [Google Scholar]
  38. Ultralytics YOLOV8. Available online: https://github.com/ultralytics/ultralytics (accessed on 5 February 2024).
  39. Uygun, T.; Ozguven, M.M. Determination of tomato leafminer: Tuta absoluta (Meyrick) (Lepidoptera: Gelechiidae) damage on tomato using deep learning instance segmentation method. Eur. Food Res. Technol. 2024, 250, 1837–1852. [Google Scholar] [CrossRef]
  40. Basler Blaze 101. Available online: https://www.baslerweb.com/en/shop/blaze-101/ (accessed on 2 September 2023).
  41. CloudCompare. Available online: https://www.cloudcompare.org/ (accessed on 10 January 2024).
  42. Roboflow. Available online: https://roboflow.com/ (accessed on 25 January 2024).
  43. Rajalakshmi, T.S.; Senthilnathan, R. Dataset and Performance Metrics towards Semantic Segmentation. Int. J. Eng. Manag. Res. 2023, 13, 1. [Google Scholar] [CrossRef]
  44. Urrea, C.; Garcia-Garcia, Y.; Kern, J. Improving Surgical Scene Semantic Segmentation through a Deep Learning Architecture with Attention to Class Imbalance. Biomedicines 2024, 12, 1309. [Google Scholar] [CrossRef] [PubMed]
  45. Zhang, Q.; Liu, X.; Peng, T.; Yang, X.; Tang, M.; Zou, X.; Liu, M.; Wu, L.; Zhang, T. U-SeqNet: Learning spatiotemporal mapping relationships for multimodal multitemporal cloud removal. GISci. Remote Sens. 2024, 61, 2330185. [Google Scholar] [CrossRef]
  46. Zhang, Z.; Zhang, S.; Wu, R.; Zuo, W.; Timofte, R.; Xing, X.; Park, H.; Song, S.; Kim, C.; Kong, X.; et al. NTIRE 2024 Challenge on Bracketing Image Restoration and Enhancement: Datasets, Methods and Results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 6153–6166. [Google Scholar]
  47. Li, H.; Wu, X.; Durrani, T.S. Infrared and visible image fusion with resnet and zero-phase component analysis. Infrared Phys. Technol. 2019, 102, 103039. [Google Scholar] [CrossRef]
  48. Abbaszadeh Shahri, A.; Shan, C.; Larsson, S. A hybrid ensemble-based automated deep learning approach to generate 3D geo-models and uncertainty analysis. Eng. Comput. 2024, 40, 1501–1516. [Google Scholar] [CrossRef]
Figure 1. Rock-breaker hammers.
Figure 2. YOLO v8-Seg architecture [39].
Figure 3. System architecture.
Figure 4. Point clouds based on sensor placement. (a) Angle between sensors less than 30°. (b) Angle between sensors approximately between 120° and 190°.
Figure 6. Created database. (a) Without overlap. (b) With overlap. (c) High lighting. (d) Suspended particles.
Figure 7. Labeling distribution.
Figure 8. Data augmentation. (a) Blur to 2 pixels. (b) Brightness to 15%. (c) Exposure to −5%.
Figure 9. Centroid localization algorithm.
Figure 10. Point cloud preprocessing.
Figure 11. Point cloud registration.
Figure 12. RANSAC.
Figure 13. BEV images converted from point clouds.
Figure 14. Results from training the YOLO v8x-Seg model.
Figure 15. Postprocessing. (a) Var 1. (b) Var 2.
Figure 16. IoU metric results by image. (a) Without overlap. (b) With overlap.
Figure 17. R² metrics and max_error without overlap. (a) N_S_N_O_V1. (b) N_S_N_O_V2. (c) S_N_O_V1. (d) S_N_O_V2.
Figure 18. R² metrics and max_error with overlap. (a) N_S_O_V1. (b) N_S_O_V2. (c) S_O_V1. (d) S_O_V2.
Figure 19. Metrics used to assess the location of the rock centroid by image. (a) MAE without overlap. (b) MAE with overlap. (c) e_norm without overlap. (d) e_norm with overlap. (e) ED_XY without overlap. (f) ED_XY with overlap.
Figure 20. Examples of rock center localization in the image and centroid in the point cloud. Blue dots represent the ground truth, and red crosses represent the prediction. (a) Point cloud representation in the CloudCompare software. (b) Instance segmentation in a BEV image using YOLO v8x-Seg. (c) Localization in a BEV image. (d) Localization in the point cloud using the Open3D library.
Table 1. Technologies used in 3D sensing [18].

| Parameter | Stereo Vision | 3D ToF Cameras | LiDAR |
|---|---|---|---|
| XY resolution | Scene dependent | Very high | High |
| Accuracy | Low | High | Very high |
| Real-time capabilities | High | Very high | Very high |
| Cost | Low | Very high | Very high |
| Range sensing | Low | High | Very high |
| Outdoor performance | High | High | High |
| Low-light performance | Low | Very high | Very high |
Table 2. Parameters of the YOLO v8x-Seg model [38].

| Parameter | YOLO v8x-Seg |
|---|---|
| Depth multiple | 1.0 |
| Width multiple | 1.25 |
| C2f-n (Backbone) | 3, 6, 6, 3 |
| C2f-n (Neck) | 3, 3, 3, 3 |
| Max. number of channels | 512 |
| Image size (pixels) | 640 × 640 × 3 |
| Number of parameters (M) | 71.8 |
| Number of layers | 295 |
Table 3. Parameters of the ROCK-TECH RHINO model XDi3000.

| Parameter | Link 1 | Link 2 | Link 3 | Link 4 | Unit |
|---|---|---|---|---|---|
| Mass | 1029.59 | 1313.79 | 75.41 | 2082.98 | kg |
| Center of mass | 0.305 | 1.779 | 0.777 | 1.218 | m |
| Length | 0.36 | 3.06 | 2.38 | 2.57 | m |
| Inertia | 65.74 | 1468.53 | 1117.27 | 1397.74 | kg·m² |
| Upper angular limit | 85 | 81.31 | −44.37 | 40.19 | ° |
| Lower angular limit | −85 | −25.3 | −110.96 | −83.95 | ° |
Table 4. Hyperparameters of the ADAMW optimizer.

| Hyperparameter | ADAMW |
|---|---|
| Epochs | 250 |
| Batch size | 16 |
| Learning rate | 0.001 |
| Weight decay | 0.0005 |
| Betas | (0.9, 0.999) |
| Cos_lr | True |
| Warmup_epochs | 3.0 |
Table 5. Results of the metric used to evaluate rock segmentation in the BEV dataset.

| Metric | N-S-N-O | S-N-O | N-S-O | S-O |
|---|---|---|---|---|
| Mean IoU (%) | 95.10 | 93.71 | 95.59 | 95.35 |
Table 6. Sensitivity analysis of YOLO v8x-Seg performance under various image perturbations.

| Perturbation | S-N-O Mean IoU (%) | N-S-O Mean IoU (%) |
|---|---|---|
| Original | 93.71 | 95.59 |
| Blur 3 × 3 | 94.48 | 96.10 |
| Brightness +10% | 92.90 | 95.45 |
| Brightness −10% | 92.88 | 95.33 |
| Contrast +10% | 92.91 | 95.39 |
| Contrast −10% | 92.88 | 95.36 |
| Rotate +5° | 74.46 | 72.26 |
| Rotate −5° | 74.33 | 72.26 |