In this section, the studies are outlined and compared in terms of the goals, input data used, and choice of algorithm. Figures are provided that illustrate the similarities and differences between these studies.
3.2. Log End Grading
Grading logs by assessing the ends alone is useful because logs are often stacked in piles such that the ends are the only visible part. Much information about a log's quality, such as wood density, can also be deduced from a cross-sectional view of the log [
5]. The four studies with goals in the category log end grading had specific goals that were quite different, as illustrated in
Figure 2. Carratù et al. [
3] detected rot in logs, Cao and Li [
4] had the goal of detecting cracks in logs, Du et al. [
5] extracted information about the annual ring distribution, and Decelle et al. [
6] aimed to locate the pith region in log ends. Images were the only type of data used for log end grading, which is logical, considering that piths, annual rings, and rot are challenging to detect using point-cloud data. One might be able to detect large cracks in the log end, but depending on the resolution of the scanner, smaller cracks would be easier to locate using images. The model developed by Du et al. [
5] was influenced by the saw mark disturbances present in the images. Sawing marks would likely be even more prevalent in point-cloud data, which is another possible reason that point-cloud data were not used in studies of log end grading.
In all the studies in this goal category, the images were converted to grayscale, an indication that only the structural information in the images was of importance and that little color information was needed to extract it. Cao and Li [
4] used image graying and histogram equalization to denoise the images, applied global thresholding with a user-determined threshold to generate a binary mask of cracks, and used morphological operations to further remove noise. Finally, detected cracks with a length-to-width ratio smaller than a user-defined threshold were discarded, and the remaining regions in the binary bitmap were counted as cracks. The model attained an average crack detection accuracy of
. The model developed by Decelle et al. [
6] computes the local image gradients of the images of individual log ends and accumulates the gradients using ant colony optimization to locate the pith. When evaluated on two different datasets, the model attained a mean distance to the ground truth pith location of
mm and
mm, respectively. Du et al. [
5] used pith detection as a subroutine of their model but went one step further and extracted information about the log’s annual rings. Specifically, they determined the number of annual rings, the width of the annual rings, and the average distance of the 15th annual ring from the pith and from the outside. Du et al. [
5] only used the value channel of the image when represented in a hue, saturation, and value (HSV) format. The total variation algorithm was used to denoise the image, followed by the Hough transform, and local peaks were accumulated to locate the pith. Finally, circle fitting was used to locate the annual rings, measure the distance between the rings, and estimate the average distance to the 15th ring. The method attained a relative RMSE of
for estimating the average distance of the 15th ring from the center and
for estimating the average distance of the 15th ring from the outside. Carratù et al. [
3] partitioned an image of the end of a transport truck into a set of image cut-outs of individual log ends, which were fed into a self-developed convolutional neural network that classified logs with a level of rot above a certain threshold as "rotten". The logs were then categorized as suitable (1) or not suitable (0). At the task of binary classification, the network attained an F1-score of
Considering that none of the end goals of the studies were the same, it is not meaningful to compare the performance of the different studies. It would have been interesting if Du et al. [
5] had reported the accuracy of their pith location estimator, as that would have given more grounds for comparison with Decelle et al. [
6]. The two methods used for pith detection are also somewhat similar: Decelle et al. [
6] used image gradients to represent annual rings, which were accumulated to find the pith, and Du et al. [
5] used accumulated peaks to locate the pith. Carratù et al. [
3], Du et al. [
5], and Decelle et al. [
6] all had fully autonomous models on deployment, meaning that none of their methods relied on operator intervention when estimating their respective goals. Cao and Li [
4], on the other hand, relied on the operator to set the appropriate threshold for the crack-identification algorithm, making it less autonomous. Carratù et al. [
3] is the only study in this goal category that used images of entire log piles as input and included a preprocessing step that extracted cut-outs of individual log ends. Cao and Li [
4], Du et al. [
5], and Decelle et al. [
6] all relied on image cut-outs of individual log ends being extracted before the respective models could be applied. This makes the deep learning model developed by Carratù et al. [
3] the most autonomous model in this goal category.
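To make the classical pipeline of Cao and Li [4] concrete, the following sketch illustrates global thresholding, morphological cleanup, and a length-to-width ratio filter on connected components. This is a minimal illustration under assumed threshold values and an assumed dark-crack convention, not the authors' implementation:

```python
import numpy as np
from scipy import ndimage

def detect_cracks(gray, intensity_thresh=60, ratio_thresh=3.0):
    """Illustrative crack-detection sketch: global thresholding,
    morphological noise removal, and a length-to-width ratio filter.
    All parameter values are assumptions, not those of Cao and Li [4]."""
    # Cracks are assumed darker than the surrounding wood.
    binary = gray < intensity_thresh
    # Morphological opening removes small, isolated noise pixels.
    binary = ndimage.binary_opening(binary, structure=np.ones((2, 2)))
    labels, _ = ndimage.label(binary)
    cracks = []
    for region in ndimage.find_objects(labels):
        h = region[0].stop - region[0].start
        w = region[1].stop - region[1].start
        # Keep only elongated components (crack-like shapes).
        if max(h, w) / max(min(h, w), 1) >= ratio_thresh:
            cracks.append(region)
    return cracks
```

On a synthetic log end image, an elongated dark streak survives the ratio filter while a compact dark blob is discarded, mirroring the described length-to-width filtering step.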
3.3. Log Side Grading
While much information can be extracted from a single cross-sectional image of a log end, log ends reveal little about the distribution of knots in the log, which affects the quality and yield of the boards that can be sawn from it. To detect these types of defects, one has to inspect and grade the log sides, which was done in the studies in this goal category, as illustrated in
Figure 3.
Four out of the five studies in the log side grading goal category collected the point cloud data used for grading with a laser scanner. Laser scanners are a common choice for grading the sides of logs, as the majority of studies in this goal category perform some form of defect detection. Lee et al. [
7] is the only study that used images as input; they applied edge detection algorithms to extract depth information from the images. Therefore, one can argue that some form of 3D scanner would be more appropriate for their application. However, considering that the work of Lee et al. [
7] was first presented in 1991, 3D scanners were likely not as readily available at that time. Khazem et al. [
12] is the only study included in this review that used cross-sectional imaging. They used an X-ray machine to capture cross-sectional images of logs at varying intervals. The work of Khazem et al. [
12] is an exception to the rule of not including works using cross-sectional imaging because after the model was trained, the final model was intended to be applied to radial scans of logs without requiring cross-sectional imaging.
The studies in this goal category can be partitioned into two groups based on their goals. Lee et al. [
7], Thomas and Mili [
8], Nguyen et al. [
9], and Thomas et al. [
10] aim to detect defects on the log surface, whereas Zolotarev et al. [
11] and Khazem et al. [
12] also detect surface defects, but they used the detected defects to infer the internal knot structure of the log as well. The work performed by Thomas and Mili [
8] appears to be a continuation of the work performed by Thomas et al. [
10], so only the work performed by Thomas and Mili [
8] will be referred to in this section. The models implemented by Thomas and Mili [
8] and Nguyen et al. [
9] are very similar, although the work of Nguyen et al. [
9] was presented almost a decade after the work of Thomas and Mili [
8] was published. Both studies used point cloud data of the edges of the logs in polar coordinate format. In both studies, the polar coordinates were “unrolled” into a height map in Cartesian coordinate format. Thomas and Mili [
8] fit circles to the radial point clouds of individual log slices and unrolled the polar coordinates relative to the center of the fitted circle. Nguyen et al. [
9] evaluated multiple log slices simultaneously and estimated the center line of said slices using cubic spline interpolation. The polar coordinates were then unrolled with reference to the estimated center line. Nguyen et al. [
9] performed an additional conversion from the Cartesian height map to cylindrical coordinates. Both studies estimated a reference distance from the center of the log to the edge that represented the expected distance. In the final step, both studies used a form of automatic thresholding to detect large surface rises and depressions, which were classified as defects.
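The circle-fit-and-unroll step described above can be sketched as follows. This is an assumption-laden simplification, not the published code of Thomas and Mili [8] or Nguyen et al. [9]: a single cross-sectional slice is fitted with an algebraic least-squares (Kåsa) circle, the points are unrolled into a (theta, height) profile relative to the fitted circle, and large deviations are flagged as candidate surface defects using a simple standard-deviation threshold:

```python
import numpy as np

def fit_circle(x, y):
    """Algebraic (Kasa) least-squares circle fit: solves
    x^2 + y^2 = 2*a*x + 2*b*y + c for center (a, b) and radius r,
    where c = r^2 - a^2 - b^2."""
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    sol, *_ = np.linalg.lstsq(A, x**2 + y**2, rcond=None)
    a, b, c = sol
    return a, b, np.sqrt(c + a**2 + b**2)

def unroll_slice(x, y, k=3.0):
    """Unrolls one slice into a (theta, height) profile relative to the
    fitted circle and flags points deviating more than k standard
    deviations as candidate defects. k is an assumed threshold."""
    a, b, r = fit_circle(x, y)
    theta = np.arctan2(y - b, x - a)
    height = np.hypot(x - a, y - b) - r   # radial rise/depression
    defects = np.abs(height - height.mean()) > k * height.std()
    return theta, height, defects
```

A knot-like bump added to an otherwise circular synthetic slice is flagged, while the rest of the surface is not, mirroring the automatic-thresholding idea described above.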
Lee et al. [
7] captured images of partitions of the log sides while the log was rotated 10 degrees between each scan. An unspecified edge detection algorithm was used to generate a binary mask of the log edge, which was filtered using a skeleton-thinning algorithm that fit multiple short straight lines to the edge of the log. Potential knots were then identified by searching for long diagonal lines among the detected edges. Finally, knots were distinguished from bark based on their color and texture. The model of Zolotarev et al. [
11], which is initially quite similar to the work of Nguyen et al. [
9], consists of five steps: point-cloud filtering and center line estimation, log surface height map generation, knot segmentation, volumetric reconstruction of knots, and virtual sawing of the log. The DBSCAN clustering algorithm was used to filter the point-cloud data, removing artifacts in the radial direction. Circle fitting and cubic spline interpolation were used to estimate the pith location at individual cross-sections and interpolate the locations into a center line, which was used as a reference when “unrolling” the radial point-cloud data into a log surface height map. A Laplacian of Gaussian filter was used to segment the knots, and the internal structure of the knots was estimated using a biological knot-property model. The height map with estimated knot structures was then converted back to the 3D Cartesian coordinate representation of the log, where virtual sawing was performed by inserting planes into the 3D model of the log, creating intensity maps that represented virtual board faces. The pixel intensities corresponded to the probability that there was a knot in that location of the virtual board. Khazem et al. [
12] trained a deep learning model to predict the internal knot structure of the log given the external point cloud coordinates. Khazem et al. [
12] developed a mixed neural network using LSTM cells for the recurrent layers as well as convolutional and fully connected layers. The network was intended to take only a radial scan of the surface of the log as input and generate a segmentation mask of the internal knot structure of the log.
Lee et al. [
7] and Nguyen et al. [
9] did not present any quantitative evaluation metrics, so they cannot be compared to the other studies in terms of performance. The intensity map produced by the model developed by Zolotarev et al. [
11] could be compared to the segmentation bitmaps produced by Khazem et al. [
12]. However, Zolotarev et al. [
11] chose to evaluate the performance based on the correlation between the generated intensity map values and the probability of a knot being present in the real log. The model developed by Zolotarev et al. [
11] attains a Pearson correlation coefficient of
and a Spearman correlation coefficient of 1. While this metric illustrates the effectiveness of the model, it is somewhat convoluted and difficult to interpret. Khazem et al. [
12] used multiple metrics to evaluate the segmentation performance of their model, of which the Dice similarity coefficient (similar to the F1-score) is the most relevant for comparison with other works in this goal category. Khazem et al. [
12] attained Dice similarity coefficients of 0.74 and 0.70 when the model was applied to datasets of fir logs and spruce logs, respectively. What is missing from the performance metrics used by Khazem et al. [
12] is some measure of the detection rate, as that would have given better grounds for comparison with the other studies in this goal category. Thomas and Mili [
8] split defects into two classes: those that were expected to be detected and those that were not expected to be detected. Of the 59 defects the model was expected to find, it detected 47, and of the 103 defects the model was not expected to find, the model detected 11. This yields an overall detection rate of
, but when limiting the classification task to only the most visible defects, the detection rate was improved to
. In terms of autonomy, all the models discussed within this goal category are intended to run without requiring operator intervention.
3.4. Individual Log Scaling
Log scaling includes measuring the log length, diameter, or volume, as shown in
Figure 4. The three studies in the category of individual log scaling all had the goal of estimating the dimensions of individual logs using images captured from stereo cameras or an equivalent setup. Kruglov and Chiryshev [
13] detected and tracked the logs in videos and calculated their dimensions and volumes in real-time. Kalmari et al. [
14] measured the length of logs in images taken from the harvester head. Yang et al. [
15] created a 3D reconstruction of logs using a dual-camera setup.
All the studies in this category used some form of stereoscopic imaging. Kruglov and Chiryshev [
13] employed video frames from a stereo camera system to detect, track, and scale logs on a conveyor belt. Kalmari et al. [
14] used images captured by a stereovision camera mounted on a harvester head to measure the length of logs. Yang et al. [
15] used synchronized cameras to create 3D reconstructions of logs on a conveyor belt.
The structures of the models developed in these studies share some similarities. All three studies involved an initial stage of feature extraction or object segmentation. Kruglov and Chiryshev [
13] relied on the background of the images being static while the logs moved through the video frames. They generated a stochastic pixel model of the background using a multivariate normal distribution when logs were not present and used this model to remove the background when logs were present, generating a segmentation mask. Kruglov and Chiryshev [
13] then used a key point detection algorithm to recognize specific points on the logs that were tracked over multiple frames using optical flow. The final stage of the detection was the combination of the two synchronized segmentation masks. This was done by minimizing the sum of Euclidean distances between two synchronized frames. Kruglov and Chiryshev [
13] approximated the individual logs by a set of cylinders, and the volume of the logs was estimated by adding the volume of the individual cylinders. The method was tested in a laboratory, but no quantitative metrics were presented on the performance of the volume measurements. Kalmari et al. [
14] also utilized the Harris detector and optical flow to track a log across video frames. However, instead of tracking logs on a conveyor belt, the logs were tracked as they were passed through a harvester head. Random sample consensus (RANSAC) was used to remove false feature matches to improve the estimate of the log’s motion. The model developed by Kalmari et al. [
14] was tested on seven logs and attained a mean absolute error of 2.9 mm and a mean absolute relative error of
when estimating log length.
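The RANSAC step for discarding false feature matches can be illustrated with a toy translation-only motion model. The actual motion model of Kalmari et al. [14] is not reproduced here; function names, parameter values, and the one-point minimal sample are assumptions for illustration:

```python
import numpy as np

def ransac_translation(src, dst, n_iters=200, inlier_tol=0.5, seed=0):
    """Illustrative RANSAC sketch: estimates a 2D translation between
    matched feature points while rejecting false matches. A single match
    suffices as the minimal sample for a pure translation."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))        # minimal sample: one match
        t = dst[i] - src[i]               # candidate translation
        residuals = np.linalg.norm(dst - (src + t), axis=1)
        inliers = residuals < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers for the final estimate.
    t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t, best_inliers
```

Even with a sizable fraction of corrupted matches, the consensus set recovers the true motion, which is the role RANSAC plays in the log-tracking pipeline described above.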
Similar to Kruglov and Chiryshev [
13], Yang et al. [
15] also attempted to scale logs on a conveyor belt using a stereovision setup, but additionally generated a point cloud representation of the log without the background. In this setup, two cameras were placed at opposite ends of a conveyor belt, capturing images of opposite ends of a log. Two synchronized frames were warped and rotated such that the log was aligned with the x-axis of both images. The common x-axis after rectification is referred to as the epipolar line. Individual partitions of pixels, referred to as "blocks", were then matched based on their pixel characteristics. For the block-matching, they used a window of
pixels and searched along the epipolar line for the position that minimized the sum of absolute differences (SAD). The two cameras were only able to see one side of the log at a time, so to obtain a point cloud model of the entire log, it was rotated in increments of 10 degrees to compute new point cloud coordinates, which were then added to the 3D reconstruction of the log. Their model was tested on three logs of different shapes, and the model output was compared with the output of a laser scanner. The 3D reconstruction estimated by their model was found to coincide well with the output of the laser scanner, but no quantitative performance metrics were given.
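The block-matching step can be sketched as a generic SAD search along a rectified epipolar line. This is not the exact implementation of Yang et al. [15]; window size and search range are illustrative assumptions:

```python
import numpy as np

def sad_block_match(left, right, row, col, half=3, max_disp=20):
    """Illustrative block matching: finds the horizontal disparity that
    minimizes the sum of absolute differences (SAD) between a block in
    the left image and candidate blocks along the same (rectified)
    epipolar line in the right image."""
    block = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:
            break  # candidate window would leave the image
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        cost = np.abs(block.astype(int) - cand.astype(int)).sum()
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```

On a synthetic stereo pair in which the right image is the left image shifted by a known disparity, the search recovers that disparity at the minimum-SAD position.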
In terms of performance and autonomy, Kalmari et al. [
14] provided quantitative performance metrics, reporting a mean absolute error and mean absolute relative error for their model. In contrast, Kruglov and Chiryshev [
13] and Yang et al. [
15] did not provide quantitative results regarding the accuracy of their model estimates. This variation in reporting makes it difficult to compare these studies in terms of model performance. All the models presented in this goal category are intended to run relatively autonomously. The model developed by Kalmari et al. [
14] could even complement one of the other two models such that log scaling could be performed during harvesting and at the sawmill.
3.5. Log Pile Scaling
Since logs are often stored in piles, it is useful to obtain an estimate of the aggregate volume of an entire pile. This was the goal of the studies included in this goal category.
Figure 5 illustrates the two specific goals encountered in this category: the volume estimation of a log pile and the estimation of the distribution of diameters among the logs that make up the log pile.
In terms of input data, all the studies in this goal category utilized images, except for Martí et al. [
24], which used LiDAR to collect point cloud representations of the log end side of a log pile. Herbon et al. [
17], Kruglov et al. [
18], and Correia et al. [
19] used images of log piles taken from different angles to create a 3D reconstruction of the pile, similar to the methods used by Yang et al. [
15]. Galsgaard et al. [
16], Li et al. [
20], Carratù et al. [
21], Zheng et al. [
22], and Carratù et al. [
23] used images captured only from the log end side of a log pile. Zheng et al. [
22] used a binocular camera that gave depth measurements of the objects in the image, which were used to convert the relative pixel measurements to physical measurements. Galsgaard et al. [
16], Li et al. [
20], and Carratù et al. [
21,
23] used cameras that captured 2D images but relied on detecting reference objects in the image of known sizes, which were then used to convert relative pixel measurements to physical measurements.
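The reference-object conversion shared by these 2D-image studies reduces to a single scale factor, as the following sketch illustrates. Function and argument names are illustrative, and it assumes the reference object and the log ends lie at roughly the same distance from the camera:

```python
def pixel_to_physical(pixel_measurement, ref_pixel_size, ref_real_size):
    """Converts a pixel measurement to a physical one using a reference
    object of known size visible in the same image plane."""
    scale = ref_real_size / ref_pixel_size  # physical units per pixel
    return pixel_measurement * scale
```

For example, if a reference object known to be 600 mm wide spans 300 pixels, a log end diameter of 50 pixels corresponds to 100 mm.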
Table 4 shows the different types of models used by the studies in this goal category. There are an equal number of studies that apply deep learning models and classical image processing algorithms and a single study that uses a clustering algorithm. It should be noted that all the studies that used artificial neural networks were published in 2023, and the studies using classical image processing techniques were published between 1991 and 2021. Li et al. [
20] developed an instance segmentation model based on combining an object detection model and semantic segmentation model run in parallel. The outputs from the object detection and semantic segmentation are combined using a "metric learning paradigm" to produce the final log end instances. Li et al. [
20] developed a custom loss function that combines the loss in the detection and segmentation branches such that the instance segmentation model is end-to-end trainable. To obtain the real-world dimensions of the logs, they used the inner diameter of the rear wheel of the loading trucks as a reference object of known size, which yields a scaling factor. The two studies (Carratù et al. [
21,
23]) are written by many of the same authors, and they have almost identical approaches. The goal of both studies was to detect the logs in the pile and measure their diameters. Carratù et al. [
23] used YOLOv4 to detect log ends on the backs of loading trucks, marked them with bounding boxes, and used direct linear transformation (DLT) to convert the pixel measurements to real measurements. The method developed by Carratù et al. [
23] relied on operators manually marking the corners of two yellow triangles in each image to act as a reference object of known size. The DLT algorithm then used these marked points to calibrate the camera and create a "homography plane" of the truck rear end, which represented both the distance to and the orientation of the truck rear end with respect to the camera. The length of the longest side of each log end bounding box was then used to estimate the log end diameters. Carratù et al. [
21] seems to be a continuation of the work performed by Carratù et al. [
23], where they used a newer object detection model, YOLOv5s, and incorporated the detection of the reference object into the deep learning model. Carratù et al. [
21] also replaced the two yellow triangles with a checkered square as the reference object. Zheng et al. [
22] used a customized version of Mask R-CNN to perform instance segmentation of the log ends on loading trucks to estimate the wood volume of the entire vehicle. Individual log cutouts were fitted with ellipses using the least squares method. Since a binocular camera was used, Zheng et al. [
22] could estimate the depth at different pixels. Estimating the diameter of an individual log end mask then becomes a matter of estimating the distance from the camera to the log end, by matching the coordinates of the center of the fitted ellipse to the corresponding coordinate in the depth image, and using that distance to scale the pixel diameter to real measurements. Assuming that the length of the truck was known, the length of the logs, and hence their volume, could be estimated given their distance from the camera.
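The depth-based scaling used by Zheng et al. [22] can be expressed with a pinhole camera model. The following one-line sketch is a simplification with assumed variable names, not the authors' code:

```python
def diameter_from_depth(pixel_diameter, depth, focal_length_px):
    """Pinhole-model scaling sketch: a length of pixel_diameter pixels
    observed at distance `depth` corresponds to a physical length of
    pixel_diameter * depth / focal_length_px."""
    return pixel_diameter * depth / focal_length_px
```

For instance, with an assumed focal length of 1000 px, a 50 px diameter at a depth of 2000 mm corresponds to 100 mm.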
Herbon et al. [
17], Kruglov et al. [
18], Correia et al. [
19], Martí et al. [
24] all used classical image processing models. In the setup in Martí et al. [
24], the log piles being scanned were always placed in a rectangular support frame and were measured from a fixed range of distances. Martí et al. [
24] used depth–threshold filtering to separate the point cloud that corresponds to the log ends from the point cloud that corresponds to the support frame and the rest of the background. To perform the segmentation, Martí et al. [
24] tested two different algorithms: a region-growing segmentation algorithm and a circle fitting algorithm. The region-growing algorithm classified point cloud coordinates into regions based on their values and those of their neighbors, and the circle fitting algorithm was given minimum and maximum diameters and found circles in the image within that diameter range. In Kruglov et al. [
18], a model that accepts one or two images of the same log pile from different angles was validated. It used a combination of segmentation and clustering to detect, segment, and scale the log piles. The fast radial symmetry algorithm was used to detect the log ends in images, and a combination of the Stoer–Wagner algorithm and the watershed method was used to segment the log ends in the images. If multiple images were given as input, both images were segmented, and the minimum Euclidean distance was used to match the log end segmentations from both images and compute the physical measurements of the log ends. The logs in the pile were assumed to have roughly the same length when computing the volume of the full pile. Correia et al. [
19] segmented log piles on the back of loading trucks. Correia et al. [
19] made use of images captured from the side and the end of each log pile. Additionally, they relied on the images always being captured with the same background and at the same distance to filter out the background and convert from relative measurements to physical measurements. They made use of image gradients, spatial-average filtering, and a region-growing algorithm to segment the pile in both images. The difference in brightness between the solid wood pixels and the empty space between the logs was used to estimate the portion of the segmented pile that was solid wood. Herbon et al. [
17] developed a model that utilizes multiple machine learning techniques. They used a quadratic filtering technique to approximate the log ends in images with circles. The K-nearest-neighbors estimator was used to estimate the contour that enveloped the entire pile. A random sample consensus-based plane fitting method and principal component analysis were used to fit a plane to the surface of the log pile with the log ends and orient the pile according to a Cartesian coordinate system. The volume of the individual logs was estimated by multiplying the circular area of each individually segmented log with the average length of the logs in the pile.
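The volume estimate described above, where each segmented log end is approximated by a circle and each log is treated as a cylinder of the pile's average length, can be written directly. This sketch assumes the radii have already been obtained from the segmentation step:

```python
import math

def pile_volume(log_radii_m, average_length_m):
    """Cylinder-assumption volume sketch: each log's volume is its
    segmented circular end area times the average log length, summed
    over the pile."""
    return sum(math.pi * r**2 * average_length_m for r in log_radii_m)
```

For two logs of radii 0.1 m and 0.2 m and an average length of 3 m, this gives 0.15π ≈ 0.47 m³.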
Galsgaard et al. [
16] used the circular Hough transform (CHT) and local circularity measures (LCM) to detect the image region with the highest density of circular shapes. This area gives an approximate detection of the log pile in the image and was regarded as the foreground when the graph-cut segmentation was initialized. Blobs detected in the graph-cut segmentation step with a diameter below a certain threshold were discarded as false detections, while the remaining blobs were subjected to a series of morphological operations to refine the circular shape of the segmented logs. To convert the relative measurements in the segmentation mask to physical measurements, Galsgaard et al. [
16] made use of short blue rods placed on the log ends as reference objects of a known scale. It was also assumed that the average length of the logs that made up the pile was known, such that the volume of each log could be calculated as the segmented log end area multiplied by the average log length.
Table 5 shows the performances recorded by the different studies in the goal category. Martí et al. [
24] did not specify any quantitative metrics related to the scaling of the log pile, whereas Correia et al. [
19] and Carratù et al. [
23] specified the performance of their methods with slight caveats. Correia et al. [
19] reported that the volume estimates of their model differed less than
from the manual measurement estimates for
of the loads it was tested on. Carratù et al. [
23] reported that the errors in the diameter estimates of their model followed a normal distribution centered at zero,
of the measurements had errors in the domain
, and
of the measurements had errors in the domain
. These reported error measurements give an idea of the performance of their models, but they are not easy to directly compare with the other models in this category. Among the studies that have the specific goal of estimating log pile volume, the clustering model developed in Galsgaard et al. [
16] has the worst performance. The three models that used classical image processing techniques, Herbon et al. [
17], Kruglov et al. [
18], and Correia et al. [
19], seem to have approximately equal performances, depending on which of the average deviations reported by Herbon et al. [
17] one takes into account. The deep learning model developed by Zheng et al. [
22] has the highest performance at estimating the wood volume of log piles, with an average relative error of
. For the studies with the specific goal of estimating the diameter distribution of the logs, we see that the model developed by Li et al. [
20] has a higher performance than the model developed by Carratù et al. [
21].
All the models presented in the studies in this goal category are intended to run fully autonomously, except the object detection model by Carratù et al. [
23], which required operators to manually select the reference objects in the images for each measurement. However, this seems to have been improved in the second iteration of the model presented by Carratù et al. [
21], where detection of the reference object is included in the deep learning model.
3.6. Log Segmentation
Segmentation of logs in images is an important step in the process of log scaling or log grading using computer vision. However, additional information is required to convert the relative sizes detected in images to absolute sizes. The motivations for developing segmentation models are the same as those for developing automatic log scaling and log grading models; namely, to reduce the labor costs and inaccuracies associated with manual log scaling. Many studies have segmentation as a subroutine of their method, but the studies included in this goal category all have segmentation of logs as their final goal.
In contrast to some of the other goal categories, the goals of all the studies included in log segmentation were quite similar. All of them used images as inputs and had the aim of producing a segmentation mask separating the logs from the backgrounds as output. The slight variations in the goals are illustrated in
Figure 6. The studies that performed semantic segmentation of log ends produced a binary bitmap in which individual pixels are regarded as “log end” or “not log end”. The studies that performed instance segmentation of the log ends produced bitmaps in which the log ends were separated from the background and the individual instances of log ends were separated as well, such that they could be analyzed separately. The study that performed instance segmentation of entire logs yielded bitmaps in which the pixels of the entire logs were separated from the background and the individual log instances were kept separate. The majority of the studies in this goal category used images of log piles taken from the log end side as input, with three notable exceptions: Fortin et al. [
32] worked with images of entire logs taken from random angles, as their goal was instance segmentation of entire logs, and Schraml and Uhl [
31] and Decelle and Jalilian [
33] used small images of individual log ends as the input.
Table 6 shows the different types of models encountered in this goal category. The models used by the studies in this goal category mainly fall within two categories, namely clustering algorithms or artificial neural networks. The only exception is Chiryshev et al. [
25], which used histograms of oriented gradients (HOG) for feature extraction and the Random Forest algorithm to classify individual pixels as “log face” or “not log face”. Graph-cut segmentation is the most-used clustering algorithm among the studies within the log segmentation goal category, although it is implemented in different ways. Gutzeit and Voskamp [
26] first used Haar cascades to perform a form of object detection of the log ends in the images, which were represented as circles. Graph-cut segmentation was then used to partition the image into foreground, background, and unknown pixels. The graph-cut segmentation was initialized with the circular areas detected by the Haar cascades as the foreground, and the bitmap was refined by capturing pixels that were not included in the original circular object detection. Finally, the object detection results and semantic segmentation results were combined to yield an instance segmentation of the log ends. Gutzeit et al. [
27,
30] are studies written by the same authors and have a very similar approach. The algorithms developed by Gutzeit et al. [
30] rely on the assumption that the log piles are always located in the center of the image. They initiated the graph-cut segmentation with portions of the bottom and top of the images as the background and a portion of the image center as the foreground. The yellow component of the image in RGB format and the value component of the image in hue, saturation, and value format (HSV) were extracted from the image and were used to adjust the initial weights of the graph-cut segmentation algorithm when it was applied to log end segmentation. Herbon et al. [
28], Schraml and Uhl [
31] are two studies that perform semantic segmentation of log ends in images using a clustering algorithm other than graph-cut segmentation. Herbon et al. [
28] have designed an iterative model consisting of an initial stage of pixel classification using local binary patterns (LBP) in combination with HOG. The initial classifiers output a binary bitmap, which is passed on to the iterative pipeline of Gaussian mixture model (GMM) clustering, thresholding, watershed transform, distance transform, and another set of LBP and HOG classifications. The iterative pipeline is repeated until the set of detected objects does not change. Schraml and Uhl [
31] worked with images of individual log ends and aimed to segment only the pixels corresponding to the log end. Schraml and Uhl [
31] developed a three-stage region-growing algorithm that used two fast computable texture features to describe pixel blocks in images and the earth movers distance to measure the distance between neighboring pixel blocks. The three stages consisted of cluster initialization, where a fixed number of clusters are initialized that are equidistant from the image center; the growing procedure, where clusters are grown using fixed thresholds for the distance to neighboring pixels; and cluster merging and trimming, where the clusters are merged to form one continuous bitmap for the log end and erroneous pixels are removed using ellipse fitting.
Five studies applied artificial neural networks in their segmentation models. Samdangdech and Phiphobmongkol [29] created a three-stage algorithm that detects the log pile using the Single Shot MultiBox Detector (SSD) object detection network, segments the log ends within the detected pile using a fully convolutional VGG-16 network, and finally separates the individual log ends using the connected-component-labeling function of the OpenCV library. Fortin et al. [32] compared the Mask R-CNN, Rotated Mask R-CNN, and Mask2Former neural networks on the task of instance segmentation of whole logs in images. The model that attained the highest mean average precision (mAP) was the vision-transformer-based network Mask2Former. Decelle and Jalilian [33] also compared a set of networks, but on the task of semantic segmentation of log ends in images. They compared U-Net, Mask R-CNN, RefineNet, and SegNet; RefineNet attained the highest Dice score. Praschl et al. [34] used a two-stage algorithm for instance segmentation of log ends in images: the YOLOv4 object detection network detects the individual log ends and image cut-outs are extracted from the computed bounding boxes, after which each cut-out is processed individually by a U-Net segmentation network that labels the pixels corresponding to the log end. Zheng et al. [35] used a modified version of the YOLACT network to perform instance segmentation of the log ends in images. YOLACT is itself a modification of the object detection network YOLO, adapted for instance segmentation.
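The detect-then-segment structure of the two-stage approach can be sketched in a few lines. The sketch below is only an illustration of the crop-and-paste-back logic: the detector output is replaced by hand-written bounding boxes, and the per-cutout segmentation network (a U-Net in Praschl et al. [34]) is replaced by a placeholder threshold:

```python
import numpy as np

def segment_cutout(cutout):
    """Placeholder for the per-cutout segmentation network:
    here, a simple intensity threshold."""
    return (cutout > 0.5).astype(np.uint8)

def two_stage_instance_segmentation(image, boxes):
    """Detect-then-segment: run the mask model on each detected
    bounding box and paste each result back as its own instance."""
    instances = np.zeros(image.shape, dtype=np.int32)
    for instance_id, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        cutout = image[y0:y1, x0:x1]
        mask = segment_cutout(cutout)
        region = instances[y0:y1, x0:x1]   # view into the output
        region[mask == 1] = instance_id
    return instances

# Synthetic image with two bright "log ends"; the boxes stand in
# for the detector's output.
image = np.zeros((10, 10))
image[1:4, 1:4] = 0.9
image[6:9, 5:9] = 0.8
boxes = [(0, 0, 5, 5), (4, 5, 10, 10)]

instances = two_stage_instance_segmentation(image, boxes)
print(np.unique(instances))  # 0 = background, 1 and 2 = log ends
```

Because each cut-out is segmented separately, the instance identities come for free from the detection stage, even when log ends touch in the full image.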
Table 7 shows the performance metrics of the best model in each study. Even though the goals of the studies in this category are close to identical, the evaluation metrics used are vastly different. Some studies only reported performance metrics for the log detection part of their model, some only reported the performance of the pixel-wise segmentation stages, and some reported both. Gutzeit et al. [30] and Schraml and Uhl [31] measured performance in terms of pixel-wise false positives and false negatives and pixel-wise absolute error, respectively. These metrics are somewhat comparable, since the false positives and false negatives make up the absolute error. The reported metrics indicate that the region-growing algorithm of Schraml and Uhl [31] performed better than the graph-cut segmentation method of Gutzeit et al. [30]. However, their respective goals differ: Schraml and Uhl [31] aimed to segment single log ends in images of individual log ends, whereas Gutzeit et al. [30] aimed to segment all the log ends in images of log piles, which is the more complex task. Gutzeit and Voskamp [26], Gutzeit et al. [27], Samdangdech and Phiphobmongkol [29], Decelle and Jalilian [33], and Praschl et al. [34] all reported the pixel-wise segmentation performance in terms of F1-score, which makes them directly comparable. Gutzeit and Voskamp [26] and Gutzeit et al. [27] used graph-cut segmentation-based models and attained F1-scores of
and
, respectively, for pixel-wise segmentation. The deep learning-based methods presented by Samdangdech and Phiphobmongkol [29], Decelle and Jalilian [33], and Praschl et al. [34] attained F1-scores of
,
, and
, respectively, for pixel-wise segmentation, which is
to
higher than the clustering-based methods. The RefineNet model of Decelle and Jalilian [33] performed approximately as well as the SSD model of Samdangdech and Phiphobmongkol [29] in terms of F1-score. The RefineNet and SSD networks outperformed the YOLO model of Praschl et al. [34] in terms of segmentation F1-score. The remaining studies all evaluated their models in terms of the log detection rate or log detection error rate. Chiryshev et al. [25] and Herbon et al. [28] both evaluated their models with recall and false positive rate. They reported equal recall scores, but Chiryshev et al. [25] reported a false positive rate three times higher than that of Herbon et al. [28]. Chiryshev et al. [25] also reported an F1-score for their log detection rate, which was nearly identical to the log detection F1-score reported by Samdangdech and Phiphobmongkol [29] of
. Fortin et al. [32] and Zheng et al. [35] are the only two studies that reported performance in terms of mean average precision at an intersection over union (IoU) threshold of 50 (
). Although it may seem as though the model of Zheng et al. [35] outperforms the model of Fortin et al. [32] in terms of
, it should be noted that Zheng et al. [35] aimed to segment log ends on loading trucks in images with fairly homogeneous backgrounds, whereas Fortin et al. [32] aimed to segment whole logs in images with heterogeneous backgrounds. Hence, the task of Fortin et al. [32] was more complex, and the performance of the two models cannot be directly compared.
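The two families of metrics compared above can be computed directly from binary masks. The sketch below uses invented masks and shows both the pixel-wise F1-score and the IoU that underlies mAP at an IoU threshold of 0.5:

```python
import numpy as np

def pixel_f1(pred, target):
    """Pixel-wise F1-score between two binary masks."""
    tp = np.logical_and(pred == 1, target == 1).sum()
    fp = np.logical_and(pred == 1, target == 0).sum()
    fn = np.logical_and(pred == 0, target == 1).sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def iou(pred, target):
    """Intersection over union; under an IoU-0.5 threshold, a
    detection counts as correct only if this value is >= 0.5."""
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return intersection / union

target = np.zeros((8, 8), dtype=np.uint8)
target[2:6, 2:6] = 1           # ground-truth log end: 16 pixels
pred = np.zeros((8, 8), dtype=np.uint8)
pred[3:7, 2:6] = 1             # prediction shifted one row down

print(pixel_f1(pred, target))  # 0.75: tp=12, fp=4, fn=4
print(iou(pred, target))       # 0.6: 12 / 20
```

The example also shows why the two metric families are hard to compare across studies: the same shifted prediction passes an IoU-0.5 detection check while its pixel-wise F1 already reflects the misalignment.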
Concerning autonomy, all the studies developed models that are intended to work without direct operator intervention. That being said, the type of input data a model uses indicates what level of indirect intervention it requires. The model developed by Fortin et al. [32] was the only one intended to segment entire logs independent of orientation and background conditions; the remaining models require the logs to be stacked in a pile with the log ends facing the camera. This makes the model of Fortin et al. [32] the most autonomous. Among the log end segmentation studies, there are different levels of autonomy. The studies that used clustering methods require the log pile to be centered in the image and the background to be homogeneous above and below the pile. The studies that used deep neural networks do not state such requirements explicitly, so if the data used to train the networks included a variety of background conditions, these requirements could be mitigated. The least-autonomous model in this category is the one developed by Herbon et al. [28], as it uses small rectangular images of singular log ends as input. This means that the step of extracting cut-out images of singular log ends must be performed either manually or by another model.