Article

Orthophoto-Based Vegetation Patch Analyses—A New Approach to Assess Segmentation Quality

Witold Maćków, Malwina Bondarewicz, Andrzej Łysko and Paweł Terefenko
1 Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, 71-210 Szczecin, Poland
2 Institute of Marine and Environmental Sciences, University of Szczecin, 70-383 Szczecin, Poland
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(17), 3344; https://doi.org/10.3390/rs16173344
Submission received: 22 June 2024 / Revised: 9 August 2024 / Accepted: 30 August 2024 / Published: 9 September 2024

Abstract

The following paper focuses on evaluating the quality of image prediction in the context of searching for plants of a single species, using the example of Heracleum sosnowskyi Manden, in a given area. This process involves a simplified classification that ends with a segmentation step. Because of the particular characteristics of environmental data, such as large areas of plant occurrence, significant partitioning of the population, or characteristics of a single individual, the use of standard statistical measures such as Accuracy, the Jaccard Index, or Dice Coefficient does not produce reliable results, as shown later in this study. This issue demonstrates the need for a new method of assessing prediction quality, adapted to the unique characteristics of vegetation patch detection. The main aim of this study is to provide such a metric and demonstrate its usefulness in the cases discussed. Our proposed metric introduces two new coefficients, M+ and M−, which, respectively, reward true positive regions and penalise false positive regions, thus providing a more nuanced assessment of segmentation quality. The effectiveness of this metric has been demonstrated in different scenarios focusing on variations in spatial distribution and fragmentation of theoretical vegetation patches, comparing the proposed new method with traditional metrics. The results indicate that our metric offers a more flexible and accurate assessment of segmentation quality, especially in cases involving complex environmental data. This study aims to demonstrate the usefulness and applicability of the metric in real-world vegetation patch detection tasks.

1. Introduction

Semantic segmentation, i.e., the separation of semantically coherent areas (segments) in an image, is used in many fields such as, for example, machine vision in the broad sense [1,2,3], medical diagnostics [4,5] or remote sensing [6,7]. The popularity of this technique is related, among other things, to the development of increasingly efficient and effective deep learning methods, which allow segmentation methods to be easily adapted to specific classes of problems.
In the case of remote sensing, segmentation is often the first step in the image analyses performed, concerning tasks as diverse as land use classification [8], object detection [9,10], change monitoring [11], discovery and inventory of historical heritage [12], or support for precision agriculture [13]. In the following study, we focus on the application of deep learning and semantic segmentation to the inventory and detection of stands of a selected plant species. Depending on the scale of the analysed images, this problem can come down to the detection of individual plants [14], their local clusters [15], or their global range [16]. The choice of scale will depend on the objective we want to achieve. In our work, we use images taken with UAVs—in this particular case, the resolution of the images obtained allows the identification of local clusters but does not allow the separation of a single plant. Such a scale allows a simple inventory to be made. The main advantage of such a solution is that the inventory can be performed much faster than from the ground with human intervention and that it is possible to reach places that are difficult or not at all accessible from the ground [17,18].
Selecting an appropriate deep learning algorithm and training a model for a specific task, e.g., searching for a specific plant species on an orthophoto, usually requires many experiments. By modifying the structure and parameters of the learning network or the characteristics of the input data (e.g., number and type of image channels, ground sample distance, etc.), we look for a model that will allow the most accurate prediction of the area of plant occurrence. To obtain such a model, we need to compare the quality of the data obtained, eliminating those with insufficient accuracy and continuing to experiment with solutions with higher accuracy, which leads to the problem of assessing the quality of the segmentation performed. This problem is not exclusive to remote sensing data but is generally related to segmentation itself. However, solutions are often specific to a particular type of data.
Many studies attempt to structure segmentation evaluation methods. A hierarchical classification of evaluation methods was proposed by Zhang [19], among others, and repeated with minor changes by Chen [20]: firstly, methods were divided into subjective, i.e., based on human evaluation, and objective methods. Within the objective methods, indirect methods, among others, were distinguished, which do not directly evaluate the algorithm itself, but focus on evaluating the results. These indirect methods can be further divided into two categories: supervised methods, where we use a ground truth mask for evaluation, and unsupervised methods. Unsupervised methods are systematically developed [21], but by far the more common use is of supervised methods. They dominate numerous overviews of general evaluation methods [22,23] and appear among methods dedicated to specific applications, such as medical [24] or remote sensing [20,25,26]. Wang et al. [22] detail the above division, distinguishing pixel-based, region-based, and distance-based methods among supervised methods. In real-world applications, pixel-based supervised evaluation methods dominate. This trend is mainly related to the simplicity and speed of these methods. In the case of remote sensing image segmentation, the most commonly used basic measures of this type are Accuracy, Recall, Precision, and, interchangeably, the Jaccard Index or Dice Coefficient [6,27]. Among the methods based on Accuracy, there are Cohen’s Kappa and Kendall’s Tau coefficients [28,29]. All of these metrics, in the form of easily comparable numerical values, determine how accurately the prediction resulting from the use of a given model corresponds to the pattern (ground truth mask). This is no different in the special case of remote sensing, where the objects being segmented are plants [30,31,32,33,34,35].
The purpose of this article is to propose a new method for evaluating the quality of segmentation that does not focus on exact pixel-to-pixel matching, but approaches detection more flexibly, owing to the specifics of, for example, vegetation. From the point of view of field analysis, it is of greatest importance to identify specific plant sites (regions) in a given area. Even an “incomplete” indication of a single region will be useful, as it allows a drone or a person to be sent to such a place again for more accurate field verification. By an incomplete indication, we mean one where the prediction hits only a certain part (e.g., 10%) of a coherent region. Therefore, it would be ideal to reward even small-area true positive hits highly, with the additional reward diminishing as the hit area increases. The analogous effect for false positive hits would be to penalise only large (in area) non-hits. As a result, we would like to obtain an amplification of the coefficient with an increase in the number of positively localised regions and, at the same time, reduce the impact of region-matching accuracy at the individual pixel level.
Our proposed metric introduces two new coefficients implementing the above assumptions, M+ and M−, which, respectively, reward true positive regions and penalize false positive regions, providing a more nuanced assessment of segmentation quality. The effectiveness of this metric is demonstrated further with detailed scenarios focusing on changes in the spatial distribution and fragmentation of theoretical vegetation patches, comparing the proposed new method with traditional metrics. The results indicate that our metric offers a more flexible and accurate assessment of segmentation quality, especially in cases involving complex environmental data with a particular focus on vegetation analysis. In this paper, we initially used the example of the analysis of the invasive species Heracleum sosnowskyi (Manden.).

2. Materials and Methods

In the following study, we used the results obtained from our work to detect stands of Heracleum sosnowskyi (Manden.) in drone flight images. Invasive alien species (IAS) are animals and plants that are accidentally or intentionally introduced outside their natural habitat [15,36,37]. Their presence often has serious consequences for native ecosystems, including plant and animal species, leading to a loss of biodiversity and negative economic consequences. One of the most significant threats is the rapidly spreading population of invasive hogweed species (Heracleum spp.). Currently, there are four species of the genus Heracleum in Poland: H. mantegazzianum, H. sosnowskyi, H. sphondylium, and H. sphondylium subsp. sibiricum. Two of them, H. mantegazzianum Sommier and Levier, 1895, and H. sosnowskyi Manden, 1944, are of Caucasian origin and are classified as invasive alien species according to the EP Regulation 2014. Originally brought to Europe, primarily to Central European nations for decorative and practical purposes (as animal feed), they have spread quickly and adversely impacted biodiversity by changing the features of the environment. Furthermore, the sap of these plants, which contains phytotoxic substances, poses a risk to both animals and humans [38,39]. Currently, their control requires a comprehensive approach for inventorying their occurrence and assessing their potential for control based on the size of the communities they form. Surveys were conducted at the inventory sites of the plant in areas where it has been noted to occur in large numbers. The possibility of an aerial inventory is important because Heracleum sosnowskyi Manden often occurs in wetlands and areas that are difficult to access. The identification of accurate sites using machine learning prediction will facilitate further work, including the following:
  • Conducting additional drone flights with different parameters, for example, from a lower altitude (to obtain higher-resolution images) or using a different camera (e.g., multispectral instead of RGB);
  • Monitoring changes in the extent of the plant over time (e.g., in connection with an attempt to eradicate it);
  • Sending people to these locations to verify the presence of the plant on the ground and possibly undertake its removal.
A DJI Matrice 210 RTK v2 drone fitted with a MicaSense RedEdge-MX multispectral camera was used to capture the pictures. The drone’s position during flight was determined by GPS and RTK services. Additionally, GPS and RTK services were used to locate ground control points (GCPs). The images did not include the RTK position but only the position from the GPS module built into the camera. Data from a dedicated MicaSense DLS 2 incident light sensor were used to normalize the results obtained from the camera. Version 1.6.2 of Agisoft Metashape software was used to assemble the orthophotos.
In the baseline scenario represented in this study, prediction results were compared for:
  • Three-channel orthophoto (red, green, blue)
  • Five-channel orthophoto (red, green, blue, red edge, near-infrared).
The images for the orthophotomap were acquired from an area of 63.3 ha, located near the small lake Jan (Poland, West Pomeranian Voivodeship, 53°5′24″N, 15°19′24″E). The flying range included lakes, meadows, arable fields, scrubs, and forests. The locations of the study sites are shown in Figure 1.

2.1. Standard Segmentation Quality Metrics Used

We will present the basic measures of supervised pixel-based evaluation using the standard notation [40]. Let T denote the ground truth mask, P denote the prediction mask, and |T| and |P| be their respective areas. We use the standard terms TP (true positive), FP (false positive), TN (true negative), and FN (false negative), for which the following relationships hold (Figure 2):
$$T = TP \cup FN,$$
$$P = TP \cup FP.$$
The primary metrics used to assess classification quality include Accuracy (or, more precisely, Overall Accuracy), Recall (Sensitivity), and Precision [40,41].
Accuracy allows a quick calculation of the percentage of correctly classified pixels.
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
Recall provides information on the accuracy of positive hits while ignoring incorrect hits.
$$Recall = \frac{TP}{TP + FN}$$
Precision mainly considers incorrect hits.
$$Precision = \frac{TP}{TP + FP}$$
This way, we obtain information based only on the number of pixels, ignoring the existence and distribution of the connected components of the ground truth mask. In our case, these metrics are only helpful for an initial evaluation of the quality of the classification. Hereafter, we refer to connected components (i.e., the smallest indivisible areas of plant sites) as regions.
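As a point of reference, these pixel-based measures can be computed directly from two binary masks. The short sketch below is our illustration rather than the authors’ code; the helper names and the NumPy-array representation of the masks are assumptions.

```python
import numpy as np

def pixel_counts(truth: np.ndarray, pred: np.ndarray):
    """Count TP, FP, TN and FN pixels for two binary masks of identical shape."""
    truth = truth.astype(bool)
    pred = pred.astype(bool)
    tp = np.count_nonzero(truth & pred)
    fp = np.count_nonzero(~truth & pred)
    tn = np.count_nonzero(~truth & ~pred)
    fn = np.count_nonzero(truth & ~pred)
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fp, tn, fn):
    return tp / (tp + fn)

def precision(tp, fp, tn, fn):
    return tp / (tp + fp)
```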
To evaluate the quality of the prediction more precisely, we used three metrics commonly used to evaluate the quality of image segmentation (e.g., in aerial or medical image analysis tasks [28,42,43,44,45]). Let us recall these definitions:
  1. IoU (intersection over union); other terms used in the literature include the Jaccard index and the Jaccard similarity coefficient.
$$IoU(T, P) = \frac{|T \cap P|}{|T \cup P|}$$
The values of the coefficient are in the range [0, 1], with 0 indicating no fit and 1 indicating full fit.
  2. Dice’s coefficient; other names: Sørensen–Dice coefficient, Dice similarity coefficient (DSC), and F1 score.
$$Dice(T, P) = \frac{2\,|T \cap P|}{|T| + |P|}$$
The values of the coefficient are in the range [0, 1], with 0 indicating no fit and 1 indicating full fit.
Dice’s coefficient can be calculated using the Recall and Precision metrics mentioned previously.
$$Dice = \frac{2 \cdot Recall \cdot Precision}{Recall + Precision}$$
  3. Tau coefficient; other names: Kendall rank correlation coefficient and Kendall’s Tau coefficient. In this study, a version of the Tau-b coefficient was used. The Tau coefficient [28] was proposed as a simpler and more efficient alternative to Cohen’s Kappa coefficient [44], which is used as another measure of accuracy. Let C denote the number of concordant pairs, D the number of discordant pairs, T_t the number of ties in T, and T_p the number of ties in P. If a tie occurred for the same pair in both T and P, it was not added to either C or D. The Tau-b coefficient is given by the following equation:
$$\tau_B(T, P) = \frac{C - D}{\sqrt{(C + D + T_t)(C + D + T_p)}}$$
Although the values of the Tau coefficient are in the range [−1, 1], to make them comparable with the other coefficients, we decided to normalise their values and, as a result, they are in the range [0, 1] (i.e., the same as IoU and Dice coefficients). The formula for the normalized Tau-b coefficient is given below.
$$\tau_{B\,norm}(T, P) = \frac{\tau_B(T, P) + 1}{2}$$
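For binary masks, all three coefficients can also be derived from the pixel counts TP, FP, TN, and FN introduced above (e.g., from the pixel_counts helper sketched earlier). The code below follows our reading of the definitions; in the binary case, the concordant, discordant, and tied pair counts follow directly from the confusion-matrix cells. As a consistency check, for the synthetic test images used later (28,000 px, 25% TP, no FP) it yields a normalised Tau-b of about 0.743, which agrees with Table 4.

```python
import numpy as np

def iou(tp, fp, fn):
    """Jaccard index (IoU) from pixel counts."""
    return tp / (tp + fp + fn)

def dice(tp, fp, fn):
    """Sørensen–Dice coefficient from pixel counts."""
    return 2 * tp / (2 * tp + fp + fn)

def tau_b_norm(tp, fp, tn, fn):
    """Normalised Kendall Tau-b for two binary masks, from pixel counts."""
    tp, fp, tn, fn = map(float, (tp, fp, tn, fn))  # avoid integer overflow on large images
    c = tp * tn                # concordant pixel pairs
    d = fp * fn                # discordant pixel pairs
    t_t = tp * fn + fp * tn    # pairs tied in T only
    t_p = tp * fp + fn * tn    # pairs tied in P only
    tau_b = (c - d) / np.sqrt((c + d + t_t) * (c + d + t_p))
    return (tau_b + 1) / 2     # rescale from [-1, 1] to [0, 1]
```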

2.2. Research Preparation and Problem Identification

Two different segmentation models based on deep learning were used in this study:
  • UNet fed with 3-channel images (RGB),
  • UNet powered by 5-channel data (RGB, red edge, and near-infrared).
Since the main goal of this ongoing study was to determine the range of a single species (Heracleum sosnowskyi Manden.), the task was limited to binary segmentation (each pixel of the image either belongs to the plant class or not). The trained models were evaluated for prediction quality using a 1-bit ground truth mask. The model structure, its parameters, and the learning process were not relevant to this study. The most important task is to compare the quality of the obtained predictions—for example, after changing the learning parameters—to determine which element of the analysis improves or worsens the segmentation results.
An example of an RGB orthophoto is shown in Figure 3A. An inventory was created for the same area, and a ground truth mask of the occurrence of the selected plant species in the study area was created (Figure 3B).
Figure 3C,D displays the result of overlaying the prediction and ground truth masks for the two models mentioned previously. The colours in Figure 3C,D are interpreted as follows:
  • Blue is a true positive (TP; an object exists in the mask and has been indicated in the prediction).
  • Black is a false positive (FP; no object exists in the mask, but one has been indicated in the prediction).
  • Red is a false negative (FN; an object exists in the mask but has not been indicated in the prediction).
  • White is a true negative (TN; no object exists in the mask and none has been indicated in the prediction).
The example shown in Figure 3 illustrates a few of the disadvantages of using the Tau, IoU, and Dice metrics to compare prediction quality. Even at first glance, one can see a significant improvement in the prediction from Figure 3D (infrared model) compared with the prediction from Figure 3C. False indications on the lake surface observed in Figure 3C have been eliminated. The standard values of the metrics given in Table 1 do not, however, reflect this. The changes in the metrics’ values were minimal, despite the clear modifications to the prediction structure. This is because of the linear dependence of the coefficient values on the total area of the correctly hit pixels. As a result, standard metrics are primarily focused on the number of pixels detected but are insensitive to their distribution.
A closer analysis of both predictions revealed that, in the case of the Figure 3D prediction, the positive impact of eliminating obvious false hits was, to a certain extent, offset by the lower accuracy of positive hits; the issue is not so much fewer hit regions as less accurate coverage of those regions.
Identifying the precise locations of plants (or regions) in a given area is crucial for the research that will be carried out. Even an incomplete indication of a single region will be useful for sending a drone or a ground survey worker to that location. By an incomplete indication, we mean that only a part (e.g., 10%) of a region is hit. Therefore, the optimal solution would be to reward even small TP area hits highly, with the additional reward diminishing as the hit area increases. The analogous effect for FP hits is to penalize only large (in area) misses. As a result, we would like to obtain a coefficient that is amplified by the number of positively localized regions while reducing the impact of region-matching accuracy at the individual pixel level.
We conducted a comprehensive analysis of the data collected to better investigate this issue, making sure to verify the quantity and size of each individual region identified in the ground truth mask and the predictions.
Figure 3B shows 611 regions with a total area of 5,665,854 pixels (the entire image has 372,799,248 pixels). Table 2 lists the regional parameters for both predictions. The number of regions and their total areas were included, considering both the masks themselves (ground truth and predictions) and the results in the form of TP and FP images.
The number of TP region hits was slightly lower in the 3D prediction than in the 3C prediction (292 vs. 301). In contrast, the number and area of the indicated FP regions were significantly lower.
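Region counts and areas of the kind reported in Table 2 can be obtained with a standard connected-component labelling step. The sketch below is our illustration (not the authors’ code); it uses scipy.ndimage with its default 4-connectivity, and the toy masks merely stand in for the orthophoto-sized data.

```python
import numpy as np
from scipy import ndimage

def region_stats(mask: np.ndarray):
    """Return the number of connected regions and the total area (px) of a binary mask."""
    _, n_regions = ndimage.label(mask)   # 4-connectivity by default
    return n_regions, int(np.count_nonzero(mask))

# Toy masks standing in for the ground truth (Figure 3B) and a prediction.
truth = np.zeros((50, 50), dtype=bool)
truth[5:15, 5:15] = True
truth[30:40, 30:40] = True
pred = np.zeros_like(truth)
pred[5:10, 5:10] = True        # partial hit of the first region
pred[20:25, 20:25] = True      # false positive region

tp_mask = truth & pred         # true positive pixels
fp_mask = ~truth & pred        # false positive pixels
for name, m in [("mask", truth), ("prediction", pred), ("TP", tp_mask), ("FP", fp_mask)]:
    n, area = region_stats(m)
    print(f"{name}: {n} regions, total area {area} px")
```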

2.3. Proposed New Metric

For the aforementioned Tau, IoU, and Dice metrics, the values of the calculated coefficients depend linearly on the TP area. Thus, there is no need to treat each region independently (although this is possible), and we can immediately operate on the entire image. We aim to change this dependence into a nonlinear one: we promote even a small hit in a region, and as the TP area in a single region increases, the increase in the coefficient value diminishes. To achieve this, we analysed each region separately. To achieve the same results for images with varying resolutions (and the same percentage coverage), a relatively complex normalization had to be added to the logarithmic function used for the initial tests. A better solution is to use a root function (for the TP) and a power function (for the FP).
Because we are interested in detecting a given object, both the ground truth and prediction masks are binary matrices of the same dimension. The proposed metric M comprises the following coefficients:
  • M+ — the positive region coefficient, calculated for the TP regions and taking values from 0 to 1;
  • M− — the negative region coefficient, calculated for the FP regions and taking values from 0 to 1;
  • Δ = M+ − M−, taking values from −1 to 1.
To better understand how the formulas work, consider an example illustration in which a ground truth mask and a prediction mask have been layered on top of each other. Let us denote the ground truth mask as T and the prediction mask as P . Regions T1, T2, T3, T4, and T5 were separated in the ground truth mask and regions P1, P2, P3, P4, P5, P6, P7, and P8 were separated in the prediction mask as on the illustration below (Figure 4).
We identified the regions with their matrix representations and denoted them in the same manner. First, for each region of the ground truth mask, we calculated its positive coefficient m+, which was then used to calculate the positive region coefficient M+. To calculate m+, from the initial matrix T of the ground truth mask, we selected the smallest submatrix containing the first region T_R, i.e., the blue region denoted as T1. This submatrix is a matrix of dimension k × l with nonzero rows and columns, represented by the rectangle surrounding the T1 region, as shown in Figure 5.
On the other hand, when we select the same submatrix from the prediction mask matrix P, there is a possibility that it contains part of the prediction mask, which we call P_R. Here, we can see that parts of the orange regions P1 and P3 and the whole region P2 lie within this submatrix. Next, we calculated the intersection area I_R of the region T_R of the ground truth mask with the area P_R of the prediction mask. Since all matrices consist entirely of zeros and ones, the ij-th pixel will belong to the intersection if and only if the product of the values of the ij-th pixel of the ground truth mask and the ij-th pixel of the prediction mask is equal to 1. Hence, in order to find the intersection area I_R, it is sufficient to multiply the matrices T_R and P_R element-wise and sum all the ones of the resulting matrix; thus,
$$I_R = \sum_{i=1}^{k} \sum_{j=1}^{l} T_{ij} P_{ij},$$
where T_ij ∈ {0, 1} denotes the value of the ij-th pixel of T_R and P_ij ∈ {0, 1} denotes the value of the ij-th pixel of P_R. To obtain the positive coefficient m+ for a given region, we use the following formula:
$$m^{+} = \left( \frac{I_R}{|T_R|} \right)^{1/\alpha} |T_R|,$$
where |T_R| denotes the area of the region of the ground truth mask that we are investigating, that is,
$$|T_R| = \sum_{i=1}^{k} \sum_{j=1}^{l} T_{ij},$$
and α is the exponent that determines the final shape of the function and is chosen experimentally, based on the analysed data. The main role of the α parameter is to show the significance of incomplete area hits in the evaluation of an image segmentation method. The higher the value of the α parameter, the more we reward incomplete area hits.
This way, we go through all regions in the ground truth mask and obtain the positive region coefficient, defined by the following formula:
$$M^{+} = \frac{\sum_{T_R \subseteq T} m^{+}(T_R)}{\sum_{T_R \subseteq T} |T_R|}.$$
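As a numerical illustration (values consistent with case I-E in Section 3): for a single 100-pixel ground truth region of which 25 pixels are hit, and α = 5,

$$m^{+} = \left(\frac{25}{100}\right)^{1/5} \cdot 100 \approx 0.758 \cdot 100 = 75.8,$$

so a 25% hit contributes roughly 75.8% of the region’s area to the numerator of M+, rather than the 25% it would contribute for α = 1.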
Negative region coefficients were calculated analogously. From the prediction matrix, we select the submatrix of dimension m × n as the smallest submatrix containing a region P_R of the prediction mask P. Similarly, as before, after separating the same submatrix in the ground truth mask T, we note that parts of the ground truth mask may appear in it—let us denote them by T_R. Then, we calculated the intersection I_R of the region P_R with the region T_R as before. The negative coefficient m− for a single region P_R is calculated using the following formula:
$$m^{-} = \left( \frac{|P_R| - I_R}{|P_R|} \right)^{\beta} |P_R|,$$
where |P_R| denotes the area of the region of the prediction mask we are investigating, that is,
$$|P_R| = \sum_{i=1}^{m} \sum_{j=1}^{n} P_{ij},$$
and β is the exponent that determines the final shape of the function. The main function of the β parameter is to show the significance of areas of false positives. It is again selected through experimentation and chosen for the analysed data, taking into account the particular characteristics of the data. The higher the value of the β parameter, the weaker the penalty for a false positive area.
As before, we go through all the regions of the prediction mask and obtain the formula for the negative region coefficient:
$$M^{-} = \frac{\sum_{P_R \subseteq P} m^{-}(P_R)}{\sum_{P_R \subseteq P} |P_R|}.$$
The entire process of going through the regions to calculate the M+ and M− coefficients is shown in Figure 6 below.
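A minimal implementation sketch of M+, M−, and Δ is given below (our illustration, not the authors’ code). It assumes that T_R and P_R contain only the pixels of the region under consideration, so the element-wise products over the bounding box reduce to sums over the labelled region; connected components are found with scipy.ndimage, and 4-connectivity is assumed.

```python
import numpy as np
from scipy import ndimage

def m_plus(truth, pred, alpha=5.0):
    """Positive region coefficient M+: iterate over connected regions of the ground truth."""
    labels, n = ndimage.label(truth)
    if n == 0:
        return 0.0
    idx = np.arange(1, n + 1)
    areas = ndimage.sum(truth.astype(float), labels, idx)   # |T_R| for each region
    hits = ndimage.sum(pred.astype(float), labels, idx)     # I_R for each region
    m = (hits / areas) ** (1.0 / alpha) * areas              # per-region m+
    return float(m.sum() / areas.sum())

def m_minus(truth, pred, beta=5.0):
    """Negative region coefficient M-: iterate over connected regions of the prediction."""
    labels, n = ndimage.label(pred)
    if n == 0:
        return 0.0
    idx = np.arange(1, n + 1)
    areas = ndimage.sum(pred.astype(float), labels, idx)     # |P_R| for each region
    hits = ndimage.sum(truth.astype(float), labels, idx)     # I_R for each region
    m = ((areas - hits) / areas) ** beta * areas              # per-region m-
    return float(m.sum() / areas.sum())

def delta(truth, pred, alpha=5.0, beta=5.0):
    return m_plus(truth, pred, alpha) - m_minus(truth, pred, beta)

# Quick check: a 10 x 10 ground truth square hit on a 5 x 5 corner (25% of its area).
t = np.zeros((30, 30)); t[5:15, 5:15] = 1
p = np.zeros_like(t);   p[5:10, 5:10] = 1
print(m_plus(t, p))   # (0.25) ** (1/5) ≈ 0.758, cf. case I-E and Table 5
```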
Table 3 shows the coefficient values of the proposed metric for the parameters α and β equal to 5, calculated for the scenario discussed in the introduction (Figure 3). The coefficients M+ differ only slightly between the Figure 3C,D predictions; however, there is a very clear difference between the M− coefficients, which drives the difference between the Δ coefficients. This corresponds to an intuitive visual assessment of both predictions.
The values of the parameters α and β, which affect the degree of non-linearity of the coefficients for individual regions, can be chosen experimentally for specific images. In the case of the previous scenario, as well as the test scenarios described in the next section of this study, the parameters α and β are equal to 5.
The coefficient M+ corresponds to the Recall metric and, for the parameter α = 1, takes the same values. Only for values of α greater than 1 is the presence of regions taken into account. The coefficient M− is related to the Precision metric; for β = 1, the value of Precision is equal to 1 − M−.
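These relationships follow directly from the definitions above:

$$M^{+}\big|_{\alpha=1} = \frac{\sum_{T_R \subseteq T} \frac{I_R}{|T_R|}\,|T_R|}{\sum_{T_R \subseteq T} |T_R|} = \frac{TP}{TP + FN} = Recall, \qquad M^{-}\big|_{\beta=1} = \frac{\sum_{P_R \subseteq P} \left(|P_R| - I_R\right)}{\sum_{P_R \subseteq P} |P_R|} = \frac{FP}{TP + FP} = 1 - Precision.$$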

3. Results and Discussion

To facilitate the analysis of the behaviour of the tested metrics, including the new one proposed in this study, sets of test images were generated. All images were 200 px × 140 px, with a total area of 28,000 pixels.
The ground truth mask consists of 20 square regions, each with an area of 100 px. They were arranged in five columns and four rows. The distance between squares was 20 px (Figure 7A). Prediction images were generated as required, with the same size as the ground truth mask, but with varying degrees of correlation with the mask. For example, Figure 7B shows the prediction for 24 regions. The first eight regions (first two columns) were 25 px in the area (5 × 5 px squares) and overlapped with parts of the ground truth mask regions. The next 12 regions (i.e., the next three columns) are 100 px squares, which exactly overlap with the mask regions, and the last 4 regions (last column) are 100 px squares, which do not overlap with any mask region. Figure 7C shows the intersection of the ground truth mask with the generated prediction using the colour convention described earlier (blue: TP—true positive; black: FP—false positive; red: FN—false negative).
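For reproducibility, such a ground truth mask is straightforward to generate programmatically. The sketch below is our illustration; the exact margins used to centre the grid in the 200 px × 140 px frame are assumptions consistent with the description above (20 squares of 100 px each, 20 px apart).

```python
import numpy as np

def make_ground_truth(rows=4, cols=5, square=10, gap=20, shape=(140, 200)):
    """Binary mask with a rows x cols grid of square regions, centred in the frame."""
    mask = np.zeros(shape, dtype=np.uint8)
    grid_h = rows * square + (rows - 1) * gap
    grid_w = cols * square + (cols - 1) * gap
    top = (shape[0] - grid_h) // 2
    left = (shape[1] - grid_w) // 2
    for r in range(rows):
        for c in range(cols):
            y = top + r * (square + gap)
            x = left + c * (square + gap)
            mask[y:y + square, x:x + square] = 1
    return mask

truth = make_ground_truth()
print(truth.shape, int(truth.sum()))   # (140, 200) 2000 -> 20 regions of 100 px each
```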
In our research, we also employed larger images, scaling the images of the test set by factors of two, five, and ten. Except for the experiments with the abandoned logarithmic and exponential functions, this did not, however, alter the values of the coefficients.
Further analysis of the results and discussion were performed using the easy-to-interpret and graphical scenarios discussed below.

3.1. Cases I with Fixed Dice and IoU Values

Figure 8 shows a set of six tests (I-A to I-F). In all tests, the ground truth (T) mask was the same. In tests I-A to I-E, the total area of TP hits did not change (25%). In the last test, the TP area increased to 30% but was offset by FP hits (the FP area was chosen so that the Dice and IoU coefficients remained unchanged).
The values of the coefficients for the test images displayed in Figure 8 are listed in Table 4. As can be seen, these are cases where the Tau, IoU, and Dice metrics failed, at least given our previous assumptions. At the same time, the coefficient M+ clearly shows the difference between the images, rewarding small hits in additional regions. In the last test (Figure 8(I-F)), the incorrect prediction of several regions was reflected in the value of the coefficient M− and, therefore, in the value of Δ. For the relatively good fit of scenario I-E, the value of the coefficient Δ is high. In contrast, for scenario I-F, where not only is the detection of the regions itself worse, but there are additional FP areas, the value of the coefficient Δ is negative.
For the same test cases (excluding case I-F), verification of the effect of the parameter α on the M+ coefficient was performed (results in Table 5, same TP area, different number of hit regions). As expected, an increase in the value of the parameter α strengthens the effect of the number of hit regions on the coefficient value. Therefore, the choice of parameters can be optimised for a specific task, that is, the predictions analyzed (number and size of regions) and the expected results (how much hitting a region is expected to be more important than covering the area accurately).
The main role of parameter β is to amplify the penalty for the number of FP regions, not just their total area. Since the parameter β defined for the coefficient M− behaves analogously to the parameter α defined for the coefficient M+, we will skip its detailed description.
As mentioned previously, the α and β parameters depend on the type of data, so their values are chosen experimentally. The choice of these parameters is arbitrary and it is up to the user to what extent he or she wants to reward positive hits and penalize incorrectly detected areas.
Looking at Table 5, we can see that the higher the value of the α parameter, the greater the reward for detecting the same number of regions and, therefore, the greater the value of the M+ factor.
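The non-linearity behind this behaviour can be seen directly: when every region is hit on the same fraction f of its area (as in case I-E, where f = 0.25), M+ reduces to f^(1/α). A one-line check, consistent with the I-E column of Table 5:

```python
for alpha in range(1, 11):
    print(alpha, round(0.25 ** (1.0 / alpha), 3))
# 1 0.25, 2 0.5, 3 0.63, 4 0.707, 5 0.758, 6 0.794, 7 0.82, 8 0.841, 9 0.857, 10 0.871
```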

3.2. Cases II—Hits in the Same Percentage of the Ground Truth Mask Area

Consider a situation in which, on each occasion, the prediction overlapped 25% of the ground truth mask area, but we hit its regions differently (Figure 9). Table 6 displays the coefficients that were computed for these tests.
Note that in the absence of false-positive (FP) areas, Dice, IoU, and Tau were insensitive to the difference between detecting 25% of all regions (Figure 9(II-A)) and 25% of the area of each region (Figure 9(II-B)) and showed the same fit. For our applications, the latter prediction is preferred, as reflected in the corresponding values of the coefficient M+, i.e., it is largest when hitting each of the regions, smaller when hitting eight of them, and smallest when we detected only five. When the FP area appeared, the values of the Dice, IoU, and Tau coefficients decreased. In our case, this is not a desirable effect, as we were unable to assess whether the smaller coefficient was due to hitting a smaller area, a smaller number of regions, or perhaps the appearance of FP areas. Hence, the coefficient M− was introduced. It stores this information and makes it possible to compare predictions explicitly in terms of the appearance of FP areas. Let us further note that in the case of the low mask-and-prediction fit for the II-A and II-C tests, the value of the coefficient Δ is low. In tests II-B and II-D, all regions were detected in the same manner, and the only difference was the appearance of a fairly large FP area in the second one. We can see that in this situation, the value of the Δ coefficient reflected this and decreased from 0.758 to 0.521.

3.3. Cases III Where the Same Ground Truth Mask Regions Are Hit

Let us now assume that we want to compare the predictions that detect the same regions (Figure 10). Table 7 displays the coefficients that were computed for these tests.
In tests III-A, III-B, and III-C, the increase in the coefficient M+ was slower than that in the Dice and IoU metrics. This is because only the area of hits in the regions increases, not their number. Note that for scenarios III-C and III-D, the value of the coefficient M+ is the same, while the FP area information is captured by the coefficient M−, which allows us to differentiate between the two predictions. In contrast, when using only the coefficient Δ, this difference was captured as a decrease in its value. In the case of scenarios III-A and III-B, in which there were no FP areas and only the area of hits in the same regions increased, the coefficient Δ is simply equal to the coefficient M+.

3.4. Cases IV of Hits on All Ground Truth Mask Regions

To observe the general behaviour of the coefficient M+, let us examine the tests in which every region was hit to a certain extent and no FP areas occurred—Figure 11. See Table 8 for a comparison of the coefficient values.
We observe that when the intersection of the prediction and ground truth is large, the values of all the coefficients are similar. Even though every region of the ground truth mask was detected, the Dice and IoU coefficients indicated almost no detection when this intersection was very small. In contrast, the Tau coefficient remained large, even though only 1% of the ground truth mask area was detected. The M+ coefficient, owing to its nonlinearity (i.e., faster growth rate for small TP areas and slower growth rate for large TP areas), thus represents a compromise between an approach based only on region detection, ignoring the size of the detected area, and an approach based only on the intersection area of the prediction and ground truth masks, without considering how many regions of that mask we have hit.

3.5. Cases V with a Fixed Number of FP Regions

As noted earlier, in scenarios in which FP areas are observed, including them in the metric as a separate coefficient provides an additional, precise tool for evaluating the prediction quality.
In each of the ensuing test pairs, V-A and V-B, V-C and V-D, and V-E and V-F (Figure 12), every region of the ground truth mask was identified, and in the second test of each pair additional prediction mask regions (i.e., FP areas) appeared. Table 9 compares the coefficients of each pair.
Even within a pair, the Dice, IoU, and Tau coefficients differ, even though the TP areas remain the same. The lower values of these coefficients distinguish the predictions containing FP areas, but they do not do so explicitly. Using the proposed metric, the coefficient describing the TP areas remains the same, and information about the FP regions is included in the coefficient M−. In contrast, using only the Δ coefficient, the fact that FP areas appear in the pairs (V-A, V-B), (V-C, V-D), and (V-E, V-F) is reflected in their decreasing values. This indicates a worse fit each time for the second test in the pair.
Note that Tau, IoU, Dice, and Δ are coefficients that accumulate information on TP and FP, resulting in the loss of some information. Therefore, in our work, we focus on the M+ and M− coefficients, which allow us to obtain more accurate information on the occurrence of TP and FP areas and thus better differentiate specific scenarios. For example, in Table 9, where for the Δ coefficient we have V-E > V-B and V-C > V-F, it is only by comparing M+ and M− that we are able to observe where such values of the Δ coefficient came from. It can happen that different values of the TP and FP areas give the same value of the Δ coefficient, hence the importance of separating the coefficients into M+ and M− and comparing these values.

4. Conclusions

In traditional segmentation evaluation metrics, such as the Dice coefficient, IoU (Intersection over Union), or Jaccard, the main objective is to examine the degree of coverage between the predicted mask and the true mask. Although these metrics are widely used, they are often insufficient in certain situations, as we have shown in the scenarios we have prepared. This problem is not limited to vegetation studies and is often a flaw pointed out by other researchers. For example, in a study on the quality of building segmentation from satellite imagery [45], the author points out the different quality assessment results obtained with the Dice and Jaccard metrics, where the result fluctuated between 84% and 70%, respectively. This suggests that not all metrics are equally reliable in assessing classification quality, and that reliability often depends on the spatial heterogeneity of the phenomenon under study. Similarly, in the previously cited paper by Müller et al. [24], the authors point out the problem of class imbalance that often occurs in medical analyses and propose the use of a wide range of metrics such as the Dice coefficient, IoU, sensitivity, specificity, the Rand index, ROC curves, Cohen’s Kappa, and the Hausdorff distance, which, when used together, aim to assess classification more accurately. They also point out that the standard metrics used, such as accuracy, sensitivity, and specificity, can introduce significant interpretation errors. Thus, there is a lack of a single, more flexible metric that works in more diverse scenarios. Therefore, we believe that our proposed M metric offers more flexibility in assessing the quality of prediction in many non-standard cases, as we have shown in the Results and Discussion section.
Depending on the needs, the metric M can be used both in a compact form, as the coefficient Δ, and in a detailed form, as the pair of coefficients M+ and M−. In the first case, we obtain a general indication of the fit of the prediction mask to the ground truth mask. The higher the value of the Δ coefficient, the more accurate the prediction. Values of the Δ coefficient lie in the range [−1, 1]. To obtain more detailed information about the prediction fit, we can use the component coefficients M+ and M−, which contain information on the TP and FP areas, respectively. The values of the coefficients M+ and M− are in the range [0, 1]; the higher their values, the higher the proportion of TP areas in the ground truth mask and of FP areas in the prediction mask, respectively. Furthermore, by changing the parameters α and β, we can influence to what extent the number of TP and FP regions is more important than their area. This allows us to predetermine the relevance conditions for hits in part of a ground truth mask region. This flexibility allows us to tailor the coefficient to the individual characteristics of the segmented object and thus better assess the quality of the prediction in real-life situations. Thus, we do not need to use many different metrics, but only one, which makes the work and the evaluation of the prediction itself much easier.
The proposed method was developed for the purpose of verifying the results obtained during the study of automatic detection of vegetation patches—an example of Heracleum sosnowskyi (Manden.). Currently, further field work is planned to verify the usefulness of the method during the analysis of the effectiveness of detection of communities of various invasive plants. We anticipate that the usefulness of the method is not limited to this application area. In general, it can find application in any field where we are more interested in detecting certain areas rather than determining their exact boundaries, for example, during preliminary analysis of medical imaging. Thus, a logical direction for further work will be to explore the scope of applicability of the proposed method in fields other than remote sensing.
As noted in the article, the selection of α and β parameters is currently carried out arbitrarily on an experimental basis for a particular type of data. The usefulness of the method could be improved by developing, for example, a way to automatically select these parameters or by determining their optimal values for specific applications. This requires further experimental work with different types of data and will be the next element in the development of the proposed method.
The next step in the development of the method will be an attempt to modify it so that it is more versatile and allows the evaluation of classification results, rather than just segmentation. A certain limitation of the proposed method, compared with Dice or IoU, is its higher computational complexity. We estimate that, at present, it can be used effectively to evaluate final results, while using it as a metric in the learning process may slow that process down.

Author Contributions

Conceptualization, W.M., M.B. and A.Ł.; validation, W.M., M.B. and A.Ł.; writing—original draft preparation, W.M., M.B., A.Ł. and P.T.; software, W.M. All authors have read and agreed to the published version of the manuscript.

Funding

Financed by the Minister of Science under the “Regional Excellence Initiative” Program for 2024–2027.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7 June 2015; pp. 3431–3440.
  2. Rizzoli, G.; Barbato, F.; Zanuttigh, P. Multimodal Semantic Segmentation in Autonomous Driving: A Review of Current Approaches and Future Perspectives. Technologies 2022, 10, 90.
  3. Treml, M.; Arjona-Medina, J.; Unterthiner, T.; Durgesh, R.; Friedmann, F.; Schuberth, P.; Mayr, A.; Heusel, M.; Hofmarcher, M.; Widrich, M.; et al. Speeding up Semantic Segmentation for Autonomous Driving. In Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016.
  4. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In LNCS; Springer International Publishing: Berlin, Germany, 2015; Volume 9351, pp. 234–241.
  5. Huang, S.-Y.; Hsu, W.-L.; Hsu, R.-J.; Liu, D.-W. Fully Convolutional Network for the Semantic Segmentation of Medical Images: A Survey. Diagnostics 2022, 12, 2765.
  6. Pedrayes, O.; Lema, D.; Garcia, F.D.; Usamentiaga, R.; Alonso, A. Evaluation of Semantic Segmentation Methods for Land Use with Spectral Imaging Using Sentinel-2 and PNOA Imagery. Remote Sens. 2021, 13, 2292.
  7. Huang, L.; Jiang, B.; Lv, S.; Liu, Y.; Fu, Y. Deep-Learning-Based Semantic Segmentation of Remote Sensing Images: A Survey. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 8370–8396.
  8. Wu, B.; Gu, Z.; Zhang, W.; Fu, Q.; Zeng, M.; Li, A. Investigator Accuracy: A Center-Weighted Metric for Evaluating the Location Accuracy of Image Segments in Land Cover Classification. Int. J. Appl. Earth Obs. Geoinf. 2023, 122, 103402.
  9. Lin, J.; Jing, W.; Song, H.; Chen, G. ESFNet: Efficient Network for Building Extraction From High-Resolution Aerial Images. IEEE Access 2019, 7, 54285–54294.
  10. Audebert, N.; Le Saux, B.; Lefèvre, S. Segment-before-Detect: Vehicle Detection and Classification through Semantic Segmentation of Aerial Images. Remote Sens. 2017, 9, 368.
  11. Śledziowski, J.; Terefenko, P.; Giza, A.; Forczmański, P.; Łysko, A.; Maćków, W.; Stępień, G.; Tomczak, A.; Kurylczyk, A. Application of Unmanned Aerial Vehicles and Image Processing Techniques in Monitoring Underwater Coastal Protection Measures. Remote Sens. 2022, 14, 458.
  12. Altaweel, M.; Khelifi, A.; Li, Z.; Squitieri, A.; Basmaji, T.; Ghazal, M. Automated Archaeological Feature Detection Using Deep Learning on Optical UAV Imagery: Preliminary Results. Remote Sens. 2022, 14, 553.
  13. Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Deep Learning Techniques to Classify Agricultural Crops through UAV Imagery: A Review. Neural Comput. Appl. 2022, 34, 9511–9536.
  14. Chen, Y.; Ribera, J.; Boomsma, C.; Delp, E.J. Plant Leaf Segmentation for Estimating Phenotypic Traits. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3884–3888.
  15. Marzialetti, F.; Frate, L.; De Simone, W.; Frattaroli, A.R.; Acosta, A.; Carranza, M. Unmanned Aerial Vehicle (UAV)-Based Mapping of Acacia Saligna Invasion in the Mediterranean Coast. Remote Sens. 2021, 13, 3361.
  16. Nair, S.; Sharifzadeh, S.; Palade, V. Farmland Segmentation in Landsat 8 Satellite Images Using Deep Learning and Conditional Generative Adversarial Networks. Remote Sens. 2024, 16, 823.
  17. Reckling, W.; Mitasova, H.; Wegmann, K.; Kauffman, G.; Reid, R. Efficient Drone-Based Rare Plant Monitoring Using a Species Distribution Model and AI-Based Object Detection. Drones 2021, 5, 110.
  18. Baena, S.; Moat, J.; Whaley, O.; Boyd, D. Identifying Species from the Air: UAVs and the Very High Resolution Challenge for Plant Conservation. PLoS ONE 2017, 12, e0188714.
  19. Zhang, Y.J. A Survey on Evaluation Methods for Image Segmentation. Pattern Recognit. 1996, 29, 1335–1346.
  20. Chen, Y.; Ming, D.; Zhao, L.; Lv, B.; Zhou, K.; Qing, Y. Review on High Spatial Resolution Remote Sensing Image Segmentation Evaluation. Photogramm. Eng. Remote Sens. 2018, 84, 629–646.
  21. Gao, H.; Tang, Y.; Jing, L.; Li, H.; Ding, H. A Novel Unsupervised Segmentation Quality Evaluation Method for Remote Sensing Images. Sensors 2017, 17, 2427.
  22. Wang, Z.; Wang, E.; Zhu, Y. Image Segmentation Evaluation: A Survey of Methods. Artif. Intell. Rev. 2020, 53, 5637–5674.
  23. Hsieh, C.-H.; Chia, T.-L. Analysis of Evaluation Metrics for Image Segmentation. J. Inf. Hiding Multimed. Signal Process. 2018, 9, 1559–1576.
  24. Müller, D.; Soto-Rey, I.; Kramer, F. Towards a Guideline for Evaluation Metrics in Medical Image Segmentation. BMC Res. Notes 2022, 15, 210.
  25. Wang, H.; Zhuang, C.; Zhao, J.; Shi, R.; Jiang, H.; Yuan, Y.; Guo, X.; Xue, Z. Research on Evaluation Method of Aerial Image Segmentation Algorithm. In Proceedings of the 2022 7th International Conference on Signal and Image Processing (ICSIP), Suzhou, China, 20–22 July 2022; pp. 415–419.
  26. Janušonis, E.; Kazakeviciute-Januskeviciene, G.; Bausys, R. Selection of Optimal Segmentation Algorithm for Satellite Images by Intuitionistic Fuzzy PROMETHEE Method. Appl. Sci. 2024, 14, 644.
  27. Kazakeviciute-Januskeviciene, G.; Janušonis, E.; Bausys, R. Evaluation of the Segmentation of Remote Sensing Images. In Proceedings of the 2021 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), Vilnius, Lithuania, 22 April 2021; pp. 1–7.
  28. Ma, Z.; Redmond, R.L. Tau Coefficients for Accuracy Assessment of Classification of Remote Sensing Data. Photogramm. Eng. Remote Sens. 1995, 61, 435–439.
  29. Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46.
  30. Tariku, G.; Ghiglieno, I.; Gilioli, G.; Gentilin, F.; Armiraglio, S.; Serina, I. Automated Identification and Classification of Plant Species in Heterogeneous Plant Areas Using Unmanned Aerial Vehicle-Collected RGB Images and Transfer Learning. Drones 2023, 7, 599.
  31. Pichai, K.; Park, B.; Bao, A.; Yin, Y. Automated Segmentation and Classification of Aerial Forest Imagery. Analytics 2022, 1, 135–143.
  32. Lin, C.-W.; Lin, M.; Hong, Y. Aerial and Optical Images-Based Plant Species Segmentation Using Enhancing Nested Downsampling Features. Forests 2021, 12, 1695.
  33. Xia, L.; Zhang, R.; Chen, L.; Li, L.; Yi, T.; Yao, W.; Ding, C.; Xie, C. Evaluation of Deep Learning Segmentation Models for Detection of Pine Wilt Disease in Unmanned Aerial Vehicle Images. Remote Sens. 2021, 13, 3594.
  34. Fuentes-Pacheco, J.; Torres, J.; Roman-Rangel, E.; Cervantes, S.; Juarez-Lopez, P.; Hermosillo, J.; Rendon-Mancha, J. Fig Plant Segmentation from Aerial Images Using a Deep Convolutional Encoder-Decoder Network. Remote Sens. 2019, 11, 1157.
  35. Gallmann, J.; Schüpbach, B.; Jacot, K.; Albrecht, M.; Winizki, J.; Kirchgessner, N.; Aasen, H. Flower Mapping in Grasslands With Drones and Deep Learning. Front. Plant Sci. 2022, 12, 774965.
  36. Lake, T.; Runquist, R.; Moeller, D. Deep Learning Detects Invasive Plant Species across Complex Landscapes Using Worldview-2 and Planetscope Satellite Imagery. Remote Sens. Ecol. Conserv. 2022, 8, 875–889.
  37. Asner, G. Applications of Remote Sensing to Alien Invasive Plant Studies. Sensors 2009, 9, 4869–4889.
  38. Gałczyńska, M.; Gamrat, R.; Łysko, A. Impact of Invasive Species of the Genus Heracleum spp. (Apiaceae) on Environment and Human Health. Kosmos. Seria A, Biologia 2016, 65, 591–599.
  39. Sužiedelytė Visockienė, J.; Tumelienė, E.; Maliene, V. Identification of Heracleum Sosnowskyi-Invaded Land Using Earth Remote Sensing Data. Sustainability 2020, 12, 759.
  40. Powers, D. Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation. Mach. Learn. Technol. 2008, 2, 1–24.
  41. Sokolova, M.; Lapalme, G. A Systematic Analysis of Performance Measures for Classification Tasks. Inf. Process. Manag. 2009, 45, 427–437.
  42. Zhang, Y.; Guindon, B. Application of the Dice Coefficient to Accuracy Assessment of Object-Based Image Classification. Can. J. Remote Sens. 2017, 43, 48–61.
  43. Yan, J.; Wang, H.; Yan, M.; Wenhui, D.; Sun, X.; Li, H. IoU-Adaptive Deformable R-CNN: Make Full Use of IoU for Multi-Class Object Detection in Remote Sensing Imagery. Remote Sens. 2019, 11, 286.
  44. Setiawan, A.W. Image Segmentation Metrics in Skin Lesion: Accuracy, Sensitivity, Specificity, Dice Coefficient, Jaccard Index, and Matthews Correlation Coefficient. In Proceedings of the 2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM), Surabaya, Indonesia, 17–18 November 2020; pp. 97–102.
  45. Ataş, İ. Performance Evaluation of Jaccard-Dice Coefficient on Building Segmentation from High Resolution Satellite Images. Balk. J. Electr. Comput. Eng. 2023, 11, 100–106.
Figure 1. Localization of the study area.
Figure 2. Example of comparison of prediction results.
Figure 3. (A) Orthophotomap with marked contours of the range and lake, (B) ground truth mask—location of Heracleum sosnowskyi, (C) prediction compared with mask (model using three channels—RGB), (D) prediction compared with mask (model using five channels—RGB, red edge, near-infrared). For explanation of the false negative, true positive and false positive terms, see the colour convention in Section 2.2.
Figure 4. The ground truth (blue) and the prediction mask (pale orange) on top of each other with separated regions.
Figure 5. The separated T1 region from the ground truth mask together with the surrounding rectangle (bounding box) and parts of the prediction mask.
Figure 6. Decomposition of the image into submatrices containing the ground truth mask regions ((A)—passing through the ground truth regions: positive coefficient) and the prediction mask regions ((B)—passing through the prediction regions: negative coefficient).
Figure 7. Example test set—(A) ground truth mask T, (B) prediction P, (C) comparison of P and T.
Figure 8. Test images for constant values of IoU and Dice coefficients (A–F—case examples).
Figure 9. Test images for a fixed TP surface.
Figure 10. Test images for fixed regions.
Figure 11. Test images of hits in each region.
Figure 12. Pairs of test images: (V-A, V-B), (V-C, V-D), and (V-E, V-F) illustrating the effect of FP on the analysed coefficients.
Table 1. Prediction rates of Figure 3C,D.

Image            Tau      IoU      Dice
Prediction 3C    0.788    0.408    0.58
Prediction 3D    0.856    0.547    0.707
Table 2. Parameters of the regions of both predictions.

Image            Mask Regions            TP Regions              FP Regions
                 Number   Total Area     Number   Total Area     Number   Total Area
Mask 3B          611      5,665,854      n/a      n/a            n/a      n/a
Prediction 3C    495      6,926,924      301      3,655,480      453      3,271,444
Prediction 3D    336      4,224,048      292      3,506,152      297      717,896
Table 3. Coefficient values of the proposed metric for the parameters α and β equal to 5, calculated for the scenario discussed in the introduction.

Image            M+ (α = 5)   M− (β = 5)   Δ
Prediction 3C    0.856        0.368        0.488
Prediction 3D    0.839        0.023        0.816

(M+ — positive region coefficient; M− — negative region coefficient; Δ — the difference between the region coefficients.)
Table 4. The values of the coefficients for the test images in Figure 8.

Case   TP%   FP%   Tau        IoU    Dice   M+ (α = 5)   M− (β = 5)   Δ
I-A    25    0     0.743086   0.25   0.4    0.25         0.0          0.25
I-B    25    0     0.743086   0.25   0.4    0.351572     0.0          0.351572
I-C    25    0     0.743086   0.25   0.4    0.453143     0.0          0.453143
I-D    25    0     0.743086   0.25   0.4    0.656287     0.0          0.656287
I-E    25    0     0.743086   0.25   0.4    0.757858     0.0          0.757858
I-F    30    20    0.697491   0.25   0.4    0.3          0.4          −0.1
Table 5. Verification of the effect of the parameter α on M+.

             I-A    I-B    I-C    I-D    I-E
TP%          25     25     25     25     25
TP regions   5      8      11     17     20
M+, α = 1    0.25   0.25   0.25   0.25   0.25
M+, α = 2    0.25   0.3    0.35   0.45   0.5
M+, α = 3    0.25   0.326  0.402  0.554  0.63
M+, α = 4    0.25   0.341  0.433  0.616  0.707
M+, α = 5    0.25   0.352  0.453  0.656  0.758
M+, α = 6    0.25   0.359  0.467  0.685  0.794
M+, α = 7    0.25   0.364  0.478  0.706  0.82
M+, α = 8    0.25   0.368  0.486  0.723  0.841
M+, α = 9    0.25   0.371  0.493  0.736  0.857
M+, α = 10   0.25   0.374  0.498  0.746  0.871
Table 6. The values of the coefficients for the test images in Figure 9.

Case   TP%    FP%    Tau     IoU     Dice   M+ (α = 5)   M− (β = 5)   Δ
II-A   25.0   0.0    0.743   0.25    0.4    0.25         0.0          0.25
II-B   25.0   0.0    0.743   0.25    0.4    0.758        0.0          0.758
II-C   25.0   0.0    0.743   0.25    0.4    0.352        0.0          0.352
II-D   25.0   75.0   0.596   0.143   0.25   0.758        0.237        0.521
Table 7. The values of the coefficients for the test images in Figure 10.

Case    TP%    FP%     Tau     IoU     Dice    M+ (α = 5)   M− (β = 5)   Δ
III-A   10.0   0.0     0.653   0.1     0.182   0.303        0.0          0.303
III-B   25.0   0.0     0.743   0.25    0.4     0.352        0.0          0.352
III-C   40.0   0.0     0.809   0.4     0.571   0.4          0.0          0.4
III-D   40.0   120.0   0.625   0.182   0.308   0.4          0.237        0.163
Table 8. The values of the coefficients for the test images in Figure 11.

Case   TP%     FP%   Tau     IoU    Dice    M+ (α = 5)   M− (β = 5)   Δ
IV-A   100.0   0.0   1.0     1.0    1.0     1.0          0.0          1.0
IV-B   70.0    0.0   0.914   0.7    0.824   0.903        0.0          0.903
IV-C   25.0    0.0   0.743   0.25   0.4     0.758        0.0          0.758
IV-D   9.0     0.0   0.645   0.09   0.165   0.618        0.0          0.618
IV-E   4.0     0.0   0.597   0.04   0.077   0.525        0.0          0.525
IV-F   1.0     0.0   0.548   0.01   0.02    0.398        0.0          0.398
Table 9. Comparing the coefficient values for each of the three pairs.

Case   TP%     FP%    Tau     IoU     Dice    M+ (α = 5)   M− (β = 5)   Δ
V-A    100.0   0.0    1.0     1.0     1.0     1.0          0.0          1.0
V-B    100.0   20.0   0.953   0.833   0.909   1.0          0.167        0.833
V-C    25.0    0.0    0.743   0.25    0.4     0.758        0.0          0.758
V-D    25.0    5.0    0.719   0.238   0.385   0.758        0.167        0.591
V-E    70.0    0.0    0.914   0.7     0.824   0.903        0.0          0.903
V-F    70.0    20.0   0.859   0.583   0.737   0.903        0.222        0.681
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
