Key Takeaways

The critical analysis presented in Table 1 shows that recent research explicitly points towards the need for a segmentation algorithm that produces content-relevant super-pixel segmentations. To this end, several techniques have been proposed that incorporate prior transformations of the image via deep learning methods, simple image processing, and probabilistic methods. Most of this research builds on and confirms the achievements of Simple Linear Iterative Clustering (SLIC), using the SLIC algorithm as the base mechanism for super-pixel creation with added features. Generally, the algorithms proposed in the last decade have a computational complexity of *O*(*N*), whereas if neural networks are employed to automate the required parameter initialization, the complexity becomes *O*(*N* · No. of layers). All the proposed algorithms use one of two distance measures for final super-pixel creation, i.e., the Euclidean or the geodesic distance. However, none of these studies mentions the occurrence of semi-dark images or their impact on overall performance. A large share of existing image datasets is estimated to already include such problem-centric image data. The Berkeley dataset, which is widely used for performance analysis of super-pixel algorithms, contains up to 63% semi-dark images. The proposed study uses the semi-dark images of the Berkeley dataset for benchmarking analysis of the SLIC++ algorithm.

#### *2.3. Exclusiveness of SLIC++ w.r.t Recent Developments*

Recent studies focus substantially on super-pixels that are content sensitive and adhere to the boundaries in the final segmentation; consequently, several related methods have been proposed. Generally, the desired features are good boundary adherence, compact super-pixel size, and low complexity. The same features are required for super-pixel segmentation of semi-dark images. In this section we briefly review the recent developments that are most closely related to our proposed method for creating super-pixels in semi-dark images.

BASS (Boundary-Aware Super-pixel Segmentation) [28] is closely related to the methodology that we have chosen, i.e., incorporation of content-relevant information in the final pixel labeling, which ends with the creation of super-pixels. However, the major difference lies in the initialization of the super-pixel seeds/centers. BASS recommends the use of a forest classification method prior to super-pixel creation. This forest classification of the image space results in a binary image with highlighted boundary information over the image space. This boundary information is then used to aid the initialization of the seeds/cluster centers. Theoretically, the problem with this configuration is the additional cost of boundary map creation, which raises the complexity from *O*(*N*) to *O*(*N* log *N*). The boundary map creation and its associated conditions for adding and deleting seeds are also expected to introduce an undesired super-pixel property, namely under-segmentation. Under-segmentation may occur because the seed deletion condition is easy to satisfy and the addition condition is difficult to satisfy, so more seeds are deleted than added. This aspect is undesirable for super-pixel creation in semi-dark scenarios. In contrast, we propose regularly distributed seeds together with the use of both recommended distance measures without prior image transformation, which further reduces the overall complexity. Finally, we also propose using the geodesic distance for the color components of the pixel rather than only for the spatial component.

Intrinsic manifold SLIC (IMSLIC) [30] is an extension of manifold SLIC that proposes the use of manifolds to map high-dimensional image data onto manifolds resembling Euclidean space near each point. IMSLIC uses Geodesic Centroidal Voronoi Tessellations (GCVT), which allows the post-processing heuristics to be skipped. To compute the geodesic distance on the image manifold, a weighted image graph is overlaid using the usual graph-theoretic notions of edges, nodes, and weights. For this mapping, the 8-connected neighbors of each pixel are considered. This entire process of mapping and calculating geodesic distances appears complex. The theoretical complexity is *O*(*N*); however, with the incorporation of the image graph the computational cost increases. Moreover, the study computes only the geodesic distance between pixels, leaving out its Euclidean counterpart. With substantially less complexity, we propose to apply both distance measures to all the crucial pixel components.

#### *2.4. Summary and Critiques*

The comprehensive literature survey was conducted to benefit readers and to provide a quick-start review of the advancements in super-pixel segmentation over a period of two decades. Moreover, the survey resulted in a critical analysis of existing segmentation techniques, which steered attention to studies conducted for adverse image scenarios such as semi-dark images. Arguably, with the increase in automated solutions, incoming image data will be of a dynamic nature (including lighting conditions). To deal with such dynamic image data, there is a critical need for a super-pixel segmentation technique that takes semi-dark imagery into account and produces regular and content-aware super-pixel segmentation in semi-dark scenarios. The super-pixel segmentation techniques currently employed for the task suffer from two major issues, i.e., high complexity and information loss. The information loss associated with the gradient-ascent methods is attributed to the restrictions imposed by the Euclidean image space, which loses the context of the information present in the image by calculating straight-line differences. Many attempts have been made to incorporate CNN-based probabilistic methods into super-pixel creation to optimize and aid the final segmentation results. However, to the best of our knowledge, no method has been proposed exclusively for semi-dark image scenarios that keeps simplicity and optimal performance intact.

In the following sections, we describe the preliminaries that form the basis of the proposed extension of SLIC, namely SLIC++. We also present several distance measures incorporated into the base SLIC algorithm, namely SLIC+, to analyze the performance for semi-dark images.

#### **3. Materials and Methods**

#### *3.1. The Semi-Dark Dataset*

For the analysis of the content-aware super-pixel segmentation algorithm, we have used a state-of-the-art dataset that has been used in the literature for years. The Berkeley image dataset [39] has been used for the comprehensive analysis and benchmarking of the proposed SLIC++ algorithm against state-of-the-art algorithms. The Berkeley image dataset, namely BSDS-500, contains five hundred images overall, whereas the problem under consideration concerns semi-dark images. For this purpose, we first extracted the semi-dark images using the RPLC (Relative Perceived Luminance Classification) algorithm. The labels are created based on the manipulation of color model information, i.e., Hue, Saturation, Lightness (HSL) [40]. In total, 316 semi-dark images were extracted from the BSDS-500 dataset. Each image has a resolution of either 321 × 481 or 481 × 321. The BSDS-500 image dataset provides the basis for empirical analysis of segmentation algorithms. For performance analysis and boundary detection, the BSDS-500 dataset provides ground-truth labels from at least five human annotators on average. This raises the question of which annotation to select. To deal with this problem, we applied simple logic to the image ground-truth labels. All the labels of an image are combined with an 'OR' operation to generate a single ground-truth label. The 'OR' operation ensures that the final ground truth retains every boundary suggested by the human annotators. Finally, every image is segmented and benchmarked against this single ground-truth image.
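The label-merging step can be illustrated with a minimal Python sketch. It assumes that each annotation is a binary boundary mask of identical size stored as nested lists (1 for a boundary pixel, 0 otherwise); the function name `combine_annotations` is hypothetical and is not taken from the original MATLAB implementation.

```python
from typing import List

Mask = List[List[int]]  # 1 = boundary pixel, 0 = non-boundary pixel


def combine_annotations(masks: List[Mask]) -> Mask:
    """Merge several binary boundary masks with a pixel-wise logical OR."""
    if not masks:
        raise ValueError("at least one annotation mask is required")
    rows, cols = len(masks[0]), len(masks[0][0])
    combined = [[0] * cols for _ in range(rows)]
    for mask in masks:
        for r in range(rows):
            for c in range(cols):
                # A pixel becomes boundary once any annotator marked it.
                combined[r][c] |= mask[r][c]
    return combined


# Two toy 2 x 2 annotations of the same image.
annotator_a = [[1, 0], [0, 0]]
annotator_b = [[0, 0], [0, 1]]
merged = combine_annotations([annotator_a, annotator_b])
```

Note that the 'OR' operation is a union, so a pixel is marked as boundary in the merged label as soon as a single annotator marked it.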

#### *3.2. Desiderata of Accurate Super-Pixels*

Generally, there are no definitive features that make super-pixels accurate. The literature describes accurate super-pixels in terms of boundary adherence, connectivity of super-pixels, super-pixel partitioning, compactness, regularity, efficiency, controllable number of super-pixels, and so on [13,41,42]. As the proposed study focuses on semi-dark images, we take into account the features that are desired to confirm accurate boundary extraction in semi-dark images.

#### **1. Boundary Adherence**

Boundary adherence measures how accurately the super-pixels extract the boundaries given in the boundary image or ground-truth image. The idea is to preserve as much information as possible when creating super-pixels over the image. In other words, the boundary adherence feature indicates how accurately the super-pixels follow the ground-truth boundaries. It can be readily calculated with the segmentation quality metrics precision and recall.

#### **2. Efficiency with Less Complexity**

Super-pixel segmentation algorithms are now widely used as a preprocessing step for further visually intelligent tasks, so the second desired feature is efficiency with less complexity. The focus should be on memory efficiency and optimal usage of processing resources so that more memory and computational resources remain available for subsequent processes. We take this feature into account and propose an algorithm that uses exactly the same resources as basic SLIC, with added distance measures.

#### **3. Controllable Number of Super-Pixels**

A controllable number of super-pixels is a desired feature to ensure that the optimal boundary is extracted while using computational resources efficiently. Super-pixel algorithms are sensitive to the number of super-pixels, which can directly impact the overall algorithm performance. The performance is degraded in terms of under-segmentation or over-segmentation error. In the former, the algorithm fails to retrieve most of the boundaries because too few super-pixels are created, whereas in the latter it retrieves the maximum boundary portions of the ground-truth images but consumes surplus computational resources.

Nevertheless, as mentioned earlier there is a huge list of accuracy measures and all those measures refer to different segmentation aspects and features. The required features and subsequent accuracy measures to be reported depend on the application of algorithm. For semi-dark image segmentation, it is mandatory to ensure that most of the optimal boundary is extracted and this requirement can be related to precision-recall metrics.

#### *3.3. SLIC Preliminaries*

Before presenting SLIC++, we first introduce the base functionality of SLIC. The overall functionality is based on the creation of restricted windows in which the user-defined seeds are placed, and clustering of image points is performed within each restricted window. These restricted windows are called Voronoi tessellations [43]. A Voronoi tessellation partitions the image plane into convex polygons; in the case of the SLIC initialization windows, the polygon is a square. The Voronoi tessellations are constructed such that each partition has one generating point and all the points in the partition are close to the generating point or the mass center of that partition. As the generating point lies at the center, these partitions are also called Centroidal Voronoi Tessellations (CVT). The SLIC algorithm operates in the CIELAB color space, where every pixel *p* of image *I* is represented by its color components *c*(*p*) = (*l*(*p*), *a*(*p*), *b*(*p*)) and its spatial components *p*(*u*, *v*). For any two pixels, SLIC measures the straight-line difference, or Euclidean distance, between them over the entire image space $\mathbb{R}^5$.

The spatial distance *ds* and the color distance *dc* between two pixels are given in Equations (1) and (2).

$$d_s = \sqrt{(u_1 - u_2)^2 + (v_1 - v_2)^2} \tag{1}$$

And,

$$d_c = \sqrt{(l_1 - l_2)^2 + (a_1 - a_2)^2 + (b_1 - b_2)^2} \tag{2}$$

Here *ds* and *dc* represent the Euclidean distances between pixels *p*1 and *p*2. Instead of the simple Euclidean distance, SLIC uses a distance term infused with the Euclidean norm, given by Equation (3).

$$D_s = d_c + \frac{m}{S} d_s \tag{3}$$

The final distance term is normalized using the interval *S*, and *m* provides control over the super-pixel compactness, which results in a perceptually meaningful distance that balances the spatial and color components. Given the number of super-pixels *K*, the seeds $\{s_i\}_{i=1}^{K}$ are evenly distributed over the image *I*, and clusters are created in the restricted regions of the Voronoi tessellations. Each initialization seed is placed in the image space within a window of 2*S* × 2*S* centered at *si*. After that, simple K-means is performed, assigning the pixels residing in the window to its center. SLIC computes the distance between pixels using Equation (3) and iteratively processes the pixels until convergence.
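Equations (1) to (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each pixel is passed as a plain tuple (*l*, *a*, *b*, *u*, *v*), the function name `slic_distance` is hypothetical, and the values of *m* and *S* in the example are arbitrary.

```python
import math


def slic_distance(p1, p2, m, S):
    """SLIC distance of Equation (3) between two pixels (l, a, b, u, v)."""
    l1, a1, b1, u1, v1 = p1
    l2, a2, b2, u2, v2 = p2
    # Equation (1): Euclidean distance between the spatial components.
    d_s = math.sqrt((u1 - u2) ** 2 + (v1 - v2) ** 2)
    # Equation (2): Euclidean distance between the CIELAB components.
    d_c = math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)
    # Equation (3): colour term plus the spatial term scaled by m / S.
    return d_c + (m / S) * d_s


# Same colour, 5 pixels apart: only the spatial term contributes.
spatial_only = slic_distance((50, 0, 0, 0, 0), (50, 0, 0, 3, 4), m=10, S=10)
# Same position, colour difference of 5: only the colour term contributes.
colour_only = slic_distance((50, 0, 0, 0, 0), (53, 4, 0, 0, 0), m=10, S=10)
```

A larger *m* increases the weight of the spatial term and therefore favours more compact super-pixels.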

#### *3.4. The Extension Hypothesis—Fusion Similarity Measure*

The SLIC algorithm uses the Euclidean distance measure to create pixel clusters, i.e., super-pixels, around the seeds or cluster centers. The Euclidean distance measure assesses the similarity among pixels using straight-line differences between the cluster centers and the image pixels. This property of the distance measure distorts the extracted boundaries of the image, because the measure remains the same regardless of whether there is a path along the pixels. Following the path along the pixels results in smoother and more content-relevant pixels [16,36]. The Euclidean distance overlays a segmentation map on the image without regard to the actual content present in the image. Moreover, large diversity in the image (lighting conditions/high-density portions) results in unavoidable distortion. Therefore, we hypothesize that a more accurate distance measure is needed, one that captures the content-relevant information of the visual scene. For this reason, we extend the functionality of SLIC by replacing the Euclidean distance measure with four potential similarity measures, namely chessboard, cosine, Minkowski, and geodesic, and name the result SLIC+. These distance measures have been integrated into clustering algorithms in the literature for synthetic textual data clustering, where studies report reasonable results for focused problem solving [44]. Here, we use these similarity measures to investigate their effects on visual images using the SLIC approach. Prior to implementation, a brief introductory discussion will help in understanding the overall integration and the foundation for choosing these similarity measures. The distance measures are essentially distance transforms applied to images, specifying the distance from each pixel to the desired pixel. For uniformity and ease of understanding, let pixels *p*1 and *p*2 have the coordinates (*x*1, *y*1) and (*x*2, *y*2), respectively.

• **Chessboard:** This measure calculates the maximum coordinate-wise distance between vectors. It can be interpreted as measuring the path between the pixels on an eight-connected neighborhood whose edges are one unit apart. The chessboard distance is obtained by taking the maximum of the distances along each coordinate, as presented in Equation (4).

$$D_{\text{chess}} = \max(|x_2 - x_1|, |y_2 - y_1|) \tag{4}$$

*Rationale of Consideration*: Since the problem with existing similarity measures is loss of information, chessboard is one of the alternatives to be incorporated as the base of super-pixel creation. This measure is considered because it takes into account the information of the eight connected neighbors of the pixel under consideration. However, it might add computational overhead for the same reason.

• **Cosine:** This measure calculates distance based on the angle between two vectors. Working with angles counteracts the problem of high dimensionality. If the vectors are normalized beforehand, their inner product directly gives the cosine similarity. The cosine distance is based on the cosine similarity, which is then plugged into the distance equation. Equations (5) and (6) show the calculation of the cosine distance between pixels.

$$\text{cosine similarity} = \frac{p_1 \cdot p_2}{\sqrt{p_1 \cdot p_1}\,\sqrt{p_2 \cdot p_2}} \tag{5}$$

$$D_{\text{cosine}} = 1 - \text{cosine similarity} \tag{6}$$

*Rationale of Consideration*: One aspect of a content-aware similarity measure is retaining angular information, which is why we attempted to incorporate this measure. The resulting super-pixels are expected to retain the content-relevant boundaries. However, this measure does not consider the magnitude of the vectors/pixels, due to which boundary performance might fall.

• **Minkowski:** This measure is a bit more intricate. It can be used for normed vector spaces, where distance is represented as a vector having some length. The measure applies a positive weight value which changes the length whilst keeping the direction. Equation (7) presents the distance formulation of the Minkowski similarity measure.

$$D_{\min} = \left( \sum_{i} |p_{2,i} - p_{1,i}|^{\mu} \right)^{1/\mu} \tag{7}$$

Here the sum runs over the components *i* of the pixel vectors and *μ* is the weight. If its value is set to 1, the resultant measure corresponds to the Manhattan distance; *μ* = 2 gives the Euclidean distance, and *μ* = ∞ gives the chessboard or Chebyshev distance.

*Rationale of Consideration*: As user control is desired in the respective application, the Minkowski similarity provides this functionality by changing merely one parameter, which alters the entire operation without changing the core equations. However, the problem of retaining angular information remains.

• **Geodesic:** This measure considers geometric movements along the pixel path in image space. The distance represents the locally shortest path in the image plane. The geodesic distance between two pixels results in surface segmentation with minimum distortion. An efficient numerical implementation of the geodesic distance is achieved using a first-order approximation. For the approximation, parametric surfaces are considered with *n* points on the surface. Given an image mask, the geodesic distance for image pixels can be calculated using Equation (8).

$$D_{\text{geo}} = \min_{P_{x_i, x_j}} \int_0^1 D\big(P_{x_i, x_j}(t)\big)\, \big\|\dot{P}_{x_i, x_j}(t)\big\| \, dt \tag{8}$$

where $P_{x_i, x_j}(t)$ is a connected path between pixels *xi* and *xj*, with *t* ∈ [0, 1]. The density function *D*(*x*) increments the distance and can be computed using Equation (9).

$$D(x) = e^{\frac{E(x)}{\upsilon}}, \quad E(x) = \frac{\|\nabla I\|}{G_{\sigma} * \|\nabla I\| + \gamma} \tag{9}$$

where *υ* is a scaling factor and *E*(*x*) is the edge measurement, which also normalizes the gradient magnitude of image *I*. *Gσ* is the Gaussian function with standard deviation *σ*, and *γ* minimizes the effect of weak intensity boundaries on the density function. *D*(*x*) yields a constant distance in homogeneous regions: if *E*(*x*) is zero, *D*(*x*) becomes one.

*Rationale of Consideration*: For shape analysis by computing distances, the geodesic distance has been the natural choice. However, computing the geodesic distance is computationally expensive and susceptible to noise [44]. Therefore, to overcome the effect of noise, the geodesic distance should be combined with Euclidean properties to retain the maximum possible information in terms of the minimum distance among pixels and their relevant angles.
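A discrete approximation of Equation (8) can be sketched as a shortest-path search over the pixel grid. The sketch below is an illustration under several assumptions rather than the implementation used in this study: it takes a precomputed density map *D*(*x*) as nested lists, connects each pixel to its 8 neighbors, weights each step by the average density of its two endpoints times the step length, and uses Dijkstra's algorithm. The function name `geodesic_distance` is hypothetical.

```python
import heapq
import math


def geodesic_distance(density, src, dst):
    """Approximate Equation (8) on an 8-connected grid with Dijkstra's algorithm.

    density: 2D nested list holding D(x) for every pixel (assumed precomputed).
    src, dst: (row, col) tuples.
    """
    rows, cols = len(density), len(density[0])
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        dist, (r, c) = heapq.heappop(heap)
        if dist > best.get((r, c), math.inf):
            continue  # stale heap entry
        if (r, c) == dst:
            return dist
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr == 0 and dc == 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                # Path length element times the mean density along the step.
                step = math.hypot(dr, dc)
                cost = 0.5 * (density[r][c] + density[nr][nc]) * step
                new_dist = dist + cost
                if new_dist < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = new_dist
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    return math.inf


# Homogeneous region: D(x) = 1 everywhere, so the path is a straight line.
flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# A strong edge in the middle column forces the path to go around it.
edge = [[1, 100, 1], [1, 100, 1], [1, 1, 1]]
flat_dist = geodesic_distance(flat, (0, 0), (0, 2))
edge_dist = geodesic_distance(edge, (0, 0), (0, 2))
```

In the homogeneous case the result coincides with the straight-line length, whereas the high-density column makes the detour around the edge cheaper than crossing it.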

The distance measures discussed above identify similarity among pixels based on pixel proximity and provide different functional features, including extraction of information based on the 4-connected and 8-connected pixel neighborhoods and incorporation of geometric flows to keep track of the angular movements of image pixels. However, none of these similarity measures provides a balanced equation that integrates optimal boundary extraction based on connected neighbors with their angular movements. Thus, we hypothesize that boundary extraction will be more accurate and intricate in the presence of a similarity measure that provides greater information about the spatial component, supplied by neighborhood pixels, along with geometric flows.
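The closed-form measures in Equations (4) to (7) can be sketched in Python as follows. The function names are hypothetical, pixels are assumed to be plain coordinate tuples, and the Minkowski distance is assumed to sum over the vector components as in its standard definition.

```python
import math


def chessboard(p1, p2):
    """Equation (4): largest coordinate-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(p1, p2))


def cosine_distance(p1, p2):
    """Equations (5) and (6): one minus the cosine of the angle between vectors."""
    dot = sum(a * b for a, b in zip(p1, p2))
    norm1 = math.sqrt(sum(a * a for a in p1))
    norm2 = math.sqrt(sum(b * b for b in p2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1 - dot / (norm1 * norm2)


def minkowski(p1, p2, mu):
    """Equation (7); mu = 1 is Manhattan, mu = 2 Euclidean, mu = inf chessboard."""
    if math.isinf(mu):
        return chessboard(p1, p2)
    return sum(abs(a - b) ** mu for a, b in zip(p1, p2)) ** (1 / mu)


p1, p2 = (0, 0), (3, 4)
d_chess = chessboard(p1, p2)
d_manhattan = minkowski(p1, p2, 1)
d_euclid = minkowski(p1, p2, 2)
d_cheby = minkowski(p1, p2, math.inf)
d_orthogonal = cosine_distance((1, 0), (0, 1))
d_parallel = cosine_distance((1, 2), (2, 4))
```

The last two values illustrate the limitation noted above: parallel vectors of different magnitude have a cosine distance of zero.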

#### *3.5. SLIC++ Proposal*

#### 3.5.1. Euclidean Geodesic—Content-Aware Similarity Measure

Considering simplicity and fast computation as critical components of segmentation, the proposed algorithm uses a fusion of the Euclidean and geodesic distance measures. The Euclidean and geodesic similarities are depicted in Figure 1, where the straight line shows Euclidean similarity and the curved line shows geodesic similarity. Using only the Euclidean similarity loses context information because of the straight-line distance, whereas the geodesic similarity focuses more on the actual possible path along the pixels. We therefore propose the fusion of both similarities to extract accurate information about image pixels and their associations.

**Figure 1.** Irrelevance of Euclidean distance measure for super-pixel creation relating to image content.

Using the same logic as SLIC, we propose a normalized similarity measure. The normalization is based on the interval *S* between the pixel cluster centers. To provide control over super-pixel compactness, the same variable *m* is also used. The contributions of the Euclidean and geodesic distances to the final similarity measure that yield optimized performance cannot be determined beforehand. Hence, we have introduced two weight parameters, so that the final similarity measure is a weighted combination of the Euclidean and geodesic distances. The proposed similarity measure is presented in Equation (10).

$$D_{ca} = w_1 \left( d_1 + \frac{m}{S} d_2 \right) + w_2 \left( d_3 + \frac{m}{S} d_4 \right) \tag{10}$$

where *Dca* is the content-aware distance measure, and *d*1 and *d*2 are the same as *dc* and *ds* (Equations (2) and (1)), i.e., the Euclidean distances for the color and spatial components of the image pixels. Variables *d*3 and *d*4 are the color and spatial component distances calculated using the geodesic distance of Equation (8). Specifically, *d*3 represents the geodesic calculation for the color components of the image pixel and *d*4 represents the geodesic calculation for the spatial components. Here again, we apply the same normalization as SLIC, using variables *S* and *m*, to control super-pixel compactness using geometric flows. The weights *w*1 and *w*2 further give the user control over the contributions of the Euclidean and geodesic distances to the final segmentation. These weights provide flexibility, and their values can be changed based on the application. Moreover, these weights can be further tuned in future studies.
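Equation (10) reduces to a weighted sum once the four component distances are available. The following Python sketch assumes that *d*1 to *d*4 have already been computed elsewhere (for example, *d*3 and *d*4 by some geodesic routine) and uses a hypothetical function name; the default weights are the values of *w*1 and *w*2 reported in Section 4.2.

```python
def content_aware_distance(d1, d2, d3, d4, m, S, w1=0.3175, w2=0.6825):
    """Weighted fusion of Euclidean and geodesic terms as in Equation (10).

    d1, d2: Euclidean component distances (assumed precomputed).
    d3, d4: geodesic component distances (assumed precomputed).
    m, S:   compactness factor and grid interval, as in SLIC.
    """
    euclidean_term = d1 + (m / S) * d2
    geodesic_term = d3 + (m / S) * d4
    return w1 * euclidean_term + w2 * geodesic_term


# Equal weights: the result is the mean of the two normalized terms.
balanced = content_aware_distance(1, 2, 3, 4, m=10, S=10, w1=0.5, w2=0.5)
# w1 = 1 and w2 = 0 fall back to the purely Euclidean term.
euclid_only = content_aware_distance(1, 2, 3, 4, m=10, S=10, w1=1.0, w2=0.0)
```

Setting *w*1 = 1 and *w*2 = 0 recovers the form of Equation (3), which makes the fused measure easy to compare against basic SLIC.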

#### 3.5.2. Proposal of Content-Aware Feature Infusion in SLIC

SLIC++ is proposed to extract the optimal information from a visual scene captured in semi-dark scenarios. Nevertheless, the same algorithm holds for any type of image if the objective is to retrieve maximum information from the image space. The steps involved in computing super-pixels are given in the SLIC++ algorithm (refer to Algorithm 1). Essentially, super-pixels are perceptual clusters computed based on pixel proximity and color intensities. The parameters include *K*, the number of super-pixels; *N*, the total number of pixels; *A*, the approximate number of pixels in a super-pixel, also called its area; and *S*, the length of a super-pixel.

#### **Algorithm 1. SLIC++ Algorithm**


6: Assign the pixels from the 2*S* × 2*S* square window or CVT using the distance measure given by Equation (10).

7: *End for*


Keeping simplicity and fast computation intact, we present the SLIC++ algorithm, in which only one of steps 5 and 6 is used. If step 5 is implemented, i.e., the distance measure given by Equation (3) is used, the entire functionality of the SLIC algorithm is implemented. If step 6 is implemented, i.e., the distance measure given by Equation (10) is used, the entire functionality of the SLIC++ algorithm is implemented.

#### a. Initialization and Termination

For initialization, a grid of initial points is created, separated by distance *S* in each direction, as seen in Figure 2. The number of initial centers is given by the parameter *K*. Placing an initial center in the restricted square grid can result in an error if the center falls on an edge of the image content. Such an initial center is termed a confused center. To overcome this error, the gradient of the image is computed and the cluster center is moved in the direction of minimum gradient. The gradient is computed with the 4-neighboring pixels, and the centroid is moved accordingly. Mathematically, the L2 norm distance is computed among the four connected neighbors of the center pixel. The gradient calculation is given by Equation (11).

$$G(x, y) = \|I(x + 1, y) - I(x - 1, y)\|^2 + \|I(x, y + 1) - I(x, y - 1)\|^2 \tag{11}$$

*G*(*x*, *y*) is the gradient of the center pixel under consideration, and *I*(*x*, *y*) denotes the image value at pixel position (*x*, *y*).

The gradient of the image pixels is calculated until stability is reached, i.e., until pixels stop changing clusters based on the initialized clusters. Overall, termination and optimization are controlled by the parameter '*n*', which represents the number of iterations the overall SLIC algorithm goes through before finally producing the super-pixels of the image. To keep uniformity in the presented research, we have selected '*n*' as 10, which has been common practice [11,14,29,32].
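The seed correction can be sketched as follows. The sketch assumes a single-channel image stored as nested lists indexed as `img[y][x]`, evaluates Equation (11) on interior pixels only, and searches the 3 × 3 neighborhood of the seed for the lowest gradient, which is the convention of the original SLIC algorithm and an assumption here. The function names are hypothetical.

```python
def gradient(img, x, y):
    """Equation (11) for a single-channel image; (x, y) must be an interior pixel."""
    gx = img[y][x + 1] - img[y][x - 1]
    gy = img[y + 1][x] - img[y - 1][x]
    return gx * gx + gy * gy


def move_seed(img, x, y):
    """Move a seed to the lowest-gradient interior pixel in its 3 x 3 neighborhood."""
    height, width = len(img), len(img[0])
    best_pos, best_grad = (x, y), None
    for ny in range(max(1, y - 1), min(height - 2, y + 1) + 1):
        for nx in range(max(1, x - 1), min(width - 2, x + 1) + 1):
            g = gradient(img, nx, ny)
            if best_grad is None or g < best_grad:
                best_grad, best_pos = g, (nx, ny)
    return best_pos


# 5 x 5 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 10, 10] for _ in range(5)]
on_edge = gradient(image, 2, 2)     # the seed at (2, 2) touches the edge
new_seed = move_seed(image, 2, 2)   # moved away from the edge
```

In this toy image the seed at (2, 2) has a non-zero gradient and is moved to a flat pixel in column 1, which mirrors the handling of a confused center.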

**Figure 2.** Restricted image search area for super-pixel creation specified by the input argument for the image window under consideration [31].

#### **How it works?**

The incoming image is converted to CIELAB space. The user provides all the initialization parameters, including '*K*', '*m*', and '*n*'. The steps below refer to the SLIC++ algorithm. Step 1 places the *K* super-pixel seeds provided by the user on an equidistant grid. The grid spacing is *S*, where $S = \sqrt{N/K}$ and *N* is the total number of image pixels. Step 2 reallocates the initial seeds subject to the gradient condition, to overcome the effect of initial centers placed over edge pixels in the image. Steps 3 through 7 are executed iteratively until the image pixels stop changing clusters based on the cluster centers/seeds. Step 5 or step 6 is chosen for the implementation of SLIC or SLIC++, respectively. Both steps perform clustering over the image pixels, but with different distance measures. If the user opts for SLIC, the Euclidean distance measure is used (base functionality). If the user opts for SLIC++, the proposed hybrid distance measure is used. Step 8 checks whether the new cluster center after each clustering iteration differs from the previous center (distance between previous centers and recomputed centers). Step 9 keeps track of the threshold value for iterations as specified by the parameter '*n*'. Step 10 enforces connectivity among the created super-pixels/clusters of image pixels.
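Step 1 can be illustrated with a short Python sketch that places seeds on a regular grid. It assumes that the grid interval is the square root of *N*/*K* rounded to the nearest pixel and that seeds sit at the centers of the grid cells; the function name `seed_grid` is hypothetical, and the actual number of seeds may differ slightly from *K* when the image size is not a multiple of the interval.

```python
import math


def seed_grid(height, width, k):
    """Return the grid interval and the (row, col) seeds of a regular grid."""
    n_pixels = height * width
    # Grid interval: square root of N / K, rounded to whole pixels (assumption).
    step = max(1, int(round(math.sqrt(n_pixels / k))))
    half = step // 2
    seeds = [
        (row, col)
        for row in range(half, height, step)
        for col in range(half, width, step)
    ]
    return step, seeds


# A 100 x 100 image with K = 100 gives a 10-pixel interval and 100 seeds.
interval, seeds = seed_grid(100, 100, 100)
```

For a 321 × 481 image and *K* = 500, the same function yields an interval of 18 pixels under these assumptions.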

The only difference between the implementations of SLIC and SLIC++ lies in the distance measure used for the computation of image super-pixels. The presented research shows that merely changing the distance measure to a content-aware distance measure leads to better accuracy against the ground truth for semi-dark images.

#### b. Algorithm Complexity

The proposed algorithm follows the same steps as basic SLIC and only introduces a new content-aware distance equation; thus, the complexity of the proposed SLIC++ remains the same, without the addition of any new parameters except the weights associated with the Euclidean and geodesic distances. These weights are merely scalar values taken into account in the core implementation of the content-aware variant of SLIC, i.e., SLIC++. Hence, the complexity of the pixel manipulation is up to *O*(*N*), where *N* is the total number of image pixels. With the minimum possible computational requirements, SLIC++ finds an accurate balance by fusing the functionality of the Euclidean and geodesic distances. This fusion results in optimal boundary detection, verified in terms of precision-recall in Section 4.

#### **4. Validation of the Proposed Algorithm**

#### *4.1. Experimental Setup and Implementation Details*

Following the proposed algorithm and the details of the implementation scheme, SLIC++ is implemented in MATLAB. The benchmarking analysis and experiments are conducted in MATLAB version R2020a using the core computer vision and machine learning toolboxes. For the experiments, the semi-dark images of the Berkeley dataset have been used. The reported experiments are conducted on a machine with a Core i7-10750H CPU, 16 GB RAM, and a 64-bit operating system.

The images are read from a folder using the fullfile method, and the incoming RGB images are then converted to CIELAB space. After that, parameter initialization takes place to get the algorithm started. Based on the value of *K*, seeds are initialized on the CIELAB image space, and the gradient condition is checked using several built-in methods. Each pixel is then processed using the proposed similarity measure, and super-pixels are created until the threshold specified by the user is reached. Similarly, the performance of the reported state-of-the-art methods is checked in the same environmental setup using the relevant parameters. Finally, the boundary performance is reported in the form of precision-recall measures to check the boundary adherence of the super-pixel methods, including Meanshift, SLIC, and SLIC++. For the precision-recall analysis, the bfscore method is used, which takes the segmented image and the ground-truth image, compares the extracted boundary with the ground-truth boundary, and returns the precision, recall, and score.

#### *4.2. Parameter Selection*

In this section we introduce the parameters associated with Meanshift, SLIC, and SLIC++. Starting with the proposed algorithm, SLIC++ uses several of the same parameters as basic SLIC. The scaling factor *m* is set to 10, the iteration threshold *n* is set to 10, and the parameter *S* is computed from the number of image pixels *N* divided by the user-defined number of super-pixels *K*. The variable *K* gives the user control over the number of super-pixels. More compact super-pixels are created as the value of *K* is increased, but this increases the computational overhead. We report the performance for four different values of *K*, i.e., 500, 1000, 1500, and 2000. All these parameters, including *m*, *n*, and *K*, are kept the same as for the basic SLIC experiments. However, there are two additional parameters associated with SLIC++, *w*1 and *w*2, whose values are set to 0.3175 and 0.6825, respectively. The weights were carefully picked through trial-and-error experimentation. The images were tested for a range of different weights. The weight values were varied to obtain weight ratios of 10:90, 30:70, 50:50, 70:30, and 90:10 for the Euclidean and geodesic distances, respectively. The ratio of 30:70 empirically retains the maximum number of perceptually meaningful super-pixels, resulting in the optimal performance against the ground truth. For the Meanshift implementation, the bandwidth parameter is set to 16 and 32, keeping the rest of the implementation parameters at their defaults. Table 2 shows the averaged performance of the proposed SLIC++ algorithm obtained by varying the weights for random test cases.


**Table 2.** Summary statistics of average performance of SLIC++ for varying weights.

The empirically optimized performance of SLIC++ at the 30:70 weight ratio for Euclidean and geodesic distance hybridization is tabulated in rows 4, 9, and 14 of Table 2 (formatted in bold and italics). The parameter values were set to *K* = 500, *m* = 10, and *n* = 10 for these experiments.
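To make the role of the weights concrete, the following Python sketch blends a Euclidean term with a geodesic term using the reported values of *w*1 and *w*2. It is an illustration under stated assumptions rather than the measure used in this study: the Euclidean term follows the standard SLIC distance, the geodesic term is approximated as the accumulated color change along a sampled path, the blend is assumed to be linear, and the pairing of *w*1 with the Euclidean term is inferred from the order of the weight ratios. All function names are hypothetical.

```python
import math

# w1 and w2 as reported in the text (Euclidean : geodesic, roughly 30:70).
W_EUCLIDEAN, W_GEODESIC = 0.3175, 0.6825


def slic_distance(lab_a, lab_b, xy_a, xy_b, s, m=10.0):
    """Standard SLIC distance: colour term plus normalised spatial term."""
    d_c = math.dist(lab_a, lab_b)   # colour distance in CIELAB
    d_s = math.dist(xy_a, xy_b)     # spatial distance in pixels
    return math.sqrt(d_c ** 2 + (d_s / s) ** 2 * m ** 2)


def path_geodesic(profile):
    """ASSUMPTION: geodesic cost taken as the accumulated colour change
    between consecutive CIELAB samples along a path from pixel to seed."""
    return sum(math.dist(p, q) for p, q in zip(profile, profile[1:]))


def hybrid_distance(d_euclidean, d_geodesic, w1=W_EUCLIDEAN, w2=W_GEODESIC):
    """ASSUMPTION: a linear blend of the two terms; the exact form of the
    SLIC++ similarity measure is the one defined in the paper."""
    return w1 * d_euclidean + w2 * d_geodesic
```

In this sketch, two pixels with identical endpoint colors but an intensity edge between them, e.g., the path `[(50, 0, 0), (90, 0, 0), (50, 0, 0)]`, have zero Euclidean color distance but a geodesic cost of 80, so the larger geodesic weight makes the blended distance sensitive to the content lying between a pixel and its seed.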

#### *4.3. Performance Analysis*

For performance analysis we considered two experimental setups: qualitative analysis and quantitative analysis. Initially, we extended SLIC with different distance measures and analyzed its performance to propose the most relevant distance measure for optimal boundary extension in semi-dark images. We then compared the proposed algorithm with state-of-the-art super-pixel segmentation algorithms. The details of the analysis are presented in the following sub-sections.

#### 4.3.1. Numeric Analysis of SLIC Extension with Different Distance Measures

For the detailed analysis of the proposed algorithm, we first compare the performance of basic SLIC with the SLIC variants proposed in this study. The evaluation is presented in the form of precision and recall. For optimal boundary detection, high precision is required. A high precision rate corresponds to a low number of false positives, and hence a high chance of accurate boundary retrieval, whereas a high recall rate corresponds to a large share of the ground-truth boundaries being matched by the segmented boundary. Mathematically, precision is the probability that a result is valid, and recall is the probability that ground-truth data is detected [42]. For the analysis of image segmentation modules, both high precision and high recall are required to ensure maximum information retrieval [45].
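The precision and recall definitions above can be sketched for boundary maps as follows. This Python example is a simplified analogue of a boundary F-score, not the bfscore routine used in the experiments; the function name `boundary_prf`, the representation of boundaries as sets of (row, column) pixels, and the fixed pixel tolerance are assumptions made for illustration.

```python
import math


def boundary_prf(predicted, ground_truth, tolerance=2.0):
    """Boundary precision, recall and F-score.

    `predicted` and `ground_truth` are collections of (row, col) boundary
    pixels. A pixel counts as matched when a pixel of the other boundary
    lies within `tolerance` pixels of it (ASSUMPTION: fixed tolerance).
    """
    def matched(points, reference):
        return sum(
            1 for p in points
            if any(math.dist(p, q) <= tolerance for q in reference)
        )

    # Precision: share of predicted boundary pixels that are valid.
    precision = matched(predicted, ground_truth) / len(predicted) if predicted else 0.0
    # Recall: share of ground-truth boundary pixels that were detected.
    recall = matched(ground_truth, predicted) / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    score = 2 * precision * recall / total if total else 0.0
    return precision, recall, score
```

For instance, a prediction that contains the full ground-truth boundary plus an equally long spurious boundary far from it has a recall of 1.0 but a precision of 0.5, which lowers the combined score to about 0.67.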

Table 3 shows performance analysis of basic SLIC and its variants over randomly picked semi-dark images.


**Table 3.** Performance analysis of SLIC extensions.

Table 3 shows that all the extensions of SLIC perform better in terms of precision-recall. The parameters are kept uniform across all experiments, specifically the parameters *m* and *n*, as in SLIC [11]. Moreover, SLIC++ yields a gain of 3–9% in precision compared with the basic SLIC algorithm. The corresponding scores based on precision and recall also increase by a margin of 5–9% with SLIC++ (row 1 vs. 6 and row 7 vs. 12). However, the performance of the other SLIC variants is subject to the dimensions of the incoming data, magnitudes, and memory overload. So far, there is no consensus regarding the best general performer among similarity measures [44]. Thus, we propose an integration of two similarity measures that requires minimal processing resources and still provides optimal boundary detection.

To extend the analysis of SLIC versus SLIC++ in more detail, we reuse the same test cases and vary the number of super-pixels. The precision, recall, and score graphs are shown in Figure 3.

In Figure 3, solid lines represent the performance of SLIC and SLIC++ for test case 1, and dashed lines represent the performance for test case 2. Figure 3a shows that the precision curves of SLIC++ are substantially better than those of SLIC, which are shown as dark and light brown lines for test cases 1 and 2, respectively. Figure 3b shows that the recall of SLIC++ is lower than the recall of SLIC for the same images. Overall, based on the precision, recall, and final scores, SLIC++ outperforms basic SLIC on semi-dark images. With the number of super-pixels set to 1000, a drop is observed in the precision and recall of SLIC++; this behavior can be attributed to the accuracy measure's intolerance, i.e., even mutual refinements may result in low precision and recall values [45]. Nevertheless, retrieval performance increases with an increasing number of super-pixels, and SLIC++ outperforms SLIC by a margin of up to 10%.

**Figure 3.** SLIC vs. SLIC++ performance over different numbers of super-pixels: (**a**) precision values; (**b**) recall values; (**c**) score values.

#### 4.3.2. Comparative Analysis with State-of-the-Art

To benchmark SLIC++, two different algorithms, i.e., SLIC and Meanshift, are considered. To investigate the performance of SLIC and SLIC++ over the entire set of semi-dark images in the Berkeley dataset, we set the number of super-pixels to 1500. This value is chosen because both algorithms reach their peak performance at 1500 super-pixels in the experiments for test cases 1 and 2 (refer to Figure 3). For the comparative analysis we also used the Meanshift algorithm with its input parameter, i.e., the bandwidth, set to 32. The bandwidth of Meanshift determines the complexity of the algorithm: as this value is decreased, the segmentation becomes more intricate at the cost of additional computational complexity. The parameters are chosen to keep the use of computational resources uniform throughout the experiment. The summary statistics of the obtained super-pixel segmentation results are shown in Table 4. The values presented in the table are the precision, recall, and scores obtained separately for each of the 316 images and then averaged.

**Table 4.** Summary statistics of average performance for Berkeley dataset.


Table 4 shows that SLIC++ achieves an average score of up to 54%, whereas SLIC maintains a score of 47%. Meanshift achieves a score of 55%, which is greater than that of SLIC++; however, as stated earlier, segmentation applications require high values of both precision and recall. Comparing the recall of SLIC++ with that of Meanshift reveals a large difference. The low recall of Meanshift means that the algorithm fails to capture salient image structure [45], which is not desired for semi-dark image segmentation.

#### 4.3.3. Boundary Precision Visualization against Ground-Truth

To validate the point of view relating to high precision and high recall, we present perceptual results of Meanshift, SLIC, and SLIC++. Note that high precision means the algorithm has retrieved most of the boundary as presented by the ground-truth, whereas high recall means most of the salient structural information is retrieved from the visual scene. Meanshift resulted in the lowest recall, which suggests that structural information was lost. Table 5 presents how Meanshift, SLIC, and SLIC++ performed in terms of perceptual results for visual information retrieval. The reported results are for *K* = 1500 for SLIC and SLIC++ and bandwidth = 32 for Meanshift.

**Table 5.** Semi-dark perceptual results conforming boundary retrieval.

Super-pixels are not just about boundary detection; downstream applications also expect the structural information present in the visual scene. Consequently, we are interested not only in the object boundaries but also in the small structural details present in the image, specifically in semi-dark images. Table 5 shows that SLIC and SLIC++ not only retrieve boundaries correctly with minimal computational power but also retrieve the structural information. Column 4 shows this by mapping the prediction over the ground-truth image. For test case 1, Meanshift (column 4, row id 3) fails to extract the structural information, as only a few green lines are observed, whereas for the same image SLIC and SLIC++ perform better, as many green textured lines are observed (refer to column 4, row ids 1 and 2). For test case 2, all three algorithms perform comparably. Similar behavior is observed for test case 3, where SLIC and SLIC++ retain structural information better than Meanshift. Since Meanshift resulted in the lowest recall over the entire semi-dark Berkeley dataset (refer to Table 4), it is not a good fit for super-pixel segmentation. The reason is that its retrieval of structural information is less reliable and its performance is highly dependent on the input images.

#### 4.3.4. Visualizing Super-Pixels on Images

As a further layer of subjective analysis of super-pixel performance, we present super-pixel masks in this section. First, we present the input image in Figure 4, with highlighted boxes marking the regions examined closely for the retrieval of structural information. Here, the red box shows the texture information present on the hill, whereas the green box shows water flowing in a very dark region of the semi-dark image.

**Figure 4.** Input image with highlighted regions for detailed analysis.

Using the input image presented in Figure 4, we conducted experiments by changing the initialization parameters of all three algorithms. Table 6 shows the perceptual analysis visualizing the retrieval of salient structural information.

Table 6 shows that Meanshift extracts the boundaries correctly but loses all the contextual information when the bandwidth parameter is set to 32. This loss of information corresponds to its low recall scores. Decreasing the bandwidth increases the computational complexity, and at the cost of this additional complexity Meanshift then retrieves contextual information. SLIC and SLIC++ retain structural information with minimal computational power, as seen in the red and green boxes in rows 1 and 2 of Table 5. Moreover, as the number of super-pixels *K* increases, more structural information is retrieved.
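The effect of the bandwidth on the granularity of Meanshift can be illustrated with a toy example. The Python sketch below runs a flat-kernel mean shift on one-dimensional intensity values; it is an assumption-laden simplification, not the Meanshift implementation used in the experiments, which operates on images. The function name `mean_shift_modes` and the sample values are hypothetical.

```python
def mean_shift_modes(values, bandwidth, max_iter=100, tol=1e-9):
    """Flat-kernel mean shift on 1D data; returns the distinct modes found.

    Each point is moved repeatedly to the mean of all data points within
    `bandwidth` of its current position until it stops moving.
    """
    modes = []
    for x in values:
        for _ in range(max_iter):
            window = [v for v in values if abs(v - x) <= bandwidth]
            shifted = sum(window) / len(window)
            if abs(shifted - x) < tol:
                break
            x = shifted
        # Treat converged positions closer than half a bandwidth as one mode.
        if not any(abs(x - m) <= bandwidth / 2 for m in modes):
            modes.append(x)
    return modes
```

With the sample values `[1, 2, 3, 20, 21, 22, 40, 41, 42]`, a bandwidth of 5 yields three modes, one per intensity group, whereas a bandwidth of 32 merges everything into a single mode. This mirrors the observation above: a large bandwidth gives a coarse result that discards fine structure, and a smaller bandwidth recovers it at the cost of more regions to process.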

Figure 5 shows a zoomed-in view of the super-pixels created by SLIC and SLIC++ inside the red box. Here, we can see that SLIC++ retrieves content-aware information, whereas SLIC ends up creating circular super-pixels (Figure 5a) because the distance measure it uses is content-irrelevant.
