Article

Enhanced Framework for Lossless Image Compression Using Image Segmentation and a Novel Dynamic Bit-Level Encoding Algorithm

1 Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Kirikkale University, Kirikkale 71450, Turkey
2 Department of Computer Engineering, Graduate School of Natural and Applied Sciences, Kirikkale University, Kirikkale 71450, Turkey
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(6), 2964; https://doi.org/10.3390/app15062964
Submission received: 7 February 2025 / Revised: 7 March 2025 / Accepted: 8 March 2025 / Published: 10 March 2025
(This article belongs to the Special Issue Advanced Digital Signal Processing and Its Applications)

Abstract

This study proposes a dynamic bit-level encoding algorithm (DEA) and introduces the S+DEA compression framework, which enhances compression efficiency by integrating the DEA with image segmentation as a preprocessing step. The novel approaches were validated on four different datasets, demonstrating strong performance and broad applicability. A dedicated data structure was developed to facilitate lossless storage and precise reconstruction of compressed data, ensuring data integrity throughout the process. The evaluation results showed that the DEA outperformed all benchmark encoding algorithms, achieving an improvement percentage (IP) value of 45.12, indicating its effectiveness as a highly efficient encoding method. Moreover, the S+DEA compression algorithm demonstrated significant improvements in compression efficiency. It consistently outperformed BPG, JPEG-LS, and JPEG2000 across three datasets. While it performed slightly worse than JPEG-LS in medical images, it remained competitive overall. A dataset-specific analysis revealed that in medical images, the S+DEA performed close to the DEA, suggesting that segmentation alone does not enhance compression in this domain. This emphasizes the importance of exploring alternative preprocessing techniques to enhance the DEA’s performance in medical imaging applications. The experimental results demonstrate that the DEA and S+DEA offer competitive encoding and compression capabilities, making them promising alternatives to existing frameworks.

1. Introduction

Images serve as significant representational entities, functioning as versatile information carriers in various forms, including television, satellite imagery, medical imaging, and computer-stored visuals [1]. Digital images, generated by sampling and digitizing a 2D light intensity signal, inherently contain large volumes of data. This often poses significant challenges in terms of storage and transmission. Image compression stands as a fundamental technique aimed at addressing this challenge by minimizing the quantity of data required to represent an image, thereby optimizing storage and transmission efficiency. Built upon a strong foundation in the domains of signal and image processing, image compression has gained heightened importance, particularly in addressing bandwidth and memory constraints with the rapid expansion of digital image data [2,3]. As a result, a variety of advanced image compression techniques have been developed to enhance storage capacity and streamline bandwidth utilization during transmission [4].
There is a crucial aspect to highlight at this stage. The compression algorithms to be utilized may vary depending on whether the purpose is image storage or transmission. When the goal is image transmission, real-time processing constraints become a critical consideration. Conversely, when the goal is image storage, often as part of an archiving process, there are generally no significant restrictions on the compression algorithms that can be applied. The fundamental difference between these two approaches lies in the fact that during the storage process, algorithms are not required to operate in real time. As a result, there is no need to buffer the data obtained at the end of the encoding process to match the transmission speed of the communication channel [5].
The compression algorithm applied to an image varies not only based on storage or transmission purposes but also according to the structure of the image and its intended use. To meet these objectives, image compression techniques are broadly classified in the literature into two categories: lossless and lossy compression [6,7]. In lossless image compression methods, the compressed image must be an exact copy of the original image. Lossless compression algorithms are employed in scenarios where any loss or alteration of data is unacceptable. Notable examples include medical imaging [8,9], thermal imagery acquired through nano-satellite technology [10], and images used for critical applications such as remote sensing [11]. Lossy compression algorithms, a commonly employed alternative, allow for the intentional loss of certain information from the original image, with the primary goal of achieving significantly higher compression ratios. Images compressed using this method exhibit some degree of distortion compared to the original image. However, the success and efficiency of lossy compression algorithms are assessed based on the extent of data distortion, the achieved compression ratio, and the computational complexity of the algorithm [6,12,13]. Lossy compression algorithms are often preferred due to their ability to significantly reduce file sizes. This reduction is achieved by discarding certain image data deemed less critical to human visual perception. During this process, techniques such as approximation and quantization are applied to the image. However, the inherent information loss resulting from the algorithm’s design can pose challenges in applications such as pattern recognition or optical character recognition processes [14]. Lossy compression algorithms are commonly utilized for transmitting images over the web [15], processing data from IoT sensors [16] and EEG signals [17].
A comprehensive understanding of image compression requires an in-depth exploration of how digital images are represented within the field of computer science. Fundamentally, an image can be characterized as a two-dimensional (2-D) function f(x, y), where x and y denote spatial (plane) coordinates, and the amplitude of f at any given coordinate pair (x, y) is referred to as intensity, brightness, or gray level [18,19]. To enable processing, an image must first undergo digitization. In the case of digital images, the function is represented as f(i, j), where i and j are discrete integer values, as illustrated in Figure 1.
Images are classified into two categories: grayscale images and color images. A grayscale digital image is represented as an M × N pixel matrix, where the image captures only the intensity of light. Pixels in a digital image are typically represented as unsigned 8-bit integer values, ranging from 0 to 255. This range allows for the depiction of 256 discrete shades of gray, where 0 corresponds to black, 255 represents white, and the intermediate values indicate varying levels of gray intensity.
Color images are represented and modeled as linear combinations of the three primary colors: red, green, and blue, forming the foundation of what is known as a color space. Widely used color spaces in digital imaging include HSV (Hue, Saturation, Value), RGB (Red, Green, Blue), and CMYK (Cyan, Magenta, Yellow, Black). In an RGB image, each pixel is described by three values, one per channel, so the image is stored as three matrices of channel intensities. As each value occupies 8 bits, this results in a precision of 24 bits per pixel, widely referred to as the 24-bit color scheme.
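For illustration only (this snippet is not part of the proposed method), the following minimal Java fragment shows how the three 8-bit channel values of a 24-bit RGB pixel can be obtained from the packed integer returned by java.awt.image.BufferedImage.getRGB; the class and method names are placeholders.

import java.awt.image.BufferedImage;

public class PixelDemo {
    // Extracts the 8-bit R, G, and B components of one pixel from a packed (A)RGB value.
    static int[] rgbComponents(BufferedImage image, int x, int y) {
        int packed = image.getRGB(x, y);   // 0xAARRGGBB layout
        int r = (packed >> 16) & 0xFF;     // red channel, 0..255
        int g = (packed >> 8) & 0xFF;      // green channel, 0..255
        int b = packed & 0xFF;             // blue channel, 0..255
        return new int[] { r, g, b };
    }
}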
From the perspective of image representation, traditional image compression algorithms are based on pixel-level operations [20,21,22], entropy coding [23,24], block-based prediction [25,26], integer transformations [27,28], or combinations of these techniques [9,29,30]. The literature includes both lossy and lossless compression algorithms developed using these fundamental structures. Additionally, compression algorithms tailored to the specific characteristics of images have also been proposed [31,32].
This study aims to contribute to the literature by introducing an advanced lossless compression algorithm applicable to all types of images. Since the algorithm is developed at the pixel level, it is designed to be universally applicable. A preprocessing step is incorporated to improve performance and achieve higher compression ratios, even while maintaining a lossless approach. During this preprocessing phase, the image is divided into small segments containing similar color tones. To guarantee data integrity throughout the process, a specialized data structure is developed to preserve and encapsulate all information. After preprocessing, the resulting data structure is compressed using a newly developed DEA and S+DEA, designed to maximize compression efficiency.
Encoding and compression algorithms fulfill distinct yet interconnected roles in data processing. Encoding algorithms primarily structure data according to predefined rules, ensuring efficient representation without necessarily reducing its size. In contrast, compression algorithms are specifically designed to minimize data size by eliminating redundancy while retaining essential information. This process typically involves one or more preprocessing steps, which further enhance the overall efficiency of compression. As part of this study, a robust and efficient encoding algorithm, DEA, is initially developed. To enhance the performance and efficiency of the DEA, a preprocessing step in the form of segmentation is integrated into the algorithm, leading to the proposal of a novel compression algorithm, termed S+DEA. To assess and compare the effectiveness of the study, the DEA is evaluated against other encoding algorithms, while the S+DEA is compared with well-established lossless compression algorithms.
To our knowledge, the literature lacks an algorithm designed for all image characteristics that integrates image segmentation, stores the data in a specialized structure, and compresses them using a custom dynamic bit-level encoding algorithm. From a compression perspective, the limited application areas of lossless image compression techniques and the relatively lower success rates of existing algorithms served as the primary motivation for developing a novel dynamic encoding algorithm. A preprocessing step was introduced to leverage the algorithm’s inherent ability to perform better with similar color tones. In this step, the image was segmented based on similar color tones, and a specialized data structure was designed for efficient storage.
This study presents a significant advancement in lossless image compression by introducing an innovative dynamic bit-level encoding algorithm and a framework that enhances encoding efficiency through a combination of image segmentation and dynamic bit-level encoding.
The key contributions of this research can be summarized as follows:
  • A newly developed Dynamic Encoding Algorithm (DEA) is introduced into the literature, specifically designed to enhance compression efficiency. The algorithm builds upon the well-established Huffman Coding Algorithm (HCA) while incorporating a novel adaptive encoding mechanism, making it applicable not only to images but also to various data types. Its dynamic and scalable nature positions the DEA as a promising alternative to conventional encoding techniques,
  • A novel lossless compression framework, S+DEA, is developed, which integrates image segmentation based on similar color values with the DEA. By encoding segmented image regions separately, this approach optimizes compression efficiency, achieving better performance compared to existing methods,
  • A specialized data structure is designed to store segmented image regions, preserving them losslessly while enabling efficient reconstruction. Its flexibility allows for applications beyond compression, supporting various image processing applications and contributing to further advancements in the field.
The proposed approach introduces a fresh perspective on lossless image compression, addressing critical gaps in the literature and significantly contributing to the field. Experimental evaluations demonstrate that the developed framework consistently outperforms existing lossless compression algorithms, confirming its robustness and adaptability across various datasets. Given its effectiveness, this study provides a strong foundation for advancing lossless compression, with promising applications in both academic research and industrial settings.
The structure of this paper is organized as follows: Section 2 provides a review of related work in the field. Section 3 presents a comprehensive explanation of the proposed model and its underlying methodology. Section 4 details the experimental setup and offers an analysis of the results. Finally, Section 5 concludes the study, includes a discussion of the findings, and outlines potential directions for future research.

2. Related Work

Image compression plays a crucial role in multimedia processing and remains a significant research focus in computer science.
The fundamental steps of the compression process are depicted in Figure 2. Typically, compression algorithms incorporate a preprocessing stage before the encoding process to prepare the image and optimize performance. For lossless compression, restoring the compressed file to its original state is crucial. The decompression process begins with the decoding operation, followed by a post-processing stage that reverses the preprocessing steps to reconstruct the original image [33].
In this domain, the Huffman Coding Algorithm (HCA), Run-Length Encoding (RLE) [34], arithmetic coding [35], and Lempel–Ziv (LZ) coding serve as the foundation for numerous compression techniques based on coding schemes and are widely regarded for their efficiency and adaptability [6].
Among these algorithms, the HCA [23], an entropy-based method, stands out as the most renowned due to its widespread application in various file formats. It is commonly utilized in the preprocessing stages of lossless compression algorithms. The core principle of the HCA is to assign variable-length codes to input characters based on their frequency distribution. The output of the encoding process is a code table, where symbols from the source are assigned variable-length codes according to their frequencies. Each input character is uniquely represented by a bit string determined through a Huffman tree, constructed based on the frequency table. The widespread use of the HCA across various fields and in the literature can be attributed to its simplicity, ease of implementation, and the lack of patent restrictions. Additionally, the algorithm has been adapted in numerous ways in the literature, resulting in specialized versions tailored to specific applications such as Golomb code, adaptive Huffman code, non-binary Huffman code, etc. Moreover, the HCA is utilized at the core of widely used applications today, such as MP3, JPEG, and Deflate [36].
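As background only, the short Java sketch below builds a Huffman tree from a symbol-frequency table and derives the variable-length code words; it is a textbook illustration with placeholder names, not the implementation used in the algorithms discussed in this paper.

import java.util.*;

class HuffmanSketch {
    // A tree node: either a leaf holding a symbol or an internal node with two children.
    static final class Node {
        final int symbol;       // -1 for internal nodes
        final long freq;
        final Node left, right;
        Node(int symbol, long freq, Node left, Node right) {
            this.symbol = symbol; this.freq = freq; this.left = left; this.right = right;
        }
    }

    // Builds the Huffman tree and returns a map from symbol to binary code word.
    static Map<Integer, String> buildCodes(Map<Integer, Long> frequencies) {
        PriorityQueue<Node> queue = new PriorityQueue<>(Comparator.comparingLong((Node n) -> n.freq));
        for (Map.Entry<Integer, Long> e : frequencies.entrySet())
            queue.add(new Node(e.getKey(), e.getValue(), null, null));
        while (queue.size() > 1) {                     // repeatedly merge the two rarest nodes
            Node a = queue.poll(), b = queue.poll();
            queue.add(new Node(-1, a.freq + b.freq, a, b));
        }
        Map<Integer, String> codes = new HashMap<>();
        assign(queue.poll(), "", codes);
        return codes;
    }

    private static void assign(Node node, String prefix, Map<Integer, String> codes) {
        if (node == null) return;
        if (node.left == null && node.right == null) { // leaf: record the accumulated code word
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }
}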
Additionally, dictionary-based coding approaches are particularly advantageous for data with frequently repeating patterns. One of the most widely used algorithms in dictionary-based coding is the Lempel–Ziv (LZ) algorithm, which is applicable across a wide range of file formats. The LZ algorithm identifies frequently occurring patterns in the data and replaces them with shorter dictionary references. As expected, this method demonstrates greater efficiency with large datasets containing repetitive patterns. Two distinct versions of the LZ algorithm, known as LZ77 [37] and LZ78 [38], have been developed, differing in their methods of pattern detection and storage. In addition, a variant called Lempel–Ziv–Welch (LZW) [39] was developed. This version allows the encoder to create an adaptive dictionary, permitting the characterization of variable-length strings without any prior information about the input.
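The adaptive-dictionary idea behind LZW can be summarized with the following compact Java sketch (an illustrative textbook version with placeholder names, not an implementation taken from this study); the dictionary is seeded with all single-byte strings and grows as new patterns are encountered.

import java.util.*;

class LzwSketch {
    // Encodes the input as a list of dictionary indices, growing the dictionary adaptively.
    static List<Integer> encode(String input) {
        Map<String, Integer> dictionary = new HashMap<>();
        for (int c = 0; c < 256; c++) dictionary.put("" + (char) c, c);   // seed with single symbols
        List<Integer> output = new ArrayList<>();
        String current = "";
        for (char symbol : input.toCharArray()) {
            String extended = current + symbol;
            if (dictionary.containsKey(extended)) {
                current = extended;                            // keep extending the current match
            } else {
                output.add(dictionary.get(current));           // emit the longest known pattern
                dictionary.put(extended, dictionary.size());   // learn the new pattern
                current = "" + symbol;
            }
        }
        if (!current.isEmpty()) output.add(dictionary.get(current));
        return output;
    }
}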
Image segmentation is a crucial preprocessing step in image compression, as it divides an image into distinct regions or segments with meaningful characteristics. This approach facilitates targeted compression by distinguishing areas of varying importance, enabling higher compression rates in less significant regions while preserving critical details in essential segments. However, a review of the literature indicates that image segmentation algorithms are primarily applied in medical imaging, where they are predominantly used to distinguish regions of interest (ROIs) from non-ROI areas. In [40,41,42,43,44], medical images were segmented into ROI and non-ROI areas, with different compression algorithms, lossless for the ROIs and lossy for the non-ROIs, applied to the respective regions. Although limited in scope, some studies have applied compression processes following image segmentation [45,46,47].
The literature highlights a lack of versatile and universally applicable lossless compression algorithms, as most are designed for specific domains. This study addresses this gap by introducing (i) a dynamic encoding algorithm adaptable to all file formats, (ii) a flexible data structure supporting different image types, and (iii) a lossless compression framework integrating segmentation preprocessing for optimized performance. These contributions enhance versatility, efficiency, and applicability, setting a new benchmark in lossless compression.

3. Proposed Method

Image compression algorithms fundamentally consist of two core processes: encoding, which compresses the image data into a more compact representation, and decoding, which reconstructs the image from the compressed format. Lossless compression algorithms ensure no data loss, making them more precise and reliable than their lossy counterparts. This study presents an enhanced framework for lossless image compression that leverages image segmentation and a novel dynamic bit-level encoding algorithm. The block diagram of the proposed framework is illustrated in Figure 3.
The original image is first divided into segments using an image segmentation algorithm. At this stage, the most critical factor influencing the success of the compression algorithm is the criterion or parameters used to determine how the image is partitioned. To improve the efficiency of the encoding algorithm and achieve superior compression ratios, a segmentation algorithm that divides the image based on similar colors and textures was employed. To ensure compatibility with lossless compression, a specialized file format was developed for storing the segmented images. This format is critical for accurately reconstructing the original image.
The framework is detailed under three main sections: (a) Image Segmentation, (b) Packaging Technique for Image Segments, and (c) DEA.

3.1. Image Segmentation

The preprocessing stage of the lossless compression algorithm includes segmenting the image into distinct regions with similar color and texture features. The primary objective of this image segmentation phase is to group regions with similar colors or near-identical RGB values, ensuring optimal compression performance and efficiency. The literature offers a wide range of algorithms for image segmentation. In this study, the graph-based segmentation algorithm proposed by Felzenszwalb and Huttenlocher (FH) was chosen [48]. The algorithm uses a minimum-spanning-tree-based clustering approach for segmenting RGB images, with the Euclidean distance between pixels in color space serving as the dissimilarity measure. The primary reason for selecting this approach was that the FH algorithm generates over-segmented regions that align well with image boundaries, allowing even complex images to be divided into segments with high accuracy and efficient execution time [49]. However, these regions typically have irregular sizes and shapes. Despite appearing as a disadvantage at first, this irregularity allows for color-based segmentation of the entire image, making it the most appropriate approach for achieving better compression results.
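For orientation, the dissimilarity measure on which this kind of graph-based segmentation relies can be sketched as follows; the fragment only computes the Euclidean color distance used as an edge weight between neighboring pixels and is not the FH implementation employed in this study.

class ColorEdgeWeight {
    // Euclidean distance between two pixels in RGB space (values assumed to be in 0..255),
    // used as the edge weight in minimum-spanning-tree-based segmentation.
    static double weight(int r1, int g1, int b1, int r2, int g2, int b2) {
        int dr = r1 - r2, dg = g1 - g2, db = b1 - b2;
        return Math.sqrt((double) (dr * dr + dg * dg + db * db));
    }
}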
A notable example is the Baboon image from the USC-SIPI dataset, which is widely utilized in the literature for evaluating image compression algorithms. This image presents a significant challenge for compression due to its intricate textural characteristics and complex color distribution. In Figure 4, the original input image is labeled as Figure 4a, and the segmented images derived from it through the Felzenszwalb and Huttenlocher (FH) algorithm are labeled as Figure 4b–k. The FH algorithm generates segments that vary in size, shape, and geometric regularity, yet it effectively clusters these segments based on color tone similarities, fulfilling the intended segmentation criteria.
Lossless compression algorithms must guarantee the exact reconstruction of the original file. However, the irregular sizes and shapes of the image segments generated by the FH algorithm make direct compression infeasible and introduce a significant challenge during the reconstruction phase: determining the precise placement, orientation, and spatial configuration of each segment within the original image becomes a complex, non-trivial task, because the segments lack a consistent or predictable structure. To address this issue, a specialized data structure was designed to facilitate the lossless storage of image segments.

3.2. Packaging Technique for Image Segments (Data Structure)

Although the image segments vary in size and shape, they are effectively divided based on similar color and texture characteristics. To enable lossless storage of these segments before compression, a generic data structure was developed. This data structure is not limited to compression tasks but can also be applied to any scenario involving the storage of segmented images, offering a versatile solution to the literature.
At this stage, the most crucial consideration is minimizing the data size of the package. Without this, the benefits of compression could be offset by the size of the data package itself. To address this issue, the process was designed to function in a modular way by generating two distinct files, each optimized to occupy the smallest possible data size. For this purpose, two data structures, named “header” and “data”, were created for each image. The specifics of the first structure, referred to as “header”, are illustrated in Figure 5.
The details of the data stored in the "header" data structure are explained below. The first 1-byte field of this structure stores the number of segments (n) into which the image has been divided.
The next 12-byte blocks store the following information for each segment: y, number of lines, the first and second Huffman header sizes, and finally the compressed data size. For instance, as shown in Figure 5, the image is divided into three segments, so three such blocks are present; their number grows in direct proportion to the total number of segments in the image. Specifically, for each segment:
  • y: the segment’s starting position on the y-axis of the image (2 bytes),
  • number of lines: the number of rows occupied by the segment along the y-axis (2 bytes),
  • 1. Huffman header size: the length of the segment's first Huffman header (2 bytes—explained later),
  • 2. Huffman header size: the length of the segment's second Huffman header (2 bytes—explained later),
  • Compressed data size: the compressed data size of the segment (4 bytes).
The next 2-byte field stores the size of each segment along the x-axis for the corresponding y value. The x values and length values stored in this field allow the data for each row of the image to be recorded. As the y value progresses incrementally until the end of the image, its starting value is stored in the previous field to avoid redundancy. The developed data structure ensures that an image can be segmented into parts and accurately reconstructed from the segmented data without any loss. Furthermore, the header data structure remains uncompressed and serves as a reference guide for each compressed image.
The size of the header data structure was calculated for various images and different numbers of image segments. For instance, a 512 × 512 image with 10 segments occupies approximately 10 KB, while a 1024 × 1024 image with 10 segments occupies around 20 KB. With increasing pixel count and image dimensions, the header’s relative size becomes negligible after compression.
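As a rough sanity check of these figures (under the simplifying assumption that each of the 10 segments contributes one 2-byte row entry for roughly every row of the image), the header size is approximately

1 + 10 \times 12 + 10 \times 512 \times 2 \approx 10\,\text{KB} \quad \text{and} \quad 1 + 10 \times 12 + 10 \times 1024 \times 2 \approx 20\,\text{KB}.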
The details of the data structure named “Data” are presented in Figure 6. This structure contains two Huffman headers and the compressed data for each segment. The size of the field content in the data structure is entirely dynamic, as the size of the compressed data for each segment is not predetermined. Consequently, the “Header” data structure stores the Huffman header data size and compressed data size for each segment.
The developed data structure was specifically designed for the storage of image segments and their data. These data structures can also find applications in a wide range of fields that rely on image segmentation.
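To make the two structures concrete, a schematic Java rendering is given below; the field names mirror Figure 5 and Figure 6, but the classes themselves are illustrative placeholders rather than the code used in the study, and the per-row x-extent field reflects the row-wise layout described above.

import java.util.List;

// Per-segment bookkeeping kept in the uncompressed "header" file: 12 bytes per segment
// plus 2 bytes per row for the segment's extent along the x-axis.
record SegmentHeader(
        int yStart,                  // 2 bytes: starting position of the segment on the y-axis
        int numberOfLines,           // 2 bytes: number of rows occupied along the y-axis
        int firstHuffmanHeaderSize,  // 2 bytes: size of the segment's first Huffman header
        int secondHuffmanHeaderSize, // 2 bytes: size of the segment's second Huffman header
        long compressedDataSize,     // 4 bytes: size of the segment's compressed bitstream
        int[] rowExtents             // 2 bytes per row: x-axis size of the segment for each row
) {}

// Contents of the "data" file: two Huffman headers followed by the compressed bits per segment.
record SegmentData(byte[] firstHuffmanHeader, byte[] secondHuffmanHeader, byte[] compressedBits) {}

record PackagedImage(int segmentCount /* 1 byte */,
                     List<SegmentHeader> headers,
                     List<SegmentData> data) {}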

3.3. Dynamic Bit-Level Encoding Algorithm (DEA)

This part of the study introduces the novel dynamic encoding algorithm (DEA), developed to provide a flexible and universal solution for a variety of applications. To enhance clarity, the algorithm was divided into steps, which are illustrated in a flowchart presented in Figure 7. The phrase “Do for each segment” appears at the first step of the developed algorithm, accompanied by an asterisk indicating that the algorithm can also be applied to an entire image without segmentation. Furthermore, tests were performed both with and without segmentation to assess the algorithm’s performance in each scenario. All operations within the algorithm were executed for either the entire image or its individual segments. The developed encoding algorithm is described in greater detail at the end of this section, along with a small example to enhance understanding.
Step 1: Initially, the segmented or entire image is used as input, and the RGB values of each pixel within the data packet are extracted. For example, the red value ranges between 0 and 255. These values are converted into characters based on their corresponding ASCII codes; for instance, if the red value is 65, it is stored as "A", and if it is 99, it is stored as "c". Once all pixel values have been converted into characters, the algorithm calculates the frequency with which each character is immediately followed by another. This process aims to identify recurring consecutive characters and their associated frequencies. For each character, the subsequent characters are sorted in descending order based on their frequency values. The aim of this approach is to enable a more straightforward analysis by identifying how often specific characters are followed by others.
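A minimal Java sketch of this first step, assuming one color channel has already been flattened into an integer sequence, could look as follows (illustrative code with placeholder names, not the authors' implementation).

import java.util.*;

class FollowerFrequencies {
    // For every symbol, counts how often each other symbol immediately follows it and
    // returns the followers sorted by descending frequency, as required by the DEA.
    static Map<Integer, List<Map.Entry<Integer, Integer>>> build(int[] channelValues) {
        Map<Integer, Map<Integer, Integer>> counts = new HashMap<>();
        for (int i = 0; i + 1 < channelValues.length; i++) {
            counts.computeIfAbsent(channelValues[i], k -> new HashMap<>())
                  .merge(channelValues[i + 1], 1, Integer::sum);
        }
        Map<Integer, List<Map.Entry<Integer, Integer>>> sorted = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> entry : counts.entrySet()) {
            List<Map.Entry<Integer, Integer>> followers = new ArrayList<>(entry.getValue().entrySet());
            followers.sort((a, b) -> b.getValue() - a.getValue());   // most frequent follower first
            sorted.put(entry.getKey(), followers);
        }
        return sorted;
    }
}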
Step 2: The HCA is based on the fundamental principle of representing frequently occurring characters in a text with fewer bits, while less frequent characters are represented with longer bit sequences [23]. A critical aspect of the algorithm is calculating the number of palette entries, which ensures that characters frequently following a given character are assigned shorter bit lengths. Conversely, less frequent characters are allocated bit lengths that are efficient without being excessively long. This strategy optimizes the encoding process by balancing the bit-length allocation, thereby improving both compression efficiency and data representation. Therefore, it is crucial to divide the characters that follow each given character into two distinct groups. This approach constitutes a unique feature that distinguishes the developed encoding algorithm from prior studies in the literature. By dynamically determining the number of palette entries according to the specific characteristics of the input image, the proposed algorithm achieves a more efficient and versatile encoding solution, enhancing its adaptability and overall performance.
Since RGB values range between 0 and 255, the characters that can follow any given character can have a maximum of 255 different possibilities. Therefore, the condition Group 1 + Group 2 ≤ 255 must always be satisfied. Determining the optimal number of characters in Group 1 and Group 2 would require repeatedly executing the entire encoding process within a loop. However, this approach poses a significant risk of excessively long runtime. During the study, a specialized mathematical model was developed to calculate the optimal number of palette entries, eliminating the need to run the entire algorithm iteratively in a loop. While developing this equation, several variable parameters were considered, including the number of nodes in the Huffman tree and the bit lengths of characters at the leaf nodes. By incorporating these factors, the model ensures a balanced and efficient determination of palette entries, optimizing the encoding process.
The number of entries in Group1 is referred to as the “palette”, as it contains the most frequently used characters. Since different segments may have varying pixel values and distributions, the palette size also varies for each segment or image. Equation (1) presents the mathematical model designed to calculate the palette size by analyzing the frequency of characters that follow each character. This approach facilitates the efficient representation of frequently used characters, contributing to improved compression performance. By utilizing the palette size, the algorithm identifies the most frequently used characters and assigns them shorter bit lengths, enhancing compression efficiency.
Given that a_{Group 1} + b_{Group 2} \le 255,
(\log_2 a) \times (N + M \times \log_2 b) \quad (1)
In this context, N represents the total frequency of characters within the palette (Group 1), and M represents the total frequency of characters outside the palette (Group 2); the palette size is represented by a and the b value is determined as (255 − a).
At this step, the calculated palette size (a) represents the most frequently used characters. This value is dynamically computed based on the frequency distribution within the image or segment. The pseudocode for determining the palette size and classifying characters into Group 1 and Group 2 is presented in Algorithm 1.
Step 3: Based on the palette size determined in the previous step, the first and most frequently occurring group, referred to as Group 1, is processed using a dedicated HCA. Similarly, another HCA is applied to Group 2, which comprises the characters outside the palette (non-palette characters). The frequency value of the last element in Group 1 is set as the cumulative frequency of the characters in Group 2. During the encoding process, Group 1 is searched first. If the character is not found, the flag code (prefix code) value, the last element, is appended.
Step 4: After defining Group 1, the remaining characters are also stored in Group 2. This group consists of characters with lower frequency values, signifying that they appear less often following the specified character.
Algorithm 1. Pseudocode of the algorithm that calculates the number of palettes
function specifyElementsInPalette(alphabet) {
    minLogResult ← +∞
    elementsInPalette ← 1
    for i ← 1 to 255 do
        // "frequentlyRepeated" (see Step 3) splits the followers of every character into
        // Group 1 (the first i entries, i.e., the palette) and Group 2 (the remaining entries).
        currentElementsInPalette ← frequentlyRepeated(alphabet, i)
        logResult ← calculateLog(alphabet, i)
        if (logResult < minLogResult) then
            minLogResult ← logResult
            elementsInPalette ← i
    return elementsInPalette
}
function calculateLog(alphabet, paletteNumber) {
    totalFirstFrequency ← 0      // N: total frequency of followers inside the palette (Group 1)
    totalSecondFrequency ← 0     // M: total frequency of followers outside the palette (Group 2)
    for each currentKey in alphabet.keySet() do
        followers ← alphabet.get(currentKey)          // followers sorted by descending frequency
        for i ← 0 to followers.length − 1 do
            if (i < paletteNumber) then
                totalFirstFrequency ← totalFirstFrequency + followers[i].frequency
            else
                totalSecondFrequency ← totalSecondFrequency + followers[i].frequency
    log2a ← log(paletteNumber) / log(2)
    log2b ← log(255 − paletteNumber) / log(2)
    return log2a × (totalFirstFrequency + totalSecondFrequency × log2b)   // Equation (1)
}
Step 5: After dividing the characters that follow each character into Group 1 and Group 2, two different Huffman trees are generated for each character.
Step 6: At this stage, the encoding process begins by sequentially processing the entire alphabet from the beginning; a schematic code sketch of this loop is given after the following list.
  • Since no structure exists initially and there is no preceding character, the first character is directly encoded using its ASCII value and stored as an 8-bit representation.
  • For the next character, its bit value is searched in the Group 1 Huffman tree. If the character is found, it is encoded using the shortened code word.
  • If the character is not found, a prefix code (flag code), defined as a distinguishing feature between Group 1 and Group 2, is added. The character is then encoded using its code-word value from Group 2.
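Putting these steps together, the encoding loop can be sketched as follows; group1Codes, group2Codes, and flagCodes are hypothetical names for the two per-character Huffman code tables and the per-character flag (prefix) code word, and the fragment is intended only to mirror the bullet points above, not to reproduce the authors' implementation.

import java.util.Map;

class DeaEncoderSketch {
    // Schematic DEA encoding loop: the first symbol is stored as raw 8 bits; every later
    // symbol is coded from the Group 1 tree of its predecessor, or via flag code + Group 2 tree.
    static String encode(int[] symbols,
                         Map<Integer, Map<Integer, String>> group1Codes,
                         Map<Integer, Map<Integer, String>> group2Codes,
                         Map<Integer, String> flagCodes) {
        StringBuilder bits = new StringBuilder();
        bits.append(String.format("%8s", Integer.toBinaryString(symbols[0])).replace(' ', '0'));
        for (int i = 1; i < symbols.length; i++) {
            int previous = symbols[i - 1], current = symbols[i];
            String code = group1Codes.get(previous).get(current);
            if (code != null) {
                bits.append(code);                       // follower is in the palette (Group 1)
            } else {
                bits.append(flagCodes.get(previous));    // escape to Group 2 via the flag code
                bits.append(group2Codes.get(previous).get(current));
            }
        }
        return bits.toString();
    }
}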
To ensure a better understanding, the algorithm is not only presented through a flowchart and described step by step but is also illustrated with a small example below.

3.4. DEA Process Example

Given that the first pixel values of the blue color (B in RGB) within an image or image segment are 98, 99, and 100, respectively, their ASCII code equivalents would be b, c, and d. Let us encode the substring "bcdi…" found within the entire string and consider this part as the starting section of the string. A part of the table displaying the letters following "b", "c", and "d" in the image, together with their frequencies, is presented below. In this example, it is assumed that all the letters following these are represented in the table. Additionally, let us assume that the palette size is 12 for all characters (as if it had been calculated with Equation (1)).
Chars | Frequency table of the subsequent characters (character, frequency)
b | (`, 66)(a, 65)(d, 59)(^, 57)(e, 49)(], 47)(c, 47)(b, 45)([, 44)(g, 44)(i, 44)(_, 42)(f, 41)(\, 39)(k, 34)(X, 31)(h, 31)(U, 30)(V, 30)(W, 30)…
c | (c, 68)(`, 57)(d, 57)(b, 55)(a, 54)(e, 52)(], 47)(_, 47)(^, 46)(h, 46)(\, 44)(f, 44)(g, 41)(k, 36)(Y, 32)(j, 31)(Z, 30)(i, 28)(W, 26)(X, 26)…
d | (`, 69)(b, 60)(c, 59)(e, 56)(d, 52)(a, 50)(g, 50)(h, 45)(_, 40)(f, 39)(\, 35)(], 35)(l, 35)([, 34)(i, 32)(j, 32)(U, 31)(k, 29)(Y, 27)(m, 27)…
The table lists, for each of the letters "b", "c", and "d", the 20 most frequent letters that follow it together with their frequencies. For instance, it can be observed that the letter "a" appears 65 times after the letter "b" throughout the segment or entire image. It is also apparent that, for the character "d", the character "i" does not fall within the palette (Group 1) characters.
Groups | Char | Distribution of Group 1 and Group 2 characters for each character
Group 1 characters | b | (`, 66)(a, 65)(d, 59)(^, 57)(e, 49)(], 47)(c, 47)(b, 45)([, 44)(g, 44)(i, 44)(?, 1187)
Group 1 characters | c | (c, 68)(`, 57)(d, 57)(b, 55)(a, 54)(e, 52)(], 47)(_, 47)(^, 46)(h, 46)(\, 44)(?, 1206)
Group 1 characters | d | (`, 69)(b, 60)(c, 59)(e, 56)(d, 52)(a, 50)(g, 50)(h, 45)(_, 40)(f, 39)(\, 35)(?, 1137)
Group 2 characters | b | (_, 42)(f, 41)(\, 39)(k, 34)(X, 31)(h, 31)(U, 30)(V, 30)(W, 30)
Group 2 characters | c | (f, 44)(g, 41)(k, 36)(Y, 32)(j, 31)(Z, 30)(i, 28)(W, 26)(X, 26)
Group 2 characters | d | (], 35)(l, 35)([, 34)(i, 32)(j, 32)(U, 31)(k, 29)(Y, 27)(m, 27)
The frequency value of the last element in Group 1 is the sum of the frequencies of the elements that come after the palette number. This value is represented by the "?" character in this example. For the Group 1 and Group 2 sections of the letters b, c, and d, separate Huffman trees have already been generated, as described in Step 5. Let us now encode the "bcdi" characters.
(i). Since the letter "b" is the first character in the image, it is encoded according to its 8-bit ASCII-code representation.
(ii). The character "c" follows the character "b" and, as observed, since it belongs to Group 1, the code word created for the character "c" in the Group 1 Huffman tree is used.
(iii). The character "d" follows the character "c" and, since the character "d" also belongs to Group 1, the code word from the Group 1 Huffman tree is used for the character "d".
(iv). The situation is different for the character "i" that follows the character "d". When examining the tables for the character "d", it is observed that the character "i" belongs to Group 2, not Group 1. Therefore, we cannot write the code word for the character "i" from the Group 1 Huffman tree. First, the code word for the character "?" is added, followed immediately by the code word for the character "i" from the Group 2 Huffman tree.
In this manner, this encoding process is repeated iteratively until the original image or image segments have been completely processed. The flowchart presented in Figure 8 provides a detailed visual representation of the steps summarized above.

4. Experimental Design and Results

This section presents the datasets and evaluation metrics utilized to assess the developed algorithm, a comparison of the algorithm’s results, an analysis of the efficiency of the proposed model, and a discussion of the challenges and limitations encountered.

4.1. Development and Experimental Setup

The computational setup and software environment for the proposed method are as follows. Three machines were utilized for development, deployment, and validation:
  • Development machine: A home PC equipped with a 12th Gen Intel® Core™ i7-12700K CPU, 32 GB RAM, and a 1 TB SSD. The software environment on this machine included NetBeans IDE 24, Java 21.0.5, and Java SE Runtime Environment 21.0.5+9-LTS-239.
  • Deployment and test machine: A notebook PC configured with an Intel Core i7-4720HQ CPU, 16 GB RAM, and a 256 GB SSD. The software environment on this machine consisted of NetBeans IDE 22, Java 17.0.12, and Java SE Runtime Environment 17.0.12+8-LTS-286.
  • Additional deployment machine: A workstation featuring Intel Xeon E5-2620 v4 2.10 GHz processors (8 cores), 32 GB RAM, and a 1 TB SATA HDD. The software environment on this workstation included NetBeans IDE 22, Java 17.0.12, and Java SE Runtime Environment 17.0.12+8-LTS-286.

4.2. Dataset

A comprehensive evaluation of the developed algorithm was necessary within the scope of this study. Accordingly, four distinct datasets featuring images with different characteristics were employed. Six images were selected and evaluated from each dataset, and the thumbnails of the selected images are presented in Figure 9. The first dataset employed was the USC-SIPI dataset, which is widely recognized and commonly referenced in the literature [50]. The second dataset was an open-source collection of remote sensing images available on Kaggle, comprising 350 ultra high-resolution images (3099 × 2329) [51]. The Chest X-Ray Images dataset from Kaggle, containing 5863 X-Ray images, was chosen to assess the performance of the developed algorithm on medical image datasets [52]. The final dataset was the Agriculture Crop Images dataset, which includes images of five types of agricultural products: wheat, rice, sugarcane, maize, and jute [53].
Images with the attributes listed in Table 1 were chosen to evaluate the algorithm across a range of diverse characteristics, resolutions, and sizes. These included both grayscale and color images, as well as images with small and very large dimensions. The results obtained provided sufficient insights to assess and analyze the performance of the algorithm. In addition, the column titled “Name Tag” in the table contains a short identifier assigned to each image. This identifier is utilized in the subsequent steps of the study.

4.3. Evaluation Metrics

Various performance metrics have been established to evaluate the efficiency of compression algorithms. In this study, the effectiveness of the developed algorithm was assessed using commonly utilized metrics for evaluating lossless compression algorithms, as highlighted in the literature: Compression Ratio (CR), Bits Per Pixel (BPP), and Space Saving (SS) [54]. The definitions and corresponding equations for these metrics are detailed below.
The Compression Ratio (CR), as defined in Equation (2), is calculated as the ratio of the original image size to the compressed image size. A higher CR value signifies greater compression efficiency, indicating that the image has been more effectively reduced in size.
CR = \frac{\text{Original Image}_{size}}{\text{Compressed Image}_{size}} \quad (2)
Another key metric for evaluation is the Bits Per Pixel (BPP), as defined in Equation (3). This metric is calculated by dividing the size of the compressed image (in bits) by the total number of pixels in the image. Its primary purpose is to determine the average number of bits needed to represent each pixel. A lower BPP value indicates a more efficient compression, as fewer bits are required per pixel to represent the image.
BPP = \frac{\text{Compressed Image}_{size}}{\text{Total number of pixels in the image}} \quad (3)
The Space Saving (SS) value, closely related to the Compression Ratio (CR), is another widely utilized metric in recent studies [54]. As expressed in Equation (4), the SS value measures the reduction in size relative to the original size, offering a clear indication of the algorithm’s compression efficiency.
SS = 1 - \frac{\text{Compressed Image}_{size}}{\text{Original Image}_{size}} \quad (4)
The metrics detailed above served as comprehensive tools for evaluating the performance of the algorithms, offering a robust framework for assessing their overall effectiveness.
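For reference, all three metrics can be computed directly from the file sizes; the small helper below (sizes in bytes, pixel count as width × height) is an illustrative sketch rather than code from the study.

class CompressionMetrics {
    // Compression Ratio: original size divided by compressed size (Equation (2)).
    static double compressionRatio(long originalBytes, long compressedBytes) {
        return (double) originalBytes / compressedBytes;
    }

    // Bits Per Pixel: compressed size in bits divided by the number of pixels (Equation (3)).
    static double bitsPerPixel(long compressedBytes, long pixelCount) {
        return (compressedBytes * 8.0) / pixelCount;
    }

    // Space Saving: fraction of the original size removed by compression (Equation (4)).
    static double spaceSaving(long originalBytes, long compressedBytes) {
        return 1.0 - (double) compressedBytes / originalBytes;
    }
}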

4.4. Comparison Algorithms

Comparison algorithms refer to the benchmarking methods used to evaluate the performance of proposed algorithms against existing state-of-the-art techniques. These algorithms are typically well established in the literature and serve as a reference for fair performance assessment. Encoding algorithms primarily focus on transforming data into a structured format according to predefined rules. They do not necessarily reduce the size of the data; rather, they optimize its representation for various purposes, such as efficient storage, transmission, or security. Compression algorithms, in contrast, are specifically designed to reduce the size of the data by eliminating redundancy while preserving essential information, typically building on encoding algorithms. Compression can be either lossless (where the original data are perfectly restored) or lossy (where some data are discarded to achieve higher compression). While encoding algorithms are often used as components of compression techniques (e.g., Huffman coding in ZIP or JPEG), compression algorithms are specifically designed to reduce data size, making them more application-specific.
In this study, the comparison algorithms included the following:

4.4.1. Encoding Algorithms

  • Huffman Coding Algorithm (HCA) [23]: The Huffman Coding Algorithm is a widely used lossless encoding technique based on variable-length coding. It assigns shorter binary codes to more frequently occurring symbols and longer codes to less frequent ones, ensuring an efficient and compact representation of data. Due to its efficiency and adaptability, the Huffman Coding Algorithm remains a fundamental tool in data compression, enabling faster storage and transmission while preserving data integrity.
  • Local Path on Huffman Encoding Algorithm (LPHEA) [30]: The LPHEA is an alternative encoding algorithm designed to address the limitations of Huffman encoding and arithmetic coding in image compression. While Huffman encoding efficiently represents frequently occurring symbols with short bit sequences, it assigns excessively long bit sequences to less frequent symbols. The LPHEA improves this by initially applying Huffman encoding and then introducing flag bits to optimize the representation of low-frequency symbols. Specifically, a flag bit "1" is added if the successive symbol remains on the same leaf level in the Huffman tree, while a flag bit "0" is inserted otherwise. This method retains the advantages of Huffman encoding while mitigating inefficiencies in handling long bit sequences. The algorithm has demonstrated strong performance in image compression, particularly for images with a balanced tree structure, where it achieved competitive results compared to other algorithms.
  • Huffman-Based Lossless Image Encoding Scheme (HBLIES) [29]: The HBLIES is an efficient encoding algorithm designed to improve data compression by leveraging frequency modulation techniques. In this approach, the most frequently occurring symbols following each character are identified and grouped, and Huffman encoding is applied specifically to them, optimizing bit representation. The results indicate that HBLIES outperforms well-known encoding methods, including the Huffman encoding algorithm, arithmetic coding, and LPHEA, achieving superior compression efficiency across all test images.

4.4.2. Compression Algorithms

  • Better Portable Graphics (BPG) [55]: BPG is an image compression format that is a more efficient alternative to JPEG. It is based on the HEVC (High Efficiency Video Coding, H.265) standard, which provides superior compression while maintaining high image quality. Due to its advanced compression capabilities, high visual fidelity, and efficient storage, BPG is a significant development in image compression, making it a strong candidate for replacing traditional JPEG in modern applications.
  • Lossless JPEG Standard (JPEG-LS) [56]: JPEG-LS (JPEG Lossless Standard) is a lossless and near-lossless image compression algorithm developed by the Joint Photographic Experts Group (JPEG) as part of the ISO/IEC 14495-1 standard. It is designed to provide efficient and computationally lightweight compression while ensuring perfect image reconstruction. Due to its simplicity, speed, and efficiency, JPEG-LS is an excellent choice for applications requiring high-quality lossless image compression while maintaining a low computational cost.
  • Advanced Image Compression (JPEG2000) [57]: JPEG2000 is an advanced image compression algorithm developed by the Joint Photographic Experts Group (JPEG) as part of the ISO/IEC 15444 standard. Unlike traditional JPEG, it utilizes wavelet-based compression instead of the Discrete Cosine Transform (DCT), offering superior image quality at higher compression rates. Due to its high compression efficiency, superior image quality, and flexibility, JPEG2000 is an essential algorithm for applications requiring high-fidelity image preservation and advanced compression capabilities.
These algorithms were selected based on their widespread adoption, efficiency, and compression capabilities. The DEA was evaluated against other encoding algorithms, while the S+DEA was compared with well-established lossless compression algorithms.

4.5. Results

This study utilized the CR, BPP, and SS metrics to conduct a comprehensive evaluation and in-depth analysis of the developed algorithm. To ensure a thorough and reliable comparison, the encoding and compression algorithms detailed under the Comparison Algorithms Section were used.
Furthermore, the developed DEA analyzed the characters that followed each character in an image or segment and assigned appropriate bit lengths accordingly. For the R, G, and B values, frequently used characters were computed individually using the mathematical model described in Equation (1).
The palette counts calculated separately for the R, G, and B values of the Baboon image, as shown in Figure 4a, are presented in Table 2.
The palette counts for the R, G, and B values of the segments of the Baboon image, as shown in Figure 4b, were also calculated and presented in Table 3.
The frequently occurring characters that followed a specific character in each image segment were dynamically evaluated to determine their inclusion in the palette. This approach focused on reducing the bit representation of the segment, leading to better compression performance.

4.5.1. Compression Ratio (CR)

The Compression Ratio (CR) was employed in this study as a key performance metric. CR values were calculated separately for HCA, LPHEA, HBLIES, and the DEA across all images in the datasets. Similarly, CR values were computed for the S+DEA compression algorithm, and its performance was compared against BPG, JPEG-LS, and JPEG2000. All CR values are summarized in Table 4, demonstrating that the proposed DEA consistently outperformed the benchmark encoding algorithms across all images in the datasets. Furthermore, the integration of the segmentation preprocessing step into the DEA, as introduced in the S+DEA framework, resulted in further improvements in compression performance. The improvements are highlighted by explicitly presenting the relative gains of the DEA and S+DEA over other encoding and compression algorithms.

4.5.2. Bits per Pixel (BPP)

The BPP metric represents the number of bits required to encode a single pixel. A lower BPP value indicates higher compression efficiency, as fewer bits are needed per pixel. Table 5 presents the BPP values calculated for all images in the datasets. For encoding algorithms, BPP values were computed for HCA, LPHEA, HBLIES, and the DEA. For compression algorithms, BPP values were calculated for the S+DEA compression algorithm and compared against BPG, JPEG-LS, and JPEG2000.

4.5.3. Space Saving (SS)

The SS metric quantifies the reduction in file size as a proportion of the original uncompressed size, serving as an indicator of compression efficiency. Table 6 presents the SS values calculated for all images in the datasets. For encoding algorithms, SS values were computed for HCA, LPHEA, HBLIES, and the DEA. The results demonstrated that while HCA provided a baseline level of storage reduction, the DEA achieved significantly greater storage savings across all image types. For compression algorithms, SS values were calculated for the S+DEA compression algorithm and compared against BPG, JPEG-LS, and JPEG2000. The findings indicate that incorporating segmentation as a preprocessing step within the DEA, leading to the S+DEA, further enhanced storage efficiency, achieving even greater space savings.
To better illustrate these improvements, Table 6 indicates the relative gains of the DEA and S+DEA, emphasizing their performance in storage optimization.

4.5.4. Graphical Analysis

Graphs were generated based on the average results of six images taken from each of the four different datasets. These graphs present data from the comparison algorithms, the proposed innovative DEA, and the S+DEA.
The graphs are organized as follows:
  • Figure 10 presents the CR (Compression Ratio) values,
  • Figure 11 displays the BPP (Bits Per Pixel) values,
  • Figure 12 illustrates the SS (Space Saving) values.
These visual representations facilitate a clearer comparative analysis, highlighting the performance of the proposed methods against well-established comparison algorithms.

4.6. Efficiency of the Algorithms

Improvement percentage (IP) values, defined in Equation (5), were calculated to enable a comparative analysis of the algorithms’ performance.
\text{Improvement Percentage} = \frac{\text{new value} - \text{original value}}{\text{original value}} \times 100\% \quad (5)
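As a purely illustrative example with hypothetical numbers (not taken from the results tables), compressing a 1,000,000-byte image to 548,800 bytes changes the size by

\frac{|548{,}800 - 1{,}000{,}000|}{1{,}000{,}000} \times 100\% = 45.12\%,

that is, an improvement of 45.12% over the original size.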
IP values were calculated for each image individually to facilitate a comparison of the algorithms.
Based on the comparative information presented in Table 7, the following analyses can be derived:
The DEA demonstrated significant success across all image types within the datasets, proving its effectiveness. To evaluate overall performance, average IP values were calculated for all images.
Compared to the original images, the results indicated that:
  • The HCA achieved an IP value of 13.81;
  • The LPHEA achieved an IP value of 14.03;
  • The HBLIES algorithm achieved an IP value of 41.16;
  • The DEA achieved an IP value of 45.12.
Overall, the DEA outperformed all benchmark encoding algorithms, demonstrating its superior performance.
When analyzing the average values across all images, it is evident that the S+DEA consistently outperformed BPG and JPEG2000. However, it lagged slightly behind JPEG-LS by a very small margin.
To further analyze this situation, a dataset-specific evaluation was required. The assessments conducted for each dataset are presented below.
  • In Dataset 1, the S+DEA outperformed the other compression algorithms, achieving an average IP value of 45.12;
  • In Dataset 2, which consists of large satellite images, the S+DEA achieved an average IP value of 59.64, significantly outperforming BPG, JPEG2000, and JPEG-LS;
  • In Dataset 3, which consists of medical images, the BPG, JPEG-LS, JPEG2000, and S+DEA algorithms achieved average IP values of 50.26, 58.76, 58.58, and 51.66, respectively. The S+DEA performed better than BPG but fell slightly short of the performance levels achieved by JPEG-LS and JPEG2000. JPEG-LS and JPEG2000 achieved a certain level of success in medical imaging due to their encoding methods, which predict pixel values based on their neighboring pixels;
  • In Dataset 4, the S+DEA compression algorithm outperformed the benchmark algorithms BPG, JPEG2000, and JPEG-LS, achieving an average IP value of 37.05.

5. Discussion and Conclusions

The increasing size of image data in today's world poses significant challenges and costs in terms of transmission and storage. As a result, image compression has become one of the most frequently studied and widely researched topics in the literature, attracting considerable interest from researchers. This study addressed these challenges and provided notable contributions to the field, which are summarized as follows:
(i). DEA: The proposed algorithm presents an innovative method for dynamically optimizing bit lengths based on character succession frequencies. This method achieved better compression ratios.
(ii). S+DEA: The dual-phase framework, combining segmentation as a preprocessing step with the DEA, led to notable improvements in compression efficiency.
(iii). General-purpose specialized data structure: A specialized data structure was developed to store segmented image parts without data loss. Its applicability extends beyond image compression to various data-processing scenarios.
These methodologies were validated on various datasets, including medical, satellite, and agricultural images, demonstrating their versatility, robustness, and reliability.
The algorithms and approaches were evaluated and compared using the CR, BPP, SS, and IP metrics, utilizing datasets and images with varied color tones and characteristics. The evaluation of the results demonstrated that the DEA consistently achieved significantly higher performance compared to the defined reference algorithms across all images. When the segmentation preprocessing step was included, the S+DEA demonstrated a slight but noticeable improvement in performance. Notably, the S+DEA demonstrated strong performance across various datasets, excelling particularly in smaller datasets, while maintaining competitive efficiency in larger datasets with high-resolution images. Although this contribution may appear modest, its effectiveness becomes more evident when considering the resolutions and sizes of images in datasets such as satellite imagery or medical image collections. Overall, the S+DEA achieved significantly high performance, validating its relevance and adaptability in various applications.
When the benchmarking algorithms were evaluated, it was observed that while the proposed method generally performed well against comparison algorithms, its performance in the medical image dataset was slightly lower than that of the JPEG-LS algorithm. This is primarily due to the encoding methods used by JPEG-LS and JPEG2000, which predict pixel values based on their neighboring pixels. This approach is particularly well suited for capturing the subtle grayscale variations present in medical images.
Therefore, to ensure that the S+DEA algorithm surpasses JPEG-LS across all image types, future research will explore alternative preprocessing techniques, such as wavelet-based preprocessing, regional contrast enhancement, or principal component analysis (PCA), specifically tailored for datasets where segmentation proves ineffective.
Additionally, for medical images, the application of the DEA in combination with region-of-interest (ROI)-based segmentation, which is considered a gold-standard approach, will be evaluated to further enhance compression efficiency.
Moreover, given the need for real-time compression and transmission, the processes of image segmentation and subsequent packaging of segments may introduce limited delays. Due to the relatively modest contribution of the segmentation step, it is recommended to use the DEA alone for real-time applications. Conversely, for applications prioritizing archival and storage over real-time processing, the use of the S+DEA is advised, given its higher compression performance. These findings have the potential to establish a new benchmark in data compression, both in theory and practice, while also serving as a strong foundation for future research.
The next phase of this study aims to extend the proposed algorithm and framework from lossless to near-lossless operation and to adapt them to 3D images, further enriching the literature with these advancements. Additionally, the DEA will be evaluated with different preprocessing techniques, with the aim of developing a specialized compression algorithm for medical images.
In conclusion, this study provides new insights into image compression, addressing both theoretical and practical aspects. The proposed algorithm and framework contribute to the development of more efficient and adaptable data management solutions in the digital era.

Author Contributions

Conceptualization, E.E. and A.Ö.; methodology, E.E.; software, A.Ö.; validation, A.Ö.; formal analysis, E.E.; investigation, A.Ö.; resources, E.E.; data curation, A.Ö.; writing—original draft preparation, E.E.; writing—review and editing, E.E. and A.Ö.; visualization, E.E.; supervision, E.E.; project administration, E.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The study utilized datasets obtained from SIPI Image Database-Misc. (https://sipi.usc.edu/database/database.php?volume=misc, accessed on 24 February 2025), Draper Satellite Image Chronology (https://www.kaggle.com/competitions/draper-satellite-image-chronology, accessed on 24 February 2025), Chest X-Ray Images (https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia, accessed on 24 February 2025), and Agriculture Crop Images (https://www.kaggle.com/datasets/aman2000jaiswal/agriculture-crop-images, accessed on 24 February 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Schwarz, H.; Marpe, D.; Wiegand, T. Overview of the Scalable Video Coding Extension of the H.264/AVC Standard. IEEE Trans. Circuits Syst. Video Technol. 2007, 17, 1103–1120. [Google Scholar] [CrossRef]
  2. Mishra, D.; Singh, S.K.; Singh, R.K. Deep Architectures for Image Compression: A Critical Review. Signal Process. 2022, 191, 108346. [Google Scholar] [CrossRef]
  3. Hu, Y.; Yang, W.; Ma, Z.; Liu, J. Learning End-to-End Lossy Image Compression: A Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4194–4211. [Google Scholar] [CrossRef] [PubMed]
  4. Hussain, A.J.; Al-Fayadh, A.; Radi, N. Image Compression Techniques: A Survey in Lossless and Lossy Algorithms. Neurocomputing 2018, 300, 44–69. [Google Scholar] [CrossRef]
  5. Netravali, A.N.; Limb, J.O. Picture Coding: A Review. Proc. IEEE 1980, 68, 366–406. [Google Scholar] [CrossRef]
  6. Sandeep, P.; Reddy, K.N.; Teja, N.R.; Reddy, G.K.; Kavitha, S. Advancements in Image Compression Techniques: A Comprehensive Review. In Proceedings of the 2023 2nd International Conference on Edge Computing and Applications (ICECAA), Namakkal, India, 19–21 July 2023; pp. 821–826. [Google Scholar]
  7. Kaur, R.; Choudhary, P. A Review of Image Compression Techniques. Int. J. Comput. Appl. 2016, 142, 8–11. [Google Scholar] [CrossRef]
  8. Lucas, L.F.R.; Rodrigues, N.M.M.; Da Silva Cruz, L.A.; De Faria, S.M.M. Lossless Compression of Medical Images Using 3-D Predictors. IEEE Trans. Med. Imaging 2017, 36, 2250–2260. [Google Scholar] [CrossRef]
  9. Ergüzen, A.; Erdal, E. An Efficient Middle Layer Platform for Medical Imaging Archives. J. Healthc. Eng. 2018, 2018, 3984061. [Google Scholar] [CrossRef]
  10. Deigant, Y.; Akshat, V.; Raunak, H.; Pranjal, P.; Avi, J. A Proposed Method for Lossless Image Compression in Nano-Satellite Systems. In Proceedings of the 2017 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2017; pp. 1–11. [Google Scholar]
  11. Altamimi, A.; Ben Youssef, B. Lossless and Near-Lossless Compression Algorithms for Remotely Sensed Hyperspectral Images. Entropy 2024, 26, 316. [Google Scholar] [CrossRef]
  12. Jain, A.K. Image Data Compression: A Review. Proc. IEEE 1981, 69, 349–389. [Google Scholar] [CrossRef]
  13. Jamil, S. Review of Image Quality Assessment Methods for Compressed Images. J. Imaging 2024, 10, 113. [Google Scholar] [CrossRef] [PubMed]
  14. Wei, J.; Mi, L.; Hu, Y.; Ling, J.; Li, Y.; Chen, Z. Effects of Lossy Compression on Remote Sensing Image Classification Based on Convolutional Sparse Coding. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  15. Hu, J.; Song, S.; Gong, Y. Comparative Performance Analysis of Web Image Compression. In Proceedings of the 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, China, 14–16 October 2017; pp. 1–5. [Google Scholar]
  16. Correa, J.D.A.; Pinto, A.S.R.; Montez, C. Lossy Data Compression for IoT Sensors: A Review. Internet Things 2022, 19, 100516. [Google Scholar] [CrossRef]
  17. Nguyen, B.; Ma, W.; Tran, D. Investigating the Effects of Lossy Compression on Age, Gender and Alcoholic Information in EEG Signals. Procedia Comput. Sci. 2019, 159, 231–240. [Google Scholar] [CrossRef]
  18. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 4th ed.; Global Edition; Pearson: New York, NY, USA, 2017; ISBN 978-0-13-335672-4. [Google Scholar]
  19. Setiawan, W.; Wahyudin, A.; Widianto, G.R. The Use of Scale Invariant Feature Transform (SIFT) Algorithms to Identification Garbage Images Based on Product Label. In Proceedings of the 2017 3rd International Conference on Science in Information Technology (ICSITech), Bandung, Indonesia, 25–26 October 2017; pp. 336–341. [Google Scholar]
  20. Sneyers, J.; Wuille, P. FLIF: Free Lossless Image Format Based on MANIAC Compression. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 66–70. [Google Scholar]
  21. Pennebaker, W.B.; Mitchell, J.L. JPEG Still Image Data Compression Standard; Van Nostrand Reinhold: New York, NY, USA, 1992; ISBN 978-0-442-01272-4. [Google Scholar]
  22. Weinberger, M.J.; Seroussi, G.; Sapiro, G. The LOCO-I Lossless Image Compression Algorithm: Principles and Standardization into JPEG-LS. IEEE Trans. Image Process. 2000, 9, 1309–1324. [Google Scholar] [CrossRef]
  23. Huffman, D.A. A Method for the Construction of Minimum-Redundancy Codes. Proc. IRE 1952, 40, 1098–1101. [Google Scholar] [CrossRef]
  24. Sayood, K. Arithmetic Coding. In Introduction to Data Compression; Elsevier: Amsterdam, The Netherlands, 2018; pp. 89–130. ISBN 978-0-12-809474-7. [Google Scholar]
  25. Zhou, M.; Gao, W.; Jiang, M.; Yu, H. HEVC Lossless Coding and Improvements. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1839–1843. [Google Scholar] [CrossRef]
  26. Lee, Y.-L.; Han, K.-H.; Sullivan, G.J. Improved Lossless Intra Coding for H.264/MPEG-4 AVC. IEEE Trans. Image Process. 2006, 15, 2610–2615. [Google Scholar] [CrossRef]
  27. Liu, Y.; Deforges, O.; Samrouth, K. LAR-LLC: A Low-Complexity Multiresolution Lossless Image Codec. IEEE Trans. Circuits Syst. Video Technol. 2016, 26, 1490–1501. [Google Scholar] [CrossRef]
  28. Calderbank, A.R.; Daubechies, I.; Sweldens, W.; Yeo, B.-L. Lossless Image Compression Using Integer to Integer Wavelet Transforms. In Proceedings of the International Conference on Image Processing, Santa Barbara, CA, USA, 26–29 October 1997; Volume 1, pp. 596–599. [Google Scholar]
  29. Erdal, E. Huffman-Based Lossless Image Encoding Scheme. J. Electron. Imaging 2021, 30, 053004. [Google Scholar] [CrossRef]
  30. Erdal, E.; Ergüzen, A. An Efficient Encoding Algorithm Using Local Path on Huffman Encoding Algorithm for Compression. Appl. Sci. 2019, 9, 782. [Google Scholar] [CrossRef]
  31. Mishra, D.K.; Kumar, A.; Rathor, V.S.; Singh, G.K. Hybrid Technique for Crop Image Compression Using Discrete Wavelet Transform and Sparse Singular Vector Reconstruction. Comput. Electron. Agric. 2023, 215, 108391. [Google Scholar] [CrossRef]
  32. Zerva, M.C.H.; Christou, V.; Giannakeas, N.; Tzallas, A.T.; Kondi, L.P. An Improved Medical Image Compression Method Based on Wavelet Difference Reduction. IEEE Access 2023, 11, 18026–18037. [Google Scholar] [CrossRef]
  33. Umbaugh, S.E. Computer Imaging: Digital Image Analysis and Processing; Taylor & Francis: Boca Raton, FL, USA, 2005; ISBN 978-0-8493-2919-7. [Google Scholar]
  34. Langdon, G.G. An Introduction to Arithmetic Coding. IBM J. Res. Dev. 1984, 28, 135–149. [Google Scholar] [CrossRef]
  35. Capon, J. A Probabilistic Model for Run-Length Coding of Pictures. IEEE Trans. Inf. Theory 1959, 5, 157–163. [Google Scholar] [CrossRef]
  36. Jayasankar, U.; Thirumal, V.; Ponnurangam, D. A Survey on Data Compression Techniques: From the Perspective of Data Quality, Coding Schemes, Data Type and Applications. J. King Saud Univ.-Comput. Inf. Sci. 2021, 33, 119–140. [Google Scholar] [CrossRef]
  37. Ziv, J.; Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Trans. Inf. Theory 1977, 23, 337–343. [Google Scholar] [CrossRef]
  38. Ziv, J.; Lempel, A. Compression of Individual Sequences via Variable-Rate Coding. IEEE Trans. Inf. Theory 1978, 24, 530–536. [Google Scholar] [CrossRef]
  39. Welch, T.A. A Technique for High-Performance Data Compression. Computer 1984, 17, 8–19. [Google Scholar] [CrossRef]
  40. Doukas, C.; Maglogiannis, I. Region of Interest Coding Techniques for Medical Image Compression. IEEE Eng. Med. Biol. Mag. 2007, 26, 29–35. [Google Scholar] [CrossRef]
  41. Ansari, M.A.; Anand, R.S. Context Based Medical Image Compression for Ultrasound Images with Contextual Set Partitioning in Hierarchical Trees Algorithm. Adv. Eng. Softw. 2009, 40, 487–496. [Google Scholar] [CrossRef]
  42. Kaur, M.; Wasson, V. ROI Based Medical Image Compression for Telemedicine Application. Procedia Comput. Sci. 2015, 70, 579–585. [Google Scholar] [CrossRef]
  43. Gowda, D.; Sharma, A.; Rajesh, L.; Rahman, M.; Yasmin, G.; Sarma, P.; Pazhani, A.A. A Novel Method of Data Compression Using ROI for Biomedical 2D Images. Meas. Sens. 2022, 24, 100439. [Google Scholar] [CrossRef]
  44. Abdellatif, H.; Taha, T.E.; El-Shanawany, R.; Zahran, O.; Abd El-Samie, F.E. Efficient ROI-Based Compression of Mammography Images. Biomed. Signal Process. Control 2022, 77, 103721. [Google Scholar] [CrossRef]
  45. Ratakonda, K.; Ahuja, N. Lossless Image Compression with Multiscale Segmentation. IEEE Trans. Image Process. 2002, 11, 1228–1237. [Google Scholar] [CrossRef]
  46. Schmalz, M.S.; Ritter, G.X. Region Segmentation Techniques for Object-Based Image Compression: A Review. In Proceedings of the Optical Science and Technology, the SPIE 49th Annual Meeting, Denver, CO, USA, 2–6 August 2004; pp. 62–75. [Google Scholar]
  47. Shen, L.; Rangayyan, R.M. A Segmentation-Based Lossless Image Coding Method for High-Resolution Medical Image Compression. IEEE Trans. Med. Imaging 1997, 16, 301–307. [Google Scholar] [CrossRef]
  48. Felzenszwalb, P.F.; Huttenlocher, D.P. Efficient Graph-Based Image Segmentation. Int. J. Comput. Vis. 2004, 59, 167–181. [Google Scholar] [CrossRef]
  49. Agrawal, N.; Aurelia, S. A Review on Segmentation of Vitiligo Image. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1131, 012003. [Google Scholar] [CrossRef]
  50. SIPI Image Database-Misc. Available online: https://sipi.usc.edu/database/database.php?volume=misc (accessed on 31 December 2024).
  51. Draper Satellite Image Chronology. Available online: https://www.kaggle.com/competitions/draper-satellite-image-chronology (accessed on 31 December 2024).
  52. Chest X-Ray Images (Pneumonia). Available online: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia (accessed on 31 December 2024).
  53. Agriculture Crop Images. Available online: https://www.kaggle.com/datasets/aman2000jaiswal/agriculture-crop-images (accessed on 31 December 2024).
  54. Ungureanu, V.-I.; Negirla, P.; Korodi, A. Image-Compression Techniques: Classical and “Region-of-Interest-Based” Approaches Presented in Recent Papers. Sensors 2024, 24, 791. [Google Scholar] [CrossRef]
  55. BPG Image Format. Available online: https://bellard.org/bpg/ (accessed on 24 February 2025).
  56. ISO/IEC JTC 1/SC 29; Coding of Audio, Picture, Multimedia and Hypermedia Information. ISO: Geneva, Switzerland.
  57. Taubman, D.S. JPEG2000: Image Compression Fundamentals, Standards and Practice. J. Electron. Imaging 2002, 11, 286. [Google Scholar] [CrossRef]
Figure 1. The representation of a digital image: (a) continuous—f(x,y); (b) digital pixel values of Baboon—f(i,j).
Figure 2. General compression process. (a) Compression. (b) Decompression.
Figure 3. Block diagram of proposed framework (S+DEA).
Figure 4. Segmentation process results. (a) Input image, (b–k) image segments.
Figure 5. Header data structure.
Figure 6. Segment data structure.
Figure 7. Flowchart of the DEA (Do for each segment).
Figure 8. Flowchart of the DEA process example steps ((i)–(iv) defined above).
Figure 9. Four datasets and test images.
Figure 10. CR values of the algorithms (higher is better).
Figure 11. BPP values of the algorithms (lower is better).
Figure 12. SS values of the algorithms (higher is better).
Table 1. Sample datasets and test image details.
Dataset | Image Code | Image Name Tag | Resolution | Size (Bytes)
USC-SIPI [50] | 4.1.05 | 1.1 | 256 × 256 | 196,608
USC-SIPI [50] | 4.2.03 | 1.2 | 512 × 512 | 786,432
USC-SIPI [50] | 4.2.07 | 1.3 | 512 × 512 | 786,432
USC-SIPI [50] | 5.1.12 | 1.4 | 256 × 256 | 196,608
USC-SIPI [50] | 5.3.01 | 1.5 | 1024 × 1024 | 3,145,728
USC-SIPI [50] | 7.2.01 | 1.6 | 1024 × 1024 | 3,145,728
Draper Satellite Image [51] | set120_1 | 2.1 | 3099 × 2329 | 21,652,713
Draper Satellite Image [51] | set162_4 | 2.2 | 3099 × 2329 | 21,652,713
Draper Satellite Image [51] | set192_5 | 2.3 | 3099 × 2329 | 21,652,713
Draper Satellite Image [51] | set213_3 | 2.4 | 3099 × 2329 | 21,652,713
Draper Satellite Image [51] | set237_4 | 2.5 | 3099 × 2329 | 21,652,713
Draper Satellite Image [51] | set260_4 | 2.6 | 3099 × 2329 | 21,652,713
Chest X-Ray Images [52] | IM-0005-0001 | 3.1 | 2031 × 1837 | 11,192,841
Chest X-Ray Images [52] | IM-0007-0001 | 3.2 | 2053 × 1818 | 11,197,062
Chest X-Ray Images [52] | IM-0013-0001 | 3.3 | 2444 × 2155 | 15,800,460
Chest X-Ray Images [52] | IM-0029-0001 | 3.4 | 2343 × 2139 | 15,035,031
Chest X-Ray Images [52] | IM-0035-0001 | 3.5 | 2480 × 2329 | 17,327,760
Chest X-Ray Images [52] | IM-0110-0001 | 3.6 | 2251 × 1828 | 12,344,484
Agriculture Crop Images [53] | maize028a | 4.1 | 224 × 224 | 150,528
Agriculture Crop Images [53] | maize032a | 4.2 | 224 × 224 | 150,528
Agriculture Crop Images [53] | rice007a | 4.3 | 224 × 224 | 150,528
Agriculture Crop Images [53] | rice011a | 4.4 | 224 × 224 | 150,528
Agriculture Crop Images [53] | wheat0002a | 4.5 | 224 × 224 | 150,528
Agriculture Crop Images [53] | wheat023a | 4.6 | 224 × 224 | 150,528
Table 2. Palette numbers for R, G, and B values of the Baboon image.
Image | R | G | B
Baboon | 139 | 138 | 140
Table 3. Palette numbers for R, G, and B values of the Baboon image segments.
Segment Number | R | G | B
Segment 1 | 124 | 124 | 127
Segment 2 | 80 | 83 | 81
Segment 3 | 52 | 58 | 61
Segment 4 | 65 | 65 | 66
Segment 5 | 52 | 52 | 50
Segment 6 | 45 | 51 | 54
Segment 7 | 86 | 88 | 84
Segment 8 | 73 | 77 | 78
Segment 9 | 36 | 40 | 38
Segment 10 | 90 | 89 | 87
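Tables 2 and 3 suggest that the palette number is the count of distinct intensity values per channel, which drops noticeably once the image is split into segments. Under that assumption, the counts can be reproduced with a few lines of NumPy (the segmentation itself is omitted; the crop shown is a hypothetical example used only for illustration).

```python
import numpy as np

def palette_numbers(rgb_block):
    """Count the distinct values in each colour channel of an RGB block
    (assumed meaning of 'palette number' in Tables 2 and 3)."""
    return tuple(len(np.unique(rgb_block[..., ch])) for ch in range(3))

# Whole image versus one segment (hypothetical crop, illustration only):
# palette_numbers(baboon)               -> e.g. (139, 138, 140), as in Table 2
# palette_numbers(baboon[:160, :160])   -> fewer distinct values per channel, as in Table 3
```

Fewer distinct values per segment generally allows shorter codes to be assigned, which is presumably why the segmentation step benefits the DEA.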
Table 4. CR values and performance comparisons of algorithms (columns 2–5: encoding algorithms; columns 6–9: compression algorithms).
Image Name Tag | HCA [23] | LPHEA [30] | HBLIES [29] | DEA | BPG [55] | JPEG-LS [56] | JPEG2000 [57] | S+DEA
1.1 | 1.27 | 1.26 | 1.84 | 1.96 | 1.81 | 1.88 | 1.87 | 2.08
1.2 | 1.05 | 1.02 | 1.18 | 1.33 | 1.25 | 1.41 | 1.42 | 1.46
1.3 | 1.10 | 1.09 | 1.56 | 1.60 | 1.60 | 1.68 | 1.64 | 1.74
1.4 | 1.21 | 1.12 | 1.96 | 2.11 | 2.13 | 2.31 | 2.13 | 2.32
1.5 | 1.06 | 1.06 | 1.46 | 1.57 | 1.63 | 1.66 | 1.66 | 1.69
1.6 | 1.41 | 1.39 | 1.69 | 1.77 | 1.72 | 1.75 | 1.73 | 1.79
2.1 | 1.33 | 1.32 | 1.72 | 1.81 | 1.75 | 1.81 | 1.81 | 1.83
2.2 | 1.76 | 1.78 | 3.35 | 3.40 | 3.19 | 3.33 | 2.91 | 3.51
2.3 | 1.28 | 1.30 | 2.32 | 2.39 | 2.45 | 2.88 | 2.48 | 2.48
2.4 | 1.21 | 1.22 | 2.04 | 2.11 | 1.81 | 1.88 | 1.78 | 2.18
2.5 | 1.49 | 1.49 | 2.84 | 2.87 | 2.23 | 2.20 | 2.10 | 2.92
2.6 | 1.12 | 1.14 | 2.41 | 2.50 | 3.15 | 3.74 | 3.55 | 2.58
3.1 | 1.05 | 1.06 | 1.92 | 1.96 | 1.96 | 2.31 | 2.30 | 1.97
3.2 | 1.04 | 1.05 | 1.95 | 1.99 | 1.97 | 2.32 | 2.31 | 2.07
3.3 | 1.05 | 1.06 | 1.95 | 1.98 | 1.96 | 2.28 | 2.28 | 1.99
3.4 | 1.08 | 1.09 | 1.87 | 1.91 | 1.88 | 2.18 | 2.19 | 1.91
3.5 | 1.05 | 1.05 | 1.88 | 1.92 | 1.90 | 2.27 | 2.27 | 1.93
3.6 | 1.08 | 1.14 | 2.66 | 2.68 | 2.51 | 3.70 | 3.58 | 2.73
4.1 | 1.07 | 1.08 | 1.28 | 1.46 | 1.32 | 1.54 | 1.53 | 1.61
4.2 | 1.20 | 1.20 | 1.23 | 1.36 | 1.30 | 1.52 | 1.53 | 1.54
4.3 | 1.13 | 1.14 | 1.15 | 1.31 | 1.19 | 1.32 | 1.28 | 1.51
4.4 | 1.11 | 1.15 | 1.52 | 1.72 | 1.61 | 1.85 | 1.76 | 1.86
4.5 | 1.09 | 1.09 | 1.28 | 1.44 | 1.37 | 1.54 | 1.54 | 1.58
4.6 | 1.07 | 1.07 | 1.18 | 1.37 | 1.29 | 1.48 | 1.45 | 1.49
Table 5. BPP values and performance comparisons of algorithms (columns 2–5: encoding algorithms; columns 6–9: compression algorithms).
Image Name Tag | HCA [23] | LPHEA [30] | HBLIES [29] | DEA | BPG [55] | JPEG-LS [56] | JPEG2000 [57] | S+DEA
1.1 | 19.28 | 19.42 | 13.29 | 12.49 | 13.50 | 13.00 | 13.13 | 11.77
1.2 | 23.07 | 23.68 | 20.36 | 18.13 | 19.38 | 17.13 | 17.00 | 16.47
1.3 | 21.96 | 22.21 | 15.48 | 15.08 | 15.13 | 14.38 | 14.75 | 13.90
1.4 | 20.20 | 21.87 | 12.49 | 11.60 | 11.50 | 10.60 | 11.50 | 10.56
1.5 | 22.64 | 22.62 | 16.48 | 15.29 | 14.72 | 14.49 | 14.47 | 14.26
1.6 | 17.03 | 17.24 | 14.21 | 13.55 | 14.00 | 13.75 | 13.88 | 13.40
2.1 | 18.11 | 18.19 | 13.99 | 13.26 | 13.69 | 13.29 | 13.25 | 13.12
2.2 | 13.60 | 13.50 | 7.17 | 7.06 | 7.53 | 7.21 | 8.25 | 6.85
2.3 | 18.68 | 18.51 | 10.33 | 10.05 | 9.80 | 8.34 | 9.68 | 9.68
2.4 | 19.87 | 19.68 | 11.78 | 11.39 | 13.29 | 12.76 | 13.49 | 10.99
2.5 | 16.13 | 16.07 | 8.46 | 8.37 | 10.79 | 10.92 | 11.44 | 8.22
2.6 | 21.44 | 21.03 | 9.98 | 9.61 | 7.63 | 6.43 | 6.76 | 9.29
3.1 | 22.82 | 22.69 | 12.48 | 12.23 | 12.23 | 10.42 | 10.46 | 12.19
3.2 | 23.01 | 22.83 | 12.28 | 12.08 | 12.21 | 10.37 | 10.41 | 11.62
3.3 | 22.87 | 22.65 | 12.32 | 12.10 | 12.24 | 10.54 | 10.52 | 12.07
3.4 | 22.20 | 21.99 | 12.83 | 12.58 | 12.79 | 11.02 | 10.98 | 12.55
3.5 | 22.86 | 22.81 | 12.79 | 12.49 | 12.61 | 10.57 | 10.58 | 12.42
3.6 | 22.20 | 21.13 | 9.01 | 8.97 | 9.57 | 6.50 | 6.71 | 8.79
4.1 | 22.60 | 22.37 | 18.84 | 16.59 | 18.29 | 15.67 | 15.84 | 15.02
4.2 | 20.15 | 20.12 | 19.71 | 17.72 | 18.61 | 15.90 | 15.76 | 15.73
4.3 | 21.36 | 21.13 | 21.07 | 18.46 | 20.24 | 18.29 | 18.94 | 15.99
4.4 | 21.68 | 20.96 | 15.86 | 14.04 | 15.02 | 13.06 | 13.71 | 12.96
4.5 | 22.08 | 22.08 | 18.87 | 16.82 | 17.63 | 15.67 | 15.67 | 15.33
4.6 | 22.65 | 22.49 | 20.41 | 17.60 | 18.78 | 16.32 | 16.64 | 16.24
Table 6. SS values and performance comparisons of algorithms (columns 2–5: encoding algorithms; columns 6–9: compression algorithms).
Image Name Tag | HCA [23] | LPHEA [30] | HBLIES [29] | DEA | BPG [55] | JPEG-LS [56] | JPEG2000 [57] | S+DEA
1.1 | 0.21 | 0.21 | 0.46 | 0.49 | 0.45 | 0.47 | 0.46 | 0.52
1.2 | 0.04 | 0.02 | 0.16 | 0.25 | 0.20 | 0.29 | 0.30 | 0.32
1.3 | 0.09 | 0.08 | 0.36 | 0.37 | 0.37 | 0.40 | 0.39 | 0.42
1.4 | 0.18 | 0.11 | 0.49 | 0.53 | 0.53 | 0.57 | 0.53 | 0.57
1.5 | 0.06 | 0.06 | 0.31 | 0.36 | 0.39 | 0.40 | 0.40 | 0.41
1.6 | 0.29 | 0.28 | 0.41 | 0.44 | 0.42 | 0.43 | 0.42 | 0.44
2.1 | 0.25 | 0.24 | 0.42 | 0.45 | 0.43 | 0.45 | 0.45 | 0.45
2.2 | 0.43 | 0.44 | 0.70 | 0.71 | 0.69 | 0.70 | 0.66 | 0.71
2.3 | 0.22 | 0.23 | 0.57 | 0.58 | 0.59 | 0.65 | 0.60 | 0.60
2.4 | 0.17 | 0.18 | 0.51 | 0.53 | 0.45 | 0.47 | 0.44 | 0.54
2.5 | 0.33 | 0.33 | 0.65 | 0.65 | 0.55 | 0.55 | 0.52 | 0.66
2.6 | 0.11 | 0.12 | 0.58 | 0.60 | 0.68 | 0.73 | 0.72 | 0.61
3.1 | 0.05 | 0.05 | 0.48 | 0.49 | 0.49 | 0.57 | 0.56 | 0.49
3.2 | 0.04 | 0.05 | 0.49 | 0.50 | 0.49 | 0.57 | 0.57 | 0.52
3.3 | 0.05 | 0.06 | 0.49 | 0.50 | 0.49 | 0.56 | 0.56 | 0.50
3.4 | 0.08 | 0.08 | 0.47 | 0.48 | 0.47 | 0.54 | 0.54 | 0.48
3.5 | 0.05 | 0.05 | 0.47 | 0.48 | 0.47 | 0.56 | 0.56 | 0.48
3.6 | 0.08 | 0.12 | 0.62 | 0.63 | 0.60 | 0.73 | 0.72 | 0.63
4.1 | 0.06 | 0.07 | 0.22 | 0.31 | 0.24 | 0.35 | 0.34 | 0.38
4.2 | 0.17 | 0.17 | 0.18 | 0.27 | 0.23 | 0.34 | 0.35 | 0.35
4.3 | 0.12 | 0.13 | 0.13 | 0.24 | 0.16 | 0.24 | 0.22 | 0.34
4.4 | 0.10 | 0.13 | 0.34 | 0.42 | 0.38 | 0.46 | 0.43 | 0.46
4.5 | 0.09 | 0.09 | 0.22 | 0.30 | 0.27 | 0.35 | 0.35 | 0.37
4.6 | 0.06 | 0.07 | 0.16 | 0.27 | 0.22 | 0.32 | 0.31 | 0.33
Table 7. Evaluation of algorithms based on the improvement percentage metric (columns 2–5: encoding algorithms; columns 6–9: compression algorithms).
Image Name Tag | HCA [23] | LPHEA [30] | HBLIES [29] | DEA | BPG [55] | JPEG-LS [56] | JPEG2000 [57] | S+DEA
1.1 | 21.28 | 20.75 | 45.76 | 49.02 | 44.90 | 46.94 | 46.43 | 51.98
1.2 | 4.34 | 1.83 | 15.59 | 24.84 | 19.69 | 29.02 | 29.53 | 31.73
1.3 | 8.94 | 7.95 | 35.84 | 37.48 | 37.31 | 40.41 | 38.86 | 42.38
1.4 | 17.54 | 10.74 | 49.03 | 52.64 | 53.06 | 56.74 | 53.06 | 56.88
1.5 | 5.76 | 5.87 | 31.40 | 36.35 | 38.75 | 39.71 | 39.79 | 40.66
1.6 | 29.13 | 28.27 | 40.86 | 43.62 | 41.74 | 42.78 | 42.26 | 44.22
Dataset 1 Average | 14.50 | 12.57 | 36.41 | 40.66 | 39.24 | 42.60 | 41.66 | 44.64
2.1 | 24.55 | 24.25 | 41.74 | 44.79 | 42.98 | 44.65 | 44.82 | 45.38
2.2 | 43.32 | 43.76 | 70.13 | 70.60 | 68.63 | 69.98 | 65.63 | 71.49
2.3 | 22.18 | 22.92 | 56.98 | 58.13 | 59.19 | 65.27 | 59.69 | 59.70
2.4 | 17.21 | 18.02 | 50.95 | 52.53 | 44.62 | 46.83 | 43.79 | 54.21
2.5 | 32.82 | 33.09 | 64.78 | 65.15 | 55.06 | 54.51 | 52.35 | 65.78
2.6 | 10.69 | 12.38 | 58.42 | 59.96 | 68.22 | 73.23 | 71.83 | 61.28
Dataset 2 Average | 25.13 | 25.74 | 57.17 | 58.53 | 56.45 | 59.08 | 56.35 | 59.64
3.1 | 4.98 | 5.50 | 48.04 | 49.09 | 49.05 | 56.62 | 56.44 | 49.25
3.2 | 4.14 | 4.89 | 48.83 | 49.68 | 49.12 | 56.81 | 56.62 | 51.60
3.3 | 4.72 | 5.64 | 48.68 | 49.60 | 48.99 | 56.09 | 56.17 | 49.70
3.4 | 7.56 | 8.41 | 46.58 | 47.64 | 46.75 | 54.12 | 54.26 | 47.75
3.5 | 4.76 | 4.98 | 46.70 | 47.97 | 47.48 | 55.97 | 55.92 | 48.27
3.6 | 7.57 | 12.03 | 62.47 | 62.65 | 60.17 | 72.94 | 72.08 | 63.40
Dataset 3 Average | 5.62 | 6.91 | 50.22 | 51.11 | 50.26 | 58.76 | 58.58 | 51.66
4.1 | 6.48 | 7.41 | 22.03 | 31.35 | 24.32 | 35.14 | 34.46 | 37.86
4.2 | 16.60 | 16.72 | 18.44 | 26.65 | 22.97 | 34.21 | 34.77 | 34.92
4.3 | 11.61 | 12.57 | 12.79 | 23.60 | 16.22 | 24.32 | 21.62 | 33.81
4.4 | 10.28 | 13.24 | 34.37 | 41.90 | 37.84 | 45.95 | 43.24 | 46.36
4.5 | 8.64 | 8.62 | 21.93 | 30.41 | 27.03 | 35.14 | 35.14 | 36.57
4.6 | 6.26 | 6.91 | 15.54 | 27.17 | 22.30 | 32.48 | 31.14 | 32.78
Dataset 4 Average | 9.98 | 10.91 | 20.85 | 30.18 | 25.11 | 34.54 | 33.40 | 37.05
Overall Average | 13.81 | 14.03 | 41.16 | 45.12 | 42.77 | 48.74 | 47.50 | 48.25
