Article

MEFormer: Enhancing Low-Light Images While Preserving Image Authenticity in Mining Environments

1 School of Energy and Mining Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
2 Institute of Remote Sensing and Geographic Information System, Peking University, Beijing 100871, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2025, 17(7), 1165; https://doi.org/10.3390/rs17071165
Submission received: 20 January 2025 / Revised: 18 March 2025 / Accepted: 20 March 2025 / Published: 25 March 2025
(This article belongs to the Section Engineering Remote Sensing)

Abstract

In mining environments, ensuring image authenticity is critical for safety monitoring. However, current low-light image enhancement methods often fail to balance optimization and fidelity, resulting in suboptimal image quality. Additionally, existing models trained on general datasets do not meet the unique demands of mining environments, which often feature challenging lighting conditions. To address this, we propose the Mining Environment Transformer (MEFormer), a high-fidelity low-light image restoration network with efficient computational performance. MEFormer incorporates an innovative cross-scale feature fusion architecture, which facilitates enhanced image restoration across multiple scales. We also present Mining Environment Low-Light (MELOL), a new dataset that captures the specific low-light conditions found in mining environments, filling the gap in available data. Experiments on public datasets and MELOL demonstrate that MEFormer achieves a 0.05 increase in the SSIM, a PSNR above 25, and an LPIPS score of 0.15. The model processes 10,000 128 × 128 images in just 2.8 s using an Nvidia H100 GPU.

1. Introduction

In the coal mining industry, we frequently encounter challenges that necessitate monitoring and remote control [1]. The authenticity of low-light image enhancement exerts a significant impact on safety [2], as it directly affects the accuracy of environmental assessments and decision-making processes. The harsh mining environment [3,4], characterized by dust and inadequate lighting, has been shown to have a considerable effect on the health, safety, and satisfaction of operators [5,6,7]. Qiang’s research, for example, underscores the significance of inadequate lighting as a critical factor impacting safety in underground mining. This finding was ascertained through the integration of topic mining and association rule mining, which were employed to develop targeted safety hazard management strategies [8]. Li et al.’s research also highlights insufficient lighting as one of the critical safety risk factors in coal mining, emphasizing its identification through advanced text mining and Bayesian network techniques to enhance safety management [9].
Our work primarily focuses on optimizing the production system [10], which encompasses the monitoring of extraction locations on the coal mining face, the progression of tunneling, and the comprehensive process of screening, processing, and loading coal after its extraction from underground [11]. The principal issues we address include inadequate illumination in semi-enclosed spaces caused by substantial structures and long-distance transport tunnels, as well as the necessity for nighttime operations due to intensified production. Representative low-light images of these scenarios are shown in Figure 1 [12]. These challenges necessitate enhanced image optimization for remote operation during nighttime conditions, ensuring accurate and efficient monitoring of the entire production process [13].
Histogram equalization is a technique used to enhance the quality of low-light images by redistributing intensity values to create a uniform histogram. This process serves to improve contrast, reveal hidden details, and increase overall brightness and clarity. Subramani et al. [14] contributed to image enhancement by developing an adaptive fuzzy gray-level difference histogram equalization algorithm that improves contrast and reduces noise more effectively than existing methods. The Exposure Sub-Image Histogram Equalization (ESIHE) method, proposed by Singh et al. [15], represents a milestone in image enhancement, enabling effective contrast enhancement for low-exposure grayscale images. Vijayalakshmi et al. [16] advanced image enhancement by proposing a multilevel contrast enhancement framework that applies appropriate histogram equalization based on image background uniformity, effectively improving contrast and outperforming existing algorithms. Rao et al. [17] proposed a novel contrast enhancement method that outperforms conventional techniques in brightness adjustment. However, the approach may produce over-illuminated results and lacks adaptive exposure control for regional optimization, limiting its applicability in scenarios requiring precise brightness differentiation.
Retinex is a theory that has been demonstrated to enhance images captured in low-light conditions through the decomposition of the image into illumination and reflectance components. This approach enables the enhancement of lighting and the preservation of details in low-light images. Dong et al. [18] proposed an image enhancement algorithm based on the Retinex theory for coal mine exploration robots. In a related study, Wu et al. [19] introduced an unsupervised HSV-based image enhancement method for mining environments that integrates Retinex theory with a U-Net reflectance estimation network, incorporating ResNeSt blocks and multiscale channel pixel attention to achieve superior image quality without requiring normally illuminated training images. Du et al. [20] proposed a Retinex and wavelet multiscale product-based edge detection algorithm that enhances feature extraction in low-light mine images, achieving high real-time performance and accuracy for mine robot perception. Shang et al. [21] proposed an adaptive enhancement method that achieves superior edge preservation in low-visibility mining conditions compared to classical approaches. While demonstrating rapid processing capabilities, the method occasionally introduces color inaccuracies and over-enhancement artifacts that may compromise fine-detail fidelity.
The utilization of artificial intelligence in image quality enhancement involves employing convolutional neural networks for the extraction of intricate features [22,23,24], using Transformers to capture complex dependencies [25], and applying generative adversarial networks (GANs) to generate high-quality, realistic enhancements [12]. In a related study, Nan et al. [26] developed a deep learning and fusion-based mine image enhancement method that integrates CNN-based decomposition, multiple-exposure simulation, and image block fusion to significantly improve contrast and brightness in low-illumination, high-dust mining environments without causing color distortion [27]. In a similar vein, the research by Cao et al. [28] proposed a multiscale, deep learning-based method to enhance low-light images. The approach utilizes residual attention mechanisms and non-local neural networks in conjunction with noise reduction modules within a U-Net framework to extract dark details and generate attention maps. This results in superior color, contrast, and detail enhancement in comparison to existing methods. In another study, Li et al. [29] introduced a deep learning-based multiscale adaptive method for low-light image enhancement that effectively extracts dark details and reduces noise, outperforming current state-of-the-art techniques in color, contrast, and image quality. Feng et al. [30] proposed an artificial intelligence (AI)-driven framework for enhancing low-light images. The purpose of their proposal was twofold: first, to improve the visual quality of images and suppress noise; and second, to reveal potential safety risks caused by structural and color distortions in industrial scenarios like mining. The authors of this study placed special emphasis on the necessity of authenticity preservation in monitoring systems.
In this study, we address the following challenges associated with low-illumination-image processing in mining environments, where maintaining image authenticity is crucial.
  • Current algorithms fail to meet authenticity requirements for processed images in mining environments: Existing low-light image enhancement (LLIE) algorithms applied to mining scenes often fail to preserve image authenticity, producing color distortion, increased noise, and excessive brightness. They may also introduce artifacts and incorrect features, undermining the semantic accuracy and reliability of the enhanced images. These limitations hinder the effective use of LLIE technologies for accurate and trustworthy visual information in mining operations.
  • Expanding datasets for mining environments: There is a critical demand for more datasets specific to mining environments that can enable models to accurately learn the unique illuminance characteristics of such environments. The incorporation of more relevant data into models will facilitate their ability to effectively handle the challenges posed by low-light images in mining environments, thereby improving the accuracy and realism of image enhancement techniques.
Our research introduces several key innovations that represent significant advancements in the field:
  • To address the issue of insufficient preservation of image authenticity in existing low-light image enhancement (LLIE) methods, we propose the Unite Features Exchange Encoder–Decoder (UniteEDBlock) structure. This structure integrates a cross-scale feature fusion architecture into the encoder–decoder framework of MEFormer, enabling the exchange and fusion of feature information across multiple scales. This integration enhances feature accuracy, ultimately leading to a significant improvement in the authenticity of low-light image enhancements in mining environments.
  • We have developed a new dataset named the Mining Environment Low-Light (MELOL) dataset, which encompasses a variety of mining environments, including underground tunnels, working faces, chambers, truck loading areas, and belt conveyors. The dataset consists of pairs of low-light and normal-light images, meticulously designed to address the challenges posed by inadequate illumination in mining environments. This dataset is poised to facilitate more accurate and realistic training for image enhancement models in the mining sector.
  • We propose a novel image enhancement framework for mining low-illumination environments, which we have designated MEFormer. This framework integrates both the Unite Axial Transformer block (UniteATBlock) and UniteEDBlock, leveraging axial-based Transformer to significantly reduce model size while enhancing expressive capability. Specifically, within its encoder–decoder framework, MEFormer utilizes UniteEDBlock to capture semantic relationships across multiple scales, thereby ensuring the authenticity of enhanced low-light images in mining environments. A comparative evaluation of MEFormer on public datasets, including LOL-v1, MIT-Adobe FiveK, and our proprietary MELOL dataset, demonstrates its superiority over both mainstream low-light enhancement algorithms and those specifically tailored for mining environments. This evaluation highlights MEFormer’s remarkable capacity to preserve image authenticity under complex lighting conditions.

2. Related Work

Onifade et al. [31] showed that the implementation of clear video surveillance in underground mines is crucial for detecting hazards, responding to emergencies, and maintaining safety and productivity in harsh mining conditions. Tan et al. [32] developed a low-light imaging framework combining computational enhancement and adaptive tracking algorithms to improve hazard detection and safety in underground coal mines, enabling accurate drill pipe counting and real-time geological risk identification in environments characterized by unstable artificial lighting. Gonzalez-Cortes et al. [33] demonstrated that enhancing low-light video quality in mining confined spaces improves hazard detection accuracy during equipment inspections, consequently advancing occupational safety despite challenges such as toxic gases and insufficient illumination. Xu et al. [34] evidenced that improving low-light video processing in coal mines helps detect hazards more efficiently, enhances safety management, and boosts productivity by reducing accidents. Hanif et al. [35] developed a lightweight enhancement framework that improves detection accuracy and operational safety in coal mines by increasing model robustness to mining-induced visual distortions and reducing reliance on complex computational resources.
Before the advent of neural networks, low-light image enhancement primarily relied on processing image parameters, particularly through histogram manipulation. This approach aimed to equalize the histogram [36], but it often resulted in a linear or upward-sloping curve that intuitively stretched the image’s tonal range. Scholars then began to explore the mapping relationships between pixels in low-light images and those in normally illuminated images, leading to the introduction of gamma transformation [37]. This method exploits the properties of an exponential function with a base between 0 and 1, where a lower γ value enhances pixel intensity, introducing a form of nonlinear mapping that better mimics human visual perception. Although later developments in histogram equalization incorporated nonlinear mapping, they remained limited in their ability to capture the complex nuances of real-world scenes. In contrast, Transformer-based image enhancement techniques offer a more sophisticated approach by employing self-attention mechanisms to model global dependencies and adaptively enhance image details, which represents a significant advancement over these earlier methods [38].
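To make these two classical baselines concrete, the short Python sketch below applies global histogram equalization and a gamma transform with γ < 1 to a grayscale frame; the file paths and the γ value of 0.5 are illustrative assumptions, not values taken from the cited studies.

```python
import cv2
import numpy as np

# Load a low-light frame as grayscale (the path is a placeholder).
img = cv2.imread("mine_frame.png", cv2.IMREAD_GRAYSCALE)

# Classical baseline 1: global histogram equalization.
# Redistributes intensities so the cumulative histogram becomes roughly linear.
equalized = cv2.equalizeHist(img)

# Classical baseline 2: gamma transformation with gamma < 1.
# Normalizing to [0, 1] and raising to a power below 1 lifts dark pixels
# nonlinearly, roughly mimicking human brightness perception.
gamma = 0.5  # illustrative value; smaller gamma -> stronger brightening
normalized = img.astype(np.float32) / 255.0
gamma_corrected = np.uint8(255.0 * np.power(normalized, gamma))

cv2.imwrite("equalized.png", equalized)
cv2.imwrite("gamma_corrected.png", gamma_corrected)
```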
With the advancement of psychology and brain neuroscience, the theory of Retinex [19] was proposed, which suggests that human perception is influenced by the frequency of reflected light and the sensory cells of the eye. For example, when we close our eyes and rub them, we can perceive bright light even in total darkness, which indicates that vision can be stimulated solely by sensory cell activity [39]. This highlights that light of a fixed frequency does not uniquely correspond to one color. Instead, the brain’s neurons that are responsible for color perception would adapt to maintain color constancy under varying illumination by comparing the relative frequency relationships of different colors [40]. However, while Retinex theory provides valuable insights into visual perception, it also has its limitations [41]. Our perception of the world depends not only on the relative frequency of light, but also on high-level features such as texture, edges, and even emotions. These are factors not taken into account by Retinex theory, which focuses predominantly on the intensity relationships between different lights. Although Retinex theory provides a more accurate approximation of human visual perception compared to methods like histogram equalization and gamma transformation, it is still insufficient to fully capture the complexity of visual perception. To address these shortcomings, Tian et al. [42] proposed an improved Retinex algorithm for low-light mine image enhancement that effectively mitigates local halo blur, inadequate edge detail preservation, and noise by integrating multiscale guided filtering, the Weber–Fechner law, and CLAHE to achieve substantial metric improvements over traditional methods.
In recent years, neural networks have caused a paradigm shift in various fields of technology, including low-illumination image enhancement. These networks have been employed to model the complex relationships between low-illumination images and their well-lit counterparts, which has enabled significant advancements in the restoration and enhancement of images captured under challenging lighting conditions. Convolutional neural networks (CNNs), which utilize convolutional operations to extract multiscale information, have been the primary focus of traditional approaches to low-illumination image enhancement. Nevertheless, recent advancements have introduced new families of neural networks specifically designed for vision tasks. Notably, Transformer-based architectures and all-MLP (Multi-Layer Perceptron) models have been developed, offering innovative frameworks for processing visual data. These developments signify a significant progression in the field of image enhancement, as they transcend the constraints of CNNs and explore new paradigms for capturing and enhancing image features [43]. The attention mechanism, pioneered by Bahdanau et al. [44], was developed to address the bottleneck in encoder–decoder architectures by selectively focusing on the most relevant parts of the input sequences during translation. This concept was further expanded by Vaswani et al. [37] through the self-attention mechanism, which demonstrated its ability to capture long-range dependencies and extract intrinsic features more effectively. In the context of image enhancement, the self-attention mechanism has proven particularly valuable, as it enables Vision Transformer (ViT) to model global relationships across the image. Unlike convolutional networks that rely on local receptive fields [45], ViTs leverage self-attention to enhance low-illumination images by effectively capturing fine-grained details and global structures, which leads to more accurate and natural image restoration. Wei et al. [46] have made significant contributions to the field of intelligent monitoring and image enhancement in challenging environments, particularly through their innovative approach to improving low-illumination image recognition and personnel detection in underground coal mines. In another notable contribution, Wang et al. [47] developed LLFormer, a Transformer-based low-light image enhancement method. LLFormer utilizes axial-based multi-head self-attention and cross-layer attention fusion techniques to reduce complexity, and it was used to create a comprehensive large-scale 4K/8K benchmark database. This method was shown to outperform existing methods with improved downstream-task results.
In addition to the challenges posed by insufficient illumination in the mining sector, the field of underwater low-light image enhancement also faces significant difficulties due to the unique properties of water. In order to improve the quality and visibility of underwater low-light images, there is an imperative need to develop more robust low-light enhancement models. Therefore, the development of low-light enhancement models has been significantly influenced by the foundational work and innovative structures established in the field of underwater low-light image enhancement. Wang et al. [48] proposed an adaptive low-light image enhancement framework using virtual exposure, incorporating adaptive parameter generation, a quadratic enhancement function, and multiscale fusion (Laplacian/Gaussian pyramids). This approach eliminates the need for image calibration and camera response estimation while improving fidelity and reducing over-enhancement. Mythili et al. [49] proposed a novel underwater image enhancement method that addresses uneven lighting and water-induced noise in deep-sea/night environments, demonstrating superior color/texture restoration and human-visual consistency compared to existing methods. Ren et al. [50] proposed a large foundation model empowered discriminative underwater image enhancement framework that effectively mitigates foreground-background interference. This framework not only achieves superior visual quality and quantitative metrics compared to conventional methods, but also preserves critical textural details for marine computer vision applications.
Many existing algorithms designed to reduce the memory and computational costs of Transformer often overlook the importance of maintaining the structural similarity of enhanced images, particularly in tasks like monitoring the loading of coal onto trucks during telework. In such scenarios, enhanced images require highly accurate semantic features, which demands a robust multiscale feature fusion structure. To address this, Yang et al. [51] introduced the Enhanced Multiscale Feature Fusion Network (EMFFN) that combines multiscale spectral/spatial features through cascaded dilated convolutional and parallel multipath networks, enabling more effective feature extraction and fusion for hyperspectral image classification. Similarly, Zhu et al. [52] developed the Dynamic FPN (DyFPN), which enhances multiscale feature fusion by adaptively selecting computational branches via dynamic gating, which significantly improved the trade-off between accuracy and computational efficiency in object detection.
As previously mentioned, multiscale feature fusion methods [53,54,55], which primarily fuse features between adjacent scales [56], can lead to semantic loss if the depth of the encoder–decoder structure is too great. This issue arises from the challenge in effectively merging features across significantly different scales [57]. To address this challenge and preserve structural integrity in image enhancement within mining environments, it is necessary to employ a method capable of handling multiscale feature fusion across widely varying scales [58,59,60,61].

3. Methodology

This section delineates the methodological framework of the Mining Environment Transformer (MEFormer) model. It commences by establishing the foundational algorithms that form the underpinning of our approach. Subsequently, it provides a comprehensive overview of the model’s architecture, elaborating on the novel MEFormer block algorithms that constitute its core.

3.1. MEFormer Model Preliminaries

The MEFormer model consists of three parts and two primary blocks: the UniteATBlock and the UniteEDBlock. The model begins and ends with the UniteATBlock, while the middle section is composed of a UniteEDBlock, shown in Figure 2.
The UniteATBlock comprises two networks: the Axial Transformer (AT) and the Channel Unite Transformer (CUT). The AT is sequentially superimposed three times to more accurately extract features from low-light images. This process enhances the model’s ability to recover detailed lines, facial textures, colors, and other characteristics, creating a highly expressive model. The CUT aggregates channels from different layers. It assigns an attention score to each channel based on its significance. This mechanism selects the most important channels and gives more weight to these channels in the model’s expressive capability.
The UniteEDBlock consists of three sections: the encoder, UniteChNet, and decoder. The encoder processes the image at the full scale by using a pixel-unshuffle operation to downscale the spatial size and to double the number of channels. UniteChNet then constructs relationships between different feature maps. This allows feature maps of different scales to obtain useful details or structural information based on attention scores. Finally, the decoder uses pixel-shuffle to upscale the feature maps and applies weighted addition to these maps. This weighted addition enhances the useful features and suppresses the less important ones.
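The scale transitions described above can be sketched with standard PyTorch operations. Because pixel-unshuffle with factor 2 multiplies the channel count by four while halving the spatial size, the sketch adds a 1 × 1 convolution to reach the stated channel doubling; that projection, and the example tensor sizes, are our assumptions rather than details given in the paper.

```python
import torch
import torch.nn as nn

class DownStage(nn.Module):
    """Encoder stage sketch: halve H and W, double the channel count."""
    def __init__(self, channels: int):
        super().__init__()
        # PixelUnshuffle(2): (B, C, H, W) -> (B, 4C, H/2, W/2)
        self.unshuffle = nn.PixelUnshuffle(2)
        # Assumed 1x1 projection so the stage outputs 2C channels, as described in the text.
        self.proj = nn.Conv2d(4 * channels, 2 * channels, kernel_size=1)

    def forward(self, x):
        return self.proj(self.unshuffle(x))

class UpStage(nn.Module):
    """Decoder stage sketch: double H and W, halve the channel count."""
    def __init__(self, channels: int):
        super().__init__()
        # Assumed 1x1 projection so that PixelShuffle(2) yields C/2 channels.
        self.proj = nn.Conv2d(channels, 2 * channels, kernel_size=1)
        # PixelShuffle(2): (B, 2C, H, W) -> (B, C/2, 2H, 2W)
        self.shuffle = nn.PixelShuffle(2)

    def forward(self, x):
        return self.shuffle(self.proj(x))

x = torch.randn(1, 32, 128, 128)
down = DownStage(32)(x)    # -> (1, 64, 64, 64)
up = UpStage(64)(down)     # -> (1, 32, 128, 128)
```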
UniteEDBlock, as illustrated in Figure 3, is a component of the encoder block structure that facilitates the aggregation and exchange of information between feature maps of different scales within the encoder. It is based on the Transformer structure, primarily aggregating the outputs from the traditional encoder structure and selectively exchanging the information. This structure provides a bidirectional channel that allows feature maps of different scales to aggregate their features. It also enables these feature maps to selectively extract useful features from the aggregated structure, thereby enhancing the model’s expressive capability.

UniteATBlock

Inspired by LLFormer [47], we adopted its architecture as our foundational framework. Compared to traditional CNN networks, Transformers excel at associating local and global map relationships, providing strong learning capabilities. The architecture of the entire network is designed to attend to this layer and utilize its features. The Axial Transformer block offers a more flexible approach to classifying information based on axes without the need to flatten the feature map. This not only conserves computational resources and memory but also preserves the feature scale relationships. The computational complexity is reduced from quadratic to linear with respect to the number of pixels.
As shown in Figure 4, the features undergo a series of processing steps, including normalization, axial attention, and propagation through the dual-gate feed-forward network. This process compiles the features from the image within the model. Figure 5 provides a visual representation of axial attention on the feature map $X \in \mathbb{R}^{H \times W \times C}$. Initially, the feature map undergoes three 1 × 1 convolutions to embed more features into additional channels, followed by three depth-wise 3 × 3 convolutions. The resultant output consists of three matrices, denoted as Q, K, and V. During this process, the matrices Q, K, and V are split into k heads, so that each head has a channel dimension of C/k, where C is the original channel dimension of the feature maps. Transformers are used to identify the relationships between local information within one dimension of the feature maps and global information. Consequently, the dimensions of the features are rearranged into a new configuration. For the matrix $Q \in \mathbb{R}^{H \times C \times W}$, the last two dimensions are C and W. This rearrangement switches the locations of C and W, thereby establishing a bridge variable that connects local and global information along the H dimension. The matrix K is the transpose of Q such that $K = Q^{T}$, and V is equivalent to K. The self-attention mechanism is then constructed from these convolution outputs, yielding the attention result. This process describes the height-axial attention, and the same procedure is applied to the width-axial attention.
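A minimal sketch of height-axial attention is given below, assuming the 1 × 1 and depth-wise 3 × 3 convolutions described above produce Q, K, and V; the head count, the softmax scaling, and the exact tensor layout are illustrative choices, and the width-axial branch follows by exchanging the roles of H and W.

```python
import torch
import torch.nn as nn

class HeightAxialAttention(nn.Module):
    """Sketch of height-axis self-attention: each image column is treated as a
    sequence of length H, so the attention map is H x H instead of HW x HW."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.norm = nn.LayerNorm(channels)
        # 1x1 conv to embed features, then depth-wise 3x3 conv, as in the text.
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.qkv_dw = nn.Conv2d(channels * 3, channels * 3, kernel_size=3,
                                padding=1, groups=channels * 3)
        self.out = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        q, k, v = self.qkv_dw(self.qkv(y)).chunk(3, dim=1)

        def split(t):
            # Fold width and heads into the batch dimension; attend along H only.
            return t.reshape(b, self.heads, c // self.heads, h, w) \
                    .permute(0, 1, 4, 3, 2) \
                    .reshape(b * self.heads * w, h, c // self.heads)

        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (c // self.heads) ** 0.5, dim=-1)
        out = attn @ v                          # (B*heads*W, H, C/heads)
        out = out.reshape(b, self.heads, w, h, c // self.heads) \
                 .permute(0, 1, 4, 3, 2).reshape(b, c, h, w)
        return self.out(out) + x                # residual connection
```

Because attention here is computed over sequences of length H (or W) rather than H·W, the attention map grows linearly with the image side length, which is the source of the complexity reduction mentioned above.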
As depicted in the bottom section of Figure 4, the feed-forward network has been engineered for efficient feature transformations. Given an input $Y \in \mathbb{R}^{H \times W \times C}$, the feed-forward network is expressed as in Equation (1):
$$\mathrm{FFN}_{\mathrm{DG}}(Y) = \phi\!\left(W^{1}_{3\times3} W^{1}_{1\times1} Y\right) \odot \left(W^{2}_{3\times3} W^{2}_{1\times1} Y\right) + \left(W^{1}_{3\times3} W^{1}_{1\times1} Y\right) \odot \phi\!\left(W^{2}_{3\times3} W^{2}_{1\times1} Y\right),$$
$$\hat{Y} = W_{1\times1}\,\mathrm{FFN}_{\mathrm{DG}}(Y) + Y,$$
In this equation, $\hat{Y} \in \mathbb{R}^{H \times W \times C}$ represents the output features, $\mathrm{FFN}_{\mathrm{DG}}$ denotes the dual-gate feed-forward network, ⊙ is the element-wise multiplication operation, and $\phi$ stands for the GELU activation function.
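Equation (1) maps almost directly onto PyTorch; the sketch below keeps the channel count fixed inside both branches for brevity, whereas practical implementations may expand it, so treat the layer widths as assumptions.

```python
import torch
import torch.nn as nn

class DualGatedFFN(nn.Module):
    """Sketch of the dual-gate feed-forward network in Equation (1)."""
    def __init__(self, channels: int):
        super().__init__()
        # Branch 1: W1_{1x1} followed by depth-wise W1_{3x3}.
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
        )
        # Branch 2: W2_{1x1} followed by depth-wise W2_{3x3}.
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
        )
        self.gelu = nn.GELU()
        self.project_out = nn.Conv2d(channels, channels, kernel_size=1)  # W_{1x1}

    def forward(self, y):
        a = self.branch1(y)
        b = self.branch2(y)
        # phi(a) * b + a * phi(b), with phi = GELU and * element-wise, as in Equation (1).
        gated = self.gelu(a) * b + a * self.gelu(b)
        return self.project_out(gated) + y     # \hat{Y} = W_{1x1} FFN_DG(Y) + Y
```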
The proposed model CUT establishes interconnections among disparate layers, thus addressing the limitations of conventional fusion information, which is typically confined to neighboring layers. This Transformer facilitates the integration of features derived from multiple layers, allowing for a more comprehensive and synergistic integration of the features from different layers.
As shown in Figure 6, the CUT receives multiple layers of feature maps $F_x \in \mathbb{R}^{H \times W \times C}$ and concatenates them along the channel dimension. The result of this process is given by Equation (2), in which N denotes the number of layers:
$$F_y = \mathrm{Concat}\left(F_x,\ x = 1, 2, 3, \ldots, N;\ \mathrm{dim} = \mathrm{channel}\right), \qquad F_y \in \mathbb{R}^{H \times W \times NC}$$
Subsequently, 1 × 1 convolutions are employed to aggregate pixel-wise cross-channel context, yielding Equation (3):
$$F_y = \mathrm{Conv}(F_x,\ K = 1,\ S = 1)$$
This is followed by 3 × 3 depth-wise convolutions to produce Equation (4):
$$F_y = \mathrm{Conv}(F_x,\ K = 3,\ S = 1)$$
The features are then flattened to obtain Equation (5):
$$Q = \mathrm{Flatten}(F_y) \in \mathbb{R}^{N \times HWC}$$
This process is repeated three times in order to obtain K and V. Thereafter, attention is directed toward the channel dimension, leading to the generation of Equation (6), where $\alpha$ is the scale factor:
$$F_y^{out} = \mathrm{SoftMax}\!\left(\frac{Q_x K^{T}}{\alpha^{2}}\right) V$$
The resulting $F_y^{out}$ is then reshaped so as to match the shape of $F_y$, and a skip connection is applied, resulting in Equation (7):
$$F_y^{out} = F_y + F_y^{out}$$
Finally, 1 × 1 convolutions are applied to obtain the final result $F_y^{out} \in \mathbb{R}^{H \times W \times C}$.
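A compact sketch of the CUT block following Equations (2)–(7) is shown below; treating each input layer as a single token, using a learnable scale factor α, and the exact projection layout are our assumptions where the text leaves the details open.

```python
import torch
import torch.nn as nn

class ChannelUniteTransformer(nn.Module):
    """Sketch of the CUT block (Equations (2)-(7)): each of the N input layers
    becomes one token, and attention re-weights layers against each other."""
    def __init__(self, channels: int, num_layers: int):
        super().__init__()
        nc = channels * num_layers
        self.num_layers = num_layers
        # One 1x1 conv + depth-wise 3x3 conv per projection (Q, K, V), as in Eqs. (3)-(5).
        self.proj = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(nc, nc, kernel_size=1),
                nn.Conv2d(nc, nc, kernel_size=3, padding=1, groups=nc),
            ) for _ in range(3)
        ])
        self.alpha = nn.Parameter(torch.tensor(1.0))         # scale factor in Eq. (6), assumed learnable
        self.fuse = nn.Conv2d(nc, channels, kernel_size=1)    # final 1x1 conv back to C channels

    def forward(self, features):                 # list of N tensors, each (B, C, H, W)
        b, c, h, w = features[0].shape
        f_y = torch.cat(features, dim=1)         # Eq. (2): concat along channels -> (B, N*C, H, W)
        q, k, v = (p(f_y).reshape(b, self.num_layers, c * h * w) for p in self.proj)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.alpha ** 2, dim=-1)  # Eq. (6): (B, N, N)
        out = (attn @ v).reshape(b, self.num_layers * c, h, w)
        out = f_y + out                           # Eq. (7): skip connection
        return self.fuse(out)                     # (B, C, H, W)
```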

3.2. UniteEDBlock

The information from different scale encoder blocks is represented by different feature maps $F_x \in \mathbb{R}^{\frac{H}{2^x} \times \frac{W}{2^x} \times 2^x C}$ $(x = 1, 2, 3, 4)$, as shown in Figure 3. The union structure initially unifies one dimension of these feature maps, enabling the Transformer to aggregate diverse information. This unification is crucial for ensuring the alignment of the dimensions of the Q, K, or V vectors, which is necessary for the effective functioning of the Transformer network. Inspired by the Swin Transformer, we employ convolutional kernels of varying sizes to process the feature maps, as depicted in Equation (8).
$$F_x = \mathrm{Conv}(F_x,\ K_x,\ S_x), \qquad K_x = \left(\frac{a}{2^x}, \frac{b}{2^x}\right);\ S_x = K_x;\ x = 1, 2, 3, 4$$
By applying these kernels in a non-overlapping sliding manner, the downsampled feature maps are unified in the spatial dimensions, as illustrated in Equation (9).
$$Q_x = \mathrm{Flatten}(F_x)$$
Then, we concatenate these unified feature maps along the channel dimension to form the K and V matrices, as shown in Equation (10).
$$K = V = \mathrm{Concat}(Q_1, Q_2, Q_3, Q_4)$$
Attention mechanisms are applied to the multiscale feature maps, as presented in Equation (11), where C represents the sum of the channels from all feature maps.
$$F_x^{out} = \mathrm{SoftMax}\!\left(\frac{Q_x^{T} K}{C^{2}}\right) V^{T}$$
This process also aligns the d dimension, which originates from the spatial dimension, to apply multi-head attention. The d dimension serves as a bridge variable, connecting information from different local spatial feature maps and integrating all global information. This construction captures the relationships between different feature shapes, even when the feature maps are far apart. The aim is to express more accurate local and global semantic relationships. Finally, super-resolution is applied to these feature maps, followed by reshaping to recover their original spatial dimensions, as shown in Equation (12).
$$F_x^{out} = \mathrm{upsample}\!\left(F_x^{out},\ \frac{a}{2^x} \times \frac{b}{2^x}\right)$$
As illustrated in Figure 7, the feature maps at smaller spatial scales exhibit a higher channel proportion, enabling the retention of more detailed information. This structural design endeavors to capture accurate semantic details through the Transformer-based architecture. Concurrently, the larger spatial scale feature maps leverage the detailed information from the smaller-scale maps to enhance their semantic accuracy. This dual-scale approach has the potential to significantly improve the model’s capacity to accurately monitor images, which is particularly crucial in indoor mining environments. The accurate identification of semantic and structural features is imperative for effective monitoring of these environments, highlighting the importance and potential of this model in such applications.
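The following sketch illustrates one consistent reading of Equations (8)–(12): every scale is projected to a shared spatial grid, attention is computed with the shared spatial dimension d as the inner dimension, and the results are upsampled back. The fixed grid size, the adaptive-pooling stand-in for the non-overlapping convolutions of Equation (8), and the residual fusion at the end are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UniteChNet(nn.Module):
    """Sketch of the cross-scale fusion in Equations (8)-(12). Feature maps from
    four encoder scales are projected to a common spatial grid, exchange
    information through attention over the shared spatial (d) dimension, and
    are upsampled back to their original resolutions."""
    def __init__(self, base_channels: int, grid: int = 8):
        super().__init__()
        self.grid = grid  # unified spatial size, an illustrative choice
        # Eq. (8) uses non-overlapping convolutions with scale-dependent kernels;
        # here adaptive pooling + 1x1 convs serve as a simplified stand-in.
        self.proj = nn.ModuleList([
            nn.Conv2d(base_channels * 2 ** x, base_channels * 2 ** x, kernel_size=1)
            for x in range(1, 5)
        ])

    def forward(self, features):                 # features[x-1]: (B, 2^x*C, H/2^x, W/2^x)
        b = features[0].shape[0]
        shapes = [f.shape for f in features]
        # Eqs. (8)-(9): unify spatial size and flatten each scale to (B, C_x, d).
        tokens = [
            self.proj[i](F.adaptive_avg_pool2d(f, self.grid)).flatten(2)
            for i, f in enumerate(features)
        ]
        kv = torch.cat(tokens, dim=1)            # Eq. (10): K = V, shape (B, sum(C_x), d)
        c_total = kv.shape[1]
        outputs = []
        for f, q, (_, c, h, w) in zip(features, tokens, shapes):
            attn = torch.softmax(q @ kv.transpose(-2, -1) / c_total ** 2, dim=-1)  # Eq. (11)
            out = (attn @ kv).reshape(b, c, self.grid, self.grid)
            # Eq. (12): upsample back to the original spatial size; residual fusion is assumed.
            outputs.append(f + F.interpolate(out, size=(h, w), mode="bilinear",
                                             align_corners=False))
        return outputs
```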

3.3. MELOL Dataset

The MELOL (Mining Environment Low-Light) dataset is a meticulously curated collection of imagery that aims to address the challenges posed by low-light conditions in mining environments. This dataset comprises a comprehensive array of paired low-light and normal-light images, customized to mirror the diverse and demanding conditions that are frequently encountered in mining operations.
Figure 1 presents a series of real low-light images taken from the MELOL dataset, offering a comprehensive representation of the diversity and intricacy of the captured environments. Figure 8, also from the MELOL dataset, delves into additional sections, emphasizing specific areas such as the ground production system (images d, e, f) and underground scenes (images a, b, c). These six images exemplify the typical illumination characteristics found in mining settings, including significant occlusions caused by large structural elements and the presence of uneven lighting within linear tunnels.
The MELOL dataset encompasses a total of 1622 images, which have been meticulously categorized across a range of specific scenes, as outlined in Table 1. The data collection process entailed the utilization of multiple sources to ensure comprehensive coverage and diversity:
  • Real-time underground monitoring footage: Obtained through the utilization of advanced monitoring systems, this visual documentation offers a genuine depiction of subterranean environments in conditions of limited luminosity.
  • Surface loading systems recordings: The objective of incorporating real-time recordings of surface operations was to establish a contrast between the illumination scenarios present in the subterranean environment and those experienced at the surface.
  • GoPro-captured coal loading images: The use of GoPro cameras during coal loading activities facilitated the acquisition of dynamic, high-resolution images, enhancing the dataset’s applicability to real-world mining tasks.
  • Photos taken inside silos: Detailed imagery from within silos introduces an additional dimension of complexity and variability to the dataset, given the distinct lighting and structural characteristics present within these structures.
Table 1. MELOL dataset.

Scene Category                     Scene                   Number of Images
Underground Mining Scenes          Working Face            310
                                   Roadways                277
                                   Chambers                255
Ground Production System Scenes    Semi-Enclosed Scenes    375
                                   Indoor Scenes           405
In addition to the aforementioned sources, a subset of the MELOL dataset incorporates images from the publicly available DsLMF+ dataset [62]. The DsLMF+ dataset, originally designed for the classification of subterranean objects, enriches MELOL by introducing a broader range of low-light conditions and environmental settings. This integration significantly enhances the dataset’s diversity and its relevance to various low-light scenarios encountered in underground mining operations.
In order to guarantee the highest possible quality and pertinence of the dataset under consideration, the following rigorous processing and validation steps were undertaken:
  • Illumination adjustment: All images were meticulously processed using Adobe Lightroom 2020 to create accurately paired low-light and normal-light versions. This process entailed precise adjustments to illumination levels, ensuring that each pair authentically represented the transition from challenging low-light conditions to optimal visibility.
  • Quality assurance through expert evaluation: A panel of 20 mining engineering students was tasked with evaluating a set of image pairs. Their assessments focused on the authenticity and quality of the images, ensuring that the dataset reliably mirrors real-world mining environments. This peer-reviewed validation process was crucial in maintaining the dataset’s integrity and applicability.
Through these comprehensive efforts, the MELOL dataset serves as a robust resource for developing and evaluating image processing algorithms that are aimed at enhancing visibility and operational safety in low-light mining environments. The dataset is notable for its diverse scenarios, multiple capture devices, balanced image distribution, and professional verification. These characteristics collectively ensure that MELOL is well suited to support advancements in low-light image enhancement and related research within the mining industry.
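For readers who want to train on paired data of this kind, a minimal PyTorch Dataset sketch is given below; the directory layout (low/ and high/ subfolders with matching file names) and the 128 × 128 aligned random crop are assumptions about how such pairs might be organized, not part of the released dataset specification.

```python
import os
import random
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PairedLowLightDataset(Dataset):
    """Paired low-light / normal-light images, one pair per file name.
    Assumed layout: root/low/xxx.png and root/high/xxx.png."""
    def __init__(self, root: str, patch_size: int = 128):
        self.low_dir = os.path.join(root, "low")
        self.high_dir = os.path.join(root, "high")
        self.names = sorted(os.listdir(self.low_dir))
        self.patch_size = patch_size
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        low = self.to_tensor(Image.open(os.path.join(self.low_dir, name)).convert("RGB"))
        high = self.to_tensor(Image.open(os.path.join(self.high_dir, name)).convert("RGB"))
        # Random aligned crop so that both images keep pixel-level correspondence.
        _, h, w = low.shape
        top = random.randint(0, h - self.patch_size)
        left = random.randint(0, w - self.patch_size)
        low = low[:, top:top + self.patch_size, left:left + self.patch_size]
        high = high[:, top:top + self.patch_size, left:left + self.patch_size]
        return low, high
```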
Although the current MELOL dataset covers various aspects of coal mine environments, it does not adequately address issues such as image blurring caused by dust and vibrations. Additionally, the dataset continues to exhibit limitations with respect to low-light conditions, as it is deficient in images captured under conditions of extreme darkness. Furthermore, the annotation of lighting conditions remains insufficient, suggesting the necessity for enhanced capture of a more extensive range of illumination scenarios.

4. Experiments and Analysis

4.1. Implementation Details

In the training process for the MEFormer model presented in this paper, input images were divided into 128 × 128 patches. The training was conducted using an NVIDIA 3090 GPU, with a batch size of 15. The loss function used was Smooth L1 loss, and the optimizer was Adam. A cosine annealing schedule was employed for learning rate decay, starting with an initial learning rate of $10^{-3}$ and a minimum learning rate of $10^{-6}$. The model was trained for a total of 20,000 epochs.
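The training recipe above corresponds to a short PyTorch loop such as the sketch below; MEFormer() and PairedLowLightDataset() stand in for the model of Section 3 and a paired-data loader, so both names are placeholders rather than released code.

```python
import torch
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MEFormer().to(device)                       # placeholder for the model of Section 3
loader = DataLoader(PairedLowLightDataset("MELOL", patch_size=128),
                    batch_size=15, shuffle=True)    # assumed paired-data loader

criterion = torch.nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Cosine annealing from 1e-3 down to a floor of 1e-6, as described in the paper.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20000, eta_min=1e-6)

for epoch in range(20000):
    for low, high in loader:
        low, high = low.to(device), high.to(device)
        optimizer.zero_grad()
        loss = criterion(model(low), high)
        loss.backward()
        optimizer.step()
    scheduler.step()
```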
For benchmarking, we compared a total of eight models, including the PSMUCM algorithm, which is specifically used for image enhancement in the mining field, and seven representative public algorithms: Retinex-Net, URetinex-Net, MIRNetv2, Retinexformer, EnlightenGAN, Zero-DCE, and LLFormer. Training data for the public algorithms were selected from existing publicly available research papers. For testing on our MELOL dataset, each model was fine-tuned with transfer learning for an additional 3000 epochs using the MELOL data.

4.2. Evaluation Metrics

In image quality evaluation, to verify the general applicability of our method and meet the specific requirements for structural integrity and authenticity in the mining industry, we adopted four evaluation metrics. These included the peak signal-to-noise ratio (PSNR), for assessing image restoration at the pixel level; the SSIM, for evaluating structural consistency; and the learned perceptual image patch similarity (LPIPS), for perceptual similarity. Additionally, we used the no-reference evaluation metric BRISQUE, which reflects human perception of image authenticity.
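The three full-reference metrics can be computed with widely used open-source implementations, as sketched below under the assumption that images arrive as RGB arrays scaled to [0, 1]; BRISQUE is omitted from the sketch because reference implementations of it vary.

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced: np.ndarray, target: np.ndarray, lpips_model=None):
    """enhanced and target are float RGB arrays of shape (H, W, 3) in [0, 1]."""
    psnr = peak_signal_noise_ratio(target, enhanced, data_range=1.0)
    ssim = structural_similarity(target, enhanced, channel_axis=2, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    if lpips_model is None:
        lpips_model = lpips.LPIPS(net="alex")
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1).unsqueeze(0).float() * 2 - 1
    lpips_score = lpips_model(to_tensor(enhanced), to_tensor(target)).item()
    return psnr, ssim, lpips_score
```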

4.3. Low-Light Image Dataset

  • LOL (Low-Light) dataset: This dataset contains both low-light and normal-light images [63]. It is commonly used for training and evaluating low-light image enhancement algorithms. The dataset provides paired images, allowing researchers to perform supervised learning for enhancement tasks.
  • DARK FACE dataset: Focused on face detection in low-light conditions, this dataset includes images captured in various low-light environments with different illumination levels [64]. It is designed to benchmark the performance of face detection algorithms under challenging lighting conditions.
  • SID (See-in-the-Dark) dataset: This dataset consists of raw short-exposure images taken in extremely low light conditions, along with corresponding long-exposure reference images [65]. It is used for tasks such as raw image denoising and enhancement.
  • MIT-Adobe FiveK: This dataset is a widely used collection of high-resolution photographs intended for research in the field of image enhancement, editing, and processing [66]. This dataset consists of 5000 images captured by various photographers using different cameras and under diverse lighting conditions. Each image in the dataset is accompanied by five expert-retouched versions, providing multiple interpretations of the best possible enhancements for each photo.
  • LOL-v2 (Low-Light dataset version 2): An updated version of the LOL dataset, LOL-v2 contains more challenging low-light scenarios and more diverse scene types, allowing for more robust training of low-light enhancement algorithms [67].

4.4. Comparison Results on Public Datasets

For benchmarking, we used two of the most popular datasets: LOL and MIT-Adobe FiveK. The results are as follows.
In Table 2, the first row represents the target image, which only includes values for no-reference evaluation metrics and serves as a reference. In Table 2, the symbol ’↓’ indicates that a lower value signifies better image enhancement performance. The numerical results show that our method outperforms others on the highly influential LOL-v1 dataset across all metrics when compared to the other seven methods. Notably, the improvements in the SSIM are significant, with our method achieving an increase of 0.04 compared to the method with the second-highest SSIM, Retinexformer, resulting in a final SSIM value of nearly 0.9 (with the maximum being 1.0). Additionally, our method achieves BRISQUE metrics equivalent to those of the ground truth image, highlighting its superiority over other methods. Unlike PSNR, which evaluates images at the pixel level, these two metrics better demonstrate the effectiveness of our MEFormer method in enhancing image illumination while preserving the original structural integrity of the images. On the MIT-Adobe FiveK dataset, MEFormer achieves the best scores in both the SSIM and BRISQUE, indicating its robust performance across various scenarios.
Figure 9 shows a comparison of methods with a PSNR greater than 20 on the LOL dataset. From the black box in Figure 9a, it is evident that our method closely matches the colors of the original image, with almost no color distortion. Examining the details within the black box, our method demonstrates the strongest ability to preserve image details and structure. The scattered threads maintain their overall structure better and remain clear compared to other methods. This is attributed to the UniteChNet structure in the MEFormer model, which enables the network to retain small-scale features effectively.

4.5. Comparison Results on MELOL Datasets

For the sake of convenience in experiments, we selected the three top-performing methods on public datasets for comparison. The experiments used transfer learning, where all methods were fine-tuned for an additional 3000 epochs based on their training on the LOL dataset. The MELOL dataset was then used for testing, and the results are shown in Table 3.
Table 3 presents the Params (M), FLOPS (G), and inference times (s; 10k images), which represent the model’s parameter size, the computational floating-point operations required, and the time taken to process 10,000 images of size 128 × 128 pixels on an H100 GPU, respectively. Based on the results, it can be concluded that the model proposed in this paper exhibits similar performance in terms of model size, floating-point operations, and processing speed compared to LLFormer. However, our proposed model requires fewer floating-point operations and has a faster processing speed compared to Retinexformer, reducing the time required to process the 10,000-image batch by roughly 1 s. In comparison to the MIRNetv2 model, our model processes 10,000 images at only 22 percent of the time required by MIRNetv2. This indicates that, although our model has a larger number of parameters (due to the use of axial attention, leading to higher memory usage), it still exhibits a computational advantage over other models in terms of floating-point operations. This enables efficient processing of low-light images and real-time performance.
From Table 2, it is evident that our method still outperforms the most commonly compared image processing methods in terms of evaluation metrics. Figure 10 illustrates four scenarios in mining environments where image enhancement is often required. Figure 10a,b are related to underground operations: the first involves monitoring the status of a coal mining machine, while the second focuses on monitoring hydraulic support maintenance personnel. Figure 10c,d simulate nighttime coal loading onto trucks and monitoring the details of the trucks, respectively.
In Figure 10a, our primary focus is on monitoring the status of the coal cutter. The example image shows the overall monitoring of the coal cutter’s operational status under slightly-lower-than-normal illumination conditions. Compared to the three other methods, our approach produces images that are closer to the real image when the illumination is not extremely low. This is evident in the higher contrast observed in the area around the coal cutter in our processed image. In contrast, the other three methods show noticeable blurring around the edges of the coal cutter.
In Figure 10b, our method achieves higher contrast in depicting the outlines of individuals and provides more accurate color reproduction, closer to the real colors. In underground work environments, workers often wear bright-colored uniforms. When compared with the three other methods, particularly LLFormer, it is evident that the color of the person influences the surrounding environment’s color. For example, the hydraulic support beneath the worker appears more blue, closely matching the color of the worker’s uniform.
The ability to effectively isolate the non-subject colors from influencing the main colors in the environment, thus enhancing contrast, is attributed to our MEFormer method. This method incorporates the UniteChNet channel, which enables features at different scales to express clearer and more distinct semantic relationships. By segmenting different features and minimizing the impact of relationships across various scales, our method enhances feature contrast in images, allowing people to better distinguish between small-scale objects and their surroundings.
In Figure 10c, workers remotely control the coal feeder to manage the amount of coal loaded onto the trucks. In the upper corner of the truck, near the coal pile, light obstruction and the presence of coal dust on the mostly black truck walls make it challenging to monitor the amount of coal being loaded in this area during nighttime operations. Compared to Retinexformer, our method provides greater differentiation between the coal flow and the truck wall in the upper corner of the truck, resulting in a clearer display of the coal flow. When compared to MIRNetv2, although that model shows a higher level of differentiation between the coal flow and the truck wall, it also introduces white spots (noise) within large monochromatic areas. This noise reduces the accuracy of information representation in the images, making our method more effective at providing clear and accurate visual information.
Figure 10d involves monitoring the trucks during loading, specifically focusing on the enlarged truck license plate, as shown in Figure 10. From the results in Figure 10, it is evident that our method produces images where the vehicle’s color is closer to the real color, and the white of the license plate is more distinctly separated from the color of the vehicle. In contrast, the images processed by Retinexformer and MIRNetv2 show the vehicle’s color leaning towards white, making it difficult to effectively distinguish the white license plate from the vehicle’s body. The images processed by LLFormer result in the vehicle’s color being overly bright and distorted, with too high a frequency, making it difficult for the human eye to intuitively distinguish between the license plate number and the vehicle’s body color.

5. Ablation Studies

MEFormer is an improved version of LLFormer, enhanced by adding a cross-multiscale joint processing structure called UniteChNet, based on Transformer. Therefore, our experimental design involved training both models on the LOL dataset for 20,000 epochs with a patch size of 128 × 128 .
The results, shown in Table 4, indicate that MEFormer achieves an increase of 2.1 in the PSNR and an improvement of 0.07 in the SSIM, reaching a value of 0.89 (with the maximum being 1.0). An SSIM of 0.89 indicates that the images processed by our method closely resemble the real target images, demonstrating the excellent performance of our UniteChNet cross-multiscale joint processing structure based on Transformer. This is particularly effective in applications where maintaining the original image structure is crucial, such as in high-fidelity mining environments.
To further investigate the critical components of MEFormer—namely, Axial Transformer (AT), CUT, and UniteChNet—ablation experiments were conducted. These experiments were based on a ResBlock architecture, progressively incorporating each of the three modules to evaluate their individual contributions to the overall performance of the model.
From the analysis of the results presented in Table 5, it is evident that the significant increase in image PSNR is primarily attributable to the incorporation of the AT module and the UniteChNet module. Both of these modules are built upon Transformer-based foundational architectures, which enhance the model’s expressive capacity, enabling it to capture and represent a broader range of features more effectively.
Specifically, replacing the ResBlock in the baseline model with the AT module results in a 10.6% increase in PSNR. Subsequently, the addition of the UniteChNet module leads to an additional 8.9% improvement in PSNR. Although the incremental gain from UniteChNet is smaller compared to the initial enhancement provided by the AT module, it is noteworthy that the PSNR on the LOL dataset reaches 23.65 after integrating UniteChNet. This value is very close to the upper range of inferred image quality, indicating that the overall improvement remains substantial.
In comparison to the PSNR enhancements, the inclusion of the UniteChNet structure yields a significantly more pronounced improvement in the SSIM. While the maximum possible SSIM value is 1, the UniteChNet module achieves an 8% increase in the SSIM. In contrast, other structural modifications only result in a 1% increase in the SSIM. Moreover, the SSIM values for these other structures remain very close to 0.90, demonstrating that the original image structure is largely preserved. Therefore, the UniteChNet module plays a crucial role in maintaining high image quality by effectively preserving structural integrity during illumination restoration.
In summary, the UniteChNet module not only contributes significantly to the improvement of the PSNR but also ensures substantial gains in the SSIM, thereby maintaining the authenticity and structural fidelity of the processed images. This underscores the effectiveness of UniteChNet in enhancing both the quantitative and qualitative aspects of low-light image restoration.

6. Conclusions

In this paper, we proposed a comprehensive reference solution for scenarios that require high fidelity, such as those prevalent in the mining industry. The proposed solution encompasses the MELOL dataset, which is designed for the purpose of enhancing video monitoring in mining environments, as well as the MEFormer model, a novel approach that facilitates more realistic processing of low-light images. Our research responds to the current paucity of datasets tailored to these specific scenarios, in addition to the necessity for a powerful and efficient AI-based image processing model. The experimental results demonstrate that our model generates images with better realism and improved distinguishability of similarly colored objects in low-light conditions. However, it is acknowledged that real-world mining environments frequently entail complex and compounded degradation scenarios that extend beyond low illumination, such as dust interference, haze, and motion blur. These supplementary challenges remain unaddressed by the current model. Henceforth, our future research endeavors will focus on extending the proposed model to accommodate multi-degradation adaptation. Specifically, we plan to incorporate dynamic environment-aware enhancement techniques and develop cross-degradation feature disentanglement mechanisms to improve the model’s robustness in such real-world conditions.

Author Contributions

Conceptualization, Z.S. (Zhenming Sun) and Z.S. (Zeqing Shen); methodology, Z.S. (Zhenming Sun) and Z.S. (Zeqing Shen); software, Z.S. (Zhenming Sun) and Z.S. (Zeqing Shen); validation, Z.S. (Zhenming Sun), Z.S. (Zeqing Shen), and N.C.; formal analysis, Z.S. (Zhenming Sun); investigation, all authors; resources, Z.S. (Zhenming Sun) and Z.S. (Zeqing Shen); data curation, S.P., H.L., Y.Y., H.W. and Y.Z.; writing—original draft, Z.S. (Zhenming Sun), Z.S. (Zeqing Shen), S.P., H.L., Y.Y., H.W. and Y.Z.; writing—review and editing, all authors; visualization, Z.S. (Zhenming Sun) and Z.S. (Zeqing Shen); supervision, N.C. and S.P.; project administration, N.C.; funding acquisition, N.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (No. 2022YFB4703701) and the Fundamental Research Funds for the Central Universities (No. 2023JCCXNY01).

Data Availability Statement

The MELOL dataset and implementation code are publicly available under MIT License at: https://github.com/ZeKing12/MELOL, accessed on 19 March 2025.

Acknowledgments

The authors appreciate the support by the National Key Research and Development Program of China (No. 2022YFB4703701) and the Fundamental Research Funds for the Central Universities (No. 2023JCCXNY01).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Niu, S. Coal mine safety production situation and management strategy. Manag. Eng. 2014, 14, 78–82. [Google Scholar]
  2. Miao, D.; Lv, Y.; Yu, K.; Liu, L.; Jiang, J. Research on coal mine hidden danger analysis and risk early warning technology based on data mining in China. Process Saf. Environ. Prot. 2023, 171, 1–17. [Google Scholar] [CrossRef]
  3. Wang, S.; Liu, G.; Song, Z.; Yang, K.; Li, M.; Chen, Y.; Wang, M. Three-Dimensional Deformation Prediction Based on the Improved Segmented Knothe–Dynamic Probabilistic Integral–Interferometric Synthetic Aperture Radar Model. Remote Sens. 2025, 17, 261. [Google Scholar] [CrossRef]
  4. Tian, X.; Yao, X.; Zhou, Z.; Tao, T. Surface Multi-Hazard Effects of Underground Coal Mining in Mountainous Regions. Remote Sens. 2025, 17, 122. [Google Scholar] [CrossRef]
  5. Milošević, I.; Stojanović, A.; Nikolić, Đ.; Mihajlović, I.; Brkić, A.; Perišić, M.; Spasojević-Brkić, V. Occupational health and safety performance in a changing mining environment: Identification of critical factors. Process Saf. Environ. Prot. 2025, 184, 106745. [Google Scholar] [CrossRef]
  6. Zhao, J.; Yu, H.; Dong, H.; Xie, S.; Cheng, Y.; Xia, Z. Analysis on dust prevention law of new barrier strategy in fully mechanized coal mining face. Process Saf. Environ. Prot. 2024, 187, 1527–1539. [Google Scholar] [CrossRef]
  7. Zhou, G.; Guo, H.; Shao, W.; Liu, Z.; Chen, X.; Chen, J.; Yan, G.; Hu, S.; Zhang, Y.; Sun, B. Spatiotemporal spreading characteristics of dust aerosols by coal mining machine cutting and “partition/multi-level composite field” atomization dust purification technology. Process Saf. Environ. Prot. 2024, 192, 386–400. [Google Scholar] [CrossRef]
  8. Qiang, X.; Li, G.; Sari, Y.A.; Fan, C.; Hou, J. Development of targeted safety hazard management plans utilizing multidimensional association rule mining. Heliyon 2024, 10, e40676. [Google Scholar] [CrossRef]
  9. Li, S.; You, M.; Li, D.; Liu, J. Identifying coal mine safety production risk factors by employing text mining and Bayesian network techniques. Process Saf. Environ. Prot. 2022, 162, 1067–1081. [Google Scholar] [CrossRef]
  10. Cheng, L.; Guo, H.; Lin, H. Evolutionary model of coal mine safety system based on multi-agent modeling. Process Saf. Environ. Prot. 2021, 147, 1193–1200. [Google Scholar]
  11. Solarz, J.; Gawlik-Kobylińska, M.; Ostant, W.; Maciejewski, P. Trends in energy security education with a focus on renewable and nonrenewable sources. Energies 2022, 15, 1351. [Google Scholar] [CrossRef]
  12. Wei, C.; Bai, L.; Chen, X.; Han, J. Cross-Modality Data Augmentation for Aerial Object Detection with Representation Learning. Remote Sens. 2024, 16, 4649. [Google Scholar] [CrossRef]
  13. Wu, D.; Zhang, S. Research on image enhancement algorithm of coal mine dust. In Proceedings of the SNSP, Xi’an, China, 28–31 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 261–265. [Google Scholar]
  14. Subramani, B.; Veluchamy, M. Fuzzy gray level difference histogram equalization for medical image enhancement. J. Med. Syst. 2020, 44, 103. [Google Scholar]
  15. Singh, K.; Kapoor, R. Image enhancement using Exposure based Sub Image Histogram Equalization. Pattern Recognit. Lett. 2014, 36, 10–14. [Google Scholar] [CrossRef]
  16. Vijayalakshmi, D.; Nath, M.K. A novel multilevel framework based contrast enhancement for uniform and non-uniform background images using a suitable histogram equalization. Digit. Signal Process. 2022, 127, 103532. [Google Scholar]
  17. Rao, B.S. Dynamic Histogram Equalization for contrast enhancement for digital images. Appl. Soft Comput. 2020, 89, 106114. [Google Scholar] [CrossRef]
  18. She, D. Retinex Based Visual Image Enhancement Algorithm for Coal Mine Exploration Robots. Informatica 2024, 48, 133–146. [Google Scholar] [CrossRef]
  19. Wu, C.; Wang, D.; Huang, K.; Wu, L. Enhancement of Mine Images through Reflectance Estimation of V Channel Using Retinex Theory. Processes 2024, 12, 1067. [Google Scholar] [CrossRef]
  20. Du, Y.; Tong, M.; Zhou, L.; Dong, H. Edge detection based on Retinex theory and wavelet multiscale product for mine images. Appl. Opt. 2016, 55, 9625–9637. [Google Scholar] [CrossRef]
  21. Shang, D.; Yang, Z.; Zhang, X.; Zheng, L.; Lv, Z. Research on low illumination coal gangue image enhancement based on improved Retinex algorithm. Int. J. Coal Prep. Util. 2023, 43, 999–1015. [Google Scholar]
  22. Wang, Z.; Hu, G.; Zhao, S.; Wang, R.; Kang, H.; Luo, F. Local Pyramid Vision Transformer: Millimeter-Wave Radar Gesture Recognition Based on Transformer with Integrated Local and Global Awareness. Remote Sens. 2024, 16, 4602. [Google Scholar] [CrossRef]
  23. Sngeorzan, D.D.; Păcurar, F.; Reif, A.; Weinacker, H.; Rușdea, E.; Vaida, I.; Rotar, I. Detection and Quantification of Arnica montana L. Inflorescences in Grassland Ecosystems Using Convolutional Neural Networks and Drone-Based Remote Sensing. Remote Sens. 2024, 16, 2012. [Google Scholar] [CrossRef]
  24. Li, J.; Chen, C.; Han, Y.; Chen, T.; Xue, X.; Liu, H.; Zhang, S.; Yang, J.; Sun, D. Wind Profile Reconstruction Based on Convolutional Neural Network for Incoherent Doppler Wind LiDAR. Remote Sens. 2024, 16, 1473. [Google Scholar] [CrossRef]
  25. Liang, Z.; Long, H.; Zhu, Z.; Cao, Z.; Yi, J.; Ma, Y.; Liu, E.; Zhao, R. High-Precision Disparity Estimation for Lunar Scene Using Optimized Census Transform and Superpixel Refinement. Remote Sens. 2024, 16, 3930. [Google Scholar] [CrossRef]
  26. Nan, Z.; Gong, Y. An Image Enhancement Method in Coal Mine Underground Based on Deep Retinex Network and Fusion Strategy. In Proceedings of the ICIVC, Qingdao, China, 23–25 July 2021; pp. 209–214. [Google Scholar]
  27. Zhou, W.; Li, L.; Liu, B.; Cao, Y.; Ni, W. A Multi-Tiered Collaborative Network for Optical Remote Sensing Fine-Grained Ship Detection in Foggy Conditions. Remote Sens. 2024, 16, 3968. [Google Scholar] [CrossRef]
  28. Cao, T.; Peng, T.; Wang, H.; Zhu, X.; Guo, J.; Zhang, Z. Multi-scale adaptive low-light image enhancement based on deep learning. J. Electron. Imaging 2024, 33, 043033. [Google Scholar]
  29. Li, N.; Gao, S.; Xue, J.; Zhang, Y. Downhole Image Enhancement Algorithm Based on Improved CycleGAN. In Proceedings of the CVIDL, Zhuhai, China, 19–21 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 216–220. [Google Scholar]
  30. Feng, Y.; Hou, S.; Lin, H.; Zhu, Y.; Wu, P.; Dong, W.; Sun, J.; Yan, Q.; Zhang, Y. DiffLight: Integrating Content and Detail for Low-light Image Enhancement. In Proceedings of the CVPR, Seattle, WA, USA, 17–21 June 2024; pp. 6143–6152. [Google Scholar]
  31. Onifade, M.; Said, K.O.; Shivute, A.P. Safe mining operations through technological advancement. Process Saf. Environ. Prot. 2023, 175, 251–258. [Google Scholar] [CrossRef]
  32. Tingjiang, T.; Changfang, G.; Guohua, Z.; Wenhua, J. Research and application of downhole drilling depth based on computer vision technique. Process Saf. Environ. Prot. 2023, 174, 531–547. [Google Scholar] [CrossRef]
  33. Gonzalez-Cortes, A.; Burlet-Vienney, D.; Chinniah, Y. Inherently safer design: An accident prevention perspective on reported confined space fatalities in Quebec. Process Saf. Environ. Prot. 2021, 149, 794–816. [Google Scholar] [CrossRef]
  34. Xu, P.; Zhou, Z.; Geng, Z. Safety monitoring method of moving target in underground coal mine based on computer vision processing. Sci. Rep. 2022, 12, 17899. [Google Scholar] [CrossRef]
  35. Hanif, M.W.; Li, Z.; Yu, Z.; Bashir, R. A lightweight object detection approach based on edge computing for mining industry. IET Image Process. 2024, 18, 4005–4022. [Google Scholar] [CrossRef]
  36. Dhal, K.G.; Das, A.; Ray, S.; Gálvez, J.; Das, S. Histogram equalization variants as optimization problems: A review. Arch. Comput. Methods Eng. 2021, 28, 1471–1496. [Google Scholar] [CrossRef]
  37. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  38. Garg, P.; Jain, T. A comparative study on histogram equalization and cumulative histogram equalization. Int. J. New Technol. Res. 2017, 3, 263242. [Google Scholar]
  39. Hussein, R.R.; Hamodi, Y.I.; Rooa, A.S. Retinex theory for color image enhancement: A systematic review. Int. J. Electr. Comput. Eng. 2019, 9, 5560. [Google Scholar] [CrossRef]
  40. Jiang, K.; Wang, Q.; An, Z.; Wang, Z.; Zhang, C.; Lin, C.W. Mutual retinex: Combining transformer and cnn for image enhancement. IEEE Trans. Emerging Top. 2024, 8, 2240–2252. [Google Scholar]
  41. Cao, X.; Yu, J. LLE-NET: A Low-Light Image Enhancement Algorithm Based on Curve Estimation. Mathematics 2024, 12, 1228. [Google Scholar] [CrossRef]
  42. Tian, Z.; Wu, J.; Zhang, W.; Chen, W.; Zhou, T.; Yang, W.; Wang, S. An illuminance improvement and details enhancement method on coal mine low-light images based on Transformer and adaptive feature fusion. Int. J. Coal Sci. Technol. 2024, 52, 297–310. [Google Scholar]
  43. Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. Mlp-mixer: An all-mlp architecture for vision. Adv. Neural Inform. Process. Syst. 2021, 34, 24261–24272. [Google Scholar]
  44. Bolya, D.; Fu, C.Y.; Dai, X.; Zhang, P.; Feichtenhofer, C.; Hoffman, J. Token merging: Your vit but faster. arXiv 2022, arXiv:2210.09461. [Google Scholar]
  45. Yi, X.; Xu, H.; Zhang, H.; Tang, L.; Ma, J. Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion. In Proceedings of the CVPR, Seattle, WA, USA, 17–21 June 2024; pp. 27026–27035. [Google Scholar]
  46. Yang, W.; Wang, S.; Wu, J.; Chen, W.; Tian, Z. A low-light image enhancement method for personnel safety monitoring in underground coal mines. Complex Intell. Syst. 2024, 10, 4019–4032. [Google Scholar]
  47. Wang, T.; Zhang, K.; Shen, T.; Luo, W.; Stenger, B.; Lu, T. Ultra-high-definition low-light image enhancement: A benchmark and transformer-based method. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 2654–2662. [Google Scholar]
  48. Wang, W.; Yan, D.; Wu, X.; He, W.; Chen, Z.; Yuan, X.; Li, L. Low-light image enhancement based on virtual exposure. Signal Process. Image Commun. 2023, 118, 117016. [Google Scholar] [CrossRef]
  49. Mythili, R.; bama, B.S.; Kumar, P.S.; Das, S.; Thatikonda, R.; Inthiyaz, S. Radial basis function networks with lightweight multiscale fusion strategy-based underwater image enhancement. Expert Syst. 2025, 42, e13373. [Google Scholar] [CrossRef]
  50. Ren, P.; Jia, Q.; Xu, Q.; Li, Y.; Bi, F.; Xu, J.; Gao, S. Oil Spill Drift Prediction Enhanced by Correcting Numerically Forecasted Sea Surface Dynamic Fields With Adversarial Temporal Convolutional Networks. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4701018. [Google Scholar]
  51. Yang, J.; Wu, C.; Du, B.; Zhang, L. Enhanced multiscale feature fusion network for HSI classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10328–10347. [Google Scholar]
  52. Zhang, Z.; Li, C. Front Matter: Volume 13091. In Proceedings of the Fifteenth International Conference on Signal Processing Systems (ICSPS 2023), Xi’an, China, 17–19 November 2023; 13091, pp. 1309101-1. [Google Scholar]
  53. Xu, K.; Chen, H.; Tan, X.; Chen, Y.; Jin, Y.; Kan, Y.; Zhu, C. HFMNet: Hierarchical feature mining network for low-light image enhancement. IEEE Trans. Instrum. Meas. 2022, 71, 5014014. [Google Scholar]
  54. Wu, C.; Wang, D.; Huang, K. Enhancement of Mine Images Based on HSV Color Space. IEEE Access 2024, 12, 72170–72186. [Google Scholar]
  55. Zhang, W.; Zuo, D.; Wang, C.; Sun, B. Research on image enhancement algorithm for the monitoring system in coal mine hoist. Meas. Control 2023, 56, 1572–1581. [Google Scholar]
  56. Qiao, J.; Wang, X.; Chen, J.; Jian, M. Low-light image enhancement with an anti-attention block-based generative adversarial network. Electronics 2022, 11, 1627. [Google Scholar] [CrossRef]
  57. Peng, B.; Zhang, X.; Lei, J.; Zhang, Z.; Ling, N.; Huang, Q. LVE-S2D: Low-Light Video Enhancement From Static to Dynamic. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 8342–8352. [Google Scholar] [CrossRef]
  58. Si, L.; Wang, Z.; Xu, R.; Tan, C.; Liu, X.; Xu, J. Image enhancement for surveillance video of coal mining face based on single-scale retinex algorithm combined with bilateral filtering. Symmetry 2017, 9, 93. [Google Scholar] [CrossRef]
  59. Huo, G.; Wu, J.; Ding, H. A two-stage image enhancement network for complex underground coal mine environment. In Proceedings of the ICDIP, Haikou, China, 24–26 May 2024; SPIE: Cergy-Pontoise, France, 2024; Volume 13274, pp. 436–445. [Google Scholar]
  60. Li, C.; Zheng, T.; Li, S.; Yu, C.; Gong, Y. Multi-Scale Enhancement and Sharpening Method for Visible Light Images in Underground Coal Mines. In Proceedings of the ICIVC, Dalian, China, 27–29 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 761–770. [Google Scholar]
  61. Sun, L.; Chen, S.; Yao, X.; Zhang, Y.; Tao, Z.; Liang, P. Image enhancement methods and applications for target recognition in intelligent mine monitoring. J. China Coal Soc. 2024, 49, 495–504. [Google Scholar]
  62. Yang, W.; Zhang, X.; Ma, B.; Wang, Y.; Wu, Y.; Yan, J.; Liu, Y.; Zhang, C.; Wan, J.; Wang, Y.; et al. An open dataset for intelligent recognition and classification of abnormal condition in longwall mining. Sci. Data 2023, 10, 416. [Google Scholar]
  63. Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep retinex decomposition for low-light enhancement. arXiv 2018, arXiv:1808.04560. [Google Scholar]
  64. Schmid, C.; Soatto, S.; Tomasi, C. Conference on Computer Vision and Pattern Recognition; IEEE Computer Society: Piscataway, NJ, USA, 2005. [Google Scholar]
  65. Chen, C.; Chen, Q.; Xu, J.; Koltun, V. Learning to see in the dark. In Proceedings of the CVPR, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3291–3300. [Google Scholar]
  66. Bychkovsky, V.; Paris, S.; Chan, E.; Durand, F. Learning photographic global tonal adjustment with a database of input/output image pairs. In Proceedings of the CVPR, Providence, RI, USA, 20–25 June 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 97–104. [Google Scholar]
  67. Afifi, M.; Derpanis, K.G.; Ommer, B.; Brown, M.S. Learning multi-scale photo exposure correction. In Proceedings of the CVPR, Nashville, TN, USA, 20–25 June 2021; pp. 9157–9167. [Google Scholar]
  68. Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [PubMed]
  69. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  70. Neukirch, K. Moldava. In Elections in Europe; Nomos Verlagsgesellschaft mbH & Co. KG: Baden-Baden, Germany, 2010; pp. 1313–1348. [Google Scholar]
  71. Wu, W.; Weng, J.; Zhang, P.; Wang, X.; Yang, W.; Jiang, J. Uretinex-net: Retinex-based deep unfolding network for low-light image enhancement. In Proceedings of the CVPR, New Orleans, LA, USA, 18–24 June 2022; pp. 5901–5910. [Google Scholar]
  72. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Learning enriched features for fast image restoration and enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1934–1948. [Google Scholar]
  73. Cai, Y.; Bian, H.; Lin, J.; Wang, H.; Timofte, R.; Zhang, Y. Retinexformer: One-stage retinex-based transformer for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 12504–12513. [Google Scholar]
  74. Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. Enlightengan: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349. [Google Scholar]
  75. Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S.; Cong, R. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1780–1789. [Google Scholar]
Figure 1. Representative real-world images of typical low-light scenes in mining environments: (a) Maintenance workers repairing the scraper conveyor in front of the hydraulic support. (b) Real-time monitoring of a longwall shearer during mining operations. (c) A monitoring image of a hydraulic support. (d,e) Images captured with a GoPro inside the silo and during the process of coal loading onto trucks. (f) Footage of coal discharging operations recorded by on-site surveillance cameras.
Figure 2. MEFormer architecture overview. (a) Overall model architecture. An overview of the complete model structure, accompanied by two illustrative images: one depicting the original low-light image and the other showcasing the resultant enhanced image after processing. (b) UniteATBlock: A detailed representation of the UniteATBlock within the model, designed for efficient feature extraction and enhancement of the processed images. (c) UniteEDBlock: Illustration of the UniteEDBlock, which is intended to determine the relationships between features at different scales to ensure the authenticity and naturalness of the enhanced images.
Figure 3. UniteEDBlock.
Figure 4. Axial Transformer.
Figure 5. Axial attention and feed forward network.
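Figures 4 and 5 depict the Axial Transformer and its attention/feed-forward components, which are not reproduced in this text version. For orientation only, the following PyTorch sketch shows a generic axial-attention block of the kind those figure titles refer to: self-attention applied along the height axis, then the width axis, followed by a pointwise feed-forward layer with residual connections. All class names, dimensions, and the exact block composition are illustrative assumptions, not the authors' implementation.

```python
# Generic axial-attention sketch (assumed structure, not the paper's code).
import torch
import torch.nn as nn


class AxialAttention(nn.Module):
    """Multi-head self-attention applied along one spatial axis (H or W)."""

    def __init__(self, dim: int, heads: int = 4, axis: str = "h"):
        super().__init__()
        self.axis = axis
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        b, c, h, w = x.shape
        if self.axis == "h":
            # treat each image column as a sequence of length H
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        else:
            # treat each image row as a sequence of length W
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.attn(seq, seq, seq)
        if self.axis == "h":
            out = out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        else:
            out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return out


class AxialBlock(nn.Module):
    """Height attention -> width attention -> pointwise feed-forward, each with a residual."""

    def __init__(self, dim: int, heads: int = 4, expansion: int = 2):
        super().__init__()
        self.attn_h = AxialAttention(dim, heads, axis="h")
        self.attn_w = AxialAttention(dim, heads, axis="w")
        self.ffn = nn.Sequential(
            nn.Conv2d(dim, dim * expansion, 1),
            nn.GELU(),
            nn.Conv2d(dim * expansion, dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn_h(x)
        x = x + self.attn_w(x)
        return x + self.ffn(x)


if __name__ == "__main__":
    block = AxialBlock(dim=32)
    y = block(torch.randn(1, 32, 64, 48))
    print(y.shape)  # torch.Size([1, 32, 64, 48])
```

Factorizing attention along the two axes keeps the cost linear in image height and width rather than quadratic in the number of pixels, which is the usual motivation for axial designs in image restoration.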
Figure 6. Channel Unite Transformer (CUT).
Figure 7. A comprehensive overview of UniteChNet.
Figure 8. Examples from the MELOL dataset.
Figure 9. Comparison of results on the LOL dataset. The images in (a–f) are all from the LOL test set.
Figure 10. Four typical scenarios within the mining environment. (a) Coal shearer. (b) Hydraulic support. (c) Discharging coal from the silo and loading it onto a truck. (d) Truck and license plate.
Table 2. Quantitative comparisons of different methods on LOL-v1 and MIT-Adobe FiveK. ↑: higher values indicate better image quality; ↓: lower values indicate better image quality; \: this indicator does not apply to ground-truth images; red and bold font: the best value for each indicator among all methods.
Method             |            LOL-v1                   |      MIT-Adobe FiveK
                   | PSNR↑  SSIM↑  BRIS↑[68]  LPIPS↓[69] | PSNR↑  SSIM↑  BRIS↑  LPIPS↓
Ground Truth       |   \      \      2.12        \       |   \      \     2.11     \
PSMUCM [70]        | 18.75  0.78     0.53                | 18.26  0.58    0.61
Retinex-Net [63]   | 16.77  0.43     0.48       0.47     | 17.63  0.73    0.56    0.25
URetinex-Net [71]  | 20.04  0.83     0.48       0.42     | 18.32  0.77    0.50    0.23
MIRNetv2 [72]      | 25.04  0.85     0.51       0.26     | 24.84  0.88    0.57    0.27
Retinexformer [73] | 25.16  0.85     0.50       0.18     | 24.52  0.89    0.58    0.20
EnlightenGAN [74]  | 17.48  0.65     0.52       0.32     | 17.91  0.84    0.59    0.14
Zero-DCE [75]      | 14.86  0.56     0.49       0.34     | 15.93  0.77    0.49    0.16
LLFormer [47]      | 23.65  0.82     2.11       0.17     | 25.75  0.92    1.87    0.04
Ours               | 25.75  0.89     2.12       0.15     | 26.47  0.94    2.06    0.07
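For reference, full-reference metrics such as those in Table 2 are typically computed per image pair against the ground truth and averaged over the test set. The snippet below is a minimal sketch using scikit-image for PSNR/SSIM and the lpips package for LPIPS; the exact evaluation protocol (color space, data range, and the BRISQUE implementation, which is omitted here) is an assumption and is not taken from the paper.

```python
# Minimal per-pair metric sketch (assumed protocol, not the paper's evaluation code).
import numpy as np
import torch
import lpips                                   # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")             # perceptual distance network


def evaluate_pair(enhanced: np.ndarray, reference: np.ndarray) -> dict:
    """Both inputs are HxWx3 uint8 RGB arrays."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)

    # LPIPS expects tensors in [-1, 1] with shape (N, 3, H, W)
    to_tensor = lambda img: torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 127.5 - 1.0
    with torch.no_grad():
        lp = lpips_fn(to_tensor(enhanced), to_tensor(reference)).item()

    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
```

Averaging the returned dictionaries over all test pairs yields table entries of the form reported above (higher PSNR/SSIM and lower LPIPS indicating better fidelity).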
Table 3. Results on the low-light image dataset for mining environments (MELOL), together with computational efficiency and memory usage. Red and bold font: the best value for each indicator among all methods.
Method        |           MELOL               | Computational Efficiency and Memory Usage
              | PSNR↑  SSIM↑  BRIS↑  LPIPS↓   | Params (M)  FLOPS (G)  Inference Time (s; 10k images)
MIRNetv2      | 22.84  0.86   1.31   0.23     |    5.0        34.88          12.7
Retinexformer | 26.16  0.89   1.48   0.13     |    1.61        3.93           3.8
LLFormer      | 25.72  0.86   1.47   0.17     |   24.52        3.46           2.7
Ours          | 26.34  0.91   1.62   0.12     |   29.34        3.56           2.8
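The parameter counts and timings in Table 3 can in principle be reproduced with straightforward PyTorch bookkeeping; FLOPs are usually obtained with a profiler such as thop or fvcore. The sketch below shows one plausible measurement routine; the batch size, warm-up policy, and CUDA synchronization are assumptions about the protocol, not details reported by the authors.

```python
# Illustrative efficiency measurement (assumed protocol, not the paper's benchmark script).
import time
import torch


def count_params_m(model: torch.nn.Module) -> float:
    """Total trainable-plus-frozen parameters, in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6


@torch.no_grad()
def time_inference(model, n_images=10_000, batch=100, size=128, device="cuda"):
    """Wall-clock seconds to enhance n_images random RGB crops of size x size."""
    model = model.to(device).eval()
    x = torch.randn(batch, 3, size, size, device=device)
    for _ in range(5):                       # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_images // batch):
        model(x)
    torch.cuda.synchronize()
    return time.perf_counter() - start       # compare against the Inference Time column in Table 3
```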
Table 4. Performance metrics of models with and without the proposed structure. Red and bold font: the best value for each indicator among all structures.
Structure             | Params (M) | FLOPS (G) | SSIM | PSNR
Base (LLFormer)       |   24.52    |   3.46    | 0.82 | 23.65
LLFormer + UniteChNet |   29.34    |   3.56    | 0.89 | 25.75
Table 5. Progressive integration of the proposed structures into ResNet. Red and bold font: the best value for each indicator among all models.
Model | AT | CUT | UniteChNet | Params (M) | FLOPS (G) | SSIM | PSNR
Base  |    |     |            |   13.87    |   1.83    | 0.80 | 20.85
a     | ✓  |     |            |   19.78    |   2.50    | 0.81 | 23.07
b     | ✓  | ✓   |            |   24.52    |   3.46    | 0.82 | 23.65
c     | ✓  | ✓   |     ✓      |   29.34    |   3.56    | 0.89 | 25.75
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
