Enhancement of Optical Wireless Discrete Multitone Channel Capacity Based on Li-Fi Using Sparse Coded Mask Modeling
Abstract
1. Introduction
1.1. Technical Motivation Based on MAE
1.1.1. Fundamental Reason: True Time-Domain Compression and Self-Supervised Reconstruction
- ■
- Physical Signal Shortening vs. Traditional Equalization
- Traditional deep learning methods for optical wireless links (e.g., CNN- or DNN-based equalizers) assume full waveform transmission and focus on compensating channel impairments.
- Our goal, however, is to physically “shorten” the transmitted DMT waveform (by deleting large fractions of samples), thereby freeing up more channel capacity.
- Because the masked autoencoder (MAE) forces the model to learn how to “fill in” missing segments, we can remove a substantial portion (up to 75–85%) of the waveform and still reconstruct it accurately at the receiver.
- This self-supervised “inpainting” approach—where the network learns from unmasked patches to predict masked ones—differentiates the MAE from other autoencoders or supervised methods.
- ■
- Self-Supervised Learning with Minimal Overhead
- A masked autoencoder does not require specialized pilot sequences or external labels; the only references for missing parts are the original waveforms themselves.
- It is thus naturally suited to handling missing data in a self-supervised manner, minimizing extra signaling overhead often necessary in end-to-end or classical channel-estimation approaches.
- ■
- Better Generalization and Reduced Overfitting
- Through random masking, an MAE forces the model to learn global structure rather than memorizing local correlations.
- Our experiments show that increasing the masking ratio can, counterintuitively, lower the error vector magnitude (EVM), because the model becomes more robust to variations in the DMT waveform.
- Standard autoencoders (without masking) risk local overfitting, as they see the entire waveform in each training pass.
1.1.2. Technical Justification: Treating DMT as Multivariate Time-Series
- ■
- DMT Waveforms as Patches
- We divide the original DMT waveform into patches, then randomly delete (mask) a fraction of them. The MAE reconstructs these missing patches by contextualizing the unmasked patches.
- This works well because DMT signals are effectively multivariate time-series data, with multiple subcarriers modulated in parallel.
- The MAE’s self-attention mechanism can exploit both time-domain continuity and cross-subcarrier correlations more effectively than purely local methods.
- ■
- Positional Encoding for Time-Series Reconstruction
- We embed each patch with a positional encoding, so the MAE knows where each patch belongs.
- This is crucial in a time-series context: self-attention alone is insensitive to the order of its inputs, so temporal and subcarrier ordering must be supplied explicitly.
- By leveraging the transformer-style attention, the MAE can identify subtle relationships across patches, enabling accurate reconstructions even with high masking ratios.
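The patching, random masking, and positional tagging described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the toy waveform, the choice of 32 patches with 75% masking, and the use of a fixed sinusoidal encoding are all assumptions made for the example.

```python
import math
import random


def split_into_patches(samples, num_patches):
    """Split a 1-D waveform into equal-length, non-overlapping patches."""
    if len(samples) % num_patches != 0:
        raise ValueError("waveform length must be divisible by num_patches")
    size = len(samples) // num_patches
    return [samples[i * size:(i + 1) * size] for i in range(num_patches)]


def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches that are kept (transmitted)."""
    num_masked = int(round(mask_ratio * num_patches))
    kept = rng.sample(range(num_patches), num_patches - num_masked)
    return sorted(kept)


def sinusoidal_encoding(position, dim):
    """Fixed sinusoidal positional encoding for one patch index (assumed scheme)."""
    enc = []
    for k in range(0, dim, 2):
        angle = position / (10000 ** (k / dim))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc[:dim]


# Toy waveform: 256 samples, 32 patches, 75% masking (illustrative values only).
rng = random.Random(0)
waveform = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
patches = split_into_patches(waveform, 32)
kept = random_mask(32, 0.75, rng)

# Only the kept patches, tagged with their original index, would be transmitted.
transmitted = [(idx, patches[idx]) for idx in kept]
print(len(transmitted), "of", len(patches), "patches transmitted")
```

Keeping the original patch index alongside each transmitted patch is what lets the receiver place the visible patches correctly before filling in the masked ones.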
1.1.3. Why Not a Standard Autoencoder or Other Alternatives?
- ■
- Standard Autoencoders
- Typical autoencoders assume access to all (or nearly all) input features, then compress them into a latent dimension.
- They do not physically remove data but only map the full input to a lower-dimensional latent space. Hence, they do not yield a genuine time-domain compression that can be exploited for increasing channel capacity.
- By contrast, an MAE literally masks out entire patches, demanding that the decoder reconstruct them from only partial information.
- ■
- CNN or FNN-based Approaches
- CNNs can handle spatially or temporally correlated data but typically focus on local receptive fields; they struggle to capture long-range dependencies if large chunks of data are missing.
- FNNs fully connect all inputs but are not naturally suited to partial-observation “inpainting” tasks.
- MAEs (particularly those using self-attention) adapt more easily to scenarios with large missing portions by leveraging global context in an end-to-end learned fashion.
- ■
- End-to-End Deep Learning without Masking
- While end-to-end methods can improve channel equalization, they still transmit the complete waveform.
- Consequently, they cannot provide the effective capacity gains we achieve by physically not sending up to 75–85% of the samples.
1.1.4. Practical Impact of the Proposed Technique on Li-Fi Transmission
- ■
- Increasing Channel Capacity Without New Hardware
- Li-Fi’s bandwidth limitations arise from LED response constraints. Our approach circumvents these constraints by truly compressing the waveform in time.
- Even with standard optics and the same nominal 10 MHz bandwidth, we achieve up to 1.85× capacity by transmitting fewer samples per symbol interval and reconstructing them later.
- ■
- Robustness Under Realistic Conditions
- As we showed experimentally (with 1 m free-space Li-Fi, indoor illumination levels of 300–440 lux), the MAE-based approach remains feasible and robust.
- A high masking ratio (75–85%) still yields acceptable EVM under the first-generation FEC (GFEC) threshold.
- In summary, the proposed technique offers:
- Actual time-domain compression (masking) and subsequent self-supervised reconstruction,
- Improved generalization through forced global pattern learning, and
- Real capacity gains without hardware changes, going beyond mere channel equalization.
1.2. Other Existing Works and Key Contributions of Proposed Technique
- ■
- True Time-Domain Compression Through Random Masking
- Unlike many deep learning approaches that transmit the complete waveform and focus on post-transmission equalization, our method removes (masks) up to 75–85% of the DMT signal pre-transmission.
- A masked autoencoder then reconstructs these missing segments at the receiver, which increases the effective channel capacity up to 1.85× over a 10 MHz bandwidth.
- ■
- Viewing Multicarrier DMT Waveforms as Time-Series ‘Patches’
- We divide the QPSK/16-QAM DMT waveforms into 1-D patches and apply positional encoding to identify each patch’s location.
- This design captures long-range dependencies via self-attention, facilitating robust patch-wise masking and inpainting—even under high masking ratios.
- ■
- Extensive Experimental Validation
- We systematically measure EVM versus masking ratio, number of patches, and batch size in an actual Li-Fi testbed (1 m range, 300–440 lux).
- Notably, our results show that higher masking ratios can sometimes reduce errors because the MAE is forced to learn global waveform structure instead of overfitting local features.
- Our approach remains practically feasible above ~240 lux, typical of indoor illumination levels.
1.3. Conceptual Comparison with Representative State-of-the-Art (SOTA) Techniques
- Representative SOTA techniques for increasing Li-Fi capacity fall into three groups:
- Hardware improvements, such as LED driver optimization or photodiode sensitivity enhancement.
- Equalization/filtering methods, such as RLS/FIR/DFE adaptive filters.
- Modulation techniques, like CAP, OFDM with high-order QAM, or VLC-optimized schemes.
- Our method supports waveform compression up to 85% masking, which traditional methods do not address.
- Unlike FNN/CNN models that are limited to local features or final-stage demapping, the MAE captures long-range temporal dependencies and reconstructs full time-series waveforms.
- Our approach also works without requiring strict channel models or Nyquist compliance, which many traditional DSP techniques depend on.
2. Augmentation of Li-Fi Wireless DMT Transmission Capacity Through Sparse Coded Mask Optimization
2.1. Sparse Coded Mask Modeling for DMT Signal
Algorithm 1. Masked Autoencoder (MAE)-Based Sparse Coded Mask Modeling
1: Input: Received DMT patches (both available and masked).
2: Step 1, Patch Embedding: Apply a 1-D convolutional layer to transform each patch into a feature space. Embed position information using positional encoding.
3: Step 2, Encoding: Feed the unmasked patches into a transformer encoder. Learn contextual relationships using multi-head self-attention (MHSA) layers.
4: Step 3, Masked Patch Reconstruction: Input the encoded features into a decoder network. Predict the missing patches based on surrounding patch information.
5: Step 4, Loss Calculation: Compute the MSE loss between the predicted and original DMT patches.
6: Step 5, Output the Reconstructed DMT Signal: Reassemble the patches and apply the inverse transformation to reconstruct the full DMT waveform.
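The steps of Algorithm 1 can be sketched as a single forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, one attention head without feed-forward sublayers stands in for the transformer encoder and decoder, the positional encoding is sinusoidal, the loss is taken over masked patches only, and all sizes (32 patches of 8 samples, 16-dimensional embeddings, 75% masking) are chosen for the example. A 1-D convolution whose kernel length and stride both equal the patch length is equivalent to the per-patch linear map used here.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[-1])
    return x + softmax(scores) @ v


def positional_encoding(num_patches, dim):
    """Sinusoidal encoding (assumed scheme); dim must be even."""
    pos = np.arange(num_patches)[:, None]
    k = np.arange(0, dim, 2)[None, :]
    angles = pos / (10000 ** (k / dim))
    pe = np.zeros((num_patches, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


# Illustrative sizes (assumptions, not values from the paper).
num_patches, patch_len, dim, mask_ratio = 32, 8, 16, 0.75
n = num_patches * patch_len
waveform = np.sin(2 * np.pi * 5 * np.arange(n) / n)
patches = waveform.reshape(num_patches, patch_len)

# Random, untrained weights: the sketch shows the data flow only.
w_embed = rng.normal(0, 0.1, (patch_len, dim))
enc_w = [rng.normal(0, 0.1, (dim, dim)) for _ in range(3)]
dec_w = [rng.normal(0, 0.1, (dim, dim)) for _ in range(3)]
w_out = rng.normal(0, 0.1, (dim, patch_len))
mask_token = np.zeros(dim)

# Choose which patches are transmitted (kept) and which are masked.
perm = rng.permutation(num_patches)
num_keep = num_patches - int(round(mask_ratio * num_patches))
keep_idx, mask_idx = np.sort(perm[:num_keep]), np.sort(perm[num_keep:])
pe = positional_encoding(num_patches, dim)

# Step 1: embed the visible patches and add their positions.
tokens = patches[keep_idx] @ w_embed + pe[keep_idx]
# Step 2: encode the visible patches only.
encoded = self_attention(tokens, *enc_w)
# Step 3: decoder input = encoded tokens plus mask tokens, all position-tagged.
full = np.tile(mask_token, (num_patches, 1)) + pe
full[keep_idx] = encoded + pe[keep_idx]
predicted = self_attention(full, *dec_w) @ w_out
# Step 4: MSE between predicted and original patches (masked positions).
loss = np.mean((predicted[mask_idx] - patches[mask_idx]) ** 2)
# Step 5: reassemble, keeping received patches and filling gaps with predictions.
reconstructed = predicted.copy()
reconstructed[keep_idx] = patches[keep_idx]
waveform_hat = reconstructed.reshape(-1)
print(f"masked patches: {len(mask_idx)}, untrained MSE: {loss:.4f}")
```

Because the weights are untrained, the printed loss carries no meaning; in practice the weights would be fitted by minimizing this loss in a framework with automatic differentiation.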
2.2. Impact of the Masking Ratio on the Channel Capacity
2.2.1. Revisiting Our Masked Autoencoder and Signal Compression
- ■
- Definition of Masking Ratio
- Suppose the original DMT waveform for one symbol duration has M samples.
- A masking ratio $r$ means that a fraction $r$ of the total patches (or samples) is deleted, and only the remaining fraction $1 - r$ is actually transmitted over the optical link.
- In essence, the time-domain length of the transmitted signal (after masking) becomes $(1 - r)M$ samples.
- ■
- Physical Shortening in the Time Domain
- Because these masked samples are not transmitted at all, we effectively reduce the symbol’s physical duration.
- In a typical data transmission system, if we can compress a waveform to a smaller time interval while still recovering it accurately at the receiver, we can fit more waveforms within a fixed total time interval (e.g., 1 s).
2.2.2. Linking the Masking Ratio to Channel Capacity
- (1)
- Idealized Capacity Gain Analysis
- ■
- Baseline (No Masking) Case
- Let the original DMT waveform occupy a duration $T$. Within each symbol interval, we transmit $M$ samples.
- The baseline capacity, in bits per second (bps), can be expressed as Equation (9).
- If we are using a certain modulation format (e.g., QPSK, 16-QAM), “Bits/symbol” is determined by the number of bits encoded in each DMT symbol.
- ■
- Masked Transmission
- When a fraction $r$ is masked, the transmitted portion has duration $(1 - r)T$, where $T$ is the unmasked symbol duration.
- If we assume that we strictly use that shorter time to send the compressed DMT waveform and can pack more compressed symbols into the same total time, the capacity can be described as Equation (10).
- This suggests an ideal capacity improvement factor, as shown in Equation (11).
- ■
- Interpretation
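- If $r = 0.75$ (75% masked), then the ideal factor is $1/(1 - r) = 4$. In an ideal scenario with negligible overhead, we could quadruple throughput.
- If $r = 0.85$, the ideal factor is $1/(1 - r) \approx 6.67$.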
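Equations (9) to (11) are referenced above but are not reproduced in this text. One reading that is consistent with the surrounding definitions and with the quoted factors of 4 and 6.67 is given below. The symbols $C_0$, $C_r$, $G_{\text{ideal}}$, $b$ (bits per DMT symbol), and $T$ (unmasked symbol duration) are assumed notation, not necessarily the authors' own.

```latex
\begin{align}
C_{0} &= \frac{b}{T}
  && \text{(baseline, cf. Eq. 9)} \\
C_{r} &= \frac{b}{(1 - r)\,T}
  && \text{(masked transmission, cf. Eq. 10)} \\
G_{\text{ideal}} &= \frac{C_{r}}{C_{0}} = \frac{1}{1 - r}
  && \text{(ideal gain, cf. Eq. 11)}
\end{align}
```

Substituting $r = 0.75$ gives $G_{\text{ideal}} = 4$, and $r = 0.85$ gives $G_{\text{ideal}} \approx 6.67$, matching the interpretation above.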
- (2)
- Practical Considerations and the 1.85× Factor
- ■
- Overheads and Guard Intervals
- In practice, our experiment must incorporate positional encoding, guard intervals, zero padding, and other overheads that reduce the net gain from the theoretical $1/(1 - r)$.
- We also rely on the masked autoencoder’s ability to reconstruct the missing patches reliably. If the masking ratio is too high, the system may fail to meet our EVM threshold for reliable demodulation.
- ■
- Experimental Feasibility
- In our Li-Fi testbed, we found that at masking ratios of up to 85%, the error vector magnitude (EVM) for QPSK/16-QAM still remains below the first-generation GFEC limit.
- Accounting for all overhead (MAE training overhead, patch boundary, zero padding, etc.), our experiments showed an effective capacity gain of up to ~1.85× (instead of the pure theoretical factor near 6.67).
- This value (1.85×) reflects a realistic balance between high masking ratios and maintaining acceptable EVM under practical Li-Fi conditions (1 m distance, ≥240 lux illumination).
2.2.3. Mathematical Derivation
- ■
- Masked DMT Signal Length: the length of the DMT signal after masking can be described as Equation (12).
- ■
- Per-Symbol Transmission Time
- If the system transmits each masked symbol in a fraction $(1 - r)$ of the original symbol duration, then the instantaneous symbol rate can be increased proportionally, provided we keep the same overall bandwidth or sampling rate.
- ■
- Maximum Achievable Data Rate
- Let the number of bits per DMT symbol be fixed by the QAM order and the number of subcarriers. In an ideal scenario with no overhead, the maximum achievable data rate can be expressed as Equation (13).
- ■
- Realistic Rate Improvement
- Let $\eta$ be an experimental efficiency factor ($0 < \eta \le 1$) representing overhead or imperfection. Our measured rate gain is given by Equation (14).
- In practice, we empirically observed $\eta$ to be significantly below 1, resulting in about 1.85× final measured improvement for $r = 0.85$.
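To make the gap between the ideal and measured gains concrete, the efficiency factor can be backed out from the reported numbers. The sketch below assumes that Equation (14) has the multiplicative form measured gain = efficiency × 1/(1 − r); that form, and the function names, are assumptions for illustration.

```python
def ideal_gain(mask_ratio):
    """Ideal capacity gain 1 / (1 - r) when a fraction r of samples is not sent."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must lie in [0, 1)")
    return 1.0 / (1.0 - mask_ratio)


def implied_efficiency(measured_gain, mask_ratio):
    """Efficiency factor implied by a measured gain (assumed multiplicative model)."""
    return measured_gain / ideal_gain(mask_ratio)


# Reported values: about 1.85x measured gain at an 85% masking ratio.
eta = implied_efficiency(measured_gain=1.85, mask_ratio=0.85)
print(f"ideal gain at r = 0.85: {ideal_gain(0.85):.2f}")
print(f"implied efficiency:     {eta:.4f}")
```

Under this assumed model the implied efficiency is roughly 0.28, which is one way to quantify "significantly below 1".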
2.2.4. Additional Factors Influencing Masking Ratio
- ■
- EVM Threshold
- As shown in our experimental plots, once r exceeds a certain level, the reconstruction’s accuracy degrades, pushing EVM beyond forward error correction thresholds.
- Thus, there is an optimal or near-optimal range for r (around 75–85% in our testbed) where capacity is maximized while maintaining acceptable symbol quality.
- ■
- Number of Patches, Positional Encoding Overhead
- If patches are too large, we lose the fine-grained advantage of self-attention. If they are too small, overhead and complexity rise.
- ■
- Complex Modulation Formats
- Surprisingly, for 16-QAM (a more complex modulation format), we observed better reconstruction performance at higher masking ratios compared to QPSK. We attribute this to the richer signal structure of 16-QAM, which gives the MAE more detailed features to exploit.
- Future expansions can consider advanced modulation orders or other waveforms, further exploring how the masking ratio and overhead interplay to affect net capacity gains.
3. Experimental Setup
3.1. Li-Fi Transmission Link Testbed Based on Sparse Coded Mask Model
- Controlled Channel Characteristics: Short-range optical transmission minimizes the impact of unpredictable environmental variables such as multipath reflections, non-line-of-sight propagation, and ambient noise. This enables us to isolate and evaluate the effect of our proposed masking, compression, and reconstruction techniques on the DMT waveform.
- Fine-grained Signal Capture: At shorter distances, signal distortion is minimized, allowing us to accurately measure subtle waveform differences introduced by the MAE process. These include time-domain waveform shape, constellation distortion (EVM), and patch recovery quality.
- Reproducibility and System Stability: A 1 m setup ensures consistent illumination intensity (300–440 lux) and signal power at the photo-detector, making it easier to reproduce results and compare performance under controlled variations in masking ratio, patch number, and batch size.
- Focus on Concept Validation: As stated in the manuscript (Section 3, p. 6), the main focus of this work was to validate the concept and behavior of the proposed sparse coded mask modeling technique when applied to Li-Fi DMT signals, rather than to develop a fully optimized long-range Li-Fi system.
- Transmission over Extended Distances: The proposed MAE-based framework is agnostic to transmission distance as long as the signal-to-noise ratio (SNR) at the receiver is sufficient to recover the transmitted DMT patches. In fact, as shown in Figure 7 of the manuscript, our system successfully recovered signals even at SNR levels as low as 22.5 dB, indicating robustness to moderate noise or signal degradation that may occur over longer distances.
- Adaptation to Ambient Illumination and Multipath Effects: The proposed method already accounts for variable lighting conditions, as demonstrated in Figure 8. Additionally, the use of transformer-based MAE architecture enables modeling of non-local dependencies, which may help in learning signal patterns distorted by multipath effects in larger indoor environments.
- Feasibility in Indoor Infrastructure: Many Li-Fi use cases (e.g., smart offices, classrooms, hospitals) are characterized by short to medium range communication, typically within 2–5 m. Therefore, the proposed 1 m setup serves as a relevant first step and is representative of core mechanisms applicable to these environments.
- Possible Enhancements for Long-Range Transmission: To extend the operational range, the following practical solutions can be employed:
- -
- Optical Lens System to focus and collimate the LED beam, increasing directionality and intensity.
- -
- Multiple Input Multiple Output (MIMO) architectures to aggregate spatial diversity and mitigate path loss.
- -
- Higher sensitivity photodiodes (e.g., SiPM or APD arrays) to enhance detection under lower illumination at longer ranges.
- -
- Adaptive masking ratio tuning based on channel state information (CSI) to preserve signal recoverability as the channel degrades.
3.2. Key Transmission Performance Metrics
3.2.1. Bit Error Rate (BER) vs. Error Vector Magnitude (EVM)
- ■
- Why we focus on EVM
- In our experiments, we employed Error Vector Magnitude (EVM) as the primary performance indicator. EVM is widely used in optical and wireless communications as it provides a direct measure of constellation distortion and correlates closely with BER once we set a threshold (e.g., the first-generation forward error correction (GFEC) limit).
- Specifically, in systems that use QAM-based modulation (QPSK, 16-QAM), an EVM below a modulation-specific percentage typically assures a BER below $10^{-3}$ or $10^{-4}$, depending on the coding scheme.
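As a concrete reference for the metric, the sketch below computes RMS EVM from received and ideal constellation points and compares it with the GFEC limits quoted in this section (26.3% for QPSK, 12.3% for 16-QAM). Normalizing by the average reference power is one common convention and is an assumption here, as are the function names and the toy symbols.

```python
import math

# First-generation FEC (GFEC) EVM limits quoted in this section, in percent.
EVM_LIMIT_PERCENT = {"QPSK": 26.3, "16-QAM": 12.3}


def evm_percent(received, reference):
    """RMS EVM in percent, normalized to the average reference power."""
    if len(received) != len(reference) or not reference:
        raise ValueError("need equal-length, non-empty symbol sequences")
    err = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    ref = sum(abs(s) ** 2 for s in reference)
    return 100.0 * math.sqrt(err / ref)


def within_limit(evm, modulation):
    """True if the EVM does not exceed the quoted limit for this modulation."""
    return evm <= EVM_LIMIT_PERCENT[modulation]


# Four ideal QPSK symbols, each displaced by the same small error vector.
ideal = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
noisy = [s + (0.1 + 0.1j) for s in ideal]
evm = evm_percent(noisy, ideal)
print(f"EVM = {evm:.1f}%, within QPSK limit: {within_limit(evm, 'QPSK')}")
```

In this toy case the error power is 1% of the reference power, so the EVM is 10%.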
- ■
- Estimated BER from EVM
- Although we did not explicitly measure BER in real time, it is possible to relate EVM to BER via analytical expressions or standardized thresholds. For instance, the first GFEC threshold for QPSK is typically around 26.3% EVM, corresponding to a BER near $3.8 \times 10^{-3}$. For 16-QAM, a 12.3% EVM is often used as a typical threshold for meeting a BER near $10^{-3}$.
- As reported in our experimental results, when we keep EVM below those threshold values, we can infer that the resulting BER is within acceptable limits for standard FEC to correct any residual errors.
- Furthermore, a clear theoretical relationship exists between EVM and BER. Typically, a lower EVM indicates higher signal quality, directly translating to a lower BER. This relationship is mathematically defined as follows:
3.2.2. Computational Complexity and Processing Overhead
- ■
- Offline Training vs. Online Inference
- The proposed masked autoencoder (MAE) involves a training phase (offline) and an inference (deployment) phase (online).
- Training Complexity:
- -
- This is typically performed offline on a GPU-equipped workstation. The training process can indeed be computationally demanding. However, once training converges, the resulting model parameters are frozen and used for real-time inference.
- Inference Complexity:
- -
- The inference side typically involves fewer matrix multiplications and can run on relatively modest hardware, such as embedded GPUs or even advanced DSP/FPGA solutions, depending on implementation details.
- Hence, while there is a significant one-time overhead for training, the real-time overhead for the user is predominantly in the decoder that reconstructs masked patches. This step can be optimized.
- ■
- Comparison with Conventional Li-Fi
- Traditional Li-Fi systems might use simpler digital signal processors for equalization or channel estimation, leading to relatively low-latency operations.
- However, those systems transmit the entire waveform, whereas our approach physically omits a large fraction of the samples and recovers them with deep learning. This trade-off can yield a net advantage in throughput if the MAE’s inference overhead is kept manageable.
- To provide a rough computational comparison:
- -
- If we consider a typical Li-Fi link employing FFT-based OFDM or DMT, we have an IFFT/FFT block plus straightforward channel equalization.
- -
- Our method adds the overhead of a 1-D convolution and transformer-based reconstruction at the receiver. The precise complexity depends on hyperparameters such as the number of patches, embedding dimension, and number of attention heads.
- In future work, we plan to include scalability metrics (e.g., how many MAC operations or FLOPs per symbol) to give a more quantitative sense of overhead. Early estimates suggest that for moderate patch sizes and a well-optimized network, real-time operation at tens of MHz sampling rates can be feasible with standard GPU or FPGA acceleration.
- ■
- Possible Reduction in Complexity
- Techniques like model pruning, quantization, or lightweight transformers can reduce complexity further.
- Our approach can also be integrated with partial layering or smaller embedding sizes for faster inference, at some potential trade-off in reconstruction fidelity.
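The dependence of the receiver overhead on the hyperparameters named above can be illustrated with textbook operation counts. This is a rough sketch, not a measurement: it uses the standard per-layer multiply-accumulate (MAC) estimate for a transformer layer and the radix-2 FFT multiplication count, and it assumes 32 patches, 128-dimensional embeddings, a feed-forward width of 4 × 128, and an 85% masking ratio. The feed-forward width and the helper names are assumptions.

```python
def transformer_layer_macs(n_tokens, dim, ffn_dim):
    """Approximate MACs for one transformer layer over n_tokens tokens."""
    proj = 4 * n_tokens * dim * dim       # Q, K, V and output projections
    attn = 2 * n_tokens * n_tokens * dim  # attention scores and weighted sum
    ffn = 2 * n_tokens * dim * ffn_dim    # two feed-forward matrices
    return proj + attn + ffn


def fft_complex_mults(n_points):
    """Complex multiplications of a radix-2 FFT: (N / 2) * log2(N)."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two")
    return (n_points // 2) * (n_points.bit_length() - 1)


num_patches, dim, ffn_dim, mask_ratio = 32, 128, 512, 0.85
visible = num_patches - round(mask_ratio * num_patches)

# The encoder only sees visible patches; the decoder sees every position.
enc_layer = transformer_layer_macs(visible, dim, ffn_dim)
dec_layer = transformer_layer_macs(num_patches, dim, ffn_dim)
print(f"visible patches:             {visible}")
print(f"encoder layer (visible only): {enc_layer:,} MACs")
print(f"decoder layer (all patches):  {dec_layer:,} MACs")
print(f"512-point FFT:                {fft_complex_mults(512):,} complex mults")
```

Two observations follow from the counts: a high masking ratio keeps the encoder cheap because it processes only the visible patches, and each full-length layer still costs far more than the FFT, which is why pruning, quantization, or smaller embeddings matter for real-time use.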
3.2.3. Performance Under Varying Distances and Real-World Channel Impairments
- ■
- Current Setup
- Our current experiments are performed at 1 m with an LED-based Li-Fi link operating around 300–440 lux. We deliberately tested different lux levels to simulate typical indoor illumination conditions, but we have not yet covered a range of link distances (e.g., 2 m, 5 m, etc.).
- ■
- Impact of Distance
- As distance grows, the received optical power drops (following an inverse-square law in free-space optical scenarios), and the SNR typically decreases.
- Lower SNR can degrade the autoencoder’s ability to reconstruct heavily masked waveforms, pushing EVM higher.
- However, if we can ensure an adequate SNR above the forward error correction threshold, the overall concept should still apply. The system might require a lower masking ratio at longer distances or a higher optical power/better photodiode sensitivity to maintain acceptable EVM.
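The distance dependence described above can be put into rough numbers. The sketch assumes a pure inverse-square path loss relative to the 1 m reference, and it assumes that the photocurrent scales with received optical power while the noise floor stays fixed, so that the electrical SNR penalty in dB is twice the optical loss. Both assumptions, and the function names, are illustrative rather than measured properties of the testbed.

```python
import math


def optical_loss_db(distance_m, reference_m=1.0):
    """Extra optical path loss versus the reference distance (inverse-square law)."""
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(distance_m / reference_m)


def electrical_snr_penalty_db(distance_m, reference_m=1.0):
    """SNR penalty assuming signal current follows optical power and noise is fixed."""
    return 2.0 * optical_loss_db(distance_m, reference_m)


for d in (1.0, 2.0, 5.0):
    print(f"{d:.0f} m: optical loss {optical_loss_db(d):5.2f} dB, "
          f"electrical SNR penalty {electrical_snr_penalty_db(d):5.2f} dB")
```

Under these assumptions, doubling the distance costs about 6 dB of optical power and about 12 dB of electrical SNR, which indicates how much link margin a lower masking ratio or a more sensitive photodiode would need to recover.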
- ■
- Real-World Channel Impairments
- In practice, ambient light interference, multi-LED flicker, device aging, and reflection-induced multipath can further degrade SNR or introduce nonlinearities.
- Our masked autoencoder approach is relatively robust because it learns the global structure of the DMT waveform. However, more severe channel distortions may require additional training data or an expanded neural network capacity.
- We intend to investigate multi-path or large-signal distortion scenarios in future work. The masked approach could also be combined with a pre-distortion or equalization stage if necessary.
- ■
- Future work
- We concur that a more in-depth test across various distances and real-world building environments (e.g., offices, corridors) will help demonstrate the generalizability of our approach. We will aim to include such results in an extended version of this research or subsequent publications.
- BER Discussion: While we measured EVM in our current paper, we acknowledge that a direct BER vs. SNR analysis can provide additional clarity. We will integrate approximate BER estimates and potentially run separate BER tests in future expansions.
- Computational Overhead: Our method introduces a deep learning step for reconstruction, which can be performed in real-time with optimized hardware. A direct complexity comparison to traditional Li-Fi techniques is planned to give numerical overhead estimates.
- Distance and Channel Impairments: We have validated the method at 1 m and under typical indoor illuminance, but broader testing at multiple distances and channels is an important next step.
4. Experimental Results
- (A)
- Summary of the Experimental Transmission System
- ■
- Physical bandwidth used: 10 MHz
- ■
- Modulation formats: QPSK and 16-QAM
- ■
- Number of subcarriers per DMT symbol: 512
- ■
- Cyclic Prefix (CP): Length of 16
- ■
- Total samples per transmission: 2272 (including four DMT symbols and zero-padding)
- ■
- Masking ratio: Up to 85%
- ■
- Sampling rate: 40 MS/s (Arbitrary Waveform Generator)
- (B)
- Data Rate Estimation
- ■
- QPSK Transmission Rate
- Baseline rate (without masking):
- -
- Since QPSK carries 2 bits per symbol:
- -
- Baseline = 10 MHz × 2 bits = 20 Mbps
- Enhanced rate with 85% masking (i.e., a 1.85× increase in channel capacity):
- -
- Effective rate = 20 Mbps × (1 + 0.85) = 37 Mbps
- ■
- 16-QAM Transmission Rate
- Baseline rate:
- -
- 16-QAM carries 4 bits per symbol:
- -
- Baseline = 10 MHz × 4 bits = 40 Mbps
- Enhanced rate with 85% masking:
- -
- Effective rate = 40 Mbps × (1 + 0.85) = 74 Mbps
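The rate arithmetic above, together with the frame figures from the system summary, can be reproduced as follows. The sketch applies the reported 1.85× factor directly to the baseline rates. The frame bookkeeping assumes that each DMT symbol contributes 512 samples plus a 16-sample cyclic prefix; under that assumption the zero-padding length follows from the stated total of 2272 samples.

```python
BANDWIDTH_HZ = 10e6          # physical bandwidth from the system summary
CAPACITY_FACTOR = 1.85       # reported capacity gain at 85% masking
BITS_PER_SYMBOL = {"QPSK": 2, "16-QAM": 4}


def rates_mbps(modulation):
    """Return (baseline, enhanced) data rates in Mbps for a modulation format."""
    baseline = BANDWIDTH_HZ * BITS_PER_SYMBOL[modulation] / 1e6
    return baseline, baseline * CAPACITY_FACTOR


for name in BITS_PER_SYMBOL:
    base, enhanced = rates_mbps(name)
    print(f"{name}: {base:.0f} Mbps -> {enhanced:.0f} Mbps")

# Frame bookkeeping (assumes 512 + 16 samples per DMT symbol).
symbols, samples_per_symbol, cyclic_prefix = 4, 512, 16
total_samples, sampling_rate = 2272, 40e6
payload = symbols * (samples_per_symbol + cyclic_prefix)
zero_padding = total_samples - payload
frame_us = total_samples / sampling_rate * 1e6
print(f"payload {payload} samples, zero padding {zero_padding} samples, "
      f"frame duration {frame_us:.1f} us at 40 MS/s")
```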
- ■
- Masking Ratio
- The masking ratio determines the degree of compression applied to the DMT signal. As demonstrated in our experimental results (Figure 3 and Figure 4 of the manuscript), we observed that increasing the masking ratio up to 75% resulted in a decrease in error vector magnitude (EVM) for both QPSK and 16-QAM symbols. This somewhat counterintuitive result is attributed to the self-supervised learning capability of the MAE, which encourages global pattern learning and reduces overfitting when less information is directly provided. However, if the masking ratio exceeds 85%, reconstruction performance starts to degrade due to the lack of sufficient unmasked patches, as explained on page 6 of the manuscript.
- ■
- Patch Numbers
- As shown in Figure 5, increasing the number of DMT patches from 8 to 32 led to a decrease in EVM, indicating improved reconstruction. Smaller patches preserve finer time-domain details, making it easier for the MAE model to learn local and global structures. However, further increasing the patch count to 64 resulted in higher EVM due to excessive fragmentation, which hinders temporal continuity learning and increases model complexity. Therefore, a trade-off exists between the information carried by each patch and temporal resolution, requiring careful tuning depending on the modulation scheme and waveform complexity.
- ■
- Batch Size
- Figure 6 illustrates the influence of batch size on EVM. A smaller batch size introduces stochastic gradient noise during training, promoting generalization and aiding the MAE in learning diverse temporal patterns. On the other hand, large batch sizes tend to stabilize training but may lead to overfitting limited data patterns. An optimal batch size of 32 was selected to balance memory efficiency and reconstruction performance in our implementation.
- ■
- Practical Feasibility in Real-World Li-Fi Systems:
- Hardware Compatibility:
- -
- Our proposed method operates entirely in the digital baseband domain. The masking and reconstruction processes are applied to the digital representation of DMT waveforms before digital-to-analog conversion. Therefore, no modifications to the optical front-end (e.g., LEDs, photodiodes) are required. This makes our method highly compatible with the existing Li-Fi transceiver hardware.
- Integration into DSP Chains:
- -
- The MAE-based reconstruction module can be embedded as part of the digital signal processing (DSP) pipeline on the receiver side. Given that many commercial Li-Fi systems already include FPGA- or SoC-based DSP platforms, deploying our model using an efficient inference engine (e.g., TensorRT, ONNX Runtime, or custom hardware accelerators) is feasible. Quantization-aware training can further reduce computational overhead for edge deployment.
- Latency and Complexity:
- -
- Our proposed system is based on a relatively lightweight transformer encoder–decoder architecture (4 layers, 128-dim embeddings), and inference time per DMT frame is in the sub-millisecond range on modern embedded GPUs. Furthermore, the patch-based nature of our method supports parallelization across hardware threads.
- Adaptability to Dynamic Environments:
- -
- The masking ratio can be dynamically adjusted based on real-time channel conditions (e.g., under high interference or signal loss). The model’s robustness to high masking ratios (up to 85%) provides flexibility and resilience under varying Li-Fi deployment scenarios (e.g., mobility, partial blockage).
- Training and Update Strategy:
- -
- Since the MAE model follows a self-supervised training scheme, it can be periodically retrained or fine-tuned using unlabeled waveform data collected from the field. This supports lifelong learning in deployed systems without manual annotation.
- Energy Efficiency Considerations:
- -
- Because the proposed method reduces the total number of transmitted samples via masking, it has the potential to reduce average power consumption on the transmitter side—an important factor in energy-sensitive Li-Fi applications.
- ■
- OFDM-based Visible Light Communications (VLC)
- Similar to DMT, Orthogonal Frequency Division Multiplexing (OFDM) systems also encode signals over multiple subcarriers. The proposed masking and reconstruction strategy can be applied to compress OFDM waveforms for high-throughput VLC systems, especially in scenarios constrained by physical LED bandwidth.
- ■
- mmWave and Terahertz Wireless Communications
- In millimeter-wave (mmWave) and Terahertz (THz) systems, signal waveforms often suffer from path loss and hardware nonlinearities. Applying masked autoencoder-based compression and recovery could help enhance robustness and reduce the overhead associated with channel estimation or signal regeneration.
- ■
- Underwater Optical Communications
- The underwater environment exhibits strong multipath fading and scattering. The MAE-based technique can be adapted to reconstruct distorted waveforms received after transmission through turbid water, thereby improving effective throughput and error resilience.
- ■
- Biomedical Time-Series Signal Compression
- Electrocardiogram (ECG) and Electroencephalogram (EEG) signals are also examples of multivariate time-series data. Our method can be used for efficient compression and reconstruction of these signals in wearable devices with limited transmission bandwidth, while preserving signal integrity for downstream diagnostic use.
- ■
- Structural Health Monitoring (SHM)
- Sensor networks deployed for SHM generate large volumes of time-series vibration or strain data. By applying sparse coded masking, data can be compressed at the edge before transmission to a central server, reducing bandwidth consumption while retaining reconstruction fidelity for anomaly detection.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Hass, H. LiFi is a paradigm-shifting 5G technology. Rev. Phys. 2018, 3, 26–31. [Google Scholar] [CrossRef]
- Albraheem, L.I.; Alhudaithy, L.H.; Aljaser, A.A.; Aldhaan, M.R.; Bahliwah, G.M. Toward designing a Li-Fi-based hierarchical IoT architecture. IEEE Access 2018, 6, 40811–40825. [Google Scholar] [CrossRef]
- Elamassie, M.; Miramirkhani, F.; Uysal, M. Performance characterization of underwater visible light communication. IEEE Trans. Commun. 2018, 67, 543–552. [Google Scholar] [CrossRef]
- Wu, X.; Safari, M.; Haas, H. Access point selection for hybrid Li-Fi and Wi-Fi networks. IEEE Trans. Commun. 2017, 65, 5375–5385. [Google Scholar] [CrossRef]
- Islim, M.S.; Ferreira, R.; He, X.; Xie, E.; Videv, S.; Viola, S.; Watson, S.; Bamiedakis, N.; Penty, R.; White, I.; et al. Towards 10 Gb/s orthogonal frequency division multiplexing-based visible light communication using a GaN violet micro-LED. Photonics Res. 2017, 5, A35–A43. [Google Scholar] [CrossRef]
- Binh, P.H.; Hung, N.T. High-speed visible light communications using ZnSe-based white light emitting diode. IEEE Photonics Technol. Lett. 2016, 28, 1948–1951. [Google Scholar] [CrossRef]
- Cossu, G.; Khalid, A.M.; Choudhury, P.; Corsini, R.; Ciaramella, E. 3.4 Gbit/s visible optical wireless transmission based on RGB LED. Opt. Express 2012, 20, B501–B506. [Google Scholar] [CrossRef] [PubMed]
- Wang, Y.; Huang, X.; Tao, L.; Shi, J.; Chi, N. 4.5-Gb/s RGB-LED based WDM visible light communication system employing CAP modulation and RLS based adaptive equalization. Opt. Express 2015, 23, 13626–13633. [Google Scholar] [CrossRef] [PubMed]
- Shupeng, L.; Huang, H.; Zou, Y. A Novel Pre-Equalization Scheme for Visible Light Communications with Direct Learning Ar-chitecture. In Proceedings of the 2024 IEEE Wireless Communications and Networking Conference, Dubai, United Arab Emirates, 22 April 2024. [Google Scholar]
- Wang, D.; Song, Y.; Li, J.; Qin, J.; Yang, T.; Zhang, M.; Chen, X.; Boucouvalas, A.C. Data-driven Optical Fiber Channel Modeling: A Deep Learning Approach. J. Light. Technol. 2020, 38, 4730–4743. [Google Scholar] [CrossRef]
- Zhongya, L.; Shi, J.; Zhao, Y.; Li, G.; Chen, J.; Zhang, J.; Chi, N. Deep learning based end-to-end visible light communication with an in-band channel modeling strategy. Opt. Express 2022, 30, 28905–28921. [Google Scholar]
- Wu, X.; Huang, Z.; Ji, Y. Deep neural network method for channel estimation in visible light communication. Opt. Commun. 2020, 462, 12572. [Google Scholar] [CrossRef]
- Wenqing, N.; Chen, H.; Hu, F.; Shi, J.; Ha, Y.; Li, G.; He, Z.; Yu, S.; Chi, N. Neural-Network-Based Nonlinear Tomlin-son-Harashima Precoding for Bandwidth-Limited Underwater Visible Light Communication. J. Light. Technol. 2022, 40, 2296–2306. [Google Scholar]
- Ulkar, M.G.; Baykas, T.; Pusane, A.E. VLCnet: Deep Learning Based End-to-End Visible Light Communication System. J. Light. Technol. 2020, 38, 5937–5948. [Google Scholar] [CrossRef]
- Tang, P.; Zhang, X. MTSMAE: Masked Autoencoders for Multivariate Time-Series Forecasting. In Proceedings of the 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), Macao, China, 1 November 2022. [Google Scholar]
- Zijiao, C.; Qing, J.; Xiang, T.; Yue, W.L.; Zhou, J. Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 20 June 2023. [Google Scholar]
- Gong, M.; Sun, J.; Xie, X.; Zheng, Y. Multivariate Time Series Prediction based on Improved Transformer Model in Computing System. In Proceedings of the 2023 2nd International Conference on Cloud Computing, Big Data Application and Software Engineering (CBASE), Chengdu, China, 4 November 2023. [Google Scholar]
- Ma, Q.; Liu, Z.; Zheng, Z.; Huang, Z.; Zhu, S.; Yu, Z.; Kwok, J.T. A Survey on Time-Series Pre-Trained Models. IEEE Trans. Knowl. Data Eng. 2024, 36, 7536–7555. [Google Scholar] [CrossRef]
- Na, W.; Liu, K.; Zhang, W.; Xie, H.; Jin, D. Deep Neural Network with Batch Normalization for Automated Modeling of Microwave Components. In Proceedings of the 2020 IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization (NEMO), Hangzhou, China, 8 December 2020. [Google Scholar]
- Wu, S.; Li, G.; Deng, L.; Liu, L.; Wu, D.; Xie, Y.; Shi, L. L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 2043–2051. [Google Scholar] [CrossRef] [PubMed]
- Lin, R. Analysis on the Selection of the Appropriate Batch Size in CNN Neural Network. In Proceedings of the 2022 International Conference on Machine Learning and Knowledge Engineering (MLKE), Guilin, China, 26 February 2022. [Google Scholar]
Other Techniques | Key Aspects of Other Techniques | Differences from Our Proposed Technique |
---|---|---|
[9] S. Li et al. “A Novel Pre-Equalization Scheme for Visible Light Communications with Direct Learning Architecture” | | |
[10] D. Wang et al. “Data-driven Optical Fiber Channel Modeling: A Deep Learning Approach” | | |
[11] Z. Li et al. “Deep learning based end-to-end visible light communication with an in-band channel modeling strategy” | | |
[12] X. Wu et al. “Deep neural network method for channel estimation in visible light communication” | | |
[13] W. Na et al. “Neural-Network-Based Nonlinear Tomlinson-Harashima Precoding for Bandwidth-Limited Underwater Visible Light Communication” | | |
[14] M. G. Ulkar et al. “VLCnet: Deep Learning Based End-to-End Visible Light Communication System” | | |
Method | Compression Capability | Reconstruction Ability | Modulation Compatibility | Adaptability | Hardware Cost | Representative References |
---|---|---|---|---|---|---|
Adaptive equalization | | | | | | Wang et al., J. Lightwave Technol., 2020 [10] |
CAP modulation (carrierless amplitude and phase modulation) | | | | | | Wang et al., Opt. Express, 2015 [8] |
CNN/FNN-based channel modeling or demodulation | | | | | | Ulkar et al., J. Lightwave Technol., 2020 [14] |
Proposed MAE-based sparse mask modeling | | | | | | This paper |
Parameter | Value/Specification | Component/Reference |
---|---|---|
Transmitter LED | OSRAM® LUW W5AM, ThinGaN, SMT package | OSRAM datasheet |
Peak luminous output | 116 lm | Manufacturer specification |
Emission spectrum | 400–700 nm (visible white light) | ThinGaN LED technology |
Optical beam divergence angle | ~170° (FWHM) | Based on LED viewing angle at 50% light output |
Transmitting lens | Biconvex lens, focal length: 60 mm, diameter: 50.8 mm | Thorlabs LB1723-A |
Lens spectral passband | 350–700 nm | Thorlabs LB1723-A |
Estimated transmitted optical power | ~10 mW (estimated from luminous flux and beam divergence) | Derived from 116 lm specification |
Bias-tee (LED driver coupling) | Mini-Circuits® ZFBT-6GW-FT+, 100 kHz–6 GHz, insertion loss: 0.15 dB, max power: 30 dBm | Mini-Circuits® datasheet |
Free-space transmission distance | 1 m | - |
Receiver photo-detector | Avalanche photodiode (APD), measured illumination: 300 lux | Hamamatsu (C5330-11) |
Receiver field of view (FOV) | 30° | - |
Operating illuminance environment | ≥240 lux | Minimum required for successful transmission |
Key Hyperparameter | Value |
---|---|
Patch size | 16 samples (selected via ablation over {8, 16, 32}) |
Masking ratio | 75–85% (optimized based on lowest EVM with acceptable BER) |
Transformer depth | 4 layers (evaluated over {2, 4, 6}) |
Embedding dimension | 128 |
Number of attention heads | 4 |
Dropout rate | 0.1 |
Training epochs | 200 |
Optimizer | Adam with learning rate = 1 × 10⁻⁴ |
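
To make the patch size and masking ratio in the table concrete, the following minimal sketch shows how a frame could be split into patches and randomly masked. It is an illustration under stated assumptions, not the authors' implementation: the 1024-sample frame length, the function names `split_into_patches` and `random_mask`, the fixed seed, and the choice to drop any trailing partial patch are all hypothetical. Only the patch size (16) and the lower bound of the masking ratio (75%) come from the table.

```python
import random

# Values taken from the hyperparameter table.
PATCH_SIZE = 16
MASK_RATIO = 0.75  # lower end of the reported 75-85% range


def split_into_patches(waveform, patch_size=PATCH_SIZE):
    """Split a 1-D sample sequence into non-overlapping patches.

    Assumption: trailing samples that do not fill a whole patch are dropped.
    """
    n = len(waveform) // patch_size
    return [waveform[i * patch_size:(i + 1) * patch_size] for i in range(n)]


def random_mask(num_patches, mask_ratio=MASK_RATIO, seed=0):
    """Return sorted indices of kept (transmitted) and masked (removed) patches."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    kept = sorted(order[num_masked:])
    return kept, masked


# Hypothetical 1024-sample DMT frame; the length is chosen for illustration only.
frame = [0.0] * 1024
patches = split_into_patches(frame)
kept, masked = random_mask(len(patches))
transmitted_samples = len(kept) * PATCH_SIZE

print(len(patches), len(kept), len(masked), transmitted_samples)
# prints: 64 16 48 256
```

Under these assumptions, a 75% masking ratio leaves 16 of 64 patches, so only 256 of the 1024 samples would be sent, and the receiver-side model would have to reconstruct the remaining 48 patches.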
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Won, Y.-Y.; Han, H.; Choi, D.; Yoon, S.M. Enhancement of Optical Wireless Discrete Multitone Channel Capacity Based on Li-Fi Using Sparse Coded Mask Modeling. Photonics 2025, 12, 395. https://doi.org/10.3390/photonics12040395