Beyond Granularity: Enhancing Continuous Sign Language Recognition with Granularity-Aware Feature Fusion and Attention Optimization
Abstract
1. Introduction
- We introduce a methodology for fusing multi-scale visual feature maps, leveraging a self-attention mechanism to flexibly and dynamically integrate visual features across diverse dimensions and granularities.
- We analytically examine the limitations of the vanilla Transformer in modeling sign language videos and improve the self-attention mechanism by substituting Euclidean distance for the dot product (a minimal sketch of both ideas follows this list).
- Extensive experiments and ablation studies validate the efficacy of our method, which significantly improves the performance of Transformer-based models on the continuous sign language recognition (CSLR) task.
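The two methodological contributions can be made concrete with a short sketch. The code below is our illustration only, not the authors' implementation: it projects coarse- and fine-grained feature maps to a shared width, concatenates them into one token sequence, and applies self-attention whose logits are negative squared Euclidean distances rather than scaled dot products. The module names, the shared width of 256, and the 1/√d scaling of the distance logits are all assumptions.

```python
# Illustrative sketch only (not the authors' code): fuse coarse- and fine-grained
# feature maps as one token sequence, then attend with negative squared
# Euclidean distance in place of the scaled dot product.
import torch


def euclidean_attention(q, k, v):
    """Self-attention whose logits are negative squared Euclidean distances.

    Uses ||q_i - k_j||^2 = ||q_i||^2 + ||k_j||^2 - 2 q_i·k_j, so closer
    query/key pairs receive higher attention weight.
    """
    d = q.size(-1)
    sq = (q * q).sum(-1, keepdim=True)                # (B, Tq, 1)
    sk = (k * k).sum(-1).unsqueeze(-2)                # (B, 1, Tk)
    dist2 = sq + sk - 2 * q @ k.transpose(-2, -1)     # (B, Tq, Tk)
    attn = torch.softmax(-dist2 / d ** 0.5, dim=-1)   # sqrt(d) scaling is our assumption
    return attn @ v


def fuse_multiscale(coarse, fine, proj_c, proj_f):
    """Project both granularities to a shared width and fuse them jointly."""
    tokens = torch.cat([proj_c(coarse), proj_f(fine)], dim=1)
    return euclidean_attention(tokens, tokens, tokens)


# Toy shapes: batch of 2, four frames per scale; the feature widths are placeholders.
proj_c, proj_f = torch.nn.Linear(512, 256), torch.nn.Linear(256, 256)
fused = fuse_multiscale(torch.randn(2, 4, 512), torch.randn(2, 4, 256), proj_c, proj_f)
print(fused.shape)  # torch.Size([2, 8, 256])
```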
2. Related Work
2.1. Continuous Sign Language Recognition
2.2. Feature Fusion
3. Methodology
3.1. Overall Model Architecture
3.2. Spatial Encoder and Multi-Scale Features
3.3. Multi-Scale Features Fusion with Self-Attention
3.4. Self-Attention Based on Euclidean Distance
3.5. The Local Window in Self-Attention
3.6. CTC Loss and CTC Decoder
Algorithm 1: Multi-scale feature fusion–Euclidean Transformer (MSF-ET) for continuous sign language recognition.
Input: a sign language video V. Output: the sequence of glosses corresponding to V.
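Only the header of Algorithm 1 survives above, so the sketch below reconstructs just its final stage, the CTC loss and CTC decoder of Section 3.6, with a random tensor standing in for the Euclidean-Transformer output; the gloss vocabulary size, sequence lengths, and blank index 0 are illustrative assumptions.

```python
# Hedged sketch of the CTC stage of MSF-ET (Section 3.6); the random log-probs
# stand in for the Euclidean-Transformer output, and all sizes are assumptions.
import torch

T, B, V = 50, 2, 1296                                  # frames, batch, gloss vocab (index 0 = blank)
log_probs = torch.randn(T, B, V).log_softmax(-1)       # placeholder model output

# CTC loss: aligns frame-level predictions to the unsegmented gloss sequence.
ctc = torch.nn.CTCLoss(blank=0, zero_infinity=True)
targets = torch.randint(1, V, (B, 12))                 # 12 glosses per video here
loss = ctc(log_probs, targets,
           input_lengths=torch.full((B,), T, dtype=torch.long),
           target_lengths=torch.full((B,), 12, dtype=torch.long))

def greedy_ctc_decode(log_probs, blank=0):
    """Best-path decoding: argmax per frame, collapse repeats, drop blanks."""
    decoded = []
    for seq in log_probs.argmax(-1).transpose(0, 1).tolist():  # (B, T)
        glosses, prev = [], blank
        for idx in seq:
            if idx != prev and idx != blank:
                glosses.append(idx)
            prev = idx
        decoded.append(glosses)
    return decoded

print(loss.item(), greedy_ctc_decode(log_probs)[0][:5])
```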
4. Experiments
4.1. Datasets and Metrics
4.2. Implementation Details
4.2.1. Network Details
4.2.2. Training Setup
4.3. Ablation Study
4.3.1. Analysis of Multi-Scale Feature Fusion
4.3.2. Analysis of Feature Fusion Method
4.3.3. Analysis of Attention with Euclidean Distance
4.3.4. Analysis of Local Transformer
4.3.5. Analysis of Time Complexity and Inference Time Cost
4.4. Comparison with Baselines
4.4.1. Evaluation on PHOENIX-2014
4.4.2. Evaluation on CSL
5. Conclusions
6. Limitations and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Koller, O.; Ney, H.; Bowden, R. Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data is Continuous and Weakly Labelled. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3793–3802. [Google Scholar]
- Cihan Camgoz, N.; Hadfield, S.; Koller, O.; Bowden, R. Subunets: End-to-end hand shape and continuous sign language recognition. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3056–3065. [Google Scholar]
- Cui, R.; Liu, H.; Zhang, C. Recurrent convolutional neural networks for continuous sign language recognition by staged optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7361–7369. [Google Scholar]
- Koller, O.; Zargaran, S.; Ney, H. Re-sign: Re-aligned end-to-end sequence modelling with deep recurrent CNN-HMMs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4297–4305. [Google Scholar]
- Song, P.; Guo, D.; Xin, H.; Wang, M. Parallel temporal encoder for sign language translation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1915–1919. [Google Scholar]
- Yang, Z.; Shi, Z.; Shen, X.; Tai, Y.W. Sf-net: Structured feature network for continuous sign language recognition. arXiv 2019, arXiv:1908.01341. [Google Scholar]
- Zhou, H.; Zhou, W.; Zhou, Y.; Li, H. Spatial-temporal multi-cue network for continuous sign language recognition. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13009–13016. [Google Scholar]
- Yin, K.; Read, J. Better sign language translation with STMC-transformer. arXiv 2020, arXiv:2004.00588. [Google Scholar]
- Xie, P.; Cui, Z.; Du, Y.; Zhao, M.; Cui, J.; Wang, B.; Hu, X. Multi-scale local-temporal similarity fusion for continuous sign language recognition. Pattern Recognit. 2023, 136, 109233. [Google Scholar] [CrossRef]
- Koller, O. Quantitative survey of the state of the art in sign language recognition. arXiv 2020, arXiv:2008.09918. [Google Scholar]
- Papastratis, I.; Dimitropoulos, K.; Konstantinidis, D.; Daras, P. Continuous sign language recognition through cross-modal alignment of video and text embeddings in a joint-latent space. IEEE Access 2020, 8, 91170–91180. [Google Scholar] [CrossRef]
- Cheng, K.L.; Yang, Z.; Chen, Q.; Tai, Y.W. Fully convolutional networks for continuous sign language recognition. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 697–714. [Google Scholar]
- Wang, S.; Guo, D.; Zhou, W.; Zha, Z.J.; Wang, M. Connectionist temporal fusion for sign language translation. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 1483–1491. [Google Scholar]
- Pu, J.; Zhou, W.; Li, H. Dilated convolutional network with iterative optimization for continuous sign language recognition. In Proceedings of the 2018 International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; Volume 3, p. 7. [Google Scholar]
- Pei, X.; Guo, D.; Zhao, Y. Continuous sign language recognition based on pseudo-supervised learning. In Proceedings of the 2nd Workshop on Multimedia for Accessible Human Computer Interfaces, New York, NY, USA, 25 October 2019; pp. 33–39. [Google Scholar]
- Guo, D.; Wang, S.; Tian, Q.; Wang, M. Dense Temporal Convolution Network for Sign Language Translation. In Proceedings of the International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 744–750. [Google Scholar]
- Camgoz, N.C.; Koller, O.; Hadfield, S.; Bowden, R. Multi-channel transformers for multi-articulatory sign language translation. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 301–319. [Google Scholar]
- Camgoz, N.C.; Koller, O.; Hadfield, S.; Bowden, R. Sign language transformers: Joint end-to-end sign language recognition and translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10023–10033. [Google Scholar]
- Koller, O.; Zargaran, S.; Ney, H.; Bowden, R. Deep sign: Enabling robust statistical continuous sign language recognition via hybrid CNN-HMMs. Int. J. Comput. Vis. 2018, 126, 1311–1325. [Google Scholar] [CrossRef]
- Koller, O.; Camgoz, N.C.; Ney, H.; Bowden, R. Weakly supervised learning with multi-stream CNN-LSTM-HMMs to discover sequential parallelism in sign language videos. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2306–2320. [Google Scholar] [CrossRef]
- Vallathan, G.; John, A.; Thirumalai, C.; Mohan, S.; Srivastava, G.; Lin, J.C.W. Suspicious activity detection using deep learning in secure assisted living IoT environments. J. Supercomput. 2021, 77, 3242–3260. [Google Scholar] [CrossRef]
- Huang, F.; Lu, K.; Cai, Y.; Qin, Z.; Fang, Y.; Tian, G.; Li, G. Encoding recurrence into transformers. In Proceedings of the Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Peng, B.; Alcaide, E.; Anthony, Q.; Albalak, A.; Arcadinho, S.; Cao, H.; Cheng, X.; Chung, M.; Grella, M.; GV, K.K.; et al. RWKV: Reinventing RNNs for the transformer era. arXiv 2023, arXiv:2305.13048. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009. [Google Scholar]
- Rastgoo, R.; Kiani, K.; Escalera, S. Sign language recognition: A deep survey. Expert Syst. Appl. 2021, 164, 113794. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Ma, Y.; Xu, T.; Kim, K. Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition. Sensors 2022, 22, 5959. [Google Scholar] [CrossRef] [PubMed]
- Kındıroglu, A.A.; Özdemir, O.; Akarun, L. Aligning accumulative representations for sign language recognition. Mach. Vis. Appl. 2023, 34, 12. [Google Scholar] [CrossRef]
- Huang, J.; Zhou, W.; Zhang, Q.; Li, H.; Li, W. Video-based sign language recognition without temporal segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
- De Castro, G.Z.; Guerra, R.R.; Guimarães, F.G. Automatic translation of sign language with multi-stream 3D CNN and generation of artificial depth maps. Expert Syst. Appl. 2023, 215, 119394. [Google Scholar] [CrossRef]
- Zhang, Z.; Pu, J.; Zhuang, L.; Zhou, W.; Li, H. Continuous sign language recognition via reinforcement learning. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 285–289. [Google Scholar]
- Graves, A. Connectionist temporal classification. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 61–93. [Google Scholar]
- Borg, M.; Camilleri, K.P. Phonologically-meaningful subunits for deep learning-based sign language recognition. In Proceedings of the Computer Vision—ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part II 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 199–217. [Google Scholar]
- Cui, R.; Liu, H.; Zhang, C. A deep neural framework for continuous sign language recognition by iterative training. IEEE Trans. Multimed. 2019, 21, 1880–1891. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Dong, Q.; Cao, C.; Fu, Y. Rethinking optical flow from geometric matching consistent perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1337–1347. [Google Scholar]
- Lagemann, C.; Lagemann, K.; Mukherjee, S.; Schröder, W. Challenges of deep unsupervised optical flow estimation for particle-image velocimetry data. Exp. Fluids 2024, 65, 30. [Google Scholar] [CrossRef]
- He, X.; Bharaj, G.; Ferman, D.; Rhodin, H.; Garrido, P. Few-shot geometry-aware keypoint localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 21337–21348. [Google Scholar]
- Xu, X.; Guan, L.; Dunn, E.; Li, H.; Hua, G. DDM-NET: End-to-end learning of keypoint feature Detection, Description and Matching for 3D localization. arXiv 2022, arXiv:2212.04575. [Google Scholar]
- Zhao, L.; Zhang, Z. A improved pooling method for convolutional neural networks. Sci. Rep. 2024, 14, 1589. [Google Scholar] [CrossRef]
- Zafar, A.; Aamir, M.; Mohd Nawi, N.; Arshad, A.; Riaz, S.; Alruban, A.; Dutta, A.K.; Almotairi, S. A comparison of pooling methods for convolutional neural networks. Appl. Sci. 2022, 12, 8643. [Google Scholar] [CrossRef]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Koller, O.; Forster, J.; Ney, H. Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers. Comput. Vis. Image Underst. 2015, 141, 108–125. [Google Scholar] [CrossRef]
- Min, Y.; Hao, A.; Chai, X.; Chen, X. Visual alignment constraint for continuous sign language recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 11542–11551. [Google Scholar]
- Zheng, J.; Wang, Y.; Tan, C.; Li, S.; Wang, G.; Xia, J.; Chen, Y.; Li, S.Z. Cvt-slr: Contrastive visual-textual transformation for sign language recognition with variational alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 23141–23150. [Google Scholar]
- Guan, M.; Wang, Y.; Ma, G.; Liu, J.; Sun, M. Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation. arXiv 2024, arXiv:2405.05672. [Google Scholar]
- Zhou, M.; Ng, M.; Cai, Z.; Cheung, K.C. Self-attention-based fully-inception networks for continuous sign language recognition. In ECAI 2020; IOS Press: Amsterdam, The Netherlands, 2020; pp. 2832–2839. [Google Scholar]
- Xie, P.; Zhao, M.; Hu, X. Pisltrc: Position-informed sign language transformer with content-aware convolution. IEEE Trans. Multimed. 2021, 24, 3908–3919. [Google Scholar] [CrossRef]
- Hu, H.; Zhao, W.; Zhou, W.; Li, H. Signbert+: Hand-model-aware self-supervised pre-training for sign language understanding. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11221–11239. [Google Scholar] [CrossRef] [PubMed]
- Guo, D.; Zhou, W.; Li, H.; Wang, M. Hierarchical LSTM for sign language translation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
| Spatial Features | Dev del/ins | Dev WER ↓ | Test del/ins | Test WER ↓ |
|---|---|---|---|---|
| Fine-grained features | 13.5/4.0 | 32.9 | 14.1/3.7 | 33.4 |
| Coarse-grained features | 12.0/3.6 | 27.1 | 11.8/3.6 | 27.0 |
| Fused features | 7.4/3.3 | 22.4 | 7.3/3.2 | 22.6 |
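For reference, the WER reported throughout these tables is the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis gloss sequence into the reference, divided by the reference length (the del/ins columns break out two of those error types). A minimal sketch of the computation follows, as our own illustration rather than the paper's evaluation script:

```python
# Minimal WER via Levenshtein alignment over gloss sequences; illustration only.
def wer(ref: list[str], hyp: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(ref)."""
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                   # delete all i reference glosses
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                   # insert all j hypothesis glosses
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

print(wer("A B C D".split(), "A C C".split()))  # 0.5: one deletion + one substitution over 4
```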
| Fusion Method | Aggregation | Dev del/ins | Dev WER | Test del/ins | Test WER |
|---|---|---|---|---|---|
| Concatenation | max-pooling | 14.6/3.3 | 26.7 | 14.4/3.9 | 27.1 |
| Concatenation | avg-pooling | 12.1/4.9 | 25.9 | 12.8/3.6 | 26.2 |
| Concatenation | self-attention | 7.4/3.3 | 22.4 | 7.3/3.2 | 22.6 |
| Sum | max-pooling | 13.7/3.7 | 26.5 | 13.5/4.0 | 26.9 |
| Sum | avg-pooling | 12.8/4.2 | 26.1 | 12.2/3.7 | 25.8 |
| Sum | self-attention | 7.6/3.7 | 22.8 | 7.4/4.1 | 23.1 |
| Spatial Features | Attention | Dev del/ins | Dev WER | Test del/ins | Test WER |
|---|---|---|---|---|---|
| Coarse-grained features | scaled-dot | 12.3/4.8 | 28.3 | 12.7/4.1 | 28.5 |
| Coarse-grained features | Euclidean | 12.0/3.6 | 27.1 | 11.8/3.6 | 27.0 |
| Fine-grained features | scaled-dot | 16.1/2.9 | 34.7 | 15.3/3.5 | 34.5 |
| Fine-grained features | Euclidean | 13.5/4.0 | 32.9 | 14.1/3.7 | 33.4 |
| Fused features | scaled-dot | 8.7/3.4 | 26.3 | 8.4/4.2 | 25.9 |
| Fused features | Euclidean | 7.4/3.3 | 22.4 | 7.3/3.2 | 22.6 |
| Last Layers | Local Size | Dev del/ins | Dev WER | Test del/ins | Test WER |
|---|---|---|---|---|---|
| w/o | - | 10.3/3.2 | 28.9 | 10.5/3.4 | 29.3 |
| 1D-CNN | 3 | 11.2/3.9 | 27.2 | 11.8/3.7 | 27.7 |
| 1D-CNN | 5 | 7.9/3.8 | 26.1 | 7.8/3.7 | 25.9 |
| 1D-CNN | 7 | 7.7/3.7 | 25.6 | 7.9/3.7 | 25.8 |
| Local Transformer | 3 | 8.2/3.1 | 24.7 | 7.6/4.2 | 24.2 |
| Local Transformer | 5 | 7.4/3.3 | 22.4 | 7.3/3.2 | 22.6 |
| Local Transformer | 7 | 8.5/4.7 | 25.0 | 7.7/3.7 | 24.6 |
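Section 3.5 and the table above vary a local window inside self-attention. One plausible reading, sketched below as our own assumption rather than the paper's exact scheme, is a banded mask that restricts each frame to its nearest temporal neighbors:

```python
# Hedged sketch of local-window self-attention masking; the paper's exact
# windowing scheme may differ from this banded-mask reading.
import torch

def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where frame i may attend to frame j, i.e. |i - j| <= window // 2."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window // 2

mask = local_attention_mask(seq_len=8, window=5)       # window = 5 is the best row above
scores = torch.randn(8, 8).masked_fill(~mask, float("-inf"))
attn = scores.softmax(-1)  # each frame attends only to a 5-frame neighborhood
```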
| Models | Dev del ↓/ins ↓ | Dev WER ↓ | Test del ↓/ins ↓ | Test WER ↓ |
|---|---|---|---|---|
| **HMM/CNN/RNN Based** | | | | |
| CMLLR [50] | 21.8/3.9 | 55.0 | 20.3/4.5 | 53.0 |
| SubUNets [2] | 14.6/4.0 | 40.8 | 14.3/4.0 | 40.7 |
| Staged-Opt [3] | 13.7/7.3 | 39.4 | 12.2/7.5 | 38.7 |
| Re-sign [4] | -/- | 27.1 | -/- | 26.8 |
| LS-HAN [32] | -/- | - | -/- | 38.3 |
| CNN-HMM [19] | -/- | 31.6 | -/- | 32.5 |
| DenseTCN [16] | 10.7/5.1 | 35.9 | 10.5/5.5 | 36.5 |
| CNN-LSTM-HMM [20] | -/- | 26.0 | -/- | 26.0 |
| DNF (RGB) [37] | 7.8/3.5 | 23.8 | 7.8/3.4 | 24.4 |
| DNF (RGB + Flow) [37] | 7.3/3.3 | 23.1 | 6.7/3.3 | 22.9 |
| FCN [12] | -/- | 23.7 | -/- | 23.9 |
| SMC [7] | 7.6/3.8 | 22.7 | 7.4/3.5 | 22.4 |
| VAC [51] | 7.9/2.5 | 21.2 | 8.4/2.6 | 22.3 |
| mLTSF-Net [9] | 8.5/3.3 | 23.8 | 8.3/3.1 | 23.5 |
| CVT-SLR [52] | -/- | 21.8 | -/- | 22.0 |
| MSKA-SLR [53] | -/- | 21.7 | -/- | 22.1 |
| **Transformer Based** | | | | |
| REINFORCE (SL) [34] | 5.7/6.8 | 39.7 | 5.8/6.8 | 40.0 |
| REINFORCE (RL) [34] | 7.3/5.2 | 38.0 | 7.0/5.7 | 38.3 |
| SAFI [54] | 16.6/1.8 | 31.7 | 15.1/1.7 | 31.3 |
| SL-Transformers [18] | 39.3/2.8 | 50.18 | 37.0/2.8 | 47.96 |
| SL-Transformers (pre) [18] | 13.5/5.7 | 26.7 | 13.8/6.4 | 27.62 |
| PiSLTRc-R [55] | 8.1/3.4 | 23.4 | 7.6/3.3 | 23.2 |
| SignBERT+ [56] | -/- | 34.0 | -/- | 34.1 |
| MSF-ET (Ours) | 7.4/3.3 | 22.4 | 7.3/3.2 | 22.6 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).