Automatic Essay Scoring Method Based on Multi-Scale Features
Abstract
1. Introduction
To reduce the influence of the above problems and to score essays more comprehensively, we propose an AES method based on Multi-Scale Semantic Features (MSSF). In particular, we extract features at different scales with dedicated modules: (1) We utilize the LSTM-MoT model to extract document-scale global semantic features of the essay. (2) We obtain sentence vectors with SBERT, extract the contextual relevance of these local features with an LSTM, and then apply attention pooling to determine each sentence's contribution to the final score, yielding sentence-scale local semantic features. (3) Since the relevance between an essay and its prompt is an important basis for scoring, we vectorize both with Doc2Vec and compute their similarity to obtain a prompt-relevance feature. (4) To address the shortcomings of DNN models in extracting shallow features, such as grammatical errors and text richness, we use manually extracted features with adaptive weights to obtain the shallow linguistic features of the essay. MSSF fuses the global semantic, local semantic, prompt-relevance, and shallow linguistic features for each essay (a schematic sketch of this fusion is given after the contribution list below). Experiments with the proposed model are conducted on the Kaggle ASAP dataset. The results show that our AES model with multi-scale hybrid semantic features effectively improves automatic essay scoring and achieves the best performance compared with the baseline models.

Our main contributions are as follows:

- We add 18 typical manual features with adaptive weights to the distributed representation of essays. These features capture valuable quantitative information that is difficult for DNN-AES models to extract, and their weight parameters can be adjusted adaptively for different AES tasks.
- We utilize SBERT as the sentence vectorization method for the essay. Compared with manually extracted sentence-level features, SBERT can be fine-tuned for the specific task after pre-training, which makes the final score more accurate.
- We add a relevance feature between the prompt and the essay. This feature allows the model to learn the correlation between them, rather than scoring from the essay alone.
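To make the fusion described above concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of how the four feature streams could be combined for scoring; the module name `MSSFFusion`, all dimensions, and the adaptive-weight implementation are illustrative assumptions.

```python
# Hypothetical sketch of fusing the four MSSF feature streams; dimensions are assumed.
import torch
import torch.nn as nn

class MSSFFusion(nn.Module):
    def __init__(self, doc_dim=300, sen_dim=384, shallow_dim=18):
        super().__init__()
        # Learnable adaptive weights for the 18 shallow linguistic features
        self.shallow_weight = nn.Parameter(torch.ones(shallow_dim))
        # +1 input for the scalar essay-prompt relevance feature
        self.scorer = nn.Sequential(
            nn.Linear(doc_dim + sen_dim + shallow_dim + 1, 100),
            nn.ReLU(),
            nn.Linear(100, 1),
            nn.Sigmoid(),  # normalized score in [0, 1]
        )

    def forward(self, doc_feat, sen_feat, prompt_sim, shallow_feat):
        # doc_feat: (B, doc_dim)    document-scale global features (e.g., LSTM-MoT output)
        # sen_feat: (B, sen_dim)    sentence-scale local features (e.g., SBERT + LSTM + attention)
        # prompt_sim: (B, 1)        essay-prompt similarity (e.g., Doc2Vec cosine)
        # shallow_feat: (B, 18)     hand-crafted shallow linguistic features
        weighted_shallow = shallow_feat * self.shallow_weight
        fused = torch.cat([doc_feat, sen_feat, prompt_sim, weighted_shallow], dim=-1)
        return self.scorer(fused)

# Example with random tensors
model = MSSFFusion()
score = model(torch.randn(2, 300), torch.randn(2, 384), torch.rand(2, 1), torch.rand(2, 18))
print(score.shape)  # torch.Size([2, 1])
```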
2. Related Work
2.1. AES Based on Shallow Linguistic Features
2.2. AES Based on Deep Neural Networks
2.3. AES Based on Pre-Trained Models
2.4. AES Based on Hybrid Model
3. Approach
3.1. Document-Scale Global Features
3.2. Sentence-Scale Local Features
3.3. Prompt Relevance Features
3.4. Shallow Linguistic Features
3.5. Essay Scoring
4. Experiment
4.1. Dataset
4.2. Evaluation Metric
4.3. Experimental Configuration
4.4. Comparative Experiment
- EASE (SVR) [42]: Enhanced AI Scoring Engine (EASE) is a public machine-learning-based classification scoring engine (https://github.com/openedx-unsupported/ease (accessed on 23 May 2023)). As EASE relies on hand-engineered features and regression techniques, we adopt the Support Vector Regression (SVR) model as the baseline approach for comparison purposes in this paper.
- LSTM-MoT [19]: Treating the essay text as a word sequence, we use the LSTM network to extract the temporal relationship and average the output of all time step states for the final scoring.
- CNN(10runs) + LSTM(10runs) [19]: Using an ensemble learning approach, ten CNN and ten LSTM models are trained for prediction, and their predictions are averaged to yield the final result.
- CNN-LSTM-ATT [21]: We use CNN and the attention mechanism to extract sentence-scale features of the essay; then, we input the result into the LSTM model and perform the final scoring through the attention mechanism.
- SkipFlow LSTM [43]: To enhance the performance of essay scoring, the model incorporates the SkipFlow mechanism into the LSTM network. This mechanism leverages the semantic relationships between the hidden layers of the LSTM as auxiliary features.
- TSLF [29]: We use LSTM to obtain semantic features, coherence features, and prompt-relevance features, combine them with shallow features such as grammatical errors, and then use XGBoost to score the essay.
- Tran-BERT-MS-ML-R [26]: Two BERT models are used to extract token-scale, document-scale, and segment-scale features; multiple losses are used for essay scoring.
- (1) Predicting the final score by manually extracting features and then applying traditional machine learning algorithms is not effective, indicating that relying on shallow linguistic features alone to characterize the whole essay misses its deep semantics, which is critical in essay scoring. Moreover, parameters such as the kernel function and penalty parameter of SVR have a significant impact on performance, yet their optimal values are difficult to determine, and this approach is harder to extend than DNN-AES methods. LSTM-MoT and CNN(10runs) + LSTM(10runs) use document-scale features to extract deep semantic information, which clearly improves on manually extracted features; however, a single document-scale DNN does not extract the semantic features of longer texts well. Furthermore, splitting the whole essay into words loses the relevance between words in the same sentence and between different sentences of the essay. Training multiple individual networks with ensemble learning yields a slight further improvement, but the scoring performance is still limited.
- (2) Incorporating the SkipFlow mechanism into the LSTM to model the semantic relevance between hidden states leads to a slight improvement in performance. However, it still relies on word-level co-occurrence patterns rather than capturing the overall meaning and coherence of the essay. Concretely, compared with the conventional LSTM, the improvement is significant only on the P1 and P8 subsets, while performance on the remaining subsets is comparable. The CNN-LSTM-ATT model uses sentence-scale features as the basis for scoring, which works better than document-scale scoring; however, scoring only at the sentence semantic scale limits the improvement, and its accuracy largely depends on the sentence vectorization method.
- (3) Combining manually extracted features with the deep semantic features extracted by DNN models can effectively improve essay scoring performance. TSLF, which incorporates hand-crafted features, improves over LSTM-MoT by 2.7%. The experimental results show that hand-crafted features and deep learning features are complementary and effectively supplement the semantic features; moreover, fusing multi-scale features can effectively improve scoring performance. However, TSLF relies heavily on the feature extraction capability of the document-level LSTM and does not use additional methods to extract features at other scales.
- (4) The Tran-BERT-MS-ML-R model achieves a QWK coefficient of 79.1% on the ASAP dataset, demonstrating the effectiveness and superiority of pre-trained models in the AES task. It also indicates that scoring based on features of different scales can effectively improve performance, albeit at the cost of a heavier computational load. Moreover, our proposed MSSF method outperforms Tran-BERT-MS-ML-R and achieves the highest scores on four subsets, because MSSF extracts document-scale global features and sentence-scale local features from deep semantics, measures the similarity between the essay and its prompt at the topic scale, and extracts shallow linguistic features for scoring. Thus, the proposed model exhibits the best overall performance among the compared models without excessive computational load.
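For reference, the quadratic weighted kappa (QWK) values reported here can be computed with a standard implementation. The following minimal sketch uses scikit-learn; the score lists are made-up illustrations, not data from our experiments.

```python
# Minimal sketch of the QWK evaluation metric with hypothetical score lists.
from sklearn.metrics import cohen_kappa_score

human_scores = [8, 9, 7, 10, 6, 8]   # hypothetical gold scores (e.g., Prompt 1, range 2-12)
model_scores = [8, 8, 7, 10, 7, 9]   # hypothetical model predictions, rounded to integers

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```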
4.5. Feature Performance Analysis
- (1) The hybrid DOC-SEN model contributes significantly to the overall AES performance. Compared with DOC and SEN, the hybrid model improves performance on every subset, with overall gains of 2.2% and 1.3%, respectively. The experimental results show that combining document-scale global semantic features with sentence-scale local semantic features characterizes essays better and improves scoring performance.
- (2) Shallow linguistic features alone are insufficient for essay scoring. However, incorporating them into the model alongside the deep semantic features improves performance by 2%. The experimental results indicate that, since DNN models struggle to capture shallow features such as grammatical errors and linguistic richness, hand-crafted features effectively complement the deep semantic features and improve performance.
- (3) Prompt relevance also affects model performance. On top of the deep and shallow features, adding the prompt-relevance feature improves the scores on all subsets, but the gain is small, with an overall improvement of 0.5%. The experimental results demonstrate that incorporating essay-prompt relevance features can enhance essay scoring, although the current approach yields only a limited improvement.
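As a rough illustration of how such a prompt-relevance feature can be obtained, the sketch below trains a small Doc2Vec model with gensim and takes the cosine similarity between the prompt vector and the essay vector; the toy corpus and hyperparameters are assumptions, not the configuration used in this paper.

```python
# Hypothetical sketch: Doc2Vec-based essay-prompt relevance feature.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from numpy import dot
from numpy.linalg import norm

corpus = [
    TaggedDocument(words="computers benefit society by connecting people".split(), tags=["prompt"]),
    TaggedDocument(words="i think computers help people stay in touch with friends".split(), tags=["essay"]),
]
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

prompt_vec = model.dv["prompt"]
essay_vec = model.dv["essay"]
relevance = dot(prompt_vec, essay_vec) / (norm(prompt_vec) * norm(essay_vec))
print(f"prompt-essay relevance: {relevance:.3f}")
```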
Models | Prompt 1 | Prompt 2 | Prompt 3 | Prompt 4 | Prompt 5 | Prompt 6 | Prompt 7 | Prompt 8 | Ave.
---|---|---|---|---|---|---|---|---|---
SLF | 0.802 | 0.652 | 0.665 | 0.768 | 0.774 | 0.691 | 0.732 | 0.612 | 0.712
DOC | 0.775 | 0.687 | 0.683 | 0.788 | 0.810 | 0.807 | 0.805 | 0.614 | 0.746
SEN | 0.796 | 0.695 | 0.701 | 0.779 | 0.801 | 0.793 | 0.792 | 0.678 | 0.755
DOC-SEN | 0.808 | 0.709 | 0.705 | 0.798 | 0.819 | 0.814 | 0.807 | 0.682 | 0.768
DOC-SEN-SLF | 0.839 | 0.742 | 0.732 | 0.812 | 0.821 | 0.817 | 0.815 | 0.721 | 0.788
MSSF | 0.846 | 0.748 | 0.737 | 0.820 | 0.826 | 0.825 | 0.820 | 0.725 | 0.793
- It generates contextually aware sentence embeddings. This allows SBERT to capture the contextual meaning of sentences and include word order and dependencies, rather than just averaging out the word vectors in the sentence, resulting in more accurate embeddings.
- It benefits from the large-scale pre-training of BERT models. It is typically pre-trained on vast amounts of text data, allowing it to learn a broad range of language patterns. This extensive pre-training contributes to the model’s ability to generalize well across different tasks and domains.
- It allows fine-tuning on specific downstream tasks. This fine-tuning process adapts the pre-trained model to the specific task, resulting in improved performance. On the other hand, InferSent and USE are not designed for easy fine-tuning and lack this adaptability.
- It can be used to calculate the semantic similarity between sentences. Embedding sentences in a high-dimensional space and computing the cosine similarity between the embeddings yields a more accurate measure of similarity, which better captures sentence semantics and improves the quality of sentence vectorization.
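The following minimal sketch illustrates this use of SBERT with the sentence-transformers library; the specific checkpoint name and example sentences are assumptions, not necessarily those used in our experiments.

```python
# Minimal sketch of SBERT sentence vectorization and cosine similarity (assumed checkpoint).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "The author supports her claim with two concrete examples.",
    "Two concrete examples are given to back up the writer's argument.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(float(similarity))  # close to 1.0 for paraphrases
```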
4.6. Time Complexity
4.7. Threats to Validity
- It does not have high content validity, i.e., the degree to which it covers the content areas it intends to measure. Specifically, AES systems are typically trained on a specific dataset of essays, which may not fully represent the entire scope of writing styles. Consequently, AES may struggle to assess essays that fall outside the scope of the training data, leading to potential biases and incomplete evaluations.
- It can be susceptible to language and cultural biases. The difference in language usage, dialects, or cultural references can impact the accuracy of AES, particularly for non-native English speakers or individuals from diverse linguistic backgrounds. In addition, scoring varies greatly between different languages and the model cannot be transferred well between different languages.
- It often relies on surface-level information, such as word choice, sentence structure, or grammar. Although such information is important, it may not capture deeper aspects of writing quality, such as coherence, organization, or argumentation.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hussein, M.A.; Hassan, H.; Nassef, M. Automated language essay scoring systems: A literature review. PeerJ Comput. Sci. 2019, 5, e208.
- Hua, C.; Wind, S.A. Exploring the psychometric properties of the mind-map scoring rubric. Behaviormetrika 2019, 46, 73–99.
- McNamara, D.S.; Louwerse, M.M.; Graesser, A.C. Coh-Metrix: Automated Cohesion and Coherence Scores to Predict Text Readability and Facilitate Comprehension; Technical Report; Institute for Intelligent Systems, University of Memphis: Memphis, TN, USA, 2002.
- Landauer, T.K. Automatic essay assessment. Assess. Educ. Princ. Policy Pract. 2003, 10, 295–308.
- Ke, Z.; Ng, V. Automated Essay Scoring: A Survey of the State of the Art. IJCAI 2019, 195, 6300–6308.
- Borade, J.G.; Netak, L.D. Automated grading of essays: A review. In Proceedings of the Intelligent Human Computer Interaction: 12th International Conference, IHCI 2020, Daegu, Republic of Korea, 24–26 November 2020; pp. 238–249.
- Uto, M. A review of deep-neural automated essay scoring models. Behaviormetrika 2021, 48, 459–484.
- Cozma, M.; Butnaru, A.M.; Ionescu, R.T. Automated essay scoring with string kernels and word embeddings. arXiv 2018, arXiv:1804.07954.
- Butnaru, A.M.; Ionescu, R.T. From image to text classification: A novel approach based on clustering word embeddings. Procedia Comput. Sci. 2017, 112, 1783–1792.
- Dasgupta, T.; Naskar, A.; Dey, L.; Saha, R. Augmenting textual qualitative features in deep convolution recurrent neural network for automatic essay scoring. In Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications, Melbourne, Australia, 19 July 2018; pp. 93–102.
- Page, E.B. Grading essays by computer: Progress report. In Proceedings of the Invitational Conference on Testing Problems, Princeton, NJ, USA, 28 October 1967.
- Mathias, S.; Bhattacharyya, P. ASAP++: Enriching the ASAP automated essay grading dataset with essay attribute scores. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7–12 May 2018.
- Sakaguchi, K.; Heilman, M.; Madnani, N. Effective feature integration for automated short answer scoring. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, CO, USA, 31 May–5 June 2015; pp. 1049–1054.
- Cummins, R.; Zhang, M.; Briscoe, T. Constrained multi-task learning for automated essay scoring. Assoc. Comput. Linguist. 2016, 1, 789–799.
- Nguyen, H.; Litman, D. Argument mining for improving the automated scoring of persuasive essays. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Kim, Y. Convolutional Neural Networks for Sentence Classification. arXiv 2014, arXiv:1408.5882.
- Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45.
- Dong, F.; Zhang, Y. Automatic features for essay scoring—An empirical study. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–4 November 2016; pp. 1072–1077.
- Taghipour, K.; Ng, H.T. A neural approach to automated essay scoring. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–4 November 2016; pp. 1882–1891.
- Nguyen, H.; Dery, L. Neural networks for automated essay grading. CS224d Stanf. Rep. 2016, 1–11. Available online: https://cs224d.stanford.edu/reports/huyenn.pdf (accessed on 1 June 2023).
- Dong, F.; Zhang, Y.; Yang, J. Attention-based recurrent convolutional neural network for automatic essay scoring. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, BC, Canada, 3–4 August 2017; pp. 153–162.
- Ridley, R.; He, L.; Dai, X.-y.; Huang, S.; Chen, J. Automated cross-prompt scoring of essay traits. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; pp. 13745–13753.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Rodriguez, P.U.; Jafari, A.; Ormerod, C.M. Language models and automated essay scoring. arXiv 2019, arXiv:1909.09482.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 2019, 32.
- Wang, Y.; Wang, C.; Li, R.; Lin, H. On the use of bert for automated essay scoring: Joint learning of multi-scale essay representation. arXiv 2022, arXiv:2205.0383.
- Yang, R.; Cao, J.; Wen, Z.; Wu, Y.; He, X. Enhancing automated essay scoring performance via fine-tuning pre-trained language models with combination of regression and ranking. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 1560–1569.
- Farag, Y.; Yannakoudakis, H.; Briscoe, T. Neural automated essay scoring and coherence modeling for adversarially crafted input. arXiv 2018, arXiv:1804.0689.
- Liu, J.; Xu, Y.; Zhu, Y. Automated essay scoring based on two-stage learning. arXiv 2019, arXiv:1901.0774.
- Uto, M.; Aomi, I.; Tsutsumi, E.; Ueno, M. Integration of Prediction Scores from Various Automated Essay Scoring Models Using Item Response Theory. IEEE Trans. Learn. Technol. 2023, 1–18.
- Alikaniotis, D.; Yannakoudakis, H.; Rei, M. Automatic text scoring using neural networks. arXiv 2016, arXiv:1606.0428.
- Wang, Y.; Wei, Z.; Zhou, Y.; Huang, X.-J. Automatic essay scoring incorporating rating schema via reinforcement learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 791–797.
- Mesgar, M.; Strube, M. A neural local coherence model for text quality assessment. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4328–4339.
- Nadeem, F.; Nguyen, H.; Liu, Y.; Ostendorf, M. Automated essay scoring with discourse-aware neural models. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, Florence, Italy, 2 August 2019; pp. 484–493.
- Uto, M.; Okano, M. Robust neural automated essay scoring using item response theory. In Proceedings of the Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, 6–10 July 2020; Proceedings, Part I 21. pp. 549–561.
- Mim, F.S.; Inoue, N.; Reisert, P.; Ouchi, H.; Inui, K. Unsupervised learning of discourse-aware text representation for essay scoring. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Florence, Italy, 28 July–2 August 2019; pp. 378–385.
- Uto, M.; Xie, Y.; Ueno, M. Neural automated essay scoring incorporating handcrafted features. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 6077–6088.
- Reimers, N.; Gurevych, I. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv 2019, arXiv:1908.10084.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Le, Q.; Mikolov, T. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1188–1196.
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Phandi, P.; Chai, K.M.A.; Ng, H.T. Flexible domain adaptation for automated essay scoring using correlated linear regression. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 431–439.
- Tay, Y.; Phan, M.; Tuan, L.A.; Hui, S.C. Skipflow: Incorporating neural coherence features for end-to-end automatic text scoring. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
Feature Type | No. | Detailed Description
---|---|---
Word-scale features | 1 | Number of characters
 | 2 | Number of words
 | 3 | Number of punctuation symbols
 | 4 | Number of nouns
 | 5 | Number of verbs
 | 6 | Number of adverbs
 | 7 | Number of adjectives
 | 8 | Number of conjunctions
 | 9 | Number of distinct words
 | 10 | Number of misspellings
 | 11 | Mean word length
Sentence-scale features | 12 | Number of sentences
 | 13 | The average length of clauses
 | 14 | The average sentence length
 | 15 | The variance of sentence length
 | 16 | The average depth of the syntax tree of each sentence
 | 17 | The average depth of each leaf node of the syntax tree
Prompt-relevant features | 18 | Number of words in the essay that appear in the prompt
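As a simple illustration, the sketch below computes a handful of the word-scale and sentence-scale features from the table above with plain Python; the POS-based, spelling, and syntax-tree features (Nos. 4–8, 10, 16, 17) would require an NLP toolkit and are omitted here.

```python
# Illustrative extraction of a few shallow linguistic features (not the full 18-feature set).
import re
import string

def shallow_features(essay: str) -> dict:
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "num_characters": len(essay),
        "num_words": len(words),
        "num_punctuation": sum(ch in string.punctuation for ch in essay),
        "num_distinct_words": len({w.lower() for w in words}),
        "mean_word_length": sum(map(len, words)) / max(len(words), 1),
        "num_sentences": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }

print(shallow_features("Computers help us learn. They also connect people around the world!"))
```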
Prompt | # of Essays | Genre | Average Essay Length (words) | Score Range |
---|---|---|---|---|
1 | 1783 | ARG | 350 | 2~12 |
2 | 1800 | ARG | 350 | 1~6 |
3 | 1726 | RES | 150 | 0~3 |
4 | 1772 | RES | 150 | 0~3 |
5 | 1805 | RES | 150 | 0~4 |
6 | 1800 | RES | 150 | 0~4 |
7 | 1569 | NAR | 250 | 0~30 |
8 | 723 | NAR | 650 | 0~60 |
Models | Prompt 1 | Prompt 2 | Prompt 3 | Prompt 4 | Prompt 5 | Prompt 6 | Prompt 7 | Prompt 8 | Ave.
---|---|---|---|---|---|---|---|---|---
EASE(SVR) | 0.781 | 0.621 | 0.630 | 0.749 | 0.782 | 0.771 | 0.727 | 0.534 | 0.699
LSTM-MoT | 0.775 | 0.687 | 0.683 | 0.795 | 0.818 | 0.813 | 0.805 | 0.594 | 0.746
CNN(10runs) + LSTM(10runs) | 0.821 | 0.688 | 0.694 | 0.805 | 0.807 | 0.819 | 0.808 | 0.644 | 0.761
CNN-LSTM-ATT | 0.822 | 0.682 | 0.672 | 0.814 | 0.803 | 0.811 | 0.801 | 0.705 | 0.764
SkipFlow LSTM | 0.832 | 0.684 | 0.695 | 0.788 | 0.815 | 0.810 | 0.800 | 0.697 | 0.764
TSLF | 0.852 | 0.736 | 0.731 | 0.801 | 0.823 | 0.792 | 0.762 | 0.684 | 0.773
Tran-BERT-MS-ML-R | 0.834 | 0.716 | 0.714 | 0.812 | 0.813 | 0.836 | 0.839 | 0.766 | 0.791
MSSF | 0.846 | 0.748 | 0.737 | 0.820 | 0.826 | 0.825 | 0.820 | 0.725 | 0.793