Application of Generative Artificial Intelligence Models for Accurate Prescription Label Identification and Information Retrieval for the Elderly in Northern East of Thailand
Abstract
1. Introduction
- Section 2 (Literature Reviews) surveys existing research on drug label identification and Large Language Models, highlighting key limitations and knowledge gaps.
- Section 3 (Methodology) outlines our model architectures, data collection strategies, and the integration of RAG.
- Section 4 (Results) presents both quantitative and qualitative evaluations of the proposed models using the RAGAs framework.
- Section 5 (Discussion) interprets the findings in relation to previous research, exploring broader implications and addressing potential limitations.
- Section 6 (Conclusion and Future Work) summarizes key insights and proposes directions for extending the system to wider clinical and home healthcare settings.
2. Literature Reviews
2.1. Related Works
Study | Methodology | Dataset | Accuracy | Limitations |
---|---|---|---|---|
Liu et al. (2020) [22] | DLI-IT (CTPN + VGG-16 + LSTM) | Daily-Med (4114 images) | 88% | Requires multimodal data integration; limits with text complexity |
Ting et al. (2020) [28] | YOLOv2 (Blister Packaging) | 36,000 images | 96.26% | Front-view limitations; similar packaging leads to errors |
Gromova and Elangovan (2022) [29] | DCNN (Cylindrical Bottles) | Custom, various lighting | 98% | Handling curved surfaces only; lighting sensitivity |
You and Lin (2023) [33] | TSIDL (Two-Stage Induced) | 108 packaging types | 99.39% | Focus on speed; limited contextual diversity |
2.2. Large Language Models
2.3. Visual Question Answering Model
3. Methodology
3.1. Model Architecture and Approach
3.1.1. Two-Stage OCR + LLM Model
- OCR Stage: EasyOCR is employed to extract text from images of prescription labels. This OCR tool is particularly suited to handle the diverse font styles, sizes, orientations, and Thai language characters commonly found on pharmaceutical labels within Thailand.
- LLM Interpretation Stage: The extracted text is processed by Qwen2-72b-instruct, a language model selected for its strong instruction-following capabilities. This stage leverages the model’s contextual understanding to generate user-friendly interpretations, enriched with predefined knowledge for clarity and precision.
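The two-stage flow above can be sketched as follows. This is a minimal, illustrative orchestration, not the authors' implementation: `extract_label_text` stands in for an EasyOCR call and `interpret_with_llm` for a Qwen2-72b-instruct query; both are stubbed with canned outputs so the control flow is runnable on its own.

```python
def extract_label_text(image_path):
    """Stage 1 (OCR): stand-in for an EasyOCR call, e.g.
    easyocr.Reader(['th', 'en']).readtext(image_path, detail=0).
    Returns the text fragments found on the label (stubbed here)."""
    return ["Paracetamol 500 mg", "Take 1 tablet every 6 hours after meals"]

def interpret_with_llm(fragments):
    """Stage 2 (LLM): stand-in for a Qwen2-72b-instruct query that turns
    raw OCR fragments into a user-friendly interpretation."""
    prompt = (
        "You are a pharmacist's assistant. Explain this prescription label "
        "simply for an elderly patient:\n" + "\n".join(fragments)
    )
    # A real system would send `prompt` to the model; we return a canned answer.
    return "This is Paracetamol 500 mg. Take one tablet every 6 hours, after eating."

def two_stage_pipeline(image_path):
    fragments = extract_label_text(image_path)   # OCR stage
    return interpret_with_llm(fragments)         # LLM interpretation stage

print(two_stage_pipeline("label_001.jpg"))
```

The point of the decomposition is that each stage can be tuned or swapped independently: the OCR stage handles Thai-specific text extraction, while the LLM stage handles interpretation and enrichment.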
3.1.2. Uni-Stage VQA Model
- Direct Image Processing: The VQA model processes the entire image of the prescription label directly, bypassing the need for a separate OCR stage.
- Simultaneous Visual and Textual Interpretation: This model interprets visual cues and textual data concurrently, providing real-time responses to specific questions about the label. This approach is particularly advantageous in scenarios that require rapid analysis and response.
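By contrast, the Uni-Stage interface collapses the two stages into a single image-plus-question call. The sketch below is illustrative only: the function stands in for a vision-language model such as Qwen2-VL and returns a canned answer.

```python
def answer_label_question(image_path, question):
    """Single-pass VQA: stand-in for a vision-language model call that
    reads the label image and the question together. There is no separate
    OCR step; this stub returns a canned answer for illustration."""
    return "Take one 500 mg Paracetamol tablet every 6 hours."

# One round trip per question, instead of OCR-then-LLM:
print(answer_label_question("label_001.jpg", "How often should I take this?"))
```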
3.2. RAG and Knowledge Integration
3.3. Evaluation Metrics Using RAGAs
- Accuracy, Precision, Recall, and F1-Score: standard classification metrics that gauge how well the system identifies and interprets correct label information.
- LLM Context Recall: This metric measures the model’s ability to retrieve and retain contextually relevant information from external knowledge sources (e.g., DrugBank) to ensure responses are both informative and tailored to specific drug labels. For this metric, ground truth answers are referenced to validate that the essential information required for a complete response is present.
- Factual Correctness: This metric assesses the factual accuracy of the model’s responses, verifying that the generated information is consistent with verified pharmaceutical data. High Factual Correctness is vital in healthcare contexts to minimize the risk of misinformation, particularly for medication instructions.
- Faithfulness: This metric evaluates the extent to which model-generated responses are faithful to the information extracted from drug labels and context sources, minimizing deviation from the source content. Maintaining Faithfulness ensures that the model delivers responses that accurately reflect label data, which is critical in healthcare to prevent potentially harmful misunderstandings.
- Semantic Similarity: This metric measures the semantic alignment of the model’s outputs with expert-verified explanations, focusing on producing responses that closely match the language and terminology healthcare professionals would use. High Semantic Similarity ensures that the model’s outputs are both understandable and contextually relevant, fostering trustworthiness for elderly patients and caregivers.
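The classification metrics above follow directly from prediction counts, and a semantic-similarity score is typically a cosine similarity between embedding vectors. A minimal sketch using only the standard library (the three-dimensional vectors are made-up toys, not real sentence embeddings; the counts are taken from the Uni-Stage label-naming results reported later):

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from true/false positive/negative counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Uni-Stage label naming: 96 correct, 4 missed out of 100 labels.
acc, prec, rec, f1 = classification_metrics(tp=96, fp=0, fn=4, tn=0)
print(acc, prec, rec, round(f1, 4))  # 0.96 1.0 0.96 0.9796

# Parallel vectors score 1.0, i.e. maximal semantic alignment.
print(round(cosine_similarity([1, 2, 3], [2, 4, 6]), 2))  # 1.0
```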
3.4. Data Collection and Preprocessing
3.5. Prompts and Testing
3.6. Hyperparameter Configuration
- The LLM’s and VQA’s temperature parameter is configured to 0.0 during the inference phase. This conservative setting promotes deterministic and focused output generation, as opposed to more creative or diverse responses. In the context of healthcare, where factual accuracy and consistency are paramount, introducing randomness through a higher temperature is deemed undesirable. Maintaining a temperature of 0.0 effectively mitigates the risk of generating hallucinated or speculative content.
- For text generation, we utilize the standard default configurations for top-k and nucleus (top-p) sampling. Specifically, top-k filtering is disabled, and top-p remains at its default value of 1.0. This configuration compels the decoder to produce the most probable next token at each timestep without applying stochastic filtering. These deterministic settings are particularly crucial when generating medication-related advice, where precision is essential.
- To prevent overly lengthy or tangential responses, the maximum number of generated tokens is capped at 128. This limit is deemed sufficient to convey standard prescription or dosage instructions concisely, thereby enhancing overall clarity, particularly for elderly users.
- The repetition penalty is set to its default value of 1.0. This parameter is designed to mitigate repeated phrases during generation. Our empirical observations indicated that it did not significantly affect label naming or prescription advice in our trials, justifying the retention of the default setting for simplicity.
- The EasyOCR library exposes a limited number of tunable parameters, primarily the text detection and confidence thresholds. We adopt the recommended default settings to ensure robust text extraction while minimizing the incidence of false positives.
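Collected together, the decoding settings above amount to a greedy, length-capped generation configuration. The sketch below uses common Hugging Face-style field names for illustration; it is not the authors' exact configuration object.

```python
# Deterministic decoding settings, with illustrative field names.
GENERATION_CONFIG = {
    "temperature": 0.0,        # no sampling randomness
    "do_sample": False,        # temperature 0.0 implies greedy decoding
    "top_k": 0,                # top-k filtering disabled
    "top_p": 1.0,              # nucleus sampling effectively off
    "max_new_tokens": 128,     # cap response length for concise advice
    "repetition_penalty": 1.0, # library default, left unchanged
}

def is_deterministic(cfg):
    """Greedy decoding: no sampling and no stochastic filtering."""
    return not cfg["do_sample"] and cfg["temperature"] == 0.0

print(is_deterministic(GENERATION_CONFIG))  # True
```

Pinning these values once in a shared config helps keep the OCR + LLM and VQA models comparable during evaluation.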
4. Results
4.1. Performance Comparison Based on RAGAs Metrics
4.2. Performance for Label Naming (Accuracy, Precision, Recall, F1-Score)
4.3. Interpretation of Results
- LLM Context Recall: Scoring 0.88, the Two-Stage model demonstrates high retention and effective application of context from the drug label dataset, surpassing the Uni-Stage model’s score of 0.73. This indicates that sequential processing—first OCR, then LLM—enhances the model’s ability to retain context-rich information, which is essential for accurate interpretation of intricate medical instructions.
- Factual Correctness: Achieving a score of 0.83, the Two-Stage model benefits significantly from Retrieval-Augmented Generation (RAG), allowing it to validate extracted information against a curated pharmaceutical knowledge base. This enhances its reliability, particularly for accurately conveying dosage, administration instructions, and warnings. The Uni-Stage model, with a score of 0.69, demonstrates reasonable performance but shows limitations when more rigorous factual verification is required.
- Faithfulness: With a score of 0.76, the Two-Stage model adheres moderately well to the original label content, helping ensure accuracy in delivering medical information. The Uni-Stage model scored 0.64, suggesting it may occasionally introduce interpretive deviations, likely due to its single-step processing, which may simplify or generalize some details.
- Semantic Similarity: The Two-Stage model’s highest score of 0.91 in Semantic Similarity indicates strong alignment with expert-verified terminology and phrasing, which benefits elderly patients who require consistent and accurate instruction. The Uni-Stage model scored 0.78, showing it can provide reasonable clarity but could benefit from enhanced alignment with medical terminology to reduce ambiguity in its output.
4.4. Comparison with State-of-the-Art Models
Study | Accuracy (%) | Notes |
---|---|---|
Liu et al. (2020) [22] | 88.00 | Multimodal integration; challenges with text complexity |
Ting et al. (2020) [28] | 96.26 | Limitations with similar packaging |
Gromova and Elangovan (2022) [29] | 98.00 | Focused on cylindrical bottles |
You and Lin (2023) [33] | 99.39 | High precision on packaging types |
Our Two-Stage Model | 100.00 | Includes label naming and prescription advice |
- EasyOCR demonstrates high accuracy in text extraction, effectively handling the diverse fonts and languages present in Thai prescription labels.
- The use of Qwen2-72b-instruct allows for nuanced interpretation of extracted text, improving the model’s ability to distinguish between similar drug names and handle complex label information.
- By integrating RAG with a curated drug dataset, the model cross-verifies label information, reducing errors and enhancing Factual Correctness.
- Our dataset includes a wide variety of prescription labels captured under realistic conditions, ensuring the model’s robustness and generalizability.
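The RAG cross-verification step described above can be illustrated with a toy retrieval loop. This is a sketch only: the three-entry dictionary stands in for the curated DrugBank-derived dataset, and substring matching stands in for embedding-based retrieval.

```python
# Toy stand-in for the curated drug knowledge base (real system: DrugBank-derived).
DRUG_KB = {
    "paracetamol": {"dose": "500 mg", "max_daily": "4000 mg"},
    "amlodipine": {"dose": "10 mg", "note": "take at the same time each day"},
    "metformin": {"dose": "500 mg", "note": "take with meals"},
}

def retrieve_drug_facts(ocr_text):
    """Return KB entries for every known drug name found in the OCR text.
    Substring matching stands in for embedding-based retrieval."""
    text = ocr_text.lower()
    return {name: facts for name, facts in DRUG_KB.items() if name in text}

def build_augmented_prompt(ocr_text):
    """Prepend retrieved facts so the LLM can cross-verify the label."""
    facts = retrieve_drug_facts(ocr_text)
    context = "\n".join(f"{name}: {info}" for name, info in facts.items())
    return f"Verified drug facts:\n{context}\n\nLabel text:\n{ocr_text}"

print(build_augmented_prompt("Paracetamol 500 mg, take 1 tablet every 6 hours"))
```

Grounding the prompt in retrieved facts is what lets the model flag a dosage on the label that contradicts the knowledge base, rather than repeating it verbatim.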
4.5. Comparative Analysis and Practical Implications
- Two-Stage model (OCR + LLM): Given its consistently higher scores across RAGAs metrics, the Two-Stage model is ideally suited for tasks requiring detailed interpretation and strict adherence to medical standards. By leveraging sequential RAG-enhanced processing, it provides robust information verification, making it an effective tool for supporting medication guidance in settings where accuracy is paramount, such as elderly care consultations or home management of complex medication regimens.
- Uni-Stage model (VQA): While the Uni-Stage model does not achieve the same level of accuracy and context retention as the Two-Stage model, it offers significant advantages in response speed and simplicity. This model’s streamlined design allows it to provide faster output, making it suitable for high-volume healthcare settings like pharmacies, where quick, general-purpose label interpretation is more critical than exhaustive detail. It effectively balances speed with accuracy, albeit with some limitations in complex cases.
- Implications of RAGAs metrics: The RAGAs framework emphasizes the importance of accurate, contextually aligned, and faithful information in healthcare AI applications. The high scores in Factual Correctness and Semantic Similarity for the Two-Stage model highlight its suitability for precise medical tasks, particularly where maintaining fidelity to source information is critical. Conversely, the Uni-Stage model’s scores suggest it may be more effective for routine or repetitive tasks that benefit from rapid, albeit less detailed, responses.
4.6. Computational Complexity Analysis
- Time complexity: the theoretical scaling of processing time with respect to input size and model parameters.
- Memory footprint: GPU memory utilization during inference.
- Throughput: empirically measured inference speed under realistic operating conditions.
4.6.1. Two-Stage Model: OCR + LLM
- OCR systems typically employ a convolutional backbone for feature extraction and a sequence decoder for text recognition. For an input image of size H × W, the convolutional feature extraction step exhibits a time complexity of O(H × W × k), where k represents the number of operations per convolutional kernel. The subsequent recurrent or transformer-based decoder has a complexity approximately proportional to O(L × d²), with L being the length of the recognized text and d denoting the decoder's hidden dimension.
- On an NVIDIA Tesla V100 GPU, EasyOCR processes a standard-resolution prescription label image in approximately 70–100 ms. While higher image resolution or text density extends inference time, it generally remains within feasible limits for real-time deployment in most outpatient settings.
- Modern transformer-based LLMs exhibit an inference complexity of O(S² × D) per forward pass, where S is the sequence length (number of tokens) and D is the hidden dimension. Larger sequence lengths or model parameter counts lead to increased inference latency.
- For typical short prescription label text (ranging from 100 to 200 tokens), the average inference time is approximately 2000–3000 ms per query on a V100 GPU. The GPU memory footprint is typically 10–12 GB, varying slightly with batch size.
4.6.2. Uni-Stage Model: VQA (Qwen2-72b-VL)
- A VQA system processes image features using a vision backbone (e.g., ViT or CNN). The image encoder's complexity is roughly O(H × W × k), analogous to the OCR backbone stage. Subsequently, the combined vision–language transformer performs attention operations over both visual and textual tokens, resulting in a complexity of O((N_v + N_t)² × D), where N_v and N_t represent the number of visual and textual tokens, respectively.
- As the Uni-Stage model integrates both vision and text processing into a single pass, it has a singular combined latency, typically ranging from 1800–2500 ms per image for our prescription label inputs. GPU memory usage often reaches 12–14 GB due to the concurrent encoding of visual and textual information.
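A back-of-envelope comparison of end-to-end latency, using the V100 figures quoted in this section, makes the trade-off concrete:

```python
# Reported per-query latency ranges on a Tesla V100, in milliseconds.
OCR_MS = (70, 100)     # EasyOCR stage (Two-Stage model)
LLM_MS = (2000, 3000)  # LLM interpretation stage (Two-Stage model)
VQA_MS = (1800, 2500)  # single combined pass (Uni-Stage model)

# Two-Stage latency is the sum of its sequential stages.
two_stage_total = (OCR_MS[0] + LLM_MS[0], OCR_MS[1] + LLM_MS[1])
print("Two-Stage total:", two_stage_total)  # (2070, 3100)
print("Uni-Stage total:", VQA_MS)           # (1800, 2500)

# Worst-case single-stream throughput, in queries per second.
print(round(1000 / two_stage_total[1], 2))  # 0.32
print(round(1000 / VQA_MS[1], 2))           # 0.4
```

The Uni-Stage model's single pass buys roughly a few hundred milliseconds per query at the cost of the verification benefits of the sequential design.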
4.7. Summary of Findings and Recommendations
5. Discussion
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Department of Older Persons, Ministry of Social Development and Human Security. Situation of the Thai Older Persons 2021. Available online: https://www.dop.go.th/download/knowledge/th1663828576-1747_1.pdf (accessed on 4 July 2024).
- Department of Older Persons. Statistics of Older Persons June 2023 by Looker Studio. Available online: https://www.dop.go.th/th/know/side/1/1/2449 (accessed on 4 July 2024).
- Economic Research Institute for ASEAN and East Asia (ERIA). Population Ageing in Thailand. Available online: https://www.eria.org/publications/population-ageing-in-thailand (accessed on 4 July 2024).
- World Health Organization. Thailand’s Leadership and Innovations Towards Healthy Ageing. Available online: https://www.who.int/southeastasia/news/feature-stories/detail/thailands-leadership-and-innovation-towards-healthy-ageing (accessed on 4 July 2024).
- Vinks, T.H.; De Koning, F.H.; de Lange, T.M.; Egberts, T.C. Identification of Potential Drug-Related Problems in the Elderly: The Role of the Community Pharmacist. Pharm. World Sci. 2006, 28, 33–38.
- Sapkota, S.; Pudasaini, N.; Singh, C.; Sagar, G.C. Drug Prescribing Pattern and Prescription Error in Elderly: A Retrospective Study of Inpatient Record. Asian J. Pharm. Clin. Res. 2011, 4, 129–132.
- Yang, C.; Zhu, S.; Lee, D.T.F.; Chair, S.Y. Interventions for Improving Medication Adherence in Community-Dwelling Older People with Multimorbidity: A Systematic Review and Meta-Analysis. Int. J. Nurs. Stud. 2022, 126, 104154.
- Stock, S.; Redaelli, M.; Simic, D.; Siegel, M.; Henschel, F. Risk Factors for the Prescription of Potentially Inappropriate Medication (PIM) in the Elderly. Wien. Klin. Wochenschr. 2014, 126, 604–612.
- Roux-Marson, C.; Baranski, J.B.; Fafin, C.; Exterman, G.; Vigneau, C.; Couchoud, C.; Moranne, O. Medication Burden and Inappropriate Prescription Risk Among Elderly with Advanced Chronic Kidney Disease. BMC Geriatr. 2020, 20, 87.
- Noonan, M.C.; Wingham, J.; Taylor, R.S. Who Cares? The Experiences of Caregivers of Adults Living with Heart Failure, Chronic Obstructive Pulmonary Disease and Coronary Artery Disease: A Mixed Methods Systematic Review. BMJ Open 2018, 8, e020927.
- Khanagar, S.B.; Al-Ehaideb, A.; Maganur, P.C.; Vishwanathaiah, S.; Patil, S.; Baeshen, H.A.; Sarode, S.C.; Bhandi, S. Developments, Application, and Performance of Artificial Intelligence in Dentistry—A Systematic Review. J. Dent. Sci. 2021, 16, 508–522.
- Loh, H.W.; Ooi, C.P.; Seoni, S.; Barua, P.D.; Molinari, F.; Acharya, U.R. Application of Explainable Artificial Intelligence for Healthcare: A Systematic Review of the Last Decade (2011–2022). Comput. Methods Programs Biomed. 2022, 226, 107161.
- Yin, J.; Ngiam, K.Y.; Teo, H.H. Role of Artificial Intelligence Applications in Real-Life Clinical Practice: Systematic Review. J. Med. Internet Res. 2021, 23, e25759.
- Albahri, O.S.; Zaidan, A.A.; Albahri, A.S.; Zaidan, B.B.; Abdulkareem, K.H.; Al-Qaysi, Z.T.; Alamoodi, A.H.; Aleesa, A.M.; Chyad, M.A.; Alesa, R.M.; et al. Systematic Review of Artificial Intelligence Techniques in the Detection and Classification of COVID-19 Medical Images in Terms of Evaluation and Benchmarking: Taxonomy Analysis, Challenges, Future Solutions and Methodological Aspects. J. Infect. Public Health 2020, 13, 1381–1396.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Jaided AI. EasyOCR. Available online: https://github.com/JaidedAI/EasyOCR (accessed on 4 July 2024).
- Yang, A.; Yang, B.; Hui, B.; Zheng, B.; Yu, B.; Zhou, C.; Fan, Z. Qwen2 Technical Report. arXiv 2024, arXiv:2407.10671.
- Lewis, P.; Oguz, B.; Rinott, R.; Riedel, S.; Stoyanov, V. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; pp. 9459–9474.
- Wishart, D.S.; Feunang, Y.D.; Guo, A.C.; Lo, E.J.; Marcu, A.; Grant, J.R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z.; et al. DrugBank 5.0: A Major Update to the DrugBank Database for 2018. Nucleic Acids Res. 2018, 46, D1074–D1082.
- Es, S.; James, J.; Espinosa-Anke, L.; Schockaert, S. RAGAs: Automated Evaluation of Retrieval Augmented Generation. arXiv 2023, arXiv:2309.15217.
- Liu, X.; Meehan, J.; Tong, W.; Wu, L.; Xu, X.; Xu, J. DLI-IT: A Deep Learning Approach to Drug Label Identification through Image and Text Embedding. BMC Med. Inform. Decis. Mak. 2020, 20, 68.
- Tian, Z.; Huang, W.; He, T.; He, P.; Qiao, Y. Detecting Text in Natural Image with Connectionist Text Proposal Network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 56–72.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Smith, R. An Overview of the Tesseract OCR Engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; pp. 629–633.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv 2016, arXiv:1612.08242.
- Ting, H.W.; Chung, S.L.; Chen, C.F.; Chiu, H.Y.; Hsieh, Y.W. A Drug Identification Model Developed Using Deep Learning Technologies: Experience of a Medical Center in Taiwan. BMC Health Serv. Res. 2020, 20, 312.
- Gromova, K.; Elangovan, V. Automatic Extraction of Medication Information from Cylindrically Distorted Pill Bottle Labels. Mach. Learn. Knowl. Extract. 2022, 4, 852–864.
- Szeliski, R. Computer Vision: Algorithms and Applications; Springer Nature: Cham, Switzerland, 2022.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- You, Y.S.; Lin, Y.S. A Novel Two-Stage Induced Deep Learning System for Classifying Similar Drugs with Diverse Packaging. Sensors 2023, 23, 7275.
- Huang, S.C.; Pareek, A.; Seyyedi, S.; Banerjee, I.; Lungren, M.P. Fusion of Medical Imaging and Electronic Health Records Using Deep Learning: A Systematic Review and Implementation Guidelines. NPJ Digit. Med. 2020, 3, 136.
- Kumar, Y.; Koul, A.; Singla, R.; Ijaz, M.F. Artificial Intelligence in Disease Diagnosis: A Systematic Literature Review, Synthesizing Framework and Future Research Agenda. J. Ambient Intell. Humaniz. Comput. 2023, 14, 8459–8486.
- Hu, S.; Yu, S.; Cheng, C.; Shen, X.; Zhang, H.; Wei, F.; Zhou, X.; Yu, H.; Zhuang, L.; Liu, J.; et al. MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies. Available online: https://arxiv.org/abs/2404.06395 (accessed on 4 July 2024).
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models. arXiv 2023, arXiv:2303.18223.
- Harris, J.; Laurence, T.; Loman, L.; Grayson, F.; Nonnenmacher, T.; Long, H.; WalsGriffith, L.; Douglas, A.; Fountain, H.; Georgiou, S.; et al. Evaluating Large Language Models for Public Health Classification and Extraction Tasks. arXiv 2024, arXiv:2405.14766.
- Awais, M.; Naseer, M.; Khan, S.; Anwer, R.M.; Cholakkal, H.; Shah, M.; Khan, F.S. Foundational Models Defining a New Era in Vision: A Survey and Outlook. arXiv 2023, arXiv:2307.13721.
- Hartsock, I.; Rasool, G. Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review. arXiv 2024, arXiv:2403.02469.
- Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzman, F.; Stoyanov, V. Unsupervised Cross-Lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, WA, USA, 5–10 July 2020; pp. 8440–8451.
- Xue, L.; Constant, N.; Roberts, A.; Kale, M.; Al-Rfou, R.; Siddhant, A.; Barua, A.; Raffel, C. mT5: A Massively Multilingual Pre-Trained Text-to-Text Transformer. arXiv 2021, arXiv:2010.11934.
- Anil, R.; Dai, A.M.; Firat, O.; Johnson, M.; Lepikhin, D.; Passos, A.; Wu, Y. PaLM 2 Technical Report. arXiv 2023, arXiv:2305.10403.
- Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; Casas, D.D.L.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7B. arXiv 2023, arXiv:2310.06825.
- Laurençon, H.; Saulnier, L.; Wang, T.; Akiki, C.; Villanova del Moral, A.; Le Scao, T.; Jernite, Y. The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset. Adv. Neural Inf. Process. Syst. 2022, 35, 31809–31826.
- OpenAI. ChatGPT (July 5 Version) [Large Language Model]. Available online: https://chat.openai.com/ (accessed on 4 July 2024).
- Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.; Usuyama, N.; Liu, X.; Poon, H. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans. Comput. Healthc. 2021, 3, 1–23.
- Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Natarajan, V. Large Language Models Encode Clinical Knowledge. Nature 2023, 620, 172–180.
- Johnson, J.; Hariharan, B.; Van Der Maaten, L.; Fei-Fei, L.; Lawrence Zitnick, C.; Girshick, R. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2901–2910.
- Hudson, D.A.; Manning, C.D. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 6700–6709.
- Lau, J.J.; Gayen, S.; Ben Abacha, A.; Demner-Fushman, D. A Dataset of Clinically Generated Visual Questions and Answers About Radiology Images. Sci. Data 2018, 5, 180251.
- He, X.; Zhang, Y.; Mou, L.; Xing, E.; Xie, P. PathVQA: 30000+ Questions for Medical Visual Question Answering. arXiv 2020, arXiv:2003.10286.
- Radford, A.; Kim, J.W.; Xu, T.; Brockman, G.; McLeavey, C.; Sutskever, I. Robust Speech Recognition via Large-Scale Weak Supervision. arXiv 2022, arXiv:2212.04356.
| Drug Group | Unique Drug Names | Dosage Information |
|---|---|---|
| Pain Relief | Naproxen | 250 mg |
| | Paracetamol | 500 mg |
| | Tramadol HCl | 50 mg |
| | Aspirin | 81 mg |
| | NorGesic (Paracetamol + Orphenadrine) | 500 + 35 mg |
| Cold Relief | Glyceryl Guaiacolate | 100 mg |
| | Dextromethorphan | 15 mg |
| | Acetylcysteine | 200 mg |
| Vitamins | Folic Acid | 5 mg |
| | B-Co Vitamin | - |
| | Multivitamins | - |
| | Vitamin B6 | 50 mg |
| Blood Pressure | Amlodipine | 10 mg |
| | Enalapril | 5 mg |
| | Losartan | 50 mg |
| | Manidipine | 20 mg |
| | Hydralazine HCl | 25 mg |
| Cholesterol | Simvastatin | 10 mg, 40 mg |
| | Gemfibrozil | 300 mg |
| Diabetes | Metformin | 500 mg |
| | Glipizide | 5 mg |
| Antibiotics | Ciprofloxacin | 250 mg |
| | Amoxicillin | 500 mg |
| Gastrointestinal | Omeprazole | 20 mg |
| | Domperidone | 10 mg |
| | Sodium Bicarbonate | 300 mg |
| | Calcium Carbonate | 600 mg |
| Mental Health | Lorazepam | 1 mg |
| | Risperidone | 2 mg |
| | Amitriptyline | 10 mg |
| | Sertraline | 50 mg |
| | Fluoxetine | 20 mg |
| | Chlorpromazine | 50 mg |
| | Benzhexol HCl | 2 mg |
| | Gabapentin | 300 mg |
| Hormones | Progesterone | 200 mg |
| | Calcitriol (Vitamin D3) | 0.25 mcg |
| | Ergocalciferol (Vitamin D2) | 20,000 IU |
| Iron Supplements | Ferrous Fumarate | 200 mg |
| Miscellaneous | Simethicone | 80 mg |
| | Milk of Magnesia | - |
| | Acetylcysteine | - |
| | Rifampicin | 300 mg |
| | Isoniazid | 100 mg |
| | Pantoprazole | 20 mg |
| | Natear (Hypromellose + Boric Acid) | - |
| | ORS | 7.5 g |
Metric | Two-Stage Model | Std Dev (Two-Stage) | Uni-Stage Model | Std Dev (Uni-Stage) |
---|---|---|---|---|
LLM Context Recall | 0.88 | ±0.04 | 0.73 | ±0.05 |
Factual Correctness | 0.83 | ±0.05 | 0.69 | ±0.06 |
Faithfulness | 0.76 | ±0.06 | 0.64 | ±0.07 |
Semantic Similarity | 0.91 | ±0.03 | 0.78 | ±0.04 |
Model | #Correct | #Incorrect | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|
Two-Stage | 100 | 0 | 100.00% | 1.0000 | 1.0000 | 1.0000 |
Uni-Stage | 96 | 4 | 96.00% | 1.0000 | 0.9600 | 0.9796 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Thetbanthad, P.; Sathanarugsawait, B.; Praneetpolgrang, P. Application of Generative Artificial Intelligence Models for Accurate Prescription Label Identification and Information Retrieval for the Elderly in Northern East of Thailand. J. Imaging 2025, 11, 11. https://doi.org/10.3390/jimaging11010011