Article

Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models

by Mohammad D. Alahmadi * and Moayad Alshangiti
Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah 23890, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(7), 1036; https://doi.org/10.3390/math12071036
Submission received: 29 February 2024 / Revised: 18 March 2024 / Accepted: 27 March 2024 / Published: 30 March 2024
(This article belongs to the Special Issue AI-Augmented Software Engineering)

Abstract

The rapid evolution of video programming tutorials as a key educational resource has highlighted the need for effective code extraction methods. These tutorials vary widely in video quality, which makes it challenging to accurately transcribe the embedded source code that is crucial for learning and software development. This study investigates the impact of video quality on the performance of optical character recognition (OCR) engines and the potential of large language models (LLMs) to enhance code extraction accuracy. Our empirical analysis uses a dataset of programming screencasts with manually transcribed source code, and applies both traditional OCR engines, such as Tesseract and Google Vision, and advanced LLMs, including GPT-4V and Gemini. We also investigate the efficacy of image super-resolution (SR) techniques, namely enhanced deep super-resolution (EDSR) and multi-scale deep super-resolution (MDSR), in improving the quality of low-resolution video frames. The findings reveal significant improvements in OCR accuracy with the use of SR, particularly at lower resolutions such as 360p. LLMs demonstrate superior performance across all video qualities, indicating their robustness to variation in video quality. This research contributes to the field of software engineering by offering a benchmark for code extraction from video tutorials and by demonstrating the substantial impact of SR techniques and LLMs in enhancing the readability and reusability of code from these educational resources.
Keywords: OCR (optical character recognition); code extraction; programming screencasts; image quality; pre-processing techniques; postprocessing techniques; large language models (LLMs); source code denoising; video programming tutorials; empirical study in software engineering
