1. Introduction
Over the past few years, advancements in computer vision technology have accelerated, largely due to deep learning developments. Computer vision has emerged as an important branch of artificial intelligence, with diverse applications in image recognition, object detection, and video analysis. Examples of these applications include analyzing emotions through facial expression detection [
1], student behavior monitoring systems [
2], and computer vision tasks using deep active learning [
3]. The growth of neural network-based AI, particularly deep learning, has driven innovation in computer vision, leading to its widespread adoption in industries such as medical image analysis, autonomous driving, and security systems. Deep learning algorithms, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs), have shown remarkable performance in image classification, object detection, and image generation.
While neural network-based AI is adept at addressing complex problems by learning patterns from data, it often lacks transparency and logical reasoning capabilities. On the other hand, symbolic AI, sometimes called rule-based or classical AI, works with high-level symbolic representations using clear rules and logic for data processing [
4]. Symbolic AI is particularly effective for tasks involving logical reasoning, knowledge representation, and the manipulation of abstract ideas, which makes it suitable for areas such as expert systems and language comprehension [
5]. However, symbolic AI faces challenges in learning from large amounts of raw data and adapting to new, unseen situations due to its reliance on predefined rules and structures [
6].
Computer vision technology is critical to both present and future information technology industries, and its study is fundamental for students. In an environment where neural network-based AI, such as deep learning, is central to computer vision technology, students need to develop practical skills to solve problems using real data, in addition to theoretical knowledge. However, most current educational curricula are either theory-centric or overly focused on practice, lacking a systematic educational approach that integrates neural network-based AI and symbolic AI. Despite technological advancements, there is a lack of systematic educational frameworks capable of effectively teaching computer vision technologies [
3]. Especially in primary and secondary education, teaching computer vision requires a staged learning approach that aligns with students’ levels and educational goals.
The integration of data science and computational thinking is essential in computer vision education. Data science provides the foundation for collecting, preprocessing, analyzing, and modeling image data, enabling students to learn and evaluate computer vision models [
7]. Computational thinking plays a vital role in developing students’ abilities to systematically solve problems. Elements of computational thinking such as decomposition, abstraction, pattern recognition, and algorithm design help students logically approach computer vision problems [
8].
This study aims to develop a staged framework for computer vision education by integrating neural network-based AI and symbolic AI in each stage. This framework incorporates AI, data science, and computational thinking to enhance problem-solving abilities. By emphasizing a staged framework, we design each stage to allow learners to simultaneously experience theory and practice through this integration.
3. Methods
In this study, we established a staged approach and criteria for effective teaching and learning in computer vision education. Computer vision technologies comprise multiple stages, such as image recognition, object detection, and video analysis. Developing methodologies that allow these technologies to be progressively learned in educational environments is essential. Accordingly, we proposed a four-stage educational framework that integrates neural network-based AI and symbolic AI, enabling students to systematically learn computer vision technologies. Each stage includes key elements of computer vision, such as image recognition, object detection, image segmentation, and video analysis, and is linked with various elements of data science and computational thinking.
3.1. Research Design
The research design of this study involved a systematic sequence of steps to develop and validate a staged framework for computer vision education that integrates neural network-based AI and symbolic AI, data science, and computational thinking, as shown in
Figure 1.
Initially, the research objectives were clearly defined to establish the goals and scope of this study. A comprehensive literature review was then conducted to examine existing studies related to computer vision education, AI integration, data science, and computational thinking. This review helped identify gaps in current educational approaches and informed the development of the new framework.
Subsequently, the staged educational framework was developed, comprising four progressive levels designed to build upon each other and incorporate increasing complexity in computer vision concepts and AI methodologies. Following the development of the framework, the first round of expert validity surveys was conducted. Experts in AI education, computer vision, and curriculum development were invited to evaluate the framework using a structured questionnaire. Data collected from this survey were analyzed quantitatively and qualitatively to assess the validity and reliability of the framework.
Based on the analysis of expert feedback, necessary refinements were made to enhance the framework’s validity and applicability. Detailed lesson plan examples were then developed for each stage to provide practical implementation guidance. The second round of expert validity surveys was conducted, during which experts evaluated the practical aspects of the framework and lesson plans, including the clarity of educational objectives and the feasibility of classroom implementation. Data from the second expert survey were collected and analyzed to further validate the framework and identify areas for final adjustments.
3.2. Staged Approach and Criteria for Computer Vision Education
Table 4 outlines the four-stage framework, which progressively introduces more complex computer vision concepts while integrating data science and computational thinking elements. This staged approach is based on Bloom’s Taxonomy of Educational Objectives [
34] and Vygotsky’s Zone of Proximal Development [
35], designed considering the cognitive developmental stages of learners.
In the first stage, learning focuses on image classification and object recognition. No-code platforms are utilized to help learners grasp basic computer vision concepts without prior programming experience. From the data science perspective, students learn about data collection and preprocessing, emphasizing computational thinking skills such as decomposition and pattern recognition. For example, learners can classify images of laboratory equipment into categories such as beakers and flasks.
In the second stage, learners study image retrieval and image captioning. Teachers have the flexibility to choose between basic block coding (e.g., AI blocks in Scratch 3 or mBlock V5) and basic Python coding (utilizing OpenCV 4) based on the learners’ proficiency levels. This allows for a tailored approach that meets students where they are in their coding journey. Students acquire data visualization and data analysis techniques, strengthening abilities in abstraction and algorithm design. They can apply learning content practically through tasks such as explaining the uses and methods of the recognized laboratory equipment. This reflects the importance of accommodating different learning styles and the gradual transition from block-based to text-based coding for advancing learners’ programming skills [
36].
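The histogram-comparison idea that underlies simple image retrieval can be sketched in a few lines. The example below is illustrative only: it uses small synthetic NumPy arrays in place of photographs, whereas in the classroom students would load real images with OpenCV 4.

```python
import numpy as np

def grey_histogram(img, bins=8):
    """Normalized intensity histogram of an 8-bit greyscale image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def retrieve(query, gallery):
    """Return the index of the gallery image whose histogram best
    matches the query (L1 distance between normalized histograms)."""
    q = grey_histogram(query)
    dists = [np.abs(q - grey_histogram(g)).sum() for g in gallery]
    return int(np.argmin(dists))

# Synthetic 'images': a dark patch and a bright patch.
dark = np.full((16, 16), 30, dtype=np.uint8)
bright = np.full((16, 16), 220, dtype=np.uint8)
print(retrieve(np.full((16, 16), 25, dtype=np.uint8), [dark, bright]))  # 0 (dark)
```

A near-dark query is matched to the dark gallery image, mirroring how learners compare visual features when retrieving images of similar laboratory equipment.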
In the third stage, students learn about image segmentation and bounding box techniques. Again, teachers can select between advanced block coding (e.g., supervised learning blocks in Scratch 3 or mBlock V5) and advanced Python coding (utilizing OpenCV 4 or YOLOv8) depending on students’ skills and readiness. They deepen their learning by understanding feature extraction and model training processes, focusing on enhancing automation and simulation abilities. An example task is detecting the laboratory equipment used in each stage in experimental process videos. This stage emphasizes hands-on experience with model training.
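A building block students typically meet when working with bounding boxes is Intersection over Union (IoU), the standard overlap measure used to evaluate detectors such as YOLOv8. A minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # two half-overlapping boxes
```

Computing IoU by hand for a few box pairs gives learners a concrete feel for what a detector's accuracy scores mean before they train a model.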
In the final stage, learners study object tracking and action recognition. Teachers have the option to choose between block coding combined with physical computing (e.g., Scratch 3 or mBlock V5 with Arduino or other hardware) and utilizing deep learning frameworks like TensorFlow 2 or PyTorch 2, based on the learners’ proficiency. By combining coding and physical computing or delving into deep learning frameworks, students develop practical application skills. They learn about model evaluation and deployment processes, improving abilities in parallelization and data representation. Through tasks like tracking quantitative changes in reactants and products in chemical reaction videos, students can apply advanced computer vision technologies. This stage aligns with project-based learning approaches that enhance problem-solving skills [
37].
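The object-tracking idea in this final stage can be conveyed with a deliberately simplified sketch: greedy nearest-centroid matching between consecutive video frames. Production trackers in deep learning frameworks are far more robust; the function below, fed hypothetical centroid data, only illustrates the core matching step.

```python
def track(prev_centroids, detections):
    """One step of greedy nearest-centroid tracking.
    prev_centroids: {track_id: (x, y)} from the previous frame.
    detections:     [(x, y), ...] centroids detected in the new frame.
    Returns {track_id: detection_index} for the new frame."""
    assignments, used = {}, set()
    for tid, (px, py) in prev_centroids.items():
        best, best_d2 = None, float("inf")
        for i, (dx, dy) in enumerate(detections):
            if i in used:
                continue
            d2 = (px - dx) ** 2 + (py - dy) ** 2   # squared distance
            if d2 < best_d2:
                best, best_d2 = i, d2
        if best is not None:
            assignments[tid] = best
            used.add(best)
    return assignments

# Two tracked objects; detections arrive in a different order.
print(track({0: (10, 10), 1: (50, 50)}, [(52, 49), (11, 12)]))  # {0: 1, 1: 0}
```

Seeing identities persist across frames in this toy form helps learners understand what a full tracking pipeline (detection, association, identity management) must accomplish at scale.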
3.3. Research Methods
To evaluate the staged approach and criteria for computer vision education, we conducted two rounds of expert validity assessments. The expert validity assessment was based on Lawshe’s Content Validity Ratio (CVR) method [
38], using a content validity questionnaire composed of a 5-point Likert scale. A total of 25 experts in AI education and curricula participated, reviewing the validity of the educational criteria and teaching–learning activities presented in each stage.
In the first round of validity assessment, we evaluated the staged approach and criteria of computer vision education as shown in
Table 5. Experts assessed whether the computer vision level presented in each stage (image recognition, object detection, image segmentation, video analysis) was appropriate from an educational perspective and whether the connections with data science and computational thinking elements were sufficient.
In the second round of validity assessment, we evaluated the validity of the teaching–learning activity examples, reflecting improvements derived from the first round. The second assessment evaluated whether the learning activities presented in each stage could effectively achieve learning objectives, the appropriateness of practical activities, and whether they could induce learner participation as shown in
Table 6. Additionally, we reviewed whether the stage-specific learning activities aligned with learners’ levels and learning objectives, and whether the feedback provision methods and evaluation methods had sufficient validity.
After the first expert validity assessment, we developed a lesson plan example and provided it to the experts along with the second expert validity questionnaire, as shown in
Table 7. This was intended to offer concrete guidance and facilitate a more detailed evaluation of the proposed educational framework by the experts. The lesson plan example provided in
Table 7, focusing on Stage 2 (utilizing image recognition) of the staged framework, demonstrates how the approach can be practically implemented in the classroom. In this activity, students progress from understanding basic scientific concepts to applying advanced computer vision techniques: they engage in hands-on experimentation, analyze the captured images using Python, and implement a linear regression model to predict pH values based on RGB data.
3.3.1. Validity Verification Method
We used Lawshe’s CVR method to evaluate content validity [
38]. CVR is calculated based on the results of experts’ evaluations of whether each item is essential, using the following formula:

CVR = (Ne − N/2) / (N/2)

where Ne is the number of experts indicating the item is ‘valid’, and N is the total number of experts. According to Lawshe, when 25 evaluators participate, a CVR value of 0.37 or higher indicates that the item is statistically significant. Through this method, we verified the validity of the staged educational approach and teaching–learning activities presented in this study.
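A minimal sketch of this computation (the `cvr` helper below is our illustration, not code from the study):

```python
def cvr(n_essential, n_total):
    """Lawshe's Content Validity Ratio: CVR = (Ne - N/2) / (N/2)."""
    half = n_total / 2
    return (n_essential - half) / half

# With 25 evaluators the critical value is 0.37, so an item rated
# 'valid' by 18 of the 25 experts clears the threshold:
print(round(cvr(18, 25), 2))  # 0.44
```

The ratio ranges from −1 (no expert rates the item essential) through 0 (exactly half do) to +1 (all do), which is why the critical value shrinks as the panel grows.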
3.3.2. Data Collection and Analysis
Data for this study were collected through two rounds of expert validity assessments. In the first round, we evaluated the staged approach and criteria of computer vision education, with key evaluation items including the validity of each stage, connection with data science, and appropriateness of coding education. In the second round, we evaluated the teaching–learning activity examples for each stage, focusing on the clarity of learning objectives, appropriateness of practical activities, and the validity of feedback provision and evaluation methods.
The collected data were analyzed using SPSS Statistics 23. We calculated CVR values for each evaluation item to assess validity and incorporated improvements suggested by experts through content analysis of open-ended responses. The analysis confirmed the validity of most items, and on this basis we revised and supplemented the staged approach and criteria for computer vision education. In particular, the connection between stage-specific educational activities and data science and computational thinking elements received positive evaluations.
3.4. Pilot Implementation
To validate the developed framework and teaching–learning process, a pilot program was conducted with 40 upper secondary school students. The entire program was delivered over four class hours as shown in
Table 8.
In the first session, students learned the scientific concepts related to acid–base properties and titration using indicators. During the second session, students used the BTB indicator to distinguish between acids and bases; they observed the color changes and captured images of the BTB solutions along with measured pH values. In the third and fourth sessions, students analyzed the RGB values of these images in a Python environment and trained a linear regression model to predict pH values from the average RGB values of the images.
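The pH-prediction task from the third and fourth sessions can be sketched with ordinary least squares. The RGB and pH values below are hypothetical stand-ins for the students' measurements, and NumPy's `lstsq` stands in for whichever fitting routine the classroom used:

```python
import numpy as np

# Hypothetical mean-RGB readings of BTB solutions at known pH values.
# In the classroom these would come from the students' photos; here
# they are toy stand-ins chosen only to illustrate the fitting step.
rgb = np.array([
    [210.0, 200.0,  60.0],   # yellow (acidic)
    [160.0, 190.0, 110.0],
    [ 90.0, 170.0, 150.0],   # green  (near neutral)
    [ 60.0, 130.0, 190.0],
    [ 40.0,  90.0, 210.0],   # blue   (basic)
])
ph = np.array([4.0, 5.5, 7.0, 8.5, 10.0])

# Fit pH ~ w_r*R + w_g*G + w_b*B + b by ordinary least squares.
X = np.hstack([rgb, np.ones((len(rgb), 1))])   # append a bias column
w, *_ = np.linalg.lstsq(X, ph, rcond=None)

def predict_ph(r, g, b):
    """Predict pH from a mean RGB triple using the fitted weights."""
    return float(np.array([r, g, b, 1.0]) @ w)
```

With the fitted weights, a yellowish reading maps to a lower predicted pH than a bluish one, which is the qualitative behavior students verify against their indicator chart.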
The pilot implementation aligns closely with the staged framework for computer vision education, particularly incorporating elements from the second and third stages. The activities in the second session, where students captured images of BTB solution color changes according to pH changes, correspond to the image retrieval and captioning focus of Stage 2. The third and fourth sessions, involving RGB value analysis and the implementation of a linear regression model, align with the more advanced concepts introduced in Stage 3. This includes feature extraction (RGB values) and model training (linear regression), which are central to Stage 3’s learning objectives. By structuring the pilot implementation in this way, we were able to provide students with a practical, hands-on experience that progressed from utilizing image recognition to more complex data analysis and model development, mirroring the progressive nature of our staged framework.
Students completed a survey both before and after the program. The survey included questions related to their perception of computer vision education, subject knowledge, and the application of artificial intelligence. The survey questions are outlined in
Table 9.
4. Results
4.1. First Expert Validity Assessment Results
The first expert validity assessment evaluated the proposed framework’s structure and content, with the results summarized in
Table 10.
For the ‘Appropriateness of Stage Division’, the mean score was 4.56 (SD = 0.50), indicating high validity. This suggests that the four proposed stages—image recognition, application of image recognition, image segmentation and object detection, and video analysis—are suitable for computer vision education. The ‘Validity of Computer Vision Elements’ received a mean score of 4.60 (SD = 0.49), reflecting expert consensus that the computer vision elements presented in each stage are appropriate. The ‘Connection with Data Science Elements’ was rated with a mean of 4.68 (SD = 0.47), indicating that the data science elements are well integrated with each stage of computer vision education. For the ‘Suitability of Coding Education Levels’, the mean score was 4.64 (SD = 0.48), suggesting that the coding levels—ranging from no-code platforms to Python coding—are appropriate for learners’ levels and educational goals. The ‘Relevance of Computational Thinking Elements’ had a mean score of 4.64 (SD = 0.56) and a CVR of 0.91, indicating strong agreement among experts that the computational thinking elements are relevant to the learning content in each stage. The ‘Suitability of Learning Activities Examples’ received a mean score of 4.64 (SD = 0.48), confirming that the examples and activities are appropriate for achieving the learning objectives. Lastly, the ‘Connection Between Stages’ had the highest mean score of 4.72 (SD = 0.45), indicating that each stage is effectively designed to build upon the previous one, facilitating the progressive deepening of knowledge and skills.
Overall, the results of the first validity assessment suggest that the stages of computer vision education are appropriately designed to match learners’ levels, with meaningful connections between educational objectives and learning activities. However, some experts suggested the need for adjustments in certain stages to accommodate learners’ varying levels and provide additional support where necessary.
4.2. Second Expert Validity Assessment Results
In the second expert validity assessment, we examined the validity of the teaching–learning activity examples aligned with the staged approach and criteria for computer vision education as shown in
Table 11. The same 25 experts participated, evaluating each item on a 5-point Likert scale.
The ‘Clarity of Educational Objectives’ had a mean score of 4.52 (SD = 0.50), indicating that the objectives are clearly defined and understandable to learners. ‘Learner Engagement’ received a mean score of 4.64 (SD = 0.48), suggesting that the learning activities effectively encourage active participation. The ‘Appropriateness of Practical Activities’ also had a mean score of 4.64 (SD = 0.48), reflecting that the practical activities are effective in achieving the learning objectives. The ‘Provision of Feedback’ scored a mean of 4.60 (SD = 0.49), indicating that appropriate feedback can be provided during the learning process. ‘Validity of Evaluation Methods’ had a mean score of 4.68 (SD = 0.47), showing that the evaluation methods are effective in accurately measuring learners’ achievement. Lastly, the ‘Practicality of Technology Application’ received a mean score of 4.52 (SD = 0.50), suggesting that the technical elements and tools are applicable in actual educational environments.
Overall, the second validity assessment results indicate that the teaching–learning activities proposed for each stage are suitable for achieving educational objectives and effectively engaging learners. However, some experts noted that additional learning materials might be necessary for advanced learners, implying a need for differentiated instruction.
4.3. Pilot Implementation Results
Pre- and post-program survey responses, collected on a 5-point Likert scale, were compared. Statistical analysis revealed significant improvements across all questions, as outlined in
Table 12.
For Question 1, which assessed students’ perception of the interest level of computer vision education, the mean score increased from 3.500 (SD = 0.877) before the program to 4.575 (SD = 0.636) after the program. The t-value of 6.345 and the p-value of less than 0.001 indicate a statistically significant improvement in students’ interest in the subject. In Question 2, which examined the perceived usefulness of computer vision education, the mean score improved from 3.600 (SD = 0.900) before the program to 4.625 (SD = 0.628) after the program. A t-value of 6.176 and a p-value of less than 0.001 suggest that students found the program significantly more beneficial following participation. Question 3 focused on students’ knowledge of the color changes of acid–base indicators. The pre-program mean score of 2.300 (SD = 1.091) increased significantly to 4.175 (SD = 0.813) after the program, with a t-value of 9.531 and a p-value of less than 0.001, demonstrating a notable gain in scientific understanding. Question 4, which assessed students’ ability to predict pH values using a computer vision machine learning model, showed a marked improvement. The mean score rose from 2.300 (SD = 0.939) before the program to 4.075 (SD = 0.917) after the program, with a t-value of 8.834 and a p-value of less than 0.001, reflecting a significant enhancement in students’ practical application of AI. Question 5, which evaluated students’ understanding of how computer vision and AI are applied in inquiry-based activities, demonstrated an increase from a pre-program mean score of 1.900 (SD = 1.109) to a post-program mean score of 3.275 (SD = 1.213). The t-value of 5.561 and the p-value of less than 0.001 indicate a statistically significant improvement in students’ comprehension of the integration of AI and computer vision in scientific exploration.
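The reported statistics follow the standard paired-samples t formula, t = mean(d) / (sd(d)/√n) with d = post − pre. A small sketch with toy data (the actual survey responses are not reproduced here):

```python
import numpy as np

def paired_t(pre, post):
    """Paired-samples t statistic: t = mean(d) / (sd(d) / sqrt(n)),
    with d = post - pre and the sample standard deviation (ddof=1)."""
    d = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))

# Toy example (not the study's data): four students, pre vs. post scores.
print(round(paired_t([1, 2, 3, 4], [3, 3, 5, 6]), 2))  # 7.0
```

The t-values in Table 12 come from the same construction applied to the 40 paired student responses, with significance then read from the t distribution at n − 1 degrees of freedom.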
5. Discussion
In this study, we proposed a staged approach to computer vision education, exploring ways to apply elements of data science and computational thinking by integrating neural network-based AI and symbolic AI. In this discussion, we evaluate the significance and potential impact of the proposed staged framework based on the research findings and suggest implications for educational application and future research directions.
The staged approach to computer vision education was designed to enable learners to effectively grasp core concepts by integrating both neural network-based AI and symbolic AI in each learning stage. By incorporating symbolic AI, students engaged in defining explicit rules and logical criteria, which enhanced their understanding of knowledge representation and reasoning. This integrated approach allowed learners to compare symbolic reasoning with neural learning, providing a more comprehensive understanding of AI methodologies.
In the first stage—image recognition—no-code platforms were utilized to allow learners to grasp basic computer vision concepts without prior programming experience. This significantly lowered the initial barriers to learning, especially for novice learners. Learners experienced problem-solving processes through hands-on activities and increased engagement by practically applying data collection and preprocessing—the initial stages of data science. This approach provided a crucial foundation for understanding computer vision technologies in an experimental educational environment. According to [
39], no-code AI platforms can democratize AI education by making it accessible to a broader audience.
The second stage, image retrieval and captioning, was designed to let learners experience more advanced computer vision technologies. Teachers had the flexibility to choose between basic block coding (e.g., AI blocks in Scratch 3 or mBlock V5) and basic Python coding (utilizing OpenCV 4) based on the learners’ proficiency levels. By using block-based programming languages, learners with limited programming experience could implement image retrieval and captioning without complex programming. Alternatively, for learners ready to advance, Python coding provided exposure to text-based programming and OpenCV 4 libraries. Through data visualization and analysis processes, they developed the ability to make better decisions using visual data. However, some experts pointed out that limitations of block-based coding might restrict the implementation of more complex functions [
40]. This suggests the need for concurrent learning with text-based programming languages. Future research should develop educational strategies that organically connect block-based coding with text-based programming to facilitate a smooth transition [
41].
In the third stage, object detection and image segmentation, teachers could select between advanced block coding (e.g., supervised learning blocks in Scratch 3 or mBlock V5) and advanced Python coding (utilizing OpenCV 4 or YOLOv8), depending on learners’ skill levels. This approach allowed learners to directly implement object detection algorithms suitable to their programming proficiency. Using advanced block coding, learners could grasp complex concepts without being hindered by syntax, while Python coding offered a deeper dive into programming and algorithmic implementation. This process deepened learners’ programming skills while encouraging active participation in data preprocessing and model training. Experts evaluated that learners could deeply understand the core concepts of computer vision by actually implementing object detection models in this stage. This was effective in cultivating practical problem-solving abilities rather than mere theoretical learning. However, some learners faced difficulties due to the challenges of text-based programming languages [
36]. Additional support measures, such as scaffolding and differentiated instruction, are necessary to address this issue. Future research could propose personalized learning support that provides assistance based on individual learning levels [
42].
In the fourth stage—video analysis and action recognition—teachers had the option to choose between block coding combined with physical computing (e.g., Scratch 3 or mBlock V5 with Arduino or other hardware) and utilizing deep learning frameworks like TensorFlow 2 or PyTorch 2, based on the learners’ proficiency. This flexibility enabled learners to study advanced computer vision technologies through practical exercises suited to their skill levels, either by integrating coding with hardware for a tangible learning experience or by engaging with deep learning frameworks for those ready for more complex challenges. Experts assessed with high validity that learners could acquire high-level technical skills by building and analyzing actual deep learning models in this stage. However, the complexity of deep learning frameworks posed high difficulty levels for some learners [
43]. This indicates the need for staged support tailored to learners’ levels in future educational courses, requiring improvements in instructional design. Incorporating scaffolding techniques and providing comprehensive resources can help mitigate these challenges [
44].
The pilot implementation, while focusing primarily on the second and third stages of the framework, provided valuable insights into its effectiveness. Students demonstrated significant improvements in their understanding of how computer vision and AI are applied in inquiry-based activities. This suggests that our staged approach successfully integrates computer vision concepts with practical scientific inquiry, enhancing both technical skills and subject knowledge.
The integration of neural network-based AI and symbolic AI in this staged framework is exemplified in the acid–base titration activity. In this context, neural network-based AI is represented by the linear regression model that predicts pH values based on RGB data from images of BTB indicator solutions. This model learns patterns from data without the explicit programming of rules, characteristic of neural network approaches. Concurrently, symbolic AI is incorporated through the process of analyzing RGB values from BTB indicator images using OpenCV 4 and critically examining the reasons behind the machine learning model’s pH predictions. For instance, students use OpenCV 4 to extract RGB values from the indicator images, applying predefined algorithms and rules to process the visual data. This step represents a rule-based, symbolic approach to image analysis. Subsequently, students engage in a logical analysis of the machine learning model’s predictions, utilizing their understanding of acid–base chemistry and color theory to interpret and validate the results. This analytical process involves creating explicit rules and heuristics to explain the relationship between RGB values and pH levels. The integration occurs as students use the symbolically derived RGB data to train the neural network model, and then apply symbolic reasoning to interpret the model’s outputs.
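The symbolic side of this activity can be illustrated by a hand-written rule that maps indicator color to an acid/base label, the kind of explicit heuristic students articulate before comparing it with the learned model's predictions. The thresholds below are invented for illustration, not calibrated values from the study:

```python
def classify_btb(r, g, b):
    """Hand-written rule for BTB indicator color (a symbolic-AI sketch):
    yellowish -> acidic, bluish -> basic, otherwise neutral.
    Thresholds are illustrative, not calibrated values from the study."""
    if r > b and r > 150 and b < 120:
        return "acidic"    # yellowish: strong red/green, weak blue
    if b > r and b > 150 and r < 120:
        return "basic"     # bluish: strong blue, weak red
    return "neutral"       # greenish / ambiguous
```

Contrasting such explicit rules with a trained regression model makes the difference between the two paradigms tangible: the rule is transparent but brittle at the color boundaries, while the learned model interpolates smoothly but offers no built-in explanation.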
The staged framework for computer vision education proposed in this study holds significant meaning as a new educational model integrating neural network-based AI and symbolic AI. This model helps learners gradually acquire complex concepts in computer vision and develop problem-solving skills by integrating various elements of data science and computational thinking into learning. Practice-centered learning activities provide learners with opportunities to enhance their technical abilities by solving real-world problems, offering substantial educational value. This aligns with the experiential learning theory, emphasizing learning through experience [
45].
This study’s findings suggest several considerations when applying the framework in educational environments. First, selecting appropriate tools and platforms that suit the educational environment and learners’ characteristics is crucial. Although various tools such as no-code platforms, block-based coding, and Python were presented, effective utilization requires an individualized approach considering learners’ levels and needs. The importance of tool selection in STEM education is highlighted by [
46].
Additionally, support measures for enhancing educators’ expertise should be prepared. There is a shortage of educators equipped with knowledge and skills in related fields such as computer vision, AI, and data science [
47]. Therefore, professional development programs and the provision of educational resources for educators should be supported.
Finally, follow-up studies are needed to empirically verify the framework’s effectiveness. By applying the framework in various educational environments and evaluating learners’ achievement and satisfaction, the validity and utility of the framework can be confirmed. Research analyzing the long-term impact of the framework on learners’ career choices or actual problem-solving abilities would also be meaningful. Longitudinal studies, as suggested by [
48], could provide deeper insights into educational interventions and their lasting effects on students’ understanding and application of computer vision and AI technologies.
6. Conclusions
6.1. Theoretical Contribution
This study developed a staged approach and criteria for computer vision education, proposing a new educational model that integrates neural network-based AI and symbolic AI. The findings highlight the significance of a structured educational framework that supports learners in gradually grasping complex concepts in computer vision, aligning with the emphasis of Ref. [24] on systematic methodologies in computer vision education. While existing AI education research has focused predominantly on neural network-based learning, this study integrates the logical reasoning and interpretability of symbolic AI, allowing learners to engage in deeper learning experiences. This integrated approach makes a theoretical contribution by helping learners navigate and understand the ‘black box’ nature of neural networks, as discussed in Ref. [6].
By clearly demonstrating how symbolic AI is incorporated into each stage, and reflecting this in the associated tables and descriptions, the framework makes the contribution of symbolic AI transparent and central. This ensures that symbolic AI is recognized as a critical component of the learning process, in line with the key themes and objectives of this study.
Furthermore, by incorporating elements of data science and computational thinking into the curriculum, the proposed AI education goes beyond mere technical training. It enables learners to develop problem-solving skills and derive meaningful insights during data analysis.
6.2. Practical Implications
The proposed framework provides educators with practical, stage-specific guidelines for implementing computer vision education, adaptable to various school and learner levels. The findings indicate that learning approaches involving no-code platforms and block-based programming languages are well suited for beginners, effectively lowering the initial barriers to learning programming. This approach enables learners to understand and practice the basic concepts of computer vision even without prior programming knowledge. According to Ref. [49], block-based programming languages like Scratch facilitate engagement and learning among beginners.
Additionally, by participating in key stages of data science, including data collection, preprocessing, exploration, and analysis, learners can see firsthand how AI technology is used to address practical, real-world problems. The advanced learning stages utilizing text-based programming languages further enhance learners’ technical capabilities and assist in cultivating practical AI implementation skills. Using practical tools like Python and OpenCV 4, learners can directly build and analyze computer vision models, thereby increasing the effectiveness of the education. Finally, practical exercises using deep learning frameworks contribute to learners acquiring advanced computer vision technologies, including real-time object tracking and action recognition.
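To illustrate the kind of hands-on exercise described for the advanced, text-based stages, the sketch below implements grayscale conversion and Sobel edge detection on a small synthetic image. It is a minimal sketch using plain NumPy for self-containment: the luminosity weights and kernels mirror what OpenCV’s cv2.cvtColor and cv2.Sobel compute, but the loops are written out so learners can inspect each step. The image and function names are illustrative, not part of the proposed framework.

```python
import numpy as np

def to_grayscale(img):
    # Luminosity method over BGR channels; in OpenCV this is
    # cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return img @ np.array([0.114, 0.587, 0.299])

def sobel_edges(gray):
    # Horizontal and vertical Sobel kernels, as used by cv2.Sobel
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):          # slide a 3x3 window over the image
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx = np.sum(patch * kx)  # horizontal gradient
            gy = np.sum(patch * ky)  # vertical gradient
            out[i, j] = np.hypot(gx, gy)  # gradient magnitude
    return out

# Synthetic test image: a bright square on a dark background
img = np.zeros((32, 32, 3), dtype=float)
img[8:24, 8:24] = 255.0

gray = to_grayscale(img)
edges = sobel_edges(gray)
# Gradient magnitudes peak along the square's border and are
# zero in the flat interior and background regions.
```

In classroom use, the hand-written loops would be replaced by the equivalent OpenCV calls, letting learners compare their own implementation against the library’s optimized one.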
For educators and policymakers, this study serves as a valuable reference for designing systematic, staged programs in computer vision education. Through this framework, computer vision education can be effectively conducted in diverse learning environments and serve as a practical tool to help students acquire essential AI competencies in the era of the Fourth Industrial Revolution.
6.3. Limitations and Future Research Directions
First, although a pilot test was conducted to verify the framework’s effectiveness, further experimentation across various school levels is necessary to validate its general applicability. Future research should conduct experiments across different educational levels to empirically examine the general effectiveness and long-term impact of the framework on learners’ academic performance and overall educational outcomes. Such studies would help in understanding practical differences in outcomes and identifying potential areas for further refinement of the framework.
Second, the current study’s validation relied primarily on theoretical foundations and expert assessments, and did not address personalized educational approaches that consider individual learner characteristics, prior knowledge, and learning preferences. Future studies should develop customized learning strategies that cater to individual differences and propose AI educational methodologies based on them. In particular, educational models that provide personalized feedback at each stage according to the learner’s level are needed, in line with the principles of adaptive learning [42].
Third, this research concentrated on computer vision education at the elementary and secondary levels. However, computer vision is also highly relevant in higher education and vocational training. Subsequent research should develop computer vision education frameworks targeting university students or vocational trainees and explore educational methods suitable for their specific characteristics and needs.
Lastly, while the proposed framework emphasizes technical aspects, the utilization of computer vision technology entails ethical and social considerations. Concerns such as privacy in facial recognition and liability in autonomous systems point to the necessity of incorporating ethical education into the curriculum. Future research should integrate ethical perspectives into computer vision education, enabling learners to critically reflect on the societal impacts of technology, as recommended by Ref. [50].