### 1.2. Literature Review

In this section, we review the literature from two perspectives, 'existing methods and their limitations' and 'advanced methods for human activity recognition', to summarize previous work on the perception and identification of student classroom behaviors.

#### 1.2.1. Existing Methods and the Limitations

Previous research on students' classroom behavior in traditional education has often relied on statistical survey methods, in which a teacher acts as an observer, watching the classroom behavior of the entire class or a smaller group over a period of time and recording their behavior [16]. In these circumstances, the teacher plays the role of evaluator, assessing students' behavior patterns. This manual, vision-based approach is usually adequate for identifying conspicuous inappropriate classroom activities, but the one-to-many nature of the teacher's observation results in poor perception of the finer classroom behaviors of most students [17]. Furthermore, this manual visual approach to behavior analysis is undoubtedly time-consuming, labor-intensive, and highly subjective. It is also likely to violate students' privacy through external intervention and to create learning anxiety, and it cannot provide objective, science-based judgments, thus preventing a comprehensive classroom behavior assessment. For school-age children with developmental disabilities, classroom behavior management and interventions are the primary methods for improving classroom performance. Three types of classroom behavior intervention are common: one-on-one peer help or parent coaching [18], instructional task modification [19], and self-monitoring [20]. However, although these traditional interventions have helped children with developmental disabilities improve their classroom task completion rate and behavioral performance, monitoring children's classroom behavior in this way undoubtedly consumes many resources, requiring significant human and material support to assist children's classroom learning.

#### 1.2.2. Advanced Methods for Human Activity Recognition

With the development and popularization of artificial intelligence technologies, research on scene understanding and behavioral analysis has shown great promise in practical applications [21]. With the help of data-driven algorithmic reasoning, machine learning theory makes it feasible to achieve a one-to-one, accurate understanding and assessment of students' classroom behaviors [22], especially the fine-grained behavioral analysis that manual observation struggles to capture. Some scholars have already applied AI techniques to classroom scene understanding and achieved good results. Examples include intelligent classroom systems that assist teachers' instruction and personalize students' learning by combining front-end interactive learning environments with back-end intelligent learning management systems [23], and adaptive education platforms that address students' specific learning problems, provide personalized teaching, and improve the learning experience according to students' needs and abilities [24]. However, little research has been conducted on AI-based classroom behavior understanding, owing to the immature combination of the relevant technologies and the niche nature of the educational scenario, so there is almost no directly comparable work to draw on. Although classroom behavioral activities are complex and fine-grained, they still belong to the domain of behavior recognition, so the relevant theories of human activity recognition can guide the construction of an intelligent classroom behavior perception system. A brief overview of human activity recognition approaches is presented in the following section.

The mainstream approaches for human activity recognition can be roughly classified into vision-based and sensor-based according to their data sources [25]. Vision-based behavioral analysis systems usually use one or more RGB or depth cameras to collect image or video data capturing participants' behavior, environment, and background in a specific activity space [26]. After feature extraction from the collected data through image processing and computer vision methods, participants' behavior can be identified through algorithmic learning and inference. Research applying vision methods to human activity recognition includes identifying group behavior and classifying abnormal activities in crowded scenes for surveillance and public safety purposes [27–29], as well as individual behavior recognition such as fall detection and patient monitoring to improve quality of life [30,31]. However, since vision-based data acquisition for human activity recognition relies mainly on cameras, it is vulnerable to environmental conditions such as light and weather, constrained by shooting range and angle, and burdened by the large storage requirements of the acquired data. Recognition of participants' activity is also easily affected by occlusion, and cameras raise privacy concerns. Because of these factors, vision-based behavior analysis systems have not yet been widely deployed.

In contrast, sensors have the advantages of high sensitivity, small size, easy storage, and wide applicability to various scenarios, avoiding many of the problems of vision devices, and they are now widely embedded in mobile phones, smartwatches and bracelets, eye-tracking devices, virtual/augmented reality headsets, and various intelligent IoT devices [32]. Meanwhile, with the widespread popularity of the mobile Internet and the growing daily use of intelligent devices, the traditional drawbacks of sensor-based devices, inconvenient portability and limited battery life, have been largely overcome in various application scenarios, and sensor-based methods have become one of the mainstream approaches to human activity recognition [25]. Scholars have applied sensing devices to intelligent activity recognition in several everyday domains: Alani et al. achieved 93.67% accuracy in 2020 using a deep learning approach to recognize twenty everyday human activities in smart homes [33]; Kavuncuoğlu et al. used only a waist-mounted sensor to accurately monitor falls and daily motion, achieving 99.96% accuracy on 2520 samples [34].
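To make the sensor-based pipeline concrete, the sketch below illustrates the general pattern shared by the systems cited above: segment a sensor signal into sliding windows, extract simple statistical features from each window, and classify each window. Everything here is an illustrative assumption, not the method of any cited work: the synthetic accelerometer magnitudes, the mean/standard-deviation features, and the variance-threshold rule standing in for a learned classifier.

```python
import statistics


def sliding_windows(signal, size=5, step=5):
    """Segment a 1-D signal into fixed-size windows (hypothetical sizes)."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def extract_features(window):
    """Mean and sample standard deviation of one window."""
    return statistics.mean(window), statistics.stdev(window)


def classify(features, threshold=0.5):
    """Toy stand-in for a trained classifier: high variance -> movement."""
    _, std = features
    return "moving" if std > threshold else "still"


# Synthetic accelerometer magnitudes: a quiet phase followed by an active phase.
still_phase = [1.0, 1.02, 0.98, 1.01, 0.99]
moving_phase = [0.2, 1.8, 0.4, 1.6, 0.3]

labels = [classify(extract_features(w))
          for w in sliding_windows(still_phase + moving_phase)]
print(labels)  # ['still', 'moving']
```

In a real system the threshold rule would be replaced by a model trained on labeled windows (e.g. the deep learning classifiers used in [33,34]), but the window-feature-classify structure is the same.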
