Article

Data Mining of Online Teaching Evaluation Based on Deep Learning

School of Statistics and Data Science, Beijing Wuzi University, Beijing 101149, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(17), 2692; https://doi.org/10.3390/math12172692
Submission received: 2 August 2024 / Revised: 26 August 2024 / Accepted: 27 August 2024 / Published: 29 August 2024
(This article belongs to the Section Mathematics and Computer Science)

Abstract

With the unprecedented growth of the Internet, online evaluations of teaching have emerged as a pivotal tool in assessing the quality of university education. Leveraging data mining technology, we can extract invaluable insights from these evaluations, offering a robust scientific foundation for enhancing both teaching quality and administrative oversight. This study utilizes teaching evaluation data from a mathematics course at a university in Beijing to propose a comprehensive data mining framework covering both subjective and objective evaluations. The raw data are first cleaned, annotated, and preprocessed. Subsequently, for subjective evaluation data, a model combining Bidirectional Encoder Representations from Transformers (BERT) pre-trained models and Long Short-Term Memory (LSTM) networks is constructed to predict sentiment tendencies, achieving an accuracy of 92.76% and validating the model’s effectiveness. For objective evaluation data, the Apriori algorithm is employed to mine association rules, from which meaningful rules are selected for analysis. This research effectively explores teaching evaluation data, providing technical support for enhancing teaching quality and devising educational reform initiatives.

1. Introduction

As education informatization continues to progress, numerous universities and educational institutions have adopted online evaluation systems to gather student feedback on teaching, making these data a key basis for assessing and improving teaching quality. These evaluations contain a wealth of information, including student ratings of various course aspects, overall course evaluations, and suggestions for teachers. However, because of the sheer volume of these data and the lack of effective technical tools, their use has been limited to basic queries and statistics. Leaving the insights latent in these data unused is detrimental to the future development of colleges and universities [1]. There is therefore a pressing need to exploit this large body of assessment data to extract information that can improve teaching quality and refine teaching methods.
Data mining is the process of discovering, retrieving, and analyzing latent patterns, relationships, and insights in large datasets. This interdisciplinary field draws on statistics, machine learning, database management, data visualization, and high-performance computing, with the objective of extracting meaningful information from data to support decision-making and forecast future trends. Applied to teaching evaluations, it can explore in depth the relationships between teaching quality and its many influencing factors, overcoming the limitations of traditional methods [2].
Sentiment analysis, also referred to as opinion mining, harnesses techniques from text mining and computational linguistics to identify, analyze, process, summarize, and reason about subjective text that carries emotional undertones [3]. Text sentiment analysis aims to extract valuable information regarding users’ emotional states from such subjective texts [4]. It involves extracting key terms from the text, utilizing appropriate algorithms to analyze these keywords, and subsequently determining the overall emotional tendency of the text.
With the advent of the big data era and the rapid growth of educational data, manually categorizing the sentiment of large numbers of comments has become increasingly difficult for teachers and administrators. This creates significant scope for sentiment analysis and opinion mining techniques. In education, sentiment analysis can uncover the emotional tendencies and opinions expressed in student evaluations. It goes beyond simple rating statistics by using deep learning and natural language processing techniques, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, to examine student feedback texts in greater depth. Academic research in this area has reported promising results: sentiment analysis techniques have proven to be valuable tools for institutions to tackle learning issues effectively, and they help improve the overall quality of higher education by evaluating the teaching–learning process and faculty performance. This capability not only aids in addressing specific educational challenges but also fosters continuous improvement within educational institutions [5].
Association rule mining, a pivotal aspect of data mining, falls under the domain of unsupervised learning, enabling the extraction of valuable insights from data without explicit guidance. One of the most renowned applications of association rule mining is “shopping basket analysis”, wherein consumer purchase records are scrutinized to unveil potential correlations between products. This analysis informs strategies such as product bundling or targeted recommendations, ultimately bolstering product sales and enhancing consumer satisfaction. Association rule mining is currently widely applied in the field of education. Mao et al. [6] obtained basic evaluations of the effectiveness of ideological and political education courses among college students through questionnaire surveys. They employed an improved Apriori-Gen algorithm to mine association rules from these evaluations. The refined Apriori-Gen algorithm corrected biases in the teaching process, thereby enhancing the effectiveness of ideological and political education. Liu et al. [7] developed an optimal Apriori algorithm model for assessing the effectiveness of classroom activities in higher education. They constructed an optimal evaluation function model using the support–confidence joint estimation method and optimized Apriori algorithm, enabling effective assessment and multidimensional parameter estimation of classroom activities in higher education.
Through in-depth research and the application of data mining techniques, student evaluation data can be effectively utilized to provide accurate feedback and decision support for colleges and universities, thereby fostering continuous improvement and innovation in educational quality. To address this, this study collects final evaluation text data from mathematics courses at a university in Beijing and focuses on the following key areas:
  • Detailed introduction of dictionary-based sentiment analysis methods, the BERT-LSTM model, and association rule mining.
  • Calculation of sentiment scores for unannotated text data using sentiment lexicon methods, determining sentiment tendencies through calculated scores combined with manual judgment.
  • Development of a teaching evaluation sentiment analysis model using BERT-LSTM for predicting and analyzing sentiment trends.
  • Application of the Apriori algorithm to mine association rules from objective teaching evaluation data, followed by analysis based on the mining results.
The structure of this paper is organized as follows: Section 2 discusses related work on the application of sentiment analysis and association rule mining in education. Section 3 elaborates on the approaches used for sentiment analysis and association rule mining. Section 4 presents the results of sentiment analysis utilizing the BERT-LSTM model. Section 5 showcases the findings from association rule mining using the Apriori algorithm. Finally, Section 6 provides a comprehensive summary and outlines future directions.
The overall workflow used in this paper to analyze the evaluation data is shown in Figure 1.

2. Related Work

2.1. Sentiment Analysis

Research methods for text sentiment analysis can be broadly categorized into three types: lexicon-based methods, machine learning-based methods, and deep learning-based methods [8].

2.1.1. Lexicon-Based Methods

Lexicon-based sentiment analysis is one of the simplest approaches for assessing the polarity or emotional content of text [9]. Its primary advantage is that it does not require training data, which is why some experts classify it as unsupervised [10]. The method uses a sentiment lexicon of opinion words, which are matched against the text to determine its polarity. Each opinion word in the lexicon is assigned a sentiment score denoting whether it is positive, negative, or objective [11]. However, because its performance depends on how the lexicon is constructed and selected, the method adapts poorly across domains and, used alone, struggles with complex or ambiguous expressions. Consequently, in practice it is commonly used as a complementary technique alongside other methods [12].

2.1.2. Machine Learning Methods

Machine learning approaches employ algorithms to learn and extract features from the text to determine its emotional polarity. Commonly utilized methods include Naive Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR), among others.
Ortigosa, Martin, and Carro [13] proposed alternative sentiment analysis methods tailored for e-learning environments, combining a Spanish lexicon with machine learning techniques. Their experiments showed that combining lexicon-based techniques with SVMs achieved the highest accuracy (83.27%). Dsouza et al. [14] used machine learning algorithms, namely Random Forests, Multinomial Naive Bayes classifiers, and SVMs, to perform sentiment analysis on student feedback. They found that the Multinomial Naive Bayes classifier outperformed the other methods, with an accuracy of 80%.
However, the performance of machine learning methods in sentiment categorization is constrained by complex feature design and limited adaptability across different domains. Thus, the focus is shifting towards discovering effective feature combinations for future advancements [12].
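For readers unfamiliar with these classical baselines, the sketch below shows a minimal multinomial Naive Bayes classifier with Laplace smoothing in plain Python. It is an illustrative stand-in, not the implementation used in [14]; the toy tokens, the labels, and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from tokenized docs."""
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> token frequencies
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik

def predict_nb(model, doc):
    """Return the class with the highest posterior log score; unseen tokens are ignored."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)

# Toy, hypothetical training data (already tokenized).
train_docs = [["teacher", "patient", "good"], ["boring", "unclear"],
              ["good", "clear"], ["unclear", "fast"]]
train_labels = ["pos", "neg", "pos", "neg"]
model = train_multinomial_nb(train_docs, train_labels)
print(predict_nb(model, ["good", "patient"]))  # pos
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a single unseen word in one class from zeroing out that class.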

2.1.3. Deep Learning Methods

Deep learning can establish neural networks that simulate the human brain for analysis and learning, capable of capturing more complex semantic and contextual information, thus finding widespread application in sentiment analysis. Deep learning-based sentiment analysis methods integrate feature extraction and text tendency judgment, eliminating the need for manual feature extraction and achieving higher accuracy, albeit at the cost of longer model training times.
Kandhro et al. [15] used an LSTM network to predict and analyze the sentiment of English-language student evaluations, confirming that deep learning methods can achieve better classification accuracy than traditional machine learning methods. Zhang et al. [16] introduced a deep learning-based sentiment analysis method for traditional teaching evaluation and compared a traditional SVM model with deep learning models; they found that CNN, simple recurrent neural network (RNN), and LSTM models all achieved significantly higher accuracy than the traditional classification methods. Cabada et al. [17] proposed several deep learning architectures for educational sentiment analysis that combine CNNs with LSTM, achieving a classification accuracy of 84.32%.

2.2. Association Rule Mining

In terms of association rule mining, Zhang et al. [18] mined the associations between teachers’ teaching effectiveness and teaching assessment results, finding that assessment results were also associated with the characteristics and academic foundation of the class itself. Li [19] analyzed the associations among the evaluation subitems of objective evaluation data to obtain strong rules, and on this basis proposed suggestions that teachers can refer to when improving teaching quality. Xu et al. [1] used the Apriori algorithm to analyze students’ grades in each subject recorded in the teaching management system and identified correlations between courses, providing a reference for teaching administrators when designing teaching plans.

3. Method

3.1. Sentiment Analysis Based on Sentiment Lexicon

3.1.1. Text Preprocessing

Data cleaning. The text of teaching evaluations often contains arbitrary and subjective elements, including non-standardized terms such as homophones, misspelled words, and comments expressed in pinyin. Moreover, the corpus may include redundant and irrelevant comments. If left unprocessed, these noisy data can lead to errors in corpus segmentation and lexical labeling, thereby affecting the accuracy of analysis results. Therefore, prior to segmentation, it is essential to perform denoising on the original data [20]. This involves eliminating worthless and redundant evaluations. Since the data originate from multiple datasets with overlapping entries, we conduct data integration and remove duplicate values to ensure data cleanliness.
Text segmentation. In Chinese text, words are inherently continuous and must be segmented into separate units for analysis. Utilizing tools or algorithms for segmentation is essential. This process enhances the accuracy of sentiment analysis, reduces the dimensionality of the feature space, and extracts key information from the text, enabling a better understanding and analysis of its sentiment tendencies. In this study, we employ the jieba segmentation package (version 0.42.1) to segment the subjective assessment text data.
Removing stopwords. After segmenting the subjective assessment text data, the next step is to remove common words that carry no clear sentiment tendency, known as stopwords. These typically include function words such as prepositions, pronouns, and conjunctions, which contribute little to sentiment analysis. Removing them improves both the accuracy and the efficiency of the analysis. In this study, we use the Harbin Institute of Technology stopword list to filter stopwords from the segmented data.
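The three preprocessing steps can be sketched as follows. This is a simplified illustration under stated assumptions: the study uses jieba and the Harbin Institute of Technology stopword list, whereas here a greedy forward-maximum-matching segmenter and a tiny hypothetical dictionary (`DICT`) and stopword list (`STOP`) stand in for them. jieba itself builds a word graph from a prefix dictionary, selects the most probable path by dynamic programming, and uses an HMM for unknown words, so its output can differ. Degree words such as 很 (very) are deliberately kept out of `STOP`, since the scoring in Section 3.1.2 depends on them.

```python
def deduplicate(comments):
    """Drop empty and exact-duplicate comments while keeping first-seen order."""
    seen, kept = set(), []
    for c in comments:
        c = c.strip()
        if c and c not in seen:
            seen.add(c)
            kept.append(c)
    return kept

def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position,
    falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def remove_stopwords(tokens, stopwords):
    return [t for t in tokens if t not in stopwords]

DICT = {"老师", "非常", "负责", "讲课", "清楚"}  # hypothetical toy dictionary
STOP = {"的", "了", "吧"}                      # hypothetical toy stopword list

raw = ["老师非常负责", "老师非常负责", "  ", "讲课清楚了"]
cleaned = deduplicate(raw)
segmented = [remove_stopwords(fmm_segment(c, DICT), STOP) for c in cleaned]
print(segmented)  # [['老师', '非常', '负责'], ['讲课', '清楚']]
```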

3.1.2. Calculating Sentiment Scores Using a Sentiment Lexicon

To compute sentiment scores, we use the HowNet sentiment lexicon, one of the most widely used and authoritative sentiment lexicons in China [20]. It categorizes words into degree-level words, positive and negative sentiment words, and positive and negative evaluation words. Before calculating sentiment scores, we set the score constants for degree-level words, as listed in Table 1, and for positive and negative words, as listed in Table 2.
Degree-level words are classified into six levels, each with a distinct score. Positive sentiment words and positive evaluation words are assigned a score of 1, while negative sentiment words and negative evaluation words are assigned a score of −1. While traversing the segmented words of each evaluation, we check whether each word appears in the positive or negative lexicon.
If the word is found in the positive lexicon, the degree level of the preceding word determines its positive score. For instance, if the preceding word is “most”, the highest degree constant of 8 is multiplied by the positive score constant of 1, giving 8. If the preceding word is not a degree-level word, the positive score constant of 1 is applied directly.
Similarly, if the word is located in the negative lexicon, the score is determined based on the lexical properties of the preceding word, and the corresponding negative score is assigned. Words not found in either the positive or negative lexicon are treated as non-attitudinal words and are assigned a constant score of 0.
The positive and negative scores from each evaluation are summed to derive the final sentiment score for that evaluation. For instance, the content and score of a single positive evaluation in the subjective assessment are illustrated in Table 3 below.
In this evaluation, the degree-level words are “very”, “very much”, and “most”, while the positive evaluation words are “responsible”, “important”, and “would”. Specifically, “very” corresponds to 6 points, “very much” and “most” each correspond to 8 points, and each of the three positive evaluation words corresponds to 1 point. The combinations of degree-level words and positive evaluation words are “very responsible”, “very much important”, “most responsible”, and “would”. Therefore, “very responsible” scores 6 × 1 = 6, “very much important” and “most responsible” each score 8 × 1 = 8, and “would”, which has no degree-level word in front of it, scores 1 point. Applying the method described above gives 6 × 1 + 8 × 1 + 8 × 1 + 1 = 23, so this evaluation has a sentiment score of 23.
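The scoring rule can be sketched in a few lines of Python. This is an illustrative reconstruction under assumptions: only the degree weights stated in the text are included (the other levels of Table 1 are omitted), the word lists are toy lexicons built from the English rendering of the Table 3 example, `NEGATIVE` holds a hypothetical entry, and the filler tokens in the example are hypothetical. On the Table 3 tokens it reproduces the score of 23.

```python
# Degree weights stated in the text; the remaining levels of Table 1 are omitted.
DEGREE = {"most": 8, "very much": 8, "very": 6}
POSITIVE = {"responsible", "important", "would"}  # toy lexicon from the Table 3 example
NEGATIVE = {"boring"}                             # hypothetical entry

def sentiment_score(tokens):
    """Sum word polarities, each multiplied by the weight of a preceding degree word."""
    score = 0
    for i, word in enumerate(tokens):
        if word in POSITIVE:
            polarity = 1
        elif word in NEGATIVE:
            polarity = -1
        else:
            continue  # non-attitudinal words contribute 0
        prev = tokens[i - 1] if i > 0 else None
        score += DEGREE.get(prev, 1) * polarity  # weight 1 if no degree word precedes
    return score

# Segmented tokens for the Table 3 example; "teacher", "lecture", "teach" are hypothetical fillers.
tokens = ["teacher", "very", "responsible", "lecture", "very much", "important",
          "most", "responsible", "would", "teach"]
print(sentiment_score(tokens))  # 23
```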

3.2. Sentiment Analysis Based on BERT-LSTM Model

3.2.1. BERT

The BERT pre-trained model uses a bidirectional Transformer encoder to analyze the relationship between each word and every other word in a sentence. In this way, it captures the contextual dependencies within the text and extracts rich language features, which are then used by the downstream layers of the model [21]. BERT has two pre-training tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) [22].
In Masked Language Modeling, some tokens of the input sentence (15% in the original BERT) are selected at random, and the model is trained to predict them from the remaining context. Of the selected tokens, 80% are replaced with the special token [MASK], 10% with a random word, and the remaining 10% are left unchanged [12]. Because the [MASK] token never appears during fine-tuning on downstream tasks, this mixed strategy reduces the mismatch between pre-training and fine-tuning. A further benefit is that, when predicting a word, the model cannot tell whether the input word at that position is correct, which forces it to rely more on contextual information and thereby strengthens its error-correction capability.
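A simplified sketch of the 80/10/10 masking scheme is given below. It assumes independent per-token selection with probability 0.15 and character-level tokens; actual BERT implementations operate on WordPiece tokens and may select a fixed fraction per sequence, so this is illustrative only. The function name `mask_tokens` is hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Simplified BERT-style masking.

    Each position is selected with probability `mask_prob`. A selected token is
    replaced by [MASK] with probability 0.8, by a random vocabulary token with
    probability 0.1, and left unchanged with probability 0.1. Returns the corrupted
    inputs and labels (the original token at selected positions, None elsewhere).
    """
    rng = rng or random.Random()
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: no prediction target at this position
        labels[i] = tok  # the model is trained to recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # else: keep the original token unchanged
    return inputs, labels

sentence = list("老师讲课非常清楚")  # character-level tokens, as commonly used for Chinese BERT
inputs, labels = mask_tokens(sentence, vocab=list("好坏快慢"), rng=random.Random(0))
print(inputs, labels)
```

The loss is computed only at positions whose label is not `None`, which is why unselected tokens carry no target.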
Next Sentence Prediction involves providing two sentences from a document and requires the model to predict whether the second sentence follows the first one in the text. This task enhances the model’s understanding of context, enabling it to better grasp the logical relationships between sentences.
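In the original BERT pre-training, the second sentence is the true successor 50% of the time and a randomly chosen sentence otherwise. A minimal sketch of building such training pairs from one list of sentences is shown below; `make_nsp_pairs` is a hypothetical helper, and drawing negatives from the same list (rather than from other documents) is a simplification.

```python
import random

def make_nsp_pairs(sentences, rng=None):
    """Build (sentence_a, sentence_b, is_next) pairs for Next Sentence Prediction.

    With probability 0.5, sentence_b is the true next sentence (is_next = 1);
    otherwise it is drawn at random from the sentences other than the true
    successor (is_next = 0).
    """
    rng = rng or random.Random()
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], 1))
        else:
            # Exclude the true successor so that negatives are genuinely "not next".
            j = rng.choice([k for k in range(len(sentences)) if k != i + 1])
            pairs.append((sentences[i], sentences[j], 0))
    return pairs

doc = ["The teacher explains clearly.", "Examples are well chosen.",
       "Homework is reasonable.", "Office hours help a lot."]
for a, b, is_next in make_nsp_pairs(doc, rng=random.Random(3)):
    print(is_next, "|", a, "->", b)
```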

3.2.2. LSTM

LSTM, a type of recurrent neural network (RNN), is well suited to processing and predicting important events in time series with long intervals and delays. It effectively addresses the limitations of traditional RNNs, such as vanishing gradients over long sequences, making it one of the most popular RNN architectures today [23]. One of LSTM’s key features is its ability to use the output state from the previous time step as input for the next time step. Through its gate control units, LSTM can capture long-range dependencies within sequences and preserve relevant information over long spans [24].
An LSTM cell, depicted in Figure 2, comprises several crucial components: a forget gate, an input gate, an output gate, a memory cell, and a hidden state. The forget gate determines whether information from the previous memory cell should be retained or discarded. It takes the previous hidden state $h_{t-1}$ and the current input $x_t$ and, through a sigmoid function, produces an output vector $f_t$ with entries between 0 and 1. A value of 1 indicates complete retention, while 0 indicates complete discarding. The specific formula is given in Equation (1), where $\sigma$ denotes the sigmoid function, $W_f$ is the forget gate weight matrix, and $b_f$ is the forget gate bias.
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$
The input gate determines which new information should be incorporated into the memory cell. It comprises a sigmoid layer and a tanh layer. The sigmoid layer decides which values of the candidate state will be written into the memory cell, as shown in Equation (2), while the tanh layer generates the candidate state itself, as shown in Equation (3).
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \tag{2}$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right). \tag{3}$$
After the forget gate and the input gate have been computed, the memory cell is updated. The previous cell state $C_{t-1}$ is multiplied element-wise by $f_t$, which determines how much of it is forgotten, and the candidate state $\tilde{C}_t$, filtered by $i_t$, is then added to form the new memory cell $C_t$. The calculation is given in Equation (4).
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t. \tag{4}$$
The output gate controls which feature information the current memory cell should output. Initially, it decides the information to be passed through via a sigmoid layer, as depicted in Equation (5). Subsequently, it processes the memory cell using the tanh layer and multiplies this result by the output from the sigmoid layer, yielding the final output information. This process is detailed in Equation (6).
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \tag{5}$$
$$h_t = o_t \odot \tanh(C_t). \tag{6}$$
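To make Equations (1) to (6) concrete, the sketch below implements a single LSTM time step with plain Python lists, each gate's weight matrix acting on the concatenation $[h_{t-1}, x_t]$. It is a didactic illustration, not the TensorFlow layer used in the experiments; the parameter layout (`params` mapping gate names to `(W, b)`) is an assumption made for clarity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Equations (1)-(6).

    params maps each gate name ("f", "i", "C", "o") to (W, b); every W has
    hidden_size rows and hidden_size + input_size columns, acting on [h_{t-1}, x_t].
    """
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a + bias) for a, bias in zip(matvec(W, z), b)]

    f_t = gate("f", sigmoid)        # forget gate, Eq. (1)
    i_t = gate("i", sigmoid)        # input gate, Eq. (2)
    c_tilde = gate("C", math.tanh)  # candidate state, Eq. (3)
    c_t = [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]  # Eq. (4)
    o_t = gate("o", sigmoid)        # output gate, Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (6)
    return h_t, c_t

# Toy example: hidden size 2, input size 1, all weights and biases zero.
hidden, inp = 2, 1
params = {g: ([[0.0] * (hidden + inp) for _ in range(hidden)], [0.0] * hidden)
          for g in ("f", "i", "C", "o")}
h, c = lstm_step([0.3], [0.0, 0.0], [1.0, -1.0], params)
print(h, c)
```

With all-zero parameters every sigmoid gate outputs 0.5 and the candidate state is 0, so the cell state is simply halved, which makes the role of the forget gate in Equation (4) easy to see.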

3.2.3. BERT-LSTM

When constructing the model, we place the LSTM layer after the BERT layer to combine BERT’s contextual understanding with the sequence modeling capability of LSTM. This arrangement allows the model to learn multi-level features from the text and to extract the implicit features that shape the emotional tendencies of students’ comments. The complete model comprises an input layer, a BERT layer, an LSTM layer, a fully connected layer, and an output layer, as illustrated in Figure 3.
The input layer receives the preprocessed samples and passes them to the BERT layer, where the pre-trained BERT model converts each input text sequence into contextual token embeddings that are then passed to the LSTM layer. Using its recurrent memory, the LSTM network extracts relationships between words and their context, capturing sequential information in the text by learning both long-term and short-term dependencies. The fully connected layer then combines the extracted features and maps them to a higher-dimensional feature space, making them better suited to classification or regression. Finally, the output layer produces the prediction, using the softmax function to convert the multiclass output values into a probability distribution whose entries lie between 0 and 1 and sum to 1 [25]. This distribution gives the probability of each category.
With this deep network model structure, the BERT-LSTM model enables us to conduct sentiment analysis of students’ evaluations effectively. By leveraging this model, vital implicit features can be extracted from the evaluations, thereby enhancing the ability to analyze and predict the sentiment tendencies therein.
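The final two layers can be illustrated with a small sketch: a fully connected layer maps a feature vector (for example, the last LSTM hidden state) to one logit per class, and softmax turns the logits into probabilities. The weights below are hypothetical toy values, not the trained model, and the label coding follows Section 4.1.1 (0 neutral, 1 positive, 2 negative).

```python
import math

LABELS = {0: "neutral", 1: "positive", 2: "negative"}  # label coding of Section 4.1.1

def dense(h, W, b):
    """Fully connected layer: one logit per class."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]

def softmax(logits):
    """Convert logits to probabilities in [0, 1] that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h, W, b):
    probs = softmax(dense(h, W, b))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

# Hypothetical 2-dimensional feature vector and toy weights (3 classes x 2 features).
h = [0.2, 0.9]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
label, probs = classify(h, W, b)
print(label, [round(p, 3) for p in probs])
```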

3.3. Association Rule Mining Based on the Apriori Algorithm

Association rules describe co-occurrence relationships among items across a collection of transactions, making them a valuable tool for uncovering interrelationships among data items in large datasets. The technique is commonly used to discover potential associations between items and to make recommendations based on the patterns identified in the data [26].

3.3.1. Basic Concept of Association Rules

Support is the proportion of transactions that contain both X and Y, quantifying how often these items co-occur in the dataset. It estimates the probability that a transaction contains both X and Y:
$$\mathrm{Support}(X \Rightarrow Y) = P(X, Y).$$
The minimum support is a user-defined threshold that the support of an association rule must reach.
Confidence is the ratio of the number of transactions containing both X and Y to the number of transactions containing X:
$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{Support}(X, Y)}{\mathrm{Support}(X)},$$
$$\mathrm{Confidence}(X \Rightarrow Y) = P(Y \mid X).$$
The minimum confidence signifies the smallest confidence threshold set by the user that an association rule must exceed, indicating the minimum level of reliability required for the association rule to be considered significant.
Lift is the ratio of the confidence of the rule, that is, the probability of the consequent given the antecedent, to the unconditional probability of the consequent, as shown below:
$$\mathrm{Lift}(X \Rightarrow Y) = \frac{\mathrm{Confidence}(X \Rightarrow Y)}{P(Y)} = \frac{P(Y \mid X)}{P(Y)}.$$
When Lift = 1, the antecedent and consequent are statistically independent.
When Lift > 1, they are positively correlated: X and Y occur together more often than would be expected if they were independent.
When Lift < 1, they are negatively correlated.
Lift serves as a measure of the association strength between the antecedent and consequent terms in an association rule [27]. Typically, when Lift exceeds 1, it signifies a meaningful relationship between the antecedent and consequent terms. Consequently, filtering association rules based on a Lift threshold greater than 1 is often straightforward and effective [19].
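The three measures can be computed directly from a list of transactions. In the sketch below, each transaction is the set of evaluation items a student rated highly; the item names and data are hypothetical, and `fractions.Fraction` is used so that the results are exact.

```python
from fractions import Fraction

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = frozenset(itemset)
    return Fraction(sum(itemset <= t for t in transactions), len(transactions))

def confidence(X, Y, transactions):
    """Support(X with Y) / Support(X), an estimate of P(Y | X)."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

def lift(X, Y, transactions):
    """Confidence(X => Y) / P(Y); values above 1 indicate positive correlation."""
    return confidence(X, Y, transactions) / support(Y, transactions)

# Hypothetical transactions: items a student rated "high" on a survey.
transactions = [
    {"clear", "interactive", "satisfied"},
    {"clear", "satisfied"},
    {"clear", "interactive", "satisfied"},
    {"interactive"},
    {"satisfied"},
]
X, Y = {"clear"}, {"satisfied"}
print(support(X | Y, transactions), confidence(X, Y, transactions), lift(X, Y, transactions))
# prints: 3/5 1 5/4
```

Here the rule {clear} ⇒ {satisfied} has confidence 1 but lift only 5/4, because "satisfied" is already common on its own (support 4/5); this is why lift is used alongside confidence when filtering rules.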
Frequent itemsets are non-empty itemsets whose support is greater than or equal to the minimum support threshold. A frequent itemset containing k items is called a frequent k-itemset.
An association rule that meets both the minimum support and minimum confidence criteria is referred to as a strong association rule [19].

3.3.2. Apriori Algorithm

Association rule mining seeks, within a transaction set, the rules whose support and confidence are at least the minimum support and the minimum confidence, respectively. It can therefore be divided into two steps: the first discovers frequent itemsets, and the second generates association rules from them.
The Apriori algorithm stands as one of the most classic association rule algorithms. Leveraging the downward closure property of frequent itemsets, it systematically identifies all rules meeting the minimum support and minimum confidence criteria. By iteratively generating and pruning candidate itemsets, it sifts through potential associations, ultimately retaining those that satisfy the screening criteria [27]. This algorithm addresses limitations present in models like discrete choice, which lack the ability to quantitatively compare and analyze result variations across groups. Its versatility and scalability make it widely applicable [28].
The Apriori algorithm relies on the principle that every subset of a frequent itemset must also be frequent, and every superset of an infrequent itemset must also be infrequent. It uses an iterative, layer-by-layer search that proceeds from k-itemsets to (k + 1)-itemsets. It first finds the set of frequent 1-itemsets, denoted L1. L1 is then used to find the set L2 of frequent 2-itemsets, L2 is used to find L3, and so forth, until no further frequent k-itemsets can be found. Because any candidate with an infrequent subset is pruned before its support is counted, this layer-by-layer search substantially reduces the search space.
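A compact sketch of the layer-by-layer search, followed by rule generation with the confidence and lift measures defined in Section 3.3.1, is shown below. It is a plain-Python illustration under assumptions (hypothetical item names, support computed as a relative frequency), not the implementation used in this study. The join step simply unions pairs of frequent (k − 1)-itemsets; this is less efficient than the classical prefix-based join but yields the same candidates after pruning.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    items = {i for t in transactions for i in t}
    level = frequent({frozenset([i]) for i in items})  # L1
    result, k = dict(level), 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset (downward closure).
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = frequent(candidates)  # Lk
        result.update(level)
        k += 1
    return result

def rules(freq, min_confidence):
    """Generate X => Y with confidence >= min_confidence; returns (X, Y, support, confidence, lift)."""
    out = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                conf = sup / freq[lhs]  # every subset of a frequent itemset is in freq
                if conf >= min_confidence:
                    out.append((set(lhs), set(rhs), sup, conf, conf / freq[rhs]))
    return out

# Hypothetical transactions: items a student rated "high".
T = [{"clear", "interactive", "satisfied"}, {"clear", "satisfied"},
     {"clear", "interactive", "satisfied"}, {"interactive"}, {"satisfied"}]
freq = apriori(T, min_support=0.4)
for lhs, rhs, sup, conf, lft in rules(freq, min_confidence=0.9):
    print(sorted(lhs), "=>", sorted(rhs), round(sup, 2), round(conf, 2), round(lft, 2))
```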
The execution flow of the Apriori algorithm is depicted in Figure 4.

4. Subjective Evaluation of Teaching Analysis

4.1. Experimental Data Preparation

4.1.1. Preprocessing of Subjective Evaluation Data

After preprocessing the text data, the essential vocabulary can be extracted from the subjective assessment data. By conducting a simple word frequency count, the top 20 words with the highest occurrences are identified, as depicted in Figure 5.
Subsequently, a word cloud is generated based on these extracted words and their respective frequencies. The word cloud visually presents an overview of the evaluation by depicting words in varying sizes and colors based on their frequency and importance, as illustrated in Figure 6.
It is evident that words with high frequencies are predominantly positive. Words such as “good”, “serious”, “conscientious and responsible”, “patience”, and “humor” have notably high occurrence frequencies. This suggests that the majority of teachers are diligent and responsible in their teaching, and their lessons are well received by students.
After completing the preprocessing of the teaching evaluation texts, a dataset comprising 13,026 subjective teaching evaluations was obtained. Calculating sentiment scores for these data reveals 4933 positive evaluations, 7955 neutral evaluations, and 138 negative evaluations. Figure 7 illustrates the distribution of teaching evaluation data across various emotional tendencies.
Neutral evaluations constitute the largest proportion, representing 61.07% of all subjective evaluation data. This predominance may stem from many students not paying sufficient attention to teaching evaluations, particularly subjective ones, which they often fill in with non-emotional vocabulary and brief, straightforward responses. Positive evaluations are the second largest group, comprising 37.87% of all subjective evaluation data, whereas negative evaluations make up only 1.06%. Among evaluations expressing a clear sentiment, positive ones therefore far outnumber negative ones (4933 vs. 138), indicating that most students who provided substantive evaluations had a positive learning experience: they acquired knowledge, enhanced their skills, and were generally satisfied with the courses.
After calculating the sentiment scores for each subjective evaluation, those with a score greater than 0 are categorized as positive and labeled as 1. Evaluations with a score less than 0 are deemed negative and labeled as 2. Evaluations with a score equal to 0 are regarded as neutral and labeled as 0. Table 4 below presents the evaluations with the highest and lowest scores.
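The labeling rule and the class proportions reported above can be checked with a short script. The scores passed to `label_from_score` in the first usage line are hypothetical, while the counts are those reported in this section.

```python
from collections import Counter

def label_from_score(score):
    """Map a lexicon sentiment score to the class codes used here."""
    if score > 0:
        return 1  # positive
    if score < 0:
        return 2  # negative
    return 0      # neutral

print(Counter(label_from_score(s) for s in [23, 0, -3, 5]))  # hypothetical scores

counts = {"positive": 4933, "neutral": 7955, "negative": 138}  # counts reported above
total = sum(counts.values())
shares = {k: round(100 * v / total, 2) for k, v in counts.items()}
print(total, shares)  # 13026 {'positive': 37.87, 'neutral': 61.07, 'negative': 1.06}
```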

4.1.2. Addressing Category Imbalance

In university evaluation data, positive and neutral evaluations typically far outnumber negative ones. Applying traditional machine learning sentiment classification methods directly to such an imbalanced corpus can bias the classification results toward the majority class [29], harming both training and prediction. To address this issue, the positive and neutral evaluations are downsampled by randomly selecting 150 samples of each, and the negative evaluations are upsampled to 150 by generating new samples through adjusting the word order of existing negative evaluations, so that all three classes contain the same number of samples. Finally, 20% of the samples of each class are randomly selected as test samples, and the remainder are used as training samples.
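The balancing and splitting procedure can be sketched as follows. The study does not specify exactly how word order was adjusted, so `perturb_word_order` (which swaps one random pair of adjacent tokens) is a hypothetical augmentation; the target of 150 samples per class and the 20% test fraction follow the text, and the token contents are placeholders.

```python
import random

def perturb_word_order(tokens, rng):
    """Hypothetical augmentation: swap one random pair of adjacent tokens."""
    tokens = list(tokens)
    if len(tokens) > 1:
        i = rng.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens

def balance_and_split(by_class, target=150, test_frac=0.2, seed=42):
    """Down/upsample every class to `target` samples, then hold out `test_frac` per class."""
    rng = random.Random(seed)
    n_test = round(test_frac * target)
    train, test = [], []
    for label, samples in by_class.items():
        if len(samples) >= target:
            chosen = rng.sample(samples, target)  # random downsampling
        else:
            chosen = list(samples)
            while len(chosen) < target:  # upsampling by augmenting originals
                chosen.append(perturb_word_order(rng.choice(samples), rng))
        rng.shuffle(chosen)
        test += [(s, label) for s in chosen[:n_test]]
        train += [(s, label) for s in chosen[n_test:]]
    return train, test

# Placeholder tokenized data with the class sizes reported in Section 4.1.1.
sizes = {0: 7955, 1: 4933, 2: 138}
by_class = {lab: [[f"w{lab}_{k}", "tok", "end"] for k in range(n)] for lab, n in sizes.items()}
train, test = balance_and_split(by_class)
print(len(train), len(test))  # 360 90
```

Splitting per class, after balancing, keeps the test set balanced too (30 samples per class here), so accuracy on it is not dominated by any single class.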

4.2. Performance Metrics

To assess the model’s performance, we use commonly adopted classification metrics, namely accuracy, precision, recall, and F1 score, to gauge its predictive effectiveness [30]. The confusion matrix for the predicted results is shown in Table 5.
Accuracy represents the ratio of correctly predicted samples to the total number of samples, serving as a comprehensive evaluation metric for the entire model. However, it may not provide an accurate assessment of model performance when there is significant class imbalance. The formula for calculating accuracy is
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
The precision rate measures the proportion of true positive samples among all instances predicted as positive by the model. It is calculated by the following formula:
$$\text{Precision} = \frac{TP}{TP + FP}.$$
Recall, also referred to as sensitivity, is the proportion of actually positive samples that the model correctly predicts as positive. It is calculated using the following formula:
$$\text{Recall} = \frac{TP}{TP + FN}.$$
The F1 score is the harmonic mean of precision and recall, weighting the two equally. It is computed as follows:
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
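The four formulas translate directly into code. A small Python sketch using the cell names of Table 5; the counts in the usage line are hypothetical and serve only to illustrate the arithmetic:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four cells of the confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0   # guard: no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0      # guard: no actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=6, fp=2, fn=3, tn=9)
print(m)
```

For these counts, F1 = 2·6 / (2·6 + 2 + 3) = 12/17 ≈ 0.706, which illustrates the equivalent form F1 = 2TP / (2TP + FP + FN).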

4.3. Experimental Environment and Parameter Settings

The experiments are implemented in Python 3.11.5 (Python Software Foundation, Beaverton, OR, USA) using the TensorFlow 2.12.0 (TensorFlow Team, Mountain View, CA, USA) deep learning framework. The hyperparameter settings of the BERT-LSTM model are listed in Table 6. The input layer of the model uses BERT-base-Chinese, an open-source Chinese pre-trained language model available through Hugging Face.
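To make the architecture concrete, the sketch below runs an LSTM head with the sizes in Table 6 (max_length 75, 128 LSTM units, 3 classes) over token vectors of the kind BERT-base-Chinese produces (hidden size 768). It is a NumPy forward pass under stated assumptions, not the authors' TensorFlow code: the BERT output is replaced by random numbers, the weights are random rather than trained, padding masks are ignored, and classifying from the LSTM's final hidden state is our assumption about how the two parts are joined. In Keras, a comparable head would be tf.keras.layers.LSTM(128) followed by tf.keras.layers.Dropout(0.2) and tf.keras.layers.Dense(3, activation="softmax").

```python
import numpy as np

# Sizes from Table 6, plus the 768-dimensional hidden size of BERT-base.
MAX_LENGTH, LSTM_UNITS, BERT_HIDDEN, NUM_CLASSES = 75, 128, 768, 3


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(rng, d_in=BERT_HIDDEN, units=LSTM_UNITS, n_classes=NUM_CLASSES):
    """Random weights for illustration only; a real model learns these in training."""
    scale = 1.0 / np.sqrt(units)
    return {
        "W": rng.uniform(-scale, scale, (d_in, 4 * units)),    # input -> gates i, f, g, o
        "U": rng.uniform(-scale, scale, (units, 4 * units)),   # hidden -> gates
        "b": np.zeros(4 * units),
        "W_out": rng.uniform(-scale, scale, (units, n_classes)),
        "b_out": np.zeros(n_classes),
    }


def lstm_head(bert_states, p):
    """bert_states: (batch, seq_len, BERT_HIDDEN) token vectors from BERT.

    Runs a single-layer LSTM over the sequence and classifies the final hidden
    state with a softmax layer. Dropout (0.2 in Table 6) is only active during
    training, so it is omitted from this inference-time pass.
    """
    batch, seq_len, _ = bert_states.shape
    units = p["U"].shape[0]
    h = np.zeros((batch, units))
    c = np.zeros((batch, units))
    for t in range(seq_len):
        z = bert_states[:, t, :] @ p["W"] + h @ p["U"] + p["b"]
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update the cell state
        h = sigmoid(o) * np.tanh(c)                     # expose the gated hidden state
    logits = h @ p["W_out"] + p["b_out"]
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
fake_bert_output = rng.normal(size=(2, MAX_LENGTH, BERT_HIDDEN))  # stand-in for BERT
probs = lstm_head(fake_bert_output, init_params(rng))
print(probs.shape, probs.argmax(axis=1))
```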

4.4. Results

After training, the prediction accuracy on the training and validation sets is plotted in Figure 8a, and the corresponding loss values are plotted in Figure 8b. Training accuracy rises sharply at first and then levels off, reaching a peak of 100%. Validation accuracy likewise grows rapidly in the early stages, peaking at 98.33% by the fourth epoch. After an initial rapid decrease, the validation loss stops falling steadily and fluctuates within a certain range. Considering both the training accuracy and the magnitude of the loss, the model parameters from the 14th epoch are selected as the optimal parameters.
On the test samples, the BERT-LSTM model predicts sentiment with an accuracy of 92.76% and a prediction time of only 43 s, achieving high accuracy while requiring far less time than manual judgment. The model’s performance is summarized in Table 7.
As illustrated in Figure 9, the only misclassifications are four positive samples (true label 1) predicted as label 2, two negative samples (true label 2) predicted as label 1, and three neutral samples (true label 0) predicted as label 1. The recall is 90.00% for positive samples, 95.00% for negative samples, and 92.50% for neutral samples.
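The per-class figures can be recovered from the confusion matrix by treating each label in turn as the positive class (one-vs-rest). The matrix below is reconstructed from the description of Figure 9, under the assumption, inferred from the quoted recall rates, that the test set contains 40 samples per class; the helper name per_class_metrics is illustrative.

```python
def per_class_metrics(cm):
    """One-vs-rest precision, recall and F1 for every label, plus overall accuracy.

    cm[i][j] is the number of test samples with true label i predicted as label j.
    """
    n = len(cm)
    results = {}
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, actually another label
        fn = sum(cm[k]) - tp                          # actually k, predicted another label
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        results[k] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(cm[k][k] for k in range(n)) / sum(map(sum, cm))
    return results, accuracy


# Labels: 0 = neutral, 1 = positive, 2 = negative (rows: true, columns: predicted).
# Counts reconstructed from the description of Figure 9, assuming 40 samples per class.
CM = [
    [37, 3, 0],
    [0, 36, 4],
    [0, 2, 38],
]
metrics, accuracy = per_class_metrics(CM)
macro_precision = sum(m["precision"] for m in metrics.values()) / len(metrics)
for label, m in metrics.items():
    print(label, {name: f"{value:.2%}" for name, value in m.items()})
print(f"accuracy: {accuracy:.2%}, macro-averaged precision: {macro_precision:.2%}")
```

With these counts, the computed precision values (100.00%, 87.80%, and 90.48% for labels 0, 1, and 2) match Table 7, and the recall values (92.50%, 90.00%, and 95.00%) match the text. The same counts give F1 scores of 96.10%, 88.89%, and 92.68%, an overall accuracy of 111/120 = 92.50%, and a macro-averaged precision of 92.76%.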

5. Analysis of Objective Teaching Evaluations

5.1. Objective Evaluation Data Preprocessing

5.1.1. Data Cleaning

The objective evaluation data comprise multiple datasets totaling 146,760 records. Because the individual datasets may overlap and contain missing values, the following processing steps are applied after integration:
Identification and removal of duplicate data: A total of 33,076 duplicate entries are detected. Only the initial entry is retained, while subsequent duplicates are deleted.
Elimination of entries with missing values: Missing values are relatively infrequent given the size of the dataset, so entries containing them are removed directly, with minimal impact on the analytical outcomes.
After data cleaning, 130,205 entries remain, representing 88.72% of the original dataset.
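The two cleaning steps map onto standard pandas operations. A minimal sketch, assuming the source datasets are loaded as DataFrames with identical columns and that duplicates are judged on all columns; the column names and toy frames are hypothetical:

```python
import pandas as pd


def clean_objective_data(frames):
    """Merge the evaluation tables, drop repeated rows (keeping the first
    occurrence), then drop rows that still contain a missing value."""
    raw = pd.concat(frames, ignore_index=True)
    deduplicated = raw.drop_duplicates(keep="first")
    cleaned = deduplicated.dropna().reset_index(drop=True)
    report = {
        "raw": len(raw),
        "duplicates_removed": len(raw) - len(deduplicated),
        "missing_removed": len(deduplicated) - len(cleaned),
        "retained_share": len(cleaned) / len(raw),
    }
    return cleaned, report


# Two hypothetical source tables with repeated rows and one missing score.
part_a = pd.DataFrame({"Information": [5, 4.5, 5], "Emphasis": [5, 4, 5]})
part_b = pd.DataFrame({"Information": [5, None], "Emphasis": [5, 3.5]})
cleaned, report = clean_objective_data([part_a, part_b])
print(report)
```

On the real data, the retained_share entry would correspond to the retention rate reported above.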

5.1.2. Data Conversion

The Apriori algorithm used for association rule mining operates on Boolean (transactional) data, in which each item is either present or absent in a record. The numerical objective assessment data must therefore be converted into a Boolean format.
The objective assessment of teaching comprises 20 questions, each referred to by a keyword for brevity, as listed in Table 8.
In the objective assessment data, the score for each item lies in the range [3, 5], and each score is mapped to a grade level. Taking question 1 (Information) as an example, Table 9 shows that scores of 5, 4.5, 4, 3.5, and 3 become Information1 through Information5, respectively. The remaining 19 objective assessment questions are converted into categorical data in the same way.
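A sketch of the conversion, assuming each response is stored as a mapping from item keyword (Table 8) to score; the score-to-level table follows Table 9, and the final one-hot step produces the Boolean matrix that Apriori expects. The function names and sample responses are illustrative.

```python
# Score-to-level mapping from Table 9: 5 -> 1, 4.5 -> 2, ..., 3 -> 5.
SCORE_TO_LEVEL = {5.0: 1, 4.5: 2, 4.0: 3, 3.5: 4, 3.0: 5}


def to_items(response):
    """Turn {keyword: score} into a set of items such as 'Information1'."""
    return {f"{keyword}{SCORE_TO_LEVEL[float(score)]}"
            for keyword, score in response.items()}


def one_hot(transactions):
    """Boolean matrix with one column per item: True if the record contains it."""
    items = sorted(set().union(*transactions))
    return items, [[item in t for item in items] for t in transactions]


# Two hypothetical responses covering three of the 20 items.
responses = [
    {"Information": 5, "Case": 4.5, "Explain": 5},
    {"Information": 4, "Case": 5, "Explain": 5},
]
transactions = [to_items(r) for r in responses]
items, matrix = one_hot(transactions)
print(items)
print(matrix)
```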

5.2. Association Rule Mining for Objective Assessment Data

5.2.1. Association Rule Mining Results

With the minimum support set to 0.8, the minimum confidence set to 0.8, and only rules with a lift greater than 1 retained, a total of 8643 association rules are obtained. Of these, 22 rules judged to have practical significance are selected for further analysis and are listed in Table 10.
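For reference, support(X → Y) is the fraction of records containing every item in X ∪ Y, confidence(X → Y) = support(X ∪ Y) / support(X), and lift(X → Y) = confidence(X → Y) / support(Y), so a lift above 1 means that X and Y co-occur more often than they would if they were independent. The sketch below is a compact textbook Apriori (level-wise candidate generation with subset pruning) followed by rule extraction; it is not the authors' implementation, and the toy transactions and lower thresholds are hypothetical, chosen only so that the small example yields a few rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    frequent = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # Count step: keep size-k candidates that meet the minimum support.
        supports = {c: support(c) for c in candidates}
        level = {c: s for c, s in supports.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets ...
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent, min_confidence, min_lift=1.0):
    """Yield (antecedent, consequent, support, confidence, lift) for each rule."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = sup / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence and lift > min_lift:
                    yield antecedent, consequent, sup, confidence, lift


# Hypothetical transactions using item names from Table 10.
toy = [
    {"Preparation1", "Orderliness1", "Explain1"},
    {"Preparation1", "Orderliness1", "Explain1"},
    {"Preparation1", "Orderliness1"},
    {"Preparation1", "Explain1"},
    {"Case1", "Explain1"},
]
frequent = apriori(toy, min_support=0.4)
for ante, cons, sup, conf, lift in generate_rules(frequent, min_confidence=0.6):
    print(sorted(ante), "->", sorted(cons),
          f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Because every subset of a frequent itemset is itself frequent, the supports of both sides of each rule are already available in frequent, so no further pass over the data is needed to compute confidence and lift.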

5.2.2. Association Rule Analysis

Because a large number of association rules were obtained, only a subset is analyzed here.
According to Rules 3, 6, and 15, thorough lesson preparation and the use of practical examples in explanations are likely important factors in organized teaching, inspiring student reflection, helping students grasp the main points of the course, and making good use of class time. Teachers’ professionalism and practical skills in lesson planning and delivery are closely associated with teaching effectiveness and students’ learning experience.
According to the analysis of Rule 7, teaching content of moderate difficulty can prevent students from feeling overly challenged or bored, while also stimulating their interest and deepening their thinking. This balance helps teachers impart knowledge more effectively, enabling students to follow and understand the teaching content more easily, thereby enhancing teaching effectiveness and learning experience.
According to Rule 14, timely grading of assignments, thorough responses to questions, and clear explanations by teachers not only help students better grasp course material but also promote their cognitive development and deepen their learning.
Rule 18 indicates that teachers who inspire students to think also tend to make good use of class time, explain clearly, and help students grasp the main points. Such teachers not only impart knowledge but also stimulate students’ critical thinking, and through efficient use of class time and clear explanations they help students better understand the course material and master key concepts.
According to Rule 19, when teachers conduct classes in an organized manner, their teaching is usually more structured and logically coherent, which helps students utilize classroom time more effectively. Explaining theoretical concepts through practical examples makes abstract content more concrete and applicable, thereby making it easier for students to understand and master course materials.

6. Conclusions and Future Work

This study employs sentiment analysis and association rule mining to analyze the teaching evaluations of mathematics courses at a university in Beijing, presenting an effective method for mining teaching evaluation data. First, a BERT-LSTM model is constructed to predict the sentiment orientation of subjective evaluations; it achieves an accuracy of 92.76% in experiments, confirming its efficacy. Next, the Apriori algorithm is applied to mine association rules from objective evaluations, and 22 meaningful rules are selected from the many generated for in-depth analysis. These rules help reveal how specific teaching behaviors, at given score levels, are associated with students’ learning outcomes, as well as potential interrelationships among those behaviors.
This research underscores the practical utility of text mining and association rule mining methods in extracting actionable insights from teaching evaluation data, thereby providing technical support for educational reform decisions. Regrettably, due to privacy concerns, information regarding evaluated teachers’ gender, age, and title could not be obtained. Access to such information might uncover additional valuable association rules.
Furthermore, this study identifies several issues. First, some objective evaluation items are highly similar, such as “The teaching methods used by the instructor were very effective for me” and “The instructor’s explanations made it easier for me to understand and grasp the course content”, both of which address core aspects of teaching effectiveness. Redundant indicators not only reduce the efficiency of the evaluation system but also hinder subsequent data mining and analysis. Second, many subjective evaluations provided by students are overly brief or incomplete, which impedes the analysis of teaching evaluations. Research indicates that students are more likely to participate actively in evaluations if they perceive them as beneficial for course improvement, whereas skepticism about whether evaluation outcomes are taken seriously, or concern that the process is time-consuming, may lead them to opt out [31]. Universities could therefore provide timely feedback on evaluation results and demonstrate tangible improvements in teaching based on them, showing students that their feedback leads to meaningful reform and encouraging them to provide thoughtful evaluations.
Lastly, the Apriori algorithm has limitations, particularly in efficiency on larger datasets, because it scans the data repeatedly and can generate very large numbers of candidate itemsets. Improving algorithm efficiency should therefore be a priority in future research, for example by exploring alternative algorithms or optimizing the existing Apriori algorithm to handle large datasets more efficiently.

Author Contributions

Conceptualization, F.Q. and Y.G.; methodology, Y.G.; software, M.W. and Y.G.; validation, Y.G.; resources, T.J.; data curation, T.J.; writing—original draft preparation, Y.G.; writing—review and editing, F.Q.; visualization, Y.G.; supervision, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in the study are openly available in openICPSR at https://www.openicpsr.org/openicpsr/workspace?goToPath=/openicpsr/203941&goToLevel=project (accessed on 26 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Xu, Q.; Li, J.; Wang, Y.; Zhang, L. Student Achievement Analysis and Visualization Based on Apriori algorithm. J. Tonghua Norm. Coll. 2023, 44, 81–87.
2. Tang, S. Design and Implementation of Data Mining Based Evaluation System for Colleges and Universities. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2010.
3. Liu, Y.; Zhao, Y. Sentiment analysis of Chinese short text based on teaching evaluation. Mod. Electron. Tech. 2019, 42, 30–33+37.
4. Wang, Y.; Zhu, J.; Wang, Z.; Bai, F.; Gong, J. Review of applications of natural language processing in text sentiment analysis. J. Comput. Appl. 2022, 42, 1011–1020.
5. Baragash, R.; Aldowah, H. Sentiment analysis in higher education: A systematic mapping review. J. Phys. Conf. Ser. 2021, 1860, 012002.
6. Mao, C.; Zou, S.; Yin, J. Educational Evaluation Based on Apriori-Gen Algorithm. Eurasia J. Math. Sci. Technol. Educ. 2017, 13, 6555–6564.
7. Liu, D.; Zhang, L. Construction of Higher Education Management and Student Achievement Evaluation Mechanism Based on Apriori Algorithm. Mob. Inf. Syst. 2022, 2022, 5375825.
8. Lei, P.; Qin, B.; Wang, L.; Wu, Y.; Liang, S.; Chen, Y. PRBDN: A Pretraining-based Emotion Classification Model for Weibo Comment. J. Chin. Inf. Process. 2022, 36, 101–108.
9. Öhman, E. The Validity of Lexicon-based Emotion Analysis in Interdisciplinary Research. In Proceedings of the Workshop on Natural Language Processing for Digital Humanities; NLP Association of India (NLPAI), NIT Silchar: Silchar, India, 2021; pp. 7–12.
10. Wankhade, M.; Rao, A.C.S.; Kulkarni, C. A survey on sentiment analysis methods, applications, and challenges. Artif. Intell. Rev. 2022, 55, 5731–5780.
11. Aung, K.Z.; Myo, N.N. Sentiment analysis of students’ comment using lexicon based approach. In Proceedings of the 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), Wuhan, China, 24–26 May 2017; pp. 149–154.
12. Cai, R.; Qin, B.; Chen, Y.; Zhang, L.; Yang, R.; Chen, S.; Wang, W. Sentiment Analysis About Investors and Consumers in Energy Market Based on BERT-BiLSTM. IEEE Access 2020, 8, 171408–171415.
13. Ortigosa, A.; Martín, J.M.; Carro, R.M. Sentiment analysis in Facebook and its application to e-learning. Comput. Human Behav. 2014, 31, 527–541.
14. Dsouza, D.D.; Deepika, D.P.N.; Machado, E.J.; Adesh, N.D. Sentimental analysis of student feedback using machine learning techniques. Int. J. Recent Technol. Eng. 2019, 8, 986–991.
15. Kandhro, I.A.; Wasi, S.; Kumar, K.; Rind, M.; Ameen, M. Sentiment analysis of students’ comment using long-short term model. Indian J. Sci. Technol. 2019, 12, 1–16.
16. Zhang, J.; Chen, F.; Zhang, P. The role and implementation of students’ sentiment analysis in curriculum teaching evaluation. Comput. Knowl. Technol. 2019, 25, 184–188.
17. Cabada, R.Z.; Estrada, M.L.B.; Bustillos, R.O. Mining of educational opinions with deep learning. J. Univ. Comput. Sci. 2018, 24, 1604–1626.
18. Zhang, Y. The Application Research of Data Mining Technique in Teaching Evaluation. Master’s Thesis, Jinan University, Guangzhou, China, 2012.
19. Li, F. Research on Mining Association Rules of Online Teaching Evaluation Data and its Application in Teaching Quality Improvement. Master’s Thesis, Beijing University of Posts and Telecommunications, Beijing, China, 2021.
20. Yan, X.; Zhang, K. Application of sentiment analysis technology in postgraduate teaching evaluation text. Comput. Era 2019, 51–54+58.
21. Pu, Q.; Huang, F.; Wang, H. Research on sentiment analysis based on BERT-LSTM Model. J. China Acad. Electron. Inf. Technol. 2023, 18, 912–920.
22. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
23. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
24. Sang, Q.; Wang, S. Sentiment analysis of student evaluation based on text mining technology. China Comput. Commun. 2022, 34, 38–41.
25. Ge, Y.; Liu, W.; Gu, Y. Stock price prediction method by fusing sentiment analysis and GAN-TrellisNet. Comput. Eng. Appl. 2024, 60, 314.
26. Tang, S. Python Machine Learning Foundation; Tsinghua University Press: Beijing, China, 2022; ISBN 978-730-261-128-8.
27. Jing, G.; Qin, H.; Jiang, F. Coal mine safety accident analysis based on Apriori algorithm. J. Saf. Environ. 2024, 6, 1–9.
28. Hu, S.; Yang, B.; Weng, J.; Zhou, W. A cause analysis of residents’ dependence on public transportation based on association rules. J. Transp. Inf. Saf. 2023, 41, 147–156.
29. Zhang, Z.; Xue, J.; Chen, G. Sentiment Analysis of Class Imbalance Data Under the Framework of Deep Learning. J. Mod. Inf. 2021, 41, 75–82.
30. Xu, X.; Tian, K. A novel financial text sentiment analysis-based approach for stock index prediction. J. Quant. Technol. Econ. 2021, 38, 124–145.
31. Hoel, A.; Dahl, T.I. Why bother? Student motivation to participate in student evaluations of teaching. Assess. Eval. Higher Educ. 2018, 44, 361–378.
Figure 1. Teaching evaluation data mining process.
Figure 2. The framework of the LSTM.
Figure 3. BERT-LSTM model for sentiment analysis of subjective evaluation of teaching.
Figure 4. Flow of execution of Apriori algorithm.
Figure 5. Statistics on the number of users of subjective assessment vocabulary.
Figure 6. Subjective evaluation keyword word cloud.
Figure 7. Proportion of words with each emotional tendency.
Figure 8. (a) Training and validation accuracy; (b) training and validation loss.
Figure 9. Confusion matrix.
Table 1. Score constants for degree-level words.
Type | Score
most | 8
very | 6
more | 4
over | 2
ish | 0.6
insufficiently | −1.5
Table 2. Score constants for positive and negative words.
Type | Score
positive | 1
negative | −1
no attitude | 0
Table 3. Example of sentiment score calculation.
Evaluation | Segments | Score
A very responsible teacher, Chaoxing and MOOC have his own time to record microclasses, as well as for the examination and non-examination students of different levels of difficulty of the knowledge points of the explanation, the class will also be uploaded to the Chaoxing so that we can review after class, on the homework of the explanation is also very important, he will also record the course for us to speak, Mr. Tan is the most responsible teacher I met in the university. | [‘very’, ‘responsible’, ‘class’, ‘time’, ‘record’, ‘micro classes’, ‘different’, ‘difficult’, ‘knowledge’, ‘explanation’, ‘learning’, ‘lesson’, ‘will’, ‘homework’, ‘explanation’, ‘very’, ‘important’, ‘university’, ‘met’, ‘the most’, ‘responsible’] | 23
Table 4. Sentiment tendency and labeling of teaching evaluations.
Index | Evaluation | Score | Tendency | Label
1 | (1) The teacher lectures seriously, carefully, can make full use of time, the image of the organization, the key knowledge of the explanation is very clear and easy to understand, so that the students on the knowledge of easy to understand, the teacher’s passion for lectures will infect us, the classroom atmosphere is very good. (2) The teacher is good at mobilizing the enthusiasm of students, the classroom atmosphere is very active. Teaching carefully and kindly. Attitude serious and responsible, extremely patient, is our heart of the teacher’s dear. (3) The teacher is very serious about lecturing, content outline, organized and strong, and especially good at giving examples, so that students theory and practice, learning is very easy, and impressive, received good results. The teacher is kind, classroom interaction with students, creating a warm classroom atmosphere. (4) The teacher is rigorous, strict requirements, can deeply understand the students’ learning and living conditions, and follow the good advice, approachable. Attention to inspire and mobilize students’ enthusiasm, the classroom atmosphere is more active. | 49 | positive | 1
2 | The instruction was clearly delineated and organized, it just got in the way of the overall level being a little easy for me, but that’s not the teacher’s problem. | 0.4 | positive | 1
3 | Some particularly simple topics can be briefly summarized and less homework can be assigned. | −12 | negative | 2
4 | Classes are slightly boring. | −0.6 | negative | 2
Table 5. Confusion matrix.
Actual \ Predicted | Positive | Negative
Positive | TP | FN
Negative | FP | TN
Table 6. Hyperparameter settings.
Hyperparameter | Value
learning rate | 0.00002
max_length | 75
batch_size | 16
LSTM units | 128
dropout rate | 0.2
optimizer | Adam
loss function | SparseCategoricalCrossentropy
epochs | 30
Table 7. Model performance summary.
Class | Label | Precision | Recall
Neutral | 0 | 100.00% | 92.50%
Positive | 1 | 87.80% | 88.89%
Negative | 2 | 90.48% | 92.68%
Accuracy (all classes): 92.76%
Table 8. Keywords used to refer to questions.
Index | Item | Keyword
1 | At the beginning of the semester, the teacher clearly told us the course requirements and assessment methods. | Information
2 | I can grasp the teacher’s main points. | Emphasis
3 | The difficulty of the teacher’s content is moderate. | Moderate
4 | The teacher can explain the knowledge points through practical cases. | Case
5 | I feel the teacher is organized in class. | Orderliness
6 | The teaching method adopted by the teacher works well for me. | Effectivity
7 | The teacher can inspire me to think. | Enlighten
8 | I think the teacher makes good use of class time. | Efficiency
9 | The teacher’s explanations made it easier for me to understand and master the course content. | Explain
10 | The teacher’s lectures are full of energy and not boring. | Interesting
11 | I have the opportunity to participate in classroom interaction. | Interaction
12 | The teacher often assigns us homework. | Homework
13 | The teacher comments on homework promptly and answers questions carefully. | Answer
14 | The teacher respects us and cares about us. | Respect
15 | The teacher is strict with students. | Strictness
16 | If I need to, I can contact the teacher to communicate. | Communication
17 | The teacher is well prepared and proficient in teaching the lessons. | Preparation
18 | Through the teacher’s guidance, I have mastered the main content of this course. | Guidance
19 | The teacher’s words and deeds give me inspiration in life. | Edification
20 | I think the teacher is one of the teachers teaching this semester. | Evaluation
Table 9. Data conversion rules.
Item | Score | Level | Conversion Result
Information | 5 | 1 | Information1
Information | 4.5 | 2 | Information2
Information | 4 | 3 | Information3
Information | 3.5 | 4 | Information4
Information | 3 | 5 | Information5
Table 10. Association rule mining results.
Index | Antecedents | Consequents | Support | Confidence | Lift
1 | Orderliness1, Preparation1, Case1 | Explain1 | 0.83 | 0.98 | 1.08
2 | Enlighten1, Preparation1 | Explain1 | 0.84 | 0.98 | 1.09
3 | Preparation1 | Enlighten1, Orderliness1 | 0.84 | 0.90 | 1.05
4 | Orderliness1, Preparation1 | Efficiency1, Explain1 | 0.82 | 0.93 | 1.09
5 | Information1, Preparation1 | Emphasis1 | 0.85 | 0.96 | 1.08
6 | Preparation1, Case1 | Orderliness1 | 0.85 | 0.97 | 1.07
7 | Moderate1 | Explain1, Emphasis1 | 0.81 | 0.95 | 1.11
8 | Respect1 | Answer1 | 0.81 | 0.96 | 1.12
9 | Homework1 | Emphasis1 | 0.85 | 0.92 | 1.02
10 | Interesting1, Preparation1, Orderliness1 | Emphasis1 | 0.80 | 0.98 | 1.10
11 | Information1, Preparation1 | Guidance1 | 0.81 | 0.91 | 1.09
12 | Homework1, Information1 | Answer1 | 0.80 | 0.92 | 1.07
13 | Information1, Preparation1, Orderliness1 | Emphasis1 | 0.84 | 0.97 | 1.09
14 | Answer1, Explain1 | Enlighten1 | 0.80 | 0.97 | 1.10
15 | Preparation1 | Efficiency1, Emphasis1, Orderliness1 | 0.83 | 0.90 | 1.06
16 | Orderliness1, Case1 | Effectivity1 | 0.85 | 0.97 | 1.07
17 | Interesting1 | Emphasis1 | 0.82 | 0.97 | 1.08
18 | Enlighten1 | Efficiency1, Explain1, Emphasis1 | 0.82 | 0.93 | 1.10
19 | Orderliness1, Case1 | Efficiency1, Explain1, Emphasis1 | 0.81 | 0.93 | 1.11
20 | Enlighten1, Case1, Preparation1 | Emphasis1 | 0.81 | 0.98 | 1.09
21 | Case1 | Interesting1, Explain1 | 0.81 | 0.89 | 1.07
22 | Answer1 | Explain1, Emphasis1 | 0.81 | 0.94 | 1.09
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Qi, F.; Gao, Y.; Wang, M.; Jiang, T.; Li, Z. Data Mining of Online Teaching Evaluation Based on Deep Learning. Mathematics 2024, 12, 2692. https://doi.org/10.3390/math12172692

Chicago/Turabian Style

Qi, Fenghua, Yuxuan Gao, Meiling Wang, Tao Jiang, and Zhenhuan Li. 2024. "Data Mining of Online Teaching Evaluation Based on Deep Learning" Mathematics 12, no. 17: 2692. https://doi.org/10.3390/math12172692
