4.3. System Design
Rule-based models and machine-learning algorithms are the two major approaches to building a cause identification or classification system. Recently, machine-learning algorithms have been applied to text analysis of accident reports. Unfortunately, these methods have several limitations in this study. First, it is difficult to locate the description of causes in reports, so the characteristics of the causes are uncertain for these methods. In most cases, the description of each cause appears in a sentence without a fixed location, and each report consists of dozens of sentences. Second, identification performance for causes with low support is poor. A fairly high number of cases (e.g., 75 to 100) is the minimum needed to obtain a valid statistical model, even when learning from shorter texts. However, several causes, such as E8, C2, and C5, occur fewer than ten times in our dataset. Third, machine-learning algorithms cannot perform effective multilabel classification on our dataset. Because each sample has one or more labels, 25 categories are too many for classifying 1775 samples. Fourth, although unsupervised machine-learning algorithms allow classification based on word distribution, they cannot meet the requirement of identifying specific causes.
Therefore, a rule-based system seemed to be a better choice because of the following advantages. First, rules impose no precondition on the scale of the dataset. We can design specific rules for each cause, even one without high support. Second, updating and iterating the rules in a rule-based system directly improves performance, and the content related to each rule can easily be determined. In contrast, statistical models improve their performance by blindly tuning parameters, which indicates only a better or a worse result without revealing the details. Third, the rule-based system enables the intervention of human knowledge and intelligence. In a specific domain, professional information matters since it goes beyond the limits of available data. While statistical models can obtain broad but shallow features, a rule-based model provides considerable insight into the incident-related context. Zhang [11] used an unsupervised approach based on grammar rules, namely, chunking, to extract common causes of construction accidents. As noted in [5], such a system is not elegant but is effective.
As mentioned above, the rule-based system is composed of cause-related keywords and rules. The rules, which can be regarded as a grammar, are established based on the keywords; therefore, the two are updated at the same time. Here, we provide the details of how we built the keyword dictionary and the rule set.
Keywords and rules are equivalent to the components of a sentence and grammar, respectively. In modern Chinese, the essential components, such as subject, verb, object, and the order of these components, are consistent in formal paperwork. Thus, three kinds of keywords were sought: the subject keyword (SK), attribute keyword (AK), and evaluation keyword (EK). The SK is the unit responsible for the cause, and it is always the subject in every sentence. Since the reports are written by multiple entities, the same subject may have several representations. For example, more than ten nouns could represent pilots in this context, such as “飞行员” (Pilots), PF (Pilot Flying), and “驾驶舱” (Cockpit), although their independent meanings are not exactly the same. Since H1 to H5 are all descriptions of pilot errors, they share the same SK list. AK is a property or action of SK, which is associated with a cause. Similarly, a single cause can lead to different behaviors in different situations, so each cause has dozens of AKs to be detected. For example, H1 corresponds to “预估” (estimate/estimation), “评判” (judge/judgement), “决断” (decide/decision) and so on. The SK and AK can only describe the behavior of a unit, which is not sufficient to make a judgement regarding it, so the EK is introduced. As assessments of the AK, EKs are adjectives and adverbs, corresponding to different parts of speech of AKs. Since they were all negative words, the EKs were only divided into adjectives and adverbs. For instance, “欠佳” (not good enough), “缺乏” (be short of), “过于” (too much), “过低” (too low) and “欠妥” (not proper) were the most frequent adjectives and adverbs among EKs.
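The keyword dictionary described above can be pictured as a small data structure. The following is a minimal sketch, not the structure actually used in this study: the class name `CauseKeywords` and the variables `PILOT_SKS`, `EKS`, and `KEYWORDS` are hypothetical, and only the keywords quoted in the text are filled in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CauseKeywords:
    """Keywords attached to one cause code (hypothetical container)."""
    sks: List[str]                                  # subject keywords (SK)
    aks: List[str] = field(default_factory=list)    # attribute keywords (AK)


# A few of the pilot-related SKs quoted in the text; the full list is longer.
PILOT_SKS = ["飞行员", "PF", "驾驶舱"]

# The most frequent EKs quoted in the text, kept as one flat list here.
# The text divides EKs into adjectives and adverbs; that split is omitted.
EKS = ["欠佳", "缺乏", "过于", "过低", "欠妥"]

# H1 to H5 all describe pilot errors, so they share the same SK list.
KEYWORDS: Dict[str, CauseKeywords] = {
    code: CauseKeywords(sks=PILOT_SKS) for code in ("H1", "H2", "H3", "H4", "H5")
}

# AKs quoted in the text for H1; AKs of the other causes are not listed there.
KEYWORDS["H1"].aks.extend(["预估", "评判", "决断"])
```

Sharing one SK list object among H1 to H5 means that adding a new representation of "pilot" updates all five causes at once, while each cause keeps its own AK list.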
Specific keywords alone are not sufficient to determine the cause, and rules are needed to guide the usage of keywords. The principle of the rules, as mentioned previously, is to determine one cause from the keywords and other information in the sentence. However, not all causes required rules for detection; this depended on the type of cause and the writing patterns. Six causes (E1, E3–E7) could be determined by only detecting keywords (SKs or AKs). Taking E1 (bird strike) as an example, the occurrence of bird-related words, such as “鸟击” (bird strike), “羽毛” (feather) and “鸟血” (bird’s blood), was enough for identification. In identifying these six causes, some cases also adopted special rules. Bird-related words (SKs) and trace-related words (AKs) with no human-related words (SKs) in the context were also needed when identifying E1. In addition, numbers in the reports recorded quantifiable parameters, such as visibility and lateral wind speed, and identification was triggered when such a value reached a threshold. Specifically, visibility below 1,000 meters triggered E3, and a lateral wind speed over 10 m/s triggered E4.
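The keyword-only and threshold triggers can be sketched as follows. This is a minimal illustration under stated assumptions rather than the system's implementation: the function names are hypothetical, the numeric values are assumed to have been extracted from the report beforehand, and the special rule that excludes human-related words for E1 is omitted for brevity. Only the bird-related words and the two thresholds come from the text.

```python
from typing import Optional, Set

# Bird-related keywords quoted in the text for E1 (bird strike).
E1_KEYWORDS = ("鸟击", "羽毛", "鸟血")

VISIBILITY_THRESHOLD_M = 1000.0   # visibility below this value triggers E3
CROSSWIND_THRESHOLD_MS = 10.0     # lateral wind speed above this value triggers E4


def detect_keyword_causes(sentence: str) -> Set[str]:
    """Identify causes that need only the presence of a keyword."""
    causes: Set[str] = set()
    if any(keyword in sentence for keyword in E1_KEYWORDS):
        causes.add("E1")
    return causes


def detect_threshold_causes(
    visibility_m: Optional[float], crosswind_ms: Optional[float]
) -> Set[str]:
    """Identify causes triggered by quantifiable parameters.

    A value of None means the parameter was not recorded in the report.
    """
    causes: Set[str] = set()
    if visibility_m is not None and visibility_m < VISIBILITY_THRESHOLD_M:
        causes.add("E3")
    if crosswind_ms is not None and crosswind_ms > CROSSWIND_THRESHOLD_MS:
        causes.add("E4")
    return causes
```

For instance, `detect_threshold_causes(800, 12)` would report both E3 and E4, whereas values exactly at the thresholds trigger neither.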
For EI-type causes, the occurrence of keyword (SK and AK) combinations could be used for identification. These causes were detected using identical words, although these words were not always in the same order. For example, “链条断裂” (chain break) and “断裂的链条” (broken chain) could both trigger the identification of EI2 by detecting “链条” (chain) and “断裂” (break).
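Order-independent detection of keyword combinations can be sketched as a co-occurrence test. The names `EI_COMBINATIONS` and `detect_combination_causes` are hypothetical, and only the EI2 pair given above is taken from the text.

```python
from typing import Dict, List, Set, Tuple

# (SK, AK) combinations per EI-type cause. Only the EI2 pair is from the text.
EI_COMBINATIONS: Dict[str, List[Tuple[str, str]]] = {
    "EI2": [("链条", "断裂")],
}


def detect_combination_causes(clause: str) -> Set[str]:
    """Return the causes whose SK and AK both occur in the clause, in any order."""
    found: Set[str] = set()
    for cause, pairs in EI_COMBINATIONS.items():
        if any(sk in clause and ak in clause for sk, ak in pairs):
            found.add(cause)
    return found
```

Because the test checks only for co-occurrence, both “链条断裂” and “断裂的链条” yield EI2, while a clause containing only one of the two keywords yields nothing.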
H-type and O-type cause-related rules were more complex because more than one AK and SK were distributed in different clauses of the same sentence. The key was pairing each AK with its corresponding SK. Tixier et al. [5] used a fixed radius to link keywords: keywords whose distance from each other was less than 7 could be related to each other. As the appropriate value can vary for different contents, this approach is not robust enough, especially in a sentence that contains multiple clauses. Instead of a fixed radius, we adopted another effective distance, which depended on other information from the sentence. The scope was initialized with punctuation. Assuming that a clause was the minimal scope, commas and periods separated clauses from each other. In addition, words like “和” (and) or “他” (he) extended the scope of the SK to the next clause.
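The punctuation-based scope can be sketched as below. This is a simplified reading of the description, not the rule set itself. It assumes that an extension word appearing in a clause without its own SK carries the previous SK into that clause, and that the first SK matched in a clause governs it. The function names and the keyword lists passed in are hypothetical.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Words quoted in the text that extend the scope of an SK to the next clause.
EXTENSION_WORDS = ("和", "他")


def split_clauses(sentence: str) -> List[str]:
    """Split a sentence into clauses at commas and Chinese periods."""
    return [clause for clause in re.split(r"[，,。]", sentence) if clause]


def pair_aks_with_sks(
    sentence: str, sks: Sequence[str], aks: Sequence[str]
) -> List[Tuple[str, str]]:
    """Pair each AK with the SK whose scope covers the clause containing it."""
    pairs: List[Tuple[str, str]] = []
    current_sk: Optional[str] = None
    for clause in split_clauses(sentence):
        sk_here = next((sk for sk in sks if sk in clause), None)
        if sk_here is not None:
            current_sk = sk_here      # a new SK opens a new scope
        elif not any(word in clause for word in EXTENSION_WORDS):
            current_sk = None         # no SK and no extension word: scope ends
        if current_sk is not None:
            pairs.extend((current_sk, ak) for ak in aks if ak in clause)
    return pairs
```

With the made-up sentence “机组精力分散，他错误地设置了高度，塔台未发现”, the SK “机组” is paired with “精力” in the first clause and, through “他”, with “设置” in the second, whereas “发现” in the third clause is paired with “塔台”.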
Here, we illustrate the whole process of cause identification with the example report shown in Table 8. First, the keyword “雷雨” (thunderstorms) in sentences #2, #4, and #7 determines the cause E3. Second, in sentence #4, only “机组” (aircrew) serves as an SK, so all the AKs in this sentence relate to pilots. “精力” (vigor) and “设置” (set) are the AKs in this sentence. “Vigor” is a noun, and the closest EK is “分散” (distracted), which is an adjective and can be the predicate of “vigor”. As an adverb, “错误地” (wrongly) can be the EK of the verb “set”. Therefore, H5 and H1 are the causes identified in sentence #4. In sentence #5, “机组” (aircrew) and “管制” (control) could both be SKs, but “管制区域” (control area) is a specific compound word, which makes the noun phrase “管制区域交接” (control area handover) an attribute of the pilot. The AK “喊话” (communication) and the EK combination of “未” (none) and “标准” (standard) identify H2. When identifying H4 from the AK “交叉检查” (cross-check), “以及” (as well as) expands the scope of the EK “没有” (no), which makes “no” an EK of “cross-check”. The analysis is almost consistent with the narrative and can identify similar causes. However, “通讯环境嘈杂” (noisy communication environment) is mentioned as a consequence of thunderstorms.
“疑似听错” (suspected of mishearing) is not an assertion but a conjecture, so it is not used in identification. An analysis of possible causes may also be recorded, even though these causes are not mentioned in the report. Manual analysis showed that certain words, such as “未” (none), “没” (no), and “疑似” (suspected), appear in the context of such cases. When a cause is identified, these words should also be detected in order to filter out this noise.
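Detecting these marker words can be sketched as a simple check on the clause in which a cause was matched. The text does not specify how the markers interact with EKs such as “未”, which also serve in identification. This sketch therefore only reports which markers occur and, as an assumption, discards an identification solely on the conjecture marker “疑似”. The function names are hypothetical.

```python
from typing import List, Sequence

# Marker words quoted in the text.
NOISE_MARKERS = ("未", "没", "疑似")


def find_noise_markers(
    clause: str, markers: Sequence[str] = NOISE_MARKERS
) -> List[str]:
    """Return the marker words that occur in the clause."""
    return [marker for marker in markers if marker in clause]


def keep_identification(
    clause: str, conjecture_markers: Sequence[str] = ("疑似",)
) -> bool:
    """Keep an identified cause unless the clause is marked as a conjecture."""
    return not find_noise_markers(clause, conjecture_markers)
```

Under these assumptions, a cause matched in “疑似听错” would be dropped, while the other markers would be flagged for a more specific rule to interpret.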
4.4. Experimental Results
As shown in Figure 3, the dataset was divided into a training set and a test set, each accounting for 50% of the data. The F1-score was adopted to evaluate performance. We first conducted cross-validation on the training set. The reports were randomly divided into five sets, and keywords and rules were built independently for each set. Updating the keywords and rules is an iterative process. The first version of the keywords and rules was derived from a manual analysis of each set of reports. Rules that led to detection errors were modified based on their performance in other situations until an optimal result was reached. Then, the five sets of rules, each of which performed best on its corresponding report set, were used to detect causes in the other sets, that is, to perform cross-validation.
Table 9 reports the F1 scores for the 5-fold cross-validation.
Even though the keywords and rules came from other reports, the minimum F1 score for each set was above 0.80, which is an acceptable result in terms of cause identification. This means that these rules can be applied to other reports in the database. In addition, the system is sufficiently robust, and the best F1 scores were all above 0.90. After cross-validation, the five sets of rules were integrated and tuned to their best performance on the whole training set. The results of cause detection on the test set served as the final evaluation of this system. Considering the prevalence of synonyms, we also used the Word2Vec technique to expand the keywords (mainly AKs).
Table 10 summarizes the precision, recall, and F1 score on the test set. In general, even though the rules were derived from other reports, the results yielded F1 scores of 0.90 or higher. However, there is a clear difference between precision and recall. The higher recall (0.95) means that almost all causes can be detected, while the relatively low precision shows that the accuracy of the identified causes is unremarkable.
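For reference, precision, recall, and F1 score for multilabel cause identification can be computed from counts of true positives, false positives, and false negatives. The sketch below pools the counts over all reports (micro-averaging). This is an assumption, since the averaging scheme is not stated in the text, and the function name is hypothetical.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1(
    gold: Sequence[Set[str]], predicted: Sequence[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-report cause sets."""
    tp = fp = fn = 0
    for gold_causes, predicted_causes in zip(gold, predicted):
        tp += len(gold_causes & predicted_causes)   # correctly identified causes
        fp += len(predicted_causes - gold_causes)   # identified but not labeled
        fn += len(gold_causes - predicted_causes)   # labeled but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1
```

A system that finds every labeled cause but also reports extra ones has a recall of 1 and a lower precision, which is the pattern observed here.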
The results show that these rules are somewhat imperfect, since many false positives (FPs) that fit the rules are identified. In descending order of F1-score, the cause types are E-type, H-type, EI-type, and O-type; the E-type ranks first because it is described by fewer keywords, as mentioned above. Additionally, the rules for H-type causes were the most complicated because of the complex grammar used in these sentences. One observation is that causes that share identical keywords with others have lower precision. For example, H6 and EI7 are both communication-related causes, and similar keywords appear in both cases. EI2 can easily be confused with other EI-type causes, as damage to one component can cause other systems to fail. Dozens of keywords could trigger EI4, and the training set could not contain all of them. H2 and O1 also had overlapping keywords since they both relate to standards and regulations.
With Word2Vec applied, the overall F1-score was lower, although the difference was not significant. More keywords led to more false positives, and these misidentifications further reduced the precision. However, adding more keywords can also improve recall by bringing related words into the identification. Overall, using Word2Vec makes little difference, and the original rules already give satisfactory results. On the other hand, to meet specific requirements, such as higher recall, Word2Vec and other word-embedding techniques can be tried.
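Keyword expansion with word embeddings can be sketched as a nearest-neighbor search under a similarity threshold. The sketch below assumes that word vectors are already available as a plain dictionary, for example exported from a trained Word2Vec model. The function names and the threshold parameter are hypothetical, and the expansion procedure actually used is not detailed in the text.

```python
import math
from typing import Dict, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(
    seeds: Set[str], vectors: Dict[str, Sequence[float]], threshold: float
) -> Set[str]:
    """Add every word whose similarity to some seed keyword reaches the threshold."""
    expanded = set(seeds)
    for word, vector in vectors.items():
        if any(
            seed in vectors and cosine(vectors[seed], vector) >= threshold
            for seed in seeds
        ):
            expanded.add(word)
    return expanded
```

A lower threshold admits more candidate keywords, which tends to raise recall at the cost of precision, in line with the trade-off reported above.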