#### 4.3.1. Feature Importance

The importance of every input indicator for each of the five safety behaviors is shown in Table 3, with the top three indicators for each behavior highlighted. For example, regarding SB5, the top three indicators are contract value (coded as ConSum), project managers seeking safety suggestions (coded as SC2), and affiliation (coded as AffRes). This indicates that personnel on projects with a larger contract value, personnel on projects where project managers seek more safety suggestions, and personnel from the client are more likely to use all necessary safety equipment on site.


**Table 3.** Feature importance of the five safety behaviors.

#### 4.3.2. Correlation and OR Values

As mentioned earlier, FI reflects the relative importance of different input indicators for each safety behavior, but it does not show whether their influence is positive or negative. To make up for this deficiency, correlation analysis based on CTs with OR values is carried out. Table 4 shows the results of the correlation analysis for SB1 (i.e., use all necessary safety equipment to do the job). If the *p*-value is significant and the OR, along with its confidence interval, is above 1.0, then SB1 is more likely to take place when the feature is present. If the *p*-value is significant and the OR, along with its confidence interval, is below 1.0, then SB1 is less likely to take place when the feature is present. From Table 4, it can be concluded that the drivers of SB1 are GC1, SSRP3, CI3, LMX1, TMX4, SC2, and SM2, among others. OR values between the five safety behaviors and all of the input indicators are shown in Figure 8. At least two points deserve mentioning. First, different sets of drivers are behind different safety behaviors. For example, ConSum has a greater impact on SB3 and SB4 than on SB1. Second, some indicators can be omitted in establishing the classification framework, such as StgProj, Gender, Age, EduRsp, and DriHab, because they have no bearing on any of the five safety behaviors.
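The decision rule described above can be illustrated with a minimal sketch for a single 2x2 contingency table. It assumes a Pearson chi-square test without continuity correction and a Wald confidence interval on the log-odds scale; the study may have used different variants. The function names, the cell counts, and the 0.05 significance level are hypothetical and are not taken from the survey data.

```python
import math

def odds_ratio_test(a, b, c, d, z=1.96):
    """Chi-square p-value, OR, and Wald CI for a 2x2 table.

    a: feature present, behavior observed
    b: feature present, behavior not observed
    c: feature absent,  behavior observed
    d: feature absent,  behavior not observed
    All four cells must be non-zero for this simple version.
    """
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR), used for the Wald interval.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, (lower, upper), p_value

def interpret(odds_ratio, ci, p_value, alpha=0.05):
    """Apply the rule: significant p-value and the whole CI on one side of 1.0."""
    lower, upper = ci
    if p_value < alpha and lower > 1.0:
        return "more likely"
    if p_value < alpha and upper < 1.0:
        return "less likely"
    return "inconclusive"

# Hypothetical counts, for illustration only.
result = odds_ratio_test(40, 10, 20, 30)
print(result, interpret(*result))
```

Requiring the whole confidence interval to lie on one side of 1.0, in addition to a significant *p*-value, guards against labeling an indicator as a driver when the estimated OR is above 1.0 only by chance.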


**Table 4.** Chi-square test and OR values.

**Table 5.** Comparison with previous studies.


\* BPSO, binary particle swarm optimization; DT, decision tree; KNN, k-nearest neighbor; GBDT, gradient boosting decision tree; GSVM, Gaussian support vector machine; Bi-LSTM, bidirectional long short-term memory.

**Figure 8.** OR values between the five safety behaviors and all of the input indicators.

### **5. Discussion**

#### *5.1. Findings*

This study has achieved the two objectives mentioned earlier, and has theoretical, practical, and methodological implications.

First, in theory, safety behavior, as an emergent property of a complex socio-technical system, has different drivers. Using machine learning, this study supports this proposition. In particular, this study found that in order to encourage personnel to use all necessary safety equipment on the job (i.e., SB1), clients should set examples for contractors and consultants, safety motivation should be enhanced, and clients, private clients in particular, are encouraged to be involved in safety management as early as possible. In projects with a large contract sum, older personnel with more dependents to support are more likely to follow safety procedures on the job (i.e., SB2). In projects with a large contract sum, construction personnel are more likely to promote safety programs willingly (i.e., SB3) with clients actively engaging in safety management. In public projects with a large contract sum, personnel are encouraged to pursue professional development and, hence, are more likely to put in extra effort to improve workplace safety (i.e., SB4). In projects with a sound safety climate and more client involvement, personnel are more likely to help colleagues who are in risky conditions (i.e., SB5). Based on these findings, practicable and targeted measures are proposed to promote each of the five safety behaviors.

Second, machine learning has an advantage over traditional statistical methods in addressing more complex interrelations among independent variables [6]. To exploit this advantage, this study first evaluates the performance of four common machine-learning methods. Although these four methods achieve comparatively satisfactory performance, this study develops a combinative method, CatBoost–MOSMA, to train and test the data again. This is because MOSMA has achieved superior performance in hyperparameter tuning, and this study attempts to introduce it into the safety research domain. Through 64 trials, the combinative method achieved the maximum classification performance and is therefore used to establish feature importance. Furthermore, as noted by Poh et al. [45], the imbalanced distribution of classes is a common issue in previous research. The combinative method adopts the SMOTE technique to address this issue and obtain more robust results. This is shown in Table 5, which compares the classification performance of the proposed combinative method with that of other classification methods. Compared with other methods of tuning and optimization, MOSMA achieves a higher accuracy score when using the same classifiers. When the performance of classifiers is not significantly different, MOSMA achieves a higher F1-score. Hence, it can be concluded that the proposed combinative strategy of CatBoost–MOSMA is effective and efficient in classifying binary construction safety behavioral data.
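The core idea of SMOTE, namely creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration rather than the implementation used in this study: the function name `smote`, the neighbor count `k`, and the toy data are assumptions, and established library implementations handle many details omitted here.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples from a list of minority-class points.

    minority: list of equal-length numeric tuples (at least two points).
    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((pi - xi) ** 2 for pi, xi in zip(p, x)))
        neighbor = rng.choice(others[:k])
        # Random position along the segment from x to the chosen neighbor.
        gap = rng.random()
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)))
    return synthetic

# Toy minority class with two hypothetical features, for illustration only.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote(minority_points, n_new=3, k=2))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used to evaluate the classifier.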

#### *5.2. Limitations and Future Research Directions*

Although the study has achieved its objectives, it has limitations. First, the sample size can be further enlarged. Although a new machine-learning strategy is developed specifically to tackle the small sample size issue and some seminal studies have used a smaller sample set, it is highly recommended that future research collect more data. Second, the study uses a sample from Hong Kong, and whether the findings can be extrapolated to other countries or regions requires further research. Third, the factors affecting safety behaviors mentioned in the study are not exhaustive, and their interrelationship is not clear-cut. Hence, more in-depth research needs to be undertaken in this regard. Fourth, related to the third point, this study attempts to propose a generic classification framework, and different construction sites are encouraged to tailor the framework to their own needs. Fifth, this study employs a combination of three feature-selection methods, namely FI, CT, and BS. Only those input indicators that obtained more than half of the votes were retained. In other words, this approach may omit some input indicators that are strongly correlated with a particular safety behavior. For instance, the input indicator SmoHab is strongly negatively correlated with SB5 but does not correlate with the other safety behaviors. Therefore, it has been deleted. Nevertheless, the experimental results show that this method generally benefits the safety behaviors as a whole, since the classification performance improves after deleting those input indicators that were only correlated with certain safety behaviors.
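The majority-vote retention rule mentioned in the fifth limitation can be sketched as below. The sketch assumes that each of the three methods casts one vote per indicator; how votes were actually counted in the study (for example, per method or per method and safety behavior) is not specified here, and the selections shown are hypothetical.

```python
def majority_vote(selections):
    """Keep indicators chosen by more than half of the selection methods.

    selections: dict mapping a method name to the set of indicators it selected.
    Returns the retained indicators in alphabetical order.
    """
    n_methods = len(selections)
    all_indicators = set().union(*selections.values())
    kept = []
    for indicator in sorted(all_indicators):
        n_votes = sum(indicator in chosen for chosen in selections.values())
        if n_votes > n_methods / 2:
            kept.append(indicator)
    return kept

# Hypothetical votes, for illustration only (not the study's results).
selections = {
    "FI": {"ConSum", "SC2", "SmoHab"},
    "CT": {"ConSum", "SC2"},
    "BS": {"ConSum"},
}
print(majority_vote(selections))  # ['ConSum', 'SC2']
```

In this toy example, an indicator selected by only one method is dropped even if that method ranks it highly, which mirrors the trade-off described above.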

Despite these limitations, the classification framework is highly recommended for future research efforts, given its satisfactory performance.

#### *5.3. Practical Use of the Research*

The proposed methods can be used in safety management practice on construction sites, as shown in Figure 9. A survey is conducted with a representative sample of construction personnel on the site, and the data are stored in a safety behavioral database. After training, safety staff are charged with modeling, implementing the algorithm, and deriving model results, which suggest different safety behavioral orientations associated with different feature patterns. Using a combination of their experience and these data-driven clues, safety staff should be able to predict a newcomer's safety behavioral orientation and then propose and implement targeted interventions. When the prediction performance turns out to be unsatisfactory, a new round of surveys begins, and more data are stored in the database. Complemented by the staff's own intuition, this data-driven decision support system is intended to help deter unsafe behaviors on construction sites in an efficient and effective way.

**Figure 9.** Practical use of the research.
