*Article* **Predicting Employer and Worker Responsibilities in Accidents That Involve Falls in Building Construction Sites**

**Emre Caner Akcay <sup>1</sup> and David Arditi <sup>2,</sup>\***


**Abstract:** Fall-related accidents have received more attention in building construction than in civil construction, as falls from heights are more common in building construction. In addition to social costs, construction companies face a significant financial burden when fall-related accidents occur. The major portion of the direct cost of accidents that involve falls includes the compensation paid by the employer to the worker. The employer and the worker try to reach an agreement on the size of the compensation; however, most of the time the process is contentious. The objective of this study is to predict the parties' responsibilities for a fall-related accident by modeling the relationship between the employer and the worker using a multi-agent system. The research pursued a three-step method, including collection of data, development of a multi-agent model, and testing of the model. The model provides satisfactory results and can be used to quantify the employer's and the worker's responsibilities in construction fall accidents, hence avoiding escalation to arbitration or litigation.

**Keywords:** construction accident; construction safety; worker compensation; falls; multi-agent systems

### **1. Introduction**

The construction industry is one of the most hazardous industries [1,2]. It has a poor safety record all over the world [3,4]. There were 4779 worker deaths recorded in all sectors in the U.S. in 2018, 21.1% of which occurred in the construction industry [5]. In other words, one out of every five deaths was caused by accidents on construction sites [6]. In the U.K., 111 workers lost their lives on the job between the years 2019–2020, 40 of them in construction [7]. Although there is a gradual decrease in the number of occupational incidents on construction sites thanks to preventive laws and regulations [8], the construction industry still experiences higher rates of accidents compared to other industries. This is a problem that can be resolved if adequate precautions are put in place to eliminate the most common reasons that cause accidents on construction sites.

Falls have received more attention than any other type of occupational accident, as falls have been very common in the construction industry [9]. According to OSHA [10], 33.5% of total fatalities in the U.S. construction industry in 2018 were fall-related. Additionally, falls were responsible for 25% of all fatal injuries in the U.K. [7]. Besides social costs, construction companies also face a significant financial burden caused by falls. The direct cost of falls on U.S. construction sites has been estimated to be about USD 70 billion annually [11]. The economic impact of falls has been significant not only in the U.S. but also in other countries such as the U.K. [12], European Union countries [13], Australia [14], South Africa [15], Singapore [16], and Taiwan [17]. The prevention of fall-related accidents on construction sites is therefore of paramount importance.

The major portion of the direct cost of accidents that involve falls includes the compensation paid by the employer to the worker. When an accident occurs, the employer and the worker (or the worker's family) try to reach an agreement on the size of the compensation; however, on many occasions they fail to reach an agreement and must go to court for a settlement. In Turkey, there were 180,000 court cases between the employer and worker in the period 2017–2018 [18]. As it is extremely time-consuming to consider this many cases, the Turkish Government encourages employers and workers to settle through arbitration and avoid going to court. In addition to preventing the occurrence of fall-related accidents, it is important to reach a mutually agreed settlement that is fast and fair to both parties.

**Citation:** Akcay, E.C.; Arditi, D. Predicting Employer and Worker Responsibilities in Accidents That Involve Falls in Building Construction Sites. *Buildings* **2022**, *12*, 464. https://doi.org/10.3390/buildings12040464

Academic Editors: Srinath Perera, Albert P. C. Chan, Dilanthi Amaratunga, Makarand Hastak, Patrizia Lombardi, Sepani Senaratne, Xiaohua Jin and Anil Sawhney

Received: 21 February 2022 Accepted: 7 April 2022 Published: 10 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The arbitration process involves discussions between the employer and worker. These discussions depend to a large extent on the employer's and the worker's responsibilities for the accident. In principle, it should be easy to calculate the size of the compensation by using the rules provided by the government, which make use of the employer's and the worker's responsibilities for the accident; in practice, it seldom is. Most of the time, the process is contentious and often results in ill feelings on the part of one of the parties. This research aims to predict the parties' responsibilities for an accident by modeling the discussion between the employer and the worker using a multi-agent system, hence skipping painful and lengthy arbitration or litigation and creating an objective and fast solution that is acceptable to both parties. This study is expected to contribute to the literature in building construction safety management and to safety practice in the building construction industry by streamlining the settlement of worker compensation in the aftermath of fall-related accidents. In addition, this study provides procedural benefits to a legal system that is routinely congested.

### **2. Literature Review**

This research builds on and extends studies about (a) accidents that involve falls on construction sites, (b) multi-agent systems, and (c) dispute resolution in construction. The issues that are encountered in fall-related accidents on construction sites, the basic principles and the pros and cons of multi-agent systems, and the studies about construction dispute resolution are briefly discussed in the next three subsections.

### *2.1. Fall-Related Accidents in Construction Sites*

Falls on construction sites have always drawn the attention of many researchers (e.g., [9,19–24]) as falls are the leading cause of occupational injuries and fatalities on construction sites [10], especially on building construction sites.

As seen in Table 1, several studies were conducted about fall detection, fall prevention, fall protection, safety training, causes of falls, and fall-related accident patterns, each study using different methods such as statistical methods, qualitative evaluations, sensor and camera-based techniques. Despite the existence of quite a few research studies related to fall-related accidents, there was no study published in the literature that focused on modeling the interaction between the employer and the worker to predict each party's responsibility for this kind of accident. To fill this gap, this research proposes a model that simulates the discussion between employer and worker to predict the employer's and worker's responsibilities for the accident.

**Table 1.** Research about Fall-Related Accidents.




### *2.2. Multi-Agent Systems*

A multi-agent system is an artificial intelligence technology that consists of autonomous intelligent agents to create an intelligent behavior within a system to achieve a common objective [41]. These intelligent agents have not only the capability to act independently according to their personal objectives, but they can also interact with each other in the same system [42]. Multi-agent systems first appeared in the 1980s, and have been extensively used in different disciplines to solve complex and dynamic problems [43]. As the construction industry is highly fragmented, it mostly involves complex and fragmented problems. Hence, multi-agent systems have been widely used in construction management for simulating different problems. As seen in Table 2, several multi-agent studies were conducted to solve supply chain problems, to simulate the negotiation process to resolve conflicts between parties, to achieve energy savings, and to simulate worker safety behavior.

**Table 2.** Research about Multi-Agent Systems.


As seen in Table 2, the first six rows concern issues encountered in supply chain management, while the remaining rows include studies as diverse as negotiation processes between parties, energy consumption, and workers' safety behaviors. Out of the eighteen papers cited in this table, only Karakas et al. [41], Kim and Paulson [53], and Hosseinian et al. [54] used agent-based systems to resolve conflicts between parties, a topic of particular interest relative to the study presented in this paper. Karakas et al. [41] developed a multi-agent system to simulate the negotiation process between contractor and client about sharing cost overruns in construction projects, while Kim and Paulson [53] used multi-agent systems to resolve schedule conflicts between subcontractors. Hosseinian et al. [54] developed a multi-agent sharing model for incentive contracts to regulate the relationship between risk-neutral owners and risk-averse contractors. It is noteworthy that none of these studies looked into the discussion/negotiation process that takes place between contractors and workers after a fall-related accident has occurred to agree upon the compensation (if any) to be received by the worker.

### *2.3. Dispute Resolution in Construction*

As the complexity and scale of construction projects increase, disagreements and legal disputes have been very common in recent years [61]. As seen in Table 3, a vast number of researchers have published studies about various aspects of construction disputes.


**Table 3.** Research About Construction Disputes.

A review of the literature in Table 3 reveals that research in construction disputes mostly emphasizes (a) the investigation of the causes/types/severity of disputes, and (b) the development of alternative dispute resolution methods for resolving the disputes. El-Sayegh et al. [62], who investigated the major causes of construction disputes in the United Arab Emirates, stated that identifying the causes of disputes is of great importance in reducing the occurrence of disagreements and disputes between owners, designers, and contractors. While Hayati et al. [63] identified the common causes of disputes in the Indonesian construction industry, Viswanathan et al. [64] developed a dispute causal model that considered the relationships between the causes of a dispute.

Concerning the development of predictive models for construction dispute resolution, Mahfouz and Kandil [69] adopted machine learning models to predict the outcome of differing site condition disputes, whereas Pulket and Arditi [70,71] developed an integrated prediction model to predict the outcome of construction disputes. Although quite a few research studies related to dispute resolution have been conducted, only a limited number were related to construction accidents. For example, Fan and Li [76] offered a case retrieval approach based on text-mining to resolve disputes, but only in certain types of construction accidents. The literature includes several studies related to construction dispute resolution, but no study offers a multi-agent system to simulate the discussions between an employer and a worker to settle with mutual satisfaction the employer's and the worker's responsibilities in fall-related accidents.

### **3. Research Method**

Before the arbitration or litigation process, a discussion takes place between the employer and the worker to determine the parties' responsibilities. The discussion between the worker and the employer can be thought of as a negotiation process where the parties aim to achieve their respective goals within a specified period. If the parties are not able to reach an agreement, a lengthy arbitration or litigation process starts where both parties spend considerable time/effort and incur significant cost. Given the general lack of relevant research in the literature, a multi-agent system was used to model the discussion between the employer and the worker in this research for two reasons:


The objective of this study is to construct a multi-agent system to simulate the discussion between an employer and a worker to agree on the employer's and the worker's responsibilities in fall-related accidents. To accomplish this objective, this research involves three steps, including collection of data, development of a multi-agent model, and testing of the model. The flow diagram of the research method is presented in Figure 1.

**Figure 1.** Flow Diagram of Research Method.

### *3.1. Data Collection*

Data collection consisted of two stages including identification of the factors that affect fall-related accidents, and identification of the impact of each factor. It was conducted in the Turkish building construction industry and the Turkish court system.

### 3.1.1. Identification of the Factors That Affect Fall-Related Accidents

The identification of the factors that influence the discussions between the employer and the worker was performed by examining the records of fall-related accident cases tried in courts of law, understanding the contents of health and safety laws for fall-related accidents, and seeking the views of thirteen experts in a brainstorming session. It should be noted that all participants in the brainstorming session had been tasked at one time or another by Turkish courts to serve as experts in cases involving fall accidents. The profiles of the participants are provided in Table 4.


**Table 4.** Profile of the thirteen Participants in the Brainstorming Session.

The brainstorming session involved two stages. In the first stage, the participants examined a total of 27 fall accident cases and discussed the employer's and the worker's responsibilities for each court case by examining the relevant laws. In the second stage, based on what they learned in these court cases, and with the help of their experiences, the participants identified the factors that affect the quantification of the employer's and the worker's responsibilities. They identified five factors: (1) evidence of worker training, (2) use of protective equipment, (3) presence of site engineer, (4) responsible behavior of worker, and (5) safe site conditions.


### 3.1.2. Identification of the Impact of Each Factor

A web-based questionnaire was prepared and administered to 48 experts to identify the impact of each factor. The demographic information of the respondents is presented in Table 5.


**Table 5.** Profile of the 48 Experts who responded to the Survey.

In this questionnaire, each expert rated each factor by using a nine-point Likert Scale where one indicates very low impact and nine indicates very high impact. A Relative Importance Index (RII) was calculated to analyze the data. The RII of each factor was calculated by using Equation (1) and helped to identify the importance of each factor relative to the perceptions of the respondents.

$$\text{RII} = \frac{\sum W}{A \times N} \tag{1}$$

where RII is the relative importance index; *W* denotes the weight assigned to each factor by the respondents (in this case, it ranges from one to nine); *A* is the highest weight (in this case, it is nine); and *N* is the total number of respondents.

The value of RII varies between zero and one. A negotiation factor that has a higher RII has a larger impact compared to a factor with a lower RII. The questionnaire results, RII values, and their weighted percentages are presented in Table 6. To check the reliability and the internal consistency of the collected data, the Cronbach's Alpha coefficient was calculated. It was found that the Cronbach's Alpha coefficient of 0.892 is greater than the acceptable minimum value of 0.7 suggested by Santos [77].
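As an illustration of Equation (1) and of the reliability check, the following minimal Python sketch computes the RII of each factor and Cronbach's Alpha from a small ratings table. The ratings and factor names are hypothetical, not the survey data, and the `weighted_percentages` helper assumes that a weighted percentage is each RII expressed as a share of the sum of all RIIs, which the text does not state explicitly.

```python
from statistics import variance


def relative_importance_index(ratings, highest_weight=9):
    """Equation (1): RII = sum(W) / (A * N) for a single factor."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / (highest_weight * len(ratings))


def weighted_percentages(rii_by_factor):
    """Assumed normalization: each RII as a percentage of the sum of all RIIs."""
    total = sum(rii_by_factor.values())
    return {name: 100 * rii / total for name, rii in rii_by_factor.items()}


def cronbach_alpha(responses):
    """Cronbach's Alpha; `responses` holds one list of item ratings per respondent."""
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings (1 to 9) from four respondents for three factors.
ratings = {
    "Factor A": [9, 7, 8, 6],
    "Factor B": [5, 6, 4, 5],
    "Factor C": [8, 8, 9, 7],
}
rii = {name: relative_importance_index(r) for name, r in ratings.items()}
shares = weighted_percentages(rii)

# Transpose to one row per respondent before computing Cronbach's Alpha.
responses = [list(row) for row in zip(*ratings.values())]
alpha = cronbach_alpha(responses)

for name in ratings:
    print(f"{name}: RII = {rii[name]:.3f}, weighted share = {shares[name]:.1f}%")
print(f"Cronbach's Alpha = {alpha:.3f}")
```

The sample variance is used in `cronbach_alpha`; with real survey data, the resulting coefficient would be compared against the 0.7 threshold mentioned above.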


**Table 6.** Data, RII values, and fuzziness levels.

As each factor involves a level of uncertainty, a level of fuzziness should be incorporated into the weight of each factor. For this purpose, another questionnaire was administered to the thirteen experts who took part in the brainstorming session at the beginning of the study (see Table 4). The questionnaire was administered orally as it was more convenient for respondents to answer. Respondents assessed the level of vagueness as low, moderate, or high. Later, these values were converted into percentage values. To reduce the range of the answers, the Delphi method was performed in two successive rounds. The Delphi method is a group decision-making and forecasting method that involves successively collating the judgments of experts with the aim of seeking consensus. The fuzziness level of each factor is shown in Table 6.

### *3.2. Development of a Multi-Agent System*

The multi-agent system requires that a decision variable be defined at the beginning of the development process. In this study, the choices were the worker's or the employer's responsibility for the accident. Since this is a zero-sum situation, the selection of one or the other does not make any difference in the functioning of the model. In this study, the responsibility of the worker for the fall was selected as the decision variable. Therefore, the discussion between the worker and the employer was modeled to quantify the responsibility of the worker rather than the responsibility of the employer.

The agents of the multi-agent system developed in this study are the worker and the employer. The employer makes the initial determination of responsibility, typically by assuming little responsibility and assigning most of the responsibility for the fall to the worker. The worker counters with the worker's perspective, typically assigning most of the responsibility to the employer. Both agents have a "reservation value" that defines their limit in making concessions to each other. For the employer, the reservation value is the lowest worker responsibility that can be accepted by the employer, whereas for the worker, it is the highest worker responsibility that can be accepted. In other words, if the worker's responsibility is lower than the employer's reservation value, the employer's responsibility becomes so high that it is not acceptable to the employer; if the worker's responsibility is higher than the worker's reservation value, then the worker assumes such a high responsibility that it makes it unacceptable to the worker. The parties are privy to each other's reservation values. The initial determination of the employer, the initial response of the worker, and the reservation values are inputted into the system by using the fuzzy logic approach proposed by Akcay et al. [42] and Karakas et al. [41]. The initial determination and reservation value of the employer, and the initial response and the reservation value of the worker are set using Equations (2)–(5).

$$F_{e} = \sum_{i=1}^{5} (W_{ei} + W_{si}) \times (1 + F_{i}) \tag{2}$$

where *F<sub>e</sub>* is the initial determination of the employer; *W<sub>ei</sub>* is the weighted percentage of the *i*th factor where the employer has the power; *W<sub>si</sub>* is the weighted percentage of the *i*th factor where the power is shared; *F<sub>i</sub>* is the fuzziness level of the *i*th factor.

$$F_{w} = \sum_{i=1}^{5} (W_{ei} + W_{si}) \times 0.4 \tag{3}$$

where *F<sub>w</sub>* is the response of the worker to the employer's initial determination; *W<sub>ei</sub>* denotes the weighted percentage of the *i*th factor where the employer has the power; *W<sub>si</sub>* is the weighted percentage of the *i*th factor where the power is shared.

$$R_{e} = \sum_{i=1}^{5} W_{ei} \times (1 - F_{i}) \tag{4}$$

where *R<sub>e</sub>* is the reservation value of the employer; *W<sub>ei</sub>* denotes the weighted percentage of the *i*th factor where the employer has the power; *F<sub>i</sub>* is the fuzziness level of the *i*th factor.

$$R_{w} = \sum_{i=1}^{5} (W_{ei} + W_{si}) \tag{5}$$

where *R<sub>w</sub>* is the reservation value of the worker; *W<sub>ei</sub>* denotes the weighted percentage of the *i*th factor where the employer has the power; *W<sub>si</sub>* is the weighted percentage of the *i*th factor where the power is shared.
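To make Equations (2)–(5) concrete, the sketch below evaluates the four input values for one hypothetical accident. The `Factor` class, the way each factor's weighted percentage is split between the employer-power and shared-power terms, and all numbers are illustrative assumptions; the actual weights and fuzziness levels are those reported in Table 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    w_employer: float  # W_ei: weighted percentage where the employer has the power
    w_shared: float    # W_si: weighted percentage where the power is shared
    fuzziness: float   # F_i: fuzziness level expressed as a fraction


def negotiation_inputs(factors):
    """Evaluate Equations (2)-(5) for a list of factors."""
    f_e = sum((f.w_employer + f.w_shared) * (1 + f.fuzziness) for f in factors)  # Eq. (2)
    f_w = sum((f.w_employer + f.w_shared) * 0.4 for f in factors)                # Eq. (3)
    r_e = sum(f.w_employer * (1 - f.fuzziness) for f in factors)                 # Eq. (4)
    r_w = sum(f.w_employer + f.w_shared for f in factors)                        # Eq. (5)
    return {"F_e": f_e, "F_w": f_w, "R_e": r_e, "R_w": r_w}


# Hypothetical case: five factors with made-up weights and fuzziness levels.
factors = [
    Factor(w_employer=10.0, w_shared=0.0, fuzziness=0.10),
    Factor(w_employer=0.0, w_shared=8.0, fuzziness=0.20),
    Factor(w_employer=12.0, w_shared=0.0, fuzziness=0.05),
    Factor(w_employer=0.0, w_shared=0.0, fuzziness=0.10),
    Factor(w_employer=0.0, w_shared=6.0, fuzziness=0.15),
]
inputs = negotiation_inputs(factors)
for name, value in inputs.items():
    print(f"{name} = {value:.2f}")
```

With these made-up numbers the worker's first response is the lowest value and the employer's initial determination is the highest, with the two reservation values lying in between, which is the ordering the discussion needs in order to have room for agreement.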

As the discussion between the employer and the worker relative to the quantification of employer and worker responsibilities is a dynamic process, and the parties are fully informed about their respective reservation values, the Zeuthen Strategy was selected as the most appropriate settlement protocol. This strategy is performed by comparing the parties' tolerance to risk, which is the ratio of the utility loss when a party accepts the determination of the other party to the utility loss when the party rejects the other party's determination [78]. The maximum risk acceptable to the employer and to the worker can be calculated using Equations (6) and (7), respectively.

$$R_{e} = \frac{U_{ee}^{n} - U_{ew}^{n}}{U_{ee}^{n} - U_{e}(C)} \tag{6}$$

where *R<sub>e</sub>* denotes the maximum risk that is acceptable to the employer in round *n*; *U<sup>n</sup><sub>ee</sub>* denotes the utility to the employer of the employer's determination in round *n*; *U<sup>n</sup><sub>ew</sub>* denotes the utility to the employer of the worker's response to the employer's determination in round *n*; *U<sub>e</sub>*(*C*) is the utility to the employer of a breakdown in the discussions.

$$R_{w} = \frac{U_{ww}^{n} - U_{we}^{n}}{U_{ww}^{n} - U_{w}(C)} \tag{7}$$

where *R<sub>w</sub>* denotes the maximum risk that is acceptable to the worker in round *n*; *U<sup>n</sup><sub>ww</sub>* denotes the utility to the worker of the worker's response to the employer's determination in round *n*; *U<sup>n</sup><sub>we</sub>* denotes the utility to the worker of the employer's determination in round *n*; *U<sub>w</sub>*(*C*) is the utility to the worker of a breakdown in the discussions.

As per Karakas et al. [41], the recommended utility of the employer's initial determination and the worker's first response was set to one, whereas the utility of the reservation value for each party was set to 0.6. It should be noted that the utility curve between these two values is linear and shows the degree of the agent's satisfaction. The parties calculate their gains and losses at each round by using these functions.
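The linear utility function and the risk ratios of Equations (6) and (7) can be sketched in a few lines of Python. The sketch assumes that the utility of a breakdown in the discussions is zero, which the text does not specify, and it uses hypothetical values for the initial determination, the first response, and the reservation values.

```python
def linear_utility(value, initial, reservation, u_initial=1.0, u_reservation=0.6):
    """Straight line through (initial, 1.0) and (reservation, 0.6)."""
    slope = (u_initial - u_reservation) / (initial - reservation)
    return u_reservation + slope * (value - reservation)


def zeuthen_risk(u_own_offer, u_other_offer, u_conflict=0.0):
    """Equations (6)/(7); u_conflict = 0 is an assumption of this sketch."""
    return (u_own_offer - u_other_offer) / (u_own_offer - u_conflict)


# Hypothetical inputs, expressed as worker responsibility in percent.
f_e, r_e = 40.1, 20.4  # employer: initial determination, reservation value
f_w, r_w = 14.4, 36.0  # worker: first response, reservation value

# Round 1: each agent evaluates its own offer and the opponent's offer.
risk_employer = zeuthen_risk(
    linear_utility(f_e, f_e, r_e),  # U_ee
    linear_utility(f_w, f_e, r_e),  # U_ew
)
risk_worker = zeuthen_risk(
    linear_utility(f_w, f_w, r_w),  # U_ww
    linear_utility(f_e, f_w, r_w),  # U_we
)
print(f"employer risk = {risk_employer:.3f}, worker risk = {risk_worker:.3f}")
```

Because the same `linear_utility` function is anchored at each agent's own initial value and reservation value, it rises with worker responsibility for the employer and falls with it for the worker.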

The model was created by using the Java Agent Development Framework (JADE), which is one of the most widely used software environments that enable users to perform agent communications.

### *3.3. Case Example*

In the starting interface of the program (Figure 2), the user specifies the status of the factors that affect fall-related accidents in the building project in question and clicks on the "start discussion" button. In the first round, the system calculates the initial determination of the employer agent using Equation (2), the worker agent's initial response to the employer agent's initial determination (Equation (3)), and each agent's reservation values (Equations (4) and (5), respectively) as depicted in Figure 3. The reservation values remain the same in each round.


**Figure 2.** Starting Interface.


**Figure 3.** Employer Agent's Initial Determination and Worker Agent's Response to the Initial Determination in the First Round.

After these values are calculated by the system, the discussion about the employer agent's determinations and the worker agent's responses to these determinations proceeds according to the Zeuthen Strategy. In this strategy, the parties compare the maximum risk that is acceptable to them and to their opponent (using Equations (6) and (7)) in each round. The agent with the lower acceptable maximum risk makes the next offer. The discussion between the agents continues until the value of *R<sub>e</sub>* (Equation (6)) equals the value of *R<sub>w</sub>* (Equation (7)). In this example, the system generates a worker responsibility of 18.43%.
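The round-by-round procedure described above can be approximated by the following Python sketch. It is not the JADE implementation used in the study. It makes several assumptions that the text does not specify: the conceding agent moves by a fixed `step`, the utility of a breakdown is zero, the discussion stops once the two offers are within one step of each other, and the settlement is taken as their midpoint. The input values are hypothetical.

```python
def linear_utility(value, initial, reservation):
    """Utility 1.0 at the agent's initial value and 0.6 at its reservation value."""
    return 0.6 + 0.4 * (value - reservation) / (initial - reservation)


def zeuthen_risk(u_own, u_other, u_conflict=0.0):
    """Equations (6)/(7) with an assumed breakdown utility of zero."""
    return (u_own - u_other) / (u_own - u_conflict)


def negotiate(f_e, f_w, r_e, r_w, step=0.5, max_rounds=10_000):
    """Return (settled worker responsibility, number of rounds)."""
    if r_e > r_w:
        raise ValueError("reservation values leave no zone of agreement")
    employer_offer, worker_offer = f_e, f_w
    for round_no in range(1, max_rounds + 1):
        # Assumed stopping rule: offers have met to within one concession step.
        if employer_offer - worker_offer <= step:
            return (employer_offer + worker_offer) / 2, round_no
        risk_e = zeuthen_risk(linear_utility(employer_offer, f_e, r_e),
                              linear_utility(worker_offer, f_e, r_e))
        risk_w = zeuthen_risk(linear_utility(worker_offer, f_w, r_w),
                              linear_utility(employer_offer, f_w, r_w))
        # The agent with the lower risk concedes; on a tie both concede.
        if risk_e <= risk_w:
            employer_offer = max(employer_offer - step, r_e)
        if risk_w <= risk_e:
            worker_offer = min(worker_offer + step, r_w)
    raise RuntimeError("no agreement reached within max_rounds")


settlement, rounds = negotiate(f_e=40.1, f_w=14.4, r_e=20.4, r_w=36.0)
print(f"worker responsibility = {settlement:.2f}% after {rounds} rounds")
```

A smaller `step` brings the two risk values closer to equality at the point of agreement, at the cost of more rounds.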

### *3.4. Performance of the Model*

Seven court cases related to fall-related accidents were used to test the performance of the proposed multi-agent system. The condition of each factor and the responsibility that the court assigned to the worker were determined by examining the court records. The responsibility that the court assigned to the worker was compared with the worker's responsibility obtained by using the proposed model. The comparison was performed by calculating the percentage difference in each case using Equation (8).

$$D = |E - T| \tag{8}$$

where *D* is the percentage difference for each case; *E* is the worker's responsibility obtained by using the proposed multi-agent system; *T* is the worker's actual responsibility as assigned by the court.

Table 7 summarizes the information extracted from the court records and the output of the proposed model for each case. With the exception of the outlier, Case 4, which has a difference of 11.03%, the average difference of the remaining six cases amounts to ±4%, indicating that the performance of the model is quite satisfactory. Courts give their final decisions by considering the reports of safety experts. The minor variation in the differences between the determinations of the courts and the outcomes generated by the proposed model can be explained by the subjectivity of the experts' judgments.
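The comparison in Equation (8), together with an average that leaves out the largest difference, can be expressed as a short Python sketch. The case values below are hypothetical placeholders, not the figures in Table 7, and `mean_excluding_largest` is an illustrative helper name.

```python
def percentage_difference(model_value, court_value):
    """Equation (8): D = |E - T|, both expressed in percent."""
    return abs(model_value - court_value)


def mean_excluding_largest(differences):
    """Average of the differences after dropping the single largest one."""
    trimmed = sorted(differences)[:-1]
    return sum(trimmed) / len(trimmed)


# Hypothetical (model output E, court decision T) pairs in percent.
cases = [(25.0, 20.0), (18.0, 20.0), (33.0, 30.0), (41.0, 30.0)]
differences = [percentage_difference(e, t) for e, t in cases]

print("differences:", differences)
print(f"mean without the largest: {mean_excluding_largest(differences):.2f}")
```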

**Table 7.** Performance Results.


Factor 1 = Evidence of worker training. Factor 2 = Use of protective equipment. Factor 3 = Presence of site engineer. Factor 4 = Responsible behavior of worker. Factor 5 = Safe site conditions.

### **4. Conclusions**

The construction safety literature routinely covers many aspects of the fall-related accidents that frequently occur on construction sites, including the causes and consequences of these accidents, but none of the studies has modeled the interaction between the employer and the worker to predict each party's responsibility in fall-related accidents. Several dispute resolution methods have been developed over the years as alternatives to lengthy and expensive litigation, and some of them, such as arbitration, mediation, and dispute review boards, have been quite successful in achieving a fast, inexpensive, convenient, and fair resolution of disputes between construction owners and contractors. However, alternative dispute resolution methods have never been used to settle the parties' responsibilities in fall-related accidents. In response to this research gap, this paper proposes a multi-agent system that simulates the discussions between an employer and a worker to quantify the responsibilities of these parties in fall-related accidents on construction sites, a major concern especially on building construction sites. Even though agent-based systems have been used by researchers to find solutions to various construction-related problems, only a few researchers have attempted to regulate the relationship between contractors and their workers relative to their responsibilities in construction accidents, and never in fall-related accidents.

First, the factors that affect the discussions were identified by performing a brainstorming session with 13 experts, examining the records of 27 cases tried in courts of law, and getting closely acquainted with related laws and regulations. Second, the impact of each factor was determined by conducting a questionnaire survey administered to 48 experts. A Relative Importance Index (RII) was calculated to analyze the results. Third, another questionnaire survey was conducted to determine the fuzziness level of each factor. The Delphi method was performed to reduce the range of answers. Fourth, the model was constructed on the JADE platform by setting up the discussion protocol, creating agents, setting up the input values, and defining the utility function of each agent. Fifth, the performance of the model was assessed by comparing the output of the model and the actual court decisions in seven court cases. The model provides satisfactory results and can be used to quantify the employer's and the worker's responsibilities in construction fall accidents.

This research contributes to construction safety management in the context of fall-related accidents that constitute a serious problem especially in building construction sites. It also provides potential benefits to the legal system that is routinely used to settle differences between employers and workers in such accidents.


It should be noted that the proposed model can be used to quantify the responsibilities of only the employer and the worker. The responsibilities of other parties such as the site engineer, the project manager, the subcontractor, etc., cannot be predicted. Simulating other parties' responsibilities in fall-related accidents can be explored in future research. In addition, the proposed model can be expanded to assess the responsibilities for types of accidents other than falls.

**Author Contributions:** Conceptualization, E.C.A.; Formal analysis, E.C.A.; Investigation, D.A. and E.C.A.; Methodology, D.A. and E.C.A.; Software, E.C.A.; Supervision, D.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data is contained within the article.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


*Article* **Safety Risk Recognition Method Based on Abnormal Scenarios**

**Ziqi Li <sup>1</sup>, Bo Song <sup>2</sup> and Dongsheng Li <sup>1,</sup>\***


**Abstract:** Construction safety monitoring is a significant issue in practical engineering. Unfortunately, specific techniques in this field still depend heavily on manual monitoring. To detect abnormal scenarios during the construction process automatically, a method was proposed for the detection and localization of abnormal scenarios in time and space. The method consists of three components: (1) an I3D-AE video prediction model, which extracts the video features from multiple I3Ds and reconstructs the video by 3D deconvolution; (2) a spatial localization module, AS-CAM, which determines the location of abnormal areas by back-propagating through the I3D-AE; (3) a temporal parameter *S<sub>t</sub>*, which is used to calculate the abnormal time period. The effectiveness of the method was verified on a dataset, and the resulting data were plotted as ROC curves. The results indicated that, in terms of the AUC evaluation metric, the proposed method exceeded 0.9 on the frame-level test and 0.76 on the pixel-level test. Therefore, it can be used to help construction managers improve the efficiency of construction safety management.

**Keywords:** abnormal scenarios detection; localization; video prediction; autoencoder; construction process

### **1. Introduction**

Construction is a high-risk industry in many countries, and the risks inherent in the construction industry contribute to high worker fatalities. According to the US Bureau of Labor Statistics (BLS), 3.1 out of every 100 full-time construction workers in the US suffered an injury or illness, and according to the UK Health and Safety Executive (HSE), 262 out of every 10,000 workers in the UK construction industry were injured in 2017 [1].

Construction site monitoring is an essential procedure in construction safety control, both to minimize construction safety risks and to support project managers in making strategic decisions at critical times [2]. However, the construction site environment is complex [3] and contains many targets [4], so it is difficult for construction managers to monitor the construction site in real time. Since there are numerous cameras on the construction site [5], it is convenient to introduce vision-based technologies for continuously monitoring the activities on construction sites [6–10].

However, most existing studies have focused on recognizing and tracking workers' actions during the construction process. The hazards of the construction process arise not only from workers' actions but also from the presence of construction workers, machines, or materials in areas and at times where they should not be. For example, falls from heights are the greatest risk of death for construction workers [11,12]; on the one hand, they occur because workers do not wear safety belts as required when working at heights [13], while on the other hand, the unauthorized presence of workers in some overhead areas is an important cause [14]. In construction site fire accidents, flammable construction materials in areas with a considerable amount of welding work are the main cause of fires [15–17]. To cope with these risks, researchers have proposed intelligent methods for identifying risk scenarios [18,19].

**Citation:** Li, Z.; Song, B.; Li, D. Safety Risk Recognition Method Based on Abnormal Scenarios. *Buildings* **2022**, *12*, 562. https://doi.org/10.3390/buildings12050562

Academic Editors: Srinath Perera, Albert P. C. Chan, Dilanthi Amaratunga, Makarand Hastak, Patrizia Lombardi, Sepani Senaratne, Xiaohua Jin and Anil Sawhney

Received: 24 March 2022 Accepted: 25 April 2022 Published: 27 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Current approaches focus on recognizing specific risks. However, a construction site contains many types of risks that change over time, so recognizing risks one targeted type at a time easily leaves some of them undetected, and a recognition method that works across multiple scenarios is still lacking. Under a sound safety management system, a risk event is generally a low-probability event, i.e., an abnormal event. Abnormal events are unpredictable, and exploiting this characteristic allows safety risks to be recognized effectively at a small computational cost. Compared with previous methods, the main contribution of this paper is its exploratory use of the unpredictability of risk scenarios to identify risks. This is the first time that abnormal scenario prediction has been used to recognize safety risks in the construction process; it requires only a deep learning network and avoids tedious database construction.

Based on the above analysis, we propose a 3DCNN-based autoencoder model that detects abnormal frames by predicting future frames; it is an end-to-end deep learning framework trained on normal video samples. Specifically, we built a model that predicts future video from the preceding video clips and trained it on normal video only. In the testing phase, if the error between the true frame and the predicted frame is small, the frame is considered normal; if the error is large, the frame is considered abnormal. An effective video prediction model is key to this task, so we used an I3D-AE autoencoder, which consists of an encoder and a decoder. The encoder uses multiple I3D [20] networks, which perform well at extracting video features, and the decoder uses multiple 3D deconvolutional networks. For abnormal scenario localization, we propose a module, AS-CAM, for spatial localization and a parameter *St* for temporal localization.

### **2. Related Research**

### *2.1. Abnormal Scenario Detection Methods*

Sungmin and Junseok [21] proposed a novel system that detects abnormal events. Unlike conventional methods, they treated abnormal event detection as a variation matching problem. Deep learning plays an irreplaceable role in abnormal scenario detection. Wei et al. [22] proposed an abnormal scenario detection method for monitoring abnormal activities in public places. They exploited fully convolutional neural networks (FCNs), which have proved to be powerful in image processing, to extract video features. Zhang et al. [23] introduced a more effective algorithm for detecting abnormal behaviors in narrow areas with perspective distortion. The algorithm first uses an adaptive transformation mechanism to compensate for the distorting effect in region-of-interest extraction. Then, an improved pyramid L–K optical flow method with perspective weight and disorder coefficient extracts the abnormal behavior features that occur in historical moving images. Abid [24] proposed a two-stream architecture using two separate 3D CNNs that accept a video stream and an optical flow stream as input to enhance prediction performance. He et al. [25] proposed an anomaly introduced learning (AL) method to detect abnormal events, in which a graph-based multi-instance learning (MIL) model was formed with both normal and abnormal video data. Sabokrou et al. [26] were the first to apply deep learning to the task of abnormal scenario detection; the researchers combined two detectors into a cascade classifier and achieved good detection results. Fully convolutional networks (FCNs) were used in a pretrained model [27], combining semantic information (inherited from existing CNN models) and low-level optical flow to measure local anomalies. One of the advantages of this method is that it does not require a fine-tuning phase.

### *2.2. Construction Risk Scenario Detection Methods*

Duan et al. [28] proposed a method to recognize and classify four different risk events by collecting specific acceleration and angular velocity patterns through built-in sensors of smartphones. The events were simulated with anterior handling and shoulder handling methods in the laboratory. After data segmentation and feature extraction, five different machine learning methods were used to recognize risk events, and the classification performances were compared. Jeongeun et al. [29] proposed a deep convolutional neural network that automatically recognizes possible material and human risk factors in the field regardless of individual management capabilities. The most suitable learning method and model for this study's task and environment were experimentally identified, and visualization was performed to increase the interpretability of the model's prediction results. Kim et al. [30] analyzed the step-by-step process required to automate construction site safety management based on Building Information Modeling (BIM) and evaluated a specific construction site hazard using a BIM-based example. Yang et al. [31] developed a fire identification model and a real-time construction fire detection (RCFD) system. Experiments were conducted to verify the applicability of the proposed system under different environmental conditions. Xiong et al. [32] developed an automated hazards identification system (AHIS) to evaluate operational instructions generated from field videos based on safety guidelines extracted from text files by construction safety ontology. Zhang et al. [1] proposed an automatic recognition method.

### **3. Proposed Method**

We propose a method for detecting abnormal scenarios during the construction process based on future frame prediction. It consists of a model called I3D-AE, which uses past frames in video clips to predict future frames, a module AS-CAM, which locates the spatial information of abnormal scenarios, and a parameter *St*, which locates their temporal information. The key elements of the proposed method are shown in Figure 1.

**Figure 1.** Key elements of proposed method.

#### *3.1. I3D-AE Video Prediction Model*

Unlike conventional autoencoders, our model uses 3D convolutions, and its encoder consists of multiple I3Ds (two-stream inflated 3D ConvNets) [20]. Each I3D processes one video clip, and the clips fed to adjacent I3Ds are consecutive but do not overlap. The number of I3D modules can be chosen according to the actual situation; four modules were used in this paper. The 3DCNN is inflated from the 2DCNN Inception-V1 and can use parameters pretrained on ImageNet. The experimental results show that the model achieved its best results with this configuration on all standard datasets. The middle layer consists of three fully connected layers, and the decoder has a 3D deconvolutional structure. We used normal construction site video for training, so the model reconstructs the video well when the test video is normal, whereas the reconstruction error is large when the test video is abnormal. The model structure is shown in Figure 2.
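The requirement that adjacent I3D modules receive consecutive, non-intersecting clips can be illustrated with a short sketch. It assumes, purely for illustration, that one input clip is divided evenly among the modules; the paper does not state the per-module clip length, and the function name `split_into_subclips` is hypothetical.

```python
def split_into_subclips(frames, num_modules=4):
    """Split a clip into consecutive, non-overlapping sub-clips.

    frames: list of frames (any objects) in temporal order.
    num_modules: number of I3D feature extractors (four in the paper).
    Assumption: the clip is divided evenly among the modules.
    """
    if len(frames) % num_modules != 0:
        raise ValueError("clip length must be divisible by num_modules")
    step = len(frames) // num_modules
    # Each slice starts where the previous one ended, so no frame is shared.
    return [frames[i:i + step] for i in range(0, len(frames), step)]


if __name__ == "__main__":
    subclips = split_into_subclips(list(range(64)), num_modules=4)
    print([len(clip) for clip in subclips])
```

Under this assumption, a 64-frame clip yields four sub-clips of 16 frames each, one per I3D module.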

**Figure 2.** Structure chart of our I3D-AE.

At the beginning of the training, given an input video clip *Vi* and a future clip *Vp*, the autoencoder reconstructs the video using *Iw*(*Fw*(*Dw*(*Vi*))), where *Iw* is an I3D integrated network with the weight parameter *wI*, *Fw* is a fully connected layer with the weight parameter *wF*, and the decoder *Dw* is a 3D deconvolutional network with the weight parameter *wD*. To train this autoencoder, we used the Euclidean loss as the loss function.

$$\hat{w}_I, \hat{w}_F, \hat{w}_D = \arg\min_{w_I, w_F, w_D} \sum_{i} \left\| V_p - I_w\left(F_w\left(D_w\left(V_i\right)\right)\right) \right\|_2^2 + \lambda \left( \left\| I_w \right\|_2^2 + \left\| F_w \right\|_2^2 + \left\| D_w \right\|_2^2 \right) \tag{1}$$

The first term of the objective function is the loss term, which measures the difference between the reconstructed frames and the true video frames, and the second term is the L2 regularization, which limits the complexity of the autoencoder parameters. In the encoder, to reduce the number of model parameters, we read 64 frames of grayscale video instead of RGB video and used four I3D modules as feature extractors; the features extracted by the I3Ds were pooled over time and then fed into the fully connected layers. In the decoder, we used four upsampling layers and three deconvolution layers, each of which contains a ReLU layer.
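The structure of the objective in Equation (1) can be sketched as a Euclidean prediction loss plus an L2 penalty on the weights. This is a minimal illustration, not the training code: clips and weights are assumed to be flattened lists of floats, and the names `objective`, `weight_groups`, and `lam` are hypothetical.

```python
def squared_l2(values):
    """Sum of squares of a flat sequence of numbers."""
    return sum(v * v for v in values)


def objective(target_clips, predicted_clips, weight_groups, lam):
    """Euclidean prediction loss plus L2 weight penalty, as in Equation (1).

    target_clips / predicted_clips: lists of flattened clips (lists of floats).
    weight_groups: flattened weights of the encoder, the fully connected
        layers, and the decoder (assumed layout, for illustration only).
    lam: regularization strength (value not given in the paper).
    """
    data_term = sum(
        squared_l2([t - p for t, p in zip(target, pred)])
        for target, pred in zip(target_clips, predicted_clips)
    )
    reg_term = sum(squared_l2(w) for w in weight_groups)
    return data_term + lam * reg_term


if __name__ == "__main__":
    value = objective([[1.0, 2.0]], [[0.0, 0.0]], [[1.0], [2.0], [2.0]], 0.5)
    print(value)
```

A larger `lam` shifts the balance from fitting the future clip toward keeping the weights small.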

### *3.2. Spatial and Temporal Locating*

#### 3.2.1. Abnormal Scenario Spatial Localization

Since the model was trained only on normal clips, abnormal scenarios cause a larger reconstruction error when they appear in a clip. It is therefore crucial to find the pixels that cause the reconstruction error to increase, since their locations indicate abnormal scenarios. Based on this assumption, we propose an AS-CAM module to locate abnormal scenarios, which was developed from Grad-CAM++ [33]. The flow of AS-CAM is shown in Figure 3.

**Figure 3.** Flowchart of AS-CAM.

Grad-CAM++ builds on CAM [34] and Grad-CAM [35]. Its principle is to assign weights to the activation maps; solving for these weights gives the contribution of each activation map to the reduction in the objective function, and multiplying the weights by the activation maps yields heat maps that show which areas contribute most to that reduction. In our task, we needed to find the areas that contributed most to the increase in reconstruction error, which are the locations of irregular construction in the construction scenarios. The abnormal location is calculated as follows:

$$L_{ij} = \sum_{k} w_k \cdot A_{ij}^k \tag{2}$$

where *Lij* is the value of the saliency map of abnormal scenarios at spatial location (*i*,*j*), *A<sup>k</sup><sub>ij</sub>* is the value of the *k*th activation map at that location, and *wk* is the weight of the *k*th activation map. The value of *wk* is obtained by multiplying the pixel weights *α<sup>k</sup><sub>ij</sub>* by the loss gradient passed through the negative ReLU (NReLU) activation and summing over all pixels:

$$w_k = \sum_{i} \sum_{j} \alpha_{ij}^k \cdot \text{NReLU} \left( \frac{\partial \Delta J}{\partial A_{ij}^k} \right) \tag{3}$$

where *α<sup>k</sup><sub>ij</sub>* is the weighting coefficient of pixel (*i*,*j*) when *wk* is calculated, and NReLU is the activation function. *α<sup>k</sup><sub>ij</sub>* is calculated as follows:

$$\alpha_{ij}^k = \frac{\dfrac{\partial^2 \Delta J}{\left(\partial A_{ij}^k\right)^2}}{2\dfrac{\partial^2 \Delta J}{\left(\partial A_{ij}^k\right)^2} + \sum_a \sum_b A_{ab}^k \dfrac{\partial^3 \Delta J}{\left(\partial A_{ij}^k\right)^3}} \tag{4}$$

where (*i*,*j*) and (*a*,*b*) are iterators over the same activation map. NReLU is designed to activate the negative gradients, so the expression of NReLU is as follows:

$$\text{NReLU} = f(x) = \min(x, 0) \tag{5}$$

In Equations (3) and (4), Δ*J* is the reconstruction error for the input *Vi*. It can be calculated by backpropagation of the I3D-AE, and the equation is as follows:

$$\Delta J = \left\| V_p - I_w \left( F_w \left( D_w \left( V_i \right) \right) \right) \right\|_2^2 \tag{6}$$

Therefore, the saliency map of abnormal scenarios is calculated by Equation (7) as follows:

$$L = \sum_{i} \sum_{j} L_{ij} \tag{7}$$
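The weighting scheme of Equations (2) to (5) can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: it assumes the activation maps and the first, second, and third derivatives of the reconstruction error with respect to them have already been computed (for example, by automatic differentiation) and are passed in as nested lists. The function names and the handling of a zero denominator are assumptions.

```python
def nrelu(x):
    """Negative ReLU of Equation (5): keeps negative values, zeroes the rest."""
    return min(x, 0.0)


def as_cam_saliency(acts, grad1, grad2, grad3):
    """Per-pixel saliency values L_ij following Equations (2)-(4).

    acts, grad1, grad2, grad3: nested lists indexed [k][i][j] holding the
    activation maps A^k and the first, second, and third derivatives of the
    reconstruction error with respect to them (assumed precomputed).
    """
    height, width = len(acts[0]), len(acts[0][0])
    saliency = [[0.0] * width for _ in range(height)]
    for k in range(len(acts)):
        act_sum = sum(sum(row) for row in acts[k])  # sum over (a, b) of A_ab^k
        w_k = 0.0
        for i in range(height):
            for j in range(width):
                denom = 2.0 * grad2[k][i][j] + act_sum * grad3[k][i][j]
                # Assumption: a zero denominator is treated as alpha = 0.
                alpha = grad2[k][i][j] / denom if denom != 0.0 else 0.0
                w_k += alpha * nrelu(grad1[k][i][j])  # Equation (3)
        for i in range(height):
            for j in range(width):
                saliency[i][j] += w_k * acts[k][i][j]  # Equation (2)
    return saliency


if __name__ == "__main__":
    acts = [[[1.0, 3.0]]]
    grad1 = [[[-2.0, 4.0]]]
    grad2 = [[[1.0, 1.0]]]
    grad3 = [[[0.0, 0.0]]]
    print(as_cam_saliency(acts, grad1, grad2, grad3))
```

In this sketch only pixels with a negative gradient contribute to the map weights, because NReLU zeroes positive gradients.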

### 3.2.2. Abnormal Scenario Temporal Localization

For the temporal localization of abnormal scenarios, we calculated the reconstruction error for each frame by subtracting the pixel value of each frame from the pixel value of the corresponding frame of the reconstructed video.

$$\varepsilon_t = \frac{\sum_{x, y} \left\| V(x, y, t) - I(F(D(V(x, y, t)))) \right\|_2}{x \cdot y} \tag{8}$$

where *V*(*x*, *y*, *t*) indicates the pixel value of a frame at time *t*, and *I*(*F*(*D*(*V*(*x*, *y*, *t*)))) indicates the pixel value of the corresponding reconstructed frame. After normalizing the reconstruction error, we obtained the temporal information parameter as follows:

$$S(t) = \frac{\varepsilon_t - \varepsilon_{\min}}{\varepsilon_{\max} - \varepsilon_{\min}} \tag{9}$$
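The per-frame error and its normalization can be sketched as below. The sketch assumes single-channel (grayscale) frames, so the norm of a pixel difference reduces to an absolute value, and it assumes that the minimum and maximum errors are taken over the frames of the clip being scored; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
def frame_errors(video, reconstruction):
    """Mean absolute per-pixel error for each frame, following Equation (8).

    video, reconstruction: nested lists indexed [t][y][x] of grayscale values.
    """
    errors = []
    for frame, recon in zip(video, reconstruction):
        total = sum(
            abs(p - q)
            for row, recon_row in zip(frame, recon)
            for p, q in zip(row, recon_row)
        )
        num_pixels = len(frame) * len(frame[0])
        errors.append(total / num_pixels)
    return errors


def temporal_scores(errors):
    """Min-max normalize per-frame errors to [0, 1], following Equation (9)."""
    lowest, highest = min(errors), max(errors)
    if highest == lowest:
        # Assumption: a constant error sequence maps to all zeros.
        return [0.0 for _ in errors]
    return [(e - lowest) / (highest - lowest) for e in errors]


if __name__ == "__main__":
    video = [[[0, 0], [0, 0]] for _ in range(3)]
    recon = [[[0, 0], [0, 0]], [[1, 1], [1, 1]], [[2, 2], [2, 2]]]
    print(temporal_scores(frame_errors(video, recon)))
```

Frames whose score is close to 1 are the candidates for abnormal frames; the decision threshold is left to the user.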

### **4. Experiments**

### *4.1. Construction Site Dataset*

The dataset was recorded at a construction site in Chengdu with an iPhone 13. The aim was to simulate a fire scenario on a construction site and to detect it in time and space. Open fires are generally prohibited on construction sites, so we used footage of steel welding as a substitute. Since welding steel produces firelight and smoke, it simulates the characteristics of fire well. Video without firelight or smoke was used as the normal footage for training I3D-AE, and welding footage containing firelight and smoke was used as the abnormal footage for testing. The dataset contained a total of 62 min of normal video and 3 min of abnormal video.

### *4.2. Evaluation Criteria*

To evaluate the abnormal scenario detection method, the receiver operating characteristic (ROC) curve was used as an evaluation metric. It is obtained from a series of different dichotomies (cutoff values or decision thresholds), with the true-positive rate (TPR) representing the percentage of samples correctly judged as positive among all samples that are actually positive, and the false-positive rate (FPR) representing the percentage of samples incorrectly judged as positive among all samples that are actually negative. The ROC curve shows how TPR and FPR change as the threshold varies, and the fuller the curve, the better the classification. TPR and FPR are calculated as follows:

$$TPR = \frac{TP}{TP + FN} \tag{10}$$

$$FPR = \frac{FP}{FP + TN} \tag{11}$$

where FN, FP, TN, and TP denote false negatives, false positives, true negatives, and true positives, respectively. As shown in Table 1, FN indicates a sample that is judged to be negative but is in fact positive, FP indicates a sample that is judged to be positive but is in fact negative, TN indicates a sample that is judged to be negative and is in fact negative, and TP indicates a sample that is judged to be positive and is in fact positive.

**Table 1.** Meaning of TP, TN, FP, and FN.


The effectiveness of the proposed method was evaluated from two perspectives: pixel level and frame level. In addition to the ROC curve, both evaluations use the area under the curve (AUC) as a metric, which is the area enclosed by the ROC curve and the coordinate axes; the larger it is, the better the performance of the classifier.
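The TPR, FPR, and AUC definitions above can be sketched without any external library. This is a minimal illustration that assumes both classes are present in the labels and that a higher predictor value means "more likely abnormal"; the function names are hypothetical and the trapezoidal rule is one common way to compute the area.

```python
def roc_curve(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    scores: predictor values (higher means more likely positive).
    labels: 1 for positive (abnormal) samples, 0 for negative (normal) ones.
    Assumption: both classes occur at least once.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # FPR = FP / (FP + TN), TPR = TP / (TP + FN), Equations (11) and (10).
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    pts = roc_curve([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
    print(pts, auc(pts))
```

For the frame-level test the scores would be the *St* values; for the pixel-level test they would be derived from the overlap between localized and true anomaly areas.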

### (1) Pixel level

The ROC curves at the pixel level were designed to assess the spatial localization ability of the proposed method, for which the intersection-over-union (IoU) ratio was used as a predictor. IoU, a concept used in object detection, is the ratio of the intersection of the generated candidate boxes and the ground-truth boxes to their union, and is calculated as shown in Equation (12). Ideally, the two areas overlap completely, i.e., the IoU is 1. If the IoU between the localized area and the true anomaly area is greater than a threshold, the localization result for that frame is defined as a positive sample; if the IoU is less than the threshold, the localization result for that frame is defined as a negative sample.

$$IoU = \frac{\text{area}(L \cap G)}{\text{area}(L \cup G)} \tag{12}$$
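The overlap ratio and the positive/negative decision for a frame can be sketched as follows. The sketch assumes that the localized area and the true anomaly area are given as binary masks of equal size, which is one possible representation; the function names and the default threshold of 0.5 are hypothetical, since the text does not state the threshold used.

```python
def iou(localized, ground_truth):
    """Intersection-over-union of two binary masks given as nested lists."""
    intersection = 0
    union = 0
    for row_l, row_g in zip(localized, ground_truth):
        for l, g in zip(row_l, row_g):
            intersection += 1 if (l and g) else 0
            union += 1 if (l or g) else 0
    # Assumption: two empty masks give an overlap ratio of 0.
    return intersection / union if union else 0.0


def is_positive(localized, ground_truth, threshold=0.5):
    """Label a frame's localization result as positive if IoU > threshold."""
    return iou(localized, ground_truth) > threshold


if __name__ == "__main__":
    localized = [[1, 1], [0, 0]]
    truth = [[1, 0], [1, 0]]
    print(iou(localized, truth), is_positive(localized, truth))
```

Sweeping `threshold` over a range of values produces the series of dichotomies from which the pixel-level ROC curve is drawn.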

#### (2) Frame level

The ROC curves at the frame level were designed to assess the temporal localization ability of the proposed method, for which the *St* value was used as a predictor.

#### *4.3. Implementation Details*

We resized all video clips to 224 × 224 and calculated the optical flow between adjacent frames [36]. I3D-AE was trained using the Adam optimizer [37] with a learning rate of 0.0001, a mini-batch size of 100, and 200 epochs. Our model was implemented in TensorFlow 1.14 and trained on a 2080 Ti GPU. The model parameters are shown in Table 2.


**Table 2.** Main parameters of the I3D-AE model.

### **5. Experimental Results**

### *5.1. Visualization of Spatial Localization Results*

Figure 4 shows the experimental results of the method proposed in this study on a construction site dataset. When workers perform welding, fire and smoke cause changes in the reconstruction errors, which are localized. When there is no smoke, the proposed method locates the flame more accurately, but when there is smoke, the smoke causes an increase in the reconstruction error and the size of the localized area is larger than the real area.

**Figure 4.** Experimental results of the construction site dataset.

### *5.2. Temporal Localization Results*

Figure 5 shows the variation in *St* values for the construction site dataset and distinguishes normal and abnormal frames by background color. In normal frames, the value of *St* is small and the curve is flat, while in abnormal frames the value of *St* is markedly higher, so the two cases can be clearly distinguished. In both scenarios, *St* shows elevated values in some normal frames, whereas it does not appear flat in abnormal frames, indicating that the method mildly over-recognizes abnormal scenarios. As shown in Figure 6, an AUC of 0.901 for the frame-level ROC curve indicates that *St* distinguishes well between normal and abnormal frames in time.

**Figure 5.** The *St* curves of a clip in the construction site dataset.

**Figure 6.** ROC curve at frame level.

#### *5.3. Ablation Experiments*

In this section, the effectiveness of each module of the proposed method is tested through a set of ablation experiments. The experiments are divided into two groups. In Experiment 1, the four I3D modules in the I3D-AE module were replaced with C3D models, i.e., the commonly used 3DCNN video feature extraction networks, and then the module was trained and tested with the replaced network and the construction site dataset. This experiment tested the feature extraction effectiveness of the proposed I3D-AE. In Experiment 2, AS-CAM was replaced with the reconstruction error-based localization method, i.e., the reconstruction error was used as the activation map to achieve the localization of anomalous areas, and then the replaced network and the construction site dataset were used for training and testing; this experiment tested the localization effect of the proposed AS-CAM. The ROC curves and AUC values were used to show the localization ability of the network used in the experiment.

As seen in Figure 7, the ROC curves of the proposed method are fuller than those of the C3D-based method, indicating that the proposed method outperformed the C3D-based method in locating anomalous areas. The reason for this is that the C3D-based method is weaker than I3D in video feature extraction, and I3D has the advantage of pretraining; therefore, it can capture more details in the video through multiple I3D modules in series, making it a stable and fast video feature extraction method.

**Figure 7.** Pixel-level ROC curves with video feature extraction method changes.

Figure 8 shows that AS-CAM localizes better, judging by the area enclosed by the curves and the axes, and that the reconstruction error-based method lags well behind the proposed method. The reason is that, in video prediction, although the reconstruction error can indicate an anomalous area, it is also affected by network depth: when the network is deep, the error caused by an anomalous pixel area does not necessarily appear in the corresponding mapped area. This type of method is therefore more suitable for shallow encoder networks.

**Figure 8.** Pixel-level ROC curves with changes in localization method.

### **6. Discussion**


### **7. Conclusions**

Construction safety has always been an important problem in the construction industry, and risk detection on construction sites currently relies mainly on manual inspection. In this paper, deep learning methods were applied to the process of construction risk detection, providing a new perspective for intelligent monitoring of construction sites as follows:


**Author Contributions:** Conceptualization, Z.L. and D.L.; methodology, Z.L.; software, Z.L.; validation, Z.L., B.S. and D.L.; formal analysis, B.S.; investigation, D.L.; resources, D.L.; data curation, B.S.; writing—original draft preparation, Z.L.; writing—review and editing, Z.L.; visualization, B.S.; supervision, D.L.; project administration, D.L.; funding acquisition, D.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors are grateful for the financial support from the National Natural Science Foundation of China (NSFC), under Grant Nos. 51778104, and the Fundamental Research Funds for the Central Universities (Project No. DUT19LAB26).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**

