*Article* **New Aspects of Socioeconomic Assessment of the Railway Infrastructure Project Life Cycle**

#### **Vít Hromádka \*, Jana Korytárová, Eva Vítková, Herbert Seelmann and Tomáš Funk**

Faculty of Civil Engineering, Brno University of Technology, 602 00 Brno, Czech Republic; korytarova.j@fce.vutbr.cz (J.K.); vitkova.e@fce.vutbr.cz (E.V.); seelmann.h@fce.vutbr.cz (H.S.); tomas.funk@email.cz (T.F.)

**\*** Correspondence: hromadka.v@fce.vutbr.cz; Tel.: +420-541-148-641

Received: 25 September 2020; Accepted: 19 October 2020; Published: 21 October 2020

#### **Featured Application: The results of the presented research will become part of the methodological material for the economic analysis of railway infrastructure projects after the completion of a broader research task.**

**Abstract:** The paper deals with the issue of evaluation of socioeconomic impacts of occurrences emerging from railway infrastructure. The presented research results form part of a broader research subject focusing on the evaluation of the socioeconomic benefits of projects for the implementation of measures aimed at increasing the safety and reliability of railway infrastructure. The research topic addresses a part of the evaluation of railway infrastructure project efficiency within its life cycle using the cost–benefit analysis method. The methodology is based on the description and definition of input variables that are essential for the process of evaluating socioeconomic impacts. It is followed by another important step, which is the analysis of the categories and the number of occurrences, separately, for regional and national lines, and, further, the data is sorted according to whether occurrences emerge at stations or on a wide line. The result of the presented research is an overview of the calculated values of the expected socioeconomic impacts of partial occurrences according to the categories related to the year of operation on the railway infrastructure and the unit of measure. The research team carried out an inquiry into the annual impacts of the subcategories of occurrences related to one railway station and one kilometer of wide line, e.g., for national lines, the impacts of €2922.72/station/year and €41.67/km of wide line/year were determined. The results of the presented research represent important and necessary inputs for the next phase of the research topic, i.e., the evaluation of the socioeconomic benefits of projects increasing the safety and reliability of railway infrastructure.

**Keywords:** railway infrastructure; occurrences; socioeconomic impact; economic evaluation; CBA; life cycle

#### **1. Introduction**

The research presented in this paper concerns the socioeconomic evaluation of transport projects, especially railway infrastructure projects. The economic evaluation of public projects is a very broad field; however, its basic principles have been known for many years. Unlike commercial projects, where profit or profit-derived cash flow plays a key role in the economic evaluation, public projects are usually evaluated using cost–output methods. The most important and widely used of these is cost–benefit analysis (CBA). CBA is thoroughly described in a number of guidelines and publications (e.g., [1]). For transport infrastructure projects, there is a detailed methodological guide for its elaboration [2]. However, none of these documents is detailed enough to include methodologies for evaluating all
relevant costs and benefits that arise in connection with transport (and especially railway) infrastructure. The paper presents the results of applied research aimed at incorporating the benefits associated with increasing the safety and reliability of railway infrastructure as a result of the implementation of projects for the installation of new and improved security equipment. The importance of research lies mainly in the fact that, currently, the benefits associated with increasing the safety and reliability of railways are not incorporated into the socioeconomic evaluation carried out in accordance with the methodology (in the case of the Czech Republic, it is departmental methodology [2]) using CBA, although it is clear that these benefits arise from railway infrastructure project implementation. As a result, railway infrastructure projects show worse results in economic efficiency evaluation and seem to be less economically efficient. The presented paper focuses on the evaluation of the socioeconomic impacts of occurrences that emerge from railway infrastructure. The aim of the paper is to methodically describe and verify the procedure for determining the socioeconomic impacts of occurrences that emerge from railway infrastructure in a case study. The Database of Occurrences [3], which contains detailed information on occurrences emerging from the railway in the Czech Republic, managed by employees of the Railway Administration, was used for the purposes of analysis and subsequent synthesis of the obtained data into methodological steps. Occurrences from the 2011–2018 period were used for the purposes of the research. The output of the research presented in the article is to determine the values of the expected annual socioeconomic impacts of occurrences according to the categories related to railway stations or one kilometer of the track segment. 
The outputs of the presented research will be used in follow-up research to determine the potential savings on railway infrastructure that arise from reducing the number of occurrences, characterized by their potential socioeconomic impact. The reduction in the number of occurrences will be achieved by implementing appropriate security measures, and the economic efficiency of these measures will be assessed using the values determined here.

#### **2. Materials and Methods**

The subject of the paper is the analysis and subsequent synthesis of relevant data in order to identify the methodology for evaluating the socioeconomic impacts of occurrences emerging from railway infrastructure. The purpose of the evaluation of these socioeconomic impacts is their subsequent use in the analysis of project costs and benefits in the field of transport infrastructure. This article, from a general point of view, can be included in the issue of socioeconomic evaluation of public investment projects in the field of transport infrastructure. The principles and procedures applied in the process of cost–benefit analysis are methodically described in the Guide to CBA of Investment Projects [1], where the general rules are defined and the specifics for the process of CBA for individual types of public projects are described. The issue of public projects in the field of transport infrastructure is dealt with in detail by the Departmental Guideline of the Ministry of Transport of the Czech Republic [2], which addresses the economic evaluation of transport infrastructure projects, both in general and with a focus on individual modes of transport, i.e., projects of roads and highways, projects of railway line construction and projects in the field of transport-important waterways. The abovementioned methodology provides some procedures for evaluating the socioeconomic impacts of transport projects (e.g., user savings, transport time savings, traffic accident savings, or impact on externalities). However, some key impacts, such as the impact on transport network safety and reliability, dealt with in the research presented here, are not addressed in more detail in the methodology. One of the aims of the paper is to explore the possibilities for supplementing the methodology for the socioeconomic evaluation of investment projects in the field of railway infrastructure by assessing its impact on the safety and reliability of railway infrastructure.

A review of the available scientific literature did not identify any research team that had directly assessed the impacts associated with increasing the safety of railway infrastructure. However, the review included texts dealing with occurrences emerging from railway infrastructure and their causes, as well as their technical impacts (material damage, train delays). The basic input task was the identification of occurrences emerging from railway infrastructure, their impact, prevention, and classification. Santos-Reyes [4] carried out a general analysis of traffic accidents on the railway and presented basic points to be addressed in order to prevent occurrences in railway infrastructure. The methodology developed at the Dnipro National University of Railways [5] defined categories of occurrences in railway infrastructure in relation to the amount of material damage caused. Occurrences were classified according to the severity of their consequences, expressed in physical quantities. That paper provides an overview of the financial losses associated with subcategories of occurrences, which is a suitable data set for comparison with the partial outputs of the presented results.

Klockner and Toft dealt with the modeling of occurrences on the railway in their study [6]. A second significant group of publications deals with the factors that influence the emergence of occurrences from railway infrastructure. Iridiastadi and Ikatrinasari [7] presented a classification system, including subfactors, with the potential to influence the emergence of occurrences. Zhou and Lei also addressed the causes of occurrences on the railway [8]. Baysari et al. [9] presented a detailed analysis of errors leading to railway occurrences, based on a set of forty investigation reports on occurrences in railway infrastructure in Australia. The study concluded that up to half of the occurrences were caused by equipment failure due to insufficient maintenance. The conclusions of [9] are important in relation to the presented research, which examines in detail mainly occurrences that emerge as a result of human factor failure rather than a technical defect. Consequently, the following papers focus directly on the influence of the human factor on the emergence of occurrences from railway infrastructure. One example is the paper by Hani Tabai et al. [10], which evaluated the relationship between engine driver demography, cognitive performance, and the risk of an occurrence; it identified the need to pay continuous attention as one of the most important reasons for occurrences caused by engine driver error. The study of Zhan [11] presents a qualitative and quantitative analysis method for detecting the human- and organization-related causes of railway accidents and proposes the HFACS-RAs framework, based on incident and accident data from the railway industry.
Evans [12], in his paper, dealt with the influence of the speed of trains on railway infrastructure and the number and severity of occurrences and extended his own previous statistics by the influence of train speed on the severity of the occurrences. This represents an important aspect of occurrences emerging from railway infrastructure; however, this dimension was not considered within the presented research.

A further issue, albeit a rather marginal one for the presented research, is the causes of occurrences emerging at railway crossings [13] and as a result of suicides [14,15]. These occurrences represent a significant part of the occurrences recorded in the database used. However, they are not considered in more detail in the analysis presented in this paper. Occurrences emerging at railway crossings are assessed according to a separate methodology, so the development of that methodology is not the subject of the presented research. Occurrences emerging as a result of suicides cannot be effectively prevented by the implementation of appropriate measures and, therefore, also do not fall within the scope of the research.

The last important area relevant to the scope of the presented paper is the prevention of occurrences in railway infrastructure. Kim et al. [16] presented the results of a factual analysis of the railway occurrence rate and subsequently proposed preventive care systems focused mainly on the development of a railway safety training program and railway safety training centers. Edkins and Pollock analyzed occurrences in railway infrastructure and proposed preventive measures aimed at reducing their number [17]. Their work comprised a retrospective analysis of 112 railway occurrences in Australia, which revealed a tendency to human error and subsequently led to the development of a railway safety checklist. Evans presented an overview of the development of statistics on rail occurrences in Europe and a proposal for preventive measures to reduce the number of occurrences [18]. Cheng and Tsai [19] dealt with competencies in occurrence management and with the restoration of normal operations after an occurrence has emerged and been resolved. The abovementioned research explains the emergence of occurrences and the possibilities for reducing their number, and its findings correspond to those of the research team for Czech conditions. The conclusions of these articles confirm the topicality of the issue and the need to identify and quantify the impacts of occurrences. The presented research builds on these foundations and proceeds to the next step, i.e., the quantification of impacts from the socioeconomic point of view.

The authors of the article, following the above-listed research of foreign resources, use other surveys and analyses based on the issues in the national environment and focus on the usability of technical data for socioeconomic analysis (evaluation of economic efficiency), which can be used as a basis for the decision-making process in investment projects for the security devices. The main goal of the research is to develop and present methodological steps for the evaluation of socioeconomic impacts of occurrences emerging from railway infrastructure and to verify the methodology on the occurrences emerging from railway infrastructure in the Czech Republic. The main result of the applied research presented in this paper is the determination of the value of the annual socioeconomic impact of occurrences emerging from railway infrastructure in the Czech Republic per one kilometer of the track segment and one railway station, divided into both national and regional lines.

#### *2.1. Data*

The basic prerequisite for the research is a database of occurrences in the region, national economy, or territorial unit under study. The key source of data for the creation of the methodology and the elaboration of the case study is the Database of Occurrences, managed by the employees of the Railway Administration of the Czech Republic, in which all occurrences arising in the railway infrastructure of the Czech Republic in the 2009–2018 period were registered [3]. The Database of Occurrences is not publicly accessible; it was provided to the research team by the Railway Administration of the Czech Republic, upon request, exclusively for the purpose of conducting the presented research. It is a very large document that contains more than 500,000 items of information.

#### *2.2. Methods*

The proposed methodology is based on the collection, analysis, and use of relevant data to evaluate the societal benefits arising from the establishment of security measures. The outputs are intended for later use with the "opportunity cost" valuation approach.

The subject of the paper is to develop and present a methodology for the quantification and evaluation of the socioeconomic impacts of occurrences emerging from railway infrastructure, divided into relevant subcategories, following the scientific literature research and basic principles of economic analysis of public investment projects.

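Development of the methodology for the evaluation of the socioeconomic impacts of occurrences emerging from railway infrastructure consists of the following steps:

- Analysing the Database of Occurrences, defining the categories, and selecting the data for further use;
- Defining the key characteristics of occurrences and their economic evaluation;
- Determining the expected impact of occurrences by category.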

The expected socioeconomic impacts of occurrences by category, per kilometer of railway line and per station, are calculated in Section 3. The partial steps are described in detail in the following sections.

2.2.1. Analysing the Database of Occurrences, Defining the Categories, and Selecting the Data for Further Use

Occurrences emerging within the railway infrastructure can be classified into three basic groups:


Subcategories of occurrences were defined within the mentioned groups of occurrences, which differ mainly in the causes of the occurrences and their impacts. For the purposes of the presented research, the following variables, which are relevant to the methodological procedure, were selected from the Database of Occurrences:

- Death,
- Serious injury,
- Minor injury,
- Technical,
- Human factor,
- Others.

These variables are listed in the Database of Occurrences for each specific occurrence.

The Database of Occurrences [3] could not be used for further research in its original version because the recorded variables changed from year to year and the data structure was not consistent. The researchers, therefore, made the adjustments necessary for its further use. The Database of Occurrences was then analyzed to determine the average, minimum, and maximum annual impact values (number of occurrences, number of deaths, number of serious and minor injuries, material damage, and costs), with their variability monitored using the standard deviation. Following the results of the analysis, it was decided that the follow-up research would not include occurrences due to suicides and those emerging at railway crossings. Detailed results of this analysis, together with the resulting conclusions and recommendations, were published in [20].
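
The descriptive statistics mentioned above can be sketched in a few lines of Python. This is only an illustration: the yearly counts are hypothetical placeholders rather than values from the Database of Occurrences, and the use of the sample standard deviation is an assumption, since the text does not say which variant was applied.

```python
from statistics import mean, stdev

# Hypothetical yearly counts for one occurrence category (NOT real data).
annual_counts = {2011: 12, 2012: 9, 2013: 15, 2014: 11}


def summarize(values):
    """Return the average, minimum, maximum and sample standard deviation."""
    values = list(values)
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": stdev(values),  # assumption: sample (n - 1) standard deviation
    }


summary = summarize(annual_counts.values())
print(summary)
```

The same helper could be applied to any of the monitored variables (deaths, injuries, material damage) once they are aggregated by year.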

2.2.2. Defining the Key Characteristics of Occurrences and Their Economic Evaluation

The research team defined the information and characteristics of partial occurrences within the Database of Occurrences that provided data on the partial socioeconomic impacts of the occurrences on society. An overview of this information and characteristics is presented in the previous section. Impacts in the form of costs of the removal of material damage or other direct costs incurred are quantified for each individual occurrence and are listed directly in the occurrence database. The research, therefore, focused on the evaluation of the remaining variables. Subsequent research was thus focused on the evaluation of the following characteristics of the occurrences:


The unit impact of an occurrence in a particular category was determined using Relation (1).

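$$\text{AUI}\_{\text{O}} = \sum\_{\text{i}=1}^{3} \text{AIH}\_{\text{i}} + \sum\_{\text{j}=1}^{2} \text{AITD}\_{\text{j}} + \text{AML} \tag{1}$$

where AUIO is the average unit impact of an occurrence in the given category; AIHi is the average impact on health for severity i (death, serious injury, minor injury); AITDj is the average impact of train delay (j = 1 for passenger trains, j = 2 for freight trains); and AML is the average material loss.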


The evaluation of the listed impacts was performed in connection with the procedures specified in the Departmental Methodology of the Ministry of Transport of the Czech Republic [2]. In the case of a health impact assessment, the following Relation (2) was used for the evaluation.

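$$\rm{AIH\_i} = \rm{AH\_i} \times \rm{UIH\_i} \tag{2}$$

where AIHi is the average impact on health for severity i; AHi is the average number of persons per occurrence with a health consequence of severity i; and UIHi is the unit impact on health for severity i.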


The input data for the purposes of the case study elaborated for the Czech Republic was taken directly from the Departmental Methodology [2], indicating the unit cost of the occurrence according to its severity. The data is shown in Table 1 below.



Source: Departmental Methodology of the Ministry of Transport [2].

In the case of impacts related to passenger train delay, a calculation was carried out using Relation (3).

$$\text{AITD}\_1 = \text{PTU} \times \text{ADP} \times \text{UIDP} \tag{3}$$

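where AITD1 is the average impact of passenger train delay; PTU is the passenger train utilization (persons per train); ADP is the average delay of passenger trains; and UIDP is the unit impact of passenger train delay (value of passenger time).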


The data presented in the Departmental Methodology [2] and statistical data from the Statistical Yearbook of the Czech Railways Group [21] were used for the purposes of the case study performed for the Czech Republic. Based on these sources, the expected occupancy of one passenger train was determined at 66.55 people/train and the average value of passenger time at 10.63 €/person-hour.
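
As a worked illustration of Relation (3), the following sketch multiplies the two case-study constants quoted above by a delay. It assumes that the delay is expressed in hours, which matches the €/person-hour unit but is not stated explicitly in the text; the half-hour delay is a hypothetical input.

```python
PTU = 66.55   # expected occupancy, persons per passenger train (from the text)
UIDP = 10.63  # value of passenger time, EUR per person-hour (from the text)


def passenger_delay_impact(delay_hours, ptu=PTU, uidp=UIDP):
    """Relation (3): AITD_1 = PTU x ADP x UIDP, with ADP assumed to be in hours."""
    return ptu * delay_hours * uidp


# Hypothetical average delay of 30 minutes per occurrence.
impact_half_hour = passenger_delay_impact(0.5)
print(f"{impact_half_hour:.2f} EUR per occurrence")
```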

In the case of the evaluation of the impacts associated with the delay of freight trains, the following Relation (4) was used for the calculation:

$$\text{AITD}\_2 = \text{CTU} \times \text{ADC} \times \text{UIDC} \tag{4}$$

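where AITD2 is the average impact of freight train delay; CTU is the freight train utilization (tons of freight per train); ADC is the average delay of freight trains; and UIDC is the unit impact of freight train delay (value of freight transport time).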


The average freight weight of a freight train was determined at 455 t/train using statistical data [21] as part of a case study elaborated for the territory of the Czech Republic. The average value of freight transport time was subsequently determined at 0.23 €/ton using the values of freight transport time according to commodities taken from the Departmental Methodology [2] and the percentage rate of individual commodities in freight transport.
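
Relation (4) and the commodity-weighted value of freight time can be sketched in the same way. The commodity shares and per-commodity time values below are hypothetical, and the code assumes that the 0.23 €/ton figure applies per hour of delay, which the text does not state.

```python
CTU = 455.0  # average freight weight, tons per train (from the text)
UIDC = 0.23  # average value of freight transport time, EUR per ton (from the text)


def weighted_time_value(shares_and_values):
    """Average the per-commodity time values, weighted by commodity share."""
    total_share = sum(share for share, _ in shares_and_values)
    weighted_sum = sum(share * value for share, value in shares_and_values)
    return weighted_sum / total_share


def freight_delay_impact(delay_hours, ctu=CTU, uidc=UIDC):
    """Relation (4): AITD_2 = CTU x ADC x UIDC, with ADC assumed to be in hours."""
    return ctu * delay_hours * uidc


# Hypothetical commodity mix: (share of freight, EUR per ton).
example_mix = [(0.5, 0.2), (0.5, 0.4)]
example_value = weighted_time_value(example_mix)

# Hypothetical average delay of two hours per occurrence.
impact_two_hours = freight_delay_impact(2.0)
print(example_value, impact_two_hours)
```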

The values of the input quantities used for the calculation of Relations (1)–(4) within the case study were taken from national sources of the Czech Republic. These sources are official methodological documents for the socioeconomic evaluation of public projects and are based on long-term statistical data. The key methodological basis was the Departmental Methodology of the Ministry of Transport, which defines some input variables (e.g., unit impacts on health or the unit cost of passenger time or cargo). The documents containing statistical data include the already mentioned Database of Occurrences and the statistical yearbooks of carriers and of the railway infrastructure administrator.

The quantities UIHi (Unit Impact on Health), UIDP (Unit Impact of Personal Train Delay), and UIDC (Unit Impact of Cargo Train Delay) were taken from the Departmental Guideline [2] for the purposes of the presented research. The quantities AML (Average Material Loss), AHi (Average Number), ADP (Average Delay of Personal Trains), and ADC (Average Delay of Cargo Trains) were determined using data recorded in the Database of Occurrences [3], and the PTU (Personal Train Utilization) and CTU (Cargo Train Utilization) values were derived from statistical data presented in the Statistical Yearbook of Czech Railways [21].
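
To show how the quantities listed above combine, the following sketch implements Relations (1) and (2) as plain functions. All numeric inputs are hypothetical placeholders, not values from Table 1 or from the Database of Occurrences, and the function names are illustrative only.

```python
def health_impact(avg_persons, unit_impact):
    """Relation (2): average health impact = AH_i x UIH_i."""
    return avg_persons * unit_impact


def unit_impact_of_occurrence(health_impacts, delay_impacts, material_loss):
    """Relation (1): sum of health impacts, delay impacts and material loss."""
    return sum(health_impacts) + sum(delay_impacts) + material_loss


# Hypothetical inputs, ordered as death, serious injury, minor injury.
AH = (0.01, 0.05, 0.2)        # average number of affected persons per occurrence
UIH = (1000.0, 300.0, 20.0)   # unit impact on health in EUR (placeholders)

health = [health_impact(a, u) for a, u in zip(AH, UIH)]
delays = [50.0, 21.0]         # passenger and freight delay impacts in EUR (placeholders)
AML = 100.0                   # average material loss in EUR (placeholder)

unit_impact = unit_impact_of_occurrence(health, delays, AML)
print(round(unit_impact, 2))
```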

A detailed calculation of these unit impacts of railway occurrences in railway infrastructure, including a case study to verify the functionality of the evaluation algorithm, is provided in [22].

2.2.3. Determining the Expected Impact of Occurrences by Category

Only those occurrences that emerged due to human error were considered for the calculation of the average impacts of occurrences following the conclusions of [22]. The categories of occurrences listed in Table 2 are, therefore, considered for the current research.


**Table 2.** Categories of occurrences.

Source: Database of Occurrences 2009–2018 [3].

Using the data contained in the Database of Occurrences [3] and the unit impacts of occurrences presented in the previous part of the text, the expected overall socioeconomic impacts of individual categories of occurrences were determined using Relation (1). The overall socioeconomic impacts include the following items:


The presented research made use of selected occurrences on two levels. The first level included occurrences taken from the Database of Occurrences [3] (adjusted for occurrences at railway crossings and suicides), whatever the cause; the second level included occurrences caused by human error. A detailed evaluation of individual categories of occurrences was described in [23]; the expected impacts of individual categories of occurrences are summarized in Table 3.


**Table 3.** Average total economic impacts per occurrence by category.

The presented methodological part resulted from the current research that is focused on the evaluation of the socioeconomic impacts associated with occurrence emergence in railway infrastructure in order to consider the increase in the safety and reliability of the railway in the socioeconomic evaluation of railway infrastructure projects. These methodological steps serve as an input basis for determining the expected impact of occurrences emerging from railway infrastructure related to the network of railway lines and railway stations. The results of this research are presented in the following section.

#### **3. Results**

The key output of the research presented in this article is the determination of the expected socioeconomic impacts of individual categories of occurrences in relation to the system in which they emerge. In general, it can be stated that occurrences emerge from railway tracks or at railway stations. Railway tracks and railway stations are further divided into national and regional categories. The expected annual impact of an occurrence of the relevant category on the station or a kilometer of track is determined using the following Relation (5):

$$\text{TI} = \frac{\sum\_{\text{i}=A1}^{C19} \left( \mathbf{O}\_{\text{i},\text{j},\text{k}} \times \text{UI}\_{\text{i},\text{j},\text{k}} \right)}{\mathbf{Q}\_{\text{j},\text{k}} \times \text{t}} \tag{5}$$

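where TI is the expected annual impact of occurrences per station or per kilometer of track; Oi,j,k is the number of occurrences of category i during the evaluated period, on line type j (national or regional) and at location k (station or track); UIi,j,k is the unit impact of an occurrence of category i; Qj,k is the number of stations or the length of track in kilometers for line type j; and t is the length of the evaluated period in years.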


The Number of Occurrences per evaluated period t is determined separately and further divided into occurrences emerging from the national or regional line and occurrences emerging from the tracks or at the stations. While the information on whether an occurrence emerges from a national or regional line is taken from the Database of Occurrences for the purpose of a case study of a railway line in the Czech Republic, information on whether an occurrence emerges on the tracks or at the stations had to be determined using statistical data. The calculation was performed using Relation (6).

$$\rm{O}\_{i,j,k} = \rm{O}\_{i,j} \times RO\_{i,j,k} \tag{6}$$

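where Oi,j is the number of occurrences of category i on line type j (national or regional) during the evaluated period, and ROi,j,k is the share of these occurrences that emerge at location k (k = 1 for railway stations, k = 2 for railway tracks).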


The ratio of occurrences emerging from railway tracks and at the railway stations was determined using a selected set of data from the Database of Occurrences. This ratio is presented in Table 4.


**Table 4.** Ratio of occurrences emerging from the railway tracks and at railway stations.

In further calculations, it is therefore assumed that the share of occurrences emerging at railway stations (ROi,j,1) is 94.85%, and the share of occurrences emerging from the railway tracks (ROi,j,2) is 5.15%.
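
Relation (6) with the shares stated above can be sketched as follows. The total of 200 occurrences is a hypothetical input chosen only to show the split; the same shares are applied to every category, as the text assumes.

```python
RO_STATION = 0.9485  # share of occurrences at railway stations (from the text)
RO_TRACK = 0.0515    # share of occurrences on railway tracks (from the text)


def split_occurrences(total):
    """Relation (6): O_ijk = O_ij x RO_ijk for stations (k=1) and tracks (k=2)."""
    return {
        "station": total * RO_STATION,
        "track": total * RO_TRACK,
    }


# Hypothetical number of occurrences of one category on one line type.
parts = split_occurrences(200)
print(parts)
```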

As a next step, it is necessary to distinguish between occurrences emerging from the national railway lines and occurrences emerging from the regional lines. The analysis of the Database of Occurrences generally shows a larger number of occurrences on national lines and a smaller number on regional lines.

For the sake of clarity, it is appropriate to summarize which occurrences from the overall Database of Occurrences were used in the final phase of the research. Of the total number of occurrences recorded in the Database of Occurrences, occurrences emerging at railway crossings and as a result of suicides were omitted. The reasons are described in more detail in [20]. Subsequently, those categories of occurrences that, by their nature, cannot be caused by the human factor were omitted. The resulting set was then reduced by the occurrences that were objectively caused by something other than a failure of the human factor. Finally, those occurrences that emerge during handling rides, carriage shifts, or on sidings were omitted. The resulting set of occurrences was used for the final analysis.
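
The selection steps summarized above amount to a sequence of filters. The sketch below shows one way to express them; the record fields, their values, and the category codes are hypothetical and do not reflect the actual structure of the Database of Occurrences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    category: str      # hypothetical category code, e.g. "A1"
    at_crossing: bool  # emerged at a railway crossing
    suicide: bool      # emerged as a result of a suicide
    cause: str         # "human", "technical" or "other"
    movement: str      # "train", "handling", "shunting" or "siding"


def select_for_analysis(records, human_capable_categories):
    """Apply the four exclusion steps in the order described in the text."""
    selected = []
    for rec in records:
        if rec.at_crossing or rec.suicide:
            continue  # step 1: crossings and suicides are omitted
        if rec.category not in human_capable_categories:
            continue  # step 2: category cannot be caused by the human factor
        if rec.cause != "human":
            continue  # step 3: objectively caused in a different way
        if rec.movement != "train":
            continue  # step 4: handling rides, carriage shifts, sidings
        selected.append(rec)
    return selected


sample = [
    Occurrence("A1", False, False, "human", "train"),
    Occurrence("A1", True, False, "human", "train"),
    Occurrence("B2", False, False, "technical", "train"),
    Occurrence("C3", False, False, "human", "train"),
    Occurrence("A1", False, False, "human", "siding"),
]
kept = select_for_analysis(sample, human_capable_categories={"A1", "B2"})
print(len(kept))
```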

An important input for the final calculation was also the extent of the railway network. Using the documents of the Railway Administration [24–26], the total number of railway stations on national and regional lines and the total length in kilometers of national and regional railway lines were determined. An overview of the resulting input data is given in Table 5.


**Table 5.** Overview of the input data for the final calculation.

\* Values taken from Table 3.

Using the input data listed in Tables 2–4, the values of expected annual socioeconomic impacts were calculated for individual relevant categories of occurrences related to one kilometer of railway track and one railway station, separately for national and regional lines. The calculation was performed according to Equation (5), and its results are presented in Table 6.
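
The calculation according to Relation (5) can be sketched as below. The category codes, occurrence counts, unit impacts, and number of stations are hypothetical placeholders; only the structure of the calculation follows the text.

```python
def expected_annual_impacts(occurrences, unit_impacts, quantity, years):
    """Relation (5) per category: O_i x UI_i / (Q x t).

    occurrences  -- occurrences per category over the evaluated period
    unit_impacts -- unit impact per occurrence for each category (EUR)
    quantity     -- number of stations or kilometers of track
    years        -- length of the evaluated period in years
    """
    return {
        cat: occurrences[cat] * unit_impacts[cat] / (quantity * years)
        for cat in occurrences
    }


# Hypothetical inputs for stations on one line type over an 8-year period.
occurrences = {"A1": 16, "B2": 8}
unit_impacts = {"A1": 1000.0, "B2": 500.0}

per_category = expected_annual_impacts(occurrences, unit_impacts, quantity=100, years=8)
total_impact = sum(per_category.values())  # total over all categories
print(per_category, total_impact)
```

Summing the per-category values gives the total expected annual impact per station or per kilometer of track for the given line type.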

**Table 6.** Calculation of the expected annual impact of subcategories of occurrences per station and per km of track.


Table 6 shows the calculation and the resulting values of the expected annual socioeconomic impact on one railway station and on one kilometer of track section for individual categories of occurrences. The total expected annual socioeconomic impact of occurrences was determined as the sum of the impacts of the individual categories. The total values are given in Table 7.


**Table 7.** Values of the overall socioeconomic impacts of occurrences.

The resulting values, given in Tables 6 and 7, represent the expected values obtained from historical data taken from the Database of Occurrences for the 2011–2018 period and from the documents of the Railway Administration defining the railway transport network. These outputs clearly point to the importance of occurrences emerging from the railway and of their socioeconomic impact on society as a whole. The obtained results also have further use: as part of follow-up research activities, these values will be used to calculate the socioeconomic benefits associated with increasing the safety and reliability of the railway line as a result of the implementation of projects that increase the level of security of the railway network.

#### **4. Discussion**

The subject of the presented research is the definition of methodological steps for determining the overall economic impacts of occurrences emerging from railway infrastructure, related to a kilometer of wide line or a railway station and to one year. The methodological steps consist of determining the expected socioeconomic impacts for subcategories of occurrences in the form of direct financial impacts, impacts on human health, and impacts on delays of both passenger and freight trains. The partial impacts of individual categories of occurrences were subsequently related to a purpose unit, i.e., a kilometer of wide line or one railway station. The case study used to verify the functionality of the proposed procedure was based on data obtained by the long-term monitoring of traffic on the railway infrastructure in the Czech Republic. When interpreting the calculations, it is, therefore, necessary to consider the possible specifics of railway transport in the Czech Republic compared with other countries. The authors also point out that some input variables were used in values corresponding to the conditions of the Czech Republic. This mainly concerns the data taken from the Departmental Methodology of the Ministry of Transport of the Czech Republic, such as the unit impacts of traffic accidents on health, the value of passenger time, or the cost of transported cargo.

The calculations also make use of national rail transport statistics, which may differ from those of other countries. While respecting the abovementioned limitations, it is possible to proceed to the discussion of the obtained results. A very interesting partial result is the ratio between occurrences emerging on a wide line and at railway stations. As shown in Table 4, on average, approximately 95% of occurrences in the Czech Republic emerge at railway stations and only 5% on a wide line. The conclusions resulting from these findings are commented on below. The key outputs of the research are the values of the expected annual impact of the subcategories of occurrences per kilometer of wide line or per railway station, given in Table 6. Table 6 shows the values for occurrences emerging on a national or regional line and on a wide line or at railway stations. For all categories of occurrences, the dominance of national lines is evident, both for occurrences at railway stations and for occurrences on a wide line. These findings are clearly visible from the total values of impacts of all categories of occurrences per kilometer of wide line or per railway station, as listed in Table 7. The results shown in Table 7 clearly demonstrate the importance of occurrences emerging on national lines (an almost 6.8-times higher unit impact in the case of railway stations and a 5.6-times higher impact in the case of a wide line) compared to regional lines. Considering these results, and while respecting the restrictions arising from the data used and from inputs related solely to the Czech Republic, it is generally recommended to direct attention and resources toward reducing the risk of occurrences at railway stations on national lines, as this is where the risk of damage caused by occurrences emerging from railway infrastructure is highest. This conclusion can be useful both in planning new projects for the construction and modernization of railway infrastructure and in decisions on the allocation of investment funds.
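As an illustration of how such unit values might be applied, the following minimal sketch scales the national-line unit impacts quoted in the abstract (€2922.72/station/year and €41.67/km of wide line/year) to a line section. The additive combination, the function name, and the section dimensions (5 stations, 60 km) are assumptions made for this example only and are not part of the presented methodology.

```python
# Unit impacts for national lines as quoted in the abstract (EUR per year).
IMPACT_PER_STATION = 2922.72
IMPACT_PER_KM_WIDE_LINE = 41.67


def expected_annual_impact(stations: int, km_wide_line: float) -> float:
    """Expected annual impact of occurrences for a line section.

    Assumption: unit impacts are simply scaled by the number of stations
    and by the length of wide line, and then added together.
    """
    return stations * IMPACT_PER_STATION + km_wide_line * IMPACT_PER_KM_WIDE_LINE


# Hypothetical section of a national line: 5 stations and 60 km of wide line.
section_impact = expected_annual_impact(stations=5, km_wide_line=60)
print(f"Expected annual impact: EUR {section_impact:,.2f}")
```

In this hypothetical case, the stations account for most of the total, which is in line with the recommendation to focus on stations on national lines.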

Authors of the results presented in [5] arrived at methodologically similar partial results. This text provides information on the average losses associated with the occurrences in Ukraine. The categorization of occurrences used in this article is slightly different from the categories used in the presented research; a certain comparison of both is possible. However, this is not a comparison of the final results of the presented research, but a comparison of the processing of partial results listed in Table 3, i.e., the expected impacts of subcategories of occurrences. In the case of the results published in [5], the following results can be given as an example:


It is evident that the values reported in [5] differ from the corresponding results given in Table 3 of this article. Because of different levels of purchasing power, the absolute values cannot be compared directly without further adjustments. In part, however, the difference can also be attributed to the fact that the presented research determined the overall impact, including socioeconomic damage, whereas [5] considered financial loss exclusively. Another reason may result from the fact that only occurrences associated with human factor failure were considered in the case of the presented research results. A slightly different classification of occurrences may also play a role. The authors also note that while the financial loss in [5] is a research result, the value used in the presented research was obtained from long-term relevant statistics administered by a state authority.

The findings of the presented research have a significant impact on the socioeconomic evaluation of transport infrastructure projects, which forms an integral part of the life cycle of a public investment project. The long-term experience of the research team shows that, despite the quality of the methodological data available, a comprehensive socioeconomic evaluation of a railway infrastructure project is not currently possible: the existing methodological documents lack a procedure for taking into account the increase in the safety and reliability of the railway line, although such an increase is closely associated with most of the evaluated projects. The presented results provide important information on the values of the socioeconomic impacts of subcategories of occurrences. When the frequency of their emergence, as derived from a detailed database of occurrences, is taken into account, the socioeconomic impacts can be related to purpose units, in this case one railway station or one kilometer of railway track. All this is carried out separately for national and regional lines, as the research revealed significant differences in the number and severity of occurrences between the two.

Follow-up research will focus on the use of the obtained data for the development of a methodology for the evaluation of benefits associated with increasing the safety and reliability of the railway. The authors assume that the results of the research will be incorporated into the existing Departmental Methodology, which addresses the course of the socioeconomic evaluation of projects in the field of transport infrastructure in the Czech Republic.

#### **5. Conclusions**

The paper presents the results of applied research aimed at refining the evaluation of socioeconomic benefits associated with increasing the safety and reliability of the railway line as a result of the implementation of investment projects in the field of railway infrastructure within a CBA analysis. The aim of the paper was to methodically describe and verify a case study on the procedure for determining the socioeconomic impacts of occurrences that emerge from railway infrastructure. The article presents partial methodological steps that have been developed for these purposes. The research is based on a detailed Database of Occurrences emerging from the railway line in the Czech Republic, using data for the 2011–2018 period, which originally included a total of 8455 occurrences divided into subcategories. After the elimination of the occurrences connected with suicides and accidents at railway crossings, a total of 5378 occurrences was used for a more detailed analysis. The first step was to define the methodological steps for determining the socioeconomic impact of an occurrence emerging on a railway line. The overall socioeconomic impact includes material damage and costs, evaluated costs associated with health impacts, and evaluated impacts associated with delays of both passenger and freight trains. Using these documents and the entire Database of Occurrences, the expected value of the impact of an occurrence of a relevant category was subsequently determined. Combining it with additional information on the railway transport network and the ratio of occurrences emerging on the national or regional line, the expected annual impacts of occurrences of individual categories on a railway station or one kilometer of track were derived in the final phase of the research. 
The main goal of the paper was to present methodological steps for evaluating the socioeconomic impacts of occurrences emerging from railway infrastructure and to verify the methodology in the case of occurrences emerging from railway infrastructure in the Czech Republic area. The research presented in the paper assessed new aspects of the socioeconomic evaluation of railway projects and subsequently proposed new procedures aimed at quantifying the socioeconomic impacts of occurrences emerging from railway infrastructure. In the follow-up research, the existing process of evaluating the economic efficiency of railway infrastructure projects will be supplemented by a new dimension, i.e., considering the increase in the reliability and safety of railway infrastructure.

When interpreting the results obtained, it is necessary to accept certain limitations associated with the research carried out, as well as the data used. While the proposed methodological steps are generally applicable for the purposes defined in the research goal, the results of the case study are influenced by the fact that they were determined using data specific to the Czech Republic area. For the purposes of the case study, data from the occurrence database emerging exclusively from the Czech Republic area were used, and other input information was also taken from national methodological documents and statistics. In this regard, follow-up research on the topic is also proposed for other countries for the possibility of assessing the international applicability of the methodological steps and for comparing the situation in the field of occurrence impacts on railway infrastructure.

Follow-up research will focus on projecting the results achieved in the presented research into the final methodology to consider the benefits associated with increasing the safety and reliability of the railway network in the socioeconomic evaluation of investment projects of railway infrastructure. The aim of the authors is to incorporate the resulting methodology into the process of economic efficiency assessment of investment projects in the field of railway infrastructure in the form of an annex to the Departmental Methodology of the Ministry of Transport and the State Fund for Transport Infrastructure.

**Author Contributions:** Conceptualization, V.H., J.K. and E.V.; methodology, V.H., J.K., E.V., H.S. and T.F.; validation, V.H., H.S. and T.F.; formal analysis, V.H., J.K. and E.V.; investigation, V.H., J.K., E.V., H.S. and T.F.; resources, E.V., J.K., H.S. and V.H.; data curation, V.H., H.S. and T.F.; writing—original draft preparation, V.H. and J.K.; writing—review and editing, V.H., E.V., H.S. and T.F.; visualization, V.H.; supervision, J.K.; project administration, V.H.; funding acquisition, V.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Technology Agency of the Czech Republic, grant number TL02000278.

**Acknowledgments:** This paper comes under the project of the Technology Agency of the Czech Republic "TL02000278 Evaluation of increased safety and reliability of railway infrastructure after its modernization or reconstruction".

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Statistical Methods in Bidding Decision Support for Construction Companies**

**Agnieszka Leśniak**

**Citation:** Leśniak, A. Statistical Methods in Bidding Decision Support for Construction Companies. *Appl. Sci.* **2021**, *11*, 5973. https://doi.org/10.3390/app11135973

Academic Editor: Igal M. Shohet

Received: 26 May 2021; Accepted: 24 June 2021; Published: 27 June 2021


**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Faculty of Civil Engineering, Cracow University of Technology, 31-155 Krakow, Poland; alesniak@l7.pk.edu.pl

**Abstract:** On the border of two phases of a building life cycle (LC), the programming phase (conception and design) and the execution phase, a contractor is selected. A particularly appropriate method of selecting a contractor in the construction market is the tendering system. It is usually based on quality and price criteria. The latter may involve the price (namely, direct costs connected with works realization as well as mark-ups, mainly overhead costs and profit) or cost (based on the life cycle costing (LCC) method of cost efficiency). A contractor's decision to participate in a tender and to calculate a tender requires an investment of time and company resources. As this decision is often made in a limited time frame and based on the experience and subjective judgement of the contractor, a number of models have been proposed in the literature to support this process. The present paper proposes the use of statistical classification methods. The response obtained from the classification model is a recommendation to participate or not. A database consisting of historical data was used for the analyses. Two models were proposed: the LOG model, using logit regression, and the LDA model, using linear discriminant analysis, which obtained better results. In the construction of the LDA model, the equation of the discriminant function was sought by indicating the statistically significant variables. For this purpose, the backward stepwise method was applied, where initially all input variables were introduced, namely, 15 identified bidding factors, and then in subsequent steps, the least statistically significant variables were removed. Finally, six variables (factors) were identified that significantly discriminate between groups: type of works, contractual conditions, project value, need for work, possible participation of subcontractors, and the degree of difficulty of the works. The model proposed in this paper using a discriminant analysis with six input variables achieved good performance. The results obtained show that it can be used in practice. It should be emphasized, however, that mathematical models cannot replace the decision-maker's thought process, but they can increase the effectiveness of the bidding decision.

**Keywords:** bidding decision; LCC criterion; price criterion; construction; statistical method; classification; probability of winning

#### **1. Introduction**

With the development of new technologies and advanced building materials, an increasing number of demands are placed on the construction industry. Modern buildings should have as little impact as possible on the environment [1–3], using sustainable materials (such as natural or recycled materials) [4–6] and environmentally friendly construction technologies [7–9]. They should have low energy consumption [10,11] and allow repairs made necessary by wear and tear [12–14] as well as by possible breakdowns [15,16]. Preferably, they should allow the recycling or disposal [17,18] of the resulting construction waste. These aspects are considered by the participants in the investment process (the investor, the contractor, and the user) against the background of the different stages of the building life cycle. Phases identified in the literature include the following: the programming phase (study and conceptual analysis, as well as design), the execution phase (construction of the facility), the operation phase (operation, use, and maintenance of the facility), and the decommissioning phase (demolition of the facility). In this paper, attention is paid to the programming phase, and in particular to the conclusion of the design phase, which must be followed by the selection of the contractor for the construction work before the execution phase begins (Figure 1).

**Figure 1.** Bidding decisions of the building contractor within the building life cycle.

The methods of sourcing contractors in the construction market depend on the type of market (private or public sector) and the value of the project. Due to the individualized nature of construction production and the long production cycle, the tendering system is particularly suited to the operating conditions of the construction market [19]. The bidding procedure ensures that competition takes place properly and that its results are objective. It is also a factor conditioning the objectivity of prices in the construction industry. Bidding can be carried out by any investor looking for a contractor, but it is the potential contractor who must decide to tender and begin the laborious process of preparing a bid.

The selection of the most advantageous tender is normally based on quality and price criteria [20]. The price criterion may involve a price or a cost, the latter based on a cost-effectiveness method such as life-cycle costing (LCC). In the former case, the basis for determining a price is the direct costs connected with works realization as well as mark-ups, mainly overhead costs and profit [21–23]. In the latter case, life cycle costs (LCC) should be estimated, including the costs of planning, design, operation, maintenance, and decommissioning minus the residual value, if there is any [24]. In the literature, one can find many mathematical models prepared for the estimation of building life cycle costs [25–27], a description and comparison of which can be found, for example, in [28]. A contractor's decision to enter a tender requires action to prepare the tender, with an investment of time and commitment of staff, that is, the direct use of company resources. Irrespective of the outcome of the tender, the costs of preparing it will be incurred. Efficient bidding is therefore essential for every construction company. Choosing the right tender has an impact on the company's image, its financial condition, and its prospects of success [29].

The decision to participate in a tender often must be made by the contractor within a limited time frame and it is often based on his or her own experience. To improve the effectiveness of the decision, various models have been developed to support this process. In this case, a bidding decision model should be understood as a mathematical representation of reality, with a proposed technique to help the construction contractor decide to participate in the tender, avoiding errors and randomness. Efficient decision making is one of the greatest challenges of contemporary construction [30].

Different methods and tools are used to build models supporting construction contractors' decisions to bid. A summary of the selected existing (published after 2000) models is provided in Table 1.


**Table 1.** Examples of models supporting tender decisions presented in the literature after 2000.

It is worth noting that the indicated models differ in the methods used. Different methods, techniques, and approaches are sought and applied to obtain the most effective models. Importantly, the modeling of tender decisions has remained a subject of research interest for at least 20 years.

The models proposed in the literature are generally based on factors, also called criteria, affecting the decision, and using them as input parameters. The number of publications on the identification of factors is considerable, as each country and region has a certain characteristic group of factors that will not be found in other markets [48–50]. It can therefore be concluded that the factors influencing tender decisions depend not only on the project to be tendered but also on the environment and market in which the contractor operates.

Bidding problems are also known in procurement auctions [51,52]. The paper [53] presents an analysis of the relation between the award price and the bidding price in the case of public procurement in Spain. An award price estimator was proposed, as it is believed to be particularly useful for companies and public procurement agencies. Procurement auctions have long been employed in the logistics and transportation industry [54]. In combinatorial auctions, each carrier must determine the set of profitable contracts to bid on and the associated ask prices. This is known as the bid construction problem (BCP) [55]. Different approaches to the BCP in transportation procurement auctions have been proposed in the literature. One of them can be found in [56], where the authors proposed solving the BCP for heterogeneous truckload using exact and heuristic methods.

The paper proposes the use of statistical methods to support the decision-making process of a construction contractor related to the preparation of a price offer and entering a tender. Two classification methods were used as decision support models. The response obtained from the classification model is a recommendation to participate in the tender (qualification into the W-winning class), or a recommendation to resign (allocation into the L-losing class). To perform the analyses, it was necessary to use a database consisting of historical data, that is, resolved tenders. The research framework diagram is presented in Figure 2.

**Figure 2.** The research framework diagram.

#### **2. Materials and Methods**

#### *2.1. Data Acquisition*

In [57], a literature survey and research gap analysis of statistical methods used in the context of optimizing bids were presented. The paper attempts to build a decision-making model using two statistical methods: regression analysis and discriminant analysis. In the methods derived from regression analysis, the values of the *Y* variable (the explained variable) are given before determining the model and based on them and the adopted factors, the parameters of the model are determined. However, in the case of discriminant analysis, the values of the variable are obtained when the model is determined.

Factors influencing decision-making were proposed as input parameters of the models (explanatory variables). As a result of research (a questionnaire survey) conducted by the author in Poland, presented and described in previous works [29,44], 15 factors were identified: x1—type of works, x2—experience in similar projects, x3—contractual conditions, x4—investor reputation, x5—project value, x6—need for work, x7—the size of the project, x8—profits made in the past from similar undertakings, x9—duration of the project, x10—tender selection criteria, x11—project location, x12—time to prepare the offer, x13—possible participation of subcontractors, x14—the need for specialized equipment, and x15—degree of difficulty of the works. The tender score was the model output variable (*Y*) representing the class:


The starting point for the selected methods was the construction of a database. The research performed in Poland was of a primary nature, based on information collected to solve a given decision problem. With regard to the type of research material, the study comprised quantitative research (evaluation of factors) and qualitative research (determination of the result obtained by the contractor in a given evaluated tender). The factors identified were used to evaluate the tenders entered into by the contractors participating in the research. Each factor, from x1 to x15, was rated on a scale from 1 to 7, where 1 meant a very unfavorable and 7 a very favorable influence of the factor on the decision to participate in the tender. This scale has already been used successfully in previous works [44]. The result for each tender evaluated was then recorded (W for win, L for loss). In the end, the database contained 88 evaluated tenders, of which 64 were lost cases (L) and 24 were won cases (W). Selected database records of evaluated tenders, including factor evaluations with the corresponding result obtained in the tender, are presented in Table 2.



#### *2.2. Regression Analysis Model*

The main task of the qualitative decision-making model is to determine the probability of the contractor's success in the tender (winning) and to identify variables that significantly affect the outcome of the tender. A binomial (dichotomous) model is sought in which the explained variable *Y* is quantified by a zero-one variable. It takes two possible values, coded "1" for W (win) and "0" for L (loss). If *pi* is the probability of the event *Yi* = 1, then 1 − *pi* is the probability of the event *Yi* = 0. The expected value of the variable *Yi* is [58,59]:

$$E(Y_i) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i \tag{1}$$

In binomial models, it is assumed that *pi* is a function of the vector of values of the explanatory variables *xi* for the *i*-th object and the parameter vector *β* [58,59]:

$$p_i = P(Y_i = 1) = F\left(\mathbf{x}_i^T \boldsymbol{\beta}\right) \tag{2}$$

Depending on the type of F-function, different types of models are distinguished [60]: a linear probability model, logit model, and probit model. Using the simplest of the binomial models—the linear probability model—has many negative consequences described in the literature [58,61]. Probit and logit models, on the other hand, as indicated by some authors [60], are similar to each other and in practice one of them is used. Therefore, the search for a binomial model for the phenomenon in question was limited to a logistic regression model. The general form of the logit model is as follows [58,59]:

$$Y_i^* = \ln \frac{p_i}{1 - p_i} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i \tag{3}$$

where:

*βj*—structural model parameters,

*ui*—random component,

$\ln \frac{p_i}{1 - p_i}$—logit,

$Y_i^*$—unobservable qualitative variable,

*Xji*—the values of the explanatory variables of the model,

*pi*—the probability of the dependent variable *Yi* taking the value "1", calculated from the logistic cumulative distribution function:

$$p_i = \frac{e^{\mathbf{x}_i^T \boldsymbol{\beta}}}{1 + e^{\mathbf{x}_i^T \boldsymbol{\beta}}} = \frac{1}{1 + e^{-\mathbf{x}_i^T \boldsymbol{\beta}}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki})}} \tag{4}$$

The unobservable variable $Y_i^*$ is defined as a latent variable, as one can observe only the binary variable *Yi* in the form:

$$Y_i = \begin{cases} 1, & Y_i^* > 0 \\ 0, & Y_i^* \leq 0 \end{cases} \tag{5}$$

According to [53], the logit denotes the natural logarithm of the odds, i.e., of the ratio of the probability that the variable *Yi* takes the value "1" to the probability that it does not. It takes the value zero if *pi* = 0.5. When *pi* < 0.5, the logit takes a negative value, and when *pi* > 0.5, a positive one.
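The relations in Equations (3)–(5) can be checked numerically. The following is a minimal sketch in Python; the function names are illustrative and do not come from the paper. It shows that the logistic function and the logit are inverses of each other and that the sign of the logit changes at a probability of 0.5.

```python
import math


def logistic(z: float) -> float:
    """Probability p for a given value z of the linear predictor, as in Equation (4)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Natural logarithm of the odds p / (1 - p), as in Equation (3)."""
    return math.log(p / (1.0 - p))


def observed_class(latent_value: float) -> int:
    """Binary outcome derived from the latent variable, as in Equation (5)."""
    return 1 if latent_value > 0 else 0


for p in (0.25, 0.5, 0.75):
    z = logit(p)
    print(f"p = {p:.2f}  logit = {z:+.4f}  class = {observed_class(z)}")
```

A probability of exactly 0.5 gives a logit of zero and, by Equation (5), is assigned to class 0.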

#### *2.3. Discriminant Analysis Model*

Linear discriminant analysis (LDA), presented in 1936 [62], enables the classification of cases (objects) into one of the predetermined groups based on explanatory variables (case characteristics). The use of linear discriminant analysis to classify objects (cases) [63] or to support decision-making processes [64] is commonly found in the literature. The aim of discriminant methods is to determine which of the explanatory variables differentiate the groups the most. The discrimination problem can be solved by means of discriminant functions, which are most often linear functions of the input variables characterizing the cases [65]. If group sizes are not comparable, a modified form of the discriminant function should be used [65]:

$$K_r = c_{r0} + c_{r1}X_1 + c_{r2}X_2 + \dots + c_{rm}X_m + \ln \frac{n_r}{n}, \tag{6}$$

where:

*Kr*—classification function (for the *r*-th group of cases),

*crj*—the coefficient of the *r*-th classification function for the *j*-th input variable of significant discriminatory power, *j* = 0, 1, ..., *m*,

*cr0* = *lnpri*—absolute term, probability *pi* means the *a priori* probability of qualifying the *i*-th object to the *r*-th group,

*nr*—denotes the size of a given group,

*n*—sample size.

Modeling takes place in several stages. In the first step of building the model, the discriminant function equation is sought by identifying variables that significantly discriminate groups. The next step is to check the statistical significance of the discriminant function and determine its coefficients. The next stage of the analysis is a classification procedure using classification functions.
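The classification stage can be sketched as follows. The example evaluates Equation (6) for each group and assigns the case to the group with the highest value, which is the usual rule for classification functions and is assumed here. All coefficients and the two-variable input are hypothetical and are not the fitted values of the LDA model; only the group sizes (24 won and 64 lost tenders out of 88) are taken from Section 2.1.

```python
import math


def classification_score(c0, coeffs, x, n_r, n):
    """Value of the classification function K_r from Equation (6)."""
    linear_part = c0 + sum(c * v for c, v in zip(coeffs, x))
    return linear_part + math.log(n_r / n)


def classify(x, groups, n):
    """Return the group with the highest K_r together with all scores."""
    scores = {
        name: classification_score(g["c0"], g["coeffs"], x, g["n_r"], n)
        for name, g in groups.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical coefficients for two input variables (illustration only).
GROUPS = {
    "W": {"c0": -10.0, "coeffs": [1.5, 2.0], "n_r": 24},
    "L": {"c0": -6.0, "coeffs": [1.0, 1.2], "n_r": 64},
}
N_CASES = 88  # size of the database described in Section 2.1

label, scores = classify([6, 6], GROUPS, N_CASES)
print(label, scores)
```

The term ln(n_r / n) lowers the score of the smaller group (here W), so a case needs stronger evidence to be assigned to it.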

#### *2.4. Evaluation of the Proposed Models*

To assess the quality and relevance of the performance of the proposed classification models [66], the following were proposed:


Sensitivity indicates the ratio of true positives to the sum of true positives and false negatives. In the problem under analysis, it describes the ability to detect the winning cases.

$$sensitivity = \frac{TP}{TP + FN} \tag{7}$$

Specificity means the ratio of true negatives to the sum of true negatives and false positives. In the problem examined, it describes the ability to detect the losing cases.

$$Specificity = \frac{TN}{TN + FP} \tag{8}$$

PPV (positive predictive value) denotes the probability that the case identified by the classifier as winning is indeed a winning case.

$$PPV = \frac{TP}{TP + FP} \tag{9}$$

NPV (negative predictive value) stands for the probability that the case identified by the classifier as loss is indeed a losing case.

$$NPV = \frac{TN}{FN + TN} \tag{10}$$

Accuracy (ACC), the effectiveness of the decision rule, is the proportion of all cases that the classifier assigns correctly.

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$

where:

*TP*—true positive results, *FP*—false positive results, *TN*—true negative results, *FN*—false negative results.
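The measures in Equations (7)–(11) can be computed directly from the four counts of a confusion matrix. The sketch below is illustrative: the counts are hypothetical and chosen only to match the class sizes of the database (24 won and 64 lost tenders). They are not the results obtained by either model.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Quality measures of a binary classifier, Equations (7)-(11)."""
    return {
        "sensitivity": tp / (tp + fn),               # Equation (7)
        "specificity": tn / (tn + fp),               # Equation (8)
        "ppv": tp / (tp + fp),                       # Equation (9)
        "npv": tn / (tn + fn),                       # Equation (10)
        "acc": (tp + tn) / (tp + tn + fp + fn),      # Equation (11)
    }


# Hypothetical counts: 24 winning cases (tp + fn) and 64 losing cases (tn + fp).
metrics = classification_metrics(tp=18, fp=4, tn=60, fn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With unbalanced classes such as these, accuracy alone can look favorable even when few winning cases are detected, which is why sensitivity and PPV are reported alongside it.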

#### **3. Results and Discussion**

#### *3.1. LOG Model—The Model Using Regression Analysis*

Using logit regression, an attempt was made to estimate the qualitative variable *Y*, also trying to explain which factors, with what strength and in what direction, affect the chance of a tender success (*Y*). The parameter estimates are summarized in Table 3.

At the assumed significance level α = 0.1, the results indicate that only two variables significantly affect the model: x3 (contractual conditions) and x6 (need for work). However, the *p* values for the variables x12 (time to prepare the offer) and x15 (degree of difficulty of the works) are only slightly higher than 0.1, so it was decided to include these variables and recalculate the model. The parameter estimates for the logistic regression model (with four explanatory variables) are summarized in Table 4.

Finally, three variables were left (the non-significant variable x12—time to prepare an offer, was discarded) and recalculations were made.

The parameter estimates for the logistic regression model (with three explanatory variables) are summarized in Table 5.

The form of the proposed logit model (LOG model) is as follows:

$$\hat{Y}_i = \ln \frac{p_i}{1 - p_i} = -0.9532 \cdot x_3 - 2.2877 \cdot x_6 + 0.6012 \cdot x_{15} + 15.9217 \tag{12}$$

This means that the probability *pi* (that is, situation *Yi* = 1) is estimated as:

$$\hat{p}_i = \frac{\exp\left(-0.9532 \cdot x_3 - 2.2877 \cdot x_6 + 0.6012 \cdot x_{15} + 15.9217\right)}{1 + \exp\left(-0.9532 \cdot x_3 - 2.2877 \cdot x_6 + 0.6012 \cdot x_{15} + 15.9217\right)} \tag{13}$$

Statistical verification of the logit model, which consisted of determining how well the model fits the data and testing the statistical significance of the parameters, was successful. The odds ratio is 9.62, which is higher than 1 and means that the classification is more than nine times better than what would be expected by chance. Using the proposed logit model, it is possible to estimate the probability with which a given tender will be won.
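To illustrate how the LOG model could be applied, the following minimal sketch evaluates Equations (12) and (13) with the coefficients exactly as printed. The function names, the example ratings, and the cut-off of 0.5 used to turn the probability into a W/L recommendation are assumptions made for this illustration; the paper does not state the cut-off in this section.

```python
import math

# Coefficients of the LOG model as printed in Equation (12).
INTERCEPT = 15.9217
COEFFS = {"x3": -0.9532, "x6": -2.2877, "x15": 0.6012}


def log_model_logit(x3: float, x6: float, x15: float) -> float:
    """Linear predictor of Equation (12) for factor ratings on the 1-7 scale."""
    return INTERCEPT + COEFFS["x3"] * x3 + COEFFS["x6"] * x6 + COEFFS["x15"] * x15


def log_model_probability(x3: float, x6: float, x15: float) -> float:
    """Estimated probability of Y = 1 according to Equation (13)."""
    return 1.0 / (1.0 + math.exp(-log_model_logit(x3, x6, x15)))


def recommend(x3: float, x6: float, x15: float, cutoff: float = 0.5) -> str:
    """Return 'W' or 'L'; the cut-off value of 0.5 is an assumption."""
    return "W" if log_model_probability(x3, x6, x15) > cutoff else "L"


# Hypothetical ratings of one tender.
p = log_model_probability(x3=4, x6=4, x15=4)
print(f"estimated probability: {p:.4f}, recommendation: {recommend(4, 4, 4)}")
```

Because the function only reproduces the printed equation, any change in the coding of the ratings or of the classes would require the coefficients to be re-estimated.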


**Table 3.** Parameter estimates for the logit model—15 explanatory variables.

\* Significance level α = 0.1

**Table 4.** Parameter estimates for the logit model—four explanatory variables.


\* Significance level α = 0.1

**Table 5.** Parameter estimates for the logit model—three explanatory variables.


\* Significance level α = 0.1

#### *3.2. LDA Model—The Model Using Discriminant Analysis*

In the first step of building the model, the equation of the discriminant function was sought in order to identify the variables that significantly discriminate between groups. To achieve this, the backward stepwise method was applied. In this approach, all variables are entered into the model (step 0) and then, in each subsequent step, the least statistically significant variable is removed. Results with all 15 input variables (step 0) indicated that, at the assumed significance level α = 0.1, only four variables significantly discriminate between groups (x3, x5, x6, x13).

The results for the model and the evaluation of all 15 input variables (step 0) are given in Table 6.

**Table 6.** Evaluation of the parameters of the discriminant function—15 explanatory variables.


\* Significance level α = 0.05.

At the assumed significance level α = 0.1, only four variables (x3, x5, x6, x13) discriminated significantly between groups. The model parameters are as follows: Wilks' Lambda = 0.41344, the corresponding statistic *F*(15, 72) = 6.8100, and *p* < 0.0001.

During the first step of the analysis, the variable x2, which discriminated least significantly between groups, was removed. Subsequent steps (*k* = 2, ... , 15) made it possible to select the most significant variables (Table 7).


**Table 7.** Evaluation of the discriminant function parameters—final model (six input variables).

\* Significance level α = 0.05.

Finally, six input variables discriminate significantly between groups: x1 (type of works), x3 (contractual conditions), x5 (project value), x6 (need for work), x13 (possible participation of subcontractors), and x15 (the degree of difficulty of the works). The model parameters are as follows: Wilks' Lambda = 0.47047; the corresponding statistic *F*(6, 81) = 15.195; *p* < 0.0001. The smaller the value of Wilks' Lambda (which lies in the range [0, 1]), the better the discriminating power of the model. In the analyzed example, the value of 0.47047 is acceptable. The tolerance coefficient *T<sub>k</sub>* determines the proportion of the variance of the variable *x<sub>k</sub>* that is not explained by the other variables in the model. If *T<sub>k</sub>* takes a value smaller than the default 0.01, the variable is more than 99% redundant with the other variables in the model. Entering variables with low tolerance coefficients into the model may make it highly inaccurate. In the model under consideration, the *T<sub>k</sub>* coefficients of the retained variables exceed 0.5.
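The reported *F* statistics can be cross-checked against the Wilks' Lambda values. For a two-group problem, Wilks' Lambda converts exactly to an *F* statistic with (*p*, *N* − *p* − 1) degrees of freedom, where *p* is the number of variables and *N* the number of cases. The sketch below assumes *N* = 88, a value inferred from the denominator of the prior term in Equation (16) and from the degrees of freedom read as (15, 72) and (6, 81); the function name is hypothetical.

```python
def f_from_wilks_lambda(wilks_lambda, n_cases, n_vars):
    """Exact F statistic for a two-group discriminant model.

    Returns (F, df1, df2) with df1 = n_vars and df2 = n_cases - n_vars - 1.
    """
    df1 = n_vars
    df2 = n_cases - n_vars - 1
    f_stat = (1.0 - wilks_lambda) / wilks_lambda * df2 / df1
    return f_stat, df1, df2


# Assumption: 88 resolved tenders in total (inferred, not stated in this section).
N_CASES = 88

# Step 0: all 15 variables, Wilks' Lambda = 0.41344.
f_full, df1_full, df2_full = f_from_wilks_lambda(0.41344, N_CASES, 15)
# Final model: six variables, Wilks' Lambda = 0.47047.
f_final, df1_final, df2_final = f_from_wilks_lambda(0.47047, N_CASES, 6)

print(f"step 0: F({df1_full}, {df2_full}) = {f_full:.4f}")
print(f"final:  F({df1_final}, {df2_final}) = {f_final:.4f}")
```

Under this assumption both values agree with the reported statistics to the precision given in the text, which supports reading the bracketed numbers after *F* as degrees of freedom.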

The next step of the analysis is to check the statistical significance of the discriminant function (Table 8) and to determine its coefficients.


**Table 8.** Parameters for assessing the statistical significance of the discriminant function.

The eigenvalue of a discriminant function represents the ratio of the between-group variance to the within-group variance. Large eigenvalues characterize functions with high discriminatory power. Canonical correlation is a measure of the strength of the association between the grouping variable and the values of the discriminant function. It lies in the range [0, 1], where 0 means no relationship and 1 means the strongest possible relationship. The value of 0.727691 means that the function is related to the grouping variable. The value of Wilks' Lambda is acceptable, and the *p* value (reported as 0.000000) is below 0.05. The proposed discriminant function is therefore statistically significant and ultimately takes the following form:

$$\mathbf{D} = -12.831 + 0.509\mathbf{x}\_1 + 0.437\mathbf{x}\_3 - 0.464\mathbf{x}\_5 + 1.502\mathbf{x}\_6 + 0.615\mathbf{x}\_{13} - 0.429\mathbf{x}\_{15} \tag{14}$$
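The quantities discussed above are linked by standard identities. With a single discriminant function, the squared canonical correlation equals eigenvalue / (1 + eigenvalue), and Wilks' Lambda equals 1 minus the squared canonical correlation. The sketch below applies these identities to the reported canonical correlation and evaluates Equation (14) for one case. The eigenvalue printed here is derived from the identity, not copied from Table 8, and the ratings in `example_case` are hypothetical.

```python
# Canonical correlation reported in the text.
R_CANONICAL = 0.727691

r_squared = R_CANONICAL ** 2
# Single discriminant function: Lambda = 1 - R^2 = 1 / (1 + eigenvalue).
wilks_lambda = 1.0 - r_squared
eigenvalue = r_squared / (1.0 - r_squared)

# Constant and coefficients copied from Eq. (14).
D_INTERCEPT = -12.831
D_COEFFS = {"x1": 0.509, "x3": 0.437, "x5": -0.464,
            "x6": 1.502, "x13": 0.615, "x15": -0.429}


def discriminant_score(case):
    """Value of the discriminant function D from Eq. (14) for one case."""
    return D_INTERCEPT + sum(D_COEFFS[name] * case[name] for name in D_COEFFS)


# Hypothetical factor ratings (assumption, not taken from the database).
example_case = {"x1": 4, "x3": 4, "x5": 4, "x6": 4, "x13": 4, "x15": 4}
d_value = discriminant_score(example_case)

print(f"Wilks' Lambda from R: {wilks_lambda:.5f}")
print(f"eigenvalue from R:    {eigenvalue:.4f}")
print(f"D for example case:   {d_value:.3f}")
```

The Wilks' Lambda recovered from the canonical correlation matches the value of 0.47047 reported for the final six-variable model.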

The next stage of the analysis is the classification procedure using classification functions. In the problem under analysis, two classification functions were defined (two groups were assumed; W—win, L—loss), which take the following form:

• *K*<sub>0</sub> function, classifying to "L-loss" group:

$$K\_0 = -181.383 + 7.139x\_1 + 13.094x\_3 + 0.148x\_5 + 21.275x\_6 + 15.141x\_{13} + 9.958x\_{15} + \ln\frac{64}{88} \tag{15}$$

• *K*<sub>1</sub> function, classifying to "W-win" group:

$$K\_1 = -213.841 + 8.338x\_1 + 14.123x\_3 - 0.946x\_5 + 24.813x\_6 + 16.590x\_{13} + 8.947x\_{15} + \ln\frac{24}{88} \tag{16}$$

A given case is classified in the group for which the classification function takes the highest value.
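The classification rule can be sketched as follows. The coefficients are copied from Equations (15) and (16). The prior term of *K*<sub>0</sub> is taken to be ln(64/88), i.e., the share of lost tenders, by analogy with ln(24/88) in Equation (16); this reading is an assumption. The function names and the ratings in `example_case` are hypothetical.

```python
import math

FEATURES = ("x1", "x3", "x5", "x6", "x13", "x15")

# Constants, coefficients (in FEATURES order) and prior shares from Eqs. (15)-(16).
# Assumption: the prior of group L is 64/88, mirroring 24/88 for group W.
CLASSIFICATION_FUNCTIONS = {
    "L": {"const": -181.383,
          "coef": (7.139, 13.094, 0.148, 21.275, 15.141, 9.958),
          "prior": 64 / 88},
    "W": {"const": -213.841,
          "coef": (8.338, 14.123, -0.946, 24.813, 16.590, 8.947),
          "prior": 24 / 88},
}


def classification_scores(case):
    """Return the values of K0 (key 'L') and K1 (key 'W') for one case."""
    values = [case[name] for name in FEATURES]
    scores = {}
    for label, fn in CLASSIFICATION_FUNCTIONS.items():
        linear = fn["const"] + sum(c * v for c, v in zip(fn["coef"], values))
        scores[label] = linear + math.log(fn["prior"])
    return scores


def classify(case):
    """Assign the case to the group whose classification function is highest."""
    scores = classification_scores(case)
    return max(scores, key=scores.get)


# Hypothetical factor ratings (assumption, not taken from the database).
example_case = {"x1": 4, "x3": 4, "x5": 4, "x6": 4, "x13": 4, "x15": 4}
scores = classification_scores(example_case)
print(scores, "->", classify(example_case))
```

Because only the difference between the two functions matters for the decision, the rule is equivalent to thresholding a single linear function of the six factors.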

#### *3.3. Evaluation of Models—Discussion of Results*

To evaluate the model, the classification efficiency expressed as the number of cases correctly classified into predefined classes was used. A summary of the performance of the proposed models is presented in Tables 9 and 10.

**Table 9.** Summary of classification for the LOG model.



**Table 10.** Summary of classification for the LDA model.

The data in Tables 9 and 10 enable the basic parameters of the classification models to be determined. The results are given in Table 11.



According to Table 11, the LOG model correctly classified 79.55% of the cases and was more accurate in predicting tender failure (83.82%). These values indicate a good fit, but it is a concern that the model identified only three tender factors as statistically significant: x3 (contractual conditions), x6 (need for work), and x15 (degree of difficulty of the works). For the LDA model, the classification rate for set L is 87.14%, against 83.33% for set W, so this model, like the LOG model, predicts tender failure more accurately than winning. Overall, the LDA model performed better, correctly classifying 86% of the cases. In addition to x3, x6, and x15, the discriminant analysis identified x1 (type of works), x5 (project value), and x13 (possible participation of subcontractors) as significant variables. The greatest independent influence on the value of the discriminant function is exerted by x6 (need for work) and the least by x3 (contractual conditions). Table 12 presents an extract from the LDA model results sheet, showing the values of the classification functions against the observed (actual) values.


**Table 12.** Classification function values for selected cases.

\* Cases misclassified.
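The percentages quoted for the two models can be cross-checked with a short calculation. The counts below are not copied from Tables 9 and 10; they are a reconstruction, stated here as an assumption, chosen because they are consistent with the reported percentages, with the 64 lost and 24 won tenders implied by Equations (15) and (16), and with the odds ratio of 9.62 reported for the LOG model. The function and key names are hypothetical.

```python
def classification_summary(l_as_l, l_as_w, w_as_l, w_as_w):
    """Summarize a 2x2 classification table.

    l_as_l: actual L classified as L    l_as_w: actual L classified as W
    w_as_l: actual W classified as L    w_as_w: actual W classified as W
    """
    total = l_as_l + l_as_w + w_as_l + w_as_w
    return {
        "accuracy": (l_as_l + w_as_w) / total,
        "correct_among_classified_L": l_as_l / (l_as_l + w_as_l),
        "correct_among_classified_W": w_as_w / (w_as_w + l_as_w),
        "odds_ratio": (l_as_l * w_as_w) / (l_as_w * w_as_l),
    }


# Assumed counts (reconstruction, not taken from the tables).
LOG_COUNTS = {"l_as_l": 57, "l_as_w": 7, "w_as_l": 11, "w_as_w": 13}
LDA_COUNTS = {"l_as_l": 61, "l_as_w": 3, "w_as_l": 9, "w_as_w": 15}

log_summary = classification_summary(**LOG_COUNTS)
lda_summary = classification_summary(**LDA_COUNTS)

print(f"LOG accuracy:     {log_summary['accuracy']:.2%}")
print(f"LOG classified L: {log_summary['correct_among_classified_L']:.2%}")
print(f"LOG odds ratio:   {log_summary['odds_ratio']:.2f}")
print(f"LDA accuracy:     {lda_summary['accuracy']:.2%}")
print(f"LDA classified L: {lda_summary['correct_among_classified_L']:.2%}")
print(f"LDA classified W: {lda_summary['correct_among_classified_W']:.2%}")
```

Under these assumed counts, the group percentages correspond to the share of correct decisions among the cases assigned to each class, and the odds ratio is the cross-product ratio of the 2 × 2 table.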

The analyses presented in this paper do not exhaust the issue of modeling contractors' decisions to participate in tenders for construction works. They can supplement the models proposed so far in the literature. It should be noted that building classification models requires an appropriate database, constructed from the tender factors selected by the author of each model.

It is also worth emphasizing that, in the face of fierce competition on the construction market, contractors are looking for solutions that maximize their chances of winning tenders. The authors of [67] observed that the bid preparation process, which is time-consuming and requires considerable effort, may create the need for appropriate specialists. Typically, large companies are better able to employ such specialists, while small and medium-sized companies are more likely to need tools that support the proper selection of orders and the decision to tender. Building and using mathematical models for this purpose therefore appears appropriate.

In further research, the author intends to apply artificial intelligence methods to the database constructed for this study. Using the same database and the same model input and output parameters will allow an objective comparison of the effectiveness of the two approaches.

#### **4. Conclusions**

At each stage of its activity, a construction company has to make a number of important decisions related to its functioning. One of them is the decision to enter a tender. Although it involves company finances and resources, the decision is usually taken quickly and based on subjectively perceived information. A number of models and mathematical methods have been proposed in the literature to assist the decision maker and to increase the effectiveness of the decisions taken. In this paper, two statistical classification methods are used for modeling: logistic regression and linear discriminant analysis. The response obtained from the classification model is a recommendation to participate in the tender (qualification into class W, win) or a recommendation to resign (allocation into class L, loss). To perform the analyses, it was necessary to use a database of historical data, that is, resolved tenders. The comparison of the classification models shows that the model using linear discriminant analysis performed well (86% correctly classified cases). The backward stepwise method was used to eliminate the least statistically significant variables. Finally, from a set of 15 identified factors, six input variables (factors) were identified that significantly discriminate between groups: x1 (type of works), x3 (contractual conditions), x5 (project value), x6 (need for work), x13 (possible participation of subcontractors), and x15 (the degree of difficulty of the works). With these variables, the model achieved good performance. The study in [44] presents the results of a survey in which works contractors selected the following as the most important factors influencing the decision to enter a tender: type of works, contractual conditions, experience in similar projects, project value, and need for work. As can be seen, they mostly coincide with the results obtained from statistical methods.
The obtained results (effectiveness of classification and values of model evaluation parameters) testify to the possibility of using the LDA model in practice.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**

