#### **3. Estimation of Logistic Model Parameters**

The first stage of the study was to define possible explanatory variables in order to determine which of them could be used in the model. The following explanatory variables were selected: shift, device, occurrence of failure (yes or no) and no production order (yes or no).

The shift predictor was analyzed first. To determine which methods of statistical analysis could be applied, the normality of the distribution and the homogeneity of variance of the dependent variable (efficiency) were examined for the individual shifts. The distributions in all groups turned out to be inconsistent with the normal distribution, which is confirmed by the graphs in Figure 1 and the calculated chi-square test statistic values presented in Table 1.

**Figure 1.** Graphs of normality of distribution of the efficiency variable grouped by shifts.



Next, the homogeneity of variance in individual groups was checked using the Levene and Brown-Forsythe tests. The results are presented in Table 2.
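To make the difference between the two tests concrete, the following minimal sketch computes the Levene-type *W* statistic from absolute deviations around the group mean (Levene) or the group median (Brown-Forsythe). The function name and the sample values are illustrative assumptions, not the study data or the software used by the authors. Library routines such as SciPy's `scipy.stats.levene` (with its `center` argument) additionally return a p-value.

```python
from statistics import mean, median


def levene_statistic(groups, center=mean):
    """Levene-type W statistic.

    center=mean gives the Levene test, center=median the Brown-Forsythe variant.
    """
    # Absolute deviations of each observation from the center of its own group.
    deviations = []
    for group in groups:
        c = center(group)
        deviations.append([abs(y - c) for y in group])

    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    group_means = [sum(d) / len(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total

    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means))
    within = sum((v - m) ** 2 for d, m in zip(deviations, group_means) for v in d)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical efficiency samples (percent) for three shifts; NOT the study data.
shift_1 = [71.0, 75.5, 80.2, 84.1, 88.0, 92.3]
shift_2 = [78.4, 82.0, 85.5, 89.9, 91.2, 95.0]
shift_3 = [77.1, 81.6, 86.3, 88.8, 92.5, 94.4]

w_levene = levene_statistic([shift_1, shift_2, shift_3], center=mean)
w_brown_forsythe = levene_statistic([shift_1, shift_2, shift_3], center=median)
print(f"Levene W = {w_levene:.3f}, Brown-Forsythe W = {w_brown_forsythe:.3f}")
```

A large *W* relative to the F distribution with (*k* − 1, *N* − *k*) degrees of freedom indicates unequal variances.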

**Table 2.** Results of the Levene and Brown-Forsythe tests of the efficiency variable grouped by shifts.


Although the homogeneity of variance was confirmed in all groups, due to the lack of normality of distributions, the Mann–Whitney test was used to examine the significance of differences between individual averages, and the results thereof are presented in Table 3.

**Table 3.** Results of the Mann-Whitney test for the difference between the average efficiency of individual shifts.


The analyses showed no significant differences between the efficiency of the second and third shifts, so a decision was made to combine them. The values obtained for the first shift, however, differ significantly from those obtained for the other shifts, so this group was left unchanged. These conclusions are confirmed by Figure 2, which shows the differences described.
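The pairwise comparison described above can be sketched as follows. This is an illustrative implementation of the two-sided Mann–Whitney test using the normal approximation with a tie correction, which is reasonable for large samples. The function name and the sample values are assumptions made for the example, not the authors' code or data.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of the first sample, z score, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted((value, index) for index, value in enumerate(list(x) + list(y)))

    ranks = [0.0] * n
    tie_term = 0.0
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        t = end - pos + 1                   # size of the tie group
        tie_term += t ** 3 - t
        for j in range(pos, end + 1):
            ranks[pooled[j][1]] = average_rank
        pos = end + 1

    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical: no evidence of a difference
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / sqrt(var_u)
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p_value


# Hypothetical efficiency samples (percent); NOT the study data.
shift_1 = [71.0, 75.5, 80.2, 84.1, 73.8, 79.3, 82.6, 77.4]
shift_2 = [86.4, 89.0, 91.5, 93.9, 88.2, 95.0, 90.7, 92.1]
shift_3 = [85.1, 89.6, 90.3, 94.8, 87.5, 94.4, 91.9, 92.8]

for label, a, b in [("I vs II", shift_1, shift_2), ("II vs III", shift_2, shift_3)]:
    u, z, p = mann_whitney_u(a, b)
    print(f"{label}: U = {u:.1f}, z = {z:.2f}, p = {p:.4f}")
```

Groups whose pairwise p-value exceeds the chosen significance level are candidates for merging, as was done for the second and third shifts.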

**Figure 2.** Box plot of the efficiency variable grouped by shifts.

The same analysis was performed for the device variable. The machines analyzed were of a single type and came from a single production batch, which suggests that their productivity would be similar. To verify the equality of averages, the normality of distribution and the equality of variance in individual groups (this time defined by the device variable) were analyzed again in order to select an appropriate statistical test. The normality tests did not confirm conformity with the normal distribution: for every group, the calculated chi-square test statistic led to rejection of the null hypothesis that the examined distribution is consistent with the normal one. A clear deviation is confirmed by Figure 3.

The analysis of the equality of variance using the Levene and Brown-Forsythe tests showed that variances are not equal in some groups. Consequently, the Mann–Whitney test was used to check the difference between averages, the results of which are presented in Table 4.


**Table 4.** Results of the Mann-Whitney test for the difference between the average efficiency of individual devices.

**Figure 3.** Graphs of normality of distribution of the efficiency variable grouped by the device variable.

Since the average efficiency varied for virtually every pair of devices, a decision was made not to combine them and to include each of them in the study. After defining the form of the independent variables, the impact of each of them on the dependent variable, i.e., efficiency, was checked. Efficiency was expressed in dichotomous form, as an assessment of whether the level achieved was satisfactory for the company. In line with the expectations of the Management Board, the assessment was assumed to be positive if the efficiency was equal to or above 90%, and negative otherwise. The chi-square test allowed for a statistical and substantive study of the relationship between variables. In all cases, the calculated test statistic led to rejection of the null hypothesis of no relationship between the variables in favor of the alternative hypothesis that a relationship exists. The strength of the relationship was measured using Yule's Φ (for 2 × 2 tables) and Cramér's V coefficient (for larger tables). The obtained results are presented in Table 5.
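As an illustration of this step, the sketch below dichotomizes efficiency at the 90% threshold and computes the Pearson chi-square statistic together with Cramér's V for a contingency table. For a 2 × 2 table, V equals the absolute value of Φ. The function names and the table counts are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def is_satisfactory(efficiency_percent, threshold=90.0):
    """Dichotomize efficiency: 1 if it reaches the threshold, 0 otherwise."""
    return 1 if efficiency_percent >= threshold else 0


def chi_square_and_cramers_v(table):
    """Pearson chi-square statistic and Cramér's V for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    smaller_dim = min(len(table), len(table[0]))
    v = sqrt(chi2 / (n * (smaller_dim - 1)))
    return chi2, v


# Hypothetical counts; rows: shift I, shifts II+III; columns: unsatisfactory, satisfactory.
table = [[70, 30],
         [50, 50]]
chi2, v = chi_square_and_cramers_v(table)
print(f"chi-square = {chi2:.3f}, Cramér's V = {v:.3f}")
```

The statistic is compared with the chi-square distribution with (*r* − 1)(*c* − 1) degrees of freedom, and V close to 0 indicates a weak relationship even when the test is significant.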

The observed relationships between variables, although significant, are not strong. This is also confirmed by the graphs of interaction of the individual independent variables with the explained variable (Figure 4). Nevertheless, from the point of view of the analyzed company, the diagnosed relationships should not exist at all. A uniform and efficient operation of all devices is expected, so even minor deviations are undesirable and require further investigation.

The calculations carried out (Table 5) and the charts (Figure 4) confirm that the model variables were selected correctly. This allows the parameters of the logistic regression model to be estimated, the values of which are presented in Table 6.


**Table 5.** Results of the tests of significance and strength of the relationship between the predictors and the efficiency variable.

**Table 6.** Parameters of the logistic regression model and their evaluation.


**Figure 4.** Interaction charts of dependent variable and predictors.

All calculated parameters turned out to be statistically significant, which is confirmed by the calculated Wald statistic values and the associated probability values *p*, which for each row are lower than the assumed level of significance α = 0.05 (Table 6). This means that all the distinguished factors significantly affect the evaluation of production efficiency. This allows the equation of the logistic regression model to be written in the following form:

$$P(\text{efficient} = 1 \mid \mathbf{X}) = \frac{e^a}{1 + e^a} \tag{9}$$

where

$$\begin{array}{l} a = -7.549 - 0.292 \ast \text{shift I} - 1.241 \ast H2 - 1.066 \ast H5 - 1.153 \ast H6 - 0.668 \ast H12 \\ \quad - 0.866 \ast H[\ldots] - 0.809 \ast H21 - 0.496 \ast H22 - 0.525 \ast H23 - 0.873 \ast H24 - 1.167 \ast H25 \\ \quad - 0.317 \ast H4 + 5.047 \ast \text{no order} + 3.08 \ast \text{no failure}. \end{array} \tag{10}$$

The logistic regression curve is shown in Figure 5. The logistic regression equation presented above can also take equivalent forms:

• logistic regression logit function:

$$\begin{aligned} \text{logit}\, P(\text{efficient} = 1 \mid \mathbf{X}) &= \ln \frac{P(\text{efficient} = 1 \mid \mathbf{X})}{1 - P(\text{efficient} = 1 \mid \mathbf{X})} \\ &= -7.549 - 0.292 \ast \text{shift I} - 1.241 \ast H2 - 1.066 \ast H5 - 1.153 \ast H6 - 0.668 \ast H12 \\ &\quad - 0.866 \ast H[\ldots] - 0.809 \ast H21 - 0.496 \ast H22 - 0.525 \ast H23 - 0.873 \ast H24 - 1.167 \ast H25 \\ &\quad - 0.317 \ast H4 + 5.047 \ast \text{no order} + 3.08 \ast \text{no failure} \end{aligned} \tag{11}$$

• in the form of the odds:

$$\frac{P(\text{efficient} = 1 \mid \mathbf{X})}{1 - P(\text{efficient} = 1 \mid \mathbf{X})} = e^a, \tag{12}$$

where

$$\begin{array}{l} a = -7.549 - 0.292 \ast \text{shift I} - 1.241 \ast H2 - 1.066 \ast H5 - 1.153 \ast H6 - 0.668 \ast H12 \\ \quad - 0.866 \ast H[\ldots] - 0.809 \ast H21 - 0.496 \ast H22 - 0.525 \ast H23 - 0.873 \ast H24 - 1.167 \ast H25 \\ \quad - 0.317 \ast H4 + 5.047 \ast \text{no order} + 3.08 \ast \text{no failure}. \end{array} \tag{13}$$
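The equivalence of the probability, logit and odds forms can be checked numerically. The sketch below uses only a subset of the coefficients from Equation (10), namely the intercept and the shift I, H2, no order and no failure terms, which is enough for the two scenarios shown. The dictionary keys and function names are illustrative assumptions, and the indicators are taken to equal 1 when the named condition holds.

```python
from math import exp, log

# Subset of the coefficients printed in Equation (10); other device terms are omitted.
INTERCEPT = -7.549
COEFFICIENTS = {
    "shift_I": -0.292,
    "H2": -1.241,
    "no_order": 5.047,
    "no_failure": 3.08,
}


def linear_predictor(active_indicators):
    """Intercept plus the coefficients of all indicators that equal 1."""
    return INTERCEPT + sum(COEFFICIENTS[name] for name in active_indicators)


def probability(a):
    """Probability form: e^a / (1 + e^a)."""
    return exp(a) / (1 + exp(a))


def odds(a):
    """Odds form: e^a."""
    return exp(a)


# Reference machine (H14) on shift II/III, with "no order" = 1 and "no failure" = 1.
a_reference = linear_predictor(["no_order", "no_failure"])
# Same conditions, but on machine H2.
a_h2 = linear_predictor(["H2", "no_order", "no_failure"])

for label, a in [("H14 (reference)", a_reference), ("H2", a_h2)]:
    p = probability(a)
    print(f"{label}: a = {a:.3f}, P = {p:.3f}, odds = {odds(a):.3f}, logit = {log(p / (1 - p)):.3f}")
```

The logit of the computed probability returns the linear predictor *a*, and the ratio of the two odds equals the exponent of the H2 coefficient, which is the odds ratio interpretation used later in the text.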

**Figure 5.** Logistic regression curve.

An important element of the evaluation of the studied process is the calculation of the odds of an event occurrence (in this case a satisfactory result of efficiency). The sign at the estimated parameter of the logistic regression model indicates whether the analyzed odds are greater (plus) or smaller (minus) in relation to the reference level. The scale of this change is indicated by the unit odds ratio shown for each parameter in Table 7.


**Table 7.** Odds ratios for individual predictors.

The odds ratio for the 1st shift is 0.75, which means that the odds of achieving satisfactory efficiency on the 1st shift are 0.75 times those on the 2nd shift. In other words, the odds of satisfactory efficiency are 1.34 times higher on the 2nd shift. The H14 machine was used as the reference level when analyzing the impact of individual devices on efficiency. Its efficiency is the highest in the test sample, so all the device coefficients obtained are negative, which means lower odds of achieving a positive result. The odds ratios are given in column 3 of Table 6. The worst result was obtained for the H2 machine, with an odds ratio of 0.289, which means an almost 3.5-fold increase in the odds of achieving satisfactory efficiency when replacing H2 with H14.

The results for the other machines, showing how much the efficiency of each machine should be increased in order to obtain the same efficiency evaluation as the reference H14 machine, are shown in Table 8.


**Table 8.** Odds ratios for individual predictors.

The last two parameters in Table 7 refer to the absence of downtime caused by a lack of orders and to the absence of a failure, since the occurrence of either has a negative impact on efficiency. Where there is no such downtime, the odds of achieving the expected efficiency are 155 times greater than otherwise. The absence of a failure has a similar, though less pronounced, effect: the odds increase 21-fold if no failure occurs.
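The odds ratios quoted above follow directly from the model coefficients, since the odds ratio for a unit change in a predictor is the exponent of its coefficient. The short check below reproduces the figures cited in the text from the coefficients in Equation (10). The dictionary layout is an illustrative choice.

```python
from math import exp

# Coefficients taken from Equation (10) for the predictors discussed in the text.
coefficients = {
    "shift I": -0.292,
    "H2": -1.241,
    "no order": 5.047,
    "no failure": 3.08,
}

# Odds ratio for a unit change in each predictor: OR = exp(beta).
odds_ratios = {name: exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    # The inverse ratio expresses the change in odds in the opposite direction,
    # e.g. moving from H2 to the reference machine H14.
    print(f"{name}: OR = {ratio:.3f}, 1/OR = {1 / ratio:.3f}")
```

The inverse of the shift I ratio gives the 1.34 figure, and the inverse of the H2 ratio gives the almost 3.5-fold increase mentioned for replacing H2 with H14.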

The presented model can also be used for predictive purposes, allowing the probability of achieving success (here, the assumed efficiency) to be forecast. It is therefore important to assess the quality of the prediction. For this purpose, it is helpful to determine the so-called cut-off point π₀. This parameter allows the observed dichotomous values of the dependent variable to be compared with the continuous probability values calculated on the basis of the model. Its value falls within the range (0, 1), and it is applied as follows [36]. When

$$
\hat{\pi}(\mathbf{x}) = \hat{P}(Y=1 \mid \mathbf{x}) > \pi_0, \tag{14}
$$

it is assumed that an event has occurred (*ŷ* = 1). In the opposite situation, when

$$
\hat{\pi}(\mathbf{x}) \le \pi_0, \tag{15}
$$

it is assumed that an event has not occurred (*ŷ* = 0).

Prediction is ideal when sensitivity and specificity are both equal to 1, which means that there are no false positive or false negative results. In practice, the point at which the model best discriminates between occurrences and non-occurrences is called the optimal cut-off point. It is determined using Youden's index (*J*), which takes the following form:

$$J = \text{sensitivity} + \text{specificity} - 1. \tag{16}$$

The optimum cut-off point corresponds to the case where the *J* value reaches its maximum. For the case under consideration, the proposed cut-off point is shown in Table 9.
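The selection of the optimal cut-off point can be sketched as follows. The example classifies each observation with the rule from Equations (14) and (15), evaluates *J* from Equation (16) for every candidate cut-off, and keeps the one with the largest *J*. The labels and predicted probabilities are hypothetical, and using the observed probabilities as candidate cut-offs is an assumption of this sketch rather than the procedure reported by the authors.

```python
def sensitivity_specificity(y_true, p_hat, cutoff):
    """Classify with the rule p > cutoff and return (sensitivity, specificity)."""
    tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p > cutoff)
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p <= cutoff)
    tn = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p <= cutoff)
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def optimal_cutoff(y_true, p_hat):
    """Return the candidate cut-off that maximizes Youden's J, and that J value."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(p_hat)):
        sensitivity, specificity = sensitivity_specificity(y_true, p_hat, cutoff)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical observed outcomes and model probabilities; NOT the study data.
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
p_hat = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

cutoff_star, j_star = optimal_cutoff(y_true, p_hat)
print(f"optimal cut-off = {cutoff_star}, J = {j_star:.2f}")
```

In this toy data set the largest *J* is reached at a cut-off of 0.40, where four of the five positives and all five negatives are classified correctly.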

**Table 9.** Cut-off point of the logistic regression model concerned.


For the proposed cut-off point, the sensitivity is 0.32 and the specificity is 0.86. There are 16,767 correctly classified cases (2864 true positives and 13,903 true negatives) and 8115 misclassified cases (2210 false positives and 5905 false negatives).

Based on the above table, the effectiveness of the model's prediction of successes and failures can be assessed using statistics such as accuracy, sensitivity and specificity, the ROC (receiver operating characteristic) curve, or rank correlation measures.

The simplest measure is accuracy, calculated according to the following formula:

$$Accuracy = \text{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{17}$$

where:

*TP*—number of true positive results,

*TN*—number of true negative results,

*FP*—number of false positive results,

*FN*—number of false negative results.

For the model concerned,

$$\text{ACC} = \frac{2864 + 13903}{2864 + 5905 + 2210 + 13903} = 0.674 = 67.4\%. \tag{18}$$
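The classification measures can be recomputed from the four counts reported in the text. This is a plain arithmetic check with illustrative variable names. The sensitivity evaluates to roughly 0.327, close to the 0.32 quoted above, and the small difference may come from truncation, although that is an assumption.

```python
# Classification counts reported for the proposed cut-off point.
tp, tn, fp, fn = 2864, 13903, 2210, 5905

accuracy = (tp + tn) / (tp + tn + fp + fn)   # Equation (17)
sensitivity = tp / (tp + fn)                 # share of actual successes detected
specificity = tn / (tn + fp)                 # share of actual failures detected
youden_j = sensitivity + specificity - 1     # Equation (16)

print(f"correct = {tp + tn}, misclassified = {fp + fn}")
print(f"accuracy = {accuracy:.3f}")
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
print(f"Youden's J = {youden_j:.3f}")
```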

However, sensitivity *SE* and specificity *SP* are most often considered as pairs in such analyses. When these pairs are plotted on a plane and the points are connected with line segments, they form the so-called ROC curve. For the analyzed model, this curve is presented in Figure 6.

**Figure 6.** ROC curve for the model concerned.

The most important parameter for assessing the ROC curve is AUC—area under the ROC curve. It takes values from 0 to 1. The interpretation of the result was based on the Kleinbaum and Klein classification (Table 10), according to which discrimination is sufficient [37].
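How the ROC points and the area under the curve are obtained can be shown on a small example. The sketch below sweeps the cut-off over the predicted probabilities, records the pairs (1 − specificity, sensitivity), and integrates the resulting polyline with the trapezoidal rule. The data are hypothetical, and the rule `p >= cutoff` is used here (rather than the strict inequality of Equation (14)) so that the curve runs from (0, 0) to (1, 1).

```python
def roc_points(y_true, p_hat):
    """Return (1 - specificity, sensitivity) pairs, ordered from (0, 0) to (1, 1)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Lowering the cut-off step by step classifies more observations as positive.
    for cutoff in sorted(set(p_hat), reverse=True):
        tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= cutoff)
        fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a polyline given as (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical observed outcomes and model probabilities; NOT the study data.
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
p_hat = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

points = roc_points(y_true, p_hat)
auc = auc_trapezoid(points)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to a model that discriminates no better than chance, and 1 to perfect discrimination. The toy data give 0.92, which equals the share of positive-negative pairs in which the positive observation has the higher predicted probability.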


**Table 10.** Classification of AUC values according to Kleinbaum and Klein.

The model can therefore be considered satisfactory, although it is recommended rather for qualitative analysis of processes and modification of production strategy on its basis.

Production in the company in question is carried out in a continuous three-shift system. Regardless of the production plan, the plant is fully manned and all machines are in operation at all times. The acceptable level of efficiency assumed by the management is 90%, which allows all scheduled and expected downtime to be taken into account. The study indicates that in most cases this level is not reached: it was missed in over 16,000 out of almost 25,000 observations. The efficiency of the first shift turned out to be lower than that of the second and third shifts, which suggests the need to diagnose the causes of this situation or even to restructure the shift system. The choice of machine also affects efficiency: many machines show much lower efficiency than the one taken as the reference point, whose efficiency was the highest (but still less than 100%). Of course, a lack of orders also leads to a significant reduction in productivity. This result, together with the failures that occur, indicates the need for a detailed analysis of the machinery. It might be advisable to exclude several machines from the production process so that production is closely correlated with demand. Unused machines could constitute a reserve in case of failure and thus increase the level of readiness of the machinery.

#### **4. Conclusions**

The reliability of machinery and equipment is an essential part of the proper functioning of a business. Modern technologies support the maintenance of an adequate level of readiness and suitability of the technical infrastructure. They not only facilitate production control and implementation of modern operating strategies, but also ensure continuous monitoring of processes and detection of any disturbances or failures. The activities carried out in this area boil down to balancing the maintenance of full operational efficiency and continuity of production and ensuring an acceptable level of costs of these activities.

This is a difficult task, particularly in the case of older generation machinery stock that lacks the support of computerized production management systems, such as the one presented in this article. In any case, however, the aim is to ensure that machines function without failures and that products are manufactured without defects. On the other hand, it is also important to ensure efficient use of the equipment and a balanced workload, which proved to be a problem in the analyzed company. This was the reason for undertaking research in this field and the basis for the mathematical analysis of the effectiveness of the manufacturing process.

The lack of IT systems for controlling and monitoring the production process makes it impossible to archive data on an ongoing basis, which makes process control much more difficult and sometimes leads to it being abandoned altogether. The poor quality of recorded documentation also significantly limits the use of mathematical tools. The authors therefore also wanted to show that even in such a situation it is possible to create mathematical models that could improve the efficiency of the machinery stock. The available data proved to be sufficient to achieve the research objective, which was to formulate a model providing an unambiguous answer as to whether the efficiency of the equipment used in production is acceptable from the point of view of the assumptions made by the company (in this case 90%).

This was made possible through the use of logistic regression, which, above all, does not require the assumptions of other mathematical models, e.g., linear regression and general linear models, to be met. Another advantage of this method is the form of the dependent variable: it is dichotomous, and the values predicted by the model can be interpreted as the probability of an event occurring. The organization of production in the analyzed company has remained unchanged for many years. All machines work three shifts every day. Regardless of the orders placed and the market demand, full staff is employed. Machines are taken out of the process only in cases of random incidents. As the research demonstrated, the failure to modify the adopted procedures and to control the implemented processes means that the process is not effective and the use of machinery is not optimal.

The logistic regression model made it possible to identify the causes influencing machinery efficiency. It turned out that the load is not identical during every shift: the productivity is much lower during the first shift than during the other shifts. Restructuring the shift system and limiting the production process to only two shifts or modifying the working time could increase the productivity of machinery, optimize the use of human resources and reduce the costs of the production process. The load on the individual machines also appeared to be disproportionate. Increased use of one piece of equipment may result in an increase in the frequency of breakdowns, increase the costs of repairs and reduce the total life of the equipment, so it would be advisable to evenly distribute production across all equipment. The reduction in productivity was also caused by a decline in the number of production orders. The lack of production control with regard to orders received and the maintenance of a continuous three-shift readiness exposes the company to costs and favors the aforementioned disproportionate workload.

The studied company has no IT systems in place that enable comprehensive monitoring of production processes. Therefore, the unquestionable advantage of the proposed model is that it provides additional information allowing decisions to be made on the production and use of machinery. This information can also encourage the implementation of modern solutions and the abandonment of traditional, outdated methods of recording and archiving data.

In companies that use specialist MES systems based on real-time information on manufacturing execution at subsequent workstations, the proposed model can improve monitoring of productivity drops below the adopted level and activate preventive actions. Additionally, the use of data obtained from the IT system may allow the model to be extended with additional parameters, which is important from the point of view of individual companies.

The aim of the article was to investigate the possibility of developing a model for the analysis and evaluation of the level of efficiency of ongoing production processes, as well as to indicate the method of logistic regression as a tool supporting decision-making in this respect. The model developed for the analyzed company indicated the need for a strict correlation between the demand for a product and the production process. Adopted strategies require verification and modification.

The proposed model may also serve as a basis for setting directions for improvement of the production process, by maximizing the use of the machinery stock and reducing the idle time of both employees and equipment. Re-application of the logistic regression model constructed on the basis of the observation of the process after introduction of changes allows the effectiveness and efficiency of implemented solutions to be evaluated.

**Author Contributions:** Conceptualization, A.B. and M.G.; Formal analysis, A.B.; Methodology, A.B.; Resources, M.G.; Writing—original draft, A.B. and M.G.; Writing—review & editing, A.B.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
