#### **3. Methodology**

#### *3.1. The Scope of Research Studies*

The following research tasks were performed:


1. Changing the network learning algorithm (steepest descent gradient, scaled conjugate gradient, Broyden–Fletcher–Goldfarb–Shanno, and RBFT radial basis function training);
2. Changing the network topology (multilayer perceptron and network with radial basis functions);
3. Changing the activation function (linear, sigmoidal, exponential, hyperbolic, and sine);
4. Changing the number of hidden neurons (1–12).

In this methodology, the general research tasks were first defined, and network testing was then performed on a specific example of a mechanical part. The graphic concept of the proposed method is presented in Figure 3.

**Figure 3.** The graphic concept of the proposed method of prediction of assembly time.

In the future, this research may be extended to the verification of other parts of machines and devices based on the developed neural network model. The selected criteria determining assembly time are universal, and it is assumed that they are also applicable to other solutions.

#### *3.2. Assessment Criteria for the Assembly Sequence*

The proposed tool, based on artificial neural networks, is intended to support the determination of the sequence for manual assembly (although, after modifications, it can also be applied to an automated process). It was assumed that, at the current stage of research, it is used in a specific mechanical production company, where the conditions of the assembly process for newly introduced products are subject to ASP analysis, and the processes already implemented were used to train the network. This applies to issues such as the available machine park, production organization, process control and supervision, and the level of training of employees, especially in the aspect of manual assembly.

The following assembly sequence evaluation criteria were used as input to the process:

• Number of tool changes for the respective assembly sequence.

This criterion indicates the number of tool changes during assembly operations. An operation constitutes the main structural element of a technological assembly process. In this work, operations should be understood as, for example, activities such as riveting, drilling, fitting, and screwing, which are related to changing tools. Depending on the type of parts to be installed, the required tools can be assigned to them in a simple manner from the set of tools utilized in the considered assembly process.

• Number of changes in the assembly direction for the respective assembly sequence.

• Stability of the assembly units for the respective assembly sequence.
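The tool-change criterion can be computed directly from a candidate sequence once each operation has an assigned tool. A minimal sketch (the tool names below are invented for illustration, not taken from the study):

```python
# Illustrative sketch, not the authors' implementation: counting tool
# changes for a candidate assembly sequence, assuming each operation has
# already been assigned a tool from the set used in the process.

def count_tool_changes(tool_sequence):
    """Number of times the tool differs from the one used just before."""
    return sum(1 for prev, curr in zip(tool_sequence, tool_sequence[1:])
               if prev != curr)

# Example: screwdriver -> screwdriver -> riveter -> drill -> drill
print(count_tool_changes(["screw", "screw", "rivet", "drill", "drill"]))  # 2
```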


We justify the adoption of these evaluation criteria, among others, by the fact that, unlike many alternatives, they can be obtained automatically from the CAD assembly model, although it is also assumed that the data can be completed manually.

The purpose of the system is to assist in the estimation of time for all sequences acceptable under constructional constraints (i.e., feasible ones) and thus enable the selection of the most favorable one under the existing manufacturing conditions. Under these evaluation criteria, the sequence with the lowest number of tool changes, the smallest number of changes in assembly directions, and the smallest possible number of unstable states will likely be indicated as the most favorable one; in practice, however, a single sequence rarely achieves the best value for all of these criteria simultaneously. This is related, for example, to the weights of the individual criteria in relation to the specific assembly process.
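As a hypothetical illustration of how criterion weights might combine to rank feasible sequences (the weights, sequence labels, and counts below are invented for the example and do not come from the study):

```python
# Invented example: rank feasible sequences by a weighted sum of the three
# criteria (tool changes, direction changes, unstable states); lower is better.

def weighted_score(tool_changes, direction_changes, unstable_states,
                   weights=(0.5, 0.3, 0.2)):
    w1, w2, w3 = weights
    return w1 * tool_changes + w2 * direction_changes + w3 * unstable_states

# (tool changes, direction changes, unstable states) per candidate sequence
sequences = {
    "S1": (3, 2, 1),
    "S2": (2, 4, 0),
    "S3": (4, 1, 2),
}
best = min(sequences, key=lambda s: weighted_score(*sequences[s]))
print(best)  # the sequence with the lowest weighted score
```

Note that no sequence here is best on every criterion at once; the weights decide the trade-off, which mirrors the point made above.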

#### *3.3. Neural Network Assumptions*

Artificial neural networks were used to evaluate the sequence of combining assembly units. For this purpose, the input and output features of the network were selected and a set of training examples was prepared. The input data were the number of tool changes, the number of changes in the assembly direction, and assembly stability, while the assembly time constituted the output data. An important task is to provide an appropriate number of training samples and to identify connections between the data, which together allow sufficient results and network efficiency to be obtained [18]. In order to prepare the training dataset, the numerical values of the individual features were normalized, which makes the analyzed features mutually independent and ensures their equivalence. The numerical values of the features, initially appearing in different ranges, were scaled to values in the range ⟨0, 1⟩ using a linear transformation. Data normalization was performed with the min-max function, which calculates the difference between a value and the minimum value and scales it by the range of the numerical data according to the formula:

$$X^\* = \frac{X - \min(X)}{\max(X) - \min(X)} \tag{2}$$
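Equation (2) can be sketched in a few lines of Python (a minimal illustration, not the study's implementation):

```python
def min_max_normalize(values):
    """Linear min-max scaling of a feature to the range [0, 1], as in Equation (2)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([2, 4, 6, 10]))  # [0.0, 0.25, 0.5, 1.0]
```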

To obtain adequate efficiency, neural network training is performed, consisting of minimizing the prediction error function determined by the sum of squares (*SOS*) as defined by the formula:

$$SOS = \sum\_{i=1}^{n} \left( y\_i - y\_i^\* \right)^2 \tag{3}$$

where *n* is the number of training examples, *y<sub>i</sub>* is the expected network output value, and *y<sub>i</sub><sup>∗</sup>* is the actual network output value.
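A minimal sketch of Equation (3):

```python
def sum_of_squares_error(expected, actual):
    """SOS prediction error over n training examples, as in Equation (3)."""
    return sum((y - y_star) ** 2 for y, y_star in zip(expected, actual))

print(sum_of_squares_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # 1.25
```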

The error surface is paraboloid-shaped with one distinct minimum; it is associated with the neurons belonging to the output layer and is calculated after each epoch, i.e., each repetition of the training algorithm. The error reflects the discrepancy between the values obtained at the network output and the reference values included in the training dataset. Errors are also determined for neurons in the hidden layers by backpropagation, which adjusts the weight values depending on the assessment of the neuron error in a multilayer network, using gradient optimization methods. The error backpropagation algorithm proceeds from the output layer to the input layer, i.e., in the direction opposite to the information flow.

The effectiveness of a neural network is directly related to the error function and is calculated as the ratio of correctly classified or approximated cases to all cases included in the dataset. In order to obtain the highest prediction efficiency, the parameters describing the neural network model were changed and selected empirically: the number of layers (input, output, and hidden) and of the neurons they contain, the presence of an additional bias neuron, and the network learning rules, including the learning algorithm and the activation function.

The input layer consists of neurons that pass the input signals to the first hidden layer. The set of input data is divided into three groups: (1) the training data, which allow the network to learn the prediction task; (2) the test data, which check the operation of the network; and (3) the verification data, which evaluate the network performance on a new, previously unused set of numerical data. The number of neurons and hidden layers is selected empirically, enabling a compromise between an extensive structure and the correct generalization of the processed data.

The output layer of the network is a collection of neurons representing the output signals. The number of neurons in the output layer is identical to the number of outputs constituting the result of the network. In addition, the neural network model may contain an additional bias neuron, called the artificial signal generator, which constitutes an additional input for the neuron with a value of +1 and improves the stability of the network during the training process. The effectiveness of the network is also determined by the activation functions of the hidden and output neurons, which take the following forms: linear (directly transmitting the excitation value of the neuron to the output), logistic (a sigmoidal curve with values greater than 0 and less than 1), exponential (with a negative exponent), and hyperbolic (a hyperbolic tangent curve with values greater than −1 and less than 1). To verify whether the input signal exceeds the threshold needed to activate the neuron, the following activation functions *f(x)* are used:

• Linear with output values in the range from −∞ to ∞:

$$f(x) = x \tag{4}$$

• Logistic (sigmoidal) with output values in the range from 0 to 1:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{5}$$

• Exponential with output values in the range from 0 to ∞:

$$f(x) = e^{-x} \tag{6}$$

• Hyperbolic (hyperbolic tangent) with output values in the range from −1 to 1:

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{7}$$

• Sine with output values in the range from −1 to 1:

$$f(x) = \sin(x) \tag{8}$$
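The five activation functions above can be sketched with the standard library (a minimal illustration, not the study's implementation):

```python
import math

# The five activation functions compared in the study, Equations (4)-(8).

def linear(x):       return x                           # range (-inf, inf)
def logistic(x):     return 1.0 / (1.0 + math.exp(-x))  # range (0, 1)
def exponential(x):  return math.exp(-x)                # range (0, inf)
def hyperbolic(x):   return math.tanh(x)                # range (-1, 1)
def sine(x):         return math.sin(x)                 # range [-1, 1]
```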

The selection of the neural network learning algorithm affects its effectiveness. The general principle of the learning algorithms is to minimize the error function by iteratively modifying the weights assigned to the neurons. The learning process involves presenting successive learning cases containing input values together with the correct network responses. The iterative algorithm is stopped when the ability to generalize the learning results deteriorates.

There are many neural network learning algorithms. In this study, the steepest descent method, the scaled conjugate gradient method, and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm were used. In the steepest descent method, after specifying the search direction, the minimum value of the function in this direction is determined, as opposed to the simple gradient method, which uses a shift with a constant step. An important feature of the steepest descent method is that each new direction towards the function optimum is orthogonal to the previous one. Movement in one direction continues until this direction turns out to be tangent to a certain contour of constant value of the objective function. When designating subsequent search directions, the steepest descent principle requires carrying out a large number of searches along the successively proposed straight lines. In this situation, a neural network training method based on conjugate directions is a better solution. The algorithm determines the appropriate direction of movement along the multidimensional error surface. Then a straight line is drawn over the error surface in this direction, and the minimum value of the error function is determined among all points along this line. After finding the minimum along the initially given direction, a new search direction is established from this minimum, and the whole process is repeated.
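The steepest descent idea described above, i.e., moving along the negative gradient with a line search for the best step, can be sketched on a stand-in quadratic objective (the function and the coarse step scan below are assumptions for illustration, not the actual network error surface):

```python
# Sketch of steepest descent with a line search: at each step move along
# the negative gradient and pick the step length minimizing the function
# along that line (here by a coarse scan instead of an exact line search).

def f(x, y):
    # stand-in quadratic error surface with minimum at (1, -2)
    return (x - 1) ** 2 + 4 * (y + 2) ** 2

def grad(x, y):
    return (2 * (x - 1), 8 * (y + 2))

x, y = 5.0, 3.0
for _ in range(50):
    gx, gy = grad(x, y)
    # line search along the direction (-gx, -gy)
    _, step = min((f(x - a * gx, y - a * gy), a)
                  for a in (i * 0.001 for i in range(1, 1000)))
    x, y = x - step * gx, y - step * gy

print(x, y)  # converges near the minimum (1, -2)
```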
Accordingly, there is a constant shift towards decreasing values of the error function until a point corresponding to the function minimum is found. The second derivative determined in this direction is set to zero during the next learning steps. To maintain a second derivative value of zero, a direction conjugate to the previously chosen direction is determined. Moving in the conjugate direction does not change the fixed (zero) value of the second derivative computed along the previously selected direction. Determining the conjugate direction is associated with the assumption that the error surface has a parabolic shape. The Broyden–Fletcher–Goldfarb–Shanno algorithm is a quasi-Newton algorithm that modifies the weights of the interneural connections after each epoch based on the mean error gradient. Its principle of operation is based on the search for the minimum of the squared error function with the use of a Hessian matrix (a matrix of second-order partial derivatives), the inverse of which is estimated by an algorithm that initially uses the steepest descent method and in subsequent steps refers to the estimated Hessian.

For radial networks, standard learning procedures are used, including k-means center determination, k-neighbor deviation determination, and then output layer optimization. The k-means method consists of finding and extracting groups of similar objects (clusters). Thus, k different clusters are created; the algorithm allows objects to be moved from one cluster to another until the variation within and between clusters is optimized. The similarity of data within a cluster should be as large as possible, and separate clusters should differ from each other as much as possible. In the k-neighbor method, each data item is assigned a set of n values that characterize it and is then placed in an n-dimensional space. Assigning data to an existing group consists of finding the k nearest objects in the n-dimensional space and then selecting the most numerous group.
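The k-means center placement described above can be sketched for one-dimensional data (an illustrative toy example; the data points and k below are invented):

```python
import random

# Sketch of the k-means step used to place RBF centers: points are
# reassigned between clusters until the cluster means stop moving.

def k_means(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initial centers from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:           # assignment is stable: converged
            break
        centers = new_centers
    return sorted(centers)

centers = k_means([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], k=2)
print(centers)  # two centers, one near each group of similar values
```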

The different types of neural network topologies differ in structure and operating principles; the basic ones are the multilayer perceptron (MLP) and the network with radial basis functions (RBF). The multilayer perceptron consists of many neurons arranged in layers; each neuron calculates the sum of its inputs, and the determined excitation level is the argument of the activation function, which then yields the calculated network output value. All neurons are arranged in a unidirectional structure in which the transmission of signals takes place in a strictly defined direction, from input to output. A key task in MLP network design is to determine the appropriate number of layers and neurons, which is usually done empirically. A network with radial basis functions often has only one hidden layer, containing radial neurons with a Gaussian characteristic, while a simple linear transformation is usually applied in the output layer. The task of the radial neurons is to recognize the repetitive and characteristic features of groups of input data.
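A forward pass through such an RBF network, with Gaussian hidden neurons and a linear output layer, can be sketched as follows (the centers, widths, weights, and bias are placeholders, not trained values from the study):

```python
import math

# Sketch of a forward pass through a small RBF network: Gaussian radial
# hidden neurons followed by a linear output neuron.

def rbf_forward(x, centers, widths, out_weights, bias):
    # each hidden neuron responds to the distance between x and its center
    hidden = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                       / (2.0 * s ** 2))
              for c, s in zip(centers, widths)]
    # linear output neuron: weighted sum of hidden activations plus bias
    return bias + sum(w * h for w, h in zip(out_weights, hidden))

# When the input coincides with a center, that neuron's activation is 1.0
y = rbf_forward([0.0, 0.0], centers=[[0.0, 0.0], [1.0, 1.0]],
                widths=[1.0, 1.0], out_weights=[1.0, 0.0], bias=0.0)
print(y)  # 1.0
```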

In order to develop the best network model, a number of constant and variable parameters were determined and tested by the multiple random sampling method, resulting in 10,000 network variants. The sum-of-squared-differences error generated for each set of test parameters was established as the criterion of network effectiveness. The constant parameters of the artificial neural network were:


The variable parameters of the network models were:


#### **4. Results and Discussion**

#### *4.1. Product Structure and Results*

The structure of the product intended for assembly is presented in the form of a modified directed graph of assembly states. We assumed that the parts (or assembled units) are marked as vertices of the directed graph (digraph), while the directed edges demonstrate the possible sequences (paths) for assembling them. It is further assumed that the assembly of further elements takes place by adding a part, or a subassembly consisting of several parts (treated as a single assembled part), to the nth-stage assembly. The directed edges connecting the vertices carry information about the stability of the newly formed assembly state, the direction of attachment of the parts, and the tool applied. The described digraph can be generated automatically, based on the CAD assembly drawing.

The basis for executing ASP according to the defined criteria for a specific assembly process is the determination of all assembly sequences that are feasible due to constraints of a constructional nature. The matrix record (e.g., in the form of an assembly states matrix or an assembly graph matrix) of assembly units enables us to determine all variants of assembly sequences using the appropriate algorithm (this procedure is not discussed here and it is reduced to finding all the paths in the digraph leading from the starting vertex *xs*, constituting the base part, to the final vertex *xe*, i.e., the last state of the assembled product—*xs*,... , *xe*).
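Finding all feasible sequences as paths from *xs* to *xe* can be sketched with a simple depth-first search (the adjacency list below is a toy example, not the actual assembly graph, and this is not necessarily the algorithm used in the study):

```python
# Sketch of enumerating all assembly sequences as paths in the digraph
# from the starting vertex xs (base part) to the final vertex xe
# (the last state of the assembled product).

def all_paths(graph, start, end, path=None):
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:            # assembly states are not revisited
            paths.extend(all_paths(graph, nxt, end, path))
    return paths

# toy digraph: vertices are assembly states, edges are feasible transitions
graph = {"xs": ["a", "b"], "a": ["xe"], "b": ["a", "xe"]}
print(all_paths(graph, "xs", "xe"))
```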

The task of determining the sequence of assembly using artificial neural networks was performed for a sample product—a forklift door, consisting of eight main assembly units:


In the first stage, using the construction documentation, the base part was determined in the form of assembly unit no. 1. Then, a digraph of the structural limitations of the assembly states was constructed, shown in Figure 4.

**Figure 4.** Digraph of the structural constraints of the forklift door assembly states.

It was assumed that the assembly of subsequent units takes place by adding another assembly unit to the assembly state of the nth stage. Based on the constructed digraph, recorded in the form of the assembly state matrix, we determined, with the use of a selected graph search algorithm, those assembly sequences that were possible under constraints of a structural nature (Table 1) [19,20], which constitute the basis for further analysis.

**Table 1.** Selected feasible assembly sequences generated due to design constraints.


Table 2 presents the most effective neural networks for predicting the assembly time of the discussed product. By assessing the values of the sum-of-squared-differences error and the effectiveness of the selected neural networks, it was found that the best results were obtained for network no. 9, the 3-8-1 RBF (Figure 5). We selected it for further analysis: a network with radial basis functions with three input neurons, eight hidden neurons, and one output neuron, in which the hidden neurons were activated by a Gaussian function and the output neuron by a linear one, obtaining about 99% efficiency for the group of verification data.

**Table 2.** Values of neural network parameters that were found best for prediction of assembly time.


**Figure 5.** RBF network model (x1 is the number of tool changes, x2 is the number of changes in assembly directions, x3 is a stability of the assembly unit, n1–n8 are hidden neurons, p is a bias, and y is an assembly time).

Figure 6 presents the changes in the value of the learning error of the selected RBF network depending on the number of learning cycles. The network was found in the first learning cycle, i.e., after the first iteration of the training algorithm. The error value stabilized in the sixth learning cycle.

**Figure 6.** Changes in the value of network learning errors depending on the number of learning cycles.

In the learning process of the neural network, the weight values for all neurons are adjusted. This has an impact on the obtained results because the weights can weaken (negative values) or strengthen (positive values) the signals transferred by individual layers of the network. Table 3 presents the weight values generated for the analyzed RBF network.

**Table 3.** Neural network weights for prediction of assembly time and network parameters that were found best for prediction of assembly time.


Table 4 summarizes the expected and actual assembly time prediction values, whereas Figure 7 is a graphical interpretation of their dependencies. A set of verification data containing previously unused input and output data was selected for the analysis. The results confirm the effectiveness of the prediction performed by the RBF 3-8-1 neural network: the expected values and those obtained at the network output are comparable. The operation of the network was tested on 10 random assembly sequences, and an assembly time result was obtained for each of them. Based on the results presented in Table 4, it can be determined which of the assembly sequences is characterized by the shortest assembly time and is therefore the optimal solution.


**Table 4.** Assembly time values expected and obtained at the network output.

**Figure 7.** Comparison of the expected and obtained assembly time values at the network output.
