The presented method is based on an ANN architecture called predictor–evaluator network (PEN), which was developed by the authors for this purpose. The predictor is the trainable part of the PEN and its task is to generate—based on input data—optimized geometries.
As mentioned, unlike the state-of-the-art methods, no conventionally topology-optimized or computationally prepared data are used in the training. The geometries used for the training are created by the predictor itself on the basis of randomly generated input data and evaluated by the remaining components of the PEN, called evaluators.
The evaluators perform mathematical operations. Unlike those of the predictor, the operations performed by the evaluators are predefined and do not change during the training. It would also be possible to use evaluators based on an ANN.
The predictor, the individual evaluators, their tasks and their way of operation are explained in detail in the following sections.
2.1. Basic Definitions
In topology optimization, the design domain is typically subdivided into elements by appropriate meshing.
Figure 2 visualizes the elements (with one element hatched) and nodes.
In this work, we examined only square meshes with equal numbers of rows and columns. However, this method can be used for non-square and three-dimensional geometries.
The total number of elements in the 2D case is as follows:
$$n = n_{\mathrm{row}} \cdot n_{\mathrm{col}},$$
where $n_{\mathrm{row}}$ is the number of rows and $n_{\mathrm{col}}$ the number of columns (see Figure 2). In the square case, the numbers of rows and columns are equal: $d = n_{\mathrm{row}} = n_{\mathrm{col}}$.
The design variables $x_i$, termed density values, scale the contributions of the single elements to the stiffness matrix. The density has a value of one when the stiffness contribution of the element is fully preserved and zero when it disappears. The density values are collected in a vector $\mathbf{x}$. In general, the density values $x_i$ are defined in the interval $[0,1]$. In order to prevent possible singularities of the stiffness matrix, a lower limit value $x_{\min} > 0$ for the entries of $\mathbf{x}$ is set as follows [2]:
$$x_i \in [x_{\min}, 1], \quad i = 1, \dots, n.$$
The vector of design variables $\mathbf{x}$ can be transformed to a square matrix $\mathbf{X}$ of order $d$ by using the $\operatorname{mat}$ operator:
$$\mathbf{X} = \operatorname{mat}(\mathbf{x}).$$
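A minimal sketch of this reshaping in NumPy (the operator names `mat` and `vec` and the row-major ordering are assumptions; the source only fixes that the two operators are mutually inverse):

```python
import numpy as np

def mat(x):
    """Reshape a density vector of length d*d into a square d x d matrix.

    The operator name and the row-major ordering are assumptions; the text
    only states that the two operators are inverses of each other.
    """
    d = int(round(len(x) ** 0.5))
    return np.asarray(x).reshape(d, d)

def vec(X):
    """Inverse operator: flatten the square density matrix back into a vector."""
    return np.asarray(X).reshape(-1)

x = np.linspace(0.0, 1.0, 16)  # density vector for a 4 x 4 mesh (d = 4)
X = mat(x)                     # 4 x 4 density matrix
```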
Although a binary selection of the density is desired (discrete TO, material present/not present), values between zero and one are permitted for algorithmic reasons (continuous TO). To get closer to the desired binary selection of densities, the so-called penalization can be used in the calculation of the compliance. The penalization is realized by an element-wise exponentiation of the densities by the penalization exponent $p$ [26].
The arithmetic mean of all $x_i$ defines the degree of filling $M$ of the geometry as follows:
$$M = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
The target value $M^{*}$ is the degree of filling that is to be achieved by the predictor.
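In NumPy, the fill degree and the penalization can be sketched as follows (the lower limit and the exponent value are illustrative, not taken from the source):

```python
import numpy as np

x_min, p = 0.001, 3.0   # lower density limit and penalization exponent (illustrative)
x = np.clip(np.array([0.0, 0.2, 0.9, 1.0]), x_min, 1.0)  # densities kept above x_min

M = x.mean()            # degree of filling: arithmetic mean of all density values
x_pen = x ** p          # element-wise penalization used in the compliance computation
```

Note how the penalization leaves densities near 0 or 1 almost unchanged while strongly devaluing intermediate values, which pushes the optimization toward a binary design.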
Figure 3 shows an overview of the processing of boundary conditions. In this figure, as in the following figures, the symbols represent the kinematic boundary conditions (structural supports) and the vectors represent the static boundary conditions (applied forces).
The kinematic boundary conditions are stored in two Boolean matrices, one for each displacement component. An entry of the first matrix is set to one if the x-component of the displacement in the corresponding node is fixed, and to zero otherwise. Analogously, the entries of the second matrix are set according to the fixed y-components of the displacements. Both matrices can be transformed into vectors with the $\operatorname{vec}$ operator, which is the inverse of the operator $\operatorname{mat}$, and then concatenated so that the kinematic boundary condition vector $\mathbf{b}$ is created.
Analogous to the kinematic boundary conditions, two matrices are first built on the basis of the static boundary conditions. The x- and y-components of the applied forces are placed into the respective matrices according to their magnitude, while the remaining entries are set to zero. The two matrices are then converted into the static boundary condition vector $\mathbf{s}$.
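A sketch of this encoding for a small 3 x 3 node grid (all matrix and vector names here are placeholders; the source defines the quantities via Figure 3):

```python
import numpy as np

n = 3                                     # nodes per row and column (small example)
Bx = np.zeros((n, n), dtype=bool)         # is the x-displacement of a node fixed?
By = np.zeros((n, n), dtype=bool)         # is the y-displacement of a node fixed?
Bx[:, 0] = By[:, 0] = True                # example support: clamp the whole left edge

Fx = np.zeros((n, n))                     # x-components of the applied forces
Fy = np.zeros((n, n))                     # y-components of the applied forces
Fy[1, 2] = -1.0                           # example load: downward unit force, right edge

b = np.concatenate([Bx.reshape(-1), By.reshape(-1)])  # kinematic BC vector
s = np.concatenate([Fx.reshape(-1), Fy.reshape(-1)])  # static BC vector
```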
Investigations showed that, for high-resolution geometries, the training speed could be increased by dividing the training into levels of increasing resolution. Since smaller geometries are trained several orders of magnitude faster and the knowledge gained is also used for the higher-resolution geometries, the overall training time is reduced compared to a training that uses only high-resolution geometries. The levels are labeled with an integer level number.
Increasing the level by 1 doubles the number $d$ of rows or columns of the design domain's mesh. This is done by quartering the elements of the previous level; in this way, the nodes of the previous level are kept in the new level. The number of rows or columns at the first level is denoted as $d_1$.
The input data of the predictor include the kinematic and static boundary conditions as well as the target degree of filling $M^{*}$. The output of the predictor is the vector $\mathbf{x}$ of density values, one for each element. Input data can be defined only at the initial level and do not change when the level is changed. Hence, new nodes cannot be subject to static or kinematic boundary conditions (see Figure 4). When the level is changed, only the dimension of the output changes; the dimension of the input remains constant. The change in level occurs after a certain condition, which will be described later, is fulfilled.
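The geometric relation between two consecutive levels (each element split into four, previous nodes preserved) can be illustrated as follows; the function name is a placeholder, and in the PEN method it is the predictor itself, not such an upsampling, that produces the higher-resolution output:

```python
import numpy as np

def refine(X):
    """Double the mesh resolution by splitting every element into four.

    Each density value is repeated in a 2 x 2 block, so the represented
    geometry is unchanged while the number of rows/columns d doubles.
    """
    return np.repeat(np.repeat(X, 2, axis=0), 2, axis=1)

X1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])   # level-1 geometry, d = 2
X2 = refine(X1)               # level-2 geometry, d = 4
```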
2.2. Predictor
The predictor is responsible for generating the optimized result for a given input data point. Its ANN architecture consists of multiple hidden layers, convolutional layers and output layers (see
Figure 5).
All parameters that can be changed during training, such as the biases, the slopes of the parametric rectified linear units (PReLU) and the weights of the hidden layers, are generally referred to as trainable parameters in the following. They are collected in the matrix $\mathbf{P}$. The operations performed by the predictor can be represented by a function $f_{\mathrm{pred}}$ as follows:
$$\mathbf{x} = f_{\mathrm{pred}}(\mathbf{b}, \mathbf{s}, M^{*}, \mathbf{P}),$$
where $\mathbf{b}$ and $\mathbf{s}$ are the kinematic and static boundary condition vectors and $M^{*}$ is the target degree of filling.
The predictor’s topology is shown in
Figure 5 in a simplified form. An input data point (top left) is processed by several successive hidden blocks and then passed on to some residual network (ResNet) blocks. In order to reduce the resolution to a lower level, average pooling is used.
In
Figure 5, the hidden block is the combination of a hidden (fully connected) layer and an activation function. The ResNet block is the combination of two (convolutional) layers and a shortcut that is added as a bypass to the output of the layers. The ResNet block allows for faster learning and also reduces the error [27].
The PReLU function [28] is used as the activation function in the hidden and convolutional layers. It is equivalent to the rectified linear unit (ReLU) function [28]
$$\mathrm{ReLU}(z) = \max(0, z),$$
with the difference of a variable negative slope $a$, which can be adapted during training:
$$\mathrm{PReLU}(z) = \begin{cases} z & \text{if } z > 0, \\ a z & \text{otherwise.} \end{cases}$$
The sigmoid function
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
is well suited as an activation function for the output layer because it provides results in the interval (0,1); see Figure 6. This makes the predictor's output directly suitable to describe the density values of the geometry.
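The two activation functions can be sketched directly (the initial PReLU slope value is an assumed illustration, not taken from the source):

```python
import numpy as np

def prelu(z, a=0.25):
    """PReLU: identity for positive inputs, trainable slope a for negative inputs.

    The initial slope value is illustrative; with a = 0 the function reduces
    to the ordinary ReLU, max(0, z).
    """
    return np.where(z > 0, z, a * z)

def sigmoid(z):
    """Output activation: maps any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
y_hidden = prelu(z)     # negative input scaled by a, positive input unchanged
y_out = sigmoid(z)      # strictly between 0 and 1, usable as density values
```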
2.3. Evaluator: Compliance
The task of the compliance evaluator is the computation of the global mean compliance. For this purpose, an algorithm based on the finite element method (FEM) [26] is used. The global mean compliance $c$ is defined according to [26] as follows:
$$c = \mathbf{F}^{\mathsf{T}} \mathbf{u} = \mathbf{u}^{\mathsf{T}} \mathbf{K}\, \mathbf{u},$$
where $\mathbf{K}$ is the stiffness matrix, $\mathbf{F}$ is the force vector and $\mathbf{u}$ is the displacement vector. The compliance has the dimension of energy. As is usual in [26,29], the units will be omitted in the following for the sake of simplicity.
As already explained, the static boundary condition vector $\mathbf{s}$ consists first of x-entries and then of y-entries. Since the degrees of freedom of the stiffness matrix are arranged in an alternating way (one x-entry, then one y-entry), the force vector must be built accordingly. In order to transform the static boundary condition vector $\mathbf{s}$ into the force vector $\mathbf{F}$, the number of nodes and a collocation matrix $\mathbf{C}$ are required. The force vector is then obtained as follows:
$$\mathbf{F} = \mathbf{C}\,\mathbf{s}.$$
The system's equation is as follows:
$$\mathbf{K}\,\mathbf{u} = \mathbf{F}.$$
The stiffness matrix $\mathbf{K}$ depends on the geometry $\mathbf{x}$ and is expressed by the following:
$$\mathbf{K} = \sum_{i=1}^{n} x_i^{\,p}\, \mathbf{K}_i,$$
where the matrices $\mathbf{K}_i$ are the unscaled contributions of the single elements to the stiffness matrix. The penalization exponent $p$ achieves the desired focusing of the density values toward the limits $x_{\min}$ and 1, as described in Section 2.1.
The stiffness matrix $\mathbf{K}$ is then reduced by removing the rows and columns corresponding to the degrees of freedom fixed by the kinematic boundary conditions. The result is the reduced stiffness matrix $\mathbf{K}_{\mathrm{red}}$, which can then be inverted. The reduced force vector $\mathbf{F}_{\mathrm{red}}$ is determined according to the same principle. From the reduced equation
$$\mathbf{K}_{\mathrm{red}}\,\mathbf{u}_{\mathrm{red}} = \mathbf{F}_{\mathrm{red}},$$
the reduced displacement vector is obtained as follows:
$$\mathbf{u}_{\mathrm{red}} = \mathbf{K}_{\mathrm{red}}^{-1}\,\mathbf{F}_{\mathrm{red}}.$$
The reduced global mean compliance $c_{\mathrm{red}}$ is finally computed as follows:
$$c_{\mathrm{red}} = \mathbf{F}_{\mathrm{red}}^{\mathsf{T}}\,\mathbf{u}_{\mathrm{red}}.$$
The calculation of the global mean compliance $c$ according to (12) or of $c_{\mathrm{red}}$ according to (20) leads to the same result, since the displacement at the fixed degrees of freedom vanishes and, therefore, has no effect on $c$.
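The reduce-solve-evaluate pipeline of the compliance evaluator can be illustrated with a deliberately small stand-in system: two spring elements in series instead of a 2D FEM mesh (the element matrices and all numerical values are toy assumptions; only the sequence of operations mirrors the text):

```python
import numpy as np

# Unscaled element contributions K_i for two unit springs in series (3 DOFs).
K1 = np.array([[ 1.0, -1.0,  0.0],
               [-1.0,  1.0,  0.0],
               [ 0.0,  0.0,  0.0]])
K2 = np.array([[ 0.0,  0.0,  0.0],
               [ 0.0,  1.0, -1.0],
               [ 0.0, -1.0,  1.0]])

x = np.array([1.0, 0.5])                 # element density values
p = 3.0                                  # penalization exponent
K = (x[0] ** p) * K1 + (x[1] ** p) * K2  # penalized global stiffness matrix

F = np.array([0.0, 0.0, 1.0])            # unit force at the free end
free = [1, 2]                            # DOF 0 is fixed by the kinematic BC

K_red = K[np.ix_(free, free)]            # remove fixed rows and columns
F_red = F[free]
u_red = np.linalg.solve(K_red, F_red)    # reduced displacement vector
c_red = float(F_red @ u_red)             # reduced global mean compliance
```

Using `np.linalg.solve` instead of an explicit matrix inverse is numerically preferable, but the result is the same as applying the inverted reduced stiffness matrix.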
2.5. Evaluator: Filter
The filter evaluator searches for checkerboard patterns in the geometry and outputs a scalar value that indicates the amount and extent of the checkerboard patterns detected. These checkerboard patterns consist of alternating high and low density values. They are undesirable because they do not reflect the optimal material distribution and are difficult to transfer to real parts; they arise from poor numerical modeling [30].
Several solutions for the checkerboard problem were developed in the framework of conventional topology optimization [31]. In this work, a new strategy was chosen, which allows for the inclusion of the checkerboard filter into the quality function. In the present approach, checkerboard patterns are admitted but are detected and penalized accordingly. Since this type of implementation is fundamentally different, the conventional filter method cannot be directly compared with the filter evaluator. With the convolution matrix $\mathbf{W}$, the following two-dimensional convolution operation (discrete convolution) is performed on the density matrix $\mathbf{X}$:
$$\mathbf{G} = \mathbf{X} * \mathbf{W}.$$
In detail, the convolution operation is carried out as follows:
$$G_{jk} = \sum_{m} \sum_{n} W_{mn}\, X_{j+m,\,k+n}.$$
The convolution matrix is visualized in Figure 7 for an exemplary case. The matrix $\mathbf{G}$ has high values in areas where the geometry has checkerboard patterns. A first indicator can be computed as the mean value of the convolution matrix:
$$\bar{G} = \operatorname{mean}(\mathbf{G}).$$
This indicator would already be sufficient to exclude geometries with checkerboard patterns, but it also penalizes good geometries without recognizable checkerboard patterns. Therefore, an improved indicator is formed on the basis of the mean value with the help of the exponential function, which is less sensitive to small mean values but nevertheless results in a corresponding penalization for large checkerboard patterns:
$$F = e^{\,\beta \bar{G}},$$
where the parameter $\beta$ controls the shape of the $F$-function (see Figure 8).
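A sketch of the filter evaluator under explicit assumptions: the kernel entries, the use of the response magnitude, and the exponential shape are illustrative stand-ins for the matrix shown in Figure 7 and the function shown in Figure 8:

```python
import numpy as np

# 2 x 2 kernel that responds to alternating densities; the actual convolution
# matrix used in the paper is shown in Figure 7, so these entries are assumed.
W = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

def checkerboard_indicator(X, beta=2.0):
    """Convolve the density matrix with W, average the magnitude of the
    response and map it through an exponential (shape parameter beta)."""
    d = X.shape[0]
    G = np.zeros((d - 1, d - 1))
    for j in range(d - 1):
        for k in range(d - 1):
            G[j, k] = abs(np.sum(W * X[j:j + 2, k:k + 2]))
    return float(np.exp(beta * G.mean()))

checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float)  # worst-case pattern
solid = np.ones((4, 4))                                       # no pattern at all
```

A homogeneous geometry yields a zero mean response and thus the minimal indicator value, while a perfect checkerboard is penalized exponentially.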
2.7. Quality Function and Objective Function
The task of the quality function is to combine all evaluator losses into one scalar value. The following additional requirements must be considered:
- The function should have a simple mathematical form, in order not to complicate the minimum search.
- The function must be monotonically increasing with respect to the evaluators' losses.
- The function must contain coefficients to control the relative influence of the evaluators' losses.
The most obvious variant fulfilling these criteria is a linear combination of the losses. The problem with this choice is the different and variable order of magnitude of the compliance loss with respect to the other losses: for a given choice of the coefficients, the relative influence of the losses changes for different parametrizations and input data points. To avoid this drawback, a quality function of the following multiplicative form is chosen:
$$Q = \prod_{k} \left( a_k L_k + 1 \right),$$
where the $L_k$ are the evaluator losses and the $a_k$ the corresponding coefficients. The addition of the constant value 1 prevents the quality function from being dominated by one loss when its value is close to zero.
For every single data point, one value of the quality function exists. Optimization on the basis of single data points would require a large computational effort and lead to instabilities of the training process (large jumps of the objective function output). Therefore, a batch of training data points is used, and the corresponding quality function values are combined into one scalar, which works as the objective function for the optimization that rules the training. The value of the objective function $J$ is calculated as the arithmetic mean of the quality function values obtained for the single training data points of the batch. Investigations showed that averaging the quality function outputs over numerous training data points stabilizes the training procedure. The disadvantage of this averaging is the possibility of forming prejudices: e.g., if one element is frequently present, its frequency is learned even if the element's contribution to the stiffness is in some cases small or non-existent.
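The combination of quality and objective function can be sketched as follows (the multiplicative form with a constant of 1 follows the description above; the coefficient values and loss numbers are invented for illustration):

```python
import numpy as np

def quality(losses, coeffs):
    """Multiplicative quality function: each factor (a_k * L_k + 1) keeps the
    relative influence of a loss independent of its absolute magnitude."""
    q = 1.0
    for L, a in zip(losses, coeffs):
        q *= a * L + 1.0
    return q

coeffs = (1.0, 10.0)     # illustrative coefficients for (compliance, filter) losses
batch = [(3.2, 0.01), (2.8, 0.00), (4.1, 0.05)]   # losses per training data point

J = float(np.mean([quality(L, coeffs) for L in batch]))  # objective: batch mean
```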
2.8. Training
The overview in Figure 9 describes the training process for a single level. It shows that, during a batch iteration, the input data points are generated randomly and then passed to the predictor as well as to the evaluators.
Within one batch, the input data points are randomly generated, and the predictor creates the corresponding geometries $\mathbf{x}$. Afterwards, the quality function is computed from the evaluators' losses, according to (30). The objective function $J$ is then calculated for the whole batch. Then, the gradient of the objective function with respect to the trainable parameters is calculated. The trainable parameters of the predictor for the next batch are then adjusted according to the gradient descent method in order to decrease the value of the objective function. In order to apply the efficient gradient descent method, the functions must be differentiable with respect to the trainable parameters [32]. For this reason, the evaluators and the objective function use only differentiable functions.
When the level increases, the predictor outputs a geometry with higher resolution, and the process starts again with the first batch.
It is important to stress that, unlike conventional topology optimization, the PEN method does not optimize the density values of the geometry, but only the weights of the predictor.
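The batch loop can be illustrated with a deliberately tiny stand-in: a one-parameter "predictor" trained by gradient descent against a differentiable reference mapping that plays the role of the evaluators (everything here, from the loss to the learning rate, is a toy assumption, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = 0.5        # single trainable parameter of the toy "predictor"
w_ref = 3.0    # parameter of a reference mapping standing in for the evaluators
lr = 2.0       # learning rate of the gradient descent

for batch in range(300):
    u = rng.uniform(-1.0, 1.0, size=32)   # randomly generated input data points
    x = sigmoid(w * u)                    # predictor output ("densities")
    t = sigmoid(w_ref * u)                # what the stand-in evaluators reward
    J = np.mean((x - t) ** 2)             # objective: batch mean of the losses
    # analytic gradient dJ/dw (chain rule through the sigmoid)
    grad = np.mean(2.0 * (x - t) * x * (1.0 - x) * u)
    w -= lr * grad                        # adjust only the trainable parameter
```

As in the PEN method, the loop never touches the output densities directly: only the trainable parameter is updated, and better geometries emerge because the parameter improves.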