**Sense and Respond: Industrial Applications of Smart Sensors in Cyber-Physical Systems**

Editors

**Javier Villalba-Diez Joaquín Ordieres-Meré**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Javier Villalba-Diez, Management und Vertrieb, Hochschule Heilbronn, Schwäbisch Hall, Germany

Joaquín Ordieres-Meré, Industrial Engineering School, Universidad Politécnica de Madrid, Madrid, Spain

*Editorial Office* MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Sensors* (ISSN 1424-8220) (available at: www.mdpi.com/journal/sensors/special\_issues/industrial\_applications\_smart\_sensors).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-3814-3 (Hbk) ISBN 978-3-0365-3813-6 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

### **About the Editors**

#### **Javier Villalba-Diez**

Javier Villalba-Diez, PhD., aims to empower individuals and organizations to achieve their strategic goals while increasing trust.

Dr. Villalba-Diez is a Mechanical Engineer from the Technische Universität München, Germany, and an Industrial Engineer from the Technical University of Madrid, Spain (2003). He received his PhD in Engineering, Economics and Organizational Innovation, with a focus on Strategic Organizational Design, from the Universidad Politécnica de Madrid in 2016. His PhD was awarded the prize for the best doctoral thesis. In March 2022, he received a second PhD in Engineering and Applied Physics from the Universidad Politécnica de Madrid, with a focus on industrial applications of Quantum Computation.

His current research interests include Quantum Computing, Deep Learning, Hoshin Kanri, Strategic Organizational Design, Business, and Artificial Intelligence. He has 15 years' worth of experience as a lean consultant and production manager in a number of positions related to manufacturing operations in German, American, and Japanese manufacturing facilities.

His research and work, currently performed at Hochschule Heilbronn in Germany, have brought him to numerous companies and hundreds of factories, where he collaborates with people to test ideas and share lessons learned.

He splits his time between Germany, USA, Japan, and Spain.

#### **Joaquín Ordieres-Meré**

Prof. Ordieres-Meré is a full professor at the Industrial Engineering School of the Universidad Politécnica de Madrid, Spain. His research aims to increase the understanding of integrated manufacturing processes and their identification and optimization with the help of business analytics tools, including artificial intelligence and quantum computing.

In particular, complex processes involving humans and technological devices, as well as complex socio-technical systems, are targeted. The aim is to translate the gained knowledge into advanced tools that help the decision-making processes managers need to carry out, by providing them with either support or guidance.

Different types of processes and industries have been explored, such as building physics, steelmaking, and the rubber industry. In addition, attention has also been paid to more scientific fields, such as pollution prediction and digital astrophysics, where similar tools have helped to bring additional value and knowledge.

His research has been cited more than 8000 times, and he has published more than one hundred and fifty journal papers and a similar number of conference papers as well as fifteen patent applications.

### **Preface to "Sense and Respond: Industrial Applications of Smart Sensors in Cyber-Physical Systems"**

Over the past century, the manufacturing industry has undergone a number of paradigm shifts: from the Ford assembly line (1900s) and its focus on efficiency to the Toyota production system (1960s) and its focus on effectiveness and JIDOKA; from flexible manufacturing (1980s) to reconfigurable manufacturing (1990s) (both following the trend of mass customization); and from agent-based manufacturing (2000s) to cloud manufacturing (2010s) (both deploying the value stream complexity into the material and information flow, respectively).

The next natural evolutionary step is to provide value by creating industrial cyber-physical assets with human-like intelligence. This will only be possible by further integrating strategic smart sensor technology into the manufacturing cyber-physical value-creating processes in which industrial equipment is monitored and controlled for analyzing compression, temperature, moisture, vibrations, and performance. For instance, in the new wave of the 'Industrial Internet of Things' (IIoT), smart sensors will enable the development of new applications by interconnecting software, machines, and humans throughout the manufacturing process, thus enabling suppliers and manufacturers to rapidly respond to changing standards. This reprint of "Sense and Respond" aims to cover recent developments in the field of industrial applications, especially smart sensor technologies that increase the productivity, quality, reliability, and safety of industrial cyber-physical value-creating processes.

This reprint is dedicated to Moritz Seydler, a boy with great talents. Remember that discipline is the root of all good qualities.

> **Javier Villalba-Diez and Joaquín Ordieres-Meré** *Editors*

## **Geometric Deep Lean Learning: Deep Learning in Industry 4.0 Cyber–Physical Complex Networks**

**Javier Villalba-Díez 1,2,3, \* , Martin Molina 2 , Joaquín Ordieres-Meré 3 , Shengjing Sun 3,4 , Daniel Schmidt 3,5 and Wanja Wellbrock 1**


Received: 31 December 2019; Accepted: 29 January 2020; Published: 30 January 2020

**Abstract:** In the near future, value streams associated with Industry 4.0 will be formed by interconnected cyber–physical elements forming complex networks that generate huge amounts of data in real time. The success or failure of industry leaders interested in the continuous improvement of lean management systems in this context is determined by their ability to recognize behavioral patterns in these big data structured within non-Euclidean domains, such as these dynamic sociotechnical complex networks. We assume that artificial intelligence in general and deep learning in particular may be able to help find useful patterns of behavior in 4.0 industrial environments in the lean management of cyber–physical systems. However, although these technologies have meant a paradigm shift in the resolution of complex problems in the past, the traditional methods of deep learning, focused on image or video analysis, both with regular structures, are not able to help in this specific field. This is why this work focuses on proposing geometric deep lean learning, a mathematical methodology that describes deep-lean-learning operations such as convolution and pooling on cyber–physical Industry 4.0 graphs. Geometric deep lean learning is expected to positively support sustainable organizational growth because customers and suppliers ought to be able to reach new levels of transparency and traceability on the quality and efficiency of processes that generate new business for both, hence generating new products, services, and cooperation opportunities in a cyber–physical environment.

**Keywords:** Industry 4.0; IIoT; geometric deep learning; lean management

#### **1. Introduction**

Today it seems almost a truism to talk about the fact that data surround us. According to recent studies, by 2025 humanity will have created about 163 zettabytes of information [1]. However, the alarming thing is not that we are going to be flooded with data, but that these data will be very different from the data with which we are used to dealing in classical disciplines such as signal or image processing, statistics, or machine learning. Beyond this, the data we will face are data that will emerge from the trillions of objects connected to the Internet of Things (IoT). In many cases, including the industrial IoT (IIoT), these data are produced by distributed sources, such as thousands

of sensors in factories, i.e., data are distributed over networks. Managing large amounts of data in these ever-expanding networks raises nontrivial concerns about the efficiency of data collection, processing, analysis, and security [2,3]. Currently, data from processes and systems are collected and stored without a clear strategy, and this can be a barrier to implementing paradigms such as "social manufacturing" [4]. In addition to being distributed, these data may be unstructured and therefore cannot generally be encapsulated in one table. A defined strategy is therefore needed on what kind of data to collect at the technical and the organizational level. Finally, in addition to numerical, data can be ordinal, categorical, or of other types. The aim of this work is to introduce the reader to a series of concepts that pave the way for processing these data by means of adapted deep-learning techniques [5].

The purpose of this work is to study the possibility of providing Industry 4.0 leaders with a theoretical model that allows for the extraction of relevant patterns embedded within their organizations by means of artificial intelligence. Specifically, the goal of this work is to provide the reader with mathematical models that adapt convolutional and pooling deep-learning operations, hence describing the possible use of geometric deep-learning architectures on non-Euclidean Industry 4.0 complex cyber–physical networks. The structure of this work is as follows: First, Section 2 provides relevant background information, clarification, and definitions. Second, Section 3 provides a framework of previous relevant concepts regarding deep learning, specifically geometric deep learning. Third, Section 4 provides mathematical models to compute geometric-deep-learning algorithms over Industry 4.0 lean-management complex-networked cyber–physical systems. Finally, Section 5 outlines the conclusions and managerial implications of this model, and its applications in the field.

#### **2. Background**

This brief section presents and defines the fundamental preliminary concepts needed for a comprehensive understanding of the content presented in the following sections of this work:


Within this framework, a complex network is defined as a graph with nontrivial topological features that do not occur in simple graphs such as lattices and random networks [18]. For any given time *t*, lean complex cyber–physical networks can be formally described by time-dependent graphs Ω(*t*) = [*N*(*t*); *E*(*t*)], understood as lists of *N*(*t*) nodes and *E*(*t*) ⊂ (*N*(*t*) × *N*(*t*)) edges that represent their human and cyber–physical nodes and their standard communication edges [19]. Given the static graph at *t*, Ω(*t*), each node and edge can be characterized by a series of typically two-dimensional signals *x* = [*x*1, . . . , *xn*] ∈ (R*<sup>n</sup>* × R*<sup>m</sup>*), where the *n* relevant parameters of the node or edge are described as time series of *m* elements. In the case of human nodes, signals typically represent demographic, sociological, or competence information; in the case of cyber–physical nodes, they represent relevant information on the state of the node, expressed as time series of several key performance indicators. In the case of edges, signals typically represent information on the quality of the measurable relationships of the individual with other stakeholders of the organization (human–human edges), or the time series associated with relevant key performance indicators being reported to other stakeholders (cyber–physical-to-human edges). Specifically, snapshots of the time-dependent graph can be built; that is, the time-dependent graph is considered as an ordered sequence of potentially different static graphs, as given by Expression 1.

$$\Omega = \left[ \Omega(t\_1), \Omega(t\_2), \dots, \Omega(t\_k) \right] \tag{1}$$

This method is most commonly used for modeling discrete time-dependent graphs, and is suitable for time-dependent graphs with a specific time structure, especially in real-time networks such as complex-networked cyber–physical systems [20]. This modeling method is assumed here, and the time sequence of static graphs is not explicitly mentioned when referring to time-dependent graphs.
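As a minimal illustration of this snapshot model and of Expression (1), the time-dependent graph Ω can be sketched as an ordered list of static graphs, each carrying node signals. The node names, the three-snapshot sequence, and the signal dimensions below are illustrative assumptions, not data from this work:

```python
import numpy as np

def make_snapshot(nodes, edges):
    """Return one static graph Omega(t) as a (node set, edge set) pair."""
    return (frozenset(nodes), frozenset(edges))

# Omega = [Omega(t_1), Omega(t_2), Omega(t_3)]: a node and an edge
# appear as the sociotechnical network evolves over time.
omega = [
    make_snapshot({"press", "robot"}, {("press", "robot")}),
    make_snapshot({"press", "robot", "operator"}, {("press", "robot")}),
    make_snapshot({"press", "robot", "operator"},
                  {("press", "robot"), ("operator", "press")}),
]

# Each snapshot carries node signals x in R^(n x m): n parameters per
# snapshot (here one row per node), each a time series of m = 4 elements.
signals = {t: np.random.rand(len(nodes), 4)
           for t, (nodes, _) in enumerate(omega)}
```

A real deployment would replace the random signals with measured key performance indicators per node and edge.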

As a consequence of these references, it can be stated that cyber–physical complex-networked lean-management systems in an Industry 4.0 context can be understood as management systems that systematically try to reduce the intrinsic variability of industrial value-creation processes by understanding them as complex networks of computational and physical elements.

#### **3. Related Work**

Within this framework, the work approaches the interpretation of strategic information contained in Industry 4.0 cyber–physical complex-networked lean-management systems from two main vectors: social and technical strategic organizational design complexity. As shown in the research overview in Table 1, these two research directions have been intensively examined at three levels of complexity (micro-, meso-, and macroscopic). These organizational levels are visualized in the graphical abstract of Figure 1 for clarity, but it should be noted that this classification is purely synthetic; in reality, cyber–physical systems in an Industry 4.0 context present a continuous complexification of networks arranged in nested hierarchies. This by no means suggests that one level of aggregated complexity is more difficult to deal with than a less aggregated one. In fact, the opposite is often true. For example, in the study of value-creating cyber–physical systems, shop-floor management has been studied for decades almost exclusively with qualitative methods and common sense [21–23]. Deep learning has recently been used to extract statistical patterns from cyber–physical systems at certain microscopic local levels [24,25]; however, there is an urgent need for algorithms that ensure a holistic understanding of cyber–physical systems at the meso- and macroscopic levels of complex-networked aggregation.

**Figure 1.** Macroscopic, mesoscopic, and microscopic levels of organizational sociotechnical complexity.


**Table 1.** Research overview.

Subsequently, a research hypothesis can be formulated. Due to the high potential shown by deep learning in a wide range of applications, we could hypothesise that deep learning can be used to find patterns within Industry 4.0 lean-management complex-networked cyber–physical systems, which takes us to the concept of geometric deep lean learning. The network analogy proposed in this work, together with the global analysis of the evolving networks and the geometric deep lean learning of the local relations between agents, provides an adequate context for establishing which data to collect and how to structure their analysis in a general and systematic way.

Within this context, there are two main resource-organizing classes for integrating deep learning in Industry 4.0 cyber–physical contexts with regard to different assumptions on data acquisition:


Deep-learning algorithms are built by stacking data-processing filters—layers—in deep architectures [5]. These layers extract increasingly accurate representations of the data fed into them through a series of algebraic operations, such as convolution (learning local patterns of feature maps) and pooling (downsampling of feature maps). A key reason for the success of these classical deep-learning applications to time series, images, or video processing lies in their underlying Euclidean or grid-like data-structure space. The ability to leverage the statistical properties of such data through local statistics is possible because of the shift invariance, local connectivity, and multiresolution of the dataset. For instance, in a color image, pixels are placed together (shift invariance), present local properties (local connectivity), and present a red–green–blue layered color structure (multiresolution). The use of convolution and pooling imposes conditions on the dataset while extracting local features shared throughout images, making it suitable for the problem without sacrificing the expressive capacity of the network. In fact, the graph Laplacian L = *D* − *A* that supports the information contained in the images is constant [70], where *D* and *A* represent the degree and adjacency matrices of the graph, respectively [19]. This allows a series of mass algebraic operations that make the magic of deep learning possible. However, at an organizational level, networks associated with Industry 4.0 lean-management cyber–physical systems are, by definition, dynamic and do not present these characteristics.
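As a small worked example of the Laplacian L = D − A mentioned above (the 4-node path graph is an illustrative assumption):

```python
import numpy as np

# Adjacency matrix A of a 4-node path graph: 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # combinatorial graph Laplacian

# L is symmetric and its rows sum to zero, so the constant vector is an
# eigenvector with eigenvalue 0.
assert np.allclose(L.sum(axis=1), 0)
assert np.allclose(L, L.T)
```

For a static image grid this Laplacian is fixed, which is exactly the property that dynamic Industry 4.0 networks lack.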

The fundamental idea of deep learning is the assumption that the data to be studied come from the combination of different attributes at multiple hierarchical levels. An important underlying concept in this context is that of the manifold. A manifold can be intuitively understood as a locally Euclidean space. Earth, for example, can be understood as a gigantic ellipsoid, but to a human at a point on its surface, it appears to be a plane. In other words, a manifold is an interconnected region: a series of points associated with its surrounding environment. From any of these points, the manifold appears to be locally Euclidean. Formally speaking, a differentiable manifold *X* of dimension *d* is a topological space in which each point *x* has a neighborhood that is homeomorphic to a Euclidean space of dimension *d*, called the tangent space *TxX* [71]. If the manifold is equipped with a Riemannian metric, that is, an inner product ⟨·, ·⟩*TxX* : (*TxX*) × (*TxX*) → R, then the manifold is called a Riemannian manifold. The set of tangent spaces at all points is known as the tangent bundle *TX* and is assumed to depend smoothly on the position *x*. It is precisely this feature that is exploited by machine-learning algorithms. The condition for this is the implicit assumption that interesting points occur only in a collection of manifolds in directions tangent to the *TX* planes, with statistically interesting variations happening only when switching manifolds.

In other words, manifolds are topological spaces locally homeomorphic to Euclidean spaces. Complex networks, the object of this study, can be described by complexes of nodes and edges (i.e., triangles) that can be treated as discrete types of manifolds [72]. As has been described before [73–75], these can be understood as manifolds in order to explain problems related to evolutionary manifolds using the theory of complex evolutionary networks. Specifically, deep learning applied to graphs usually considers graphs as manifolds; for this reason, we can consider deep lean learning as a manifold-learning challenge. In the following sections, the consideration of graphs as manifolds is not geometrically rigorous, and these manifolds might not be as smooth as previously defined. Classical applications of deep learning to graphs [76] focus on static networks, but cyber–physical systems represented by complex networks are dynamic in nature, as the nodes (both human and cyber–physical) and the sociotechnical relations between them are constantly evolving.

For this reason, in order to discover statistical patterns within lean-management cyber–physical systems by means of deep learning, it is necessary to either transform existing data into figures that can be interpreted by classical approaches, or to generalize the concept of deep learning to dynamic networks. The first strategy was successfully implemented by one of the authors [13]. The second strategy follows in the footsteps of geometric deep learning.

Geometric deep learning is an emerging technique to generalize deep-learning models to non-Euclidean domains, such as certain graphs and manifolds [70]. The wide variety of domains in which geometric deep learning has so far been useful can be summarized in four categories:

• Graphwise classification. For instance, in the classification of molecules [77]. In this model, atoms represent the nodes, and chemical bonds are the edges of a graph. Research aims to extract certain features that predict certain properties of the molecule. This is relevant, for instance, to the pharmaceutical companies that are in the business of drug design. Some of these properties are toxicity and water solubility. Given a graph, researchers aim to classify a molecule graph. This is analogous to classical deep-learning-based visual image classification [78].


Existing approaches to implement geometric deep learning can be classified into two broad categories: spectral and local filtering methods.

• Spectral filtering methods.

Spectral filtering methods make use of the spectral eigendecomposition of the graph Laplacian to define convolution-like operators in a mathematically elegant way. The fundamental limitation of the spectral construction is that it can only be applied to single, static domains. This is because the filter coefficients depend on the eigenvector and eigenvalue basis of the graph Laplacian, which is highly dependent on the network architecture [70]. This approach is not suitable for our needs because of the dynamic characteristics of Industry 4.0 lean-management cyber–physical complex systems and their associated complex networks.

• Local filtering methods.

Local filtering methods, on the other hand, are not topology-dependent and fall within the frame of signal processing on graphs [81]; they are more suitable in this setting, in particular for defining an operation similar to convolution in this domain [82].

#### **4. Geometric Deep Lean Learning Over Industry 4.0 Lean-Management Complex-Networked Cyber–Physical Systems**

According to Immanuel Kant, a science is not a science until there is a relation to mathematics. Although this characterization is provocative, and few would state it in such absolute terms today, the implicit main question remains valid: can we find mathematical expressions that explain, process, and learn from network data, especially from complex-networked cyber–physical systems? This question is the motivation for this work, both for its practical and theoretical interest. On the one hand, empirically speaking, the processing of signals on graphs from complex cyber–physical networks is of rapidly growing importance due to the unstoppable emergence of technologies such as the IIoT and blockchain. On the other hand, the theoretical field of artificial intelligence constantly needs to develop new algorithms and computational architectures before their practical application becomes possible.

Applied to the analysis of complex-networked cyber–physical systems in the context of Industry 4.0, this leads to two classes of problem formulations that geometric deep lean learning theoretically solves:

• Strategic organizational design. Performing classical inference problems [76].

Recently, it has been shown that this classification can be considerably improved by using information about the proximity environment [83,84]. Analyzing signals on graph vertices and edges could potentially help to learn inherent structures of the graphs, such as organizational clusters, with better accuracy than that provided by topological information alone—this is a strategic challenge to which organizational design tries to respond.

• Trust and power structures. Learning hidden organizational properties.

Although deep learning has been employed in a wide variety of fields of knowledge, such as modeling social influence [85] and computer vision [86–88], it is important to incorporate knowledge about the domain to be treated into the model. For example, in order to build a deep-learning model for the study of a network of sensors in a cyber–physical system of Industry 4.0, it might be useful, in a first approximation, to choose the edge weights of the graph as a decreasing function of the distance between nodes, as this would lead to a smooth graph-signal model [89]; however, this would not be suitable for a lean structural network, because adjacency does not necessarily mean similarity [14]. For this reason, the model of the graph to be used can be superimposed on other structures, instead of being a pure unconnected abstraction. In other words, the graph that represents the complex-networked cyber–physical system in an Industry 4.0 context can be studied from different perspectives, superimposing it on a specific sociotechnical environment that helps to better understand the statistical information that it contains. As a consequence, the integration of these priors is a fundamental challenge for the success of geometric deep lean learning. Some examples are the structures of power or trust between the different actors of an organization: fundamental variables that influence the success of an organization, but that remain elusive, since they often cannot be directly measured. Geometric deep lean learning could be applied to learn these parameters as weights between the nodes of the complex organizational network.

These problems reduce to fitting a time-dependent tensor *A*(*t*), so that Ω(*t* + 1) ≈ *A*(*t*)· Ω(*t*) [90]. The hypothesis underlying this objective is that *x*(*t* + 1) ≈ *A*(*t*)· *x*(*t*) where *A*(*t*) is constant in a window of time. The reason why we can take this assumption as true is that complex networks associated with cyber–physical systems in Industry 4.0 environments do not have very high variability [14]. As a result, a sufficiently small time window can always be found in which the hypothesis is sufficiently true.

Generalizing deep-lean-learning models to dynamic structured data in complex graphs requires a detailed description of the non-Euclidean equivalents to the basic elements of deep learning (convolutional layers and downsampling "pooling"), locally applied to each of the graph elements [70]:

• Convolution on non-Euclidean complex-networked cyber–physical graph time-dependent signals.

As expressed in Expression 1, for a weighted time-dependent directed graph Ω(*t*), we consider a series of signals *x* = [*x*(1), . . . , *x*(*n*)] ∈ (R*<sup>n</sup>* × R*<sup>m</sup>*) expressed on its human and cyber–physical nodes and on its standard communication edges, in which the components of *x<sup>a</sup>* reside at or emanate from node *a*.

For each node, we define a proximity environment given by the set *N<sup>a</sup>* = {*b* : (*b*, *a*) ∈ *E*} of nodes *b* connected with *a*. This set *N<sup>a</sup>* is characterized by an R*<sup>N×N</sup>* matrix *S*, called the network-translation matrix operator, that defines the manifold metric. We define *S* as the graph adjacency matrix, the Laplacian of the graph, or any other normalization of it, i.e., as a linear transformation that encodes the structure of the graph. Without loss of generality, the singularity problem of the adjacency matrix, which is nontrivial, is not considered in this work [91]. As shown in Figure 2, the set *N<sup>a</sup>* represents the manifold upon which the convolution acts.

**Figure 2.** Local manifold upon which graph convolution acts.

The Fourier decomposition of graph Ω(*t*) is expressed by *x*ˆ = *U*<sup>−1</sup> · *x*, where *S* = *U* · Λ · *U*<sup>−1</sup> and the eigenvalues Λ describe the frequencies of the graph [92]. Now, we can directly filter *x* in the spectral domain by means of a function *f* : C → R that allows us to compute the convolution *z*ˆ = *f*(Λ) · *x*ˆ through point-by-point multiplication in the spectral domain between the filter *f*(Λ) and the graph Fourier transform of *x*. Therefore, by inverting the graph Fourier transform, we obtain the extension of the convolutional operation to the non-Euclidean time-dependent graph in Equation (2).

$$z = \mathcal{P}(\mathcal{S}) \cdot x \quad \text{and} \quad \mathcal{P}(\mathcal{S}) = \mathcal{U} f(\Lambda)\, \mathcal{U}^{-1} \tag{2}$$
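Equation (2) can be sketched numerically as follows; the 3-node path graph, the choice of *S* as the Laplacian, and the low-pass filter f(λ) = 1/(1 + λ) are illustrative assumptions:

```python
import numpy as np

# Shift operator S: here the Laplacian L = D - A of a 3-node path graph.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
S = np.diag(A.sum(axis=1)) - A

# Eigendecomposition S = U Λ U^{-1}; S is symmetric, so U is orthogonal
# and U^{-1} = U.T (the graph Fourier basis).
lam, U = np.linalg.eigh(S)

f = lambda l: 1.0 / (1.0 + l)   # spectral filter f(Λ), an assumed low-pass

x = np.array([1.0, 0.0, 0.0])   # graph signal concentrated on node 0
x_hat = U.T @ x                 # graph Fourier transform
z = U @ (f(lam) * x_hat)        # filter pointwise in spectrum, then invert
```

Note that `U` and `lam` are tied to this particular graph, which is exactly why the spectral construction cannot transfer across the dynamic graphs discussed above.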

The filter operation can be directly described on the node, resulting in an alternative formulation given by Equation (3), where the scalar parameter φ*a*,*b* represents the weight of the information coming from neighbour node *b* into or out of node *a*.

$$\mathbf{z}\_a = \sum\_{b \in \mathcal{N}\_a \cup a} \phi\_{a,b} \cdot \mathbf{x}\_b \tag{3}$$

Due to the local properties of *S*, *z<sup>a</sup>* can be obtained in the domain of the node through local-information exchange. This means that the initial signal on the node is recursively transformed by *S* a number *K* of times until a decomposition is obtained that determines *z<sup>a</sup>* as the convolution between a network filter with a polynomial transfer function and *x<sup>b</sup>*.

By means of the Fourier transform of the network, the filtering operation of Equation (3) has the transfer function given by Equation (4):

$$h(\Lambda) = \sum\_{k=0}^{K} \phi\_k \cdot \Lambda^k \tag{4}$$

This filter, based on local-information exchanges, captures information within a *K*-radius proximity of the node, where *K* represents the depth of the geometric-deep-lean-learning algorithm.
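The local-exchange filter of Equations (3) and (4) can be sketched as a polynomial in the shift operator, z = Σ<sub>k</sub> φ<sub>k</sub> S<sup>k</sup> x, applied through repeated neighbor exchanges. The graph, the choice of *S* as the adjacency matrix, and the filter taps φ<sub>k</sub> below are illustrative assumptions:

```python
import numpy as np

# Shift operator S: adjacency matrix of a 4-node path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = A
phi = [0.5, 0.3, 0.2]        # filter taps phi_0..phi_K, here K = 2

def poly_filter(S, x, phi):
    """Apply h(S) = sum_k phi_k S^k to x via K rounds of local exchange."""
    z, s_x = np.zeros_like(x), x.copy()
    for p in phi:
        z += p * s_x          # accumulate phi_k * S^k x
        s_x = S @ s_x         # one more hop of neighbor information exchange
    return z

x = np.array([1.0, 0.0, 0.0, 0.0])   # signal concentrated on node 0
z = poly_filter(S, x, phi)
# Node 3 is three hops from node 0, so a K = 2 filter cannot reach it.
```

Each loop iteration only exchanges information along edges, which is what makes this formulation usable on changing topologies.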

Taking into account the convolutional operation given by Equation (3), we are able to compute the *f*th feature produced as output of the *l*th layer:

$$\mathbf{y}\_f^l = \sigma^l \left( \sum\_{g=1}^{F^{l-1}} \mathcal{P}\_{f,g}^l \cdot y\_g^{l-1} \right) \tag{5}$$

where:


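Equation (5) can be sketched as a single convolutional layer mapping the input features of layer *l* − 1 to the output features of layer *l*; the graph, the degree-1 polynomial filters, their random taps, and the choice of σ as a ReLU are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, F_in, F_out = 4, 2, 3     # nodes, input features, output features

# Shift operator S: adjacency matrix of a 4-node path graph (assumed).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = A

# Taps phi[f, g, k] of each degree-1 filter P_{f,g}(S) = phi_0 I + phi_1 S.
phi = rng.normal(size=(F_out, F_in, 2))

def layer(Y):
    """One layer of Eq. (5): y_f = sigma(sum_g P_{f,g}(S) y_g)."""
    Z = np.zeros((N, F_out))
    for f in range(F_out):
        for g in range(F_in):
            Z[:, f] += phi[f, g, 0] * Y[:, g] + phi[f, g, 1] * (S @ Y[:, g])
    return np.maximum(Z, 0.0)          # sigma = ReLU (an assumption)

Y_out = layer(rng.normal(size=(N, F_in)))
```

Stacking such layers, with the taps φ learned from data, yields the deep architecture discussed in the text.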
Now, we simply combine two cases to model the mechanism of a convolutional network applied to a non-Euclidean graph in each time slot: the case in which the edges vary, and that in which the nodes vary. These can be combined into a single expression describing *P<sup>l</sup><sub>f,g</sub>*, given by Equation (6):

$$\mathcal{P}\_{f,g}^{l}(\mathcal{S}) = \sum\_{k=1}^{K} \mathcal{Y}\_{f,g}^{l,(k)} + \sum\_{k=0}^{K} \left( \prod\_{m=0}^{k} \mathcal{Y}\_{d}^{(m)} + \phi\_{k} \cdot \tau^{k} \right) \tag{6}$$

where:


	- \* *d* ⊂ *N* is a special set of nodes (i.e., nodes with a degree above a certain threshold, nodes with a certain level of hierarchy in the organization, or any other relevant feature),
	- \* φ*<sup>k</sup>* ∈ {0, 1}*<sup>N×d</sup>* is a binary matrix, and
	- \* τ*<sup>k</sup>* is a vector describing the node parameters in *d*.

As introduced earlier, the downsampling pooling layers in classical deep-learning architectures that extract information from Euclidean domains such as speech, images, or videos typically report the maximal output within a rectangular neighborhood [93]. In this way, it is possible to extract local characteristics that are shared by other areas of the images, thus considerably reducing the number of parameters that the deep network has to learn without sacrificing its learning capacity. Pooling can be described as a progressive coarsening of the graph. A simple way to do this is to collapse edges and reduce the size of the graph through a standard max-pooling operation on the nodes, taking the maximum of each of the feature tensors on each of the nodes being coarsened. This can be represented as a binary-tree structure of node indices. These pooling modules on graphs can be inserted between the convolutional modules in order to extract high-level graph representations, and thus be able to perform effective graph classification.
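The edge-collapsing max-pooling step described above can be sketched as follows; the node features and the pairing of collapsed nodes are illustrative assumptions:

```python
import numpy as np

# One scalar feature per node of a 6-node graph (illustrative values).
x = np.array([0.2, 0.9, 0.4, 0.7, 0.1, 0.3])

# Binary-tree-style pairing: each pair of nodes is collapsed into one
# coarse node, keeping the maximum of the two features (graph max-pooling).
pairs = [(0, 1), (2, 3), (4, 5)]

pooled = np.array([max(x[a], x[b]) for a, b in pairs])
# The coarsened graph now has 3 nodes with features [0.9, 0.7, 0.3].
```

Repeating this coarsening between convolutional modules halves the node count at each pooling stage, mirroring the binary-tree index structure mentioned in the text.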

Some alternatives in this field do not attempt to pool the whole network at once, but instead pool different hierarchies of the complex network in order to learn which node groups have similar characteristics [94]. Once these groups are learned, clusters are formed, and network pooling is carried out as described above or with an alternative method. This process is repeated for each of the network layers, and thus its classification is obtained. This presupposes, however, prior knowledge of the network structure.

The extraction of shared local characteristics is not possible through this method in time-varying non-Euclidean domains, i.e., complex-networked cyber–physical graphs, because no stationarity or shift invariance can be found within these domains. Wu et al. [95] and Lee et al. [96] provide state-of-the-art survey overviews of this interesting open research question.

#### **5. Conclusions and Management Implications**

Geometric deep lean learning at a strategic level is expected to ensure sustainable organizational growth, because customers and suppliers are able to reach new levels of transparency and traceability regarding the quality and efficiency of processes. This generates new business opportunities for both, as well as new products, services, and co-operation opportunities in a cyber–physical environment. In a world of limited resources, increasing business volume can only be achieved by increasing the depth of integrated intelligence capable of successfully handling the emerging complexity in value streams. The future implications of geometric deep lean learning at an organizational level are yet to be fully deployed. It is expected, however, that the analysis of complex-networked cyber–physical systems in Industry 4.0 environments will attract intense attention from both industry and scholars, who could develop tools to interpret, classify, and better understand the behavioral patterns of such networks through the application of this very exciting field of artificial intelligence.

Managerial implications of geometric deep lean learning at a mesoscopic level should aim to integrate geometric deep lean learning into whole-value-stream processes to substantially improve resource optimization. Geometric deep lean learning at a value-stream level is expected to impact lead time and on-time delivery. At a mesoscopic level, by producing only what the customer needs, when they need it, and in the required quality, the integration of deep-learning technologies is expected not only to allow the systematic improvement of complex value chains, but also the better use and exploitation of resources, thus reducing the environmental impact of Industry 4.0 processes. This technology could also be implemented on the customer side to increase defect-detection accuracy on the products themselves. Such analyses provide sensitivity about operations and operational conditions, which also impacts value-stream-related efficiency and effectiveness.

The theoretical implications of applying these geometric-deep-lean-learning models to data embedded within complex-networked datasets support researchers in departing from hand-"crafted" features when building machine-learning models for geometric data. In the context of Industry 4.0 cyber–physical systems, these could be drone-positioning and decision-making algorithms, or the proper interpretation of wearable devices (i.e., physical sensors) on human or cybernetic process owners. Until now, models dealing with such problems required a certain amount of prior knowledge (e.g., the isometric-shape-deformation model), and often did not capture the full complexity and wealth of the data. Geometric-deep-lean-learning methodologies could bring a breakthrough to the field and be the first indication of a coming paradigm shift by, for instance, expanding existing social-manufacturing knowledge into unknown territory through the contextual self-organizing of mass-individualization processes under a social-manufacturing paradigm through a cyber–physical–social system approach.

Some of the main potential applications can be clustered into four categories:


The data needed to implement these mathematical concepts are enormous and fall within the field of big data. The acquisition of data associated with the cyber–physical systems of Industry 4.0 is costly and of great strategic value to the involved organizations, which is why systems that increase the confidence of the involved actors and guarantee the security of these IIoT data, such as distributed ledger technology, are essential for the practical application of the exposed concepts. The quality of the obtained data essentially depends on the trust that the various value-creating actors have in each other. Achieving the necessary high degrees of confidence and successfully managing these parameters in an environment of interdependent supplier and customer networks is one of the challenges of the immediate future, and ought to be met by several blockchain and distributed ledger protocols. The Constrained Application Protocol is excellent for use with constrained devices and low-power networks, such as those preferred in IIoT. To ensure greater security, applications that traditionally use the User Datagram Protocol, such as Voice over IP/Session Initiation Protocol, can be protected by Datagram Transport Layer Security, which runs over the User Datagram Protocol instead of the Transmission Control Protocol. A Rivest–Shamir–Adleman hybrid algorithm is also a good option, offering high efficiency and better security and privacy protection, and is suitable for the end-to-end encryption requirements of the future IIoT. Future IIoT research within an Industry 4.0 complex-networked cyber–physical context should focus on, among other things, the following characteristics: the open security system, the way in which individual privacy is protected, terminal-security functions, and laws related to IIoT security. It is undeniable that IIoT security requires a set of policies, laws, and regulations, and a sound security-management system for mutual coordination, to ensure the success of this exciting and fruitful research endeavor.
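To make the hybrid Rivest–Shamir–Adleman scheme mentioned above concrete, the following sketch (using the Python `cryptography` package; the key sizes and payload are illustrative, not from the paper) wraps a symmetric key with RSA-OAEP and encrypts the bulk sensor payload symmetrically, which is the usual pattern for end-to-end IIoT encryption:

```python
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes
from cryptography.fernet import Fernet

# Receiver's RSA key pair (in practice, provisioned per device).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Sender: encrypt the bulk payload symmetrically, wrap the key with RSA.
sym_key = Fernet.generate_key()
ciphertext = Fernet(sym_key).encrypt(b"sensor reading: 42")
wrapped_key = public_key.encrypt(sym_key, oaep)

# Receiver: unwrap the symmetric key, then decrypt the payload.
recovered_key = private_key.decrypt(wrapped_key, oaep)
plaintext = Fernet(recovered_key).decrypt(ciphertext)
```

The asymmetric operation is performed only once per session on the small symmetric key, which keeps the scheme efficient enough for constrained IIoT devices.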

**Author Contributions:** Conceptualization, J.V.-D., M.M. and J.O.-M.; methodology, J.V.-D.; formal analysis, J.V.-D.; investigation, J.V.-D.; resources, S.S. and D.S.; writing—original draft preparation, J.V.-D.; writing—review and editing, J.V.-D., M.M. and J.O.-M.; visualization, J.V.-D.; supervision, J.V.-D.; project administration, J.V.-D.; funding acquisition, J.V.-D. and W.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially funded by the European Commission through the RFCS program, grant number ID 793505. This research was partially funded by the Ministerium für Wissenschaft, Forschung, und Kunst Baden-Württemberg (MWK), Germany, as part of idea competition "Mobility concepts for the emission-free campus 'NaMoCa'." The authors thank the China Scholarship Council (CSC) (no. 201206730038) for the support they provided for this work. The authors would also like to acknowledge the Spanish Agencia Estatal de Investigacion, through research project code RTI2018-094614-B-I00 into the "Programa Estatal de I+D+i Orientada a los Retos de la Sociedad".

**Acknowledgments:** J.V.D. would like to thank Paloma García-Lázaro (RIP), María Jesús Sánchez-Naranjo, and Eva Sánchez-Mañes for their magnificent job as professors of mathematics at the Escuela Técnica Superior de Ingenieros Industriales of the Universidad Politécnica de Madrid, for their inspiration, vitality, and energy in transmitting knowledge. All thanks to you, for teaching me these things that have greatly influenced my life and my work.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

IIoT Industrial Internet of Things

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Deep Learning for Industrial Computer Vision Quality Control in the Printing Industry 4.0**

**Javier Villalba-Diez 1,2,†,**∗ **, Daniel Schmidt 3,4,† , Roman Gevers 3 , Joaquín Ordieres-Meré 4 , Martin Buchwitz <sup>5</sup> and Wanja Wellbrock 1**


#### Received: 9 July 2019; Accepted: 13 September 2019; Published: 15 September 2019

**Abstract:** Rapid and accurate industrial inspection to ensure the highest quality standards at a competitive price is one of the biggest challenges in the manufacturing industry. This paper shows how a Deep Learning soft sensor application can be combined with a high-resolution optical quality control camera to increase the accuracy and reduce the cost of an industrial visual inspection process in the Printing Industry 4.0. During the process of producing gravure cylinders, mistakes like holes in the printing cylinder are inevitable. In order to improve the defect detection performance and reduce quality inspection costs by process automation, this paper proposes a deep neural network (DNN) soft sensor that compares the scanned surface to the used engraving file and performs an automatic quality control process by learning features through exposure to training data. The DNN sensor developed achieved a fully *automated classification accuracy rate of 98.4%*. Further research aims to use these results to three ends: firstly, to predict the number of errors a cylinder has; secondly, to further support the human operator by showing the error probability; and finally, to decide autonomously about product quality without human involvement.

**Keywords:** soft sensors; industrial optical quality inspection; deep learning; artificial vision

#### **1. Introduction**

Countries aspiring to lead these technological changes and remain in industrial leadership positions have strategically positioned themselves for the new type of cyber–physical infrastructure that will emerge from the Industrial Internet of Things (IIoT) and data science. Germany's Industry 4.0 framework has evolved into a pan-European collaborative effort to perform intelligent automation at scale [1]. In a similar move, the United States launched the Smart Manufacturing Leadership Coalition (SMLC) [2] in 2011. Other notable examples include "China Manufacturing 2025" [3], which seeks to elevate advanced manufacturing technology, or Japan's "Society 5.0" [4], with a holistic focus on the safety and well-being of humans through cyber–physical systems. As a paradigmatic example, one Japanese manufacturer has consistently gained a competitive edge over its competition by providing its value stream elements with the ability not to pass defects to the next step in the manufacturing process [5].

A prime example of this is the remarkable success of Toyota's implementation of intelligent autonomation, or JIDOKA -自働化- [6–8], alongside other strategic Lean manufacturing system characteristics [9–14]. Thanks to the availability of sufficient data from virtually any element of the production process (through IIoT for example), and the development of computational elements powerful enough to perform real time calculations on the state of the value stream, the systematic extension of JIDOKA in the industry has been made possible [15]. In fact, there is great potential for other industries to increase the ability of machines to recognize their own state through intelligent sensors capable of *sensing* the specific needs of customers and *responding* flexibly and accordingly. This would improve the level of automation and increase product quality and customization while increasing related value stream performance [16–18] .

Within this framework, Optical Quality Control (OQC) is crucial to many manufacturing processes in an effort to meet customer requirements [19]. On the one hand, the performance of human-centered OQC does not meet the necessary requirements: it is limited by ergonomics and cost, as humans get tired with repetitive OQC tasks and these tasks are usually very labor-intensive. For this reason, automatic detection of visual defects, which aims to segment possible defective areas of a product image and subsequently classify them into defect categories, emerges as a necessary solution to the problem. On the other hand, simple threshold techniques are often insufficient to segment background defects when not applied to a controlled environment characterized by stable lighting conditions. Xie [20] provides a classification of existing methods, but the common practice in industrial environments is that each new feature has to be described manually by experts when a new type of problem occurs: surface defects in industrially manufactured products can have all kinds of sizes, shapes or orientations. These methods are often not valid when applied to real surfaces with rough textures, complex, or noisy sensor data. This has the immediate consequence that classifications are almost always insufficient and cannot be generalized to unknown problems [21]. For these reasons, more robust and reliable results are needed in the detection of defects by more sophisticated methods.

The printing industry underwent an enormous transformation through the digital revolution when inkjet reached a mature era. Inkjet printing is based on the formation of small liquid droplets to transfer precise amounts of material to a substrate under digital control. Inkjet technology is becoming relatively mature and is of great industrial interest due to its flexibility for graphic printing and its potential use in less conventional applications such as additive manufacturing and the manufacture of printed electronics and other functional devices. Its advantages over conventional printing processes are numerous. For instance, it produces little or no waste; it is versatile thanks to different processes; it is non-contact; and it does not require a master template, which means printing patterns can be easily changed. However, the technology needs to be developed further in order to be used in new applications such as additive manufacturing (3D printing).

Laser engraving of gravure cylinders (Figure 1) is the latest and most exciting development in gravure printing. Laser technology makes it possible to produce cells with variable shapes, which is not possible with electromechanical engraving. These new shapes actually provide a higher print density and it is possible to use inks with a higher viscosity than conventional electromechanically engraved cylinders. Laser engraved cylinders also reduce the influence of print speed on print quality and keep the highlight tone values stable.

**Figure 1.** Printing Cylinder.

Although laser engraving of rotogravure cylinders is a new variant of etching rotogravure cylinders in the rotogravure market, today's systems are still susceptible to errors. Possible errors or optically detectable defects include dents, scratches, inclusions, spray, curves, offset, smearing, and excessive, pale, or missing printing, as well as color errors (i.e., incorrect colors, gradients, and color deviations from the desired pattern). The most common error is dents (32%), while the least common is smearing (3%). Due to the different errors and noise levels typical of industrial settings, automatic error detection based on classical computer vision algorithms was not possible [22]. Most systems aim to select potential faults and present them to the human expert responsible for deciding the presence or severity of faults. Practice shows that about 30% of the possible errors that need to be checked are not relevant. This fact increases both the costs associated with the OQC and the lead time of the overall process. Both factors are crucial to achieving customer confidence and must be systematically optimized.

Bearing these issues in mind, this research delves into an alternative solution to overcome the problem of the need of manual predetermination of the specific characteristics for each new inspection problem: deep learning-based deep neural networks (DNN). Deep learning is a paradigm of machine learning that enables computational models consisting of multiple processing layers to learn representations of data with multiple levels of abstraction [23,24]. DNN are constructions created by combining a series of hierarchically superimposed and arbitrarily initialized filters that are capable of automatically learning the best features for a given classification problem due to exposure to training data [25,26]. Several DNN architectures have been successfully used to extract statistical information from physical sensors in the context of Industry 4.0 in several applications such as classification problems [27], visual object recognition [23], human activity recognition through wearables [28,29], predictive maintenance [30,31], or computer vision [32] among others. More specifically, DNN have recently proved useful for industrial computer OQC defect detection purposes with promising results by automatically extracting useful features with little to no prior knowledge about the images [33,34].

The goal of this paper is to present a soft sensor DNN that performs a *classification* of images from high-resolution cameras towards a fully automated computer vision OQC of the printing cylinders of a global leading player in the Printing Industry 4.0. As shown in detail in Section 3, this aims to increase the accuracy of the quality inspection process by first supporting the human expert's final decision making, thereby reducing the cost of the quality inspection process through automation of the visual processing. This ought to be contextualized in a hostile industrial context in which the complexity of error detection is very high, due both to the extraordinary variability of possible errors and to the changing environmental conditions of light, moisture, dirt, and pollution, all of which can confuse the best algorithms developed thus far.

The rest of the paper is structured to ensure clarity of presentation, replication of the results obtained, and proper framing in the ongoing global context of the fourth industrial revolution. Firstly, Section 2 briefly shows the continuous improvement of the manufacturing value stream of an Industry 4.0 leader that made the integration of deep learning technology possible. Secondly, Section 3 outlines the *materials and methods* used to design and implement a better-performing OQC-integrated DNN soft sensor; additionally, the DNN computer code is made available in an Open Access Repository. Next, the *results* obtained are briefly discussed from a technical point of view in Section 4. Finally, in Section 5, the short-, medium-, and long-term *consequences* of these findings for the printing industry are discussed and highlighted in a broader manufacturing Industry 4.0 context.

#### **2. Evolution towards Automatic Deep Learning-Based OQC**

In order to frame this research in a more general context and allow its replication in other value streams, it is important to describe the constant process of continuous improvement [35] that a leading player in the printing industry has followed in recent years to reach the level that has allowed the implementation of the presented Deep Learning-based OQC research.

To make it easier for interested readers to recognize them, the fundamental phases of this evolutionary OQC continuous improvement process, which paved the road for a fully automated computer vision OQC process, have been summarized in Table 1 and are depicted in Figure 2.


**Table 1.** State of the Art.

**Figure 2.** OQC evolutional continuous improvement process: (**a**) manual inspection of printed product; (**b**) manual inspection of monochrome printed product; (**c**) expert evaluation and software cLynx; (**d**) machine scans and software cLynx.

**Figure 3.** Cylinder Scan and Layout Engraving File.

**Figure 4.** Example 1 of automatic selection of areas around possible errors.

#### **3. Deep Learning for Industrial Computer Vision Quality Control**

In order to reduce time checking possible mistakes on the cylinder, and further reduce OQC cost and value stream-related lead time, an automatic pre-selection of the errors using artificial intelligence is desired. Due to intensive research investment and strategic focus on quality control throughout the value stream process, real noisy industrial data has been classified and properly labelled. This is how the idea was born to design a DNN that would learn from the statistical information embedded within the previously classified data to perform a fully automated computer vision quality control.

By the fourth stage, numerous classified and properly labeled data had been aggregated. Possible errors were selected using thresholds between the original file and the scanned cylinder. These were then shown to the operator, who judged whether they were real errors. These judgements were then saved, comprising the labeled dataset.

In the fifth stage, the process is taken over by a fully automated DNN architecture, as shown in Figure 5 and as proposed in this paper (see Section 3.1.3), after an intensive experimental program that tested different architectures (DNNs, restricted Boltzmann machines, deep belief networks, etc.) and configurations (different filter sizes, abstraction layers, etc.) [37].

*The DNN soft sensor presented achieves an accuracy of 98.4% in fully automatic recognition of production errors.* More details are provided in the following subsections. This contribution makes it possible to decide immediately after scanning whether the cylinder can be delivered or whether errors need to be corrected. It was decided not to apply specific denoising filters before classification [38,39], because of the intrinsic denoising capabilities of the adopted CNN architecture.

**Figure 5.** Deep Learning Architecture for Industrial Computer Vision OQC in the Printing Industry 4.0.

#### *3.1. Deep Neural Network Architecture for Computer Vision in Industrial Quality Control in the Printing Industry 4.0*

#### 3.1.1. Experimental Setup

The experiments in this study were implemented on a computer equipped with an Intel(R) Xeon(R) Gold 6154 3.00 GHz CPU and an NVIDIA Quadro P4000 Graphics Processing Unit (GPU), with 96 GB of random-access memory (RAM). The operating system was *Red Hat Linux* 16.04, 64-bit version.

The deep learning model training and testing were conducted with *Keras*, which is an interface for *TensorFlow* (Version 1.8), and the model was built in the *Python* (Version 2.7) language [40]. TensorFlow is an interface for expressing machine learning algorithms, and an application for executing such algorithms, including training and inference algorithms for DNN models. More specifically, the TF.Learn module of TensorFlow was adopted for creating, configuring, training, and evaluating the DNN. TF.Learn is a high-level Python module for distributed machine learning inside TensorFlow. It integrates a wide range of state-of-the-art machine learning algorithms built on top of TensorFlow's low-level APIs, for small- to large-scale supervised and unsupervised problems. Additional Python interfaces were used: *OpenCV* for computer vision algorithms and image processing, *Numpy* for scientific computing and array calculation, and *Matplotlib* for displaying plots. The details of building the DNN model for OQC with Python are provided online in an Open Access Repository and were created with *Jupyter Notebook*.

#### 3.1.2. Data Pre-processing

In order to train the DNN, standardized classified input data are needed. For this reason, data pre-processing is divided into three steps: (1) deciding the size of the image that serves as input for the DNN and the size of the convolutional window used by the DNN, (2) adjusting brightness through histogram stretching, and (3) automating the selection and labelling of the file structure to be fed to the DNN.

#### 1. Image Size for DNN Input and Convolutional Window Size

Due to the need for standardized input data, a decision needs to be made about which dimensions the input images should have. The first decision is the aspect ratio; the next is how many pixels wide and high the input images should be. In order to get a first impression of the existing sizes, a short analysis of the previously manually confirmed errors was made. According to the data, the mean value of the width is slightly higher than that of the height. This becomes even clearer in the mean aspect ratio of about 1.5, which is probably a result of some errors being elongated by the rotation of the cylinder. The median aspect ratio is exactly 1.0. Because the median describes a higher percentage of errors better, this should also be the aspect ratio of the neural-network input. This is shown in Figure 6, which plots the width and height of errors in pixels against the logarithm of the number of errors.

As the size of the error also plays a role in the judgment of the errors, scaling operations should be reduced to a minimum. Due to the range of the sizes, this is not always possible. The training time of the neural network would increase dramatically with large input sizes, and small errors would mostly consist of *OK*-cylinder surface. Therefore a middle ground is needed, so that most input images can be shown without much scaling or added *OK*-cylinder surface. A size in the middle would be 100 pixels. We therefore calculate the percentage of errors with width and height smaller than or equal to 100. The results show that about 90% of all errors have both height and width at or below 100, and almost 74% have both at or below 10. One option would therefore be to use an input size of 100 × 100.

**Figure 6.** Aspect ratio inspection: (**b**) height of errors vs. log number of errors.
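The size statistics discussed above can be reproduced with a short helper. The following is a sketch with hypothetical variable names, not the original analysis code:

```python
import numpy as np

def size_stats(widths, heights, limit=100):
    """Summarize error bounding-box sizes: median aspect ratio, and the
    fraction of errors whose width AND height both fit within `limit`."""
    w = np.asarray(widths, dtype=float)
    h = np.asarray(heights, dtype=float)
    median_aspect = float(np.median(w / h))          # width/height ratio
    frac_within = float(np.mean((w <= limit) & (h <= limit)))
    return median_aspect, frac_within
```

Applied to the confirmed-error log, such a helper yields the median aspect ratio of 1.0 and the roughly 90% coverage at 100 pixels that motivate the 100 × 100 input size.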

#### 2. Brightness Adjustment

To get comparable data for all cylinder images, pre-processing is needed; it is performed on the complete scan of a cylinder, from which multiple examples are taken. Because there can be slight deviations due to many influences during the recording of the cylinder surface, comparability can only be achieved by having a similar brightness for the cylinder surface and the engraved parts. Another important point is that no essential information may be lost from the images, and that the brightness of the engraved and non-engraved parts is comparable for all cylinder scans. Therefore a brightness stretch is needed, in which only few pixels are allowed to become the darkest or brightest pixels. Notwithstanding, the share of pixels that become the darkest and brightest cannot be set to a very low value, because noise in the image data would then result in big differences. In conclusion, a low percentage of the pixels should be set as darkest and brightest; for example, the lowest and the highest portions should each comprise a maximum of 0.5% of pixels. Figure 7 shows a stretching example for brightness adjustment of one image, so that 0.5% of all pixels receive a value of 0 and 0.5% of all pixels receive a value of 255.

**Figure 7.** Pre-processing histogram for brightness adjustment: (**b**) histogram after stretching.
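The described stretch can be sketched with NumPy as follows. This is a minimal illustration assuming 8-bit grayscale input, not the production code:

```python
import numpy as np

def stretch_brightness(img: np.ndarray, clip_percent: float = 0.5) -> np.ndarray:
    """Linearly stretch an 8-bit grayscale image so that roughly
    `clip_percent` percent of pixels saturate at 0 and at 255, respectively."""
    lo = np.percentile(img, clip_percent)          # ~0.5% of pixels fall below
    hi = np.percentile(img, 100.0 - clip_percent)  # ~0.5% of pixels lie above
    if hi <= lo:                                   # flat image: nothing to stretch
        return img.copy()
    stretched = (img.astype(np.float32) - lo) * 255.0 / (hi - lo)
    return np.clip(stretched, 0, 255).astype(np.uint8)
```

Clipping only the extreme 0.5% tails keeps the stretch robust against sensor noise while making the brightness of engraved and non-engraved areas comparable across scans.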

#### 3. Automatic Selection and Dataset Labelling

To simplify the later steps, the images need to be cut from the original file and saved into two folders with examples that are *OK*-cylinder (Figure 8a) and examples that are *not-OK*-cylinder (Figure 8b). The great variety of patterns presented in the spectrum can be observed in the figures. The very nature of the process implies that each new product represents a new challenge for DNN, as it has probably never before been confronted with these images. For this reason, the errors may be of a very different nature. This implies a high complexity of solving the challenge of training and testing the DNN. Likewise, the different shades of black and grey, very difficult to appreciate with the naked eye when manually sorting the images, represent an added difficulty that must be resolved by DNN architecture.

If errors are smaller than 100 pixels in width or height, the region of interest is increased to 100. Errors bigger than 100 pixels in either dimension are ignored; for the purpose of checking later on, such big input data are split into 100 × 100 parts, and if any one of these is detected as an error, all are marked as an error. As shown in the Open Access Repository, there are multiple possible ways to handle the bigger data. Every example also has the actual and target data, and there are different ways of using these as input. One way is to use just the actual data; a different option is to use the difference between the actual and expected data. The problem in both cases is that information gets lost. Better results have been achieved by using the differences. These are adjusted so that the input data lie in the range [−1, 1]. Once this is performed, and because a balanced dataset is important to train the neural network and the *OK*-cylinder examples far outnumber the *not-OK*-cylinder examples, an *OK*-cylinder example is only saved if a *not-OK*-cylinder example has been found previously.
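The difference-based input described above can be sketched as follows; this is an illustrative helper, not the repository code:

```python
import numpy as np

def difference_input(actual: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Difference between the scanned patch (actual) and the engraving-file
    patch (target), scaled from [-255, 255] into the DNN input range [-1, 1].
    Both inputs are 8-bit grayscale patches of equal shape."""
    diff = actual.astype(np.float32) - target.astype(np.float32)
    return diff / 255.0
```

Feeding the scaled difference rather than the raw scan lets the network focus on deviations from the engraving file instead of on the engraving pattern itself.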

#### 3.1.3. Automatic Detection of Cylinder Errors Using a DNN Soft Sensor

The DNN soft sensor architecture design is performed with two main goals in mind: classification and performance:


After data acquisition and pre-processing, the input data of the DNN are figures represented as tensors. A type of network that performs well on the classification problem of such data is usually divided in two main parts: feature extractors and classifiers as shown in Figure 5:

	- **–** *Convolution and ReLu (rectified linear unit) activated convolutional layers*. Convolution operations, by means of activation functions, extract the features from the input information which are propagated to deeper level layers. A *ReLu* activation function is a function meant to zero out negative values. The *ReLu* activation function was first presented in AlexNet [42] and solves the vanishing gradient problem for training DNN.
	- **–** *Max pooling*. Consists of extracting windows from the input feature maps and outputting the max value of each channel. It's conceptually similar to convolution, except that instead of transforming local patches via a learned linear transformation (the convolution kernel), they are transformed via a max tensor operation.
	- **–** *Fully connected activation layers* output a probability distribution over the output classes [25]. Because we are facing a binary classification problem and the output of our network is a probability, it is best to use the binary-crossentropy loss function. Crossentropy is a quantity from the field of Information Theory that measures the distance between probability distributions or, in this case, between the ground-truth distribution and the predictions. It is not the only viable choice: we could use, for instance, mean squared error. However, crossentropy is usually the best choice when dealing with models that output probabilities. Because we are *attacking* a binary-classification problem, we end the network with a single unit (a Dense layer of size 1) and a sigmoid activation. This unit encodes the probability that the network is looking at one class or the other [25].
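A minimal Keras sketch of the described stack (stacked ReLU convolutions and max pooling as feature extractor, followed by a dense classifier ending in a single sigmoid unit trained with binary crossentropy) might look as follows. Layer counts and sizes here are illustrative; the authors' exact configuration is in their Open Access Repository:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_oqc_model(input_shape=(100, 100, 1)) -> keras.Model:
    """Binary classifier for 100x100 grayscale difference patches."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),   # feature extraction
        layers.MaxPooling2D(2),                    # downsample, keep max response
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),       # classifier head
        layers.Dense(1, activation="sigmoid"),     # probability of class OK
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

The single sigmoid output together with binary crossentropy directly implements the probabilistic two-class decision described in the text.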

**Figure 8.** Examples of (**a**) OK-cylinder and (**b**) not-OK-cylinder images.

As shown in the Open Access Repository, using Keras with the TensorFlow backend for the DNN and OpenCV/NumPy for image manipulation, a balanced dataset of 13,335 *not-OK* and 13,335 *OK* cylinder examples is used, giving a total of 26,670. These were collected over a period of 14 months from almost 4000 cylinder scans. The training part is mirrored vertically and horizontally, resulting in 85,344 training samples in total. All *not-OK* cylinder examples are labeled *0* and all *OK* examples are labeled *1*. As a standard procedure, the data are split into a *training dataset* (80%), a *testing dataset* (10%) and a *validation dataset* (10%). The *training dataset* is used to train the DNN over a number of epochs, as shown in Figure 9. It can be observed that neither accuracy nor loss changes significantly after epoch 10.
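A minimal sketch of the split-and-mirror procedure, using small random arrays as stand-ins for the cylinder images (the actual image size and loading code are not part of this sketch): an 80/10/10 split, with the training split augmented by vertical mirror, horizontal mirror, and both, which reproduces the 4 × 21,336 = 85,344 training samples stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 26,670 cylinder images (8x8 grayscale here).
n_total = 26670
images = rng.random((n_total, 8, 8))
labels = np.repeat([0, 1], n_total // 2)   # 0 = not-OK, 1 = OK

# Shuffle, then split 80/10/10 into training, testing and validation sets.
order = rng.permutation(n_total)
images, labels = images[order], labels[order]
n_train = int(0.8 * n_total)               # 21,336
n_test = int(0.1 * n_total)                # 2667
x_train = images[:n_train]
x_test = images[n_train:n_train + n_test]
x_val = images[n_train + n_test:]

# Augment only the training split: original + vertical mirror
# + horizontal mirror + both, i.e., 4x the training samples.
augmented = np.concatenate([
    x_train,
    np.flip(x_train, axis=1),        # vertical mirror
    np.flip(x_train, axis=2),        # horizontal mirror
    np.flip(x_train, axis=(1, 2)),   # both
])
print(len(augmented))                # 85344
```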

(**a**) DNN Model Training Accuracy (**b**) DNN Model Training Loss **Figure 9.** DNN Training and Testing Results.

The *testing dataset* is subsequently used to test DNN performance. The confusion matrix is a standard way to summarize the results of such a training, typically combining the contingency classes (*TRUE*, *FALSE*) and (*OK*, *not-OK*) into four categories: (1) True Negative (*TN*), an error that has been predicted as an error; (2) False Positive (*FP*), an error that has not been predicted as an error, which is by far the most damaging category; (3) False Negative (*FN*), not an error but predicted as an error; and (4) True Positive (*TP*), not an error and not predicted as an error. Specifically, given the balanced dataset chosen, the accuracy (ACC) delivered by the DNN soft sensor, defined by the expression *ACC* = (*TP* + *TN*)/(*TP* + *TN* + *FP* + *FN*), is 98.4%. The *TN* rate is 97.85%, the *TP* rate is 99.01%, the *FN* rate is 2.15% and the *FP* rate is 0.99%. These levels of *ACC* can be considered acceptable for such a complicated industrial classification problem. The results are summarized in Figure 10.
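Because the test set is balanced, the overall accuracy is simply the mean of the two per-class rates reported above. A quick check (the per-class count below is illustrative, roughly 10% of each class):

```python
# For a balanced test set: ACC = (TP + TN) / (TP + TN + FP + FN)
#                              = (TP rate + TN rate) / 2.
n_per_class = 1333                 # illustrative ~10% test split per class
tn = round(0.9785 * n_per_class)   # errors correctly predicted as errors
tp = round(0.9901 * n_per_class)   # OK cylinders correctly predicted as OK
fp = n_per_class - tn              # missed errors (most damaging category)
fn = n_per_class - tp              # false alarms
acc = (tp + tn) / (tp + tn + fp + fn)
print(f"ACC = {acc:.1%}")          # ~98.4%, matching the reported value
```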

**Figure 10.** DNN Model Testing Confusion Matrix.

In Table 2, the DNN architecture shown in Figure 5 is described layer by layer, outlining the rationale behind each layer choice. Going further, the proposed DNN soft sensor has been compared with three similar architectures. The result of this comparison is given in the Open Access Repository and summarized in Figure 11, which clearly shows that the proposed DNN soft sensor outperforms the alternative architectures.

**Figure 11.** Deep Learning Architecture Comparison. Time to Train vs. Accuracy.

Two parameters, accuracy and computational time, have been measured consistently on the same training and test sets and then compared. The proposed architecture was first tested against an identical architecture with an added dropout layer, then against a deeper architecture, and finally against a shallower DNN with fewer layers. The accuracy should be as high as possible in order to minimize the error in data characterization, and the computation time should be as low as possible to ensure that the soft DNN sensor can be effectively integrated into an Industry 4.0 environment, thus ensuring maximum effectiveness and efficiency, respectively. A soft DNN sensor must be not only accurate but also fast in order to ensure, among other things, a minimal Lead Time impact on the value creation process and low CO<sub>2</sub> emissions derived from the energy consumption associated with the computation.


**Table 2.** DNN Architecture Detailed Description.

**Figure 12.** Visualization of all DNN layers as color-coded images of a *TN* image.

#### 3.1.4. Visualizing the Learned Features

Experience has shown that visualizing what each DNN layer learns can help deep-architecture designers better understand the learning of the hidden layers and thus support appropriate fine tuning of the design for improvement purposes. Visualizing what the DNN has learned aids the understanding of its decision-making process. There are different ways of visualizing what has been learned by showing different parts of the network, and these can make it easier to understand why some things do not work as expected, for example why some pictures with errors were not categorized as errors (*FP*).

This visualization can be performed in different ways. For instance, given the example image of a *not-OK* cylinder shown in Figure 13a, one option is to visualize what the DNN captures using class activation heatmaps. A class activation heatmap is a 2D grid of scores associated with a specific output class, computed for every location in an input image, indicating how important each location is with respect to the class under consideration. An example is shown in Figure 13b.

(**a**) Example Image of Error in *not-OK* cylinder (**b**) Activation Heatmap of Error in *not-OK* cylinder

**Figure 13.** Example image of a *not-OK* cylinder and its activation heatmap.
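The core of a class activation heatmap can be sketched in a few lines: weight each channel of the last convolutional feature map by its importance for the target class, sum over channels, rectify, and normalize. The exact variant used by the authors is not specified here; shapes and weights below are hypothetical placeholders.

```python
import numpy as np

def class_activation_heatmap(feature_maps, class_weights):
    # feature_maps: (H, W, C) activations of the last convolutional layer.
    # class_weights: (C,) importance of each channel for the target class.
    # The heatmap is the channel-weighted sum of activations, rectified
    # and normalized to [0, 1] so it can be overlaid on the input image.
    heatmap = np.tensordot(feature_maps, class_weights, axes=([2], [0]))
    heatmap = np.maximum(heatmap, 0.0)
    if heatmap.max() > 0:
        heatmap /= heatmap.max()
    return heatmap

rng = np.random.default_rng(1)
fmaps = rng.random((14, 14, 32))    # hypothetical activation tensor
weights = rng.standard_normal(32)   # hypothetical class weights
hm = class_activation_heatmap(fmaps, weights)
print(hm.shape)                     # (14, 14), values in [0, 1]
```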

Another option is to compute an input image that elicits the highest response from a given filter, thus displaying the visual pattern that the filter is meant to respond to. This can be done with gradient ascent in input space: starting from a blank input image, the value of the input image of the convolutional network is adjusted by gradient ascent so as to maximize the response of a specific filter. The resulting input image is one that the chosen filter is maximally responsive to. An example is shown in Figure 14.

**Figure 14.** Most Responding Input.
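The gradient-ascent idea can be illustrated without a deep learning framework by using a single linear filter, for which the input gradient is known in closed form. This is only a toy sketch of the technique (in practice the gradient comes from the framework's automatic differentiation); the kernel below is a hypothetical edge filter.

```python
import numpy as np

def filter_response(image, kernel):
    # Total response of one linear filter: valid cross-correlation summed
    # over all positions (a stand-in for "mean activation of one filter").
    h, w = kernel.shape
    out = 0.0
    for i in range(image.shape[0] - h + 1):
        for j in range(image.shape[1] - w + 1):
            out += np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def response_gradient(image, kernel):
    # For a linear filter, the gradient w.r.t. the input is the kernel
    # accumulated into every window position.
    grad = np.zeros_like(image)
    h, w = kernel.shape
    for i in range(image.shape[0] - h + 1):
        for j in range(image.shape[1] - w + 1):
            grad[i:i + h, j:j + w] += kernel
    return grad

# Start from a blank image and climb the response by gradient ascent.
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])   # hypothetical edge filter
image = np.zeros((8, 8))
before = filter_response(image, kernel)
for _ in range(20):
    image += 0.1 * response_gradient(image, kernel)
after = filter_response(image, kernel)
print(before, "->", after)   # the response strictly increases
```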

Finally, an alternative approach is to show the outputs of all DNN layers as color-coded images. Visualizing intermediate activations consists of displaying the feature maps output by the various convolution and pooling layers in a network, given a certain input (the output of a layer is often called its activation, i.e., the output of the activation function). This gives a view of how an input is decomposed into the different filters learned by the network. The feature maps to be visualized have three dimensions: width, height, and depth (channels). Each channel encodes relatively independent features, so the proper way to visualize these feature maps is to plot the contents of every channel independently as a 2D image. For explanatory purposes, four different examples of such feature maps, *TP*, *TN*, *FP* and *FN*, are depicted in the Open Access Repository. These should help the reader better understand what the DNN *sees* and how it *responds* in different circumstances. One of these examples, *TN*, is visualized in Figure 12.
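Plotting every channel independently reduces to normalizing each (H, W) slice of the activation tensor to a displayable grayscale range. A minimal sketch, with a random tensor standing in for a real layer activation:

```python
import numpy as np

def channels_to_images(activation):
    # activation: (H, W, C) feature map. Normalize each channel
    # independently to 8-bit grayscale so it can be shown as a 2D image.
    images = []
    for c in range(activation.shape[-1]):
        ch = activation[..., c].astype(float)
        ch -= ch.min()
        if ch.max() > 0:
            ch = ch / ch.max() * 255.0
        images.append(ch.astype(np.uint8))
    return images

rng = np.random.default_rng(2)
act = rng.standard_normal((26, 26, 16))   # hypothetical layer activation
imgs = channels_to_images(act)
print(len(imgs), imgs[0].shape)           # 16 channels, each 26x26
```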

#### **4. Results and Discussion**

Due to the automation provided by the soft DNN sensor, the costs associated with OQC could be drastically reduced, and the accuracy of error detection increased considerably. The results can therefore be considered *very promising* and allow for different paths of further industrial implementation. However, these results have to be interpreted in the broader context of Industry 4.0. This section provides some essential aspects that help to understand and contextualize the contributed results through a meta-discussion at various organizational levels. This prepares the presentation, in the next section, of a possible future strategic development of these *deep technologies* in the short, medium and long term.

Several steps have to be taken before the full potential can be used in production without running too high a risk of missing an error.

1. Using the DNN to fully automate OQC classification and predict the number of errors a cylinder has.

The DNN provides a correct result *only* 98.4% of the time. To be sure that the wrongly classified images are not serious mistakes, human experts still review all possible errors. Nevertheless, the DNN has already had a positive influence on the workflow, because it indicates which detections are very likely real errors: it helps significantly in planning the next workflow step, since it is known with high probability whether the cylinder needs to go to the correction department or is very likely an *OK* cylinder.

2. Showing the error probability to the operator who is currently deciding whether or not it is an error.

This gives a hint to the operator, who can give feedback if there are relevant mistakes that were not predicted as mistakes. It can also help the operator reduce the likelihood of missing an error. Once this soft sensor was integrated in production, OQC productivity, measured in hours per unit (the time an operator spends in the OQC), increased dramatically by *210%*, as the decision about defects is made automatically.

3. Only showing possible errors that have been predicted by the DNN.

In the last step, the DNN could completely filter out errors that are not relevant. This can also be done in multiple steps, because it is possible to increase the threshold error probability above which a possible error is shown. At some point, a threshold will have to be chosen taking into consideration the cost of checking a possible error and the cost of missing an error. This would completely eliminate the step of checking the errors, and the confirmed errors would only be checked by the correction department.
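The threshold choice described in the steps above can be framed as minimizing an expected cost. A minimal sketch, with entirely illustrative probabilities and costs (the real values would come from the plant's defect statistics and process economics):

```python
# Sketch of choosing the error-probability threshold by trading off the
# cost of checking a flagged defect against the (much higher) cost of
# missing a real error. Probabilities and costs are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
is_error = rng.random(n) < 0.5
# Hypothetical DNN error probabilities: high for real errors, low otherwise.
prob = np.where(is_error, rng.beta(8, 2, n), rng.beta(2, 8, n))

COST_CHECK = 1.0    # operator time to review one flagged defect
COST_MISS = 50.0    # cost of a real error slipping through

def expected_cost(threshold):
    flagged = prob >= threshold
    missed = is_error & ~flagged
    return COST_CHECK * flagged.sum() + COST_MISS * missed.sum()

thresholds = np.linspace(0.0, 1.0, 101)
best = min(thresholds, key=expected_cost)
print(f"best threshold ~ {best:.2f}")
```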

#### **5. Conclusions and Future Steps of Deep Learning in a Printing Industry 4.0 Context**

Although there has been an immediate performance increase in OQC error-detection accuracy and cost effectiveness, the larger scope for improvement lies in the managerial dimension of such a sensor. The sensor can be expanded not only to detect defects but also to classify them into categories. Although this requires additional effort, it will enable cause-effect analysis of manufacturing conditions and defect frequencies.

Some of these efforts can be specifically targeted at improving the accuracy of the model, for example by learning from the false predictions: to further improve the correct prediction rate, it is important to examine the examples that have not been predicted correctly. This could improve the understanding of why the DNN made the wrong prediction:


(**b**) *FN*: not an error, but predicted as such.

**Figure 15.** Examples of *FP* and *FN* images.

This technology could also be implemented on the customer side to increase defect-detection accuracy on the printed product itself. This strategic step is currently being discussed internally. Such analyses will provide sensitivity about operations and operational conditions, which will also impact value-stream-related efficiency and effectiveness.

These aspects will probably be the next steps in further research actions to be developed within an Industry 4.0 context. For instance, deep learning applied to manufacturing Industry 4.0 technology will have an impact at various levels of aggregation in the printing manufacturing value chains:

1. Deep Learning at a shopfloor level shall impact quality, reliability and cost.

At the shopfloor level, this paper has shown an example of how deep learning increases the effectiveness and efficiency of process control aimed at achieving better quality (e.g., with OQC) and lower costs, enabling self-correction of processes by means of shorter and more accurate quality feedback loops. This intelligence, integrated into the value streams, will allow humans and machines to co-exist in a way in which artificial intelligence complements human work in many aspects. In the future, significant challenges will still be encountered in the generation and collection of data from the shopfloor.

The main challenge towards a fully automated solution is currently integrating the Python DNN into the C++ cLynx program. After this is successfully completed, a testing phase with the cLynx users is planned. If the results are satisfactory, the fully automatic process will be started; if not, further steps have to be taken to improve the DNN.

2. Deep Learning at a supply chain level shall impact lead time and on-time delivery.

At the higher level of the supply chain, producing only what the customer needs, when it is needed, in the required quality, the integration of deep learning technology will allow not only the systematic improvement of complex value chains, but also a better use and exploitation of resources, thus reducing the environmental impact of Industry 4.0 processes.

3. Deep Learning at a strategic level shall impact sustainable growth.

At a more strategic level, customers and suppliers will be able to reach new levels of transparency and traceability regarding the quality and efficiency of their processes, which will generate new business opportunities for both parties, producing new products and services and cooperation opportunities in a cyber–physical environment. In a world of limited resources, increasing business volume can only be achieved by increasing the depth of integrated intelligence capable of successfully handling the emerging complexity in value streams.

To summarize, despite the "black box problem" and the challenge of having enough labeled data available for learning, Deep Learning will probably conquer the field of machine vision one domain after another, acting in the background without the user being aware of it. The role that Deep Learning will play in the creation of cyber–physical systems will be adopted from a strategic point of view, in which business leaders will tend to think of deep architectures as possible solutions to problems.

**Author Contributions:** Conceptualization, J.V.-D., D.S., J.O.-M. and R.G.; methodology, J.V.-D. and D.S.; software, J.V.-D. and D.S.; validation, J.V.-D., D.S., R.G. and J.O.-M.; formal analysis, J.V.-D. and D.S.; investigation, J.V.-D.; resources, D.S.; data curation, D.S.; writing–original draft preparation, J.V.-D. and D.S.; writing–review and editing, J.V.-D., D.S., M.B. and W.W.; visualization, J.V.-D. and D.S.; supervision, J.V.-D. and D.S.; project administration, J.V.-D. and D.S.; funding acquisition, J.V.-D., J.O.-M. and R.G.

**Funding:** The authors would like to recognize the support obtained from the EU RFCS program through project number 793505, '4.0 Lean system integrating workers and processes (WISEST)'.

**Acknowledgments:** This research was partially supported by Matthews International GmbH, Gutenbergstrasse 1-3, 48691 Vreden. We specially thank Stephan Lammers and Tomas Sterkenburgh for comments on an earlier version of the manuscript. We also thank Oliver Lenzen and Daniela Ludin from Hochschule Heilbronn, and Martin Molina from Universidad Politécnica de Madrid for their institutional support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

