#### **1. Introduction**

The bakery trade has changed significantly in recent years. While the number of sales locations in Germany has remained constant, the diversity of bakeries has decreased significantly [1]. This effect reflects a sharp increase in branch network structures across the industry, which is directly related to an increase in fermentation and baking at the sales location. As a result, bakeries compete directly with fast-food chains such as McDonald's and Burger King [2]. The bread manufacturing process in a branch network roughly follows the steps illustrated in Figure 1: the primary products are produced at a central production bakery and then transported to the branches.

In detail, the bread manufacturing process chain starts with the preparation of the dough, in which the raw materials are mixed and the finished dough is divided into portions. In the next step, each portion is given the shape of the desired product and then cooled to approximately −4 °C, which interrupts the fermentation. In this way, the dough is prepared for cold storage and for transportation to the individual stores in special refrigerated vans. This process is called proofing retardation [3]. In the branches, the fermentation process is initiated under defined conditions (e.g., discrete-time control of warmth and humidity) in special fermentation chambers. Usually, this process step ends after a defined time and is followed by the baking step, which yields the final product. The proposed method is applied within the 'fermentation' process step and helps to achieve a higher process quality by using automation techniques.

Chemically, fermentation is the production of carbon dioxide and ethyl alcohol through the transformation of assimilable carbohydrates and amino acids, induced by the metabolism of yeast [4]. This process increases the volume of the dough through the development of gas cells. Figure 2 schematically illustrates the expected volume development of fermenting dough [5]. The area of optimal mellowness can be identified in this curve.

**Figure 2.** Volume development during fermentation (own illustration based on [5]).

Typically, the optimal fermenting state is reached when the volume gradient approaches zero, and ideally the staff detects this moment. Traditionally, the staff evaluates the fermenting state of the dough and estimates the perfect time for ending the fermentation phase. That requires long experience, intuition, and time. However, there is currently an unmet demand for skilled workers, which has led to an increasing number of untrained and inexperienced employees at the stores. The result is a fixed standard fermentation time for the dough, which can be easily programmed and makes the process easier to operationalize. However, the main component of the dough is flour, whose fermentation ability depends on the cultivation and growing conditions of the grain. This component therefore strongly affects the fermentation process, so that a fixed time leads to sub-optimal product quality. Assuming a normal distribution, nearly 15% of the sold products are not only sub-optimal in taste but also too small.
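The volume-gradient criterion described above can be illustrated with a minimal sketch. It assumes that a series of volume measurements over time is already available; the function name `find_plateau_index`, the relative threshold, the window length, and the minimum growth factor are hypothetical choices made for illustration and are not taken from the proposed system. The minimum growth factor is included because a schematic growth curve can also be flat at the very beginning of fermentation.

```python
def find_plateau_index(times, volumes, rel_threshold=0.002, window=3, min_growth=1.5):
    """Return the index of the first sample at which the volume curve has flattened.

    A sample counts as flat when the finite-difference gradient is at most
    rel_threshold * current volume (per time unit). The curve is considered
    flattened after `window` consecutive flat intervals, and only once the
    volume has reached min_growth times the initial volume.
    All default values are illustrative assumptions.
    """
    if len(times) != len(volumes):
        raise ValueError("times and volumes must have the same length")
    if not volumes:
        return None

    flat_intervals = 0
    for i in range(1, len(volumes)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Finite-difference approximation of the volume gradient.
        gradient = (volumes[i] - volumes[i - 1]) / dt
        has_grown = volumes[i] >= min_growth * volumes[0]
        is_flat = abs(gradient) <= rel_threshold * volumes[i]
        if has_grown and is_flat:
            flat_intervals += 1
            if flat_intervals >= window:
                return i
        else:
            # Any non-flat interval resets the run of flat intervals.
            flat_intervals = 0
    return None
```

Requiring several consecutive flat intervals makes the sketch less sensitive to single noisy measurements; a real system would probably also smooth the measured volumes before differentiating.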

Currently, determining the flour properties in advance by means of an effective analysis is not feasible. Additionally, changes in the uniform circulation of temperature or humidity caused by the volume increase of the dough pieces induce an unpredictable change of the process parameters. Detecting and counteracting these disturbances is a prerequisite for high quality. In summary, it is generally not possible to reach the optimal fermenting state only by controlling the time and complying with the machine parameters.

Computer-vision-based systems have become increasingly important in the production and logistics sector and represent one of the core concepts of Industry 4.0. The use of image processing and machine learning techniques in the food industry is a relatively new field and offers vast potential to control processes that are traditionally based on human observation [6]. This paper covers the detection and control of the temporal volume growth of dough pieces from three-dimensional point clouds. The motivation is to create a method that is not only capable of predicting the optimal fermenting state based on the volume gradient but can also serve as the basis for other systems that control changes in the volume of objects. An additional process parameter used in some fermentation chambers is the insertion of aerosol mist, which consists of water droplets with a diameter of a few micrometers. With this technique, a humidity of nearly 100% can be achieved without the condensation that occurs with steam and that can pose a hygienic risk due to the growth of mold [7]. Our proposed system should be able to perform a proper measurement despite the aerosol, which is not possible for the human eye.

A patent application has been filed for the system that automatically captures the fermentation chamber topology and determines the volume of dough pieces [8].

#### **2. State of the Art**

#### *2.1. Fermentation Monitoring*

Few studies deal with monitoring the fermentation state of dough pieces. Elmehdi et al. propose a non-destructive method that uses low-intensity ultrasonic waves to obtain real-time information about changes in the dough structure during fermentation [9]. With this method, a correlation between the fermentation time and the ultrasonic velocity and attenuation can be observed. Increasing fermentation time leads to increasing attenuation due to the density change of the dough, and hence to a decreasing speed of the ultrasonic waves. Skaf et al. describe a sensor that generates low-frequency acoustic waves through an oscillating piezoelectric element to validate the kinetics of bread dough during fermentation [10]. An emitter and a receiver are brought into contact with a piece of dough, and the attenuation of the emitted acoustic signal, which changes due to the formation and growth of gas bubbles during fermentation, is measured. By means of that method, the influence of different process parameters such as temperature and humidity can be observed. Both of these methods have the disadvantage of being restricted to one dough sample at a time and are hence unsuitable for controlling a whole batch of fermenting bread without human supervision. Bajd et al. use magnetic resonance microscopy for the continuous control of dough fermentation and bread baking [11]. The proposed method delivers good results for monitoring the pore distribution and volume of a single dough piece and can theoretically be scaled up to several dough pieces lying in one plane. A significant disadvantage is the complex measurement setup, which cannot be used to retrofit existing fermentation chambers. Ivorra et al. propose an optical method of continuous fermentation state monitoring using a 3D vision system that is composed of a line laser and a camera and is installed inside a fermentation chamber [4]. With this method, the height and transversal area of only one dough sample can be measured and thus its fermentation state controlled. Pour-Damanab et al. use a digital imaging method to monitor the dynamic density of dough during fermentation [12]. The fermenting dough is taken out of the fermentation chamber to take a picture with a camera positioned orthogonally to the object. From the calculated volume of the dough, conclusions about its density can be drawn. The technique is invasive and not practicable for multiple samples.

A restriction of all existing methods is that they are applicable in only one plane. Standard fermentation chambers consist of multiple layers of metal sheets containing many dough pieces. Monitoring a single sample, or even a single layer as a representative, is not sufficient because parameters such as temperature and humidity can vary between different areas of the fermentation environment, leading to a different fermentation mellowness.

#### *2.2. Robust Object Recognition*

Monitoring multiple layers requires more robust object recognition techniques because of the metal surfaces of the fermentation chambers. Several publications deal with object detection and recognition in three-dimensional point clouds. Scholz-Reiter and Thamer present a simulation platform for multi-view sensor fusion of synthetic time-of-flight (ToF) images to serve as the base for subsequent object recognition tasks [13]. The sensor outputs are fused into one three-dimensional point cloud to obtain a suitable field of view. Qi et al. propose a novel neural network structure called PointNet, which allows direct object recognition and segmentation of three-dimensional point clouds [14]. However, generating the training data is very time-consuming because the ground-truth data has to be created manually, which means that every point belonging to a particular object has to be marked as such. In contrast to the mentioned methods, we use an instance segmentation network, the Mask Region-based Convolutional Neural Network (Mask R-CNN), proposed by He et al. and originally developed for RGB images [15]. To our knowledge, we are among the first to propose using this network structure for the instance segmentation of depth images in a tangible application.

Different kinds of shape representations have been developed for object recovery, such as the extruded generalized cylinder [16], recognition-by-components based on so-called geons [17], and the representation by superquadrics [18,19]. Owing to their flexibility and simplicity, superquadrics are currently the representation used most frequently in computer graphics and computer vision. Thamer and Scholz-Reiter propose a method for the segmentation and object recognition of point clouds generated by laser scanners [20]. The segmentation is based on the local curvature, which is derived from the surface normals. By fitting superquadrics to the segmented data, the type of the object is detected based on its shape. Vezzani et al. present a method for fitting superquadrics to objects as a base for robot grasping applications and achieve promising results. We chose the approach of superquadric fitting to estimate the shape parameters and hence the volume of detected objects.
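To illustrate how shape parameters translate into a volume, the following minimal sketch evaluates the standard superquadric inside-outside function and the closed-form volume of a superellipsoid. The parameter names `a1`, `a2`, `a3` (semi-axis lengths) and `eps1`, `eps2` (shape exponents) follow the common notation in the superquadrics literature and are assumptions here, not names taken from this text. The fitting step itself is not shown, and the sketch is not meant to represent the fitting procedure used in this work.

```python
import math


def beta(x, y):
    """Euler beta function, expressed through the gamma function."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


def _check_parameters(a1, a2, a3, eps1, eps2):
    if min(a1, a2, a3) <= 0 or min(eps1, eps2) <= 0:
        raise ValueError("sizes and shape exponents must be positive")


def inside_outside(x, y, z, a1, a2, a3, eps1, eps2):
    """Inside-outside function of a superquadric centred at the origin.

    Values below 1 lie inside, values equal to 1 lie on the surface, and
    values above 1 lie outside the superquadric.
    """
    _check_parameters(a1, a2, a3, eps1, eps2)
    xy_term = (abs(x / a1) ** (2.0 / eps2) + abs(y / a2) ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy_term + abs(z / a3) ** (2.0 / eps1)


def superquadric_volume(a1, a2, a3, eps1, eps2):
    """Closed-form volume of a superellipsoid.

    V = 2 * a1 * a2 * a3 * eps1 * eps2
        * B(eps1 / 2 + 1, eps1) * B(eps2 / 2, eps2 / 2)
    For eps1 = eps2 = 1 this reduces to the ellipsoid volume
    4/3 * pi * a1 * a2 * a3.
    """
    _check_parameters(a1, a2, a3, eps1, eps2)
    return (2.0 * a1 * a2 * a3 * eps1 * eps2
            * beta(eps1 / 2.0 + 1.0, eps1)
            * beta(eps2 / 2.0, eps2 / 2.0))
```

Because the volume depends only on the five fitted parameters, tracking it over time requires no integration over the point cloud once a fit is available. A fitting routine would typically minimize a residual built from `inside_outside` over the segmented points.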
