1. Introduction
The rapid development of machine learning and artificial intelligence has led to significant progress in numerous application areas. Neural networks have proven to be powerful tools for automatically performing image processing tasks such as object detection and image segmentation. Extensive research has focused on the architecture of neural networks for object recognition and segmentation [1,2,3]. In particular, several pre-trained model architectures provide standardized network structures and associated weight initializations that can be directly employed. These include, for example, YOLO [4,5], Faster R-CNN [6], and Mask R-CNN [7]. These models can be fine-tuned to a specific task with little computing effort and thus enable the application of neural networks even with limited hardware resources.
However, the success of these networks is highly dependent on the quality and size of the datasets used for their training [8]. When assembling large real training datasets is impractical, synthetic datasets can be produced by rendering 3D models to generate images together with their corresponding annotations [9]. Various approaches have been developed to generate synthetic data for training neural networks; examples include the classification of defects on steel components [10], the detection of surgical instruments in the operating room [11], and the detection of food for pick-and-place applications [12].
The rendering of the synthetic image data can be carried out using a variety of software tools [13]. Noteworthy examples include approaches based on frameworks originally developed for game development, such as Unreal Engine or Unity [14,15]. Furthermore, the open-source software Blender, developed for 3D modeling and animation, and the BlenderProc framework built on top of it offer the possibility of automatically creating and rendering virtual scenes [16,17]. All these approaches offer the advantage that, in addition to the synthetic image data, the corresponding bounding boxes, segmentation masks, and, if required, depth maps and normal vectors can be generated.
When creating synthetic datasets, all objects must be available as 3D geometries with associated material and texture information. Additionally, the synthetic scene must be defined through parameters including object counts and positions, light source specifications (type, position, and intensity), virtual camera positions, and background or ground plane design. There are two general approaches for selecting those parameters to bridge the gap between the real and synthetic domains. Domain Adaptation aims to select parameters such that the resulting synthetic scenes mimic real application scenes as closely as possible. Domain Randomization, on the other hand, seeks to create synthetic environments so varied that the real scene can be considered simply another instance within this variation [18,19,20].
Current methods for generating synthetic datasets typically rely on manual selection or heuristic randomization of scene and material parameters, informed by domain knowledge or basic grid search strategies [21,22,23]. While these approaches can yield useful datasets, the optimal settings for maximizing neural network performance are often non-intuitive and highly task-dependent. Manual parameter selection thus risks producing suboptimal synthetic data, as shown by recent studies exploring manually varied parameters in object detection and segmentation pipelines [15,24].
To address these limitations, automated parameter-optimization frameworks are required to systematically identify synthetic scene and material parameters that maximize accuracy on real-world data. This study proposes a novel optimization pipeline designed to find optimal parameters tailored to a specific application, including its unique objects and environmental conditions. Notably, this approach requires only a small real dataset including images and annotations, which is used as a validation set within the optimization loop to directly assess the quality of each synthetic dataset configuration.
The next section presents the optimization pipeline developed to identify optimal parameter sets for synthetic data generation and examines its reproducibility. Section 3 details the case study methodology, followed by independent analyses of individual parameters to assess their impact on validation precision. Subsequently, multidimensional optimization is conducted separately for material and scene parameters, and the resulting optimal parameter configuration is used to generate a comprehensive training dataset for final evaluation. Results are interpreted in Section 4, Section 5 summarizes the key findings, and Section 6 concludes with recommendations for future research directions.
2. Optimization Pipeline
Building upon the limitations of manual parameter selection in synthetic data generation, this study introduces a systematic approach to identify optimal scene and material parameters for specific segmentation tasks. This section first presents the overall architecture and implementation of the pipeline, followed by analyses of its deterministic behavior under various computational conditions.
2.1. Pipeline Architecture and Implementation
The optimization architecture comprises two major components (see Figure 1): the optimization loop itself, which runs in a dedicated container, and a central database for communicating results between container instances. The containerized approach enables the framework to perform multiple parameter evaluations simultaneously on different machines.
Before beginning the optimization, the parameters and their limits have to be defined. For the purpose of this study, the parameters are separated into material parameters (e.g., realistic/parametric textures, roughness, metallicity, color) and scene parameters (e.g., number of target and distractor objects, position of target and distractor objects, number and intensity of light sources, ground texture, ground plane curvature, object scaling factor, camera position).
Optionally, each parameter can be defined by a normal distribution. The optimizer then chooses the distribution’s mean and standard deviation and samples a concrete value for each scene generation. This enables the optimization algorithm to freely explore both Domain Adaptation and Domain Randomization approaches.
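As an illustration, the following sketch shows how such a distribution-defined parameter could be handled inside the optimization loop. It assumes an Optuna trial object (the optimizer is introduced below); the parameter name and bounds are purely illustrative.

```python
import numpy as np

def sample_scene_parameter(trial, rng):
    """Let the optimizer choose a distribution, then draw one value per scene.

    A small standard deviation approximates Domain Adaptation (a nearly fixed,
    application-like value); a large one approximates Domain Randomization.
    """
    # The optimizer proposes the distribution, not the concrete value itself.
    mean = trial.suggest_float("ground_curvature_mean", 0.0, 1.0)  # hypothetical bounds
    std = trial.suggest_float("ground_curvature_std", 0.0, 0.5)

    # One concrete value is then drawn for every generated scene.
    return float(rng.normal(loc=mean, scale=std))

# Usage (once per generated scene):
# value = sample_scene_parameter(trial, np.random.default_rng())
```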
Inside the optimization loop, the open-source optimization framework Optuna [25] loads the already evaluated parameter sets and their corresponding average precision (AP) values and uses them, together with the evolutionary optimization algorithm NSGA-II [26], to identify the next set of material and scene parameters to evaluate.
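A minimal sketch of this loop is given below. The objective function structure follows the pipeline described here, but the study name, the storage string, the parameter names and bounds, and the three helper functions are placeholders rather than the actual implementation.

```python
import optuna

# Placeholders for the pipeline stages (scene rendering, training, validation).
def render_synthetic_dataset(params): return "/data/synthetic"
def train_yolov8(dataset_dir): return "/data/best.pt"
def validate_on_real_dataset(weights): return 0.0, 0.0

def evaluate_parameter_set(trial):
    """One trial: propose parameters, build a dataset, train, validate on real images."""
    params = {
        "cc_share": trial.suggest_float("cc_share", 0.0, 1.0),
        "n_objects": trial.suggest_int("n_objects", 10, 200),
        "cam_dist_min": trial.suggest_float("cam_dist_min", 0.1, 1.5),
    }
    dataset_dir = render_synthetic_dataset(params)
    weights = train_yolov8(dataset_dir)
    ap_waterpump, ap_cap = validate_on_real_dataset(weights)
    return ap_waterpump, ap_cap  # one objective per target class

# Trials are persisted so that several containers can contribute in parallel.
study = optuna.create_study(
    study_name="synthetic-data-params",
    storage="sqlite:///optuna.db",          # assumed storage; the paper uses a central database
    load_if_exists=True,
    directions=["maximize", "maximize"],    # per-class AP values
    sampler=optuna.samplers.NSGAIISampler(),
)
study.optimize(evaluate_parameter_set, n_trials=10)
```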
To generate the synthetic datasets, the BlenderProc framework [17] is used, which internally accesses Blender's Python interface. This framework offers a pipeline for generating datasets (including image data and annotations) for machine learning purposes. Figure 2 shows examples of synthetic images generated with this pipeline.
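To make the generation step concrete, the following is a minimal BlenderProc sketch for rendering one scene with COCO-style annotations. All file paths, object choices, and numeric values are placeholder assumptions; the actual pipeline additionally randomizes materials, distractors, lighting, and the ground plane as described above.

```python
import numpy as np
import blenderproc as bproc

bproc.init()

# Load the target meshes and assign class ids for the annotations (paths assumed).
waterpump = bproc.loader.load_obj("models/waterpump.obj")[0]
cap = bproc.loader.load_obj("models/cap.obj")[0]
waterpump.set_cp("category_id", 1)
cap.set_cp("category_id", 2)

# A simple ground plane and a single point light (the pipeline samples these).
ground = bproc.object.create_primitive("PLANE")
ground.set_scale([2, 2, 1])
light = bproc.types.Light()
light.set_type("POINT")
light.set_location([1, -1, 2])
light.set_energy(500)

# Point the camera at the center of the target objects.
poi = bproc.object.compute_poi([waterpump, cap])
location = np.array([0.6, -0.6, 0.5])
rotation = bproc.camera.rotation_from_forward_vec(poi - location)
bproc.camera.add_camera_pose(bproc.math.build_transformation_mat(location, rotation))

# Render color images plus instance segmentation and write COCO annotations.
bproc.renderer.enable_segmentation_output(map_by=["instance", "class", "name"])
data = bproc.renderer.render()
bproc.writer.write_coco_annotations(
    "output/coco_data",
    instance_segmaps=data["instance_segmaps"],
    instance_attribute_maps=data["instance_attribute_maps"],
    colors=data["colors"],
    color_file_format="JPEG",
)
```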
Subsequently, these datasets are used to train a neural network. The pre-trained image recognition and segmentation network YOLOv8 [5] with model size m is utilized. For the training, only the synthetic dataset generated with the selected material and scene parameters is used. Finally, the quality of the parameter set is determined by evaluating the neural network trained on the synthetic dataset against the real annotated dataset.
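A sketch of this training and evaluation step using the Ultralytics API is shown below; the dataset configuration files and training settings are assumptions, and only the per-class AP values returned to the optimizer are essential.

```python
from ultralytics import YOLO

# Fine-tune the pre-trained medium-size segmentation model on the purely
# synthetic dataset, then validate on the real, manually annotated images.
model = YOLO("yolov8m-seg.pt")
model.train(data="synthetic_dataset.yaml", epochs=100, imgsz=640)  # assumed settings

# Per-class mask AP on the real dataset is fed back to the optimizer.
metrics = model.val(data="real_validation.yaml")
ap_per_class = metrics.seg.maps  # one AP value per object class
print(dict(zip(model.names.values(), ap_per_class)))
```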
2.2. Deterministic Behavior
The pipeline for investigating the scene parameters comprises several components (see Figure 1). The following investigations examine whether the results of each step are reproducible. Reproducibility is important in this framework to ensure that the return values of the optimization function (the AP values of the object classes) depend solely on the chosen scene and material parameters.
The influence of the GPU model is investigated using various available models from NVIDIA across different architecture versions. First, three synthetic datasets are generated. Then, training with these datasets is performed on each GPU, and the trained models are evaluated using the real dataset.
The investigations indicate that different GPU models can achieve varying results on the real validation dataset despite using the same training dataset (see Table 1). When multiple training and validation runs are started on the same worker, the results remain unchanged. Likewise, identical results are consistently obtained on different workers that use the same GPU model. This demonstrates that the training step itself is deterministic. Hence, to improve reproducibility, only workers equipped with an RTX 4090 GPU will be used for the following parameter studies.
These findings show that the training process itself is deterministic when the same synthetic training dataset is used. Additionally, it needs to be investigated whether the rendering pipeline produces the same training images (and annotations) for identical scene and material parameters. To ensure reproducibility, the seeds of all random number generators used during scene creation (e.g., for object poses and selection of materials) are fixed.
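A minimal sketch of this seeding is shown below, assuming the scene-creation code draws its random numbers from Python's random module and NumPy's global generator (as the samplers used here typically do); the seed value itself is arbitrary.

```python
import random
import numpy as np

SEED = 42  # arbitrary but fixed per parameter set

# Fix the generators used during scene creation (object poses, material choice, ...)
# so that identical scene and material parameters always yield the same dataset.
random.seed(SEED)
np.random.seed(SEED)
```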
In this study, the ray-tracing-based Cycles engine from Blender version 3.3.14 is used for rendering. By rendering the same scene several times, it can be observed that the generated images are generally not exactly the same. The color values differ especially in poorly lit areas of the scene, which indicates that the ray-tracing process itself lacks deterministic behavior. It should be noted that there is ongoing work in the Blender project that aims for a deterministic rendering process [27].
If the number of ray samples is increased manually, these differences become smaller and are partly absorbed by the 8-bit discretization of the brightness values. However, no practical sample count could be found that fully resolved the issue. Rendering on the CPU with a single thread can mitigate the effect, but the associated loss of performance makes this impractical. Consequently, a potential influence of the non-determinism in Cycles on the subsequent investigations cannot be ruled out.
3. Analysis
In the following sections, the presented framework is used to evaluate the significance of individual material and scene parameters and to identify optimal parameter settings for a specific application.
3.1. Case Study Setup
For each optimization study, the AP scores of the individual object classes are taken directly as the objectives to be maximized. Although the optimization operates directly on these per-class AP values, subsequent sections frequently report only the mean average precision (mAP) for conciseness. Since Optuna supports multi-objective optimization, any number of object classes can be incorporated in practical applications of this pipeline. For simplicity, this case study focuses on two target classes:
The Waterpump class (see Figure 3a) represents a water pump cover. It exhibits distinctive features that facilitate recognition, though these may vary by viewpoint. The Cap class (see Figure 3b) corresponds to a housing cap. It is easily identifiable by its latching tabs, but these features become obscured from certain angles, causing the object to resemble a simple cuboid and complicating detection.
For the real dataset, physical versions of these objects were fabricated via 3D printing. To enhance visual diversity, some components were colored, while others were deliberately deformed or had parts removed to simulate manufacturing defects.
The real dataset comprises three distinct image groups, each designed to simulate potential application scenarios of varying difficulty. The first group (27 images) depicts office environments (Figure 4a), where objects rest on similarly colored surfaces (e.g., keyboards or desks) and are partially occluded by target or primitive objects. The second group (28 images) captures components on a conveyor belt (Figure 4b), featuring clear background separation to emulate a standard robotic pick-and-place task. The third group (47 images) shows components placed at varying heights amidst laboratory equipment and utensils, representing a complex laboratory setting. As this dataset serves as the benchmark for evaluating synthetic data effectiveness, the optimization should yield parameter sets that deliver optimal performance specifically for these application domains.
3.2. Synthetic Dataset Size
The time required to evaluate a parameter set is approximately proportional to the number of synthetic training images. To maximize the number of evaluations feasible with the available hardware, it is important to first assess the influence of the synthetic training dataset size on validation precision. For this purpose, a dataset of 400 scenes is generated first. The number of scenes to be used for the training processes is then varied. The scenes used for each training process are randomly selected from the generated pool.
Figure 5 indicates an almost linear correlation between the number of scenes and the achieved mAP for small numbers of scenes. As the number of scenes grows further, the gain in mAP saturates. To balance validation accuracy against processing time, the number of scenes is set to 50 (10 images per scene, i.e., 500 training images) for all subsequent investigations.
3.3. Optimization of Individual Parameters
The framework is first used to individually analyze the influence of several selected parameters. In this way, the influence of a single parameter—or a small group of parameters—on the training accuracy achieved can be analyzed in isolation. These analyses are conducted independently, and the results of one parameter study are not always integrated into subsequent analyses. For this reason, the mAP values obtained can be compared only within a study, not across different studies. In each parameter evaluation, 500 synthetic image–annotation pairs (10 in each of the 50 unique scenes) are generated.
When analyzing the number of objects, both the absolute number of all objects n and the individual object counts (Waterpump objects, Cap objects, and distractor objects) are considered, as well as the share of the target object classes in the total number of objects and the inter-class ratio r between the Waterpump and Cap objects. The distractor objects have primitive geometric shapes (e.g., cylinders, spheres, cubes and cones); their number follows as the total object count minus the number of target objects.
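The following sketch illustrates one way these aggregated parameters could be turned into per-class object counts. The conventions used (the target share as the fraction of target objects among all objects, and r as the fraction of Waterpump objects among the target objects) are assumptions, since the original notation did not survive, and the function name is illustrative.

```python
def split_object_counts(n_total, target_share, r):
    """Derive per-class object counts from the aggregated scene parameters.

    target_share: assumed fraction of target objects among all objects.
    r:            assumed fraction of Waterpump objects among the target objects
                  (r = 0 -> only Cap objects, r = 1 -> only Waterpump objects).
    """
    n_targets = round(n_total * target_share)
    n_waterpump = round(n_targets * r)
    n_cap = n_targets - n_waterpump
    n_distractor = n_total - n_targets  # the remaining objects are primitives
    return n_waterpump, n_cap, n_distractor

# Example: 120 objects, 75% targets, slightly more Cap than Waterpump objects.
print(split_object_counts(120, 0.75, 0.45))  # -> (40, 50, 30)
```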
3.3.1. Analysis of the Total Object Count
The first analysis investigates the total number of objects present in each scene. This total number is divided into distractor objects, Waterpump objects and Cap objects. The remaining material and scene parameters are selected at random.
Figure 6 shows a clear correlation between the total number of objects and the mAP achieved on the real validation dataset. If n is too low, the validation accuracy achieved on the real dataset decreases, as fewer objects are available overall for training the neural network. Excessive object counts also reduce mAP, likely due to increased occlusion that hinders complete object visibility. The optimal number of objects lies between these extremes.
3.3.2. Analysis of the Number of Primitive Objects
Furthermore, the influence of the primitive distractor objects on the achieved validation accuracy is analyzed. For this purpose, a fixed number of Waterpump and Cap objects is rendered in each of the 500 training images. The number of primitive objects is then sampled from a fixed interval. All other material and scene parameters are again selected at random.
The results in Figure 7 show a noisy but approximately linear decrease of mAP with the number of additional distractor objects. The results indicate that including distractor objects does not improve the dataset quality.
3.3.3. Analysis of the Inter-Class Ratio
The inter-class ratio r between the Waterpump and Cap target classes is analyzed next. The proportion of target objects in the total number of objects is held constant. The aim is to identify a Pareto front along which the average precision values achieved on the real validation dataset for both object classes are maximized.
Figure 8 shows the AP values for both target object classes as a function of the inter-class ratio r. As expected, for small r the AP values for the Waterpump class are low and those for the Cap class are high; the opposite behavior is observed for large r. Furthermore, the AP of the Cap class is lower when the dataset is heavily skewed toward Cap objects than for more balanced distributions. This effect also occurs for Waterpump objects under a strongly skewed distribution, but is much less pronounced. Most Pareto-optimal solutions lie at a ratio corresponding to a slightly higher number of Cap objects than Waterpump objects.
3.3.4. Analysis of the Material Library Composition
The materials for the objects (target and distractor objects) and the material for the ground plane are randomly selected from a material library comprising 100 different materials. The material library contains both parametric materials (defined by their color, roughness and metallicity) and realistic materials (sourced from the CC-Textures library). The composition of this material library is characterized by the proportion of realistic materials (referred to as CC-Share); the proportion of parametric materials is correspondingly 1 − CC-Share. All other parameters are set to random values.
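A sketch of how such a mixed library could be assembled with BlenderProc is shown below. The texture folder path, the library size handling, and the choice of principled-shader values are assumptions; the actual pipeline may parameterize the materials differently.

```python
import numpy as np
import blenderproc as bproc

def build_material_library(cc_share, size=100, cc_dir="resources/cctextures"):
    """Mix realistic CC-Textures materials with randomly parameterized materials.

    cc_share is the fraction of realistic materials in a library of `size` entries.
    """
    n_realistic = int(round(size * cc_share))
    realistic = bproc.loader.load_ccmaterials(cc_dir)[:n_realistic]

    parametric = []
    for i in range(size - n_realistic):
        mat = bproc.material.create(f"parametric_{i}")
        mat.set_principled_shader_value("Base Color", np.append(np.random.rand(3), 1.0))
        mat.set_principled_shader_value("Roughness", float(np.random.rand()))
        mat.set_principled_shader_value("Metallic", float(np.random.rand()))
        parametric.append(mat)

    return realistic + parametric
```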
Figure 9 shows that material libraries composed exclusively of parametric or realistic materials achieve lower mAP values than more balanced material libraries. The best precision (apart from outliers) is achieved at an intermediate CC-Share. It should be noted that the outliers in the data initially resemble noise effects. On closer inspection, however, it becomes apparent that the data points in the immediate vicinity of an outlier also feature similar AP values. This indicates a systematic cause for the outliers, which point to particularly good or bad CC-Share values.
3.3.5. Analysis of the Camera Distance
In this parameter study, the camera distance to the center of gravity of all objects is examined; both the minimum and the maximum camera distance are varied.
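A minimal sketch of how such a camera pose could be sampled in BlenderProc is given below; the elevation limits and the function name are illustrative assumptions, while the two distance bounds are the parameters examined in this study.

```python
import blenderproc as bproc

def sample_camera_pose(objects, dist_min, dist_max):
    """Place the camera at a random distance in [dist_min, dist_max] (in metres)
    from the objects' center of gravity and point it at that center."""
    poi = bproc.object.compute_poi(objects)  # center of the loaded objects
    location = bproc.sampler.shell(
        center=poi,
        radius_min=dist_min,
        radius_max=dist_max,
        elevation_min=15,                    # assumed viewing-angle limits
        elevation_max=89,
    )
    rotation = bproc.camera.rotation_from_forward_vec(poi - location)
    bproc.camera.add_camera_pose(bproc.math.build_transformation_mat(location, rotation))
```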
Figure 10 shows the mAP values obtained for the examined ranges of the minimum and maximum camera distance. Over part of this range, only low mAP values are achieved, whereas the best results on the real validation data are obtained within a narrower band of minimum and maximum distances. In detail, the optimal camera distances differ between the two target object classes.
3.4. Multi-Parameter Optimizations
The next step is the multidimensional optimization of the parameters already presented as well as some additional parameters, e.g., the ground plane curvature a and the number of light sources. The optimization is divided into separate scene and material parameter optimizations, reducing the dimensionality of the search space.
First, the multidimensional optimization of the material parameters is carried out. The roughness and metallicity of the parametric materials and the proportion of realistic materials in the material library are considered. A total of 85 trials with distinct parameters are evaluated, and three Pareto-optimal sets are identified.
As shown in Figure 11, the best training results occur when the proportion of realistic materials lies within an intermediate range. By contrast, metallicity and roughness show no clear relationship with the AP values. This suggests that, among the material parameters, the proportion of realistic materials has the greatest influence on the quality of a synthetic training dataset.
This is verified through a parameter relevance analysis using the fANOVA algorithm [28]. The results support the hypothesis that the proportion of realistic materials is the most important material parameter (see Figure 12). The three Pareto-optimal training results, in terms of the average precision for both object classes, are achieved with the material parameters shown in Table 2.
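Optuna ships an fANOVA-based importance evaluator that can reproduce this kind of analysis directly from a stored study; the study name, storage URL, and example output below are placeholders.

```python
import optuna
from optuna.importance import FanovaImportanceEvaluator

# Load the finished material-parameter study from the central database.
study = optuna.load_study(study_name="material-params", storage="sqlite:///optuna.db")

# For multi-objective studies, a target must be selected, e.g. the first
# objective (here: the AP of the Waterpump class).
importances = optuna.importance.get_param_importances(
    study,
    evaluator=FanovaImportanceEvaluator(),
    target=lambda trial: trial.values[0],
)
print(importances)  # e.g. {"cc_share": 0.7, "roughness": 0.2, "metallicity": 0.1}
```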
An analogous multidimensional optimization is also performed for the scene parameters. Here, the number of point light sources, the curvature of the ground plane, the total number of objects n, the share of target objects, the inter-class ratio r, and the minimum and maximum camera distances are optimized. A total of 171 parameter sets are evaluated.
Figure 13 and Table 3 show the parameters that are used to generate the most suitable datasets for training. For some parameters, the best training datasets cluster around similar values; these include the camera distances and the parameters that describe the object distribution.
This observation can also be confirmed for the scene parameter optimization using a fANOVA-based analysis (see Figure 14). The most important parameter is the minimum camera distance; its optimal interval roughly corresponds to the results of the preliminary study on the influence of the camera distance (see Figure 10).
3.5. Evaluation of Optimal Configuration
In the previous multidimensional optimizations, Pareto-optimal parameter sets were found for the generation of synthetic training data. Based on these investigations, an optimal dataset is generated. For this purpose, the material and scene parameters with which the highest mAP was achieved are combined (last columns in
Table 2 and
Table 3). This dataset with optimal parameters consists of 5000 images and annotations. Additionally, a second dataset is generated in which roughness, metallicity, background curvature, and number of light sources are not fixed values, but are instead modeled as normally distributed. The expected value of each normal distribution is set to the determined optimal value. The final evaluation of these training datasets is performed using the
xl-model of YOLOv8. The results of this final evaluation are shown in
Table 4.
For both the dataset with fixed parameters and the dataset with normally distributed values, one training process with and one without data augmentation is started. When data augmentation is used, the YOLO framework automatically alters the training data to increase variety. This includes adjustments to hue, saturation, and value, as well as geometric transformations (rotation, translation, scaling, shearing, flipping, etc.) and copy–pasting of image segments.
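The sketch below shows how these two training variants could be launched with the Ultralytics API. The hyperparameter values are illustrative (mostly the framework's defaults) rather than the exact settings of this study, the dataset file is an assumption, and yolov8x-seg.pt is assumed here to correspond to the extra-large ("xl") segmentation model mentioned above.

```python
from ultralytics import YOLO

DATA = "synthetic_optimal.yaml"  # assumed dataset configuration

# Run with the built-in augmentations active (illustrative values shown).
model = YOLO("yolov8x-seg.pt")
model.train(
    data=DATA, epochs=100,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,       # hue / saturation / value jitter
    degrees=0.0, translate=0.1, scale=0.5,   # geometric transformations
    shear=0.0, fliplr=0.5, flipud=0.0,
    copy_paste=0.1,                          # copy-pasting of segment instances
)

# Run with augmentation deactivated by zeroing the augmentation hyperparameters.
model_plain = YOLO("yolov8x-seg.pt")
model_plain.train(
    data=DATA, epochs=100,
    hsv_h=0.0, hsv_s=0.0, hsv_v=0.0,
    degrees=0.0, translate=0.0, scale=0.0,
    shear=0.0, fliplr=0.0, flipud=0.0,
    mosaic=0.0, copy_paste=0.0,
)
```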
4. Discussion
The mAP values achieved are modest compared with those reported in previous studies, primarily because the validation dataset is small and challenging. For example, a more application-specific validation dataset could have been generated, such as one in which all objects are placed on a conveyor belt, emulating a classic pick-and-place task. However, the aim of this work extends beyond maximizing absolute mAP values. Instead, the results show that the presented optimization pipeline is able to find a material and scene parameter set that is suitable for generating a purely synthetic training dataset for one or multiple specific applications.
The results demonstrate that the camera distance is the most important scene parameter, likely because it directly determines object size in synthetic images. The other scene parameters only have a minor influence on the accuracy achieved by the trained network (see Figure 14).
Regarding the material parameters, the results (Figure 12) suggest that the proportion of realistic materials is especially important. Thus, using a certain share of parametric materials in addition to realistic ones is advantageous. In contrast, the precise parameterization of these parametric materials (in terms of roughness and metallicity) appears to be less important.
It should also be mentioned that the results could be improved by increasing the number of parameter evaluations within the optimizations. However, the absolute number of optimization runs for this study is limited by the available computing power.
5. Conclusions
This paper introduces an optimization pipeline for enhancing the generation of synthetic datasets used in neural network-based object segmentation. Leveraging the BlenderProc framework for procedural scene generation and Optuna for parameter optimization, the pipeline systematically identifies optimal material and scene parameters for specific objects based on performance on a real validation dataset.
Through individual and multidimensional parameter studies, important parameters influencing synthetic dataset quality were identified. Notably, camera distance proved to be the most significant scene parameter, while the proportion of realistic materials was crucial among material parameters. The optimized parameter sets demonstrate that the pipeline can produce synthetic training data that meaningfully improves neural network performance. They also provide insight into how systematic scene design and material selection affect the robustness of synthetic data for segmentation tasks.
This suggests a significant improvement over conventional data generation practices, where parameter selection typically relies on domain expertise or trial-and-error. Such approaches cannot systematically identify the non-intuitive parameter combinations that might be optimal for specific segmentation tasks.
6. Future Work
While the results of this study confirm the feasibility of synthetic data parameter optimization, many avenues for further research remain. For example, a more comprehensive exploration of the parameter space could be achieved by increasing the number of parameter evaluations. Furthermore, the relative importance of the scene and material parameters could be determined by a joint multidimensional parameter optimization. However, both approaches would require additional computing resources. In addition, the influence of the size and type of the real validation dataset on the optimization should be investigated. The investigations should be repeated once a solution ensuring the deterministic behavior of Blender's Cycles renderer becomes available.
Optimization was performed on two components with markedly different geometries, suggesting that optimal parameter values and the relative importance of individual parameters depend on the component geometry. Therefore, a new optimization would be required to generate synthetic training data for a different component. In further work, the influence of the component geometry on the optimal material and scene parameters should be investigated in more detail, e.g., by parameterizing the geometry of the component itself so that the optimal parameters can be determined as a function of the component geometry.
The real validation dataset used is relatively heterogeneous, encompassing multiple environments. For a simple application (e.g., pick-and-place robots), on the other hand, a very homogeneous validation dataset can be used that is generated directly in the application environment. In such cases, it would be valuable to investigate whether the optimization converges to a parameter set that closely matches the actual scene, assuming the material and scene parameters permit such a configuration.
Author Contributions
Conceptualization, M.N. and N.M.; methodology, M.N.; software, M.N.; validation, L.H.; formal analysis, M.N.; investigation, M.N.; resources, K.H. and N.M.; data curation, K.H.; writing—original draft preparation, M.N.; writing—review and editing, K.H. and L.H.; visualization, M.N.; supervision, L.H. and E.R.; project administration, L.H. and E.R. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors on request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object Detection with Deep Learning: A Review. IEEE Trans. Neural Networks Learn. Syst. 2019, 30, 3212–3232.
- Zaidi, S.S.A.; Ansari, M.S.; Aslam, A.; Kanwal, N.; Asghar, M.; Lee, B. A survey of modern deep learning based object detection models. Digit. Signal Process. 2022, 126, 103514.
- Jiao, L.; Zhang, F.; Liu, F.; Yang, S.; Li, L.; Feng, Z.; Qu, R. A Survey of Deep Learning-Based Object Detection. IEEE Access 2019, 7, 128837–128868.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO. 2023. Available online: https://docs.ultralytics.com/de/models/yolov8/ (accessed on 30 December 2024).
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- Schieber, H.; Demir, K.C.; Kleinbeck, C.; Yang, S.H.; Roth, D. Indoor Synthetic Data Generation: A Systematic Review. Comput. Vis. Image Underst. 2024, 240, 103907.
- Boikov, A.; Payor, V.; Savelev, R.; Kolesnikov, A. Synthetic Data Generation for Steel Defect Detection and Classification Using Deep Learning. Symmetry 2021, 13, 1176.
- Wiese, L.; Hinz, L.; Reithmeier, E. Medical instrument detection with synthetically generated data. In Proceedings of the Medical Imaging 2024: Imaging Informatics for Healthcare, Research, and Applications, Yokohama, Japan, 17–19 May 2024; Yoshida, H., Wu, S., Eds.; International Society for Optics and Photonics, SPIE: Bellingham, WA, USA, 2024; Volume 12931, p. 1293109.
- Jonker, M.; Roozing, W.; Strisciuglio, N. Synthetic Data-Based Training of Instance Segmentation: A Robotic Bin-Picking Pipeline for Chicken Fillets. In Proceedings of the 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE), Bari, Italy, 28 August–1 September 2024; pp. 2805–2812.
- Arlovic, M.; Damjanovic, D.; Hrzic, F.; Balen, J. Synthetic Dataset Generation Methods for Computer Vision Application. In Proceedings of the 2024 International Conference on Smart Systems and Technologies (SST), Osijek, Croatia, 16–18 October 2024; pp. 69–74.
- Martinez-Gonzalez, P.; Oprea, S.; Garcia-Garcia, A.; Jover-Alvarez, A.; Orts-Escolano, S.; Garcia-Rodriguez, J. UnrealROX: An extremely photorealistic virtual reality environment for robotics simulations and synthetic data generation. Virtual Real. 2020, 24, 271–288.
- Borkman, S.; Crespi, A.; Dhakad, S.; Ganguly, S.; Hogins, J.; Jhang, Y.C.; Kamalzadeh, M.; Li, B.; Leal, S.; Parisi, P.; et al. Unity Perception: Generate Synthetic Data for Computer Vision. arXiv 2021, arXiv:2107.04259.
- Blender Foundation. Blender—A 3D Modelling and Rendering Software. 2024. Available online: https://www.blender.org (accessed on 30 December 2024).
- Denninger, M.; Winkelbauer, D.; Sundermeyer, M.; Boerdijk, W.; Knauer, M.; Strobl, K.H.; Humt, M.; Triebel, R. BlenderProc2: A Procedural Pipeline for Photorealistic Rendering. J. Open Source Softw. 2023, 8, 4901.
- Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 23–30.
- Tremblay, J.; Prakash, A.; Acuna, D.; Brophy, M.; Jampani, V.; Anil, C.; To, T.; Cameracci, E.; Boochoon, S.; Birchfield, S. Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Salt Lake City, UT, USA, 18–22 June 2018.
- Prakash, A.; Boochoon, S.; Brophy, M.; Acuna, D.; Cameracci, E.; State, G.; Shapira, O.; Birchfield, S. Structured Domain Randomization: Bridging the Reality Gap by Context-Aware Synthetic Data. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 7249–7255.
- Rawal, P.; Sompura, M.; Hintze, W. Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment. arXiv 2024, arXiv:2311.11039.
- Newman, C.; Petzing, J.; Goh, Y.M.; Justham, L. Investigating the optimisation of real-world and synthetic object detection training datasets through the consideration of environmental and simulation factors. Intell. Syst. Appl. 2022, 14, 200079.
- Wiese, L.; Hinz, L.; Reithmeier, E.; Korn, P.; Neuhaus, M. Detection of Surgical Instruments Based on Synthetic Training Data. Computers 2025, 14, 69.
- Schneidereit, T.; Gohrenz, S.; Breuß, M. Object detection characteristics in a learning factory environment using YOLOv8. arXiv 2025, arXiv:2503.10356.
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
- Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
- Blender Contributors. Blender Issue #101726: Cycles Does Not Generate the Exact Same Images When a Scene Is Rendered Twice. 2022. Available online: https://projects.blender.org/blender/blender/issues/101726 (accessed on 30 December 2024).
- Hutter, F.; Hoos, H.; Leyton-Brown, K. An Efficient Approach for Assessing Hyperparameter Importance. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 22–24 June 2014; Xing, E.P., Jebara, T., Eds.; Proceedings of Machine Learning Research, Volume 32, pp. 754–762.
Figure 1. Procedure of optimization.
Figure 2. Examples of synthetic images from two different scenes.
Figure 3. Renderings of the target objects with parametric materials.
Figure 4. Examples from the real-world dataset. (a) Sample captured in an office environment. (b) Sample captured on a conveyor belt.
Figure 5. Influence of the number of scenes on the real validation dataset mAP. Each scene contains 10 images captured from different camera angles.
Figure 6. Influence of the total number of objects n in the synthetic dataset on the validation accuracy on the real dataset. This number includes the target objects as well as the distractor objects.
Figure 7. Influence of the number of additional distractor objects. The distractor objects have primitive geometric shapes (e.g., cylinders, spheres, cubes and cones). When no distractor objects are present, the scene only contains the target objects (in this case Waterpump and Cap).
Figure 8. Influence of the inter-class ratio r on the average precision values for both object classes. The lower end of the ratio corresponds to a dataset that consists solely of Cap objects, while the upper end corresponds to a dataset that consists solely of Waterpump objects.
Figure 9. Analysis results for the composition of the material library. For a CC-Share of 0, the material library consists only of parametric materials; for a CC-Share of 1, it consists only of realistic materials.
Figure 10. Analysis results of the influence of the camera distance interval on the validation dataset mAP. The optimization bounds are chosen so that the minimum camera distance does not exceed the maximum camera distance.
Figure 11. Results achieved from the multidimensional material parameter optimization. The parameter sets which yielded the best results are highlighted.
Figure 12. Significance of the individual material parameters as calculated using the fANOVA algorithm.
Figure 13. Results from the multidimensional scene parameter optimization. The parameter sets which yielded the best results are highlighted.
Figure 14. Relevance of the individual scene parameters as calculated using the fANOVA algorithm.
Table 1. Average precision for the Waterpump class using identical training data on different GPUs.
| GPU | Dataset 1 | Dataset 2 | Dataset 3 |
|---|---|---|---|
| RTX 4090 | 0.551 | 0.459 | 0.511 |
| RTX 4070 Ti | 0.554 | 0.466 | 0.579 |
| RTX 3090 | 0.545 | 0.534 | 0.569 |
| GV100 | 0.537 | 0.472 | 0.553 |
Table 2. Material parameters and precision values of the Pareto-optimal datasets.
| Dataset Hash | e0834c | 22d11f0 | 8a5d1c |
|---|---|---|---|
| Metallicity | 0.215 | 0.867 | 0.965 |
| Roughness | 0.204 | 0.204 | 0.785 |
| CC-Share | 0.638 | 0.638 | 0.386 |
| AP Waterpump | 0.630 | 0.646 | 0.386 |
| AP Cap | 0.590 | 0.559 | 0.546 |
| mAP | 0.610 | 0.603 | 0.611 |
Table 3. Scene parameters and precision values for the Pareto-optimal datasets.
| Dataset Hash | 9dc945 | 893078 | f4493f4 |
|---|---|---|---|
| Ground curvature a | 0.660 | 0.867 | 0.274 |
| Light sources | 3 | 7 | 2 |
| Min. camera dist. [m] | 0.268 | 0.280 | 0.207 |
| Max. camera dist. [m] | 1.212 | 0.816 | 1.228 |
| Share of target objects | 0.767 | 0.856 | 0.759 |
| Inter-class ratio r | 0.517 | 0.124 | 0.403 |
| Total objects n | 120 | 74 | 199 |
| AP Waterpump | 0.710 | 0.614 | 0.685 |
| AP Cap | 0.506 | 0.572 | 0.546 |
| mAP | 0.608 | 0.593 | 0.615 |
Table 4. Results of the final evaluation.
| Metric | Fixed Values (Augmentation Deactivated) | Fixed Values (Augmentation Activated) | Normally Distributed Values (Augmentation Deactivated) | Normally Distributed Values (Augmentation Activated) |
|---|---|---|---|---|
| AP Waterpump | 0.683 | 0.736 | 0.697 | 0.785 |
| AP Cap | 0.540 | 0.620 | 0.526 | 0.647 |
| mAP | 0.612 | 0.678 | 0.612 | 0.716 |