Article

A Texture-Based Simulation Framework for Pose Estimation

School of Measurement Technology and Instrumentation, China Jiliang University, Hangzhou 310020, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(8), 4574; https://doi.org/10.3390/app15084574
Submission received: 17 March 2025 / Revised: 16 April 2025 / Accepted: 18 April 2025 / Published: 21 April 2025

Abstract

Accurate 3D pose estimation of spherical objects remains challenging in industrial inspections and robotics due to their geometric symmetries and limited feature discriminability. This study proposes a texture-optimized simulation framework that enhances pose prediction accuracy by optimizing the surface texture features of the designed samples. A hierarchical texture design strategy was developed, incorporating complexity gradients (low to high) and color contrast principles, and implemented via VTK-based 3D modeling with automated Euler angle annotations. The framework generated 2297 synthetic images across six texture variants, which were used to train a MobileNet model. The validation tests demonstrated that the high-complexity color textures achieved superior performance, reducing the mean absolute pose error by 64.8% compared to the low-complexity designs. While color improved the validation accuracy universally, the test set analyses revealed its dual role: complex textures leveraged chromatic contrast for robustness, whereas simple textures suffered color-induced noise (a 35.5% error increase). These findings establish texture complexity and color complementarity as critical design criteria for synthetic datasets, offering a scalable solution for vision-based pose estimation. Physical experiments confirmed the practical feasibility, yielding 2.7–3.3° mean errors. This work bridges the simulation-to-reality gaps in symmetric object localization, with implications for robotic manipulation and industrial metrology, while highlighting the need for material-aware texture adaptations in future research.

1. Introduction

The three-dimensional pose of spherical objects is pivotal for industrial inspection [1], remote sensing applications [2], particle tracking [3], and robotic machining systems [4]. Traditional pose estimation methods rely on geometric feature matching [5] or point cloud registration [6], but they struggle under dynamic lighting and occlusion [7]. Recent research has combined pose estimation with deep learning, training convolutional neural networks on datasets to perform end-to-end pose estimation [8,9,10,11]. Although this approach is convenient, two key challenges remain in practical applications: (1) densely annotated real-world pose data [12] are costly to obtain, and the annotation process is prone to human error; (2) the lack of feature discriminability limits the generalization ability of the model, so performance degrades significantly on unknown targets or under drastic lighting changes [13,14]. Moreover, if the observed object is regularly symmetric, such as a sphere, the rotation ambiguity inherent in symmetry cannot be resolved [14,15].
In recent years, the method of printing characteristic texture patterns on the surface of a sphere has been used to solve the problem of obtaining particle attitude information in particle rotation dynamics. Zimmermann et al. [16] matched experimental particle textures with synthetic templates using stereo vision, while Mathai et al. [17] optimized binary surface patterns via cost function minimization. Though effective in controlled settings, these methods exhibit limited generalization due to texture sensitivity. Will et al.’s stereolithography-based rendering [18] further highlights the trade-off between pattern complexity and computational feasibility.
Recent advances in texture-driven pose estimation include the work of Zhang et al., who examined in detail the importance of color and texture characteristics when using neural networks to estimate coal ash content; their experimental results and visualizations showed that the importance of color in the CNN was as high as 64.77%, while texture features contributed 35.23% [19]. Wang and Zhang constructed a texture optimization network that combines contextual aggregation information and used it for texture restoration to enhance low-light images [20]. These studies show that texture features play a crucial role in deep learning and image processing.
To address these limitations, this work introduces a simulation-driven framework combining VTK-based synthetic data generation with Tamura texture theory. The specific research objectives and contents are as follows:
  • Objective: bridge the simulation–reality gap in attitude estimation for symmetrical objects using synthetic data and texture theory.
  • Texture design: direction-sensitive surface textures governed by Tamura texture principles (coarseness, contrast, and directionality) with six variants (three complexity levels × grayscale/color).
  • Data generation: VTK-based synthetic data with automated pose space sampling (3° intervals) and implicit Euler angle encoding.
  • Validation: experimental evaluation using 3D-printed textured spheres and real-world attitude measurements.
This study establishes and experimentally validates a set of texture design criteria for spherical objects. Using 3D-printed textured spheres and actual attitude measurements, we bridge the simulation–reality gap in the positioning of symmetrical objects. This provides a robust attitude estimation for robotics and industrial metrology while offering a scalable synthetic data paradigm.

2. Materials and Methods

This study proposes a simulation dataset construction scheme based on representational texture. Firstly, based on the texture-related theory, a series of textures are designed with different complexities. Secondly, the texture attachment and pose simulation of spherical particles are established through the VTK simulation library, and automatic pose change and pose information annotation are designed. Finally, a batch of image label datasets with pose information is obtained, which can support the end-to-end deep learning model training.

2.1. Texture Design

Texture, as a visual attribute of an object's surface, reflects the statistical characteristics of the surface microstructure and contains a wealth of structural information, playing a crucial role in image recognition and object attitude estimation. In computer vision, texture is usually defined as the spatial distribution pattern of gray pixel values within an image region.
Early research shows that the key to texture perception is extracting the basic features of texture. Tamura et al. proposed six basic texture features: coarseness, contrast, directionality, line-likeness, regularity, and roughness [21]. These features effectively describe the statistical characteristics of texture and provide a theoretical basis for texture analysis and recognition. Recent studies have further quantified the relationship between texture features and recognition accuracy. Dzierżak demonstrated that optimized feature selection from 290 texture descriptors (including gray-level statistics and wavelet transforms) significantly improves osteoporosis detection in CT scans, with the k-nearest neighbors algorithm achieving 96.75% accuracy using 50 prioritized features [22]. Trevisani et al. proposed a method to quantify surface roughness through multi-scale texture analysis and constructed scalable roughness indexes, which reveal terrain texture characteristics at different spatial scales, provide a new dimension for terrain analysis, and improve the accuracy of geomorphic analysis [23]. He et al. further studied the impact of texture distribution on visual perception and proposed rules for how texture distribution affects it [24]. Their research showed that specific texture distribution patterns can enhance the visual system's perception, thereby improving the accuracy of object recognition.
The stripe spacing annular ratio usually refers to the ratio of the spacing (d) between the adjacent stripes in a ring or periodic texture pattern to the characteristic size of the ring structure, such as the circumference L or radius r. Its mathematical expression is as follows:
η = d / (2πr) × 100%
This ratio reflects the distribution density of the fringes in the ring structure and is a key parameter of texture design. In optical measurement or machine vision, if the fringe spacing is too small (the annular ratio is too low), the imaging system may suffer aliasing due to insufficient sampling, so that the fringes cannot be accurately resolved. According to the Nyquist sampling theorem, the sampling frequency must be at least twice the highest spatial frequency to be resolved [25]. In display technology, if the spatial frequency corresponding to the pixel spacing is f_p, the frequency of the moiré fringe must satisfy the following:
f_Moiré ≤ f_p / 2  ⇒  d / (2πr) ≥ 5%
In a fringe projection system, the annular ratio should therefore be ≥5% to avoid moiré fringes on the display screen, which would cause phase unwrapping errors [26].
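For a concrete sense of this criterion, the short Python sketch below evaluates the annular ratio against the 5% threshold; it is our own illustration, and the 10 mm radius and 4 mm stripe spacing are assumed values, not taken from the paper.

```python
import math

def annular_ratio(stripe_spacing_mm, sphere_radius_mm):
    """Stripe-spacing annular ratio eta = d / (2*pi*r), expressed in percent."""
    return stripe_spacing_mm / (2 * math.pi * sphere_radius_mm) * 100.0

# Hypothetical example: 4 mm stripe spacing on a 10 mm radius sphere
eta = annular_ratio(4.0, 10.0)                 # about 6.4%
print(f"eta = {eta:.1f}% ->", "OK" if eta >= 5.0 else "too fine: aliasing/moire risk")
```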
We took this into account when designing the texture patterns, so the fringe spacing ratio is kept above 5%. On this theoretical basis, three textures with a similar proportion distribution and fringe spacing but different complexities are designed. Texture1 is a low-complexity texture composed only of horizontal stripes. Texture2 is a medium-complexity texture that adds columnar stripes, giving higher texture differentiation. Texture3 adds more columnar stripes and short horizontal stripes to form a more complex texture area.
The CIE-Lab color difference (ΔE) is an internationally used quantitative index of color difference. The minimum color difference perceptible to the human eye is ΔE ≈ 1, but ΔE ≥ 5 is needed for reliable discrimination. Above a certain threshold, the color difference is significant (clearly distinguishable to the human eye) and suitable for robust recognition in machine vision systems. Studies have shown that ΔE > 30 can resist lighting changes, noise interference, and sensor errors, ensuring the stability of color features in complex environments [27]. On this basis, we improve on the original black and white textures. In the Lab color space, black and white correspond to lightness values of 0 and 100, the two extremes, while blue corresponds to Lab values of about (30, 68, −112) and green to about (46, −52, 49); the color difference between blue and green is ΔE ≈ 200, and the pairwise color differences among the four colors are all far greater than 30. Therefore, a color texture pattern with the same texture distribution is designed, composed of black, white, green, and blue.
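The check below is a minimal sketch of this color-difference argument using the simple CIE76 Euclidean ΔE*ab; the paper's discussion invokes the CIEDE2000 formula [27], and the Euclidean form is used here only to reproduce the rough magnitudes quoted above.

```python
import math
from itertools import combinations

# Approximate CIE-Lab coordinates of the palette discussed above
lab = {
    "black": (0.0, 0.0, 0.0),
    "white": (100.0, 0.0, 0.0),
    "blue":  (30.0, 68.0, -112.0),
    "green": (46.0, -52.0, 49.0),
}

def delta_e_ab(c1, c2):
    """CIE76 Euclidean colour difference in Lab space."""
    return math.dist(c1, c2)

for (n1, c1), (n2, c2) in combinations(lab.items(), 2):
    d_e = delta_e_ab(c1, c2)
    print(f"{n1:5s} vs {n2:5s}: dE = {d_e:6.1f}",
          "(>30, robust)" if d_e > 30 else "(too similar)")
```

Running this gives ΔE ≈ 100 for black/white, ΔE ≈ 200 for blue/green, and all pairwise values well above 30, consistent with the palette choice.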
A total of six texture patterns, covering three texture complexities in two versions (black and white and color), are designed, as shown in Figure 1 below. Figure 1a–f correspond to the six textures, which are named in order: Texture_1_bw, Texture_1_color, Texture_2_bw, Texture_2_color, Texture_3_bw, and Texture_3_color. Texture1–3 correspond to the three complexity levels described above, while bw denotes the black and white version and color denotes the color version.

2.2. Dataset Construction

An automated simulation dataset pipeline is developed based on the Visualization Toolkit (VTK) [28]. As an open-source 3D visualization framework, VTK's object-oriented design and cross-platform characteristics support complex scene modeling and high-precision rendering. The core process includes three modules: 3D modeling, attitude control, and automatic annotation, and is implemented in a PyCharm environment using Python 3.8 and its integrated VTK library.
A. 3D modeling and texture mapping
The VTK geometric modeling class is used to build a sphere model, and its radius and spatial position are set by parameterization. In order to realize the discernability of the surface features, the designed texture is mapped to the surface of the model, and the texture coordinate transformation mechanism is used to ensure its continuity and consistency on the surface. The virtual imaging environment is configured with a simulation camera, a multi-light source system, and a physical rendering engine, where the camera parameters (focal length, field angle, and sensor size) strictly mimic real industrial inspection equipment to ensure the physical consistency of the generated images.
B. Attitude control and data generation
The Euler angle rotation sequence of the sphere around the x-/y-/z-axes is defined. To avoid angular coupling effects, an independent per-axis incremental step method is adopted with a sampling interval of 3°, generating a discrete attitude set. Each pose corresponds to a unique coordinate transformation matrix. After real-time computation by the VTK rendering pipeline, the corresponding two-dimensional projection image is output; a single projected image therefore corresponds to unique pose information. In addition, to improve the efficiency of data generation, a batch script automates the iteration, rendering, and storage of the attitude parameters.
C. Pose coding and dataset construction
An implicit encoding method for the pose parameters, based on the file name, is proposed. The Euler angles (pitch, yaw, and roll) are embedded in the image file name in the format "X_Y_Z.png", avoiding the data management redundancy caused by separate annotation files. A parsing script converts the angle values in the file name into a three-dimensional vector, which serves as the ground truth label for network training. The final dataset contains image-pose pairs that support end-to-end deep learning model training. Some of the datasets generated under different textures are shown in Figure 2 below.
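The Python sketch below outlines how such a VTK pipeline (modules A–C) can be assembled; it is an illustration under stated assumptions, not the authors' exact script. The pose range, image size, and sphere tessellation are ours, and the helper names (render_textured_sphere, parse_pose) are hypothetical.

```python
import itertools
import os
import vtk

def render_textured_sphere(texture_png, out_dir, step_deg=3, size=(224, 224)):
    """Render one image per (rx, ry, rz) pose and encode the pose in the file name."""
    # A. Sphere source with built-in texture coordinates, plus texture mapping
    sphere = vtk.vtkTexturedSphereSource()
    sphere.SetRadius(1.0)
    sphere.SetThetaResolution(120)
    sphere.SetPhiResolution(120)

    reader = vtk.vtkPNGReader()
    reader.SetFileName(texture_png)
    texture = vtk.vtkTexture()
    texture.SetInputConnection(reader.GetOutputPort())
    texture.InterpolateOn()

    mapper = vtk.vtkPolyDataMapper()
    mapper.SetInputConnection(sphere.GetOutputPort())
    actor = vtk.vtkActor()
    actor.SetMapper(mapper)
    actor.SetTexture(texture)

    renderer = vtk.vtkRenderer()
    renderer.AddActor(actor)
    renderer.SetBackground(0.0, 0.0, 0.0)
    window = vtk.vtkRenderWindow()
    window.SetOffScreenRendering(1)
    window.AddRenderer(renderer)
    window.SetSize(*size)

    # B. Attitude control: iterate Euler angles and render each pose
    writer = vtk.vtkPNGWriter()
    # Illustrative pose range only; the paper samples the orientation space at 3° intervals.
    for rx, ry, rz in itertools.product(range(0, 30, step_deg), repeat=3):
        actor.SetOrientation(rx, ry, rz)      # Euler angles (degrees) about x/y/z
        window.Render()
        grabber = vtk.vtkWindowToImageFilter()
        grabber.SetInput(window)
        grabber.Update()
        writer.SetInputConnection(grabber.GetOutputPort())
        # C. Implicit "X_Y_Z.png" annotation: the pose is the file name
        writer.SetFileName(os.path.join(out_dir, f"{rx}_{ry}_{rz}.png"))
        writer.Write()

def parse_pose(path):
    """Recover the ground-truth Euler angles from an 'X_Y_Z.png' file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return tuple(float(v) for v in stem.split("_"))
```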

3. Simulations and Design Criterion

In this section, a series of simulations is conducted to verify the method's effectiveness. Firstly, the simulation image training sets generated by the differently textured particles in the previous section are utilized; a CNN model is trained and its error on the validation set is evaluated. Secondly, the error on the test set is examined through visual and quantitative analyses. By comparing the performance of the different texture datasets, a set of design rules for spherical particle texture is defined, and the final texture pattern is determined.
MobileNet is chosen as the model for this experiment [29], and a composite loss function combining the mean absolute error (MAE) and the Pseudo-Huber loss is designed. For all training in this section, 1600 training images, 597 validation images, and 100 test images are used. AdamW is chosen as the optimizer, with a learning rate of 0.0005 scheduled by a cosine annealing strategy. Each model is trained for 20 epochs. The expression of the loss function is as follows:
Loss(y, ŷ) = (1/n) Σ_{i=1}^{n} { (1/2)a², if |a| ≤ δ;  δ²(√(1 + (a/δ)²) − 1), if |a| > δ }
where n is the number of samples; a = y − ŷ; and δ = 1° in this example.
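A minimal PyTorch sketch of this training setup is given below. It is an illustration rather than the authors' code: torchvision's MobileNetV3-Small stands in for the MobileNet backbone, the quadratic branch of the loss is a reconstruction of the formula above, and the DataLoader wiring is assumed.

```python
import torch
import torch.nn as nn
import torchvision

class PseudoHuberPoseLoss(nn.Module):
    """Reconstruction of the piecewise loss above: quadratic for small
    angular errors, Pseudo-Huber for large ones (delta = 1 degree)."""
    def __init__(self, delta: float = 1.0):
        super().__init__()
        self.delta = delta

    def forward(self, pred, target):
        a = pred - target                                   # per-axis angular error (degrees)
        quad = 0.5 * a ** 2
        pseudo_huber = self.delta ** 2 * (torch.sqrt(1 + (a / self.delta) ** 2) - 1)
        return torch.where(a.abs() <= self.delta, quad, pseudo_huber).mean()

# 3-value regression head for (pitch, yaw, roll); MobileNetV3-Small is a stand-in backbone.
model = torchvision.models.mobilenet_v3_small(num_classes=3)
criterion = PseudoHuberPoseLoss(delta=1.0)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

def train_one_epoch(loader):
    """One epoch over a DataLoader yielding (image, euler_angles) batches."""
    model.train()
    for images, angles in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), angles)
        loss.backward()
        optimizer.step()
    scheduler.step()
```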

3.1. Verification Set Performance Comparison

In this section, the performance of the CNN on the validation sets under different texture datasets is visually and quantitatively analyzed. The specific quantitative analysis results are shown in Table 1, and the visual analysis results are shown in Figure 3. In the table, (Err_X, Err_Y, Err_Z) represents the average angular error of the attitude angle corresponding to each coordinate axis; (Std_X, Std_Y, Std_Z) stands for standard deviation (Std); MAE stands for total mean absolute error; and RMSE stands for total root mean square error, which accounts for error magnitudes and is less prone to cancellation effects compared to the mean error metrics.
As shown in Table 1, the model's pose estimation accuracy improves progressively with increasing texture complexity. Texture1 yields the highest errors (MAE = 1.178° for black and white and 0.974° for color; RMSE = 1.469° and 1.189°), while the medium-complexity Texture2 reduces the MAE to 0.842° and 0.791° (RMSE = 1.078° and 1.037°), respectively. Texture3, composed of complex textures, performs best, with an MAE of 0.492° for the black and white type and 0.416° for the color type and RMSEs of 0.661° and 0.543°, a 64.8% reduction relative to Texture1. Notably, the RMSE values follow the same decreasing trend as the MAE but emphasize a greater reduction in error magnitude for the high-complexity textures. This trend suggests that texture feature complexity is highly correlated with the model's pose estimation ability.
Color textures consistently outperform their black and white counterparts across all the complexity levels, with a lower MAE, RMSE, and Std, suggesting color information strengthens feature discriminability. However, in low-complexity scenarios (Texture_1_color), the z-axis errors increase significantly, implying potential noise interference from color in simple patterns. Figure 3 further reveals that a higher texture complexity concentrates error distributions in low-value regions, particularly for color textures, with improved consistency across all axes.
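As a point of reference for how the table entries can be computed, the snippet below implements one plausible aggregation (per-axis mean absolute error and Std, with MAE and RMSE pooled over all axes); the authors' exact aggregation may differ slightly.

```python
import numpy as np

def pose_error_metrics(pred_deg: np.ndarray, true_deg: np.ndarray) -> dict:
    """Per-axis error statistics plus pooled MAE/RMSE for (N, 3) arrays of
    (pitch, yaw, roll) predictions and ground truth, in degrees."""
    err = np.abs(pred_deg - true_deg)               # absolute angular error per axis
    return {
        "Err_XYZ": err.mean(axis=0),                # (Err_X, Err_Y, Err_Z)
        "Std_XYZ": err.std(axis=0),                 # (Std_X, Std_Y, Std_Z)
        "MAE": float(err.mean()),                   # total mean absolute error
        "RMSE": float(np.sqrt((err ** 2).mean())),  # total root mean square error
    }
```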

3.2. Test Set Performance Comparison

In this section, the performance on the 100 test set images generated for each texture is analyzed and compared. The trained model is used to estimate the attitude of the test set, and the error distribution is visualized. The following visual analysis diagrams are drawn: a true-versus-predicted scatter plot, an error line plot, and an error frequency histogram. In the scatter plot, the more concentrated the data are on the line x = y, the more accurate the attitude prediction; otherwise, the larger the deviation. The line chart clearly and intuitively pictures the specific error distribution trend, while the histogram provides a statistical analysis of the error distribution on each axis. The detailed analysis is shown in Figure 4 below.
In the test set, the color version of Texture1 underperforms its black and white counterpart, as seen in Figure 4a,b. Despite a lower MAE on the validation set, the color version exhibits a significant z-axis error deviation (around 2°) in the test set, highlighting the potential negative impact of color information on simple textures. Figure 4c,d demonstrate improved overall performance with more complex textures, with more accurate predictions on the test set; here, the color information enhances prediction performance. While Figure 4c shows a larger z-axis error, Figure 4d presents a z-axis error distribution similar to that of the other axes. Figure 4e,f reveal the best model performance on the test set with the most complex textures. The model predictions closely match the actual values, with error distributions within nearly 1° on each axis. Notably, Figure 4f shows a significant reduction in the z-axis error deviation with color information, resulting in a more uniform error distribution across all axes.
These results indicate that color information has a more positive effect on feature extraction and overall model performance, particularly with complex textures.

3.3. Results, Discussion, and Design Criterion

In the test set performance evaluation, the MAE, RMSE, and Std of each texture on the test set are recorded, and the texture characteristics and validation set performance are summarized in Table 2 below.
The experimental results demonstrate a strong positive correlation between texture complexity and pose estimation accuracy. The validation set performance progressively improves with increasing complexity, and the high-complexity color textures (Texture_3_color) achieve the optimal results (MAE: 0.416°, RMSE: 0.543°, and per-axis Std ≤ 0.549°). Notably, the grayscale and color variants exhibit parallel trends, with MAE reductions of 58.2% and 57.3%, respectively, from Texture_1 to Texture_3, underscoring complexity’s universal benefit across the color modalities.
However, the test set analyses reveal critical nuances in model generalization. While color enhances the validation accuracy universally, its real-world impact proves complexity-dependent: low-complexity color textures (Texture_1_color) suffer a 35.5% test error increase and a 9.4% RMSE degradation (1.411° to 1.543°) over their grayscale counterparts, suggesting that chromatic noise dominates when structural features are sparse. Conversely, high-complexity color textures (Texture_3_color) maintain superior test performance (MAE: 0.731°, RMSE: 0.876°), with the color-over-grayscale RMSE advantage persisting (0.876° vs. 0.911°) despite the domain shift. This duality establishes texture complexity as a prerequisite for effective color utilization in pose estimation systems.
Based on the analysis results of the above experimental data and the previous theories, a design criterion for the surface texture of spherical particles for attitude estimation is finally established:
(1) Orientation uniqueness: it should be ensured that each view corresponds to a unique orientation, so that the model can distinguish between different poses;
(2) Proper proportion distribution: the proportions of the texture areas and blank areas should be appropriate, and the pixel ratio should be close to 1:1;
(3) Stripe spacing control: the stripe spacing should be moderate, and the annular ratio of the stripe spacing should be greater than or equal to 5%;
(4) Complex texture design: the texture design should include complex texture parts with obvious features to strengthen the features;
(5) Color complementarity: a CIE-Lab color difference Δ E > 30 color texture combination of black, white, green, and blue should be used to enhance the color information.
Finally, the color texture with a high texture complexity is chosen as the surface texture of the spherical particles.

4. Experiments

To verify the accuracy of the texture design, the chosen texture is attached to the simulated particle model, and physical spherical particles are 3D-printed. A real machine vision system is then built using an industrial CMOS camera, a triaxial angular displacement table, and a personal computer. Figure 5 shows a photo of the entire system setup.
In the experiments, the pose of the spherical particle is changed using the angular displacement table. The camera collects the 2D projection image corresponding to each 3D pose and transmits it to the computer. The obtained images are preprocessed so that they can be fed to the neural network; in this experiment, the network is still the MobileNet model trained on the chosen texture.
The above system is used to collect 40 real images with different attitude angles, and, after processing the images, the MobileNet network is used to estimate the actual attitude. The error between the estimated result and the actual attitude is analyzed statistically. The detailed analysis is shown in Table 3 below, and a more specific visualization is shown in Figure 6 below.
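A minimal inference sketch for a single captured frame is shown below; the resizing step, the example file name, and the reference angles are assumptions for illustration only.

```python
import torch
from PIL import Image
from torchvision import transforms

# Illustrative preprocessing of the captured frames; the resize target is assumed.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

@torch.no_grad()
def estimate_pose(model, image_path):
    """Run the trained network on one captured frame; returns (pitch, yaw, roll) in degrees."""
    model.eval()
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)     # shape (1, 3, 224, 224)
    return model(batch).squeeze(0).tolist()

# Hypothetical usage against a frame with known table angles (10°, 5°, 0°):
# pred = estimate_pose(model, "capture_012.png")
# abs_err = [abs(p - t) for p, t in zip(pred, (10.0, 5.0, 0.0))]
```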
As Table 3 shows, the model trained with the virtual dataset still achieves a low MAE (2.7–3.3°) in practical application, which suggests good prospects for real-world use. In the field of attitude estimation for symmetric objects such as spheres, Zimmermann et al. obtained a matching error of about 2° and a weighted error of 3° by matching synthetic textures with stereo vision [16]. Mathai et al.'s method, based on optimized surface patterns, achieved an MAE of about 4° in a controlled environment with SNR = 2 [17]. Song et al. extended template matching to 6-DoF pose detection of complex parts by constructing a multi-view template library offline from CAD models, achieving position errors < 2 mm and pose errors of ≈3° [30]. In contrast, the texture optimization framework proposed in this paper achieves a lower average error (2.7–3.3°) in real scenes, indicating its effectiveness. However, the deviation along the y-axis is relatively large, with a Std of 3.718°, an RMSE of 4.955°, and a maximum prediction error of 15°, indicating that the model's predictions in this direction are less stable. The box plot of test errors in Figure 6 further reflects the error distribution on each coordinate axis. The median error of each axis is close, around 2–3°, and the upper edge of each box lies almost within 4°, showing that 75% of the predictions are fairly accurate and that the method is feasible. However, there are several outliers on the x-axis and y-axis, indicating that the prediction performance of the current model is not yet fully stable and deviations can occur. Further research is needed to understand and mitigate these anomalies for more robust real-world performance.

5. Discussion and Conclusions

The development of robust 3D pose estimation systems for symmetric objects, such as spheres, presents a crucial advancement in industrial automation, robotic manipulation, precision metrology [31,32], and the biomedical field [33]. Existing methods have explored the use of printed texture patterns on spheres for attitude determination, yet these techniques often struggle with limited generalization due to texture sensitivity and the trade-off between pattern complexity and computational demands. This work builds upon these existing efforts by addressing the limitations of traditional approaches through a texture-optimized design tailored for a synthetic dataset. Our results demonstrate that a high-complexity texture design, incorporating both multi-scale directional patterns and high chromatic contrast, leads to significant performance improvements. This enables substantial reductions in the MAE compared to low-complexity textures, thus directly tackling the dual bottlenecks of data scarcity and rotational ambiguity inherent in spherical object localization. This enhancement aligns with Tamura’s texture theory, where multi-scale directional features and chromatic contrast improve feature discriminability, enabling robust pose prediction under varying viewpoints.
Theoretically, this work establishes a novel paradigm for texture-driven synthetic data generation. The observed correlation between texture complexity and model accuracy highlights the importance of the feature information provided by the texture feature design. The dual role of color—enhancing the validation accuracy while introducing noise in low-complexity scenarios—provides new insights into color–texture interactions. Here are the three key findings from this study:
  • Texture complexity dominance: high-complexity color textures (Texture_3_color) achieved the optimal accuracy, reducing errors by 64.8% compared to low-complexity designs.
  • Color–texture synergy: color enhanced performance in complex textures (with the test MAE achieving 0.731° and RMSE achieving 0.876°) but degraded the low-complexity results, emphasizing complexity as a prerequisite for effective color utilization.
  • Real-world generalization: the physical tests confirmed practical feasibility, with the average attitude error measured by the real system around 3° and 75% of the test errors below 4°, supporting the use of 2D images to train a network for 3D attitude estimation.
These results provide a foundation for texture-driven synthetic data systems with applications in industrial detection and the related applications of target attitude estimation and motion analysis.
This study is subject to two key limitations. First, the simulation framework assumes ideal material–light interactions, which may not fully capture real-world scenarios with reflective or translucent surfaces. Second, the rotational symmetries in spherical objects introduce inherent ambiguity in pose estimation. Due to the rotational symmetry of a sphere, an insufficient texture design can result in the object’s appearance after rotation being indistinguishable from its original state, negatively impacting the measurement accuracy. This is particularly pronounced under extreme lighting variations, where cameras struggle to capture subtle differences in surface textures (such as color gradients or tiny marks). This difficulty further weakens the system’s ability to differentiate between various rotation angles. Future work should integrate dynamic lighting models, like ray tracing and material-aware texture mapping, for more realistic simulations. To overcome rotational ambiguity, research should focus on designing more distinctive textures with invariant features. Furthermore, exploring sensor fusion with IMUs and incorporating prior knowledge of object motion could enhance pose estimation robustness. Moreover, future work should prioritize the further optimization of the feature extraction capabilities and generalization performance of datasets and models to achieve better pose estimation accuracy.

Author Contributions

Conceptualization, Y.S. and M.K.; Methodology, Y.S. and M.K.; Software, Y.S.; Validation, Y.S.; Formal analysis, M.K., H.Y. and L.L.; Investigation, Y.S.; Resources, M.K.; Data curation, L.L.; Writing—original draft, Y.S.; Writing—review & editing, H.Y. and L.L.; Visualization, H.Y. and L.L.; Supervision, H.Y. and L.L.; Project administration, M.K. and L.L.; Funding acquisition, M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MAE	Mean absolute error
Std	Standard deviation
RMSE	Root mean square error

References

  1. Zhou, S.; Cao, W.; Wang, Q.; Zhou, M.; Zheng, X.; Lou, J.; Chen, Y. KMFDSST Algorithm-Based Rotor Attitude Estimation for a Spherical Motor. IEEE Trans. Ind. Inform. 2023, 20, 4463–4472. [Google Scholar] [CrossRef]
  2. Hansen, J.G.; de Figueiredo, R.P. Active Object Detection and Tracking Using Gimbal Mechanisms for Autonomous Drone Applications. Drones 2024, 8, 55. [Google Scholar] [CrossRef]
  3. Zhou, Z.; Zeng, C.; Tian, X.; Zeng, Q.; Yao, R. A Discrete Quaternion Particle Filter Based on Deterministic Sampling for IMU Attitude Estimation. IEEE Sens. J. 2021, 21, 23266–23277. [Google Scholar] [CrossRef]
  4. Hao, D.; Zhang, G.; Zhao, H.; Ding, H. A Combined Calibration Method for Workpiece Positioning in Robotic Machining Systems and a Hybrid Optimization Algorithm for Improving Tool Center Point Calibration Accuracy. Appl. Sci. 2025, 15, 1033. [Google Scholar] [CrossRef]
  5. Jiang, J.; Xia, N.; Yu, X. A feature matching and compensation method based on importance weighting for occluded human pose estimation. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102061. [Google Scholar] [CrossRef]
  6. Nadeem, U.; Bennamoun, M.; Togneri, R.; Sohel, F.; Rekavandi, A.M.; Boussaid, F. Cross domain 2D-3D descriptor matching for unconstrained 6-DOF pose estimation. Pattern Recognit. 2023, 142, 109655. [Google Scholar] [CrossRef]
  7. Yu, X.; Zhuang, Z.; Koniusz, P.; Li, H. 6dof object pose estimation via differentiable proxy voting loss. arXiv 2020, arXiv:2002.03923. [Google Scholar] [CrossRef]
  8. Hou, H.; Xu, Q.; Lan, C.; Lu, W.; Zhang, Y.; Cui, Z.; Qin, J. UAV Pose Estimation in GNSS-Denied Environment Assisted by Satellite Imagery Deep Learning Features. IEEE Access 2020, 9, 6358–6367. [Google Scholar] [CrossRef]
  9. Bogaart, M.V.D.; Jacobs, N.; Hallemans, A.; Meyns, P. Validity of Deep Learning-Based Motion Capture Using DeepLabCut to Assess Proprioception in Children. Appl. Sci. 2025, 15, 3428. [Google Scholar] [CrossRef]
  10. Park, S.; Jeong, W.-J.; Manawadu, M.; Park, S.-Y. 6-DoF Pose Estimation from Single RGB Image and CAD Model Retrieval Using Feature Similarity Measurement. Appl. Sci. 2025, 15, 1501. [Google Scholar] [CrossRef]
  11. Kubicki, B.; Janowski, A.; Inglot, A. Multimodal Augmented Reality System for Real-Time Roof Type Recognition and Visualization on Mobile Devices. Appl. Sci. 2025, 15, 1330. [Google Scholar] [CrossRef]
  12. Hodaň, T.; Sundermeyer, M.; Drost, B.; Labbé, Y.; Brachmann, E.; Michel, F.; Rother, C.; Matas, J. BOP challenge 2020 on 6D object localization. In Proceedings of the Computer Vision–ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part II 16. Springer International Publishing: Cham, Switzerland, 2020; pp. 577–594. [Google Scholar] [CrossRef]
  13. Peng, S.; Liu, Y.; Huang, Q.; Zhou, X.; Bao, H. PVNet: Pixel-wise voting network for 6DoF pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4561–4570. [Google Scholar]
  14. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  15. Asher, J.M.; Hibbard, P.B.; Webb, A.L. Perceived intrinsic 3D shape of faces is robust to changes in lighting direction, image rotation and polarity inversion. Vis. Res. 2024, 227, 108535. [Google Scholar] [CrossRef] [PubMed]
  16. Zimmermann, R.; Gasteuil, Y.; Bourgoin, M.; Volk, R.; Pumir, A.; Pinton, J.-F.; International Collaboration for Turbulence Research. Tracking the dynamics of translation and absolute orientation of a sphere in a turbulent flow. Rev. Sci. Instrum. 2011, 82, 033906. [Google Scholar] [CrossRef] [PubMed]
  17. Mathai, V.; Neut, M.W.M.; van der Poel, E.P.; Sun, C. Translational and rotational dynamics of a large buoyant sphere in turbulence. Exp. Fluids 2016, 57, 51. [Google Scholar] [CrossRef]
  18. Will, J.B.; Krug, D. Dynamics of freely rising spheres: The effect of moment of inertia. J. Fluid Mech. 2021, 927, A7. [Google Scholar] [CrossRef]
  19. Zhang, K.; Wang, W.; Cui, Y.; Lv, Z.; Fan, Y.; Zhao, X. Deep learning-based estimation of ash content in coal: Unveiling the contributions of color and texture features. Measurement 2024, 233, 114632. [Google Scholar] [CrossRef]
  20. Wang, Z.; Zhang, X. Contextual recovery network for low-light image enhancement with texture recovery. J. Vis. Commun. Image Represent. 2024, 99, 104050. [Google Scholar] [CrossRef]
  21. Tamura, H.; Mori, S.; Yamawaki, T. Textural Features Corresponding to Visual Perception. IEEE Trans. Syst. Man, Cybern. 1978, 8, 460–473. [Google Scholar] [CrossRef]
  22. Dzierżak, R. Impact of Texture Feature Count on the Accuracy of Osteoporotic Change Detection in Computed Tomography Images of Trabecular Bone Tissue. Appl. Sci. 2025, 15, 1528. [Google Scholar] [CrossRef]
  23. Trevisani, S.; Guth, P.L. Terrain Analysis According to Multiscale Surface Roughness in the Taklimakan Desert. Land 2024, 13, 1843. [Google Scholar] [CrossRef]
  24. He, T.; Zhong, Y.; Isenberg, P.; Isenberg, T. Design Characterization for Black-and-White Textures in Visualization. IEEE Trans. Vis. Comput. Graph. 2023, 30, 1019–1029. [Google Scholar] [CrossRef] [PubMed]
  25. Goodman, J.W. Introduction to Fourier Optics; Roberts and Company Publishers: Colorado, CO, USA, 2005. [Google Scholar]
  26. Zhang, S. High-speed 3D shape measurement with structured light methods: A review. Opt. Lasers Eng. 2018, 106, 119–131. [Google Scholar] [CrossRef]
  27. Luo, M.R.; Cui, G.; Rigg, B. The development of the CIE 2000 colour-difference formula: CIEDE2000. Color Res. Appl. 2001, 26, 340–350. [Google Scholar] [CrossRef]
  28. Schroeder, W.; Martin, K.M.; Lorensen, W.E. The Visualization Toolkit an Object-Oriented Approach to 3D Graphics; Prentice-Hall, Inc.: Englewood Cliffs, NJ, USA, 1998; pp. 10–52. [Google Scholar]
  29. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  30. Song, W.; Guo, C.; Shen, L.; Zhang, Y. 3D pose measurement for industrial parts with complex shape by monocular vision. In Proceedings of the SPIE 10827, Sixth International Conference on Optical and Photonic Engineering (icOPEN 2018), Shanghai, China, 8–11 May 2018; p. 1082712. [Google Scholar] [CrossRef]
  31. Balntas, V.; Doumanoglou, A.; Sahin, C.; Sock, J.; Kouskouridas, R.; Kim, T.K. Pose Guided RGBD Feature Learning for 3D Object Pose Estimation. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar] [CrossRef]
  32. Cui, Y.; Hildenbrand, D. Pose estimation based on Geometric Algebra. GraVisMa 2009, 73, 7. Available online: https://www.researchgate.net/publication/286141050_Pose_estimation_based_on_Geometric_Algebra (accessed on 2 January 2025).
  33. Ci, J.; Wang, X.; Rapado-Rincón, D.; Burusa, A.K.; Kootstra, G. 3D pose estimation of tomato peduncle nodes using deep keypoint detection and point cloud. Biosyst. Eng. 2024, 243, 57–69. [Google Scholar] [CrossRef]
Figure 1. (a,b) The black and white and color types with low texture complexity; (c,d) the black and white and color types with medium texture complexity; (e,f) the black and white and color types with high texture complexity.
Figure 2. Partial datasets generated by different textures. (a–f) correspond to Texture_1_bw through Texture_3_color.
Figure 3. (a–f) Distribution of the validation error and Std under different complexities of black and white/color textures.
Figure 4. (a–f) True-versus-predicted scatter plots, error line plots, and error frequency histograms under different complexities of black and white/color textures.
Figure 5. The machine vision system.
Figure 6. Box plot of test image error.
Table 1. Validation set performance index under different textures.

Texture           Err_X    Err_Y    Err_Z    Std_X    Std_Y    Std_Z    MAE      RMSE
Texture_1_bw      1.7485   0.663    1.121    0.706    0.529    0.873    1.178    1.469
Texture_1_color   0.769    0.654    1.499    0.493    0.454    1.534    0.974    1.189
Texture_2_bw      0.764    0.663    1.097    0.539    0.540    0.873    0.842    1.078
Texture_2_color   0.710    0.658    1.004    0.568    0.874    0.657    0.791    1.037
Texture_3_bw      0.305    0.654    0.517    0.246    0.449    0.426    0.492    0.661
Texture_3_color   0.284    0.367    0.596    0.211    0.242    0.466    0.416    0.543

All indicators are measured in degrees (°).
Table 2. Summary of test data for different textures.

Texture           Description                           Val_MAE   Val_RMSE   Test_MAE   Test_RMSE   Test_Std (X, Y, Z)
Texture_1_bw      Black and white; low complexity       1.178     1.469      1.052      1.411       0.997, 0.625, 0.916
Texture_1_color   Color; low complexity                 0.974     1.189      1.32       1.543       0.649, 0.48, 0.632
Texture_2_bw      Black and white; medium complexity    0.842     1.078      1.039      1.335       0.6, 0.639, 1.059
Texture_2_color   Color; medium complexity              0.791     1.037      1.008      1.237       0.659, 0.737, 0.762
Texture_3_bw      Black and white; high complexity      0.492     0.661      0.758      0.911       0.327, 0.665, 0.483
Texture_3_color   Color; high complexity                0.416     0.543      0.731      0.876       0.421, 0.422, 0.549

All indicators are measured in degrees (°).
Table 3. Error analysis of test data.

Parameter   Mean Error   Std      RMSE     Maximum
X-axis      2.717        2.34     3.585    11.843
Y-axis      3.275        3.718    4.955    15.273
Z-axis      3.223        2.031    3.810    8.511
