Article

Evaluation Model of Holographic Communication Experience in the 6G Era Based on Light Field Display

1 School of Digital Media and Design Art, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 Modern Post College, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2023, 13(22), 12381; https://doi.org/10.3390/app132212381
Submission received: 9 October 2023 / Revised: 3 November 2023 / Accepted: 9 November 2023 / Published: 16 November 2023

Abstract

Holographic communication is considered one of the typical scenarios of the 6G era, and studies suggest that the light field display is the most effective naked-eye 3D display method for it. Nevertheless, many issues remain worthy of study. Since no experience-evaluation standards for holographic communication have yet been proposed worldwide, much related research remains at the level of design and vision. To truly realize the holographic communication scenario, light field display technology must be evaluated systematically. The level of user experience determines the value of the holographic communication scenario, which requires quantifying the user's experience level and mapping it to the technical parameters of the light field display image. However, there is still room for improvement in related research. This paper proposes a model based on semi-supervised learning, which takes light field image data of various scenes as input and outputs three experience scores, comfort, space, and realism, to complete the subjective experience evaluation of light field images. Compared with evaluation methods that focus on the quality of the image itself, this article focuses more on the effect on human experience. Compared with existing work, this paper makes improvements in two respects: feature engineering and training strategy. In terms of feature selection, a convolutional neural network is used to extract image content features, and an image quality parameter-extraction module is used to extract image property features; the two are spliced together as the input of the classifier. In terms of the training strategy, pseudo-labels and dynamic thresholds are used for training. The final experimental results show that on the MPI-LFA data set, the classification accuracy is 80.21% in the comfort dimension, 83.12% in the spatial dimension, and 81.88% in the realism dimension.

1. Introduction

A survey shows that, building on 5G, 6G will have a profound impact on the intelligent development of communications. The prospects, core technologies, scenarios, challenges, and related issues of 6G have aroused extensive discussion around the world [1]. At present, there is a worldwide consensus on five typical application scenarios of 6G: immersive communication, ultra-large-scale connection, extremely reliable and critical communication, communication-sensing fusion, and the integration of artificial intelligence and communication. Among these, immersive communication ranks first due to its unique 6G characteristics. In terms of business use cases, digital twins and holographic immersive applications will become core businesses of 6G, and holographic communication, the killer application among them, is especially important.
As the topic of the metaverse becomes more and more popular, people are also finding that the light field display is a technical method highly consistent with the concept of the metaverse; some even call the light field display the "entrance to the metaverse". The light field can be regarded as "discrete", "digital" holography: as its angular resolution and viewpoint resolution continue to improve, the display effect of the light field will continue to approach holographic display [2]. With the development of various 3D display technologies [3,4,5,6,7,8,9,10,11,12,13], the light field has gradually become an ideal 3D display technology. Currently, 6G research has evolved from vision to implementation [14], yet the evaluation of the implementation effect still depends on people's experience. Lyu et al. [15] proposed experience indicators for holographic communication and stipulated three dimensions of experience. Mapping from the original light field image to the experience level will benefit the evaluation of experience, the value judgment of the scene, and the calculation of communication performance requirements; however, research in this area is not yet complete. Therefore, this paper proposes a model based on semi-supervised learning, which takes light field image data of multiple scenes as input and uses three experience scores as output to complete the subjective experience evaluation of light field images. The innovation of this article is a deep learning model designed to map light field images to the level of user experience. Compared with the traditional approach, the feature engineering combines light field image content features, two-dimensional plane features, and light field features. Since this article uses semi-supervised learning, a dynamic-threshold method for demarcating pseudo-labels is proposed in the learning strategy to accelerate convergence.
Section 2 introduces the related work of this paper. Section 3 introduces the proposed model in terms of two aspects, the feature dimension and the training strategy, and describes the data set source and experimental environment. Section 4 presents the experimental performance and result analysis of the proposed model on the data set. Section 5 summarizes the work of this paper and proposes possible future directions.

2. Related Work

Image quality assessment (IQA) [16,17] is a fundamental problem in image processing and computer-vision-related fields. Its development has evolved along three lines: from early image quality databases containing only a small number of images and a single distortion type to current databases with a rich variety of distortion types and large numbers of images; from the early combination of feature engineering and traditional machine learning algorithms to current end-to-end deep learning models; and from the relatively uniform evaluation indicators of the early days to today's diversified, application-related evaluation indicators. The most mainstream method currently is the end-to-end deep learning method [18].
The image-feature-extraction model and mapping model are obtained through end-to-end learning. The core of this type of model lies in the convolutional neural network (CNN). Kang et al. [19] were among the first to introduce the CNN into the design of NR-IQA (no-reference IQA) models and proposed the CNNIQA (convolutional neural network for image quality assessment) model, which contains only one convolutional layer (CL), one pooling layer, and two fully connected layers (FCLs). Kim et al. [20] proposed an NR-IQA model called DIQA (deep image quality assessment), which is trained in two stages: the first stage trains a CNN that learns objective error maps; the second stage uses human subjective scores to fine-tune the CNN model. Yang et al. [21] proposed an NR-IQA model based on a dual-branch CNN structure, in which the two branches process the input image and the corresponding gradient image, respectively; the feature vectors from the two branches are then fused and, after FCLs, the output is the image quality score. Zhang et al. [22] proposed a bilinear NR-IQA model (deep bilinear convolutional neural network, DBCNN), which is suitable for quality prediction on both synthetically and realistically distorted images. Its feature-extraction module contains two branches, shallow CLs and a VGG16 (Visual Geometry Group) model [23]; it uses bilinear fusion to fuse the output feature maps of the two branches and then performs quality prediction through FCLs.
The above models all directly learn the mapping relationship between images and subjective scores, which is the same type of problem addressed in this article. The main research problem this article hopes to solve is the mapping relationship between the light field image data transmitted by holographic communication and the user experience level, which is essentially a deep-learning-based classification problem on light field image data. Because the acquisition cost of subjective data labels is very high, a semi-supervised learning strategy is used to complete the research.
The main methods of semi-supervised learning (SSL) can be divided into two categories: the pseudo-labeling method and the consistency regularization method. Pseudo-labeling refers to the method of using the model itself to obtain labels for unlabeled data. Specifically, the softmax probability distribution output by the model is regarded as a soft pseudo-label, and the prediction obtained by the argmax or one-hot is regarded as a hard pseudo-label. These pseudo-labels are utilized as supervised losses to further train the model. Consistency regularization is an important part of many SSL algorithms. The idea of consistency regularization is that the classifier should output the same class distribution probability for unlabeled samples even after noise is injected into them. That is, it is enforced that an unlabeled sample should be classified into the same category as its own enhanced sample [24].
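As a minimal illustration of the hard pseudo-label step described above, the following PyTorch sketch converts classifier outputs on unlabeled data into confident hard labels; the function name and threshold value are illustrative, not taken from any specific paper.

```python
import torch
import torch.nn.functional as F

def make_pseudo_labels(logits: torch.Tensor, threshold: float = 0.9):
    """Soft pseudo-labels are the softmax distribution; hard pseudo-labels
    come from argmax. Only predictions above the confidence threshold are
    kept for the supervised loss on unlabeled data."""
    soft = F.softmax(logits, dim=1)   # soft pseudo-labels
    conf, hard = soft.max(dim=1)      # confidence and hard pseudo-labels
    mask = conf > threshold           # keep only confident samples
    return hard[mask], mask
```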
The problem faced here is that different categories of image data differ only in image quality, while differences in image content are very small. Consistency regularization relies on augmenting a sample, but in this task the augmentation perturbs exactly the quality attributes that distinguish the categories, so an augmented sample may in fact belong to a different category; enforcing consistent predictions would then mislabel it. This conflicts with the task of this article. Therefore, this article abandons consistency regularization and adopts the pseudo-label method.
The currently best-performing semi-supervised image classification models based on pseudo-label ideas are MixMatch [25], ReMixMatch [26], and UDA [27]. These three methods were all published in the past two years; the only essential difference between them lies in the processing of pseudo-label negative samples, with no obvious difference in the general training strategy. The latest method, FixMatch [28], integrates the idea of pseudo-labeling with that of consistency regularization, which reduces the complexity of the algorithm and improves accuracy.
The earliest method to apply unlabeled data in semi-supervised learning is self-training [29,30]. This method uses labeled data to build a model, predicts the unlabeled data, selects samples with high prediction confidence, adds them to the labeled data set, and continuously updates the model until convergence. Based on this idea, we propose our own model, which will be introduced in detail in the next section; a compact sketch of the classic loop is shown below.
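The sketch below is an illustrative rendering of that loop, under the assumption that training (`fit`) and probabilistic prediction (`predict_proba`) are supplied by the caller; none of the names come from the paper.

```python
def self_train(model, labeled, unlabeled, threshold, fit, predict_proba):
    """Classic self-training: repeatedly promote confident predictions on
    unlabeled data into the labeled set, then retrain."""
    while unlabeled:
        fit(model, labeled)
        confident, rest = [], []
        for x in unlabeled:
            probs = predict_proba(model, x)
            label = max(range(len(probs)), key=probs.__getitem__)
            (confident if probs[label] > threshold else rest).append((x, label))
        if not confident:                 # nothing new to add: treat as converged
            break
        labeled = labeled + confident     # promote confident predictions
        unlabeled = [x for x, _ in rest]
    return model
```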

3. Method

The main process of the method in this article is shown in Figure 1. First, light field image data suitable for the display device are constructed. Then, in a controlled test environment, the subjects watch the images through the display device and score them in the three dimensions of space, comfort, and realism based on their real feelings. Each user rating and the corresponding image form a training data pair, and the mapping relationship is learned through the deep learning model. Each step is introduced in detail below.

3.1. Data Set

First, in terms of image data, the light field images used for testing in this article come from the public MPI-LFA (Max Planck Institut Informatik Light Field Archive) database [31]. The display device used in this article's experiments is a Lookingglass, as shown on the left side of Figure 2. The images in MPI-LFA need to be cropped and spliced into the quilt array form that the Lookingglass can use; one such quilt array image is shown on the right side of Figure 2.
In terms of image content, this article selected six types of scene content, comprising three types of real scenes and three types of virtual content scenes, covering scene semantic elements such as real/virtual, distant view/close-up view, indoor/outdoor, texture, light and shadow, and people/objects. Representative diagrams of the six types of scenarios are shown in Figure 3.
In order to meet the testing requirements, we need to perform different types and levels of distortion/degradation processing on the original images. Referring to relevant research, we selected seven processing methods, each with five levels, as shown in Table 1.
So far, we have produced six types of scenes, with 100 light field images for each scene type. Each light field image was processed with 7 distortion methods at 5 levels each and retained alongside its original, giving 6 × 100 × (7 × 5 + 1) = 21,600 light field images in total as experimental data. It is worth mentioning that each light field image is an array composed of 9 × 11 2D images with a resolution of 1536 × 2048.
Next comes the test-scoring part. According to ITU standards, the rating scale for the subjective indicator system in the experiment follows the five-grade impairment scale in Recommendation ITU-R BT.2021-1 [32], "Subjective methods for the assessment of stereoscopic 3DTV systems". The display method of holographic visual materials in the experiment follows the double-stimulus impairment scale (DSIS, also known as the EBU method) in Recommendation ITU-R BT.500-14 [33], "Methodologies for the subjective assessment of the quality of television images". The scale (left) and presentation method (right) are shown in Figure 4.
The requirements for the subjects are shown in Table 2.
Based on the conclusions of existing studies [34], at least 16 subjects are needed to achieve a statistical power of 0.95, so a total of 20 subjects were recruited for this experiment; after screening, 1000 valid scoring records were collected. For each image, the mode of the scores given by the different subjects is used as its final classification label, as in the snippet below.
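As a trivial illustration of this aggregation rule (the data layout is assumed, not specified in the paper):

```python
from collections import Counter

def label_from_scores(scores: list[int]) -> int:
    """Return the most common 1-5 score among the subjects for one image."""
    return Counter(scores).most_common(1)[0][0]

assert label_from_scores([4, 5, 4, 3, 4]) == 4
```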
At this point, the production of the experimental data set has been completed, with a total of 21,600 light field images, including 1000 labeled images and 20,600 unlabeled images. The labeled images have three category labels in the three dimensions of realism, comfort, and space, with discrete integers ranging from 1 to 5.

3.2. Model Structure

Having obtained a large number of image data with different distortion types and levels across different scenes, and given the subjective disturbance and high cost of manual scoring, we decided to train the deep learning model with a semi-supervised learning strategy rather than fully supervised learning. At the same time, since there is currently no mature pre-trained model that fully matches our problem, we did not consider using transfer learning.
In the early stage, we tried the current mainstream semi-supervised image-classification algorithms, but they all had problems of not being able to adapt perfectly or having poor results. Therefore, based on the existing research, we propose improvement ideas from the two perspectives of feature engineering and training strategies.
In terms of feature engineering, a CNN is used to obtain the content features of a light field image, which can mitigate the subjective impact of the image content on users' scores. At the same time, an objective indicator-extraction module is used to extract distortion features combining image attributes, two-dimensional distortion features, and light-field-distortion features; the two sets of features are spliced together into a new feature vector that is fed into the classifier. After verification, the model achieved better classification results, as shown in Figure 5.
In the objective indicator-extraction module, we captured two groups of features of the image: two-dimensional features and light field features. The two-dimensional features comprise two evaluation results, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), while the light field features follow the practice of Zou et al. [35]: the four features they proposed were reproduced, namely spatial domain features, angle domain features, space–angle coupling relationship features, and projection domain features. The above six features are spliced into an objective indicator feature vector, as sketched below.
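A sketch of assembling this six-dimensional vector, assuming the reference and distorted central sub-aperture views are available as NumPy arrays; PSNR and SSIM come from scikit-image, while the four light field features of [35] are represented by a placeholder:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def light_field_features(lf: np.ndarray) -> np.ndarray:
    """Placeholder for the four features reproduced from Zou [35]: spatial
    domain, angle domain, space-angle coupling, and projection domain."""
    return np.zeros(4)  # stand-in values; the real features follow [35]

def objective_vector(ref: np.ndarray, dist: np.ndarray, lf: np.ndarray) -> np.ndarray:
    psnr = peak_signal_noise_ratio(ref, dist)
    ssim = structural_similarity(ref, dist, channel_axis=-1)
    return np.concatenate([[psnr, ssim], light_field_features(lf)])
```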
In terms of the training strategy, this article proposes a constrained way of assigning pseudo-labels: a sample is assigned a pseudo-label only when the probability given by the softmax classifier exceeds a certain threshold. As training proceeds, the threshold is appropriately lowered to increase convergence speed and avoid overfitting, as shown in Figure 6.
The threshold is determined by the proportion of labeled data out of the overall data. While labeled data account for less than 10% of the overall data, the threshold remains at 90%; as the proportion increases, the threshold decreases accordingly until it reaches 60%, after which it no longer decreases. The change in the threshold is given by the following formula, where t represents the ratio of labeled data to the overall data set.
$$\alpha(t) = \begin{cases} 90\%, & t < 10\% \\ 1 - t, & 10\% \le t < 40\% \\ 60\%, & \text{otherwise} \end{cases}$$
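Written out as code (a direct transcription of the formula above, with t as a fraction in [0, 1]):

```python
def alpha(t: float) -> float:
    """Dynamic pseudo-label confidence threshold as a function of the
    labeled fraction t of the overall data set."""
    if t < 0.10:
        return 0.90
    if t < 0.40:
        return 1.0 - t   # decreases linearly from 0.90 down to 0.60
    return 0.60
```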

4. Results

The experiments were run on a PC equipped with an Intel(R) Core(TM) i7 CPU and an NVIDIA 3080Ti GPU. The development environment was Python 3.10, with PyTorch 2.0.1 as the deep learning framework and auxiliary packages such as scikit-learn, NumPy, and pandas.
In our model structure, the CNN uses two convolution layers and two max-pooling layers; the numbers of convolution kernels in the convolution layers are 512 and 64. The following fully connected block contains three hidden layers with 1024, 512, and 64 units. The output activation function is softmax, the learning rate is 0.001, and the optimizer is stochastic gradient descent. A sketch of this structure follows.
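The following PyTorch sketch assembles these layers; the kernel sizes, input channels, and pooling geometry are assumptions not stated in the paper, and only the layer counts and widths, the feature splicing, the softmax output, and the SGD optimizer with learning rate 0.001 come from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExperienceClassifier(nn.Module):
    def __init__(self, num_classes: int = 5, objective_dim: int = 6):
        super().__init__()
        # CNN branch: two convolution layers with 512 and 64 kernels, each
        # followed by max pooling, as stated in the text.
        self.features = nn.Sequential(
            nn.Conv2d(3, 512, kernel_size=3, padding=1),   # 3-channel input assumed
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(512, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),                  # assumption: fixes the flatten size
        )
        # Three hidden FC layers (1024, 512, 64); the 6-dimensional objective
        # indicator vector is spliced onto the CNN features before the FCLs.
        self.classifier = nn.Sequential(
            nn.Linear(64 * 4 * 4 + objective_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),
        )

    def forward(self, image: torch.Tensor, objective: torch.Tensor) -> torch.Tensor:
        content = self.features(image).flatten(1)
        return self.classifier(torch.cat([content, objective], dim=1))

model = ExperienceClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # lr from the text
# Class probabilities for scoring/pseudo-labels: F.softmax(logits, dim=1);
# during training the softmax is folded into the cross-entropy loss.
```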
In this experiment, 200 of the 1000 labeled samples were used as a test set to verify the effect of the model. Using classification accuracy as the evaluation criterion, the performance of the final model in the three classification dimensions is shown in Table 3. The results show that, after parameter tuning, the model performs well in all three evaluation dimensions.
In addition, we conducted ablation experiments to determine the impact of each feature on the final classification accuracy; the results are shown in Table 4. Each negative percentage in the table represents how much the accuracy of the final result drops after the features of that dimension are removed.
Structural similarity is a full-reference image-quality-evaluation index: it measures the similarity of two images in terms of brightness, contrast, and structure, modeling distortion as a combination of these three factors. The peak signal-to-noise ratio is also a commonly used objective standard for evaluating image quality, based on the mean square error between the original image and the processed image. Among the light field features, the distortion of spatial sharpness is characterized by the NSS (natural scene statistics) distribution of the central sub-aperture map; in the angle domain, changes in angular consistency are represented by GLCM (gray-level co-occurrence matrix) features on macropixels; in the space–angle coupling domain, changes in the space–angle coupling relationship are characterized by GLCM features on the EPI (epipolar plane image); and in the projection domain, the distortion of each focusing layer is characterized by local entropy statistics on the refocusing map. An illustrative GLCM computation is sketched below.
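For instance, the GLCM-based features could be computed along these lines with scikit-image (an illustrative sketch, assuming an 8-bit grayscale macropixel or EPI image; the property set is a common choice, not necessarily the paper's):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(img_u8: np.ndarray) -> np.ndarray:
    """Summarize an 8-bit grayscale image by common GLCM statistics."""
    glcm = graycomatrix(img_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.array([graycoprops(glcm, p).mean() for p in props])
```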
As can be seen from Table 4, the two-dimensional features (peak signal-to-noise ratio and structural similarity) have little impact on the final experience, while the image content features extracted by the CNN module have a greater impact, and the influence of the four light field features is also obvious. Compared with the other two evaluation dimensions, realism is affected more comprehensively by all of the features; the spatial dimension is significantly affected by the image content itself and the light field features; and comfort is more sensitive to the image content and the two-dimensional features.

5. Discussion

In this work, we propose a model that directly maps raw light field images to subjective experience levels, with innovations in two aspects: feature engineering and the training strategy. At the same time, semi-supervised learning is applied to greatly reduce labeling costs.
We believe that the arrival of the 6G era will realize the vision of holographic communication. However, the current light-field-display method still has flaws in terms of experience and cannot achieve the holographic display effect of science fiction movies that people desire. At the same time, compared with traditional 2D images, the data volume transmitted for light field images is many times greater, which is also a severe test for the communication network. Based on the results of this article, we can further complete the inference from experience level to image quality and then calculate the corresponding performance requirements of the communication network.

6. Conclusions

Currently, 6G research is widely discussed around the world. As one of the typical scenarios of 6G, holographic communication's final implementation effect still depends on the scene effect, that is, the human/user experience level. However, research in this area is currently lacking, and the quantification of experience levels is imperfect, which leaves scenario research at the level of design and vision. This article attempts to address this gap: it proposes a mapping model based on semi-supervised learning to connect the light field image with the user's experience score.
The final results show that in the three descriptive dimensions of realism, comfort, and space, the classification accuracy of the model exceeds 80%, which verifies that the quality of the light field image is indeed related to the user experience. Regarding the influence of the individual features, we found that the image content features produced by the CNN module have the most obvious impact on the final result. This is because the subjects are inevitably affected by the image content when scoring, which also reminds us that a more accurate evaluation of experience effects requires classifying and discussing the scenarios; for example, some complex scenes give people a stronger sense of space. We also found that the influence of the four light field features is relatively obvious, which confirms that the quality assessment of light field images and that of traditional 2D images affect the experience differently. Whether viewed from the spatial domain or the angle domain, the introduction of light field image features significantly improves the final evaluation of the experience effect.
Nonetheless, there are some areas for improvement in this work. For example, previous research shows that the number of viewpoints in a light field is an important indicator affecting experience; however, the Lookingglass display has a fixed resolution and number of viewpoints, so the number of viewpoints cannot be further increased. We could conduct simulation experiments by deliberately reducing the number of viewpoints, but too few viewpoints have no value in practical applications, so this was not considered. For another example, the training method proposed in this article dynamically lowers the threshold as training progresses, but there may well be a more appropriate and efficient threshold-control method, which is one of the directions to be explored next. Also, when extracting content features with the CNN, the large scale of the light field image brings a huge amount of computation to the convolution process; however, many studies have shown that replacing the entire light field with central sub-aperture images or other representative views also gives good results. This is another direction for future improvement, as it can eliminate a large number of convolution operations.
Finally, the number of subjects in our tests was limited. In the future, 6G holographic communication scenarios will be implemented in all walks of life, and user needs and experience effects will have more descriptive dimensions; reproductions or extensions of this work could therefore expand the selection of subjects.

Author Contributions

B.B.: Abstract, introduction, methodology, investigation, data management, formal analysis, writing manuscript preparation. W.H.: Conceptualization, funding acquisition, project administration, writing—review and editing. L.L.: Research supervision, methodology, resources, writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Beijing University of Posts and Telecommunications-China Mobile Research Institute Joint Innovation Center. No other funding was used to support this study.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the request of Beijing University of Posts and Telecommunications-China Mobile Joint Innovation Center.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lu, Y.; Zheng, X. 6G: A survey on technologies, scenarios, challenges, and the related issues. J. Ind. Inf. Integr. 2020, 19, 100158. [Google Scholar] [CrossRef]
  2. Tay, S.; Blanche, P.A.; Voorakaranam, R.; Tunç, A.V.; Lin, W.; Rokutanda, S.; Gu, T.; Flores, D.; Wang, P.; Li, G.; et al. An updatable holographic three-dimensional display. Nature 2008, 451, 694–698. [Google Scholar] [CrossRef] [PubMed]
  3. Pastoor, S.; Wöpking, M. 3D displays: A review of current technologies. Displays 1997, 17, 100–110. [Google Scholar] [CrossRef]
  4. Hong, J.; Kim, Y.; Choi, H.J.; Hahn, J.; Park, J.H.; Kim, H.; Min, S.W.; Chen, N.; Lee, B. Three-dimensional display technologies of recent interest: Principles, status, and issues. Appl. Opt. 2011, 50, H87–H115. [Google Scholar] [CrossRef] [PubMed]
  5. Holliman, N.S.; Dodgson, N.A.; Favalora, G.E.; Pockett, L. Three-dimensional displays: A review and applications analysis. IEEE Trans. Broadcast. 2011, 57, 362–371. [Google Scholar] [CrossRef]
  6. Geng, J. Volumetric 3D display for radiation therapy planning. J. Disp. Technol. 2008, 4, 437–450. [Google Scholar] [CrossRef]
  7. Javidi, B.; Fumio, O. Three Dimensional Television, Video, and Display Technologies; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2002. [Google Scholar]
  8. Dodgson, N.A. Autostereoscopic 3D displays. Computer 2005, 38, 31–36. [Google Scholar] [CrossRef]
  9. Hainich, R.R.; Bimber, O. Displays: Fundamentals & Applications; CRC Press: Boca Raton, FL, USA, 2016. [Google Scholar]
  10. Lee, B. Three-dimensional displays, past and present. Phys. Today 2013, 66, 36. [Google Scholar] [CrossRef]
  11. Urey, H.; Chellappan, K.V.; Erden, E.; Surman, P. State of the art in stereoscopic and autostereoscopic displays. Proc. IEEE 2011, 99, 540–555. [Google Scholar] [CrossRef]
  12. Son, J.Y.; Javidi, B.; Yano, S.; Choi, K.H. Recent developments in 3-D imaging technologies. J. Disp. Technol. 2013, 6, 394–403. [Google Scholar] [CrossRef]
  13. Son, J.Y.; Javidi, B.; Kwack, K.D. Methods for displaying three-dimensional images. Proc. IEEE 2006, 94, 502–523. [Google Scholar] [CrossRef]
  14. ITU-T FG-NET-2030; Network 2030: A Blueprint of Technology, Applications, and Market Drivers towards the Year 2030 and Beyond. ITU: Geneva, Switzerland, 2019.
  15. Lyu, L.; Yang, B.; Hou, W.; Yu, W.; Bai, B. Research on 3D visual perception quality metric based on the principle of light field image display. In Proceedings of the Chinese Conference on Image and Graphics Technologies, Nanjing, China, 22–24 September 2023. [Google Scholar]
  16. Lin, W.S.; Jay Kuo, C.C. Perceptual visual quality metrics: A survey. J. Vis. Commun. Image Represent. 2011, 22, 297–312. [Google Scholar] [CrossRef]
  17. Min, X.; Zhou, J.; Zhai, G.; Le Callet, P.; Yang, X.; Guan, X. A metric for light field reconstruction, compression, and display quality evaluation. IEEE Trans. Image Process. 2020, 29, 3790–3804. [Google Scholar] [CrossRef] [PubMed]
  18. Yan, J.; Fang, Y.; Liu, X. Review of research on image quality evaluation—From the perspective of distortion. Chin. J. Image Graph. 2022, 27, 1430–1466. [Google Scholar]
  19. Kang, L.; Ye, P.; Li, Y.; Doermann, D. Convolutional neural networks for no-reference image quality assessment. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1733–1740. [Google Scholar]
  20. Kim, J.; Nguyen, A.D.; Lee, S. Deep CNN-based blind image quality predictor. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 11–24. [Google Scholar] [CrossRef] [PubMed]
  21. Yang, W.; Zhang, X.; Tian, Y.; Wang, W.; Xue, J.H.; Liao, Q. Deep learning for single image super-resolution: A brief review. IEEE Trans. Multimed. 2019, 21, 3106–3121. [Google Scholar] [CrossRef]
  22. Zhang, W.X.; Ma, K.D.; Yan, J.; Deng, D.X.; Wang, Z. Blind image quality assessment using a deep bilinear convolutional neural network. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 36–47. [Google Scholar] [CrossRef]
  23. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
  24. Jia, J.; Wang, W. Review of Reinforcement Learning Research. In Proceedings of the 35th Youth Academic Annual Conference of Chinese Association of Automation (YAC), Zhanjiang, China, 16–18 October 2020; pp. 186–191. [Google Scholar]
  25. Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; Raffel, C.A. MixMatch: A Holistic Approach to Semi-Supervised Learning. In Proceedings of the Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; Volume 32, pp. 5050–5060. [Google Scholar]
  26. Berthelot, D.; Carlini, N.; Cubuk, E.D.; Kurakin, A.; Sohn, K.; Zhang, H.; Raffel, C. ReMixMatch: Semi-supervised learning with distribution alignment and augmentation anchoring. arXiv 2019, arXiv:1911.09785. [Google Scholar]
  27. Xie, Q.; Dai, Z.; Hovy, E.; Luong, T.; Le, Q. Unsupervised data augmentation for consistency training. Adv. Neural Inf. Process. Syst. 2020, 33, 6256–6268. [Google Scholar]
  28. Sohn, K.; Berthelot, D.; Carlini, N.; Zhang, Z.; Zhang, H.; Raffel, C.A.; Cubuk, E.D.; Kurakin, A.; Li, C.L. FixMatch: Simplifying semi-supervised learning with consistency and confidence. Adv. Neural Inf. Process. Syst. 2020, 33, 596–608. [Google Scholar]
  29. Yarowsky, D. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, USA, 26–30 June 1995; pp. 189–196. [Google Scholar]
  30. Scudder, H. Probability of error of some adaptive pattern recognition machines. IEEE Trans. Inf. Theory 1965, 11, 363–371. [Google Scholar] [CrossRef]
  31. Adhikarla, V.K.; Vinkler, M.; Sumin, D.; Mantiuk, R.K.; Myszkowski, K.; Seidel, H.-P.; Didyk, P. MPI-LFA (Max Planck Institut Informatik Light Field Archive). In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  32. ITU-R BT.2021; Subjective Methods for the Assessment of Stereoscopic 3DTV Systems. Series, B. ITU: Geneva, Switzerland, 2015.
  33. ITU-R BT.500-14; Methodologies for the Subjective Assessment of the Quality of Television images. Series, B. ITU: Geneva, Switzerland, 2019.
  34. Bach, D.R.; Grandjean, D.; Sander, D.; Herdener, M.; Strik, W.K.; Seifritz, E. The effect of appraisal level on processing of emotional prosody in meaningless speech. Neuroimage 2008, 42, 919–927. [Google Scholar] [CrossRef]
  35. Zou, Z.; Qiu, J.; Liu, C. Light-Field Image Quality Assessment Based on Multiple Visual Feature Aggregation. Acta Opt. Sin. 2021, 41, 1610002. [Google Scholar]
Figure 1. Main flow of method.
Figure 2. Lookingglass device (left); quilt array image (right).
Figure 3. Six types of scene content.
Figure 4. Five-level scale (left) and presentation rules (right).
Figure 5. Feature engineering improvement.
Figure 6. Optimization of training strategy.
Table 1. Distortion processing methods, parameters, and levels.

Distortion Type | Adjustment Parameter
Smooth filtering | Smooth kernel size
Gaussian filter | Gaussian kernel size, standard deviation
RGB color saturation | Channel coefficient
JPEG compression | Compression ratio
PNG compression | Compression ratio
Gaussian noise | Mean and variance
Brightness range | Luminance value
Table 2. Requirements for the subjects.

Evaluation Dimension | Description
Age | Over 10 years old, under 65 years old
Occupation | Both professionals and non-professionals in stereoscopic information evaluation
Asthenopia | Visual fatigue questionnaire score less than 16 points
Corrected vision | Above 1.0
Squint | No more than 10 prism diopters
Stereovision | Far and near stereoacuity better than 60 and 30 arc seconds, respectively
Contrast sensitivity | Log raster contrast ratio within the medically prescribed normal range
Colour vision | Normal
Table 3. Model classification results.

Dimension of Experience | Accuracy Rate (%)
Realism | 81.88
Space | 83.12
Comfort | 80.21
Table 4. The impact of each feature on the results.

Feature Name | Realism | Feeling of Space | Comfort
CNN content features | −34.10% | −41.30% | −27.50%
Structural similarity | −5.02% | −0.01% | −5.02%
Peak signal-to-noise ratio | −6.24% | −0.36% | −3.17%
Spatial domain features | −7.08% | −9.30% | −6.30%
Angle domain features | −5.20% | −11.00% | −2.50%
Space–angle coupling relationship features | −8.10% | −12.02% | −6.02%
Projection domain features | −4.45% | −6.51% | −5.42%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

