1. Introduction
Virtual reality (VR) devices are novel and attractive consumer electronics that can provide an immersive VR user experience (UX) [1,2]. To enhance the UX of VR services, significant efforts have been made to improve not only video and audio quality and interaction delay, but also the convenience of connecting VR devices to VR consoles or VR-capable personal computers (PCs). Consumers in the VR market therefore demand high-resolution, comfortable VR devices. To provide a comfortable VR service environment without the need for wires, VR devices require a wireless communication scheme with low latency. However, high resolution is a challenge not only for imaging equipment but also for wireless interfaces. We therefore need to find the trade-off between these two features, i.e., a reliable wireless connection and high-resolution video.
Wireless local area network (WLAN) is the most popular unlicensed-band wireless communication interface, offering low cost and high data throughput. Because some IEEE 802.11 amendments are designed to replace wired video interfaces such as the high-definition multimedia interface (HDMI), IEEE 802.11-based WLANs can provide very high data rates.
To meet the high data-rate requirement of high-resolution video transmission, the IEEE 802.11 working group extended the IEEE 802.11 standards to support the 60-GHz frequency band with a wide bandwidth. IEEE 802.11ad is the amendment standard that operates IEEE 802.11 in the 60-GHz frequency band; in particular, it utilizes a 2.16-GHz bandwidth to achieve a high data rate. However, IEEE 802.11ad has a relatively short communication range, owing to high-frequency operation in indoor environments [3,4]. IEEE 802.11ay is the enhanced version of IEEE 802.11ad, adding support for channel bonding and multiple spatial streams [5].
Although these 60-GHz WLAN standards can be considered as wireless VR interfaces for high-resolution video transmission, future VR systems will require real-time interactive control among multiple VR users. Some gaming applications have already adopted multi-user augmented reality (AR) systems to provide enhanced gaming experiences in the living room [6]. Because VR consumers have already experienced such multi-user AR systems, multi-user VR also needs to be provided to satisfy their expectations. Sharing VR experiences with nearby users is expected to provide a much more immersive VR UX [6,7,8].
Wireless multi-user VR systems based on IEEE 802.11 standards can be described as in Figure 1, which shows the elements of wireless multi-user VR systems, the VR data flows, and the delay components of VR services. To provide an immersive VR interaction experience, each component of the VR system shall provide proper feedback based on its sensing data. From the delay components of Figure 1, the VR interaction delay of a wireless VR system, Tvr, can be expressed as the sum of its components: Tvr = Tsensing + Tproc1_in_device + Ttransfer_UL + Tproc_in_PC + Ttransfer_DL + Tproc2_in_device.
VR devices shall track the motion and commands of users using sensing components. These components may introduce a delay, depending on their sensing performance; this delay is denoted Tsensing. Tproc1_in_device is the processing delay of the processor unit in a VR device, which handles sensing data and generates data packets. The generated data packets are transmitted to the associated VR computing device over a WLAN link in the proposed system. One-hop wireless packet delivery over a WLAN link causes a delay, Ttransfer_UL, which can be relatively large depending on the wireless channel condition, including channel congestion caused by channel contention. Ttransfer_UL is the most dominant delay component in the proposed multi-user VR system. The VR computing device, generally a PC, requires a processing delay, Tproc_in_PC; to provide a seamless VR UX by minimizing this processing delay, high processing performance is preferred. The VR computing device generates VR feedback packets, including VR video data, and transmits them to its associated VR devices over a WLAN link. The delay caused by this WLAN transmission is denoted Ttransfer_DL. Since the transmission causing Ttransfer_DL is from one node (the VR computing device) to multiple nodes (wearable VR devices), whereas the transmission causing Ttransfer_UL is from multiple nodes to one node, Ttransfer_DL can be controlled more easily than Ttransfer_UL. The downlink delay, Ttransfer_DL, is the second most important delay component in the proposed system. Finally, the VR devices decode the packets and generate sensory feedback, which causes a processing delay, Tproc2_in_device. For instance, a VR headset decodes the VR video image and performs video enhancement procedures for a seamless and immersive UX; Tproc2_in_device covers this kind of hardware processing.
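The end-to-end interaction delay described above is simply the sum of the per-stage delays. The following Python sketch illustrates this budget; all numeric values are hypothetical placeholders for illustration, not measurements from this work.

```python
# Illustrative delay budget for one VR interaction loop, using the
# delay components named in the text. The millisecond values below
# are hypothetical placeholders, not measured results.
DELAY_COMPONENTS_MS = {
    "T_sensing": 1.0,           # motion/command sensing in the VR device
    "T_proc1_in_device": 0.5,   # packetizing the sensing data
    "T_transfer_UL": 4.0,       # WLAN uplink delivery (dominant component)
    "T_proc_in_PC": 5.0,        # processing on the VR computing device
    "T_transfer_DL": 2.0,       # WLAN downlink delivery of video frames
    "T_proc2_in_device": 1.5,   # decoding + enhancement in the headset
}

def vr_interaction_delay_ms(components):
    """T_vr is the sum of the per-stage delays."""
    return sum(components.values())
```

With these placeholder values, the uplink and downlink transfer delays together account for nearly half the budget, which is why the paper concentrates on controlling them.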
IEEE 802.11 systems are designed as contention-based channel access wireless communication systems [9]. Because of the properties of contention-based channel access, the performance of IEEE 802.11ad/ay systems degrades dramatically as the number of wireless stations (STAs) increases [10]. In other words, even if the most advanced WLAN protocols and video codecs are utilized, supporting multi-user VR with low latency is almost impossible, owing to the multiple-access inefficiency of WLAN.
VR devices need to upload their sensing information frequently to trace user position and pose, and each VR device usually tracks its position at a 1000-Hz sensing rate. This means that in multi-user VR, small uplink frames are generated very frequently by multiple VR devices. Severe WLAN channel contention is caused by the overwhelming number of small VR control frames, leading to very long channel access delays, which make the operation of multi-user VR over WLAN impossible. Such problems cannot be solved by the conventional IEEE 802.11 distributed coordination function (DCF) and enhanced distributed channel access (EDCA), which do not guarantee frame delivery delay [9].
To provide multi-user VR services over WLANs, the channel access delay needs to be minimized, and the frame rate of VR video and the arrival rate of uplink (UL) frames should be adaptively adjusted depending on the wireless environment. Given the technical progress of frame-interpolation schemes [11,12,13,14,15,16] and the expectation that frame-interpolation modules will be commonly employed in next-generation VR headsets [17], users could enjoy stable, high-refresh-rate vision with a lower downlink (DL) VR video frame rate. Sensing data that is not fed back to computing machines, including VR-ready PCs, can be utilized by the frame-interpolation module to generate interpolation frames from the user's accurate motion data. These interpolation frames are not based on future frames [11], because the video frames are rendered in real time; instead, they are generated from past frames and motion-tracking data. Shifting the latest frame by the reverse of the sensed motion vector is the simplest method of generating interpolation frames.
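The simplest interpolation method described above can be sketched as follows. This is a deliberately minimal illustration, assuming a frame represented as a 2-D grid of pixel values and a motion vector already expressed in pixel units; a real headset would work on GPU textures with sub-pixel warping.

```python
def interpolate_frame(last_frame, motion_vector):
    """Generate an interpolation frame by shifting the last received
    frame in the direction opposite to the user's sensed motion.
    last_frame: 2-D list of pixel values (a simplified stand-in for a
    rendered image). motion_vector: (dx, dy) user motion in pixels.
    Pixels uncovered by the shift are filled with 0 (black)."""
    dx, dy = motion_vector
    h, w = len(last_frame), len(last_frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Sample from where the content was: user motion of +dx
            # shifts the visible scene by -dx on the display.
            src_y, src_x = y + dy, x + dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = last_frame[src_y][src_x]
    return out
```

For example, a rightward user motion of one pixel shifts the scene content one pixel to the left, exposing a blank column at the right edge that a practical implementation would in-paint or extrapolate.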
To assist interpolation frame generation, some network characteristics, e.g., the video frame arrival rate and the control frame delivery rate, need to be provided to VR devices. When there is a mismatch between the visual and vestibular systems, VR sickness can result; that is, if the VR display cannot reflect the real movement of the user, the user experience may degrade. To prevent VR sickness, accurate VR frame interpolation operations are required. Therefore, in the network-condition-based VR frame delivery proposed in this paper, the video frame interpolation procedures are very important for reducing VR sickness. The relationship between the received video frames and the interpolated video frames is shown in Figure 2. In this paper, interpolated video frames do not refer to the video frames generated by the graphics processor of a VR PC or a VR console. Only frames that are generated by VR headsets after receiving video frames from a VR access point (AP) are referred to as interpolated video frames; they can be generated from the received frames and motion-tracking information. As shown in Figure 2, the processing unit in a VR headset shall shift the last video frame in the direction opposite to the user's motion vector to generate interpolated video frames. The interpolated video frames provide immediate visual responses that resolve the mismatch between the visual and vestibular systems. In wireless multi-user VR systems, the wireless links, whose inefficient multi-user channel access cannot provide a sufficiently high data rate for a high video frame rate, are the bottleneck, so many interpolated frames must be generated. Because of this, next-generation wireless multi-user VR systems should be designed to be tightly coupled with the wireless system, optimizing the VR video image and motion-tracking rate according to the wireless link status. Based on the above observations, designing a high-quality multi-user VR system over WLANs requires both wireless link optimization, which enhances the wireless channel access efficiency, and VR optimization tightly coupled with the wireless system, which prevents unnecessary resource wastage. In this study, we consider both aspects to provide a high-quality VR UX in a multi-user WLAN VR service.
This paper extends our earlier work, which proposed a delay-oriented VR mode that could be utilized by a VR AP [18]. The delay-oriented VR mode is included in this paper as a trigger-based channel access method. This paper proposes a novel wireless multi-user VR protocol structure, as well as specific channel access and system control schemes, to support multi-user VR systems over WLAN, including the delay-oriented VR mode. In addition, we propose connection-recovery algorithms for a seamless VR UX.
The rest of this paper is organized as follows. Section 2 explains the proposed system architecture and protocol design, including the connection-recovery algorithm. Section 3 investigates the system performance of the proposed VR architecture and multi-user VR schemes through extensive simulations, examining the delay and packet loss rate (PLR) performance in various scenarios. Finally, Section 4 explains why conventional EDCA, as utilized in WLAN systems, cannot handle wireless multi-user VR applications, whereas the proposed system can.
2. Architecture and Protocol
2.1. IEEE 802.11-Based Wireless VR System Architecture
The proposed multi-user VR system with wireless interfaces consists of multiple IEEE 802.11 medium access control (MAC) and physical (PHY) layers. To accommodate multiple IEEE 802.11 protocols, we propose a novel VR convergence layer (VRCL) and its interworking scheme with the station management entity (SME).
The network architecture for multi-user VR systems needs to meet the requirements of very high throughput and low latency. To satisfy the high throughput requirement of high-resolution video, we can utilize the 60-GHz standards, i.e., IEEE 802.11ad/ay, which are designed for wireless high-resolution image devices. Because an immersive VR UX depends on such high-resolution video, the use of 60-GHz standards is inevitable in multi-user VR scenarios. However, even with these high-throughput wireless standards, the combination of UL and DL transmissions in multi-user scenarios degrades the effective throughput and delay performance. This means that intra-basic service set (BSS) channel contention should be controlled by the wireless VR protocol; without such control algorithms and additional channel resources, the VR experience cannot be guaranteed in wireless network environments.
If only DL video frames were transmitted by a VR AP, network degradation might not occur. However, the multiple VR devices connected to an AP must transmit their motion-tracking and control data very frequently, i.e., at 1000 Hz per device. This uplink traffic is problematic because such frequent motion-tracking and control frames can cause large channel access delays and throughput degradation. To resolve this uplink contention problem, multi-user control channels conforming to the wireless standard should be utilized for multi-user VR networks. IEEE 802.11ax is the most representative standard supporting multi-user networks in the 5-GHz frequency band [19]. Because the 5-GHz channels of IEEE 802.11ax do not interfere with the 60-GHz channels, a VR AP can accommodate multi-user uplink traffic very efficiently.
As a result, VR devices, including VR APs, need a special multi-standard protocol architecture, described in Figure 3, to support VR connections based on WLANs. IEEE 802.11ax is the amendment standard for highly efficient WLAN in multiple-device scenarios; it defines a trigger frame to accommodate uplink frames from multiple devices simultaneously [19]. In many scenarios, triggered access may be utilized in parallel with conventional single-frame transmission. When the trigger frame requests stations to transmit their UL data, each station transmits its data without additional channel access delay. The trigger frame can therefore substantially reduce the contention delay of WLAN systems.
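The essence of triggered uplink access can be sketched as follows. This is a simplified abstraction for illustration only, not the IEEE 802.11ax trigger frame encoding; the class and function names are our own.

```python
# Simplified model of triggered uplink access: the AP names each STA
# and the resource unit (RU) it may use, so scheduled STAs transmit
# without contention or backoff. This is an illustrative abstraction,
# not the real IEEE 802.11ax frame format.
from dataclasses import dataclass

@dataclass
class TriggerFrame:
    allocations: dict  # sta_id -> assigned RU index

def build_trigger(sta_ids, num_rus):
    """Round-robin assignment of STAs to RUs for one trigger interval."""
    return TriggerFrame({sta: i % num_rus for i, sta in enumerate(sta_ids)})

def respond(trigger, sta_id, ul_payload):
    """A scheduled STA transmits immediately on its assigned RU."""
    ru = trigger.allocations.get(sta_id)
    if ru is None:
        return None  # not scheduled in this trigger interval
    return (ru, ul_payload)
```

The key property this models is that channel access order is decided once, by the AP, instead of being re-negotiated per frame through contention.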
The IEEE 802.11ad amendment standard is designed to utilize the 60-GHz frequency band, which provides wide bandwidth and high throughput. IEEE 802.11ay, an enhanced version of IEEE 802.11ad, provides four times the bandwidth using channel bonding and additional spatial streams [5]. Because of the wide bandwidth, the 60-GHz standards can provide very high data rates over short distances. Therefore, to adequately utilize this high data rate for multi-user VR, the channel inefficiency caused by channel contention should be minimized by separating the UL and DL transmissions.
The VR application layer described in Figure 3 provides VR images and control information on VR devices. VR video frames are generated at the frame rate reported by the VRCL; a detailed explanation of the VRCL is provided in Section 2.2. The generation rate of VR video frames is restricted by the VRCL, which prevents the VR application layer from generating meaningless video frames. For VR controllers and VR headsets, motion-tracking information measured by sensors in the VR devices is accommodated and utilized in this VR application layer. VR devices, especially VR headsets, generate interpolation frames in this layer. An interpolation frame is a frame displayed between the received real video frames delivered by a VR AP. These interpolated frames could be generated by many effective interpolation algorithms [11,12,13,14,15,16]. In this paper, VR video interpolation frames must be based on motion-tracking information measured by VR sensors in the VR devices [20,21,22]. Because this situation has not been addressed previously, further optimized interpolation methods should be studied.
The convergence layer described in Figure 3 enables multiple network standards to converge and provides special information to the VR application layer. VR videos generated by a VR PC are delivered to a VR AP; the convergence layer in the VR AP determines the transmission interval of video frames based on network conditions and reports the rate to the VR application layer of the VR PC. If VR video frames were transmitted without considering network conditions, users would suffer poor UX owing to large delays. Similarly to the DL VR video frame transmission, the convergence layer also controls the UL frame delivery rate based on the network condition, which reduces network congestion and the required network performance of VR devices. As a result, the convergence layer prevents these catastrophic situations by controlling the frame delivery rate.
Although some motion-tracking information may not be delivered over the air, depending on the decision of the convergence layer, the convergence layer still provides the motion-tracking information to the VR application layer in the VR headset to generate interpolation frames. These interpolation frames should be generated considering the motion-tracking information, in order to prevent VR sickness. The number of interpolation frames that need to be generated before the next received frame can be predicted from the frame arrival rate information obtained from the convergence layer.
The station management entity (SME) is used for the accommodation and delivery of parameters for each network layer [9]. In some cases, frame off-loading can be performed by controlling the frame-delivery interval based on PHY and MAC layer parameters. The packet loss rate and frame interval are the key pieces of information delivered by the SME.
Each MAC and PHY layer follows its own standard. The convergence layer controls and schedules all frames across these multiple MAC layers; for example, DL data frames can be scheduled on the IEEE 802.11ad/ay MAC and UL data frames on the IEEE 802.11ax MAC. Because IEEE 802.11ax is a 5-GHz standard with multi-user support, it is a suitable protocol for UL transmission in multi-user VR systems.
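The split just described can be sketched as a simple dispatch rule in the convergence layer. The frame representation and interface names below are assumptions made for illustration; they do not appear in any standard.

```python
# Hypothetical sketch of the convergence layer's frame dispatch:
# wide-band DL video goes to the 60-GHz MAC (IEEE 802.11ad/ay), while
# small UL sensing/control frames go to the 5-GHz multi-user MAC
# (IEEE 802.11ax). The interface names are illustrative only.
def dispatch(frame):
    """Return the MAC on which a frame should be scheduled."""
    if frame["direction"] == "DL" and frame["type"] == "video":
        return "mac_11ad_ay"   # high-throughput 60-GHz link for video
    if frame["direction"] == "UL":
        return "mac_11ax"      # 5-GHz UL OFDMA link for sensing data
    return "mac_11ax"          # remaining control traffic on 5 GHz
```

Keeping the decision in one place means the UL sensing stream never contends with DL video on the 60-GHz channel, which is exactly the separation the text argues for.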
Figure 4 shows how IEEE 802.11ax can accommodate multiple frames using orthogonal frequency-division multiple access (OFDMA). The trigger frame is transmitted by an AP to instruct stations that have UL frames when and where to transmit them. In usual IEEE 802.11ax scenarios, the trigger frame contends with other UL frames to guarantee channel access opportunities for all stations. However, a VR AP requires a very tight delay bound, and its traffic pattern is very regular, since it is a special-purpose AP for VR devices. Hence, for associated devices, there is no need for channel contention to guarantee channel access opportunities. In other words, to fully utilize the UL OFDMA of IEEE 802.11ax for multi-user VR services, single-user UL transmission should be regulated. This can be achieved with the multi-user EDCA procedure of IEEE 802.11ax [19]: by advertising a new set of EDCA parameters to the STAs in a multi-user BSS, the AP can assign the STAs very low-priority EDCA parameters. Such parameters make the STAs' access time for single-user UL transmission long, so the AP can transmit a trigger frame for the UL OFDMA procedure while the STAs are still waiting.
2.2. Protocol Design
The VRCL should encapsulate interpolation and frame-rate information with a VR video frame in an aggregated MAC protocol data unit (A-MPDU). The VRCL controls the UL and DL frame arrival rates for the WLAN system, based on the original VR video frame rate and the wireless environment. Owing to the VRCL, a VR AP and VR devices can utilize the MAC and PHY layers of the IEEE 802.11 family standards without modification; the VRCL is the only additional layer required for a multi-user VR network interface. VR applications do not require any additional network control features and are only required to generate proper VR video frames based on the information delivered by the VRCL. The VRCL provides the UL control information rate and the required DL video frame rate to the VR application layer. Because the VRCL discards VR frames that could cause large network delays on VR devices, the VR application does not need to waste resources on unnecessary video frames. For this reason, the VRCL and the VR AP perform the function of a VR system controller.
Not only does the VRCL provide information that can be used to control VR video frames to the VR application layer, but it also controls its network operation based on information provided by the SME. The packet loss rate, received signal strength indication (RSSI), and modulation and coding scheme (MCS) index are the representative information observed by the VRCL. Based on this information, the VRCL can control the MCS level, channel bandwidth, frame arrival rate, number of users supported by the VR AP, and so on. Both the 60-GHz and 5-GHz standards are utilized by the VRCL for efficient multi-user VR operation. The main purpose of the 60-GHz wireless link in this paper is the delivery of high-resolution VR video frames; because that delivery requires a large bandwidth, the use of the 60-GHz standard is inevitable. The 5-GHz standard is employed to accommodate multi-user UL frames, utilizing the multi-user UL OFDMA procedure.
Although the required data rate of multi-user uplink sensing data is not very high, the arrival rate of the sensing data frames is relatively high: video frame rates usually range from 90 Hz to 120 Hz, whereas the motion sensing rate is 1000 Hz in current-generation wired VR devices. Because of this, small UL data frames could spoil the overall wireless VR system despite their small required data rate, causing large channel access delays for DL VR video frames unless the DL and UL are separated. By moving UL sensing data transmission off the 60-GHz band, DL VR video frames do not contend with other frames from VR devices. If a VR video frame requires additional transmission time, owing to poor channel conditions or an increasing number of VR users, the VRCL can control the video frame rate based on the packet loss rate. The VRCL in a VR AP encapsulates the video frame rate information used to generate VR video frames; the VRCL in VR devices extracts this information from the received frames and delivers it to the VR application, which generates video interpolation frames based on the received video frames and motion-sensing information.
Unless the channel has severe coexistence issues, the DL frames may not suffer large channel access interference. However, in the WLAN multi-user VR scenario, UL frames always suffer large contention delays, owing to frequent channel access with the contention-based channel access method. To solve this problem, a VR AP should accommodate UL frames by transmitting a trigger frame, as defined in the IEEE 802.11ax standard. To maximize the multi-user UL frame accommodation, the single-user channel access of each VR device should be prohibited. This can be done by setting the EDCA parameters, including AIFSN and CW, to very large values; the maximum values are particularly recommended. Setting the EDCA parameters does not require any modification to the IEEE 802.11 standards, and manufacturers can configure them easily.
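A minimal sketch of such a low-priority EDCA parameter set follows. The field names loosely follow the EDCA parameter structure, in which the contention window bounds are carried as exponents (CW = 2^ECW − 1); the specific values are simply the largest that the 4-bit fields can encode, as the text recommends.

```python
# Sketch of a very low-priority EDCA parameter set a VR AP could
# advertise so that STAs effectively never win single-user access.
# Field names follow the EDCA parameter structure loosely; CW bounds
# are exponent-coded (CW = 2**ECW - 1), and 15 is the 4-bit maximum.
LOW_PRIORITY_EDCA = {
    "AIFSN": 15,    # largest arbitration inter-frame spacing number
    "ECWmin": 15,   # CWmin = 2**15 - 1 = 32767 slots
    "ECWmax": 15,   # CWmax = 2**15 - 1 = 32767 slots
}

def contention_window(ecw):
    """Decode a contention window size from its exponent encoding."""
    return 2 ** ecw - 1
```

With a backoff window this large, a STA's expected single-user access time far exceeds the trigger interval, so in practice all VR uplink traffic flows through the triggered OFDMA path.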
2.3. Algorithm
The VRCL in a VR AP could perform DL VR video frame rate control and UL VR sensing frame rate control, depending on its wireless connection status.
Figure 5 shows how the VRCL controls the DL VR video frame rate. The manufacturer of a VR system can set the packet loss rate thresholds and the corresponding VR video frame rates. A video refresh rate of 90 Hz is usually used in current-generation VR systems, but next-generation VR systems may support a refresh rate of at least 120 Hz; thus, "refresh_rate_1" in Figure 5 needs to be set to the native refresh rate of the VR display. Refresh rate parameters with larger indices should be set to larger values than those with smaller indices, and the same rule applies to the "DL_PLR_threshold" parameters listed in Figure 5. The algorithm in Figure 5 aims to solve the connection problem of wireless multi-user VR systems by controlling the required channel throughput. If the channel condition is worse than "DL_PLR_threshold_n" (see Figure 5), the VR AP should perform a recovery procedure. The recovery procedure can be modified by the manufacturer, but at least the use of alternate channels and 5-GHz off-loading is recommended.
Figure 6 shows the algorithm that controls the trigger frame transmission rate. A high value of "Trigger_rate_VR" (see Figure 6) indicates a small trigger frame interval, chosen based on the PLR history of the UL sensing data frames. Because the associated VR devices never perform EDCA procedures, which are disabled by the VR AP, only overlapping BSS (OBSS) stations can cause channel collisions and interference. Similarly to the DL VR video frame rate control, the VRCL controls the trigger frame rate and performs a recovery procedure under poor channel conditions. In this paper, channel alternation and bandwidth modification are recommended as the recovery procedures.
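The gist of this control loop can be sketched as below. The rate ladder, the PLR limit, and the linear tolerance rule are hypothetical parameters chosen for illustration; Figure 6 defines the actual procedure.

```python
def update_trigger_rate(ul_plr_history, rates_hz=(1000, 500, 250),
                        plr_limit=0.05):
    """Illustrative take on the Figure 6 idea: keep the trigger frame
    rate (and thus the UL sensing rate) high while the recent UL PLR
    stays low, and step it down as losses grow. The rate ladder and
    PLR limit are hypothetical, not values from the paper. Returns
    None to signal the recovery procedure."""
    avg_plr = sum(ul_plr_history) / len(ul_plr_history)
    for step, rate in enumerate(rates_hz):
        # Each lower rate tolerates a proportionally larger average PLR.
        if avg_plr <= plr_limit * (step + 1):
            return rate
    return None  # persistent loss: alternate channel / change bandwidth
```

A lower trigger rate thins the UL sensing stream, trading some motion-tracking freshness (compensated by interpolation in the headset) for fewer losses on a degraded channel.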
The VRCL in a VR headset receives the DL VR video frame rate and UL VR sensing frame rate information from the VR AP. Based on this information, the VR application can generate interpolation frames to be displayed during the DL VR video frame interval. Each interpolation frame should utilize the motion-sensing information, even if that information was not delivered to the VR AP. From the motion-sensing information, the VR application layer generates a motion vector, and the reverse of this vector is applied to the interpolation frame.