1. Introduction
Autonomous navigation in an unknown environment is one of the greatest challenges for a visually impaired person (VIP), as vision plays an important role in gathering the information necessary for the many processes involved in this complex task. In the last decade, many research projects have been developed to compensate for the loss of vision, most of them relying on sensory substitution. Sensory substitution is grounded in the idea of replacing an impaired or lost sense with another sense [
1]. Paul Bach-y-Rita, a pioneer in this field, worked on restoring visual functions in blind people [
2]. Most sensory substitution devices (SSDs) aim to convey visual data efficiently in real time via touch or hearing. These data may include the shape and/or size of an object, the perceived (ego-centered) distance from it, or its color [
1,
3]. Typical SSDs consist of three components: a sensor, a processing unit that simplifies and converts the sensory information, and a user interface that transmits this information to the user. All SSDs are based on the sensory substitution sensory-motor loop (cf.
Figure 1).
This loop presents the embodiment of perception: (1) The sensor (usually a camera) is pointed by the user in a given (ego-centered) direction (toward the target). (2) A processing unit (a local computer or a cloud service) interprets the image and converts it into tactile or audio stimulations; the user then receives and interprets these stimulations (audio and tactile descriptions), and the brain generates the ad hoc percept. (3) During training, the user tests percepts while interacting with the space via the received feedback. Through iterations of the sensory-motor loop, the VIP adjusts his/her understanding of the encoding so that the generated percepts match the received sensory feedback.
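To make the loop concrete, here is a minimal Python sketch of its three stages; capture_frame, encode, and stimulate are hypothetical placeholders for the sensor, the processing unit, and the user interface, not functions from any published MAPS code.

```python
# Minimal sketch of the sensory-motor loop of Figure 1.
# capture_frame(), encode(), and stimulate() are hypothetical stand-ins
# for the three SSD components (sensor, processing unit, user interface).
import time

def sensorimotor_loop(capture_frame, encode, stimulate, period_s=0.05):
    """Run the sense -> encode -> stimulate cycle at a fixed rate."""
    while True:
        frame = capture_frame()    # (1) sensor pointed by the user
        stimuli = encode(frame)    # (2) image converted to tactile/audio cues
        stimulate(stimuli)         # (3) user perceives, acts, re-aims the sensor
        time.sleep(period_s)
```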
Presently, some SSDs cannot transfer the volume and complexity of visual information with the precision and speed required by vision-based tasks; they lack spatial resolution, temporal resolution, and bandwidth [
4]. Schinazi et al. [
5] discussed the functional reorganization of perceptual modalities in light of new SSD developments for both locomotion and wayfinding. Consequently, several SSD projects investigate how specific design elements improve and assist navigation and wayfinding [
6,
7,
8,
9].
Navigation usually involves both wayfinding and locomotion tasks [
10]. Locomotion is closely linked to the ability to localize obstacles and negotiate a path around them, while wayfinding involves orientation in space and ad hoc path planning in any environment (large environments included). Both tasks are easier to perform with visual input [
11,
12,
13,
14]. However, locomotion and wayfinding involve different components of decision making, different skills [
10], and require different characteristics of visual information. For example, in locomotion tasks, vision is used to update distance information to an obstacle [
12,
13]; in wayfinding tasks, vision helps in spotting points of interest for mobility (PIMs), landmarks, cues, and clues useful for navigation guidance. Consequently, SSDs should be geared to the specific demands of both locomotion and wayfinding and convey the specific information needed for each task. Therefore, to assist navigation efficiently, we need to develop a novel system that supports both locomotion and wayfinding, thus allowing the emergence of spatial awareness; the proposed system is named the MAPS.
The paper is organized as follows:
Section 2 outlines the state of the art on SSDs, while
Section 3 presents a novel model of VIP mobility and overviews the TactiBelt and F2T designs, the two components of the MAPS system.
Section 4 presents the TactiBelt detailed design (for its potential reproducibility).
Section 5 outlines some preliminary evaluations of the TactiBelt with VIPs and blindfolded persons, which confirm the relevance of the MAPS for the targeted assistance. Finally,
Section 6 summarizes our ideas and discusses future developments of the MAPS system.
2. State of the Art on SSDs
Over the years, several researchers have approached the substitution of the visual sense through hearing or touch [
14,
15]. For visual-to-audio SSDs, two of the most popular devices are “the vOICe” [
16,
17,
18] and “EyeMusic” [
19].
The vOICe converts gray-level images by scanning them in video mode (from left to right, from top to bottom). Each pixel is converted into a sound based on its luminance and its orthogonal coordinates in the image. High-luminance pixels produce louder sounds than low-luminance pixels. Pixels on the left of the visual field are played before those on the right, and pixels at the top have a higher pitch than those at the bottom [
20]. The vOICe allows VIPs to access visual information through hearing and to recognize and localize objects after long training [
3].
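The scanning scheme just described can be illustrated with a short sketch. This is not the vOICe implementation; the frequency range, scan rate, and normalization below are arbitrary assumptions chosen only to show the column-by-column encoding principle.

```python
# Illustrative vOICe-style encoding: columns are scanned left to right,
# the row index sets the pitch (top = high), luminance sets the loudness.
# Frequency range and timing are arbitrary, not the real vOICe settings.
import numpy as np

def image_to_audio(img, sr=22050, col_dur=0.02, f_lo=200.0, f_hi=5000.0):
    """img: 2D array of luminances in [0, 1]; returns a mono waveform."""
    rows, cols = img.shape
    n = int(sr * col_dur)                      # samples per column
    t = np.arange(n) / sr
    freqs = np.geomspace(f_hi, f_lo, rows)     # top row -> highest pitch
    out = []
    for c in range(cols):                      # left columns play first
        tones = img[:, c:c + 1] * np.sin(2 * np.pi * freqs[:, None] * t)
        out.append(tones.sum(axis=0))
    wave = np.concatenate(out)
    return wave / (np.abs(wave).max() + 1e-9)  # normalize to [-1, 1]
```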
“EyeMusic” transforms the visual parameters of the entire scene (shape, location, brightness, and color) into sound, using different instrumental sounds to encode brightness and color.
However, the interpretation of the output signals of these devices is difficult and requires long training phases to understand the represented scene [
21,
22,
23]. Spatial awareness is difficult to acquire. Moreover, for navigational tasks, the constantly changing perspective and distance while moving cannot be processed in real time. A VIP has difficulty differentiating multiple objects, especially vertically aligned ones, because the pitches of simultaneously played sounds are hard to distinguish. Furthermore, such devices mask environmental audio cues.
To overcome these limits, tactile-visual sensory substitution systems have been proposed. The Brainport (or TDU, Tongue Display Unit) is one of the most popular SSDs. This device transforms visual images into a pattern of electrical stimulations delivered via an electrode array placed on the tongue [
1,
24]. The users explore tactile patterns representing a scene by means of this electrode pad. Therefore, objects can theoretically be processed in parallel [
25], and users have no difficulty distinguishing between vertically aligned objects.
Despite decades of research, ambitious aspirations, and impressive achievements, few devices have been adopted by VIPs in their daily life, and no single device has become widespread, as none effectively improves the quality of life of VIPs [
4,
21,
26]. Chebat et al. [
27] identified several drawbacks of current SSDs and proposed some promising approaches that attempt to circumvent them. These concern learning, standardization of training, temporal coherence, cognitive load, orientation, depth, contrast, resolution, cost, and dissemination; they are briefly described below.
The learning problem: With current SSDs, end users need a great deal of time for practice and training [
7,
8,
24]. Learning skills with a new SSD that contradict previously received mobility training could impair acquired mobility skills and discourage potential users from adopting SSDs.
The standardization of training: In this field, many publications examine specific SSD elements, but each paper uses a new protocol tailored to its own methodology. The performance of SSD devices is therefore difficult to compare due to this lack of standardization. Optimizing the learning processes and standardizing performance assessment would support perceptual training and guide potential users through the steps needed to interpret the information provided by a device. This would also mitigate the learning problem.
The temporal coherence: For an SSD to be useful in navigation, the image of the user’s surroundings needs to be presented and interpreted in real time so that the user can process it immediately. Some SSDs are audio-based and transfer visual information into sounds spread over time, which can add a small delay in the delivery of the 2D message to the user [
28]. On the other hand, some SSDs are designed based on touch, such as the TDU [
7,
8], which can transmit visual information in real time. However, interpretation is sometimes slow due to the cognitive load induced by the complexity of the tactile images.
The cognitive load: This problem is directly linked to the complexity of the algorithms used to generate substituting stimuli, which ultimately need to be learned by the user. The more complex the interpretation of SSD information, the more difficult the completion of the sensorimotor loop presented in
Figure 1. Therefore, simultaneously interpreting the information provided by the SSD and accomplishing a task imposes a significant cognitive burden. Finding the balance between the minimal and the necessary information that an SSD should provide is fundamental.
The orientation problem: This problem is closely related to the accurate (precise) localization of objects in space using SSDs. The direction information provided by an SSD is often confusing: although participants can detect objects within the sensor’s field, they often report being unable to tell exactly where the sensor points in the environment. To localize an object in space accurately, the depth of the viewed scene should be as constant as possible, and relevant feedback must be provided. Training in remapping must be optimized to achieve the appropriate distal attribution of the moving stimulus.
The depth problem: If depth information is lacking, it is difficult to detect the distance to obstacles and to avoid them [
7]. However, some recent devices can compute depth information; for example, with the Eyecane, end users can perceive depth through vibrations and sounds [
22].
The contrast problem: Many SSDs work well under optimal contrast conditions; however, under other conditions or settings, they may not work correctly; the TDU is one example.
The resolution problem: Downsampling an image makes it renderable in another modality, but it reduces the resolution of the data, making it harder to recognize the details of a scene. Nevertheless, zooming in can mitigate this problem, as in EyeMusic [
29].
The cost problem: The cost of SSDs is still high because of the long research and development phases. Some companies and laboratories reduce the cost of SSDs by building their prototypes on existing devices (e.g., smartphones). However, the high price remains an obstacle to acceptance by, and distribution to, end users.
The dissemination problem: Many scientific journals are not easily accessible to VIPs, especially 2D data such as graphs and figures. The results of scientific research should be disseminated to all, including VIPs.
Although some attempts have been made to overcome the problems listed above, they still have limits. For example, the Eyecane is easy to use and requires little training, but has a low resolution. The vOICe and EyeMusic offer a higher resolution, but they rely on complex coding schemes that make them more difficult to use and, consequently, require many hours of training. Therefore, we propose the MAPS, a novel system for VIP mobility assistance based on the implementation of the journey approach learned in mobility classes. It offers a good compromise between conveying high-level information for navigation, data resolution, and usability. It uses two cooperating hardware devices: the F2T, a tactile tablet for electronic (imaged) map accessibility based on the force-feedback principle, and the TactiBelt, a haptic belt providing real-time information on the nearest obstacles and the target to reach.
3. The MAPS, a Novel System for VIP Mobility Assistance
The MAPS system for VIP mobility assistance consists of three subsystems as shown in
Figure 2. Subsystem 1 assists with “map space learning” using the tactile tablet F2T (Force Feedback Tablet). The goal of this subsystem is to help the VIP memorize the map of the environment where they will move. After preparing the journey, the VIP starts it (using the white cane) and may benefit from the assistance provided by Subsystem 2: a shift from the “learned (memorized) map” to physical navigation using the TactiBelt, its accessories (such as a camera), and the associated software (space perception control and journey control via the mobility graph, a kind of VIP-specific GPS). By providing the mobility graph built on a map supporting the path integration navigation strategy, Subsystem 2 aims to help the VIP move more independently and to lower stress and cognitive load. During the journey, if the user forgets the memorized map information, they can use the information provided by Feedback 3, a “consultation and updating map” displayed on the F2T. The goal of this feedback is to help the VIP recall the map of their nearest space. Feedback 3, a specific software module running on the F2T, works similarly to a classic GPS (and is still in development).
The subsequent subsections provide overviews of the MAPS Subsystems 1 and 2.
3.1. Feedback 1: Map Space Learning
Today, map information is available on different media: thermoformed maps, concrete maps, and magnet-based maps, as shown in
Figure 3. However, such media have drawbacks: their display is static and at a fixed scale, they have a fixed predefined (north-south) map orientation, and their content is difficult to exploit during the journey.
To overcome these limits, we propose an interactive tactile tablet based on the force–feedback principle, hence its name F2T, force–feedback tablet (cf. a model design on
Figure 4 and a current prototype on
Figure 5). Our current prototype is actuated by two small gear motors moving a thumb stick, controlled by an Arduino Nano board (ATmega328 microcontroller) that communicates over USB with a PC running a graphical interface dedicated to haptic environment development and testing, developed in Java. The detailed design and prototyping of the F2T are provided in [
30].
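For readers interested in the host-device exchange, a sketch follows. The actual F2T interface is a Java application, and the wire format below (ASCII “x,y” thumb-stick positions in, “fx,fy” force commands out) is a hypothetical assumption used only for illustration; see [30] for the real design.

```python
# Hypothetical host-side link to the F2T's Arduino Nano over USB serial.
# The message format ("x,y" lines in, "fx,fy" commands out) is an assumed
# protocol for this sketch, not the actual F2T firmware interface.
import serial  # pyserial

def run_f2t_link(compute_force, port="/dev/ttyUSB0"):
    """compute_force(x, y) -> (fx, fy): map-dependent force feedback."""
    with serial.Serial(port, 115200, timeout=1) as link:
        while True:
            line = link.readline().decode("ascii", errors="ignore").strip()
            if not line:
                continue
            x, y = (float(v) for v in line.split(","))   # thumb-stick position
            fx, fy = compute_force(x, y)                 # feedback to render
            link.write(f"{fx:.2f},{fy:.2f}\n".encode("ascii"))
```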
The general scenario of “map space learning” can be summarized as follows:
(1) The user selects the area to explore through audio commands and F2T buttons.
(2) The map is loaded from a GIS (geographic information system) provider and automatically converted into its equivalent topological representation. The proposed journey path is also provided (cf.
Figure 4, black line on the simplified map of the Faculty of Rouen Normandy University).
(3) Points of interest for mobility (PIMs), useful both to confirm journey progress and to lower the stress of independent mobility, are added to the uploaded map (map annotation).
(4) Known PoIs (points of interest in the usual sense) are uploaded from the GIS (roads, fountains, buildings, shops, …) and converted into localized sound sources of the MAPS audio system. This audio-enhanced journey path is accessed through the F2T, which allows the user to explore the map using a thumb stick/joystick (controlled by a force feedback mechanism). A possible representation of the resulting annotated map is sketched below.
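The following sketch shows one possible in-memory model of the annotated map produced by steps (2)-(4); the class and field names (PIM, AnnotatedMap, sound_pois) are illustrative assumptions, not the actual MAPS data format.

```python
# Illustrative data model for the annotated journey map of steps (2)-(4).
# Structure and names are assumptions for this sketch only.
from dataclasses import dataclass, field

@dataclass
class PIM:           # point of interest for mobility (mobility graph node)
    name: str
    x: float         # map coordinates in meters
    y: float

@dataclass
class AnnotatedMap:
    pims: dict[str, PIM] = field(default_factory=dict)
    segments: list[tuple[str, str]] = field(default_factory=list)  # walkable edges
    sound_pois: list[tuple[str, float, float]] = field(default_factory=list)  # name, x, y

    def add_segment(self, a: str, b: str):
        """Link two consecutive PIMs along the journey path."""
        self.segments.append((a, b))
```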
The F2T conveys the spatial information of graphic content (images) through 2D force feedback. The displayed information can be explored by moving a mobile thumb stick whose movement resistance varies depending on the underlying information (e.g., slowing down or stopping the user when trying to move over a wall). The F2T can provide passive effects (textures and reliefs), active effects (dynamic scenes), and actively guided movements during the exploration. Passive and active feedback is used to convey information about the map (space organization) during a
free exploration, while active guidance is used to provide
direct guidance along a path. Examples of simple “tactile images” can be seen in
Figure 6, where colors represent the different types of friction used for feedback generation.
We divide the passive feedback into two basic categories based on the user’s actions with respect to the functional map:
- Friction feedback: The F2T can simulate both solid and fluid friction, allowing different textures to be presented.
- Elevation feedback: This effect can be used to simulate slopes and bas-relief elements. A high elevation difference also allows edge simulation, making it possible to follow the shape of an object.
Furthermore, we can create more complex tactile paths by combining passive and active feedback. For example, we create “canyons” that guide the user to exit from either end. If the user tries to push his/her finger in other directions, the force feedback simulates a slope that pushes the finger back to the canyon bottom. Such a canyon marks the “walkable” paths or areas, allowing the user to move only in certain directions (see the sketch below).
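One plausible way to realize such slopes and canyons is to derive the feedback force from the gradient of a height map, so that the thumb stick is always pushed downhill. The sketch below follows that assumption; it is not the F2T’s actual rendering algorithm, and the gain k and the example height map are arbitrary.

```python
# Sketch of slope/canyon rendering from a height map: the feedback force
# points downhill (opposite the local gradient), so a V-shaped profile
# across a path forms a "canyon" that recenters the finger.
# The gain k and the height map are illustrative assumptions.
import numpy as np

def feedback_force(height, x, y, k=1.0):
    """height: 2D array; (x, y): integer thumb-stick cell; returns (fx, fy)."""
    h, w = height.shape
    x = int(np.clip(x, 1, w - 2))
    y = int(np.clip(y, 1, h - 2))
    gx = (height[y, x + 1] - height[y, x - 1]) / 2.0   # central differences
    gy = (height[y + 1, x] - height[y - 1, x]) / 2.0
    return -k * gx, -k * gy                            # push downhill

# Example: a canyon running along the x-axis, deepest at row 10.
canyon = np.abs(np.arange(21)[:, None] - 10.0) * np.ones((1, 50))
print(feedback_force(canyon, x=25, y=6))  # pushes toward the canyon bottom
```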
3.2. Feedback 2: Effective Displacement Using TactiBelt
The memorized map is the basis for effective displacement with a cane via our original TactiBelt (
Figure 7). We designed a new prototype based on the design recommendations for SSDs. The TactiBelt is built with vibration motors and is worn around the waist. This kind of interface is discreet, can be worn under a loose pullover, and allows end users to perceive ego-centered spatial information. The belt has three layers of vibrators that encode different information on distal obstacles: surface-located obstacles (cane-detectable ones and those at distances of up to 5 m) and overhanging obstacles (the upper row). The prototype will add two front-facing cameras embedded in a pair of glasses, combined with an inertial unit, to provide depth information about nearby obstacles. A GPS/Galileo chip will provide absolute localization and ego-centered distance information about nearby landmarks. Cartographic data will be collected from online services or from buildings’ blueprints for indoor navigation. However, the first prototype (the TactiBelt alone) will be tested in a virtual environment (
Section 5).
Some prototypes are designed using a vibrotactile system [
31,
32,
33] or a commercially available Sunu Band (
https://www.sunu.com/, accessed on 23 April 2022), to enhance the peripheral visual detection of the VIP. These devices transfer only obstacle information (distance, orientation, elevation) to a vibration motor. In addition to providing information on obstacles, the TactiBelt can assist during physical (or virtual) displacement from point A to point B through a set of intermediate steps performed along adjacent segments, each segment linking two consecutive PIMs (
Figure 8). The practical implementation of this strategy is based on a mobility graph, extracted from the annotated geographic map [
34]. The physical displacement between adjacent nodes is assumed to be performed in a straight line. The path integration algorithm is based on our bio-inspired indoor and outdoor mobility model [
35].
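As an illustration of this strategy, the sketch below plans a PIM-to-PIM route on a mobility graph with Dijkstra’s algorithm (one standard choice; the planner actually used in [34,35] may differ) and reduces each straight segment to the heading and distance that path integration requires. Graph and names are illustrative.

```python
# Sketch of journey planning on the mobility graph: Dijkstra over PIM
# nodes, then each straight-line leg reduced to heading + distance for
# path integration. The algorithm choice is an assumption of this sketch.
import heapq, math

def shortest_route(edges, pos, start, goal):
    """edges: {node: [neighbor, ...]}; pos: {node: (x, y)}."""
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue
        for v in edges.get(u, []):
            nd = d + math.dist(pos[u], pos[v])
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    route = [goal]
    while route[-1] != start:
        route.append(prev[route[-1]])
    return route[::-1]

def legs(route, pos):
    """Yield heading (degrees, 0 = +x) and length of each straight segment."""
    for a, b in zip(route, route[1:]):
        (xa, ya), (xb, yb) = pos[a], pos[b]
        yield a, b, math.degrees(math.atan2(yb - ya, xb - xa)), math.dist(pos[a], pos[b])
```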
While moving, the TactiBelt’s vibrators provide the VIP with two types of information on the 3D environment, virtual or real (cf.
Figure 9): the nearest obstacle (blue circles) and the next PIM of the mobility graph (green circles). A specific vibration indicates the final journey PIM (“target is reached”). The position of the activated vibrator indicates the ego-direction of the obstacle/PIM, while the amplitude of vibrations indicates the distance to obstacles/PIM (knowing that the vibration amplitude is inversely proportional to the distance). The continuous vibration pattern is used for nearest obstacle information, and the discontinuous vibration pattern is used for the next PIM to reach.
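This encoding maps naturally to a small function: the activated vibrator index follows the ego-centered bearing, the amplitude grows as the distance shrinks, and the pattern distinguishes obstacles from the next PIM. The vibrator count and amplitude law below are illustrative assumptions, not the calibrated TactiBelt parameters.

```python
# Sketch of the TactiBelt encoding described above: vibrator position
# encodes ego-centered direction, amplitude is inversely proportional to
# distance, and the pattern separates obstacles (continuous) from the
# next PIM (pulsed). Vibrator count and amplitude law are assumptions.
import math

N_VIBRATORS = 16    # assumed number of vibrators per belt row
MIN_DIST = 0.3      # meters; saturation distance for full amplitude

def encode_cue(bearing_deg, distance_m, kind):
    """bearing_deg: ego-centered direction (0 = straight ahead)."""
    idx = round((bearing_deg % 360.0) / 360.0 * N_VIBRATORS) % N_VIBRATORS
    amplitude = min(1.0, MIN_DIST / max(distance_m, MIN_DIST))  # closer = stronger
    pattern = "continuous" if kind == "obstacle" else "pulsed"  # PIM cue is pulsed
    return idx, amplitude, pattern

# Example: obstacle 2 m away, slightly to the right.
print(encode_cue(20.0, 2.0, "obstacle"))
```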
Section 4 will present the TactiBelt hardware design.
6. Conclusions
Autonomous navigation is one of the biggest challenges for a VIP. This paper introduces a new system for the assistance of VIPs’ mobility. The MAPS is composed of two original digital subsystems: the F2T and the TactiBelt. The originality of the proposed approach comes from the MAPS’s ability to assist the VIP’s real-time displacements. This system assists different subtasks of the mobility process and is especially useful for target reaching, namely:
- To learn a map and thus construct a mental map of the environment where the VIP will navigate (using the F2T);
- To transfer the “learned map” into physical displacement (using the TactiBelt and its accessories).
The preliminary results of the experimental evaluation of the TactiBelt with VIP and blindfolded participants in a simulated environment show that the TactiBelt provides relevant data for secure and independent movement toward a target in a static environment. The provided data can be easily interpreted by the VIP, which suggests the probable acceptance of the MAPS.
Future work will focus on improving the MAPS with more reliable hardware and software. The spatial distribution of the TactiBelt vibrators should be investigated precisely using the physiology of the touch senses. The F2T should be redesigned as a frame that clips onto classic PC screens, which would then serve as the control interface of the MAPS and lead to a truly portable device. The simulated stereo apparatus, part of Feedback 2, must be replaced by a “real vision system”: a stereo apparatus embedded in a pair of glasses and associated with an inertial measurement unit (IMU) (for the detection of various obstacles and for balance sense simulation). This system will be enhanced with a GPS (or Galileo) chip for efficient outdoor tracking and for the reinforcement of our bio-inspired indoor and outdoor mobility model [
35]. Cartographic data necessary for navigation in real environments (indoor and outdoor) will be collected from online services or building blueprints for indoor navigation.
We will also investigate the use of audio effects to generate interactive multimodal representations of the map. Finally, additional serious games should be designed with more complex topologies than the virtual environment considered here, corresponding to real indoor and outdoor navigation configurations.
New tests will be carried out to measure the difference between a controlled virtual environment with and without typical distractions. To this end, we plan to add nonstatic obstacles and other types of distraction that can occur outdoors to test our prototype. We will also extend our testing population to elderly subjects.
Our first evaluations involved navigation in a virtual world. However, since the proposed prototype is intended for use in real outdoor situations, it will be necessary to evaluate the system, with the addition of other sensors, to determine the preliminary efficacy of the prototype with its lead users, the VIPs.
It should also be noted that the first evaluation of the prototype involved blindfolded sighted people in a simulated environment; as such, it does not reflect the possible performance of actual VIPs. This is why we are contacting charities working with VIPs to conduct the next evaluations.