1. Introduction
In our modern era of advanced technology, the integration of intelligent features within domestic and professional settings, as exemplified by initiatives like the GiraffPlus project [
1], has ushered in capabilities that extend beyond mere image capture. These capabilities encompass fall detection, gesture recognition [
2], and activity identification (standing, sitting, walking, and so on) [
3]. This technological advancement holds promise for enhancing the functional independence of individuals with disabilities [
4], a prospect that assumes a heightened significance, given the escalating prevalence of moderate-to-severe visual impairments and blindness [
5]. Visual impairment predicts both accelerated deterioration in physical functioning and an increased mortality risk, particularly among severely visually impaired adults [
This demographic reality underscores the pressing need for home and/or office interventions tailored to task facilitation and safety provisioning. Within this context, the concept of object localization within smart environments emerges as a pivotal avenue to amplify the self-sufficiency of visually impaired individuals. This entails the use of contemporary technology to orchestrate tactile or auditory cues that guide individuals toward objects of interest such as a television remote control, a mobile device, or house keys. This technological capability also extends to addressing safety concerns such as locating a wandering toddler. Although strides have been made in developing object-detection algorithms utilizing both regular and 3D cameras [
7,
8], the optimal modality for facilitating such guidance remains an open question. In instances where visual cues are not feasible, auditory or tactile cues, integrated within cognitive learning paradigms, emerge as a viable alternative [
9,
10]. While prior research demonstrates the proficiency of visually impaired individuals in localizing auditory cues [
11], the practicality of incorporating a distinct auditory source within every household item remains a challenge. Consequently, there exists an imperative to explore an auditory feedback modality that embodies not only intuitiveness but also efficacy in guiding the localization of objects within a three-dimensional spatial environment.
The existing literature underscores the role of feedback mechanisms—encompassing verbal instructions, sonification (conveying information through sound), and tactile cues—in aiding navigation for individuals with visual impairments. For instance, in the study by Bharadwaj et al. [
12], a waist-worn vibratory interface was compared against conventional auditory directives, revealing that tactile cues are particularly effective in noisy environments. Delogu et al. [
13] extended this understanding by employing sonification based on geographical locations, highlighting that spatial representation is not confined to the visual modality. Similarly, another study [
14] examined real-time scene sonification for individuals with visual impairment, comparing modes such as image sonification, obstacle sonification, and path sonification. Its findings underscore the value of high-level scene information for effective navigation and learning efficiency, while acknowledging the challenge of reconciling comprehensive scene details with navigational speed. Despite these insights into the potential of tactile and auditory cues to enhance navigational proficiency, the efficacy of these cueing modalities for localizing objects within a three-dimensional context remains unexplored.
The present study endeavors to address this gap by undertaking a comprehensive investigation into the temporal and spatial dimensions of performance among visually impaired participants. Specifically, we aim to compare the efficacy of three automatic cueing modalities—verbal instructions, pitch-based sonification, and tactile vibrations—during an object localization task within a three-dimensional spatial domain. Our analysis focuses on the following factors: the time to locate an object, the path traversed by the hand until reaching the object, and the user satisfaction from each of the three cueing modalities. The results of this study help to shed light on the most effective mode of guidance for object localization in a three-dimensional environment. This initiative aims to encourage the development of cutting-edge technologies designed to assist visually impaired individuals in their daily navigation and location-based tasks.
2. Materials and Methods
2.1. Participants
We recruited 30 adults with visual impairment, using a convenience and snowball sampling approach. Inclusion criteria were based on the Ministry of Welfare’s guidelines for visual impairment, which encompass individuals with total blindness, visual acuity of 3/60 or worse in the better eye even with corrective eyewear, and/or a visual field of less than 20°. Participants were also required to possess normal or corrected hearing. Exclusion criteria included the presence of neurological or orthopedic conditions that could impact the movement of the dominant hand. Ethical approval for the study was obtained prior to commencement from the Ethics Committee of Tel Aviv University.
2.2. Tools
The cueing modalities (verbal, pitch sonification, and vibration) were provided using the following tools:
For the verbal and pitch sonification cues, a motion capture system with six infrared cameras (Qualisys Medical AB, Göteborg, Sweden) was calibrated according to the manufacturer’s manual. The motion tracking system automatically identified 4 passive reflective markers, placed on a small 3D-printed box (2.5 cm in length, width, and height, with the 4 markers attached to a base below it;
Figure 1a), and 4 additional markers, placed on a cluster attached to the back of the subject’s hand (
Figure 1a). The system streamed the 3D coordinates of these markers in real time, at 100 Hz, to custom LabVIEW software (V2019; National Instruments, Austin, TX, USA). The code calculated the position of the box in 3D space relative to the hand in real time and provided the auditory feedback. The distance used for the feedback was the minimal distance between any marker on the hand cluster and any marker on the box. Two auditory cues were configured: verbal cueing, consisting of the words “left”, “right”, “up”, “down”, “forward”, and “back”, spoken in English, which was the second language of all of our participants; and pitch sonification, a continuous audible tone whose pitch increased as the subject’s hand moved closer to the box and decreased as it moved away.
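To make this cue computation concrete, the following minimal Python sketch illustrates the kind of processing performed by the custom LabVIEW code: the minimal hand–box marker distance, a distance-to-pitch mapping for the sonification cue, and the choice of a directional word for the verbal cue. The marker coordinates, pitch range, and axis-to-word mapping are illustrative assumptions, not the actual implementation.

```python
import numpy as np

# Illustrative sketch only: the study used custom LabVIEW code. The marker
# coordinates, pitch range, and axis-to-word mapping below are assumptions.

def minimal_marker_distance(hand_markers, box_markers):
    """Smallest distance between any hand-cluster marker and any box marker (cm)."""
    hand = np.asarray(hand_markers, dtype=float)   # shape (4, 3): x, y, z per marker
    box = np.asarray(box_markers, dtype=float)     # shape (4, 3)
    diffs = hand[:, None, :] - box[None, :, :]     # all pairwise difference vectors
    return float(np.linalg.norm(diffs, axis=2).min())

def pitch_from_distance(distance_cm, d_max=50.0, f_min=200.0, f_max=2000.0):
    """Pitch sonification: map hand-box distance to a tone frequency (closer -> higher)."""
    closeness = 1.0 - np.clip(distance_cm / d_max, 0.0, 1.0)
    return f_min + closeness * (f_max - f_min)

def verbal_direction(hand_centroid, box_centroid):
    """Verbal cue: pick the word for the axis with the largest remaining offset."""
    offsets = np.asarray(box_centroid, dtype=float) - np.asarray(hand_centroid, dtype=float)
    words = (("right", "left"), ("forward", "back"), ("up", "down"))  # +x/-x, +y/-y, +z/-z
    axis = int(np.argmax(np.abs(offsets)))
    return words[axis][0] if offsets[axis] > 0 else words[axis][1]

# Example: hand roughly 30 cm to the left of (and below) the box -> "right", mid-range pitch.
hand = [[0, 0, 0], [2, 0, 0], [0, 2, 0], [0, 0, 2]]
box = [[30, 0, 10], [32, 0, 10], [30, 2, 10], [30, 0, 12]]
print(verbal_direction(np.mean(hand, axis=0), np.mean(box, axis=0)),
      round(pitch_from_distance(minimal_marker_distance(hand, box))))
```

In practice, such a loop would run on every 100 Hz sample streamed by the motion capture system, updating the tone frequency or the spoken word as the hand moves.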
For the vibration feedback, a Leap Motion sensor (Motion Control, San Francisco, CA, USA) was used to track the subject’s right hand. The hand’s coordinates were streamed to custom processing code that extracted the coordinates of the distal segment of the 3rd finger. The coordinates of the box were pre-entered into this processing code, which calculated the distance between the hand and the box and determined which vibration motor should be activated. The command to activate a given motor was sent via Bluetooth to an Arduino Micro (with a Bluetooth shield), which was placed in a 3D-printed box strapped to the subject’s forearm (
Figure 1b). Five vibration motors (shaftless vibration motor, 10 × 2.0 mm; Pololu, Las Vegas, NV, USA) were connected to the Arduino and taped to the skin of the subject’s hand and wrist, according to the locations depicted in
Figure 1b.
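The vibration pathway can be outlined in a similar way. The study used a Processing program communicating with an Arduino Micro; the Python sketch below only illustrates how a host program might select which of the five motors to drive from the hand-to-box vector and send a one-byte command over the Bluetooth serial link. The box coordinates, motor-direction assignments, serial port name, and byte protocol are assumptions, not the actual code.

```python
import numpy as np
import serial  # pyserial: the Bluetooth link to the Arduino appears as a serial port

# Illustrative sketch only: the study used a Processing program and an Arduino Micro.
# The box coordinates, motor-direction assignments, port name, and one-byte command
# protocol below are assumptions, not the actual implementation.

BOX_POSITION = np.array([0.20, 0.15, 0.50])  # pre-entered box coordinates (m), hypothetical

MOTOR_DIRECTIONS = {                          # unit vector assigned to each of the 5 motors
    0: np.array([1.0, 0.0, 0.0]),    # move right
    1: np.array([-1.0, 0.0, 0.0]),   # move left
    2: np.array([0.0, 1.0, 0.0]),    # move forward
    3: np.array([0.0, -1.0, 0.0]),   # move back
    4: np.array([0.0, 0.0, 1.0]),    # move up
}

def choose_motor(fingertip_xyz):
    """Return the index of the motor whose direction best matches the hand-to-box vector."""
    to_box = BOX_POSITION - np.asarray(fingertip_xyz, dtype=float)
    to_box = to_box / np.linalg.norm(to_box)
    scores = {idx: float(np.dot(vec, to_box)) for idx, vec in MOTOR_DIRECTIONS.items()}
    return max(scores, key=scores.get)

def send_motor_command(port, motor_index):
    """Send a single-byte command; the Arduino side would switch the matching motor on."""
    port.write(bytes([motor_index]))

# Example usage with a hypothetical Bluetooth serial port name:
# with serial.Serial("/dev/rfcomm0", 9600, timeout=0.1) as bt:
#     send_motor_command(bt, choose_motor([0.05, 0.10, 0.30]))
```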
For each cueing modality, the box was positioned at varying locations, situated 50 cm away from the initial hand placement (
Figure 2).
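As a minimal illustration of this placement constraint, the short sketch below checks that candidate box positions lie at the required 50 cm straight-line (aerial) distance from the starting hand position; the coordinates are hypothetical, not the positions used in the study.

```python
import numpy as np

# Placement constraint: each candidate box position lies 50 cm (straight-line,
# "aerial" distance) from the starting hand position. Coordinates are hypothetical.
HAND_START = np.array([0.0, 0.0, 0.0])   # starting point marked by the stickers (m)
TARGET_DISTANCE = 0.50                    # required hand-to-box distance (m)

candidate_positions = [
    np.array([0.35, 0.25, 0.25]),
    np.array([-0.30, 0.40, 0.0]),
    np.array([0.0, 0.30, 0.40]),
]

for pos in candidate_positions:
    d = np.linalg.norm(pos - HAND_START)
    print(pos, f"-> {d:.3f} m", "OK" if abs(d - TARGET_DISTANCE) < 0.01 else "adjust")
```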
A post-experience subjective questionnaire, evaluating user satisfaction with each cueing modality, was administered using a Likert scale. Immediately following exposure to each cueing modality, participants were prompted to rate two specific aspects: firstly, the effectiveness of the cueing modality in aiding them to track the target box and, secondly, their overall satisfaction with the assistance provided by the cueing modality. Responses were rated on a scale spanning from “1”—indicating “not at all”—to “5”—denoting “very much so”. Additionally, an avenue for qualitative commentary was provided to allow participants to convey any additional insights or feedback.
2.3. Procedure
The participants were randomly assigned to three groups (N = 10 per group). Each group experienced the cueing modalities in a different order, to control for order effects. Seated comfortably on a chair, every participant faced a table on which the box was placed. The researcher first demonstrated the cueing modalities by guiding the subject’s hand toward and away from the target box, placed at different locations and heights (but maintaining the 50 cm aerial distance from the initial position of the hand), while the auditory or tactile cues were concurrently activated. During this demonstration phase, the researcher explained the meaning of the cues and their relation to the hand’s motion, to foster familiarity and understanding among the participants. Following the demonstration, the participants were instructed to place their right hand at the designated starting point, defined by three distinct stickers (
Figure 2). Subsequently, prompted by a verbal “go” from the researcher, the participants embarked on the task of locating the box. The trial was iterated three times, corresponding to each cueing modality. For each modality, the box’s location was altered among the three positions depicted in
Figure 2 (maintaining 50 cm from the starting position of the hand). After each trial, the participants rated their satisfaction with the respective cueing modality.
2.4. Post Processing
The time to find the box and the hand’s travel path length were analyzed for each of the three cueing modalities. The Friedman test was used to compare the outcome measures between the three cueing modalities, with the Wilcoxon signed-rank test used for post hoc comparisons. The effect size,
r, was calculated using the following equation [
15]:
r = Z/√N, where Z is the standardized test statistic and N is the sample size.
Statistical significance was set at p < 0.05. Unfortunately, we encountered technical problems saving the coordinates of the hand during the vibration cueing trials, so, for this modality, only the times to complete the task were calculated.
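For illustration, the analysis described above can be reproduced with a short script such as the following Python sketch. It uses placeholder data, the normal approximation of the Wilcoxon statistic, and N equal to the number of participants in the effect-size formula; these choices, and the path-length computation, are assumptions about the workflow rather than the authors' code.

```python
import numpy as np
from scipy import stats

# Sketch of the analysis pipeline with placeholder data (not the study's measurements).
rng = np.random.default_rng(0)
t_verbal = rng.normal(10, 3, 30)      # time to locate the box (s), one value per participant
t_pitch = rng.normal(14, 4, 30)
t_vibration = rng.normal(12, 4, 30)

# Hand travel path length from a sampled 3D trajectory (placeholder 100 Hz trace, metres).
trajectory = np.cumsum(rng.normal(scale=0.002, size=(300, 3)), axis=0)
path_length = np.sum(np.linalg.norm(np.diff(trajectory, axis=0), axis=1))

# Friedman test across the three repeated (within-subject) cueing conditions.
chi2, p_friedman = stats.friedmanchisquare(t_verbal, t_pitch, t_vibration)

# Post hoc Wilcoxon signed-rank test for one pairwise comparison.
w_stat, p_wilcoxon = stats.wilcoxon(t_pitch, t_verbal)

# Effect size r = Z / sqrt(N), with Z from the normal approximation of the Wilcoxon
# statistic and N taken here as the number of participants (an assumed convention).
n = len(t_verbal)
mu_w = n * (n + 1) / 4
sigma_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w_stat - mu_w) / sigma_w   # negative, because w_stat is the smaller rank sum
r = z / np.sqrt(n)

print(f"Friedman p = {p_friedman:.3f}; Wilcoxon p = {p_wilcoxon:.3f}; "
      f"r = {r:.3f}; example path length = {path_length:.2f} m")
```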
3. Results
Thirty participants were recruited (nineteen males and eleven females; mean ± SD age of 39.6 ± 15.0 years). Thirteen (43.3%) participants had total blindness, thirteen (43.3%) had visual acuity below 3/60, one (3.3%) had visual acuity below 3/61, one (3.3%) had visual acuity below 3/62, and two (6.6%) were blind in one eye with severely reduced vision in the second eye.
Statistically significant differences were observed in the time it took the subjects to complete the task among the three cueing modalities (
p = 0.034;
Figure 3a). A subsequent post hoc analysis revealed a significantly longer time to locate the box with pitch sonification than with verbal cueing (
p = 0.016; r = −0.323). In contrast, no statistically significant difference was found in the hand path lengths between verbal and pitch sonification cueing (
p = 0.082; r = −0.317), although a trend toward a lengthier path was noted with pitch sonification (
Figure 3b).
No statistically significant differences were detected in the users’ satisfaction questionnaires. The level of assistance provided by the cueing modalities received median (interquartile range) ratings of 4.5 (1) for pitch sonification, 5.0 (1) for verbal cues, and 4.5 (2) for vibration cues (p = 0.928). Similarly, satisfaction with the cueing modalities received median (interquartile range) ratings of 4.0 (2) for pitch sonification, 4.0 (2) for verbal cues, and 3.5 (2) for vibration cues (p = 0.302). Regarding pitch sonification, participants primarily expressed concern that it conveyed only the hand–box distance and lacked directional guidance toward the object. In the case of verbal cueing, the predominant complaint was that it was delivered in a language that was not the subjects’ native language. Lastly, concerning vibration cueing, participants found it challenging to discern which motor was active while their hand was in motion, which required high levels of concentration.
Additional statistical analyses were conducted to explore potential performance differences between genders. No statistically significant differences were observed in task completion times or hand travel path lengths across the cueing modalities between the 11 female and 19 male participants. However, a trend toward shorter hand travel path lengths was observed for male participants compared to female participants when guided by the verbal cueing modality (
p = 0.072; r = −0.328;
Figure 4). It is also worth noting that female participants showed greater variability in their hand travel path lengths than male participants (
Figure 4).
We conducted further correlation analyses between participants’ age and their performance, revealing no significant correlations (p-values ranging from 0.102 to 0.831).
4. Discussion
In this study, we conducted a comparative analysis of time, hand path length, and user satisfaction during box localization using three distinct cueing methods (sonification, verbal, and vibration). While no notable differences emerged in hand path length or satisfaction levels, the key finding concerns the difference in time: the verbal cueing modality yielded a shorter localization time, underscoring its potential role in designing navigation aids for the visually impaired.
Verbal guidance was found to be the most effective cueing modality in terms of the time to locate the object. The time difference when locating objects is a crucial factor to consider when designing navigation aids for visually impaired individuals for several reasons. First, locating commonly used objects, e.g., the air conditioner’s remote control or the house keys, is an essential aspect of daily living. Minimizing the time it takes to find these items, through effective cueing modalities, directly contributes to the convenience and efficiency of visually impaired individuals’ everyday routines. Furthermore, swift object localization streamlines routine tasks such as turning on the air conditioner or unlocking a door. Reduced search times enhance the speed and efficiency with which visually impaired individuals can complete these tasks, ultimately improving their overall quality of life by enabling them to independently and quickly carry out daily activities. This also helps in reducing their reliance on external assistance [
16]. Since prolonged search times for frequently used items may lead to frustration and stress, navigation aids that minimize search times help mitigate these negative emotions, contributing to a more positive and satisfying user experience as well as improved mental well-being [
17].
The shorter time it took to find the box using the verbal cueing modality might be explained by factors such as cognitive load, auditory processing, and the familiarity of language to visually impaired individuals. People have a natural ability to interpret and follow verbal instructions, and these factors may contribute even more strongly to the successful use of auditory cues in the visually impaired population. There is empirical evidence indicating that signal perception and processing mechanisms in visually impaired individuals, particularly those with early-onset blindness, differ discernibly from those of their sighted counterparts [
18]. Markedly, these individuals, besides manifesting an elevated capacity for perceptual auditory processing, were observed to demonstrate notable competencies in higher-order cognitive functions, encompassing domains such as musical aptitude, linguistic proficiency, and memory skills [
18]. A behavioral–electrophysiological study that compared auditory memory in congenitally blind adults and matched sighted controls concluded that the former group more efficiently encodes auditory verbal material [
19].
In contrast to the effectiveness of verbal cueing, sonification poses several challenges, chiefly its lack of crucial directional information. In addition, the continuous tone can be annoying, and the cognitive workload required to convert the sound into spatial information might contribute to the observed delay, as suggested by [
20]. While target sonification might prove advantageous for sighted individuals, particularly in scenarios involving intricate visual guidance such as surgery [
21], for the visually impaired population, sonification may present challenges due to the inherent need for comprehensive auditory cues and efficient cognitive processing. Hence, pitch sonification is the least advisable cueing modality for object localization among the three assessed in this study.
The third cueing modality introduced in this study was tactile vibration. In scenarios where auditory cues might be hindered by a noisy environment or compete for auditory attention among visually impaired individuals, tactile cues offer a viable alternative. Vibration, as a tactile cueing mechanism, can be engaged via a singular motor, as found in mobile devices (used, for example, by Google Maps, alerting the pedestrian that a turn is imminent), or it can be implemented through multiple motors distributed across the body or limb, as demonstrated in the present study. Moreover, the activation of vibration cues can encompass diverse patterns that require user differentiation. However, it is important to acknowledge that this complexity in cue discrimination may potentially augment the cognitive workload, as observed in related studies, e.g., [
22]. Another concern is the attachment of the vibration motors on the body, which might have negatively affected the participants’ ability to accurately interpret the cues. It is possible that adjustments in the placement of the motors could potentially enhance the effectiveness of vibration cues, for example, placing them in different locations on the body (on different limbs or a hip-worn belt, as in [
12]). In summary, vibration-based cueing has both advantages and disadvantages; however, in our observations, it showed no clear advantage over verbal cueing, owing to the varying levels of proficiency exhibited by different participants.
Although there were no statistically significant differences between men and women, we found a trend toward shorter hand travel path lengths for men compared to women when assisted by the verbal cueing modality. Sex-related differences are complex and multifaceted, often arising from a combination of biological, psychological, and sociocultural factors. In our study, possible differences may be attributable to physical characteristics, e.g., arm length, and to differences in motor control and coordination during reaching [
23,
24]. Additionally, men and women might exhibit variations in responsiveness to different types of cues, as related to factors such as attention or reaction time [
25,
26].
While this study offers valuable insights, it is important to acknowledge certain limitations. Notably, the sample size was relatively small, which may influence the generalizability of the findings. However, it is worth noting that the moderate effect size associated with the primary outcome highlights a discernible distinction among the modalities, suggesting potential practical relevance despite the sample-size constraint. Additionally, the experimental tests were conducted within a controlled laboratory setting. Consequently, the ecological validity of the findings in real-world scenarios might be subject to variation. Furthermore, the placement of the target box at different positions within the subjects’ reachable area could introduce a degree of variability that may impact the robustness of the results. Lastly, our results should not be applied to cues provided for the localization of moving objects, as the perception of speed in individuals with visual impairments might be compromised [
27]. While these limitations warrant consideration, the study’s outcomes remain instructive and pave the way for future investigations to expand upon these findings in more diverse and ecologically valid contexts.
5. Conclusions and Future Directions
We compared the efficacy of three automatic cueing modalities—verbal instructions, pitch-based sonification, and tactile vibrations—during an object localization task within a three-dimensional spatial domain. Our results suggest that verbal cueing is the modality that best reduces the time needed to localize an object. We believe that when designing navigation aids for visually impaired individuals to locate specific objects, the time difference in object localization remains a crucial factor. Swift and efficient object localization directly contributes to everyday convenience, task efficiency, user autonomy, and overall well-being. By prioritizing minimized search times, designers can create navigation aids that empower visually impaired individuals to efficiently manage their environment, interact with others, and complete tasks with greater ease and independence. Future studies might consider the potential benefits of combining audio guidance and vibration feedback, albeit with awareness of the intricacies of sensory integration among visually impaired individuals. Individuals with visual impairments rely on their other senses, e.g., hearing and touch, to gather information about their environment. However, it is crucial to be aware that these senses might not work the same way in everyone, and individuals may have different abilities to effectively integrate sensory information. Notably, prior research suggests that there is attenuated multisensory spatial integration in this population, underscoring the need for a nuanced approach when investigating the synergies among these modalities [
28].
Looking to the near future, we envision an interconnected environment encompassing residential, occupational, and public spaces that is capable of recognizing and keeping track of multiple objects within its domain. This environment would respond to spoken queries such as “Hello, home. Could you help me find my house keys?” The system would combine the location of the person asking with the known location of the object and provide initial guidance, for example, “The keys are in the kitchen, on the counter.” It would then continuously monitor the individual’s movement and, once the person reached the vicinity of the sought-after item, switch to a refined, contextually tailored cueing mode that guides precise hand movements until the user’s hand reaches the object.