### *4.5. Interaction Types*

We were also interested in how image-related information is delivered using touchscreen devices (*RQ4*). We identified the interaction types in terms of input types and output modalities, as follows:

*Input types*. As shown in Table 4, the major input type is touch, as expected; most of the studies allowed users to explore images by touch with their bare hands (*N* = 28 out of 33). Touchscreen gestures were also used as input (*N* = 5). In addition, physical input devices were used alongside touchscreen devices (*N* = 4 for *keyboard*, *N* = 3 for *stylus*, and *N* = 2 for *mouse*). Although aiming a camera at a target is known to be difficult for BLV people [67], a camera was also used as an input type: users could share live image feeds from their cameras with others to get information about surrounding physical objects, such as the touch panel on a microwave [60,61].


**Table 4.** Types of input used for improving images on touchscreen devices.

*Output Modalities*. As for output modalities, various types of feedback techniques were used (see Table 5). Approximately half of the studies used a single modality: audio only (*N* = 14; including both speech and non-speech audio) or vibration only (*N* = 3). The others used multimodal feedback, where the combination of audio and vibration was the most frequent (*N* = 6), followed by audio with tactile feedback (*N* = 5). The most widely used output was speech feedback, which verbally describes images to BLV users through an audio channel, much as a screen reader reads out what is on the screen using text-to-speech (e.g., Apple's VoiceOver). Non-speech audio feedback (e.g., sonification) was also used; for instance, different pitches of sound [24,35,36,42] or rhythms [55] were used to convey image-related information. Vibration was as popular as non-speech audio feedback, and some studies used tactile feedback to convey information. For example, Gotzelmann et al. [23] used a 3D-printed tactile map. Zhang et al. [41] also 3D-printed user interface elements (e.g., buttons, sliders) to improve the accessibility of touchscreen-based interfaces in general by replacing virtual elements on a touchscreen with physical ones. Moreover, Hausberger et al. [56] proposed an interesting approach using kinesthetic feedback along with friction: their system dynamically changes the position and orientation of a touchscreen device in 3D space, allowing BLV users to explore the shapes and textures of images on the touchscreen.
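To make the sonification idea concrete, the following is a minimal sketch (not taken from any of the reviewed systems) that maps a normalized value, such as the brightness under the user's finger, to the pitch of a short tone using the Web Audio API; the frequency range and the linear mapping are our own assumptions.

```typescript
// Minimal sonification sketch: map a normalized value in [0, 1]
// (e.g., image brightness under the user's finger) to the pitch of a
// short tone. Requires a browser environment with the Web Audio API.
function playTone(value: number, durationMs = 150): void {
  const ctx = new AudioContext();
  const oscillator = ctx.createOscillator();

  // Linear mapping from [0, 1] to a 200-1200 Hz range
  // (an assumed range, not one taken from the reviewed studies).
  const minHz = 200;
  const maxHz = 1200;
  const clamped = Math.min(Math.max(value, 0), 1);
  oscillator.frequency.value = minHz + (maxHz - minHz) * clamped;

  oscillator.connect(ctx.destination);
  oscillator.start();
  oscillator.stop(ctx.currentTime + durationMs / 1000);
}

// Example: a bright region under the finger produces a higher-pitched tone.
playTone(0.8);
```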


**Table 5.** Output modalities used for improving images on touchscreen devices.

### *4.6. Involvement of BLV*

Finally, we checked whether BLV people, the target users, were involved in the system development and evaluation processes; see Table 6. We first examined whether a user evaluation was conducted, regardless of whether target users were involved. We found that all but two of the studies tested their systems with human subjects. Most of them conducted a controlled lab study in which task-performance metrics, such as the number of correct responses and completion time, were collected for evaluation. In addition, close to half of the studies collected subjective assessments, such as ease of use and satisfaction rated on Likert scales, or open-ended comments about participants' experience after using the systems.

Of the remaining 31 papers, three conducted user studies with no BLV participants. The other 28 papers evaluated their systems with participants from the target user group. In addition to user evaluation, seven studies used participatory design approaches during their design process. Moreover, some papers had BLV people participate in formative qualitative studies (i.e., surveys, interviews) at an early stage of system development to make their ideas concrete.

**Table 6.** Methodologies used in the studies and BLV people's involvement in system design and evaluation. Note that the following three studies conducted user studies with blindfolded sighted participants [54–56].


### **5. AccessArt and AccessComics**

Based on our systematic review results, we have confirmed that various types of images have been studied to improve their accessibility for BLV people. However, most of the studies focused on delivering factual knowledge or information (e.g., maps, graphs) rather than offering an improved user experience that BLV users can enjoy and interpret subjectively. Thus, we focused on supporting two types of images that are rarely studied for screen reader users: artwork and comics. Here, we demonstrate how these two types of images can be supported and appreciated with improved accessibility through AccessArt [16–18] and AccessComics [19] (see also Table 7).


**Table 7.** A summary of AccessArt and AccessComics following identifying factors used in our systematic review.

### *5.1. AccessArt for Artwork Accessibility*

BLV people are interested in visiting museums and wish to know about artwork [68–70]. However, a number of accessibility issues arise when visiting and navigating inside a museum [71]. While audio guide services are in operation at some exhibition sites [72–75], it can still be difficult for BLV people to understand the spatial arrangement of objects within each painting. Tactile versions of artwork, on the other hand, allow BLV people to learn the spatial layout of objects in a scene by touch [2–5]. However, it is not feasible to make such replicas for every exhibited artwork. Thus, we designed and implemented a touchscreen-based artwork exploration tool called *AccessArt*.

*AccessArt Ver1.* The first version of AccessArt is shown in Figure 3; it included four paintings of varying genres: landscape, portrait, abstract, and still life [17]. To create the object-level labels, we manually segmented each object and wrote a corresponding description. We then developed a web application that allowed BLV users to (1) select one of the four paintings they wished to explore and (2) scan objects within the painting by touch while listening to the corresponding verbal description, including object-level information such as the name, color, and position of the object. For example, if a user touches the moon in "The Starry Night", the system reads out the following: *"Moon, shining. Its color is yellow and it's located at the top right corner"*. Users can either use swipe gestures to go through a list of objects or freely explore objects in a painting by touch to better understand the objects' locations within the image. In addition, users can specify objects and attributes they wish to explore using filtering options. Eight participants with visual impairments were recruited for a semi-structured interview study using our prototype and provided positive feedback.
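As an illustration of this interaction, the sketch below shows how an object-level metadata record could be turned into the spoken description above; the interface and field names are hypothetical and do not reflect the actual AccessArt schema.

```typescript
// Hypothetical object-level metadata record for a painting region
// (field names are our own; the actual AccessArt schema may differ).
interface ArtObject {
  name: string;      // e.g., "Moon"
  attribute: string; // e.g., "shining"
  color: string;     // e.g., "yellow"
  position: string;  // e.g., "top right corner"
}

// Compose the verbal description read out when the object is touched,
// mirroring the example given in the prototype description.
function describeObject(obj: ArtObject): string {
  return `${obj.name}, ${obj.attribute}. Its color is ${obj.color} ` +
         `and it's located at the ${obj.position}.`;
}

// Example: touching the moon in "The Starry Night".
const moon: ArtObject = {
  name: "Moon",
  attribute: "shining",
  color: "yellow",
  position: "top right corner",
};
console.log(describeObject(moon));
// "Moon, shining. Its color is yellow and it's located at the top right corner."
```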

*AccessArt Ver2.* The major problem with the first version of AccessArt was the object segmentation process, which was not scalable because it was done manually by a couple of researchers. Thus, we investigated the feasibility of relying on crowdworkers who were not expected to have expertise in art [16]. We used Amazon Mechanical Turk (https://www.mturk.com/) to collect object-level metadata for eight different paintings from an anonymous crowd. We then assessed the effectiveness of the crowd-generated descriptions with nine participants with visual impairments, who were asked to go through the four steps of the *Feldman Model of Criticism* [76]: *description*, *analysis*, *interpretation*, and *judgment*. The findings showed that object-level descriptions provided by anonymous crowds were sufficient to support BLV people's artwork appreciation.

**Figure 3.** User interface prototype with two interaction modes: *Overview* (**left**) and *Exploration* (**right**). In the *Exploration Mode* example, the *star* is selected as the object of interest and highlighted in various colors.

*AccessArt Ver3.* As a final step, we implemented an online platform (https://artwikihci2020.vercel.app/), as shown in Figure 4. Inspired by Wikipedia (www.wikipedia.org), it is designed to allow anonymous users to volunteer object segmentations and descriptions. While no user evaluation has been conducted with this final version yet, we expect the platform to serve as an accessible online art gallery for BLV people, with metadata collected and maintained by the crowd to support a greater number of artworks, which can be accessed anywhere using one's personal device.

**Figure 4.** Screenshot examples of the main page (**left**) and edit page (**right**) of AccessArt Ver3.

### *5.2. AccessComics for Comics Accessibility*

Compared to artwork accessibility, fewer studies have been conducted to improve the accessibility of comics. For instance, Ponsard et al. [77] proposed a system for people with low vision or motor impairments that automatically retrieves the necessary information (e.g., panel detection and ordering) from images of digital comics and reads out the content on a desktop computer controlled with a TV remote. *ALCOVE* [78] is another web-based digital comic book reader for people with low vision. The authors conducted a user study with 11 people with low vision, and most of them preferred the system over the PDF version of digital comics. Inspired by this work, our system, AccessComics [19], is designed to provide BLV users with an overview (as shown in Figure 5a,b), various reading units (i.e., page, strip, panel), a magnifier, text-to-speech, and autoplay. Moreover, we mapped different voices to different characters to offer a high sense of immersion in addition to improved accessibility, similar to how Wang et al. [79] used a voice synthesis technique that can express various emotional states given scripts. Here, we briefly describe how the system is implemented.
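As a rough illustration of per-character voices, the sketch below assigns browser speech-synthesis voices to character names using the Web Speech API; AccessComics itself relies on a dedicated voice synthesis technique [79], so this is only a simplified stand-in, and the character names are placeholders.

```typescript
// Sketch of per-character voice mapping using the browser's SpeechSynthesis
// API. This is a simplified stand-in for the voice synthesis technique used
// in AccessComics [79], shown purely for illustration.
const characterVoices = new Map<string, SpeechSynthesisVoice>();

function assignVoices(characters: string[]): void {
  // Note: in practice, getVoices() may return an empty list until the
  // browser fires the "voiceschanged" event.
  const voices = window.speechSynthesis.getVoices();
  characters.forEach((name, i) => {
    // Cycle through the available voices so each character sounds distinct.
    characterVoices.set(name, voices[i % voices.length]);
  });
}

function speakLine(character: string, line: string): void {
  const utterance = new SpeechSynthesisUtterance(line);
  const voice = characterVoices.get(character);
  if (voice) utterance.voice = voice;
  window.speechSynthesis.speak(utterance);
}

// Example usage (character names are placeholders):
assignVoices(["Narrator", "Hero", "Villain"]);
speakLine("Hero", "We need to get out of here!");
```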

**Figure 5.** AccessComics user interface: overview (**a**,**b**) and an example panel (**c**).

*Data Preparation*. For the information we wished to provide about comics, we used *eBDtheque* [80], a comics dataset consisting of pairs of an image file (.jpg) and a metadata file (.svg). The metadata contains segmentation information for the *panels*, *characters*, and *balloons* of each comic page. In addition, we manually added visual descriptions, such as the background and the appearance and actions of characters.
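The sketch below illustrates how such metadata could be loaded in a web-based reader, assuming each annotated region is stored as an SVG polygon whose class attribute names its element type; the exact attribute layout of the eBDtheque .svg files may differ, and the URL is a placeholder.

```typescript
// Sketch of loading eBDtheque-style metadata in the browser. We assume each
// annotated region is an SVG <polygon> whose "class" attribute names its
// element type (panel, character, balloon); the actual attribute layout of
// the eBDtheque .svg files may differ.
interface ComicRegion {
  type: string;   // "panel" | "character" | "balloon" (assumed labels)
  points: string; // raw polygon coordinates taken from the SVG
}

async function loadRegions(svgUrl: string): Promise<ComicRegion[]> {
  const svgText = await (await fetch(svgUrl)).text();
  const doc = new DOMParser().parseFromString(svgText, "image/svg+xml");

  return Array.from(doc.querySelectorAll("polygon")).map((poly) => ({
    type: (poly.getAttribute("class") ?? "unknown").toLowerCase(),
    points: poly.getAttribute("points") ?? "",
  }));
}

// Example: count the panels on one page (the URL is a placeholder).
loadRegions("/comics/page-01.svg").then((regions) => {
  const panels = regions.filter((r) => r.type === "panel");
  console.log(`Found ${panels.length} panels`);
});
```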

*Interaction*. Similar to AccessArt, AccessComics allows users to select a comic book they wish to read, navigate between elements within each panel, between panels, and between pages, and listen to the displayed content. For example, for Figure 5c, the following would be played through the audio channel, starting with the panel number and followed by the background and character-related visual descriptions.
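The sketch below shows one way such a narration could be assembled in that order; the data model and field names are hypothetical rather than the actual AccessComics implementation.

```typescript
// Hypothetical panel content record and narration builder, following the
// reading order described above: panel number first, then the background
// description, then character-related descriptions and dialogue.
interface PanelContent {
  index: number;      // panel number within the page
  background: string; // manually added visual description of the scene
  characters: { name: string; description: string; dialogue?: string }[];
}

function buildNarration(panel: PanelContent): string[] {
  const lines: string[] = [`Panel ${panel.index}.`, panel.background];
  for (const c of panel.characters) {
    lines.push(`${c.name}: ${c.description}`);
    if (c.dialogue) lines.push(c.dialogue);
  }
  return lines; // each line can be passed to a text-to-speech call,
                // e.g., the speakLine() sketch shown earlier
}
```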
