*2.1. Image Accessibility*

Screen readers cannot describe an image unless metadata such as alt text is present. To improve the accessibility of images, various solutions have been proposed to provide accurate descriptions for individual images on the web or on mobile devices [12–14,20–22]. Winters et al. [14], for instance, proposed an auditory display for social media that automatically detects the overall mood of an image and the gender and emotion of any faces using Microsoft's computer vision and optical character recognition (OCR) APIs. Similarly, Stangl et al. [12] developed computer vision (CV) and natural language processing (NLP) modules to extract information about clothing images on an online shopping site. Specifically, the CV module automatically generates a description of the entire outfit shown in a product image, while the NLP module extracts the price, material, and description from the web page. Goncu and Marriott [22], on the other hand, demonstrated the idea of having the general public create accessible images using a web-based tool. In addition, Morris et al. [20] proposed a mobile interface that provides screen reader users with rich information about visual content, prepared through real-time crowdsourcing and friend-sourcing rather than machine learning techniques. It allows users to listen to the alt text of a photograph and ask questions using voice input while touching specific regions with their fingers.
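
To make the API-based pipelines above concrete, the sketch below shows how an alt-text-like caption might be requested from a general-purpose vision service. It targets Azure's Image Analysis REST interface (v3.2); the endpoint, key, and caption-selection logic are illustrative assumptions, not the actual implementations of Winters et al. [14] or Stangl et al. [12].

```python
# Minimal sketch: request a natural-language caption (alt-text candidate)
# for an image URL from Azure's Image Analysis REST API (v3.2).
# ENDPOINT and KEY are placeholders for a real Azure resource.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
KEY = "<subscription-key>"  # placeholder

def describe_image(image_url: str) -> str:
    """Return a caption for the image, or a fallback message."""
    resp = requests.post(
        f"{ENDPOINT}/vision/v3.2/analyze",
        params={"visualFeatures": "Description"},
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/json"},
        json={"url": image_url},
        timeout=10,
    )
    resp.raise_for_status()
    captions = resp.json().get("description", {}).get("captions", [])
    # Fall back to a generic label when the model returns no caption.
    return captions[0]["text"] if captions else "No description available."
```

A screen reader front end could read the returned string aloud directly; systems like those above additionally layer face attributes or page-scraped details (price, material) on top of such a caption.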

While most studies on improving the accessibility of digital images focus on collecting metadata that can be read aloud to screen reader users, others have investigated delivering image-related information through tactile feedback [2,4,5,23,24]. Götzelmann et al. [23], for instance, presented a 3D-printed map that conveys geographic information by touch. Others have worked on computational methods to automatically generate tactile representations [3,25]. For example, Rodrigues et al. [25] proposed an algorithm for creating tactile representations of the objects depicted in an artwork, varying their shape and depth. While tactile approaches are promising, we focused on improving image accessibility on touchscreens, which are widely adopted in personal devices such as smartphones and tablets and require no additional hardware.
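
As a rough illustration of such computational methods, the sketch below maps pixel intensity to emboss height to produce a simple tactile height map. It is only one plausible baseline: Rodrigues et al.'s algorithm [25], which varies object shape and depth, is considerably more involved, and the maximum height and smoothing radius used here are assumed values.

```python
# Minimal sketch: convert a 2D image into a per-pixel height map (in mm)
# suitable as input for embossing or 3D printing. Intensity-to-height
# mapping and the constants below are illustrative assumptions.
import numpy as np
from PIL import Image, ImageFilter

MAX_HEIGHT_MM = 2.0  # assumed maximum emboss height

def image_to_height_map(path: str) -> np.ndarray:
    """Return an array of emboss heights, one value per pixel."""
    img = Image.open(path).convert("L")            # grayscale
    img = img.filter(ImageFilter.GaussianBlur(2))  # smooth fine texture
    arr = np.asarray(img, dtype=np.float32) / 255.0
    # Invert so dark strokes (e.g., outlines) become raised ridges.
    return (1.0 - arr) * MAX_HEIGHT_MM
```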
