Article

QuickPic AAC: An AI-Based Application to Enable Just-in-Time Generation of Topic-Specific Displays for Persons Who Are Minimally Speaking

by Christina Yu 1,2,*, Ralf W. Schlosser 1,3, Maurício Fontana de Vargas 4, Leigh Anne White 1, Rajinder Koul 5 and Howard C. Shane 1,2

1 Boston Children’s Hospital, Waltham, MA 02453, USA
2 Massachusetts General Hospital Institute of Health Professions, Boston, MA 02129, USA
3 Department of Communication Sciences and Disorders, Northeastern University, Boston, MA 02115, USA
4 School of Information Studies, McGill University, Montreal, QC H3A 0G4, Canada
5 Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX 78712, USA
* Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2024, 21(9), 1150; https://doi.org/10.3390/ijerph21091150
Submission received: 4 July 2024 / Revised: 3 August 2024 / Accepted: 22 August 2024 / Published: 29 August 2024

Abstract
As artificial intelligence (AI) makes significant headway in various arenas, the field of speech–language pathology is at the precipice of experiencing a transformative shift towards automation. This study introduces QuickPic AAC, an AI-driven application designed to generate topic-specific displays from photographs in a “just-in-time” manner. Using QuickPic AAC, this study aimed to (a) determine which of two AI algorithms (NLG-AAC and GPT-3.5) results in greater specificity of vocabulary (i.e., percentage of vocabulary kept/deleted by clinician relative to vocabulary generated by QuickPic AAC; percentage of vocabulary modified); and to (b) evaluate perceived usability of QuickPic AAC among practicing speech–language pathologists. Results revealed that the GPT-3.5 algorithm consistently resulted in greater specificity of vocabulary and that speech–language pathologists expressed high user satisfaction for the QuickPic AAC application. These results support continued study of the implementation of QuickPic AAC in clinical practice and demonstrate the possibility of utilizing topic-specific displays as just-in-time supports.

1. Introduction

With the advent of mobile technology, the use of applications (“Apps”) in augmentative and alternative communication (AAC) has become integral to the standard of care for persons who are minimally speaking [1] (individuals who are minimally speaking may include persons with developmental disabilities (e.g., autism, intellectual disabilities, etc.), acquired disorders (e.g., aphasia, traumatic brain injury), progressive disorders (e.g., muscular dystrophy), and temporary conditions (e.g., recovering from surgery in the intensive care unit) [2]). Many apps provide a range of tools that serve as a communication platform as well as a medium to provide language support [3]. QuickPic AAC is a new and innovative app that seamlessly blends artificial intelligence (AI) and visual supports to empower minimally speaking individuals who require support generating utterances by selecting graphic representations or text from a display.
QuickPic AAC harnesses the power of AI to interpret visual scenes from a photograph, allowing it to identify characters and their actions. The source of the picture scenes can be a photo library, a fresh photo snapshot, or an internet search. QuickPic AAC then transforms the visual input into a mixed display, which is a combination of the visual scene (photo) and vocabulary elements thematically related to the scene arranged in a grid display [4]. The grid display is arranged in the form of a modified Fitzgerald Key that parses and color-codes the grammatical parts of a sentence [5,6]. QuickPic AAC has the following categories from left to right: pronouns, verbs, prepositions, adjectives, and objects. In other words, after analyzing the photo, QuickPic AAC constructs a grid that strategically places symbols representing the subjects and their activities in the scene. Notably, the app uses facial recognition to identify individuals and retains this knowledge to accurately identify them in future mixed displays. QuickPic AAC also allows instructors to edit and customize the symbols in the grid, ensuring the most accurate representation of the scene. This collaborative and customizable aspect helps ensure that the app’s generated vocabulary not only aligns with the visual content but is also personalized and meaningful to its user, enabling learners to better grasp language concepts.
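To make the layout concrete, the following is a minimal sketch of a mixed display as a color-coded data structure. The class, field names, and color palette are hypothetical illustrations, not taken from the app's actual implementation.

```python
# Illustrative sketch (not QuickPic AAC's code): a mixed display combining a
# scene cue with part-of-speech columns in a modified Fitzgerald Key.
from dataclasses import dataclass, field

# Conventional Fitzgerald-style color coding; the exact palette used by
# QuickPic AAC is an assumption here.
CATEGORY_COLORS = {
    "pronouns": "yellow",
    "verbs": "green",
    "prepositions": "pink",
    "adjectives": "blue",
    "objects": "orange",
}

@dataclass
class MixedDisplay:
    photo: str                                   # path or URL of the scene cue
    columns: dict = field(default_factory=dict)  # category -> list of symbols

    def add_symbol(self, category: str, word: str) -> None:
        # Each symbol carries the background color of its grammatical column.
        self.columns.setdefault(category, []).append(
            {"word": word, "color": CATEGORY_COLORS[category]}
        )

display = MixedDisplay(photo="boy_playing_trains.jpg")
display.add_symbol("pronouns", "I")
display.add_symbol("verbs", "play")
display.add_symbol("objects", "train")
```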
The thematic or topic-specific vocabulary that is arranged in grammatical categories (i.e., based on the Fitzgerald Key) is known as a topic-specific display (TSD). TSDs are a type of aided approach that enable users to communicate appropriately with phrase production in the context of a particular activity [7,8] through the arrangement of linguistic elements or categories on a single page for constructing a sentence. Several studies have demonstrated the importance of considering symbol background color and/or grammatical category when organizing a display [9,10]. More specifically, these studies concluded that “clustering” symbols based on shared background color and/or grammatical category (e.g., subject, verb, noun, adjective) increases the speed at which early communicators can locate a desired symbol. The layout of the mixed displays created in QuickPic AAC reflects these findings, as shown by the organization of the grid display in Figure 1.
Thistle and Wilkinson [11] sought to better understand how speech–language pathologists make decisions regarding the display type and layout characteristics of AAC systems for early communicators. While 83% of participants indicated they use visual scene displays (VSDs) less than 25% of the time, 60% of participants indicated they would use a VSD for an early communicator rather than a grid-based display. This finding may suggest that while many speech–language pathologists see the value in utilizing a scene-based approach, the time it takes to create such displays may limit the frequency with which these valuable tools are used. Thistle and Wilkinson’s study [11] also captured how much variance there is among speech–language pathologists in their desired characteristics for AAC displays created for early communicators. For example, several participants in their study indicated preferences regarding the vantage point of the photographs used, the background of photographs, and the size of the display. QuickPic AAC aims to make creating displays incorporating real photographs much more efficient, and therefore more likely to be used in clinical practice. If successful, this may also afford speech–language pathologists endless opportunities for customization to meet the specific needs of the individual.
Traditionally, developing meaningful and functional TSDs has required significant advanced planning and programming, and, therefore, time, from mentors working with individuals who are minimally speaking. For example, imagine a teacher is planning to introduce a new science lesson on a particular forest biotope in the weeks ahead. In addition to planning the lesson in general terms, this teacher would need to gather all the vocabulary needed for the student who is minimally speaking, and then organize it in an intuitive way, so that the student can be an effective participant in that lesson.
Because the QuickPic AAC app enables automatic generation and organization of vocabulary from a single photograph, one could upload a photo of a forest biotope, and the app would automatically generate and organize the vocabulary in the form of graphic symbols (e.g., Picture Communication Symbols). If functional, this would save the mentor considerable time and elevate TSDs into the realm of just-in-time supports (JITs) [12,13], something previously unthinkable given the advanced planning and time-consuming preparations required.
There is a significant gap in research related to AAC applications and their use in clinical practice. While there is a body of evidence supporting the effectiveness of speech-generating devices (SGDs) [14,15,16] as a general category of communication tools, there is limited research on the specific features, functions, and algorithms used in AAC applications. In addition, there is significant and growing interest in and discussion of the use of AI in the field of speech–language pathology, particularly in the subfield of AAC [17], as evidenced by many AAC companies attempting to include AI software in their products. However, the effective integration of AI into AAC tools has not been well studied empirically [18]. This study aims to address some of these gaps in research.
Usability testing is a critical element in the product development cycle of apps in mobile health (mHealth) and education [19]. There are a host of methods available for usability testing, including questionnaires, think aloud walkthrough, task completion, interviews, focus groups, heuristic testing, and automated methods. A recent scoping review [20] revealed that most usability studies in eHealth use a combination of at least two of these methods, and the overall order in terms of frequency of use was as follows: questionnaires (n = 105), task completion (n = 57), ‘Think-Aloud’ (n = 45), interviews (n = 37), heuristic testing (n = 18), and focus groups (n = 13).
Using a combination of quantitative and thematic analysis methods, this study aimed to (a) determine which of two AI algorithms (NLG-AAC and GPT-3.5) results in more relevant vocabulary with the QuickPic AAC application; and (b) evaluate the perceived usability of QuickPic AAC among practicing speech–language pathologists.

2. Methods

2.1. Participants

Participants included eight speech–language pathologists (SLPs), ranging in age from 25 to 64 years, based in an outpatient pediatric hospital: four participants were between 25 and 34 years old, three were between 35 and 44 years old, and one was between 55 and 64 years old. In order to be included, participants had to meet the following criteria: (a) an active American Speech-Language-Hearing Association (ASHA) Certificate of Clinical Competence for Speech-Language Pathologists (CCC-SLP); (b) a minimum of one year of experience working with individuals who use AAC or individuals who might benefit from AAC; and (c) experience in having created at least one TSD. Participants were recruited based upon convenience sampling within an outpatient AAC center in the Northeast of the United States. Table 1 provides an overview of participant characteristics.
The Institutional Review Board considered this study as exempt because it is limited to research activities in which the disclosure of the human subjects’ responses outside the research did not reasonably place the subjects at risk of criminal or civil liability or was not damaging to the subjects’ financial standing, employability, educational advancement, or reputation. Participants provided verbal consent.

2.2. Materials

Materials included (a) a tablet (i.e., iPad Pro) and the QuickPic AAC iOS application; (b) QuickPic AAC Reference Guide (see Appendix A); (c) the Demographic and AAC Experience Questionnaire (see Appendix B); (d) a vignette; (e) photographs; and (f) two usability questionnaires.
Tablet and QuickPic AAC. The QuickPic AAC app ran on an iPad Pro. The app evolved from an earlier prototype described in Fontana de Vargas et al. [21]. QuickPic AAC employs two different approaches to generate vocabulary automatically. The first approach, proposed by Fontana de Vargas and Moffatt [21] (now termed NLG-AAC), uses the Visual Storytelling Dataset (VIST) [22] as the main source of vocabulary. VIST is composed of 65,394 photos of personal events, grouped into 16,168 stories. Each photo is annotated with captions and narrative phrases that are part of a story, created by Amazon Mechanical Turk workers. The NLG-AAC method works by first identifying the photographs in VIST that are most similar to the input photograph. This is accomplished by calculating the sentence similarity between the input photo caption, generated using the computer vision technique from Fang et al. [23], and all VIST photo captions. The method then retrieves all stories associated with those photographs and finds the most relevant words to present in QuickPic AAC by applying the Affinity Propagation clustering algorithm [24] and, finally, gathering the top 50 most frequent words in the identified clusters.
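The following is a minimal sketch of this pipeline, assuming TF-IDF cosine similarity as a stand-in for the published sentence-similarity measure and character n-gram features as a stand-in for the word representations used for clustering; the function name and the 20-photo cutoff are illustrative, not from the published implementation.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def nlg_aac_vocabulary(input_caption, vist_captions, vist_stories, top_k=50):
    """Return up to top_k candidate display words for an input photo caption."""
    # 1. Rank VIST photos by caption similarity to the input photo's caption
    #    (TF-IDF cosine similarity stands in for the published measure).
    vec = TfidfVectorizer().fit(vist_captions + [input_caption])
    sims = cosine_similarity(
        vec.transform([input_caption]), vec.transform(vist_captions))[0]
    nearest = np.argsort(sims)[::-1][:20]  # assumed cutoff of 20 photos

    # 2. Pool the words of every story attached to those photos.
    pooled = [w for i in nearest for w in vist_stories[i].lower().split()]
    vocab = sorted(set(pooled))

    # 3. Cluster the candidate words with Affinity Propagation (character
    #    n-gram features stand in for real word representations), then return
    #    the most frequent words drawn from the identified clusters.
    feats = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3)).fit_transform(vocab)
    labels = AffinityPropagation(random_state=0).fit_predict(feats.toarray())
    keep = {w for w, lab in zip(vocab, labels) if lab >= 0}
    counts = Counter(w for w in pooled if w in keep)
    return [w for w, _ in counts.most_common(top_k)]
```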
The second approach, named GPT-AAC, takes advantage of recent advancements in natural language processing (NLP), a subfield of AI. More specifically, the method prompts the large language model (LLM) GPT-3.5 to produce the desired set of words related to the input photo caption (which is created using the method from Fang et al. [23], as in NLG-AAC). The prompt used by the method is shown below:
“You are a Speech Language Pathologist specialized in Augmentative and Alternative Communication.”
“Your task is to provide vocabulary related to a situation to help a person with communication disability to formulate messages about the situation. This vocabulary must contain words that people would often use to talk about that situation, either to describe it as well as to tell a story about it.”
“The vocabulary must contain 20 verbs, 20 descriptors (adjectives and adverbs not terminating with LY), 20 objects, and 15 prepositions.”
“All words must be in the first person singular, infinitive form without ‘to’.”
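For illustration, here is a minimal sketch of how an app might send this prompt to GPT-3.5 via the OpenAI Python client. The model identifier, message layout, and caption formatting are assumptions; the study does not report the app's actual integration code, and the prompt text below is condensed from the quotes above.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Prompt condensed from the instructions quoted above.
SYSTEM_PROMPT = (
    "You are a Speech Language Pathologist specialized in Augmentative and "
    "Alternative Communication. Your task is to provide vocabulary related "
    "to a situation to help a person with communication disability to "
    "formulate messages about the situation. The vocabulary must contain "
    "20 verbs, 20 descriptors (adjectives and adverbs not terminating with "
    "LY), 20 objects, and 15 prepositions. All words must be in the first "
    "person singular, infinitive form without 'to'."
)

def gpt_aac_vocabulary(photo_caption: str) -> str:
    """Ask GPT-3.5 for topic vocabulary given an automatically generated caption."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model identifier for "GPT-3.5"
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            # Hypothetical user-message format for passing the photo caption.
            {"role": "user", "content": f"The situation: {photo_caption}"},
        ],
    )
    return response.choices[0].message.content
```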
In both NLG-AAC and GPT-3.5, the underlying code for generating vocabulary output for the TSDs, as noted above, remains the same. When participants customize the vocabulary settings (i.e., the number of icons shown per part of speech), the displayed vocabulary is adjusted according to those customizations. This allows the display to be tailored to each user by the number of items per part of speech (e.g., subjects, verbs, etc.) while the underlying prompt remains consistent and continues to guide the vocabulary generation process. Readers interested in the app design process through the lens of human–computer interaction research may reference the paper by Fontana de Vargas et al. [18].
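A hypothetical sketch of this settings-driven trimming is shown below; the function and the settings dictionary are invented for illustration and are not taken from the app's source.

```python
def apply_display_settings(vocab_by_pos, icons_per_pos):
    """Trim each part-of-speech list to the user's configured icon count."""
    # Keep only the first N items per part of speech, in generated order.
    return {pos: words[: icons_per_pos.get(pos, len(words))]
            for pos, words in vocab_by_pos.items()}

trimmed = apply_display_settings(
    {"verbs": ["play", "push", "build", "crash", "race"],
     "objects": ["train", "track", "car", "bridge"]},
    {"verbs": 4, "objects": 4},
)
# trimmed == {"verbs": ["play", "push", "build", "crash"],
#             "objects": ["train", "track", "car", "bridge"]}
```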
The QuickPic AAC Reference Guide. The QuickPic AAC Reference Guide is a set of instructions that is available within the app (see Appendix A). For the purposes of this paper, the following terminology is adopted to describe aspects of QuickPic AAC communication displays (see Figure 1): (a) topic-specific display: thematic or topic-specific vocabulary that is arranged in grammatical categories (subject, verb, object, etc.); (b) static scene cue: a photograph of a single activity and/or concept [4,25]; (c) mixed display: a display containing a scene cue combined with a topic-specific display [4].
Demographic and AAC Experience Questionnaire. The Demographic and AAC Experience Questionnaire (Appendix B) elicited key demographic data (e.g., years as a practicing SLP) and previous experience with AAC, including participants’ perspectives on TSDs.
Vignette. The vignette was a prewritten case study that informed participants of the context in which they would be creating the TSDs. This was provided to all participants to read prior to creation of a TSD with QuickPic AAC:
You are a speech–language pathologist in an outpatient pediatric setting and have a 7;2-year-old male patient with a primary diagnosis of autism spectrum disorder, level 3. Medical history includes no functional concerns regarding vision, hearing, or motor status. Receptive language skills include strong comprehension of noun-based vocabulary and the ability to follow single-step directions within familiar contexts. Expressive language skills include scripted phrases (e.g., I want __) and single-word approximations to label. Aided communication strategies include a grid-based communication application used primarily for requesting, labeling, and protesting. A goal of speech therapy is commenting/describing using 3-word utterances. A highly preferred activity/topic of conversation is cars/trains. Based upon this case study, create a QuickPic AAC display revolving around cars/trains using the ‘search [the web]’ function.
Photographs. As noted in the instructions within the vignette, participants were asked to choose one photograph to use in both conditions via the “Search the Web” feature of QuickPic AAC. One participant chose a photo of a sports car on the road, and one chose a photo of two boys playing with trains together. Each of the remaining three photos was chosen by two participants: a photo of two boys playing with race cars together, a photo of a boy playing with a wooden train set on the floor, and a photo of a boy playing with cars and trucks on a hardwood floor. Some participants chose identical photos from the web searches, likely because these images appeared first among the initial search results.
Two Usability Questionnaires. The Mobile Health (mHealth) App Usability Questionnaire (MAUQ) [26] and a questionnaire adapted from Fontana de Vargas et al. [21] were administered. The MAUQ [26] (see Appendix C) was used to assess the usability of QuickPic AAC with its two approaches. The MAUQ has adequate psychometric characteristics and uses a 7-point Likert scale across 18 items. The MAUQ was adapted minimally to meet the specific needs of our user study. Specifically, one question was eliminated (i.e., “I could use the app even when the Internet connection was poor or not available”), as the QuickPic AAC application requires internet connectivity. Additionally, one question was modified from “This mHealth app provides an acceptable way to deliver healthcare services, such as accessing educational materials, tracking my own activities, and performing self-assessment” to “This app provides an efficient way to create visual supports, such as educational, speech-language therapy, and language learning materials”.
The second questionnaire used in this study was adapted from Fontana de Vargas et al. [21] (see Appendix D). This questionnaire captures how participants perceived the quality of three different areas of the application: interaction, vocabulary quality, and overall usage. Modifications were made to serve the specific purposes of this study’s objectives. First, terminology was adapted across the entire survey from third-person (e.g., “Users could easily select a desired vocabulary item within a page”) to first-person language (e.g., “I could easily select a desired vocabulary item within a page”). Within the “Interaction” subsection, two items related to the creation of previous communication boards were eliminated, as they did not pertain to the objectives of this research (i.e., “Users tended to access/use vocabulary from previously created pages”, “Users tended to access/use vocabulary from newly created pages”). Within the “Vocabulary” subsection, one question was modified from “The generated vocabulary included words users did not want to use” to “The vocabulary generated included words I would not have thought of that are relevant”. In addition, three items were added: “The order the vocabulary was presented was adequate”, “The vocabulary generated included words I would target during educational and/or speech therapy sessions”, and “Overall the vocabulary generated is effective in helping me achieve targeted goals for my use”. Lastly, within the “Usage” subsection, one item was modified: “Users were more communicative using the application than they usually are using other AAC tools” was changed to “I created topic-specific displays using this application more efficiently than with other AAC tools”. In addition to the two questionnaires, five open-ended questions related to overall experience and vocabulary generation across the two conditions were administered.

2.3. Design and Measures

A descriptive usability study was completed to evaluate the feasibility of using AI to generate relevant vocabulary for TSDs. This prospective design is consistent with a case series [27] in that the SLPs were exposed to QuickPic AAC with the two AI approaches following the reading of the vignette, and the outcomes were monitored with observations and via questionnaire. Two dependent variables were measured: (a) specificity of the vocabulary generated across the two AI conditions (i.e., the natural language generation (NLG) approach based on Fontana de Vargas and Moffatt [21], and the GPT-3.5 approach) and (b) user satisfaction. The specificity of the vocabulary generated was measured in terms of percentages as follows: (a) vocabulary/icons kept by the participant for the final TSD relative to vocabulary/icons originally produced by the AI; (b) vocabulary kept for the final TSD but with icons altered by the participant, out of the total number of vocabulary items kept (alteration may involve the participant choosing a different icon to represent the vocabulary identified or moving the existing icon to a different column in the display); and (c) vocabulary/icons deleted by the participant from the final TSD relative to vocabulary/icons originally produced by the AI (measures (a) and (c) are inversely related).
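As a worked example of these three measures, consider the sketch below; the counts are hypothetical and are not drawn from the study data.

```python
def specificity(generated: int, kept: int, altered: int) -> dict:
    """Compute the three specificity percentages defined above."""
    deleted = generated - kept
    return {
        "kept_pct": 100 * kept / generated,        # kept relative to generated
        "deleted_pct": 100 * deleted / generated,  # inverse of kept_pct
        "altered_pct": 100 * altered / kept,       # altered out of those kept
    }

# e.g., 40 icons generated, 25 kept, 2 of the kept icons altered:
print(specificity(generated=40, kept=25, altered=2))
# {'kept_pct': 62.5, 'deleted_pct': 37.5, 'altered_pct': 8.0}
```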
Overall user satisfaction with each condition served as the second dependent variable, measured by two questionnaires as described previously.

2.4. Procedures

2.4.1. Demographic and AAC Experience Questionnaire

Upon enrollment in this study, participants completed a questionnaire regarding pertinent demographic information and previous experience with AAC (Appendix B). In addition, two brief questions were asked regarding their perspectives towards the benefits of TSDs and the challenges behind the creation of TSDs.

2.4.2. Tutorial

Participants engaged in a two-part tutorial process. Participants were initially provided a printed QuickPic AAC Reference Guide and asked to read through it independently to familiarize themselves with the functions of QuickPic AAC. Subsequently, each participant individually took part in a live tutorial session led by the examiner, during which each feature in the reference guide was demonstrated, including: creating a new board, editing a board (i.e., adding and deleting icons), editing an individual button, changing an individual button’s background color, locating a saved board, customizing “My Album”, and tips and tricks for creating boards. Participants were able to use any of the features listed in the QuickPic AAC Reference Guide to customize their TSD, including adding icons, modifying/editing an existing icon, deleting icons, rearranging icons, etc.

2.4.3. Experimental Task

Following the tutorial phase, each participant received instructions to generate two TSDs with QuickPic AAC utilizing two separate approaches. Participants were aware that the purpose of this study was to determine which approach generated more appropriate vocabulary. The two approaches encompassed the NLG method and the GPT-3.5 model. Participants remained blind to both conditions, and the sequence of conditions was randomized amongst participants to mitigate potential order-related effects. The creation of TSDs under both conditions for all participants was screen recorded using the built-in screen recording feature of the iPad. To initiate a recording, the examiner opened the Control Center by swiping down from the upper-right corner of the screen and selected the “Screen Recording” icon. The standard, built-in iOS three-second countdown signaled the start of the recording to the participants. The examiner stopped the recording when participants indicated they had completed each TSD. For anonymity purposes, the recordings did not include sound and only captured the visual content on the screen. This allowed for data analysis to identify the vocabulary selections deemed relevant by participants across both conditions. Participants were provided with explicit instructions for using the app based on the QuickPic AAC Reference Guide (see Appendix A) and the previously described vignette to create one display under each of the two conditions. Additionally, participants were instructed to determine the settings of the app that best suited the child depicted in the vignette, including the number of items populated within each part of speech (e.g., subjects, verbs, prepositions, descriptors, and objects), the number of columns available for each part of speech, message bar size, and size of the input photo.

2.4.4. Usability Questionnaires

Following the creation of the two mixed displays, participants individually completed a modified version of the MAUQ and a post-questionnaire. These questionnaires were completed independently, either directly after the QuickPic AAC experience or within 24 h of using the application. The questionnaires needed to be completed within this timeframe to ensure that participants’ experiences and impressions of the task remained recent, in order to obtain accurate and reliable feedback. This alleviated recall bias, which can occur if participants forget details, and helped prevent participants from discussing their experience with others, mitigating social desirability bias. This standardized response window, maintained across all participants, enhanced the comparability of the data. The post-questionnaires allowed participants to report their experiences of the two conditions facilitated by the NLG-AAC approach and the GPT-3.5 approach.

2.5. Data Analysis

Data on the perceived benefits and barriers to creating TSDs (AAC Experience Questionnaire) were analyzed descriptively (the small sample size precluded statistical analysis) by calculating the number of participants who were in support of statements on benefits and barriers, respectively.
Relevant vocabulary was analyzed using simple descriptive summary statistics for each of the approaches (NLG-AAC and GPT-3.5) in terms of specificity. This includes the range, mean, and standard deviation (SD) of the ratios (i.e., percentages) of the vocabulary kept, the vocabulary deleted, and the icons that were modified. As the sample size was small, the data were analyzed using the Friedman nonparametric test for several related samples [28]. This test analyzes data for significant differences among the mean ranks for the dependent variables (i.e., vocabulary kept, vocabulary deleted, vocabulary/icons modified). Significant differences were analyzed using the Wilcoxon signed-rank test [29]. The Bonferroni correction was applied in order to reduce Type I error.
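A sketch of this analysis pipeline in SciPy follows; the per-participant percentages are hypothetical stand-ins for the study data, and the three planned contrasts mirror the kept/deleted/modified measures.

```python
from scipy.stats import friedmanchisquare, wilcoxon

# One value per participant (n = 8) per measure and condition (hypothetical).
kept_nlg = [6.7, 25.0, 33.3, 40.0, 45.2, 50.0, 58.1, 64.5]
kept_gpt = [33.3, 45.0, 50.0, 55.0, 60.0, 66.7, 80.0, 100.0]
del_nlg = [100 - k for k in kept_nlg]
del_gpt = [100 - k for k in kept_gpt]
mod_nlg = [0.0, 2.0, 3.0, 3.5, 4.0, 5.0, 6.0, 6.3]
mod_gpt = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 31.3]

# Omnibus test across the six related samples (3 measures x 2 conditions).
stat, p = friedmanchisquare(kept_nlg, kept_gpt, del_nlg, del_gpt, mod_nlg, mod_gpt)
print(f"Friedman: chi2={stat:.3f}, p={p:.4f}")

# Three planned pairwise contrasts, Bonferroni-corrected (alpha = 0.05 / 3).
alpha = 0.05 / 3
for name, a, b in [("kept", kept_nlg, kept_gpt),
                   ("deleted", del_nlg, del_gpt),
                   ("modified", mod_nlg, mod_gpt)]:
    w, p_pair = wilcoxon(a, b)
    print(f"{name}: W={w:.1f}, p={p_pair:.4f}, significant={p_pair < alpha}")
```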
Data on overall usability were also analyzed using simple descriptive summary statistics for both surveys (MAUQ and post-questionnaire) for each of the conditions (NLG-AAC and GPT-3.5), including the range, mean, and standard deviation (SD) of the scores in both surveys. Further analysis was conducted within the post-questionnaire. Item analysis was performed by calculating means across all eight participants per item. Sub-group analysis was performed by calculating means across items within the three subgroups. Finally, thematic analysis of the open-ended questions was conducted to reveal overall usability.
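The item- and sub-group-level averaging might look like the following pandas sketch; the items, subgroup labels, and ratings are hypothetical.

```python
import pandas as pd

# Long-format responses: one row per participant x item; values hypothetical.
df = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2, 2],
    "item":        ["symbol set", "words wanted", "enjoyment"] * 2,
    "subgroup":    ["interaction", "vocabulary", "usage"] * 2,
    "rating":      [6, 5, 7, 5, 6, 6],
})

item_means = df.groupby("item")["rating"].mean()          # item-level analysis
subgroup_means = df.groupby("subgroup")["rating"].mean()  # sub-group analysis
```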

3. Results

3.1. Perspectives on Benefits of and Barriers in Creating TSDs

Users’ perspectives on the perceived benefits of and barriers to creating TSDs in general (i.e., without QuickPic AAC) were revealed through an analysis of the AAC Experience Questionnaire. Participants responded to perceived benefits of TSDs (Table 2) and barriers in creating TSDs without QuickPic AAC (Table 3). At a group level, 8/8 (100%) participants agreed with the following benefits of TSDs: (a) facilitates expansion of utterance length, (b) helps with addressing communication goals in sessions, (c) helps with modeling of vocabulary, and (d) increases my client’s ability to communicate about a specific topic. Additionally, 6/8 (75%) participants agreed that TSDs increased the fluidity of communicating about the specific topic at hand. In terms of barriers to creating TSDs without QuickPic AAC, 8/8 (100%) participants reported that the time it takes to create TSDs was a barrier to including them in sessions. There was more discrepancy in relation to other perceived barriers: (a) 3/8 (37.5%) participants found it challenging to create visually appealing TSDs and were unsure of the organization, framework, and guidelines for creating TSDs to include in sessions; (b) 2/8 (25%) participants reported that it was challenging to identify vocabulary and language to target using TSDs; and (c) 1/8 (12.5%) participants reported that they did not have the resources (i.e., apps, software) to create TSDs.

3.2. Group-Level Descriptive Results

Vocabulary/Icons Kept. Participants were asked to read the vignette and create TSDs under both conditions using the QuickPic AAC app. Vocabulary/icons kept by the participants ranged from 6.67% to 64.52% (M = 38.55%; SD = 20.45%) for NLG-AAC and from 33.33% to 100% (M = 58.04%; SD = 21.89%) for GPT-3.5. Across 6/8 or 75% of participants, a greater percentage of vocabulary/icons was kept in the GPT-3.5 condition (Figure 2).
Vocabulary Kept, but with Icons Altered. Some vocabulary was kept by participants, but they chose either to alter the icon representing the vocabulary item or to place the icon into a different column of the Fitzgerald Key layout of QuickPic AAC. Icons altered by the participants ranged from 0% to 6.25% (M = 3.38%; SD = 2.97%) for NLG-AAC and from 0% to 31.25% (M = 5.01%; SD = 10.82%) for GPT-3.5. Thus, slightly more icons were kept but altered with GPT-3.5.
Vocabulary/Icons Deleted. Vocabulary/icons deleted by the participants ranged from 35.48% to 86.67% (M = 58.06%; SD = 19.23%) for NLG-AAC and from 0% to 66.67% (M = 36.94%; SD = 23.34%) for GPT-3.5. Thus, considerably more vocabulary was deleted with NLG-AAC relative to GPT-3.5.

3.3. Group-Level Inferential Results

A Friedman test was conducted to determine if there were statistical differences across conditions (i.e., NLG-AAC, GPT-3.5) among the mean ranks of the vocabulary kept, the vocabulary deleted, and the vocabulary modified. A statistically significant difference was found; χ2(5, n = 8) = 26.113, p < 0.001. This indicates there were differences among the six mean ranks. Three orthogonal contrasts were performed with Wilcoxon tests. For vocabulary kept, the contrast between NLG-AAC (M rank = 3.88) and GPT-3.5 (M rank = 4.75) was significant (p < 0.05). For vocabulary deleted, the contrast between NLG-AAC (M rank = 5.13) and GPT-3.5 (M rank = 3.88) was significant (p < 0.05). For vocabulary modified, no significant difference was observed between NLG-AAC (M rank = 1.94) and GPT-3.5 (M rank = 1.44) (p > 0.05).

3.4. Individual Participant Results

In addition to examining group-level data, it is pertinent to examine participant-level data. Appendix E displays the finalized TSDs each participant created under each condition, with the individual vocabulary/icons kept (circled in red), kept but modified (circled in yellow), or deleted (circled in blue).
Overall Usability. All eight participants completed two post-questionnaires related to overall experience and satisfaction, comparing their experiences between the NLG-AAC and GPT-3.5 conditions. Results from the MAUQ are depicted in Figure 3. On a group level, usability scores ranged from 2.41 to 7.00 (M = 4.77; SD = 1.33) for the NLG-AAC condition and from 4.12 to 7.00 (M = 5.47; SD = 0.86) for the GPT-3.5 condition.
The second post-questionnaire participants completed was adapted from Fontana de Vargas et al. [21]. Results from this post-questionnaire are shared in Figure 4. Overall usability scores for the NLG-AAC condition ranged from 3.69 to 5.38 (M = 4.80, SD = 0.64), while scores for the GPT-3.5 condition ranged from 4.12 to 6.75 (M = 5.82, SD = 0.65). These scores demonstrate overall higher usability for the GPT-3.5 approach, reinforcing the pattern obtained from the MAUQ scores.
To give a more detailed perspective, the post-questionnaire results were also analyzed at the item level and sub-group level (i.e., interaction, vocabulary generation, and overall usage). Figure 5 provides these results in detail. Item analysis was obtained by calculating averages across all eight participants per item. Sub-group analysis was obtained by calculating averages across items within the three subgroups: interaction, vocabulary generation, and overall usage. The vocabulary generation sub-group demonstrated the most noticeable difference between the NLG-AAC and GPT-3.5 conditions, with an overall greater score in the GPT-3.5 condition.
Lastly, participants were asked open-ended questions about their experience using QuickPic AAC. Results from the open-ended questions on overall experience are presented in Appendix F, while use case scenarios from all of the participants are provided in Table 4. All responses are reported verbatim, unless indicated otherwise through the inclusion of brackets. Because condition order was randomized and participants were unaware of which algorithm was which, they reported on their experiences as Experience A and Experience B; brackets are used to clarify the condition (i.e., NLG and GPT-3.5) referenced by each participant.
Responses across participants reveal a general consensus on the feasibility and usability of QuickPic AAC in creating TSDs. An overall theme across participant reports was that the app offered a quick and easy way to create TSDs. Notably, two participants commented that their experience with QuickPic AAC surpassed alternative AAC apps (i.e., Boardmaker, TouchChat HD-AAC). Users noted that it was beneficial that QuickPic AAC provided a starting point for creating TSDs, increasing the rate at which TSDs could be created. Lastly, users commented on QuickPic AAC’s intuitive interface, emphasizing its ease of use and the ease of the editing process. Overall, these responses demonstrate QuickPic AAC’s ability to streamline the creation of TSDs.

4. Discussion

As artificial intelligence (AI) makes significant headway in various arenas, the field of speech–language pathology is at the precipice of experiencing a transformative shift towards automation. This study aimed to introduce QuickPic AAC, an AI-driven application designed to generate topic-specific displays (TSDs) just-in-time from photographs. Specifically, the purpose of this study was to (a) determine which of two AI algorithms (NLG-AAC and GPT-3.5) results in more relevant vocabulary with the QuickPic AAC application; and (b) evaluate the perceived usability of QuickPic AAC among practicing speech–language pathologists. The data provide statistically significant evidence that GPT-3.5 generates more relevant vocabulary, in that it consistently results in more vocabulary kept for final TSDs and less vocabulary deleted. It is noteworthy that the more vocabulary is kept, the less editing is needed, and therefore the less time is needed to create personalized TSDs. In general, SLPs expressed overall high satisfaction in using QuickPic AAC. QuickPic AAC’s ability to swiftly create user-friendly TSDs may pave the way for other AI-driven tools to enhance language intervention strategies.
A primary focus of this study was the quality of appropriate vocabulary generated through a specific controlled use case scenario (i.e., vignette). Overall, our findings showed that different AI algorithms provide varied vocabulary based on the same stimulus (i.e., a photograph) and that, in general, the GPT-3.5 algorithm provided more relevant vocabulary based upon SLPs’ judgments. A noteworthy discussion point is the large SD in the percentage of relevant vocabulary kept for both conditions, suggesting wide variation in the number of vocabulary items that participants deemed relevant to keep. Some participants retained a relatively low percentage of icons (i.e., NLG-AAC: 6.67%, GPT-3.5: 33.33%), while others kept a considerably higher percentage (i.e., NLG-AAC: 64.52%, GPT-3.5: 100%). From a clinical standpoint, this presents an interesting finding, as it indicates the perceived relevance or importance of vocabulary may not be consistent amongst SLPs.
While the statistical analysis offers valuable insights, there are additional interesting qualitative observations. For example, the symbols “baby” (representing the vocabulary “young”) and “old man” (representing the vocabulary “old”) frequently appeared in the vocabulary generated by the NLG-AAC algorithm but were consistently deleted by all participants, indicating this vocabulary was not appropriate to target for the particular child described in the vignette. From an app programming perspective, the NLG-AAC algorithm tended to generate “old” and “young” whenever it identified that a child was present in the photo. In contrast, the GPT-AAC algorithm did not generate these words, which aligns with the greater satisfaction reported for GPT-AAC from a clinical standpoint. This qualitative observation further supports our finding that the GPT-AAC algorithm generates more contextually appropriate vocabulary, as judged by SLPs, suggesting it may be more effective in clinical applications.
Another interesting observation involves the vocabulary generated by the algorithms for photographs with and without a human figure. Specifically, Participant #1 selected a photograph of a car that does not feature any people, while all other participants selected photographs of vehicles with at least one person. Notably, this participant’s trial had the lowest number of deleted icons across all NLG-AAC trials. This trend is also consistent across the GPT-3.5 trials, with the exception of Participant #8, who did not delete any items when using the GPT algorithm. This suggests that the absence of humans in the selected photographs may influence the relevance of the generated vocabulary, leading to fewer deletions and potentially indicating more appropriate vocabulary generation. Future studies may focus on the selection of photograph stimuli with and without humans and compare the relevance of the vocabulary content.
Further, it is important to consider the consistency of algorithm-generated vocabulary across different participants who selected the same images. A total of five different photographs were selected by the eight participants, serving as input stimuli. This means three photographs were selected by more than one participant, allowing for a comparison of algorithm consistency (NLG-AAC, GPT-3.5). Specifically, participants #4 and #8 both selected a photo depicting a boy playing with trains on a track, participants #3 and #5 both selected a photo of two boys playing with race cars, and participants #6 and #7 both selected a photo of a boy playing with cars on a wooden floor. There was variability in the TSD arrangement (i.e., grid size, number of columns assigned per part of speech, and number of icons generated per part of speech) because participants were instructed to adjust the settings to best suit the child depicted in the vignette. However, there was no variability in the vocabulary generated by either algorithm when the same photograph and settings were selected. For example, participants #4 and #8 both selected the photograph of the boy playing with trains on a track. In the NLG-AAC condition, Participant #4’s settings included up to four icons and one column per part of speech, while Participant #8’s settings included up to eight icons and two columns per part of speech. Despite these differences affecting the aesthetics of the TSD, all of the icons generated in Participant #4’s TSD were also generated in Participant #8’s TSD, and in the same order. This was observed consistently across both NLG-AAC and GPT-3.5 conditions in all three instances. This is further confirmed by participants #3 and #5, who both selected the photo of the two boys playing with race cars: in the NLG-AAC condition, both participants’ settings were the same (i.e., up to four icons for each part of speech), and the vocabulary generated for both participants was consistent and in the same order.
Because the photographs were the same within each participant (for both conditions), we controlled for threats to internal validity due to item difficulty in the within-participant comparisons between the two conditions (NLG-AAC and GPT-3.5).
Another primary focus of our study was the overall satisfaction and usability of QuickPic AAC amongst SLP professionals. As discussed previously, the personalized creation of TSDs has a myriad of benefits reported by speech–language pathologists. These advantages include the expansion of utterance length, aiding clinicians in targeting specific communication goals and objectives during sessions, facilitating effective vocabulary modeling, supporting aided language stimulation, and increasing the ability to communicate about a specific topic or activity. While the advantages are apparent, certain barriers were identified in the integration of TSDs, with time constraints being the primary obstacle to incorporating TSDs into SLPs’ sessions. Our overall findings demonstrate that SLPs were satisfied using an AI-driven app to create TSDs, as it was a quick and efficient way to personalize communication materials for their clients.

5. Limitations and Future Directions

While our preliminary findings are promising in demonstrating the use of AI in speech–language pathology to create TSDs, several limitations need to be recognized. First and foremost, one limitation pertains to the use of different photographs across participants. With the exception of the participant pairs who happened to select the same photos (as described above), the photographs were not kept consistent across all eight participants. Thus, the nature of the photographs may have introduced an extraneous variable that influenced the outcomes. Future research should keep the photographs constant across participants or match the nature of the photos across participants. Relatedly, it is not yet known whether the nature of the scenes displayed in the photographs affords better or worse AI-powered generation of vocabulary. In the current study, participants used QuickPic AAC with only one input photograph across the two algorithms. Future research should strive to have participants use multiple input photographs to enhance external validity.
Similarly, the current design allowed participants to engage with QuickPic AAC using different settings (i.e., selecting the number of icons per part of speech), which may introduce variability into their decision-making processes. Future studies should use standardized settings to disentangle the usability of the application itself from the appropriateness of the vocabulary generated by the AI algorithms.
An additional limitation pertains to the small sample size and single-site recruitment. The eight participants were recruited at the same institution to allow for consistent implementation and evaluation of the application within a controlled environment. As an exploratory study, our goal was to gain insights into the potential benefits of two AI algorithms to create TSDs. As such, a smaller sample size was appropriate to meet these objectives and to identify whether further studies were warranted. This leads us to further discuss the heterogeneous characteristics of the participants in this study (e.g., chronological age, years practicing as an SLP, frequency working with individuals who use AAC, etc.). For instance, frequency of working with AAC users may play an influential role in their familiarity with various AAC tools and applications and their comfort level in interacting with technologies. Years of experience may impact their understanding of perspectives and strategies of using AAC tools. Having a broad range in characteristics likely reflects the population characteristics of those clinicians working in hospital settings and therefore is a net positive. However, the small sample size does negatively impact generalizability and requires future research to expand the external validity of findings.
Importantly, QuickPic AAC is meant to be an additive AAC tool, used in conjunction with an individual’s primary AAC system to enhance communication related to specific topics of interest. Examples of use could include sharing information about weekend news, describing an activity that occurred at school, or discussing a highly preferred area of interest. By bridging the use of QuickPic AAC with an individual’s primary communication tool, clinicians may create an avenue for enriched personalized instruction and an opportunity to capitalize on teachable moments.
There are also some directions in terms of development. It is essential to acknowledge a notable restriction of QuickPic AAC, specifically its reliance on internet connectivity, as it uses GPT-3.5. This limitation restricts its use to environments equipped with internet access and highlights the need for future improvements that can extend its utilization across a broader range of settings. Given our findings that GPT-3.5 provides more relevant vocabulary than NLG-AAC, this study should be expanded to include comparisons with other AI models. Exploring different AI models and their effectiveness in generating relevant vocabulary would provide a more comprehensive understanding of the capabilities and limitations of AI for vocabulary selection in AAC software and applications. Additionally, at this juncture, it would be of value to compare the performance of AI with the performance of humans (i.e., clinicians) in creating topic-specific displays. Furthermore, research should examine how QuickPic AAC can be implemented in practice settings involving minimally speaking individuals. Lastly, ethical considerations should be taken into account when integrating AI into AAC practices. Future studies should address issues such as privacy, biases, and security risks.

6. Conclusions

AI has considerable potential in allied health fields, including speech–language pathology. In this study, QuickPic AAC, an AI-driven application designed to generate topic-specific displays from photographs on the fly, was evaluated in terms of the relevance of the vocabulary generated using two different AI algorithms and its perceived usability. GPT-3.5 produced more relevant vocabulary than NLG-AAC. Additionally, practicing SLPs rated QuickPic AAC highly in terms of its usability for effortlessly creating topic-specific displays. By embracing AI technologies such as QuickPic AAC, SLPs can leverage their capabilities to alleviate the time demands of creating personalized materials and dedicate more attention to individualized care and treatment for improving communication skills.

Author Contributions

The authors made the following contributions: Conceptualization, H.C.S., R.W.S., C.Y., M.F.d.V. and L.A.W.; Methodology, R.W.S.; Software—Programming, M.F.d.V.; Software—Feedback, H.C.S., C.Y. and L.A.W.; Data Collection, C.Y. and L.A.W.; Statistical Analysis, R.K.; Formal Analysis, R.W.S. and C.Y.; Writing—Original Draft Preparation, C.Y., R.W.S., R.K. and H.C.S.; Writing—Review and Editing, C.Y., R.W.S., H.C.S., L.A.W. and M.F.d.V. All authors have read and agreed to the published version of the manuscript.

Funding

The creation of QuickPic AAC and this research received partial support from the App Factory to Support Health and Function of People with Disabilities, funded by a grant from the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR) under the U.S. Department of Health and Human Services, specifically, the Shepherd Center (Grant # 90DPHF0004) and Fayetteville Manlius School District.

Institutional Review Board Statement

Ethical review and approval were waived for this study ("exempt") by the Institutional Review Board because it is limited to research activities in which the disclosure of the human subjects’ responses outside the research did not reasonably place the subjects at risk of criminal or civil liability or was not damaging to the subjects’ financial standing, employability, educational advancement, or reputation. Participants provided verbal consent.

Informed Consent Statement

Participants provided verbal consent.

Data Availability Statement

Data are available upon request from the first author.

Acknowledgments

The authors gratefully acknowledge the participants who generously shared their time and expertise in the development of QuickPic AAC and this research.

Conflicts of Interest

Christina Yu, Howard Shane, and Maurício Fontana de Vargas have received a grant from the App Factory.

Appendix A. QuickPic Reference Guide Provided to Participants

Figure A1. QuickPic AAC Reference Guide from Goossens’, C. and Crain, S. [7]. © Copyright 1998–2022 Tobii Dynavox. All Rights Reserved.

Appendix B. Speech–Language Pathologist Demographic and AAC Experience Questionnaire


Appendix C. mHealth App Usability Questionnaire (MAUQ) for Standalone mHealth Apps Used by Healthcare Providers

mHealth App Usability Questionnaire (MAUQ) for Standalone mHealth Apps Used by Healthcare Providers

Each statement is rated on a 7-point scale from 1 (DISAGREE) to 7 (AGREE), with an N/A option.

1. The app was easy to use.
2. It was easy for me to learn to use the app.
3. The navigation was consistent when moving between screens.
4. The interface of the app allowed me to use all the functions (such as entering information, responding to reminders, viewing information) offered by the app.
5. Whenever I made a mistake using the app, I could recover easily and quickly.
6. I like the interface of the app.
7. The information in the app was well organized, so I could easily find the information I needed.
8. The app adequately acknowledged and provided information to let me know the progress of my action.
9. I feel comfortable using this app in social settings.
10. The amount of time involved in using this app has been fitting for me.
11. I would use this app again.
12. Overall, I am satisfied with this app.
13. The app would be useful for my healthcare practice.
14. The app improved my access to delivering healthcare services.
15. The app helped me manage my patients’ health effectively.
16. This app has all the functions and capabilities I expected it to have.
17. I could use the app even when the Internet connection was poor or not available.
18. This mHealth app provides an acceptable way to deliver healthcare services, such as accessing educational materials, tracking my own activities, and performing self-assessment.

Appendix D. AAC Practitioner/Caregiver’s Feedback Questionnaire Created by Fontana de Vargas et al. [21]

AAC Practitioner/Caregiver’s Feedback Questionnaire

Based on your experience using our application with your clients/family members, please indicate to what extent you agree or disagree with the following statements (Strongly Disagree / Disagree / Neutral / Agree / Strongly Agree):

A. Interaction
1. The symbol set used was appropriate
2. The voice output quality was appropriate
3. Users could easily select a desired vocabulary item within a page
4. Users could easily remove undesired vocabulary
5. Users could easily navigate through existing pages to find a desired photo and the associated page
6. Users could easily create a new page with a new photo
7. Users tended to access/use vocabulary from previously created pages (e.g., previous days)
8. Users tended to access/use vocabulary from newly created pages (e.g., instants or minutes after creating)

B. Vocabulary quality
9. The generated vocabulary included words users wanted to use
10. The generated vocabulary included words users did not want to use
11. The order the vocabulary was presented was adequate

C. Usage
12. Users enjoyed using the application
13. Users demonstrated willingness to use the application
14. Users operated the application independently
15. Users were more communicative using the application than they usually are using other AAC tools
16. Users would benefit if there were a complete, commercially ready application based on our prototype/beta-version

Appendix E. Comparison of Each Participant’s Original and Finalized TSDs Generated with Each Condition; Modifications Completed by the Participant Are Denoted by the Following Color-Coding: Deleted (Blue), Kept (Red), and Modified (Yellow)

The table presents, for each participant (1–8) and condition (NLG, GPT-3.5), the static scene cue selected and the original and finalized mixed displays generated; these appear as images in the original article. © Copyright 1998–2022 Tobii Dynavox. All Rights Reserved.

Appendix F. Open-Ended Questions Related to Overall Experience and Vocabulary Generation within QuickPic AAC across Participants

Describe Your Overall Experience Using the QuickPic AAC App.
- It was easier to make topic display boards than using Boardmaker or TouchChat HD-AAC. It was faster and helpful that the app provided a starting point.
- Allowed me to easily create a topic specific display.
- Nice quick way to generate topic based displays.
- It was easy to create a topic display based on a simple scene. Editing was simple and effective.
- I enjoyed using the app- the overall learning process felt quick and I felt comfortable navigating and programming it on my own. It was much easier/quicker to program in comparison to another AAC app I have used.
- Great! This updated version is much improved- sleeker with more editing capabilities.
- I like the vocabulary selection feature, but wish I could preview top choices before committing to a specific choice. I felt like the prediction was generic.
- Experience [GPT-3.5] was amazing! So quick and easy to use.

When Comparing [NLG-AAC] and [GPT-3.5], How Do They Compare in Terms of Vocabulary Generation and Your Overall Experience?

- I thought [GPT-3.5] generated more appropriate vocabulary and a wider range of appropriate words.
- [GPT-3.5] did a better job of generating vocabulary compared to [NLG]. I needed to change less with [GPT-3.5].
- [GPT-3.5] did a better job. [QuickPic AAC] picked too many irrelevant words which resulted in more time spent deleting.
- [GPT-3.5] was significantly better at generating appropriate vocabulary.
- [GPT-3.5] generated more appropriate topic-specific vocabulary on its own, so I didn’t need to spend as much time editing/programming the page than I did with [NLG].
- [GPT-3.5] included more usable vocabulary- it did include some higher-level vocabulary without some basics (e.g., “imagine” but not “want”).
- [GPT-3.5] had more prepositions that I would use. [NLG] had more descriptors I would use, but was missing subject and objects.
- [GPT-3.5] was accurate at reflecting words I would want to use. Vocabulary choice for [NLG] was random.

References

  1. Shane, H.C.; Laubscher, E.; Schlosser, R.W.; Flynn, S.; Sorce, J.F.; Abramson, J. Applying technology to visually support language and communication in individuals with ASD. J. Autism Dev. Disord. 2012, 42, 1228–1235.
  2. Beukelman, D.; Light, J. Augmentative and Alternative Communication: Supporting Children and Adults with Complex Communication Needs; Paul H. Brookes: Baltimore, MD, USA, 2020.
  3. Koumpouros, Y.; Kafazis, T. Wearables and mobile technologies in Autism Spectrum Disorder interventions: A systematic literature review. Res. Autism Spectr. Disord. 2019, 66, 101405.
  4. Shane, H.C.; Laubscher, E.; Schlosser, R.W.; Fadie, H.L.; Sorce, J.F.; Abramson, J.S.; Flynn, S.; Corley, K. Enhancing Communication for Individuals with Autism: A Guide to the Visual Immersion System; Paul H. Brookes: Baltimore, MD, USA, 2014.
  5. Beukelman, D.R.; Mirenda, P. Augmentative and Alternative Communication: Supporting Children and Adults with Complex Communication Needs, 3rd ed.; Paul H. Brookes: Baltimore, MD, USA, 2005.
  6. Goossens’, C.; Crain, S.S.; Elder, P.S. Engineering the Preschool Environment for Interactive Symbolic Communication: 18 Months to 5 Years Developmentally, 4th ed.; Southeast Augmentative Communication: Birmingham, AL, USA, 1999.
  7. Goossens’, C.; Crain, S. Establishing multiple communication displays. In Augmentative Communication: An Introduction; Blackstone, S., Ed.; American Speech-Language-Hearing Association: Rockville, MD, USA, 1986; pp. 337–344.
  8. Goossens’, C.; Crain, S.; Elder, P. Engineering the Pre-School Environment for Interactive, Symbolic Communication: 18 Months to 5 Years; Clinician Series; Southeast Augmentative Communication Conference Publications: Birmingham, AL, USA, 1992.
  9. Wilkinson, K.M.; McIlvane, W.J. Perceptual factors influence visual search for meaningful symbols in individuals with intellectual disabilities and Down syndrome or autism spectrum disorders. Am. J. Intellect. Dev. Disabil. 2013, 118, 353–364.
  10. Qian, Y.; Gilmore, R.; Wilkinson, K. The effects of spatial organization in the design of visual supports for adults with communicative disorders. J. Vis. 2019, 19, 310b.
  11. Thistle, J.J.; Wilkinson, K.M. Speech–language pathologists’ decisions when designing an aided AAC display for a compilation case study of a beginning communicator. Disabil. Rehabil. Assist. Technol. 2020, 16, 871–879.
  12. O’Brien, A.; Schlosser, R.W.; Shane, H.C.; Abramson, J.; Allen, A.; Yu, C.; Dimery, K. Just-in-time visual supports for children with autism via the Apple Watch: A pilot feasibility study. J. Autism Dev. Disord. 2016, 46, 3818–3823.
  13. Schlosser, R.W.; Shane, H.C.; Allen, A.; Abramson, J.; Laubscher, E.; Dimery, K. Just-in-time supports in augmentative and alternative communication. J. Dev. Phys. Disabil. 2016, 28, 177–193.
  14. Lorah, E.R.; Holyfield, C.; Miller, J.; Griffen, B.; Lindbloom, C. A systematic review of research comparing mobile technology speech-generating devices to other AAC modes with individuals with autism spectrum disorder. J. Dev. Phys. Disabil. 2022, 34, 187–210.
  15. Morin, K.L.; Ganz, J.B.; Gregori, E.V.; Foster, M.J.; Gerow, S.L.; Genç-Tosun, D.; Hong, E.R. A systematic quality review of high-tech AAC interventions as an evidence-based practice. Augment. Altern. Commun. 2018, 34, 104–117.
  16. Schlosser, R.W.; Wendt, O. Effects of augmentative and alternative communication intervention on speech production in children with autism: A systematic review. Am. J. Speech-Lang. Pathol. 2008, 17, 212–230.
  17. Sennott, S.C.; Akagi, L.; Lee, M.; Rhodes, A. AAC and artificial intelligence (AI). Top. Lang. Disord. 2019, 39, 389–403.
  18. Fontana de Vargas, M.; Yu, C.; Shane, H.C.; Moffatt, K. Co-Designing QuickPic: Automated Topic-Specific Communication Boards from Photographs for AAC-Based Language Instruction. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024; pp. 1–16.
  19. Zapata, B.C.; Fernández-Alemán, J.L.; Idri, A.; Toval, A. Empirical studies on usability of mHealth apps: A systematic literature review. J. Med. Syst. 2015, 39, 1.
  20. Maramba, I.; Chatterjee, A.; Newman, C. Methods of usability testing in the development of eHealth applications: A scoping review. Int. J. Med. Inform. 2019, 126, 95–104.
  21. Fontana de Vargas, M.; Moffatt, K. Automated Generation of Storytelling Vocabulary from Photographs for Use in AAC. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual, 1–6 August 2021; Volume 1: Long Papers, pp. 1353–1364.
  22. Huang, T.H.; Ferraro, F.; Mostafazadeh, N.; Misra, I.; Agrawal, A.; Devlin, J.; Girshick, R.; He, X.; Kohli, P.; Mitchell, M.; et al. Visual storytelling. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1233–1239.
  23. Fang, H.; Gupta, S.; Iandola, F.; Srivastava, R.K.; Deng, L.; Dollár, P.; Gao, J.; He, X.; Mitchell, M.; Platt, J.C.; et al. From captions to visual concepts and back. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1473–1482.
  24. Frey, B.J.; Dueck, D. Clustering by passing messages between data points. Science 2007, 315, 972–976.
  25. Schlosser, R.W.; Laubscher, E.; Sorce, J.; Koul, R.; Flynn, S.; Hotz, L.; Abramson, J.; Fadie, H.; Shane, H. Implementing directives that involve prepositions with children with autism: A comparison of spoken cues with two types of augmented input. Augment. Altern. Commun. 2013, 29, 132–145.
  26. Zhou, L.; Bao, J.; Setiawan, A.; Saptono, A.; Parmanto, B. The mHealth App Usability Questionnaire (MAUQ): Development and validation study. JMIR mHealth uHealth 2019, 7, e11500.
  27. Kooistra, B.; Dijkman, B.; Einhorn, T.A.; Bhandari, M. How to design a good case series. J. Bone Jt. Surg. 2009, 91 (Suppl. S3), 21–26.
  28. Daniel, W.W. Friedman two-way analysis of variance by ranks. In Applied Nonparametric Statistics, 2nd ed.; PWS-Kent: Boston, MA, USA, 1990; pp. 262–274. ISBN 978-0-534-91976-4.
  29. Rey, D.; Neuhäuser, M. Wilcoxon signed-rank test. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011.
Figure 1. Guide describing components of a finalized communication display within QuickPic. © Copyright 1998–2022 Tobii Dynavox. All Rights Reserved.
Figure 2. Percentage of kept, deleted, and modified icons for each participant: NLG-AAC vs. GPT-3.5 method.
Figure 3. Participant overall average scores comparing NLG-AAC to GPT-3.5 using the mHealth App Usability Questionnaire (MAUQ).
Figure 4. Bar graph of overall average scores across all participants on the de Vargas et al. [21] post-questionnaire survey.
Figure 5. Bar graph of the item-level and sub-group analyses of the de Vargas et al. [21] post-questionnaire survey results across all participants.
Table 1. Participant characteristics.

| Participant | Race | Ethnicity | CA | Years Practicing as an SLP | Frequency of Working with Individuals Who Use AAC | Have You Created Topic Displays? | Frequency of Creating Topic Displays | Average Time to Create Topic Displays |
|---|---|---|---|---|---|---|---|---|
| 1 | White | Not Hispanic or Latino | 25–34 | 2 | Weekly | Yes | Occasionally | 31–40 min |
| 2 | More than one race | Not Hispanic or Latino | 25–34 | 4 | Daily | Yes | Weekly | 11–20 min |
| 3 | White | Not Hispanic or Latino | 35–44 | 17 | Daily | Yes | Monthly | <10 min |
| 4 | White | Not Hispanic or Latino | 35–44 | 12 | Daily | Yes | Weekly | 11–20 min |
| 5 | White | Not Hispanic or Latino | 25–34 | 2 | Weekly | Yes | Monthly | 21–30 min |
| 6 | White | Not Hispanic or Latino | 35–44 | 12 | Daily | Yes | Daily | <10 min |
| 7 | White | Not Hispanic or Latino | 25–34 | 6 | Daily | Yes | Monthly | 11–20 min |
| 8 | White | Not Hispanic or Latino | 55–64 | 35 | Monthly (varies) | Yes | Occasionally | 51–60 min |

Note. CA = chronological age; SLP = speech–language pathologist; AAC = augmentative and alternative communication.
Table 2. Perceived benefits of topic-specific displays.

| Participant | Increase My Client's Ability to Communicate about a Specific Topic | Help Me Model Vocabulary | Help Me Address Communication Goals in My Sessions | Help Facilitate Expansion of Utterance Length | Help Increase the Fluidity of Communicating about a Specific Topic at Hand | Other |
|---|---|---|---|---|---|---|
| 1 | Yes | Yes | Yes | Yes | Yes | -- |
| 2 | Yes | Yes | Yes | Yes | Yes | Help person who uses AAC to attend to the display and [create] word combinations; no dynamic display needed to navigate |
| 3 | Yes | Yes | Yes | Yes | Yes | -- |
| 4 | Yes | Yes | Yes | Yes | Yes | -- |
| 5 | Yes | Yes | Yes | Yes | No | -- |
| 6 | Yes | Yes | Yes | Yes | Yes | Help direct families/teams re. modeling |
| 7 | Yes | Yes | Yes | Yes | Yes | -- |
| 8 | Yes | Yes | Yes | Yes | No | -- |
Table 3. Perceived barriers to creating topic-specific displays.

| Participant | The Time It Takes to Create a Topic-Specific Display Is a Barrier to Including Them in My Sessions | It Is Challenging to Identify Vocabulary and Language to Target Using Topic-Specific Displays | I Do Not Have the Resources (i.e., Apps, Software) to Create Topic-Specific Displays | I Feel Unsure about the Organization, Framework, and Guidelines for Creating Topic-Specific Displays to Include Them in My Sessions | It Is Challenging to Create Visually Appealing Topic-Specific Displays | Other |
|---|---|---|---|---|---|---|
| 1 | Yes | Yes | Yes | Yes | Yes | -- |
| 2 | Yes | No | No | No | Yes | -- |
| 3 | Yes | No | No | No | No | It might prevent generalization of vocabulary across contexts |
| 4 | Yes | Yes | No | No | Yes | -- |
| 5 | Yes | No | No | Yes | No | -- |
| 6 | Yes | No | No | No | No | -- |
| 7 | Yes | No | No | No | No | -- |
| 8 | Yes | No | No | Yes | No | -- |
Table 4. Open-ended questions related to use case scenarios and frequency of use across all participants.

| Participant | If You Had Access to QuickPic AAC, Would You Incorporate It into Your Practice? | If Yes, How? | How Often Would You Use It? |
|---|---|---|---|
| 1 | Yes | NA | Weekly or monthly depending on my caseload |
| 2 | Yes | During therapy sessions to base my therapy on patient's interests | Weekly |
| 3 | Yes | With communicators who need explicit support and phrase generation and have trouble navigating across pages | NA |
| 4 | Yes | Creating displays "on the fly" in therapy for common activities | Weekly |
| 5 | Yes | I would use it to create activity-specific topic displays in a much more efficient manner. It would help me increase aided language modeling in sessions. | I regularly see patients who use AAC, so I would use it weekly. |
| 6 | Yes | For topic display users and families who are ready to start making their own. | In evaluations on a weekly/daily basis. As a recommendation, every other week. |
| 7 | Yes | Help families independently select vocabulary at home | Frequently |
| 8 | Absolutely | Creating topic display boards | My patient population is variable. I don't always have patients who need topic display boards. I would use it anytime I needed to create a topic display board. |