Gamified Engagement for Data Crowdsourcing and AI Literacy: An Investigation in Affective Communication Through Speech Emotion Recognition

Siamtanidou, Eleni; Vrysis, Lazaros; Vryzas, Nikolaos; Dimoulas, Charalampos A.

doi:10.3390/soc15030054


Multidisciplinary Media and Mediated Communication (M3C) Research Group, School of Journalism and Mass Communications, Aristotle University, 54124 Thessaloniki, Greece


Societies 2025, 15(3), 54; https://doi.org/10.3390/soc15030054

Submission received: 28 January 2025 / Revised: 17 February 2025 / Accepted: 20 February 2025 / Published: 22 February 2025

(This article belongs to the Special Issue Artificial Intelligence in Participatory Environments: Technologies, Ethics, and Literacy Aspects)


Abstract

This research investigates the use of entertainment-based approaches, such as serious games and gamification technologies, to address practical challenges and accomplish targeted tasks. Specifically, it details the design and development of an innovative gamified application named “J-Plus”, aimed at both professionals and non-professionals in journalism. The application supports the enjoyable, efficient, and high-quality collection of emotionally tagged speech samples, which can improve the performance and robustness of speech emotion recognition (SER) systems. These approaches also offer significant educational benefits, providing users with knowledge about emotional speech and artificial intelligence (AI) mechanisms while promoting digital skills. The project was evaluated by 48 participants: 44 took part in quantitative assessments, and 4 formed an expert group for qualitative assessment. The evaluation addressed the research questions and supported the hypotheses, demonstrating the application’s diverse benefits. Key findings indicate that gamified features can effectively support learning and attract users, with approximately 70% of participants agreeing that serious games and gamification could enhance their motivation to practice and improve their emotional speech. In addition, 50% of participants identified social interaction features, such as collaboration, as the most beneficial for fostering motivation and commitment. Integrating these elements supports reliable and extensive data collection and the advancement of AI algorithms while also developing skills such as emotional speech articulation and digital literacy. This paper advocates for the creation of collaborative environments and digital communities through crowdsourcing, balancing technological innovation in the SER sector with the interests of its users.

Keywords:

speech emotion recognition systems; gamification technologies; gamified applications; serious games; crowdsourcing; digital communities; digital literacy; media literacy

1. Introduction

Technological advancement is accelerating, and artificial intelligence (AI) has emerged as a prominent interdisciplinary field that pervades many aspects of everyday life [1,2,3,4]. The term AI describes the capacity of computers and machines to perform tasks that could be considered “intelligent” with minimal human input [5,6,7]. The primary focus is on machine learning (ML), a subfield of AI concerned with how computers and machines learn from data and experience [5,6,7,8]. In essence, an ML system acquires knowledge by examining examples (training data) rather than being explicitly programmed by humans [5,9,10,11,12]. This is why AI and ML are characterized by adaptability, self-organization, and self-learning [5,6].
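
To make the contrast between learning from examples and explicit programming concrete, the following minimal Python sketch (an illustration, not part of the study) implements a nearest-centroid classifier: it computes the average feature vector of each class from labeled training examples and assigns a new input to the closest class. The feature values and the labels "high" and "low" are hypothetical.

```python
from math import dist


def fit_centroids(examples):
    """Learn one centroid (mean feature vector) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical two-dimensional training examples with labels.
training_data = [
    ([0.9, 0.8], "high"),
    ([0.8, 0.9], "high"),
    ([0.1, 0.2], "low"),
    ([0.2, 0.1], "low"),
]
model = fit_centroids(training_data)
print(predict(model, [0.85, 0.75]))  # high
```

Changing the training examples changes the model's behavior without editing any rule, which is the sense in which such a system learns from data.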

Despite the autonomy of AI systems with respect to their programming, humans remain crucial in guiding and conducting the learning phase of ML, since they are responsible for key tasks such as selecting algorithms, formatting data, defining learning parameters, and resolving any issues that arise [13]. Developing an operational ML system depends on the availability of a reliable and extensive database. This is precisely the most significant challenge that developers face: the continuous collection and use of high-quality sample data for training. The performance of an ML system depends on the quantity, quality, representativeness, and diversity of its data [7,12,13,14,15], and collecting such data continuously, at high quality, and in sufficient volume is difficult [12]. Given the pivotal role of AI and ML in domains such as information, medicine, economics, transport control, and psychology [2,3,4], it is crucial to find a solution that enables the straightforward, continuous, dependable, collaborative, and extensive collection and provision of sample data. This should be a primary objective in the ongoing development and enhancement of these systems.
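
As a small illustration of why representativeness matters, the sketch below counts how often each label occurs in a labeled dataset and flags labels whose share falls below a threshold. The label counts and the 10% threshold are hypothetical; a real audit would also consider recording quality, speaker diversity, and the other factors named above.

```python
from collections import Counter


def audit_labels(labels, min_share=0.1):
    """Count each label and list those whose share of the data is below min_share.

    min_share is an illustrative threshold, not a value from the paper.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    underrepresented = sorted(
        label for label, n in counts.items() if n / total < min_share
    )
    return dict(counts), underrepresented


# Hypothetical label distribution for an emotional speech corpus.
labels = ["anger"] * 40 + ["sadness"] * 35 + ["fear"] * 20 + ["disgust"] * 5
counts, underrepresented = audit_labels(labels)
print(counts)            # {'anger': 40, 'sadness': 35, 'fear': 20, 'disgust': 5}
print(underrepresented)  # ['disgust']
```

A model trained on such a distribution would see eight times more "anger" than "disgust" examples, which is one concrete way in which a lack of representativeness can limit performance.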

The considerable focus on AI has facilitated and accelerated the development of automated systems capable of recognizing a range of emotions through speech, including happiness, sadness, anger, fear, disgust, and surprise [16,17,18]. These speech emotion recognition (SER) systems aim to identify and categorize emotions expressed in human speech. They analyze acoustic features extracted from speech using machine learning models trained on annotated datasets [19,20]. SER offers a number of advantages, including improved human–machine interaction, evaluation of user experience in digital environments, monitoring of mood in customer service, assessment of mental health conditions, and optimization of the user’s emotional engagement during digital storytelling or immersion in a computer game [21,22,23,24]. Nevertheless, automatic emotion recognition from speech remains a challenging research area that shares issues common to most AI-based systems. Arguably, the obstacles faced by SER systems are more intricate than those faced by other AI systems that have proven highly functional [25]. In particular, it is difficult to identify the most appropriate signal features and classification methods [25,26]. Furthermore, data collection is inherently time-consuming, and the resulting labels depend on the interpretations used in emotion labeling and on the considerable individual variability in emotional expression [27,28]. Concerns have also been raised regarding the privacy of recorded data and the permissible limits of analyzing users’ speech [29].
It is, therefore, essential to identify a solution that can capitalize on the potential benefits of such systems while addressing the challenges, ethical concerns, and privacy issues that have been highlighted.
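
The feature-extraction step described above can be sketched in a few lines. The example below is a simplified, dependency-free illustration, not the pipeline used in the cited works: it splits a waveform into overlapping frames (assumed here to be 25 ms with a 10 ms hop at a 16 kHz sample rate), computes two elementary descriptors per frame (root-mean-square energy and zero-crossing rate), and summarizes them per utterance with a mean and standard deviation. Practical SER systems typically use richer descriptors, such as MFCCs and pitch, and pass the resulting vectors to a model trained on an annotated dataset; the synthetic tones here merely stand in for speech.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def utterance_features(signal, frame_len=400, hop=160):
    """Summarize frame-level descriptors with their mean and standard deviation.

    400/160 samples = 25 ms/10 ms at an assumed 16 kHz sample rate.
    """
    frames = frame_signal(signal, frame_len, hop)
    feats = {}
    for name, fn in (("rms", rms), ("zcr", zero_crossing_rate)):
        values = [fn(f) for f in frames]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        feats[f"{name}_mean"] = mean
        feats[f"{name}_std"] = std
    return feats


sr = 16000  # assumed sample rate in Hz
# Synthetic one-second tones standing in for a loud, high-pitched and a soft, low-pitched voice.
loud_high = [0.8 * math.sin(2 * math.pi * 300 * t / sr) for t in range(sr)]
soft_low = [0.2 * math.sin(2 * math.pi * 120 * t / sr) for t in range(sr)]
print(utterance_features(loud_high))
print(utterance_features(soft_low))
```

The louder, higher tone yields a larger mean energy and zero-crossing rate; a classifier trained on many labeled utterances would learn which combinations of such values tend to accompany each emotion.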

Alternative multimedia technologies could prove an effective solution, not only for the evolution and development of AI systems but also for offering a variety of possibilities and utilities to the individual user. Serious games and gamification technologies have been shown to serve as such alternative multimedia applications [30,31,32]. Although these technologies have been closely associated with entertainment, their impact on various social aspects of the individual has been demonstrated over time [33,34]. Elements such as competition, rewards, challenges, scoring, achievements, and a digital interactive simulation environment make these applications engaging and enticing. Together, these elements capture the user’s attention and stimulate interest and engagement [30]. On the basis of these characteristics, such applications offer the user a dual literacy: knowledge of their subject area and the development of digital skills. With respect to the latter, they are well suited to familiarizing individuals with demanding and time-consuming technological developments, such as those involved in digital media literacy [34,35].
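
The game elements listed above (scoring, rewards, achievements, and competition) map naturally onto simple data structures. The sketch below is a hypothetical illustration, not the rule set of any application discussed in this paper: the point value, badge names, and milestones are invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical scoring rules: points per accepted recording and badge milestones.
POINTS_PER_RECORDING = 10
BADGES = {1: "First Take", 5: "Regular Voice", 20: "Studio Veteran"}


@dataclass
class Player:
    name: str
    points: int = 0
    recordings: int = 0
    badges: set = field(default_factory=set)


def reward_recording(player):
    """Score one accepted recording and return any badge it unlocks (reward + achievement)."""
    player.recordings += 1
    player.points += POINTS_PER_RECORDING
    badge = BADGES.get(player.recordings)
    if badge is None:
        return []
    player.badges.add(badge)
    return [badge]


def leaderboard(players):
    """Rank player names by points, highest first (competition)."""
    return [p.name for p in sorted(players, key=lambda p: p.points, reverse=True)]


alice, bob = Player("alice"), Player("bob")
for _ in range(5):
    reward_recording(alice)
reward_recording(bob)
print(alice.points, sorted(alice.badges))  # 50 ['First Take', 'Regular Voice']
print(leaderboard([bob, alice]))           # ['alice', 'bob']
```

Tying rewards to each accepted contribution, rather than to time spent in the application, is one way such mechanics could align user motivation with the goal of collecting data.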

The close connection between multimedia progress and sophisticated mediated communication capabilities has led to a surge in public engagement with highly interactive and immersive environments. This has enabled advanced media services that can be highly beneficial for developing digital literacy and supporting other demanding lifelong learning tasks, by using media and education in more enjoyable and effective ways [34,35,36,37]. Modern multimedia genres, especially digital games, offer users experiences, new content, and visions that transcend the digital medium itself [38]. These include the creation and use of digital networks to develop additional content within social networks or other virtual channels, encompassing the formation of social organizations, interaction between users, and mass collaboration toward a common goal. In other words, a new social culture known as crowdsourcing has emerged: a collaborative activity, applicable to the information systems sector, that involves co-creation and innovation by users [39,40,41]. It can therefore be argued that approaches developed for alternative multimedia applications could help resolve functional issues inherent to AI mechanisms while focusing on the user, providing a means of training and developing a variety of literacies.

This work addresses several issues related to the functionality of AI systems and the optimal performance of AI mechanisms in general, focusing on speech emotion recognition systems as a case study. For these systems to operate correctly, audio recordings labeled with emotional speech categories must be collected continuously and reliably. This paper proposes a solution in the form of specific techniques and applied tools that focus not on the problem itself but on the user. In essence, it proposes the use of gamification technologies, serious games, and, as a consequence, the social networks they form, as entertainment-based digital environments. Beyond entertainment, however, these digital tools are designed to optimize the user experience, engage the user, and facilitate learning and informal education. The paper also seeks to elucidate the concerns and issues pertaining to this project.

The remainder of this paper is structured into five sections. Section 2 examines significant works pertinent to the fields addressed in this study. The objectives of the current project are delineated in Section 3. The phases of analysis, design, development, and evaluation, which constitute the materials and methods of this research, are detailed in Section 4. Section 5 presents the experimental results and discussion, highlighting the outcomes of the final project in relation to the stated hypotheses and research questions. This paper concludes with Section 6, which provides a summary of the main points and an analysis of the conclusions.

2. Related Work

In recent years, scientific focus has shifted notably toward methods and strategies for informal education, in recognition of their pivotal role in developing individuals’ skills and competencies [30]. The primary objective is to cultivate knowledge through digital and interactive environments that are not inherently linked to the formal educational sector. The aim is to facilitate participation, create motivation, provide enjoyable engagement, foster a sense of belonging to a group, and promote productive learning [34,35,36,37]. Consequently, this study explores the domains of serious games, gamification technologies, and crowdsourcing. By examining these areas, it addresses several perspectives on the advancement of media and education: the educational perspective, which examines whether informal education methods can enhance an individual’s learning experiences and outcomes; the technological perspective, which focuses on using digital and interactive environments to facilitate knowledge acquisition; and the psychological perspective, which investigates how the characteristics of gamification technologies and serious games affect an individual’s motivation, engagement, and sense of belonging. At a broader level, further perspectives can be integrated: a professional perspective, highlighting the benefits of these methods for media literacy and skill development among journalism professionals; a societal perspective, promoting digital and media literacy among the general public; and an innovation perspective, exploring whether crowdsourcing and immersive journalism can transform how information is disseminated and consumed. These perspectives follow from the target group, which primarily includes journalism professionals and, secondarily, the general public.
The effectiveness of these three areas lies in their ability to combine the advantages of easy, experiential, and entertaining information dissemination with the simultaneous promotion of digital literacy and media literacy. This approach takes into account the rapidly evolving information landscape and recent trends in multimedia, mobile, and immersive journalism [42,43].

Beginning with serious games and gamification technologies in journalism, the extensive body of research and the diversity of journalistic domains it covers underscore their significant contribution and reinforce their use as educational tools. For instance, according to [44], integrating serious games into the training of journalists is essential. Specifically, these games help aspiring journalists develop analytical reasoning, news gathering and processing skills, and interview techniques, and foster empathy and political sensitivity, attributes considered vital for journalistic expertise [44]. Similarly, research [45] presents findings from the use of various serious games in the training of future journalists, demonstrating that this method can effectively achieve the objectives of journalistic education, including media literacy, the ability to tackle complex and specialized tasks, fact-checking skills, and digital content creation. Despite these positive outcomes, participants in that study identified challenges in the practical implementation of serious games in academic settings, specifically accessibility, methodological teaching support, and ethical dilemmas [45]. Also in the context of educating future journalists, study [46] adopts a similar approach but focuses exclusively on gamification. It details the development of a gamified platform whose use yielded positive results in enhancing participants’ critical thinking and improving their journalistic practices [46]. Equally noteworthy is study [47], which applies gamified strategies to motivate already trained journalists; its compelling results suggest that competition, rewards, and incentives can lead to improved performance and motivation among journalists [47].
The use of a playful digital environment with educational impact extends beyond professional journalists, as positive outcomes are also observed in the realm of public information. Specifically, research [48] explored the combination of gaming and information dissemination. It examined the potential of presenting topics and informing the public through serious games and gamified digital environments, focusing on current and significant societal events. The application of this method demonstrated that journalistic values are reinforced, as an immersive, playful, and participatory environment encourages user engagement, informs them about current events, and deepens their understanding of the verification process of information provided by the interactive world in relation to reality [48]. The work [35], which similarly targets both professional journalists and the general public, introduces an online game designed to train individuals in verifying news and manipulated content. The playful nature of this application positively influenced users’ observational skills regarding the elements that each news source should possess and the methods of verifying them to combat misinformation and fake news. Additionally, it is worth highlighting the enhancement of digital literacy that resulted from users’ engagement with this platform [35].

Although the link between crowdsourcing, gamified technologies, and serious games is not immediately evident, all three domains operate as collaborative systems and collective intelligence environments. These environments are created by users through human–computer interaction and develop within social media communities [49]. Crowdsourcing is employed in various ways in journalism. For journalists, it facilitates the discovery of diverse and previously inaccessible information, ultimately leading to more comprehensive and objective reporting. It also fosters trust, participation, and a stronger relationship between readers and journalists [50]. Study [51], which examined crowdsourcing as a method of knowledge-seeking and open journalistic practice, found that it supported effective fact-finding and fact-checking by providing journalists with a continuous flow of input. However, the same research highlighted the challenge of managing a large volume of submissions, which could result in the publication of inaccurate data [51]. Research [39] presents an innovative application for the collaborative collection and documentation of soundscapes and the semantics of environmental sounds, using a methodology that resembles sophisticated mobile journalism services. In addition to enhancing audience engagement, the results demonstrated the value of this research method across various fields. The benefits extend beyond users to the scientific sector, as engagement with the application aids in the collection and storage of resources and metadata, which in turn facilitates the semantic enhancement of cloud services and enables the use of ML techniques [39].
As evidenced by study [52], crowdsourcing can shape novel perspectives and practices in online journalism and democratic processes. The study details the use of crowdsourcing strategies to understand and counter misinformation and fake news. Its findings indicate that the general public is interested in and willing to engage in fact-checking activities. Nevertheless, the most accurate information and fact confirmation appear to be achieved through a combined model of crowdsourcing tactics and professional fact-checkers and experts, indicating the emergence of a new journalistic model of communication and information [52].
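
The combined model described above, in which crowd judgments are complemented by professional review, can be sketched as a simple aggregation rule: accept the crowd majority when agreement is high, and otherwise defer to an expert. The agreement threshold and the labels used here are assumptions for illustration and are not taken from [51] or [52].

```python
from collections import Counter


def aggregate(crowd_votes, expert_vote=None, min_agreement=0.7):
    """Combine crowd judgments with an optional expert judgment.

    Returns (decision, source). The crowd majority is accepted only if its share
    of the votes reaches min_agreement (an illustrative threshold); otherwise an
    expert decision is used, or the item is flagged for review if none is given.
    """
    if not crowd_votes:
        raise ValueError("at least one crowd vote is required")
    label, count = Counter(crowd_votes).most_common(1)[0]
    if count / len(crowd_votes) >= min_agreement:
        return label, "crowd"
    if expert_vote is not None:
        return expert_vote, "expert"
    return None, "needs_review"


print(aggregate(["accurate", "accurate", "accurate", "misleading"]))    # ('accurate', 'crowd')
print(aggregate(["accurate", "misleading", "misleading", "accurate"]))  # (None, 'needs_review')
print(aggregate(["accurate", "misleading"], expert_vote="misleading"))  # ('misleading', 'expert')
```

Raising the threshold sends more items to experts, trading throughput for accuracy, which mirrors the tension between submission volume and accuracy reported in [51].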

The promising results of the aforementioned studies serve as a catalyst and source of inspiration for the advancement of journalism and public information. The innovation of this research lies in developing a tool that meets the needs of the modern era while integrating the benefits of several sectors. It does so by training both journalism professionals and non-professionals through gamification, serious games, and crowdsourcing technologies. The focus is on emotional speech, a topic that has not yet been explored through informal educational methods. For this reason, the literature review above covered the impact of informal education techniques on journalism broadly, an area this paper seeks to refine and extend. Ultimately, the novelty of this approach also lies in its potential benefits for speech emotion recognition, which in turn has significant implications for the broader field of AI.

As previously stated, advancing and expanding the scientific domain of SER systems is highly challenging, as it requires a vast, high-quality corpus of speech data with emotion labels. Numerous studies have focused on creating data enrichment and annotation tools. Research [53] presents a collaborative web-based annotation tool that supports time-based segmentation, transcription, and tagging of audio and speech data. Essentially, users can segment an audio file, annotate each segment, and assign labels to it. The audio data are then uploaded together with their annotations, enriching the database and increasing the likelihood of success in tasks such as voice activity detection, speaker recognition, and emotion recognition [53]. Equally important is the research presented in [54], which addresses general challenges of SER tasks, particularly the difficulty of recognizing and distinguishing emotions, both for the respective systems and for human perception. This research advocates gradually enriching the training data, starting by training the system on simple annotated speech data and progressing to increasingly complex data. The model is based on the “minmax entropy framework”, which includes ambiguously labeled speech data during the training stage. This approach makes it possible to describe how difficult it is to discriminate between emotional speech samples, as judged by the raters [54].
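
The sketch below illustrates two ideas from this paragraph under simplifying assumptions: a time-based segment annotation record in the spirit of [53], and an easy-to-hard ordering of training samples based on how much raters disagree. The ambiguity score here is the Shannon entropy of the raters' labels; it is a deliberately simple stand-in and not an implementation of the minmax entropy framework used in [54]. All segment times, transcripts, and labels are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """Time-based annotation of part of an audio file (times in seconds)."""
    start: float
    end: float
    transcript: str
    rater_labels: list  # one emotion label per rater


def label_entropy(labels):
    """Shannon entropy (bits) of the raters' labels; 0.0 means full agreement."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


def curriculum_order(segments):
    """Order segments from least to most ambiguous, for easy-to-hard training."""
    return sorted(segments, key=lambda s: label_entropy(s.rater_labels))


# Hypothetical segments of one recording, each labeled by four raters.
segments = [
    Segment(0.0, 2.1, "first utterance", ["anger", "fear", "sadness", "anger"]),
    Segment(2.1, 3.8, "second utterance", ["happiness"] * 4),
    Segment(3.8, 5.0, "third utterance", ["sadness", "sadness", "fear", "sadness"]),
]
for seg in curriculum_order(segments):
    print(f"{seg.start:.1f}-{seg.end:.1f} s  entropy={label_entropy(seg.rater_labels):.3f}")
```

Unanimously labeled segments come first and the most contested ones last, so a model would be exposed to clear examples before ambiguous ones.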

This work focuses on the enrichment of the Acted Emotional Speech Dynamic Database (AESDD). The database was constructed by recording utterances with acted emotional expression in Greek and English, covering five emotional states: anger, fear, happiness, sadness, and disgust [19,41,55]. The AESDD is a dynamic database that permits the continuous addition of emotional speech samples, allowing its data to expand through crowdsourced contributions [20].
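
Because the AESDD is designed to grow through contributions, each new sample must carry a valid label before it is accepted. The following sketch shows one way such a check could work; the class and field names, the file paths, and the use of ISO 639-1 codes for Greek and English are assumptions, while the five emotion labels are those listed above.

```python
from dataclasses import dataclass

# The five emotional states listed for the AESDD.
EMOTIONS = {"anger", "fear", "happiness", "sadness", "disgust"}
# Greek and English as ISO 639-1 codes (an assumption made for this sketch).
LANGUAGES = {"el", "en"}


@dataclass(frozen=True)
class Sample:
    # Hypothetical fields; the actual AESDD storage format is not described here.
    audio_path: str
    emotion: str
    language: str
    contributor_id: str


class DynamicCorpus:
    """Append-only collection that accepts only correctly labeled samples."""

    def __init__(self):
        self._samples = []

    def add(self, sample):
        if sample.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion label: {sample.emotion!r}")
        if sample.language not in LANGUAGES:
            raise ValueError(f"unsupported language: {sample.language!r}")
        self._samples.append(sample)

    def count_by_emotion(self):
        counts = dict.fromkeys(sorted(EMOTIONS), 0)
        for sample in self._samples:
            counts[sample.emotion] += 1
        return counts


corpus = DynamicCorpus()
corpus.add(Sample("contributions/0001.wav", "anger", "el", "user-17"))
corpus.add(Sample("contributions/0002.wav", "sadness", "en", "user-04"))
print(corpus.count_by_emotion())
```

A sample labeled "surprise", an emotion recognized by many SER systems but not among the five AESDD states, would be rejected with a ValueError. Validating at submission time keeps continuous crowdsourced growth from degrading label consistency.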

The AESDD served as the starting point for identifying and designing various strategies and methods to enrich it, continuously improve it, and develop its capabilities. The initial effort involved creating a website that allowed users to record emotional speech, collect and manage emotion-tagged recordings, and use it as a digital repository [20]. However, this approach failed to attract sufficient interest or active participation from most individuals. Consequently, a second approach was adopted: a digital game designed to provide entertainment and encourage further engagement. By enabling players to identify with the game’s heroes and express emotions authentically, the game aimed to enhance the validity and quality of the collected audio samples [56]. Although this approach proved more appealing and practical than the original one, it remained primarily a recreational option that lost its appeal once the game was completed. Nevertheless, it demonstrated the beneficial effects of game-based activities on both individual improvement and the progress of SER systems. There is therefore fertile ground for designing and developing gamified digital activities that could actively and catalytically help address the aforementioned issues.

3. Research Aims

This section outlines the motivations that led to the design and development of this project. It also sets out the objectives of the research through research questions and hypotheses. Finally, it documents the contributions of this work.

3.1. Research Aims and Project Motivation

The review of existing studies identified significant advances in SER systems while also highlighting several limitations and challenges in this sector. These challenges pertain both to the operation and performance of SER systems and to identifying and implementing methods for the qualitative, continuous, and efficient collection of speech samples with emotion labels. The literature review also revealed the promising outcomes achievable with playful and engaging methodologies, such as serious games, gamification technologies, and crowdsourcing. The impetus for this work was the integration of these two areas: exploiting gamification technologies and crowdsourcing approaches to address the barriers present in SER systems. This integration also opened the way to examining several other scientific areas, particularly learning and literacy through alternative activities and digital environments.

Accordingly, based on the aforementioned analysis, the following research hypotheses (RH) were formed:

Hypothesis 1 (RH1).

The audience has minimal or relatively limited experience using modern digital tools based on AI and ML technologies in their daily needs and activities;

Hypothesis 2 (RH2).

The audience is willing to stay informed, develop new skills and habits, and improve existing knowledge through enjoyable and progressive methods;

Hypothesis 3 (RH3).

Serious games and gamification technologies have the potential to provide engaging, interactive, and educational experiences, thereby increasing an individual’s motivation for participation, commitment, and performance.

Based on the research hypotheses, the following research questions (RQs) are formulated:

RQ1:

Can games and gamification be properly designed and combined with AI technologies so that users can learn a specific skill, such as emotional speech?

RQ2:

Could specific digital tools (serious games and gamification technologies) achieve optimal audience response and performance, as well as engagement, contributing to sufficient and reliable crowdsourced data collection?

RQ3:

Is it possible to simultaneously promote AI algorithms, data, and literacy through digital tools (serious games and gamification technologies) that are associated with entertainment and fun?

3.2. Research Contribution

This multifaceted project contributes to the field of SER by integrating gamification technologies, crowdsourcing approaches, and digital literacy. First, it introduces an innovative gamified application designed to collect high-quality emotional speech data, thereby supporting the development and accuracy of SER systems. This approach can help overcome existing challenges in the SER field, such as the scarcity of diverse and high-quality emotional speech datasets, the difficulty of capturing authentic emotional expressions, and the limited ability of models to generalize across accents [25,26,27,28]. Second, the quantitative findings indicate that approximately 70% of participants preferred learning through alternative educational tools (Q11). In addition, around 72% of participants agreed that serious games and gamified applications could enhance their motivation to practice and improve skills such as emotional expression (Q19). These findings indicate that the serious games and gamification technologies offered through this study can provide engaging educational experiences. These technologies promote digital and media literacy while enhancing understanding of AI mechanisms. The research also explores the design and effectiveness of specific tools and tactics, showing their potential to teach selected skills, such as emotional speech, and to facilitate reliable data collection through crowdsourcing. Furthermore, this study highlights the role of gamified digital approaches in advancing AI technologies, fostering the enjoyable development of various literacies, and creating collaborative digital communities. By leveraging these approaches, the research promotes cooperation and mutual benefit between application users and researchers working on AI and SER mechanisms.
Overall, this work contributes to the creation and promotion of innovative methods for data collection, education, and community building in the SER sector.

4. Materials and Methods for SER Gaming and Gamification Strategies: Simultaneously Advancing AI Algorithms, Data, and Literacy

Before detailing the development of the application, it was crucial to understand the disciplines involved in this research and to conduct a comprehensive review of the existing scientific and academic literature on this subject. Thus, Section 4 presents a methodological and technological framework for designing a gamified digital application with an educational focus. As previously stated, this application represents a continuation of previous efforts to explore the scientific field of SER. Each approach has emphasized the qualitative and continuous enrichment of the emotional speech database, thereby enhancing the robustness of these systems.

The waterfall model was employed for the project’s development because the project is inherently complex and aims at an interdisciplinary synthesis. A structured, sequential model, in which the completion of one phase signals the start of the next, was therefore deemed essential. The principal phases of this model are application description, requirement definition, design specification, subsystem development and testing, system integration and testing, and, finally, operation and maintenance [57].
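
The defining property of the waterfall model, that a phase may start only after the previous one is complete, can be expressed as a small state machine. This sketch simply encodes the six phases listed above; the class and method names are hypothetical.

```python
# The six phases of the waterfall model, in the order listed above.
PHASES = [
    "application description",
    "requirement definition",
    "design specification",
    "subsystem development and testing",
    "system integration and testing",
    "operation and maintenance",
]


class WaterfallProject:
    """Tracks progress through strictly sequential phases (hypothetical helper)."""

    def __init__(self, phases):
        self.phases = list(phases)
        self.completed = []

    @property
    def current(self):
        """The phase in progress, or None once all phases are complete."""
        if len(self.completed) < len(self.phases):
            return self.phases[len(self.completed)]
        return None

    def complete(self, phase):
        """Mark the current phase as done; refuse any out-of-order completion."""
        if phase != self.current:
            raise RuntimeError(f"cannot complete {phase!r}; current phase is {self.current!r}")
        self.completed.append(phase)


project = WaterfallProject(PHASES)
project.complete("application description")
print(project.current)  # requirement definition
```

Attempting to complete "design specification" at this point raises an error, reflecting the model's rule that no phase can be skipped.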

4.1. Analysis

In the analysis phase, the project’s focus was defined. Key tasks included conceptualizing the initial idea, identifying the target audience, defining the intended users and community, reviewing previous projects, identifying their limitations, and devising solutions to potential problems. Three principal processes were carried out during this phase, as described below.

The first process involved reviewing previous research and drawing on its lessons. The motivation for the present research was to develop a method for collecting samples of emotional speech qualitatively and continuously, with the aim of improving the robustness of SER systems. As previously mentioned, two prior studies pursued the same objective [20,56]. Although their initial results were promising, indicating improvements in the quality and quantity of the emotional speech samples collected and, consequently, in the performance of the respective SER systems, the absence of a clearly defined target group became evident. Specifically, the first project invited the public to participate voluntarily by recording their voices with specific emotional labels, while the second employed a similar process but engaged the general public through a digital serious game. In both cases, the lack of a targeted audience led to low motivation to engage and to the projects’ rapid abandonment. As a result, a consistent stream of emotionally annotated speech data could not be maintained, which impeded the training of the SER systems and stalled their development and performance.

The second concern of this research therefore became the establishment of a target group, on the premise that a clearly defined audience would have a stronger motivation to record and contribute emotionally labeled speech. A review of current societal needs indicated that the most suitable target group comprises both trained and untrained journalists, news anchors, and broadcasters, and, to a lesser extent, the general public. In exercising their communication skills, these users must meet high standards of vocal quality as well as verbal and non-verbal expressiveness [58]. It is therefore logical to create a project that combines the continuous need to record and collect emotionally labeled speech samples with the training of media professionals and non-professionals. An audience analysis was conducted through interviews, focus groups, and online questionnaires to identify the main objectives of the new project for different user profiles, as reflected in the demographic and psychographic data collected.

The third and primary process involved describing key requirements and guidelines for use in subsequent design iterations. Previous attempts to create a tool for the qualitative and quantitative collection of emotional speech samples highlighted the positive contribution of digital environments and interactive technologies, particularly their ability to attract interest and elicit a positive response from the audience. The present project was therefore also structured within a digital interaction context. The decision to build a gamified digital environment was based on feedback from previous experiences, which recognized existing positive and constructive features but also called for new elements to support the tool’s evolution. Consequently, a blueprint was created outlining the components of the gamification tool. Focused on continuously feeding the emotional speech database and retraining the SER model, it combines multiple sources of reinforcement, including gamification, serious games, and crowdsourcing. The blueprint also addresses the societal dimension, as the project is expected to foster groups of individuals who share and express common interests, desires, goals, and habits (Figure 1).

4.2. Design

The design phase involved activities related to the creation of the application structure, flowcharts, high- and low-fidelity templates, and navigation structure. This section summarizes the key outcomes of this phase.

The application is named “J-Plus”, a name derived from a description of its content: “a journalist’s practice in linguistics, utilizing tools for speech emotion”.

The design of the gamified application rested on two main pillars, reflecting its dual focus. The first objective was to use the tool to enhance SER systems. The specification therefore combined three distinct methodologies for creating and enhancing emotional speech databases, distinguished by the manner and quality of data collection: natural, acted (simulated), and elicited (induced). The natural category involves gathering spontaneous, genuine samples of emotional speech captured in authentic contexts and situations. The acted category involves recording emotional speech through the professional contribution of actors and artists, aiming for maximum consistency of the emotion in active speech. The elicited category involves creating simulated situations that evoke specific emotions in a more natural manner [59,60]. Accordingly, the design objective along this axis was to find a way of combining these three categories of data collection (Figure 2) to optimize the performance of the SER system.

These considerations shaped the specifications and design of the second objective. The concept of combining three distinct categories of emotional speech data led to the design of a gamified application comprising three discrete modules, each offering a different approach to the acquisition of emotional speech. Because this is a digital training tool with elements of interaction and play, the separation into three modules/methods was emphasized, giving users the flexibility to choose their preferred method of training in emotional speech. Beyond offering training methods, integrating three distinct methods also reveals user preferences: it shows which method users find most attractive, so that the preferred method can be further enhanced and exploited. As shown in Figure 2, the first module is a serious game. Users are immersed in a digital world, identifying with the central character to complete various challenges and achieve specific goals. This module engages users in informal educational activities that provide enjoyable practice in emotional speech while collecting high-quality data according to the acted speech methodology. The second module simulates activities tailored to the needs of journalism professionals in oral presentation and information delivery. Users interact with existing journalistic articles, documents, and news programs, either presenting them to practice their emotional expression or observing and evaluating them to identify errors and inconsistencies in the conveyed emotions. This module aims to teach and refine emotional speech skills based on real-world events, conditions, and situations, integrating both the acted and elicited speech methodologies. The third module was designed to foster a sense of personal commitment between the user and the application, following the natural speech data collection methodology. It aims to remove barriers, rules, and inhibitions regarding emotional expression, allowing individuals to articulate their emotional states freely. This approach culminated in the creation of an emotional diary. The subsequent phase of the project involved defining scenarios for each module, along with the objectives that the application is designed to achieve.

The environment of the gamified application was designed to be two-dimensional (2D), with interaction possible via desktop or laptop computers, as well as mobile devices. User engagement involves visual, auditory, and tactile elements, and an internet connection is required. Although the application is primarily designed for individual use, it also offers the possibility of connecting and interacting with other users as a secondary feature, leveraging its crowdsourcing nature.

Based on this information, a series of low-fidelity prototypes was developed, which subsequently informed the creation of high-fidelity prototypes illustrating the various modules and options available within the application (Figure 3).

As previously discussed, the objectives of the gamified application were defined with a particular emphasis on the robustness of the SER systems and the education of individuals on emotional discourse. The correlation of these two themes provided a foundation for the development of additional objectives, which are set to be achieved in a subsequent phase (Figure 3). Essentially, users are provided with a digital tool that specializes in facilitating learning and improvement in the area of emotional speech utterance through gamification tasks. Concurrently, users have the option to select the methodology they deem most effective and conducive to training and enhancement in this domain. In addition to facilitating the development and practice of emotional speech, gamification can assist users in managing their emotional state, improving communication skills, and enhancing their digital literacy. Furthermore, the provision of a digital interactive environment has the potential to enhance user engagement and foster continuous interest. The combination of this feature with the option for users to choose the most effective method of training and practice facilitates a more convincing and realistic expression of emotional speech. Consequently, it becomes feasible to collate extensive datasets with emotional speech annotations, which enhance and advance SER systems. In conclusion, this work contributes to the creation of a collaborative model that benefits both target users and researchers in the field of audio recognition.

4.3. Development

This project is designed with two principal objectives, and thus, the development of the gamified tool is divided into two parts for more detailed analysis.

In accordance with the specifications established during the design phase, the gamified application had to be an accessible digital interaction environment compatible with most computing devices. The platform chosen for its design and implementation is “Genially”, a web-based environment for developing a variety of digital and interactive products. It allows visual, audio, and interactive elements to be integrated to attract and retain the attention of end users. Its user-friendly interface and intuitive functionality remove the need for advanced technical expertise, or even a basic understanding of code, thereby simplifying development. Instead of code, the design screen offers visual scripts and functions: interactions are integrated, and functionality is assigned to options and buttons, through visual programming and schematics.

The next stage of development involved augmenting the emotional speech database and incorporating the SER mechanism. A SER model, trained on previously collected data, provides emotion detection and prediction in two sections of the application: the first section (the serious game mode) and the second (the simulation application for presenting a news story). The model is exposed through a web service that communicates with the gamified application. As more reliable, emotionally tagged speech recordings are collected, the dataset is extended with multisource data, and the newly acquired data are used to retrain the SER model and enhance its robustness. The emotional speech data used for training and testing the SER model cover five categories: anger, fear, happiness, sadness, and disgust.
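The paper states only that the SER model is exposed through a web service that the gamified application calls. A minimal sketch of such an endpoint, using only Python’s standard WSGI interface, is shown below. The `/predict` route, the JSON field name `confidences`, and the `predict_emotions` stub are hypothetical; a real service would decode the uploaded audio and run the trained Keras model instead of returning a uniform distribution.

```python
import json

# The five AESDD emotion classes used throughout J-Plus.
EMOTIONS = ["anger", "fear", "happiness", "sadness", "disgust"]


def predict_emotions(audio_bytes: bytes) -> dict:
    """Hypothetical stand-in for the trained SER model.

    A real service would decode the audio, compute mel spectrograms, and run
    the CNN; this stub returns a uniform distribution so the sketch runs.
    """
    p = 1.0 / len(EMOTIONS)
    return {emotion: p for emotion in EMOTIONS}


def ser_app(environ, start_response):
    """WSGI app: POST raw audio bytes to /predict, receive JSON confidences."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    audio = environ["wsgi.input"].read(length)
    body = json.dumps({"confidences": predict_emotions(audio)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For local testing, the app can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, ser_app).serve_forever()`; a production deployment would normally sit behind a dedicated WSGI server.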

The model is based on a 2D convolutional neural network (CNN) architecture trained on mel spectrograms of speech signals [20,61]. It is a classification model trained on data from the AESDD emotional speech dataset, which contains recordings of six actors expressing discrete spoken emotions in the aforementioned emotional classes [19]. Emotion classification is performed on successive 1.3 s time windows with 50% overlap, and the utterance-level result is obtained by a majority vote across windows [61,62]. This majority voting mechanism is weighted by the classification probabilities of each emotion class. Consequently, the model output is a probabilistic scheme indicating the presence of each emotion with a corresponding confidence level, rather than a single categorical label. The convolutional architecture learns hierarchical features from the 2D mel spectrograms automatically during training, without hand-crafted features. Training was performed with the Keras 2.13.1 deep learning framework in Python, which allows trained models to be exported in a form compatible with the proposed web service architecture. Further details on the training process, model architecture, and fine-tuning can be found in [61].
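The windowing and aggregation steps described above can be sketched as follows. The 1.3 s window length and 50% overlap come from the text; the 16 kHz sample rate, the function names, and the specific weighting rule (averaging the per-window softmax probabilities, often called soft voting) are assumptions for illustration, and the published implementation [61] may differ.

```python
from typing import Dict, List, Tuple

EMOTIONS = ["anger", "fear", "happiness", "sadness", "disgust"]


def window_bounds(n_samples: int, sr: int = 16000,
                  win_s: float = 1.3, overlap: float = 0.5) -> List[Tuple[int, int]]:
    """Start/end sample indices of successive analysis windows.

    Windows of `win_s` seconds advance by (1 - overlap) * win_s seconds.
    A trailing segment shorter than one window is dropped, so a clip shorter
    than 1.3 s yields no windows (padding would be needed in that case).
    """
    win = int(round(win_s * sr))
    hop = int(round(win * (1.0 - overlap)))
    return [(start, start + win) for start in range(0, n_samples - win + 1, hop)]


def aggregate_windows(window_probs: List[List[float]]) -> Dict[str, float]:
    """Probability-weighted (soft) vote over windows.

    Each row holds one window's class probabilities in EMOTIONS order; the
    rows are averaged, so confident windows weigh more than uncertain ones.
    """
    n = len(window_probs)
    if n == 0:
        raise ValueError("no windows to aggregate")
    return {emotion: sum(p[i] for p in window_probs) / n
            for i, emotion in enumerate(EMOTIONS)}
```

Unlike a hard majority vote, which counts only each window’s top label, this keeps a full confidence distribution per utterance; if a single label is needed, it is the emotion with the highest aggregated value.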

4.4. Evaluation

The evaluation process covered several system areas: proper functioning, the defined objectives, design and interface requirements (user experience/UX and user interface/UI), and future specifications, improvements, and maintenance. System testing serves to gather feedback in the form of evaluations that guide improvements during maintenance, that is, identifying and addressing errors, weaknesses, and deviations and enhancing specific parameters and functions. The final product was evaluated in two categories, using quantitative and qualitative measurements.

4.4.1. Participants

A total of forty-eight (n = 48) individuals participated in the evaluation process, divided into two distinct groups. Group 1 carried out a quantitative evaluation by completing a digital questionnaire. This group consisted of forty-four (n = 44) participants, who responded to the questionnaire both before and after interacting with the gamified application. The group had a balanced gender distribution (18 males, 18 females, 8 others) and a diverse age range (18–25: 27.27%, 26–35: 22.73%, 36–45: 18.18%, 46–55: 13.64%, 56+: 4.55%, Prefer not to answer: 13.64%). The mean age of the participants was 34.00 years, with a standard deviation of 11.88 years. Ethical approval for this research was obtained from the “Committee on Research Ethics and Conduct” of the Aristotle University of Thessaloniki. Participation in the evaluation was anonymous and followed the procedures and regulations established by the committee. Participants were informed in writing about the study’s data, requirements, goals, and objectives before taking part, and consent to the stated terms was a prerequisite for participation. Participants could also withdraw from the study at any time without submitting their data.

Group 2 carried out the qualitative evaluation of the gamified application. This group consisted of four (n = 4) experts, each from a different scientific field. The first two experts hold degrees in electrical and computer engineering as well as doctorates. The first specializes in designing user interfaces for mobile applications and the web and has participated in the development of games and gamification systems (Expert #1). The second works on multimedia and audiovisual semantics technologies, with a particular focus on sound design for digital media and the operation of speech recognition systems (Expert #2). The third expert comes from the field of education, specializing in digital media and in identifying and implementing diverse frameworks for learning approaches (Expert #3). The fourth is a graduate in journalism and social media, currently pursuing a doctorate specializing in digital media in journalism (Expert #4).

4.4.2. Experiment

As previously mentioned, the evaluation combined quantitative and qualitative methods. Group 1 took part in the quantitative evaluation by completing a structured questionnaire with three sections. In the first section, completed before using the application, participants answered a series of closed-ended questions designed to establish their existing knowledge of, and engagement with, AI, serious games, gamification processes, and related topics. The second section was completed after participants had interacted with the gamified application. It aimed to determine the extent to which participants had acquired knowledge and developed skills related to emotional speech through engagement and immersion in a digital world with gamification and interaction elements, and to gather feedback on users’ behavior toward the application using indicators such as familiarity, entertainment, enthusiasm, clarity of targeting, and effectiveness. The third section covered demographic variables. Most questions asked respondents to indicate their level of agreement with a series of statements on a five-point Likert scale, where 1 represents “not at all” and 5 represents “very much”; multiple-choice questions were also included. Table 1 shows the detailed structure of the questionnaire. The questionnaire was administered anonymously through the web-based software “LimeSurvey” and disseminated via the internet and social media.
Online distribution and completion preserved participants’ anonymity and left them free to take part in the research or withdraw from it at any time. All participants in the quantitative evaluation interacted with the gamified application once, between the first and second sections of the questionnaire. Both the questionnaire and the gamified application were accessed on participants’ own devices (desktops, laptops, mobile devices).

Group 2 carried out the qualitative evaluation through experimental sessions and discussion groups. The experts were free to use the gamified application as many times as they deemed necessary to reach final conclusions, and they were required to do so on different devices (desktops, laptops, portable devices). The objective of these activities was to identify the weaknesses and shortcomings of the final product, to test the implementation and performance of the “five E’s” (effectiveness, efficiency, engagement, fault tolerance, ease of learning), and to validate the enhancement of the AESDD database and the proper functioning of the SER systems.

4.4.3. Metrics

The performance of the proposed system was evaluated using both quantitative and qualitative metrics. Quantitative metrics included participants’ responses to the questionnaire, which were measured using a five-point Likert scale and multiple-choice questions. Qualitative metrics involved expert feedback on the “five E’s” (effectiveness, efficiency, engagement, fault tolerance, ease of learning) and the overall functionality and effectiveness of the gamified application. These metrics provided comprehensive insights into the system’s performance, user engagement, and educational impact. The feedback and qualitative evaluation reports are presented in the following section, which addresses the results.

In conclusion, the sample of forty-eight individuals who participated in the quantitative (n = 44) and qualitative (n = 4) evaluation processes is deemed sufficient for the pilot validation process.

5. Experimental Results and Discussion

This section presents the qualitative and quantitative findings regarding the design and implementation of the gamified application J-Plus. The results are discussed in relation to the research hypotheses (RH) and research questions (RQ) stated earlier.

5.1. Implemented Scenarios and Offered Functionalities

The final version of the gamified application effectively integrates gamification technologies, serious games, and crowdsourcing with the development of SER systems (https://m3c.web.auth.gr/j-plus/, accessed on 30 September 2024). This integration facilitates the exploration of interconnections and mutual assistance across different scientific and research disciplines. The J-Plus application commences with the home screen, which presents the user with five distinct modules (Figure 4). Each module or option serves a unique function, allowing users to select their preferred method for training and enhancing their emotional speech delivery.

The “About” module informs users about the purpose and potential uses of the application. The “Game” section features a serious game designed to engage users in enjoyable yet challenging tests, incorporating elements of competition, practice, and evaluation relevant to expressing specific emotions (game mode: acted speech). The “News Anchor” module focuses on training and improving emotional speech for individuals in journalism and offers two versions: (a) training in emotional speech delivery through the presentation of authentic news items and events (simulation mode: acted speech) and (b) observation and analysis of the emotional speech of professional speakers (simulation mode: elicited speech). The “Emotional Diary” module serves as a record of emotional experiences, inviting users to express their thoughts freely and record their current emotional state, thereby contributing authentic emotional expressions to the application’s database (personal mode: natural speech). The final section, “Profile”, summarizes and classifies users based on their engagement and interaction within each section of the application. It shows the user’s progress in each part of the application, as well as the emotions they have experienced and successfully managed, grouped into five categories: anger, fear, happiness, sadness, and disgust. These categories correspond to the emotion classes of the AESDD and are used consistently across all sections of the application (Figure 5, Figure 6 and Figure 7).

Notably, the “Profile” section also facilitates the crowdsourcing function, allowing users to be redirected to an external environment beyond the application. Through processes of general discussion, comparison of levels and performance of expressed emotions, exchange of opinions, and resolution/support for any problems and difficulties that may arise, a digital community can be established, focusing on emotional discourse.

The SER component builds on the data accumulated during users’ engagement with the J-Plus application. These new data are used when predicting the emotional content of users’ speech, a process implemented across several modules of the application, where the predictions serve as evaluation indicators of how well the targeted emotion is being expressed at any given time. Because these indicators make the newly collected samples more reliable, the application creates a continuous, high-quality supply of emotional speech recordings. This allows the system to be retrained, enhancing the robustness of the SER model, while the same iterative process gives users performance feedback that supports the continuous improvement of their delivery.
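One way to read this loop is as a confidence-gated data pipeline: a recording tagged with a target emotion enters the training pool only when the model’s confidence for that emotion is high enough, and retraining is triggered once enough new samples have accumulated. The paper does not specify these rules, so the 0.6 threshold, the batch size of 200, and all class and field names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Recording:
    user_id: str
    target_emotion: str            # emotion the user was asked to express
    confidences: Dict[str, float]  # SER output for this recording


@dataclass
class TrainingPool:
    min_confidence: float = 0.6    # hypothetical acceptance threshold
    retrain_every: int = 200       # hypothetical number of new samples per retraining round
    accepted: List[Recording] = field(default_factory=list)
    pending: int = 0               # accepted samples since the last retraining

    def submit(self, rec: Recording) -> bool:
        """Admit the recording if the model's confidence in its target label
        reaches the threshold; return True when a retraining round is due."""
        if rec.confidences.get(rec.target_emotion, 0.0) >= self.min_confidence:
            self.accepted.append(rec)
            self.pending += 1
        if self.pending >= self.retrain_every:
            self.pending = 0
            return True
        return False
```

Gating on model agreement keeps noisy labels out of the pool, but it can also bias the data toward what the current model already recognizes, which mirrors the expert concern about users adapting their delivery to the machine.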

5.2. Analysis and Pilot (Usability) Evaluation of Services

This research builds on previous efforts to develop methods and strategies for collecting emotional speech data and developing SER systems, namely the creation of a website [41] and a serious digital game [56]. In contrast, the present proposal focuses on a gamified application. The expert group was therefore asked to compare the new method with the previous ones in order to obtain more accurate and targeted feedback. The three approaches (the website, the serious game, and the gamified application) were compared on five indicators: (a) engagement, (b) pleasure–satisfaction, (c) effectiveness, (d) skill development and literacy, and (e) sample/crowdsourcing generalization [63]. The website scored lowest of the three, indicating that incorporating game elements or gamification can improve user attraction and engagement (website: mean 1.2, st. dev. 0.45; serious digital game: mean 3.8, st. dev. 0.45; gamified application: mean 4.2, st. dev. 0.84) as well as performance (website: mean 1.8, st. dev. 0.84; serious digital game: mean 2.6, st. dev. 0.89; gamified application: mean 4.2, st. dev. 0.84). The gamified application also produced more favorable outcomes than the serious game, particularly in skill development and literacy (serious digital game: mean 3.4, st. dev. 0.55; gamified application: mean 4.2, st. dev. 0.84). Defining a target group for the application, together with its combinatorial nature, may help capture users’ attention and foster daily, consistent engagement. These results suggest that the research objectives are more likely to be achieved through the gamified J-Plus application.

Specifically, regarding the gamified application, the experts provided several comments and broader observations. They proposed modifications to the content and the graphical user interface (GUI). One significant modification concerned the “About” section: the experts considered it important that information on each of the application’s options also be provided on the opening screen of the corresponding module, so that users understand the nature of each module from the outset and can navigate the application more easily. These changes are expected to be implemented during the maintenance process. The experts gave positive feedback on the content of J-Plus, particularly the option for users to choose their preferred method of practicing emotional speech, and emphasized the user’s connection to the application and sustained engagement, especially compared with previous methods of collecting emotional speech data. The diversity of the modules, combined with the “Emotional Diary” option, which involves more personal user information, gives J-Plus a multifaceted character that, in the experts’ view, helps keep user engagement stable, meaningful, and of high quality. The experts also commented constructively on the potential for achieving the stated objectives. They were positive about the high-quality collection of emotional speech data and its simultaneous use within the application environment, and they highlighted the value of developing both the SER discipline and users’ digital tool literacy in parallel through engagement with this tool.

However, the experts also expressed some concerns. First, they raised reservations about the authenticity of the emotional speech produced and, consequently, the reliability of the SER database. In particular, users might distort or exaggerate their emotional delivery to match what the SER system recognizes, i.e., adapt to the emotion the machine treats as familiar and correct, because such adaptation could earn them more achievements and faster progress. The gamified application, however, targets journalism professionals who seek training to deliver news discourse with an accurate emotional load; because accurate delivery is their own goal, designing the app for this specific target group may mitigate the concern. Significant reservations were also expressed regarding privacy and ethics. The development and evolution of SER systems rely on emotional states expressed in recordings of human speech, so collecting these data raises major privacy and ethical issues. The expert group emphasized that users must receive transparent information and give informed consent for the recording and use of their voice data, and that the data must be kept secure, with access restricted to authorized personnel only. The gamified application fully informs users about the reasons and purposes for recording and storing speech during emotional speech interactions, and users can use the application only after agreeing to these terms and giving their consent. Furthermore, the emotional speech data repository is hosted on a web-based platform designated and managed by Aristotle University of Thessaloniki, so that only authorized individuals involved in this project, whose role is to develop SER systems, have access to the data. This approach aims to balance technological innovation, privacy protection, and the development and evolution of SER systems.

The second stage of the evaluation used a quantitative method: a user analysis and application evaluation questionnaire. As previously stated, the questionnaire is divided into several sections. The initial sections concern participants’ preexisting knowledge and habits before using the J-Plus application, and the final section concerns their evaluation of the application after use (Table 1). Before the statistical analysis, a reliability test based on Cronbach’s alpha was conducted. The resulting value of 0.826 indicates good internal consistency for the questionnaire as a whole, supporting the reliability of its scales for the analyses that follow.
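For reference, Cronbach’s alpha for k items is alpha = k / (k - 1) * (1 - sum of the item variances / variance of the respondents’ total scores). A minimal sketch of this standard formula is given below; the input layout (one row per respondent, one column per item) is an assumption, and the toy matrices in the checks are invented, not the study’s data.

```python
from statistics import pvariance
from typing import List, Sequence


def cronbach_alpha(responses: List[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))

    Population variances are used throughout; sample variances give the same
    alpha because their common n / (n - 1) factor cancels in the ratio.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Perfectly consistent items, such as `[[1, 1], [2, 2], [3, 3]]`, give an alpha of 1.0; values around 0.8, like the 0.826 reported here, are conventionally read as good reliability.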

The first section of the data shows that a considerable proportion of respondents perceive their understanding of AI as limited: for Question 2 (Q2), the mean response is 2.80 and the median is 3. The interquartile range (IQR = 2) indicates a wide spread of responses, with a smaller subset of respondents reporting a more substantial grasp of the subject. This finding is supported by responses on the differences between AI and ML: for Question 1 (Q1), >65% of participants indicated that they were either unaware of these differences or understood them only to a limited extent (mean 2.675, median 3). Encouragingly, the overwhelming majority of respondents (>72%) view the potential offered by AI technologies as significant and beneficial (Q3: mean 4.10, median 4), and their intention to learn more about AI is high (Q7: mean 3.925, median 4).

Responses on serious games and gamification technologies show comparable variability, but with one distinction: respondents demonstrate some familiarity with the concepts of gamification and serious games and can differentiate between them, as shown by their ability to correctly match examples of serious game and gamification applications in Question 8 (Q8). Similarly, responses to Question 11 (Q11) indicate a preference for a more enjoyable, alternative mode of education over traditional teaching methods (mean 3.37, median 3). According to Questions 12 (Q12) and 13 (Q13), entertainment and enjoyment, a sense of relaxation and stress relief, and the excitement of challenge and competition are the elements of serious games and gamification technologies most likely to motivate respondents to engage in alternative modes of education. Question 14 (Q14) further highlights ease of use as a significant factor in the effectiveness of multimedia products such as serious games and gamification technologies.
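The per-question summaries used above (mean, median, IQR) can be computed from raw Likert responses as in the following sketch. The response vector in the check is invented for illustration and is not the study’s data. Quartile conventions also differ between statistical packages; this sketch assumes the `inclusive` method of Python’s `statistics.quantiles`, which may not match the software the authors used.

```python
from statistics import mean, median, quantiles
from typing import Dict, List


def likert_summary(responses: List[int]) -> Dict[str, float]:
    """Mean, median, and interquartile range of 1-5 Likert responses."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(responses, n=4, method="inclusive")
    return {"mean": mean(responses), "median": median(responses), "iqr": q3 - q1}
```

For example, the hypothetical responses `[1, 2, 2, 3, 3, 3, 4, 4, 5]` give a mean of 3, a median of 3, and an IQR of 2.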

Based on the analysis of the questionnaire results, the J-Plus application contributes significantly to the development of users’ digital literacy. The pre-engagement responses indicate varied levels of understanding of AI and digital skills, with many users starting with moderate to limited knowledge. For example, >27% of participants had no understanding of the distinctions between AI and ML, and >41% had limited or no knowledge of AI technologies. Post-engagement responses are considerably more positive: >55% of users felt comfortable or very comfortable expressing different emotions during spoken communication, and >53% agreed or strongly agreed that training in emotional speech is achievable through gamified activities. Additionally, >67% of users believe that serious games and gamification applications could enhance their motivation to practice and improve emotional speech, and >75% recognize the long-term benefits of improving emotional speech delivery. These findings suggest that J-Plus not only enhances users’ technical understanding of AI but also fosters essential digital skills such as emotional articulation, feedback utilization, and motivation for continuous learning. Therefore, the J-Plus application effectively supports the development of digital literacy among its users.
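Figures such as “>53% of users agreeing or strongly agreeing” are shares of respondents whose answer falls at or above a cut-off on the five-point scale. Figures such as “limited or no knowledge” use a cut-off at the low end instead. The sketch below assumes answers coded 1–5, with 4 = agree and 5 = strongly agree. Both this coding and the helper names `share_at_least`/`share_at_most` are assumptions for illustration, not taken from the questionnaire.

```python
def share_at_least(responses: list[int], cutoff: int) -> float:
    """Percentage of responses >= cutoff, e.g. cutoff=4 for 'agree or strongly agree'."""
    if not responses:
        raise ValueError("no responses")
    return 100.0 * sum(r >= cutoff for r in responses) / len(responses)


def share_at_most(responses: list[int], cutoff: int) -> float:
    """Percentage of responses <= cutoff, e.g. cutoff=2 for 'none or little'."""
    if not responses:
        raise ValueError("no responses")
    return 100.0 * sum(r <= cutoff for r in responses) / len(responses)


# Hypothetical answers to one item, NOT the study's data.
answers = [5, 4, 4, 3, 2, 5, 4, 1, 3, 4]
print(f"agree or strongly agree: {share_at_least(answers, 4):.1f}%")
print(f"none or little: {share_at_most(answers, 2):.1f}%")
```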

To identify the most important relationships in the data, a correlation analysis was conducted between the questionnaire parameters. Table 2 presents the attribute pairs with the strongest positive Pearson correlations (threshold = 0.5). The analysis suggests that transforming a task into a playful activity is a crucial driver of motivation to engage with gamified applications, together with the social interaction, connection, and communication with others that such an activity facilitates. In addition, transforming a task into a gamified activity while achieving small or large goals and making continuous progress further motivates users to engage with gamified applications and reinforces their sustained engagement.
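The correlation screen described above computes a Pearson coefficient for every pair of questionnaire items and keeps the pairs that reach 0.5. It might look like the following sketch. The item names and responses are hypothetical. Treating the threshold as r ≥ 0.5 is an assumption, since the text only states that a threshold of 0.5 was used. On Python 3.10+, `statistics.correlation` computes the same coefficient; it is written out here to show the formula.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long response vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when an item has no variance
    return sxy / sqrt(sxx * syy)


def strong_positive_pairs(items: dict[str, list[float]], threshold: float = 0.5):
    """Return (item_a, item_b, r) for every pair with r >= threshold, strongest first."""
    pairs = []
    for a, b in combinations(sorted(items), 2):
        r = pearson(items[a], items[b])
        if r >= threshold:  # NaN compares False, so zero-variance items are skipped
            pairs.append((a, b, round(r, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical item scores for five respondents, NOT the study's data.
items = {
    "playful_task": [1, 2, 3, 4, 5],
    "social_interaction": [2, 2, 3, 5, 5],
    "goals_progress": [5, 4, 3, 2, 1],
}
print(strong_positive_pairs(items))
```

Only positive coefficients pass the filter, which matches the focus on positively correlated attributes. Strongly negative pairs (such as `goals_progress` against the other two hypothetical items) are discarded.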

5.3. Discussion and Answers to the Stated Research Hypotheses (RH) and Questions (RQ)

A synthesis of the findings from the quantitative and qualitative evaluations allows preliminary conclusions to be drawn regarding the hypotheses and questions posed in this research. As shown in the preceding section, participants demonstrated a moderate comprehension of AI and ML concepts: in Q1, over 68% of participants indicated either no understanding or only a little to moderate understanding. They showed a comparable level of familiarity with AI technologies: in Q2, approximately 63% of participants rated their understanding as ranging from none to very little. Notably, the descriptive statistics revealed considerable variability (high IQR), indicating heterogeneous levels of familiarity, with a substantial proportion of respondents reporting limited exposure to these technologies. Conversely, respondents perceive AI as a beneficial technology with the potential to impact society despite their limited direct experience with it. In Q3, over 72% of participants considered the capabilities offered by AI technologies useful or very useful, and in Q4, over 77% believed that the use of AI can have a significant impact on society. Therefore, hypothesis RH1 is partially confirmed. RH2 is fully corroborated by the evidence. Respondents clearly recognized the importance of digital skills in Question 5 (Q5), with over 84% of participants considering digital skills education important or very important. They also expressed support for the use of multimedia applications in educational contexts, with over 79% of respondents in Question 6 (Q6) agreeing or strongly agreeing that multimedia applications can be used for individual education.
Furthermore, there is moderate to significant interest in acquiring new knowledge about AI, as indicated by over 67% of respondents in Question 7 (Q7) who agreed or strongly agreed with this statement. Interest in acquiring knowledge through alternative means exists but is not predominant, and respondents exhibit a broadly neutral stance on the matter. Overall, the public shows a willingness to develop new skills, particularly when enjoyable and effective methods such as multimedia, gamification, and gaming technologies are involved. RH3 is likewise supported. According to the questionnaire responses, the primary motivations for engaging with serious games and gamification processes are “entertainment and enjoyment” (Q9: selected by over 65% of participants) and “relaxation and stress relief” (Q9: selected by over 43% of participants). This suggests that engagement is a crucial factor in this context. Respondents also consider “interesting and enjoyable activities” to be the most motivating factor (Q12: selected by over 65% of participants), reinforcing the notion that serious games and gamification can facilitate engaging and interactive experiences. The analysis of Question 13 (Q13) shows that an immersive environment combined with feedback is a crucial element of a successful experience. The largest shares of responses were concentrated in these two categories, with approximately 43% of participants selecting “engaging and immersive environment” and over 20% selecting “availability of feedback”. This evidence supports the assertion that these tools can effectively drive motivation, engagement, and performance. Therefore, RH2 and RH3 are confirmed, and RH1 is partially confirmed.

The analysis of the questionnaire revealed several further findings that warrant examination. Based on the responses, the most valuable features for emotional speech training were collaboration (over 50% of participants), interactivity (over 45%), narrative/plot (over 36%), and progress monitoring (over 18%). These findings suggest that well-designed gaming experiences that leverage these features could facilitate the development of emotional language skills. Respondents also recognized the importance of feeling engaged and motivated when developing skills, further supporting the potential for learning through games and gamified environments (RQ1).
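The feature percentages above sum to well over 100%. This is consistent with a multiple-response item in which each respondent could select several options. The questionnaire format is not stated in this section, so this reading is an assumption. Under that assumption, each share is the number of respondents who selected an option divided by the total number of respondents. The sketch below uses hypothetical answers; the function name `option_shares` is likewise illustrative.

```python
from collections import Counter


def option_shares(selections: list[list[str]]) -> dict[str, float]:
    """Percentage of respondents who selected each option of a multiple-response item.

    Each respondent counts at most once per option, so the shares can sum to
    more than 100%.
    """
    if not selections:
        raise ValueError("no respondents")
    counts = Counter(option for chosen in selections for option in set(chosen))
    n = len(selections)
    return {option: 100.0 * count / n for option, count in counts.most_common()}


# Hypothetical answers from four respondents, NOT the study's data.
answers = [
    ["collaboration", "interactivity"],
    ["collaboration", "narrative/plot"],
    ["interactivity"],
    ["collaboration", "progress monitoring", "collaboration"],  # duplicate ignored
]
shares = option_shares(answers)
print(shares)
print(f"sum of shares: {sum(shares.values()):.0f}%")
```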

Further analysis of the results suggests that gamified tools can effectively engage users and contribute to efficient data collection. Specifically, the responses to Questions 19 and 21 (Q19 and Q21) highlight the importance of engagement, motivation, and feedback in learning. Over 72% of participants expressed moderate to high agreement that serious games and gamification can enhance their motivation to practice and improve a skill such as emotional speech (Q19). In addition, the largest share of participants (over 34%) preferred constructive criticism as the type of feedback when learning a skill (Q21). These findings, particularly those for Q19, suggest that integrating these elements into digital tools can enhance user engagement and performance. The analysis also reveals a strong consensus on the long-term benefits of improving emotional language, which increases the potential for data collection through crowdsourcing methods. Regarding motivation to engage with gamified applications, there is a strong correlation between transforming a task into a gamified activity and social interaction and connection/communication with others. This underscores users’ need to feel a sense of belonging to a group. Engaging in gamified activities reinforces this need, creating greater motivation to participate and a higher-quality commitment to collective work (RQ2).

The findings indicate that entertainment and engagement are the primary motivators for users interacting with gamified tools. By combining educational content with enjoyable experiences, serious games and gamification can effectively promote digital literacy in an engaging manner. Based on the results, entertaining gamified tools can support literacy efforts not only in emotional language but also in understanding how AI systems work and perform, enhancing overall digital literacy while maintaining user engagement. The qualitative evaluation highlighted the successful collection of emotional speech samples, their integration into the AESDD database, and their functional performance within the SER system. This demonstrates the application’s multiple uses and benefits, which both advance AI algorithms and contribute to the empowerment and education of individuals (RQ3).

In conclusion, the analysis largely validates the research hypotheses and answers the research questions. It confirms that serious games and gamification technologies, and by extension crowdsourcing, can facilitate engagement, learning, collaboration, and data collection. Beyond these individual results, the most significant achievement is the multimodal combination of different scientific fields (serious games and gamification, crowdsourcing, SER, and AI literacy) and their joint advancement.

5.4. Limitations and Future Work

Although the findings of this research are promising, several limitations must be acknowledged. First, the variability in respondents’ understanding of AI and ML concepts, as indicated by the high interquartile range (IQR), suggests that the effectiveness of gamified tools may vary significantly across user groups. This variability is further influenced by the diverse characteristics of each user, such as cultural, social, and educational background, which could affect how they understand and use these tools. Second, only a small group of four experts took part in the qualitative evaluation. This limited number may introduce bias, as their opinions and experiences might not be representative of the broader population. Future research should include a larger and more diverse group of experts to enhance the generalizability of the findings. Third, the sample size of 48 participants, while sufficient for preliminary conclusions, may not capture the full range of user experiences and preferences. A larger sample in future studies would provide more robust data and strengthen the validity of the results. Furthermore, the findings are based on the specific context of the J-Plus application and its use in emotional speech training, so the results may not transfer directly to other contexts or types of gamified applications. Future research should therefore explore the effectiveness of gamification in different educational and professional settings. Reliance on self-reported data may also introduce bias, as participants might overestimate or underestimate their levels of understanding and engagement. It is also important to note that the long-term impact of these tools on learning and engagement remains uncertain, necessitating longitudinal studies to validate the sustained effectiveness of the proposed methods.
Finally, ethical concerns, such as data privacy and the potential for manipulation, must be meticulously addressed to ensure the responsible use of gamification and serious games in educational contexts.

Future research should focus on expanding the sample size and diversity of participants to enhance the generalizability of the findings. Longitudinal studies are needed to assess the long-term impact of gamified tools on learning and engagement. Exploring the application of gamification in various educational and professional settings would also provide a broader understanding of its effectiveness. Further investigation into the ethical implications, particularly concerning data privacy and potential manipulation, is essential to ensure the responsible use of these technologies. Future work should also leverage the collected emotionally tagged speech samples to improve the performance and robustness of SER systems. Finally, the potential of crowdsourcing for large-scale data collection and for the development of collaborative environments and digital communities should be explored to further advance AI algorithms and digital literacy.

6. Conclusions

This paper presents the design and development of a gamified application called “J-Plus”, which has a dual aim. The idea for its creation emerged from previous efforts to qualitatively enrich the AESDD database and enhance its robustness. Consequently, the initial goal was to address the need for high-quality emotional speech data to improve and advance SER systems. Previous work, a comprehensive literature review, and contemporary demands indicated that this could be achieved through serious games, gamification technologies, and crowdsourcing. This paper details the development stages of the digital application “J-Plus”, which uses a fun, enjoyable, and attractive approach to engage a larger number of users and thereby support the development of SER systems. Simultaneously, consistent with the principles of serious games and gamification technologies, it promotes the learning and improvement of emotional language within a pleasant and engaging environment.

The final project was evaluated using both qualitative and quantitative methodologies, and the findings from both approaches corroborated the research hypotheses and questions established at the outset of this investigation. Specifically, serious games and gamification technologies can be connected with AI mechanisms to support knowledge and skill acquisition, particularly the ability to express emotional speech. In Q16, over 72% of participants moderately to strongly agreed that education and improvement in emotional expression can be achieved through gamification activities. Engagement with the application also yields encouraging results in enhancing digital literacy and comprehension and in enriching knowledge about the functioning and mechanisms of AI. Furthermore, audience response, engagement, and performance can be influenced through digital tools such as serious games and gamification technologies. In Q18, approximately 77% of participants considered it important to feel commitment and motivation while learning a new skill, such as emotional speech. In Q19, approximately 70% of participants believed that serious games and gamification could enhance their motivation to practice and improve their emotional speech. The results of the J-Plus gamified application demonstrate the potential for enhancing the reliability and adequacy of crowdsourced data collection. In addition, the design and development of J-Plus highlight the potential for simultaneously advancing data collection and AI algorithms while developing various skills and cognitive domains through entertainment-oriented digital tools and tactics. Finally, this paper advocates for the creation of collaborative environments and digital communities through crowdsourcing, building on individuals’ desire and need for collective participation and contribution.

In addition to validating the research hypotheses (RH) and addressing the research questions (RQ), the quality assessment process revealed issues that merit further analysis. The expert panel identified two principal concerns: the reliability of the data and the emotional speech recording section. The first concern pertains to the general reliability of an SER database built through digital gamified activities. Specifically, users might alter their emotional speech expression to achieve better system performance and thereby meet the application’s goals. Although this concern is valid, it can be mitigated by the fact that the application targets a specific group of users, namely, professionals and non-professionals in journalism. These users are expected to engage in continuous training, performance improvement, and development to enhance their ability to deliver news speech with an accurate emotional load. The second concern relates to the emotional speech recording section. SER systems are designed to identify emotions and emotional states from recordings of human speech. Consequently, the collection and analysis of audio data raise significant privacy, ethical, and moral issues. Users must be fully informed and provide explicit consent before their voice data are recorded and used. Concerns were also raised about the security of lawfully collected data, given that they may contain sensitive information. Measures must therefore be implemented to prevent unauthorized access to these data. As a research-based project, this work is committed to upholding ethical standards and earning user trust, with the aim of safeguarding the confidentiality and privacy of users. Thus, it is possible to achieve a balance between technological innovation, privacy, and the promotion of ethics in the SER sector.

The findings of the present study can also be regarded as a foundation for future hypotheses and a basis for further investigation into the identified knowledge gaps.

Author Contributions

Conceptualization, E.S., N.V., L.V. and C.A.D.; methodology, E.S. and N.V.; software, E.S. and N.V.; validation, N.V. and L.V.; formal analysis, L.V.; investigation, E.S. and N.V.; data curation, L.V.; writing—original draft preparation, E.S.; writing—review and editing, N.V., L.V. and C.A.D.; visualization, E.S.; supervision, N.V., L.V. and C.A.D. All authors have read and agreed to the published version of this manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Approved by the Committee on Research Ethics and Conduct of the Aristotle University of Thessaloniki, Approval Code 127310/2023, 2023-05-10.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the conducted evaluation processes.

Data Availability Statement

All data that are not subjected to institutional restrictions are available through the links provided within this manuscript.

Acknowledgments

We would like to acknowledge the ethical data collection approval granted by the Committee on Research Ethics and Conduct of the Aristotle University of Thessaloniki. This approval was crucial for the conduct of our research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Appio, F.P.; La Torre, D.; Lazzeri, F.; Masri, H.; Schiavone, F. The Societal Impact of Artificial Intelligence. In Impact of Artificial Intelligence in Business and Society, 1st ed.; Routledge: London, UK, 2023; pp. 169–269.
Tsvetkova, M.; Yasseri, T.; Meyer, E.T.; Pickering, J.B.; Engen, V.; Walland, P.; Lüders, M.; Følstad, A.; Bravos, G. Understanding human-machine networks: A cross-disciplinary survey. ACM Comput. Surv. (CSUR) 2017, 50, 1–35.
Lillywhite, A.; Wolbring, G. Coverage of artificial intelligence and machine learning within academic literature, Canadian newspapers, and twitter tweets: The case of disabled people. Societies 2020, 10, 23.
Wu, Y.C.; Feng, J.W. Development and application of artificial neural network. Wirel. Pers. Commun. 2018, 102, 1645–1656.
Allen, G. Understanding AI technology. Jt. Artif. Intell. Cent. (JAIC) Pentagon U. S. 2020, 2, 24–32.
Appio, F.P.; La Torre, D.; Lazzeri, F.; Masri, H.; Schiavone, F. Artificial Intelligence: Technological Advancements and Methodologies. In Impact of Artificial Intelligence in Business and Society, 1st ed.; Routledge: London, UK, 2023; pp. 13–81.
Gruson, D.; Helleputte, T.; Rousseau, P.; Gruson, D. Data science, artificial intelligence, and machine learning: Opportunities for laboratory medicine and the value of positive regulation. Clin. Biochem. 2019, 69, 1–7.
Hamet, P.; Tremblay, J. Artificial intelligence in medicine. Metabolism 2017, 69, S36–S40.
Axtell, T.W.; Overbey, L.A.; Woerner, L. Machine learning in complex systems. In Proceedings of the Ground/Air Multisensor Interoperability, Integration, and Networking for Persistent ISR IX, Orlando, FL, USA, 4 May 2018; pp. 39–44.
Lwakatare, L.E.; Raj, A.; Crnkovic, I.; Bosch, J.; Olsson, H.H. Large-scale machine learning systems in real-world industrial settings: A review of challenges and solutions. Inf. Softw. Technol. 2020, 127, 106368.
Deng, L.; Li, X. Machine learning paradigms for speech recognition: An overview. IEEE Trans. Audio Speech Lang. Process. 2013, 21, 1060–1089.
Zhou, L.; Pan, S.; Wang, J.; Vasilakos, A.V. Machine learning on big data: Opportunities and challenges. Neurocomputing 2017, 237, 350–361.
Abbass, H.A. Social integration of artificial intelligence: Functions, automation allocation logic and human-autonomy trust. Cogn. Comput. 2019, 11, 159–171.
Huyen, C. Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2022.
Halevy, A.; Norvig, P.; Pereira, F. The unreasonable effectiveness of data. IEEE Intell. Syst. 2009, 24, 8–12.
Crippa, M. Speech Emotion Recognition in a Storytelling-Based Serious Game. Master’s Thesis, School of Industrial and Information Engineering, Politecnico di Milano, Italy, 19 December 2023.
Atmaja, B.T.; Akagi, M. Dimensional speech emotion recognition from speech features and word embeddings by using multitask learning. APSIPA Trans. Signal Inf. Process. 2020, 9, e17.
Karpouzis, K.; Yannakakis, G.N. Emotion in Games: Theory and Praxis; Springer: Cham, Switzerland, 2016.
Vryzas, N.; Kotsakis, R.; Liatsou, A.; Dimoulas, C.A.; Kalliris, G. Speech emotion recognition for performance interaction. J. Audio Eng. Soc. 2018, 66, 457–467.
Vrysis, L.; Tsipas, N.; Thoidis, I.; Dimoulas, C.A. 1D/2D deep CNNs vs. temporal feature integration for general audio classification. J. Audio Eng. Soc. 2020, 68, 66–77.
Alomari, H.W.; Ramasamy, V.; Kiper, J.D.; Potvin, G. A User Interface (UI) and User eXperience (UX) evaluation framework for cyberlearning environments in computer science and software engineering education. Heliyon 2020, 6, e03917.
Lee, F.M.; Li, L.H.; Huang, R.Y. Recognizing low/high anger in speech for call centers. In Proceedings of the International Conference on Signal Processing, Robotics and Automation, Cambridge, UK, 20–22 February 2008; pp. 171–176.
Balcombe, L.; De Leo, D. Human-computer interaction in digital mental health. Informatics 2022, 9, 14.
Fernández-Aranda, F.; Jiménez-Murcia, S.; Santamaría, J.J.; Gunnard, K.; Soto, A.; Kalapanidas, E.; Bults, R.G.A.; Davarakis, C.; Ganchev, T.; Granero, R.; et al. Video games as a complementary therapy tool in mental disorders: PlayMancer, a European multicentre study. J. Ment. Health 2012, 21, 364–374.
Yildirim, S.; Narayanan, S.; Potamianos, A. Detecting emotional state of a child in a conversational computer game. Comput. Speech Lang. 2011, 25, 29–44.
El Ayadi, M.; Kamel, M.S.; Karray, F. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognit. 2011, 44, 572–587.
Ververidis, D.; Kotropoulos, C. Emotional speech recognition: Resources, features, and methods. Speech Commun. 2006, 48, 1162–1181.
Koolagudi, S.G.; Rao, K.S. Emotion recognition from speech: A review. Int. J. Speech Technol. 2012, 15, 99–117.
Díaz-Rodríguez, N.; Del Ser, J.; Coeckelbergh, M.; de Prado, M.L.; Herrera-Viedma, E.; Herrera, F. Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation. Inf. Fusion 2023, 99, 101896.
Kalmpourtzis, G. Educational Game Design Fundamentals: A Journey to Creating Intrinsically Motivating Learning Experiences, 1st ed.; AK Peters/CRC Press: New York, NY, USA, 2018.
Yordanova, Z. Gamification as a tool for supporting Artificial Intelligence development–State of Art. In Applied Technologies, Communications in Computer and Information Science; Botto-Tobar, M., Zambrano Vizuete, M., Torres-Carrión, P., Montes León, S., Pizarro Vásquez, G., Durakovic, B., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 313–324.
Pérez, J.; Castro, M.; López, G. Serious games and AI: Challenges and opportunities for computational social science. IEEE Access 2023, 11, 62051–62061.
Tolks, D.; Schmidt, J.J.; Kuhn, S. The role of AI in serious games and gamification for health: Scoping review. JMIR Serious Games 2024, 12, e48258.
Gounaridou, A.; Siamtanidou, E.; Dimoulas, C. A serious game for mediated education on traffic behavior and safety awareness. Educ. Sci. 2021, 11, 127.
Katsaounidou, A.; Vrysis, L.; Kotsakis, R.; Dimoulas, C.; Veglis, A. MAthE the game: A serious game for education and training in news verification. Educ. Sci. 2019, 9, 155.
Katsaounidou, A.; Dimoulas, C.; Veglis, A. Cross-Media Authentication and Verification: Emerging Research and Opportunities; IGI Global: Hershey, PA, USA, 2018.
Jones, S. Disrupting the narrative: Immersive journalism in virtual reality. J. Media Pract. 2017, 18, 171–185.
Torres-Toukoumidis, A.; Gutiérrez, I.M.; Becerra, M.H.; León-Alberca, T.; Curiel, C.P. Let’s play democracy, exploratory analysis of political video games. Societies 2023, 13, 28.
Stamatiadou, M.E.; Thoidis, I.; Vryzas, N.; Vrysis, L.; Dimoulas, C.A. Semantic crowdsourcing of soundscapes heritage: A mojo model for data-driven storytelling. Sustainability 2021, 13, 2714.
Schrier, K. The Future of Crowdsourcing Through Games. In Second International Handbook of Internet Research; Hunsinger, J., Allen, M., Klastrup, L., Eds.; Springer: Dordrecht, The Netherlands, 2020; pp. 935–955.
Vryzas, N.; Vrysis, L.; Kotsakis, R.; Dimoulas, C.A. A web crowdsourcing framework for transfer learning and personalized speech emotion recognition. Mach. Learn. Appl. 2021, 6, 100132.
Arjoranta, J.; Koskimaa, R.; Siitonen, M. Immersive gaming as journalism. In Immersive Journalism as Storytelling: Ethics, Production, and Design; Uskali, T., Gynnild, A., Jones, S., Sirkkunen, E., Eds.; Routledge Taylor & Francis Group: London, UK; New York, NY, USA, 2020; pp. 137–146.
De la Peña, N.; Weil, P.; Llobera, J.; Giannopoulos, E.; Pomés, A.; Spanlang, B.; Friedman, D.; Sanchez-Vives, M.; Slater, M. Immersive journalism: Immersive virtual reality for the first-person experience of news. Presence Teleoper. Virtual Environ. 2010, 19, 291–301.
Aayeshah, W. Playing with news: Digital games in journalism education. Asia Pac. Media Educ. 2012, 22, 29–41.
Luhova, T. Journalism Education Based on Serious Games. Open Educ. E-Environ. Mod. Univ. 2021, 11, 92–105.
Huang, L.Y.; Yeh, Y.C. Meaningful gamification for journalism students to enhance their critical thinking skills. Int. J. Game-Based Learn. 2017, 7, 47–62.
Ferrer-Conill, R. Quantifying journalism? A study on the use of data and gamification to motivate journalists. Telev. New Media 2017, 18, 706–720.
García-Ortega, A.; García-Avilés, J.A. When journalism and games intersect: Examining news quality, design and mechanics of political newsgames. Convergence 2020, 26, 517–536.
Hossain, M.; Kauranen, I. Crowdsourcing: A comprehensive literature review. Strateg. Outsourcing Int. J. 2015, 8, 2–22.
Aitamurto, T. Crowdsourcing in journalism. In Oxford Research Encyclopedia of Communication; Oxford University Press: New York, NY, USA, 2019.
Aitamurto, T. Crowdsourcing as a knowledge-search method in digital journalism: Ruptured ideals and blended responsibility. Digit. J. 2016, 4, 280–297.
Lamprou, E.; Antonopoulos, N.; Anomeritou, I.; Apostolou, C. Characteristics of fake news and misinformation in Greece: The rise of new crowdsourcing-based journalistic fact-checking models. J. Media 2021, 2, 417–439.
Grover, M.S.; Bamdev, P.; Brala, R.K.; Kumar, Y.; Hama, M.; Shah, R.R. Audino: A modern annotation tool for audio and speech. arXiv 2020, arXiv:2006.05236.
Lotfian, R.; Busso, C. Curriculum learning for speech emotion recognition from crowdsourced labels. IEEE/ACM Trans. Audio Speech Lang. Process. 2019, 27, 815–826.
Vryzas, N.; Vrysis, L.; Kotsakis, R.; Dimoulas, C. Speech emotion recognition adapted to multimodal semantic repositories. In Proceedings of the 2018 13th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), Zaragoza, Spain, 6–7 September 2018; pp. 31–35.
Siamtanidou, E.; Vryzas, N.; Vrysis, L.; Dimoulas, C. A serious game for crowdsourcing and self-evaluating speech emotion annotated data. In Proceedings of the Audio Engineering Society Convention 154, Helsinki, Finland, 13–15 May 2023.
Dimoulas, C.A. Multimedia. In The SAGE International Encyclopedia of Mass Media and Society; Merskin, D.L., Ed.; SAGE Publications, Inc.: Thousand Oaks, CA, USA, 2019.
de Albuquerque Rodrigues, D.; Simões-Zenari, M.; Dos Reis Cota, A.; Nemr, K. Voice and communication in news anchors: What is the impact of the passage of time? J. Voice 2021, 38, 284–391.
Akçay, M.B.; Oğuz, K. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 2020, 116, 56–76.
Cao, H.; Verma, R.; Nenkova, A. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech. Comput. Speech Lang. 2015, 29, 186–202.
Vryzas, N.; Vrysis, L.; Matsiola, M.; Kotsakis, R.; Dimoulas, C.; Kalliris, G. Continuous speech emotion recognition with convolutional neural networks. J. Audio Eng. Soc. 2020, 68, 14–24.
Vrysis, L.; Tsipas, N.; Dimoulas, C.; Papanikolaou, G. Crowdsourcing audio semantics by means of hybrid bimodal segmentation with hierarchical classification. J. Audio Eng. Soc. 2016, 64, 1042–1054.
Siamtanidou, E.; Vryzas, N.; Vrysis, L.; Dimoulas, C. Enhancing Data Quality and Optimizing Speech Emotion Recognition Systems through a Gamified Application. In Proceedings of the Audio Engineering Society Convention 156, Madrid, Spain, 15–17 June 2024.

Figure 1. The design of the gamified application integrates SER and crowdsourcing systems with the aim of enhancing and expanding the SER database. The blueprint also illustrates how this design could support the creation of digital societies.

Figure 2. Combining three categories of emotional speech data collection enriches the database qualitatively and aims to maximize the performance of the SER systems.

Figure 3. The original user interfaces (a,b) and the “news anchor” (c) and “emotional diary” (d) modules. These high-fidelity prototypes represent the initial version of the gamified application.

Figure 4. The main “J-Plus” interfaces: (a) Start screen; (b) Main menu screen (about; game; news anchor; emotional diary; profile).

Figure 5. Main “J-Plus” interfaces accessible via the “about” section, which provides detailed information on the use and functionality of each module.

Figure 6. Main screens of the “J-Plus” application for the “news anchor” category, offering the options “practice in journalistic speech” and “recognition of journalists’ emotional speech”.

Figure 7. Two screenshots from the J-Plus application, illustrating the “emotional diary” and “profile” sections.

Table 1. Analysis and evaluation questionnaire.

ID	Question (answer options or response range)
Prior to engaging with the gamified application J-Plus
Q1	My comprehension of the distinctions between AI and ML is (1–5)
Q2	My knowledge regarding the use of AI technologies is (1–5)
Q3	I believe that the capabilities offered by AI technologies are useful. (1–5)
Q4	I believe that the use of AI applications can have a significant impact on society. (1–5)
Q5	I believe that education in digital skills is important. (1–5)
Q6	I believe that multimedia applications can be used for individual education. (1–5)
Q7	I am interested in acquiring knowledge about the fields of AI. (1–5)
Q8	Please categorize each of the following applications as either a serious game or a gamification application: (Minecraft: Education Edition; SimCityEDU; Aegean Airlines Miles+Bonus rewards program; Piraeus Bank yellow rewards program)
Q9	Which of the following characteristics motivates you to play video games? (entertainment and enjoyment; relaxation and stress relief; challenge and competition; social interaction and connection/communication with others; achievement and progress; cognitive stimulation and problem-solving; curiosity and interest in the story or game world)
Q10	I have received training through digital games or gamification applications. (1–5)
Q11	I prefer learning through alternative educational tools (e.g., serious games or gamification applications) rather than through traditional teaching methods. (1–5)
Q12	Which of the following elements encourages you to engage with gamified applications? (engaging in an enjoyable activity; encouragement of learning and progress; motivation to make an effort through rewards; completion of a challenge; turning a task into a playful activity)
Q13	What do you consider the most important factor for a successful experience with games and gamification activities? (the presence of an engaging and immersive environment; the provision of appealing rewards and incentives; the provision of clear instructions; the availability of feedback)
Q14	Which evaluation factor do you consider the most important for the effectiveness of a multimedia application? (user preference and engagement; ease of use; project effectiveness; cost-effectiveness of the project; potential for growth/evolution of the multimedia tool; feedback capability; none of the above)
Following engagement with the gamified application J-Plus
Q15	How comfortable are you in expressing different emotions during your spoken communication? (1–5)
Q16	I believe that training and improving the expression of emotional speech is achievable through games and gamification activities. (1–5)
Q17	Which of the following game and gamified application features would be most useful for training in emotional speech delivery? (narrative/plot; interactivity; competition; collaboration; rewards/points; progress tracking)
Q18	I consider it important to feel commitment and motivation while learning a new skill, such as emotional speech. (1–5)
Q19	I believe that serious games and gamification applications could enhance my motivation to practice and improve my emotional speech. (1–5)
Q20	I consider feedback and commentary important when learning a new skill. (1–5)
Q21	Which type of feedback do you consider most useful for improving emotional speech delivery? (rewards; constructive criticism; detailed analysis of achievements and mistakes; personalized guidance)
Q22	Do you believe that certain improvements in emotional speech delivery can have long-term benefits in your personal and professional life? (1–5)
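
The 1–5 Likert items in Table 1 and the single-choice items can be reduced to simple descriptive shares. Two examples are the proportion of participants who agree with Q19 and the proportion who pick each feature in Q17. The following is a minimal Python sketch, not the authors' analysis code. It assumes that responses of 4 or 5 count as agreement. The function names and the sample responses are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

# Assumption: 4 ("agree") and 5 ("strongly agree") count as agreement on the 1-5 scale.
AGREE_LEVELS = frozenset({4, 5})


def agreement_share(responses: Sequence[int]) -> float:
    """Return the fraction of 1-5 Likert responses that fall in AGREE_LEVELS."""
    if not responses:
        raise ValueError("no responses given")
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("Likert responses must be integers from 1 to 5")
    return sum(r in AGREE_LEVELS for r in responses) / len(responses)


def choice_shares(choices: Iterable[str]) -> Dict[str, float]:
    """Return the share of respondents choosing each option of a single-choice item."""
    counts = Counter(choices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no choices given")
    # most_common() orders options from most to least frequently chosen.
    return {option: n / total for option, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical responses for illustration only, not the study's data.
    q19 = [5, 4, 4, 3, 2, 5, 4, 1, 4, 3]
    q17 = ["collaboration", "collaboration", "competition", "rewards/points"]
    print(f"Q19 agreement: {agreement_share(q19):.0%}")
    print(choice_shares(q17))
```

With these made-up inputs, six of the ten Q19 responses are 4 or 5, giving 60% agreement. Half of the Q17 choices are "collaboration". A different agreement rule, such as counting only 5, is a single change to `AGREE_LEVELS`.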

Table 2. Pearson’s correlation coefficients between questionnaire items. Only correlations above the threshold value of 0.5 are reported (*, medium correlation; **, high correlation).

	Achievement and Progress as Motivational Characteristics	Social Interaction and Connection/Communication with Others as Motivational Characteristic	Education Through Serious Games and Gamified Applications	Feedback of Emotional Performance of Speech Expression	The Narrative/Plot Feature for Emotional Speech Training
turning a task into a playful activity	0.647576 **	0.608005 **	0.501280 *
interest in education for AI				0.543657 *
feedback					0.508520 *
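
Pearson's coefficient r is the covariance of two items' responses divided by the product of their standard deviations. Its values range from −1 to 1. The Python sketch below computes r and assigns the marks used in Table 2. The 0.5 medium threshold is taken from the table caption. The 0.6 cut-off for a high correlation (**) is a hypothetical value that the source does not state. It was chosen only because it reproduces the marks shown for the reported coefficients.

```python
import math
from typing import Sequence

MEDIUM_THRESHOLD = 0.5  # threshold stated in the Table 2 caption
HIGH_THRESHOLD = 0.6    # hypothetical cut-off for "**"; not given in the source


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    s_xy = sum(a * b for a, b in zip(dx, dy))  # co-variation of the two items
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one sample is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_mark(r: float) -> str:
    """Return '**' (high), '*' (medium) or '' using the cut-offs above."""
    strength = abs(r)
    if strength >= HIGH_THRESHOLD:
        return "**"
    if strength >= MEDIUM_THRESHOLD:
        return "*"
    return ""


if __name__ == "__main__":
    # Coefficients as reported in Table 2; with the assumed cut-offs the marks match the table.
    for r in (0.647576, 0.608005, 0.501280, 0.543657, 0.508520):
        print(f"{r:.6f} {correlation_mark(r)}")
```

For actual survey analysis, `scipy.stats.pearsonr(x, y)` returns the same coefficient together with a p-value. The hand-written version above only makes the calculation explicit.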

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

