1. Introduction
The pace of technological advancement is accelerating, and artificial intelligence (AI) has emerged as a prominent interdisciplinary field that pervades many aspects of everyday life [1,2,3,4]. The term AI describes the capacity of computers and machines to perform tasks that could be considered “intelligent” with minimal human input [5,6,7]. The primary focus here is on machine learning (ML), a subfield of AI concerned with how computers and machines learn from data and experience [5,6,7,8]. In essence, the system acquires knowledge by examining examples (training data) rather than being explicitly programmed by humans [5,9,10,11,12]. This is why AI and ML are characterized by adaptability, self-organization, and self-learning [5,6].
Although AI systems learn rather than being explicitly programmed, humans remain a crucial factor in guiding and conducting the learning phase of ML. They are responsible for a number of key tasks, including selecting algorithms, formatting data, defining learning parameters, and resolving any issues that arise [13]. The successful development of an operational ML system is contingent upon the availability of a reliable and extensive database. This is precisely the most significant challenge: the continuous collection and utilization of high-quality sample data for training purposes. The operational performance of an ML system depends on the quantity, quality, representativeness, and diversity of the data [7,12,13,14,15]. Collecting data continuously, at high quality, and in adequate amounts is itself challenging [12]. Given the pivotal role of AI and ML in domains such as information, medicine, economics, transport control, and psychology [2,3,4], it is crucial to identify a solution that prioritizes the straightforward, continuous, dependable, collaborative, and extensive collection and provision of sample data. This should be the primary objective in the ongoing development and enhancement of these systems.
The considerable focus on AI has facilitated and accelerated the development of automated systems that can recognize a range of emotions in speech, including happiness, sadness, anger, fear, disgust, and surprise [16,17,18]. These systems are referred to as speech emotion recognition (SER) systems, and their primary objective is to identify and categorize emotions expressed in human speech. They analyze acoustic features extracted from speech using machine learning models trained on annotated datasets [19,20]. SER offers a number of advantages, including the enhancement of human–machine interactions, the evaluation of user experience in digital environments, the monitoring of customer mood in customer service, the assessment of mental health conditions, and the optimization of the user’s emotional engagement and experience during digital storytelling or immersion in a computer game [21,22,23,24]. Nevertheless, automatic emotion recognition from speech remains a challenging research area that shares many of the issues encountered by the majority of AI-based systems. The obstacles faced by SER systems are arguably more intricate than those faced by other AI systems, which have proven highly functional [25]. In particular, it is difficult to identify the most appropriate signal features and classification methods [25,26]. Furthermore, data collection is inherently time-consuming and depends on the interpretations used in emotion labeling and on the considerable individual variability in emotional expression [27,28]. Additionally, concerns have been raised regarding the privacy of the recorded data and the permissible limits of analysis of users’ speech [29]. It is, therefore, essential to identify a solution that can capitalize on the potential benefits of such systems while addressing the challenges, ethical concerns, and privacy issues that have been highlighted.
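The general SER pipeline described above (acoustic features extracted from speech, then a model trained on annotated examples) can be illustrated with a deliberately small sketch. It is not the method of any cited system: the two features (RMS energy and zero-crossing rate), the nearest-centroid classifier, and the synthetic tones standing in for recordings are all assumptions chosen so the example runs with the Python standard library alone. Practical systems typically rely on richer feature sets and far larger annotated corpora.

```python
import math
from typing import Dict, List, Tuple

Features = Tuple[float, float]


def acoustic_features(signal: List[float]) -> Features:
    """Return (RMS energy, zero-crossing rate) for a mono waveform."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return rms, crossings / (len(signal) - 1)


def tone(freq_hz: float, amplitude: float, sr: int = 8000, dur: float = 0.25) -> List[float]:
    """Synthetic stand-in for a recording (assumption: no real speech is used)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sr) for t in range(int(sr * dur))]


def train_centroids(samples: List[Tuple[List[float], str]]) -> Dict[str, Features]:
    """Average the feature vectors of each annotated emotion label."""
    grouped: Dict[str, List[Features]] = {}
    for signal, label in samples:
        grouped.setdefault(label, []).append(acoustic_features(signal))
    return {
        label: (sum(f[0] for f in feats) / len(feats), sum(f[1] for f in feats) / len(feats))
        for label, feats in grouped.items()
    }


def predict(centroids: Dict[str, Features], signal: List[float]) -> str:
    """Assign the label whose centroid is closest in the (unscaled) feature space."""
    rms, zcr = acoustic_features(signal)
    return min(
        centroids,
        key=lambda lab: (centroids[lab][0] - rms) ** 2 + (centroids[lab][1] - zcr) ** 2,
    )


# Hypothetical annotated training set: loud, higher-pitched tones labeled
# "anger"; quiet, lower-pitched tones labeled "sadness".
annotated = [
    (tone(300, 0.9), "anger"), (tone(320, 0.8), "anger"),
    (tone(120, 0.2), "sadness"), (tone(110, 0.25), "sadness"),
]
centroids = train_centroids(annotated)
prediction = predict(centroids, tone(310, 0.85))
```

Even in this toy setting, the prediction depends entirely on how many labeled examples exist per emotion and how representative they are, which is the data-collection problem the paragraph above emphasizes.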
It is possible to suggest that alternative multimedia technologies could prove an effective solution, not only in terms of the evolution and development of AI systems but also in terms of providing a variety of possibilities and utilities to the individual/user. It has been demonstrated that serious games and gamification technologies can be employed as alternative multimedia applications [
30,
31,
32]. Although these alternative multimedia technologies have been explicitly linked to entertainment, their impact on various social aspects of the individual has been demonstrated over time [
33,
34]. The incorporation of elements such as competition, rewards, challenges, scoring, achievements, and the provision of a digital interactive simulation environment contribute to the impressive and enticing nature of these multimedia applications. These elements collectively serve to mobilize the user’s attention and stimulate their interest and engagement with the applications [
30]. On the basis of these characteristics, the dual literacy they offer the user emerges in terms of their subject area and the digital skills of the individual. According to the latter, the approaches of these applications prove to be ideal for familiarizing individuals with demanding and time-consuming technological achievements, such as digital media literacy needs [
34,
35].
The robust correlation between multimedia progress and sophisticated mediated communication capabilities has led to a surge in public engagement with highly interactive and immersive environments. This has enabled the formation of advanced media services that can prove highly beneficial for the individual in terms of developing digital literacy, as well as for other demanding lifelong learning tasks, by utilizing media and education in more enjoyable and effective ways [
34,
35,
36,
37]. Modern multimedia genres, especially digital games, offer users experiences, new content, and visions that transcend the digital medium itself [
38]. The discussion focuses on the creation and utilization of digital networks for the development of additional content within social networks or other virtual channels. This encompasses the formation of social organizations, interaction between users, and mass collaboration between them to achieve a common goal. In other words, a new social culture known as crowdsourcing has emerged, which is, in essence, a collaborative activity applicable to the information systems sector. It involves the co-creation and innovation of users [
39,
40,
41]. It can, therefore, be argued that the approaches developed in the field of alternative multimedia applications have the potential to be utilized in the resolution of the functional issues inherent to AI mechanisms while simultaneously focusing on the user, thereby providing them with a means of training and developing a variety of literacies.
This work addresses a number of issues related to the functionality and optimal performance of AI mechanisms in general, using speech emotion recognition (SER) systems as a case study. For these systems to operate successfully and correctly, audio recordings labeled with the emotion they express must be collected continuously and reliably. This paper puts forth a solution in the form of techniques and applied tools that center on the user rather than on the problem itself. In essence, it proposes the use of gamification technologies, serious games, and, as a consequence, the social networks they form as digital environments based on entertainment. Beyond entertainment, these digital tools are designed to optimize the user experience, engage the user, and facilitate learning and informal education. The paper also examines the concerns and issues pertaining to this project.
The remainder of this paper is structured into five sections. Section 2 examines significant works pertinent to the fields addressed in this study. The objectives of the current project are delineated in Section 3. The phases of analysis, design, development, and evaluation, which constitute the materials and methods of this research, are detailed in Section 4. Section 5 presents the experimental results and discussion, highlighting the outcomes of the final project in relation to the stated hypotheses and research questions. This paper concludes with Section 6, which provides a summary of the main points and an analysis of the conclusions.
2. Related Work
In recent years, there has been a notable shift in scientific focus toward the advancement of methods and strategies for informal education, recognizing their pivotal role in the development of various skills and competencies in individuals [
30]. The primary objective is to cultivate knowledge through digital and interactive environments, which are not inherently linked to the formal educational sector. The aim is to facilitate participation, create motivation, provide enjoyable engagement, foster a sense of belonging to a group, and promote productive learning [
34,
35,
36,
37]. Consequently, this study explores the domains of serious games, gamification technologies, and crowdsourcing. By examining these areas, this study seeks to address different perspectives related to the advancement of media and education. These perspectives include the educational approach, which examines whether informal education methods can enhance an individual’s learning experiences and outcomes; the technological approach, which focuses on utilizing digital and interactive environments to facilitate knowledge acquisition; and the psychological approach, which investigates the impact of the characteristics of gamification technologies and serious games on an individual, including aspects such as motivation, engagement, and a sense of belonging. At a broader level, additional perspectives can be integrated, such as professional, highlighting the benefits of these methods for media literacy and skill development among journalism professionals; societal, promoting digital literacy and media literacy among the general public; and innovative, exploring whether crowdsourcing and immersive journalism can transform the dissemination and consumption of information. This correlation arises from the identification of the target group, which primarily includes professionals in the field of journalism and, secondarily, the general public. The effectiveness of these three areas lies in their ability to combine the advantages of easy, experiential, and entertaining information dissemination with the simultaneous promotion of digital literacy and media literacy. This approach takes into account the rapidly evolving information landscape and recent trends in multimedia, mobile, and immersive journalism [
42,
43].
Beginning with serious games and gamification technologies in the field of journalism, the extensive body of research available, coupled with the diverse journalistic domains they address, underscores their significant contribution, thereby reinforcing their use as educational tools. For instance, according to the research [
44], the integration of serious games during the training of journalists is deemed essential. Specifically, these games aid aspiring journalists in developing the necessary analytical reasoning and news gathering and processing skills, interview techniques, and fostering empathy and political sensitivity—attributes considered vital for journalistic expertise [
44]. Similarly, research [
45] presents findings from the use of various serious games in the training of future journalists. It demonstrates that this method can effectively achieve the objectives of journalistic education, including media literacy, the ability to tackle complex and specialized tasks, fact-checking skills, and the creation of digital content. Despite the positive outcomes, participants in this study identified certain challenges related to the practical implementation of serious games in academic settings, specifically issues of accessibility, methodological teaching support, and ethical dilemmas [
45]. In the context of educating and preparing future journalists, study [
46] adopts a similar approach but focuses exclusively on gamification. It details the development of a gamified platform whose use and functionality yielded positive results in enhancing participants’ critical thinking and improving their journalistic practices [
46]. Equally noteworthy is the contribution of study [
47], which, through the application of gamified strategies, aims to motivate already trained journalists. The results of this study are compelling, as they suggest that competition, rewards, and incentives can lead to improved performance and motivation among journalists [
47]. The use of a playful digital environment with educational impact extends beyond professional journalists, as positive outcomes are also observed in the realm of public information. Specifically, research [
48] explored the combination of gaming and information dissemination. It examined the potential of presenting topics and informing the public through serious games and gamified digital environments, focusing on current and significant societal events. The application of this method demonstrated that journalistic values are reinforced, as an immersive, playful, and participatory environment encourages user engagement, informs them about current events, and deepens their understanding of the verification process of information provided by the interactive world in relation to reality [
48]. The work [
35], which similarly targets both professional journalists and the general public, introduces an online game designed to train individuals in verifying news and manipulated content. The playful nature of this application positively influenced users’ observational skills regarding the elements that each news source should possess and the methods of verifying them to combat misinformation and fake news. Additionally, it is worth highlighting the enhancement of digital literacy that resulted from users’ engagement with this platform [
35].
While there is no evident link between crowdsourcing, gamified technologies, and serious games, it is crucial to recognize that these domains operate as collaborative systems and collective intelligence environments. These environments are created by users through human–computer interactions and are developed within social media communities [
49]. Crowdsourcing is employed in various ways within the field of journalism. From the perspective of journalists, this approach facilitates the discovery of diverse and previously inaccessible information, ultimately leading to more comprehensive and objective reporting. Furthermore, it fosters trust, participation, and a stronger relationship between readers and journalists [
50]. Study [
51] that examined crowdsourcing as a method of knowledge-seeking and open journalistic practice found that it positively influenced effective fact-finding and checking, providing a continuous flow of advice to journalists. However, the same research highlighted the challenge of managing a large volume of information submissions, which could result in the publication of inaccurate data [
51]. Research [
39] presents an innovative application focused on the collaborative collection and documentation of soundscapes and the semantics of environmental sounds. The methodology employed in this study shares similarities with sophisticated mobile journalism services. In addition to enhancing audience engagement, the results also demonstrated the profitable nature of this research method across various fields. Specifically, the benefits extend beyond users to the scientific sector, as engagement with the application aids in the collection and storage of resources and metadata. This, in turn, facilitates the semantic enhancement of services in the cloud and enables the utilization of ML techniques [
39]. As evidenced by study [
52], crowdsourcing has the potential to influence the development of novel perspectives and practices within the realms of online journalism and democratic processes. This study details the utilization and implementation of crowdsourcing strategies for the comprehension and eradication of misinformation and fake news. The findings indicate that the general public is interested in and willing to engage in fact-checking activities. Nevertheless, the most accurate information and fact confirmation appear to be achieved through a combined model of crowdsourcing tactics and professional fact-checkers and experts, thus indicating the emergence of a new journalistic model of communication and information [
52].
The promising results of the aforementioned studies serve as a catalyst and source of inspiration for the advancement of journalism and public information. The innovation of this research lies in the development of a tool designed to meet the needs and requirements of the modern era while simultaneously integrating the benefits of various sectors. This need is addressed by providing training to both journalism professionals and non-professionals, utilizing gamification, serious games, and crowdsourcing technologies. The focus of this study is on emotional speech, a topic that has not yet been explored through informal educational methods. For this reason, the literature review above focused on the impact of informal education techniques on journalism more broadly, which this paper aims to refine and extend. Ultimately, the novelty of this approach lies in its potential benefits for the field of emotion recognition through speech, which, in turn, has significant implications for the broader field of AI.
As previously stated, the advancement and expansion of the scientific domain of SER systems is a highly challenging endeavor, as it necessitates the collection of a vast and high-quality corpus of speech data with emotional labeling. Numerous studies have been published that focus on the creation of data enrichment and annotation tools. Research [
53] presents a collaborative web-based annotation tool that allows for the time-based segmentation, transcription, and tagging of audio and speech data. Essentially, users can segment an audio file, identify and annotate it, and assign labels to specific values. The subsequent step involves uploading and assigning the audio data along with their corresponding annotations, thereby enriching the database and increasing the probability of success for tasks such as voice activity detection, speaker recognition, and emotion recognition [
53]. It is equally important to consider the research presented in work [
54], which addresses the general challenges of SER tasks, particularly the difficulty of recognizing and distinguishing emotions, both as a function of the respective systems and as a perception by humans. This research advocates for a gradual enrichment of the database, starting with the provision and training of the system with simple annotated speech data, which then becomes increasingly complex. This model is based on the “minmax entropy framework”, which includes ambiguously labeled speech data during the training stage. This approach allows for the description of the difficulty in discriminating between emotional speech samples, as evaluated by the raters [
54].
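The idea of starting with simple annotated samples and moving toward ambiguous ones can be made concrete with a small worked example. The sketch below is not the “minmax entropy framework” of [54]; it only uses plain Shannon entropy over rater votes as one possible measure of labeling ambiguity, and the clip names and votes are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def label_entropy(votes: List[str]) -> float:
    """Shannon entropy (in bits) of the emotion labels assigned by raters.

    0.0 means all raters agree; higher values mean a more ambiguous sample.
    """
    total = len(votes)
    return sum((c / total) * math.log2(total / c) for c in Counter(votes).values())


# Hypothetical ratings: four raters label each clip with one emotion.
ratings: Dict[str, List[str]] = {
    "clip_c": ["sadness", "fear", "sadness", "fear"],
    "clip_a": ["anger", "anger", "anger", "anger"],
    "clip_b": ["anger", "anger", "fear", "anger"],
}

# Order samples from unambiguous to ambiguous, mirroring the gradual
# enrichment described in the text.
curriculum = sorted(ratings, key=lambda clip: label_entropy(ratings[clip]))
```

Here `clip_a` (full agreement) comes first and `clip_c` (an even split between two emotions) comes last, so a training schedule built on this ordering would see the clearest examples before the contested ones.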
This work focuses on the enrichment of the Acted Emotional Speech Dynamic Database (AESDD). The construction of this database involved recording utterances with acted emotional expression in Greek and English, encompassing five distinct emotional states: anger, fear, happiness, sadness, and disgust [19,41,55]. The AESDD is a dynamic database that permits the continuous addition of emotional speech samples, thereby facilitating the expansion of its data through crowdsourcing contributions [20].
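As a rough illustration of what “dynamic” means here, the following sketch models a database that accepts new labeled samples over time. The five emotions and two languages come from the description above; the class names, field names, and file paths are hypothetical and do not reflect the actual AESDD storage format.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Emotion(Enum):
    ANGER = "anger"
    FEAR = "fear"
    HAPPINESS = "happiness"
    SADNESS = "sadness"
    DISGUST = "disgust"


LANGUAGES = {"el", "en"}  # Greek and English, as stated in the text


@dataclass(frozen=True)
class SpeechSample:
    audio_path: str            # hypothetical file location
    emotion: Emotion           # emotional label attached to the recording
    language: str              # language code, one of LANGUAGES
    crowdsourced: bool = False  # True for contributions added after release


@dataclass
class DynamicSpeechDatabase:
    samples: List[SpeechSample] = field(default_factory=list)

    def add(self, sample: SpeechSample) -> None:
        """Append a new labeled sample, rejecting unsupported languages."""
        if sample.language not in LANGUAGES:
            raise ValueError(f"unsupported language: {sample.language}")
        self.samples.append(sample)

    def count_by_emotion(self) -> Dict[Emotion, int]:
        """Report how many samples exist per emotion, including zeros."""
        counts = Counter(s.emotion for s in self.samples)
        return {e: counts.get(e, 0) for e in Emotion}


db = DynamicSpeechDatabase()
db.add(SpeechSample("acted/anger_01.wav", Emotion.ANGER, "el"))
db.add(SpeechSample("crowd/fear_17.wav", Emotion.FEAR, "en", crowdsourced=True))
counts = db.count_by_emotion()
```

Reporting zero counts explicitly makes under-represented emotions visible, which is useful when deciding what kind of contributions to solicit next.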
The AESDD was the beginning of identifying and designing various strategies and methods to enrich, continuously improve, and develop its capabilities. The initial effort involved creating a website that allowed users to perform specific actions, such as recording emotional speech, collecting and managing emotion-tagged recordings, and serving as a digital repository [
20]. However, this approach failed to attract sufficient interest or active participation from the majority of individuals. Consequently, a second approach was adopted: the creation of a digital game designed to provide entertainment and encourage further engagement. By enabling identification with the game heroes and authentic expression of emotions, the game aimed to enhance the validity and quality of the collected audio samples [
56]. Although this approach proved to be more appealing and practical than the original one, it remained primarily a recreational option that lost its appeal after the game was completed. Nevertheless, this second approach demonstrated the beneficial effects of game-based activities on both individual improvement and the progress of SER systems. Therefore, fertile ground exists for the design and development of approaches involving gamified digital activities that could actively and catalytically contribute to addressing the aforementioned issues.
4. Materials and Methods for SER Gaming and Gamification Strategies: Simultaneously Advancing AI Algorithms, Data, and Literacy
Before detailing the development of the application, it was crucial to understand the disciplines involved in this research and to conduct a comprehensive review of the existing scientific and academic literature on this subject. Thus, Section 4 presents a methodological and technological framework for designing a gamified digital application with an educational focus. As previously stated, this application represents a continuation of previous efforts to explore the scientific field of SER. Each approach has emphasized the qualitative and continuous enrichment of the emotional speech database, thereby enhancing the robustness of these systems.
The waterfall model was employed for the project’s development because the project is inherently complex and aims to achieve an interdisciplinary synthesis. This structured and sequential model, in which the conclusion of one phase signals the commencement of the next, was therefore deemed essential. The principal phases of this model are as follows: application description, requirement definition, design specification, subsystem development and testing, system integration and testing, and finally, operation and maintenance [57].
4.1. Analysis
The analysis phase defined the project’s focus. The key tasks completed include conceptualizing the initial idea, identifying the target audience, defining the intended users/community, reviewing previous projects, identifying limitations, and devising solutions to potential problems. Three principal processes were implemented during this phase.
The first process involved reviewing and obtaining feedback from previous research. The motivation for the present research was to develop a method for qualitatively and continuously collecting samples of emotional speech, with the aim of assessing the robustness of SER systems. As previously mentioned, two prior studies aimed to achieve the same objective [
20,
56]. Although the initial results of these approaches were promising, indicating improvements in the quality and quantity of emotional speech samples collected and, consequently, in the performance of the respective SER systems, the absence of a clearly defined target group became evident. Specifically, the first project invited the public to participate voluntarily by recording their voices with specific emotional labels, while the second project employed a similar process but engaged the general public through a digital serious game. In both cases, the absence of a targeted audience led to a lack of motivation to engage with these projects, resulting in their rapid abandonment. As a result, maintaining a consistent stream of emotionally annotated speech data was not accomplished, impeding the training requirements of the SER systems and resulting in a stagnation in their development and performance.
Thus, the second concern of this research became the establishment of a target group, with the idea that a more convincing motivation to engage in the process of recording and depositing emotionally labeled speech would be achieved. A review of current societal needs indicated that the suitable target group for this project includes both trained and unqualified journalists, news anchors, broadcasters, and, to a lesser extent, the general public. This group of users, in the course of exercising their communication skills, must meet high standards of vocal quality as well as verbal and non-verbal expressiveness [
58]. Therefore, it is logical to create a project that combines the continuous need to record and collect emotionally labeled speech samples with the training of media professionals and non-professionals. Audience analysis was conducted through interviews, focus groups, and online questionnaires to identify the main objectives of the new project in relation to the profiles of different users, as reflected in the demographic and psychographic data collected.
The third and primary process involved describing key requirements and guidelines for use in subsequent design iterations. Previous attempts to create a tool for the qualitative and quantitative collection of emotional speech samples highlighted the positive contribution of digital environments and interactive technologies, emphasizing their ability to attract interest and elicit a positive response from the audience. Therefore, the present project must also be structured within a corresponding digital interaction context. The decision to establish a gamified digital environment was based on feedback from previous experiences, which, in addition to recognizing existing positive and constructive features, necessitated the incorporation of new elements to facilitate its evolution. Consequently, a blueprint was created to highlight the aspects that will constitute the gamification tool. Focusing on the continuous enrichment of the database used to train SER systems, the tool incorporates multiple sources of reinforcement, including gamification, serious games, and crowdsourcing. Additionally, it addresses the societal dimension, as this project is expected to foster the creation of various groups of individuals who share and express common interests, desires, goals, and habits (Figure 1).
4.2. Design
The design phase involved activities related to the creation of the application structure, flowcharts, high- and low-fidelity templates, and navigation structure. This section summarizes the key outcomes of this phase.
The application is named “J-Plus”. The name derives from a description of the app’s content, namely, “a journalist’s practice in linguistics, utilizing tools for speech emotion”.
The design of the gamified application rested on two main pillars, reflecting its dual focus. The first objective was to use the tool to enhance SER systems. Consequently, the specification called for combining three distinct methodologies for database creation and enhancement. These three categories, distinguished by the manner and quality of data collection, are natural, acted (simulated), and elicited (induced). The natural category involves gathering unprompted and genuine samples of emotional speech captured in authentic contexts and situations. The acted category pertains to recording emotional speech through the professional contribution of actors and artists, aiming for maximum consistency of the expressed emotion. The elicited category involves creating simulated situations to evoke and stimulate specific emotions in a more natural manner [59,60]. Accordingly, the objective of this design axis was to identify a methodology for combining these three categories of data collection (Figure 2) to optimize the performance of the SER system.
In light of this parameter, the specifications and design of the second objective were duly modified. The concept of combining three distinct categories of emotional speech data led to the design of a gamified application comprising three discrete modules. The objective of these modules is to facilitate a novel approach to the acquisition of emotional speech. As this is a digital training tool with elements of interaction and play, the discrete design of three modules/methods was emphasized. The aim was to provide users with the flexibility to choose their preferred method for training in emotional speech. Furthermore, beyond merely offering training methods, the integration of these three distinct methods facilitates the identification of user preferences. This enables the determination of which method is most attractive and preferred by users, thereby enhancing its effectiveness. Additionally, this focus allows for further enhancement and exploitation of the preferred method. As shown in
Figure 2, the first module involves the design of a serious game. Users are immersed in a digital world, identifying with the central character to complete various challenges and achieve specific goals. This module aims to engage users in informal educational activities that provide enjoyable practice in emotional speech while simultaneously collecting qualitative data according to the acted speech methodology. The second module is designed to simulate activities tailored to the needs of journalism professionals in oral presentation and information expression. Users interact with existing journalistic articles, documents, and news programs, either by presenting them to practice their emotional speech expression or by observing and evaluating them to identify errors and inconsistencies in the conveyed emotions. This module aims to teach and refine emotional speech skills based on real-world events, conditions, and situations, integrating and promoting both acted and elicited speech methodologies. The third module was designed to foster a sense of personal commitment between the user and the application, adhering to the natural speech data collection methodology. This module aims to dismantle barriers, rules, and inhibitions regarding emotional expression, allowing individuals to freely articulate their emotional states. This innovative approach culminated in the creation of an emotional diary. The subsequent phase of the project involved defining scenarios for each module, along with the objectives that the application is designed to achieve.
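The relationship between the three modules and the three data collection categories can be summarized in a short sketch. The mapping itself follows the description above (serious game: acted; journalism simulation: acted and elicited; emotional diary: natural), while the identifiers, the coverage check, and the session log used to tally user preferences are hypothetical illustrations rather than part of the J-Plus implementation.

```python
from collections import Counter
from enum import Enum
from typing import Dict, FrozenSet, List, Set


class Collection(Enum):
    NATURAL = "natural"
    ACTED = "acted"
    ELICITED = "elicited"


# Module identifiers are assumptions; the category assignments follow the text.
MODULES: Dict[str, FrozenSet[Collection]] = {
    "serious_game": frozenset({Collection.ACTED}),
    "journalism_simulation": frozenset({Collection.ACTED, Collection.ELICITED}),
    "emotional_diary": frozenset({Collection.NATURAL}),
}


def uncovered(modules: Dict[str, FrozenSet[Collection]]) -> Set[Collection]:
    """Return the collection categories that no module contributes to."""
    covered: Set[Collection] = set().union(*modules.values())
    return set(Collection) - covered


def preferred_module(session_log: List[str]) -> str:
    """Return the module users opened most often in a log of sessions."""
    return Counter(session_log).most_common(1)[0][0]


# Hypothetical usage log: one entry per completed training session.
log = ["serious_game", "emotional_diary", "serious_game", "journalism_simulation"]
missing = uncovered(MODULES)
favourite = preferred_module(log)
```

An empty `missing` set confirms that the three modules together span all three collection categories, and a tally such as `favourite` is one simple way to identify which method users prefer.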
The environment of the gamified application was designed to be two-dimensional (2D), with interaction possible via desktop or laptop computers, as well as mobile devices. User engagement involves visual, auditory, and tactile elements, and an internet connection is required. Although the application is primarily designed for individual use, it also offers the possibility of connecting and interacting with other users as a secondary feature, leveraging its crowdsourcing nature.
Based on this information, a series of initial prototypes with a low level of fidelity were developed, which subsequently informed the creation of high-fidelity prototypes. These high-fidelity prototypes serve to illustrate the various modules and options available within the application (Figure 3).
As previously discussed, the objectives of the gamified application were defined with a particular emphasis on the robustness of the SER systems and the education of individuals on emotional discourse. The correlation of these two themes provided a foundation for the development of additional objectives, which are set to be achieved in a subsequent phase (
Figure 3). Essentially, users are provided with a digital tool that specializes in facilitating learning and improvement in the area of emotional speech utterance through gamification tasks. Concurrently, users have the option to select the methodology they deem most effective and conducive to training and enhancement in this domain. In addition to facilitating the development and practice of emotional speech, gamification can assist users in managing their emotional state, improving communication skills, and enhancing their digital literacy. Furthermore, the provision of a digital interactive environment has the potential to enhance user engagement and foster continuous interest. The combination of this feature with the option for users to choose the most effective method of training and practice facilitates a more convincing and realistic expression of emotional speech. Consequently, it becomes feasible to collate extensive datasets with emotional speech annotations, which enhance and advance SER systems. In conclusion, this work contributes to the creation of a collaborative model that benefits both target users and researchers in the field of audio recognition.
4.3. Development
This project is designed with two principal objectives, and thus, the development of the gamified tool is divided into two parts for more detailed analysis.
In accordance with the specifications established during the design phase, the gamified application was required to possess the attributes of an accessible digital interaction environment, ensuring compatibility with the majority of computing devices. The chosen platform for the design and implementation of the application is “Genially”. This platform represents an open and advanced web environment capable of developing a variety of digital and interactive products. It offers the possibility of integrating various visual, audio, and interactive elements to attract and retain the attention of end users. Genially is recognized as a tool that is conducive to the creation and utilization of digital products and projects. Its user-friendly interface and intuitive functionality eliminate the necessity for advanced technical expertise or even a fundamental understanding of code, thereby facilitating the development process. The product design screen is distinguished by the presence of visual scripts and functions. The integration of interaction and the assignment of functionality to options and buttons during project development are achieved through the utilization of visual programming and schematics.
The subsequent stage of examination and development involves augmenting the emotional speech database and incorporating the SER mechanism. A SER model, trained on previously collected data, provides emotion detection and prediction in various sections of the application. Specifically, this functionality is included in the first section, the serious game mode, as well as in the second section, the simulation application for presenting a news story. This model is exposed through a web service that communicates with the gamified application software. As more recordings of reliable data tagged with emotional speech are collected, the dataset is extended with multisource data. Newly acquired data are used to retrain and enhance the robustness of the SER model. The emotional speech data used for training and testing the SER model include five (5) distinct categories: anger, fear, happiness, sadness, and disgust.
The model is based on a 2D convolutional neural network (CNN) architecture trained on mel spectrograms of speech signals [
20,
61]. It is a classification model trained on the data from the AESDD emotional speech dataset. AESDD contains recordings from six actors expressing discrete spoken emotions in the aforementioned emotional classes [
19]. Emotion classification is conducted within successive time windows of 1.3 s with a 50% overlap, while the final assessment is determined by a majority vote to provide utterance-level classification [
61,
62]. The convolutional architecture allows for hierarchical unsupervised extraction of features from the 2D mel spectrograms during the training phase. The training process was carried out using the Python Keras 2.13.1 deep learning framework, which facilitates the export of trained models that are compatible with the proposed web service architecture. More details on the training process, the model architecture, and fine-tuning can be found in [
61]. The majority voting mechanism is weighted according to the probabilities of the classification results for each class. Consequently, the model output comprises a probabilistic scheme indicating the presence of each emotion, accompanied by corresponding confidence levels, rather than a single categorical classification result.
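The windowing and probability-weighted voting described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the handling of recordings shorter than one window, and the choice to realize the weighted vote as a normalized sum of per-window class probabilities are all assumptions made for illustration. Only the 1.3 s window, the 50% overlap, and the five emotion classes come from the text.

```python
from typing import Dict, List, Sequence, Tuple

# The five emotion classes named in the text.
EMOTIONS = ["anger", "fear", "happiness", "sadness", "disgust"]


def window_bounds(n_samples: int, sample_rate: int,
                  window_s: float = 1.3,
                  overlap: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of successive analysis windows."""
    win = int(round(window_s * sample_rate))
    hop = max(1, int(round(win * (1.0 - overlap))))
    if n_samples <= 0:
        return []
    if n_samples < win:
        # Assumption: a short recording is treated as one (shorter) window.
        return [(0, n_samples)]
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, hop)]


def aggregate_windows(window_probs: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Combine per-window class probabilities into utterance-level scores.

    Assumption: each window votes for every class with a weight equal to
    its predicted probability; the summed votes are normalized to sum to 1.
    """
    if not window_probs:
        raise ValueError("at least one window is required")
    totals = [0.0] * len(EMOTIONS)
    for probs in window_probs:
        if len(probs) != len(EMOTIONS):
            raise ValueError("each window needs one probability per emotion")
        for i, p in enumerate(probs):
            totals[i] += p
    norm = sum(totals)
    return {emotion: total / norm for emotion, total in zip(EMOTIONS, totals)}


if __name__ == "__main__":
    # Hypothetical per-window outputs of a classifier for one utterance.
    probs = [
        [0.6, 0.1, 0.1, 0.1, 0.1],
        [0.2, 0.5, 0.1, 0.1, 0.1],
        [0.5, 0.2, 0.1, 0.1, 0.1],
    ]
    scores = aggregate_windows(probs)
    print(max(scores, key=scores.get), scores)
```

Returning the full score dictionary rather than only the top label mirrors the probabilistic output described in the text, where each emotion is reported with a confidence level.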
4.4. Evaluation
The evaluation process encompasses testing various system areas related to proper functioning, defined objectives, design and interface requirements (user experience/UX and user interface/UI), as well as future specifications, improvements, and maintenance. System testing serves to extract feedback in the form of evaluations, aiming to achieve improvements through maintenance. This entails identifying and addressing any errors, weaknesses, and deviations, as well as enhancing specific parameters and functions. The final product was evaluated in two categories using both qualitative and quantitative measurements.
4.4.1. Participants
A total of forty-eight (n = 48) individuals participated in the evaluation process, divided into two distinct groups. Group 1 engaged in a quantitative evaluation, which involved the distribution and completion of a digital questionnaire. This group consisted of forty-four (n = 44) participants who were asked to respond to the questionnaire both before and after their interaction with the gamified application. The demographic characteristics of this group included a balanced gender distribution (18 males, 18 females, 8 others) and a diverse age range (18–25: 27.27%, 26–35: 22.73%, 36–45: 18.18%, 46–55: 13.64%, 56+: 4.55%, Prefer not to answer: 13.64%). The average age of the participants was 34.00 years, with a standard deviation of 11.88 years. Ethical approval for this research was obtained from the “Committee on Research Ethics and Conduct” of the Aristotle University of Thessaloniki. Participation in the evaluation of the gamified application was conducted anonymously and in accordance with the procedures and regulations established by the committee. Participants were informed in writing about the study’s data, requirements, goals, and objectives prior to their involvement. Consent and acceptance of the stated terms were prerequisites for participation. Additionally, participants had the option to withdraw from this study at any time without providing data.
Group 2 participated in the qualitative evaluation of the gamified application. This group consisted of four (n = 4) experts, each with expertise in a different scientific field. Specifically, the first two experts hold degrees in electrical and computer engineering and doctorates. The first expert specializes in the design of user interfaces for mobile applications and web interfaces and has participated in the development of games and gamification systems (Expert #1). The second expert is engaged in the field of multimedia and audiovisual semantics technologies, with a particular focus on digital media sound design and the operation of speech recognition systems (Expert #2). The third expert comes from the field of education, specializing in digital media and the identification and implementation of diverse frameworks for learning approaches (Expert #3). The fourth member of the group is a graduate of journalism and social media, currently pursuing a doctoral degree with a specialization in digital media in journalism (Expert #4).
4.4.2. Experiment
As previously mentioned, the evaluation involved the implementation of both quantitative and qualitative methodologies. Group 1 participated in the quantitative evaluation, which was conducted through the distribution and subsequent completion of a structured questionnaire. This questionnaire comprises three distinct groups of questions: an analysis of the participants’ existing knowledge, an assessment of their knowledge following engagement with the gamified application, and a set of demographic questions. In the first section, participants responded to a series of closed-ended questions to ascertain their existing knowledge and level of engagement with AI, serious games, gamification processes, and related topics. The second section was completed after participants interacted with the gamified application. This section aims to determine the extent to which participants have acquired knowledge and developed skills specifically related to emotional speech through engagement and immersion in a digital world with elements of gamification and interaction. It also seeks to gather conclusions and feedback regarding users’ behavior in relation to the gamified application, considering indicators such as familiarity, entertainment, enthusiasm, clarity of targeting, and effectiveness. The third section of the questionnaire pertains to demographic variables. Most questions were structured in a categorical format, with respondents indicating their level of agreement with a series of statements using a five-point Likert scale, where 1 represents “not at all” and 5 represents “very much”. Additionally, multiple-choice questions were incorporated.
Table 1 depicts the analytical structure of the questionnaire. The questionnaire was administered to participants via the web-based software “LimeSurvey”, which was disseminated digitally via the internet and social media and conducted anonymously. The online distribution and digital completion of the questionnaire ensured the anonymity of the participants and provided them the freedom to engage in the research or withdraw from it at any time. All individuals who participated in the quantitative evaluation interacted with the gamified application once during the intermediate part of completing the questionnaire. Access to both the questionnaire and the gamified application was facilitated via the participants’ available devices (desktops, laptops, mobile devices).
Group 2 contributed through the organization of experimental conferences and discussion groups, implementing the qualitative evaluation. The expert participants were free to engage with the gamified application as many times as they deemed necessary to reach final conclusions. It was essential for them to engage with the gamified application using different devices (desktops, laptops, portable devices). The objective of these activities was to identify the weaknesses and shortcomings of the final product, test the implementation and performance of the “five E’s” (effectiveness, efficiency, engagement, fault tolerance, ease of learning), and validate the enhancement of the AESDD database and the proper functioning of the SER systems.
4.4.3. Metrics
The performance of the proposed system was evaluated using both quantitative and qualitative metrics. Quantitative metrics included participants’ responses to the questionnaire, which were measured using a five-point Likert scale and multiple-choice questions. Qualitative metrics involved expert feedback on the “five E’s” (effectiveness, efficiency, engagement, fault tolerance, ease of learning) and the overall functionality and effectiveness of the gamified application. These metrics provided comprehensive insights into the system’s performance, user engagement, and educational impact. The feedback and qualitative evaluation reports are presented in the following section, which addresses the results.
In conclusion, the sample of forty-eight individuals who participated in the quantitative (n = 44) and qualitative (n = 4) evaluation processes is deemed sufficient for the pilot validation process.
5. Experimental Results and Discussion
This section presents the qualitative and quantitative findings regarding the design and implementation of the gamified application J-Plus. The results are discussed in relation to the stated research hypotheses (RH) and research questions (RQ).
5.1. Implemented Scenarios and Offered Functionalities
The final version of the gamified application effectively integrates gamification technologies, serious games, and crowdsourcing with the development of SER systems (
https://m3c.web.auth.gr/j-plus/, accessed on 30 September 2024). This integration facilitates the exploration of interconnections and mutual assistance across different scientific and research disciplines. The J-Plus application commences with the home screen, which presents the user with five distinct modules (
Figure 4). Each module or option serves a unique function, allowing users to select their preferred method for training and enhancing their emotional speech delivery.
The “About” module provides users with information regarding the feasibility and potential uses of the application. The “Game” section features a serious game designed to engage users in enjoyable yet challenging tests. This game incorporates elements of competition, practice, and evaluation pertinent to the desired expression of specific emotions (game mode: acted speech). The “News Anchor” module focuses on training and improving emotional speech for individuals in journalism. This module offers two versions: (a) training in the delivery of emotional speech through the presentation of authentic news items and events (simulation mode—acted speech) and (b) observation and analysis of the emotional speech of professional speakers (simulation mode—elicited speech). The “Emotional Diary” module serves as a record of emotional experiences, inviting users to freely express their thoughts and record their current emotional state, thereby contributing authentic emotional expressions to the application’s database (personal mode—natural speech). The final section, entitled “Profile”, provides a summary and classification of users based on their engagement and interaction within each section of the application. This section illustrates the user’s progress in each part of the application, as well as the emotions they have experienced and successfully managed. These emotions are categorized into five groups: anger, fear, happiness, sadness, and disgust. These categories align with the comprehensive range covered by the AESDD across all sections of the application (
Figure 5,
Figure 6 and
Figure 7).
Notably, the “Profile” section also facilitates the crowdsourcing function, allowing users to be redirected to an external environment beyond the application. Through processes of general discussion, comparison of levels and performance of expressed emotions, exchange of opinions, and resolution/support for any problems and difficulties that may arise, a digital community can be established, focusing on emotional discourse.
The section on the mechanisms of development and operation of SER systems begins with the data accumulated during the user’s engagement with the J-Plus application. Specifically, these new data are employed and integrated during the processes of predicting user emotional discourse. This process is implemented across different modules of the application, forming evaluation indicators for the expression of targeted emotions at any given time. The collection of new samples of emotional speech, which become more reliable due to these evaluation indicators, creates a continuous and qualitative supply of emotional speech records. This allows the system to retrain, thereby enhancing the robustness of the SER model. This iterative process provides users with performance feedback and facilitates the continuous improvement of their performance.
5.2. Analysis and Pilot (Usability) Evaluation of Services
This research builds upon previous efforts to develop methods and strategies for collecting emotional speech data and developing SER systems. These prior endeavors included the creation of a website [
41] and a serious digital game [
56]. In contrast, this proposal focuses on the development of a gamified application. Therefore, it is necessary for the group of experts to compare the new method with the previous ones to obtain more accurate and targeted feedback. A comparison of the three approaches (the website, the serious game, and the gamified application) was conducted based on five indicators: (a) engagement, (b) pleasure–satisfaction, (c) effectiveness, (d) skill development and literacy, and (e) sample/crowdsourcing generalization [
63]. In summary, the website yielded the lowest scores of the three methods, indicating that incorporating gaming elements or gamification can lead to improved outcomes in terms of user attraction and engagement (website: mean 1.2, st. dev. 0.45; serious digital game: mean 3.8, st. dev. 0.45; gamified application: mean 4.2, st. dev. 0.84) and performance (website: mean 1.8, st. dev. 0.84; serious digital game: mean 2.6, st. dev. 0.89; gamified application: mean 4.2, st. dev. 0.84). The gamified application produced more favorable and constructive outcomes than the serious game, particularly in skill development and literacy (serious digital game: mean 3.4, st. dev. 0.55; gamified application: mean 4.2, st. dev. 0.84). The creation of a target group for the application, along with its combinatorial nature, may contribute to capturing users’ attention and fostering a sense of daily and consistent engagement. Thus, it can be concluded that the research objectives are more likely to be achieved through the implementation of the gamified J-Plus application.
Specifically, regarding the gamified application, the experts provided several comments and broader observations. They proposed modifications focusing on the content and the graphical user interface (GUI). One significant modification concerned the “About” section. They believed it was important that information relevant to the other options of the gamified application also be provided in the opening interface of each module. This approach ensures that users are informed from the beginning about the nature of each module, facilitating better navigation within the application. These proposed changes are expected to be implemented during the maintenance process of the gamified application. Positive feedback was given regarding the content of the J-Plus application, particularly highlighting the option for users to choose their preferred method of engaging in emotional speech. The emphasis lies on the user’s connection to the gamified application and sustained engagement, especially when compared to previous methods of collecting emotional speech data. The diversity of the modules, combined with the “Emotional Diary” option, which relates to more personal user information, gives the J-Plus application a multifaceted character. This multifaceted nature ensures that user engagement is stable, meaningful, and qualitative. The experts also provided constructive comments on the potential for achieving the stated objectives. They emphasized the positive impressions regarding the qualitative collection of emotional speech data and their simultaneous utilization within the application environment. Additionally, they highlighted the importance of the simultaneous development of both the discipline of SER and digital tool literacies through engagement with this tool.
However, the experts also expressed some concerns. Firstly, they raised reservations about the performance of emotional discourse and the reliability of the SER database. Concerns were noted regarding the potential mispronunciation of emotional speech to achieve harmonization during the implementation of the SER system, i.e., the adaptation of the user to the familiar and correct emotion for the machine. This adaptation could potentially lead to gains in achievements and better progress. The gamified application targets journalism professionals who seek training and improvement tactics to express news discourse with accurate emotional load. Therefore, designing the app with a specific target group in mind may mitigate this concern. Significant reservations were also expressed regarding privacy and ethics. The development and evolution of SER systems rely on the expressed emotional states collected through recordings of human speech. Consequently, the collection of these data raises major privacy and ethical issues. The expert group emphasized the necessity for users to receive transparent information and provide informed consent for the recording and use of their voice data. Additionally, ensuring the security of the data, with access restricted to authorized personnel only, is crucial. The gamified application ensures that users are fully informed about the reasons and purposes of recording and storing speech during emotional speech interactions. Users can only use the application after agreeing to these terms and providing their consent. Furthermore, the emotional speech data repository is hosted on a web-based platform designated and managed by Aristotle University of Thessaloniki. This ensures that only authorized individuals involved in the specific project, whose role is to develop SER systems, have access to these data. This approach aims to balance technological innovation, privacy protection, and the development and evolution of SER systems.
The second stage of the evaluation process involved a quantitative method, achieved through the distribution of a user analysis and application evaluation questionnaire. As previously stated, the questionnaire is divided into a number of sections. The initial sections pertain to the participants’ preexisting knowledge and habits before using the J-Plus application. The final section pertains to the participants’ evaluation of the application following its utilization (
Table 1). It should be noted that a reliability test based on Cronbach’s alpha was conducted prior to performing the statistical analysis. The resulting value of 0.826 indicates a high level of internal consistency for the entire questionnaire, which supports the reliability of the statistics reported below.
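For readers unfamiliar with the reliability test, the following minimal sketch shows how Cronbach’s alpha is commonly computed from a respondents-by-items score matrix, using the standard formula alpha = k/(k − 1) · (1 − sum of item variances / variance of total scores). The function name and the toy data are hypothetical; the sketch does not reproduce the study’s data or its reported value of 0.826.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per respondent
    and one column per questionnaire item (sample variances are used)."""
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")
    # Variance of each item (column) across respondents.
    item_vars: List[float] = [variance([r[i] for r in rows]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical five-point Likert answers: 4 respondents x 3 items.
    answers = [
        [4, 5, 4],
        [2, 3, 2],
        [5, 5, 4],
        [3, 3, 3],
    ]
    print(round(cronbach_alpha(answers), 3))
```

Values closer to 1 indicate that the items vary together, which is the sense in which the questionnaire is described as internally consistent.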
Upon examining the initial section of the data, it becomes evident that a significant proportion of respondents perceive their understanding of the field of AI to be limited. Specifically, for Question 2 (Q2), the average response is 2.80, and the median is 3. Conversely, a relatively smaller subset of respondents indicate that they possess a more substantial grasp of the subject matter, as evidenced by the interquartile range index (IQR = 2). This finding is supported by the public’s responses regarding their knowledge of the differences between AI and ML domains. In response to Question 1 (Q1), >65% of participants indicated that they were either unaware of the differences between AI and ML or had a limited understanding of them, with an average response of 2.675 and a median of 3. Encouragingly, the overwhelming majority of respondents (>72%) view the potential offered by AI technologies as significant and beneficial, as reflected in Question 3 (Q3), where the average response is 4.10, and the median is 4. Furthermore, the public’s intention to gain knowledge about AI is characterized as high and important, as indicated by Question 7 (Q7), with an average response of 3.925 and a median of 4. The responses provided by the respondents regarding the fields of serious games and gamification technologies exhibit a comparable degree of fluctuation. However, it is important to note a distinction between these responses and those given in the previous domain. The public demonstrates a certain degree of familiarity with the concepts of gamification and serious games and is able to differentiate between the two. This is evidenced by the respondents’ ability to correctly match examples of serious games and gamification applications, as requested in Question 8 (Q8) of the questionnaire. 
Similarly, the preference and shift of the public toward a more enjoyable and alternative mode of education, as evidenced by the responses to Question 11 (Q11) of the questionnaire, favor this approach over traditional teaching methods, with an average response of 3.37 and a median of 3. The characteristics of entertainment and enjoyment, a sense of relaxation and relief from stress, and the excitement of challenge and competition are the most important elements of serious games and gamification technologies that can motivate respondents to engage in alternative modes of education, as indicated by Questions 12 (Q12) and 13 (Q13). Furthermore, it is notable that the ease of use of a multimedia product, such as serious games and gamification technologies, is a significant determining factor in its effectiveness, as highlighted in Question 14 (Q14).
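The descriptive statistics quoted above (mean, median, interquartile range, and the share of respondents selecting given scale points) can be obtained as in the sketch below. The helper names and the sample responses are hypothetical, and the quartile method ("inclusive") is an assumption, since the text does not state which quartile convention was used; different conventions can yield slightly different IQR values.

```python
from statistics import mean, median, quantiles
from typing import Dict, Iterable, Sequence


def describe_likert(responses: Sequence[int]) -> Dict[str, float]:
    """Mean, median, and interquartile range of five-point Likert answers."""
    if len(responses) < 2:
        raise ValueError("at least two responses are required")
    # Assumption: quartiles use the 'inclusive' interpolation method.
    q1, _, q3 = quantiles(responses, n=4, method="inclusive")
    return {
        "mean": mean(responses),
        "median": median(responses),
        "iqr": q3 - q1,
    }


def share_in(responses: Sequence[int], levels: Iterable[int]) -> float:
    """Fraction of respondents whose answer is one of the given levels."""
    wanted = set(levels)
    return sum(1 for r in responses if r in wanted) / len(responses)


if __name__ == "__main__":
    # Hypothetical answers to one question (1 = "not at all", 5 = "very much").
    q = [1, 2, 2, 3, 3, 3, 4, 4, 5]
    print(describe_likert(q))
    print(f"agree or strongly agree: {share_in(q, [4, 5]):.1%}")
```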
Based on the analysis of the questionnaire results, it is evident that the J-Plus application significantly contributes to the development of users’ digital literacy. The pre-engagement responses indicate a varied understanding of AI and digital skills, with many users starting with moderate to limited knowledge (e.g., >27% of the participants had no understanding of AI and ML distinctions, and >41% of the participants had limited or no knowledge of AI technologies). Post-engagement responses show marked improvements, with >55% of users feeling comfortable or very comfortable expressing different emotions during spoken communication and >53% of users agreeing or strongly agreeing that training in emotional speech is achievable through gamified activities. Additionally, >67% of users believe that serious games and gamification applications could enhance their motivation to practice and improve emotional speech, and >75% of users recognize the long-term benefits of improving emotional speech delivery. These findings suggest that J-Plus not only enhances users’ technical understanding of AI but also fosters essential digital skills such as emotional articulation, feedback utilization, and motivation for continuous learning. Therefore, the J-Plus application effectively supports the development of digital literacy among its users.
To highlight the most significant conclusions of this study, a correlation analysis was conducted between the parameters of the questionnaire.
Table 2 presents the attributes that are most strongly and positively correlated according to the Pearson coefficient (threshold = 0.5). The analysis demonstrated that transforming a task into a playful activity, coupled with social interaction and the connections and communication it facilitates with others, are crucial parameters that catalyze motivation for engaging with gamified applications. Additionally, the transformation of a task into a gamified activity, combined with the achievement of small or large goals and continuous progress, further motivates users to engage with gamified applications and reinforces their sustained engagement.
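As an illustration of this kind of analysis, the sketch below computes the Pearson coefficient for every pair of questionnaire items and keeps only the pairs at or above the 0.5 threshold mentioned in the text. The item names and responses are hypothetical placeholders rather than the study’s variables, and the sketch makes no claim about the software actually used for the analysis.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    if ssx == 0 or ssy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(ssx * ssy)


def strong_positive_pairs(items: Dict[str, Sequence[float]],
                          threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Item pairs whose Pearson coefficient is at or above the threshold."""
    pairs = []
    for a, b in combinations(items, 2):
        r = pearson(items[a], items[b])
        if r >= threshold:
            pairs.append((a, b, r))
    # Strongest correlations first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical Likert answers from five respondents to three items.
    answers = {
        "playful_task": [1, 2, 3, 4, 5],
        "social_interaction": [2, 4, 6, 8, 10],
        "unrelated_item": [5, 3, 4, 1, 2],
    }
    for a, b, r in strong_positive_pairs(answers):
        print(f"{a} ~ {b}: r = {r:.2f}")
```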
5.3. Discussion and Answers to the Stated Research Hypotheses (RH) and Questions (RQ)
A synthesis of the findings from the quantitative and qualitative evaluations allows for the formulation of preliminary conclusions that address the hypotheses and questions posed in this research. As evidenced in the preceding section, participants demonstrated a moderate comprehension of AI and ML concepts (in response to Q1, over 68% of participants indicated that they had either no understanding or only a little to moderate understanding) and a comparable level of familiarity with AI technologies (in response to Q2, approximately 63% of participants indicated that their understanding ranged from none to very little). Notably, the descriptive statistics revealed considerable variability in the interquartile range index (high IQR), indicating that a significant proportion of respondents have limited exposure to these technologies. Conversely, the results indicate that respondents perceive AI as a beneficial technology with the potential to impact society despite their limited direct experience with it (in Q3, over 72% of participants indicated that they believe the capabilities offered by AI technologies are useful or very useful, and in response to Q4, over 77% of participants indicated that they believe the use of AI can have a significant impact on society). Therefore, hypothesis RH1 is partially confirmed. RH2, in turn, is fully corroborated by the evidence. Respondents demonstrated a clear recognition of the importance of digital skills in Question 5 (Q5), with over 84% of participants indicating that they believe digital skills education is important or very important. Additionally, they expressed support for the use of multimedia applications in educational contexts, as evidenced by over 79% of respondents in Question 6 (Q6) agreeing or strongly agreeing that multimedia applications can be used for individual education.
Furthermore, there is a moderate to significant interest in acquiring new knowledge about AI, as indicated by over 67% of respondents in Question 7 (Q7) who agreed or strongly agreed with this statement. While there is some interest in acquiring knowledge through alternative means, it is not predominant, and respondents exhibit a neutral stance on the matter. Overall, the public shows a willingness to develop new skills, particularly when enjoyable and effective methods such as multimedia, gamification, and gaming technologies are involved. RH3 is validated similarly. The responses to the questionnaire indicate that the primary motivations for engaging with serious games and gamification processes are “entertainment and enjoyment” (Q9: over 65% of participants indicated a preference for this answer) and “relaxation and stress relief” (Q9: over 43% of participants indicated a preference for this answer). This suggests that engagement is a crucial factor in this context. Furthermore, the results indicate that respondents consider “interesting and enjoyable activities” to be the most motivating factors (Q12: over 65% of participants indicated a preference for this answer), reinforcing the notion that serious games and gamification can facilitate engaging and interactive experiences. The analysis of Question 13 (Q13) demonstrates that an immersive environment combined with feedback is a crucial element in a successful experience. The largest percentage of participants’ responses was concentrated in these two categories, with approximately 43% of participants selecting “engaging and immersive environment” and over 20% selecting “availability of feedback”. This evidence supports the assertion that these tools can effectively drive motivation, engagement, and performance. Therefore, it can be concluded that the research hypotheses set out in this paper have been confirmed: RH2 and RH3 fully, and RH1 partially.
The analysis of the questionnaire revealed several important findings and conclusions that warrant further examination. Based on the responses, the most valuable features for emotional speech training were identified as narrative/plot (with over 36% of participants in agreement), interactivity (with over 45% of participants in agreement), collaboration (with over 50% of participants in agreement), and progress monitoring (with over 18% of participants in agreement). These findings suggest that well-designed gaming experiences could facilitate the development of emotional language skills by leveraging these features. Additionally, respondents recognized the importance of feeling engaged and motivated when developing skills, further supporting the potential for learning through games and gamified environments (RQ1).
Further analysis of the results suggests that gamified tools can effectively engage users and contribute to efficient data collection. Specifically, the responses to Questions 19 and 21 (Q19 and Q21) indicate strong agreement on the importance of engagement, motivation, and feedback in learning. This is evident from over 72% of participants, who expressed moderate to high agreement that serious games and gamification can enhance their motivation to practice and improve a skill, such as emotional speech (Q19). Additionally, the largest group of participants, over 34%, indicated a preference for constructive criticism as a type of feedback in relation to learning a skill (Q21). These findings, particularly those from Question 19 (Q19), suggest that integrating these elements into digital tools can enhance user engagement and performance. Additionally, the analysis reveals a strong consensus on the long-term benefits of improving emotional language, which increases the potential for data collection through crowdsourcing methods. In terms of motivating engagement with gamified applications, the strong correlation between transforming a task into a gamified activity and the element of social interaction and connection/communication with others underscores the need for users to feel a sense of belonging to a group. This need is reinforced through engagement in gamified activities, which creates greater motivation to participate and a higher-quality commitment to collective work (RQ2).
The findings indicate that entertainment and engagement are primary motivators for users when interacting with gamified tools. By combining educational content with enjoyable experiences, serious games and gamification can effectively promote digital literacy in an engaging manner. Based on the results, gamified tools designed to be entertaining can support literacy efforts not only in the area of emotional language but also in understanding how AI systems work and perform, as well as in enhancing overall digital literacy while maintaining user engagement. The qualitative evaluation process highlighted the successful collection of emotional speech samples and their integration into the AESDD database, as well as their functional performance within the SER system. This demonstrates the application’s multiple uses and benefits, which both advance AI algorithms and contribute to the empowerment and education of individuals (RQ3).
In conclusion, the research hypotheses are largely validated by this analysis, and the research questions are effectively answered. It is confirmed that the possibilities offered by serious games and gamification technologies, and by extension crowdsourcing, can facilitate engagement, learning, collaboration, and data collection. However, the most significant achievement is the multimodal combination, collaboration, advancement, and improvement of different scientific fields.
5.4. Limitations and Future Work
Although the findings of this research are promising, several limitations must be acknowledged. Firstly, the variability in respondents’ understanding of AI and ML concepts, as indicated by the high interquartile range (IQR), suggests that the effectiveness of gamified tools may vary significantly across different user groups. This variability is further influenced by the diverse characteristics each user brings, such as cultural, social, and educational backgrounds, which could impact their understanding and implementation of these tools. Additionally, this study involved a small group of experts for qualitative evaluations. The limited number of experts may introduce bias, as their opinions and experiences might not be representative of the broader population. Future research should include a larger and more diverse group of experts to enhance the generalizability of the findings. The sample size of 48 participants, while sufficient for preliminary conclusions, may not be large enough to capture the full range of user experiences and preferences. A larger sample size in future studies would provide more robust data and strengthen the validity of the results. Furthermore, the findings are based on the specific context of the J-Plus application and its use in emotional speech training. The results may not be directly applicable to other contexts or types of gamified applications. Future research should explore the effectiveness of gamification in different educational and professional settings. Additionally, reliance on self-reported data may introduce biases, as participants might overestimate or underestimate their levels of understanding and engagement. It is also important to note that the long-term impact of these tools on learning and engagement remains uncertain, necessitating further longitudinal studies to validate the sustained effectiveness of the proposed methods. Finally, ethical concerns, such as data privacy and the potential for manipulation, must be meticulously addressed to ensure the responsible use of gamification and serious games in educational contexts.
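To make the role of the IQR as a variability indicator concrete, the following minimal sketch shows how the IQR and the share of agreeing respondents could be computed for a single Likert-type item. It is illustrative only: the function names `likert_iqr` and `agreement_share`, the assumed 1–5 response scale, the agreement threshold of 4, and the two response lists are hypothetical and are not taken from the study’s questionnaire data or analysis procedure.

```python
from statistics import quantiles


def likert_iqr(responses):
    """Return (Q1, Q3, IQR) for a list of Likert responses (assumed 1-5 scale)."""
    # "inclusive" treats the list as the full set of observed responses.
    q1, _median, q3 = quantiles(responses, n=4, method="inclusive")
    return q1, q3, q3 - q1


def agreement_share(responses, threshold=4):
    """Fraction of responses at or above the (assumed) agreement threshold."""
    return sum(r >= threshold for r in responses) / len(responses)


# Hypothetical answers to one item; NOT data from the study.
narrow = [3, 3, 4, 4, 4, 4, 4, 5]  # respondents largely agree with each other
wide = [1, 1, 2, 3, 4, 5, 5, 5]    # respondents are spread across the scale

for label, answers in (("narrow", narrow), ("wide", wide)):
    q1, q3, iqr = likert_iqr(answers)
    print(label, "Q1:", q1, "Q3:", q3, "IQR:", iqr,
          "agreement:", agreement_share(answers))
```

Under these assumptions, a larger IQR for the second list reflects the kind of dispersed understanding described above, even when a sizeable share of respondents still selects an agreeing answer.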
Future research should focus on expanding the sample size and diversity of participants to enhance the generalizability of the findings. Longitudinal studies are needed to assess the long-term impact of gamified tools on learning and engagement. Additionally, exploring the application of gamification in various educational and professional settings will provide a broader understanding of its effectiveness. Further investigation into the ethical implications, particularly concerning data privacy and potential manipulation, is essential to ensure the responsible use of these technologies. Future work should also focus on enhancing SER systems by leveraging the collected emotionally tagged speech samples to improve the performance and robustness of these systems. Additionally, the potential of crowdsourcing for large-scale data collection and the development of collaborative environments and digital communities should be explored to further advance AI algorithms and digital literacy.
6. Conclusions
This paper presents the design and development of a gamified application called “J-Plus”. This project has a dual aim. The idea for its creation emerged from previous efforts to qualitatively enrich the AESDD database and enhance its robustness. Consequently, the initial goal was to address the need to collect high-quality emotional speech data to improve and advance SER systems. Previous knowledge, combined with a comprehensive literature review and the demands of the modern era, indicated that this could be achieved through the use of serious games, gamification technologies, and crowdsourcing. This paper details the stages of development of the digital application “J-Plus”, which seeks to leverage a fun, enjoyable, and attractive approach to engage a larger number of users, thereby focusing on the development of the SER systems sector. Simultaneously, it promotes the learning and improvement of emotional language within a pleasant and engaging environment, consistent with the principles of serious games and gamification technologies.
The final project was evaluated using both qualitative and quantitative methodologies. The findings from both approaches supported the research hypotheses and answered the research questions established at the outset of this investigation. Specifically, a connection between serious games and gamification technologies, on the one hand, and AI mechanisms, on the other, can be established to achieve knowledge and skill learning, particularly in the ability to express emotional speech (Q16: over 72% of participants moderately to strongly agree that education and improvement of emotional expression can be achieved through gamification activities). Engagement with the application also yields encouraging results in enhancing digital literacy and comprehension, as well as enriching knowledge about the functioning and mechanisms of AI. Furthermore, it is evident that audience response, engagement, and performance can be influenced through the utilization of digital tools, such as serious games and gamification technologies (Q18: approximately 77% of participants considered it important to feel commitment and motivation while learning a new skill, such as emotional speech, and Q19: approximately 70% of participants believed that serious games and gamification could enhance their motivation to practice and improve their emotional speech). The results of the J-Plus gamified application demonstrate the potential for enhancing the reliability and adequacy of crowdsourced data collection. Additionally, the design and development of the J-Plus application highlight the potential for simultaneously promoting data and AI algorithms while developing various skills and cognitive domains through digital tools and tactics related to entertainment and fun. Finally, this paper advocates for the creation of collaborative environments and digital communities through crowdsourcing tactics in conjunction with the individual desire and need for collective participation and contribution.
In addition to validating the research hypotheses (RH) and addressing the research questions (RQ), the quality assessment process revealed data that merit further analysis. The expert panel identified two principal concerns: the reliability of the data and the emotional speech recording section. The first concern pertains to the general reliability of an SER database in the context of digital gamified activities. Specifically, there is a reservation that users might alter their emotional speech expression to achieve better system performance, thereby meeting the application’s goals. Although this concern is valid, it is mitigated by the fact that the gamified application targets a specific group of users, namely, journalism professionals and non-journalism professionals. These users are expected to engage in continuous training, performance improvement, and development to enhance their ability to deliver news speech with an accurate emotional load. The second concern pertains to the emotional speech recording section. SER systems are designed to identify emotions and emotional states from recordings of human speech. Consequently, the collection and analysis of audio data raise significant privacy, ethical, and moral issues. It is essential that users are fully informed and provide explicit consent before their voice data are recorded and used. Furthermore, concerns have been raised about the security of lawfully collected data, given that they may contain sensitive information. Therefore, measures must be implemented to prevent unauthorized access to these data. Given its research-based nature, this work is committed to upholding ethical standards and maintaining user trust. This commitment extends to protecting the confidentiality and privacy of users. Thus, it is possible to achieve a balance between technological innovation, privacy, and the promotion of ethics in the SER sector.
The findings of the present study can also be regarded as a foundation for future hypotheses and a basis for further investigation into the identified knowledge gaps.