In this section, we present the architecture of Math4e, discuss key considerations in its development, and describe the testing scenarios.
3.2. Operation and Technical Aspects
Math4e begins by extracting the structure of a mathematical expression written in a markup language such as LaTeX. The LaTeX representation is transformed into a descriptive reading format that spells out the composition of the expression. The sonification flow is then applied: a synthesized voice reads the content of the descriptive reading format aloud, complemented by a sequence of tones marking the beginning and end of operators and their internal structures (when applicable), along with sound effects acting as auditory cues. This workflow is depicted in
Figure 2, which outlines four distinct stages described below.
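The four stages can be sketched as a simple function pipeline. This is only an illustrative outline of the workflow described above; the function names, signatures, and placeholder transformations are assumptions, not the actual Math4e implementation.

```python
# Illustrative sketch of the four-stage Math4e workflow.
# All names and the toy transformations are hypothetical.

def stage_a_parse(latex: str) -> str:
    """Stage A: obtain a descriptive reading of the LaTeX expression."""
    return latex.replace(r"\frac", "the fraction ")  # placeholder only

def stage_b_fragment(description: str) -> list[str]:
    """Stage B: split the description into fragments at operator boundaries."""
    return [f.strip() for f in description.split(",") if f.strip()]

def stage_c_concatenate(fragments: list[str]) -> str:
    """Stage C: concatenate narrated fragments, inserting auditory cues."""
    return " <tone> ".join(fragments)

def stage_d_deliver(audio: str) -> str:
    """Stage D: save the consolidated audio and dispatch it for playback."""
    return f"[saved] {audio}"

def sonify(latex: str) -> str:
    return stage_d_deliver(stage_c_concatenate(stage_b_fragment(stage_a_parse(latex))))
```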
The process begins in Stage A with the input of a mathematical expression whose structure is described in LaTeX format. This aligns with the output of any CAS, which typically generates content in LaTeX, the required input for Math4e to initiate its process. To cater to diverse user preferences and hearing abilities, three speed levels (slow, medium, and fast) are provided, accommodating varying levels of auditory comprehension. These speed levels allow the module to tailor its performance to each user's skills and preferences. Research findings indicate that individuals blind from early life generally prefer higher speed levels than those who acquire blindness in adulthood [
33]. In
Section 4, we report the results of tests with Math4e at different speed levels to determine the most effective setting for a comprehensive understanding of the mathematical expression.
In this first version of the project, tailored to Latin American users, the auditory elements are rendered in Spanish. To clearly delimit the boundaries of the structures defined by specific operators, namely (i) fractions, (ii) roots and powers, (iii) parentheses, and (iv) trigonometric functions, the translated description is segmented into fragments, each carrying distinct auditory cues. In Stage B, the fragments are classified by their length (FrL) as short, medium, or long, depending on the number of characters they contain. An incremental volume adjustment (represented in
Figure 2 as V = V + 15% or V = V + 30%), coupled with a speed adjustment (represented in
Figure 2 as S = S × 1.1 or S = S × 1.2), is applied to the blind user's reference level to sustain attentiveness during the auditory output. The volume and playback-speed increases were established through the tests detailed in
Section 3.3. Stage B thus prevents long fragments from producing monotonous audio, which could impede information retention by blind users. A text-to-speech (TTS) tool is then employed, together with a fade-out effect that serves as a subtle cue marking the end of each fragment. The reference speeds and fragment lengths that define the corresponding speed and volume adjustments were established empirically through an analysis of diverse mathematical expressions. Additional details on this process are given in
Section 4.
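The Stage B adjustment rule can be sketched as follows. The character thresholds (30 and 50) are those reported in Section 3.3, and the volume and speed factors are those shown in Figure 2; however, the exact mapping of the medium and long categories to the two adjustment levels, as well as the function name and return format, are assumptions for illustration.

```python
# Sketch of the Stage B rule: classify a fragment by length (FrL) and
# scale the user's reference volume V and speed S accordingly.
# The category-to-adjustment mapping is an assumption.

def stage_b_settings(fragment: str, ref_volume: float, ref_speed: float):
    n = len(fragment)
    if n < 30:            # short fragment: keep reference settings
        return ref_volume, ref_speed
    elif n <= 50:         # medium fragment: V + 15%, S x 1.1
        return ref_volume * 1.15, ref_speed * 1.1
    else:                 # long fragment: V + 30%, S x 1.2
        return ref_volume * 1.30, ref_speed * 1.2
```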
During Stage C, each audio fragment is seamlessly concatenated with the preceding one, and a corresponding auditory cue is incorporated based on the delimiter operator associated with the fragment. In the concluding Stage D, after the integration of all fragments, the consolidated audio is saved and dispatched for playback on the user’s end.
Regarding these last stages, it is important to note that improving audio comprehension, particularly for expressions containing fractions, requires modifying the structure of the expression. For a better understanding,
Figure 3 illustrates this process, starting with the LaTeX output from the client-side CAS. The original LaTeX output uses the “\over” tag to represent a fraction (“{numerator} \over {denominator}”). However, this presents limitations when processed with Mathlive, the next step in the sonification process: the tag does not clearly delimit where the fraction begins and ends. To address this challenge, we employ regular expressions and search patterns [
34], substituting the “\over” structure with the “\frac” tag so that Mathlive can recognize the numerator and denominator arguments of the fraction, i.e.,
\frac {numerator}{denominator}. This replacement guarantees precise interpretation and description of fractions in the mathematical expressions by the developed module.
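A minimal sketch of this substitution using Python's `re` module is shown below. It handles the simple “{numerator} \over {denominator}” pattern with non-nested braces; the real module would also need to cope with nested groups, and the function name is illustrative.

```python
import re

# Rewrite "{numerator} \over {denominator}" as "\frac{numerator}{denominator}"
# so that the fraction's arguments are explicitly delimited.
OVER_PATTERN = re.compile(r"\{([^{}]*)\}\s*\\over\s*\{([^{}]*)\}")

def normalize_fractions(latex: str) -> str:
    # \1 and \2 are the captured numerator and denominator
    return OVER_PATTERN.sub(r"\\frac{\1}{\2}", latex)

normalize_fractions(r"{a+b} \over {2}")  # -> r"\frac{a+b}{2}"
```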
Subsequently, the Mathlive library transforms the LaTeX format into a text string in the descriptive reading format. As illustrated in
Figure 3, modifying the LaTeX structure of a fraction with a square root in the numerator and a number in the denominator involves replacing the “\over” tag with the “\frac” tag. Mathlive then generates the description of the expression in the descriptive reading format, a step processed by Math4e on the server side. Since this conversion produces a text string in English, Math4e translates the string into Spanish using the Googletrans library. Before translation, specific adjustments are applied to the text string to facilitate an accurate result. Tests determined that this process takes only 3 s, ensuring precise translations and minimal delay while delivering fully descriptive audio for mathematical expressions. Audio playback occurs on the client side in the front end, whereas the processing is performed by Math4e on the server side; communication takes place through an HTTP POST request carrying the text string in the descriptive reading format.
Notably, the mathematical operators used as fragment delimiters may enclose one or more elements, including fractions, roots, parentheses, and trigonometric functions.
Figure 4 illustrates this concept, showing text-string separations at the beginning and end of a fraction and a root. This separation enables accurate recognition of the arguments within these operators.
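Splitting at delimiter boundaries can be sketched with a regular expression that keeps the delimiters as their own fragments. The marker phrases used here (“start fraction”, “end root”, and so on) are hypothetical stand-ins for the delimiter wording Math4e actually emits.

```python
import re

# Hypothetical delimiter phrases; kept in the output via a capturing group.
MARKERS = re.compile(r"(start fraction|end fraction|start root|end root)")

def split_fragments(description: str) -> list[str]:
    # re.split with a capturing group returns the delimiters too;
    # empty/whitespace-only pieces are discarded.
    parts = MARKERS.split(description)
    return [p.strip() for p in parts if p.strip()]

split_fragments("start fraction x plus 1 over 2 end fraction")
# -> ['start fraction', 'x plus 1 over 2', 'end fraction']
```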
Once all fragments have been generated, each text fragment is recorded with a synthesized voice and followed by a fade-out effect signaling its conclusion. The recorded fragments are then combined with the corresponding tone, an auditory cue identifying the mathematical operator acting as the delimiter. Finally, all recorded fragments are merged into a single audio file, which is dispatched to the client’s front end for playback. To avoid disorienting the user during playback at the system interface, it is crucial that the audio sequences do not overlap.
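The fade-out cue amounts to a linear gain ramp over the tail of each fragment's audio. The sketch below operates on a plain list of samples standing in for a real audio buffer; the function name and the choice of a linear (rather than, say, exponential) ramp are assumptions.

```python
# Apply a linear fade-out to the last `fade_len` samples of a fragment,
# scaling them down to zero to cue the fragment's conclusion.

def apply_fade_out(samples: list[float], fade_len: int) -> list[float]:
    out = list(samples)
    n = min(fade_len, len(out))
    for i in range(n):
        # gain falls linearly from just under 1.0 to exactly 0.0
        gain = (n - 1 - i) / n
        out[len(out) - n + i] *= gain
    return out
```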
Considering that processing within the CAS system occurs locally on the server side, we opted for the pyttsx3 library. Although its synthesized voice is less natural than that of other libraries, this does not notably affect the proposed system, and pyttsx3 avoids the high latency often associated with online libraries, making it a suitable choice for efficient processing.
Throughout the development of Math4e, a range of libraries and resources were harnessed to facilitate audio output generation for representing mathematical expressions. A detailed overview of the libraries employed in the project is provided in
Table 3.
3.3. Alpha and Beta Testing Scenarios
This section presents the tests carried out to evaluate and verify the correct functioning of Math4e.
A dedicated computer laboratory was set up at the Escuela Politécnica Nacional in Ecuador to conduct the tests and serve as a controlled testing environment. The physical space was carefully managed, featuring closed blinds and artificial lighting to maintain low and consistent light levels, thereby preventing discomfort for visually impaired individuals sensitive to light. The laboratory was equipped with twelve computers, each with screens configured to a low brightness level and keyboards compatible with Spanish (ES), Latin American Spanish (LAA), and US English (EN) layouts. Access to the system was granted to voluntary participants, and the tests were carried out anonymously.
Two test scenarios were arranged (after a period of recruitment from 3 January 2023 to 14 January 2023):
The first scenario (Alpha Test) involved seven sighted volunteers who were blindfolded during the test, in a controlled environment with minimal light levels. The screens were set to minimum brightness, and the volunteers’ actions were monitored, e.g., checking whether they tried to force their view, or turning the screens off at random moments to verify that they were being guided only by the audio. This scenario enabled the definition of crucial values, such as those in
Table 4, before conducting the tests with blind individuals in the second scenario.
In the second scenario (Beta Test), five visually impaired individuals with varying levels of blindness participated and used the developed Math4e module. These individuals were contacted through the “Asociación de Invidentes Milton Vedado” foundation in Quito, Ecuador. The research protocol for the Math4e experiments was submitted to and approved by the foundation’s representatives through a cooperation agreement and a code of ethics signed by both parties (refer to footnote).
In assistive technology, it is well established that sighted individuals should not be the sole evaluators of systems designed for users with visual impairments [
35]. Consequently, the first scenario is an Alpha Test featuring simulated blind users. This test serves the crucial purpose of establishing a minimum-usability baseline, enabling the identification and resolution of potential issues with the tool. After completing the first scenario and resolving any deficiencies, the second scenario, involving real blind users, constitutes the Beta Testing phase. During this stage, satisfaction and usability were systematically measured to validate the correct functioning of the module.
The testing period lasted two months and included training and adaptation to the tool. During the Alpha Test, the participating users analyzed various parameters related to audio recording speed, reference volume, fragment length, and tones. This preliminary stage allowed appropriate initial operating parameters to be established. Based on this analysis, the audio tones for each type of component were determined after an initial set of tests: tones must be sufficiently distinct from one another, gentle to human auditory perception, and must not obscure the user’s focus on the numerical content. It was further determined that high-frequency tones of short duration (not exceeding 0.3 s) were suitable as auditory cues for the different operators. Additionally, it was found that increasing the audio volume by 15% and 30%, and the speed by factors of 1.1 and 1.2, for the different fragments enables the understanding of longer fragments without losing the expression’s context. The number of characters in each fragment defines its auditory characteristics: in this study, short fragments have fewer than 30 characters, medium fragments have 30 to 50 characters, and long fragments have more than 50 characters.
Table 4 summarizes the playback speed and volume increase for each fragment type.
The Beta tests adhered to well-established methodologies, incorporating ISO/IEC/IEEE 29119 [
36] for the test environment’s design, configuration, and deployment. Additionally, the International Software Testing Qualifications Board (ISTQB) [
37] was employed to delineate the actors, terms, and processes involved, while the Software Engineering Body of Knowledge (SWEBOK) [
38] guided the definition of requirements, acceptance criteria, resources, and the time necessary for the successful execution of tests.
For the usability and complexity perception assessments, we used the Self-Assessment Manikin (SAM) methodology [
39]. The test protocol for Math4e mirrored the one utilized in [
27]. In both scenarios, verbal surveys were conducted separately to assess users’ (alpha or beta) comprehension of the proposed equations and their satisfaction level on the SAM scale, following a detailed protocol: (i) each equation was reproduced using Math4e; (ii) users were offered at least two choices for the equation they heard; (iii) users selected their preferred option; (iv) if needed, the audio was replayed. This approach ensured a rigorous and systematic evaluation of Math4e. The following section presents the outcomes of these tests.