1. Introduction
End-user computing is the “dark matter” of informatics and computing [1,2]. Professional informatics and computer science (CS) cannot handle it, for various reasons: (1) the huge number of end-users, approaching the population of the entire world and including CS professionals in an end-user role (detailed in [3]); (2) the lack of effective and efficient CS-based digital problem-solving approaches for handling end-users and their digital artifacts [4,5]; (3) the position of end-users in the digital world and the contradictory approaches to it [6,7,8,9,10,11,12]; (4) the wide variety of end-users; (5) misleading testing and self-assessment evaluation techniques and systems [13,14,15,16,17,18,19,20,21,22,23,24]; (6) widely accepted and applied tool-centered learning-teaching approaches; and (7) the neglect of computer education theories in general [25,26], and of the results of objective measuring systems in particular [3,27,28,29].
The misconceptions connected to end-users, the misplacement of end-users in the CS realm, the lack of a human-centered end-user philosophy, the underdevelopment of digital learning-teaching strategies, and the neglect of the waste generated by end-users and their digital artifacts have led to the end-user paradox and the illusion of digital prosperity. The end-user paradox states that the more underdeveloped end-users are, the more errors they make, and thus the more resources are required to handle these errors and their digital artifacts. The abundance of digital tools, both hardware and software, creates the idea, the illusion, that we are in a digitally advanced state and that the possession of tools [30,31,32,33] and their handling are enough to solve digital and digitized problems [34,35]. On the other hand, it has been proven that owning tools and collecting them in great abundance does not solve digital problems but is rather a distraction [36,37,38,39,40] that hinders rather than serves the development of their owners.
The objectives connected to end-user computing are as follows, without aiming to cover all aspects of the problem:
End-users generate erroneous documents [1,41,42,43,44,45,46,47,48,49] and huge waste in their document handling processes [2,27,28,29];
In spreadsheeting, waste can surface and result in immediate losses [1,46]. In general, however, end-user waste remains unnoticed, unrecognized, ignored, neglected, and overlooked; it is silent waste;
Digital text management processes, including word processing, presentation and webpage management, and handling text in data management and programming contexts, generate huge hidden losses [3,27,28,29]. The volume of these losses has barely been studied or revealed;
Digital end-user artifacts (data files) also generate huge losses when the need for their creation and modification surfaces [2,27,28,29];
The end-user paradox and the illusion of digital prosperity take their toll [29];
End-user computing is on the periphery of CS and is barely connected to any other CS-related skills, abilities, competencies, and knowledge, despite the fact that it fulfills all the requirements of CS-related problem-solving [6,31,50,51,52,53].
Considering these objectives, a previously introduced method, the entropy of digital text (EDT) [27], is applied to reveal end-users’ behavior in digital environments under various target conditions. The entropy of natural language digital text is based on Shannon’s original entropy theory [54,55], tuned to the characteristics of the processes which handle these artifacts. To calculate EDT, a text-management process is recorded and analyzed based on the recorded video file [3,4]. The activities (steps, actions) of the process are identified, and a time stamp (the length of the step) is assigned to each step. Based on this stamp, the probability of each activity is calculated. After collecting all these stamps, the entropy of the process, the EDT of the digital process, is calculated [27].
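The calculation can be sketched as follows; this is a minimal reconstruction assuming, as described above, that each atomic step’s probability is its duration divided by the total process time, with the Shannon entropy then taken over that distribution. The step durations are illustrative only, not measured data:

```python
import math

def edt_entropy(durations):
    """Shannon entropy (in bits) of a process whose atomic steps have
    the given durations. Each step's probability is its share of the
    total process time. A sketch of the EDT idea, not the authors'
    exact implementation."""
    total = sum(durations)
    return -sum((d / total) * math.log2(d / total)
                for d in durations if d > 0)

# Illustrative durations (seconds) of four recorded atomic steps:
process_entropy = edt_entropy([8.15, 3.01, 4.72, 1.44])
```

A process whose steps all take equal time yields the maximum entropy for that number of steps (e.g., four equal steps give 2 bits), while a process dominated by one long step yields a lower value.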
The purpose of the present paper is, based on the EDT values, to shed light on how effectively and efficiently end-users can work in natural language digital texts, what problem-solving strategies they apply to handle fundamental modification processes in word processing, and how digital reading, the understanding and interpretation of the graphical interface, supports or impedes the workflow [56,57,58,59,60,61].
3. Results
In the first step of the analysis, the video file of each participant was broken down into atomic steps [27]. These steps were categorized and recorded in an Excel spreadsheet along with the times at which each step started and ended. The recording of the atomic steps and their assigned time stamps excludes the activities in which participants read and checked the instruction PDF file (Figure 4); strictly, only activities carried out in the Word program and interface are taken into consideration. Under these requirements, the number of atomic steps, the processing time, and the entropy of the process were calculated for each task and for each participant.
Table 4 and Table 5 show the average, shortest, and longest times and numbers of steps, and the entropy with which participants worked on the four tasks. Furthermore, the numbers of completed tasks are shown in Table 4.
In the etalon test, typing the 13-character-long string required 8.15 s and 4.72 s in Tasks 1 and 3, respectively (average: 6.435 s). Changing the font size took 3.01 s and 1.44 s (average: 2.225 s) (Table 3). Similar differences were detected in the participants’ tests (Table 4). We can conclude that typing took more time than changing the font size, which can explain the time difference between Tasks 3 and 4.
However, for the participants, Task 1 required 1.5 times more time than Task 2, with the difference (d) being 52.52 s, much longer than the time required for typing (Table 4). Further testing and analyses are required, but one of the principles of the lean production system can explain this discrepancy [40,59]. Task 1 precedes Task 2 and, as such, can serve as a training task. This finding is in accordance with the lean principle that errors are to be learned from and avoided in upcoming tasks and problems [40,57,58,59,60,61]. In our testing environment, this implies that the layout errors shared by Tasks 1 and 2 were discovered in Task 1 and were easier and faster to identify and handle in Task 2.
In the case of Tasks 3 and 4, the time difference between them is much smaller than between Tasks 1 and 2, but still greater than the time required for typing. The analysis of the video files revealed that, after completing Task 3, participants were surprised to be finished in such a short time and spent extra time re-checking the task description several times (this time is not included but increases the number of steps) or navigating the Word document. In Task 4, they were not as surprised and closed the file in a shorter time than in Task 3.
In the following phase of the analysis, the results were compared as functions of the independent variables: orientation (STEM vs. non-STEM), gender (female vs. male), and age, the latter forming four groups (20+, 30+, 40+, and 50+, coded 1, 2, 3, and 4, respectively).
The number of completed tasks is extremely low in Task 1, with only 4 participants able to fulfill all the requirements of the task (Table 4). The comparison of the time spent on the tasks shows that, regardless of orientation, Tasks 1 and 2 required more time and steps than Tasks 3 and 4. Statistical analyses (t-tests) show no significant difference between the STEM and the non-STEM subgroups (p-values for Tasks 1, 2, 3, and 4: 0.412, 0.16, 0.068, 0.347) (Table 6, Figure 6, Figure 7, Figure 8 and Figure 9). This finding is consistent with the presence of efficiency islands [61], the phenomenon whereby one is a professional in a specialized subject of CS/STEM while underqualified in others and cannot see the connections between the subfields of CS/STEM. One consequence of the efficiency-island effect is that informatics professionals in an end-user role cannot apply the knowledge gained in their informatics and CS/STEM studies.
In a similar way, there is no significant difference between the STEM and the non-STEM subgroups regarding the number of steps (p-values for Tasks 1, 2, 3, and 4: 0.367, 0.339, 0.082, 0.246) (Table 7). These findings imply that a STEM orientation, even for CS professionals, is no guarantee of effective digital text management.
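For readers who wish to reproduce this kind of comparison, the subgroup test can be sketched with a standard two-sample t-test. The times below are invented placeholders, not the measured data reported in Tables 6 and 7:

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant task times (seconds) for the two
# subgroups; illustrative values only.
stem = np.array([95.2, 110.4, 88.7, 120.1, 102.3])
non_stem = np.array([101.5, 99.8, 130.2, 94.6, 115.9])

t_stat, p_value = stats.ttest_ind(stem, non_stem)
significant = p_value < 0.05  # the paper reports p > 0.05 for every task
```

With real data, the same call would be repeated per task for both the time and the step-count variables.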
In Task 2, 16 completed tasks were found (Table 4). This result can be explained by the training role associated with Task 1. However, the videos revealed that there is another explanation, which has a strong connection to the four types of problems [68], grouped as follows.
Reactive:
Troubleshooting (Type 1);
Gap from standard (Type 2);
Proactive:
Target condition (Type 3);
Open-ended (Type 4).
All but one of the participants started Task 1 by typing, without first cleaning the layout errors from the text. This approach implies that participants preserved the errors until they ran into a breakdown of the text (Figure 10). Furthermore, most participants did not turn on the “Show/Hide” button; consequently, the non-printable characters remained hidden (Figure 10a,c).
Considering the four types of problems [68], this approach belongs to the Troubleshooting (Type 1) category, where only visible errors are handled, without looking for root causes. Looking for root causes would lead to Type 2 Gap-from-Standard problem-solving, where the standard is set up by the definition of properly edited digital text and the collection of rules (standards) regulating natural language digital texts (error categories are detailed in [3,29]). To reveal these gaps, the andons (visible tools designed to alert operators and managers to problems in real time) [40,59] of the word processor can be a useful data supplier. Most of the word processor’s andons are permanently displayed on the interface, such as menus, toolbars, rulers, dialog panels, and the mouse pointer, indicating their on/off status. However, one of the most important andons, the non-printable characters (Figure 1, Figure 2, Figure 3 and Figure 10b,d), is only displayed when the end-user finds it useful and necessary. If the Show/Hide button is turned off, the only sign of layout errors in Task 1 is the misplacement of the string “vagy a” after typing the requested “és vitaminok” string (Figure 10a). On the other hand, changing the font size to 16 pt misplaces all the fake paragraphs in Task 2 (Figure 10c), making the errors more obvious. If the root causes of the errors had been revealed in Task 1, the misplacement of the fake paragraphs could have been avoided in Task 2.
We can conclude that both Type 1 and Type 2 problems are reactive. However, to improve the effectiveness of end-user problem-solving and digital artifacts, at least Type 3 Target Condition problem-solving should be employed in real-world situations like the one presented in this study. The tasks of the test also allow space for Type 3 Target Condition problem-solving, which allowed the participants not only to survive (bricolage) [41,42] but also to present effective solutions. In the sample, only one participant applied Type 3 problem-solving, and altogether only four participants completed Task 1. The analysis revealed that these four participants had previously been trained with the Error Recognition Model described in [69], which further supports the effectiveness of the method.
One further misconception circulating in end-user computing is that non-printable characters are inconvenient or annoying for end-users and should not be displayed on the screen [3,4,5,27,28,29,70,71,72,73]. Some participants wanted to delete the Space marker dots, while others misinterpreted them as periods. As clearly explained and expressed in the lean production system, andons are there to help reveal errors and handle problems [40,57,58,59,60,61]. In digital text management, non-printable characters can serve as andons and help end-users handle layout errors in the text if they are trained in how to use them [69]. Their role is magnified in erroneous documents like those in Tasks 1 and 2.
The comparison of female and male participants does not reveal significant differences when the time (p-values for Tasks 1, 2, 3, and 4: 0.275, 0.231, 0.34, 0.279) (Table 8) and the number of steps (p-values for Tasks 1, 2, 3, and 4: 0.137, 0.244, 0.46, 0.386) (Table 9) are analyzed. However, the number of completed tasks shows differences (Table 4 and Table 8, Figure 11, Figure 12, Figure 13 and Figure 14). The percentages of tasks completed by females and males are 14.29% (F) and 0% (M) in Task 1, 35.71% (F) and 24% (M) in Task 2, 100% (F) and 96% (M) in Task 3, and 82.86% (F) and 76% (M) in Task 4 (Table 8).
The time spent on the tasks and the number of steps carried out in the completed and uncompleted tasks show a unique pattern. In Task 1, completing the assignment took more time than leaving it unfinished (defect) or overprocessed, two of the eight wastes of lean (non-completed in our terms) [59]. However, in Task 2, handling the non-completed documents took more time than the completed ones; the same is true for Tasks 3 and 4. We can conclude that the participants who completed the tasks were able to learn from their experience in Task 1 and applied this knowledge in the subsequent tasks. Conversely, the participants who could not complete the tasks do not seem to be aware of their processes, even though their results improved from Task 1 to Task 2 (Table 10 and Table 11).
Considering the age differences, due to the low number of participants we cannot draw firm conclusions, but the tendency in our testing is that younger participants performed better than older ones (Figure 15 and Figure 16). However, at this point we must call attention to the fact that the four participants who completed Task 1 had studied with the Error Recognition Model (ERM) described in [69], a novel approach that was not available to the older groups. In Task 2, where 16 completed solutions were found, five participants had studied with the ERM and 13 were STEM-oriented. It can be assumed that STEM studies might have a positive impact on levels of problem-solving. However, further testing with various groups is needed to confirm the age-, STEM-, and ERM-related findings.
We checked with one-way ANOVA for significant differences in time and in the number of steps among the distinct age groups. Table 12 shows the significance values (p-values).
Table 12 shows that there is only one significant difference among the four age groups: the time taken for Task 4, with an eta-squared value of 0.151, implying a small effect; in all other cases, there is no significant difference. The Tukey post hoc test reveals that this difference lies between the 40+ and 50+ groups. The comparison of the age groups reveals that, in general, the 50+ group needed the longest time to carry out a step (Table 13). However, there are instances where the 20+ group needed more time than the 30+ and 40+ groups. These differences are not significant, but they might serve as a warning.
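The age-group comparison can be sketched as follows: a one-way ANOVA over the four groups, with eta-squared computed as the ratio of the between-group and total sums of squares. The group times are invented placeholders, not the measured values of Tables 12 and 13:

```python
import numpy as np
from scipy import stats

# Hypothetical Task 4 times (seconds) for the four age groups;
# illustrative values only (the 50+ group is made deliberately slower).
groups = [
    np.array([60.1, 55.3, 58.2, 62.4]),  # 20+
    np.array([63.5, 59.9, 61.2, 66.0]),  # 30+
    np.array([58.8, 64.1, 60.5, 57.9]),  # 40+
    np.array([75.2, 80.6, 78.3, 72.9]),  # 50+
]

f_stat, p_value = stats.f_oneway(*groups)

# Eta-squared = SS_between / SS_total.
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((all_values - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total
```

With a significant omnibus result, a Tukey post hoc test (e.g., `pairwise_tukeyhsd` from `statsmodels`) would then locate which pairs of groups differ.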
Dividing the time by the number of steps reveals how much time was required, on average, to carry out one step. A longer time per step was needed in Task 1 than in Task 2, and similarly, the steps were carried out more slowly in Task 3 than in Task 4. The comparison of the age groups also reveals that the 50+ group needed more time to carry out the steps than the younger groups.
In terms of completing the tasks without any defects or overproduction, the 20+ age group has the best results compared to the others (Table 14). However, the small number of completed tasks in Tasks 1 and 2 and the small number of participants require further testing to confirm our suggestion.
4. Implications and the Limitations of the Study
As mentioned, one of the goals of the study was to set up an evaluation system that can handle natural language digital texts and evaluate end-user activities. It is without question that triangulation [65,67] must be used to cover as many aspects of the problem as possible. However, we can conclude that both the qualitative and the quantitative methods presented in the paper can serve as standards [40,57,58,59,60,61], which implies that further research is needed to improve the effectiveness of the method [40,57,58,59,60,61].
One limitation of the process is the evaluation of the video files, which is carried out by humans. We were not able to find any automated solution for deciding on the boundaries, the beginning and end of an atomic step, of unpredictable human actions. AI might solve this problem in the near or far future, but these manual evaluations will be needed to teach AI systems. At present, we use a manual evaluation method to record the atomic steps and calculate their time stamps; later, these recorded data might serve as training data for automated evaluation [74]. Another source of uncertainty is the automated evaluation of the four output files: building algorithms for unpredictable events and analyzing unknown or non-existent text is beyond the scope of our evaluation system.
One of the most common unpredictable human movements recognized in the video files is moving the mouse aimlessly on the interface. This type of movement was named ‘empty’. The problem with the ‘empty’ atomic step is that it is almost impossible to define its boundaries, since there are no distinguishable characteristics indicating its beginning and end. Similarly, when the beginning and/or the end of the string disappears from the selected paragraph of the Word documents, handling these cases is extremely demanding and far beyond the scope of the present study.
Further unexpected actions, which show how unpredictable end-users can be, include the following:
Changing the normal style of the whole document;
Deleting erroneous characters in the whole document, including extra Space characters within parentheses;
Inserting the green rectangle with rounded corners presented in the instruction PDF into Tasks 1 and 3;
Applying right indentation to the paragraph;
Turning off the Widow/Orphan control;
Changing the line spacing, etc.
As mentioned, our research group set up the standard both for the solution of the tasks and for the evaluation. However, the recordings revealed solutions that we had never encountered before. One of these interesting solutions is the deletion of multiple Space characters by changing the alignment of the paragraph. Another unexpected finding is that the alignment of a paragraph can be changed by re-selecting its current alignment: if the alignment is left, right, or center and one clicks the same alignment button, the alignment changes to justified; conversely, if the alignment is justified and one clicks the Justify button, the alignment changes to left.
Deleting multiple Space characters by changing the alignment is a fast solution, and we really liked it. We tried this method in the etalon files and recorded it; it shortened the overall time of the modification. However, this solution can only be effective in short texts like the paragraph selected for the test. Considering both its advantages and disadvantages, deleting multiple Space characters in this way allows space for redefining the standard, which is one of the principles of the lean production system and key to continuous improvement [40,57,58,59,60,61].
One further limitation of the study is the number of participants. Further testing with larger groups should be carried out. However, we found that the following causes create huge obstacles to collecting clear data:
Willingness to participate in such tests is extremely low;
The anonymity requirement does not allow us to collect data on orientation, gender, age, and other personal information;
Several institutes cannot fulfill the technical requirements of the test (e.g., lack of a LAN or of an online classroom in use, insufficient room for saving log files on the computers, etc.);
Low-level computational thinking skills among participants (e.g., participants do not know the difference between Save and Save As, how to compose correct folder and file names, how to compress and extract files, how to download and upload files in online classrooms, how to send large files, etc.).
Furthermore, qualitative analysis of the collected data uses up an enormous amount of human resources, which are limited in our research group. This implies that further research groups should be involved to obtain comparative results.
5. Discussion
A group of 53 participants was tested on their problem-solving strategies in erroneous and correct natural language digital texts. With the approval of the participants, three categories of personal data were collected: gender, age, and STEM orientation. However, the participants cannot be identified, since the results presented in the paper do not refer to them individually.
To carry out the test, four documents were presented, in each of which one short paragraph needed to be modified according to the task descriptions and the figures accompanying the tasks. The activities of the participants were recorded in two different file formats: the keyboard and mouse activities were recorded in text files, while the entire screen was recorded in a video file using a dedicated application called ANLITA (Atomic Natural Language Input Tracker Application). The output of the test consists of six files: the two recordings and the four Word documents modified by the participants.
To analyze the output files, both quantitative and qualitative methods were applied (triangulation). In the present study, the results of the video analysis are detailed. The analysis was carried out and continuously improved by two members of our research team to set up standards for both the present and further analyses. The solutions of the participants were broken down into atomic steps, which were categorized. Then, the beginning and ending times of these steps were recorded (time stamps). Based on the time stamps, the probabilities of the atomic steps, and finally the entropy of the process, were calculated. On the one hand, the calculated entropy reveals the information content required to carry out the process. On the other hand, when the IRD entropy is compared to the IID entropy, we can identify those activities for which the information content is high or low.
We have found that if the information content of an atomic step is low compared to the other steps, it indicates uncertainty or allows space for recognizing further steps into which the defined atomic step can be broken down. This latter option can reveal the nature of uncertainty and lack of knowledge, increasing the entropy of the process and the number of NVA activities. In general, it can help to identify the root causes of waste generated in these modification processes.
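The per-step comparison described above can be sketched as follows: with duration-based probabilities, a step’s information content is its surprisal, -log2(p), so long, high-probability steps (such as the aimless ‘empty’ movement) stand out with low values. The step names and durations below are hypothetical, and the mapping to the authors’ IRD/IID comparison is an assumption:

```python
import math

def step_information(steps):
    """Information content (surprisal, in bits) of each atomic step,
    -log2(p), where p is the step's share of the total process time.
    A sketch of the idea in the text, not the authors' exact method."""
    total = sum(d for _, d in steps)
    return {name: -math.log2(d / total) for name, d in steps}

# Hypothetical recording in which a long 'empty' step dominates:
info = step_information([("select paragraph", 2.1),
                         ("type string", 8.2),
                         ("empty", 14.6)])
low_info = min(info, key=info.get)  # step to break down or flag as waste
```

Flagging the lowest-information steps in this way points at the candidates for further breakdown or for NVA (non-value-adding) classification.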