1. Introduction
Nowadays, the chances of employment are significantly lower for people with disabilities than for people without disabilities. Only 19.6% of women and 52.8% of men of working age with disabilities have a job. However, having a job is not only an important part of gaining financial security but also of feeling integrated into society [1]. Especially now, during the so-called fourth industrial revolution, "Industry 4.0", this inequality threatens to worsen. Advancing digitalization is driving the trend toward individualized products and small batch sizes. This, however, requires workers to adapt to new tasks quickly and to learn new skills easily [2]. People with disabilities often need longer training periods and require special care until a task is mastered [3]. Therefore, Industry 4.0 is likely to have a particularly strong impact on these individuals.
Of course, no generalized statements about the productivity of people with disabilities can be made since disabilities are very diverse in nature. The main types are sensory disabilities, physical disabilities, intellectual disabilities, and psychosocial disabilities. While people with intellectual disabilities might need smaller working steps to reduce the workload, people with physical disabilities might need individual spatial accommodations to improve the accessibility of new tasks [4]. So even though the causes differ, many people with disabilities have difficulty coping with the demands imposed by the fourth industrial revolution. While learning new tasks, detailed supervision by the group leaders of the Sheltered Workshop (SW) is needed, and yet mistakes frequently occur, leading to additional work and wasted materials, as explained by supervisors of the SW in Oldenburg, Germany. Even with bigger batch sizes, many employees in an SW are rarely assigned new tasks that would challenge them and thereby provide further training [5]. However, the lack of flexibility and training is not the only reason why, in Germany for example, less than 1% of people from SWs enter the open labor market [6]. In the SWs, the employees are given a sense of security that some are not willing to give up [7].
Using assistive technology in the workplace can lead to the inclusion of people with disabilities in the working world, along with its benefits [3]. As the subsequent section on state-of-the-art systems will show, there are only a few different approaches for developing Assistive Systems (AS) at the workplace for people with disabilities. Some systems offer a form of feedback [8], some provide instructions [9], and some can provide physical support [10]. But, to the best of the authors' knowledge, no system exists that provides all of these forms of assistance at the same time or even incorporates pointing gestures. Pointing gestures can create an additional learning effect by incorporating working memory resources that would not be used with purely linguistic or pictorial assistance [11]. Therefore, the combination of multiple aspects remains an interesting field of research that is addressed in this paper. In addition, no studies exist in which comparable robotic Assistive Systems have been evaluated on-site at the workplace in a workshop for people with disabilities. Instead, purely theoretical considerations are made [12], or the systems are tested with people without disabilities [13] or under laboratory conditions [14]. In this paper, a robotic Assistive System is conceptualized, implemented, and evaluated under real conditions.
Following the aforementioned challenges, an Assistive System would need to be customized to the individual worker. It would have to provide appropriate assistance depending on the situation and be adaptable to new tasks. Due to the diversity of disabilities, the system should also provide a wide range of assistance, both intellectual and physical. Finally, it is important that the participants enjoy working with the system and feel well supported by it, possibly even feeling a sense of security. This work therefore follows the approach of combining a Context-Aware Assistive System (CAAS) with a manipulator and an intelligent control structure, leading to the Context-Aware Robotic Assistive System (CARAS). The CARAS can track the individual work steps in real time via a depth camera. The camera additionally notices when work is paused, for example due to distraction or confusion, and the system can intervene here as well. By linking the system with algorithms from the field of Intelligent Tutoring Systems (ITS), the system can also take into account user information such as the type and degree of physical or mental disability. In this way, CARAS incorporates contextual information not only about the current task and its execution but also about the preferences and strengths of the individual user. Using behavior trees, the system can decide, depending on the situation and the individual employee, what kind of help is currently needed. The corresponding situational assistance is subsequently provided by the manipulator. The single work steps are displayed step-by-step as pictorial prompts on a monitor to keep the cognitive load low, and the robot can provide physical assistance and pointing gestures similar to how a human supervisor would assist.
The major contributions of this paper will be as follows:
Conceptualization of a Context-Aware Robotic Assistive System for people with disabilities in the workplace, based on a thorough scientific literature review. For the first time in this context, a robot offers assistance in the form of various robotic pointing gestures;
Implementation of the designed Assistive System. Here, a behavior tree assistance node is developed to extend the system modularly with assistance and decision systems;
Evaluation of the developed Assistive System. For this purpose, an in-field user study was conducted with people with various degrees of mental and cognitive disabilities in a Sheltered Workshop.
In the following sections, first a thorough review of the literature is presented to cover various design aspects of the system that need to be considered. Afterwards, the chosen methods and their implementation are described, followed by an outline of the in-field user study, visualized in Figure 1. Finally, the results are discussed, and conclusions are drawn.
2. Related Work
The presented work combines the different research areas of Assistive Systems (AS), robotics for people with disabilities, and intelligent control strategies such as behavior trees (BTs). The following sections discuss previous works in these fields and how they align with the research at hand.
2.1. Assistive Systems
Assistive Systems in general, but also specifically for people with disabilities, are being researched and used in various areas of life, for example in the automotive sector [15], as smart homes to support independent living of the aging population [16], or in smart healthcare [17]. This literature review focuses on AS for application areas similar to the one investigated in this paper: AS in the context of activities at work or in support of learning new activities.
The literature shows that AS in the workplace can have a positive effect on people with disabilities. They can help develop skills, lead to a higher degree of independence, and may result in higher accuracy of the working results [18]. Different approaches exist to facilitate the learning of assembly tasks in SWs and manufacturing workplaces in general. They range in complexity from supervisors and experienced workers demonstrating the task by hand or giving verbal instructions to technical solutions such as Context-Aware Assistive Systems (CAAS) that use depth cameras and Augmented Reality (AR) to give real-time feedback about errors and to adjust the speed of production [8,9].
In this work, a comprehensive AS is conceptualized, implemented, and evaluated. Accordingly, it consists of multiple components, which can be implemented in various ways. Therefore, in the following, some design approaches for presenting instructions and detecting working steps are presented, and, where available, their applicability to people with disabilities is discussed.
2.1.1. Presenting Instructions
One of the simplest methods to present instructions is the use of textual prompts. However, studies show better results when text instructions are combined with advisor feedback about performance at the end of the tasks, since some people with disabilities cannot read [19]. Pictorial instructions work around this problem and have been proven to help people with disabilities learn new tasks [20]. Pictorial and video prompts have even been shown to be more helpful than verbal feedback for students with moderate intellectual disabilities [21].
Further research shows that people with severe developmental disabilities prefer computer-based pictorial prompts over card systems [22]. Here, displayed instructions can again be provided in various ways. Mobile displays can be attached to the tool [23] or worn by the user. The latter includes wearing them on the chest or head [24] as well as handheld devices such as mobile phones and tablets for providing AR or video feedback [25,26].
Apart from mobile displays, fixed displays can be used. These can present pictorial and video instructions on a screen or in situ instructions projected directly onto the workplace. While in situ projections can result in a faster task completion time, they lead to more errors for people with severe disabilities [27,28], likely due to an increased workload. But even if productivity is not increased in these cases, an AS can enable participation compared to conventional printed instructions [29]. However, the type of in situ projection is also critical to success [30].
Altogether, people of all ages and with different cognitive disabilities respond well to glasses, projectors, and handheld AR devices. AR can increase people's motivation and improve learning, especially for children. Moreover, it can reduce task completion times for assembly tasks in the workplace. Individuals with certain cognitive impairments, though, may make more mistakes due to the higher cognitive load [31].
Apart from visual feedback, there are also systems providing haptic feedback, such as vibrating gloves, or auditory feedback [8]. The visual approach, however, seems to be the fastest and leads to the fewest errors compared to haptic or auditory feedback [32].
Assembly workplaces often include multiple boxes in which tools and assembly parts can be found [33]. Common methods to indicate from which box the current workpiece should be taken are pick-by-paper, pick-by-light, pick-by-display, and pick-by-projection. The pick-by-light method seems to lead to the fastest task completion times [34].
2.1.2. Detecting Working Steps
To provide appropriate instructions and assistance, an AS should follow the user's actions (working steps) and recognize the current state (e.g., errors made). Both areas can be implemented in various ways, with arbitrarily complex techniques.
Action recognition can be realized manually by pressing a simple confirmation button [35] or by using a graphical user interface (GUI). GUIs, however, are complex to realize for people with, for example, visual disabilities [36]. To provide the greatest possible assistance, the burden of manual confirmation should be taken from the user through automatic tracking by the system itself. RGB cameras can track the user's hands by filtering certain skin complexions. Boxes containing needed parts can be located using color and shape tracking. Combined, a grasp can be detected whenever a hand is above a box for a certain time [37]. Depth camera analysis, such as the appearance and disappearance of points in point clouds, can also be used to this end [8]. Another possibility is to equip either the user or the workpieces with sensors, such as ultrasonic emitters [38], accelerometers [39], force-sensitive resistors, infrared sensors, and radio-frequency identification sensors [40].
However, to recognize whether a task has been completed correctly, state recognition methods are needed. Again, a variety of methods exists. The assembly state can be detected via a pixel-wise comparison of the depth values of a depth camera [8] or using shape and pattern matching on 2D gray-scale or RGB images [41]. Artificial Intelligence methods such as deep learning and convolutional neural networks are also often used [42]. However, 2D camera data in particular pose considerable data protection risks, as employees are recorded. Therefore, this method is not desirable for an Assistive System.
Image recognition can thus become arbitrarily complex. Therefore, especially for testing new concepts and systems where human–robot interaction and the associated added value are the main focus, the use of Wizard of Oz techniques in the early testing stages is recommended [43]. Here, both action and state recognition are mimicked by the researcher to ensure the desired robot behavior. This is important to ensure that observed responses in participants are not triggered by incorrect image recognition and associated misleading robot actions. The Wizard of Oz approach is also commonly used when testing robots and Assistive Systems with people with disabilities [8,44]. An alternative approach would be to conceptualize the system in such a way that a person takes control remotely and to present this as a feature of the system [45].
2.2. Robots for People with Disabilities
While the above-mentioned systems provide instructions and feedback, they do not cover the aspect of providing motoric support. A visit to an SW revealed that many workers suffer from an impairment of the motor nerves and yet work in assembly manufacturing. Here, the use of a robot could help. For this reason, the use and acceptance of robots by people with disabilities will be discussed.
Human–robot interaction (HRI) can involve different types of robots. HRI robots differ in their morphology, such as external appearance and communication; in their field of application, such as industry or medicine; in their role in the team, such as collaborator or teacher; and in many other areas [46]. The mere physical presence of a robot can already have a positive learning effect on humans; a dependence on the exact physical appearance has not yet been proven [47]. Yet, more complex behaviors of the robot seem to lead to better outcomes, and even social movements of a simple robotic manipulator can evoke emotions in users [48]. Therefore, the focus of this section is on all types of HRI robots that have already been successfully used with people with disabilities in work and learning contexts.
There already exists, for example, a robotic AS that assists paraplegic librarians at their workplace by taking over all book manipulation tasks [10]. In autism therapy, especially for children, the interaction with robots is meant to increase their social skills, which can lead to the rehabilitation of motor skills [49,50,51]. A first study on teaching students with intellectual disabilities different tasks using a social robot indicated that students feel less pressure when asking a robot to repeat the explanation than when asking a human who might judge them [52].
The use of industrial robots to provide motoric help to workers in SWs is a new and rather unresearched field. A first in-field user study investigated the physical acceptance of an industrial robot in the workplace by people with disabilities. The study examined whether the movements or the direct interaction in a joint pick-and-place task caused the participants fear or discomfort. The usefulness of such a cooperation was not investigated; in fact, the robot delayed rather than supported the execution of the task. Mental support in the form of pointing gestures was also not investigated. The manipulator achieved an overall good acceptance, and no fears or feelings of discomfort were detected for any of the users [53]. Apart from that, mainly theoretical considerations [12,13] or laboratory experiments [14,54] about such a collaboration exist.
2.3. Control Structure
As described in the previous sections, an AS that can track the user's tasks, give individual instructions, and additionally offer motoric aid is very complex. To integrate all software aspects into one algorithmic solution, the software architecture of the control system must be intelligent and adaptive. One such approach is the behavior tree (BT), which is explained in the following subsection. Afterwards, the algorithm that chooses the right action at the right time is presented.
2.3.1. Behavior Trees
A BT is a control architecture created for switching between different tasks. BTs can be used to create complex, modular, and reactive systems. Initially, BTs were developed in the computer game industry to implement modular non-player characters. However, due to their modularity and reactiveness, they have found their way into many branches of Artificial Intelligence (AI) and robotics. BTs are modularly built out of single, simple nodes. They form a directed rooted tree where each node has either a so-called parent, a child, or both. While each node can have multiple children, it can only have one parent. The node without a parent is the so-called root node. The internal nodes, having one parent and at least one child, are called control flow nodes, and the leaf nodes, having no children, are called execution nodes. There exist four categories of control flow nodes, called Sequence (→), Fallback or Selector (?), Parallel (⇉), and Decorator (◇). A control flow node controls the execution order of the subtasks of which it is composed, each category with its own ticking policy. For the Sequence node, for example, all children must return success so that the Sequence node in turn passes success on to its parent. As soon as a child node returns failure, the Sequence node also returns failure. Moreover, two categories of execution nodes exist, called Action and Condition. Action nodes execute longer tasks, such as robot movements, and Condition nodes comprise shorter requests.
The execution of the BT always starts from the root node. The root node then generates signals with a given frequency, which execute the root's child nodes. These signals are called ticks. Only once a node is ticked is it executed. The ticked child then returns either
Success, if the goal is achieved;
Running, if it is currently executing; or
Failure, if the goal has not been reached.
Depending on the policy of the control nodes, different paths can be taken through the BT depending on the situation [55]. Examples where BTs are already being used in human–robot collaboration in production research include the CoSTAR (Collaborative System for Task Automation and Recognition) framework [56], the SARAFun project [57], and the Intera5 platform [58].
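To make these ticking semantics concrete, the following minimal Python sketch shows how a Sequence and a Fallback node propagate child statuses. It is an illustrative reconstruction of the textbook semantics described above, not code from any of the frameworks cited here.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 0
    RUNNING = 1
    FAILURE = 2

class Sequence:
    """Ticks children in order; fails fast, succeeds only if all succeed."""
    def __init__(self, children):
        self.children = children

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status  # RUNNING or FAILURE propagates to the parent
        return Status.SUCCESS

class Fallback:
    """Ticks children in order; succeeds fast, fails only if all fail."""
    def __init__(self, children):
        self.children = children

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status  # RUNNING or SUCCESS propagates to the parent
        return Status.FAILURE
```

Any leaf object exposing a tick() method that returns one of the three statuses can serve as an Action or Condition node under these composites.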
Another field of application that is of interest for the presented work is the use of BTs to create human–machine teams, such as those used in military scenarios for human–robot team development [59].
2.3.2. Intelligent Tutoring Systems
For the research at hand, the system's policies should be optimized to provide the appropriate amount of help to complete the task and to assist in learning new tasks in the future. However, reviewing the latter aspect would exceed the scope of this paper, so the focus is on immediate support rather than on learning. The creation of personalized policies and action sequences opens up a whole research area in itself, the field of Intelligent Tutoring Systems (ITS). Instead of using the "one size fits all" approach of traditional learning, today it is known that every person and student is individual in their abilities and learning rhythms, and the assistance should be just as individual. To facilitate this personalized learning, ITS have been developed. Through a variety of methods and algorithms, they identify individualized support based on the user's needs [60]. The literature offers multiple solutions such as Reinforcement Learning, Bayesian Networks, and mathematical solutions to create optimal learning policies [47]. However, most ITS deal with purely logical problems and students without disabilities, and many are tested only in simulations. Few implementations have been tested with people with disabilities, and the consideration of all approaches would exceed the scope of this work [61]. For this reason—and because the CARAS, as previously described, is not to be used for learning in the first step but for the immediate execution of the task—a comparatively simple ITS model that has already been successfully tested with people with disabilities was used.
The model from [62] provides a mathematical solution based on the human's abilities and the invested costs of the single assistance options. The system is implemented using dynamic programming and was tested with children with autism and a humanoid robot. Assigning each aid a cost and a success probability and then minimizing the overall cost yields Equation (1), which describes an optimal assistance sequence. Here, O_T describes the objective function that should be minimized, T describes the horizon (the number of actions in the sequence), t describes the single trials, P_t represents the probability that a success occurs at trial t, C_t is the cost up to t, R is the reward, and C_T describes the expected overall cost.
The optimal amount of assistance consists of a balance between the success probabilities of the assistances and the associated costs. Successful assistance should come at as little cost as possible. In other words, as much assistance as necessary should be provided, but as little as possible. Equation (1) returns an action sequence that follows this principle, as visualized in Figure 2. More information on how this method was adapted to the CARAS can be found in the Materials section. Further information about the general algorithm can be found in [62].
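Since Equation (1) itself is not reproduced here, the following Python sketch illustrates the trade-off under the definitions above. The expected-cost expression is an assumption on our part (cost of each trial weighted by the probability that no earlier assistance succeeded, minus the reward on success); the model in [62] solves this kind of minimization with dynamic programming rather than brute force.

```python
from itertools import permutations

def expected_cost(seq, p, c, reward):
    """Expected overall cost C_T of an assistance sequence (sketch).

    seq    : ordered assistance indices (the candidate sequence)
    p[a]   : assumed success probability of assistance a
    c[a]   : assumed cost of executing assistance a
    reward : reward granted when the user succeeds
    """
    total, p_not_yet = 0.0, 1.0  # p_not_yet = P(no success before trial t)
    for a in seq:
        total += p_not_yet * c[a]           # assistance a is executed and paid for
        total -= p_not_yet * p[a] * reward  # success at this trial earns the reward
        p_not_yet *= 1.0 - p[a]             # the sequence continues only on failure
    return total

def optimal_sequence(p, c, reward, horizon):
    """Brute-force search over orderings (fine for a handful of assistances)."""
    candidates = permutations(range(len(p)), horizon)
    return min(candidates, key=lambda seq: expected_cost(seq, p, c, reward))
```

For example, with hypothetical values p = [0.1, 0.3, 0.6], c = [0.1, 0.4, 0.9], reward = 1.0, and horizon = 3, the search returns the ordering with the lowest expected overall cost, favoring cheap aids first when their success probability justifies the attempt.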
3. Materials and Methods
The Methods section starts with a general description of the developed CARAS system. Design decisions are presented, the used hardware is specified, and the software implementation is described. Afterwards, the system is evaluated in an in-field user study. The corresponding adaptations of the system to the specific study tasks and the overall study procedure are outlined as well.
3.1. Design
The CARAS is designed to individually support and teach pick-and-place tasks by reacting in a context-aware manner to the current task and its execution, but also by linking contextual information about the current person and his or her abilities. Based on the above literature, the following design decisions were made.
The overall system is developed following CAAS such as the one in [9]. Since this is a first prototype and the focus of this work is on human–robot interaction, the recommended Wizard of Oz method is implemented. In doing so, this work goes one step further and adds a supporting system for state and action recognition to underpin important considerations. This is the use of pure 3D data to grant privacy to the workers, using a relatively simple procedure without AI methods. In the future, this will allow workshop group leaders to read new tasks into the system themselves without having to train neural networks. The support system consists of a top-mounted depth camera that tracks the working steps and verifies correct task execution, as in [8,36]. Thus, and also due to ethical safety concerns, the system will not—at this early stage of development—act fully autonomously.
For presenting instructions, the literature suggests the use of pictorial prompts on digital displays [22], since in situ projection can lead to worse performances for people with severe disabilities [27]. Therefore, a fixed digital display is used for the presented research. Although the literature suggests the use of pick-by-light for order-picking tasks [34], in the CARAS, the corresponding prompts are also displayed on the digital monitor. This keeps the variety of cues low for the user. Additionally, the developed system reduces the workload by reducing the number of workpiece storage boxes. As a novelty, the AS provides motoric feedback and aid using an industrial, collaborative robot. This robot type is conceptualized for the industrial environment in which the CARAS will later assist and can thus provide optimal motor support. This means that, apart from the assistance investigated in this work, an industrial robot would in the future also be able to fully take over tasks that cannot be physically performed by the users [12]. The acceptance of an industrial robot collaborating with humans has already been successfully tested in a Sheltered Workshop [53]. Furthermore, its physical presence alone can lead to improved learning, regardless of the robot's shape [47]. The corresponding assistances were developed in cooperation with the heads of an SW. Here, the focus was placed on pointing gestures to align the provided assistance closely with that of the workshop's group leaders. In addition, several studies have shown that pointing gestures can improve learning, as they enable tutors to direct attention to the relevant features of the problem. Furthermore, gestures possibly allow learners to incorporate working memory resources that would otherwise not be used, such as visuospatial and kinesthetic working memory [11]. To isolate and evaluate the effect of the robot, other additional assistance, such as speech or the use of gloves, is not provided.
BTs are used to connect all individual components to the robot, as they allow for situational adaptivity and have proved promising in human–robot teams [59]. In the presented research, the BT can be executed either by the human, by the robot, or in a joint action. In addition, the BT allows the CARAS to be modularly expanded and adapted as desired. In the future, the assistances can be extended, and the ITS system can be replaced in a modular way without changing the underlying architecture of the CARAS. In this paper, the current assistance is chosen following the ITS algorithm in [62]. In the following, more details on the single design and implementation aspects are presented.
Figure 3 shows the system diagram of the developed CARAS, how the single parts work together, and how they differ from previous CAAS.
3.1.1. Control Structure
This work uses a KUKA LBR iiwa 7 R800 from KUKA AG (Augsburg, Germany) and a WSG50-110 gripper from Weiss Robotics GmbH & Co. KG (Ludwigsburg, Germany) with DHAS-GF-80-U-BU adaptive fingers from Festo AG & Co. KG (Esslingen am Neckar, Germany) attached. All components were programmed using the Robot Operating System (ROS) and BTs. The latter are based on the library from [63].
The BTs allow the robot to act adaptively and reactively to the human user. In total, the BT consists of four steps per pick-and-place task. The first step is to grasp the correct part out of the corresponding storage box. In the second step, the part must be positioned at the correct position on the work surface. Then, in the third step, the user's hands must leave the work area so that the camera, which is described in more detail in the next section, can check for correctness. This check represents the fourth work step. If an error is found during the final check, the second, third, and fourth steps must be repeated.
Switching between the work steps is enabled by the BT. Even within the work steps, the robot can react in real time to a change in the human’s behavior and actions. For example, if the robot is executing an assistance and the human starts an action, the robot aborts its movement and moves back to the home position.
Figure 4 shows the implemented BT, including the newly developed assistance node α that is described in the following. The BT shows three different exemplary assistances, but as many assistances as desired can be added. Only in steps 1 and 2 can the robot determine and execute an assistance. In step 3, the robot only waves to attract attention, since it is not allowed to enter the workspace while the human still has his or her hands there. Step 4 is a camera check, so no robot assistance is executed here.
3.1.2. Assistance Node
The assistance node α, which decides which assistance should be executed and when, is described for the first time in this paper. The assistance node is an extension of the conventional BT control nodes. Via a service call, different ITS models can be invoked, which return an action sequence representing the different robotic assistances to execute. The length of the action sequence, and thus the number of assistances to be executed, is defined via the user-defined horizon. The horizon can be as large as, larger than, or smaller than the number of children of the assistance node. The assistance node ticks its children in the sequence determined by the ITS model. The next child is ticked as soon as the previous one returns success. In the case that one child returns failure, the assistance node returns failure, too. As soon as all children have returned success, the assistance node also returns success. Here, a successfully executed assistance does not mean that the assistance led to a successful execution of the human's task; the latter is detected in another part of the BT via the cameras. Algorithm 1 shows the pseudocode of the assistance node. The assistance node has a time and space complexity of O(1).
Algorithm 1: Pseudocode of the Assistance Node α
1  Inputs: horizon, ITS-model, context information
2  ActionSequence ← call(ITS-model(horizon))
3  for i in ActionSequence do
4      childStatus ← tick(child(i))
5      if childStatus = running then
6          return running
7      else if childStatus = failure then
8          return failure
9  return success
Figure 5 shows the same behavior implemented with the conventional control nodes compared to the assistance node. By using the assistance node, the BT becomes much more compact, requiring only 3 nodes instead of 12. This eliminates the problem of coordination and data transmission between the conventionally needed additional nodes. The assistance node can be seen as the coordinator of the entire system. It combines contextual knowledge, such as action and state recognition and information about the participants. In addition, the node knows which assistances are available and what contextual information they carry, such as costs and success probabilities. The assistances can be extended arbitrarily, while the assistance node keeps track of them. The node sends the collected information to an ITS node, which generates an assistance sequence according to the current situation and sends it back to the assistance node. In this work, the mathematical ITS method from [62] is used. The action sequence contains the order in which assistances are executed by the assistance node. By decoupling the assistance node from the ITS node, the ITS node can be replaced and extended in a modular fashion, for example, with AI approaches.
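As a sketch of how Algorithm 1 maps onto code, the following self-contained Python class mimics the assistance node. The ITS call is modeled as a plain callable rather than the ROS service used in the actual system, and all names are illustrative.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 0
    RUNNING = 1
    FAILURE = 2

class AssistanceNode:
    """BT control node that ticks its children in an order supplied by an ITS.

    its_model : callable returning an action sequence for a given horizon
                (stand-in for the ROS service call in the paper's system)
    children  : one executable assistance behavior per index
    """
    def __init__(self, its_model, children, horizon):
        self.its_model = its_model
        self.children = children
        self.horizon = horizon

    def tick(self):
        sequence = self.its_model(self.horizon)  # e.g., (0, 2, 4, 5)
        for i in sequence:
            status = self.children[i].tick()
            if status is Status.RUNNING:
                return Status.RUNNING
            if status is Status.FAILURE:
                return Status.FAILURE
            # on SUCCESS, fall through and tick the next assistance
        return Status.SUCCESS
```

Because the ITS is injected as a callable, swapping the mathematical model for, say, a learned policy only changes the argument passed to the constructor, mirroring the modularity described above.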
3.1.3. Image Processing
For the image processing, an Azure Kinect depth camera from the Microsoft Corporation (Redmond, WA, USA) is used. The camera is top-mounted to track the working steps and to perform quality control. For data privacy reasons, the control algorithms rely solely on the point cloud. The point cloud is converted into an Octomap to analyze the current workplace scene. An Octomap provides a volumetric representation of the environment. Using probabilistic occupancy estimation and octrees, an Octomap represents both occupied and free space as voxels. An octree is a hierarchical data structure in which each node represents the space contained in a cubic volume (a voxel). This volume is recursively subdivided into eight sub-volumes until a given minimum voxel size is reached, which determines the resolution of the octree. In the Octomap, the volumes of the 3D environment are represented by the cubic nodes of the octree. Measured endpoints in the point cloud are assumed to correspond to objects, and no objects are assumed to lie between the sensor origin and the endpoints. The occupancy probability of each cubic node is then updated with the measured endpoints of the point clouds [64]. In the CARAS, the camera system has to detect two different things:
To detect the user's hand, predefined cuboid regions above the three storage boxes and the working area are continuously checked for occupied voxels (a minimal sketch of this check follows the list). Here, an Octomap resolution of 3 cm was used to reduce the computational load and to speed up the algorithm. Due to the noise of the camera, a total of three clustered voxels had to be detected to recognize a hand above the observed area. A grasp from the storage box was marked as correct as soon as the hand was above the correct box for about 1 s. The camera cannot identify whether the grasp was actually performed, since the hand opening points downward; false positive detections are therefore possible.
To check the correctness of the assembly, the current working area is compared voxel-wise with an Octomap of the goal state of the working area. The latter is stored in a database for every work step. A comparison is initiated once an object (the user's hand) is detected within the predefined cuboid region above the working area. Only once the object has left the working area again does the comparison take place, so that the area of interest is not covered by the human. Detected excess voxels and missing voxels point toward a faulty assembly. Here, an accuracy of 1 mm can be achieved. The corresponding coordinates of the mistake are communicated to the robot via ROS. However, since the system could make false decisions, the determined location information is not used during the in-field study. Instead, the goal location of the workpiece is passed from the assembly instructions to the BT. All other information, such as a successful pick or a successful assembly, is entered into the behavior tree based on the image processing to initiate corresponding actions.
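A minimal sketch of the hand-detection check referenced in the first item, assuming the point cloud arrives as an N × 3 NumPy array. The 1 s dwell-time logic and the Octomap machinery of the real system are omitted, and the thresholds simply mirror the values stated above.

```python
import numpy as np

# Illustrative values from the description: 3 cm voxels, >= 3 occupied voxels
VOXEL_SIZE = 0.03   # m, Octomap resolution used for hand detection
MIN_VOXELS = 3      # cluster size required to count as a hand (camera noise)

def occupied_voxels_in_box(points, box_min, box_max):
    """Count distinct occupied voxels of a point cloud inside a cuboid.

    points  : (N, 3) array of 3D points from the depth camera
    box_min : (3,) lower corner of the monitored cuboid above a storage box
    box_max : (3,) upper corner
    """
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    voxels = np.floor(points[inside] / VOXEL_SIZE).astype(int)
    return len(np.unique(voxels, axis=0))

def hand_detected(points, box_min, box_max):
    return occupied_voxels_in_box(points, box_min, box_max) >= MIN_VOXELS
```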
3.1.4. Assistance
The developed system is supposed to help people with disabilities perform new pick-and-place tasks immediately. However, even though long-term learning effects are not investigated in this work, the system was designed in such a way that, in the long run, it can not only help with the immediate execution of the task but also lead to mental learning progress. In other words, the CARAS is not intended to increase pure productivity but rather to enable people with disabilities to participate. Therefore, there is no assistance in which the robot solves the task independently. Instead, assistance was only implemented to guide the user to place or correct the workpiece themselves. In total, the robot performs six different types of assistance, each of which is explicit, and therefore helpful, to a different degree. However, as previously described, the CARAS is designed in such a way that the assistances could be expanded at will in the future.
In the implementation of assistances chosen for this work, the least explicit assistance consists of a simple wave of the robot to draw attention to itself. With the second assistance, the robot, with the gripper closed, points to the instruction screen from a distance. In the third assistance, depending on which step the user is performing, the robot points, again from a distance, either to the box from which the part has to be grasped or to the location on the work area where the part should be placed. With the fourth assistance, the robot points exactly to the goal position of the workpiece on the screen. Depending on the work step, in assistance 5 the robot points to the correct box close-up or to the exact position of the workpiece on the work area from a height of 10 cm. In the last and most helpful assistance, the robot either hands over the correct workpiece or, with the gripper aligned according to the target orientation, points exactly to the position in the work area.
Table 1 lists the implemented assistances.
Which assistance is offered, and when, is decided using the ITS. This enables the robot to provide as much aid as necessary but as little as possible. Even though the ITS node is provided with all context information collected by the assistance node, the mathematical model used only requires information on the costs and success probabilities of the individual assistances [62]. While the costs depend only on the particular assistance, the probabilities of success may change depending on the participant and the task. However, to keep the complexity and dimensionality of the following analysis low, no differences in success probabilities were used. The associated costs and success probabilities were derived as follows. The success probabilities, i.e., the probability that an execution of the respective assistance would result in the participant being able to perform the task correctly, were initially assigned by the leading researcher. In a test run with five different non-impaired participants, the assignment was refined and finally discussed and agreed upon with the group leaders of the Sheltered Workshop, who have special educational training. The costs—that is, how much the execution of an assistance costs the system—were determined experimentally and resulted from the execution time of the assistances and from how far the robot reached into the human's workspace. Both measures were first normalized to a range of values between 0 and 1, and then the mean of both was calculated per assistance. Here, the execution times were weighted twice, since a long execution time delays the assembly process, reducing human and workshop productivity.
Figure 6 shows the costs per assistance.
Table 1 shows the assigned success probabilities of the different assistances.
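Read literally, "the mean of both with execution time weighted twice" corresponds to a weighted average of the two normalized measures. The following sketch shows this interpretation; the exact weighting used in the paper may differ, and the input values are hypothetical.

```python
import numpy as np

def assistance_costs(exec_times, reach_depths):
    """Cost per assistance from execution time and workspace intrusion.

    exec_times   : seconds each assistance takes to execute
    reach_depths : how far each assistance reaches into the user's workspace
    """
    def normalize(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span else np.zeros_like(x)

    t, r = normalize(exec_times), normalize(reach_depths)
    return (2 * t + r) / 3  # execution time counted twice, then averaged
```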
Using the corresponding values in Equation (1), the following action sequence is obtained: 1, 3, 5, 6. Assistances 2 and 4 are not chosen because their trade-off of success probability to cost is not considered worthwhile by the algorithm. Before an assistance is offered by the robot, the human has 30 s to solve the task on their own. This time period was requested by the leaders of the SW. A total of four assistances are offered in succession. If none of the assistances leads to success, the CARAS executes the manual assistance, in which it calls the supervisor to help the human with the execution of the task. This measure was introduced because some people who work in the SW are not able to ask for help on their own, which in the daily routine often leads to hours of absence from work. All work steps are constantly displayed via pictorial prompts on the digital display.
3.2. Study
An in-field user study was conducted to investigate the developed CARAS under real conditions. The CARAS has been designed in a modular way and can easily be extended in the future with other assistances, such as voice prompts or the autonomous execution of the task by the robot, to support all kinds of disabilities, both mental and physical. Moreover, the ITS that chooses the assistance can be exchanged in a modular way, for example with reinforcement learning approaches, allowing the system to adapt even more individually to the users and possibly act as a teacher. By using behavior trees, only a single node of the CARAS needs to be changed at a time to customize or extend the system as desired. However, exploring all these possibilities in a single user study with people with disabilities would not be feasible. Moreover, the scope of the study was restricted by the ethics committee. A purely motor interaction, for example, in which the robot solves all tasks, could give the human the feeling of being replaced and at the same time increase the risk of injury through quasi-static contacts (compare Appendix A.1). These and other restrictions, such as the use of low-risk robot paths, had to be taken into account in the study design, which reduced the possible scope. Another limiting factor is that more complex assistances and robot behaviors rely even more on robust camera recognition, which was not the focus of this work.
For these reasons, this user study focuses on evaluating whether an Assistive System can support the immediate execution of new tasks using gesture prompts. Investigating this first step is very important because, contrary to expectations, conventional CAAS have led to a deterioration, not an improvement, in the job performance of people with severe impairments [27]. A reduced study scope means that, in the case of negative study results, it is easier to identify the cause of the worsened performance. This is why the assistances are selected according to a non-learning ITS, as described above, and the study goal is to enable the immediate solution of new tasks, not to increase productivity or achieve long-term learning in the participants. Thus, the focus is on mental assistance rather than purely motor assistance. For this reason, the assistances are limited to gesture prompts, since the physical presence of the robot is a promising approach that does not yet exist in this context [47]. The pick-and-place task under consideration was the placing of Lego bricks following step-by-step instructions. Lego bricks are often used in the literature for abstract pick-and-place tasks, since most of the task-completion time is needed for locating the next workpiece and locating the assembly position. Both present task-independent measurements of how well instructions and aids perform and can therefore be transferred to other tasks [65]. For other application areas, however, other assembly tasks can be selected, with the CARAS adapted accordingly. The study was carried out in an SW in Oldenburg, Germany, and is described in more detail in the subsequent sections.
3.2.1. Apparatus and Setup
The manipulator was attached to a table 79 cm high, which equates to the height of the working desks of the SW. Moreover, the robot was standing at a displacement of about 90 cm opposite the participant. That way, the robot could easily reach all important locations while at the same time rarely reaching far into the human's working area. This decision was made for safety reasons, although a close attachment of the manipulator is often perceived as positive by people with disabilities [53]. The robot's end effector moved with a velocity of approximately 26 cm/s. For safety reasons, the paths of the assistances were predefined and not planned in real time. For this purpose, the assistances were executed in the run-up to the study and repeatedly planned using the MoveIt
[66] software. Reasonable paths were saved for the study so that the CARAS could play them back. Reasonable paths were considered to be any paths where no joint rotates more than 180 degrees, the manipulator is never extended further into the workspace than the end position requires, and the supposed shortest travel path is taken. Only for assistances 5 and 6, where the manipulator indicates the exact position of the Lego brick, did the path have to be planned individually. Here, the paths were planned in real time during the study, also with MoveIt. However, from a distance of 30 cm to the participant, the end effector of the robot was restricted to move only with a straight side facing forward to avoid contact with the gripper edges. On the opposite side of the desk, a construction was installed containing a 28-inch (71.12 cm) digital screen for the prompts. The display showed a top view of the model to be laid as well as a side view. Above these images, the four work steps were displayed so that the worker could get feedback about what the CARAS currently expected them to do. Future steps were grayed out, the current step was circled in black, and steps already completed were marked in green. An example is given in Figure 7.
The top-mounted depth camera and a sideways-mounted RGB camera recorded the participants' reactions for the subsequent evaluation. The depth camera recorded depth and RGB images at 5 fps. Since high precision is needed to make the robot move to the correct positions, and the focus of this work is not on image processing, the study was conducted as a computer-assisted Wizard of Oz experiment. For this, a fallback node was integrated into the BT, which allowed the researcher (the Wizard) to overrule the camera decision in the event of a faulty detection. A detailed evaluation of how often the camera system decides independently and correctly takes place later in this paper.
In front of the participants, a Lego plate of 15 cm × 15 cm × 1 cm was permanently installed, which represented the working area. By using fixed locations, the positions in relation to the robot were known and did not rely on correct camera detection. In the upper left corner of the work desk, three boxes of 15 cm × 20 cm × 15 cm were permanently installed, containing the Lego bricks. Each box had a Lego brick engraved on the front in the color of the bricks contained in that particular box. By assigning only one brick color per box, the mental load of the participants was reduced. The robot had its own storage attached in front of it, containing a blue, a yellow, and a black Lego brick that could easily be grasped by the robot (see Figure 8 for the complete setup).
3.2.2. Participants
The acquisition of the participants was performed by the heads of the SW, since they have the best knowledge about their employees. To generate the largest possible number of participants, the only requirement was that participants had a severe disability, whereby the type of disability was irrelevant and could be motor, neurological, or combined. Whether or not they had a severe disability was assessed by the group leaders and heads of the SW. In addition, participants had to work in the SW so that they were already familiar with the working environment and the people. This ensured that the study would take place under familiar and real conditions. Finally, both the legal guardians and the participants had to agree to participate in the study. There was no upper limit for participants; everyone who wanted to and met the mentioned criteria was allowed to participate. Altogether, ten participants were found. Due to the challenging recruitment in this target group, the number is in line with comparable studies, some of which look at 3 [52,54], 10 [62], or 13 [29] participants or even conduct the studies with people without disabilities [13].
Five males and five females with ages ranging from 21 to 59 years (mean = 36.5, standard deviation = 13.69) took part. Since "people with disability" describes a very heterogeneous group, different kinds of disabilities were represented, such as mental and physical disabilities. Although a detailed medical diagnosis of disability severity cannot be provided for the participants, each was assigned a Performance Index (PI) for their motoric abilities and mental abilities by the group leader. In addition, a rough expectation for task completion was provided (see Table 2). The PI measures, in percent, to which extent a worker with a disability can perform a task with respect to time and errors, compared to workers without disabilities [8]. The extent to which this assessment corresponds to the measured values is discussed in more detail in the course of this paper. All participants were asked for their consent, while some additionally required the permission of their legal guardian. One of the participants had been in contact with a collaborative robot before (compare [53]), but no one was familiar with the investigated system.
3.2.3. Safety and Ethical Aspects
The used collaborative robot, the KUKA LBR iiwa 7 R800, fulfills the required safety standards, such as EN ISO 10218-1:2011 for industrial robots [67]. Additionally, preceding the study, a detailed risk analysis was performed (see Appendix A.1). The KUKA is equipped with torque sensors at each joint and stops when predefined torque values are exceeded. This means that, for the same contact, larger torques are measured at large displacements of the arm than at small displacements. Therefore, the robot was placed in such a way that it had maximum leverage when it was closest to the participant, so that high torque values are registered when near the participant. Overall, the robot could reach a minimum distance of 10 cm, measured from the edge of the worker's side of the desk. Here, the robot had a deflection of 80 cm, which, with a set maximum torque of 25 Nm and an estimated collision area of 4 cm², results in a contact pressure of 7.81 N/cm². According to DIN ISO/TS 15066, which specifies the safety requirements for human–robot interactions, a maximal force of 65 N and a maximal pressure of 110 N/cm² would be allowed even in the most sensitive area, the human face. Since the implemented values are far below these thresholds, even in the occurrence of a collision between robot and human, the collaboration can be considered safe.
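The quoted pressure follows directly from the stated torque limit, lever arm, and contact area, assuming force = torque/lever and pressure = force/area:

```latex
F = \frac{\tau_{\max}}{\ell} = \frac{25\,\mathrm{Nm}}{0.80\,\mathrm{m}} = 31.25\,\mathrm{N},
\qquad
p = \frac{F}{A} = \frac{31.25\,\mathrm{N}}{4\,\mathrm{cm}^2} \approx 7.81\,\mathrm{N/cm^2}
```

Both values lie well below the ISO/TS 15066 limits of 65 N and 110 N/cm² quoted above.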
In addition to this built-in safety shutdown, other safety measures were taken to avoid static contacts and collisions. The control system is based on reactive behavior trees, which immediately abort the robot's movements as soon as the camera detects an unexpected object (the human) in the path. Furthermore, the paths were predefined to make the robot movements more predictable and to allow the researcher to react to unexpected movements of the user. For this purpose, the researcher held an emergency stop button in her hand at all times to switch off the robot in case of danger. Moreover, while neither the robot arm nor its fingers contain any sharp edges, the edges of the gripper were additionally covered with a 3D-printed cover to prevent any kind of injury on contact. To avoid contact with the eyes of the participants, the robot was allowed to move freely only down to a distance of 30 cm. At shorter distances, it always had to point the gripper forward with a straight edge so that no corners of the gripper could penetrate the eye socket. Also, the robot never pointed all the way down to the table or the Lego brick but stopped at a distance of about 2 cm to avoid static contacts with the participant's hands.
This study was carried out in accordance with the recommendations of the regulations governing the principles for safeguarding good academic practice at the Carl von Ossietzky University Oldenburg, Germany, of the Commission for Research Impact Assessment and Ethics. The protocol was approved by the Commission for Research Impact Assessment and Ethics (Drs.EK/2019/038-1). All participants or, if required, their legal guardian gave written informed consent in accordance with the Declaration of Helsinki.
3.2.4. Experiment
In the run-up to the study, the participants and, where applicable, their legal guardians were provided with the participant information and the declaration of consent. The study then took place over two successive days for every participant. On the first day, the participant information was handed out again, and the informed consent of the participants was once again obtained. To prevent misunderstandings, the researcher was supported in doing so by the group leaders of the SW. Moreover, the researcher explained the study situation and the tasks at hand to the participants. During the first day, participants performed their usual work task while sitting at the study desk for 20 min to familiarize themselves with the study situation. During the first 10 min, the robot moved constantly, and in the following 10 min, the robot included short pauses of 3–5 s between movements. The latter was implemented to accustom the participants to the starting and stopping movements of the robot. This procedure served as a familiarization phase, as proposed in [53].
Over the course of the main experiment, the participants were asked to assemble different Lego constructions, each containing eight bricks. All constructions consisted of three kinds of bricks: blue 2 × 4 bricks, yellow 2 × 4 bricks, and black 2 × 4 bricks. Every kind of brick was located in a specific storage box. The task of assembling the Lego models was then performed per participant in an on–off design. In the first condition, the Assistive System supported the participant. Here, the instructions were displayed step-by-step on the screen. The necessary tracking of the work steps was performed semi-automatically via the depth-camera system and the researcher, who could correct the system's decisions. In addition, the robot offered different assistances as needed. In the second condition, the Assistive System was turned off, and no external help was offered. Only the current step was displayed via pictorial prompts on a tablet. The participants themselves could decide when to switch to the next step by swiping on the display. Participants who were unable to use the tablet could say "next", and the researcher would take over the swiping. This condition served as the baseline. During the complete execution of the study, the researcher responded to follow-up questions from the participants and re-explained things as needed. Moreover, the researcher helped to detach bricks when requested by the participants. This was offered both during the rounds with the CARAS and during the rounds without it. However, no questions were answered and no hints were given about which brick to grab or about the correct position.
In both conditions, the participant assembled the same Lego model until the assembly was successfully completed two times in succession without mistakes and without any external help from the manipulator. Then, the next Lego model was considered. Altogether, the participants had 40 min. Each condition was executed for 10 min, and the conditions alternated. Thus, each condition was performed twice in total. As soon as the condition changed, the Lego model was built from the beginning; because the first bricks were the most difficult, equal starting conditions prevailed between the conditions. All participants performed the conditions in the same order. In the first and third rounds, the CARAS offered assistance, and in the second and fourth rounds, the tablet was the only support. This design decision was made because participants expressed a desire to work with the robot first, partly out of joyful excitement. With the special group of people under consideration, it is often unavoidable to include the participants' wishes in the study planning so as not to jeopardize their participation.
Figure 9 depicts the timeline of the experiment.
3.2.5. Evaluation
During the experimental part in which the CARAS assisted the participants, the elapsed times, the number and type of assistances, as well as the number of bricks placed were automatically logged by the system. In the part of the experiment in which the CARAS was switched off, the researcher logged the elapsed times and the number of bricks manually. Throughout the study, RGB video recordings were made using two cameras. One camera recorded the participant frontally, and one camera was mounted top-down. These video data were used during the subsequent study evaluation to verify the logged data.
Figure 10 shows the successful interpretation of an assistance, as seen in the top-down mounted camera.
The data generated by Participant 4 will not be evaluated together with the data of the other participants but will be considered in a separate part of the results. Due to severe motoric disabilities, Participant 4 could not take part in the described study procedure. At the same time, Participant 4 was the only participant without mental disabilities and therefore did not need mental hints but only motoric support. For these reasons, the study procedure was changed in such a way that the robot placed the brick in the correct position, and Participant 4 pressed down on top of the brick to fixate it to the work area. The participant did not conduct the second part of the study, in which the CARAS was turned off, since the participant was not able to carry out the needed movements.
A preliminary Shapiro–Wilk test revealed that the data are not normally distributed. The following data analysis is adapted accordingly.
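As a sketch, this analysis pipeline can be reproduced with SciPy; the paper does not state which statistics software was used, and the per-participant counts below are placeholders, not the study data.

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant counts of correctly placed bricks
with_caras    = np.array([4, 3, 7, 2, 5, 8, 3, 4, 6, 3])
without_caras = np.array([0, 0, 1, 0, 4, 2, 0, 0, 1, 0])

# Shapiro-Wilk: a small p-value argues against normality,
# motivating the non-parametric Wilcoxon Signed-rank test below.
print(stats.shapiro(with_caras - without_caras))

# Paired, non-parametric comparison of the two conditions
print(stats.wilcoxon(with_caras, without_caras))
```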
4. Results
No significant differences were found between the respective rounds with the CARAS and those without. In both cases, the number of bricks that were placed correctly until the first error occurred was summed. A Wilcoxon Signed-rank test considering the number of correctly placed bricks yields W = 9, Z = −0.97, p > 0.05 for the two rounds with the CARAS, and W = 4, Z = −0.79, p > 0.05 for the two rounds without the CARAS. Here, W equals the sum of positive or negative ranks, Z represents the Z-score, and p is the p-value. A significance level of 0.05 was chosen, thus resulting in a 5% probability of rejecting the null hypothesis when it is true. The medians of the two rounds with the CARAS are 4.00 bricks (interquartile range (IQR) = 3.00 bricks–7.00 bricks) and 3.00 bricks (IQR = 3.00 bricks–8.00 bricks), respectively, and 0.00 bricks (IQR = 0.00 bricks–1.00 bricks) and 0.00 bricks (IQR = 0.00 bricks–1.00 bricks) without the CARAS.
Since no differences between rounds were found, the subsequent analysis is based on the mean value for every participant over both rounds with and without the CARAS, respectively, to obtain a more accurate result per participant. With the help of the CARAS, a median of 3.00 bricks (IQR = 2.50–7.50) were placed correctly per round, and without the CARAS, a median of 0.00 bricks (IQR = 0.00–4.00) were achieved. A Wilcoxon signed-rank test shows a significant effect of group (W = 44, Z = 2.56, p < 0.05, r = 0.85). Here, r equals the effect size and therefore indicates a large effect.
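For transparency, the following minimal sketch shows how such an analysis could be computed. SciPy is assumed (the paper does not name its statistics software), and the data are hypothetical placeholders rather than the study's raw values.

```python
# Minimal sketch of the reported non-parametric analysis (SciPy assumed;
# the brick counts below are hypothetical, not the study's raw data).
import numpy as np
from scipy.stats import shapiro, wilcoxon

with_caras    = np.array([3.0, 4.0, 8.0, 2.0, 7.0, 3.0, 9.0, 5.0, 6.0])  # bricks per participant
without_caras = np.array([0.0, 0.0, 4.0, 0.0, 1.0, 0.0, 5.0, 0.0, 2.0])
diff = with_caras - without_caras

# Normality check that motivates the non-parametric test
print(shapiro(diff))

# Wilcoxon signed-rank test; note that SciPy reports the *smaller* rank sum,
# whereas the paper reports the sum of positive or negative ranks
res = wilcoxon(with_caras, without_caras)
n = len(diff)  # number of pairs (all differences are non-zero here)

# Normal approximation: Z = (W - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24)
z = (res.statistic - n * (n + 1) / 4) / np.sqrt(n * (n + 1) * (2 * n + 1) / 24)

# Effect size r = |Z| / sqrt(number of pairs); r >= 0.5 counts as a large effect
r = abs(z) / np.sqrt(n)
print(f"W={res.statistic}, Z={z:.2f}, p={res.pvalue:.4f}, r={r:.2f}")
```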
Figure 11 shows these results as well as which assistances led to a successful execution per participant. The same applies if manual assistance is not included. Here, a median of 2.50 bricks (IQR = 1.00–5.50) were placed correctly per round, and a Wilcoxon signed-rank test again shows a significant effect of group (W = 26, Z = 1.99, p < 0.05, r = 0.77).
Only three participants were able to place the first brick correctly when the CARAS did not assist. Therefore, a pairwise comparison of the task-completion time (TCT) is not possible, and no statistical conclusions can be made. However, if all correct bricks are considered, disregarding mistakes in between these bricks, a median TCT of 184 s (IQR = 81 s–206 s) with the CARAS and of 32 s (IQR = 29 s–37 s) without the CARAS is obtained. These values refer to a median of 4.00 bricks (IQR = 2.50–6.00 bricks) placed correctly, accompanied by a median of 11.50 mistakes (IQR = 1.50–14.00). No mistakes were made with the CARAS. This can also be seen in Figure 11.
As suggested by the ITS model, the assistances were used less frequently in ascending order, with assistance 1 being used 48 times and assistance 6 only 27 times. Assistances 2 and 4, however, were never selected. The measured probabilities of success suggest that the assumed degree of assistance corresponds to reality, with assistance 1 having a success probability of 6.25%, assistance 3 of 4.76%, assistance 5 of 10.25%, and assistance 6 of 14.81%. Moreover, on average, the CARAS called for help for 34% of the bricks, which occurred when four assistances of the system had failed. This manual assistance had a success probability of 100%.
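To make these percentages concrete, the empirical success probability of an assistance is simply the number of successful corrections divided by the number of times it was triggered. In the sketch below, only the 48 and 27 uses of assistances 1 and 6 are taken from the text; the remaining counts are hypothetical values chosen to approximately reproduce the reported percentages.

```python
# Empirical success probability = successful corrections / times triggered.
# Uses of assistances 3 and 5 and all success counts are hypothetical,
# chosen only to approximately reproduce the reported percentages.
uses      = {1: 48, 3: 21, 5: 39, 6: 27}   # 48 and 27 are reported; 21 and 39 assumed
successes = {1: 3,  3: 1,  5: 4,  6: 4}

for a in sorted(uses):
    print(f"assistance {a}: {successes[a] / uses[a]:.2%}")
# -> 6.25%, 4.76%, ~10.26%, 14.81%
```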
No correlation was found between the degree of disability (averaged value from columns 4 and 5 of Table 2) and the number of correctly placed bricks when performing a Spearman's rank correlation test.
Figure 12 shows this relationship.
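A check of this kind could be computed as follows; SciPy is assumed, and the values are hypothetical placeholders, not the study's data.

```python
# Sketch of the rank correlation check (SciPy assumed; the degree-of-
# disability values and brick counts below are illustrative placeholders).
from scipy.stats import spearmanr

degree_of_disability = [50, 80, 100, 70, 60, 90, 50, 80, 100]  # averaged per participant
correct_bricks       = [1,  5,  3,   7,  2,  4,  6,  3,  5]    # mean over both CARAS rounds

rho, p = spearmanr(degree_of_disability, correct_bricks)
print(f"rho={rho:.2f}, p={p:.3f}")  # no significant correlation expected
```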
5. Discussion
Three main aspects are of particular interest and are therefore explained in more detail below. Firstly, the performance of the participants with and without the CARAS is investigated to draw conclusions about its helpfulness. Secondly, the system is assessed in more detail, including which features are useful and which system parts need to be improved. Finally, it is discussed how the CARAS was perceived and whether the participants could, in the future, see it not only as an Assistive System but also as a tutor. Since the developed CARAS is the first of its kind, and no comparable system exists or has been tested under real conditions, the following sections each compare sub-areas of the system with related work. The overall evaluation of the system results from the combination of all these areas.
5.1. Participant Performance
The number of correctly placed bricks was almost three times as high when working with the CARAS as without the system. This is a particularly impressive result when compared to the results of related studies, such as [8] or [27]. While both studies evaluated an Assistive System with step-by-step instructions for people with disabilities in an SW, the assistance methods were limited to AR assistance. The robotic manipulator used in this work is new as an assistance method in a CAAS. With the system in [27], for example, participants' work performances, as measured by the number of errors, were worse than in the control group, which had only pictorial instructions on the monitor. In particular, people with severe disabilities performed significantly worse. In contrast, in the present work, the assistance provided via the robot instead of via AR significantly improved participants' performances compared to the baseline condition. The other study used a repeated measures design to solve Lego tasks and is therefore even more similar to the study conducted here. In addition, a computer-assisted Wizard of Oz approach determined when the next instruction step appeared in both experimental parts. In both parts, Lego bricks were placed incorrectly [8], which was not the case in the study presented here using the CARAS. The CARAS thus seems to outperform both studies by offering robotic assistance instead of in situ projection. This suggests that the extension of a CAAS with a robot is crucial for success. Which contributions to success come from the physical appearance of the robot and which from the various pointing gestures will be discussed in more detail in the following two subsections. In addition, it should be mentioned that it is very difficult to compare different studies with people with disabilities in a meaningful way, since the preliminary phase and the introduction of the participants can already have a decisive influence on the results [53].
5.1.1. Avoiding Mistakes
Especially regarding the goal of economic efficiency for the SW, it is an important result that the CARAS increased the productivity of the participants. It is not only an advantage that more bricks were placed correctly but also that misplaced bricks were prevented. As described above, avoiding errors is an advantage that has not been achieved with comparable Assistive Systems [8]. Moreover, without the CARAS, significantly more bricks were placed incorrectly than correctly. During everyday work in the Sheltered Workshop, new tasks learned by a worker are often considered mastered after a few successful tries. However, as soon as the employee continues to work without the supervision of the group leader, the task is subsequently performed incorrectly, and correcting the mistakes can take several hours. Error prevention is thus a major advantage of the CARAS, in addition to the already higher productivity. Accordingly, the two lower images in Figure 11 must be interpreted together. Although the task completion times without the CARAS are significantly lower than with the CARAS, many errors were made, and the correction times of these errors would have to be added to the determined TCTs. Within the scope of the conducted study, this could not be included, as the participants were not able to correct errors on their own and the group leaders did not have time to assist continuously.
The observed time saved without the CARAS, excluding error correction, is expected, since no time is needed for the execution of the assistances or the control of the camera system. In addition, the next brick can already be placed while the previous one is still incorrectly placed, which the CARAS prevents. However, as in other studies [29], the main purpose of the CARAS is to enable participation rather than to increase pure productivity, so the time aspect is secondary. It should be mentioned that participant 1 twice reached for the next brick while the CARAS was still assisting with the previous, not yet correctly placed one. Here, the researcher explained the procedure again but did not physically intervene. Participant 1 had the greatest difficulties performing and understanding the task, as Figure 13 shows. For employees with severe mental disabilities, additional voice assistance would presumably be useful to prevent errors more effectively. For none of the other participants was such an explanation necessary.
5.1.2. Manual Assistance
The number of correct bricks almost doubled with the CARAS. Here, about 34% of the bricks were placed by the researcher instead of the participant. However, even without counting bricks placed under manual assistance, participants were able to place significantly more bricks with the system than in the baseline condition. Moreover, this call for help is another major advantage of the CARAS. Instead of allowing too much time to pass or causing frustration to the employees, the CARAS ensures that task execution continues. Many people in the SW cannot call for help themselves or do not do so for a variety of psychological reasons. This leads to idle times of up to an hour until the group leader notices that help is needed. An Assistive System that calls the group leader for help therefore offers an important tool to reduce downtimes and mistakes. In the trials without the CARAS, the participants made many errors and moved on to the next brick, not seeing their errors and thus not asking for help. Consequently, in the baseline trials, no bricks were placed by the researcher.
5.1.3. Further Considerations
Between the repetitions of the two study conditions, no significant differences were found; neither fatigue nor learning progress can be identified in the data. According to the group leaders, however, most participants have difficulties concentrating for 40 min. It is therefore possible that both effects occurred but canceled each other out; this, however, cannot be inferred from the data collected.
There seemed to be a difference in performance depending on the position of the brick. The first brick usually required the most time and assistance. In general, bricks placed directly adjacent to or on top of each other could be placed more reliably. This is probably related to the fact that some of the participants could not read or count, making it more difficult to place a brick freely than to place it with reference to other bricks. This also explains why, in terms of the median, the same number of bricks was placed with and without the CARAS when bricks placed after intermediate errors are included: the difficult bricks, for which few reference bricks exist, are placed incorrectly, and subsequently some easier bricks are placed correctly. The task is therefore not solved better; rather, the difficult parts are omitted. A more differentiated consideration of the success probabilities depending on the brick position would be desirable to make the assistances even more situation-dependent.
5.2. System Performance
In the context of system performance, both the choice of assistance and the selected sequence using the ITS are discussed, as well as general technical details of the system, such as the object detection.
5.2.1. Choice of Assistances
Probabilities of success were defined in advance for the order selection of the different assistances, with assistance 1 having the lowest probability of success and assistance 6 the highest. This ranking matches the success probabilities measured for the CARAS during the study, with one exception: assistance 3 had a lower probability of success than assistance 1. These results, however, must be interpreted with caution, since the assistances were not selected in random order. For example, assistance 6 was selected only when all previous assistances had failed. It is therefore possible that assistance 6 would also have led to success in the cases where assistances 1, 3, or 5 were successful. Conversely, these three assistances were not helpful in the cases where assistance 6 was. Taking this correlation into account, it is likely that the probabilities of success of the higher assistances were underestimated. At the same time, the number of previously performed assistances could have had an influence on the success or failure of the next assistance and would therefore have the opposite effect on the success probabilities. However, this dependence between assistances is inherent to the ITS model used and is thus also found in the other studies on the model [62]. Furthermore, an exact statement about which assistance was most successful is not of high importance for the evaluation of this Assistive System; it only supports the assumption that the ITS was fed with correct success probabilities.
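For clarity, the escalation logic discussed here can be summarized in a few lines. This is an illustrative sketch with hypothetical function names, not the actual CARAS interface.

```python
# Illustrative sketch of the escalation logic: assistances are tried in
# ascending order of assumed strength; manual help is requested only after
# the ladder is exhausted. In the study, the ITS never selected levels 2
# and 4, so at most four assistances preceded a call for help.
from typing import Callable, Optional

ASSISTANCE_LADDER = [1, 2, 3, 4, 5, 6]

def assist_until_success(execute: Callable[[int], None],
                         succeeded: Callable[[], bool],
                         ladder=ASSISTANCE_LADDER) -> Optional[int]:
    """Return the assistance level that led to success, or None if the
    group leader must be called for manual assistance."""
    for level in ladder:
        execute(level)
        if succeeded():
            return level
    return None
```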
The modular CARAS is designed in such a way that the ITS can be replaced at will, e.g., with an AI that learns more precisely which assistance will be most successful with which participant. In this first step, the main focus was on whether a robotic presence in combination with pointing gestures can lead to an improved performance of the task at all, since this was not the case in comparable studies with other Assistive Systems [27]. Only in the case of a positive result, as achieved here, should the complexity of the assistance selection be increased. This therefore represents an important next step.
Regardless of which pointing gesture was the best, the results suggest that the new approach of adding robotic presence in conjunction with pointing gestures to a CAAS significantly improved participants' performances. As the literature has repeatedly shown, the use of pointing gestures recruits additional working memory resources, namely visuospatial and kinesthetic working memory, which are not used with purely pictorial instructions [11]. This supports the assumption that the existence of pointing gestures is more important than finding an optimal pointing gesture. In addition, the participants' results suggest that the robotic presence alone probably contributes even more to improved performance than the individual gestures, as will be discussed in more detail in the next section.
5.2.2. Technical Performance
The interpretation of assistance 6 was not yet as clear as it could be, since the CARAS is very susceptible to external influences. In assistance 6, the robot pointed to the exact goal position of the Lego brick. This position was hardcoded without relying on camera recognition; if the worktable, and thus the work area, moved by even 1 mm, the robot no longer pointed accurately to the correct position. In addition, the gripper pointed with closed fingers, centered on the correct brick position. When the brick was misplaced by only one stud, it was difficult for the participants to correctly interpret the robot's pointing, since at first glance it appeared to point to the position where the brick was currently located. This issue could be solved by opening the fingers of the gripper according to the brick's width, to indicate the start and end points of the target position more precisely.
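As a rough illustration of this proposed fix: the 8 mm stud pitch of standard Lego bricks is the only fixed quantity below; the function itself is hypothetical.

```python
# Proposed fix, sketched: open the gripper fingers to span the target
# brick's footprint. The 8 mm stud pitch of standard Lego bricks is a
# known constant; the function itself is hypothetical.
LEGO_STUD_PITCH_MM = 8.0

def gripper_opening_mm(studs_along_pointing_axis: int) -> float:
    return studs_along_pointing_axis * LEGO_STUD_PITCH_MM

print(gripper_opening_mm(4))  # a 2x4 brick spans 32.0 mm along its length
```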
The mentioned displacement of the table or the camera also affects the reliability of the camera system. In 43.81% ± 15.82% of the decisions, the Wizard of Oz approach was not needed because the CARAS decided independently and correctly. While image processing was not the focus of this work, this is still a feature worth noting, as comparable papers often conduct studies entirely on the Wizard of Oz principle [8,44]. Grasping the correct brick was detected most reliably, with an average of 8.89 ± 8.89 errors. The free working place was detected least reliably, with an average of 16 ± 7.63 errors. While the point cloud-based system reacted strongly to changes in lighting conditions, pushing against the experimental setup was presumably the greatest source of error. Curious employees bumped into the camera with their caps and helmets and altered its orientation, participants drove their wheelchairs against the working table, and motor-impaired participants sometimes hit the fixtures so hard during the study that workpieces were displaced and the working step could no longer be accurately recognized.
The effects of accidental movement of the worktable were particularly well illustrated by the experiment with participant 4, who was not included in the statistical evaluation due to major motoric impairments. Here, the experimental setup was changed in such a way that the robot independently placed the brick in the correct position, and participant 4 hit on top of the brick to secure it. This purely motor assistance is another advantage of the CARAS, beyond the ITS on which the focus of this paper lies. As previously described, the CARAS can be modularly extended and adapted very easily and quickly to support other disabilities, such as purely motor impairments. The successful collaboration of the robot with participant 4, which was not planned in advance of the study, also underpins the future versatility of the CARAS beyond the possibilities shown in this first user study. However, this kind of collaboration only succeeded for the first brick. The brick had to be placed with less than one millimeter of precision so that it would not tilt during the powerful hit. The first hit not only secured the brick to the working area but also shifted the table minimally. As a result, the next brick was, at first glance, still placed correctly by the robot, but the participant's subsequent hits always caused the brick to become wedged. A very accurate, consistent, and reliable calibration of the system is therefore essential for dependable use. An image sequence of a successful collaboration between the robot and participant 4 can be seen in Appendix A.1, Figure A1.
Since the decisions of the camera system were based on the Octomap, in which occupied space was added and removed relatively slowly (1–2 s delay), the researcher occasionally overrode the system to maintain the flow of the study and to prevent confusion among the participants. Since it is unclear whether the system would subsequently have made a correct or incorrect decision, all such interventions are counted as errors. In conclusion, recognition must be made not only more robust against external influences but also faster.
5.3. CARAS as Tutor
In this work, the focus was on directly assisting participants in performing new tasks. Nevertheless, in the long run it would be interesting to see whether the system could take over the role of a tutor. Here, the acceptance by the participants is considered, as well as the possibility of taking over further tasks, such as the assessment of the participants' abilities.
5.3.1. User Acceptance
Several studies suggest that the physical presence of the robotic manipulator could give the Assistive System a status comparable to that of a human tutor [47]. In the present work, no explicit questioning of the participants was performed to evaluate the CARAS or its acceptance. Many people with disabilities in the workshop cannot read, and some cannot speak or do not want to communicate. In addition, even adapted questionnaires based on easy language or pictures [30] do not always provide reliable information; in some cases, the results of questionnaires and observation differ greatly [68]. The workshop leaders also discouraged the use of questionnaires because, according to them, many of their employees make learned statements to elicit a desired response from the other person rather than reflecting how they feel. For these reasons, no questionnaires or surveys were used. Still, participant observations were made by the researcher; that is, only statements made spontaneously and voluntarily, rather than in answer to a specific question, were considered.
Various actions and statements of the participants indicate that the robot was given the status of a human tutor in the presented study. Participant 5, for example, bent over the working area to block the camera’s view, as well as the robot’s path, to have more time to complete the task on her own. While doing so, she made statements such as “Yes, wait, wait” or “Why are you grumbling again?” toward the robot. Participant 5 was the only one who was able to place more bricks without the robot than with the CARAS, possibly because she felt pressured to perform by the robot, as one would with a human teacher watching. At the same time, participant 5 referred to the robot as a friend. For bricks that participant 5 was able to lay without help, the robot was called “lazy”. Participant 10 waved back at the robot from time to time or, when the brick was still incorrect after the participant changed its position, he asked “What do you want now? Be satisfied for once”. These statements suggest that at least some participants recognized the robot as a teacher and even assigned human characteristics to it.
None of the participants were afraid to work with the robot, and all of them were looking forward to it in advance, as the participants told the researcher and the group leaders. This is consistent with a previous acceptance study in which a collaborative industrial robot handed objects to people with disabilities; there, too, no evidence of anxiety or negative feelings was found [53]. However, this contrasts with work based on theoretical considerations and experiments with people without disabilities, which assumed that industrial collaborative robots cause anxiety [12,13]. This underlines the importance of testing systems in the real environment and under real conditions, as was done in this paper.
Participants 3, 8, and 9 placed many bricks without assistances in the rounds with the CARAS, whereas few bricks were placed in the rounds without it. This could indicate that the CARAS provided a feeling of safety, as participants felt supervised and supported in their task. This observation matches the statement of participant 8 that she prefers to stay in the SW because she would lack that feeling of safety in the open labor market. All in all, this is one of the most important findings of this study. Even considering the total number of bricks laid in Figure 11, excluding the manual assistance, six of the nine participants were able to lay more bricks when the system was switched on. At the same time, it can be seen that the various pointing gestures each account for only a small proportion of the success, and most of the bricks could be laid without help. This strongly suggests that the physical presence of the robot alone accounts for most of the improved performance and that the exact pointing gestures the robot performs are not crucial. This finding is consistent with the related literature, where a positive effect of the mere presence of the robot was also found, with the additional suggestion that complex behaviors improve performance more than less complex ones [47]. A similar assumption can be made for the pointing gestures, as they likely enhanced the perception of the robot as a helper, and possibly as a tutor, by making it appear more vivid.
5.3.2. Skills Assessment
While the results of some participants corresponded well with the assessments of the group leaders described in Table 2 (participants 1, 4, 6, and 7), other participants were assessed too favorably (participants 2 and 5) or too unfavorably (participants 3, 8, 9, and 10) when compared to the number of correctly placed bricks. Participants 5 and 9 deviate particularly strongly, being rated too favorably and too unfavorably, respectively. Participant 5 was very excited, which probably influenced her performance; this is also noted in Table 2 and was therefore to be expected. Participant 9 often does not work and does puzzles instead, which probably caused her to be rated too unfavorably. Whether the robot provided a positive incentive for her to achieve good results in the study cannot be clearly determined, but it is a possible explanation.
The fact that no correlation was found between the number of correct bricks and the degree of disability underlines how difficult it is for the group leaders to correctly assess the employees' abilities. Accordingly, these participants might be assigned tasks that are too easy or too difficult in everyday work. Here, the CARAS could offer the possibility of testing the respective suitability of the employees while building a more detailed picture of each employee's individual abilities. The extent to which such an analysis can be guaranteed via the CARAS would have to be investigated in further studies.
6. Conclusions
The paper at hand describes the conceptualization, implementation, and evaluation of the first Context Aware Robotic Assistive System for a Sheltered Workshop. The evaluation took place in the form of an in-field user study. Based on a thorough review of the literature, design decisions were made to enable the developed Assistive System to track the current work steps and their correct execution in real time and to offer one of six different robotic pointing gestures as assistance in case of need. Using pointing gestures via an industrial robotic arm to provide mental support is a concept that has not been tested before in workshops for people with disabilities. In the system, a depth camera detects the current context; subsequently, an Intelligent Tutoring System determines one of the six possible assistance actions, which the manipulator then executes. This aspect is also a novelty compared to conventional Context Aware Assistive Systems. The communication of all parts takes place via the control structure of behavior trees. In this context, a new control node, the assistive node, was developed, which can select individual assistance for the different workers. The different sections of the behavior tree can thus be executed by the human, by the robot, or in a joint action. This decision is made by the BT and not by the human, allowing the robot not only to provide immediate assistance but also to act as a tutor in the future, which represents another innovation. In general, the system has been designed in such a way that both the assistance and the ITS model can be expanded or replaced in a modular fashion to support an even wider range of disabilities in the future.
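The following minimal sketch illustrates the idea behind such an assistive node. The actual system's behavior tree framework and node interfaces are not specified here, so all names and signatures are illustrative assumptions, not the implemented code.

```python
# Hedged, minimal sketch of an "assistive node" in a behavior tree:
# a control node that ticks assistance actions in the order proposed
# by an ITS-like policy until one succeeds (all names illustrative).
from enum import Enum

class Status(Enum):
    SUCCESS = 0
    FAILURE = 1
    RUNNING = 2

class AssistiveNode:
    """Ticks child actions in the order given by a per-worker policy,
    escalating until one succeeds or manual help must be requested."""
    def __init__(self, policy, children):
        self.policy = policy      # maps (worker, step) -> ordered assistance indices
        self.children = children  # one executable child per assistance

    def tick(self, worker, step) -> Status:
        for idx in self.policy(worker, step):
            if self.children[idx].tick(worker, step) == Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE     # the BT can then route to "call group leader"
```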
During the in situ study in the workshop for people with disabilities, in which 10 participants with a variety of disabilities took part, the developed CARAS was tested under real conditions. Over a period of two days, with the first day serving to familiarize the participants with the robot and the study situation, the same assembly task was performed twice with and twice without the CARAS. In the runs with robot assistance, significantly more bricks were placed correctly than in the runs without the robot. The probabilities of successful execution increased with increasing assistance level, with the exception of assistance 3, which had a lower probability of success than assistance 1. More important than which exact assistance was performed, however, seems to be the fact that pointing gestures were present at all, in combination with the physical presence of the robot.
In addition to increased productivity, the CARAS prevents the occurrence of errors, which is another major advantage since errors are a recurring cause of time delays and material wastage in everyday work in the SWs. Another advantage the CARAS provides is the ability to request help from the group leaders when none of the robotic assistances have led to success. The manipulator itself can provide cognitive assistance as well as purely motor assistance and is therefore able to support a wide variety of disabilities.
Reactions of the participants during the study and in subsequent conversations showed that they very much enjoyed working with the manipulator, and at least some perceived the robot to be a friend and tutor. Also, while some participants did not need much or any help from the robot, they still made numerous mistakes when the robot was not active. This suggests that the robot provides a sense of security, which once again underpins the added value of the purely physical presence of the robot. This is an important finding, as some employees of the SW indicate that they would lack this feeling in the open labor market. The presence of a robotic tutor could be an important tool in this regard, not only to maintain the needed productivity for the SW or employer in the open labor market but also to be a mental support to the employees. Whether the robot is indeed capable of teaching new tasks could not be determined in the short time available.
Apart from the findings regarding the Assistive System, the participants repeatedly expressed how happy and grateful they were to receive technical attention from people outside the SW, which again illustrates the importance of expansion in this research area.
7. Limitations and Future Work
The biggest limitation of the current CARAS is the camera system. During everyday work, the working area is often moved, or employees hit the equipment with their wheelchairs, hats, or unsteady movements. Therefore, continuous and robust calibration of the system is essential, which the presented system did not include. In general, it became apparent that the point cloud and Octomap methods used are susceptible to changing lighting conditions and tend to be too slow to ensure a continuous workflow. Accordingly, in the future, the robot's paths should not be hardcoded; instead, the CARAS should constantly recalibrate itself to better respond to shifts in the workspace. In addition, the camera should be mounted closer to achieve a higher and thus more reliable resolution of the working area. Finally, the object recognition method used could be replaced with AI approaches, such as convolutional neural networks.
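One conceivable realization of such continuous recalibration, sketched below, would be a fiducial marker fixed to the worktable whose pose is re-estimated every cycle. OpenCV's ArUco module (4.7+ API) is assumed, and the camera parameters and marker size are illustrative; this is not part of the presented system.

```python
# Hedged sketch of continuous recalibration with a fiducial marker fixed
# to the worktable (OpenCV >= 4.7 ArUco API assumed; marker size and
# camera parameters are illustrative). Re-estimating the table pose every
# cycle would let target poses follow small shifts instead of being
# hardcoded.
import cv2
import numpy as np

detector = cv2.aruco.ArucoDetector(
    cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50))

def table_pose(frame, camera_matrix, dist_coeffs, marker_len_m=0.04):
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is None:
        return None  # marker occluded: keep the last known pose
    s = marker_len_m / 2
    obj = np.array([[-s, s, 0], [s, s, 0], [s, -s, 0], [-s, -s, 0]],
                   dtype=np.float32)  # marker corners in the marker frame
    ok, rvec, tvec = cv2.solvePnP(obj, corners[0][0], camera_matrix, dist_coeffs)
    return (rvec, tvec) if ok else None
```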
Furthermore, the CARAS could be optimized in such a way that the different difficulties of the work steps and the individual abilities of the employees are considered more specifically in the selection of the assistance. For this purpose, the basic algorithms are already contained in the developed assistive node, and only the ITS used needs to be replaced by a machine learning approach. Here, the use of reinforcement learning would be conceivable, more precisely of contextual multi-armed bandits, since these are particularly well suited as ITSs for heterogeneous groups of people. As contextual information, they could incorporate the current task and its difficulty, as well as the degree and type of disability of the participants. This would presumably enable the CARAS, especially in the case of long-term cooperation, to adapt to the individual user as much as possible and to offer the best possible support, making it an interesting approach for subsequent research. In addition, assistance 3 resulted in few successful corrections. Assistances 2 and 4 were never executed, but since assistance 2, similar to assistance 3, points to the correct location from a distance, it is reasonable to assume that it would also be of limited help. Both assistances could be replaced by additional speech assistance, which was requested by the group leaders and by the participants themselves. Voice assistance could possibly also be even more effective in preventing errors from being passed over. Other interesting methods for choosing assistances would be the use of eye tracking to better gauge the participants' attention, or reading the entire facial expression to try to assess the current mental state.
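To make the contextual bandit suggestion concrete, the following sketch shows a standard disjoint LinUCB learner in which each arm is an assistance level and the context encodes task and participant features. This is an assumption about one possible future design, not the implemented system; all feature choices are hypothetical.

```python
# Hedged sketch of a contextual bandit (disjoint LinUCB) that could replace
# the ITS: arm = assistance level, context = task/participant features,
# reward = 1 if the assistance led to a correct placement.
import numpy as np

class LinUCB:
    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vector

    def select(self, x: np.ndarray) -> int:
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # UCB score = estimated reward + exploration bonus
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Hypothetical context: [step difficulty, degree of disability, needs motor support]
bandit = LinUCB(n_arms=6, dim=3)
x = np.array([0.8, 0.5, 0.0])
arm = bandit.select(x)              # choose an assistance (0-based index)
bandit.update(arm, x, reward=1.0)   # reward 1 if the assistance succeeded
```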
Further studies are needed to validate the observations that the CARAS causes joy, conveys a sense of security, and allows conclusions about individual abilities. In this context, a long-term in-field user study is desirable to investigate whether the CARAS can not only provide immediate support as a simple Assistive System but also achieve long-term learning effects as a tutoring system. The consideration of further quantitative metrics to analyze the human–robot collaboration would also be interesting in this context; for example, the fluency and quality of the interaction could be analyzed in future studies, such as in [69,70].
Last but not least, ethical implications for the people with disabilities and for the group leaders of the workshop have to be examined. Safety issues are a major aspect of this. Prior to conducting this study, a detailed safety analysis was performed (see Appendix A.1) and discussed with an ethics committee. The assistance was limited in order to ensure the highest possible safety; for example, the robot rarely touched objects, to avoid clamping the participants' hands. At the same time, the researcher constantly sat within reach of the robot to be able to intervene if needed, although this never became necessary. These safety concerns would undoubtedly need to be addressed before a robot could work unsupervised with a person with disabilities, and improved camera recognition with intention estimation of the user would be imperative. Additionally, one group leader had initially expressed doubts about the chosen Lego task, as it is very important to the workers to do a task with meaning so as not to feel as if they are being entertained like children. The form and type of interaction of the robot must therefore ensure that people feel respected. At the same time, this raises the following questions: what is the right balance of demanding work without overburdening the participants, and would continuous observation possibly lead to too much pressure to perform? Lastly, one group leader expressed concerns about being replaced by the robot, which, of course, must never be the intention of the CARAS. These and other issues, such as the question of whether it is ethically acceptable for a robot to be humanized by the user, would be interesting aspects to consider in future research.