4.1. Language Update and Revision
In this paper, the language presented in [10] has been updated and expanded based on feedback from the diving community and on the results of trials, as outlined below. Specifically, divers indicated that a finite set of Caddian gestures needed to be easy and quick to perform in order to communicate important states, such as dizziness, hypoxia, and nitrogen narcosis. As a result, a subset of gestures was identified, which led to the introduction of the production <slang> and the elimination of the previous productions <problem> and <p_action>. Some gestures that were identified as slang were not included in the <slang> production, but can still be used since the previous version of the language already allowed them (the “OK” gesture is one example). The examples below illustrate simplifications of the utterances “I have an ear problem” (1), “I am out of breath” (2), and “Something is wrong [environment]” (3) from the old version of the language to the new one. In the new form, priority was given to simplicity and immediacy: each new message consists of a single gesture/lemma instead of several.
For the benefit of the reader, we provide some guidelines to better understand the examples that follow. In the previous version of Caddian, the “ƀ” symbol was used to indicate the presence of a problem, which was then further specified by the subsequent symbols/lemmas. For the other symbols employed as lemmas, the mapping between symbol and lemma was in most cases one-to-one, i.e., one alphabet symbol represented one lemma. The rationale was to select the initial letter of the word that conveyed the intended meaning. Where multiple lemmas shared the same initial letter, they were distinguished by subscripts. For instance, a subscripted “H” denoted “Have”, as the unmodified “H” had already been allocated to the lemma “Here”. Similarly, the letter “B” without a subscript indicated “Boat”, whereas, with a subscript of 2, it represented “Be out of”, and, with a subscript of 3, it indicated “Breath”. Lastly, a dedicated symbol was designated to represent the concept of a “General problem”. This symbol/lemma was used when the diver experienced a sense of discomfort or unease without being able to pinpoint the specific cause or source of potential danger.
The Caddian language, which is based on a context-free grammar (CFG), is a specialised language used for communication between divers and underwater robots (i.e., AUVs). The messages and commands defined in the language are therefore specific to this application context; they were confirmed after trials, and seven additional messages/commands were added. As described in [10], a semantic function is used to map the language’s gesture sequences and written forms to messages and commands. This allows gestures and written forms to be changed and associated with different interpretations, since the language is agnostic to machine perception [23]. The language thus allows for a high degree of freedom, as different back-ends can be interfaced with different front-ends.
A summary of the aforementioned changes can be found in Table 2.
The new version of the language, from here on the Caddian Core, is described using BNF notation, which provides a formal description of its syntax and structure.
The BNF that follows includes the new commands and productions (i.e., <slang>, <questions> and <question>) and provides a concise representation of the new version of the language, making it easier for users to understand and implement.
<S> ::= A <a> <S> | ∀
<a> ::= <slang> | <agent> <m_action> <object> <place> | <set_variable> | <feedback> | <interrupt> | <work> | <questions> | ⌀ | Δ
<slang> ::= <quantity> | out_of_air | out_of_breath | cold | boat | ƀ | prob_gen | const | ear | cramp | vertigo | U | low | reserve
<agent> ::= I | Y | W
<m_action> ::= take | come | do | follow | go <direction> <num>
<direction> ::= forward | back | left | right | up | down
<object> ::= <agent> |
<place> ::= boat | P | here |
<feedback> ::= ok | no | U |
<set_variable> ::= speed <quantity> | L <level> | P | light <quantity> | air <quantity>
<quantity> ::= + | -
<level> ::= const | limit | free
<interrupt> ::= Y <feedback> do
<work> ::= Tes <area> | Tes <place> | Fo <area> | Fo <place> | wait <num> | check | <feedback> carry | for <num> <works> end | turn |
<works> ::= <work> <works> |
<area> ::= <num> <num> | <num>
<num> ::= <digit> <num> |
<digit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
<questions> ::= U <question>
<question> ::= boat | air | ƀ | prob_gen
In the BNF of Caddian Core, many terminals are self-explanatory; however, some are not. These include “ƀ”, which stands for “I do not feel well”; “⌀”, which stands for “Abort mission”; “Δ”, which stands for “General evacuation”; “U”, which stands for “I don’t understand”; “P”, which represents “point of interest”; “Tes”, which represents “tessellation”; and “Fo”, which means “photograph”.
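As a rough illustration of how the top-level production <S> ::= A <a> <S> | ∀ could be processed in written form, the following sketch splits an utterance into its individual messages and looks up the terminals explained above. It is not the authors' implementation: the function names `split_messages` and `interpret` are hypothetical, tokens are assumed to be separated by whitespace, and only single-terminal messages and questions are interpreted.

```python
# Meanings the text gives for the terminals that are not self-explanatory.
MEANINGS = {
    "ƀ": "I do not feel well",
    "⌀": "Abort mission",
    "Δ": "General evacuation",
    "U": "I don't understand",
}
# Alternatives of <question> as listed in the grammar.
QUESTIONS = {"boat", "air", "ƀ", "prob_gen"}

START, END = "A", "∀"


def split_messages(utterance: str) -> list[list[str]]:
    """Split a written utterance into the token lists of its <a> parts.

    Assumption: tokens are whitespace-separated (hypothetical convention).
    """
    tokens = utterance.split()
    if not tokens or tokens[-1] != END:
        raise ValueError("an utterance must end with ∀")
    body = tokens[:-1]
    if END in body:
        raise ValueError("∀ may only appear at the end")
    if body and body[0] != START:
        raise ValueError("every message must start with A")
    messages: list[list[str]] = []
    for token in body:
        if token == START:
            messages.append([])          # a new message begins
        else:
            messages[-1].append(token)   # token belongs to the current message
    return messages


def interpret(message: list[str]) -> str:
    """Map a message to a readable meaning (illustrative subset only)."""
    if len(message) == 2 and message[0] == "U" and message[1] in QUESTIONS:
        return f"question: {message[1]}"
    if len(message) == 1 and message[0] in MEANINGS:
        return MEANINGS[message[0]]
    # Self-explanatory terminals are passed through unchanged.
    return " ".join(message)
```

For example, `split_messages("A ƀ A U boat ∀")` yields two messages, the first interpreted as “I do not feel well” and the second as a question about the boat.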
To make the reader’s experience easier, a translation table is provided (Table 3), where all commands are translated into Caddian Core. For commands that take variable arguments, a possible example is given. The table is intended to give a concise view of the language and its capabilities, helping the reader to understand its use in the context of underwater human–robot interaction.
The table includes all the new commands added to the language, as well as the new productions. To make the Caddian written form more natural and intuitive, the terminals “I” and “Y” are sometimes replaced with “me” and “robot”, depending on whether the <agent> is the subject or the direct object of the verb. As can be observed from the table, the commands are grouped into semantic sets, highlighted in bold. In this paper, only the two new groups, “Questions” and “Status”, are explained; for the remaining ones, please refer to previous work for a more in-depth explanation [10]:
Questions: each command in the Questions set refers to the possibility of asking the AUV a question or being questioned by it.
Status: each command within this set refers to the ability to answer questions asked by the AUV.
The written form of the Caddian language presented here is based on the written form alphabet (Figure 3). Because a language must be easy to teach and learn, symbols (such as the letters of the Latin alphabet) and strings of symbols (i.e., words) that can be easily written have been employed instead of ideograms representing gestures or images depicting gestures. The signs and the written alphabet are connected by a bijective mapping function, which translates them from one domain to the other. Figure 3 provides an illustration of this mapping function.
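The bijective mapping between signs and written symbols can be sketched as a pair of lookup tables, one the inverse of the other. The gesture identifiers below (`gesture_start`, `gesture_boat`, and so on) are hypothetical placeholders, not the actual gestures of Figure 3; only the written symbols come from the grammar.

```python
# Hypothetical gesture identifiers mapped to written symbols of the grammar.
SIGN_TO_SYMBOL = {
    "gesture_start": "A",
    "gesture_end": "∀",
    "gesture_boat": "boat",
    "gesture_ok": "ok",
}


def invert_bijection(mapping: dict[str, str]) -> dict[str, str]:
    """Return the inverse mapping, refusing mappings that are not one-to-one."""
    inverse = {symbol: sign for sign, symbol in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one, so it has no inverse")
    return inverse


SYMBOL_TO_SIGN = invert_bijection(SIGN_TO_SYMBOL)


def signs_to_written(signs: list[str]) -> str:
    """Translate a sequence of recognised signs into the written form."""
    return " ".join(SIGN_TO_SYMBOL[sign] for sign in signs)


def written_to_signs(text: str) -> list[str]:
    """Translate a written message back into the sequence of signs."""
    return [SYMBOL_TO_SIGN[symbol] for symbol in text.split()]
```

Because the mapping is one-to-one, translating a sign sequence to the written form and back returns the original sequence; changing the gesture set only requires replacing the table, not the grammar.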
For a more comprehensive view of the Caddian Core language, the website http://www.caddian.eu (accessed on 6 June 2023) provides the corresponding hand gestures for each symbol in the alphabet.
Many of these gestures have been chosen from those already used by divers all over the world and are, therefore, universally recognised with their intended meanings. The remaining gestures have been selected from those that can be executed with the hands, with an effort to choose those that are evocative and easy to remember. For example, the gesture for “Take a photo” resembles the tripod of a camera.
Table 4 shows some examples of the association. This separation of the gestures from the alphabet and its context-free grammar provides robustness, while leaving the choice of gestures to individual implementations and contexts.
4.2. The New Framework for Multi-AUV Collaboration and Coordination
The human–robot communication part of the communication protocol was confirmed; the interested reader is referred to the previous work for further details and in-depth analysis [10]. In brief, the Caddian language is used within a communication protocol that ensures error handling and strict cooperation between the diver and the AUV. The diver can query the robot at any time about the progress of a task. The AUV is equipped with three light emitters (green, orange, and red) or a similar indicator (see, for example, the background of messages on the tablet in Figure 4) to show the status of the mission (in the described human–robot communication, the term “mission” refers to a series of tasks that the diver has assigned to the AUV). Green denotes “Idle status”: all is well, all tasks have been completed, and the AUV is awaiting orders. Orange denotes “Busy status”: all is well and the AUV is still working on the last mission received. Red denotes “Failure status”: the AUV has detected a system failure or a syntax error, or has previously issued an emergency message.
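The three status colours can be represented as a small enumeration. The sketch below is only illustrative: the names `MissionStatus` and `current_status` are hypothetical, and giving a failure precedence over pending tasks is an assumption rather than something stated in the protocol description.

```python
from enum import Enum


class MissionStatus(Enum):
    """Mission status and the colour used to display it."""
    IDLE = "green"      # all tasks completed, awaiting orders
    BUSY = "orange"     # still working on the last mission received
    FAILURE = "red"     # system failure, syntax error or earlier emergency


def current_status(failure_detected: bool, tasks_pending: bool) -> MissionStatus:
    """Pick the status to display.

    Assumption: a failure is reported even if tasks are still pending.
    """
    if failure_detected:
        return MissionStatus.FAILURE
    if tasks_pending:
        return MissionStatus.BUSY
    return MissionStatus.IDLE
```

The same enumeration could drive either the light emitters or the background colour of the tablet, since both convey the same three states.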
The robot’s mission status is crucial. Two accessibility aspects were considered: the diver must always be able to tell whether a mission has terminated and to know the progress of a mission. The diver can interact with the AUV using commands such as “Check”, “Abort mission” and “Problems”. We refer the reader to the previous article for more details [10].
The language framework presented in [9] and in [10] describes a communication scheme between a single diver and a single AUV, whereby the diver assigns missions to the AUV, as depicted in Figure 5.
In this article, we extend this framework to consider a scenario involving a fleet of AUVs, where the diver can instruct one or more AUVs to perform tasks or designate one AUV as a leader, which, in turn, instructs the other AUVs as subordinates.
This hierarchical structure can reduce task execution time, save AUV battery power and reduce risks to divers.
In the typical scenario, the diver instructs the nearest AUV, which then directs the other AUVs without the need to gather them at a single location (see Figure 6 for a possible scenario). This approach enables hardware platform differentiation for AUV selection, as in nature, where, for example, soldier ants are more robust than workers. In the same way, the leading AUV can have a longer battery life than the subordinate AUVs so that it can reach them and provide instructions. This has two advantages. First, the other AUVs save the battery power that they would otherwise need to return to base for new instructions and can use it for other jobs, which shortens the execution time of individual tasks and, in turn, of an entire campaign of work. Second, the human operator does not have to contact every AUV individually to provide new instructions, which saves time and reduces the risks associated with time spent diving. Moreover, the expansion of the language framework not only facilitates greater flexibility in the selection of AUV platforms, but also accommodates a variety of potential application scenarios, including, for example, those involving human-occupied vehicles and hybrid systems incorporating autonomous underwater gliders or unmanned surface support vehicles [45].
4.3. A Collaboration and Coordination Human–Robot Interaction Language
In the old framework, the AUV always had the same tasks and did not have to differentiate the actions to be taken and the commands to be received based on the context. The addition of new functionalities for coordination and collaboration requires that each AUV have at least two collaboration states: a state in which commands are communicated to set up teamwork and a hierarchy among the robots, and a normal working state in which the tasks of the old Caddian framework described in previous works are carried out. Because the AUVs of the previous framework could not distinguish between contexts and had a limited set of tasks, they lacked the flexibility to adapt to different situations, which is a major challenge in underwater exploration missions. To address this issue, the new framework introduces a richer set of tasks and commands that allow the AUVs to communicate with each other, so that the system can better handle complex missions. The two states described in this paper reflect the need for different modes of operation in the new framework (see Figure 7). The first state, referred to from here on as “Group level” or “Team level”, is the one in which commands are communicated to set up teamwork and a hierarchy among the AUVs. It allows the AUVs to communicate and coordinate their actions, which is essential for tasks such as mapping large areas or performing complex manipulations. For the leader AUV, it can also enable leader tasks and functionality in both software (i.e., commands specific to the leader, such as “Starting team level”, which temporarily enables the “Team level” in subordinate AUVs) and hardware (i.e., a leader AUV can enable its secondary battery to allow it to reach the subordinate AUVs). The second state, referred to from here on as “Solo level”, is the normal working state, in which the AUVs carry out the tasks that were performed in the old Caddian framework. These tasks may include simple navigation, data collection and diver monitoring.
4.3.1. Syntax
To illustrate the framework’s new features, it is useful to first introduce the syntax of the new commands available in both “Group level” and “Solo level” modes, which is presented below. As can be seen, some productions are borrowed from the Caddian Core language.
<H> ::= A <hierarchy> <H> | ∀
<hierarchy> ::= id <num> | <mission_h> | <mission_order> |
<mission_h> ::= mission <leader> <team_leader> <team_members> |
<leader> ::= <num>
<team_leader> ::= <num>
<team_members> ::= <team> <team_members> |
<team> ::= <num>
<mission_order> ::= worker <num> <orders>
<num> ::= <digit> <num> |
<digit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
<orders> ::= /<order> <orders> |
<order> ::= <work> | <set_variable> | boat | ƀ | prob_gen |
Alongside the syntax for the “Team level”, the commands necessary for communication are also added to the “Solo level”. At present, the communication protocol lacks a way for an AUV to display its own identifier and make itself recognizable as a leader, as well as a way to query an AUV for its identifier. As for enabling communication at the “Team level”, which we can also call hierarchical communication, we start from the assumption that the leader, once it understands that it is the highest in the hierarchy (i.e., its identifier is the same as the one set through the production <leader>), automatically enables the ability to emit the signal that starts hierarchical communication. The signal can, for example, be a specific combination of lights, such as those used in previous works as a status indicator (see [9,10]), or a specific symbol on the tablet. For a rationale of the use of light emitters, the reader can refer to the work of Fulton et al. [42].
The same applies to team leaders, with the sole constraint that a team leader can give orders only to its subordinates and can receive orders only from the mission leader.
In this regard, it is essential to emphasise that, within this new framework, AUVs operate by default in the “Solo level” state and revert to it upon completion of a “Group level” communication (i.e., upon the emission of the end-of-communication signal “∀” and the successful acknowledgment of the message by the receiving AUV). The transition from “Solo level” to “Group level” can be accomplished using the wake word technique. For inter-AUV communication, this mechanism can take the form of a specific combination of lights or a designated symbol displayed on the AUV tablet. Conversely, when the “Group level” mode is initiated by the human operator, various modalities can be used, such as an activation gesture, a sequence of flashlight activations and deactivations, an ARTag [33], or similar concepts.
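The transitions between the two levels can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the mission controller itself: the class and method names are hypothetical, and the wake signal and the acknowledgment are modelled as plain method calls regardless of how they are physically conveyed (lights, tablet symbol, gesture or ARTag).

```python
from enum import Enum, auto


class Level(Enum):
    SOLO = auto()
    GROUP = auto()


class CollaborationState:
    """Tracks whether an AUV is at the "Solo level" or the "Group level"."""

    def __init__(self) -> None:
        self.level = Level.SOLO      # default state
        self._end_seen = False       # True once "∀" has been received

    def on_wake_signal(self) -> None:
        """Wake signal received: switch to the Group level."""
        self.level = Level.GROUP
        self._end_seen = False

    def on_token(self, token: str) -> None:
        """Record the end-of-communication signal while at the Group level."""
        if self.level is Level.GROUP and token == "∀":
            self._end_seen = True

    def on_acknowledged(self) -> None:
        """Revert to Solo only after "∀" was seen and the message acknowledged."""
        if self.level is Level.GROUP and self._end_seen:
            self.level = Level.SOLO
            self._end_seen = False
```

In this sketch an acknowledgment that arrives before “∀” leaves the AUV at the Group level, which mirrors the two conditions given above for reverting to the Solo level.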
That being said, the modifications to the “Solo level” productions are minimal and affect only two productions, as follows.
<feedback> ::= ok | no | U | id <num> |
<question> ::= boat | air | ƀ | prob_gen | id
As can be seen from the two productions, the changes add the ability for the operator or an AUV to ask for an AUV’s ID and for the AUV alone to respond by stating its ID. Although the additions in terms of terminals are small (i.e., three new gestures for “id”, “mission” and “worker”), the complexity of the messages is greater than in Caddian Core. However, since these messages are exchanged between AUVs, except in the first phase, in which the leader AUV may also be instructed by a human operator, means of communication other than gestures can be employed. In our case, the tablet, which was initially used only as a means of feedback to the diver (see Figure 4 and Figure 8 for example), can be used in robot-to-robot communication to issue commands in the written form of Caddian.
4.3.2. Semantics
The new functionalities involve a strict communication protocol that must be followed by the diver operator and the AUVs, and that increases the complexity of the AUV’s “Mission controller”. After hierarchical communication has been enabled (i.e., “Group level” enabled), the operator or AUV leader can (see Table 5 for examples):
set the identifier of the AUV being addressed;
describe the hierarchy of an individual team; in the presence of multiple teams, the description of each individual team has to be issued with a distinct command;
describe the tasks for an AUV, which can then be assigned to it later, once it has been identified.
Table 5. Coordination and cooperation: examples of commands.
| Message/Command | Caddian |
|---|---|
| “Group level” enabled, “Solo level” disabled | |
| Set identification number n | A id n ∀ |
| Set hierarchy for a mission where AUV #1 is leader and AUV #2 is team leader of #3, #4 and #5 | A mission 1 2 3 4 5 ∀ |
| Same mission, where AUV #6 is team leader of #7 and #8 | A mission 1 6 7 8 ∀ |
| List of orders for AUV #3: take a picture of the point of interest, tessellation of a 4 m × 5 m area, then return to boat | A worker 3 /Fo P /Tes 4 5 /boat ∀ |
| “Group level” disabled, “Solo level” enabled | |
| Ask for identification number | A U id ∀ |
| Stating identification number (answer) | A id <num> ∀ |
Regarding the last point, during communication in the “Solo level” state, an AUV can query another AUV to learn its identifier, check whether it holds orders for that identifier, and then switch to “Group level” to communicate them (this procedure is implemented at the mission controller level).
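To make the three “Group level” command types of Table 5 concrete, the following sketch parses a single written command into a dictionary. It is a simplified illustration rather than the authors' parser: the function name `parse_team_command` is hypothetical, each whitespace-separated numeral is assumed to be one <num>, only one command per utterance is handled, the empty alternative of <hierarchy> is not supported, and the `worker` example uses the terminal `Tes` as defined in the <work> production.

```python
def parse_team_command(text: str) -> dict:
    """Parse one written command of the form 'A <hierarchy> ∀'."""
    tokens = text.split()
    if len(tokens) < 3 or tokens[0] != "A" or tokens[-1] != "∀":
        raise ValueError("expected a command of the form 'A <hierarchy> ∀'")
    head, args = tokens[1], tokens[2:-1]

    if head == "id":
        if len(args) != 1:
            raise ValueError("'id' takes exactly one number")
        return {"type": "id", "id": int(args[0])}

    if head == "mission":
        if len(args) < 2:
            raise ValueError("'mission' needs a leader and a team leader")
        numbers = [int(arg) for arg in args]
        return {
            "type": "mission",
            "leader": numbers[0],
            "team_leader": numbers[1],
            "members": numbers[2:],
        }

    if head == "worker":
        if not args:
            raise ValueError("'worker' needs the identifier of the AUV")
        orders: list[list[str]] = []
        for token in args[1:]:
            if token.startswith("/"):
                orders.append([token[1:]])   # '/' opens a new order
            elif not orders:
                raise ValueError("every order must start with '/'")
            else:
                orders[-1].append(token)     # argument of the current order
        return {"type": "worker", "id": int(args[0]), "orders": orders}

    raise ValueError(f"unknown command: {head}")
```

For instance, `parse_team_command("A mission 1 2 3 4 5 ∀")` returns leader 1, team leader 2 and members 3, 4 and 5, and the list of orders for AUV #3 is split into three orders at each “/”.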
By setting up this framework, it becomes apparent that there are some possible errors arising from the fact that AUVs are not synchronised and, thus, may have inconsistency problems regarding the tasks to be performed by each individual AUV (i.e., reconfiguring tasks for a subordinate) and the hierarchical structure between them (i.e., reconfiguring a team leader only for a worker). All these problems must be handled at the diver operator level and it must be taken as an assumption that only authorised AUVs or authorised personnel can give orders at the “Group level”. In addition, in the initial instruction of the AUV leader, it is essential to pay close attention to ensure there are no intersections between team members (i.e., a worker can only belong to one team) or multi-level hierarchies (i.e., a subordinate appearing as a team leader for other teams, or, worse, a team leader appearing as a subordinate in their own team).
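The consistency conditions on the initial instruction of the leader can be checked mechanically before the commands are issued. The sketch below is an assumption-laden illustration: the function name `validate_teams` and the representation of each team as a `(team_leader, members)` pair are hypothetical, and only the three conditions named above are checked.

```python
def validate_teams(teams: list[tuple[int, list[int]]]) -> list[str]:
    """Return a list of hierarchy errors; an empty list means no conflict.

    Each team is given as (team_leader_id, [member_ids]).
    """
    errors: list[str] = []
    team_of: dict[int, int] = {}                    # member id -> team leader id
    leaders = {leader for leader, _ in teams}

    for leader, members in teams:
        for member in members:
            if member == leader:
                errors.append(
                    f"team leader {leader} is a subordinate in its own team")
            elif member in leaders:
                errors.append(
                    f"AUV {member} is both a subordinate and a team leader")
            if member in team_of and team_of[member] != leader:
                errors.append(
                    f"AUV {member} belongs to teams {team_of[member]} and {leader}")
            team_of.setdefault(member, leader)
    return errors
```

With the two teams of Table 5, `validate_teams([(2, [3, 4, 5]), (6, [7, 8])])` reports no errors, whereas assigning AUV #4 to both teams produces one error naming the two conflicting teams.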